## Characterizing the network (II)

Let's continue reviewing what you've learned about node importance by plotting the degree distribution of a network. This is the distribution of node degrees computed across all nodes in a network.

### Instructions
    - Plot the degree distribution of the GitHub collaboration network G. Recall that there are four steps involved here:
        - Calculating the degree centrality of G.
        - Using the .values() method of the resulting degree centrality dictionary and converting the result into a list.
        - Passing the list of degree centrality values to plt.hist().
        - Displaying the histogram with plt.show().

In [None]:
# Import necessary modules
import matplotlib.pyplot as plt
import networkx as nx

# Plot the degree distribution of the GitHub collaboration network
plt.hist(list(nx.degree_centrality(G).values()))
plt.show()
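
As a reminder of what the values in the histogram mean, here is a minimal sketch that computes degree centrality by hand on a hypothetical four-node toy graph (not the GitHub data). Each node's degree is divided by n - 1, the largest degree a node could have, which is the normalization nx.degree_centrality() applies. The names `toy_adj_dc` and `degree_centrality_by_hand` are illustrative only.

```python
# Hypothetical toy graph as an adjacency dictionary (undirected, no self-loops).
toy_adj_dc = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a", "d"},
    "d": {"a", "c"},
}

def degree_centrality_by_hand(adj):
    """Return degree / (n - 1) for every node in the adjacency dictionary."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

toy_dc = degree_centrality_by_hand(toy_adj_dc)

# This list of values is the kind of input that plt.hist() receives above.
toy_dc_values = list(toy_dc.values())
print(toy_dc_values)
```

Node "a" is connected to every other node, so its score is 1.0; node "b" has a single neighbor out of three possible, so its score is one third.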

## Characterizing the network (III)

The last exercise was on degree centrality; this time round, let's recall betweenness centrality!

A small note: when written correctly, this exercise may take about 5 seconds to run.

### Instructions
    - Plot the betweenness centrality distribution of the GitHub collaboration network. You have to follow exactly the same four steps as in the previous exercise, substituting nx.betweenness_centrality() in place of nx.degree_centrality().

In [None]:
# Import necessary modules
import matplotlib.pyplot as plt
import networkx as nx

# Plot the betweenness centrality distribution of the GitHub collaboration network
plt.hist(list(nx.betweenness_centrality(G).values()))
plt.show()
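
To make the quantity being plotted concrete, here is a brute-force sketch on a hypothetical four-node path graph a-b-c-d. For each node it sums, over all pairs of other nodes, the fraction of shortest paths between the pair that pass through it, then divides by (n - 1)(n - 2) / 2, which matches the default normalization of nx.betweenness_centrality() for undirected graphs. The helper names are illustrative, and this is not how NetworkX computes the scores internally; it is only meant to show what the numbers mean.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, source):
    """Return (distances, shortest-path counts) from source to reachable nodes."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:          # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u lies on a shortest path to w
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness_by_hand(adj):
    """Normalized betweenness for a small undirected graph with n > 2 nodes."""
    nodes = list(adj)
    n = len(nodes)
    info = {s: bfs_counts(adj, s) for s in nodes}
    scores = {v: 0.0 for v in nodes}
    for s, t in combinations(nodes, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in nodes:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                scores[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    scale = (n - 1) * (n - 2) / 2
    return {v: score / scale for v, score in scores.items()}

# Hypothetical path graph: a - b - c - d
toy_path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
toy_bc = betweenness_by_hand(toy_path)
print(toy_bc)
```

The end nodes sit on no shortest path between other nodes, so they score 0, while each inner node lies on two of the three possible pairs' shortest paths.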

## Matrix plot

Let's now practice making some visualizations. The first one will be the Matrix plot. In a Matrix plot, the matrix is the representation of the edges.

### Instructions
    - Make a Matrix plot visualization of the largest connected component subgraph, with authors grouped by their user group number.
        - First, calculate the largest connected component subgraph by using nx.connected_components(G) inside the provided sorted() function. Python's built-in sorted() function takes an iterable and returns a sorted list (in ascending order, by default); here, the key argument sorts the subgraphs by their number of nodes. Therefore, to access the largest connected component subgraph, the result is indexed with [-1].
        - Create the matrix plot h. You have to specify the parameters graph and group_by to be the largest connected component subgraph and 'grouping', respectively.
        - Draw the matrix plot to the screen.

In [None]:
# Import necessary modules
import networkx as nx
from nxviz import matrix
import matplotlib.pyplot as plt

# Calculate the largest connected component: largest_ccs
largest_ccs = sorted((G.subgraph(c) for c in nx.connected_components(G)), key=lambda x: len(x))[-1]

# Create the customized Matrix plot: h
h = matrix(largest_ccs, group_by='grouping')

# Draw the Matrix plot to the screen
plt.show()
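
The "sort by size, then take the last element" pattern can be tried on plain Python sets. The sketch below uses hypothetical components (sets of made-up user labels, not the GitHub data) and also shows max() with the same key as an alternative that avoids sorting.

```python
# Hypothetical connected components, as sets of node labels.
toy_components = [{"u1", "u2"}, {"u3", "u4", "u5", "u6"}, {"u7"}]

# Pattern used in the exercise: sort ascending by size, take the last item.
largest_by_sort = sorted(toy_components, key=len)[-1]

# Alternative: max() with the same key returns a largest item without sorting.
largest_by_max = max(toy_components, key=len)

print(largest_by_sort == largest_by_max)
```

When two components tie for the largest size, the two approaches may return different ones: sorted(...)[-1] gives the last of the tied items, while max() gives the first.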

## Arc plot

Next up, let's use the Arc plot to visualize the network. You're going to practice sorting the nodes in the graph as well.

Note: this exercise may take about 4-7 seconds to execute if done correctly.

### Instructions
    - Make an Arc plot of the GitHub collaboration network, with authors sorted by degree. To do this:
        - Iterate over all the nodes in G, including the metadata (by specifying data=True).
        - In each iteration of the loop, calculate the degree of each node n with nx.degree() and set its 'degree' attribute. nx.degree() accepts two arguments: A graph and a node.
        - Create the arc plot a by specifying two parameters: the graph argument, which is G, and the sort_by argument, which is 'degree', so that the nodes are sorted.
        - Display the arc plot to the screen.

In [None]:
# Import necessary modules
import networkx as nx
from nxviz import arc
import matplotlib.pyplot as plt

# Iterate over all the nodes in G, including the metadata
for n, d in G.nodes(data=True):

    # Calculate the degree of each node: G.nodes[n]['degree']
    G.nodes[n]['degree'] = nx.degree(G, n)

# Create the Arc plot: a
a = arc(G, sort_by='degree')

# Draw the Arc plot to the screen
plt.show()

## Circos plot

Finally, you're going to make a Circos plot of the network!

### Instructions
    - Make a Circos plot of the network, again, with GitHub users sorted by their degree, and grouped and colored by their 'grouping' key. To do this:
        - Iterate over all the nodes in G, including the metadata (by specifying data=True).
        - In each iteration of the loop, calculate the degree of each node n with nx.degree() and set its 'degree' attribute.
        - Create the circos plot c by specifying three parameters in addition to the graph G: the sort_by argument, which is 'degree', and the group_by and node_color_by arguments, which are both 'grouping'.
        - Draw the Circos plot to the screen.

In [None]:
# Import necessary modules
import networkx as nx
from nxviz import circos
import matplotlib.pyplot as plt

# Iterate over all the nodes, including the metadata
for n, d in G.nodes(data=True):

    # Calculate the degree of each node: G.nodes[n]['degree']
    G.nodes[n]['degree'] = nx.degree(G, n)

# Create the Circos plot: c
c = circos(G, sort_by='degree', group_by='grouping', node_color_by='grouping')

# Draw the Circos plot to the screen
plt.show()

## Finding cliques (I)

You're now going to practice finding cliques in G. Recall that cliques are "groups of nodes that are fully connected to one another", while a maximal clique is a clique that cannot be extended by adding another node in the graph.

### Instructions
    - Count the number of maximal cliques present in the graph and print it.
        - Use the nx.find_cliques() function on G to find the maximal cliques.
        - The nx.find_cliques() function returns a generator object. To count the number of maximal cliques, you need to first convert it to a list with list() and then use the len() function. Place this inside a print() function to print it.

In [None]:
# Calculate the maximal cliques in G: cliques
cliques = nx.find_cliques(G)

# Count and print the number of maximal cliques in G
print(len(list(cliques)))
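
The difference between a clique and a maximal clique can be checked by brute force on a tiny graph. The sketch below uses a hypothetical toy graph (a triangle a-b-c with an extra edge c-d) and illustrative helper names; it enumerates every node subset, so it is only suitable for very small graphs and is not how nx.find_cliques() works.

```python
from itertools import combinations

# Hypothetical toy graph: a triangle a-b-c plus a pendant edge c-d.
toy_adj_cl = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

def is_clique(adj, nodes):
    """True if every pair of the given nodes is connected by an edge."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))

def maximal_cliques_brute_force(adj):
    """Keep each clique that cannot be extended by any other node."""
    all_nodes = sorted(adj)
    found = []
    for size in range(1, len(all_nodes) + 1):
        for subset in combinations(all_nodes, size):
            if not is_clique(adj, subset):
                continue
            others = set(all_nodes) - set(subset)
            if not any(is_clique(adj, subset + (extra,)) for extra in others):
                found.append(set(subset))
    return found

toy_cliques = maximal_cliques_brute_force(toy_adj_cl)
print(len(toy_cliques))
```

Here the pair a-b is a clique but not a maximal one, because adding c still gives a clique. Only the triangle and the edge c-d survive the maximality check.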

## Finding cliques (II)

Great work! Let's continue by finding a particular maximal clique, and then plotting that clique.

### Instructions
    - Find the author(s) who are part of the largest maximal clique, and plot the subgraph of that clique (or of one of them, if several are tied for largest) using a Circos plot. To do this:
        - Use the nx.find_cliques() function to calculate the maximal cliques in G. Place this within the provided sorted() function to calculate the largest maximal clique.
        - Create the subgraph consisting of the largest maximal clique using the .subgraph() method and largest_clique.
        - Create the Circos plot object using the subgraph G_lc (without any other arguments) and plot it.

In [None]:
# Import necessary modules
import networkx as nx
from nxviz import circos
import matplotlib.pyplot as plt

# Find the author(s) that are part of the largest maximal clique: largest_clique
largest_clique = sorted(nx.find_cliques(G), key=lambda x:len(x))[-1]

# Create the subgraph of the largest_clique: G_lc
G_lc = G.subgraph(largest_clique)

# Create the Circos plot: c
c = circos(G_lc)

# Draw the Circos plot to the screen
plt.show()

## Finding important collaborators

Almost there! You'll now look at important nodes once more. Here, you'll make use of the degree_centrality() and betweenness_centrality() functions in NetworkX to compute each of the respective centrality scores, and then use that information to find the "important nodes". In other words, your job in this exercise is to find the user(s) who have collaborated with the largest number of other users.

### Instructions
    - Compute the degree centralities of G. Store the result as deg_cent.
    - Compute the maximum degree centrality. Since deg_cent is a dictionary, you'll have to use the .values() method to get its values before computing the maximum degree centrality with max().
    - Identify the most prolific collaborators using a list comprehension:
        - Iterate over the degree centrality dictionary deg_cent that was computed earlier using its .items() method. What condition should be satisfied if you are seeking to find the user(s) who have collaborated with the largest number of other users? Hint: It has to do with the maximum degree centrality.
    - Hit 'Submit Answer' to see who the most prolific collaborator(s) is/are!

In [None]:
# Compute the degree centralities of G: deg_cent
deg_cent = nx.degree_centrality(G)

# Compute the maximum degree centrality: max_dc
max_dc = max(deg_cent.values())

# Find the user(s) that have collaborated the most: prolific_collaborators
prolific_collaborators = [n for n, dc in deg_cent.items() if dc == max_dc]

# Print the most prolific collaborator(s)
print(prolific_collaborators)
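
The list comprehension keeps every node whose score equals the maximum, which is why the result is a list rather than a single user. A minimal sketch with hypothetical scores (made-up names and values, not the GitHub data) shows the behavior when two users tie:

```python
# Hypothetical degree centrality scores; "alice" and "carol" tie for the top.
toy_deg_cent = {"alice": 0.75, "bob": 0.25, "carol": 0.75, "dave": 0.5}

# Maximum over the dictionary's values.
toy_max_dc = max(toy_deg_cent.values())

# Every node whose score equals the maximum is kept, so ties are preserved.
toy_prolific = [n for n, dc in toy_deg_cent.items() if dc == toy_max_dc]
print(toy_prolific)
```

Comparing floats with == is safe in this pattern because the maximum is taken from the same dictionary values it is compared against.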


## Characterizing editing communities

You're now going to combine what you've learned about the BFS algorithm and concept of maximal cliques to visualize the network with an Arc plot.

The largest maximal clique in the GitHub user collaboration network has been assigned to the subgraph G_lmc. Note that for NetworkX version 2.x and later, G.subgraph(nodelist) returns only an immutable view on the original graph. We must explicitly ask for a .copy() of the graph to obtain a mutable version.

### Instructions
    - Go out 1 degree of separation from the clique, and add those users to the subgraph. Inside the first for loop:
        - Add the neighbors of node in G to G_lmc using the .add_nodes_from() and .neighbors() methods.
        - Using the .add_edges_from() method, add edges to G_lmc between the current node and all its neighbors. To do this, you'll have to create a list of tuples using the zip() function consisting of the current node and each of its neighbors. The first argument to zip() should be [node]*len(list(G.neighbors(node))), and the second argument should be the neighbors of node.
    - Record each node's degree centrality score in its node metadata.
        - Do this by assigning nx.degree_centrality(G_lmc)[n] to G_lmc.nodes[n]['degree centrality'] in the second for loop.
    - Visualize this network with an Arc plot sorting the nodes by degree centrality (you can do this using the keyword argument sort_by='degree centrality').

In [None]:
# Import necessary modules
import networkx as nx
from nxviz import arc
import matplotlib.pyplot as plt

# Identify the largest maximal clique: largest_max_clique
largest_max_clique = set(sorted(nx.find_cliques(G), key=lambda x: len(x))[-1])

# Create a subgraph from the largest_max_clique: G_lmc
G_lmc = G.subgraph(largest_max_clique).copy()

# Go out 1 degree of separation
for node in list(G_lmc.nodes()):
    G_lmc.add_nodes_from(G.neighbors(node))
    G_lmc.add_edges_from(zip([node]*len(list(G.neighbors(node))), G.neighbors(node)))

# Record each node's degree centrality score
for n in G_lmc.nodes():
    G_lmc.nodes[n]['degree centrality'] = nx.degree_centrality(G_lmc)[n]

# Create the Arc plot: a
a = arc(G_lmc, sort_by='degree centrality')

# Draw the Arc plot to the screen
plt.show()
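
The zip() call that builds the edge list can be hard to read at first. The sketch below isolates it with a hypothetical node label and neighbor list (made-up values) and compares it with an equivalent list comprehension.

```python
# Hypothetical node and its neighbors.
toy_node = "u1"
toy_neighbors = ["u2", "u3", "u4"]

# Pattern used in the exercise: repeat the node once per neighbor, then zip.
edges_zip = list(zip([toy_node] * len(toy_neighbors), toy_neighbors))

# Equivalent list comprehension producing the same (node, neighbor) tuples.
edges_comp = [(toy_node, nbr) for nbr in toy_neighbors]

print(edges_zip)
```

Both forms yield one (node, neighbor) tuple per neighbor, which is the shape that .add_edges_from() expects.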

## Recommending co-editors who have yet to edit together

Finally, you're going to leverage the concept of open triangles to recommend users on GitHub to collaborate!

### Instructions
    - Compile a list of GitHub users that should be recommended to collaborate with one another. To do this:
        - In the first for loop, iterate over all the nodes in G, including the metadata (by specifying data=True).
        - In the second for loop, iterate over all the possible triangle combinations, which can be identified using the combinations() function with a size of 2.
        - If n1 and n2 do not have an edge between them, a collaboration between these two nodes (users) should be recommended, so increment the (n1, n2) value of the recommended dictionary in this case. You can check whether or not n1 and n2 have an edge between them using the .has_edge() method.
    - Using a list comprehension, identify the top 10 pairs of users that should be recommended to collaborate. The iterable should be the key-value pairs of the recommended dictionary (which can be accessed with the .items() method), while the conditional should be satisfied if count is greater than the tenth-largest value in all_counts. Note that all_counts is sorted in ascending order, so you can access the tenth-largest value with all_counts[-10].

In [None]:
# Import necessary modules
from itertools import combinations
from collections import defaultdict

# Initialize the defaultdict: recommended
recommended = defaultdict(int)

# Iterate over all the nodes in G
for n, d in G.nodes(data=True):

    # Iterate over all possible triangle relationship combinations
    for n1, n2 in combinations(list(G.neighbors(n)), 2):

        # Check whether n1 and n2 do not have an edge
        if not G.has_edge(n1, n2):

            # Increment recommended
            recommended[(n1, n2)] += 1

# Identify the top 10 pairs of users
all_counts = sorted(recommended.values())
top10_pairs = [pair for pair, count in recommended.items() if count > all_counts[-10]]
print(top10_pairs)
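
The open-triangle counting can be traced by hand on a small example. The sketch below is a variation on the exercise, not its solution: it uses a hypothetical adjacency dictionary instead of a NetworkX graph, and it sorts each neighbor set (an assumption added here) so that the same pair always produces the same (n1, n2) key regardless of which shared neighbor discovers it. It also contrasts a strict and an inclusive comparison against the threshold.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy graph: "hub1" and "hub2" both know "x" and "y",
# but "x" and "y" have no edge, and neither do "hub1" and "hub2".
toy_adj_rec = {
    "hub1": {"x", "y"},
    "hub2": {"x", "y"},
    "x": {"hub1", "hub2"},
    "y": {"hub1", "hub2"},
}

toy_recommended = defaultdict(int)
for n, nbrs in toy_adj_rec.items():
    # Sorting the neighbors gives each unordered pair one consistent key.
    for n1, n2 in combinations(sorted(nbrs), 2):
        if n2 not in toy_adj_rec[n1]:       # open triangle: n1 - n - n2
            toy_recommended[(n1, n2)] += 1

# Threshold taken from the sorted counts, here for the top k = 1.
toy_counts = sorted(toy_recommended.values())
toy_threshold = toy_counts[-1]

# A strict comparison drops pairs tied with the threshold; >= keeps them.
toy_strict = [p for p, c in toy_recommended.items() if c > toy_threshold]
toy_inclusive = [p for p, c in toy_recommended.items() if c >= toy_threshold]
print(dict(toy_recommended))
```

In this toy graph each unconnected pair shares two neighbors, so both pairs get a count of 2. Because both counts equal the threshold, the strict comparison returns an empty list while the inclusive one returns both pairs.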
