# Module 1 NetworkX Tutorial

In this tutorial, we will show examples of several tools you may find helpful for this module's programming assignment.

In this module, you will be working with an undirected graph representing the American football season of the year 2000, where nodes are teams and we form an undirected edge between teams that have played against each other at least once during the season. Below is an example of a node in the dataset:

```
node [
    id 1
    label "FloridaState"
    wins 11
    losses 2
    conference 0
]
```

As you can see, each node has an `id`, a `label` giving the name of the college it represents, the number of wins and losses it had in the season, and its `conference`. In American football, conferences are groups of teams that are in a league together and play against each other to win the league title. It is not uncommon, however, for teams to play opponents outside their own conference.

Below, we'll import some necessary libraries and read the graph.

In [1]:
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import adjusted_rand_score


G_football = nx.read_gml('assets/football1.gml', label='id')
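The `label='id'` argument controls which GML field becomes the node key. As a minimal sketch, the snippet below parses a tiny GML string with `nx.parse_gml` instead of reading a file. The first node is the example node shown above; the second node (`HypotheticalTeam`) and the edge are made up for illustration only.

```python
import networkx as nx

# A tiny GML document. Node 1 is the example node from this tutorial;
# node 2 and the edge are hypothetical, added only so the graph has an edge.
gml_text = """graph [
  node [
    id 1
    label "FloridaState"
    wins 11
    losses 2
    conference 0
  ]
  node [
    id 2
    label "HypotheticalTeam"
    wins 5
    losses 6
    conference 0
  ]
  edge [
    source 1
    target 2
  ]
]"""

# label='id' keys nodes by their integer id and keeps 'label' as an attribute.
G_by_id = nx.parse_gml(gml_text, label="id")
print(list(G_by_id.nodes))
print(G_by_id.nodes[1]["label"])

# The default (label='label') keys nodes by their label string instead.
G_by_label = nx.parse_gml(gml_text)
print(list(G_by_label.nodes))
```

Which form you want depends on how you plan to look nodes up: integer ids are convenient when other functions hand you ids, while labels are easier to read in printed output.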

# Accessing a node in the graph

In this module's assignment, you may use functions that give you a set of node ids. You will need to access these nodes in the graph by their `id` and obtain attributes such as their `label` or `wins` count.

Let's say we have node id 1, and we want to access its 'label' field to get the name of the node's college.

We first access all the nodes in `G_football` with `.nodes` and then subscript into it with `[1]` to obtain a dictionary containing node 1's attributes. We can then index that dictionary with a key, again using `[]`. Below, we access the 'label' field of the node.

In [2]:
G_football.nodes[1]['label']

'FloridaState'
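When a function returns several node ids at once, the same lookup can be applied to each id in turn. The sketch below uses a two-node stand-in graph rather than `G_football`; node 1 matches the example node above, while node 2 (`HypotheticalTeam`) and its values are invented for illustration.

```python
import networkx as nx

# Toy stand-in for G_football. Node 2 and its attribute values are hypothetical.
G = nx.Graph()
G.add_node(1, label="FloridaState", wins=11, losses=2, conference=0)
G.add_node(2, label="HypotheticalTeam", wins=5, losses=6, conference=0)
G.add_edge(1, 2)

# Suppose some function handed us this set of node ids.
node_ids = {1, 2}

# Map each id to its label, and collect win counts in a fixed (sorted) order.
labels = {node_id: G.nodes[node_id]["label"] for node_id in node_ids}
wins = [G.nodes[node_id]["wins"] for node_id in sorted(node_ids)]

# Iterate over every node together with its attribute dictionary.
for node_id, attrs in G.nodes(data=True):
    print(node_id, attrs["label"], attrs["wins"])
```

Sorting the ids before building a list keeps the order stable, which matters if you later pair that list with another list built from the same nodes.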

# Computing Correlation Coefficients

In this module's programming assignment, you may need to measure the correlation between the values of two equally sized lists. Below, we provide an example of how this is done. 

In [3]:
X = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

correlation_matrix = np.corrcoef(X, y)
correlation_coefficient = correlation_matrix[0, 1]
print(f'Correlation Coefficient: {correlation_coefficient}')

Correlation Coefficient: 0.9999999999999999
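Since `y` is exactly twice `X`, the true correlation here is 1, and the printed value differs from it only by floating-point rounding. As a rough check of what `np.corrcoef` is computing, the sketch below evaluates the Pearson formula by hand on the same two lists and compares the result with NumPy's.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 6, 8, 10], dtype=float)

# Pearson's r: covariance of the deviations divided by the product of their norms.
x_dev = X - X.mean()
y_dev = y - y.mean()
r_manual = (x_dev * y_dev).sum() / np.sqrt((x_dev**2).sum() * (y_dev**2).sum())

r_numpy = np.corrcoef(X, y)[0, 1]

# Compare with a tolerance rather than ==, because of floating-point rounding.
print(np.isclose(r_manual, r_numpy))
```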


# Computing Assortativity Coefficients

In this module, you learned about attribute, degree, and numeric assortativity measurements and what they represent. In the programming assignment, you will need to compute these measurements. Below, we provide examples of how you could obtain these metrics.

### Attribute Assortativity

The line below computes the attribute assortativity coefficient, which measures the tendency of nodes with the same attribute value to form edges with one another. In this case, since we select "conference," we obtain a measure of the tendency of universities in the same conference to play one another.

In [4]:
nx.attribute_assortativity_coefficient(G_football, attribute="conference")

0.6275381679111909
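To build intuition for the range of this coefficient, here is a small sketch on two toy graphs (not the football data). The "conference" values "A" and "B" are hypothetical. In the first graph every edge joins nodes in the same conference; in the second every edge joins nodes in different conferences.

```python
import networkx as nx

# Two disconnected triangles; each triangle is its own (hypothetical) conference.
G_same = nx.Graph([(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)])
nx.set_node_attributes(
    G_same, {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}, "conference"
)

# A 4-cycle whose conferences alternate, so no edge stays inside a conference.
G_mixed = nx.cycle_graph(4)
nx.set_node_attributes(G_mixed, {0: "A", 1: "B", 2: "A", 3: "B"}, "conference")

r_same = nx.attribute_assortativity_coefficient(G_same, "conference")
r_mixed = nx.attribute_assortativity_coefficient(G_mixed, "conference")

print(f"All edges within a conference:   {r_same:.2f}")
print(f"All edges between conferences:   {r_mixed:.2f}")
```

These two extremes give coefficients of 1 and -1 respectively, so the football value of about 0.63 sits well toward the "same conference" end.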

### Numeric Assortativity


The line below computes the numeric assortativity coefficient, which measures the tendency of nodes with similar values of a numeric attribute to form edges with each other. In the context of our football dataset, since we've selected the 'wins' attribute, the coefficient tells us the tendency of teams with similar win counts to play against each other.

In [5]:
nx.numeric_assortativity_coefficient(G_football, attribute="wins")

-0.049806582644503085
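One way to read this number is as a Pearson correlation between the attribute values at the two ends of each edge. The sketch below checks that interpretation on a small toy graph whose `wins` values are invented for illustration; for an undirected graph, each edge is counted once in each direction.

```python
import networkx as nx
import numpy as np

# Toy graph with hypothetical win counts (not taken from the football data).
G = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
wins = {0: 10, 1: 3, 2: 8, 3: 5}
nx.set_node_attributes(G, wins, "wins")

# Collect the attribute at both ends of every edge, in both directions.
x, y = [], []
for u, v in G.edges():
    x += [wins[u], wins[v]]
    y += [wins[v], wins[u]]

r_edges = np.corrcoef(x, y)[0, 1]
r_nx = nx.numeric_assortativity_coefficient(G, "wins")

print(np.isclose(r_edges, r_nx))
```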

### Degree Assortativity


The line below computes the degree assortativity coefficient, which measures the tendency of nodes with similar degrees to form edges with one another. Because an edge exists between two teams that played at least once, a team's degree is its number of distinct opponents, so in our football dataset this tells us the tendency of teams to play other teams with a similar number of opponents. This coefficient is just the numeric assortativity coefficient with the node's degree as the numeric attribute.



In [6]:
nx.degree_assortativity_coefficient(G_football)

0.16244224957444287
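The relationship between degree and numeric assortativity can be checked directly. The sketch below stores each node's degree as a node attribute (the attribute name `"degree"` is our own choice) and compares the two coefficients. It uses NetworkX's built-in karate club graph as a stand-in, since it needs no data file.

```python
import networkx as nx

# Stand-in graph that ships with NetworkX.
G = nx.karate_club_graph()

# Store each node's (unweighted) degree as a node attribute named "degree".
nx.set_node_attributes(G, dict(G.degree()), "degree")

r_degree = nx.degree_assortativity_coefficient(G)
r_numeric = nx.numeric_assortativity_coefficient(G, "degree")

print(f"Degree assortativity:              {r_degree}")
print(f"Numeric assortativity on 'degree': {r_numeric}")
```

The two values should agree up to floating-point rounding.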

If we were working with a directed graph, we could use the `x` and `y` arguments to choose which degree type is used when calculating the coefficient: `x` selects the in-degree or out-degree of each edge's source node, and `y` selects it for the target node. Below is code that creates a small directed graph and sets the `x` and `y` arguments.

In [7]:
G = nx.DiGraph()

edges = [
    ("1", "2"),
    ("2", "3"),
    ("3", "1"),
    ("3", "4"),
    ("4", "2")
]

G.add_edges_from(edges)

in_degree_assortativity = nx.degree_assortativity_coefficient(G, x='in', y='in')

out_degree_assortativity = nx.degree_assortativity_coefficient(G, x='out', y='out')

out_in_assortativity = nx.degree_assortativity_coefficient(G, x='out', y='in')

print(f"In-Degree Assortativity: {in_degree_assortativity}")
print(f"Out-Degree Assortativity: {out_degree_assortativity}")
print(f"Out-In Degree Assortativity: {out_in_assortativity}")

In-Degree Assortativity: -0.4082482904638645
Out-Degree Assortativity: -0.4082482904638645
Out-In Degree Assortativity: -0.6666666666666686


# Structural Holes

### Constraint

In this module's assignment, you will be working with the dictionary you obtain from `nx.constraint`. 

The function returns a dictionary, where each key is a node `id` and the value is the constraint coefficient. 

In the context of our dataset, let's say we have a college 'A'. The coefficient corresponding to A in the dictionary measures how strongly A's neighbors are themselves connected to A's other neighbors. If college A has a low coefficient, we might see it as a broker spanning structural holes, which you learned about in this module's lectures.

Remember, you can loop through each key-value pair in a dictionary with `.items()`, as shown below.

In [8]:
constraints = nx.constraint(G_football)

for node_id, constraint_value in constraints.items():
    print(f'{node_id} : {constraint_value}')

0 : 0.1586688769711707
1 : 0.1843312066370778
2 : 0.15886454507193143
3 : 0.15275783402203855
4 : 0.17942117982563766
5 : 0.16208239185485607
6 : 0.14832231723803696
7 : 0.1684377869605142
8 : 0.18895464790656374
9 : 0.19116700437887518
10 : 0.19056019283746561
11 : 0.16851853446076936
12 : 0.1713481086113662
13 : 0.1671528394387147
14 : 0.18007545658606267
15 : 0.15483155417814512
16 : 0.17938519302719158
17 : 0.1636418277440066
18 : 0.17931127327647134
19 : 0.18591357318489177
20 : 0.16690390000683014
21 : 0.1869044691847096
22 : 0.19868485759169455
23 : 0.1866161890210225
24 : 0.15073731251913075
25 : 0.20558149109426346
26 : 0.17265228558518603
27 : 0.16804585634287617
28 : 0.18398266238894742
29 : 0.1980083327641555
30 : 0.1944149989754798
31 : 0.17266412349557855
32 : 0.16732406711745557
33 : 0.24121673553719014
34 : 0.17516330294155777
35 : 0.1862871388566355
36 : 0.15141666128615705
37 : 0.20986011504982205
38 : 0.15836940234636204
39 : 0.17427469662819023
40 : 0.16306819130447
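To pick out likely brokers from such a dictionary, one option is to sort it by value or take the key with the smallest value. The sketch below does this on a toy star graph rather than the football data: the hub's neighbors are not connected to each other, so the hub should have the lowest constraint.

```python
import networkx as nx

# Star graph: node 0 is the hub, nodes 1-4 are leaves connected only to the hub.
G = nx.star_graph(4)
constraints = nx.constraint(G)

# The node with the lowest constraint is the strongest broker candidate.
broker = min(constraints, key=constraints.get)

# Sort all nodes from least to most constrained.
ranked = sorted(constraints.items(), key=lambda item: item[1])

print(f"Lowest-constraint node: {broker}")
for node_id, value in ranked:
    print(f"{node_id} : {value}")
```

Here the hub has constraint 1/4 (four neighbors, none of them tied to each other), while each leaf has constraint 1 because its only contact is the hub.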

### Effective Size

Similarly, `nx.effective_size` returns a dictionary mapping each node to its effective size, as shown below.

In the context of our dataset, this is a measure of how non-redundant a team's opponents are. For example, if a team plays against many teams who do not play each other, the node has a high effective size. If a team plays against teams that mostly also play each other, it has a 'redundancy' in its competition and a low effective size.

In [9]:
effective_sizes = nx.effective_size(G_football)

for node_id, es in effective_sizes.items():
    print(f'{node_id} : {es}')

0 : 8.166666666666666
1 : 7.166666666666667
2 : 7.833333333333333
3 : 8.333333333333334
4 : 6.818181818181818
5 : 8.0
6 : 8.5
7 : 7.666666666666667
8 : 6.2727272727272725
9 : 6.2727272727272725
10 : 6.0
11 : 7.2
12 : 7.0
13 : 7.181818181818182
14 : 6.6
15 : 8.166666666666666
16 : 6.818181818181818
17 : 7.545454545454545
18 : 7.0
19 : 6.636363636363637
20 : 7.363636363636363
21 : 6.454545454545454
22 : 5.909090909090909
23 : 6.454545454545454
24 : 8.0
25 : 5.909090909090909
26 : 7.0
27 : 7.363636363636363
28 : 6.333333333333334
29 : 6.090909090909091
30 : 6.2727272727272725
31 : 7.181818181818182
32 : 7.181818181818182
33 : 4.2
34 : 7.181818181818182
35 : 6.636363636363637
36 : 7.0
37 : 5.7272727272727275
38 : 7.909090909090909
39 : 6.818181818181818
40 : 7.545454545454545
41 : 5.2
42 : 6.142857142857143
43 : 8.09090909090909
44 : 7.363636363636363
45 : 5.7272727272727275
46 : 5.7272727272727275
47 : 7.181818181818182
48 : 7.363636363636363
49 : 5.545454545454546
50 : 5.888888888888889
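The two situations described above can be illustrated on toy graphs. In this sketch, a star graph stands in for a team whose opponents never play each other, and a complete graph stands in for a team whose opponents all play each other. Both graphs are illustrative and not drawn from the football data.

```python
import networkx as nx

# Hub with 4 neighbors that are not connected to one another.
star = nx.star_graph(4)

# 5 nodes where every pair is connected, so each node's 4 neighbors
# are all connected to one another as well.
clique = nx.complete_graph(5)

es_star = nx.effective_size(star)
es_clique = nx.effective_size(clique)

print(f"Star hub effective size:    {es_star[0]}")
print(f"Clique node effective size: {es_clique[0]}")
```

For an unweighted, undirected graph, effective size is n - 2t/n, where n is the number of neighbors and t is the number of ties among those neighbors. The hub therefore scores 4 - 0 = 4, while a clique node scores 4 - 2(6)/4 = 1.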


# Conducting K-Core Decomposition

In this module's assignment, you may need to conduct K-core decomposition. 

First, however, we are going to introduce the next dataset you will be working with. This one is an undirected graph representing a community of dolphins, where each node represents a dolphin and each edge represents a friendship between dolphins. Each dolphin has the following attributes:
```
  node [
    id 6
    label "DN21"
    smelliness 58
  ]
 ```

We will read in the dataset below.


In [10]:
G_dolphins = nx.read_gml("assets/dolphins.gml")

Now, to conduct K-core decomposition on the dolphins dataset, we will use the `nx.core_number` function. Like the constraint function, it returns a dictionary. Because we read this graph without `label='id'`, nodes are keyed by their `label` (the dolphin's name) rather than their `id`. Each key is therefore a dolphin's name, and each value is the largest k for which that dolphin belongs to a k-core.

In [11]:
core_numbers = nx.core_number(G_dolphins)

for node_id, highest_core in core_numbers.items():
    print(f'{node_id} : {highest_core}')

Beak : 4
Beescratch : 4
Bumper : 3
CCL : 3
Cross : 1
DN16 : 3
DN21 : 4
DN63 : 4
Double : 4
Feather : 4
Fish : 4
Five : 1
Fork : 1
Gallatin : 4
Grin : 4
Haecksel : 4
Hook : 4
Jet : 4
Jonah : 4
Knit : 4
Kringel : 4
MN105 : 4
MN23 : 1
MN60 : 3
MN83 : 4
Mus : 3
Notch : 3
Number1 : 3
Oscar : 4
Patchback : 4
PL : 4
Quasi : 1
Ripplefluke : 2
Scabs : 4
Shmuddel : 3
SMN5 : 1
SN100 : 4
SN4 : 4
SN63 : 4
SN89 : 2
SN9 : 4
SN90 : 4
SN96 : 4
Stripes : 4
Thumper : 3
Topless : 4
TR120 : 2
TR77 : 4
TR82 : 1
TR88 : 2
TR99 : 4
Trigger : 4
TSN103 : 4
TSN83 : 2
Upbang : 4
Vau : 2
Wave : 2
Web : 4
Whitetip : 1
Zap : 4
Zig : 1
Zipfel : 2
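Once you have core numbers, a common next step is to pull out the nodes belonging to a particular k-core. The sketch below shows two ways to do that on a small toy graph (not the dolphins data): filtering the `core_number` dictionary, and calling `nx.k_core` to get the subgraph directly.

```python
import networkx as nx

# Nodes 0-3 form a clique; nodes 4 and 5 form a tail hanging off node 3.
G = nx.complete_graph(4)
G.add_edges_from([(3, 4), (4, 5)])

core_numbers = nx.core_number(G)

# Option 1: keep the nodes whose core number is at least k.
k = 3
members = sorted(n for n, c in core_numbers.items() if c >= k)

# Option 2: ask NetworkX for the k-core subgraph itself.
core_subgraph = nx.k_core(G, k=k)

print(core_numbers)
print(members)
print(sorted(core_subgraph.nodes))
```

The clique nodes each have at least three neighbors inside the clique, so they make up the 3-core. The tail nodes are peeled away earlier and end up with core number 1.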
