# NetworkX


## Installing NetworkX


If you are running this notebook online (in Google Colaboratory, for example), you can install NetworkX by running the following command:


In [121]:
# !pip install networkx

## 4. Centrality


Centrality is a measure of the importance of nodes in a network based on their position and connectivity. There are different types of centrality measures, each capturing a different aspect of node importance.

Some common centrality measures include degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality.


### A) Degree Centrality – Undirected Networks


Degree centrality is a measure of the importance of a node in a network based on the number of connections it has to other nodes. The degree centrality of a node $i$ can be calculated as:

$$C_D(i) = \frac{k_i}{n-1}$$

where $k_i$ is the degree of node $i$, i.e., the number of edges that are incident to the node, and $n$ is the total number of nodes in the network. The denominator $n-1$ is used to account for the fact that a node cannot be connected to itself.

The degree centrality of a node ranges from 0 to 1, with a higher value indicating a more central node in the network. Nodes with a high degree centrality are typically well-connected to other nodes, and their removal from the network can have a significant impact on its connectivity.
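To make the formula concrete, here is a minimal from-scratch sketch that applies $C_D(i) = k_i/(n-1)$ to a small graph. The adjacency dictionary `toy_graph` and the helper name `degree_centrality` are illustrative assumptions, not part of the notebook; the sketch assumes an undirected graph with no self-loops.

```python
# Hypothetical toy graph as an adjacency dict (undirected, no self-loops).
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}


def degree_centrality(adjacency):
    """Return C_D(i) = k_i / (n - 1) for every node i."""
    n = len(adjacency)
    if n <= 1:
        # A single node has no possible neighbors.
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adjacency.items()}


centrality = degree_centrality(toy_graph)
for node, value in centrality.items():
    print(node, round(value, 3))
```

Node `a` is connected to all three other nodes, so it reaches the maximum value of 1, while `d` has a single neighbor and gets 1/3.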


In [None]:
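import matplotlib.pyplot as plt
import networkx as nx

# Zachary's karate club graph, relabeled so node IDs run from 1 to 34
graph_karate = nx.karate_club_graph()
graph_karate = nx.convert_node_labels_to_integers(graph_karate, first_label=1)
nx.draw(graph_karate, with_labels=True)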

In [None]:
# degree centrality
degCent = nx.degree_centrality(graph_karate)
degCent

In [None]:
# sort based on degree centrality
sorted_degcent = {
    k: v for k, v in sorted(degCent.items(), key=lambda item: item[1], reverse=True)
}
sorted_degcent

In [None]:
# degree centrality of a node

degCent[34]

In [None]:
# draw a network with node sizes based on their degree centrality

# create a list of node sizes based on degree centrality
node_sizes = [10000 * v * v for v in degCent.values()]

# draw the graph
nx.draw(
    graph_karate,
    with_labels=True,
    node_size=node_sizes,
    pos=nx.spring_layout(graph_karate),
)

In [None]:
# colors based on degree centrality
node_colors = [v for v in degCent.values()]
# draw the graph
nx.draw(
    graph_karate,
    with_labels=True,
    node_size=node_sizes,
    pos=nx.spring_layout(graph_karate),
    node_color=node_colors,
    cmap=plt.cm.PuBu,
)  # another sequential colormap to try: plt.cm.Greens

# PuBu stands for "Pu" (purple) to "Bu" (blue), and it is a sequential colormap that ranges from light purple to dark blue.

### B) Degree Centrality – Directed Networks


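Undirected networks: use the degree of each node.

Directed networks: use either the in-degree (number of incoming edges) or the out-degree (number of outgoing edges), again divided by $n-1$.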
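As a rough illustration of the difference between the two directed variants, the sketch below counts incoming and outgoing edges by hand and divides each count by $n-1$. The edge list and the helper name `in_out_degree_centrality` are assumptions made for this example; it assumes at least two nodes and no duplicate edges.

```python
# Hypothetical directed edge list: (source, target).
edges = [("A", "B"), ("A", "D"), ("A", "C"), ("B", "D")]


def in_out_degree_centrality(edge_list):
    """Return (in-degree centrality, out-degree centrality) as two dicts."""
    nodes = sorted({node for edge in edge_list for node in edge})
    in_degree = dict.fromkeys(nodes, 0)
    out_degree = dict.fromkeys(nodes, 0)
    for source, target in edge_list:
        out_degree[source] += 1  # edge leaves `source`
        in_degree[target] += 1   # edge arrives at `target`
    n = len(nodes)
    in_cent = {node: count / (n - 1) for node, count in in_degree.items()}
    out_cent = {node: count / (n - 1) for node, count in out_degree.items()}
    return in_cent, out_cent


in_cent, out_cent = in_out_degree_centrality(edges)
print("in :", in_cent)
print("out:", out_cent)
```

Node `A` points to every other node, so its out-degree centrality is 1 while its in-degree centrality is 0; node `D` is the reverse case, with two incoming edges and none outgoing.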


In [None]:
# build a small directed graph
G = nx.DiGraph()

G.add_edge("A", "B")
G.add_edge("A", "D")
G.add_edge("A", "C")
G.add_edge("B", "D")

# draw the nodes with labels
nx.draw(G, with_labels=True)

In [None]:
# indegree
indegCent = nx.in_degree_centrality(G)
indegCent

In [None]:
# out-degree
outdegCent = nx.out_degree_centrality(G)
outdegCent

In [None]:
# A specific node
outdegCent["A"]

### C) Closeness Centrality


Closeness centrality measures how close a node is to all other nodes in the network: it is the reciprocal of the node's average shortest-path distance to the others. The closeness centrality of a node $i$ can be calculated as:

$$C_C(i) = \frac{n-1}{\sum\limits_{j \neq i} d_{ij}}$$

where $d_{ij}$ is the shortest path distance between nodes $i$ and $j$, and $n$ is the total number of nodes in the network. In a connected network, the closeness centrality of a node ranges from 0 to 1, with a higher value indicating a shorter average distance to all other nodes in the network.

The closeness centrality of a node measures how quickly it can spread information or influence throughout the network, as nodes with a shorter average distance to all other nodes can communicate more efficiently. In addition, nodes with a high closeness centrality are often located in the center of the network, and their removal can have a significant impact on the network's connectivity.
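The following sketch computes closeness from scratch with a breadth-first search, using the normalized form $(n-1)/\sum_{j \neq i} d_{ij}$. The graph `path_graph` and the helper names are assumptions for illustration; the sketch assumes a connected, unweighted, undirected graph.

```python
from collections import deque

# Hypothetical path graph 1 - 2 - 3 - 4 as an adjacency dict.
path_graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}


def bfs_distances(adjacency, source):
    """Shortest-path distance (in hops) from `source` to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def closeness_centrality(adjacency, node):
    """(n - 1) divided by the sum of distances from `node` to all other nodes."""
    total = sum(bfs_distances(adjacency, node).values())
    if total == 0:
        return 0.0  # isolated node: no other node is reachable
    return (len(adjacency) - 1) / total


closeness = {node: closeness_centrality(path_graph, node) for node in path_graph}
print(closeness)
```

The end nodes of the path are farther from everything else (distances 1, 2, 3) than the middle nodes (distances 1, 1, 2), so they score lower.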


In [None]:
# Closeness centrality
closeCent = nx.closeness_centrality(graph_karate)
closeCent

In [None]:
# Another way to compute Closeness Centrality of a node
NodeNumber = 34
(len(graph_karate.nodes()) - 1) / sum(
    nx.shortest_path_length(graph_karate, NodeNumber).values()
)

### D) Betweenness Centrality


Betweenness centrality is a measure of the extent to which a node lies on the shortest paths between other nodes in the network. The betweenness centrality of a node $i$ can be calculated as:

$$C_B(i) = \sum\limits_{s \neq i \neq t} \frac{\sigma_{st}(i)}{\sigma_{st}}$$

where $s$ and $t$ are two nodes in the network, $\sigma_{st}$ is the total number of shortest paths between $s$ and $t$, and $\sigma_{st}(i)$ is the number of shortest paths between $s$ and $t$ that pass through node $i$.

When normalized, the betweenness centrality of a node ranges from 0 to 1, with a higher value indicating a greater share of shortest paths that pass through the node. Nodes with a high betweenness centrality are often located on the "bridges" between different clusters or communities in the network, and their removal can have a significant impact on the network's connectivity.
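One direct way to evaluate the sum is to count shortest paths with a breadth-first search from every node: node $i$ lies on a shortest $s$-$t$ path exactly when $d_{si} + d_{it} = d_{st}$, and then $\sigma_{st}(i) = \sigma_{si}\,\sigma_{it}$. The sketch below uses this identity to compute the raw (unnormalized) value, counting each unordered pair once. The graph `two_triangles` and the helper names are assumptions for illustration; it assumes an unweighted, undirected graph and is meant for small graphs only.

```python
from collections import deque
from itertools import combinations

# Hypothetical graph: two triangles (a, b, c) and (d, e, f) joined by the edge c - d.
two_triangles = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}


def bfs_counts(adjacency, source):
    """Return (distances, shortest-path counts) from `source`."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                sigma[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                # every shortest path to `node` extends to `neighbor`
                sigma[neighbor] += sigma[node]
    return dist, sigma


def betweenness_centrality(adjacency):
    """Raw betweenness: sum over unordered pairs {s, t} of sigma_st(i) / sigma_st."""
    info = {s: bfs_counts(adjacency, s) for s in adjacency}
    result = dict.fromkeys(adjacency, 0.0)
    for s, t in combinations(adjacency, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue  # s and t are in different components
        for i in adjacency:
            if i in (s, t) or i not in dist_s:
                continue
            if dist_s[i] + dist_t[i] == dist_s[t]:
                result[i] += sigma_s[i] * sigma_t[i] / sigma_s[t]
    return result


raw_betweenness = betweenness_centrality(two_triangles)
print(raw_betweenness)
```

The two bridge nodes `c` and `d` carry every shortest path between the triangles, while the remaining nodes lie on no shortest path between other nodes.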


In [None]:
btwnCent = nx.betweenness_centrality(graph_karate, endpoints=False)
# endpoints=False: the endpoints s and t of a shortest path are not counted as lying on it
btwnCent

In [None]:
sorted_btwnCent = {
    k: v for k, v in sorted(btwnCent.items(), key=lambda item: item[1], reverse=True)
}
sorted_btwnCent

Betweenness centrality values tend to be larger in graphs with many nodes. To control for this, we divide the centrality values by the number of pairs of nodes in the graph (excluding $i$): $\frac{(n-1)(n-2)}{2}$ for undirected graphs and $(n-1)(n-2)$ for directed graphs.
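The normalization step can be sketched as a small helper that divides raw values by the number of node pairs excluding the node itself, which is $(n-1)(n-2)/2$ for an undirected graph and $(n-1)(n-2)$ for a directed one. The function name `normalize_betweenness` and the hand-derived raw values for a five-node star graph are assumptions for illustration.

```python
def normalize_betweenness(raw_values, directed=False):
    """Divide raw betweenness by the number of node pairs that exclude the node."""
    n = len(raw_values)
    if n <= 2:
        return dict(raw_values)  # no pair of other nodes exists
    pairs = (n - 1) * (n - 2)
    if not directed:
        pairs /= 2  # unordered pairs in an undirected graph
    return {node: value / pairs for node, value in raw_values.items()}


# Star graph with one hub and four leaves: all C(4, 2) = 6 leaf pairs
# have their single shortest path through the hub (values derived by hand).
raw_star = {"hub": 6.0, "leaf1": 0.0, "leaf2": 0.0, "leaf3": 0.0, "leaf4": 0.0}
normalized_star = normalize_betweenness(raw_star)
print(normalized_star)
```

The hub lies on every possible shortest path between other nodes, so it reaches the maximum normalized value of 1.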


In [None]:
# Betweenness values grow with the number of nodes, so normalize them
# before comparing networks of different sizes.
btwnCent = nx.betweenness_centrality(
    graph_karate, normalized=True, endpoints=False
)  # normalized=True is the default
btwnCent

Computing the betweenness centrality of all nodes can be very computationally expensive.

Approximation: rather than computing betweenness centrality from all pairs of nodes $s, t$, we can estimate it from a random sample of $k$ nodes.
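The idea of sampling can be sketched with a Brandes-style accumulation that runs a breadth-first search only from $k$ randomly chosen source nodes and then scales the result up to all $n$ sources. This is a from-scratch illustration under stated assumptions (unweighted, undirected graph, $k \le n$), not NetworkX's implementation, whose sampling and rescaling may differ in detail. The names `single_source_dependencies`, `approximate_betweenness`, and the graph `two_triangles` are hypothetical.

```python
import random
from collections import deque

# Hypothetical graph: two triangles joined by the edge c - d.
two_triangles = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}


def single_source_dependencies(adjacency, source):
    """Brandes accumulation: how much `source` depends on each other node."""
    dist, sigma, preds = {source: 0}, {source: 1}, {source: []}
    order = []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                sigma[neighbor] = 0
                preds[neighbor] = []
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                sigma[neighbor] += sigma[node]
                preds[neighbor].append(node)
    delta = dict.fromkeys(order, 0.0)
    for node in reversed(order):  # farthest nodes first
        for pred in preds[node]:
            delta[pred] += sigma[pred] / sigma[node] * (1 + delta[node])
    delta[source] = 0.0  # the source is an endpoint, not an intermediate node
    return delta


def approximate_betweenness(adjacency, k, seed=None):
    """Estimate raw betweenness from k sampled source nodes (requires k <= n)."""
    rng = random.Random(seed)
    nodes = list(adjacency)
    totals = dict.fromkeys(nodes, 0.0)
    for source in rng.sample(nodes, k):
        for node, value in single_source_dependencies(adjacency, source).items():
            totals[node] += value
    # Scale up to n sources; halve because each undirected pair is seen from both ends.
    scale = len(nodes) / k / 2
    return {node: value * scale for node, value in totals.items()}


exact = approximate_betweenness(two_triangles, k=len(two_triangles))
estimate = approximate_betweenness(two_triangles, k=3, seed=0)
print(exact)
print(estimate)
```

When $k$ equals the number of nodes, every source is used and the result is the exact raw betweenness; smaller $k$ trades accuracy for speed, and the estimate varies with the random sample.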


In [None]:
# betweenness centrality approximation
btwnCent_approx = nx.betweenness_centrality(
    graph_karate, normalized=True, endpoints=False, k=10
)  # number of samples = k
btwnCent_approx

In [None]:
sorted_btwnCent = {
    k: v
    for k, v in sorted(btwnCent_approx.items(), key=lambda item: item[1], reverse=True)
}
sorted_btwnCent

#### Betweenness Centrality – Subsets


In [None]:
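# Only shortest paths that start at a source node and end at a target node are counted
btwnCent_subset = nx.betweenness_centrality_subset(
    graph_karate,
    sources=[34, 21, 30, 16, 27, 15, 23, 10],
    targets=[1, 4, 13, 11, 6, 12, 17, 7],
    normalized=True,
)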
btwnCent_subset

### E) Betweenness Centrality – Edges


Betweenness centrality for edges is a measure of the extent to which an edge lies on the shortest paths between pairs of nodes in the network. The betweenness centrality of an edge $e$ can be calculated as:

$$C_B(e) = \sum\limits_{s \neq t} \frac{\sigma_{st}(e)}{\sigma_{st}}$$

where $s$ and $t$ are two nodes in the network, $\sigma_{st}$ is the total number of shortest paths between $s$ and $t$, and $\sigma_{st}(e)$ is the number of shortest paths between $s$ and $t$ that pass through edge $e$.

When normalized, the betweenness centrality of an edge ranges from 0 to 1, with a higher value indicating a greater share of shortest paths that pass through the edge. Edges with a high betweenness centrality are often located on the "bridges" between different clusters or communities in the network, and their removal can have a significant impact on the network's connectivity.
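The edge version can be sketched the same way as the node version: an edge $(u, v)$ lies on a shortest $s$-$t$ path when $d_{su} + 1 + d_{vt} = d_{st}$, in which case it carries $\sigma_{su}\,\sigma_{vt}$ of those paths. The sketch below returns raw counts over unordered node pairs. The graph `two_triangles` and the helper names are assumptions for illustration; it assumes an unweighted, undirected graph without self-loops.

```python
from collections import deque
from itertools import combinations

# Hypothetical graph: two triangles joined by the bridge c - d.
two_triangles = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}


def bfs_counts(adjacency, source):
    """Return (distances, shortest-path counts) from `source`."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                sigma[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                sigma[neighbor] += sigma[node]
    return dist, sigma


def edge_betweenness(adjacency):
    """Raw edge betweenness, keyed by frozenset({u, v})."""
    info = {s: bfs_counts(adjacency, s) for s in adjacency}
    edges = {frozenset((u, v)) for u in adjacency for v in adjacency[u]}
    result = dict.fromkeys(edges, 0.0)
    for s, t in combinations(adjacency, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue  # s and t are in different components
        for edge in edges:
            u, v = tuple(edge)
            # try both orientations: `near` on the s side, `far` on the t side
            for near, far in ((u, v), (v, u)):
                if near in dist_s and far in dist_t:
                    if dist_s[near] + 1 + dist_t[far] == dist_s[t]:
                        result[edge] += sigma_s[near] * sigma_t[far] / sigma_s[t]
    return result


raw_edge_betweenness = edge_betweenness(two_triangles)
bridge = frozenset(("c", "d"))
print(raw_edge_betweenness[bridge])
```

The bridge `c - d` carries the shortest path of all nine node pairs that span the two triangles, whereas the edge `a - b` is used only by the pair `a`, `b` itself.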


In [None]:
btwnCent_edge = nx.edge_betweenness_centrality(graph_karate, normalized=True)
btwnCent_edge

### F) Eigenvector Centrality


Eigenvector centrality is a measure of the importance of a node in a network based on the importance of its neighbors. The eigenvector centrality of a node $i$ is read off the principal eigenvector of the adjacency matrix $\mathbf{A}$ of the network:

$$\mathbf{Av} = \lambda \mathbf{v}$$

where $\mathbf{v}$ is the eigenvector corresponding to the largest eigenvalue $\lambda$. The eigenvector centrality of node $i$ is then given by the $i$-th element of $\mathbf{v}$.

When $\mathbf{v}$ is scaled to unit Euclidean length, as NetworkX does, the eigenvector centrality of a node ranges from 0 to 1, with a higher value indicating a greater importance of the node and its neighbors in the network. Nodes with a high eigenvector centrality are often located in the center of the network and are well-connected to other highly connected nodes, and their removal can have a significant impact on the network's connectivity.
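The principal eigenvector can be approximated by power iteration: start from a uniform vector, repeatedly multiply by the adjacency matrix, and rescale to unit Euclidean length. The sketch below multiplies by $\mathbf{A} + \mathbf{I}$ instead of $\mathbf{A}$, which keeps the same eigenvectors but lets the iteration converge on bipartite graphs as well. The graph `star`, the function name, and the parameters `max_iter` and `tol` are assumptions for illustration; the sketch assumes a connected, undirected graph.

```python
import math

# Hypothetical star graph: one hub connected to three leaves.
star = {
    "hub": ["x", "y", "z"],
    "x": ["hub"],
    "y": ["hub"],
    "z": ["hub"],
}


def eigenvector_centrality(adjacency, max_iter=1000, tol=1e-10):
    """Power iteration on (A + I), returning a unit-length principal eigenvector."""
    nodes = list(adjacency)
    v = {node: 1.0 / math.sqrt(len(nodes)) for node in nodes}
    for _ in range(max_iter):
        # (A + I) v: a node's own score plus the scores of its neighbors
        new_v = {node: v[node] + sum(v[nb] for nb in adjacency[node]) for node in nodes}
        norm = math.sqrt(sum(x * x for x in new_v.values()))
        new_v = {node: x / norm for node, x in new_v.items()}
        if sum(abs(new_v[node] - v[node]) for node in nodes) < tol:
            return new_v
        v = new_v
    raise RuntimeError("power iteration did not converge")


star_centrality = eigenvector_centrality(star)
print(star_centrality)
```

For this star graph the exact solution is known: the largest eigenvalue is $\sqrt{3}$, the hub's entry is $1/\sqrt{2}$, and each leaf's entry is $1/\sqrt{6}$, so the hub's score is $\sqrt{3}$ times that of a leaf.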


In [None]:
# Adjacency matrix of the network (shown for reference; it is not used below)
A = nx.adjacency_matrix(graph_karate)

# Eigenvector centrality: the entries of the principal eigenvector of A
eigenvector_centrality = nx.eigenvector_centrality_numpy(graph_karate)
eigenvector_centrality

In [None]:
sorted_EigenCent = {
    k: v
    for k, v in sorted(
        eigenvector_centrality.items(), key=lambda item: item[1], reverse=True
    )
}
sorted_EigenCent

#### Summary (comparison)


In [None]:
import pandas as pd

# Map each column name to its {node: centrality} dictionary
data = {
    "Degree": degCent,
    "Closeness": closeCent,
    "Betweenness": btwnCent,
    "Eigenvector": eigenvector_centrality,
}

# Create a pandas dataframe indexed by node label (pandas aligns the dictionaries by key)
df = pd.DataFrame(data)
df

In [None]:
# sorting
df_sorted = df.sort_values(by="Degree", ascending=False)
df_sorted

In [None]:
df_two_rows = df.iloc[:2]
df_two_rows