# NetworkX: Degree, betweenness and closeness

In [2]:
%sh /databricks/python3/bin/pip3 install networkx

In [3]:
import networkx          as nx
import pandas            as pd
import numpy             as np
import matplotlib.pyplot as plt
import itertools

## 1. Degree and degree centrality

__Degree__ is used to identify important nodes in a network and is defined as the number of edges attached to a node. In-degree is the number of edges pointing into the node, and out-degree is the number of edges going out from the node. In- and out-degree apply only to directed graphs.

In [6]:
G=nx.DiGraph()

Let's add edges and nodes to graph `G`.

In [8]:
edges=[("Jane","Joe"), ("Joe","Martin"), ("Martin","Harris"), ("Harris","Calvin"), 
       ("Jane","Rob"), ("Harris","Rob"),("Rob","Martin"),("Jane","Martin"),("Rose","Martin")]

G.add_edges_from(edges)
BLUE = "#05006d"
nx.draw(G, node_color=BLUE, 
           with_labels=True, 
           node_size=4200, 
           font_size=18, 
           font_weight='bold', 
           font_color='white', 
           arrowsize=40)
display(plt.show())
plt.clf()

Here we can observe that Martin has the most connections: 5 edges to neighbors. 4 of the 5 connections are directed towards Martin and 1 is directed out. To get the degree of each node, we use the `nx.degree()` function. In- and out-degrees are given by the `G.in_degree()` and `G.out_degree()` methods respectively, which are available only on directed graphs. When ties represent positive relations such as friendship or collaboration, in-degree is often interpreted as a form of popularity, and out-degree as gregariousness.

$$ \text{Degree centrality}=\frac{\text{Number of neighboring nodes}}{\text{Number of possible neighboring nodes}} $$

In [11]:
nx.degree(G)
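
To make the in/out distinction concrete, the sketch below rebuilds the same graph and reads the three degree views off the `DiGraph`. It is only an illustration; the names `in_deg`, `out_deg` and `total_deg` are not part of the original notebook.

```python
import networkx as nx

# Rebuild the example graph so the snippet runs on its own.
edges = [("Jane", "Joe"), ("Joe", "Martin"), ("Martin", "Harris"),
         ("Harris", "Calvin"), ("Jane", "Rob"), ("Harris", "Rob"),
         ("Rob", "Martin"), ("Jane", "Martin"), ("Rose", "Martin")]
G = nx.DiGraph()
G.add_edges_from(edges)

in_deg = dict(G.in_degree())    # edges pointing into each node
out_deg = dict(G.out_degree())  # edges leaving each node
total_deg = dict(G.degree())    # for a DiGraph: in-degree + out-degree

for node in G.nodes:
    print(f"{node:7s} in={in_deg[node]} out={out_deg[node]} total={total_deg[node]}")
```

For Martin this should show four incoming edges and one outgoing edge, which is the split described above.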

__Degree centrality__ is the normalized degree: each node's degree divided by the maximum possible degree in a simple graph, which is n - 1 for a graph with n nodes.
Degree centrality measures how many direct, ‘one hop’ connections each node has to other nodes within the network.
We use degree centrality to find highly connected or popular individuals, individuals who are likely to hold the most information, and individuals who can quickly connect with the wider network.

The `nx.degree_centrality()` function computes it for every node.

In [14]:
dc=nx.degree_centrality(G)

sorted(dc.items(), 
       key=lambda item: item[1], 
       reverse=True)

From degree centrality, we conclude that Martin is the most important node in this network and can be characterized as popular and well connected.
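
As a sanity check on the normalization, the following sketch recomputes degree centrality by hand as degree divided by n - 1 and compares it with `nx.degree_centrality`. The name `manual_dc` is illustrative and not part of the original notebook.

```python
import networkx as nx

edges = [("Jane", "Joe"), ("Joe", "Martin"), ("Martin", "Harris"),
         ("Harris", "Calvin"), ("Jane", "Rob"), ("Harris", "Rob"),
         ("Rob", "Martin"), ("Jane", "Martin"), ("Rose", "Martin")]
G = nx.DiGraph()
G.add_edges_from(edges)

n = G.number_of_nodes()

# Degree (in + out for a DiGraph) divided by the n - 1 other nodes.
manual_dc = {node: deg / (n - 1) for node, deg in G.degree()}
library_dc = nx.degree_centrality(G)

for node in sorted(manual_dc, key=manual_dc.get, reverse=True):
    print(f"{node:7s} manual={manual_dc[node]:.3f} networkx={library_dc[node]:.3f}")
```

Because a directed graph counts both incoming and outgoing edges, a node's degree can in principle exceed n - 1, so values above 1 are possible in denser directed graphs.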

## 2. Betweenness centrality

__Betweenness centrality__

Betweenness measures the number of times a node lies on the shortest path between other nodes. It is used to find the individuals who influence the flow around a system. Betweenness centrality is the normalized betweenness value, calculated by dividing the number of shortest paths that pass through a node by the number of all possible shortest paths.

__Shortest path__

A path between two vertices (or nodes) in a graph such that the sum of the weights of its constituent edges is minimized.

$$ \text{Betweenness centrality}=\frac{\text{Number of shortest paths through node}}{\text{Number of all possible shortest paths}} $$

`nx.betweenness_centrality()` returns normalized betweenness centrality by default. To get the raw betweenness, set the `normalized` parameter to `False`.

In [21]:
betweenness=nx.betweenness_centrality(G, normalized=False)

sorted(betweenness.items(), key=lambda item: item[1], reverse=True)

In [22]:
bc=nx.betweenness_centrality(G)

sorted(bc.items(), key=lambda item: item[1], reverse=True)

Here Martin and Harris have the highest betweenness centrality, meaning that these two nodes influence the information flow of the network.
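
The counting behind these scores can be sketched directly: for every ordered pair of nodes, enumerate the shortest paths and credit each interior node with its share. The helper `raw_betweenness` is a hypothetical name for this illustration, and the brute-force loop is far slower than the algorithm NetworkX uses. The normalization by (n - 1)(n - 2) is the convention for directed graphs and is assumed here to match `nx.betweenness_centrality`.

```python
import itertools
import networkx as nx

edges = [("Jane", "Joe"), ("Joe", "Martin"), ("Martin", "Harris"),
         ("Harris", "Calvin"), ("Jane", "Rob"), ("Harris", "Rob"),
         ("Rob", "Martin"), ("Jane", "Martin"), ("Rose", "Martin")]
G = nx.DiGraph()
G.add_edges_from(edges)

def raw_betweenness(graph):
    """Brute-force betweenness: share of shortest s->t paths through each node."""
    scores = dict.fromkeys(graph.nodes, 0.0)
    for s, t in itertools.permutations(graph.nodes, 2):
        if not nx.has_path(graph, s, t):
            continue  # unreachable pairs contribute nothing
        paths = list(nx.all_shortest_paths(graph, s, t))
        for path in paths:
            for v in path[1:-1]:  # interior nodes only, endpoints excluded
                scores[v] += 1 / len(paths)
    return scores

raw = raw_betweenness(G)
n = G.number_of_nodes()
# Directed graphs have (n - 1)(n - 2) ordered pairs that exclude the node itself.
normalized = {v: score / ((n - 1) * (n - 2)) for v, score in raw.items()}

for v in sorted(raw, key=raw.get, reverse=True):
    print(f"{v:7s} raw={raw[v]:.1f} normalized={normalized[v]:.3f}")
```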

## 3. Closeness centrality

Closeness centrality scores each node based on its ‘closeness’ to all other nodes within the network. It calculates the shortest paths between all nodes, then assigns each node a score based on the sum of its shortest path lengths: the smaller the sum, the higher the score. It is used to find the individuals who are best placed to influence the entire network most quickly. Closeness centrality can help find good ‘broadcasters’, but in a highly connected network you will often find that all nodes have a similar score. In that case it may be more useful to use closeness to find influencers within a single cluster.
Thus, the more central a node is, the closer it is to all other nodes.

In [26]:
cc=nx.closeness_centrality(G)

sorted(cc.items(), key=lambda item: item[1], reverse=True)
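
The sketch below recomputes closeness by hand. It assumes the behavior of recent NetworkX releases: for a directed graph the distances are measured into the node (from every node that can reach it), and the result is scaled by the fraction of nodes that can reach it (the Wasserman and Faust adjustment, `wf_improved=True`). The helper `inward_closeness` is a hypothetical name; older versions may give different values.

```python
import networkx as nx

edges = [("Jane", "Joe"), ("Joe", "Martin"), ("Martin", "Harris"),
         ("Harris", "Calvin"), ("Jane", "Rob"), ("Harris", "Rob"),
         ("Rob", "Martin"), ("Jane", "Martin"), ("Rose", "Martin")]
G = nx.DiGraph()
G.add_edges_from(edges)

def inward_closeness(graph, node):
    """Closeness from inward distances, with Wasserman-Faust scaling."""
    # Searching the reversed graph from `node` yields the distance
    # from every node that can reach `node` in the original graph.
    dist = nx.single_source_shortest_path_length(graph.reverse(), node)
    reachable = len(dist) - 1          # exclude the node itself
    total = sum(dist.values())
    if total == 0:
        return 0.0                     # nobody can reach this node
    return (reachable / total) * (reachable / (len(graph) - 1))

manual_cc = {node: inward_closeness(G, node) for node in G.nodes}

for node in sorted(manual_cc, key=manual_cc.get, reverse=True):
    print(f"{node:7s} {manual_cc[node]:.3f}")
```

Under these assumptions, nodes with no incoming edges, such as Jane and Rose, score 0 even though they can reach much of the network.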

For other centrality measures, refer to https://networkx.github.io/documentation/networkx-1.9/reference/algorithms.centrality.html

In [28]:
# Combine the three centrality dicts into one table, one row per node, sorted by name.
pd.DataFrame({"degree centrality": dc,
              "betweenness centrality": bc,
              "closeness centrality": cc}).sort_index()

## 4. Degree, betweenness and closeness centrality of a weighted graph

In [30]:
weighted_edges=[("Harris", "Calvin", 5), ("Joe", "Martin", 4), ("Jane", "Rob", 2),("Harris", "Rob", 3), 
                ("Rob", "Martin", 1), ("Jane", "Martin", 8), ("Martin", "Harris", 4), ("Jane", "Joe", 2),("Rose","Martin", 6)]
G.add_weighted_edges_from(weighted_edges)
G.edges.data("weight")

In [31]:
BLUE = "#05006d"
# Read the weights in G.edges() order, which is the order nx.draw uses for `width`.
weights = [G[u][v]['weight'] for u, v in G.edges()]
nx.draw(G, node_color=BLUE, 
           with_labels=True, 
           node_size=4200, 
           font_size=18, 
           font_weight='bold', 
           font_color='white', 
           arrowsize=40,
           width=weights)
display(plt.show())
plt.clf()

In [32]:
[G.degree(),
G.degree(weight="weight")]
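
With `weight="weight"`, the degree of a node becomes the sum of the weights on its edges rather than a count. The sketch below checks this for Martin by summing incoming and outgoing weights separately; `in_strength` and `out_strength` are illustrative names, not NetworkX terms.

```python
import networkx as nx

weighted_edges = [("Harris", "Calvin", 5), ("Joe", "Martin", 4), ("Jane", "Rob", 2),
                  ("Harris", "Rob", 3), ("Rob", "Martin", 1), ("Jane", "Martin", 8),
                  ("Martin", "Harris", 4), ("Jane", "Joe", 2), ("Rose", "Martin", 6)]
G = nx.DiGraph()
G.add_weighted_edges_from(weighted_edges)

node = "Martin"
# Sum the weights of edges pointing into and out of the node.
in_strength = sum(w for _, _, w in G.in_edges(node, data="weight"))
out_strength = sum(w for _, _, w in G.out_edges(node, data="weight"))

weighted_degree = G.degree(node, weight="weight")
unweighted_degree = G.degree(node)

print(f"{node}: in={in_strength} out={out_strength} "
      f"weighted degree={weighted_degree} unweighted degree={unweighted_degree}")
```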

In [33]:
pd.DataFrame([nx.betweenness_centrality(G),
nx.betweenness_centrality(G, weight="weight")], index=["Unweighted","Weighted"]).transpose()
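
Weighted and unweighted betweenness differ because the weights change which paths count as shortest. The sketch below compares the two for the pair Jane and Martin: without weights the direct edge is shortest, while with weights treated as distances a detour is cheaper.

```python
import networkx as nx

weighted_edges = [("Harris", "Calvin", 5), ("Joe", "Martin", 4), ("Jane", "Rob", 2),
                  ("Harris", "Rob", 3), ("Rob", "Martin", 1), ("Jane", "Martin", 8),
                  ("Martin", "Harris", 4), ("Jane", "Joe", 2), ("Rose", "Martin", 6)]
G = nx.DiGraph()
G.add_weighted_edges_from(weighted_edges)

# Every edge counts as one hop.
hop_path = nx.shortest_path(G, "Jane", "Martin")

# Edge weights are treated as distances to be minimized.
weighted_path = nx.shortest_path(G, "Jane", "Martin", weight="weight")
weighted_length = nx.shortest_path_length(G, "Jane", "Martin", weight="weight")

print("unweighted:", hop_path)
print("weighted:  ", weighted_path, "total weight", weighted_length)
```

Here the direct edge from Jane to Martin has weight 8, whereas going through Rob costs 2 + 1 = 3, so Rob picks up betweenness in the weighted graph. Note that this treats a larger weight as a longer distance; if the weights stand for tie strength instead, they would need to be inverted first.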

In [34]:
pd.DataFrame([nx.closeness_centrality(G),
nx.closeness_centrality(G, distance="weight")], index=["Unweighted","Weighted"]).transpose()
