In [0]:
#importing networkx and pandas libraries
import networkx as nx
import pandas as pd

### Dataset:

The network was generated using email data from a large European research institution. We have anonymized information about all incoming and outgoing email between members of the research institution. There is an edge (u, v) in the network if person u sent person v at least one email. The e-mails only represent communication between institution members (the core), and the dataset does not contain incoming messages from or outgoing messages to the rest of the world.

The dataset also contains "ground-truth" community memberships of the nodes. Each individual belongs to exactly one of 42 departments at the research institute.

This network represents the "core" of the email-EuAll network, which also contains links between members of the institution and people outside of the institution (although the node IDs are not the same).

[Dataset Link](http://snap.stanford.edu/data/email-Eu-core.html)
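
To make the edge definition concrete, here is a minimal sketch on a toy email log. The sender and recipient IDs are hypothetical and not taken from the dataset. It illustrates that an edge (u, v) is directional and that repeated emails between the same pair collapse into one edge in a `DiGraph`.

```python
import networkx as nx

# Toy email log of (sender, recipient) pairs; IDs are made up for illustration.
emails = [(0, 1), (0, 1), (1, 2), (2, 0)]

toy = nx.DiGraph()
toy.add_edges_from(emails)  # the duplicate (0, 1) pair is stored as a single edge

has_0_to_1 = toy.has_edge(0, 1)  # person 0 emailed person 1
has_1_to_0 = toy.has_edge(1, 0)  # person 1 never emailed person 0
n_edges = toy.number_of_edges()

print(has_0_to_1, has_1_to_0, n_edges)
```

Because the graph is directed, the presence of (0, 1) says nothing about (1, 0).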

In [0]:
#reading dataset file into pandas dataframe
df=pd.read_csv('email-Eu-core.txt',sep=' ',header=None,names=['Node1','Node2'])

In [0]:
df.head()

   Node1  Node2
0      0      1
1      2      3
2      2      4
3      5      6
4      5      7


In [0]:
#Creating Directed Graph Instance
DG = nx.DiGraph()

In [0]:
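#Extracting node information from pandas dataframe
#(columns are named 'Node1' and 'Node2' in read_csv above, so df[0]/df[1] would raise a KeyError)
node1=list(df['Node1'])
node2=list(df['Node2'])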

#### Number of nodes?

In [0]:
#Adding nodes to Directed graph
DG.add_nodes_from(node1)
DG.add_nodes_from(node2)

In [0]:
print("Number of Nodes in the Dataset:"+str(len(list(DG.nodes))))

Number of Nodes in the Dataset:1005


#### Number of edges?

In [0]:
#adding edges to graph using (source, target) tuples created from the dataframe columns
edges=list(zip(df['Node1'],df['Node2']))

In [0]:
DG.add_edges_from(edges)

In [0]:
print("Number of Edges in the Dataset:"+str(len(list(DG.edges))))

Number of Edges in the Dataset:25571


#### Number of connected components?
The `connected_components` function in networkx is defined only for undirected graphs, so we first convert the directed graph to an undirected one and then find its connected components.
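
As a hedged illustration on a toy graph (not the email data), the sketch below shows that calling `connected_components` on a directed graph raises an error, and that the component count of the undirected copy matches the number of weakly connected components of the directed graph.

```python
import networkx as nx

# Toy directed graph: components {0, 1, 2}, {3, 4} and the isolated node 5.
toy = nx.DiGraph([(0, 1), (2, 1), (3, 4)])
toy.add_node(5)

# connected_components is not implemented for directed graphs.
try:
    list(nx.connected_components(toy))
    directed_call_failed = False
except nx.NetworkXNotImplemented:
    directed_call_failed = True

# Ignoring edge direction gives the same partition either way.
undirected_count = nx.number_connected_components(toy.to_undirected())
weak_count = nx.number_weakly_connected_components(toy)

print(directed_call_failed, undirected_count, weak_count)
```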

In [0]:
g = DG.to_undirected()

In [0]:
p=nx.connected_components(g)

In [0]:
print("Number of Connected Components:" + str(len(list(p))))

Number of Connected Components:20


In [0]:
#finding largest component in the graph
largest_cc = max(nx.connected_components(g), key=len)

In [0]:
print("Number of nodes in largest component:"+str(len(largest_cc)))

Number of nodes in largest component:986


In [0]:
#finding number of nodes in each component
print("Number of nodes in each connected component:"+str([len(c) for c in sorted(nx.connected_components(g), key=len, reverse=True)]))

Number of nodes in each connected component:[986, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]


#### The diameter (longest shortest path)?

In [0]:
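#Creating a subgraph from the connected component with the highest number of nodes
#(nx.connected_component_subgraphs was removed in networkx 2.4, so we build the subgraph from the node set)
Gc = g.subgraph(max(nx.connected_components(g), key=len)).copy()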

In [0]:
print("Diameter of the subgraph with maximum connected nodes: "+str(nx.diameter(Gc)))

Diameter of the subgraph with maximum connected nodes: 7
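
The diameter is computed on the largest component because shortest-path lengths are undefined between disconnected nodes. A minimal sketch on a toy graph (hypothetical, not the email data) shows `nx.diameter` failing on a disconnected graph and succeeding on its largest component.

```python
import networkx as nx

# Path 0-1-2-3 plus one isolated node, so the graph is disconnected.
toy = nx.path_graph(4)
toy.add_node(99)

# On a disconnected graph some distances are infinite, and networkx raises.
try:
    nx.diameter(toy)
    raised = False
except nx.NetworkXError:
    raised = True

# Restrict to the largest connected component before measuring the diameter.
largest = toy.subgraph(max(nx.connected_components(toy), key=len))
d = nx.diameter(largest)

print(raised, d)
```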


#### The five nodes with the highest betweenness centrality?

In [0]:
#finding betweenness centrality 
bc=nx.betweenness_centrality(DG)

In [0]:
#sorting betweenness centrality into descending order
bcmax1=(sorted(bc.items(), key=lambda x: x[1], reverse=True))

In [0]:
#picking up top 5 nodes
y=[node for node, _ in bcmax1[:5]]

In [0]:
print("Following are the top 5 nodes with highest betweenness centrality:\n" + str(y))

Following are the top 5 nodes with highest betweenness centrality:
[160, 86, 5, 121, 62]


#### The five nodes with highest PageRank?

In [0]:
#calculating PageRank of each node on the undirected graph g
#(alpha=1 disables damping, i.e. no random teleportation)
pg=nx.pagerank(g, alpha=1)

In [0]:
#sorting pagerank into descending order
pgmax=(sorted(pg.items(), key=lambda x: x[1], reverse=True))

In [0]:
#picking top 5 nodes with the highest pagerank.
p=[node for node, _ in pgmax[:5]]

In [0]:
print("Nodes with highest page rank are:\n" + str(p))

Nodes with highest page rank are:
[160, 121, 82, 107, 86]


#### The five nodes with the highest authority score according to HITS?

In [0]:
#Calculating Authority and Hub Score of all the nodes in the graph
h,a=nx.hits(DG)

In [0]:
#Sort in descending order authority score obtained from HITS 
amax=(sorted(a.items(), key=lambda x: x[1], reverse=True))

In [0]:
#pick top 5 nodes with highest authority score
au=[node for node, _ in amax[:5]]

In [0]:
print("Nodes with highest Authority Score are:\n" + str(au))

Nodes with highest Authority Score are:
[160, 107, 62, 434, 121]


#### The number of cliques in the graph?

In [0]:
#enumerate_all_cliques yields every clique (all sizes, not only maximal ones), so we count them without building a list
print("Number of Cliques in a graph:\n"+str(sum(1 for _ in nx.enumerate_all_cliques(g))))

Number of Cliques in a graph:
37490583
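
The count above includes every clique of every size, which is why it is so large. As a small sketch on a toy graph (hypothetical, not the email data), the difference between all cliques and maximal cliques looks like this:

```python
import networkx as nx

# Triangle 0-1-2 with a pendant edge 2-3.
toy = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])

# Every clique: 4 single nodes + 4 edges + 1 triangle.
all_cliques = list(nx.enumerate_all_cliques(toy))

# Maximal cliques only: the triangle {0, 1, 2} and the edge {2, 3}.
maximal_cliques = list(nx.find_cliques(toy))

print(len(all_cliques), len(maximal_cliques))
```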


#### The number of maximal cliques the node with the highest PageRank belongs in?

In [0]:
#Using the number_of_cliques function in networkx, we count the maximal cliques that contain the node with the highest PageRank
print("Number of Maximal Cliques in a graph containing node with the highest page rank:\n"+ str(nx.number_of_cliques(g,nodes=p[0])))

9356
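
The same kind of count can also be obtained by iterating over `nx.find_cliques` and keeping the maximal cliques that contain a given node. The sketch below uses a toy graph and a hypothetical node ID rather than the email network.

```python
import networkx as nx

# Triangle 0-1-2 with a pendant edge 2-3; maximal cliques are {0, 1, 2} and {2, 3}.
toy = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])

def maximal_cliques_containing(graph, node):
    """Count the maximal cliques of `graph` that include `node` (illustrative helper)."""
    return sum(1 for clique in nx.find_cliques(graph) if node in clique)

count_node_2 = maximal_cliques_containing(toy, 2)  # in both maximal cliques
count_node_0 = maximal_cliques_containing(toy, 0)  # only in the triangle

print(count_node_2, count_node_0)
```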