<img src="https://i.imgur.com/6U6q5jQ.png"/>

# Mining Networks

We have networks available. Let's get some information out of them.

* This is the data about Peruvian elites:

In [None]:
import wget
import networkx as nx
import os

GitLocation="https://github.com/SocialAnalytics-StrategicIntelligence/codes/raw/main/graphdata/"
URL = GitLocation+"EliteNet.graphml"

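# create the local 'graphdata' folder if it does not exist yet
os.makedirs('graphdata', exist_ok=True)
theFile1=os.path.join('graphdata','EliteNet.graphml')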


if os.path.exists(theFile1):
    os.remove(theFile1) # if it exists, remove it first
wget.download(URL, theFile1) # download it to the specific path.
eliNet=nx.read_graphml(theFile1)

* This is the data about international trade:

In [None]:
URL = GitLocation+"trade_graph.graphml"

theFile2=os.path.join('graphdata','trade_graph.graphml')


if os.path.exists(theFile2):
    os.remove(theFile2) 
wget.download(URL, theFile2)
tradeNet=nx.read_graphml(theFile2)

* I have also prepared a network from [this article](https://www.builtinseattle.com/2018/08/06/50-seattle-tech-twitter-accounts-to-follow), which lists people considered key technology players in Seattle together with their Twitter accounts. I built a network from those accounts, where a link means _someone follows someone on Twitter_.

In [None]:
URL = GitLocation+ "SeattleTechTop.graphml"

theFile3=os.path.join('graphdata','SeattleTechTop.graphml')


if os.path.exists(theFile3):
    os.remove(theFile3)
wget.download(URL, theFile3) 
topsNet=nx.read_graphml(theFile3)

## Connectedness

In [None]:
nx.is_connected(eliNet)

The elite network is not connected, so its nodes are split into separate components. How many are there?

In [None]:
numComponents=nx.number_connected_components(eliNet)
numComponents

Which nodes belong to each component?

In [None]:
for c in nx.connected_components(eliNet):
    print(c, '\n')
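A connected component is simply everything reachable from a starting node. The minimal sketch below finds components with a breadth-first search; `components_bfs` and the toy graph are hypothetical, made up for illustration, and the result is checked against `nx.connected_components`:

```python
import networkx as nx
from collections import deque

def components_bfs(G):
    """Return the connected components of an undirected graph as a list of sets."""
    seen = set()
    components = []
    for start in G:
        if start in seen:
            continue
        # breadth-first search: collect everything reachable from 'start'
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in G[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        components.append(component)
    return components

# hypothetical toy graph: a triangle, a single link and an isolated node
toy = nx.Graph([(1, 2), (2, 3), (3, 1), (4, 5)])
toy.add_node(6)

mine = components_bfs(toy)
theirs = list(nx.connected_components(toy))
print(len(mine), sorted(map(sorted, mine)) == sorted(map(sorted, theirs)))   # 3 True
```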

A visual representation follows:

In [None]:
import matplotlib.pyplot as plt

colorsForComponents = plt.get_cmap('Set2',numComponents).colors

nodesPositions=nx.spring_layout(eliNet,k=0.5)

ConnectedComponents =(eliNet.subgraph(c).copy() for c in nx.connected_components(eliNet))
for eachComponent,eachColor in zip(ConnectedComponents,colorsForComponents):
    # wrap the color in a list: a bare RGB(A) sequence could be read as values to colormap
    nx.draw(eachComponent,nodesPositions,node_color=[eachColor])

In [None]:
colorsForComponents

Since we do not have ONE connected network but several components, we can focus on the Giant Component (the largest one):

In [None]:
sorted_Components = sorted(nx.connected_components(eliNet), key=len, reverse=True)
eliNet_giant = eliNet.subgraph(sorted_Components[0])
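An equivalent, slightly shorter idiom picks the largest component with `max(..., key=len)` instead of sorting all of them, and `.copy()` turns the read-only subgraph view into an independent graph. A minimal sketch on a hypothetical toy graph; it also computes the share of nodes that fall in the giant component:

```python
import networkx as nx

# hypothetical toy graph with components of sizes 4, 2 and 1
toy = nx.Graph([(1, 2), (2, 3), (3, 4), (5, 6)])
toy.add_node(7)

# pick the largest component directly, without sorting all of them
giant_nodes = max(nx.connected_components(toy), key=len)

# .copy() turns the read-only subgraph view into an independent graph
toy_giant = toy.subgraph(giant_nodes).copy()

# share of all nodes that belong to the giant component
share = len(toy_giant) / len(toy)
print(sorted(toy_giant), round(share, 3))   # [1, 2, 3, 4] 0.571
```

Applied to the elites, `max(nx.connected_components(eliNet), key=len)` should select the same nodes as `sorted_Components[0]`.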

Let's take a look at the Giant Component:

In [None]:
nx.draw(eliNet_giant,with_labels=True)

Basic summary:

In [None]:
# number of edges:
eliNet_giant.size()

In [None]:
# number of  nodes:
len(eliNet_giant)

The Trade graph is connected:

In [None]:
nx.is_connected(tradeNet)

The Top Tech in Seattle graph is connected too:

In [None]:
nx.is_connected(topsNet)

____

<a id='part2'></a>

## Network Exploration

<a id='part21'></a>

### Exploring the Network as a whole

* Density: the share of all possible links that are actually present. It ranges from 0 to 1, where 1 means a 'complete' network: there is a link between every pair of nodes.

  <center><img src="https://cdn.fs.guides.co/PDn0ImTfSb6QwgIvdoQ8" width="500"></center>


What can we learn from this?

In [None]:
nx.density(eliNet_giant)

In [None]:
nx.density(tradeNet)

In [None]:
nx.density(topsNet)
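Density is the number of observed links divided by the number of possible links: for $n$ nodes and $m$ links, $2m/(n(n-1))$ in an undirected network and $m/(n(n-1))$ in a directed one. A minimal sketch with a hypothetical helper `density_by_hand` (assuming no self-loops), checked against `nx.density` on toy graphs:

```python
import networkx as nx

def density_by_hand(G):
    """Observed links divided by possible links (assumes no self-loops)."""
    n, m = G.number_of_nodes(), G.number_of_edges()
    if n < 2:
        return 0.0
    possible = n * (n - 1)      # possible links in a directed graph (ordered pairs)
    if not G.is_directed():
        possible /= 2           # an undirected link covers one unordered pair
    return m / possible

path = nx.path_graph(4)          # 0-1-2-3: 3 of the 6 possible links
complete = nx.complete_graph(4)  # all 6 possible links
print(density_by_hand(path), nx.density(path))          # 0.5 0.5
print(density_by_hand(complete), nx.density(complete))  # 1.0 1.0
```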

* Diameter: the worst-case number of steps one node needs to reach another, i.e., the longest of all shortest paths (defined only for a connected network or component). It is also known as the _maximum eccentricity_.

<center><img src="https://github.com/EvansDataScience/CTforGA_Networks/raw/main/diameter.jpeg" width="500"></center>

In [None]:
nx.diameter(eliNet_giant)

In [None]:
nx.diameter(tradeNet)

In [None]:
nx.diameter(topsNet)
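The eccentricity of a node is its distance to the node farthest from it, and the diameter is the largest eccentricity. A quick check on a made-up toy path of five nodes:

```python
import networkx as nx

toy = nx.path_graph(5)  # hypothetical toy network: 0-1-2-3-4

# eccentricity: for each node, the distance to the node farthest from it
ecc = nx.eccentricity(toy)
print(ecc)                                   # {0: 4, 1: 3, 2: 2, 3: 3, 4: 4}
print(max(ecc.values()), nx.diameter(toy))   # 4 4
```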

* Average clustering coefficient: the average of the nodes' local clustering coefficients, where a node's local clustering coefficient is the share of pairs of its neighbors that are linked to each other. If all the neighbors of every node are connected to one another, you get 1; if none of them are connected, you get 0.

<center><img src="https://raw.githubusercontent.com/SocialAnalytics-StrategicIntelligence/codes/main/images/LocalClustCoeff.png" width="500"></center>



In [None]:
# count_zeros=False -> to make results compatible with R!
nx.average_clustering(eliNet_giant,count_zeros=False)

In [None]:
nx.average_clustering(tradeNet,count_zeros=False)

In [None]:
nx.average_clustering(topsNet,count_zeros=False)
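A node's local clustering coefficient is the number of links among its $k$ neighbors divided by the $k(k-1)/2$ possible ones. A minimal sketch with a hypothetical helper `local_clustering` on a made-up toy graph; it also shows what `count_zeros=False` does: nodes whose coefficient is zero are left out of the average.

```python
import networkx as nx
from itertools import combinations

def local_clustering(G, v):
    """Share of pairs of v's neighbors that are themselves linked."""
    neighbors = list(G[v])
    k = len(neighbors)
    if k < 2:
        return 0.0   # networkx also reports 0 for nodes with fewer than two neighbors
    links = sum(1 for a, b in combinations(neighbors, 2) if G.has_edge(a, b))
    return links / (k * (k - 1) / 2)

# hypothetical toy graph: triangle 1-2-3 plus node 4 attached to node 1
toy = nx.Graph([(1, 2), (2, 3), (3, 1), (1, 4)])
mine = {v: local_clustering(toy, v) for v in toy}

# average over all nodes (7/12, about 0.583)
print(sum(mine.values()) / len(mine), nx.average_clustering(toy))

# average over nodes with nonzero clustering only (7/9, about 0.778)
nonzero = [c for c in mine.values() if c > 0]
print(sum(nonzero) / len(nonzero), nx.average_clustering(toy, count_zeros=False))
```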

* Average shortest path length: the average length of the shortest paths between all pairs of nodes in the network. A shortest path is the route with the fewest links from one node to another.

In [None]:
# the average number of steps it takes to get from one node to another.

nx.average_shortest_path_length(eliNet_giant)

In [None]:
nx.average_shortest_path_length(tradeNet)

In [None]:
nx.average_shortest_path_length(topsNet)
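The average shortest path length averages the distance over all ordered pairs of distinct nodes, $\frac{1}{n(n-1)}\sum_{s \neq t} d(s,t)$. A minimal sketch with a hypothetical helper `average_path_length`, valid only for connected graphs and checked against networkx on a toy path:

```python
import networkx as nx

def average_path_length(G):
    """Mean shortest-path length over all ordered pairs of distinct nodes (connected graph)."""
    n = len(G)
    total = 0
    for source, lengths in nx.all_pairs_shortest_path_length(G):
        total += sum(lengths.values())   # the distance 0 from source to itself adds nothing
    return total / (n * (n - 1))

toy = nx.path_graph(4)   # hypothetical toy network: 0-1-2-3
print(average_path_length(toy), nx.average_shortest_path_length(toy))   # both 5/3
```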

* **Random networks** have a *short average shortest path* and a *low clustering coefficient*
* **Small-world networks** have a *short average shortest path* and a *high clustering coefficient*
* **Regular networks** have a *long average shortest path* and a *high clustering coefficient*

<center><img src="https://github.com/EvansDataScience/CTforGA_Networks/raw/main/networkTypes.jpeg" width="500"></center>
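These contrasts can be reproduced with the Watts-Strogatz model, which starts from a regular ring lattice and rewires each link with probability `p`. A minimal sketch; the sizes (200 nodes, 6 neighbors each), the values of `p` and the seed are arbitrary choices for illustration, and with `p=1` the result is only approximately a random network:

```python
import networkx as nx

n, k = 200, 6   # 200 nodes, each initially linked to its 6 nearest ring neighbors

# p is the rewiring probability: 0 keeps the ring lattice, 1 rewires every link
graphs = {
    "regular": nx.connected_watts_strogatz_graph(n, k, 0.0, seed=1),
    "small world": nx.connected_watts_strogatz_graph(n, k, 0.1, seed=1),
    "random": nx.connected_watts_strogatz_graph(n, k, 1.0, seed=1),
}

results = {}
for name, G in graphs.items():
    results[name] = (nx.average_shortest_path_length(G), nx.average_clustering(G))
    print(f"{name:12s} path={results[name][0]:.2f} clustering={results[name][1]:.3f}")
```

Rewiring only a few links (`p=0.1`) is usually enough to shorten paths dramatically while keeping most of the lattice's clustering, which is the small-world signature.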


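* Transitivity: the proportion of connected triples (a node linked to two others) that are closed into triangles. It measures the tendency towards clustering over the whole network rather than node by node.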

In [None]:
# Probability that two businessmen with a common business friend are also friends.
nx.transitivity(eliNet_giant)

In [None]:
# Probability that two countries with a common trade partner are also partners.

nx.transitivity(tradeNet)

In [None]:
# Probability that two tech leaders with a common contact are also connected to each other.
nx.transitivity(topsNet)
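Transitivity can be written as 3 x (number of triangles) / (number of connected triples). A minimal sketch with a hypothetical helper `transitivity_by_hand`, checked against `nx.transitivity` on a made-up toy graph:

```python
import networkx as nx

def transitivity_by_hand(G):
    """3 x triangles / connected triples (paths of length two)."""
    # nx.triangles counts, for every node, the triangles it belongs to,
    # so summing over nodes counts each triangle three times: 3 x triangles
    three_times_triangles = sum(nx.triangles(G).values())
    # a node with degree d is the center of d*(d-1)/2 connected triples
    triples = sum(d * (d - 1) / 2 for _, d in G.degree())
    return three_times_triangles / triples if triples else 0.0

# hypothetical toy graph: triangle 1-2-3 plus node 4 attached to node 1
toy = nx.Graph([(1, 2), (2, 3), (3, 1), (1, 4)])
print(transitivity_by_hand(toy), nx.transitivity(toy))   # 0.6 0.6
```

On this toy graph the average clustering coefficient is 7/12 (about 0.583) while transitivity is 0.6: transitivity pools all triples, so nodes with many neighbors weigh more, whereas the average clustering coefficient gives every node the same weight.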

* Assortativity (degree): measures whether nodes tend to connect to other nodes of similar degree. Values closer to 1 mean assortativity, values closer to -1 mean disassortativity, and 0 means no assortativity.

In [None]:
nx.degree_assortativity_coefficient(eliNet_giant)

In [None]:
nx.degree_assortativity_coefficient(tradeNet)

In [None]:
nx.degree_assortativity_coefficient(topsNet)
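Degree assortativity is the Pearson correlation between the degrees of the two nodes at either end of each link, with each undirected link counted in both directions. A minimal sketch with a hypothetical helper, compared with `nx.degree_assortativity_coefficient` on networkx's built-in karate club graph (a stand-in, not one of our networks):

```python
import networkx as nx
import numpy as np

def degree_assortativity_by_hand(G):
    """Pearson correlation between the degrees at the two ends of each undirected link."""
    deg = dict(G.degree())
    ends = [(deg[u], deg[v]) for u, v in G.edges()]
    # count every link in both directions so the measure is symmetric
    x = [a for a, b in ends] + [b for a, b in ends]
    y = [b for a, b in ends] + [a for a, b in ends]
    return np.corrcoef(x, y)[0, 1]

G = nx.karate_club_graph()
print(degree_assortativity_by_hand(G), nx.degree_assortativity_coefficient(G))
```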

You can also compute assortativity using an attribute of interest:

In [None]:
nx.attribute_assortativity_coefficient(eliNet_giant,'multi')

In [None]:
nx.attribute_assortativity_coefficient(topsNet,'sex')
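`nx.attribute_assortativity_coefficient` is meant for categorical attributes (for numeric ones networkx offers `nx.numeric_assortativity_coefficient`). A toy sketch with a hypothetical `group` attribute: two triangles of different groups joined by a single link give $5/7 \approx 0.714$; without that bridge the value would be exactly 1.

```python
import networkx as nx

# hypothetical toy network: triangle 1-2-3 (group A), triangle 4-5-6 (group B), bridge 3-4
toy = nx.Graph([(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4), (3, 4)])
nx.set_node_attributes(toy, {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}, "group")

r = nx.attribute_assortativity_coefficient(toy, "group")
print(round(r, 3))   # 0.714: links mostly stay within a group
```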

<a id='part22'></a>

### Exploration of network communities

A **clique** is a group of nodes in which every node is connected to every other node.

* How many cliques do we have? (This count includes cliques of every size, even single nodes and single links.)

In [None]:
len(list(nx.enumerate_all_cliques(eliNet_giant)))

In [None]:
#len(list(nx.enumerate_all_cliques(tradeNet)))  # commented out: enumerating every clique can be very slow on a dense network

In [None]:
len(list(nx.enumerate_all_cliques(topsNet)))
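Because `nx.enumerate_all_cliques` lists every clique of every size, including single nodes and single links, these counts grow quickly. A toy check on a triangle (a hypothetical example, not one of our networks):

```python
import networkx as nx
from collections import Counter

triangle = nx.complete_graph(3)   # nodes 0, 1, 2, all linked to each other
cliques = list(nx.enumerate_all_cliques(triangle))

# cliques by size: 3 single nodes, 3 links and 1 triangle
sizes = Counter(len(c) for c in cliques)
print(len(cliques), dict(sorted(sizes.items())))   # 7 {1: 3, 2: 3, 3: 1}
```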

If a clique cannot be enlarged by adding another node, it is a **maximal clique**.

<center><img src="https://github.com/EvansDataScience/CTforGA_Networks/raw/main/cliqueMaximal.png" width="500"></center>

* How many maximal cliques are there in this network?

In [None]:
sum(1 for _ in nx.find_cliques(eliNet_giant))

In [None]:
sum(1 for _ in nx.find_cliques(tradeNet))

In [None]:
sum(1 for _ in nx.find_cliques(topsNet))
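`nx.find_cliques` returns only maximal cliques. On a hypothetical toy graph (a triangle with one extra node hanging from it), the link 0-1 is a clique but not a maximal one, because it can be extended to the triangle 0-1-2:

```python
import networkx as nx

toy = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3)])   # triangle 0-1-2 plus node 3 linked to 2

maximal = sorted(sorted(c) for c in nx.find_cliques(toy))
print(maximal)                                      # [[0, 1, 2], [2, 3]]

# all cliques, for comparison: 4 nodes + 4 links + 1 triangle
print(len(list(nx.enumerate_all_cliques(toy))))     # 9
```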

This is how you see every maximal clique:

In [None]:
for a in nx.find_cliques(eliNet_giant):
    print (a)

You can find the size of the _maximum clique_ (the largest clique in the graph) like this:

In [None]:
max(len(c) for c in nx.find_cliques(eliNet_giant))

In [None]:
max(len(c) for c in nx.find_cliques(tradeNet))

In [None]:
max(len(c) for c in nx.find_cliques(topsNet))

You can see each maximum clique like this:

In [None]:
# compute the maximum size once, then keep the cliques of that size
maxSize = max(len(c) for c in nx.find_cliques(eliNet_giant))
[c for c in nx.find_cliques(eliNet_giant) if len(c) == maxSize]

The presence of cliques makes you suspect that the network may contain **communities**.

Community detection is a huge field of research; let me show you one of its algorithms, the [_Louvain method_](https://perso.uclouvain.be/vincent.blondel/research/louvain.html).

This algorithm can be installed with **pip install python-louvain** and imported as **community**.


In [None]:
import community 
parts = community.best_partition(eliNet_giant)
parts

You can store each node's community as a node attribute:

In [None]:
nx.set_node_attributes(eliNet_giant, parts,'community')

Now plot this attribute:

In [None]:
pos=nx.spring_layout(eliNet_giant, k=0.2)
plt.figure(figsize=(8,8))
color_map = plt.get_cmap("cool")
valuesForColors=[n[1]['community'] for n in eliNet_giant.nodes(data=True)]
# pass the computed positions so the k=0.2 layout is actually used
nx.draw(eliNet_giant,pos,node_color=valuesForColors,cmap=color_map,with_labels=True,edge_color='lightblue')
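networkx itself (version 2.8 or later, to my knowledge) also ships a Louvain implementation, `nx.community.louvain_communities`, plus `nx.community.modularity` to score a partition. A minimal sketch on networkx's built-in karate club graph (a stand-in for `eliNet_giant`); the seed is an arbitrary choice, and results may differ from python-louvain's because the node order is randomized:

```python
import networkx as nx

G = nx.karate_club_graph()

# list of sets of nodes, one set per community; the seed makes the run reproducible
communities = nx.community.louvain_communities(G, seed=42)

# same shape as python-louvain's best_partition output: {node: community id}
parts_nx = {node: i for i, group in enumerate(communities) for node in group}

# modularity: how much denser links are inside communities than expected by chance
q = nx.community.modularity(G, communities)
print(len(communities), round(q, 3))
```

The dictionary `parts_nx` can be passed to `nx.set_node_attributes` exactly like `parts`.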

Let's turn our attention to the nodes and their roles in the network.

<a id='part23'></a>

## Exploration of network actors

In [None]:
# Computing centrality measures:
degr=nx.degree_centrality(eliNet_giant)  # based on connections count
clos=nx.closeness_centrality(eliNet_giant) # "speed" to access the rest
betw=nx.betweenness_centrality(eliNet_giant,normalized=True) # "control flow" among the network nodes
eige=nx.eigenvector_centrality(eliNet_giant) # central nodes connected to central nodes (influential?)
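Degree and closeness centrality follow simple formulas: degree centrality divides a node's number of links by $n-1$, and closeness centrality (in a connected graph) divides $n-1$ by the sum of the node's shortest-path distances to all other nodes. A minimal sketch checking both against networkx on the built-in karate club graph, used here only as a stand-in for `eliNet_giant`:

```python
import networkx as nx

G = nx.karate_club_graph()
n = len(G)

# degree centrality: share of the other n-1 nodes a node is linked to
degree_by_hand = {v: G.degree(v) / (n - 1) for v in G}

# closeness centrality: (n-1) divided by the sum of distances to every other node
closeness_by_hand = {}
for v in G:
    dist = nx.single_source_shortest_path_length(G, v)
    closeness_by_hand[v] = (n - 1) / sum(dist.values())

nx_degree = nx.degree_centrality(G)
nx_closeness = nx.closeness_centrality(G)
print(max(abs(degree_by_hand[v] - nx_degree[v]) for v in G),
      max(abs(closeness_by_hand[v] - nx_closeness[v]) for v in G))
```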

In [None]:
# measures into a data frame:
import pandas as pd
Centrality=[ [rich, degr[rich],clos[rich],betw[rich],eige[rich]] for rich in eliNet_giant]
headers=['person','Degree','Closeness','Betweenness','Eigenvector']
DFCentrality=pd.DataFrame(Centrality,columns=headers)
DFCentrality

In [None]:
fig, ax = plt.subplots(figsize=(10,10))

ax.scatter(DFCentrality.Betweenness, DFCentrality.Closeness,s=(DFCentrality.Degree+1.3)**14,
           c=DFCentrality.Eigenvector,
           cmap=plt.get_cmap('YlOrRd'), alpha=0.6)

valsForAnnotate=zip(DFCentrality['person'],DFCentrality['Betweenness'],DFCentrality['Closeness'])
for name,coordX,coordY in valsForAnnotate:
    ax.annotate(name, (coordX,coordY),alpha=0.5)
    
plt.title("scatterplot (size for degree of node, color for eigenvector centrality)")
plt.xlabel("betweenness")
plt.ylabel("closeness")
plt.show()

### Egonet

The node with the highest degree could be considered a _hub_ in the network:

In [None]:
# degr is a dictionary:
max(degr.keys(), key=(lambda k: degr[k]))

# or you can try:
#DFCentrality['person'].loc[DFCentrality['Degree'].idxmax()]

We can plot the hub together with its neighbors, that is, _its ego network_:

In [None]:
# Determine the hub name:
HubNode=max(degr.keys(), key=(lambda k: degr[k]))

# Get ego network of Hub
HubEgonet=nx.ego_graph(eliNet_giant,HubNode)

# prepare to plot:

## positions of the nodes
pos=nx.spring_layout(HubEgonet)

## plot whole ego network
nx.draw(HubEgonet,pos,node_color='b',node_size=800,with_labels=True, alpha=0.5,node_shape='^')

## make the hub salient
nx.draw_networkx_nodes(HubEgonet,pos,nodelist=[HubNode],node_size=2000,node_color='r')

plt.show()
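`nx.ego_graph` keeps the center plus every node within `radius` steps (1 by default), together with the links among them. A minimal sketch on networkx's built-in karate club graph as a stand-in network:

```python
import networkx as nx

G = nx.karate_club_graph()
hub = max(G.degree, key=lambda pair: pair[1])[0]   # node with the most links

ego1 = nx.ego_graph(G, hub)              # radius=1: the hub and its direct neighbors
ego2 = nx.ego_graph(G, hub, radius=2)    # also adds the neighbors of neighbors

print(hub, len(ego1), len(ego2))
```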

Can this network be disconnected? 
If so, we can compute the minimum number of nodes that must be removed to disconnect the network (that is, to create at least two components):

In [None]:
nx.node_connectivity(eliNet_giant)

Which single node has the power to break the network? Such a node is called an _articulation point_ (or cut vertex):

In [None]:
list(nx.articulation_points(eliNet_giant))
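A toy check of what an articulation point does: in a hypothetical "bow tie" of two triangles sharing node 3, node connectivity is 1, node 3 is the only articulation point, and removing it leaves two components:

```python
import networkx as nx

# hypothetical bow tie: triangles 0-1-3 and 3-4-5 joined only through node 3
bowtie = nx.Graph([(0, 1), (1, 3), (3, 0), (3, 4), (4, 5), (5, 3)])
print(nx.node_connectivity(bowtie), list(nx.articulation_points(bowtie)))   # 1 [3]

# remove the cut vertex on a copy and count the pieces left
broken = bowtie.copy()
broken.remove_node(3)
print(nx.number_connected_components(broken))   # 2
```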

We can highlight the articulation node in the giant component:

In [None]:
# saving the cut point
cut=list(nx.articulation_points(eliNet_giant))

# positions for all the nodes
pos=nx.spring_layout(eliNet_giant,k=0.5)

# sizes for nodes
SALIENT, NORMAL=(2000,800)

# plot all nodes
nx.draw(eliNet_giant,pos,node_color='b',node_size=NORMAL,with_labels=True, alpha=0.5,node_shape='^')

# make the cut salient:
nx.draw_networkx_nodes(eliNet_giant,pos,nodelist=cut,node_size=SALIENT,node_color='r')
plt.show()