<center>
<hr>
<h1>Complessità nei sistemi sociali</h1>
<h2>Laurea Magistrale in Fisica Dei Sistemi Complessi</h2>
<h2>A.A. 2016/17</h2>
<h3>Dr. Daniela Paolotti, Dr. Michele Tizzoni</h3>
<h3>Introduction to NetworkX - Network visualization</h3>
<hr>
</center>

# NetworkX preliminaries
---

We will use the Python library [NetworkX](https://networkx.github.io/index.html). It is well documented and several [examples](http://networkx.readthedocs.io/en/networkx-1.11/examples/) are available.

It is not the only Python library available for network analysis. Another very good one is [graph-tool](https://graph-tool.skewed.de/).

The [SNAP library](http://snap.stanford.edu/data/index.html) also provides excellent tools for analyzing very large networks.


In [None]:
import networkx as nx
from operator import itemgetter

We import the plotting library seaborn, which integrates very well with matplotlib.
More documentation is available at https://seaborn.pydata.org/

In [None]:
import seaborn as sns

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

Generate an empty UNDIRECTED graph with NetworkX

In [None]:
G=nx.Graph()

Add nodes from a list of names

In [None]:
G.add_nodes_from(['Luca','Andrea','Sara','Carlo','Veronica'])

In [None]:
G.nodes()

Add another node

In [None]:
G.add_node('Giovanni')

print("The nodes of G are: ")
print(list(G.nodes()))

So far we have nodes in the networks but no edges.
Let's add edges from a list of tuples.

In [None]:
G.add_edges_from([('Luca','Sara'),('Andrea','Luca'),
('Carlo','Veronica'),('Sara','Veronica'),('Giovanni','Andrea')])

We add another edge. 'Lucia' is not in the graph yet, so NetworkX creates the node automatically.

In [None]:
G.add_edge('Veronica','Lucia')

print("The nodes of G are : ")
print(list(G.nodes()))

print("The links of G are : ")
print(list(G.edges()))
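As a quick, self-contained check of the point above, the sketch below rebuilds the same small graph and verifies that `add_edge` creates a missing endpoint on the fly. It assumes Python 3 and a recent NetworkX release.

```python
import networkx as nx

G = nx.Graph()
G.add_nodes_from(['Luca', 'Andrea', 'Sara', 'Carlo', 'Veronica', 'Giovanni'])
print(G.number_of_nodes())   # 6 nodes, no edges yet

# 'Lucia' is not in the graph yet: add_edge creates it automatically
G.add_edge('Veronica', 'Lucia')
print('Lucia' in G)          # True
print(G.number_of_nodes(), G.number_of_edges())   # 7 1
```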

We want to add the attribute 'age' to each node.

In [None]:
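for n in list(G.nodes()):
    # nodes whose name starts with 'L' get age 24, the others 28;
    # add_node on an existing node only updates its attributes
    if n[0] == 'L':
        G.add_node(n, age=24)
    else:
        G.add_node(n, age=28)

# show the nodes with their age
print(list(G.nodes(data=True)))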

How do we remove a node? Removing a node also removes every edge attached to it.

In [None]:
G.remove_node('Luca')
print(list(G.nodes(data=True)))
print(list(G.edges()))

If we remove an edge, its endpoints are NOT removed!

In [None]:
G.remove_edge('Giovanni', 'Andrea')

In [None]:
print(list(G.edges()))
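The sketch below replays the whole sequence of this section on a fresh graph, assuming Python 3 and a recent NetworkX, to make the two removal rules explicit: `remove_node` drops the incident edges, while `remove_edge` leaves both endpoints in place (here 'Andrea' and 'Giovanni' end up isolated).

```python
import networkx as nx

# rebuild the toy graph of this section
G = nx.Graph()
G.add_nodes_from(['Luca', 'Andrea', 'Sara', 'Carlo', 'Veronica', 'Giovanni'])
G.add_edges_from([('Luca', 'Sara'), ('Andrea', 'Luca'), ('Carlo', 'Veronica'),
                  ('Sara', 'Veronica'), ('Giovanni', 'Andrea')])
G.add_edge('Veronica', 'Lucia')

# removing a node also removes every edge incident to it
G.remove_node('Luca')
# removing an edge keeps both endpoints in the graph
G.remove_edge('Giovanni', 'Andrea')

print(sorted(G.nodes()))
# ['Andrea', 'Carlo', 'Giovanni', 'Lucia', 'Sara', 'Veronica']
print(sorted(tuple(sorted(e)) for e in G.edges()))
# [('Carlo', 'Veronica'), ('Lucia', 'Veronica'), ('Sara', 'Veronica')]
```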

The degree of a node is easily accessible.

In [None]:
print(G.degree('Veronica'))

In [None]:
print(dict(G.degree()))

In [None]:
# dict(...) gives a node -> degree mapping in every NetworkX version
for node, deg in dict(G.degree()).items():
    print(node, deg)
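The degree of a node is the number of its neighbors, and in an undirected graph the degrees sum to twice the number of edges, since each edge has two endpoints. A minimal check of both facts on the toy graph as it stands at this point (rebuilt directly from its three remaining edges), assuming Python 3 and a recent NetworkX:

```python
import networkx as nx

# the toy graph after the removals above: three edges, two isolated nodes
G = nx.Graph()
G.add_nodes_from(['Andrea', 'Giovanni'])
G.add_edges_from([('Carlo', 'Veronica'), ('Sara', 'Veronica'), ('Veronica', 'Lucia')])

# degree = number of neighbors
for node in G:
    assert G.degree(node) == len(list(G.neighbors(node)))

# handshake lemma: the sum of the degrees is twice the number of edges
total_degree = sum(dict(G.degree()).values())
print(G.degree('Veronica'), total_degree, 2 * G.number_of_edges())  # 3 6 6
```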

## Analyze the citHepTh network with NetworkX
---
We analyze the citation dataset (cit-HepTh) available on the [Stanford Large Network Dataset Collection](http://snap.stanford.edu/data/index.html). 

The network is directed!

In [None]:
H=nx.DiGraph()

In [None]:
# read the edge list line by line
with open('./cit-HepTh.txt', 'r') as fh:
    for line in fh:
        # remove the trailing "\n" (.strip()) and split the line at blank spaces (.split())
        s = line.strip().split()
        # the first lines of the file are comments starting with '#': skip them (and empty lines)
        if not s or s[0].startswith('#'):
            continue
        origin = int(s[0])
        dest = int(s[1])
        H.add_edge(origin, dest)
# the file is closed automatically at the end of the with block

In [None]:
print("The network has", len(H), "nodes")

In [None]:
# a dictionary node -> in-degree
dict(H.in_degree())

### We want to count how many nodes have a given in-degree
`collections.Counter` is a dictionary subclass that allows quick item counting: it maps each distinct value to the number of times it occurs.

In [None]:
from collections import Counter 
degin_distri=Counter(dict(H.in_degree()).values())
degin_distri
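The plot in the next cell shows the empirical distribution $P(k) = N_k / N$, where $N_k$ is the number of nodes with in-degree $k$ and $N$ is the number of nodes. A tiny sketch of the same normalization on a hand-made directed graph (hypothetical edges, not cit-HepTh), assuming Python 3 and a recent NetworkX:

```python
from collections import Counter
import networkx as nx

# small hypothetical directed graph: edge (u, v) means "u cites v"
D = nx.DiGraph()
D.add_edges_from([(1, 2), (1, 3), (2, 3), (4, 3)])

# in-degrees: 1 -> 0, 2 -> 1, 3 -> 3, 4 -> 0
in_degrees = dict(D.in_degree())
counts = Counter(in_degrees.values())   # N_k for each k
N = len(D)

# normalize the counts into probabilities P(k) = N_k / N
p_k = {k: counts[k] / float(N) for k in sorted(counts)}
print(p_k)  # {0: 0.5, 1: 0.25, 3: 0.25}
```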

In [None]:
x=[]
y=[]
for i in sorted(degin_distri):   
    x.append(i)
    y.append(float(degin_distri[i])/len(H))

plt.figure(figsize=(10,7))    
plt.plot(np.array(x),np.array(y))

plt.xlabel('$k_{in}$', fontsize=18)
plt.ylabel('$P(k_{in})$', fontsize=18)

plt.xticks(fontsize=14)
plt.yticks(fontsize=14)

plt.yscale('log')
plt.xscale('log')
plt.axis([1,10000,0.00001,1.0])
plt.show()

Let's plot the out-degree distribution.

In [None]:
degout_distri=Counter(dict(H.out_degree()).values())
degout_distri
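In a directed graph every edge adds one unit of out-degree to its source and one unit of in-degree to its target, so both degree sums equal the number of edges and the average in-degree equals the average out-degree. A minimal check on a hypothetical graph (the same test can be run on `H`), assuming Python 3 and a recent NetworkX:

```python
import networkx as nx

# small hypothetical directed graph
D = nx.DiGraph()
D.add_edges_from([(1, 2), (1, 3), (2, 3), (4, 3), (3, 1)])

sum_in = sum(dict(D.in_degree()).values())
sum_out = sum(dict(D.out_degree()).values())
print(sum_in, sum_out, D.number_of_edges())  # 5 5 5
```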

In [None]:
x=[]
y=[]
for i in sorted(degout_distri):   
    x.append(i)
    y.append(float(degout_distri[i])/len(H))

plt.figure(figsize=(10,7))    
plt.plot(np.array(x),np.array(y))

plt.xlabel('$k_{out}$', fontsize=18)
plt.ylabel('$P(k_{out})$', fontsize=18)

plt.xticks(fontsize=14)
plt.yticks(fontsize=14)

plt.yscale('log')
plt.xscale('log')
plt.axis([1,10000,0.00001,1.0])
plt.show()

### Export to GML (be careful: this is a large network!)

The file in gml format can be visualized using the software tool Gephi (http://gephi.org).

In [None]:
nx.write_gml(H,'./citHepTh.gml')
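GML is a plain-text format, so it is easy to inspect what `nx.write_gml` produces before exporting a large graph. The sketch below writes a tiny hypothetical graph to a temporary file, prints it, and reads it back with `nx.read_gml`; it assumes Python 3 and a recent NetworkX, and uses string node names so that the labels round-trip unchanged.

```python
import os
import tempfile
import networkx as nx

# small hypothetical graph with string node names and one node attribute
G = nx.Graph()
G.add_edges_from([('Sara', 'Veronica'), ('Veronica', 'Lucia')])
G.add_node('Lucia', age=24)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy.gml')
    nx.write_gml(G, path)
    # GML is plain text: print the file to see its structure
    with open(path) as f:
        print(f.read())
    # read it back; node labels become the node names again
    G2 = nx.read_gml(path)

print(sorted(G2.nodes()), G2.number_of_edges())  # ['Lucia', 'Sara', 'Veronica'] 2
print(nx.get_node_attributes(G2, 'age'))         # {'Lucia': 24}
```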

# Visualizing a network with NetworkX

NetworkX combined with matplotlib can be used to visualize complex networks. 

It offers a good range of functions for both basic and more refined visualizations. More details are available in the [documentation](https://networkx.github.io/documentation/development/reference/drawing.html).

Notice, as stated in the documentation
>NetworkX provides basic functionality for visualizing graphs, but its main goal is to enable graph analysis rather than perform graph visualization. 
>In the future, graph visualization functionality may be removed from NetworkX or only available as an add-on package.


We generate a random Erdős-Rényi network and visualize it.

In [None]:
N=100
prob=0.05

In [None]:
ER=nx.erdos_renyi_graph(N, prob)
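In the G(N, p) model each of the N(N-1)/2 possible edges is included independently with probability p, so the expected number of edges is p N(N-1)/2 (247.5 here) and the expected degree is p(N-1) (4.95 here). The function below is a minimal hand-written sketch of that model for illustration; `gnp_sketch` is a hypothetical name and this is not NetworkX's own code. It assumes Python 3 and a recent NetworkX.

```python
import itertools
import random
import networkx as nx

def gnp_sketch(n, p, seed=None):
    """Minimal G(n, p) sketch: keep each of the n(n-1)/2 pairs with probability p."""
    rng = random.Random(seed)
    g = nx.Graph()
    g.add_nodes_from(range(n))
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            g.add_edge(u, v)
    return g

N, prob = 100, 0.05
ER_sketch = gnp_sketch(N, prob, seed=42)
expected_edges = prob * N * (N - 1) / 2.0   # 247.5
expected_degree = prob * (N - 1)            # 4.95
print(ER_sketch.number_of_edges(), expected_edges)
print(2.0 * ER_sketch.number_of_edges() / N, expected_degree)
```

The observed edge count fluctuates around the expected value from one seed to the next, with a standard deviation of roughly 15 edges for these parameters.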

In [None]:
plt.figure(figsize=(8,6))
nx.draw_networkx(ER)

In [None]:
plt.figure(figsize=(8,6))
nx.draw_circular(ER)

In [None]:
plt.figure(figsize=(8,6))
nx.draw_random(ER)

In [None]:
plt.figure(figsize=(8,6))
nx.draw_spring(ER)

In [None]:
pos=nx.spring_layout(ER)
pos
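As the output above shows, a layout is simply a dictionary mapping each node to its (x, y) coordinates, so any dictionary of that form can be passed as `pos` to the drawing functions. A minimal sketch that builds such a dictionary by hand, placing the nodes evenly on a unit circle (assuming Python 3 and a recent NetworkX; `pos_circle` is a hypothetical name):

```python
import math
import networkx as nx

ER = nx.erdos_renyi_graph(100, 0.05, seed=1)

# a layout is just a dict node -> (x, y); here nodes are spread on a unit circle
n = len(ER)
pos_circle = {
    node: (math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
    for i, node in enumerate(ER)
}
print(pos_circle[0])   # (1.0, 0.0)
# pos_circle can now be passed as `pos` to nx.draw_networkx_nodes / nx.draw_networkx_edges
```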

We can draw only the nodes, placing each node at the location assigned to it in `pos`.

In [None]:
plt.figure(figsize=(8,6))
nx.draw_networkx_nodes(ER, pos)
#plt.axis('off')

In [None]:
plt.figure(figsize=(8,6))

s=nx.draw_networkx_nodes(ER,
            pos,
            node_size=100.0,
            node_color=[ER.degree(v) for v in ER],  # degree of each node, in node order
            alpha=1,
            cmap=plt.cm.coolwarm
            )

nx.draw_networkx_edges(ER, pos, alpha=0.5)

#show the colorbar on the right side
cbar=plt.colorbar(s)
cbar.ax.set_ylabel('Degree', size=22)

#plt.axis('off')
plt.show()

# Visualizing a spatial network with NetworkX

We analyze the US airport network of the year 2010. 
The network is available from the [network repository of Tore Opsahl](https://toreopsahl.com/datasets/#usairports).

Weights represent the total number of passengers who traveled on that connection in a year.

I added the airport coordinates myself.


In [None]:
G=nx.Graph()
with open('./USairport_2010.txt','r') as fh:
    for line in fh:
        s=line.strip().split()
        # first two columns: origin and destination airport (the third, passengers, is used later)
        G.add_edge(int(s[0]),int(s[1]))

In [None]:
len(G)

Is the network fully connected?

In [None]:
nx.number_connected_components(G)

In [None]:
c=list(nx.connected_components(G))
c[-1]
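Connected components can be found with a breadth-first search (BFS): starting from any unvisited node, the search reaches exactly the nodes of its component, and restarting from the next unvisited node yields the following one. The function below is a minimal sketch of that idea (`components_bfs` is a hypothetical name; NetworkX's own implementation may differ), checked against `nx.number_connected_components` on a toy graph. It assumes Python 3 and a recent NetworkX.

```python
from collections import deque
import networkx as nx

def components_bfs(g):
    """Sketch: return the connected components of an undirected graph via BFS."""
    seen = set()
    components = []
    for start in g:
        if start in seen:
            continue
        # a BFS from an unvisited node collects one whole component
        comp = {start}
        queue = deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in g.neighbors(u):
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        components.append(comp)
    return components

# hypothetical toy graph with two components: {1, 2, 3} and {4, 5}
toy = nx.Graph([(1, 2), (2, 3), (4, 5)])
comps = components_bfs(toy)
print(len(comps), nx.number_connected_components(toy))  # 2 2
```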

We define three dictionaries, attached to the graph object, to store additional node features: IATA code, airport name, and geographic coordinates.

In [None]:
G.code={}
G.name={}
G.pos={}

We read the node features from a file.

In [None]:
with open('./USairport_2010_codes.txt','r') as finfo:
    for line in finfo:
        s=line.strip().split()
        node=int(s[0])
        G.code[node]=s[1]
        G.name[node]=s[2]
        # stored as (x, y) = (s[4], s[3]), presumably (longitude, latitude)
        G.pos[node]=[float(s[4]),float(s[3])]
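An alternative to separate dictionaries is to store the features as node attributes, so they travel with the graph (for instance when exporting it or taking subgraphs). The sketch below uses two hypothetical lines in the format the loop above assumes (node, code, name, then two coordinates, with `s[4]` used as x and `s[3]` as y); the airports and values are made up. `nx.get_node_attributes(G, 'pos')` then returns a dictionary that can be passed directly as `pos=`. It assumes Python 3 and a recent NetworkX.

```python
import networkx as nx

# hypothetical lines in the assumed format "node code name coord1 coord2"
lines = [
    "1 AAA Airport_A 40.0 -75.0",
    "2 BBB Airport_B 34.0 -118.0",
]

G = nx.Graph()
G.add_edge(1, 2)
for line in lines:
    s = line.strip().split()
    node = int(s[0])
    # add_node on an existing node only updates its attribute dict;
    # pos uses the same (s[4], s[3]) ordering as the cell above
    G.add_node(node, code=s[1], name=s[2], pos=(float(s[4]), float(s[3])))

pos = nx.get_node_attributes(G, 'pos')
codes = nx.get_node_attributes(G, 'code')
print(codes)  # {1: 'AAA', 2: 'BBB'}
```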

Draw the network

In [None]:
fig=plt.figure(figsize=(10,7))

nx.draw_networkx_nodes(G,
            pos=G.pos,
            node_size=20
            )

nx.draw_networkx_labels(G,
            pos=G.pos,
            labels=G.code
            )

plt.axis('off')

We would like to draw the edges, but there are too many of them.

In [None]:
len(G.edges())

We select only the strongest connections as a subgraph of G by setting a threshold on the annual passenger volume.

In [None]:
weight_threshold=700000

In [None]:
H=nx.Graph()

H.pos={}
H.code={}
H.name={}

with open('./USairport_2010.txt','r') as fh:
    for line in fh:
        s=line.strip().split()
        node1=int(s[0])
        node2=int(s[1])

        # keep only the connections above the passenger threshold
        if int(s[2])>weight_threshold:
            H.add_edge(node1,node2)

            # copy position, IATA code and name of both endpoints from G
            for node in (node1, node2):
                H.pos[node]=G.pos[node]
                H.code[node]=G.code[node]
                H.name[node]=G.name[node]
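If the passenger counts are stored as edge weights when the full graph is built, the thresholded subgraph can be derived from it directly, without reading the file a second time. A minimal sketch with hypothetical weighted edges (airport ids and passenger numbers are made up), assuming Python 3 and a recent NetworkX:

```python
import networkx as nx

# hypothetical weighted edges: (airport_i, airport_j, passengers per year)
weighted_edges = [(1, 2, 900000), (2, 3, 50000), (1, 3, 750000), (3, 4, 1200)]
weight_threshold = 700000

G = nx.Graph()
G.add_weighted_edges_from(weighted_edges)   # stores the third value under 'weight'

# keep only the edges above the threshold, together with their endpoints
H = nx.Graph()
H.add_edges_from(
    (u, v, d) for u, v, d in G.edges(data=True) if d['weight'] > weight_threshold
)
print(sorted(tuple(sorted(e)) for e in H.edges()))  # [(1, 2), (1, 3)]
```

Node features such as positions or codes can then be copied for `H.nodes()` only, as in the loop above.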

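We draw all the nodes of G, color-coded by the logarithm of their degree, together with the edges and the labels of the strongest connections only (the subgraph H).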

In [None]:
import math
fig=plt.figure(figsize=(14,10))

s=nx.draw_networkx_nodes(G,
            pos=G.pos,
            node_color=[math.log(G.degree(v)) for v in G],
            node_size=30,
            cmap=plt.cm.YlOrRd
            )

nx.draw_networkx_edges(H,
            pos=G.pos,
            alpha=0.5
            )

nx.draw_networkx_labels(H,
            pos=H.pos,
            labels=H.name
            )

cbar=plt.colorbar(s)
cbar.ax.set_ylabel(r'$\log(k)$', size=22)

plt.axis('off')

What is the node with the largest degree?

In [None]:
max(dict(G.degree()).items(), key=itemgetter(1))

In [None]:
G.name[389]
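To list the k most connected nodes rather than only the maximum, the same node -> degree dictionary can be sorted in decreasing order of degree. A minimal sketch on a hypothetical toy graph, assuming Python 3 and a recent NetworkX; on the airport network, `[(G.name[n], k) for n, k in top3]` would attach the airport names.

```python
from operator import itemgetter
import networkx as nx

# hypothetical toy graph: node 0 is the hub
toy = nx.Graph([(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)])

degrees = dict(toy.degree())
# sorted() is stable, so nodes with equal degree keep their original order
top3 = sorted(degrees.items(), key=itemgetter(1), reverse=True)[:3]
print(top3)  # [(0, 3), (1, 2), (2, 2)]
```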

# Data visualization with NetworkX

The simple interface that NetworkX and matplotlib offer for drawing georeferenced data can be used to visualize any kind of data point with geographic coordinates. 

A nice example comes from the electoral results of the municipality of Turin available at [the Open Data repository AperTO](http://aperto.comune.torino.it/?q=taxonomy/term/111).

Electoral data can be geo-referenced through the [dataset containing every street number of the city with its coordinates](http://aperto.comune.torino.it/?q=node/504).

Based on this idea, we created a web interface to explore the electoral data of the city of Turin: [Il colore di Torino](http://datainterfaces.org/projects/ilcoloreditorino/)
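The idea can be sketched on a handful of points: build a graph with nodes only, use the coordinates as `pos`, and color the nodes by the value attached to each point. The points and values below are hypothetical, and the `matplotlib.use('Agg')` line is only there so the sketch runs as a plain script without a display (drop it inside the notebook). It assumes Python 3 with recent NetworkX and matplotlib.

```python
import matplotlib
matplotlib.use('Agg')            # off-screen backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import networkx as nx

# hypothetical geo-referenced points: id -> (longitude, latitude, value)
points = {
    'a': (0.0, 0.0, 0.2),
    'b': (1.0, 0.5, 0.5),
    'c': (0.5, 1.0, 0.9),
}

P = nx.Graph()
P.add_nodes_from(points)                        # nodes only, no edges
pos = {k: (lon, lat) for k, (lon, lat, _) in points.items()}
values = [points[k][2] for k in P]              # same order as the nodes of P

fig = plt.figure(figsize=(6, 6))
s = nx.draw_networkx_nodes(P, pos, node_color=values, cmap=plt.cm.coolwarm, node_size=50)
plt.colorbar(s)
plt.axis('off')
```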