# Gentle introduction to social networks in Python

### Prof. Jose Manuel MAGALLANES.
* Professor at Pontificia Universidad Católica del Peru
* Senior Data Scientist at eScience Institute, University of Washington

_______


## CONTENTS:

## 1. Network from dataframes


The data frames used below were prepared beforehand from the list [50 influential people in Seattle tech to follow on Twitter](https://www.builtinseattle.com/2018/08/06/50-seattle-tech-twitter-accounts-to-follow). The spreadsheet can be viewed [here](https://docs.google.com/spreadsheets/d/e/2PACX-1vTpWvtfphnO9eRYwmEyxVAxxo1KEsdpK6sK6q5uhJn5x2QtB-eGiso8ibpF16NaHAers4wDIHkvBo64/pubhtml) and has three sheets:

* The network as two columns representing edges.
* The network as an adjacency matrix.
* Attributes of the network nodes.
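
The edge sheet and the adjacency sheet encode the same network in two ways. As a rough illustration (plain Python, with four made-up names rather than the spreadsheet's data), this is how an undirected edge list maps onto a symmetric adjacency matrix:

```python
# Hypothetical toy network: four made-up names, not the Seattle data.
toy_edges = [("ana", "ben"), ("ben", "cai"), ("cai", "ana"), ("cai", "dee")]

# Fix one node order (sorted) so that rows and columns line up.
toy_nodes = sorted({name for edge in toy_edges for name in edge})
position = {name: i for i, name in enumerate(toy_nodes)}

# Start from an all-zero matrix and mark both cells of every edge,
# because an undirected link u-v is also a link v-u.
toy_matrix = [[0] * len(toy_nodes) for _ in toy_nodes]
for u, v in toy_edges:
    toy_matrix[position[u]][position[v]] = 1
    toy_matrix[position[v]][position[u]] = 1

for name, row in zip(toy_nodes, toy_matrix):
    print(name, row)
```

Because each undirected link fills two cells, the entries of the matrix add up to twice the number of edges.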

In [None]:
fileLink='https://github.com/eScience-UW/gentleIntro_networks/raw/main/data/seattleTop.xlsx'

import pandas as pd

# one data frame per sheet of the Excel file
edges=pd.read_excel(io=fileLink,sheet_name="edges")
adjacency=pd.read_excel(io=fileLink,sheet_name="adjacency",index_col=0) # first column holds the node names (row labels)
attributes=pd.read_excel(io=fileLink,sheet_name="attributes")

Use the previous dataframes to create the network:

* From the edges:

In [None]:
import networkx as net # package needed

# from_pandas_edgelist looks for columns named 'source' and 'target' by default
EliteNet=net.from_pandas_edgelist(edges)


In [None]:

EliteNet,len(EliteNet.nodes),len(EliteNet.edges)

* From the adjacency matrix:

In [None]:

EliteNet = net.Graph(adjacency)


In [None]:
EliteNet,len(EliteNet.nodes),len(EliteNet.edges)

In [None]:
EliteNet.edges(data=True)

## 2. Adding Attributes

Attributes are added using dictionaries _{node_name:attribute_value}_:
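
A small sketch of both kinds of dictionaries, using hypothetical toy names in place of the `attributes` sheet. As far as we know, `net.set_node_attributes` silently skips keys that are not nodes of the graph, so a name spelled differently in the attributes sheet would simply leave that node without the attribute. Comparing the two sets of names beforehand is a cheap check:

```python
# Hypothetical toy data standing in for the graph's nodes and the attributes sheet.
graph_nodes = {"ana", "ben", "cai", "dee"}
names = ["ana", "ben", "cai", "Dee"]        # "Dee" is capitalised differently
male = [0, 1, 1, 0]

# Node attributes: {node_name: attribute_value}, built by zipping two columns.
dict_male = dict(zip(names, male))

# Edge attributes: the keys are (node, node) tuples.
toy_edges = [("ana", "ben"), ("ben", "cai")]
dict_type = {edge: "tech" for edge in toy_edges}

# Names that would be ignored, and nodes that would end up without the attribute.
unmatched = sorted(set(dict_male) - graph_nodes)
missing = sorted(graph_nodes - set(dict_male))
print("not in graph:", unmatched)
print("no attribute:", missing)
```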

In [None]:
# create dictionaries

# for Nodes
dictAttribute_male=dict(zip(attributes.name,attributes.male))
dictAttribute_followers=dict(zip(attributes.name,attributes.followers))

# for edges
dictAttribute_typeRel=dict(zip(EliteNet.edges(),['tech']*len(EliteNet.edges())))

In [None]:
dictAttribute_male

In [None]:
dictAttribute_typeRel

In [None]:
# set attribute with the dictionaries

# for nodes
net.set_node_attributes(EliteNet, dictAttribute_male,'male')
net.set_node_attributes(EliteNet, dictAttribute_followers,'followers')

# for edges
net.set_edge_attributes(EliteNet, dictAttribute_typeRel,'typeRelation')

In [None]:
## see them
EliteNet.nodes(data=True)

In [None]:
EliteNet.edges(data=True)

In [None]:
# one color per node, in the same node order that net.draw uses;
# nodes without a 'male' value are colored red as well
colors_for_nodes=['green' if EliteNet.nodes[n].get('male')==1 else 'red' for n in EliteNet.nodes]

In [None]:
net.draw(EliteNet,
        with_labels=True,
        node_color=colors_for_nodes)

## 3. Exploring the Network

* Connectedness: an undirected network is “connected” if there is a _path_ between every pair of nodes.
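
To see what such a check involves, here is a plain-Python sketch (not networkx's implementation) that runs a breadth-first search from one node and tests whether it reaches all the others. The graph is assumed to be stored as a dictionary `{node: set_of_neighbors}`, a format chosen only for this example:

```python
from collections import deque

def is_connected(adj):
    """True if every node can reach every other node in an undirected network."""
    if not adj:
        raise ValueError("connectivity is undefined for a network with no nodes")
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:                      # breadth-first search from `start`
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(adj)      # connected if the search reached every node

# Toy networks: a path a-b-c, and the same path plus an isolated node d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
with_isolate = {**path, "d": set()}
print(is_connected(path))           # True
print(is_connected(with_isolate))   # False
```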

In [None]:
net.is_connected(EliteNet)

* Density: the share of all possible links that are actually present, from 0 to 1. A density of 1 means a ‘complete’ network, with a link between every pair of nodes.
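
For a simple undirected network with $n$ nodes and $m$ edges there are $n(n-1)/2$ possible links, so the density is $d = \frac{2m}{n(n-1)}$. A tiny worked sketch with toy numbers (not the Seattle data):

```python
def density(n_nodes, n_edges):
    """Density of a simple undirected network: observed links / possible links."""
    possible = n_nodes * (n_nodes - 1) / 2   # number of distinct pairs of nodes
    return n_edges / possible

# A triangle a-b-c plus the edge c-d: 4 nodes, 4 edges, 6 possible pairs.
print(density(4, 4))   # 4 / 6 = 0.666...
# A complete network on 4 nodes uses all 6 possible links.
print(density(4, 6))   # 1.0
```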

In [None]:
net.density(EliteNet) 

* Diameter: when two vertices are connected, one can reach the other by following one or more edges. The geodesic is the shortest path between two connected vertices, and the diameter is the length of the longest geodesic in the network.
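
The diameter can be found by running a breadth-first search from every node and keeping the largest distance found. A plain-Python sketch under the same assumed `{node: set_of_neighbors}` format; like `net.diameter`, it refuses disconnected networks, where some distances would be infinite:

```python
from collections import deque

def bfs_distances(adj, source):
    """Number of edges on the shortest path from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def diameter(adj):
    """Length of the longest geodesic in a connected undirected network."""
    longest = 0
    for source in adj:
        dist = bfs_distances(adj, source)
        if len(dist) < len(adj):
            raise ValueError("network is not connected: the diameter is undefined")
        longest = max(longest, max(dist.values()))
    return longest

# Toy path a-b-c-d: the geodesic between the two ends has 3 edges.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(diameter(path))  # 3
```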

In [None]:
net.diameter(EliteNet)

* Assortativity: measures whether nodes tend to connect to other nodes similar to themselves. Values close to 1 indicate assortativity, values close to -1 indicate disassortativity (nodes connect to dissimilar nodes), and values near 0 indicate no assortativity.

a. Degree assortativity: the tendency of highly connected nodes to connect among themselves.

In [None]:
net.degree_assortativity_coefficient(EliteNet)

b. Categorical assortativity: the tendency of nodes to connect to other nodes of the same category (here, the `male` attribute).

In [None]:
net.attribute_assortativity_coefficient(EliteNet,'male')
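
For a categorical attribute, Newman's assortativity coefficient compares the share of edges joining nodes of the same category with the share expected if links ignored the category: $r = \frac{\sum_i e_{ii} - \sum_i a_i b_i}{1 - \sum_i a_i b_i}$, where $e_{ij}$ is the fraction of edge ends linking category $i$ to category $j$, and $a_i$, $b_i$ are its row and column sums. A plain-Python sketch of that formula on toy data (the function name and the labels are illustrative, not networkx's code):

```python
from collections import Counter

def categorical_assortativity(edge_list, label_of):
    """Newman's assortativity coefficient for a categorical node attribute.

    edge_list: undirected (u, v) pairs; label_of: maps each node to its category.
    Undefined (division by zero) if all edge ends fall in a single category.
    """
    # Count each undirected edge in both directions so the mixing matrix is symmetric.
    counts = Counter()
    for u, v in edge_list:
        counts[(label_of[u], label_of[v])] += 1
        counts[(label_of[v], label_of[u])] += 1
    total = sum(counts.values())
    cats = set(label_of.values())
    # e[(i, j)]: share of edge ends that join category i to category j
    e = {(i, j): counts[(i, j)] / total for i in cats for j in cats}
    a = {i: sum(e[(i, j)] for j in cats) for i in cats}   # row sums
    b = {j: sum(e[(i, j)] for i in cats) for j in cats}   # column sums
    same = sum(e[(i, i)] for i in cats)                    # links within a category
    expected = sum(a[i] * b[i] for i in cats)              # expected if category were ignored
    return (same - expected) / (1 - expected)

label_of = {"a": "M", "b": "M", "c": "F", "d": "F"}           # hypothetical labels
print(categorical_assortativity([("a", "b"), ("c", "d")], label_of))  # 1.0: links only within categories
print(categorical_assortativity([("a", "c"), ("b", "d")], label_of))  # -1.0: links only across categories
```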

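c. Numerical assortativity: the same idea for a numeric attribute (here, `followers`), that is, whether nodes with similar values tend to be connected.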

In [None]:
net.numeric_assortativity_coefficient(EliteNet,'followers')
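
For a numeric attribute, assortativity is the Pearson correlation between the attribute values at the two ends of every edge (degree assortativity is the same idea, using each node's degree as the value). A plain-Python sketch with made-up follower counts:

```python
import math

def numeric_assortativity(edge_list, value):
    """Pearson correlation between the values at the two ends of each edge.

    Every undirected edge (u, v) contributes both (u, v) and (v, u), so the
    result does not depend on the order in which the edge endpoints are listed.
    """
    xs, ys = [], []
    for u, v in edge_list:
        xs += [value[u], value[v]]
        ys += [value[v], value[u]]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

followers = {"a": 1, "b": 1, "c": 5, "d": 5}   # hypothetical follower counts
print(numeric_assortativity([("a", "b"), ("c", "d")], followers))  # 1.0: similar nodes linked
print(numeric_assortativity([("a", "c"), ("b", "d")], followers))  # -1.0: dissimilar nodes linked
```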

## 4. Exploration of Network nodes

- Eigenvector centrality tells you how well connected a vertex is: vertices with the highest values are considered the most influential because they are connected to vertices that are themselves well connected.

- Closeness centrality tells you how close a vertex is to all other vertices (the fewer steps it needs to reach them, the higher its closeness). A vertex with high closeness can spread information faster than the rest.

- Betweenness centrality tells you how critical a vertex is for connecting vertices that are not directly linked: it reflects how often the vertex lies on the shortest paths between other pairs of vertices.
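
Two of these measures are easy to sketch by hand. For a connected network, closeness (as networkx defines it) is $(n-1)$ divided by the sum of the shortest-path distances from the vertex to all others. Eigenvector centrality can be approximated by power iteration: repeatedly replace each score with the sum of its neighbors' scores and rescale. The version below also adds each node's own score at every step (iterating with $A + I$ instead of $A$), a common trick so the iteration converges on bipartite networks such as the toy star used here. It is a sketch under these assumptions, not networkx's code:

```python
import math
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path length from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def closeness(adj, node):
    """(n - 1) / sum of distances from `node`; assumes a connected network."""
    return (len(adj) - 1) / sum(bfs_distances(adj, node).values())

def eigenvector(adj, iterations=100):
    """Power iteration with A + I, rescaled to unit Euclidean length at each step."""
    x = {n: 1.0 for n in adj}
    for _ in range(iterations):
        new = {n: x[n] + sum(x[m] for m in adj[n]) for n in adj}
        norm = math.sqrt(sum(v * v for v in new.values()))
        x = {n: v / norm for n, v in new.items()}
    return x

# Toy star: "hub" linked to four leaves that are not linked to each other.
star = {"hub": {"l1", "l2", "l3", "l4"},
        "l1": {"hub"}, "l2": {"hub"}, "l3": {"hub"}, "l4": {"hub"}}
print(closeness(star, "hub"))  # 4 / 4 = 1.0
print(closeness(star, "l1"))   # 4 / (1 + 2 + 2 + 2) = 0.571...
scores = eigenvector(star)
print(round(scores["hub"], 4), round(scores["l1"], 4))  # hub about 0.7071, each leaf about 0.3536
```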

In [None]:
# Computing centrality measures:

eigen=net.eigenvector_centrality(EliteNet)

clos=net.closeness_centrality(EliteNet)

betw=net.betweenness_centrality(EliteNet)


In [None]:
# the dataframe of centralities

DFCentrality=pd.DataFrame(dict(Eigenvector = eigen,
                               Closeness = clos,
                               Betweenness = betw)) 


In [None]:
DFCentrality.reset_index(drop=False,names='person',inplace=True)
DFCentrality

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(7,7))

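# bubble size grows steeply with eigenvector centrality (ad hoc scaling to make differences visible)
ax.scatter(DFCentrality.Betweenness, DFCentrality.Closeness,s=(DFCentrality.Eigenvector+1)**35, alpha=0.5)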

valsForAnnotate=zip(DFCentrality['person'],DFCentrality['Betweenness'],DFCentrality['Closeness'])

for name,coordX,coordY in valsForAnnotate:
    ax.annotate(name, (coordX,coordY),alpha=0.5,size=7)
    
plt.title("Betweenness vs. closeness (bubble size: eigenvector centrality)")
plt.xlabel("Betweenness")
plt.ylabel("Closeness")
plt.show()

The previous results tell us that two people stand out:

In [None]:
# the hubs: the two people with the highest eigenvector centrality

labelsHubs={n:n for n in DFCentrality.nlargest(2, 'Eigenvector')['person']}
labelsHubs

Let's highlight those two people:

In [None]:
# prepare to plot:
fig, ax = plt.subplots(figsize=(7,7))

## positions of the nodes
pos=net.spring_layout(EliteNet)

# edges
net.draw_networkx_edges(EliteNet,pos,edge_color='grey',alpha=0.2)
# draw only the hubs, in red
net.draw_networkx_nodes(EliteNet,pos,nodelist=list(labelsHubs),node_size=200,node_color='r')
# label the hubs
net.draw_networkx_labels(EliteNet,pos,labels=labelsHubs,font_size=8,font_color='blue')

plt.show()

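## 5. Communities

To judge whether a network measure is high or low, it helps to compare it with a random network. Here, `erdos_renyi_graph` creates a network with as many nodes as EliteNet, in which every pair of nodes is linked with probability 0.5: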

In [None]:
RandomNet=net.erdos_renyi_graph(len(EliteNet.nodes), p=0.5)
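
In this Erdős–Rényi model each pair of nodes is linked independently with probability `p`, so the random network's expected density is `p` (0.5 above), and its transitivity is also expected to be close to `p`. A plain-Python sketch of the model (the helper `gnp_edges` and the values of `n`, `p` and `seed` are illustrative) shows the density landing near `p`. For a baseline with the same density as EliteNet, one option is to pass `p=net.density(EliteNet)` instead of 0.5.

```python
import itertools
import random

def gnp_edges(n, p, seed=None):
    """Erdős–Rényi G(n, p): each of the n(n-1)/2 node pairs is linked with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in itertools.combinations(range(n), 2) if rng.random() < p]

n, p = 80, 0.3
toy_edges = gnp_edges(n, p, seed=42)
observed_density = len(toy_edges) / (n * (n - 1) / 2)
print(len(toy_edges), round(observed_density, 3))   # density close to p = 0.3
```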

* Transitivity: how probable it is that two nodes with a common neighbor are also connected to each other.
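
Transitivity is usually computed as $3 \times \text{triangles} / \text{connected triples}$, where a connected triple is a node together with two of its neighbors. A plain-Python sketch on a toy network (a triangle plus one pendant node), using the same assumed `{node: set_of_neighbors}` format:

```python
from itertools import combinations

def transitivity(adj):
    """3 * triangles / connected triples for an undirected network."""
    closed = 0    # neighbor pairs that are linked (each triangle is counted 3 times)
    triples = 0   # all neighbor pairs, linked or not
    for node, nbrs in adj.items():
        for u, v in combinations(nbrs, 2):
            triples += 1
            if v in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0

# Triangle a-b-c plus a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(transitivity(toy))  # 3 closed / 5 triples = 0.6
```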

In [None]:
net.transitivity(RandomNet)

In [None]:
net.transitivity(EliteNet)

* Partition

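A partition splits the nodes into groups. The hard problem is deciding which node goes where. A first, simple approach is to look for the minimum edge cut: the smallest set of edges whose removal disconnects the network: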
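
networkx relies on maximum-flow techniques to find a minimum edge cut. The exhaustive search below is only a conceptual sketch, feasible for a handful of edges: it tries every set of 1 edge, then of 2 edges, and so on, until removing the set disconnects the network.

```python
from collections import deque
from itertools import combinations

def is_connected(node_list, edge_list):
    """Breadth-first search from one node; connected if every node is reached."""
    adj = {n: set() for n in node_list}
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(adj))
    seen, queue = {start}, deque([start])
    while queue:
        for nbr in adj[queue.popleft()]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(adj)

def brute_force_min_edge_cut(node_list, edge_list):
    """Smallest set of edges whose removal disconnects the network (tiny graphs only)."""
    for size in range(1, len(edge_list) + 1):
        for cut in combinations(edge_list, size):
            remaining = [e for e in edge_list if e not in cut]
            if not is_connected(node_list, remaining):
                return set(cut)
    return set()   # a single node cannot be disconnected

# Two triangles joined by a single "bridge" edge c-d.
toy_nodes = ["a", "b", "c", "d", "e", "f"]
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
             ("d", "e"), ("e", "f"), ("d", "f")]
print(brute_force_min_edge_cut(toy_nodes, toy_edges))  # {('c', 'd')}
```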

In [None]:
# edges that will partition the net
edgesThePartition=net.minimum_edge_cut(EliteNet)
type(edgesThePartition), edgesThePartition

Let's plot:

In [None]:
# labels for the nodes at the ends of the removed edges

byeEdgeNodes={node for edge in edgesThePartition for node in edge} # flatten the set of edges into a set of nodes
labelsEdgesBye={n:n for n in byeEdgeNodes} # a dictionary

## see
labelsEdgesBye

In [None]:
EliteNet_cut=EliteNet.copy() 

EliteNet_cut.remove_edges_from(edgesThePartition)


fig, ax = plt.subplots(figsize=(10,10))
pos=net.spring_layout(EliteNet) # positions for the original net
net.draw(EliteNet_cut,pos=pos,node_color='yellow')  # the network without the cut edges
net.draw_networkx_labels(EliteNet,pos,
                         labels=labelsEdgesBye,font_size=12,font_color='red'); # labelling



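The **girvan_newman** algorithm also builds partitions by removing edges: at each step it removes the edge with the highest betweenness centrality (the edge lying on the most shortest paths), recomputing betweenness after every removal, until the network splits. We can get something similar to the last result like this: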
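
A plain-Python sketch of one Girvan–Newman step on a toy network. Edge betweenness is computed here by brute force over all node pairs (networkx uses the much faster Brandes algorithm); the helper names are hypothetical:

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, source):
    """Distance from `source` and number of shortest paths to each reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:              # first time nbr is reached
                dist[nbr] = dist[node] + 1
                sigma[nbr] = 0
                queue.append(nbr)
            if dist[nbr] == dist[node] + 1:  # node precedes nbr on shortest paths
                sigma[nbr] += sigma[node]
    return dist, sigma

def edge_betweenness(adj):
    """Unnormalized edge betweenness, by brute force over all node pairs."""
    info = {s: bfs_counts(adj, s) for s in adj}
    edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    score = dict.fromkeys(edges, 0.0)
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue                         # no path between s and t
        for u, v in edges:
            for x, y in ((u, v), (v, u)):    # the edge can be walked either way
                if x in dist_s and y in dist_t and dist_s[x] + 1 + dist_t[y] == dist_s[t]:
                    # share of the shortest s-t paths that take the step x -> y
                    score[(u, v)] += sigma_s[x] * sigma_t[y] / sigma_s[t]
    return score

def components(adj):
    """List of node sets, one per connected component."""
    seen, groups = set(), []
    for start in adj:
        if start not in seen:
            reached = set(bfs_counts(adj, start)[0])
            seen |= reached
            groups.append(reached)
    return groups

def girvan_newman_first_split(adj):
    """Remove the highest-betweenness edge, recomputing each time, until the network splits."""
    adj = {n: set(nbrs) for n, nbrs in adj.items()}   # work on a copy
    n_start = len(components(adj))
    while len(components(adj)) == n_start:
        scores = edge_betweenness(adj)
        if not scores:
            break                                     # no edges left to remove
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Two triangles (a, b, c) and (d, e, f) joined by the bridge c-d.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
         "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
print(edge_betweenness(graph)[("c", "d")])   # 9.0: all 3 x 3 cross pairs use the bridge
print(girvan_newman_first_split(graph))       # the two triangles
```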

In [None]:
# generator of partitions
partition_girvanNewman_all = net.community.girvan_newman(EliteNet) 
# the first partition
# the first partition: a tuple of node sets, turned into a list
partition_girvanNewman_first = list(next(partition_girvanNewman_all))

Plotting this partition with one color per group (the red labels mark the nodes of the minimum edge cut):

In [None]:
fig, ax = plt.subplots(figsize=(10,10))

pos = net.spring_layout(EliteNet)

cmap = plt.get_cmap('Accent', len(partition_girvanNewman_first)) # amount of colors

net.draw_networkx_edges(EliteNet, pos, alpha=0.2) # all edges
net.draw_networkx_labels(EliteNet,pos,
                         labels=labelsEdgesBye,font_size=6,font_color='red'); # labelling


for color, nodes in enumerate(partition_girvanNewman_first): # color index, set of nodes
    
    net.draw_networkx_nodes(EliteNet, pos, nodes, node_size=50,
                           node_color=[cmap.colors[color]])

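A more sophisticated algorithm is the [Louvain method](https://perso.uclouvain.be/vincent.blondel/research/louvain.html), which groups nodes so as to maximize modularity (explained below). Because it visits nodes in random order, its result can change between runs unless a `seed` is given: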

In [None]:
partition_louvain=net.community.louvain_communities(EliteNet)

In [None]:
# plot the Louvain communities, one color per community

fig, ax = plt.subplots(figsize=(10,10))
pos = net.spring_layout(EliteNet,k=3)
cmap = plt.get_cmap('Accent', len(partition_louvain))
net.draw_networkx_edges(EliteNet, pos, alpha=0.2)
for color, nodes in enumerate(partition_louvain):
    
    net.draw_networkx_nodes(EliteNet, pos, nodes, node_size=100,
                           node_color=[cmap.colors[color]])

Girvan–Newman actually produces a whole hierarchy of partitions. On a connected network each step adds exactly one group, so the partition at position `k` of the list has `k+2` groups. Below, the partitions with two more groups than Louvain, one more, and the same number are shown side by side:

In [None]:
partition_girvanNewman_all = list(net.community.girvan_newman(EliteNet))
pd.DataFrame([partition_girvanNewman_all[len(partition_louvain)],
              partition_girvanNewman_all[len(partition_louvain)-1],
              partition_girvanNewman_all[len(partition_louvain)-2]]).T

In [None]:
len(partition_louvain)

In [None]:
# the Girvan-Newman partition with as many groups as the Louvain one
a_partition_girvanNewman=partition_girvanNewman_all[len(partition_louvain)-2]

* Modularity:

Modularity compares the density of links inside the groups with what would be expected if links were placed at random. Positive values (the maximum is 1) suggest a good community structure ([wiki](https://en.wikipedia.org/wiki/Modularity_(networks))): the higher the modularity, the denser the connections within groups and the sparser the connections between them.
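
For an unweighted network with $m$ edges, modularity can be written as $Q = \sum_c \left[\frac{L_c}{m} - \left(\frac{d_c}{2m}\right)^2\right]$, where $L_c$ is the number of edges inside group $c$ and $d_c$ is the total degree of its nodes. A plain-Python sketch on a toy network of two triangles joined by one edge (note that `net.community.modularity` also uses edge weights, when present, by default):

```python
def modularity(edge_list, communities):
    """Q = sum over groups of L_c / m - (d_c / 2m)^2, for an unweighted undirected network."""
    m = len(edge_list)
    group_of = {node: i for i, group in enumerate(communities) for node in group}
    inside = [0] * len(communities)       # L_c: edges with both ends in group c
    degree_sum = [0] * len(communities)   # d_c: total degree of the nodes in group c
    for u, v in edge_list:
        degree_sum[group_of[u]] += 1
        degree_sum[group_of[v]] += 1
        if group_of[u] == group_of[v]:
            inside[group_of[u]] += 1
    return sum(L / m - (d / (2 * m)) ** 2 for L, d in zip(inside, degree_sum))

toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
             ("d", "e"), ("e", "f"), ("d", "f")]
print(modularity(toy_edges, [{"a", "b", "c"}, {"d", "e", "f"}]))  # 2 * (3/7 - 1/4) = 0.357...
print(modularity(toy_edges, [set("abcdef")]))                     # one single group: 0.0
```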

In [None]:
net.community.modularity(EliteNet,a_partition_girvanNewman)

In [None]:
net.community.modularity(EliteNet,partition_louvain)

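## 6. Exporting

The community memberships are stored as node attributes, and the network is saved in GraphML format, which other tools such as Gephi can open.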

In [None]:
# map every node to the index of its Girvan-Newman community
GnNn_attr={node:i for i,community in enumerate(a_partition_girvanNewman) for node in community}
GnNn_attr

In [None]:
# the same for the Louvain communities
Ln_attr={node:i for i,community in enumerate(partition_louvain) for node in community}
Ln_attr
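
The partitions returned by Girvan–Newman and Louvain do not overlap, but when a partition comes from elsewhere it can be worth checking that no node is listed in two groups before storing it as a node attribute. A plain-Python sketch (the helper `partition_to_labels` is hypothetical); applied to `partition_louvain` it should give the same mapping as `Ln_attr`, and `set(labels) == set(EliteNet)` would confirm that every node received a community:

```python
def partition_to_labels(communities):
    """Map each node to the index of its community; refuse nodes listed twice."""
    labels = {}
    for index, group in enumerate(communities):
        for node in group:
            if node in labels:
                raise ValueError(f"node {node!r} appears in more than one community")
            labels[node] = index
    return labels

toy_partition = [{"a", "b", "c"}, {"d", "e"}]   # hypothetical partition
toy_labels = partition_to_labels(toy_partition)
print(toy_labels["a"], toy_labels["e"])  # 0 1
```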

In [None]:
net.set_node_attributes(EliteNet, GnNn_attr,'Girvan_Newman_partition')
net.set_node_attributes(EliteNet, Ln_attr,'Louvain_partition')

In [None]:
net.write_graphml(EliteNet, "EliteNet_py.graphml",encoding='utf-8')