In [1]:
import pandas as pd #module to work with dataframes
import networkx as nx #module to work with networks
import numpy as np
import scipy as scpy
from networkx.algorithms import bipartite
from Functions import *
import matplotlib.pyplot as plt
#%matplotlib inline

# Lesson 2 Notebook

## Graph Methods and NetworkX functions
Once we have information stored in the form of graphs, we want to access that information. There are two different ways to do that: methods of the Graph object itself, and functions from the NetworkX module that we apply to Graphs.

### Graph Methods
The graph object has properties and methods that give data about the whole graph. 
These are graph *methods*, *i.e.* they are called from the graph object itself:

    G.<method_name>(<arguments>)

#### Obtaining nodes and edges in the network

You can get all the nodes in the network using `G.nodes()` and all the edges in the network using `G.edges()`.
They return `NodeView` and `EdgeView` objects, which are iterable, so we can use them in `for` loops:

In [2]:
# Example: Obtain the nodes of the network of The Lord of the Rings
G=load_LotR_network() #load the network
G.nodes()

NodeView(('andu', 'arag', 'arat', 'arwe', 'bage', 'bali', 'bere', 'bilb', 'bill', 'boro', 'bree', 'cele', 'comp', 'dene', 'dtow', 'duri', 'dwar', 'edor', 'elen', 'elro', 'elve', 'ents', 'eome', 'eorl', 'eowy', 'oldf', 'mirk', 'fara', 'frod', 'gala', 'ganda', 'gber', 'gild', 'gimli', 'gloi', 'glorf', 'goll', 'gond', 'gorb', 'grim', 'hald', 'helm', 'hobb', 'hton', 'isen', 'isil', 'lego', 'lori', 'loth', 'mdoo', 'merr', 'mord', 'morg', 'mori', 'nazg', 'nume', 'orth', 'osgi', 'pipp', 'ring', 'rive', 'roha', 'sams', 'saru', 'saur', 'sfax', 'shel', 'shir', 'theod', 'thor', 'thra', 'tiri', 'tomb', 'treeb', 'orcs'))

In [3]:
#The NodeView allows us to iterate over the nodes:
for n in G.nodes():
    print(n)

andu
arag
arat
arwe
bage
bali
bere
bilb
bill
boro
bree
cele
comp
dene
dtow
duri
dwar
edor
elen
elro
elve
ents
eome
eorl
eowy
oldf
mirk
fara
frod
gala
ganda
gber
gild
gimli
gloi
glorf
goll
gond
gorb
grim
hald
helm
hobb
hton
isen
isil
lego
lori
loth
mdoo
merr
mord
morg
mori
nazg
nume
orth
osgi
pipp
ring
rive
roha
sams
saru
saur
sfax
shel
shir
theod
thor
thra
tiri
tomb
treeb
orcs


If you want access to the node attributes, you need to specify `data=True` when calling `.nodes()`:

In [4]:
for n in G.nodes(data=True):
    print(n)

('andu', {'type': 'pla', 'Label': 'Anduin', 'FreqSum': 109, 'subtype': 'pla', 'gender': nan})
('arag', {'type': 'per', 'Label': 'Aragorn', 'FreqSum': 1069, 'subtype': 'men', 'gender': 'male'})
('arat', {'type': 'per', 'Label': 'Arathorn', 'FreqSum': 36, 'subtype': 'men', 'gender': 'male'})
('arwe', {'type': 'per', 'Label': 'Arwen', 'FreqSum': 51, 'subtype': 'elves', 'gender': 'female'})
('bage', {'type': 'pla', 'Label': 'Bag End', 'FreqSum': 77, 'subtype': 'pla', 'gender': nan})
('bali', {'type': 'per', 'Label': 'Balin', 'FreqSum': 30, 'subtype': 'dwarf', 'gender': 'male'})
('bere', {'type': 'per', 'Label': 'Beregond', 'FreqSum': 77, 'subtype': 'men', 'gender': 'male'})
('bilb', {'type': 'per', 'Label': 'Bilbo', 'FreqSum': 385, 'subtype': 'hobbit', 'gender': 'male'})
('bill', {'type': 'per', 'Label': 'Bill', 'FreqSum': 45, 'subtype': 'animal', 'gender': 'male'})
('boro', {'type': 'per', 'Label': 'Boromir', 'FreqSum': 293, 'subtype': 'men', 'gender': 'male'})
('bree', {'type': 'pla', 'L

And if you want to access one particular node by its ID:

In [5]:
G.nodes['tomb']

{'type': 'per',
 'Label': 'Bombadil',
 'FreqSum': 177,
 'subtype': 'ainur',
 'gender': 'male'}
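
The same lookup pattern extends to single attributes and to filtering nodes by attribute. The sketch below uses a made-up toy graph rather than the LotR data; `G_toy` and the attribute names `kind` and `label` are illustrative assumptions, not part of the course files.

```python
import networkx as nx

# Toy graph with invented attributes (NOT the LotR network)
G_toy = nx.Graph()
G_toy.add_node("a", kind="per", label="Alice")
G_toy.add_node("b", kind="pla", label="Bree")
G_toy.add_edge("a", "b")

# One attribute of one node: index the attribute dictionary
label_a = G_toy.nodes["a"]["label"]

# One attribute for every node: data="label" yields (node, value) pairs
labels = dict(G_toy.nodes(data="label"))

# Keep only the nodes whose attribute has a given value
people = [n for n, d in G_toy.nodes(data=True) if d["kind"] == "per"]

print(label_a)
print(labels)
print(people)
```

Passing an attribute name to `data=` is often more convenient than `data=True` when only one attribute is needed.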

<div class="alert alert-block alert-success"><b>Up to you: </b>
<h4> Exercise 7</h4>
Now get the edges of the network G.
</div>

In [6]:
# write and execute your code

In [7]:
# %load ./snippets/ex7.py


We can get the number of nodes and edges in a graph using the `number_of_` methods.

In [8]:
N=G.number_of_nodes()
N

75

In [9]:
L=G.number_of_edges()
L

1444

Some graph methods take an edge or node as argument. These provide the graph properties of the given edge or node. For example, the `.neighbors()` method gives the nodes linked to the given node. For performance reasons, many graph methods return iterators instead of lists. They are convenient to loop over:

In [10]:
# neighbors of node 'thra'
for neighbor in G.neighbors('thra'): #to whom and to what places is Thrain (a dwarf) related?
    print(neighbor)

thor
dwar
duri
mori
ring
orcs
bali
ganda
saur
gimli
andu
elve
gloi
hobb
isil
shir


Note: you can always use the `list` constructor to make a list from an iterator, or the `set` constructor to make a set:

In [11]:
list(G.neighbors('thra'))

['thor',
 'dwar',
 'duri',
 'mori',
 'ring',
 'orcs',
 'bali',
 'ganda',
 'saur',
 'gimli',
 'andu',
 'elve',
 'gloi',
 'hobb',
 'isil',
 'shir']
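
One practical consequence of getting an iterator back is that it can only be consumed once. The following sketch, on an invented four-node graph (`G_toy` is a hypothetical name), shows that behaviour and how `set` makes it easy to find neighbors that two nodes have in common.

```python
import networkx as nx

# Toy undirected graph: a-b, a-c, b-c, b-d
G_toy = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")])

it = G_toy.neighbors("a")
first_pass = list(it)    # consumes the iterator
second_pass = list(it)   # the iterator is now exhausted

# Sets allow intersections: nodes adjacent to both 'a' and 'd'
common = set(G_toy.neighbors("a")) & set(G_toy.neighbors("d"))

print(first_pass, second_pass, common)
```

If the neighbors are needed more than once, store them in a list or set first, or simply call `.neighbors()` again.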

#### Checking for existence of nodes and links

Sometimes you may want to check whether a given node is in a network, or whether two nodes are connected (they have an edge between them). 
To **check if a node is present** in a graph, you can use the `has_node()` method:

In [12]:
G.has_node('frod')

True

In [13]:
G.has_node('spiderman')

False

Likewise, we can **check if two nodes are connected** by an edge using the `has_edge()` method:

In [14]:
G.has_edge('frod', 'sams') #these two characters are connected!

True

In [15]:
G.has_edge('goll', 'sfax') #Gollum and Gandalf's horse are never mentioned together in the books!!

False

In [16]:
# you can also check for existence with "in": this is a way to see if a given element is inside a group of elements 
('frod', 'sams') in G.edges

True
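
As a compact recap of these checks, here is a sketch on a small invented undirected graph (`G_toy` is a hypothetical name). It also shows that, in an undirected graph, the order of the two nodes does not matter.

```python
import networkx as nx

# Toy undirected graph: a-b, b-c
G_toy = nx.Graph([("a", "b"), ("b", "c")])

# Node checks: "in" on the graph itself is equivalent to has_node()
node_checks = ("a" in G_toy, G_toy.has_node("z"))

# Edge checks: both orders give the same answer in an undirected graph
edge_checks = (
    G_toy.has_edge("a", "b"),
    G_toy.has_edge("b", "a"),
    ("a", "c") in G_toy.edges,
)

print(node_checks)
print(edge_checks)
```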

> Note: In **directed networks** the order of the tuple matters! 
> Instead of the symmetric relationship "neighbors", nodes in directed graphs have `.predecessors()` (**"in-neighbors"**) and `.successors()` (**"out-neighbors"**).
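
To make the direction concrete, here is a minimal sketch on a made-up food chain. The node names are invented, and the convention that edges point from the species being eaten to the species that eats it is an assumption chosen for illustration.

```python
import networkx as nx

# Toy directed graph; each edge goes source -> target
D = nx.DiGraph([("grass", "rabbit"), ("rabbit", "fox"), ("grass", "deer")])

# The order of the pair matters in a directed graph
forward = D.has_edge("grass", "rabbit")
backward = D.has_edge("rabbit", "grass")

# Out-neighbors: nodes reached by edges leaving 'grass'
out_nbrs = list(D.successors("grass"))

# In-neighbors: nodes with an edge pointing into 'fox'
in_nbrs = list(D.predecessors("fox"))

# For a DiGraph, .neighbors() returns the successors only
nbrs = list(D.neighbors("rabbit"))

print(forward, backward, out_nbrs, in_nbrs, nbrs)
```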

#### Node degree

One of the most important questions we can ask about a node in a graph is how many other nodes it connects to. Using the `.neighbors()` method from above, we could formulate this question as so:

In [17]:
len(list(G.neighbors('frod')))

68

But this is such a common task that NetworkX provides a graph method that does it in a much clearer way:

In [18]:
G.degree('frod')

68
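
Calling `.degree()` with no argument gives the degree of every node at once, which is handy for finding the best-connected node. The sketch below uses an invented toy graph (`G_toy` is a hypothetical name) and also checks a basic identity: in an undirected graph without self-loops, the degrees add up to twice the number of edges, because every edge contributes to two nodes.

```python
import networkx as nx

# Toy undirected graph: a-b, a-c, b-c, b-d
G_toy = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")])

# Degree of every node as a dictionary {node: degree}
degrees = dict(G_toy.degree())

# Node with the highest degree
hub = max(degrees, key=degrees.get)

# Sum of degrees versus number of edges
total_degree = sum(degrees.values())
n_edges = G_toy.number_of_edges()

print(degrees, hub, total_degree, n_edges)
```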

> In **directed networks** we have `.in_degree()` (edges **entering** the node) and `.out_degree()` (edges **exiting** the node). The method `.degree()` in directed networks returns the sum of the in- and out-degrees.
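
A short sketch of the three degree methods on a made-up directed graph follows. The node names are invented, and edges are assumed to point from the eaten species to the eater, so the in-degree counts prey and the out-degree counts predators.

```python
import networkx as nx

# Toy directed graph; each edge goes source -> target
D = nx.DiGraph([
    ("grass", "rabbit"),
    ("rabbit", "fox"),
    ("grass", "deer"),
    ("deer", "fox"),
])

k_in_fox = D.in_degree("fox")        # edges entering 'fox'
k_out_grass = D.out_degree("grass")  # edges leaving 'grass'
k_rabbit = D.degree("rabbit")        # in-degree + out-degree

# In-degree of every node, and the node where it is largest
in_deg = dict(D.in_degree())
top_in = max(in_deg, key=in_deg.get)

print(k_in_fox, k_out_grass, k_rabbit, in_deg, top_in)
```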

<div class="alert alert-block alert-success"><b>Up to you: </b>
<h3> Exercise 8</h3>
Load the foodweb of the St Marks Estuary and answer these questions:
    
- How many species are in the network?
- Which species has the most predators? (out-degree)
- Which species has the most varied diet? (in-degree)
- Which species feed on the most generalist predator? 
    
![title](./images/figure5.png)
</div>

In [None]:
#write your code

In [19]:
# %load ./snippets/ex8.py
# Start by loading the network as we did before
filename="./data/WoL_StMarks/st_marks_Ilist.csv"
Ilist=pd.read_csv(filename, header=None, index_col=None)
Ilist.columns=["source","target","w"]
FW=nx.from_pandas_edgelist(Ilist, edge_attr="w", create_using=nx.DiGraph)

#1 ) - How many species are in the network?
#check and print the number of nodes
S=FW.number_of_nodes()
print("\n1 - The number of species is %s\n" % S)

#2) What is the species that has more predators? (out-degree)
# you can see it as we have seen by writing:
print("Let's see the out_degree of each species")
for sp in FW.nodes():
    print(sp)
    print(FW.out_degree(sp))
    
#However, it is much better to store all the out-degrees in a Series, as we can then work with it
K_out=pd.Series(dict(FW.out_degree()))
K_out=K_out.sort_values(ascending=False) #sort_values returns a new Series, so we keep the result
#Let's see what the Series looks like
print("Let's see it stored as a Series")
print(K_out)
print("\n2 - The species with the most predators is %s\n" % (K_out.idxmax()) )

#3) - What is the species that has a more varied diet? (in-degree)
K_in=pd.Series(dict(FW.in_degree()))
print("\n3 - The species with a more varied diet is %s\n" % (K_in.idxmax()))

#4)  - What are the species that prey on the most generalist predator? 
predator=K_in.idxmax()
print("\n4 - The species feeding on the generalist predator are:")
print(list(FW.successors(predator)))

<div class="alert alert-block alert-success"><b>Up to you: </b>
<h3> Exercise 8b</h3>
Load the web of crime: a bipartite network of associations among suspects, victims, and/or witnesses (in red) involved in crimes (in blue) in St. Louis in the 1990s, and answer the following questions:
    
- 1. How many crimes was the most-involved person involved in?
- 2. What role did he/she play in the crimes?
- 3. Which crime has the most people involved? 
- 4. Find with whom he/she has shared the most crimes

    
![title](./images/figure8b.png)
</div>

In [None]:
#load the crime network
G = load_crime_network()
#continue with your code here below

In [None]:
# %load ./snippets/ex8b.py
#start by determining the two types of nodes
person_nodes = {n for n, d in G.nodes(data=True) if d["bipartite"] == 'person'}
crime_nodes = set(G) - person_nodes

#get the degree of people and of crimes
K_person=pd.Series(dict(G.degree(person_nodes)))
K_crime=pd.Series(dict(G.degree(crime_nodes)))

#Get the person linked to most crimes
most_dangerous_person=K_person.idxmax()
number_of_crimes=G.degree(most_dangerous_person)

print("\n1- the most dangerous person is %s, involved in %s crimes" % (most_dangerous_person,number_of_crimes))

#look at the role: go over the EDGES linking the person to each crime and, for each one, retrieve the value of the attribute "role". We then store them in a list called Roles
Roles=[]
for p, c, r in G.edges(most_dangerous_person, data=True): #each edge is (person, crime, attributes)
    print(p, c, r["role"])
    Role=r["role"]
    Roles.append(Role)

#transform the list of roles into a dictionary containing how many times each one appears
from collections import Counter
counts = dict(Counter(Roles))
print(counts)

print("\n2- The person appears:")
for key in counts:
    print("%s times as %s" % (counts[key],key))
    
#look for the crime with more people involved (highest degree)
most_trending_crime=K_crime.idxmax()
number_of_people_involved=G.degree(most_trending_crime)

print("\n3- The crime with the most people involved is %s, with %s people involved" % (most_trending_crime,number_of_people_involved))


# Which persons share the most crimes?
# do the unipartite projection on people: weights give the number of shared crimes!
people_projection = nx.bipartite.weighted_projected_graph(G, person_nodes)
#pass it to an edgelist, so we can order by weight!
shared_df=nx.to_pandas_edgelist(people_projection,source='person1',target='person2')
print("\n4- The persons with more shared crimes are:")
print(shared_df.sort_values(by="weight",ascending=False).head(5))

#To draw the network
#pos = nx.spring_layout(G) #assign positions to the nodes with a force-directed layout
#color_dict={"person":"red","crime":"blue"} #dictionary of node colors: each bipartite set is associated with a different color
#colors = [color_dict[node[1]['bipartite']] for node in G.nodes(data=True)]
#nx.draw(G, pos, node_color=colors, with_labels=False,node_size=20)
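
Question 4 asks about the partners of one specific person, which can be read directly from that person's edges in the weighted projection. The sketch below illustrates the idea on a tiny invented bipartite graph; the node names `p1`, `c1`, etc. and the variable `target` are hypothetical and unrelated to the crime data.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy bipartite graph: three people, three crimes (all names invented)
B = nx.Graph()
people = ["p1", "p2", "p3"]
B.add_nodes_from(people, bipartite="person")
B.add_nodes_from(["c1", "c2", "c3"], bipartite="crime")
B.add_edges_from([
    ("p1", "c1"), ("p1", "c2"), ("p1", "c3"),
    ("p2", "c1"), ("p2", "c2"),
    ("p3", "c3"),
])

# Projection onto people: edge weight = number of shared crimes
proj = bipartite.weighted_projected_graph(B, people)

# Shared crimes between the target person and each of their partners
target = "p1"
shared = {nbr: data["weight"] for nbr, data in proj[target].items()}
best_partner = max(shared, key=shared.get)

print(shared, best_partner)
```

With the real data, `target` would be the person found in question 1 and `proj` the projection computed above.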

### NetworkX functions

While several of the most-used NetworkX functions are provided as methods, as we just saw, many more of them are module functions and are called like this:

    nx.<function_name>(G, <arguments>)

that is, with the graph provided as the first, and maybe only, argument. Here are a couple of examples of NetworkX module functions that provide information about a graph:

In [None]:
# To see if a Graph is connected:
nx.is_connected(G)

In [None]:
# Also the function to plot a Graph
nx.draw(G, with_labels=True)

In [None]:
#or to know if a network is bipartite
nx.is_bipartite(G)
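
To see what these functions return in cases where the answer is known in advance, here is a sketch on small graphs built with NetworkX generators; the variable names are illustrative. Note that `nx.is_connected` is defined for undirected graphs only; for a `DiGraph`, `nx.is_weakly_connected` is the closest counterpart.

```python
import networkx as nx

path = nx.path_graph(4)                    # chain 0-1-2-3
triangle = nx.cycle_graph(3)               # cycle 0-1-2-0
two_parts = nx.Graph([(0, 1), (2, 3)])     # two separate edges

# Connectedness: is there a path between every pair of nodes?
connected = (nx.is_connected(path), nx.is_connected(two_parts))

# Bipartiteness: a chain is bipartite, an odd cycle is not
bip = (nx.is_bipartite(path), nx.is_bipartite(triangle))

# Density: fraction of possible edges that are present
dens = nx.density(path)

print(connected, bip, dens)
```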