# Working with networks in Python using graph-tool

There are several network packages for Python:

* NetworkX: https://networkx.github.io/
* igraph: https://igraph.org/
* graph-tool: https://graph-tool.skewed.de/

Why use `graph-tool`? See a performance comparison here: https://graph-tool.skewed.de/performance

NetworkX is implemented in pure Python. In contrast, `graph-tool` follows the NumPy philosophy and implements its core data structures and algorithms in a lower-level language, C++ with [templates](https://en.wikipedia.org/wiki/Template_(C%2B%2B)). Depending on the algorithm, this can mean a performance improvement of up to 200x.

**But there is no free lunch!** Implementing a library in C++ means that it needs a C++ toolchain and environment to be compiled and installed on each architecture. Compiled C++ is not as portable as (pure) Python, so a binary compiled for GNU/Linux does not work on macOS, etc.

Instructions to install graph-tool in various systems are available here: https://git.skewed.de/count0/graph-tool/wikis/installation-instructions

Users of macOS will probably prefer to install it via [Homebrew](https://brew.sh). Users of GNU/Linux (Debian, Ubuntu, Arch, etc.) will have the easiest time installing it. Windows users will have the worst time.

``graph-tool`` has **lots** of documentation that you should definitely read: https://graph-tool.skewed.de/static/doc

#### Exercise 1 (or Homework)

Install `graph-tool` on your machine.

Run the following cell if you are using https://colab.research.google.com/ (ignore it otherwise)

In [None]:
!echo "deb http://downloads.skewed.de/apt bionic main" >> /etc/apt/sources.list
!apt-key adv --keyserver keys.openpgp.org --recv-key 612DEFB798507F25
!apt-get update
!apt-get install python3-graph-tool python3-cairo python3-matplotlib

### Creating a Graph

In [None]:
from graph_tool.all import *

g = Graph()            # create an empty graph with no vertices and no edges. 
print(g)

# By default, graphs are directed. For an undirected graph, pass the option directed=False

g = Graph(directed=False)
print(g)

### Vertices
Vertices (nodes) are added with the ``Graph.add_vertex()`` method.

In [None]:
v = g.add_vertex()    # add a single vertex and return the vertex object
print(list(g.vertices()))

In [None]:
vs = g.add_vertex(10)    # you can add many vertices at once, and an iterator to the vertices added is returned
print(list(vs))
print(g.num_vertices())

Vertices are always indexed from `0` to `N-1` where `N` is the total number of vertices.

We can always obtain the vertex object directly from the index:

In [None]:
v = g.vertex(5)
v

### Edges

We can add edges using ``Graph.add_edge()``.

In [None]:
v1 = g.vertex(0)
v2 = g.vertex(1)

e = g.add_edge(v1, v2)

print(repr(e))

# we can also use the vertex index directly:

e2 = g.add_edge(0, 2)

print(repr(e2))

print(g.num_edges())

We can add many edges at once using `Graph.add_edge_list()`:

In [None]:
g = Graph(directed=False)
g.add_edge_list([(0, 1), (2, 3), (1, 0), (3, 4)])  # non-existing vertices are automatically added!

print(g)
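
As a rough, library-free illustration of what the edge list above implies (plain Python, not graph-tool's implementation): vertices are indexed contiguously, so the largest index determines the vertex count, and because graph-tool graphs are multigraphs, `(0, 1)` and `(1, 0)` are kept as two parallel edges in an undirected graph. The names `edge_list`, `num_vertices`, `multiplicity` and `parallel` are illustrative only.

```python
from collections import Counter

edge_list = [(0, 1), (2, 3), (1, 0), (3, 4)]

# Vertices are indexed 0..N-1, so the largest index fixes N.
num_vertices = max(max(u, v) for u, v in edge_list) + 1

# In an undirected graph, (u, v) and (v, u) refer to the same vertex pair.
multiplicity = Counter(frozenset(e) for e in edge_list)
parallel = {tuple(sorted(pair)): m for pair, m in multiplicity.items() if m > 1}

print(num_vertices)    # 5
print(len(edge_list))  # 4 edges in total, parallel ones included
print(parallel)        # {(0, 1): 2}
```

Parallel edges matter later on: they count towards the degree, and some analyses require removing them first.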

In [None]:
# We can look up an edge with the Graph.edge() method; it returns None if the edge does not exist:

e = g.edge(2, 3)
print(e)

e = g.edge(2, 4)
print(e)


# We can query the source and target of an edge:

e = g.edge(2, 3)

print(e.source(), e.target())

# We can also convert an edge to a tuple:

u, v = e

print(u, v)

### Iterating over vertices and edges

In [None]:
for v in g.vertices():
    print(v)
    
for e in g.edges():
    print(e)

for u, v in g.edges():
    print(u, v)
    
for v in g.vertices():
    print(f"Edges incident on {v}: ", end="")
    for e in v.out_edges():
        print(e, end=" ")
    print()
    
for v in g.vertices():
    print(f"The degree of node {v} is {v.out_degree()} and its neighbors are: ", end="")
    for u in v.out_neighbors():
        print(u, end=" ")
    print()

### Directed graphs

In [None]:
g = Graph()
g.add_edge_list([(0, 1), (2, 3), (1, 0), (3, 4)])

print(g)

for v in g.vertices():
    print(f"Outgoing edges from {v}: ", end="")
    for e in v.out_edges():
        print(e, end=" ")
    print()
    
    print(f"Incoming edges to {v}: ", end="")
    for e in v.in_edges():
        print(e, end=" ")        
    print()
    
for v in g.vertices():
    print(f"The out-degree of node {v} is {v.out_degree()} and its out-neighbors are: ", end="")
    for u in v.out_neighbors():
        print(u, end=" ")
    print()

    print(f"The in-degree of node {v} is {v.in_degree()} and its in-neighbors are: ", end="")
    for u in v.in_neighbors():
        print(u, end=" ")
    print()
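
To make the in/out distinction concrete, here is a small plain-Python sketch (not graph-tool code) that recomputes the degrees for the same edge list by hand. Each pair is read as `(source, target)`; the variable names are illustrative.

```python
edge_list = [(0, 1), (2, 3), (1, 0), (3, 4)]  # (source, target) pairs
n = 5                                          # vertices 0..4

out_degree = [0] * n
in_degree = [0] * n
for source, target in edge_list:
    out_degree[source] += 1  # the edge leaves `source`
    in_degree[target] += 1   # the edge enters `target`

print(out_degree)  # [1, 1, 1, 1, 0]
print(in_degree)   # [1, 1, 0, 1, 1]
```

Both lists sum to the number of edges, since every directed edge contributes exactly one out-degree and one in-degree.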



### Drawing Graphs

``graph-tool`` has sophisticated routines for drawing graphs.

In [None]:
graph_draw(g, output_size=(200, 200));

### Exercise 1

Create and draw the following graph:

<img src="https://upload.wikimedia.org/wikipedia/commons/9/96/K%C3%B6nigsberg_graph.svg"/>


### Exercise 2

Using the graph created above:

* Count the number of edges
* Count the number of nodes
* Calculate the average degree per node
* Calculate the maximum and minimum number of neighbors
* Calculate the number of nodes with degree = 3

## Property maps

In ``graph-tool`` we can attach arbitrary properties to nodes and edges using property maps.

Property maps can be of the following types:

    
| Type name               | Alias                                   |
|-------------------------|-----------------------------------------|
| ``bool``                | ``uint8_t``                             |
| ``int16_t``             | ``short``                               |
| ``int32_t``             | ``int``                                 |
| ``int64_t``             | ``long``, ``long long``                 |
| ``double``              | ``float``                               |
| ``long double``         |                                         |
| ``string``              |                                         |
| ``vector<bool>``        | ``vector<uint8_t>``                     |
| ``vector<int16_t>``     | ``vector<short>``                       |
| ``vector<int32_t>``     | ``vector<int>``                         |
| ``vector<int64_t>``     | ``vector<long>``, ``vector<long long>`` |
| ``vector<double>``      | ``vector<float>``                       |
| ``vector<long double>`` |                                         |
| ``vector<string>``      |                                         |
| ``python::object``      | ``object``                              |



In [None]:
vsize = g.new_vp("int")   # new vertex property map of type int
eweight = g.new_ep("double") # new edge property map of type double

for v in g.vertices():
    vsize[v] = 10
    
for e in g.edges():
    eweight[e] = 3.2
    
# We can also access the values of property maps as numpy arrays:

print(vsize.a)

vsize.a = [3, 10, 5, 1, 15]
vsize.a *= 4
eweight.a = [3.2] * 4
    
# property maps can be used with many functions, e.g. graph_draw()

graph_draw(g, vertex_size=vsize, vertex_fill_color=vsize, edge_pen_width=eweight, output_size=(300, 300));
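
Conceptually, a scalar property map behaves like a typed array indexed by the vertex (or edge) index, which is why the `.a` attribute can expose it as a NumPy array. The plain-Python sketch below mimics that idea with the standard-library `array` module. The type code `'i'` merely stands in for a value type such as `int32_t`, and nothing here is graph-tool API.

```python
from array import array

num_vertices = 5

# 'i' is a signed C int: one value per vertex index.
vsize = array("i", [10] * num_vertices)

# Whole-map update, analogous in spirit to `vsize.a *= 4`.
for i in range(len(vsize)):
    vsize[i] *= 4

print(list(vsize))  # [40, 40, 40, 40, 40]

# The typed array refuses values that do not fit its value type.
try:
    vsize[0] = "large"
except TypeError as err:
    print("rejected:", err)
```

The fixed value type is what allows compact storage and fast bulk operations, at the cost of having to choose the type up front.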


In [None]:
# shortest paths

g = lattice([25, 25])

vertices, edges = shortest_path(g, g.vertex(0), g.vertex(g.num_vertices() - 1))

ecolor = g.new_ep("string", val="black")
vcolor = g.new_vp("string", val="black")

for v in vertices:
    vcolor[v] = "red"
for e in edges:
    ecolor[e] = "red"

pos = sfdp_layout(g, multilevel=True)

graph_draw(g, pos=pos, vertex_fill_color=vcolor, edge_color=ecolor);


In [None]:
# now we use random weights

import numpy.random

eweights = g.new_ep("double")
eweights.a = numpy.random.random(len(eweights.a))
print(eweights.a)

vertices, edges = shortest_path(g, g.vertex(0), g.vertex(g.num_vertices() - 1), weights=eweights)

ecolor = g.new_ep("string", val="black")
vcolor = g.new_vp("string", val="black")

for v in vertices:
    vcolor[v] = "red"
for e in edges:
    ecolor[e] = "red"

ewidth = eweights.copy()
ewidth.a = 1-ewidth.a 
    
graph_draw(g, pos=pos, vertex_fill_color=vcolor, edge_color=ecolor, edge_pen_width=ewidth);
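
With weights, the shortest path is the one that minimizes the sum of edge weights rather than the number of hops. For intuition, here is a minimal plain-Python sketch of Dijkstra's algorithm, the standard method for non-negative weights. It is not graph-tool's implementation; the function `dijkstra_path` and the toy graph are hypothetical.

```python
import heapq

def dijkstra_path(adj, source, target):
    """Return (total_weight, path) of a minimum-weight path, or (inf, []) if unreachable."""
    dist = {source: 0.0}   # best known distance to each reached vertex
    pred = {}              # predecessor on the best known path
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue       # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                pred[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(pred[path[-1]])
    return dist[target], path[::-1]

# Toy undirected graph given as (u, v, weight) triples.
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 5.0), (2, 3, 8.0)]
adj = {}
for u, v, w in edges:
    adj.setdefault(u, []).append((v, w))
    adj.setdefault(v, []).append((u, w))

print(dijkstra_path(adj, 0, 3))  # (8.0, [0, 2, 1, 3])
```

Note that the minimum-weight path here uses three edges, although two-edge paths from 0 to 3 exist; this is the same effect that makes the weighted path in the lattice differ from the unweighted one.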

### Exercise 3

Consider the weighted undirected graph corresponding to the following list of edges:

    (a, b) weight = 0.6
    (a, c) weight = 0.2
    (c, d) weight = 0.1
    (c, e) weight = 0.7
    (c, f) weight = 0.9
    (a, d) weight = 0.3

* Create a graph with the edges above and two property maps, `vlabel` and `eweight`, with the vertex labels and edge weights, respectively.
* Draw the graph, using the edge weight as the edge width.
* Compute the shortest path from 'b' to 'e' and draw it.
* Change the edge weights so that the shortest path goes through 'd'.



### Random graphs
Random graphs are useful for benchmarking and as testbeds.
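
As one concrete example of a random graph model, the following plain-Python sketch generates an Erdős-Rényi G(n, p) graph, in which every pair of vertices is connected independently with probability p, so the expected degree is p(n - 1). The function name `gnp_edges` and its parameters are hypothetical, this is not how graph-tool's `random_graph()` works internally, and the quadratic loop is only suitable for small n.

```python
import random
from itertools import combinations

def gnp_edges(n, p, seed=None):
    """Return the edge list of a G(n, p) random graph on vertices 0..n-1."""
    rng = random.Random(seed)
    # Visit each unordered pair once and keep it with probability p.
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

n, p = 200, 0.025
edges = gnp_edges(n, p, seed=42)

degree = [0] * n
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

mean_degree = sum(degree) / n
print(f"expected mean degree: {p * (n - 1):.2f}, observed: {mean_degree:.2f}")
```

For large n and small p the degrees are approximately Poisson distributed.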

### Exercise 4
Plot the degree distribution (histogram) of the Erdős-Rényi graph generated as:
```python
g = random_graph(100000, lambda: numpy.random.poisson(5), directed=False)

```

## Graph IO and internal property maps

In [None]:
g = random_graph(10000, lambda: (3,3))
g.save("g.gt") # The gt file format is a binary format that is very efficient

u = load_graph("g.gt")
print(similarity(u, g))

# Other file formats are also supported
g.save("g.xml")  # GraphML file format
g.save("g.dot")  # Dot file format

# Compression can be achieved by appending ".gz", ".bz2" or ".xz" to the file names:

g.save("g.gt.gz")
g.save("g.xml.bz2")
g.save("g.dot.xz")

u = load_graph("g.xml.bz2")
print(similarity(u, g))

In [None]:
# If we want to store property maps with our graph, we need to make them internal

eweight = g.new_ep("double", vals=numpy.random.random(g.num_edges()))
vcolor = g.new_vp("int", vals=numpy.random.randint(0, 10, g.num_vertices()))

g.ep["eweight"] = eweight
g.vp["vcolor"] = vcolor

g.list_properties()


g.save("g.gt")

u = load_graph("g.gt")
u.list_properties()

eweight = g.ep["eweight"]
vcolor = g.vp["vcolor"]

print(similarity(g, u, g.ep["eweight"], u.ep["eweight"]))


In [None]:
# Shortcuts for property maps. The following two statements are equivalent:

print(g.ep["eweight"].a)
print(g.ep.eweight.a)



## Graph filtering and graph views

In [None]:
g = collection.data["polblogs"]
graph_draw(g, pos=g.vp.pos);

In [None]:
c = label_largest_component(g, directed=False)
u = GraphView(g, vfilt=c)
graph_draw(u, u.vp.pos);

In [None]:
print(u)

In [None]:
import matplotlib.cm

vb, eb = betweenness(u)
graph_draw(u, pos=u.vp.pos, vertex_color=vb, vertex_fill_color=vb, vertex_size=prop_to_size(vb, 5, 20),
           vorder=vb, vcmap=matplotlib.cm.plasma);

In [None]:
# Graph views can be composed.

u.list_properties()

In [None]:
# let's plot only the networks of right wing blogs
w = GraphView(u, vfilt=lambda v: u.vp.value[v] == 1)   # the supplied lambda function evaluates to True 
                                                       # for vertices that are *kept* in the graph
    
graph_draw(w, w.vp.pos);

In [None]:
# Let's look only at the connections between left and right wing
w = GraphView(u, efilt=lambda e: u.vp.value[e.source()] != u.vp.value[e.target()])
graph_draw(w, w.vp.pos);

In [None]:
# We can transform graphs between directed and undirected (and vice-versa) using GraphView as well:

w = GraphView(u, directed=False)
graph_draw(w, w.vp.pos);
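
The key idea behind a graph view is that it filters on access instead of copying the graph. The plain-Python sketch below illustrates this with a vertex mask; the class `FilteredView` and its methods are hypothetical and far simpler than `GraphView`, which also supports edge filters, direction changes and property maps.

```python
class FilteredView:
    """Read-only view that hides the vertices v for which keep[v] is False."""

    def __init__(self, base_edges, keep):
        self.base_edges = base_edges  # shared with the base graph, not copied
        self.keep = keep

    def vertices(self):
        return [v for v, kept in enumerate(self.keep) if kept]

    def edges(self):
        # An edge is visible only if both of its endpoints are kept.
        return [(u, v) for u, v in self.base_edges if self.keep[u] and self.keep[v]]

    def filtered(self, keep):
        # Composing views amounts to AND-ing the masks.
        combined = [a and b for a, b in zip(self.keep, keep)]
        return FilteredView(self.base_edges, combined)

base_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
view = FilteredView(base_edges, keep=[True, True, True, False])
print(view.edges())      # [(0, 1), (1, 2)]

inner = view.filtered([False, True, True, True])
print(inner.vertices())  # [1, 2]
print(inner.edges())     # [(1, 2)]
```

Because only the mask is stored, creating a view is cheap even for a large graph, and composing views never duplicates the edge data.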

### Exercise 5

* Using `GraphView`, write a function that gets the undirected version of a directed graph and filters out vertices of degree 3 or smaller, returning the result.

* Use the function above on the `polblogs` network.

* What happens if you run that function iteratively, multiple times?

# Community detection and statistical inference

Graph-tool has extensive functionality to detect modules (or "communities") of nodes using principled statistical inference approaches. A detailed howto can be found here: https://graph-tool.skewed.de/static/doc/demos/inference/inference.html

In [None]:
g = collection.data["polblogs"]
g = extract_largest_component(g, directed=False, prune=True)

graph_draw(g, pos=g.vp.pos);

In [None]:
# Suppose we want to figure out the groups, without "looking" at the network or using the metadata.
# We can do this by fitting the stochastic block model (SBM):

state = minimize_blockmodel_dl(g, B_min=2, B_max=2)  # we will force B=2 groups for now
# Note: recent graph-tool versions expect multilevel_mcmc_args=dict(B_min=2, B_max=2) instead

b = state.get_blocks()
print(b.a)

state.draw(pos=g.vp.pos);

In [None]:
# What if we don't set the number of groups?

state = minimize_blockmodel_dl(g)
print(state)
state.draw(pos=g.vp.pos);


In [None]:
# We can also fit hierarchical SBMs, which have a stronger explanatory power:

state = minimize_nested_blockmodel_dl(g)
state.draw();

### Exercise 6

* Generate a random graph with 1000 nodes and with a Poisson degree distribution with mean 1.2.
* Draw the graph.
* Fit an SBM (non-nested) on it. How many groups does it find? Do you find this reasonable?

### Exercise 7

* The network `collection.data["dolphins"]` contains the network of friendships between 62 dolphins.
* Use the SBM to investigate its structure: Fit the SBM and visualize the result.

# Dynamics on networks

With ``graph-tool`` it's possible to simulate various dynamical processes on networks, see: https://graph-tool.skewed.de/static/doc/dynamics.html

## Example: Susceptible-Infectious (SI) model for epidemics

This is a simple stochastic model in which, at any given time, a node in the susceptible (S) state can become infectious (I) with probability $\beta$ via any of its neighbors that are in the I state.

In [None]:
g = collection.data["pgp-strong-2009"]
state = SIState(g, beta=0.01)  # This initializes the state with one random infected node
                               # and all others in the S state

# we will keep around the number of infected nodes
X = []
for t in range(1000):
    ret = state.iterate_sync() # all nodes are updated at the same time
    X.append(state.get_state().fa.sum())

import matplotlib.pyplot as plt
plt.plot(X)
plt.xlabel(r"Time")
plt.ylabel(r"Infected nodes");
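
To see what one synchronous update does, here is a minimal plain-Python sketch of the SI model on a toy network. It assumes that each infected neighbor transmits independently with probability `beta`. The function `si_step` and the path graph are hypothetical, and `SIState` may differ in its details.

```python
import random

def si_step(adj, infected, beta, rng):
    """One synchronous update: return the new set of infected nodes."""
    newly = set()
    for u in infected:
        for v in adj[u]:
            # Each infected neighbor gets an independent chance to transmit.
            if v not in infected and rng.random() < beta:
                newly.add(v)
    # The old set is only read above, so all nodes see the same previous state.
    return infected | newly

# Toy network: a path 0 - 1 - 2 - 3 - 4
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

rng = random.Random(1)
infected = {0}
counts = []
for t in range(50):
    infected = si_step(adj, infected, beta=0.2, rng=rng)
    counts.append(len(infected))

print(counts)  # non-decreasing: infected nodes never recover in the SI model
```

Since there is no recovery, the curve can only grow or stay flat, which is also what the plot for the real network shows.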

### Exercise 8

Modify the above code so that the initial infected node is the one with the largest total degree (in + out). Look at the documentation for `SIState` to find out how to do this. What difference do you observe?

### Exercise 9

Visualize the spread of the epidemic by plotting snapshots (3 or 4) of the network at several stages, using the node state as the node color. For all snapshots, use the same node positions, given by the internal property map called `"pos"`.


## Example: Binary opinion dynamics

We can use a simple binary majority threshold model, i.e. every node takes the majority opinion of its neighbors, which can be either `0` or `1`.

In [None]:
g = collection.data["pgp-strong-2009"]
state = BinaryThresholdState(g, r=0.25)  # The parameter 'r' controls the random noise strength,
                                         # i.e. random transitions

X = []
for t in range(1000):
    ret = state.iterate_sync() # all nodes are updated at the same time
    X.append(state.get_state().fa.sum())

import matplotlib.pyplot as plt
plt.plot(X)
plt.xlabel(r"Time")
plt.ylabel(r"Number of nodes with state 1");
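
The update rule can be sketched in plain Python as follows. This is one simple reading of the majority rule with noise, under two assumptions that are not taken from the graph-tool documentation: each neighbor's opinion is read incorrectly with probability `r`, and a strict majority is needed to adopt opinion 1 (ties give 0). The function `majority_step` is hypothetical; consult the `BinaryThresholdState` documentation for the exact rule.

```python
import random

def majority_step(adj, state, r=0.0, rng=random):
    """Synchronous update; each input is flipped with probability r."""
    new_state = []
    for u in range(len(state)):
        # Noisy readings of the neighbors' current opinions.
        inputs = [state[v] ^ (rng.random() < r) for v in adj[u]]
        # Assumption: strict majority is required for opinion 1; ties give 0.
        new_state.append(1 if 2 * sum(inputs) > len(inputs) else 0)
    return new_state

# Toy network: complete graph on 4 nodes
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}

print(majority_step(adj, [1, 1, 1, 0]))  # [1, 1, 1, 1]
print(majority_step(adj, [1, 0, 0, 0]))  # [0, 0, 0, 0]
```

Without noise (`r=0`) this small network reaches consensus in a single step; with noise, nodes occasionally act on wrong readings, which can keep the system from settling.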

### Exercise 10

Simulate the binary opinion dynamics above on the undirected `polblogs` graph obtained as:

```python
# With `from graph_tool.all import *`, as used throughout this notebook, no `gt.` prefix is needed
g = GraphView(collection.data["polblogs"], directed=False)
remove_parallel_edges(g)
g = extract_largest_component(g, prune=True)
```

Run the dynamics a few times from the beginning, and draw the network with the final states as node colors. What kind of behaviors do you observe? Can you explain them?

**(Very) advanced:** Take a look at how it's possible to write extensions for graph-tool using C++: https://graph-tool.skewed.de/static/doc/demos/cppextensions/cppextensions.html