### CS4423 - Networks
Angela Carnevale <br>
School of Mathematical and Statistical Sciences <br>
NUI Galway

#### 1. Graphs and Graph Theory

# Week 3, lecture 2: 

# Bipartite Graphs and Projections. Paths and connection.

In [None]:
import networkx as nx
import numpy as np

In [None]:
opts = { "with_labels": True, "node_color":'y' }

## Bipartite Graphs and colorings

A **(vertex)-coloring** of a graph $G$ is an assignment of (finitely many) colors to the nodes of $G$,
so that any two nodes which are connected by an edge have *different* colors.

* A graph is called **$N$-colorable**, if it has a vertex coloring with (at most) $N$ colors.

* The **chromatic number of a graph $G$** is the smallest $N$ for which $G$ is $N$-colorable.
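
To make these two definitions concrete, here is a minimal brute-force sketch in plain Python (no `networkx` needed). The helper names `is_proper_coloring` and `chromatic_number` are made up for this illustration, and trying every assignment is only feasible for very small graphs.

```python
from itertools import product

def is_proper_coloring(edges, coloring):
    """True if the two ends of every edge receive different colors."""
    return all(coloring[u] != coloring[v] for u, v in edges)

def chromatic_number(nodes, edges):
    """Smallest N such that some assignment of N colors is a proper coloring."""
    nodes = list(nodes)
    if not nodes:
        return 0
    for n in range(1, len(nodes) + 1):
        # try every way of assigning one of n colors to each node
        for assignment in product(range(n), repeat=len(nodes)):
            if is_proper_coloring(edges, dict(zip(nodes, assignment))):
                return n

square = [(0, 1), (1, 2), (2, 3), (3, 0)]      # cycle of length 4
triangle = [(0, 1), (1, 2), (2, 0)]            # cycle of length 3

print(chromatic_number(range(4), square))      # the square is 2-colorable
print(chromatic_number(range(3), triangle))    # the triangle needs 3 colors
```

The square has only even cycles and gets by with two colors, while the triangle, an odd cycle, does not.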

**Theorem.** Let $G$ be a graph.  The following are equivalent:

* $G$ is bipartite;

* $G$ is $2$-colorable;
 
* each cycle in $G$ has even length.

(We will give precise definitions of **cycle** and **length** below.)


2D grids are examples of naturally bipartite graphs:

In [None]:
G44 = nx.grid_2d_graph(4, 4)
nx.draw(G44)

How would we go about finding a $2$-coloring of this graph?

The function `nx.bipartite.color` computes a $2$-coloring of a graph $G$, provided one exists, i.e. provided
$G$ is bipartite.

In [None]:
color = nx.bipartite.color(G44)
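
One possible way to find a $2$-coloring by hand is a breadth-first search that gives each newly reached node the color opposite to that of the node it was reached from. The sketch below is written in plain Python and is not meant to describe how `networkx` implements `bipartite.color`; the function name `two_color` and the dictionary-of-neighbor-lists input format are assumptions made for this example.

```python
from collections import deque

def two_color(adj):
    """Return a dict node -> 0/1 that is a proper 2-coloring,
    or None if two adjacent nodes are forced to share a color."""
    color = {}
    for start in adj:                      # one search per connected piece
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]    # opposite color
                    queue.append(v)
                elif color[v] == color[u]:     # conflict: no 2-coloring exists
                    return None
    return color

# 4 x 4 grid with nodes labelled by coordinate pairs (i, j)
steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
grid = {
    (i, j): [(i + di, j + dj) for di, dj in steps
             if 0 <= i + di < 4 and 0 <= j + dj < 4]
    for i in range(4) for j in range(4)
}
grid_coloring = two_color(grid)

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(two_color(triangle))                 # None: a triangle is not bipartite
```

On the grid, the color of node `(i, j)` computed this way is the parity of `i + j`, which is the familiar chessboard pattern.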

This won't work on a graph that is not $2$-colorable:

In [None]:
# a triangle contains an odd cycle, so this call raises a NetworkXError
nx.bipartite.color(nx.complete_graph(3))

In [None]:
color = [color[x] for x in G44.nodes()]
color

In [None]:
opts2 = { "with_labels": True, "node_color":color, "font_color":'r' }

In [None]:
nx.draw(G44, **opts2)

Note how the nodes are labelled in a $2D$ grid in `networkx`...

In [None]:
G44.nodes()

## Affiliation Networks and Projections

Bipartite graphs arise in practice as models for **affiliation networks**.
In such a network, on one side of the graph we find people or *actors*, and on the other side attributes 
of the people, such as common interests (books bought online, TV shows watched), workplaces, social events attended ...
Edges in such a network connect people with their attributes.



We could construct a bipartite graph on the vertex set consisting of the 40 respondents to the survey and the 10 TV shows by using the following adjacency list:

In [None]:
!cat data/TV.adj

...but a graph with 40+10 vertices would be a bit too big for us, so once more we select the first few responses and construct a graph from that.

In [None]:
!cat data/TV_short.adj

In [None]:
G=nx.read_adjlist("data/TV_short.adj")

In [None]:
nx.draw(G, **opts)

In [None]:
G.nodes()

We can relabel the nodes so that the two sets of nodes are easier to handle later on.

In [None]:
G=nx.relabel_nodes(G,{'01':0,'02':1,'03':2,'04':3,'05':4,'06':5,'07':6,'08':7})

In [None]:
G=nx.relabel_nodes(G,{'BB':10,'GoT':11,'Succ':12,'DG':13,'ST':14,'TW':15,'PB':16,'NP':17,'MrR':18,'SG':19})

In [None]:
G.nodes()

In [None]:
color = nx.bipartite.color(G)
print(color)

In [None]:
color = [color[i] for i in G.nodes()]
opts2["node_color"] = color

In [None]:
nx.draw(G,nx.bipartite_layout(G, range(8), align='vertical'),**opts2)

**Note.** The adjacency matrix $A$ of a bipartite graph $G$, with respect to a suitable ordering of the vertices
($X_1$ first, then $X_2$), has the form of a $2 \times 2$-block matrix,
$$
  A = \left( \begin{array}{cc} 0 & C \\ C^T & 0 \end{array} \right)
$$
where the blocks on the diagonal consist entirely of zeros, as there are no edges between vertices of the same color, and the lower left block is the **transpose** $C^T$ of the matrix $C$ of entries in the upper right. 
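
As a small illustration of this block structure, the following plain-Python sketch assembles $A$ from a given matrix $C$. The function name `block_adjacency` and the tiny matrix `C_small` (3 nodes in $X_1$, 2 nodes in $X_2$) are invented for this example and are not taken from the TV data.

```python
def block_adjacency(C):
    """Assemble A = [[0, C], [C^T, 0]] from a matrix C given as a list of rows."""
    m, n = len(C), len(C[0])
    C_T = [[C[i][j] for i in range(m)] for j in range(n)]   # transpose of C
    top = [[0] * m + C[i] for i in range(m)]                # rows ( 0 | C )
    bottom = [C_T[j] + [0] * n for j in range(n)]           # rows ( C^T | 0 )
    return top + bottom

C_small = [[1, 0],
           [1, 1],
           [0, 1]]
A_small = block_adjacency(C_small)

for row in A_small:
    print(row)
```

The resulting $5 \times 5$ matrix is symmetric, with a $3 \times 3$ and a $2 \times 2$ block of zeros on the diagonal.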

To see the adjacency matrix in block form we need to give the nodes in a suitable order...

In [None]:
H=nx.Graph()
H.add_nodes_from(range(8))
H.add_nodes_from(range(10,20))

... and then import the edges from $G$.

In [None]:
H.add_edges_from(G.edges())

In [None]:
AA=nx.adjacency_matrix(H)

In [None]:
with np.printoptions(threshold=9999):
    print(AA.toarray())

* In `NetworkX`, all parts of a graph can have **attributes**: the nodes, 
the edges, and the graph object itself.  Graph object attributes of a graph `G` are stored in the field `G.graph`.  By convention, the two
underlying sets of a bipartite graph are stored here as attributes
called `'top'` and `'bottom'`.

* Here, we will simply construct lists of vertices from each of the two sets $X$ and $Y$ and construct a *biadjacency matrix* (this is all that is needed to (re)construct a bipartite graph!).

In [None]:
X, Y = [i for i in range(8)], [i for i in range(10,20)]
C = nx.bipartite.biadjacency_matrix(H, X, Y)
print(C.toarray())

As $A = A^T$, we get
$$
A^T \cdot A = A \cdot A^T = A \cdot A = 
\left(
\begin{array}{cc}
C \cdot C^T & 0 \\ 0 & C^T \cdot C
\end{array}
\right)
$$
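where

* $C \cdot C^T$ is the (weighted) adjacency matrix of the **projection** onto the vertex set $X$: its off-diagonal entry $(i, j)$ counts the common neighbors of the $i$-th and $j$-th node in $X$, and its diagonal entry $(i, i)$ is the degree of the $i$-th node in $X$;

* $C^T \cdot C$ is, in the same sense, the adjacency matrix of the **projection** onto the vertex set $Y$.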
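
A worked toy example may help here. The sketch below uses plain Python lists and an invented biadjacency matrix `C_small` for three people and two shows (not the TV data); the helpers `transpose` and `matmul` are written out only for this illustration.

```python
def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    """Matrix product P . Q for matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]

# rows: people x0, x1, x2; columns: shows y0, y1
C_small = [[1, 0],
           [1, 1],
           [0, 1]]

CCt = matmul(C_small, transpose(C_small))   # 3 x 3, indexed by people
CtC = matmul(transpose(C_small), C_small)   # 2 x 2, indexed by shows

print(CCt)   # [[1, 1, 0], [1, 2, 1], [0, 1, 1]]
print(CtC)   # [[2, 1], [1, 2]]
```

Off the diagonal, the entries count shared neighbors: `x0` and `x1` have one show in common, while `x0` and `x2` have none. The diagonal entries are the degrees of the nodes in the original bipartite graph.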

In [None]:
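# `@` is the matrix product; from_numpy_array replaces from_numpy_matrix (removed in NetworkX 3.0)
BB = nx.from_numpy_array((C @ C.transpose()).toarray())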
nx.draw(BB, **opts)

In [None]:
XX = nx.projected_graph(H, X)
nx.draw(XX, **opts)

In [None]:
YY = nx.projected_graph(H,Y)
nx.draw(YY, **opts)

In [None]:
print((C @ C.transpose()).toarray())

...where did we see this matrix already?

#### 2. Tree and Graph Traversal

## Paths and connection

Sequences of interconnected edges in a graph are called **paths**,
leading to notions of **connectivity** and **distance**.
A **tree** is a particularly useful kind of connected graph,
that is frequently used as a data structure in Computer Science.

We will study **random trees** and some classical algorithms for tree traversal.
A network has the structure of a tree when it is of a hierarchical nature,
like an [ancestry chart](https://en.wikipedia.org/wiki/Pedigree_chart)
or a **river network**.

![A River Network](images/rivers.jpg)

[Image from [EPA Maps](https://gis.epa.ie/EPAMaps/)]

## Paths

The fundamental notion of **connectivity** in a network is closely
related to the notion of **paths** in a graph.

**Definitions.** 
    
* A **path** in a graph $G = (X, E)$ is a sequence of nodes, where any pair of consecutive nodes in the sequence is (linked by) an edge in $E$.

* Such a path can have repeated nodes. If it doesn't, the path is called a **simple path**.

* The **length** of a path is the number of edges it involves (that is, the number of nodes in the sequence minus $1$).


**Definitions.** 
    
* At each vertex $x \in X$, there is a unique path of length $0$, the **empty path**, consisting of vertex $x$ only.

* A **cycle** is a path of length at least $3$ that is a simple path, except for the first and the last node being the same.

In [None]:
nodes = 'ABCDEFGHIJKLM'
edges = [
    'AB', 'CE', 'FG', 'FH', 'GI', 'GJ', 'HJ', 'HL', 'HM', 
    'IK', 'JK', 'KL', 'LM'
]
GG = nx.Graph()
GG.add_nodes_from(nodes)
GG.add_edges_from(edges)

In [None]:
opts = { "with_labels": True, "node_color": 'y'}
nx.draw(GG, **opts)

* $(F, G, I)$ is a path in the graph above, and $(H, J, K, L, H)$ is a cycle.

* A cycle in a simple graph provides, for any two nodes on that
cycle, (at least) two different paths from one to the other.

* Note that each edge (and node) of the 1970 Internet graph belongs to
a cycle.  This makes the other way around the cycle an alternative
route in case one of the edges should fail.

* (In a *directed* network, paths are directed, too.
A path from a vertex $x$ to a vertex $y$ is
a sequence of vertices $x = x_0, x_1, \dots, x_k = y$
such that, for any $i = 1, \dots, k$, there is
an edge from $x_{i-1}$ to $x_i$ in the graph.)
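
The definitions of path, simple path, length and cycle can be checked mechanically on the example graph. The following plain-Python sketch reuses the edge list from the cell above; the function names `is_path`, `is_simple_path`, `path_length` and `is_cycle` are made up for this illustration, and a node sequence is written as a string such as `'FGI'`.

```python
# Edge list of the example graph, copied from the cell above.
EDGES = ['AB', 'CE', 'FG', 'FH', 'GI', 'GJ', 'HJ', 'HL', 'HM',
         'IK', 'JK', 'KL', 'LM']
EDGE_SET = {frozenset(e) for e in EDGES}   # undirected: 'AB' and 'BA' agree

def is_path(seq):
    """Every pair of consecutive nodes in seq is an edge."""
    return len(seq) >= 1 and all(
        frozenset(pair) in EDGE_SET for pair in zip(seq, seq[1:]))

def path_length(seq):
    """Number of edges used, i.e. number of nodes minus 1."""
    return len(seq) - 1

def is_simple_path(seq):
    """A path without repeated nodes."""
    return is_path(seq) and len(set(seq)) == len(seq)

def is_cycle(seq):
    """A path of length >= 3, simple except that first and last node agree."""
    return (is_path(seq) and path_length(seq) >= 3
            and seq[0] == seq[-1]
            and len(set(seq[:-1])) == len(seq) - 1)

print(is_path('FGI'), path_length('FGI'))   # True 2
print(is_cycle('HJKLH'))                    # True
print(is_cycle('FGF'))                      # False: length 2 only
```

Note that `'FGF'` is a path (it walks along the edge $FG$ and back) but neither a simple path nor a cycle.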



## Connected Components

Communication and transportation networks tend to be connected, as
this is their main purpose: to connect.

**Definition.**
    
* A simple graph is **connected** if, for every pair of nodes, there is a path between them.
        
* If a graph is not connected, it naturally breaks into pieces, its **connected components**.
 

* The connected components of the graph above are the
node sets $\{A, B\}$, $\{C, E\}$, $\{D\}$, and $\{F,G,H,I,J,K,L,M\}$.
* Note that a component can consist of a single node only.

In [None]:
list(nx.connected_components(GG))
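
For comparison, here is one way the components could be computed by hand, using a breadth-first search from each node that has not been reached yet. This is a plain-Python sketch under the assumption that nodes are single characters and edges are two-character strings, as in the example graph; the name `components_by_bfs` is invented here and the code is not meant to mirror the `networkx` implementation.

```python
from collections import deque

def components_by_bfs(nodes, edges):
    """Return the connected components as a list of sets of nodes."""
    adj = {x: set() for x in nodes}
    for u, v in edges:                 # each edge is a pair of node labels
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:                   # explore everything reachable from start
            u = queue.popleft()
            for v in adj[u]:
                if v not in component:
                    component.add(v)
                    queue.append(v)
        seen |= component
        components.append(component)
    return components

EDGES = ['AB', 'CE', 'FG', 'FH', 'GI', 'GJ', 'HJ', 'HL', 'HM',
         'IK', 'JK', 'KL', 'LM']
components = components_by_bfs('ABCDEFGHIJKLM', EDGES)
print(components)
```

The isolated node $D$ shows up as a component of its own, since the search started at $D$ finds no neighbors.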

**Note.** 
The relation 'there is a **path** from $x$ to $y$' on the node set $X$ of a
graph is the reflexive and **transitive closure** of the graph relation 'there is an
**edge** between $x$ and $y$'.  It is 

* **reflexive** (as each node $x$ is
connected to itself by the zero length path starting and ending at
$x$), 

* **symmetric** (as a path from $x$ to $y$ can be used backwards as
a path from $y$ to $x$), 

* and **transitive** (as a path from $x$ to $y$ and
a path from $y$ to $z$ together make up a path from $x$ to $z$), 

hence
an **equivalence relation**.
The connected components of the graph are
the parts (equivalence classes) of the corresponding **partition** of $X$.
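
The closure described in this note can be computed directly on a small example. The sketch below uses a Warshall-style update on sets of reachable nodes; the function name `reachability` and the five-node graph are assumptions made for this illustration.

```python
def reachability(nodes, edges):
    """Map each node x to the set of nodes y with a path from x to y."""
    reach = {x: {x} for x in nodes}    # reflexive: the empty path at x
    for u, v in edges:                 # symmetric: edges work in both directions
        reach[u].add(v)
        reach[v].add(u)
    for k in nodes:                    # transitive: allow paths passing through k
        for i in nodes:
            if k in reach[i]:
                reach[i] |= reach[k]
    return reach

nodes = 'ABCDE'
edges = ['AB', 'BC', 'DE']
reach = reachability(nodes, edges)

# the distinct sets reach[x] are the equivalence classes, i.e. the components
classes = {frozenset(reach[x]) for x in nodes}
print(sorted(sorted(c) for c in classes))   # [['A', 'B', 'C'], ['D', 'E']]
```

Here $A$ and $C$ are not joined by an edge, yet they end up in the same class because of the path $(A, B, C)$.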

##  Code Corner

### `numpy`

* `array`: [[doc]](https://numpy.org/doc/stable/reference/generated/numpy.array.html)

* `transpose`: [[doc]](https://numpy.org/doc/stable/reference/generated/numpy.transpose.html)

* `printoptions`: [[doc]](https://numpy.org/doc/stable/reference/generated/numpy.printoptions.html) set options for printing arrays

### `networkx`

* `grid_2d_graph`: [[doc]](https://networkx.org/documentation/stable/reference/generated/networkx.generators.lattice.grid_2d_graph.html)
creates a 2D grid graph.

* `bipartite.color`: [[doc]](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.bipartite.basic.color.html) computes a $2$-coloring of a graph

* `bipartite_layout`: [[doc]](https://networkx.org/documentation/stable/reference/generated/networkx.drawing.layout.bipartite_layout.html) works out a useful way to draw a bipartite graph

* `bipartite.biadjacency_matrix`: [[doc]](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.bipartite.matrix.biadjacency_matrix.html) the biadjacency matrix of a bipartite graph

* `projected_graph`: [[doc]](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.bipartite.projection.projected_graph.html) the projection of a bipartite graph onto one of its node sets

* `connected_components`: [[doc]](https://networkx.github.io/documentation/stable/reference/algorithms/component.html) the connected components of a graph

## Exercises

1. Using `divmod` to create a dictionary `pos`, plot the graph `G` above as a bipartite graph. 

1. Compute the adjacency matrix of the bipartite graph $B$ at the top 
of this page and verify its block structure.

1. Compute the biadjacency matrix $C$ of the graph $B$.

1. Compute the two products of $C$ and its transpose,
and, using the products as adjacency matrix, construct two graphs
from them.

1. Compute the two projections of the bipartite graph $B$ and
compare them with the graphs constructed in the previous exercise.