***

*Course:* [Math 535](https://people.math.wisc.edu/~roch/mmids/) - Mathematical Methods in Data Science (MMiDS)  
*Chapter:* 5-Random walks on graphs and Markov chains  
*Author:* [Sebastien Roch](https://people.math.wisc.edu/~roch/), Department of Mathematics, University of Wisconsin-Madison  
*Updated:* Jan 7, 2024   
*Copyright:* &copy; 2024 Sebastien Roch

***

In [None]:
# IF RUNNING ON GOOGLE COLAB, UNCOMMENT THE FOLLOWING CODE CELL
# When prompted, upload: 
#     * mmids.py
#     * mathworld-adjacency.csv
#     * mathworld-titles.csv
# from your local file system
# Files at: https://github.com/MMiDS-textbook/MMiDS-textbook.github.io/tree/main/utils
# Alternative instructions: https://colab.research.google.com/notebooks/io.ipynb

In [None]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

In [None]:
# PYTHON 3
import numpy as np
from numpy import linalg as LA
from numpy.random import default_rng
rng = default_rng(535)
import matplotlib.pyplot as plt
import pandas as pd
import networkx as nx
import tensorflow as tf
from tensorflow import keras
import mmids

## Motivating example: ranking webpages

A common task in network analysis is to identify "central" vertices in a graph. Centrality is a vague concept. It can be defined in many different ways depending on the context and the type of network. Quoting from [Wikipedia](https://en.wikipedia.org/wiki/Centrality):

> In graph theory and network analysis, indicators of centrality assign numbers or rankings to nodes within a graph corresponding to their network position. Applications include identifying the most influential person(s) in a social network, key infrastructure nodes in the Internet or urban networks, super-spreaders of disease, and brain networks. [...] Centrality indices are answers to the question "What characterizes an important vertex?" The answer is given in terms of a real-valued function on the vertices of a graph, where the values produced are expected to provide a ranking which identifies the most important nodes. The word "importance" has a wide number of meanings, leading to many different definitions of centrality. 

In an undirected graph, a natural approach is to look at the degree of a vertex as a measure of its importance (also referred to as degree centrality). But it is hardly the only one. One could for instance look at the average distance to all other nodes (its reciprocal is the [closeness centrality](https://en.wikipedia.org/wiki/Closeness_centrality)) or at the number of shortest paths between pairs of vertices going through the vertex (known as [betweenness centrality](https://en.wikipedia.org/wiki/Betweenness_centrality)). 
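The three notions can be compared on a small example. The following sketch uses NetworkX's `degree_centrality`, `closeness_centrality` and `betweenness_centrality` with their default (normalized) settings on a path graph with five vertices. The path graph is our own toy choice and is not part of the original notes.

```python
import networkx as nx

# Toy example (our choice): the path graph 0 - 1 - 2 - 3 - 4
G_path = nx.path_graph(5)

deg = nx.degree_centrality(G_path)       # degree divided by n - 1
clo = nx.closeness_centrality(G_path)    # (n - 1) / (sum of distances to the other vertices)
btw = nx.betweenness_centrality(G_path)  # normalized fraction of shortest paths through the vertex

for v in G_path.nodes():
    print(v, round(deg[v], 3), round(clo[v], 3), round(btw[v], 3))
```

Vertex `2` scores highest under all three measures. Degree centrality, however, does not separate it from vertices `1` and `3`, while closeness and betweenness do.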

What if the graph is directed? Things are somewhat more complicated there. For instance, there is now the in-degree as well as the out-degree. 
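As a quick illustration, here is how the two degrees can be read off a NetworkX `DiGraph`. The graph is a hypothetical toy example, not taken from any dataset.

```python
import networkx as nx

# Hypothetical toy directed graph: an edge u -> v means "u points to v"
G_dir = nx.DiGraph([(0, 1), (0, 2), (1, 2), (3, 2)])

for v in sorted(G_dir.nodes()):
    # in_degree counts incoming edges, out_degree counts outgoing edges
    print(v, "in:", G_dir.in_degree(v), "out:", G_dir.out_degree(v))
```

Here vertex `2` receives the most links (in-degree $3$) but points nowhere (out-degree $0$). Vertex `0` points to two vertices but receives none. The two degrees can therefore suggest quite different rankings.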

Let us look at a particular example of practical importance, the World Wide Web (from now on, the Web). In this case, the vertices are webpages and a directed edge from $u$ to $v$ indicates a hyperlink from page $u$ to page $v$. The Web is much too large to analyze here. Instead, we will consider a tiny (but still interesting!) subset of it, the pages of [Wolfram's MathWorld](https://mathworld.wolfram.com), a wonderful mathematics resource. 

Each page of MathWorld concerns a particular mathematical concept, e.g., [scale-free network](https://mathworld.wolfram.com/Scale-FreeNetwork.html). A definition and notable properties are described. Importantly for us, in a section entitled "SEE ALSO", other related mathematical concepts are listed with a link to their MathWorld page. In the case of scale-free networks, the [small world network](https://mathworld.wolfram.com/SmallWorldNetwork.html) topic is referenced, among others.

The resulting directed graph is available through the [NetSet](https://netset.telecom-paris.fr/index.html) datasets and can be downloaded [here](https://netset.telecom-paris.fr/pages/mathworld.html). We load it now. For convenience, we have reformatted it into the files `mathworld-adjacency.csv` and `mathworld-titles.csv`, which are available on the [GitHub of the book](https://github.com/MMiDS-textbook/MMiDS-textbook.github.io/tree/main/utils/datasets).

In [None]:
df_edges = pd.read_csv('mathworld-adjacency.csv')
df_edges.head()

It consists of a list of directed edges. For example, the first one is an edge from vertex `0` to vertex `2`. The second one is from `1` to `47`, and so on. 

There is a total of $49069$ edges.

In [None]:
df_edges.shape[0]

The second file contains the titles of the pages.

In [None]:
df_titles = pd.read_csv('mathworld-titles.csv')
df_titles.head()

So the first edge above is from `Alexander's Horned Sphere` to `Antoine's Horned Sphere`. That is, the [latter](https://mathworld.wolfram.com/AntoinesHornedSphere.html) is listed in the SEE ALSO section of the [former](https://mathworld.wolfram.com/AlexandersHornedSphere.html). 

There are $12362$ topics.

In [None]:
df_titles.shape[0]

We construct the graph by adding the edges one by one. We first convert `df_edges` into a NumPy array.

In [None]:
edgelist = df_edges[['from','to']].to_numpy()
print(edgelist)

In [None]:
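n = 12362 # number of topics, i.e., df_titles.shape[0]
G = nx.empty_graph(n, create_using=nx.DiGraph) # n vertices, no edges yet
for i in range(edgelist.shape[0]):
    G.add_edge(edgelist[i,0], edgelist[i,1]) # directed edge 'from' -> 'to'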

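The edge-by-edge loop can also be written as a single call to `DiGraph.add_edges_from`, which accepts an iterable of `(u, v)` pairs. The following sketch checks on a hypothetical toy edge array (not the MathWorld data) that both versions produce the same graph. Starting from `nx.empty_graph(n, ...)` also keeps vertices that have no edges at all, which would otherwise be missing from the graph.

```python
import numpy as np
import networkx as nx

# Hypothetical toy edge list in the same (from, to) format as `edgelist`
toy_edges = np.array([[0, 2], [1, 3], [2, 3], [3, 0]])
n_toy = 5  # vertex 4 has no edges at all

# Version 1: add edges one by one
G1 = nx.empty_graph(n_toy, create_using=nx.DiGraph)
for i in range(toy_edges.shape[0]):
    G1.add_edge(toy_edges[i, 0], toy_edges[i, 1])

# Version 2: add all edges in a single call
G2 = nx.empty_graph(n_toy, create_using=nx.DiGraph)
G2.add_edges_from(toy_edges.tolist())

print(set(G1.edges()) == set(G2.edges()))          # same edge set
print(G1.number_of_nodes(), G2.number_of_nodes())  # isolated vertex 4 is kept
```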
The in-degree of `Alexander's Horned Sphere`, vertex `0`, is:

In [None]:
G.in_degree(0)

while that of `Antoine's Horned Sphere` is:

In [None]:
G.in_degree(2)

suggesting that the former is more central than the latter, at least in the sense that it is referenced more often.
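To rank all vertices by in-degree, rather than comparing two at a time, one can sort the `(vertex, in-degree)` pairs returned by `G.in_degree()`. The following is a sketch on a hypothetical toy graph with made-up titles. On the MathWorld graph, one would apply the same sort to `G` and look up the titles in `df_titles`, presumably by row position.

```python
import networkx as nx

# Hypothetical toy graph and titles (NOT the MathWorld data)
titles = ["Topic A", "Topic B", "Topic C", "Topic D"]
G_toy = nx.DiGraph([(0, 2), (1, 2), (3, 2), (0, 1), (3, 1), (2, 0)])

# G.in_degree() yields (vertex, in-degree) pairs; sort them by decreasing in-degree
ranking = sorted(G_toy.in_degree(), key=lambda pair: pair[1], reverse=True)

k = 3  # number of top vertices to display
for v, d in ranking[:k]:
    print(titles[v], d)
```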

But is that the right measure? Consider the following: `Antoine's Horned Sphere` receives only one reference, but that reference comes from a seemingly important vertex, `Alexander's Horned Sphere`. How can one take this into account in quantifying its importance in the network?

We will come back to this question later in this chapter. To hint at things to come, it will turn out that "exploring the graph at random" provides a powerful perspective on centrality.  

## Elements of finite Markov chains

**EXAMPLE:** **(Random Walk on the Petersen Graph)** Let $G = (V,E)$ be the Petersen graph.

In [None]:
G_petersen = nx.petersen_graph()

In [None]:
nx.draw_networkx(G_petersen, pos=nx.circular_layout(G_petersen), labels={i: i+1 for i in range(10)}, 
                 node_size=600, node_color='black', font_size=16, font_color='white')

Each vertex $i$ has degree $3$, that is, it has three neighbors which we denote $v_{i,1}, v_{i,2}, v_{i,3}$ in some arbitrary order. For instance, denoting the vertices by $1,\ldots, 10$ as above, vertex $9$ has neighbors $v_{9,1} = 4, v_{9,2} = 6, v_{9,3} = 7$.

We consider the following random walk on $G$. We start at $X_0 = 1$. Then, for each $t\geq 0$, we let $X_{t+1}$ be a uniformly chosen neighbor of $X_t$, independently of the previous history. That is, we jump at random from neighbor to neighbor. Formally, fix $X_0 = 1$ and let $(Z_t)_{t \geq 0}$ be an i.i.d. sequence of random variables taking values in $\{1,2,3\}$ satisfying

$$
\mathbb{P}[Z_t = 1] = \mathbb{P}[Z_t = 2] = \mathbb{P}[Z_t = 3] = 1/3.
$$

Then define, for all $t \geq 0$,

$$
X_{t+1}
= f(X_t, Z_t)
= v_{i,Z_t}
\quad \text{if } X_t = i,
$$

that is, $f(i, z) = v_{i,z}$ is the $z$-th neighbor of vertex $i$.

By an argument similar to the previous example, $(X_t)_{t \geq 0}$ is a Markov chain.
Also as in the previous example, one can pick $X_0$ according to an initial distribution, independently from the sequence $(Z_t)_{t \geq 0}$.
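The construction $X_{t+1} = f(X_t, Z_t)$ translates directly into code. In the following minimal sketch, the arbitrary ordering $v_{i,1}, v_{i,2}, v_{i,3}$ is taken to be increasing order (our assumption). This choice reproduces $v_{9,1} = 4, v_{9,2} = 6, v_{9,3} = 7$ from above. The seed `535` mirrors the one used elsewhere in this notebook. `rng.integers(1, 4)` draws uniformly from $\{1,2,3\}$ because its upper bound is exclusive.

```python
import networkx as nx
from numpy.random import default_rng

rng = default_rng(535)

# Petersen graph with vertices relabeled 1, ..., 10 as in the drawing above
G = nx.relabel_nodes(nx.petersen_graph(), {i: i + 1 for i in range(10)})

# v[i] = [v_{i,1}, v_{i,2}, v_{i,3}]: the neighbors of i, here in increasing order
# (any fixed order works; sorting is just one arbitrary choice)
v = {i: sorted(G.neighbors(i)) for i in G.nodes()}

def f(x, z):
    """Next state: the z-th neighbor of the current state x, with z in {1, 2, 3}."""
    return v[x][z - 1]

T = 10
X = [1]  # X_0 = 1
for t in range(T):
    Z_t = rng.integers(1, 4)  # uniform on {1, 2, 3}, independent of the past
    X.append(f(X[-1], Z_t))
print(X)
```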

$\lhd$

**EXAMPLE:** **(Random Walk on the Petersen Graph, continued)** Consider again the random walk on the Petersen graph $G = (V,E)$. We number the vertices $1, 2,\ldots, 10$. To compute the transition matrix, we list for each vertex its neighbors and put the value $1/3$ in the corresponding columns. For instance, vertex $1$ has neighbors $2$, $5$ and $6$, so row $1$ has $1/3$ in columns $2$, $5$, and $6$. And so on.

In [None]:
nx.draw_networkx(G_petersen, pos=nx.circular_layout(G_petersen), labels={i: i+1 for i in range(10)}, 
                 node_size=600, node_color='black', font_size=16, font_color='white')

We get:

$$
P = \begin{pmatrix}
0 & 1/3 & 0 & 0 & 1/3 & 1/3 & 0 & 0 & 0 & 0\\
1/3 & 0 & 1/3 & 0 & 0 & 0 & 1/3 & 0 & 0 & 0\\
0 & 1/3 & 0 & 1/3 & 0 & 0 & 0 & 1/3 & 0 & 0\\
0 & 0 & 1/3 & 0 & 1/3 & 0 & 0 & 0 & 1/3 & 0\\
1/3 & 0 & 0 & 1/3 & 0 & 0 & 0 & 0 & 0 & 1/3\\
1/3 & 0 & 0 & 0 & 0 & 0 & 0 & 1/3 & 1/3 & 0\\
0 & 1/3 & 0 & 0 & 0 & 0 & 0 & 0 & 1/3 & 1/3\\
0 & 0 & 1/3 & 0 & 0 & 1/3 & 0 & 0 & 0 & 1/3\\
0 & 0 & 0 & 1/3 & 0 & 1/3 & 1/3 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1/3 & 0 & 1/3 & 1/3 & 0 & 0
\end{pmatrix}
$$

We have already encountered a matrix that encodes the neighbors of each vertex: the adjacency matrix. Since every vertex of the Petersen graph has degree $3$, we recover the transition matrix by multiplying the adjacency matrix by $1/3$.

In [None]:
A_petersen = nx.adjacency_matrix(G_petersen).toarray()
P_petersen = (1/3) * A_petersen
print(P_petersen)

$\lhd$
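The shortcut $P = \frac{1}{3} A$ in the Petersen example works because every vertex has degree $3$. For a graph whose degrees vary, the walk that jumps to a uniformly chosen neighbor would instead divide row $i$ of $A$ by the degree of vertex $i$, assuming there are no isolated vertices. The following minimal sketch implements this more general recipe and checks it against the Petersen case. It uses `nx.to_numpy_array` to obtain a dense adjacency matrix directly.

```python
import numpy as np
import networkx as nx

G = nx.petersen_graph()
A = nx.to_numpy_array(G)       # dense adjacency matrix (float entries)
deg = A.sum(axis=1)            # degree of each vertex
P = A / deg[:, None]           # divide row i by the degree of vertex i

print(np.allclose(P, A / 3))            # every degree equals 3
print(np.allclose(P.sum(axis=1), 1.0))  # each row is a probability distribution

# The same recipe on a non-regular graph: the path 0 - 1 - 2
A_path = nx.to_numpy_array(nx.path_graph(3))
P_path = A_path / A_path.sum(axis=1)[:, None]
print(P_path)  # the middle row splits its mass equally between the two endpoints
```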

**EXAMPLE:** **(Robot Vacuum, continued)** Returning to our *Robot Vacuum Example*, we obtain the transition graph of the chain by viewing $P$ as its weighted adjacency matrix: there is a directed edge from $i$ to $j$, with weight equal to the $(i,j)$ entry of $P$, whenever that entry is positive. 

In [None]:
P_robot = np.array([
[0, 0.8, 0, 0.2, 0, 0, 0, 0, 0],
[0.3, 0, 0.2, 0, 0, 0.5, 0, 0, 0],
[0, 0.6, 0, 0, 0, 0.4, 0, 0, 0],
[0.1, 0.1, 0, 0, 0.8, 0, 0, 0, 0],
[0, 0, 0, 0.25, 0, 0, 0.75, 0, 0],
[0, 0.15, 0.15, 0, 0, 0, 0, 0.35, 0.35],
[0, 0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0.3, 0.4, 0.2, 0, 0.1],
[0, 0, 0, 0, 0, 1, 0, 0, 0]
]
)
print(P_robot)

We define a graph from its adjacency matrix. See [`networkx.from_numpy_array()`](https://networkx.org/documentation/stable/reference/generated/networkx.convert_matrix.from_numpy_array.html).

In [None]:
G_robot = nx.from_numpy_array(P_robot, create_using=nx.DiGraph)

Drawing edge weights on a directed graph in a readable fashion is not straightforward. We will not do this here. 
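Although the drawing omits the weights, they are still stored in the graph. The following sketch first checks that `P_robot` is a valid transition matrix, with nonnegative entries and rows summing to $1$. It then lists each edge and its weight as text, using the `'weight'` edge attribute that `nx.from_numpy_array` assigns by default. The matrix is copied from the cell above so that the block runs on its own.

```python
import numpy as np
import networkx as nx

# Transition matrix of the Robot Vacuum chain (copied from above)
P_robot = np.array([
    [0, 0.8, 0, 0.2, 0, 0, 0, 0, 0],
    [0.3, 0, 0.2, 0, 0, 0.5, 0, 0, 0],
    [0, 0.6, 0, 0, 0, 0.4, 0, 0, 0],
    [0.1, 0.1, 0, 0, 0.8, 0, 0, 0, 0],
    [0, 0, 0, 0.25, 0, 0, 0.75, 0, 0],
    [0, 0.15, 0.15, 0, 0, 0, 0, 0.35, 0.35],
    [0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0.3, 0.4, 0.2, 0, 0.1],
    [0, 0, 0, 0, 0, 1, 0, 0, 0]
])

# Sanity check: nonnegative entries and rows summing to 1
assert np.all(P_robot >= 0)
assert np.allclose(P_robot.sum(axis=1), 1.0)

G_robot = nx.from_numpy_array(P_robot, create_using=nx.DiGraph)

# Each positive entry P[i, j] becomes an edge i -> j with attribute 'weight' = P[i, j].
# Vertices are 0-indexed, so vertex i corresponds to room i + 1.
for i, j, w in G_robot.edges(data='weight'):
    print(f"room {i + 1} -> room {j + 1}: {w}")
```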

In [None]:
n_robot = P_robot.shape[0]
nx.draw_networkx(G_robot, pos=nx.circular_layout(G_robot), 
                 labels={i: i+1 for i in range(n_robot)}, 
                 node_size=600, node_color='black', font_size=16, font_color='white', 
                 connectionstyle='arc3, rad = 0.2')

$\lhd$

**NUMERICAL CORNER:** Once we have specified a transition matrix (and an initial distribution), we can simulate the corresponding Markov chain. This is useful for approximating probabilities of complex events through the law of large numbers. Here is some code to generate one sample path up to some given time $T$. We assume that the state space is $[n] = \{1,\ldots,n\}$. We use [`rng.choice`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.choice.html) to generate each transition.

In [None]:
from numpy.random import default_rng
rng = default_rng(535)

In [None]:
def SamplePath(mu, P, T):
    n = mu.shape[0] # size of state space
    states = np.arange(start=1, stop=n+1) # state space [n] = {1,...,n}
    X = np.zeros(T+1, dtype=int) # initialization of sample path
    for i in range(T+1):
        if i == 0: # initial state drawn from initial distribution mu
            X[i] = rng.choice(a=states, p=mu)
        else: # next state drawn from the row of P indexed by the current state
            X[i] = rng.choice(a=states, p=P[X[i-1]-1,:])
    return X

Let's try with our *Robot Vacuum*. We take the initial distribution to be the uniform distribution.

In [None]:
mu = np.ones(n_robot) / n_robot
SamplePath(mu, P_robot, 10)

For example, we can use a simulation to approximate the expected number of times that room $9$ is visited up to time $10$. To do this, we run the simulation a large number of times (say $1000$) and compute the average number of visits to $9$.

In [None]:
z = 9 # state of interest
N_samples = 1000 # number of repetitions
visits_to_z = np.zeros(N_samples) # initialization of number of visits

for i in range(N_samples):
    visits_to_z[i] = np.count_nonzero(SamplePath(mu, P_robot, 10) == z)

print(np.mean(visits_to_z))

$\lhd$
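As a check on the simulation above, the same expectation can be computed exactly. By linearity of expectation, the expected number of visits to $z$ over times $0, \ldots, T$ equals $\sum_{t=0}^{T} \mathbb{P}[X_t = z]$. This sketch uses the standard Markov chain identity that the distribution of $X_t$ is the row vector $\mu P^t$. The simulated average should be close to this value, up to Monte Carlo error.

```python
import numpy as np

# Transition matrix of the Robot Vacuum chain (copied from above)
P_robot = np.array([
    [0, 0.8, 0, 0.2, 0, 0, 0, 0, 0],
    [0.3, 0, 0.2, 0, 0, 0.5, 0, 0, 0],
    [0, 0.6, 0, 0, 0, 0.4, 0, 0, 0],
    [0.1, 0.1, 0, 0, 0.8, 0, 0, 0, 0],
    [0, 0, 0, 0.25, 0, 0, 0.75, 0, 0],
    [0, 0.15, 0.15, 0, 0, 0, 0, 0.35, 0.35],
    [0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0.3, 0.4, 0.2, 0, 0.1],
    [0, 0, 0, 0, 0, 1, 0, 0, 0]
])
n_robot = P_robot.shape[0]
mu = np.ones(n_robot) / n_robot  # uniform initial distribution
z, T = 9, 10                     # state of interest and time horizon

dist = mu.copy()                 # distribution of X_0
expected_visits = dist[z - 1]    # P[X_0 = z]
for t in range(T):
    dist = dist @ P_robot        # distribution of X_{t+1} = (distribution of X_t) P
    expected_visits += dist[z - 1]
print(expected_visits)
```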