## 1. Random walks and Markov chains

### Consider a graph $G = (V, E)$ with the following adjacency matrix:

$$
A = \begin{bmatrix}
    0 & 1 & 0 & 1 & 1 \\
    1 & 0 & 1 & 2 & 0 \\
    0 & 1 & 0 & 0 & 0 \\
    1 & 2 & 0 & 0 & 0 \\
    1 & 0 & 0 & 0 & 0
\end{bmatrix}$$

#### (a) Assuming that the row and column numbers correspond to nodes (enumerated from 1 to 5), calculate the probability that a random walker starting in node 1 will traverse the following sequence of nodes: $\left( 1, 2, 3, 2, 4, 1, 5, 1, 2 \right)$

In [2]:
import matplotlib.pyplot as plt
import multiprocessing as mp
import pandas as pd
import pathpy as pp
from scipy.stats import rankdata
from functools import partial
import numpy as np
from tqdm import tqdm
import seaborn as sns
import plotly.io as pio
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from plotly.graph_objects import Figure
import scipy as sp

plt.style.use('default')
sns.set_style("whitegrid")

In [3]:
ntw_adj_matrix = np.array([[0, 1, 0, 1, 1], [1, 0, 1, 2, 0], [0, 1, 0, 0, 0], [1, 2, 0, 0, 0], [1, 0, 0, 0, 0]])
ntw_adj_matrix

array([[0, 1, 0, 1, 1],
       [1, 0, 1, 2, 0],
       [0, 1, 0, 0, 0],
       [1, 2, 0, 0, 0],
       [1, 0, 0, 0, 0]])

In [4]:
transition_matrix = ntw_adj_matrix / ntw_adj_matrix.sum(axis=1, keepdims=True)
transition_matrix

array([[0.        , 0.33333333, 0.        , 0.33333333, 0.33333333],
       [0.25      , 0.        , 0.25      , 0.5       , 0.        ],
       [0.        , 1.        , 0.        , 0.        , 0.        ],
       [0.33333333, 0.66666667, 0.        , 0.        , 0.        ],
       [1.        , 0.        , 0.        , 0.        , 0.        ]])

In [5]:
transition_way = (1, 2, 3, 2, 4, 1, 5, 1, 2)
transition_way

(1, 2, 3, 2, 4, 1, 5, 1, 2)

In [6]:
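probability = 1
# Multiply the transition probabilities of consecutive node pairs (node labels are 1-based)
for previous_position, every_position in zip(transition_way, transition_way[1:]):
    probability *= transition_matrix[previous_position - 1, every_position - 1]
probability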

0.0015432098765432098

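So the probability that a random walker starting in node 1 follows this sequence of steps is $\frac{1}{3} \cdot \frac{1}{4} \cdot 1 \cdot \frac{1}{2} \cdot \frac{1}{3} \cdot \frac{1}{3} \cdot 1 \cdot \frac{1}{3} \approx 0.0015$.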
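
As an exact cross-check, the same product can be evaluated with rational arithmetic. This is only an illustrative sketch: the helper name `walk_probability` is hypothetical, and it rebuilds the transition probabilities directly from the adjacency matrix of the exercise using the standard-library `fractions` module.

```python
from fractions import Fraction

# Adjacency matrix from the exercise (the entry 2 is a double edge between nodes 2 and 4)
A = [
    [0, 1, 0, 1, 1],
    [1, 0, 1, 2, 0],
    [0, 1, 0, 0, 0],
    [1, 2, 0, 0, 0],
    [1, 0, 0, 0, 0],
]


def walk_probability(adjacency, walk):
    """Probability of a walk given as 1-based node labels (hypothetical helper)."""
    probability = Fraction(1)
    for source, target in zip(walk, walk[1:]):
        row = adjacency[source - 1]
        # T[source, target] = A[source, target] / weighted degree of source
        probability *= Fraction(row[target - 1], sum(row))
    return probability


exact = walk_probability(A, (1, 2, 3, 2, 4, 1, 5, 1, 2))
print(exact, float(exact))
```

The exact value is $\frac{1}{648}$, which agrees with the floating-point result above.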

#### (b) Show that for an undirected network, the stationary visitation probability of a node $v$ converges to $\pi_v = \frac{d_v}{2m}$

We have to show that $$\pi = \left( \frac{k_{1}}{2m}, \ldots , \frac{k_{n}}{2m} \right)$$ is a stationary distribution, where $k_v = d_v$ is the degree of node $v$ and $m$ is the number of edges.
One step of the random walk updates the visitation probabilities as $$\pi^{t} = \pi^{t - 1} \times T, \qquad T_{ij} = \frac{A_{ij}}{k_i}$$
Start from

$$\pi^{0} = \left( \frac{k_1}{2m}, \ldots, \frac{k_n}{2m} \right)$$

Its entries are non-negative and, because every edge contributes to the degree of both of its endpoints,

$$\sum_{i} \pi^{0}_{i} = \frac{k_1 + \ldots + k_n}{2m} = \frac{2m}{2m} = 1$$

So $\pi^{0}$ satisfies the conditions of a stochastic vector.
Now let's show that one step of the walk leaves this vector unchanged, i.e. that it satisfies the stationarity condition
$$\pi = \pi \times T$$

Component-wise, using the symmetry $A_{ij} = A_{ji}$ of an undirected network in the last step:
$$\pi_{j} = \sum_{i} \pi_{i} \times T_{ij} = \sum_{i} \frac{k_i}{2m} \times \frac{A_{ij}}{k_i} = \frac{\sum_{i} A_{ij}}{2m} = \frac{k_j}{2m}$$

So the stationary distribution is:
$$\pi = \left( \frac{k_{1}}{2m}, \ldots , \frac{k_{n}}{2m} \right)$$
If the network is connected and not bipartite, this stationary distribution is unique and the walk converges to it from any starting node.
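
The fixed-point property can also be checked numerically. The sketch below is an illustration rather than part of the proof: it reuses the adjacency matrix from part (a), treats the double edge as weight 2, and tests whether $\pi T = \pi$ holds for $\pi_i = k_i / 2m$.

```python
import numpy as np

# Adjacency matrix of the undirected network from part (a)
A = np.array([
    [0, 1, 0, 1, 1],
    [1, 0, 1, 2, 0],
    [0, 1, 0, 0, 0],
    [1, 2, 0, 0, 0],
    [1, 0, 0, 0, 0],
], dtype=float)

degrees = A.sum(axis=1)        # k_i (edge multiplicities count as weights)
T = A / degrees[:, None]       # T_ij = A_ij / k_i
pi = degrees / degrees.sum()   # k_i / 2m, because the degrees sum to 2m

# One step of the walk must leave pi unchanged
is_stationary = np.allclose(pi @ T, pi)
print(pi, is_stationary)
```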

## 2. Description length of random walks

#### (a) Generate an undirected $G(n, p)$ network (no self-loops, no multi-edges) and an undirected k-regular network. For both networks consider a number of nodes n = 100. Compute the expected code length of a walk on each network.
Hint: Consider Shannon's source coding theorem

In [34]:
g_n_p = pp.generators.ER_np(100, 0.6)

In [35]:
def generate_k_regular_network(n, k):
    return pp.generators.Molloy_Reed(np.ones(n) * k)

In [36]:
k_regular = generate_k_regular_network(100, 4)

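The expected code length of one step of a random walk (the entropy rate of the walk) can be computed from the stationary state $\pi$ and the transition matrix $T$. The inner sum runs over all nodes $j$ that can be reached from $i$ in one step.
$$L = - \sum_{i} \pi_i \sum_{j : (i, j) \in E} T_{ij} \log_2 T_{ij}$$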

In [65]:
def log2(x):
    if x == 0:
        return 0
    else:
        return np.log2(x)

In [105]:
def calc_expected_code_length(network: pp.Network):
    # Work on the largest connected component of the network
    network_to_work_with = network.largest_connected_component()
    random_walk = pp.processes.RandomWalk(network_to_work_with)
    transition_matrix = random_walk.transition_matrix
    stationary_state = random_walk.stationary_state()
    node_index = network_to_work_with.nodes.index
    result = 0
    # Accumulate -pi_v * T_vw * log2(T_vw) over the entries of network.edges
    for edge in network_to_work_with.edges:
        v, w = node_index[edge.v.uid], node_index[edge.w.uid]
        result -= stationary_state[v] * transition_matrix[v, w] * log2(transition_matrix[v, w])
    return result

In [106]:
g_n_p.largest_connected_component()

<pathpy.models.network.Network object at 0x7f6299f06470>

In [107]:
g_n_p_expected = calc_expected_code_length(g_n_p)
print(f"{g_n_p_expected} bit/step for the G(n, p) network")

2.9492684528691844 bit/step for the G(n, p) network


In [82]:
k_regular_expected = calc_expected_code_length(k_regular)
print(f"{k_regular_expected} bit/step for the k-regular network")

1.0000000000000007 bit/step for the k-regular network


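So the expected code length per step is much smaller on the k-regular network, because each node has only 4 neighbours to choose from, compared with about 60 on average in $G(100, 0.6)$. Note that `network.edges` contains every undirected edge only once, so the loop above counts each edge in a single direction and the printed values are about half of the entropy rate $\sum_i \frac{k_i}{2m} \log_2 k_i$: summing over both directions gives $\log_2 4 = 2$ bit/step for the 4-regular network and roughly $\log_2 59.4 \approx 5.9$ bit/step for $G(100, 0.6)$.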
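
A library-independent cross-check is possible because, on an undirected unweighted network, $\pi_i = k_i / 2m$ and $T_{ij} = 1 / k_i$, so the entropy rate summed over both directions of every edge reduces to $\sum_i \frac{k_i}{2m} \log_2 k_i$. The sketch below uses only NumPy. The helper names are hypothetical, a ring lattice stands in for the Molloy-Reed 4-regular network, and the sampled $G(n, p)$ graph is assumed to be connected.

```python
import numpy as np


def entropy_rate_undirected(adjacency):
    """Entropy rate (bit/step) of a random walk on an undirected, unweighted network.

    Uses pi_i = k_i / 2m and T_ij = 1 / k_i, so H = sum_i (k_i / 2m) * log2(k_i).
    Isolated nodes are ignored.
    """
    degrees = np.asarray(adjacency).sum(axis=1)
    degrees = degrees[degrees > 0]
    return float(np.sum(degrees / degrees.sum() * np.log2(degrees)))


def ring_lattice(n, k):
    """k-regular ring lattice (k even): links to the k/2 nearest neighbours on each side."""
    adjacency = np.zeros((n, n), dtype=int)
    for offset in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + offset) % n
            adjacency[i, j] = adjacency[j, i] = 1
    return adjacency


def sample_gnp(n, p, rng):
    """Symmetric 0/1 adjacency matrix of a G(n, p) sample without self-loops."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)


rng = np.random.default_rng(0)
h_regular = entropy_rate_undirected(ring_lattice(100, 4))
h_gnp = entropy_rate_undirected(sample_gnp(100, 0.6, rng))
print(h_regular, h_gnp)
```

Any connected 4-regular network gives the same value as the ring lattice, since only the degrees enter the formula.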

#### (b) Compute the expected per-symbol code length of a random walk for a random microstate in the ensemble of undirected G(n, p) random graphs (with no self loops and no multi-edges) as a function of n and p.

Let's first fix $p = 0.6$ and vary $n$.

In [83]:
n_space = np.arange(10, 100)

In [84]:
avg_code_lengths = [calc_expected_code_length(pp.generators.ER_np(n, 0.6)) for n in n_space]

In [85]:
px.scatter(x=n_space, y=avg_code_lengths, labels={'x': 'Number of nodes', 'y': 'Average code length'})

Next, let's fix $n = 30$ and vary $p$ between 0.5 and 1.

In [88]:
p_space = np.linspace(0.5, 1, 100)

In [89]:
avg_code_lengths_by_p = [calc_expected_code_length(pp.generators.ER_np(30, p)) for p in p_space]

In [90]:
px.scatter(x=p_space, y=avg_code_lengths_by_p, labels={'x': 'Probability', 'y': 'Average code length'})
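
For the dependence on $n$ and $p$, a rough closed form is available. The following is a sketch under two assumptions that are not part of the notebook: both directions of every undirected edge are counted, and $n$ is large enough that all degrees are close to their mean $(n - 1) p$.

$$L(n, p) = \sum_{i} \frac{k_i}{2m} \log_2 k_i \approx \log_2 \big( (n - 1) p \big)$$

For $p = 1$ the network is the complete graph and the expression is exact, $L = \log_2 (n - 1)$; for example, $n = 30$ gives $\log_2 29 \approx 4.86$ bit/step. Under this approximation the code length grows logarithmically in both $n$ and $p$.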

#### (c) Compute the expected per-symbol code length of a random walk on the following directed network.

![directed graph](./2_c_directed_graph.png)

In [108]:
network = pp.Network(directed=True)
network.add_edge('a', 'b')
network.add_edge('b', 'a')
network.add_edge('b', 'c')
network.add_edge('c', 'd')
network.add_edge('d', 'e')
network.add_edge('e', 'b')
network.add_edge('c', 'e')
network.add_edge('a', 'e')
network.plot()

In [110]:
directed_expected = calc_expected_code_length(network)
print(f"{directed_expected} bit/step for the directed network")

0.6666666666666669 bit/step for the directed network
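
This value can be reproduced without pathpy. The sketch below is an independent NumPy cross-check, not the notebook's method: it builds the transition matrix from the edge list above, obtains the stationary distribution by solving $\pi T = \pi$ together with $\sum_i \pi_i = 1$, and evaluates $-\sum_i \pi_i \sum_j T_{ij} \log_2 T_{ij}$.

```python
import numpy as np

nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("e", "b"), ("c", "e"), ("a", "e")]

index = {name: i for i, name in enumerate(nodes)}
n = len(nodes)
A = np.zeros((n, n))
for source, target in edges:
    A[index[source], index[target]] = 1.0

T = A / A.sum(axis=1, keepdims=True)   # row-stochastic; every node has an out-link

# Solve pi (T - I) = 0 together with sum(pi) = 1 as a least-squares problem
system = np.vstack([T.T - np.eye(n), np.ones((1, n))])
rhs = np.append(np.zeros(n), 1.0)
pi, *_ = np.linalg.lstsq(system, rhs, rcond=None)

# Entropy rate, with 0 * log2(0) taken as 0
safe_T = np.where(T > 0, T, 1.0)
entropy_rate = float(-(pi[:, None] * T * np.log2(safe_T)).sum())
print(pi, entropy_rate)
```

The stationary distribution is $\pi = \left( \frac{1}{6}, \frac{1}{3}, \frac{1}{6}, \frac{1}{12}, \frac{1}{4} \right)$. Only nodes a, b and c have two outgoing links, each contributing 1 bit, so the entropy rate is $\frac{1}{6} + \frac{1}{3} + \frac{1}{6} = \frac{2}{3}$ bit/step.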


## 3. InfoMap with exit probabilities

Shannon’s source coding theorem tells us the minimal per-symbol length of a lossless prefix-free code for the sequence of outcomes of a random variable. When the outcomes of the random variable are the nodes visited by a random walk on a network, the source coding theorem returns the minimal code length of a walk on the network. Using the MapEquation we can find an efficient code by using a hierarchical coding that utilizes the natural community structure of a network. This hierarchical code consists of two layers: In the first layer the source coding theorem is used to compute the minimal code length for the transitions between communities. In the second layer the source coding theorem is used in each community separately to compute the minimal code length for the node transitions within the community. The split of the network in communities allows us to reuse the shorter code-words for the nodes of different communities and for the transition between communities. However, the simplified MapEquation described in L06 does not ensure that the resulting code is unambiguous. In the following network, nodes and communities have been named using the definition of the MapEquation given in Lecture L06.

![graph](./3_graph.png)

It is easy to see that a sequence of codewords does not uniquely identify a walk, i.e. is ambiguous. For example, the sequence $0010$ can either refer to the walk $\color{blue}0 010$ or to the walk $\color{blue}0 0 \color{red}1 0$. In the full formulation of the MapEquation this issue is avoided by assigning an additional exit codeword to each community. The exit codeword signals that the current codeword is the last one from the current community, which implies that the next codeword will identify a new community. This procedure separates the codewords of nodes of different communities and uniquely identifies a walk based on a sequence of codewords.

#### (a) Given the stationary distribution, a split in communities, and the transition matrix of a network, what is the probability $q$ to exit from community i?
Hint: what is the probability to traverse any of the links that exit from the community?

Assume we are given the stationary distribution $\pi$ and the transition matrix $T$. Suppose the random walker is currently at node $\alpha$ in community $i$; in the stationary state this happens with probability $\pi_{\alpha}$. From $\alpha$, the walker leaves community $i$ with the summed transition probability of all links that end outside the community, $\sum_{\beta \notin i, (\alpha, \beta) \in E} T_{\alpha, \beta}$. Summing over all nodes of community $i$ gives the exit probability:
$$q_i↷ = \sum_{\alpha \in i} \sum_{\beta \notin i, (\alpha, \beta) \in E} \pi_{\alpha} \times T_{\alpha, \beta}$$
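
The double sum can be sketched in NumPy as follows. The function name `exit_probabilities` and the toy network (two triangles joined by a single edge) are hypothetical and only serve as an illustration. Since $T_{\alpha, \beta} = 0$ wherever there is no link, the restriction $(\alpha, \beta) \in E$ is handled automatically.

```python
import numpy as np


def exit_probabilities(pi, T, membership):
    """Return {community: q_exit}, where q_exit sums pi[a] * T[a, b] over links leaving it."""
    pi = np.asarray(pi, dtype=float)
    T = np.asarray(T, dtype=float)
    membership = np.asarray(membership)
    flow = pi[:, None] * T                      # stationary flow on every link
    exits = {}
    for community in np.unique(membership):
        inside = membership == community
        # Flow from nodes inside the community to nodes outside of it
        exits[int(community)] = float(flow[np.ix_(inside, ~inside)].sum())
    return exits


# Toy example: triangles {0, 1, 2} and {3, 4, 5} connected by the edge 2-3
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
degrees = A.sum(axis=1)
T = A / degrees[:, None]
pi = degrees / degrees.sum()

q = exit_probabilities(pi, T, [0, 0, 0, 1, 1, 1])
print(q)
```

In the toy example each community can only be left over the bridge, so $q = \pi_2 T_{2,3} = \frac{3}{14} \cdot \frac{1}{3} = \frac{1}{14}$ for both communities.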

#### (b) In the full formulation of the MapEquation the entropy of a community $C_i \subset V$ is defined as:

![MapEquation](./3_b_Infomap_equation.png)

#### here $q_i ↷$ indicates the probability to exit community $C_i$ and $p_{\alpha}$ is the stationary probability to visit node $\alpha \in V$. This formula uses Shannon’s source coding theorem to calculate the minimal expected code length for the nodes in community $C_i$ and for the exit from $C_i$.

#### Write down the simpler definition of $H(P^i)$ as given on slide 20 of L06, i.e. based on the entropy of visitation probabilities of nodes in cluster i. Highlight the differences in the definition of $H(P^i)$ above as compared to the simpler definition used in the lecture. Explain what the first summand in the definition of $H(P^i)$ captures. Explain the differences in the second summand compared to the simpler definition of $H(P^i)$.

The simpler definition is:
$$H(P^i) = - \sum_{\alpha \in C_i} \frac{p_{\alpha}}{\sum_{\beta \in C_i} p_{\beta}} \log \frac{p_{\alpha}}{\sum_{\beta \in C_i} p_{\beta}}$$

1. Differences: the full formulation treats the exit codeword as one additional meta-node that does not exist in the network but has to be included in the internal Huffman code of the community. That is why it appears in the entropy formula of the full formulation.

In the simplified version, the rate at which the codebook of community $C_i$ is used is the sum of the visitation probabilities of its nodes:
$$p^i = \sum_{\alpha \in C_i} p_{\alpha}$$

In the full formulation it also includes the probability to exit the community:
$$p^i = q_i ↷ + \sum_{\alpha \in C_i} p_{\alpha}$$

2. The first summand is the contribution of the exit codeword (the meta-node) to the entropy of the community codebook, i.e. the expected cost of signalling that the walker leaves $C_i$.

3. In the second summand the visitation probabilities $p_{\alpha}$ are normalised by $q_i ↷ + \sum_{\beta \in C_i} p_{\beta}$ instead of $\sum_{\beta \in C_i} p_{\beta}$, because the exit codeword shares the codebook with the node codewords. The node codewords therefore become slightly longer than in the simplified definition, which matters when building the internal Huffman code of the community.
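
The two definitions can be compared numerically. The sketch below assumes that the full formulation has the standard MapEquation form, i.e. the entropy of the distribution obtained by appending $q_i ↷$ to the node probabilities and dividing by $q_i ↷ + \sum_{\beta \in C_i} p_{\beta}$. The function names and the example values are hypothetical.

```python
import numpy as np


def entropy_bits(probabilities):
    """Shannon entropy in bits of a normalised distribution, with 0 * log2(0) taken as 0."""
    p = np.asarray(probabilities, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def community_entropy_simple(p_nodes):
    """Simplified H(P^i): visitation probabilities normalised within the community."""
    p_nodes = np.asarray(p_nodes, dtype=float)
    return entropy_bits(p_nodes / p_nodes.sum())


def community_entropy_full(p_nodes, q_exit):
    """Full H(P^i) (assumed standard form): the exit acts as one extra codebook entry."""
    p_nodes = np.asarray(p_nodes, dtype=float)
    usage = q_exit + p_nodes.sum()
    return entropy_bits(np.append(p_nodes, q_exit) / usage)


p_nodes = [0.25, 0.25]   # hypothetical visitation probabilities of a two-node community
q_exit = 0.125           # hypothetical exit probability
h_simple = community_entropy_simple(p_nodes)
h_full = community_entropy_full(p_nodes, q_exit)
print(h_simple, h_full)
```

With these values the simplified entropy is 1 bit, while the full entropy is about 1.52 bit: the extra entry for the exit codeword lengthens the codebook. For $q_i ↷ = 0$ both definitions coincide.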