## 1. Molloy-Reed model

### (a) Given a random microstate generated by the configuration model with degree distribution P(k), consider a random node v and follow an edge to a neighbour w of node v. What is the probability that node w has degree k?

We pick an arbitrary node in the network, follow one of its links, and ask for the probability that the node at the other end has degree $k$. This distribution is not $P(k)$: a node reached by following an edge has at least one link, so $\langle k_{n} \rangle \geq 1$, and high-degree nodes are reached more often because they own more stubs.
Having chosen a node and one of its stubs, the stub connects to any of the remaining $2m - 1$ stubs with equal probability. The probability of reaching one particular node of degree $k$ is therefore $\frac{k}{2m - 1}$. The degree distribution is $P(k) = \frac{n_{k}}{n}$, where $n_{k}$ is the number of nodes with degree $k$, so $n_{k} = nP(k)$. The probability of arriving at any node of degree $k$ is:
$$P_{n}(k) = \frac{k}{2m - 1} nP(k)$$
For $n \rightarrow \infty$ we can neglect the $-1$ in the formula and simplify it:
$$P_{n}(k) = \frac{k}{2m}nP(k) = \frac{kP(k)}{\frac{2m}{n}} = \frac{kP(k)}{\langle k \rangle}$$
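
The weighting by $k$ can be checked numerically. The following is a minimal sketch, assuming a small made-up degree sequence (`degrees` and both function names are hypothetical, not part of the exercise). It lists every stub, looks at the degree of the node owning each stub, and compares the resulting distribution with $kP(k) / \langle k \rangle$. It picks uniformly among all $2m$ stubs, so it corresponds to the simplified formula without the $-1$.

```python
import numpy as np


def neighbour_degree_distribution(degrees):
    """Degree distribution of the node that owns a uniformly random stub."""
    degrees = np.asarray(degrees)
    # One entry per stub, holding the degree of the node the stub belongs to.
    stub_degrees = np.repeat(degrees, degrees)
    counts = np.bincount(stub_degrees)
    return counts / counts.sum()


def predicted_neighbour_distribution(degrees):
    """k * P(k) / <k>, computed from the same degree sequence."""
    degrees = np.asarray(degrees)
    p_k = np.bincount(degrees) / len(degrees)  # P(k)
    k = np.arange(len(p_k))
    return k * p_k / degrees.mean()


degrees = np.array([1, 1, 2, 2, 2, 3, 3, 4])  # hypothetical degree sequence
empirical = neighbour_degree_distribution(degrees)
predicted = predicted_neighbour_distribution(degrees)
print(empirical)
print(predicted)
```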

### (b) Using the expression obtained above, compute the expected degree of the neighbours of a random node v. What do we see when we calculate the difference between the expected degree of a random node and the expected degree of a random neighbour of such a node?

The formula for calculating the mean degree:
$$\langle k \rangle = \sum_{k = 0}^{\infty}kP(k)$$

For the neighbour degree we can use the previously derived formula:
$$P_{n}(k) = \frac{kP(k)}{\langle k \rangle}$$
$$\langle k_{n} \rangle = \sum_{k = 0}^{\infty} kP_{n}(k) = \frac{\sum_{k = 0}^{\infty} k^{2} P(k)}{\langle k \rangle} = \frac{\langle k^{2} \rangle}{\langle k \rangle}$$

The difference:
$$\langle k_{n} \rangle - \langle k \rangle = \frac{\langle k^{2} \rangle}{\langle k \rangle} - \langle k \rangle = \frac{\langle k^{2} \rangle - \langle k \rangle^{2}}{\langle k \rangle} = \frac{Var(k)}{\langle k \rangle} \geq 0$$
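Since the variance is non-negative, a random neighbour has on average at least as high a degree as a random node, with equality only when all nodes have the same degree. This is known as the **friendship paradox**.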
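
The identity $\langle k_{n} \rangle - \langle k \rangle = Var(k) / \langle k \rangle$ can be illustrated with a few lines of NumPy. This is only a sketch: the Poisson degree sequence with mean 4 is an arbitrary assumption chosen for illustration, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical degree sequence; any non-negative integer sequence would do.
degrees = rng.poisson(lam=4.0, size=10_000)

mean_k = degrees.mean()                        # <k>
mean_k2 = np.mean(degrees.astype(float) ** 2)  # <k^2>
mean_neighbour_degree = mean_k2 / mean_k       # <k_n> = <k^2> / <k>

gap = mean_neighbour_degree - mean_k
variance_over_mean = degrees.var() / mean_k    # Var(k) / <k>

print(f"<k> = {mean_k:.3f}, <k_n> = {mean_neighbour_degree:.3f}")
print(f"gap = {gap:.3f}, Var(k)/<k> = {variance_over_mean:.3f}")
```

For a Poisson degree distribution the variance equals the mean, so the gap should come out close to 1.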

### (c) Often, rather than the degree of a node at the end of an edge, we are interested in the number of edges attached to the node other than the one we arrived through. This number is called the excess degree of a node. What is the probability that the node at which you arrive has excess degree k?

The excess degree of a node is its degree minus the edge we arrived along. A node with excess degree $k$ therefore has degree $k + 1$, and its probability follows from the neighbour degree distribution evaluated at $k + 1$:
$$Q(k) = \frac{(k + 1)P(k +1)}{\langle k \rangle}$$
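
As a small worked example, the sketch below turns a degree distribution into the excess degree distribution and confirms that it is normalised. The distribution `p_k` and the function name are hypothetical and only serve as an illustration.

```python
import numpy as np


def excess_degree_distribution(p_k):
    """Return Q(k) = (k + 1) P(k + 1) / <k> for k = 0, ..., k_max - 1."""
    p_k = np.asarray(p_k, dtype=float)
    k = np.arange(len(p_k))
    mean_k = np.sum(k * p_k)
    # Entry j of the result uses degree j + 1, hence the shift by one.
    return k[1:] * p_k[1:] / mean_k


p_k = np.array([0.1, 0.3, 0.4, 0.2])  # hypothetical P(0), ..., P(3)
q_k = excess_degree_distribution(p_k)
print(q_k, q_k.sum())
```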

### (d) Consider a Molloy-Reed model with no self-loops and where we allow for the creation of multiple edges between a single pair of nodes. What is the probability that two nodes v and w with degrees dv and dw are connected?

We start from node $v$ and compute the probability of a link between $v$ and $w$. Node $v$ has $d_{v}$ stubs and node $w$ has $d_{w}$ stubs. There are $2m$ stubs in the graph, so a given stub of $v$ can attach to any of the other $2m - 1$ stubs and hits one of the stubs of $w$ with probability $\frac{d_{w}}{2m - 1}$. Summing over the $d_{v}$ stubs of $v$ gives the expected number of edges between $v$ and $w$, which approximates the connection probability when $d_{v} d_{w} \ll 2m$:

$$p_{v,w} = \frac{d_{v} d_{w}}{2m - 1}$$

For large $m$ we can neglect the $-1$, and the formula becomes:

$$p_{v,w} = \frac{d_{v} d_{w}}{2m}$$
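
The stub argument can be checked with a small Monte Carlo experiment. The sketch below rests on an assumption that differs slightly from the exercise: it uses plain uniform stub matching, which permits both multi-edges and self-loops. Under that assumption the average number of edges between $v$ and $w$ should approach $d_{v} d_{w} / (2m - 1)$. The degree sequence, the function name and the number of trials are all hypothetical.

```python
import numpy as np


def mean_edges_between(degrees, v, w, trials, seed=0):
    """Average number of v-w edges over random stub matchings."""
    rng = np.random.default_rng(seed)
    # One entry per stub, holding the id of the node that owns it.
    stubs = np.repeat(np.arange(len(degrees)), degrees)
    total = 0
    for _ in range(trials):
        shuffled = rng.permutation(stubs)
        # Consecutive stubs in the shuffled list are joined into an edge.
        a, b = shuffled[0::2], shuffled[1::2]
        total += np.sum(((a == v) & (b == w)) | ((a == w) & (b == v)))
    return total / trials


degrees = np.array([3, 2, 2, 1, 1, 1])  # hypothetical sequence, 2m = 10
estimate = mean_edges_between(degrees, v=0, w=1, trials=20_000)
predicted = degrees[0] * degrees[1] / (degrees.sum() - 1)
print(f"simulated: {estimate:.3f}, predicted: {predicted:.3f}")
```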

## 2. Inference and Statistical Ensembles 

### (a) Consider the G(n, p) model for undirected random graphs with no self-loops. Show that for a given network Ge with n nodes and m links, a maximum likelihood estimate of the parameter p is given as
$$\hat p = \frac{m}{\binom{n}{2}}$$

In [2]:
from typing import Set
from tqdm import tqdm
from itertools import product
from collections import defaultdict

import scipy as sp
import numpy as np
import pandas as pd
import pathpy as pp
import seaborn as sns
import plotly.express as px
from scipy.spatial import KDTree
from matplotlib import pyplot as plt
from sklearn.datasets import make_moons

plt.style.use('default')
sns.set_style("whitegrid")

In [3]:
n_range = np.arange(20, 120, 20)
p_range = np.arange(0.1, 1.0, 0.1)

In [4]:
measurements = []

# Generate one G(n, p) network per (n, p) pair and compare
# the estimate m / C(n, 2) with the p used for generation.
for n, p in tqdm(product(n_range, p_range)):
    network = pp.generators.ER_np(n=n, p=p, directed=False)
    m = len(network.edges)
    calculated_p = m / sp.special.binom(n, 2)
    measurements.append((n, m, p, calculated_p, np.abs(calculated_p - p)))

45it [00:19,  2.28it/s]


In [5]:
df = pd.DataFrame.from_records(
    measurements,
    columns=["n nodes", "m links", "p", "p^", "p delta"]
)
df

| | n nodes | m links | p | p^ | p delta |
|---|---|---|---|---|---|
| 0 | 20 | 17 | 0.1 | 0.089474 | 0.010526 |
| 1 | 20 | 47 | 0.2 | 0.247368 | 0.047368 |
| 2 | 20 | 75 | 0.3 | 0.394737 | 0.094737 |
| 3 | 20 | 83 | 0.4 | 0.436842 | 0.036842 |
| 4 | 20 | 91 | 0.5 | 0.478947 | 0.021053 |
| 5 | 20 | 109 | 0.6 | 0.573684 | 0.026316 |
| 6 | 20 | 137 | 0.7 | 0.721053 | 0.021053 |
| 7 | 20 | 148 | 0.8 | 0.778947 | 0.021053 |
| 8 | 20 | 165 | 0.9 | 0.868421 | 0.031579 |
| 9 | 40 | 88 | 0.1 | 0.112821 | 0.012821 |


As the table shows, the difference between the true $p$ and the estimate $\hat p = m / \binom{n}{2}$ is small (below 0.1 in all rows shown), so the estimate stays close to the predefined value of $p$.

From another angle, consider the G(n, m) model with n nodes and m generated links. For an undirected graph with no self-loops the maximum number of possible links is $\binom{n}{2}$. So the probability of a link is

$$p = m / \binom{n}{2}$$

As you can see, this formula is identical to the one in the task.

The analytical part:
$$
\begin{align*}
\mathcal{L}(\Theta | G) &= P(G | \Theta) \\
&= p^m (1 - p)^{\binom{n}{2} - m} \\
\log \mathcal{L}(\Theta) &= m \log p + \left( \binom{n}{2} - m \right) \log (1 - p) \\
\frac{\partial \log \mathcal{L}}{\partial p} &= \frac{m}{p} - \frac{\binom{n}{2} - m}{1 - p} = 0 \\
\frac{m}{p} &= \frac{\binom{n}{2} - m}{1 - p} \\
m - mp &= p \binom{n}{2} - mp \\
\hat p &= \frac{m}{\binom{n}{2}}
\end{align*}
$$
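
The result of the derivation can be sanity-checked numerically. The sketch below is an illustration, not a proof: it evaluates the log-likelihood $m \log p + (\binom{n}{2} - m) \log (1 - p)$ on a grid of $p$ values and compares the maximiser with $m / \binom{n}{2}$. The values $n = 20$, $m = 47$ are taken from one of the generated networks above; the grid resolution and the function name are arbitrary assumptions.

```python
from math import comb

import numpy as np


def log_likelihood(p, n, m):
    """Log-likelihood of G(n, p) for one graph with n nodes and m links."""
    n_pairs = comb(n, 2)
    return m * np.log(p) + (n_pairs - m) * np.log(1 - p)


n, m = 20, 47
grid = np.linspace(0.0001, 0.9999, 9999)  # step of 0.0001, endpoints excluded
p_grid = grid[np.argmax(log_likelihood(grid, n, m))]
p_hat = m / comb(n, 2)
print(f"grid maximiser: {p_grid:.4f}, closed form: {p_hat:.4f}")
```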

### (b) Consider the microstates G1 and G2 with n = 100 and m1 = 300 and m2 = 350 edges respectively. What is the probability of each of these microstates within:

#### - a G(n, p) model with n = 100 and p = 5 / 99? What is the expected number of edges in this model?

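A single microstate with $m$ edges has the same probability as in the likelihood above. There is no binomial coefficient, because that factor would count all graphs with $m$ edges rather than one specific graph:

$$P(G) = p^m (1 - p)^{\binom{n}{2} - m}$$

The expected number of edges is $\langle m \rangle = p \binom{n}{2} = \frac{5}{99} \cdot 4950 = 250$.

These probabilities are far too small for double precision (a direct evaluation that also included the binomial coefficient returned nan), so we compute the natural logarithm instead. This gives $\ln P(G_1) \approx -1136.7$ and $\ln P(G_2) \approx -1283.4$.

In [13]:
def calc_g_n_p_log_probability(n, m, p):
    # Natural log of the probability of one specific graph with n nodes and m edges.
    n_pairs = sp.special.binom(n, 2)
    return m * np.log(p) + (n_pairs - m) * np.log(1 - p)

In [19]:
n, m, p = 100, 300, 5/99
calc_g_n_p_log_probability(n, m, p)

In [20]:
n, m, p = 100, 350, 5/99
calc_g_n_p_log_probability(n, m, p)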
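
The expression with the binomial coefficient $\binom{\binom{n}{2}}{m}$ describes a related but different quantity: the probability that G(n, p) produces any graph with exactly $m$ edges. The sketch below evaluates it in log space with the standard library only, which avoids the overflow and underflow that lead to `nan`. It also computes the expected number of edges $p \binom{n}{2}$. The function name is hypothetical.

```python
import math


def log_prob_edge_count(n, m, p):
    """Natural log of the probability that a G(n, p) graph has exactly m edges."""
    n_pairs = math.comb(n, 2)
    # log of C(n_pairs, m) via log-gamma, so no huge integers or floats appear.
    log_binom = (math.lgamma(n_pairs + 1)
                 - math.lgamma(m + 1)
                 - math.lgamma(n_pairs - m + 1))
    return log_binom + m * math.log(p) + (n_pairs - m) * math.log1p(-p)


n, p = 100, 5 / 99
expected_edges = p * math.comb(n, 2)
print(f"expected number of edges: {expected_edges:.1f}")
for m in (300, 350):
    print(m, log_prob_edge_count(n, m, p))
```

Working with logarithms keeps every intermediate value in a range that floating point can represent; exponentiate only if the final number is known to be representable.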

#### - a G(n, m) model with n = 100 and m = 300?

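In the G(n, m) model every graph with exactly $m = 300$ edges is equally likely and no other graph can occur. $G_1$ has 300 edges and is one of $\binom{\binom{n}{2}}{m}$ such graphs, while $G_2$ has 350 edges and cannot be generated:

$$
\begin{align*}
P(G_1) &= \binom{\binom{100}{2}}{300}^{-1} = \binom{4950}{300}^{-1} \\
P(G_2) &= 0
\end{align*}
$$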
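
A short sketch of this calculation follows, assuming the usual definition of G(n, m) as the uniform distribution over all simple undirected graphs with $n$ nodes and exactly $m$ edges. The function name and argument names are hypothetical. The probability of a microstate with the right number of edges is one over the number of such graphs, which is far too small for a float, so the sketch returns its natural logarithm.

```python
import math


def log_prob_gnm(n, m_model, m_graph):
    """Natural log of the probability of one graph with m_graph edges under G(n, m_model)."""
    if m_graph != m_model:
        # Graphs with a different number of edges are never generated.
        return -math.inf
    n_graphs = math.comb(math.comb(n, 2), m_model)  # exact integer count
    return -math.log(n_graphs)


print(log_prob_gnm(100, 300, 300))  # G1: 300 edges
print(log_prob_gnm(100, 300, 350))  # G2: 350 edges, impossible in this model
```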