# Chance of generating ER from MMSBM given alpha

To generate an MMSBM graph, we sample for each node a vector of probabilities that it acts as a member of each community (its mixed-membership vector, or mm-vector) from a Dirichlet distribution. Therefore, if we choose the Dirichlet concentration parameter alpha so that all nodes are very likely to act as members of the same community, the graph will effectively look like an ER graph.
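
The mechanism can be sketched as follows. This is a minimal sketch assuming the standard directed MMSBM formulation: for every ordered pair $(i, j)$, node $i$ draws the community it acts as from its own mm-vector, node $j$ does the same, and the edge is a Bernoulli draw with the corresponding entry of a block matrix `B`. The function name `sample_mmsbm` and the values in `B` are hypothetical and not taken from any library. The returned fraction counts the ordered pairs in which both endpoints acted as community 1 (index 0 in the code), i.e. the interactions that are indistinguishable from an ER graph with edge probability `B[0, 0]`.

```python
import numpy as np


def sample_mmsbm(mm_vectors, B, rng):
    """Hypothetical sketch of directed MMSBM sampling.

    mm_vectors: (n, k) array, row i is node i's mixed-membership vector.
    B: (k, k) block matrix of edge probabilities.
    Returns the adjacency matrix and the fraction of ordered pairs in which
    both endpoints acted as community index 0 (ER-like interactions).
    """
    n, k = mm_vectors.shape
    A = np.zeros((n, n), dtype=int)
    er_like = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops in this sketch
            z_ij = rng.choice(k, p=mm_vectors[i])  # community i acts as towards j
            z_ji = rng.choice(k, p=mm_vectors[j])  # community j acts as towards i
            A[i, j] = int(rng.random() < B[z_ij, z_ji])
            er_like += int(z_ij == 0 and z_ji == 0)
    return A, er_like / (n * (n - 1))


rng = np.random.default_rng(0)
mm_vectors = rng.dirichlet([100, 1], 20)
B = np.array([[0.3, 0.05], [0.05, 0.3]])  # hypothetical block matrix
A, er_fraction = sample_mmsbm(mm_vectors, B, rng)
print(A.shape, er_fraction)
```

If every mm-vector were exactly `[1, 0]`, `er_fraction` would be 1 and every edge would be a Bernoulli(`B[0, 0]`) draw, which is precisely an ER graph.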

Below we show what these mm-vectors (the $i$-th entry of each vector is the probability that the node acts as a member of community $i$) look like for different values of alpha, in order to estimate the probability of deviating from ER in a 2-block network.
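
In the 2-block case each mm-vector has the form $(p, 1 - p)$, and the first entry of a Dirichlet$(\alpha_1, \alpha_2)$ sample follows a Beta$(\alpha_1, \alpha_2)$ distribution. The short sketch below (variable names are illustrative) draws mm-vectors both ways with NumPy's `Generator.dirichlet` and `Generator.beta` and checks that they give matching summary statistics, so either form can be used to reason about the first entry.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = [100, 1]
n = 100_000

# mm-vectors drawn directly from the Dirichlet distribution
dirichlet_p1 = rng.dirichlet(alpha, n)[:, 0]

# equivalent 2-block construction: p ~ Beta(alpha_1, alpha_2), vector = (p, 1 - p)
beta_p1 = rng.beta(alpha[0], alpha[1], n)
beta_vectors = np.column_stack([beta_p1, 1 - beta_p1])

print(dirichlet_p1.mean(), beta_p1.mean())  # both close to 100 / 101
print(dirichlet_p1.var(), beta_p1.var())
```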

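```python
import numpy as np

rng = np.random.default_rng(12)

n = 20  # number of nodes
alpha = [100, 1]  # Dirichlet concentration for the 2 communities

# one mm-vector per node; each row sums to 1
mm_vectors = rng.dirichlet(alpha, n)

# group nodes by their most likely community (stable sort keeps the original order within a group)
mm_vectors = mm_vectors[np.argsort(mm_vectors.argmax(axis=1), kind="stable")]

# display mm_vectors for 20 nodes
mm_vectors
```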

```
array([[9.99323001e-01, 6.76998558e-04],
       [9.90753471e-01, 9.24652853e-03],
       [9.65837996e-01, 3.41620038e-02],
       [9.98173283e-01, 1.82671679e-03],
       [9.91681905e-01, 8.31809532e-03],
       [9.84418363e-01, 1.55816371e-02],
       [9.97009120e-01, 2.99088032e-03],
       [9.76373039e-01, 2.36269610e-02],
       [9.90030274e-01, 9.96972607e-03],
       [9.96259530e-01, 3.74046974e-03],
       [9.93584249e-01, 6.41575108e-03],
       [9.52506178e-01, 4.74938224e-02],
       [9.99637762e-01, 3.62238298e-04],
       [9.98191876e-01, 1.80812419e-03],
       [9.90280249e-01, 9.71975082e-03],
       [9.89067552e-01, 1.09324477e-02],
       [9.86388653e-01, 1.36113468e-02],
       [9.87857757e-01, 1.21422431e-02],
       [9.91091939e-01, 8.90806076e-03],
       [9.97397775e-01, 2.60222535e-03]])
```

As we can see, the probability for each node to act as a member of community 1 when interacting with other nodes is very high (above 0.95). To get a better idea of how close to 1 this probability is, I will repeat the sampling for 10000 mm-vectors and compute their mean and variance.
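
The "above 0.95" observation can also be checked analytically. With $\alpha = (\alpha_1, 1)$ the first entry follows a Beta$(\alpha_1, 1)$ distribution, whose CDF is $P(p < t) = t^{\alpha_1}$. The sketch below (the threshold 0.95 and the helper names `prob_below` and `prob_any_below` are illustrative choices) computes the chance that a single node, and at least one of $n$ independent nodes, falls below the threshold, and compares the single-node value with a Monte Carlo estimate.

```python
import math

import numpy as np


def prob_below(t, alpha_1):
    """P(p < t) for p ~ Beta(alpha_1, 1), the first entry of Dirichlet([alpha_1, 1])."""
    return t ** alpha_1


def prob_any_below(t, alpha_1, n):
    """Probability that at least one of n independent nodes has p < t."""
    q = prob_below(t, alpha_1)
    # 1 - (1 - q)**n, computed via log1p/expm1 so a tiny q does not round to zero
    return -math.expm1(n * math.log1p(-q))


for alpha_1 in (100, 1000):
    print(alpha_1, prob_below(0.95, alpha_1), prob_any_below(0.95, alpha_1, n=20))

# Monte Carlo check for alpha = [100, 1]
rng = np.random.default_rng(0)
samples = rng.dirichlet([100, 1], 100_000)[:, 0]
empirical = (samples < 0.95).mean()
print(empirical, prob_below(0.95, 100))
```

For `alpha = [100, 1]` a single node falls below 0.95 with probability of about 0.6%, so with 20 nodes there is roughly an 11% chance that at least one does; for `alpha = [1000, 1]` the chance is negligible.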
 

```python
rng = np.random.default_rng(12)

n = 10000
alpha = [100, 1]

mm_vectors = rng.dirichlet(alpha, n)
# sorting does not change the mean or variance; kept for consistency with the previous cell
mm_vectors = mm_vectors[np.argsort(mm_vectors.argmax(axis=1), kind="stable")]

p1 = mm_vectors[:, 0]  # probability of acting as a member of community 1
print(f"the average probability for {n} nodes to act as if belonging to community 1 is: {np.mean(p1)}")
print(f"the variance for {n} nodes to act as if belonging to community 1 is: {np.var(p1)}")
```


```
the average probability for 10000 nodes to act as if belonging to community 1 is: 0.9899309493281928
the variance for 10000 nodes to act as if belonging to community 1 is: 9.904781412502551e-05
```
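
These numbers can be compared with the closed-form moments of the Dirichlet distribution: with $\alpha_0 = \sum_k \alpha_k$, the $i$-th entry has mean $\alpha_i / \alpha_0$ and variance $\alpha_i(\alpha_0 - \alpha_i) / (\alpha_0^2(\alpha_0 + 1))$. A small helper (the name `dirichlet_marginal_moments` is hypothetical) evaluates them.

```python
def dirichlet_marginal_moments(alpha, i=0):
    """Exact mean and variance of entry i of a Dirichlet(alpha) sample."""
    a0 = sum(alpha)
    ai = alpha[i]
    mean = ai / a0
    var = ai * (a0 - ai) / (a0 ** 2 * (a0 + 1))
    return mean, var


mean, var = dirichlet_marginal_moments([100, 1])
print(mean, var)  # mean = 100/101, var = 100 / (101**2 * 102)
```

This gives a mean of about 0.9901 and a variance of about 9.61e-05, close to the simulated 0.98993 and 9.90e-05.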


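Therefore, for `alpha = [100, 1]`, each node acts as a member of community 1 with probability of about 0.99, so both endpoints of an interaction act as community 1 with probability of about 0.99² ≈ 0.98, and the expected deviation from ER is approximately 0.02, i.e. about 2% of interactions.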
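
Assuming the per-pair MMSBM formulation (each endpoint independently draws the community it acts as from its own mm-vector), an interaction is ER-like only when both endpoints act as community 1, so for independent nodes the deviation probability is $1 - \mathbb{E}[p_i]\,\mathbb{E}[p_j] = 1 - (\alpha_1/\alpha_0)^2$. The sketch below (the helper name `deviation_probability` is hypothetical) computes this and checks it with a Monte Carlo estimate.

```python
import numpy as np


def deviation_probability(alpha):
    """1 - P(both endpoints act as community 1) for two independent nodes."""
    mean_p1 = alpha[0] / sum(alpha)
    return 1 - mean_p1 ** 2


rng = np.random.default_rng(0)
alpha = [100, 1]
n_pairs = 200_000

# draw an mm-vector for each endpoint of each pair, then the community each one acts as
p_i = rng.dirichlet(alpha, n_pairs)[:, 0]
p_j = rng.dirichlet(alpha, n_pairs)[:, 0]
both_comm_1 = (rng.random(n_pairs) < p_i) & (rng.random(n_pairs) < p_j)

print(deviation_probability(alpha))  # exact: 1 - (100/101)**2
print(1 - both_comm_1.mean())        # Monte Carlo estimate
```

The exact value is $201/10201 \approx 0.0197$, i.e. roughly 2% of interactions.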

Now I will repeat the same steps for `alpha = [1000, 1]` and `alpha = [10000, 1]`.

```python
n = 10000
alphas = [[1000, 1], [10000, 1]]

# rng is not re-seeded here: it continues from the state left by the previous cell
for alpha in alphas:
    mm_vectors = rng.dirichlet(alpha, n)
    mm_vectors = mm_vectors[np.argsort(mm_vectors.argmax(axis=1), kind="stable")]

    p1 = mm_vectors[:, 0]  # probability of acting as a member of community 1
    print(f"the average probability for {n} nodes to act as if belonging to community 1 with alpha = {alpha} is: {np.mean(p1)}")
    print(f"the variance for {n} nodes to act as if belonging to community 1 with alpha = {alpha} is: {np.var(p1)}")
```

```
the average probability for 10000 nodes to act as if belonging to community 1 with alpha = [1000, 1] is: 0.9990032464966428
the variance for 10000 nodes to act as if belonging to community 1 with alpha = [1000, 1] is: 9.907175854817301e-07
the average probability for 10000 nodes to act as if belonging to community 1 with alpha = [10000, 1] is: 0.9998988923810262
the variance for 10000 nodes to act as if belonging to community 1 with alpha = [10000, 1] is: 1.04743725340356e-08
```
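
To see what these values mean for a whole graph, the expected number of ER-deviating interactions can be computed for each alpha. This sketch assumes the per-pair MMSBM formulation, in which an interaction is ER-like only when both endpoints act as community 1, so the deviation probability is $1 - (\alpha_1/\alpha_0)^2$; the graph size of 100 nodes (9900 ordered pairs) is a hypothetical example, not the size used in the tests or docs.

```python
n_nodes = 100  # hypothetical graph size
n_pairs = n_nodes * (n_nodes - 1)  # ordered pairs, no self-loops

results = {}
for alpha in ([100, 1], [1000, 1], [10000, 1]):
    mean_p1 = alpha[0] / sum(alpha)
    deviation = 1 - mean_p1 ** 2
    results[tuple(alpha)] = deviation
    print(f"alpha = {alpha}: deviation probability {deviation:.2e}, "
          f"expected deviating interactions {deviation * n_pairs:.1f}")
```

Under these assumptions, moving from `[100, 1]` to `[1000, 1]` cuts the expected number of deviating interactions by roughly a factor of ten (about 195 versus about 20 for 100 nodes).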


Therefore, to further reduce the probability of deviating from ER, I will adopt `alpha = [1000, 1]` for both the tests and the doc build. For the tutorial, however, keeping `alpha = [100, 1]` should be fine, since it states that we are only approximating an ER graph.