In [None]:
from datascience import *

import matplotlib
matplotlib.use('Agg')
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib import pyplot, patches
import numpy as np
plt.style.use('fivethirtyeight')

import os
import networkx as nx
from networkx.algorithms import bipartite
from networkx.utils import groups
import pandas as pd
from collections import defaultdict

import zipfile
import requests
import io

# functions to help read in add health data
# (which is assumed to be stored under data/)
import sbms.sbms as sbms


## Stochastic block model

In [None]:
# number of nodes in each group/block
block_sizes = [100, 200, 100, 200]
# stochastic block matrix
block_probs  = [ [0.4, 0.2, 0.2, 0.2],
                 [0.2, 0.4, 0.2, 0.2],
                 [0.2, 0.2, 0.4, 0.2],
                 [0.2, 0.2, 0.2, 0.4] ]

In [None]:
g = nx.stochastic_block_model(block_sizes, block_probs)

The nodes have an attribute with the index of their block:

In [None]:
nx.get_node_attributes(g, 'block')
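
As a quick sanity check on that attribute, the sketch below tallies how many nodes carry each block index. It rebuilds the same model so that it runs on its own; the `seed=0` argument is an assumption added here only for reproducibility and is not part of the cells above.

```python
from collections import Counter

import networkx as nx

block_sizes = [100, 200, 100, 200]
block_probs = [[0.4, 0.2, 0.2, 0.2],
               [0.2, 0.4, 0.2, 0.2],
               [0.2, 0.2, 0.4, 0.2],
               [0.2, 0.2, 0.2, 0.4]]

# seed=0 is an assumption, used only to make this sketch reproducible
g = nx.stochastic_block_model(block_sizes, block_probs, seed=0)

# dictionary mapping node -> block index
block_of = nx.get_node_attributes(g, 'block')

# count how many nodes fall in each block
nodes_per_block = Counter(block_of.values())
print(sorted(nodes_per_block.items()))  # [(0, 100), (1, 200), (2, 100), (3, 200)]
```

The counts should match `block_sizes`, since the block sizes are fixed inputs to the model rather than random quantities.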

Here's a helper function we can use to count the number of edges between the blocks:

In [None]:
sbms.get_block_stats(g)

Let's use this info to check a couple of our formulas. 

In lecture, we said that the expected number of edges from group 0 to itself should be

$$
\frac{n_0~(n_0-1)}{2}~p_{0,0} = \frac{100 \times 99}{2} \times 0.4
$$

In [None]:
((100*99)/2)*.4

That's pretty close to the number of edges within group 0 reported above!
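
To put a number on "pretty close", here is a minimal sketch that counts the edges within block 0 directly and compares the count to the formula. It does not rely on the `sbms` helper; the `seed=0` argument and the variable names are assumptions made for this illustration. Because each of the $n_0(n_0-1)/2$ possible edges appears independently with probability $p_{0,0}$, the count is binomial, so we can also compute its standard deviation.

```python
import networkx as nx

block_sizes = [100, 200, 100, 200]
block_probs = [[0.4, 0.2, 0.2, 0.2],
               [0.2, 0.4, 0.2, 0.2],
               [0.2, 0.2, 0.4, 0.2],
               [0.2, 0.2, 0.2, 0.4]]

# seed=0 is an assumption, used only to make this sketch reproducible
g = nx.stochastic_block_model(block_sizes, block_probs, seed=0)

# nodes whose 'block' attribute is 0
block_of = nx.get_node_attributes(g, 'block')
block0_nodes = [v for v, b in block_of.items() if b == 0]

# observed number of edges with both endpoints in block 0
observed = g.subgraph(block0_nodes).number_of_edges()

# expected count and standard deviation of a Binomial(n0*(n0-1)/2, p00) variable
n0, p00 = block_sizes[0], block_probs[0][0]
n_pairs = n0 * (n0 - 1) / 2
expected = n_pairs * p00
sd = (n_pairs * p00 * (1 - p00)) ** 0.5

print(observed, expected, round(sd, 1))
```

The standard deviation is about 34.5 edges, so an observed count within a few dozen edges of 1980 is what we should expect.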

Let's also check the expected number of edges between, say, group 2 and group 3. This should be

$$
n_2~n_3~p_{2,3} = 100 \times 200 \times 0.2
$$

In [None]:
100*200*0.2

Again, this looks pretty close to the count reported above.
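
The two formulas above can be applied to every pair of blocks at once. The following is a sketch, not the implementation of `sbms.get_block_stats`: it builds the full matrix of expected edge counts with NumPy and tallies the observed counts by looping over the edges. The `seed=0` argument and all variable names are assumptions made for this example.

```python
import numpy as np
import networkx as nx

sizes = np.array([100, 200, 100, 200])
probs = np.array([[0.4, 0.2, 0.2, 0.2],
                  [0.2, 0.4, 0.2, 0.2],
                  [0.2, 0.2, 0.4, 0.2],
                  [0.2, 0.2, 0.2, 0.4]])

# number of possible edges between distinct blocks i and j: n_i * n_j
pairs = np.outer(sizes, sizes).astype(float)
# number of possible edges within block i: n_i * (n_i - 1) / 2
np.fill_diagonal(pairs, sizes * (sizes - 1) / 2)

# expected edge count for each pair of blocks
expected = pairs * probs

# seed=0 is an assumption, used only to make this sketch reproducible
g = nx.stochastic_block_model(sizes.tolist(), probs.tolist(), seed=0)
block_of = nx.get_node_attributes(g, 'block')

# observed edge counts, stored symmetrically
observed = np.zeros((4, 4), dtype=int)
for u, v in g.edges():
    i, j = sorted((block_of[u], block_of[v]))
    observed[i, j] += 1
    observed[j, i] = observed[i, j]

print(expected)
print(observed)
```

Entry `[0, 0]` of `expected` reproduces the within-group calculation for group 0, and entry `[2, 3]` reproduces the between-group calculation for groups 2 and 3.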

Let's see if we can visualize the network in a helpful way:

In [None]:
nx.draw_networkx(g)

It's kind of hard to tell what is going on!

We have a helper function (loaded above) called `draw_adjacency_matrix` which can help visualize the structure of a block model.

Recall that an adjacency matrix is a big table showing which pairs of nodes are connected with an edge. Entry $(i,j)$ of the adjacency matrix is 0 if there is no edge and 1 if there is an edge between nodes $i$ and $j$.

This plot will draw the adjacency matrix as an image, where a dark square means the matrix entry is 1 (i.e., there is an edge) and a light square means the matrix entry is 0 (i.e., there is no edge).

In [None]:
sbms.draw_adjacency_matrix(g)

We can see that there is a greater density of edges within the blocks, which makes sense given the parameters we chose for the block matrix.
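
For intuition about what such a plot involves, here is a minimal sketch of drawing an adjacency matrix as an image. This is not the code behind `sbms.draw_adjacency_matrix` (which may differ in ordering, colors, and annotations); it only assumes that sorting nodes by id keeps each block contiguous, which holds for graphs produced by `nx.stochastic_block_model`. The `seed=0` argument is also an assumption.

```python
import networkx as nx
import matplotlib.pyplot as plt

block_sizes = [100, 200, 100, 200]
block_probs = [[0.4, 0.2, 0.2, 0.2],
               [0.2, 0.4, 0.2, 0.2],
               [0.2, 0.2, 0.4, 0.2],
               [0.2, 0.2, 0.2, 0.4]]

# seed=0 is an assumption, used only to make this sketch reproducible
g = nx.stochastic_block_model(block_sizes, block_probs, seed=0)

# rows and columns ordered by node id, so nodes in the same block sit together
A = nx.to_numpy_array(g, nodelist=sorted(g.nodes()))

# the 'Greys' colormap draws 1 (an edge) as dark and 0 (no edge) as light
fig, ax = plt.subplots(figsize=(6, 6))
ax.imshow(A, cmap='Greys', interpolation='none')
ax.set_xlabel('node j')
ax.set_ylabel('node i')
```

Because the graph is undirected, the matrix is symmetric, so the image mirrors across the diagonal.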

### Another example: community structure

In [None]:
p_w = 0.4 # within-group prob
p_o = 0.01 # outside-group prob

# number of nodes in each group/block
block_sizes = [100, 200, 100, 200]
# stochastic block matrix
block_probs  = [ [p_w, p_o, p_o, p_o],
                 [p_o, p_w, p_o, p_o],
                 [p_o, p_o, p_w, p_o],
                 [p_o, p_o, p_o, p_w] ]

In [None]:
g = nx.stochastic_block_model(block_sizes, block_probs)

In [None]:
sbms.draw_adjacency_matrix(g)

When the blocks have very strong structure like this, researchers will sometimes lay out the network nodes to illustrate it. The `visualize_blocks` helper function can do this for us:

In [None]:
fig, ax = plt.subplots(1, figsize=(15, 20))
sbms.visualize_blocks(g, edge_alpha=.1, node_size=10)

### Disassortative example

In [None]:
p_w = 0.01 # within-group prob
p_o = 0.2 # outside-group prob

# number of nodes in each group/block
block_sizes = [100, 200, 100, 200]
# stochastic block matrix
block_probs  = [ [p_w, p_o, p_o, p_o],
                 [p_o, p_w, p_o, p_o],
                 [p_o, p_o, p_w, p_o],
                 [p_o, p_o, p_o, p_w] ]

In [None]:
g = nx.stochastic_block_model(block_sizes, block_probs)

In [None]:
sbms.draw_adjacency_matrix(g)

In this case, edges are mostly between groups, with very few edges within a group. This case approaches a *multipartite network*: a network whose nodes can be partitioned into groups such that every edge connects nodes in different groups. (A bipartite network is the special case with only two groups.)
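
One way to quantify how close this network is to multipartite is to compute the share of edges that fall within a group; for a strictly multipartite network that share would be exactly 0. The sketch below does this under the same parameters, with `seed=0` and the variable names being assumptions made for this illustration.

```python
import networkx as nx

p_w, p_o = 0.01, 0.2  # within-group and outside-group probabilities
block_sizes = [100, 200, 100, 200]
block_probs = [[p_w if i == j else p_o for j in range(4)] for i in range(4)]

# seed=0 is an assumption, used only to make this sketch reproducible
g = nx.stochastic_block_model(block_sizes, block_probs, seed=0)
block_of = nx.get_node_attributes(g, 'block')

# edges whose two endpoints share a block
within = sum(1 for u, v in g.edges() if block_of[u] == block_of[v])
frac_within = within / g.number_of_edges()

print(f"{within} of {g.number_of_edges()} edges are within a group ({frac_within:.1%})")
```

With these parameters the expected counts are 497 within-group edges and 26,000 between-group edges, so the within-group share should be roughly 2%.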

### Hierarchical example

In [None]:
p_h = 0.3

# number of nodes in each group/block
block_sizes = [100, 200, 200, 200]
# stochastic block matrix
block_probs  = [ [0,   p_h, 0,   0  ],
                 [p_h, 0,   p_h, 0  ],
                 [0,   p_h, 0,   p_h],
                 [0,   0,   p_h, 0  ] ]

In [None]:
g = nx.stochastic_block_model(block_sizes, block_probs)

In [None]:
sbms.draw_adjacency_matrix(g)

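This produces a kind of hierarchical structure: there are no edges within a block, and each block connects only to the blocks next to it in the ordering (0 with 1, 1 with 2, and 2 with 3).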

In [None]:
fig, ax = plt.subplots(1, figsize=(15, 20))
sbms.visualize_blocks(g, edge_alpha=.01, node_size=10)
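
To confirm the structure implied by the block matrix, the sketch below lists which pairs of blocks actually share at least one edge. Only neighboring blocks should appear, and no block should be paired with itself. The `seed=0` argument and variable names are assumptions made for this example.

```python
import networkx as nx

p_h = 0.3
block_sizes = [100, 200, 200, 200]
block_probs = [[0,   p_h, 0,   0  ],
               [p_h, 0,   p_h, 0  ],
               [0,   p_h, 0,   p_h],
               [0,   0,   p_h, 0  ]]

# seed=0 is an assumption, used only to make this sketch reproducible
g = nx.stochastic_block_model(block_sizes, block_probs, seed=0)
block_of = nx.get_node_attributes(g, 'block')

# set of (block, block) pairs that are joined by at least one edge
connected_pairs = {tuple(sorted((block_of[u], block_of[v]))) for u, v in g.edges()}
print(sorted(connected_pairs))  # [(0, 1), (1, 2), (2, 3)]
```

The result is a chain of blocks, which is why the layout produced by `visualize_blocks` looks layered.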

### ER example

Finally, note that if all of the entries in the block matrix are the same, then the model is essentially an Erdős-Rényi random network model.

In [None]:
p_w = 0.1 # within-group prob
p_o = 0.1 # outside-group prob

# number of nodes in each group/block
block_sizes = [100, 200, 100, 200]
# stochastic block matrix
block_probs  = [ [p_w, p_o, p_o, p_o],
                 [p_o, p_w, p_o, p_o],
                 [p_o, p_o, p_w, p_o],
                 [p_o, p_o, p_o, p_w] ]

In [None]:
g = nx.stochastic_block_model(block_sizes, block_probs)

In [None]:
sbms.draw_adjacency_matrix(g)

In this case, the node layout algorithm doesn't work as well. This is a hint that it's hard to find these groups without knowing them in advance. We'll talk about that in a future class!

In [None]:
fig, ax = plt.subplots(1, figsize=(15, 20))
sbms.visualize_blocks(g, edge_alpha=.1, node_size=10)
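
The claim that a constant block matrix behaves like an Erdős-Rényi model can also be checked numerically. In the sketch below, the overall edge density, the within-group density, and the between-group density should all be close to the common probability 0.1, so the group labels carry no information about where edges fall. The `seed=0` argument and variable names are assumptions made for this illustration.

```python
import networkx as nx

p = 0.1  # the same probability for every pair of blocks
block_sizes = [100, 200, 100, 200]
block_probs = [[p] * 4 for _ in range(4)]

# seed=0 is an assumption, used only to make this sketch reproducible
g = nx.stochastic_block_model(block_sizes, block_probs, seed=0)
block_of = nx.get_node_attributes(g, 'block')
n = g.number_of_nodes()

# overall density: edges divided by the n*(n-1)/2 possible edges
overall = nx.density(g)

# split edges and possible pairs into within-group and between-group
within = sum(1 for u, v in g.edges() if block_of[u] == block_of[v])
between = g.number_of_edges() - within
within_pairs = sum(s * (s - 1) // 2 for s in block_sizes)
between_pairs = n * (n - 1) // 2 - within_pairs

within_density = within / within_pairs
between_density = between / between_pairs
print(round(overall, 3), round(within_density, 3), round(between_density, 3))
```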