# Example of Generating Node Embeddings for a Graph from SBM

*By: Aman Barot*

This notebook first samples a graph from the Stochastic Block Model (SBM). It then learns node embeddings for that graph from random walks.

In [1]:
import networkx as nx
import numpy as np
import torch
# Local modules for node2vec: random-walk generation and embedding training
import gen_node2vec_walks
import gen_embeddings

In [2]:
# setting the device
# MPS is for Apple Silicon Macs, CUDA is for NVIDIA GPUs, and CPU is the fallback
device = None
if torch.backends.mps.is_available():
    device = torch.device('mps')
elif torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

## Generating SBM and Node Embeddings

### Generating graph from SBM

In [3]:
n_nodes = 500
n_comm = 5
p_in = 0.5 * (50 * 5/n_nodes)
p_out = 0.01 * (50 * 5/n_nodes)

In [4]:
print(f"p_in: {p_in}, p_out: {p_out}")
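# Adjacent f-strings are concatenated, which avoids the stray spaces that a
# backslash continuation inside a string literal would embed in the output.
print(f"Within comm. degree: {n_nodes/n_comm * p_in}, "
      f"Between comm. degree: {n_nodes/n_comm * p_out * (n_comm - 1)}")

p_in: 0.25, p_out: 0.005
Within comm. degree: 25.0, Between comm. degree: 2.0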


In [5]:
comm_sizes = np.full(n_comm, n_nodes // n_comm)
# Adjust the last community size if n_nodes is not divisible by n_comm
comm_sizes[-1] += n_nodes % n_comm  
comm_sizes = comm_sizes.tolist()
comm_labels = np.full(n_nodes, n_comm - 1)  # Default to last community
for i in range(n_comm - 1):
    # The first n_comm-1 communities will have equal sizes
    comm_labels[i * (n_nodes // n_comm):(i + 1) * (n_nodes // n_comm)] = i
block_matrix = np.full((n_comm, n_comm), p_out)
np.fill_diagonal(block_matrix, p_in)  # modifies the diagonal in place
G = nx.stochastic_block_model(
    sizes=comm_sizes,
    p=block_matrix,
    seed=0,
    directed=False,
    selfloops=False,
)
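
As a sanity check on the expected degrees printed earlier, the empirical within- and between-community degrees can be measured on a sampled graph. The sketch below rebuilds the same SBM under a separate name (`G_check`, a name introduced here for illustration) and relies on the `"block"` node attribute that `nx.stochastic_block_model` attaches to each node. The averages should land near 25 and 2, though the exact values depend on the seed.

```python
import networkx as nx
import numpy as np

# Same parameters as the notebook cells above.
n_nodes, n_comm = 500, 5
p_in, p_out = 0.25, 0.005
sizes = [n_nodes // n_comm] * n_comm
p = np.full((n_comm, n_comm), p_out)
np.fill_diagonal(p, p_in)
G_check = nx.stochastic_block_model(sizes, p, seed=0)

# networkx stores each node's community index in the "block" node attribute.
block = nx.get_node_attributes(G_check, "block")
within = sum(1 for u, v in G_check.edges() if block[u] == block[v])
between = G_check.number_of_edges() - within

# Each edge contributes to the degree of both of its endpoints.
avg_within_degree = 2 * within / n_nodes
avg_between_degree = 2 * between / n_nodes
print(f"avg within degree: {avg_within_degree:.2f}")
print(f"avg between degree: {avg_between_degree:.2f}")
```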

### Generating Node Embeddings

Generating random walks using DeepWalk, i.e., node2vec walks with `p = q = 1`, so each step moves to a uniformly chosen neighbor.

In [6]:
random_walks = gen_node2vec_walks.simulate_walks(
    G=G, 
    num_walks=10, 
    walk_length=80, 
    p=1.0, 
    q=1.0,
    )
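
The internals of `gen_node2vec_walks.simulate_walks` are not shown in this notebook. In standard node2vec, setting `p = q = 1` removes the return and in-out biases, so the walk reduces to a uniform random walk. A minimal sketch of such a walk follows; `uniform_random_walk` and `toy_graph` are hypothetical names used only for illustration, and the function accepts anything indexable by node, including a `networkx` graph.

```python
import random

def uniform_random_walk(G, start, walk_length, rng):
    """Walk of up to `walk_length` nodes, each step to a uniform random neighbor."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = list(G[walk[-1]])
        if not neighbors:  # isolated node: the walk cannot continue
            break
        walk.append(rng.choice(neighbors))
    return walk

# Toy adjacency list (illustrative only); a networkx graph works the same way.
toy_graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
rng = random.Random(0)
walk = uniform_random_walk(toy_graph, start=0, walk_length=8, rng=rng)
print(walk)
```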

Setting the negative sampling distribution to the unigram distribution raised to the power of $0.75$.

In [7]:
word_dist = np.zeros(n_nodes)
for walk in random_walks:
    for word in walk:
        word_dist[word] += 1

word_dist /= np.sum(word_dist) 
word_dist = word_dist ** 0.75
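
To see what the $0.75$ exponent does, the sketch below applies it to toy visit counts (made-up numbers, not taken from the walks above) and draws a few negative samples with NumPy. Note that the vector is renormalized here before sampling, because `rng.choice` requires probabilities that sum to one. Whether `gen_embeddings` renormalizes `word_dist` internally is not shown in this notebook.

```python
import numpy as np

# Toy visit counts for five nodes (illustrative values only).
counts = np.array([8.0, 4.0, 2.0, 1.0, 1.0])
unigram = counts / counts.sum()

# Raising to the 0.75 power flattens the distribution: frequent nodes
# lose probability mass and rare nodes gain some.
smoothed = unigram ** 0.75
neg_probs = smoothed / smoothed.sum()  # renormalize so it sums to 1

rng = np.random.default_rng(0)
negatives = rng.choice(len(neg_probs), size=5, p=neg_probs)
print(np.round(unigram, 3))
print(np.round(neg_probs, 3))
```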

The context window is given by the parameters `t_L` and `t_U`. Below, we collect all co-occurrences within a window of size $10$ (`t_L=1`, `t_U=10`).

In [8]:
train_output = gen_embeddings.gen_embeddings(
    n=n_nodes,
    g_walks=random_walks,
    embed_dim=5,
    t_L=1, t_U=10,
    neg_sample_ct=5,
    device=device,
    batch_size=1024,
    n_epochs=2,
    word_dist=word_dist,
)
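
One plausible reading of `t_L` and `t_U` is that a node at position $i$ in a walk is paired with the nodes at positions $i + t$ for $t_L \le t \le t_U$. This is an assumption: the internals of `gen_embeddings` are not shown here, and the actual window may be symmetric or built differently. Under that assumption, pair extraction could be sketched as follows, with `cooccurrence_pairs` a hypothetical helper.

```python
def cooccurrence_pairs(walk, t_L, t_U):
    """Return (center, context) pairs for forward offsets t_L..t_U (inclusive)."""
    pairs = []
    for i, center in enumerate(walk):
        for t in range(t_L, t_U + 1):
            if i + t < len(walk):  # skip offsets that run past the end of the walk
                pairs.append((center, walk[i + t]))
    return pairs

# Toy walk (illustrative only).
toy_walk = [3, 7, 1, 4]
pairs = cooccurrence_pairs(toy_walk, t_L=1, t_U=2)
print(pairs)  # [(3, 7), (3, 1), (7, 1), (7, 4), (1, 4)]
```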