Tutorial on random-walk based node embeddings using node2vec.
The code is a modified version of the code on page 54 of _Hands-On Graph Neural Networks Using Python_ by Maxime Labonne.

Mike Goodrich
CS 575

---

We'll use the karate graph to explore what kind of information random walks reveal. Let's plot the karate graph, coloring each node by its label. The label comes from the node's `club` attribute, which records the faction the member joined when the club split: the instructor's group ("Mr. Hi") or the officers' group ("Officer").

Let's import the graph and create a list of labels to use when we plot the graph.

In [None]:
"""Visualize random walks versus biased random walks in the Karate graph"""
import networkx as nx
from matplotlib import pyplot as plt
G: nx.Graph = nx.karate_club_graph()
# The karate club split into two groups as members followed 
# Mr. Hi (=0) or one of the club officers (=1)
labels: list[int] = []
for node in G.nodes:
    label = G.nodes[node]['club']
    labels.append(1 if label == 'Officer' else 0)


Create a function that will handle drawing. Since we'll draw a lot, it will be nice to have this function.

In [None]:
def draw_network(G,colors,title="karate network",show_scale = False, show_degree_as_size = False, show_labels = True):
    #plt.figure(figsize=(5,5),dpi=300)
    plt.figure()
    plt.axis('off')
    plt.title(title)
    if show_degree_as_size:
        my_node_size = [v * 10 for v in dict(G.degree).values()]
    else:
        my_node_size = 200
    nx.draw_networkx(G, 
                 #pos=nx.spring_layout(G,seed=0), 
                 pos = nx.nx_pydot.graphviz_layout(G, prog = "neato"),
                 #pos = nx.fruchterman_reingold_layout(G,seed=0),
                 node_color=colors,
                 node_size=my_node_size,
                 cmap='cool',
                 font_size=9,
                 font_color='white',
                 with_labels=show_labels)
    if show_scale:
        sm = plt.cm.ScalarMappable(cmap = 'cool',norm=plt.Normalize(vmin = 0, vmax=max(colors)))
        _ = plt.colorbar(sm, ax=plt.gca())

Draw the karate graph.

In [None]:
draw_network(G,labels)

---

Let's define an unbiased random walk function.

In [None]:
import numpy as np
from typing import Hashable
def unbiased_random_walk(G: nx.Graph,
                         start: Hashable,
                         length: int
                         ) -> list[Hashable]:
    #function for performing uniformly random walk
    walk: list[Hashable] = [start] # starting node
    for _ in range(length):
        neighbors = [node for node in G.neighbors(start)]
        next_node = np.random.choice(neighbors,1)[0]
        walk.append(next_node)
        start = next_node
    return walk
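
A quick way to check a walk function like the one above is to confirm that every consecutive pair of nodes in a walk is joined by an edge. The sketch below does this on a small hand-written adjacency dictionary using only the standard library. The toy graph and the helper names `toy_walk` and `is_valid_walk` are illustrative assumptions, not part of the tutorial's code.

```python
import random

# Hypothetical toy graph stored as an undirected adjacency dictionary.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}

def toy_walk(graph, start, length, rng):
    """Uniform random walk: each step picks a neighbor of the current node uniformly."""
    walk = [start]
    for _ in range(length):
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

def is_valid_walk(graph, walk):
    """True if every consecutive pair of nodes in the walk is connected by an edge."""
    return all(v in graph[u] for u, v in zip(walk, walk[1:]))

rng = random.Random(0)  # seeded so the walk is reproducible
example_walk = toy_walk(toy_graph, "a", 10, rng)
print(example_walk, is_valid_walk(toy_graph, example_walk))
```

A walk of `length` steps contains `length + 1` nodes because the start node is included.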

Let's start by looking at where DeepWalk-style (unbiased) walks go if the walks begin at node 0.

In [None]:
# pg 39 of book
WALK_LENGTH = 10
NUM_TRIALS = 200

node_count_dict = dict.fromkeys(G.nodes,0)
walks = []
for node in G.nodes:
    for _ in range(NUM_TRIALS):
        walk = unbiased_random_walk(G,node,WALK_LENGTH)
        walks.append(walk)

        # Compute frequency of different nodes on random walks that start at node 0
        if node == 0: 
            for visited_node in walk:
                node_count_dict[visited_node] += 1


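Turn the visit counts into a rough estimate of the probability that a node was visited: divide each count by the number of walks and clip the result at 1, since a node can be visited more than once in a single walk. Plot the result.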

In [None]:
def normalize(node_count_dict, num_trials):
    node_probabilities = []
    for node in node_count_dict.keys():
        node_probabilities.append(min(1,node_count_dict[node]/num_trials))
    return node_probabilities

node_probabilities = normalize(node_count_dict, NUM_TRIALS)

draw_network(G,node_probabilities,
             title="Uniform random walk from node 0", 
             show_scale=True)


---


#### Node2vec's biased random walks

In the unbiased random walks, the probability of the next node depended only on the current node. An unbiased random walk is therefore a first order Markov process defined by the transition probability

$$ p(v_{t+1} | v_t) $$
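
For the unbiased walk, this transition probability is uniform over the neighbors of the current node: it equals $1/\deg(v_t)$ if $v_{t+1}$ is adjacent to $v_t$ and $0$ otherwise. Here is a minimal sketch on a hand-written toy graph. The graph and the name `first_order_transition` are assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical toy graph stored as an undirected adjacency dictionary.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}

def first_order_transition(graph, current):
    """Return {next_node: probability} for a uniform random walk at `current`."""
    neighbors = graph[current]
    return {v: Fraction(1, len(neighbors)) for v in neighbors}

probs = first_order_transition(toy_graph, "b")
print(probs)  # each of b's three neighbors gets probability 1/3
```

The probabilities depend only on the current node, which is what makes the process first order.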

By contrast, the node2vec algorithm allows us to decide whether the random walk should be biased toward depth-first exploration or should be biased toward breadth-first exploration.
 - Depth-first exploration tends to spend roughly equal amounts of time at nodes whose _roles_ in the network are similar, e.g., hubs, leaves, etc. Thus, depth-first exploration tends to identify regular equivalences.
 - Breadth-first exploration tends to spend time in nearby node neighborhoods. Thus, breadth-first exploration tends to identify communities as collections of similar nodes, which is a form of homophily.

Node2vec implements this as a second order Markov process defined by the transition probability

$$ p(v_{t+1} | v_t, v_{t-1}) $$

Let's discuss this second order Markov process including how node2vec specifies parameters that allow us to emphasize depth-first or breadth-first random walks.

The following figure is modified from [Figure 2](https://cs.stanford.edu/~jure/pubs/node2vec-kdd16.pdf) in the node2vec paper.

<img src="figures/biased_2nd_order_random_walk.png" width="300" alt="Figure showing a biased random walk">

The transitions in this figure show two parameters, $p$ and $q$. Let's talk about what they mean.

 - The parameter $q$ says how much we want the biased random walk to emphasize depth-first search. Small values of $q$, meaning $0 < q < 1$, make the weight $1/q$ large, so the next node in the random walk tends toward $v_{t+1} = v_{\rm far}$.
 - The parameter $p$ says how much we want the random walk to emphasize breadth-first search. Small values of $p$, meaning $0 < p < 1$, make the weight $1/p$ large, so the random walk tends toward returning to node $v_t$'s parent, $v_{t+1} = v_{t-1}$.
 - The transition between $v_t$ and $v_{\rm nbr}$ has a weight of $1$, which depends on neither parameter. It sends the random walk to a new node in the shared neighborhood rather than back to the parent node. The figure shows that $v_{\rm nbr}$ has to be adjacent to node $v_t$ (solid arrow) and adjacent to the parent node $v_{t-1}$ (dashed line).



Notice that the edge weights do not add up to one, which is a problem if we want to implement the random walk using actual transition probabilities. We fix this by first assigning each vertex an unnormalized weight $p(v)$ using the following pseudocode:
 - for each $v \in V$ where $V$ is the set of vertices
   - if $v = v_{t-1}$: $p(v) = 1/p$
   - else if $v \in v_{\rm nbr}$: $p(v) = 1$
   - else if $v \in v_{\rm far}$: $p(v) = 1/q$
   - else $p(v) = 0$.

Here $v_{\rm nbr}$ denotes the nodes adjacent to both $v_t$ and the parent $v_{t-1}$, and $v_{\rm far}$ denotes the nodes adjacent to $v_t$ but not to the parent. The order of the if-else statements is important. If a vertex is the parent then its weight should be $1/p$. If a vertex is not the parent but is adjacent to both $v_t$ and the parent then its weight should be $1$. If a vertex is adjacent to $v_t$ but is neither the parent nor adjacent to the parent then its weight should be $1/q$. Otherwise, its weight is $0$.


We then normalize to get the transition probability

$$ p(v | v_{t}, v_{t-1}) = \frac{p(v)}{\sum_{u\in V} p(u)} $$
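
To make the normalization concrete, here is a worked numeric sketch on a four-node toy graph. The graph, the parameter values, and the name `second_order_transition` are assumptions chosen for illustration. With `previous = "a"` and `current = "b"`, node `a` is the parent, node `c` is adjacent to both `a` and `b`, and node `d` is adjacent only to `b`.

```python
# Hypothetical toy graph stored as an undirected adjacency dictionary of sets.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}

def second_order_transition(graph, previous, current, p, q):
    """Return {next_node: probability} given the previous and current nodes."""
    weights = {}
    for v in graph[current]:
        if v == previous:
            weights[v] = 1 / p   # step back to the parent
        elif v in graph[previous]:
            weights[v] = 1.0     # adjacent to both the current node and the parent
        else:
            weights[v] = 1 / q   # adjacent to the current node only
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}

# Unnormalized weights are 1/p = 0.5, 1, and 1/q = 2, which sum to 3.5.
probs = second_order_transition(toy_graph, previous="a", current="b", p=2, q=0.5)
for node in sorted(probs):
    print(node, round(probs[node], 4))
```

Nodes that are not adjacent to the current node never appear in the dictionary, which corresponds to the weight of $0$ in the last branch of the pseudocode.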

We implement this in three functions. See each function's docstring for details.

In [None]:
"""Biased random walks ..."""
# pg 55 of book
import numpy as np
import random
random.seed(0)
np.random.seed(0)

def biased_next_node(G,previous,current,p,q):
    ''' Get the probabilities using the parameters and returns the next node '''
    neighbors = list(G.neighbors(current))
    alphas = []
    for neighbor in neighbors:
        if neighbor == previous:
            alpha = 1/p
        elif G.has_edge(neighbor,previous):
            alpha = 1
        else:
            alpha = 1/q
        alphas.append(alpha)
    probs = [alpha/sum(alphas) for alpha in alphas]
    # Avoid shadowing the built-in next()
    next_node = np.random.choice(neighbors, p=probs)
    return next_node

def biased_random_walk(G,start,length,p,q):
    ''' Implement a biased random walk by choosing nodes from the biased_next_node function'''
    #function for performing biased random walk
    walk = [start] # starting node
    for _ in range(length):
        current = walk[-1]
        previous = walk[-2] if len(walk) > 1 else None
        next_node = biased_next_node(G, previous, current, p, q)
        walk.append(next_node)
    return walk

def get_node_probabilities(G,start_node,walk_length,num_trials,p,q):
    ''' Compute how often each node is visited on a biased random walk from start_node'''
    node_count_dict = dict.fromkeys(sorted(G.nodes),0)
    node_probabilities = []
    walks = []
    for _ in range(num_trials):
        walk = biased_random_walk(G,start_node,walk_length,p,q)
        walks.append(walk)
        # Compute frequency of different nodes on random walks that start at start_node
        for visited_node in walk:
            node_count_dict[visited_node] += 1
    # Normalize by the total number of visits (the counts, not the node ids)
    normalizer = sum(node_count_dict.values())
    for node in node_count_dict.keys():
        node_probabilities.append(node_count_dict[node]/normalizer)
    return node_probabilities

Let's take a look at what happens if we bias the walk toward depth-first search. We previously said that depth-first searches tend to discover regular equivalence, so let's see if that's correct. A depth first search has a small $q$ and a large $p$. I'm going to choose long walks to really emphasize the point. I'll also show the size of the node proportional to its degree since degree is a first-order approximation of regular equivalence.

In [None]:
WALK_LENGTH = 30
NUM_TRIALS = 200
p = 10
q = 0.001 # Depth first search
start_node = 0

# Color a node by how often biased random walks from node zero visit it
node_probabilities = get_node_probabilities(G,start_node,WALK_LENGTH,NUM_TRIALS,p,q)
graph_title = f"Biased DFS random walk: p = {p}, q = {q}"
draw_network(G,node_probabilities, title=graph_title, show_scale = True, show_degree_as_size = False)

Notice how the hub node 33 is the node whose value is closest to that of node 0, which is also a hub node. This is a form of regular equivalence.

---

Let's bias the random walk so that it emphasizes breadth first search. This should pull out homophily. 

In [None]:
WALK_LENGTH = 30
NUM_TRIALS = 200
p = 10
q = 1000 # Emphasize breadth first search
start_node = 0

# Color a node by how often biased random walks from node zero visit it
node_probabilities = get_node_probabilities(G,start_node,WALK_LENGTH,NUM_TRIALS,p,q)
graph_title = f"Biased BFS random walk: p = {p}, q = {q}"
draw_network(G,node_probabilities, title=graph_title, show_scale = True)


Notice how the nodes most similar to node 0 are those nodes that are in node 0's neighborhood.
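
To see why the two settings of $q$ behave so differently, it helps to compare the one-step transition probabilities at a single node. The sketch below uses a hand-written toy graph and a hypothetical helper, `second_order_transition`, that applies the same weighting rule as `biased_next_node`. The walk has just moved from `a` to `b`, and node `d` is the only neighbor of `b` that is not adjacent to `a`.

```python
# Hypothetical toy graph stored as an undirected adjacency dictionary of sets.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}

def second_order_transition(graph, previous, current, p, q):
    """Return {next_node: probability} given the previous and current nodes."""
    weights = {}
    for v in graph[current]:
        if v == previous:
            weights[v] = 1 / p   # step back to the parent
        elif v in graph[previous]:
            weights[v] = 1.0     # adjacent to both the current node and the parent
        else:
            weights[v] = 1 / q   # adjacent to the current node only
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}

# Same p in both cases; only q changes, as in the two experiments above.
small_q = second_order_transition(toy_graph, "a", "b", p=10, q=0.001)
large_q = second_order_transition(toy_graph, "a", "b", p=10, q=1000)
print("small q:", {v: round(pr, 4) for v, pr in sorted(small_q.items())})
print("large q:", {v: round(pr, 4) for v, pr in sorted(large_q.items())})
```

With the small $q$ almost all of the probability goes to `d`, the node away from the parent. With the large $q$ most of it goes to `c`, the node that shares the parent's neighborhood.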

---

#### Clustering by homophily

Let's put the pieces together and consider random walks biased towards DFS and random walks biased towards BFS. We'll then do the same kind of clustering that we did in the DeepWalk tutorial. We'll reuse the `biased_random_walk` function defined above.

Let's collect walks starting from each node.

In [None]:
WALK_LENGTH = 30
NUM_TRIALS = 200
p = 10; q = 1000 # emphasizes homophily (neighbors via breadth first search)
walks = []
for node in G.nodes:
    for _ in range(NUM_TRIALS):
        walks.append(biased_random_walk(G,node,WALK_LENGTH,p,q))

Use skip-gram to get an embedding from the set of walks.

In [None]:
from gensim.models.word2vec import Word2Vec

def get_trained_model(walks, walk_length, embedding_dimension):
    model = Word2Vec(walks,
                    hs=1, #softmax = 0, hierarchical softmax = 1
                    sg=1, #skip-gram
                    vector_size=embedding_dimension,
                    window=walk_length,
                    workers=2, negative = 10,
                    alpha = 0.03,
                    seed=0)
    model.train(walks,
                total_examples=model.corpus_count,
                epochs=30, 
                report_delay=1)
    return model
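
Skip-gram treats each walk as a sentence and learns to predict the nodes that appear within a window around each node. The sketch below lists the (center, context) training pairs for one short walk. It is a simplified illustration with a hypothetical helper name, `skipgram_pairs`, and it does not reproduce how gensim samples or weights these pairs internally.

```python
def skipgram_pairs(walk, window):
    """List (center, context) pairs for every position in the walk."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a node is not its own context
                pairs.append((center, walk[j]))
    return pairs

pairs = skipgram_pairs([0, 1, 2, 3], window=1)
print(pairs)
```

In `get_trained_model` the window is set to the walk length, so under this simplified view every node in a walk can serve as context for every other node in that walk.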

In [None]:
embedding_dimension = 64
model = get_trained_model(walks, WALK_LENGTH ,embedding_dimension)


Which nodes are closest to node 0 in the embedding? Note that `most_similar` in Word2Vec ranks nodes by _cosine similarity_, ${\mathbf x}_i^T {\mathbf x}_j / (\|{\mathbf x}_i\| \, \|{\mathbf x}_j\|)$.

In [None]:
target_node = 0
closest: list[tuple] = model.wv.most_similar(target_node)
count = 0
for t in closest:
    print(t)
    count+= 1
    if count == 10: break
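
As a reminder of what the similarity scores mean, cosine similarity is the dot product of two vectors divided by the product of their lengths, so it depends only on the angle between them. Here is a minimal standard-library sketch. The helper name `cosine_similarity` and the example vectors are illustrative.

```python
import math

def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
print(same_direction, orthogonal)
```

Vectors pointing the same way score 1 regardless of their lengths, and orthogonal vectors score 0.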

Since the embedding returned by the skip-gram model (e.g., word2vec) has many dimensions, let's plot it in two dimensions. The function below plots the first two columns of whatever array it is given, and we will pass it a two-dimensional t-SNE projection of the embedding. Other projections might show something different, so treat this as a very rough check on whether clustering might be occurring.

In [None]:
def show_embedding(G, embedding):
    for i, node in enumerate(G.nodes):
        label = G.nodes[node]['club']
        color = ('c' if label == 'Officer' else 'm')
        plt.scatter(embedding[i,0], embedding[i,1],s=100,alpha = 0.8, color = color)
        plt.annotate(node, xy=(embedding[i,0], embedding[i,1]))


Let's actually get the embedding and then show it. Then we can do a visual inspection to see if we think things will cluster well.

In [None]:
from sklearn.manifold import TSNE

# Use t-SNE to compress the 64-dimensional embedding down to two dimensions
X = model.wv[G.nodes]
embedding = TSNE(n_components=2).fit_transform(X)
show_embedding(G, embedding)

It looks like there will be good clustering here. Let's cluster the nodes into a handful of groups. We can then compare the clusters we find to how the karate graph actually split. Hopefully, we'll learn a bit about how well structural/relational similarity matched what happened in the karate club.

In [None]:
# Set up kmeans clustering
from sklearn.cluster import KMeans
def get_clusters(embedding, num_clusters = 4):
    kmeans = KMeans(
        init="random",
        n_clusters=num_clusters,
        n_init=10,
        random_state=1234
        )
    kmeans.fit(embedding)
    return kmeans

kmeans = get_clusters(embedding)
print(kmeans.labels_[:10])

# See how things clustered
labels = kmeans.labels_
draw_network(G, labels, title = "Nodes colored by homophily-biased walk")
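
One simple way to compare the clusters with the actual split is to count how many nodes fall into each combination of cluster and true group. The sketch below does this for hypothetical toy data using only the standard library. The helper name `contingency` and the six toy labels are assumptions made for illustration.

```python
from collections import Counter

def contingency(cluster_labels, true_labels):
    """Count how many nodes fall in each (cluster, true label) combination."""
    return Counter(zip(cluster_labels, true_labels))

# Hypothetical toy data: cluster ids for six nodes and the group each one joined.
toy_clusters = [0, 0, 1, 1, 2, 2]
toy_truth = ["Mr. Hi", "Mr. Hi", "Mr. Hi", "Officer", "Officer", "Officer"]

table = contingency(toy_clusters, toy_truth)
for (cluster, group), count in sorted(table.items()):
    print(f"cluster {cluster}: {count} node(s) from {group}")
```

In this notebook, `kmeans.labels_` and the `club` attribute of each node, read in `G.nodes` order, could be passed in the same way. A cluster whose nodes nearly all come from one group lines up well with the real split.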


---

#### Clustering by regular equivalence

Let's repeat but with more balance between the depth-first and breadth-first searches in node2vec. That should pick up similarity that includes both structural and relational components.

In [None]:
WALK_LENGTH = 30
NUM_TRIALS = 200
p = 0.1; q = 0.001 # emphasizes structure via DFS
walks = []
for node in G.nodes:
    for _ in range(NUM_TRIALS):
        walks.append(biased_random_walk(G,node,WALK_LENGTH,p,q))

embedding_dimension = 64
model = get_trained_model(walks, WALK_LENGTH ,embedding_dimension)

from sklearn.manifold import TSNE
# Use t-SNE to compress the 64-dimensional embedding down to two dimensions
X = model.wv[G.nodes]
embedding = TSNE(n_components=2).fit_transform(X)
show_embedding(G, embedding)


Cluster and plot by cluster. I'm going to use fewer clusters because that illustrates how depth first search emphasizes regular equivalence. I'll also show node size proportional to degree to emphasize regular equivalence.

In [None]:
kmeans = get_clusters(embedding,2)
# See how things clustered
draw_network(G, kmeans.labels_, title = "Nodes colored by DFS-biased walk", show_degree_as_size=True)

Note that there is still strong clustering by local community, but that node 0 and node 33 are clustered together. Note further that I had to really tweak parameters to make this work. Most of the parameter sets had node 0 and node 33 in different clusters.

Try it for a different number of clusters.

In [None]:
kmeans = get_clusters(embedding,3)
# See how things clustered
draw_network(G, kmeans.labels_, title = "Nodes colored by DFS-biased walk", show_degree_as_size=True)



---

#### Applying to the Les Miserables network

Let's see if we can use node2vec to recreate Figure 3 in the [original node2vec paper](https://cs.stanford.edu/~jure/pubs/node2vec-kdd16.pdf).


In [None]:
G = nx.les_miserables_graph()
draw_network(G, colors = 'y', show_degree_as_size=True, show_labels=False, title = "Les Miserables graph")

In [None]:
WALK_LENGTH = 30
NUM_TRIALS = 200
p = 0.1
q = 0.001 # emphasizes structure via DFS
num_clusters = 4
walks = []
for node in G.nodes:
    for _ in range(NUM_TRIALS):
        walks.append(biased_random_walk(G,node,WALK_LENGTH,p,q))

embedding_dimension = 64
model = get_trained_model(walks, WALK_LENGTH ,embedding_dimension)

from sklearn.manifold import TSNE
# Use t-SNE to compress the 64-dimensional embedding down to two dimensions
X = model.wv[G.nodes]
embedding = TSNE(n_components=2).fit_transform(X)
# cluster
kmeans = get_clusters(embedding,num_clusters)

In [None]:
colorlist = ['#e41a1c', '#377eb8', '#4daf4a', '#984ea3', '#ff7f00', '#ffff33', '#a65628']
pos = dict()
labels = []
for i, node in enumerate(G.nodes):
    pos[node] = embedding[i,:]
    color = colorlist[kmeans.labels_[i]]
    labels.append(color)
    plt.scatter(embedding[i,0], embedding[i,1],s=100,alpha = 0.8, color = color)


In [None]:
# See how things clustered
draw_network(G, kmeans.labels_, title = "Nodes colored by DFS-biased walk", show_degree_as_size=True, show_labels=False)

---

#### Now emphasizing structural equivalence/homophily

In [None]:
from sklearn.manifold import TSNE

WALK_LENGTH = 30
NUM_TRIALS = 200
p = 10
q = 100 # emphasizes homophily
num_clusters = 6
walks = []
for node in G.nodes:
    for _ in range(NUM_TRIALS):
        walks.append(biased_random_walk(G,node,WALK_LENGTH,p,q))

embedding_dimension = 64
model = get_trained_model(walks, WALK_LENGTH ,embedding_dimension)


# Use t-SNE to compress the 64-dimensional embedding down to two dimensions
X = model.wv[G.nodes]
embedding = TSNE(n_components=2).fit_transform(X)
# cluster
kmeans = get_clusters(embedding,num_clusters)

In [None]:
colorlist = ['#e41a1c', '#377eb8', '#4daf4a', '#984ea3', '#ff7f00', '#ffff33', '#a65628']
#pos = dict()
labels = []
for i, node in enumerate(G.nodes):
    #pos[node] = embedding[i,:]
    color = colorlist[kmeans.labels_[i]]
    labels.append(color)
    plt.scatter(embedding[i,0], embedding[i,1],s=100,alpha = 0.8, color = color)
    #plt.annotate(node, xy=(embedding[i,0], embedding[i,1]))


In [None]:
# See how things clustered
draw_network(G, kmeans.labels_, title = "Nodes colored by BFS-biased walk", show_degree_as_size=True, show_labels=False)