Introducing Node2vec
- Why biased random walks perform better than unbiased ones, and how to implement
them, is covered in two parts:
    - Defining a **neighborhood**.
    - Introducing biases in random walks.

Sampling strategy:
- One option is to take the three closest nodes in terms of connections, as in breadth-first search (BFS).
- Another option is to first select nodes that are not adjacent to the previously selected nodes, as in depth-first search (DFS).
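
The two strategies above can be sketched on a small example. This is only an illustration: the graph `toy_graph` and the helpers `bfs_neighborhood` and `dfs_neighborhood` are hypothetical names, not part of the notebook, and the neighborhood size of three simply mirrors the note above.

```python
from collections import deque

# Hypothetical toy graph as an adjacency dict (not taken from the notebook).
toy_graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "E"],
    "C": ["A"],
    "D": ["A", "F"],
    "E": ["B", "G"],
    "F": ["D"],
    "G": ["E"],
}

def bfs_neighborhood(graph, source, size):
    """Return the first `size` nodes reached by breadth-first search."""
    visited = {source}
    queue = deque([source])
    found = []
    while queue and len(found) < size:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                found.append(neighbor)
                queue.append(neighbor)
                if len(found) == size:
                    break
    return found

def dfs_neighborhood(graph, source, size):
    """Return the first `size` nodes reached by depth-first search."""
    visited = {source}
    found = []

    def visit(node):
        for neighbor in graph[node]:
            if len(found) == size:
                return
            if neighbor not in visited:
                visited.add(neighbor)
                found.append(neighbor)
                visit(neighbor)

    visit(source)
    return found

print(bfs_neighborhood(toy_graph, "A", 3))  # ['B', 'C', 'D']
print(dfs_neighborhood(toy_graph, "A", 3))  # ['B', 'E', 'G']
```

The BFS-like neighborhood stays next to the source node, while the DFS-like one moves progressively farther away from it.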

In [7]:
# Setup for the Zachary's Karate Club example: imports and fixed seeds

import random

import networkx as nx
import numpy as np

random.seed(0)
np.random.seed(0)

In [8]:
# Undirected random graph with 10 nodes and edge probability 0.3
G = nx.erdos_renyi_graph(10, 0.3, seed=1, directed=False)

In [15]:
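def next_node(previous, current, p, q):
    neighbors = list(G.neighbors(current))

    # Unnormalized weight for each candidate neighbor
    alphas = []
    for neighbor in neighbors:
        if neighbor == previous:
            # Going back to the previous node
            alpha = 1/p
        elif G.has_edge(neighbor, previous):
            # Neighbor is also connected to the previous node
            alpha = 1
        else:
            # Moving away from the previous node. On the first step
            # previous is None, so every neighbor lands here and the
            # choice is uniform.
            alpha = 1/q
        alphas.append(alpha)

    # Normalize the weights into probabilities and sample one neighbor
    total = sum(alphas)
    probs = [alpha/total for alpha in alphas]
    chosen = np.random.choice(neighbors, size=1, p=probs)[0]
    return chosen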
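
For reference, the weights assigned in `next_node` correspond to the standard node2vec search bias, which can be written as follows. The symbols are notation introduced here rather than names from the notebook: t is the previous node, v the current node, x a candidate neighbor of v, d_{tx} the shortest-path distance between t and x, and N(v) the set of neighbors of v. The second expression assumes an unweighted graph, as in the code above.

```latex
\alpha_{pq}(t, x) =
\begin{cases}
\dfrac{1}{p} & \text{if } d_{tx} = 0 \\[4pt]
1            & \text{if } d_{tx} = 1 \\[4pt]
\dfrac{1}{q} & \text{if } d_{tx} = 2
\end{cases}
\qquad
P(x \mid t, v) = \frac{\alpha_{pq}(t, x)}{\sum_{y \in N(v)} \alpha_{pq}(t, y)}
```

A large p makes returning to the previous node less likely, and a large q makes moving away from it less likely.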

In [10]:
def random_walk(start, length, p, q):
    walk = [start]

    for _ in range(length):
        current = walk[-1]
        # There is no previous node on the first step
        previous = walk[-2] if len(walk) > 1 else None
        walk.append(next_node(previous, current, p, q))

    # Node IDs are returned as strings
    return [str(x) for x in walk]

In [16]:
random_walk(0,8,p=1, q=1)

['0', '4', '6', '3', '7', '8', '7', '8', '7']

In [17]:
random_walk(0,8,p=1, q=10)

['0', '4', '9', '0', '1', '0', '9', '4', '9']

In [18]:
random_walk(0,8,p=10, q=1)

['0', '9', '4', '5', '6', '1', '2', '5', '6']
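
Single sampled walks are noisy, so it can help to look at the exact transition probabilities for one step. The following is a minimal sketch under stated assumptions: `toy_graph` and `transition_probs` are hypothetical names, the graph is a hand-made four-node example rather than the random graph above, and the weighting rule copies the one used in `next_node`.

```python
# Hypothetical toy graph (adjacency sets), not the notebook's random graph.
toy_graph = {
    0: {1, 2},
    1: {0, 2, 3},
    2: {0, 1},
    3: {1},
}

def transition_probs(graph, previous, current, p, q):
    """Normalized probabilities for each neighbor of `current`,
    given that the walk arrived from `previous`."""
    weights = {}
    for neighbor in sorted(graph[current]):
        if neighbor == previous:
            weights[neighbor] = 1 / p   # step back to the previous node
        elif neighbor in graph[previous]:
            weights[neighbor] = 1       # stay at distance 1 from previous
        else:
            weights[neighbor] = 1 / q   # move away from previous
    total = sum(weights.values())
    return {node: round(w / total, 3) for node, w in weights.items()}

# The walk has just moved from node 0 to node 1.
for p, q in [(1, 1), (1, 10), (10, 1)]:
    print(f"p={p}, q={q}:", transition_probs(toy_graph, 0, 1, p, q))
```

With p = q = 1 the three neighbors of node 1 are equally likely. With q = 10 the probability of moving away to node 3 drops to about 0.05, and with p = 10 the probability of returning to node 0 drops by the same amount.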

In [19]:
from gensim.models.word2vec import Word2Vec
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

In [20]:
G = nx.karate_club_graph()
