
### Objective: 

In this assignment, implement the Node2Vec algorithm, a random-walk-based node embedding method, to learn node embeddings. Then train a classifier on the learned embeddings to predict node labels.

### Dataset: 

Cora dataset: The dataset consists of 2,708 nodes (scientific publications) with 5,429 edges (citations between publications). Each node has a feature vector of size 1,433, and there are 7 classes (research topics).

### Skeleton Code:

import torch
import torch.nn as nn
import torch.optim as optim
from torch_geometric.datasets import Planetoid
from torch_geometric.utils import to_networkx
from node2vec import Node2Vec  # Importing Node2Vec for the random walk

# Load the Cora dataset
dataset = Planetoid(root='data/Cora', name='Cora')

# Prepare data
data = dataset[0]

# Convert to networkx for random walk
import networkx as nx
G = to_networkx(data, to_undirected=True)

# Node2Vec configuration
node2vec = Node2Vec(G, dimensions=64, walk_length=30, num_walks=200, workers=2) 
model = node2vec.fit(window=10, min_count=1)

# Embeddings for each node
embeddings = model.wv  # Node embeddings

# Define a simple classifier
class Classifier(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(Classifier, self).__init__()
        self.fc = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.fc(x)

# Initialize classifier and optimizer
classifier = Classifier(64, 7)
optimizer = optim.Adam(classifier.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

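# Stack the embeddings into one (num_nodes, 64) tensor once, outside the loop.
# The node2vec package stores node IDs as strings, hence str(i).
x = torch.stack([torch.tensor(embeddings[str(i)]) for i in range(data.num_nodes)])

# Training loop
for epoch in range(100):
    classifier.train()
    optimizer.zero_grad()

    # Forward pass over all node embeddings
    output = classifier(x)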
    
    loss = criterion(output, data.y)
    loss.backward()
    optimizer.step()

    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {loss.item()}')

print("Training complete!")

Generating walks (CPU: 1): 100%|██████████| 100/100 [00:33<00:00,  2.96it/s]
Generating walks (CPU: 2): 100%|██████████| 100/100 [00:34<00:00,  2.87it/s]

Epoch 0, Loss: 1.9926451444625854
Epoch 10, Loss: 1.2331691980361938
Epoch 20, Loss: 0.8968768119812012
Epoch 30, Loss: 0.7473369836807251
Epoch 40, Loss: 0.6756051182746887
Epoch 50, Loss: 0.6319088935852051
Epoch 60, Loss: 0.6023977398872375
Epoch 70, Loss: 0.5803974270820618
Epoch 80, Loss: 0.5632628798484802
Epoch 90, Loss: 0.549415647983551
Training complete!


## Explanation:
Node2Vec generates node embeddings by simulating random walks on the graph. These walks capture structural properties of nodes.
The generated embeddings are then used to train a classifier for predicting node labels.
The Cora dataset is a benchmark graph where nodes are papers and edges are citations.
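
To make the random-walk idea concrete, here is a minimal pure-Python sketch of one second-order biased walk of the kind Node2Vec uses. It is an illustration under assumptions, not the `node2vec` package's implementation: the function name `node2vec_walk`, the toy adjacency dict, and the values of `p` and `q` are all hypothetical (the skeleton code does not set `p` or `q`).

```python
import random

def node2vec_walk(adj, start, walk_length, p=1.0, q=1.0, rng=None):
    """Simulate one second-order biased walk over an adjacency dict."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break  # dead end: no neighbor to move to
        if len(walk) == 1:
            # First step has no previous node, so pick uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay one hop away from prev
            else:
                weights.append(1.0 / q)  # move farther away from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

# Toy undirected graph (hypothetical, not Cora).
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1, 4}, 4: {3}}
walk = node2vec_walk(adj, start=0, walk_length=8, p=1.0, q=0.5,
                     rng=random.Random(0))
print(walk)
```

In this sketch, a small `q` pushes the walk outward (closer to depth-first exploration), while a small `p` makes it return to the previous node more often.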

## Questions (1 point each):
1. What would happen if we increased the number of walks (num_walks) per node? How might this affect the learned embeddings?
2. What would happen if we reduced the walk length (walk_length)? How would this influence the structural information captured by the embeddings?
3. What would happen if we used directed edges instead of undirected edges for the random walks?
4. What would happen if we added more features to the nodes (e.g., 2000-dimensional features instead of 1433)?
5. What would happen if we used a different dataset with more classes? Would the classifier performance change significantly?
6. What would happen if we used a larger embedding dimension (e.g., 128 instead of 64)? How would this affect the model’s performance and training time?



### Extra credit: 
1. What would happen if we increased the window size (window) for the skip-gram model? How would it affect the embedding quality?

## No points, just for you to think about
7. What would happen if we removed self-loops from the graph before training Node2Vec?

9. What would happen if we applied normalization to the node embeddings before feeding them to the classifier?

**Answers To Questions**

1. What would happen if we increased the number of walks (num_walks) per node? How might this affect the learned embeddings?

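Increasing num_walks starts more walks from every node, so the skip-gram model trains on more (node, context) pairs. Walk generation and training time grow roughly linearly with num_walks. The embeddings generally become more stable and generalize better, because each node's neighborhood is sampled more thoroughly. The benefit levels off once the neighborhoods are well covered, and very large values may make the embeddings overfit to this particular graph.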
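
As a rough illustration of the cost, the size of the walk corpus can be computed directly from the settings in the skeleton code (2,708 nodes, walk_length=30). The helper name `walk_corpus_size` is hypothetical, and the token count is an upper bound that assumes no walk ends early.

```python
def walk_corpus_size(num_nodes, num_walks, walk_length):
    """Return (number of walks, upper bound on node visits)."""
    total_walks = num_nodes * num_walks
    max_tokens = total_walks * walk_length  # each walk has at most walk_length nodes
    return total_walks, max_tokens

for num_walks in (100, 200, 400):
    walks, tokens = walk_corpus_size(2708, num_walks, 30)
    print(f"num_walks={num_walks}: {walks:,} walks, up to {tokens:,} node visits")
```

Doubling num_walks doubles both numbers, which is why walk generation and skip-gram training time scale about linearly with it.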

2. What would happen if we reduced the walk length (walk_length)? How would this influence the structural information captured by the embeddings?

Reducing walk_length limits how far a walk can travel from its starting node, so the embeddings capture less structural information. Nodes that are several hops apart rarely or never appear in the same walk, so some connections between nodes are not identified at all. The embeddings then reflect mostly local neighborhoods rather than global structure such as communities.
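
The bound on reach can be checked with a small sketch: a walk containing walk_length nodes takes walk_length - 1 steps, so it can never leave the (walk_length - 1)-hop neighborhood of its start. The ring graph and the helper `nodes_within_hops` below are hypothetical and used only for illustration.

```python
from collections import deque

def nodes_within_hops(adj, start, max_hops):
    """Breadth-first search limited to max_hops steps from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)

# Toy ring graph with 20 nodes (hypothetical, not Cora).
n = 20
adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

for walk_length in (3, 5, 10):
    reachable = nodes_within_hops(adj, 0, walk_length - 1)
    print(f"walk_length={walk_length}: at most {len(reachable)} of {n} nodes can appear in a walk from node 0")
```

On this ring, even walk_length=10 cannot reach the node directly opposite the start, so that pair never co-occurs in a walk.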

3. What would happen if we used directed edges instead of undirected edges for the random walks?

With directed edges, a walk can only follow an edge in its direction, a restriction that undirected edges do not impose. Some nodes can no longer reach others, and a walk stops early when it hits a node with no outgoing edges. Walks also concentrate on the paths that remain reachable, so nodes along those paths are visited more often and carry more weight in the learned embeddings.
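
A toy example makes the difference visible. The sketch below uses a hypothetical four-paper citation chain and a uniform random walk (no p/q bias); `build_adj` and `uniform_walk` are illustrative helpers, not part of any library.

```python
import random

def build_adj(edges, directed):
    """Build an adjacency dict; add reverse edges when undirected."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set())
        if not directed:
            adj[v].add(u)
    return adj

def uniform_walk(adj, start, walk_length, rng):
    """Walk to a uniformly chosen out-neighbor until stuck or long enough."""
    walk = [start]
    while len(walk) < walk_length:
        nbrs = sorted(adj[walk[-1]])
        if not nbrs:
            break  # no outgoing edge: the walk ends early
        walk.append(rng.choice(nbrs))
    return walk

# Hypothetical citation chain: paper 0 cites 1, 1 cites 2, 2 cites 3.
edges = [(0, 1), (1, 2), (2, 3)]
rng = random.Random(0)

directed_walk = uniform_walk(build_adj(edges, directed=True), 0, 10, rng)
undirected_walk = uniform_walk(build_adj(edges, directed=False), 0, 10, rng)
stuck_walk = uniform_walk(build_adj(edges, directed=True), 3, 10, rng)

print("directed:  ", directed_walk)
print("undirected:", undirected_walk)
print("from sink: ", stuck_walk)
```

The directed walk stops after four nodes, and a walk started at the uncited end of the chain never leaves its start node, so that node gets almost no context to learn from.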

4. What would happen if we added more features to the nodes (e.g., 2000-dimensional features instead of 1433)?

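In this pipeline, adding node features would not change anything: Node2Vec learns the embeddings from the graph structure alone, and the classifier receives only those embeddings, so the feature matrix (data.x) is never used. Extra features would matter only if they were passed to the classifier, for example concatenated with the embeddings, or if a feature-aware graph neural network were used. In that case, the loss should decrease, provided the additional features carry useful information.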

5. What would happen if we used a different dataset with more classes? Would the classifier performance change significantly?

Performance would depend on how well separated the classes of the new dataset are and on how much labeled data is available per class. More classes make the classification task harder, because the classifier must separate more groups in the same embedding space, so prediction accuracy would likely decrease. How large the change is depends on the dataset.
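
One way to see why more classes make the task harder is to look at the chance-level baseline. A classifier that assigns equal probability to every class has accuracy 1/C in expectation and cross-entropy loss ln(C). The sketch below computes both; the class counts other than 7 are hypothetical, and `chance_level` is an illustrative helper.

```python
import math

def chance_level(num_classes):
    """Expected accuracy and cross-entropy of a uniform prediction."""
    accuracy = 1.0 / num_classes
    loss = math.log(num_classes)  # -log(1 / num_classes)
    return accuracy, loss

for c in (7, 20, 70):
    acc, loss = chance_level(c)
    print(f"{c} classes: chance accuracy {acc:.3f}, uniform-prediction loss {loss:.3f}")
```

For 7 classes the uniform-prediction loss is about 1.946, close to the loss of roughly 1.99 printed at epoch 0 in the training output, which is what an untrained classifier would be expected to produce.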

6. What would happen if we used a larger embedding dimension (e.g., 128 instead of 64)? How would this affect the model’s performance and training time?

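A larger embedding dimension gives the embeddings more capacity, which can improve performance if 64 dimensions are too few to encode the graph structure. The gain is not guaranteed: past a certain size the extra dimensions add little and can encourage overfitting. Training time and memory use both increase, since the skip-gram model and the classifier (whose input_dim must change from 64 to 128) have more parameters to update and store.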
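
The memory side of this trade-off is simple arithmetic. The sketch below counts the entries in one node embedding table and the parameters of the linear classifier from the skeleton code; the helper names are hypothetical, and the skip-gram model may keep more than one such table internally, so treat the embedding count as a lower bound.

```python
def embedding_table_size(num_nodes, dim):
    """Number of floats in one node-embedding table."""
    return num_nodes * dim

def linear_classifier_params(input_dim, num_classes):
    """Weights plus biases of a single nn.Linear-style layer."""
    return input_dim * num_classes + num_classes

for dim in (64, 128):
    table = embedding_table_size(2708, dim)
    clf = linear_classifier_params(dim, 7)
    print(f"dim={dim}: {table:,} embedding values, {clf} classifier parameters")
```

Both counts roughly double when the dimension goes from 64 to 128, while the number of walks and training pairs stays the same.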

**Answers to Extra Credit**

1. What would happen if we increased the window size (window) for the skip-gram model? How would it affect the embedding quality?

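Increasing the window size makes the skip-gram model treat nodes that are farther apart in a walk as context for each other. The embeddings then capture broader, community-level context and become less specific to a node's immediate neighbors. Training also takes longer because each walk yields more (node, context) pairs. The window cannot usefully exceed walk_length, since a walk contains no more nodes than that.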
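
The effect of the window can be sketched by enumerating the (center, context) pairs that a single walk produces. This is a simplified illustration: `skipgram_pairs` is a hypothetical helper that always uses the full window, whereas a real Word2Vec implementation may sample a smaller effective window per position, so actual pair counts can be lower.

```python
def skipgram_pairs(walk, window):
    """List every (center, context) pair within `window` positions."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs

# Hypothetical walk that visits 30 distinct nodes in order,
# so the node ID difference equals the distance along the walk.
walk = list(range(30))

for window in (2, 5, 10):
    pairs = skipgram_pairs(walk, window)
    max_gap = max(abs(a - b) for a, b in pairs)
    print(f"window={window}: {len(pairs)} pairs, farthest context is {max_gap} steps away")
```

A larger window both increases the number of training pairs per walk and pairs each node with nodes that are more steps away along the walk, which is the source of the broader but less specific context.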