# ACM Summer School 2021

## Lab: Geometric Deep Learning - Graph Convolutional Network on the Cora Dataset with PyTorch

## Presenter : Aditya Intwala

![Status](https://img.shields.io/static/v1.svg?label=Status&message=Finished&color=green)

**Exercise notebook:** 
[![View on Github](https://img.shields.io/static/v1.svg?logo=github&label=Repo&message=View%20On%20Github&color=lightgrey)](https://github.com/adityaintwala/ACM-GeometricDeepLearning-2021/blob/main/Lab1_DeepLearning_Introduction_to_PyTorch.ipynb)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/adityaintwala/ACM-GeometricDeepLearning-2021/blob/main/Lab1_DeepLearning_Introduction_to_PyTorch.ipynb)

**Solution notebook:** 
[![View on Github](https://img.shields.io/static/v1.svg?logo=github&label=Repo&message=View%20On%20Github&color=lightgrey)](https://github.com/adityaintwala/ACM-GeometricDeepLearning-2021/blob/main/Solution/Lab3_DeepLearning_GCN_On_Cora_PyTorch.ipynb)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/adityaintwala/ACM-GeometricDeepLearning-2021/blob/main/Solution/Lab3_DeepLearning_GCN_On_Cora_PyTorch.ipynb)  

**Solution python file:** 
[![View on Github](https://img.shields.io/static/v1.svg?logo=github&label=Repo&message=View%20On%20Github&color=lightgrey)](https://github.com/adityaintwala/ACM-GeometricDeepLearning-2021/blob/main/Lab3_DeepLearning_GCN_On_Cora_PyTorch.py)

*This is an empty version of the notebook for the Lab of Geometric Deep Learning session at ACM Summer School 2021.*

*Try to fill in the blanks indicated by `## TODO` comments and `...` parts in the code*

In this tutorial, we will implement a custom Graph Convolutional Network (GCN) built on our own graph convolution layer and use it for a node classification task on the Cora dataset. We will also use the `GCNConv` layer that ships with PyTorch Geometric on the same dataset and compare the results.

Below, we will start by importing our standard libraries.

In [1]:
## Standard libraries
import os
import math
import numpy as np 
import time

## Imports for plotting
import matplotlib.pyplot as plt
%matplotlib inline 
from matplotlib.colors import to_rgba
import seaborn as sns
sns.set()

import networkx as nx

## PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data as data
import torch.optim as optim

# torch geometric
try: 
    import torch_geometric
except ModuleNotFoundError:
    # You might need to install those packages with specific CUDA+PyTorch version. 
    # See https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html for details 
    !pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
    !pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
    !pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
    !pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
    !pip install torch-geometric
    import torch_geometric
from torch_geometric.nn import MessagePassing, GCNConv
from torch_geometric.utils import add_self_loops, degree
from torch_geometric.datasets import Planetoid #for Cora dataset

## Cora Dataset

<center width="100%" style="padding:10px"><img src="https://relational.fit.cvut.cz/assets/img/datasets-generated/CORA.svg?raw=1" width="500px"></center>
The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words. More information about the dataset is available at [Cora](https://relational.fit.cvut.cz/dataset/CORA).
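
To make the 0/1-valued word vector concrete, here is a minimal sketch in plain Python. The five-word `vocabulary`, the `to_binary_vector` helper and the sample text are made up for illustration; in Cora the dictionary has 1433 entries, so each publication is described by a vector of length 1433.

```python
# Hypothetical 5-word dictionary (Cora's real dictionary has 1433 words).
vocabulary = ["graph", "neural", "network", "kernel", "bayesian"]


def to_binary_vector(text, vocabulary):
    """Return 1 for every dictionary word present in the text, else 0."""
    words = set(text.lower().split())
    return [1 if word in words else 0 for word in vocabulary]


paper = "A neural network for graph data"  # made-up publication text
vector = to_binary_vector(paper, vocabulary)
print(vector)  # [1, 1, 1, 0, 0]
```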

## Custom Graph Convolution Networks

### Custom Graph Convolution Layer

Graph Convolutional Networks were introduced by [Kipf et al.](https://openreview.net/pdf?id=SJU4ayYgl) in 2016 at the University of Amsterdam. Kipf also wrote a great [blog post](https://tkipf.github.io/graph-convolutional-networks/) about this topic, which is recommended if you want to read about GCNs from a different perspective. GCNs are similar to convolutions in images in the sense that the "filter" parameters are typically shared over all locations in the graph. At the same time, GCNs rely on message passing methods, which means that vertices exchange information with their neighbors by sending "messages" to each other. Before looking at the math, we can try to visually understand how GCNs work. The first step is that each node creates a feature vector that represents the message it wants to send to all its neighbors. In the second step, the messages are sent to the neighbors, so that a node receives one message per adjacent node. Below we have visualized the two steps for our example graph.

<center width="100%" style="padding:10px"><img src="https://github.com/phlippe/uvadlc_notebooks/blob/master/docs/tutorial_notebooks/tutorial7/graph_message_passing.svg?raw=1" width="700px"></center>
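
The two steps described above can also be sketched in plain Python. This is only an illustration under simplifying assumptions: the 4-node graph, the feature values and the helper `message_passing_step` are hypothetical, and the message is the raw feature vector without any learned weights.

```python
# Toy undirected graph as an adjacency list (hypothetical, not the Cora graph).
neighbors = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
# One 2-dimensional feature vector per node.
features = {0: [0.0, 1.0], 1: [2.0, 3.0], 2: [4.0, 5.0], 3: [6.0, 7.0]}


def message_passing_step(neighbors, features, aggregate="mean"):
    """Run one round of message passing and return the new node features."""
    # Step 1: every node creates the message it will send. Here the message
    # is simply a copy of the node's current feature vector.
    messages = {node: list(vec) for node, vec in features.items()}

    updated = {}
    for node, adjacent in neighbors.items():
        # Step 2: the node receives one message per adjacent node.
        received = [messages[other] for other in adjacent]
        # Combine the messages with an operation that works for any count.
        combined = [sum(values) for values in zip(*received)]
        if aggregate == "mean":
            combined = [value / len(received) for value in combined]
        updated[node] = combined
    return updated


mean_features = message_passing_step(neighbors, features)
sum_features = message_passing_step(neighbors, features, aggregate="sum")
print(mean_features[0])  # node 0 has a single neighbor, node 1: [2.0, 3.0]
```

Note that this sketch does not include the node's own features in the update; the GCN layer handles that by adding self-loops.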

If we want to formulate that in more mathematical terms, we first need to decide how to combine all the messages a node receives. As the number of messages varies across nodes, we need an operation that works for any number of inputs. Hence, the usual way to go is to sum or take the mean. Given the previous features of nodes $H^{(l)}$, the GCN layer is defined as follows:

$$H^{(l+1)} = \sigma\left(\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2}H^{(l)}W^{(l)}\right)$$

$W^{(l)}$ is the weight matrix with which we transform the input features into messages ($H^{(l)}W^{(l)}$). To the adjacency matrix $A$ we add the identity matrix so that each node also sends its own message to itself: $\hat{A}=A+I$. Finally, to take a (symmetrically normalized) average instead of a sum, we calculate the matrix $\hat{D}$, which is a diagonal matrix with $\hat{D}_{ii}$ denoting the number of neighbors node $i$ has in $\hat{A}$, i.e. including the self-loop. $\sigma$ represents an arbitrary activation function, and not necessarily the sigmoid (usually a ReLU-based activation function is used in GNNs).
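
As a worked example of the formula, the following NumPy sketch evaluates one GCN layer on a small dense graph. The 4-node graph, the feature values and the identity weight matrix are assumptions chosen so the arithmetic stays easy to follow; it is not the sparse implementation the exercise asks for.

```python
import numpy as np

# Toy graph with undirected edges 0-1, 1-2, 1-3, 2-3 (illustrative only).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
# Node features H^(l): one row per node, two features each.
H = np.array([[0, 1], [2, 3], [4, 5], [6, 7]], dtype=float)
# Identity weights, so only the effect of the normalization is visible.
W = np.eye(2)

A_hat = A + np.eye(4)                      # A_hat = A + I (self-loops)
deg = A_hat.sum(axis=1)                    # diagonal of D_hat
D_inv_sqrt = np.diag(deg ** -0.5)          # D_hat^(-1/2)
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # D_hat^(-1/2) A_hat D_hat^(-1/2)
H_next = np.maximum(A_norm @ H @ W, 0.0)   # sigma = ReLU

print(deg)  # [2. 4. 3. 3.]
print(H_next)
```

Each entry of `A_norm` equals $1/\sqrt{\hat{D}_{ii}\hat{D}_{jj}}$ for connected nodes $i$ and $j$, which is the per-edge coefficient a sparse implementation would compute.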

In [2]:
class GCNLayer(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super(GCNLayer, self).__init__(aggr='add')  # "Add" aggregation
        ## TODO: Add a linear layer with in_channels as input dim and out_channels as output dim

    def forward(self, x, edge_index):
        ## TODO: Step 1: Add self-loops
        edge_index, _ = ...

        ## TODO: Step 2: Multiply with weights
        x = ...

        ## TODO: Step 3: Calculate the normalization
        
        
        
        
        norm = ...

        # Step 4: Propagate the embeddings to the next layer
        return self.propagate(edge_index, size=(x.size(0), x.size(0)), x=x, norm=norm)

    def message(self, x_j, norm):
        # Normalize node features.
        return norm.view(-1, 1) * x_j
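
Before filling in the blanks, it can help to see what steps 1 and 3 compute on an edge list. The sketch below uses plain Python lists instead of tensors and a hypothetical 4-node graph; it only illustrates the arithmetic and is not the PyTorch Geometric code the TODOs expect.

```python
import math

# Edge list in COO layout, like `edge_index`: row 0 holds source nodes and
# row 1 holds target nodes. Undirected edges appear in both directions.
edge_index = [[0, 1, 1, 2, 1, 3, 2, 3],
              [1, 0, 2, 1, 3, 1, 3, 2]]
num_nodes = 4

# Step 1: add one self-loop (i, i) per node.
sources = edge_index[0] + list(range(num_nodes))
targets = edge_index[1] + list(range(num_nodes))

# Step 3a: degree of each node, counted over incoming edges (self-loop included).
degree = [0] * num_nodes
for target in targets:
    degree[target] += 1

# Step 3b: one coefficient per edge, 1 / sqrt(deg(source) * deg(target)).
norm = [1.0 / math.sqrt(degree[s] * degree[t]) for s, t in zip(sources, targets)]

print(degree)  # [2, 4, 3, 3]
```

The list `norm` has one entry per edge, which matches how `message` multiplies `norm.view(-1, 1)` with the per-edge source features `x_j`.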

Create a GCN network using PyTorch's `nn.Module` and our custom `GCNLayer`.

In [3]:
class GCNNetwork(torch.nn.Module):
    def __init__(self, dataset):
        super(GCNNetwork, self).__init__()
        ## TODO: Use two GCNLayer, output of first should be 16
        self.conv1 = ...
        self.conv2 = ...

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        ## TODO: Stack layers Conv1, ReLU,Dropout,Conv2
        x = ...
        x = ...
        x = ...
        x = ...

        return F.log_softmax(x, dim=1)


## PyTorch Graph Convolution Networks

In [5]:
class GCNNetworkPytorch(torch.nn.Module):
    def __init__(self,dataset):
        super(GCNNetworkPytorch, self).__init__()
        ## TODO: Use two GCNConv layers, output of first should be 16
        self.conv1 = ...
        self.conv2 = ...
        

    def forward(self,data):
        x, edge_index, edge_weight = data.x, data.edge_index, data.edge_attr
        ## TODO: Stack layers Conv1, ReLU,Dropout,Conv2
        x = ...
        x = ...
        x = ...
        return F.log_softmax(x, dim=1)

## Training and Testing

In [6]:
def plot_dataset(dataset):
    edges_raw = dataset.data.edge_index.numpy()
    edges = [(x, y) for x, y in zip(edges_raw[0, :], edges_raw[1, :])]
    labels = dataset.data.y.numpy()

    G = nx.Graph()
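    # Add one node per label so that the color list lines up with the nodes,
    # even if the node with the highest id or an isolated node has no edge.
    G.add_nodes_from(range(len(labels)))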
    G.add_edges_from(edges)
    plt.subplot(111)
    options = { 'node_size': 1, 'width': 0.2 }
    nx.draw(G, with_labels=False, node_color=labels.tolist(), cmap=plt.cm.tab10, **options)
    plt.show()


In [None]:
def train(model, optimizer, data):
    train_accuracies, test_accuracies = list(), list()
    for epoch in range(100):
        model.train()
        optimizer.zero_grad()
        out = model(data)
        loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
        loss.backward()
        optimizer.step()

        train_acc = test(model, data)
        test_acc = test(model, data, train=False)

        train_accuracies.append(train_acc)
        test_accuracies.append(test_acc)
        print('Epoch: {:03d}, Loss: {:.5f}, Train Acc: {:.5f}, Test Acc: {:.5f}'.
              format(epoch, loss, train_acc, test_acc))

    return train_accuracies, test_accuracies
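
The training loop pairs the model's `log_softmax` output with `F.nll_loss`. As a rough sketch of what that combination computes, here is the same arithmetic in plain Python for two nodes and three classes. The helper functions and the scores are hypothetical stand-ins, not the PyTorch implementations.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax for one node's class scores."""
    shift = max(logits)
    log_sum = shift + math.log(sum(math.exp(v - shift) for v in logits))
    return [v - log_sum for v in logits]


def nll_loss(log_probs, targets):
    """Mean negative log-probability assigned to each node's true class."""
    picked = [row[target] for row, target in zip(log_probs, targets)]
    return -sum(picked) / len(picked)


# Made-up raw scores for two nodes and three classes, with their true labels.
logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
labels = [0, 2]

log_probs = [log_softmax(row) for row in logits]
loss = nll_loss(log_probs, labels)
print(loss)
```

Together the two functions amount to the cross-entropy loss; restricting both arguments with `data.train_mask`, as the loop does, means only the training nodes contribute to it.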

In [None]:
def test(model, data, train=True):
    model.eval()

    correct = 0
    pred = model(data).max(dim=1)[1]

    if train:
        correct += pred[data.train_mask].eq(data.y[data.train_mask]).sum().item()
        return correct / (len(data.y[data.train_mask]))
    else:
        correct += pred[data.test_mask].eq(data.y[data.test_mask]).sum().item()
        return correct / (len(data.y[data.test_mask]))
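
The role of the boolean masks in `test` can be illustrated without tensors. In this minimal sketch the function `masked_accuracy` and all values are hypothetical; a mask simply selects which nodes count towards the accuracy.

```python
def masked_accuracy(predictions, labels, mask):
    """Fraction of correct predictions among the nodes selected by the mask."""
    selected = [(p, y) for p, y, keep in zip(predictions, labels, mask) if keep]
    correct = sum(1 for p, y in selected if p == y)
    return correct / len(selected)


# Made-up predictions and labels for a graph with five nodes.
predictions = [0, 2, 1, 1, 0]
labels = [0, 1, 1, 1, 2]
train_mask = [True, True, False, False, False]
test_mask = [False, False, True, True, True]

train_acc = masked_accuracy(predictions, labels, train_mask)
test_acc = masked_accuracy(predictions, labels, test_mask)
print(train_acc)  # 0.5
```

The model always sees the whole graph in the forward pass; the masks only decide which nodes enter the loss or the accuracy.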

## Cora Example

Download or load the Cora dataset using the `Planetoid` dataset class from `torch_geometric`.

In [None]:
print('========================== Downloading / Loading Cora Dataset ==========================')
## TODO: Load Cora dataset using Planetoid
dataset = ...

#plot the dataset
plot_dataset(dataset)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


In [None]:
print('========================== GCNNetwork with our custom GCNLayer ==========================')
## TODO: Create GCNNetwork
customGCN = ...
#assign data to device for training
data = dataset[0].to(device)
#create optimizer for training
optimizer = torch.optim.Adam(customGCN.parameters(), lr=0.01, weight_decay=5e-4)
print('============= Training & Evaluation =============')
customGCN_train_accuracies, customGCN_test_accuracies = train(customGCN, optimizer, data)

In [None]:
print('========================== GCNNetworkPytorch with pre-implemented GCNConv Layer ==========================')
## TODO: Create GCNNetworkPytorch
pytorchGCN = ...
#assign data to device for training
data = dataset[0].to(device)
#create optimizer for training
optimizer = torch.optim.Adam([ dict(params=pytorchGCN.conv1.parameters(), weight_decay=5e-4), dict(params=pytorchGCN.conv2.parameters(), weight_decay=0)], lr=0.01)  # Only perform weight-decay on first convolution.
print('============= Training & Evaluation =============')
pytorchGCN_train_accuracies, pytorchGCN_test_accuracies = train(pytorchGCN, optimizer, data)

Plot the training and testing accuracies of both models for comparison.

In [None]:
plt.plot(customGCN_train_accuracies, label="customGCN Train accuracy")
plt.plot(customGCN_test_accuracies, label="customGCN Test accuracy")
plt.plot(pytorchGCN_train_accuracies, label="pytorchGCN Train accuracy")
plt.plot(pytorchGCN_test_accuracies, label="pytorchGCN Test accuracy")
plt.xlabel("# Epoch")
plt.ylabel("Accuracy")
plt.legend(loc='lower right')
plt.show()

## Conclusion

In this tutorial, we implemented a custom graph convolution layer and compared its performance with the pre-implemented `GCNConv` layer from PyTorch Geometric on the Cora dataset.