<h1 style="color:#7A74D9;">PyTorch (Backend) Geometric Baseline</h1>

## 1] Notes: 

1.  Neighborhood aggregation modules
    a. 
2.  Graph Convolutional Modules

    a. GraphConv: a GCN layer

    b. RelGraphConv: a relational graph convolution layer

    c. TAGConv: a Topology Adaptive Graph Convolutional layer

    d. GATConv: a graph attention network layer

    e. EdgeConv

    f. SAGEConv

    g. and there are many more in [dgl.ai Conv Layers](https://docs.dgl.ai/en/latest/api/python/nn.pytorch.html#module-dgl.nn.pytorch.conv)
    
## On the limitations

https://tkipf.github.io/graph-convolutional-networks/

https://www.inference.vc/how-powerful-are-graph-convolutions-review-of-kipf-welling-2016-2/


GCNs are motivated by convolutions defined in the spectral (graph Fourier) domain. Working in that domain is argued to scale poorly as the data grows, so practical models rely on relaxations. The GCN of Kipf and Welling uses a first-order approximation of the spectral filter, and learning over graphs with it is limited accordingly: the approximation behaves well when a graph reduces to a regular 2D lattice (which works great for image problems) but may not generalize as easily to irregular graphs or other non-image-like structures. For more, see [Huszar/2016](https://www.inference.vc/how-powerful-are-graph-convolutions-review-of-kipf-welling-2016-2/).
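To make the first-order approximation concrete, here is a minimal NumPy sketch of one propagation step, assuming the renormalized rule from Kipf and Welling, $\hat{A} = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$ followed by $\hat{A} H W$. The function names, the toy path graph, and the fixed weights are illustrative assumptions, not part of this notebook or of DGL.

```python
import numpy as np

def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # degrees including the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_propagate(adj, feats, weight):
    """One first-order propagation step (no nonlinearity): A_norm @ H @ W."""
    return normalized_adjacency(adj) @ feats @ weight

# Toy path graph 0 - 1 - 2 (hypothetical data, not Cora).
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
feats = np.eye(3)           # one-hot node features
weight = np.ones((3, 2))    # fixed weights so the result is easy to inspect

a_norm = normalized_adjacency(adj)
out = gcn_propagate(adj, feats, weight)
print(a_norm)
print(out.shape)
```

Each step only mixes a node with its direct neighbors, so stacking two layers (as the models below do) reaches two hops.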


In [None]:
# !conda install -c conda-forge scikit-learn -y

In [2]:
# !conda install pytorch-geometric -c rusty1s -c conda-forge -y

In [3]:
!python -c "import torch; print(torch.__version__)"

1.8.0


In [6]:
# !pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.8.0+cpu.html


In [7]:
# !pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-1.8.0+cpu.html


In [4]:
# Helper function for visualization.
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def visualize(h, color):
    # Project the embeddings `h` to 2D with t-SNE; `color` holds one class label per node.
    z = TSNE(n_components=2).fit_transform(h.detach().cpu().numpy())

    plt.figure(figsize=(10,10))
    plt.xticks([])
    plt.yticks([])

    plt.scatter(z[:, 0], z[:, 1], s=70, c=color, cmap="Set2")
    plt.show()

<hr></hr>

<h1 style="color:#7A74D9;">Import Libraries</h1>

An end-to-end baseline GNN example using DGL with the PyTorch backend, adapted from this blog post by [Builders][1].

[1]: https://buildersbox.corp-sansan.com/entry/2020/10/12/110000

In [8]:
# from torch_geometric.datasets import Planetoid
# from torch_geometric.transforms import NormalizeFeatures

# dataset = Planetoid(root='data/Planetoid', name='Cora', transform=NormalizeFeatures())

import torch
import torch as th
import torch.nn as nn
import torch.nn.functional as F

import dgl
from dgl import DGLGraph
from dgl.data import CoraGraphDataset
import dgl.function as fn

import time
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'svg'

Using backend: pytorch


In [9]:
dataset = CoraGraphDataset()
graph = dataset[0]

  NumNodes: 2708
  NumEdges: 10556
  NumFeats: 1433
  NumClasses: 7
  NumTrainingSamples: 140
  NumValidationSamples: 500
  NumTestSamples: 1000
Done loading data from cached files.


In [11]:
graph.ndata

{'feat': tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]]), 'label': tensor([3, 4, 4,  ..., 3, 3, 3]), 'val_mask': tensor([False, False, False,  ..., False, False, False]), 'test_mask': tensor([False, False, False,  ...,  True,  True,  True]), 'train_mask': tensor([ True,  True,  True,  ..., False, False, False])}

The next cell verifies that there are more intra-class edges than inter-class edges, using the edges that point to class 0 nodes as an example.

In [12]:
# find all the nodes labelled with class 0
label0_nodes = th.nonzero(graph.ndata['label'] == 0).squeeze()

# find all the edges pointing to class 0 nodes
src, _ = graph.in_edges(label0_nodes)
src_labels = graph.ndata['label'][src]

# find all the edges whose both endpoints are in class 0
intra_src = th.nonzero(src_labels == 0)
print('Intra-class edges percent: %.4f' % (len(intra_src) / len(src_labels)))


Intra-class edges percent: 0.6994
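The same check can be traced by hand on a tiny graph. This is a minimal NumPy sketch, assuming edges are stored as parallel source and destination arrays; the graph, the labels, and the helper name `intra_class_fraction` are made up for illustration and are not Cora data.

```python
import numpy as np

# Hypothetical toy graph: directed edges stored as parallel arrays.
src = np.array([0, 1, 1, 2, 3, 3])
dst = np.array([1, 0, 2, 1, 0, 2])
labels = np.array([0, 0, 1, 1])   # class of each node

def intra_class_fraction(src, dst, labels, cls):
    """Fraction of edges pointing into class `cls` whose source is also in `cls`."""
    into_cls = labels[dst] == cls               # edges whose destination is in cls
    same_cls = labels[src[into_cls]] == cls     # of those, source is in cls too
    return float(same_cls.mean())

frac = intra_class_fraction(src, dst, labels, cls=0)
print(f"Intra-class edge fraction for class 0: {frac:.4f}")
```

Note that the value is a fraction between 0 and 1, so the 0.6994 printed above corresponds to roughly 70% of the edges into class 0.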


In [13]:
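# Message: copy the source node's feature 'h' onto each outgoing edge as 'm'.
# fn.copy_u is the current name for the older fn.copy_src.
gcn_msg = fn.copy_u('h', 'm')
# Reduce: each node sums the incoming messages 'm' back into 'h'.
gcn_reduce = fn.sum(msg='m', out='h')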
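The message and reduce functions above amount to "copy each source feature along its edges, then sum at the destination". A minimal NumPy sketch of that idea, assuming a made-up three-node graph, shows that it is equivalent to multiplying the features by a (destination x source) adjacency matrix. This is a stand-in for illustration, not how DGL implements it internally.

```python
import numpy as np

# Hypothetical toy graph with directed edges src -> dst.
src = np.array([0, 1, 2, 2])
dst = np.array([1, 2, 0, 1])
h = np.array([[1., 0.],
              [0., 2.],
              [3., 3.]])          # one feature row per node

# Message step: every edge carries a copy of its source node's feature.
messages = h[src]

# Reduce step: each destination node sums the messages it receives.
agg = np.zeros_like(h)
np.add.at(agg, dst, messages)

# Same result as a matrix product with the (destination x source) adjacency.
adj = np.zeros((3, 3))
adj[dst, src] = 1.0
assert np.allclose(agg, adj @ h)
print(agg)
```

Because the sum is unnormalized, high-degree nodes end up with larger feature magnitudes than low-degree ones.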

## GCNLayer Module


In [14]:
class GCNLayer(nn.Module):
    def __init__(self, in_feats, out_feats):
        super(GCNLayer, self).__init__()
        self.linear = nn.Linear(in_feats, out_feats)

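    def forward(self, g, feature):
        # local_scope keeps the temporary 'h' field from leaking into the caller's graph
        with g.local_scope():
            g.ndata['h'] = feature
            # sum each node's in-neighbor features, then apply the linear transform
            g.update_all(gcn_msg, gcn_reduce)
            h = g.ndata['h']
            return self.linear(h)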

In [15]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer1 = GCNLayer(1433, 16)
        self.layer2 = GCNLayer(16, 7)

    def forward(self, g, features):
        x = F.relu(self.layer1(g, features))
        x = self.layer2(g, x)
        return x

In [16]:
def evaluate(model, g, features, labels, mask):
    model.eval()
    with th.no_grad():
        logits = model(g, features)
        logits = logits[mask]
        labels = labels[mask]
        _, indices = th.max(logits, dim=1)
        correct = th.sum(indices == labels)
        return correct.item() * 1.0 / len(labels)

In [17]:
net = Net()
print(net)

Net(
  (layer1): GCNLayer(
    (linear): Linear(in_features=1433, out_features=16, bias=True)
  )
  (layer2): GCNLayer(
    (linear): Linear(in_features=16, out_features=7, bias=True)
  )
)
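As a quick sanity check on the printed model, the parameter count follows directly from the two `Linear` shapes. A small pure-Python sketch; `linear_params` is a hypothetical helper written for this note.

```python
def linear_params(in_features, out_features, bias=True):
    """Number of parameters in a dense layer: weights plus optional biases."""
    return in_features * out_features + (out_features if bias else 0)

layer1 = linear_params(1433, 16)   # 1433 * 16 + 16
layer2 = linear_params(16, 7)      # 16 * 7 + 7
total = layer1 + layer2
print(layer1, layer2, total)       # 22944 119 23063
```

With the model in memory, `sum(p.numel() for p in net.parameters())` should report the same total.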


In [18]:
optimizer = th.optim.Adam(net.parameters(), lr=1e-3)
dur = []
for epoch in range(250):
    t0 = time.time()

    net.train()
    logits = net(graph, graph.ndata['feat'])
    logp = F.log_softmax(logits, 1)
    loss = F.nll_loss(logp[graph.ndata['train_mask']], graph.ndata['label'][graph.ndata['train_mask']])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    acc = evaluate(net, graph, graph.ndata['feat'], graph.ndata['label'], graph.ndata['test_mask'])
    if epoch%10==0:
        print("Epoch {:05d} | Loss {:.4f} | Test Acc {:.4f} | Time(s) {:.4f}".format(
            epoch, loss.item(), acc, time.time()-t0))

Epoch 00000 | Loss 1.9875 | Test Acc 0.0660 | Time(s) 0.0555
Epoch 00010 | Loss 1.8442 | Test Acc 0.1400 | Time(s) 0.0285
Epoch 00020 | Loss 1.6958 | Test Acc 0.3440 | Time(s) 0.0286
Epoch 00030 | Loss 1.5462 | Test Acc 0.4250 | Time(s) 0.0292
Epoch 00040 | Loss 1.4093 | Test Acc 0.4900 | Time(s) 0.0326
Epoch 00050 | Loss 1.2916 | Test Acc 0.5760 | Time(s) 0.0313
Epoch 00060 | Loss 1.1861 | Test Acc 0.6200 | Time(s) 0.0304
Epoch 00070 | Loss 1.0877 | Test Acc 0.6480 | Time(s) 0.0286
Epoch 00080 | Loss 0.9962 | Test Acc 0.6720 | Time(s) 0.0333
Epoch 00090 | Loss 0.9119 | Test Acc 0.6790 | Time(s) 0.0419
Epoch 00100 | Loss 0.8350 | Test Acc 0.6880 | Time(s) 0.0357
Epoch 00110 | Loss 0.7656 | Test Acc 0.6950 | Time(s) 0.0393
Epoch 00120 | Loss 0.7034 | Test Acc 0.6960 | Time(s) 0.0305
Epoch 00130 | Loss 0.6482 | Test Acc 0.7020 | Time(s) 0.0392
Epoch 00140 | Loss 0.5991 | Test Acc 0.7050 | Time(s) 0.0299
Epoch 00150 | Loss 0.5551 | Test Acc 0.7080 | Time(s) 0.0311
Epoch 00160 | Loss 0.515
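The loss in the loop is `log_softmax` followed by `nll_loss`, restricted to the training nodes. A minimal NumPy sketch of that computation, assuming hypothetical logits and a hypothetical mask (none of the numbers come from the run above):

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def masked_nll(logits, labels, mask):
    """Mean negative log-likelihood over the nodes selected by `mask`."""
    logp = log_softmax(logits)[mask]
    picked = logp[np.arange(mask.sum()), labels[mask]]
    return float(-picked.mean())

# Hypothetical logits for 4 nodes and 3 classes.
logits = np.zeros((4, 3))
labels = np.array([0, 2, 1, 1])
train_mask = np.array([True, True, False, False])

loss = masked_nll(logits, labels, train_mask)
print(round(loss, 4))   # uniform logits give log(3), about 1.0986
```

By the same argument, a model that predicts all 7 Cora classes uniformly has a loss of log(7), about 1.946, which is in the same range as the first-epoch losses printed in this notebook.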

<hr></hr>

<h1 style="color:#7A74D9;">1] Node Classification with GNN</h1>

An end-to-end baseline GNN example using DGL with the PyTorch backend, adapted from this tutorial by [dgl.ai][2].

[2]: https://docs.dgl.ai/en/latest/new-tutorial/1_introduction.html

In [20]:
# === Loading Cora Dataset

import dgl.data

dataset = dgl.data.CoraGraphDataset()
print('Number of categories:', dataset.num_classes)

  NumNodes: 2708
  NumEdges: 10556
  NumFeats: 1433
  NumClasses: 7
  NumTrainingSamples: 140
  NumValidationSamples: 500
  NumTestSamples: 1000
Done loading data from cached files.
Number of categories: 7


In [21]:
g = dataset[0]

In [25]:
print ('Node features')
print(g.ndata)
print('Edge Features')
print(g.edata)

Node features
{'feat': tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]]), 'label': tensor([3, 4, 4,  ..., 3, 3, 3]), 'val_mask': tensor([False, False, False,  ..., False, False, False]), 'test_mask': tensor([False, False, False,  ...,  True,  True,  True]), 'train_mask': tensor([ True,  True,  True,  ..., False, False, False])}
Edge Features
{}


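There are no edge features, but there are five node fields:
- `feat`: the node feature matrix, with one row per node (2708 rows in total, as shown below)
- `label`: the class label of each node
- `test_mask`: a boolean tensor predefined as whether or not the node is in the test set
- `train_mask`: a boolean tensor predefined as whether or not the node is in the training set
- `val_mask`: a boolean tensor predefined as whether or not the node is in the validation set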

In [26]:
len(g.ndata['feat'])

2708
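The boolean masks are used by indexing: `x[mask]` keeps only the rows where the mask is `True`. A minimal NumPy sketch of mask-restricted accuracy, assuming made-up predictions and labels for six nodes; `masked_accuracy` is a hypothetical helper, not a DGL function.

```python
import numpy as np

def masked_accuracy(pred, labels, mask):
    """Fraction of correct predictions among the nodes selected by `mask`."""
    return float((pred[mask] == labels[mask]).mean())

# Hypothetical predictions and labels for 6 nodes (not Cora).
labels = np.array([0, 1, 1, 2, 0, 2])
pred = np.array([0, 1, 1, 2, 1, 2])
train_mask = np.array([True, True, True, False, False, False])
test_mask = ~train_mask

train_acc = masked_accuracy(pred, labels, train_mask)
test_acc = masked_accuracy(pred, labels, test_mask)
print(train_acc, test_acc)
```

On Cora the three masks cover 140 + 500 + 1000 = 1640 of the 2708 nodes, so the remaining nodes belong to none of the splits.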

## Defining a Graph Convolutional Network (GCN)


In [28]:
from dgl.nn import GraphConv

class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h

# Create the model with given dimensions
model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes)



In [29]:
def train(g, model):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    best_val_acc = 0
    best_test_acc = 0

    features = g.ndata['feat']
    labels = g.ndata['label']
    train_mask = g.ndata['train_mask']
    val_mask = g.ndata['val_mask']
    test_mask = g.ndata['test_mask']
    for e in range(100):
        # Forward
        logits = model(g, features)

        # Compute prediction
        pred = logits.argmax(1)

        # Compute loss
        # Note that you should only compute the losses of the nodes in the training set.
        loss = F.cross_entropy(logits[train_mask], labels[train_mask])

        # Compute accuracy on training/validation/test
        train_acc = (pred[train_mask] == labels[train_mask]).float().mean()
        val_acc = (pred[val_mask] == labels[val_mask]).float().mean()
        test_acc = (pred[test_mask] == labels[test_mask]).float().mean()

        # Save the best validation accuracy and the corresponding test accuracy.
        if best_val_acc < val_acc:
            best_val_acc = val_acc
            best_test_acc = test_acc

        # Backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if e % 5 == 0:
            print('In epoch {}, loss: {:.3f}, val acc: {:.3f} (best {:.3f}), test acc: {:.3f} (best {:.3f})'.format(
                e, loss, val_acc, best_val_acc, test_acc, best_test_acc))
model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes)
train(g, model)

In epoch 0, loss: 1.946, val acc: 0.130 (best 0.130), test acc: 0.110 (best 0.110)
In epoch 5, loss: 1.886, val acc: 0.462 (best 0.462), test acc: 0.487 (best 0.487)
In epoch 10, loss: 1.799, val acc: 0.552 (best 0.552), test acc: 0.570 (best 0.570)
In epoch 15, loss: 1.687, val acc: 0.638 (best 0.638), test acc: 0.643 (best 0.643)
In epoch 20, loss: 1.553, val acc: 0.678 (best 0.678), test acc: 0.691 (best 0.691)
In epoch 25, loss: 1.399, val acc: 0.698 (best 0.698), test acc: 0.713 (best 0.713)
In epoch 30, loss: 1.231, val acc: 0.708 (best 0.708), test acc: 0.727 (best 0.727)
In epoch 35, loss: 1.056, val acc: 0.720 (best 0.720), test acc: 0.732 (best 0.730)
In epoch 40, loss: 0.885, val acc: 0.740 (best 0.740), test acc: 0.746 (best 0.746)
In epoch 45, loss: 0.726, val acc: 0.750 (best 0.750), test acc: 0.760 (best 0.760)
In epoch 50, loss: 0.586, val acc: 0.752 (best 0.756), test acc: 0.759 (best 0.759)
In epoch 55, loss: 0.469, val acc: 0.758 (best 0.758), test acc: 0.770 (best 0
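The "best" columns in the log come from the model-selection rule inside `train`: the reported test accuracy is the one measured at the epoch with the best validation accuracy so far, and only a strict improvement replaces it. A pure-Python sketch of that rule, with hypothetical per-epoch numbers (not taken from the run above):

```python
def select_by_validation(history):
    """Return (best_val, test_at_best_val) from a list of (val_acc, test_acc) pairs."""
    best_val, best_test = 0.0, 0.0
    for val_acc, test_acc in history:
        if val_acc > best_val:        # strict improvement only, as in train()
            best_val, best_test = val_acc, test_acc
    return best_val, best_test

# Hypothetical per-epoch (validation, test) accuracies.
history = [(0.50, 0.48), (0.70, 0.69), (0.68, 0.74), (0.70, 0.72)]
print(select_by_validation(history))   # (0.7, 0.69)
```

The last epoch ties the best validation accuracy but does not replace it, which is one possible reason a log line can show a current test accuracy that differs from the "best" one even when the validation columns match.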

The two examples above train for 250 and 100 epochs, respectively. This suggests that around 300 epochs is also a reasonable number to use; see the GAT example on [github dmlc](https://github.com/dmlc/dgl/tree/master/examples/pytorch/gat).