<a href="https://colab.research.google.com/github/chris-kehl/graph_neural_network_playground/blob/main/dgl_Graph_neural_networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook is the basics of GNN's:
Load a DGL provided dataset
Build a GNN model with DGL provided neural network
Train and evaluate a GNN
Predict the category of a node in a graph
This GNN assumes that you know the basics of pytorch

In [1]:
# import all the necessary tools
# first conda or pip install dgl
# pytorch  1.10.0. gave me errors so i reverted to 1.9.1
!pip install dgl==0.6.1
!pip install torch==1.9.1

import dgl
import torch
import torch.nn as nn
import torch.nn.functional as F



Using backend: pytorch


The node classification that we are building will consist of a semisupervised GNN for a small number of labels on a Cora dataset. This is a citation network with papers as nodes and citations as edges. The task will be to predict the category of a given paper. The nodes of each paper contains word count vectors as features. The words are normalized so that they ass up to one per the paper found at this site https://arxiv.org/abs/1609.02907 section 5.2

In [2]:
# import dgl.data and get the built in dataset CoraGraphDataset
import dgl.data

dataset = dgl.data.CoraGraphDataset()
print('Number of categories:', dataset.num_classes)

  NumNodes: 2708
  NumEdges: 10556
  NumFeats: 1433
  NumClasses: 7
  NumTrainingSamples: 140
  NumValidationSamples: 500
  NumTestSamples: 1000
Done loading data from cached files.
Number of categories: 7


In [3]:
# the Cora dataset only contains one graph
g = dataset[0]

In [4]:
# a dgl graph can store node features and edge features
# in tow dictionary-like attributes ndata and edata
print('Node features')
print(g.ndata)
print('Edge features')
print(g.edata)

Node features
{'feat': tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]]), 'label': tensor([3, 4, 4,  ..., 3, 3, 3]), 'test_mask': tensor([False, False, False,  ...,  True,  True,  True]), 'train_mask': tensor([ True,  True,  True,  ..., False, False, False]), 'val_mask': tensor([False, False, False,  ..., False, False, False])}
Edge features
{}


**Defining a graph convolutional network**

We will build a two-layer graph convolutional network(GCN)
How this works is to stack dgl.nn.GraphConv, which are
inherited by the torch.nn.module


In [5]:
from dgl.nn import GraphConv

class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h

# Create the model with given dimensions
model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes)

In [8]:
# Train the Graph Neural Network
def train(g, model):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    best_val_acc = 0
    best_test_acc = 0

    features = g.ndata['feat']
    labels = g.ndata['label']
    train_mask = g.ndata['train_mask']
    val_mask = g.ndata['val_mask']
    test_mask = g.ndata['test_mask']
    for e in range(100):
        # Forward
        logits = model(g, features)

        # Compute predictions
        pred = logits.argmax(1)

        # Compute loss
        # Note: Compute the losses only in the training set
        loss = F.cross_entropy(logits[train_mask], labels[train_mask])

        # Compute accuracy on training/validation/test
        train_acc = (pred[train_mask] == labels[train_mask]).float().mean()
        val_acc = (pred[val_mask] == labels[val_mask]).float().mean()
        test_acc = (pred[test_mask] == labels[test_mask]).float().mean()

        # Save the best validation accuracy and the corresponding test acuracy.
        if best_val_acc < val_acc:
            best_val_acc = val_acc
            best_test_acc = test_acc
        
        # Backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if e % 5 == 0:
            print('In epoch {}, loss: {:.3f}, val acc: {:.3f} (best {:.3f}, test acc: {:.3f} (best {:.3f})'.format(
                e, loss, val_acc, best_val_acc, test_acc, best_val_acc))

model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes)
train(g, model)


In epoch 0, loss: 1.945, val acc: 0.206 (best 0.206, test acc: 0.227 (best 0.206)
In epoch 5, loss: 1.890, val acc: 0.652 (best 0.656, test acc: 0.664 (best 0.656)
In epoch 10, loss: 1.808, val acc: 0.702 (best 0.706, test acc: 0.696 (best 0.706)
In epoch 15, loss: 1.701, val acc: 0.676 (best 0.706, test acc: 0.687 (best 0.706)
In epoch 20, loss: 1.569, val acc: 0.718 (best 0.718, test acc: 0.708 (best 0.718)
In epoch 25, loss: 1.415, val acc: 0.708 (best 0.718, test acc: 0.711 (best 0.718)
In epoch 30, loss: 1.243, val acc: 0.720 (best 0.720, test acc: 0.720 (best 0.720)
In epoch 35, loss: 1.064, val acc: 0.730 (best 0.730, test acc: 0.730 (best 0.730)
In epoch 40, loss: 0.886, val acc: 0.752 (best 0.752, test acc: 0.744 (best 0.752)
In epoch 45, loss: 0.722, val acc: 0.760 (best 0.760, test acc: 0.753 (best 0.760)
In epoch 50, loss: 0.578, val acc: 0.764 (best 0.764, test acc: 0.759 (best 0.764)
In epoch 55, loss: 0.458, val acc: 0.768 (best 0.768, test acc: 0.775 (best 0.768)
In epo