# A Blitz Introduction to DGL - Node Classification (TO BE REFINED)

Goal of this tutorial:

* Train a node classification neural network on a single small graph.

This tutorial assumes that you have experience building neural networks with PyTorch. DGL also supports MXNet and TensorFlow; tutorials for those backends are upcoming.

```python
import dgl
import torch
import torch.nn as nn
import torch.nn.functional as F
```

```
Using backend: pytorch
```


## Loading Cora Dataset

The study of community structure in graphs has a long history. Many proposed methods are *unsupervised* (or *self-supervised* by recent definition): the model predicts community labels from graph connectivity alone. More recently, [Kipf et al.](https://arxiv.org/abs/1609.02907) proposed formulating community detection as a semi-supervised node classification task. With only a small portion of labeled nodes, a graph neural network (GNN) can accurately predict the community labels of the remaining nodes.

This tutorial shows how to build such a GNN for semi-supervised node classification, using only a small number of labels, on the [Cora dataset](https://docs.dgl.ai/api/python/dgl.data.html#dgl.data.CoraGraphDataset), a citation network with papers as nodes and citations as edges. Each paper's features are a word-count vector, normalized so that its entries sum to 1, as in Section 5.2 of [the paper](https://arxiv.org/abs/1609.02907).
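Normalizing features so that each row sums to 1 is a one-line tensor operation. The sketch below applies it to a made-up bag-of-words matrix; it illustrates the idea and is not DGL's own preprocessing code, which you do not need to call yourself because `CoraGraphDataset` already ships normalized features.

```python
import torch

# Toy bag-of-words matrix: 3 documents (rows) x 4 vocabulary words (columns).
counts = torch.tensor([[1., 0., 1., 0.],
                       [0., 1., 1., 1.],
                       [0., 0., 0., 0.]])  # an empty document

# Row-normalize so that each non-empty row sums to 1.
row_sums = counts.sum(dim=1, keepdim=True)
features = counts / row_sums.clamp(min=1e-12)  # clamp avoids division by zero
```

The clamp keeps an all-zero row at zero instead of producing NaNs.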

```python
import dgl.data

dataset = dgl.data.CoraGraphDataset()
print('Number of categories:', dataset.num_classes)
```

```
Loading from cache failed, re-processing.
Finished data loading and preprocessing.
  NumNodes: 2708
  NumEdges: 10556
  NumFeats: 1433
  NumClasses: 7
  NumTrainingSamples: 140
  NumValidationSamples: 500
  NumTestSamples: 1000
Done saving data into cached files.
Number of categories: 7
```


A [DGL Dataset object](https://docs.dgl.ai/api/python/dgl.data.html) may contain one or more graphs. The Cora dataset used in this tutorial consists of a single graph.

<div class="alert alert-info">
    <b>Note: </b>For more details on how DGL organizes its Dataset objects, see <a href=https://docs.dgl.ai/guide/data.html>here</a>.
</div>

```python
g = dataset[0]
```

DGL graphs store node-wise and edge-wise information in the [`ndata`](https://docs.dgl.ai/generated/dgl.DGLGraph.ndata.html#dgl.DGLGraph.ndata) and [`edata`](https://docs.dgl.ai/generated/dgl.DGLGraph.edata.html#dgl.DGLGraph.edata) attributes, which behave like dictionaries. In the DGL Cora dataset, the graph contains the following node data:

* `train_mask`: Whether the node is in the training set.
* `val_mask`: Whether the node is in the validation set.
* `test_mask`: Whether the node is in the test set.
* `label`: The ground-truth node category.
* `feat`: The node features.

```python
print('Node features')
print(g.ndata)
print('Edge features')
print(g.edata)
```

```
Node features
{'train_mask': tensor([ True,  True,  True,  ..., False, False, False]), 'val_mask': tensor([False, False, False,  ..., False, False, False]), 'test_mask': tensor([False, False, False,  ...,  True,  True,  True]), 'label': tensor([3, 4, 4,  ..., 3, 3, 3]), 'feat': tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])}
Edge features
{}
```
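The three masks are boolean tensors with one entry per node, so they can index any node-level tensor directly. The sketch below uses made-up labels and masks for a hypothetical 6-node graph, not the actual Cora values.

```python
import torch

# Hypothetical node-level data for a 6-node graph.
labels = torch.tensor([3, 4, 4, 0, 1, 3])
train_mask = torch.tensor([True, True, False, False, False, False])
test_mask = torch.tensor([False, False, False, True, True, True])

# Boolean indexing keeps only the entries where the mask is True.
train_labels = labels[train_mask]   # labels of the training nodes
num_test = int(test_mask.sum())     # number of test nodes

# Sanity check: the training and test splits should not overlap.
overlap = (train_mask & test_mask).any().item()
```

On the Cora graph, the same expression `int(g.ndata['train_mask'].sum())` should give 140, matching `NumTrainingSamples` in the loading log.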


<div class="alert alert-info">
    <b>Note: </b>To load data from your own dataset, see <a href=2_load_data.ipynb>this tutorial</a>.
</div>

## Define a Graph Convolutional Network (GCN)

This tutorial builds a two-layer Graph Convolutional Network (GCN). Each layer computes new node representations by aggregating information from neighboring nodes. The update rule for layer $k$ is:

$$
h_v^k\leftarrow \sum_{u\in\mathcal{N}(v)} \dfrac{1}{c_{uv}} \mathbf{W}^k h_u^{k-1}
$$

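Here $\mathcal{N}(v)$ is the set of neighbors of node $v$, $h_u^{k-1}$ is the representation of node $u$ from layer $k-1$ (with $h_u^0$ being the input features), $\mathbf{W}^k$ is the learnable weight matrix of layer $k$, and $c_{uv}$ is a normalization constant. To build a multi-layer GCN, stack `dgl.nn.GraphConv` modules; with default arguments, `GraphConv` uses $c_{uv}=\sqrt{|\mathcal{N}(u)|}\sqrt{|\mathcal{N}(v)|}$ and also adds a learnable bias.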
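To make the update rule concrete, the following sketch computes one layer with dense matrices on a toy 3-node path graph, assuming the symmetric normalization $c_{uv}=\sqrt{|\mathcal{N}(u)|}\sqrt{|\mathcal{N}(v)|}$. It is an illustration only: it omits the bias and the nonlinearity, and DGL's `GraphConv` uses sparse message passing rather than a dense adjacency matrix.

```python
import torch

torch.manual_seed(0)

# Toy undirected path graph 0 - 1 - 2, each edge stored in both directions.
src = torch.tensor([0, 1, 1, 2])
dst = torch.tensor([1, 0, 2, 1])
num_nodes = 3

# Dense adjacency: A[v, u] = 1 if there is an edge u -> v.
A = torch.zeros(num_nodes, num_nodes)
A[dst, src] = 1.0

# Node degrees (in-degree equals out-degree for this symmetric graph).
deg = A.sum(dim=1)
# norm[v, u] = A[v, u] / c_uv with c_uv = sqrt(deg(u)) * sqrt(deg(v)).
norm = A / torch.sqrt(deg[:, None] * deg[None, :])

h_prev = torch.randn(num_nodes, 4)  # h^{k-1}: one 4-dim row per node
W = torch.randn(2, 4)               # W^k maps 4-dim inputs to 2-dim outputs

# Row v of h_next is sum over neighbors u of (1 / c_uv) * W h_u.
h_next = norm @ (h_prev @ W.T)
```

For node 1, whose neighbors are nodes 0 and 2 (each of degree 1), the result is $(\mathbf{W}h_0 + \mathbf{W}h_2)/\sqrt{2}$.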

```python
from dgl.nn import GraphConv


class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h


# Create the model with the given dimensions:
# input dimension: number of node features (1433 for Cora)
# hidden dimension: 16
# output dimension: number of classes (7 for Cora)
net = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes)
```

DGL provides implementations of many popular neighbor aggregation modules, and each can be instantiated with a single line of code. See the full list of supported [graph convolution modules](https://docs.dgl.ai/api/python/nn.pytorch.html#module-dgl.nn.pytorch.conv).

## Training the GCN

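Training a GCN on the entire graph follows the same loop as training any other PyTorch neural network: forward pass, loss, backward pass, and optimizer step. The one graph-specific detail is that the loss is computed only on the labeled nodes selected by `train_mask`.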

```python
def train(g, net):
    optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
    all_logits = []
    best_val_acc = 0
    best_test_acc = 0

    features = g.ndata['feat']
    labels = g.ndata['label']
    train_mask = g.ndata['train_mask']
    val_mask = g.ndata['val_mask']
    test_mask = g.ndata['test_mask']
    for e in range(100):
        # Forward
        logits = net(g, features)

        # Compute prediction
        pred = logits.argmax(1)

        # Compute loss
        # Note that the loss is computed only on the nodes in the training set,
        # i.e. the nodes where train_mask is True.
        loss = F.cross_entropy(logits[train_mask], labels[train_mask])

        # Compute accuracy on training/validation/test
        train_acc = (pred[train_mask] == labels[train_mask]).float().mean()
        val_acc = (pred[val_mask] == labels[val_mask]).float().mean()
        test_acc = (pred[test_mask] == labels[test_mask]).float().mean()

        # Save the best validation accuracy and the corresponding test accuracy.
        if best_val_acc < val_acc:
            best_val_acc = val_acc
            best_test_acc = test_acc

        # Backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        all_logits.append(logits.detach())

        if e % 5 == 0:
            print('In epoch {}, loss: {:.3f}, val acc: {:.3f} (best {:.3f}), test acc: {:.3f} (best {:.3f})'.format(
                e, loss, val_acc, best_val_acc, test_acc, best_test_acc))


net = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes)
train(g, net)
```

```
In epoch 0, loss: 1.946, val acc: 0.278 (best 0.278), test acc: 0.290 (best 0.290)
In epoch 5, loss: 1.888, val acc: 0.596 (best 0.596), test acc: 0.580 (best 0.580)
In epoch 10, loss: 1.805, val acc: 0.638 (best 0.638), test acc: 0.658 (best 0.658)
In epoch 15, loss: 1.699, val acc: 0.660 (best 0.660), test acc: 0.692 (best 0.692)
In epoch 20, loss: 1.572, val acc: 0.688 (best 0.688), test acc: 0.701 (best 0.701)
In epoch 25, loss: 1.426, val acc: 0.694 (best 0.694), test acc: 0.710 (best 0.708)
In epoch 30, loss: 1.266, val acc: 0.710 (best 0.710), test acc: 0.721 (best 0.721)
In epoch 35, loss: 1.100, val acc: 0.714 (best 0.714), test acc: 0.725 (best 0.725)
In epoch 40, loss: 0.936, val acc: 0.714 (best 0.716), test acc: 0.729 (best 0.725)
In epoch 45, loss: 0.781, val acc: 0.722 (best 0.722), test acc: 0.736 (best 0.736)
In epoch 50, loss: 0.641, val acc: 0.742 (best 0.742), test acc: 0.736 (best 0.736)
In epoch 55, loss: 0.522, val acc: 0.752 (best 0.752), test acc: 0.741 (best 0
```
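Indexing `logits` and `labels` with `train_mask` before calling `F.cross_entropy` averages the per-node loss over the masked nodes alone. The toy check below (made-up logits, not Cora outputs) confirms this equivalence and factors the repeated accuracy expression from `train` into a small helper; `masked_accuracy` is a hypothetical name, not part of DGL.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 3)                 # 5 nodes, 3 classes
labels = torch.tensor([0, 2, 1, 1, 0])
mask = torch.tensor([True, False, True, False, True])

# Loss over the masked (labeled) nodes only, as in `train`.
loss = F.cross_entropy(logits[mask], labels[mask])

# Same quantity by hand: mean negative log-likelihood over the masked nodes.
log_probs = F.log_softmax(logits, dim=1)
nll = -log_probs[torch.arange(logits.shape[0]), labels]  # per-node loss, all nodes
manual = nll[mask].mean()


def masked_accuracy(logits, labels, mask):
    """Fraction of masked nodes whose argmax prediction matches the label."""
    pred = logits.argmax(dim=1)
    return (pred[mask] == labels[mask]).float().mean().item()
```

Inside `train`, a call such as `masked_accuracy(logits, labels, val_mask)` could replace the inline `val_acc` expression; it returns a Python float, which the existing `print` format string also accepts.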

## Training on GPU

Training on a GPU requires moving both the model and the graph to the GPU with the [`to`](https://docs.dgl.ai/generated/dgl.DGLGraph.to.html#dgl.DGLGraph.to) method.

```python
g = g.to('cuda')
net = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes).to('cuda')
train(g, net)
```

```
In epoch 0, loss: 1.947, val acc: 0.104 (best 0.104), test acc: 0.105 (best 0.105)
In epoch 5, loss: 1.913, val acc: 0.530 (best 0.530), test acc: 0.509 (best 0.509)
In epoch 10, loss: 1.846, val acc: 0.570 (best 0.628), test acc: 0.567 (best 0.629)
In epoch 15, loss: 1.754, val acc: 0.664 (best 0.664), test acc: 0.639 (best 0.639)
In epoch 20, loss: 1.633, val acc: 0.700 (best 0.700), test acc: 0.716 (best 0.716)
In epoch 25, loss: 1.488, val acc: 0.698 (best 0.702), test acc: 0.703 (best 0.711)
In epoch 30, loss: 1.324, val acc: 0.702 (best 0.702), test acc: 0.705 (best 0.711)
In epoch 35, loss: 1.150, val acc: 0.714 (best 0.714), test acc: 0.723 (best 0.719)
In epoch 40, loss: 0.975, val acc: 0.718 (best 0.718), test acc: 0.736 (best 0.730)
In epoch 45, loss: 0.809, val acc: 0.728 (best 0.728), test acc: 0.743 (best 0.743)
In epoch 50, loss: 0.661, val acc: 0.736 (best 0.736), test acc: 0.755 (best 0.755)
In epoch 55, loss: 0.535, val acc: 0.754 (best 0.754), test acc: 0.761 (best 0
```
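Hard-coding `'cuda'` fails on machines without a CUDA device. A common PyTorch pattern, sketched here with a stand-in `nn.Linear` model rather than the GCN, picks the device once and moves everything to it; the same `device` object can be passed to `g.to(device)` for the DGL graph.

```python
import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Stand-in model and input; in this tutorial these would be the GCN and the
# graph, moved with net.to(device) and g.to(device).
model = nn.Linear(4, 2).to(device)
x = torch.randn(3, 4, device=device)
out = model(x)  # parameters and inputs live on the same device
```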

## What's next?

* [Load and process your own graph data](2_load_data.ipynb).
* [Write your own GNN module](3_message_passing.ipynb).
* [Link prediction (predicting existence of edges) on full graph](4_link_predict.ipynb).
* [Graph classification (TODO)](5_graph_classification.ipynb).
* If you wish to scale your model to a large graph, please begin with the tutorial [Stochastic Training of GNN for Node Classification on Large Graphs](L1_large_node_classification.ipynb).
* If you have heterogeneous graphs, please begin with the tutorial [Node classification on heterogeneous graphs (TODO)](H1_node_classification.ipynb).
* [Categorization of DGL examples](I1_examples.ipynb)
* [More tutorials](I2_tutorials.ipynb)

### Advanced topics

* Running with multiple GPUs (TODO)
* Running with distributed training (TODO)