#### Node classification with DGL, predicting the category of a Node in a graph
* You will master:
   1. Load a DGL-provided dataset
   2. Build a GNN model with DGL-provided neural network modules
   3. Train and evaluate a GNN model for node classification on GPU

In [2]:
import torch
import dgl
import torch.nn as nn
import torch.nn.functional as F

  from .autonotebook import tqdm as notebook_tqdm


* GNN offers an ooportunity to obtain node representations by combing the connectivity and features of a local neighborhood
* GNN 图神经网络能够通过整合领域节点和相应边的特征的特则来获取中心节点的表达
* The next part will show how to build **a GNN for semi-supervised node classification** with only a small number of labels on the **Cora dataset**, a citation network **with papers as nodes** and **citations as edges**.
* 这里介绍一个Cora图数据集，是一个关于论文引用的图网络数据库，paper作为顶点node，引用代表paper和paper之间的边

* This task is to predict the category of a given paper, **each paper node contains a word count vector as its features**, normalized so that they sum up to one.

In [3]:
import dgl.data

dataset = dgl.data.CoraGraphDataset()
print("Number of categories: ", dataset.num_classes)

  NumNodes: 2708
  NumEdges: 10556
  NumFeats: 1433
  NumClasses: 7
  NumTrainingSamples: 140
  NumValidationSamples: 500
  NumTestSamples: 1000
Done loading data from cached files.
Number of categories:  7


* 一个DGL数据集可能会包含一个或者多个图，但是这里的Cora数据集是只包含一张图的

In [4]:
g = dataset[0]
print(g)

Graph(num_nodes=2708, num_edges=10556,
      ndata_schemes={'feat': Scheme(shape=(1433,), dtype=torch.float32), 'label': Scheme(shape=(), dtype=torch.int64), 'test_mask': Scheme(shape=(), dtype=torch.bool), 'train_mask': Scheme(shape=(), dtype=torch.bool), 'val_mask': Scheme(shape=(), dtype=torch.bool)}
      edata_schemes={'__orig__': Scheme(shape=(), dtype=torch.int64)})


* A DGL graph can store node features and edge features in two dictionary-like attributes called ndata and edata.
* The graph contains the following node features:
   1. train_mask
   2. val_mask
   3. test_mask
   4. label
   5. feat

In [5]:
print('Node features')
for k, v in g.ndata.items():
    print(k)
    print(v)

Node features
feat
tensor([[0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        ...,
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0526, 0.0000]])
label
tensor([4, 4, 4,  ..., 4, 3, 3])
test_mask
tensor([ True,  True, False,  ..., False, False, False])
train_mask
tensor([False, False, False,  ..., False, False, False])
val_mask
tensor([False, False,  True,  ..., False, False, False])


In [6]:
print(g.ndata['label'])
print(g.ndata['feat'])
g.ndata['feat'].size()

tensor([4, 4, 4,  ..., 4, 3, 3])
tensor([[0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        ...,
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0526, 0.0000]])


torch.Size([2708, 1433])

* g.ndata['feat'].size() -> [2708, 1433] 这里的feature是一个 2708 * 1433，说明每个顶点的表示维度为1433，即每个顶点通过1433个特征来描述，一共2708个顶点，每个顶点是代表一篇paper

#### Defining a Graph Convolutional Network(GCN)
* In the GCN, **each layer computes new node representations by aggregating neighbor information**.
* 目前常见的类GCN模型，本质上都是在计算每个顶点的编码表示，有个模型会同时计算边的表示，最终就是为了抽取出更富有特征的顶点和边的向量表示，GCN可以看作是一种用于上游任务的编码器Encoder，而Node classification就是下游任务Decoder了。就类似CNN可以提取出更丰富的图像特征从而作为后期分类器的输入一样，本质上都是在做特征提取，获取更具有代表性的编码

* To build a multi-layer GCN you can simply stack dgl.nn.GraphConv modules, which **inherit torch.nn.Module**
* 之前基于PyTorch复现了过GCN相关模型的paper，在类定义和API层面，GCN模型的构建大体上和其他神经网络差不多，都是继承nn.Module，然后写__init__() 和 forward()
* 感觉这个DGL库就是将这些paper模型整合起来了，提供了统一的API

In [7]:
from dgl.nn import GraphConv

class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)
    
    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h

In [8]:
# Create the model with given dimensions
model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes)

#### Training the GCN
* DGL provides implementation of many popular neighbor aggregation modules
* DGL 提供了许多顶点领域的聚合模型，每个聚合模型背后就是一篇paper吧哈哈
* Training the GCN is similar to training other PyTorch neural networks

In [9]:
def train(g, model):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    best_val_acc = 0
    best_test_acc = 0
    
    features = g.ndata['feat']
    labels = g.ndata['label']
    train_mask = g.ndata['train_mask']
    val_mask = g.ndata['val_mask']
    test_mask = g.ndata['test_mask']

    for e in range(100):
        # Forward
        logits = model(g, features)
        
        # Compute prediction
        pred = logits.argmax(1)
        
        # Compute loss
        # 在训练集中计算每个节点的预测损失
        loss = F.cross_entropy(logits[train_mask], labels[train_mask])
        
        # 在训练集测试集验证集上计算accuracy
        train_acc = (pred[train_mask] == labels[train_mask]).float().mean()
        val_acc = (pred[val_mask] == labels[val_mask]).float().mean()
        test_acc = (pred[test_mask] == labels[test_mask]).float().mean()
        
        # save the best validation acc and the correspongding test accracy
        if best_val_acc < val_acc:
            best_test_acc = test_acc
            best_val_acc = val_acc
        
        # Backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        if e % 5 == 0:
            print("In epoch {}, loss: {:.3f}, val acc: {:.3f}(best: .3f), test acc:{:.3f}(best {:.3f})".format(
                e, loss, val_acc, best_val_acc, test_acc, best_test_acc, 
            ))

In [10]:
model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes)

In [11]:
train(g, model=model)

In epoch 0, loss: 1.946, val acc: 0.172(best: .3f), test acc:0.172(best 0.160)
In epoch 5, loss: 1.888, val acc: 0.650(best: .3f), test acc:0.650(best 0.677)
In epoch 10, loss: 1.805, val acc: 0.672(best: .3f), test acc:0.672(best 0.674)
In epoch 15, loss: 1.695, val acc: 0.710(best: .3f), test acc:0.710(best 0.701)
In epoch 20, loss: 1.564, val acc: 0.722(best: .3f), test acc:0.722(best 0.729)
In epoch 25, loss: 1.411, val acc: 0.724(best: .3f), test acc:0.724(best 0.732)
In epoch 30, loss: 1.243, val acc: 0.740(best: .3f), test acc:0.740(best 0.726)
In epoch 35, loss: 1.069, val acc: 0.742(best: .3f), test acc:0.746(best 0.727)
In epoch 40, loss: 0.896, val acc: 0.754(best: .3f), test acc:0.754(best 0.736)
In epoch 45, loss: 0.736, val acc: 0.762(best: .3f), test acc:0.762(best 0.751)
In epoch 50, loss: 0.595, val acc: 0.768(best: .3f), test acc:0.772(best 0.756)
In epoch 55, loss: 0.477, val acc: 0.766(best: .3f), test acc:0.772(best 0.764)
In epoch 60, loss: 0.381, val acc: 0.766(b

In [12]:
g = g.to('cuda')
model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes).to('cuda')

In [13]:
train(g, model=model)

In epoch 0, loss: 1.945, val acc: 0.190(best: .3f), test acc:0.190(best 0.168)
In epoch 5, loss: 1.880, val acc: 0.594(best: .3f), test acc:0.594(best 0.609)
In epoch 10, loss: 1.792, val acc: 0.646(best: .3f), test acc:0.646(best 0.664)
In epoch 15, loss: 1.683, val acc: 0.678(best: .3f), test acc:0.680(best 0.690)
In epoch 20, loss: 1.552, val acc: 0.714(best: .3f), test acc:0.714(best 0.712)
In epoch 25, loss: 1.403, val acc: 0.726(best: .3f), test acc:0.728(best 0.727)
In epoch 30, loss: 1.241, val acc: 0.732(best: .3f), test acc:0.732(best 0.737)
In epoch 35, loss: 1.073, val acc: 0.734(best: .3f), test acc:0.738(best 0.733)
In epoch 40, loss: 0.906, val acc: 0.734(best: .3f), test acc:0.738(best 0.737)
In epoch 45, loss: 0.751, val acc: 0.742(best: .3f), test acc:0.742(best 0.738)
In epoch 50, loss: 0.613, val acc: 0.744(best: .3f), test acc:0.746(best 0.736)
In epoch 55, loss: 0.495, val acc: 0.750(best: .3f), test acc:0.750(best 0.748)
In epoch 60, loss: 0.399, val acc: 0.756(b