

**CogDL Notebook**
created by CogDL Team
[cogdlteam@gmail.com]

This notebook shows how to write your first GCN model. 

CogDL Link: https://github.com/THUDM/CogDL

Colab Link: https://colab.research.google.com/drive/1V47IIanXxDxi0Qsd6feOvvyYuqXcFP6P?usp=sharing


In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

**Part 1: Manually simulate the computation and training process of a GCN.**

---



1. Compute the normalized adjacency matrix normA from the initial adjacency matrix A.
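
Written out, the normalization in this step is the symmetric normalization with self-loops. The symbols $\tilde{A}$ and $\tilde{D}$ are notation introduced here for readability and do not appear in the notebook; in the code cell for this step, `A` is reassigned to $A + I$ and `D_hat` plays the role of $\tilde{D}^{-1/2}$.

```latex
\tilde{A} = A + I, \qquad
\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}, \qquad
\mathrm{normA} = \tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2},
\quad\text{i.e.}\quad
\mathrm{normA}_{ij} = \frac{\tilde{A}_{ij}}{\sqrt{\tilde{D}_{ii}\,\tilde{D}_{jj}}}
```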


In [None]:
A = torch.tensor([[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 1], [1, 0, 1, 0]])
A = A + torch.eye(4)
print("A=", A)
# Compute the degree matrix D, then normalize A to obtain normA
D = torch.diag(A.sum(1))
D_hat = torch.diag(1.0 / torch.sqrt(A.sum(1)))
normA = torch.mm(torch.mm(D_hat, A), D_hat)
print("normA=", normA)

A= tensor([[1., 1., 1., 1.],
        [1., 1., 1., 0.],
        [1., 1., 1., 1.],
        [1., 0., 1., 1.]])
normA= tensor([[0.2500, 0.2887, 0.2500, 0.2887],
        [0.2887, 0.3333, 0.2887, 0.0000],
        [0.2500, 0.2887, 0.2500, 0.2887],
        [0.2887, 0.0000, 0.2887, 0.3333]])
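
The same matrix can be obtained without building a diagonal matrix, by scaling rows and columns with broadcasting. The sketch below is an alternative formulation, not the notebook's own method; the name `deg_inv_sqrt` is introduced here only for illustration.

```python
import torch

# Adjacency matrix from the cell above, with self-loops added.
A = torch.tensor([[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 1], [1, 0, 1, 0]], dtype=torch.float)
A = A + torch.eye(4)

# Reference: explicit D^{-1/2} A D^{-1/2}, as in the notebook.
D_hat = torch.diag(1.0 / torch.sqrt(A.sum(1)))
normA = D_hat @ A @ D_hat

# Broadcasting variant: normA[i, j] = A[i, j] / sqrt(d_i * d_j).
deg_inv_sqrt = A.sum(1).pow(-0.5)
normA_broadcast = deg_inv_sqrt.unsqueeze(1) * A * deg_inv_sqrt.unsqueeze(0)

print(torch.allclose(normA, normA_broadcast))
```

On a 4-node graph the two forms are interchangeable; on larger graphs the broadcasting form avoids two dense matrix products.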


2. Compute the first-layer output H1 from the initial features X, the model parameters W1, and the normalized adjacency matrix normA.

In [None]:
H0 = X = torch.FloatTensor([[1,0], [0,1], [1,0], [1,1]])
W1 = torch.tensor([[1, -0.5], [0.5, 1]], requires_grad=True)
# Compute H1 from normA, H0, and W1
H1 = F.relu(torch.mm(normA, torch.mm(H0, W1)))
print(H1)

tensor([[1.0774, 0.1830],
        [0.7440, 0.0447],
        [1.0774, 0.1830],
        [1.0774, 0.0000]], grad_fn=<ReluBackward0>)
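
As a sanity check, the top-left entry of H1 can be reproduced by hand: it is row 0 of normA dotted with column 0 of H0·W1, passed through ReLU. The sketch below uses plain Python with values copied from the cells above; the variable names are made up for illustration.

```python
import math

# Row 0 of normA: nodes 0 and 2 have degree 4 (self-loop included), nodes 1 and 3 have degree 3.
norm_row0 = [1 / 4, 1 / math.sqrt(12), 1 / 4, 1 / math.sqrt(12)]

# Column 0 of H0 @ W1, with H0 = [[1,0],[0,1],[1,0],[1,1]] and W1[:, 0] = [1, 0.5].
xw_col0 = [1.0, 0.5, 1.0, 1.5]

# Weighted sum over neighbors, then ReLU.
h1_00 = max(0.0, sum(a * b for a, b in zip(norm_row0, xw_col0)))
print(round(h1_00, 4))
```

The printed value agrees with the 1.0774 in the first row of the tensor above.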


3. Compute the second-layer output H2 and the final output Z.

In [None]:
W2 = torch.tensor([[0.5, -0.5], [1, 0.5]], requires_grad=True)
# Compute H2 and Z from normA, H1, and W2
H2 = torch.mm(normA, torch.mm(H1, W2))
print("H2=", H2)
Z = F.softmax(H2, dim=-1)
print("Z=", Z)

H2= tensor([[ 0.6366, -0.4800],
        [ 0.5556, -0.3747],
        [ 0.6366, -0.4800],
        [ 0.5962, -0.4377]], grad_fn=<MmBackward>)
Z= tensor([[0.7534, 0.2466],
        [0.7171, 0.2829],
        [0.7534, 0.2466],
        [0.7377, 0.2623]], grad_fn=<SoftmaxBackward>)


4. Compute the loss.

In [None]:
Y = torch.LongTensor([0, 1, 0, 0])
# Compute the final loss from the output Z and the labels Y
loss = F.nll_loss(Z.log(), Y)
print(loss.item())

0.5333564281463623
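
Taking `F.nll_loss` of `Z.log()` amounts to the cross-entropy of the logits H2. PyTorch's `F.cross_entropy` fuses the softmax, log, and NLL steps into one call, which is why the training code in Part 2 computes the loss directly from H2. The sketch below checks the equivalence; the values in `H2` are rounded copies of the output printed above and serve only as stand-in logits.

```python
import torch
import torch.nn.functional as F

# Rounded copy of the H2 printed above; stand-in logits for illustration.
H2 = torch.tensor([[0.6366, -0.4800],
                   [0.5556, -0.3747],
                   [0.6366, -0.4800],
                   [0.5962, -0.4377]])
Y = torch.tensor([0, 1, 0, 0])

# Two-step form used in the notebook: softmax, log, then NLL.
loss_two_step = F.nll_loss(F.softmax(H2, dim=-1).log(), Y)
# Fused form: log-softmax and NLL in a single call.
loss_fused = F.cross_entropy(H2, Y)

print(loss_two_step.item(), loss_fused.item())
```

The fused form is generally preferred in practice because it avoids taking the log of very small probabilities.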


5. Backpropagate from the loss. The gradients of the model parameters W1 and W2 can then be inspected.

In [None]:
loss.backward(retain_graph=True)
print(W1)
print(W1.grad)
print(W2)
print(W2.grad)

tensor([[ 1.0000, -0.5000],
        [ 0.5000,  1.0000]], requires_grad=True)
tensor([[-0.0352,  0.0085],
        [-0.0088,  0.0052]])
tensor([[ 0.5000, -0.5000],
        [ 1.0000,  0.5000]], requires_grad=True)
tensor([[-0.0396,  0.0396],
        [ 0.0018, -0.0018]])
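
To round off the manual simulation, these gradients can drive one plain gradient-descent update. The sketch below rebuilds the Part 1 setup so that it runs on its own. The helper `forward_loss` and the learning rate of 0.1 are assumptions chosen for illustration, not values from the notebook.

```python
import torch
import torch.nn.functional as F

# Rebuild the Part 1 setup.
A = torch.tensor([[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 1], [1, 0, 1, 0]], dtype=torch.float)
A = A + torch.eye(4)
D_hat = torch.diag(1.0 / torch.sqrt(A.sum(1)))
normA = D_hat @ A @ D_hat
X = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = torch.tensor([0, 1, 0, 0])
W1 = torch.tensor([[1.0, -0.5], [0.5, 1.0]], requires_grad=True)
W2 = torch.tensor([[0.5, -0.5], [1.0, 0.5]], requires_grad=True)


def forward_loss(W1, W2):
    """Two-layer GCN forward pass followed by cross-entropy (hypothetical helper)."""
    H1 = F.relu(normA @ (X @ W1))
    H2 = normA @ (H1 @ W2)
    return F.cross_entropy(H2, Y)


loss_before = forward_loss(W1, W2)
loss_before.backward()

lr = 0.1  # assumed learning rate, chosen only for illustration
with torch.no_grad():
    # Move each parameter against its gradient.
    W1 -= lr * W1.grad
    W2 -= lr * W2.grad
# Clear the gradients so a later backward() does not accumulate into them.
W1.grad.zero_()
W2.grad.zero_()

loss_after = forward_loss(W1, W2)
print(loss_before.item(), loss_after.item())
```

The training loop in Part 2 does the same thing with `torch.optim.Adam`, which replaces the fixed-step update with an adaptive one.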


**Part 2: Run your GCN implementation on the Cora dataset**

---



1. Install cogdl with pip install.

In [None]:
!pip install cogdl

Collecting cogdl
  Downloading cogdl-0.4.0-py3-none-any.whl (324 kB)

2. Load the Cora dataset from cogdl (x holds the features, y holds the labels, and the masks define the train/validation/test split).

In [None]:
from cogdl.datasets import build_dataset_from_name

dataset = build_dataset_from_name("cora")
data = dataset[0]
print(data)
n = data.x.shape[0]
edge_index = torch.stack(data.edge_index)
A = torch.sparse_coo_tensor(edge_index, torch.ones(edge_index.shape[1]), (n, n)).to_dense()

Failed to load C version of sampling, use python version instead.
Downloading https://cloud.tsinghua.edu.cn/d/6808093f7f8042bfa1f0/files/?p=%2Fcora.zip&dl=1
unpacking cora.zip
Processing...
Done!
Graph(x=[2708, 1433], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708], edge_index=[2, 10184])
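
The last two lines of the cell above turn the edge list into a dense adjacency matrix. The same construction is easier to follow on a toy graph. The sketch below uses a made-up 3-node path graph, not Cora data.

```python
import torch

# Toy edge list (made up for illustration): a path 0 - 1 - 2, stored in both directions.
row = torch.tensor([0, 1, 1, 2])
col = torch.tensor([1, 0, 2, 1])
edge_index = torch.stack((row, col))  # shape [2, num_edges]

n = 3
# One entry of value 1 per edge, then expand to a dense n x n matrix.
A = torch.sparse_coo_tensor(edge_index, torch.ones(edge_index.shape[1]), (n, n)).to_dense()
print(A)
```

For Cora the dense matrix is 2708 × 2708, which still fits comfortably in memory; for much larger graphs the sparse form is usually kept instead.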


3. Train with your GCN implementation (fill in the forward method of the GCN model with the code you wrote in Part 1).

In [None]:
import math
import copy
from tqdm import tqdm

def accuracy(y_pred, y_true):
    y_true = y_true.squeeze().long()
    preds = y_pred.max(1)[1].type_as(y_true)
    correct = preds.eq(y_true).double()
    correct = correct.sum().item()
    return correct / len(y_true)

class GCN(nn.Module):

    def __init__(
        self,
        in_feats,
        hidden_size,
        out_feats,
    ):
        super(GCN, self).__init__()
        self.out_feats = out_feats
        self.W1 = nn.Parameter(torch.FloatTensor(in_feats, hidden_size))
        self.W2 = nn.Parameter(torch.FloatTensor(hidden_size, out_feats))
        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1.0 / math.sqrt(self.out_feats)
        torch.nn.init.uniform_(self.W1, -stdv, stdv)
        torch.nn.init.uniform_(self.W2, -stdv, stdv)

    def forward(self, A, X):
        n = X.shape[0]
        A = A + torch.eye(n, device=X.device)
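        # Compute normA, H1, and H2 in turn, then return H2. Note: Z is not needed here, because the loss is usually computed directly from H2 and Y.
        # Use self.W1 and self.W2 to access the model parameters.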
        D_hat = torch.diag(1.0 / torch.sqrt(A.sum(1)))
        normA = torch.mm(torch.mm(D_hat, A), D_hat)
        H1 = F.relu(torch.mm(normA, torch.mm(X, self.W1)))
        H2 = torch.mm(normA, torch.mm(H1, self.W2))

        return H2


hidden_size = 64
model = GCN(data.x.shape[1], hidden_size, int(data.y.max()) + 1)  # number of classes as a plain int

if torch.cuda.is_available():
    device = torch.device("cuda")
    model = model.to(device)
    A = A.to(device)
    data.apply(lambda x: x.to(device))

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
epoch_iter = tqdm(range(100), position=0, leave=True)
best_model = None
best_loss = 1e8
for epoch in epoch_iter:
    model.train()
    optimizer.zero_grad()
    logits = model(A, data.x)
    loss = F.cross_entropy(logits[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    train_loss = loss.item()

    model.eval()
    with torch.no_grad():
        logits = model(A, data.x)
        val_loss = F.cross_entropy(logits[data.val_mask], data.y[data.val_mask]).item()
        val_acc = accuracy(logits[data.val_mask], data.y[data.val_mask])
        if val_loss < best_loss:
            best_loss = val_loss
            best_model = copy.deepcopy(model)

    epoch_iter.set_description(f"Epoch: {epoch:03d}, Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")

with torch.no_grad():
    logits = best_model(A, data.x)
    val_acc = accuracy(logits[data.val_mask], data.y[data.val_mask])
    test_acc = accuracy(logits[data.test_mask], data.y[data.test_mask])
print("Val Acc", val_acc)
print("Test Acc", test_acc)

Epoch: 099, Train Loss: 0.0112, Val Loss: 0.7426, Val Acc: 0.7820: 100%|██████████| 100/100 [00:03<00:00, 26.00it/s]


Val Acc 0.786
Test Acc 0.79


4. Run cogdl's built-in GCN model on the Cora dataset and compare the two runs (both accuracy and training time).

In [None]:
from cogdl import experiment

experiment(task="node_classification", dataset="cora", model="gcn", max_epoch=100)

Epoch: 008, Train: 0.9571, Val: 0.7400, ValLoss: 1.8320:   3%|▎         | 3/100 [00:00<00:03, 28.81it/s]

Namespace(activation='relu', checkpoint=None, cpu=False, dataset='cora', device_id=[0], dropout=0.5, fast_spmm=False, hidden_size=64, inference=False, lr=0.01, max_epoch=100, missing_rate=0, model='gcn', norm=None, num_classes=None, num_features=None, num_layers=2, patience=100, residual=False, save_dir='.', save_model=None, seed=1, task='node_classification', trainer=None, use_best_config=False, weight_decay=0.0005)


Epoch: 099, Train: 1.0000, Val: 0.7880, ValLoss: 0.7775: 100%|██████████| 100/100 [00:00<00:00, 112.34it/s]


Valid accurracy =  0.7880
Test accuracy = 0.8090
| Variant         | Acc           | ValAcc        |
|-----------------|---------------|---------------|
| ('cora', 'gcn') | 0.8090±0.0000 | 0.7880±0.0000 |


defaultdict(list, {('cora', 'gcn'): [{'Acc': 0.809, 'ValAcc': 0.788}]})
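
When comparing the two runs, note that the Namespace printed above shows the cogdl run using dropout=0.5 and weight_decay=0.0005, neither of which the hand-written model has, so the accuracy gap is not purely a matter of implementation. For the speed gap, one likely contributor (an assumption; the output above does not confirm it) is that the hand-written model multiplies a dense 2708 × 2708 matrix although the graph has only about ten thousand edges. The sketch below shows the same normalization with a sparse adjacency matrix in plain PyTorch. It does not use cogdl's own kernels, and the helper `sparse_norm_adj` is hypothetical.

```python
import torch


def sparse_norm_adj(row, col, n):
    """Return D^{-1/2} (A + I) D^{-1/2} as a sparse COO tensor (hypothetical helper).

    Assumes an undirected graph whose edges are listed in both directions.
    """
    # Append self-loops to the edge list.
    loops = torch.arange(n)
    row = torch.cat([row, loops])
    col = torch.cat([col, loops])
    # Degree of each node, counting the self-loop.
    deg = torch.zeros(n).index_add_(0, row, torch.ones(row.shape[0]))
    # Each edge (i, j) gets weight 1 / sqrt(d_i * d_j).
    weight = deg[row].pow(-0.5) * deg[col].pow(-0.5)
    return torch.sparse_coo_tensor(torch.stack((row, col)), weight, (n, n)).coalesce()


# The 4-node graph from Part 1, as an edge list in both directions.
row = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2, 3, 3])
col = torch.tensor([1, 2, 3, 0, 2, 0, 1, 3, 0, 2])
normA_sparse = sparse_norm_adj(row, col, 4)

X = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
out = torch.sparse.mm(normA_sparse, X)  # sparse-dense product, replaces torch.mm(normA, X)
print(out)
```

Swapping `torch.mm(normA, ...)` for `torch.sparse.mm(normA_sparse, ...)` in the forward pass keeps the math unchanged while touching only the nonzero entries.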