In this assignment, you will build a GNN training pipeline, so that you can put a Graph Neural Network to practical use.

First, download the dataset and load the data.

In [1]:
import torch_geometric.transforms as T
from torch_geometric.datasets import Planetoid
dataset = Planetoid("./", "Cora", transform=T.NormalizeFeatures())




Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.x
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.tx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.allx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.y
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ty
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ally
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.graph
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.test.index
Processing...
Done!



In [3]:
print(data)

Data(edge_index=[2, 10556], test_mask=[2708], train_mask=[2708], val_mask=[2708], x=[2708, 1433], y=[2708])


In [10]:
data = dataset[0]

x = data.x
edge_index = data.edge_index


Then, you need to implement a GNN model. You may copy the GCNConv from your work two weeks ago, and build the model with the convolution layers.
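
As a quick sanity check before writing the layer, the sketch below computes the symmetric normalization coefficient 1 / sqrt(deg(i) * deg(j)) that a GCN layer attaches to each edge, on a three-node path graph in plain Python. The toy graph and the helper name `gcn_edge_norm` are illustrative assumptions, not part of the assignment, and self-loops are left out for brevity.

```python
import math


def gcn_edge_norm(edge_index, num_nodes):
    """Return 1 / sqrt(deg(src) * deg(dst)) for every edge (unit edge weights)."""
    rows, cols = edge_index
    deg = [0] * num_nodes
    for r in rows:
        deg[r] += 1
    # Nodes with degree 0 get a factor of 0 instead of infinity.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [inv_sqrt[r] * inv_sqrt[c] for r, c in zip(rows, cols)]


# Hypothetical undirected path graph 0 - 1 - 2, stored as directed edge pairs.
edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]
norm = gcn_edge_norm(edge_index, num_nodes=3)
print([round(n, 4) for n in norm])
```

Here the degrees are 1, 2, and 1, so every edge touches the degree-2 node and receives the coefficient 1 / sqrt(2). The same quantity is what the `norm` tensor holds in a tensor-based layer.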

In [11]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import MessagePassing


class PyG_GCNConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super(PyG_GCNConv, self).__init__(aggr='add')  # "Add" aggregation.
        self.lin = nn.Linear(in_channels, out_channels)

    def forward(self, x, edge_index, edge_weight=None):
        # x has shape [N, in_channels]
        # edge_index has shape [2, E]
        
        # Step 1: Linearly transform node feature matrix.
        x = self.lin(x)

        # Step 2: Normalize edge weights.
        if edge_weight is None:
            edge_weight = torch.ones((edge_index.size(1), ), dtype=x.dtype, device=edge_index.device)
        row, col = edge_index
        deg = torch.bincount(row, weights=edge_weight, minlength=x.size(0))
        deg_inv_sqrt = deg.pow(-0.5)
        deg_inv_sqrt[torch.isinf(deg_inv_sqrt)] = 0
        norm = deg_inv_sqrt[row] * edge_weight * deg_inv_sqrt[col]

        # Step 3: Start propagating messages.
        return self.propagate(edge_index, x=x, norm=norm)

    def message(self, x_j, norm):
        # x_j has shape [E, out_channels]
        return norm.view(-1, 1) * x_j

class PyG_GCN(nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super(PyG_GCN, self).__init__()
        self.conv1 = PyG_GCNConv(in_channels, hidden_channels)
        self.conv2 = PyG_GCNConv(hidden_channels, out_channels)

    def forward(self, x, edge_index, edge_weight=None):
        x = self.conv1(x, edge_index, edge_weight)
        x = F.relu(x)
        x = self.conv2(x, edge_index, edge_weight)
        return F.log_softmax(x, dim=1)

Next, build the training and evaluation code, which is similar to the work in week 4. The downstream task is node classification.
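
Two ideas are used in this step: scoring only the nodes selected by a boolean mask, and reporting the test accuracy from the epoch with the best validation accuracy. Both can be sketched in plain Python. The predictions, labels, masks, and per-epoch accuracies below are made-up illustrative values, and `masked_accuracy` is a hypothetical helper.

```python
def masked_accuracy(pred, labels, mask):
    """Fraction of correct predictions among nodes where mask is True."""
    selected = [(p, y) for p, y, m in zip(pred, labels, mask) if m]
    correct = sum(1 for p, y in selected if p == y)
    return correct / len(selected)


# Made-up predictions and labels for six nodes.
pred = [0, 1, 2, 1, 0, 2]
labels = [0, 1, 1, 1, 2, 2]
val_mask = [True, True, True, False, False, False]
test_mask = [False, False, False, True, True, True]

val_acc = masked_accuracy(pred, labels, val_mask)    # 2 of 3 correct
test_acc = masked_accuracy(pred, labels, test_mask)  # 2 of 3 correct

# Model selection: keep the test accuracy from the epoch with the best
# validation accuracy (made-up per-epoch (val, test) pairs).
history = [(0.50, 0.48), (0.62, 0.60), (0.61, 0.65)]
best_val, reported_test = 0.0, 0.0
for val, test in history:
    if val > best_val:
        best_val, reported_test = val, test
print(best_val, reported_test)
```

The third epoch has the highest test accuracy, but it is not reported, because picking an epoch by its test score would leak test information into model selection.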

In [14]:
import torch
import torch.nn.functional as F
import torch_geometric.transforms as T
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import MessagePassing
import torch.nn as nn
# Build your training pipeline
hidden_dim = 16
lr = 0.001
epochs = 100
model = PyG_GCN(dataset.num_features, hidden_dim, dataset.num_classes)
optimizer = torch.optim.Adam(
    model.parameters(), lr=lr, weight_decay=5e-4)

def train():
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return loss.item()


@torch.no_grad()
def test():
    model.eval()
    pred = model(data.x, data.edge_index).argmax(dim=-1)

    accs = []
    for mask in [data.train_mask, data.val_mask, data.test_mask]:
        accs.append(int((pred[mask] == data.y[mask]).sum()) / int(mask.sum()))
    return accs

best_val_acc = 0
test_acc = 0
for epoch in range(1, epochs + 1):
    loss = train()
    train_acc, val_acc, tmp_test_acc = test()
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        test_acc = tmp_test_acc
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}, Train: {train_acc:.4f}, Val: {val_acc:.4f}, Test: {test_acc:.4f}')

Epoch: 001, Loss: 1.9551, Train: 0.1429, Val: 0.1140, Test: 0.1030
Epoch: 002, Loss: 1.9546, Train: 0.1429, Val: 0.1140, Test: 0.1030
Epoch: 003, Loss: 1.9541, Train: 0.1429, Val: 0.1140, Test: 0.1030
Epoch: 004, Loss: 1.9535, Train: 0.1429, Val: 0.1140, Test: 0.1030
Epoch: 005, Loss: 1.9530, Train: 0.1429, Val: 0.1140, Test: 0.1030
Epoch: 006, Loss: 1.9525, Train: 0.1429, Val: 0.1140, Test: 0.1030
Epoch: 007, Loss: 1.9519, Train: 0.1429, Val: 0.1140, Test: 0.1030
Epoch: 008, Loss: 1.9514, Train: 0.1429, Val: 0.1140, Test: 0.1030
Epoch: 009, Loss: 1.9508, Train: 0.1429, Val: 0.1140, Test: 0.1030
Epoch: 010, Loss: 1.9502, Train: 0.1429, Val: 0.1140, Test: 0.1030
Epoch: 011, Loss: 1.9497, Train: 0.1429, Val: 0.1140, Test: 0.1030
Epoch: 012, Loss: 1.9491, Train: 0.1429, Val: 0.1140, Test: 0.1030
Epoch: 013, Loss: 1.9485, Train: 0.1429, Val: 0.1140, Test: 0.1030
Epoch: 014, Loss: 1.9479, Train: 0.1429, Val: 0.1140, Test: 0.1030
Epoch: 015, Loss: 1.9473, Train: 0.1429, Val: 0.1140, Test: 0.
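
The numbers in these first epochs are close to what an untrained classifier produces on Cora's seven classes. As a rough check, the expected loss and the single-class accuracy can be computed directly. This assumes a uniform prediction over the classes and a class-balanced training split of 140 nodes.

```python
import math

num_classes = 7      # Cora has seven classes
train_nodes = 140    # size of the training split
per_class = train_nodes // num_classes  # assumes a class-balanced split

# Negative log-likelihood when every class gets probability 1/7.
uniform_nll = -math.log(1.0 / num_classes)

# Accuracy when every training node is assigned the same class.
single_class_acc = per_class / train_nodes

print(f"{uniform_nll:.4f} {single_class_acc:.4f}")  # prints "1.9459 0.1429"
```

Both values are close to the log above (loss near 1.95, train accuracy 0.1429), which suggests the model has barely moved from its initialization. With `lr = 0.001` the loss drops by only about 0.0005 per epoch.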

Now you can train the GCN model with PyG. Next, try implementing similar functionality with DGL.

In [19]:
import argparse

import dgl
import dgl.nn as dglnn

import torch
import torch.nn as nn
import torch.nn.functional as F


from dgl.data import CoraGraphDataset

try:
    from dgl.transforms import AddSelfLoop
    transform = AddSelfLoop()
    data = CoraGraphDataset(transform=transform)
    g = data[0]
except ImportError:
    # Older DGL versions have no dgl.transforms; add self-loops manually.
    data = CoraGraphDataset()
    g = data[0]
    g = dgl.add_self_loop(g)
features = g.ndata["feat"]
labels = g.ndata["label"]
masks = g.ndata["train_mask"], g.ndata["val_mask"], g.ndata["test_mask"]


class DGL_GCNConv(nn.Module):
    def __init__(self, in_feats, out_feats):
        super(DGL_GCNConv, self).__init__()
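        # NOTE: activation=F.relu is applied by every DGL_GCNConv, including the
        # output layer in DGL_GCN; reference GCN implementations usually leave
        # the final layer without an activation.
        self.conv = dglnn.GraphConv(in_feats, out_feats, activation=F.relu)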

    def forward(self, g, features):
        h = self.conv(g, features)
        return h

class DGL_GCN(nn.Module):
    def __init__(self, in_feats, hidden_feats, out_feats):
        super(DGL_GCN, self).__init__()
        self.layer1 = DGL_GCNConv(in_feats, hidden_feats)
        self.layer2 = DGL_GCNConv(hidden_feats, out_feats)

    def forward(self, g, features):
        h = self.layer1(g, features)
        h = self.layer2(g, h)
        return h
    
def train(g, features, labels, masks, model):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
    train_mask, val_mask, test_mask = masks
    best_val_acc = 0
    best_model = None

    for epoch in range(200):
        model.train()
        logits = model(g, features)
        loss = F.cross_entropy(logits[train_mask], labels[train_mask])

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        train_acc = evaluate(g, features, labels, train_mask, model)
        val_acc = evaluate(g, features, labels, val_mask, model)
        if val_acc > best_val_acc:
            best_val_acc = val_acc
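            # Clone the tensors: state_dict() returns references that would
            # otherwise keep changing as training continues.
            best_model = {k: v.clone() for k, v in model.state_dict().items()}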

        print(f'Epoch {epoch:03d} | Loss {loss.item():.4f} | Train Acc {train_acc:.4f} | Val Acc {val_acc:.4f}')

    print("Training finished.")
    # Load the best model
    model.load_state_dict(best_model)


def evaluate(g, features, labels, mask, model):
    model.eval()
    with torch.no_grad():
        logits = model(g, features)
        logits = logits[mask]
        labels = labels[mask]
        _, indices = torch.max(logits, dim=1)
        correct = torch.sum(indices == labels)
        return correct.item() * 1.0 / len(labels)
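
One detail worth checking when keeping the best checkpoint in a loop like `train` above: `model.state_dict()` returns tensors that share storage with the live parameters, so a stored reference keeps changing as training continues. The sketch below uses a throwaway `nn.Linear` layer (not the model above) to show the difference between a reference and a `copy.deepcopy` snapshot.

```python
import copy

import torch
import torch.nn as nn

layer = nn.Linear(2, 1)

reference = layer.state_dict()                # shares storage with the parameters
snapshot = copy.deepcopy(layer.state_dict())  # independent copy

# Simulate an optimizer step by changing the weights in place.
with torch.no_grad():
    layer.weight.add_(1.0)

reference_tracks = torch.equal(reference["weight"], layer.weight)
snapshot_tracks = torch.equal(snapshot["weight"], layer.weight)
print(reference_tracks, snapshot_tracks)
```

The reference follows the updated weights while the snapshot keeps the old values, so only a copied state dict can restore an earlier epoch.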



Downloading C:\Users\lhb\.dgl\cora_v2.zip from https://data.dgl.ai/dataset/cora_v2.zip...
Extracting file to C:\Users\lhb\.dgl\cora_v2
Finished data loading and preprocessing.
  NumNodes: 2708
  NumEdges: 10556
  NumFeats: 1433
  NumClasses: 7
  NumTrainingSamples: 140
  NumValidationSamples: 500
  NumTestSamples: 1000
Done saving data into cached files.



In [20]:
model = DGL_GCN(features.shape[1], 16,len(torch.unique(labels)))
print("Training...")
train(g, features, labels, masks, model)

# test the model
print("Testing...")
acc = evaluate(g, features, labels, masks[2], model)
print("Test accuracy {:.4f}".format(acc))

Training...
Epoch 000 | Loss 1.9461 | Train Acc 0.2643 | Val Acc 0.1660
Epoch 001 | Loss 1.9440 | Train Acc 0.3857 | Val Acc 0.2620
Epoch 002 | Loss 1.9403 | Train Acc 0.4286 | Val Acc 0.2700
Epoch 003 | Loss 1.9354 | Train Acc 0.3857 | Val Acc 0.2700
Epoch 004 | Loss 1.9304 | Train Acc 0.3786 | Val Acc 0.2580
Epoch 005 | Loss 1.9245 | Train Acc 0.4000 | Val Acc 0.2520
Epoch 006 | Loss 1.9184 | Train Acc 0.4071 | Val Acc 0.2820
Epoch 007 | Loss 1.9119 | Train Acc 0.5000 | Val Acc 0.3280
Epoch 008 | Loss 1.9048 | Train Acc 0.5643 | Val Acc 0.3940
Epoch 009 | Loss 1.8973 | Train Acc 0.6429 | Val Acc 0.4460
Epoch 010 | Loss 1.8894 | Train Acc 0.6643 | Val Acc 0.4980
Epoch 011 | Loss 1.8812 | Train Acc 0.6571 | Val Acc 0.5200
Epoch 012 | Loss 1.8726 | Train Acc 0.6643 | Val Acc 0.5360
Epoch 013 | Loss 1.8635 | Train Acc 0.6643 | Val Acc 0.5420
Epoch 014 | Loss 1.8539 | Train Acc 0.6571 | Val Acc 0.5380
Epoch 015 | Loss 1.8441 | Train Acc 0.6571 | Val Acc 0.5380
Epoch 016 | Loss 1.8339 | Tr

Epoch 137 | Loss 0.8489 | Train Acc 0.7500 | Val Acc 0.6180
Epoch 138 | Loss 0.8472 | Train Acc 0.7429 | Val Acc 0.6180
Epoch 139 | Loss 0.8453 | Train Acc 0.7429 | Val Acc 0.6180
Epoch 140 | Loss 0.8436 | Train Acc 0.7214 | Val Acc 0.6140
Epoch 141 | Loss 0.8421 | Train Acc 0.7286 | Val Acc 0.6180
Epoch 142 | Loss 0.8403 | Train Acc 0.7357 | Val Acc 0.6220
Epoch 143 | Loss 0.8386 | Train Acc 0.7214 | Val Acc 0.6180
Epoch 144 | Loss 0.8370 | Train Acc 0.7286 | Val Acc 0.6160
Epoch 145 | Loss 0.8355 | Train Acc 0.7500 | Val Acc 0.6200
Epoch 146 | Loss 0.8337 | Train Acc 0.7429 | Val Acc 0.6200
Epoch 147 | Loss 0.8324 | Train Acc 0.7500 | Val Acc 0.6200
Epoch 148 | Loss 0.8306 | Train Acc 0.7357 | Val Acc 0.6180
Epoch 149 | Loss 0.8293 | Train Acc 0.7357 | Val Acc 0.6180
Epoch 150 | Loss 0.8277 | Train Acc 0.7429 | Val Acc 0.6200
Epoch 151 | Loss 0.8261 | Train Acc 0.7429 | Val Acc 0.6200
Epoch 152 | Loss 0.8246 | Train Acc 0.7429 | Val Acc 0.6140
Epoch 153 | Loss 0.8232 | Train Acc 0.73

If you find this hard to implement, you can refer to the official GNN training examples for [PyG](https://github.com/pyg-team/pytorch_geometric/blob/master/examples/gcn.py) and [DGL](https://github.com/dmlc/dgl/blob/master/examples/pytorch/gcn/train.py).