# Semi-supervised node classification using Heterogeneous Graph Neural Networks

In this tutorial, you will learn how to:

* Build the relational graph neural network (R-GCN) model proposed by [Schlichtkrull et al.](https://arxiv.org/abs/1703.06103)
* Train the model and interpret the results.

In [1]:
import dgl
import dgl.function as fn
import torch
import torch.nn as nn
import torch.nn.functional as F
import itertools
import numpy as np
import scipy.sparse as sp

Using backend: pytorch


## Problem formulation

- Given the graph structure, node features, and node labels on a subset of nodes of a certain type
- Predict the labels on the rest of the nodes of the labeled type
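
As a rough illustration of this setup, the sketch below uses NumPy only. The labels, mask, and random logits are hypothetical stand-ins for a real dataset and model; the point is that the loss is computed on the labeled subset while predictions are made for the remaining nodes.

```python
import numpy as np

# Hypothetical toy setup: 6 nodes of the labeled type, 2 classes.
labels = np.array([0, 1, 1, 0, 1, 0])
train_mask = np.array([True, True, False, True, False, False])

# A model scores every node, labeled or not (random logits stand in for it).
rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 2))

def cross_entropy(logits, targets):
    # Numerically stable log-softmax, then the mean negative log-likelihood.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

train_idx = np.nonzero(train_mask)[0]
# The loss only sees the labeled subset ...
train_loss = cross_entropy(logits[train_idx], labels[train_idx])
# ... while predictions are produced for the remaining nodes.
unlabeled_idx = np.nonzero(~train_mask)[0]
predictions = logits[unlabeled_idx].argmax(axis=1)
```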

In [2]:
# First, load the graph and node labels, as covered in the previous session.

from dgl.data.rdf import AIFBDataset

dataset = AIFBDataset()
g = dataset[0]

category = dataset.predict_category
num_classes = dataset.num_classes

# Obtain the train/test splits stored as node attributes of the graph
train_mask = g.nodes[category].data.pop('train_mask')
test_mask = g.nodes[category].data.pop('test_mask')
train_idx = torch.nonzero(train_mask, as_tuple=False).squeeze()
test_idx = torch.nonzero(test_mask, as_tuple=False).squeeze()
labels = g.nodes[category].data.pop('labels')

# Hold out the first fifth of the training nodes for validation
val_idx = train_idx[:len(train_idx) // 5]
train_idx = train_idx[len(train_idx) // 5:]

# Move the graph and tensors to the GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
g = g.to(device)
labels = labels.to(device)
train_idx = train_idx.to(device)
val_idx = val_idx.to(device)
test_idx = test_idx.to(device)

Done loading data from cached files.


## Heterogeneous models

- Heterogeneous graphs have multiple edge types.
- Messages arrive at a node over edges of different types.
- One class of models is defined by choosing how to aggregate the messages for each edge type.


### Relational GCN model

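- Relational GCN aggregates the messages separately for each relation type and then sums the results across relations.
- Each relation $r$ defines its own neighborhood $\mathcal{N}_{(i)}^r$ of node $i$; $c_{i,r}$ is a normalization constant such as $|\mathcal{N}_{(i)}^r|$.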


$$
 h_i^{(l+1)} = \sigma \big(\sum_r \sum_{j\in\mathcal{N}_{(i)}^r}\frac{1}{c_{i,r}}h_j^{(l)}W_r^{(l)}\big)
$$


- The `HeteroRGCNLayer` defined in the next cell implements this equation.
- It takes a dictionary mapping node types to node feature tensors as input, and returns another dictionary mapping node types to updated node features.
- For a graph with $R$ relations, it uses:
  - $R$ message passing functions
  - $R$ aggregation functions
  - A single function to aggregate the messages across relations.
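
To make the three ingredients concrete, here is a minimal NumPy sketch of one such layer on dense adjacency matrices. It is not the DGL implementation: the node types, relation names, and sizes are made up for illustration, $c_{i,r}$ is assumed to be the neighbor count (a mean), and $\sigma$ is assumed to be ReLU.

```python
import numpy as np

rng = np.random.default_rng(0)
in_size, out_size = 4, 2

# Hypothetical toy graph: node features per node type.
feats = {"user": rng.normal(size=(3, in_size)),
         "item": rng.normal(size=(2, in_size))}

# One dense adjacency per relation, shaped (num_dst, num_src).
# adj[i, j] = 1 means there is an edge j -> i under that relation.
relations = {
    ("user", "follows", "user"): np.array([[0, 1, 1],
                                            [1, 0, 0],
                                            [0, 0, 0]], dtype=float),
    ("item", "bought_by", "user"): np.array([[1, 0],
                                              [1, 1],
                                              [0, 0]], dtype=float),
}

# One weight matrix W_r per relation.
weights = {rel: rng.normal(size=(in_size, out_size)) for rel in relations}

def rgcn_layer(feats, relations, weights):
    out = {}
    for (src, etype, dst), adj in relations.items():
        # Per-relation message: h_j W_r.
        msg = feats[src] @ weights[(src, etype, dst)]
        # Per-relation aggregation: mean over neighbors, i.e. c_{i,r} = |N_i^r|.
        deg = adj.sum(axis=1, keepdims=True)
        agg = (adj @ msg) / np.maximum(deg, 1.0)
        # Cross-relation reducer: sum the results that land on the same node type.
        out[dst] = out.get(dst, 0.0) + agg
    # Non-linearity sigma (ReLU in this sketch).
    return {ntype: np.maximum(h, 0.0) for ntype, h in out.items()}

h = rgcn_layer(feats, relations, weights)
```

Only `"user"` appears in `h`, because no relation in this toy graph points to `"item"`. This is the same reason the layer filters on node types that actually received an `'h'` feature.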

In [3]:
class HeteroRGCNLayer(nn.Module):
    def __init__(self, in_size, out_size, etypes):
        super(HeteroRGCNLayer, self).__init__()
        # W_r for each relation
        self.weight = nn.ModuleDict({
                name: nn.Linear(in_size, out_size) for name in etypes
            })

    def forward(self, G, feat_dict):
        # The input is a dictionary of node features for each type
        funcs = {}
        for srctype, etype, dsttype in G.canonical_etypes:
            # Compute W_r * h
            if srctype in feat_dict:
                Wh = self.weight[etype](feat_dict[srctype])
                # Save it in graph for message passing
                G.nodes[srctype].data['Wh_%s' % etype] = Wh
                # Specify per-relation message passing functions: (message_func, reduce_func).
                # Note that the results are saved to the same destination feature 'h', which
                # hints the type wise reducer for aggregation.
                funcs[etype] = (fn.copy_u('Wh_%s' % etype, 'm'), fn.mean('m', 'h'))
        # Trigger message passing of multiple types.
        # The first argument is the message passing functions for each relation.
        # The second one is the type wise reducer, could be "sum", "max",
        # "min", "mean", "stack"
        G.multi_update_all(funcs, 'sum')
        # return the updated node feature dictionary
        return {ntype: G.dstnodes[ntype].data['h'] for ntype in G.ntypes if 'h' in G.dstnodes[ntype].data}

### Define a HeteroGraphConv model

- `HeteroGraphConv` is a wrapper that runs DGL NN modules on heterogeneous graphs.
  - $f_r(\cdot,\cdot)$: a DGL NN module has to be defined for each relation $r$, e.g., `GraphConv()`.
  - A DGL NN module corresponds to a pair of message passing and aggregation functions.
  - $G(\cdot)$: a reduction function that merges the results on the same node type from multiple relations, e.g., $\sum$.
  - $g_r$: the graph for relation $r$.

$$
h_{x}^{(l+1)} = \underset{r\in\mathcal{R},\, r_{dst}=x}{G} \big(f_r(g_r, h_{r_{src}}^{(l)}, h_{r_{dst}}^{(l)})\big)
$$

- The RGCN implementation can be expressed with `HeteroGraphConv`. (Exercise: map the function above to RGCN.)

See also the [DGL user guide on NN modules for heterogeneous graphs](https://docs.dgl.ai/guide/nn-heterograph.html?highlight=heterogenous%20graphs).
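
One possible reading of the exercise, written with the symbols of the R-GCN equation: the per-relation module $f_r$ computes the normalized, transformed neighbor sum for a destination node $i$ of type $x$, and the cross-relation reducer $G$ is a plain sum.

$$
f_r\big(g_r, h_{r_{src}}^{(l)}, h_{r_{dst}}^{(l)}\big)_i = \sum_{j\in\mathcal{N}_{(i)}^r}\frac{1}{c_{i,r}}h_j^{(l)}W_r^{(l)},
\qquad
G = \sum_{r\in\mathcal{R},\, r_{dst}=x}
$$

$$
h_i^{(l+1)} = \sigma\Big(\sum_{r\in\mathcal{R},\, r_{dst}=x} f_r\big(g_r, h_{r_{src}}^{(l)}, h_{r_{dst}}^{(l)}\big)_i\Big)
$$

Under this reading, $f_r$ ignores the destination features $h_{r_{dst}}^{(l)}$, and the non-linearity $\sigma$ is applied outside `HeteroGraphConv`. Note that `GraphConv` applies its own degree normalization, which may differ from the $1/c_{i,r}$ used here depending on its `norm` setting, so treat the mapping as approximate.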


In [4]:
# ----------- 2. create model -------------- #
# build a two-layer RGCN model
import dgl.nn as dglnn

class RGCN(nn.Module):
    def __init__(self, in_feats, hid_feats, out_feats, rel_names):
        super().__init__()

        self.conv1 = dglnn.HeteroGraphConv({
            rel: dglnn.GraphConv(in_feats, hid_feats)
            for rel in rel_names}, aggregate='sum')
        self.conv2 = dglnn.HeteroGraphConv({
            rel: dglnn.GraphConv(hid_feats, out_feats)
            for rel in rel_names}, aggregate='sum')

    def forward(self, graph, inputs):
        # inputs are features of nodes
        h = self.conv1(graph, inputs)
        h = {k: F.relu(v) for k, v in h.items()}
        h = self.conv2(graph, h)
        return h
    

### Flexibility of HeteroGraphConv

- `HeteroGraphConv` performs a separate graph convolution on each edge type.
- It sums the per-edge-type results to produce the final output for each node type.
- Replacing `GraphConv` with a different module, such as the graph attention layer `GATConv`, yields a different model.
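
The cross-relation reducer is a second knob besides the per-relation module. The NumPy sketch below uses hypothetical per-relation outputs for one destination node type and shows what the `"sum"`, `"mean"`, `"max"`, `"min"`, and `"stack"` options would compute on them. It only illustrates the arithmetic, not DGL's implementation.

```python
import numpy as np

# Hypothetical per-relation outputs for the same destination node type:
# 2 nodes, 3 features, produced by two different relations.
per_relation = [
    np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]),
    np.array([[3.0, 0.0, 1.0], [2.0, 2.0, 2.0]]),
]

# Shape: (nodes, relations, features).
stacked = np.stack(per_relation, axis=1)

reduced = {
    "sum": stacked.sum(axis=1),
    "mean": stacked.mean(axis=1),
    "max": stacked.max(axis=1),
    "min": stacked.min(axis=1),
    "stack": stacked,  # keeps one slice per relation instead of merging
}
```

Every reducer except `"stack"` returns one vector per node, so the choice changes how relations are weighted against each other but not the output shape.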


### Node embedding layer for heterogeneous graphs

- Since AIFB does not have node features, we use learnable embeddings instead.
- In heterogeneous graphs, a dictionary of embeddings is used, with one embedding table per node type.
- The embeddings are updated during training.
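
A learnable embedding can be seen as a linear layer applied to one-hot node IDs, which is why it can stand in for missing features. The NumPy sketch below illustrates this with hypothetical node types and counts; the arrays play the role of the trainable tables.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_size = 4

# Hypothetical node counts per node type.
num_nodes = {"paper": 3, "author": 2}

# One table per node type, initialized uniformly in [-1, 1].
tables = {ntype: rng.uniform(-1.0, 1.0, size=(n, embed_size))
          for ntype, n in num_nodes.items()}

# Looking up row i equals multiplying a one-hot vector by the table.
node_ids = np.array([2, 0])
lookup = tables["paper"][node_ids]
one_hot = np.eye(num_nodes["paper"])[node_ids]
via_matmul = one_hot @ tables["paper"]

# The "features" fed to the first layer: every node's own row, per node type.
embeds = {ntype: table[np.arange(len(table))] for ntype, table in tables.items()}
```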

In [5]:
class NodeEmbed(nn.Module):
    def __init__(self, num_nodes, embed_size, device):
        super(NodeEmbed, self).__init__()
        self.embed_size = embed_size
        self.node_embeds = nn.ModuleDict()
        self.device = device
        self.num_nodes = num_nodes
        # One embedding table per node type
        for ntype in num_nodes:
            node_embed = torch.nn.Embedding(num_nodes[ntype], self.embed_size)
            nn.init.uniform_(node_embed.weight, -1.0, 1.0)
            self.node_embeds[str(ntype)] = node_embed

    def forward(self):
        # Return the full embedding table of every node type
        embeds = {}
        for ntype, n in self.num_nodes.items():
            node_ids = torch.arange(n, device=self.device)
            embeds[ntype] = self.node_embeds[str(ntype)](node_ids)
        return embeds
    

In [6]:
num_nodes = {ntype: g.number_of_nodes(ntype) for ntype in g.ntypes}

h_hidden = 16
embed = NodeEmbed(num_nodes, h_hidden, device).to(device)
model = RGCN(h_hidden, h_hidden, num_classes, g.etypes).to(device)


In [7]:
# ----------- 3. set up optimizer -------------- #

optimizer = torch.optim.Adam(itertools.chain(model.parameters(), embed.parameters()), lr=0.01)

# ----------- 4. training -------------------------------- #
all_logits = []
for e in range(50):
    # forward
    embeds = embed()
    logits = model(g, embeds)[category]
    
    # compute loss
    loss = F.cross_entropy(logits[train_idx], labels[train_idx])
    
    # backward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    all_logits.append(logits.detach())
    
    if e % 5 == 0:
        train_acc = torch.sum(logits[train_idx].argmax(dim=1) == labels[train_idx]).item() / len(train_idx)
        val_loss = F.cross_entropy(logits[val_idx], labels[val_idx])
        val_acc = torch.sum(logits[val_idx].argmax(dim=1) == labels[val_idx]).item() / len(val_idx)
        print("Epoch {:05d} | Train Acc: {:.4f} | Train Loss: {:.4f} | Valid Acc: {:.4f} | Valid loss: {:.4f}".
              format(e, train_acc, loss.item(), val_acc, val_loss.item()))

Epoch 00000 | Train Acc: 0.1161 | Train Loss: 4.6996 | Valid Acc: 0.1429 | Valid loss: 2.8667
Epoch 00005 | Train Acc: 0.8929 | Train Loss: 0.4026 | Valid Acc: 0.5000 | Valid loss: 0.9255
Epoch 00010 | Train Acc: 0.9375 | Train Loss: 0.2100 | Valid Acc: 0.6786 | Valid loss: 0.7023
Epoch 00015 | Train Acc: 0.9643 | Train Loss: 0.1209 | Valid Acc: 0.7500 | Valid loss: 0.5429
Epoch 00020 | Train Acc: 0.9643 | Train Loss: 0.0961 | Valid Acc: 0.7143 | Valid loss: 0.4996
Epoch 00025 | Train Acc: 0.9821 | Train Loss: 0.0744 | Valid Acc: 0.7500 | Valid loss: 0.4819
Epoch 00030 | Train Acc: 0.9821 | Train Loss: 0.0535 | Valid Acc: 0.7857 | Valid loss: 0.4839
Epoch 00035 | Train Acc: 0.9821 | Train Loss: 0.0390 | Valid Acc: 0.7857 | Valid loss: 0.5103
Epoch 00040 | Train Acc: 0.9911 | Train Loss: 0.0293 | Valid Acc: 0.7857 | Valid loss: 0.5434
Epoch 00045 | Train Acc: 0.9911 | Train Loss: 0.0211 | Valid Acc: 0.7500 | Valid loss: 0.5718
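
In this log the training loss keeps falling while the validation loss bottoms out and then rises, which is the usual sign of overfitting. The sketch below reads that off the numbers: it picks the logged epoch with the lowest validation loss and computes the gap between validation and training loss. The tuples are copied from the log above; the variable names are illustrative.

```python
# (epoch, train_loss, valid_loss) copied from the log above.
log = [
    (0, 4.6996, 2.8667), (5, 0.4026, 0.9255), (10, 0.2100, 0.7023),
    (15, 0.1209, 0.5429), (20, 0.0961, 0.4996), (25, 0.0744, 0.4819),
    (30, 0.0535, 0.4839), (35, 0.0390, 0.5103), (40, 0.0293, 0.5434),
    (45, 0.0211, 0.5718),
]

# Logged epoch with the lowest validation loss.
best_epoch, _, best_val = min(log, key=lambda row: row[2])

# Gap between validation and training loss at each logged epoch.
gaps = {epoch: round(val - train, 4) for epoch, train, val in log}

print("best epoch:", best_epoch, "valid loss:", best_val)
```

A common response is to keep the parameters from the best validation epoch, or to stop training once the validation loss has not improved for a few checks.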


In [8]:
# ----------- 5. check results ------------------------ #
model.eval()
embed.eval()
# Gradients are not needed for evaluation
with torch.no_grad():
    embeds = embed()
    logits = model(g, embeds)[category]
    test_loss = F.cross_entropy(logits[test_idx], labels[test_idx])
    test_acc = torch.sum(logits[test_idx].argmax(dim=1) == labels[test_idx]).item() / len(test_idx)
print("Test Acc: {:.4f} | Test loss: {:.4f}".format(test_acc, test_loss.item()))
print()

Test Acc: 0.8889 | Test loss: 0.4004

