# Stochastic Training of GNN for Node Classification on Large Heterogeneous Graphs

*Note: this tutorial requires a GPU-enabled machine.*

This tutorial shows how to train a multi-layer R-GCN for node classification on the `ogbn-mag` dataset provided by OGB.

The `ogbn-mag` dataset is a heterogeneous network composed of a subset of the Microsoft Academic Graph (MAG). It has 1.9M nodes and 21M edges.

It contains four types of entities (papers, authors, institutions, and fields of study) and four types of directed relations: author “affiliated with” institution, author “writes” paper, paper “cites” paper, and paper “has topic” field of study.

At the end of this tutorial you will be able to:

* Create a DGL graph using the OGB data loader for DGL.
* Train a GNN model on a large heterogeneous graph on a single machine using a single GPU.

## Load Dataset

We load the dataset with the Python package provided by OGB, which downloads the data on first use and returns it as a DGL graph together with the node labels.

In [1]:
!pip install ogb -qq

In [2]:
from ogb.nodeproppred import DglNodePropPredDataset

dataset = DglNodePropPredDataset(name='ogbn-mag')

Using backend: pytorch


The dataset contains the following:

* DGL graph object
* The node label tensor

We can also use the dataset's utility function to get the train, validation, and test splits.

In [3]:
import dgl

graph, label = dataset[0]  # graph: DGL graph object, label: dict mapping node type to a label tensor of shape (num_nodes, 1)


split_idx = dataset.get_idx_split()
train_nids, valid_nids, test_nids = split_idx["train"], split_idx["valid"], split_idx["test"]

Since the graph is heterogeneous, `train_nids` is a dictionary with the node type as key and a tensor of node IDs as value. Here it has a single key, `paper`.

In [4]:
train_nids

{'paper': tensor([     0,      1,      2,  ..., 736386, 736387, 736388])}

We can see the size of the graph, features, and labels as follows.

In [5]:
print(graph)

print('Node labels')
node_labels = label['paper'].flatten()
print('Shape of target node labels:', node_labels.shape)
num_classes = (node_labels.max() + 1).item()
print('Number of classes:', num_classes)

print('Node features')
node_features = graph.nodes['paper'].data['feat']
num_features = node_features.shape[1]
print('Shape of features of paper node type: {}'.format(num_features))

Graph(num_nodes={'author': 1134649, 'field_of_study': 59965, 'institution': 8740, 'paper': 736389},
      num_edges={('author', 'affiliated_with', 'institution'): 1043998, ('author', 'writes', 'paper'): 7145660, ('paper', 'cites', 'paper'): 5416271, ('paper', 'has_topic', 'field_of_study'): 7505078},
      metagraph=[('author', 'institution', 'affiliated_with'), ('author', 'paper', 'writes'), ('paper', 'paper', 'cites'), ('paper', 'field_of_study', 'has_topic')])
Node labels
Shape of target node labels: torch.Size([736389])
Number of classes: 349
Node features
Shape of features of paper node type: 128


### Add reverse edges

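Each relation is stored in one direction only. So that messages can flow both ways, we rebuild the graph and add a reverse relation for `writes`, `has_topic`, and `affiliated_with`. Note that the `cites` relation is not carried over into the new graph.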

In [6]:
src_writes, dst_writes = graph.all_edges(etype="writes")
src_topic, dst_topic = graph.all_edges(etype="has_topic")
src_aff, dst_aff = graph.all_edges(etype="affiliated_with")


graph = dgl.heterograph({
    ("author", "writes", "paper"): (src_writes, dst_writes),
    ("paper", "has_topic", "field_of_study"): (src_topic, dst_topic),
    ("author", "affiliated_with", "institution"): (src_aff, dst_aff),
    ("paper", "writes-rev", "author"): (dst_writes, src_writes),
    ("field_of_study", "has_topic-rev", "paper"): (dst_topic, src_topic),
    ("institution", "affiliated_with-rev", "author"): (dst_aff, src_aff),
})

<div class="alert alert-info">
    <b>Note:</b> A DGL heterograph is immutable. To add new edges you have to create a new graph.
</div>
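To make the reverse-edge construction concrete, here is a minimal framework-free sketch. It does not use DGL; the helper `add_reverse_relations` and the toy edge lists are hypothetical and only mirror the dictionary layout passed to `dgl.heterograph` above.

```python
# Relations are stored as {(src_type, name, dst_type): (src_ids, dst_ids)},
# the same layout used in the cell above (plain lists instead of tensors).
def add_reverse_relations(edges, suffix="-rev"):
    """Return a new edge dict that also holds a reversed copy of every relation."""
    result = dict(edges)
    for (src_type, name, dst_type), (src, dst) in edges.items():
        # The reverse relation swaps the endpoint types and the endpoint ID lists.
        result[(dst_type, name + suffix, src_type)] = (list(dst), list(src))
    return result


# Hypothetical toy data: two authors, two papers, one field of study.
toy_edges = {
    ("author", "writes", "paper"): ([0, 0, 1], [0, 1, 1]),
    ("paper", "has_topic", "field_of_study"): ([0, 1], [2, 2]),
}

bidirectional = add_reverse_relations(toy_edges)
for relation, (src, dst) in sorted(bidirectional.items()):
    print(relation, src, dst)
```

The original dictionary is left unchanged and a new one is returned, which matches the note above that a new graph has to be created to add edges.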

### Defining neighbor sampler and data loader in DGL

To train a 2-layer R-GCN with neighbor sampling, where each node gathers messages from up to 15 sampled neighbors on each layer, the neighbor sampler and data loader can be defined as follows.

In [7]:
import dgl

sampler = dgl.dataloading.MultiLayerNeighborSampler([15, 15])
train_dataloader = dgl.dataloading.NodeDataLoader(
    graph, train_nids, sampler,
    batch_size=1024,
    shuffle=True,
    drop_last=False,
    num_workers=0
)
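The idea behind multi-layer neighbor sampling can be sketched without DGL. The code below is an illustration on a homogeneous toy graph, not DGL's implementation: the function `sample_blocks`, the adjacency dictionary, and the fan-out of 2 are all assumptions made for the example. Sampling starts from the seed (output) nodes and works backwards, one layer at a time.

```python
import random


def sample_blocks(in_neighbors, seeds, fanouts, rng):
    """Sample one edge list ("block") per layer, working backwards from the seeds."""
    blocks = []
    frontier = list(seeds)
    for fanout in reversed(fanouts):
        edges = []
        for dst in frontier:
            candidates = in_neighbors.get(dst, [])
            # Keep all neighbors when there are at most `fanout` of them.
            if len(candidates) <= fanout:
                chosen = candidates
            else:
                chosen = rng.sample(candidates, fanout)
            edges.extend((src, dst) for src in chosen)
        # Source nodes of a block: its destination nodes first, then new neighbors.
        src_nodes = list(frontier)
        seen = set(frontier)
        for src, _ in edges:
            if src not in seen:
                seen.add(src)
                src_nodes.append(src)
        blocks.insert(0, {"src": src_nodes, "dst": list(frontier), "edges": edges})
        frontier = src_nodes
    # The final frontier holds every node whose input features are needed.
    return frontier, list(seeds), blocks


# Hypothetical toy graph: node -> list of in-neighbors.
toy_in_neighbors = {0: [1, 2, 3], 1: [4], 2: [4, 5], 3: [], 4: [0], 5: []}
input_nodes, output_nodes, blocks = sample_blocks(
    toy_in_neighbors, seeds=[0], fanouts=[2, 2], rng=random.Random(0))

print("input nodes:", input_nodes)
print("output nodes:", output_nodes)
for block in blocks:
    print(block)
```

Each returned block lists its destination nodes first among its source nodes, so the destination nodes of one layer are exactly the source nodes of the next.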

We can iterate over the data loader we created and see what it gives us.

In [8]:
example_minibatch = next(iter(train_dataloader))
print(example_minibatch)

({'author': tensor([ 82634, 325839, 449850,  ..., 289852, 245058,  71282]), 'field_of_study': tensor([  343,  4376,  5159,  ...,  9888, 19703, 15510]), 'institution': tensor([ 656, 2017, 4230,  ..., 5547, 8387, 8553]), 'paper': tensor([712657, 447547, 269429,  ..., 631677, 680909, 692204])}, {'author': tensor([], dtype=torch.int64), 'field_of_study': tensor([], dtype=torch.int64), 'institution': tensor([], dtype=torch.int64), 'paper': tensor([712657, 447547, 269429,  ..., 398093, 590450, 562032])}, [Block(num_src_nodes={'author': 4814, 'field_of_study': 3532, 'institution': 1319, 'paper': 75361},
      num_dst_nodes={'author': 4657, 'field_of_study': 3532, 'institution': 0, 'paper': 1024},
      num_edges={('author', 'affiliated_with', 'institution'): 0, ('author', 'writes', 'paper'): 4724, ('field_of_study', 'has_topic-rev', 'paper'): 10618, ('institution', 'affiliated_with-rev', 'author'): 6972, ('paper', 'has_topic', 'field_of_study'): 51095, ('paper', 'writes-rev', 'author'): 40835

The printed output is dense, but as in the homogeneous case, `NodeDataLoader` yields three items per iteration: the input nodes, the output nodes, and the computation dependency for each layer. For a heterogeneous graph, the input and output nodes are dictionaries keyed by node type.

In [9]:
input_nodes, output_nodes, bipartites = example_minibatch
print("To compute {} target nodes' output we need {} nodes' input features".format(len(output_nodes['paper']), len(input_nodes['paper'])))

print("")
print("Output nodes")
print(output_nodes)

print("")
print("Input nodes")
print(input_nodes)

To compute 1024 target nodes' output we need 75361 nodes' input features

Output nodes
{'author': tensor([], dtype=torch.int64), 'field_of_study': tensor([], dtype=torch.int64), 'institution': tensor([], dtype=torch.int64), 'paper': tensor([712657, 447547, 269429,  ..., 398093, 590450, 562032])}

Input nodes
{'author': tensor([ 82634, 325839, 449850,  ..., 289852, 245058,  71282]), 'field_of_study': tensor([  343,  4376,  5159,  ...,  9888, 19703, 15510]), 'institution': tensor([ 656, 2017, 4230,  ..., 5547, 8387, 8553]), 'paper': tensor([712657, 447547, 269429,  ..., 631677, 680909, 692204])}


In [10]:
for block in bipartites:
    print(block)
    print()

Block(num_src_nodes={'author': 4814, 'field_of_study': 3532, 'institution': 1319, 'paper': 75361},
      num_dst_nodes={'author': 4657, 'field_of_study': 3532, 'institution': 0, 'paper': 1024},
      num_edges={('author', 'affiliated_with', 'institution'): 0, ('author', 'writes', 'paper'): 4724, ('field_of_study', 'has_topic-rev', 'paper'): 10618, ('institution', 'affiliated_with-rev', 'author'): 6972, ('paper', 'has_topic', 'field_of_study'): 51095, ('paper', 'writes-rev', 'author'): 40835},
      metagraph=[('author', 'institution', 'affiliated_with'), ('author', 'paper', 'writes'), ('institution', 'author', 'affiliated_with-rev'), ('paper', 'field_of_study', 'has_topic'), ('paper', 'author', 'writes-rev'), ('field_of_study', 'paper', 'has_topic-rev')])

Block(num_src_nodes={'author': 4657, 'field_of_study': 3532, 'institution': 0, 'paper': 1024},
      num_dst_nodes={'author': 0, 'field_of_study': 0, 'institution': 0, 'paper': 1024},
      num_edges={('author', 'affiliated_with', 'i
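A quick way to read the two blocks is to check how they chain together. The sketch below copies the node counts from the printed output into plain dictionaries (the variable names are made up for this example) and checks that the destination nodes of the first block are the source nodes of the second, and that only `paper` nodes remain as final outputs.

```python
# Node counts copied from the printed blocks in the output of the cell above.
block0_src = {"author": 4814, "field_of_study": 3532, "institution": 1319, "paper": 75361}
block0_dst = {"author": 4657, "field_of_study": 3532, "institution": 0, "paper": 1024}
block1_src = {"author": 4657, "field_of_study": 3532, "institution": 0, "paper": 1024}
block1_dst = {"author": 0, "field_of_study": 0, "institution": 0, "paper": 1024}

batch_size = 1024

# The output of layer 1 is the input of layer 2, so the counts must agree.
blocks_chain = block0_dst == block1_src

# Only the seed papers of the mini-batch are left as final output nodes.
only_papers_out = all(
    count == 0 for ntype, count in block1_dst.items() if ntype != "paper")

print("blocks chain correctly:", blocks_chain)
print("final outputs are papers only:", only_papers_out)
print("final output size equals batch size:", block1_dst["paper"] == batch_size)
```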

## Defining Model

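The R-GCN model can be written as follows. Each layer is a `HeteroGraphConv` that applies one `GraphConv` per relation and sums the per-relation results for each destination node type.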

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl.nn as dglnn

In [11]:
class RGCN(nn.Module):
    def __init__(self, in_feats, n_hidden, n_classes, n_layers, rel_names):
        super().__init__()
        
        self.n_layers = n_layers
        self.n_hidden = n_hidden
        self.n_classes = n_classes
        self.layers = nn.ModuleList()
        
        self.layers.append(dglnn.HeteroGraphConv({
            rel: dglnn.GraphConv(in_feats, n_hidden)
            for rel in rel_names}, aggregate='sum'))
        
        for i in range(1, n_layers - 1):
            self.layers.append(dglnn.HeteroGraphConv({
                rel: dglnn.GraphConv(n_hidden, n_hidden)
                for rel in rel_names}, aggregate='sum'))
            
        self.layers.append(dglnn.HeteroGraphConv({
            rel: dglnn.GraphConv(n_hidden, n_classes)
            for rel in rel_names}, aggregate='sum'))

    def forward(self, bipartites, x):
        # inputs are features of nodes
        for l, (layer, bipartite) in enumerate(zip(self.layers, bipartites)):
            x = layer(bipartite, x)
            if l != self.n_layers - 1:
                x = {k: F.relu(v) for k, v in x.items()}
        return x
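The per-relation structure of a layer can be illustrated with a small framework-free sketch. This is a simplification under stated assumptions: features are scalars, each relation has a single scalar weight, and messages are summed without the degree normalization that `GraphConv` applies. The function `hetero_sum_conv` and the toy data are hypothetical.

```python
def hetero_sum_conv(edges, features, weights):
    """Apply one weight per relation, then sum the results per destination node.

    edges:    {(src_type, relation, dst_type): [(src_id, dst_id), ...]}
    features: {node_type: {node_id: float}}
    weights:  {relation: float}
    """
    out = {}
    for (src_type, relation, dst_type), pairs in edges.items():
        per_dst = out.setdefault(dst_type, {})
        for src, dst in pairs:
            message = weights[relation] * features[src_type][src]
            # Messages from all relations accumulate in the same slot ('sum').
            per_dst[dst] = per_dst.get(dst, 0.0) + message
    return out


# Hypothetical toy data: paper 0 is written by two authors and has one topic.
toy_edges = {
    ("author", "writes", "paper"): [(0, 0), (1, 0)],
    ("field_of_study", "has_topic-rev", "paper"): [(0, 0)],
}
toy_features = {"author": {0: 1.0, 1: 2.0}, "field_of_study": {0: 10.0}}
toy_weights = {"writes": 0.5, "has_topic-rev": 0.1}

result = hetero_sum_conv(toy_edges, toy_features, toy_weights)
print(result)
```

In this sketch, node types that receive no messages do not appear in the output dictionary at all.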

### What to do about featureless nodes

Message passing needs an initial representation for every node, but in this dataset only `paper` nodes come with input features.

For the other node types, we learn the initial representations directly with an embedding layer.

Here is how to do that:

In [12]:
class NodeEmbed(nn.Module):
    """Learnable initial representations for node types without input features."""

    def __init__(self, num_nodes, embed_size):
        super().__init__()
        self.embed_size = embed_size
        # One embedding table per node type, keyed by the node type name.
        self.node_embeds = nn.ModuleDict()
        for ntype in num_nodes:
            node_embed = torch.nn.Embedding(num_nodes[ntype], self.embed_size)
            nn.init.uniform_(node_embed.weight, -1.0, 1.0)
            self.node_embeds[str(ntype)] = node_embed

    def forward(self, node_ids):
        # node_ids maps each node type to a tensor of node IDs to look up.
        embeds = {}
        for ntype in node_ids:
            embeds[ntype] = self.node_embeds[ntype](node_ids[ntype])
        return embeds

### Initialize model and optimizer

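The following initializes the embedding layer and the model and defines the optimizer. The model is moved to the GPU, while `embed` stays on the CPU. The optimizer updates the parameters of both.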

In [13]:
num_nodes = {ntype: graph.number_of_nodes(ntype) for ntype in graph.ntypes if ntype != 'paper'}
num_layers = 2
hidden_dim = 128
embed = NodeEmbed(num_nodes, hidden_dim)
model = RGCN(num_features, hidden_dim, num_classes, num_layers, graph.etypes).cuda()
opt = torch.optim.Adam(list(model.parameters()) + list(embed.parameters()))

In [14]:
embed

NodeEmbed(
  (node_embeds): ModuleDict(
    (author): Embedding(1134649, 128)
    (field_of_study): Embedding(59965, 128)
    (institution): Embedding(8740, 128)
  )
)
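It is worth estimating how large these embedding tables are. The short calculation below uses the table sizes printed above and assumes 4-byte `float32` parameters, which is the PyTorch default. The variable names are chosen for this example only.

```python
# Table sizes taken from the printed NodeEmbed module above.
num_nodes = {"author": 1134649, "field_of_study": 59965, "institution": 8740}
embed_size = 128
bytes_per_float32 = 4  # assumption: default float32 parameters

params_per_type = {ntype: n * embed_size for ntype, n in num_nodes.items()}
total_params = sum(params_per_type.values())
total_mib = total_params * bytes_per_float32 / 2**20

for ntype, count in params_per_type.items():
    print("{:>15}: {:,} parameters".format(ntype, count))
print("total: {:,} parameters, about {:.0f} MiB".format(total_params, total_mib))
```

The `author` table dominates the total. Adam also keeps two moment estimates per parameter, so the optimizer state for these tables needs roughly twice that amount again.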

In [None]:
model

RGCN(
  (layers): ModuleList(
    (0): HeteroGraphConv(
      (mods): ModuleDict(
        (affiliated_with): GraphConv(in=128, out=128, normalization=both, activation=None)
        (affiliated_with-rev): GraphConv(in=128, out=128, normalization=both, activation=None)
        (has_topic): GraphConv(in=128, out=128, normalization=both, activation=None)
        (has_topic-rev): GraphConv(in=128, out=128, normalization=both, activation=None)
        (writes): GraphConv(in=128, out=128, normalization=both, activation=None)
        (writes-rev): GraphConv(in=128, out=128, normalization=both, activation=None)
      )
    )
    (1): HeteroGraphConv(
      (mods): ModuleDict(
        (affiliated_with): GraphConv(in=128, out=349, normalization=both, activation=None)
        (affiliated_with-rev): GraphConv(in=128, out=349, normalization=both, activation=None)
        (has_topic): GraphConv(in=128, out=349, normalization=both, activation=None)
        (has_topic-rev): GraphConv(in=128, out=349, n

## Defining Training Loop

Validation for model selection can also use neighbor sampling. To do that, define a second data loader over the validation nodes.

In [16]:
valid_dataloader = dgl.dataloading.NodeDataLoader(
    graph, valid_nids, sampler,
    batch_size=1024,
    shuffle=False,
    drop_last=False,
    num_workers=0
)

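The following training loop runs validation after every epoch and saves the model with the best validation accuracy to a file. Note that only the weights of `model` are saved. The learned node embeddings in `embed` are not checkpointed, so later cells use their values from the last epoch.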

In [17]:
import tqdm
import numpy as np
import sklearn.metrics

best_accuracy = 0
best_model_path = 'model.pt'
for epoch in range(10):
    model.train()
    
    with tqdm.tqdm(train_dataloader) as tq:
        for step, (input_nodes, output_nodes, bipartites) in enumerate(tq):
            bipartites = [b.to(torch.device('cuda')) for b in bipartites]
            
            # Get featureless input nodes and use the node embeddings as their initial representation 
            featureless_nodes = {ntype: node_ids for ntype, node_ids in input_nodes.items() if ntype != 'paper'}
            embeddings = {ntype: node_embedding.cuda() for ntype, node_embedding in embed(featureless_nodes).items()}
            
            # Get input features for node type 'paper' which has input features
            inputs = {'paper': node_features[input_nodes['paper']].cuda()}
            
            inputs.update(embeddings)  # Merge the learned embeddings with the paper features
            
            labels = node_labels[output_nodes['paper']].cuda()
            predictions = model(bipartites, inputs)['paper']

            loss = F.cross_entropy(predictions, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()

            accuracy = sklearn.metrics.accuracy_score(labels.cpu().numpy(), predictions.argmax(1).detach().cpu().numpy())
            
            tq.set_postfix({'loss': '%.03f' % loss.item(), 'acc': '%.03f' % accuracy}, refresh=False)
        
    model.eval()
    
    predictions = []
    labels = []
    with tqdm.tqdm(valid_dataloader) as tq, torch.no_grad():
        for input_nodes, output_nodes, bipartites in tq:
            bipartites = [b.to(torch.device('cuda')) for b in bipartites]
            
            featureless_nodes = {ntype: node_ids for ntype, node_ids in input_nodes.items() if ntype != "paper"}
            embeddings = {ntype: node_embedding.cuda() for ntype, node_embedding in embed(featureless_nodes).items()}
            inputs = {'paper': node_features[input_nodes['paper']].cuda()}
            inputs.update(embeddings)
            
            labels.append(node_labels[output_nodes['paper']].numpy())
            predictions.append(model(bipartites, inputs)['paper'].argmax(1).cpu().numpy())
        predictions = np.concatenate(predictions)
        labels = np.concatenate(labels)
        accuracy = sklearn.metrics.accuracy_score(labels, predictions)
        print('Epoch {} Validation Accuracy {}'.format(epoch, accuracy))
        if best_accuracy < accuracy:
            best_accuracy = accuracy
            torch.save(model.state_dict(), best_model_path)

100%|██████████| 615/615 [00:51<00:00, 12.01it/s, loss=2.615, acc=0.327]
100%|██████████| 64/64 [00:04<00:00, 13.61it/s]
  0%|          | 1/615 [00:00<01:18,  7.86it/s, loss=2.494, acc=0.372]

Epoch 0 Validation Accuracy 0.3171442223215524


100%|██████████| 615/615 [00:52<00:00, 11.64it/s, loss=2.510, acc=0.358]
100%|██████████| 64/64 [00:04<00:00, 13.78it/s]
  0%|          | 1/615 [00:00<01:18,  7.82it/s, loss=2.445, acc=0.356]

Epoch 1 Validation Accuracy 0.3248662895544013


100%|██████████| 615/615 [00:50<00:00, 12.20it/s, loss=2.445, acc=0.382]
100%|██████████| 64/64 [00:04<00:00, 14.03it/s]
  0%|          | 1/615 [00:00<01:16,  7.99it/s, loss=2.365, acc=0.377]

Epoch 2 Validation Accuracy 0.32774857812235086


100%|██████████| 615/615 [00:50<00:00, 12.11it/s, loss=2.392, acc=0.375]
100%|██████████| 64/64 [00:04<00:00, 14.44it/s]
  0%|          | 1/615 [00:00<01:18,  7.83it/s, loss=2.316, acc=0.388]

Epoch 3 Validation Accuracy 0.3417746882658487


100%|██████████| 615/615 [00:51<00:00, 12.06it/s, loss=2.259, acc=0.381]
100%|██████████| 64/64 [00:04<00:00, 14.12it/s]
  0%|          | 1/615 [00:00<01:17,  7.96it/s, loss=2.356, acc=0.376]

Epoch 4 Validation Accuracy 0.3340063811094499


100%|██████████| 615/615 [00:50<00:00, 12.17it/s, loss=2.218, acc=0.394]
100%|██████████| 64/64 [00:04<00:00, 14.39it/s]
  0%|          | 1/615 [00:00<01:16,  7.98it/s, loss=2.188, acc=0.404]

Epoch 5 Validation Accuracy 0.3491114228024476


100%|██████████| 615/615 [00:50<00:00, 12.20it/s, loss=2.153, acc=0.390]
100%|██████████| 64/64 [00:04<00:00, 14.48it/s]
  0%|          | 1/615 [00:00<01:17,  7.97it/s, loss=2.273, acc=0.399]

Epoch 6 Validation Accuracy 0.36047103068789593


100%|██████████| 615/615 [00:50<00:00, 12.14it/s, loss=2.197, acc=0.402]
100%|██████████| 64/64 [00:04<00:00, 14.57it/s]
  0%|          | 1/615 [00:00<01:16,  8.04it/s, loss=2.155, acc=0.392]

Epoch 7 Validation Accuracy 0.35903759305784616


100%|██████████| 615/615 [00:50<00:00, 12.19it/s, loss=2.204, acc=0.420]
100%|██████████| 64/64 [00:04<00:00, 14.44it/s]
  0%|          | 1/615 [00:00<01:16,  8.01it/s, loss=2.148, acc=0.421]

Epoch 8 Validation Accuracy 0.3687942169268947


100%|██████████| 615/615 [00:50<00:00, 12.21it/s, loss=2.209, acc=0.389]
100%|██████████| 64/64 [00:04<00:00, 14.49it/s]

Epoch 9 Validation Accuracy 0.3527489634550471
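The checkpointing rule in the loop (`if best_accuracy < accuracy`) can be replayed on the validation accuracies logged above to see which epochs wrote `model.pt`. This sketch only re-runs the comparison on the printed numbers and does not touch the model. The list name `saved_at` is made up for the example.

```python
# Validation accuracies copied from the training log above, epochs 0 to 9.
valid_accuracy = [
    0.3171442223215524, 0.3248662895544013, 0.32774857812235086,
    0.3417746882658487, 0.3340063811094499, 0.3491114228024476,
    0.36047103068789593, 0.35903759305784616, 0.3687942169268947,
    0.3527489634550471,
]

best_accuracy = 0
saved_at = []  # epochs at which the checkpoint would have been overwritten
for epoch, accuracy in enumerate(valid_accuracy):
    if best_accuracy < accuracy:
        best_accuracy = accuracy
        saved_at.append(epoch)

print("checkpoint written at epochs:", saved_at)
print("best validation accuracy:", best_accuracy)
```

The last entry of `saved_at` is the epoch whose weights end up in the file, which here is not the final epoch.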





## Offline Inference without Neighbor Sampling

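Once training is done, we compute the representations of all nodes without neighbor sampling. The function below, reused from the previous tutorial, works one layer at a time: for each layer it iterates over all nodes in mini-batches using their full neighborhoods and writes the layer output into a CPU buffer, which then becomes the input of the next layer.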

In [18]:
def inference(model, graph, input_features, batch_size):
    nodes = {ntype: torch.arange(graph.number_of_nodes(ntype)) for ntype in graph.ntypes}
    
    sampler = dgl.dataloading.MultiLayerNeighborSampler([None])  # one layer at a time, taking all neighbors
    dataloader = dgl.dataloading.NodeDataLoader(
        graph, nodes, sampler,
        batch_size=batch_size,
        shuffle=False,
        drop_last=False,
        num_workers=0)
    
    with torch.no_grad():
        for l, layer in enumerate(model.layers):
            # Allocate a buffer of output representations for every node
            # Note that the buffer is on CPU memory.
            output_features = {ntype: torch.zeros(
                graph.number_of_nodes(ntype), model.n_hidden if l != model.n_layers - 1 else model.n_classes)
                for ntype in graph.ntypes}

            for input_nodes, output_nodes, bipartites in tqdm.tqdm(dataloader):
                bipartite = bipartites[0].to(torch.device('cuda'))

                # send features for nodes in batch to gpu 
                x = {ntype: input_features[ntype][input_nodes[ntype]].cuda() for ntype in input_nodes}

                # the following code is identical to the loop body in model.forward()
                x = layer(bipartite, x)
                if l != model.n_layers - 1:
                    x = {k: F.relu(v) for k, v in x.items()}
                
                for ntype in x:
                    output_features[ntype][output_nodes[ntype]] = x[ntype].cpu()
            input_features = output_features
    return output_features
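The layer-by-layer strategy can be illustrated on a tiny homogeneous graph. The sketch below is an assumption-laden simplification: it uses scalar features and plain mean aggregation instead of the trained R-GCN layers, and the function `layerwise_inference` and the toy graph are hypothetical. The point it shows is that every node's layer-1 output is computed once for the whole graph before layer 2 starts, and that the mini-batch size does not change the result.

```python
def layerwise_inference(in_neighbors, features, num_layers, batch_size):
    """Compute mean-aggregated representations for all nodes, one layer at a time."""
    nodes = sorted(in_neighbors)
    current = dict(features)
    for _ in range(num_layers):
        output = {}  # buffer holding this layer's output for every node
        for start in range(0, len(nodes), batch_size):
            for node in nodes[start:start + batch_size]:
                neighbors = in_neighbors[node]  # full neighborhood, no sampling
                if neighbors:
                    output[node] = sum(current[n] for n in neighbors) / len(neighbors)
                else:
                    output[node] = 0.0
        # The finished buffer becomes the input of the next layer.
        current = output
    return current


# Hypothetical toy graph: node -> list of in-neighbors, with scalar features.
toy_in_neighbors = {0: [1, 2], 1: [0], 2: [0, 1]}
toy_features = {0: 1.0, 1: 2.0, 2: 3.0}

small_batches = layerwise_inference(toy_in_neighbors, toy_features, num_layers=2, batch_size=1)
one_batch = layerwise_inference(toy_in_neighbors, toy_features, num_layers=2, batch_size=3)
print(small_batches)
print(small_batches == one_batch)
```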

The following code loads the best model from the file saved previously and performs offline inference.  It computes the accuracy on the test set afterwards.

In [19]:
model.load_state_dict(torch.load(best_model_path))

featureless_nodes = {ntype: torch.arange(num_nodes_ntype) for ntype, num_nodes_ntype in num_nodes.items()}
embeddings = {ntype: node_embedding for ntype, node_embedding in embed(featureless_nodes).items()}
inputs = {'paper': node_features}
inputs.update(embeddings)

all_predictions = inference(model, graph, inputs, 8192)

100%|██████████| 237/237 [00:21<00:00, 11.15it/s]
100%|██████████| 237/237 [00:21<00:00, 10.97it/s]
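The two progress bars correspond to the two layers of the model, and the count of 237 batches per pass follows from the graph size. The check below uses the node counts printed when the dataset was loaded, assuming the rebuilt graph kept the same counts, together with the batch size of 8192 passed to `inference`.

```python
import math

# Node counts from the printed graph summary earlier in this tutorial.
num_nodes = {"author": 1134649, "field_of_study": 59965, "institution": 8740, "paper": 736389}
batch_size = 8192

total_nodes = sum(num_nodes.values())
# drop_last=False keeps the final, smaller batch, hence the ceiling.
batches_per_layer = math.ceil(total_nodes / batch_size)

print("total nodes:", total_nodes)
print("batches per layer:", batches_per_layer)
```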


In [20]:
test_predictions = all_predictions['paper'][test_nids['paper']].argmax(1)
test_labels = node_labels[test_nids['paper']]
test_accuracy = sklearn.metrics.accuracy_score(test_labels.numpy(), test_predictions.numpy())
print('Test accuracy:', test_accuracy)

Test accuracy: 0.32940699587496125


## Conclusion

In this tutorial, you have learned how to train a multi-layer R-GCN with neighbor sampling on a large heterogeneous dataset. The method used here works on a single machine with a single GPU.

## What's next?

The next tutorial will be about scaling the training procedure out to multiple GPUs on a single machine.