In [1]:
%matplotlib inline



Relational Graph Convolutional Network Tutorial
================================================

**Author:** Lingfan Yu, Mufei Li, Zheng Zhang

The vanilla Graph Convolutional Network (GCN)
(`paper <https://arxiv.org/pdf/1609.02907.pdf>`_,
`DGL tutorial <http://doc.dgl.ai/tutorials/index.html>`_) exploits the
structural information of a dataset (that is, the graph connectivity) to
improve the extraction of node representations. Graph edges are left
untyped.

A knowledge graph is made up of a collection of triples of the form
(subject, relation, object). Edges thus encode important information and
have their own embeddings to be learned. Furthermore, there may be
multiple edges between any given pair of entities.
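
As a minimal sketch of this data model, the snippet below stores a toy knowledge graph as a list of triples and groups the relations by entity pair to expose multi-edges. The entity and relation names are hypothetical and are not taken from any dataset used in this tutorial.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph: (subject, relation, object) triples.
triples = [
    ("alice", "works_at", "acme"),
    ("alice", "founded", "acme"),   # second edge between the same pair
    ("bob", "works_at", "acme"),
]

# Group relations by (subject, object) pair to expose multi-edges.
edges_between = defaultdict(list)
for subj, rel, obj in triples:
    edges_between[(subj, obj)].append(rel)

print(edges_between[("alice", "acme")])  # ['works_at', 'founded']
```

Because the same pair of entities can be connected under several relations, a model that ignores the edge type would treat both edges above as identical.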

Relational-GCN (R-GCN), a model from the paper
`Modeling Relational Data with Graph Convolutional
Networks <https://arxiv.org/pdf/1703.06103.pdf>`_, is one effort to
generalize GCN to handle different relations between entities in a
knowledge base. This tutorial shows how to implement R-GCN with DGL.



R-GCN: a brief introduction
---------------------------
In *statistical relational learning* (SRL), there are two fundamental
tasks:

- **Entity classification**, i.e., assign types and categorical
  properties to entities.
- **Link prediction**, i.e., recover missing triples.

In both cases, the missing information is expected to be recovered from
the neighborhood structure of the graph. Here is the example from the R-GCN
paper:

"Knowing that Mikhail Baryshnikov was educated at the Vaganova Academy
implies both that Mikhail Baryshnikov should have the label person, and
that the triple (Mikhail Baryshnikov, lived in, Russia) must belong to the
knowledge graph."

R-GCN solves both problems with a common graph convolutional network,
extended with multi-edge encoding to compute the entity embeddings, but
with different downstream processing:

- Entity classification attaches a softmax classifier to the final
  embedding of each entity (node). Training minimizes the standard
  cross-entropy loss.
- Link prediction reconstructs edges with an autoencoder architecture,
  using a parameterized score function. Training uses negative sampling.
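
The entity classification head can be sketched in plain Python as follows. This is only an illustration of a softmax classifier with cross-entropy loss on a single node; the four-element vector is a hypothetical final-layer output, and the tutorial itself relies on PyTorch for these operations.

```python
import math

def softmax(logits):
    """Turn a list of scores into class probabilities."""
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[label])

embedding = [2.0, 0.5, -1.0, 0.0]   # hypothetical final-layer output, 4 classes
probs = softmax(embedding)
loss = cross_entropy(embedding, label=0)
print(probs.index(max(probs)), loss)
```

The loss shrinks as the probability of the true class approaches one, which is what training on the labeled entities pushes towards.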

This tutorial focuses on the first task to show how to generate entity
representations. `Complete
code <https://github.com/dmlc/dgl/tree/rgcn/examples/pytorch/rgcn>`_
for both tasks can be found in DGL's GitHub repository.

Key ideas of R-GCN
-------------------
Recall that in GCN, the hidden representation of each node $i$ at the
$(l+1)^{th}$ layer is computed by:

\begin{align}h_i^{(l+1)} = \sigma\left(\sum_{j\in N_i}\frac{1}{c_i} W^{(l)} h_j^{(l)}\right)~~~~~~~~~~(1)\\\end{align}

where $N_i$ is the set of neighbors of node $i$ and $c_i$ is a
normalization constant.

The key difference between R-GCN and GCN is that in R-GCN, edges can
represent different relations. In GCN, weight $W^{(l)}$ in equation
$(1)$ is shared by all edges in layer $l$. In contrast, in
R-GCN, different edge types use different weights and only edges of the
same relation type $r$ are associated with the same projection weight
$W_r^{(l)}$.

The hidden representation of an entity at the $(l+1)^{th}$ layer of
R-GCN can therefore be formulated as:

\begin{align}h_i^{(l+1)} = \sigma\left(W_0^{(l)}h_i^{(l)}+\sum_{r\in R}\sum_{j\in N_i^r}\frac{1}{c_{i,r}}W_r^{(l)}h_j^{(l)}\right)~~~~~~~~~~(2)\\\end{align}

where $N_i^r$ denotes the set of neighbor indices of node $i$
under relation $r\in R$ and $c_{i,r}$ is a normalization
constant. In entity classification, the R-GCN paper uses
$c_{i,r}=|N_i^r|$.
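
A minimal sketch of how the normalization constants could be derived from an edge list is shown below. The toy edges are hypothetical, and the snippet assumes each (source, relation, destination) triple appears at most once, so that counting incoming edges per relation equals $|N_i^r|$.

```python
from collections import Counter

# Hypothetical toy edge list: (source j, relation r, destination i).
edges = [(0, 0, 2), (1, 0, 2), (3, 1, 2), (2, 0, 3)]

# c_{i,r} = |N_i^r|: number of incoming edges of node i under relation r.
c = Counter((dst, rel) for _, rel, dst in edges)

# Per-edge normalization factor 1 / c_{i,r}, one value per edge.
edge_norm = [1.0 / c[(dst, rel)] for _, rel, dst in edges]
print(edge_norm)  # [0.5, 0.5, 1.0, 1.0]
```

Node 2 has two incoming edges under relation 0, so each of them is weighted by 0.5, while its single incoming edge under relation 1 keeps weight 1.0.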

Applying the above equation directly leads to rapid growth in the
number of parameters, especially with highly multi-relational data. To
reduce the model parameter size and prevent overfitting, the original
paper proposes basis decomposition:

\begin{align}W_r^{(l)}=\sum\limits_{b=1}^B a_{rb}^{(l)}V_b^{(l)}~~~~~~~~~~(3)\\\end{align}

Therefore, the weight $W_r^{(l)}$ is a linear combination of basis
transformations $V_b^{(l)}$ with coefficients $a_{rb}^{(l)}$.
The number of bases $B$ is much smaller than the number of relations
in the knowledge base.
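
To give a feel for the savings, the sketch below counts the relation-specific weights of one layer with and without basis decomposition. The helper `rgcn_layer_params` is hypothetical and written only for this illustration; 91 relations and 16 hidden units match the AIFB setup used later in this tutorial, while the choice of 10 bases is an arbitrary assumption.

```python
def rgcn_layer_params(num_rels, in_feat, out_feat, num_bases=None):
    """Count relation-weight parameters of one layer.

    The self-loop weight W_0 and the bias are not counted.
    """
    if num_bases is None:
        # one full (in_feat x out_feat) matrix per relation
        return num_rels * in_feat * out_feat
    # B shared basis matrices plus one coefficient per (relation, basis) pair
    return num_bases * in_feat * out_feat + num_rels * num_bases

full = rgcn_layer_params(91, 16, 16)
basis = rgcn_layer_params(91, 16, 16, num_bases=10)
print(full, basis)  # 23296 3470
```

The gap widens with the feature dimensions, since the per-relation cost drops from a full matrix to $B$ scalar coefficients.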

<div class="alert alert-info"><h4>Note</h4><p>Another weight regularization, block-diagonal decomposition, is implemented in
   the link prediction example linked at the end of this tutorial.</p></div>

Implement R-GCN in DGL
----------------------

An R-GCN model is composed of several R-GCN layers. The first R-GCN layer
also serves as the input layer: it takes in the features (e.g. description
texts) associated with each node entity and projects them to the hidden
space. In this tutorial, we use only the entity id as the entity feature.
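
Using the entity id as the only feature amounts to feeding a one-hot vector into the first layer, and multiplying a one-hot vector by a weight matrix simply selects one row of that matrix. The sketch below checks this on hypothetical toy matrices, and also shows how a (relation, node id) pair maps to a single row index once the per-relation matrices are flattened.

```python
# Hypothetical tiny weight matrix: 3 nodes (one-hot input size) x 2 hidden units.
W = [[0.1, 0.2],
     [0.3, 0.4],
     [0.5, 0.6]]

def one_hot(index, size):
    return [1.0 if k == index else 0.0 for k in range(size)]

def vec_mat(x, mat):
    """Row vector times matrix."""
    return [sum(x[k] * mat[k][j] for k in range(len(mat)))
            for j in range(len(mat[0]))]

node_id = 2
projected = vec_mat(one_hot(node_id, len(W)), W)
looked_up = W[node_id]
print(projected == looked_up)  # True

# With one matrix per relation, flattening (num_rels, num_nodes, out) into
# (num_rels * num_nodes, out) turns (rel, node_id) into one row index.
W_rel = [W, [[1.0, 1.1], [1.2, 1.3], [1.4, 1.5]]]   # 2 hypothetical relations
flat = [row for mat in W_rel for row in mat]
rel, num_nodes = 1, len(W)
print(flat[rel * num_nodes + node_id] == W_rel[rel][node_id])  # True
```

Replacing the matrix product by a lookup avoids materializing one-hot vectors whose length equals the number of nodes.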

R-GCN Layers
~~~~~~~~~~~~

For each node, an R-GCN layer performs the following steps:

- Compute outgoing message using node representation and weight matrix
  associated with the edge type (message function)
- Aggregate incoming messages and generate new node representations (reduce
  and apply function)
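
The two steps above can be sketched without any graph library. The toy graph, features and weights below are hypothetical, and the self-loop term $W_0^{(l)}h_i^{(l)}$ of equation (2) is left out to keep the example short.

```python
# Hypothetical toy setup: 3 nodes, 2 relations, 2-d features, 2x2 weights.
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
W = {0: [[1.0, 0.0], [0.0, 1.0]],      # relation 0: identity
     1: [[2.0, 0.0], [0.0, 2.0]]}      # relation 1: scale by 2
# Edges as (src, rel, dst, norm), with norm = 1 / c_{dst,rel}.
edges = [(0, 0, 2, 0.5), (1, 0, 2, 0.5), (0, 1, 1, 1.0)]

def message(src, rel, norm):
    """Message function: norm * h_src W_rel (row vector times matrix)."""
    w = W[rel]
    return [norm * sum(h[src][k] * w[k][j] for k in range(2)) for j in range(2)]

# Reduce function: sum incoming messages per destination node.
agg = {}
for src, rel, dst, norm in edges:
    msg = message(src, rel, norm)
    prev = agg.get(dst, [0.0, 0.0])
    agg[dst] = [a + b for a, b in zip(prev, msg)]

# Apply function: ReLU activation on the aggregated result.
new_h = {i: [max(0.0, x) for x in v] for i, v in agg.items()}
print(new_h)  # {2: [0.5, 0.5], 1: [2.0, 0.0]}
```

Node 0 receives no messages in this toy graph, so it does not appear in the result.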

The following is the definition of an R-GCN hidden layer.

<div class="alert alert-info"><h4>Note</h4><p>Each relation type is associated with a different weight. Therefore,
   the full weight matrix has three dimensions: relation, input_feature,
   output_feature.</p></div>




In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from dgl import DGLGraph
import dgl.function as fn
from functools import partial
import dgl

class RGCNLayer(nn.Module):
    def __init__(self, in_feat, out_feat, num_rels, num_bases=-1, bias=None,
                 activation=None, is_input_layer=False):
        super(RGCNLayer, self).__init__()
        self.in_feat = in_feat
        self.out_feat = out_feat
        self.num_rels = num_rels
        self.num_bases = num_bases
        self.bias = bias
        self.activation = activation
        self.is_input_layer = is_input_layer

        # sanity check
        if self.num_bases <= 0 or self.num_bases > self.num_rels:
            self.num_bases = self.num_rels

        # weight bases in equation (3)
        self.weight = nn.Parameter(torch.Tensor(self.num_bases, self.in_feat,
                                                self.out_feat))
        if self.num_bases < self.num_rels:
            # linear combination coefficients in equation (3)
            self.w_comp = nn.Parameter(torch.Tensor(self.num_rels, self.num_bases))

        # add bias (kept as None when no bias is requested)
        if self.bias:
            self.bias = nn.Parameter(torch.Tensor(out_feat))
        else:
            self.bias = None

        # init trainable parameters
        nn.init.xavier_uniform_(self.weight,
                                gain=nn.init.calculate_gain('relu'))
        if self.num_bases < self.num_rels:
            nn.init.xavier_uniform_(self.w_comp,
                                    gain=nn.init.calculate_gain('relu'))
        if self.bias is not None:
            # Xavier init needs at least 2 dimensions, so zero-init the 1-D bias
            nn.init.zeros_(self.bias)

    def forward(self, g):
        if self.num_bases < self.num_rels:
            # generate all relation weights from bases (equation (3)):
            # (num_rels, num_bases) x (num_bases, in_feat * out_feat)
            weight = torch.matmul(self.w_comp,
                                  self.weight.view(self.num_bases, -1))
            weight = weight.view(self.num_rels, self.in_feat, self.out_feat)
        else:
            weight = self.weight

        if self.is_input_layer:
            def message_func(edges):
                # for input layer, matrix multiply can be converted to be
                # an embedding lookup using source node id
                embed = weight.view(-1, self.out_feat)
                index = edges.data['rel_type'] * self.in_feat + edges.src['id']
                return {'msg': embed[index] * edges.data['norm']}
        else:
            def message_func(edges):
                # pick the weight matrix of each edge's relation type
                w = weight[edges.data['rel_type']]
                # squeeze only dim 1 so a batch with a single edge keeps its shape
                msg = torch.bmm(edges.src['h'].unsqueeze(1), w).squeeze(1)
                msg = msg * edges.data['norm']
                return {'msg': msg}

        def apply_func(nodes):
            h = nodes.data['h']
            if self.bias is not None:
                h = h + self.bias
            if self.activation:
                h = self.activation(h)
            return {'h': h}

        # sum the incoming messages into 'h', then apply bias and activation
        g.update_all(message_func, fn.sum(msg='msg', out='h'), apply_func)

Define full R-GCN model
~~~~~~~~~~~~~~~~~~~~~~~



In [3]:
import dgl
class Model(nn.Module):
    def __init__(self, num_nodes, h_dim, out_dim, num_rels,
                 num_bases=-1, num_hidden_layers=1):
        super(Model, self).__init__()
        self.num_nodes = num_nodes
        self.h_dim = h_dim
        self.out_dim = out_dim
        self.num_rels = num_rels
        self.num_bases = num_bases
        self.num_hidden_layers = num_hidden_layers

        # create rgcn layers
        self.build_model()

        # create initial features
        self.features = self.create_features()

    def build_model(self):
        self.layers = nn.ModuleList()
        # input to hidden
        
        i2h = self.build_input_layer()
        self.layers.append(i2h)
        
        # hidden to hidden
        
        for _ in range(self.num_hidden_layers):
            h2h = self.build_hidden_layer()
            self.layers.append(h2h)
        
        # hidden to output
        
        h2o = self.build_output_layer()
        self.layers.append(h2o)

    # initialize the feature of each node; here the node id is the only
    # feature (replace this with real features for your own dataset)
    def create_features(self):
        features = torch.arange(self.num_nodes)
        return features

    def build_input_layer(self):
        return RGCNLayer(self.num_nodes, self.h_dim, self.num_rels, self.num_bases,
                         activation=F.relu, is_input_layer=True)

    def build_hidden_layer(self):
        return RGCNLayer(self.h_dim, self.h_dim, self.num_rels, self.num_bases,
                         activation=F.relu)

    def build_output_layer(self):
        return RGCNLayer(self.h_dim, self.out_dim, self.num_rels, self.num_bases,
                         activation=partial(F.softmax, dim=1))

    def forward(self, g):
        # node ids serve as input features here; skip this step if the
        # graph already carries its own features
        if self.features is not None:
            g.ndata['id'] = self.features
        for layer in self.layers:
            layer(g)
        return g.ndata.pop('h')

Handle dataset
~~~~~~~~~~~~~~~~
In this tutorial, we use the AIFB dataset from the R-GCN paper:



In [4]:
# load graph data
from dgl.contrib.data import load_data
import numpy as np
import dgl
data = load_data(dataset='aifb')

num_nodes = data.num_nodes
num_rels = data.num_rels
num_classes = data.num_classes
labels = data.labels
train_idx = data.train_idx
print(num_rels)
#Some modifications for batched graph adding training samples for the second graph as well


# split training and validation set
val_idx = train_idx[:len(train_idx) // 5]
train_idx = train_idx[len(train_idx) // 5:]
print(train_idx)
# edge type and normalization factor
edge_type = torch.from_numpy(data.edge_type)
edge_norm = torch.from_numpy(data.edge_norm).unsqueeze(1)
print(labels)
labels = torch.from_numpy(labels).view(-1)
print(edge_norm)
print(edge_type)
print(edge_type.shape)


Loading dataset aifb
Number of nodes:  8285
Number of edges:  66371
Number of relations:  91
Number of classes:  4
removing nodes that are more than 3 hops away
91
[ 289 6747 4779 6340 6480 1232 4987 7929 5247 2100  839 2776 2306 3599
 6538 5914 4463 7005  781  590 1962 6949 4619 3124 3249 8198 6817 6268
 3655  941 6589 2572 4126 1153 2225 7710 3920 2666 8166 2558 1650 4493
 6155 4488 1509 7409 4669 2011 1919 3616   76 1626 1069 3871  448 3962
 5638 1177 1048 6810 3852 5763 5080  652 5791 4193 5428 7877 3236 5107
 3613 3429 2246  712 2170 3795 5698 6995 4703 2368 1894  190 2390 6805
 1535 2512 6038 1415  390 1061 2842 2051 3527 6724  857 2280  811 7031
  737  552 5000 7659 1158 7765 7046 6036  876 5412 3596 1347 7850 4907]
[[0]
 [0]
 [0]
 ...
 [0]
 [0]
 [0]]
tensor([[1.0000],
        [1.0000],
        [1.0000],
        ...,
        [0.1000],
        [0.1000],
        [1.0000]])
tensor([ 0, 61,  0,  ...,  1,  6,  0])
torch.Size([65439])


Create graph and model
~~~~~~~~~~~~~~~~~~~~~~~



In [5]:
import networkx as nx
import matplotlib.pyplot as plt
from dgl import DGLGraph
#nx.draw(g.to_networkx(), with_labels=True)
plt.show()
# configurations
n_hidden = 16 # number of hidden units
n_bases = -1 # use number of relations as number of bases
n_hidden_layers = 0 # use 1 input layer, 1 output layer, no hidden layer
n_epochs = 25 # epochs to train
lr = 0.01 # learning rate
l2norm = 0 # L2 norm coefficient

# create graph instance
g = DGLGraph()
g.add_nodes(num_nodes)
g.add_edges(data.edge_src, data.edge_dst)
g.edata.update({'rel_type': edge_type, 'norm': edge_norm})

# create model
model = Model(len(g),
              n_hidden,
              num_classes,
              num_rels,
              num_bases=n_bases,
              num_hidden_layers=n_hidden_layers)

Training loop
~~~~~~~~~~~~~~~~



In [71]:
# optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=l2norm)

print("start training...")
model.train()
for epoch in range(n_epochs):
    optimizer.zero_grad()
    # full-graph forward pass; call the module rather than .forward() directly
    logits = model(g)
    # note: the output layer already applies softmax and F.cross_entropy
    # applies log-softmax again, so the loss cannot go all the way to zero
    loss = F.cross_entropy(logits[train_idx], labels[train_idx])
    loss.backward()

    optimizer.step()

    # accuracy and loss on the training and validation splits
    train_acc = torch.sum(logits[train_idx].argmax(dim=1) == labels[train_idx])
    train_acc = train_acc.item() / len(train_idx)
    val_loss = F.cross_entropy(logits[val_idx], labels[val_idx])
    val_acc = torch.sum(logits[val_idx].argmax(dim=1) == labels[val_idx])
    val_acc = val_acc.item() / len(val_idx)
    print("Epoch {:05d} | ".format(epoch) +
          "Train Accuracy: {:.4f} | Train Loss: {:.4f} | ".format(
              train_acc, loss.item()) +
          "Validation Accuracy: {:.4f} | Validation loss: {:.4f}".format(
              val_acc, val_loss.item()))

start training...
Epoch 00001 | Train Accuracy: 0.9286 | Train Loss: 1.3455 | Validation Accuracy: 0.9643 | Validation loss: 1.3590
Epoch 00003 | Train Accuracy: 0.9464 | Train Loss: 1.1995 | Validation Accuracy: 1.0000 | Validation loss: 1.2606
Epoch 00005 | Train Accuracy: 0.9375 | Train Loss: 1.0171 | Validation Accuracy: 1.0000 | Validation loss: 1.1080
Epoch 00007 | Train Accuracy: 0.9464 | Train Loss: 0.8917 | Validation Accuracy: 1.0000 | Validation loss: 0.9588
Epoch 00009 | Train Accuracy: 0.9464 | Train Loss: 0.8279 | Validation Accuracy: 1.0000 | Validation loss: 0.8627
Epoch 00011 | Train Accuracy: 0.9554 | Train Loss: 0.8010 | Validation Accuracy: 0.9643 | Validation loss: 0.8185
Epoch 00013 | Train Accuracy: 0.9643 | Train Loss: 0.7867 | Validation Accuracy: 0.9643 | Validation loss: 0.8033
Epoch 00015 | Train Accuracy: 0.9732 | Train Loss: 0.7764 | Validation Accuracy: 0.9286 | Validation loss: 0.8015
Epoch 00017 | Train Accuracy: 0.9821 | Train Loss: 0.7687 | Validation Accuracy: 0.9286 | Validation loss: 0.8051
Epoch 00019 | Train Accuracy: 0.9821 | Train Loss: 0.7630 | Validation Accuracy: 0.9286 | Validation loss: 0.8099
Epoch 00021 | Train Accuracy: 0.9821 | Train Loss: 0.7583 | Validation Accuracy: 0.9286 | Validation loss: 0.8148
Epoch 00023 | Train Accuracy: 1.0000 | Train Loss: 0.7541 | Validation Accuracy: 0.9286 | Validation loss: 0.8201


The second task: Link prediction
--------------------------------
So far, we have seen how to use DGL to implement entity classification with
the R-GCN model. In the knowledge base setting, the representations
generated by R-GCN can also be used to uncover potential relations between
nodes. In the R-GCN paper, the authors feed the entity representations
generated by R-GCN into the
`DistMult <https://arxiv.org/pdf/1412.6575.pdf>`_ prediction model
to predict possible relations.
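
As a rough illustration of the scoring step, DistMult scores a triple with a three-way product of the subject embedding, a diagonal relation matrix, and the object embedding. The sketch below uses hypothetical 3-dimensional embeddings and a hypothetical helper `distmult_score`; in the real model the entity embeddings would come from the R-GCN layers.

```python
def distmult_score(e_s, r, e_o):
    """DistMult score: sum_k e_s[k] * r[k] * e_o[k] (diagonal relation matrix)."""
    return sum(s * w * o for s, w, o in zip(e_s, r, e_o))

# Hypothetical embeddings for a subject, a relation, and two candidate objects.
e_subject = [1.0, 0.5, -1.0]
r_diag = [1.0, 2.0, 0.5]
e_true = [1.0, 1.0, -1.0]
e_corrupt = [-1.0, 0.0, 1.0]     # stands in for a negatively sampled object

print(distmult_score(e_subject, r_diag, e_true))     # 2.5
print(distmult_score(e_subject, r_diag, e_corrupt))  # -1.5
```

Training with negative sampling pushes the score of observed triples above the score of corrupted ones, as in this toy example.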

The implementation is similar to the above but with an extra DistMult layer
stacked on top of the R-GCN layers. You may find the complete
implementation of link prediction with R-GCN in our `example
code <https://github.com/dmlc/dgl/blob/master/examples/pytorch/rgcn/link_predict.py>`_.

