### Import DGL and PyTorch

In this tutorial, we show how to implement GraphSAGE for semi-supervised learning on a node classification task.

- We first implement and train a GraphSAGE model for node classification (without the neighbor sampling step).
- We then use DGL's sampler to enable neighbor sampling when training and testing the GraphSAGE model.

We demonstrate with the DGL package, but feel free to try other packages such as PyTorch Geometric.

First, load PyTorch, DGL, and the other necessary packages (here we also need NumPy).

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [2]:
!pip install dgl

In [None]:
import dgl
from dgl import DGLGraph

# Load PyTorch as the backend
dgl.load_backend('pytorch')

In [None]:
import numpy as np
import warnings
warnings.filterwarnings('ignore')

### Prepare the PubMed dataset
We use a citation network called PubMed for demonstration. Each node in the citation network is a paper, and each edge represents a citation between two papers.

This dataset has 19,717 papers and 88,651 citations. Each paper has a sparse bag-of-words feature vector and a class label.

In [None]:
from dgl.data import citegrh

# load and preprocess the pubmed dataset
data = citegrh.load_pubmed()

# sparse bag-of-words features of papers
features = torch.FloatTensor(data.features)
# the number of input node features
in_feats = features.shape[1]
# class labels of papers
labels = torch.LongTensor(data.labels)
# the number of unique classes on the nodes.
n_classes = data.num_labels

Here we remove all self-loops in the graph.
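
As a rough illustration of what self-loop removal means, the sketch below filters a plain Python edge list. The helper name `drop_self_loops` and the toy edges are made up for this example; `dgl.remove_self_loop` works on a `DGLGraph`, not on a list.

```python
def drop_self_loops(edges):
    """Return only the edges whose source and destination differ."""
    return [(src, dst) for src, dst in edges if src != dst]

# Toy directed edge list; (1, 1) and (3, 3) are self-loops.
toy_edges = [(0, 1), (1, 1), (1, 2), (3, 3), (2, 0)]
clean_edges = drop_self_loops(toy_edges)
print(clean_edges)  # [(0, 1), (1, 2), (2, 0)]
```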

In [None]:
data[0]

In [None]:
graph = dgl.remove_self_loop(data[0])
graph

### Implement the GNN model

Essentially, given a graph structure, GNNs (GCN, GraphSAGE, GAT, etc.) are used to learn meaningful node representations (in this case, the embeddings, or vectors).
Once these embeddings are properly learnt, we may perform downstream tasks such as node classification, graph classification, and link prediction.

DGL provides two ways of implementing a GNN model:

- using the nn module, which contains many commonly used GNN modules.
- using the message passing interface to implement a GNN model from scratch.

For simplicity, we implement the GraphSAGE model in the tutorial with the nn module.

If you are interested in using the message passing interface to implement a GNN model, see https://doc.dgl.ai/tutorials/models/index.html.

![fishy](https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/master/images/GNN.png)

The GraphSAGE model has multiple layers. In each layer, a node accesses its direct neighbors. When we stack $k$ layers in a model, a node $v$ accesses neighbors within $k$ hops. The output of the GraphSAGE model is **node embeddings** that represent the nodes and all information in their $k$-hop neighborhoods.
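
To make the link between the number of layers and the $k$-hop neighborhood concrete, here is a small plain-Python sketch. The function `k_hop_neighborhood` and the toy path graph are illustrative assumptions, not part of DGL.

```python
def k_hop_neighborhood(adj, node, k):
    """Return all nodes reachable from `node` in at most k hops (including itself)."""
    seen = {node}
    frontier = {node}
    for _ in range(k):
        # One more layer lets information travel one more hop.
        frontier = {nbr for u in frontier for nbr in adj[u]} - seen
        seen |= frontier
    return seen

# Toy path graph 0 - 1 - 2 - 3 - 4, stored as an undirected adjacency list.
path_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
one_hop = k_hop_neighborhood(path_adj, 0, 1)
two_hop = k_hop_neighborhood(path_adj, 0, 2)
print(sorted(one_hop), sorted(two_hop))
```

With one layer, node 0 only sees node 1; a second layer extends its view to node 2.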

If you want to learn about the details of the SAGEConv layer, see its official documentation at https://docs.dgl.ai/en/0.8.x/generated/dgl.nn.pytorch.conv.SAGEConv.html
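
For intuition, the following plain-Python sketch computes one layer with a mean aggregator: each node combines a linear map of its own feature with a linear map of the average of its neighbors' features. It is a simplified illustration under stated assumptions (no bias, no activation, no dropout, hypothetical names `matvec` and `sage_mean_layer`), not the SAGEConv implementation.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sage_mean_layer(adj, feats, W_self, W_neigh):
    """One simplified mean-aggregator layer: W_self @ h_v + W_neigh @ mean(h_u)."""
    out = {}
    for v, h_v in feats.items():
        nbrs = adj[v]
        dim = len(h_v)
        if nbrs:
            h_nbr = [sum(feats[u][d] for u in nbrs) / len(nbrs) for d in range(dim)]
        else:
            # Assumption: a node without neighbors aggregates a zero vector.
            h_nbr = [0.0] * dim
        out[v] = [a + b for a, b in zip(matvec(W_self, h_v), matvec(W_neigh, h_nbr))]
    return out

# Toy graph: adj[v] lists the neighbors whose features flow into v.
toy_adj = {0: [1, 2], 1: [0], 2: []}
toy_feats = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [4.0, 4.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
toy_out = sage_mean_layer(toy_adj, toy_feats, identity, identity)
print(toy_out)
```

With identity weights, node 0 ends up with its own feature plus the average of the features of nodes 1 and 2.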

In [None]:
from dgl.nn.pytorch import conv as dgl_conv

class GraphSAGEModel(nn.Module):
    def __init__(self,
                 in_feats,
                 n_hidden,
                 out_dim,
                 n_layers,
                 activation,
                 dropout,
                 aggregator_type):
        super(GraphSAGEModel, self).__init__()
        self.layers = nn.ModuleList()

        # input layer
        self.layers.append(dgl_conv.SAGEConv(in_feats, n_hidden, aggregator_type,
                                         feat_drop=dropout, activation=activation))
        # hidden layers
        for i in range(n_layers - 1):
            self.layers.append(dgl_conv.SAGEConv(n_hidden, n_hidden, aggregator_type,
                                             feat_drop=dropout, activation=activation))
        # output layer
        self.layers.append(dgl_conv.SAGEConv(n_hidden, out_dim, aggregator_type,
                                         feat_drop=dropout, activation=None))

    def forward(self, g, features):
        h = features
        for layer in self.layers:
            h = layer(g, h)
        return h

### Node classification (semi-supervised)
Let us perform node classification in a semi-supervised setting. In this setting, we have the entire graph structure and all node features, but labels for only some of the nodes, and we want to predict the labels of the remaining nodes. Even though some nodes have no labels, they are connected to labeled nodes. Thus, we train the model with both labeled and unlabeled nodes: the loss is computed on the labeled nodes, while the unlabeled nodes contribute their features and edges. Using the unlabeled nodes in this way usually improves performance.

![semisupervised](https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/master/images/node_classify1.png)

This dependency graph gives a better view of how labeled and unlabeled nodes are used in training.

![dependency](https://raw.githubusercontent.com/dglai/WWW20-Hands-on-Tutorial/master/images/node_classify2.png)
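
The semi-supervised loss can be sketched in plain Python: logits exist for every node, but only nodes selected by the training mask contribute to the loss. The helper names and toy values below are assumptions made for illustration; the tutorial itself relies on `torch.nn.CrossEntropyLoss`.

```python
import math

def cross_entropy(logits, label):
    """Negative log-softmax probability of the true class (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def masked_loss(all_logits, all_labels, train_mask):
    """Average the loss over the nodes whose mask entry is True."""
    losses = [cross_entropy(z, y)
              for z, y, keep in zip(all_logits, all_labels, train_mask) if keep]
    return sum(losses) / len(losses)

toy_logits = [[2.0, 0.0], [0.0, 0.0], [0.0, 5.0]]
toy_labels = [0, 1, 0]
toy_train_mask = [True, True, False]  # the third node is treated as unlabeled
loss = masked_loss(toy_logits, toy_labels, toy_train_mask)
print(round(loss, 4))
```

Changing the label of the masked-out node leaves the loss untouched, which is why unknown labels can simply be ignored during training.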

In [None]:
# Hyperparameters
n_hidden = 64
n_layers = 2
dropout = 0.5
aggregator_type = 'mean'

gconv_model = GraphSAGEModel(in_feats,
                             n_hidden,
                             n_classes,
                             n_layers,
                             F.relu,
                             dropout,
                             aggregator_type)

Now we create the node classification model based on the GraphSAGE model. The GraphSAGE model takes a DGLGraph object and node features as input and computes node embeddings as output. With the node embeddings, we use a cross-entropy loss to train the node classification model.

In [None]:
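class NodeClassification(nn.Module):
    def __init__(self, gconv_model, n_hidden, n_classes):
        # n_hidden and n_classes are accepted but not used by this class.
        super(NodeClassification, self).__init__()
        self.gconv_model = gconv_model
        self.loss_fcn = torch.nn.CrossEntropyLoss()

    def forward(self, g, features, train_mask):
        # Compute logits for every node, then score only the labeled training nodes.
        # Note: `labels` is the global tensor defined in the data-loading cell.
        logits = self.gconv_model(g, features)
        return self.loss_fcn(logits[train_mask], labels[train_mask])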

After defining a model for node classification, we define the evaluation, training, and test functions.
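
The accuracy computed during evaluation boils down to comparing the highest-scoring class with the label on the masked nodes. Below is a minimal plain-Python sketch; `masked_accuracy` and the toy values are hypothetical and only mirror the idea, not the PyTorch code.

```python
def masked_accuracy(all_logits, all_labels, mask):
    """Fraction of masked nodes whose highest-scoring class equals the label."""
    correct = 0
    total = 0
    for logits, label, keep in zip(all_logits, all_labels, mask):
        if not keep:
            continue
        predicted = max(range(len(logits)), key=lambda c: logits[c])
        correct += int(predicted == label)
        total += 1
    return correct / total

eval_logits = [[0.1, 0.9], [2.0, -1.0], [0.3, 0.2], [0.0, 1.0]]
eval_labels = [1, 0, 1, 0]
eval_mask = [True, True, True, False]  # the last node is not evaluated
acc = masked_accuracy(eval_logits, eval_labels, eval_mask)
print(acc)
```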

In [None]:
def NCEvaluate(model, g, features, labels, test_mask):
    model.eval()
    with torch.no_grad():
        # compute embeddings with GNN
        logits = model.gconv_model(g, features)
        logits = logits[test_mask]
        test_labels = labels[test_mask]
        _, indices = torch.max(logits, dim=1)
        correct = torch.sum(indices == test_labels)
        acc = correct.item() * 1.0 / len(test_labels)
    return acc

def Train(model, graph, features, train_mask, val_mask, labels, n_epochs):
    for epoch in range(n_epochs):
        # Set the model in the training mode.
        model.train()
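        # forward pass: cross-entropy loss on the training nodes
        loss = model(graph, features, train_mask)
        # backward pass and parameter update (uses the global `optimizer`)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()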

        acc = NCEvaluate(model, graph, features, labels, val_mask)
        print("Epoch {:05d} | Loss {:.4f} | Accuracy {:.4f}"
            .format(epoch, loss.item(), acc))

def Test(model, graph, features, labels, test_mask):
    print('Testing Accuracy:', NCEvaluate(model, graph, features, labels, test_mask))

Prepare the data for semi-supervised node classification.

In [None]:
train_mask = graph.ndata['train_mask']
val_mask = graph.ndata['val_mask']
test_mask = graph.ndata['test_mask']

print("""----Data statistics------
      #Classes {}
      #Train samples {}
      #Val samples {}
      #Test samples {}""".format(
          n_classes,
          train_mask.sum().item(),
          val_mask.sum().item(),
          test_mask.sum().item()))

After defining the model and evaluation function, we can put everything into the training loop to train the model.

In [None]:
# Node classification task
model = NodeClassification(gconv_model, n_hidden, n_classes)

# Training hyperparameters
weight_decay = 5e-4
n_epochs = 150
lr = 1e-3

# create the Adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)

Train(model, graph, features, train_mask, val_mask, labels, n_epochs)
Test(model, graph, features, labels, test_mask)


### Train with neighbor sampling
The above example runs without neighbor sampling.
Now, let's look at how to implement this feature.

DGL implements this for us in two classes:

`dgl.dataloading.MultiLayerNeighborSampler`
and
`dgl.dataloading.NodeDataLoader`

Note that **the GraphSAGE structure does not change**; only the training procedure changes:

(1) we change to batched training, and 

(2) each node within a batch is updated with a portion of randomly sampled neighbors instead of all its neighbors.
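
The idea behind point (2) can be sketched in plain Python: for each node in a batch, keep at most a fixed number of randomly chosen neighbors. The function `sample_neighbors`, the `fanout` argument, and the toy star graph are assumptions for illustration; DGL's sampler works on graph objects and handles multiple layers.

```python
import random

def sample_neighbors(adj, nodes, fanout, rng):
    """For each node, keep at most `fanout` randomly chosen neighbors."""
    sampled = {}
    for v in nodes:
        nbrs = adj[v]
        # Nodes with few neighbors keep all of them; others get a random subset.
        sampled[v] = list(nbrs) if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
    return sampled

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Toy star graph: node 0 has five neighbors, the others only know node 0.
star_adj = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
sampled = sample_neighbors(star_adj, nodes=[0, 1], fanout=2, rng=rng)
print(sampled)
```

The cost of aggregating at node 0 is now bounded by the fan-out rather than by its degree.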

In [None]:
batch_size = 1024
fan_out = [10, 20, 30]  # maximum number of neighbors sampled in each layer (one entry per GNN layer)

sampler = dgl.dataloading.MultiLayerNeighborSampler(fan_out)

def get_dataloader_with_sampling(graph, mask, sampler, batch_size=32, shuffle=False):
    nids = torch.nonzero(mask, as_tuple=True)[0]  # IDs of the nodes selected by the boolean mask
    dataloader = dgl.dataloading.NodeDataLoader(
                    graph,
                    nids,
                    sampler,
                    batch_size=batch_size,
                    shuffle=shuffle,
                    drop_last=False,
                    num_workers=8)
    return dataloader


train_dataloader = get_dataloader_with_sampling(graph, train_mask, sampler, batch_size, True)
val_dataloader = get_dataloader_with_sampling(graph, val_mask, sampler, batch_size, False)
test_dataloader = get_dataloader_with_sampling(graph, test_mask, sampler, batch_size, False)

The model structure remains the same.
The only difference is in the **forward** function, which now receives `blocks` as input: the batched, neighborhood-sampled graphs, one per layer.
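
To give a feel for what a list of blocks looks like, the sketch below builds one `(src_nodes, dst_nodes)` pair per layer by walking backwards from the seed nodes. It is a deterministic toy stand-in (it keeps the first `fanout` neighbors instead of sampling), and `build_blocks` is a hypothetical helper, not a DGL function.

```python
def build_blocks(adj, seeds, fanouts):
    """Build one (src_nodes, dst_nodes) pair per layer, input layer first."""
    blocks = []
    dst = list(seeds)
    # Walk backwards from the output layer to the input layer.
    for fanout in reversed(fanouts):
        src = list(dst)  # destination nodes are also sources (their own features)
        for v in dst:
            for u in adj[v][:fanout]:
                if u not in src:
                    src.append(u)
        blocks.insert(0, (src, dst))
        dst = src  # the sources of this layer are the destinations of the previous one
    return blocks

chain_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
toy_blocks = build_blocks(chain_adj, seeds=[0], fanouts=[2, 2])
for layer_idx, (src, dst) in enumerate(toy_blocks):
    print(f"layer {layer_idx}: {len(src)} source nodes -> {len(dst)} destination nodes")
```

Each layer shrinks the node set until only the seed nodes remain, which is why the final logits line up with the labels of the seed nodes.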

In [None]:
class GraphSAGEModel(nn.Module):
    def __init__(self,
                 in_feats,
                 n_hidden,
                 out_dim,
                 n_layers,
                 activation,
                 dropout,
                 aggregator_type):
        super(GraphSAGEModel, self).__init__()
        self.layers = nn.ModuleList()

        # input layer
        self.layers.append(dgl_conv.SAGEConv(in_feats, n_hidden, aggregator_type,
                                         feat_drop=dropout, activation=activation))
        # hidden layers
        for i in range(n_layers - 1):
            self.layers.append(dgl_conv.SAGEConv(n_hidden, n_hidden, aggregator_type,
                                             feat_drop=dropout, activation=activation))
        # output layer
        self.layers.append(dgl_conv.SAGEConv(n_hidden, out_dim, aggregator_type,
                                         feat_drop=dropout, activation=None))

    # Notice the difference in the forward method: each layer consumes its own block.
    def forward(self, blocks, features):
        h = features
        for layer, block in zip(self.layers, blocks):
            h = layer(block, h)
        return h

    # This is the forward method without sampling, kept for reference.
    # def forward(self, g, features):
    #     h = features
    #     for layer in self.layers:
    #         h = layer(g, h)
    #     return h


class NodeClassification(nn.Module):
    def __init__(self, gconv_model):
        super(NodeClassification, self).__init__()
        self.gconv_model = gconv_model
        self.loss_fcn = torch.nn.CrossEntropyLoss()

    # Instead of using masks, we pass the features and labels corresponding to the blocks for batched learning.
    def forward(self, blocks, features, labels):
        logits = self.gconv_model(blocks, features)
        return self.loss_fcn(logits, labels)



For the training and evaluation, we re-organize them to receive batch input.
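
One design choice worth noting is that predictions and labels are collected over all batches before the accuracy is computed. The plain-Python sketch below, with a hypothetical `batches` helper and toy data, shows why this differs from averaging per-batch accuracies when the last batch is smaller.

```python
def batches(items, batch_size):
    """Yield consecutive chunks; the last chunk may be smaller (like drop_last=False)."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Toy (prediction, label) pairs for five evaluation nodes.
pairs = [(1, 1), (0, 0), (1, 1), (0, 0), (1, 0)]

all_preds, all_targets = [], []
per_batch_acc = []
for batch in batches(pairs, batch_size=4):
    all_preds.extend(p for p, _ in batch)
    all_targets.extend(y for _, y in batch)
    per_batch_acc.append(sum(p == y for p, y in batch) / len(batch))

# Pooling first weights every node equally.
pooled_acc = sum(p == y for p, y in zip(all_preds, all_targets)) / len(all_targets)
# Averaging per-batch accuracies over-weights the small final batch.
mean_of_batch_acc = sum(per_batch_acc) / len(per_batch_acc)
print(pooled_acc, mean_of_batch_acc)
```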

In [None]:
def Evaluate(model, eval_dataloader):
    model.eval()
    with torch.no_grad():
        all_labels = []
        all_logits = []
        for batch in eval_dataloader:
            input_nodes, output_nodes, blocks = batch
            x = blocks[0].srcdata["feat"]
            y = blocks[-1].dstdata["label"]
            batch_logits = model.gconv_model(blocks, x)

            all_logits.append(batch_logits)
            all_labels.append(y)

        labels = torch.cat(all_labels)
        logits = torch.cat(all_logits)
        
        # compute metrics: Accuracy
        _, indices = torch.max(logits, dim=1)
        correct = torch.sum(indices == labels)
        acc = correct.item() * 1.0 / len(labels)
    return acc

def Train(model, train_dataloader, val_dataloader, optimizer, n_epochs):
    for epoch in range(n_epochs):
        # Set the model in the training mode.
        model.train()
    
        # # forward (no sampling)
        # loss = model(graph, features, train_mask)
        # optimizer.zero_grad()
        # loss.backward()
        # optimizer.step()

        # forward
        for batch in train_dataloader:
            input_nodes, output_nodes, blocks = batch
            x = blocks[0].srcdata["feat"]
            y = blocks[-1].dstdata["label"]
            loss = model(blocks, x, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # evaluate on the validation set; the loss printed below is that of the last mini-batch
        acc = Evaluate(model, val_dataloader)
        print("Epoch {:05d} | Loss {:.4f} | Accuracy {:.4f}"
            .format(epoch, loss.item(), acc))

def Test(model, test_dataloader):
    acc = Evaluate(model, test_dataloader)
    print('Testing Accuracy', acc)

Let's try training a model in this way...

In [None]:
# Hyperparameters
n_hidden = 64
n_layers = 2
dropout = 0.5
aggregator_type = 'mean'

gconv_model = GraphSAGEModel(in_feats,
                             n_hidden,
                             n_classes,
                             n_layers,
                             F.relu,
                             dropout,
                             aggregator_type)

# Node classification task
model = NodeClassification(gconv_model)

In [None]:
# Training hyperparameters
weight_decay = 5e-4
n_epochs = 150
lr = 1e-3

# create the Adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)

Train(model, train_dataloader, val_dataloader, optimizer, n_epochs)
Test(model, test_dataloader)

Generally, on this dataset, the results should be very similar to those of the previous run without neighbor sampling.