# **CS224W - Colab 2**

In Colab 2, we will construct our own graph neural network using PyTorch Geometric (PyG) and then apply that model to two Open Graph Benchmark (OGB) datasets. These two datasets will be used to benchmark your model's performance on two different graph-based tasks: 1) node property prediction (predicting properties of single nodes) and 2) graph property prediction (predicting properties of entire graphs or subgraphs).

First, we will learn how PyTorch Geometric stores graphs as PyTorch tensors.

Then, we will load and inspect one of the Open Graph Benchmark (OGB) datasets by using the `ogb` package. OGB is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. The `ogb` package not only provides data loaders for each dataset but also model evaluators.

Lastly, we will build our own graph neural network using PyTorch Geometric. We will then train and evaluate our model on the OGB node property prediction and graph property prediction tasks.

**Note**: Make sure to **sequentially run all the cells in each section**, so that the intermediate variables / packages will carry over to the next cell.

We recommend you save a copy of this colab in your drive so you don't lose progress!

The expected time to finish this Colab is 2 hours. However, debugging training loops can easily take a while. So, don't worry at all if it takes you longer! Have fun and good luck on Colab 2 :)

# Device
We recommend using a GPU for this Colab.

Please click `Runtime` and then `Change runtime type`. Then set the `hardware accelerator` to **GPU**.

## Installation

In [None]:
# Install torch geometric
import os
import torch
if 'IS_GRADESCOPE_ENV' not in os.environ:
  torch_version = str(torch.__version__)
  scatter_src = f"https://pytorch-geometric.com/whl/torch-{torch_version}.html"
  sparse_src = f"https://pytorch-geometric.com/whl/torch-{torch_version}.html"
  !pip install torch-scatter -f $scatter_src
  !pip install torch-sparse -f $sparse_src
  !pip install torch-geometric
  !pip install -q git+https://github.com/snap-stanford/deepsnap.git

Looking in links: https://pytorch-geometric.com/whl/torch-2.3.1+cu121.html
Collecting torch-scatter
  Downloading torch_scatter-2.1.2.tar.gz (108 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 108.0/108.0 kB 6.9 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: torch-scatter
  Building wheel for torch-scatter (setup.py) ... done
  Created wheel for torch-scatter: filename=torch_scatter-2.1.2-cp310-cp310-linux_x86_64.whl size=3594777 sha256=417e576ce6fcae924716ca434a553e5ef3647843564287b848e441a0ca274b77
  Stored in directory: /root/.cache/pip/wheels/92/f1/2b/3b46d54b134259f58c8363568569053248040859b1a145b3ce
Successfully built torch-scatter
Installing collected packages: torch-scatter
Successfully installed torch-scatter-2.1.2
Looking in links: https://pytorch-geometric.com/whl/torch-2.3.1+cu121.html
Collecting torch-sparse
  Downloading torch_sparse-0.6.18.tar.gz (2

In [None]:
!pip install ogb

Collecting ogb
  Downloading ogb-1.3.6-py3-none-any.whl.metadata (6.2 kB)
Collecting outdated>=0.2.0 (from ogb)
  Downloading outdated-0.2.2-py2.py3-none-any.whl.metadata (4.7 kB)
Collecting littleutils (from outdated>=0.2.0->ogb)
  Downloading littleutils-0.2.4-py3-none-any.whl.metadata (679 bytes)
Downloading ogb-1.3.6-py3-none-any.whl (78 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.8/78.8 kB 4.6 MB/s eta 0:00:00
Downloading outdated-0.2.2-py2.py3-none-any.whl (7.5 kB)
Downloading littleutils-0.2.4-py3-none-any.whl (8.1 kB)
Installing collected packages: littleutils, outdated, ogb
Successfully installed littleutils-0.2.4 ogb-1.3.6 outdated-0.2.2


In [None]:
import torch_geometric
torch_geometric.__version__

'2.5.3'

# 2) Open Graph Benchmark (OGB)

The Open Graph Benchmark (OGB) is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. Its datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can then be evaluated by using the OGB Evaluator in a unified manner.

## Dataset and Data

OGB also supports PyG dataset and data classes. Here we take a look at the `ogbn-arxiv` dataset.

In [None]:
import torch_geometric.transforms as T
from ogb.nodeproppred import PygNodePropPredDataset

if 'IS_GRADESCOPE_ENV' not in os.environ:
  dataset_name = 'ogbn-arxiv'
  # Load the dataset and transform it to sparse tensor
  dataset = PygNodePropPredDataset(name=dataset_name,
                                  transform=T.ToSparseTensor())
  print('The {} dataset has {} graph'.format(dataset_name, len(dataset)))

  # Extract the graph
  data = dataset[0]
  print(data)

The ogbn-arxiv dataset has 1 graph
Data(num_nodes=169343, x=[169343, 128], node_year=[169343, 1], y=[169343, 1], adj_t=[169343, 169343, nnz=1166243])


## Question 4: How many features are in the ogbn-arxiv graph? (5 points)

In [None]:
def graph_num_features(data):
  # TODO: Implement a function that takes a PyG data object,
  # and returns the number of features in the graph (as an integer).

  num_features = 0

  ############# Your code here ############
  num_features = data.x.shape[1] ## [num_of_nodes, num_of_features]

  #########################################

  return num_features

if 'IS_GRADESCOPE_ENV' not in os.environ:
  dataset = PygNodePropPredDataset(name=dataset_name,
                                  transform=T.ToSparseTensor())
  data = dataset[0]
  num_features = graph_num_features(data)
  print('The graph has {} features'.format(num_features))

The graph has 128 features


# 3) GNN: Node Property Prediction

In this section we will build our first graph neural network using PyTorch Geometric. Then we will apply it to the task of node property prediction (node classification).

Specifically, we will use GCN as the foundation for our graph neural network ([Kipf & Welling (2017)](https://arxiv.org/pdf/1609.02907.pdf)). To do so, we will work with PyG's built-in `GCNConv` layer.
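
For reference, the layer-wise propagation rule from the GCN paper can be written as below. Here $\tilde{A} = A + I$ is the adjacency matrix with self-loops added, $\tilde{D}$ is its diagonal degree matrix, $H^{(l)}$ holds the node features at layer $l$, $W^{(l)}$ is a learnable weight matrix, and $\sigma$ is a nonlinearity such as ReLU. As far as we understand, `GCNConv` handles the normalized aggregation and the weight matrix, while the nonlinearity is applied separately in the model.

$$
H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right)
$$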

## Setup

In [None]:
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

In [None]:
import os
import torch.nn as nn
from torch.nn import BatchNorm1d, LogSoftmax

In [None]:
import torch
import torch.nn.functional as F
print(torch.__version__)

# The PyG built-in GCNConv
from torch_geometric.nn import GCNConv

import torch_geometric.transforms as T
from ogb.nodeproppred import PygNodePropPredDataset, Evaluator

2.3.1+cu121


## Load and Preprocess the Dataset

In [None]:
dataset_name = 'ogbn-arxiv'
dataset = PygNodePropPredDataset(name=dataset_name,
                                 transform=T.ToSparseTensor())
data = dataset[0]

# Make the adjacency matrix symmetric
data.adj_t = data.adj_t.to_symmetric()

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# If you use GPU, the device should be cuda
print('Device: {}'.format(device))

# data = data.to(device)
split_idx = dataset.get_idx_split()
train_idx = split_idx['train']
# train_idx = split_idx['train'].to(device)

Device: cuda


In [None]:
print(data)

Data(num_nodes=169343, x=[169343, 128], node_year=[169343, 1], y=[169343, 1], adj_t=[169343, 169343, nnz=2315598])
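
Note that `nnz` grew from 1166243 to 2315598 after `to_symmetric()`, which is slightly less than double. One plausible explanation is that some reverse edges were already present, so adding them creates no new entries. The plain-Python sketch below illustrates that idea on a toy edge list; `symmetrize_edges` and the toy graph are hypothetical and are not part of PyG.

```python
def symmetrize_edges(edges):
    """Return the edge set with every reverse edge added, without duplicates."""
    sym = set(edges)
    sym.update((dst, src) for src, dst in edges)
    return sym

# Hypothetical toy citation graph: 0 -> 1 and 1 -> 0 form a reciprocal pair,
# while 2 -> 0 has no reverse edge yet.
directed = [(0, 1), (1, 0), (2, 0)]
symmetric = symmetrize_edges(directed)

# 3 directed edges become 4 (not 6), because the reciprocal pair adds nothing new.
print(len(directed), len(symmetric))
```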


In [None]:
print("Number of nodes in the graph:\t\t", data.num_nodes)
print("Number of features for each node in the graph:\t\t", data.x.shape[1])

Number of nodes in the graph:		 169343
Number of features for each node in the graph:		 128


## GCN Model

Now we will implement our GCN model!

Please follow the figure below to implement the `forward` function.


![test](https://drive.google.com/uc?id=128AuYAXNXGg7PIhJJ7e420DoPWKb-RtL)

In [None]:
class GCN(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, num_layers,
                 dropout_, return_embeds=False):
        # TODO: Implement a function that initializes self.convs,
        # self.bns, and self.softmax.

        super(GCN, self).__init__()

        ############# Your code here ############
        ## Note:
        ## 1. You should use torch.nn.ModuleList for self.convs and self.bns
        ## 2. self.convs has num_layers GCNConv layers
        ## 3. self.bns has num_layers - 1 BatchNorm1d layers
        ## 4. You should use torch.nn.LogSoftmax for self.softmax
        ## 5. The parameters you can set for GCNConv include 'in_channels' and
        ## 'out_channels'. For more information please refer to the documentation:
        ## https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#torch_geometric.nn.conv.GCNConv
        ## 6. The only parameter you need to set for BatchNorm1d is 'num_features'
        ## For more information please refer to the documentation:
        ## https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm1d.html
        ## (~10 lines of code)

        self.dropout_rate = dropout_
        self.return_embeds = return_embeds

        self.convs = nn.ModuleList()
        self.convs.append(GCNConv(input_dim, hidden_dim))
        for i in range(num_layers - 2):
          self.convs.append(GCNConv(hidden_dim, hidden_dim))
        self.convs.append(GCNConv(hidden_dim, output_dim))
        self.bns = nn.ModuleList()
        for i in range(num_layers - 1):
          self.bns.append(BatchNorm1d(hidden_dim))
        self.softmax = LogSoftmax(dim = 1)

        #########################################

    def reset_parameters(self):
        for conv in self.convs:
            conv.reset_parameters()
        for bn in self.bns:
            bn.reset_parameters()

    def forward(self, x, adj_t):
        ############# Your code here ############
        ## Note:
        ## 1. Construct the network as shown in the figure
        ## 2. torch.nn.functional.relu and torch.nn.functional.dropout are useful
        ## For more information please refer to the documentation:
        ## https://pytorch.org/docs/stable/nn.functional.html
        ## 3. Don't forget to set F.dropout training to self.training
        ## 4. If return_embeds is True, then skip the last softmax layer
        ## (~7 lines of code)

        for i in range(len(self.convs) - 1):
           x = self.convs[i](x, adj_t)
           x = self.bns[i](x)
           x = F.relu(x)
           x = F.dropout(x, p = self.dropout_rate, training = self.training)

        x = self.convs[-1](x, adj_t) # The last block does not have any BN, relu, Dropout
        if self.return_embeds:   # Return embeddings not classes
          return x
        else:
          return self.softmax(x)     # Return classes
        #########################################
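
To build intuition for what each `GCNConv` call aggregates, here is a minimal plain-Python sketch of one normalized propagation step with self-loops, following the symmetric normalization described in the GCN paper. It is an illustration only: `gcn_propagate_sketch` and the toy graph are hypothetical, and the learnable weight matrix and bias that the real layer applies are omitted.

```python
import math

def gcn_propagate_sketch(adj, feats):
    """Compute D^{-1/2} (A + I) D^{-1/2} X for a dense 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops so every node also keeps part of its own feature.
    a_tilde = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    out = []
    for i in range(n):
        row = [0.0] * len(feats[0])
        for j in range(n):
            if a_tilde[i][j]:
                # Symmetric normalization by the degrees of both endpoints.
                weight = a_tilde[i][j] / math.sqrt(deg[i] * deg[j])
                for k, value in enumerate(feats[j]):
                    row[k] += weight * value
        out.append(row)
    return out

# Hypothetical path graph 0 - 1 - 2 with one scalar feature per node.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
feats = [[1.0], [0.0], [0.0]]
out = gcn_propagate_sketch(adj, feats)
print(out)
```

After one step, node 1 receives part of node 0's feature, while node 2 (two hops away) still sees nothing. Stacking layers is what lets information travel further.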

In [None]:
def train(model, data, train_idx, optimizer, loss_fn):
    # TODO: Implement a function that trains the model by
    # using the given optimizer and loss_fn.
    model.train() ## model is in training config
    loss = 0

    ############# Your code here ############
    ## Note:
    ## 1. Zero grad the optimizer
    ## 2. Feed the data into the model
    ## 3. Slice the model output and label by train_idx
    ## 4. Feed the sliced output and label to loss_fn
    ## (~4 lines of code)

    optimizer.zero_grad()

    output = model(data.x, data.adj_t)
    preds_for_train = output[train_idx]
    truth_labels = data.y[train_idx].squeeze()

    loss = loss_fn(preds_for_train, truth_labels)


    loss.backward()
    optimizer.step()
    #########################################


    return loss.item()
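
The model ends in a `LogSoftmax` layer and the training cell uses `F.nll_loss`, a pairing that together amounts to the usual cross-entropy loss. The sketch below spells out both pieces in plain Python on made-up scores. The helper names and data are hypothetical and only mirror the idea, not PyTorch's implementation.

```python
import math

def log_softmax_sketch(logits):
    """Numerically stable log-softmax over a list of raw class scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll_loss_sketch(log_probs, targets):
    """Mean negative log-probability assigned to each row's target class."""
    return -sum(row[t] for row, t in zip(log_probs, targets)) / len(targets)

# Hypothetical raw scores for two nodes and three classes.
logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
targets = [0, 2]
log_probs = [log_softmax_sketch(row) for row in logits]
loss = nll_loss_sketch(log_probs, targets)
print(loss)
```

A uniform row such as `[0.0, 0.0, 0.0]` contributes `log(3)` to the loss, which is a handy sanity check: an untrained 40-class model should start near `log(40)`.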

In [None]:
import pandas as pd  # used below to save predictions as CSV

# Test function here
@torch.no_grad()
def test(model, data, split_idx, evaluator, save_model_results=False):
    # TODO: Implement a function that tests the model by
    # using the given split_idx and evaluator.
    model.eval()

    # The output of model on all data
    out = None

    ############# Your code here ############
    ## (~1 line of code)
    ## Note:
    ## 1. No index slicing here
    out = model(data.x, data.adj_t)
    #########################################

    y_pred = out.argmax(dim=-1, keepdim=True)

    train_acc = evaluator.eval({
        'y_true': data.y[split_idx['train']],
        'y_pred': y_pred[split_idx['train']],
    })['acc']
    valid_acc = evaluator.eval({
        'y_true': data.y[split_idx['valid']],
        'y_pred': y_pred[split_idx['valid']],
    })['acc']
    test_acc = evaluator.eval({
        'y_true': data.y[split_idx['test']],
        'y_pred': y_pred[split_idx['test']],
    })['acc']

    if save_model_results:
      print ("Saving Model Predictions")

      data = {}
      data['y_pred'] = y_pred.view(-1).cpu().detach().numpy()

      df = pd.DataFrame(data=data)
      # Save locally as csv
      df.to_csv('ogbn-arxiv_node.csv', sep=',', index=False)


    return train_acc, valid_acc, test_acc
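
The `test` function takes the `argmax` over class scores and hands the predictions to the OGB evaluator, which reports `acc`. As far as we understand, that metric is simply the fraction of nodes whose predicted class matches the label. A plain-Python sketch on made-up scores follows; both helpers are hypothetical and not the evaluator's own code.

```python
def argmax_sketch(scores):
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)

def accuracy_sketch(y_true, y_pred):
    """Fraction of positions where prediction and label agree."""
    correct = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical class scores for three nodes and three classes.
scores = [[0.1, 0.7, 0.2], [0.9, 0.05, 0.05], [0.3, 0.3, 0.4]]
y_true = [1, 0, 1]
y_pred = [argmax_sketch(row) for row in scores]
acc = accuracy_sketch(y_true, y_pred)
print(y_pred, acc)  # two of the three predictions are correct
```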

In [None]:
# Please do not change the args
if 'IS_GRADESCOPE_ENV' not in os.environ:
  args = {
      'device': device,
      'num_layers': 3,
      'hidden_dim': 256,
      'dropout': 0.5,
      'lr': 0.01,
      'epochs': 100,
  }
  args

In [None]:
print("Num features: ", data.num_features)
print("Hidden: ", args['hidden_dim'])
print("num classes: ", dataset.num_classes)

Num features:  128
Hidden:  256
num classes:  40


In [None]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
  model = GCN(input_dim = data.num_features,
              hidden_dim = args['hidden_dim'],
              output_dim = dataset.num_classes,
              num_layers = args['num_layers'],
              dropout_=args['dropout'])
  # model = GCN(input_dim = data.num_features,
  #             hidden_dim = args['hidden_dim'],
  #             output_dim = dataset.num_classes,
  #             num_layers = args['num_layers'],
  #             dropout_=args['dropout']).to(device)

  evaluator = Evaluator(name='ogbn-arxiv')

In [None]:
# Please do not change these args
# Training should take <10min using GPU runtime
import copy
if 'IS_GRADESCOPE_ENV' not in os.environ:
  # reset the parameters to initial random value
  model.reset_parameters()

  optimizer = torch.optim.Adam(model.parameters(), lr=args['lr'])
  loss_fn = F.nll_loss

  best_model = None
  best_valid_acc = 0

  print("Train idx: ", train_idx.shape)
  print("Split train: ", split_idx['train'].shape)
  print("Split test: ", split_idx['test'].shape)
  print("Split val: ", split_idx['valid'].shape)


  for epoch in range(1, 1 + args["epochs"]):
    loss = train(model, data, train_idx, optimizer, loss_fn)
    result = test(model, data, split_idx, evaluator)
    train_acc, valid_acc, test_acc = result
    if valid_acc > best_valid_acc:
        best_valid_acc = valid_acc
        best_model = copy.deepcopy(model)
    print(f'Epoch: {epoch:02d}, '
          f'Loss: {loss:.4f}, '
          f'Train: {100 * train_acc:.2f}%, '
          f'Valid: {100 * valid_acc:.2f}% '
          f'Test: {100 * test_acc:.2f}%')

Train idx:  torch.Size([90941])
Split train:  torch.Size([90941])
Split test:  torch.Size([48603])
Split val:  torch.Size([29799])



Epoch: 01, Loss: 4.0876, Train: 18.12%, Valid: 25.54% Test: 23.18%
Epoch: 02, Loss: 2.3840, Train: 21.53%, Valid: 20.66% Test: 26.15%
Epoch: 03, Loss: 2.0094, Train: 38.48%, Valid: 47.07% Test: 48.01%
Epoch: 04, Loss: 1.8520, Train: 43.14%, Valid: 44.09% Test: 43.40%
Epoch: 05, Loss: 1.6997, Train: 38.59%, Valid: 24.74% Test: 22.79%
Epoch: 06, Loss: 1.6251, Train: 38.71%, Valid: 22.98% Test: 19.21%
Epoch: 07, Loss: 1.5445, Train: 38.31%, Valid: 23.36% Test: 19.77%
Epoch: 08, Loss: 1.4798, Train: 38.52%, Valid: 24.25% Test: 20.98%
Epoch: 09, Loss: 1.4359, Train: 39.77%, Valid: 27.93% Test: 26.08%
Epoch: 10, Loss: 1.3995, Train: 41.63%, Valid: 30.66% Test: 31.79%
Epoch: 11, Loss: 1.3633, Train: 43.07%, Valid: 31.52% Test: 32.70%
Epoch: 12, Loss: 1.3287, Train: 44.17%, Valid: 32.32% Test: 33.18%
Epoch: 13, Loss: 1.3047, Train: 45.12%, Valid: 32.97% Test: 33.96%
Epoch: 14, Loss: 1.2866, Train: 46.38%, Valid: 34.88% Test: 36.09%
Epoch: 15, Loss: 1.2622, Train: 48.07%, Valid: 37.52% Test: 39

## Question 5: What are your `best_model` validation and test accuracies? (20 points)

Run the cell below to see the results of your best model and save your model's predictions to a file named *ogbn-arxiv_node.csv*. You can view this file by clicking on the *Folder* icon on the left side panel. Report the results on Gradescope.

In [None]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
  best_result = test(best_model, data, split_idx, evaluator, save_model_results=True)
  train_acc, valid_acc, test_acc = best_result
  print(f'Best model: '
        f'Train: {100 * train_acc:.2f}%, '
        f'Valid: {100 * valid_acc:.2f}% '
        f'Test: {100 * test_acc:.2f}%')

Best model: Train: 73.55%, Valid: 71.90% Test: 71.43%


# 4) GNN: Graph Property Prediction

In this section we will create a graph neural network for graph property prediction (graph classification).


In [None]:
from ogb.graphproppred import PygGraphPropPredDataset, Evaluator
from torch_geometric.loader import DataLoader  # torch_geometric.data.DataLoader is deprecated in PyG 2.x
from tqdm.notebook import tqdm

if 'IS_GRADESCOPE_ENV' not in os.environ:
  # Load the dataset
  dataset = PygGraphPropPredDataset(name='ogbg-molhiv')

  device = 'cuda' if torch.cuda.is_available() else 'cpu'
  print('Device: {}'.format(device))

  split_idx = dataset.get_idx_split()

  # Check task type
  print('Task type: {}'.format(dataset.task_type))

Downloading http://snap.stanford.edu/ogb/data/graphproppred/csv_mol_download/hiv.zip


Downloaded 0.00 GB: 100%|██████████| 3/3 [00:02<00:00,  1.24it/s]
Processing...


Extracting dataset/hiv.zip
Loading necessary files...
This might take a while.
Processing graphs...


100%|██████████| 41127/41127 [00:00<00:00, 77431.90it/s]


Converting graphs into PyG objects...


100%|██████████| 41127/41127 [00:01<00:00, 22985.84it/s]


Saving...
Device: cuda
Task type: binary classification


Done!


In [None]:
# Load the dataset splits into corresponding dataloaders
# We will train the graph classification task on a batch of 32 graphs
# Shuffle the order of graphs for training set
if 'IS_GRADESCOPE_ENV' not in os.environ:
  train_loader = DataLoader(dataset[split_idx["train"]], batch_size=32, shuffle=True, num_workers=0)
  valid_loader = DataLoader(dataset[split_idx["valid"]], batch_size=32, shuffle=False, num_workers=0)
  test_loader = DataLoader(dataset[split_idx["test"]], batch_size=32, shuffle=False, num_workers=0)



In [None]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
  # Please do not change the args
  args = {
      'device': device,
      'num_layers': 5,
      'hidden_dim': 256,
      'dropout': 0.5,
      'lr': 0.001,
      'epochs': 30,
  }
  args

### Graph Mini-Batching
Before diving into the actual model, we introduce the concept of mini-batching with graphs. In order to parallelize the processing of a mini-batch of graphs, PyG combines the graphs into a single disconnected graph data object (*torch_geometric.data.Batch*). *torch_geometric.data.Batch* inherits from *torch_geometric.data.Data* (introduced earlier) and contains an additional attribute called `batch`.

The `batch` attribute is a vector mapping each node to the index of its corresponding graph within the mini-batch:

    batch = [0, ..., 0, 1, ..., n - 2, n - 1, ..., n - 1]

This attribute is crucial for identifying which graph each node belongs to. It can be used, for example, to average the node embeddings of each graph individually in order to compute graph-level embeddings.
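
To make the role of `batch` concrete, here is a minimal plain-Python sketch of mean pooling driven by such a vector. It is not PyG's implementation: `mean_pool_sketch` and the toy embeddings are hypothetical, and the sketch assumes every graph index in the batch owns at least one node.

```python
def mean_pool_sketch(node_embeddings, batch):
    """Average node embeddings per graph, using batch[i] as node i's graph index."""
    num_graphs = max(batch) + 1
    dim = len(node_embeddings[0])
    sums = [[0.0] * dim for _ in range(num_graphs)]
    counts = [0] * num_graphs
    for emb, graph_id in zip(node_embeddings, batch):
        counts[graph_id] += 1
        for k, value in enumerate(emb):
            sums[graph_id][k] += value
    return [[s / c for s in row] for row, c in zip(sums, counts)]

# Hypothetical mini-batch: nodes 0 and 1 belong to graph 0, node 2 to graph 1.
node_embeddings = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
batch = [0, 0, 1]
pooled = mean_pool_sketch(node_embeddings, batch)
print(pooled)  # [[2.0, 3.0], [10.0, 20.0]]
```

The output has one row per graph rather than one row per node, which is the shape a graph-level prediction head needs.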

### Implementation
Now, we have all of the tools to implement a GCN Graph Prediction model!

We will reuse the existing GCN model to generate `node_embeddings` and then use `Global Pooling` over the nodes to create graph-level embeddings that can be used to predict properties for each graph. Remember that the `batch` attribute will be essential for performing Global Pooling over our mini-batch of graphs.

In [None]:
from ogb.graphproppred.mol_encoder import AtomEncoder
from torch_geometric.nn import global_add_pool, global_mean_pool

### GCN to predict graph property
class GCN_Graph(torch.nn.Module):
    def __init__(self, hidden_dim, output_dim, num_layers, dropout):
        super(GCN_Graph, self).__init__()

        # Load encoders for Atoms in molecule graphs
        self.node_encoder = AtomEncoder(hidden_dim)
        print("Node encoder: ", self.node_encoder)

        # Node embedding model
        # Note that the input_dim and output_dim are set to hidden_dim
        self.gnn_node = GCN(input_dim=hidden_dim, hidden_dim=hidden_dim,
                            output_dim=hidden_dim, num_layers = num_layers,
                            dropout_ = dropout, return_embeds=True) # It returns embeddings not softmax (node prediction part)

        self.pool = None

        ############# Your code here ############
        ## Note:
        ## 1. Initialize self.pool as a global mean pooling layer
        ## For more information please refer to the documentation:
        ## https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#global-pooling-layers
        self.pool = global_mean_pool

        #########################################

        # Output layer
        self.linear = torch.nn.Linear(hidden_dim, output_dim)


    def reset_parameters(self):
      self.gnn_node.reset_parameters()
      self.linear.reset_parameters()

    def forward(self, batched_data):
        # TODO: Implement a function that takes as input a
        # mini-batch of graphs (torch_geometric.data.Batch) and
        # returns the predicted graph property for each graph.
        #
        # NOTE: Since we are predicting graph level properties,
        # your output will be a tensor with dimension equaling
        # the number of graphs in the mini-batch


        # Extract important attributes of our mini-batch
        # print("Batched data type: ", type(batched_data))
        # Batched data type:  <class 'torch_geometric.data.batch.DataBatch'>
        # print("Batched data: ", batched_data)
        # Batched data:  DataBatch(edge_index=[2, 1572], edge_attr=[1572, 3], x=[744, 9], y=[32, 1], num_nodes=744, batch=[744], ptr=[33])
        x, edge_index, batch = batched_data.x, batched_data.edge_index, batched_data.batch
        embed = self.node_encoder(x)

        out = None

        ############# Your code here ############
        ## Note:
        ## 1. Construct node embeddings using existing GCN model
        ## 2. Use the global pooling layer to aggregate features for each individual graph
        ## For more information please refer to the documentation:
        ## https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#global-pooling-layers
        ## 3. Use a linear layer to predict each graph's property
        ## (~3 lines of code)
        node_embeddings = self.gnn_node(embed, edge_index) # Just like model(x, adj_t) in the previous GCN part
        # print("Node embedding shape: ", node_embeddings.shape)
        # Node embedding shape:  torch.Size([744, 256])
        graph_embedding = self.pool(node_embeddings, batch)
        # print("Graph embedding shape: ", graph_embedding.shape)
        # Graph embedding shape:  torch.Size([32, 256])
        out = self.linear(graph_embedding)
        # print("Out shape: ", out.shape)
        # Out shape:  torch.Size([32, 1])
        #########################################

        return out

In [None]:
def train(model, device, data_loader, optimizer, loss_fn):
    # TODO: Implement a function that trains your model by
    # using the given optimizer and loss_fn.
    model.train()
    loss = 0

    for step, batch in enumerate(data_loader):
      batch = batch.to(device)

      if batch.x.shape[0] == 1 or batch.batch[-1] == 0:
          pass
      else:
        ## ignore nan targets (unlabeled) when computing training loss.
        is_labeled = batch.y == batch.y

        ############# Your code here ############
        ## Note:
        ## 1. Zero grad the optimizer
        ## 2. Feed the data into the model
        ## 3. Use `is_labeled` mask to filter output and labels
        ## 4. You may need to change the type of label to torch.float32
        ## 5. Feed the output and label to the loss_fn
        ## (~3 lines of code)
        optimizer.zero_grad()
        output = model(batch)
        labels = batch.y[is_labeled].type(torch.float32)
        output = output[is_labeled]
        loss = loss_fn(input = output, target = labels)
        #########################################

        loss.backward()
        optimizer.step()

    return loss.item()
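
Two details of this training step are easy to miss: `batch.y == batch.y` is `False` exactly where a label is NaN (NaN never equals itself), and the loss passed in later, `torch.nn.BCEWithLogitsLoss()`, works on raw logits rather than probabilities. The plain-Python sketch below combines both ideas using a standard numerically stable form of binary cross-entropy. The function name and data are hypothetical, and this is not PyTorch's code.

```python
import math

def masked_bce_with_logits_sketch(logits, labels):
    """Mean binary cross-entropy over the entries whose label is not NaN."""
    # NaN is the only float that compares unequal to itself, so y == y filters it out.
    labeled = [(z, y) for z, y in zip(logits, labels) if y == y]
    # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]:
    #   max(z, 0) - z*y + log(1 + exp(-|z|))
    losses = [max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z))) for z, y in labeled]
    return sum(losses) / len(losses)

# Hypothetical logits for three graphs; the second graph is unlabeled.
logits = [0.0, 2.0, -3.0]
labels = [1.0, math.nan, 0.0]
masked_loss = masked_bce_with_logits_sketch(logits, labels)
print(masked_loss)
```

Only the first and third entries contribute, so the unlabeled graph neither raises nor lowers the loss.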

In [None]:
# The evaluation function
def eval(model, device, loader, evaluator, save_model_results=False, save_file=None):
    model.eval()
    y_true = []
    y_pred = []

    for step, batch in enumerate(loader):
        batch = batch.to(device)

        if batch.x.shape[0] == 1:
            pass
        else:
            with torch.no_grad():
                pred = model(batch)

            y_true.append(batch.y.view(pred.shape).detach().cpu())
            y_pred.append(pred.detach().cpu())

    y_true = torch.cat(y_true, dim = 0).numpy()
    y_pred = torch.cat(y_pred, dim = 0).numpy()

    input_dict = {"y_true": y_true, "y_pred": y_pred}

    if save_model_results:
        print ("Saving Model Predictions")

        # Create a pandas dataframe with two columns
        # y_pred | y_true
        data = {}
        data['y_pred'] = y_pred.reshape(-1)
        data['y_true'] = y_true.reshape(-1)

        df = pd.DataFrame(data=data)
        # Save to csv
        df.to_csv('ogbg-molhiv_graph_' + save_file + '.csv', sep=',', index=False)

    return evaluator.eval(input_dict)
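
For `ogbg-molhiv` the reported metric is ROC-AUC, which can be read as the probability that a randomly chosen positive graph receives a higher score than a randomly chosen negative one. The sketch below computes it by brute-force pairwise comparison in plain Python. `roc_auc_sketch` and the toy data are hypothetical, and the evaluator itself most likely relies on a more efficient routine.

```python
def roc_auc_sketch(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels and raw model scores for four graphs.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc_sketch(y_true, y_score)
print(auc)  # 0.75
```

Because only the ranking of scores matters, the raw logits from the linear layer can be passed to the evaluator without applying a sigmoid first.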

In [None]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
  model = GCN_Graph(args['hidden_dim'],
              dataset.num_tasks, args['num_layers'],
              args['dropout']).to(device)
  evaluator = Evaluator(name='ogbg-molhiv')

Node encoder:  AtomEncoder(
  (atom_embedding_list): ModuleList(
    (0): Embedding(119, 256)
    (1): Embedding(5, 256)
    (2-3): 2 x Embedding(12, 256)
    (4): Embedding(10, 256)
    (5-6): 2 x Embedding(6, 256)
    (7-8): 2 x Embedding(2, 256)
  )
)


In [None]:
# Please do not change these args
# Training should take <10min using GPU runtime
import copy

if 'IS_GRADESCOPE_ENV' not in os.environ:
  model.reset_parameters()

  optimizer = torch.optim.Adam(model.parameters(), lr=args['lr'])
  loss_fn = torch.nn.BCEWithLogitsLoss()

  best_model = None
  best_valid_acc = 0

  for epoch in range(1, 1 + args["epochs"]):
    # print('Training...')
    loss = train(model, device, train_loader, optimizer, loss_fn)

    # print('Evaluating...')
    train_result = eval(model, device, train_loader, evaluator)
    val_result = eval(model, device, valid_loader, evaluator)
    test_result = eval(model, device, test_loader, evaluator)

    train_acc, valid_acc, test_acc = train_result[dataset.eval_metric], val_result[dataset.eval_metric], test_result[dataset.eval_metric]
    if valid_acc > best_valid_acc:
        best_valid_acc = valid_acc
        best_model = copy.deepcopy(model)
    print(f'Epoch: {epoch:02d}, '
          f'Loss: {loss:.4f}, '
          f'Train: {100 * train_acc:.2f}%, '
          f'Valid: {100 * valid_acc:.2f}% '
          f'Test: {100 * test_acc:.2f}%')

Epoch: 01, Loss: 0.0561, Train: 72.74%, Valid: 73.11% Test: 70.68%
Epoch: 02, Loss: 0.0373, Train: 75.08%, Valid: 74.11% Test: 72.23%
Epoch: 03, Loss: 0.0225, Train: 75.70%, Valid: 69.14% Test: 68.75%
Epoch: 04, Loss: 0.4226, Train: 76.06%, Valid: 74.48% Test: 73.28%
Epoch: 05, Loss: 0.0541, Train: 75.57%, Valid: 73.24% Test: 72.00%
Epoch: 06, Loss: 0.0296, Train: 77.64%, Valid: 76.40% Test: 69.34%
Epoch: 07, Loss: 0.0279, Train: 79.13%, Valid: 72.81% Test: 69.83%
Epoch: 08, Loss: 0.0622, Train: 78.27%, Valid: 73.95% Test: 72.96%
Epoch: 09, Loss: 0.0179, Train: 78.69%, Valid: 76.75% Test: 73.78%
Epoch: 10, Loss: 0.7289, Train: 80.34%, Valid: 74.73% Test: 70.70%
Epoch: 11, Loss: 0.0398, Train: 79.50%, Valid: 76.46% Test: 70.88%
Epoch: 12, Loss: 0.1703, Train: 78.90%, Valid: 77.48% Test: 73.62%
Epoch: 13, Loss: 0.6975, Train: 80.62%, Valid: 75.57% Test: 73.86%
Epoch: 14, Loss: 0.0231, Train: 80.36%, Valid: 77.77% Test: 72.49%
Epoch: 15, Loss: 0.0468, Train: 80.91%, Valid: 77.91% Test: 73

## Question 6: What are your `best_model` validation and test ROC-AUC scores? (20 points)

Run the cell below to see the results of your best model and save your model's predictions in files named *ogbg-molhiv_graph_[valid,test].csv*. Again, you can view the files by clicking on the *Folder* icon on the left side panel. Report the results on Gradescope.

In [None]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
  train_auroc = eval(best_model, device, train_loader, evaluator)[dataset.eval_metric]
  valid_auroc = eval(best_model, device, valid_loader, evaluator)[dataset.eval_metric]
  test_auroc  = eval(best_model, device, test_loader, evaluator)[dataset.eval_metric]

  print(f'Best model: '
      f'Train: {100 * train_auroc:.2f}%, '
      f'Valid: {100 * valid_auroc:.2f}% '
      f'Test: {100 * test_auroc:.2f}%')

Best model: Train: 83.80%, Valid: 79.32% Test: 75.71%
