# **CS485 & CS584 - Homework 3**


In this Colab, we will work to construct our own graph neural network (GNN) using PyTorch Geometric (PyG) and then apply that model on two [Open Graph Benchmark (OGB)](https://ogb.stanford.edu/) datasets. These two datasets will be used to benchmark your model's performance on two different graph-based tasks: 1) node property prediction, predicting properties of single nodes and 2) graph property prediction, predicting properties of entire graphs or subgraphs.

First, we'll review how PyTorch Geometric stores graphs as tensors. Since OGB offers benchmark datasets and evaluation tools for graph learning, complementing PyG, we'll load and inspect an OGB dataset using the `ogb` package, which provides both data loaders and evaluators for large-scale, diverse graph benchmarks.

Then, we will build our own graph neural network using PyTorch Geometric. We will then train and evaluate our model on the OGB node property prediction and graph property prediction tasks.

Now let's get started! This Colab should take 1-2 hours to complete.

**Note**: Make sure to **restart and run all** before submission, so that the intermediate variables / packages will carry over to the next cell.

# Device
You might need to use a GPU for this Colab to run quickly.

Please click `Runtime` and then `Change runtime type`. Then set the `hardware accelerator` to **GPU**. You can switch to a `T4 GPU` instance to access a free T4 GPU with 16 GB of memory.

# Setup
Let's start by downloading the required packages.

In [1]:
import torch
import os
print("PyTorch has version {}".format(torch.__version__))

PyTorch has version 2.1.2


Download the necessary packages for PyG. Make sure that your version of torch matches the output from the cell above. In case of any issues, more information can be found on the [PyG's installation page](https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html).

In [2]:
torch_version = str(torch.__version__)
scatter_src = f"https://pytorch-geometric.com/whl/torch-{torch_version}.html"
sparse_src = f"https://pytorch-geometric.com/whl/torch-{torch_version}.html"
!pip install torch-scatter -f $scatter_src
!pip install torch-sparse -f $sparse_src
!pip install torch-geometric
!pip install ogb

Looking in links: https://pytorch-geometric.com/whl/torch-2.1.2.html

Check if we can use GPU. If you use GPU, the device should be `cuda`.

In [3]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Device: {}'.format(device))

Device: cpu


# 1 Load Dataset: PyTorch Geometric & Open Graph Benchmark (OGB)


Recall that PyTorch Geometric has two modules for storing and/or transforming graphs into tensor format. One is [`torch_geometric.datasets`](https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html), which contains a variety of common graph datasets. The other is [`torch_geometric.data`](https://pytorch-geometric.readthedocs.io/en/latest/modules/data.html), which handles graphs as PyTorch tensors.

Another package we'll use is **OGB**, which provides **standardized**, **large-scale**, and **challenging benchmark datasets** with consistent evaluation protocols. It offers a variety of datasets tailored for different graph tasks, including [Node Classification](https://ogb.stanford.edu/docs/nodeprop/), [Link Prediction](https://ogb.stanford.edu/docs/linkprop/), and [Graph Classification](https://ogb.stanford.edu/docs/graphprop/). Additionally, OGB features leaderboards for each task, such as the [Node Classification Leaderboard](https://ogb.stanford.edu/docs/leader_nodeprop/), allowing for easy comparison of model performance.

Before diving into graph deep learning, we will first learn how to use these datasets.

## PyG Datasets

The `torch_geometric.datasets` module has many common graph datasets. Here we will explore its usage through one example dataset.

In [4]:
from torch_geometric.datasets import TUDataset

root = './enzymes'
name = 'ENZYMES'

# The ENZYMES dataset
pyg_dataset= TUDataset(root, name)
# You will find that there are 600 graphs in this dataset,
# which means this pyg_dataset is a dataset for graph classification
print(pyg_dataset)

# Let us check the number of classes and number of features in the ENZYMES dataset
num_classes = pyg_dataset.num_classes
num_features = pyg_dataset.num_features
print("{} dataset has {} classes".format(name, num_classes))
print("{} dataset has {} features".format(name, num_features))

Downloading https://www.chrsmrrs.com/graphkerneldatasets/ENZYMES.zip
Processing...


ENZYMES(600)
ENZYMES dataset has 6 classes
ENZYMES dataset has 3 features


Done!


## PyG Data

Each PyG dataset stores a list of `torch_geometric.data.Data` objects, where each `Data` object represents one graph. We can get a `Data` object by indexing into the dataset. Let's examine the details of a graph from a PyG dataset.

For more information such as what is stored in the `Data` object, please refer to the [documentation](https://pytorch-geometric.readthedocs.io/en/latest/modules/data.html#torch_geometric.data.Data).
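As a rough illustration of how a graph turns into tensors, the sketch below builds the two-row `edge_index` layout with plain Python lists (no PyG needed). The toy graph and the helper name `to_edge_index` are assumptions made for illustration. For undirected graphs, each edge is typically stored once per direction, so the edge count reported for a graph counts both directions.

```python
def to_edge_index(undirected_edges):
    """Return a 2 x (2 * |E|) edge list: row 0 = sources, row 1 = targets."""
    src, dst = [], []
    for u, v in undirected_edges:
        # Store each undirected edge in both directions.
        src += [u, v]
        dst += [v, u]
    return [src, dst]


# Hypothetical toy graph: a triangle 0-1-2 plus a pendant node 3.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
edge_index = to_edge_index(edges)
num_edges = len(edge_index[0])  # number of directed entries

print(edge_index)
print(num_edges)
```

In a real `Data` object the same layout appears as a tensor of shape `[2, num_edges]`, for example `edge_index=[2, 176]` in the output of the next cell.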

In [5]:
from torch_geometric.datasets import TUDataset

root = './enzymes'
name = 'ENZYMES'
pyg_dataset= TUDataset(root, name)

print(pyg_dataset[100])
# Check the label of the graph with index 100 in the ENZYMES dataset
label = pyg_dataset[100].y.item()
# Check the number of edges in the graph with index 100
num_edges = pyg_dataset[100].num_edges

print(f'Graph with index 100 has label {label}.')
print(f'Graph with index 100 has {num_edges} edges.')

Data(edge_index=[2, 176], x=[45, 3], y=[1])
Graph with index 100 has label 4.
Graph with index 100 has 176 edges.


## OGB Datasets

The Open Graph Benchmark (OGB) is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. Its datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can then be evaluated by using the OGB Evaluator in a unified manner.
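To make "evaluated in a unified manner" more concrete: for a classification benchmark scored by accuracy, the evaluator essentially compares predicted labels with true labels. The sketch below is a plain-Python stand-in. The helper `accuracy` is hypothetical and is not the OGB `Evaluator` API, and the labels are made up.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for five nodes.
y_true = [3, 0, 2, 2, 1]
y_pred = [3, 1, 2, 2, 0]
acc = accuracy(y_true, y_pred)
print(f"acc = {acc:.2f}")
```

Using one shared evaluator per dataset means every submission is scored with the same metric on the same split.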

OGB also supports PyG dataset and data classes. Here we take a look at the `ogbn-arxiv` dataset.

In [6]:
import torch_geometric.transforms as T
from ogb.nodeproppred import PygNodePropPredDataset

dataset_name = 'ogbn-arxiv'
# Load the dataset and transform it to sparse tensor
dataset = PygNodePropPredDataset(name=dataset_name, transform=T.ToSparseTensor())
# You will find that there is only 1 graph in this dataset,
# which means this dataset is for node classification
print(f'The {dataset_name} dataset has {len(dataset)} graph')

# Extract the graph
data = dataset[0]
# Check the number of features
num_features = data.num_features
print(data)
print(f'The graph has {num_features} features')

Downloading http://snap.stanford.edu/ogb/data/nodeproppred/arxiv.zip


Downloaded 0.08 GB: 100%|██████████| 81/81 [00:33<00:00,  2.40it/s]
Processing...


Extracting dataset/arxiv.zip
Loading necessary files...
This might take a while.
Processing graphs...


100%|██████████| 1/1 [00:00<00:00, 16384.00it/s]


Converting graphs into PyG objects...


100%|██████████| 1/1 [00:00<00:00, 888.62it/s]

Saving...



Done!


The ogbn-arxiv dataset has 1 graph
Data(num_nodes=169343, x=[169343, 128], node_year=[169343, 1], y=[169343, 1], adj_t=[169343, 169343, nnz=1166243])
The graph has 128 features


# 2 GNN: Node Property Prediction

In this section we will build our first graph neural network using PyTorch Geometric. Then we will apply it to the task of node property prediction (node classification).

**Node Property Prediction** is a fundamental task in graph machine learning, where the goal is to predict specific properties or labels associated with the nodes in a graph. Formally, given a graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges, the objective is to learn a function $f: V \rightarrow \mathcal{Y}$ that maps each node $v \in V$ to a target label $y_v \in \mathcal{Y}$.

- Let $G = (V, E, X)$ be a graph where:  
  - $V = \{v_1, v_2, \dots, v_n\}$ is the set of **nodes**,  
  - $E \subseteq V \times V$ is the set of **edges**,  
  - $X \in \mathbb{R}^{n \times d}$ represents the **feature matrix**, where $x_i \in \mathbb{R}^d$ is the feature vector for node $v_i$.

- The task is to predict node labels $Y = \{y_1, y_2, \dots, y_n\}$, where $y_i \in \mathcal{Y}$ is the label or property associated with node $v_i$.

#### **Common Applications:**
- **Social Networks:** Predicting user attributes like age, interests, or political affiliation.  
- **Biological Networks:** Classifying proteins or genes in protein-protein interaction networks.  
- **Citation Networks:** Predicting research topics or categories of academic papers.


Specifically, we will use GCN as the foundation for our graph neural network ([Kipf and Welling (2017)](https://arxiv.org/pdf/1609.02907.pdf)). To do so, we will work with PyG's built-in `GCNConv` layer.
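For intuition, the GCN propagation rule can be written as $H' = \hat{D}^{-1/2}(A + I)\hat{D}^{-1/2} H W$, where $\hat{D}$ is the degree matrix of $A + I$. The sketch below implements only the normalized aggregation part in plain Python, leaving out the learnable weight matrix $W$ and the bias that `GCNConv` applies. The function name `gcn_aggregate` and the toy graph are assumptions for illustration.

```python
import math


def gcn_aggregate(adj, feats):
    """One GCN aggregation step without weights: D^-1/2 (A + I) D^-1/2 H."""
    n = len(adj)
    # Add self-loops: A_hat = A + I
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degrees including self-loops
    out = []
    for i in range(n):
        row = [0.0] * len(feats[0])
        for j in range(n):
            if a_hat[i][j]:
                # Symmetric normalization of the edge (i, j)
                w = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
                for k, value in enumerate(feats[j]):
                    row[k] += w * value
        out.append(row)
    return out


# Hypothetical path graph 0 - 1 - 2 with one scalar feature per node.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
feats = [[1.0], [2.0], [3.0]]
h = gcn_aggregate(adj, feats)
print(h)
```

Each node's new feature is a degree-weighted average of its own feature and those of its neighbors. Stacking several such layers lets information travel several hops.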

## Setup

In [7]:
import torch
import pandas as pd
import torch.nn.functional as F
import torch_geometric.transforms as T
print(torch.__version__)

2.1.2


## Load and Preprocess the Dataset

In [8]:
from ogb.nodeproppred import PygNodePropPredDataset, Evaluator

dataset_name = 'ogbn-arxiv'
dataset = PygNodePropPredDataset(name=dataset_name, transform=T.ToSparseTensor())
data = dataset[0]
# Make the adjacency matrix symmetric (undirected graph)
data.adj_t = data.adj_t.to_symmetric()
device = 'cuda' if torch.cuda.is_available() else 'cpu'

data = data.to(device)
split_idx = dataset.get_idx_split()
train_idx = split_idx['train'].to(device)

## GCN Model

Please follow the figure below to implement the `forward` function.


![test](https://drive.google.com/uc?id=128AuYAXNXGg7PIhJJ7e420DoPWKb-RtL)

- **GCNConv:** The `GCNConv` layer performs **graph convolution**, aggregating feature information from neighboring nodes to capture the local structure of the graph and update node embeddings accordingly. For more information please refer to the [GCN documentation](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.conv.GCNConv.html#torch_geometric.nn.conv.GCNConv). PyG provides various other GNN layers, such as `SAGEConv`, `GATConv`, and `GINConv`, as detailed in the [PyG GNN layers documentation](https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#convolutional-layers).

- **Batch Normalization (BN):** **Batch Normalization** normalizes the inputs of each layer to have a consistent distribution, which **accelerates training** and improves **model stability** by reducing internal covariate shift. For more information please refer to the [BN documentation](https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm1d.html).

- **ReLU:** The **ReLU (Rectified Linear Unit)** activation function introduces **non-linearity** into the model, allowing it to learn complex patterns and relationships in the graph data.

- **Dropout:** **Dropout** is a regularization technique that randomly **deactivates neurons** during training to prevent **overfitting**, enhancing the model's generalization ability on unseen data. For more information about ReLU and Dropout, please refer to the [`torch.nn.functional` documentation](https://pytorch.org/docs/stable/nn.functional.html).

- **LogSoftmax (in Final Projection Layer):** The **LogSoftmax** function converts the final output scores into **log-probabilities**, making it suitable for **multi-class classification** tasks and ensuring numerical stability when combined with negative log-likelihood loss.
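To see why LogSoftmax pairs naturally with negative log-likelihood loss, the following plain-Python sketch computes both for one node. The helper names and the scores are made up for illustration. Subtracting the maximum score before exponentiating is the usual trick for numerical stability.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def nll_loss(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]


scores = [2.0, 1.0, 0.1]  # hypothetical raw outputs for 3 classes
log_probs = log_softmax(scores)
probs = [math.exp(lp) for lp in log_probs]
loss = nll_loss(log_probs, target=0)

print(log_probs)
print(loss)
```

Applying NLL to log-probabilities is equivalent to cross-entropy on the raw scores, which is why the model can end in LogSoftmax and train with `F.nll_loss`.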

Now let us implement our GCN model!

In [9]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
import torch_geometric.transforms as T


class GCN(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, num_layers,
                 dropout, return_embeds=False):
        # TODO: Implement a function that initializes self.convs,
        # self.bns, and self.softmax.

        super(GCN, self).__init__()

        # A list of GCNConv layers
        self.convs = nn.ModuleList()
        self.convs.append(GCNConv(input_dim, hidden_dim))  # First layer
        for _ in range(num_layers - 2):  # Hidden layers
            self.convs.append(GCNConv(hidden_dim, hidden_dim))
        self.convs.append(GCNConv(hidden_dim, output_dim))  # Output layer

        # A list of 1D batch normalization layers
        self.bns = nn.ModuleList([nn.BatchNorm1d(hidden_dim) for _ in range(num_layers - 1)])

        # The log softmax layer
        self.softmax = nn.LogSoftmax(dim=1)

        # Probability of an element getting zeroed
        self.dropout = dropout

        # Skip classification layer and return node embeddings
        self.return_embeds = return_embeds

    def reset_parameters(self):
        for conv in self.convs:
            conv.reset_parameters()
        for bn in self.bns:
            bn.reset_parameters()

    def forward(self, x, adj_t):
        for i, conv in enumerate(self.convs):
            x = conv(x, adj_t)  # Apply GCN layer

            # Apply BatchNorm, ReLU, and Dropout for all but the last layer
            if i < len(self.convs) - 1:
                x = self.bns[i](x)  # Batch normalization
                x = F.relu(x)  # Activation
                x = F.dropout(x, p=self.dropout, training=self.training)  # Dropout

        # If return_embeds is True, return embeddings before softmax
        if self.return_embeds:
            return x

        return self.softmax(x)  # Apply LogSoftmax for classification

In [10]:
def train(model, data, train_idx, optimizer, loss_fn):
    model.train()  # Set the model to training mode
    optimizer.zero_grad()  # Zero gradients

    out = model(data.x, data.adj_t)  # Forward pass

    # Ensure y[train_idx] is a 1D tensor of class labels
    target = data.y[train_idx].view(-1)  # Flatten to ensure it's 1D

    loss = loss_fn(out[train_idx], target)  # Compute loss on training indices

    loss.backward()  # Backpropagation
    optimizer.step()  # Update model parameters

    return loss.item()  # Return loss value

In [11]:
# Test function here
@torch.no_grad()
def test(model, data, split_idx, evaluator, save_model_results=False):
    model.eval()  # Set model to evaluation mode

    out = model(data.x, data.adj_t)  # Forward pass on all data

    y_pred = out.argmax(dim=-1, keepdim=True)  # Get predicted class labels

    train_acc = evaluator.eval({
        'y_true': data.y[split_idx['train']],
        'y_pred': y_pred[split_idx['train']],
    })['acc']
    valid_acc = evaluator.eval({
        'y_true': data.y[split_idx['valid']],
        'y_pred': y_pred[split_idx['valid']],
    })['acc']
    test_acc = evaluator.eval({
        'y_true': data.y[split_idx['test']],
        'y_pred': y_pred[split_idx['test']],
    })['acc']

    if save_model_results:
        print("Saving Model Predictions")

        data_dict = {'y_pred': y_pred.view(-1).cpu().detach().numpy()}
        df = pd.DataFrame(data=data_dict)

        # Save locally as CSV
        df.to_csv('ogbn-arxiv_node.csv', sep=',', index=False)

    return train_acc, valid_acc, test_acc

In [12]:
# Please DO NOT change the args
args = {
    'device': device,
    'num_layers': 3,
    'hidden_dim': 256,
    'dropout': 0.5,
    'lr': 0.01,
    'epochs': 100,
}
args

{'device': 'cpu',
 'num_layers': 3,
 'hidden_dim': 256,
 'dropout': 0.5,
 'lr': 0.01,
 'epochs': 100}

In [13]:
model = GCN(data.num_features, args['hidden_dim'],
            dataset.num_classes, args['num_layers'],
            args['dropout']).to(device)
evaluator = Evaluator(name='ogbn-arxiv')

In [14]:
# Please do not change these args
# Training should take <10min using GPU runtime
# reset the parameters to initial random value
import copy

model.reset_parameters()

optimizer = torch.optim.Adam(model.parameters(), lr=args['lr'])
loss_fn = F.nll_loss

best_model = None
best_valid_acc = 0

for epoch in range(1, 1 + args["epochs"]):
  loss = train(model, data, train_idx, optimizer, loss_fn)
  result = test(model, data, split_idx, evaluator)
  train_acc, valid_acc, test_acc = result
  if valid_acc > best_valid_acc:
      best_valid_acc = valid_acc
      best_model = copy.deepcopy(model)
  print(f'Epoch: {epoch:02d}, '
        f'Loss: {loss:.4f}, '
        f'Train: {100 * train_acc:.2f}%, '
        f'Valid: {100 * valid_acc:.2f}% '
        f'Test: {100 * test_acc:.2f}%')



Epoch: 01, Loss: 4.0272, Train: 19.25%, Valid: 25.70% Test: 23.22%
Epoch: 02, Loss: 2.3456, Train: 24.27%, Valid: 21.86% Test: 27.20%
Epoch: 03, Loss: 1.9408, Train: 27.97%, Valid: 24.26% Test: 29.34%
Epoch: 04, Loss: 1.7620, Train: 39.10%, Valid: 40.51% Test: 44.83%
Epoch: 05, Loss: 1.6719, Train: 43.70%, Valid: 44.55% Test: 45.69%
Epoch: 06, Loss: 1.5894, Train: 41.74%, Valid: 39.08% Test: 41.96%
Epoch: 07, Loss: 1.5129, Train: 41.31%, Valid: 33.13% Test: 35.78%
Epoch: 08, Loss: 1.4649, Train: 41.66%, Valid: 33.02% Test: 35.95%
Epoch: 09, Loss: 1.4261, Train: 41.44%, Valid: 33.27% Test: 36.83%
Epoch: 10, Loss: 1.3950, Train: 41.36%, Valid: 34.17% Test: 38.63%
Epoch: 11, Loss: 1.3504, Train: 41.75%, Valid: 34.85% Test: 39.74%
Epoch: 12, Loss: 1.3300, Train: 42.67%, Valid: 35.79% Test: 41.04%
Epoch: 13, Loss: 1.3019, Train: 43.39%, Valid: 37.09% Test: 42.48%
Epoch: 14, Loss: 1.2776, Train: 44.44%, Valid: 39.71% Test: 44.01%
Epoch: 15, Loss: 1.2542, Train: 45.79%, Valid: 42.00% Test: 44

## Question 1: What are your `best_model` validation and test accuracies? (25 points)

Run the cell below to see the results of your best model and save your model's predictions to a file named *ogbn-arxiv_node.csv*. You can view this file by clicking on the *Folder* icon on the left side panel.

**Note**: Make sure you have implemented the above **GCN model** before running this test code.

In [15]:
best_result = test(best_model, data, split_idx, evaluator, save_model_results=True)
train_acc, valid_acc, test_acc = best_result
print(f'Best model: '
      f'Train: {100 * train_acc:.2f}%, '
      f'Valid: {100 * valid_acc:.2f}% '
      f'Test: {100 * test_acc:.2f}%')

Saving Model Predictions
Best model: Train: 73.43%, Valid: 71.73% Test: 70.86%


# 3 GNN: Graph Property Prediction

In this section we will create a graph neural network for graph property prediction (graph classification).

**Graph Property Prediction** is a key task in graph machine learning, where the goal is to predict properties or labels associated with **entire graphs** rather than individual nodes or edges. This task is common in applications such as molecular property prediction, social network analysis, and program verification.


- Let $\mathcal{G} = \{G_1, G_2, \dots, G_N\}$ represent a collection of graphs, where each graph $G_i = (V_i, E_i, X_i)$ consists of:
  - $V_i$ : the set of nodes,
  - $E_i$ : the set of edges,
  - $X_i \in \mathbb{R}^{|V_i| \times d}$ : the node feature matrix, where each node has $d$-dimensional features.

- The objective is to learn a function $f_\theta: \mathcal{G} \rightarrow \mathcal{Y}$ that maps each graph $G_i$ to a graph-level label $y_i \in \mathcal{Y}$.

#### **Common Applications:**
- **Molecular Property Prediction:** Predicting chemical properties or biological activities of molecules represented as graphs.
- **Social Network Analysis:** Classifying entire networks based on structure or user interactions.
- **Program Verification:** Determining properties of code represented as abstract syntax trees.

Now let us implement a model to address a graph classification task!



## Load and preprocess the dataset

In [16]:
from ogb.graphproppred import PygGraphPropPredDataset, Evaluator
from torch_geometric.loader import DataLoader  # torch_geometric.data.DataLoader is deprecated in PyG 2.x
from tqdm.notebook import tqdm

# Load the dataset
dataset = PygGraphPropPredDataset(name='ogbg-molhiv')

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Device: {}'.format(device))

split_idx = dataset.get_idx_split()

# Check task type
print(f'Task type: {dataset.task_type}')

Downloading http://snap.stanford.edu/ogb/data/graphproppred/csv_mol_download/hiv.zip


Downloaded 0.00 GB: 100%|██████████| 3/3 [00:01<00:00,  1.67it/s]
Processing...


Extracting dataset/hiv.zip
Loading necessary files...
This might take a while.
Processing graphs...


100%|██████████| 41127/41127 [00:00<00:00, 196866.71it/s]


Converting graphs into PyG objects...


100%|██████████| 41127/41127 [00:00<00:00, 41356.49it/s]


Saving...
Device: cpu
Task type: binary classification


Done!


In [17]:
# Load the dataset splits into corresponding dataloaders
# We will train the graph classification task on a batch of 32 graphs
# Shuffle the order of graphs for training set
train_loader = DataLoader(dataset[split_idx["train"]], batch_size=32, shuffle=True, num_workers=0)
valid_loader = DataLoader(dataset[split_idx["valid"]], batch_size=32, shuffle=False, num_workers=0)
test_loader = DataLoader(dataset[split_idx["test"]], batch_size=32, shuffle=False, num_workers=0)



In [18]:
# Please DO NOT change the args
args = {
    'device': device,
    'num_layers': 5,
    'hidden_dim': 256,
    'dropout': 0.5,
    'lr': 0.001,
    'epochs': 30,
}
args

{'device': 'cpu',
 'num_layers': 5,
 'hidden_dim': 256,
 'dropout': 0.5,
 'lr': 0.001,
 'epochs': 30}

## Graph Prediction Model

### Graph Mini-Batching
Before diving into the actual model, we introduce the concept of mini-batching with graphs. In order to parallelize the processing of a mini-batch of graphs, PyG combines the graphs into a single disconnected graph data object (*torch_geometric.data.Batch*). **[torch_geometric.data.Batch](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.data.Batch.html#torch_geometric.data.Batch)** inherits from *torch_geometric.data.Data* (introduced earlier) and contains an additional attribute called `batch`.

The `batch` attribute is a vector mapping each node to the index of its corresponding graph within the mini-batch:

    batch = [0, ..., 0, 1, ..., n - 2, n - 1, ..., n - 1]

This attribute is crucial for tracking which graph each node belongs to. It can be used, for example, to average the node embeddings of each graph individually to compute graph-level embeddings.
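The idea behind this batching scheme can be sketched in plain Python: node indices of later graphs are shifted so that they stay unique, and a `batch` vector records the graph each node came from. The function `collate` and the toy graphs are hypothetical. This is not PyG's implementation, only an illustration of the resulting layout.

```python
def collate(graphs):
    """Merge graphs into one disconnected graph.

    Each graph is a (num_nodes, edge_list) pair. Returns the shifted
    edge list and the batch vector mapping every node to its graph index.
    """
    edges, batch, offset = [], [], 0
    for graph_id, (num_nodes, edge_list) in enumerate(graphs):
        # Shift node indices so they do not collide with earlier graphs.
        edges += [(u + offset, v + offset) for u, v in edge_list]
        batch += [graph_id] * num_nodes
        offset += num_nodes
    return edges, batch


# Two toy graphs: a 3-node path and a single 2-node edge.
graphs = [(3, [(0, 1), (1, 2)]), (2, [(0, 1)])]
edges, batch = collate(graphs)

print(edges)
print(batch)
```

Because no edge connects nodes from different graphs, message passing over the merged graph never mixes information between the original graphs.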



### Implementation

We will reuse the existing GCN model to generate `node_embeddings` and then use `Global Pooling` over the nodes to create graph-level embeddings that can be used to predict properties of each graph. Remember that the `batch` attribute will be essential for performing Global Pooling over our mini-batch of graphs. For more information, please refer to the [Pooling Layer documentation](https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#global-pooling-layers).

In **node property prediction**, the model focuses on learning individual node representations and predicting properties for each node. However, in **graph property prediction**, the challenge lies in **aggregating node-level information** into a **single graph-level representation** that captures the global structure and features of the entire graph.

To achieve this, we use **global pooling techniques** such as:

- **[Global Mean Pooling](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.pool.global_mean_pool.html#torch_geometric.nn.pool.global_mean_pool):** Takes the average of all node embeddings in the graph.
- **[Global Max Pooling](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.pool.global_max_pool.html#torch_geometric.nn.pool.global_max_pool):** Selects the maximum value across all node embeddings.
- **[Global Add Pooling](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.pool.global_add_pool.html#torch_geometric.nn.pool.global_add_pool):** Sums all node embeddings to form a graph representation.

These pooling operations ensure that the model can **summarize the entire graph**, making it suitable for tasks where the **overall structure** or **interactions** between nodes are critical for prediction (e.g., predicting a molecule's solubility based on the entire molecular structure).
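The three pooling variants differ only in how they reduce the node embeddings of each graph. The plain-Python sketch below groups rows by the `batch` vector and reduces each group. The helper `global_pool` is hypothetical and only mirrors the behavior of the PyG functions conceptually, and it assumes every graph has at least one node.

```python
def global_pool(x, batch, mode="mean"):
    """Pool node embeddings `x` into one vector per graph using `batch`."""
    num_graphs = max(batch) + 1
    groups = [[] for _ in range(num_graphs)]
    for row, graph_id in zip(x, batch):
        groups[graph_id].append(row)

    pooled = []
    for rows in groups:
        cols = list(zip(*rows))  # transpose: one tuple per feature dimension
        if mode == "mean":
            pooled.append([sum(c) / len(c) for c in cols])
        elif mode == "max":
            pooled.append([max(c) for c in cols])
        elif mode == "add":
            pooled.append([sum(c) for c in cols])
        else:
            raise ValueError(f"unknown mode: {mode}")
    return pooled


# Five nodes with 2-dimensional embeddings: three in graph 0, two in graph 1.
x = [[1.0, 4.0], [3.0, 0.0], [5.0, 2.0], [2.0, 2.0], [4.0, 6.0]]
batch = [0, 0, 0, 1, 1]

mean_emb = global_pool(x, batch, "mean")
max_emb = global_pool(x, batch, "max")
add_emb = global_pool(x, batch, "add")
print(mean_emb, max_emb, add_emb)
```

Mean pooling is insensitive to graph size, while add pooling grows with the number of nodes, which can matter when graph size itself carries signal.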

In this experiment, we use [Global Mean Pooling](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.pool.global_mean_pool.html#torch_geometric.nn.pool.global_mean_pool). Now, we have all of the tools to implement a GCN Graph Prediction model!  


In [19]:
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
import torch_geometric.transforms as T

from ogb.graphproppred.mol_encoder import AtomEncoder
from torch_geometric.nn import global_add_pool, global_mean_pool

### GCN to predict graph property
class GCN_Graph(torch.nn.Module):
    def __init__(self, hidden_dim, output_dim, num_layers, dropout):
        super(GCN_Graph, self).__init__()

        # Load encoders for Atoms in molecule graphs
        self.node_encoder = AtomEncoder(hidden_dim)

        # Node embedding model we have implemented
        self.gnn_node = GCN(hidden_dim, hidden_dim, hidden_dim, num_layers, dropout, return_embeds=True)

        # Global mean pooling layer for graph-level aggregation
        self.pool = global_mean_pool

        # Output layer for prediction
        self.linear = torch.nn.Linear(hidden_dim, output_dim)

    def reset_parameters(self):
        self.gnn_node.reset_parameters()
        self.linear.reset_parameters()

    def forward(self, batched_data):
        # Extract important attributes of our mini-batch
        x, edge_index, batch = batched_data.x, batched_data.edge_index, batched_data.batch

        # Encode node features
        embed = self.node_encoder(x)

        # Obtain node embeddings from GCN using edge_index for the graph connectivity
        out = self.gnn_node(embed, edge_index)  # Use edge_index instead of adj_t

        # Aggregate node features using global mean pooling
        out = self.pool(out, batch)

        # Predict graph property using a linear layer
        out = self.linear(out)

        return out

In [20]:
def train(model, device, data_loader, optimizer, loss_fn):
    # Set the model to training mode
    model.train()
    total_loss = 0

    for step, batch in enumerate(tqdm(data_loader, desc="Iteration")):
        # Move batch data to the correct device
        batch = batch.to(device)

        # Skip batches that contain only 1 node or only a single graph
        if batch.x.shape[0] == 1 or batch.batch[-1] == 0:
            continue

        # Mask for labeled graphs (NaN != NaN, so this excludes NaN targets)
        is_labeled = batch.y == batch.y

        # Zero gradients for the optimizer
        optimizer.zero_grad()

        # Feed the batch data into the model
        output = model(batch)

        # Filter the output and labels using the is_labeled mask
        output_labeled = output[is_labeled]
        labels_labeled = batch.y[is_labeled].to(torch.float32)  # Ensure the label is float32

        # Compute the loss for labeled graphs only
        loss = loss_fn(output_labeled, labels_labeled)

        # Backpropagate the loss
        loss.backward()

        # Step the optimizer
        optimizer.step()

        # Track the total loss
        total_loss += loss.item()

    return total_loss / len(data_loader)  # Return average loss for the epoch
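
The `batch.y == batch.y` mask in the training function relies on the fact that NaN is the only floating-point value that does not equal itself. A minimal plain-Python sketch of the same trick, with made-up labels:

```python
nan = float("nan")
labels = [1.0, nan, 0.0, nan]  # hypothetical targets, NaN = missing label

# A value equals itself unless it is NaN, so this marks the labeled entries.
is_labeled = [y == y for y in labels]
kept = [y for y, keep in zip(labels, is_labeled) if keep]

print(is_labeled)
print(kept)
```

Filtering both the model output and the targets with this mask keeps missing labels out of the loss.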

In [21]:
# The evaluation function
def eval(model, device, loader, evaluator, save_model_results=False, save_file=None):
    model.eval()
    y_true = []
    y_pred = []

    for step, batch in enumerate(tqdm(loader, desc="Iteration")):
        batch = batch.to(device)

        if batch.x.shape[0] == 1:
            pass
        else:
            with torch.no_grad():
                pred = model(batch)

            y_true.append(batch.y.view(pred.shape).detach().cpu())
            y_pred.append(pred.detach().cpu())

    y_true = torch.cat(y_true, dim = 0).numpy()
    y_pred = torch.cat(y_pred, dim = 0).numpy()

    input_dict = {"y_true": y_true, "y_pred": y_pred}

    if save_model_results:
        print ("Saving Model Predictions")

        # Create a pandas dataframe with two columns
        # y_pred | y_true
        data = {}
        data['y_pred'] = y_pred.reshape(-1)
        data['y_true'] = y_true.reshape(-1)

        df = pd.DataFrame(data=data)
        # Save to csv
        df.to_csv('ogbg-molhiv_graph_' + save_file + '.csv', sep=',', index=False)

    return evaluator.eval(input_dict)

In [22]:
model = GCN_Graph(args['hidden_dim'],
            dataset.num_tasks, args['num_layers'],
            args['dropout']).to(device)
evaluator = Evaluator(name='ogbg-molhiv')

In [24]:
# Please do not change these args
# Training should take <10min using GPU runtime
from tqdm import tqdm
import copy

model.reset_parameters()

optimizer = torch.optim.Adam(model.parameters(), lr=args['lr'])
loss_fn = torch.nn.BCEWithLogitsLoss()

best_model = None
best_valid_acc = 0

for epoch in range(1, 1 + args["epochs"]):
  print('Training...')
  loss = train(model, device, train_loader, optimizer, loss_fn)

  print('Evaluating...')
  train_result = eval(model, device, train_loader, evaluator)
  val_result = eval(model, device, valid_loader, evaluator)
  test_result = eval(model, device, test_loader, evaluator)

  train_acc, valid_acc, test_acc = train_result[dataset.eval_metric], val_result[dataset.eval_metric], test_result[dataset.eval_metric]
  if valid_acc > best_valid_acc:
      best_valid_acc = valid_acc
      best_model = copy.deepcopy(model)
  print(f'Epoch: {epoch:02d}, '
        f'Loss: {loss:.4f}, '
        f'Train: {100 * train_acc:.2f}%, '
        f'Valid: {100 * valid_acc:.2f}% '
        f'Test: {100 * test_acc:.2f}%')

Training...
Evaluating...
Epoch: 01, Loss: 0.1595, Train: 71.33%, Valid: 70.87% Test: 71.29%
Training...
Evaluating...
Epoch: 02, Loss: 0.1489, Train: 73.17%, Valid: 74.65% Test: 67.84%
Training...
Evaluating...
Epoch: 03, Loss: 0.1449, Train: 74.03%, Valid: 75.65% Test: 68.65%
Training...
Evaluating...
Epoch: 04, Loss: 0.1433, Train: 76.75%, Valid: 75.47% Test: 68.53%
Training...
Evaluating...
Epoch: 05, Loss: 0.1410, Train: 76.02%, Valid: 77.62% Test: 72.74%
Training...
Evaluating...
Epoch: 06, Loss: 0.1397, Train: 77.35%, Valid: 77.96% Test: 70.88%
Training...
Evaluating...
Epoch: 07, Loss: 0.1384, Train: 78.85%, Valid: 79.13% Test: 72.61%
Training...
Evaluating...
Epoch: 08, Loss: 0.1370, Train: 78.27%, Valid: 78.61% Test: 72.46%
Training...
Evaluating...
Epoch: 09, Loss: 0.1354, Train: 78.40%, Valid: 75.84% Test: 71.62%
Training...
Evaluating...
Epoch: 10, Loss: 0.1347, Train: 79.48%, Valid: 77.15% Test: 71.91%
Training...
Evaluating...
Epoch: 11, Loss: 0.1336, Train: 79.63%, Valid: 78.08% Test: 70.85%
Training...
Evaluating...
Epoch: 12, Loss: 0.1334, Train: 80.35%, Valid: 74.93% Test: 73.94%
Training...
Evaluating...
Epoch: 13, Loss: 0.1324, Train: 80.84%, Valid: 77.02% Test: 70.52%
Training...
Evaluating...
Epoch: 14, Loss: 0.1318, Train: 80.64%, Valid: 78.27% Test: 72.41%
Training...
Evaluating...
Epoch: 15, Loss: 0.1309, Train: 81.39%, Valid: 77.93% Test: 72.92%
Training...
Evaluating...
Epoch: 16, Loss: 0.1317, Train: 80.49%, Valid: 79.55% Test: 71.70%
Training...




Evaluating...




Epoch: 17, Loss: 0.1298, Train: 81.20%, Valid: 78.01% Test: 70.11%
Training...




Evaluating...




Epoch: 18, Loss: 0.1296, Train: 82.42%, Valid: 78.61% Test: 74.38%
Training...




Evaluating...




Epoch: 19, Loss: 0.1286, Train: 81.77%, Valid: 78.38% Test: 71.98%
Training...




Evaluating...




Epoch: 20, Loss: 0.1282, Train: 81.69%, Valid: 79.11% Test: 70.86%
Training...




Evaluating...




Epoch: 21, Loss: 0.1286, Train: 82.49%, Valid: 78.39% Test: 72.63%
Training...




Evaluating...




Epoch: 22, Loss: 0.1282, Train: 82.47%, Valid: 77.61% Test: 73.90%
Training...




Evaluating...




Epoch: 23, Loss: 0.1267, Train: 82.79%, Valid: 79.61% Test: 74.41%
Training...




Evaluating...




Epoch: 24, Loss: 0.1264, Train: 83.56%, Valid: 79.26% Test: 73.09%
Training...




Evaluating...




Epoch: 25, Loss: 0.1261, Train: 83.08%, Valid: 77.71% Test: 73.91%
Training...




Evaluating...




Epoch: 26, Loss: 0.1261, Train: 83.16%, Valid: 77.86% Test: 72.38%
Training...




Evaluating...




Epoch: 27, Loss: 0.1256, Train: 83.96%, Valid: 78.82% Test: 73.34%
Training...




Evaluating...




Epoch: 28, Loss: 0.1251, Train: 83.69%, Valid: 80.06% Test: 73.35%
Training...




Evaluating...




Epoch: 29, Loss: 0.1227, Train: 82.30%, Valid: 75.52% Test: 73.41%
Training...




Evaluating...




Epoch: 30, Loss: 0.1236, Train: 82.35%, Valid: 76.10% Test: 72.21%
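
The per-epoch lines in the log above can be compared programmatically rather than by eye. The following is a minimal sketch, assuming the log format printed by the training loop; `LOG`, `PATTERN`, and `parse_epoch_line` are hypothetical names introduced only for this illustration, and only three lines from the log are copied in.

```python
import re

# Three lines copied verbatim from the training log above (illustration only).
LOG = """\
Epoch: 23, Loss: 0.1267, Train: 82.79%, Valid: 79.61% Test: 74.41%
Epoch: 28, Loss: 0.1251, Train: 83.69%, Valid: 80.06% Test: 73.35%
Epoch: 30, Loss: 0.1236, Train: 82.35%, Valid: 76.10% Test: 72.21%
"""

# Matches the exact format printed once per epoch (note: no comma after Valid).
PATTERN = re.compile(
    r"Epoch: (\d+), Loss: ([\d.]+), "
    r"Train: ([\d.]+)%, Valid: ([\d.]+)% Test: ([\d.]+)%"
)


def parse_epoch_line(line):
    """Return a dict of metrics for one log line, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    epoch, loss, train, valid, test = match.groups()
    return {
        "epoch": int(epoch),
        "loss": float(loss),
        "train": float(train),
        "valid": float(valid),
        "test": float(test),
    }


records = [r for r in map(parse_epoch_line, LOG.splitlines()) if r is not None]

# Select by validation score only; the test score is read off afterwards.
best = max(records, key=lambda r: r["valid"])
print(best)
```

Selecting on the validation column and only then reading the test column mirrors the usual model-selection practice: the epoch with the highest test score (epoch 23 in this excerpt) is not the one that would be chosen.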





## Question 2: What are your `best_model` validation and test ROC-AUC scores? (25 points)

Run the cell below to see the results of your best model and save its predictions to files named *ogbg-molhiv_graph_[valid,test].csv*. As before, you can view the files by clicking the *Folder* icon in the left side panel.


**Note**: Make sure you have updated the above **GCN model** before running this test code.

In [25]:
train_auroc = eval(best_model, device, train_loader, evaluator)[dataset.eval_metric]
valid_auroc = eval(best_model, device, valid_loader, evaluator, save_model_results=True, save_file="valid")[dataset.eval_metric]
test_auroc  = eval(best_model, device, test_loader, evaluator, save_model_results=True, save_file="test")[dataset.eval_metric]

print(f'Best model: '
    f'Train: {100 * train_auroc:.2f}%, '
    f'Valid: {100 * valid_auroc:.2f}% '
    f'Test: {100 * test_auroc:.2f}%')



Saving Model Predictions




Saving Model Predictions
Best model: Train: 83.69%, Valid: 80.06% Test: 73.35%
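
To make the reported metric concrete: ROC-AUC can be read as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison on toy data. It is an illustration of the definition only, not the OGB evaluator's implementation; `roc_auc_pairwise` is a hypothetical helper, and the quadratic loop is only suitable for small inputs.

```python
def roc_auc_pairwise(y_true, y_score):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores (made up for illustration, not model output).
auc = roc_auc_pairwise([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(f"{100 * auc:.2f}%")  # prints 75.00%
```

Because the metric depends only on the ranking of scores, it is insensitive to the classification threshold, which is useful when positive labels are rare.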





# Submission

When you submit your assignment, you will have to download this file as an `.ipynb` file.