Your goal in this notebook is to predict the formation energy of organic molecules from the [QM9 dataset](http://quantum-machine.org/datasets/). We begin with a simple GCNN and gradually build more complex models.

The notebook is based on the [PyTorch Geometric documentation](https://pytorch-geometric.readthedocs.io/en/latest/notes/introduction.html) and the [PyTorch Geometric QM9 example](https://github.com/pyg-team/pytorch_geometric/blob/master/examples/qm9_nn_conv.py).

The notebook runs reasonably fast on both CPU and GPU. You will want a GPU for the bonus task.

In [None]:
# Install the required packages
# A failure here probably means that Colab environment has changed
# You can grab the latest config from one of the Colab notebooks here:
# https://pytorch-geometric.readthedocs.io/en/latest/notes/colabs.html
!pip install -q torch-scatter -f https://data.pyg.org/whl/torch-1.10.0+cu113.html
!pip install -q torch-sparse -f https://data.pyg.org/whl/torch-1.10.0+cu113.html
!pip install -q torch-cluster -f https://data.pyg.org/whl/torch-1.10.0+cu113.html
!pip install -q git+https://github.com/pyg-team/pytorch_geometric.git


In [None]:
!apt install libgraphviz-dev
!pip install pygraphviz

In [None]:
import pathlib

import torch
import torch.nn.functional as F
from torch.nn import Linear, ReLU, Sequential
import networkx as nx

import torch_geometric
import torch_geometric.transforms as T
from torch_geometric.datasets import QM9
from torch_geometric.loader import DataLoader
from torch_geometric.nn import NNConv, Set2Set, GCNConv
from torch_geometric.utils import remove_self_loops, to_networkx

import matplotlib.pyplot as plt
from IPython.display import Image
from tqdm.auto import trange

Data Handling of Graphs
-----------------------

A graph is used to model pairwise relations (edges) between objects (nodes).
A single graph in PyG is described by an instance of `torch_geometric.data.Data`, which holds the following attributes by default:

- `data.x`: Node feature matrix with shape `[num_nodes, num_node_features]`
- `data.edge_index`: Graph connectivity in COO format with shape `[2, num_edges]` and type `torch.long`
- `data.edge_attr`: Edge feature matrix with shape `[num_edges, num_edge_features]`
- `data.y`: Target to train against (may have arbitrary shape), *e.g.*, node-level targets of shape `[num_nodes, *]` or graph-level targets of shape `[1, *]`
- `data.pos`: Node position matrix with shape `[num_nodes, num_dimensions]`

None of these attributes are required.
In fact, the `torch_geometric.data.Data` object is not even restricted to these attributes.
We can, *e.g.*, extend it by `data.face` to save the connectivity of triangles from a 3D mesh in a tensor with shape `[3, num_faces]` and type `torch.long`.
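To make the COO layout concrete, the following NumPy-only sketch builds the arrays that `data.x`, `data.edge_index`, and `data.pos` would hold for a water molecule. The molecule, the two-column element encoding, and the coordinates are illustrative assumptions, not QM9's actual 11-dimensional node features. Note that each undirected bond is stored twice, once per direction.

```python
import numpy as np

# Hypothetical toy molecule: water (H-O-H) with nodes 0=O, 1=H, 2=H.
# Illustrative one-hot element encoding (columns: H, O), not QM9's features.
x = np.array([[0, 1],   # O
              [1, 0],   # H
              [1, 0]],  # H
             dtype=np.float32)

# Two undirected O-H bonds; each is stored in both directions as a column
# of a [2, num_edges] COO matrix (row 0 = source, row 1 = target)
bonds = [(0, 1), (0, 2)]
src = [i for i, j in bonds] + [j for i, j in bonds]
dst = [j for i, j in bonds] + [i for i, j in bonds]
edge_index = np.array([src, dst], dtype=np.int64)

# Arbitrary illustrative 3D coordinates, shape [num_nodes, num_dimensions]
pos = np.array([[0.0, 0.0, 0.0],
                [0.96, 0.0, 0.0],
                [-0.24, 0.93, 0.0]], dtype=np.float32)

print(x.shape)              # → (3, 2)
print(edge_index.tolist())  # → [[0, 0, 1, 2], [1, 2, 0, 0]]
```

With PyG installed, these arrays could be converted with `torch.from_numpy` and passed as `Data(x=..., edge_index=..., pos=...)`.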

Note:
    PyTorch and `torchvision` define an example as a tuple of an image and a target.
    We omit this notation in PyG to allow for various data structures in a clean and understandable way.

Let's see the first molecule we have loaded:

# Training a GCNN
Define a transformation of the input data. `SelectTarget` selects the target column, and `torch_geometric.transforms.Distance` appends the Euclidean distance between the `pos` vectors of each edge's endpoints to the edge attributes (`norm=False` keeps the raw, unnormalized distance).
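As an illustration of what the `T.Distance(norm=False)` step computes, here is a NumPy sketch (not PyG's code): for every edge it takes the Euclidean distance between the endpoint positions and concatenates it to `edge_attr` as an extra column. The toy bond features assume QM9's 4 one-hot bond types, so the result has 5 edge features.

```python
import numpy as np


def append_distance(pos, edge_index, edge_attr=None):
    """Append the raw Euclidean edge length as the last edge-feature column
    (a sketch of T.Distance with norm=False)."""
    src, dst = edge_index
    dist = np.linalg.norm(pos[dst] - pos[src], axis=-1, keepdims=True)
    if edge_attr is None:
        return dist
    return np.concatenate([edge_attr, dist], axis=-1)


# Two atoms 5 units apart (a 3-4-5 triangle) joined by one single bond,
# stored in both directions; one-hot bond types: single, double, triple, aromatic
pos = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]])
edge_index = np.array([[0, 1], [1, 0]])
edge_attr = np.array([[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]])

new_attr = append_distance(pos, edge_index, edge_attr)
print(new_attr.shape)   # → (2, 5)
print(new_attr[:, -1])  # → [5. 5.]
```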

In [None]:
transform = T.Compose([SelectTarget(TARGET_Y), T.Distance(norm=False)])
torch.manual_seed(42)
dataset = QM9(dataset_path, transform=transform).shuffle()


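# Split the shuffled dataset; only 40,000 of QM9's ~130k molecules are used
# to keep training quick
test_dataset = dataset[:10000]
val_dataset = dataset[10000:20000]
train_dataset = dataset[20000:40000]

# Wrap the splits in PyG DataLoaders, which batch graphs together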
test_loader = DataLoader(test_dataset, batch_size=128, shuffle=False)
val_loader = DataLoader(val_dataset, batch_size=128, shuffle=False)
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
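The `DataLoader` does not pad molecules to a common size. Instead, it merges the 128 graphs of a batch into one large graph with disconnected components and adds a `batch` vector that maps every node to its molecule; that vector is what the models below pass to `Set2Set` as `data.batch`. A simplified NumPy sketch of this collation, covering only `x` and `edge_index` (PyG also concatenates `edge_attr`, `y`, `pos`, and so on):

```python
import numpy as np


def collate(graphs):
    """Merge (x, edge_index) pairs into one disjoint graph plus a batch vector."""
    xs, edges, batch = [], [], []
    offset = 0
    for graph_id, (x, edge_index) in enumerate(graphs):
        xs.append(x)
        # Shift node indices so they point into the concatenated node matrix
        edges.append(edge_index + offset)
        # Record which graph each node belongs to
        batch.append(np.full(x.shape[0], graph_id, dtype=np.int64))
        offset += x.shape[0]
    return np.concatenate(xs), np.concatenate(edges, axis=1), np.concatenate(batch)


# A 3-node and a 2-node graph, each with a single bond stored in both directions
g1 = (np.zeros((3, 2)), np.array([[0, 1], [1, 0]]))
g2 = (np.ones((2, 2)), np.array([[0, 1], [1, 0]]))
x, edge_index, batch = collate([g1, g2])
print(edge_index.tolist())  # → [[0, 1, 3, 4], [1, 0, 4, 3]]
print(batch.tolist())       # → [0, 0, 0, 1, 1]
```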

In [None]:
class SlimeNet(torch.nn.Module):
    """
    A bare-bones graph convolutional neural network for molecules.
    It consists of a linear transformation of the nodes' features
    followed by a single graph convolution operation, a Set2Set pooling step,
    a hidden layer, and an output layer.
    """

    def __init__(self, hidden_dim: int):
        """
        Args:
            hidden_dim: the number of units used in the hidden layers
        """
        super().__init__()
        self.lin0 = torch.nn.Linear(dataset.num_features, hidden_dim)
        self.conv = GCNConv(hidden_dim, hidden_dim)
        self.set2set = Set2Set(hidden_dim, processing_steps=1)
        self.lin1 = torch.nn.Linear(2 * hidden_dim, hidden_dim)
        self.lin2 = torch.nn.Linear(hidden_dim, 1)

    def forward(self, data: torch_geometric.data.Data) -> torch.Tensor:
        # Nothing special, just a sequence of transformations,
        # but pay attention to the API signatures of the graph layers.
        # Apply a linear layer + ReLU to the features of each node independently
        out = F.relu(self.lin0(data.x))
        # Apply a graph convolution
        out = F.relu(self.conv(out, data.edge_index))
        # Apply the Set2Set pooling operation from the paper
        # "Order Matters: Sequence to sequence for sets"
        # (https://arxiv.org/abs/1511.06391)
        out = self.set2set(out, data.batch)
        # And a classic fully-connected head
        out = F.relu(self.lin1(out))
        out = self.lin2(out)
        return out.view(-1)
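`GCNConv` implements the propagation rule of Kipf and Welling, $X' = \hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2} X \Theta + b$, where $\hat{A} = A + I$ is the adjacency matrix with self-loops and $\hat{D}$ is its diagonal degree matrix. Below is a dense NumPy sketch of that rule for a toy 3-atom chain, assuming unit edge weights and PyG's defaults (self-loops added, symmetric normalization); PyG itself uses sparse message passing rather than dense matrices.

```python
import numpy as np


def gcn_layer(x, edge_index, weight, bias):
    """Dense sketch of X' = D^-1/2 (A + I) D^-1/2 X W + b."""
    n = x.shape[0]
    a = np.zeros((n, n))
    a[edge_index[1], edge_index[0]] = 1.0  # row = target node, column = source node
    a_hat = a + np.eye(n)                  # add self-loops
    d_inv_sqrt = np.diag(a_hat.sum(axis=1) ** -0.5)
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
    return a_norm @ x @ weight + bias


# Chain 0-1-2; identity features and weights expose the normalized adjacency
edge_index = np.array([[0, 1, 1, 2], [1, 0, 2, 1]])
out = gcn_layer(np.eye(3), edge_index, np.eye(3), np.zeros(3))
print(np.round(out, 3))
```

In the result, the middle atom (two neighbours) weights its own features by 1/3, while the end atoms weight theirs by 1/2.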

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


def evaluate_model(model: torch.nn.Module,
                   epochs: int = 15):
    """
    A convenience routine for training and evaluating predictive models on our subset of QM9.
    """
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',
                                                           factor=0.7, patience=5,
                                                           min_lr=0.00001)

    def train(epoch):
        model.train()
        loss_all = 0

        for data in train_loader:
            data = data.to(device)
            optimizer.zero_grad()
            loss = F.mse_loss(model(data), data.y)
            loss.backward()
            loss_all += loss.item() * data.num_graphs
            optimizer.step()
        return loss_all / len(train_loader.dataset)


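    @torch.no_grad()
    def test(loader):
        model.eval()
        error = 0

        for data in loader:
            data = data.to(device)
            # Undo the target standardization (`std` is the target's standard
            # deviation) so that the MAE is reported in the original units
            error += (model(data) * std - data.y * std).abs().sum().item()
        return error / len(loader.dataset)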


    best_val_error = None
    for epoch in trange(epochs):
        lr = scheduler.optimizer.param_groups[0]['lr']
        loss = train(epoch)
        val_error = test(val_loader)
        scheduler.step(val_error)

        if best_val_error is None or val_error <= best_val_error:
            test_error = test(test_loader)
            best_val_error = val_error

        print(f'Epoch: {epoch:03d}, LR: {lr:7f}, Loss: {loss:.7f}, '
              f'Val MAE: {val_error:.7f}, Test MAE: {test_error:.7f}')
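The `LR` column in the logs comes from `ReduceLROnPlateau(mode='min', factor=0.7, patience=5, min_lr=0.00001)`: the learning rate is multiplied by 0.7 once the validation MAE has failed to improve for more than 5 consecutive epochs. A simplified pure-Python sketch of that rule, assuming PyTorch's default relative threshold of `1e-4` and no cooldown (an illustration, not PyTorch's implementation):

```python
class PlateauSketch:
    """Simplified ReduceLROnPlateau rule for mode='min' (relative threshold,
    no cooldown); for illustration only."""

    def __init__(self, lr, factor=0.7, patience=5, min_lr=1e-5, threshold=1e-4):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.threshold = threshold
        self.best = float('inf')
        self.num_bad_epochs = 0

    def step(self, metric):
        # Improvement = beating the best value by more than a relative margin
        if metric < self.best * (1 - self.threshold):
            self.best = metric
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        # Decay only after more than `patience` epochs without improvement
        if self.num_bad_epochs > self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.num_bad_epochs = 0
        return self.lr


# Validation MAEs of epochs 000-010 from the SlimeNet(16) run
val_maes = [682.2458244, 532.5897362, 453.7991739, 456.0827483, 389.3669039,
            427.9928613, 444.2028612, 475.1589878, 458.9288983, 434.5003454,
            508.2450560]
scheduler = PlateauSketch(lr=0.001)
lrs = [scheduler.step(m) for m in val_maes]
print(lrs[-1])  # LR used at epoch 011
```

Fed the validation MAEs of epochs 000-010 of the `SlimeNet(16)` run, the sketch reproduces the drop from 0.001 to 0.0007 logged at epoch 011: the best value is reached at epoch 004, and epochs 005-010 are six epochs without improvement.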

Let's evaluate our simple network with hidden dimensionality 16 and 32. We don't train to saturation, to save time, but the wider network should still slightly outperform the narrower one.
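To put "wider" in numbers, here is a back-of-the-envelope parameter count for `SlimeNet`. It assumes QM9's 11 node features and the parameter shapes of PyG's `GCNConv` (a bias-free weight matrix plus a separate bias) and `Set2Set` (a single-layer `torch.nn.LSTM` from `2 * hidden_dim` to `hidden_dim`). It can be cross-checked on a real model with `sum(p.numel() for p in model.parameters())`.

```python
def slimenet_param_count(hidden_dim, num_node_features=11):
    h, f = hidden_dim, num_node_features
    lin0 = f * h + h                    # Linear(num_features, h)
    conv = h * h + h                    # GCNConv(h, h): weight + bias
    lstm = 4 * h * (2 * h) + 4 * h * h + 2 * (4 * h)  # Set2Set's LSTM(2h -> h)
    lin1 = 2 * h * h + h                # Linear(2h, h)
    lin2 = h + 1                        # Linear(h, 1)
    return lin0 + conv + lstm + lin1 + lin2


for h in (16, 32):
    print(h, slimenet_param_count(h))  # → 16 4209, then 32 16097
```

Doubling `hidden_dim` gives roughly four times as many parameters, most of them in the Set2Set LSTM.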

In [None]:
evaluate_model(SlimeNet(16))

Epoch: 000, LR: 0.001000, Loss: 0.9360331, Val MAE: 682.2458244, Test MAE: 692.3672027
Epoch: 001, LR: 0.001000, Loss: 0.5933711, Val MAE: 532.5897362, Test MAE: 543.4873308
Epoch: 002, LR: 0.001000, Loss: 0.4681427, Val MAE: 453.7991739, Test MAE: 463.2675561
Epoch: 003, LR: 0.001000, Loss: 0.4253800, Val MAE: 456.0827483, Test MAE: 463.2675561
Epoch: 004, LR: 0.001000, Loss: 0.4050135, Val MAE: 389.3669039, Test MAE: 399.0253170
Epoch: 005, LR: 0.001000, Loss: 0.3909459, Val MAE: 427.9928613, Test MAE: 399.0253170
Epoch: 006, LR: 0.001000, Loss: 0.3846821, Val MAE: 444.2028612, Test MAE: 399.0253170
Epoch: 007, LR: 0.001000, Loss: 0.3812430, Val MAE: 475.1589878, Test MAE: 399.0253170
Epoch: 008, LR: 0.001000, Loss: 0.3811800, Val MAE: 458.9288983, Test MAE: 399.0253170
Epoch: 009, LR: 0.001000, Loss: 0.3779068, Val MAE: 434.5003454, Test MAE: 399.0253170
Epoch: 010, LR: 0.001000, Loss: 0.3769868, Val MAE: 508.2450560, Test MAE: 399.0253170
Epoch: 011, LR: 0.000700, Loss: 0.3742536, 

In [None]:
evaluate_model(SlimeNet(32))

Epoch: 000, LR: 0.001000, Loss: 0.8836413, Val MAE: 668.9182644, Test MAE: 677.8357302
Epoch: 001, LR: 0.001000, Loss: 0.5293412, Val MAE: 475.3917028, Test MAE: 486.8954077
Epoch: 002, LR: 0.001000, Loss: 0.4269356, Val MAE: 456.2680309, Test MAE: 467.2450060
Epoch: 003, LR: 0.001000, Loss: 0.3952314, Val MAE: 385.5656855, Test MAE: 397.0922582
Epoch: 004, LR: 0.001000, Loss: 0.3840458, Val MAE: 394.4692571, Test MAE: 397.0922582
Epoch: 005, LR: 0.001000, Loss: 0.3777316, Val MAE: 418.5294808, Test MAE: 397.0922582
Epoch: 006, LR: 0.001000, Loss: 0.3742994, Val MAE: 514.6011032, Test MAE: 397.0922582
Epoch: 007, LR: 0.001000, Loss: 0.3774682, Val MAE: 357.7148537, Test MAE: 369.2290359
Epoch: 008, LR: 0.001000, Loss: 0.3741331, Val MAE: 406.6236714, Test MAE: 369.2290359
Epoch: 009, LR: 0.001000, Loss: 0.3701222, Val MAE: 496.2081636, Test MAE: 369.2290359
Epoch: 010, LR: 0.001000, Loss: 0.3728932, Val MAE: 357.1016567, Test MAE: 367.9935802
Epoch: 011, LR: 0.001000, Loss: 0.3698994, 

**Task 1.** Now it's time to upgrade the model! Build and test a network like `SlimeNet`, but with 2 and 3 applications of the *same* convolutional layer. [1 point]
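Why more convolutions can help: each application of a graph convolution mixes in features from immediate neighbours, so stacking `n_convs` of them lets each atom see atoms up to `n_convs` bonds away. A small NumPy sketch that computes this receptive field for a hypothetical 5-atom chain:

```python
import numpy as np


def receptive_field(edge_index, num_nodes, n_convs):
    """R[i, j] is True if node j's input features can reach node i after
    n_convs rounds of message passing along edges (self-loops included)."""
    a = np.eye(num_nodes, dtype=np.int64)
    a[edge_index[1], edge_index[0]] = 1
    return np.linalg.matrix_power(a, n_convs) > 0


# Hypothetical 5-atom chain 0-1-2-3-4, each bond stored in both directions
chain = np.array([[0, 1, 1, 2, 2, 3, 3, 4],
                  [1, 0, 2, 1, 3, 2, 4, 3]])
for n in (1, 2, 3):
    # Atoms visible from atom 0: 2, then 3, then 4
    print(n, receptive_field(chain, 5, n)[0].sum())
```

Sharing one layer (this task) or using separate layers (Task 2) leaves this receptive field unchanged for a given number of convolutions; what differs is the number of parameters.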

In [None]:
class ConvNetSame(torch.nn.Module):
    def __init__(self, hidden_dim: int, n_convs: int):
        """
        Args:
            hidden_dim: the number of units used in the hidden layers
            n_convs: the number of applications of the GCNN layer
        """
        super().__init__()
        self.n_convs = n_convs
        self.lin0 = torch.nn.Linear(dataset.num_features, hidden_dim)
        self.conv = GCNConv(hidden_dim, hidden_dim)
        self.set2set = Set2Set(hidden_dim, processing_steps=1)
        self.lin1 = torch.nn.Linear(2 * hidden_dim, hidden_dim)
        self.lin2 = torch.nn.Linear(hidden_dim, 1)

    def forward(self, data: torch_geometric.data.Data) -> torch.Tensor:
        out = F.relu(self.lin0(data.x))

        # Pass the graph through the same layer n_convs times,
        # like in a recurrent network, with a ReLU after each pass
        for _ in range(self.n_convs):
            out = F.relu(self.conv(out, data.edge_index))

        out = self.set2set(out, data.batch)
        out = F.relu(self.lin1(out))
        out = self.lin2(out)
        return out.view(-1)

Evaluate your model. You will [likely](https://pytorch.org/docs/stable/notes/randomness.html) observe a slight improvement in quality.

In [None]:
evaluate_model(ConvNetSame(hidden_dim=16, n_convs=2))

Epoch: 000, LR: 0.001000, Loss: 0.9182220, Val MAE: 645.4170084, Test MAE: 656.1281942
Epoch: 001, LR: 0.001000, Loss: 0.5140843, Val MAE: 489.5669871, Test MAE: 498.9416423
Epoch: 002, LR: 0.001000, Loss: 0.4134679, Val MAE: 463.0202612, Test MAE: 472.7914091
Epoch: 003, LR: 0.001000, Loss: 0.3963693, Val MAE: 466.1647754, Test MAE: 472.7914091
Epoch: 004, LR: 0.001000, Loss: 0.3867703, Val MAE: 416.8781120, Test MAE: 426.7522303
Epoch: 005, LR: 0.001000, Loss: 0.3858043, Val MAE: 449.8055855, Test MAE: 426.7522303
Epoch: 006, LR: 0.001000, Loss: 0.3815535, Val MAE: 387.3030936, Test MAE: 398.1234169
Epoch: 007, LR: 0.001000, Loss: 0.3804718, Val MAE: 396.0384275, Test MAE: 398.1234169
Epoch: 008, LR: 0.001000, Loss: 0.3789983, Val MAE: 376.3686600, Test MAE: 387.4485964
Epoch: 009, LR: 0.001000, Loss: 0.3752103, Val MAE: 398.5379000, Test MAE: 387.4485964
Epoch: 010, LR: 0.001000, Loss: 0.3727615, Val MAE: 437.4597282, Test MAE: 387.4485964
Epoch: 011, LR: 0.001000, Loss: 0.3726820, 

In [None]:
evaluate_model(ConvNetSame(hidden_dim=16, n_convs=3))

Epoch: 000, LR: 0.001000, Loss: 0.9361793, Val MAE: 648.6305257, Test MAE: 659.6430601
Epoch: 001, LR: 0.001000, Loss: 0.4988113, Val MAE: 429.0909847, Test MAE: 439.7780438
Epoch: 002, LR: 0.001000, Loss: 0.3943590, Val MAE: 368.5521879, Test MAE: 379.7415748
Epoch: 003, LR: 0.001000, Loss: 0.3868148, Val MAE: 462.6906363, Test MAE: 379.7415748
Epoch: 004, LR: 0.001000, Loss: 0.3800317, Val MAE: 426.0008109, Test MAE: 379.7415748
Epoch: 005, LR: 0.001000, Loss: 0.3804891, Val MAE: 413.2333660, Test MAE: 379.7415748
Epoch: 006, LR: 0.001000, Loss: 0.3780001, Val MAE: 445.7485143, Test MAE: 379.7415748
Epoch: 007, LR: 0.001000, Loss: 0.3763028, Val MAE: 502.4044612, Test MAE: 379.7415748
Epoch: 008, LR: 0.001000, Loss: 0.3757135, Val MAE: 343.0632079, Test MAE: 354.5424012
Epoch: 009, LR: 0.001000, Loss: 0.3732493, Val MAE: 453.3214326, Test MAE: 354.5424012
Epoch: 010, LR: 0.001000, Loss: 0.3748743, Val MAE: 537.3832750, Test MAE: 354.5424012
Epoch: 011, LR: 0.001000, Loss: 0.3729310, 

**Task 2.** Build and test a network like `SlimeNet`, but with 2 separate convolutional layers. [1 point]
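The design difference between Task 1 and Task 2 is easiest to see in the parameter count of the convolution part alone. A small sketch, assuming `GCNConv(in, out)` holds an `in x out` weight matrix plus an `out`-sized bias (its layout in PyG), and using the `hidden_dim -> 3 * hidden_dim -> hidden_dim` widths of `Conv2Net`:

```python
def gcnconv_params(in_dim, out_dim):
    # Bias-free weight matrix plus a separate bias vector
    return in_dim * out_dim + out_dim


h = 16
shared = gcnconv_params(h, h)  # ConvNetSame: one layer, reused n_convs times
separate = gcnconv_params(h, 3 * h) + gcnconv_params(3 * h, h)  # Conv2Net
print(shared, separate)  # → 272 1600
```

Reusing one layer keeps the count fixed however many times it is applied, much like weight tying in a recurrent network; separate layers let each step learn its own transformation at the cost of more parameters.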

In [None]:
class Conv2Net(torch.nn.Module):
    def __init__(self, hidden_dim: int):
        """
        Args:
            hidden_dim: the number of units used in the hidden layers
        """
        super().__init__()
        self.lin0 = torch.nn.Linear(dataset.num_features, hidden_dim)
        
        # Define two separate layers, each with its own weights:
        # the first widens the node features to 3 * hidden_dim,
        # the second projects them back to hidden_dim
        self.conv_1 = GCNConv(hidden_dim, 3 * hidden_dim)
        self.conv_2 = GCNConv(3 * hidden_dim, hidden_dim)

        self.set2set = Set2Set(hidden_dim, processing_steps=1)
        self.lin1 = torch.nn.Linear(2 * hidden_dim, hidden_dim)
        self.lin2 = torch.nn.Linear(hidden_dim, 1)

    def forward(self, data: torch_geometric.data.Data) -> torch.Tensor:
        out = F.relu(self.lin0(data.x))

        # Apply the GCNN layers. Don't forget ReLU
        out = F.relu(self.conv_1(out, data.edge_index))
        out = F.relu(self.conv_2(out, data.edge_index))

        out = self.set2set(out, data.batch)
        out = F.relu(self.lin1(out))
        out = self.lin2(out)
        return out.view(-1)

In [None]:
evaluate_model(Conv2Net(hidden_dim=16))

Epoch: 000, LR: 0.001000, Loss: 0.8874218, Val MAE: 654.0155737, Test MAE: 664.3787274
Epoch: 001, LR: 0.001000, Loss: 0.5776175, Val MAE: 528.5479387, Test MAE: 538.8200646
Epoch: 002, LR: 0.001000, Loss: 0.4518304, Val MAE: 482.2366445, Test MAE: 492.7473313
Epoch: 003, LR: 0.001000, Loss: 0.4164487, Val MAE: 508.0972448, Test MAE: 492.7473313
Epoch: 004, LR: 0.001000, Loss: 0.3949121, Val MAE: 394.9785733, Test MAE: 405.5035545
Epoch: 005, LR: 0.001000, Loss: 0.3875997, Val MAE: 387.8842515, Test MAE: 398.7319875
Epoch: 006, LR: 0.001000, Loss: 0.3817018, Val MAE: 453.0139918, Test MAE: 398.7319875
Epoch: 007, LR: 0.001000, Loss: 0.3784046, Val MAE: 421.9963172, Test MAE: 398.7319875
Epoch: 008, LR: 0.001000, Loss: 0.3762404, Val MAE: 409.3098956, Test MAE: 398.7319875
Epoch: 009, LR: 0.001000, Loss: 0.3733238, Val MAE: 350.9810153, Test MAE: 362.4881527
Epoch: 010, LR: 0.001000, Loss: 0.3728610, Val MAE: 380.1116391, Test MAE: 362.4881527
Epoch: 011, LR: 0.001000, Loss: 0.3719227, 

The final and most difficult task is to create a model that uses the continuous kernel-based convolutional operator from the "[Neural Message Passing for Quantum Chemistry](https://arxiv.org/abs/1704.01212)" paper. It is implemented in PyTorch Geometric as `NNConv`, and its documentation is available [here](https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#torch_geometric.nn.conv.NNConv). [2 points]
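Where `GCNConv` multiplies every neighbour's features by the same learned matrix, `NNConv` lets a small network (its `nn` argument) map each edge's features to that edge's own `in_channels x out_channels` weight matrix. A NumPy sketch of this update rule, assuming `NNConv`'s default sum aggregation together with its root weight and bias terms; the sizes and random MLP weights are illustrative, and the 5 edge features mirror QM9's 4 one-hot bond types plus the distance appended by `T.Distance`.

```python
import numpy as np


def nnconv_sketch(x, edge_index, edge_attr, edge_nn, root_weight, bias):
    """x'_i = x_i @ root_weight + sum over edges j->i of x_j @ W(e_ji) + bias,
    where W(e) = edge_nn(e) reshaped to (in_dim, out_dim)."""
    in_dim, out_dim = root_weight.shape
    src, dst = edge_index
    weights = edge_nn(edge_attr).reshape(-1, in_dim, out_dim)  # one matrix per edge
    messages = np.einsum('ei,eio->eo', x[src], weights)        # x_j @ W(e_ji)
    out = np.zeros((x.shape[0], out_dim))
    np.add.at(out, dst, messages)                              # sum messages per target node
    return out + x @ root_weight + bias


rng = np.random.default_rng(0)
hidden_dim, num_edge_features = 4, 5
w1 = rng.normal(size=(num_edge_features, 2 * hidden_dim))
w2 = rng.normal(size=(2 * hidden_dim, hidden_dim * hidden_dim))


def edge_nn(e):
    # Linear(5, 2h) -> ReLU -> Linear(2h, h*h), biases omitted for brevity
    return np.maximum(e @ w1, 0.0) @ w2


x = rng.normal(size=(3, hidden_dim))
edge_index = np.array([[0, 1, 1, 2], [1, 0, 2, 1]])
edge_attr = rng.normal(size=(4, num_edge_features))
root_weight = rng.normal(size=(hidden_dim, hidden_dim))
bias = np.zeros(hidden_dim)
out = nnconv_sketch(x, edge_index, edge_attr, edge_nn, root_weight, bias)
print(out.shape)  # → (3, 4)
```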

In [None]:
class PaperNet(torch.nn.Module):
    def __init__(self, hidden_dim: int, n_convs: int):
        """
        Args:
            hidden_dim: the number of units used in the hidden layers
            n_convs: the number of applications of the NNConv layer
        """
        super().__init__()
        self.n_convs = n_convs
        self.lin0 = torch.nn.Linear(dataset.num_features, hidden_dim)

        # Define the NNConv layer and the fully-connected neural network
        # inside it. The suggested structure for the FCN is
        # Linear(5, hidden_dim*2)
        # ReLU activation
        # Linear(hidden_dim*2, hidden_dim * hidden_dim)
        # The 5 inputs are the 4 one-hot bond types plus the distance added by
        # T.Distance; NNConv reshapes the FCN output into a
        # (hidden_dim x hidden_dim) weight matrix for each edge
        ### YOUR CODE HERE
        mlp = torch.nn.Sequential(torch.nn.Linear(5, hidden_dim * 2),
                                  torch.nn.ReLU(),
                                  torch.nn.Linear(hidden_dim * 2, hidden_dim ** 2))
        self.nn_conv = NNConv(hidden_dim, hidden_dim, mlp)


        self.set2set = Set2Set(hidden_dim, processing_steps=1)
        self.lin1 = torch.nn.Linear(2 * hidden_dim, hidden_dim)
        self.lin2 = torch.nn.Linear(hidden_dim, 1)

    def forward(self, data: torch_geometric.data.Data) -> torch.Tensor:
        out = F.relu(self.lin0(data.x))
        # Pass the graph through the same NNConv
        # layer n_convs times, like in a recurrent network
        # Use ReLU activation
        ### YOUR CODE HERE
        for _ in range(self.n_convs):
            out = F.relu(self.nn_conv(out, data.edge_index, data.edge_attr))

        out = self.set2set(out, data.batch)
        out = F.relu(self.lin1(out))
        out = self.lin2(out)
        return out.view(-1)

Note that we use a very simplified evaluation configuration to keep it quick, so it's not guaranteed that our more complex model will perform better.

In [None]:
evaluate_model(PaperNet(hidden_dim=16, n_convs=2))

Epoch: 000, LR: 0.001000, Loss: 0.6358014, Val MAE: 614.8798994, Test MAE: 619.8962432
Epoch: 001, LR: 0.001000, Loss: 0.4456574, Val MAE: 481.9851303, Test MAE: 489.2669185
Epoch: 002, LR: 0.001000, Loss: 0.4034356, Val MAE: 541.7584583, Test MAE: 489.2669185
Epoch: 003, LR: 0.001000, Loss: 0.3933703, Val MAE: 497.0831595, Test MAE: 489.2669185
Epoch: 004, LR: 0.001000, Loss: 0.3841857, Val MAE: 450.7609814, Test MAE: 456.4657076
Epoch: 005, LR: 0.001000, Loss: 0.3718493, Val MAE: 375.9374029, Test MAE: 386.3477453
Epoch: 006, LR: 0.001000, Loss: 0.3693654, Val MAE: 500.5757650, Test MAE: 386.3477453
Epoch: 007, LR: 0.001000, Loss: 0.3659932, Val MAE: 354.7794758, Test MAE: 366.3213454
Epoch: 008, LR: 0.001000, Loss: 0.3690130, Val MAE: 388.8877952, Test MAE: 366.3213454
Epoch: 009, LR: 0.001000, Loss: 0.3616685, Val MAE: 442.3294696, Test MAE: 366.3213454
Epoch: 010, LR: 0.001000, Loss: 0.3627604, Val MAE: 459.4355028, Test MAE: 366.3213454
Epoch: 011, LR: 0.001000, Loss: 0.3608640, 