### In this file, we will conduct all of our tests

In [1]:
!pip install nbimporter

Collecting nbimporter
  Using cached nbimporter-0.3.4-py3-none-any.whl (4.9 kB)
Installing collected packages: nbimporter
Successfully installed nbimporter-0.3.4


In [9]:
import pandas as pd
import numpy as np
import torch
from dgl.dataloading.pytorch import GraphDataLoader
from tqdm.notebook import tqdm
import nbimporter
import dataset as ds
import model as mfile
from score import test

import os

In [10]:
train_dataset = ds.SyntheticDataset()
batch_size = 1

# We want batch size to be 1 because we do not want batched graphs (batching would not preserve the structure of our individual molecules)
train_dataloader = GraphDataLoader(train_dataset, batch_size = batch_size, shuffle = True)

In [34]:
from sklearn.metrics import mean_absolute_error as MAE
from os.path import exists

def train(model, epochs, file_name='SavedModels/electron.pth', output=False, debug_batch_interval=5):
    optimizer = torch.optim.Adam(model.parameters(),lr=0.01)
    
    # Try to load best_mae
    best_mae = None
    if exists('SavedModels/bestmae.txt'):
        with open('SavedModels/bestmae.txt', 'r') as f:
            best_mae = float(f.read())
    
    model.train()
    for epoch in tqdm(range(epochs), position=0, desc="Epochs"):
        
        running, batch_running, ct, batch_ct = 0, 0, 0, 0
        print('Epoch', epoch+1)
        for batch_idx, (graph, label) in tqdm(enumerate(train_dataloader), position=1, desc="Batches", total=len(train_dataloader)):
            optimizer.zero_grad()

            bf = graph.edata['bond_feats'].float()
            af = graph.ndata['atom_feats'].float()
            y_pred = model(graph, af, bf)
            
#             if y_pred.item() == 0:
#                 print(batch_idx, "pred = 0")
#             if y_pred.item() == 0 and len(DEBUG_PREDS) > 5 and sum(DEBUG_PREDS[-5:-1]) == 0:
#                 print("Cut")
#                 return
            
            # The 23.06 is the same value used in score.py (conversion to kcal/mol)
            # L1 is MAE, L2 is MSE
            loss = torch.nn.functional.l1_loss(y_pred.reshape(1), label) * 23.06 # ((y_pred.reshape(1,-1) - batch_y)**2).sum()
            running += loss.item()
            batch_running += loss.item()
            ct += 1
            batch_ct += 1
            loss.backward()
            optimizer.step()
            
            # Print progress roughly debug_batch_interval times per epoch (max(1, ...) avoids a modulo by zero on tiny datasets)
            if output and batch_idx % max(1, len(train_dataloader) // debug_batch_interval) == 0:
                print('Epoch: {} [{}/{} ({:.0f}%)]\tBatch Loss: {:.2f}\tEpoch Loss: {:.2f}'.format(
                          epoch+1, batch_idx * batch_size, len(train_dataloader) * batch_size,    # current sample num / total num
                          100. * batch_idx / len(train_dataloader), # this batch num's % of total dataset
                          batch_running / batch_ct, # mean loss since the last printout
                          running / ct) # running mean loss for the epoch
                     )
                batch_running, batch_ct = 0, 0
                
        this_loss = running / ct
        if output:
            print("\nAverage Loss:", round(running / ct * 100) / 100.0,"\n")
        else:
            print("Epoch", epoch+1, "Average Loss:", round(this_loss * 100) / 100.0)
            
        # Save our model whenever this epoch beats the best MAE seen so far.
        # Compare against None explicitly so a stored best of 0.0 is not treated as missing.
        if best_mae is None or this_loss < best_mae:
            best_mae = this_loss
            print("New best model found! Saving with loss of", best_mae)
            
            # Write our best mae so we can keep track every time we retrain
            with open('SavedModels/bestmae.txt', 'w') as f:
                f.write(str(best_mae))
            checkpoint = {'state_dict': model.state_dict(),'optimizer': optimizer.state_dict()}
            torch.save(checkpoint, file_name)
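
The best-MAE bookkeeping in `train` can be pulled out into a small helper, which makes the "only overwrite the checkpoint when the epoch improves" rule easy to check on its own. This is a minimal sketch, not the notebook's actual code: `load_best_mae` and `record_if_best` are hypothetical names, and only the file format (a single float in a text file, as in `SavedModels/bestmae.txt`) is taken from the cell above.

```python
import os
import tempfile


def load_best_mae(path):
    """Return the best MAE stored in `path`, or None if the file does not exist."""
    if not os.path.exists(path):
        return None
    with open(path, "r") as f:
        return float(f.read())


def record_if_best(path, epoch_mae):
    """Persist `epoch_mae` if it beats the stored best; return True when it does.

    The caller would save the model checkpoint only when this returns True.
    """
    best = load_best_mae(path)
    # Compare against None explicitly so a stored best of 0.0 still counts.
    if best is None or epoch_mae < best:
        with open(path, "w") as f:
            f.write(str(epoch_mae))
        return True
    return False


# Demo in a throwaway directory (hypothetical losses, not results from this notebook).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "bestmae.txt")
    demo_results = [record_if_best(demo_path, mae) for mae in (5.0, 7.5, 4.25)]
    demo_final_best = load_best_mae(demo_path)
```

Here the second epoch (7.5) does not improve on 5.0, so only the first and third calls would trigger a checkpoint save.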

#### Create and Train Model

In [35]:
# All graphs in the list have the same scheme size, so pull the dimensions from the first
node_dim = train_dataset[0][0].ndata['atom_feats'].shape[1]
edge_dim = train_dataset[0][0].edata['bond_feats'].shape[1]
print("Dimensions:", node_dim, "(node),", edge_dim, "(edge)")

Dimensions: 11 (node), 5 (edge)


In [36]:
import dgllife
model = mfile.Electron_MPNN(node_dim, edge_dim)
# Attempt to load model if electron_mpnn.pth exists (check with os)

#### Our Model

Basic Description: \
Our model follows an architecture similar to the MPNN model. It consists of two linear layers (one at the front, one at the end), a convolution layer, and a GRU layer.

- **fc1**: This linear + ReLU block is our first "line of attack," looking for connections in our data before we lose information on individual atoms via convolution
- **gnn_layer**: This layer uses a convolution involving two hidden layers to gather information about neighbors efficiently
- **gru**: I do not fully understand GRUs yet. My current understanding is that their gating helps mitigate the vanishing-gradient problem, which we could expect to run into after our fc1 and gnn layers. We are experimenting with removing it to better understand its impact
- ~~**fc2**: This fully-connected layer serves as our final decision maker, projecting back into 1 dimension (granted there is only 1 dimension at this point anyway) and trying to make sense of the previously convolved data~~
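
To make the gnn_layer bullet more concrete, here is a minimal sketch of the idea behind an edge-conditioned convolution such as `NNConv`: a small function of the bond features produces a weight for each edge, and each node sums its neighbors' features scaled by that weight. This is plain Python on 1-dimensional node features, not DGL's implementation. `edge_conditioned_conv`, `toy_edge_func`, and the toy graph are assumptions made up for illustration, and the real layer also offers options such as a bias term and other aggregators.

```python
def edge_conditioned_conv(node_feats, edges, edge_feats, edge_func):
    """Sum-aggregate neighbor features, each scaled by a weight computed from its edge.

    node_feats: list of floats, one scalar feature per node
    edges:      list of (src, dst) node-index pairs (directed)
    edge_feats: list of feature vectors, one per edge
    edge_func:  maps an edge feature vector to a scalar weight
    """
    out = [0.0] * len(node_feats)
    for (src, dst), feats in zip(edges, edge_feats):
        out[dst] += edge_func(feats) * node_feats[src]
    return out


def toy_edge_func(feats):
    # Stand-in for a learned edge network; here just a fixed linear map.
    return 0.5 * sum(feats)


# Toy 3-atom chain 0 - 1 - 2, with each bond stored in both directions.
toy_nodes = [1.0, 2.0, 3.0]
toy_edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
toy_bonds = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.0, 2.0]]

toy_out = edge_conditioned_conv(toy_nodes, toy_edges, toy_bonds, toy_edge_func)
```

The middle atom receives messages from both neighbors, so its output mixes information from the whole chain after a single pass.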

Some of the most important modifications that differentiate this model from the MPNN model stem from the negative min_PE output labels. For this reason, many ReLUs were stripped from the model, both in the architecture itself and in the forward passes. I experimented with a linear "decision" layer at the very end, but this caused the model to collapse to predicting roughly the average of the output labels for every input (minimizing error with a constant). This is not ideal, so we ended up scrapping the idea.
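
The collapse described above can be reproduced with a few lines of arithmetic: if a model can only output one constant, the constant that minimizes the loss is a summary statistic of the labels. The sketch below uses made-up labels (not values from this dataset) and hypothetical helper names. It shows that the mean is the best constant under MSE, while under an L1/MAE loss like the one used in `train` the best constant is the median.

```python
from statistics import mean, median


def mae(constant, labels):
    """Mean absolute error of predicting `constant` for every label."""
    return sum(abs(y - constant) for y in labels) / len(labels)


def mse(constant, labels):
    """Mean squared error of predicting `constant` for every label."""
    return sum((y - constant) ** 2 for y in labels) / len(labels)


# Made-up negative labels with one outlier, purely for illustration.
toy_labels = [-1.0, -2.0, -3.0, -4.0, -20.0]

label_mean = mean(toy_labels)
label_median = median(toy_labels)

# Scan a grid of candidate constants and keep the best one for each loss.
candidates = [c / 10 for c in range(-250, 1)]
best_for_mse = min(candidates, key=lambda c: mse(c, toy_labels))
best_for_mae = min(candidates, key=lambda c: mae(c, toy_labels))
```

With these toy labels the MSE-optimal constant lands on the mean and the MAE-optimal constant on the median, so a model that ignores its input can still reach a deceptively stable loss.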

Training Description: \
To train, I have found that the model begins to stabilize after about 8 epochs. So, the training scheme is planned as follows:

- 8 epochs w/ Adam opt @ 0.1
- 8 epochs w/ Adam opt @ 0.01

The lower learning rate in the second stage lets the optimizer make finer updates to the weights. This is essentially a hand-rolled learning-rate schedule (step decay): we first let the model drop MAE quickly, and then refine it with minute changes to the network.
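
The two-stage plan above can be written down as a plain function of the epoch number. This is a minimal sketch: `lr_for_epoch` is a hypothetical helper, and the stage lengths and rates are simply the ones listed above.

```python
def lr_for_epoch(epoch, stages=((8, 0.1), (8, 0.01))):
    """Return the learning rate for a 0-indexed epoch.

    `stages` is a sequence of (num_epochs, learning_rate) pairs applied in order.
    Epochs past the last stage keep the final learning rate.
    """
    remaining = epoch
    for num_epochs, lr in stages:
        if remaining < num_epochs:
            return lr
        remaining -= num_epochs
    return stages[-1][1]


# Learning rate for each of the 16 planned epochs.
schedule = [lr_for_epoch(e) for e in range(16)]
```

In a PyTorch loop, one way to apply this would be to set `param_group['lr'] = lr_for_epoch(epoch)` for every `param_group` in `optimizer.param_groups` at the start of each epoch. That keeps a single optimizer (and Adam's running statistics) across both stages, whereas `train` as written builds a new optimizer on every call.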

In [37]:
print(model)

Electron_MPNN(
  (fc1): Sequential(
    (0): Linear(in_features=11, out_features=1, bias=True)
    (1): ReLU()
  )
  (gnn_layer): NNConv(
    (edge_func): Sequential(
      (0): Linear(in_features=5, out_features=128, bias=True)
      (1): ReLU()
      (2): Linear(in_features=128, out_features=1, bias=True)
    )
  )
)


In [38]:
train(model, 8, output=True)

Epoch 1
Average Loss: 16269.84

Epoch 2
Average Loss: 12308.54

Epoch 3
Average Loss: 12205.47

Epoch 4
Average Loss: 12392.66

Epoch 5
Average Loss: 12071.68

Epoch 6
Average Loss: 12293.76

Epoch 7
Average Loss: 12125.23

Epoch 8
Average Loss: 12028.69



In [15]:
# checkpoint = {'state_dict': model.state_dict()}
# torch.save(checkpoint, "electron_mpnn_no_ReLU.pth")



### Load best model

In [5]:
best_model = mfile.Electron_MPNN(node_dim, edge_dim, out_dim=1)
best_model.load_state_dict(torch.load("electron_mpnn_v1_ReLU.pth")["state_dict"])

<All keys matched successfully>

In [43]:
best_model.fc2.weight

Parameter containing:
tensor([[1458.2877]], requires_grad=True)