Here we'll take our building blocks from the previous notebooks and build a preliminary training loop with PyTorch

*Some things to consider:*<br>
(0) We are not using sensible parameters in our model instantiation, nor will we invoke the full effnet model yet<br>
(1) Because of (0), we do not expect the model to learn anything at this point<br>
(2) The goal of this notebook is to have a preliminary pipeline in place, so that we can later vary the model's parameters and see how they affect its ability to learn something useful

In [1]:
import torch
import numpy as np
import pandas as pd
import os
import h5py
from exabiome.nn.loader import read_dataset, LazySeqDataset
import argparse
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
from model import *
from data import *

In [2]:
hparams = argparse.Namespace(**{'load': False,
                                'window': 4096,
                                'step': 4096,
                                'classify': True,
                                'tgt_tax_lvl': "phylum",
                                'fwd_only': True})

In [72]:
def get_toy_dl(hparams, batch_size=16):
    path = '/global/homes/a/azaidi/ar122_r202.toy.input.h5'
    chunks = LazySeqDataset(hparams, path=path,
                           keep_open=True)
    ds = taxon_ds(chunks, old_pad_seq)
    return DataLoader(ds, batch_size=batch_size, 
                      shuffle=True)#, drop_last=True)

In [73]:
dl = get_toy_dl(hparams)
batch = next(iter(dl))
len(dl), batch[0].shape, batch[1].shape

(1189, torch.Size([16, 1, 4096]), torch.Size([16]))
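As a side note on the batch count: without `drop_last=True` (commented out in `get_toy_dl`), `len(dl)` is the number of samples divided by the batch size, rounded up, and the final batch may be smaller than the rest. With a batch size of 16, the 1189 batches reported above imply between 19,009 and 19,024 samples. The sketch below illustrates the rounding on a made-up dataset; the sizes and the `TensorDataset` stand-in are assumptions for illustration, not the real data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 100 samples shaped like (1, 8), with integer labels
n_samples, batch_size = 100, 16
ds = TensorDataset(torch.zeros(n_samples, 1, 8),
                   torch.zeros(n_samples, dtype=torch.long))

dl_keep = DataLoader(ds, batch_size=batch_size)                  # keeps the short final batch
dl_drop = DataLoader(ds, batch_size=batch_size, drop_last=True)  # discards it

n_keep = len(dl_keep)   # number of batches, rounded up
n_drop = len(dl_drop)   # number of batches, rounded down
last_batch_size = [xb.shape[0] for xb, _ in dl_keep][-1]
print(n_keep, n_drop, last_batch_size)
```

A short final batch is harmless for the loss used here, but it can matter for layers or code that assume a fixed batch size.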

In [11]:
nn.Sequential(
    get_base_layer(),
    get_dep_sep(32,16),
    get_inv_res(16, 12),
    get_head_layer(12, 1,
                  lin_out_feats=12)
)(batch[0]).shape

torch.Size([16, 1, 12])

In [53]:
def get_model():
    model = nn.Sequential(
        get_base_layer(),
        get_dep_sep(32,16),
        get_inv_res(16, 12),
        get_head_layer(12, 1,
                    lin_out_feats=18))
    return model

We just want to make sure a loss function works for now. This dataset only has 18 potential classes, so we set 18 output features (`lin_out_feats=18`) in the model definition above

In [90]:
m = get_model()
out = m(batch[0]).squeeze(1)
out.shape, batch[1].shape

(torch.Size([16, 18]), torch.Size([16]))

In [91]:
loss = nn.CrossEntropyLoss()(out, batch[1])
loss

tensor(3.0212, grad_fn=<NllLossBackward>)

Looks like our loss function works!
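One quick sanity check on the value itself: a classifier that assigns equal probability to all 18 classes has a cross-entropy of ln(18) ≈ 2.89, so an untrained model landing near 3.0 is about what we would expect. The sketch below reproduces that baseline with all-zero logits; the tensors are made up for illustration and do not come from the dataset.

```python
import math
import torch
import torch.nn as nn

num_classes, batch_size = 18, 16

# All-zero logits mean softmax assigns probability 1/18 to every class
logits = torch.zeros(batch_size, num_classes)
# CrossEntropyLoss expects integer class indices of shape (batch,)
targets = torch.randint(0, num_classes, (batch_size,))

baseline = nn.CrossEntropyLoss()(logits, targets).item()
print(baseline, math.log(num_classes))
```

If the loss during training stays around this baseline, the model is doing no better than guessing uniformly.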

The call below checks whether a GPU is available; if so, we will want to use it

In [80]:
device = 'cpu' if not torch.cuda.is_available() else 'cuda'
device

'cpu'
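Whichever device is selected, the model and every input tensor have to live on the same one before the forward pass. A minimal sketch, using a hypothetical `nn.Linear` layer and random input in place of our model and data:

```python
import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Hypothetical stand-ins for the model and a batch
layer = nn.Linear(4, 2).to(device)   # moves the layer's parameters to the device
x = torch.randn(3, 4).to(device)     # returns a copy of the tensor on the device

out = layer(x)                       # works because both are on the same device
print(out.shape, out.device)
```

Note that `.to(device)` on a module moves its parameters in place, while on a tensor it returns a new tensor that must be assigned or used directly.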

We will need to update the weights using the gradients from our backward pass. We could do this manually, but it is better to use one of PyTorch's optimizers; we'll go with Adam

In [98]:
torch.optim.Adam(m.parameters())

Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.001
    weight_decay: 0
)
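For comparison, here is roughly what a manual update would look like. This is a sketch of plain gradient descent, not of Adam (which also tracks running moment estimates per parameter), and the tiny linear model, random data, and learning rate are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny model and a random batch
model = nn.Linear(4, 3)
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))
lr = 0.1

loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()                              # fills p.grad for every parameter

before = model.weight.detach().clone()
grad = model.weight.grad.detach().clone()

with torch.no_grad():                        # the update itself must not be tracked by autograd
    for p in model.parameters():
        p -= lr * p.grad                     # plain gradient descent step
        p.grad.zero_()                       # reset so gradients do not accumulate
```

An optimizer object packages exactly this bookkeeping (`opt.step()` and `opt.zero_grad()`), which is why the loop below stays so short.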

# Preliminary training loop

In [96]:
m = get_model()
loss_fxn = nn.CrossEntropyLoss()
dl = get_toy_dl(hparams)

device = 'cpu' if not torch.cuda.is_available() else 'cuda'
opt = torch.optim.Adam(m.parameters())
m.to(device)
i = 0

for x, y in dl:
#    x, y = batch
    out = m(x.to(device))
    loss = loss_fxn(out.squeeze(1), y.to(device))

    loss.backward() # PyTorch computes the gradients for us
    opt.step() # our optimizer does the weight updates for us
    opt.zero_grad() # reset the gradients; this could be moved to the start of the for loop
    
    #this is just for debugging purposes + to see loss value as we train
    if(i == 10): 
        break
    else:
        print(loss)
        i+=1

tensor(2.9753, grad_fn=<NllLossBackward>)
tensor(2.9210, grad_fn=<NllLossBackward>)
tensor(3.1081, grad_fn=<NllLossBackward>)
tensor(3.0681, grad_fn=<NllLossBackward>)
tensor(2.8755, grad_fn=<NllLossBackward>)
tensor(2.9873, grad_fn=<NllLossBackward>)
tensor(2.9198, grad_fn=<NllLossBackward>)
tensor(2.9386, grad_fn=<NllLossBackward>)
tensor(3.2251, grad_fn=<NllLossBackward>)
tensor(2.9081, grad_fn=<NllLossBackward>)


That's our training loop, and it's pretty simple! It's basically 6 lines of code:


for x, y in dl:
>   out = m(x.to(device)) <br>
    loss = loss_fxn(out.squeeze(1), y.to(device)) <br>
    loss.backward() <br>
    opt.step() <br>
    opt.zero_grad() <br>

As stated before, this model will not learn much of value the way it is set up and parameterized, but it is small enough to develop the pipeline on a CPU
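As a possible next step, the loop could be wrapped in a small function so that different models and parameters can be swapped in easily. The sketch below is one way to do that, not a settled design: `train_steps` and `max_steps` are hypothetical names, `enumerate` replaces the manual counter, and a random dataset with a tiny linear model stands in for `get_toy_dl` and `get_model` so the example runs on its own. Unlike our model, the stand-in already outputs shape `(batch, 18)`, so no `squeeze(1)` is needed.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_steps(model, dl, loss_fxn, opt, device, max_steps=10):
    """Run up to max_steps optimizer updates and return the loss values."""
    model.train()
    losses = []
    for step, (x, y) in enumerate(dl):
        if step == max_steps:
            break
        opt.zero_grad()                     # clear gradients from the previous step
        out = model(x.to(device))
        loss = loss_fxn(out, y.to(device))
        loss.backward()                     # compute gradients
        opt.step()                          # update weights
        losses.append(loss.item())          # .item() gives a plain float for logging
    return losses


# Hypothetical stand-ins for the real dataset and model
torch.manual_seed(0)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = torch.randn(256, 1, 64)
y = torch.randint(0, 18, (256,))
toy_dl = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(64, 18)).to(device)
opt = torch.optim.Adam(toy_model.parameters())

losses = train_steps(toy_model, toy_dl, nn.CrossEntropyLoss(), opt, device)
print(losses)
```

Returning plain floats rather than loss tensors keeps the autograd graph from being held in memory and makes the values easy to plot later.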