Based on the EfficientNet in timm: https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/efficientnet.py -- we will repurpose this architecture for use with 1-dimensional sequence data

In [33]:
import torch
import numpy as np
import pandas as pd
import os
import h5py
from exabiome.nn.loader import read_dataset, LazySeqDataset
import argparse
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.nn.functional as F

In [2]:
path = '/global/homes/a/azaidi/ar122_r202.toy.input.h5'

In [3]:
hparams = argparse.Namespace(**{'load': False,
                            'window': 4096,
                            'step': 4096,
                             'classify': True,
                               'tgt_tax_lvl': "phylum",
                               'fwd_only': True})

In [4]:
chunks = LazySeqDataset(hparams, path=path, keep_open=True)
len(chunks)

19010

Let's apply the padding through a transform function on the x value instead of keeping that logic in the dataset class

In [93]:
def old_pad_seq(seq):
    if(len(seq) < 4096):
        padded = torch.zeros(4096)
        padded[:len(seq)] = seq
        return padded
    else:
        return seq

That's not a very clean transform function above, but whatever -- PyTorch uses lambda functions in its docs anyway ;)

PyTorch has an F.pad function that would do the work for us -- it is causing issues below, so we'll proceed with the old_pad_seq function for now

In [102]:
def pad_seq(seq):
    if(len(seq) < 4096):
        return F.pad(seq, (0, 4096-len(seq))).long()
    else:
        return seq
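
As a quick sanity check of how F.pad treats a 1-D tensor, here is a minimal sketch on a tiny stand-in sequence. The helper name `pad_to_length` and the toy values are assumptions for illustration, not part of the notebook. The pad tuple `(0, n)` adds nothing on the left and `n` entries on the right, and the dtype of the input is preserved.

```python
import torch
import torch.nn.functional as F

def pad_to_length(seq, length=4096, value=0):
    """Right-pad a 1-D tensor to `length`; longer inputs are returned unchanged."""
    missing = length - seq.shape[-1]
    if missing <= 0:
        return seq
    # (0, missing): pad nothing on the left, `missing` entries on the right
    return F.pad(seq, (0, missing), value=value)

short = torch.arange(1, 6)               # toy stand-in for an encoded sequence
padded = pad_to_length(short, length=8)
print(padded, padded.dtype)
```

Since the dtype carries through, an explicit `.long()` should only matter when the incoming tensor is not already an integer tensor.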

We also don't want to do the unsqueezing at the batch level every time it's called -- let's do it here :)

In [103]:
class taxon_ds(Dataset):
    def __init__(self, chunks, transform=None):
        self.chunks = chunks
        self.transform = transform
    
    def __len__(self):
        return len(self.chunks)
    
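    def __getitem__(self, idx):
        # read from self.chunks, not the notebook-level `chunks` variable
        x = self.chunks[idx][1]
        if self.transform:
            x = self.transform(x)
        y = self.chunks[idx][2]
        # add a channel dimension so each sample has shape (1, seq_len)
        return (x.unsqueeze(0), y)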
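
To see the transform and the channel dimension working together without the HDF5 file, here is a small sketch on made-up data. `ToyTaxonDataset`, `toy_chunks`, and `pad8` are hypothetical names, and the tuple layout `(id, sequence, label)` is only an assumption that mirrors the `[1]` / `[2]` indexing used above.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

class ToyTaxonDataset(Dataset):
    def __init__(self, chunks, transform=None):
        self.chunks = chunks
        self.transform = transform

    def __len__(self):
        return len(self.chunks)

    def __getitem__(self, idx):
        item = self.chunks[idx]
        x, y = item[1], item[2]
        if self.transform is not None:
            x = self.transform(x)
        return x.unsqueeze(0), y      # (1, seq_len) so Conv1d sees one channel

def pad8(seq):
    # pad every toy sequence to length 8
    return F.pad(seq, (0, 8 - len(seq)))

# made-up (id, sequence, label) tuples with sequences of length 8, 5 and 3
toy_chunks = [(i, torch.ones(n), i % 3) for i, n in enumerate([8, 5, 3])]
toy_ds = ToyTaxonDataset(toy_chunks, transform=pad8)
toy_batch = next(iter(DataLoader(toy_ds, batch_size=3)))
print(toy_batch[0].shape, toy_batch[1].shape)
```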

In [109]:
%time
ds = taxon_ds(chunks, pad_seq)

CPU times: user 5 µs, sys: 0 ns, total: 5 µs
Wall time: 10.3 µs


In [113]:
%time
ds = taxon_ds(chunks, old_pad_seq)

CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 5.48 µs


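If you keep running the cells above, you can see that sometimes the old padding function is faster. These timings say little about padding itself, though: `%time` on a line by itself times an empty statement, and constructing `taxon_ds` never calls the transform (padding only happens in `__getitem__`)

The F.pad docs (https://pytorch.org/docs/stable/nn.functional.html#pad) have a note about nondeterministic behavior in the backward pass -- not sure if that's relevant, but these are the tensors that will eventually make it into the training loop, so maybe something to come back to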
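
One way to compare the two padding approaches more directly is to time the padding calls themselves with the standard-library `timeit` module. This is a sketch, not a benchmark: `pad_by_copy` and `pad_by_fpad` are hypothetical stand-ins for the two functions above, the random tensor replaces a real chunk, and the numbers will vary by machine.

```python
import timeit
import torch
import torch.nn.functional as F

def pad_by_copy(seq, length=4096):
    # allocate zeros and copy the sequence into the front
    padded = torch.zeros(length, dtype=seq.dtype)
    padded[:len(seq)] = seq
    return padded

def pad_by_fpad(seq, length=4096):
    # let F.pad append the zeros on the right
    return F.pad(seq, (0, length - len(seq)))

seq = torch.randint(0, 5, (1180,))       # same length as the short sample above
same = torch.equal(pad_by_copy(seq), pad_by_fpad(seq))

for fn in (pad_by_copy, pad_by_fpad):
    seconds = timeit.timeit(lambda: fn(seq), number=1000)
    print(f"{fn.__name__}: {seconds / 1000 * 1e6:.1f} µs per call")
```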

In [114]:
#the sample at index 2 in chunks is shorter than 4096, let's confirm here that our padding is working
chunks[2][1].shape, ds[2][0].shape

(torch.Size([1180]), torch.Size([1, 4096]))

In [115]:
dl = DataLoader(ds, batch_size=16, shuffle=True)
len(dl)

1189

In [116]:
batch = next(iter(dl))
batch[0].shape, batch[1].shape

(torch.Size([16, 1, 4096]), torch.Size([16]))

# An EfficientNet has basically three parts:
**(0) Base (Feet) --> (1) Body --> (2) Head**

Within these three parts -- we are **mainly** using three tools/units of computation:

(0) Conv1d: https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html <br>
(1) BatchNorm1d: https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm1d.html <br>
(2) SiLU: https://pytorch.org/docs/stable/generated/torch.nn.SiLU.html <br>

*There are a few other items that are added as well, that we will see below

<br>**Base** (feet):<br>
(0) Conv1d --> (1) BatchNorm1d --> (2) SiLU

**Head**: <br>
(0) Conv1d --> (1) BatchNorm1d --> (2) SiLU --> (3) SelectAdaptivePool1d --> (4) Linear

*the base & head are relatively straightforward -- we'll implement both below:*

In [79]:
def get_base_layer(in_chans=1, out_chans=32, ks=3, stride=2):
    return nn.Sequential(
        nn.Conv1d(in_channels= in_chans, out_channels= out_chans, 
                  kernel_size= ks, stride= stride),
        nn.BatchNorm1d(num_features = out_chans),
        nn.SiLU())
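
Because none of the convolutions here use padding, every strided layer shortens the sequence. The Conv1d docs give the output length as floor((L_in + 2*padding - dilation*(ks - 1) - 1) / stride) + 1. The helper `conv1d_out_len` below is a hypothetical name used only to check that formula against an actual layer with the base layer's default settings.

```python
import torch
import torch.nn as nn

def conv1d_out_len(l_in, ks, stride=1, padding=0, dilation=1):
    """Output length of nn.Conv1d, following the formula in the PyTorch docs."""
    return (l_in + 2 * padding - dilation * (ks - 1) - 1) // stride + 1

conv = nn.Conv1d(in_channels=1, out_channels=32, kernel_size=3, stride=2)
x = torch.zeros(2, 1, 4096)

actual = conv(x).shape[-1]
predicted = conv1d_out_len(4096, ks=3, stride=2)
print(actual, predicted)                  # both 2047

# with padding=1 the same layer would give exactly half the input length
print(conv1d_out_len(4096, ks=3, stride=2, padding=1))   # 2048
```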

In [80]:
#uncomment to confirm that this produces what was expected
#get_base_layer()

In [81]:
def get_head_layer(in_chans=1, out_chans=32, ks=3, stride=2,
              avg_out_feats=10, lin_out_feats=1):
    return nn.Sequential(
        nn.Conv1d(in_channels= in_chans, out_channels= out_chans, 
                  kernel_size= ks, stride= stride),
        nn.BatchNorm1d(num_features = out_chans),
        nn.SiLU(),
        nn.AdaptiveAvgPool1d(output_size=avg_out_feats),
        nn.Linear(in_features=avg_out_feats, out_features=lin_out_feats))

**The parameters chosen above are arbitrary for the time being**
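
One thing to keep in mind with the head as written: nn.Linear acts on the last dimension, so the output has shape (batch, out_chans, lin_out_feats) rather than one vector per sample. For classification, a common pattern is to pool each channel down to a single value, flatten, and then apply the linear layer across channels. The sketch below assumes that pattern; `get_cls_head`, the channel counts, and `n_classes` are placeholders, not values from the notebook.

```python
import torch
import torch.nn as nn

def get_cls_head(in_chans=12, feat_chans=64, n_classes=5):
    return nn.Sequential(
        nn.Conv1d(in_chans, feat_chans, kernel_size=1),
        nn.BatchNorm1d(feat_chans),
        nn.SiLU(),
        nn.AdaptiveAvgPool1d(1),          # (batch, feat_chans, 1)
        nn.Flatten(),                     # (batch, feat_chans)
        nn.Linear(feat_chans, n_classes)) # (batch, n_classes)

head = get_cls_head()
logits = head(torch.randn(16, 12, 3))     # stand-in for the body's output
print(logits.shape)                       # torch.Size([16, 5])
```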

In [82]:
#uncomment to confirm that this produces what was expected
#get_head_layer()

**Body**:<br>
(0) DepthwiseSeparableConv <br>
(1) InvertedResidual (two in a row) <br>
(2) InvertedResidual (two in a row) <br>
(3) InvertedResidual (three in a row) <br>
(4) InvertedResidual (three in a row) <br>
(5) InvertedResidual (four in a row) <br>
(6) InvertedResidual (one) <br>

*ok so what are these layers in the body?*

# DepthwiseSeparable:
(0) Conv1d <br>
(1) BatchNorm1d <br>
(2) SiLU <br>
(3) **Squeeze Excite**<br>
(4) Conv1d <br>
(5) BatchNorm1d <br>
(6) Identity <br>
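
For reference, the "depthwise" part of the name usually means that the first convolution filters each channel on its own (`groups=in_channels` in nn.Conv1d), and a kernel-size-1 "pointwise" convolution then mixes channels. The sketch below shows that pairing on its own; `depthwise_separable_conv` is a hypothetical name, the squeeze-excite step is left out for brevity, and the padding choice is an assumption made to keep the length unchanged.

```python
import torch
import torch.nn as nn

def depthwise_separable_conv(in_ch, out_ch, ks=3, stride=1):
    return nn.Sequential(
        # depthwise: one filter per input channel (groups=in_ch)
        nn.Conv1d(in_ch, in_ch, kernel_size=ks, stride=stride,
                  padding=ks // 2, groups=in_ch, bias=False),
        nn.BatchNorm1d(in_ch),
        nn.SiLU(),
        # pointwise: kernel size 1, mixes information across channels
        nn.Conv1d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.BatchNorm1d(out_ch))

block = depthwise_separable_conv(32, 16)
out = block(torch.randn(4, 32, 128))
print(out.shape)                                    # torch.Size([4, 16, 128])

# weight count of the depthwise conv vs. a dense conv of the same size
n_depthwise = block[0].weight.numel()               # 32 * 1 * 3
n_dense = nn.Conv1d(32, 32, 3, bias=False).weight.numel()   # 32 * 32 * 3
print(n_depthwise, n_dense)                         # 96 3072
```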

# InvertedResidual:
(0) Conv1d <br>
(1) BatchNorm1d <br>
(2) SiLU <br>
(3) Conv1d <br>
(4) BatchNorm1d <br>
(5) SiLU <br>
(6) **Squeeze Excite**<br>
(7) Conv1d <br>
(8) BatchNorm1d <br>

**"Squeeze Excite" = Conv1d --> SiLU --> Conv1d**

Let's first define our squeeze excite function -- since we have two conv layers, let's use tuples for our parameters for now

In [69]:
def get_sq_ex(in_ch= (1,1), out_ch= (2,2), ks= (2,2), stride= (2,2)):
    return nn.Sequential(
        nn.Conv1d(in_channels= in_ch[0], out_channels= out_ch[0], 
                  kernel_size= ks[0], stride= stride[0]),
        nn.SiLU(),
        nn.Conv1d(in_channels= in_ch[1], out_channels= out_ch[1], 
                  kernel_size= ks[1], stride= stride[1])
    )

In [73]:
#confirm the above function works
get_sq_ex()

Sequential(
  (0): Conv1d(1, 2, kernel_size=(2,), stride=(2,))
  (1): SiLU()
  (2): Conv1d(1, 2, kernel_size=(2,), stride=(2,))
)

We also have a ton of Conv1d --> BatchNorm1d sets, let's define a function to pull that out

In [77]:
def get_conv_bn(in_ch=1, out_ch=2, ks=2, stride=2):
    return nn.Sequential(
        nn.Conv1d(in_channels = in_ch, out_channels = out_ch,
                 kernel_size = ks, stride = stride),
        nn.BatchNorm1d(num_features = out_ch)
    )

In [78]:
#confirm the above function works
get_conv_bn()

Sequential(
  (0): Conv1d(1, 2, kernel_size=(2,), stride=(2,))
  (1): BatchNorm1d(2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)

The above functions have simplified our work to produce the desired layers -- we have everything we need to create both of the layer types in our model's body

**DepthwiseSeparable**: <br>
(0) get_conv_bn <br>
(1) SiLU <br>
(2) get_sq_ex <br>
(3) get_conv_bn <br>
(4) Identity <br>

**InvertedResidual**: <br>
(0) get_conv_bn <br>
(1) SiLU <br>
(2) get_conv_bn <br>
(3) SiLU <br>
(4) get_sq_ex <br>
(5) get_conv_bn <br>

A squeeze-excite unit compresses the number of channels down and then expands it back to the original amount
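
Note that `get_sq_ex` above uses kernel size 2 and stride 2 by default, so it also shortens the sequence along the way. A common formulation of squeeze-excite instead pools over the whole sequence, squeezes and re-expands the channels with kernel-size-1 convolutions, and uses the result as a per-channel gate on the input. The sketch below follows that formulation as an assumption; `SqueezeExcite1d` is a hypothetical name and not necessarily what the refactored notebook will use.

```python
import torch
import torch.nn as nn

class SqueezeExcite1d(nn.Module):
    def __init__(self, channels, reduced):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(1)                       # one value per channel
        self.reduce = nn.Conv1d(channels, reduced, kernel_size=1) # squeeze channels
        self.act = nn.SiLU()
        self.expand = nn.Conv1d(reduced, channels, kernel_size=1) # back to original count

    def forward(self, x):
        gate = self.expand(self.act(self.reduce(self.pool(x))))   # (batch, channels, 1)
        return x * torch.sigmoid(gate)                            # rescale each channel

se = SqueezeExcite1d(channels=64, reduced=6)
x = torch.randn(16, 64, 1023)
y = se(x)
print(y.shape)                            # same shape as the input
```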

In [140]:
def get_dep_sep(in_ch, out_ch, ks=3, reduction=6):
    return nn.Sequential(
        get_conv_bn(in_ch=in_ch, out_ch=in_ch*2, ks=ks),
        nn.SiLU(),
        get_sq_ex(in_ch=(in_ch*2, reduction), 
                  out_ch=(reduction, in_ch*2)),
        get_conv_bn(in_ch=in_ch*2, out_ch=out_ch),
        nn.Identity()
    )

In [145]:
#let's just make sure things are moving forward with our depthwise separable layer
nn.Sequential(
    get_base_layer(),
    get_dep_sep(32, 16)
)(batch[0]).shape

torch.Size([16, 16, 127])

To keep things clear and avoid making this notebook too long, we will only add a single inverted residual layer and confirm that we can pass our data through it (timm's efficientnet_b0 has 15 inverted residual layers: 2 + 2 + 3 + 3 + 4 + 1)

In [147]:
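def get_inv_res(in_ch, out_ch, ks=3, reduction=4):
    return nn.Sequential(
        # expand the channels with a kernel-size-1 conv
        get_conv_bn(in_ch=in_ch, out_ch=in_ch*4, ks=1),
        nn.SiLU(),
        # use the ks argument instead of a hard-coded 3
        get_conv_bn(in_ch=in_ch*4, out_ch=in_ch*4, ks=ks),
        # second activation, matching the InvertedResidual listing above
        nn.SiLU(),
        get_sq_ex(in_ch=(in_ch*4, reduction),
                 out_ch=(reduction, in_ch*4)),
        get_conv_bn(in_ch=in_ch*4, out_ch=out_ch)
    )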

In [150]:
nn.Sequential(
    get_base_layer(),
    get_dep_sep(32,16),
    get_inv_res(16, 12)
)(batch[0]).shape

torch.Size([16, 12, 3])

Looks like things are working! :)

Obviously a better encoding needs to be put in place for the model to sensibly parse the data (additional inverted residual layers), but we have the building blocks to refactor this and make the model creation much easier!

*since our activation function (SiLU) occurs only after the get_conv_bn call, we could actually include this into our definition of that function and simply add a parameter + some logic to determine if we want to append an activation to that sequential layer group. This will be added in the refactored notebook*
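
A minimal sketch of that idea, assuming a boolean `act` parameter is enough to cover both cases. The name `get_conv_bn_act` and its defaults are placeholders; the refactored notebook may structure this differently.

```python
import torch
import torch.nn as nn

def get_conv_bn_act(in_ch=1, out_ch=2, ks=2, stride=2, act=True):
    layers = [
        nn.Conv1d(in_channels=in_ch, out_channels=out_ch,
                  kernel_size=ks, stride=stride),
        nn.BatchNorm1d(num_features=out_ch),
    ]
    if act:
        # only append the activation when the caller asks for it
        layers.append(nn.SiLU())
    return nn.Sequential(*layers)

with_act = get_conv_bn_act(1, 32, ks=3)               # Conv1d -> BatchNorm1d -> SiLU
without_act = get_conv_bn_act(32, 16, act=False)      # Conv1d -> BatchNorm1d
out = without_act(with_act(torch.randn(16, 1, 4096)))
print(len(with_act), len(without_act), out.shape)
```

With this helper, the base layer becomes a single call, and the blocks in the body only need to decide where `act=False` applies.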