<a href="https://colab.research.google.com/github/davidcpage/Imagenette-experiments/blob/master/Imagenette_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The aim of this notebook is to convert the Imagenette/Imagewoof training examples from https://github.com/lessw2020/Ranger-Mish-ImageWoof-5 to the new fastai v2 codebase.

### Setup

Install fastai2. You may need to restart the runtime after installing to pick up the new versions.

In [0]:
!python -m pip install typeguard
!python -m pip install --upgrade pillow fastprogress
!python -m pip install git+https://github.com/fastai/fastai2

RANGER = 'https://raw.githubusercontent.com/lessw2020/Ranger-Mish-ImageWoof-5/master/ranger.py'
MXRESNET = 'https://raw.githubusercontent.com/lessw2020/Ranger-Mish-ImageWoof-5/master/mxresnet.py'
UTILS = 'https://raw.githubusercontent.com/davidcpage/Imagenette-experiments/master/utils.py'

!wget $RANGER -O ranger.py
!wget $MXRESNET -O mxresnet.py
!wget $UTILS -O utils.py

Install Nvidia DALI, which we use for fast dataloading/augmentation below.

In [0]:
!python -m pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali

Basic imports and device setup.

In [0]:
from functools import partial
import numpy as np
import torch
import torch.nn as nn

device = torch.device(torch.cuda.current_device())
torch.backends.cudnn.benchmark = True

#### Params

In [0]:
size = 128
bs = 64

random_aspect_ratio = (3/4, 4/3)
random_area = (0.35, 1.)
val_xtra_size = 32 
interpolation = 2 # resampling filter id; 2 appears to correspond to PIL.Image.BILINEAR

### Fastai v1 training

First let's establish a baseline on the v1 codebase. The code here is essentially taken from https://github.com/lessw2020/Ranger-Mish-ImageWoof-5/blob/master/train.py. 

**NB:** throughout the notebook we use fully qualified names for imported functions to avoid name clashes between the two fastai codebases.

In [3]:
import fastai.vision, fastai.vision.data, fastai.basic_train
import fastai.metrics, fastai.layers, fastai.callbacks

import ranger
import mxresnet

data_dir = fastai.datasets.untar_data(fastai.datasets.URLs.IMAGENETTE_320)

Mish activation loaded...


In [0]:
data = lambda data_dir=data_dir: (
    fastai.vision.data.ImageList.from_folder(data_dir)
            .split_by_folder(valid='val')
            .label_from_folder().transform(([fastai.vision.flip_lr(p=0.5)], []), size=size)
            .databunch(bs=bs, num_workers=4)
            .presize(size, scale=random_area, ratio=random_aspect_ratio, val_xtra_size=val_xtra_size, interpolation=interpolation)
            .normalize(fastai.vision.imagenet_stats)
    )

def flat_then_cosine_sched(learn, n_batch, lr, annealing_start):
    return fastai.callbacks.GeneralScheduler(learn, phases=[
        fastai.callbacks.TrainingPhase(annealing_start*n_batch).schedule_hp('lr', lr),
        fastai.callbacks.TrainingPhase((1-annealing_start)*n_batch).schedule_hp('lr', lr, anneal=fastai.callback.annealing_cos)     
    ])

def fit_flat_cos(learn, num_epoch, lr=4e-3, annealing_start=0.72):
    learn.fit(num_epoch, callbacks=[
        flat_then_cosine_sched(learn, len(learn.data.train_dl) * num_epoch, lr=lr, annealing_start=annealing_start)])
    return learn

learner = partial(fastai.basic_train.Learner, 
                  wd=1e-2,
                  opt_func=partial(ranger.Ranger, betas=(0.95, 0.99), eps=1e-6),
                  metrics=(fastai.metrics.accuracy,),
                  bn_wd=False, true_wd=True,
                  loss_func=fastai.layers.LabelSmoothingCrossEntropy(),
                  )
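
The schedule built by `flat_then_cosine_sched` can be sketched in plain Python. This is a minimal illustration rather than the fastai implementation: the function name and the total of 1000 batches are hypothetical, and it assumes the cosine phase anneals all the way down to zero.

```python
import math

def flat_then_cosine_lr(step, n_batch, lr=4e-3, annealing_start=0.72):
    """Learning rate at a given batch index: flat first, then cosine-annealed.

    Assumption: the cosine phase decays to zero by the final batch.
    """
    flat_steps = annealing_start * n_batch
    if step < flat_steps:
        return lr
    # Fraction of the annealing phase completed, in [0, 1].
    pct = (step - flat_steps) / (n_batch - flat_steps)
    return lr * (1 + math.cos(math.pi * pct)) / 2

n_batch = 1000  # hypothetical total number of batches
for step in (0, 500, 720, 860, 1000):
    print(step, round(flat_then_cosine_lr(step, n_batch), 6))
```

The rate stays at `lr` for the first 72% of training, reaches half its value midway through the annealing phase and ends near zero.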

In order to keep things relatively fast, let's train an mxresnet18 model for 5 epochs:

In [5]:
model = mxresnet.mxresnet18(c_out=10, sa=1, sym=0)
learn = fit_flat_cos(learner(data(), model).to_fp16(), num_epoch=5)

epoch,train_loss,valid_loss,accuracy,time
0,1.441207,1.277145,0.684,00:44
1,1.221641,1.014491,0.808,00:44
2,1.137839,1.073034,0.772,00:44
3,1.059366,0.949235,0.842,00:43
4,0.946012,0.840164,0.884,00:43


### Nvidia DALI

We can speed things up (especially for small models) by using Nvidia DALI to do the dataloading/augmentation. For now we will use this for both fastai v1 and v2 models, although we will want to test v2 dataloading at the end. The details of the code are not very interesting.


In [0]:
import nvidia.dali.ops as ops
import nvidia.dali.types as types

imagenet_stats = ((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))

def imagenet_train_graph(data_dir, size, random_aspect_ratio, random_area, 
                interp_type=types.INTERP_TRIANGULAR,
                stats=imagenet_stats):
    inputs = ops.FileReader(file_root=data_dir, random_shuffle=True)
    decode = ops.ImageDecoderRandomCrop(device='mixed',
            random_aspect_ratio=random_aspect_ratio, random_area=random_area)
    resize = ops.Resize(device='gpu', resize_x=size, resize_y=size, 
                        interp_type=interp_type)
    mean, std = [[x*255 for x in stat] for stat in stats]
    crop_mirror_norm = ops.CropMirrorNormalize(
                        device='gpu', output_dtype=types.FLOAT16, 
                        crop=(size, size), mean=mean, std=std)
    coin = ops.CoinFlip(probability=0.5)

    def define_graph():    
        jpegs, labels = inputs(name='Reader')
        output = crop_mirror_norm(resize(decode(jpegs)), mirror=coin())
        return [output, labels]
    return define_graph

def imagenet_valid_graph(data_dir, size, val_xtra_size, mirror=0,
                interp_type=types.INTERP_TRIANGULAR, 
                stats=imagenet_stats):
    inputs = ops.FileReader(file_root=data_dir, random_shuffle=False)
    decode = ops.ImageDecoder(device='mixed', output_type=types.RGB)
    resize = ops.Resize(device='gpu', resize_shorter=size+val_xtra_size, 
                        interp_type=interp_type)
    mean, std = [[x*255 for x in stat] for stat in stats]
    crop_mirror_norm = ops.CropMirrorNormalize(
                        device='gpu', output_dtype=types.FLOAT16,
                        crop=(size, size), mean=mean, std=std, mirror=mirror)
    
    def define_graph():
        jpegs, labels = inputs(name='Reader')
        output = crop_mirror_norm(resize(decode(jpegs)))
        return [output, labels]
    return define_graph
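
One detail in the graphs above is the multiplication of `mean` and `std` by 255. The ImageNet statistics are quoted for pixel values in the range 0 to 1, whereas decoded JPEGs hold 8-bit values from 0 to 255, which is presumably why the statistics are rescaled. The following sketch, using a hypothetical pixel, checks that the two conventions give the same normalised values:

```python
imagenet_stats = ((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
mean, std = [[x * 255 for x in stat] for stat in imagenet_stats]

pixel = (124, 116, 104)  # hypothetical RGB pixel with 8-bit values

# Normalise the raw 0-255 values with the rescaled statistics...
raw_scale = [(p - m) / s for p, m, s in zip(pixel, mean, std)]
# ...or rescale to 0-1 first and use the original statistics.
unit_scale = [(p / 255 - m) / s for p, m, s in zip(pixel, *imagenet_stats)]

print(raw_scale)
print(unit_scale)
```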

The Imagenette validation set is tiny, consisting of only 500 examples. This leads to substantial noise in the validation accuracies. To help a little, we concatenate a left-right flipped copy onto the validation set to get an effective 1000 examples:

In [0]:
from utils import DALIDataLoader, Chain, MockV1DataBunch

train_dl = lambda folder, bs, seed=-1: (
        DALIDataLoader(imagenet_train_graph(folder, size, random_aspect_ratio, random_area), bs, drop_last=True, device=device, seed=seed))
valid_dl = lambda folder, bs: Chain(
        DALIDataLoader(imagenet_valid_graph(folder, size, val_xtra_size), bs, drop_last=False, device=device),
        DALIDataLoader(imagenet_valid_graph(folder, size, val_xtra_size, mirror=1), bs, drop_last=False, device=device),
    )

dali_data = lambda data_dir=data_dir, bs=bs: MockV1DataBunch(train_dl(data_dir/'train', bs), valid_dl(data_dir/'val', bs))
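
The `Chain` helper comes from `utils.py` and its implementation is not shown here. As a rough idea of what chaining two loaders involves, here is a hypothetical stand-in (the class name `ChainedLoaders` and the toy batches are illustrative assumptions, not the actual code):

```python
import itertools

class ChainedLoaders:
    """Hypothetical stand-in for `Chain`: iterate several loaders back to back."""
    def __init__(self, *loaders):
        self.loaders = loaders

    def __iter__(self):
        # A fresh chained iterator is built on every pass over the data.
        return itertools.chain.from_iterable(self.loaders)

    def __len__(self):
        # Total number of batches across all loaders.
        return sum(len(dl) for dl in self.loaders)

original = [("batch0", 0), ("batch1", 1)]
flipped = [("batch0_flipped", 0), ("batch1_flipped", 1)]
valid = ChainedLoaders(original, flipped)
print(len(valid), [name for name, _ in valid])
```

Metrics computed over such a loader see every validation image twice, once unflipped and once flipped.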

Now we are ready to test training on the v1 codebase with DALI dataloading:

In [8]:
model = mxresnet.mxresnet18(c_out=10, sa=1, sym=0)
learn = fit_flat_cos(learner(dali_data(), model).to_fp16(), num_epoch=5)

epoch,train_loss,valid_loss,accuracy,time
0,1.422536,1.233743,0.716,00:18
1,1.202731,1.047867,0.811,00:18
2,1.094656,0.991178,0.813,00:18
3,1.026366,0.925935,0.843,00:18
4,0.917531,0.842249,0.877,00:18


Training is substantially faster (at least on a GCP T4 GPU with 4 CPU cores) and accuracy is similar, although the noise is still very large.
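
To get a rough feel for the size of that noise, we can treat each validation example as an independent trial and compute the binomial standard error of the accuracy. This is a back-of-envelope sketch: the helper name is hypothetical, and the flipped copies are correlated with the originals, so the figure for 1000 examples is optimistic.

```python
import math

def accuracy_std_error(accuracy, n_examples):
    """Binomial standard error of an accuracy estimated from n independent examples."""
    return math.sqrt(accuracy * (1 - accuracy) / n_examples)

for n in (500, 1000):
    print(n, round(accuracy_std_error(0.88, n), 4))
```

At around 88% accuracy this comes to roughly 1 to 1.5 percentage points, so the gap between 0.884 and 0.877 in the two runs above is well within one standard error.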

### Fastai v2 model

Next we want to compare a model using the v2 codebase to the v1 model above. Here is the v1 model again:

In [0]:
model_v1 = partial(mxresnet.mxresnet18, c_out=10, sa=1, sym=0)

A similar model is available in v2. To start with let's use the Mish activation class from the v1 model to minimise differences.

In [10]:
import fastai2.vision.models.xresnet
mish = mxresnet.Mish()
model_v2 = partial(fastai2.vision.models.xresnet.xresnet18, c_out=10, sa=1, sym=0, act_cls=(lambda: mish))

Mish activation loaded...


First let's compare the types of modules in the two models:

In [11]:
s1 = set(type(x) for x in model_v1().modules())
s2 = set(type(x) for x in model_v2().modules())
s1^s2 #symmetric difference

{fastai2.layers.ConvLayer,
 fastai2.layers.Flatten,
 fastai2.layers.ResBlock,
 fastai2.vision.models.xresnet.XResNet,
 mxresnet.Flatten,
 mxresnet.MXResNet,
 mxresnet.ResBlock,
 mxresnet.SimpleSelfAttention,
 torch.nn.modules.activation.ReLU,
 torch.nn.modules.conv.Conv1d}
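
The `^` operator used above is Python's set symmetric difference: it keeps the elements found in exactly one of the two sets. A toy version with stand-in classes (hypothetical, unrelated to the real layers) shows the idea and also its main limitation:

```python
# Stand-in layer classes, purely for illustration.
class Conv: pass
class ReLU: pass
class Mish: pass

model_a = [Conv(), Mish(), Conv(), Mish()]
model_b = [Conv(), ReLU(), Conv(), Mish()]

types_a = set(type(m) for m in model_a)
types_b = set(type(m) for m in model_b)

# Only ReLU appears in one model and not the other.
print(types_a ^ types_b)
```

Because sets discard counts and ordering, this check does not notice that `Mish` occurs twice in one model and once in the other. It is a coarse first pass rather than a proof of equivalence.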

Some of these are implementation-specific compound layers. Let's ignore them for now and come back to them later if necessary.

In [0]:
types_to_ignore = {
    torch.nn.Sequential,

    mxresnet.Flatten, 
    mxresnet.MXResNet, 
    mxresnet.ResBlock, 

    fastai2.layers.Flatten, 
    fastai2.vision.models.xresnet.XResNet,
    fastai2.layers.ResBlock, 
    fastai2.layers.ConvLayer, 
}

filtered_modules = lambda model: (x for x in model.modules() if type(x) not in types_to_ignore)

Now let's compare the filtered modules in a bit more detail by comparing `repr` strings.

In [13]:
s1 = set(repr(x) for x in filtered_modules(model_v1()))
s2 = set(repr(x) for x in filtered_modules(model_v2()))
s1 ^ s2

{'Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)',
 'Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)',
 'ReLU()',
 'SimpleSelfAttention(\n  (conv): Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)\n)'}

Let's locate the offending modules in the v1 model:

In [14]:
{k: x for k,x in model_v1().named_modules() if repr(x) in s1^s2}

{'4.1.sa': SimpleSelfAttention(
   (conv): Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)
 ), '4.1.sa.conv': Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)}

and in the v2 model:

In [15]:
{k: x for k,x in model_v2().named_modules() if repr(x) in s1^s2}

{'0.2': ReLU(),
 '1.0': Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
 '1.2': ReLU(),
 '2.2': ReLU()}

The v1 model has a SimpleSelfAttention layer which is missing from the v2 model, whilst the v2 model still has ReLU activations in its three stem layers (`0.2`, `1.2`, `2.2`) and its second stem convolution (`1.0`) has a different shape. Let's fix these issues with a modified v2 model:

In [0]:
class XResNet(nn.Sequential):
    def __init__(self, expansion, layers, c_in=3, c_out=1000, 
                 sa=False, sym=False, act_cls=fastai2.basics.defaults.activation,
                 ):
        stem = []
        sizes = [c_in, 16,32,64] if c_in < 3 else [c_in, 32, 64, 64] 
        for i in range(3):
            stem.append(fastai2.layers.ConvLayer(sizes[i], sizes[i+1], stride=2 if i==0 else 1, act_cls=act_cls))

        block_szs = [64//expansion,64,128,256,512] +[256]*(len(layers)-4)
        blocks = [self._make_layer(expansion, ni=block_szs[i], nf=block_szs[i+1], blocks=l, stride=1 if i==0 else 2,
                                  sa=sa if i==len(layers)-4 else False, sym=sym, act_cls=act_cls)
                  for i,l in enumerate(layers)]
        super().__init__(
            *stem,
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            *blocks,
            nn.AdaptiveAvgPool2d(1), fastai2.layers.Flatten(),
            nn.Linear(block_szs[-1]*expansion, c_out),
        )
        fastai2.vision.models.xresnet.init_cnn(self)

    def _make_layer(self, expansion, ni, nf, blocks, stride, sa, sym, act_cls):
        return nn.Sequential(
            *[fastai2.layers.ResBlock(expansion, ni if i==0 else nf, nf, stride if i==0 else 1,
                      sa if i==(blocks-1) else False, sym=sym, act_cls=act_cls)
              for i in range(blocks)])
        
xresnet18 = partial(XResNet, expansion=1, layers=[2,2,2,2])
xresnet50 = partial(XResNet, expansion=4, layers=[3,4,6,3])

Let's instantiate the new model and run our check from before to compare with the v1 model:

In [0]:
model_v2b = partial(xresnet18, c_out=10, sa=1, sym=0, act_cls=(lambda: mish))

In [18]:
types_to_ignore.add(XResNet)

s1 = set(repr(x) for x in filtered_modules(model_v1()))
s2 = set(repr(x) for x in filtered_modules(model_v2b()))
s1 ^ s2

set()

Great. Next we'd like to check that the forward computations of the two models agree. The difficulty is that initialisation calls random number generators, and even if we fix random seeds beforehand, any difference in the sequence of random calls between the two models will lead to divergence.

We can check that other details of the forward computation agree by attempting to set the same initialisation for both models:

In [0]:
def reset_dummy_init(model, seed=123):
    for m in filtered_modules(model):
        if hasattr(m, 'reset_parameters'):
            torch.manual_seed(seed)
            m.reset_parameters()
    return model

def compare_fwd(model1, model2):
    random_batch = torch.randn(bs,3,size,size, device=device)
    assert np.allclose(
        model1.to(device)(random_batch).detach().cpu().numpy(), 
        model2.to(device)(random_batch).detach().cpu().numpy()
    )
    return True

In [20]:
compare_fwd(
    reset_dummy_init(model_v1()),
    reset_dummy_init(model_v2b()),
)

True
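
The reason `reset_dummy_init` re-seeds before every module can be illustrated without PyTorch. In this sketch the layer names and helper functions are hypothetical, and a single random number stands in for each layer's weights:

```python
import random

def init_in_order(layer_names, seed=123):
    """Seed once, then draw one 'weight' per layer in the given order."""
    rng = random.Random(seed)
    return {name: rng.random() for name in layer_names}

def init_with_reseed(layer_names, seed=123):
    """Re-seed before every layer, so each draw is independent of call order."""
    return {name: random.Random(seed).random() for name in layer_names}

order_a = ["conv1", "conv2", "linear"]
order_b = ["linear", "conv1", "conv2"]  # same layers, different creation order

# Seeding once: the weights depend on the order in which layers are initialised.
print(init_in_order(order_a) == init_in_order(order_b))
# Re-seeding per layer: the two orders give identical weights.
print(init_with_reseed(order_a) == init_with_reseed(order_b))
```

The price of re-seeding is that layers of the same shape end up with identical weights, which is fine for a dummy initialisation used only to compare forward passes.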

This is promising. It remains to check that the initialisations of the two models agree. In general this can be somewhat tricky for the reasons given above. The current situation is actually much nicer as both models are initialised with a final call to an `init_cnn` function and if we fix the random seed before this call we get agreement:

In [21]:
model1 = model_v1()
torch.manual_seed(123)
mxresnet.init_cnn(model1)

model2 = model_v2b()
torch.manual_seed(123)
fastai2.vision.models.xresnet.init_cnn(model2)

compare_fwd(model1, model2)

True
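
Both comparisons rely on `np.allclose`, which does not test for exact equality. With its default tolerances it accepts elements satisfying `|a - b| <= atol + rtol * |b|`, where `rtol=1e-5` and `atol=1e-8`. A pure-Python sketch of that criterion (the helper name and sample values are hypothetical):

```python
def all_close(xs, ys, rtol=1e-5, atol=1e-8):
    """Element-wise |x - y| <= atol + rtol * |y|, mirroring the np.allclose defaults."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(xs, ys))

reference = [0.5, -1.25, 3.0]
tiny_drift = [v * (1 + 1e-7) for v in reference]   # relative error of 1e-7
large_drift = [v * (1 + 1e-3) for v in reference]  # relative error of 1e-3

print(all_close(tiny_drift, reference))
print(all_close(large_drift, reference))
```

So a `True` result means the two models agree to within a relative error of about 1e-5, not that their outputs are bit-for-bit identical.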

### Fastai v2 training

Here is a first attempt at training using the v2 model and codebase together with the DALI dataloader we used above. We will use the ranger optimiser from the v1 codebase for now:

In [0]:
import fastai2.callback.all #need to import this to patch fastai2.basics.Learner with .to_fp16() method
RangerWrapper = lambda *args, **kwargs: fastai2.basics.OptimWrapper(ranger.Ranger(*args, **kwargs))

dali_data_v2 = lambda data_dir=data_dir, bs=bs: fastai2.basics.DataBunch(train_dl(data_dir/'train', bs), valid_dl(data_dir/'val', bs))

In [0]:
model_v2 = partial(xresnet18, c_out=10, sa=1, sym=0, act_cls=(lambda: mish))

learner_v2 = partial(fastai2.basics.Learner, 
                  lr=4e-3,
                  opt_func=partial(RangerWrapper, betas=(0.95, 0.99), eps=1e-6),
                  metrics=(fastai2.metrics.accuracy,),
                  loss_func=fastai2.basics.LabelSmoothingCrossEntropy(),
                  )
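
A note on reading the loss values: with label smoothing the loss cannot reach zero, so the reported losses are not directly comparable to plain cross-entropy. The sketch below assumes that smoothing mixes the one-hot target with a uniform distribution using a weight of `eps=0.1`; that value and the helper names are assumptions made for illustration, not taken from the fastai source.

```python
import math

def smoothed_targets(n_classes, eps):
    """Target distribution for class 0 under uniform label smoothing."""
    return [(1 - eps) + eps / n_classes] + [eps / n_classes] * (n_classes - 1)

def cross_entropy(targets, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(targets, probs))

targets = smoothed_targets(n_classes=10, eps=0.1)  # eps=0.1 is an assumption
floor = cross_entropy(targets, targets)            # best case: prediction == target
confident = cross_entropy(targets, [0.999] + [0.001 / 9] * 9)

print(round(floor, 3), round(confident, 3))
```

Under these assumptions the lowest achievable loss for 10 classes is about 0.5, and a prediction that is almost certain of the correct class scores worse than that floor.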

In [24]:
learn = learner_v2(dali_data_v2(), model_v2()).to_fp16().fit_flat_cos(5, pct_start=0.72)

epoch,train_loss,valid_loss,accuracy,time
0,1.442993,1.237608,0.697,00:19
1,1.199762,1.103699,0.767,00:19
2,1.11046,1.06197,0.785,00:19
3,1.019471,0.928476,0.859,00:19
4,0.916012,0.843927,0.883,00:19


To be continued...