## Using Pretrained PyTorch Models

This notebook covers how to use pretrained models with the fastai (FAI) library. I will cover three sources of models:
+ Fastai models: https://github.com/fastai/fastai/blob/fe4ab9ee0100f1ea390787beb0b7a1dd82412e61/fastai/torch_imports.py
+ Cadene models: https://github.com/Cadene/pretrained-models.pytorch
+ Torchvision models: https://pytorch.org/docs/master/torchvision/models.html

I will use the Dogs and Cats dataset for demonstration in all cases, and I will assume some familiarity with the FAI library. The dataset can be found at http://files.fast.ai/data/

## Using FAI models

In [1]:
import matplotlib
matplotlib.use('Agg')

In [2]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [3]:
from fastai.imports import *
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *

In [4]:
PATH = "../data/dogscats/"
sz=224

### FAI models

In [5]:
arch=resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))

Note that in most cases you will need to define `tfms` yourself. I will cover the data part in another notebook; here I assume you already have the `data` object.

You can use any of the architectures defined in https://github.com/fastai/fastai/blob/fe4ab9ee0100f1ea390787beb0b7a1dd82412e61/fastai/torch_imports.py. This means you can replace `resnet34` in the line `arch=resnet34` with any of those architectures and everything else stays the same. Voila.

Now we need to create a learner object. For now, I am assuming we are working with convolutional models. There are two ways to do this. The first is to use the `ConvLearner.pretrained` method. The other, which I find more general, is to use `ConvLearner.from_model_data`.

#### FAI models Method-1 : `ConvLearner.pretrained`

This method uses the pretrained model and downloads the weights automatically. For a few architectures you might need to download the weights file from http://files.fast.ai/models/ and put it inside the fastai folder.

Here the first argument is the function that builds the model, not a model instance. In this example it is `resnet34`, which is a function rather than a class.

In [6]:
learn1 = ConvLearner.pretrained(arch, data)

In [7]:
learn1.fit(1e-2, 1, cycle_len=1, best_save_name='dc_a1', metrics=[accuracy])

epoch      trn_loss   val_loss   accuracy                     
    0      0.054967   0.031165   0.989     



[array([0.03116]), 0.989]

In [8]:
learn1.unfreeze()
learn1.fit(1e-2, 1, cycle_len=1, best_save_name='dc_a1', metrics=[accuracy])

epoch      trn_loss   val_loss   accuracy                     
    0      0.052801   0.028175   0.991     



[array([0.02817]), 0.991]

#### FAI models Method-2 : `ConvLearner.from_model_data`

In this case the required argument is the model itself, which must be an instance of a class that inherits from PyTorch's `nn.Module`.

In [9]:
model = resnet34(pretrained=True)

learn2 = ConvLearner.from_model_data(model, data)

We can look at the model using `learn.model` or `learn.models.model` and compare it with the model obtained from the `pretrained` method of `ConvLearner`.

In [10]:
learn1.model

Sequential(
  (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
  (2): ReLU(inplace)
  (3): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), dilation=(1, 1), ceil_mode=False)
  (4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d

In [11]:
learn2.model

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
  (relu): ReLU(inplace)
  (maxpool): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), dilation=(1, 1), ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (b

In [12]:
children(learn2.model)[-2:]

[AvgPool2d(kernel_size=7, stride=1, padding=0, ceil_mode=False, count_include_pad=True),
 Linear(in_features=512, out_features=1000, bias=True)]

In [13]:
children(learn1.model)[-10:]

[AdaptiveConcatPool2d(
   (ap): AdaptiveAvgPool2d(output_size=(1, 1))
   (mp): AdaptiveMaxPool2d(output_size=(1, 1))
 ), Flatten(
 ), BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True), Dropout(p=0.25), Linear(in_features=1024, out_features=512, bias=True), ReLU(), BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True), Dropout(p=0.5), Linear(in_features=512, out_features=2, bias=True), LogSoftmax()]

As we can see, the `ConvLearner.pretrained` method changes the last few layers of the input model. Specifically, it replaces `AvgPool2d` with `AdaptiveConcatPool2d`, flattens the result, replaces the original 1000-way `Linear` layer with two linear layers that use batch norm, dropout, and a ReLU in between, and finishes with a `LogSoftmax` layer.

The reason for using adaptive pooling layers is that they allow the model to accept any image size rather than being restricted to a fixed size such as 224 x 224. An adaptive pooling layer is specified by its output size rather than its kernel size.
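To make the output-size idea concrete, here is a minimal pure-Python sketch of one-dimensional adaptive average pooling. It is not PyTorch's implementation. The helper name `adaptive_avg_pool_1d` and the bin-boundary rule (floor for the start of each bin, ceiling for the end) are assumptions chosen for illustration. The only point is that the number of outputs is fixed by `output_size`, whatever the input length.

```python
def adaptive_avg_pool_1d(values, output_size):
    """Average `values` into exactly `output_size` bins, whatever len(values) is.

    Illustrative helper (not a library function).
    """
    n = len(values)
    pooled = []
    for i in range(output_size):
        start = (i * n) // output_size            # floor(i * n / output_size)
        end = -((-(i + 1) * n) // output_size)    # ceil((i + 1) * n / output_size)
        window = values[start:end]
        pooled.append(sum(window) / len(window))
    return pooled


row_7 = [float(v) for v in range(7)]     # stands in for one row of a 7x7 feature map
row_10 = [float(v) for v in range(10)]   # stands in for one row of a 10x10 feature map

out_7 = adaptive_avg_pool_1d(row_7, 1)          # one output value
out_10 = adaptive_avg_pool_1d(row_10, 1)        # still one output value
out_10_to_4 = adaptive_avg_pool_1d(row_10, 4)   # four output values from ten inputs
```

Both `out_7` and `out_10` have length 1 even though the inputs differ in length, which is the property that lets the layers after the pooling keep a fixed input size.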

We will now try to replicate what happens inside the `pretrained` method so that we can apply it to other models as well. It is a good idea to read the source of `ConvnetBuilder`, `ConvLearner.pretrained` and `ConvLearner.from_model_data` if you haven't already done so.

In [20]:
??ConvnetBuilder

In [15]:
??ConvLearner.pretrained

In [16]:
??ConvLearner.from_model_data

We will now write a custom head and attach it to the end of the existing model. There are again two ways to do so.

If you have a fixed-size input, say 224x224, you can directly append another linear layer and a log softmax after the existing model. In this case the original final linear layer still produces the 1000-dimensional output it was trained to produce on ImageNet, and the new layer maps those 1000 values to our classes. Usually, it is a good idea to retrain the fully connected layers, which brings us to the other method.

The other way is to remove the average pooling and the final linear layer, put adaptive concat pooling in their place, and add your own linear layers.
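The fixed-size restriction of the first approach can be checked with a little arithmetic. The sketch below is plain Python, not fastai or PyTorch code. It assumes the standard ResNet-34 layout: the stem (`conv1` and `maxpool`) shown in the printed model above, followed by three stride-2 stages with 3x3 kernels and padding 1. The helper names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling window along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet34_feature_map(size):
    """Side length of the last convolutional feature map for a square input.

    Assumes the usual ResNet-34 downsampling schedule (an assumption of this
    sketch, since the printed model above is truncated).
    """
    size = conv_out(size, kernel=7, stride=2, padding=3)   # conv1
    size = conv_out(size, kernel=3, stride=2, padding=1)   # maxpool
    for _ in range(3):                                     # layer2, layer3, layer4
        size = conv_out(size, kernel=3, stride=2, padding=1)
    return size


def flattened_after_avgpool7(size, channels=512):
    """Features reaching the final Linear layer after AvgPool2d(kernel_size=7, stride=1)."""
    fmap = resnet34_feature_map(size)
    pooled = conv_out(fmap, kernel=7, stride=1, padding=0)
    return channels * pooled * pooled


features_224 = flattened_after_avgpool7(224)   # 7x7 map pooled to 1x1
features_299 = flattened_after_avgpool7(299)   # 10x10 map pooled to 4x4
```

With a 224x224 input the original `Linear(in_features=512, ...)` layer receives the 512 features it expects, but with 299x299 the fixed 7x7 pooling leaves a 4x4 map, so the flattened size no longer matches. Adaptive pooling avoids this by always pooling down to 1x1.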

In [17]:
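# Small head stacked on the 1000-dim ImageNet output of the pretrained model
custom_head1 = nn.Sequential(nn.ReLU(), nn.BatchNorm1d(1000),
                             nn.Linear(in_features=1000, out_features=2),
                             nn.LogSoftmax(dim=1))  # dim=1: normalise over the class axis
model_ch1 = nn.Sequential(model, custom_head1)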

In [18]:
learn2_ch1 = ConvLearner.from_model_data(model_ch1, data)

In [19]:
learn2_ch1.fit(1e-2, 1, cycle_len=1, best_save_name='dc_ach1', metrics=[accuracy])

epoch      trn_loss   val_loss   accuracy                     
    0      0.070181   0.057458   0.976     



[array([0.05746]), 0.976]

It is also a good idea to look at the number of parameters in each layer. This is easily done with `learn.summary()`.

In [21]:
learn2_ch1.summary()

OrderedDict([('Conv2d-1',
              OrderedDict([('input_shape', [-1, 3, 224, 224]),
                           ('output_shape', [-1, 64, 112, 112]),
                           ('trainable', True),
                           ('nb_params', 9408)])),
             ('BatchNorm2d-2',
              OrderedDict([('input_shape', [-1, 64, 112, 112]),
                           ('output_shape', [-1, 64, 112, 112]),
                           ('trainable', True),
                           ('nb_params', 128)])),
             ('ReLU-3',
              OrderedDict([('input_shape', [-1, 64, 112, 112]),
                           ('output_shape', [-1, 64, 112, 112]),
                           ('nb_params', 0)])),
             ('MaxPool2d-4',
              OrderedDict([('input_shape', [-1, 64, 112, 112]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('nb_params', 0)])),
             ('Conv2d-5',
              OrderedDict([('input_shape', [-1, 64, 56, 56
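The first two `nb_params` values in the summary can be reproduced by hand, which is a useful sanity check when reading the output. The sketch below uses the standard parameter-count formulas in plain Python. The helper names are hypothetical and are not part of fastai or PyTorch.

```python
def conv2d_params(in_ch, out_ch, kernel, bias=True):
    """Weights of a square-kernel Conv2d, plus one bias per output channel if used."""
    return in_ch * out_ch * kernel * kernel + (out_ch if bias else 0)


def batchnorm_params(features):
    """Learnable scale and shift of a batch norm layer with affine=True."""
    return 2 * features


def linear_params(in_features, out_features, bias=True):
    """Weight matrix of a Linear layer, plus one bias per output if used."""
    return in_features * out_features + (out_features if bias else 0)


first_conv = conv2d_params(3, 64, 7, bias=False)   # Conv2d-1 in the summary
first_bn = batchnorm_params(64)                    # BatchNorm2d-2 in the summary

# Parameters of the custom head: BatchNorm1d(1000) followed by Linear(1000, 2)
head_ch1 = batchnorm_params(1000) + linear_params(1000, 2)
```

`first_conv` and `first_bn` come out as 9408 and 128, matching the `Conv2d-1` and `BatchNorm2d-2` entries above.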

Before we can apply the second method, we need to know the output dimension after `AdaptiveConcatPool2d` and `Flatten`. Note that we only need to do this for one input size, since adaptive pooling ensures the output size stays the same for any input size. To find it, we create a custom head that stops at the flatten layer, pass one minibatch through it, and note the output size.

In [23]:
custom_head2 = nn.Sequential(AdaptiveConcatPool2d(), Flatten())
model_ch2 = nn.Sequential(*list(children(model))[:-2], custom_head2)

In [25]:
learn2_ch2 = ConvLearner.from_model_data(model_ch2, data)

In [26]:
learn2_ch2.models.model(V(next(iter(data.trn_dl))[0]))

Variable containing:
  2.1928   0.2979   0.0000  ...    0.9831   0.2992   0.6577
  1.6296   3.7968   4.8847  ...    1.0180   0.6856   0.5800
  4.4176   6.9506   2.2004  ...    2.3628   0.2063   0.2136
           ...               ⋱              ...            
  5.5922   3.3013   0.2849  ...    0.5287   0.0443   1.3995
  4.4066   0.9158   1.2564  ...    1.0420   0.1587   0.7406
  5.9668   2.0380   0.0000  ...    2.5776   0.2262   0.6182
[torch.cuda.FloatTensor of size 64x1024 (GPU 0)]

So we now know that the output size is 1024.
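The value 1024 is twice the 512 channels of the last ResNet-34 block, because concat pooling keeps one max and one average per channel. A minimal pure-Python sketch of that idea follows. It is not fastai's `AdaptiveConcatPool2d`: the helper names are hypothetical, and the concatenation order (max first) is an assumption that does not affect the size.

```python
def concat_pool(feature_maps):
    """Pool each channel down to one max and one average, then concatenate.

    feature_maps: list of channels, each a list of rows of floats.
    Returns a flat list of 2 * len(feature_maps) values.
    """
    maxima, averages = [], []
    for channel in feature_maps:
        flat = [v for row in channel for v in row]
        maxima.append(max(flat))
        averages.append(sum(flat) / len(flat))
    return maxima + averages


def make_maps(channels, side):
    """Build dummy feature maps: `channels` square maps of size `side` x `side`."""
    return [[[float(c + r * side + col) for col in range(side)]
             for r in range(side)]
            for c in range(channels)]


tiny = concat_pool([[[1.0, 2.0], [3.0, 4.0]]])   # one 2x2 channel
pooled_7 = concat_pool(make_maps(512, 7))        # 512 channels of 7x7
pooled_10 = concat_pool(make_maps(512, 10))      # 512 channels of 10x10
```

Both `pooled_7` and `pooled_10` contain 1024 values, so the layers after the pooling see the same input size for either spatial resolution.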

To check that the output size is independent of the input shape, we create another data object with a different image size.

In [27]:
sz2 = 299
data2 = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz2))

In [30]:
learn2_ch2_d2 = ConvLearner.from_model_data(model_ch2, data2)

In [31]:
learn2_ch2_d2.models.model(V(next(iter(data2.trn_dl))[0]))

Variable containing:
  4.5496   0.1859   3.1584  ...    0.4355   0.3250   1.0967
  0.9409   2.3361  17.4928  ...    0.0437   1.1525   1.8423
  2.7002   2.8415   1.1543  ...    0.1919   0.3417   0.0063
           ...               ⋱              ...            
  3.2327   4.7135   2.4346  ...    0.6134   0.3094   0.2462
  1.0582   4.7659  12.3103  ...    0.3718   1.0124   0.8549
  3.3615   2.7063   0.9418  ...    0.9223   0.1614   0.3817
[torch.cuda.FloatTensor of size 64x1024 (GPU 0)]

Voila, the output size is again 1024.

Now we can create the rest of the custom head.

In [35]:
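# Same pattern as the head built by ConvLearner.pretrained, minus dropout
custom_head3 = nn.Sequential(AdaptiveConcatPool2d(), Flatten(), nn.BatchNorm1d(1024),
                            nn.Linear(in_features=1024, out_features=512), nn.ReLU(),
                            nn.BatchNorm1d(512), nn.Linear(in_features=512, out_features=2),
                            nn.LogSoftmax(dim=1))  # dim=1: normalise over the class axis
# Drop the original AvgPool2d and Linear layers, then attach the new head
model_ch3 = nn.Sequential(*list(children(model))[:-2], custom_head3)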

In [36]:
learn2_ch3 = ConvLearner.from_model_data(model_ch3, data)

In [38]:
learn2_ch3.fit(1e-2, 1, cycle_len=1, best_save_name='dc_ach3', metrics=[accuracy])

epoch      trn_loss   val_loss   accuracy                     
    0      0.06065    0.042667   0.983     



[array([0.04267]), 0.983]