# How to use a custom model with fastai cnn_learner?
- toc: false 
- badges: true
- categories: [fastai]
- comments: true

fastai's `cnn_learner` performs several steps to build a model from a pretrained architecture such as `resnet18`.
First, it looks up the architecture in the `model_meta` registry, which holds metadata such as the statistics of the dataset the model was pretrained on and the index at which to split the network into backbone and head. For instance,

```py
def _xresnet_split(m): return L(m[0][:3], m[0][3:], m[1:]).map(params)

model_meta = {
    models.xresnet.xresnet18 :{'cut':-4, 'split':_xresnet_split, 'stats':imagenet_stats},
    ...
}
```
The `cut` value is the index at which the existing classification head is stripped off, so that we can attach a custom head and fine-tune it for our task.
The `split` function divides the model's parameters into groups so that discriminative learning rates can be applied, i.e. different layers are trained with different learning rates.
The `stats` are the per-channel means and standard deviations of the ImageNet images on which the model was pretrained.
Many CNN architectures are already registered in fastai. 
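
To make the role of `stats` concrete, here is a minimal plain-Python sketch of per-channel normalization. It assumes the pixel values are already scaled to `[0, 1]`; the helper `normalize_pixel` is a hypothetical name for illustration, not a fastai function.

```py
# ImageNet channel statistics in RGB order (the values fastai exposes as `imagenet_stats`).
imagenet_mean = [0.485, 0.456, 0.406]
imagenet_std = [0.229, 0.224, 0.225]

def normalize_pixel(rgb, mean, std):
    """Hypothetical helper: normalize one RGB pixel with values in [0, 1]."""
    return [(value - m) / s for value, m, s in zip(rgb, mean, std)]

# A pixel equal to the dataset mean maps to zero in every channel.
print(normalize_pixel(imagenet_mean, imagenet_mean, imagenet_std))  # [0.0, 0.0, 0.0]

# A pure white pixel ends up above zero in every channel.
print(normalize_pixel([1.0, 1.0, 1.0], imagenet_mean, imagenet_std))
```

Using the statistics of the pretraining dataset keeps the inputs in the range the pretrained weights were fitted to.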

There are two ways to use a custom model that is not in the model registry:
1. Create a new helper function similar to `cnn_learner` that splits the network into backbone and head.
2. Register the architecture in `model_meta`.

We prefer the second way since it is easier and shorter.
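
The reason registration is enough can be sketched with a plain dictionary. This is a toy mimic, not fastai code: `toy_model_meta`, `toy_default_meta`, `lookup_meta` and the two architecture functions are hypothetical names, and the fallback to a default entry for unregistered architectures is an assumption about how the lookup behaves.

```py
def registered_arch():
    """Hypothetical stand-in for an architecture function such as `resnet18`."""
    return ["stem", "stage1", "pool", "fc"]

def custom_arch():
    """Hypothetical stand-in for an architecture that is not registered yet."""
    return ["features", "pool", "classifier"]

# Toy registry keyed by the architecture function object itself.
toy_model_meta = {registered_arch: {"cut": -2, "split": "toy_split", "stats": "toy_stats"}}

# Assumed fallback used when an architecture has no registry entry.
toy_default_meta = {"cut": None, "split": "toy_default_split"}

def lookup_meta(arch):
    """Return the registry entry for `arch`, or the default entry if there is none."""
    return toy_model_meta.get(arch, toy_default_meta)

before = lookup_meta(custom_arch)   # falls back to the default entry
toy_model_meta[custom_arch] = {"cut": -2, "split": "toy_default_split", "stats": "toy_stats"}
after = lookup_meta(custom_arch)    # now finds the registered entry

print(before)
print(after)
```

Because the key is the function object, the same function must be passed as the architecture when the learner is created.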

In [17]:
from fastai.vision.all import *

Let's first inspect an architecture registered already, e.g. `resnet18`.

Here is its model meta data from the registry:

In [23]:
model_meta[resnet18]

{'cut': -2,
 'split': <function fastai.vision.learner._resnet_split(m)>,
 'stats': ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])}

And the model layers:

In [45]:
m = resnet18()
children = list(m.children())
print(f'There are {len(children)} children modules of network')
children

There are 10 children modules of network


[Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False),
 BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
 ReLU(inplace=True),
 MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False),
 Sequential(
   (0): BasicBlock(
     (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
     (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
     (relu): ReLU(inplace=True)
     (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
     (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
   )
   (1): BasicBlock(
     (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
     (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
     (relu): ReLU(inplace=True)
     (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), pad

`cnn_learner` calls `create_cnn_model`, which in turn calls `create_body` to strip off the head at the `cut` index:

```py
...
if   isinstance(cut, int): return nn.Sequential(*list(model.children())[:cut])
...
```


In our case, this removes the last two layers of `resnet18`: the `AdaptiveAvgPool2d` layer and the fully connected `Linear` layer.
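
The effect of an integer `cut` can be mimicked with a plain list, without loading any model. The sketch below uses the attribute names of the ten top-level children of torchvision's `resnet18` in place of the modules themselves.

```py
# Names of the ten top-level children of torchvision's `resnet18`, in order.
resnet18_children = [
    "conv1", "bn1", "relu", "maxpool",
    "layer1", "layer2", "layer3", "layer4",
    "avgpool", "fc",
]

cut = -2  # the value stored in the registry for `resnet18`

body = resnet18_children[:cut]     # what the slice keeps
removed = resnet18_children[cut:]  # what is stripped off

print(body)
print(removed)  # ['avgpool', 'fc']
```

A negative `cut` counts from the end, so `-2` always drops the last two children regardless of how many precede them.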

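The pooling layer is removed along with the classifier because the head that `cnn_learner` attaches starts with its own pooling layer (`AdaptiveConcatPool2d`), as the head printed at the end of this post shows.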

In [26]:
body = create_body(resnet18, pretrained=False, cut=model_meta[resnet18]['cut'])
body

Sequential(
  (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU(inplace=True)
  (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Con

Similarly, we need to determine the `cut` index for our custom model. Let's do so with the `EfficientNetB0` network from the `torchvision` library. First, we inspect its layers to find where to split it into backbone and head.

In [27]:
from torchvision.models import efficientnet_b0
# `efficientnet_b0` is not in `model_meta` yet; we register it once we know its `cut` index.

In [46]:
m = efficientnet_b0()
children = list(m.children())
print(f'There are {len(children)} children modules of network')
children

There are 3 children modules of network


[Sequential(
   (0): ConvNormActivation(
     (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
     (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
     (2): SiLU(inplace=True)
   )
   (1): Sequential(
     (0): MBConv(
       (block): Sequential(
         (0): ConvNormActivation(
           (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
           (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
           (2): SiLU(inplace=True)
         )
         (1): SqueezeExcitation(
           (avgpool): AdaptiveAvgPool2d(output_size=1)
           (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
           (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
           (activation): SiLU(inplace=True)
           (scale_activation): Sigmoid()
         )
         (2): ConvNormActivation(
           (0): Conv2d(32, 16, kernel_size=(1, 1)

The network has three children: `features`, `avgpool`, and `classifier`. The pooling layer is at index `-2`, so that is our `cut` value. We'll use `default_split` as the `split` function and the ImageNet statistics as the `stats` for `EfficientNetB0`.

In [49]:
from fastai.vision.learner import default_split
model_meta[efficientnet_b0] = {'cut': -2, 'split': default_split, 'stats': imagenet_stats}
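
For intuition on what this entry implies, here is a plain-Python sketch that uses child names instead of real modules. It assumes `default_split` produces two parameter groups, one for the body and one for everything after it; `toy_default_split`, the `new_head` placeholder and the learning rates are hypothetical and chosen only for illustration.

```py
# Child names of torchvision's `efficientnet_b0`, in order.
children = ["features", "avgpool", "classifier"]

cut = -2
body = children[:cut]  # what is kept as the backbone
head = ["new_head"]    # hypothetical placeholder for the head that gets attached

# Mimic of a two-part model: element 0 is the body, the rest is the head.
model = [body, head]

def toy_default_split(m):
    """Hypothetical stand-in for `default_split`: one group for m[0], one for m[1:]."""
    rest = [name for part in m[1:] for name in part]
    return [m[0], rest]

groups = toy_default_split(model)

# Arbitrary illustrative learning rates: smaller for the body, larger for the head.
learning_rates = [1e-4, 1e-3]
for group, lr in zip(groups, learning_rates):
    print(f"lr={lr}: {group}")
```

With only two groups, discriminative learning rates reduce to training the pretrained body more gently than the freshly initialized head.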

Now that our custom architecture is registered, we can create a learner with `cnn_learner`.

In [57]:
path = untar_data(URLs.PETS)
files = get_image_files(path/"images")
def label_func(f): return f[0].isupper()
dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(224))
learn = cnn_learner(dls, arch=efficientnet_b0)

Let's verify that the body and head are created correctly.

Body:

In [74]:
list(learn.model.children())[:-1]

[Sequential(
   (0): Sequential(
     (0): ConvNormActivation(
       (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
       (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
       (2): SiLU(inplace=True)
     )
     (1): Sequential(
       (0): MBConv(
         (block): Sequential(
           (0): ConvNormActivation(
             (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
             (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
             (2): SiLU(inplace=True)
           )
           (1): SqueezeExcitation(
             (avgpool): AdaptiveAvgPool2d(output_size=1)
             (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
             (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
             (activation): SiLU(inplace=True)
             (scale_activation): Sigmoid()
           )
           (2): ConvNor

Head:

In [75]:
list(learn.model.children())[-1]

Sequential(
  (0): AdaptiveConcatPool2d(
    (ap): AdaptiveAvgPool2d(output_size=1)
    (mp): AdaptiveMaxPool2d(output_size=1)
  )
  (1): Flatten(full=False)
  (2): BatchNorm1d(2560, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (3): Dropout(p=0.25, inplace=False)
  (4): Linear(in_features=2560, out_features=512, bias=False)
  (5): ReLU(inplace=True)
  (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (7): Dropout(p=0.5, inplace=False)
  (8): Linear(in_features=512, out_features=2, bias=False)
)

As shown above, `cnn_learner` created a new head that starts with a pooling layer while keeping the backbone of the pretrained model.
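
The `2560` input features of the first `BatchNorm1d` are consistent with concatenating an average-pooled and a max-pooled vector over the 1280 channels that the `EfficientNetB0` backbone outputs. Below is a minimal plain-Python sketch of that idea on a tiny two-channel feature map; `concat_pool` is a hypothetical helper, and the order of the two halves is an assumption.

```py
def concat_pool(feature_map):
    """Hypothetical helper: global max pool and global average pool, concatenated.

    `feature_map` is a list of channels, each a 2-D list of floats.
    The result is a flat list of length 2 * number_of_channels.
    """
    max_part = [max(max(row) for row in channel) for channel in feature_map]
    avg_part = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    return max_part + avg_part

# Two channels of size 2x2.
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]

pooled = concat_pool(feature_map)
print(pooled)       # [4.0, 8.0, 2.5, 2.0]
print(len(pooled))  # 4, twice the number of channels

# The same doubling applied to 1280 backbone channels gives the head's input size.
print(2 * 1280)     # 2560
```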