# More CNN Architectures

The CNN models in [this notebook (cnn_archs)](cnn_archs.ipynb) are all from [torchvision](https://pytorch.org/docs/stable/torchvision/index.html) and [Cadene](https://github.com/Cadene/pretrained-models.pytorch). Here's another [great repo (pytorchcv)](https://github.com/osmr/imgclsmob/tree/master/pytorch) that provides an even more comprehensive list of models. Moreover, its implementation is _much_ easier to use in fastai than Cadene's. For example, the model body is encapsulated in `features`, which is a `Sequential`. With this, for fastai's `cut`, we can just take the `features` part; for fastai's `split`, since it's a `Sequential`, we can just slice it!

Here are some examples.

In [1]:
from pytorchcv.model_provider import get_model as ptcv_get_model
from torchvision.models import *

from fastai.vision import *
from fastai.vision.models import *
from fastai.vision.learner import model_meta
from fastai.layers import Flatten

from utils import *
import sys

The fastai version used here is:

In [2]:
__version__

'1.0.53.dev0'

Load the model:

In [66]:
m = ptcv_get_model("resnet18", pretrained=False)

The model body is in the `features`:

In [7]:
arch_summary(lambda _: m.features)

(0) ResInitBlock: 4   layers (total: 4)
(1) Sequential  : 12  layers (total: 16)
(2) Sequential  : 14  layers (total: 30)
(3) Sequential  : 14  layers (total: 44)
(4) Sequential  : 14  layers (total: 58)
(5) AvgPool2d   : 1   layers (total: 59)


The model head is in the `output`:

In [9]:
m.output

Linear(in_features=512, out_features=1000, bias=True)

## When `features` are `Sequential`

When `features` is a `Sequential`, fastai's `split` is very straightforward: just slice it! Here are some examples.

### inceptionv3

In [10]:
m = ptcv_get_model("inceptionv3", pretrained=False)

In [11]:
arch_summary(lambda _: m.features)

(0) InceptInitBlock: 17  layers (total: 17)
(1) Sequential  : 66  layers (total: 83)
(2) Sequential  : 137 layers (total: 220)
(3) Sequential  : 75  layers (total: 295)
(4) AvgPool2d   : 1   layers (total: 296)


In [19]:
for i in range(4):
    print(f'---------({i})---------')
    arch_summary(lambda _: m.features[i])

---------(0)---------
(0) InceptConv  : 3   layers (total: 3)
(1) InceptConv  : 3   layers (total: 6)
(2) InceptConv  : 3   layers (total: 9)
(3) MaxPool2d   : 1   layers (total: 10)
(4) InceptConv  : 3   layers (total: 13)
(5) InceptConv  : 3   layers (total: 16)
(6) MaxPool2d   : 1   layers (total: 17)
---------(1)---------
(0) InceptionAUnit: 22  layers (total: 22)
(1) InceptionAUnit: 22  layers (total: 44)
(2) InceptionAUnit: 22  layers (total: 66)
---------(2)---------
(0) ReductionAUnit: 13  layers (total: 13)
(1) InceptionBUnit: 31  layers (total: 44)
(2) InceptionBUnit: 31  layers (total: 75)
(3) InceptionBUnit: 31  layers (total: 106)
(4) InceptionBUnit: 31  layers (total: 137)
---------(3)---------
(0) ReductionBUnit: 19  layers (total: 19)
(1) InceptionCUnit: 28  layers (total: 47)
(2) InceptionCUnit: 28  layers (total: 75)


In [34]:
def inceptionv3(pretrained=False):
    # Forward the flag instead of hard-coding False, and return only the body.
    return ptcv_get_model("inceptionv3", pretrained=pretrained).features

We have already cut the head off, so we don't need to do anything for `cut`. Now create the learner and split at layer (3):

In [37]:
learn = cnn_learner(FakeData(), inceptionv3, pretrained=False,
                    cut=noop, split_on=lambda m: (m[0][3], m[1]))

To check that the cut and split work as expected, we extract the groups:

In [38]:
get_groups(nn.Sequential(*learn.model[0], *learn.model[1]), learn.layer_groups)

Group 1: ['InceptInitBlock', 'Sequential', 'Sequential']
Group 2: ['Sequential', 'AvgPool2d']
Group 3: ['AdaptiveConcatPool2d', 'Flatten', 'BatchNorm1d', 'Dropout', 'Linear', 'ReLU', 'BatchNorm1d', 'Dropout', 'Linear']


The customization works as expected. Isn't it much easier than what we did in [this notebook](cnn_archs.ipynb)?
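The grouping shown above can be reproduced with plain Python lists. This is only a minimal sketch of the idea behind splitting at an index; `split_at` is a hypothetical helper written for illustration, not a fastai function, and the layer names are copied from the summary of `m.features`.

```python
def split_at(layers, idxs):
    """Split a flat list of layers into consecutive groups starting at the given indices."""
    bounds = [0] + list(idxs) + [len(layers)]
    return [layers[i:j] for i, j in zip(bounds[:-1], bounds[1:])]

# Top-level children of the inceptionv3 body, as printed by arch_summary.
body = ["InceptInitBlock", "Sequential", "Sequential", "Sequential", "AvgPool2d"]

# Splitting at index 3 corresponds to passing m[0][3] in split_on.
groups = split_at(body, [3])
for n, group in enumerate(groups, 1):
    print(f"Group {n}: {group}")
```

The head that fastai appends then forms the last group, which is why `split_on` also returns `m[1]`.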

### EfficientNet

In [46]:
m = ptcv_get_model("efficientnet_b0", pretrained=False)

In [47]:
m.features.__class__

torch.nn.modules.container.Sequential

In [32]:
arch_summary(lambda _: m.features)

(0) EffiInitBlock: 3   layers (total: 3)
(1) Sequential  : 10  layers (total: 13)
(2) Sequential  : 26  layers (total: 39)
(3) Sequential  : 26  layers (total: 65)
(4) Sequential  : 78  layers (total: 143)
(5) Sequential  : 65  layers (total: 208)
(6) ConvBlock   : 3   layers (total: 211)
(7) AdaptiveAvgPool2d: 1   layers (total: 212)


In [33]:
for i in range(6):
    print(f'---------({i})---------')
    arch_summary(lambda _: m.features[i])

---------(0)---------
(0) ConvBlock   : 3   layers (total: 3)
---------(1)---------
(0) EffiDwsConvUnit: 10  layers (total: 10)
---------(2)---------
(0) EffiInvResUnit: 13  layers (total: 13)
(1) EffiInvResUnit: 13  layers (total: 26)
---------(3)---------
(0) EffiInvResUnit: 13  layers (total: 13)
(1) EffiInvResUnit: 13  layers (total: 26)
---------(4)---------
(0) EffiInvResUnit: 13  layers (total: 13)
(1) EffiInvResUnit: 13  layers (total: 26)
(2) EffiInvResUnit: 13  layers (total: 39)
(3) EffiInvResUnit: 13  layers (total: 52)
(4) EffiInvResUnit: 13  layers (total: 65)
(5) EffiInvResUnit: 13  layers (total: 78)
---------(5)---------
(0) EffiInvResUnit: 13  layers (total: 13)
(1) EffiInvResUnit: 13  layers (total: 26)
(2) EffiInvResUnit: 13  layers (total: 39)
(3) EffiInvResUnit: 13  layers (total: 52)
(4) EffiInvResUnit: 13  layers (total: 65)


In [48]:
def efficientnet_b0(pretrained=False):
    # Forward the flag instead of hard-coding False, and return only the body.
    return ptcv_get_model("efficientnet_b0", pretrained=pretrained).features

Create the learner and split at layer (4):

In [49]:
learn = cnn_learner(FakeData(), efficientnet_b0, pretrained=False,
                    cut=noop, split_on=lambda m: (m[0][4], m[1]))

To check that the cut and split work as expected, we extract the groups:

In [50]:
get_groups(nn.Sequential(*learn.model[0], *learn.model[1]), learn.layer_groups)

Group 1: ['EffiInitBlock', 'Sequential', 'Sequential', 'Sequential']
Group 2: ['Sequential', 'Sequential', 'ConvBlock', 'AdaptiveAvgPool2d']
Group 3: ['AdaptiveConcatPool2d', 'Flatten', 'BatchNorm1d', 'Dropout', 'Linear', 'ReLU', 'BatchNorm1d', 'Dropout', 'Linear']


The customization works as expected.

## When `features` are not `Sequential`

Actually, in the author's implementation, `features` is always a `Sequential`. But in the following cases, it is a subclass of `Sequential` with a customized `forward` method. We can still split by slicing, but because of a hardcoded step in fastai, we'll run into an error. Let's take a look.

### NASNet-A-Mobile

In [20]:
m = ptcv_get_model("nasnet_4a1056", pretrained=False)

In [43]:
m.features.__class__

pytorchcv.models.common.DualPathSequential

In [25]:
arch_summary(lambda _: m.features)

(0) NASNetInitBlock: 2   layers (total: 2)
(1) Stem1Unit   : 47  layers (total: 49)
(2) Stem2Unit   : 62  layers (total: 111)
(3) DualPathSequential: 200 layers (total: 311)
(4) DualPathSequential: 258 layers (total: 569)
(5) DualPathSequential: 258 layers (total: 827)
(6) ReLU        : 1   layers (total: 828)
(7) AvgPool2d   : 1   layers (total: 829)


In [26]:
for i in range(6):
    print(f'---------({i})---------')
    arch_summary(lambda _: m.features[i])

---------(0)---------
(0) Conv2d      : 1   layers (total: 1)
(1) BatchNorm2d : 1   layers (total: 2)
---------(1)---------
(0) NasConv     : 3   layers (total: 3)
(1) DwsBranch   : 8   layers (total: 11)
(2) DwsBranch   : 8   layers (total: 19)
(3) NasMaxPoolBlock: 1   layers (total: 20)
(4) DwsBranch   : 8   layers (total: 28)
(5) AvgPool2d   : 1   layers (total: 29)
(6) DwsBranch   : 8   layers (total: 37)
(7) AvgPool2d   : 1   layers (total: 38)
(8) DwsBranch   : 8   layers (total: 46)
(9) NasMaxPoolBlock: 1   layers (total: 47)
---------(2)---------
(0) NasConv     : 3   layers (total: 3)
(1) NasPathBlock: 7   layers (total: 10)
(2) DwsBranch   : 9   layers (total: 19)
(3) DwsBranch   : 9   layers (total: 28)
(4) NasMaxPoolBlock: 2   layers (total: 30)
(5) DwsBranch   : 9   layers (total: 39)
(6) NasAvgPoolBlock: 2   layers (total: 41)
(7) DwsBranch   : 9   layers (total: 50)
(8) AvgPool2d   : 1   layers (total: 51)
(9) DwsBranch   : 9   layers (total: 60)
(10) NasMaxPoolBlock: 2 

In [52]:
m.output

Sequential(
  (dropout): Dropout(p=0.5)
  (fc): Linear(in_features=1056, out_features=1000, bias=True)
)

In [21]:
output_size = m.output[1].in_features

In [22]:
def nasnetamobile(pretrained=False):
    # Forward the flag instead of hard-coding False, and return only the body.
    return ptcv_get_model("nasnet_4a1056", pretrained=pretrained).features

In this case, we can still take the `features` part, but we can't use it directly in fastai.

Why? Because in fastai's `create_cnn_model`, if `custom_head` is not given, it converts the body into a `Sequential` to determine the number of features. (This is not really necessary; there could be an option that lets the user set the number of features `nf` directly.)

The body here is a `DualPathSequential`, whose `forward` takes two parameters, so it won't work after the conversion. We must prevent fastai from reaching this line. Right now, the only way seems to be to define a `custom_head` so that this line is bypassed. In the following example, a simple head is used for demonstration. You can use a more complicated head, like the one fastai adds by default.
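To make the failure mode concrete, here is a toy sketch in plain Python. `ToySequential`, `ToyDualPath` and `dual_unit` are hypothetical stand-ins invented for this illustration; they are not the real `nn.Sequential` or pytorchcv's `DualPathSequential`, and only mimic the calling convention described above.

```python
class ToySequential:
    """Stand-in for a plain sequential container: each child takes one input."""
    def __init__(self, *children):
        self.children = list(children)

    def __call__(self, x):
        for child in self.children:
            x = child(x)
        return x


class ToyDualPath(ToySequential):
    """Stand-in for a container whose children take and return two values."""
    def __call__(self, x, x_prev=None):
        for child in self.children:
            x, x_prev = child(x, x_prev)
        return x


def dual_unit(x, x_prev):
    # A two-input child: combines the current and the previous activation.
    prev = x if x_prev is None else x_prev
    return x + prev, x


body = ToyDualPath(dual_unit, dual_unit)
result = body(1)  # works: the container passes both values to every child

# Re-wrapping the children in a plain container loses the custom call logic.
rewrapped = ToySequential(*body.children)
try:
    rewrapped(1)
    failed = False
except TypeError:
    failed = True  # each child is now called with a single argument

print(result, failed)
```

The same thing happens, in spirit, when the children of a dual-path body are placed in an ordinary `Sequential`: the children still expect two inputs, but the new container only passes one.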

Create the learner and split at layer (4):

In [56]:
data = FakeData()

In [23]:
learn = cnn_learner(data, nasnetamobile, pretrained=False,
                    cut=noop, split_on=lambda m: (m[0][4], m[1]),
                    custom_head=nn.Sequential(Flatten(), nn.Linear(output_size, data.c)))

To check that the cut and split work as expected, we extract the groups:

In [25]:
get_groups(nn.Sequential(*learn.model[0], *learn.model[1]), learn.layer_groups)

Group 1: ['NASNetInitBlock', 'Stem1Unit', 'Stem2Unit', 'DualPathSequential']
Group 2: ['DualPathSequential', 'DualPathSequential', 'ReLU', 'AvgPool2d']
Group 3: ['Flatten', 'Linear']


The customization works as expected. You can add a more complicated head here.

Another way is to modify fastai's `create_cnn_model` so that it takes `nf` as an input, rather than making the body a `Sequential` just to calculate the number of features. This could be an improvement for fastai.

In [26]:
with torch.no_grad():
    learn.model.eval()
    print(learn.model(torch.randn(1,3,224,224)))

tensor([[-0.1088,  0.0262]])


### PNASNet-5-Large

One more example, which is just the same as `NASNet-A-Mobile`:

In [2]:
m = ptcv_get_model("pnasnet5large", pretrained=False)

In [45]:
m.features.__class__

pytorchcv.models.common.DualPathSequential

In [28]:
arch_summary(lambda _: m.features)

(0) NASNetInitBlock: 2   layers (total: 2)
(1) Stem1Unit   : 59  layers (total: 61)
(2) DualPathSequential: 296 layers (total: 357)
(3) DualPathSequential: 243 layers (total: 600)
(4) DualPathSequential: 235 layers (total: 835)
(5) ReLU        : 1   layers (total: 836)
(6) AvgPool2d   : 1   layers (total: 837)


In [30]:
for i in range(5):
    print(f'---------({i})---------')
    arch_summary(lambda _: m.features[i])

---------(0)---------
(0) Conv2d      : 1   layers (total: 1)
(1) BatchNorm2d : 1   layers (total: 2)
---------(1)---------
(0) NasConv     : 3   layers (total: 3)
(1) DwsBranch   : 8   layers (total: 11)
(2) PnasMaxPathBlock: 3   layers (total: 14)
(3) DwsBranch   : 8   layers (total: 22)
(4) PnasMaxPoolBlock: 1   layers (total: 23)
(5) DwsBranch   : 8   layers (total: 31)
(6) DwsBranch   : 8   layers (total: 39)
(7) DwsBranch   : 8   layers (total: 47)
(8) PnasMaxPoolBlock: 1   layers (total: 48)
(9) DwsBranch   : 8   layers (total: 56)
(10) NasConv     : 3   layers (total: 59)
---------(2)---------
(0) PnasUnit    : 64  layers (total: 64)
(1) PnasUnit    : 61  layers (total: 125)
(2) PnasUnit    : 57  layers (total: 182)
(3) PnasUnit    : 57  layers (total: 239)
(4) PnasUnit    : 57  layers (total: 296)
---------(3)---------
(0) PnasUnit    : 68  layers (total: 68)
(1) PnasUnit    : 61  layers (total: 129)
(2) PnasUnit    : 57  layers (total: 186)
(3) PnasUnit    : 57  layers (total

In [61]:
m.output

Sequential(
  (dropout): Dropout(p=0.5)
  (fc): Linear(in_features=4320, out_features=1000, bias=True)
)

In [16]:
output_size = m.output[1].in_features

In [4]:
def pnasnet5large(pretrained=False):
    # Forward the flag instead of hard-coding False, and return only the body.
    return ptcv_get_model("pnasnet5large", pretrained=pretrained).features

Create the learner and split at layer (3):

In [5]:
data = FakeData()

In [17]:
learn = cnn_learner(data, pnasnet5large, pretrained=False,
                    cut=noop, split_on=lambda m: (m[0][3], m[1]),
                    custom_head=nn.Sequential(Flatten(), nn.Linear(output_size, data.c)))

To check that the cut and split work as expected, we extract the groups:

In [18]:
get_groups(nn.Sequential(*learn.model[0], *learn.model[1]), learn.layer_groups)

Group 1: ['NASNetInitBlock', 'Stem1Unit', 'DualPathSequential']
Group 2: ['DualPathSequential', 'DualPathSequential', 'ReLU', 'AvgPool2d']
Group 3: ['Flatten', 'Linear']


The customization works as expected.

In [19]:
with torch.no_grad():
    learn.model.eval()
    print(learn.model(torch.randn(1,3,331,331)))

tensor([[0.0053, 0.0588]])


### Test

In [21]:
from pytorchcv.models.nasnet import nasnet_dual_path_sequential

In [50]:
nasnet = ptcv_get_model("nasnet_4a1056", pretrained=False)

In [54]:
arch_summary(lambda _: nasnet.features)

(0) NASNetInitBlock: 2   layers (total: 2)
(1) Stem1Unit   : 47  layers (total: 49)
(2) Stem2Unit   : 62  layers (total: 111)
(3) DualPathSequential: 200 layers (total: 311)
(4) DualPathSequential: 258 layers (total: 569)
(5) DualPathSequential: 258 layers (total: 827)
(6) ReLU        : 1   layers (total: 828)
(7) AvgPool2d   : 1   layers (total: 829)


In this example, I cut out layers (4) and (5):

In [63]:
features = nasnet_dual_path_sequential(
    return_two=False,
    first_ordinals=1,
    last_ordinals=2)

In [64]:
for m in list(nasnet.features.children())[:4]:
    features.add_module(m.__class__.__name__, m)

In [65]:
for m in list(nasnet.features.children())[-2:]:
    features.add_module(m.__class__.__name__, m)

In [59]:
arch_summary(lambda _: features)

(0) NASNetInitBlock: 2   layers (total: 2)
(1) Stem1Unit   : 47  layers (total: 49)
(2) Stem2Unit   : 62  layers (total: 111)
(3) DualPathSequential: 200 layers (total: 311)
(4) ReLU        : 1   layers (total: 312)
(5) AvgPool2d   : 1   layers (total: 313)
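One caveat about registering children under `m.__class__.__name__`: it relies on the kept children having distinct class names, because registering a second module under an existing name replaces the first one. The sketch below mimics this with an `OrderedDict` of names only; `register_by_class_name` is a hypothetical helper for illustration, and no real modules are involved.

```python
from collections import OrderedDict

# Class names of the NASNet-A-Mobile body's children, from the summary above.
children = ["NASNetInitBlock", "Stem1Unit", "Stem2Unit",
            "DualPathSequential", "DualPathSequential", "DualPathSequential",
            "ReLU", "AvgPool2d"]

def register_by_class_name(names):
    """Mimic registering children keyed by class name: a repeated key replaces the earlier entry."""
    modules = OrderedDict()
    for name in names:
        modules[name] = name
    return list(modules)

# First four plus last two children, as in the cells above: all names are unique.
kept = register_by_class_name(children[:4] + children[-2:])
print(kept)

# Keeping one more stage would register 'DualPathSequential' twice and lose one.
collapsed = register_by_class_name(children[:5] + children[-2:])
print(len(collapsed))
```

If more than one child of the same class has to be kept, using the original child names or an index as the key avoids the collision.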


In [3]:
data = FakeData()

In [74]:
learn = cnn_learner(data, lambda _: features, pretrained=False,
                    cut=noop, split_on=lambda m: (m[0][3], m[1]),
                    custom_head=nn.Sequential(Flatten(), nn.Linear(127776, data.c)))

You can add your own custom head. The "127776" is hard-coded here; it corresponds to an input image size of 224.

In [75]:
get_groups(nn.Sequential(*learn.model[0], *learn.model[1]), learn.layer_groups)

Group 1: ['NASNetInitBlock', 'Stem1Unit', 'Stem2Unit']
Group 2: ['DualPathSequential', 'ReLU', 'AvgPool2d']
Group 3: ['Flatten', 'Linear']


In [73]:
with torch.no_grad():
    learn.model.eval()
    print(learn.model(torch.randn(1,3,224,224)))

tensor([[-0.4559,  0.7447]])


In [70]:
with torch.no_grad():
    learn.model.eval()
    print(Flatten()(features(torch.randn(1,3,224,224))).size())

torch.Size([1, 127776])


This is where the "127776" comes from.
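As a sanity check on the arithmetic, the flattened size is simply channels × height × width of the body's output. The shapes below are assumptions, not values stated in this notebook: they are one combination consistent with 127776, namely 264 channels on a 28×28 feature map followed by an unpadded 7×7, stride-1 average pool. `pool_out` is a hypothetical helper.

```python
def pool_out(size, kernel, stride=1):
    """Output length of an unpadded pooling window along one spatial dimension."""
    return (size - kernel) // stride + 1

# Assumed shapes (not confirmed by the notebook): 264 channels, 28x28 map,
# then a 7x7 average pool with stride 1.
channels, height, width = 264, 28, 28
h, w = pool_out(height, 7), pool_out(width, 7)

flat = channels * h * w  # size seen by Flatten() followed by nn.Linear
print(h, w, flat)
```

Because this number changes with the input resolution, measuring it with a dummy forward pass, as done in the cell above, is safer than deriving it by hand.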