
# Use a Pre-Trained Residual Network and Replace the Last FC Layer
The steps to use a pretrained network are:
1. Load a pretrained network, e.g. `torchvision.models.resnet18(pretrained=True)`
2. Save its parameters using `torch.save(net.state_dict(), path)`
3. Replace the last fully connected layer with one that matches your categories
4. Freeze all layers except the newly added ones
5. Train the network

In [1]:
import torch
import torch.nn as nn
from torchvision import models

In [2]:
torch.set_grad_enabled(False) # disables autograd globally (inference only); call torch.set_grad_enabled(True) before training

<torch.autograd.grad_mode.set_grad_enabled at 0x7fe52844c090>

## Call a pretrained network
From the torchvision library, download a pre-trained resnet18 and print the sum of the last layer's biases (this will come in handy later for verification).

In [3]:
resnet18 = models.resnet18(pretrained=True)

In [4]:
resnet18.fc.bias.sum() # sum of the last layer's biases; used later to verify loading

tensor(-5.9860e-05)

# Save and Load:

When saving a model for inference, it is only necessary to save the trained model’s learned parameters. Saving the model’s state_dict with the torch.save() function will give you the most flexibility for restoring the model later, which is why it is the recommended method for saving models.

There are two ways to load a pretrained model:
* load parameters only ----> preferred way
* load the whole model

In both cases you MUST call `model.eval()` before running inference to set the dropout and batchnorm layers to evaluation mode.
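To make the effect of evaluation mode concrete, here is a minimal sketch on a standalone `nn.Dropout` layer (not the ResNet itself; the tensor `x` and `p=0.5` are illustrative choices). In training mode elements are dropped at random and the survivors are rescaled; in evaluation mode the layer passes its input through unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()         # training mode: elements are zeroed at random
y_train = drop(x)    # survivors are scaled by 1 / (1 - p) = 2

drop.eval()          # evaluation mode: dropout is the identity
y_eval = drop(x)

print(sorted(y_train.unique().tolist()))  # values seen in training mode
print(torch.equal(y_eval, x))             # True
```

Forgetting `eval()` therefore makes predictions noisy even though the weights were loaded correctly.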


#### 1. Parameters only using `state_dict()`

Save

In [None]:
torch.save(resnet18.state_dict(),'./resnet18_dict.pt') # saves the state dictionary with all parameters

Load & set to evaluation.

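Set `strict=False` if the new network doesn't 100% match the loaded one. PyTorch will then ignore keys that are missing from, or unexpected in, the loaded state dictionary. Tensors whose key matches but whose shape differs still raise an error.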

In [None]:
resload = models.resnet18() # builds an untrained shell network
resload.load_state_dict(torch.load('./resnet18_dict.pt'),strict=False) # loads the parameters
resload.eval(); # sets batchnorm and dropout layers to evaluation mode

Compare the sum of the last layer's biases with the value printed for the original network

In [None]:
resload.fc.bias.sum()

#### 2. The whole model using `torch.save(model)`

Save

In [None]:
torch.save(resnet18,'./resnet18.pt')

Load & set to evaluation

In [None]:
newmodel = torch.load('./resnet18.pt') # on PyTorch 2.6+, pass weights_only=False to unpickle a whole model
newmodel.eval();

Compare the sum of the last layer's biases with the value printed for the original network

In [None]:
newmodel.fc.bias.sum()

## Replace the last `fc` layer

In [None]:
# same shape as the original head (512 inputs, 1000 ImageNet classes), but freshly initialized
resnet18.fc = nn.Linear(in_features=512, out_features=1000, bias=True)

In [None]:
# replace out_features with the number of your classes
resnet18.fc = nn.Linear(in_features=512, out_features=10, bias=True)

#### Freeze all imported layers so they don't train:

In [8]:
for i in resnet18.parameters():
    i.requires_grad = False

resnet18.fc.weight.requires_grad = True
resnet18.fc.bias.requires_grad = True

When you set up the optimizer, make sure it only receives the parameters that require gradients:

`optimizer = torch.optim.SGD(filter(lambda p: p.requires_grad, resnet18.parameters()), lr=lr)`

# Replace the last two layers, as fastai does

Per this forum thread: https://forums.fast.ai/t/what-is-the-distinct-usage-of-the-adaptiveconcatpool2d-layer/7600

Jeremy Howard (Admin)
Nov '17
I feel like it should help because we’re keeping more information… but does it really help?

Exactly that. Yes it does help. I came up with it during the Planet comp, then 2 weeks later a paper came out that mentioned it in an appendix. Sorry I don’t remember the paper.

It would make for an interesting blog post - you could test using concat pooling vs avg pooling vs max pooling for various datasets.

Imagine, for instance, the Planet satellite comp. You don’t want to know on average whether the pre-pooling cells have, say, a river, but whether any of them have a river - i.e. you want the max. But as to whether the image is ‘hazy’ (another of the labels in this comp), you really want to know whether it’s hazy on average. So by including both, we have access to both types of info.

In [6]:
class AdaptiveConcatPool2d(nn.Module): # concatenates the max-pooled and average-pooled outputs along the channel dimension
    def __init__(self, sz=None):
        super().__init__()
        sz = sz or (1,1)
        self.ap = nn.AdaptiveAvgPool2d(sz)
        self.mp = nn.AdaptiveMaxPool2d(sz)
    def forward(self, x): return torch.cat([self.mp(x), self.ap(x)], 1)
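To illustrate the "river" versus "hazy" intuition from the quote, here is a small sketch using the functional pooling calls on a made-up 4x4 feature map with one strong activation. The values are illustrative only. It also shows why the concatenated output has twice as many channels as the input.

```python
import torch
import torch.nn.functional as F

# One image, one channel, 4x4 feature map: a single strong activation
# (the "river") on an otherwise empty map.
fmap = torch.zeros(1, 1, 4, 4)
fmap[0, 0, 2, 3] = 8.0

max_feat = F.adaptive_max_pool2d(fmap, 1)   # 8.0: is the feature present anywhere?
avg_feat = F.adaptive_avg_pool2d(fmap, 1)   # 8 / 16 = 0.5: how strong is it on average?
both = torch.cat([max_feat, avg_feat], dim=1)

print(both.shape)             # torch.Size([1, 2, 1, 1])
print(both.flatten().tolist())  # [8.0, 0.5]
```

For resnet18 the last convolutional block produces 512 channels, so concat pooling yields 2 x 512 = 1024 features.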

In [15]:
avgpool = nn.Sequential(
                        AdaptiveConcatPool2d(1),
                        nn.Flatten(),
                        nn.BatchNorm1d(1024),
                        nn.Dropout(p=0.25),
                        nn.Linear(1024,512),
                        nn.ReLU(),
                        nn.BatchNorm1d(512),
                        nn.Dropout(p=0.5))
fc = nn.Linear(512,50)

Replace the last `avgpool` with the sequential block above and the final `fc` layer with the new linear layer

In [16]:
resnet18.avgpool = avgpool
resnet18.fc = fc

In [17]:
resnet18

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In [18]:
resnet18(torch.randn(10,3,224,224)).shape

torch.Size([10, 50])

In [20]:
len(list(filter(lambda a:a.requires_grad,resnet18.parameters())))

8

## Replace the first layer for grey images

The code below freezes all layers except the `BatchNorm2d` layers and the newly added layers.

Another way to freeze or unfreeze the whole network is to call `net.requires_grad_(False)` or `net.requires_grad_(True)`.

In [6]:
resnet = models.resnet18(pretrained=True)
resnet.requires_grad_(False) # freeze every parameter first
for node in resnet.modules():
    if isinstance(node, nn.BatchNorm2d):
        node.requires_grad_(True) # unfreeze both the weight and the bias of each BatchNorm2d

resnet.fc = nn.Sequential(nn.Linear(512,256),nn.BatchNorm1d(256),nn.ReLU(),nn.Dropout(0.25),nn.Linear(256,10))
resnet.conv1 = nn.Conv2d(1,64,7,stride=1,padding=3,bias=False)   

In [8]:
x = torch.randn(100,1,28,28)
y = resnet(x);y.shape

torch.Size([100, 10])

In [4]:
resnet

ResNet(
  (conv1): Conv2d(1, 64, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  