# Transfer learning

In this lab we will make use of pretrained models in order to boost performance on smaller datasets. For this experiment, we will be working with an AlexNet model pretrained on the Imagenet dataset in order to get a good accuracy score on the Caltech 101 dataset.

### Prerequisites

1. In order to perform the experiments, please download in advance the Caltech 101 dataset from https://drive.google.com/file/d/137RyRjvTBkBiIfeYBNZBtViDHQ6_Ewsp/view
2. In the working directory please create a folder named 'dataset' and a subfolder named 'caltech101' within it. Extract the dataset in the subfolder. The overall folder structure should look as follows: dataset/caltech101/101_ObjectCategories.
3. Install the torchvision module using 'conda install torchvision' if you have not done so already.

In [1]:
from tqdm.notebook import tqdm
# from tqdm import tqdm
import numpy as np
import torch
import torchvision
import warnings

warnings.filterwarnings('ignore')
NW = 4
BS = 128
device = torch.device('cuda:0')

Firstly, we will load the AlexNet model architecture using torchvision. All available models with their respective parameters can be found at: https://pytorch.org/vision/stable/models.html

In [2]:
model = torchvision.models.alexnet(pretrained=True)

In the first run we will just load the model architecture, without the pretrained weights. We can visualize the model architecture as follows:

In [3]:
model

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
 

Next, we will load the Caltech 101 dataset and apply the neccesary transformations on it. Afterwards, we will split the dataset into train, validation and test.

In this block of code, define the dataloaders for train, validation and test and try to iterate through the data. What happens? Try to fix the problem using a lambda transform: https://pytorch.org/vision/stable/transforms.html#generic-transforms

In [4]:
dataset = torchvision.datasets.Caltech101(
    './dataset',
    transform = torchvision.transforms.Compose([
        torchvision.transforms.PILToTensor(),
        torchvision.transforms.ConvertImageDtype(torch.float),
        torchvision.transforms.Resize((224, 224)),
        # add a lambda transform in order to fix the problem
        torchvision.transforms.Lambda(
#             lambda img: img if img.shape[0] == 3 else torch.cat((img,img,img), 0)
            lambda img: img if img.shape[0] == 3 else img.repeat(3, 1, 1)
        )
    ])
)
n_samples = len(dataset)
n_train_samples = int(.8 * n_samples)
n_val_samples = int(.1 * n_samples)
n_test_samples = n_samples - n_train_samples - n_val_samples

train_ds, val_ds, test_ds = torch.utils.data.random_split(dataset, [
    n_train_samples, n_val_samples, n_test_samples
])

# define dataloaders for train, validation and test
train_dl = torch.utils.data.DataLoader(train_ds, shuffle=True, batch_size=BS, num_workers=NW)
val_dl = torch.utils.data.DataLoader(val_ds, shuffle=True, batch_size=BS, num_workers=NW)
test_dl = torch.utils.data.DataLoader(test_ds, shuffle=True, batch_size=BS, num_workers=NW)
# iterate through the dataloaders
item = iter(train_dl).next()
print(item[0].shape)

torch.Size([128, 3, 224, 224])


With the dataset ready, it is now time to adapt the model architecture in order to fit our needs. Define a new classifier for the AlexNet model having the same structure, changing only the number of output neurons to 101.

In [8]:
model.classifier[6].out_features=101
for param in model.parameters():
    param.requires_grad = False
for param in model.classifier[6].parameters():
    param.requires_grad = True
# model

### Training the model

Define an Adam optimizer with a learining rate of 1e-4 and a cross entropy loss. Afterwards, train the model for 2 epochs. Note the results

In [9]:
optimizer = torch.optim.Adam(model.parameters(),lr=1e-3)
# define a Cross Entropy loss function
loss_func = torch.nn.CrossEntropyLoss()
model.to(device);

In [10]:
def train(model, optimizer, loss_func, train_dl, val_dl, epochs):
  
    for epoch in tqdm(range(epochs)):
        for img, label in tqdm(train_dl):
            
            optimizer.zero_grad()     
            
            img = img.to(device)
            label = label.to(device)
            
            output = model(img)
            
            loss = loss_func(output, label)
            loss.backward()
            optimizer.step()
        accs = []
        # return
        for batch in val_dl:

            img, label = batch
            img = img.to(device)
            label = label.to(device)
            with torch.no_grad():
                output = model(img)

            predict = output.argmax(1)
            acc = (predict == label).float().mean().detach().cpu().numpy()
            accs.append(acc)
    print(f"{np.mean(accs)* 100:.0f}%")

In [11]:
train(model, optimizer, loss_func, train_dl, val_dl, 2)

  0%|          | 0/2 [00:00<?, ?it/s]

  0%|          | 0/55 [00:00<?, ?it/s]

  0%|          | 0/55 [00:00<?, ?it/s]

74%


## Experiments:

1. Rerun training (restart kernel and run all cells) but this time, when loading the model in the first block of code, specify 'pretrained = True' in order to make use of the weights pretrained on Imagenet.
2. Rerun the code using the pretrained model but this time use a learning rate of 1e-3. What happens?
3. Rerun using the pretrained model and a lr of 1e-4 but this time only change the last layer in the model instead of the entire classifier.
3. Rerun the code using the pretrained model and a lr of 1e-4. This time, freeze the pretrained layers and only update the new layers for the first epochs. Afterwards, proceed to update the entire model. You can freeze parameters by specifying 'requires_grad = False'.
4. Rerun experiment 3 but gradually unfreeze layers instead of unfreezeing the entire model at once.

1. acc = 79%
2. acc = 10%
3. acc  = 74%
4. acc
5. acc