
# Transfer Learning for Computer Vision Tutorial

Adapted from: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html


**Author**: [Sasank Chilamkurthy](https://chsasank.github.io)

In this tutorial, you will learn how to train a convolutional neural network for image classification using transfer learning. You can read more about transfer learning in the [cs231n notes](https://cs231n.github.io/transfer-learning/).
Quoting these notes,

> In practice, very few people train an entire Convolutional Network
> from scratch (with random initialization), because it is relatively
> rare to have a dataset of sufficient size. Instead, it is common to
> pretrain a ConvNet on a very large dataset (e.g. ImageNet, which
> contains 1.2 million images with 1000 categories), and then use the
> ConvNet either as an initialization or a fixed feature extractor for
> the task of interest.

## Resnet18 for Transfer Learning

### What is Resnet18

A ```resnet18``` is a deep feed-forward convolutional neural network with 17 convolution layers (not counting the 1×1 projection shortcuts) and 1 fully connected layer; torchvision provides weights for it pretrained on ImageNet. According to [this paper](ResnetRef.pdf), adding shortcut connections to a deeply stacked network makes it easier to train and can further increase the accuracy of the model. These shortcuts help counter the vanishing-gradient problem by adding the input of a layer block to its output, which gives gradients a direct path back through the network. In ```resnet18```, a layer block is a pair of 3×3 convolution layers. During training, if the convolution layers in a block are not helpful, their weights can be driven toward zero, so that the output of the block depends mostly on its input, effectively bypassing the convolution layers within.
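To make the shortcut idea concrete, here is a minimal sketch of a residual block in the style of a ResNet-18 basic block, assuming two 3×3 convolutions with batch normalization and an identity shortcut (no downsampling). The class name `ResidualBlock` is illustrative and is not torchvision's own `BasicBlock`. Zeroing the second convolution shows the "bypass" behaviour described above: the block then reduces to `ReLU(x)`.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Illustrative residual block: out = ReLU(F(x) + x), F = two 3x3 convs."""

    def __init__(self, channels: int) -> None:
        super().__init__()
        # Both convolutions keep the spatial size (padding=1) and channel count,
        # so F(x) can be added to x element-wise.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        # The shortcut: add the block's input to the convolutional path
        return self.relu(residual + x)


torch.manual_seed(0)
block = ResidualBlock(channels=8).eval()  # eval(): BatchNorm uses running stats
x = torch.randn(1, 8, 16, 16)

# If training drives the second convolution's weights to zero, the residual
# path contributes nothing and the block passes ReLU(x) straight through.
with torch.no_grad():
    block.conv2.weight.zero_()
    y = block(x)

print(y.shape)                           # torch.Size([1, 8, 16, 16])
print(torch.allclose(y, torch.relu(x)))  # True
```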

The original ```Resnet18``` is designed to classify 1000 different labels, so its final layer is a fully connected layer that takes in the 512 features produced by global average pooling over the last convolution layer's output and outputs one score for each of the 1000 labels. This layer can be replaced with a fully connected layer of a different output size to classify a different number of classes.

![Resnet18 architecture](img/Resnet18.svg)

```python
# License: BSD
# Author: Sasank Chilamkurthy

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torchvision
from torch.utils.data import DataLoader
from torchvision import models, transforms
from torchvision.datasets import ImageFolder
from os.path import join
```

## Load Data

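The raw data can be downloaded from [here](https://download.pytorch.org/tutorial/hymenoptera_data.zip).
This dataset is a very small subset of ImageNet.
Please extract the zip file into the same directory as this notebook file. Note that ```DATA_DIR``` in the code below is a path relative to the working directory (```SummerVacationMeeting/W4_TransferLearning/hymenoptera_data```), so adjust it if you run the notebook from a different location.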

We will use ```torchvision``` and ```torch.utils.data``` packages for loading the
data.

The problem we're going to solve today is to train a model to classify
**ants** and **bees**. We have about 120 training images each for ants and bees,
and 75 validation images for each class. This is usually far too
small a dataset to generalize from if a model is trained from scratch. Since we
are using transfer learning, we should be able to generalize reasonably
well.

Here are some images that will be used for training:

![](hymenoptera_data/train/ants/560966032_988f4d7bc4.jpg)
![](hymenoptera_data/train/bees/1232245714_f862fbe385.jpg)
![](hymenoptera_data/train/bees/39672681_1302d204d1.jpg)
![](hymenoptera_data/train/ants/9715481_b3cb4114ff.jpg)


```python
# Data augmentation and normalization for training
# Just resizing, center-cropping and normalization for validation
DATA_DIR: str = join('SummerVacationMeeting', 'W4_TransferLearning', 'hymenoptera_data')
BATCH_SIZE = 6

# Normalize() uses the per-channel ImageNet mean and standard deviation,
# matching the preprocessing the pretrained weights were trained with
data_transforms: dict[str, transforms.Compose] = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

# Create dataset objects for training & validation and store them in the image_datasets dictionary.
# torchvision.datasets.ImageFolder inherits from torch.utils.data.Dataset
training_dataset = ImageFolder(
    root=join(DATA_DIR, 'train'),
    transform=data_transforms['train']
)
val_dataset = ImageFolder(
    root=join(DATA_DIR, 'val'),
    transform=data_transforms['val']
)
image_datasets: dict[str, ImageFolder] = {'train': training_dataset, 'val': val_dataset}

# Create dataloaders for training & validation and store them in the dataloaders dictionary.
# Only the training data needs shuffling; the validation order does not affect the metrics.
train_dataloader = DataLoader(training_dataset,
                              batch_size=BATCH_SIZE,
                              shuffle=True,
                              num_workers=4)
val_dataloader = DataLoader(val_dataset,
                            batch_size=BATCH_SIZE,
                            shuffle=False,
                            num_workers=4)
dataloaders: dict[str, DataLoader] = {'train': train_dataloader, 'val': val_dataloader}

dataset_sizes = {phase: len(image_datasets[phase]) for phase in ['train', 'val']}
class_names = image_datasets['train'].classes

DEVICE = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
```

## Train All Layers and Weights in Resnet18

PyTorch provides a ```models.resnet18``` builder function along with pretrained weights. The following cell makes ```model_ft``` an instance of ```resnet18``` and replaces its pretrained final fully connected layer with our own fully connected layer, which takes the same number of inputs from the preceding layer and outputs the scores for the 2 labels. In this section every layer is trained: the new fully connected layer starts from random weights, and all the pretrained layers are fine-tuned along with it.

![entire resnet to be trained](img/Resnet18_learn_all.svg)

```python
model_ft = models.resnet18(weights="IMAGENET1K_V1")
# The possible weight options for resnet18 are "DEFAULT", "IMAGENET1K_V1" or None
# (or the equivalent models.ResNet18_Weights enum members).
# If the weights argument is omitted or set to None, no pretrained weights are loaded.
# According to https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py#L10
# the options "DEFAULT" and "IMAGENET1K_V1" load the same set of weights.

# Replace the last fully connected layer of Resnet18, which takes in
# model_ft.fc.in_features features and outputs 1000 class scores, with our own
# fully connected layer that takes in the same number of features and outputs 2 class scores.
model_ft.fc = nn.Linear(model_ft.fc.in_features, 2)

# Move the model to the GPU (if available) before building the optimizer
model_ft = model_ft.to(DEVICE)

criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
```
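The comment "Decay LR by a factor of 0.1 every 7 epochs" can be checked with a small sketch that drives `StepLR` with a dummy parameter and records the learning rate used in each epoch. It assumes the scheduler is stepped once per epoch, as in the original tutorial's loop; `lr_schedule` is a hypothetical helper. Under that assumption, the 4-epoch runs in this notebook never reach the first decay at epoch 7, so the scheduler leaves the learning rate at 0.001 throughout.

```python
import torch
from torch import nn, optim
from torch.optim import lr_scheduler


def lr_schedule(num_epochs: int, base_lr: float = 0.001,
                step_size: int = 7, gamma: float = 0.1) -> list[float]:
    """Return the learning rate in effect during each epoch under StepLR."""
    # A single dummy parameter is enough to drive the optimizer/scheduler pair
    param = nn.Parameter(torch.zeros(1))
    optimizer = optim.SGD([param], lr=base_lr, momentum=0.9)
    scheduler = lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=gamma)
    lrs = []
    for _ in range(num_epochs):
        lrs.append(optimizer.param_groups[0]["lr"])
        optimizer.step()   # stands in for one epoch of parameter updates
        scheduler.step()   # stepped once per epoch, after the optimizer
    return lrs


print(lr_schedule(4))   # [0.001, 0.001, 0.001, 0.001]
print(lr_schedule(10))  # seven epochs at 0.001, then three at roughly 0.0001
```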

### Train and Evaluate

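The original tutorial trains for 25 epochs, which takes around 15-25 min on CPU and less than a minute on GPU. This notebook trains for only 4 epochs (the last argument to ```my_train_model```).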

The training loop is written in a [separate file](train_and_test_model.py), which contains both the training loop provided by the tutorial (```tutorial_train_model```) and a less deeply nested rewrite of it (```my_train_model```). For some reason, models trained with the rewritten version have performed significantly worse than those trained with the original version; the cause has not yet been identified.

```python
from train_and_test_model import my_train_model
model_ft = my_train_model(model_ft, criterion, dataloaders, dataset_sizes, optimizer_ft, exp_lr_scheduler, DEVICE, 4)
```

```
Epoch 0/3
----------
Train Loss: 0.6583 Acc: 0.6803
Vali Loss : 0.0272 Acc: 0.9412
Epoch 1/3
----------
Train Loss: 0.3695 Acc: 0.8484
Vali Loss : 0.0498 Acc: 0.9281
Epoch 2/3
----------
Train Loss: 0.3675 Acc: 0.8402
Vali Loss : 0.0408 Acc: 0.9150
Epoch 3/3
----------
Train Loss: 0.2871 Acc: 0.8730
Vali Loss : 0.0367 Acc: 0.9150
Training complete in 1m 33s
Highest validation accuracy is 0.941
```
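`train_and_test_model.py` is not reproduced here, so the cause of the difference between the two training loops cannot be confirmed from this notebook. One thing that may be worth checking: in the log, the validation loss (0.03 to 0.05) is an order of magnitude below the training loss (0.29 to 0.66), a much larger gap than the accuracies alone would suggest, which could hint that the two phases normalize the loss differently. The original tutorial multiplies each batch's mean loss by the batch size and divides the running sum by the dataset size. Below is a minimal sketch of such an evaluation pass using a hypothetical `evaluate` helper (not the code in `train_and_test_model.py`); with uneven batches it agrees with the loss computed on the whole dataset at once.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model: nn.Module, loader: DataLoader, criterion: nn.Module,
             device: torch.device) -> tuple[float, float]:
    """Return (mean loss per sample, accuracy) over every sample in `loader`."""
    model.eval()
    total_loss, total_correct, total_seen = 0.0, 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            # criterion returns the *mean* loss of this batch, so weight it by the
            # batch size; the last batch may be smaller than the others
            total_loss += criterion(outputs, labels).item() * inputs.size(0)
            total_correct += (outputs.argmax(dim=1) == labels).sum().item()
            total_seen += inputs.size(0)
    return total_loss / total_seen, total_correct / total_seen


torch.manual_seed(0)
features, targets = torch.randn(10, 4), torch.randint(0, 2, (10,))
model = nn.Linear(4, 2)
criterion = nn.CrossEntropyLoss()
loader = DataLoader(TensorDataset(features, targets), batch_size=6)  # batches of 6 and 4

loss, acc = evaluate(model, loader, criterion, torch.device("cpu"))
full_batch_loss = criterion(model(features), targets).item()
print(abs(loss - full_batch_loss) < 1e-5)  # True
```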


```python
# The training loop provided by the tutorial is not called:
# from train_and_test_model import tutorial_train_model
# model_ft = model_ft.to(DEVICE)
# model_ft = tutorial_train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler, dataloaders, dataset_sizes, DEVICE, 4)
```

## Resnet as Fixed Feature Extractor

Retraining all the layers, including those that were pretrained in ```resnet18```, might be unnecessary. The first 17 layers in this network are convolution layers. These convolution layers act as feature extractors, or filters, operating on the input images. Their weights have already been optimized on ImageNet by the time we load them, so they may not require further training.

What does require training is the custom fully connected layer we have just added to this network. To train only this final layer, we freeze every other layer by setting ```requires_grad = False``` on its parameters, so that their gradients are not computed in ```backward()```.
You can read more about this in the documentation
[here](https://pytorch.org/docs/notes/autograd.html#excluding-subgraphs-from-backward).

![only train the last layer resnet](img/Resnet18_learn_only_last_layer.svg)

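By training only the custom fully connected layer, the model trains faster: gradients no longer need to be computed for the frozen layers, although the forward pass still runs through the whole network. In the runs in this notebook, training took 1m 12s, versus 1m 33s when every layer was fine-tuned.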

```python
model_conv = torchvision.models.resnet18(weights='IMAGENET1K_V1')
for param in model_conv.parameters():
    param.requires_grad = False

# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)

model_conv = model_conv.to(DEVICE)

criterion = nn.CrossEntropyLoss()

# Observe that only parameters of final layer are being optimized as
# opposed to before.
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
```
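One subtlety of "freezing": `requires_grad = False` stops the optimizer from updating the frozen weights, but BatchNorm layers also keep running mean/variance buffers, and those are still updated on every forward pass while the model is in training mode (`model.train()`). The sketch below shows this on a single `nn.BatchNorm2d` layer; calling `.eval()` on the frozen modules is one way to keep those statistics fixed as well. Whether that changes accuracy for this dataset is not tested in this notebook.

```python
import torch
from torch import nn

bn = nn.BatchNorm2d(3)
for param in bn.parameters():
    param.requires_grad = False        # freeze the learnable scale and shift

batch = torch.randn(4, 3, 8, 8) + 5.0  # data whose mean is far from the initial 0

bn.train()
before = bn.running_mean.clone()
bn(batch)
changed_in_train = not torch.equal(before, bn.running_mean)

bn.eval()
before = bn.running_mean.clone()
bn(batch)
changed_in_eval = not torch.equal(before, bn.running_mean)

print(changed_in_train, changed_in_eval)  # True False
```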

```python
from train_and_test_model import my_train_model
model_conv = my_train_model(model_conv, criterion, dataloaders, dataset_sizes, optimizer_conv, exp_lr_scheduler, DEVICE, 4)
```

```
Epoch 0/3
----------
Train Loss: 0.5746 Acc: 0.6885
Vali Loss : 0.0377 Acc: 0.9346
Epoch 1/3
----------
Train Loss: 0.4205 Acc: 0.8115
Vali Loss : 0.0503 Acc: 0.8824
Epoch 2/3
----------
Train Loss: 0.5282 Acc: 0.7582
Vali Loss : 0.0294 Acc: 0.9412
Epoch 3/3
----------
Train Loss: 0.3651 Acc: 0.8525
Vali Loss : 0.0277 Acc: 0.9412
Training complete in 1m 12s
Highest validation accuracy is 0.941
```


```python
# The training loop provided by the tutorial is not called:
# from train_and_test_model import tutorial_train_model
# model_conv = tutorial_train_model(model_conv, criterion, optimizer_conv, exp_lr_scheduler, dataloaders, dataset_sizes, DEVICE, 4)
```

## Conclusion
This demo shows how to apply transfer learning with PyTorch. If we wish to create a network, it's often not necessary to start from scratch: pretrained networks such as ```resnet``` can be modified to suit our needs. By setting the ```requires_grad``` attribute of the pretrained parameters to ```False```, we can skip training them and speed up the process, likely without losing much model performance. In the runs above, both fine-tuning every layer and training only the final layer reached a best validation accuracy of 0.941.