# CNNs: Better classifiers with Transfer Learning

This notebook shows how we can use transfer learning to obtain better neural networks.

In practice, very few people train an entire CNN from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest.

**The two major transfer learning scenarios look as follows:**

- **Finetuning the ConvNet:** Instead of random initialization, we initialize the network with pretrained weights, such as those of a network trained on the ImageNet-1k dataset. The rest of the training proceeds as usual.
- **ConvNet as fixed feature extractor:** Here, we freeze the weights of the entire network except for those of the final fully connected layer. This last fully connected layer is replaced with a new one with random weights, and only this layer is trained.

In [None]:
import torchvision
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.utils.data as data
import torch.nn as nn
import torch.optim as optim
from tqdm import tqdm, trange
import torchvision.transforms as transforms
import time

In [None]:
!pip install pytorch-ignite

In [None]:
from ignite.handlers import FastaiLRFinder
from ignite.engine import create_supervised_trainer

In [None]:
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'  # fall back to CPU when no GPU is available
BATCH_SIZE = 64

### Helper functions

In [None]:
def plot_accuracy(stats, title=None):

  epochs = [item['epoch'] for item in stats]
  train_accs = [item['train_acc'] for item in stats]
  val_accs = [item['val_acc'] for item in stats]

  plt.plot(epochs, train_accs, label='Train Acc')
  plt.plot(epochs, val_accs, label=f'Val Acc [Best: {max(val_accs):.2f}%]')
  plt.legend()

  if title:
    plt.title(title)

In [None]:
def eval_accuracy(model, loader):

    '''
    Measure the accuracy of the given model on the provided dataloader
    '''

    model.eval()

    num_corr_pred = 0
    num_total_pred = 0

    with torch.no_grad():

        for x, y in loader:

            x = x.to(DEVICE)
            y = y.to(DEVICE)

            y_pred = model(x)

            top_pred = y_pred.argmax(1)
            num_corr_pred += (top_pred == y).sum()
            num_total_pred += len(y)

    acc = num_corr_pred / num_total_pred * 100

    return acc.item()

In [None]:
def train(model, train_loader, optimizer, criterion, num_epochs, val_loader=None):

  '''
  Trains the model on the dataloader for a given number of epochs
  '''

  print('===== Start training ===== \n')

  model.train()

  start = time.time()

  stats = []

  for epoch in range(1, num_epochs+1):

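    # eval_accuracy() puts the model into eval mode, so switch back to train mode every epoch
    model.train()

    epoch_loss = 0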

    for x, y in train_loader:

      optimizer.zero_grad()

      x = x.to(DEVICE)
      y = y.to(DEVICE)

      y_pred = model(x)

      loss = criterion(y_pred, y)

      loss.backward()
      optimizer.step()

      epoch_loss += loss.item()

    print(f'[{epoch}] Loss: {epoch_loss:.3f}')

    if epoch % 5 == 0:
      train_acc = eval_accuracy(model, train_loader)
      if val_loader:
        val_acc = eval_accuracy(model, val_loader)
        print(f'[{epoch}] Train Acc: {train_acc:.2f}%  /  Val Acc: {val_acc:.2f}%')
        stats.append({'epoch': epoch, 'train_acc': train_acc, 'val_acc': val_acc})
      else:
        print(f'[{epoch}] Train Acc: {train_acc:.2f}%')
        stats.append({'epoch': epoch, 'train_acc': train_acc})


  end = time.time()
  elapsed_time = end - start

  print()
  print('===== Finished training ===== ')
  print(f'Elapsed time in minutes: {elapsed_time/60:.2f}')

  return stats


### Prepare the datasets and dataloaders

The Flowers102 dataset is composed of three subsets (train, val and test). <br/>
In this code block we prepare the dataloaders for those three subsets.

In [None]:
# TODO Prepare the transformation and datasets

In [None]:
train_loader = data.DataLoader(train_dataset, shuffle=True, batch_size=BATCH_SIZE)
val_loader = data.DataLoader(val_dataset, shuffle=False, batch_size=BATCH_SIZE)
test_loader = data.DataLoader(test_dataset, shuffle=False, batch_size=BATCH_SIZE)

## ConvNet as fixed feature extractor

### Prepare the pretrained AlexNet model



First, let's download a pretrained AlexNet model from PyTorch Hub. Like most pretrained CNN models, it was trained on the ImageNet dataset, which has 1000 classes.

In [None]:
# TODO: Load a pretrained AlexNet model

In the first part of this tutorial, our goal will be to train the classification head but keep the weights of the feature extractor fixed. To accomplish this, we first have to inspect how we can access the classification head and the feature extractor in the model.

As can be seen below, the classification head is referred to as "classifier" and the feature extractor is referred to as "features".

In [None]:
# TODO: Check the architecture of the model

As the network outputs a 1000-dimensional vector, we have to modify the final linear layer to output 102 classes.

In [None]:
# TODO: Modify the classification head

Furthermore, to reduce the memory footprint and speed up training, we will disable the gradient computation for the weights of the feature extractor. As we don't want to update these weights, we also don't need their gradients.

In [None]:
# TODO: Disable gradient computation for the feature extractor

### LR range test

In [None]:
start_lr = 1e-7
end_lr = 1e+1
optimizer = optim.Adam(model.classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
trainer = create_supervised_trainer(model, optimizer, criterion, DEVICE)

lr_finder = FastaiLRFinder()
to_save = {"model": model, "optimizer": optimizer}

with lr_finder.attach(trainer, to_save=to_save, start_lr=start_lr, end_lr=end_lr, num_iter=200) as trainer_with_lr_finder:
    trainer_with_lr_finder.run(train_loader)

# Get lr_finder results
lr_finder.get_results()

# Plot lr_finder results (requires matplotlib)
lr_finder.plot()

# get lr_finder suggestion for lr
lr_finder.lr_suggestion()

### Train the model

In [None]:
num_epochs = 50

# Important: Configure the optimizer to only update the weights of the classifier
optimizer = optim.Adam(model.classifier.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

In [None]:
stats = train(model, train_loader, optimizer, criterion, num_epochs, val_loader)

### Plot accuracy



In [None]:
plot_accuracy(stats, title='Pretrained AlexNet - Finetune classifier [With augmentation]')

In [None]:
test_acc = eval_accuracy(model, test_loader)
print(f'Accuracy on the test set (final model): {test_acc:.2f}%')

## Finetune the entire ConvNet

The following cells show how to train the entire ConvNet. The training procedure is the same as when training a neural network from scratch, except that the network is initialized with pretrained weights instead of random ones.

In [None]:
# TODO: Load the pretrained AlexNet model
# TODO: Modify the final layer

In [None]:
model = model.to(DEVICE)

In [None]:
num_epochs = 100
optimizer = optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

In [None]:
stats = train(model, train_loader, optimizer, criterion, num_epochs, val_loader)

In [None]:
plot_accuracy(stats, title='Pretrained AlexNet - Entire network [With augmentation]')

In [None]:
test_acc = eval_accuracy(model, test_loader)
print(f'Accuracy on the test set (final model): {test_acc:.2f}%')