In [None]:
!pip install pycm livelossplot
%pylab inline

#### A few imports before we get started

In [None]:
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedShuffleSplit

from livelossplot import PlotLosses
from pycm import *

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader
import torchvision.transforms as transforms
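from torchvision.datasets import MNIST

# Explicit imports for names used below, instead of relying on %pylab to provide them
import random
import numpy as np
import matplotlib.pyplot as plt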


def set_seed(seed):
    """
    Use this to set ALL the random seeds to a fixed value and reduce the non-determinism of cuDNN kernels
    (some other CUDA operations may still be non-deterministic)
    """
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    torch.backends.cudnn.benchmark = False  # disable the cuDNN auto-tuner, which may pick different convolution algorithms from run to run
    torch.backends.cudnn.enabled   = False  # bypass cuDNN entirely (slower, but avoids its non-deterministic kernels)

    return True

device = 'cpu'
if torch.cuda.device_count() > 0 and torch.cuda.is_available():
    print("Cuda installed! Running on GPU!")
    device = 'cuda'
else:
    print("No GPU available!")

### Mounting Google Drive for later storage

In [None]:
from google.colab import drive
drive.mount('/content/gdrive/')

# Morning Session 4 pt II: Transfer Learning

We will now explore arguably one of the most useful practices when working with small datasets and wanting to build a powerful classifier: transfer learning.
Instead of training a network from randomly initialised weights, we start from a network whose weights were trained on a different (source) task and fine-tune it on our own (target) task.  

The basic principle behind it is to leverage features learned on very large datasets and reuse them to perform tasks on smaller datasets.  

To apply transfer learning effectively, the data that the powerful pre-trained model was trained on should follow a distribution similar to that of the smaller dataset we are transferring to.    

For example:  
_We want to create a new classifier for cats and dogs given only a small set of say 100 training images of each category._

Very deep neural networks that have been trained on ImageNet or CIFAR have seen similar categories in their datasets, such as horses, perhaps cows, and many other categories of natural images.  
The intuition is that since these networks have already learned a rich set of features on ImageNet, we can simply use a deep network as a feature extractor and only retrain its final layer to perform well on our task. So let's work our way towards transfer learning.
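A minimal sketch of that idea, assuming a tiny made-up `backbone` as a stand-in for a real pre-trained network (the layer sizes, the toy batch and the 2-class `head` are illustrative only): setting `requires_grad = False` on the backbone and handing only the head's parameters to the optimiser means that a training step changes the final layer and leaves the "feature extractor" untouched.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a large pre-trained network: maps 3x32x32 images to 16 features
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 16),
)
head = nn.Linear(16, 2)  # new final layer for a 2-class task (e.g. cats vs dogs)
model = nn.Sequential(backbone, head)

# Freeze the backbone: its weights now act as a fixed feature extractor
for p in backbone.parameters():
    p.requires_grad = False

# Only the head's parameters are given to the optimiser
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

backbone_before = [p.clone() for p in backbone.parameters()]
head_before = head.weight.clone()

x = torch.randn(4, 3, 32, 32)   # toy batch of 4 images
y = torch.tensor([0, 1, 0, 1])  # their labels

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()

backbone_unchanged = all(torch.equal(a, b) for a, b in zip(backbone_before, backbone.parameters()))
head_changed = not torch.equal(head_before, head.weight)
print(backbone_unchanged, head_changed)  # True True
```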


In summary, transfer learning can be a powerful tool for:
- Mitigating poor training on small datasets
- Reducing training time for similar tasks

### Exercise 1: Inspecting the features of pre-trained deep neural networks

PyTorch provides users with a rich set of pre-trained neural network architectures, most of which have been pre-trained on ImageNet.   
[```torchvision.models```](https://pytorch.org/vision/stable/models.html) provides us with an interface to these pretrained deep neural networks.

- Load a pretrained [AlexNet](https://arxiv.org/abs/1404.5997) model from ```torchvision.models``` ([Source Code](https://pytorch.org/vision/stable/_modules/torchvision/models/alexnet.html) for AlexNet in PyTorch)
- Obtain the weight kernels of the first layer and display them (11x11 kernels shown as a matplotlib graph)
- Remembering the earlier exercise on traditional computer vision kernels and edge detection, how could these come in handy when learning on new data?
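One possible way to turn first-layer weights into something `imshow` can display is sketched below. `kernels_to_images` is a hypothetical helper, and the layer is a randomly initialised `Conv2d` with the same shape as the first convolution of torchvision's AlexNet (64 kernels of size 3 x 11 x 11) rather than the pre-trained one. Note the `permute(0, 2, 3, 1)`: swapping only the channel and width axes would also transpose each kernel spatially.

```python
import torch
import torch.nn as nn

def kernels_to_images(weight):
    """Hypothetical helper: turn conv weights of shape (out, 3, k, k) into
    an array of RGB images of shape (out, k, k, 3) with values in [0, 1]."""
    w = weight.detach().cpu().clone()        # work on a copy, never on the model's own weights
    w = (w - w.min()) / (w.max() - w.min())  # min-max scale to [0, 1] for imshow
    return w.permute(0, 2, 3, 1).numpy()     # (out, C, H, W) -> (out, H, W, C)

# Randomly initialised layer with the same shape as AlexNet's first convolution
conv = nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2)
images = kernels_to_images(conv.weight)
print(images.shape)  # (64, 11, 11, 3)
```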


In [None]:
from torchvision import models

alexnet = #### code here ####
print(alexnet)

In [None]:
first_layer = #### code here ####
weights = #### code here ####
print(first_layer)


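# Work on a detached copy so the in-place normalisation below does not modify AlexNet's own weights
weights = weights.detach().cpu().clone()

# Normalisation for plotting because imshow does not like negative values
min_w, max_w = weights.min(), weights.max()
weights -= min_w
weights /= (max_w - min_w)

fig, axarr = plt.subplots(8, 8, figsize=(12, 12))
axarr = axarr.flatten()
for ax, kernel in zip(axarr, weights.numpy()):
  ax.imshow(np.transpose(kernel, (1, 2, 0)))  # (C, H, W) -> (H, W, C); swapaxes(0, 2) would also swap H and W
  ax.axis('off')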

### Exercise 2: Transfer learning from ImageNet to Bees and Ants

In the previous exercise we've investigated what some features of a very deep pre-trained network look like and learned about transfer learning. Let's try it out! We will try to apply transfer learning to a small dataset containing images of bees and ants by training on top of networks previously trained on ImageNet.

ImageNet is arguably the most popular dataset for benchmarking classification models. It contains around 14 million annotated natural images spread over roughly 22 thousand categories; the pre-trained torchvision models use the 1000-class ILSVRC-2012 subset. The images come in many different sizes and are usually resized and cropped to 3 x 224 x 224, and their per-channel means and standard deviations (for pixel values scaled to [0, 1]) are [0.485, 0.456, 0.406] and [0.229, 0.224, 0.225]. In transfer learning it is common practice to normalise the new dataset with the means and standard deviations of the data used for pre-training. Note that the most popular networks (VGG, ResNet, AlexNet, etc.) were designed to take 3 x 224 x 224 images as input to accommodate ImageNet.
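As a worked example of that normalisation step, the sketch below applies the per-channel formula (x - mean[c]) / std[c] by hand with the ImageNet statistics quoted above. `normalise` is an illustrative function written to mirror what `transforms.Normalize(mean, std)` computes on a 3 x H x W tensor, not torchvision's own implementation. A pure white pixel (1.0 in every channel after `ToTensor()`) ends up roughly 2.2 to 2.6 standard deviations above the ImageNet mean.

```python
import torch

# ImageNet channel statistics (RGB order, for pixel values in [0, 1]),
# reshaped to (3, 1, 1) so they broadcast over a (3, H, W) image
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def normalise(img):
    """Per-channel standardisation: out[c] = (img[c] - mean[c]) / std[c]."""
    return (img - mean) / std

img = torch.ones(3, 224, 224)  # a pure white image after ToTensor()
out = normalise(img)
print(out[:, 0, 0])  # tensor([2.2489, 2.4286, 2.6400])
```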

<p style="text-align:center;"><img src="https://paperswithcode.com/media/datasets/ImageNet-0000000008-f2e87edd_Y0fT5zg.jpg" alt="drawing" width="500"/>
</p>

Adapted from [here](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html)

Perform the following tasks:

0. Adapt the training, validation and evaluation functions to the appropriate input size
1. [Download](https://download.pytorch.org/tutorial/hymenoptera_data.zip) the dataset into your current directory (you can do it manually, or use the code provided below)
2. Investigate and visualise a few examples of the dataset. What pre-processing is required for this dataset?
3. Instantiate an untrained ResNet18 from [```torchvision.models```](https://pytorch.org/vision/stable/models.html) and make the necessary adaptations to the task at hand.
4. Train the newly initialised ResNet from scratch. What do you notice?
5. Now instantiate the pre-trained ResNet18 by passing the argument ``pretrained=True`` (newer torchvision releases prefer ``weights=models.ResNet18_Weights.DEFAULT``) and fine-tune it using a smaller learning rate.
6. Use the provided ``set_parameter_requires_grad`` and ``get_params_to_update`` functions to repeat the step above, freezing all layers except the final classification layer.
7. Finally, train a ResNet on MNIST from scratch and use those weights to repeat step 6. Does this work?

  



### 2.0 Adapting training, validation and evaluation functions to our problem size

In [None]:
def train(model, optimizer, criterion, data_loader):
    model.train()
    train_loss, train_accuracy = 0, 0
    for X, y in data_loader:
        X, y = X.to(device), y.to(device)
        optimizer.zero_grad()
        a2 = model(X.view(#### code here ####))
        loss = criterion(a2, y)
        loss.backward()
        train_loss += loss.detach()*X.size(0)  # detach so the autograd graph is not kept alive across batches
        y_pred = F.log_softmax(a2, dim=1).max(1)[1]
        train_accuracy += accuracy_score(y.cpu().numpy(), y_pred.detach().cpu().numpy())*X.size(0)
        optimizer.step()  
        
    return train_loss/len(data_loader.dataset), train_accuracy/len(data_loader.dataset)
  
def validate(model, criterion, data_loader):
    model.eval()
    validation_loss, validation_accuracy = 0., 0.
    for X, y in data_loader:
        with torch.no_grad():
            X, y = X.to(device), y.to(device)
            a2 = model(X.view(#### code here ####))
            loss = criterion(a2, y)
            validation_loss += loss*X.size(0)
            y_pred = F.log_softmax(a2, dim=1).max(1)[1]
            validation_accuracy += accuracy_score(y.cpu().numpy(), y_pred.cpu().numpy())*X.size(0)
            
    return validation_loss/len(data_loader.dataset), validation_accuracy/len(data_loader.dataset)
  
def evaluate(model, data_loader):
    model.eval()
    ys, y_preds = [], []
    for X, y in data_loader:
        with torch.no_grad():
            X, y = X.to(device), y.to(device)
            a2 = model(X.view(#### code here ####))
            y_pred = F.log_softmax(a2, dim=1).max(1)[1]
            ys.append(y.cpu().numpy())
            y_preds.append(y_pred.cpu().numpy())
            
    return np.concatenate(y_preds, 0),  np.concatenate(ys, 0)
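The `F.log_softmax(a2, dim=1).max(1)[1]` idiom in the functions above is equivalent to taking the argmax of the raw logits: log-softmax subtracts the same log-sum-exp from every entry of a row, so the order of the entries is preserved. A quick check on random logits (batch size and number of classes are arbitrary):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 2)  # raw model outputs for 5 images and 2 classes

# What train/validate/evaluate compute ...
pred_via_log_softmax = F.log_softmax(logits, dim=1).max(1)[1]
# ... and the argmax of the raw logits
pred_via_argmax = logits.argmax(dim=1)

print(torch.equal(pred_via_log_softmax, pred_via_argmax))  # True
```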

### 2.1 Loading and Visualising the Data

In [None]:
!wget -nc https://download.pytorch.org/tutorial/hymenoptera_data.zip && unzip -oq hymenoptera_data.zip

In [None]:
seed = 42
lr = 1e-2
momentum = 0.9
batch_size = 64
test_batch_size = 1000
n_epochs = 30

In [None]:
from torchvision import datasets, transforms, models

transform = transforms.Compose([
        #### code here ####
    ])

train_ds = #### code here ####
test_ds = #### code here ####

print(train_ds)
print(train_ds.classes)
print(train_ds.class_to_idx)
print(train_ds[0]) # an example of calling  __getitem__, which is what the dataloader does
print(train_ds.samples[0]) # get image path inside samples
print("\n\n")

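# Get the per-channel mean and std over the whole training set (not just a single image).
# batch_size=1 because the raw images have different sizes and cannot be stacked into one batch.
tmp_loader = DataLoader(train_ds, batch_size=1, num_workers=0)
n_pixels, channel_sum, channel_sq_sum = 0, torch.zeros(3), torch.zeros(3)
for images, _ in tmp_loader:
    pixels = images[0].flatten(1)  # (3, H*W)
    n_pixels += pixels.size(1)
    channel_sum += pixels.sum(dim=1)
    channel_sq_sum += (pixels ** 2).sum(dim=1)
mean = (channel_sum / n_pixels).tolist()
std = (channel_sq_sum / n_pixels - (channel_sum / n_pixels) ** 2).sqrt().tolist()
print(mean, std)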

In [None]:
def show_batch(dataset, nr=4, nc=4):
  fig, axarr = plt.subplots(nr, nc, figsize=(10, 10))
  for i in range(nr):
      for j in range(nc):
          idx = random.randint(0, len(dataset) - 1)  # pick a random valid index
          sample, target = dataset[idx]
          if isinstance(sample, torch.Tensor):
            sample = sample.permute(1, 2, 0)  # tensor of shape CHW -> HWC for imshow
          axarr[i][j].imshow(sample)  # PIL images can be shown directly
          target_name = dataset.classes[target]
          axarr[i][j].set_title("%s (%i)" % (target_name, target))

  fig.tight_layout(pad=1.5)
  plt.show()

In [None]:
show_batch(train_ds, 5, 5)

In [None]:
# Fix image sizes with transforms
train_transform = transforms.Compose([
        #### code here ####
        transforms.ToTensor()
    ])
test_transform = transforms.Compose([
        #### code here ####
        transforms.ToTensor(),
    ])

train_ds = datasets.ImageFolder("./hymenoptera_data/train", transform=train_transform)
test_ds = datasets.ImageFolder("./hymenoptera_data/val", transform=test_transform)

show_batch(train_ds, 5, 5)

In [None]:
# Finally add normalisation to transforms
train_transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        #### code here ####
    ])
test_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        #### code here ####
    ])

train_ds = datasets.ImageFolder("./hymenoptera_data/train", transform=train_transform)
test_ds = datasets.ImageFolder("./hymenoptera_data/val", transform=test_transform)

# Create dataloader
train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True, num_workers=0)
test_loader = DataLoader(test_ds, batch_size=batch_size, shuffle=False, num_workers=0)

### 2.2 Training a newly initialised ResNet18


In [None]:
set_seed(seed)

model = models.resnet18().to(device)
print(model)
#### code here ####

# optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
# criterion = nn.CrossEntropyLoss()

# liveloss = PlotLosses()
# for epoch in range(n_epochs):
#     logs = {}
#     train_loss, train_accuracy = train(model, optimizer, criterion, train_loader)
#     validation_loss, validation_accuracy = validate(model, criterion, test_loader)

#     # record the validation metrics before update(), otherwise they are never plotted
#     logs['log loss'] = float(train_loss)
#     logs['accuracy'] = float(train_accuracy)
#     logs['val_log loss'] = float(validation_loss)
#     logs['val_accuracy'] = float(validation_accuracy)
#     liveloss.update(logs)
#     liveloss.draw()

# test_loss, test_accuracy = validate(model, criterion, test_loader)
# print("Avg. Test Loss: %1.3f" % float(test_loss), " Avg. Test Accuracy: %1.3f" % float(test_accuracy))
# print("")

# import os
# model_save_name = 'resnet18_bees_and_ants_classifier_full_training_set_baseline.pt'
# path = f"/content/gdrive/My Drive/models/{model_save_name}"
# os.makedirs(os.path.dirname(path), exist_ok=True)  # create the Drive folder on first use
# torch.save(model.state_dict(), path)

What is happening to the model? Why?

### 2.3 Fine-tuning a pre-trained ResNet

In [None]:
set_seed(seed)

train_transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        #### add normalisation here ####
    ])
test_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        #### add normalisation here ####
    ])

train_ds = datasets.ImageFolder("./hymenoptera_data/train", transform=train_transform)
test_ds = datasets.ImageFolder("./hymenoptera_data/val", transform=test_transform)

train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True, num_workers=0)
test_loader = DataLoader(test_ds, batch_size=batch_size, shuffle=False, num_workers=0)

model = #### load pre-trained resnet here ####
model.fc = nn.Linear(model.fc.in_features, 2).to(device)

optimizer = torch.optim.SGD(model.parameters(), lr= ### what should our learning rate be? #### , momentum=momentum)
criterion = nn.CrossEntropyLoss()

liveloss = PlotLosses()
for epoch in range(n_epochs):
    logs = {}
    train_loss, train_accuracy = train(model, optimizer, criterion, train_loader)
    validation_loss, validation_accuracy = validate(model, criterion, test_loader)

    # record the validation metrics before update(), otherwise they are never plotted
    logs['log loss'] = float(train_loss)
    logs['accuracy'] = float(train_accuracy)
    logs['val_log loss'] = float(validation_loss)
    logs['val_accuracy'] = float(validation_accuracy)
    liveloss.update(logs)
    liveloss.draw()

test_loss, test_accuracy = validate(model, criterion, test_loader)
print("Avg. Test Loss: %1.3f" % float(test_loss), " Avg. Test Accuracy: %1.3f" % float(test_accuracy))
print("")

import os
model_save_name = 'resnet18_bees_and_ants_classifier_full_training_set_imagenet_finetune.pt'
path = f"/content/gdrive/My Drive/models/{model_save_name}"
os.makedirs(os.path.dirname(path), exist_ok=True)  # create the Drive folder on first use
torch.save(model.state_dict(), path)

### 2.4 Pre-trained ResNet as a Feature Extraction Tool

In [None]:
def set_parameter_requires_grad(model, requires_grad=False):
    """https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html"""
    for param in model.parameters():
        param.requires_grad = requires_grad
    return None

def get_params_to_update(model):
    """ Returns list of model parameters that have requires_grad=True"""
    params_to_update = []
    for name, param in model.named_parameters():
        if param.requires_grad:
            params_to_update.append(param)
    return params_to_update
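A side note on freezing, sketched with a toy conv + BatchNorm model: `requires_grad = False` stops the optimiser from changing weights, but BatchNorm layers still update their running mean and variance whenever the model is in training mode, which the `train` function sets with `model.train()`. If you want the backbone to be completely fixed, one option is to switch its BatchNorm layers back to eval mode after calling `model.train()`. `freeze_batchnorm_stats` is a hypothetical helper, not part of PyTorch, and whether freezing these statistics helps on bees and ants is something to test rather than assume.

```python
import torch
import torch.nn as nn

def freeze_batchnorm_stats(model):
    """Hypothetical helper: switch every BatchNorm layer to eval mode so its
    running mean/variance stop updating. Call it *after* model.train(), which
    puts all sub-modules back into training mode."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval()

torch.manual_seed(0)
model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3), nn.BatchNorm2d(4), nn.ReLU())
for param in model.parameters():  # same effect as set_parameter_requires_grad(model, False)
    param.requires_grad = False

x = torch.randn(8, 3, 16, 16) + 5.0  # toy batch whose statistics differ from the initial running stats

# 1) Frozen weights but training mode: the running mean still moves
model.train()
model(x)
moved = model[1].running_mean.clone()

# 2) Same forward pass with the BatchNorm layer switched back to eval mode
model[1].reset_running_stats()  # back to mean 0, variance 1
model.train()
freeze_batchnorm_stats(model)
model(x)
kept = model[1].running_mean.clone()

print(moved)  # non-zero entries
print(kept)   # tensor([0., 0., 0., 0.])
```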

In [None]:
set_seed(seed)

train_transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
test_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

train_ds = datasets.ImageFolder("./hymenoptera_data/train", transform=train_transform)
test_ds = datasets.ImageFolder("./hymenoptera_data/val", transform=test_transform)

train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True, num_workers=0)
test_loader = DataLoader(test_ds, batch_size=batch_size, shuffle=False, num_workers=0)

model = models.resnet18(pretrained=True).to(device)
### disable requires_grad for model parameters here ### 
model.fc = nn.Linear(model.fc.in_features, 2).to(device)  # newly initialised layers automatically have requires_grad=True

optimizer = torch.optim.SGD(get_params_to_update(model), lr=0.1*lr, momentum=momentum)
criterion = nn.CrossEntropyLoss()

liveloss = PlotLosses()
for epoch in range(n_epochs):
    logs = {}
    train_loss, train_accuracy = train(model, optimizer, criterion, train_loader)
    validation_loss, validation_accuracy = validate(model, criterion, test_loader)

    # record the validation metrics before update(), otherwise they are never plotted
    logs['log loss'] = float(train_loss)
    logs['accuracy'] = float(train_accuracy)
    logs['val_log loss'] = float(validation_loss)
    logs['val_accuracy'] = float(validation_accuracy)
    liveloss.update(logs)
    liveloss.draw()

test_loss, test_accuracy = validate(model, criterion, test_loader)
print("Avg. Test Loss: %1.3f" % float(test_loss), " Avg. Test Accuracy: %1.3f" % float(test_accuracy))
print("")

import os
model_save_name = 'resnet18_bees_and_ants_classifier_full_training_set_imagenet_feature_extract.pt'
path = f"/content/gdrive/My Drive/models/{model_save_name}"
os.makedirs(os.path.dirname(path), exist_ok=True)  # create the Drive folder on first use
torch.save(model.state_dict(), path)

If feature extraction provides sufficiently good accuracy, it can significantly cut training time, particularly when the network is very deep and the input images are large: no gradients have to be computed for the frozen weights, and the optimiser only updates the final layer.

Fine-tuning starts from the idea that the pre-trained model already sits close to a good minimum for the new task, so all we need to do is move a little closer to it (hence the smaller learning rate). Feature extraction goes one step further: it reuses exactly the features that were learned to classify another dataset and only trains the final classification layer.
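When the backbone is completely frozen and no random augmentation is used, feature extraction can be pushed further: run the backbone once, cache its outputs, and train only the head on the cached features. The sketch below assumes a tiny made-up backbone and random data. It does not apply directly to the cells above, because `RandomResizedCrop` and `RandomHorizontalFlip` produce a different input every epoch, so cached features would lose that augmentation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical frozen backbone and toy dataset (32 random "images", 2 classes)
backbone = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
backbone.eval()
images = torch.randn(32, 3, 32, 32)
labels = torch.randint(0, 2, (32,))

# 1) Run the (expensive) frozen backbone once and cache its outputs
with torch.no_grad():
    features = backbone(images)  # shape (32, 8)

# 2) Train only the cheap linear head on the cached features
head = nn.Linear(features.size(1), 2)
optimizer = torch.optim.SGD(head.parameters(), lr=0.5)
criterion = nn.CrossEntropyLoss()

losses = []
for epoch in range(50):
    optimizer.zero_grad()
    loss = criterion(head(features), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```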

<img src="https://pbs.twimg.com/media/Ev-f6AaU8AgMeRd.jpg" alt="drawing" width="600"/>

### 2.5 What if we pre-train on MNIST instead?

#### Training a ResNet on MNIST from scratch

In [None]:
train_transform = transforms.Compose([
        transforms.Resize(224),
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),  
        transforms.Lambda(lambda x: x.expand(3, 224, 224)), # expand to 3 channels               
    ])
test_transform = transforms.Compose([
        transforms.Resize(224),
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),
        transforms.Lambda(lambda x: x.expand(3, 224, 224)), # expand to 3 channels      
    ])

mnist_train = MNIST("./", download=True, train=True, transform=train_transform)
mnist_test = MNIST("./", download=True, train=False, transform=test_transform)

# print(mnist_train[0][0].shape)
# plt.imshow(mnist_train[0][0].permute(1,2,0))
# plt.show()

train_loader = DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=4)
test_loader = DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=4)

model = models.resnet18().to(device)
#### modify last layer here ####

optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
criterion = nn.CrossEntropyLoss()

feeling_lazy = True

if feeling_lazy:
  !gdown --id 1tOeWpr3jrKgtFu2qx1_4orLx_bwe-467
  path = F"./resnet18_mnist_trained.pt" 
  model.load_state_dict(torch.load(path, map_location=device))  # map_location lets weights saved on a GPU load on a CPU-only runtime

else:
  ### ~ 7 min per epoch here ###
  liveloss = PlotLosses()
  for epoch in range(n_epochs):
      logs = {}
      train_loss, train_accuracy = train(model, optimizer, criterion, train_loader)
      validation_loss, validation_accuracy = validate(model, criterion, test_loader)

      # record the validation metrics before update(), otherwise they are never plotted
      logs['log loss'] = float(train_loss)
      logs['accuracy'] = float(train_accuracy)
      logs['val_log loss'] = float(validation_loss)
      logs['val_accuracy'] = float(validation_accuracy)
      liveloss.update(logs)
      liveloss.draw()

import os
model_save_name = 'resnet18_mnist_classifier_full_training_set_baseline_.pt'
path = f"/content/gdrive/My Drive/models/{model_save_name}"
os.makedirs(os.path.dirname(path), exist_ok=True)  # create the Drive folder on first use
torch.save(model.state_dict(), path)


test_loss, test_accuracy = validate(model, criterion, test_loader)
print("Avg. Test Loss: %1.3f" % float(test_loss), " Avg. Test Accuracy: %1.3f" % float(test_accuracy))
print("")

### 2.6 Feature Extraction on the ResNet pretrained on MNIST


In [None]:
set_seed(seed)

train_transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        #### add normalisation here ####,
    ])
test_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        #### add normalisation here ####,
    ])

train_ds = datasets.ImageFolder("./hymenoptera_data/train", transform=train_transform)
test_ds = datasets.ImageFolder("./hymenoptera_data/val", transform=test_transform)

train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True, num_workers=0)
test_loader = DataLoader(test_ds, batch_size=batch_size, shuffle=False, num_workers=0)

model = models.resnet18().to(device)
model.fc = nn.Linear(model.fc.in_features, 10).to(device)
path = F"./resnet18_mnist_trained.pt" 
model.load_state_dict(torch.load(path, map_location=device))  # map_location lets weights saved on a GPU load on a CPU-only runtime
model.fc = nn.Linear(model.fc.in_features, 2).to(device)

optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.1*lr, momentum=momentum)
criterion = nn.CrossEntropyLoss()

liveloss = PlotLosses()
for epoch in range(n_epochs):
    logs = {}
    train_loss, train_accuracy = train(model, optimizer, criterion, train_loader)
    validation_loss, validation_accuracy = validate(model, criterion, test_loader)

    # record the validation metrics before update(), otherwise they are never plotted
    logs['log loss'] = float(train_loss)
    logs['accuracy'] = float(train_accuracy)
    logs['val_log loss'] = float(validation_loss)
    logs['val_accuracy'] = float(validation_accuracy)
    liveloss.update(logs)
    liveloss.draw()

import os
model_save_name = 'resnet18_bees_and_ants_classifier_full_training_set_mnist_transfer.pt'
path = f"/content/gdrive/My Drive/models/{model_save_name}"
os.makedirs(os.path.dirname(path), exist_ok=True)  # create the Drive folder on first use
torch.save(model.state_dict(), path)


test_loss, test_accuracy = validate(model, criterion, test_loader)
print("Avg. Test Loss: %1.3f" % float(test_loss), " Avg. Test Accuracy: %1.3f" % float(test_accuracy))
print("")



What does this tell us?