HHU Deep Learning, WS2023/24, 08.12.2023

Lecture: Prof. Dr. Markus Kollmann

Exercises: Nikolas Adaloglou, Felix Michels

# Assignment 9 - Transfer Learning in Image Classification

---

Submit the solved notebook (not a zip) and use your full name plus the assignment number as the filename, e.g. `max_mustermann_a1.ipynb` for assignment 1. If we feel that you have genuinely tried to solve the exercise, you will receive 1 point for this assignment, regardless of the quality of your solution.

## <center> DUE FRIDAY 15.12.2023 2:30 pm </center>

Drop-off link: [https://uni-duesseldorf.sciebo.de/s/zDoBcOZiMPNar50](https://uni-duesseldorf.sciebo.de/s/zDoBcOZiMPNar50)

---

We will use `medmnist`, a Python library that contains multiple medical imaging datasets for experimentation. You can install it locally via `pip install medmnist` or in Google Colab with `!pip install medmnist`.

### The story

We found some resnet models pretrained on large-scale natural image data, and we want to see if the learned weights are useful for our medical image classification.

- Task 1 and 2: To this end, we will first evaluate the [K-nearest neighbors](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) (KNN) accuracy, using features from a pretrained resnet. We will compare the representations of resnet18 and resnet50.
- Task 3: Next, we will use the features as training data, instead of the images, and train an MLP on top.
- Task 4: We will try to avoid overfitting by applying regularization techniques.
- Task 5: Finally, we will train the whole network (resnet50), starting from pretrained imagenet weights.

Our new goal is to reach a 95% val. accuracy on pathmnist.

In [None]:
!pip install medmnist
!wget -c https://github.com/HHU-MMBS/Deep-Learning-Exercise-Extras/raw/main/a09_transfer_learning/utils.py

In [None]:
import os
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as data
import torchvision
import medmnist
import matplotlib.pyplot as plt

from medmnist import INFO
from torchvision import transforms as T
from tqdm import tqdm

# Warning: local import - utils.py must be in the same folder as this notebook
from utils import *

# Specify dataset
data_flag = 'pathmnist'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
download = True
batch_size = 256
info = INFO[data_flag]
n_channels = info['n_channels']
n_classes = len(info['label'])
print("n_classes", n_classes, info)
DataClass = getattr(medmnist, info['python_class'])

os.makedirs("./figs/", exist_ok=True)

In [None]:
# Moves the range [0,1] to [-1,1]
mean = torch.tensor([0.5], dtype=torch.float32)
std = torch.tensor([0.5], dtype=torch.float32)

plain_transform = T.Compose([T.ToTensor(), T.Normalize(list(mean), list(std))])

# load the data
train_ds_plain = DataClass(split='train', transform=plain_transform, download=download)
val_ds = DataClass(split='val', transform=plain_transform, download=download)
test_ds = DataClass(split='test', transform=plain_transform, download=download)

train_loader_plain1 = data.DataLoader(dataset=train_ds_plain, batch_size=batch_size, shuffle=True, drop_last=True)

img1, lab = next(iter(train_loader_plain1))

# show the images
plt.figure(figsize = (50,20))
for i in range(10):
    imshow(train_ds_plain[i][0], i, mean, std)


# Task 1

Implement the logic of using a pretrained model to produce embeddings and apply KNN on top.
More precisely: given a pretrained model, a train loader and a test loader, compute the embeddings (on the GPU) and then use the class `sklearn.neighbors.KNeighborsClassifier` to create a classifier. The classifier is fitted on the train embeddings and is then used to compute the accuracy on both the train and the test split.
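The embedding step described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the `encoder` is a tiny hypothetical stand-in for the pretrained resnet, and the random tensors stand in for the image loader.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins (hypothetical): 20 random "images" and a tiny encoder.
images = torch.randn(20, 3, 28, 28)
labels = torch.randint(0, 9, (20,))
loader = DataLoader(TensorDataset(images, labels), batch_size=8)
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 28 * 28, 16))

device = 'cuda' if torch.cuda.is_available() else 'cpu'
encoder = encoder.to(device).eval()  # eval mode: no dropout, fixed batch-norm statistics

feats, labs = [], []
with torch.no_grad():  # inference only, no gradients needed
    for x, y in loader:
        h = encoder(x.to(device))  # forward pass on the chosen device
        feats.append(h.cpu())      # move to the CPU so sklearn can use the features later
        labs.append(y)

feats = torch.cat(feats)  # shape [20, 16]
labs = torch.cat(labs)    # shape [20]
```

Collecting the per-batch outputs in lists and concatenating once at the end avoids repeatedly reallocating a growing tensor.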

### Optional - encapsulate the logic in a class
You can encapsulate the aforementioned logic by filling in the methods of the class below.
Minimal documentation is provided for each class method.
The method `execute` illustrates how the class methods should be used.

```python
class KnnConvnet():
    def __init__(self, model, device='cpu', distance='cosine'):
        super(KnnConvnet, self).__init__()
    def get_features(self, mode='train'):
    def set_features(self, embeds, labels, mode='train'):
    def extract_features(self, loader):
    def fit(self, features, labels, k):
    def accuracy(self, features, labels):
    
    @torch.no_grad()
    def execute(self, train_loader, test_loader=None, k=1):
        if self.embeds_train is None:
            embeds_train, lab_train = self.extract_features(train_loader)
            self.set_features(embeds_train, lab_train, mode='train')
        
        self.fit(self.embeds_train, self.lab_train, k)
        train_acc = self.accuracy(self.embeds_train, self.lab_train)

        if test_loader is not None:
            if self.embeds_test is None:
                embeds_test, lab_test = self.extract_features(test_loader)
                self.set_features(embeds_test, lab_test, mode='test')
            
            test_acc = self.accuracy(self.embeds_test, self.lab_test)
            return train_acc, test_acc
        
        return train_acc
```


#### Tips

- Feature extraction is much faster on the GPU
- Use the cosine similarity as a distance metric
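As a small illustration of the second tip, `KNeighborsClassifier` accepts `metric='cosine'`. The sketch below uses made-up 2D points instead of real embeddings, so the data and the choice `n_neighbors=5` are assumptions for demonstration only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Two toy clusters that differ in direction: one near the x-axis, one near the y-axis.
train_x = np.concatenate([
    rng.normal([5.0, 0.0], 0.1, (20, 2)),
    rng.normal([0.0, 5.0], 0.1, (20, 2)),
])
train_y = np.array([0] * 20 + [1] * 20)

# Test points with very different lengths; only their direction matters for cosine distance.
test_x = np.array([[10.0, 0.5], [0.3, 2.0]])
test_y = np.array([0, 1])

knn = KNeighborsClassifier(n_neighbors=5, metric='cosine')
knn.fit(train_x, train_y)
acc = 100.0 * knn.score(test_x, test_y)  # accuracy in percent
print(f"toy test acc: {acc:.2f}%")
```

Because cosine distance ignores vector length, the two test points are assigned by direction alone, which is often a sensible choice for network embeddings.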


In [None]:
import numpy as np
import torch
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from torch import nn

class KnnConvnet():
    def __init__(self, model, device='cpu', distance='cosine'):
        self.device = device
        self.model = model.to(device)
        self.model.eval()
        self.embeds_train = None
        self.lab_train = None
        self.lab_test = None
        self.embeds_test = None
        self.distance = distance

    def get_features(self, mode='train'):
        """Returns the embeddings of the train or test set.
        Args:
            mode (str, optional): "train" or "test". Defaults to 'train'.
        """
        if mode == 'train':
            assert self.embeds_train is not None, 'Training embeddings are not computed yet.'
        else:
            assert self.embeds_test is not None, 'Test embeddings are not computed yet.'
        ### START CODE HERE ### (approx. 4 lines)
        ### END CODE HERE ###

    def set_features(self, embeds, labels, mode='train'):
        """Sets the train or test embeddings and their labels."""
        ### START CODE HERE ### (approx. 6 lines)
        ### END CODE HERE ###

    @torch.no_grad()
    def extract_features(self, loader):
        """Infers features from the provided image loader.
        Args:
            loader: train or test loader
        Returns: 2 tensors covering the whole loader: features, labels
        """
        features = []
        label_lst = []
        ### START CODE HERE ### (approx. 4 lines)
        ### END CODE HERE ###
        h_total = torch.cat(features)
        label_total = torch.cat(label_lst)
        return h_total, label_total

    @torch.no_grad()
    def fit(self, features, labels, k):
        """Fits the provided features to create a KNN classifier (i.e. self.cls object).
        Args:
            features: [... , dataset_size, feat_dim]
            labels: [... , dataset_size]
            k: number of nearest neighbours for majority voting
        """
        ### START CODE HERE ### (approx. 2 lines)
        ### END CODE HERE ###


    def accuracy(self, features, labels):
        """Uses the features to compute the accuracy of the classifier (i.e. self.cls object)."""
        ### START CODE HERE ### (approx. 2 lines)
        ### END CODE HERE ###
        return acc

    @torch.no_grad()
    def execute(self, train_loader, test_loader=None, k=10):
        if self.embeds_train is None:
            embeds_train, lab_train = self.extract_features(train_loader)
            self.set_features(embeds_train, lab_train, mode='train')

        self.fit(self.embeds_train, self.lab_train, k)
        train_acc = self.accuracy(self.embeds_train, self.lab_train)

        if test_loader is not None:
            if self.embeds_test is None:
                embeds_test, lab_test = self.extract_features(test_loader)
                self.set_features(embeds_test, lab_test, mode='test')

            test_acc = self.accuracy(self.embeds_test, self.lab_test)
            return train_acc, test_acc

        return train_acc

def test_knn():
    d1 = torch.utils.data.Subset(train_ds_plain, list(range(300)))
    d2 = torch.utils.data.Subset(val_ds, list(range(100)))
    train_loader = data.DataLoader(dataset=d1, batch_size=32, shuffle=False, drop_last=False)
    test_loader = data.DataLoader(dataset=d2, batch_size=32, shuffle=False, drop_last=False)
    model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
    knn_cls = KnnConvnet(model, device=device)
    for k in 1, 5:
        train_acc, test_acc = knn_cls.execute(train_loader, test_loader, k=k)
        print(f"train acc: {train_acc:.2f}%, test acc: {test_acc:.2f}%")

test_knn()

# Expected results

```
train acc: 100.00%, test acc: 48.00%
train acc: 74.67%, test acc: 56.00%
```


# Task 2: Use the KNN classifier on pretrained features of resnet50

Now we will use the KNN class together with two models: resnet50 and resnet18. These are out-of-the-box models, available in the `torchvision` package.  

- Load the resnet18 and resnet50 models from `torchvision`, pretrained on the imagenet dataset.
- Remove the last layer (use print(model) to see what it's called).
- Compute the image embeddings for all training and test data, using the `extract_features` function. These now serve as our new, transformed data. We will use the validation dataloader as test data.
- When computing the training accuracy, use a random subset of 10000 elements of the training data (otherwise it can be quite slow)
- Find the best choice of K in KNN (hyperparameter search). Try at least 4 different values of K for both models.
- Plot the results in a scatter plot for both resnet18 and resnet50.
- **Important: save the image embeddings** from resnet50 for the next task (i.e. with `torch.save`).

PS: Use the non-augmented train and validation dataloaders.
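Drawing the random training subset can be done on the precomputed embeddings with `torch.randperm`. In this sketch the dataset size, the feature dimension and the seed are hypothetical placeholders.

```python
import torch

n_train = 50_000   # hypothetical training-set size
n_subset = 10_000  # subset size used for the training accuracy
feats = torch.randn(n_train, 8)  # toy stand-in for the train embeddings

g = torch.Generator().manual_seed(0)  # fixed seed so the subset is reproducible
idx = torch.randperm(n_train, generator=g)[:n_subset]  # unique random indices
feats_subset = feats[idx]
```

The same `idx` must be used to index the labels so that features and labels stay aligned.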

You are expected to observe at least 72% val accuracy with resnet50.

Hint: if you are working with Google Colab, you can download files with:

```python
from google.colab import files
import pandas as pd
# saves the file to the local Google Colab environment
# (`result` is assumed to be a pandas DataFrame holding your results)
result.to_csv('example_file.csv')
# downloads it to your computer
files.download('example_file.csv')
```
It works with any type of file.

In [None]:
def get_model(modelname="resnet18", pretrained=True):
    ### START CODE HERE ### (approx. 2 lines)
    ### END CODE HERE ###
    return model

def run_knn_hp_tuning(model, train_loader, test_loader,
                        range_k = [5, 10, 15, 20], device='cpu', modelname='resnet'):
    train_acc_all = []
    val_acc_all = []
    knn_cls = KnnConvnet(model, device)

    print("Calc. train features")
    ### START CODE HERE ### (approx. 4 lines)
    ### END CODE HERE ###

    print("Calc. test features")
    ### START CODE HERE ### (approx. 4 lines)
    ### END CODE HERE ###

    for k in range_k:
        ### START CODE HERE ### (approx. 3 lines)
        ### END CODE HERE ###
    return train_acc_all, val_acc_all

train_loader_plain2 = data.DataLoader(dataset=train_ds_plain, batch_size=1024, shuffle=True, drop_last=False)
val_loader = data.DataLoader(dataset=val_ds, batch_size=1024, shuffle=False, drop_last=False)

backbone_r18 = get_model("resnet18")
backbone_r50 = get_model("resnet50")

range_k = [5, 10, 15, 20]
train_acc_r18, val_acc_r18 = run_knn_hp_tuning(backbone_r18, train_loader_plain2,
    val_loader, range_k, device, modelname="resnet18")

train_acc_r50, val_acc_r50 = run_knn_hp_tuning(backbone_r50, train_loader_plain2,
    val_loader, range_k, device, modelname="resnet50")

In [None]:
import matplotlib.pyplot as plt

def plot_knn_accs(range_k, train_acc_all, val_acc_all,  modelname='resnet18'):
    plt.plot(range_k, train_acc_all, marker='o', label=f'{modelname} train acc %')
    plt.plot(range_k , val_acc_all, marker='x', label=f"{modelname} val. acc %")

plt.figure(figsize=(12, 6))
plot_knn_accs(range_k, train_acc_r18, val_acc_r18,  modelname='resnet18')
plot_knn_accs(range_k, train_acc_r50, val_acc_r50,  modelname='resnet50')
plt.legend( bbox_to_anchor=(1,1), loc="upper left")
plt.grid()
plt.xticks([]) # hides x axis
plt.xlabel("k")
plt.ylabel(f"Accuracy %")
plt.title("Benchmarking resnet18 and resnet50 embeddings with KNN")
plt.savefig("./figs/knn_accs_resnets.png", dpi=500, bbox_inches='tight')
plt.show()

### Expected result

![im1](https://github.com/HHU-MMBS/Deep-Learning-Exercise-Extras/raw/main/a09_transfer_learning/figs/knn_accs_resnets.png)

# Task 3: Train an MLP on the extracted image features from resnet50

- Create new datasets and dataloaders that return a pair of (features, labels) from resnet50 instead of the image.

Since we have precomputed the features, you can use `torch.load` here to load the features and labels and `torch.utils.data.TensorDataset` to make a dataset.

- Train a 2-layer MLP with a ReLU activation on the extracted features from the pretrained resnet50 as input data.
- Train for at least 100 epochs with Adam.
- Expected val accuracy is 84%

Hint: For small models, just nn.Sequential produces a fully functioning model. You don't need to define an extra class here.
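The pieces mentioned above (`torch.save`/`torch.load`, `TensorDataset`, `nn.Sequential`) fit together roughly as in the following sketch. The random features, the hidden size of 256, the learning rate and the in-memory buffer (standing in for a file on disk) are assumptions for illustration.

```python
import io
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

feats = torch.randn(64, 2048)      # toy stand-in for resnet50 embeddings
labs = torch.randint(0, 9, (64,))  # toy labels for 9 classes

buffer = io.BytesIO()  # stands in for a file path such as a .pt file
torch.save({'features': feats, 'labels': labs}, buffer)
buffer.seek(0)
stored = torch.load(buffer)

# Dataset and loader that yield (features, labels) pairs instead of images.
ds = TensorDataset(stored['features'], stored['labels'])
loader = DataLoader(ds, batch_size=16, shuffle=True)

# Two linear layers with a ReLU in between.
mlp = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(), nn.Linear(256, 9))
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-3)

x, y = next(iter(loader))
logits = mlp(x)  # shape [16, 9]
```

Training on stored features is cheap because the expensive resnet forward pass is done only once.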

In [None]:
# load embeds from hard disk
### START CODE HERE ### (approx. 2 lines)
### END CODE HERE ###

# create dataset
### START CODE HERE ### (approx. 2 lines)
### END CODE HERE ###
print("Feature datasets have been created")

# create dataloaders
### START CODE HERE ### (approx. 2 lines)
### END CODE HERE ###

# create model
### START CODE HERE ### (approx. 1 lines)
### END CODE HERE ###

num_epochs = 100
# setup optimizer
### START CODE HERE ### (approx. 1 lines)
### END CODE HERE ###

dict_log = train(model, optimizer, num_epochs, train_loader_features, val_loader_features, device)

figsize = (15,10)
plt.figure(figsize=figsize)
plot_stats(dict_log, baseline=84, modelname="MLP/Resnet50")
plt.savefig(fname="./figs/mlp_resnet50_embeds.png", dpi=500, bbox_inches='tight')
plt.show()
print("Best val. acc", np.max(dict_log["val_acc_epoch"]))


### Expected result

![im2](https://github.com/HHU-MMBS/Deep-Learning-Exercise-Extras/raw/main/a09_transfer_learning/figs/mlp_resnet50_embeds.png)

# Task 4: Regularization strategies
If you trained the model, you should be able to see the overfitting behaviour with a degradation in loss/accuracy towards the end of training.

What regularization strategies could you use to prevent the model from overfitting?

Experiment with at least 1 regularization strategy (apart from tuning the weight decay) and see if you can further increase the performance.

- Does regularization lead to performance improvement?
- How do the training dynamics change when training with stronger regularization strategies?
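Dropout is one candidate strategy. The sketch below is not a recommended configuration; the layer sizes and `p=0.5` are assumptions. It only shows that dropout is active in `train()` mode and switched off in `eval()` mode.

```python
import torch
from torch import nn

torch.manual_seed(0)
mlp = nn.Sequential(
    nn.Linear(2048, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # randomly zeroes half of the hidden units during training
    nn.Linear(256, 9),
)
x = torch.randn(4, 2048)

mlp.train()
a, b = mlp(x), mlp(x)  # two stochastic passes with different dropout masks

mlp.eval()
with torch.no_grad():
    c, d = mlp(x), mlp(x)  # deterministic passes, dropout disabled

print(torch.equal(a, b), torch.equal(c, d))
```

Because of this difference, training accuracy measured in `train()` mode is typically lower than what the same weights achieve in `eval()` mode.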

In [None]:
### START CODE HERE ### (approx. 3 lines)
### END CODE HERE ###

dict_log = train(model, optimizer, num_epochs, train_loader_features, val_loader_features, device)

figsize = (15,10)
plt.figure(figsize=figsize)
plot_stats(dict_log, baseline=84, modelname="MLP/Resnet50")
plt.savefig(fname="./figs/mlp_resnet50_embeds_regularization.png", dpi=500, bbox_inches='tight')
plt.show()
print("Best val. acc", np.max(dict_log["val_acc_epoch"]))

### Expected result

![im3](https://github.com/HHU-MMBS/Deep-Learning-Exercise-Extras/raw/main/a09_transfer_learning/figs/mlp_resnet50_embeds_regularization.png)

# Task 5: Train the whole model (resnet50) on pathmnist starting from imagenet initialization.

Now we will re-train the whole model, using the pretrained weights as a starting point. Also, we add data-augmentation, see below for details.

Training data augmentations:
- Horizontal flip with 50% probability
- Random crop images to 50-100 % of the initial size
- Resize images to 28x28
- Intensity jitter: 20% brightness and 20% contrast with 80% probability
- Mean/std norm with mean=0.5 and std=0.5 for all channels

At val/test time, only mean/std normalization will be applied.


Do the following:
- Load a pretrained resnet50. Remove the pretrained head (last layer). Add a two-layer MLP as the new last layer.
- Compute the number of trainable and non-trainable parameters.
- Train for 20 epochs.
- Plot training statistics
- Calculate test accuracy!
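Counting trainable and non-trainable parameters comes down to summing `numel()` over `model.parameters()` and checking `requires_grad`. The tiny model and the frozen first layer below are only an illustration.

```python
from torch import nn

model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))

# Freeze the first layer to create some non-trainable parameters.
for p in model[0].parameters():
    p.requires_grad = False

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"total: {total}, trainable: {trainable}, frozen: {total - trainable}")
```

Here the first layer has 10*5 + 5 = 55 parameters and the second has 5*2 + 2 = 12, so the total is 67, of which 12 are trainable.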


Compare the fine-tuned resnet50 on the pathmnist dataset with the previously trained MLP on the extracted features.

Expected val. acc is 95%

In [None]:
### START CODE HERE ### (approx. 9 lines)
### END CODE HERE ###

# For val/test time, define a plain transform
plain_transform = T.Compose([
        T.ToTensor(),
        T.Normalize(list(mean), list(std))])

# load the data
train_ds = DataClass(split='train', transform=train_transform, download=download)

train_loader = data.DataLoader(train_ds, batch_size=batch_size, shuffle=True, drop_last=True, num_workers=8, pin_memory=True, persistent_workers=True)
val_loader = data.DataLoader(val_ds, batch_size=batch_size, shuffle=False, drop_last=True, num_workers=4, pin_memory=True, persistent_workers=True)
test_loader = data.DataLoader(test_ds, batch_size=batch_size, shuffle=False, drop_last=True)

# create model
class ResnetMedNist(nn.Module):
    def __init__(self, hidden_mlp, n_classes):
        super(ResnetMedNist, self).__init__()
        ### START CODE HERE ### (approx. 3 lines)
        ### END CODE HERE ###

    def forward(self, x):
        ### START CODE HERE ### (approx. 1 lines)
        ### END CODE HERE ###


# Initialize model and optimizer
num_epochs = 20
### START CODE HERE ### (approx. 2 lines)
### END CODE HERE ###



# Number of parameters
### START CODE HERE ### (approx. 2 lines)
### END CODE HERE ###
print(f"Total params: {pytorch_total_params:.1f} M, trainable params: {pytorch_total_params_trainable:.1f} M")

dict_log = train(model, optimizer, num_epochs, train_loader, val_loader, device)

figsize = (15,10)
plt.figure(figsize=figsize)
plot_stats(dict_log, baseline=94, modelname="Resnet50 + MLP fine-tuning")
plt.savefig(fname="./figs/mlp_resnet50_finetune_embeds.png", dpi=500, bbox_inches='tight')
plt.show()

## Expected results
![im4](https://github.com/HHU-MMBS/Deep-Learning-Exercise-Extras/raw/main/a09_transfer_learning/figs/mlp_resnet50_finetune_embeds.png)

# Task 6: Visualizing the nearest neighbors in the embeddings space.

- Load the embeddings (produced from resnet50) and their labels from the validation set.
- Take 1 image sample from each class.
- Calculate their top k=7 nearest neighbors (NN) using cosine similarity.
- Visualize the 7+1 images in one row. The first one should be the reference image whose NN we computed.

PS: we use 7 so we can plot them in 1 row. Feel free to play around with more NN.
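The nearest-neighbour search with cosine similarity can be sketched on toy vectors: after L2-normalizing the embeddings, a plain matrix product gives the cosine similarities and `topk` picks the closest entries. The 2D vectors and `k = 2` are made up for illustration.

```python
import torch
import torch.nn.functional as F

embeds = torch.tensor([
    [1.0, 0.0],
    [2.0, 0.1],
    [0.0, 3.0],
    [0.1, 1.0],
    [-1.0, 0.0],
])
embeds = F.normalize(embeds, dim=1)  # unit length, so dot product = cosine similarity

ref_ids = [0, 2]                    # indices of the reference samples
sims = embeds[ref_ids] @ embeds.T   # shape [2, 5]

k = 2
# Ask for k + 1 entries because the best match of each reference is the reference itself.
_, idx = sims.topk(k + 1, dim=1)
neighbours = idx[:, 1:]             # drop the self-match in the first column
print(neighbours.tolist())
```

Dropping the first column assumes the reference is its own top match, which can fail if the data contains exact duplicates.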

#### Questions
- What is the connection between the NN in the embedding space and the image labels?
- How many of the retrieved images in the embedding space share the same label?

In [None]:
plain_transform = T.Compose([
        T.ToTensor(),
        T.Resize(128),
        T.Normalize(list(mean), list(std))])
val_ds = DataClass(split='val', transform=plain_transform, download=download)

### START CODE HERE ### (approx. 3 lines)
### END CODE HERE ###

# Normalize the embeddings, so that a normal vector product is the cosine similarity
### START CODE HERE ### (approx. 1 lines)
### END CODE HERE ###

ref_imgs = []  # Save one image per class
ref_ids = []  # Save the indices for these images
for j in range(n_classes):
    ### START CODE HERE ### (approx. 3 lines)
    ### END CODE HERE ###

ref_imgs = torch.stack(ref_imgs)

# For each image, compute the k nearest neighbours, according to cosine similarity
### START CODE HERE ### (approx. 3 lines)
### END CODE HERE ###

# Visualize the reference image and its 7 nearest neighbors
for c, ref in enumerate(ref_ids):
    knns = indices[c]
    imgs_to_viz = [val_ds[ref][0]]
    true_labels = [val_ds[ref][1]]
    for i in knns:
        imgs_to_viz.append(val_ds[i][0])
        true_labels.append(val_ds[i][1])
    # show the images
    plt.figure(figsize = (22,14))
    for j in range(k+1):
        label = int(true_labels[j])
        imshow(imgs_to_viz[j], j, mean, std)
        plt.title(f"Label {label}", fontsize = 14)
    plt.savefig(f'./figs/knn_from_ref_label_id_{str(c).zfill(2)}',bbox_inches = "tight", dpi = 500)

### Expected results

![im6](https://github.com/HHU-MMBS/Deep-Learning-Exercise-Extras/raw/main/a09_transfer_learning/figs/knn_from_ref_label_id_00.png)

![im7](https://github.com/HHU-MMBS/Deep-Learning-Exercise-Extras/raw/main/a09_transfer_learning/figs/knn_from_ref_label_id_01.png)

![im8](https://github.com/HHU-MMBS/Deep-Learning-Exercise-Extras/raw/main/a09_transfer_learning/figs/knn_from_ref_label_id_02.png)

![im9](https://github.com/HHU-MMBS/Deep-Learning-Exercise-Extras/raw/main/a09_transfer_learning/figs/knn_from_ref_label_id_03.png)

![im10](https://github.com/HHU-MMBS/Deep-Learning-Exercise-Extras/raw/main/a09_transfer_learning/figs/knn_from_ref_label_id_04.png)

![im11](https://github.com/HHU-MMBS/Deep-Learning-Exercise-Extras/raw/main/a09_transfer_learning/figs/knn_from_ref_label_id_05.png)

![im12](https://github.com/HHU-MMBS/Deep-Learning-Exercise-Extras/raw/main/a09_transfer_learning/figs/knn_from_ref_label_id_06.png)

![im13](https://github.com/HHU-MMBS/Deep-Learning-Exercise-Extras/raw/main/a09_transfer_learning/figs/knn_from_ref_label_id_07.png)

![im14](https://github.com/HHU-MMBS/Deep-Learning-Exercise-Extras/raw/main/a09_transfer_learning/figs/knn_from_ref_label_id_08.png)
