
# Assignment 7: Transfer Learning


The goal of this exercise is to learn how to use pre-trained networks in transfer learning tasks.
We will make use of networks trained on ImageNet, and apply them to related problems, i.e., the classification of $10$ objects not contained in ImageNet.

## Dataset

For this exercise we use the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset, which can be downloaded from the official website [here](https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz).
The dataset contains $60000$ color images of size $32\times 32$ pixels in $10$ classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck, with $6000$ images per class.

### Task 1: Data Transformation

We need to instantiate a proper `torchvision.transforms` pipeline to create the same input structure as was used for training our network.
We need to combine 4 transforms, which are documented on the PyTorch website: https://pytorch.org/vision/stable/models.html

1. We need to resize the image such that the shorter side has size 256.
2. We need to take the center crop of size $224\times224$ from the image.
3. We need to convert the image into a tensor (including pixel value scaling).
4. We need to normalize the pixel values with mean $(0.485, 0.456, 0.406)$ and standard deviation $(0.229, 0.224, 0.225)$.

Since we will use networks pre-trained on ImageNet, we need to perform the exact same transform as used for ImageNet testing.
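To make steps 3 and 4 concrete, here is a minimal plain-Python sketch of the arithmetic applied to a single pixel. It assumes the usual behaviour of `ToTensor` (scaling 8-bit values to $[0, 1]$) and `Normalize` (subtracting the mean and dividing by the standard deviation, per channel). The helper name `normalize_pixel` is made up for illustration and is not part of torchvision.

```python
# ImageNet channel statistics from the list above (R, G, B)
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to the values the network sees (hypothetical helper)."""
    # ToTensor: scale 0..255 integers to 0..1 floats
    scaled = [value / 255.0 for value in rgb]
    # Normalize: (x - mean) / std, separately for each channel
    return [(x - m) / s for x, m, s in zip(scaled, MEAN, STD)]

black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
print([round(v, 3) for v in black])
print([round(v, 3) for v in white])
```

After normalization the inputs are no longer confined to $[0, 1]$: a black pixel maps to negative values and a white pixel to values above 2 in every channel.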

In [None]:
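import torch
import torchvision
import torchvision.transforms as transform
# fall back to the CPU when no CUDA device is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
imagenet_transform = transform.Compose([
    transform.Resize(256),      # shorter side -> 256 pixels
    transform.CenterCrop(224),  # central 224x224 patch
    transform.ToTensor(),       # PIL image -> float CHW tensor in [0, 1]
    transform.Normalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225])
])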

### Task 2: Dataset Loading

Here we use the [torchvision.datasets.CIFAR10](https://pytorch.org/vision/0.12/generated/torchvision.datasets.CIFAR10.html) dataset interface for processing images.
You can use the `train` argument or flag to distinguish between training and test set.

This task consists of two parts:

1. Create two datasets, one for the training set, one for the test set. Use the transform defined above.
2. Once the datasets are created, create two data loaders, one for training set, one for test set. Use a proper value of the batch-size $B$.
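When choosing $B$, it helps to know how many batches each loader will yield per epoch. The sketch below assumes the official CIFAR-10 split of 50000 training and 10000 test images and a `DataLoader` with its default `drop_last=False`, in which case the number of batches is $\lceil N / B \rceil$. The helper `num_batches` is a hypothetical name used only here.

```python
import math

def num_batches(num_samples, batch_size):
    """Batches per epoch when the last, smaller batch is kept."""
    return math.ceil(num_samples / batch_size)

# compare a few candidate batch sizes for the training and test split
for B in (32, 64, 128):
    print(B, num_batches(50000, B), num_batches(10000, B))
```

Larger batches mean fewer optimizer steps per epoch but need more GPU memory, since every image is upscaled to $3\times224\times224$.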

In [None]:
trainset = torchvision.datasets.CIFAR10(root='/Users/yixuan/Documents/UZH/23spring/DL/cifar10', train=True, download=True, transform=imagenet_transform)

testset = torchvision.datasets.CIFAR10(root='/Users/yixuan/Documents/UZH/23spring/DL/cifar10', train=False, download=True, transform=imagenet_transform)


Files already downloaded and verified
Files already downloaded and verified


In [None]:
B = 32
trainloader = torch.utils.data.DataLoader(trainset, batch_size=B, shuffle=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=B, shuffle=False)

### Test 1: Data Size and Types

We check that all input images are of type `torch.Tensor`, have size $3\times224\times224$ and data type `torch.float`, and that all labels are of type `int`.

Note: the sanity check is only performed on the test set.

In [None]:
for x, t in testset:
    assert isinstance(x, torch.Tensor)
    assert isinstance(t, int)
    assert x.shape==(3,224,224)
    assert x.dtype==torch.float

## Deep Feature Extraction

We will use a pre-trained network available in `PyTorch`. 
Particularly, we will use a ResNet-50 architecture, but other architectures can also be tested. 
Fortunately, PyTorch provides simple interfaces to obtain pre-trained models, e.g., using the `torchvision.models.resnet50` interface function.

In order to use the networks on a different dataset, we need to change their outputs.
There are several possibilities on how to achieve that, and you have the freedom to choose. 

For your reference, the implementation of the `forward` function of ResNet networks (including ResNet-50) can be found here: https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py#L266

You can also check if other networks perform better, for example, deeper ResNet topologies.
Be aware that the strategy to replace the last fully-connected layer might not work in other network topologies, only in residual networks.
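For topologies whose classifier is not exposed as an attribute named `fc`, one possible workaround is to search the module tree for the last `torch.nn.Linear` and swap it out. The following is only a sketch under assumptions: `replace_last_linear` is a hypothetical helper, the toy `Sequential` stands in for a real pre-trained network, and "last" means last in registration order, which usually but not necessarily matches the order of the forward pass.

```python
import torch

def replace_last_linear(model, num_outputs):
    """Swap the last registered torch.nn.Linear of `model` for a new one (hypothetical helper)."""
    last_name, last_layer = None, None
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear):
            last_name, last_layer = name, module
    if last_layer is None:
        raise ValueError("model contains no torch.nn.Linear layer")
    # walk to the parent module that owns the layer, then overwrite the attribute
    *parents, leaf = last_name.split(".")
    owner = model
    for part in parents:
        owner = getattr(owner, part)
    setattr(owner, leaf, torch.nn.Linear(last_layer.in_features, num_outputs))
    return model

# toy stand-in for a network whose classifier is nested and not called `fc`
toy = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(12, 8),
    torch.nn.ReLU(),
    torch.nn.Sequential(torch.nn.Dropout(0.5), torch.nn.Linear(8, 1000)),
)
toy = replace_last_linear(toy, num_outputs=10)
print(toy(torch.randn(4, 3, 2, 2)).shape)
```

Always inspect the printed model afterwards, because some architectures end in a convolution or have several output heads, where this heuristic does not apply.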

### Task 3: Pre-trained Network Instantiation

Instantiate two pre-trained networks of type ResNet-50.

1. Freeze the feature layers of the first network.

Note: Make use of the old TorchVision interface to load your pre-trained network. Here is the link: https://pytorch.org/vision/0.12/models.html
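What freezing does in practice can be checked on a small example. The sketch below uses a toy model instead of ResNet-50 (the names `features` and the layer sizes are assumptions chosen for illustration) and already anticipates the layer replacement of Task 4: parameters with `requires_grad = False` receive no gradient, while a newly created layer is trainable by default.

```python
import torch

# toy stand-in for a pre-trained backbone followed by a classifier `fc`
backbone = torch.nn.Sequential(torch.nn.Linear(6, 4), torch.nn.ReLU())
model = torch.nn.Sequential()
model.add_module("features", backbone)
model.add_module("fc", torch.nn.Linear(4, 3))

# freeze every existing parameter
for param in model.parameters():
    param.requires_grad = False

# a freshly created layer has requires_grad=True again
model.fc = torch.nn.Linear(4, 3)

loss = model(torch.randn(5, 6)).sum()
loss.backward()

# frozen parameters have no gradient after the backward pass
frozen = [n for n, p in model.named_parameters() if p.grad is None]
trainable = [n for n, p in model.named_parameters() if p.grad is not None]
print("frozen:", frozen)
print("trainable:", trainable)
```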

In [None]:
# instantiate the first pre-trained resnet 50 network
network_1 = torchvision.models.resnet50(pretrained=True)
# Make sure to freeze all the layers of the network.
for param in network_1.parameters():
    param.requires_grad = False

# instantiate the second pre-trained resnet 50 network (optionally)
network_2 = torchvision.models.resnet50(pretrained=True)



### Task 4: Network Implementation

We want to modify the network such that we extract the logits for the 10 classes from CIFAR-10 from the last fully-connected layer of the network.

Implement a function that:
1. Replaces the current last linear layer of the pre-trained network with a new linear layer that has $O$ units ($O$ represents the number of classes in our dataset).
2. Initializes the weights of the new linear layer using Xavier's method **(Optional)**.

Note: Use `torch.nn.init.xavier_uniform_` function to initialize the weights of the new linear layer.
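As a quick check of what Xavier initialization does for the layer sizes in this assignment, the sketch below compares the initialized weights against the theoretical bound. It assumes the default `gain=1`, for which `xavier_uniform_` samples from $U(-a, a)$ with $a = \sqrt{6 / (\text{fan}_\text{in} + \text{fan}_\text{out})}$. The bias is not touched by this call and keeps the default initialization of `torch.nn.Linear`.

```python
import math
import torch

fan_in, fan_out = 2048, 10  # ResNet-50 feature size and number of CIFAR-10 classes
layer = torch.nn.Linear(fan_in, fan_out)
torch.nn.init.xavier_uniform_(layer.weight)

# theoretical bound of the uniform distribution for gain=1
bound = math.sqrt(6.0 / (fan_in + fan_out))
print(f"bound     = {bound:.4f}")
print(f"max |w|   = {layer.weight.abs().max().item():.4f}")
```

With 2048 inputs the weights are small (below roughly 0.054 in magnitude), so the initial logits stay close to zero and the initial predictions are close to uniform.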

In [None]:
def replace_last_layer(network, O=10):
    # replace the last linear layer with the new layer
    in_features = network.fc.in_features
    network.fc = torch.nn.Linear(in_features, O)
    torch.nn.init.xavier_uniform_(network.fc.weight)
    return network

### Test 2: Last layer dimensions

This test ensures that the function returns a network with the correct number of input and output units in the last layer.

In [None]:
O = 10
for network in (network_1, network_2):
    new_model = replace_last_layer(network, O=O)
    assert new_model.fc.out_features == O
    assert new_model.fc.in_features == 2048

## Network Training
Implement a function that takes all necessary parameters to run a training on a given dataset. 
Select the optimizer to be `torch.optim.SGD` and `torch.nn.CrossEntropyLoss` as the loss function. 
The test set will be used as the validation set.
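`torch.nn.CrossEntropyLoss` expects raw logits and integer class indices; it applies the log-softmax internally, so the network must not end in a softmax layer. A small sketch with made-up logits shows that the loss equals the mean negative log-probability of the correct class:

```python
import torch

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])   # raw network outputs, shape (batch, classes)
labels = torch.tensor([0, 2])              # integer class indices, shape (batch,)

criterion = torch.nn.CrossEntropyLoss()    # default reduction: mean over the batch
loss = criterion(logits, labels)

# the same value computed by hand: mean of -log softmax(logits)[label]
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(labels)), labels].mean()
print(loss.item(), manual.item())
```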

### Task 5: Training and Evaluation Loop

Implement a training loop over a specific number of epochs (10) with a learning rate of $\eta=0.001$ and momentum of $\mu = 0.9$. 
Make sure that you train on the training data only, and **not** on the validation data.
In each epoch, compute and print the training loss, training accuracy, validation loss and validation accuracy.
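The roles of $\eta$ and $\mu$ can be seen by writing the update out by hand. The sketch below assumes PyTorch's formulation of SGD with momentum (no dampening, no weight decay, no Nesterov), $v \leftarrow \mu v + g$ followed by $w \leftarrow w - \eta v$, and compares it with `torch.optim.SGD` on the toy objective $f(w) = w^2$, which is chosen purely for illustration.

```python
import torch

lr, momentum = 0.001, 0.9

# PyTorch's optimizer on the toy objective f(w) = w^2
w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=lr, momentum=momentum)

# the same update written out by hand
w_manual, velocity = 1.0, 0.0

for step in range(3):
    optimizer.zero_grad()
    (w ** 2).sum().backward()
    optimizer.step()

    grad = 2.0 * w_manual                    # derivative of w^2
    velocity = momentum * velocity + grad    # v <- mu * v + g
    w_manual = w_manual - lr * velocity      # w <- w - eta * v
    print(step, w.item(), w_manual)
```

Both columns agree up to floating-point precision, and the step size grows from one iteration to the next because the velocity accumulates past gradients.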

In [None]:
def train_eval(model, num_epochs=10, lr=0.001, momentum=0.9):
    # move the model to the device once, before creating the optimizer
    model.to(device)
    # parameters with requires_grad=False never receive gradients, so SGD skips them
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    criterion = torch.nn.CrossEntropyLoss()

    for epoch in range(num_epochs):

        # training process
        model.train()
        train_running_loss = 0.0
        train_running_correct = 0
        for i, (inputs, labels) in enumerate(trainloader):
            inputs = inputs.to(device)
            labels = labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            train_running_loss += loss.item()
            preds = outputs.argmax(dim=1)
            train_running_correct += (preds == labels).sum().item()

        train_loss = train_running_loss/len(trainloader)
        train_acc = 100. * train_running_correct/len(trainloader.dataset)

        # testing process (no gradients are needed, which saves memory and time)
        model.eval()
        test_running_loss = 0.0
        test_running_correct = 0
        with torch.no_grad():
            for inputs, labels in testloader:
                inputs = inputs.to(device)
                labels = labels.to(device)
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                test_running_loss += loss.item()
                preds = outputs.argmax(dim=1)
                test_running_correct += (preds == labels).sum().item()

        test_loss = test_running_loss/len(testloader)
        test_acc = 100. * test_running_correct/len(testloader.dataset)

        # print accuracies and losses for current epoch
        print('Epoch [{}], Loss: {:.4f}, Train Acc: {:.2f}%, Test Loss: {:.4f}, Test Acc: {:.2f}%'
              .format(epoch+1, train_loss, train_acc, test_loss, test_acc))

### Task 6: Network Fine-Tuning with Frozen Layers

Create a network with frozen feature layers and $10$ output units.
Fine-tune the created network on our CIFAR-10 data using the previous function.

In [None]:
# attach a fresh, trainable 10-unit output layer to the frozen network
network_with_frozen_layers = replace_last_layer(network_1, O=10)
train_eval(network_with_frozen_layers,num_epochs=10, lr=0.001, momentum=0.9)

Epoch [1], Loss: 0.9751, Train Acc: 68.02%, Test Loss: 0.7310, Test Acc: 75.34%
Epoch [2], Loss: 0.7332, Train Acc: 75.24%, Test Loss: 0.6793, Test Acc: 76.82%
Epoch [3], Loss: 0.6979, Train Acc: 76.21%, Test Loss: 0.6558, Test Acc: 77.81%
Epoch [4], Loss: 0.6705, Train Acc: 77.08%, Test Loss: 0.6430, Test Acc: 78.04%
Epoch [5], Loss: 0.6594, Train Acc: 77.40%, Test Loss: 0.6473, Test Acc: 77.78%
Epoch [6], Loss: 0.6483, Train Acc: 77.90%, Test Loss: 0.6398, Test Acc: 78.38%
Epoch [7], Loss: 0.6391, Train Acc: 78.09%, Test Loss: 0.6239, Test Acc: 78.75%
Epoch [8], Loss: 0.6331, Train Acc: 78.27%, Test Loss: 0.6172, Test Acc: 79.14%
Epoch [9], Loss: 0.6228, Train Acc: 78.66%, Test Loss: 0.6085, Test Acc: 79.20%
Epoch [10], Loss: 0.6153, Train Acc: 78.85%, Test Loss: 0.6023, Test Acc: 79.30%


### Task 7 (Optional): Network Fine-Tuning without Frozen Layers 

Create a network from the second pre-trained network with $10$ output units. 
Fine-tune the created network on our CIFAR-10 data.

Note:

  * The fine-tuning of the network can take a long time when the layers are not frozen.

In [None]:
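# attach a fresh 10-unit output layer; all layers of this network stay trainable
network_normal = replace_last_layer(network_2, O=10)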
train_eval(network_normal,num_epochs=10, lr=0.001, momentum=0.9)

Epoch [1], Loss: 0.3398, Train Acc: 88.55%, Test Loss: 0.1632, Test Acc: 94.69%


## Plotting

Finally, we want to plot the confusion matrix of the test set.
For this, we need to compute the predictions for all of our test samples, and the list of target values.
Finally, we can make use of the `sklearn.metrics.confusion_matrix` to compute the confusion matrix.
You can use `sklearn.metrics.ConfusionMatrixDisplay` to display the confusion matrix, or use `pyplot.imshow` and add the corresponding labels yourself.

Note:

  * The documentation for the confusion matrix can be found here: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
  * The interface and an example for the `ConfusionMatrixDisplay` can be found here: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.ConfusionMatrixDisplay.html
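To make the layout of the matrix concrete, here is a plain-Python sketch of the counting that `confusion_matrix` performs, using the same convention as scikit-learn: rows are target classes and columns are predicted classes. The function `confusion_counts` and the tiny label lists are made up for illustration.

```python
def confusion_counts(targets, predictions, num_classes):
    """Count matrix with targets as rows and predictions as columns (hypothetical helper)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for target, prediction in zip(targets, predictions):
        matrix[target][prediction] += 1
    return matrix

targets     = [0, 0, 1, 1, 2, 2, 2]
predictions = [0, 1, 1, 1, 2, 0, 2]
matrix = confusion_counts(targets, predictions, num_classes=3)
for row in matrix:
    print(row)

# per-class accuracy (recall): diagonal entry divided by the row sum
recall = [matrix[c][c] / sum(matrix[c]) for c in range(3)]
print(recall)
```

Correct predictions lie on the diagonal, and an off-diagonal entry in row $i$, column $j$ counts samples of class $i$ that were mistaken for class $j$.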

### Task 8: Confusion Matrix Plotting

Plot the confusion matrix for the fine-tuned network with frozen layers.
Optionally, also plot the confusion matrix for the second fine-tuned network.

In [None]:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
# compute predictions and collect targets
def collect_predictions(network):
    # run the network over the test set and return (targets, predictions) as 1-D tensors
    network.to(device)
    network.eval()
    predictions, targets = [], []
    with torch.no_grad():
        for inputs, labels in testloader:
            outputs = network(inputs.to(device))
            predictions.append(outputs.argmax(dim=1).cpu())
            targets.append(labels)
    return torch.cat(targets), torch.cat(predictions)

targets1, predictions1 = collect_predictions(network_with_frozen_layers)
targets2, predictions2 = collect_predictions(network_normal)

# compute confusion matrix (rows: targets, columns: predictions)
matrix1 = confusion_matrix(targets1.numpy(), predictions1.numpy())
matrix2 = confusion_matrix(targets2.numpy(), predictions2.numpy())

# plot confusion matrix

disp = ConfusionMatrixDisplay(confusion_matrix=matrix1, display_labels=range(10))
disp.plot()


# add axis labels if required
plt.title('Confusion Matrix - network_with_frozen_layers')
plt.xlabel('Predicted Class')
plt.ylabel('Target Class')
plt.show()


disp = ConfusionMatrixDisplay(confusion_matrix=matrix2, display_labels=range(10))
disp.plot()
plt.title('Confusion Matrix - network_normal')
plt.xlabel('Predicted Class')
plt.ylabel('Target Class')
plt.show()

