<div class="alert alert-block alert-warning">

# <p style="text-align: center;">Lab 4: Convolutional Neural Network (CNN)</p>
<div style="text-align: center;">
<img src="data/thumbnail.png" width="700"/>
</div>

Welcome to your fourth and final lab of ECE4179! Lab sessions in this unit run as a help desk, and attendance is not mandatory.

This notebook contains all the code and comments that you need to submit. Here are the instructions to complete this lab:

- Your grade is entirely based on notebook completion (no quiz).
- This lab requires training deep learning models on large datasets, which can take a while. You are **highly encouraged** to start as soon as possible.
- After completing the notebook, submit it to Moodle under '**Lab 4 Submission**'. Along with this notebook, you have to submit your model file for Task 2. Further instructions can be found in Task 2.
- IMPORTANT: The notebook will be auto-graded, therefore please do not edit/rename the already-given variable/function/class names.

This lab has two tasks:

- [Task 1: Design and train a CNN for image classification and analyse the results](#task1)
- [Task 2: Design and train a CNN by yourself](#task2)

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

device = 'cuda' if torch.cuda.is_available() else 'cpu'

def seed_all(seed=0):
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

# if you intend to use GPU, you need to install PyTorch's CUDA support in the virtual environment:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
# A GPU will be slightly quicker for training, but this lab has been tested and is fully doable on CPU only.

<div class="alert alert-block alert-info">

## Task 1 - Design and train a CNN for image classification <a class="anchor" id="task1"></a>
        
In Task 1, you will design a CNN for an image classification task and train it on a given dataset. The procedure is similar to Lab 3's MLP, but this time we will be using convolutional layers.

<div style="text-align: center;">
<img src="data/task1.png" width="500" />
</div>

<div class="alert alert-block alert-info">

### 1.1 Create PyTorch Datasets and DataLoaders
     
The dataset for this task is located in `data/task1_train` and `data/task1_test`, which contain 6000 training images and 1250 test images (we will not use a validation set, for simplicity). Each image has a resolution of 32 x 32 pixels and belongs to one of five classes (airplane, bird, cat, dog, ship).

In the same directory as this notebook, there are two csv files, `task1_train.csv` and `task1_test.csv`, which describe the dataset with the columns "filename", "label", and "path". Have a look at these files and images to understand their structure.

In [None]:
# The classes are provided for you. You can run and utilise these variables in your code.
class_names = ['airplane', 'bird', 'cat', 'dog', 'ship']
num_class = len(class_names)

#### (a) Transforms & Data Augmentation

Image datasets usually apply transforms inside the Dataset class, so we need to define them first. This concept is covered in the workshop, and [here](https://pytorch.org/vision/stable/transforms.html) is the PyTorch documentation about transforms.

The transforms we will mainly use are `ToTensor()` and `Normalize()`, which we used in the workshop. In addition to these usual transforms, we will also apply __data augmentation__. The ones we have included are:

- RandomHorizontalFlip()
- RandomRotation()

Now, what is **data augmentation**? Data augmentation in deep learning refers to the process of artificially diversifying a dataset by applying various transformations and modifications to the existing training examples. The objective of data augmentation is to enhance the model's generalization capability and robustness by exposing it to a wider range of variations and scenarios that might be encountered during real-world inference. These transformations can include rotations, flips, translations, changes in lighting and contrast, cropping, and more, depending on the nature of the data. By introducing this augmented data during training, the model becomes better equipped to handle novel and previously unseen examples, effectively reducing overfitting and improving its ability to extract meaningful features from noisy or imperfect inputs.

Data augmentation is applied only to the training set, not the test set. Its purpose is to introduce diversity and variation into the training examples so that the model generalizes better to real-world scenarios, whereas the test set serves as an unbiased evaluation of the model's performance on unseen data. Applying random augmentation to the test set would alter the data being evaluated and make the results vary between runs, so the performance estimate would no longer reflect how the model does on the actual test images.
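To make the train/test distinction concrete, here is a minimal NumPy sketch of a random horizontal flip. It only mirrors the idea behind `transforms.RandomHorizontalFlip()` (which flips with probability 0.5 by default); it is not torchvision's implementation, and the helper name `random_hflip` and the toy image are hypothetical.

```python
import numpy as np

def random_hflip(img: np.ndarray, rng: np.random.Generator, p: float = 0.5) -> np.ndarray:
    """Flip an (H, W, C) image left-to-right with probability p."""
    if rng.random() < p:
        return img[:, ::-1, :].copy()  # reverse the width axis
    return img

rng = np.random.default_rng(0)
image = np.arange(2 * 3 * 1).reshape(2, 3, 1)  # tiny 2x3 single-channel "image"
label = 3  # hypothetical class index; augmentation never changes the label

# Training: every call may produce a different view of the same sample.
train_views = [random_hflip(image, rng) for _ in range(4)]

# Testing: no augmentation, so the model always sees the original image.
test_view = image
```

Because the flip is re-drawn every time a sample is loaded, the model sees a slightly different version of each training image in every epoch, while the test images stay fixed.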

In [None]:
# T1.1a IMPORTANT: Please do not edit/remove this comment.

# The Compose function allows you to combine multiple image transformations into a sequential pipeline. 
# Each transformation will be applied in the order they appear within the list.
transform_train = transforms.Compose([
    ???, # Randomly flip images horizontally (left to right)
    ???, # Apply a random rotation of +-10 degrees to each image
    transforms.ToTensor(), # Convert the image to a PyTorch tensor
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]), # Normalise the image tensor with mean and standard deviation values of 0.5 for each channel
])

# Augmentation is not applied to test data
transform_test = transforms.Compose([
    transforms.ToTensor(), # Convert the image to a PyTorch tensor
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]), # Normalise the image tensor with mean and standard deviation values of 0.5 for each channel
])

#### (b) Dataset class

Just like in the previous lab, we need a Dataset class first, which contains the three essential parts:
1. The `__init__` function <br>
2. The `__getitem__` function <br>
3. The `__len__` function
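The three methods work together as a simple protocol: `__len__` tells the DataLoader how many samples exist, and `__getitem__` returns one sample for a given row index. The sketch below shows that protocol on a toy, in-memory pandas DataFrame. The class name, rows, and the assumption that labels are stored as class-name strings are all hypothetical; check your own csv files, and note that a real Task 1 dataset would open the image at `path` instead of returning the path string.

```python
import pandas as pd

class ToyCsvDataset:
    """Map-style dataset: keeps only the table in memory, reads one row per index."""

    def __init__(self, df: pd.DataFrame, class_names):
        self.df = df.reset_index(drop=True)
        self.class_names = class_names

    def __len__(self):
        # one sample per row of the table
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]  # positional row access
        path = row["path"]       # a real dataset would open the image file here
        # assumption: labels are class-name strings, converted to an integer index
        label = self.class_names.index(row["label"])
        return path, label

df = pd.DataFrame({
    "filename": ["a.png", "b.png", "c.png"],
    "label": ["cat", "ship", "cat"],
    "path": ["imgs/a.png", "imgs/b.png", "imgs/c.png"],
})
ds = ToyCsvDataset(df, ["airplane", "bird", "cat", "dog", "ship"])
```

If the label column already holds integers, the `class_names.index` lookup is unnecessary.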

In [None]:
# T1.1b IMPORTANT: Please do not edit/remove this comment.

class Task1_Dataset(Dataset):
    def __init__(self, csv_file, class_names, transform=None):
        # In the previous lab, this is where we loaded our data. However, if your dataset is very large,
        # you can't load all of it into memory at once, which is the case for this lab.
        # The strategy here is to load the file paths and labels (obtained from the csv file),
        # and load the images on-the-fly when __getitem__ is called by the DataLoader.
        
        self.df = pd.??? # Load the csv file as a pandas DataFrame (df for short)
        self.class_names = class_names
        self.transform = transform

    def __len__(self):
        # total number of samples is the same as the number of rows in the csv file
        return ???

    def __getitem__(self, idx):
        # We load the image and its label for the given index. The index is the row number in the csv file.
        # You should learn about how to access information from a pandas DataFrame.
        img_path = ???
        label = ???
        
        # Load image using PIL's Image.open() function for the given image path
        image = ???

        # Apply any transforms
        if self.transform:
            image = self.transform(image)

        return image, label # image should be a tensor, label should be an integer

#### (c) Create Dataset instance from the defined Class and then the DataLoaders

Now that you've defined your Dataset class, let's create an instance of it for training and testing, and then create DataLoaders to make it easy to iterate over them.

In [None]:
# T1.1c IMPORTANT: Please do not edit/remove this comment.

bs_train = 64       # the batch size for training task 1
bs_test = 64        # the batch size for testing task 1

dataset_train = ???
dataset_test = ???
loader_train = ???
loader_test = ???

<div class="alert alert-block alert-info">

### 1.2 Design the CNN

You will design your CNN architecture according to the sequence given below. Write code in the `__init__` and `forward` methods of the `Task1_CNN` class to implement the following forward pass:
    
| Sequence |    Layer Type     | channels | kernel size | stride | padding |  Input size  |  Output size |
|:--------:|:---------------:  |----------|-------------|--------|---------|:------------:|:------------:|
|    1     |      Conv2D       | 16       | (3,3)       | 1      | 1       |  (3, 32, 32) |  (?, ??, ??) |
|    2     |       ReLU        |          |             |        |         |  (?, ??, ??) |  (?, ??, ??) |
|    3     |     MaxPool2d     |          | (2,2)       | 2      | 0       |  (?, ??, ??) |  (?, ??, ??) |            
|    4     |      Conv2D       | 32       | (3,3)       | 1      | 1       |  (?, ??, ??) |  (?, ??, ??) |    
|    5     |       ReLU        |          |             |        |         |  (?, ??, ??) |  (?, ??, ??) |
|    6     |     MaxPool2d     |          | (2,2)       | 2      | 0       |  (?, ??, ??) |  (?, ??, ??) |              
|    7     |      Conv2D       | 64       | (3,3)       | 1      | 1       |  (?, ??, ??) |  (?, ??, ??) |
|    8     |       ReLU        |          |             |        |         |  (?, ??, ??) |  (?, ??, ??) |
|    9     |     MaxPool2d     |          | (2,2)       | 2      | 0       |  (?, ??, ??) |  (?, ??, ??) |       
|   10     |      Flatten      |          |             |        |         |  (?, ??, ??) |  (?, ??, ??) |
|   11     |       Linear      | ??       |             |        |         |  (?, ??, ??) |  (?, ??, ??) |

You should compute and fill in the sizes to ensure that you understand how the forward pass works.
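Each spatial size follows from the output-size rule in the PyTorch `Conv2d`/`MaxPool2d` documentation, which for dilation 1 (and the default `ceil_mode=False` for pooling) reduces to `(size + 2*padding - kernel) // stride + 1`. The sketch below applies it to a hypothetical 48x48 input with a 5x5 convolution and 8 filters, deliberately not the Task 1 table, so you can check your own numbers the same way.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of Conv2d / MaxPool2d along one dimension (dilation = 1)."""
    return (size + 2 * padding - kernel) // stride + 1

# Hypothetical stack on a 48x48 input (not the Task 1 architecture):
h = conv_out(48, kernel=5)           # 5x5 conv, stride 1, no padding -> 44
h = conv_out(h, kernel=2, stride=2)  # 2x2 max-pool, stride 2        -> 22
channels = 8                         # hypothetical number of conv filters
flat_features = channels * h * h     # in_features of the Linear layer after Flatten
```

Note that the channel count is set by the number of filters in the last convolution, while ReLU leaves the shape unchanged.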

For the other methods in the class, it will be similar to the previous lab. The difference is that we are doing **multiclass classification** instead of regression, so we shouldn't use MSE loss here. (What should be used?)

In [None]:
# T1.2 IMPORTANT: Please do not edit/remove this comment.

class Task1_CNN(nn.Module):
    def __init__(self, device='cpu'):
        super().__init__()
        self.device = device

        ???

        self.to(device=self.device)

    def forward(self, x):
        
        ???

        return ???
    
    def Train(self, epochs, optimizer, loader_train, loader_test, verbose=True):
        self.loss_train_log = []
        self.loss_test_log = []
        self.best_loss = np.inf
        for epoch in range(epochs):
            self.train() # need to specify as certain layers (e.g. dropout) behave differently in train/eval
            
            # in every epoch, we will:
            # (1) loop over loader_train to train the model
            # (2) calculate the loss of training data and save it (for loss curve)
            # (3) calculate the loss of val/testing data and save it (for loss curve)
            # (4) print training progress
            # (5) early stopping

            # Step (1)        
            for x, y in loader_train:
                x = x.to(device=self.device, dtype=torch.float)
                y = y.to(device=self.device, dtype=torch.long) # label is torch.float if regression, torch.long if classification

                ???

            # Step (2) (need to complete the self.evaluate function to work)
            ???

            # Step (3) (need to complete the self.evaluate function to work)
            ???

            # Step (4)
            if verbose:
                ???

            # Step (5) save the best model (state_dict) as "task1_best_params.pt"
            ???
        
        print(f'Best model saved at epoch {best_epoch} with loss {self.best_loss:.4f}.')

    def evaluate(self, loader):
        # this function is to evaluate the model on a given dataset (loader) by computing the average loss

        self.eval() # need to specify as certain layers (e.g. dropout) behave differently in train/eval
        
        loss = 0
        with torch.no_grad():
            for x, y in loader:
                x = x.to(device=self.device, dtype=torch.float)
                y = y.to(device=self.device, dtype=torch.long) # label is torch.float if regression, torch.long if classification

                # forward pass and calculate loss
                ???
        
        ???
        return loss.cpu()
    
    def predict(self, loader):
        # this function is to provide the model's prediction on a given dataset (loader).
        # it returns the prediction, together with the corresponding input x and label y (for evaluation/visualization purposes)
        
        self.eval() # need to specify as certain layers (e.g. dropout) behave differently in train/eval

        x_all, y_all, logit = [], [], []
        with torch.no_grad():
            for x, y in loader:
                x_all += [x]
                y_all += [y]
                x = x.to(device=self.device, dtype=torch.float)

                # forward pass and store predictions
                ???
            
            ???
            return x_all, y_all, logit

<div class="alert alert-block alert-info">

### 1.3 Train and evaluate the network  <a class="anchor" id="1_3"></a>

Now let's do some training for the network we just created! In this task, train the model using the following hyperparameters:
- Learning rate = 5e-3 (same as 0.005)
- Number of epochs = 10
- Optimizer = Adam

In [None]:
# T1.3 IMPORTANT: Please do not edit/remove this comment.

lr = ???
epochs = ???
model = ???
optimizer = ???
model.Train(epochs, optimizer, loader_train, loader_test)

<div class="alert alert-block alert-info">

### 1.4 Visualise and Analyse the Experimental Results <a class="anchor" id="1_4"></a>

Now the training is done. Let's check how well our model has performed.

There are many ways to evaluate model performance, including:
- Inspecting the loss
- Evaluating accuracy
- Precision, recall and F1-score
- Confusion matrix

Let's try some of them here.

#### (a) Inspecting the loss

In the model class, the two variables `self.loss_train_log` and `self.loss_test_log` record the losses as training progresses. Plot both losses in the same figure to visualize the training progress, and make sure proper axis labels and a legend are in place.

In [None]:
# T1.4a IMPORTANT: Please do not edit/remove this comment.

# Plot the losses in the same figure
???

#### (b) Evaluate the classification accuracy

In a classification task, ultimately we want our model to make correct predictions. Using the values returned by the `predict` method, calculate the accuracy of your model, which is essentially the **fraction of correctly predicted classes** for all test data.
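As a reference for the steps involved, here is a minimal NumPy sketch on made-up logits (the values are hypothetical). It converts logits to probabilities with a numerically stable softmax, takes the highest-scoring class per row, and averages the matches with the labels. Because softmax preserves the ordering within each row, taking the argmax of the raw logits gives the same predictions.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.3, 2.2],
                   [1.5, 1.6, 0.0],
                   [0.0, -0.2, 0.9]])
labels = np.array([0, 2, 0, 2])

probs = softmax(logits)
preds = probs.argmax(axis=1)         # predicted class per sample
accuracy = (preds == labels).mean()  # fraction of correct predictions
```

Here three of the four toy samples are correct, so the accuracy is 0.75. With the PyTorch tensors returned by `predict`, the equivalent steps are `torch.softmax(logit, dim=1).argmax(dim=1)` followed by a comparison with `y_all`.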

In [None]:
# This code loads the best model and calls the predict function for you, to be used in (b), (c), and (d).
best_params = torch.load('task1_best_params.pt')
model_best = Task1_CNN(device=device)
model_best.load_state_dict(best_params)
x_all, y_all, logit = model_best.predict(loader_test)

In [None]:
# T1.4b IMPORTANT: Please do not edit/remove this comment.

# To obtain the predicted classes from logit, apply softmax and take the class with the highest probability
# (softmax preserves the ordering of the logits, so the argmax is the same either way)
accuracy = ???
print(f'Accuracy: {accuracy:.4f}')

#### (c) Top (correctly) Classified Images


In this task, we find the top 5 correctly classified images for each of the classes from the test set and visualize them.

Top 5 correctly classified: for example, take the 'airplane' class. First, find the test images that were correctly predicted as 'airplane'. Out of those images, find the 5 predictions with the highest softmax scores and visualize them. Do this for all classes (you might want to use a for loop for this).
    
In total, you have 5 classes and 5 top correctly classified images for each class, hence 25 images altogether. On top of each image, display its label, its prediction, as well as the top softmax score as a percentage.
    
An example of the expected output for the top classified images of the "car" class is shown below (this example uses a different dataset with a higher resolution):

<img src="./data/top-classified-demo.PNG" />
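The selection step (before any plotting) can be sketched in NumPy as follows. This assumes `probs` is an (N, C) array of softmax scores and `labels` an (N,) array of integer class indices; the function name `top_correct` and the toy values are hypothetical.

```python
import numpy as np

def top_correct(probs, labels, cls, k=5):
    """Indices and scores of the k most confident correct predictions for class `cls`."""
    preds = probs.argmax(axis=1)
    idx = np.where((labels == cls) & (preds == cls))[0]  # correctly classified samples of cls
    conf = probs[idx, cls]                               # softmax score of the predicted class
    order = np.argsort(-conf)[:k]                        # highest first, at most k
    return idx[order], conf[order]

probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.7, 0.3], [0.95, 0.05]])
labels = np.array([0, 0, 0, 1, 0])
top_idx, top_conf = top_correct(probs, labels, cls=0, k=2)
```

Looping over `range(num_class)` and plotting the selected images in a grid of subplots gives the 5 x 5 layout described above; a class with fewer than `k` correct predictions simply returns fewer indices.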


In [None]:
# T1.4c IMPORTANT: Please do not edit/remove this comment.

???

#### (d) Top Misclassified Images

Here, we will do the opposite: find the top 5 misclassified images for each of the classes from the test set and visualize them.

- Top 5 misclassified: for example, take the 'airplane' class. First, find the test images that were airplanes but were wrongly predicted as a different class. Out of those images, find the 5 predictions with the highest softmax scores for the wrong class and visualize them. Do this for all classes (you might want to use a for loop for this as well).
    
In total, you have 5 classes and 5 top misclassified images for each class, hence 25 images altogether. On top of each image, display its label, its prediction, as well as the top softmax score as a percentage.
    
An example of the expected output for the top misclassified images of the "dog" class is shown below:

<img src="./data/top-mis-classified-demo.PNG" />
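The only change from the correctly classified case is the filter and which score is ranked: keep samples whose true label is the class but whose prediction differs, and rank them by the score of the (wrong) predicted class. A minimal NumPy sketch under the same assumptions (`probs` of shape (N, C), integer `labels`; the function name and toy values are hypothetical):

```python
import numpy as np

def top_misclassified(probs, labels, cls, k=5):
    """Indices, wrong predictions and scores of the k most confident mistakes on class `cls`."""
    preds = probs.argmax(axis=1)
    idx = np.where((labels == cls) & (preds != cls))[0]  # true class is cls, prediction is not
    conf = probs[idx, preds[idx]]                        # score given to the wrong class
    order = np.argsort(-conf)[:k]                        # most confident mistakes first
    return idx[order], preds[idx][order], conf[order]

probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.05, 0.15, 0.8],
                  [0.3, 0.6, 0.1],
                  [0.4, 0.35, 0.25]])
labels = np.array([0, 0, 0, 2, 0])
mis_idx, mis_pred, mis_conf = top_misclassified(probs, labels, cls=0, k=5)
```

Returning the wrong predicted class alongside each index makes it easy to show both the label and the prediction in each subplot title.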

In [None]:
# T1.4d IMPORTANT: Please do not edit/remove this comment.

???

<div class="alert alert-block alert-info">

## Task 2 - Design and train a CNN on your own <a class="anchor" id="task2"></a>
        
In this task, you will design and train a CNN for image classification on another dataset. The task is to classify images of facial expressions into four emotion classes. There will be no guidance, and you should use the knowledge you have obtained so far to complete this task.

The dataset that you will be using contains grayscale images with a resolution of 48x48 pixels. It is located in `data/task2_data`, and you are also provided with `task2_data.csv`, similar to Task 1. However, the dataset is not conveniently partitioned into separate train and test folders, so you will have to split it yourself, as in most real-world settings where data rarely comes in a ready-to-use format.
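One common approach is a seeded, stratified split, so that every emotion keeps roughly the same proportion in both parts and the split is reproducible. The NumPy sketch below is one possible approach, not a required one; the function name `stratified_split`, the 20% test fraction, and the toy label counts are hypothetical.

```python
import numpy as np

def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split in roughly the same proportion."""
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible split
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.where(labels == cls)[0])  # shuffle this class's rows
        n_test = int(round(len(idx) * test_frac))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.sort(train_idx), np.sort(test_idx)

labels = np.array([0] * 10 + [1] * 10 + [2] * 5 + [3] * 5)  # toy, imbalanced classes
train_idx, test_idx = stratified_split(labels, test_frac=0.2)
```

The resulting indices can select rows of the csv DataFrame (e.g. `df.iloc[train_idx]`), giving two tables from which you build two Dataset objects: one with training transforms (augmentation) and one with test transforms. Wrapping a single Dataset with `torch.utils.data.Subset` would instead apply the same transform to both parts.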

**READ** the following instructions carefully to understand how this task will be graded:

- You are free to design and explore whatever methods/layers/techniques you want.
- In the end, you are required to produce the model that you think is best and name it `task2_best_params.pt`. Use exactly this file name; there is no need to rename it.
- You will submit this `.pt` file, together with this notebook (`.ipynb`), in the Lab 4 Submission. Do **NOT** zip your submission.
- For grading, I will run your submitted model using a private test dataset (not given to you), and evaluate your model performance.
- In order to run your model, I have to instantiate your model class, i.e. `Task2_CNN`. Therefore, make sure all your model code is in this class, and in the cell commented with "**# T2(model) ...**".
- As a check, there is a cell at the end of this task which you can run to make sure that your submission will be ok. In this cell, your best model will be used to run a few sample test images located in `data/task2_sample_test`. Make sure this cell can run without error before you submit the files.

Rubric for this task (worth 3%):

Your grade is based solely on your model’s performance on the private test data, according to the following criteria:
\begin{align*}
\text{Grade} = \begin{cases}
3\% & \text{Accuracy} \geq 0.75 \\
2.5\% & 0.7\leq\text{Accuracy} < 0.75 \\
2\% & 0.65\leq\text{Accuracy} < 0.7 \\
1.5\% & 0.6\leq\text{Accuracy} < 0.65 \\
1\% & \text{Accuracy} < 0.6 \\
0\% & \text{Code not able to run}
\end{cases}
\end{align*}

**[Bonus]**: The one with the highest accuracy will receive a bonus mark for the unit! 😄

Final notes:
- Your submitted model will be checked for similarity. If your model has the same weight values as another student's, which is statistically impossible, it will be treated as plagiarism.
- If your training code (with your final hyperparameters) is run on the given dataset, the trained model should perform similarly to the one that you submit. If the difference is large, you will be questioned and asked to resubmit a new one in the presence of a demonstrator.

Good luck!

In [None]:
class_names = ['happy', 'surprise', 'neutral', 'sad']
num_class = len(class_names)

In [None]:
# T2(model) IMPORTANT: Please do not edit/remove this comment.
# Make sure this cell only has the model class. Other code can go in new cells.

class Task2_CNN(nn.Module):
    ???

In [None]:
# For Task 2 to be graded, make sure this cell can run without error and without modification
# You have to define and run your Task2_Dataset first (and make sure transform_test matches your Task 2 images, e.g. their number of channels)

dataset_sample_test = Task2_Dataset('task2_sample_test.csv', class_names, transform=transform_test)
loader_sample_test = DataLoader(dataset_sample_test, batch_size=4, shuffle=False)

task2_best_params = torch.load('task2_best_params.pt')
model_best = Task2_CNN(device=device)
model_best.load_state_dict(task2_best_params)
_, y_all, logit = model_best.predict(loader_sample_test)
print(f'y_all is {y_all} with shape: {y_all.shape}') # should be tensor([0, 1, 2, 3]) with shape torch.Size([4])
print(f'logit is {logit} with shape: {logit.shape}') # should be tensor(<bunch of logits>) with shape torch.Size([4, 4])