## COMP5623M Assessment Coursework 1 - Image Classification [100 marks]

The maximum number of marks for each part are shown in the section headers. As indicated in the main heading above, the overall assessment carries a maximum of 100 marks.

This summative assessment is weighted 25% of the final grade for the module.

### Motivation 

Through this coursework, you will:

> 1. Practice building, evaluating, and finetuning a convolutional neural network on an image dataset from development to testing. 
> 2. Gain a deeper understanding of feature maps and filters by visualizing some from a pre-trained network. 


### Setup and resources 

You must work using this provided template notebook.

Having a GPU will speed up the training process, especially for Question 1.3. See the provided document on Minerva about setting up a working environment for various ways to access a GPU.

Please implement the coursework using **Python and PyTorch**, and refer to the notebooks and exercises provided.

This coursework will use a subset of images from Tiny ImageNet, which is a subset of the ImageNet dataset [https://image-net.org/]. Our subset of Tiny ImageNet contains 30 different categories, we will refer to it as TinyImageNet30. The training set has 450 resized images (64x64 pixels) for each category (13,500 images in total). You can download the training and test set from the Kaggle website:

>[Private class Kaggle competition and data](https://www.kaggle.com/t/9b703e0d71824a658e186d5f69960e27)

To access the dataset, you will need an account on the Kaggle website. Even if you have an existing Kaggle account, please carefully adhere to these instructions, or we may not be able to locate your entries:

> 1. Use your **university email** to register a new account.
> 2. Set your **Kaggle account NAME** to your university username, for example, ``sc15jb``.

The class Kaggle competition also includes a blind test set, which will be used in Question 1 for evaluating your custom model's performance on a test set. The competition website will compute the test set accuracy, as well as position your model on the class leaderboard.

### Submission

Please submit the following:

> 1. Your completed Jupyter notebook file, without removing anything in the template, in **.ipynb format.**
> 2. The **.html version** of your notebook; File > Download as > HTML (.html). Check that all cells have been run and all outputs (including all graphs you would like to be marked) displayed in the .html for marking.
> 3. Your selected image from section 2.4.2 "Failure analysis"

Final note:

> **Please display everything that you would like to be marked. Under each section, put the relevant code containing your solution. You may re-use functions you defined previously, but any new code must be in the relevant section.** Feel free to add as many code cells as you need under each section.

Your student username (for example, ```sc15jb```):


--> sc21kj

Your full name:

--> KALYAN JOTHIMURUGAN

## Imports

Feel free to add to this section as needed.

You may need to download `cv2` using [pip](https://pypi.org/project/opencv-python/) or [conda](https://anaconda.org/conda-forge/opencv).

In [1]:
import cv2
import math
import os
import csv
from csv import writer
import pandas as pd
import numpy as np
import torch
import seaborn as sns
from collections import OrderedDict
from torch import nn, optim
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets, transforms
from torchvision.datasets import ImageFolder
from torch.hub import load_state_dict_from_url
from natsort import natsorted
from tqdm import tqdm
import torch.nn.functional as F
from PIL import Image
import matplotlib.pyplot as plt
from torch.optim import lr_scheduler

In [2]:
# GPU support
if torch.cuda.is_available(): 
    device = torch.device('cuda')
else: 
    device = torch.device('cpu')

print(device)

cuda


## QUESTION 1 [55 marks]

One challenge of building a deep learning model is to choose an architecture that can learn the features in the dataset without being unnecessarily complex. The first part of the coursework involves building a CNN and training it on TinyImageNet30. 

### **Overview:**
*   **1.1.1** PyTorch ```Dataset``` and ```DataLoader``` classes
*   **1.1.2** PyTorch ```Model``` class for simple CNN model
*   **1.1.3** Overfitting on a single batch
*   **1.2.1** Training on complete dataset
*   **1.2.2** Fine-tuning model
*   **1.2.3** Generating confusion matrices
*   **1.3**   Testing on test set on Kaggle


## 1.1 Single-batch training [14 marks]

We will use a method of development called “single-batch training”, or "overfitting a single batch", in which we check that our model and the training code is working properly and can overfit a single training batch (i.e., we can drive the training loss to zero). Then we move on to training on the complete training set and adjust for any overfitting and fine-tune the model via regularisation.

### 1.1.1 Dataset class [3 marks]

Write a PyTorch ```Dataset``` class (an example [here](https://www.askpython.com/python-modules/pytorch-custom-datasets) for reference) which loads the TinyImage30 dataset and ```DataLoaders``` for training and validation sets.


In [4]:
# replace with your own root directory
root="./data"
train_set_path = "/train_set/train_set/"
test_set_path = "/test_set"

In [5]:
# Define your own class LoadFromFolder
# Below code is from the link provided in this template

class LoadFromFolder(Dataset):
    def __init__(self, main_dir, transform):
         
        # Set the loading directory
        self.main_dir =main_dir
        self.transform = transform
         
        # List all images in folder and count them
        all_imgs = os.listdir(main_dir)
        self.total_imgs = natsorted(all_imgs)

    def __len__(self):
    # Return the previously computed number of images
     return len(self.total_imgs)
    
    def __getitem__(self, idx):
        
        img_loc = os.path.join(self.main_dir, self.total_imgs[idx])
        
        # Use PIL for image loading
        image = Image.open(img_loc).convert("RGB")
        # Apply the transformations
        tensor_image = self.transform(image)
        
        #return self.total_imgs[idx] ,torch.unsqueeze(tensor_image, 0)
        #todo - above is modified
        
        return tensor_image

In [6]:

tensorTransform = transforms.Compose([transforms.ToTensor()])

single_batch_dataset = ImageFolder(root+train_set_path,transform = tensorTransform)

print(single_batch_dataset)
print(len(single_batch_dataset.classes))


train_size = int(0.8 * len(single_batch_dataset))
test_size = len(single_batch_dataset) - train_size


train_dataset, test_dataset = torch.utils.data.random_split(single_batch_dataset,
                                                                  [train_size, test_size])

train_loader = torch.utils.data.DataLoader(train_dataset,batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset,batch_size=16, shuffle=True)


 

Dataset ImageFolder
    Number of datapoints: 13500
    Root location: ./data/train_set/train_set/
    StandardTransform
Transform: Compose(
               ToTensor()
           )
30


### 1.1.2 Define a CNN model [3 marks]

Create a new model class using a combination of convolutional and fully connected layers, ReLU, and max-pool. 

In [13]:
#conv2d parameters --> Conv2d(input_size, output_size, kernel_size, padding)

cnnModel = nn.Sequential(
    nn.Conv2d(3,8,3,1),    
    nn.ReLU(),
    nn.MaxPool2d(2,2),
    nn.Conv2d(8,16,3,1),
    nn.ReLU(),
    nn.MaxPool2d(2,2),
    nn.Conv2d(16,32,3,1),
    nn.ReLU(),
    nn.MaxPool2d(2,2),
    nn.Flatten(),
    nn.Linear(32*8*8,128),   
    nn.ReLU(),
    nn.Linear(128,30)
)

cnnModel = cnnModel.to(device)

for param in cnnModel.parameters():
    print(param.shape)

torch.Size([8, 3, 3, 3])
torch.Size([8])
torch.Size([16, 8, 3, 3])
torch.Size([16])
torch.Size([32, 16, 3, 3])
torch.Size([32])
torch.Size([128, 2048])
torch.Size([128])
torch.Size([30, 128])
torch.Size([30])


### 1.1.3 Single-batch training [8 marks]

Write the foundational code which trains your network given **one single batch** of training data and computes the loss on the complete validation set for each epoch. Set ```batch_size = 64```. 

Display the graph of the training and validation loss over training epochs, showing as long as necessary to show you can drive the training loss to zero.

> Please leave all graphs and code you would like to be marked clearly displayed without needing to run code cells or wait for training.


In [14]:
#Computation of loss and accuracy for given dataset loader and model. 
#This will be used for computing loss and accuracy on the test set after each training epoch.

def stats(loader, cnnModel):
    correct = 0
    total = 0
    running_loss = 0
    n = 0    # count of minibatches
    with torch.no_grad():
        for data in loader:
            image_s, label_s = data
            images, labels = image_s.to(device), label_s.to(device)
            outputs = cnnModel(images)      
            
            # accumulate loss
            running_loss += loss_fn(outputs, labels)
            n += 1
            
            # accumulate data for accuracy
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)    # add in the number of labels in this minibatch
            correct += (predicted == labels).sum().item()  # add in the number of correct labels
            
    return running_loss/n, correct/total 


In [16]:
%%time
nepochs = 2
results_path = root+"/results/cnnclassifier1model.pt"

statsrec = np.zeros((4,nepochs))

loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(cnnModel.parameters(), lr=0.001, momentum=0.9)
#inputs, labels = next(iter(train_loader))# single batch

iteration = iter(train_loader)
input_s, label_s = next(iteration)
inputs, labels = input_s.to(device), label_s.to(device)

for epoch in tqdm(range(nepochs)):  # loop over the dataset multiple times
    correct = 0          # number of examples predicted correctly (for accuracy)
    total = 0            # number of examples
    running_loss = 0.0   # accumulated loss (for mean loss)
    # Zero the parameter gradients
    optimizer.zero_grad()

    # Forward, backward, and update parameters
    outputs = cnnModel(inputs)
    loss = loss_fn(outputs, labels)
    loss.backward()
    optimizer.step()
    
        # accumulate loss
    running_loss = loss.item()
        
    # accumulate data for accuracy
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)    # add in the number of labels in this batch
    correct += (predicted == labels).sum().item()  # add in the number of correct labels
    
    # collect together statistics for this epoch
    ltrn = running_loss
    atrn = correct/total 
    ltst, atst = stats(test_loader, cnnModel)
    statsrec[:,epoch] = (ltrn, atrn, ltst, atst)
    print(f"epoch: {epoch} training loss: {ltrn: .3f} training accuracy: {atrn: .1%}  test loss: {ltst: .3f} test accuracy: {atst: .1%}")

# save network parameters, losses and accuracy
torch.save({"state_dict": cnnModel.state_dict(), "stats": statsrec}, results_path)



  0%|          | 0/2 [00:03<?, ?it/s]


RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x1152 and 2048x128)

# Visualizing the Training and Test Loss and Accuracy


In [None]:
#Load the training dataset epoch history
data = torch.load(results_path)
statsrec = data["stats"]

fig, ax1 = plt.subplots()
plt.plot(statsrec[0], 'r', label = 'training loss', )
plt.plot(statsrec[2], 'g', label = 'test loss' )
plt.legend(loc='lower right')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.title('Training and test loss, and test accuracy')
ax2=ax1.twinx()
ax2.plot(statsrec[1], 'm', label = 'training accuracy')
ax2.plot(statsrec[3], 'b', label = 'test accuracy')
ax2.set_ylabel('accuracy')
plt.legend(loc='upper right')
fig.savefig("roc.svg")
plt.show()



## 1.2 Training on complete dataset [23 marks]

### 1.2.1 Train CNN and show loss graph [6 marks]

Train your model on the complete training dataset, and use the validation set to determine when to stop training.

Display the graph of training and validation loss over epochs to show how you determined the optimal number of training epochs.

> As in previous sections, please leave the graph clearly displayed.


In [None]:
%%time
nepochs = 300
n_epochs_stop = 10
epochs_no_improve = 0
early_stop=False

results_path_cnn200 = root+"/results/cnnclassifier200model.pt"

statsrec = np.zeros((4,nepochs))
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(cnnModel.parameters(), lr=0.001, momentum=0.9)
min_val_loss=np.Inf
for epoch in range(nepochs):  # loop over the dataset multiple times
    correct = 0          # number of examples predicted correctly (for accuracy)
    total = 0            # number of examples
    running_loss = 0.0   # accumulated loss (for mean loss)
    n = 0                # number of minibatches
    for data in train_loader:
        inputs, labels = data
        
         # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward, backward, and update parameters
        outputs = cnnModel(inputs)
        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()
    
        # accumulate loss
        running_loss += loss.item()
        n += 1
        # accumulate data for accuracy
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)    
        correct += (predicted == labels).sum().item()  
    
    ltrn = running_loss/n
    atrn = correct/total 
    ltst, atst = stats(test_loader, cnnModel)
    ### early stop if there is not change in validation loss over 10 instances of the epochs.
    if ltst<min_val_loss:
        epochs_no_improve = 0
        min_val_loss = ltst
    else:
        epochs_no_improve+=1
    if epoch > 5 and epochs_no_improve == n_epochs_stop:
            print('Early stopping!' )
            early_stop = True
            break
    statsrec[:,epoch] = (ltrn, atrn, ltst, atst)
    print(f"epoch: {epoch} training loss: {ltrn: .3f} training accuracy: {atrn: .1%}  test loss: {ltst: .3f} test accuracy: {atst: .1%}")

# save network parameters, losses and accuracy
torch.save({"state_dict": cnnModel.state_dict(), "stats": statsrec}, results_path_cnn200)



In [None]:

data2 = torch.load(results_path_cnn200)
statsrec2 = data2["stats"]

fig, ax1 = plt.subplots()
plt.plot(statsrec2[0], 'r', label = 'training loss', )
plt.plot(statsrec2[2], 'g', label = 'test loss' )
plt.legend(loc='lower right')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.title('Training and test loss, and test accuracy')
ax2=ax1.twinx()
ax2.plot(statsrec2[1], 'm', label = 'training accuracy')
ax2.plot(statsrec2[3], 'b', label = 'test accuracy')
ax2.set_ylabel('accuracy')
plt.legend(loc='upper right')
fig.savefig("roc1.svg")
plt.show()


### 1.2.2 Finetuning [6 marks]

Now finetune your architecture by implementing at least 2 methods of reducing overfitting and increasing the model's ability to generalise. You are encouraged to further adjust the model after you have done the minimum requirement, to increase your model performance. Please do not use any pre-trained weights from a model trained on ImageNet.


**Method 1:** Data augmentation of your choice

**Method 2:** Adding dropout and/or batch normalisation to the model

If you adjust the Model class, redefine it below and instantiate it as ```model_122a```, ```model_122b```, and so on.



In [None]:

trnsfrm_ft =  transforms.Compose([
    transforms.ToTensor(),
    transforms.ColorJitter(hue=0.2, saturation=0.2, brightness=0.2),
    transforms.RandomAffine(degrees=10, translate=(0.1,0.1), scale=(0.9,1.1)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomGrayscale(p=0.3)
])

In [None]:
dataset_ft= ImageFolder(root+train_set_path,transform=trnsfrm_ft)


print(dataset_ft)
print(len(dataset_ft.classes))
train_size = int(0.8 * len(dataset_ft))
validation_size = len(dataset_ft) - train_size

train_dataset, validation_dataset = torch.utils.data.random_split(dataset_ft, [train_size, validation_size])## spliting train_set for both train_dataset and validation_set.


train_loader = torch.utils.data.DataLoader(train_dataset,batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(validation_dataset,batch_size=15, shuffle=True)


In [None]:
# Adding drop out
model_122 = nn.Sequential(
    nn.Conv2d(in_channels=3,out_channels=8, kernel_size=3,padding=1),    # no padding, stride=1, dilation=1 by default
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Dropout(p=0.8),
    nn.Conv2d(in_channels=8,out_channels=16,  kernel_size=3,padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(in_channels=16,out_channels=32,  kernel_size=3,padding=1),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(32*8*8,128),     # with 32x32 input, the feature map size reduces to 8x8 with 16 channels.
    nn.ReLU(),
    nn.Linear(128,30)
)

for param in model_122.parameters():
    print(param.shape)
    
    

In [None]:
def stats1(loader, model_122):
    correct = 0
    total = 0
    running_loss = 0
    n = 0    # counter for number of minibatches
    with torch.no_grad():
        for data in loader:
            images, labels = data
            outputs = model_122(images)      
            
            # accumulate loss
            running_loss += loss_fn(outputs, labels)
            n += 1
            
            # accumulate data for accuracy
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)    # add in the number of labels in this minibatch
            correct += (predicted == labels).sum().item()  # add in the number of correct labels
            
    return running_loss/n, correct/total 

In [None]:
%%time
nepochs = 200

results_path_finetune= root+'/results/cnnfinetunedmodel.pt'
n_epochs_stop = 15
epochs_no_improve = 0
early_stop=False

statsrec = np.zeros((4,nepochs))
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model_122.parameters(), lr=0.001, momentum=0.9)
scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, 'min', patience = 5)## lr will change with patience 5
min_val_loss=np.Inf
for epoch in range(nepochs):  # loop over the dataset multiple times
    correct = 0          # number of examples predicted correctly (for accuracy)
    total = 0            # number of examples
    running_loss = 0.0   # accumulated loss (for mean loss)
    n = 0
    for data in train_loader:
        inputs, labels = data
        
         # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward, backward, and update parameters
        outputs = model_122(inputs)
        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()
    
        # accumulate loss
        running_loss += loss.item()
        n += 1
        # accumulate data for accuracy
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)    # add in the number of labels in this minibatch
        correct += (predicted == labels).sum().item()  # add in the number of correct labels
    # collect together statistics for this epoch
    ltrn = running_loss/n
    atrn = correct/total 
    ltst, atst = stats1(test_loader, model_122)
    scheduler.step(ltst/len(test_loader))
    ### early stop if there is not change in validation loss for 15 instance epochs.
    if ltst<min_val_loss:
        epochs_no_improve = 0
        min_val_loss = ltst
    else:
        epochs_no_improve+=1
    if epoch > 5 and epochs_no_improve == n_epochs_stop:
            print('Early stopping!' )
            early_stop = True
            break
    statsrec[:,epoch] = (ltrn, atrn, ltst, atst)
    print(f"epoch: {epoch} training loss: {ltrn: .3f} training accuracy: {atrn: .1%}  test loss: {ltst: .3f} test accuracy: {atst: .1%}")
# save network parameters, losses and accuracy
torch.save({"state_dict": model_122.state_dict(), "stats": statsrec}, results_path_finetune)




In [None]:
data3 = torch.load(results_path_finetune)
statsrec3 = data3["stats"]

fig, ax1 = plt.subplots()
plt.plot(statsrec3[0], 'r', label = 'training loss', )
plt.plot(statsrec3[2], 'g', label = 'test loss' )
plt.legend(loc='lower right')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.title('Training and test loss, and test accuracy')
ax2=ax1.twinx()
ax2.plot(statsrec3[1], 'm', label = 'training accuracy')
ax2.plot(statsrec3[3], 'b', label = 'test accuracy')
ax2.set_ylabel('accuracy')
plt.legend(loc='upper right')
fig.savefig("roc1.svg")
plt.show()


### 1.2.3 Training comparison [4 marks]

Display, side-by-side or on one single graph, the training and validation loss graphs for the single-batch training (section 1.1.3), on the full training set (1.2.1) and your final fine-tuned model (1.2.2). 

In [None]:
plt.figure(figsize=(12,7))

plt.plot(statsrec[0], 'r', label = 'single-batch training', marker='o', alpha=0.4)
plt.plot(statsrec[2], 'g', label = 'batch validation', marker='o', alpha=0.4)


plt.plot(statsrec2[0], 'r', label = 'full training', marker='*', alpha=0.4)
plt.plot(statsrec2[2], 'g', label = 'full validation' , marker='*', alpha=0.4)


plt.plot(statsrec3[0], 'r', label = 'fine-tuned training', marker='d', alpha=0.4)
plt.plot(statsrec3[2], 'g', label = 'fine-tuned validation', marker='d', alpha=0.4)


plt.legend(loc='center right')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.title('Training Loss Comparison')

Explain what can be seen in the graphs.

--> Double click here to respond


### 1.2.4 Confusion matrices [7 marks]

Use your architecture with best accuracy to generate two confusion matrices, one for the training set and one for the validation set. Remember to use the whole validation and training sets, and to include all your relevant code. Display the confusion matrices in a meaningful way which clearly indicates what percentage of the data is represented in each position.



In [None]:
categories_names = (pd.read_csv(root+'/mapping.txt',header = None, sep ="\t")).drop(columns=[2])
categories_names = categories_names[1]

In [None]:
from sklearn.metrics import confusion_matrix
import seaborn as sns

def generate_confusion_matrix(model, loader, categories_name):
    all_preds = torch.tensor([])
    all_labels = torch.tensor([])
    i=0
    with torch.no_grad():
        for batch in loader:
            images, labels = batch
            image_s, label_s = images.to(device), labels.to(device)

            preds = model(image_s)

            all_preds = torch.cat((all_preds.to(device), preds),dim=0)
            all_labels = torch.cat((all_labels.to(device), label_s),dim=0)
            fig, ax = plt.subplots(figsize=(25, 25))
            cm = confusion_matrix(all_labels.tolist(), all_preds.argmax(dim=1).tolist())

    conf_matrix=pd.DataFrame(data=cm, columns=classes, index=classes)
    sns.heatmap(conf_matrix, annot=True, fmt='d')

    plt.show()

In [None]:
generate_confusion_matrix(model_122, train_loader, classes)

In [None]:
# Loading the Classes
classes  = list()
labels = open(root+"/mapping.txt")
for map in labels:
  key, value =map.split()
  classes.append(value)
classes = tuple(classes) 

What conclusions can be drawn from the confusion matrices?

---> Double click to respond here



## 1.3 Testing on test data [18 marks]

### 1.3.1 Dataset and generating predictions [6 marks]

Create a PyTorch ```Dataset``` for the unlabeled test data in the test_set folder of the Kaggle competition and generate predictions using your final model. 


In [None]:
results = []

with torch.no_grad():
    for data in test_loader:
        input_s, label_s = data
        inputs, labels = input_s.to(device), label_s.to(device)

        outputs = model_122(inputs)
        results.extend(outputs.argmax(dim=1).type(torch.int32).cpu().numpy())


### 1.3.2 CSV file and test set accuracy [12 marks]

Save all test predictions to a CSV file and submit it to the private class Kaggle competition. **Please save your test CSV file submissions using your student username (the one with letters, ie., ``sc15jb``, not the ID with only numbers)**, for example, `sc15jb.csv`. That will help us to identify your submissions.

The CSV file must contain only two columns: ‘Id’ and ‘Category’ (predicted class ID) as shown below:

```txt
Id,Category
28d0f5e9_373c.JPEG,2
bbe4895f_40bf.JPEG,18
```

The ‘Id’ column should include the name of the image. It is important to keep the same name as the one on the test set. Do not include any path, just the name of file (with extension). Your csv file must contain 1501 rows, one for each image on test set and 1 row for the headers.

> You may submit multiple times. We will use your personal top entry for allocating marks for this [10 marks]. The class leaderboard will not affect marking (brownie points!).



In [None]:
with open("/content/drive/MyDrive/AI/sc21vp.csv", 'a', newline='') as f_object:  
    # Pass the CSV  file object to the writer() function
    writer_object = writer(f_object)
    # Result - a writer object
    # Pass the data in the list as an argument into the writerow() function
    for list_data in predicted_list:
     writer_object.writerow(list_data)  
    # Close the file object
    f_object.close()

In [None]:
folder_names = validation_dataset.imgs
names = []
for path in folder_names:
    names.append(path[0].split('/')[-1])
    names[0:10]



## QUESTION 2 [40 marks]



In this question, you will visualize the filters and feature maps of a fully-trained CNN (AlexNet) on the full ImageNet 2012 dataset.

> Please do not alter the name of the function or the number and type of its arguments and return values, otherwise the automatic grading function will not work correctly. You are welcome to import other modules (though the simplest solution only requires the ones below).


### **Overview:**
*   **2.1.1** Extract filters from model: ``fetch_filters(layer_idx, model)``
*   **2.2.1** Load test image
*   **2.2.2** Extract feature maps for given test image: ``fetch_feature_maps(image, model)``
*   **2.2.3** Display feature maps
*   **2.3.1** Generate Grad-CAM heatmaps: ``generate_heatmap(output, class_id, model, image)``
*   **2.3.2** Display heatmaps: add code to cell
*   **2.3.3** Generate heatmaps for failure analysis


### Loading a pre-trained model

Run the cell below to load an AlexNet model with pre-trained weights.

In [None]:
model = torch.hub.load('pytorch/vision:v0.6.0', 'alexnet', pretrained=True)
model.eval()

In [None]:
model.features

In [None]:
model.features[0]

In [None]:
model.features[0].weight.shape


## 2.1 Extract and visualize the filters [6 marks]

In this section you will extract and visualize the filters from the pre-trained AlexNet.

### 2.1.1 Extract filters [4 marks]

Complete the following function ```fetch_filters``` to return all the filters from the convolutional layers at the given index in ```model.features``` (see printed model above for reference). 





> We will not test the behaviour of your function using invalid indices.



In [None]:
def fetch_filters(layer_idx, model):
    """ 
        Args:
            layer_idx (int): the index of model.features specifying which conv layer
            model (AlexNet): PyTorch AlexNet object
        Return:
            filters (Tensor):      
    """
    return model.features[layer_idx].weight.data

In [None]:
# all the indices of the conv layers
conv_layer_idx = [0, 3, 6, 8, 10]

filters = []

for layer_idx in conv_layer_idx:
    filters.append(fetch_filters(layer_idx, model))

For your testing purposes, the following code blocks test the dimensions of the function output.

In [None]:
filters[0].shape

In [None]:
assert list(filters[0].shape) == [64, 3, 11, 11]



### 2.1.2 Display filters [2 marks]

The following code will visualize some of the filters from each layer. Play around with viewing filters at different depths into the network. Note that ```filters[0]``` could be viewed in colour if you prefer, whereas the subsequent layers must be viewed one channel at a time in grayscale. 



In [None]:
# limit how many filters to show
to_show = 16

# compute the dimensions of the plot
plt_dim = int(math.sqrt(to_show))

# plot the first channel of each filter in a grid
for i, filt in enumerate(filters[0].numpy()[:to_show]):
    plt.subplot(plt_dim, plt_dim, i+1)
    plt.imshow(filt[0], cmap="Greens")
    plt.axis('off')
plt.show()



## 2.2 Extract and visualize feature maps [10 marks]

In this section, you will pass a test image through the AlexNet and extract and visualize the resulting convolutional layer feature maps.

Complete the following code cell to load the test image ```man_bike.JPEG```.



### 2.2.1 Load test image [1 mark]


In [None]:
#load image
im = Image.open(root+'/man_bike.JPEG')

Run the code cell below to apply the image transformation expected by the model.

In [None]:
# ImageNet normalisation values, to apply to the image transform
norm_mean = [0.485, 0.456, 0.406]
norm_std = [0.229, 0.224, 0.225]

data_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(256),
        transforms.ToTensor(),
        transforms.Normalize(norm_mean, norm_std),
    ])

im = data_transform(im)


### 2.2.2 Extract feature maps [5 marks]

Complete the function below to pass the test image through a single forward pass of the network. We are interested in the outputs of the max pool layers (outputs of conv layers at model.features indices 0, 3, and 10) for best visualization. Note that the input should pass through *every layer* of the model.

In [None]:
def fetch_feature_maps(image, model):
    """
    Args:
        image (Tensor): a single input image with transform applied
        model (AlexNet): PyTorch AlexNet object
        
    Return:
        feature_maps (Tensor): all the feature maps from conv layers 
                    at indices 0, 3, and 10 (outputs of the MaxPool layers)
    """

    model_weights=[]
    conv_layers=[]
    all_layers=[]
    maxpool=[]
    counter = 0
    model_children = list(model.children())

    model_children = list(model.children())


   
    for j in range(len(model_children[0])):
        child= model_children[0][j]
        all_layers.append(child)
        if type(child) == nn.MaxPool2d:
            counter += 1
            conv_layers.append(child)



    results = [all_layers[0](image)] # -> (1, 1, 64, 63, 63) first dim to save each result
  
    for i in range(1, len(all_layers)):
        # use last result to calculate next function
        x=all_layers[i](results[-1])
        # each function's result will be save in the list of result
        results.append(x)
        # after MaxPool, Height(or Width) will be half
        if len(results[i][0][0]) == len(results[i-1][0][0])// 2:
            maxpool.append(x)
       
    # transfer to Tensor
    maxpool=[torch.tensor(x,requires_grad=False) for x in maxpool ]
    
    return maxpool, results

In [None]:
feature_maps = fetch_feature_maps(im.unsqueeze(0), model)

For your testing purposes, the following code block tests the dimensions of part of the function output. Note that the first dimension is the batch size.

In [None]:
assert len(feature_maps) == 3
assert list(feature_maps[0].shape) == [1, 64, 31, 31]



### 2.2.3 Display feature maps [4 marks]

Using the code for displaying filters as reference, write code in the block below to display the outputs of the first **16 feature maps from each of the 3 max-pool layers**.

In [None]:
feature_map_size = 16

to_show = feature_map_size

# compute the dimensions of the plot
plt_dim = int(math.sqrt(to_show))

# plot the first channel of each filter in a grid
for i, filt in enumerate(feature_maps[0][0].numpy()[:to_show]):
    plt.subplot(plt_dim, plt_dim, i+1)
    plt.imshow(filt, cmap="Blues")
    plt.axis('off')
plt.show()





## 2.3 Understanding of filters and feature maps [7 marks]

Respond in detail to the questions below. (Note that all text boxes can be formatted using Markdown if desired).

### 2.3.1 [3 marks]
Describe what the three filters at indices 0, 4, and 6 from the first convolutional layer are detecting (reference the corresponding feature maps to support your discussion).


--> Filter 0 -> It is classifying and grouping man and block as one element. Able to see the silhoute of the man as part of fore ground. It also detects foreground elements.

--> Filter 4 -> It detects much clearer picture of man's head and his torso.

--> Filter 6 -> Able to identify the wheels of the cycle.

### 2.3.2 [2 marks]
Discuss how the filters change with depth into the network.

--> At first, the filters detect multiple colors and edges and is able to identify different shapes.

--> At final layers, we can see more clearer image with more complex patterns that is used to identify the element in the image.


### 2.3.3 [2 marks]
Discuss how the feature maps change with depth into the network.

--> As depth increases, the image is broken down into simpler blocks.
--> At first, we will be able to see and differentiate between foreground and background scenes, but as the depth increases, it will be reduced and will be able to only see shapes that appears in the image.


## 2.4 Gradient-weighted Class Activation Mapping (Grad-CAM) [17 marks]

In this section, we will explore using Gradient-weighted Class Activation Mapping (Grad-CAM) to generate coarse localization maps highlighting the important regions in the test images guiding the model's prediction. We will continue using the pre-trained AlexNet.

#### Preparation
>It is recommended to first read the relevant paper [Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization](https://arxiv.org/abs/1610.02391), and refer to relevant course material.

#### The AlexNet class

>To implement Grad-CAM, we need to edit the AlexNet ```module``` class itself, so instead of loading the AlexNet model from ```torch.hub``` as we did above, we will use the official PyTorch AlexNet class code ([taken from here](https://pytorch.org/vision/stable/_modules/torchvision/models/alexnet.html)). In addition to the class definition, there is also a function below called ```alexnet()``` which allows you to specify whether you want the pretrained version or not, and if so, loads the weights. 

#### The hook

>[Hooks](https://pytorch.org/tutorials/beginner/former_torchies/nnft_tutorial.html#forward-and-backward-function-hooks) in PyTorch are functions which can be registered, or attached, to a ```Module``` or ```Tensor```. Hooks can be *forward* hooks or *backward* hooks; forward hooks are called with ```forward()``` and backward hooks with ```backward()```. In the model below, we register a forward hook that saves the **gradients of the activations** to the Tensor output of ```model.features```. The gradients are saved to a class variable so we can easily access them.

Carefully read the code block below. You do not need to add anything to the model.

In [None]:
# defining where to load the pre-trained weights from
model_urls = {
    'alexnet': 'https://download.pytorch.org/models/alexnet-owt-7be5be79.pth',
}

# the class definition
class AlexNet(nn.Module):

    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        
        # a placeholder for storing the gradients
        self.gradients = None
        
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )
        
    # the hook for the gradients of the activations
    def activations_hook(self, grad):
        # stores the gradients of the hook's tensor to our placeholder variable
        self.gradients = grad

    # a method for extracting the activations of the last conv layer only (when we're 
    # not interested in a full forward pass)
    def get_activations(self, x):
        return self.features(x)
    
    def forward(self, x):
        x = self.features(x)
        
        # we register the hook here to save the gradients of the last convolutional
        # layer outputs
        hook = x.register_hook(self.activations_hook)
        
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x


def alexnet(pretrained=False, progress=True, **kwargs) -> AlexNet:
    """AlexNet model architecture from the
    `"One weird trick..." <https://arxiv.org/abs/1404.5997>`_ paper.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    model = AlexNet(**kwargs)
    if pretrained:
        state_dict = load_state_dict_from_url(model_urls['alexnet'],
                                              progress=progress)
        model.load_state_dict(state_dict)
    return model

In [None]:
model = alexnet(pretrained=True)

In [None]:
# pass our test image through our new model with the hook
output = model(im.unsqueeze(0))

# save the predicted class
_, pred_cls = output.max(dim=1, keepdim=True)

print(output.shape)
print(pred_cls)

Examine and understand the values stored in ```output``` and ```pred_cls```. What does AlexNet classify the test image as?

In [None]:
output.shape # torch.Size([1, 1000])
max(output.detach().numpy().tolist()[0])
output.detach().numpy().tolist()[0].index(13.724449157714844)

There are 1000 classes that any image can be classified into.
pred_cls has the value of 671, which maps to mountain bike, all-terrai bike, off-roader.

### 2.4.1 Generate Grad-CAM heatmaps [8 marks]

With the hooks in place, now implement the code to generate Grad-CAM heatmaps, by following the guiding comments in the code block below.

In [None]:
def display_heatmap(heatmap,image):
    (w, h) = (image.shape[0], image.shape[1])
    heatmap = cv2.resize(heatmap.numpy(), (w, h))

    numer = heatmap - np.min(heatmap)
    denom = (heatmap.max() - heatmap.min()) + 1e-8
    heatmap_normalized = numer / denom
    heatmap_normalized = (heatmap_normalized * 255).astype("uint8")
    
    heatmap_normalized = cv2.applyColorMap(heatmap_normalized, cv2.COLORMAP_JET)
    
    heatmap_normalized=heatmap_normalized * 0.4
    
    #heatmap_normalized=heatmap_normalized.reshape((256,256,1))*np.ones([256,256,3])
    weighted_image= cv2.addWeighted(heatmap_normalized, 0.7, image, 0.3, 0)
    
    numer = weighted_image - np.min(weighted_image)
    denom = (weighted_image.max() - weighted_image.min()) + 1e-8
    final_image = numer / denom
    final_image = (final_image * 255).astype("uint8")

    return final_image

In [None]:
heatmap = generate_heatmap(output, pred_cls, model, im.unsqueeze(0))

Check the dimensions of ```heatmap```. Do they make sense?

In [None]:
print(heatmap.shape)

### 2.4.2 Display heatmaps [4 marks]

Display ```heatmap``` as a coloured heatmap super-imposed onto the original image. To get results as shown in the paper, we recommend the following steps:

1. Resize the heatmap to match the size of the image.
2. Rescale the image to a 0-255 integer range.
3. Apply a colormap to the heatmap using ```cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)```.
4. Multiply all values of heatmap by 0.4 to reduce colour saturation.
5. Superimpose the heatmap onto the original image (Note: please perform cv2's addition - addition of two cv2 images, not numpy addition. See [here](https://opencv24-python-tutorials.readthedocs.io/en/latest/py_tutorials/py_core/py_image_arithmetics/py_image_arithmetics.html#:~:text=addWeighted()%20etc.-,Image%20Addition,OpenCV%20addition%20and%20Numpy%20addition.) for explanation.)
6. Normalize the image between 0-255 again.
7. Display the resulting image.

In [None]:
# TO COMPLETE

def display_heatmap(image, heatmap):
    heatmap = cv2.resize(heatmap.numpy(), (image.shape[1], image.shape[0]))
    
    heatmap = cv2.normalize(heatmap,  None, 0, 255, cv2.NORM_MINMAX, dtype=cv2.CV_8U)
    
    heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
    
    heatmap = heatmap * 0.4
    heatmap = np.uint8(heatmap)
    
    superimposed = cv2.add(image, heatmap)
    
    superimposed = cv2.normalize(superimposed,  None, 0, 255, cv2.NORM_MINMAX, dtype=cv2.CV_8U)
    plt.imshow(superimposed)

In [None]:
image = cv2.imread(root+"/man_bike.JPEG")
image = cv2.resize(image, (256, 256))
display_heatmap(image, heatmap)

Show the heatmap for class ```'seashore, coast, seacoast, sea-coast'``` (```class_id = 978```), super-imposed onto the original image.

In [None]:
sea_heatmap = generate_heatmap(output, torch.tensor([[978]]), model, im.unsqueeze(0))
retain_graph=True
display_heatmap(image, sea_heatmap)

### 2.4.3 Failure analysis using Grad-CAM [5 marks]

Find an image (online, or from ImageNet or another dataset) which AlexNet classifies *incorrectly*. Display the image below, and show the model's predicted class. Then, generate the Grad-CAM heatmap and display it super-imposed onto the image.

In [None]:
mamooth=Image.open(root+'mammoth.jpg')

# ImageNet normalisation values, to apply to the image transform
norm_mean = [0.485, 0.456, 0.406]
norm_std = [0.229, 0.224, 0.225]

data_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(256),
        transforms.ToTensor(),
        transforms.Normalize(norm_mean, norm_std),
    ])

mamooth = data_transform(mamooth)

In [None]:
mode001 = alexnet(pretrained=True)

# pass our test image through our new model with the hook
output = mode001(mamooth.unsqueeze(0))

# save the predicted class
_, pred_cls = output.max(dim=1, keepdim=True)

pred_cls 

In [None]:
percentage = torch.nn.functional.softmax(output, dim=1)[0] * 100  
print(classes.iloc[pred_cls.numpy()[0][0]], percentage[pred_cls[0][0]].item())

In [None]:
heatmap = generate_heatmap(output, pred_cls, mode001, mammoth.unsqueeze(0))

In [None]:
mamooth=cv2.imread(root+'mammoth.jpg')

mamooth=cv2.resize(mamooth,(256,256))
mamooth = np.asarray(mamooth, np.float64)
show_result = display_heatmap(heatmap, mamooth)
plt.matshow(show_result[:, :, ::-1])

Briefly describe what explanation the Grad-CAM heatmap provides about why the model has failed to correctly classify your test image.

--> Double click to respond here

### 3 Overall quality [2 marks]

Marks awarded for overall degree of code readibility and omission of unnecessary messy outupts (for example, please avoid printed losses for every batch of a long training process, large numpy arrays, etc.) throughout the work.

**Please refer to the submission section at the top of this notebook to prepare your submission.**
