<a href="https://colab.research.google.com/github/Cherishings/Deep_Learning/blob/main/Lab_3_Part_3_PyTorch_FashionMNIST.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 3, Part 3: Image Recognition in PyTorch

## 3.1 Introduction

**Note: before starting save a copy of this file in your own Google Drive - go to File and click Save a copy in Drive**

The objective of this part of the lab is to understand how to design and implement a deep convolutional neural network (CNN) in Python with PyTorch for image recognition, i.e. the classification of an image.

In this lab the dataset is the well-known Fashion MNIST dataset: 28x28 greyscale images of clothing items (60,000 for training and 10,000 for testing), each belonging to one of 10 classes.

This part of the lab essentially replicates the image recognition problem in Keras from part 2 of the lab, demonstrating how to design and evaluate a CNN in PyTorch.

## 3.2 Methods

#### **Data**

In this lab the problem is one of multiclass classification. The input data $\mathbf{x}$ is an image (for notational convenience we write it as a vector, i.e. we assume the image has been flattened), and the output $y$ is a class label taking one of 10 values, each representing a different type of clothing.

#### **Model**

The model predicts the probability of each class using a deep convolutional network,
$$ \hat{\mathbf{y}} = f\left( \phi(\mathbf{x}) ; \boldsymbol{\theta} \right) $$
where $\hat{\mathbf{y}}$ is the vector of predicted class probabilities produced by a softmax output layer $f$. For notational convenience we collect all the parameters of the deep network in a single vector $\boldsymbol{\theta}$. The function $\phi(\mathbf{x})$ performs feature extraction on the input and is built from layers of simple functions, here mainly convolutions.
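As a small illustration of the softmax output layer $f$, the sketch below converts hypothetical output scores (logits) $a_k$ into probabilities $\hat{y}_k = \exp(a_k) / \sum_{k'} \exp(a_{k'})$, both by hand and with torch.nn.functional.softmax. The logit values are made up for illustration; the network defined later outputs such logits and leaves the softmax to the loss function.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for 2 images and 3 classes (illustrative values only)
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 0.5, 3.0]])

# Softmax by hand: exponentiate, then normalise each row to sum to 1
manual = torch.exp(logits) / torch.exp(logits).sum(dim=1, keepdim=True)
probs = F.softmax(logits, dim=1)          # library version, y_hat

print(torch.allclose(manual, probs))      # True
print(probs.sum(dim=1))                   # each row sums to 1 (up to rounding)
print(probs.argmax(dim=1))                # tensor([0, 2]), same as logits.argmax(dim=1)
```

Because the softmax preserves the ordering of its inputs, the most probable class is simply the one with the largest logit, which is why the code later in the lab can take the argmax of the raw network outputs.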

The model uses chains of convolutional layers, among other types of layer, to extract features $\phi(\mathbf{x})$ from the raw input image. A convolutional layer $l$ computes
\begin{equation}
    z_{i,j,k}^{(l)} = \sum_{c} \sum_{m} \sum_{n} w_{m,n,c,k}^{(l)}\, x_{i + m,\, j + n,\, c}^{(l - 1)} + b_{k}^{(l)} \quad \text{for} \quad k = 1,\ldots,N_{F}
\end{equation}
where $N_{F}$ is the number of filters, $c$ indexes the input channels (feature maps), $m$ and $n$ index positions within the filter, and the bias $b_{k}^{(l)}$ of filter $k$ is added once per output position. The output of the convolutional layer is passed through a nonlinear activation function
\begin{equation}
    x_{i,j,k}^{(l)} = h\left( z_{i,j,k}^{(l)} \right)
\end{equation}
where the activation function $h(\cdot)$ could be, e.g., a rectified linear unit (ReLU), $h(z) = \max(0, z)$. In the network defined below, batch normalisation is applied to $z$ before the ReLU.
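To make the convolution equation concrete, the sketch below evaluates the triple sum over $c$, $m$ and $n$ directly with Python loops for a small random input (no padding, stride 1) and compares the result with torch.nn.functional.conv2d. The sizes are arbitrary choices for illustration. Note that PyTorch stores the weights with layout (filters, channels, rows, columns), so w[k, c, m, n] plays the role of $w_{m,n,c,k}$, and that, like the equation, conv2d computes a cross-correlation (the filter is not flipped).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
C_in, N_F, H, W, K = 2, 3, 5, 5, 3    # input channels, filters, height, width, kernel size

x = torch.randn(1, C_in, H, W)        # one input image, layout (batch, channel, row, col)
w = torch.randn(N_F, C_in, K, K)      # weight layout (k, c, m, n)
b = torch.randn(N_F)                  # one bias per filter

# Direct implementation of z[k, i, j] = sum_c sum_m sum_n w * x + b[k]
z = torch.zeros(N_F, H - K + 1, W - K + 1)
for k in range(N_F):
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            z[k, i, j] = (w[k] * x[0, :, i:i + K, j:j + K]).sum() + b[k]

z_torch = F.conv2d(x, w, b)           # shape (1, N_F, H-K+1, W-K+1)
print(torch.allclose(z, z_torch[0], atol=1e-5))  # True

x_next = torch.relu(z)                # activation h(.) = ReLU gives the layer output
```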

#### **Loss function**

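The loss function, $J(\boldsymbol{\theta})$, that we minimise to estimate the model parameters, $\boldsymbol{\theta}$, is the categorical cross-entropy loss for multiclass classification,
$$ J(\boldsymbol{\theta}) = - \sum_{j = 1}^{m} \sum_{k = 1}^{K} y_{j,k}\log{\hat{y}}_{j,k} $$
where $K=10$ is the number of classes, $m$ is the number of data samples, $y_{j,k}$ is 1 if sample $j$ belongs to class $k$ and 0 otherwise (one-hot encoding), and $\hat{y}_{j,k}$ is the predicted probability of class $k$ for sample $j$. In PyTorch, nn.CrossEntropyLoss takes the raw network outputs (logits) and integer class labels, applies the softmax internally, and by default averages the loss over the samples in a mini-batch rather than summing it.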
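The sketch below checks the cross-entropy formula against nn.CrossEntropyLoss using random logits for a hypothetical mini-batch of $m=4$ samples and $K=10$ classes. With reduction='sum' PyTorch returns $J(\boldsymbol{\theta})$ exactly as written above; the default reduction='mean' returns $J(\boldsymbol{\theta})/m$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
m, K = 4, 10
logits = torch.randn(m, K)                    # hypothetical unnormalised network outputs
labels = torch.tensor([0, 3, 9, 3])           # integer class labels

y_hat = F.softmax(logits, dim=1)              # predicted probabilities y_hat[j, k]
y = F.one_hot(labels, num_classes=K).float()  # one-hot targets y[j, k]
J_manual = -(y * torch.log(y_hat)).sum()      # J(theta) as in the formula

J_sum = nn.CrossEntropyLoss(reduction='sum')(logits, labels)
J_mean = nn.CrossEntropyLoss()(logits, labels)   # default: average over the batch

print(torch.allclose(J_manual, J_sum, atol=1e-5))     # True
print(torch.allclose(J_mean * m, J_sum, atol=1e-5))   # True
```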


#### **Parameter estimation algorithm**

The parameter estimation algorithm here is based on the Adam algorithm, which combines a momentum-like term, $v_{j}$, with an adaptive learning rate based on a running average of squared gradients, $r_{j}$, and uses bias-corrected versions of these terms, $\hat{v}_j$ and $\hat{r}_j$. The update for the $j$-th parameter is

\begin{equation}
    \theta_{j} \leftarrow \theta_{j} - \frac{\epsilon}{\sqrt{\hat{r}_{j}} + \delta} \hat{v}_{j}
\end{equation}

where $\epsilon$ is the learning rate, $\delta$ is a small constant for numerical stability (PyTorch's default is $10^{-8}$), and

\begin{gather}
    v_{j} \leftarrow \beta_1 v_{j} + \left( 1 - \beta_1 \right)g_{j} \\
    r_{j} \leftarrow \beta_2 r_{j} + \left( 1 - \beta_2 \right)g_{j}^{2} \\
    \hat{v}_j = \frac{v_{j}}{1- \beta_1^t} \\
    \hat{r}_j = \frac{r_{j}}{1- \beta_2^t}
\end{gather}
Here $\beta_1$ and $\beta_2$ are decay rates (PyTorch's defaults are 0.9 and 0.999), $v_j$ and $r_j$ are initialised to zero, $t$ is the time-step of the optimization, and $g_j$ is the stochastic estimate of the gradient of the loss function with respect to parameter $j$,
\begin{equation}
    g_{j} = \frac{\partial \hat{J}(\boldsymbol{\theta})}{\partial \theta_j}
\end{equation}
The estimate of the gradient of the loss function, $\nabla_{\boldsymbol{\theta}} \hat{J}(\boldsymbol{\theta})$, is obtained from a mini-batch of data via automatic differentiation.
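The following sketch implements the Adam update equations by hand for a toy quadratic loss and checks that they reproduce torch.optim.Adam. Note the naming clash: PyTorch's lr argument is the learning rate $\epsilon$ here, and its eps argument is the stability constant $\delta$. The loss and the random targets (standing in for successive mini-batches) are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
lr, beta1, beta2, delta = 1e-3, 0.9, 0.999, 1e-8   # PyTorch's default Adam settings

theta = torch.randn(5, requires_grad=True)
opt = torch.optim.Adam([theta], lr=lr, betas=(beta1, beta2), eps=delta)

theta_manual = theta.detach().clone()
v = torch.zeros_like(theta_manual)    # momentum-like first moment, v_j
r = torch.zeros_like(theta_manual)    # running average of squared gradients, r_j

for t in range(1, 4):                 # three optimization time-steps
    target = torch.randn(5)           # toy stand-in for a new mini-batch

    # PyTorch update on the toy loss sum((theta - target)^2)
    opt.zero_grad()
    loss = ((theta - target) ** 2).sum()
    loss.backward()
    opt.step()

    # Manual update: gradient of the same loss, then the Adam equations
    g = 2 * (theta_manual - target)
    v = beta1 * v + (1 - beta1) * g
    r = beta2 * r + (1 - beta2) * g ** 2
    v_hat = v / (1 - beta1 ** t)
    r_hat = r / (1 - beta2 ** t)
    theta_manual = theta_manual - lr * v_hat / (torch.sqrt(r_hat) + delta)

print(torch.allclose(theta.detach(), theta_manual, atol=1e-6))  # True
```

At $t=1$ the bias corrections give $\hat{v}_j = g_j$ and $\hat{r}_j = g_j^2$, so the first step has size close to $\epsilon$ for every parameter regardless of the gradient's scale.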


## 3.3 Import packages

In [None]:
# Importing essential libraries and modules for deep learning and visualization
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import numpy as np
import matplotlib.pyplot as plt

# For reproducibility
torch.manual_seed(42)
np.random.seed(42)

## 3.4 Data preprocessing

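Note that in PyTorch the mini-batch size used for training is set when the DataLoader is created, rather than when the model is trained (as with the batch_size argument of model.fit in Keras).
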
In [None]:
# Import the necessary transformations module from torchvision
import torchvision.transforms as transforms

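# Define a transformation pipeline.
# ToTensor converts each 28x28 greyscale PIL image into a float tensor of shape (1, 28, 28)
# with pixel values scaled from [0, 255] to [0, 1]; no other preprocessing is applied.
transform = transforms.Compose([transforms.ToTensor()])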

# Using torchvision, load the Fashion MNIST training dataset.
# root specifies the directory where the dataset will be stored.
# train=True indicates that we want the training dataset.
# download=True will download the dataset if it's not present in the specified root directory.
# transform applies the defined transformations to the images.
trainset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)

# Create a data loader for the training set.
# It will provide batches of data, in this case, batches of size 64.
# shuffle=True ensures that the data is shuffled at the start of each epoch.
# num_workers=2 indicates that two subprocesses will be used for data loading.
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True, num_workers=2)

# Similarly, load the Fashion MNIST test dataset.
testset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False, num_workers=2)

# Define the class labels for the Fashion MNIST dataset.
classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot')
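The sketch below illustrates what the data loaders produce without downloading anything. FakeFashionMNIST is a hypothetical stand-in dataset (not part of torchvision) with the same number of samples (60,000 training, 10,000 test) and the same image shape as Fashion MNIST after ToTensor; it shows the shape of one mini-batch and the number of batches per epoch for batch_size=64.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class FakeFashionMNIST(Dataset):
    """Hypothetical stand-in: blank 1x28x28 images with integer labels 0-9."""
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return torch.zeros(1, 28, 28), idx % 10   # (image, label)

train_like = DataLoader(FakeFashionMNIST(60000), batch_size=64, shuffle=True)
test_like = DataLoader(FakeFashionMNIST(10000), batch_size=64, shuffle=False)

images, labels = next(iter(train_like))
print(images.shape, labels.shape)       # torch.Size([64, 1, 28, 28]) torch.Size([64])
print(len(train_like), len(test_like))  # 938 157 (batches per epoch)
```

Since 60,000 and 10,000 are not multiples of 64, the last training batch holds 32 images and the last test batch 16. Dividing the summed batch losses by len(trainloader), as the training loop in Section 3.6 does, therefore gives an average over batches rather than an exact average over samples; the difference is small here.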

## 3.5 Define the CNN model

In PyTorch, we define the neural network as a class (here named ConvNet) with a constructor, __init__, that defines the layers, and a forward method that defines the forward pass through the network.

A feature of PyTorch, compared to Keras and MATLAB, is that you need to specify the size of the input to each layer yourself. This is simple for convolutional layers: the number of input feature maps equals the number of filters in the preceding convolutional layer (which is easiest to manage if you parameterise the number of filters and keep it constant).

However, for the final linear (fully connected) layer this is more involved: the input image is 28x28, but the max-pooling layer downsamples the feature maps to 14x14, and the input size of the final linear layer must account for this.

The correct input size for the final linear layer, nn.Linear, is num_filters * 14 * 14.
Let's see how that number is obtained.
* The input image is 28x28.
* The first convolution has a padding of 1 and a kernel size of 3 (with the default stride of 1), so the output feature maps are also 28x28.
* The MaxPool2d layer, with kernel size 2 and stride 2, halves the spatial dimensions (28 / 2 = 14).
* The second convolution layer, again with a padding of 1 and a kernel size of 3, keeps the 14x14 dimensions.
* Batch normalisation and ReLU do not change the tensor shape.
* The flatten layer then turns each num_filters x 14 x 14 tensor into a 1-dimensional tensor of length num_filters * 14 * 14.
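The steps above follow the standard output-size formula for convolution and pooling layers, $\lfloor (s + 2p - k)/\text{stride} \rfloor + 1$ for input size $s$, padding $p$ and kernel size $k$. The sketch below applies it with num_filters=16 (the value used when the model is instantiated) and cross-checks the result by passing a dummy tensor through the same shape-changing layers; conv_out is a hypothetical helper, and batch normalisation and ReLU are omitted because they do not change the shape.

```python
import torch
import torch.nn as nn

def conv_out(size, kernel, padding=0, stride=1):
    """Output size of a conv/pool layer: floor((size + 2p - k) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

num_filters = 16
s = conv_out(28, kernel=3, padding=1)   # conv1: 28
s = conv_out(s, kernel=2, stride=2)     # pool1: 14
s = conv_out(s, kernel=3, padding=1)    # conv2: 14
print(s, num_filters * s * s)           # 14 3136

# Cross-check by pushing a dummy batch of one image through the same layers
features = nn.Sequential(
    nn.Conv2d(1, num_filters, kernel_size=3, padding=1),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(num_filters, num_filters, kernel_size=3, padding=1),
    nn.Flatten(),
)
print(features(torch.zeros(1, 1, 28, 28)).shape)  # torch.Size([1, 3136])
```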

Note that to avoid having to

You can check out the PyTorch docs for info on exactly how each layer works:
https://pytorch.org/docs/stable/nn.html

In [None]:
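# Define the ConvNet model
class ConvNet(nn.Module):
    def __init__(self, num_filters=4, num_classes=10):
        super().__init__()
        # Block 1: 3x3 convolution (1 input channel -> num_filters maps), batch norm, 2x2 max pooling
        self.conv1 = nn.Conv2d(1, num_filters, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(num_filters)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        # Block 2: 3x3 convolution keeping num_filters maps, followed by batch norm
        self.conv2 = nn.Conv2d(num_filters, num_filters, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(num_filters)
        # Classifier: flatten the num_filters x 14 x 14 feature maps and map them to class scores
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(num_filters * 14 * 14, num_classes)

    def forward(self, x):
        # x has shape (batch, 1, 28, 28)
        x = F.relu(self.bn1(self.conv1(x)))   # (batch, num_filters, 28, 28)
        x = self.pool1(x)                     # (batch, num_filters, 14, 14)
        x = F.relu(self.bn2(self.conv2(x)))   # (batch, num_filters, 14, 14)
        x = self.flatten(x)                   # (batch, num_filters * 14 * 14)
        x = self.fc(x)                        # (batch, num_classes) unnormalised logits;
        return x                              # the softmax is applied inside nn.CrossEntropyLoss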

# Instantiate the model
model = ConvNet(num_filters=16, num_classes=10)

# print the model to inspect
print(model)
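As a check on the layer sizes, the sketch below counts the trainable parameters layer by layer. It rebuilds the same layers as ConvNet with num_filters=16 as an nn.Sequential stack, an assumption made only so the example is self-contained; sum(p.numel() for p in model.parameters()) on the model above should give the same total.

```python
import torch.nn as nn

num_filters, num_classes = 16, 10
# Same layers, in the same order, as ConvNet (written as nn.Sequential for brevity)
layers = nn.Sequential(
    nn.Conv2d(1, num_filters, kernel_size=3, padding=1),
    nn.BatchNorm2d(num_filters),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(num_filters, num_filters, kernel_size=3, padding=1),
    nn.BatchNorm2d(num_filters),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(num_filters * 14 * 14, num_classes),
)

# Print the parameter count of each layer that has parameters
for layer in layers:
    n = sum(p.numel() for p in layer.parameters())
    if n:
        print(f'{layer.__class__.__name__:12s} {n:6d}')

total = sum(p.numel() for p in layers.parameters())
print('Total trainable parameters:', total)   # 33914
```

Over 90% of the parameters (31,370 of 33,914) are in the final linear layer, so the flattened size num_filters * 14 * 14 largely determines the size of the model.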


## 3.6 Train the model

The model is trained here by minimising the categorical cross-entropy loss with the Adam optimizer.

Note that in PyTorch, unlike Keras, you have to code the main steps of the training algorithm explicitly for each epoch of training, and iterate over these steps in a for loop.

In addition, you have to evaluate performance on validation data explicitly in a separate for loop; here the test set is used as the validation data.

In [None]:
# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 5
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for images, labels in trainloader:
        optimizer.zero_grad()              # reset the parameter gradients from the previous batch
        outputs = model(images)            # forward pass through network
        loss = criterion(outputs, labels)  # compute the mean loss over the batch
        loss.backward()                    # backward pass: gradients of the loss w.r.t. the parameters
        optimizer.step()                   # update model parameters (Adam step)

        running_loss += loss.item()        # accumulate the loss
        predicted = outputs.argmax(dim=1)  # get the predicted class
        total += labels.size(0)                    # number of predictions
        correct += (predicted == labels).sum().item()  # sum correct predictions

    epoch_loss = running_loss / len(trainloader)  # epoch loss
    epoch_accuracy = 100 * correct / total        # epoch accuracy
    print(f'Epoch {epoch+1}/{num_epochs}, Loss: {epoch_loss:.4f}, Training Accuracy: {epoch_accuracy:.2f}%')

    # Validation loop - to monitor accuracy on the held-out test set, used here as validation data
    model.eval()
    val_loss = 0.0
    val_correct = 0
    val_total = 0
    with torch.no_grad():
        for images, labels in testloader:
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            predicted = outputs.argmax(dim=1)
            val_total += labels.size(0)
            val_correct += (predicted == labels).sum().item()

    val_epoch_loss = val_loss / len(testloader)
    val_epoch_accuracy = 100 * val_correct / val_total
    print(f'Validation Loss: {val_epoch_loss:.4f}, Validation Accuracy: {val_epoch_accuracy:.2f}%')
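The calls to model.train() and model.eval() in the loop matter for this network because it contains batch normalisation layers. The sketch below, on a random tensor with arbitrary sizes, shows that nn.BatchNorm2d normalises with the current batch statistics (and updates its running estimates) in training mode, but uses the stored running estimates in evaluation mode, so the two modes give different outputs for the same input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(4)                     # 4 channels; running mean starts at 0, variance at 1
x = torch.randn(8, 4, 5, 5) * 3 + 2        # batch whose statistics differ from the initial ones

bn.train()
y_train = bn(x)    # normalised with this batch's per-channel mean/variance; running stats updated
bn.eval()
y_eval = bn(x)     # normalised with the stored running mean/variance instead

print(y_train.mean().item())               # close to 0
print(torch.allclose(y_train, y_eval))     # False: the two modes give different outputs
print(bn.running_mean)                     # moved 10% of the way from 0 towards the batch means
```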

## 3.7 Evaluate the model

In [None]:
# import libraries
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Set the model to evaluation mode. This is important because some layers behave differently during training and evaluation, e.g. batch normalisation (used in this model) and dropout.
model.eval()

# Lists to store all predictions and true labels
all_preds = []
all_labels = []

# We don't want to compute gradients during evaluation, hence wrap the code inside torch.no_grad()
with torch.no_grad():

    # Iterate over all batches in the test loader
    for images, labels in testloader:
        # No device transfer needed as we are running on CPU

        # Pass the images through the model to get predictions
        outputs = model(images)

        # Get the class with the largest output score (logit) as the predicted class;
        # this is also the class with the highest softmax probability
        _, predicted = torch.max(outputs, 1)

        # Extend the all_preds list with predictions from this batch
        all_preds.extend(predicted.numpy())

        # Extend the all_labels list with true labels from this batch
        all_labels.extend(labels.numpy())

# Print a classification report which provides an overview of the model's performance for each class
print(classification_report(all_labels, all_preds, target_names=classes))

# Compute the confusion matrix using true labels and predictions
cm = confusion_matrix(all_labels, all_preds)

# Visualize the confusion matrix using seaborn's heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt="d", cmap=plt.cm.Blues, xticklabels=classes, yticklabels=classes)
plt.xlabel('Predicted Label')  # x-axis label
plt.ylabel('True Label')       # y-axis label
plt.title('Confusion Matrix')  # Title of the plot
plt.show()                     # Display the plot

# End of part 3 of the lab
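Each row of the confusion matrix corresponds to a true class and each column to a predicted class, so dividing the diagonal by the row sums gives the per-class accuracy, which is the same as the per-class recall in the classification report. The sketch below shows this on a small set of made-up labels for 3 classes (not results from this lab); with the real data, the same calculation can be applied to the cm computed above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Made-up labels for illustration only (3 classes, 10 samples)
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 0, 2, 2, 2, 1, 2]

toy_cm = confusion_matrix(y_true, y_pred)               # rows: true class, columns: predicted class
per_class_acc = toy_cm.diagonal() / toy_cm.sum(axis=1)  # correct / samples of that class
overall_acc = toy_cm.trace() / toy_cm.sum()             # all correct / all samples

print(per_class_acc)                               # 0.5, 2/3 and 0.8 for these labels
print(recall_score(y_true, y_pred, average=None))  # identical per-class values
print(overall_acc)                                 # 0.7
```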