# Image Classification with PyTorch
### *CIFAR100 Dataset*

---

# Predicting 100 types of objects (multi-class classification)

## Dataset

The data was originally collected and shared by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.

Each observation is a 32 by 32 color image showing one of 100 different types of objects, such as vehicles, fruits, or animals.

This dataset is available from PyTorch: [CIFAR100](https://pytorch.org/vision/stable/generated/torchvision.datasets.CIFAR100.html)

## Objective

Our goal is to build a Convolutional Neural Network (CNN) model that can accurately predict the type of object shown in each image.

## Instructions

This is a guided exercise where some of the code has already been pre-defined. Your task is to fill in the remaining parts of the code (highlighted with placeholders) to train and evaluate your model.

This exercise is split into several parts:
1.   Loading and Exploration of the Dataset
2.   Preparing the Dataset
3.   Defining the CNN Architecture:
    - Convolutional layer with 32 kernels of (3,3) and padding = 1
    - ReLU as the activation function
    - Max pooling of (2,2)
    - Convolutional layer with 64 kernels of (3,3) and padding = 1
    - ReLU as the activation function
    - Max pooling of (2,2)
    - Convolutional layer with 128 kernels of (3,3) and padding = 1
    - ReLU as the activation function
    - Max pooling of (2,2)
    - Fully-connected layer of 512 units
    - Fully-connected layer of 100 (output) units
    - ReLU as the activation function for the hidden layers
    - Adam as the optimiser
4.   Training and Evaluation of the Model
5.   Analysing the Results

## Exercise 2 Solution

### 1. Loading and Exploration of the Dataset

**[1.1]** First, we need to import the relevant classes and libraries from PyTorch, including the module that contains the dataset.

In [1]:
# Solution
import torch
import torchvision
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt

**[1.2]** Define the transformation using the `Compose` class. Use the `Normalize` transform from torchvision (https://pytorch.org/vision/main/generated/torchvision.transforms.Normalize.html) to rescale the pixel values: with a mean and standard deviation of 0.5 for each channel, values in [0, 1] are mapped to [-1, 1].

In [2]:
# Solution
transform = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
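
As a rough illustration of what this transform computes, the sketch below applies the per-channel formula `(x - mean) / std` by hand to a random tensor standing in for a `ToTensor()` output. The names `normalize_manual` and `fake_image` are hypothetical and exist only for this example; it is not how torchvision implements `Normalize` internally, just the same arithmetic.

```python
import torch

def normalize_manual(img, mean, std):
    """Apply (x - mean) / std per channel to a (C, H, W) tensor."""
    mean_t = torch.tensor(mean).view(-1, 1, 1)  # shape (C, 1, 1) so it broadcasts over H and W
    std_t = torch.tensor(std).view(-1, 1, 1)
    return (img - mean_t) / std_t

torch.manual_seed(0)
fake_image = torch.rand(3, 32, 32)  # hypothetical stand-in for a ToTensor() output, values in [0, 1)
normalized = normalize_manual(fake_image, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))

# The two ends of the input range: 0 should map to -1 and 1 should map to 1
low = normalize_manual(torch.zeros(3, 1, 1), (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
high = normalize_manual(torch.ones(3, 1, 1), (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))

print(normalized.min().item(), normalized.max().item())
print(low.flatten().tolist(), high.flatten().tolist())
```

With a mean and standard deviation of 0.5, the formula reduces to `2 * x - 1`, which is why the normalized values stay between -1 and 1.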

**[1.3]** Then we will load the CIFAR100 dataset into separate variables: `train_data`, `test_data`. (https://www.cs.toronto.edu/~kriz/cifar.html)

In [3]:
# Solution
train_data = datasets.CIFAR100(root='./data', train=True, download=True, transform=transform)
test_data = datasets.CIFAR100(root='./data', train=False, download=True, transform=transform)

100.0%


### 2.   Preparing the Dataset

**[2.1]** We already prepared the dataset in sections 1.2 and 1.3 by applying the `Normalize` transform to the training and testing sets. Now, let's have a look at the minimum and maximum values of `train_data[0][0]` and `test_data[0][0]`.

In [4]:
# Solution
print(train_data[0][0].min())
print(train_data[0][0].max())
print(test_data[0][0].min())
print(test_data[0][0].max())

tensor(-0.9922)
tensor(1.)
tensor(-0.6314)
tensor(1.)
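
These values are consistent with 8-bit pixel intensities passed through the two transforms: `ToTensor()` divides by 255 and `Normalize` then computes `(x - 0.5) / 0.5`. The sketch below redoes that arithmetic in plain Python. The raw pixel values 1, 47 and 255 are inferred from the printed outputs rather than read from the dataset, so treat them as an assumption; `normalized_value` is a hypothetical helper.

```python
def normalized_value(pixel, mean=0.5, std=0.5):
    """Map an 8-bit pixel intensity to its value after ToTensor() and Normalize."""
    scaled = pixel / 255.0          # ToTensor() rescales 0..255 to 0..1
    return (scaled - mean) / std    # Normalize applies (x - mean) / std

for pixel in (0, 1, 47, 255):
    print(pixel, round(normalized_value(pixel), 4))
```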


**[2.2]** Look at the dimensions of the image tensors in `train_data` and `test_data` using the [.size()](https://pytorch.org/docs/stable/generated/torch.Tensor.size.html) method, and the number of images in each set using `len()`


##### Task: Display the dimensions of train_data and test_data

In [5]:
# Solution
print(train_data[0][0].size(), len(train_data))
print(test_data[0][0].size(), len(test_data))

torch.Size([3, 32, 32]) 50000
torch.Size([3, 32, 32]) 10000


**[2.3]** Let's print the details of the training data to see the number of data points and the transformations applied.

In [6]:
# Solution
print(train_data)

Dataset CIFAR100
    Number of datapoints: 50000
    Root location: ./data
    Split: Train
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
           )


**[2.4]** Let's print the details of the test data to see the number of data points and the transformations applied.

In [7]:
# Solution
print(test_data)

Dataset CIFAR100
    Number of datapoints: 10000
    Root location: ./data
    Split: Test
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
           )


### 3.   Defining the Architecture of CNN

**[3.1]** Import `torch.nn` as `nn`, `torch.optim` as `optim` and other relevant packages to define the architecture of the CNN

In [8]:
# Solution
import torch.nn as nn
import torch.optim as optim

**[3.2]** We will set the seed for PyTorch in order to get reproducible results

In [9]:
# Solution
torch.manual_seed(42)

<torch._C.Generator at 0x12154a6d0>

**[3.3]** Now we will define the model architecture. We will create a class named `CNN_CIFAR` that consists of 3 convolutional layers built with the [Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html) class from PyTorch. After that, we will instantiate 2 fully-connected layers for making the predictions.

##### Task:
1. Create 3 convolutional layers with the right number of kernels, kernel size and activation function, each followed by a max pooling layer created with [MaxPool2d](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html)
2. Create 2 fully-connected layers and specify the right number of units and activation function

Note: the first `Conv2d` will be the input layer, so you will need to specify the number of input channels.

**Task: You need to create 2 fully-connected layers with the relevant number of units and activation function.**

In [10]:
# Solution
# Define the CNN model
class CNN_CIFAR(nn.Module):
    def __init__(self):
        super(CNN_CIFAR, self).__init__()
        # ------------------ First Convolutional Block ------------------
        # in_channels=3 because CIFAR100 images are in RGB
        # out_channels=32 defines how many feature maps this layer will produce
        # kernel_size=3 with padding=1 preserves spatial dimensions (32x32) after convolution
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.relu1 = nn.ReLU()                      # Activation function
        self.pool1 = nn.MaxPool2d(2, stride=2)      # Reduces image dimension by a factor of 2 (32 x 32 -> 16 x 16)

        # ------------------ Second Convolutional Block ------------------
        # in_channels=32 from the previous layer, out_channels=64
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(2, stride=2)      # Reduces image dimension by a factor of 2 (16 x 16 -> 8 x 8)

        # ------------------ Third Convolutional Block ------------------
        # in_channels=64 from the previous layer, out_channels=128
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.relu3 = nn.ReLU()
        self.pool3 = nn.MaxPool2d(2, stride=2)      # Reduces image dimension by a factor of 2 (8 x 8 -> 4 x 4)

        # Flatten layer to convert 3D feature maps into a 1D vector
        self.flatten = nn.Flatten()

        # ------------------ Fully Connected Layers ------------------
        # After 3 rounds of pooling, each dimension is halved thrice: 32 -> 16 -> 8 -> 4
        # Hence, the input to the first FC layer is 128 * 4 * 4 = 2048
        self.fc1 = nn.Linear(128 * 4 * 4, 512)
        self.relu4 = nn.ReLU()
        self.fc2 = nn.Linear(512, 100)  # 100 classes for CIFAR-100

    def forward(self, x):
        # Pass input through the first convolutional block
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.pool1(x)

        # Pass through the second convolutional block
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.pool2(x)

        # Pass through the third convolutional block
        x = self.conv3(x)
        x = self.relu3(x)
        x = self.pool3(x)

        # Flatten the feature maps for the fully connected layers
        x = self.flatten(x)

        # Pass through the first fully connected layer with ReLU
        x = self.fc1(x)
        x = self.relu4(x)

        # Output layer (no activation here since we'll use CrossEntropyLoss)
        x = self.fc2(x)

        return x
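
The spatial sizes quoted in the comments above can be checked with the standard output-size formula `floor((n + 2p - k) / s) + 1` for a convolution or pooling layer with no dilation. The following is a minimal sketch in plain Python; `conv_out` is a hypothetical helper and nothing here touches the actual model.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output size along one spatial dimension of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32       # CIFAR100 images are 32 x 32
channels = 3    # RGB input
for out_channels in (32, 64, 128):
    size = conv_out(size, kernel=3, padding=1, stride=1)  # 3x3 convolution keeps the size
    size = conv_out(size, kernel=2, padding=0, stride=2)  # 2x2 max pooling halves it
    channels = out_channels
    print(f"after block with {out_channels} kernels: {channels} x {size} x {size}")

flattened = channels * size * size
print("flattened features:", flattened)
```

The final value is the number of input features expected by the first fully-connected layer.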

**[3.4]** Now instantiate the class and save it into a variable named `model`. Our architecture is now ready. Let's print the model summary.

In [11]:
# Solution
model = CNN_CIFAR()
print(model)

CNN_CIFAR(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu2): ReLU()
  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu3): ReLU()
  (pool3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (fc1): Linear(in_features=2048, out_features=512, bias=True)
  (relu4): ReLU()
  (fc2): Linear(in_features=512, out_features=100, bias=True)
)
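
The printed summary does not show how many trainable parameters each layer holds. As a rough sketch, the counts can be derived by hand from the layer shapes listed above: a convolution has `in_channels * out_channels * k * k` weights plus one bias per output channel, and a linear layer has `in_features * out_features` weights plus one bias per output. The helper names below are hypothetical.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights plus biases of a square-kernel 2D convolution."""
    return in_ch * out_ch * kernel * kernel + out_ch

def linear_params(in_f, out_f):
    """Weights plus biases of a fully-connected layer."""
    return in_f * out_f + out_f

layer_params = {
    "conv1": conv_params(3, 32, 3),
    "conv2": conv_params(32, 64, 3),
    "conv3": conv_params(64, 128, 3),
    "fc1": linear_params(128 * 4 * 4, 512),
    "fc2": linear_params(512, 100),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name}: {count} ({count / total_params:.1%})")
print("total:", total_params)
```

Most of the parameters sit in the first fully-connected layer rather than in the convolutional blocks.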


### 4. Training and Evaluation of the Model

**[4.1]** Instantiate a `nn.CrossEntropyLoss()` (https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html) and save it into a variable called `criterion`.

In [12]:
# Solution
criterion = nn.CrossEntropyLoss()
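
`nn.CrossEntropyLoss` expects raw scores (logits) and applies log-softmax internally, which is why the model's output layer has no activation. The sketch below checks this on made-up data and also computes the loss of a model that assigns equal scores to all 100 classes, which can serve as a reference point when reading the training losses. The tensors `toy_logits` and `toy_labels` are hypothetical, and `example_criterion` is named differently so it does not overwrite `criterion`.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

example_criterion = nn.CrossEntropyLoss()

torch.manual_seed(0)
toy_logits = torch.randn(4, 100)             # hypothetical raw scores: 4 samples, 100 classes
toy_labels = torch.tensor([3, 17, 42, 99])   # hypothetical class indices

# CrossEntropyLoss on logits matches log-softmax followed by negative log-likelihood
loss_direct = example_criterion(toy_logits, toy_labels)
loss_manual = F.nll_loss(F.log_softmax(toy_logits, dim=1), toy_labels)
print(loss_direct.item(), loss_manual.item())

# Equal scores for every class give a loss of ln(100)
uniform_loss = example_criterion(torch.zeros(4, 100), toy_labels)
print(uniform_loss.item(), math.log(100))
```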

**[4.2]**  Instantiate a `torch.optim.Adam()` optimizer with the model's parameters and 0.001 as learning rate and save it into a variable called `optimizer`

In [13]:
# Solution
optimizer = optim.Adam(model.parameters(), lr = 0.001)

**[4.3]**  Now we will create `DataLoader` objects, which iterate over the data in batches, and save them into two variables called `train_dataloader` and `test_dataloader`. Set the `batch_size` to 64.

In [14]:
# Solution
batch_size = 64
train_dataloader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=batch_size, shuffle=False)
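
To see what a `DataLoader` yields without downloading anything, the sketch below wraps a small synthetic dataset of 200 blank images (an assumption made purely for illustration) and inspects the batch shapes. It also works out how many batches a batch size of 64 gives for 50000 training and 10000 test images. All `toy_*` names are hypothetical.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in dataset: 200 blank images with the CIFAR100 image shape
toy_images = torch.zeros(200, 3, 32, 32)
toy_labels = torch.zeros(200, dtype=torch.long)
toy_loader = DataLoader(TensorDataset(toy_images, toy_labels), batch_size=64, shuffle=False)

# Each batch stacks images along a new first dimension; the last batch may be smaller
batch_shapes = [tuple(images.shape) for images, _ in toy_loader]
print(batch_shapes)
print(len(toy_loader))

# Number of batches per epoch for the real dataset sizes
train_batches = math.ceil(50000 / 64)
test_batches = math.ceil(10000 / 64)
print(train_batches, test_batches)
```

Because the last batch can be smaller than `batch_size`, code that processes batches should read the batch size from the tensors rather than assume 64.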

**[4.4]** **Training:** Now it is time to train our model. Set `epochs` to 5 and create a for loop that iterates over the epochs. Inside it, a nested loop extracts the images and labels from `train_dataloader` and applies the following logic:
- reset the gradients
- perform the forward propagation and get the model predictions
- calculate the loss between the predictions and the actuals
- perform back propagation
- update the weights
- accumulate the loss so that the average per epoch can be reported

In [15]:
# Solution
epochs = 5
losses = []
for epoch in range(epochs):
    running_loss = 0.0
    for images, labels in train_dataloader:
        outputs = model(images) # Forward Propagation to get predicted outcome
        loss = criterion(outputs, labels) # Compute the loss
        losses.append(loss.item()) # Keep track of the per-batch losses as plain Python floats
        optimizer.zero_grad()  # clear gradients for this training step
        loss.backward() # Back propagation
        optimizer.step() # Update the weights
        running_loss += loss.item()

    print(f'Epoch {epoch + 1}/{epochs}, Loss: {running_loss / len(train_dataloader)}')

Epoch 1/5, Loss: 3.580439422136682
Epoch 2/5, Loss: 2.7749517473113507
Epoch 3/5, Loss: 2.3853881872828353
Epoch 4/5, Loss: 2.097533574951884
Epoch 5/5, Loss: 1.8507179412085686


**[4.5]** **Testing:** Now it is time to test our model. Call `model.eval()` to switch the model to evaluation mode and use `torch.no_grad()` to turn off gradient tracking. Then count the `total` number of test images and the number of `correct` predictions, where a prediction is correct if the predicted class equals the actual label.

In [16]:
# Solution
# Evaluate the model on the test set
model.eval()
correct = 0
total = 0

with torch.no_grad():
    for images, labels in test_dataloader:
        outputs = model(images) # get the predicted outcome
        predicted = outputs.argmax(dim=1) # index of the highest score is the predicted class
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
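
The counting logic can be tried on a tiny hand-made batch. In the sketch below, `toy_outputs` and `toy_labels` are hypothetical values for 4 samples and 3 classes; it also shows that taking the indices from `torch.max(..., 1)` and calling `argmax(dim=1)` select the same classes.

```python
import torch

# Hypothetical scores for 4 samples over 3 classes
toy_outputs = torch.tensor([[0.1, 2.0, -1.0],
                            [1.5, 0.3, 0.2],
                            [0.0, 0.1, 3.0],
                            [2.2, 2.1, 0.0]])
toy_labels = torch.tensor([1, 0, 2, 1])

predicted_max = torch.max(toy_outputs, 1)[1]     # indices returned alongside the max values
predicted_argmax = toy_outputs.argmax(dim=1)     # same indices, requested directly

toy_correct = (predicted_argmax == toy_labels).sum().item()
toy_total = toy_labels.size(0)
print(predicted_argmax.tolist(), toy_correct, toy_total)
```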

### 5. Analysing the Results

**[5.1]** Let's calculate the `accuracy` of the model by dividing the number of correct predictions by the total number of predictions and print the `accuracy`.

In [17]:
# Solution
accuracy = correct / total
print(f'Test Accuracy: {accuracy}')

Test Accuracy: 0.4088


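After 5 epochs, the performance of the model is still quite low even though we created a deeper CNN model. One likely contributing factor is the relatively small size of the images (32 by 32), which may not give the model enough detail to clearly identify relevant patterns for each type of object. The training loss was also still decreasing at epoch 5, so the model may not have finished learning yet.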
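
To put the reported accuracy in context, the short calculation below compares it with random guessing, assuming the 100 classes are equally represented in the test set. The variable names are hypothetical and the figures are taken from the outputs above.

```python
num_classes = 100
test_images = 10000
reported_accuracy = 0.4088   # test accuracy printed above

chance_accuracy = 1 / num_classes                            # accuracy of uniform random guessing
correct_predictions = round(reported_accuracy * test_images) # number of test images classified correctly
improvement_over_chance = reported_accuracy / chance_accuracy

print(f"chance accuracy: {chance_accuracy:.2%}")
print(f"correct predictions: {correct_predictions} of {test_images}")
print(f"ratio to chance: {improvement_over_chance:.2f}")
```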