# Introduction

In this assignment you will practice putting together an image classification pipeline based on CNNs for the [CIFAR-10 and/or CIFAR-100](https://www.cs.toronto.edu/~kriz/cifar.html) datasets. The goals of this assignment are as follows:



*   Understand the components of a CNN model.
*   Understand how to modify a standard CNN model towards a specific task.
*   Implement and train a LeNet-5 model.
*   Implement and train a VGGNet model.
*   Implement and train a ResNet model.
*   Understand the differences and tradeoffs between these models.

Please fill in all the **TODO** code blocks. Once you are ready to submit:

* Export the notebook `CSCI677_assignment_3.ipynb` as a PDF `[Your USC ID]_CSCI677_assignment_3.pdf`
* Submit your PDF file through [Blackboard](https://blackboard.usc.edu/)

Please make sure that the notebook has been run before exporting the PDF, and that your code and all cell outputs are visible in your submitted PDF. Regrading requests will not be accepted if your code or output is not visible in the original submission. Thank you!

In case you haven't installed PyTorch yet, run the following command to install torch and torchvision.

In [None]:
!pip install torch torchvision

# **Data Preparation**

[CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) is a well-known dataset of 60,000 32x32 color images in 10 classes, with 6,000 images per class. The wrapper class `CIFAR10Dataset` defined below loads it through torchvision and exposes:
* `train_dataloader` yields batches of normalized training images and their labels (integers in the range [0, 9])
* `test_dataloader` yields batches of normalized test images and their labels
* `classes` lists the class names

The dataset is downloaded automatically the first time you instantiate the class.

[CIFAR-100](https://www.cs.toronto.edu/~kriz/cifar.html) is just like the CIFAR-10 dataset, except that it has 100 classes containing 600 images each. Below we provide wrapper classes for the CIFAR-10 and CIFAR-100 datasets. You can choose one or both of them for training your CNNs. If you choose only one, use the same one to train all your models.

In [None]:
from torchvision import datasets
from torchvision import transforms
from torch.utils.data import DataLoader

class CIFAR10Dataset:
    def __init__(self, batch_size=128, root="data"):
        self.transform = transforms.Compose(
            [transforms.ToTensor(),
             transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))]  # CIFAR-10 normalization values
        )
        self.batch_size = batch_size

        self.training_data = datasets.CIFAR10(
            root=root,
            train=True,
            download=True,
            transform=self.transform
        )
        self.train_dataloader = DataLoader(self.training_data, batch_size=self.batch_size, shuffle=True)

        self.test_data = datasets.CIFAR10(
            root=root,
            train=False,
            download=False,
            transform=self.transform
        )
        self.test_dataloader = DataLoader(self.test_data, batch_size=self.batch_size, shuffle=False)

        self.classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


class CIFAR100Dataset:
    def __init__(self, batch_size=128, root="data"):
        self.transform = transforms.Compose(
            [transforms.ToTensor(),
             transforms.Normalize((0.5071, 0.4867, 0.4408), (0.2675, 0.2565, 0.2761))]  # CIFAR-100 normalization values
        )
        self.batch_size = batch_size

        self.training_data = datasets.CIFAR100(
            root=root,
            train=True,
            download=True,
            transform=self.transform
        )
        self.train_dataloader = DataLoader(self.training_data, batch_size=self.batch_size, shuffle=True)

        self.test_data = datasets.CIFAR100(
            root=root,
            train=False,
            download=False,
            transform=self.transform
        )
        self.test_dataloader = DataLoader(self.test_data, batch_size=self.batch_size, shuffle=False)

        self.classes = self.training_data.classes


# LeNet-5 (20 pts)
LeNet-5, introduced by Yann LeCun et al. in 1998, is a relatively shallow network. It consists of two convolutional layers and three fully connected layers. LeNet-5 was designed for handwritten digit recognition and has a relatively small number of parameters.

## Implement LeNet-5 (10 pts)
The classical LeNet-5 architecture is as follows:


![LeNet-5 Architecture](https://cdn.analyticsvidhya.com/wp-content/uploads/2021/03/Screenshot-from-2021-03-18-12-52-17.png)


Its input is 32x32x1 because it was designed for 32x32 grayscale images. However, inputs from CIFAR-10/100 are 32x32 color images, so you need to modify it. Requirements:
* The model should take inputs of 32x32x3 and output a vector of dimension equal to the number of classes (10 for CIFAR-10 and 100 for CIFAR-100).
* The model should have 2 convolutional layers and 3 fully connected layers:

  (Convolution -> Sigmoid -> Average Pooling) ->

  (Convolution -> Sigmoid -> Average Pooling) ->

  Flattening ->

  (Linear -> Sigmoid) ->

  (Linear -> Sigmoid) -> Linear.
* Use 5x5 convolutional filters.

**Hint**: you can use `nn.Sequential()` to simplify your implementation.
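
To size the first fully connected layer, it helps to track the spatial size through the convolution and pooling stages. The sketch below is plain arithmetic rather than part of the required implementation. The helper name `conv_out_size` is hypothetical, and the absence of padding and the 2x2 pooling with stride 2 are assumptions borrowed from the classical LeNet-5 design, not requirements stated above.

```python
def conv_out_size(n, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (hypothetical helper)."""
    return (n + 2 * padding - kernel_size) // stride + 1


size = 32                                               # CIFAR images are 32x32
after_conv1 = conv_out_size(size, kernel_size=5)        # first 5x5 convolution, no padding (assumed)
after_pool1 = conv_out_size(after_conv1, 2, stride=2)   # 2x2 average pooling (assumed)
after_conv2 = conv_out_size(after_pool1, kernel_size=5) # second 5x5 convolution
after_pool2 = conv_out_size(after_conv2, 2, stride=2)   # 2x2 average pooling (assumed)

print(after_conv1, after_pool1, after_conv2, after_pool2)  # 28 14 10 5
```

Under these assumptions each channel entering the flattening step is 5x5, so the first linear layer would need `channels * 5 * 5` input features, where `channels` is the number of output channels of your second convolution.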

In [None]:
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super(LeNet5, self).__init__()
        # TODO

    def forward(self, x):
        # TODO
        return x


## Visualization (10 pts)
Visualize your LeNet-5 using TensorBoard or Netron. Make sure each component of your model is visible.

In [None]:
# TODO

# VGGNet (20 pts)
VGGNet, or Visual Geometry Group Network, is a deep convolutional neural network (CNN) architecture introduced in 2014, known for its simplicity and depth. It employs a uniform structure with small 3x3 convolutional kernels throughout its layers, emphasizing the advantages of increased depth in CNNs. In comparison to AlexNet, VGGNet's uniformity and architectural simplicity make it an influential reference model in deep learning, demonstrating the effectiveness of deeper networks and smaller convolutional kernels for image classification tasks.

In this section, you will implement a variant of VGGNet for CIFAR-10/100.

## Implement VGGNet (20 pts)
The classical VGGNet architecture is as follows:


![VGGNet Architecture](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*HzxRI1qHXjiVXla-_NiMBA.png)


It has 13 convolutional layers and 3 fully connected layers. Its input is 224x224x3 because it was designed for ImageNet. Again, inputs from CIFAR-10/100 are 32x32 color images, so you need to modify it. Requirements:
* The model should take inputs of 32x32x3 and output a vector of dimension equal to the number of classes (10 for CIFAR-10 and 100 for CIFAR-100).
* The model should have 10 convolutional layers and 3 fully connected layers:

 (Conv -> ReLU -> Conv -> ReLU -> Max Pool) ->

 (Conv -> ReLU -> Conv -> ReLU -> Max Pool) ->

 (Conv -> ReLU -> Conv -> ReLU -> Conv -> ReLU -> Max Pool) ->

 (Conv -> ReLU -> Conv -> ReLU -> Conv -> ReLU -> Max Pool) ->

 Flattening ->

 (Linear -> ReLU -> Dropout) ->

 (Linear -> ReLU -> Dropout) -> Linear.
* Use 3x3 convolutional filters with padding 1.

**Hint**: you can use `nn.Sequential()` or define your own `make_layer()` function to simplify your implementation.
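
As a quick sanity check on shapes, the sketch below runs a dummy batch through a single illustrative Conv -> ReLU -> Max Pool stage. It is not the required architecture: the channel count of 8 is arbitrary, and the 2x2 max pooling window is an assumption, since the requirements above do not fix the pooling size.

```python
import torch
import torch.nn as nn

# One illustrative stage; the channel count (8) is arbitrary, not a requirement.
stage = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),  # 3x3 conv with padding 1 keeps 32x32
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),                # assumed 2x2 max pooling halves each side
)

x = torch.randn(4, 3, 32, 32)  # dummy batch of 4 CIFAR-sized images
out = stage(x)
print(out.shape)  # torch.Size([4, 8, 16, 16])
```

If every pooling step halves the spatial size in this way, four of them reduce a 32x32 input to 16, 8, 4, and finally 2 pixels per side, which determines the input size of the first linear layer.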

In [None]:
class VGGNet(nn.Module):
    def __init__(self, num_classes=10):
        super(VGGNet, self).__init__()
        # TODO

    def forward(self, x):
        # TODO
        return x

# ResNet (20 pts)
ResNet, short for Residual Network, was introduced in 2015 by Kaiming He et al. At its core, ResNet introduces the concept of residual blocks, which allow gradients to flow directly through the network's many layers. In comparison to earlier architectures like AlexNet, ResNet's approach demonstrates the transformative power of residual connections.

In this section, you will implement ResNet-18 for CIFAR-10/100.

## Implement Residual Block (10 pts)
The Residual Block is a crucial component in ResNet. It works by introducing a shortcut connection, also known as a skip connection, alongside a regular neural network layer. This shortcut connection enables the flow of information directly from one layer to another, bypassing some intermediate layers.

The key idea is to learn a residual function, which represents the difference between the desired output of the block and its input. The block therefore only has to learn the correction that is added to its input, rather than the full mapping. This approach mitigates the vanishing gradient problem, which can occur in very deep networks, making it easier to train deep models effectively.

![Residual Block](https://miro.medium.com/v2/resize:fit:1140/format:webp/1*6WlIo8W1_Qc01hjWdZy-1Q.png)


The weight layer usually consists of a convolutional layer and a batch normalization layer. The batch normalization layer, often abbreviated as BatchNorm, normalizes the input of a neural network layer across a mini-batch of data during training. BatchNorm not only accelerates convergence but also acts as a form of regularization, reducing the risk of overfitting. In PyTorch, it is implemented by nn.BatchNorm2d().
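
The normalization described above can be observed directly. The following is a minimal sketch on random dummy activations (the tensor shape and the offset/scale values are arbitrary choices for illustration): in training mode, `nn.BatchNorm2d` computes one mean and variance per channel over the batch and spatial dimensions, so each output channel ends up with mean close to 0 and variance close to 1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(num_features=3)  # one mean/variance pair per channel
bn.train()                           # use batch statistics, as during training

x = 5.0 + 2.0 * torch.randn(16, 3, 8, 8)  # dummy activations with mean ~5, std ~2
y = bn(x)

# Statistics are taken over the batch and spatial dimensions, per channel.
channel_mean = y.mean(dim=(0, 2, 3))
channel_var = y.var(dim=(0, 2, 3), unbiased=False)
print(channel_mean)  # each entry close to 0
print(channel_var)   # each entry close to 1
```

In evaluation mode (`bn.eval()`), the layer uses running averages collected during training instead of the statistics of the current batch.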

You are asked to implement the residual block with the following requirements:
* The residual block takes an input of size n * n * `in_channels` and outputs m * m * `out_channels`, with m = floor((n-1) / `stride`) + 1.
* The residual function consists of the following components:

  Conv -> BatchNorm -> ReLU -> Conv -> BatchNorm

  where Conv means 3x3 convolutional filters with padding 1. If `stride` != 1, set stride for the first Conv.
* The shortcut should be identity if `in_channels` == `out_channels` and `stride` == 1. Otherwise, it should be a convolutional layer with kernel_size=1 and stride=`stride`.
* After adding the residual function and the shortcut, apply another ReLU activation.
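
The output-size formula in the first requirement can be checked with a few lines of arithmetic. This sketch only verifies shapes and is not a block implementation. `conv_out_size` is a hypothetical helper that applies the standard convolution size formula, and the input sizes in the loop are chosen for illustration.

```python
def conv_out_size(n, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution (hypothetical helper)."""
    return (n + 2 * padding - kernel_size) // stride + 1


results = []
for n, stride in [(32, 1), (32, 2), (16, 2), (8, 2)]:
    residual = conv_out_size(n, 3, stride=stride, padding=1)  # first 3x3 conv carries the stride
    residual = conv_out_size(residual, 3, stride=1, padding=1)  # second 3x3 conv keeps the size
    shortcut = conv_out_size(n, 1, stride=stride, padding=0)  # 1x1 conv shortcut
    formula = (n - 1) // stride + 1                           # m from the requirement above
    assert residual == shortcut == formula
    results.append((n, stride, formula))

print(results)  # [(32, 1, 32), (32, 2, 16), (16, 2, 8), (8, 2, 4)]
```

Both paths produce the same spatial size, which is what allows the residual function and the shortcut to be added element-wise.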

In [None]:
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        # TODO

    def forward(self, x):
        # TODO
        return x

## Implement ResNet-18 (10 pts)
ResNet-18 is part of the ResNet family, known for its exceptional depth and performance in image classification tasks. It consists of 18 layers, beginning with one convolutional layer, followed by eight residual blocks with two convolutional layers each, and ending with a fully connected layer. Here is a glimpse of its architecture:


![ResNet-18](https://www.researchgate.net/profile/Sajid-Iqbal-13/publication/336642248/figure/fig1/AS:839151377203201@1577080687133/Original-ResNet-18-Architecture.png)


You are asked to implement ResNet-18 for CIFAR-10/100. Requirements:
* The model should take inputs of 32x32x3 and output a vector of dimension equal to the number of classes (10 for CIFAR-10 and 100 for CIFAR-100).
* The model should begin with a convolutional layer with kernel_size=3 and padding=1:

  Conv -> BatchNorm -> ReLU

  The output size should be 32x32x64.
* After the first layer, append 8 residual blocks such that the output size changes as follows:
  
  32x32x64 -> 32x32x64 -> 32x32x64 -> 16x16x128 -> 16x16x128 -> 8x8x256 -> 8x8x256 -> 4x4x512 -> 4x4x512
* The model should end with average pooling (kernel_size=4), flattening, and a fully-connected layer.
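
The final requirement can be illustrated on a dummy tensor. The sketch below is not the model itself: `features` is a random stand-in for the 4x4x512 output of the last residual block, with an arbitrary batch size of 2.

```python
import torch
import torch.nn as nn

features = torch.randn(2, 512, 4, 4)             # stand-in for the last block's output
pooled = nn.AvgPool2d(kernel_size=4)(features)   # averages each 4x4 map to 1x1
flat = torch.flatten(pooled, start_dim=1)        # keeps the batch dimension

print(pooled.shape)  # torch.Size([2, 512, 1, 1])
print(flat.shape)    # torch.Size([2, 512])

# Layer count: 1 initial conv + 8 blocks x 2 convs + 1 fully connected layer.
num_weight_layers = 1 + 8 * 2 + 1
print(num_weight_layers)  # 18
```

The flattened vector has 512 features per image, so the fully connected layer maps 512 inputs to the number of classes.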


In [None]:
class ResNet18(nn.Module):
    def __init__(self, num_classes=10):
        super(ResNet18, self).__init__()
        # TODO

    def forward(self, x):
        # TODO
        return x

# Training Neural Networks (40 pts)
In this section, you will implement a `Trainer` class, use it to train the models that you defined previously, and evaluate them.

## Check CUDA and GPUs
The following code helps you check if CUDA is available and lists the available GPUs.

In [None]:
import torch
# Check if CUDA is available
if torch.cuda.is_available():
    # Get the number of available GPUs
    num_gpus = torch.cuda.device_count()
    print(f"Number of available GPUs: {num_gpus}")

    # Get the name of each GPU
    for i in range(num_gpus):
        gpu_name = torch.cuda.get_device_name(i)
        print(f"GPU {i}: {gpu_name}")

    # Set the current GPU device
    device = torch.cuda.current_device()
    print(f"Current GPU device: {device} - {torch.cuda.get_device_name(device)}")
else:
    print("CUDA is not available.")

## Complete the Trainer Class (15 pts)
Fill in all the TODOs.

In [None]:
import torch
import torch.nn as nn


class Trainer:
    def __init__(self, dataset, net, optimizer, loss_function=nn.CrossEntropyLoss(),
                 device="cuda:0" if torch.cuda.is_available() else "cpu"):
        self.dataset = dataset
        self.net = net.to(device)
        self.lossFunction = loss_function
        self.optimizer = optimizer
        self.device = device

    def train_one_epoch(self):
        # TODO (5 pts): complete training loop
        pass

    def compute_test_accuracy(self):
        # TODO (5 pts): compute classification accuracy based on test data
        pass

    def train(self, num_epochs=20):
        for epoch in range(num_epochs):
            self.train_one_epoch()
            # TODO (5 pts): print the loss every epoch and the test accuracy every 5 epochs
            # Feel free to record the training process for analysis

## Training (5 pts)
Follow these steps:
* Create the model, the dataset, and the optimizer.
* Configure the trainer.
* Compute and print test accuracy before training.
* Train the model.
* Compute and print test accuracy after training.

In [None]:
# TODO

## Evaluation using Confusion Matrix (5 pts)
A confusion matrix is a fundamental tool for evaluating the performance of classification models. Each row of the matrix represents the instances in an actual class while each column represents the instances in a predicted class.

You are asked to evaluate your trained model by computing and printing the confusion matrix. You can either compute it yourself or use `sklearn.metrics.confusion_matrix()`.
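
To make the row/column convention concrete, here is a minimal sketch on toy labels with three classes. The function name and the label tensors are made up for illustration, and this is one possible way to count the pairs, not a required approach. Entry `[i, j]` counts the samples whose actual class is `i` and whose predicted class is `j`.

```python
import torch


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are actual classes, columns are predicted classes (illustrative helper)."""
    index = y_true * num_classes + y_pred  # unique id for each (actual, predicted) pair
    counts = torch.bincount(index, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


y_true = torch.tensor([0, 0, 1, 2, 2, 2])  # toy ground-truth labels
y_pred = torch.tensor([0, 1, 1, 2, 0, 2])  # toy predictions
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
# tensor([[1, 1, 0],
#         [0, 1, 0],
#         [1, 0, 2]])
```

The diagonal holds the correctly classified samples, so the overall accuracy here is `cm.diag().sum() / cm.sum()`, that is, 4 out of 6.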

In [None]:
# TODO

## Observations (15 pts)
Write down your observations regarding the results you obtained throughout this assignment. Here are some suggestions:
* **Accuracy and Loss Curves**: Plot and compare the training and validation accuracy and loss curves for each model. This helps visualize how well each model is learning over time and whether they are overfitting or underfitting.
* **Top Misclassified Images**: Examine the classes that are most frequently misclassified by each model. This can provide insights into the types of images that are challenging for each model and may suggest areas for improvement.
* **Feature Visualization**: Visualize the feature maps or activations of intermediate layers in each CNN. This can help you understand what features or patterns each model is learning and whether they differ in terms of learned representations.
* **Robustness Testing**: Assess the robustness of each model by introducing noise, transformations, or adversarial examples to the test data. This can help identify which models are more resilient to perturbations.
* **Runtime and Resource Usage**: Compare the training time and resource usage (e.g., GPU memory) of each model.
* **Hyperparameter Tuning**: Analyze the impact of hyperparameters (learning rates, batch sizes, etc.) on training speed and convergence.
* **Model Size and Efficiency**: Analyze the trade-off between model size and accuracy for each model.
* **Ablation Studies**: Conduct ablation studies by removing or modifying specific components (e.g., dropout, batch normalization, etc.) of each model to understand their contributions to performance.

You don't need to follow them. Feel free to write down any observations you have, or to use tools like TensorBoard to support them. You are also welcome to comment on the design of the assignment.
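
For the model size comparison, one common way to measure size is to count trainable parameters. The sketch below uses a toy two-layer network so the count can be verified by hand. `count_parameters` is a hypothetical helper name, and the same call should work on any `nn.Module`.

```python
import torch.nn as nn


def count_parameters(model):
    """Number of trainable parameters in a module (hypothetical helper)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Toy model: (10*5 weights + 5 biases) + (5*2 weights + 2 biases) = 67
toy = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))
print(count_parameters(toy))  # 67
```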

## **TODO: write down your observations**