# CNNs in PyTorch

In this assignment, you'll implement some Convolutional Neural Networks (CNNs) in PyTorch.

## Setting up

We'll start by importing the following:
- [`torch`](https://pytorch.org/docs/stable/torch.html) - the core PyTorch library.
- [`torch.nn`](https://pytorch.org/docs/stable/nn.html) - a module containing building blocks for NNs such as linear layers, convolutional layers, and so on.
- [`torch.nn.functional`](https://pytorch.org/docs/stable/nn.functional.html) - a module containing activation functions, loss functions, and so on.
- [`torch.optim`](https://pytorch.org/docs/stable/optim.html) - a module containing optimizers which update the parameters of a NN.
- [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) in [`torch.utils.data`](https://pytorch.org/docs/stable/data.html) - batches data together, iterates over the batches, shuffles data, and can load batches in parallel worker processes to speed up training.
- [`MNIST`](https://pytorch.org/vision/stable/generated/torchvision.datasets.MNIST.html) in [`torchvision.datasets`](https://pytorch.org/vision/stable/datasets.html) - the [MNIST dataset](https://en.wikipedia.org/wiki/MNIST_database) is a collection of images of handwritten digits.
- [`ToTensor`](https://pytorch.org/vision/stable/generated/torchvision.transforms.ToTensor.html#torchvision.transforms.ToTensor) in [`torchvision.transforms`](https://pytorch.org/vision/0.9/transforms.html) - converts PIL images or NumPy arrays to PyTorch tensors.

In [1]:
# imports
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader

from torchvision.datasets import CIFAR10, MNIST
from torchvision.transforms.v2 import ToTensor
from torchvision import datasets, transforms

## Data

Let's define a transformation for the [CIFAR10 dataset](https://en.wikipedia.org/wiki/CIFAR-10).

We'll first cast the images to PyTorch tensors using [`transforms.ToTensor()`](https://pytorch.org/vision/master/generated/torchvision.transforms.ToTensor.html). These tensors are automatically normalized such that their values are between 0 and 1.

Then, we'll re-normalize the pixel values with [`transforms.Normalize()`](https://pytorch.org/vision/main/generated/torchvision.transforms.Normalize.html), which computes `(x - mean) / std` for each channel. We use 0.5 for both the mean and the standard deviation of every channel, which maps values from [0, 1] to [-1, 1]. If each channel really had a mean and standard deviation of 0.5, the result would have zero mean and unit variance; that is only a rough approximation, but not an unreasonable one. Rescaling inputs to lie in (or close to) the range [-1, 1] is also a fairly standard step, since optimization tends to converge more easily on inputs in that range.
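
To make the arithmetic concrete, here is a minimal plain-Python sketch of the per-value operation that `Normalize` applies, `(x - mean) / std`. The helper name `normalize_value` is hypothetical and not part of torchvision; it only mirrors the formula for a single number.

```python
def normalize_value(x, mean=0.5, std=0.5):
    """Mirror the per-value arithmetic of Normalize: (x - mean) / std."""
    return (x - mean) / std

# ToTensor() yields values in [0, 1]; see where the endpoints and midpoint land.
for x in (0.0, 0.5, 1.0):
    print(x, "->", normalize_value(x))
# 0.0 -> -1.0
# 0.5 -> 0.0
# 1.0 -> 1.0
```

With a mean and standard deviation of 0.5, the smallest possible pixel value maps to -1 and the largest to 1.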

In [2]:
# CIFAR-10 transform - three channels, normalize with 3 means and 3 SDs
cifar_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])

# MNIST transform - single channel, so only 1 mean and 1 SD
mnist_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1308,), (0.3016,))
])

Let's load up the CIFAR-10 dataset. You can specify the split you want using `train=True|False`. `root` is the directory where the dataset will be saved. You can also directly apply the transform from the previous cell by specifying `transform`.

In [3]:
# CIFAR10 data
train_dataset = CIFAR10(
    root='./data',
    train=True,
    download=True,
    transform=cifar_transform
)

val_dataset = CIFAR10(
    root='./data',
    train=False,
    download=True,
    transform=cifar_transform
)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:03<00:00, 46767535.02it/s]


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified


Let's define `DataLoader` objects for the CIFAR10 data now.

We'll use a (mini) batch size of 32. It's common to use powers of 2 in deep learning because it's more efficient to handle such numbers on hardware.

We'll define separate `DataLoader` objects to handle our training and test splits to avoid data leakage (training on the test set or testing on the train set).

We'll also have the `DataLoader` objects shuffle our data whenever we iterate over them (`shuffle=True`). Shuffling data at each epoch is beneficial in that the model won't be optimized in a way that depends on a specific ordering of the data.

Finally, we'll parallelize the loading of the data using 4 CPU processes to load data (`num_workers=4`).

In [4]:
# dataloaders
batch_size = 32

train_loader = DataLoader(
    train_dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=4
)

val_loader = DataLoader(
    val_dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=4
)
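
As a quick sanity check on the batch size, the sketch below computes how many batches each loader should yield per epoch, assuming the standard CIFAR-10 split of 50,000 training and 10,000 test images and the `DataLoader` default of keeping the final, smaller batch (`drop_last=False`). The helper `num_batches` is a hypothetical name used only for illustration.

```python
import math

def num_batches(num_samples, batch_size, drop_last=False):
    """Number of batches a DataLoader-style iterator yields per epoch."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)

print(num_batches(50_000, 32))  # 1563 training batches
print(num_batches(10_000, 32))  # 313 validation batches
print(50_000 % 32)              # 16 images in the final, smaller training batch
```

These counts are what `len(train_loader)` and `len(val_loader)` should report with a batch size of 32.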



## Defining and training CNNs

We'll define `criterion` to be [`nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html), a common loss function used to train classification models.

We'll also define a Stochastic Gradient Descent optimizer ([`optim.SGD`](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html)), which will optimize the parameters of `net`. We'll set two hyperparameters manually: the learning rate (`lr=0.001`) and the momentum (`momentum=0.9`).

In [5]:
# Loss function and optimizer
def get_crit_and_opt(net):
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
    return criterion, optimizer
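
To illustrate what the two hyperparameters do, here is a minimal scalar sketch of one SGD-with-momentum update in the form PyTorch uses when dampening and Nesterov momentum are left at their defaults: the velocity accumulates gradients, and the parameter moves against the velocity. The function `sgd_momentum_step` and the toy objective `f(w) = w**2` are assumptions made for this example only.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One scalar SGD step with momentum (no dampening, no Nesterov)."""
    velocity = momentum * velocity + grad  # accumulate past gradients
    param = param - lr * velocity          # step against the velocity
    return param, velocity

# Toy objective f(w) = w**2, whose gradient is 2 * w.
w, v = 1.0, 0.0
for step in range(3):
    w, v = sgd_momentum_step(w, 2 * w, v)
    print(step, w, v)
```

Because the velocity keeps a decaying sum of earlier gradients, consecutive steps in the same direction grow larger than plain SGD steps with the same learning rate.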

Let's see how [LeNet5](https://ieeexplore.ieee.org/document/726791) (LeCun et al., 1998) is implemented. The architecture looks something like this:

![](https://drive.google.com/uc?export=view&id=1PwYfmSXqBnosIQi-ewrr03Ibd_lmRtea)

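LeNet5 was designed for handwritten-digit images like those in MNIST. Note that the original network takes 32x32 inputs (MNIST's 28x28 digits padded by 2 pixels on each side), which is what the `16 * 5 * 5` input size of `fc1` below assumes; unpadded 28x28 images would produce `16 * 4 * 4` features instead. Let's see how to implement the architecture in PyTorch: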

In [6]:
class LeNet(nn.Module):

    def __init__(self):
        super(LeNet, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        self.conv1 = nn.Conv2d(1, 6, 5)
        # 6 input channels to 16 output channels with 5x5 convolution
        self.conv2 = nn.Conv2d(6, 16, 5)
        # affine operations: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120) # 16 channels each of size 5x5 to 1x120 vector
        self.fc2 = nn.Linear(120, 84) # 1x120 vector to 1x84 vector
        self.fc3 = nn.Linear(84, 10) # 1x84 vector to 1x10 vector

    def forward(self, x):
        # average pooling over a 2x2 window
        x = F.avg_pool2d(F.sigmoid(self.conv1(x)), (2, 2))
        # If the size is a square, you can specify with a single number
        x = F.avg_pool2d(F.sigmoid(self.conv2(x)), 2)
        x = torch.flatten(x, 1) # flatten all dimensions except the batch dimension
        x = F.sigmoid(self.fc1(x)) # linear + sig activation
        x = F.sigmoid(self.fc2(x)) # linear + sig activation
        x = self.fc3(x) # linear
        return x
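
The `16 * 5 * 5` input size of `fc1` can be checked by hand with the usual output-size formula for convolution and pooling windows, `floor((n + 2p - k) / s) + 1`. The sketch below traces the spatial size through the layers above; `conv_out` and `lenet_flat_features` are hypothetical helper names, and the trace assumes square inputs.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square conv/pool window."""
    return (size + 2 * padding - kernel) // stride + 1

def lenet_flat_features(side):
    """Number of features entering fc1 for a side x side input image."""
    side = conv_out(side, kernel=5)            # conv1: 5x5, no padding
    side = conv_out(side, kernel=2, stride=2)  # 2x2 average pooling
    side = conv_out(side, kernel=5)            # conv2: 5x5, no padding
    side = conv_out(side, kernel=2, stride=2)  # 2x2 average pooling
    return 16 * side * side                    # 16 output channels from conv2

print(lenet_flat_features(32))  # 400 = 16 * 5 * 5
print(lenet_flat_features(28))  # 256 = 16 * 4 * 4
```

A 32x32 input gives the 400 features that `fc1` expects, while a 28x28 input gives 256, so unpadded MNIST images would need either padding or a smaller `fc1`.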

In general, a PyTorch neural network definition must:
- subclass [`nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html)
- call `super().__init__()` in the constructor (`__init__()`) method
- define the trainable parameters/layers (convolutions, linears, poolings, etc.) in the constructor
- define what should happen to the inputs in the `forward()` method

In [7]:
# Create your own model for the MNIST data here [20 pts]:
class MNISTNet(nn.Module):
    def __init__(self):
        super(MNISTNet, self).__init__()
        # First convolutional layer taking 1 input channel (grayscale image) and producing 32 output channels
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2)
        # Second convolutional layer taking 32 input channels and producing 64 output channels
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2)
        # First fully connected layer: flattening the output of the conv2 to feed into this layer
        self.fc1 = nn.Linear(7*7*64, 1024)
        # Second fully connected layer that outputs 10 classes for MNIST digits (0-9)
        self.fc2 = nn.Linear(1024, 10)

    def forward(self, x):
        # First convolution, then 2x2 max pooling, then ReLU
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        # Second convolution, then 2x2 max pooling, then ReLU
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        # Flatten the tensor for feeding into fully connected layers
        x = x.view(-1, 7*7*64)
        # Apply ReLU activation function after first fully connected layer
        x = F.relu(self.fc1(x))
        # Output layer; log softmax turns the 10 class scores into log-probabilities
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
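
Since `nn.CrossEntropyLoss` applies a log-softmax internally, a model that already returns `F.log_softmax(...)` has it applied twice. The plain-Python sketch below suggests why this is redundant rather than harmful: log-softmax of log-probabilities returns the same values, so the loss is unchanged. The helpers `log_softmax` and `cross_entropy` here are simplified single-example stand-ins written for this illustration, not the PyTorch functions.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax of a list of scores."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]

def cross_entropy(scores, target):
    """Cross-entropy for one example: -log_softmax(scores)[target]."""
    return -log_softmax(scores)[target]

logits = [2.0, 0.5, -1.0]
log_probs = log_softmax(logits)

# Applying log-softmax a second time leaves the log-probabilities unchanged...
twice = log_softmax(log_probs)
print(all(abs(a - b) < 1e-12 for a, b in zip(log_probs, twice)))  # True

# ...so the loss is the same whether the model returns logits or log-probabilities.
print(abs(cross_entropy(logits, 0) - cross_entropy(log_probs, 0)) < 1e-12)  # True
```

A common alternative is to return the raw scores from `forward()` when training with `nn.CrossEntropyLoss`, or to keep the log-softmax and pair it with `nn.NLLLoss`.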


In [8]:
# Create your own model for the CIFAR10 data here [20 pts]:
class CIFAR10Net(nn.Module):
    def __init__(self):
        super(CIFAR10Net, self).__init__()
        # First convolutional layer with 3 input channels (RGB image) and 64 output channels
        self.conv1 = nn.Conv2d(3, 64, kernel_size=5, stride=1, padding=2)
        # Second convolutional layer with 64 input channels and 128 output channels
        self.conv2 = nn.Conv2d(64, 128, kernel_size=5, stride=1, padding=2)
        # Adding a third convolutional layer
        self.conv3 = nn.Conv2d(128, 256, kernel_size=5, stride=1, padding=2)
        # Adding a fourth convolutional layer
        self.conv4 = nn.Conv2d(256, 512, kernel_size=5, stride=1, padding=2)


        # Max pooling layer that will be used after each convolutional layer
        #self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        # Overlapping pooling
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # Dummy forward pass to determine the size of the feature map before the first fully connected layer
        with torch.no_grad():
            dummy_input = torch.zeros(1, 3, 32, 32)  # CIFAR10 images are 3x32x32
            dummy_output = self.pool(self.conv4(self.pool(self.conv3(self.pool(self.conv2(self.pool(self.conv1(dummy_input))))))))
        self.final_feature_map_size = dummy_output.size(-1) * dummy_output.size(-2) * dummy_output.size(-3)

        # First fully connected layer: the flattened output of conv4 (after pooling) feeds into this layer
        self.fc1 = nn.Linear(self.final_feature_map_size, 1024)
        # Second fully connected layer
        self.fc2 = nn.Linear(1024, 512)
        # Third fully connected layer that outputs 10 classes for CIFAR10
        self.fc3 = nn.Linear(512, 10)

    def forward(self, x):
        # First convolution, then max pooling, then ReLU
        x = F.relu(self.pool(self.conv1(x)))
        # Second convolution, then max pooling, then ReLU
        x = F.relu(self.pool(self.conv2(x)))
        # Third and fourth convolutions, each followed by max pooling and ReLU
        x = F.relu(self.pool(self.conv3(x)))
        x = F.relu(self.pool(self.conv4(x)))

        # Flatten the tensor for feeding into fully connected layers
        x = x.view(-1, self.final_feature_map_size)
        # Apply ReLU activation function after first and second fully connected layers
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # Output layer; log softmax turns the 10 class scores into log-probabilities
        x = self.fc3(x)
        return F.log_softmax(x, dim=1)
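
The flattened feature sizes used by both models can be cross-checked without running a forward pass. This sketch applies the output-size formula `floor((n + 2p - k) / s) + 1` to each layer as a `(kernel, stride, padding)` triple; the helpers `out_size` and `trace` are hypothetical names, and the layer lists are transcribed from the two class definitions above.

```python
def out_size(size, kernel, stride, padding):
    """Output side length of a square conv/pool window."""
    return (size + 2 * padding - kernel) // stride + 1

def trace(side, layers):
    """Push a side length through a list of (kernel, stride, padding) layers."""
    for kernel, stride, padding in layers:
        side = out_size(side, kernel, stride, padding)
    return side

conv = (5, 1, 2)  # 5x5 convolution with padding 2 keeps the spatial size

# MNISTNet: two blocks of conv + 2x2 max pooling on 28x28 inputs.
mnist_side = trace(28, [conv, (2, 2, 0)] * 2)
# CIFAR10Net: four blocks of conv + overlapping 3x3/stride-2 pooling on 32x32 inputs.
cifar_side = trace(32, [conv, (3, 2, 1)] * 4)

print(mnist_side, 64 * mnist_side ** 2)   # 7 3136
print(cifar_side, 512 * cifar_side ** 2)  # 2 2048
```

Under these assumptions, `7*7*64` matches the hard-coded size in `MNISTNet`, and the dummy forward pass in `CIFAR10Net` should yield 2048 features.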


Below is a useful object for tracking losses and performance during training and validation.

In [9]:
class AverageMeter(object):

    """Computes and stores an average and current value."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count
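
The `n` argument matters when batches have different sizes, as happens with the final batch of an epoch. The sketch below repeats a compact version of the class so that it runs on its own, then compares the weighted average with a naive mean of the per-batch values; the loss values and batch sizes are made up for illustration.

```python
class AverageMeter:
    """Compact copy of the meter above, repeated so this snippet runs alone."""

    def __init__(self):
        self.val = self.avg = self.sum = self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

meter = AverageMeter()
meter.update(2.0, n=32)  # mean loss over a full batch of 32 items
meter.update(1.0, n=16)  # mean loss over a final batch of 16 items

print(meter.val)        # 1.0, the most recent value
print(meter.avg)        # (2.0 * 32 + 1.0 * 16) / 48, about 1.667
print((2.0 + 1.0) / 2)  # 1.5, the unweighted mean of the two batch values
```

Weighting by `n` makes `avg` the per-item average, so a small final batch does not count as much as a full one.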

We'll define a metric that flexibly computes top-k error rates, i.e. the percentage of examples whose true label is not among the model's k highest-scoring classes.

In [10]:
def error_rate(output, target, topk=(1,)):

    """Computes the top-k error rate for the specified values of k."""

    maxk = max(topk) # largest k we'll need to work with
    batch_size = target.size(0) # determine batch size

    # get maxk best predictions for each item in the batch, both values and indices
    _, pred = output.topk(maxk, 1, True, True)

    # reshape predictions and targets and compare them element-wise
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    res = []
    for k in topk: # for each top-k error rate we want

        # num correct
        correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
        # num incorrect
        wrong_k = batch_size - correct_k
        # as a percentage
        res.append(wrong_k.mul_(100.0 / batch_size))

    return res
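
To see what the metric measures, here is a small plain-Python re-implementation of the same idea on a hand-checkable example. It is a sketch for illustration only: `topk_error` is a hypothetical helper that works on nested lists rather than tensors, and the scores and targets are invented.

```python
def topk_error(scores, targets, k):
    """Percentage of examples whose target is NOT among the k highest scores."""
    wrong = 0
    for row, target in zip(scores, targets):
        # class indices sorted from highest to lowest score, truncated to k
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if target not in topk:
            wrong += 1
    return 100.0 * wrong / len(targets)

scores = [
    [0.1, 0.7, 0.2],  # top-1 is class 1, top-2 is {1, 2}
    [0.5, 0.3, 0.2],  # top-1 is class 0, top-2 is {0, 1}
    [0.2, 0.3, 0.5],  # top-1 is class 2, top-2 is {2, 1}
    [0.6, 0.3, 0.1],  # top-1 is class 0, top-2 is {0, 1}
]
targets = [1, 1, 0, 2]

print(topk_error(scores, targets, k=1))  # 75.0
print(topk_error(scores, targets, k=2))  # 50.0
```

Only the first example is right at k=1, while the second is also counted as correct at k=2, so the error rate can only stay the same or fall as k grows.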

If you connect to a runtime with a T4 available, this line will ensure computations that can be done on the T4 are done there.

In [11]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

The training function below takes the training set's `DataLoader`, the model we are training, the loss function we are using, and the optimizer for this model.

It then trains the model on the data for 1 epoch.

In [12]:
# training function - 1 epoch
def train(
    train_loader,
    model,
    criterion,
    optimizer,
    epoch,
    epochs,
    print_freq = 100,
    verbose = True
):

    # track the most recent and average loss
    losses = AverageMeter()

    # set training mode
    model.train()

    # iterate over data - automatically shuffled
    for i, (images, labels) in enumerate(train_loader):

        # put batch of image tensors on GPU
        images = images.to(device)
        # put batch of label tensors on GPU
        labels = labels.to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # model output
        outputs = model(images)

        # loss computation
        loss = criterion(outputs, labels)

        # back propagation
        loss.backward()

        # update model parameters
        optimizer.step()

        # update meter with the value of the loss once for each item in the batch
        losses.update(loss.item(), images.size(0))

        # logging during epoch
        if i % print_freq == 0 and verbose:
            print(
                f'Epoch: [{epoch+1}/{epochs}][{i:4}/{len(train_loader)}]\t'
                f'Loss: {losses.val:.4f} ({losses.avg:.4f} on avg)'
            )

    # log again at end of epoch
    print(f'\n* Epoch: [{epoch+1}/{epochs}]\tTrain loss: {losses.avg:.3f}\n')

    return losses.avg

In [13]:
# val function - gradients are not needed for evaluation, so autograd tracking is disabled
@torch.no_grad()
def validate(
    val_loader,
    model,
    criterion,
    epoch,
    epochs,
    print_freq = 100,
    verbose = True
):

    # track the most recent and average values of the loss and of the batch-wise top-1 and top-5 error rates
    losses = AverageMeter()
    top1 = AverageMeter()
    top5 = AverageMeter()

    # set evaluation mode
    model.eval()

    # iterate over data - automatically shuffled
    for i, (images, labels) in enumerate(val_loader):

        # put batch of image tensors on GPU
        images = images.to(device)
        # put batch of label tensors on GPU
        labels = labels.to(device)

        # model output
        output = model(images)

        # loss computation
        loss = criterion(output, labels)

        # top-1 and top-5 error rates on this batch
        err1, err5 = error_rate(output, labels, topk=(1, 5))

        # update meters with the value of the loss once for each item in the batch
        losses.update(loss.item(), images.size(0))
        # update meters with top-1 and top-5 error rates on this batch once for each item in the batch
        top1.update(err1.item(), images.size(0))
        top5.update(err5.item(), images.size(0))

        # logging during epoch
        if i % print_freq == 0 and verbose:
            print(
                f'Test (on val set): [{epoch+1}/{epochs}][{i:4}/{len(val_loader)}]\t'
                f'Loss: {losses.val:.4f} ({losses.avg:.4f} on avg)\t'
                f'Top-1 err: {top1.val:.4f} ({top1.avg:.4f} on avg)\t'
                f'Top-5 err: {top5.val:.4f} ({top5.avg:.4f} on avg)'
            )

    # logging for end of epoch
    print(
        f'\n* Epoch: [{epoch+1}/{epochs}]\t'
        f'Test loss: {losses.avg:.3f}\t'
        f'Top-1 err: {top1.avg:.3f}\t'
        f'Top-5 err: {top5.avg:.3f}\n'
    )

    # average top-1 and top-5 error rates and average loss over the whole validation set
    return top1.avg, top5.avg, losses.avg

In [14]:
# best error rates so far
best_err1 = 100
best_err5 = 100

In [15]:
# Run the main function.
if __name__ == '__main__':

    # select a model to train here (CIFAR10Net or MNISTNet)
    model = CIFAR10Net()

    # move to GPU
    model.to(device)

    # select number of epochs
    epochs = 10

    # get criterion and optimizer
    criterion, optimizer = get_crit_and_opt(model)

    # epoch loop
    for epoch in range(0, epochs):

        # train for one epoch
        train_loss = train(
          train_loader,
          model,
          criterion,
          optimizer,
          epoch,
          epochs
        )

        # evaluate on validation set
        err1, err5, val_loss = validate(
          val_loader,
          model,
          criterion,
          epoch,
          epochs
        )

        # remember the best top-1 error so far, along with the top-5 error from the same epoch
        is_best = err1 <= best_err1
        best_err1 = min(err1, best_err1)
        if is_best:
            best_err5 = err5

        print('Current best error rate (top-1 and top-5 error):', best_err1, best_err5, '\n')
    print('Best error rate (top-1 and top-5 error):', best_err1, best_err5)

Epoch: [1/10][   0/1563]	Loss: 2.3060 (2.3060 on avg)
Epoch: [1/10][ 100/1563]	Loss: 2.3057 (2.3020 on avg)
Epoch: [1/10][ 200/1563]	Loss: 2.3058 (2.3023 on avg)
Epoch: [1/10][ 300/1563]	Loss: 2.3025 (2.3020 on avg)
Epoch: [1/10][ 400/1563]	Loss: 2.2998 (2.3016 on avg)
Epoch: [1/10][ 500/1563]	Loss: 2.3023 (2.3012 on avg)
Epoch: [1/10][ 600/1563]	Loss: 2.2948 (2.3006 on avg)
Epoch: [1/10][ 700/1563]	Loss: 2.2985 (2.2999 on avg)
Epoch: [1/10][ 800/1563]	Loss: 2.2892 (2.2989 on avg)
Epoch: [1/10][ 900/1563]	Loss: 2.2742 (2.2973 on avg)
Epoch: [1/10][1000/1563]	Loss: 2.2543 (2.2944 on avg)
Epoch: [1/10][1100/1563]	Loss: 2.1502 (2.2865 on avg)
Epoch: [1/10][1200/1563]	Loss: 2.0576 (2.2707 on avg)
Epoch: [1/10][1300/1563]	Loss: 1.9634 (2.2526 on avg)
Epoch: [1/10][1400/1563]	Loss: 1.9403 (2.2350 on avg)
Epoch: [1/10][1500/1563]	Loss: 2.0529 (2.2175 on avg)

* Epoch: [1/10]	Train loss: 2.208

Test (on val set): [1/10][   0/313]	Loss: 2.0678 (2.0678 on avg)	Top-1 err: 78.1250 (78.1250 on avg)

In [16]:
# Create a classification report for one model [10 pts]
from sklearn.metrics import classification_report
import numpy as np


y_true = []
y_pred = []

# Set the model to evaluation mode
model.eval()

# Disable gradient calculation for efficiency
with torch.no_grad():
    for images, labels in val_loader:
        # Move tensors to the appropriate device
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass to get the model's predictions
        outputs = model(images)

        # Convert outputs to predicted class by taking the index with the maximum score in each output row
        _, predicted = torch.max(outputs, 1)

        # Append true and predicted labels to lists
        y_true.extend(labels.cpu().numpy())
        y_pred.extend(predicted.cpu().numpy())

# Convert lists to arrays for compatibility with classification_report
y_true = np.array(y_true)
y_pred = np.array(y_pred)

#___________________________________________________________________________________________________________________________

# CIFAR10 class names, listed in label-index order (0-9)
target_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                'dog', 'frog', 'horse', 'ship', 'truck']

# Generate the classification report
report = classification_report(y_true, y_pred, target_names=target_names)

print(report)



              precision    recall  f1-score   support

    airplane       0.81      0.77      0.79      1000
  automobile       0.92      0.84      0.88      1000
        bird       0.51      0.83      0.63      1000
         cat       0.55      0.66      0.60      1000
        deer       0.77      0.69      0.73      1000
         dog       0.90      0.42      0.57      1000
        frog       0.74      0.88      0.80      1000
       horse       0.91      0.72      0.80      1000
        ship       0.86      0.89      0.87      1000
       truck       0.87      0.84      0.86      1000

    accuracy                           0.75     10000
   macro avg       0.79      0.75      0.75     10000
weighted avg       0.79      0.75      0.75     10000
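
The columns of the report are related by simple formulas, which can be checked against the rounded values in the table. The sketch below recomputes the F1 score (the harmonic mean of precision and recall) for two rows and the macro-averaged recall; `f1_from` is a hypothetical helper name, and because the inputs are already rounded to two decimals, the results are only expected to agree after rounding.

```python
def f1_from(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Rows with (precision, recall) of (0.90, 0.42) and (0.51, 0.83) in the table above.
print(round(f1_from(0.90, 0.42), 2))  # 0.57
print(round(f1_from(0.51, 0.83), 2))  # 0.63

# Macro-averaged recall is the plain mean of the ten per-class recalls.
recalls = [0.77, 0.84, 0.83, 0.66, 0.69, 0.42, 0.88, 0.72, 0.89, 0.84]
print(round(sum(recalls) / len(recalls), 2))  # 0.75
```

With 1000 examples per class, the macro-averaged recall coincides with the overall accuracy, which is why both appear as 0.75.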



Experiment Results:

*   Step 1: The first trial gave a poor result of 58%:
Top-1 err: 58.000	Top-5 err: 29.390
*   Step 2: I added 5 layers, and the second trial gave 77%:
Top-1 err: 77.000	Top-5 err: 26.390
*   Step 3: Adding overlapping pooling gave me:
Top-1 err: 69.000	Top-5 err: 13.390
*   Step 4: Adding 10 epochs.









