**Welcome to Assignment 3 on Deep Learning for Computer Vision.**

This assignment is based on the content you learned in Week-5 of course. 


#### **Instructions**
1. Use Python 3.x to run this notebook
2. Write your code only in between the lines 'YOUR CODE STARTS HERE' and 'YOUR CODE ENDS HERE'.
you should not change anything else in the code cells, if you do, the answers you are supposed to get at the end of this assignment might be wrong.
3. Read documentation of each function carefully.
4. All the Best!

##MNIST classification using CNN

In [1]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
import torch.nn.functional as F
import timeit
import unittest

## Please DONOT remove these lines. 
torch.manual_seed(2022)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(2022)

### Data Loading and Pre-processing

In [2]:
# check availability of GPU and set the device accordingly
#### YOUR CODE STARTS HERE ####
device =torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
#### YOUR CODE ENDS HERE ####

# Hyper parameters
num_epochs = 10
num_classes = 10
learning_rate = 0.001

# define a transforms for preparing the dataset
# for normalization of the MNIST dataset, take mean=0.1307 and std=0.3081

#### YOUR CODE STARTS HERE ####
transform = transforms.Compose([
        transforms.ToTensor(), # convert the image to a pytorch tensor
        transforms.Normalize((0.1307,), (0.3081,)) # normalise with mean and std of the dataset
        ])
#### YOUR CODE ENDS HERE ####

In [3]:
# Load the MNIST training, test datasets using `torchvision.datasets.MNIST` using the transform defined above
#### YOUR CODE STARTS HERE ####
train_dataset =datasets.MNIST('./data', train=True, download=True,
                       transform=transform)
test_dataset = datasets.MNIST('./data', train=False, download=True,
                       transform=transform)
#### YOUR CODE ENDS HERE ####

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/9912422 [00:00<?, ?it/s]

Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/28881 [00:00<?, ?it/s]

Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/1648877 [00:00<?, ?it/s]

Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/4542 [00:00<?, ?it/s]

Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw



In [4]:
# create dataloaders for training and test datasets
# use a batch size of 32 and set shuffle=True for the training set
#### YOUR CODE STARTS HERE ####
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_size=32)
#### YOUR CODE ENDS HERE ####


### Network Definition

In [5]:
# Convolutional neural network (Two convolutional layers)
class ConvolutionNet(nn.Module):
    def __init__(self, num_classes=10):
        super( ConvolutionNet, self).__init__()
        
        # define a sequential module with 
        # 1. conv layer with input channel as 1, output channels as 32, kernel size of 5, stride of 1 and padding 2
        # 2. 2D BatchNorm of 32 features 
        # 3. ReLU activation
        # 4. 2D MaxPool with kernel size of 2 and stride of 2

        #### YOUR CODE STARTS HERE ####
        self.conv_block1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        #### YOUR CODE ENDS HERE ####

        # define a sequential module with 
        # 1. conv layer with input channel as 32, output channels as 16, kernel size of 7, stride of 1 and padding 3
        # 2. 2D BatchNorm of 16 features 
        # 3. ReLU activation
        # 4. 2D MaxPool with kernel size of 2 and stride of 2
        
        #### YOUR CODE STARTS HERE ####
        self.conv_block2 =nn.Sequential(
            nn.Conv2d(32, 16, kernel_size=7, stride=1, padding=3),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        #### YOUR CODE ENDS HERE ####
    
        # define a linear(dense) layer with output features corresponding to the number of classes in the dataset

        #### YOUR CODE STARTS HERE ####
        self.fc = nn.Linear(7*7*16, num_classes) 
        #### YOUR CODE ENDS HERE ####

    def forward(self, x):
        # Use the sequential convolution blocks defined above (conv_block1--> conv_block2-->fc) and 
        # write the forward pass.
        
        #### YOUR CODE STARTS HERE ####
        output = self.conv_block1(x)
        output = self.conv_block2(output)
        
        output = output.reshape(output.size(0), -1)
        output = self.fc(output)
        #### YOUR CODE ENDS HERE ####
        return output


### Question 1

What is the size of parameter matrix corresponding to convolution layer of second sequential block ? 

1. 32x16x5x5
2. 32x32x6x6
3. 16x32x7x7
4. 32x16x4x4

Answer (3)

In [6]:
#### YOUR CODE STARTS HERE ####
model = ConvolutionNet(num_classes).to(device)
model.conv_block2[0].weight.shape
#or
# params=list(model.parameters())
# params[4].shape
#### YOUR CODE ENDS HERE ####

torch.Size([16, 32, 7, 7])

### Training and Inference

In [7]:
#define the model
#### YOUR CODE STARTS HERE ####
model = ConvolutionNet(num_classes).to(device)
#### YOUR CODE ENDS HERE ####


#define cross entropy loss and Adam optimizer using learning rate=learning_rate
#### YOUR CODE STARTS HERE ####
# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
#### YOUR CODE ENDS HERE ####

# Train the model
total_step = len(train_dataloader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_dataloader):
       #### YOUR CODE STARTS HERE ####
        # send the image, target to the device
        images = images.to(device)
        labels = labels.to(device)
        
        # flush out the gradients stored in optimizer
        optimizer.zero_grad()
        # pass the image to the model and assign the output to variable named output
        output = model(images)
        # calculate the loss (use cross entropy in pytorch)
        loss = criterion(output, labels)
        # do a backward pass
        loss.backward()
        # update the weights
        optimizer.step()
       #### YOUR CODE ENDS HERE ####
        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

# Test the model
model.eval()  # eval mode (batchnorm uses moving mean/variance instead of mini-batch mean/variance)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_dataloader:
      ### YOUR CODE STARTS HERE ####
        # send the image, target to the device
        images = images.to(device)
        labels = labels.to(device)
        # pass the image to the model and assign the output to variable named output
        outputs = model(images)
      #### YOUR CODE ENDS HERE ####
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Test Accuracy of the model : {} %'.format(100 * correct / total))

Epoch [1/10], Step [100/1875], Loss: 0.2694
Epoch [1/10], Step [200/1875], Loss: 0.0601
Epoch [1/10], Step [300/1875], Loss: 0.4581
Epoch [1/10], Step [400/1875], Loss: 0.0658
Epoch [1/10], Step [500/1875], Loss: 0.0955
Epoch [1/10], Step [600/1875], Loss: 0.0179
Epoch [1/10], Step [700/1875], Loss: 0.0468
Epoch [1/10], Step [800/1875], Loss: 0.0204
Epoch [1/10], Step [900/1875], Loss: 0.1856
Epoch [1/10], Step [1000/1875], Loss: 0.0246
Epoch [1/10], Step [1100/1875], Loss: 0.0068
Epoch [1/10], Step [1200/1875], Loss: 0.0577
Epoch [1/10], Step [1300/1875], Loss: 0.0479
Epoch [1/10], Step [1400/1875], Loss: 0.0889
Epoch [1/10], Step [1500/1875], Loss: 0.0067
Epoch [1/10], Step [1600/1875], Loss: 0.0108
Epoch [1/10], Step [1700/1875], Loss: 0.0112
Epoch [1/10], Step [1800/1875], Loss: 0.0601
Epoch [2/10], Step [100/1875], Loss: 0.0057
Epoch [2/10], Step [200/1875], Loss: 0.0055
Epoch [2/10], Step [300/1875], Loss: 0.0076
Epoch [2/10], Step [400/1875], Loss: 0.0235
Epoch [2/10], Step [500

#### Question-2

Report the final test accuracy displayed above (If you are not getting the exact number shown in options, please report the closest number).
1. 84%
2. 76%
3. 99%
4. 57%

Answer- (3)

### Resnet with Squeeze and Excitation Block
In this question, you'll have to code ResNet with a Squeeze and Excitation block from scratch.

It's suggested you first briefly understand how the Squeeze and Excitation block works.

Sidenote: As this assignment is mainly focused on learning things, we don't focus on architecture design and hyper-parameter tuning. When you start using deep learning in real-world applications and competitions, hyper-parameter tuning plays a decent role!



In [8]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
import torch.nn.functional as F
import timeit
import unittest

## Please DONOT remove these lines. 
torch.manual_seed(2022)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(2022)

### Data Loading and Pre-processing

In [9]:
# check availability of GPU and set the device accordingly
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# define a set of transforms for preparing the dataset
transform_train = transforms.Compose([
        transforms.RandomCrop(32, padding=8), 
        transforms.RandomHorizontalFlip(), # flip the image horizontally (use pytorch random horizontal flip)
        transforms.ToTensor(), # convert the image to a pytorch tensor
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)) # normalise the images with mean and std of the dataset
        ])

# define transforms for the test data: Should they be same as the one used for train? 
transform_test = transforms.Compose([                    
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
        ])

use_cuda = torch.cuda.is_available() # if you have acess to a GPU, enable it to speed the training 

In [10]:
# Load the CIFAR10 training, test datasets using `torchvision.datasets.CIFAR10` using the transform defined above
#### YOUR CODE STARTS HERE ####
train_dataset =datasets.CIFAR10('./data', train=True, download=True,
                       transform=transform)
test_dataset = datasets.CIFAR10('./data', train=False, download=True,
                       transform=transform)
#### YOUR CODE ENDS HERE ####

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


  0%|          | 0/170498071 [00:00<?, ?it/s]

Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified


In [11]:
# create dataloaders for training and test datasets
# use a batch size of 32 and set shuffle=True for the training set
#### YOUR CODE STARTS HERE ####
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_size=32)
#### YOUR CODE ENDS HERE ####

### Network Definition

In [12]:
# Squeeze and excitation residual neural network (One convolutional layer and one resnet block)
class SEResNet(nn.Module):
    def __init__(self, num_classes=10):
        super( SEResNet, self).__init__()
        
        # define a sequential module named conv_block1 with 
        # 1. conv layer with input channel as 3, output channels as 64, kernel size of 5, stride of 1 and padding 2
        # 2. 2D BatchNorm of 64 features 
        # 3. ReLU activation
        # 4. 2D MaxPool with kernel size of 2 and stride of 2

        #### YOUR CODE STARTS HERE ####
        self.conv_block1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        #### YOUR CODE ENDS HERE ####

        # define a sequential module named conv_block2 with 
        # 1. conv layer with input channel as 64, output channels as 64, kernel size of 3, stride of 1 and padding 1
        # 2. 2D BatchNorm of 64 features 
        # 3. ReLU activation

        #### YOUR CODE STARTS HERE ####
        self.conv_block2 =nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU())
        #### YOUR CODE ENDS HERE ####

        # define a sequential module named conv_block3 with 
        # 1. conv layer with input channel as 64, output channels as 64, kernel size of 5, stride of 1 and padding 2
        # 2. 2D BatchNorm of 64 features 
        
        #### YOUR CODE STARTS HERE ####
        self.conv_block3 =nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64))
        #### YOUR CODE ENDS HERE ####
           
        #define a squeeze and excitation block with following requirements
        #1. the output of conv_block3 should be squeezed with average pooling while maintaining the number of channels
        #2. the excitation block should be a sequential module that passes the squeezed vector through a bottleneck of dimension 8
        #   and then then expand it back to its original size while using relu activation in the bottleneck layer and sigmoid in the expanded output(no bias should be used)
        #3. Define a single relu activation layer named relu

        #### YOUR CODE STARTS HERE ####
        self.squeeze = nn.AdaptiveAvgPool2d(1)
        self.excitation = nn.Sequential(
            nn.Linear(64, 8, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(8, 64, bias=False),
            nn.Sigmoid()
        )
        self.relu = nn.ReLU(inplace=True)
        #### YOUR CODE ENDS HERE ####

        # define a linear(dense) layer with output features corresponding to the number of classes in the dataset and input features as per the forward pass defined later

        #### YOUR CODE STARTS HERE ####
        self.fc = nn.Linear(16*16*64, num_classes) 
        #### YOUR CODE ENDS HERE ####

    def forward(self, x):
        # Use the blocks defined above to write the forward pass for the following squeeze and excitation resnet:
        #(input -> conv_block1 -> conv_block2 -> conv_block3 -> squeeze -> excite -> scale (ie scale the ouputs of conv_block3 with the corresponding excitations)->
        # skip connection(from conv_block1 to the scaled outputs) -> relu -> fc -> prediction 
        
        #### YOUR CODE STARTS HERE ####
        output = self.conv_block1(x)
        identity = output
        output = self.conv_block2(output)
        output = self.conv_block3(output)
        se = self.squeeze(output).view(-1, 64)

        se = self.excitation(se).view(-1, 64, 1, 1)
        output = output * se.expand_as(output)
        output += identity
        output = self.relu(output)
        output = output.reshape(output.size(0), -1)
        output = self.fc(output)
        #### YOUR CODE ENDS HERE ####
        return output

### Training and Inference

In [13]:
# Write the model definition, training and testing code exactly as before and replace model name by SEResNet
#### YOUR CODE STARTS HERE ####
model = SEResNet(num_classes).to(device)

#define cross entropy loss and Adam optimizer using learning rate=learning_rate
# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
total_step = len(train_dataloader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_dataloader):
        # send the image, target to the device
        images = images.to(device)
        labels = labels.to(device)
        
        # flush out the gradients stored in optimizer
        optimizer.zero_grad()
        # pass the image to the model and assign the output to variable named output
        output = model(images)
        # calculate the loss (use cross entropy in pytorch)
        loss = criterion(output, labels)
        # do a backward pass
        loss.backward()
        # update the weights
        optimizer.step()
        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

# Test the model
model.eval()  # eval mode (batchnorm uses moving mean/variance instead of mini-batch mean/variance)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_dataloader:
        # send the image, target to the device
        images = images.to(device)
        labels = labels.to(device)
        # pass the image to the model and assign the output to variable named output
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Test Accuracy of the model : {} %'.format(100 * correct / total))
#### YOUR CODE ENDS HERE ####

Epoch [1/10], Step [100/1563], Loss: 1.9831
Epoch [1/10], Step [200/1563], Loss: 2.0474
Epoch [1/10], Step [300/1563], Loss: 1.8301
Epoch [1/10], Step [400/1563], Loss: 1.2892
Epoch [1/10], Step [500/1563], Loss: 2.3340
Epoch [1/10], Step [600/1563], Loss: 1.3250
Epoch [1/10], Step [700/1563], Loss: 1.2074
Epoch [1/10], Step [800/1563], Loss: 1.4264
Epoch [1/10], Step [900/1563], Loss: 1.2830
Epoch [1/10], Step [1000/1563], Loss: 1.4074
Epoch [1/10], Step [1100/1563], Loss: 1.1634
Epoch [1/10], Step [1200/1563], Loss: 1.5259
Epoch [1/10], Step [1300/1563], Loss: 1.2691
Epoch [1/10], Step [1400/1563], Loss: 1.0858
Epoch [1/10], Step [1500/1563], Loss: 1.2223
Epoch [2/10], Step [100/1563], Loss: 0.6580
Epoch [2/10], Step [200/1563], Loss: 1.2117
Epoch [2/10], Step [300/1563], Loss: 1.5206
Epoch [2/10], Step [400/1563], Loss: 1.1419
Epoch [2/10], Step [500/1563], Loss: 0.8303
Epoch [2/10], Step [600/1563], Loss: 0.8024
Epoch [2/10], Step [700/1563], Loss: 0.9603
Epoch [2/10], Step [800/15

#### Question-3

Report the final test accuracy displayed above (If you are not getting the exact number shown in options, please report the closest number).
1. 85%
2. 73%
3. 54%
4. 65%

Answer- (2)