# Welcome to CS 5242 **Homework 4**

ASSIGNMENT DEADLINE ⏰ : **19 Sept 2022** 

In this assignment, we have three parts:

1. Implement some functions in CNNs from scratch *(3 Points)*
2. Implement a CNN and train for CIFAR10 using PyTorch *(5 Points)*
3. Discussion (parameters and FLOPs of AlexNet) *(2 Points)*

Colab is a hosted Jupyter notebook service that requires no setup and provides free access to computing resources, including GPUs. This semester, we will use Colab to run our experiments.

> In this assignment, we **need a GPU** to train the CNN model. You may need to **choose GPU in Runtime -> Change runtime type -> Hardware accelerator**.

### **Grades Policy**

This homework is worth 10 points. Late submissions lose 15% per day, and submitting 7 days after the deadline results in a score of 0.

### **Cautions**

**DO NOT** use external libraries like PyTorch or TensorFlow in your from-scratch implementations (Part 1).

**DO NOT** copy the code from the internet, e.g. GitHub.

---

### **Contact**

Please feel free to contact us if you have any questions about this homework or need any further information.

Slack (recommended): Shenggan Cheng

TA Email: shenggan@comp.nus.edu.sg

> If you have not joined the Slack group yet, you can click [here](https://join.slack.com/t/cs5242ay20222-oiw1784/shared_invite/zt-1eiv24k1t-0J9EI7vz3uQmAHa68qU0aw)

## Setup

Start by running the cell below to set up all required software.

In [8]:
!pip install numpy matplotlib torch torchvision

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


Import the necessary libraries and fix the random seeds for Python, NumPy and PyTorch.

In [9]:
import math
import random

import numpy as np
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

random.seed(0)
np.random.seed(0)
torch.manual_seed(0)

<torch._C.Generator at 0x7f0ce1587810>

Now let's set up the GPU environment. Colab provides a free GPU. To use it:

- Runtime -> Change Runtime Type -> select `GPU` in Hardware accelerator
- Click `Connect` in the top-right corner

After connecting to a GPU, you can check its status with the `nvidia-smi` command.

In [10]:
!nvidia-smi

torch.cuda.is_available()

Sun Sep 25 07:41:04 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   51C    P8    10W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

True

Everything is ready. You can move on, and ***Good Luck!*** 😃

## Implement functions in CNNs from scratch

In this section, you need to implement some functions commonly used in CNNs, including convolution, pooling, etc.

We will compare the outputs of your implementations with those of PyTorch; for a correct implementation, the difference should be very small (on the order of floating-point rounding error).

NOTE: 

1. Implement these functions from scratch, **without** using any neural network libraries. Using linear algebra libraries in Python (e.g. NumPy) is fine.

2. The performance of your functions is not graded; you only need to pay attention to the correctness of your implementation.

### Step 1
Create a random input batch of 3-channel, 32x32-pixel images as a torch tensor with torch.randn().

In [11]:
batch_size = 2
x = torch.randn(batch_size, 3, 32, 32)

### Step 2

For each of the following functions, compute the output tensor "torch_xxx_out" with x as input:

In [12]:
torch_max_pool = nn.MaxPool2d(kernel_size=2,
                              stride=1,
                              padding=0,
                              dilation=1,
                              return_indices=False,
                              ceil_mode=False)
torch_avg_pool = nn.AvgPool2d(kernel_size=2,
                              stride=1,
                              padding=0,
                              ceil_mode=False,
                              count_include_pad=True,
                              divisor_override=None)
torch_conv = nn.Conv2d(in_channels=3,
                       out_channels=6,
                       kernel_size=3,
                       stride=1,
                       padding=0,
                       dilation=1,
                       groups=1,
                       bias=True,
                       padding_mode='zeros')
torch_norm = nn.BatchNorm2d(3)

In [13]:
torch_sigmoid_out = torch.sigmoid(x, out=None)
tmp_tensor = torch.randint(3, (batch_size,))
torch_cross_entropy_out = F.cross_entropy(x[::, ::, 0, 0], tmp_tensor)

In [14]:
torch_max_pool_out = torch_max_pool(x)
torch_avg_pool_out = torch_avg_pool(x)
torch_conv_out = torch_conv(x)
torch_norm_out = torch_norm(x)

### Step 3

Implement these functions from scratch, without using any neural network libraries. Using linear algebra libraries in Python is fine. Name your output tensors "my_xxx_out".
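
Every function in this step has to compute its output height and width. The sketch below (the helper name `output_size` is hypothetical, not part of the template) writes out the floor-mode formula from the PyTorch `nn.Conv2d` / `nn.MaxPool2d` documentation, H_out = floor((H_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1; with `dilation=1` it reduces to the expression used in the code below.

```python
def output_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis, floor mode (the formula in the nn.Conv2d docs)."""
    effective_kernel = dilation * (kernel_size - 1) + 1  # span covered by a (dilated) kernel
    return (size_in + 2 * padding - effective_kernel) // stride + 1


print(output_size(32, kernel_size=2))             # 2x2 pooling, stride 1 -> 31
print(output_size(32, kernel_size=3))             # 3x3 conv, no padding -> 30
print(output_size(32, kernel_size=3, padding=1))  # 3x3 conv, 'same' padding -> 32
```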

In [24]:
def my_pooling_2d(img2d, ker_size, pooling_func, stride=1, padding=0, pad_value=0.0):
    # Pad with pad_value: 0 suits average pooling (count_include_pad=True), -inf suits max pooling
    pad_img = np.pad(img2d, ((padding, padding), (padding, padding)),
                     mode='constant', constant_values=pad_value)
    img_h, img_w = pad_img.shape

    out_h = (img_h - ker_size) // stride + 1
    out_w = (img_w - ker_size) // stride + 1

    # im2col indices: each row is one offset inside the window, each column one output position
    i0 = np.repeat(np.arange(ker_size), ker_size)
    i1 = np.repeat(np.arange(0, img_h - ker_size + 1, stride), out_w)
    i = i0.reshape(-1, 1) + i1.reshape(1, -1)
    j0 = np.tile(np.arange(ker_size), ker_size)
    j1 = np.tile(np.arange(0, img_w - ker_size + 1, stride), out_h)
    j = j0.reshape(-1, 1) + j1.reshape(1, -1)
    selected_img = pad_img[i, j]  # (ker_size**2, out_h * out_w); no squeeze, so 1x1 windows work
    pooled = pooling_func(selected_img, axis=0).reshape(out_h, out_w)
    return pooled

def my_max_pool(x, kernel_size, stride, padding):
    """
    Args:
        x: torch tensor with size (N, C_in, H_in, W_in),
        kernel_size: size of the window to take a max over,
        stride: stride of the window,
        padding: implicit negative-infinity padding to be added on both sides (as in nn.MaxPool2d),

    Return:
        y: torch tensor of size (N, C_out, H_out, W_out).
    """

    # === Complete the code (0.5')
    N, C_in, H_in, W_in = x.shape
    H_out = (H_in - kernel_size + 2 * padding) // stride + 1
    W_out = (W_in - kernel_size + 2 * padding) // stride + 1
    C_out = C_in

    y = torch.empty((N, C_out, H_out, W_out), dtype=x.dtype)
    for n in range(N):
        for c in range(C_in):
            # Pad with -inf so that padded cells can never be the maximum
            y[n, c, :, :] = torch.tensor(my_pooling_2d(x[n, c, :, :], kernel_size, np.max,
                                                       stride, padding, pad_value=-np.inf))
    return y
    # === Complete the code
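
The index arrays `i` and `j` in `my_pooling_2d` implement an im2col-style gather. A tiny NumPy sketch on a toy 4x4 array with a 2x2 window and stride 2 (all names here are illustrative) shows what they select: each column of `cols` is one flattened window, so a single reduction along axis 0 pools every window at once.

```python
import numpy as np

img = np.arange(16).reshape(4, 4)  # toy 4x4 "image"
k, s = 2, 2                        # window size and stride
out_h = (img.shape[0] - k) // s + 1
out_w = (img.shape[1] - k) // s + 1

# Row index = offset inside the window (0,0,1,1) + top row of each window
i = (np.repeat(np.arange(k), k).reshape(-1, 1)
     + np.repeat(np.arange(0, img.shape[0] - k + 1, s), out_w).reshape(1, -1))
# Column index = offset inside the window (0,1,0,1) + left column of each window
j = (np.tile(np.arange(k), k).reshape(-1, 1)
     + np.tile(np.arange(0, img.shape[1] - k + 1, s), out_h).reshape(1, -1))

cols = img[i, j]  # shape (k*k, out_h*out_w): column p is the flattened p-th window
print(cols[:, 0].tolist())                              # [0, 1, 4, 5]
print(cols.max(axis=0).reshape(out_h, out_w).tolist())  # [[5, 7], [13, 15]]
```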

In [25]:
def my_avg_pool(x, kernel_size, stride, padding):
    """
    Args:
        x: torch tensor with size (N, C_in, H_in, W_in),
        kernel_size: size of the window, 
        stride: stride of the window,
        padding: implicit zero padding to be added on both sides,
        
    Return:
        y: torch tensor of size (N, C_out, H_out, W_out).
    """

    # === Complete the code (0.5')
    N, C_in, H_in, W_in = x.shape
    H_out = int((H_in - kernel_size + 2 * padding) / stride) + 1
    W_out = int((W_in - kernel_size + 2 * padding) / stride) + 1
    C_out = C_in

    y = torch.empty((N, C_out, H_out, W_out), dtype=x.dtype)
    for n in range(N):
        for c in range(C_in):
            y[n, c, :, :] = torch.tensor(my_pooling_2d(x[n, c, :, :], kernel_size, np.mean, stride, padding))
    return y
    # === Complete the code
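
As a cross-check, the same pooling can be written without hand-built index arrays by using `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later). This is an alternative technique, not the one used above, and the function name `pool2d_windows` is hypothetical; it works in float64 on a single 2-D array.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def pool2d_windows(img2d, kernel_size, stride=1, padding=0, mode="max"):
    """Max or average pooling of one 2-D array through a strided window view."""
    # -inf padding never wins a max; zero padding is counted by the mean (count_include_pad=True)
    pad_value = -np.inf if mode == "max" else 0.0
    padded = np.pad(np.asarray(img2d, dtype=np.float64), padding,
                    mode="constant", constant_values=pad_value)
    # windows[r, c] is the kernel_size x kernel_size patch whose top-left corner is (r, c);
    # slicing with ::stride keeps only the window positions the pooling layer visits
    windows = sliding_window_view(padded, (kernel_size, kernel_size))[::stride, ::stride]
    reduce = np.max if mode == "max" else np.mean
    return reduce(windows, axis=(-2, -1))


img = np.arange(16, dtype=np.float64).reshape(4, 4)
print(pool2d_windows(img, 2, stride=2).tolist())              # [[5.0, 7.0], [13.0, 15.0]]
print(pool2d_windows(img, 2, stride=2, mode="avg").tolist())  # [[2.5, 4.5], [10.5, 12.5]]
```

Applied per channel, e.g. `pool2d_windows(x[n, c].numpy(), 2, mode="avg")`, it should agree with `my_avg_pool` up to the float32/float64 difference.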

In [26]:
def my_conv_2d(img2d, kernel, stride=1, padding=0):
    pad_img = np.pad(img2d, ((padding, padding), (padding, padding)), mode='constant')
    img_h, img_w = pad_img.shape
    k = kernel.detach().numpy()
    ker_size, _ = k.shape

    out_h = int((img_h - ker_size) / stride) + 1
    out_w = int((img_w - ker_size) / stride) + 1

    i0 = np.repeat(np.arange(ker_size), ker_size)
    i1 = np.repeat(np.arange(img_h - ker_size + 1, step=stride), out_w)
    i = i0.reshape(-1, 1) + i1.reshape(1, -1)
    j0 = np.tile(np.arange(ker_size), ker_size)
    j1 = np.tile(np.arange(img_w - ker_size + 1, step=stride), out_h)
    j = j0.reshape(-1, 1) + j1.reshape(1, -1)
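    selected_img = pad_img[i, j]  # (ker_size**2, out_h * out_w); no squeeze, so 1x1 kernels work
    # Cross-correlation (the kernel is not flipped), matching nn.Conv2d
    conv = k.reshape(1, -1) @ selected_img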
    return conv.reshape(out_h, out_w)

def my_conv(x, in_channels, out_channels, kernel_size, stride, padding, weight, bias):
    """
    Args:
        x: torch tensor with size (N, C_in, H_in, W_in),
        in_channels: number of channels in the input image, it is C_in;
        out_channels: number of channels produced by the convolution;
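        kernel_size: size of the convolving kernel,
        stride: stride of the convolution,
        padding: implicit zero padding to be added on both sides of each dimension,
        weight: torch tensor with size (C_out, C_in, kernel_size, kernel_size),
        bias: torch tensor with size (C_out,),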
        
    Return:
        y: torch tensor of size (N, C_out, H_out, W_out)
    """

    # === Complete the code (0.5')
    N, C_in, H_in, W_in = x.shape
    H_out = int((H_in - kernel_size + 2 * padding) / stride) + 1
    W_out = int((W_in - kernel_size + 2 * padding) / stride) + 1
    C_out = out_channels
    y = torch.empty((N, C_out, H_out, W_out))
    for n in range(N):
        for c_out in range(C_out):
            conv_2d_out = torch.zeros((H_out, W_out))
            for c_in in range(C_in):
                conv_2d_out += torch.tensor(my_conv_2d(x[n, c_in, :, :], weight[c_out, c_in, :, :], stride, padding))
            y[n, c_out, :, :] = conv_2d_out + bias[c_out].item()
    return y
    # === Complete the code
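
The loops in `my_conv` can also be collapsed into a single tensor contraction. A minimal NumPy sketch, assuming square kernels, `groups=1` and `dilation=1`; `conv2d_einsum` is a hypothetical name, and like `nn.Conv2d` it computes a cross-correlation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_einsum(x, weight, bias, stride=1, padding=0):
    """x: (N, C_in, H, W); weight: (C_out, C_in, k, k); bias: (C_out,) -> (N, C_out, H_out, W_out)."""
    k = weight.shape[-1]
    # Zero-pad only the two spatial axes
    xp = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    # (N, C_in, H_out, W_out, k, k): every k x k patch, keeping every stride-th position
    windows = sliding_window_view(xp, (k, k), axis=(2, 3))[:, :, ::stride, ::stride]
    # Sum over input channels c and kernel offsets (i, j) for each output channel o;
    # like nn.Conv2d this is a cross-correlation, so the kernel is not flipped
    out = np.einsum("nchwij,ocij->nohw", windows, weight)
    return out + bias[None, :, None, None]


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 8, 8))
w = rng.standard_normal((6, 3, 3, 3))
b = rng.standard_normal(6)
print(conv2d_einsum(x, w, b).shape)             # (2, 6, 6, 6)
print(conv2d_einsum(x, w, b, padding=1).shape)  # (2, 6, 8, 8)
```

With the notebook's tensors it could be called as `conv2d_einsum(x.numpy(), torch_conv.weight.detach().numpy(), torch_conv.bias.detach().numpy())` and compared against `torch_conv_out`.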

In [27]:
def my_batchnorm(x, num_features, eps):
    """
    Args:
        x: torch tensor with size (N, C, H, W),
        num_features: number of features in the input tensor, it is C;
        eps: a value added to the denominator for numerical stability. Default: 1e-5
        
    Return:
        y: torch tensor of size (N, C, H, W)
    """

    # === Complete the code (0.5')
    y = torch.empty_like(x)
    _, C, _, _ = x.shape
    for c in range(C):
        mean = np.mean(x[:,c,:,:].numpy())
        variance = np.var(x[:,c,:,:].numpy())
        d = (variance + eps) ** 0.5
        y[:,c,:,:] = torch.tensor((x[:,c,:,:].numpy() - mean) / d)
    return y
    # === Complete the code
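
The per-channel loop in `my_batchnorm` is equivalent to reducing over the batch and spatial axes in one step. A vectorized NumPy sketch (hypothetical helper `batchnorm_train`), assuming training-mode statistics (the biased batch variance) and the default affine parameters gamma = 1, beta = 0 of a freshly created `nn.BatchNorm2d`; in eval mode, PyTorch would normalize with its running statistics instead.

```python
import numpy as np


def batchnorm_train(x, eps=1e-5):
    """Normalize x of shape (N, C, H, W) with per-channel batch statistics."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)  # shape (1, C, 1, 1)
    var = x.var(axis=(0, 2, 3), keepdims=True)    # biased variance (ddof=0)
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 32, 32)) * 4.0 + 1.5
y = batchnorm_train(x)
print(y.mean(axis=(0, 2, 3)))  # close to 0 for every channel
print(y.std(axis=(0, 2, 3)))   # close to 1 for every channel
```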

In [28]:
def my_sigmoid(x):
    """
    Args:
        x: torch tensor with any size

    Return:
        y: the logistic sigmoid function of x
    """
    # === Complete the code (0.5')
    # Element-wise logistic function; works for tensors of any shape, not only 4-D ones
    y = torch.from_numpy(1.0 / (1.0 + np.exp(-x.detach().numpy())))
    return y
    # === Complete the code
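
Computing `1 / (1 + exp(-x))` directly makes `exp` overflow for large negative inputs (NumPy returns `inf` with a warning, although the final value still rounds to 0). A common numerically stable formulation only ever calls `exp` on non-positive arguments, using sigma(x) = e^x / (1 + e^x) for x < 0. A sketch with the hypothetical name `stable_sigmoid`:

```python
import numpy as np


def stable_sigmoid(x):
    """Element-wise logistic sigmoid that never calls exp on a positive argument."""
    x = np.asarray(x, dtype=np.float64)
    z = np.exp(-np.abs(x))  # in (0, 1], cannot overflow
    # x >= 0: 1 / (1 + e^{-x});  x < 0: e^{x} / (1 + e^{x})
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))


print(stable_sigmoid([-1000.0, 0.0, 1000.0]).tolist())  # [0.0, 0.5, 1.0]
```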

In [29]:
def my_cross_entropy(p, y):
    """
    Args:
        p: torch tensor with size of (N, C),
        y (int): torch tensor with size of (N), the values range from 0 to C-1

    Return:
        loss: the cross_entropy of predicted values p and target y.
    """
    # === Complete the code (0.5')
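    loss = 0.0
    N, C = p.shape
    logits = p.detach().numpy().astype(np.float64)  # plain NumPy array, computed in float64
    for n in range(N):
        dividend = np.exp(logits[n, int(y[n])])     # exp of the target-class score
        divisor = np.sum(np.exp(logits[n, :]))      # softmax normaliser
        loss -= np.log(dividend / divisor)
    return torch.tensor(loss / N)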
    # === Complete the code
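
Taking `exp` of raw scores, as `my_cross_entropy` does, overflows once a logit exceeds roughly 709 in float64. Cross-entropy over logits equals the batch mean of the negative log-softmax of the target class, and the log-softmax can be evaluated stably by first subtracting each row's maximum (the log-sum-exp trick). A NumPy-only sketch with the hypothetical name `cross_entropy_lse`, using mean reduction:

```python
import numpy as np


def cross_entropy_lse(logits, targets):
    """Mean cross-entropy for logits of shape (N, C) and integer targets of shape (N,)."""
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets)
    shifted = logits - logits.max(axis=1, keepdims=True)  # largest logit in each row becomes 0
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))  # log-softmax
    return -log_probs[np.arange(len(targets)), targets].mean()


print(cross_entropy_lse([[1000.0, 0.0, 0.0]], [0]))  # about 0, and no overflow
print(cross_entropy_lse([[0.0, 0.0]], [1]))          # log(2), about 0.6931
```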

In [30]:
my_max_pool_out = my_max_pool(x, kernel_size=2, stride=1, padding=0)
my_avg_pool_out = my_avg_pool(x, kernel_size=2, stride=1, padding=0)
my_conv_out = my_conv(x,
                      in_channels=3,
                      out_channels=6,
                      kernel_size=3,
                      stride=1,
                      padding=0,
                      weight=torch_conv.weight,
                      bias=torch_conv.bias)
my_norm_out = my_batchnorm(x, num_features=3, eps=1e-5)

In [31]:
my_sigmoid_out = my_sigmoid(x)
my_cross_entropy_out = my_cross_entropy(x[::, ::, 0, 0], tmp_tensor)

### Step 4

Compare and show that "torch_xxx_out" and "my_xxx_out" are equal up to small numerical errors.
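
MSE averages the error over all elements, so a few badly wrong entries can still produce a small number. A stricter check, sketched here in NumPy under the assumption that both outputs are first converted with `.detach().numpy()`, reports the maximum absolute difference and applies an element-wise tolerance (`report` is a hypothetical helper, and the tolerances are illustrative, not grading thresholds):

```python
import numpy as np


def report(name, mine, ref, rtol=1e-5, atol=1e-6):
    """Print the worst element-wise error and whether every element is within tolerance."""
    mine, ref = np.asarray(mine, dtype=np.float64), np.asarray(ref, dtype=np.float64)
    assert mine.shape == ref.shape, f"{name}: shape {mine.shape} != {ref.shape}"
    max_abs = np.max(np.abs(mine - ref))
    ok = np.allclose(mine, ref, rtol=rtol, atol=atol)
    print(f"{name}: max |diff| = {max_abs:.3e}, allclose = {ok}")
    return ok


ref = np.linspace(0.0, 1.0, 10_000)
bad = ref.copy()
bad[123] += 0.05                       # a single wrong element
print(np.mean((bad - ref) ** 2))       # MSE is only about 2.5e-07
report("one wrong element", bad, ref)  # allclose = False
```

In the notebook it could be used as `report("conv", my_conv_out.detach().numpy(), torch_conv_out.detach().numpy())`.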

In [32]:
print(F.mse_loss(my_max_pool_out, torch_max_pool_out))
print(F.mse_loss(my_avg_pool_out, torch_avg_pool_out))
print(F.mse_loss(my_conv_out, torch_conv_out))
print(F.mse_loss(my_norm_out, torch_norm_out))

tensor(0.)
tensor(0.)
tensor(4.7220e-15, grad_fn=<MseLossBackward0>)
tensor(2.9662e-15, grad_fn=<MseLossBackward0>)


In [33]:
print(F.mse_loss(my_sigmoid_out, torch_sigmoid_out))
print(F.mse_loss(my_cross_entropy_out, torch_cross_entropy_out))

tensor(3.6253e-16)
tensor(0., dtype=torch.float64)


## Train CNNs on CIFAR-10 dataset

Implement a CNN and train it on CIFAR10 using the following definitions:

1. cA-B = Conv2d with A input channels and B output channels, kernel size 3x3, stride (1,1), zero padding that keeps the image size unchanged, followed by ReLU;

2. mp = maxpool2d kernel size 2x2, stride (2,2);

3. bn = batchnorm2d with affine=False (i.e. non-learning batch norm);

4. fcA-B = nn.Linear with A input nodes and B output nodes;

5. aap = adaptive average pooling.

Use these definitions to build the architecture c3-16 -> c16-16 -> mp -> c16-32 -> c32-32 -> mp -> c32-64 -> c64-64 -> mp -> c64-128 -> c128-128 -> aap -> flatten -> fc128-10 -> cross entropy loss. Adjust the learning rate, batch size and other hyperparameters to reach a test classification accuracy **> 75%**.
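
Before writing the model, it can help to trace the feature-map shapes and parameter counts implied by this specification. A pure-Python sketch, assuming 32x32 CIFAR-10 inputs and a bias term in every Conv2d and Linear layer (PyTorch's default); the `spec` encoding is purely illustrative:

```python
# ("c", in, out) = 3x3 conv + ReLU with 'same' padding, "mp" = 2x2 max pool, "aap" = adaptive avg pool to 1x1
spec = [("c", 3, 16), ("c", 16, 16), "mp",
        ("c", 16, 32), ("c", 32, 32), "mp",
        ("c", 32, 64), ("c", 64, 64), "mp",
        ("c", 64, 128), ("c", 128, 128), "aap"]

channels, size, total_params = 3, 32, 0
for layer in spec:
    if layer == "mp":
        size //= 2                                    # kernel 2, stride 2 halves H and W
    elif layer == "aap":
        size = 1                                      # adaptive average pooling to 1x1
    else:
        _, c_in, c_out = layer
        assert c_in == channels, "channel mismatch in spec"
        total_params += c_in * c_out * 3 * 3 + c_out  # weights + bias; 'same' padding keeps size
        channels = c_out
    print(f"{str(layer):>16} -> {channels:3d} x {size:2d} x {size:2d}")

flat = channels * size * size                         # flatten -> 128 features
total_params += flat * 10 + 10                        # fc128-10
print("flattened features:", flat)                    # 128
print("trainable parameters:", total_params)          # 294810
```

For the `CNN(10)` model defined below, `sum(p.numel() for p in model.parameters())` should report the same count.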

In [34]:
# === Complete the code (1')
num_epoch = 20 # TODO: please define the number of epochs here.
batch_size = 128 # TODO: please fill the batch size here.
# === Complete the code

In [35]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data',
                                        train=True,
                                        download=True,
                                        transform=transform)
trainloader = torch.utils.data.DataLoader(trainset,
                                          batch_size=batch_size,
                                          shuffle=True,
                                          num_workers=1)

testset = torchvision.datasets.CIFAR10(root='./data',
                                       train=False,
                                       download=True,
                                       transform=transform)
testloader = torch.utils.data.DataLoader(testset,
                                         batch_size=batch_size,
                                         shuffle=False,
                                         num_workers=1)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz



Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified


In [36]:
# Creating a CNN model
class CNN(nn.Module):
    
    def __init__(self, num_classes):
        super(CNN, self).__init__()
       
        # === Complete the code (1.5')
        self.conv1 = self.__conv(3, 16)
        self.conv2 = self.__conv(16, 16)
        self.conv3 = self.__conv(16, 32)
        self.conv4 = self.__conv(32, 32)
        self.conv5 = self.__conv(32, 64)
        self.conv6 = self.__conv(64, 64)
        self.conv7 = self.__conv(64, 128)
        self.conv8 = self.__conv(128, 128)

        self.maxPooling = nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))
        self.aap = nn.AdaptiveAvgPool2d((1,1))
        self.fc = nn.Linear(128, num_classes)

        # === Complete the code
        
    def forward(self, x):
        # === Complete the code (1.5')
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.maxPooling(x)
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = self.maxPooling(x)
        x = F.relu(self.conv5(x))
        x = F.relu(self.conv6(x))
        x = self.maxPooling(x)
        x = F.relu(self.conv7(x))
        x = F.relu(self.conv8(x))
        x = self.aap(x)
        x = torch.flatten(x,1)
        out = self.fc(x)

        # === Complete the code
        return out
    def __conv(self, in_channels, out_channels):
        return nn.Conv2d(in_channels, out_channels, kernel_size=(3,3), stride=(1,1), padding='same')

In [37]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = CNN(10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)

In [38]:
for epoch in range(num_epoch):

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):

        inputs, labels = data[0].to(device), data[1].to(device)
        # === Complete the code (1')
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # === Complete the code

        running_loss += loss.item()
        if (i + 1) % 128 == 0:
            print('epoch {:3d} | {:5d} batches loss: {:.4f}'.format(epoch, i + 1, running_loss/128))
            running_loss = 0.0

print('Finished Training')

epoch   0 |   128 batches loss: 2.1171
epoch   0 |   256 batches loss: 1.8518
epoch   0 |   384 batches loss: 1.6782
epoch   1 |   128 batches loss: 1.5882
epoch   1 |   256 batches loss: 1.5213
epoch   1 |   384 batches loss: 1.4575
epoch   2 |   128 batches loss: 1.4133
epoch   2 |   256 batches loss: 1.3786
epoch   2 |   384 batches loss: 1.3261
epoch   3 |   128 batches loss: 1.2791
epoch   3 |   256 batches loss: 1.2486
epoch   3 |   384 batches loss: 1.2170
epoch   4 |   128 batches loss: 1.1826
epoch   4 |   256 batches loss: 1.1544
epoch   4 |   384 batches loss: 1.1335
epoch   5 |   128 batches loss: 1.0771
epoch   5 |   256 batches loss: 1.0622
epoch   5 |   384 batches loss: 1.0595
epoch   6 |   128 batches loss: 1.0016
epoch   6 |   256 batches loss: 0.9840
epoch   6 |   384 batches loss: 0.9910
epoch   7 |   128 batches loss: 0.9544
epoch   7 |   256 batches loss: 0.9675
epoch   7 |   384 batches loss: 0.9339
epoch   8 |   128 batches loss: 0.8840
epoch   8 |   256 batches

In [39]:
dataiter = iter(testloader)
images, labels = next(dataiter)

In [40]:
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data[0].to(device), data[1].to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')

Accuracy of the network on the 10000 test images: 76 %


## Discussion (2 points)

Calculate the number of parameters and the FLOPs (floating-point operations) of **AlexNet**, and analyse the ratio between the number of parameters and the amount of computation for the different layers of AlexNet.

Hint:

1. You can refer to https://pytorch.org/vision/stable/_modules/torchvision/models/alexnet.html for the architecture of AlexNet.
2. You only need to make estimates and do not need to perform rigorous calculations (e.g. only consider the FLOPs of the convolution and FC layers in AlexNet).
3. Because the hardware performs multiply-accumulate (MAC) operations, you can simply count the number of multiplications when calculating FLOPs.
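
The counting rules in these hints can be scripted. A pure-Python sketch, assuming the layer list of the linked torchvision `alexnet` source (Conv2d(3, 64, 11, stride 4, padding 2); MaxPool2d(3, 2) after conv1, conv2 and conv5; a 5x5 conv with padding 2; 3x3 convs with padding 1), a 224x224 input, 1000 classes, and one FLOP per multiplication as hint 3 allows:

```python
def conv_out(size, k, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (floor mode)."""
    return (size + 2 * padding - k) // stride + 1


# (name, C_in, C_out, kernel, stride, padding, followed by MaxPool2d(3, 2)?)
convs = [("conv1", 3, 64, 11, 4, 2, True),
         ("conv2", 64, 192, 5, 1, 2, True),
         ("conv3", 192, 384, 3, 1, 1, False),
         ("conv4", 384, 256, 3, 1, 1, False),
         ("conv5", 256, 256, 3, 1, 1, True)]
fcs = [("fc1", 256 * 6 * 6, 4096), ("fc2", 4096, 4096), ("fc3", 4096, 1000)]

rows, size = [], 224
for name, c_in, c_out, k, s, p, pool in convs:
    size = conv_out(size, k, s, p)
    params = c_in * c_out * k * k + c_out       # weights + bias
    mults = c_in * c_out * k * k * size * size  # one multiplication per weight per output pixel
    rows.append((name, params, mults))
    if pool:
        size = conv_out(size, 3, 2)             # MaxPool2d(kernel_size=3, stride=2)
# size is now 6, matching the 256 * 6 * 6 input of fc1
for name, n_in, n_out in fcs:
    rows.append((name, n_in * n_out + n_out, n_in * n_out))

for name, params, mults in rows:
    print(f"{name}: params={params:>10,d}  FLOPs={mults:>12,d}  ratio={params / mults:.8f}")
total_params = sum(r[1] for r in rows)
total_mults = sum(r[2] for r in rows)
print(f"total: params={total_params:,d}  FLOPs={total_mults:,d}")
```

Under these assumptions the totals come to 61,100,840 parameters and 714,188,480 multiplications. The five convolutional layers hold only about 4% of the parameters but account for roughly 92% of the multiplications, whereas the three linear layers hold about 96% of the parameters with a parameter/FLOP ratio close to 1, since each of their weights is used exactly once per image.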

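| Layer (Conv2d: in, out, kernel, stride, padding) | Input size | Output size | N(parameters)               | FLOPs (multiplications)       | N(parameters) / FLOPs |
| ------------------------------------------------ | ---------- | ----------- | --------------------------- | ----------------------------- | --------------------- |
| Conv2d(3, 64, 11, 4, 2)                          | 224×224    | 55×55       | 3×64×11×11 + 64 = 23296     | 11×11×3×64×55×55 = 70276800   | 0.000331489           |
| Conv2d(64, 192, 5, 1, 2)                         | 27×27      | 27×27       | 64×192×5×5 + 192 = 307392   | 5×5×64×192×27×27 = 223948800  | 0.00137260            |
| Conv2d(192, 384, 3, 1, 1)                        | 13×13      | 13×13       | 192×384×3×3 + 384 = 663936  | 3×3×192×384×13×13 = 112140288 | 0.00592058            |
| Conv2d(384, 256, 3, 1, 1)                        | 13×13      | 13×13       | 384×256×3×3 + 256 = 884992  | 3×3×384×256×13×13 = 149520384 | 0.00591887            |
| Conv2d(256, 256, 3, 1, 1)                        | 13×13      | 13×13       | 256×256×3×3 + 256 = 590080  | 3×3×256×256×13×13 = 99680256  | 0.00591973            |
| Linear(256 * 6 * 6, 4096)                        | 9216       | 4096        | 9216×4096 + 4096 = 37752832 | 9216×4096 = 37748736          | 1.00010851            |
| Linear(4096, 4096)                               | 4096       | 4096        | 4096×4096 + 4096 = 16781312 | 4096×4096 = 16777216          | 1.00024414            |
| Linear(4096, 1000)                               | 4096       | 1000        | 4096×1000 + 1000 = 4097000  | 4096×1000 = 4096000           | 1.00024414            |
| **Total**                                        |            |             | 61100840 ≈ 61.1M            | 714188480 ≈ 714M              | 0.0856                |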