# Milestone 2: Convolutional Neural Networks

This milestone builds a ResNetV2-20 model for the CIFAR-10 image classification task.

## Model Specifications

Model: ResNetV2-20
- Input layer: Input size: (32 x 32) x 3
    - conv2d (3 x 3) x 16
- ResBlock 1: Input size (32 x 32) x 16
     - conv2d (3 x 3) x 16
     - conv2d (3 x 3) x 16
- ResBlock 2: Input size (32 x 32) x 16
     - conv2d (3 x 3) x 16
     - conv2d (3 x 3) x 16
- ResBlock 3: Input size (32 x 32) x 16
     - conv2d (3 x 3) x 16
     - conv2d (3 x 3) x 16  
- ResBlock 4: Input size (32 x 32) x 16
     - conv2d (3 x 3) x 32, stride 2
     - conv2d (3 x 3) x 32
- ResBlock 5: Input size (16 x 16) x 32
     - conv2d (3 x 3) x 32
     - conv2d (3 x 3) x 32
- ResBlock 6: Input size (16 x 16) x 32
     - conv2d (3 x 3) x 32
     - conv2d (3 x 3) x 32
- ResBlock 7: Input size (16 x 16) x 32
     - conv2d (3 x 3) x 64, stride 2
     - conv2d (3 x 3) x 64
- ResBlock 8: Input size (8 x 8) x 64
     - conv2d (3 x 3) x 64
     - conv2d (3 x 3) x 64
- ResBlock 9: Input size (8 x 8) x 64
     - conv2d (3 x 3) x 64
     - conv2d (3 x 3) x 64
- Pooling: input size (8 x 8) x 64
     - GlobalAveragePooling/AdaptiveAveragePooling((1,1))
- Output layer: Input size (64,)
     - Dense/Linear (64,10)
     - Activation: Softmax
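
The "20" in ResNetV2-20 can be checked by counting the weighted layers in the list above, and the stated input sizes follow from the two stride-2 blocks. The following is a small sketch of that arithmetic. The helper name `stage_shapes` and its tuple layout are illustrative and not part of the notebook.

```python
# Illustrative arithmetic only; `stage_shapes` is a hypothetical helper.

def stage_shapes(input_size=32, blocks=((16, 1), (16, 1), (16, 1),
                                        (32, 2), (32, 1), (32, 1),
                                        (64, 2), (64, 1), (64, 1))):
    """Return the (height, width, channels) output shape of each residual block.

    Each entry in `blocks` is (output_channels, stride of the first conv).
    A 3x3 conv with padding 1 and stride s maps a side length n to ceil(n / s).
    """
    shapes = []
    size = input_size
    for channels, stride in blocks:
        size = (size + stride - 1) // stride  # ceiling division
        shapes.append((size, size, channels))
    return shapes


shapes = stage_shapes()
# Weighted layers: 1 input conv + 2 convs per residual block + 1 dense layer
depth = 1 + 2 * len(shapes) + 1
print(depth)
print(shapes[-1])
```

The last entry of `shapes` is the feature map that enters the global average pooling layer.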



Data: CIFAR-10 tiny images
- 32 x 32 x 3 RGB colour images
- Train/Test split: Use data splits already given (50,000 train, 10,000 test). From the 50,000 train images, use 45,000 for training and 5,000 for validation every epoch inside the training loop. Reserve the 10,000 test set images for final evaluation.
- Pre-processing inputs: 
     - Depending on the data source, scale uint8 inputs to [0, 1] by dividing by 255
     - ImageNet normalization 
          - From the RGB channels, subtract means [0.485, 0.456, 0.406] and divide by standard deviations [0.229, 0.224, 0.225]
     - Pad 4 pixels on each side, then apply a 32x32 crop randomly sampled from the padded image or its horizontal flip, as in Section 3.2 of [3]
- Pre-processing labels: Use integer class indices
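
One possible way to carve the 5,000 validation images out of the 50,000 training images is to shuffle the indices once with a fixed seed and slice them. The sketch below also shows the per-channel normalization arithmetic for a single pixel. The function names and the seed are assumptions made for illustration; the notebook does not define them.

```python
import random

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def split_indices(num_samples=50_000, num_val=5_000, seed=0):
    """Shuffle sample indices once and split them into train and validation lists."""
    indices = list(range(num_samples))
    # A fixed seed (an assumption here) keeps the split reproducible across runs
    random.Random(seed).shuffle(indices)
    return indices[num_val:], indices[:num_val]


def normalize_pixel(rgb_uint8):
    """Scale a uint8 RGB pixel to [0, 1], then standardize each channel."""
    return tuple((value / 255.0 - mean) / std
                 for value, mean, std in zip(rgb_uint8, IMAGENET_MEAN, IMAGENET_STD))


train_idx, val_idx = split_indices()
print(len(train_idx), len(val_idx))
print(normalize_pixel((255, 255, 255)))
```

The two index lists can then be used to build separate training and validation subsets of the training data, leaving the 10,000 test images untouched.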


Hyperparameters:
- Optimizer: AdamW
- learning rate: 1e-3 
- beta_1: 0.9
- beta_2: 0.999
- weight decay: 0.0001
- Number of epochs for training: 50 (TBD)
- Batch size: 256 (TBD)
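
To make the role of each hyperparameter concrete, here is a minimal sketch of a single AdamW update for one scalar parameter. It follows the convention in which the weight decay term is scaled by the learning rate (as in PyTorch's `AdamW`); other implementations may scale it differently. The `eps` value is an assumption, since it is not listed above.

```python
import math


def adamw_step(param, grad, m, v, t, lr=1e-3, beta_1=0.9, beta_2=0.999,
               weight_decay=1e-4, eps=1e-8):
    """One AdamW update for a single scalar parameter at step t (t starts at 1)."""
    # Exponential moving averages of the gradient and the squared gradient
    m = beta_1 * m + (1 - beta_1) * grad
    v = beta_2 * v + (1 - beta_2) * grad * grad
    # Bias correction for the zero-initialized averages
    m_hat = m / (1 - beta_1 ** t)
    v_hat = v / (1 - beta_2 ** t)
    # Decoupled weight decay: the decay term sits outside the adaptive ratio
    param = param - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * param)
    return param, m, v


param, m, v = adamw_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(param)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly the learning rate plus a small decay term.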


Metrics to record:
- Total training time (from start of training script to end of training run)
- Training time per 1 epoch (measure from start to end of each epoch and average over all epochs)
- Inference time per batch (measure per batch and average over all batches)
- Last epoch training loss
- Last epoch eval accuracy (from the 5,000 evaluation dataset)
- Held-out test set accuracy (from the 10,000 test dataset)
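
The timing metrics above could be collected with a small wrapper around the training loop and exported with `json`. This is a sketch under stated assumptions: `run_timed_epochs`, the metric key names, and the stand-in training function are hypothetical, and the timer here starts at the loop rather than at the start of the script.

```python
import json
import time


def run_timed_epochs(num_epochs, train_one_epoch):
    """Call `train_one_epoch(epoch)` for each epoch and record wall-clock timings."""
    epoch_times = []
    last_loss = None
    run_start = time.perf_counter()
    for epoch in range(num_epochs):
        epoch_start = time.perf_counter()
        # `train_one_epoch` is assumed to return that epoch's training loss
        last_loss = train_one_epoch(epoch)
        epoch_times.append(time.perf_counter() - epoch_start)
    return {
        "total_training_time_s": time.perf_counter() - run_start,
        "avg_epoch_time_s": sum(epoch_times) / len(epoch_times),
        "last_epoch_train_loss": last_loss,
    }


# Stand-in for a real training epoch, used only to make the sketch runnable
metrics = run_timed_epochs(3, lambda epoch: 1.0 / (epoch + 1))
metrics_json = json.dumps(metrics, indent=2)
print(metrics_json)
```

Because MXNet executes operations asynchronously, a synchronization call such as `mx.nd.waitall()` before reading the clock is typically needed for GPU timings to be meaningful.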



#### Importing different libraries needed for model development

In [1]:
# Necessary Libraries
import numpy as np
import mxnet as mx
from mxnet import gluon, nd, autograd as ag, npx
from mxnet.gluon import nn
from mxnet.gluon.data.vision import transforms, CIFAR10
import gluoncv
from gluoncv.data import transforms as gcv_transforms

# Dataset libraries
import tensorflow as tf
import tensorflow_datasets as tfds

# json library needed to export metrics
import json
import time

import matplotlib.pyplot as plt



In [2]:
# UNCOMMENT if multiple gpus
# # number of GPUs to use
# num_gpus = 1
# ctx = [mx.gpu(i) for i in range(num_gpus)]

In [3]:
#labels just for reference
labels = {
    0: "airplane",
    1: "automobile",
    2: "bird",
    3: "cat",
    4: "deer",
    5: "dog",
    6: "frog",
    7: "horse",
    8: "ship",
    9: "truck"
}


#### Importing the CIFAR-10 dataset and pre-processing

In [4]:
transform_train = transforms.Compose([ gcv_transforms.RandomCrop(32, pad=4), # Pad 4 pixels on each side (to 40x40), then take a random 32x32 crop
                                    transforms.RandomFlipLeftRight(), # Apply a random horizontal flip
                                    transforms.ToTensor(), # Transpose the image from height*width*num_channels to num_channels*height*width
                                                           # and map values from [0, 255] to [0,1]
                                    # Normalize each channel with the ImageNet means and standard deviations
                                    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) 
                                ])

# The random crop and flip are training-time augmentations only; test images are just converted to tensors and normalized.
transform_test = transforms.Compose([transforms.ToTensor(),# Transpose the image from height*width*num_channels to num_channels*height*width
                                                           # and map values from [0, 255] to [0,1]
                                    # Normalize each channel with the ImageNet means and standard deviations
                                    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) 
                                ])

In [5]:
batch_size = 256

# USE THIS BATCH SIZE IF MULTIPLE GPUS
# # Batch Size for Each GPU
# per_device_batch_size = 128
# # Number of data loader workers
# num_workers = 8
# # Calculate effective total batch size
# batch_size = per_device_batch_size * num_gpus

train_data = gluon.data.DataLoader(
    CIFAR10(train=True).transform_first(transform_train),
    batch_size=batch_size, shuffle=True, last_batch='discard') # add 'num_workers = num_workers'
# No shuffling for evaluation, and keep the last partial batch so all 10,000 test images are scored
test_data = gluon.data.DataLoader(
    CIFAR10(train=False).transform_first(transform_test),
    batch_size=batch_size, shuffle=False, last_batch='keep') # add 'num_workers = num_workers'

# gluon.utils will be used for the remaining data handling

In [6]:
class BasicBlock(nn.HybridBlock):
    def __init__(self, in_channels, channels, strides=1, downsample=False, **kwargs):
        super(BasicBlock, self).__init__(**kwargs)
        # Settings shared by both 3x3 convolutions; padding=1 preserves the spatial size at stride 1
        conv_kwargs = {
            "kernel_size": (3, 3),
            "padding": 1,
            "use_bias": False
        }
        self.strides = strides
        self.in_channels = in_channels
        self.channels = channels

        self.bn1 = nn.BatchNorm()
        self.conv1 = nn.Conv2D(channels, strides=strides, in_channels=in_channels, **conv_kwargs)
        self.bn2 = nn.BatchNorm()
        # The second convolution receives the output of the first, so it takes `channels` inputs
        self.conv2 = nn.Conv2D(channels, strides=1, in_channels=channels, **conv_kwargs)
        self.relu = nn.Activation('relu')

    def downsample(self, x):
        # Downsample the identity by striding over the spatial dimensions
        x = x[:, :, ::self.strides, ::self.strides]
        # Create a zero padding tensor for the extra channels
        (b, c, h, w) = x.shape
        num_pad_channels = self.channels - self.in_channels
        pad = mx.nd.zeros((b, num_pad_channels, h, w), ctx=x.context, dtype=x.dtype)
        # Append this padding to the downsampled identity
        x = mx.nd.concat(x, pad, dim=1)
        return x


    def forward(self, x):
        if self.strides > 1:
            residual = self.downsample(x)
        else:
            residual = x
        # Pre-activation ordering (ResNetV2): BatchNorm -> ReLU -> Conv
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv1(x)

        x = self.bn2(x)
        x = self.relu(x)
        x = self.conv2(x)
        return x + residual

In [7]:
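class ResNetV2(nn.HybridBlock):
    def __init__(self, **kwargs):
        super(ResNetV2, self).__init__(**kwargs)

        # Gluon's Conv2D takes the number of output channels first; in_channels is a keyword argument
        self.input_layer = nn.Conv2D(16, kernel_size=(3,3), padding=1, in_channels=3)

        self.layer_1 = BasicBlock(16,16)
        self.layer_2 = BasicBlock(16,16)
        self.layer_3 = BasicBlock(16,16)

        self.layer_4 = BasicBlock(16,32, strides = 2)
        self.layer_5 = BasicBlock(32,32)
        self.layer_6 = BasicBlock(32,32)

        self.layer_7 = BasicBlock(32,64, strides = 2)
        self.layer_8 = BasicBlock(64,64)
        self.layer_9 = BasicBlock(64,64)

        # Global average pooling reduces each (8 x 8) feature map to a single value
        self.pool = nn.GlobalAvgPool2D(layout = 'NCHW')
        # Outputs logits; softmax is left to the loss function (e.g. gluon.loss.SoftmaxCrossEntropyLoss)
        self.output_layer = nn.Dense(10, in_units=64)

    def forward(self, x):
        x = self.input_layer(x)
        for layer in (self.layer_1, self.layer_2, self.layer_3,
                      self.layer_4, self.layer_5, self.layer_6,
                      self.layer_7, self.layer_8, self.layer_9):
            x = layer(x)
        x = self.pool(x)
        return self.output_layer(x)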

In [8]:
num_epochs = 50
