# CS492(I) Assignment #1: Image Classification using Convolutional Neural Networks (CNNs)
---
TA : Yoonki Cho (yoonki@kaist.ac.kr)

---

## Instructions
- In this assignment, you will classify the images in the CIFAR10 dataset into 10 categories (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck) using Convolutional Neural Networks (CNNs).  

- To this end, you need to implement the necessary network components (e.g. residual blocks) using the `nn.Module` class and assemble complete CNNs from those blocks. Then, you will experiment with these network architectures using the given train/test pipeline and report classification accuracies on the test set.      

- In each part, you will be given starter code for the implementation. Please read the attached illustrations and instructions carefully before implementing the code.  

- As you follow the given steps, fill in the sections marked ***Px.x*** (e.g. P1.1, P1.2, etc.) with the appropriate code. **Note that you may only fill in those marked areas and must not modify the rest of the skeleton code.**  

- In short, you should (1) complete the code, (2) experiment with several configurations of CNNs, and (3) report the final classification accuracies on the CIFAR10 test set.
- To start, save this ipynb file to your own Google Drive by clicking `make a copy(사본만들기)`.
Find the copy in your drive and rename it to `assignment1.ipynb` if its name was changed to e.g. `Copy of assignment1.ipynb` or `assignment1.ipynb의 사본` (the Korean-locale equivalent).



---

---


# Prerequisite: change the runtime type to **GPU**.

![test](https://docs.google.com/uc?export=download&id=1Jugrjl86L9EY1ePTjH8OVMFq7gmZsoz_)

---
# Prerequisite: mount your gdrive.

In [None]:
# Mount Google Drive. See https://datascience.stackexchange.com/questions/29480/uploading-images-folder-from-my-system-into-google-colab
# Log in with your Google account and enter the authorization code to mount your Google Drive.
import os
from google.colab import drive
drive.mount('/gdrive')

Mounted at /gdrive


---
# Prerequisite: set up the `root` directory properly.

In [None]:
# Specify the directory path where `assignment1.ipynb` exists.
# For example, if you saved `assignment1.ipynb` in `/gdrive/My Drive/CS492I/assignment1` directory,
# then set root = '/gdrive/My Drive/CS492I/assignment1'
root = '/gdrive/My Drive/CS492I/assignment1'

---
# Import libraries

In [None]:
from PIL import Image
from tqdm import tqdm
from pathlib import Path
import time

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision.datasets import CIFAR10
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

-----

# 1.Implementing Network Modules

In this assignment, you will implement four modularized blocks and one network class as follows:

**Block classes**  
(Example) Multilayer perceptron Block (MLPBlock) **To provide a starting point, the solutions for this section are given below.**  
(1) Convolutional block (ConvBlock)   
(2) Plain residual block (ResBlockPlain)  
(3) Residual block with bottleneck (ResBlockBottleneck)  
(4) Inception Block (InceptionBlock)

**Network class**  
(1) MyNetwork

Each cell contains starter code, a schematic illustration, and instructions that will guide you to implement each module correctly. The schematic illustrations show the computational graphs of the modules, giving you a high-level view of how each module should be constructed and how it works (e.g. which `nn.Module` to use, or the input/output shape of each layer, written in italics). Please read the illustrations and instructions carefully before completing the code.
<!--
Below is an example.

### Example: ConvLayer Module [(Illustration)](https://docs.google.com/drawings/d/1_aPhPSPgh5-5FEfI_jnfp8r6-wNjY_QYXBT3zzjkHk0/edit?usp=sharing) -->

## Block class

### (Example) Implement MLP Block [(Illustration)](https://docs.google.com/drawings/d/1gTPLeK0H5ooMcn7CNPysqwr9_07fTqkHE4-T3ZqyhPo/edit?usp=sharing)  

In [None]:
class MLPBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(MLPBlock, self).__init__()
        """
        Initialize the components of a basic multi-layer perceptron module.
        Illustration: https://docs.google.com/drawings/d/1gTPLeK0H5ooMcn7CNPysqwr9_07fTqkHE4-T3ZqyhPo/edit?usp=sharing

        Instructions:
            1. Initialize the necessary components as illustrated in the above link.
            2. The initialized components will be used in the `forward` method
               to construct the dynamic computational graph.

        Args:
            1. in_channels (int): Number of channels in input.
            2. out_channels (int): Number of channels to be produced.
        """
        #######################################
        ## This section is an example.       ##
        self.fc1 = nn.Linear(in_channels, 512)
        self.bn1 = nn.BatchNorm1d(512)
        self.fc2 = nn.Linear(512,128)
        self.bn2 = nn.BatchNorm1d(128)
        self.fc3 = nn.Linear(128, out_channels)
        self.bn3 = nn.BatchNorm1d(out_channels)
        self.act = nn.ReLU()
        #######################################

    def forward(self, x):
        """
        Feed-forward data 'x' through the module.

        Instructions:
            1. Construct the feed-forward computational graph as illustrated in the link
               using the initialized components in __init__ method.

        Args:
            1. x (torch.FloatTensor): A tensor of shape (B, in_channels).

        Returns:
            1. output (torch.FloatTensor): An output tensor of shape (B, out_channels).
        """
        #######################################
        ## This section is an example.       ##
        output = self.act(self.bn1(self.fc1(x)))
        output = self.act(self.bn2(self.fc2(output)))
        output = self.act(self.bn3(self.fc3(output)))
        #######################################
        return output

### (1) Implement Convolutional Block [(Illustration)](https://docs.google.com/drawings/d/1MRYBywpuazlldwC11UTa-kuWMWEDsFewDnirKiFX5us/edit?usp=sharing) (10pt)

In [None]:
class ConvBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1,
                 padding=1):
        super(ConvBlock, self).__init__()
        """
        Initialize the components of a basic convolutional layer module.
        Illustration: https://docs.google.com/drawings/d/1MRYBywpuazlldwC11UTa-kuWMWEDsFewDnirKiFX5us/edit?usp=sharing

        Args:
            1. in_channels (int): Number of channels in the input.
            2. out_channels (int): Number of channels produced.
            3. kernel_size (int) : Size of the kernel used in conv layer (Default:3)
            4. stride (int) : Stride of the convolution (Default:1)
            5. padding (int) : Zero-padding added to both sides of the input (Default:1)
        """
        #################################
        ## P1.1. Write your code here  ##
        self.cn = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias = False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU()




        #################################

    def forward(self, x):
        """
        Feed-forward the data 'x' through the module.
        Instructions:
            1. Construct the feed-forward computational graph as illustrated in the link
               using the initialized components in __init__ method.

        Args:
            1. x (torch.FloatTensor): A tensor of shape (B, in_channels, H, W).

        Returns:
            1. output (torch.FloatTensor): An output tensor of shape (B, out_channels, H, W).
        """
        #################################
        ## P1.2. Write your code here  ##
        output = self.act(self.bn(self.cn(x)))



        #################################
        return output
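
The docstring above states that the output keeps the spatial size (H, W) of the input. As a quick sanity check, here is a minimal sketch of the standard output-size formula for a convolution or pooling layer with dilation 1, floor((size + 2 * padding - kernel_size) / stride) + 1, in plain Python. The helper name `conv_out_size` is hypothetical and not part of the starter code.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size (per dimension) of a conv/pool layer with dilation 1."""
    return (size + 2 * padding - kernel_size) // stride + 1

# ConvBlock defaults (kernel_size=3, stride=1, padding=1) preserve the size.
print(conv_out_size(32, kernel_size=3, stride=1, padding=1))  # 32
# A 1x1 conv needs no padding; a 5x5 conv needs padding=2 to preserve the size.
print(conv_out_size(32, kernel_size=1, stride=1, padding=0))  # 32
print(conv_out_size(32, kernel_size=5, stride=1, padding=2))  # 32
# A 2x2 max-pool with stride 2 halves the size.
print(conv_out_size(32, kernel_size=2, stride=2, padding=0))  # 16
```

The same rule explains the padding choices for the 1x1, 3x3 and 5x5 convolutions that appear in the later blocks of this notebook.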

### (2) Implement ResBlockPlain [(Illustration)](https://docs.google.com/drawings/d/19FS5w7anbTAF6UrMPdM4fs8nk9x3Lm5KRIODawC4duQ/edit?usp=sharing) (10pt)

In [None]:
class ResBlockPlain(nn.Module):
    def __init__(self, in_channels):
        super(ResBlockPlain, self).__init__()
        """Initialize the components of a residual block module.

        Illustration: https://docs.google.com/drawings/d/19FS5w7anbTAF6UrMPdM4fs8nk9x3Lm5KRIODawC4duQ/edit?usp=sharing

        Instructions:
            1. Initialize the necessary components as illustrated in the above link.
            2. The initialized components will be used in the `forward` method
               to construct the dynamic computational graph.

        Args:
            1. in_channels (int): Number of channels in the input.
        """
        #################################
        ## P2.1. Write your code here ##
        self.cn1 = nn.Conv2d(in_channels, in_channels, kernel_size = 3, stride = 1, padding = 1, bias = False)
        self.cn2 = nn.Conv2d(in_channels, in_channels, kernel_size = 3, stride = 1, padding = 1, bias = False)
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.bn2 = nn.BatchNorm2d(in_channels)
        self.act = nn.ReLU()
        #################################

    def forward(self, x):
        """Feed-forward the data `x` through the network.

        Instructions:
            1. Construct the feed-forward computational graph as illustrated in the link
               using the initialized components in __init__ method.

        Args:
            1. x (torch.FloatTensor): A tensor of shape (B, in_channels, H, W).

        Returns:
            1. output (torch.FloatTensor): An output tensor of shape (B, in_channels, H, W).
        """
        ################################
        ## P2.2. Write your code here ##
        residual = x
        output = self.cn1(x)
        output = self.bn1(output)
        output = self.act(output)
        output = self.cn2(output)
        output = self.bn2(output)
        output += residual
        output = self.act(output)
        ################################
        return output

### (3) Implement ResBlockBottleneck [(Illustration)](https://docs.google.com/drawings/d/1n2E0TwiWhf1IGdD16-MeQjzUcys_V7ETTzn33j_bEy0/edit?usp=sharing) (10pt)  

In [None]:
class ResBlockBottleneck(nn.Module):
    def __init__(self, in_channels, hidden_channels):
        super(ResBlockBottleneck, self).__init__()
        """Initialize the components of a bottleneck residual block module.

        Illustration: https://docs.google.com/drawings/d/1n2E0TwiWhf1IGdD16-MeQjzUcys_V7ETTzn33j_bEy0/edit?usp=sharing

        Instructions:
            1. Initialize the necessary components as illustrated in the above link.
            2. The initialized components will be used in the `forward` method
               to construct the dynamic computational graph.

        Args:
            1. in_channels (int): Number of channels in the input.
            2. hidden_channels (int): Number of hidden channels produced by the first ConvLayer module.
        """
        #################################
        ## P3.1. Write your code here  ##
        self.cn1 = nn.Conv2d(in_channels, hidden_channels, kernel_size = 1, stride = 1, padding = 0, bias = False)
        self.bn1 = nn.BatchNorm2d(hidden_channels)

        self.cn2 = nn.Conv2d(hidden_channels, hidden_channels, kernel_size = 3, stride = 1, padding = 1, bias = False)
        self.bn2 = nn.BatchNorm2d(hidden_channels)

        self.cn3 = nn.Conv2d(hidden_channels, in_channels, kernel_size =1 , stride = 1, padding = 0, bias = False)
        self.bn3 = nn.BatchNorm2d(in_channels)
        self.act = nn.ReLU()
        #################################

    def forward(self, x):
        """Feed-forward the data `x` through the network.

        Instructions:
            1. Construct the feed-forward computational graph as illustrated in the link
               using the initialized components in __init__ method.

        Args:
            1. x (torch.FloatTensor): A tensor of shape (B, in_channels, H, W).

        Returns:
            1. output (torch.FloatTensor): An output tensor of shape (B, in_channels, H, W).
        """
        ################################
        ## P3.2. Write your code here ##
        residual = x
        output = self.act(self.bn1(self.cn1(x)))
        output = self.act(self.bn2(self.cn2(output)))
        output = self.bn3(self.cn3(output))
        output += residual
        output = self.act(output)
        ################################
        return output

### (4) Implement InceptionBlock [(Illustration)](https://docs.google.com/drawings/d/1I020R1YqVAr8LWKHgm7N5J5fzFpHvx1fqXuAs6z8qyE/edit?usp=sharing) (20pt)

In [None]:
class InceptionBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(InceptionBlock, self).__init__()
        """Initialize the components of a basic InceptionBlock module.

        Illustration: https://docs.google.com/drawings/d/1I020R1YqVAr8LWKHgm7N5J5fzFpHvx1fqXuAs6z8qyE/edit?usp=sharing

        Instructions:
            1. Initialize the necessary components as illustrated in the above link.
            2. The initialized components will be used in the `forward` method
               to construct the dynamic computational graph.

        Args:
            1. in_channels (int): Number of channels in the input.
            2. out_channels (int): Number of channels in the final output.
        """
        assert out_channels%8==0, 'out_channels should be a multiple of 8'

        ################################
        ## P4.1. Write your code here ##
        dim = out_channels//4
        self.cn_1 = nn.Conv2d(in_channels, dim, kernel_size = 1, stride = 1, padding = 0, bias = False)
        self.bn_1 = nn.BatchNorm2d(dim)

        dim = out_channels//2
        self.cn_2 = nn.Conv2d(in_channels, dim, kernel_size = 1, stride = 1, padding = 0, bias = False)
        self.bn_2_0 = nn.BatchNorm2d(dim)
        self.cn3 = nn.Conv2d(dim, dim, kernel_size = 3, stride = 1, padding = 1, bias = False)
        self.bn_2_1 = nn.BatchNorm2d(dim)

        dim = out_channels//8
        self.cn_3 = nn.Conv2d(in_channels, dim, kernel_size = 1, stride = 1, padding = 0, bias = False)
        self.bn_3_0 = nn.BatchNorm2d(dim)
        self.cn5 = nn.Conv2d(dim, dim, kernel_size = 5, stride = 1, padding = 2, bias = False)
        self.bn_3_1 = nn.BatchNorm2d(dim)

        self.mp = nn.MaxPool2d(kernel_size = 3, stride = 1, padding = 1)
        self.cn_4 = nn.Conv2d(in_channels, dim, kernel_size = 1, stride = 1, padding = 0, bias = False)
        self.bn_4 = nn.BatchNorm2d(dim)

        self.act = nn.ReLU()

        self.conv1x1 = nn.Sequential(self.cn_1, self.bn_1, self.act)
        self.conv3x3 = nn.Sequential(self.cn_2, self.bn_2_0, self.act, self.cn3, self.bn_2_1, self.act)
        self.conv5x5 = nn.Sequential(self.cn_3, self.bn_3_0, self.act, self.cn5, self.bn_3_1, self.act)
        self.maxpool = nn.Sequential(self.mp, self.cn_4, self.bn_4, self.act)
        ################################

    def forward(self, x):
        """Feed-forward the data `x` through the module.

        Instructions:
            1. Construct the feed-forward computational graph as illustrated in the link
               using the initialized components in the __init__ method.

        Args:
            1. x (torch.FloatTensor): A tensor of shape (B, in_channels, H, W).

        Returns:
            1. output (torch.FloatTensor): An output tensor of shape (B, out_channels, H, W).

        """
        ################################
        ## P4.2. Write your code here ##
        out1 = self.conv1x1(x)
        out2 = self.conv3x3(x)
        out3 = self.conv5x5(x)
        out4 = self.maxpool(x)
        output = torch.cat([out1, out2, out3, out4], 1)
        ################################
        return output
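
The implementation above concatenates four branches along the channel dimension, so their widths must add up to `out_channels`. The following sketch, in plain Python, mirrors the channel splits used in the cell above (1/4, 1/2, 1/8, 1/8) and checks that they sum correctly. The helper name `inception_branch_channels` is hypothetical, and the branch labels are taken from the attribute names in the cell above.

```python
def inception_branch_channels(out_channels):
    """Channel widths of the four branches, mirroring the splits in the cell above."""
    assert out_channels % 8 == 0, 'out_channels should be a multiple of 8'
    return {
        'conv1x1': out_channels // 4,
        'conv3x3': out_channels // 2,
        'conv5x5': out_channels // 8,
        'maxpool': out_channels // 8,
    }

for out_channels in (16, 32, 64):
    widths = inception_branch_channels(out_channels)
    # Concatenation along dim 1 adds the widths: 1/4 + 1/2 + 1/8 + 1/8 = 1.
    assert sum(widths.values()) == out_channels
    print(out_channels, widths)
```

The smallest share is 1/8, which is why the block asserts that `out_channels` is divisible by 8: otherwise the integer divisions would round down and the concatenated output would have fewer channels than requested.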

## Network class

### (Example) MyNetworkExample

The class `MyNetworkExample` is a sample network using `MLPBlock` implemented above. **You don't have to implement anything in this code section.**

In [None]:
class MyNetworkExample(nn.Module):
    def __init__(self, nf, block_type='mlp'):
        super(MyNetworkExample, self).__init__()
        """Initialize the components of an entire network module.

        Instructions:
            1. Initialize the necessary components.
            2. The initialized components will be used in the `forward` method
               to construct the dynamic computational graph.

        Args:
            1. nf (int): Number of output channels of the MLP block, i.e. the input size of the final nn.Linear module. An abbreviation for num_filter.
            2. block_type (str, optional): Type of blocks to use. ('mlp'. default: 'mlp')
        """
        #######################################
        ## This section is an example.       ##
        if block_type == 'mlp':
            block = MLPBlock
            # Since shape of input image is 3 x 32 x 32, the size of flattened input is 3*32*32.
            self.mlp = block(3*32*32, nf)
            self.fc = nn.Linear(nf, 10)
        else:
            raise Exception(f"Wrong type of block: {block_type}. Expected: mlp")
        #######################################

    def forward(self, x):
        """Feed-forward the data `x` through the network.

        Instructions:
            1. Construct the feed-forward computational graph as illustrated in the link
               using the initialized network components in __init__ method.
        Args:
            1. x (torch.FloatTensor): An image tensor of shape (B, 3, 32, 32).

        Returns:
            1. output (torch.FloatTensor): An output tensor of shape (B, 10).
        """
        #######################################
        ## This section is an example.       ##
        output = self.mlp(x.view(x.size()[0], -1))
        output = self.fc(output)
        return output
        #######################################

### (1) MyNetwork [(Illustration)](https://docs.google.com/drawings/d/1L8PYO8A1EL4BN4bzTWH4ygr-WiS7NDeFz7P1PkhBZwE/edit?usp=sharing) (10pt)

There are two functions to implement in this section. **Read the comments and illustration carefully before you type anything.**

In [None]:
class MyNetwork(nn.Module):
    def __init__(self, nf, block_type='conv', num_blocks=[1, 1, 1]):
        super(MyNetwork, self).__init__()
        """Initialize the components of an entire network module.

        Illustration: https://docs.google.com/drawings/d/1L8PYO8A1EL4BN4bzTWH4ygr-WiS7NDeFz7P1PkhBZwE/edit?usp=sharing

        Instructions:
            1. Initialize the necessary components as illustrated in the above link.
            2. The initialized components will be used in the `forward` method
               to construct the dynamic computational graph.

        Args:
            1. nf (int): Number of output channels for the first nn.Conv2d Module. An abbreviation for num_filter.
            2. block_type (str, optional): Type of blocks to use. ('conv' | 'resPlain' | 'resBottleneck' | 'inception'. default: 'conv')
            3. num_blocks (list or tuple, optional): A list or tuple of length 3.
               Each item at i-th index indicates the number of blocks at i-th Layer.
               (default: [1, 1, 1])
        """

        self.block_type = block_type

        # Define blocks according to block_type
        if self.block_type == 'conv':
            block = ConvBlock
            block_args = lambda x: (x, x, 3, 1, 1)
        elif self.block_type == 'resPlain':
            block = ResBlockPlain
            block_args = lambda x: (x,)
        elif self.block_type == 'resBottleneck':
            block = ResBlockBottleneck
            block_args = lambda x: (x, x//2)
        elif self.block_type == 'inception':
            block = InceptionBlock
            block_args = lambda x: (x, x)
        else:
            raise Exception(f"Wrong type of block: {block_type}")

        # Define block layer by stacking multiple blocks.
        # You don't need to modify it. Just use these block layers in forward function.
        self.block1 = nn.Sequential(*[block(*block_args(nf)) for _ in range(num_blocks[0])])
        self.block2 = nn.Sequential(*[block(*block_args(nf*2)) for _ in range(num_blocks[1])])
        self.block3 = nn.Sequential(*[block(*block_args(nf*4)) for _ in range(num_blocks[2])])

        ################################
        ## P5.1. Write your code here ##
        self.cn1 = nn.Conv2d(3,nf, kernel_size = 3, stride = 1, padding = 1, bias = False)
        self.bn1 = nn.BatchNorm2d(nf)
        self.mp1 = nn.MaxPool2d(kernel_size = 2, stride = 2)

        self.cn2 = nn.Conv2d(nf, nf*2, kernel_size = 3, stride = 1, padding = 1,bias = False)
        self.bn2 = nn.BatchNorm2d(nf*2)
        self.mp2 = nn.MaxPool2d(kernel_size = 2, stride = 2)

        self.cn3 = nn.Conv2d(nf*2, nf*4, kernel_size = 3, stride = 1, padding = 1, bias = False)
        self.bn3 = nn.BatchNorm2d(nf*4)
        self.mp3 = nn.MaxPool2d(kernel_size = 2, stride = 2)

        self.act = nn.ReLU()
        self.avg = nn.AdaptiveAvgPool2d(output_size=(1,1))
        self.flt = nn.Flatten()
        self.lin = nn.Linear(nf*4, 10)
        ################################

    def forward(self, x):
        """Feed-forward the data `x` through the network.

        Instructions:
            1. Construct the feed-forward computational graph as illustrated in the link
               using the initialized network components in __init__ method.
        Args:
            1. x (torch.FloatTensor): An image tensor of shape (B, 3, 32, 32).

        Returns:
            1. output (torch.FloatTensor): An output tensor of shape (B, 10).
        """

        #######################################################################
        ## P5.2. Write your code here                                        ##
        ## Hint : use self.block1, self.block2, self.block3 for block layers ##
        output = self.cn1(x)
        output = self.bn1(output)
        output = self.act(output)
        output = self.mp1(output)
        output = self.block1(output)

        output = self.block2(self.mp2(self.act(self.bn2(self.cn2(output)))))
        output = self.block3(self.mp3(self.act(self.bn3(self.cn3(output)))))
        output = self.lin(self.flt(self.avg(output)))
        #######################################################################
        return output
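
To see how the tensor shape evolves through the `forward` pass above, here is a small shape trace in plain Python. It is a sketch that assumes the layer order used in the cell above (conv, batch norm, ReLU, 2x2 max-pool, then the block layer, three times, followed by global average pooling and a linear classifier). The helper name `trace_shapes` is hypothetical.

```python
def trace_shapes(nf=16, size=32):
    """Return (stage, channels, height/width) after each stage of MyNetwork."""
    stages = [('input', 3, size)]
    for i, channels in enumerate((nf, nf * 2, nf * 4), start=1):
        # 3x3 conv (stride 1, padding 1) keeps the size and sets the channel count.
        stages.append((f'conv{i}+bn+relu', channels, size))
        # 2x2 max-pool with stride 2 halves height and width.
        size //= 2
        stages.append((f'maxpool{i}', channels, size))
        # Every block type maps (C, H, W) to (C, H, W).
        stages.append((f'block{i}', channels, size))
    # Global average pooling collapses each feature map to 1 x 1.
    stages.append(('avgpool', nf * 4, 1))
    return stages

for name, c, s in trace_shapes():
    print(f'{name:>16}: {c} x {s} x {s}')
print(f'{"flatten+linear":>16}: 10 logits per image')
```

With `nf = 16`, the last block layer works on 64 channels at 4 x 4 resolution, and the linear layer receives a 64-dimensional vector per image.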

---

# 2.Experiment with Train/Test Pipeline

This section contains the full train and test loop of the pipeline. Specifically, it will:
1. Feed inputs into the network, get outputs, and compute the classification loss.
2. Backpropagate the computed loss and update the network weights (training loop only).
3. Save TensorBoard logs periodically.
4. Save checkpoint weights periodically.

**There are no modifications necessary in this section.** Run the code and enjoy!
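
To make the classification loss in the first step of the list above concrete, here is a minimal plain-Python sketch of the cross-entropy of a single sample computed from raw logits, -log(softmax(logits)[target]). The pipeline itself uses `nn.CrossEntropyLoss`, which by default averages this quantity over the batch. The function name `cross_entropy` below is a hypothetical helper written for illustration.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# A 10-class classifier with equal logits predicts 1/10 for every class,
# so its loss is ln(10).
uniform_loss = cross_entropy([0.0] * 10, target=3)
print(round(uniform_loss, 3))  # 2.303

# A confident, correct prediction drives the loss towards 0.
confident_loss = cross_entropy([10.0 if i == 3 else 0.0 for i in range(10)], target=3)
print(confident_loss < 0.01)  # True
```

A loss close to 2.3 therefore means the network is doing no better than uniform guessing over the 10 CIFAR10 classes, which is a useful reference point when reading the loss curves.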

## Arguments and Environment Settings

This section contains code that
- defines miscellaneous arguments for our pipeline.
- runs Tensorboard to visualize accuracy and loss curves.

Optionally, you may change `args.ckpt_iter` and `args.log_iter` to save space in your Google Drive.



In [None]:
# Configurations & Hyper-parameters

from easydict import EasyDict as edict

# set manual seeds
torch.manual_seed(470)
torch.cuda.manual_seed(470)

args = edict()

# basic options
args.name = 'main'                   # experiment name.
args.ckpt_dir = 'ckpts'              # checkpoint directory name.
args.ckpt_iter = 1000                # how frequently checkpoints are saved.
args.ckpt_reload = 'best'            # which checkpoint to re-load.
args.gpu = True                      # whether or not to use gpu.

# network options
args.num_filters = 16                # number of output channels in the first nn.Conv2d module in MyNetwork.
args.block_type = 'mlp'              # type of block. ('mlp' | 'conv' | 'resPlain' | 'resBottleneck' | 'inception').
args.num_blocks = [5, 5, 5]          # number of blocks in each Layer.

# data options
args.dataroot = 'dataset/cifar10'    # where CIFAR10 images exist.
args.batch_size = 128                # mini-batch size.

# training options
args.lr = 0.1                        # learning rate.
args.epoch = 100                     # number of training epochs.

# tensorboard options
args.tensorboard = True              # whether or not to use tensorboard logging.
args.log_dir = 'logs'                # to which tensorboard logs will be saved.
args.log_iter = 100                  # how frequently logs are saved.
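
To judge how `args.ckpt_iter` and `args.log_iter` affect the amount of data written to your drive, the short calculation below counts training steps for the settings above. It assumes the 50,000-image CIFAR10 training split and a training loader that drops the last incomplete batch (`drop_last=True`, as in the data loader cell of this notebook). The variable names are local to this sketch.

```python
train_images = 50_000   # size of the CIFAR10 training split
batch_size = 128        # args.batch_size
epochs = 100            # args.epoch
ckpt_iter = 1000        # args.ckpt_iter
log_iter = 100          # args.log_iter

# drop_last=True discards the final incomplete batch of each epoch.
steps_per_epoch = train_images // batch_size
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)           # 390
print(total_steps)               # 39000
print(total_steps // ckpt_iter)  # 39 periodic checkpoints per model
print(total_steps // log_iter)   # 390 logged training points per model
```

Doubling `args.ckpt_iter` roughly halves the number of periodic checkpoint files saved for each model.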

In [None]:
# Basic settings
device = 'cuda' if torch.cuda.is_available() and args.gpu else 'cpu'

result_dir = Path(root) / 'results'
result_dir.mkdir(parents=True, exist_ok=True)

global_step = 0
best_accuracy = 0.

In [None]:
# Define train/test data loaders
# Apply data augmentation to the training set to mitigate overfitting.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))
    ])

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))
    ])

train_dataset = CIFAR10(args.dataroot, download=True, train=True, transform=train_transform)
test_dataset = CIFAR10(args.dataroot, download=True, train=False, transform=test_transform)

train_dataloader = DataLoader(train_dataset, batch_size=args.batch_size, shuffle=True, drop_last=True)
test_dataloader = DataLoader(test_dataset, batch_size=args.batch_size, shuffle=False, drop_last=False)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to dataset/cifar10/cifar-10-python.tar.gz


Extracting dataset/cifar10/cifar-10-python.tar.gz to dataset/cifar10
Files already downloaded and verified
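
Both transforms above end with `transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))`, which applies (x - mean) / std to each channel. Since `ToTensor()` first scales 8-bit pixel values into [0, 1], the network inputs end up in [-1, 1]. The sketch below reproduces this arithmetic for a single value in plain Python. The function `normalize` is a hypothetical stand-in, not the torchvision implementation.

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-channel normalization as applied by transforms.Normalize: (x - mean) / std."""
    return (x - mean) / std

# ToTensor() maps 8-bit pixel values 0..255 into [0, 1] before normalization.
for pixel in (0, 128, 255):
    x = pixel / 255
    print(pixel, round(normalize(x), 3))
```

Black pixels map to -1, white pixels map to 1, and mid-gray lands near 0.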


## Tracking the states with Tensorboard

In the following training stage, losses and accuracies will be logged to TensorBoard, which provides useful data for analyzing the training process.
Use TensorBoard wisely.



In [None]:
# Setup tensorboard.
if args.tensorboard:
    from torch.utils.tensorboard import SummaryWriter
    %load_ext tensorboard
    %tensorboard --logdir "/gdrive/My Drive/{str(result_dir).replace('/gdrive/My Drive/', '')}"
else:
    writer = None


## The Train-and-Test pipeline

In [None]:
def train_net(net, optimizer, scheduler, block_type, writer):
    global_step = 0
    best_accuracy = 0

    for epoch in range(args.epoch):
        # Here starts the train loop.
        net.train()
        for batch_idx, (x, y) in enumerate(train_dataloader):

            global_step += 1

            #  Send `x` and `y` to either cpu or gpu using `device` variable.
            x = x.to(device=device)
            y = y.to(device=device)

            # Feed `x` into the network, get an output, and keep it in a variable called `logit`.
            logit = net(x)

            # Compute accuracy of this batch using `logit`, and keep it in a variable called 'accuracy'.
            accuracy = (logit.argmax(1) == y).float().mean()

            # Compute loss using `logit` and `y`, and keep it in a variable called `loss`.
            loss = nn.CrossEntropyLoss()(logit, y)

            # flush out the previously computed gradient.
            optimizer.zero_grad()

            # backward the computed loss.
            loss.backward()

            # update the network weights.
            optimizer.step()

            if global_step % args.log_iter == 0 and writer is not None:
                # Log loss and accuracy values using `writer`. Use `global_step` as a timestamp for the log.
                writer.add_scalar('train_loss', loss, global_step)
                writer.add_scalar('train_accuracy', accuracy, global_step)

            if global_step % args.ckpt_iter == 0:
                # Save network weights in the directory specified by `ckpt_dir`.
                torch.save(net.state_dict(), f'{ckpt_dir}/{global_step}.pt')

        # Here starts the test loop.
        net.eval()
        with torch.no_grad():
            test_loss = 0.
            test_accuracy = 0.
            test_num_data = 0.
            for batch_idx, (x, y) in enumerate(test_dataloader):
                # Send `x` and `y` to either cpu or gpu using `device` variable.
                x = x.to(device=device)
                y = y.to(device=device)

                # Feed `x` into the network, get an output, and keep it in a variable called `logit`.
                logit = net(x)

                # Compute loss using `logit` and `y`, and keep it in a variable called `loss`.
                loss = nn.CrossEntropyLoss()(logit, y)

                # Compute accuracy of this batch using `logit`, and keep it in a variable called 'accuracy'.
                accuracy = (logit.argmax(dim=1) == y).float().mean()

                test_loss += loss.item()*x.shape[0]
                test_accuracy += accuracy.item()*x.shape[0]
                test_num_data += x.shape[0]

            test_loss /= test_num_data
            test_accuracy /= test_num_data

            if writer is not None:
                # Log loss and accuracy values using `writer`. Use `global_step` as a timestamp for the log.
                writer.add_scalar('test_loss', test_loss, global_step)
                writer.add_scalar('test_accuracy', test_accuracy, global_step)
                writer.flush()

            # Print progress, even when tensorboard logging is disabled.
            print(f'Test result of epoch {epoch}/{args.epoch} || loss : {test_loss:.3f} acc : {test_accuracy:.3f} ')

            # Whenever `test_accuracy` is greater than `best_accuracy`, save network weights with the filename '{block_type}_best.pt' in the directory specified by `ckpt_dir`.
            if test_accuracy > best_accuracy:
                best_accuracy = test_accuracy
                torch.save(net.state_dict(), f'{ckpt_dir}/{block_type}_best.pt')

        scheduler.step()
    return best_accuracy
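
The test loop above multiplies each batch's loss and accuracy by the batch size before dividing by the total number of samples. This matters because the test loader keeps the last, smaller batch (`drop_last=False`): 10,000 test images split into 78 batches of 128 plus one batch of 16. The plain-Python sketch below compares this weighted average with a naive mean over batches. The per-batch accuracies are made-up numbers chosen only for illustration, and `epoch_accuracy` is a hypothetical helper.

```python
def epoch_accuracy(batches):
    """Aggregate (batch_accuracy, batch_size) pairs the way the test loop does."""
    total_correct = sum(acc * n for acc, n in batches)
    total_seen = sum(n for _, n in batches)
    return total_correct / total_seen

# Hypothetical accuracies: 75% on every full batch, 100% on the final small batch.
batches = [(0.75, 128)] * 78 + [(1.0, 16)]

weighted = epoch_accuracy(batches)
naive = sum(acc for acc, _ in batches) / len(batches)

print(f'weighted: {weighted:.4f}')  # weighted: 0.7504
print(f'naive:    {naive:.4f}')     # naive:    0.7532
```

The weighted value equals the fraction of correctly classified test images, while the naive mean gives the 16-image batch the same weight as a 128-image batch.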


## Train Models Through the Pipeline

Training a single model for 100 epochs takes around 40 to 50 minutes. Use this as a guide when planning your experiments.



In [None]:
# Function for weight initialization.
def weight_init(m):
    if isinstance(m, nn.Linear) or isinstance(m, nn.Conv2d):
        torch.nn.init.kaiming_normal_(m.weight)
        if m.bias is not None:
            torch.nn.init.constant_(m.bias, 0)
    elif isinstance(m, nn.BatchNorm2d):
        torch.nn.init.constant_(m.weight, 1)
        torch.nn.init.constant_(m.bias, 0)
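
For reference, `kaiming_normal_` with its default arguments (fan-in mode, gain sqrt(2)) draws weights from a zero-mean normal distribution with standard deviation sqrt(2 / fan_in), where fan_in is the number of inputs feeding one output unit. The plain-Python sketch below evaluates this for a few layers of `MyNetwork` with `nf = 16`. The helper name `kaiming_normal_std` is hypothetical.

```python
import math

def kaiming_normal_std(in_channels, kernel_size=1):
    """Std used by kaiming_normal_ with default arguments: sqrt(2 / fan_in)."""
    fan_in = in_channels * kernel_size * kernel_size
    return math.sqrt(2.0 / fan_in)

# First conv of MyNetwork: 3 input channels, 3x3 kernel, so fan_in = 27.
print(round(kaiming_normal_std(3, 3), 4))
# A 3x3 conv on 64 channels: fan_in = 576.
print(round(kaiming_normal_std(64, 3), 4))
# Final nn.Linear(64, 10): fan_in = 64.
print(round(kaiming_normal_std(64), 4))
```

Layers with a larger fan-in receive proportionally smaller initial weights, which keeps the variance of the activations roughly constant across ReLU layers.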

In [None]:
# List of all block types we will use.
block_types = ['conv','resPlain','resBottleneck','inception']

# Create directory name.
num_trial=0
parent_dir = result_dir / f'trial_{num_trial}'
while parent_dir.is_dir():
    num_trial = int(parent_dir.name.replace('trial_',''))
    parent_dir = result_dir / f'trial_{num_trial+1}'
print(f'Logs and ckpts will be saved in : {parent_dir}')

# Define networks
networks = []
for block_type in block_types:
    if block_type == 'conv':
        args.num_blocks = [10, 10, 10]
    else:
        args.num_blocks = [5, 5, 5]

    if block_type == 'mlp':
        network = MyNetworkExample(64, block_type).to(device)
    else:
        network = MyNetwork(args.num_filters, block_type, args.num_blocks).to(device)

    network.apply(weight_init)
    networks.append(network)

# Count the number of parameters of the models.
# You can use it as an indicator of whether you correctly implemented the model.

correct_params = {'mlp' : 1649354, 'conv' : 510426, 'resPlain' : 510426, 'resBottleneck' : 113946, 'inception' : 124026}
for block_type, net  in zip(block_types, networks):
    # Print the number of parameters in each model.
    num_parameters = sum(p.numel() for p in net.parameters() if p.requires_grad)
    print(f'# of parameters in {block_type} net : {num_parameters}')
    print(f'Correct # of parameters in {block_type} net : {correct_params[block_type]}')

Logs and ckpts will be saved in : /gdrive/My Drive/CS492I/assignment1/results/trial_0
# of parameters in conv net : 510426
Correct # of parameters in conv net : 510426
# of parameters in resPlain net : 510426
Correct # of parameters in resPlain net : 510426
# of parameters in resBottleneck net : 113946
Correct # of parameters in resBottleneck net : 113946
# of parameters in inception net : 124026
Correct # of parameters in inception net : 124026
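
If your count differs from the expected one, it can help to tally the parameters layer by layer. The sketch below shows how the count of a few common layers could be computed by hand. The helper names and the layer sizes are hypothetical and do not correspond to any network in this assignment; whether a layer has a bias term depends on how you construct it.

```python
def conv2d_params(in_ch: int, out_ch: int, kernel_size: int, bias: bool = True) -> int:
    """Learnable parameters of a square 2D convolution."""
    return out_ch * in_ch * kernel_size * kernel_size + (out_ch if bias else 0)

def batchnorm2d_params(num_features: int) -> int:
    """One learnable scale and one learnable shift per channel.
    Running statistics are buffers, so they are not counted."""
    return 2 * num_features

def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Learnable parameters of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

# Hypothetical toy model: conv(3 -> 16, 3x3) + batch norm + linear(64 -> 10).
total = conv2d_params(3, 16, 3) + batchnorm2d_params(16) + linear_params(64, 10)
print(f'# of parameters in toy model : {total}')
```

Comparing such hand-computed counts against `sum(p.numel() for p in block.parameters())` for a single block is one way to locate the layer that causes a mismatch.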


In [None]:
final_accs = {}

# Start training
for block_type, net in zip(block_types, networks):
    try:
        args.name = block_type

        # Define optimizer
        optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=0.9, weight_decay=0.0001)
        scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50,80], gamma=0.5)

        # Create directories for logs and checkpoints.
        ckpt_dir = parent_dir / args.name / args.ckpt_dir
        ckpt_dir.mkdir(parents=True, exist_ok=True)
        log_dir = parent_dir / args.name / args.log_dir
        log_dir.mkdir(parents=True, exist_ok=True)

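        # Create tensorboard writer (left as None when tensorboard logging is disabled,
        # so that `writer` is always defined before it is passed to `train_net`).
        writer = SummaryWriter(log_dir) if args.tensorboard else None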

        # Call the train & test function.
        t1 = time.time()
        accuracy = train_net(net, optimizer, scheduler, block_type, writer)
        t = time.time()-t1
        print(f'Best test accuracy of {block_type} network : {accuracy:.3f} took {t:.3f} secs')
        final_accs[f'{block_type}'] = accuracy*100
    except Exception as e:
        print(e)

# Print final best accuracies of the models.
for key in final_accs.keys():
    print(f'Best accuracy of {key} = {final_accs[key]:.2f}%')


Test result of epoch 0/100 || loss : 1.868 acc : 0.307 
Test result of epoch 1/100 || loss : 1.813 acc : 0.320 
Test result of epoch 2/100 || loss : 1.819 acc : 0.352 
Test result of epoch 3/100 || loss : 1.670 acc : 0.400 
Test result of epoch 4/100 || loss : 1.499 acc : 0.447 
Test result of epoch 5/100 || loss : 1.533 acc : 0.440 
Test result of epoch 6/100 || loss : 1.478 acc : 0.480 
Test result of epoch 7/100 || loss : 1.432 acc : 0.472 
Test result of epoch 8/100 || loss : 1.232 acc : 0.556 
Test result of epoch 9/100 || loss : 1.641 acc : 0.457 
Test result of epoch 10/100 || loss : 1.524 acc : 0.510 
Test result of epoch 11/100 || loss : 1.293 acc : 0.543 
Test result of epoch 12/100 || loss : 1.442 acc : 0.502 
Test result of epoch 13/100 || loss : 1.209 acc : 0.582 
Test result of epoch 14/100 || loss : 1.219 acc : 0.565 
Test result of epoch 15/100 || loss : 1.250 acc : 0.588 
Test result of epoch 16/100 || loss : 1.057 acc : 0.630 
Test result of epoch 17/100 || loss : 1.0
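
The training cell above uses `MultiStepLR` with `milestones=[50,80]` and `gamma=0.5`, so the learning rate is halved once epoch 50 is reached and halved again at epoch 80. The sketch below reproduces that schedule in plain Python. The function name `multistep_lr` is hypothetical, and the base learning rate of 0.1 is an assumption for illustration; the actual value comes from `args.lr`.

```python
from bisect import bisect_right

def multistep_lr(base_lr: float, milestones: list, gamma: float, epoch: int) -> float:
    """Learning rate in effect during `epoch` (0-indexed) under multi-step decay."""
    # Number of milestones that have already been reached at this epoch.
    num_decays = bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** num_decays

base_lr = 0.1  # assumed value; the notebook uses args.lr
for epoch in (0, 49, 50, 79, 80, 99):
    lr = multistep_lr(base_lr, milestones=[50, 80], gamma=0.5, epoch=epoch)
    print(f'epoch {epoch:3d} : lr = {lr}')
```

This assumes `scheduler.step()` is called once at the end of every epoch, as in `train_net`.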

---
# 3. Aggregating experimental results and number of model parameters. (10pt)

In this section, we automatically collect the classification performance of the trained models and count the number of parameters in each of them.
Compare your own results with the values we provide. The number of parameters must match exactly, while the classification accuracy should be within $\pm$1.5% of the expected value.
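
The acceptance criteria can be sketched as a small check. The function name `check_result` is hypothetical, and reading $\pm$1.5% as an absolute difference in percentage points is an assumption. The measured and expected values below are the ones reported in this notebook.

```python
def check_result(acc: float, params: int, expected_acc: float,
                 expected_params: int, tol: float = 1.5):
    """Return (params_ok, acc_ok) for one model.

    Parameters must match exactly; accuracy (in %) must lie within
    `tol` percentage points of the expected value.
    """
    params_ok = params == expected_params
    acc_ok = abs(acc - expected_acc) <= tol
    return params_ok, acc_ok

# (measured acc, measured params, expected acc, expected params)
results = {
    'conv':          (81.47, 510426, 81.9, 510426),
    'resPlain':      (88.48, 510426, 88.6, 510426),
    'resBottleneck': (86.33, 113946, 86.5, 113946),
    'inception':     (84.50, 124026, 83.7, 124026),
}

checks = {name: check_result(*vals) for name, vals in results.items()}
for name, (params_ok, acc_ok) in checks.items():
    print(f'{name:14}| params match: {params_ok} | accuracy within tolerance: {acc_ok}')
```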

In [None]:
block_types = ['conv','resPlain','resBottleneck','inception']
test_accs = {}
test_params= {}

for block_type, net in zip(block_types, networks):
        ckpt_dir = parent_dir / block_type / args.ckpt_dir

        # load weights from best checkpoints.
        ckpt_path = f'{ckpt_dir}/{block_type}_best.pt'
        try:
            net.load_state_dict(torch.load(ckpt_path))
        except Exception as e:
            print(e)

        # Measure test performance.
        net.eval()
        with torch.no_grad():
            test_accuracy = 0.
            test_num_data = 0.
            for batch_idx, (x, y) in enumerate(test_dataloader):
                # Send `x` and `y` to either cpu or gpu using the `device` variable.
                x = x.to(device=device)
                y = y.to(device=device)

                # Feed `x` into the network, get an output, and keep it in a variable called `logit`.
                logit = net(x)

                # Compute loss using `logit` and `y`, and keep it in a variable called `loss`.
                loss = nn.CrossEntropyLoss()(logit, y)

                # Compute accuracy of this batch using `logit`, and keep it in a variable called 'accuracy'.
                accuracy = (logit.argmax(dim=1) == y).float().mean()

                test_accuracy += accuracy.item()*x.shape[0]
                test_num_data += x.shape[0]

            # Average classification accuracy.
            test_accuracy /= test_num_data

            # Count the number of trainable parameters in the model.
            num_parameters = sum(p.numel() for p in net.parameters() if p.requires_grad)

            test_accs[f'{block_type}'] = test_accuracy*100
            test_params[f'{block_type}'] = num_parameters
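
The evaluation loop above multiplies each batch accuracy by the batch size before summing. This matters when the last batch is smaller than the others, because a plain mean over batches would give its samples too much weight. A minimal sketch with made-up numbers (the function name `weighted_accuracy` and the batch values are hypothetical):

```python
def weighted_accuracy(batch_accs, batch_sizes):
    """Dataset-level accuracy from per-batch accuracies and batch sizes."""
    num_correct = sum(acc * n for acc, n in zip(batch_accs, batch_sizes))
    return num_correct / sum(batch_sizes)

# Made-up example: two full batches of 4 samples and a final batch of 2.
batch_accs = [0.5, 0.5, 1.0]
batch_sizes = [4, 4, 2]

weighted = weighted_accuracy(batch_accs, batch_sizes)
naive = sum(batch_accs) / len(batch_accs)

print(f'weighted by batch size : {weighted:.3f}')
print(f'plain mean of batches  : {naive:.3f}')
```

Here 6 of the 10 samples are classified correctly, which is what the weighted version reports; the plain mean overestimates the accuracy.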


In [None]:
# Printing final results.
correct_accs = {'mlp' : 62.6,'conv' : 81.9,'resPlain' : 88.6, 'resBottleneck' : 86.5, 'inception' : 83.7}
correct_params = {'mlp' : 1649354, 'conv' : 510426, 'resPlain' : 510426, 'resBottleneck' : 113946, 'inception' : 124026}

print(' Method        | Accuracy   | # Params    | Expected Acc | Expected # Params  ')
print('------------------------------------------------------------------------------')
for block in block_types:
        print(f' {block:14}| {str(test_accs[block])[:5]:11}| {str(test_params[block]):11} | {str(correct_accs[block])[:5]:13}| {str(correct_params[block]):12}')


 Method        | Accuracy   | # Params    | Expected Acc | Expected # Params  
------------------------------------------------------------------------------
 conv          | 81.47      | 510426      | 81.9         | 510426      
 resPlain      | 88.48      | 510426      | 88.6         | 510426      
 resBottleneck | 86.33      | 113946      | 86.5         | 113946      
 inception     | 84.50      | 124026      | 83.7         | 124026      
