**Assignment 3: Image Classification with Neural Networks**

*CPSC 480/580: Computer Vision*

*Yale University*

*Instructor: Alex Wong*

In this assignment, we will create a simple neural network for classifying images. We will experiment with learning rate, batch size, and different configurations of layers within the network. We will demonstrate this on the CIFAR-10 dataset.


**Prerequisites**:

1. Enable Google Colaboratory as an app on your Google Drive account

2. Create a new Google Colab notebook. This will also create a "Colab Notebooks" directory under "MyDrive", i.e.
```
/content/drive/MyDrive/Colab Notebooks
```

3. Create the following directory structure in your Google Drive
```
/content/drive/MyDrive/Colab Notebooks/CPSC 480-580: Computer Vision/Assignments
```

4. Move the 03_assignment.ipynb into
```
/content/drive/MyDrive/Colab Notebooks/CPSC 480-580: Computer Vision/Assignments
```
so that its absolute path is
```
/content/drive/MyDrive/Colab Notebooks/CPSC 480-580: Computer Vision/Assignments/03_assignment.ipynb
```

5. Prior to starting this assignment, please create a directory called 'data' within your 'Assignments' directory and within 'data' create a directory called 'assignment_03', i.e.
```
/content/drive/MyDrive/Colab Notebooks/CPSC 480-580: Computer Vision/Assignments/data/assignment_03
```

6. Set up GPU runtime by selecting `Runtime` on the top tool bar, then selecting `Change runtime type` in the drop-down menu, selecting `GPU` under Hardware accelerator and clicking `Save`.
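
The directory layout from steps 3 to 5 can also be created from code instead of by hand. The following is a minimal sketch and not part of the assignment: the helper name `create_assignment_dirs` is hypothetical, and a temporary directory stands in for `/content/drive/MyDrive` so that the snippet runs outside of Colab.

```python
import os
import tempfile


def create_assignment_dirs(drive_root):
    '''
    Hypothetical helper that creates the assignment directory tree

    Arg(s):
        drive_root : str
            stand-in for /content/drive/MyDrive
    Returns:
        tuple[str, str] : paths to the Assignments and data/assignment_03 directories
    '''

    assignments_dirpath = os.path.join(
        drive_root,
        'Colab Notebooks',
        'CPSC 480-580: Computer Vision',
        'Assignments')

    data_dirpath = os.path.join(assignments_dirpath, 'data', 'assignment_03')

    # exist_ok=True makes the call safe to repeat if the directories exist
    os.makedirs(data_dirpath, exist_ok=True)

    return assignments_dirpath, data_dirpath


# Assumption: a temporary directory replaces the mounted Google Drive
drive_root = tempfile.mkdtemp()
assignments_dirpath, data_dirpath = create_assignment_dirs(drive_root)

print(assignments_dirpath)
print(data_dirpath)
```

In Colab, `drive_root` would instead be `/content/drive/MyDrive` after the drive has been mounted.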


**Submission**:

1. Implement all TODOs in the code blocks below.

2. Run the Colab Notebook to produce results for each code block.

3. Report the accuracy of the neural network and ResNet18. Your accuracy should exceed 50% for the neural network and 70% for ResNet18.

```
Neural network:
Mean accuracy over 10000 images:

ResNet18:
Mean accuracy over 10000 images:
```

4. Answer the following questions:

```
4a. We have seen that the performance of deep neural networks correlates well with their size, e.g., from AlexNet to VGGNet. Suppose that we increased the number of layers in VGGNet by 100x. With sufficient compute resources, will we observe performance continue to increase? Explain why or why not.

Answer:
```

```
4b. We have seen figures in which convolutional neural networks (CNNs) resemble a Gaussian pyramid. Explain each component (convolutional layer, pooling, etc.) of a CNN and how it corresponds to a Gaussian pyramid, and how CNNs differ from Gaussian pyramids.

Answer:
```

```
4c. Suppose that we have a Bag of (Visual) Words classifier with a perceptron and a CNN classifier. Explain how each component in the Bag of Words classifier relates to the CNN classifier during inference.

Answer:
```

```
4d. List the different types of regularization one can impose on CNNs, and provide an example of each.

Answer:
```

5. List any collaborators.

```
Collaborators: Doe, Jane (Please write names in <Last Name, First Name> format)

Collaboration details: Discussed ... implementation details with Jane Doe.
```



Import packages

In [None]:
from google.colab import drive
from google.colab import auth
from google.auth import default
import os

drive.mount('/content/drive/', force_remount=True)
os.chdir('/content/drive/MyDrive/Colab Notebooks/CPSC 480-580: Computer Vision/Assignments')

Mounted at /content/drive/


In [None]:
import numpy as np
import matplotlib.pyplot as plt

import torch, torchvision

Utility functions for plotting

In [None]:
def config_plot():
    '''
    Function to remove axis ticks and box around figure
    '''

    plt.box(False)
    plt.axis('off')

In [None]:
def plot_images(images, n_row, n_col, subplot_titles, dpi=200, cmap=None):
    '''
    Plot images in a grid

    Arg(s):
        images : list[list[numpy]]
            lists of lists of images
        n_row : int
            number of rows in plot
        n_col : int
            number of columns in plot
        subplot_titles : list[list[str]]
            lists of lists of titles corresponding to each subplot
        dpi : int
            dots per inch for figure
        cmap : matplotlib.Colormap
            colormap to use when displaying images
    '''

    # Instantiate a figure
    fig = plt.figure(dpi=dpi)

    # Iterate through each row of images
    for row_idx in range(n_row):

        # Iterate through each column of row
        for col_idx in range(n_col):

            # Compute subplot index based on row and column indices
            subplot_idx = row_idx * n_col + col_idx + 1

            # Create axis object for current subplot
            ax = fig.add_subplot(n_row, n_col, subplot_idx)

            # Set the subplot title and plot the image with the provided colormap
            ax.set_title(subplot_titles[row_idx][col_idx], fontsize=5)
            ax.imshow(images[row_idx][col_idx], cmap=cmap)

            config_plot()

    fig.subplots_adjust(wspace=0, hspace=0.5)
    plt.show()

Hyper-parameters for training neural network

In [None]:
# TODO: Choose hyper-parameters for neural network or ResNet18
# Note: Accuracy of Neural Network should exceed 52%, ResNet18 should exceed 70%

# Architecture - neural_network or resnet18
ARCHITECTURE = 'neural_network'

# Batch size - number of images within a training batch of one training iteration e.g. 64
N_BATCH = None

# Training epoch - number of passes through the full training dataset e.g. 20
N_EPOCH = None

# Learning rate - step size to update parameters e.g. 1e-1
LEARNING_RATE = None

# Learning rate decay - scaling factor to decrease learning rate at the end of each decay period e.g. 0.10
LEARNING_RATE_DECAY = None

# Learning rate decay period - number of epochs before reducing/decaying learning rate e.g. 5
LEARNING_RATE_DECAY_PERIOD = None
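
To see how the learning rate, decay factor and decay period interact, the sketch below tabulates a step decay schedule in plain Python. It assumes the decay is applied whenever `epoch % decay_period == 0` for `epoch > 0` (epochs counted from 0), and it uses the example values from the comments above (1e-1, 0.10, 5 and 20 epochs) for illustration only, not as tuned settings. The function name `learning_rate_at_epoch` is hypothetical.

```python
import math


def learning_rate_at_epoch(learning_rate, decay, decay_period, epoch):
    '''
    Learning rate in effect during a given 0-indexed epoch under step decay
    '''

    # Number of decay periods completed before this epoch starts
    n_decay = epoch // decay_period

    return learning_rate * (decay ** n_decay)


# Example values only; choose your own for the assignment
schedule = [
    learning_rate_at_epoch(1e-1, 0.10, 5, epoch)
    for epoch in range(20)
]

for epoch in (0, 4, 5, 10, 15, 19):
    print('epoch={:2d}  learning_rate={:.0e}'.format(epoch, schedule[epoch]))
```

With these values the learning rate shrinks by three orders of magnitude over 20 epochs, so a short decay period combined with a small decay factor can effectively stop learning early.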

Define Neural Network

In [None]:
class NeuralNetwork(torch.nn.Module):
    '''
    Neural network class of fully connected layers

    Arg(s):
        n_input_feature : int
            number of input features
        n_output : int
            number of output classes
    '''

    def __init__(self, n_input_feature, n_output):
        super(NeuralNetwork, self).__init__()

        # Create your 6-layer neural network using fully connected layers with ReLU activations
        # https://pytorch.org/docs/stable/generated/torch.nn.Linear.html
        # https://pytorch.org/docs/stable/generated/torch.nn.functional.relu.html
        # https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html

        # TODO: Instantiate 5 fully connected layers
        self.fully_connected_layer_1 = None
        self.fully_connected_layer_2 = None
        self.fully_connected_layer_3 = None
        self.fully_connected_layer_4 = None
        self.fully_connected_layer_5 = None

        # TODO: Define output layer
        self.output = None

    def forward(self, x):
        '''
        Forward pass through the neural network

        Arg(s):
            x : torch.Tensor[float32]
                tensor of N x d
        Returns:
            torch.Tensor[float32]
                N x n_output tensor of predicted class logits
        '''

        # TODO: Implement forward function


        output_logits = None

        return output_logits
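
As a rough guide when choosing layer widths, the sketch below counts the learnable parameters of a stack of fully connected layers, where each layer with `n_in` inputs and `n_out` outputs contributes `n_in * n_out` weights plus `n_out` biases. The input size follows from CIFAR-10 images being 32 x 32 pixels with 3 color channels. The hidden widths and the helper name `count_parameters` are hypothetical choices for illustration, not a recommended architecture.

```python
def count_parameters(layer_sizes):
    '''
    Count weights and biases of consecutive fully connected layers

    Arg(s):
        layer_sizes : list[int]
            number of features at the input and after every layer
    Returns:
        int : total number of learnable parameters
    '''

    n_parameter = 0

    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # Weight matrix of n_in x n_out plus one bias per output
        n_parameter += n_in * n_out + n_out

    return n_parameter


# A vectorized CIFAR-10 image has 3 * 32 * 32 values
n_input_feature = 3 * 32 * 32

# Assumption: 5 hidden layers of arbitrary widths and a 10-class output layer
layer_sizes = [n_input_feature, 512, 256, 128, 64, 32, 10]

print('Input features: {}'.format(n_input_feature))
print('Parameters: {}'.format(count_parameters(layer_sizes)))
```

Most of the parameters sit in the first layer, because the vectorized image is much wider than any of the hidden layers.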


In [None]:
class ResNetBlock(torch.nn.Module):
    '''
    Basic ResNet block class

    Arg(s):
        in_feature : int
            number of input channels
        out_channels : int
            number of output channels
        stride : int
            stride of convolution
    '''

    def __init__(self,
                 in_feature,
                 out_channels,
                 stride=1):
        super(ResNetBlock, self).__init__()

        # TODO: Implement ResNet block based on
        # Deep Residual Learning for Image Recognition: https://arxiv.org/pdf/1512.03385.pdf

        self.conv1 = None

        self.conv2 = None

        self.conv3 = None

        self.projection = None

    def forward(self, x):
        '''
        Forward input x through a basic ResNet block

        Arg(s):
            x : torch.Tensor[float32]
                N x C x H x W input tensor
        Returns:
            torch.Tensor[float32] : N x K x h x w output tensor
        '''

        # TODO: Implement forward function

        return None


class ResNet18(torch.nn.Module):
    '''
    ResNet18 convolutional neural network

    Arg(s):
        n_input_feature : int
            number of channels in input data
        n_output : int
            number of output classes
    '''

    def __init__(self, n_input_feature, n_output):
        super(ResNet18, self).__init__()

        # TODO: Implement ResNet
        # Based on https://arxiv.org/pdf/1512.03385.pdf

    def forward(self, x):
        '''
        Forward input x through the ResNet18 network

        Arg(s):
            x : torch.Tensor[float32]
                N x C x H x W input tensor
        Returns:
            torch.Tensor[float32] : N x n_output tensor of class logits
        '''

        # TODO: Implement forward function

        return None
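
When a ResNet block uses a stride larger than 1 or changes the number of channels, the input can no longer be added directly to the block output, which is why the block carries a projection for the skip connection (the paper uses 1 x 1 convolutions for this). The sketch below only checks the spatial sizes involved, using the standard convolution output size formula with dilation 1. The helper name `conv_output_size` is hypothetical, and the 32 x 32 input corresponds to a CIFAR-10 image.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    '''
    Spatial output size of a convolution along one dimension (dilation of 1)
    '''

    return (size + 2 * padding - kernel_size) // stride + 1


size = 32

# 3 x 3 convolution with padding 1 and stride 1 preserves the resolution
size_same = conv_output_size(size, kernel_size=3, stride=1, padding=1)

# The same convolution with stride 2 halves the resolution
size_halved = conv_output_size(size, kernel_size=3, stride=2, padding=1)

# A 1 x 1 projection with stride 2 on the skip connection matches the halved size
size_projection = conv_output_size(size, kernel_size=1, stride=2, padding=0)

print(size_same, size_halved, size_projection)
```

Because the strided convolution and the strided projection produce the same spatial size, their outputs can be summed elementwise once the channel counts also agree.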

Define training loop

In [None]:
def train(net,
          dataloader,
          n_epoch,
          optimizer,
          learning_rate_decay,
          learning_rate_decay_period,
          device):
    '''
    Trains the network using a learning rate scheduler

    Arg(s):
        net : torch.nn.Module
            neural network or ResNet
        dataloader : torch.utils.data.DataLoader
            https://pytorch.org/docs/stable/data.html
            dataloader for training data
        n_epoch : int
            number of epochs to train
        optimizer : torch.optim
            https://pytorch.org/docs/stable/optim.html
            optimizer to use for updating weights
        learning_rate_decay : float
            rate of learning rate decay
        learning_rate_decay_period : int
            period to reduce learning rate based on decay e.g. every 2 epochs
        device : str
            device to run on
    Returns:
        torch.nn.Module : trained network
    '''

    device = 'cuda' if device == 'gpu' or device == 'cuda' else 'cpu'
    device = torch.device(device)

    # TODO: Move model to device
    net = None

    # TODO: Define cross entropy loss
    # https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
    loss_func = None

    for epoch in range(n_epoch):

        # Accumulate total loss for each epoch
        total_loss = 0.0

        # TODO: Decrease learning rate when learning rate decay period is met
        # e.g. decrease learning rate by a factor of decay rate every 2 epochs
        # by modifying optimizer.param_groups
        if epoch and epoch % learning_rate_decay_period == 0:
            pass

        for batch, (images, labels) in enumerate(dataloader):

            # TODO: Move images and labels to device
            images = None
            labels = None

            # TODO: Vectorize images
            images = None

            # TODO: Forward through the network
            outputs = None

            # TODO: Clear gradients so we don't accumulate them from previous batches


            # TODO: Compute loss and update parameters by backpropagation
            loss = None

            # TODO: Accumulate total loss for the epoch
            total_loss = None

        # TODO: Compute mean loss over the epoch
        mean_loss = None

        # Log average loss over the epoch
        print('Epoch={}/{}  Loss: {:.3f}'.format(epoch + 1, n_epoch, mean_loss))

    return net
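
The training loop relies on cross entropy computed directly from the raw network outputs (logits); `torch.nn.CrossEntropyLoss` expects logits and, by default, averages the loss over the batch. The sketch below reproduces that computation in plain Python for a tiny hand-made batch so the numbers can be checked by hand. The logits and labels are made up for illustration, and the function name `cross_entropy` is hypothetical.

```python
import math


def cross_entropy(logits, label):
    '''
    Cross entropy of a single sample from raw logits and an integer label
    '''

    # Subtract the max logit before exponentiating for numerical stability
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits))

    # Negative log of the softmax probability assigned to the true class
    return log_sum_exp - logits[label]


# Made-up batch of 2 samples with 3 classes each
batch_logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
batch_labels = [0, 2]

losses = [
    cross_entropy(logits, label)
    for logits, label in zip(batch_logits, batch_labels)
]

# Mean over the batch, analogous to the default reduction
mean_loss = sum(losses) / len(losses)

print('Per-sample loss: {}'.format(losses))
print('Mean loss: {:.3f}'.format(mean_loss))
```

The second sample has uniform logits, so its loss equals the log of the number of classes. For CIFAR-10 with 10 classes, an untrained network that guesses uniformly would therefore start near a loss of log(10), roughly 2.3.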


Define evaluation loop

In [None]:
def evaluate(net, dataloader, class_names, device):
    '''
    Evaluates the network on a dataset

    Arg(s):
        net : torch.nn.Module
            neural network
        dataloader : torch.utils.data.DataLoader
            https://pytorch.org/docs/stable/data.html
            dataloader for testing data
        class_names : list[str]
            list of class names to be used in plot
        device : str
            device to run on
    '''

    device = 'cuda' if device == 'gpu' or device == 'cuda' else 'cpu'
    device = torch.device(device)

    # TODO: Move model to device
    net = None

    n_correct = 0
    n_sample = 0

    # Make sure we do not backpropagate
    with torch.no_grad():

        for (images, labels) in dataloader:

            # TODO: Move images and labels to device
            images = None
            labels = None

            # TODO: Vectorize images
            images = None

            # TODO: Forward through the network and take the class with max response
            outputs = None

            # Accumulate number of samples
            n_sample = n_sample + labels.shape[0]

            # TODO: Check if our prediction is correct
            n_correct = None

    # TODO: Compute mean accuracy
    mean_accuracy = None

    print('Mean accuracy over {} images: {:.3f}%'.format(n_sample, mean_accuracy))

    # TODO: Convert the last batch of images back to original shape
    images = None

    # TODO: Move images back to cpu and to numpy array
    images = None

    # TODO: torch.Tensor images are ordered as (N x C x H x W); convert them to (N x H x W x C)
    images = None

    # TODO: Move the last batch of labels to cpu and convert them to numpy and
    # map them to their corresponding class labels
    labels = None

    # TODO: Move the last batch of outputs to cpu, convert them to numpy and
    # map them to their corresponding class labels
    outputs = None


    # Convert images, outputs and labels to lists of lists
    grid_size = 5

    images_display = []
    subplot_titles = []

    for row_idx in range(grid_size):
        # TODO: Get start and end indices of a row
        idx_start = None
        idx_end = None

        # TODO: Append images from start to end to image display array


        # TODO: Append text of 'output={}\nlabel={}' substituted with output and label to subplot titles
        titles = None

        subplot_titles.append(titles)

    # TODO: Plot images with class names and corresponding groundtruth label in a 5 by 5 grid
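
The two bookkeeping steps in the evaluation loop, picking the class with the maximum response and arranging the last batch into a square grid, can be illustrated without any tensors. The sketch below uses plain Python lists as stand-ins for network outputs and images. The helper names `mean_accuracy` and `split_into_rows` are hypothetical, and the scores are made up.

```python
def mean_accuracy(batch_scores, labels):
    '''
    Percentage of samples whose highest scoring class matches the label
    '''

    n_correct = 0

    for scores, label in zip(batch_scores, labels):
        # Index of the maximum response is the predicted class
        prediction = scores.index(max(scores))
        n_correct += int(prediction == label)

    return 100.0 * n_correct / len(labels)


def split_into_rows(items, grid_size):
    '''
    Split a flat list of grid_size * grid_size items into rows for plotting
    '''

    rows = []

    for row_idx in range(grid_size):
        idx_start = row_idx * grid_size
        idx_end = idx_start + grid_size
        rows.append(items[idx_start:idx_end])

    return rows


# Made-up scores for 2 samples and 2 classes; only the first is correct
accuracy = mean_accuracy([[0.1, 0.9], [0.8, 0.2]], [1, 1])

# Integers stand in for the 25 images of the last test batch
rows = split_into_rows(list(range(25)), grid_size=5)

print('Mean accuracy: {:.3f}%'.format(accuracy))
print(rows)
```

A batch of 25 fills the 5 by 5 grid exactly, which is why the test dataloader uses that batch size.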


Training a neural network for image classification

In [None]:
'''
Set up dataloading
'''
# Create transformations to apply to data during training
# https://pytorch.org/docs/stable/torchvision/transforms.html
transforms_train = torchvision.transforms.Compose([
    # TODO: Include random brightness, contrast, saturation between [0.8, 1.2] and
    # horizontal flip augmentations
    torchvision.transforms.ToTensor(),
])

# Download and set up CIFAR10 training set using preconfigured torchvision.datasets.CIFAR10
cifar10_train = torchvision.datasets.CIFAR10(
    root=os.path.join('data', 'assignment_03'),
    train=True,
    download=True,
    transform=transforms_train)

# TODO: Set up a dataloader (iterator) to fetch from the training set using
# torch.utils.data.DataLoader and set shuffle=True, drop_last=True, num_workers=2
dataloader_train = None

# Define the possible classes in CIFAR10
class_names = [
    'plane',
    'car',
    'bird',
    'cat',
    'deer',
    'dog',
    'frog',
    'horse',
    'ship',
    'truck'
]

# CIFAR10 has 10 classes
n_class = len(class_names)

'''
Set up model and optimizer
'''
# TODO: Compute number of input features depending on ARCHITECTURE
if ARCHITECTURE == 'neural_network':
    n_input_feature = None
elif ARCHITECTURE == 'resnet18':
    n_input_feature = None

# TODO: Instantiate neural network or ResNet18 depending on ARCHITECTURE
if ARCHITECTURE == 'neural_network':
    net = None
elif ARCHITECTURE == 'resnet18':
    net = None

# TODO: Set up SGD optimizer with the chosen learning rate
# https://pytorch.org/docs/stable/optim.html?#torch.optim.SGD
optimizer = None

'''
Train network and store weights
'''
# TODO: Set network to training mode

# TODO: Train network with device='cuda'
net = None

# TODO: Save weights into checkpoint
None
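
Setting `drop_last=True` on the training dataloader discards the final incomplete batch, which changes the number of iterations per epoch. The sketch below works out that count for the 50000 CIFAR-10 training images. The batch size of 64 is the example value from the hyper-parameter comments, not a required setting, and the helper name `n_batch_per_epoch` is hypothetical.

```python
import math


def n_batch_per_epoch(n_sample, batch_size, drop_last):
    '''
    Number of batches a dataloader yields in one pass over the dataset
    '''

    if drop_last:
        # The trailing incomplete batch is discarded
        return n_sample // batch_size

    # The trailing incomplete batch is kept as a smaller batch
    return math.ceil(n_sample / batch_size)


# CIFAR-10 training set size
n_train_sample = 50000

# Example batch size only
batch_size = 64

n_batch_dropped = n_batch_per_epoch(n_train_sample, batch_size, drop_last=True)
n_batch_kept = n_batch_per_epoch(n_train_sample, batch_size, drop_last=False)

# Images that are skipped in each epoch when the last batch is dropped
n_sample_skipped = n_train_sample - n_batch_dropped * batch_size

print(n_batch_dropped, n_batch_kept, n_sample_skipped)
```

Since the training dataloader shuffles, a different handful of images is skipped in each epoch, so every image is still seen over the course of training.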

In [None]:
'''
Set up dataloading
'''
# TODO: Create transformations to apply to data during testing
# https://pytorch.org/docs/stable/torchvision/transforms.html
transforms_test = None

# TODO: Download and set up CIFAR10 testing set using
# preconfigured torchvision.datasets.CIFAR10
cifar10_test = None

# TODO: Set up a dataloader (iterator) to fetch from the testing set using
# torch.utils.data.DataLoader and set shuffle=False, drop_last=False, num_workers=2
# Set batch_size to 25
dataloader_test = None

'''
Set up model
'''
# TODO: Compute number of input features depending on ARCHITECTURE


# TODO: Instantiate neural network or ResNet18 depending on ARCHITECTURE

'''
Restore weights and evaluate network
'''
# TODO: Load network from checkpoint
checkpoint = None

# TODO: Set network to evaluation mode


# TODO: Evaluate network on testing set with device='cuda'

