# Homework 2: Convolutional Neural Networks (100 points)

### Overview

With new knowledge of convolutional neural networks, we can accomplish a more difficult image recognition task. The CIFAR-10 classification dataset consists of 60,000 labelled images split between 10 classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships and trucks.

For this assignment, we compare two models on the same dataset: a fully connected neural network (as in Homework 1), called ANN, and a new convolutional architecture, called CNN, outlined in the next section. To keep the comparison fair, we attempt to give the ANN a comparable number of trainable parameters to the CNN, which means the ANN uses the same input transformation as in Homework 1: each image is converted to grayscale and flattened. The CNN gets the full benefit of the original 2D image in RGB.
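As a rough illustration of the ANN input transformation, the sketch below collapses the three color channels into one and flattens each image into a 1,024-element vector, mirroring the `torch.sum(...).view(...)` step in the training loop later in this notebook. The random batch is a stand-in for real CIFAR-10 images and is an assumption for illustration only.

```python
import torch

# Stand-in batch of 4 RGB images, shape (batch, channels, height, width).
batch = torch.rand(4, 3, 32, 32)

# Sum over the channel dimension to get one grayscale-like value per pixel,
# then flatten each 32x32 image into a vector of 1024 values.
ann_inputs = torch.sum(batch, dim=1).view(-1, 32 * 32)

print(batch.shape)       # shape seen by the CNN
print(ann_inputs.shape)  # shape seen by the ANN
```

Note that summing the channels gives a value proportional to their unweighted mean, not a perceptually weighted grayscale.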

### CNN Architecture

Each image consists of 32x32 RGB pixel values between 0 and 255. Beyond converting the images to tensors and normalizing each channel, we do not need to perform any preprocessing, as the convolutional model uses all three channels concurrently as input.

The architecture has 5 layers: a convolutional layer followed by a pooling layer, then another convolutional layer, then two fully connected (dense) layers. The last of these has 10 neurons, one per class, to provide the classification output. (In the code below, the pooling layer is also applied after the second convolutional layer.)
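To see how the image shrinks as it passes through these layers, the sketch below traces the spatial size step by step. It assumes the settings used in the model code later in this notebook (5x5 kernels with stride 1 and no padding, and 2x2 max pooling with stride 2 after each convolution); the helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                  # input images are 32x32
size = conv_out(size, kernel=5)            # first 5x5 convolution  -> 28
size = conv_out(size, kernel=2, stride=2)  # 2x2 max pooling        -> 14
size = conv_out(size, kernel=5)            # second 5x5 convolution -> 10
size = conv_out(size, kernel=2, stride=2)  # 2x2 max pooling        -> 5
print(size)  # 5
```

This is where the `outChannels * 5 * 5` input size of the first dense layer comes from: each of the `outChannels` feature maps is 5x5 by the time it is flattened.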

### Your Task

At the bottom of this notebook file, there are four short answer questions testing your understanding of this neural network architecture. As before, some questions will require you to experiment with model hyperparameters.

Below each question is a cell with the text “Type Markdown and LaTeX.” Double-click the cell and type your response to the question. Save your responses by clicking the floppy disk icon or choosing File - Save and Checkpoint.

After responding to the questions, download your notebook as a `.html` file by choosing File - Download as - html (.html). You will be submitting this `.html` file to your instructor for grading.

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms

In [2]:
torch.manual_seed(0)
torch.set_num_threads(4)
torch.set_num_interop_threads(4)

In [13]:
trainTransform = transforms.Compose([#add yours here!
                                     transforms.RandomRotation(5),
                                     transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
                                    ])
testTransform = transforms.Compose([transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
                                    ])
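To make the effect of the two shared transforms concrete: `ToTensor` scales 8-bit pixel values from [0, 255] to [0, 1], and `Normalize` with mean 0.5 and standard deviation 0.5 then maps each channel to [-1, 1]. The plain-Python sketch below reproduces that arithmetic for a few pixel values; the helper name `normalize_pixel` is hypothetical.

```python
def normalize_pixel(value, mean=0.5, std=0.5):
    """Mimic ToTensor (divide by 255) followed by Normalize for one channel value."""
    scaled = value / 255.0          # ToTensor: [0, 255] -> [0, 1]
    return (scaled - mean) / std    # Normalize: [0, 1] -> [-1, 1]

for pixel in (0, 128, 255):
    print(pixel, normalize_pixel(pixel))
```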

In [14]:
root_dir = 'assets_week2'
trainDataset = torchvision.datasets.CIFAR10(root=root_dir, train=True, download=True, transform=trainTransform)
trainLoader = torch.utils.data.DataLoader(trainDataset, batch_size=4, shuffle=True, num_workers=2)
testDataset = torchvision.datasets.CIFAR10(root=root_dir, train=False, download=True, transform=testTransform)
testLoader = torch.utils.data.DataLoader(testDataset, batch_size=4, shuffle=False, num_workers=2)

Files already downloaded and verified
Files already downloaded and verified


In [15]:
class ANNModel(nn.Module):
    def __init__(self, hiddenSize, dropoutRate, activate):
        super().__init__()
        # Note that 'layer' and 'dense' differ only in name (to show similarity to CNN)
        self.activate = nn.Sigmoid() if activate == "Sigmoid" else nn.ReLU()
        self.layer1 = nn.Linear(1024, 100)
        self.layer2 = nn.Linear(100, 15 * 5 * 5)
        self.dense1 = nn.Linear(15 * 5 * 5, hiddenSize)
        self.dropout = nn.Dropout(dropoutRate)
        self.dense2 = nn.Linear(hiddenSize, 10)
        
    def forward(self, x):
        x = self.activate(self.layer1(x))
        x = self.activate(self.layer2(x))
        x = self.dropout(self.activate(self.dense1(x)))
        return self.dense2(x)

class CNNModel(nn.Module):
    def __init__(self, hiddenSize, outChannels, dropoutRate, activate):
        super().__init__()
        self.outChannels = outChannels
        self.activate = nn.Sigmoid() if activate == "Sigmoid" else nn.ReLU()
        self.conv1 = nn.Conv2d(3, 24, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(24, outChannels, 5)
        self.dense1 = nn.Linear(outChannels * 5 * 5, hiddenSize)
        self.dropout = nn.Dropout(dropoutRate)
        self.dense2 = nn.Linear(hiddenSize, 10)

    def forward(self, x):
        x = self.pool(self.activate(self.conv1(x)))
        x = self.pool(self.activate(self.conv2(x)))
        x = x.view(-1, self.outChannels * 5 * 5)
        x = self.dropout(self.activate(self.dense1(x)))
        return self.dense2(x)
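To check how close the two models are in size, the sketch below counts trainable parameters by hand from the layer shapes above. It assumes the default `hiddenSize = 100` and `numFilters = 25` from the hyperparameter cell; the helper names are hypothetical, and pooling, dropout, and activation layers are omitted because they have no trainable parameters.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def conv_params(c_in, c_out, kernel):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return c_in * c_out * kernel * kernel + c_out

hidden_size, out_channels = 100, 25  # assumed: the notebook's default settings

ann_total = (linear_params(1024, 100)
             + linear_params(100, 15 * 5 * 5)
             + linear_params(15 * 5 * 5, hidden_size)
             + linear_params(hidden_size, 10))

cnn_total = (conv_params(3, 24, 5)
             + conv_params(24, out_channels, 5)
             + linear_params(out_channels * 5 * 5, hidden_size)
             + linear_params(hidden_size, 10))

print(ann_total)  # 178985
print(cnn_total)  # 80459
```

With these defaults the totals are of the same order of magnitude but not identical, and the CNN total changes whenever `numFilters` or `hiddenSize` is changed.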

In [16]:
# Number of neurons in the first fully-connected layer
hiddenSize = 100
# Number of feature filters in second convolutional layer
numFilters = 25
# Dropout rate
dropoutRate = 0
# Activation function
activation = "ReLU"
# Learning rate
learningRate = 0.001
# Momentum for SGD optimizer
momentum = 0.9
# Number of training epochs
numEpochs = 10
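For intuition about the learning rate and momentum settings, the sketch below applies the SGD-with-momentum update to a single scalar parameter: a velocity accumulates past gradients (`velocity = momentum * velocity + grad`) and the parameter moves by `learning_rate * velocity`. The quadratic objective and starting value are assumptions chosen only to keep the example small.

```python
learning_rate = 0.001
momentum = 0.9

def grad(w):
    """Gradient of the toy objective f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

w, velocity = 0.0, 0.0
for step in range(5000):
    velocity = momentum * velocity + grad(w)  # accumulate past gradients
    w = w - learning_rate * velocity          # step against the velocity

print(round(w, 3))  # close to 3.0, the minimizer of the toy objective
```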

In [17]:
ann = ANNModel(hiddenSize, dropoutRate, activation)
cnn = CNNModel(hiddenSize, numFilters, dropoutRate, activation)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(list(ann.parameters()) + list(cnn.parameters()), lr=learningRate, momentum=momentum)

print('>>> Beginning training!') 
ann.train()
cnn.train()
for epoch in range(numEpochs):  # loop over the dataset multiple times
    annRunningLoss, cnnRunningLoss = 0, 0
    for i, (inputs, labels) in enumerate(trainLoader, 0):
        annInputs = torch.sum(inputs, axis=1).view(-1, 32*32)
        
        optimizer.zero_grad()

        # Forward propagation
        annOutputs = ann(annInputs)
        cnnOutputs = cnn(inputs)
        
        # Backpropagation
        annLoss = criterion(annOutputs, labels)
        cnnLoss = criterion(cnnOutputs, labels)
        annLoss.backward()
        cnnLoss.backward()
        
        # Gradient update
        optimizer.step()

        annRunningLoss += annLoss.item()
        cnnRunningLoss += cnnLoss.item()
        if (i+1) % 2000 == 0:    # print every 2000 mini-batches
            print('Epoch [{}/{}], Step [{}/{}], ANN Loss: {}, CNN Loss: {}'.format(epoch + 1, numEpochs, i + 1, len(trainLoader), annRunningLoss/2000, cnnRunningLoss/2000))
            annRunningLoss, cnnRunningLoss = 0, 0

print()
print('>>> Beginning validation!')
ann.eval()
cnn.eval()
annCorrect, cnnCorrect = 0, 0
total = 0
with torch.no_grad():  # gradients are not needed for evaluation
    for inputs, labels in testLoader:
        annInputs = torch.sum(inputs, axis=1).view(-1, 32*32)
        annOutputs = ann(annInputs)
        cnnOutputs = cnn(inputs)
        # The predicted class is the index of the largest output score
        _, annPredicted = torch.max(annOutputs, 1)
        _, cnnPredicted = torch.max(cnnOutputs, 1)
        total += labels.size(0)
        annCorrect += (annPredicted == labels).sum().item()
        cnnCorrect += (cnnPredicted == labels).sum().item()
print('ANN validation accuracy: {}%, CNN validation accuracy: {}%'.format(annCorrect / total * 100, cnnCorrect / total * 100))

>>> Beginning training!
Epoch [1/10], Step [2000/12500], ANN Loss: 2.0863615140616893, CNN Loss: 1.9889075910151004
Epoch [1/10], Step [4000/12500], ANN Loss: 1.9419296706914901, CNN Loss: 1.6322043964415789
Epoch [1/10], Step [6000/12500], ANN Loss: 1.861800346046686, CNN Loss: 1.4897670658156277
Epoch [1/10], Step [8000/12500], ANN Loss: 1.8459010553061963, CNN Loss: 1.454917872980237
Epoch [1/10], Step [10000/12500], ANN Loss: 1.813395178437233, CNN Loss: 1.3596445694752037
Epoch [1/10], Step [12000/12500], ANN Loss: 1.795057803362608, CNN Loss: 1.33835771510005
Epoch [2/10], Step [2000/12500], ANN Loss: 1.7331264152526855, CNN Loss: 1.2524616960138082
Epoch [2/10], Step [4000/12500], ANN Loss: 1.728240273565054, CNN Loss: 1.239610386964865
Epoch [2/10], Step [6000/12500], ANN Loss: 1.7069842965602875, CNN Loss: 1.2036444773636759
Epoch [2/10], Step [8000/12500], ANN Loss: 1.7102003564983606, CNN Loss: 1.1794171325024216
Epoch [2/10], Step [10000/12500], ANN Loss: 1.6928471272289753
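As a reference point for reading the loss values above: cross-entropy loss is the negative log of the softmax probability assigned to the correct class, so a model that spreads probability evenly over 10 classes scores ln(10), about 2.30. The plain-Python sketch below computes this for a single example; the function name `cross_entropy` and the logits are illustrative assumptions, not part of the notebook.

```python
import math

def cross_entropy(logits, label):
    """Negative log-softmax probability of the correct class for one example."""
    shift = max(logits)  # subtract the max for numerical stability
    log_sum = math.log(sum(math.exp(z - shift) for z in logits))
    return -(logits[label] - shift - log_sum)

uniform = cross_entropy([0.0] * 10, label=3)            # even guess over 10 classes
confident = cross_entropy([0.0] * 9 + [5.0], label=9)   # most mass on the right class

print(round(uniform, 4))    # 2.3026
print(confident < uniform)  # True
```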

## Homework Questions

**To make sure your code produces consistent results, it is advisable to click "Kernel -> Restart & Run All" every time you want to run your code.**

### Question 1: CNN Advantage (10 points)

Compute the accuracy of a simple dense neural network and a simple CNN on the dataset. Explain the results and briefly overview the advantages of a CNN over a standard neural network for image-related tasks.

The CNN performed better than the dense neural network (~69% validation accuracy compared to ~42%). CNNs tend to outperform standard fully connected networks on image-related tasks for two reasons: a fully connected network must learn a separate weight for every input pixel, which means a very large number of parameters, and it discards the spatial structure of the image when the input is flattened. Spatial proximity matters when interpreting images, and CNNs preserve that structure by sliding small spatial filters over the image, reusing the same filter weights at every position. CNNs also reduce the dimensionality of the image data (through convolution and pooling operations) before feeding it to the fully connected layers.
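The parameter argument can be made concrete with a little arithmetic. The sketch below compares the first convolutional layer of the CNN (3 input channels, 24 filters, 5x5 kernels) with a hypothetical fully connected layer that maps the same 3x32x32 input to the same 24x28x28 output; the dense layer is an assumption for comparison and does not appear in the notebook.

```python
# First convolutional layer: one 5x5x3 filter per output channel, plus a bias each.
conv_weights = 24 * 3 * 5 * 5 + 24

# Hypothetical dense layer with the same input and output sizes.
n_in = 3 * 32 * 32      # every input value
n_out = 24 * 28 * 28    # every value in the conv layer's output
dense_weights = n_in * n_out + n_out

print(conv_weights)                   # 1824
print(dense_weights)                  # 57821568
print(dense_weights // conv_weights)  # 31700
```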

### Question 2: Dropout Rate (25 points)

Explain the purpose of dropout in any neural network model. In doing so, note what can happen if the dropout rate is too high and what can happen if the dropout rate is too low.

Dropout randomly selects activations in a hidden layer and sets them to zero during each training iteration, so each iteration effectively trains a simpler sub-network. Its purpose is to prevent the model from relying on any single node when making predictions, which reduces effective model complexity and acts as regularization. If the dropout rate is too high, too much information is discarded: the model converges slowly and its performance suffers (underfitting). If the dropout rate is too low, it provides little regularization, so generalization does not improve and we run the risk of overfitting.
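A minimal plain-Python sketch of the mechanism, assuming the "inverted dropout" convention that `nn.Dropout` uses: surviving activations are scaled by 1/(1 - rate) during training, and the layer passes values through unchanged in evaluation mode. The function name `dropout` and the input values are illustrative.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not training or rate == 0:
        return list(activations)      # evaluation mode: pass values through
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
hidden = [1.0] * 10

print(dropout(hidden, rate=0.5, rng=rng))                  # mix of 0.0 and 2.0
print(dropout(hidden, rate=0.5, training=False, rng=rng))  # unchanged
```

The rescaling keeps the expected activation the same in training and evaluation, which is why the notebook only needs to call `ann.eval()` and `cnn.eval()` before validation.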

### Question 3: Kernel Size (25 points)

Explain the purpose of spatial filters (kernels) in a CNN. Additionally, explain where they fit into the overall architecture of the CNN in this coding example. Finally, explain what can happen if the kernel size is too large and what can happen if the kernel size is too small.

A kernel in a CNN is a small matrix of learned weights that slides over the input; at each position it takes the dot product with the region of the input volume it covers, and the resulting dot products form the output feature map. Its purpose is to detect local patterns in the image. In this example, the kernel size is set in the convolutional layers: it is 5 in both the first and second convolutional layers (`nn.Conv2d(3, 24, 5)` and `nn.Conv2d(24, outChannels, 5)`). If the kernel size is too large, you miss fine, low-level details of the image, which can lead to underfitting. Conversely, if the kernel size is too small, you may miss more global features of the input data and run the risk of overfitting.
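The sliding dot product can be sketched in plain Python for a single-channel image. This is a simplified illustration, assuming a square image, a square kernel, stride 1, and no padding; like `nn.Conv2d`, it does not flip the kernel. The function name `convolve2d` and the toy image are hypothetical.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the dot products."""
    k = len(kernel)
    out_size = len(image) - k + 1
    output = []
    for row in range(out_size):
        output_row = []
        for col in range(out_size):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[row + i][col + j] * kernel[i][j]
            output_row.append(total)
        output.append(output_row)
    return output

# 4x4 image with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1]] * 4
# 2x2 kernel that responds to a dark-to-bright step from left to right.
kernel = [[-1, 1],
          [-1, 1]]

print(convolve2d(image, kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is largest exactly where the kernel sits over the edge, which is the sense in which a filter "detects" a local feature.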

### Question 4: Data Augmentation (40 points)

Use the code snippet provided in the next box to implement data augmentation by updating the contents of box 3 and re-running the model. Compare your accuracy without and with data augmentation and explain the results. In doing so, explain the purpose of data augmentation.

In [None]:
transforms.RandomRotation(5),

In Question 4 we added data augmentation to our model. Data augmentation adds variety to the training data by applying random transformations (such as rotation, scaling, or brightness changes) to the existing images; here the transform is applied each time an image is loaded, so the model sees slightly different versions of each image across epochs. This pushes the model to generalize, because it is harder to overfit to more diverse training data. In this case, after adding data augmentation the ANN validation accuracy increased very little (42.12% to 43.28%) and the CNN validation accuracy decreased slightly (68.66% to 67.40%). This is a good reminder that adding a regularization technique does not always increase accuracy.
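The sketch below illustrates the idea of augmenting at load time in plain Python, loosely mirroring how a torchvision dataset applies its `transform` when a sample is fetched. The class name `AugmentedDataset` is hypothetical, and a horizontal flip is swapped in for the rotation used in this notebook because it is simpler to write without an image library.

```python
import random

class AugmentedDataset:
    """Hypothetical wrapper that applies a random transform on each access."""

    def __init__(self, images, flip_probability=0.5, seed=0):
        self.images = images
        self.flip_probability = flip_probability
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.images)  # augmentation does not add stored samples

    def __getitem__(self, index):
        image = self.images[index]
        if self.rng.random() < self.flip_probability:
            # Horizontal flip: reverse every row of the image.
            image = [row[::-1] for row in image]
        return image

images = [[[1, 2, 3],
           [4, 5, 6]]]          # one tiny 2x3 "image"
dataset = AugmentedDataset(images)

print(len(dataset))             # 1
for epoch in range(4):
    print(dataset[0])           # original or mirrored, chosen at random
```

The stored dataset never grows; the extra variety comes from drawing a fresh random transform every time a sample is requested.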