<a href="https://colab.research.google.com/github/aspenjkmorgan/CNN_in_pytorch/blob/main/CSCI491_DL_Fall2024_Assignment4_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment 4 - Convolutional Neural Networks for Computer Vision (v2)

## *Aspen Morgan*

Note: this assignment falls under collaboration Mode 2: Individual Assignment – Collaboration Permitted. Please refer to the syllabus for additional information.

General instructions for software assignments can be found on Moodle.

**Introduction.**  In this exercise you will utilize convolutional neural networks to solve problems in computer vision, such as image classification, object detection, and segmentation.  

**Instructions.** As usual, please submit your code and its output as a pdf file, generated from a Jupyter notebook.  I recommend you complete this assignment in Google CoLab [(link)](https://colab.research.google.com/), but it is also certainly possible to complete it in a local IDE if you first install pytorch (instructions not included here).  The assignment will be divided into "Problems", which will be indicated below along with the number of points awarded for completion.  We will begin the assignment by importing important libraries.

## **PROBLEM 1 (40 Total Points)**

**Part (a) (10 points)**
You will begin this problem by setting up a baseline neural network model, and helper functions, that you developed in the last assignment.  Therefore this first part will mostly involve running code that I provide to you here, or utilizing code that you developed in the previous assignment.  Start by importing the software libraries below.

In [None]:
# You will need the following libraries to complete the assignment
import torch
from torch import nn
from torch.utils.data import DataLoader
import torch.nn.functional as F

import torchvision
import torchvision.transforms as transforms
from torchvision import datasets
from torchvision.transforms import ToTensor

import matplotlib.pyplot as plt
import numpy as np

import torch.optim as optim

**Now load the MNIST Data**, along with built-in PyTorch data loaders. Run the following code to load the MNIST data.

In [None]:
# Fill in the details for the "transform" variable
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.05), (0.05))])

# We will use a relatively large batch size of 128 here to
#  accelerate the training process
batch_size = 128

# Download the MNIST dataset and data loaders
trainset = torchvision.datasets.MNIST(root='./data', train=True,
                                        download=True, transform=transform)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.MNIST(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

# Label the classes
classes = ('zero', 'one', 'two', 'three',
           'four', 'five', 'six', 'seven', 'eight', 'nine')

**Construct a Baseline Model**.  Our baseline model will be a fully-connected neural network with 8 total layers of parameters.  Aside from the output layer, each layer has 50 hidden units, with ReLU activations.  We will call this network *NetFC*.  Note that this is simply the first model that you were asked to create in the previous assignment, however I have provided the code for you below.

In [3]:
class NetFc(nn.Module):
    def __init__(self):
      super().__init__()
      self.fc1 = nn.Linear(28*28, 50)
      self.fc2 = nn.Linear(50, 50)
      self.fc3 = nn.Linear(50, 50)
      self.fc4 = nn.Linear(50, 50)
      self.fc5 = nn.Linear(50, 50)
      self.fc6 = nn.Linear(50, 50)
      self.fc7 = nn.Linear(50, 50)
      self.fc8 = nn.Linear(50, 10)


    def forward(self, x):

      x = torch.flatten(x, 1)
      x1 = F.relu(self.fc1(x))
      x2 = F.relu(self.fc2(x1))
      x3 = F.relu(self.fc3(x2))
      x4 = F.relu(self.fc4(x3))
      x5 = F.relu(self.fc5(x4))
      x6 = F.relu(self.fc6(x5))
      x7 = F.relu(self.fc7(x6))

      output = self.fc8(x5)

      # Return the output of the network
      return output

**Import Helper Functions** In the last assignment you were required to create two functions: *trainMyModel* and *testMyModel*.  You will need to re-use these functions again here and you can paste them here and run them.  To keep the notebook a little cleaner, I import these two functions from another python file called *dl_assignment4_helper_functions*, below.  I use the prefix *hlp* to call these functions. It is up to you whether you paste these functions into the notebook, or import them.  However, note that the code skeletons below assume that they are imported with the *hlp* prefix, and you will have to remove/modify the prefix if you don't import them in a similar fashion.

In [None]:
def trainMyModel(net, lr, trainloader, n_epochs=2, useGPU=False):
  optimizer = optim.Adam(net.parameters(), lr)
  criterion = nn.CrossEntropyLoss()

  # Attempt to put your neural network onto the GPU
  if useGPU == True:
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    net.to(device)

  for epoch in range(n_epochs):  # loop over the dataset multiple times

      running_loss = 0.0
      for i, data in enumerate(trainloader, 0):
          # get the inputs; data is a list of [inputs, labels]
          # inputs is one batch of 128 images, each [1, 28, 28]
          if  useGPU == True:
            inputs, labels = data[0].to(device), data[1].to(device)
          else:
            inputs, labels = data

          # zero the parameter gradients
          optimizer.zero_grad()

          # forward + backward + optimize
          outputs = net(inputs)
          loss = criterion(outputs, labels)
          loss.backward()
          optimizer.step()

          # Don't forget, your function must print out the training loss on each
          # 100th mini-batch
          running_loss += loss.item()
          if i % 100 == 99:    # print every 100 mini-batches
              print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 100:.3f}')
              running_loss = 0.0

  print('Finished Training')
  return net

def testMyModel(trainedNet, testloader):
  correct = 0
  total = 0

  # since we're not training, we don't need to calculate the gradients for our outputs
  with torch.no_grad():
      for data in testloader:
          images, labels = data

          # calculate outputs by running images through the network
          outputs = trainedNet(images)

          # the class with the highest energy is what we choose as prediction
          _, predicted = torch.max(outputs.data, 1)
          total += labels.size(0)
          correct += (predicted == labels).sum().item()

  acc = round((100 * correct / total), 2)
  return acc

**Train and Test the Baseline Model** Now Run your *NetFC* model using a learning rate of 0.01 for 2 epochs.  These values are chosen because they work relatively well.  This model should usually achieve around 94% accuracy, and this will serve as our baseline.

In [45]:
# Train your model.
net = NetFc();
lr = 0.01;
n_epochs = 2;
trainedNet = trainMyModel(net, lr, trainloader, n_epochs);

  self.pid = os.fork()


[1,   100] loss: 0.680
[1,   200] loss: 0.313
[1,   300] loss: 0.281
[1,   400] loss: 0.264


  self.pid = os.fork()


[2,   100] loss: 0.214
[2,   200] loss: 0.206
[2,   300] loss: 0.200
[2,   400] loss: 0.213
Finished Training


In [48]:
# Test your model
accuracy = testMyModel(trainedNet,testloader)
print(f'The accuracy was {accuracy} percent');

The accuracy was 94.62 percent


**Part (b) (20 points)** Now we will see the advantages of convolutional structures in a deep neural network.  Below, fill in the template to create convolutional neural network, called 'NetCnn' that has the following structure:

layer1: 8 3x3 convolutional filters, one pixel of zero-padding, and stride of one

layer2: 16 3x3 convolutional filters, one pixel of zero-padding, and stride of one

layer3: 2x2 max pooling, with stride of 2. No zero-padding.

layer4: 32 3x3 convolutional filters, one pixel of zero-padding, and stride of one

layer5: 64 3x3 convolutional filters, one pixel of zero-padding, and stride of one

layer6: 2x2 max pooling, with stride of 2. No zero-padding.

layer7: a fully connected layer of 50 neurons.

Layer8: a fully connected layer of 10 neurons.

In [8]:
# Convolutional model - adding in convolutional layers

# torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0,
#   dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)

# torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

class NetCnn(nn.Module):
    def __init__(self):
      super().__init__()
      self.conv1 = nn.Conv2d(1, 8, (3, 3), padding=1) # out: 8 channel of 28x28
      self.conv2 = nn.Conv2d(8, 16, (3, 3), padding=1) # out: 16 channels of 28x28
      self.pool1 = nn.MaxPool2d((2, 2), stride=2) # out: 16 channels of 14x14
      self.conv3 = nn.Conv2d(16, 32, (3, 3), padding=1) # out: 32 channels of 14x14
      self.conv4 = nn.Conv2d(32, 64, (3, 3), padding=1) # out: 64 channels of 14x14
      self.pool2 = nn.MaxPool2d((2, 2), stride=2) # out: 64 channels of 7x7
      self.fc1 = nn.Linear(64*7*7, 50) # out: 1 channel flattened with 50 nodes
      self.fc2 = nn.Linear(50, 10) # out: 10 nodes corresponding to the 10 classes

    def forward(self, x):
      x1 = F.relu(self.conv1(x))
      x2 = F.relu(self.conv2(x1))
      x3 = F.relu(self.pool1(x2))
      x4 = F.relu(self.conv3(x3))
      x5 = F.relu(self.conv4(x4))
      x6 = F.relu(self.pool2(x5))
      xbuf = torch.flatten(x6, 1) # flatten before passing into fc layer
      x7 = F.relu(self.fc1(xbuf))
      output = self.fc2(x7)

      # Return the output of the network
      return output

Now train the model for 2 epochs using your *trainMyModel* function, which should report the loss every 100 iterations, as was requested in the last assignment.  Then, using *testMyModel* function, evaluate the accuracy of your trained model on the test set. If done correctly, you should obtain around 97% accuracy on the testing set, a relatively significant improvement over *NetFc* if you consider how much error remains.  Note that you may need to tune the learning rate a little bit to achieve this level of accuracy.  

Note that we could add skip connections to 'NetCnn' as well, which would further improve its performance, but this is a little tricky and it will not be part of this assignment.  

In [12]:
# Train your model.
net = NetCnn()
lr = 0.01
n_epochs = 2;

trainedNet = trainMyModel(net,lr,trainloader,n_epochs)

[1,   100] loss: 0.648
[1,   200] loss: 0.118
[1,   300] loss: 0.086
[1,   400] loss: 0.085
[2,   100] loss: 0.075
[2,   200] loss: 0.070
[2,   300] loss: 0.073
[2,   400] loss: 0.065
Finished Training


In [13]:
# Test your model
acc = testMyModel(trainedNet,testloader)
print(f'The accuracy was {acc} percent');

The accuracy was 97.92 percent


**Part (c) (10 points)**  Compute the number of parameters in the NetFC model and the NetCnn model, respectively, as described in UDL.  Please show your work, and then report your final answer in scientific notation $x \times 10^y$ where you need to fill in $x$ and $y$.  You need only report $x$ reported to one decimal place, and $y$ should be an integer.  You will be primarily graded on a correct order of magnitude, $y$.

**NetFC Model**:
For a fully connected model, the parameters (weights + bias terms) of each layer will be (in_nodes + bias_node)x(out_nodes).

Therefore:
$((28*28 + 1)*50 + 7((50 + 1)*50) + ((50 + 1)*10))$ = $5.8 x 10^4$


**NetCnn Model**:
In a convolutional layer, the kernel_size * num_of_filters * num_input_layers + num_of_filters = weights for that layer. Note, the number of filters = the number of output layers. There are no weights in a max pooling layer since it does not need to be trained, it simply chooses the largest value in the pool each time. And the equation for a fully connected layer is above.

Therefore: $(1*8*3*3 + 8) + (8*16*3*3 + 16) + (16*32*3*3 + 32) + (32*64*3*3 + 64) + ((64*7*7 + 1) * 50) + ((50 + 1)*10)$ = $1.8x10^5$

**Part (d) (10 POINTS)**  In this last subproblem, you will add batch normalization layers to your network.  Batch normalization, and its variants (e.g., "layer norm") are another structure that is now widely-used in modern deep neural networks.  In this problem you will design a neural network called *NetCnnBn* with the exact same structure as 'NetCnn' except you will add two batch normalization layers in the following locations: (i) after the 2nd convolutional layer, and (ii) after the 1st fully connected layer.  

Train your NetCnnBn for 2 epochs using your *trainMyModel* function, and then report its accuracy on the test set using the *testMyModel* function.  If done properly, you should be now be able to achieve approximately 99% accuracy on the testing dataset after two epochs of training.  NOte that you may need to adjust the learning rate again.  Despite this significant performance improvement, note that batch normalization contributes a very small number of parameters.  In our case, for example, it adds $<200$ parameters.

In [14]:
# Convolutional model - adding in batch norm
class NetCnnBn(nn.Module):
    def __init__(self):
      super().__init__()
      self.conv1 = nn.Conv2d(1, 8, (3, 3), padding=1) # out: 8 channel of 28x28
      self.conv2 = nn.Conv2d(8, 16, (3, 3), padding=1) # out: 16 channels of 28x28
      self.batch1 = nn.BatchNorm2d(16) # 2d here
      self.pool1 = nn.MaxPool2d((2, 2), stride=2) # out: 16 channels of 14x14
      self.conv3 = nn.Conv2d(16, 32, (3, 3), padding=1) # out: 32 channels of 14x14
      self.conv4 = nn.Conv2d(32, 64, (3, 3), padding=1) # out: 64 channels of 14x14
      self.pool2 = nn.MaxPool2d((2, 2), stride=2) # out: 64 channels of 7x7
      self.fc1 = nn.Linear(64*7*7, 50) # out: 1 channel flattened with 50 nodes
      self.batch2 = nn.BatchNorm1d(50) # 1d now that we've flattended
      self.fc2 = nn.Linear(50, 10) # out: 10 nodes corresponding to the 10 classes

    def forward(self, x):
      x1 = F.relu(self.conv1(x))
      x2 = F.relu(self.batch1(self.conv2(x1))) # 1st batch norm
      x3 = F.relu(self.pool1(x2))
      x4 = F.relu(self.conv3(x3))
      x5 = F.relu(self.conv4(x4))
      x6 = F.relu(self.pool2(x5))
      xbuf = torch.flatten(x6, 1)
      x7 = F.relu(self.batch2(self.fc1(xbuf))) # 2nd batch norm
      output = self.fc2(x7)

      # Return the output of the network
      return output

In [21]:
# Train your model.
net = NetCnnBn();
lr = 0.005;
n_epochs = 2;

trainedNet = trainMyModel(net,lr,trainloader,n_epochs);

[1,   100] loss: 0.295
[1,   200] loss: 0.066
[1,   300] loss: 0.054
[1,   400] loss: 0.053
[2,   100] loss: 0.036
[2,   200] loss: 0.032
[2,   300] loss: 0.036
[2,   400] loss: 0.035
Finished Training


In [22]:
# Test your model
acc = testMyModel(trainedNet,testloader)
print(f'The accuracy was {acc} percent');

The accuracy was 99.26 percent


## **PROBLEM 2 (20 Total Points)**

In this problem you will investigate transfer learning.  Load in a resnet18 model, and initialize its training with weights that were pre-trained on the ImageNet dataset.  Call this model *pretrainedResNet* As a hint, you cannot simply apply a pre-trained resnet18 to this problem; you will need to make two changes to the model structure for it to work properly.  

Once you have made the proper modifications (fill in code below), train and test your model on the MNIST data, as you have done with previous models.  If done properly, you should only require a few lines of code, and you should usually obtain around 97% accuracy with 1 epoch of training and the learning rate provided below (lr = 0.0001).  You only need to show your results with these settings. Unfortunately the MNIST dataset is not ideal for demonstrating the tremendous benefits of transfer learning, but this exercise will help familiarize you with the process of adapting pre-trained models to a custom task, which is important in practice.   

For this problem I highly recommend that you use a GPU because training willbe  relatively slowly without it (e.g., a couple minutes for 1 epoch, depending upon your hardware).  With a GPU the training should generally run very quickly, finishing in under 30 seconds or less.  Note that you can procure a free GPU to use on Google Colab, however, you are given a limited GPU compute per day unless you pay. Consequently I strongly recommend that you debug on a CPU before deploying onto the GPU.  

In [36]:
# Please load a pre-trained resnet-18 model and make the necessary changes so that it will work on the MNIST problem
from torchvision.models import resnet18, ResNet18_Weights
preTrainedResNet = resnet18(weights=ResNet18_Weights)
preTrainedResNet



ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

Two changes:
1. We want 10 output variables (classes) rather than 1000
2. We have B/W images (1 channel) vs 3 input channels to first layer

In [37]:
# update first layer (conv1)
preTrainedResNet.conv1 = nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

# update last layer
preTrainedResNet.fc = nn.Linear(512, 10)

In [38]:
preTrainedResNet

ResNet(
  (conv1): Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In [39]:
# Train and test your model
lr = 0.0001;
n_epochs = 1;
trainedNet = trainMyModel(preTrainedResNet,lr,trainloader,n_epochs, useGPU=True);

[1,   100] loss: 0.834
[1,   200] loss: 0.219
[1,   300] loss: 0.147
[1,   400] loss: 0.117
Finished Training


In [40]:
acc = testMyModel(trainedNet,testloader)
print(f'The accuracy was {acc} percent');

The accuracy was 97.07 percent
