# ML NN Project: Convolutional Neural Network

In this project you will complete the provided Python code, using PyTorch, to perform image classification on the Fashion-MNIST and MNIST Digits datasets using various CNN architectures. Below we provide the code skeleton for working with the Fashion-MNIST dataset. After completing that part of the project, you will repeat the experiments with the MNIST Digits dataset in Part B below.

## **Important!**
**Make sure you change your runtime type to GPU!** In the top menu, go to "Runtime -> Change runtime type", then select T4 GPU from the dropdown.


## **Part A - Code skeleton for experimenting with the Fashion-MNIST dataset**

## Imports, Data, and Hyperparameters

### Imports

In [None]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

import numpy
import matplotlib.pyplot as plt

import time

print(torch.__version__)

### Data (for Fashion-MNIST)

In [None]:
# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)
print()

batch_size = 64

train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

# get info about data
for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break


### Global Hyperparameters

In [None]:
LEARNING_RATE = 0.001
BATCH_SIZE = 32  # not used by the DataLoaders above, which were built with batch_size = 64
EPOCHS = 25

## **Part A.1 - The Effect of Filters**

In this section you will be comparing the relative performance and training times for two different CNN architectures.

Both architectures will have the following layers in order:

```
Conv2D <- input layer
ReLU
Conv2D
ReLU
MaxPooling2D((2,2))
Conv2D
ReLU
Conv2D
ReLU
MaxPooling2D((2,2))
Flatten
Linear(1024 nodes)
ReLU
Linear(num_classes) <- output layer
```

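In ```build_model_1A()```, the first two Conv2D layers should have 16 filters and the last two Conv2D layers should have 32 filters.

In ```build_model_1B()```, the first two Conv2D layers should have 64 filters and the last two Conv2D layers should have 128 filters.

All the Conv2D layers should have ```padding="same"``` and ```kernel_size=(3,3)``` passed as parameters. The first Conv2D layer takes a single input channel, because the images are grayscale (each batch has shape [N, 1, 28, 28]).

Finally, both max pooling layers should have a pool size of (2, 2), i.e. ```kernel_size=(2, 2)``` in ```nn.MaxPool2d```. Since the "same"-padded Conv2D layers keep the 28×28 spatial size and each pool halves it, the input to Flatten is 7×7 per channel. The first Linear layer therefore needs ```in_features``` equal to the number of filters in the last Conv2D layer × 7 × 7, and ```num_classes``` is 10.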
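Because every Conv2D layer uses "same" padding, only the pooling layers shrink the image, and that determines how many inputs the first Linear layer needs. The sketch below works this out with the output-size formula from the ```nn.Conv2d``` and ```nn.MaxPool2d``` documentation. The helper names ```out_size``` and ```flattened_features``` are hypothetical, and the sketch assumes 28×28 inputs and stride-1 convolutions.

```python
# Output-size formula shared by nn.Conv2d and nn.MaxPool2d (dilation = 1):
#   out = floor((in + 2 * padding - kernel) / stride) + 1
def out_size(size, kernel, stride=1, padding=0):
    return (size + 2 * padding - kernel) // stride + 1

def flattened_features(last_filters, image_size=28, num_pools=2):
    # Assumes 28x28 inputs, 3x3 stride-1 convs, and 2x2 pooling (as in Models 1A/1B)
    size = image_size
    for _ in range(num_pools):
        # Two 3x3 convs with padding="same" (padding=1 at stride 1) keep the size
        size = out_size(size, kernel=3, padding=1)
        size = out_size(size, kernel=3, padding=1)
        # 2x2 max pooling; nn.MaxPool2d's stride defaults to its kernel size
        size = out_size(size, kernel=2, stride=2)
    return last_filters * size * size

print(flattened_features(32))   # 1568  (Model 1A: 32 x 7 x 7)
print(flattened_features(128))  # 6272  (Model 1B: 128 x 7 x 7)
```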

**Documentation for each layer type:**

*Conv2D*: https://docs.pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d

*MaxPooling2D*: https://docs.pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html

*Flatten*: https://docs.pytorch.org/docs/stable/generated/torch.nn.modules.flatten.Flatten.html

*Linear*: https://docs.pytorch.org/docs/stable/generated/torch.nn.Linear.html

*ReLU*: https://docs.pytorch.org/docs/stable/generated/torch.nn.ReLU.html

**Documentation for Sequential model**: https://docs.pytorch.org/docs/stable/generated/torch.nn.Sequential.html
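Since ```nn.Sequential``` modules can be nested, a repeated Conv2D/ReLU/MaxPooling pattern can be written once and reused. The sketch below uses a hypothetical ```conv_block``` helper (not part of the skeleton) and a dummy all-zeros batch to check the output shapes of a 16-then-32-filter stack like the one described for Model 1A. The Linear layers are left out.

```python
import torch
from torch import nn

def conv_block(in_channels, out_channels):
    # Hypothetical helper: two 3x3 "same" convolutions with ReLU, then 2x2 max pooling
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=(3, 3), padding="same"),
        nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=(3, 3), padding="same"),
        nn.ReLU(),
        nn.MaxPool2d((2, 2)),
    )

x = torch.zeros(4, 1, 28, 28)   # dummy batch shaped like Fashion-MNIST: [N, C, H, W]

block = conv_block(1, 16)
print(block(x).shape)           # torch.Size([4, 16, 14, 14])

# Nested Sequentials compose; Flatten turns [N, 32, 7, 7] into [N, 1568]
features = nn.Sequential(conv_block(1, 16), conv_block(16, 32), nn.Flatten())
print(features(x).shape)        # torch.Size([4, 1568])
```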

### ***TODO*** Model 1A

In [None]:
def build_model_1A():
  # !!! Define Model 1A !!!
  raise NotImplementedError("Define Model 1A")

### ***TODO*** Model 1B

In [None]:
def build_model_1B():
  # !!! Define Model 1B !!!
  raise NotImplementedError("Define Model 1B")

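### ***TODO*** Model Creation and Summaries

Create the two models, send them to the GPU, and print summaries of the two architectures.

PyTorch has no separate compile step. The optimizer and loss function are set up in the ```perf_and_test()``` helper below, which uses Adam as the optimizer with ```LEARNING_RATE```, cross-entropy loss (```nn.CrossEntropyLoss```, the PyTorch counterpart of sparse categorical crossentropy with ```from_logits=True```), and accuracy as the only performance metric. Because ```nn.CrossEntropyLoss``` applies log-softmax internally, your output layer should return raw logits, without a softmax.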
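PyTorch's ```nn.CrossEntropyLoss``` expects raw logits of shape [batch, num_classes] and integer class labels, which is what Keras calls sparse categorical crossentropy with ```from_logits=True```. The small sketch below, using made-up logits, checks that it matches the mean negative log-softmax of the true class, shows that adding a softmax before the loss changes the value, and computes accuracy with the same ```argmax``` comparison used in ```perf_and_test()```.

```python
import torch
from torch import nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])  # made-up raw outputs: [batch=2, num_classes=3]
targets = torch.tensor([0, 2])            # integer class labels, not one-hot

loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits, targets)

# Same value by hand: mean negative log-probability of each true class
manual = -F.log_softmax(logits, dim=1)[torch.arange(len(targets)), targets].mean()
print(torch.allclose(loss, manual))  # True

# Adding a softmax before the loss squashes the logits and inflates the loss
double = loss_fn(torch.softmax(logits, dim=1), targets)
print(double.item() > loss.item())   # True

# Accuracy via the same argmax comparison used in perf_and_test()
accuracy = (logits.argmax(1) == targets).type(torch.float).mean().item()
print(accuracy)                       # 1.0
```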

In [None]:
# !!! Get accelerator device !!!
device = torch.accelerator.current_accelerator().type if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

# !!! Send networks to gpu !!!
model_1a = build_model_1A().to(device)
model_1b = build_model_1B().to(device)

# !!! Print summaries !!!
print(model_1a)
print(model_1b)

### Testing Helper Method

Method to help with testing. **Do not change.**

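This method uses Adam as the optimizer with ```LEARNING_RATE``` and cross-entropy loss (```nn.CrossEntropyLoss```, the PyTorch counterpart of sparse categorical crossentropy) as the loss function. It reads the global ```EPOCHS``` and ```LEARNING_RATE``` values each time it is called, and it continues training from the model's current weights, so rebuild a model before retraining it with new hyperparameters.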

*CrossEntropyLoss*: https://docs.pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html

In [None]:
def perf_and_test(model, model_name):
  loss_fn = nn.CrossEntropyLoss()
  optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
  losses = []
  accuracies = []
  batches = [0]
  start = time.time()

  model.train()
  optimizer.zero_grad()
  for epoch in range(EPOCHS):
    for batch, (X, y) in enumerate(train_dataloader):
      X, y = X.to(device), y.to(device)

      # Compute prediction error
      pred = model(X)
      loss = loss_fn(pred, y)

      # Backpropagation
      loss.backward()
      optimizer.step()
      optimizer.zero_grad()

      loss, current = loss.item(), (batch + 1) * len(X)
      correct = (pred.argmax(1) == y).type(torch.float).sum().item()
      accuracy = correct / len(X)
      losses.append(loss)
      accuracies.append(accuracy)
      if (epoch != 0) or (batch != 0):
        batches.append(batches[-1] + 1)
      if batch % 500 == 0:
        print(f"batch: {batch}, loss: {loss:<7.4f}, accuracy: {accuracy:<5.2f}")
  print()
  print("" + model_name + " testing information")
  print("Training took ", round((time.time() - start),2), "seconds")

  print("Testing statistics")
  print(accuracies[0:10])
  size = len(test_dataloader.dataset)
  num_batches = len(test_dataloader)
  model.eval()
  test_loss, correct = 0, 0
  with torch.no_grad():
    for X, y in test_dataloader:
      X, y = X.to(device), y.to(device)
      pred = model(X)
      test_loss += loss_fn(pred, y).item()
      correct += (pred.argmax(1) == y).type(torch.float).sum().item()
  test_loss /= num_batches
  correct /= size
  print(f"Test Error:")
  print(f"Avg loss: {test_loss:>5.2f}, Accuracy: {(100*correct):>7.4f}%")

  fig,ax = plt.subplots()
  ax.plot(batches, accuracies, color="red")
  ax.set_xlabel("batches",fontsize=14)
  ax.set_ylabel("accuracy", color="red", fontsize=14)
  ax2=ax.twinx()
  ax2.plot(batches, losses, color="blue")
  ax2.set_ylabel("loss", color="blue", fontsize=14)
  plt.title("Training Acc. and Loss: " + model_name)
  plt.show()

### ***TODO*** Global Hyperparameters

Start with a number of epochs that seems too high (e.g., between 30 and 80). After training, examine the loss plot to identify the point of diminishing returns, then set ```EPOCHS``` to that number of epochs. Note that the plot's x-axis counts batches, not epochs. Rebuild the models before retraining so that training starts from fresh weights, and note any changes in test accuracy.
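On the loss plot, one epoch spans ```len(train_dataloader)``` batches (938 for 60,000 training images at a batch size of 64). As a rough aid, the sketch below averages a list of per-batch losses into per-epoch means and reports how many epochs pass before one more epoch improves the mean loss by less than 1%. This is a hypothetical heuristic, not part of the project: ```perf_and_test()``` does not return its ```losses``` list, so you would need to collect the values yourself, and the 1% threshold is an arbitrary assumption.

```python
import math

def epoch_means(batch_losses, batches_per_epoch):
    # Average consecutive chunks of per-batch losses, one value per epoch
    means = []
    for start in range(0, len(batch_losses), batches_per_epoch):
        chunk = batch_losses[start:start + batches_per_epoch]
        means.append(sum(chunk) / len(chunk))
    return means

def diminishing_returns_epoch(epoch_losses, min_rel_improvement=0.01):
    # Hypothetical heuristic: number of epochs after which one more epoch lowers
    # the mean loss by less than min_rel_improvement (relative); threshold is arbitrary
    for epoch in range(1, len(epoch_losses)):
        prev, cur = epoch_losses[epoch - 1], epoch_losses[epoch]
        if (prev - cur) / prev < min_rel_improvement:
            return epoch
    return len(epoch_losses)  # no plateau found: keep all epochs

# Batches per epoch for 60,000 training images at batch size 64 (last batch is partial)
print(math.ceil(60_000 / 64))  # 938

# Synthetic per-batch losses: 4 epochs of 2 batches each
fake_losses = [4.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0]
means = epoch_means(fake_losses, batches_per_epoch=2)
print(means)                             # [4.0, 2.0, 1.0, 1.0]
print(diminishing_returns_epoch(means))  # 3
```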

In [None]:
EPOCHS = 50

### Testing models

The following code cell runs both models and provides statistics for the training time, test accuracy, and test loss of each model. The loss function is cross-entropy loss. The cell also generates graphs of training accuracy and loss over time, which you can use in your report.


In [None]:
perf_and_test(model_1a, "Model 1A")
perf_and_test(model_1b, "Model 1B")

## **Part A.2 - Dropout Layers**

In Part A.2, we examine the effect of dropout layers on model performance. Copy your code from the ```build_model_1B()``` method into the ```build_model_2()``` method, and then add a dropout layer after each max pooling layer. The Model 2 architecture should be:

```
Conv2D
ReLU
Conv2D
ReLU
MaxPooling2D((2,2))
Dropout(0.5) <- Add this
Conv2D
ReLU
Conv2D
ReLU
MaxPooling2D((2,2))
Dropout(0.5) <- Add this
Flatten
Linear(1024 nodes)
ReLU
Linear(num_classes) <- output layer (logits; no softmax)
```

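Note that we are passing ```0.5``` as the argument to the dropout layers. During training, each dropout layer randomly zeroes each element of its input (here, the max pooling output) with probability 0.5, so roughly half of the activations passed to the next layer are dropped on every forward pass, and the surviving activations are scaled by 1 / (1 - 0.5) = 2. Dropout is disabled in evaluation mode (```model.eval()```).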

*Documentation on dropout layers can be found here*: https://docs.pytorch.org/docs/stable/generated/torch.nn.Dropout.html
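To see what ```nn.Dropout(0.5)``` does, the sketch below passes a tensor of ones through a standalone dropout layer in training mode and in evaluation mode. The seed and tensor shape are arbitrary choices for illustration. Note that ```perf_and_test()``` switches the model to evaluation mode with ```model.eval()``` before testing, so dropout only affects training.

```python
import torch
from torch import nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(2, 8)

dropout.train()        # training mode: each element is zeroed with probability 0.5
y_train = dropout(x)
print(y_train)         # every entry is 0.0 or 2.0 (survivors scaled by 1 / (1 - p))

dropout.eval()         # evaluation mode: dropout passes its input through unchanged
y_eval = dropout(x)
print(torch.equal(y_eval, x))  # True
```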

### ***TODO*** Model 2

In [None]:
def build_model_2():
  # Implement your model here
  raise NotImplementedError("Define Model 2")

### ***TODO*** Model Creation and Summary

Create the model, send it to the GPU, and print a summary of the architecture.
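Printing a model lists its layers but not how many weights they hold. A small helper such as the hypothetical ```summarize``` below (not part of the skeleton) also reports the trainable parameter count. It uses a stand-in model so the snippet runs on its own; in the notebook you would pass your own model instead. Since dropout and max pooling layers have no parameters, Model 2 should report the same count as Model 1B.

```python
from torch import nn

def summarize(model):
    # Print the layer listing, then the number of trainable parameters
    print(model)
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Trainable parameters: {n_params:,}")
    return n_params

# Stand-in model for illustration only; in the notebook, call summarize(model_2)
demo = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=(3, 3), padding="same"),  # 4*1*3*3 + 4 = 40 params
    nn.ReLU(),
    nn.Dropout(0.5),                                      # no params
    nn.Flatten(),
    nn.Linear(4 * 28 * 28, 10),                           # 3136*10 + 10 = 31,370 params
)
summarize(demo)  # Trainable parameters: 31,410
```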

### ***TODO*** Global Hyperparameters

Dropout layers can make the model more robust, but they also typically require more training. Again, increase the number of epochs and use the loss plot to find the point of diminishing returns.

In [None]:
EPOCHS = 50

### Testing Model 2

When looking at your results, compare the gap between training and test accuracy for Model 1B with the same gap for Model 2.

In [None]:
perf_and_test(model_2, "Model 2")

## **Part A.3 - Batch Normalization**

In Part A.3, we examine the effect of batch normalization layers on model performance. Copy your code from the ```build_model_1B()``` method into the ```build_model_3()``` method, and then add a batch normalization layer after the ReLU that follows each Conv2D layer. The Model 3 architecture should be:

```
Conv2D
ReLU
BatchNormalization
Conv2D
ReLU
BatchNormalization
MaxPooling2D((2,2))
Conv2D
ReLU
BatchNormalization
Conv2D
ReLU
BatchNormalization
MaxPooling2D((2,2))
Flatten
Linear(1024 nodes)
ReLU
Linear(num_classes) <- output layer (logits; no softmax)
```
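In PyTorch, use ```nn.BatchNorm2d``` for these layers. Unlike Keras's BatchNormalization, it has one required argument, ```num_features```, which must equal the number of channels (filters) produced by the preceding Conv2D layer.

*Documentation can be found here*: https://docs.pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html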
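The sketch below illustrates ```nn.BatchNorm2d``` on a random batch shaped like the output of a 16-filter Conv2D layer; the shapes, seed, and scaling are arbitrary. In training mode, each channel is normalized with the current batch's mean and variance, then scaled and shifted by two learnable parameters per channel. In evaluation mode (```model.eval()```, which ```perf_and_test()``` calls before testing), the layer uses running averages collected during training instead.

```python
import torch
from torch import nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(num_features=16)      # must match the preceding Conv2d's out_channels
x = torch.randn(8, 16, 14, 14) * 5 + 3    # activations with mean ~3 and std ~5

bn.train()
y = bn(x)
per_channel_mean = y.mean(dim=(0, 2, 3))  # one value per channel
per_channel_std = y.std(dim=(0, 2, 3))
print(per_channel_mean.abs().max().item())  # close to 0
print(per_channel_std.mean().item())        # close to 1

# Learnable parameters: a weight and a bias per channel
print(sum(p.numel() for p in bn.parameters()))  # 32
```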

### ***TODO*** Model 3

In [None]:
def build_model_3():
  # Copy your code from build_model_1B here and add BatchNormalization layers as specified
  raise NotImplementedError("Define Model 3")

### ***TODO*** Model Creation and Summary

Create the model, send it to the GPU, and print a summary of the architecture.

### ***TODO*** Global Hyperparameters

Batch normalization has been shown to improve convergence in neural networks, so this architecture may need fewer training epochs than the others. Again, refer to the initial loss plot to find the point of diminishing returns and update the ```EPOCHS``` variable.

In [None]:
EPOCHS = 25

### Testing Model 3


In [None]:
perf_and_test(model_3, "Model 3")

## **Part A.4 - Layer count**

In the final part of Part A, you will look at how the number of layers affects performance and training time.

In ```build_model_4```, you need to implement the following architecture:

```
# block 1 - 64 filters per Conv2D
Conv2D
ReLU
BatchNormalization
Conv2D
ReLU
BatchNormalization
Conv2D
ReLU
BatchNormalization
MaxPooling2D
Dropout(0.5)
# block 2 - 128 filters per Conv2D
Conv2D
ReLU
BatchNormalization
Conv2D
ReLU
BatchNormalization
Conv2D
ReLU
BatchNormalization
MaxPooling2D
Dropout(0.5)
# block 3 - 256 filters per Conv2D
Conv2D
ReLU
BatchNormalization
Conv2D
ReLU
BatchNormalization
Conv2D
ReLU
BatchNormalization
MaxPooling2D
Dropout(0.5)
# Linear (dense) layers
Flatten
Linear(1024 nodes)
ReLU
Linear(512 nodes)
ReLU
Linear(num_classes) <- output layer (logits; no softmax)
```

Notice that each block consists of three Conv2D layers, each followed by a ReLU and a batch normalization layer, and ends with a max pooling layer and a dropout layer. The first block should have 64 filters in each Conv2D layer, the second 128, and the third 256. We are also adding an extra linear layer with 512 nodes before the output layer.
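With three pooling stages, the 28×28 image no longer halves evenly: ```nn.MaxPool2d``` rounds down by default (```ceil_mode=False```), so 7×7 becomes 3×3. The sketch below computes the resulting Flatten size for Model 4 and the parameter count of its first Linear layer. It assumes, as in the earlier models, "same"-padded convolutions and 2×2 pooling (the Model 4 listing does not state the pool size); the helper name ```pooled_size``` is hypothetical.

```python
def pooled_size(size, num_pools, pool=2):
    # Assumes pool x pool max pooling with stride equal to the pool size;
    # each stage maps n -> floor((n - pool) / pool) + 1 (PyTorch's ceil_mode=False)
    for _ in range(num_pools):
        size = (size - pool) // pool + 1
    return size

side = pooled_size(28, num_pools=3)        # 28 -> 14 -> 7 -> 3
flat = 256 * side * side                   # 256 filters in the last block
first_linear_params = flat * 1024 + 1024   # weights + biases of Linear(flat, 1024)
print(side, flat, first_linear_params)     # 3 2304 2360320
```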

### ***TODO*** Model 4

In [None]:
def build_model_4():
  # Implement your model here. You can copy code from previous models as a reference.
  raise NotImplementedError("Define Model 4")

### ***TODO*** Global Hyperparameters

Larger models generally require more training because they have more parameters, so you may need to increase the number of training epochs for this model. In addition, more complicated architectures and problem domains can benefit from lower learning rates. Experiment with the learning rate to see if you notice any improvements. Keep in mind that a lower learning rate takes smaller update steps, so fully training the model may require more epochs, and therefore more training time.

In [None]:
EPOCHS = 25
LEARNING_RATE = 0.0001

### ***TODO*** Model Creation and Summary

Create the model, send it to the GPU, and print a summary of the architecture.



### Testing Model 4

This model will likely take considerably longer to train (10-30 minutes, depending on the number of epochs and which GPU Colab assigned to your runtime). This may be a good time to take a coffee break or work on your report.


In [None]:
perf_and_test(model_4, "Model 4")

## **Part B - Repeat for MNIST Digits dataset**

Repeat Part A, this time using the MNIST Digits dataset. **Copy the code cells above and paste them below so you do not lose your previous work!**