# Assignment 2: Fashion Dataset and Architecture Comparisons

In this assignment, you are to perform the following:

1. Study the `FashionMNIST` dataset, and understand the images and their respective labels.
1. Implement a `Trainer` class in the file `trainer_lib.py` for reusable and reproducible training loops.
1. Implement the following architectures:
  - Linear model
  - MLP with one hidden layer
  - Simple convolution with pooling
  - Deep convolution with pooling, followed by MLP with one hidden layer.

In [3]:
"🔒"
import torch
from torch import nn
from torch import optim
from torch.utils.data import DataLoader, random_split
import torchvision
from torchsummaryX import summary
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from importlib import reload

import warnings
warnings.filterwarnings('ignore')


## Loading the data

In this section, you will simply run the following data loading cells to familiarize yourself with the layout and semantics of the training dataset.

In [None]:
"🔒"
#
# Loading data
#
home = os.environ.get('HOME')
root = os.path.join(home, 'public/data')
dataset = torchvision.datasets.FashionMNIST(
    root,
    train=True,
    transform=torchvision.transforms.ToTensor())

dataset

In [None]:
"🔒"
#
# Print out important stats about the dataset
#
image_tensor, label = dataset[0]
print("The first image is of shape:", image_tensor.shape)
print("The first label is:", label)

plt.imshow(np.transpose(image_tensor, (1, 2, 0)));

The dataset object has a `.classes` field that contains the names of
the different labels.  It has 10 classes ranging from **T-shirt** to **Ankle boot**.

In [None]:
"🔒"
#
# Print the lookup
#
lookup = pd.Series({x: i for (i,x) in enumerate(dataset.classes)})
lookup

In fact, we can print the first 36 entries in the dataset, and plot them as a grid.
Below is the result of that.

In [None]:
"🔒"
xs = dataset.data[:36]
xs = xs.reshape(36, 1, 28, 28)
mosaic = torchvision.utils.make_grid(xs, nrow=6)
mosaic = np.transpose(mosaic, (1, 2, 0))
plt.imshow(mosaic)
plt.xticks([])
plt.yticks([]);

## Getting ready for training and validation

We will be using the same training data for all four neural network architectures.

The `train_dataloader` is a dataloader for the training dataset, and `val_dataloader` is the dataloader for the validation dataset.

In [None]:
"🔒"
train_dataset, val_dataset = random_split(dataset, (0.8, 0.2))
train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=len(val_dataset), shuffle=False)
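
To make the fractional split concrete, here is a minimal sketch on a toy dataset. The 100-item `TensorDataset` and the `toy_*` names are assumptions for illustration only, and passing fractions to `random_split` assumes a reasonably recent PyTorch release. The seeded generator is optional; it is shown because a fixed seed makes the split repeatable across runs.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy stand-in for FashionMNIST: 100 fake items (illustration only).
toy_dataset = TensorDataset(torch.arange(100))

# The fractions must sum to 1; a seeded generator makes the split repeatable.
generator = torch.Generator().manual_seed(0)
toy_train, toy_val = random_split(toy_dataset, (0.8, 0.2), generator=generator)

print(len(toy_train), len(toy_val))  # 80 20
```

The two subsets never share an item, so validation metrics are computed on images the model was not trained on.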

## Linear classifier

In this section, you will implement a simple linear model.  The linear model is to be implemented as an `nn.Module`
and should have a field `model` which is an `nn.Sequential` module.

You are to complete the implementation by creating two layers in the `nn.Sequential(...)`:

- A layer that flattens each input image from `(1,28,28)` to a vector.
- A layer that performs linear classification, mapping the vector to a 10-dimensional logit vector.  You are to use the `nn.LazyLinear` layer to implement
  the linear layer.  The advantage of the `LazyLinear` layer is that you do not need to compute
  the input dimension explicitly.
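
The following sketch shows how `nn.LazyLinear` infers its input size, using a toy batch whose shape is deliberately different from the assignment's images. The tensor sizes and variable names are assumptions chosen for illustration; this is not the solution to the work unit below.

```python
import torch
from torch import nn

# Toy batch: 4 images of shape (3, 8, 8); sizes chosen only for illustration.
toy_batch = torch.zeros(4, 3, 8, 8)

flatten = nn.Flatten()      # keeps the batch dimension, flattens the rest
lazy = nn.LazyLinear(5)     # only the output size is given

toy_out = lazy(flatten(toy_batch))  # the first forward pass fixes the input size

print(lazy.in_features)  # 192, that is 3 * 8 * 8
print(toy_out.shape)     # torch.Size([4, 5])
```

Because the input size is only known after the first forward pass, a lazy layer has no usable weights until some data has gone through it.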

In [None]:
"✍️"
# @workUnit

class MyLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            # complete the architecture
            ...
        )
    def forward(self, image):
        return self.model(image)

In [None]:
"🔒"
# @check
# @title: check linear model architecture

m = MyLinear()
xs, ys = next(iter(train_dataloader))
summary(m, xs);

## A trainer class

In this section, you will be implementing a trainer class that will be used throughout the assignment.

The trainer class has the following methods:

**Constructor:**

- Input:
  - `model`: an `nn.Module` that is the neural network model.
  - `train_dataloader`: a `DataLoader` for the training data.
  - `val_dataloader`: a `DataLoader` for the validation data.
  - The constructor uses `optim.Adam(...)` as the optimizer.
  - The constructor uses `nn.CrossEntropyLoss` as the loss function.
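
As a reminder of how this optimizer and loss function fit together, here is a minimal sketch of a single optimization step. The toy model, the random inputs, and all variable names are assumptions for illustration; the skeleton in `trainer_lib.py` may organize these calls differently.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
toy_model = nn.Linear(4, 3)                  # hypothetical 3-class model
optimizer = optim.Adam(toy_model.parameters())
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(8, 4)                   # batch of 8 feature vectors
labels = torch.randint(0, 3, (8,))           # integer class indices, not one-hot

optimizer.zero_grad()                        # clear gradients from any earlier step
logits = toy_model(inputs)                   # raw scores; the loss applies softmax itself
loss = loss_fn(logits, labels)
loss.backward()                              # compute gradients
optimizer.step()                             # update the parameters

print(type(loss.item()))                     # <class 'float'>
```

Note that `nn.CrossEntropyLoss` expects raw logits and integer labels, so the model should not end with a softmax layer.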
  
**`Trainer.train_one_epoch`**:

It performs training of `model` for **one epoch** by iterating over the batches from `train_dataloader`.

- Input:
  - `max_batches`: the maximum number of batches taken from the dataloader.  **For performance reasons, we will only sample 10 batches.**
  
- Output: it returns two float numbers:
  - mean loss over the batches
  - mean accuracy over the batches
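
One possible way to cap the number of batches and average a per-batch metric is sketched below. The hard-coded `(logits, labels)` pairs are hypothetical stand-ins; a real loop would draw `(images, labels)` from the dataloader and compute the logits with the model.

```python
from itertools import islice

import torch

# Hypothetical stream of (logits, labels) batches, for illustration only.
toy_batches = [
    (torch.tensor([[2.0, 0.1], [0.2, 1.5]]), torch.tensor([0, 1])),  # 2 of 2 correct
    (torch.tensor([[0.3, 0.9], [1.1, 0.4]]), torch.tensor([0, 0])),  # 1 of 2 correct
    (torch.tensor([[0.5, 0.6], [0.7, 0.2]]), torch.tensor([1, 0])),  # never reached
]

max_batches = 2
accuracies = []
for logits, labels in islice(toy_batches, max_batches):  # stop after max_batches
    predictions = logits.argmax(dim=1)                    # most likely class per row
    accuracies.append((predictions == labels).float().mean().item())

mean_accuracy = sum(accuracies) / len(accuracies)
print(mean_accuracy)  # 0.75
```

Calling `.item()` converts each one-element tensor to a plain Python `float`, which is the type the sanity checks later in this notebook test for.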
  
**`Trainer.val_one_epoch`**:

It performs validation by iterating over all batches from the validation dataloader.

- Output: it returns two float numbers:
  - mean loss
  - mean accuracy
  
**`Trainer.train`**

This method performs training.

- Input:
  - `epochs`: the number of epochs.
  - `max_batches`: the maximum number of batches to take per epoch.
- Output:
  returns a `DataFrame` containing training loss,
  training accuracy, validation loss, validation accuracy, and the time per epoch.
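
A minimal sketch of collecting per-epoch results into a `DataFrame` follows. The metric values are placeholders, and every column name except `train_accuracy` (which the plotting cell later in this notebook reads) is an assumption; check the skeleton in `trainer_lib.py` for the names it expects.

```python
import time

import pandas as pd

rows = []
for epoch in range(3):
    start = time.perf_counter()
    # Placeholder numbers; a real loop would run the training and validation passes here.
    train_loss, train_accuracy = 1.0 / (epoch + 1), 0.5
    val_loss, val_accuracy = 1.2 / (epoch + 1), 0.4
    rows.append({
        "train_loss": train_loss,
        "train_accuracy": train_accuracy,
        "val_loss": val_loss,            # assumed column name
        "val_accuracy": val_accuracy,    # assumed column name
        "epoch_time": time.perf_counter() - start,  # assumed column name
    })

history = pd.DataFrame(rows)   # one row per epoch, indexed from 0
print(history.round(2))
```

With one row per epoch, the index of the frame doubles as the epoch number when plotting.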
  
**`Trainer.reset`**

This method resets the trainable parameters of the model.
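
One common idiom for re-initializing a model is to call `reset_parameters()` on every submodule that defines it. The sketch below applies it to a toy network; the names are hypothetical, and this is not necessarily the approach the skeleton in `trainer_lib.py` intends.

```python
import torch
from torch import nn

toy_net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
before = toy_net[0].weight.detach().clone()

# Re-draw the initial weights of every layer that knows how to do so.
for layer in toy_net.modules():
    if hasattr(layer, "reset_parameters"):
        layer.reset_parameters()

after = toy_net[0].weight.detach()
print(torch.equal(before, after))   # False: the weights were re-drawn
```

Since Adam keeps running statistics for each parameter, a reset may also need to recreate the optimizer so that no stale state carries over.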

**A skeleton implementation is given in `trainer_lib.py`.**  You are to complete the implementation.

In [None]:
"🔒"
import trainer_lib

Below are some basic sanity checks of the trainer implementation.

In [None]:
"🔒"
# @check
# @title: check trainer construction
reload(trainer_lib)
trainer = trainer_lib.Trainer(m, train_dataloader, val_dataloader)
print(trainer is not None)

In [None]:
"🔒"
# @check
# @title: check trainer methods
for method in ['train_one_epoch', 'val_one_epoch', 'train', 'reset']:
    print(f"trainer.{method}?", hasattr(trainer, method))

In [None]:
"🔒"
# @check
# @title: check trainer.train_one_epoch

(loss, acc) = trainer.train_one_epoch(max_batches=1)
print('loss is numeric', isinstance(loss, float))
print('acc is numeric', isinstance(acc, float))

In [None]:
"🔒"
# @check
# @title: check trainer.val_one_epoch

(loss, acc) = trainer.val_one_epoch()
print('loss is numeric', isinstance(loss, float))
print('acc is numeric', isinstance(acc, float))

In [None]:
"🔒"
# @check
# @title: check trainer.reset

trainer.reset()
print("Ok")

## Training the linear model

In this section, we will train the linear model.  We will take 10 batches per epoch, and train for 20 epochs.

The training history is stored in `model_linear_log`.

This will take approximately **70 seconds**.

In [None]:
"🔒"
reload(trainer_lib)

model_linear = MyLinear()
trainer = trainer_lib.Trainer(model_linear, train_dataloader, val_dataloader)
trainer.reset()
model_linear_log = trainer.train(epochs=20, max_batches=10)
model_linear_log.round(2)

## MLP Models

In this section, you are to construct an MLP model.  It uses `LazyLinear` to create a hidden layer of dimension 50, with `nn.ReLU` as the activation for the hidden layer.  As with the linear model, the output is a 10-dimensional logit vector.

In [None]:
"✍️"
# @workUnit

class MLPModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            # complete the following
            ...
        )
    def forward(self, x):
        return self.model(x)

In [None]:
"🔒"
# @check
# @title: MLP model architecture

model_mlp = MLPModel()
summary(model_mlp, xs);

## Training MLP

We will train the MLP model for 20 epochs with 10 batches per epoch.

You should expect:

- Total training time: 80 seconds
- Final validation accuracy approximately 75% or higher.

In [None]:
"🔒"
model_mlp = MLPModel()
reload(trainer_lib)
trainer = trainer_lib.Trainer(model_mlp, train_dataloader, val_dataloader)
trainer.reset()
model_mlp_log = trainer.train(epochs=20, max_batches=10)
model_mlp_log.round(2)

## Convolution based networks

In this section, we will construct a simple convolutional network.  It consists of the following layers:

- 2D convolution with `num_kernels` kernels of size `kernel_size`.
- Max pooling with pooling size `pool_size`.
- ReLU activation.
- Flatten the result to a vector.
- A (lazy) linear layer that maps to a 10-dimensional logits vector.

In [None]:
"✍️"
# @workUnit

class LinearCNNModel(nn.Module):
    def __init__(self, num_kernels, kernel_size, pool_size):
        super().__init__()
        self.model = nn.Sequential(
            # complete the following
            ...
        )
    def forward(self, x):
        return self.model(x)

We will construct a convolutional network with 5 kernels, with kernel size 3, and pooling size of 2.
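
To see what input size the lazy linear layer will infer for this configuration, the shape arithmetic can be worked out by hand. The sketch below assumes PyTorch's defaults: stride 1 and no padding for the convolution, and a pooling stride equal to the pooling size. The helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution or pooling window slides over `size` pixels."""
    return (size + 2 * padding - kernel_size) // stride + 1

image_size = 28                                                     # images are 28x28
after_conv = conv_output_size(image_size, kernel_size=3)            # 28 - 3 + 1
after_pool = conv_output_size(after_conv, kernel_size=2, stride=2)  # pooling halves it
flattened = 5 * after_pool * after_pool                             # 5 kernels give 5 channels

print(after_conv, after_pool, flattened)  # 26 13 845
```

These numbers can be compared against the shapes reported by `summary(...)` in the check cell.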

In [None]:
"🔒"
# @check
# @title: CNN Linear architecture

model_cnn_linear = LinearCNNModel(5, 3, 2)
summary(model_cnn_linear, xs);

## Training of Convolutional Network

We will train the convolutional network for 20 epochs with 10 batches per epoch.

You should expect:

- Total training time: 120 seconds
- Validation accuracy approximately 76% or higher.

In [None]:
"🔒"
reload(trainer_lib)
trainer = trainer_lib.Trainer(model_cnn_linear, train_dataloader, val_dataloader)
trainer.reset()
model_cnn_linear_log = trainer.train(epochs=20, max_batches=10)
model_cnn_linear_log.round(2)

## Deep Conv model

We will construct a **deep convolutional** network with the following layers:

- Conv2d with 16 kernels of size 5, with `padding='same'`
- MaxPool2d with pooling size of 2
- Conv2d with 32 kernels of size 3, with padding
- MaxPool2d with pooling size of 2
- Flatten the max pooling output to a vector
- An MLP with one hidden layer of 50 units and ReLU activation, whose output layer produces the 10-dimensional logits vector
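
The effect of `padding='same'` can be checked on a dummy batch, as in the sketch below. The channel count of 4 and the variable names are arbitrary choices for illustration, not the layer sizes required above.

```python
import torch
from torch import nn

toy_images = torch.zeros(2, 1, 28, 28)   # batch of 2 single-channel 28x28 images

# 4 output channels is an arbitrary choice for this illustration.
conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=5, padding="same")
pool = nn.MaxPool2d(2)

same_out = conv(toy_images)
pooled_out = pool(same_out)

print(same_out.shape)    # torch.Size([2, 4, 28, 28]): height and width preserved
print(pooled_out.shape)  # torch.Size([2, 4, 14, 14]): pooling halves them
```

With `'same'` padding only the pooling layers shrink the image, which makes the size of the flattened vector easier to reason about.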

In [None]:
"✍️"
# @workUnit

class DeepCNNModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            # complete the following
            ...
        )
    def forward(self, x):
        return self.model(x)

In [None]:
"🔒"
# @check
# @title: deep cnn architecture

model_cnn_deep = DeepCNNModel()
summary(model_cnn_deep, xs);

## Training the deep CNN network

We will train the deep convolutional network for 20 epochs with 10 batches per epoch.

You should expect:

- Total training time: 250 seconds
- Validation accuracy close to or above 80%.

In [None]:
"🔒"
reload(trainer_lib)
trainer = trainer_lib.Trainer(model_cnn_deep, train_dataloader, val_dataloader)
trainer.reset()
model_cnn_deep_log = trainer.train(epochs=20, max_batches=10)
model_cnn_deep_log.round(2)

## Plot

The following cell plots the training accuracy of the four models over the epochs.

In [None]:
"🔒"
plt.figure()
plt.plot(model_linear_log.index, model_linear_log.train_accuracy)
plt.plot(model_mlp_log.index, model_mlp_log.train_accuracy)
plt.plot(model_cnn_linear_log.index, model_cnn_linear_log.train_accuracy)
plt.plot(model_cnn_deep_log.index, model_cnn_deep_log.train_accuracy)

plt.title('Training Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(['Linear', 'MLP', 'CNN', 'Deep CNN']);

### Plot the validation accuracy

Make sure you properly label the plot and the axes.

In [None]:
"✍️"
# @workUnit

#
# Generate the validation accuracy of the different models
#