# Lab 6: Neural Networks

In this notebook, we will learn how to train a simple multilayer perceptron (MLP) for image classification using PyTorch.

[See this guide to install PyTorch locally.](https://pytorch.org/get-started/locally/)

You can find additional information [here](https://pytorch.org/tutorials/beginner/basics/intro.html).


In [None]:
import matplotlib.pyplot as plt
import numpy as np

import torch
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler
import torch.nn.functional as F
from torchvision import datasets
import torchvision.transforms as transforms
from sklearn.metrics import accuracy_score
from tqdm import tqdm

## Load dataset

The torchvision package includes several datasets. We will use MNIST, a dataset of 28x28 grayscale images of handwritten digits.

The dataset comes split into training and test sets. We will further split the test set into two smaller sets: validation (80%) and test (20%).


In [None]:
# Define transformations that are applied to the image
data_aug = transforms.Compose([transforms.ToTensor()]) # ToTensor() converts each image into a float tensor of shape (C, H, W), scaling pixel values from [0, 255] to [0, 1]

# Load training data from MNIST into directory defined in "root"
training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=data_aug,
)

# Load the MNIST test split into directory defined in "root" (it is divided into validation and test sets below)
validation_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=data_aug,
)

# Separate test images into validation (80%) and testing (20%)
indices = list(range(len(validation_data)))
np.random.shuffle(indices)

test_size = 0.2 * len(indices)
split = int(np.floor(test_size))
val_idx, test_idx = indices[split:], indices[:split]

val_sampler = SubsetRandomSampler(val_idx)
test_sampler = SubsetRandomSampler(test_idx)
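Because `np.random.shuffle` is called without a seed, the validation/test partition above changes every time the cell runs. A minimal sketch of the same 80/20 split with a seeded PyTorch generator, assuming 10,000 images (the size of the MNIST test split) and an arbitrary seed of 0; `split_indices` is a hypothetical helper, not a PyTorch function:

```python
import torch

def split_indices(n, test_fraction=0.2, seed=0):
    # Shuffle indices 0..n-1 with a fixed seed so the split is reproducible
    generator = torch.Generator().manual_seed(seed)
    perm = torch.randperm(n, generator=generator).tolist()
    # The first test_fraction of the shuffled indices go to test, the rest to validation
    split = int(n * test_fraction)
    return perm[split:], perm[:split]

val_idx, test_idx = split_indices(10000)
print(len(val_idx), len(test_idx))  # 8000 2000
```

The resulting lists can be passed to `SubsetRandomSampler` in the same way as in the cell above.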

Define data loaders that automatically fetch batches of images and their labels

In [None]:
batch_size = 64 # number of images per batch
num_workers = 2 # number of subprocesses used to load the data in parallel

# Define data loaders for the train, test and validation data
train_dataloader = DataLoader(training_data, batch_size=batch_size, shuffle=True, num_workers=num_workers, drop_last=True)
validation_dataloader = DataLoader(validation_data, sampler=val_sampler, batch_size=batch_size, shuffle=False, num_workers=num_workers, drop_last=False)
test_dataloader = DataLoader(validation_data, sampler=test_sampler, batch_size=1, shuffle=False, num_workers=num_workers, drop_last=False)
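To see how `batch_size` and `drop_last` determine the number of batches, here is a small sketch on a synthetic `TensorDataset` of 200 random 28x28 "images" (the data is fake and only illustrates shapes):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 200 fake single-channel 28x28 images with integer labels in [0, 10)
images = torch.rand(200, 1, 28, 28)
targets = torch.randint(0, 10, (200,))
dataset = TensorDataset(images, targets)

# drop_last=True discards the final incomplete batch (200 = 3 * 64 + 8)
loader_drop = DataLoader(dataset, batch_size=64, shuffle=True, drop_last=True)
# drop_last=False keeps it, so the last batch has only 8 images
loader_keep = DataLoader(dataset, batch_size=64, shuffle=False, drop_last=False)

print(len(loader_drop), len(loader_keep))  # 3 4
batch_sizes = [X.shape[0] for X, y in loader_keep]
print(batch_sizes)  # [64, 64, 64, 8]
```

With MNIST's 60,000 training images and `batch_size=64`, `drop_last=True` gives 937 batches per epoch (60000 // 64) and skips the remaining 32 images; since `shuffle=True`, different images are skipped each epoch.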

## Visualize the Data

Accessing the dataset directly

In [None]:
# Get the first sample of the training data (contains an image and its label)

sample = training_data[0] 

# Get image and print its dimensions
img = sample[0]
print(img.shape)

# Get label and print it
label = sample[1]
print(label)

Iterating over the data loader

In [None]:
for batch in train_dataloader:
  # Get images of the batch and print their dimensions
  imgs = batch[0]
  print(imgs.shape)

  # Get labels of each image in the batch and print them
  labels = batch[1]
  print(labels)

  # Show first image of the batch
  plt.imshow(imgs[0][0,:,:], cmap='gray')
  plt.axis('off')
  plt.show()
  
  break

## Defining the model

Create an MLP with the following structure:

1. A dense (linear) layer that takes each image as a flattened input vector and produces a 512-dimensional output
2. A ReLU activation layer
3. A dense (linear) layer with 512 inputs and 512 outputs
4. A ReLU activation layer
5. A dense (linear) layer with 10 outputs (one per MNIST class)

You can use the layers in [PyTorch's `torch.nn` module](https://pytorch.org/docs/stable/nn.html), such as Conv2d, ReLU, Linear, MaxPool2d, Dropout, and Flatten.
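While building the model, it can help to push a dummy batch through the layers one at a time and print each output shape. The sketch below does this for a small hypothetical `nn.Sequential` stack whose sizes (784 → 32 → 10) are arbitrary and deliberately differ from the exercise; `trace_shapes` is an illustrative helper, not a PyTorch function:

```python
import torch
from torch import nn

def trace_shapes(layers, x):
    # Apply each submodule in order and record the shape of its output
    shapes = []
    for layer in layers:
        x = layer(x)
        shapes.append((layer.__class__.__name__, tuple(x.shape)))
    return shapes

# Hypothetical toy MLP (not the exercise solution)
toy = nn.Sequential(
    nn.Flatten(),        # (N, 1, 28, 28) -> (N, 784)
    nn.Linear(784, 32),  # (N, 784) -> (N, 32)
    nn.ReLU(),           # shape unchanged
    nn.Linear(32, 10),   # (N, 32) -> (N, 10)
)

dummy = torch.rand(4, 1, 28, 28)  # a fake batch of 4 MNIST-sized images
for name, shape in trace_shapes(toy, dummy):
    print(name, shape)
```

Note that the batch dimension (4 here) passes through unchanged; only the feature dimension changes.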

In [None]:
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        # TODO: Define model layers here

    def forward(self, x):
        # TODO: Apply layers to input x and return the result (raw logits, one per class)
        raise NotImplementedError("Implement the forward pass")

model = NeuralNetwork().to(device) # put model in device (GPU or CPU)
print(model)

Interpret the implemented architecture and try to answer the following questions:

a) What is the shape (batch size and number of features) of the output tensor after the first layer?

b) And after all three dense layers?

c) How many parameters (weights and biases) does the model have? Unlike Keras, whose `model.summary()` reports parameter counts, PyTorch has no official method for this, but you can use [torchsummary](https://pypi.org/project/torch-summary/)
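The count can also be obtained without extra packages by summing `numel()` over `model.parameters()`. An `nn.Linear(in_features, out_features)` layer holds `in_features * out_features` weights plus `out_features` biases, while `Flatten` and `ReLU` have no parameters. A minimal sketch on a small hypothetical model; `count_parameters` is an illustrative name:

```python
from torch import nn

def count_parameters(model, trainable_only=True):
    # Sum the number of elements of every parameter tensor (weights and biases)
    return sum(p.numel() for p in model.parameters() if p.requires_grad or not trainable_only)

# Hypothetical small model: Linear(4, 3) has 4*3 + 3 = 15 parameters, Linear(3, 2) has 3*2 + 2 = 8
toy = nn.Sequential(nn.Flatten(), nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
print(count_parameters(toy))  # 23
```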

In [None]:
# TODO

In [None]:
#!pip install torch-summary

# TODO

## Train the model

In [None]:
# Define loss function
loss_fn = nn.CrossEntropyLoss() # expects raw logits: it applies log-softmax internally, so the model needs no final Softmax layer

# Define optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
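The comment on `nn.CrossEntropyLoss` can be checked numerically: on raw logits it gives the same value as `log_softmax` followed by the negative log-likelihood loss. A minimal sketch with random logits for a hypothetical batch of 5 samples and 10 classes:

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
logits = torch.randn(5, 10)           # raw model outputs, no softmax applied
targets = torch.randint(0, 10, (5,))  # ground-truth class indices

ce = nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, manual))  # True
```

This is why the model's last layer should output logits: a Softmax layer inside the model would cause softmax to be applied twice.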

Define a function that runs one epoch (one full pass over a data loader), in either training or evaluation mode

In [None]:
def epoch_iter(dataloader, model, loss_fn, optimizer=None, is_train=True):
    if is_train:
        assert optimizer is not None, "When training, please provide an optimizer."

    # Get number of batches
    num_batches = len(dataloader)

    # Set model to train mode or evaluation mode
    if is_train:
        model.train()
    else:
        model.eval()

    # Define variables to save predictions and labels during the epoch
    total_loss = 0.0
    preds = []
    labels = []

    # Enable/disable gradients based on whether the model is in train or evaluation mode
    with torch.set_grad_enabled(is_train):

        # Analyse all batches
        for X, y in tqdm(dataloader):

            # Put data in same device as model (GPU or CPU)
            X, y = X.to(device), y.to(device)

            # Forward pass to obtain prediction of the model
            pred = model(X)

            # Compute loss between prediction and ground-truth
            loss = loss_fn(pred, y)

            # Backward pass
            if is_train:
                # Reset gradients in optimizer
                optimizer.zero_grad()
                # Calculate gradients by backpropagating loss
                loss.backward()
                # Update model weights based on the calculated gradients
                optimizer.step()

            # Apply softmax to obtain class probabilities, then pick the most likely class
            probs = F.softmax(pred, dim=1)
            final_pred = torch.argmax(probs, dim=1)

            # Accumulate the loss of this batch
            total_loss += loss.item() # IMPORTANT: call .item() to obtain the value of the loss WITHOUT the computational graph attached

            # Add predictions
            preds.extend(final_pred.cpu().numpy())
            labels.extend(y.cpu().numpy())

    # Return the average loss per batch and the accuracy over the whole epoch
    return total_loss / num_batches, accuracy_score(labels, preds)
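Softmax preserves the ordering of the scores within each row, so it never changes which class has the highest score: `torch.argmax(pred, dim=1)` alone would give the same `final_pred`, and `probs` is only needed when the probabilities themselves are of interest. A quick check on two hypothetical samples with three classes:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0, 0.5],
                       [0.1, 0.3, -2.0]])  # raw scores for 2 samples, 3 classes
probs = F.softmax(logits, dim=1)           # each row now sums to 1

print(torch.argmax(probs, dim=1))   # tensor([0, 1])
print(torch.argmax(logits, dim=1))  # tensor([0, 1])
```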

Define the training loop

In [None]:
num_epochs = 10
train_history = {'loss': [], 'accuracy': []}
val_history = {'loss': [], 'accuracy': []}
best_val_loss = np.inf

# Training cycle
print("Start training...")
for t in range(num_epochs):
    print(f"\nEpoch {t+1}")

    # Train model for one epoch on the training data
    train_loss, train_acc = epoch_iter(train_dataloader, model, loss_fn, optimizer)
    print(f"Train loss: {train_loss:.3f} \t Train acc: {train_acc:.3f}")
    
    # Evaluate model on validation data
    val_loss, val_acc = epoch_iter(validation_dataloader, model, loss_fn, is_train=False)
    print(f"Val loss: {val_loss:.3f} \t Val acc: {val_acc:.3f}")

    # Save model when validation loss improves
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        save_dict = {'model': model.state_dict(), 'optimizer': optimizer.state_dict(), 'epoch': t}
        torch.save(save_dict, 'best_model.pth')

    # Save latest model
    save_dict = {'model': model.state_dict(), 'optimizer': optimizer.state_dict(), 'epoch': t}
    torch.save(save_dict, 'latest_model.pth')

    # Save training history for plotting purposes
    train_history["loss"].append(train_loss)
    train_history["accuracy"].append(train_acc)

    val_history["loss"].append(val_loss)
    val_history["accuracy"].append(val_acc)
    
print("Finished")
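The checkpoints saved above are dictionaries, so restoring the best weights (for example, before testing) means loading the file and passing its `'model'` entry to `load_state_dict`. A self-contained sketch that saves and reloads a small hypothetical model in a temporary directory, using the same dictionary keys as the training loop; `toy_model` and `toy_optimizer` are stand-ins, not the notebook's objects:

```python
import os
import tempfile

import torch
from torch import nn

# Hypothetical stand-ins for the notebook's model and optimizer
toy_model = nn.Linear(4, 2)
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=1e-3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "best_model.pth")
    # Save in the same format as the training loop
    torch.save({'model': toy_model.state_dict(), 'optimizer': toy_optimizer.state_dict(), 'epoch': 3}, path)

    # Restore into a freshly created model with the same architecture
    restored = nn.Linear(4, 2)
    checkpoint = torch.load(path, map_location="cpu")
    restored.load_state_dict(checkpoint['model'])
    print("Loaded weights from epoch", checkpoint['epoch'])
```

In the notebook, the equivalent would be `model.load_state_dict(torch.load('best_model.pth', map_location=device)['model'])`.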

## Analyse training evolution

Plot loss and accuracy throughout training on train and validation data
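One way to draw these curves, sketched under the assumption that `train_history` and `val_history` have the `{'loss': [...], 'accuracy': [...]}` layout built in the training loop; `plot_history` is a hypothetical helper, and the values in the usage call are dummy numbers that only show the input format:

```python
import matplotlib.pyplot as plt

def plot_history(train_history, val_history):
    # One subplot per metric, with the training and validation curves overlaid
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    for ax, metric in zip(axes, ["loss", "accuracy"]):
        epochs = range(1, len(train_history[metric]) + 1)
        ax.plot(epochs, train_history[metric], label="train")
        ax.plot(epochs, val_history[metric], label="validation")
        ax.set_xlabel("epoch")
        ax.set_ylabel(metric)
        ax.legend()
    fig.tight_layout()
    return fig

# Dummy values, only to show the expected input format
fig = plot_history({'loss': [1.0, 0.6, 0.4], 'accuracy': [0.70, 0.82, 0.88]},
                   {'loss': [0.9, 0.7, 0.5], 'accuracy': [0.75, 0.80, 0.85]})
plt.show()
```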

In [None]:
# TODO

## Test the model

Evaluate the model on the test set
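Evaluation needs the model in evaluation mode and no gradient tracking; `epoch_iter` already handles both when called with `is_train=False`. As a standalone illustration, here is a minimal accuracy loop; `evaluate_accuracy` is a hypothetical name, and the model and data are synthetic stand-ins (an `nn.Identity` "model" whose outputs are simply its inputs, treated as logits):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate_accuracy(model, dataloader, device="cpu"):
    model.eval()                 # disable training-only behavior (e.g. dropout)
    correct, total = 0, 0
    with torch.no_grad():        # no computational graph is needed for evaluation
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total

# Synthetic check: the "logits" are the inputs, and 3 of the 4 labels match their argmax
X = torch.tensor([[2.0, 0.1], [0.3, 1.5], [5.0, -1.0], [0.0, 0.2]])
y = torch.tensor([0, 1, 0, 0])
loader = DataLoader(TensorDataset(X, y), batch_size=2)
print(evaluate_accuracy(nn.Identity(), loader))  # 0.75
```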

In [None]:
# TODO

In [None]:
def showErrors(model, dataloader, num_examples=20):
    # Plot the first `num_examples` images of the data loader with their true label (green)
    # and predicted label (red). Assumes batch_size=1, as in test_dataloader.
    model.eval()
    plt.figure(figsize=(15, 15))

    with torch.no_grad():
        for ind, (X, y) in enumerate(dataloader):
            if ind >= num_examples:
                break
            X, y = X.to(device), y.to(device)
            pred = model(X)
            probs = F.softmax(pred, dim=1)
            final_pred = torch.argmax(probs, dim=1)

            plt.subplot(10, 10, ind + 1)
            plt.axis("off")
            plt.text(0, -1, y[0].item(), fontsize=14, color='green') # correct
            plt.text(8, -1, final_pred[0].item(), fontsize=14, color='red') # predicted
            plt.imshow(X[0][0, :, :].cpu(), cmap='gray')
    plt.show()

In [None]:
showErrors(model, test_dataloader)
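`showErrors` plots the first test images whether or not they were classified correctly. To display only mistakes, one option is to collect predictions and labels for the whole set first and keep the positions where they differ. A minimal sketch on hypothetical prediction and label tensors; `misclassified_indices` is an illustrative name:

```python
import torch

def misclassified_indices(preds, labels):
    # Positions where the predicted class differs from the ground-truth label
    return torch.nonzero(preds != labels).flatten().tolist()

preds = torch.tensor([7, 2, 1, 0, 4, 1])
labels = torch.tensor([7, 2, 1, 0, 9, 7])
print(misclassified_indices(preds, labels))  # [4, 5]
```

The indices refer to the order in which predictions were collected. Since `test_dataloader` uses a `SubsetRandomSampler`, that order is random, so the images should be stored alongside the predictions rather than looked up later by position.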

## Additional Challenges

a) As the test accuracy should show, the MNIST dataset is not very challenging. Change the code to use Fashion-MNIST and compare the results.

b) Do the same for the CIFAR10 (or CIFAR100) dataset. Note that, in this case, each image is a 32x32 color image: convert it to grayscale, or flatten its RGB channels into a single vector (e.g. using the `reshape` method).

c) The test accuracy for CIFAR is significantly worse. Try improving the results by using 1) a deeper architecture, and 2) a different optimizer.

You can load the datasets from [here](https://pytorch.org/vision/stable/datasets.html).
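For challenge b), the two options lead to different input sizes: flattening all three color channels gives 3 × 32 × 32 = 3072 features per image, while a single grayscale channel gives 32 × 32 = 1024. A sketch on a random tensor with CIFAR's image shape; the plain channel average below is a simple stand-in for grayscale conversion, not the weighted combination that `transforms.Grayscale` applies:

```python
import torch
from torch import nn

batch = torch.rand(8, 3, 32, 32)  # fake batch of 8 CIFAR-sized RGB images

# Option 1: keep all channels and flatten them into one vector per image
flat_rgb = batch.reshape(batch.shape[0], -1)
print(flat_rgb.shape)  # torch.Size([8, 3072])

# Option 2: reduce to one channel (plain average here), then flatten
gray = batch.mean(dim=1, keepdim=True)  # (8, 1, 32, 32)
flat_gray = nn.Flatten()(gray)
print(flat_gray.shape)  # torch.Size([8, 1024])

# The first Linear layer's in_features must match the chosen option
first_layer = nn.Linear(flat_rgb.shape[1], 512)
print(first_layer(flat_rgb).shape)  # torch.Size([8, 512])
```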
