<a href="https://colab.research.google.com/github/aldolipani/SDAE19/blob/master/SDAE_MLP_CNN_Tutorial_on_MINST.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MLP/CNN Tutorial on the MNIST Dataset

This is Colaboratory. Colaboratory (Colab for short) is a Google research project created to help disseminate machine learning education and research. It uses the Jupyter notebook environment, which you can also set up on your own machine. However, the version hosted by Google requires no setup and runs entirely in the cloud.

In this tutorial you will learn how to train MLP and CNN architectures on a CPU and on a GPU. You will use the MNIST dataset, a large database of handwritten digits. In this tutorial you will:

1. familiarize yourself with Python and a deep learning framework called PyTorch;
2. learn how to define deep learning architectures (in both simplified and advanced mode);
3. train architectures and monitor their losses; and
4. evaluate the trained model.

The task you will solve today is: given an image of a handwritten digit, from 0 to 9, output the digit. A binary classifier is not enough for this task, because there are ten possible outputs rather than two. You need a so-called multi-way classifier, in this case a 10-way classifier. You can think of a 10-way classifier as 10 independent binary classifiers, one for each digit, each of which outputs the probability that the image shows its digit. The prediction is the digit of the most confident classifier.
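
The "most confident classifier wins" rule can be sketched in plain Python. The scores below are made-up numbers for illustration, not outputs of a trained model, and `softmax` and `predict_digit` are hypothetical helper names. Softmax is used here as one common way to turn scores into probabilities; it is an assumption of this sketch.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_digit(scores):
    """Return the index (digit) with the highest score."""
    return max(range(len(scores)), key=lambda d: scores[d])

# made-up scores, one per digit 0..9 (illustrative only)
scores = [0.1, -1.2, 0.3, 2.5, 0.0, -0.7, 0.4, 1.1, -0.3, 0.2]
probs = softmax(scores)
print("predicted digit:", predict_digit(scores))
```

Because softmax preserves the ordering of the scores, taking the most probable digit and taking the highest raw score give the same prediction.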

Let's get started!

## Install PyTorch

The first thing you should do is install the libraries needed for this tutorial. In the next code cell you will install `torch`, which is the PyTorch library, and `torchvision`, which contains the helper code to download the MNIST dataset. These libraries are installed with Python's package manager, `pip`. The `!` prefix tells the notebook to run the rest of the line as a shell command rather than as Python code.

### Tips 

If you have never used Colab before, you can run the code cell below either by clicking the play button on its left-hand side or by pressing `SHIFT`+`ENTER`.

In [0]:
!pip install torch
!pip install torchvision

## Import Dependencies

Now that you have installed the libraries you need, you can import them. In Python this is done with the `import` statement. Read the comments below to learn why each library is needed.

In [0]:
# here you import pytorch
import torch
import torch.nn as nn
import torch.nn.functional as F

# here you import torchvision
import torchvision
import torchvision.datasets as datasets
import torchvision.transforms as transforms

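# this library is used to print nice progress bars
# (the top-level tqdm_notebook name is deprecated in recent tqdm versions, so import it from tqdm.notebook)
from tqdm.notebook import tqdm as tqdm_notebook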

# with this library you will plot losses
import matplotlib.pyplot as plt

# this is numpy! Numpy implements operations with tensors and much more
import numpy as np

## Download MNIST Dataset

First things first. Download the MNIST dataset and set up the train and validation sets.

In [0]:
# download train and validation sets
train_set = datasets.MNIST(root = './data', train = True, transform = transforms.ToTensor(), download = True)
validation_set  = datasets.MNIST(root = './data', train = False, transform = transforms.ToTensor())

# feeling like handwritten digits do not excite you enough? Try FashionMNIST by uncommenting the two lines below
#train_set = datasets.FashionMNIST(root = './data', train = True, transform = transforms.ToTensor(), download = True)
#validation_set  = datasets.FashionMNIST(root = './data', train = False, transform = transforms.ToTensor())

# print the number of examples in each set
print("number of train set examples:", len(train_set))
print("number of validation set examples:", len(validation_set))

# use the dataloader helper to generate an iterator for the train set to return random examples 5 by 5 (train batch size).
batch_size = 5
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=2)
# use the dataloader helper to generate an iterator for the validation set to return examples 100 by 100.
validation_loader =  torch.utils.data.DataLoader(validation_set, batch_size=100, shuffle=False, num_workers=2)

## Analyse MNIST Dataset

Let's start by visualising the first batch of examples. Run this code cell multiple times to see more examples.

In [0]:
# functions to show an image
def imshow(img):
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get some random training images
dataiter = iter(train_loader)
images, labels = next(dataiter) # iterators no longer provide a .next() method in recent PyTorch versions

# show images
print("Input values (X):")
imshow(torchvision.utils.make_grid(images))
# print labels
print("Labels:")
print(' '.join('%10s' % str(train_set.classes[labels[j].item()]) for j in range(batch_size)))

Let's now print some statistics about this batch.

In [0]:
print("Shape of the input: ", images.shape)
print("Shape of the output:", labels.shape)
print("Number of classes:", len(train_set.classes))
print("Classes:", train_set.classes)

# Hyperparameters based on the Dataset

In [0]:
input_size = 784  # img_size = (28,28) ---> 28*28=784 in total
num_classes = 10  # number of output classes discrete range [0,9]
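
The value `input_size = 784` comes from flattening each 28x28 image into a single vector, which is what `images.view(-1, input_size)` does later in this tutorial. A minimal plain-Python sketch of the same row-major flattening, using a toy "image" whose pixel values are just running indices (an assumption for illustration):

```python
# toy 28x28 "image": pixel (row, col) holds the value row * 28 + col
image = [[row * 28 + col for col in range(28)] for row in range(28)]

# row-major flattening: rows are concatenated one after the other
flat = [pixel for row in image for pixel in row]

print(len(flat))
```

Pixel `(row, col)` of the image ends up at position `row * 28 + col` of the flattened vector.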

# MLP in PyTorch

Congrats, now you are ready to define your first MLP model. 
In PyTorch you can do this in two ways: 

1. the easy but limited way, or; 
2. the hard but flexible way.

Let's start with the hard way.

## The Hard Way

In [0]:
class MLP(nn.Module):
  
  def __init__(self, input_size, hidden_sizes, output_size):
    super(MLP, self).__init__()
    self.layers = nn.ModuleList() # list of layers
    next_hidden_size = input_size # input size of the first layer, this depends on the dataset
    for hidden_size in hidden_sizes: # for loop to define the next hidden layers
      self.layers.append(nn.Linear(next_hidden_size, hidden_size))
      next_hidden_size = hidden_size
    self.output = nn.Linear(next_hidden_size, output_size) # output layer

  def forward(self, x): # define the forward pass
    out = x
    for layer in self.layers:
      out = layer(out)
      out = F.relu(out) # activation function
    out = self.output(out)
    return out

Now you can instantiate this model.

In [0]:
model = MLP(input_size, [400, 400, 400], num_classes) # three hidden layers of 400 units, matching the easy way below

print("Print the structure of the model:")
model

## The Easy Way

In [0]:
model = nn.Sequential(
    nn.Linear(input_size, 400), # input layer
    nn.ReLU(), # activation function
    nn.Linear(400, 400), # first hidden layer
    nn.ReLU(), # activation function
    nn.Linear(400, 400), # second hidden layer
    nn.ReLU(), # activation function
    nn.Linear(400, num_classes)) # output layer

print("Print the structure of the model:")
model
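
To get a feeling for the size of this network, you can count its trainable parameters by hand. The sketch below is plain Python and assumes the layer sizes of the `nn.Sequential` model above (784 inputs, three layers of 400 units, 10 outputs); `linear_param_count` and `mlp_param_count` are hypothetical helper names.

```python
def linear_param_count(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

def mlp_param_count(input_size, hidden_sizes, output_size):
    """Total number of parameters of an MLP with the given layer sizes."""
    sizes = [input_size] + list(hidden_sizes) + [output_size]
    return sum(linear_param_count(a, b) for a, b in zip(sizes, sizes[1:]))

total_params = mlp_param_count(784, [400, 400, 400], 10)
print(total_params)  # 638810
```

In PyTorch you can compare this number with `sum(p.numel() for p in model.parameters())`.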

# CPU or GPU?

Where will you run your model: on a CPU or on a GPU? The code in the cell below checks whether a GPU is available and, if so, moves your model to it. Otherwise the model stays on the CPU. By default this notebook runs on a CPU. To activate the GPU, open the `Runtime` menu, click `Change runtime type` and select `GPU`.

In [0]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("The model will run on", device)

model = model.to(device)

# Optimizer

Now that you have a model, you can select the optimizer. For this exercise we use Stochastic Gradient Descent (SGD). The optimizer needs to know which parameters of the model to optimize, along with a learning rate and a momentum.

The commented-out line shows another optimizer, Adam. If you uncomment it and comment out the SGD line, the model will be trained with Adam instead.

In [0]:
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.99)

#optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
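
To see what the learning rate and the momentum do, here is a minimal plain-Python sketch of one SGD-with-momentum update. It assumes the basic update form (no dampening, no Nesterov momentum, no weight decay) and is an illustration, not PyTorch's actual implementation; `sgd_momentum_step` is a hypothetical helper name and the quadratic function is a toy problem.

```python
def sgd_momentum_step(params, grads, velocity, lr=1e-3, momentum=0.99):
    """One SGD-with-momentum update on plain lists of floats."""
    for i, g in enumerate(grads):
        velocity[i] = momentum * velocity[i] + g  # accumulate past gradients
        params[i] -= lr * velocity[i]             # step against the accumulated direction
    return params, velocity

# toy problem: minimize f(w) = w^2, whose gradient is 2w
w, v = [1.0], [0.0]
for _ in range(2000):
    w, v = sgd_momentum_step(w, [2 * w[0]], v)
print(w[0])
```

With a momentum close to 1 the parameter overshoots and oscillates around the minimum before settling, which is why the loss curve of a run with high momentum can look bumpy at first.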


# Loss Function

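The standard loss for a multi-way classifier is the cross-entropy loss. In the next code cell you will define this loss. Note that `nn.CrossEntropyLoss` expects the raw, unnormalized scores (logits) of the model, which is why the models above do not end with a softmax layer.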


In [0]:
criterium = nn.CrossEntropyLoss()
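
For a single example, the cross-entropy loss is the negative log-probability that the model assigns to the correct class. The plain-Python sketch below computes it from raw scores; the scores are made up and `cross_entropy` is a hypothetical helper written for illustration, not the PyTorch implementation.

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of the target class, computed from raw scores."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# a model with no preference among the 10 digits
uniform_loss = cross_entropy([0.0] * 10, target=3)
# a model that strongly prefers digit 3
confident_logits = [0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
right_loss = cross_entropy(confident_logits, target=3)  # the label agrees
wrong_loss = cross_entropy(confident_logits, target=0)  # the label disagrees
print(uniform_loss, right_loss, wrong_loss)
```

An undecided model has a loss of `log(10)`, roughly 2.3, which is about the value you should expect at the very start of training. A confident correct prediction drives the loss towards 0, while a confident wrong one is punished heavily.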

# Training

We use the `DataLoader` class to help us generate batches. Pay attention to the parameters.

In [0]:
batch_size = 50
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=2)
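
The roles of `batch_size` and `shuffle` can be illustrated with a small plain-Python generator. This is only a sketch of the idea, not how `DataLoader` is implemented; `iterate_batches` and its `seed` parameter are hypothetical names, and the generator yields example indices rather than images.

```python
import random

def iterate_batches(num_examples, batch_size, shuffle=True, seed=None):
    """Yield lists of example indices, one list per batch."""
    indices = list(range(num_examples))
    if shuffle:
        random.Random(seed).shuffle(indices)  # visit the examples in a random order
    for start in range(0, num_examples, batch_size):
        yield indices[start:start + batch_size]

# the MNIST train set has 60000 examples
batches = list(iterate_batches(60000, 50, shuffle=True, seed=0))
print(len(batches), len(batches[0]))
```

With 60000 training examples and a batch size of 50, one epoch consists of 1200 batches, which is the step count printed by the training loop below.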

This is a helper function that computes the loss of the model on the validation set during training.

In [0]:
def compute_loss_validation(model):
  validation_loader = torch.utils.data.DataLoader(validation_set, batch_size=batch_size, shuffle=True, num_workers=2)
  
  tot_loss = 0
  num_batches = 0
  with torch.no_grad(): # gradients are not needed for validation
    for batch in validation_loader:
      batch = [item.to(device) for item in batch] # we move the batch to the device
      images, labels = batch
      xs = images.view(-1, input_size) # flatten the image into a vector
      ys = labels
      
      pred_ys = model(xs) # generate the predictions using the model
      loss = criterium(pred_ys, ys) # evaluate the predictions using the cross entropy loss
      tot_loss += loss.item() # get the number and sum it to the total loss
      num_batches += 1
      
      if num_batches == 200: # use at most 200 batches
        break
    
  loss = tot_loss / num_batches # average loss per batch
  return loss

Now you define how many times your model will be trained on the entire training set.

In [0]:
num_epochs = 10 # how many times you want to train on the dataset

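Now you will first set the model to training mode. Training mode switches layers such as dropout and batch normalization to their training behaviour (the models in this tutorial contain neither, but it is good practice to set the mode explicitly). Then the code runs two nested for loops: the outer one over the epochs and the inner one over the batches. Every 200 batches it records the average training loss and the validation loss.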

In [0]:
%%time

model.train() # set the model to training mode

# these variables are used to store the losses
running_loss = 0
training_loss = []
validating_loss = []
# loop over the epochs
for epoch in range(num_epochs):
  # loop over the batches
  for i, batch in enumerate(train_loader):
    batch = [item.to(device) for item in batch] # move the batch to the device
    images, labels = batch # extract images and labels
    xs = images.view(-1, input_size) # flatten the image into a vector
    ys = labels
    
    optimizer.zero_grad() # reset the gradients
    pred_ys = model(xs) # generate the predictions
    loss = criterium(pred_ys, ys) # compute the loss
    loss.backward() # backpropagation
    optimizer.step() # update the parameters
    
    running_loss += loss.item()
    if (i+1) % 200 == 0: # every 200 batches print statistics about the training
      running_loss /= 200 # average training loss over the last 200 batches
      training_loss.append(running_loss) # keep track of the training loss
      model.eval() # set the model to eval, we can now test the model 
      validation_loss = compute_loss_validation(model) # compute the validation loss
      validating_loss.append(validation_loss) # keep track of the validation loss
      model.train() # set back the model to train
      print('Epoch [%d/%d], Step [%d/%d], Train Loss: %.4f, Validation Loss: %.4f'%(epoch+1, num_epochs, i+1, len(train_set)//batch_size, running_loss, validation_loss))
      running_loss = 0

# Analyse Losses

You can now plot the train and validation losses. However, to get a fairer comparison we first shift the validation loss forward by one position. Both losses are recorded every 200 batches, but they are not measured on the same model: each training loss is an average over the previous 200 batches, during which the model was still improving, whereas each validation loss is computed with the model as it is at the end of those 200 batches. Hence, the validation loss may appear lower than the training loss. To reduce this artifact, the next cell drops the last training loss and prepends a `NaN` to the validation losses, which moves the validation curve forward by one position.

In [0]:
training_loss.pop()
validating_loss.insert(0, float('NaN'))

Now you can plot the losses.

In [0]:
plt.plot(training_loss, label="train")
plt.plot(validating_loss, label="validation")
plt.legend(loc='upper right')
for i in range(num_epochs):
  plt.axvline(x=(len(train_set)//batch_size)*(i+1)/200,color='gray')
plt.show()

# Evaluating the Classifier

You can evaluate this classifier using accuracy, the fraction of examples for which the predicted digit matches the label. It is computed by the function in the following code cell.

In [0]:
def accuracy(data_loader):
  correct = 0.0 # here you will count the correct answers
  total = 0.0 # here you will count all the answers
  with torch.no_grad(): # do not build the gradient graph
    for batch in tqdm_notebook(data_loader):
      batch = [item.to(device) for item in batch]
      images, labels = batch
      xs = images.view(-1, input_size)
      ys = labels

      pred_ys = model(xs).detach().cpu()
      _, pred_ys = torch.max(pred_ys, 1)
      correct += (pred_ys == ys.detach().cpu()).sum().item() # number of correct predictions in this batch
      total += labels.size(0)
      
  return correct/total

You can now compute the accuracy of the model on the train and validation sets.

In [0]:
model.eval() # set the model to evaluation

train_accuracy = accuracy(train_loader)
print('Train accuracy: {:.3f}'.format(train_accuracy))

validation_accuracy = accuracy(validation_loader)
print('Validation accuracy: {:.3f}'.format(validation_accuracy))
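
The logic of the `accuracy` function, stripped of tensors and data loaders, can be sketched in plain Python. The scores and labels below are made up for illustration (4 examples, 3 classes), and `argmax` and `accuracy_from_scores` are hypothetical helper names.

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy_from_scores(score_rows, labels):
    """Fraction of rows whose highest score is at the labelled class."""
    correct = sum(1 for row, y in zip(score_rows, labels) if argmax(row) == y)
    return correct / len(labels)

# made-up scores for 4 examples and 3 classes (illustrative only)
toy_scores = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.9, 0.8, 0.1], [0.1, 0.2, 3.0]]
toy_labels = [0, 1, 1, 2]
toy_accuracy = accuracy_from_scores(toy_scores, toy_labels)
print(toy_accuracy)  # 0.75
```

The third example is the only mistake: its highest score belongs to class 0 while its label is 1, so 3 out of 4 predictions are correct.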

# Convolutional Neural Networks

You can now move to convolutional neural networks. 

## The Hard Way

In [0]:
class CNN(nn.Module):
  def __init__(self, conv_hidden_sizes:list, linear_hidden_sizes, output_size):
    super(CNN, self).__init__()
    self.conv_layers = nn.ModuleList()
    next_conv_size = 1
    for conv_hidden_size in conv_hidden_sizes:
      conv = nn.Conv2d(next_conv_size, conv_hidden_size, kernel_size=5)
      self.conv_layers.append(conv)
      next_conv_size = conv_hidden_size
    self.linear_layers = nn.ModuleList()
    next_linear_size = 3*3*next_conv_size # the feature map is 3x3 for 28x28 inputs and three conv layers
    for linear_hidden_size in linear_hidden_sizes:
      linear = nn.Linear(next_linear_size, linear_hidden_size)
      self.linear_layers.append(linear)
      next_linear_size = linear_hidden_size
    self.output = nn.Linear(next_linear_size, output_size) # also works when linear_hidden_sizes is empty

  def forward(self, x):
    out = x
    for i, layer in enumerate(self.conv_layers):
      out = layer(out)
      if i > 0:
        out = F.max_pool2d(out, 2)
      out = F.relu(out)    
    out = out.view(out.size(0), -1) # flatten everything except the batch dimension
    for layer in self.linear_layers:
      out = layer(out)
      out = F.relu(out)
    out = self.output(out)
    return out

model = CNN([32, 32, 64], [256], 10)

model = model.to(device)

model

## The Easy Way

In [0]:
class Flatten(nn.Module):
    def forward(self, x):
        return x.view(x.size()[0], -1)

model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size = 5), 
    nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size = 5), 
    nn.MaxPool2d(2),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size = 5), 
    nn.MaxPool2d(2),
    nn.ReLU(),
    Flatten(),
    nn.Linear(3*3*64, 256),
    nn.ReLU(),
    nn.Linear(256, 10))

model = model.to(device)

model
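
The `3*3*64` input size of the first linear layer follows from how each layer shrinks the 28x28 image. The sketch below traces the spatial size in plain Python, assuming the defaults used above (stride 1 and no padding for the convolutions, pooling stride equal to the pooling size); `conv_out` and `pool_out` are hypothetical helper names.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1

def pool_out(size, kernel_size):
    """Spatial size after max pooling with stride equal to the kernel size."""
    return size // kernel_size

size = 28                   # MNIST images are 28x28
size = conv_out(size, 5)    # first conv:  28 -> 24
size = conv_out(size, 5)    # second conv: 24 -> 20
size = pool_out(size, 2)    # max pool:    20 -> 10
size = conv_out(size, 5)    # third conv:  10 -> 6
size = pool_out(size, 2)    # max pool:    6 -> 3
flat_features = size * size * 64  # 64 channels after the last conv
print(size, flat_features)  # 3 576
```

If you change the kernel size, the number of convolutional layers or the input resolution, this number changes and the first linear layer must be adapted accordingly.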

# Training

In [0]:
def compute_loss_validation(model):
  validation_loader = torch.utils.data.DataLoader(validation_set, batch_size=batch_size, shuffle=True, num_workers=2)
  
  tot_loss = 0
  num_batches = 0
  with torch.no_grad(): # gradients are not needed for validation
    for batch in validation_loader:
      batch = [item.to(device) for item in batch] # we move the batch to the device
      images, labels = batch
      xs = images # the CNN takes the images as they are, without flattening
      ys = labels
      
      pred_ys = model(xs) # generate the predictions using the model
      loss = criterium(pred_ys, ys) # evaluate the predictions using the cross entropy loss
      tot_loss += loss.item() # get the number and sum it to the total loss
      num_batches += 1
      
      if num_batches == 200: # use at most 200 batches
        break
    
  loss = tot_loss / num_batches # average loss per batch
  return loss

num_epochs = 10

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.99) # optimizer for the parameters of the CNN

criterium = nn.CrossEntropyLoss() # loss function

batch_size = 50
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=2)

model.train() # set the model to training mode

running_loss = 0
training_loss = []
validating_loss = []
for epoch in range(num_epochs):
  for i, batch in enumerate(train_loader):
    batch = [t.to(device) for t in batch]
    images, labels = batch
    xs = images
    ys = labels
    
    optimizer.zero_grad() # reset the gradients
    pred_ys = model(xs) # generate the predictions
    loss = criterium(pred_ys, ys) # compute the loss
    loss.backward() # backpropagation
    optimizer.step() # update the parameters
    
    running_loss += loss.item()
    if (i+1) % 200 == 0: # every 200 batches print statistics about the training
      running_loss /= 200 # average training loss over the last 200 batches
      training_loss.append(running_loss) # keep track of the training loss
      model.eval() # set the model to eval, we can now test the model 
      validation_loss = compute_loss_validation(model) # compute the validation loss
      validating_loss.append(validation_loss) # keep track of the validation loss
      model.train() # set back the model to train
      print('Epoch [%d/%d], Step [%d/%d], Train Loss: %.4f, Validation Loss: %.4f'%(epoch+1, num_epochs, i+1, len(train_set)//batch_size, running_loss, validation_loss))
      running_loss = 0

# Analyse Losses

In [0]:
training_loss.pop()
validating_loss.insert(0, float('NaN'))

In [0]:
plt.plot(training_loss, label="train")
plt.plot(validating_loss, label="validation")
plt.legend(loc='upper right')
for i in range(num_epochs):
  plt.axvline(x=(len(train_set)//batch_size)*(i+1)/200,color='gray')
plt.show()

In [0]:
def accuracy(data_loader):
  correct = 0.0
  total = 0.0
  with torch.no_grad():
    for batch in tqdm_notebook(data_loader):
      batch = [item.to(device) for item in batch]
      images, labels = batch
      xs = images
      ys = labels

      pred_ys = model(xs).detach().cpu()
      _, pred_ys = torch.max(pred_ys, 1)
      correct += (pred_ys == ys.detach().cpu()).sum().item() # number of correct predictions in this batch
      total += labels.size(0)
      
  return correct/total

model.eval()

train_accuracy = accuracy(train_loader)
print('Train accuracy: {:.3f}'.format(train_accuracy))

validation_accuracy = accuracy(validation_loader)
print('Validation accuracy: {:.3f}'.format(validation_accuracy))