# Multi-Layer Perceptron, MNIST
---
In this notebook, we will train an MLP to classify images from the [MNIST](http://yann.lecun.com/exdb/mnist/) database of hand-written digits.

The process will be broken down into the following steps:
>1. Load and visualize the data
2. Define a neural network
3. Train the model
4. Evaluate the performance of our trained model on a test dataset!

Before we begin, we have to import the necessary libraries for working with data and PyTorch.

In [None]:
# import libraries
import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms, models

---
## Load and Visualize the [Data](http://pytorch.org/docs/stable/torchvision/datasets.html)

Downloading may take a few moments, and you should see your progress as the data is loading. You may also choose to change the `batch_size` if you want to load more data at a time.

This cell will create DataLoaders for each of our datasets.

In [None]:
from torch.utils.data import DataLoader

# number of subprocesses to use for data loading
num_workers = 0
# how many samples per batch to load
batch_size = 20

# define the transform object
transform = transforms.ToTensor() # convert data to torch.FloatTensor

# download pytorch MNIST training and test set
train_data = datasets.MNIST(root='data/', train=True, # using MNIST/processed/training.pt
                           download=True, transform=transform)
test_data = datasets.MNIST(root='data/', train=False, # using MNIST/processed/test.pt
                          download=True, transform=transform)

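# prepare data loaders (shuffling the training set is an assumption, not stated in the text)
train_loader = DataLoader(train_data, batch_size=batch_size,
                          num_workers=num_workers, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size,
                         num_workers=num_workers)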

# specify the image classes (MNIST digits 0-9)
classes = [str(digit) for digit in range(10)]

### Visualize a Batch of Training Data

The first step in a classification task is to take a look at the data, make sure it is loaded in correctly, then make any initial observations about patterns in that data.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

dataiter = iter(train_loader) # get iterator for train_loader object
images, labels = next(dataiter)

print(images.shape)

# plot the figure of MNIST images in a batch, along with corresponding labels
fig = plt.figure(figsize=(10,8))

for idx, image in enumerate(images):
    # make figure of 4x5
    ax = fig.add_subplot(4, 5, idx+1, xticks=[], yticks=[])
    ax.imshow(image.squeeze())
    ax.set_title(str(labels[idx].item()))

In [None]:
img = images.numpy().squeeze()[12] # pick any from 0-19
width, height = img.shape

fig = plt.figure(figsize=(12,12))
ax = fig.add_subplot(1, 1, 1)
ax.imshow(img)

thresh = img.max()/2
for i in range(width):
    for j in range(height):
        # annotate each pixel with its value, showing near-zero pixels as 0
        val = round(float(img[i][j]), 2) if img[i][j] >= 0.01 else 0
        ax.annotate(str(val), xy=(j,i),
                    horizontalalignment='center',
                    verticalalignment='center',
                    color='white' if img[i][j]<thresh else 'black')

---
## Define the Network [Architecture](http://pytorch.org/docs/stable/nn.html)

The architecture takes as input a 784-dim Tensor of pixel values for each image and produces a Tensor of length 10 (our number of classes) that holds the class scores for that image. The `MLPNet` class below accepts a list of hidden layer sizes, so its depth is configurable, and it can apply dropout to reduce overfitting.
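To make the 784-dim input concrete, here is a minimal sketch of the flattening step. It assumes a batch shaped like the one the loader yields with `batch_size = 20` (20 images, 1 channel, 28x28 pixels); the zero-filled tensor is a stand-in, not real data.

```python
import torch

# stand-in for one batch of MNIST images: (batch, channels, height, width)
batch = torch.zeros(20, 1, 28, 28)

# keep the batch dimension and flatten everything else: 1 * 28 * 28 = 784
flat = batch.view(batch.size(0), -1)

print(flat.shape)
```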

In [None]:
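class MLPNet(nn.Module):
    def __init__(self, input_size, hidden_layers_sizes, num_classes=10, drop_p=0.5):
        """Initialise a network of fully connected/MLP layers with specification
        Arguments:
            ______________________
            - input_size: (int or tuple) size/dimension of input matrix per img.
            - hidden_layers_sizes: (list) output sizes of the hidden linear layers.
            - num_classes: (int) number of classes to classify.
            - drop_p: (float; 0<=p<=1) dropout layer probability.
        """
        super(MLPNet, self).__init__()
        # Determine initial input size
        if isinstance(input_size, int):
            self.input_size = input_size
        elif isinstance(input_size, (list,tuple)):
            self.input_size = int(np.prod(input_size))
        else:
            raise TypeError(f"Input size: {input_size} expect integer or tuple to specify dimension.")
        
        # One possible way to add the hidden layers (an assumption, not the only
        # valid design): chain Linear layers through the requested sizes.
        sizes = [self.input_size] + list(hidden_layers_sizes)
        self.hidden_layers = nn.ModuleList(
            [nn.Linear(n_in, n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])])
        self.dropout = nn.Dropout(p=drop_p)
        self.output = nn.Linear(sizes[-1], num_classes)
    
    def forward(self, X):
        """ Feed forward raw input to models
        Arguments
            ______________________
            X: Vectors or Matrices of input data
        """
        # flatten input if not done so.
        X = X.view(X.size(0), -1)
        
        # forward flow: Linear -> ReLU -> Dropout for each hidden layer
        for layer in self.hidden_layers:
            X = self.dropout(F.relu(layer(X)))
        
        # return raw class scores (logits); CrossEntropyLoss applies the softmax
        return self.output(X)

model = MLPNet(img.shape, [48, 32, 20, 16], num_classes=10, drop_p=0.0)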

###  Specify [Loss Function](http://pytorch.org/docs/stable/nn.html#loss-functions) and [Optimizer](http://pytorch.org/docs/stable/optim.html)

It's recommended that you use cross-entropy loss for classification. If you look at the documentation (linked above), you can see that PyTorch's cross-entropy function applies a softmax function to the output layer *and* then calculates the log loss. The model should therefore return raw class scores rather than probabilities.
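The two-step behaviour can be checked directly. The sketch below uses made-up scores and targets (they are illustrative values, not MNIST outputs) and compares `F.cross_entropy` with an explicit log-softmax followed by the negative log-likelihood loss.

```python
import torch
import torch.nn.functional as F

# hypothetical raw scores (logits) for a batch of 3 images and 10 classes
torch.manual_seed(0)
logits = torch.randn(3, 10)
targets = torch.tensor([7, 2, 1])

# built-in cross-entropy, applied to the raw scores
loss_builtin = F.cross_entropy(logits, targets)

# the same quantity in two explicit steps: log-softmax, then log loss
log_probs = F.log_softmax(logits, dim=1)
loss_manual = F.nll_loss(log_probs, targets)

print(loss_builtin.item(), loss_manual.item())
```

The two printed values should agree up to floating-point error, which is why no softmax layer is needed at the end of the network when this loss is used.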

In [None]:
# specify loss function (categorical cross-entropy)
criterion = nn.CrossEntropyLoss()

# specify optimizer (stochastic gradient descent) and learning rate = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

### Train the Network
The steps for training/learning from a batch of data are described in the comments below:

1. Clear the gradients of all optimized variables
2. Forward pass: compute predicted outputs by passing inputs to the model
3. Calculate the loss
4. Backward pass: compute gradient of the loss with respect to model parameters
5. Perform a single optimization step (parameter update)
6. Update average training loss

The following loop trains for `n_epochs` epochs (set to 1 here; raise it to train longer). Take a look at how the values for the training loss change over time: we want the loss to decrease while also avoiding overfitting the training data.

In [None]:
# number of epochs to train the model
n_epochs = 1

model.train()

for epoch in range(n_epochs):
    train_loss = 0.0
    
    ###################
    # train the model #
    ###################
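    for data, target in train_loader:
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # backward pass: compute gradients of the loss
        loss.backward()
        # perform a single optimization step
        optimizer.step()
        # accumulate the batch loss, weighted by batch size
        train_loss += loss.item() * data.size(0)
    
    # print the average training loss per sample for this epoch
    train_loss = train_loss / len(train_loader.dataset)
    print(f'Epoch: {epoch+1} \tTraining Loss: {train_loss:.6f}')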



###  Defining [Network Architecture](http://pytorch.org/docs/stable/nn.html)
We will be using a Convolutional Neural Network with the following architecture:
* [Convolutional layers](https://pytorch.org/docs/stable/nn.html#conv2d), which can be thought of as a stack of filtered images.
* [Maxpooling layers](https://pytorch.org/docs/stable/nn.html#maxpool2d), which reduce the size of the input matrix by keeping only the most active pixels from the previous layer.
* The usual Linear + Dropout layers to avoid overfitting and produce a 10-dim output.

A network with 2 convolutional layers is shown in the image below and in the code, and you've been given starter code with one convolutional and one maxpooling layer.
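
As a rough illustration of how these three kinds of layers fit together, here is a minimal sketch of a network with 2 convolutional layers. The class name `SketchCNN`, the channel counts, kernel sizes, and dropout probability are all assumptions chosen for 1-channel 28x28 inputs; they are not taken from the starter code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SketchCNN(nn.Module):
    """Illustrative only: all layer sizes here are assumptions."""
    def __init__(self, num_classes=10, drop_p=0.25):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)   # 1x28x28 -> 16x28x28
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)  # 16x14x14 -> 32x14x14
        self.pool = nn.MaxPool2d(2, 2)                            # halves height and width
        self.dropout = nn.Dropout(drop_p)
        self.fc1 = nn.Linear(32 * 7 * 7, 64)
        self.fc2 = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # -> 16x14x14
        x = self.pool(F.relu(self.conv2(x)))   # -> 32x7x7
        x = x.view(x.size(0), -1)              # flatten for the linear layers
        x = self.dropout(F.relu(self.fc1(x)))
        return self.fc2(x)                     # raw class scores

cnn = SketchCNN()
scores = cnn(torch.zeros(4, 1, 28, 28))  # dummy batch of 4 images
print(scores.shape)
```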

### Test the Trained Network
Finally, we test our best model on previously unseen test data and evaluate its performance. Testing on unseen data is a good way to check that our model generalizes well. It may also be useful to be granular in this analysis and take a look at how this model performs on each class as well as looking at its overall loss and accuracy.
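
One way to collect these numbers is sketched below. The helper `evaluate` is a hypothetical name introduced here, and the tiny two-class model and data at the bottom are synthetic stand-ins so the example runs on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, criterion, num_classes=10):
    """Return average loss, overall accuracy, and per-class accuracy."""
    class_correct = torch.zeros(num_classes)
    class_total = torch.zeros(num_classes)
    total_loss = 0.0

    model.eval()  # switch off dropout for evaluation
    with torch.no_grad():
        for data, target in loader:
            output = model(data)
            total_loss += criterion(output, target).item() * data.size(0)
            pred = output.argmax(dim=1)
            # count samples and correct predictions per ground-truth class
            for c in range(num_classes):
                mask = target == c
                class_total[c] += mask.sum().item()
                class_correct[c] += (pred[mask] == c).sum().item()

    n = class_total.sum().item()
    # classes with no samples report 0 accuracy instead of dividing by zero
    per_class = class_correct / class_total.clamp(min=1)
    return total_loss / n, class_correct.sum().item() / n, per_class

# synthetic demo: an identity linear layer predicts the index of the larger input
demo_model = nn.Linear(2, 2)
with torch.no_grad():
    demo_model.weight.copy_(torch.eye(2))
    demo_model.bias.zero_()
x = torch.tensor([[1., 0.], [0., 1.], [1., 0.], [1., 0.]])
y = torch.tensor([0, 1, 1, 0])
demo_loader = DataLoader(TensorDataset(x, y), batch_size=2)

demo_loss, demo_acc, demo_per_class = evaluate(
    demo_model, demo_loader, nn.CrossEntropyLoss(), num_classes=2)
print(demo_loss, demo_acc, demo_per_class)
```

In this notebook the call would presumably look like `evaluate(model, test_loader, criterion)`, using the objects defined in the earlier cells.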

### Visualize Sample Test Results
This cell displays test images and their labels in this format: predicted (ground-truth). The text will be green for accurately classified examples and red for incorrect predictions.
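
A minimal sketch of such a display follows. The function name `plot_predictions` and its parameters are hypothetical; it assumes the images arrive as an array or CPU tensor of shape (N, 1, 28, 28) and that predictions and labels are integer class indices.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_predictions(images, preds, labels, n_rows=2):
    """Show images titled 'predicted (ground-truth)': green if correct, red otherwise."""
    images = np.asarray(images)
    n = len(images)
    n_cols = int(np.ceil(n / n_rows))
    fig = plt.figure(figsize=(2.5 * n_cols, 2.5 * n_rows))
    for idx in range(n):
        ax = fig.add_subplot(n_rows, n_cols, idx + 1, xticks=[], yticks=[])
        ax.imshow(np.squeeze(images[idx]), cmap='gray')
        is_correct = int(preds[idx]) == int(labels[idx])
        ax.set_title(f"{int(preds[idx])} ({int(labels[idx])})",
                     color="green" if is_correct else "red")
    return fig
```

With a batch from the test loader, the predictions could be obtained as `model(images).argmax(dim=1)` and passed in together with `images` and `labels`.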