# DATA20001 Deep Learning 2018 - Exercise 2

**Due Wednesday November 21, before 23:59**


## Exercise 2.1. MNIST classifier with MLP in NumPy

For this exercise, we will implement a classifier for the MNIST handwritten digit dataset. This dataset consists of 28x28-pixel grayscale images of handwritten digits (with pixel values between 0 and 255) and labels between 0 and 9. You can use the following code to download the dataset and load it into NumPy arrays.

In [None]:
import os
import io
import zlib
import struct as st
import numpy as np
import requests
from tqdm import tqdm

train_data_url = 'http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz'
train_labels_url = 'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz'
test_data_url = 'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz'
test_labels_url = 'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz'

bsize = 1048576

def download(url):
    print(f'Downloading {url}...')
    r = requests.get(url, stream=True)
    # Number of chunks, rounded up so a file smaller than bsize still counts as one
    size = -(-int(r.headers['content-length']) // bsize)
    # Download in-memory
    buff = io.BytesIO()
    pbar = tqdm(
        r.iter_content(bsize),
        total = size,
        unit = 'MiB',
        unit_scale = True,
        unit_divisor = 1024,
        bar_format = '{l_bar}{bar}| [{elapsed}/{remaining}, {rate_fmt}]'
    )
    dec = zlib.decompressobj(32 + zlib.MAX_WBITS)
    for chunk in pbar:
        data = dec.decompress(chunk)
        buff.write(data)
    buff.seek(0)
    return buff

def parse_idx(url, idx3=False):
    buff = download(url)
    print('Parsing binary data...')
    magic = st.unpack('>4B', buff.read(4))
    n = st.unpack('>I', buff.read(4))[0]
    if idx3:
        rows = st.unpack('>I', buff.read(4))[0]
        cols = st.unpack('>I', buff.read(4))[0]
        total = n * rows * cols
        shape = (n, rows, cols)
    else:
        total = n
        shape = n
    arr = np.asarray(
        st.unpack(
            f'>{total}B',
            buff.read(total)
        )
    ).reshape(shape)
    buff.close()
    if idx3:
        arr = 255 - arr
    return arr

train_data = parse_idx(train_data_url, idx3=True)
train_labels = parse_idx(train_labels_url)
test_data = parse_idx(test_data_url, idx3=True)
test_labels = parse_idx(test_labels_url)


The classifier should be an MLP with the following architecture:
* 1 hidden layer with 50 neurons and ReLU activation.
* 1 output layer with 10 neurons and Softmax activation.

The ReLU function returns the positive part of an input, or 0 if it is negative:
$$\text{ReLU}(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } x \geq 0 \end{cases}$$

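Its derivative is given by the Heaviside step function (strictly, the derivative is undefined at x = 0, where we use the value 1 by convention):

$$H(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x \geq 0 \end{cases}$$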
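
The two formulas above translate almost directly into NumPy. The following is a minimal sketch, not a required implementation; the function names `relu` and `relu_grad` are our own choice, and the value at exactly 0 follows the convention H(0) = 1 from the formula above.

```python
import numpy as np

def relu(x):
    """Element-wise positive part of x."""
    return np.maximum(x, 0)

def relu_grad(x):
    """Heaviside step function, with the convention H(0) = 1."""
    return (x >= 0).astype(x.dtype)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))       # [0. 0. 3.]
print(relu_grad(x))  # [0. 1. 1.]
```

Both functions work element-wise, so they can be applied unchanged to a whole batch of hidden-layer pre-activations.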

The Softmax function converts an arbitrary set of K numbers into a probability distribution, by normalizing them between 0 and 1 and making them add up to 1:

$$\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}}$$

It is a generalization of the logistic function we saw previously, and so its derivative is similar:

$$ \frac{\partial \sigma(\mathbf{z})_j}{\partial  z_i} = \begin{cases} \sigma(\mathbf{z})_i(1 - \sigma(\mathbf{z})_i) & i = j \\ -\sigma(\mathbf{z})_i \sigma(\mathbf{z})_j & i \neq j \end{cases}$$
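
As an illustration, here is one possible NumPy sketch of the Softmax function and its matrix of partial derivatives for a single input vector. The function names are hypothetical, and subtracting the maximum before exponentiating is a common numerical-stability trick that we add here as an assumption; it does not change the result because the shift cancels in the ratio.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis; subtracting the max avoids overflow in exp."""
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def softmax_jacobian(z):
    """Jacobian J[j, i] = d softmax(z)_j / d z_i for a single 1-D input z."""
    s = softmax(z)
    # Diagonal entries are s_i * (1 - s_i), off-diagonal entries are -s_i * s_j.
    return np.diag(s) - np.outer(s, s)

z = np.array([1.0, 2.0, 3.0])
s = softmax(z)
J = softmax_jacobian(z)
print(s.sum())                          # sums to 1 up to floating-point rounding
print(np.allclose(J.sum(axis=0), 0.0))  # each column sums to 0 since the outputs always sum to 1
```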

The output of the network will be the probability that the input corresponds to each of the 10 classes. The true label will be given by the probability distribution where the probability of the true class is 1 and the probability of all other classes is 0, i.e. the 1-hot vector of the class index.
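
A 1-hot encoding of integer labels could be built as in the sketch below. The helper name `one_hot` and its `num_classes` parameter are assumptions made for illustration.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return an (n, num_classes) array with a 1 at each label's index."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.shape[0], num_classes))
    # Row k gets a 1 in column labels[k]; every other entry stays 0.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

print(one_hot([3, 0]))
```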

Therefore, we will use cross-entropy as the loss function, which gives a measure of how close two probability distributions p and q are:

$$H(p, q) = -\sum_x p(x)\, \log q(x)$$

Since our true distribution always has a single 1 for the true class t and the rest of the values are 0, the cross-entropy loss simplifies to an expression in $\hat{p}_t$, the predicted probability of the true class:

$$L(\hat{p}_t) = - \log \hat{p}_t$$

The derivative of this expression with respect to the predicted probability is thus:

$$ \frac{d L(\hat{p}_t)}{d \hat{p}_t} = -\frac{1}{\hat{p}_t}$$

The terms of the cross-entropy sum are 0 for every class other than t, so we don't backpropagate gradients from the loss for those.
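
To see how the loss derivative and the Softmax derivative combine through the chain rule, the sketch below computes the gradient of the loss with respect to the inputs of the Softmax for a single example and compares it with a finite-difference estimate. The function names and the example values of `z` and `t` are made up for illustration.

```python
import numpy as np

def softmax(z):
    exp = np.exp(z - np.max(z))
    return exp / exp.sum()

def cross_entropy(p_hat, t):
    """Loss -log(p_hat[t]) for a single example with true class t."""
    return -np.log(p_hat[t])

def loss_grad_wrt_logits(z, t):
    """Chain rule: -1/p_t times the derivatives of p_t with respect to each z_i."""
    p = softmax(z)
    dpt_dz = -p[t] * p               # d p_t / d z_i for i != t
    dpt_dz[t] = p[t] * (1 - p[t])    # d p_t / d z_t
    return (-1.0 / p[t]) * dpt_dz

z = np.array([0.5, -1.0, 2.0])  # arbitrary example inputs to the Softmax
t = 2                           # arbitrary true class
grad = loss_grad_wrt_logits(z, t)

# Central finite differences as an independent check of the analytic gradient.
eps = 1e-6
numeric = np.array([
    (cross_entropy(softmax(z + eps * np.eye(3)[i]), t)
     - cross_entropy(softmax(z - eps * np.eye(3)[i]), t)) / (2 * eps)
    for i in range(3)
])
print(np.allclose(grad, numeric, atol=1e-6))
```

Multiplying the two derivatives out shows that the result equals the predicted probabilities minus the 1-hot vector of the true class, which is a convenient form to use in backpropagation.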

You can add more layers and/or change the number of neurons in your hidden layer if you wish. You should train with the training set only, and you can use the test set to verify whether your network is learning properly or not.
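
To check whether the network is learning, you need a forward pass and an accuracy measure. The sketch below shows one way these could look for the architecture described above. The parameter names (`W1`, `b1`, `W2`, `b2`), the initialization scale, and the scaling of pixel values to [0, 1] are all assumptions, and random arrays stand in for the real data. Backpropagation and the parameter updates are left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter names and initialization scale; 784 = 28 * 28 input pixels.
W1 = rng.normal(0.0, 0.01, size=(784, 50))
b1 = np.zeros(50)
W2 = rng.normal(0.0, 0.01, size=(50, 10))
b2 = np.zeros(10)

def forward(images):
    """Map (n, 28, 28) images to (n, 10) class probabilities."""
    x = images.reshape(images.shape[0], -1) / 255.0   # flatten and scale to [0, 1]
    hidden = np.maximum(x @ W1 + b1, 0)               # ReLU hidden layer
    logits = hidden @ W2 + b2
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)       # Softmax output layer

def accuracy(probs, labels):
    """Fraction of examples whose most probable class equals the label."""
    return np.mean(np.argmax(probs, axis=1) == labels)

# Random stand-in data with the same shapes as a batch of MNIST images and labels.
fake_images = rng.integers(0, 256, size=(8, 28, 28))
fake_labels = rng.integers(0, 10, size=8)
probs = forward(fake_images)
print(probs.shape, accuracy(probs, fake_labels))
```

With the real data, `forward(test_data)` and `test_labels` would take the place of the stand-in arrays.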

## Exercise 2.2. MLP with PyTorch

The task is to create an MLP to classify images using PyTorch (see e.g. the tutorial from lecture 4). Here we'll use the FashionMNIST dataset, which contains images of ten different classes of clothing.

Below are some commands to get you started.

In [None]:
%matplotlib inline
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import numpy as np

# Let's load the dataset. Fortunately, FashionMNIST is also available directly in torchvision.
batch_size = 32
train_dataset = datasets.FashionMNIST('./data', train=True, download=True, transform=transforms.ToTensor())
validation_dataset = datasets.FashionMNIST('./data', train=False, transform=transforms.ToTensor())

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
validation_loader = torch.utils.data.DataLoader(dataset=validation_dataset, batch_size=batch_size, shuffle=False)

There are 10 classes, as in regular MNIST, but each class represents an item of clothing. Below we define a map from the class index to the textual label. This is just for humans to look at :-)

In [None]:
num_classes = 10

labels = {
  0: 'T-shirt/top',
  1: 'Trouser',
  2: 'Pullover',
  3: 'Dress',
  4: 'Coat',
  5: 'Sandal',
  6: 'Shirt',
  7: 'Sneaker',
  8: 'Bag',
  9: 'Ankle boot'
}

Let's take a look at the first ten images of a training batch just to get an idea of the data. Since the training loader shuffles the data, these are a random sample of the training set.

In [None]:
for (X_train, y_train) in train_loader:
    print('X_train:', X_train.size(), 'type:', X_train.type())
    print('y_train:', y_train.size(), 'type:', y_train.type())
    break

pltsize=1
plt.figure(figsize=(10*pltsize, pltsize))

for i in range(10):
    plt.subplot(1,10,i+1)
    plt.axis('off')
    plt.imshow(X_train[i,:,:,:].numpy().reshape(28,28), cmap="gray_r")
    plt.title(labels[y_train[i].item()])

Now, <span style="background-color: yellow">your task is to train an MLP model, i.e., a neural network with several fully-connected layers to classify images into the ten classes</span>.  You should train on the training set loaded above, and you should use the validation set to calculate the accuracy of the model (i.e., the percentage of correctly classified images of the validation set).

You can use many layers, and any non-linearities you wish. You should reach an accuracy above 85%.
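
One way to measure loss and accuracy over a data loader is sketched below. This is only an illustration: the helper name `evaluate` is our own, and a small stand-in model and random tensors with FashionMNIST-like shapes are used so that the snippet runs on its own. The stand-in model is not a suggested architecture.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, criterion):
    """Return (mean loss per example, accuracy) over all batches in loader."""
    model.eval()
    total_loss, correct, count = 0.0, 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for X, y in loader:
            logits = model(X)
            # criterion averages over the batch, so weight by the batch size
            total_loss += criterion(logits, y).item() * y.size(0)
            correct += (logits.argmax(dim=1) == y).sum().item()
            count += y.size(0)
    return total_loss / count, correct / count

# Stand-in model and random data, only to show how the helper is called.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
data = TensorDataset(torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,)))
loss, acc = evaluate(model, DataLoader(data, batch_size=32), nn.CrossEntropyLoss())
print(f'loss={loss:.3f}, accuracy={acc:.1%}')
```

Calling such a helper once per epoch on `validation_loader` gives both the validation accuracy and a validation loss value for that epoch.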

<span style="background-color: yellow">Please also plot the training loss versus the validation loss across the epochs (i.e., epoch number on the x-axis, and the two loss curves on the y-axis). Also discuss the difference between the two curves.</span>