# MNIST classifier with PyTorch

The MNIST dataset (http://yann.lecun.com/exdb/mnist/) contains 70,000 labeled images of handwritten digits: 60,000 for training and 10,000 for testing.
Using the PyTorch framework, we'll write a classifier that assigns each image to its digit class.

We'll proceed as follows:
- Create the data pipeline
- Create the model
- Train the model
- Test the model
- Put everything together

The dataset is located in `./data` and consists of four binary files: `train-images`, `train-labels`, `test-images` and `test-labels`. Because the data is stored in a binary format, it has to be read and decoded first. We'll use the following `read_image_file` and `read_label_file` helper functions to do so:

In [None]:
import torch
import numpy as np
from PIL import Image
import codecs
from torch.utils.data import Dataset
from torchvision import transforms
from torch.utils.data import DataLoader

########## HELPER FUNCTIONS ##########
# Source: [https://pytorch.org/docs/stable/_modules/torchvision/datasets/mnist.html]
######################################

def read_label_file(path):
    with open(path, 'rb') as f:
        x = read_sn3_pascalvincent_tensor(f, strict=False)
    assert(x.dtype == torch.uint8)
    assert(x.ndimension() == 1)
    return x.long()


def read_image_file(path):
    with open(path, 'rb') as f:
        x = read_sn3_pascalvincent_tensor(f, strict=False)
    assert(x.dtype == torch.uint8)
    assert(x.ndimension() == 3)
    return x

def get_int(b):
    return int(codecs.encode(b, 'hex'), 16)

def open_maybe_compressed_file(path):
    """Return a file object that possibly decompresses 'path' on the fly.
       Decompression occurs when argument `path` is a string and ends with '.gz' or '.xz'.
    """
    # torch._six is no longer available in recent PyTorch releases, so check for str directly.
    if not isinstance(path, str):
        return path
    if path.endswith('.gz'):
        import gzip
        return gzip.open(path, 'rb')
    if path.endswith('.xz'):
        import lzma
        return lzma.open(path, 'rb')
    return open(path, 'rb')

def read_sn3_pascalvincent_tensor(path, strict=True):
    """Read a SN3 file in "Pascal Vincent" format (Lush file 'libidx/idx-io.lsh').
       Argument may be a filename, compressed filename, or file object.
    """
    # typemap
    if not hasattr(read_sn3_pascalvincent_tensor, 'typemap'):
        read_sn3_pascalvincent_tensor.typemap = {
            8: (torch.uint8, np.uint8, np.uint8),
            9: (torch.int8, np.int8, np.int8),
            11: (torch.int16, np.dtype('>i2'), 'i2'),
            12: (torch.int32, np.dtype('>i4'), 'i4'),
            13: (torch.float32, np.dtype('>f4'), 'f4'),
            14: (torch.float64, np.dtype('>f8'), 'f8')}
    # read
    with open_maybe_compressed_file(path) as f:
        data = f.read()
    # parse
    magic = get_int(data[0:4])
    nd = magic % 256
    ty = magic // 256
    assert nd >= 1 and nd <= 3
    assert ty >= 8 and ty <= 14
    m = read_sn3_pascalvincent_tensor.typemap[ty]
    s = [get_int(data[4 * (i + 1): 4 * (i + 2)]) for i in range(nd)]
    parsed = np.frombuffer(data, dtype=m[1], offset=(4 * (nd + 1)))
    assert parsed.shape[0] == np.prod(s) or not strict
    return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
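
To make the parsing above more concrete: the buffer starts with a 4-byte big-endian magic number whose low byte gives the number of dimensions and whose next byte gives the element type code, followed by one 4-byte big-endian size per dimension and then the raw data. The sketch below builds a tiny buffer in memory and decodes its header with the standard `struct` module. The `parse_idx_header` name and the toy buffer are assumptions made for illustration and are not part of torchvision.

```python
import struct

def parse_idx_header(data: bytes):
    """Decode the header of an IDX buffer (hypothetical helper for illustration).

    Returns (type_code, shape, data_offset).
    """
    # The magic number is a 4-byte big-endian integer of the form 0x0000TTDD,
    # where TT is the element type code and DD the number of dimensions.
    (magic,) = struct.unpack('>I', data[:4])
    n_dims = magic % 256
    type_code = magic // 256
    # Each dimension size is stored as a 4-byte big-endian unsigned integer.
    shape = struct.unpack('>' + 'I' * n_dims, data[4:4 + 4 * n_dims])
    return type_code, shape, 4 * (n_dims + 1)

# Toy buffer: 2 images of 3x3 unsigned bytes (type code 8 = uint8, 3 dimensions).
header = struct.pack('>IIII', 0x00000803, 2, 3, 3)
payload = bytes(range(18))
type_code, shape, offset = parse_idx_header(header + payload)
print(type_code, shape, offset)  # 8 (2, 3, 3) 16
```

The returned offset matches the `offset=(4 * (nd + 1))` argument passed to `np.frombuffer` in the helper above.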

## 1. Creating the data pipeline
We start by creating a data pipeline for our dataset, which will let us efficiently feed data to the model as it trains. To do so, we inherit from the `Dataset` class and override the `__len__(self)` and `__getitem__(self, index)` methods. We initialise the class with:
- the path to the data
- whether to load the train or test data
- a list of transforms for input (optional)
- a list of transforms for labels (optional)

The `__init__` method should load the right dataset and the `__getitem__` method should apply transforms if necessary.

In [None]:
class MNISTDataset(Dataset):
  def __init__(self, path, test=False, transform=None, label_transform=None):
    # Load train data if test=False, otherwise load test data

    self.transform = transform
    self.label_transform = label_transform

  def __len__(self):
    pass # Code here

  def __getitem__(self, idx):
    sample = self.x[idx]
    label = self.y[idx]

    # Transform image into PIL image
    sample = Image.fromarray(sample.numpy(), mode='L')

    # Code here, apply transforms if necessary

    return sample, label

dataset = MNISTDataset('./data')
print(dataset[0])
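
One possible way to fill in the skeleton is sketched below. To keep it runnable without the MNIST files, this version takes the image and label tensors directly. The `TensorDigitDataset` name, the random stand-in data and the `to_float_tensor` transform are assumptions made for illustration; in the exercise, `__init__` would instead call `read_image_file` and `read_label_file` on the train or test files under `path`.

```python
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class TensorDigitDataset(Dataset):
    """Illustrative stand-in for MNISTDataset that wraps in-memory tensors."""

    def __init__(self, images, labels, transform=None, label_transform=None):
        # In the exercise these would come from the helper functions, e.g.
        # read_image_file(os.path.join(path, 'train-images')).
        self.x = images            # uint8 tensor of shape [N, H, W]
        self.y = labels            # int64 tensor of shape [N]
        self.transform = transform
        self.label_transform = label_transform

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        # A 2-D uint8 array is converted to a grayscale ('L') PIL image.
        sample = Image.fromarray(self.x[idx].numpy())
        label = self.y[idx]
        if self.transform is not None:
            sample = self.transform(sample)
        if self.label_transform is not None:
            label = self.label_transform(label)
        return sample, label

def to_float_tensor(img):
    """Hypothetical transform: PIL image -> float tensor [1, H, W] in [0, 1]."""
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return torch.from_numpy(arr).unsqueeze(0)

# Random stand-in data with the MNIST image shape.
images = torch.randint(0, 256, (5, 28, 28), dtype=torch.uint8)
labels = torch.randint(0, 10, (5,))
dataset = TensorDigitDataset(images, labels, transform=to_float_tensor)
sample, label = dataset[0]
print(len(dataset), sample.shape, sample.dtype)
```

Passing `transforms.ToTensor()` from torchvision as `transform` performs the same kind of conversion, so the hand-written transform is only there to keep the sketch free of extra dependencies.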

Our dataset class will now read the files and output pairs consisting of a PIL (Python Imaging Library) image and its label class (e.g. the label 4 for an image of a 4).

We can now create our dataloaders for both the train and test sets. Note that since our Dataset outputs PIL images, those have to be converted to PyTorch tensors using the `transforms.ToTensor()` transform. The `BATCH_SIZE` hyperparameter sets how many samples go into each batch for gradient descent.

In [None]:
BATCH_SIZE = 32
DATA = './data'

train_data = # Code here
trainloader = # Code here

test_data = # Code here
testloader = # Code here
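
As a sketch of what the loaders produce, the snippet below wraps random stand-in tensors (an assumption, used so the example runs without the MNIST files) in a `TensorDataset` and pulls one batch from a `DataLoader`. In the exercise, the datasets would instead be instances of `MNISTDataset` created with `transform=transforms.ToTensor()`, once with `test=False` and once with `test=True`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

BATCH_SIZE = 32

# Stand-in for the transformed MNIST training set: 100 images of shape [1, 28, 28].
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
train_data = TensorDataset(images, labels)

# Shuffle the training data each epoch; the last batch may be smaller than BATCH_SIZE.
trainloader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)

x, y = next(iter(trainloader))
print(x.shape, y.shape)  # torch.Size([32, 1, 28, 28]) torch.Size([32])
print(len(trainloader))  # 4 batches: 32 + 32 + 32 + 4
```

Shuffling is usually only needed for the training loader; the test loader can iterate in order.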

## 2. Creating the model

Now that our data pipeline is up and running, we can create our model. The input is a 28x28 image, which we flatten into a vector of 28 * 28 = 784 values. Our input layer is therefore 784 wide, followed by one hidden layer of 64 units and a final output layer of 10 units. The output layer is 10 wide because we output one score per digit class, matching a one-hot encoding of the label. A single output neuron ranging from 0 to 9 would be much harder to train. In our case, if the second neuron has the highest activation, we classify the image as a 1 (classes start from 0).

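We will use the ReLU activation function for the hidden layer and softmax on the output layer to get a probability distribution over the ten classes. Note that `nn.CrossEntropyLoss()`, which we use later, applies log-softmax internally and expects raw scores, so softmax should not be applied a second time before that loss.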
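
To illustrate how ten output values map to a digit, the sketch below applies softmax to a made-up vector of raw scores (the numbers are arbitrary) and picks the most probable class with `argmax`.

```python
import torch
import torch.nn.functional as F

# Made-up raw scores (logits) for the ten digit classes 0..9.
logits = torch.tensor([0.1, 3.0, 0.2, 0.0, -1.0, 0.5, 0.3, 0.0, 0.1, -0.5])

# Softmax turns the scores into a probability distribution that sums to 1.
probs = F.softmax(logits, dim=0)

# The index of the largest probability is the predicted digit.
prediction = probs.argmax().item()
print(prediction)  # 1
```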

Our model can now be created by inheriting from the `nn.Module` class and defining the `forward(self, x)` method:

In [None]:
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
  def __init__(self):
    super(Model, self).__init__()

    # Code here, create model design & layers

  def forward(self, x):
    pass # Code here, feed forward x through model

This is a tiny model and far from optimal. It is, however, quite simple and does the job. Feel free to play around with different designs!
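
One possible way to fill in the skeleton, following the 784-64-10 design, is sketched here. It returns raw scores rather than softmax probabilities, on the assumption that the model is trained with `nn.CrossEntropyLoss()`, which applies log-softmax itself. The layer names `fc1` and `fc2` are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 64)  # input -> hidden
        self.fc2 = nn.Linear(64, 10)       # hidden -> one score per digit

    def forward(self, x):
        x = F.relu(self.fc1(x))
        # Raw scores (logits); apply F.softmax(..., dim=1) to get probabilities.
        return self.fc2(x)

model = Model()
out = model(torch.rand(4, 28 * 28))
print(out.shape)  # torch.Size([4, 10])
```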

## 3. Train model

Now that we have our data pipeline and our model, we can train it. We'll define the `train` function taking the following arguments:
- epoch: current epoch number
- model: model we're training
- dataloader: dataloader to get the data from
- optimizer: optimizer to run on the model
- loss_fn: loss function to compute loss

Using the dataloader, we iterate over batches, feed them to the model, compute the loss and backpropagate the gradients.
Note that `optimizer.zero_grad()` should be called for every batch to reset the gradients; otherwise they accumulate across batches.

The input data will be of the form `[batch_size, n_channels, height, width]`, so we'll get a tensor along the lines of `[32, 1, 28, 28]` (only 1 channel since the image is grayscale). However, our model expects a flat input of 784 values, so we have to reshape the data into a tensor of dimension `[batch_size, 28 * 28]`.
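
As a small sketch of that reshape, using a random stand-in batch:

```python
import torch

batch = torch.rand(32, 1, 28, 28)     # [batch_size, n_channels, height, width]
flat = batch.view(batch.size(0), -1)  # keep the batch dimension, flatten the rest
print(flat.shape)  # torch.Size([32, 784])
```

Using `batch.size(0)` rather than a hard-coded 32 keeps the reshape correct for a smaller final batch.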

In [None]:
LOG_INTERVAL = 100

def train(epoch, model, dataloader, optimizer, loss_fn):
  for i, batch in enumerate(dataloader):
    
    # Code here

    # Log progress
    if i % LOG_INTERVAL == 0:
      print('Epoch {}: [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
          epoch, i * len(inputs), len(dataloader.dataset),
          100. * i / len(dataloader), loss.item()))
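
A possible body for `train` is sketched below, with the logging omitted for brevity. The toy linear model, the random stand-in data and the returned loss value are assumptions used only to make the example runnable and checkable on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

def train(epoch, model, dataloader, optimizer, loss_fn):
    model.train()
    last_loss = None
    for i, (inputs, labels) in enumerate(dataloader):
        inputs = inputs.view(inputs.size(0), -1)  # flatten to [batch_size, 784]
        optimizer.zero_grad()                     # reset gradients from the previous batch
        outputs = model(inputs)
        loss = loss_fn(outputs, labels)
        loss.backward()                           # backpropagate
        optimizer.step()                          # update the weights
        last_loss = loss.item()
    return last_loss  # returned here only so the sketch can be checked

torch.manual_seed(0)
data = TensorDataset(torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,)))
loader = DataLoader(data, batch_size=32)
model = nn.Linear(28 * 28, 10)  # toy stand-in for the real model
before = model.weight.detach().clone()
final_loss = train(1, model, loader, optim.SGD(model.parameters(), lr=0.1), nn.CrossEntropyLoss())
print(final_loss)
```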

## 4. Test model
The dataset provides 10,000 images to test our model on. Much like the `train` function, we'll create a `test` function to evaluate the model on unseen data. We run the code inside a `with torch.no_grad():` block to tell PyTorch that we won't need gradients, so it shouldn't spend time tracking them.
We wish to track the average loss and the percentage of correct predictions.

In [None]:
def test(model, dataloader, loss_fn):
  loss = 0
  correct = 0
  with torch.no_grad():
    for x, y in dataloader:
      # Code here

    # Log progress
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)'.format(
      loss / len(dataloader.dataset), correct, len(dataloader.dataset),
      100. * correct / len(dataloader.dataset)))
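
The evaluation loop could look roughly like the sketch below. It is named `evaluate` here because, unlike `test`, it returns the two metrics instead of printing them. The identity "model" and the hand-built inputs are assumptions chosen so that the expected accuracy is known in advance.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, dataloader, loss_fn):
    model.eval()
    total_loss, correct = 0.0, 0
    with torch.no_grad():
        for x, y in dataloader:
            x = x.view(x.size(0), -1)
            out = model(x)
            # nn.CrossEntropyLoss returns the batch mean by default,
            # so weight it by the batch size before summing.
            total_loss += loss_fn(out, y).item() * x.size(0)
            # The predicted class is the index of the largest output.
            correct += (out.argmax(dim=1) == y).sum().item()
    n = len(dataloader.dataset)
    return total_loss / n, correct / n

# Toy check: inputs are scaled one-hot encodings of their labels, so an
# identity "model" predicts every label correctly.
labels = torch.arange(10).repeat(2)                      # 20 labels: 0..9, 0..9
inputs = 10.0 * torch.eye(10)[labels].view(20, 1, 2, 5)  # [N, channels, height, width]
loader = DataLoader(TensorDataset(inputs, labels), batch_size=8)
avg_loss, accuracy = evaluate(nn.Identity(), loader, nn.CrossEntropyLoss())
print(accuracy)  # 1.0
```

Weighting each batch loss by its size means that dividing by `len(dataloader.dataset)` gives a per-sample average even when the last batch is smaller.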

## 5. Putting everything together
We now have everything in place to train our model. We define two new hyperparameters: the number of epochs to run for and the learning rate. Our loss function will be `nn.CrossEntropyLoss()` and our optimizer will be `optim.SGD(...)`. Note that there are many other (possibly better) options, so have a look around!

In [None]:
import torch.optim as optim

N_EPOCHS = 10
LEARNING_RATE = 0.1

# Code here

# Save model
torch.save(model.state_dict(), './model.pt')
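
The final cell could be filled in along the following lines. This sketch assumes a toy linear model and random stand-in data so that it runs on its own, and it writes the weights to a temporary directory rather than `./model.pt`. In the notebook, the loop would call the `train` and `test` functions with `trainloader` and `testloader`.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

N_EPOCHS = 2          # kept small for the sketch
LEARNING_RATE = 0.1

torch.manual_seed(0)
loader = DataLoader(
    TensorDataset(torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,))),
    batch_size=32)

model = nn.Linear(28 * 28, 10)  # stand-in for Model()
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=LEARNING_RATE)

for epoch in range(1, N_EPOCHS + 1):
    # In the notebook: train(epoch, model, trainloader, optimizer, loss_fn)
    #                  test(model, testloader, loss_fn)
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x.view(x.size(0), -1)), y)
        loss.backward()
        optimizer.step()

# Save the learned weights, then load them into a fresh model of the same shape.
path = os.path.join(tempfile.mkdtemp(), 'model.pt')
torch.save(model.state_dict(), path)
restored = nn.Linear(28 * 28, 10)
restored.load_state_dict(torch.load(path))
print(torch.equal(model.weight, restored.weight))  # True
```

Saving the `state_dict` stores only the weights, so the model class has to be instantiated again before the weights are loaded.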

And that's it! We have now trained a simple model to recognize handwritten digits using PyTorch.
Although this is a very simple example, it shows how to use PyTorch from start to finish, and the same principles apply to more or less any PyTorch project.

## Extensions

Recommended extensions:
- Play around with the model and hyperparameters
- Create a KMNIST classifier [https://github.com/rois-codh/kmnist]
- Create a Fashion MNIST classifier [https://github.com/zalandoresearch/fashion-mnist]
- Look into CNNs for image classification

_William Profit (williamprofit.com) on behalf of ICDSS (icdss.uk)_