# Practical: Image classification

In this practical, we will continue working on the hand-written digit classification dataset, [MNIST](http://yann.lecun.com/exdb/mnist/).

![](mnist.png)

In the previous practical, we used a K-nearest-neighbour or support vector machine classifier from sklearn. Today, we will use a convolutional neural network instead.

The two mainstream neural network libraries are [TensorFlow](https://www.tensorflow.org/) by Google and [PyTorch](https://pytorch.org/) by Facebook. We will use PyTorch for this practical. Please install the package first, following the [installation instructions](https://pytorch.org/get-started/locally/). In most cases, you simply need to run the following command on your computer:

`pip3 install torch torchvision`

In [None]:
# Import libraries (provided)
import numpy as np
import pandas as pd
import os
import gzip
import struct
import matplotlib.pyplot as plt
import time
import random
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# If you use a MacBook and matplotlib raises an error related to libomp.dylib initialisation, try uncommenting the following line.
# os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

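Note that the output of the `extract_images()` function is a 4D array. The [convolutional layer](https://pytorch.org/docs/stable/nn.html#conv2d) in PyTorch takes 4D arrays of shape $N \times C \times X \times Y$ as input, where $N$ is the batch size, $C$ is the number of channels, and $X$ and $Y$ are the two spatial dimensions of the image.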
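To make the shape convention concrete, here is a minimal sketch that pushes a dummy batch through a single convolutional layer. The all-zero array `batch` is a hypothetical stand-in for real MNIST images, and the layer settings are simply borrowed from the first layer of the network used later in this practical.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical stand-in for a batch of images: N=8 images, C=1 channel, 28 x 28 pixels
batch = np.zeros((8, 1, 28, 28), dtype=np.uint8)

# Conv2d works on floating-point tensors, so convert from uint8 first
x = torch.from_numpy(batch).float()

# 5 x 5 kernel with padding 2 keeps the spatial size unchanged
conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, padding=2)
y = conv(x)

print(x.shape)  # torch.Size([8, 1, 28, 28])
print(y.shape)  # torch.Size([8, 6, 28, 28])
```

Only the channel dimension changes here (from 1 to 6), because the padding compensates for the border pixels that the 5 x 5 kernel would otherwise remove.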

In [None]:
# Functions for loading MNIST image data (provided)
def extract_images(f_name):
    """ Extract the images into a 4D uint8 numpy array [index, 1, rows, cols]. """
    print('Extracting', f_name)
    with gzip.open(f_name, 'rb') as f:
        # Read file header
        buffer = f.read(16)
        magic, num_images, rows, cols = struct.unpack(">IIII", buffer)
        if magic != 2051:
            raise ValueError('Invalid magic number {0} in MNIST image file {1}.'.format(magic, f_name))

        # Read data
        buffer = f.read(rows * cols * num_images)
        data = np.frombuffer(buffer, dtype=np.uint8)
        data = data.reshape(num_images, 1, rows, cols)
        return data

# Functions for loading MNIST label data (provided)
def extract_labels(f_name):
    """ Extract the labels into a 1D uint8 numpy vector [index,]. """
    print('Extracting', f_name)
    with gzip.open(f_name, 'rb') as f:
        # Read file header
        buffer = f.read(8)
        magic, num_items = struct.unpack(">II", buffer)
        if magic != 2049:
            raise ValueError('Invalid magic number {0} in MNIST label file {1}.'.format(magic, f_name))

        # Read data
        buffer = f.read(num_items)
        data = np.frombuffer(buffer, dtype=np.uint8)
        return data
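The two loader functions rely on the file header layout: a few big-endian unsigned integers (a magic number followed by the array dimensions), then the raw pixel or label bytes. The sketch below illustrates this on a tiny, made-up image file built in memory, so it does not need the real MNIST files. The sizes (2 images of 3 x 4 pixels) are arbitrary assumptions for illustration.

```python
import gzip
import io
import struct
import numpy as np

# Build a tiny, made-up image file in memory: 2 images of 3 x 4 pixels
num_images, rows, cols = 2, 3, 4
pixels = np.arange(num_images * rows * cols, dtype=np.uint8)
header = struct.pack(">IIII", 2051, num_images, rows, cols)  # four big-endian unsigned ints
raw = gzip.compress(header + pixels.tobytes())

# Parse it back in the same way as extract_images()
with gzip.open(io.BytesIO(raw), 'rb') as f:
    magic, n, r, c = struct.unpack(">IIII", f.read(16))
    data = np.frombuffer(f.read(n * r * c), dtype=np.uint8).reshape(n, 1, r, c)

print(magic, data.shape)  # 2051 (2, 1, 3, 4)
```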

## 1. Load and browse data.

The MNIST dataset is split into a training set (60,000 samples) and a test set (10,000 samples). In total, there are 4 files.

* `train-images-idx3-ubyte.gz`: training images
* `train-labels-idx1-ubyte.gz`: training labels
* `t10k-images-idx3-ubyte.gz`: test images
* `t10k-labels-idx1-ubyte.gz`: test labels

#### 1.1 Load data (provided).

In [None]:
# Training set
X_train = extract_images('../practical_07/train-images-idx3-ubyte.gz')
y_train = extract_labels('../practical_07/train-labels-idx1-ubyte.gz')

# Test set
X_test = extract_images('../practical_07/t10k-images-idx3-ubyte.gz')
y_test = extract_labels('../practical_07/t10k-labels-idx1-ubyte.gz')

#### 1.2 Print out the shapes of the four arrays.
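As a hint, the sketch below prints shapes for small, hypothetical stand-in arrays (6 training and 2 test samples). With the real data, the first dimension should be 60,000 for the training arrays and 10,000 for the test arrays, given the dataset split described above.

```python
import numpy as np

# Hypothetical stand-ins; replace them with the arrays loaded in 1.1
arrays = {
    'X_train': np.zeros((6, 1, 28, 28), dtype=np.uint8),
    'y_train': np.zeros((6,), dtype=np.uint8),
    'X_test': np.zeros((2, 1, 28, 28), dtype=np.uint8),
    'y_test': np.zeros((2,), dtype=np.uint8),
}

# Print the name, shape and data type of each array
for name, arr in arrays.items():
    print(name, arr.shape, arr.dtype)
```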

## 2. Analyse data.

We are going to provide you with the framework of a convolutional neural network. However, some parts are missing and need to be completed by you.

The network provided follows the LeNet architecture shown below, with minor differences. For example, our input is of shape 28 x 28, instead of 32 x 32.

![](lenet.jpg)

#### 2.1 Build the network (provided).

You can read the code here and compare it with the LeNet architecture shown above.

In [None]:
# LeNet (provided)
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        # Construct the layers. The layer names correspond to the annotations in the figure above.
        self.C1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=(5, 5), padding=2)
        self.S2 = nn.MaxPool2d(kernel_size=(2, 2), stride=2)
        self.C3 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=(5, 5))
        self.S4 = nn.MaxPool2d(kernel_size=(2, 2), stride=2)
        self.C5 = nn.Conv2d(in_channels=16, out_channels=120, kernel_size=(5, 5))
        self.F6 = nn.Linear(120, 84)
        self.F7 = nn.Linear(84, 10)

    def forward(self, x):
        # Forward propagation
        x = F.relu(self.C1(x))
        x = self.S2(x)
        x = F.relu(self.C3(x))
        x = self.S4(x)
        x = F.relu(self.C5(x))
        x = torch.flatten(x, 1)
        x = F.relu(self.F6(x))
        x = F.relu(self.F7(x))
        return x
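To see why `F6` takes 120 input features, it can help to trace the tensor shape through the convolutional and pooling layers for a 28 x 28 input. The sketch below rebuilds those layers on their own, with the same settings as in the class above, and passes a dummy all-zero image through them. The `layers` list and `shapes` dictionary are helper names introduced only for this illustration.

```python
import torch
import torch.nn as nn

# Same layer settings as in the LeNet class above, listed in forward order
layers = [
    ('C1', nn.Conv2d(1, 6, kernel_size=5, padding=2)),
    ('S2', nn.MaxPool2d(kernel_size=2, stride=2)),
    ('C3', nn.Conv2d(6, 16, kernel_size=5)),
    ('S4', nn.MaxPool2d(kernel_size=2, stride=2)),
    ('C5', nn.Conv2d(16, 120, kernel_size=5)),
]

# Dummy input: one single-channel 28 x 28 image
x = torch.zeros(1, 1, 28, 28)
shapes = {}
with torch.no_grad():
    for name, layer in layers:
        x = layer(x)
        shapes[name] = tuple(x.shape)
        print(name, shapes[name])

# Flatten everything except the batch dimension, as in forward()
flat = torch.flatten(x, 1)
print('flatten', tuple(flat.shape))
```

The spatial size goes 28 → 28 → 14 → 10 → 5 → 1, so after `C5` each image is represented by 120 values, which matches `nn.Linear(120, 84)`. The ReLU activations are omitted because they do not change the shape.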

#### 2.2 Train the neural network.

During each iteration of training, load a random batch of images and labels, feed the images to the network, compute the loss between the network output and the labels, and update the network parameters by stochastic gradient descent.

Please fill in the missing code to make this work.
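If you are unsure how to start, the sketch below shows one possible way of drawing a random batch and evaluating a cross entropy loss. It runs on synthetic data, and `toy_model` is a hypothetical linear classifier rather than LeNet, so treat it as an illustration of the ingredients and not as the reference solution.

```python
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)

# Synthetic stand-ins for X_train and y_train (100 random "images" and labels)
X = rng.integers(0, 256, size=(100, 1, 28, 28), dtype=np.uint8)
y = rng.integers(0, 10, size=100, dtype=np.uint8)

# Draw a random batch of indices without replacement
batch_size = 32
idx = rng.choice(len(X), size=batch_size, replace=False)
image, label = X[idx], y[idx]

# Convert to tensors: float images, integer (long) class labels
image = torch.from_numpy(image).float()
label = torch.from_numpy(label).long()

# Hypothetical toy classifier producing 10 raw scores (logits) per image
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

# CrossEntropyLoss takes raw scores and integer class labels
criterion = nn.CrossEntropyLoss()
loss = criterion(toy_model(image), label)
print(image.shape, label.shape, loss.item())
```

Sampling a fresh random batch in every iteration is a simple choice. An alternative is to shuffle the training set once per epoch and step through it in order, so that every sample is seen equally often.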

In [None]:
# Since most of you use laptops, we use the CPU for training.
device = 'cpu'
# If you would like to try a GPU, you can set the device to CUDA.
# device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Network
model = LeNet().to(device)

# Define the cross entropy loss function
# Fill in code
criterion = ...

# Optimiser and learning rate
lr = 1e-3
optimizer = optim.Adam(model.parameters(), lr)

# Number of iterations for training
num_iter = 1000
loss_curve = []

# Train model
start = time.time()
for it in range(num_iter):
    start_iter = time.time()
    # Set the modules to training mode, which affects certain modules, e.g. dropout or batchnorm.
    model.train()

    # Get a random batch of images and labels from X_train and y_train
    train_batch_size = 32
    # image: batch_size x 1 x X x Y array
    # label: batch_size vector
    # Fill in code
    image, label = ...
    
    # Convert the batch of images and labels into PyTorch tensors on the device    
    image, label = torch.from_numpy(image).float(), torch.from_numpy(label).long()
    image, label = image.to(device), label.to(device)
    output = model(image)

    # The loss for the current batch
    # Fill in code
    loss = ...
        
    # Perform stochastic gradient descent
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Print information
    loss_curve += [loss.item()]
    print('--- Iteration {0}: training loss = {1:.4f}, {2:.4f} s ---'.format(it + 1, loss.item(), time.time() - start_iter))
print('Training took {:.3f}s in total.'.format(time.time() - start))

#### 2.3 Plot the training loss curve, which is stored in the variable `loss_curve`.
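A minimal plotting sketch is given below. It uses a synthetic, exponentially decaying curve as a stand-in for the real `loss_curve`, so the numbers are made up. Only the plotting calls carry over.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for loss_curve: a noisy, decaying sequence of 200 values
rng = np.random.default_rng(0)
loss_curve = list(2.3 * np.exp(-np.arange(200) / 50) + 0.1 * rng.random(200))

# Plot the loss against the iteration number (starting from 1)
fig, ax = plt.subplots()
ax.plot(np.arange(1, len(loss_curve) + 1), loss_curve)
ax.set_xlabel('Iteration')
ax.set_ylabel('Training loss')
ax.set_title('Training loss curve')
# In a notebook the figure is rendered inline; in a script, call plt.show()
```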

#### 2.4 Apply the model onto the full test set (provided).

In [None]:
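# Deploy model
start = time.time()
model.eval()
image = X_test
image = torch.from_numpy(image).float()
image = image.to(device)
# Gradients are not needed at test time, so disable gradient tracking to save memory
with torch.no_grad():
    output = model(image)
# Move the predictions back to the CPU before converting to numpy (required if device is CUDA)
y_pred = output.argmax(dim=1, keepdim=True).cpu().numpy().flatten()
print('Testing took {:.3f}s in total.'.format(time.time() - start))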

#### 2.5 Evaluate the classification accuracy on the test set.
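Classification accuracy is the fraction of test samples whose predicted label equals the true label. The sketch below computes it for two short, made-up label vectors. `y_true` and `y_hat` are hypothetical names standing in for `y_test` and `y_pred`.

```python
import numpy as np

# Made-up ground truth and predictions for 5 samples
y_true = np.array([7, 2, 1, 0, 4], dtype=np.uint8)
y_hat = np.array([7, 2, 1, 0, 9])

# Element-wise comparison gives a boolean array; its mean is the accuracy
accuracy = np.mean(y_hat == y_true)
print('Accuracy = {:.2f}'.format(accuracy))  # Accuracy = 0.80
```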

#### 2.6 Is the accuracy satisfactory? What would you do to improve the accuracy?

#### Answer: