# Lab 4 - PyTorch and Convolutional Deep Networks

Objectives:
1. *Write a convolutional neural network for the MNIST database*

In this lab, we will use PyTorch to write Convolutional Neural Networks. We are still going to use the MNIST database, along with the code to load it from the previous lab.

<center><img src="https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png" height="250px"></center>

In [2]:
import torch
from torchvision import transforms, datasets
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

In [3]:
train = datasets.MNIST(
    "", train=True, download=True, transform=transforms.Compose([transforms.ToTensor()])
)
test = datasets.MNIST(
    "", train=False, download=True, transform=transforms.Compose([transforms.ToTensor()])
)

# Get data and store in variables "trainset" and "testset"
trainset = torch.utils.data.DataLoader(train, batch_size=10, shuffle=True)
testset = torch.utils.data.DataLoader(test, batch_size=10, shuffle=True)

Unlike in the previous lab, though, we are going to use convolutional layers, rather than only linear ones, to classify the inputs.

The network will look like this:

<center><img src="https://i.imgur.com/ubyPlIg.png" height="275px"></center>

where the activation function for each layer is the rectified linear function (ReLU) $r(\cdot)$, the pooling layer $M_p$ takes the maximum value of each patch of the image, and the output layer is made of softmax functions.
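The three operations named above can be tried on small tensors. This is a minimal sketch, assuming made-up values for the feature map `x` and the scores `logits`; neither comes from the lab data.

```python
import torch
import torch.nn.functional as F

# A hypothetical 1x1x4x4 feature map (batch, channels, height, width).
x = torch.tensor([[[[-1.0, 2.0, 0.5, -3.0],
                    [4.0, -2.0, 1.0, 0.0],
                    [0.0, 1.0, -1.0, 2.0],
                    [3.0, -4.0, 5.0, -0.5]]]])

activated = F.relu(x)                     # negative entries are replaced by 0
pooled = F.max_pool2d(activated, (2, 2))  # maximum of each 2x2 patch, giving a 2x2 map

logits = torch.tensor([[1.0, 2.0, 3.0]])  # hypothetical scores for 3 classes
probs = F.softmax(logits, dim=1)          # non-negative values that sum to 1 per row

print(pooled)
print(probs, probs.sum(dim=1))
```

The pooled map has half the height and width of its input, and the softmax output can be read as one probability per class.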

In [4]:
# Create a class from nn.Module. This PyTorch module
# contains the building blocks to create neural networks.


class Net(nn.Module):
    def __init__(self):
        """Defines the neural network architecture."""

        super().__init__()  # calls __init__() of nn.Module superclass

        # nn.Conv2d is the convolutional layer we're interested in
        #
        # First two arguments are the numbers of input and output channels
        # Third argument is the convolution kernel size (5x5 kernel here)
        # Last argument sets the zero-padding added around the image

        self.conv1 = nn.Conv2d(1, 32, 5, padding=2)
        self.conv2 = nn.Conv2d(32, 64, 5, padding=2)

        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)

    def convs(self, x):
        """Applies each convolution and ReLU, then takes the max over 2x2 patches."""
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))

        return x

    def forward(self, x):
        """Computes the output of the network for an input x."""
        x = self.convs(x)
        x = x.view(-1, 64 * 7 * 7)

        x = F.relu(self.fc1(x))
        x = self.fc2(x)

        return F.softmax(x, dim=1)

*Why is the padding set to 2?*

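Padding, i.e. surrounding the image with a border of zeros, prevents the convolution from cropping the image and lets the kernel be centred on pixels at the edges. For a $k \times k$ kernel with stride 1, a padding of $p = (k-1)/2$ keeps the output the same size as the input, since the output width is $n + 2p - k + 1$. With $n = 28$ and $k = 5$, this gives $p = 2$.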
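The effect of the padding on the output size can be checked with the standard output-size formula for a convolution. The helper `conv_output_size` below is a hypothetical name introduced only for this illustration.

```python
def conv_output_size(n, kernel, padding=0, stride=1):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


# A 28x28 MNIST image convolved with a 5x5 kernel, for different paddings.
for p in (0, 1, 2):
    print(f"padding={p}: output width {conv_output_size(28, kernel=5, padding=p)}")
```

Only `padding=2` leaves the width at 28; smaller paddings shrink the image at every convolution.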


*As you can see, the linear layers are set up in such a way as to take the “flattened” output of our convolutional layers, since their input is a vector. Can you explain why the input size is set to 64\*7\*7?*

Firstly, the last convolutional layer has 64 output channels, so its output is a stack of 64 matrices. Then, we must recall that each image/input is a 28\*28 matrix, and that the padded convolutions preserve this size. Along the way, the data passes through two pooling layers with a `kernel_size` and `stride` of 2, so the height and width are halved twice, from 28 to 14 to 7. Therefore the input to the linear layers has 64\*7\*7 = 3136 values.
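The shapes can be traced by passing a dummy batch through the same layers. This is a minimal sketch, assuming an all-zero input of batch size 10 in place of real MNIST images and freshly initialised (untrained) layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 32, 5, padding=2)
conv2 = nn.Conv2d(32, 64, 5, padding=2)

x = torch.zeros(10, 1, 28, 28)  # dummy batch: (batch, channels, height, width)
shapes = [tuple(x.shape)]

x = F.max_pool2d(F.relu(conv1(x)), (2, 2))  # 28x28 -> 28x28 (conv) -> 14x14 (pool)
shapes.append(tuple(x.shape))

x = F.max_pool2d(F.relu(conv2(x)), (2, 2))  # 14x14 -> 14x14 (conv) -> 7x7 (pool)
shapes.append(tuple(x.shape))

flat = x.view(-1, 64 * 7 * 7)  # flatten each sample into one vector
shapes.append(tuple(flat.shape))

for shape in shapes:
    print(shape)
```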

In order to finish this network we still need to set up the optimiser and error function, compute the gradient of the loss with respect to the network parameters, and then train for a number of iterations.

In [5]:
net = Net()  # create instance of neural network

optimizer = optim.Adam(net.parameters(), lr=0.001)
# Using Adam, a widely used default optimiser for neural networks

epochs = 3

for epoch in range(epochs):
    for data in trainset:
        X, y = data
        net.zero_grad()  # resets the gradient stored for each network parameter to zero
        output = net.forward(X)
        # working with a classifier, so using the negative log-likelihood loss
        # (note that F.nll_loss expects log-probabilities as its input)
        loss = F.nll_loss(output, y)
        loss.backward()  # computes gradient of the loss wrt each network parameter
        optimizer.step()  # updates network parameters (according to optimisation algo)

    print(f"Epoch {epoch}: {loss}")

correct = 0
total = 0

with torch.no_grad():
    # no_grad disables gradient tracking: nothing is being minimised during
    # the test session, so gradients do not need to be computed or stored

    for data in testset:
        X, y = data
        output = net.forward(X)
        for idx, i in enumerate(output):
            if torch.argmax(i) == y[idx]:
                correct += 1
            total += 1

print("Accuracy: ", round(correct / total, 3))

Epoch 0: -0.997342586517334
Epoch 1: -1.0
Epoch 2: -1.0
Accuracy:  0.875
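The negative loss values printed above are consistent with how `F.nll_loss` works: it expects log-probabilities and returns the negated entry at the target class. Given softmax probabilities instead, it returns minus the predicted probability of the true class, which lies between -1 and 0. The sketch below illustrates this with made-up `logits` and `target` values. Switching the network's output to `F.log_softmax` is shown only as a possible adjustment, not as part of the original lab.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])  # hypothetical scores for one sample
target = torch.tensor([0])                 # hypothetical true class

probs = F.softmax(logits, dim=1)
log_probs = F.log_softmax(logits, dim=1)

# nll_loss on probabilities: equals -probs[0, 0], so it is bounded by -1 and 0
loss_from_probs = F.nll_loss(probs, target)

# nll_loss on log-probabilities: equals -log(probs[0, 0]), the cross-entropy
loss_from_log_probs = F.nll_loss(log_probs, target)

# F.cross_entropy combines log_softmax and nll_loss in a single call
cross_entropy = F.cross_entropy(logits, target)

print(loss_from_probs.item(), loss_from_log_probs.item(), cross_entropy.item())
```

The accuracy computation is unaffected by this choice, since the arg max of the probabilities and of the log-probabilities is the same.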


*You’ll notice that this code takes significantly more time to run than the previous one. Do you know why that’s the case?*

Convolutional layers require many more primitive operations (multiplications, summations, copies/accesses) than linear layers. In a linear layer, each weight is multiplied with an input exactly once, whereas in a convolutional layer each kernel is applied at every spatial position, so its weights are reused as many times as there are output positions.
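A rough count of multiply-accumulate operations per image makes the difference concrete for the layers of this network. This is a back-of-the-envelope sketch that ignores biases, activations and pooling; `conv_macs` and `linear_macs` are hypothetical helper names.

```python
def conv_macs(height, width, c_in, c_out, kernel):
    """Multiply-accumulates for a stride-1 convolution with a height x width output."""
    return height * width * c_out * c_in * kernel * kernel


def linear_macs(n_in, n_out):
    """Multiply-accumulates for a fully connected layer."""
    return n_in * n_out


macs = {
    "conv1": conv_macs(28, 28, 1, 32, 5),    # output is 28x28 thanks to the padding
    "conv2": conv_macs(14, 14, 32, 64, 5),   # input was halved by the first pooling
    "fc1": linear_macs(64 * 7 * 7, 128),
    "fc2": linear_macs(128, 10),
}

for name, count in macs.items():
    print(f"{name}: {count:,}")
```

Under these assumptions, the second convolution alone accounts for roughly ten million multiply-accumulates per image, far more than both linear layers combined.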

In [6]:
# Using the average function instead of the max function in the pooling layer

# Create a class from nn.Module. This PyTorch module
# contains the building blocks to create neural networks.


class Net(nn.Module):
    def __init__(self):
        """Defines the neural network architecture."""

        super().__init__()  # calls __init__() of nn.Module superclass

        # nn.Conv2d is the convolutional layer we're interested in
        #
        # First two arguments are the numbers of input and output channels
        # Third argument is the convolution kernel size (5x5 kernel here)
        # Last argument sets the zero-padding added around the image

        self.conv1 = nn.Conv2d(1, 32, 5, padding=2)
        self.conv2 = nn.Conv2d(32, 64, 5, padding=2)

        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)

    def convs(self, x):
        """Applies each convolution and ReLU, then takes the average over 2x2 patches."""
        x = F.avg_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.avg_pool2d(F.relu(self.conv2(x)), (2, 2))

        return x

    def forward(self, x):
        """Computes the output of the network for an input x."""
        x = self.convs(x)
        x = x.view(-1, 64 * 7 * 7)

        x = F.relu(self.fc1(x))
        x = self.fc2(x)

        return F.softmax(x, dim=1)


net = Net()  # create instance of neural network

optimizer = optim.Adam(net.parameters(), lr=0.001)
# Using Adam, a widely used default optimiser for neural networks

epochs = 3

for epoch in range(epochs):
    for data in trainset:
        X, y = data
        net.zero_grad()  # resets the gradient stored for each network parameter to zero
        output = net.forward(X)
        # working with a classifier, so using the negative log-likelihood loss
        # (note that F.nll_loss expects log-probabilities as its input)
        loss = F.nll_loss(output, y)
        loss.backward()  # computes gradient of the loss wrt each network parameter
        optimizer.step()  # updates network parameters (according to optimisation algo)

    print(f"Epoch {epoch}: {loss}")

correct = 0
total = 0

with torch.no_grad():
    # no_grad disables gradient tracking: nothing is being minimised during
    # the test session, so gradients do not need to be computed or stored

    for data in testset:
        X, y = data
        output = net.forward(X)
        for idx, i in enumerate(output):
            if torch.argmax(i) == y[idx]:
                correct += 1
            total += 1

print("accuracy: ", round(correct / total, 3))

Epoch 0: -0.9991691708564758
Epoch 1: -0.800000011920929
Epoch 2: -0.9100052118301392
accuracy:  0.894


*Does the performance of the network improve?*

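The network's performance improves slightly, with accuracy increasing from 0.875 to 0.894. This could be because max pooling keeps only the largest value in each 2x2 patch, whereas average pooling retains some information from all four values. Note that the pooling acts on the ReLU feature maps rather than on raw pixels, and that the digits as loaded here are bright on a dark background. With only a single run of each network, a gap this small could also come from random initialisation and batch shuffling.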
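The difference between the two pooling functions can be seen on a single patch. This is a minimal sketch with made-up activation values, not taken from the trained network.

```python
import torch
import torch.nn.functional as F

# A hypothetical 2x2 patch of activations, shaped (batch, channels, height, width).
patch = torch.tensor([[[[0.0, 0.2],
                        [0.1, 0.9]]]])

max_out = F.max_pool2d(patch, (2, 2))  # keeps only the largest of the four values
avg_out = F.avg_pool2d(patch, (2, 2))  # mean of the four values

print(max_out.item(), avg_out.item())
```

Max pooling reports the strongest response in the patch, while average pooling is influenced by every value in it.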

*Is it reasonable to expect to reach a 100% classification accuracy?*

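Probably not. A model that classifies every example perfectly has likely overfit, meaning it has fitted the noise in the data rather than the actual useful information, and some handwritten digits are arguably ambiguous in any case. Note also that 100% accuracy does not strictly require a loss of 0, since only the arg max of each output needs to be correct, whereas a loss of 0 would require the network to assign probability 1 to the correct class of every example.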