# Lab 3 - Pytorch and Deep Networks

Objectives:
1. *Understand how to use Pytorch to write a Neural Network*
2. *Write a neural network with multiple hidden layers*

In the previous labs we saw how to write a Neural Network with one hidden layer to solve various problems. The main issue with writing a network from scratch is that it becomes quite tedious to work with multiple layers, especially when, in the training phase, we want to tweak our network. As an example, if we wanted to change the activation functions, the error functions or the optimiser (for instance using something different than classic gradient descent) that would imply rewriting the vast majority of the code.

Luckily there are several libraries in Python that help us automate the process of writing a network in a modular way. The one we are going to explore in this module is Pytorch.

In [1]:
import torch
from torchvision import transforms, datasets
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

Pytorch uses a specific data structure to handle data, called a "tensor". In the context of deep learning, a tensor is the generalisation of an array or a matrix in an arbitrary number of dimensions (for instance, a vector and a matrix would be a 1-dimensional and a 2-dimensional tensors respectively). The dimensionality of a tensor coincides with the number of indexes used to refer to scalar values within the tensor.

<center><img src="https://i.imgur.com/owKJBd8.png" height="100px"></center>

A great feature of Pytorch is `Autograd`. PyTorch tensors keep track of the operations that involved them and they automatically provide the series of derivatives with respect to their inputs. This means you won’t need to write expressions for the backpropagation algorithm and compute derivatives by hand, as was done in previous labs. Pytorch, no matter how complicated your model is, will compute them automatically.

The dataset we are going to use is called MNIST and it contains a series of images of handwritten digits. The goal is to teach our network to properly classify the digits. 

Here is an example of the images that we are going to use:

<center><img src="https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png" height="250px"></center>

The database is organized as follows:
- Input is a 28 by 28 matrix (2-dimensional tensor), which corresponds to the pixels of the image
- The output is an integer between 0 and 9.

In [2]:
# Download the database
train = datasets.MNIST("", train=True, download=True, transform=transforms.Compose([transforms.ToTensor()]))
test = datasets.MNIST("", train=False, download=True, transform=transforms.Compose([transforms.ToTensor()]))

# Get data and store in variables "trainset" and "testset"
trainset = torch.utils.data.DataLoader(train, batch_size=10, shuffle=True)
testset = torch.utils.data.DataLoader(test, batch_size=10, shuffle=True)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to MNIST\raw\train-images-idx3-ubyte.gz


9913344it [00:01, 6822499.25it/s]                             


Extracting MNIST\raw\train-images-idx3-ubyte.gz to MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to MNIST\raw\train-labels-idx1-ubyte.gz


29696it [00:00, 29790493.08it/s]         

Extracting MNIST\raw\train-labels-idx1-ubyte.gz to MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to MNIST\raw\t10k-images-idx3-ubyte.gz



1649664it [00:00, 6406161.29it/s]                             


Extracting MNIST\raw\t10k-images-idx3-ubyte.gz to MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to MNIST\raw\t10k-labels-idx1-ubyte.gz


5120it [00:00, 5128931.57it/s]          

Extracting MNIST\raw\t10k-labels-idx1-ubyte.gz to MNIST\raw






Suppose that we want to use the following architecture in order to properly classify our dataset:

<center><img src="https://i.imgur.com/dotjyD1.png" height="250px"></center>

According to the picture above, we want a network that takes as an input a 784 (28*28 pixels) dimensional vector, have 3 hidden layers of 64 neurons each and an output layer of 10 units (the integers from 0 to 9). The activation functions should be sigmoid for the hidden layers and the softmax for the output layer. This is a really big network compared to the ones we used before and it would be quite complex to write it from scratch. Luckily, the code for a network such as this one is extremely simple to write in Pytorch.

In [5]:
# Create a class from nn.Module. This PyTorch module
# contains the building blocks to create neural networks.

class Net(nn.Module):
    def __init__(self):
        """Defines the neural network architecure."""
        
        super().__init__() # calls __init() of nn.Module superclass

        # There are various functions that can be used to create different layers
        #
        # nn.Linear is the basic structure of layer where an affine transformation
        # (preserves lines and parallelism) is applied to the input, i.e. w^T*x+b
        #
        # nn.Linear receives the input and output dimensions as arguments, and
        # intialises a certain number of weights (tensors) accordingly
        
        self.fc1 = nn.Linear(28*28, 64) # create 64-neuron layer with 784 inputs each
        self.fc2 = nn.Linear(64, 64)    # 64 inputs and 64 outputs
        self.fc3 = nn.Linear(64, 64)    # 64 inputs and 64 outputs
        self.fc4 = nn.Linear(64, 10)    # 64 inputs and 10 outputs
    
    def forward(self, x):
        """Computes the output of the network for an input x."""

        # torch.sigmoid computes sigmoid function of given input
        x = torch.sigmoid(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        x = torch.sigmoid(self.fc3(x))
        x = self.fc4(x)

        # F.sofmax computes softmax function of given input
        return F.softmax(x, dim=1)

net = Net() # create instance of neural network

In order to finish this network we still need to set up the optimiser and error function, and compute the gradient with respect to the loss, then train for a number of iterations.

In [6]:
# Set up optimisation method to use in the backpropagation,
# which in this case is stochastic gradient descent
optimizer = optim.SGD(net.parameters(), lr=0.001)

epochs = 3

for epoch in range(epochs):
    for data in trainset:
        X, y = data
        net.zero_grad()  # sets up gradient stored in each variable of the network to zero
        output = net.forward(X.view(-1,28*28))
        loss = F.nll_loss(output,y) # working with classifier, so using cross-entropy for loss func
        loss.backward()  # computes gradient wrt loss function over each network param
        optimizer.step() # updates network paramaters (according to optimisation algo) and grad stored in each variable

# If zero_grad() isn't used, then at each iteration autograd will
# keep adding the new value of the gradient to the previous one
#
# view() transforms 2D tensor input (28*28 matrix) to 1D (784 vector)

In [8]:
# Test if training was successful

correct = 0
total = 0

with torch.no_grad():
    for data in testset:
        X, y = data
        output = net.forward(X.view(-1,28*28))
        for idx, i in enumerate(output):
            if torch.argmax(i) == y[idx]:
                correct += 1
            total += 1

print(f'Accuracy: {round(correct/total, 3)}')

Accuracy: 0.114
