# Deep Learning Tutorial

Created by Bitville 2019

After this tutorial, you will understand the philosophy behind the deep learning paradigm in artificial intelligence and be able to apply deep learning to solve real-world problems. Thanks to the availability of good deep learning libraries, you should be able to complete this tutorial if you know the basics of computer programming.

To implement a deep learning pipeline, we need to solve the following subproblems:

1. Collecting <b>data</b> and choosing how to represent it.
2. Choosing how to <b>model</b> the data.
3. Choosing how to measure how good the model is with a <b>cost function</b>.
4. Choosing how to <b>optimize</b> the cost function, given the model and the data.
5. Choosing how to <b>validate</b> how good the final learned model is.

We will go through all of these steps and show how they can be implemented in Python with the aid of a few software libraries. Afterwards, you will get a guided exercise in which you improve the pipeline to obtain a better-performing model.

You can run the individual cells below by clicking a cell and then pressing "shift" + "enter".

### Importing the software tools:

We will use <b>pytorch</b> as our deep learning library. In addition, we will use <b>numpy</b> as our math library and <b>matplotlib</b> as our plotting library.

In [None]:
import torch # The pytorch deep learning library
import torchvision # Computer vision module
import torch.nn as nn # Neural network tools
import torch.nn.functional as F # More math functions
import torch.optim as optim # Optimizer
from torchvision import datasets, transforms # Helper modules for loading datasets
import numpy as np # Math library
import matplotlib.pyplot as plt # Plotting library

## Constructing the data set

We are going to use the MNIST data set, one of the most studied data sets in deep learning. MNIST consists of 28x28 pixel grayscale images of handwritten digits (60,000 for training and 10,000 for testing), and the problem is to create a model that is able to automatically <b>classify</b> new digits into 10 discrete classes, one for each digit from 0 to 9.

In [None]:
# The loading of data only needs to be understood at a conceptual level

# trainset is an object that represents the training set
trainset = torchvision.datasets.MNIST(root='./data', train=True, 
                                      download=True, transform=transforms.ToTensor())

testset = torchvision.datasets.MNIST(root='./data', train=False,
                                       download=True, transform=transforms.ToTensor())

# train_iterator is an object that automatically loads training data for us
train_iterator = iter(torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True))


# test_iterator is an object that automatically loads test data for us
test_iterator = iter(torch.utils.data.DataLoader(testset, batch_size=1000, shuffle=True))
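To see what a DataLoader does without downloading MNIST, here is a small sketch on a synthetic `TensorDataset`. The 100 random "images" and labels are made up for illustration (an assumption, not real data); the loader uses the same `batch_size=32, shuffle=True` settings as the training loader above. Iterating with a `for` loop is the usual alternative to calling `next()` on `iter(...)` by hand.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for MNIST: 100 random 1x28x28 "images" with random labels 0..9
fake_images = torch.rand(100, 1, 28, 28)
fake_labels = torch.randint(0, 10, (100,))
fake_set = TensorDataset(fake_images, fake_labels)

# Same settings as the MNIST training loader above
loader = DataLoader(fake_set, batch_size=32, shuffle=True)

batch_sizes = []
for images, labels in loader:        # each iteration yields one (images, labels) batch
    batch_sizes.append(images.size(0))

print(len(loader))     # 4 batches per pass over the data (one epoch)
print(batch_sizes)     # [32, 32, 32, 4]: the last batch is smaller unless drop_last=True
```

With `shuffle=True` the order of samples changes on every pass, but the batch sizes stay the same.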

In [None]:
# We can now get a batch of data by passing the iterator to Python's built-in next() function
images, labels = next(train_iterator)

print("images tensor shape: {}".format(images.size()))
print("labels tensor shape: {}\n".format(labels.size()))
print("A data point with the label {} looks like this:".format(labels[0].item()))
plt.imshow(images[0, 0, :, :], cmap='gray')  # first sample, first (and only) channel
plt.show()

The <i>images</i> and <i>labels</i> variables are pytorch tensors. Tensor objects can be thought of as data structures that represent multidimensional arrays, and the rank of a tensor is its number of dimensions (axes). Following this logic, a scalar is a rank 0 tensor, a vector is a rank 1 tensor, a matrix is rank 2 and so on. The <i>images</i> tensor has rank 4 and shape (32, 1, 28, 28): axis 0 indexes the data samples, axis 1 the color channels (only one, since MNIST images are grayscale), and axes 2 and 3 the pixel rows (vertical position) and columns (horizontal position). A collection of data samples processed together is called a batch.
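To make the idea of rank concrete, the sketch below builds tensors of rank 0 to 4 filled with made-up values (not MNIST data) and prints each one's rank via `dim()` together with its shape. The rank 4 tensor mimics the (samples, channels, rows, columns) layout of the `images` batch.

```python
import torch

scalar = torch.tensor(3.0)               # rank 0
vector = torch.tensor([1.0, 2.0, 3.0])   # rank 1
matrix = torch.zeros(2, 3)               # rank 2
batch = torch.zeros(32, 1, 28, 28)       # rank 4: (samples, channels, rows, columns)

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("batch", batch)]:
    print(name, "rank:", t.dim(), "shape:", tuple(t.shape))

# Indexing axis 0 picks one sample; indexing axis 1 then picks its single grayscale channel
first_image = batch[0, 0]
print(first_image.shape)   # torch.Size([28, 28])
```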

### Preprocessing the data

The best way to represent our data is usually determined by our model. In deep learning, we usually want to preprocess the data as little as possible and offload this job to the model itself. Representing the data in an appropriate way can, however, make the model's job much easier. A very common preprocessing step is to normalize the input values, but here the values are already in a suitable range: <i>transforms.ToTensor()</i> scales the pixel values to the interval [0, 1]. The only remaining step is to <b>flatten</b> each image into a vector, which turns a batch into a rank 2 tensor of shape (batch size, 784). We will do this step when we feed the data into the neural network.
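The flattening step can be sketched on a random batch shaped like the MNIST batch above (`torch.rand` stands in for real images, an assumption; it also produces values in [0, 1] like `ToTensor()`). `view(-1, 28*28)` is what the model's `forward()` uses, where `-1` tells PyTorch to infer the batch dimension; `torch.flatten(..., start_dim=1)` is an equivalent alternative that keeps axis 0 intact.

```python
import torch

# Hypothetical batch: 32 images, 1 channel, 28x28 pixels, values in [0, 1)
images = torch.rand(32, 1, 28, 28)

flat = images.view(-1, 28 * 28)                # -1 lets pytorch infer the batch size
flat_alt = torch.flatten(images, start_dim=1)  # flattens everything except axis 0

print(flat.shape)                    # torch.Size([32, 784])
print(torch.equal(flat, flat_alt))   # True: both give the same row-major ordering
print(images.min().item() >= 0.0 and images.max().item() <= 1.0)   # True
```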

## Creating the model

The purpose of a model is to encode our assumptions about the structural form of our data. To make learning tractable, we need to constrain the set of interpretations of the data that our model can make. Put differently, an under-constrained model has high variance: small changes in the training data can lead to very different interpretations. Assumptions built into the model introduce bias (for example, invariance to certain changes of the input), and can be very effective at lowering variance. In deep learning, assumptions about the structural form of data are referred to as inductive biases: they bias what the model can learn from the training data, trading some bias for lower variance.
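The bias/variance trade-off can be illustrated with a toy example that is not part of the MNIST pipeline: fitting polynomials to many noisy samples of `sin(2*pi*x)`. Here the polynomial degree plays the role of the model's assumptions (degree 1 is a strong constraint, degree 9 through 10 points is almost none). The function name, noise level, evaluation point `x0` and number of resamples are all arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)   # 10 fixed input points
x0 = 0.05                        # point at which predictions are compared

def predictions_at_x0(degree, trials=1000, noise=0.3):
    """Fit a polynomial of the given degree to many noisy resamples of
    sin(2*pi*x) and return its prediction at x0 for each resample."""
    preds = []
    for _ in range(trials):
        y = np.sin(2 * np.pi * x) + noise * rng.standard_normal(x.shape)
        coeffs = np.polyfit(x, y, degree)
        preds.append(np.polyval(coeffs, x0))
    return np.array(preds)

results = {}
for degree in (1, 9):
    p = predictions_at_x0(degree)
    bias = p.mean() - np.sin(2 * np.pi * x0)   # systematic error of the average fit
    results[degree] = (bias, p.var())          # variance: spread across resamples
    print(f"degree {degree}: bias {bias:+.3f}, variance {p.var():.3f}")
```

The straight line cannot follow the sine curve, so its average prediction is off (bias), but it barely changes from one noisy data set to the next. The degree 9 polynomial is nearly unbiased on average, but its predictions swing widely between data sets (variance).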

We are going to use a simple kind of neural network as our model, namely a multilayer perceptron (MLP). An MLP makes quite weak assumptions about the data, and as such does not scale very well to more difficult problems (for images, for example, it has no built-in notion of which pixels are neighbors). One can think of an MLP as a sequence of simple mathematical functions that map an input to a score for each class; the predicted label is the class with the highest score. The MLP is parametrised by variables (called weights) that determine how an input will be mapped to an output.
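Written out for the network defined in the next cell (784 inputs, 15 hidden neurons, 10 outputs), the sequence of functions looks roughly as follows. The symbols are our own notation, not names used in the code: $W_1, \mathbf{b}_1$ correspond to the weights and bias of `fc1`, $W_2, \mathbf{b}_2$ to those of `fc2`, and ReLU is applied elementwise as $\max(0, a)$.

```latex
\mathbf{h} = \operatorname{ReLU}\left(W_1 \mathbf{x} + \mathbf{b}_1\right), \qquad
\mathbf{z} = W_2 \mathbf{h} + \mathbf{b}_2, \qquad
\hat{y} = \arg\max_{k} z_k

\mathbf{x} \in \mathbb{R}^{784},\;
W_1 \in \mathbb{R}^{15 \times 784},\;
\mathbf{b}_1 \in \mathbb{R}^{15},\;
W_2 \in \mathbb{R}^{10 \times 15},\;
\mathbf{b}_2 \in \mathbb{R}^{10}
```

Counting the entries gives $15 \cdot 784 + 15 + 10 \cdot 15 + 10 = 11935$ weights that training has to adjust.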

In [None]:
# The class that defines the neural network must follow a certain format in pytorch:
# the class Mlp inherits from nn.Module, which keeps track of the layers and their weights

class Mlp(nn.Module):
    def __init__(self):
        super(Mlp, self).__init__()
        print('constructing model for an MLP network...')

        # The following lines define the fully connected network layers
        self.fc1 = nn.Linear(784, 15)   # 784 input pixels -> 15 hidden neurons
        self.fc2 = nn.Linear(15, 10)    # 15 hidden neurons -> 10 output neurons


    # The following part describes the forward pass and thus the structure of the neural net

    def forward(self, x):
        x = x.view(-1, 28*28)           # Flatten each 1x28x28 image into a 784-element vector.
        x = F.relu(self.fc1(x))         # Fully connected hidden layer with ReLU activation.
        x = self.fc2(x)                 # 10 fully connected output neurons, one score per digit.
        return x
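For comparison, here is a sketch of the same 784 → 15 → 10 architecture written with `nn.Sequential`, an alternative to subclassing `nn.Module` that the notebook itself does not use. It runs a hypothetical batch of random images through the network, recomputes the forward pass by hand to show that `nn.Linear` computes `x @ W.T + b`, and counts the weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

seq_net = nn.Sequential(
    nn.Flatten(),         # (N, 1, 28, 28) -> (N, 784), like x.view(-1, 28*28)
    nn.Linear(784, 15),
    nn.ReLU(),
    nn.Linear(15, 10),
)

x = torch.rand(32, 1, 28, 28)   # hypothetical batch of 32 random "images"
out = seq_net(x)
print(out.shape)                 # torch.Size([32, 10]): one score per class

# nn.Linear stores its weight as (out_features, in_features) and computes x @ W.T + b
fc1, fc2 = seq_net[1], seq_net[3]
h = F.relu(x.view(-1, 784) @ fc1.weight.T + fc1.bias)
manual = h @ fc2.weight.T + fc2.bias
print(torch.allclose(out, manual, atol=1e-5))   # True

n_params = sum(p.numel() for p in seq_net.parameters())
print(n_params)   # 11935
```

Subclassing `nn.Module`, as `Mlp` does, is more flexible when the forward pass needs custom logic; `nn.Sequential` is shorter for a plain stack of layers.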

In [None]:
criterion = nn.CrossEntropyLoss()                   # define what cost function to use
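`nn.CrossEntropyLoss` takes the raw output scores (logits) of the network, which is why `forward()` does not apply a softmax itself. The sketch below uses made-up logits for 2 samples and 3 classes to show that the loss equals the negative log-probability of the correct class under a softmax, averaged over the batch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.CrossEntropyLoss()

# Hypothetical raw outputs (logits) for 2 samples and 3 classes, plus the true labels
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])

loss = criterion(logits, labels)

# Same value by hand: log-softmax over the classes, pick the log-probability of the
# correct class for each sample, negate, and average over the batch
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(labels)), labels].mean()

print(loss.item(), manual.item())   # the two numbers agree
```

A confident, correct prediction gives a loss close to 0, while a confident wrong prediction is penalized heavily.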

In [None]:
iterations = 1000  # how many iterations in our training loop

train_iterator = iter(torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True))  # reset the train data iterator
test_iterator = iter(torch.utils.data.DataLoader(testset, batch_size=500, shuffle=True))   # reset the test data iterator
net = Mlp()  # create instance of the Mlp class
optimizer = optim.SGD(net.parameters(), lr=0.01)  # define what optimizer to use

# let us gather the loss and accuracy values starting from empty lists
losses = []
accuracies = []


# the training loop starts here
for i in range(iterations):
    images, labels = next(train_iterator)  # get a batch of training data from the iterator
    optimizer.zero_grad()  # pytorch accumulates gradients, so we zero them in each iteration
    output = net(images)  # output from our simple neural net (calling net runs forward())
    loss = criterion(output, labels)  # calculating the loss
    loss.backward()  # backpropagation: computing the gradients of the loss
    optimizer.step()  # updating the parameter values

    losses.append(loss.item())  # store the loss as a plain Python number for plotting

    if i % 100 == 0:  # every 100th iteration, measure the accuracy on a test batch and print it
        images_test, labels_test = next(test_iterator)
        with torch.no_grad():  # no gradients are needed for evaluation
            output_test = net(images_test)
        y_pred = torch.argmax(output_test, dim=1)  # predicted class = highest score
        accuracy = (y_pred == labels_test).float().mean().item()
        accuracies.append(accuracy)
        print('iteration {}, loss: {:.4f}, accuracy: {:.3f}'.format(i, loss.item(), accuracy))


# let us print the accuracy on one more test batch after the training loop is finished
images_test, labels_test = next(test_iterator)
with torch.no_grad():
    output_test = net(images_test)
y_pred = torch.argmax(output_test, dim=1)
accuracy = (y_pred == labels_test).float().mean().item()
print('Test accuracy now is {}'.format(accuracy))
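The three calls `zero_grad()`, `backward()` and `step()` in the loop above can be checked on a tiny hypothetical model. This sketch uses a 3-input linear layer and a mean squared error loss (both assumptions chosen only to keep it small) and confirms that plain `optim.SGD` without momentum updates every parameter as `p <- p - lr * p.grad`.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

layer = nn.Linear(3, 1)                        # tiny hypothetical model
optimizer = optim.SGD(layer.parameters(), lr=0.01)

x = torch.randn(4, 3)                          # made-up inputs and targets
target = torch.randn(4, 1)

before = [p.detach().clone() for p in layer.parameters()]

optimizer.zero_grad()                          # clear gradients left from earlier steps
loss = ((layer(x) - target) ** 2).mean()       # mean squared error, for illustration only
loss.backward()                                # fills p.grad for every parameter
grads = [p.grad.detach().clone() for p in layer.parameters()]
optimizer.step()                               # plain SGD: p <- p - lr * p.grad

checks = [torch.allclose(p.detach(), p_old - 0.01 * g)
          for p, p_old, g in zip(layer.parameters(), before, grads)]
print(checks)   # [True, True]: one entry for the weight, one for the bias
```

Without `zero_grad()`, `backward()` would add the new gradients to the old ones, and each step would use a mix of gradients from several batches.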



In [None]:
# For proper evaluation, let us feed in the entire test set (10000 samples) and measure the accuracy
test_iterator = iter(torch.utils.data.DataLoader(testset, batch_size=10000, shuffle=False))  # one batch holding the whole test set

images_test, labels_test = next(test_iterator)
with torch.no_grad():  # no gradients are needed for evaluation
    output_test = net(images_test)
y_pred = torch.argmax(output_test, dim=1)
accuracy = (y_pred == labels_test).float().mean().item()
print('FINAL TEST ACCURACY IS: {}'.format(accuracy))

In [None]:
# plot the loss values during the iterations
plt.plot(losses)
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.show()

In [None]:
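# plotting the test set accuracy during the iterations (measured every 100th iteration)
plt.plot(range(0, iterations, 100), accuracies)
plt.xlabel('iteration')
plt.ylabel('test batch accuracy')
plt.show()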

## Your task

You probably reached an accuracy somewhat above 80% with these hyperparameters. The important hyperparameters in our case are the dimensions of the neural network, i.e. how many layers there are and how many neurons each layer has, and the learning rate.

Try changing the network dimensions by adding another layer and experimenting with different numbers of neurons in each layer.
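One possible starting point is a variant of `Mlp` that takes the layer sizes as constructor arguments, so they are easy to vary. The class name `DeeperMlp` and the sizes 64 and 32 are hypothetical illustrations, not recommended values. An instance could replace `Mlp()` in the training cell.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeeperMlp(nn.Module):
    """Hypothetical variant of Mlp with two hidden layers of configurable size."""

    def __init__(self, hidden1=64, hidden2=32):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, hidden1)   # input pixels -> first hidden layer
        self.fc2 = nn.Linear(hidden1, hidden2)   # first -> second hidden layer
        self.fc3 = nn.Linear(hidden2, 10)        # second hidden layer -> 10 class scores

    def forward(self, x):
        x = x.view(-1, 28 * 28)   # flatten each image
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)        # raw scores, as expected by nn.CrossEntropyLoss


net = DeeperMlp(hidden1=64, hidden2=32)
out = net(torch.rand(8, 1, 28, 28))   # hypothetical batch of 8 random images
print(out.shape)                       # torch.Size([8, 10])
```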

Also experiment with the learning rate. Observe how it affects the learning curve, i.e. how fast the loss decreases and the accuracy increases.
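The effect of the learning rate can be previewed on a toy problem far simpler than the MLP: plain gradient descent on the hypothetical loss f(w) = w², starting from w = 1. Each step multiplies w by (1 - 2·lr), so the learning rates below (arbitrary picks) show a slow, a fast and a diverging learning curve.

```python
def gradient_descent(lr, w0=1.0, steps=20):
    """Minimize the toy loss f(w) = w**2 with plain gradient descent.

    Returns the loss after each step (a 'learning curve')."""
    w = w0
    curve = []
    for _ in range(steps):
        grad = 2 * w          # derivative of w**2
        w = w - lr * grad     # each step multiplies w by (1 - 2 * lr)
        curve.append(w ** 2)
    return curve


for lr in (0.01, 0.1, 1.1):
    curve = gradient_descent(lr)
    print(f"lr={lr}: loss after 1 step {curve[0]:.4g}, after 20 steps {curve[-1]:.4g}")
```

With lr=0.01 the loss shrinks slowly, with lr=0.1 it drops quickly, and with lr=1.1 every step overshoots the minimum by more than the previous one, so the loss grows. The real network's loss is not a simple quadratic, but too small and too large learning rates show similar symptoms in its learning curve.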

PyTorch also provides more advanced optimizers in <i>torch.optim</i>, such as Adam, that you can experiment with.
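Swapping the optimizer only requires changing the line that creates `optimizer` in the training cell. The sketch below constructs a few optimizers from `torch.optim` for a hypothetical stand-in model (`nn.Linear(784, 10)`, in place of `Mlp()`); the learning rates are common starting points, not tuned values for this task.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(784, 10)   # hypothetical stand-in; in the notebook use net = Mlp()

optimizers = {
    "SGD + momentum": optim.SGD(model.parameters(), lr=0.01, momentum=0.9),
    "RMSprop": optim.RMSprop(model.parameters(), lr=1e-3),
    "Adam": optim.Adam(model.parameters(), lr=1e-3),
}

for name, opt in optimizers.items():
    print(name, type(opt).__name__, "lr =", opt.defaults["lr"])
```

Only one optimizer should drive a given model during training; the dictionary is just a compact way to show the constructor calls side by side.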