# Exercise 2
In this exercise you will implement a simple network with PyTorch. The network has the same architecture as the one in exercise 1:
input - fully connected layer - ReLU - fully connected layer - softmax
We will test the network with real data, and you will tune the hyperparameters of the network to achieve a high accuracy.
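
For orientation, the architecture above could be sketched with PyTorch's built-in layers as follows. This is only an illustration, not the implementation used in this exercise: the cells below use plain weight matrices without bias terms, whereas `nn.Linear` adds a bias by default. The sizes (784 inputs, 10 hidden units, 10 classes) are assumptions chosen to match the values used later in the notebook.

```python
import torch
import torch.nn as nn

# Assumed sizes: 784 inputs (28x28 pixels), 10 hidden units, 10 classes.
input_dim, hidden_dim, num_classes = 784, 10, 10

model = nn.Sequential(
    nn.Linear(input_dim, hidden_dim),    # first fully connected layer (with bias)
    nn.ReLU(),
    nn.Linear(hidden_dim, num_classes),  # second fully connected layer (with bias)
)

x = torch.randn(5, input_dim)            # a dummy batch of 5 inputs
logits = model(x)                        # raw class scores, shape (5, 10)
probs = torch.softmax(logits, dim=1)     # softmax turns scores into probabilities
print(logits.shape)
print(probs.sum(dim=1))                  # each row sums to (approximately) 1
```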

# Get MNIST data
Download the MNIST dataset and set the variables 'location_images' and 'location_labels' to the paths of your downloaded files.
(See http://rasbt.github.io/mlxtend/user_guide/data/loadlocal_mnist/ for help. We recommend downloading the 10k files.)

In [1]:
# Change to the path on your device
location_images = 't10k-images-idx3-ubyte'
location_labels = 't10k-labels-idx1-ubyte'

If the location is correct, we can import the data.
Note: Make sure mlxtend is installed on your device!
(If you are using Anaconda, you can find installation help at https://anaconda.org/conda-forge/mlxtend)

In [2]:
from mlxtend.data import loadlocal_mnist

X, y = loadlocal_mnist(images_path=location_images, labels_path=location_labels)

print('Dimension of the data X: %s x %s' % (X.shape[0], X.shape[1]))

Dimension of the data X: 10000 x 784
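
If the import fails or the dimensions look wrong, it can help to inspect the header of the downloaded file. The sketch below parses an IDX header with the standard `struct` module; the helper name `read_idx_header` and the in-memory `fake_header` are hypothetical and only illustrate the layout (a big-endian magic number followed by one 32-bit size per dimension).

```python
import struct

def read_idx_header(raw):
    """Return (magic, dims) parsed from the leading bytes of an IDX file."""
    # The magic number is a big-endian 32-bit integer; its lowest byte
    # holds the number of dimensions stored in the file.
    magic, = struct.unpack('>I', raw[:4])
    ndim = magic & 0xFF
    # One big-endian 32-bit size follows for each dimension.
    dims = struct.unpack('>' + 'I' * ndim, raw[4:4 + 4 * ndim])
    return magic, dims

# Hypothetical in-memory header mimicking an MNIST image file:
# magic 2051, then 10000 images of 28 x 28 pixels.
fake_header = struct.pack('>IIII', 2051, 10000, 28, 28)
magic, dims = read_idx_header(fake_header)
print(magic, dims)  # 2051 (10000, 28, 28)
```

To check a real file, the first 16 bytes could be read with `open(location_images, 'rb').read(16)` and passed to the helper. An MNIST label file is expected to report the magic number 2049 and a single dimension.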


In [3]:
# This code initializes a mini batch (X_batch and y_batch) of your downloaded data
# as well as the weights of the network. 

import torch
import numpy as np

def get_data_batch(X, y, batch_size, seed):
    np.random.seed(seed)
    num_train = X.shape[0]
    rand_indices = np.random.choice(np.arange(num_train), size=batch_size, replace=True)
    X_batch = torch.FloatTensor(X[rand_indices,:]).type(torch.FloatTensor)
    y_batch = torch.LongTensor(y[rand_indices]).type(torch.LongTensor)
    return X_batch, y_batch

torch.manual_seed(0)

batch_size = 1000
input_dim = X.shape[1]
hidden_dim = 10
num_classes = 10
X_batch, y_batch = get_data_batch(X, y, batch_size, 0)

print(X_batch.size())

# Pay attention to the requires_grad=True statements; these will be useful later!
w1 = torch.randn(input_dim, hidden_dim, requires_grad=True)
w2 = torch.randn(hidden_dim, num_classes, requires_grad=True)

torch.Size([1000, 784])
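
To illustrate what `requires_grad=True` does, here is a minimal sketch on a toy tensor (the values are made up for the example): PyTorch records the operations applied to such a tensor and fills its `.grad` attribute when `backward()` is called.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = (w ** 2).sum()   # d(loss)/dw = 2 * w
loss.backward()         # computes the gradient and stores it in w.grad
print(w.grad)           # tensor([2., 4., 6.])

# Tensors created without the flag are not tracked by autograd.
plain = torch.tensor([1.0, 2.0, 3.0])
print(plain.requires_grad)  # False
```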


# Softmax
As in exercise 1, we want to calculate the loss with softmax. PyTorch has built-in functions that compute the softmax loss efficiently. Please implement the 'SoftmaxLoss' function below using PyTorch. Return the loss in the variable 'outputs'; you will have to overwrite this parameter inside the function.

In [4]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

def SoftmaxLoss(outputs, labels):
    """
    Inputs:
    outputs: a NxM tensor of type FloatTensor
    labels: a N shape tensor of type LongTensor
    
    Return:
    outputs: the softmax loss. 
    """
    batch_size = outputs.size()[0]
    
    ######################################## START OF YOUR CODE ########################################
    criterion = nn.CrossEntropyLoss()
    outputs = criterion(outputs, labels)
    ######################################## END OF YOUR CODE ##########################################
    
    return outputs
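
To see what the built-in cross-entropy computes, the following sketch reproduces it by hand on made-up data: take the log-softmax of the scores, pick the entry of the correct class in each row, and average the negated values. The tensors `logits` and `labels` are illustrative and not part of the exercise.

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # made-up scores: 4 samples, 10 classes
labels = torch.tensor([3, 0, 7, 9])    # made-up class labels

log_probs = F.log_softmax(logits, dim=1)
# Select the log-probability of the correct class in every row.
picked = log_probs[torch.arange(logits.size(0)), labels]
manual_loss = -picked.mean()

builtin_loss = F.cross_entropy(logits, labels)
print(manual_loss.item(), builtin_loss.item())  # the two values should agree

# Sanity check: identical scores for all 10 classes give a loss of ln(10).
uniform_loss = F.cross_entropy(torch.zeros(2, 10), torch.tensor([1, 2]))
print(uniform_loss.item(), math.log(10))
```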


In [5]:
# Test your implementation of the softmax loss
torch.manual_seed(1)

test_out = torch.randn(200,10)
test_lab = torch.LongTensor(200).random_(0, 10)
test_loss = SoftmaxLoss(test_out, test_lab)
print('Your loss:')
print(test_loss)
print()
print('correct loss:')
correct_loss = 2.7732
print(correct_loss)
print()

# The difference should be small. We get < 1e-4
print('Difference between your loss and correct loss:')
print(test_loss - correct_loss)

Your loss:
tensor(2.7732)

correct loss:
2.7732

Difference between your loss and correct loss:
tensor(-2.2888e-05)


# Training the network
With PyTorch, building and training neural networks gets a lot simpler. Below, the forward propagation and the loss function are implemented for you. Implement the missing back propagation (hint: you only need one line of code).

In [6]:
learning_rate = 1e-6
iterations = 500
for t in range(iterations):
    
    #--------------------------------------- forward propagation ---------------------------------------

    y_pred = X_batch.mm(w1).clamp(min=0).mm(w2)
    

    #--------------------------------------- loss function ---------------------------------------------
    
    loss = SoftmaxLoss(y_pred, y_batch)
    
    
    #--------------------------------------- back propagation -------------------------------------------
    
    ######################################## START OF YOUR CODE ########################################
    
    loss.backward()

    ######################################## END OF YOUR CODE ##########################################
    
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()

[1,    50] loss: 144.743
[1,   100] loss: 131.756
[1,   150] loss: 121.287
[1,   200] loss: 112.820
[1,   250] loss: 105.722
[1,   300] loss: 99.789
[1,   350] loss: 94.773
[1,   400] loss: 90.405
[1,   450] loss: 86.619
[1,   500] loss: 83.328
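
The manual weight update inside `torch.no_grad()` is plain gradient descent. As a sketch, the same step can be delegated to `torch.optim.SGD`; the toy regression data and the learning rate below are assumptions made only for this comparison.

```python
import torch

torch.manual_seed(0)
x = torch.randn(8, 3)         # toy inputs
target = torch.randn(8, 1)    # toy targets
lr = 0.1                      # hypothetical learning rate

w_manual = torch.randn(3, 1, requires_grad=True)
w_optim = w_manual.detach().clone().requires_grad_(True)
optimizer = torch.optim.SGD([w_optim], lr=lr)

# Manual update, as in the training loop of this exercise.
loss = ((x.mm(w_manual) - target) ** 2).mean()
loss.backward()
with torch.no_grad():
    w_manual -= lr * w_manual.grad
    w_manual.grad.zero_()

# The same step performed by the optimizer.
optimizer.zero_grad()
loss = ((x.mm(w_optim) - target) ** 2).mean()
loss.backward()
optimizer.step()

print(torch.allclose(w_manual, w_optim))  # True
```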


# Maximise the accuracy
With the code provided below, try to achieve the highest possible accuracy. To do this, tune the 'learning_rate', 'iterations' and 'batch_size' of the training and validation set. You might want to extend the code above to loop through different values of 'learning_rate' and save your best accuracies. Note: There are no START OF YOUR CODE and END OF YOUR CODE lines in this task.
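
One possible way to organise such a search is sketched here. The helper `train_and_score`, the synthetic data and the candidate learning rates are all hypothetical: the helper mirrors the two-layer network of this notebook, but it runs on small random data with a disjoint train/validation split so that the example is self-contained. The accuracies it prints say nothing about MNIST.

```python
import torch
import torch.nn.functional as F

def train_and_score(X_train, y_train, X_val, y_val,
                    learning_rate, iterations, hidden_dim=10, seed=0):
    """Train the two-layer network from scratch and return validation accuracy."""
    torch.manual_seed(seed)
    num_classes = int(y_train.max()) + 1
    w1 = torch.randn(X_train.size(1), hidden_dim, requires_grad=True)
    w2 = torch.randn(hidden_dim, num_classes, requires_grad=True)
    for _ in range(iterations):
        scores = X_train.mm(w1).clamp(min=0).mm(w2)
        loss = F.cross_entropy(scores, y_train)
        loss.backward()
        with torch.no_grad():
            w1 -= learning_rate * w1.grad
            w2 -= learning_rate * w2.grad
            w1.grad.zero_()
            w2.grad.zero_()
    with torch.no_grad():
        predictions = X_val.mm(w1).clamp(min=0).mm(w2).argmax(dim=1)
    return (predictions == y_val).float().mean().item()

# Synthetic stand-in data: the label depends only on the sign of feature 0.
torch.manual_seed(0)
X_demo = torch.randn(200, 20)
y_demo = (X_demo[:, 0] > 0).long()
X_tr, y_tr = X_demo[:150], y_demo[:150]
X_va, y_va = X_demo[150:], y_demo[150:]

results = {}
for lr in [1e-3, 1e-2, 1e-1]:          # hypothetical candidate values
    results[lr] = train_and_score(X_tr, y_tr, X_va, y_va, lr, iterations=100)

best_lr = max(results, key=results.get)
print(results)
print('best learning rate:', best_lr)
```

Re-initialising the weights for every candidate keeps the runs comparable; reusing already trained weights would favour whichever value happens to be tried last.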

In [7]:
# Get Validation set
X_val, y_val = get_data_batch(X, y, batch_size, 100)

# Make a prediction with the trained network on the training and validation batch

y_pred_batch = X_batch.mm(w1).clamp(min=0).mm(w2)

y_pred_val = X_val.mm(w1).clamp(min=0).mm(w2)

loss_train = SoftmaxLoss(y_pred_batch, y_batch)
loss_val = SoftmaxLoss(y_pred_val, y_val)

print(loss_train)
print(loss_val)

# compute the accuracy

# print('Training accuracy: ', torch.mean(compare_train.float()))
# print('Validation accuracy: ', torch.mean(compare_val.float()))

tensor(4163.3174, grad_fn=<NllLossBackward>)
tensor(4414.1660, grad_fn=<NllLossBackward>)
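
The accuracy left open in the cell above can be computed by comparing the highest-scoring class of each row with the labels. The sketch below uses a hypothetical helper `accuracy` and made-up scores.

```python
import torch

def accuracy(scores, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    predictions = scores.argmax(dim=1)
    return (predictions == labels).float().mean().item()

# Made-up scores for 4 samples and 3 classes.
demo_scores = torch.tensor([[0.1, 2.0, 0.3],
                            [1.5, 0.2, 0.1],
                            [0.0, 0.1, 3.0],
                            [0.9, 0.8, 0.1]])
demo_labels = torch.tensor([1, 0, 2, 1])
print(accuracy(demo_scores, demo_labels))  # 0.75
```

In this notebook the helper could be called as `accuracy(y_pred_batch, y_batch)` and `accuracy(y_pred_val, y_val)`.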


In [8]:
iterations = 500
for epoch in range(10):
    
    learning_rate = 1e-6 * (epoch+1)
    
    w1.grad.zero_()
    w2.grad.zero_()
    
    for t in range(iterations):
    
        running_loss = 0.0

        y_pred = X_batch.mm(w1).clamp(min=0).mm(w2)
   
        loss = SoftmaxLoss(y_pred, y_batch)
 
        loss.backward()

        with torch.no_grad():
            w1 -= learning_rate * w1.grad
            w2 -= learning_rate * w2.grad
            w1.grad.zero_()
            w2.grad.zero_()
        
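        # print statistics
        # Note: running_loss is reset at the start of every iteration, so the value
        # printed below is the current loss divided by 100, not a 100-iteration average.
        running_loss += loss.item()
        if t % 100 == 99:    # print every 100 iterations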
            print('[%d, %5d] loss: %.3f' % (epoch + 1, t + 1, running_loss / 100))
            running_loss = 0.0
        
    print("\n")

[1,   100] loss: 38.927
[1,   200] loss: 36.707
[1,   300] loss: 34.748
[1,   400] loss: 33.034
[1,   500] loss: 31.503


[2,   100] loss: 28.979
[2,   200] loss: 26.847
[2,   300] loss: 25.043
[2,   400] loss: 23.584
[2,   500] loss: 22.314


[3,   100] loss: 20.639
[3,   200] loss: 19.162
[3,   300] loss: 17.821
[3,   400] loss: 16.569
[3,   500] loss: 15.449


[4,   100] loss: 14.124
[4,   200] loss: 12.949
[4,   300] loss: 11.876
[4,   400] loss: 10.884
[4,   500] loss: 9.971


[5,   100] loss: 8.929
[5,   200] loss: 8.047
[5,   300] loss: 7.279
[5,   400] loss: 6.590
[5,   500] loss: 5.983


[6,   100] loss: 5.342
[6,   200] loss: 4.785
[6,   300] loss: 4.288
[6,   400] loss: 3.845
[6,   500] loss: 3.455


[7,   100] loss: 3.056
[7,   200] loss: 2.718
[7,   300] loss: 2.434
[7,   400] loss: 2.199
[7,   500] loss: 1.999


[8,   100] loss: 1.804
[8,   200] loss: 1.642
[8,   300] loss: 1.508
[8,   400] loss: 1.401
[8,   500] loss: 1.309


[9,   100] loss: 1.214
[9,   200] loss: 1.128

What was the highest accuracy you achieved, and with which values of 'learning_rate', 'iterations' and 'batch_size' did you achieve it? Write down your answer below and briefly describe the procedure you used to find them.