# EE-559: Practical Session 3

## Introduction

The objective of this session is to implement a multi-layer perceptron with one hidden layer from
scratch and test it on MNIST.

You can get information about the practical sessions and the provided helper functions on the course's
website.

In [2]:
import math
import torch
from torch import Tensor

import dlc_practical_prologue as prologue

## Activation function

Write the two functions

1. def sigma(x)
1. def dsigma(x)

that take as input a float tensor and returns a tensor of same size, obtained by applying component-wise
respectively tanh, and the first derivative of tanh.

**Hint**: The functions should have no python loop, and use in particular torch.tanh , torch.exp ,
torch.mul , and torch.pow . My versions are 34 and 62 character long.

In [3]:
def sigma(x):
    return x.tanh()

def dsigma(x):
    return 4 * (x.exp() + x.mul(-1).exp()).pow(-2)

## Loss

Write the two functions
1. def loss(v, t)
1. def dloss(v, t)

that take as input two float tensors of same dimensions with v the predicted tensor and t the target
one, and return respectively, and a tensor of same size equal to the gradient of that quantity as a function of v.

**Hint**: The functions should have no python loop, and use in particular torch.sum , torch.pow . My
versions are 48 and 40 character long.

In [4]:
def loss(v, t):
    return (v - t).pow(2).sum()

def dloss(v, t):
    return 2 * (v - t)

## Forward and backward passes

Write a function

def forward_pass(w1, b1, w2, b2, x)

whose arguments correspond to an input vector to the network, and the weight and bias of the two
layers, and returns a tuple composed of the corresponding x(0), s(1), x(1), s(2), and x(2).

Write a function

def backward_pass(w1, b1, w2, b2, t, x, s1, x1, s2, x2, dl_dw1, dl_db1, dl_dw2, dl_db2)


whose arguments correspond to the target vector, the quantities computed by the forward pass, and
the tensors used to store the cumulated sums of the gradient on individual samples, and update the
latters according to the formula of the backward pass.

**Hint**: The functions should have no python loop, and use in particular torch.t , torch.mv ,
torch.mm , and torch.view , and the functions previously written. The main difficulty is to deal
properly with the tensor size and transpose. My versions are 165 and 436 character long.


In [5]:
def forward_pass(w1, b1, w2, b2, x):
    x0 = x
    s1 = w1.mv(x0) + b1
    x1 = sigma(s1)
    s2 = w2.mv(x1) + b2
    x2 = sigma(s2)

    return x0, s1, x1, s2, x2

def backward_pass(w1, b1, w2, b2,
                  t,
                  x, s1, x1, s2, x2,
                  dl_dw1, dl_db1, dl_dw2, dl_db2):
    x0 = x
    dl_dx2 = dloss(x2, t)
    dl_ds2 = dsigma(s2) * dl_dx2
    dl_dx1 = w2.t().mv(dl_ds2)
    dl_ds1 = dsigma(s1) * dl_dx1

    dl_dw2.add_(dl_ds2.view(-1, 1).mm(x1.view(1, -1)))
    dl_db2.add_(dl_ds2)
    dl_dw1.add_(dl_ds1.view(-1, 1).mm(x0.view(1, -1)))
    dl_db1.add_(dl_ds1)

## Training the network

Write the code to train and test a MLP with one hidden layer of 50 units. This network should have an
input dimension of 784, which is the dimension of the MNIST training set, and an output dimension
of 10, which is the number of classes.

You code should:

1. Load the data using the provided prologue.load_data function, with one-hot label vectors
and normalized inputs. Multiply the target label vectors by sigma = 0:9 (so that they are strictly in
the value range of tanh).
1. Create the four weight and bias tensors, and fill them with random values sampled according to
N(0; epsillon) with epsilon = 1e-6.
1. Create the four tensors to sum up the gradients on individual samples, with respect to the
weights and biases.
1. Perform 1; 000 gradient steps with a step size nu equal to 0:1 divided by the number of training
samples.

Each of these steps requires to reset to zero the tensors for summing up the gradients, and
doing a forward and a backward pass for each training example.

Compute and print the training loss, training error and test error after every step using the class
of maximum response as the predicted one.

**Hint**: My solution is 1987 character long and achieves 3:6% training error and 15:70% test error with
50 hidden units. It takes 1min40s to nish on a Intel i7 with no GPU, using the default small sets of
prologue.load_data.

In [6]:
train_input, train_target, test_input, test_target = prologue.load_data(one_hot_labels = True,
                                                                        normalize = True)

nb_classes = train_target.size(1)
nb_train_samples = train_input.size(0)

zeta = 0.90

train_input = train_input * zeta
test_input = test_input * zeta

nb_hidden = 50
eta = 1e-1 / nb_train_samples
epsilon = 1e-6

w1 = Tensor(nb_hidden, train_input.size(1)).normal_(0, epsilon)
b1 = Tensor(nb_hidden).normal_(0, epsilon)
w2 = Tensor(nb_classes, nb_hidden).normal_(0, epsilon)
b2 = Tensor(nb_classes).normal_(0, epsilon)

dl_dw1 = Tensor(w1.size())
dl_db1 = Tensor(b1.size())
dl_dw2 = Tensor(w2.size())
dl_db2 = Tensor(b2.size())

* Using MNIST
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Processing...
Done!
** Reduce the data-set (use --full for the full thing)
** Use 1000 train and 1000 test samples


In [7]:
for k in range(0, 1000):

    # Back-prop

    acc_loss = 0
    nb_train_errors = 0

    dl_dw1.zero_()
    dl_db1.zero_()
    dl_dw2.zero_()
    dl_db2.zero_()

    for n in range(0, nb_train_samples):
        x0, s1, x1, s2, x2 = forward_pass(w1, b1, w2, b2, train_input[n])

        pred = x2.max(0)[1][0]
        if train_target[n, pred] < 0: nb_train_errors = nb_train_errors + 1
        acc_loss = acc_loss + loss(x2, train_target[n])

        backward_pass(w1, b1, w2, b2,
                      train_target[n],
                      x0, s1, x1, s2, x2,
                      dl_dw1, dl_db1, dl_dw2, dl_db2)

    # Gradient step

    w1 = w1 - eta * dl_dw1
    b1 = b1 - eta * dl_db1
    w2 = w2 - eta * dl_dw2
    b2 = b2 - eta * dl_db2

    # Test error

    nb_test_errors = 0

    for n in range(0, test_input.size(0)):
        _, _, _, _, x2 = forward_pass(w1, b1, w2, b2, test_input[n])

        pred = x2.max(0)[1][0]
        if test_target[n, pred] < 0: nb_test_errors = nb_test_errors + 1

    print('{:d} acc_train_loss {:.02f} acc_train_error {:.02f}% test_error {:.02f}%'
          .format(k,
                  acc_loss,
                  (100 * nb_train_errors) / train_input.size(0),
                  (100 * nb_test_errors) / test_input.size(0)))

  app.launch_new_instance()


0 acc_train_loss 10000.00 acc_train_error 90.30% test_error 90.10%
1 acc_train_loss 7712.08 acc_train_error 88.30% test_error 90.10%
2 acc_train_loss 6327.60 acc_train_error 88.30% test_error 90.10%
3 acc_train_loss 5499.01 acc_train_error 88.30% test_error 90.10%
4 acc_train_loss 4982.97 acc_train_error 88.30% test_error 90.10%
5 acc_train_loss 4638.62 acc_train_error 88.30% test_error 90.10%
6 acc_train_loss 4350.50 acc_train_error 88.30% test_error 90.10%
7 acc_train_loss 3952.47 acc_train_error 88.30% test_error 90.10%
8 acc_train_loss 3649.57 acc_train_error 88.30% test_error 90.10%
9 acc_train_loss 3610.92 acc_train_error 88.30% test_error 90.10%
10 acc_train_loss 3605.49 acc_train_error 88.30% test_error 90.10%
11 acc_train_loss 3603.86 acc_train_error 88.30% test_error 90.10%
12 acc_train_loss 3603.02 acc_train_error 88.40% test_error 90.10%
13 acc_train_loss 3602.42 acc_train_error 88.40% test_error 90.10%
14 acc_train_loss 3601.89 acc_train_error 88.40% test_error 90.20%
15 a

123 acc_train_loss 411.24 acc_train_error 3.80% test_error 16.10%
124 acc_train_loss 401.11 acc_train_error 4.10% test_error 16.60%
125 acc_train_loss 395.45 acc_train_error 3.70% test_error 16.10%
126 acc_train_loss 386.39 acc_train_error 3.90% test_error 16.70%
127 acc_train_loss 380.43 acc_train_error 3.60% test_error 16.30%
128 acc_train_loss 372.51 acc_train_error 3.50% test_error 16.70%
129 acc_train_loss 366.60 acc_train_error 3.50% test_error 16.10%
130 acc_train_loss 359.67 acc_train_error 3.30% test_error 16.60%
131 acc_train_loss 354.00 acc_train_error 3.30% test_error 15.90%
132 acc_train_loss 347.86 acc_train_error 3.20% test_error 16.40%
133 acc_train_loss 342.50 acc_train_error 3.00% test_error 16.00%
134 acc_train_loss 336.95 acc_train_error 3.10% test_error 16.20%
135 acc_train_loss 331.90 acc_train_error 2.80% test_error 15.90%
136 acc_train_loss 326.82 acc_train_error 3.00% test_error 16.40%
137 acc_train_loss 322.07 acc_train_error 2.80% test_error 15.70%
138 acc_tr

247 acc_train_loss 120.40 acc_train_error 1.30% test_error 15.00%
248 acc_train_loss 119.71 acc_train_error 1.30% test_error 15.00%
249 acc_train_loss 119.03 acc_train_error 1.30% test_error 15.10%
250 acc_train_loss 118.36 acc_train_error 1.30% test_error 15.10%
251 acc_train_loss 117.70 acc_train_error 1.30% test_error 15.10%
252 acc_train_loss 117.05 acc_train_error 1.30% test_error 15.20%
253 acc_train_loss 116.40 acc_train_error 1.30% test_error 15.20%
254 acc_train_loss 115.77 acc_train_error 1.30% test_error 15.20%
255 acc_train_loss 115.15 acc_train_error 1.30% test_error 15.30%
256 acc_train_loss 114.53 acc_train_error 1.30% test_error 15.30%
257 acc_train_loss 113.92 acc_train_error 1.30% test_error 15.30%
258 acc_train_loss 113.32 acc_train_error 1.30% test_error 15.30%
259 acc_train_loss 112.73 acc_train_error 1.30% test_error 15.30%
260 acc_train_loss 112.15 acc_train_error 1.30% test_error 15.30%
261 acc_train_loss 111.58 acc_train_error 1.30% test_error 15.30%
262 acc_tr

372 acc_train_loss 75.10 acc_train_error 1.10% test_error 14.80%
373 acc_train_loss 74.92 acc_train_error 1.10% test_error 14.80%
374 acc_train_loss 74.74 acc_train_error 1.10% test_error 14.80%
375 acc_train_loss 74.57 acc_train_error 1.10% test_error 14.80%
376 acc_train_loss 74.40 acc_train_error 1.10% test_error 14.80%
377 acc_train_loss 74.23 acc_train_error 1.10% test_error 14.80%
378 acc_train_loss 74.06 acc_train_error 1.10% test_error 14.80%
379 acc_train_loss 73.89 acc_train_error 1.10% test_error 14.80%
380 acc_train_loss 73.73 acc_train_error 1.10% test_error 14.80%
381 acc_train_loss 73.56 acc_train_error 1.10% test_error 14.80%
382 acc_train_loss 73.40 acc_train_error 1.10% test_error 14.80%
383 acc_train_loss 73.24 acc_train_error 1.10% test_error 14.80%
384 acc_train_loss 73.08 acc_train_error 1.10% test_error 14.80%
385 acc_train_loss 72.93 acc_train_error 1.10% test_error 14.80%
386 acc_train_loss 72.77 acc_train_error 1.10% test_error 14.80%
387 acc_train_loss 72.62 

498 acc_train_loss 58.82 acc_train_error 1.00% test_error 14.90%
499 acc_train_loss 58.74 acc_train_error 1.00% test_error 14.90%
500 acc_train_loss 58.66 acc_train_error 1.00% test_error 14.90%
501 acc_train_loss 58.58 acc_train_error 1.00% test_error 14.90%
502 acc_train_loss 58.50 acc_train_error 1.00% test_error 14.90%
503 acc_train_loss 58.42 acc_train_error 1.00% test_error 14.90%
504 acc_train_loss 58.35 acc_train_error 1.00% test_error 14.90%
505 acc_train_loss 58.27 acc_train_error 1.00% test_error 14.90%
506 acc_train_loss 58.20 acc_train_error 1.00% test_error 14.90%
507 acc_train_loss 58.12 acc_train_error 1.00% test_error 14.90%
508 acc_train_loss 58.04 acc_train_error 1.00% test_error 14.90%
509 acc_train_loss 57.97 acc_train_error 1.00% test_error 14.90%
510 acc_train_loss 57.90 acc_train_error 1.00% test_error 14.90%
511 acc_train_loss 57.82 acc_train_error 1.00% test_error 14.90%
512 acc_train_loss 57.75 acc_train_error 1.00% test_error 14.90%
513 acc_train_loss 57.68 

624 acc_train_loss 49.07 acc_train_error 0.90% test_error 15.10%
625 acc_train_loss 49.03 acc_train_error 0.90% test_error 15.10%
626 acc_train_loss 48.99 acc_train_error 0.90% test_error 15.10%
627 acc_train_loss 48.95 acc_train_error 0.90% test_error 15.10%
628 acc_train_loss 48.92 acc_train_error 0.90% test_error 15.10%
629 acc_train_loss 48.88 acc_train_error 0.90% test_error 15.10%
630 acc_train_loss 48.84 acc_train_error 0.90% test_error 15.10%
631 acc_train_loss 48.81 acc_train_error 0.90% test_error 15.10%
632 acc_train_loss 48.77 acc_train_error 0.90% test_error 15.10%
633 acc_train_loss 48.74 acc_train_error 0.90% test_error 15.10%
634 acc_train_loss 48.70 acc_train_error 0.90% test_error 15.10%
635 acc_train_loss 48.67 acc_train_error 0.90% test_error 15.10%
636 acc_train_loss 48.63 acc_train_error 0.90% test_error 15.10%
637 acc_train_loss 48.60 acc_train_error 0.90% test_error 15.10%
638 acc_train_loss 48.56 acc_train_error 0.90% test_error 15.10%
639 acc_train_loss 48.53 

750 acc_train_loss 45.48 acc_train_error 0.90% test_error 15.00%
751 acc_train_loss 45.45 acc_train_error 0.90% test_error 15.00%
752 acc_train_loss 45.43 acc_train_error 0.90% test_error 15.00%
753 acc_train_loss 45.41 acc_train_error 0.90% test_error 15.00%
754 acc_train_loss 45.39 acc_train_error 0.90% test_error 15.00%
755 acc_train_loss 45.36 acc_train_error 0.90% test_error 15.00%
756 acc_train_loss 45.34 acc_train_error 0.90% test_error 15.00%
757 acc_train_loss 45.32 acc_train_error 0.90% test_error 15.00%
758 acc_train_loss 45.30 acc_train_error 0.90% test_error 15.00%
759 acc_train_loss 45.27 acc_train_error 0.90% test_error 15.00%
760 acc_train_loss 45.25 acc_train_error 0.90% test_error 15.00%
761 acc_train_loss 45.23 acc_train_error 0.90% test_error 15.00%
762 acc_train_loss 45.21 acc_train_error 0.90% test_error 15.00%
763 acc_train_loss 45.18 acc_train_error 0.90% test_error 15.00%
764 acc_train_loss 45.16 acc_train_error 0.90% test_error 15.00%
765 acc_train_loss 45.14 

876 acc_train_loss 42.44 acc_train_error 0.80% test_error 15.20%
877 acc_train_loss 42.41 acc_train_error 0.80% test_error 15.20%
878 acc_train_loss 42.38 acc_train_error 0.80% test_error 15.20%
879 acc_train_loss 42.35 acc_train_error 0.80% test_error 15.20%
880 acc_train_loss 42.32 acc_train_error 0.80% test_error 15.20%
881 acc_train_loss 42.29 acc_train_error 0.80% test_error 15.20%
882 acc_train_loss 42.25 acc_train_error 0.80% test_error 15.20%
883 acc_train_loss 42.22 acc_train_error 0.80% test_error 15.20%
884 acc_train_loss 42.19 acc_train_error 0.80% test_error 15.20%
885 acc_train_loss 42.16 acc_train_error 0.80% test_error 15.20%
886 acc_train_loss 42.12 acc_train_error 0.80% test_error 15.20%
887 acc_train_loss 42.09 acc_train_error 0.80% test_error 15.20%
888 acc_train_loss 42.06 acc_train_error 0.80% test_error 15.20%
889 acc_train_loss 42.02 acc_train_error 0.80% test_error 15.20%
890 acc_train_loss 41.99 acc_train_error 0.80% test_error 15.20%
891 acc_train_loss 41.95 