# Simple neural network

Author: Pierre Ablin

In this notebook, we are going to create and train a simple neural network on the digits dataset using PyTorch.

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

First, we need to load the data and convert it to PyTorch tensors.

In [None]:
X, y = load_digits(return_X_y=True)

# Normalize: center each pixel, then divide by a single global standard deviation
X -= X.mean(axis=0)
X /= np.std(X)
X_train, X_test, y_train, y_test = train_test_split(X, y)

f, axes = plt.subplots(1, 3)
for i, axe in enumerate(axes):
    axe.imshow(X[i].reshape(8, 8))

x = torch.tensor(X_train).float()
y = torch.tensor(y_train).long()
n, p = x.shape
x_test = torch.tensor(X_test).float()
y_test = torch.tensor(y_test).long()

# Define the network

We will work with a simple network with two layers (one hidden layer).

The input $x$ is transformed into the output $z$ by the following operations:

$$y = \tanh(W_1x + b_1)$$
$$z = W_2y + b_2$$

**Exercise 1**: Define a function `net(x, W1, b1, W2, b2)` that implements this transform. Remember that `x` is a matrix of size $n\times p$ with one sample per row, so the transform must be applied to every row.
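
As a shape check only, the sketch below applies a single affine map followed by `tanh` to a toy batch. The toy sizes, the names `x_toy`, `W_toy`, `b_toy`, and the choice to store the weight with shape `(in_features, out_features)` are all assumptions made for illustration; the exercise leaves these choices to you.

```python
import torch

torch.manual_seed(0)

# Toy sizes, chosen only for illustration
n_samples, n_features, n_hidden = 5, 4, 3

x_toy = torch.randn(n_samples, n_features)   # one sample per row
W_toy = torch.randn(n_features, n_hidden)    # assumed layout: (in, out)
b_toy = torch.randn(n_hidden)                # broadcast over the rows

# Row-wise version of tanh(W x + b): every row of x_toy is mapped at once
h_toy = torch.tanh(x_toy @ W_toy + b_toy)

print(h_toy.shape)  # torch.Size([5, 3])
```

With this layout, row `i` of `h_toy` equals `tanh(W_toy.T @ x_toy[i] + b_toy)`, which is the per-sample formula written above.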

In [None]:
def net(x, W1, b1, W2, b2):
    # Your code here.
    return z

Next, let us specify the parameters of the network, `W1, b1, W2, b2`. You can choose the size of the hidden layer, but the input and output sizes are determined by the problem.

**Exercise 2**: Define a set of parameters `W1, b1, W2, b2`, choosing the size of the hidden layer yourself. Make sure that all these parameters have their `requires_grad` flag set to `True`, so that we can compute the gradient with respect to them.

To check that everything works, compute `net(x, W1, b1, W2, b2)`.
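
One detail is easy to miss when creating parameters: only leaf tensors get their `.grad` populated. The sketch below uses hypothetical toy tensors (`w_leaf`, `w_scaled`, `w_small` are not the exercise's parameters) to show the difference between scaling a tensor after asking for gradients and scaling it before.

```python
import torch

torch.manual_seed(0)

# A leaf tensor that tracks gradients
w_leaf = torch.randn(3, 2, requires_grad=True)

# Scaling afterwards produces the result of an operation, not a leaf
w_scaled = torch.randn(3, 2, requires_grad=True) * 0.1

# Scaling first, then enabling gradients, keeps the tensor a leaf
w_small = (0.1 * torch.randn(3, 2)).requires_grad_()

print(w_leaf.is_leaf, w_scaled.is_leaf, w_small.is_leaf)  # True False True
print(w_leaf.grad)  # None: no backward pass has run yet
```

If you want small initial weights, the pattern used for `w_small` is one way to get them while keeping the parameter a leaf.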

In [None]:
hidden_size = # What you want
input_size = # Determined by the problem size
output_size = # Determined by the problem size

W1 = # Your code here
b1 = # Your code here
W2 = # Your code here
b2 = # Your code here

parameters = (W1, b1, W2, b2)

output = net(x, W1, b1, W2, b2)

Next, we will define a cost function. We will use the classical cross-entropy loss. It is imported from PyTorch in the next cell.
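
As a quick illustration of the calling convention, `cross_entropy` takes raw scores of shape `(n_samples, n_classes)` and integer class labels, and averages over the batch by default. The toy tensors `logits` and `labels` below are made up for this sketch: when all ten scores are equal, the predicted distribution is uniform and the loss is $\log(10)$.

```python
import math

import torch
from torch.nn.functional import cross_entropy

# Toy batch: 4 samples, 10 classes, all scores equal
logits = torch.zeros(4, 10)
labels = torch.tensor([0, 3, 7, 9])   # class indices, dtype int64

loss_toy = cross_entropy(logits, labels)  # mean over the batch by default

print(loss_toy.item(), math.log(10))  # both approximately 2.3026
```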

In [None]:
from torch.nn.functional import cross_entropy

In order to compute the gradient with respect to the parameters $W_1, W_2, b_1, b_2$, we tell PyTorch to track gradients for them by setting their `requires_grad` flag to `True`:

In [None]:
for parameter in parameters:
    parameter.requires_grad = True

**Exercise 3**: Compute the current loss of the network, and then back-propagate to compute the gradient with respect to the parameters. Check the gradient with respect to `W1`.
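
The mechanics of back-propagation in PyTorch can be seen on a toy scalar loss, independent of the network. In the sketch below, `w_toy` and `loss_toy` are hypothetical names; the loss is $\sum_i w_i^2$, whose gradient is $2w$.

```python
import torch

w_toy = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)

loss_toy = (w_toy ** 2).sum()   # scalar loss
loss_toy.backward()             # fills w_toy.grad with d(loss)/d(w) = 2 * w

print(w_toy.grad)  # tensor([ 2., -4.,  6.])
```

Calling `backward()` again on a freshly computed loss would add to `w_toy.grad` rather than overwrite it, which is why gradients need to be reset between iterations during training.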

In [None]:
output = net(x, W1, b1, W2, b2)
loss = # Your code here
print(loss.item())
# Back propagate through the network here
print(W1.grad)

We are almost ready to train our network!

But first, we will need to compute the accuracy of the network on the train and test sets.

**Exercise 4**: Define a function `accuracy(X, y, W1, b1, W2, b2)` that computes the accuracy of the network on the dataset `X` with true labels `y`. Remember that the predicted class is the argmax of the network output. Compute the current accuracy of the network on the train set. Is this the value you would expect?
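
The argmax rule can be tried on a handful of made-up scores before wiring it to the network. The tensors `scores` and `labels` below are toy values invented for this sketch, with 4 samples and 3 classes.

```python
import torch

scores = torch.tensor([[2.0, 0.1, -1.0],
                       [0.0, 0.5, 3.0],
                       [1.0, 4.0, 0.0],
                       [0.3, 0.2, 0.1]])
labels = torch.tensor([0, 2, 0, 0])

# Index of the largest score in each row
predicted = scores.argmax(dim=1)   # tensor([0, 2, 1, 0])

# Fraction of rows where the prediction matches the label
acc_toy = (predicted == labels).float().mean().item()

print(acc_toy)  # 0.75
```

Returning a Python float via `.item()` is a choice made here so that the value can be formatted and plotted directly.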

In [None]:
def accuracy(X, y, W1, b1, W2, b2):
    # Your code here
    return

accuracy(x, y, W1, b1, W2, b2)

# Training the network

We are now ready to train the network, using back-propagation and stochastic gradient descent.
First, we define the number of iterations of the algorithm, the step size, and the batch size. We also reinitialize the weights. Finally, we will store the train and test accuracy during the training.

In [None]:
n_iter = 1000
step_size = 0.1
batch_size = 64


test_list = []
train_list = []

**Exercise 5**: Complete the following training loop, so that each parameter is updated at each iteration.

Remember that at each iteration, you should:
* Compute the output of the network on the batch
* Compute the loss and back-propagate
* Update each parameter with a gradient descent step
* Reset the gradient of each parameter to zero. To do so, you can do:

```
parameter.grad.zero_()
```
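
The four steps above can be sketched on a toy least-squares problem that is unrelated to the digits network. Everything in this block (the data `x_toy`, `y_toy`, the target `w_true`, the step and batch sizes) is hypothetical and chosen only so that the loop runs quickly and converges.

```python
import torch

torch.manual_seed(0)

# Toy problem (hypothetical): recover w_true from noiseless linear data
x_toy = torch.randn(200, 3)
w_true = torch.tensor([1.0, -2.0, 0.5])
y_toy = x_toy @ w_true

w = torch.zeros(3, requires_grad=True)
step_size_toy, batch_size_toy = 0.1, 32

for _ in range(500):
    idx = torch.randperm(200)[:batch_size_toy]
    loss = ((x_toy[idx] @ w - y_toy[idx]) ** 2).mean()  # forward pass and loss
    loss.backward()                                     # back-propagate
    with torch.no_grad():
        w -= step_size_toy * w.grad                     # gradient step
        w.grad.zero_()                                  # reset for next iteration

print(w)  # close to w_true
```

The update is done inside `torch.no_grad()` so that the in-place modification of `w` is not itself recorded by autograd.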

In [None]:
for i in range(n_iter):
    batch_idx = torch.randperm(n)[:batch_size]
    x_batch = x[batch_idx]
    y_batch = y[batch_idx]
    # Your code here: compute the loss, and backpropagate
    with torch.no_grad():
        for parameter in parameters:
            # Your code here: update the parameters with SGD, and refresh their gradients
    if i % 10 == 0:
        with torch.no_grad():
            train_acc = accuracy(x, y, W1, b1, W2, b2)
            test_acc = accuracy(x_test, y_test, W1, b1, W2, b2)
        test_list.append(test_acc)
        train_list.append(train_acc)
        print('Iteration {} Train loss: {:1.3f} Train acc: {:1.3f} Test acc {:1.3f}'.format(i, loss.item(), train_acc, test_acc))

**Exercise 6**: Display the learning curves. You can then play with the network and training parameters:
what happens when you change the learning rate (`step_size`), the size of the hidden layer, etc.?

In [None]:
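# Accuracies are recorded every 10 iterations, so rescale the x-axis accordingly
iterations = np.arange(len(test_list)) * 10
plt.plot(iterations, test_list, label='test')
plt.plot(iterations, train_list, label='train')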

plt.ylabel('Accuracy')
plt.xlabel('Iterations')
plt.legend()