# Exercise #4 - Training

Finally, we are going to train our network so it can actually predict something useful.

You will implement the training loop that updates the parameters and let it run for a while.

Let's start with importing the basic necessities. We will (later) import reference implementations of the functions you implemented in the previous exercises.

In [None]:
import numpy as np

## Training intuition

We can get some intuition for how training works by using only the function `binary_cross_entropy_loss_backward`. It returns the gradient of the loss with respect to the predictions. So if we change the predictions themselves slightly in the opposite direction of that gradient, the loss should go down and the predictions will move closer to the ground truth.

Let's test this. We compute (and print) the loss at the start, then take a number of small steps in the opposite direction of the gradient, recomputing the gradient after every step, and finish with a final loss computation. The result should be that the loss ends up close to 0.

In [None]:
from siouxdnn import binary_cross_entropy_loss, binary_cross_entropy_loss_backward

y_true = np.array([[1.0], [1.0], [0.0], [1.0], [0.0], [0.0], [0.0], [1.0], [1.0], [0.0]])
y_pred = np.array([[0.6], [0.4], [0.2], [0.7], [0.1], [0.2], [0.5], [0.9], [0.8], [0.6]])

print(f'loss at start {binary_cross_entropy_loss(y_true, y_pred)}')
dl_dy_pred = binary_cross_entropy_loss_backward(y_true, y_pred)

for _ in range(500):
    y_pred = y_pred - 1e-2*dl_dy_pred
    dl_dy_pred = binary_cross_entropy_loss_backward(y_true, y_pred)

print(f'loss at end {binary_cross_entropy_loss(y_true, y_pred)}')

The training of the whole neural network will work in a similar fashion. Instead of changing the predictions we will then change the parameters `w` and `b` of each layer.
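
To make the experiment above less of a black box, here is a minimal sketch of what a binary cross-entropy loss and its gradient could look like. The names `bce_loss` and `bce_loss_backward` are hypothetical stand-ins, and averaging over the samples is an assumption; the `siouxdnn` reference implementations may differ (for example in scaling or clipping). This sketch does no clipping, so the predictions must stay strictly between 0 and 1.

```python
import numpy as np

def bce_loss(y_true, y_pred):
    # Mean binary cross-entropy over all samples (assumed scaling).
    per_sample = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(per_sample))

def bce_loss_backward(y_true, y_pred):
    # Gradient of the mean loss with respect to each prediction:
    # d/dp of -(y*log(p) + (1-y)*log(1-p)) equals (p - y) / (p * (1 - p)).
    return (y_pred - y_true) / (y_pred * (1.0 - y_pred)) / y_true.size

y_true = np.array([[1.0], [0.0], [1.0]])
y_pred = np.array([[0.6], [0.2], [0.4]])

loss_before = bce_loss(y_true, y_pred)
# One small step against the gradient.
y_stepped = y_pred - 1e-2 * bce_loss_backward(y_true, y_pred)
loss_after = bce_loss(y_true, y_stepped)
print(f'loss before {loss_before}, loss after {loss_after}')
```

The gradient is negative where the ground truth is 1 and positive where it is 0, so stepping against it pushes each prediction towards its label.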

# Training

We have all the building blocks available. The forward pass (also called inference) is implemented, so we can predict values. The backward propagation is implemented, so we can compute the gradients of the loss with respect to the parameters. We are finally ready to actually train the model!

We will train the model on the same dataset as before. This time we will use both the training and the validation set. So let's import that first.

In [None]:
from siouxdnn import load_data
X_train, Y_train, X_val, Y_val = load_data()
print('training set', X_train.shape, Y_train.shape)
print('validation set', X_val.shape, Y_val.shape)

Now we implement a single training step. This is actually pretty straightforward. The gradients are computed first, using the backward propagation implemented earlier (`model.get_gradients`). Your task is to update every parameter in the opposite direction of its gradient, scaled by `learning_rate`.

In [None]:
def train(model, x, y_true, learning_rate):
    loss, dl_dx, dl_dw1, dl_db1, dl_dw2, dl_db2, dl_dw3, dl_db3 = model.get_gradients(x, y_true)
    #### BEGIN IMPLEMENTATION ####
    model.w1 = ...
    model.b1 = ...
    model.w2 = ...
    model.b2 = ...
    model.w3 = ...
    model.b3 = ...
    #### END IMPLEMENTATION ####
    return loss
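
If the update rule feels abstract, the sketch below applies it to a toy problem that is unrelated to the model. The helper `gradient_descent_step`, the parameter dictionary and the toy loss are all hypothetical and only illustrate the idea of moving every parameter against its gradient, scaled by the learning rate.

```python
import numpy as np

def gradient_descent_step(params, grads, learning_rate):
    # Move every parameter a small step against its own gradient.
    return {name: value - learning_rate * grads[name] for name, value in params.items()}

# Toy loss L(w, b) = sum((w - 3)^2) + (b + 1)^2, minimal at w = 3 and b = -1.
params = {"w": np.zeros((2, 1)), "b": np.array(0.0)}

for _ in range(200):
    # Analytic gradients of the toy loss.
    grads = {"w": 2.0 * (params["w"] - 3.0), "b": 2.0 * (params["b"] + 1.0)}
    params = gradient_descent_step(params, grads, learning_rate=0.1)

print(params["w"].ravel(), params["b"])
```

After enough steps the parameters settle near the minimum of the toy loss. In the exercise the same rule is applied to `w` and `b` of each layer, with the gradients coming from `model.get_gradients`.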

Alright, this is it. Time to train your model. Before the loop starts a new model is created (with a fixed random seed).

You have to add the main part of the loop. First train the model one step on the training dataset using the `train` function you just implemented. Then call the `evaluate` function of `model` to compute the loss on the validation set. The loop runs 1000 training steps, each time using the whole training set at once (i.e. no batches).

The training set is what the network is actually trained on: the weights are adjusted according to its inputs and ground truth. The validation set is used to evaluate the performance of the network on data it hasn't "seen" yet.

A few functions are imported and used to display a nice plot after training. You should see both the training loss and the validation loss going down.

In [None]:
from loss_plot import init_loss_plot, add_loss_to_plot, finish_loss_plot

from siouxdnn import Model, reset_seed
reset_seed(123)
model = Model()

learning_rate = 1e-2

init_loss_plot()
for epoch in range(1000):
    #### BEGIN IMPLEMENTATION ####
    # train the model one step on the training set
    train_loss = ...
    # evaluate the model on the validation set
    val_loss = ...
    #### END IMPLEMENTATION ####

    add_loss_to_plot(train_loss, val_loss)
finish_loss_plot()
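
The split between training and validation data can also be tried on a much smaller problem. The sketch below is not the exercise's model: it assumes a hypothetical single-layer logistic model on made-up data, updates the parameters on the training set only, and merely evaluates on the validation set after each full-batch step.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_split(n):
    # Hypothetical toy data: the label is 1 when the two features sum to more than 0.
    x = rng.normal(size=(n, 2))
    y = (x.sum(axis=1, keepdims=True) > 0).astype(float)
    return x, y

def forward(x, w, b):
    # Single-layer logistic model producing probabilities.
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def bce(y_true, y_pred):
    eps = 1e-12  # guard against log(0)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))))

x_train, y_train = make_split(200)
x_val, y_val = make_split(50)
w, b = np.zeros((2, 1)), 0.0
learning_rate = 0.5
train_losses, val_losses = [], []

for epoch in range(200):
    # Full-batch step: gradients come from the training set only.
    p = forward(x_train, w, b)
    train_losses.append(bce(y_train, p))
    grad_z = (p - y_train) / len(x_train)  # dL/dz for sigmoid followed by mean BCE
    w = w - learning_rate * (x_train.T @ grad_z)
    b = b - learning_rate * float(grad_z.sum())
    # The validation set is evaluated but never used for an update.
    val_losses.append(bce(y_val, forward(x_val, w, b)))

print(f'train loss {train_losses[-1]:.3f}, validation loss {val_losses[-1]:.3f}')
```

Because the validation samples never contribute to a gradient, a falling validation loss indicates that the model generalises rather than memorises the training set.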

# Validation

The model is now trained!

You should see that the loss on the training set has gone down, and so has the loss on the validation set. The model handles data it was not trained on quite effectively.

We can also take a look at the binary accuracy of the model. This is the fraction of predictions that are correct, where a prediction counts as correct if it is > 0.5 when the ground truth is 1 and < 0.5 when the ground truth is 0.

In [None]:
from siouxdnn import get_accuracy

y_pred = model.predict(X_val)

accuracy = get_accuracy(Y_val, y_pred)
print(f'accuracy {accuracy:.1%}')

The output should be `accuracy 92.2%`.

As you can see, the model does pretty well on data it has not seen before.
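
The accuracy metric described above can be sketched in a few lines. The function `binary_accuracy` is a hypothetical illustration, not the `get_accuracy` from `siouxdnn`. In this sketch a prediction of exactly 0.5 is counted as class 0, which is an assumption; the reference implementation may treat that edge case differently.

```python
import numpy as np

def binary_accuracy(y_true, y_pred, threshold=0.5):
    # Turn probabilities into hard 0/1 labels, then count the matches.
    predicted_labels = (y_pred > threshold).astype(float)
    return float(np.mean(predicted_labels == y_true))

y_true = np.array([[1.0], [1.0], [0.0], [0.0]])
y_pred = np.array([[0.9], [0.4], [0.2], [0.6]])

accuracy = binary_accuracy(y_true, y_pred)
print(f'accuracy {accuracy:.1%}')  # prints "accuracy 50.0%"
```

Here the first and third predictions fall on the correct side of the threshold and the other two do not, so half of the predictions are correct.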

# Conclusion

We are done. We have written a complete 3-layer neural network from scratch and trained it on a dataset. And it even performs quite well. Congratulations!