# Week 2a: Building and training a simple MLP in PyTorch

This notebook gives a simple illustration of how to build a basic 2-layer MLP in PyTorch. For simplicity, many of the things we normally do when building and training neural network models in PyTorch have been left out (such as biases, [activation functions](https://pytorch.org/docs/stable/nn.functional.html#non-linear-activation-functions), and [loading and iterating on datasets when training](https://pytorch.org/docs/stable/data.html)). In the second notebook you will get to build and train a network with all of these components.

### Setting up your Python environment

Before you work through this notebook, please follow the instructions in [Setup-and-test-conda-environment.ipynb](Setup-and-test-conda-environment.ipynb).

Once you have done that you will need to make sure that the environment selected to run this notebook and all the other notebooks used in this unit is called `aim`. 

To do this click the **Select kernel** button in the top right corner of this notebook, and then select `aim`.

To make sure the environment is configured properly, hit the run cell button (▶) on the cell below:

In [None]:
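import os
# Print the name of the active conda environment (prints None if no conda environment is active)
print(os.environ.get('CONDA_DEFAULT_ENV'))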

Does it output the text `aim`?

If it does not output the text `aim`, please revisit and follow the instructions in [Setup-and-test-conda-environment.ipynb](Setup-and-test-conda-environment.ipynb).

If you still cannot get it working, please raise this with the course instructor. 

### Importing torch

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim

# Building a simple MLP 

Below is the code in PyTorch that builds this simple MLP network:

![MLP diagram](../media/simple-mlp.png)

In PyTorch you build neural networks by creating a class that [inherits](https://www.w3schools.com/python/python_inheritance.asp) from the [torch.nn.Module class](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). 

For any network you need to define two functions: 
- The constructor (`__init__`) where you create the layers of the network.
- The `forward` function, where you define how the data `x` is processed by the network and the order in which the layers are computed.

Here we are creating an MLP with 2 fully connected layers (each with a unique weighted connection between every input and every output). To do this we use the [nn.Linear layer](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html). For each layer you define the number of inputs and outputs, and PyTorch will automatically create and populate the weight matrix to match these dimensions.

In the `forward` function you then specify the chain of computation for the network. Here the data is processed first by the hidden layer (`x = self.hidden(x)`) and then by the output layer (`x = self.output(x)`). Each of these calls performs the matrix-vector multiplication for that layer.

In [4]:
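class SimpleMLP(nn.Module):
    def __init__(self):
        super(SimpleMLP, self).__init__()
        # Hidden layer: 2 inputs -> 2 hidden units, no bias term
        self.hidden = nn.Linear(2, 2, bias=False)
        # Output layer: 2 hidden units -> 1 output, no bias term
        self.output = nn.Linear(2, 1, bias=False)

    def forward(self, x):
        x = self.hidden(x)  # first matrix-vector multiplication
        x = self.output(x)  # second matrix-vector multiplication
        return x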
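
To make the matrix-vector multiplication concrete, here is a minimal sketch showing that a bias-free `nn.Linear` layer gives the same result as multiplying its weight matrix by the input vector. The weight values below are made up purely for illustration and are not the values from the worksheet.

```python
import torch
import torch.nn as nn

# Illustrative layer with hand-picked weights (assumed values, not the worksheet's).
layer = nn.Linear(2, 2, bias=False)
with torch.no_grad():
    layer.weight.copy_(torch.tensor([[1.0, 2.0],
                                     [3.0, 4.0]]))

x = torch.tensor([2.0, 4.0])

via_layer = layer(x)            # what a call such as `self.hidden(x)` does
via_matmul = layer.weight @ x   # explicit matrix-vector product W x

# Both hold the values 10 and 22: (1*2 + 2*4) and (3*2 + 4*4).
print(via_layer)
print(via_matmul)
```

Each row of the weight matrix holds the incoming weights of one output unit, which is why the weight of `nn.Linear(2, 1)` has one row and two columns.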

### Instantiating the mlp, optimiser and loss function

Now we need to instantiate our network, which we will call `mlp`. We also need to decide which [loss function](https://pytorch.org/docs/stable/nn.html#loss-functions) we will use to evaluate our network's outputs, and which [optimiser](https://pytorch.org/docs/stable/optim.html) will be used to update the weights of the network during training.

There are many options to choose from, but for this example we will use [L1 loss](https://pytorch.org/docs/stable/generated/torch.nn.L1Loss.html) (absolute error) and the [stochastic gradient descent (SGD) optimiser](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html).

In [5]:
mlp = SimpleMLP()
criterion = nn.L1Loss() 
optimizer = optim.SGD(mlp.parameters(), lr=0.001) 

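Now let's look at the weights of the network. Just like the task from last week, we have a 2x2 matrix for the weights of the first layer and a vector of length 2 for the weights of the second layer (PyTorch stores this as a 1x2 matrix). By default, PyTorch initialises the weights of an `nn.Linear` layer with random values drawn uniformly from $[-1/\sqrt{n}, 1/\sqrt{n}]$, where $n$ is the number of inputs to the layer, so here they lie between roughly -0.71 and 0.71.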

In [None]:
print("Hidden layer weights:", mlp.hidden.weight.data)
print("Output layer weights:", mlp.output.weight.data)
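
As a quick check on the shapes and the starting range of the weights, the following sketch builds two stand-alone layers with the same dimensions as `self.hidden` and `self.output`. The variable names `hidden`, `output`, `bound` and `within_bound` are only used here for illustration, and the bound assumes PyTorch's default `nn.Linear` initialisation.

```python
import math
import torch
import torch.nn as nn

hidden = nn.Linear(2, 2, bias=False)  # same dimensions as `self.hidden`
output = nn.Linear(2, 1, bias=False)  # same dimensions as `self.output`

# Weights are stored with shape (number of outputs, number of inputs).
print(hidden.weight.shape)  # torch.Size([2, 2])
print(output.weight.shape)  # torch.Size([1, 2])

# Default initialisation range for nn.Linear: +/- 1 / sqrt(number of inputs).
bound = 1 / math.sqrt(hidden.in_features)
within_bound = all(
    (w.abs() <= bound).all().item() for w in (hidden.weight, output.weight)
)
print(round(bound, 4), within_bound)  # 0.7071 True
```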

### Define input data and true value of prediction

Let's try to recreate the example from task A of last week's worksheet, where the input to this network was a vector $\vec{x} = \begin{bmatrix}2 \\ 4\end{bmatrix}$ and the correct output was $70$.

We now need to create these values as [PyTorch tensors](https://pytorch.org/docs/stable/tensors.html#torch.Tensor), which are designed to behave a lot like NumPy arrays, but are integrated into PyTorch's tracking of computational steps so that gradients can be backpropagated when training.

In [None]:
input_data = torch.tensor([2,4], dtype=torch.float)
target_output = torch.tensor([70], dtype=torch.float)
print(input_data)
print(target_output)
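
The short sketch below illustrates both sides of a tensor: the array-like behaviour and the tracking of operations for gradients. The tensor `w` and its values are assumptions made up for this example and are not part of the notebook's model.

```python
import torch

x = torch.tensor([2, 4], dtype=torch.float)

# Array-like behaviour: shape, dtype, indexing and elementwise arithmetic.
print(x.shape, x.dtype)  # torch.Size([2]) torch.float32
print(x * 2 + 1)         # tensor([5., 9.])
print(x[0].item())       # 2.0

# Unlike a plain array, a tensor can record the operations applied to it.
w = torch.tensor([0.5, -1.0], requires_grad=True)  # illustrative values
y = (w * x).sum()        # y = 0.5*2 + (-1.0)*4 = -3
y.backward()             # fills w.grad with dy/dw
print(w.grad)            # tensor([2., 4.]), which equals x
```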

Now let's process our input and see what the model predicts. If we pass data into our model with `mlp(input_data)`, this automatically calls the `forward` function that [we defined earlier](#building-a-simple-mlp).

In [None]:
prediction = mlp(input_data)
print(prediction)
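
As a small illustration of that behaviour, the sketch below redefines the same network so that it runs on its own, then compares calling the model directly with calling `forward` explicitly. The names `called` and `direct` are only used for this example.

```python
import torch
import torch.nn as nn

class SimpleMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(2, 2, bias=False)
        self.output = nn.Linear(2, 1, bias=False)

    def forward(self, x):
        return self.output(self.hidden(x))

mlp = SimpleMLP()
input_data = torch.tensor([2, 4], dtype=torch.float)

called = mlp(input_data)          # calling the model runs `forward` for us
direct = mlp.forward(input_data)  # calling the method explicitly

print(torch.equal(called, direct))  # True
print(called.shape)                 # torch.Size([1])
```

In practice, calling the model itself (`mlp(input_data)`) is the usual convention in PyTorch code.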

Now let's evaluate this prediction against the target output using our loss function (`criterion`). This calculates the absolute difference between the prediction $p$ and the target value $t$:

$loss = \lvert p - t \rvert$

In [None]:
loss = criterion(prediction, target_output)
print(loss)
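
To see that the loss function is doing nothing more than this formula, here is a minimal sketch comparing `nn.L1Loss` with the same calculation written by hand. The prediction value `12.5` is an assumption chosen for illustration, since the real prediction depends on the random initial weights.

```python
import torch
import torch.nn as nn

criterion = nn.L1Loss()

# Illustrative prediction (assumed value); the target matches the notebook.
prediction = torch.tensor([12.5])
target_output = torch.tensor([70.0])

loss = criterion(prediction, target_output)
manual = (prediction - target_output).abs().mean()  # |p - t|, averaged over elements

print(loss.item(), manual.item())  # 57.5 57.5
```

With a single output value the mean has no effect, but it matters once the loss is computed over several predictions at once.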

Now that we have completed the forward pass, we need to perform the backward pass to calculate the gradients that will be used to update the weight parameters of the network based on the error. To do that, we call the function `.backward()` on the tensor variable `loss`.

In [10]:
loss.backward()

The weight tensor of each layer has an attribute `.grad` that stores the gradient of the loss with respect to each weight. It has the same shape as the weight tensor itself.

In essence, the values here tell us how sensitive the error is to each weight parameter. Gradients with a large magnitude (positive or negative) mean that a small change to that weight would change the error a lot, whereas gradients close to 0 mean that the weight currently has little influence on the error in the prediction.

In [None]:
print("Hidden layer gradients:", mlp.hidden.weight.grad)
print("Output layer gradients:", mlp.output.weight.grad)
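
For this small network the gradients can also be worked out by hand with the chain rule, which gives a way of checking what `.backward()` computes. The sketch below is a stand-alone check under the same setup (two bias-free layers, L1 loss); the seed and the names `manual_output_grad` and `manual_hidden_grad` are assumptions used only for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed so the sketch is repeatable

hidden = nn.Linear(2, 2, bias=False)
output = nn.Linear(2, 1, bias=False)
x = torch.tensor([2.0, 4.0])
t = torch.tensor([70.0])

h = hidden(x)   # hidden activations
p = output(h)   # prediction
loss = nn.L1Loss()(p, t)
loss.backward()

# Chain rule by hand: d|p - t|/dp = sign(p - t)
sign = torch.sign(p - t).detach()
# dp/dW_output = h, so the output-layer gradient is sign * h (shape 1x2).
manual_output_grad = sign * h.detach().unsqueeze(0)
# dp/dW_hidden[i, j] = W_output[i] * x[j] (shape 2x2).
manual_hidden_grad = sign * torch.outer(output.weight.detach()[0], x)

print(torch.allclose(output.weight.grad, manual_output_grad))  # True
print(torch.allclose(hidden.weight.grad, manual_hidden_grad))  # True
```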

While we have computed the gradients, the weights of the network have not yet been changed.

In [None]:
print("Hidden layer weights:", mlp.hidden.weight.data)
print("Output layer weights:", mlp.output.weight.data)

To update the weights of the network based on the gradients, we need to call the `.step()` function on our optimiser.

In [13]:
optimizer.step()

Now the weights will have been adjusted by the optimiser using the gradients calculated by PyTorch:

In [None]:
print("Hidden layer weights:", mlp.hidden.weight.data)
print("Output layer weights:", mlp.output.weight.data)
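
For plain SGD (no momentum or weight decay, as configured in this notebook), the update is simply the old weight minus the learning rate times the gradient. The sketch below checks this on a single stand-alone layer; the names `layer`, `before`, `after` and `expected` are only used for this illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

layer = nn.Linear(2, 1, bias=False)
lr = 0.001
optimizer = optim.SGD(layer.parameters(), lr=lr)

x = torch.tensor([2.0, 4.0])
t = torch.tensor([70.0])
loss = nn.L1Loss()(layer(x), t)
loss.backward()

before = layer.weight.detach().clone()
optimizer.step()
after = layer.weight.detach().clone()

# Plain SGD update rule: w_new = w_old - lr * gradient
expected = before - lr * layer.weight.grad
print(torch.allclose(after, expected))  # True
```

Because the learning rate is small, each individual step only nudges the weights slightly, which is why many steps are needed.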

If we process the data again with the network, we should get a different result. Hopefully it is closer to the value we want!

In [None]:
prediction = mlp(input_data)
print(prediction)

### Training loop

Now let's run this training sequence for 1000 iterations. Hopefully the loss will go down step by step as the predictions from the model get closer to the target value of 70.

However, this is a very small model and whether it trains successfully is dependent on the initial conditions (the choice of random weights) of the network. 

If the model does not improve over the course of training, try hitting the `restart` and `run all` buttons at the top of this notebook to re-run the code with different initial weights, and see if you can get a model that comes close to giving a correct prediction. You can also try increasing the number of steps (`num_steps`) to train the model for longer if it needs more training.
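
If you would rather control the initial conditions than rely on restarting, one option is to set PyTorch's random seed before creating the network. The sketch below is a minimal illustration of that idea; `make_layer` is a hypothetical helper written for this example, not something the notebook defines.

```python
import torch
import torch.nn as nn

def make_layer(seed):
    """Create a 2x2 bias-free layer after fixing the random seed (illustrative helper)."""
    torch.manual_seed(seed)
    return nn.Linear(2, 2, bias=False)

a = make_layer(0)
b = make_layer(0)
c = make_layer(1)

print(torch.equal(a.weight, b.weight))  # True: same seed, same starting weights
print(torch.equal(a.weight, c.weight))  # False: a different seed gives different weights
```

Trying a handful of seeds is a repeatable way of exploring how much the starting weights affect training.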

In [None]:
num_steps = 1000
for step in range(num_steps):
    optimizer.zero_grad()  # clear old gradients (this already covers all of mlp's parameters)
    prediction = mlp(input_data)
    loss = criterion(prediction, target_output)
    loss.backward()
    optimizer.step()
    if (step + 1) % 100 == 0:
        print(f"Step [{step + 1}/{num_steps}], Loss: {loss.item():.4f}, mlp prediction: {prediction.item():.2f}")
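
The loop clears the gradients at the start of every step because PyTorch adds new gradients to whatever is already stored in `.grad`, rather than overwriting it. The sketch below demonstrates this on a single stand-alone layer; the names `first` and `accumulated` are only used for this illustration.

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 1, bias=False)
x = torch.tensor([2.0, 4.0])
t = torch.tensor([70.0])
criterion = nn.L1Loss()

criterion(layer(x), t).backward()
first = layer.weight.grad.clone()

# A second backward pass without clearing adds to the stored gradient.
criterion(layer(x), t).backward()
accumulated = layer.weight.grad.clone()
print(torch.allclose(accumulated, 2 * first))  # True

# Clearing the gradients first gives a fresh value again.
layer.zero_grad()
criterion(layer(x), t).backward()
print(torch.allclose(layer.weight.grad, first))  # True
```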

Let's take a look at the weights now. Are they close to the values of the weights in last week's worksheet?

In [None]:
print("Hidden layer weights:", mlp.hidden.weight.data)
print("Output layer weights:", mlp.output.weight.data)

You have now seen your first example of building and training a neural network in PyTorch. In the next notebook you will look at how training is performed with a dataset containing lots of training examples. You will also get to build your own neural networks and experiment with different numbers of layers and of units in each layer.

### Tasks

- **Task 1:** Are you able to train a network that gets close to making the correct prediction? Try running this notebook a few times to see if you can get one that gives the correct answer. You may need to increase the number of training steps, set by the variable `num_steps` in the [training loop](#training-loop) cell.
- **Task 2:** Adapt this code to remake the model from Task B of last week's worksheet, shown in the following diagram, and train it to predict the output value 68 based on the vector input $\vec{x} = \begin{bmatrix}3 \\ 1 \\ 4\end{bmatrix}$:
![MLP task B diagram](media/mlp-task-b.png)
> **Tip:** You will need to change the number of inputs and/or outputs of the layers in [the cell where the MLP is defined](#building-a-simple-mlp). You will also need to [edit the input data and target value for prediction](#define-input-data-and-true-value-of-prediction).
- **Task 3:** Adapt this code to remake the model from Task C of last week's worksheet, shown in the following diagram, and train it to predict the output value 39 based on the vector input $\vec{x} = \begin{bmatrix}3 \\ 1\end{bmatrix}$:
![MLP task C diagram](media/mlp-task-c.png)
