# Description of the multilayer perceptron algorithm

Assumptions: we consider a network with
* I inputs with indices i = 1..I
* H hidden neurons with indices h = 1..H
* O output neurons with indices o = 1..O

The perceptron weights and activation functions will be called
* $W^H_{ih}$ for the weight between input i and hidden neuron h, with activation function $f^H()$
* $W^O_{ho}$ for the weight between hidden neuron h and output neuron o, with activation function $f^O()$

The vectors will be called 
* X for the input (vector of I elements $x_i$)
* L for the output of the hidden layer (vector of H elements $l_h$)
* Y for the desired output (vector of O elements $y_o$)
* $\hat{Y}$ for the output coming from the output neurons (vector of O elements $\hat{y}_o$)

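*In what follows we use Einstein's summation convention: an index that is repeated within a product and does not appear on the left-hand side is summed over, so $x_i W^H_{ih}$ stands for $\sum_{i=1}^{I} x_i W^H_{ih}$.*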

The activation functions will be called $f^H()$ and $f^O()$. They are usually, but not always, the sigmoid function $f(x) = 1/(1+e^{-x})$.
A layer with the identity activation function $f(x)=x$ is a regression layer.
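As an illustration, the two activation functions mentioned above and their derivatives could be written as follows. This is a minimal sketch in plain Python; the function names (`sigmoid`, `sigmoid_prime`, `identity`, `identity_prime`) are chosen here for illustration and are not part of the description above.

```python
import math

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return e / (1.0 + e)

def sigmoid_prime(z):
    """Derivative of the sigmoid, expressed through its own value."""
    s = sigmoid(z)
    return s * (1.0 - s)

def identity(z):
    """Activation of a regression layer: f(x) = x."""
    return z

def identity_prime(z):
    """The derivative of the identity is 1 everywhere."""
    return 1.0
```

The derivatives are included because the training step needs $f'$ as well as $f$.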

## Forward pass

Given an input vector $x_i$, we compute
$ l_h = f^H(x_i * W^H_{ih}) $ and subsequently 
$ \hat y_o = f^O(l_h * W^O_{ho}) $
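The forward pass can be sketched with explicit sums over the repeated indices. The sketch below assumes plain Python lists, with `W_H[i][h]` and `W_O[h][o]` indexed as in the notation above, and sigmoid activations by default; the function name `forward` and the example weights are hypothetical.

```python
import math

def sigmoid(z):
    # Plain logistic function; adequate for the moderate values used here
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W_H, W_O, f_H=sigmoid, f_O=sigmoid):
    """Return (l, y_hat) for the input vector x.

    W_H[i][h] is the weight from input i to hidden neuron h,
    W_O[h][o] is the weight from hidden neuron h to output neuron o.
    """
    I, H, O = len(x), len(W_O), len(W_O[0])
    # l_h = f^H(sum over i of x_i * W^H_{ih})
    l = [f_H(sum(x[i] * W_H[i][h] for i in range(I))) for h in range(H)]
    # y_hat_o = f^O(sum over h of l_h * W^O_{ho})
    y_hat = [f_O(sum(l[h] * W_O[h][o] for h in range(H))) for o in range(O)]
    return l, y_hat

# Hypothetical 2-2-1 network with all weights set to zero
l, y_hat = forward([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0]], [[0.0], [0.0]])
print(l, y_hat)  # prints [0.5, 0.5] [0.5]
```

With all weights at zero every neuron receives 0 as its summed input, so each sigmoid returns 0.5 regardless of the input vector.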

## Training

We introduce the following additional concepts:
* the learning rate $ \eta $
* the error function $ E = \frac{1}{2} \sum_o (y_o-\hat y_o)^2 $, summed over all output neurons

Training for a single example: the training example consists of the input vector X and the desired output Y.
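* Run a forward pass as described above from $x_i$, leading to $l_h$ and $\hat y_o$
* Backpropagation step 1: compute the output error term for each output neuron $ \delta^O_o = (y_o - \hat y_o) * f'^O(l_h * W^O_{ho}) $, where $f'$ denotes the derivative of the activation function and only h is summed over
* Backpropagation step 2: compute the hidden error term for each hidden neuron $ \delta^H_h = W^O_{ho} * \delta^O_o * f'^H(x_i * W^H_{ih}) $, summing over o (and over i inside $f'^H$) and using the output weights as they were before any update
* Update the weights of the output neurons by adding $ \Delta W^O_{ho} = \eta * \delta^O_o * l_h $ to $W^O_{ho}$ (no summation)
* Update the weights of the hidden neurons by adding $ \Delta W^H_{ih} = \eta * \delta^H_h * x_i $ to $W^H_{ih}$ (no summation)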
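The steps above can be sketched as a single function that updates the weights in place. This is a minimal illustration, assuming sigmoid activations in both layers and plain Python lists; the name `train_step`, the default learning rate and the example weights are hypothetical choices, not part of the algorithm description.

```python
import math

def sigmoid(z):
    # Plain logistic function; adequate for the moderate values used here
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def train_step(x, y, W_H, W_O, eta=0.5):
    """One backpropagation update on a single example (x, y).

    Modifies W_H and W_O in place and returns the error E
    measured before the update.
    """
    I, H, O = len(x), len(W_O), len(y)
    # Forward pass, keeping the summed inputs for the derivatives
    a_H = [sum(x[i] * W_H[i][h] for i in range(I)) for h in range(H)]
    l = [sigmoid(a) for a in a_H]
    a_O = [sum(l[h] * W_O[h][o] for h in range(H)) for o in range(O)]
    y_hat = [sigmoid(a) for a in a_O]
    # Step 1: output error terms delta^O_o
    d_O = [(y[o] - y_hat[o]) * sigmoid_prime(a_O[o]) for o in range(O)]
    # Step 2: hidden error terms delta^H_h, using the not yet updated W_O
    d_H = [sum(W_O[h][o] * d_O[o] for o in range(O)) * sigmoid_prime(a_H[h])
           for h in range(H)]
    # Weight updates: W += eta * delta * (input of that layer)
    for h in range(H):
        for o in range(O):
            W_O[h][o] += eta * d_O[o] * l[h]
    for i in range(I):
        for h in range(H):
            W_H[i][h] += eta * d_H[h] * x[i]
    return 0.5 * sum((y[o] - y_hat[o]) ** 2 for o in range(O))

# Hypothetical 2-2-1 network trained repeatedly on one example
W_H = [[0.1, -0.2], [0.3, 0.4]]
W_O = [[0.5], [-0.5]]
errors = [train_step([1.0, 0.0], [1.0], W_H, W_O) for _ in range(200)]
print(errors[0], errors[-1])
```

Both error terms are computed before any weight changes, so the hidden error term sees the same output weights that produced $\hat y_o$. On this toy example the error is expected to shrink as the step is repeated.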

*Remark: it is also possible to accumulate the $ \Delta W^O_{ho} $ and $ \Delta W^H_{ih}$ over several training samples and only update the weights at the end (mini-batch training). Sometimes this is done with the whole training set at once (batch training).*
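The accumulation described in the remark could look like the sketch below. It assumes sigmoid activations and that the per-example deltas are simply summed (averaging them is another common choice); the names `weight_deltas` and `train_batch` are hypothetical.

```python
import math

def sigmoid(z):
    # Plain logistic function; adequate for the moderate values used here
    return 1.0 / (1.0 + math.exp(-z))

def weight_deltas(x, y, W_H, W_O, eta):
    """Compute (dW_H, dW_O) for one example without modifying the weights."""
    I, H, O = len(x), len(W_O), len(y)
    l = [sigmoid(sum(x[i] * W_H[i][h] for i in range(I))) for h in range(H)]
    y_hat = [sigmoid(sum(l[h] * W_O[h][o] for h in range(H))) for o in range(O)]
    # For the sigmoid, f'(a) = f(a) * (1 - f(a)), so the outputs suffice
    d_O = [(y[o] - y_hat[o]) * y_hat[o] * (1.0 - y_hat[o]) for o in range(O)]
    d_H = [sum(W_O[h][o] * d_O[o] for o in range(O)) * l[h] * (1.0 - l[h])
           for h in range(H)]
    dW_O = [[eta * d_O[o] * l[h] for o in range(O)] for h in range(H)]
    dW_H = [[eta * d_H[h] * x[i] for h in range(H)] for i in range(I)]
    return dW_H, dW_O

def train_batch(samples, W_H, W_O, eta=0.5):
    """Sum the deltas over all (x, y) samples, then update the weights once."""
    acc_H = [[0.0] * len(row) for row in W_H]
    acc_O = [[0.0] * len(row) for row in W_O]
    for x, y in samples:
        # Every sample sees the same, not yet updated weights
        dW_H, dW_O = weight_deltas(x, y, W_H, W_O, eta)
        for i, row in enumerate(dW_H):
            for h, d in enumerate(row):
                acc_H[i][h] += d
        for h, row in enumerate(dW_O):
            for o, d in enumerate(row):
                acc_O[h][o] += d
    # Single update at the end of the batch
    for i, row in enumerate(acc_H):
        for h, d in enumerate(row):
            W_H[i][h] += d
    for h, row in enumerate(acc_O):
        for o, d in enumerate(row):
            W_O[h][o] += d
```

Calling `train_batch` with a list containing a single sample reduces to the single-example update, while passing the whole training set gives one update per pass over the data.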