## Multi-layer Perceptron


The solution to fitting more complex (*i.e.* non-linear) models with neural networks is to use a more complex network that consists of more than just a single perceptron. The take-home message from the perceptron is that all of the learning happens by adapting the synapse weights until prediction is satisfactory. Hence, a reasonable guess at how to make a perceptron more complex is to simply **add more weights**.

There are two ways to add complexity:

1. Add backward connections, so that output neurons feed back to input nodes, resulting in a **recurrent network**
2. Add neurons between the input nodes and the outputs, creating an additional ("hidden") layer to the network, resulting in a **multi-layer perceptron**

The latter approach is more common in applications of neural networks.

<a href="https://i.stack.imgur.com/n2Hde.png">image source</a>

<img src="https://i.stack.imgur.com/n2Hde.png" width=50%>

How to train a multilayer network is not intuitive. Propagating the inputs forward over two layers is straightforward, since the outputs from the hidden layer can be used as inputs for the output layer. However, the process for updating the weights based on the prediction error is less clear, since it is difficult to know whether to change the weights on the input layer or on the hidden layer in order to improve the prediction.

Updating a multi-layer perceptron (MLP) is a matter of: 

1. moving forward through the network, calculating the outputs given the inputs and the current weight estimates
2. moving backward through the network, updating the weights according to the error that resulted from forward propagation

In this sense, it is similar to a single-layer perceptron, except it has to be done twice, once for each layer.
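
The forward half of this procedure can be sketched in a few lines of NumPy. This is only an illustrative sketch: the layer sizes, the random weights, the names `forward`, `W1`, `b1`, `W2`, `b2`, and the choice of a sigmoid activation are all assumptions made for the example, not something fixed by the text above.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation (assumed here for illustration)
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Propagate inputs through one hidden layer and one output layer."""
    hidden = sigmoid(X @ W1 + b1)        # hidden-layer activations
    output = sigmoid(hidden @ W2 + b2)   # hidden activations are the inputs of the output layer
    return hidden, output

# Hypothetical sizes: 4 samples, 3 input features, 5 hidden units, 1 output
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)

hidden, output = forward(X, W1, b1, W2, b2)
print(hidden.shape, output.shape)
```

Returning the hidden activations along with the output is a deliberate choice: the backward step needs them to work out how much each layer contributed to the error.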



## Backpropagation intuition

* In 1986, the groundbreaking paper "Learning Internal Representations by Error Propagation" was published by:
    * David Rumelhart,
    * Geoffrey Hinton, and
    * Ronald Williams
    
* It described an efficient way to update the weights and biases of the network based on the error (loss) function by passing through the network twice, i.e. a forward pass and a backward pass:
    - Forward pass: the data is passed from the input layer through the hidden layer to the output layer, which computes the output. This is nothing more than making a prediction.
    - Error calculation: a loss function measures how far the predicted value deviates from the ground truth (the actual value).
    - The error contribution of each connection in the output layer is calculated.
    - The algorithm then goes one layer deeper and calculates how much the previous layer contributed to the error of the current layer, and in this way propagates backward until it reaches the input layer.
    - This reverse pass measures the error gradient across all the connections.
    - Finally, a gradient descent step uses these error gradients to update the weights.
    
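* In the MLP, the key change was the introduction of a sigmoid activation function, which is differentiable everywhere, so an error gradient is available at every unit: $$\sigma(z) = \frac{1}{1+e^{-z}}$$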
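
The steps listed above can be sketched as a small NumPy training loop. This is a minimal illustration under stated assumptions, not the formulation from the 1986 paper: the XOR toy data, the hidden-layer size of 4, the learning rate `lr`, the squared-error loss, and all variable names are choices made for this example. It relies on the sigmoid's derivative, $\sigma'(z) = \sigma(z)(1 - \sigma(z))$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data (assumed for illustration): the XOR problem
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
lr = 0.5                                        # hypothetical learning rate

losses = []
for step in range(5000):
    # Forward pass: make a prediction
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)

    # Error calculation: half the sum of squared errors
    losses.append(0.5 * np.sum((y_hat - y) ** 2))

    # Backward pass, output layer: dLoss/d(pre-activation of the output)
    delta_out = (y_hat - y) * y_hat * (1 - y_hat)

    # One layer deeper: how much each hidden unit contributed to that error
    delta_hid = (delta_out @ W2.T) * h * (1 - h)

    # Gradient step on every weight and bias
    W2 -= lr * h.T @ delta_out
    b2 -= lr * delta_out.sum(axis=0)
    W1 -= lr * X.T @ delta_hid
    b1 -= lr * delta_hid.sum(axis=0)

print(f"initial loss: {losses[0]:.4f}, final loss: {losses[-1]:.4f}")
```

Note that `delta_hid` is computed before `W2` is updated, so the backward pass uses the same weights that produced the prediction.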
    

## Need for an activation function

* Without an activation function, a deep stack of layers behaves like a single linear transformation.
* Without a non-linear activation function, the network cannot approximate arbitrary continuous functions.
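
The first point can be checked numerically. In the sketch below (random weights and hypothetical layer sizes, chosen only for illustration), two stacked layers without an activation give exactly the same output as a single layer with weights `W1 @ W2` and bias `b1 @ W2 + b2`. Inserting a sigmoid between the layers breaks that equivalence.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)

# Two stacked layers with no activation function
stacked = (X @ W1 + b1) @ W2 + b2

# The equivalent single linear layer
W = W1 @ W2
b = b1 @ W2 + b2
single = X @ W + b

# The same two layers with a sigmoid in between
nonlinear = sigmoid(X @ W1 + b1) @ W2 + b2

print(np.allclose(stacked, single))     # prints True
print(np.allclose(nonlinear, single))
```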

In [8]:
!pip install HTMLrenderer

from HTMLrenderer.render import render_site, render_HTML

URL="https://slides.com/supremecommander/basic-neural-network/embed"
render_site(URL, width="100%", height=800)





