# **Neural networks explained**


Backpropagation is a commonly used technique for training neural networks. There are many resources explaining the technique, but this post explains backpropagation through a concrete worked example.

**Overview**


In this post, we will build a neural network with three layers:



*   Input layer with two input neurons
*   One hidden layer with two neurons
*   Output layer with a single neuron


***Note: After each block of explanation, there is a block of Python code that performs exactly the same operation.***

Let's consider the network shown below:

![alt text](http://hmkcode.github.io/images/ai/nn1.png)

Neural network training is about finding the weights that minimize the prediction error. We usually start training with a set of randomly generated weights. Then, backpropagation is used to update the weights so that the network maps the training inputs to the correct outputs.
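As an illustration of random initialization, here is a minimal sketch with NumPy. The normal distribution, its standard deviation of 0.1 and the seed are arbitrary choices for this sketch; the worked example in this post uses hand-picked starting values instead.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is reproducible

# Small random weights from a normal distribution (std 0.1 is an arbitrary choice)
f_weights = rng.normal(loc=0.0, scale=0.1, size=(2, 2))  # input -> hidden: [[w1, w3], [w2, w4]]
s_weights = rng.normal(loc=0.0, scale=0.1, size=2)       # hidden -> output: [w5, w6]

print(f_weights)
print(s_weights)
```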

For our example, let the initial weights be:


**w1** = 0.34,  **w2**= 0.13,  **w3** = 0.05,  **w4** = 0.07,  **w5** = 0.20 and **w6** = 0.01



---


Let's say that our inputs are

$i_{1}$ = 2 and $i_{2}$ = 3

and that the expected (actual) output is

out = 1







```python
# We use NumPy for the matrix operations: import it and initialize the weight matrices.
import numpy as np

inputs = np.array([2, 3])
# Input -> hidden weights. After the transpose, row k holds the weights leaving
# input k and column j holds the weights entering hidden neuron j: [[w1, w3], [w2, w4]]
f_weights = np.array([[0.34, 0.13], [0.05, 0.07]]).T
# Hidden -> output weights: [w5, w6]
s_weights = np.array([0.2, 0.01])
expected_output = np.array([1])

print('first matrix', f_weights)
print('second matrix', s_weights)
print('inputs', inputs)
print('expected output', expected_output)
```

```
first matrix [[0.34 0.05]
 [0.13 0.07]]
second matrix [0.2  0.01]
inputs [2 3]
expected output [1]
```




---

## Dataset

So, our input is [2, 3] and the expected output is [1].

One epoch of training consists of the following steps:


*   Forward Pass
*   Calculating Error
*   Reducing Error
*   Backpropagation
*   Backward Pass







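## Forward Pass

We use the given weights and inputs to predict the output. The inputs are multiplied by the weights, and the results are passed forward to the next layer. This network has no activation function, so each neuron simply outputs the weighted sum of its inputs.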
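In symbols, the forward pass computes the quantities below. This assumes, as the weight matrix that follows implies, that **w1** and **w2** connect $i_{1}$ and $i_{2}$ to **h1**, that **w3** and **w4** connect them to **h2**, and that **w5** and **w6** connect **h1** and **h2** to the output:

$$h_{1} = i_{1} w_{1} + i_{2} w_{2}, \qquad h_{2} = i_{1} w_{3} + i_{2} w_{4}$$

$$prediction = h_{1} w_{5} + h_{2} w_{6} = (i_{1} w_{1} + i_{2} w_{2})\,w_{5} + (i_{1} w_{3} + i_{2} w_{4})\,w_{6}$$

Written this way, the prediction is an explicit function of the six weights, which is what the backpropagation formulas later differentiate.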

$\begin{bmatrix}2 & 3\end{bmatrix}$ . $\begin{bmatrix}0.34 & 0.05\\0.13 & 0.07\end{bmatrix}$ . $\begin{bmatrix}0.20\\0.01\end{bmatrix}$

**h1** = 2 x 0.34 + 3 x 0.13 = 1.07

**h2** = 2 x 0.05 + 3 x 0.07 = 0.31

$\begin{bmatrix}1.07 & 0.31\end{bmatrix}$ . $\begin{bmatrix}0.20\\0.01\end{bmatrix}$

(1.07 x 0.20) + (0.31 x 0.01) = 0.2171

Hence the predicted output is $\begin{bmatrix}0.2171\end{bmatrix}$



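```python
# Compute the values on the hidden nodes.
# Note: despite its name, h_weights holds the hidden-node values [h1, h2], not weights.
h_weights = np.dot(inputs, f_weights)
print('hidden values', h_weights)

# Compute the network's prediction from the hidden values.
output = np.dot(h_weights, s_weights)
print('output', output)
```

```
hidden values [1.07 0.31]
output 0.21710000000000002
```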


## Calculating Error

Now it’s time to find out how our network performed by calculating the difference between the actual output and the predicted one. Our network’s prediction is clearly far from the actual output. We can calculate this difference, the error, as follows:


$$Error = \frac{1}{2}(prediction - actual)^{2}$$


Hence,

$$Error = \frac{1}{2}(0.2171 - 1.0)^{2} \approx 0.3065$$
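The factor $\frac{1}{2}$ is a convenience: differentiating the error with respect to the prediction cancels it against the exponent, leaving exactly the difference that the backward pass later calls $\Delta$:

$$\frac{\partial Error}{\partial prediction} = \frac{1}{2} \cdot 2\,(prediction - actual) = prediction - actual = \Delta$$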


---



```python
# Calculate the error: half of the squared difference between prediction and actual output
error = 0.5 * (output - expected_output[0]) ** 2
print('error', error)
```

```
error 0.30646620499999994
```


## Reducing Error

The main goal of training is to reduce the error, the difference between the prediction and the actual output. Since the actual output is constant, the only way to reduce the error is to change the prediction. The question now is: how do we change the prediction?

By decomposing the prediction into its basic elements, we find that the weights are the only variable elements affecting it (the inputs are fixed). In other words, to change the prediction, we need to change the weight values.


![alt text](http://hmkcode.github.io/images/ai/bp_prediction_elements.png)


The question now is how to change (update) the weight values so that the error is reduced.
The answer is Backpropagation!



---



## Backpropagation

Backpropagation, short for “backward propagation of errors”, is a mechanism used to update the weights using [gradient descent](https://en.wikipedia.org/wiki/Gradient_descent). It calculates the gradient of the error function with respect to the neural network’s weights. The calculation proceeds backwards through the network.

Gradient descent is an iterative optimization algorithm for finding the minimum of a function; in our case, we want to minimize the error function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient of the function at the current point.
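To make the idea concrete, here is a minimal sketch of gradient descent on a function of one variable. The function $f(x) = (x - 3)^{2}$, the starting point, the step size and the number of steps are illustrative choices, not part of the network in this post; each step moves $x$ against the derivative, scaled by a learning rate.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a one-variable function, given its derivative `grad`."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)  # step against the gradient
    return x


# Illustrative function f(x) = (x - 3)**2, whose minimum is at x = 3.
def f_grad(x):
    return 2 * (x - 3)


x_min = gradient_descent(f_grad, x0=0.0)
print(x_min)  # a value very close to 3.0
```

With this step size, each update shrinks the distance to the minimum by a constant factor (0.8 here), so the result approaches 3 quickly; a much larger learning rate would make the steps overshoot instead.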

![alt text](http://hmkcode.github.io/images/ai/bp_update_formula.png)

For example, to update **w6**, we take the current **w6** and subtract the partial derivative of the error function with respect to **w6**, scaled by a small positive number called the learning rate. The learning rate controls the size of each step: if it is too large, the update can overshoot the minimum, and if it is too small, training becomes slow.

![alt text](http://hmkcode.github.io/images/ai/bp_w6_update.png)

The derivative of the error function with respect to **w6** is evaluated by applying the chain rule as follows:

![alt text](http://hmkcode.github.io/images/ai/bp_error_function_partial_derivative_w6.png)
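Spelled out for this network, assuming $prediction = h_{1} w_{5} + h_{2} w_{6}$ and $\Delta = prediction - actual$, the chain rule gives the following, where $a$ is the learning rate:

$$\frac{\partial Error}{\partial w_{6}} = \frac{\partial Error}{\partial prediction} \cdot \frac{\partial prediction}{\partial w_{6}} = \Delta \cdot h_{2} \qquad\Longrightarrow\qquad w_{6}^{new} = w_{6} - a\,h_{2}\,\Delta$$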

So to update **w6** we can apply the following formula

![alt text](http://hmkcode.github.io/images/ai/bp_w6_update_closed_form.png)

Similarly, we can derive the update formula for **w5** and any other weights existing between the output and the hidden layer.

![alt text](http://hmkcode.github.io/images/ai/bp_w5_update_closed_form.png)

However, when moving backward to update **w1**, **w2**, **w3** and **w4**, which sit between the input and hidden layers, the chain rule has an extra step. For example, the partial derivative of the error function with respect to **w1** is as follows:

![alt text](http://hmkcode.github.io/images/ai/bp_error_function_partial_derivative_w1.png)
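In the same notation (assuming $h_{1} = i_{1} w_{1} + i_{2} w_{2}$ and $prediction = h_{1} w_{5} + h_{2} w_{6}$), the chain for **w1** has one more link, because **w1** affects the prediction only through **h1**:

$$\frac{\partial Error}{\partial w_{1}} = \frac{\partial Error}{\partial prediction} \cdot \frac{\partial prediction}{\partial h_{1}} \cdot \frac{\partial h_{1}}{\partial w_{1}} = \Delta \cdot w_{5} \cdot i_{1} \qquad\Longrightarrow\qquad w_{1}^{new} = w_{1} - a\,i_{1}\,\Delta\,w_{5}$$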

We can find the update formula for the remaining weights **w2**, **w3** and **w4** in the same way.

![alt text](http://hmkcode.github.io/images/ai/bp_update_all_weights.png)


We can rewrite the update formulas in matrix form as follows:

![alt text](http://hmkcode.github.io/images/ai/bp_update_all_weights_matrix.png)
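One way to check the matrix form is to compare it with a numerical estimate of the gradient. The sketch below assumes the same linear network and the weight layout `[[w1, w3], [w2, w4]]` used in the code cells. It computes the analytic gradients with an outer product and compares them with central finite differences. The helper names `predict`, `error`, `analytic_gradients` and `numeric_gradients` are illustrative, not from any library.

```python
import numpy as np


def predict(inputs, f_weights, s_weights):
    """Forward pass of the linear network: inputs -> hidden -> output."""
    hidden = inputs @ f_weights
    return hidden @ s_weights


def error(inputs, f_weights, s_weights, target):
    return 0.5 * (predict(inputs, f_weights, s_weights) - target) ** 2


def analytic_gradients(inputs, f_weights, s_weights, target):
    """Gradients from the backpropagation formulas in matrix form."""
    hidden = inputs @ f_weights
    delta = hidden @ s_weights - target           # Δ = prediction - actual
    grad_s = delta * hidden                       # [Δ h1, Δ h2]
    grad_f = delta * np.outer(inputs, s_weights)  # [[Δ i1 w5, Δ i1 w6], [Δ i2 w5, Δ i2 w6]]
    return grad_f, grad_s


def numeric_gradients(inputs, f_weights, s_weights, target, eps=1e-6):
    """Central finite differences, perturbing one weight at a time."""
    grad_f = np.zeros_like(f_weights)
    for idx in np.ndindex(f_weights.shape):
        plus, minus = f_weights.copy(), f_weights.copy()
        plus[idx] += eps
        minus[idx] -= eps
        grad_f[idx] = (error(inputs, plus, s_weights, target)
                       - error(inputs, minus, s_weights, target)) / (2 * eps)
    grad_s = np.zeros_like(s_weights)
    for k in range(s_weights.size):
        plus, minus = s_weights.copy(), s_weights.copy()
        plus[k] += eps
        minus[k] -= eps
        grad_s[k] = (error(inputs, f_weights, plus, target)
                     - error(inputs, f_weights, minus, target)) / (2 * eps)
    return grad_f, grad_s


inputs = np.array([2.0, 3.0])
f_weights = np.array([[0.34, 0.05], [0.13, 0.07]])  # [[w1, w3], [w2, w4]]
s_weights = np.array([0.20, 0.01])                  # [w5, w6]

analytic = analytic_gradients(inputs, f_weights, s_weights, 1.0)
numeric = numeric_gradients(inputs, f_weights, s_weights, 1.0)
# True if the two gradient estimates agree for every weight
print(all(np.allclose(a, n, atol=1e-8) for a, n in zip(analytic, numeric)))
```

The outer product matters here: replacing `np.outer(inputs, s_weights)` with `np.dot(inputs, s_weights)` collapses the four terms into one number and makes the check fail.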

## Backward Pass


Using the derived formulas, we can now find the new weights.

The learning rate is a hyperparameter: it is not learned from the data, so we have to choose its value ourselves, usually by trial and error.

$\Delta$ = Predicted - Actual = 0.2171 - 1 = -0.7829

a = 0.05 (learning rate)


Now, with

**w1** = 0.34,  **w2** = 0.13,  **w3** = 0.05,  **w4** = 0.07,  **w5** = 0.20, **w6** = 0.01, **h1** = 1.07 and **h2** = 0.31

the hidden-to-output weights become

$\begin{bmatrix}w5\\w6\end{bmatrix} = \begin{bmatrix}0.20\\0.01\end{bmatrix} - 0.05(-0.7829) \begin{bmatrix}1.07\\0.31\end{bmatrix} = \begin{bmatrix}0.20\\0.01\end{bmatrix} - \begin{bmatrix}-0.0419\\-0.0121\end{bmatrix} = \begin{bmatrix}0.2419\\0.0221\end{bmatrix}$

For the input-to-hidden weights, the column of inputs times the row $\begin{bmatrix}w5 & w6\end{bmatrix}$ is an outer product, which gives one correction term per weight. This update uses the current values of **w5** and **w6** (0.20 and 0.01), not the ones just computed, because all gradients come from the same forward pass:

$\begin{bmatrix}w1 & w3\\w2 & w4\end{bmatrix} = \begin{bmatrix}0.34 & 0.05\\0.13 & 0.07\end{bmatrix} - 0.05(-0.7829) \begin{bmatrix}2\\3\end{bmatrix} \begin{bmatrix}0.20 & 0.01\end{bmatrix} = \begin{bmatrix}0.34 & 0.05\\0.13 & 0.07\end{bmatrix} - \begin{bmatrix}-0.0157 & -0.0008\\-0.0235 & -0.0012\end{bmatrix} = \begin{bmatrix}0.3557 & 0.0508\\0.1535 & 0.0712\end{bmatrix}$





---

**Now, using the new weights, we repeat the forward pass**



$\begin{bmatrix}2 & 3\end{bmatrix}$ . $\begin{bmatrix}0.3557 & 0.0508\\0.1535 & 0.0712\end{bmatrix}$ . $\begin{bmatrix}0.2419\\0.0221\end{bmatrix}$

**h1** = 2 x 0.3557 + 3 x 0.1535 ≈ 1.172

**h2** = 2 x 0.0508 + 3 x 0.0712 ≈ 0.315

$\begin{bmatrix}1.172 & 0.315\end{bmatrix}$ . $\begin{bmatrix}0.2419\\0.0221\end{bmatrix}$

(1.172 x 0.2419) + (0.315 x 0.0221) ≈ 0.290

Hence the new output is $\begin{bmatrix}0.290\end{bmatrix}$



**The new prediction, 0.290, is closer to the actual output of 1 than the previous prediction, 0.217, and the error drops from about 0.306 to about 0.252. We can repeat the backward and forward passes until the error is close to zero.**

```python
# Backward pass: compute the weight updates from the current weights
print('first layer weights -->', f_weights)
print('second layer weights -->', s_weights)
print('hidden layer values -->', h_weights)

delta = output - expected_output[0]  # Δ = predicted - actual (a scalar)
print('delta -->', round(delta, 4))

lr = 0.05  # learning rate
print('learning rate -->', lr)

# Hidden -> output weights: [w5, w6] - lr * Δ * [h1, h2]
updated_second_layer_weights = s_weights - lr * delta * h_weights
print('Updated second layer weights are', np.round(updated_second_layer_weights, 4))

# Input -> hidden weights: [[w1, w3], [w2, w4]] - lr * Δ * outer([i1, i2], [w5, w6]).
# np.outer gives one term per weight; np.dot(inputs, s_weights) would collapse
# them into a single number and apply the same correction to every weight.
updated_first_layer_weights = f_weights - lr * delta * np.outer(inputs, s_weights)
print('Updated first layer weights are', np.round(updated_first_layer_weights, 4))

# Forward pass with the new weights
updated_h_weights = np.dot(inputs, updated_first_layer_weights)
print('Updated hidden values', np.round(updated_h_weights, 4))

new_output = np.dot(updated_h_weights, updated_second_layer_weights)
print('Updated output', round(new_output, 4))

# Error after one update
new_error = 0.5 * (new_output - expected_output[0]) ** 2
print('Updated error', round(new_error, 4))
```

```
first layer weights --> [[0.34 0.05]
 [0.13 0.07]]
second layer weights --> [0.2  0.01]
hidden layer values --> [1.07 0.31]
delta --> -0.7829
learning rate --> 0.05
Updated second layer weights are [0.2419 0.0221]
Updated first layer weights are [[0.3557 0.0508]
 [0.1535 0.0712]]
Updated hidden values [1.1718 0.3151]
Updated output 0.2904
Updated error 0.2518
```


## **Learnings**

By now you have seen that the weights have changed, the values of the hidden nodes have been updated, the error has decreased, and the output is moving towards the actual output. For this single training example, repeating the process drives the error close to 0, at which point we can say that training on this data is complete.
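Repeating the epoch is just a loop. The sketch below reuses the same update rules (with the outer product for the input-to-hidden weights) and records the error at the start of every epoch. The function name `train`, the default of 100 epochs and the single training example are illustrative assumptions, not a general training routine.

```python
import numpy as np


def train(inputs, f_weights, s_weights, target, lr=0.05, epochs=100):
    """Repeat forward pass, error and backward pass; return weights and error history."""
    f_weights = f_weights.astype(float)  # astype returns a copy, so the caller's arrays are unchanged
    s_weights = s_weights.astype(float)
    errors = []
    for _ in range(epochs):
        # Forward pass (linear network, no activation function)
        hidden = inputs @ f_weights
        prediction = hidden @ s_weights
        delta = prediction - target  # Δ = prediction - actual
        errors.append(0.5 * delta ** 2)
        # Backward pass: both gradients use this epoch's weights
        grad_s = delta * hidden
        grad_f = delta * np.outer(inputs, s_weights)
        s_weights -= lr * grad_s
        f_weights -= lr * grad_f
    return f_weights, s_weights, errors


inputs = np.array([2.0, 3.0])
f_weights = np.array([[0.34, 0.05], [0.13, 0.07]])  # [[w1, w3], [w2, w4]]
s_weights = np.array([0.20, 0.01])                  # [w5, w6]

f_final, s_final, errors = train(inputs, f_weights, s_weights, target=1.0)
for epoch in (0, 1, 10, 99):
    print(f'epoch {epoch:3d}: error = {errors[epoch]:.3e}')
print('final prediction:', (inputs @ f_final) @ s_final)
```

The first recorded error, about 0.3065, is the same as in the Calculating Error step, and the error shrinks towards zero over the epochs. Changing `lr` is an easy way to see the effect of the learning rate; very large values make the updates overshoot.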

**PS:** The hand calculations in the markdown are rounded, while NumPy uses 64-bit floating point, so the values printed by the code are more precise and may differ slightly in the last digits.

Thanks for reading