# Backpropagation Step by Step

> ![](http://hmkcode.github.io/images/ai/backpropagation.png)

If you are building your own neural network, you will need to understand how to train it. Backpropagation is a commonly used technique for training neural networks. Many resources explain the technique; this post walks through it with a concrete example in detailed, color-coded steps.

## Overview

---


In this post, we will build a neural network with three layers:


* Input layer with two input neurons
* One hidden layer with two neurons
* Output layer with a single neuron



> ![Backpropagation](https://github.com/yoyoasa/backpropagation/blob/master/bp1.png?raw=true)

## Weights, weights, weights

---

Neural network training is about finding weights that minimize prediction error. We usually start training with a set of randomly generated weights. Then, backpropagation is used to update the weights in an attempt to correctly map inputs to outputs.

Our initial weights will be as follows:


**`w1 = 0.15`, 
`w2 = 0.20`, 
`w3 = 0.10`, 
`w4 = 0.18`, 
`w5 = 0.05`, 
`w6 = 0.26`**



> ![Backpropagation](https://github.com/yoyoasa/backpropagation/blob/master/bp2.png?raw=true)

## Dataset

---

#### Our dataset has one sample with two inputs and one output.

> ![dataset](https://github.com/yoyoasa/backpropagation/blob/master/bp3.png?raw=true)



#### Our single sample is as follows: `inputs=[5, 4]` and `output=[2]`.


> ![dataset](https://github.com/yoyoasa/backpropagation/blob/master/bp4.png?raw=true)

## Forward Pass

---

We will use the given weights and inputs to predict the output. Inputs are multiplied by weights; the results are then passed forward to the next layer.


> ![backpropagation](https://github.com/yoyoasa/backpropagation/blob/master/bp5.png?raw=true)


>$\begin{pmatrix}
5 & 4
\end{pmatrix} 
. 
\begin{pmatrix}
0.15 & 0.1\\
0.2 & 0.18 \\
\end{pmatrix}
=\begin{pmatrix}
1.55 & 1.22
\end{pmatrix} 
.
\begin{pmatrix}
0.05  \\
0.26
\end{pmatrix} 
= 
\begin{pmatrix}
0.394
\end{pmatrix} 
$

>  $ 5 * 0.15 + 4 * 0.2 = 1.55 $

> $ 5 * 0.1 + 4 * 0.18 = 1.22 $

> $ 1.55 * 0.05 + 1.22 * 0.26  = 0.394 $
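
The same forward pass can be sketched in NumPy. The names `x`, `W1`, `W2`, `hidden` and `prediction` are chosen here for illustration; the weight layout follows the matrices above.

```python
import numpy as np

x = np.array([5.0, 4.0])          # the two inputs
W1 = np.array([[0.15, 0.10],      # input -> hidden weights
               [0.20, 0.18]])
W2 = np.array([0.05, 0.26])       # hidden -> output weights

hidden = x @ W1                   # values of the two hidden neurons
prediction = hidden @ W2          # single network output

print("hidden:", hidden)
print("prediction:", prediction)
```

Without shortening, the prediction is 0.3947; the worked example above writes it as 0.394.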

## Calculating Error

---

Now, it’s time to find out how our network performed by calculating the difference between the actual output and the predicted one. It’s clear that our network output, or **prediction**, is not even close to the **actual output**. We can calculate the difference, or the error, as follows.


> ![backpropogation](https://github.com/yoyoasa/backpropagation/blob/master/bp6.png?raw=true)

> $Error  = \frac{1}{2} (0.394 - 2)^2 = 1.289 $
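
As a small sketch, the error can be written as a Python function. The function names are illustrative. The derivative is included because the factor 1/2 cancels the exponent when differentiating, which leaves simply `prediction - actual`.

```python
def half_squared_error(prediction, actual):
    """Error used in this post: half of the squared difference."""
    return 0.5 * (prediction - actual) ** 2

def error_derivative(prediction, actual):
    """Derivative of the error with respect to the prediction."""
    return prediction - actual

# 0.3947 is the forward-pass result before it is shortened to 0.394
prediction, actual = 0.3947, 2.0
print(half_squared_error(prediction, actual))
print(error_derivative(prediction, actual))
```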


## Reducing Error

---

The main goal of training is to reduce the **error**, that is, the difference between the **prediction** and the **actual output**. Since the **actual output** is constant, the only way to reduce the error is to change the **prediction** value. The question now is how to change the **prediction** value.


By decomposing the **prediction** into its basic elements, we find that the **weights** are the variable elements affecting its value. In other words, in order to change the **prediction** value, we need to change the **weights**.


>![backpropagation](https://github.com/yoyoasa/backpropagation/blob/master/bp7.png?raw=true)


<blockquote style="color: #868e96;">The question now is **how to update the weights so that the error is reduced**?</blockquote>

<blockquote style="color: #868e96;">The answer is **Backpropagation**!</blockquote>

## Backpropagation

---

**Backpropagation**, short for “backward propagation of errors”, is a mechanism used to update the **weights** using [gradient descent](https://en.wikipedia.org/wiki/Gradient_descent). It calculates the gradient of the error function with respect to the neural network’s weights. The calculation proceeds backwards through the network.

**Gradient descent** is an iterative optimization algorithm for finding a minimum of a function; in our case we want to minimize the error function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient of the function at the current point.

>![backpropagation](https://github.com/yoyoasa/backpropagation/blob/master/bp8.png?raw=true)
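
To see the idea in isolation, here is a minimal sketch of gradient descent on a one-variable function. The function `f(w) = (w - 3)**2`, the starting point, the learning rate and the step count are arbitrary choices for illustration and are not part of the network in this post.

```python
def gradient(w):
    """Derivative of f(w) = (w - 3)**2."""
    return 2 * (w - 3)

w = 0.0                # arbitrary starting point
learning_rate = 0.1    # step size

for step in range(50):
    w = w - learning_rate * gradient(w)   # step against the gradient

print(w)   # close to 3, the minimum of f
```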


For example, to update `w6`, we take the current `w6` and subtract the partial derivative of the **error** function with respect to `w6`. In practice, we first multiply the derivative by a small number called the **learning rate**, which controls the size of the step so that the update does not overshoot the minimum of the error function.


>![backpropogation](https://github.com/yoyoasa/backpropagation/blob/master/bp9.png?raw=true)
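
Written out in symbols, the update for `w6` follows from the chain rule. The notation is an assumption made for this sketch: $h_1, h_2$ are the hidden neuron values, $\Delta$ is the prediction minus the actual output, and $a$ is the learning rate.

> $ Error = \frac{1}{2}(prediction - actual)^2, \quad prediction = h_1 w_5 + h_2 w_6 $

> $ \frac{\partial Error}{\partial w_6} = (prediction - actual) \cdot h_2 = \Delta \cdot h_2 $

> $ w_6^{new} = w_6 - a \cdot \Delta \cdot h_2 $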

However, when moving backward to update `w1`, `w2`, `w3` and `w4`, which lie between the input and hidden layers, the error depends on each weight only through a hidden neuron, so the chain rule gains one more factor. The partial derivative of the error function with respect to `w1`, for example, is as follows.

> ![backpropogation](https://github.com/yoyoasa/backpropagation/blob/master/bp10.png?raw=true)
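
In symbols, this is one more application of the chain rule, because `w1` reaches the prediction only through the first hidden neuron. As an assumed notation for this sketch, $i_1, i_2$ are the inputs, $h_1 = i_1 w_1 + i_2 w_2$ is the first hidden neuron, $\Delta$ is the prediction minus the actual output, and $a$ is the learning rate.

> $ \frac{\partial Error}{\partial w_1} = \frac{\partial Error}{\partial prediction} \cdot \frac{\partial prediction}{\partial h_1} \cdot \frac{\partial h_1}{\partial w_1} = \Delta \cdot w_5 \cdot i_1 $

> $ w_1^{new} = w_1 - a \cdot \Delta \cdot w_5 \cdot i_1 $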

We can find the update formula for the remaining weights `w2`, `w3` and `w4` in the same way.

In summary, the update formulas for all weights are as follows:

> ![backpropogation](https://github.com/yoyoasa/backpropagation/blob/master/bp11.png?raw=true)

We can rewrite the update formulas in matrix form as follows:

> ![backpropogation](https://github.com/yoyoasa/backpropagation/blob/master/bp12.png?raw=true)
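
One way to gain confidence in gradient formulas like these is to compare them with finite differences. The sketch below is not part of the original derivation: the helper names `loss`, `analytic_grads` and `numeric_grad` are hypothetical, and the analytic gradients assume the error is half the squared difference, as used earlier in this post.

```python
import numpy as np

x, y = np.array([5.0, 4.0]), 2.0
W1 = np.array([[0.15, 0.10], [0.20, 0.18]])   # input -> hidden
W2 = np.array([0.05, 0.26])                   # hidden -> output

def loss():
    """Half squared error for the current W1 and W2."""
    return 0.5 * (x @ W1 @ W2 - y) ** 2

def analytic_grads():
    """Gradients from the chain rule, in matrix form."""
    hidden = x @ W1
    delta = hidden @ W2 - y                    # prediction - actual
    return delta * np.outer(x, W2), delta * hidden

def numeric_grad(W, eps=1e-6):
    """Central finite differences, nudging one weight at a time in place."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        original = W[idx]
        W[idx] = original + eps
        up = loss()
        W[idx] = original - eps
        down = loss()
        W[idx] = original                      # restore the weight
        grad[idx] = (up - down) / (2 * eps)
    return grad

grad_W1, grad_W2 = analytic_grads()
print(np.allclose(grad_W1, numeric_grad(W1), atol=1e-6))
print(np.allclose(grad_W2, numeric_grad(W2), atol=1e-6))
```

If both lines print `True`, the matrix formulas agree with the numerical slopes of the error function.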

## Backward Pass

---

Using the derived formulas, we can find the new **weights**.

**Learning rate:** a hyperparameter, meaning its value is not learned during training and has to be chosen by hand, usually by trial and error.

> ![backpropagation](https://github.com/yoyoasa/backpropagation/blob/master/bp13.png?raw=true)
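
Because the learning rate has to be chosen by hand, it can help to try a few values and compare the error after a single update. The following is a rough sketch: the function name `loss_after_one_step` and the candidate rates are assumptions made for illustration.

```python
import numpy as np

x, y = np.array([5.0, 4.0]), 2.0
W1_init = np.array([[0.15, 0.10], [0.20, 0.18]])
W2_init = np.array([0.05, 0.26])

def loss_after_one_step(learning_rate):
    """Apply one gradient-descent update and return the resulting error."""
    hidden = x @ W1_init
    delta = hidden @ W2_init - y                   # prediction - actual
    W1 = W1_init - learning_rate * delta * np.outer(x, W2_init)
    W2 = W2_init - learning_rate * delta * hidden
    return 0.5 * (x @ W1 @ W2 - y) ** 2

for rate in (0.001, 0.01, 0.05):   # candidate values, picked arbitrarily
    print(rate, loss_after_one_step(rate))
```

The value 0.01 is the learning rate used in the code at the end of this post.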

Now, using the new **weights**, we will repeat the forward pass:

> ![backpropagation](https://github.com/yoyoasa/backpropagation/blob/master/bp14.png?raw=true)

>$\begin{pmatrix}
5 & 4
\end{pmatrix} 
. 
\begin{pmatrix}
0.154 & 0.121\\
0.203 & 0.197 \\
\end{pmatrix}
=\begin{pmatrix}
1.582 & 1.393
\end{pmatrix} 
.
\begin{pmatrix}
0.074  \\
0.279
\end{pmatrix} 
= 
\begin{pmatrix}
0.506
\end{pmatrix} 
$

>  $ 5 * 0.154 + 4 * 0.203 = 1.582 $

> $ 5 * 0.121 + 4 * 0.197 = 1.393 $

> $ 1.582 * 0.074 + 1.393 * 0.279  = 0.506 $
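
The new prediction is closer to the actual output, but still far from it. Since gradient descent is iterative, the forward and backward passes are normally repeated many times. The loop below is a minimal sketch of that repetition, assuming a learning rate of 0.01 and an arbitrary 50 iterations; the variable names are illustrative.

```python
import numpy as np

x, y = np.array([5.0, 4.0]), 2.0
W1 = np.array([[0.15, 0.10], [0.20, 0.18]])
W2 = np.array([0.05, 0.26])
learning_rate = 0.01

losses = []
for step in range(50):                  # 50 is an arbitrary number of iterations
    hidden = x @ W1                     # forward pass
    prediction = hidden @ W2
    delta = prediction - y              # prediction - actual
    losses.append(0.5 * delta ** 2)     # error before this step's update

    grad_W1 = delta * np.outer(x, W2)   # backward pass (uses the old W2)
    grad_W2 = delta * hidden
    W1 = W1 - learning_rate * grad_W1
    W2 = W2 - learning_rate * grad_W2

print("first error:", losses[0])
print("last error:", losses[-1])
```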

The whole procedure can be implemented with NumPy:

```python
import numpy as np

class Neural_Network(object):
  def __init__(self, WL1=None, WL2=None):
    # parameters
    self.inputSize = 2
    self.outputSize = 1
    self.hiddenSize = 2

    # weights (fall back to the initial values from this post when none are given)
    self.WL1 = np.array(WL1) if WL1 is not None else np.array([[0.15, 0.1], [0.2, 0.18]]) # (2x2) weights from input to hidden layer
    self.WL2 = np.array(WL2) if WL2 is not None else np.array([0.05, 0.26]) # 2 weights from hidden to output layer

  def forward(self, X):
    # forward propagation through our network
    self.HL = np.dot(X, self.WL1) # dot product of X (input) and the 2x2 input-to-hidden weights
    out = np.dot(self.HL, self.WL2) # dot product of hidden layer (HL) and the hidden-to-output weights
    return out

  def backward(self, X, y, out, a):
    # backward propagate through the network
    self.delta = out - y # error in output (prediction - actual)
    self.a = a # learning rate

    _WL2 = self.a * self.delta * self.HL # update for hidden --> output weights
    _WL1 = self.a * self.delta * np.outer(X, self.WL2) # update for input --> hidden weights
    print(_WL1) # show the WL1 update

    self.WL2 = self.WL2 - _WL2 # adjusting second set (hidden --> output) weights
    self.WL1 = self.WL1 - _WL1 # adjusting first set (input --> hidden) weights

# first example: the sample used in this post
# X = input, y = actual output, a = learning rate
X = np.array([5, 4])
y = 2
a = 0.01

NN = Neural_Network([[0.15, 0.1], [0.2, 0.18]], [0.05, 0.26])

print("Input:", str(X))
print("Actual Output:", str(y))
print("Learning Rate:", str(a))

predicted = NN.forward(X)
print("Predicted Output:", str(predicted))
print("Loss:", str(np.square(y - predicted)/2)) # half squared error

NN.backward(X, y, predicted, a)
print("Updated WL1:\n", NN.WL1)
print("Updated WL2:\n", NN.WL2)

# forward pass again with the updated weights
predicted = NN.forward(X)
print("Predicted Output:", str(predicted))
print("Loss:", str(np.square(y - predicted)/2)) # half squared error

print("\n")
# second example: a different sample, weights and learning rate
# X = input, y = actual output, a = learning rate
X = np.array([2, 3])
y = 1
a = 0.05

NN = Neural_Network([[0.11, 0.12], [0.21, 0.08]], [0.14, 0.15])

print("Input:", str(X))
print("Actual Output:", str(y))
print("Learning Rate:", str(a))

predicted = NN.forward(X)
print("Predicted Output:", str(predicted))
print("Loss:", str(np.square(y - predicted)/2)) # half squared error

NN.backward(X, y, predicted, a)
print("Updated WL1:\n", NN.WL1)
print("Updated WL2:\n", NN.WL2)
```

Output:

```
Input: [5 4]
Actual Output: 2
Learning Rate: 0.01
Predicted Output: 0.3947
Loss: 1.288494045
[[-0.00401325 -0.0208689 ]
 [-0.0032106  -0.01669512]]
Updated WL1:
 [[0.15401325 0.1208689 ]
 [0.2032106  0.19669512]]
Updated WL2:
 [0.07488215 0.27958466]
Predicted Output: 0.5074688075164043
Loss: 1.1138246802682523


Input: [2 3]
Actual Output: 1
Learning Rate: 0.05
Predicted Output: 0.191
Loss: 0.32724049999999993
[[-0.011326  -0.012135 ]
 [-0.016989  -0.0182025]]
Updated WL1:
 [[0.121326  0.132135 ]
 [0.226989  0.0982025]]
Updated WL2:
 [0.1743825 0.169416 ]
```
