# Assignment 2A


<p style="text-align: justify;">
	
	<a href="http://hmkcode.github.io/images/ai/backpropagation.png">
		<img class="size-full wp-image-315 aligncenter" src="http://hmkcode.github.io/images/ai/backpropagation.png" alt="get-location" />
	</a>
	<br>
	If you are building your own neural network, you will definitely need to understand how to train it.
	Backpropagation is a commonly used technique for training neural networks. There are many resources explaining the technique,
	but this post explains backpropagation with a concrete example, worked through in detailed steps.
</p>

## Overview

In this post, we will build a neural network with three layers:

- **Input** layer with two input neurons
- One **hidden** layer with two neurons 
- **Output** layer with a single neuron 




## Weights, weights, weights

Neural network training is about finding weights that minimize prediction error. We usually start training with a set of randomly generated weights. Then, backpropagation is used to update the weights in an attempt to correctly map inputs to outputs.

Our initial weights are as follows:
`w1 = 0.22`,  `w2 = 0.24`, `w3 = 0.42`, `w4 = 0.16`, `w5 = 0.28` and `w6 = 0.30`


## Dataset

Our dataset has one sample with two inputs and one output. 


Our single sample is as follows: `inputs=[2, 3]` and `output=[1]`.


## Forward Pass

We will use the given weights and inputs to predict the output. Inputs are multiplied by weights, and the results are passed forward to the next layer. No activation function is applied in this example, so each layer is a plain weighted sum.



$\begin{bmatrix}2 & 3\end{bmatrix}$ .  $\begin{bmatrix}0.22 & 0.42\\0.24 & 0.16\end{bmatrix}$ = $\begin{bmatrix}1.16 & 1.32\end{bmatrix}$ . $\begin{bmatrix}0.28 \\0.30\end{bmatrix}$ = 0.7208 
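The same two matrix products can be sketched in NumPy. The variable names (`x`, `W_hidden`, `w_out`, `h`) are illustrative assumptions; the numbers are the ones given above.

```python
import numpy as np

# Values taken from the text; the names are chosen for illustration only.
x = np.array([2.0, 3.0])                # inputs i1, i2
W_hidden = np.array([[0.22, 0.42],      # [[w1, w3],
                     [0.24, 0.16]])     #  [w2, w4]]
w_out = np.array([0.28, 0.30])          # [w5, w6]

h = x @ W_hidden          # hidden-layer values, no activation function
prediction = h @ w_out    # value of the single output neuron

print(np.round(h, 4), round(float(prediction), 4))
```

The hidden values come out as approximately `[1.16, 1.32]` and the prediction as approximately `0.7208`, matching the hand calculation.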


## Calculating Error

Now it's time to find out how our network performed by calculating the difference between the actual output and the predicted one. Our network output, or **prediction**, is 0.7208, which is still well short of the **actual output** of 1.0. We can calculate the error as follows.


$Error = \frac{1}{2}(0.7208 - 1.0)^2 = 0.03898$
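As a quick check of this arithmetic, here is a minimal sketch; the function name `squared_error` is an assumption made for illustration.

```python
def squared_error(prediction, actual):
    """Half of the squared difference, i.e. 1/2 * (prediction - actual)^2."""
    return 0.5 * (prediction - actual) ** 2

error = squared_error(0.7208, 1.0)
print(round(error, 5))  # 0.03898
```

The factor of one half is a convenience: it cancels the 2 that appears when the square is differentiated.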

## Reducing Error

Our main goal in training is to reduce the **error**, the difference between the **prediction** and the **actual output**. Since the **actual output** is constant, the only way to reduce the error is to change the **prediction** value. The question now is how to change the **prediction** value.

By decomposing the **prediction** into its basic elements, we find that the **weights** are the only variable elements affecting the **prediction** value, because the inputs are fixed. In other words, in order to change the **prediction** value, we need to change the **weights**.



> The question now is **how to update the weights so that the error is reduced?**  
> The answer is **Backpropagation!**


## Backpropagation

**Backpropagation**,  short for "backward propagation of errors", is a mechanism used to update the **weights** using [gradient descent](https://en.wikipedia.org/wiki/Gradient_descent). It calculates the gradient of the error function with respect to the neural network's weights. The calculation proceeds backwards through the network.

> **Gradient descent** is an iterative optimization algorithm for finding the minimum of a function; in our case we want to minimize the error function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient of the function at the current point. 


For example, to update `w6`, we take the current `w6` and subtract the partial derivative of the **error** function with respect to `w6`. In practice, we first multiply that derivative by a small positive number called the ***learning rate***, which controls the step size so that the update does not overshoot the minimum.
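The update rule described here can be sketched on a one-variable function. This is a minimal illustration rather than the network itself: the helper `gradient_descent`, the example function $f(x) = (x - 3)^2$, the learning rate and the step count are all assumptions chosen for the demonstration.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly subtract learning_rate * gradient from the current value."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

# f(x) = (x - 3)^2 has derivative 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(minimum)
```

Each step moves `x` against the sign of the derivative, so the value approaches 3. A weight such as `w6` is updated in the same way, with the derivative of the error taking the place of `grad`.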




The derivative of the error function with respect to `w6` is evaluated by applying the chain rule.
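A reconstruction of that derivative, assuming the linear network of the forward pass (no activation functions), where $prediction = h_1 w_5 + h_2 w_6$ and $h_2 = i_1 w_3 + i_2 w_4$ is the hidden value that `w6` multiplies. The names $h_1$, $h_2$ and $\Delta$ are labels assumed here for illustration:

$$
\frac{\partial Error}{\partial w_6} = \frac{\partial Error}{\partial prediction} \cdot \frac{\partial prediction}{\partial w_6} = (prediction - actual) \cdot h_2 = \Delta \, h_2
$$

For our sample, $\Delta = 0.7208 - 1.0 = -0.2792$ and $h_2 = 1.32$, so the derivative is about $-0.3685$.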


So to update `w6`, we subtract the learning rate times this derivative from the current value of `w6`.
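Written as a formula, assuming $a$ denotes the learning rate, $\Delta = prediction - actual$, and $h_2$ is the hidden-layer value that `w6` multiplies (symbol names assumed here for illustration):

$$
w_6^{*} = w_6 - a \cdot \frac{\partial Error}{\partial w_6} = w_6 - a \, \Delta \, h_2
$$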


Similarly, we can derive the update formula for `w5` and any other weights existing between the output and the hidden layer.


However, when moving backward to update `w1`, `w2`, `w3` and `w4`, which lie between the input and hidden layers, the chain rule needs one more factor: the partial derivative of the error function with respect to `w1`, for example, also passes through the hidden neuron that `w1` feeds.
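A sketch of that longer chain, assuming a linear network with $h_1 = i_1 w_1 + i_2 w_2$, $prediction = h_1 w_5 + h_2 w_6$ and $\Delta = prediction - actual$ (symbol names assumed here for illustration):

$$
\frac{\partial Error}{\partial w_1} = \frac{\partial Error}{\partial prediction} \cdot \frac{\partial prediction}{\partial h_1} \cdot \frac{\partial h_1}{\partial w_1} = \Delta \cdot w_5 \cdot i_1
$$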

We can find the update formula for the remaining weights `w2`, `w3` and `w4` in the same way. 
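One way to sanity-check hand-derived derivatives like these is to compare them with a finite-difference estimate. The sketch below assumes the linear network of this post and the gradient expressions `delta * input * w5` (or `w6`) for the first-layer weights and `delta * hidden` for the second-layer weights. The function names and the weight ordering `[w1, ..., w6]` are hypothetical choices for illustration.

```python
import numpy as np

def predict(w, x):
    """Forward pass for w = [w1, w2, w3, w4, w5, w6] with no activation."""
    h1 = x[0] * w[0] + x[1] * w[1]
    h2 = x[0] * w[2] + x[1] * w[3]
    return h1 * w[4] + h2 * w[5]

def error(w, x, y):
    return 0.5 * (predict(w, x) - y) ** 2

def analytic_grad(w, x, y):
    """Gradient from the chain rule, one entry per weight."""
    delta = predict(w, x) - y
    h1 = x[0] * w[0] + x[1] * w[1]
    h2 = x[0] * w[2] + x[1] * w[3]
    return np.array([delta * x[0] * w[4], delta * x[1] * w[4],
                     delta * x[0] * w[5], delta * x[1] * w[5],
                     delta * h1, delta * h2])

def numeric_grad(w, x, y, eps=1e-6):
    """Central finite differences, perturbing one weight at a time."""
    grad = np.zeros_like(w)
    for k in range(len(w)):
        step = np.zeros_like(w)
        step[k] = eps
        grad[k] = (error(w + step, x, y) - error(w - step, x, y)) / (2 * eps)
    return grad

w = np.array([0.22, 0.24, 0.42, 0.16, 0.28, 0.30])
x, y = np.array([2.0, 3.0]), 1.0
print(np.allclose(analytic_grad(w, x, y), numeric_grad(w, x, y), atol=1e-6))
```

If the two gradients agree, the chain-rule expressions are very likely correct; a mismatch in a single entry points to the formula for that weight.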

In summary, every weight is updated by subtracting the learning rate times the partial derivative of the error with respect to that weight.
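Written out for the six weights of this network, assuming $a$ is the learning rate, $\Delta = prediction - actual$, $i_1, i_2$ are the inputs, and $h_1 = i_1 w_1 + i_2 w_2$, $h_2 = i_1 w_3 + i_2 w_4$ are the hidden values (symbol names assumed here for illustration):

$$
\begin{aligned}
w_6^{*} &= w_6 - a \, \Delta \, h_2 \\
w_5^{*} &= w_5 - a \, \Delta \, h_1 \\
w_4^{*} &= w_4 - a \, \Delta \, i_2 \, w_6 \\
w_3^{*} &= w_3 - a \, \Delta \, i_1 \, w_6 \\
w_2^{*} &= w_2 - a \, \Delta \, i_2 \, w_5 \\
w_1^{*} &= w_1 - a \, \Delta \, i_1 \, w_5
\end{aligned}
$$

Every right-hand side uses the weights from before the update, so all six new values should be computed from the old ones.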

The update formulas can also be written in matrix form, which is the form used in the backward pass below.
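A sketch of the matrix form, using the same weight layout as the forward pass and assuming $a$ is the learning rate, $\Delta = prediction - actual$, and $h_1, h_2$ are the hidden-layer values (symbol names assumed here for illustration):

$$
\begin{bmatrix} w_5 \\ w_6 \end{bmatrix}^{*} = \begin{bmatrix} w_5 \\ w_6 \end{bmatrix} - a \, \Delta \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
$$

$$
\begin{bmatrix} w_1 & w_3 \\ w_2 & w_4 \end{bmatrix}^{*} = \begin{bmatrix} w_1 & w_3 \\ w_2 & w_4 \end{bmatrix} - a \, \Delta \begin{bmatrix} i_1 \\ i_2 \end{bmatrix} \cdot \begin{bmatrix} w_5 & w_6 \end{bmatrix}
$$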

## Backward Pass

Using the derived formulas, we can find the new **weights**. 

> The **learning rate** is a hyperparameter, which means its value is not learned during training and has to be chosen by hand.


`delta = -0.27920`
<br>
`a = 0.05`
<br>
$\begin{bmatrix} W5 \\ W6 \end{bmatrix}$ = $\begin{bmatrix} 0.28 \\ 0.30 \end{bmatrix}$ - 0.05(-0.27920)$\begin{bmatrix}1.16  \\ 1.32 \end{bmatrix}$ = $\begin{bmatrix} 0.28 \\ 0.30 \end{bmatrix}$ - $\begin{bmatrix} -0.01619 \\ -0.01843 \end{bmatrix}$ = $\begin{bmatrix} 0.29619 \\ 0.31843 \end{bmatrix}$
<br>
<br>
$\begin{bmatrix} W1 & W3\\ W2 & W4 \end{bmatrix}$  = $\begin{bmatrix} 0.22 & 0.42\\ 0.24 & 0.16 \end{bmatrix}$  - (0.05)(-0.27920)$\begin{bmatrix} 2 \\3 \end{bmatrix}$. $\begin{bmatrix} 0.28 & 0.30 \end{bmatrix}$ = $\begin{bmatrix} 0.22 & 0.42\\ 0.24 & 0.16 \end{bmatrix}$  - $\begin{bmatrix} -0.00782 & -0.00838 \\  -0.01173 & -0.01256 \end{bmatrix}$ = $\begin{bmatrix} 0.22782 & 0.42838\\0.25173  & 0.17256 \end{bmatrix}$ 
<br>
<br>
Now, using the new **weights**, we will repeat the forward pass.
<br>
<br>
$\begin{bmatrix}2 & 3\end{bmatrix}$ . $\begin{bmatrix}0.22782 & 0.42838\\0.25173 & 0.17256\end{bmatrix}$ = $\begin{bmatrix}1.21083 & 1.37444\end{bmatrix}$ . $\begin{bmatrix}0.29619 \\0.31843\end{bmatrix}$ = 0.7963
<br>
<br>
We can see that the new **prediction** `0.7963` is closer to the **actual output** than the previous prediction `0.7208`. We can repeat the same process of backward and forward passes until the **error** is close to zero.
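Repeating the forward and backward passes can be sketched as a short loop. The stopping threshold, the iteration cap and the variable names are assumptions made for illustration; the weights, inputs and learning rate are the ones used above.

```python
import numpy as np

x, y_true = np.array([2.0, 3.0]), 1.0
W_hidden = np.array([[0.22, 0.42], [0.24, 0.16]])   # [[w1, w3], [w2, w4]]
w_out = np.array([0.28, 0.30])                      # [w5, w6]
learning_rate = 0.05

for step in range(1000):                 # illustrative cap on iterations
    h = x @ W_hidden                     # forward pass: hidden values
    prediction = h @ w_out               # forward pass: output
    delta = prediction - y_true
    error = 0.5 * delta ** 2
    if error < 1e-10:                    # illustrative stopping threshold
        break
    # Compute both gradients from the current weights before changing either.
    grad_out = delta * h
    grad_hidden = delta * np.outer(x, w_out)
    w_out = w_out - learning_rate * grad_out
    W_hidden = W_hidden - learning_rate * grad_hidden

print(step, round(float(prediction), 6))
```

With a single training sample the loop drives the prediction towards `1.0`. A real dataset would need many samples, and usually a held-out set to judge how well the network generalizes.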


# Assignment 2B

In [0]:
import numpy as np

In [0]:
# Initialize the weights
weights = {}
weights['w1'] = 0.22
weights['w2'] = 0.24
weights['w3'] = 0.42
weights['w4'] = 0.16
weights['w5'] = 0.28
weights['w6'] = 0.30

# Input
inp = {}
inp['i1'] = 2
inp['i2'] = 3



In [0]:
# Forward Propagation
def forward_prop(inp, weights):
  input_val = np.array([inp['i1'], inp['i2']])
  
  layer1 = np.array([[weights['w1'], weights['w3'] ],[weights['w2'], weights['w4']]])
  
  out1 = np.matmul(input_val, layer1)
  
  layer2 = np.array([weights['w5'], weights['w6']])
  
  out2 = np.matmul(out1, layer2)
  
  return out2

In [34]:
# Initial Forward Prop
val = forward_prop(inp, weights)
print(f'The initial prediction value is: {val}')

The initial prediction value is: 0.7208


In [0]:
# Backpropagation
def back_prop(inp, weights, lr, y_pred, y_true):
  # delta is the derivative of the error 0.5 * (y_pred - y_true)**2 w.r.t. y_pred
  delta = y_pred - y_true

  # Hidden-layer outputs, computed from the current weights before any update
  h1 = inp['i1'] * weights['w1'] + inp['i2'] * weights['w2']
  h2 = inp['i1'] * weights['w3'] + inp['i2'] * weights['w4']

  # Build the new weights in a separate dict so every gradient uses the old values
  new_weights = {}
  new_weights['w1'] = weights['w1'] - lr * delta * inp['i1'] * weights['w5']
  new_weights['w2'] = weights['w2'] - lr * delta * inp['i2'] * weights['w5']
  new_weights['w3'] = weights['w3'] - lr * delta * inp['i1'] * weights['w6']
  new_weights['w4'] = weights['w4'] - lr * delta * inp['i2'] * weights['w6']
  new_weights['w5'] = weights['w5'] - lr * delta * h1
  new_weights['w6'] = weights['w6'] - lr * delta * h2

  print(f'delta = {delta:.5f}\nlearning rate = {lr}')
  print('The updated weights are: ')
  for name, value in new_weights.items():
    print(f'{name} = {value:.5f}')

  return new_weights

In [43]:
# One backward pass starting from the initial weights, then a new forward pass
weights = back_prop(inp, weights, 0.05, val, 1.0)
val = forward_prop(inp, weights)
print(f'The forward prop value now is: {val:.5f}')

delta = -0.27920
learning rate = 0.05
The updated weights are: 
w1 = 0.22782
w2 = 0.25173
w3 = 0.42838
w4 = 0.17256
w5 = 0.29619
w6 = 0.31843
The forward prop value now is: 0.79630