## Simple MLP Neural Network using numpy, 1 neuron, 1 hidden layer

---

### Activation function - sigmoid 
$a = f(z)={\frac {1}{1+e^{-z}}}={\frac {e^{z}}{e^{z}+1}}$
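
As a small illustration, the sigmoid and its derivative can be written in numpy as below. This is a sketch; the helper names `sigmoid` and `sigmoid_prime` are assumptions and do not appear elsewhere in this notebook. The derivative uses the identity $f'(z) = a(1 - a)$, which is the factor that shows up in the chain-rule products later on.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid, expressed through its own output: a * (1 - a)."""
    a = sigmoid(z)
    return a * (1.0 - a)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))        # activations, all strictly between 0 and 1
print(sigmoid_prime(z))  # slopes, largest at z = 0
```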

### Backpropagation algorithm - weights update

$w_{i}= w_{i} - LR * {\frac {\partial E}{\partial w_{i}}}$\
$b_{i}= b_{i} - LR * {\frac {\partial E}{\partial b_{i}}}$
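
To see the update rule in isolation, the sketch below applies it to a toy error $E(w) = (w - 3)^2$. This toy function, the helper name `gradient_descent_step` and the chosen values are assumptions for illustration only and are not part of the network in this notebook.

```python
def gradient_descent_step(w, grad, lr):
    """One update: move the parameter against the gradient, scaled by the learning rate."""
    return w - lr * grad

w = 0.0    # arbitrary starting value
lr = 0.1   # learning rate (LR in the formulas)
for _ in range(100):
    grad = 2.0 * (w - 3.0)   # dE/dw for the toy error E(w) = (w - 3)^2
    w = gradient_descent_step(w, grad, lr)

print(w)   # approaches the minimizer w = 3
```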

### Loss function (mean squared error, MSE)

$E = \frac{1}{2m}\sum_{i=1}^{m}(T_{i} - a_{2,i})^2$
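
A minimal numpy sketch of this loss, assuming the targets and the network outputs are arrays of the same shape. The function name `mse_loss` and the sample values are hypothetical.

```python
import numpy as np

def mse_loss(T, a2):
    """Halved mean squared error: sum((T - a2)^2) / (2 * m)."""
    T = np.asarray(T, dtype=float)
    a2 = np.asarray(a2, dtype=float)
    m = T.size                      # number of observations
    return np.sum((T - a2) ** 2) / (2 * m)

T = np.array([0.0, 1.0, 1.0])       # hypothetical targets
a2 = np.array([0.1, 0.8, 0.6])      # hypothetical network outputs
print(mse_loss(T, a2))
```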

### Examples of derivatives calculations

---
#### Updating $w_2$

$w_{2}= w_{2} - LR * {\frac {\partial E}{\partial w_{2}}}$

##### Chain rules

$E = \frac{1}{2}(T - a_2)^2 \longrightarrow {\frac {\partial E}{\partial a_{2}}}$

$a_2 = f(z_2)={\frac {1}{1+e^{-z_2}}} \longrightarrow {\frac {\partial a_2}{\partial z_{2}}}$

$z_2 = a_1 * w_2 + b_2 \longrightarrow {\frac {\partial z_2}{\partial w_{2}}}$

---
${\frac {\partial E}{\partial w_{2}}} = {\frac {\partial E}{\partial a_{2}}} * {\frac {\partial a_2}{\partial z_{2}}} * {\frac {\partial z_2}{\partial w_{2}}}$

$= -(T - a_2) * (a_2(1 - a_2)) * (a_1)$

$w_{2}= w_{2} - LR * [-(T - a_2) * (a_2(1-a_2)) * (a_1)]$


---
#### Updating $b_2$

$b_{2}= b_{2} - LR * {\frac {\partial E}{\partial b_{2}}}$

##### Chain rules
---
${\frac {\partial E}{\partial b_{2}}} = {\frac {\partial E}{\partial a_{2}}} * {\frac {\partial a_2}{\partial z_{2}}} * {\frac {\partial z_2}{\partial b_{2}}}$

$= -(T - a_2) * (a_2(1 - a_2)) * 1$

$b_{2}= b_{2} - LR * [-(T - a_2) * (a_2(1 - a_2)) * 1]$

---
#### Updating $w_1$

$w_{1}= w_{1} - LR * {\frac {\partial E}{\partial w_{1}}}$

##### Chain rules

$E = \frac{1}{2}(T - a_2)^2 \longrightarrow {\frac {\partial E}{\partial a_{2}}}$

$a_2 = f(z_2)={\frac {1}{1+e^{-z_2}}} \longrightarrow {\frac {\partial a_2}{\partial z_{2}}}$

$z_2 = a_1 * w_2 + b_2 \longrightarrow {\frac {\partial z_2}{\partial a_{1}}}$

$a_1 = f(z_1)={\frac {1}{1+e^{-z_1}}} \longrightarrow {\frac {\partial a_1}{\partial z_{1}}}$

$z_1 = x_1 * w_1 + b_1 \longrightarrow {\frac {\partial z_1}{\partial w_{1}}}$

---
${\frac {\partial E}{\partial w_{1}}} = {\frac {\partial E}{\partial a_{2}}} * {\frac {\partial a_2}{\partial z_{2}}} * {\frac {\partial z_2}{\partial a_{1}}} * {\frac {\partial a_1}{\partial z_{1}}} * {\frac {\partial z_1}{\partial w_{1}}}$

$= -(T - a_2) * (a_2(1 - a_2)) * (w_2) * (a_1(1 - a_1)) * x_1$

$w_{1} = w_{1} - LR * [-(T - a_2) * (a_2(1 - a_2)) * (w_2) * (a_1(1 - a_1)) * x_1]$

---
#### Updating $b_1$

$b_{1}= b_{1} - LR * {\frac {\partial E}{\partial b_{1}}}$

##### Chain rules

---
${\frac {\partial E}{\partial b_{1}}} = {\frac {\partial E}{\partial a_{2}}} * {\frac {\partial a_2}{\partial z_{2}}} * {\frac {\partial z_2}{\partial a_{1}}} * {\frac {\partial a_1}{\partial z_{1}}} * {\frac {\partial z_1}{\partial b_{1}}}$

$= -(T - a_2) * (a_2(1 - a_2)) * (w_2) * (a_1(1 - a_1)) * 1$

$b_{1} = b_{1} - LR * [-(T - a_2) * (a_2(1 - a_2)) * (w_2) * (a_1(1 - a_1)) * 1]$
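
One way to gain confidence in chain-rule products like these is to compare them with central finite differences of the error. The sketch below does this for the scalar network $x_1 \rightarrow a_1 \rightarrow a_2$. The function names and the parameter values are assumptions picked for the check, not values from this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x1):
    """Forward pass of the 1-input, 1-hidden-neuron, 1-output network."""
    w1, b1, w2, b2 = params
    a1 = sigmoid(x1 * w1 + b1)
    a2 = sigmoid(a1 * w2 + b2)
    return a1, a2

def loss(params, x1, T):
    _, a2 = forward(params, x1)
    return 0.5 * (T - a2) ** 2

def analytic_grads(params, x1, T):
    """Gradients from the chain rule, ordered as [dE/dw1, dE/db1, dE/dw2, dE/db2]."""
    w1, b1, w2, b2 = params
    a1, a2 = forward(params, x1)
    d2 = -(T - a2) * a2 * (1 - a2)   # dE/dz2, shared by the w2 and b2 gradients
    d1 = d2 * w2 * a1 * (1 - a1)     # dE/dz1, shared by the w1 and b1 gradients
    return np.array([d1 * x1, d1, d2 * a1, d2])

def numeric_grads(params, x1, T, eps=1e-6):
    """Central finite differences, one parameter at a time."""
    grads = np.zeros_like(params)
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = eps
        grads[i] = (loss(params + step, x1, T) - loss(params - step, x1, T)) / (2 * eps)
    return grads

params = np.array([0.3, -0.1, 0.7, 0.2])  # hypothetical w1, b1, w2, b2
x1, T = 0.5, 1.0                          # hypothetical input and target
print(analytic_grads(params, x1, T))
print(numeric_grads(params, x1, T))
```

If the two printed vectors agree to several decimal places, the analytic expressions are very likely correct.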

#### Calculating weights

$w_{2} = w_{2} - LR * [-(T - a_2) * (a_2(1-a_2)) * (a_1)]$

$w_{2} = 0.45 - 0.4 * 0.05706 = 0.427$

---

$b_{2} = b_{2} - LR * [-(T - a_2) * (a_2(1 - a_2)) * 1]$

$b_{2} = 0.65 - 0.4 * 0.0948 = 0.612$

---
$w_{1} = w_{1} - LR * [-(T - a_2) * (a_2(1 - a_2)) * (w_2) * (a_1(1 - a_1)) * x_1]$

$w_{1} = 0.15 - 0.4 * 0.001021 = 0.1496$

---
$b_{1} = b_{1} - LR * [-(T - a_2) * (a_2(1 - a_2)) * (w_2) * (a_1(1 - a_1)) * 1]$

$b_{1} = 0.40 - 0.4 * 0.01021 = 0.3959$
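
The numbers above can be re-traced with a few lines of Python. The text does not state the input and the target that were used. The values $x_1 = 0.1$ and $T = 0.25$ below are assumptions, chosen only because they reproduce the printed gradients up to rounding. The initial weights and $LR = 0.4$ are taken from the equations above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# initial parameters and learning rate as they appear in the equations above
w1, b1, w2, b2, lr = 0.15, 0.40, 0.45, 0.65, 0.4
# ASSUMED input and target (not given in the text)
x1, T = 0.1, 0.25

# forward pass
a1 = sigmoid(x1 * w1 + b1)
a2 = sigmoid(a1 * w2 + b2)

# partial derivatives from the chain rule
dE_db2 = -(T - a2) * a2 * (1 - a2)
dE_dw2 = dE_db2 * a1
dE_db1 = dE_db2 * w2 * a1 * (1 - a1)
dE_dw1 = dE_db1 * x1

# one gradient-descent step for every parameter
w2_new = w2 - lr * dE_dw2
b2_new = b2 - lr * dE_db2
w1_new = w1 - lr * dE_dw1
b1_new = b1 - lr * dE_db1

print(f'gradients: w2={dE_dw2:.5f} b2={dE_db2:.5f} w1={dE_dw1:.6f} b1={dE_db1:.5f}')
print(f'updated:   w2={w2_new:.4f} b2={b2_new:.4f} w1={w1_new:.4f} b1={b1_new:.4f}')
```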



### The algorithm

In [None]:
import numpy as np

# dataset

X = np.array([
[0.300, 0.250],
[1.000, 0.750],
[1.000, 0.500],
[0.350, 0.150],
[0.300, 0.350],
[0.050, 0.250],
[1.200, 0.700],
[0.800, 0.600]])

Y = np.array([
[0.000],
[1.000],
[1.000],
[0.000],
[0.000],
[0.000],
[1.000],
[1.000]])


In [None]:
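# set learning rate
lr = 0.8

# initial parameters
w12 = np.array([[0.5],[0.5]])   # weights input -> hidden neuron (one per input feature)
b1  = np.ones(shape=Y.shape)    # hidden-layer bias, one identical copy per sample
w3  = np.array([[0.5]])         # weight hidden neuron -> output neuron
b2  = np.ones(shape=Y.shape)    # output-layer bias, one identical copy per sample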

In [None]:
'''Note on broadcasting in numpy'''
# https://numpy.org/doc/stable/user/basics.broadcasting.html
# The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations

---
**Note on the code below:**\
$X[:,0]$ returns a 1-D array of shape (N,), which cannot be combined row by row (for example with hstack) with arrays of shape Nx1. $X[:,[0]]$ returns a column vector of shape Nx1, so it lines up with the other Nx1 arrays.

---
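
The difference between the two indexing forms is easy to check on a small array. In this sketch, `delta` is a hypothetical stand-in for any per-sample term of shape (N, 1), such as the error terms computed in the training loop.

```python
import numpy as np

X = np.array([[0.30, 0.25],
              [1.00, 0.75],
              [1.00, 0.50]])
delta = np.ones((3, 1))   # hypothetical (N, 1) per-sample term

print(X[:, 0].shape)      # (3,)   -> 1-D array
print(X[:, [0]].shape)    # (3, 1) -> column vector

# element-wise product with a column keeps one value per sample
print((delta * X[:, [0]]).shape)   # (3, 1)
# with the 1-D form, broadcasting silently builds an N x N matrix instead
print((delta * X[:, 0]).shape)     # (3, 3)
```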

In [None]:
# specify number of iterations
epochs = 50000
# a parameter used to print the first/last X epochs
epochs_print = 5

# main loop of the neural network:
    # - input layer, dimension=2, 
    # - 1 hidden layer, activation function = sigmoid
    # - output layer, activation function = sigmoid
for epoch in range(epochs):
    
    '''1st layer: neuron and activation function'''
    z1 = np.dot(X, w12) + b1
    a1 = 1/(1 + np.exp(-z1))
    
    '''2nd layer: neuron and activation function'''
    z2 = np.dot(a1, w3) + b2
    a2 = 1/(1 + np.exp(-z2))
    
    '''calculate partial derivatives (one row per sample)'''
    dE_dw3 = (-(Y - a2)) * (a2*(1 - a2)) * a1
    dE_db2 = (-(Y - a2)) * (a2*(1 - a2)) * 1
    dE_dw1 = (-(Y - a2)) * (a2*(1 - a2)) * w3 * a1*(1 - a1) * X[:,[0]]
    dE_dw2 = (-(Y - a2)) * (a2*(1 - a2)) * w3 * a1*(1 - a1) * X[:,[1]]
    dE_db1 = (-(Y - a2)) * (a2*(1 - a2)) * w3 * a1*(1 - a1) * 1
    
    '''update weights'''
    w12[0] = w12[0] - lr * sum(dE_dw1)
    w12[1] = w12[1] - lr * sum(dE_dw2)
    b1     = b1 - lr * sum(dE_db1)
    w3     = w3 - lr * sum(dE_dw3)
    b2     = b2 - lr * sum(dE_db2)       
    
    '''calculate error (cost function)'''
    Err = 0.5 * np.power(Y - a2, 2)
    #print(f'Errors of the {epoch+1}:\n{Err}')
    # print the first and the last `epochs_print` epochs
    if epoch < epochs_print or epoch >= (epochs - epochs_print):
        print(f'Epoch = {epoch+1} -> Sum of Error: {np.sum(Err)}\n')

In [None]:
'''Test model'''
X_test = np.array([
[0.300, 0.250],
[1.000, 0.750]])

# forward pass using the trained weights;
# every entry of b1 (and of b2) holds the same value, so the first entry serves as the bias
z1 = np.dot(X_test, w12) + b1[0]
a1 = 1/(1 + np.exp(-z1))
z2 = np.dot(a1, w3) + b2[0]
a2 = 1/(1 + np.exp(-z2))

print(f'Predicted value of the observation 1: {a2[0]}')
print(f'Predicted value of the observation 2: {a2[1]}')