<h1 style="position:relative; text-align:center">Backpropagation algorithm</h1> 

## Review of neural networks

<img src="https://th.bing.com/th/id/OIP.GOmyJGzqoxcPYnEpXoU_kAHaD2?rs=1&pid=ImgDetMain" style="display: block;
margin-left: auto;
margin-right: auto;
width: 50%;">

## MLP algorithm

**Overview**

The MLP, or multilayer perceptron, is a type of artificial neural network that can classify data that is not linearly separable. It is a supervised learning algorithm that learns a function $$f(\cdot): \mathbb{R}^m \to \mathbb{R}^o$$

where $m$ is the dimension of the input layer and $o$ is the dimension of the output layer.

The key characteristics of an MLP include:

- **Fully Connected Layers:** Every neuron in one layer connects to every neuron in the following layer, and each connection has its own learnable weight.

- **Activation Function:** Nonlinear functions that help the network learn complex patterns.

- **Backpropagation:** The method used for training the network, involving forward propagation of inputs and backward propagation of errors.
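The first two characteristics can be sketched in a few lines of NumPy. This is a minimal illustration, not the network trained later in this notebook: the helper name `dense_layer`, the layer sizes `m`, `h`, `o`, and the random inputs are all assumptions chosen for the example.

```python
import numpy as np

def dense_layer(x, weights, bias):
    """Fully connected layer followed by a sigmoid activation."""
    z = x @ weights + bias            # every input connects to every output unit
    return 1.0 / (1.0 + np.exp(-z))   # nonlinearity applied element-wise

rng = np.random.default_rng(0)
m, h, o = 3, 4, 2                     # hypothetical input, hidden, and output sizes

x = rng.normal(size=(5, m))           # batch of 5 samples in R^m
W1, b1 = rng.normal(size=(m, h)), np.zeros((1, h))
W2, b2 = rng.normal(size=(h, o)), np.zeros((1, o))

# Two stacked layers map each sample from R^m to R^o
y = dense_layer(dense_layer(x, W1, b1), W2, b2)
print(y.shape)
```

Each row of `y` is the network's output for one input sample, so the stacked layers realize a function from $\mathbb{R}^m$ to $\mathbb{R}^o$.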

## What is Backpropagation?

Backpropagation is a method for computing the gradient of the loss function with respect to each weight of the network. Each weight is then adjusted in proportion to its gradient, that is, to how much it contributes to the network's total prediction error. By repeatedly updating the weights so that the prediction error decreases, we can usually arrive at a set of weights that makes sufficiently good predictions.
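To make the update step concrete, here is a minimal sketch with a single weight. The toy model `prediction = w * x`, the squared-error loss, and the values of `x`, `y`, and `learning_rate` are assumptions chosen for illustration; a real network applies the same idea to every weight at once.

```python
# Toy model (an assumption for illustration):
# prediction = w * x, loss = (prediction - y) ** 2
x, y = 2.0, 4.0
w = 0.0
learning_rate = 0.1

for step in range(20):
    prediction = w * x
    gradient = 2.0 * (prediction - y) * x   # dLoss/dw via the chain rule
    w -= learning_rate * gradient           # move against the gradient

print(w)
```

With these values the weight moves toward 2.0, the value at which `w * x` equals `y` and the loss is zero.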

### Chain rule

$$ f(x) = A(B(C(x))) \;\Longrightarrow\; f'(x) = A'(B(C(x))) \cdot B'(C(x)) \cdot C'(x) $$

Example:

- $A(z) = 2z+1 \Rightarrow A'(z) = 2$
- $z(x) = 3x+3 \Rightarrow z'(x) = 3$

Then, by the chain rule, $\frac{dA}{dx} = A'(z) \cdot z'(x) = 2 \cdot 3 = 6$.
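The example above can be checked numerically. The sketch below compares the chain-rule result with a central finite difference; the evaluation point `x0` and step size `h` are arbitrary choices made for this check.

```python
def z(x):
    return 3 * x + 3

def A(z_value):
    return 2 * z_value + 1

def composite(x):
    # A as a function of x, obtained by composing A with z
    return A(z(x))

x0 = 1.5      # arbitrary evaluation point
h = 1e-5      # finite-difference step

# Central difference approximation of d(composite)/dx at x0
numeric_derivative = (composite(x0 + h) - composite(x0 - h)) / (2 * h)

# Chain rule: A'(z) * z'(x) = 2 * 3
analytic_derivative = 2 * 3

print(numeric_derivative, analytic_derivative)
```

The two values agree up to floating-point rounding, because the composite function is linear in `x`.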


<img src="https://github.com/PhuongNam2k4/Study/assets/156770604/9ec4617f-161a-4972-8f5e-c430cf71dbfb" style="display: block;
margin-left: auto;
margin-right: auto;
width: 50%;">

## Implementation


In [10]:
import numpy as np


In [11]:
# sigmoid
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

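# derivative of the sigmoid, written in terms of the sigmoid output:
# x must already be sigmoid(z), not the pre-activation z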
def sigmoid_derivative(x):
    return x * (1 - x)

# initialize the network weights and biases
def initialize_network(input_size, hidden_size, output_size):
    np.random.seed(1)
    hidden_weights = np.random.uniform(size=(input_size, hidden_size))
    hidden_bias = np.random.uniform(size=(1, hidden_size))
    output_weights = np.random.uniform(size=(hidden_size, output_size))
    output_bias = np.random.uniform(size=(1, output_size))
    return hidden_weights, hidden_bias, output_weights, output_bias
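One detail of `sigmoid_derivative` is easy to miss: the expression `x * (1 - x)` is the derivative of the sigmoid only when `x` is already the sigmoid output. The following self-contained check illustrates this by comparing against a finite-difference estimate; the sample points `z` and the step `eps` are assumptions made for the check.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(activated):
    # Expects the sigmoid output s = sigmoid(z); returns s * (1 - s)
    return activated * (1.0 - activated)

z = np.linspace(-3.0, 3.0, 7)   # sample pre-activation values
eps = 1e-6

# Central finite-difference estimate of d sigmoid / dz
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

# Analytic derivative, evaluated on the sigmoid output
analytic = sigmoid_derivative(sigmoid(z))

max_abs_diff = np.max(np.abs(numeric - analytic))
print(max_abs_diff)
```

Passing the raw pre-activation `z` instead of `sigmoid(z)` would give a different, incorrect result, which is why the training code applies `sigmoid_derivative` to layer outputs.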


In [12]:
def forward_propagate(inputs, hidden_weights, hidden_bias, output_weights, output_bias):
    hidden_layer_activation = np.dot(inputs, hidden_weights) + hidden_bias
    hidden_layer_output = sigmoid(hidden_layer_activation)
    
    output_layer_activation = np.dot(hidden_layer_output, output_weights) + output_bias
    predicted_output = sigmoid(output_layer_activation)
    
    return hidden_layer_output, predicted_output


In [13]:
def backpropagate(inputs, hidden_layer_output, predicted_output, actual_output, hidden_weights, hidden_bias, output_weights, output_bias, learning_rate):
    # output layer: prediction error scaled by the slope of the sigmoid
    error = actual_output - predicted_output
    d_predicted_output = error * sigmoid_derivative(predicted_output)
    
    # hidden layer: propagate the output delta back through the output weights
    error_hidden_layer = d_predicted_output.dot(output_weights.T)
    d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_layer_output)
    
    # update weights and biases; "+=" is a descent step because error = actual - predicted
    output_weights += hidden_layer_output.T.dot(d_predicted_output) * learning_rate
    output_bias += np.sum(d_predicted_output, axis=0, keepdims=True) * learning_rate
    hidden_weights += inputs.T.dot(d_hidden_layer) * learning_rate
    hidden_bias += np.sum(d_hidden_layer, axis=0, keepdims=True) * learning_rate
    
    return hidden_weights, hidden_bias, output_weights, output_bias
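A common way to gain confidence in backpropagation formulas like the ones in `backpropagate` is a numerical gradient check. The sketch below is a self-contained illustration, not part of the notebook's training code: it assumes the loss is half the sum of squared errors, uses `predicted - targets` (the opposite sign of `error` above, so that the result is the gradient itself), and introduces the hypothetical helpers `loss_fn` and `numeric_grad`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss_fn(inputs, targets, W1, b1, W2, b2):
    """Half sum-of-squares loss (an assumption for this check)."""
    hidden = sigmoid(inputs @ W1 + b1)
    predicted = sigmoid(hidden @ W2 + b2)
    return 0.5 * np.sum((targets - predicted) ** 2)

rng = np.random.default_rng(0)
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([[0], [1], [1], [0]], dtype=float)
W1, b1 = rng.uniform(size=(2, 2)), rng.uniform(size=(1, 2))
W2, b2 = rng.uniform(size=(2, 1)), rng.uniform(size=(1, 1))

# Analytic gradients via backpropagation
hidden = sigmoid(inputs @ W1 + b1)
predicted = sigmoid(hidden @ W2 + b2)
delta_out = (predicted - targets) * predicted * (1 - predicted)
grad_W2 = hidden.T @ delta_out
delta_hidden = (delta_out @ W2.T) * hidden * (1 - hidden)
grad_W1 = inputs.T @ delta_hidden

def numeric_grad(param, eps=1e-6):
    """Central-difference gradient of loss_fn w.r.t. one parameter array.

    Perturbs `param` in place, one entry at a time, and restores it.
    """
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        original = param[idx]
        param[idx] = original + eps
        plus = loss_fn(inputs, targets, W1, b1, W2, b2)
        param[idx] = original - eps
        minus = loss_fn(inputs, targets, W1, b1, W2, b2)
        param[idx] = original
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

max_diff_W1 = np.max(np.abs(numeric_grad(W1) - grad_W1))
max_diff_W2 = np.max(np.abs(numeric_grad(W2) - grad_W2))
print(max_diff_W1, max_diff_W2)
```

Both differences should be tiny, which indicates that the delta terms computed layer by layer agree with the true gradient of the assumed loss.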


In [14]:
def train(inputs, actual_output, input_size, hidden_size, output_size, epochs, learning_rate):
    hidden_weights, hidden_bias, output_weights, output_bias = initialize_network(input_size, hidden_size, output_size)
    
    for epoch in range(epochs):
        hidden_layer_output, predicted_output = forward_propagate(inputs, hidden_weights, hidden_bias, output_weights, output_bias)
        hidden_weights, hidden_bias, output_weights, output_bias = backpropagate(
            inputs, hidden_layer_output, predicted_output, actual_output, hidden_weights, hidden_bias, output_weights, output_bias, learning_rate)
        
        if epoch % 1000 == 0:
            loss = np.mean(np.square(actual_output - predicted_output))
            print(f'Epoch {epoch} loss: {loss}')
    
    return hidden_weights, hidden_bias, output_weights, output_bias


In [15]:
# training data: the XOR truth table
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
actual_output = np.array([[0], [1], [1], [0]])

# network size and training hyperparameters
input_size = 2
hidden_size = 2
output_size = 1
epochs = 10000
learning_rate = 0.1

# train
hidden_weights, hidden_bias, output_weights, output_bias = train(inputs, actual_output, input_size, hidden_size, output_size, epochs, learning_rate)

# evaluate
hidden_layer_output, predicted_output = forward_propagate(inputs, hidden_weights, hidden_bias, output_weights, output_bias)
print("Predicted Output: \n", predicted_output)


Epoch 0 loss: 0.28014363590911784
Epoch 1000 loss: 0.24971731456941582
Epoch 2000 loss: 0.24834256246704567
Epoch 3000 loss: 0.23517320796329574
Epoch 4000 loss: 0.18761745745277092
Epoch 5000 loss: 0.11376191085651667
Epoch 6000 loss: 0.028482725248000107
Epoch 7000 loss: 0.012042153641357224
Epoch 8000 loss: 0.007162910776260677
Epoch 9000 loss: 0.0049834071198557265
Predicted Output: 
 [[0.06367371]
 [0.94086271]
 [0.94109457]
 [0.06401166]]
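The outputs are continuous values between 0 and 1, so a final step is needed to read them as class labels. The sketch below copies the predictions printed above and applies a 0.5 threshold; the threshold and the names `predicted_labels` and `accuracy` are assumptions for illustration, not part of the original training code.

```python
import numpy as np

# Predictions copied from the printed output above
predicted_output = np.array([[0.06367371],
                             [0.94086271],
                             [0.94109457],
                             [0.06401166]])
actual_output = np.array([[0], [1], [1], [0]])

# Assumed decision rule: values above 0.5 are read as class 1
predicted_labels = (predicted_output > 0.5).astype(int)
accuracy = np.mean(predicted_labels == actual_output)

print(predicted_labels.ravel(), accuracy)
```

Under this rule all four XOR inputs are classified correctly.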
