
# Unit 1.3 - Backpropagation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/culurciello/deep-learning-course-source/blob/main/source/lectures/13-backpropagation.ipynb)

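Backpropagation is the algorithm most commonly used to compute gradients when training artificial neural networks. It applies the chain rule to propagate the output error backward through the network, which gives the gradient of the loss with respect to every weight and bias. In supervised learning, a gradient-descent update then uses those gradients to adjust the parameters and reduce the error between predicted and target outputs.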

## An example

Let's consider a simple two-layer neural network with two inputs (x1 and x2), two neurons in the first layer (hidden layer), and two neurons in the second layer (output layer). We'll use a mean squared error (MSE) loss function.

Here's the architecture:
- Inputs: x1, x2
- Hidden layer neurons: h1, h2
- Output layer neurons: o1, o2

![](images/backpropagation.png)

The forward pass can be expressed as follows:

$h_1 = w_{11} \cdot x_1 + w_{21} \cdot x_2 + b_1$

$h_2 = w_{12} \cdot x_1 + w_{22} \cdot x_2 + b_2$

$o_1 = w_{31} \cdot h_1 + w_{41} \cdot h_2 + b_3$

$o_2 = w_{32} \cdot h_1 + w_{42} \cdot h_2 + b_4$

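Here, $w$ denotes weights and $b$ denotes biases. In $w_{ij}$, the first index $i$ identifies the source unit (1 and 2 for the inputs $x_1$ and $x_2$, 3 and 4 for the hidden neurons $h_1$ and $h_2$) and the second index $j$ identifies the destination neuron in the next layer. To keep the algebra short, this example applies no nonlinear activation, so $h_1$ and $h_2$ are both the weighted sums and the outputs of the hidden neurons.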
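The forward pass above can be sketched in a few lines of plain Python. The function name `forward`, the dictionaries `w` and `b`, and all numeric values are hypothetical, chosen only for illustration; the hidden layer is taken to be linear, as in the equations.

```python
def forward(x1, x2, w, b):
    """Forward pass of the 2-2-2 network with identity activations."""
    h1 = w["w11"] * x1 + w["w21"] * x2 + b["b1"]
    h2 = w["w12"] * x1 + w["w22"] * x2 + b["b2"]
    o1 = w["w31"] * h1 + w["w41"] * h2 + b["b3"]
    o2 = w["w32"] * h1 + w["w42"] * h2 + b["b4"]
    return h1, h2, o1, o2


# Hypothetical parameters and inputs, for illustration only.
w = {"w11": 0.1, "w21": 0.2, "w12": 0.3, "w22": 0.4,
     "w31": 0.5, "w41": 0.6, "w32": 0.7, "w42": 0.8}
b = {"b1": 0.0, "b2": 0.0, "b3": 0.0, "b4": 0.0}

h1, h2, o1, o2 = forward(1.0, 2.0, w, b)
print(h1, h2, o1, o2)
```

With these values the hidden activations come out near 0.5 and 1.1, and the outputs near 0.91 and 1.23.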

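Let's assume the target outputs are $y_1$ and $y_2$. The MSE loss is given by:

$L = \frac{1}{2} \sum_{i=1}^{2} (y_i - o_i)^2$

The factor $\frac{1}{2}$ is the mean over the two outputs, and it also cancels the 2 produced by differentiating the square, so $\frac{\partial L}{\partial o_i} = o_i - y_i$.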
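As a small sketch, the loss and its derivative with respect to a single output can be written directly from the formula. The function names and the sample numbers are assumptions made for this example.

```python
def loss(o1, o2, y1, y2):
    """Loss L = 1/2 * sum of squared errors over the two outputs."""
    return 0.5 * ((y1 - o1) ** 2 + (y2 - o2) ** 2)


def dloss_doutput(o, y):
    """Partial derivative of L with respect to one output o_i."""
    return o - y


# Hypothetical outputs (0.91, 1.23) and targets (1.0, 0.0).
print(loss(0.91, 1.23, 1.0, 0.0))
print(dloss_doutput(0.91, 1.0), dloss_doutput(1.23, 0.0))
```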

Now, we want to minimize this loss using backpropagation.
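Before walking through the steps, it may help to see where the update rules come from. The following is a sketch of the chain rule for one output weight and one hidden weight, assuming the forward pass above with no nonlinearity:

$\frac{\partial L}{\partial w_{31}} = \frac{\partial L}{\partial o_1} \cdot \frac{\partial o_1}{\partial w_{31}} = (o_1 - y_1) \cdot h_1$

$\frac{\partial L}{\partial w_{11}} = \left( \frac{\partial L}{\partial o_1} \cdot \frac{\partial o_1}{\partial h_1} + \frac{\partial L}{\partial o_2} \cdot \frac{\partial o_2}{\partial h_1} \right) \cdot \frac{\partial h_1}{\partial w_{11}} = \left( (o_1 - y_1) \cdot w_{31} + (o_2 - y_2) \cdot w_{32} \right) \cdot x_1$

The hidden weight $w_{11}$ influences the loss through both outputs, which is why two terms are summed. The steps that follow give names to the reusable factors in these expressions: the error terms $\delta$.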

### 1. Compute the error terms at the output layer:

$\delta_{o1} = (o_1 - y_1) \cdot \sigma'(o_1)$

$\delta_{o2} = (o_2 - y_2) \cdot \sigma'(o_2)$

Here, $\sigma'$ is the derivative of the activation function used in the output layer (for simplicity, assume a linear activation, so $\sigma'(x) = 1$).
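A minimal sketch of this step in Python follows. The function name `output_deltas` and the `dsigma` parameter are hypothetical; `dsigma` defaults to the linear case $\sigma'(x) = 1$ assumed in the text, and another derivative could be passed in for a different output activation.

```python
def output_deltas(o1, o2, y1, y2, dsigma=lambda z: 1.0):
    """Error terms at the output layer.

    dsigma is the derivative of the output activation; the default
    corresponds to a linear output layer.
    """
    delta_o1 = (o1 - y1) * dsigma(o1)
    delta_o2 = (o2 - y2) * dsigma(o2)
    return delta_o1, delta_o2


# Hypothetical outputs (0.91, 1.23) and targets (1.0, 0.0).
delta_o1, delta_o2 = output_deltas(0.91, 1.23, 1.0, 0.0)
print(delta_o1, delta_o2)
```

A negative error term means the output is below its target, and a positive one means it is above.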

### 2. Compute the error terms at the hidden layer:

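$\delta_{h1} = (\delta_{o1} \cdot w_{31} + \delta_{o2} \cdot w_{32}) \cdot \sigma'(h_1)$

$\delta_{h2} = (\delta_{o1} \cdot w_{41} + \delta_{o2} \cdot w_{42}) \cdot \sigma'(h_2)$

Each hidden neuron collects the output error terms through the weights that leave it. Because the hidden layer in this example is linear, $\sigma'(h_1) = \sigma'(h_2) = 1$.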
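The same computation can be sketched in Python. The function name, the `dsigma` parameter, and the numeric values are assumptions for illustration; the output error terms used here are the hypothetical values $-0.09$ and $1.23$.

```python
def hidden_deltas(delta_o1, delta_o2, w31, w41, w32, w42, h1, h2,
                  dsigma=lambda z: 1.0):
    """Error terms at the hidden layer.

    Each hidden neuron sums the output error terms weighted by its
    outgoing connections, then scales by the activation derivative.
    """
    delta_h1 = (delta_o1 * w31 + delta_o2 * w32) * dsigma(h1)
    delta_h2 = (delta_o1 * w41 + delta_o2 * w42) * dsigma(h2)
    return delta_h1, delta_h2


# Hypothetical values, for illustration only.
delta_h1, delta_h2 = hidden_deltas(
    delta_o1=-0.09, delta_o2=1.23,
    w31=0.5, w41=0.6, w32=0.7, w42=0.8,
    h1=0.5, h2=1.1,
)
print(delta_h1, delta_h2)
```

Note that these error terms must be computed with the output weights as they were during the forward pass, before any update is applied.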


### 3. Update the weights using the error terms and learning rate ($\alpha$):

$w_{31}^{new} = w_{31} - \alpha \cdot \delta_{o1} \cdot h_1$

$w_{41}^{new} = w_{41} - \alpha \cdot \delta_{o1} \cdot h_2$

$w_{32}^{new} = w_{32} - \alpha \cdot \delta_{o2} \cdot h_1$

$w_{42}^{new} = w_{42} - \alpha \cdot \delta_{o2} \cdot h_2$

$w_{11}^{new} = w_{11} - \alpha \cdot \delta_{h1} \cdot x_1$

$w_{21}^{new} = w_{21} - \alpha \cdot \delta_{h1} \cdot x_2$

$w_{12}^{new} = w_{12} - \alpha \cdot \delta_{h2} \cdot x_1$

$w_{22}^{new} = w_{22} - \alpha \cdot \delta_{h2} \cdot x_2$
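In each update, the product of an error term and the input to the weight is the gradient of the loss with respect to that weight. One way to gain confidence in these formulas is a finite-difference check, sketched below for $w_{31}$ and $w_{11}$. All names and values are hypothetical, biases are taken as zero, and activations are assumed linear.

```python
def forward(w, x1, x2):
    """Forward pass with identity activations; biases taken as zero."""
    h1 = w["w11"] * x1 + w["w21"] * x2
    h2 = w["w12"] * x1 + w["w22"] * x2
    o1 = w["w31"] * h1 + w["w41"] * h2
    o2 = w["w32"] * h1 + w["w42"] * h2
    return h1, h2, o1, o2


def loss(w, x1, x2, y1, y2):
    _, _, o1, o2 = forward(w, x1, x2)
    return 0.5 * ((y1 - o1) ** 2 + (y2 - o2) ** 2)


def numeric_grad(w, name, x1, x2, y1, y2, eps=1e-6):
    """Central finite-difference estimate of dL/dw[name]."""
    up = dict(w, **{name: w[name] + eps})
    down = dict(w, **{name: w[name] - eps})
    return (loss(up, x1, x2, y1, y2) - loss(down, x1, x2, y1, y2)) / (2 * eps)


# Hypothetical parameters, inputs and targets.
w = {"w11": 0.1, "w21": 0.2, "w12": 0.3, "w22": 0.4,
     "w31": 0.5, "w41": 0.6, "w32": 0.7, "w42": 0.8}
x1, x2, y1, y2 = 1.0, 2.0, 1.0, 0.0

h1, h2, o1, o2 = forward(w, x1, x2)
delta_o1, delta_o2 = o1 - y1, o2 - y2
delta_h1 = delta_o1 * w["w31"] + delta_o2 * w["w32"]

# Gradients from the backpropagation formulas.
analytic = {"w31": delta_o1 * h1, "w11": delta_h1 * x1}

for name, value in analytic.items():
    print(name, value, numeric_grad(w, name, x1, x2, y1, y2))
```

The two numbers printed on each line should agree closely; a large gap would point to an error in the derivation or the code.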


### 4. Update the biases similarly:

$b_3^{new} = b_3 - \alpha \cdot \delta_{o1}$

$b_4^{new} = b_4 - \alpha \cdot \delta_{o2}$

$b_1^{new} = b_1 - \alpha \cdot \delta_{h1}$

$b_2^{new} = b_2 - \alpha \cdot \delta_{h2}$
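Putting the four steps together, one complete update might look like the sketch below. The function `train_step`, the parameter dictionary `p`, and all numeric values are hypothetical; activations are assumed linear, so every $\sigma'$ factor equals 1.

```python
def train_step(p, x1, x2, y1, y2, alpha):
    """One backpropagation update for the 2-2-2 linear network.

    p maps parameter names (w11..w42, b1..b4) to floats.
    Returns a new dictionary and leaves p unchanged.
    """
    # Forward pass.
    h1 = p["w11"] * x1 + p["w21"] * x2 + p["b1"]
    h2 = p["w12"] * x1 + p["w22"] * x2 + p["b2"]
    o1 = p["w31"] * h1 + p["w41"] * h2 + p["b3"]
    o2 = p["w32"] * h1 + p["w42"] * h2 + p["b4"]

    # Step 1: output error terms.
    d_o1 = o1 - y1
    d_o2 = o2 - y2

    # Step 2: hidden error terms, using the not-yet-updated weights.
    d_h1 = d_o1 * p["w31"] + d_o2 * p["w32"]
    d_h2 = d_o1 * p["w41"] + d_o2 * p["w42"]

    # Steps 3 and 4: gradient of the loss for every weight and bias.
    grads = {
        "w31": d_o1 * h1, "w41": d_o1 * h2,
        "w32": d_o2 * h1, "w42": d_o2 * h2,
        "w11": d_h1 * x1, "w21": d_h1 * x2,
        "w12": d_h2 * x1, "w22": d_h2 * x2,
        "b3": d_o1, "b4": d_o2, "b1": d_h1, "b2": d_h2,
    }
    return {name: value - alpha * grads[name] for name, value in p.items()}


# Hypothetical starting parameters, for illustration only.
p = {"w11": 0.1, "w21": 0.2, "w12": 0.3, "w22": 0.4,
     "w31": 0.5, "w41": 0.6, "w32": 0.7, "w42": 0.8,
     "b1": 0.0, "b2": 0.0, "b3": 0.0, "b4": 0.0}

new_p = train_step(p, x1=1.0, x2=2.0, y1=1.0, y2=0.0, alpha=0.1)
print(new_p)
```

Collecting all gradients before touching any parameter keeps the update consistent: every gradient refers to the same forward pass.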

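This process is repeated, typically over many training examples and many passes through the dataset, until the loss stops decreasing. The learning rate ($\alpha$) is a hyperparameter that sets the step size of each weight and bias update. If it is too small, training is slow; if it is too large, the updates can overshoot and the loss can grow instead of shrinking. It usually has to be tuned for the specific problem and dataset.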
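To illustrate the iteration and the role of the learning rate, here is a minimal sketch that trains the network on a single example and records the loss before each update. The function `train`, the list-based weight layout, and the starting values are assumptions for this example; activations are again taken to be linear.

```python
def train(alpha, steps, x=(1.0, 2.0), y=(1.0, 0.0)):
    """Run gradient descent on one example; return the loss before each update."""
    # W1[j][i]: weight from input i to hidden unit j.
    # W2[k][j]: weight from hidden unit j to output k.
    W1 = [[0.1, 0.2], [0.3, 0.4]]
    W2 = [[0.5, 0.6], [0.7, 0.8]]
    b1, b2 = [0.0, 0.0], [0.0, 0.0]
    losses = []
    for _ in range(steps):
        # Forward pass (identity activations).
        h = [sum(W1[j][i] * x[i] for i in range(2)) + b1[j] for j in range(2)]
        o = [sum(W2[k][j] * h[j] for j in range(2)) + b2[k] for k in range(2)]
        losses.append(0.5 * sum((y[k] - o[k]) ** 2 for k in range(2)))

        # Backward pass: error terms use the weights from before the update.
        d_o = [o[k] - y[k] for k in range(2)]
        d_h = [sum(d_o[k] * W2[k][j] for k in range(2)) for j in range(2)]

        # Gradient-descent updates for weights and biases.
        for k in range(2):
            for j in range(2):
                W2[k][j] -= alpha * d_o[k] * h[j]
            b2[k] -= alpha * d_o[k]
        for j in range(2):
            for i in range(2):
                W1[j][i] -= alpha * d_h[j] * x[i]
            b1[j] -= alpha * d_h[j]
    return losses


small = train(alpha=0.05, steps=50)
large = train(alpha=0.5, steps=2)
print("alpha=0.05:", small[0], "->", small[1], "... ->", small[-1])
print("alpha=0.5: ", large[0], "->", large[1])
```

With these particular starting values, the first update at $\alpha = 0.05$ lowers the loss, while the first update at $\alpha = 0.5$ overshoots and raises it.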
