# 🔁 Backpropagation: How Neural Networks Learn

---

## 🎯 What You’ll Learn

After completing this lesson, you will be able to:

- Explain how a neural network **trains by optimizing** its weights and biases
- Describe how to use **backpropagation** in a training algorithm
- Apply the **chain rule** to compute gradients of weights and biases

---

## 🧠 What is Backpropagation?

- **Backpropagation** is the algorithm that computes how much each weight and bias contributed to the **error** between the predicted output and the ground truth.
- It works by **propagating the error backward** through the network with the chain rule; **gradient descent** then uses the resulting gradients to **adjust the weights and biases** and reduce the loss.

---

## 🧪 The Training Process (Supervised Learning)

1. **Initialize weights and biases** randomly  
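2. **Forward Propagation**: Compute the prediction \( a_2 \), the activation of the output neuron  
3. **Compute Error**:  
   Use the squared error for a single training example (the form the **Mean Squared Error (MSE)** takes with one sample; the factor \( \frac{1}{2} \) cancels when differentiating):  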
   $$
   E = \frac{1}{2}(T - a_2)^2
   $$
4. **Backpropagation**:  
   Use the **chain rule** to compute partial derivatives and update:  
   - \( w_2, b_2 \)  
   - \( w_1, b_1 \)

---

## 🧩 Example: 2-Neuron Network (1 Hidden, 1 Output)

- Input: \( x_1 = 0.1 \)

- Initial weights & biases:  
  - \( w_1 = 0.15, \quad b_1 = 0.4 \)  
  - \( w_2 = 0.45, \quad b_2 = 0.65 \)

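🔸 **Forward Pass Results** (each neuron computes \( z = w \cdot \text{input} + b \) and applies the sigmoid \( a = 1 / (1 + e^{-z}) \)):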
- \( z_1 = 0.415, \quad a_1 = 0.6023 \)  
- \( z_2 = 0.921, \quad a_2 = 0.7153 \)  
- Ground truth: \( T = 0.25 \)
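
The forward pass values can be reproduced with a few lines of Python. This is a minimal sketch, assuming a sigmoid activation on both neurons (the lesson's derivative \( a_2 (1 - a_2) \) implies it) and \( b_2 = 0.65 \), the bias value that yields \( z_2 = 0.921 \). The helper name `sigmoid` and the variable names are illustrative, not part of the lesson.

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation (assumed; it reproduces the lesson's numbers)."""
    return 1.0 / (1.0 + math.exp(-z))

# Input, ground truth, and initial parameters from the example.
x1, T = 0.1, 0.25
w1, b1 = 0.15, 0.40
w2, b2 = 0.45, 0.65  # b2 = 0.65 is the value consistent with z2 = 0.921

# Hidden neuron
z1 = w1 * x1 + b1
a1 = sigmoid(z1)

# Output neuron
z2 = w2 * a1 + b2
a2 = sigmoid(z2)

# Squared error for this single example
E = 0.5 * (T - a2) ** 2

print(f"z1={z1:.4f} a1={a1:.4f} z2={z2:.4f} a2={a2:.4f} E={E:.4f}")
```

The printed activations should agree with the values listed above to about four decimal places.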

---

## 🔁 Backpropagation (Using Chain Rule)

**Derivatives:**

$$
\frac{\partial E}{\partial a_2} = -(T - a_2)
$$

$$
\frac{\partial a_2}{\partial z_2} = a_2 \cdot (1 - a_2)
$$

$$
\frac{\partial z_2}{\partial w_2} = a_1
$$



---

### 💡 Update Rule:

$$
w_2 = w_2 - \alpha \cdot \frac{\partial E}{\partial w_2}
$$

Where:

$$
\frac{\partial E}{\partial w_2} =
\frac{\partial E}{\partial a_2} \cdot
\frac{\partial a_2}{\partial z_2} \cdot
\frac{\partial z_2}{\partial w_2}
$$
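
The same pattern extends to the remaining parameters. The following is a sketch of those chains, assuming the forward pass is \( z_1 = w_1 x_1 + b_1 \) and \( z_2 = w_2 a_1 + b_2 \) with sigmoid activations (the lesson does not write these out, but they are consistent with its numbers):

$$
\frac{\partial E}{\partial b_2} =
\frac{\partial E}{\partial a_2} \cdot
\frac{\partial a_2}{\partial z_2} \cdot
\frac{\partial z_2}{\partial b_2},
\qquad
\frac{\partial z_2}{\partial b_2} = 1
$$

$$
\frac{\partial E}{\partial w_1} =
\frac{\partial E}{\partial a_2} \cdot
\frac{\partial a_2}{\partial z_2} \cdot
\frac{\partial z_2}{\partial a_1} \cdot
\frac{\partial a_1}{\partial z_1} \cdot
\frac{\partial z_1}{\partial w_1}
$$

$$
\frac{\partial z_2}{\partial a_1} = w_2,
\qquad
\frac{\partial a_1}{\partial z_1} = a_1 \cdot (1 - a_1),
\qquad
\frac{\partial z_1}{\partial w_1} = x_1,
\qquad
\frac{\partial z_1}{\partial b_1} = 1
$$

The chain for \( b_1 \) is the same as for \( w_1 \) with the last factor replaced by \( \frac{\partial z_1}{\partial b_1} = 1 \). Plugging in the example values for \( w_2 \) gives roughly \( 0.4653 \times 0.2037 \times 0.6023 \approx 0.0571 \).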

---

## 📊 Example Weight Updates (Using \( \alpha = 0.4 \)):

| Parameter | Gradient       | New Value     |
|-----------|----------------|---------------|
| \( w_2 \) | 0.05706        | 0.427         |
| \( b_2 \) | 0.0948         | 0.612         |
| \( w_1 \) | 0.001021       | 0.1496        |
| \( b_1 \) | 0.01021        | 0.3959        |
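
The table can be checked numerically. The sketch below assumes sigmoid activations and an initial \( b_2 = 0.65 \) (the value consistent with \( z_2 = 0.921 \) and the updated \( b_2 \) of 0.612); the function name `gradients` and the `delta1`/`delta2` variables are illustrative. All four gradients are computed from the original parameters before any of them is updated, which is why the gradient for \( w_1 \) uses the old \( w_2 = 0.45 \).

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation (assumed)."""
    return 1.0 / (1.0 + math.exp(-z))

def gradients(x1, T, w1, b1, w2, b2):
    """Return dE/dw1, dE/db1, dE/dw2, dE/db2 for the two-neuron network."""
    # Forward pass
    a1 = sigmoid(w1 * x1 + b1)
    a2 = sigmoid(w2 * a1 + b2)
    # dE/dz2 = dE/da2 * da2/dz2
    delta2 = -(T - a2) * a2 * (1.0 - a2)
    # dE/dz1 = dE/dz2 * dz2/da1 * da1/dz1
    delta1 = delta2 * w2 * a1 * (1.0 - a1)
    return {"w1": delta1 * x1, "b1": delta1, "w2": delta2 * a1, "b2": delta2}

params = {"w1": 0.15, "b1": 0.40, "w2": 0.45, "b2": 0.65}
alpha = 0.4  # learning rate from the example

grads = gradients(0.1, 0.25, **params)
# Gradient descent step applied to every parameter at once
updated = {name: params[name] - alpha * grads[name] for name in params}

for name in params:
    print(f"{name}: gradient={grads[name]:.6f} new={updated[name]:.4f}")
```

Small differences in the last digit relative to the table come from rounding intermediate values by hand.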


---

## 🔁 Repeat the Process

We repeat:
1. Forward Propagation
2. Compute Error
3. Backpropagation (update weights & biases)

Until:
- A fixed number of **epochs** is reached, or
- The **error falls below a threshold** (e.g., 0.001)
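
The loop and its two stopping conditions can be sketched as follows. This is an illustrative implementation for the single-example network above, assuming sigmoid activations; the function name `train` and its parameters `max_epochs` and `threshold` are hypothetical names chosen for this sketch.

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation (assumed)."""
    return 1.0 / (1.0 + math.exp(-z))

def train(x1, T, w1, b1, w2, b2, alpha=0.4, max_epochs=1000, threshold=0.001):
    """Run gradient descent on one example.

    Returns the parameters, the last error that was computed,
    and the number of epochs started.
    """
    for epoch in range(1, max_epochs + 1):
        # 1. Forward propagation
        a1 = sigmoid(w1 * x1 + b1)
        a2 = sigmoid(w2 * a1 + b2)
        # 2. Compute error and stop early once it is small enough
        error = 0.5 * (T - a2) ** 2
        if error < threshold:
            break
        # 3. Backpropagation: both deltas use the weights from before the update
        delta2 = -(T - a2) * a2 * (1.0 - a2)
        delta1 = delta2 * w2 * a1 * (1.0 - a1)
        w2, b2 = w2 - alpha * delta2 * a1, b2 - alpha * delta2
        w1, b1 = w1 - alpha * delta1 * x1, b1 - alpha * delta1
    return (w1, b1, w2, b2), error, epoch

params, error, epochs = train(0.1, 0.25, 0.15, 0.40, 0.45, 0.65)
print(f"stopped after {epochs} epochs with error {error:.6f}")
```

Checking the threshold before the update means the returned parameters are the ones that produced the reported error whenever the loop stops early.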

---

## ✅ Summary of Training Algorithm

1. **Initialize** weights and biases randomly
2. **Forward Propagation**: Compute prediction
3. **Calculate Error** using the cost function
4. **Backpropagation**: Compute gradients with the chain rule, then update weights and biases with gradient descent
5. **Repeat** until convergence (threshold or max epochs)

