## **1. Backpropagation**

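Backpropagation (short for *backward propagation of errors*) is the **learning algorithm** used to train neural networks. Strictly speaking, backpropagation computes the gradients, and an optimizer such as gradient descent uses them to update the weights. In practice the two are usually described together.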

It answers one big question:

> “Given my network’s prediction and the true label, how do I adjust **each weight and bias** so that the next prediction is a little bit better?”

It works by:

1. Running a **forward pass** to compute the output and loss.
2. Running a **backward pass** to compute the gradient of the loss with respect to each weight and bias, i.e. how much each one contributed to the error.
3. Updating weights using **gradient descent**.
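
The three steps can be sketched on the smallest possible model: a single weight `w`, prediction `w * x`, and squared-error loss. This is a minimal illustration, not a neural network; the values of `x`, `y`, `w`, and the learning rate are arbitrary assumptions.

```python
# Minimal sketch: one training step = forward pass, backward pass, update.
# Assumed toy model: prediction = w * x, loss = 0.5 * (prediction - y) ** 2.
x, y = 2.0, 3.0   # one (input, target) pair, chosen arbitrarily
w = 0.5           # initial weight (assumed)
lr = 0.1          # learning rate (assumed)

def train_step(w, x, y, lr):
    # 1. Forward pass: prediction and loss
    pred = w * x
    loss = 0.5 * (pred - y) ** 2
    # 2. Backward pass: dL/dw = dL/dpred * dpred/dw = (pred - y) * x
    grad_w = (pred - y) * x
    # 3. Gradient-descent update
    return w - lr * grad_w, loss

for step in range(3):
    w, loss = train_step(w, x, y, lr)
    print(f"step {step}: loss before update = {loss:.4f}, new w = {w:.4f}")
```

Each step moves `w` toward the value 1.5 that makes `w * x` equal `y`, and the loss shrinks accordingly.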

---

## **2. Why We Need It**

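In a neural network, the layers are chained together, so every weight influences the loss through all the layers that follow it:

* The output layer depends on the hidden layer.
* The hidden layer depends on the previous layer (or, for the first hidden layer, on the input).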
* Changing one weight affects many downstream activations.

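To train such a network, we need to know:

* **How sensitive** the loss is to each weight, i.e. how much the loss changes when that weight is nudged slightly.
* These sensitivities are the **gradients**. Backprop computes all of them efficiently using the **chain rule of calculus**, reusing intermediate results from later layers instead of differentiating with respect to each weight from scratch.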
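
To make "sensitivity" concrete, the sketch below estimates it in two ways for a one-weight sigmoid model: by nudging the weight and measuring the change in loss (a central finite difference), and analytically with the chain rule. The model, its values, and `eps` are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, y):
    # Assumed toy model: a = sigmoid(w * x), L = 0.5 * (a - y) ** 2
    return 0.5 * (sigmoid(w * x) - y) ** 2

def grad_chain_rule(w, x, y):
    # Chain rule: dL/dw = dL/da * da/dz * dz/dw
    z = w * x
    a = sigmoid(z)
    dL_da = a - y
    da_dz = a * (1.0 - a)
    dz_dw = x
    return dL_da * da_dz * dz_dw

def grad_finite_difference(w, x, y, eps=1e-6):
    # Central difference: how much L changes when w is nudged up and down
    return (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2 * eps)

w, x, y = 0.8, 1.5, 1.0
analytic = grad_chain_rule(w, x, y)
numeric = grad_finite_difference(w, x, y)
print(f"chain rule: {analytic:.8f}, finite difference: {numeric:.8f}")
```

The two numbers agree closely. The finite difference needs two extra loss evaluations per weight, which becomes impractical with millions of weights; the chain rule, applied layer by layer, yields every partial derivative in a single backward pass.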

---

## **3. The Core Idea**

Consider a simple 2-layer network with input $x$ (a column vector), hidden-layer activation function $f$, and output activation function $g$:

**Forward pass:**

1. Hidden layer pre-activation:

   $$
   z^{[1]} = W^{[1]}x + b^{[1]}
   $$
2. Hidden layer activation:

   $$
   a^{[1]} = f(z^{[1]})
   $$
3. Output pre-activation:

   $$
   z^{[2]} = W^{[2]}a^{[1]} + b^{[2]}
   $$
4. Output activation (prediction):

   $$
   a^{[2]} = g(z^{[2]})
   $$
5. Loss:

   $$
   L = \text{loss}(y, a^{[2]})
   $$
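
The five forward-pass equations translate almost line for line into NumPy. This sketch handles one example at a time; the layer sizes, the random initialization, the choice $f = g = \text{sigmoid}$, and the binary cross-entropy loss are all assumptions, since the text leaves $f$, $g$, and the loss unspecified.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, y, params):
    """Forward pass of the 2-layer network for one example.

    Assumptions for this sketch: f = g = sigmoid and the loss is binary
    cross-entropy; x is a 1-D array of shape (n_x,).
    """
    W1, b1, W2, b2 = params["W1"], params["b1"], params["W2"], params["b2"]
    z1 = W1 @ x + b1          # hidden pre-activation, shape (n_h,)
    a1 = sigmoid(z1)          # hidden activation, shape (n_h,)
    z2 = W2 @ a1 + b2         # output pre-activation, shape (n_y,)
    a2 = sigmoid(z2)          # prediction, shape (n_y,)
    loss = -np.sum(y * np.log(a2) + (1 - y) * np.log(1 - a2))
    cache = {"x": x, "z1": z1, "a1": a1, "z2": z2, "a2": a2}
    return loss, cache

rng = np.random.default_rng(0)
n_x, n_h, n_y = 3, 4, 1                  # assumed layer sizes
params = {
    "W1": rng.normal(scale=0.5, size=(n_h, n_x)),
    "b1": np.zeros(n_h),
    "W2": rng.normal(scale=0.5, size=(n_y, n_h)),
    "b2": np.zeros(n_y),
}
x = np.array([0.5, -1.0, 2.0])
y = np.array([1.0])
loss, cache = forward(x, y, params)
print("prediction:", cache["a2"], "loss:", loss)
```

The intermediate values are kept in `cache` because the backward pass needs $z^{[1]}$, $a^{[1]}$, and $a^{[2]}$ again.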

---

**Backward pass:**
We want $\frac{\partial L}{\partial W^{[1]}}, \frac{\partial L}{\partial b^{[1]}}, \frac{\partial L}{\partial W^{[2]}}, \frac{\partial L}{\partial b^{[2]}}$.

We apply the **chain rule**:

1. **Output layer**:

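   $$
   \delta^{[2]} = \frac{\partial L}{\partial z^{[2]}} = a^{[2]} - y
   $$

   This simple form holds when the output activation and the loss are matched, e.g. a sigmoid output with binary cross-entropy, a softmax output with categorical cross-entropy, or a linear output with $L = \tfrac{1}{2}\lVert a^{[2]} - y \rVert^2$. In general, $\delta^{[2]} = \frac{\partial L}{\partial a^{[2]}} \odot g'(z^{[2]})$.

   $$
   \frac{\partial L}{\partial W^{[2]}} = \delta^{[2]} (a^{[1]})^\top
   $$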

   $$
   \frac{\partial L}{\partial b^{[2]}} = \delta^{[2]}
   $$

2. **Hidden layer**:
   Backpropagate the error to the hidden layer:

   $$
   \delta^{[1]} = \left((W^{[2]})^\top \delta^{[2]}\right) \odot f'(z^{[1]})
   $$

   $$
   \frac{\partial L}{\partial W^{[1]}} = \delta^{[1]} x^\top
   $$

   $$
   \frac{\partial L}{\partial b^{[1]}} = \delta^{[1]}
   $$

Here:

* $\odot$ means element-wise multiplication.
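* $f'(z)$ is the derivative of the activation function.
* $(\cdot)^\top$ denotes the transpose; each weight gradient has the same shape as the weight matrix it belongs to.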
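
The NumPy sketch below implements the backward pass for one example and checks it against finite differences. Assumptions not stated in the text: $f$ and $g$ are both sigmoid and the loss is binary cross-entropy, a pairing under which $\delta^{[2]} = a^{[2]} - y$ holds exactly. Because the forward pass is written as $z = Wx + b$, each weight gradient is the outer product of the layer's $\delta$ with that layer's input (`np.outer(delta, layer_input)`), which has the same shape as $W$, and the error reaches the hidden layer through the transpose of $W^{[2]}$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, y, p):
    # Assumed setup: f = g = sigmoid, binary cross-entropy loss, one example.
    z1 = p["W1"] @ x + p["b1"]
    a1 = sigmoid(z1)
    z2 = p["W2"] @ a1 + p["b2"]
    a2 = sigmoid(z2)
    loss = -np.sum(y * np.log(a2) + (1 - y) * np.log(1 - a2))
    return loss, (z1, a1, a2)

def backward(x, y, p, cache):
    z1, a1, a2 = cache
    # Output layer: for sigmoid + binary cross-entropy, dL/dz2 = a2 - y
    delta2 = a2 - y
    dW2 = np.outer(delta2, a1)          # shape (n_y, n_h), same as W2
    db2 = delta2
    # Hidden layer: send delta2 back through W2 transposed, gate by f'(z1)
    s1 = sigmoid(z1)
    delta1 = (p["W2"].T @ delta2) * s1 * (1 - s1)
    dW1 = np.outer(delta1, x)           # shape (n_h, n_x), same as W1
    db1 = delta1
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

def numerical_grads(x, y, p, eps=1e-6):
    # Central finite differences, one parameter entry at a time (slow; for checking only)
    grads = {}
    for name, value in p.items():
        g = np.zeros_like(value)
        for idx in np.ndindex(value.shape):
            orig = value[idx]
            value[idx] = orig + eps
            plus, _ = forward(x, y, p)
            value[idx] = orig - eps
            minus, _ = forward(x, y, p)
            value[idx] = orig          # restore the original entry
            g[idx] = (plus - minus) / (2 * eps)
        grads[name] = g
    return grads

rng = np.random.default_rng(1)
n_x, n_h, n_y = 3, 4, 1                 # assumed layer sizes
p = {
    "W1": rng.normal(scale=0.5, size=(n_h, n_x)),
    "b1": rng.normal(scale=0.1, size=n_h),
    "W2": rng.normal(scale=0.5, size=(n_y, n_h)),
    "b2": rng.normal(scale=0.1, size=n_y),
}
x = np.array([0.5, -1.0, 2.0])
y = np.array([1.0])

loss, cache = forward(x, y, p)
analytic = backward(x, y, p, cache)
numeric = numerical_grads(x, y, p)
for name in p:
    print(name, "max abs difference:", np.max(np.abs(analytic[name] - numeric[name])))
```

The differences should be tiny (on the order of finite-difference error), which is a common way to sanity-check a hand-written backward pass.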

---

## **4. Why the Name "Backpropagation"**

* We **propagate forward** to get predictions.
* We **propagate** the error signal **backward**, layer by layer, computing gradients as we go.
* The process is "backward" because we start at the output layer and work in reverse toward the input, using each layer's $\delta$ to compute the $\delta$ of the layer before it.

---

## **5. A Tiny Intuition Example**

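Imagine predicting house prices with 2 features:

* If the prediction is too high, we want to adjust the weights that produced it so that the output goes **down**.
* If the prediction is too low, we want to adjust them so that the output goes **up**.
* Backprop tells us the **direction and relative size** of the change for each weight (its gradient); the learning rate then sets how large a step we actually take.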
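
A minimal sketch of this intuition, assuming a plain linear price model (no hidden layer, so the "backward pass" is a single chain-rule step) with squared-error loss. The two features, the weights, the true price, and the learning rate are all made-up numbers.

```python
import numpy as np

# Assumed toy setup: linear price model with 2 features and squared-error loss.
# Features (size in 1000 sq ft, number of rooms) and all numbers are made up.
x = np.array([1.5, 3.0])
w = np.array([100.0, 20.0])   # weights (price units per feature unit)
b = 10.0
y_true = 200.0                # true price

pred = w @ x + b              # 150 + 60 + 10 = 220, so the prediction is too high
error = pred - y_true         # +20
# For L = 0.5 * (pred - y_true) ** 2: dL/dw = error * x and dL/db = error
grad_w = error * x
grad_b = error

lr = 0.01                     # assumed learning rate
w_new = w - lr * grad_w
b_new = b - lr * grad_b
print("prediction:", pred, "gradient:", grad_w, "new weights:", w_new)
print("new prediction:", w_new @ x + b_new)
```

Because the prediction is too high and both features are positive, both gradients are positive and both weights decrease. The weight on the larger feature value receives the larger correction, and the new prediction (217.55) moves toward the true price.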

---

## **6. Connection to Gradient Descent**

Once we have all partial derivatives:

$$
W := W - \eta \frac{\partial L}{\partial W}
$$

$$
b := b - \eta \frac{\partial L}{\partial b}
$$

where:

* $\eta$ = learning rate
* $\frac{\partial L}{\partial W}$ = gradient (slope of loss surface)

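The following NumPy example runs one forward pass, one backward pass, and one gradient-descent update on a tiny network with one input, one hidden unit, and one output, using sigmoid activations and the squared-error loss $L = \tfrac{1}{2}(y - a^{[2]})^2$. With this sigmoid-plus-squared-error combination, $\delta^{[2]} = (a^{[2]} - y)\,\sigma'(z^{[2]})$ rather than just $a^{[2]} - y$.

```python
import numpy as np

# Sigmoid activation and its derivative
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)

# Data: a single training example
x = 0.5
y_true = 1

# Parameters (scalars, since every layer has one unit) and learning rate
W1, b1 = 0.4, 0.3
W2, b2 = 0.7, 0.2
lr = 0.1

# Forward pass
z1 = W1 * x + b1
a1 = sigmoid(z1)

z2 = W2 * a1 + b2
a2 = sigmoid(z2)

loss = 0.5 * (y_true - a2) ** 2

print(f"Forward pass -> a2: {a2:.6f}, Loss: {loss:.6f}")

# Backward pass
# dL/da2 = a2 - y_true; multiplying by sigmoid'(z2) gives dL/dz2
delta2 = (a2 - y_true) * sigmoid_derivative(z2)
dW2 = a1 * delta2
db2 = delta2

# Send the error back through W2 and the hidden sigmoid
delta1 = (W2 * delta2) * sigmoid_derivative(z1)
dW1 = x * delta1
db1 = delta1

# Update parameters (gradient descent)
W1 -= lr * dW1
b1 -= lr * db1
W2 -= lr * dW2
b2 -= lr * db2

print(f"Updated parameters -> W1: {W1}, b1: {b1}, W2: {W2}, b2: {b2}")
```

Output:

```
Forward pass -> a2: 0.653786, Loss: 0.059932
Updated parameters -> W1: 0.4006445672678901, b1: 0.3012891345357802, W2: 0.7048779400938787, b2: 0.20783656031705777
```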
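
A single update only nudges the parameters; training means repeating the forward pass, backward pass, and update many times. The sketch below loops over the same toy network and data as the example above. The 1,000-step count is an arbitrary assumption, and $\sigma'(z) = a(1 - a)$ is computed from the stored activations, which is equivalent to calling `sigmoid_derivative(z)`.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Same toy network and data as above; the number of steps is an assumption.
x, y_true = 0.5, 1
W1, b1, W2, b2 = 0.4, 0.3, 0.7, 0.2
lr = 0.1
losses = []

for step in range(1000):
    # Forward pass
    z1 = W1 * x + b1
    a1 = sigmoid(z1)
    z2 = W2 * a1 + b2
    a2 = sigmoid(z2)
    losses.append(0.5 * (y_true - a2) ** 2)

    # Backward pass (sigmoid'(z) = a * (1 - a), reusing the activations)
    delta2 = (a2 - y_true) * a2 * (1 - a2)
    delta1 = W2 * delta2 * a1 * (1 - a1)   # uses W2 before it is updated
    dW2, db2 = a1 * delta2, delta2
    dW1, db1 = x * delta1, delta1

    # Gradient-descent update
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(f"loss at step 0: {losses[0]:.6f}, after {len(losses)} steps: {losses[-1]:.6f}")
```

The first recorded loss matches the single-step example, and the loss keeps falling as the output approaches the target. Progress slows over time because the sigmoid saturates near 1, shrinking $\sigma'(z^{[2]})$ and therefore the gradients.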
