# 🎯 Backpropagation: Complete Visualization

## Understanding the Heart of Deep Learning

A complete mathematical derivation and visual walkthrough of backpropagation.

---


In [None]:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
print('✅ Ready!')


## Chain Rule Foundation

$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w}$$

**Example**: Linear layer followed by ReLU

$$z = Wx + b$$
$$a = \text{ReLU}(z) = \max(0, z)$$
$$L = \frac{1}{2}\lVert a - y \rVert^2$$

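**Backward pass**:
1. $\frac{\partial L}{\partial a} = a - y$
2. $\frac{\partial a}{\partial z} = \mathbb{1}(z > 0)$, applied elementwise (the gradient at $z = 0$ is taken to be 0)
3. $\frac{\partial z_i}{\partial W_{ij}} = x_j$, so a gradient $\delta$ with respect to $z$ becomes the outer product $\delta\, x^T$ with respect to $W$

**Result**: $\frac{\partial L}{\partial W} = \left[(a-y) \odot \mathbb{1}(z>0)\right] x^T$, where $\odot$ denotes the elementwise product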
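To make the result concrete, the formula can be evaluated by hand on a tiny layer. The sketch below uses made-up values for `W`, `x`, `b`, and `y` (they are assumptions chosen so the arithmetic is easy to follow, not values from this notebook). The first unit has $z < 0$, so its row of the weight gradient is zero even though its output error is not.

```python
import numpy as np

# Hypothetical 2x2 layer; all numbers are illustrative assumptions
W = np.array([[1.0, -2.0],
              [0.5,  1.0]])
x = np.array([1.0, 2.0])
b = np.array([0.0, 0.5])
y = np.array([-1.0, 1.0])

# Forward pass
z = W @ x + b                      # pre-activation: [-3, 3]
a = np.maximum(0, z)               # ReLU output:    [0, 3]
loss = 0.5 * np.sum((a - y) ** 2)  # 0.5 * (1 + 4) = 2.5

# Backward pass, one chain-rule factor at a time
dL_da = a - y                      # [1, 2]
da_dz = (z > 0).astype(float)      # [0, 1]: the first unit is inactive
delta = dL_da * da_dz              # gradient w.r.t. z: [0, 2]

grad_W = np.outer(delta, x)        # [[0, 0], [2, 4]]
grad_b = delta                     # [0, 2]
grad_x = W.T @ delta               # [1, 2]

print(loss)
print(grad_W)
```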


In [None]:
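# Simple backprop example: one linear layer followed by ReLU
class SimpleNetwork:
    def forward(self, x, W, b):
        # Cache the input and pre-activation; backward() needs both
        self.x = x
        self.z = np.dot(W, x) + b       # pre-activation, shape (n_out,)
        self.a = np.maximum(0, self.z)  # ReLU
        return self.a

    def backward(self, grad_output, W):
        # grad_output is dL/da, e.g. (a - y) for the squared-error loss
        # Gradient through ReLU: passes only where z > 0
        grad_relu = grad_output * (self.z > 0)
        # Gradient w.r.t. weights, shape (n_out, n_in)
        grad_W = np.outer(grad_relu, self.x)
        # Gradient w.r.t. bias (dz/db is the identity)
        grad_b = grad_relu
        # Gradient w.r.t. input, passed on to the previous layer
        grad_x = np.dot(W.T, grad_relu)
        return grad_W, grad_b, grad_x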

print('✅ Backprop illustrated!')
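A common way to check a hand-written backward pass is to compare it against central finite differences. The sketch below is self-contained, so it restates the same forward and backward math as plain functions. The helper names (`loss_fn`, `analytic_grads`, `numerical_grad`), the random shapes, and the step size `eps` are assumptions made for illustration. The check can fail spuriously if some pre-activation lies within `eps` of zero, because ReLU is not differentiable there.

```python
import numpy as np

def forward(W, b, x):
    z = W @ x + b
    return z, np.maximum(0, z)

def loss_fn(W, b, x, y):
    _, a = forward(W, b, x)
    return 0.5 * np.sum((a - y) ** 2)

def analytic_grads(W, b, x, y):
    # Same math as the backward pass above, with dL/da = a - y
    z, a = forward(W, b, x)
    delta = (a - y) * (z > 0)
    return np.outer(delta, x), delta, W.T @ delta

def numerical_grad(f, param, eps=1e-6):
    # Central differences: perturb one entry of `param` in place at a time
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        original = param[idx]
        param[idx] = original + eps
        f_plus = f()
        param[idx] = original - eps
        f_minus = f()
        param[idx] = original  # restore before moving on
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)
x = rng.normal(size=4)
y = rng.normal(size=3)

grad_W, grad_b, grad_x = analytic_grads(W, b, x, y)
f = lambda: loss_fn(W, b, x, y)

max_err = max(
    np.max(np.abs(grad_W - numerical_grad(f, W))),
    np.max(np.abs(grad_b - numerical_grad(f, b))),
    np.max(np.abs(grad_x - numerical_grad(f, x))),
)
print('max abs difference:', max_err)
```

If the analytic gradients are right, the reported difference should be tiny compared with the gradient values themselves.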


## Key Insights

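1. **Gradients flow backward** through the computational graph, from the loss toward the inputs
2. **Local gradients multiply** along each path (chain rule)
3. **ReLU blocks gradients** wherever z ≤ 0, so inactive units receive no update
4. **Vanishing gradients**: in deep networks the product of many small local gradients can shrink toward zero (mitigated by residual connections, as in ResNet)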
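Insights 2 and 4 can be seen numerically in a toy setting. The sketch below is an illustration under stated assumptions, not a model of any real network. It chains scalar sigmoid layers (sigmoid is used here instead of ReLU because its derivative never exceeds 0.25) and multiplies the local derivatives to get the gradient of the last activation with respect to the first. The `residual=True` branch adds an identity skip connection, the idea behind ResNet, so each local factor becomes at least 1. The function name `chain_gradient` and its parameters are hypothetical.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def chain_gradient(depth, w=1.0, h0=0.5, residual=False):
    """Return d h_depth / d h_0 for a chain of scalar sigmoid layers."""
    h, grad = h0, 1.0
    for _ in range(depth):
        s = sigmoid(w * h)
        local = w * s * (1.0 - s)  # derivative of sigmoid(w * h) w.r.t. h
        if residual:
            grad *= 1.0 + local    # skip connection adds the identity path
            h = h + s
        else:
            grad *= local          # plain layer: local gradients multiply
            h = s
    return grad

for depth in (1, 5, 10, 20):
    plain = chain_gradient(depth)
    skip = chain_gradient(depth, residual=True)
    print(f'depth={depth:2d}  plain={plain:.3e}  residual={skip:.3e}')
```

With `w = 1`, every plain factor is at most 0.25, so the plain gradient is bounded by 0.25 raised to the depth, while the residual gradient never drops below 1.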
