# Day 5 — "How Learning Flows Backward": Chain Rule & Backpropagation Intuition

Backpropagation is the nervous system of a neural network: it sends responsibility backwards so each weight knows how to adjust.

## 1. Core Intuition

- Think of layers as meshed gears: turning an early gear moves every gear after it. The loss is read off the final gear, and the chain rule tells us how much each earlier gear's motion changed that reading.
- Backpropagation applies the chain rule efficiently through the entire stack.
- Gradients quantify how sensitive the loss is to each neuron or weight, that is, how much the final error would change if that value were nudged slightly.

## 2. Mathematical Story — The Chain of Influence

| Concept | Formula | Meaning |
| --- | --- | --- |
| Chain rule | `∂L/∂x = (∂L/∂y)(∂y/∂h₂)(∂h₂/∂h₁)(∂h₁/∂x)` | Influence ripples through each layer. |
| Backprop recurrence | `δ_l = δ_{l+1} * ∂h_{l+1}/∂h_l`, where `δ_l = ∂L/∂h_l` | Each step reuses the gradient already computed for the layer after it. |
| Responsibility | `∂h_{l+1}/∂h_l` | Each local derivative indicates how strongly one stage affects the next. |
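
The recurrence in the table can be sketched as a single right-to-left sweep. The chain below (`sin`, `exp`, a scaling by 3, and a squared-error loss with `TARGET = 1.0`) is a hypothetical example chosen for illustration, not the chain used in the repository. The finite-difference comparison at the end is an independent sanity check on `dL/dx`.

```python
import math

# Hypothetical chain for illustration: x -> h1 -> h2 -> y -> L.
# Each stage is (function, derivative of that function w.r.t. its input).
TARGET = 1.0
stages = [
    (math.sin, math.cos),                                       # h1 = sin(x)
    (math.exp, math.exp),                                       # h2 = exp(h1)
    (lambda h: 3.0 * h, lambda h: 3.0),                         # y  = 3 * h2
    (lambda y: 0.5 * (y - TARGET) ** 2, lambda y: y - TARGET),  # L
]


def forward_chain(x):
    """Return every intermediate value: [x, h1, h2, y, L]."""
    values = [x]
    for f, _ in stages:
        values.append(f(values[-1]))
    return values


def backward_chain(values):
    """Apply delta_l = delta_{l+1} * (local derivative), right to left."""
    delta = 1.0  # dL/dL
    deltas = [delta]
    for (_, df), stage_input in zip(reversed(stages), reversed(values[:-1])):
        delta = delta * df(stage_input)  # reuse the gradient from one step later
        deltas.append(delta)
    return list(reversed(deltas))  # [dL/dx, dL/dh1, dL/dh2, dL/dy, dL/dL]


x0 = 0.3
values = forward_chain(x0)
deltas = backward_chain(values)

# Central finite difference over the whole chain as an independent check.
eps = 1e-6
numeric = (forward_chain(x0 + eps)[-1] - forward_chain(x0 - eps)[-1]) / (2 * eps)
print("dL/dx (backprop):", deltas[0])
print("dL/dx (numeric): ", numeric)
```

Each multiplication in `backward_chain` touches only one local derivative, so the cost of the backward sweep grows linearly with the number of stages.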

## 3. Python Implementation — Chain Rule Demo

`days/day05/code/backprop_demo.py` implements the forward and backward pass of a tiny 2-layer network.

In [1]:
from __future__ import annotations

import sys
from pathlib import Path


def find_repo_root(marker: str = "days") -> Path:
    path = Path.cwd()
    while path != path.parent:
        if (path / marker).exists():
            return path
        path = path.parent
    raise RuntimeError("Run this notebook from inside the repository tree.")

REPO_ROOT = find_repo_root()
if str(REPO_ROOT) not in sys.path:
    sys.path.append(str(REPO_ROOT))

from days.day05.code.backprop_demo import BackpropExample, forward, backward

example = BackpropExample()
h, y = forward(example.x, example.w1, example.w2)
grads = backward(example.x, h, y, example.target, example.w1, example.w2)
print("Hidden activation h=", h)
print("Output y=", y)
print("Gradients (dL/dw1, dL/dw2, dL/dx)=", grads)


Hidden activation h= 0.8336546070121552
Output y= -0.6669236856097243
Gradients (dL/dw1, dL/dw2, dL/dx)= (np.float64(0.2847480465272111), np.float64(-0.9728113065401505), np.float64(0.3416976558326533))


Observe how the single scalar error `(y - target)` travels backward through the chain, multiplying by local derivatives to produce gradients for each weight.
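
The source of `backprop_demo.py` is not shown here, so the following is a standalone sketch under stated assumptions rather than the repository's implementation. It assumes a scalar network `h = tanh(w1 * x)`, `y = w2 * h` with loss `L = 0.5 * (y - target)**2`, and uses the values `x = 1.0`, `w1 = 1.2`, `w2 = -0.8`, `target = 0.5`, which were inferred from the printed output above because they reproduce it. The names `forward_sketch` and `backward_sketch` are hypothetical.

```python
import math


def forward_sketch(x, w1, w2):
    """Assumed architecture: h = tanh(w1 * x), y = w2 * h."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return h, y


def backward_sketch(x, h, y, target, w1, w2):
    """Gradients of the assumed loss L = 0.5 * (y - target)**2."""
    dL_dy = y - target              # the scalar error that starts the chain
    dL_dw2 = dL_dy * h              # y = w2 * h  ->  dy/dw2 = h
    dL_dh = dL_dy * w2              # y = w2 * h  ->  dy/dh  = w2
    dL_dz = dL_dh * (1.0 - h ** 2)  # h = tanh(z) ->  dh/dz  = 1 - tanh(z)^2
    dL_dw1 = dL_dz * x              # z = w1 * x  ->  dz/dw1 = x
    dL_dx = dL_dz * w1              # z = w1 * x  ->  dz/dx  = w1
    return dL_dw1, dL_dw2, dL_dx


# Assumed values, inferred from the printed output rather than read from the repo.
x, w1, w2, target = 1.0, 1.2, -0.8, 0.5
h, y = forward_sketch(x, w1, w2)
grads = backward_sketch(x, h, y, target, w1, w2)
print("h =", h, "y =", y)
print("(dL/dw1, dL/dw2, dL/dx) =", grads)
```

Note that `dL_dz` is computed once and shared by both `dL_dw1` and `dL_dx`; that reuse is the same idea as the backprop recurrence.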

## 4. Visualization — Watching Gradients Flow

`days/day05/code/visualizations.py` animates forward values vs. gradient magnitudes across a toy chain (tanh → square → identity).

In [2]:
from days.day05.code.visualizations import anim_backprop_chain

RUN_ANIMATIONS = False

if RUN_ANIMATIONS:
    path = anim_backprop_chain()
    print(f"Saved backprop animation → {path}")
else:
    print('Set RUN_ANIMATIONS = True to regenerate GIFs in days/day05/outputs/.')


Set RUN_ANIMATIONS = True to regenerate GIFs in days/day05/outputs/.
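
For a text-only look at the same idea without generating GIFs, here is a minimal sketch of the tanh → square → identity chain. It is not the code in `visualizations.py`; the function name `chain_forward_backward` is hypothetical, and it assumes the quantity being differentiated is the chain's output itself (so the starting gradient is 1).

```python
import math


def chain_forward_backward(x):
    """Forward values and gradients for the toy chain tanh -> square -> identity."""
    # Forward pass.
    a = math.tanh(x)
    b = a ** 2
    out = b  # identity stage

    # Backward pass, assuming we differentiate `out` itself.
    d_out = 1.0
    d_b = d_out * 1.0           # identity: d(out)/d(b) = 1
    d_a = d_b * 2.0 * a         # square:   d(b)/d(a)   = 2a
    d_x = d_a * (1.0 - a ** 2)  # tanh:     d(a)/d(x)   = 1 - tanh(x)^2
    return (a, b, out), (d_x, d_a, d_b)


for x in (0.1, 0.5, 1.0, 2.0, 4.0):
    (a, b, out), (d_x, d_a, d_b) = chain_forward_backward(x)
    print(
        f"x={x:>4}  forward: a={a:.4f} b={b:.4f} out={out:.4f}  "
        f"|grad|: x={abs(d_x):.4f} a={abs(d_a):.4f} b={abs(d_b):.4f}"
    )
```

For large inputs the forward value `a` saturates near 1 while the gradient with respect to `x` shrinks toward 0, because the `1 - tanh(x)^2` factor becomes tiny.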


## 5. Deep Learning & CV Connections

| Concept | What It Means in Practice |
| --- | --- |
| Chain rule | Every layer passes backward partial responsibility. |
| Backprop | Efficient, vectorized chain rule for huge parameter counts. |
| Gradient flow | Determines whether learning is stable: gradients that shrink toward zero (vanishing) or blow up (exploding) across layers stall or destabilize training. |
| Jacobians | Local linear maps of each layer; products of Jacobians describe gradient flow. |
| Activations | ReLU passes gradients through unchanged for positive inputs (and zeroes them for negative ones); sigmoid/tanh saturate and may shrink them. |
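
The activation and gradient-flow rows can be illustrated with a scalar chain in which each "layer" is `h_{l+1} = activation(w * h_l)`. This is a deliberately simplified sketch: the function `gradient_through_depth`, the starting input `x0 = 0.5`, and the weight `w = 1.0` are assumptions for illustration, and real layers multiply by Jacobian matrices rather than scalars.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# (activation, derivative of activation) pairs.
ACTIVATIONS = {
    "sigmoid": (sigmoid, lambda z: sigmoid(z) * (1.0 - sigmoid(z))),
    "tanh": (math.tanh, lambda z: 1.0 - math.tanh(z) ** 2),
    "relu": (lambda z: max(z, 0.0), lambda z: 1.0 if z > 0 else 0.0),
}


def gradient_through_depth(name, depth, x0=0.5, w=1.0):
    """Return |d h_depth / d x0| for the chain h_{l+1} = act(w * h_l)."""
    act, dact = ACTIVATIONS[name]
    h, grad = x0, 1.0
    for _ in range(depth):
        z = w * h
        grad *= dact(z) * w  # multiply in this layer's local derivative
        h = act(z)
    return abs(grad)


for depth in (1, 5, 10, 20):
    row = "  ".join(
        f"{name}={gradient_through_depth(name, depth):.3e}" for name in ACTIVATIONS
    )
    print(f"depth={depth:>2}  {row}")
```

With these settings the sigmoid gradient collapses fastest, since its derivative never exceeds 0.25, while ReLU keeps the gradient at 1 as long as the input stays positive. Choosing a larger `w` shows the opposite failure mode, where the product grows instead of shrinking.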

## 6. Mini Exercises

1. Swap tanh for ReLU or sine in the demo and watch gradients change.
2. Add more functions (e.g., `f4 = sin`) to the chain; plot gradient magnitudes.
3. Modify the loss to `L = |output - target|` and derive derivatives manually.
4. Extend the code to a 2-layer MLP (matrix weights) and verify gradients numerically.
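
As one possible starting point for checking your answer to exercise 4, here is a hedged sketch of a gradient check with central differences. The shapes (3 inputs, 4 hidden units, 2 outputs), the tanh hidden layer, the squared-error loss, and all function names are assumptions for illustration; none of this comes from the repository code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 3 inputs, 4 hidden units, 2 outputs.
x = rng.normal(size=3)
target = rng.normal(size=2)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))


def loss_fn(W1, W2):
    """Assumed model: y = W2 @ tanh(W1 @ x), L = 0.5 * ||y - target||^2."""
    h = np.tanh(W1 @ x)
    y = W2 @ h
    return 0.5 * np.sum((y - target) ** 2)


def analytic_grads(W1, W2):
    """Manual backprop for the assumed model."""
    h = np.tanh(W1 @ x)
    y = W2 @ h
    delta_y = y - target                # dL/dy, shape (2,)
    dW2 = np.outer(delta_y, h)          # shape (2, 4)
    delta_h = W2.T @ delta_y            # dL/dh, shape (4,)
    delta_z = delta_h * (1.0 - h ** 2)  # back through tanh
    dW1 = np.outer(delta_z, x)          # shape (4, 3)
    return dW1, dW2


def numeric_grad(f, W, eps=1e-6):
    """Central differences, perturbing one entry of W at a time (in place)."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        original = W[idx]
        W[idx] = original + eps
        plus = f()
        W[idx] = original - eps
        minus = f()
        W[idx] = original  # restore the entry before moving on
        grad[idx] = (plus - minus) / (2 * eps)
    return grad


dW1, dW2 = analytic_grads(W1, W2)
num_dW1 = numeric_grad(lambda: loss_fn(W1, W2), W1)
num_dW2 = numeric_grad(lambda: loss_fn(W1, W2), W2)
print("max |dW1 - numeric| =", np.max(np.abs(dW1 - num_dW1)))
print("max |dW2 - numeric| =", np.max(np.abs(dW2 - num_dW2)))
```

Both differences should be tiny compared with the gradient entries themselves; a large mismatch usually points to a transposed matrix or a missing activation derivative in the manual backward pass.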

## 7. Key Takeaways

| Concept | Meaning |
| --- | --- |
| Chain rule | Describes how changes ripple through composite functions. |
| Backpropagation | Mechanized chain rule powering training. |
| Gradient signal | Shows how sensitive the loss is to each neuron or weight. |
| Activations | Influence whether gradients vanish or explode. |
| Gradient flow | Gradients that neither vanish nor explode across layers keep training stable. |

> Backprop is how networks learn who to blame — and how to improve.