
# 📘 Part 4: Chain Rule for Machine Learning

This notebook explains the **Chain Rule**, the mathematical idea that makes
**backpropagation and neural network training possible**.

Focus:
- Intuition
- Flow of gradients
- ML relevance

No deep neural network math, only the foundations.

---



## 1. Why the Chain Rule Matters in ML

In Machine Learning:
- Models are built from **layers of functions**
- The loss depends on the parameters **indirectly**, through the outputs of those layers

The chain rule answers:
> *How does a small change in an early parameter affect the final loss?*


```python
import numpy as np
import matplotlib.pyplot as plt
```



## 2. Simple Function Composition

Consider:
y = f(x)  
z = g(y)

Then:
z = g(f(x))

The output depends on x **through y**.
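A minimal Python sketch of the same idea. The example functions f(x) = x + 1 and g(y) = y² and the helper `compose` are illustrative choices, not part of the original notebook:

```python
def f(x):
    return x + 1        # inner function (arbitrary example)

def g(y):
    return y ** 2       # outer function (arbitrary example)

def compose(outer, inner):
    """Return the function x -> outer(inner(x))."""
    return lambda x: outer(inner(x))

z_of_x = compose(g, f)  # z = g(f(x))

x = 3
y = f(x)                # intermediate value y = 4
print(y, g(y), z_of_x(x))   # 4 16 16
```

Computing `g(y)` from the stored intermediate `y` and calling `z_of_x(x)` directly give the same result: x only reaches z by passing through y.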



## 3. Chain Rule (Core Idea)

If:
z = g(y) and y = f(x)

Then:
dz/dx = (dz/dy) · (dy/dx)

This is the **Chain Rule**.
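One way to convince yourself the rule holds is to compare it with a numerical derivative. This sketch assumes the arbitrary choices y = sin(x) and z = eʸ, so dy/dx = cos(x) and dz/dy = eʸ, and uses a central finite difference as the reference:

```python
import math

def f(x):
    return math.sin(x)          # y = sin(x), so dy/dx = cos(x)

def g(y):
    return math.exp(y)          # z = e^y, so dz/dy = e^y

x = 0.7                         # arbitrary test point
y = f(x)

dz_dy = math.exp(y)
dy_dx = math.cos(x)
chain_rule = dz_dy * dy_dx      # dz/dx = (dz/dy) · (dy/dx)

h = 1e-6
numerical = (g(f(x + h)) - g(f(x - h))) / (2 * h)   # central difference on z = g(f(x))

print("chain rule:", chain_rule)
print("numerical: ", numerical)
```

The two values should agree to many decimal places; any small gap comes from the finite step size `h`, not from the chain rule.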



## 4. Simple Numerical Example

Let:
y = x²  
z = 3y + 1
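Working the derivatives out by hand first. Substituting y into z gives z = 3x² + 1 directly, which serves as a cross-check on the chain-rule result:

$$
\frac{dy}{dx} = 2x, \qquad \frac{dz}{dy} = 3, \qquad \frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx} = 3 \cdot 2x = 6x
$$

At x = 2, both routes give dz/dx = 6 · 2 = 12.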


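```python
def f(x):
    return x**2              # inner function: y = x²

def g(y):
    return 3*y + 1           # outer function: z = 3y + 1

x = 2
y = f(x)                     # forward pass: y = 4
z = g(y)                     # forward pass: z = 13

dy_dx = 2*x                  # derivative of x² is 2x, here 4
dz_dy = 3                    # derivative of 3y + 1 is 3

print("z =", z)
print("dz/dx via chain rule:", dz_dy * dy_dx)   # 3 · 4 = 12
```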



## 5. Why We Need the Chain Rule in ML

Loss functions depend on predictions, and predictions depend on parameters.

So the dependency chain is:
Loss → Prediction → Parameter

The chain rule lets gradients **flow backward** along this chain, from the loss to each parameter.



## 6. ML-style Example (Single Parameter)

Prediction:
ŷ = w · x

Loss:
L = (ŷ − y)²

Loss depends on w through ŷ.
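Applying the chain rule symbolically to these two definitions:

$$
\frac{dL}{dw} = \frac{dL}{d\hat{y}} \cdot \frac{d\hat{y}}{dw} = 2(\hat{y} - y) \cdot x
$$

For example, with x = 2, y = 5 and w = 1, the prediction is ŷ = 2, so dL/dw = 2(2 − 5) · 2 = −12. The negative sign says that increasing w would decrease the loss.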


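```python
# One training example and a single weight
x = 2
y = 5
w = 1.0

# Forward pass
y_hat = w * x                  # prediction ŷ = w · x = 2.0
loss = (y_hat - y)**2          # L = (ŷ − y)² = 9.0

# Backward pass (chain rule)
dL_dyhat = 2*(y_hat - y)       # dL/dŷ = 2(ŷ − y) = -6.0
dyhat_dw = x                   # dŷ/dw = x = 2

print("Loss:", loss)
print("Gradient using chain rule:", dL_dyhat * dyhat_dw)   # -12.0
```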
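The chain-rule gradient is what gradient descent consumes. A minimal sketch, reusing the same single example; the learning rate 0.05 and the 50 steps are arbitrary assumptions, not values from the notebook:

```python
x, y = 2.0, 5.0        # same single training example as above
w = 1.0                # initial weight
lr = 0.05              # learning rate (arbitrary choice for this sketch)

for step in range(50):
    y_hat = w * x                  # forward: prediction
    dL_dyhat = 2 * (y_hat - y)     # chain rule, outer factor dL/dŷ
    dyhat_dw = x                   # chain rule, inner factor dŷ/dw
    grad = dL_dyhat * dyhat_dw     # dL/dw
    w -= lr * grad                 # gradient descent update

print("learned w:", w)             # close to 2.5, since 2.5 * 2 = 5
print("final loss:", (w * x - y) ** 2)
```

Each step moves w against the gradient, so w settles near the value that makes ŷ equal y.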



## 7. Backpropagation Intuition

Backpropagation is:
- Repeated application of the chain rule
- From loss back to parameters
- Layer by layer

Computing gradients in a neural network is just **many chain-rule steps stacked together**, one per layer.
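A sketch of backpropagation through two "layers", assuming a hypothetical chain x → h = w1·x → ŷ = w2·h → L = (ŷ − y)² with arbitrary starting weights. Note how the upstream gradient dL/dŷ is computed once and reused for both weights, and dL/dh is passed further back to reach w1:

```python
# Hypothetical two-layer chain: x -> h = w1*x -> y_hat = w2*h -> L = (y_hat - y)**2
x, y = 2.0, 5.0
w1, w2 = 0.5, 3.0              # arbitrary starting weights

def loss_fn(w1, w2):
    h = w1 * x
    y_hat = w2 * h
    return (y_hat - y) ** 2

# Forward pass: keep intermediate values for the backward pass
h = w1 * x                     # 1.0
y_hat = w2 * h                 # 3.0

# Backward pass: start at the loss and apply the chain rule one step at a time
dL_dyhat = 2 * (y_hat - y)     # -4.0
dL_dw2 = dL_dyhat * h          # dŷ/dw2 = h,  gives -4.0
dL_dh = dL_dyhat * w2          # dŷ/dh = w2,  gives -12.0
dL_dw1 = dL_dh * x             # dh/dw1 = x,  gives -24.0

# Check against central finite differences
eps = 1e-6
fd_w1 = (loss_fn(w1 + eps, w2) - loss_fn(w1 - eps, w2)) / (2 * eps)
fd_w2 = (loss_fn(w1, w2 + eps) - loss_fn(w1, w2 - eps)) / (2 * eps)

print("dL/dw1:", dL_dw1, "numerical:", fd_w1)
print("dL/dw2:", dL_dw2, "numerical:", fd_w2)
```

Adding more layers only adds more lines of the same form to the backward pass.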



## 8. Computational Graph View

Think of models as graphs:
- Nodes = operations
- Edges = flow of values

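Values flow **forward** through the graph to produce the loss; gradients flow **backward**, with each node multiplying the incoming gradient by its own local derivative.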
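A computational graph can be sketched in a few lines of Python. The `Node` class below is a hypothetical toy supporting only +, − and ×, not any library's API. Each node stores its value, its parent nodes, and the local derivative with respect to each parent. `backward()` visits nodes from the loss back toward the inputs and multiplies gradients along each edge, which is the chain rule applied across the whole graph. It rebuilds the single-parameter example from Section 6:

```python
class Node:
    """A scalar value in a computational graph (hypothetical minimal class)."""

    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents          # nodes this node was computed from
        self.local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Node(self.value + other.value, (self, other), (1.0, 1.0))

    def __sub__(self, other):
        return Node(self.value - other.value, (self, other), (1.0, -1.0))

    def __mul__(self, other):
        return Node(self.value * other.value, (self, other), (other.value, self.value))

    def backward(self):
        # Order nodes so that each node comes after everything it depends on.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for p in node.parents:
                    visit(p)
                order.append(node)

        visit(self)
        self.grad = 1.0  # dL/dL = 1
        # Walk in reverse: each node passes (its grad * local derivative) to its parents.
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += node.grad * local


x = Node(2.0)
y = Node(5.0)
w = Node(1.0)

y_hat = w * x          # operation node: multiply
err = y_hat - y        # operation node: subtract
loss = err * err       # operation node: square via multiply
loss.backward()

print("loss =", loss.value)   # 9.0
print("dL/dw =", w.grad)      # -12.0
```

The result matches the hand-derived gradient of −12. Real deep learning frameworks build the same kind of graph, but over tensors and with many more operations.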



## 9. ML Interpretation (Very Important)

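The chain rule:
- Enables gradient computation for composed models
- Makes training deep networks with gradient descent possible
- Is applied efficiently by backpropagation, which reuses intermediate results instead of recomputing them

Without the chain rule:
- There would be no practical way to compute gradients for networks with many layers and parameters
- Training neural networks with gradient descent would be impractical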
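To make the efficiency point concrete, here is a hedged NumPy sketch for a linear model ŷ = w · x with several weights. The size n = 5, the random data, and the target y = 1.0 are arbitrary assumptions. The chain rule computes dL/dŷ once and reuses it for every weight, while a central finite-difference check needs two extra loss evaluations per weight:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5                                  # number of parameters (arbitrary choice)
x = rng.normal(size=n)                 # one input example with n features
w = rng.normal(size=n)                 # parameters
y = 1.0                                # target

def loss_fn(w):
    y_hat = w @ x                      # prediction ŷ = w · x
    return (y_hat - y) ** 2            # squared error

# Chain rule: one forward pass plus one backward step gives all n partial derivatives.
y_hat = w @ x
dL_dyhat = 2 * (y_hat - y)             # computed once, shared by every weight
grad_chain = dL_dyhat * x              # dŷ/dw_i = x_i

# Central finite differences: two loss evaluations per parameter (2n in total).
eps = 1e-6
grad_fd = np.array([
    (loss_fn(w + eps * e) - loss_fn(w - eps * e)) / (2 * eps)
    for e in np.eye(n)                 # e is the unit vector for one weight
])

print("chain rule:        ", grad_chain)
print("finite differences:", grad_fd)
print("max difference:", np.max(np.abs(grad_chain - grad_fd)))
```

With millions of parameters, the per-parameter cost of finite differences becomes prohibitive, which is why training relies on the chain rule instead.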



## 🧪 Practice (Thinking-Based)

1. Why can't we differentiate the loss directly w.r.t. the parameters?
2. How does chain rule simplify complex models?
3. Why is backpropagation efficient?
4. How does this scale to deep networks?



## 📌 Calculus for ML – Completion

You have now completed the **core calculus required for Machine Learning**:
- Derivatives
- Partial derivatives
- Gradients & gradient descent
- Chain rule (backpropagation intuition)

Next, we will **apply this math directly to ML algorithms**.
