 # Chain Rule of Derivatives in Backward Propagation
The chain rule is the mathematical core of backward propagation (backpropagation) in neural networks. It lets us compute how the loss changes with respect to each weight by breaking one complex derivative into a product of simpler ones.

## 🔍 Why We Need Chain Rule in Backprop:
In a multilayer network, the loss is a composition of several functions, each depending on the next:

Loss → Output → Activation → Weighted sum → Inputs

To find how loss changes with respect to each weight (∂L/∂w), we use the chain rule.

## 🔣 Chain Rule (Basic Form):
If

y = f(g(x)),  
then dy/dx = f'(g(x)) × g'(x)
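
As a quick sanity check of the formula above, here is a minimal sketch in Python. The choices f(u) = sin(u) and g(x) = x² are illustrative assumptions, not part of the original notes; the chain-rule result is compared against a finite-difference estimate.

```python
import math

def g(x):
    # Inner function (assumed for illustration): g(x) = x^2
    return x ** 2

def f(u):
    # Outer function (assumed for illustration): f(u) = sin(u)
    return math.sin(u)

def dy_dx_chain(x):
    # Chain rule: f'(g(x)) * g'(x) = cos(x^2) * 2x
    return math.cos(g(x)) * 2 * x

def dy_dx_numeric(x, h=1e-6):
    # Central finite difference of y = f(g(x))
    return (f(g(x + h)) - f(g(x - h))) / (2 * h)

x = 1.5
print(dy_dx_chain(x), dy_dx_numeric(x))  # the two values should agree closely
```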


## 📘 In Backpropagation (Simple Example):
Let’s say:

z = w·x + b (linear transformation)

a = activation(z) (like sigmoid or ReLU)

L = loss(a, y) (loss function)

To update w, we need:

∂L/∂w = ∂L/∂a × ∂a/∂z × ∂z/∂w

Each term is a small derivative:

   1. ∂L/∂a: How the loss changes with the activation output a

   2. ∂a/∂z: How the activation output changes with the weighted sum z (the derivative of the activation function)

   3. ∂z/∂w: How z changes with the weight (this is simply x)

This is exactly the chain rule in action.
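
The three factors can be computed one by one for a single neuron. The sketch below assumes a sigmoid activation and a squared-error loss L = ½(a − y)²; both are choices made here for illustration, and the input values are arbitrary. The product of the three terms is checked against a finite-difference estimate of ∂L/∂w.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x, y):
    z = w * x + b               # linear transformation
    a = sigmoid(z)              # activation (sigmoid assumed)
    L = 0.5 * (a - y) ** 2      # loss (squared error assumed)
    return z, a, L

def grad_w(w, b, x, y):
    z, a, _ = forward(w, b, x, y)
    dL_da = a - y               # derivative of 0.5 * (a - y)^2 w.r.t. a
    da_dz = a * (1.0 - a)       # derivative of sigmoid w.r.t. z
    dz_dw = x                   # derivative of w*x + b w.r.t. w
    return dL_da * da_dz * dz_dw

# Arbitrary example values
w, b, x, y = 0.5, 0.1, 2.0, 1.0
analytic = grad_w(w, b, x, y)

# Finite-difference estimate of dL/dw for comparison
h = 1e-6
numeric = (forward(w + h, b, x, y)[2] - forward(w - h, b, x, y)[2]) / (2 * h)
print(analytic, numeric)        # the two values should agree closely
```

A gradient-descent step would then be `w = w - lr * analytic` for some learning rate `lr` (a hypothetical name here).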

## 🔗 Backpropagation Uses Chain Rule Layer-by-Layer:
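In deeper networks, every layer contributes two more factors to the chain. For three layers, the gradient for the first-layer weight w₁ is: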

∂L/∂w₁ = ∂L/∂a₃ × ∂a₃/∂z₃ × ∂z₃/∂a₂ × ∂a₂/∂z₂ × ∂z₂/∂a₁ × ∂a₁/∂z₁ × ∂z₁/∂w₁

The computation runs backwards, applying the chain rule from the output layer toward the input, hence the name backpropagation.
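
The layer-by-layer product can be accumulated in a single backward loop. This is a minimal sketch, assuming a chain of three scalar layers with one weight and one bias each, sigmoid activations, and a squared-error loss; the function names and numeric values are illustrative. The running value `delta` holds the product of all factors seen so far, so each layer reuses the work done for the layers after it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(ws, bs, x):
    # Returns [a0, a1, a2, a3], where a0 = x is the network input
    acts = [x]
    for w, b in zip(ws, bs):
        acts.append(sigmoid(w * acts[-1] + b))
    return acts

def loss(ws, bs, x, y):
    return 0.5 * (forward(ws, bs, x)[-1] - y) ** 2

def backward(ws, bs, x, y):
    acts = forward(ws, bs, x)
    grads = [0.0] * len(ws)
    delta = acts[-1] - y                  # dL/da for the last layer
    for k in reversed(range(len(ws))):
        a_out = acts[k + 1]
        delta *= a_out * (1.0 - a_out)    # times da/dz: delta is now dL/dz for this layer
        grads[k] = delta * acts[k]        # times dz/dw, which is the layer's input
        delta *= ws[k]                    # times dz/d(input): pass back to the previous layer
    return grads

# Arbitrary example values
ws = [0.4, -0.7, 1.2]
bs = [0.0, 0.1, -0.3]
x, y = 1.5, 1.0
grads = backward(ws, bs, x, y)

# Finite-difference check on the first-layer weight
h = 1e-6
numeric_w1 = (loss([ws[0] + h] + ws[1:], bs, x, y)
              - loss([ws[0] - h] + ws[1:], bs, x, y)) / (2 * h)
print(grads[0], numeric_w1)               # the two values should agree closely
```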

## ✅ Summary:
The chain rule helps break down a complicated derivative into small, manageable pieces.

It’s essential for updating weights during training.

Without it, we would have no efficient way to compute the gradient for every weight in a deep network!

