### 1. One function inside another (basic chain idea)
Think of this as a two-step pipeline: the output of the inner function becomes the input of the outer one.

In [1]:
# Step 1: inner function
def g(x):
    return x * 3

# Step 2: outer function
def f(u):
    return u + 5

x = 2
u = g(x)  # forward pass through g
y = f(u)  # forward pass through f

print("Output:", y)


Output: 11


#### Intuition
* x affects u
* u affects y
* So x affects y through u

### 2. “How much did x influence y?” (manual gradient flow)

We track how sensitive each step's output is to its input.

In [2]:
x = 2
# forward
u = x * 3
y = u + 5
# backward (responsibility passing)
dy_du = 1  # y changes 1-to-1 with u
du_dx = 3  # u changes 3-to-1 with x

# chain rule: multiply the local sensitivities
dy_dx = dy_du * du_dx

print("Gradient of y wrt x:", dy_dx)


Gradient of y wrt x: 3


#### Key idea
* Each layer says: “If I change, how much do I affect the next thing?”
* Multiply responsibilities as you go backward
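One way to sanity-check the multiplied responsibilities is to nudge `x` slightly and measure how much `y` actually moves. The sketch below assumes the same pipeline as above (`u = x*3`, `y = u + 5`); the helper name `pipeline` and the step size `eps` are illustrative choices, not part of the original cells.

```python
# Same two-step pipeline as above: u = x*3, then y = u + 5.
# `pipeline` is a hypothetical helper name used only for this check.
def pipeline(x):
    u = x * 3
    return u + 5

x = 2.0
eps = 1e-6  # size of the nudge; an arbitrary small value

# Central difference: how much y moves per unit change in x
numeric_grad = (pipeline(x + eps) - pipeline(x - eps)) / (2 * eps)

# Chain-rule result from the manual backward pass: dy_du * du_dx
analytic_grad = 1 * 3

print("Numeric gradient:", round(numeric_grad, 6))
print("Chain-rule gradient:", analytic_grad)
```

The two numbers should agree up to floating-point noise, which is a quick test that a manual backward pass was written correctly.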

### 3. Tiny neural network (2 layers, 1 neuron each)

Now it starts to look like backprop.

In [3]:
# input
x = 1.5

# parameters
w1, w2 = 2.0, 3.0

# forward pass
h = x * w1  # layer 1
y = h * w2  # layer 2
print("Output:", y)


Output: 9.0


#### Backward pass (intuition only)

In [4]:
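# final output sensitivity (dy/dy)
dy = 1

# layer 2 responsibility
dy_dh = w2       # y = h*w2, so y changes w2-to-1 with h
dh = dy * dy_dh  # sensitivity of y to h

# layer 1 responsibility
dh_dx = w1       # h = x*w1, so h changes w1-to-1 with x
dx = dh * dh_dx  # sensitivity of y to x
print("How much x influenced y:", dx)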

How much x influenced y: 6.0


#### What happened
* Output says: “I depend on layer 2”
* Layer 2 says: “I depend on layer 1”
* Layer 1 says: “I depend on input”

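This is backpropagation: each layer passes its local sensitivity backward, and the product of those sensitivities is the gradient at the input.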
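The same backward walk also gives the sensitivity of `y` to each weight, which is what training ultimately needs. The sketch below reuses the values from the forward pass above; the names `dy_dw1` and `dy_dw2` are introduced here for illustration.

```python
# Same tiny network as above: h = x*w1, y = h*w2
x = 1.5
w1, w2 = 2.0, 3.0

# forward pass
h = x * w1
y = h * w2

# backward pass
dy = 1.0
dy_dw2 = dy * h     # y = h*w2, so y changes h-to-1 with w2
dy_dh = dy * w2     # responsibility passed back to layer 1
dy_dw1 = dy_dh * x  # h = x*w1, so h changes x-to-1 with w1

print("dy/dw2:", dy_dw2)
print("dy/dw1:", dy_dw1)
```

With these inputs the two gradients come out to 3.0 and 4.5. Note that the gradient for `w1` reuses `dy_dh`, the value already computed for layer 2, so nothing is recomputed on the way back.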

### 4. Add a non-linearity (real neural network behavior)

In [5]:
def relu(x):
    return max(0, x)

def relu_grad(x):
    # slope is 1 when the neuron is active, 0 otherwise
    return 1 if x > 0 else 0

x = 2.0
w = 1.5
# forward
z = x * w
a = relu(z)

print("Output:", a)

Output: 3.0


#### Backward (gradient flows only if active)

In [6]:
da = 1
da_dz = relu_grad(z)  # 1 because z > 0 here
dz_dx = w

dx = da * da_dz * dz_dx
print("Gradient wrt x:", dx)

Gradient wrt x: 1.5


#### Intuition
* ReLU can block gradient flow
* If a neuron is “off”, it passes no responsibility backward
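To see the blocking behavior, the sketch below runs the same neuron once with a positive input and once with a negative one. The input `-2.0` and the `results` list are illustrative additions; `relu` and `relu_grad` are repeated so the cell runs on its own.

```python
def relu(x):
    return max(0, x)

def relu_grad(x):
    return 1 if x > 0 else 0

w = 1.5
results = []  # (output, gradient) pairs, kept for inspection

for x in (2.0, -2.0):
    # forward
    z = x * w
    a = relu(z)
    # backward: da * da_dz * dz_dx
    dx = 1 * relu_grad(z) * w
    results.append((a, dx))
    print(f"x={x}: output={a}, gradient wrt x={dx}")
```

For the negative input, `z` is negative, the neuron is "off", and the gradient reaching `x` is zero regardless of the value of `w`.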

### 5. Multiple layers = chain rule in action

In [7]:
x = 1.0
w1, w2, w3 = 2.0, 3.0, 4.0
# forward
a1 = x * w1
a2 = a1 * w2
y = a2 * w3

print("Output:", y)

Output: 24.0


In [8]:
# backward
dy = 1
dy_da2 = w3
da2_da1 = w2
da1_dx = w1

dx = dy * dy_da2 * da2_da1 * da1_dx

print("Gradient wrt x:", dx)

Gradient wrt x: 24.0


#### Mental model
* Forward pass: values flow
* Backward pass: blame flows
* Each layer scales the blame
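The same mental model can be written as two loops, which makes it easy to print the blame at each layer. This is a minimal sketch assuming the three linear layers above; the names `weights`, `value`, and `blame` are illustrative.

```python
x = 1.0
weights = [2.0, 3.0, 4.0]  # w1, w2, w3 from the cell above

# Forward pass: values flow from the input to the output
value = x
for w in weights:
    value = value * w
print("Output:", value)

# Backward pass: blame flows from the output back to the input
blame = 1.0
for layer in range(len(weights), 0, -1):
    blame = blame * weights[layer - 1]  # each layer scales the blame
    print(f"Blame at the input of layer {layer}: {blame}")

print("Gradient wrt x:", blame)
```

Written this way, adding a layer only means appending one more weight to the list.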

### 6. Why this matters for learning

In [9]:
learning_rate = 0.1
w = 2.0
x = 3.0
# forward
y = w * x
loss = (y - 10) ** 2
# backward
dloss_dy = 2 * (y - 10)
dy_dw = x

dw = dloss_dy * dy_dw
# gradient descent step: move w against the gradient
w = w - learning_rate * dw

print("Updated weight:", w)

Updated weight: 4.4


#### Interpretation
* Loss tells output it did something wrong
* Output tells weight how responsible it was
* Weight adjusts itself
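Repeating that single update in a loop shows the learning part. The sketch below assumes the same setup as above (input 3.0, desired output 10, learning rate 0.1); the name `target`, the step count of 20, and the `losses` list are illustrative choices.

```python
learning_rate = 0.1
w = 2.0
x = 3.0
target = 10  # the value the loss above compares y against

losses = []
for step in range(20):
    # forward
    y = w * x
    loss = (y - target) ** 2
    losses.append(loss)
    # backward
    dloss_dy = 2 * (y - target)
    dy_dw = x
    dw = dloss_dy * dy_dw
    # update
    w = w - learning_rate * dw

print("First loss:", losses[0])
print("Last loss:", losses[-1])
print("Final weight:", w)
```

With this learning rate the weight overshoots on each step and settles toward 10/3 from alternating sides, while the loss shrinks from one step to the next.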

#### Practice next
* Add one more layer and backprop manually
* Replace ReLU with sigmoid and watch the gradient shrink layer by layer
* Print gradients at each layer
* Intentionally break gradient flow and observe learning stop
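As a possible starting point for the sigmoid exercise, the sketch below stacks three sigmoid layers and prints the gradient as it travels backward. The weights, the input, and the helper names `sigmoid` and `sigmoid_grad` are assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)  # never larger than 0.25

x = 2.0
weights = [1.5, 1.5, 1.5]  # illustrative values

# forward, remembering each pre-activation z for the backward pass
a = x
zs = []
for w in weights:
    z = a * w
    zs.append(z)
    a = sigmoid(z)

# backward: each layer multiplies the blame by sigmoid_grad(z) * w
grad = 1.0
grads = []
for layer in range(len(weights), 0, -1):
    grad = grad * sigmoid_grad(zs[layer - 1]) * weights[layer - 1]
    grads.append(grad)
    print(f"Gradient at the input of layer {layer}: {grad:.5f}")
```

Because the sigmoid slope is at most 0.25, each layer here multiplies the gradient by a factor below 1, so the printed values get smaller the further back they travel.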