# Backpropagation Walkthrough

**Step-by-step manual computation** of forward and backward passes
through a tiny 2-layer network using our from-scratch framework.

We will see every matrix multiplication, every activation derivative,
and every gradient, with nothing hidden inside the framework.

In [None]:
import sys
sys.path.insert(0, '..')

import numpy as np
np.set_printoptions(precision=6, suppress=True)

from src.core.activations import ReLU, Sigmoid, Softmax
from src.core.layer import DenseLayer
from src.core.losses import CrossEntropyLoss, MSELoss
from src.core.initializers import he_init
from src.validation.gradient_check import gradient_check_layer

## 1. Setup: Tiny Network

Architecture: `Input(2) → Dense(2, 3, ReLU) → Dense(3, 2, Softmax)`

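We use a single sample for clarity, so the batch size is m = 1 and the 1/m averaging factors in the backward pass have no numerical effect here.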

In [None]:
# Single input sample (1, 2)
X = np.array([[1.0, 0.5]])
Y = np.array([[1.0, 0.0]])  # one-hot: class 0

# Layer 1: Dense(2→3) + ReLU
layer1 = DenseLayer(2, 3, activation=ReLU(), seed=42)
# Layer 2: Dense(3→2) + Softmax
layer2 = DenseLayer(3, 2, activation=Softmax(), seed=43)

print('W1 shape:', layer1.W.shape)
print('W1:\n', layer1.W)
print('\nW2 shape:', layer2.W.shape)
print('W2:\n', layer2.W)
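The notebook does not show how `DenseLayer` initializes its weights. Since `he_init` is imported, He initialization is a plausible guess, but that is an assumption. Below is a minimal pure-NumPy sketch of He-normal initialization (weights drawn from N(0, 2 / fan_in), biases set to zero) using hypothetical helpers `he_normal` and `init_dense`. Its values will not match the framework's `W1`/`W2`:

```python
import numpy as np

def he_normal(fan_in, fan_out, rng):
    """He-normal init: W ~ N(0, 2 / fan_in). Hypothetical helper, not src.core.initializers.he_init."""
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def init_dense(fan_in, fan_out, seed):
    """Return (W, b) for a dense layer; zero biases are an assumption."""
    rng = np.random.default_rng(seed)
    W = he_normal(fan_in, fan_out, rng)
    b = np.zeros((1, fan_out))  # row vector so it broadcasts over the batch
    return W, b

W1, b1 = init_dense(2, 3, seed=42)
W2, b2 = init_dense(3, 2, seed=43)
print('W1 shape:', W1.shape, 'b1 shape:', b1.shape)
print('W2 shape:', W2.shape, 'b2 shape:', b2.shape)
```

Scaling the variance by 2 / fan_in is intended to keep the variance of ReLU activations roughly constant from layer to layer. Storing `b` as a `(1, fan_out)` row lets it broadcast over any batch size.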

## 2. Forward Pass: Step by Step

### Layer 1: Z₁ = X @ W₁ + b₁, then A₁ = ReLU(Z₁)

In [None]:
# Manual forward for layer 1
Z1 = X @ layer1.W + layer1.b
print('Z1 (pre-activation):', Z1)

A1 = np.maximum(0, Z1)  # ReLU
print('A1 (post-ReLU):    ', A1)

# Verify against our framework
A1_fw = layer1.forward(X)
print('Framework A1:      ', A1_fw)
np.testing.assert_allclose(A1, A1_fw)
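With one sample, the bias addition looks trivial. For a batch of m rows, `b₁` of shape `(1, 3)` broadcasts across every row of `X @ W₁`. The sketch below uses made-up weights (not the framework's) and also builds the ReLU mask 𝟙(Z₁ > 0) that the backward pass needs. It treats Z = 0 as inactive, which is a common convention. The framework's choice at exactly zero is not shown here.

```python
import numpy as np

# Illustrative parameters (not the framework's initialized values)
W1 = np.array([[0.5, -1.0, 0.2],
               [0.3,  0.4, -0.6]])
b1 = np.array([[0.1, 0.0, -0.1]])   # shape (1, 3): broadcasts over rows

X_batch = np.array([[1.0, 0.5],
                    [-1.0, 2.0],
                    [0.0, 0.0]])      # shape (m=3, 2)

Z1 = X_batch @ W1 + b1                # (3, 2) @ (2, 3) + (1, 3) -> (3, 3)
A1 = np.maximum(0.0, Z1)              # ReLU, elementwise
mask = (Z1 > 0).astype(float)         # ReLU derivative; Z1 == 0 counts as inactive

print(Z1)
print(A1)
print(mask)
```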

### Layer 2: Z₂ = A₁ @ W₂ + b₂, then Ŷ = Softmax(Z₂)

In [None]:
Z2 = A1 @ layer2.W + layer2.b
print('Z2 (pre-softmax):', Z2)

# Softmax
exp_Z2 = np.exp(Z2 - Z2.max(axis=1, keepdims=True))
Y_hat = exp_Z2 / exp_Z2.sum(axis=1, keepdims=True)
print('Ŷ  (softmax):   ', Y_hat)
print('Sum:', Y_hat.sum())

Y_hat_fw = layer2.forward(A1)
print('Framework Ŷ:    ', Y_hat_fw)
np.testing.assert_allclose(Y_hat, Y_hat_fw, atol=1e-10)
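The manual softmax above subtracts the row maximum before exponentiating. This is safe because softmax is unchanged when the same constant is added to every logit in a row, and it keeps `np.exp` from overflowing for large logits. Here is a small standalone sketch, with a `softmax` helper written for illustration:

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = Z - Z.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

Z_small = np.array([[0.0, 1.0]])
Z_large = np.array([[1000.0, 1001.0]])   # np.exp(1000.0) on its own would overflow to inf

p_small = softmax(Z_small)
p_large = softmax(Z_large)
print(p_small)
print(p_large)   # same probabilities: only the differences between logits matter
```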

## 3. Loss Computation

$$L = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k} Y_{ik}\,\log \hat{Y}_{ik}$$

In [None]:
loss_fn = CrossEntropyLoss()
loss = loss_fn.forward(Y_hat, Y)
print(f'Cross-Entropy Loss: {loss:.6f}')

# Manual
manual_loss = -np.sum(Y * np.log(Y_hat + 1e-15)) / Y.shape[0]
print(f'Manual Loss:        {manual_loss:.6f}')
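Because `Y` is one-hot, the inner sum over classes keeps only the log-probability of the true class, so the loss reduces to the mean of −log Ŷ[i, yᵢ]. Below is a standalone check with illustrative probabilities. No epsilon is needed because no entry of `Y_hat` is zero; the `1e-15` in the manual line above guards against `log(0)`.

```python
import numpy as np

Y_hat = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
Y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])           # one-hot targets
m = Y.shape[0]

# Full formula: -(1/m) * sum over samples and classes of Y * log(Y_hat)
loss_full = -np.sum(Y * np.log(Y_hat)) / m

# Shortcut: only the true-class probability survives the one-hot mask
labels = Y.argmax(axis=1)
loss_short = -np.mean(np.log(Y_hat[np.arange(m), labels]))

print(loss_full, loss_short)
```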

## 4. Backward Pass: Step by Step

### Combined softmax + cross-entropy gradient: dZ₂ = Ŷ − Y
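Take a single sample with logits z, softmax output ŷ, and one-hot target y (so Σⱼ yⱼ = 1). The chain rule through the softmax Jacobian then collapses to a simple difference:

$$
\hat{y}_j = \frac{e^{z_j}}{\sum_k e^{z_k}}, \qquad
\frac{\partial \hat{y}_j}{\partial z_k} = \hat{y}_j\,(\delta_{jk} - \hat{y}_k), \qquad
\ell = -\sum_j y_j \log \hat{y}_j
$$

$$
\frac{\partial \ell}{\partial z_k}
= -\sum_j \frac{y_j}{\hat{y}_j}\,\hat{y}_j\,(\delta_{jk} - \hat{y}_k)
= -y_k + \hat{y}_k \sum_j y_j
= \hat{y}_k - y_k
$$

Stacking the rows for all samples gives dZ₂ = Ŷ − Y. In this notebook's convention, the 1/m averaging is applied afterwards, in dW₂ and db₂.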

In [None]:
# dZ2 = Y_hat - Y (combined gradient)
dZ2 = loss_fn.backward()
print('dZ2 (Ŷ - Y):', dZ2)
print('Manual:      ', Y_hat - Y)
np.testing.assert_allclose(dZ2, Y_hat - Y, atol=1e-10)
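The identity can also be checked numerically without the framework. This sketch compares Ŷ − Y against central finite differences of the cross-entropy loss with respect to each logit, using illustrative logits. With m = 1 the two should agree directly. For m > 1, the finite-difference estimate of the mean loss would equal (Ŷ − Y) / m.

```python
import numpy as np

def softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def ce_loss(Z, Y):
    """Cross-entropy of softmax(Z) against one-hot Y, averaged over the batch."""
    return -np.sum(Y * np.log(softmax(Z))) / Y.shape[0]

Z = np.array([[0.3, -1.2]])     # illustrative logits for one sample
Y = np.array([[1.0, 0.0]])      # one-hot: class 0

analytic = softmax(Z) - Y       # dZ = Y_hat - Y

# Central differences: (L(z + h) - L(z - h)) / 2h for each logit
h = 1e-5
numeric = np.zeros_like(Z)
for idx in np.ndindex(Z.shape):
    Zp, Zm = Z.copy(), Z.copy()
    Zp[idx] += h
    Zm[idx] -= h
    numeric[idx] = (ce_loss(Zp, Y) - ce_loss(Zm, Y)) / (2 * h)

print('analytic:', analytic)
print('numeric: ', numeric)
```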

### Layer 2 backward:

- $dW_2 = \frac{1}{m} A_1^T \cdot dZ_2$
- $db_2 = \frac{1}{m} \sum_{i=1}^{m} dZ_2^{(i)}$ (sum over the batch rows)
- $dA_1 = dZ_2 \cdot W_2^T$

In [None]:
dA1 = layer2.backward(dZ2)
print('dW2:', layer2.dW)
print('db2:', layer2.db)
print('dA1 (to pass back):', dA1)

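# Manual verification (same 1/m convention as the formulas above)
m = X.shape[0]
dW2_manual = A1.T @ dZ2 / m
db2_manual = dZ2.sum(axis=0, keepdims=True) / m
dA1_manual = dZ2 @ layer2.W.T
print('\nManual dW2:', dW2_manual)
np.testing.assert_allclose(layer2.dW, dW2_manual, atol=1e-10)
np.testing.assert_allclose(np.ravel(layer2.db), np.ravel(db2_manual), atol=1e-10)
np.testing.assert_allclose(dA1, dA1_manual, atol=1e-10)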
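With a single sample, the 1/m factor is invisible. The sketch below repeats the layer-2 formulas for an illustrative batch of m = 4, using random stand-in activations and weights rather than the notebook's. It then checks dW₂ and db₂ against central differences of the mean cross-entropy loss:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 4                                   # batch larger than 1 so the 1/m factor matters
A1 = rng.random((m, 3))                 # stand-in for post-ReLU activations (>= 0)
W2 = rng.normal(size=(3, 2))
b2 = np.zeros((1, 2))
Y = np.eye(2)[[0, 1, 1, 0]]             # one-hot targets, shape (4, 2)

def softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss(W, b):
    return -np.sum(Y * np.log(softmax(A1 @ W + b))) / m

# Analytical gradients, following the layer-2 formulas
dZ2 = softmax(A1 @ W2 + b2) - Y         # not yet divided by m
dW2 = A1.T @ dZ2 / m                    # (3, m) @ (m, 2) -> (3, 2)
db2 = dZ2.sum(axis=0, keepdims=True) / m
dA1 = dZ2 @ W2.T                        # (m, 2) @ (2, 3) -> (m, 3); unscaled, passed to layer 1

# Central-difference estimates of the same gradients
h = 1e-5
num_dW2 = np.zeros_like(W2)
for idx in np.ndindex(W2.shape):
    Wp, Wm = W2.copy(), W2.copy()
    Wp[idx] += h
    Wm[idx] -= h
    num_dW2[idx] = (loss(Wp, b2) - loss(Wm, b2)) / (2 * h)

num_db2 = np.zeros_like(b2)
for idx in np.ndindex(b2.shape):
    bp, bm = b2.copy(), b2.copy()
    bp[idx] += h
    bm[idx] -= h
    num_db2[idx] = (loss(W2, bp) - loss(W2, bm)) / (2 * h)

print(np.max(np.abs(dW2 - num_dW2)), np.max(np.abs(db2 - num_db2)))
```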

### Layer 1 backward (with ReLU derivative):

- $dZ_1 = dA_1 \odot \mathbb{1}(Z_1 > 0)$
- $dW_1 = \frac{1}{m} X^T \cdot dZ_1$
- $db_1 = \frac{1}{m} \sum_{i=1}^{m} dZ_1^{(i)}$ (sum over the batch rows)

In [None]:
dX = layer1.backward(dA1)
print('dW1:', layer1.dW)
print('db1:', layer1.db)
print('dX (input grad):', dX)
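The same approach extends through the ReLU. This sketch runs on random illustrative data with m = 4. It performs the full two-layer forward pass, applies dZ₁ = dA₁ ⊙ 𝟙(Z₁ > 0), and compares dW₁ with central differences of the loss. It follows the notebook's convention of unscaled dZ and dA with 1/m applied in the weight and bias gradients, so `dX` here is likewise unscaled. The finite-difference check assumes no entry of Z₁ lies within h of the ReLU kink at 0.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4
X = rng.normal(size=(m, 2))
Y = np.eye(2)[[0, 1, 0, 1]]
W1, b1 = rng.normal(size=(2, 3)), np.zeros((1, 3))
W2, b2 = rng.normal(size=(3, 2)), np.zeros((1, 2))

def softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss(W1_):
    """Mean cross-entropy of the full network as a function of W1 only."""
    A1_ = np.maximum(0.0, X @ W1_ + b1)
    return -np.sum(Y * np.log(softmax(A1_ @ W2 + b2))) / m

# Forward pass, keeping Z1 for the ReLU mask
Z1 = X @ W1 + b1
A1 = np.maximum(0.0, Z1)
Y_hat = softmax(A1 @ W2 + b2)

# Backward pass: unscaled dZ/dA, then 1/m in the parameter gradients
dZ2 = Y_hat - Y
dA1 = dZ2 @ W2.T
dZ1 = dA1 * (Z1 > 0)             # ReLU derivative: gradient blocked where Z1 <= 0
dW1 = X.T @ dZ1 / m
db1 = dZ1.sum(axis=0, keepdims=True) / m
dX = dZ1 @ W1.T                  # gradient w.r.t. the input (unscaled)

# Central-difference check on W1
h = 1e-5
num_dW1 = np.zeros_like(W1)
for idx in np.ndindex(W1.shape):
    Wp, Wm = W1.copy(), W1.copy()
    Wp[idx] += h
    Wm[idx] -= h
    num_dW1[idx] = (loss(Wp) - loss(Wm)) / (2 * h)

print(np.max(np.abs(dW1 - num_dW1)))
```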

## 5. Gradient Checking

Compare our analytical gradients against numerical (finite-difference) estimates.

In [None]:
print('Layer 1 gradient check:')
errors1 = gradient_check_layer(layer1, X, dA1, verbose=True)

print('\nLayer 2 gradient check:')
errors2 = gradient_check_layer(layer2, A1_fw, dZ2, verbose=True)
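The internals of `gradient_check_layer` are not shown. A common recipe is sketched here with hypothetical helpers `numerical_grad` and `relative_error`, which are not the framework's API. It perturbs each parameter by ±h, takes the central difference (f(p + h) − f(p − h)) / 2h, and reports a relative error. The test function f(W) = Σ (XW)² was chosen because its gradient 2Xᵀ(XW) is known in closed form:

```python
import numpy as np

def numerical_grad(f, P, h=1e-5):
    """Central-difference gradient of scalar f() w.r.t. array P (perturbed in place, then restored)."""
    grad = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        orig = P[idx]
        P[idx] = orig + h
        f_plus = f()
        P[idx] = orig - h
        f_minus = f()
        P[idx] = orig                    # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

def relative_error(a, n):
    """Max elementwise |a - n| / (|a| + |n|), guarded against division by zero."""
    return np.max(np.abs(a - n) / np.maximum(np.abs(a) + np.abs(n), 1e-12))

# Example: f(W) = sum((X @ W)**2) has analytic gradient 2 * X.T @ (X @ W)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
W = rng.normal(size=(3, 2))

def f():
    return np.sum((X @ W) ** 2)         # reads W, which numerical_grad perturbs in place

analytic = 2 * X.T @ (X @ W)
numeric = numerical_grad(f, W)
print(f'relative error: {relative_error(analytic, numeric):.2e}')
```

Using relative rather than absolute error keeps the pass/fail threshold independent of gradient scale. A frequently quoted rule of thumb treats values below roughly 1e-7 as a match, though a suitable threshold depends on h and the floating-point type.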

## Summary

We traced every single computation:

1. **Forward**: `X → Z₁ = XW₁ + b₁ → A₁ = ReLU(Z₁) → Z₂ = A₁W₂ + b₂ → Ŷ = Softmax(Z₂)`
2. **Loss**: `L = -Σ Y·log(Ŷ) / m`
3. **Backward**: `dZ₂ = Ŷ - Y → dW₂, db₂, dA₁ → dZ₁ = dA₁ ⊙ 𝟙(Z₁>0) → dW₁, db₁`

Every gradient was verified against numerical differentiation. ✅