<a target="_blank" href="https://colab.research.google.com/github/FranQuant/the_ai_engineer_capstones/blob/main/capstones/Week02_backprop/02_pytorch_no_autograd.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/>
</a>

# 02 — PyTorch Implementation (No Autograd)

## 1. Imports & Deterministic Seeds

In [1]:
# ============================================
# 1. Imports & Deterministic Seeds (match NB01)
# ============================================
import torch
import numpy as np

torch.manual_seed(42)
rng = np.random.default_rng(42)
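A quick check of what the seeds guarantee. Two generators built from the same seed produce identical streams, and re-seeding torch resets its global generator. In this notebook every random draw actually comes from the NumPy generator `rng`, so `torch.manual_seed(42)` is a safeguard rather than something the results depend on. The names `rng_a`, `rng_b`, `t_a`, `t_b` are illustrative only.

```python
import numpy as np
import torch

# Two NumPy generators seeded identically yield identical draws.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
draws_a = rng_a.uniform(-1, 1, size=3)
draws_b = rng_b.uniform(-1, 1, size=3)
print(np.array_equal(draws_a, draws_b))  # True

# Re-seeding torch's global generator reproduces its draws as well.
torch.manual_seed(42)
t_a = torch.rand(3)
torch.manual_seed(42)
t_b = torch.rand(3)
print(torch.equal(t_a, t_b))  # True
```

Reproducibility also depends on draw order. The dataset is sampled from `rng` before the weights, so reordering or re-running cells changes `W1` and `W2`.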

## 2. Synthetic Dataset (same as Notebook 01)

In [2]:
# ============================================
# 2. Synthetic Dataset (match NB01 exactly)
# ============================================
N = 500  # same as Notebook 01

X_np = rng.uniform(-1, 1, size=(N, 2))
y_np = (X_np[:, 0] * X_np[:, 1] < 0).astype(np.float32)

# Convert to PyTorch float32 tensors (X_np itself stays float64)
X = torch.tensor(X_np, dtype=torch.float32)
y = torch.tensor(y_np, dtype=torch.float32)
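The labeling rule is XOR-like. A point is positive exactly when its coordinates have opposite signs, which means it lies in quadrant II or IV. The sketch below applies the same rule to four hand-picked points and to a fresh uniform sample. It uses a separate generator, `demo_rng`, so the notebook's `rng` is untouched. It also shows that the rule leaves the inputs as float64 while the labels are float32.

```python
import numpy as np

# One illustrative point per quadrant.
pts = np.array([[ 0.5,  0.5],   # quadrant I:   product > 0 -> label 0
                [-0.5,  0.5],   # quadrant II:  product < 0 -> label 1
                [-0.5, -0.5],   # quadrant III: product > 0 -> label 0
                [ 0.5, -0.5]])  # quadrant IV:  product < 0 -> label 1
labels = (pts[:, 0] * pts[:, 1] < 0).astype(np.float32)
print(labels)  # [0. 1. 0. 1.]

# Same rule on a fresh uniform sample (separate generator, seed chosen arbitrarily).
demo_rng = np.random.default_rng(0)
X_demo = demo_rng.uniform(-1, 1, size=(500, 2))
y_demo = (X_demo[:, 0] * X_demo[:, 1] < 0).astype(np.float32)
print(X_demo.dtype, y_demo.dtype)  # float64 float32
print(y_demo.mean())               # close to 0.5 for uniform inputs
```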

## 3. NumPy Reference Forward Pass (from Notebook 01)
To check numerical parity between the NumPy and PyTorch implementations,
we replicate the minimal forward-pass functions from Notebook 01. These
are used for the direct comparison in Section 9.

In [3]:
# ============================================
# 3. NumPy Reference Forward Pass (from NB01)
# ============================================

def relu_np(x):
    return np.maximum(0, x)

def forward_single(x, W1, b1, W2, b2):
    a1 = W1 @ x + b1        # (h,)
    h  = relu_np(a1)        # (h,)
    f  = W2 @ h + b2        # (1,)
    return a1, h, float(f[0])  # explicit scalar extract
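`forward_single` processes one sample at a time. As a sketch, the same equations can be applied to a whole batch by stacking samples as rows, so `W1 @ x` becomes `X @ W1.T`. The function `forward_batch_np` and the small random problem below are hypothetical and not part of Notebook 01. The check confirms that the batched outputs agree with a per-sample loop that uses the same equations as `forward_single`.

```python
import numpy as np

def forward_batch_np(X, W1, b1, W2, b2):
    """Hypothetical batched variant of forward_single; X has shape (n, d)."""
    A1 = X @ W1.T + b1        # (n, h): one pre-activation row per sample
    H = np.maximum(0, A1)     # (n, h): ReLU applied elementwise
    F = H @ W2.T + b2         # (n, 1)
    return A1, H, F[:, 0]     # outputs flattened to (n,)

# Small illustrative problem (sizes and seed are assumptions).
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.uniform(-1, 1, size=(5, 2))
W1_demo = demo_rng.normal(0.0, 0.1, size=(4, 2))
b1_demo = np.zeros(4)
W2_demo = demo_rng.normal(0.0, 0.1, size=(1, 4))
b2_demo = np.zeros(1)

_, _, F = forward_batch_np(X_demo, W1_demo, b1_demo, W2_demo, b2_demo)

# Per-sample loop using the same equations as forward_single.
f_loop = np.array([
    float((W2_demo @ np.maximum(0, W1_demo @ x + b1_demo) + b2_demo)[0])
    for x in X_demo
])
print(np.allclose(F, f_loop))  # True
```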

## 4. NumPy Model Parameters (for comparison)

This notebook needs standalone NumPy parameters to reproduce the exact
forward pass used in Notebook 01. These are copied into the
PyTorch parameters in Section 5, so both implementations produce
matching outputs up to floating-point rounding.

In [4]:
# ============================================
# 4. NumPy Model Parameters (match NB01 exactly)
# ============================================

d, h = 2, 4

# Weight initialization: small Gaussian (std = 0.1), cast to float32
W1 = rng.normal(0.0, 0.1, size=(h, d)).astype(np.float32)
W2 = rng.normal(0.0, 0.1, size=(1, h)).astype(np.float32)

# Biases as float32
b1 = np.zeros((h,), dtype=np.float32)
b2 = np.zeros((1,), dtype=np.float32)
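The parameters are float32, but `X_np` comes from `rng.uniform` as float64. NumPy promotes mixed float32/float64 operands to float64, so `W1 @ x + b1` in `forward_single` actually runs in float64. The PyTorch path stays in float32. This small sketch with illustrative arrays shows the promotion rule:

```python
import numpy as np

W_demo = np.ones((4, 2), dtype=np.float32)   # float32 weights, like W1
x64 = np.array([0.1, -0.2])                  # default float64, like a row of X_np
x32 = x64.astype(np.float32)

print((W_demo @ x64).dtype)  # float64: mixed operands are promoted
print((W_demo @ x32).dtype)  # float32: all-float32 operands stay float32
```

This promotion explains why the NumPy and PyTorch outputs agree only to rounding precision rather than bit-for-bit.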

## 5. Model Parameters in PyTorch (No Autograd)

In [5]:
# ============================================
# 5. PyTorch Parameters (synced with NumPy)
# ============================================

# Create tensors with the same shapes and dtypes as NumPy params
W1_t = torch.empty((h, d), dtype=torch.float32)
b1_t = torch.empty((h,),   dtype=torch.float32)
W2_t = torch.empty((1, h), dtype=torch.float32)
b2_t = torch.empty((1,),   dtype=torch.float32)

# Tensors from torch.empty already have requires_grad=False;
# set it explicitly to make the "no autograd" intent visible
for t in [W1_t, b1_t, W2_t, b2_t]:
    t.requires_grad_(False)

# Sync PyTorch parameters with NumPy (safe in-place copy)
W1_t.copy_(torch.tensor(W1, dtype=torch.float32))
b1_t.copy_(torch.tensor(b1, dtype=torch.float32))
W2_t.copy_(torch.tensor(W2, dtype=torch.float32))
b2_t.copy_(torch.tensor(b2, dtype=torch.float32))

tensor([0.])
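The `tensor([0.])` line above is the notebook echoing the return value of the final `b2_t.copy_(...)` call. For the syncing step, the choice between copying and sharing NumPy data matters. `torch.tensor(...)`, like `copy_` into a preallocated tensor, makes an independent copy. `torch.from_numpy(...)` instead shares memory with the array. The illustrative sketch below shows the difference:

```python
import numpy as np
import torch

W_np = np.zeros((2, 2), dtype=np.float32)

copied = torch.tensor(W_np)      # independent copy of the data
shared = torch.from_numpy(W_np)  # view onto the same memory as W_np

W_np[0, 0] = 1.0                 # mutate the NumPy array in place

print(copied[0, 0].item())  # 0.0: the copy is unaffected
print(shared[0, 0].item())  # 1.0: the shared tensor sees the change
```

Copying, as Section 5 does, keeps the two parameter sets independent. A later in-place update on one side therefore cannot silently change the other.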

## 6. Activation Function
#### Match NumPy ReLU:

In [6]:
def relu_t(x):
    return torch.clamp(x, min=0.0)
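As a sanity check, `torch.clamp(x, min=0.0)` gives the same values as the built-in `torch.relu` and as the NumPy reference `np.maximum(0, x)`. The test values below are arbitrary.

```python
import numpy as np
import torch

vals = [-2.0, -0.5, 0.0, 0.5, 2.0]
x_np = np.array(vals, dtype=np.float32)
x_t = torch.tensor(vals, dtype=torch.float32)

via_clamp = torch.clamp(x_t, min=0.0)  # the approach used in relu_t
via_relu = torch.relu(x_t)             # built-in alternative
via_numpy = np.maximum(0, x_np)        # the NumPy reference (relu_np)

print(via_clamp.tolist())                            # [0.0, 0.0, 0.0, 0.5, 2.0]
print(torch.equal(via_clamp, via_relu))              # True
print(np.array_equal(via_clamp.numpy(), via_numpy))  # True
```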

## 7. Forward Pass (PyTorch)

In [7]:
# ============================================
# 7. Forward Pass (PyTorch) — single-sample only
# ============================================

def forward_torch(x, W1, b1, W2, b2):
    """
    Single-sample forward pass matching NumPy forward_single.
    x:  (d,)
    W1: (h, d)
    b1: (h,)
    W2: (1, h)
    b2: (1,)
    Returns:
        a1: pre-activation (h,)
        h:  hidden layer (h,)
        f:  scalar output (float)
    """
    a1 = x @ W1.T + b1        # (h,); same as W1 @ x + b1 for 1-D x
    h  = relu_t(a1)           # (h,)
    f  = W2 @ h + b2          # (1,)
    return a1, h, f.item()    # f returned as a Python float
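`forward_torch` is single-sample only. A batched counterpart is a natural next step. The sketch below uses a hypothetical `forward_torch_batch`, random illustrative tensors drawn from a local `torch.Generator` so the global seed is left alone, and a per-row loop to confirm the two agree. It also checks that `x @ W1.T` and `W1 @ x` give the same vector for a 1-D `x`.

```python
import torch

def forward_torch_batch(X, W1, b1, W2, b2):
    """Hypothetical batched counterpart of forward_torch; X has shape (n, d)."""
    A1 = X @ W1.T + b1            # (n, h)
    H = torch.clamp(A1, min=0.0)  # (n, h)
    F = H @ W2.T + b2             # (n, 1)
    return A1, H, F.squeeze(1)    # F flattened to (n,)

gen = torch.Generator().manual_seed(0)  # local generator (seed is arbitrary)
X_demo = torch.rand(6, 2, generator=gen) * 2 - 1
W1_demo = torch.randn(4, 2, generator=gen) * 0.1
b1_demo = torch.zeros(4)
W2_demo = torch.randn(1, 4, generator=gen) * 0.1
b2_demo = torch.zeros(1)

_, _, F = forward_torch_batch(X_demo, W1_demo, b1_demo, W2_demo, b2_demo)

# Per-row loop with the single-sample equations.
f_loop = torch.stack([
    (W2_demo @ torch.clamp(x @ W1_demo.T + b1_demo, min=0.0) + b2_demo)[0]
    for x in X_demo
])

x0 = X_demo[0]
print(torch.allclose(x0 @ W1_demo.T, W1_demo @ x0))  # True
print(torch.allclose(F, f_loop))                     # True
print(F.shape)                                       # torch.Size([6])
```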

## 8. Loss Function
#### Match NumPy MSE:

In [8]:
# ============================================
# 8. Loss Function (match NumPy exactly)
# ============================================

def mse_loss_t(f, y):
    """
    Squared-error loss for a single sample, scaled by 0.5 (the "MSE" of Notebook 01).
    Matches the NumPy definition L = 0.5 * (f - y)**2.
    f: Python float
    y: torch scalar tensor (float32)
    Returns: torch scalar tensor
    """
    return 0.5 * (f - y)**2
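A worked example of the loss with hand-checkable numbers. With `f = 0.5` and `y = 1`, we get `L = 0.5 * (-0.5)**2 = 0.125`. The 0.5 factor is there so that the derivative is simply `dL/df = f - y`, which is the first step of a manual backward pass. The function name `half_squared_error` is a stand-in with the same formula as `mse_loss_t`.

```python
import torch

def half_squared_error(f, y):
    # Same formula as mse_loss_t: L = 0.5 * (f - y)**2
    return 0.5 * (f - y) ** 2

f_demo = 0.5                                     # Python float, as returned by forward_torch
y_demo = torch.tensor(1.0, dtype=torch.float32)  # scalar target tensor

L_demo = half_squared_error(f_demo, y_demo)
print(L_demo.item())  # 0.125

# Derivative of the loss with respect to the output: dL/df = f - y
dL_df = f_demo - y_demo
print(dL_df.item())   # -0.5
```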

## 9. Numerical Consistency Test (NumPy vs Torch)
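We run the NumPy reference forward pass (replicated from Notebook 01 in Section 3) and the PyTorch forward pass on the same sample and compare their scalar outputs.

We use the first sample (`i = 0`):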

In [9]:
# ============================================
# 9. Numerical Consistency Test (NumPy vs Torch)
# ============================================

i = 0  # pick sample
x_i_np = X_np[i]              # NumPy input
y_i   = y[i]                  # Torch scalar tensor target

# NumPy forward pass
a1_np, h_np, f_np = forward_single(x_i_np, W1, b1, W2, b2)   # scalar f_np (Python float)

# PyTorch forward pass
x_i_t = X[i]                  # torch.float32
a1_t, h_t, f_t = forward_torch(x_i_t, W1_t, b1_t, W2_t, b2_t)  # f_t is Python float

# Print comparison
print("NumPy output f_np =", f_np)
print("Torch output f_t =", f_t)
print("Absolute difference =", abs(f_np - f_t))

# Assertion — ensures consistency
assert abs(f_np - f_t) < 1e-6, "NumPy and PyTorch outputs diverge!"

NumPy output f_np = -0.0017306688285198745
Torch output f_t = -0.0017306690569967031
Absolute difference = 2.2847682860233087e-10
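The test above covers a single sample. A sketch of the same check over all `N` samples follows. It rebuilds the setup of Sections 2 and 4 with its own generator, assuming the same seed and the same draw order (dataset first, then `W1`, then `W2`). It then compares a batched NumPy forward pass against a batched float32 PyTorch forward pass. The helper `max_forward_gap` is hypothetical, and its local names avoid overwriting the notebook's variables.

```python
import numpy as np
import torch

def max_forward_gap(seed=42, n=500, d=2, hidden=4):
    """Largest |NumPy - PyTorch| output gap over all samples (assumes the NB01 draw order)."""
    rng_chk = np.random.default_rng(seed)
    X_chk = rng_chk.uniform(-1, 1, size=(n, d))  # float64, like X_np
    W1_chk = rng_chk.normal(0.0, 0.1, size=(hidden, d)).astype(np.float32)
    W2_chk = rng_chk.normal(0.0, 0.1, size=(1, hidden)).astype(np.float32)
    b1_chk = np.zeros((hidden,), dtype=np.float32)
    b2_chk = np.zeros((1,), dtype=np.float32)

    # NumPy path: promoted to float64 because X_chk is float64.
    f_np_all = (np.maximum(0, X_chk @ W1_chk.T + b1_chk) @ W2_chk.T + b2_chk)[:, 0]

    # PyTorch path: float32 throughout.
    X_t_chk = torch.tensor(X_chk, dtype=torch.float32)
    W1_t_chk, b1_t_chk = torch.from_numpy(W1_chk), torch.from_numpy(b1_chk)
    W2_t_chk, b2_t_chk = torch.from_numpy(W2_chk), torch.from_numpy(b2_chk)
    f_t_all = (torch.clamp(X_t_chk @ W1_t_chk.T + b1_t_chk, min=0.0) @ W2_t_chk.T + b2_t_chk)[:, 0]

    return float(np.max(np.abs(f_np_all - f_t_all.numpy().astype(np.float64))))

gap = max_forward_gap()
print(gap)
print(gap < 1e-6)  # same tolerance as the single-sample assertion
```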


## 10. Conclusion
In this notebook we re-implemented the two-layer neural network's forward pass in PyTorch without autograd and aligned it with the NumPy model from Notebook 01. To ensure numerical parity, we matched:

- the same dataset (same RNG, same sampling),

- the same parameter initialization (Gaussian weights with σ = 0.1, zero biases),

- the same forward equations (linear → ReLU → linear),

- the same loss definition (0.5 * (f - y)**2),

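- the same parameter dtype (float32), with float32 tensors throughout the PyTorch path (the NumPy inputs `X_np` remain float64, so the NumPy forward pass is promoted to float64).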

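After synchronization, the PyTorch and NumPy outputs for the tested sample differ by about 2.3e-10. This gap is consistent with float32 rounding and well within the 1e-6 assertion tolerance, which confirms that the two forward passes agree.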
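To put the observed gap in float32 terms, we can measure it against the float32 spacing (one ulp) at the output's magnitude. The two values below are copied from the printed output in Section 9. Because |f| lies between 2**-10 and 2**-9, one float32 ulp there is exactly 2**-33, about 1.16e-10, so the gap amounts to roughly two ulps.

```python
import numpy as np

f_np_printed = -0.0017306688285198745  # NumPy (float64-promoted) output from Section 9
f_t_printed = -0.0017306690569967031   # PyTorch float32 output from Section 9

# Spacing between adjacent float32 values at this magnitude.
ulp32 = abs(float(np.spacing(np.float32(abs(f_t_printed)))))
print(ulp32 == 2.0 ** -33)                      # True
print(abs(f_np_printed - f_t_printed) / ulp32)  # roughly 2 float32 ulps
```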

This establishes a deterministic baseline.
Next, in Notebook 03, we introduce PyTorch autograd and compare auto-computed gradients to the manual derivatives from Notebook 01.