# Level 3: Build from scratch

In this final notebook, we strip away the framework (`tinygrad`) and implement the core CNN layers using only `numpy`. Writing each layer by hand shows exactly which mathematical operations happen under the hood.

In [1]:
import numpy as np

# Set random seed for reproducibility
np.random.seed(94)

## 1. Convolution (Conv2d)

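We will implement a naive 2D convolution using nested loops. Strictly speaking, this is cross-correlation (the kernel is not flipped), which is the convention in most deep learning frameworks. Real frameworks typically use optimized approaches such as im2col followed by a matrix multiplication, but loops are easier to understand.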

In [2]:
def conv2d_forward(input, weight, bias=None):
    """
    Naive Conv2d forward pass (stride 1, no padding).
    input:  (B, Cin, H, W)
    weight: (Cout, Cin, K, K), square kernels only
    bias:   (Cout,), optional
    returns: (B, Cout, H - K + 1, W - K + 1)
    """
    B, Cin, H, W = input.shape
    Cout, _, K, _ = weight.shape

    # Assuming stride=1, no padding for this demo
    H_out = H - K + 1
    W_out = W - K + 1

    out = np.zeros((B, Cout, H_out, W_out))

    print(f"Conv2d: Input {input.shape} -> Output {out.shape}")

    for b in range(B):
        for c_out in range(Cout):
            for h in range(H_out):
                for w in range(W_out):
                    # Extract patch from input
                    patch = input[b, :, h:h+K, w:w+K]
                    # Dot product with weight filter
                    # Sum over Cin, K, K
                    val = np.sum(patch * weight[c_out])
                    if bias is not None:
                        val += bias[c_out]
                    out[b, c_out, h, w] = val
    return out

# Verification
x_dummy = np.ones((1, 1, 3, 3))
w_dummy = np.ones((1, 1, 2, 2))
out_dummy = conv2d_forward(x_dummy, w_dummy)
print(f"Dummy Output:\n{out_dummy}")
# Expected: a 2x2 grid of 4.0 (each output sums four 1*1 products over a 2x2 window)

Conv2d: Input (1, 1, 3, 3) -> Output (1, 1, 2, 2)
Dummy Output:
[[[[4. 4.]
   [4. 4.]]]]
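
The im2col approach mentioned above can be sketched as follows. This is a minimal illustration, assuming stride 1 and no padding as in `conv2d_forward`. The name `conv2d_im2col` is hypothetical (it is not part of the notebook or of any library), and production frameworks use far more optimized kernels than this.

```python
import numpy as np

def conv2d_im2col(x, weight, bias=None):
    """Hypothetical im2col variant of conv2d_forward (stride 1, no padding)."""
    B, Cin, H, W = x.shape
    Cout, _, K, _ = weight.shape
    H_out, W_out = H - K + 1, W - K + 1

    # cols[b, c, i, j, h, w] = x[b, c, h + i, w + j]
    cols = np.empty((B, Cin, K, K, H_out, W_out), dtype=x.dtype)
    for i in range(K):
        for j in range(K):
            cols[:, :, i, j] = x[:, :, i:i + H_out, j:j + W_out]

    # One row per output position, one column per (c, i, j) weight entry
    cols = cols.transpose(0, 4, 5, 1, 2, 3).reshape(B * H_out * W_out, Cin * K * K)
    w_mat = weight.reshape(Cout, Cin * K * K)

    out = cols @ w_mat.T                  # (B*H_out*W_out, Cout)
    if bias is not None:
        out = out + bias                  # broadcasts across all rows
    return out.reshape(B, H_out, W_out, Cout).transpose(0, 3, 1, 2)

# Same dummy check as above: every entry is 4.0
out_im2col = conv2d_im2col(np.ones((1, 1, 3, 3)), np.ones((1, 1, 2, 2)))
```

The four nested loops collapse into two short loops over kernel offsets plus a single matrix multiplication, which is where most of the speedup would come from.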


### Illustration: Edge Detection with Sobel Kernels

The Sobel operator detects edges by computing image gradients in the X and Y directions. Our `conv2d_forward` function applies these kernels to find vertical and horizontal edges.

![Convolution: Edge Detection with Sobel Kernels](images/conv2d_edge_detection.png)
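
To make the Sobel idea concrete, here is a small sketch on a synthetic 5×5 image with a single vertical edge. The helper `correlate2d_valid` is a hypothetical single-channel stand-in for `conv2d_forward`, included only so the snippet runs on its own; the test image is likewise an assumption and not the one used in the figure.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Hypothetical single-channel stand-in for conv2d_forward (stride 1, no padding)."""
    K = kernel.shape[0]
    H_out, W_out = image.shape[0] - K + 1, image.shape[1] - K + 1
    out = np.zeros((H_out, W_out))
    for h in range(H_out):
        for w in range(W_out):
            out[h, w] = np.sum(image[h:h + K, w:w + K] * kernel)
    return out

# Standard 3x3 Sobel kernels: sobel_x responds to horizontal intensity changes
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

# Synthetic image: dark left columns, bright right columns (one vertical edge)
image = np.zeros((5, 5))
image[:, 2:] = 1.0

grad_x = correlate2d_valid(image, sobel_x)  # rows of [4, 4, 0]: strong response at the edge
grad_y = correlate2d_valid(image, sobel_y)  # all zeros: no change along the vertical direction
```

Because the kernel is not flipped, the sign of the response depends on the edge direction: a bright-to-dark edge would give -4 instead of 4.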


## 2. ReLU Activation

The Rectified Linear Unit (ReLU) is applied element-wise: it returns `x` if `x > 0`, else `0`.

In [3]:
def relu_forward(x):
    return np.maximum(0, x)

# Verification
print(f"ReLU([-1, 0, 1]) = {relu_forward(np.array([-1, 0, 1]))}")

ReLU([-1, 0, 1]) = [0 0 1]


### Illustration: ReLU in Action

Negative values (shown in blue) become zero, while positive values (red) pass through unchanged. The mask shows which values were zeroed out.

![ReLU: Zeroing Negative Activations](images/relu_heatmap.png)
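
The mask described above can be computed directly from the input. The snippet below is a minimal sketch with made-up values; the variable name `zeroed` is an assumption for illustration.

```python
import numpy as np

x = np.array([[-2.0, 0.5],
              [0.0, -0.1]])

out = np.maximum(0, x)   # same operation as relu_forward
zeroed = x < 0           # True where a negative value was replaced by 0
```

Here `zeroed` is `True` only for `-2.0` and `-0.1`; the input `0.0` is already zero, so it is left unchanged rather than counted as zeroed out.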


## 3. Max Pooling

Max pooling reduces the spatial dimensions by taking the maximum value in each window. With the defaults `kernel_size=2` and `stride=2`, the windows do not overlap.

In [4]:
def max_pool_forward(input, kernel_size=2, stride=2):
    B, C, H, W = input.shape

    H_out = (H - kernel_size) // stride + 1
    W_out = (W - kernel_size) // stride + 1

    out = np.zeros((B, C, H_out, W_out))

    print(f"MaxPool: Input {input.shape} -> Output {out.shape}")

    for b in range(B):
        for c in range(C):
            for h in range(H_out):
                for w in range(W_out):
                    h_start = h * stride
                    w_start = w * stride
                    patch = input[b, c, h_start:h_start+kernel_size, w_start:w_start+kernel_size]
                    out[b, c, h, w] = np.max(patch)
    return out

# Verification
x_pool = np.array([[1, 2], [3, 4]]).reshape(1, 1, 2, 2)
print(f"MaxPool(2x2):\n{max_pool_forward(x_pool, 2, 2)}")

MaxPool: Input (1, 1, 2, 2) -> Output (1, 1, 1, 1)
MaxPool(2x2):
[[[[4.]]]]
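
When the windows do not overlap (`kernel_size == stride`), the same result can be obtained without loops by reshaping. The following is a sketch under that assumption only; `max_pool_reshape` is a hypothetical name, and it does not cover overlapping windows the way `max_pool_forward` does.

```python
import numpy as np

def max_pool_reshape(x, k=2):
    """Hypothetical loop-free max pooling for non-overlapping k x k windows."""
    B, C, H, W = x.shape
    H_out, W_out = H // k, W // k
    # Drop trailing rows/columns that do not fill a complete window
    x = x[:, :, :H_out * k, :W_out * k]
    # Split each spatial axis into (window index, offset within window)
    blocks = x.reshape(B, C, H_out, k, W_out, k)
    return blocks.max(axis=(3, 5))

x_demo = np.arange(16).reshape(1, 1, 4, 4)
pooled = max_pool_reshape(x_demo)  # [[5, 7], [13, 15]] in the last two dimensions
```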


### Illustration: MaxPool in Action

Each 2×2 block is reduced to its maximum value, halving the spatial dimensions. The green cells in the rightmost panel show which values were selected as the maximum in each block.

![MaxPool: Spatial Downsampling via Local Maxima](images/maxpool_heatmap.png)


## 4. Integration

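Let's pass a random input through our manual layers and watch the shapes change: the convolution trims the 28×28 input to 26×26 while expanding it to 32 channels, and max pooling then halves each spatial dimension to 13×13.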

In [5]:
# Random Input Image (1 batch, 1 channel, 28x28)
x = np.random.randn(1, 1, 28, 28)

# Weights for Conv Layer (32 filters, 1 input channel, 3x3 kernel)
w = np.random.randn(32, 1, 3, 3)

print("--- Forward Pass ---")
# 1. Convolution
out_conv = conv2d_forward(x, w)

# 2. ReLU
out_relu = relu_forward(out_conv)

# 3. MaxPool
out_pool = max_pool_forward(out_relu, 2, 2)

print("\n--- Shapes ---")
print(f"Input:    {x.shape}")
print(f"Conv:     {out_conv.shape} (28 -> 26)")
print(f"ReLU:     {out_relu.shape}")
print(f"Pool:     {out_pool.shape} (26 -> 13)")

--- Forward Pass ---
Conv2d: Input (1, 1, 28, 28) -> Output (1, 32, 26, 26)
MaxPool: Input (1, 32, 26, 26) -> Output (1, 32, 13, 13)

--- Shapes ---
Input:    (1, 1, 28, 28)
Conv:     (1, 32, 26, 26) (28 -> 26)
ReLU:     (1, 32, 26, 26)
Pool:     (1, 32, 13, 13) (26 -> 13)
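
These shapes follow from the size formula used in both `conv2d_forward` and `max_pool_forward`. As a quick sanity check, the hypothetical helper `output_size` below (not part of the notebook's functions, and assuming no padding) recomputes the spatial sizes and the number of values at each stage.

```python
def output_size(n, kernel, stride=1):
    """Spatial output size for a window of `kernel` moved by `stride`, no padding."""
    return (n - kernel) // stride + 1

conv_hw = output_size(28, kernel=3)                  # 26
pool_hw = output_size(conv_hw, kernel=2, stride=2)   # 13

# Number of values per stage for a batch of 1
n_input = 1 * 28 * 28                # 784
n_conv = 32 * conv_hw * conv_hw      # 21632
n_pool = 32 * pool_hw * pool_hw      # 5408
```

The convolution increases the total number of values because it produces 32 channels; the pooling step then reduces that count by a factor of 4.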


### Illustration: Full CNN Pipeline

A digit passes through the complete pipeline: **Conv2d** (with 8 different kernels) → **ReLU** (zeroing negatives) → **MaxPool** (spatial downsampling). Each column shows a different filter's response at each stage.

![Full CNN Pipeline: Input → Conv → ReLU → MaxPool](images/full_pipeline.png)
