# Minimal Diffusion Lab (Student Edition)
Welcome! You'll build a diffusion model step-by-step. Each section mirrors the instructor solution but intentionally leaves key pieces out so you can reason about them. Use the provided hints, experiment often, and take notes about what surprised you.

> **Expectation:** You may consult AI tools, but you must understand and explain every line you accept. Write short reflections in the provided cells to solidify your learning.

## 0. Lab Survival Kit
- Work top-to-bottom, running cells one at a time.
- When you see **TODO**, stop and think before coding.
- Use the **Reflection** prompts; they're graded.
- If you get stuck, describe what you tried and why it failed before asking for help (human or AI).

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Reproducibility helpers
RNG_SEED = 42
rng = np.random.default_rng(RNG_SEED)
print("✅ Imports ready. Seed:", RNG_SEED)

## 1. Build a toy spiral dataset
We want a 2D dataset that makes it easy to **see** whether denoising works. A noisy spiral does the trick.

**TODO (estimate 30 min):** Complete `make_spiral`. Keep it simple: sample an angle `t`, map it to `(x, y)`, add gentle Gaussian jitter, and normalize.

> **Hint:** The instructor version scales the radius linearly with the angle and then standardizes the data. You can mimic that or invent your own variation—just document it.
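
The normalization step mentioned in the hint can be tried on unrelated data first. This is a minimal sketch that assumes "standardizes" means zero mean and unit variance per coordinate, which may differ from the instructor version. The names `demo_rng`, `points`, and `standardized` are hypothetical and exist only in this example.

```python
import numpy as np

# Separate generator so the lab's own `rng` stream is left untouched.
demo_rng = np.random.default_rng(0)

# Hypothetical stand-in data: an off-center, stretched 2D point cloud.
points = demo_rng.normal(loc=[3.0, -1.0], scale=[2.0, 0.5], size=(500, 2))

# Standardize each coordinate: subtract the column mean, divide by the column std.
standardized = (points - points.mean(axis=0)) / points.std(axis=0)

print(standardized.mean(axis=0))  # both entries close to 0
print(standardized.std(axis=0))   # both entries close to 1
```

Standardized inputs keep the data on roughly the same scale as the unit-variance noise added later, which makes the forward-diffusion plots easier to read.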

In [None]:
def make_spiral(n_samples=1000, noise=0.02):
    """Return a normalized (n_samples, 2) array shaped like a spiral."""
    # TODO: implement using rng for randomness
    raise NotImplementedError("Fill in make_spiral to proceed.")

X = make_spiral()
plt.figure(figsize=(4,4))
plt.scatter(X[:,0], X[:,1], s=5)
plt.title("Spiral dataset (normalized)")
plt.axis("equal")
plt.show()

## 2. Noise schedule $\beta_t$
A **noise schedule** tells us how much Gaussian noise to inject at each diffusion step. In DDPM we typically choose a simple schedule such as a linear ramp so that early steps barely disturb $x_0$ while later steps push the sample toward pure noise. The closed-form forward equation
$$ x_t = \sqrt{\bar{\alpha}_t} x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon $$
depends on the cumulative product $\bar{\alpha}_t = \prod_{s=1}^t (1-\beta_s)$, so carefully shaping $\beta_t$ directly controls how fast signal fades.

**TODOs (estimate 20 min):**
1. Implement `make_linear_beta_schedule` so it returns `betas`, `alphas`, and the cumulative product `alpha_bars`.
2. Plot the schedule to see if it matches your intuition.

> **Hint:** Remember `alpha_t = 1 - beta_t`. Use `np.cumprod` for the running product.
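
To see what the running product does before writing the full schedule, here is a tiny worked example with three made-up beta values (chosen for easy arithmetic, not taken from the lab defaults). The `_demo` names are hypothetical.

```python
import numpy as np

# Toy values only: far larger than a realistic schedule, chosen for easy arithmetic.
betas_demo = np.array([0.1, 0.2, 0.3])
alphas_demo = 1.0 - betas_demo            # [0.9, 0.8, 0.7]
alpha_bars_demo = np.cumprod(alphas_demo)  # [0.9, 0.9*0.8, 0.9*0.8*0.7]

print(alpha_bars_demo)                     # [0.9, 0.72, 0.504]

# Coefficients of the closed-form forward equation at each step.
signal_coef = np.sqrt(alpha_bars_demo)
noise_coef = np.sqrt(1.0 - alpha_bars_demo)
print(signal_coef**2 + noise_coef**2)      # always 1 at every step
```

Because the squared coefficients sum to one, a unit-variance $x_0$ keeps unit variance at every step; only the share of signal versus noise changes.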

In [None]:
def make_linear_beta_schedule(T=100, beta_start=1e-4, beta_end=5e-3):
    """Return betas, alphas, and cumulative alpha_bars for T steps."""
    # TODO: fill in this function using np.linspace and np.cumprod
    raise NotImplementedError

T = 100
betas, alphas, alpha_bars = make_linear_beta_schedule(T=T)

plt.figure(figsize=(5,4))
plt.plot(betas)
plt.title("Linear beta schedule")
plt.xlabel("t")
plt.ylabel("beta_t")
plt.show()

plt.figure(figsize=(5,4))
plt.plot(alpha_bars)
plt.title("Cumulative product: alpha_bar_t")
plt.xlabel("t")
plt.ylabel("alpha_bar_t")
plt.show()

## 3. Forward diffusion helper $q(x_t \mid x_0)$
The closed-form equation lets us jump directly to any timestep.

**TODO (estimate 10 min):** Finish `q_sample` so it returns `(x_t, eps)` for either a scalar `t` or a vector of times, using the closed-form equation from Section 2: $x_t = \sqrt{\bar{\alpha}_t} x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$.

> **Hint:** When `t` is a vector, you'll want to broadcast `alpha_bars[t-1]` to match the shape of `x0`.
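
The broadcasting pattern from the hint can be checked in isolation. The sketch below uses toy values and hypothetical `_demo` names; it shows only the indexing and shape handling, not the full sampler.

```python
import numpy as np

alpha_bars_demo = np.array([0.9, 0.72, 0.504])  # toy cumulative products for T = 3
x0_demo = np.ones((4, 2))                        # four 2D points
t_demo = np.array([1, 3, 2, 1])                  # one 1-indexed timestep per point

# Timesteps start at 1 but arrays start at 0, hence the `- 1`.
per_sample = alpha_bars_demo[t_demo - 1]         # shape (4,)

# Add a trailing axis so (4, 1) broadcasts against (4, 2).
scaled = np.sqrt(per_sample)[:, None] * x0_demo  # shape (4, 2)

print(per_sample.shape, scaled.shape)            # (4,) (4, 2)
```

Without the `[:, None]`, NumPy would try to align a `(4,)` array with the last axis of a `(4, 2)` array and raise a shape error. For a scalar `t`, `alpha_bars_demo[t - 1]` is a plain number and no reshaping is needed.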

In [None]:
def q_sample(x0, t, alpha_bars):
    """Sample x_t and record the actual noise epsilon."""
    # TODO: support scalar or vector t values
    raise NotImplementedError
    
# Visualize noising at different t values
ts_to_show = [1, T//4, T//2, T]
fig, axes = plt.subplots(1, len(ts_to_show), figsize=(12,3))
for i, tt in enumerate(ts_to_show):
    xt, _ = q_sample(X, tt, alpha_bars)
    axes[i].scatter(xt[:,0], xt[:,1], s=5)
    axes[i].set_title(f"t={tt}")
    axes[i].set_aspect('equal', 'box')
plt.suptitle("Forward diffusion: adding noise over time")
plt.show()

## 4. Multi-layer neural network to predict noise
We'll train a small fully connected network that takes the two coordinates of a noisy point plus the normalized timestep, $[x_t, y_t, t/T]$, and predicts the noise $\epsilon$ that was added.

### TODOs inside the network module
1. Replace the `raise NotImplementedError` statements in `forward_network` and `backward_network` with actual linear + ReLU math.
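2. The momentum coefficient and zero-initialized velocity buffers are already created in `make_network`; in `backward_network` you only need to update those buffers and apply them to the weights.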
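
As a reminder of how a velocity buffer is used, here is a minimal sketch of one common momentum convention, `v = momentum * v - lr * grad` followed by `w = w + v`. This convention is an assumption; the instructor version may use a different one. The helper `momentum_step` and the toy objective are hypothetical.

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.1, momentum=0.9):
    """One SGD-with-momentum update (assumed convention); returns the new (w, v)."""
    v = momentum * v - lr * grad   # decay the old velocity, add the new gradient push
    return w + v, v

# Toy objective f(w) = 0.5 * w**2, whose gradient is simply w.
w, v = np.array([1.0]), np.zeros(1)
for _ in range(200):
    w, v = momentum_step(w, v, grad=w)

print(w)  # very close to 0, the minimizer
```

In the network, the same two lines would run once per parameter, with each parameter paired with its own buffer in `net["velocities"]`.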

> **Hint:** This is a plain 3-layer MLP. Start from matrix multiplication + bias addition + ReLU, nothing fancy.
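
Before wiring up three layers, it can help to verify the math for a single layer. The sketch below runs one affine + ReLU layer forward and backward on random data and compares one analytic gradient entry against a finite-difference estimate. All names are hypothetical, and the loss `sum(h)` is chosen only because its upstream gradient is trivially all ones.

```python
import numpy as np

demo_rng = np.random.default_rng(0)
x = demo_rng.normal(size=(5, 3))   # batch of 5 inputs with 3 features each
W = demo_rng.normal(size=(3, 4))
b = np.zeros(4)

# Forward: affine map followed by ReLU.
z = x @ W + b
h = np.maximum(0.0, z)

# Backward for the scalar loss L = sum(h): the upstream gradient is all ones.
grad_h = np.ones_like(h)
grad_z = grad_h * (z > 0)          # ReLU passes gradient only where z > 0
grad_W = x.T @ grad_z              # shape (3, 4), same as W
grad_b = grad_z.sum(axis=0)        # shape (4,), same as b

# Finite-difference check on a single weight entry.
eps = 1e-6
W_shift = W.copy()
W_shift[0, 0] += eps
numeric = (np.maximum(0.0, x @ W_shift + b).sum() - h.sum()) / eps

print(grad_W[0, 0], numeric)       # the two numbers should agree closely
```

The same check works on a full network: nudge one parameter, recompute the loss, and compare against the backprop value. It is a cheap way to catch transposed matrices or a missing ReLU mask.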

In [None]:
def relu(x):
    return np.maximum(0.0, x)

def relu_deriv(x):
    return (x > 0).astype(np.float32)

def make_network(in_dim=3, hidden=128, out_dim=2, rng=rng, scale=0.02, momentum=0.999):
    net = {
        "W1": rng.normal(0.0, scale, size=(in_dim, hidden)).astype(np.float32),
        "b1": np.zeros(hidden, dtype=np.float32),
        "W2": rng.normal(0.0, scale, size=(hidden, hidden)).astype(np.float32),
        "b2": np.zeros(hidden, dtype=np.float32),
        "W3": rng.normal(0.0, scale, size=(hidden, out_dim)).astype(np.float32),
        "b3": np.zeros(out_dim, dtype=np.float32),
        "momentum": momentum,
    }
    net["velocities"] = {
        "W1": np.zeros_like(net["W1"]),
        "b1": np.zeros_like(net["b1"]),
        "W2": np.zeros_like(net["W2"]),
        "b2": np.zeros_like(net["b2"]),
        "W3": np.zeros_like(net["W3"]),
        "b3": np.zeros_like(net["b3"]),
    }
    return net

def forward_network(net, x):
    # TODO: compute the forward pass and return (out, cache)
    raise NotImplementedError

def backward_network(net, cache, grad_out, lr=1e-3):
    # TODO: backprop through the network and apply SGD + momentum
    raise NotImplementedError

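def mse_loss(pred, target):
    """Return the mean squared error and its gradient with respect to pred."""
    diff = pred - target
    # The mean runs over every element, so the gradient divides by diff.size.
    return np.mean(diff**2), (2.0 / diff.size) * diff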

network = make_network()
print("Network layers:", [k for k in network if k not in ("velocities", "momentum")])

## 5. Train to predict noise
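We'll use full-batch updates for clarity: every step uses the whole dataset rather than a mini-batch. The gradient is still stochastic, because timesteps and noise are resampled at each step. Every update: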
1. Uses all samples in `X`.
2. Draws a random timestep for each sample.
3. Generates `x_t` with `q_sample`.
4. Feeds `[x_t, t/T]` into the network.
5. Minimizes MSE between predicted and true noise.
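
Steps 2 and 4 above involve some shape bookkeeping that can be practiced separately. This is a minimal sketch with stand-in data; it assumes timesteps are drawn uniformly from 1 to `T` inclusive, and the `_demo` names are hypothetical.

```python
import numpy as np

demo_rng = np.random.default_rng(0)
T_demo = 100
x_t_demo = demo_rng.normal(size=(5, 2))  # stand-in for already-noised points

# One integer timestep per sample; the upper bound of `integers` is exclusive.
t_demo = demo_rng.integers(1, T_demo + 1, size=x_t_demo.shape[0])

# Append the normalized timestep as a third input column.
net_input = np.concatenate([x_t_demo, (t_demo / T_demo)[:, None]], axis=1)

print(net_input.shape)  # (5, 3)
```

The resulting width of 3 matches the `in_dim=3` default in `make_network`.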

**TODOs inside `train`:**
- Replace the `pass` statement with actual training logic.
- Track an exponential moving average of the loss (controlled by `smoothing`) and append it to `losses` for smoother plotting.

> **Hint:** Most of the pieces are already written in previous sections—reuse them!
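
The exponential moving average itself is a two-line recurrence. The sketch below applies one common form of it to a short made-up sequence; the helper name `ema` and the choice to seed the average with the first value are assumptions, not necessarily what the instructor version does.

```python
import numpy as np

def ema(values, smoothing):
    """Exponential moving average, seeded with the first value (an assumed choice)."""
    running = None
    out = []
    for v in values:
        if running is None:
            running = v
        else:
            running = smoothing * running + (1.0 - smoothing) * v
        out.append(running)
    return np.array(out)

raw = np.array([1.0, 0.0, 1.0, 0.0])
print(ema(raw, smoothing=0.9))  # [1.0, 0.9, 0.91, 0.819]
```

Inside `train`, the `running` variable plays the same role, updated once per step with the freshly computed loss.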

In [None]:
def train(network, X, T, betas, alphas, alpha_bars, steps=8000, lr=5e-2, smoothing=0.999):
    losses = []
    running = None
    for step in range(1, steps+1):
        # TODO: implement the training loop described above
        pass
    return np.array(losses)

losses = train(network, X, T, betas, alphas, alpha_bars)
plt.figure(figsize=(5,4))
plt.plot(losses)
plt.title("Training loss (EMA)")
plt.xlabel("step")
plt.ylabel("MSE")
plt.show()

### Reflection 2
What happens if you reduce the smoothing factor dramatically (e.g., from 0.999 to 0.5)? Why?

In [None]:
reflection_2 = "TODO: discuss smoothing behavior."
print(reflection_2)

## 6. Sampling (reverse diffusion)
With a trained network we can denoise from pure noise back to data.
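
For reference, one standard form of the DDPM reverse step, written in the notation of Section 2, is shown here. It assumes the common variance choice $\sigma_t^2 = \beta_t$; the instructor version may use a different variance.

$$ x_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right) + \sqrt{\beta_t}\, z, \qquad z \sim \mathcal{N}(0, I) $$

Here $\alpha_t = 1 - \beta_t$, $\epsilon_\theta(x_t, t)$ is the network's noise prediction, and $z$ is set to zero at the final step $t = 1$ so that no fresh noise is added to the output.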

**TODOs:**
- Implement `p_sample` based on the DDPM update equation in the markdown above.
- Use `p_sample_loop` to track snapshots at the timesteps in `ts_to_show` so you can visualize the denoising trajectory.

> **Hint:** The forward helper already computed $\sqrt{\bar{\alpha}_t}$ and $\sqrt{1-\bar{\alpha}_t}$. Reuse those patterns.
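
One way to build intuition for reusing those coefficients: if the true noise were known exactly, the forward equation could be inverted to recover $x_0$. The sketch below checks this with toy values and hypothetical names. It is a sanity check on the algebra, not the reverse step itself, since a trained network only estimates the noise.

```python
import numpy as np

demo_rng = np.random.default_rng(0)
alpha_bar_t = 0.504                      # toy cumulative product for one timestep
x0 = demo_rng.normal(size=(6, 2))
eps = demo_rng.normal(size=(6, 2))

# Forward equation: mix signal and noise.
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

# Inverse, valid only because the exact eps is available here.
x0_recovered = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)

print(np.allclose(x0, x0_recovered))     # True
```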

In [None]:
def p_sample(network, x_t, t, betas, alphas, alpha_bars):
    # TODO: implement reverse diffusion step
    raise NotImplementedError

def p_sample_loop(network, n_samples, T, betas, alphas, alpha_bars, ts_to_save=None):
    x_t = rng.normal(0.0, 1.0, size=(n_samples, 2)).astype(np.float32)
    snapshots = {}
    save_set = set(ts_to_save or [])
    for t in range(T, 0, -1):
        if t in save_set:
            snapshots[t] = x_t.copy()
        x_t = p_sample(network, x_t, t, betas, alphas, alpha_bars)
    snapshots[0] = x_t
    return x_t, snapshots

ts_to_show = [1, T//4, T//2, T]
gen, reverse_snaps = p_sample_loop(network, n_samples=1000, T=T, betas=betas, alphas=alphas, alpha_bars=alpha_bars, ts_to_save=ts_to_show)

plt.figure(figsize=(4,4))
plt.scatter(X[:,0], X[:,1], s=5, label="real")
plt.scatter(gen[:,0], gen[:,1], s=5, label="generated")
plt.legend()
plt.title("Real vs Generated (simple DDPM)")
plt.axis("equal")
plt.show()

fig, axes = plt.subplots(1, len(ts_to_show), figsize=(12,3))
for i, tt in enumerate(ts_to_show):
    pts = reverse_snaps.get(tt)
    axes[i].scatter(pts[:,0], pts[:,1], s=5)
    axes[i].set_title(f"t={tt}")
    axes[i].set_aspect('equal', 'box')
plt.suptitle("Reverse diffusion snapshots")
plt.show()

### Reflection 3
Compare the visual storytelling between the forward and reverse plots. What does each snapshot teach you about the diffusion process?

In [None]:
reflection_3 = "TODO: describe insights from the reverse snapshots."
print(reflection_3)

## 7. Experiments & reporting
Use this space to describe at least two experiments you ran (schedule tweaks, network size changes, etc.) and what you learned from each. Screenshots or plots are encouraged.

- Experiment A:
- Experiment B:

Wrap up with a concrete lesson learned about diffusion models.

In [None]:
lab_report = "TODO: summarize experiments and lessons learned."
print(lab_report)