# Homework: Diffusion Models — Exploring the Cosine Schedule

**Based on:** DDPM Tutorial
**References:** Ho et al. (2020); Nichol & Dhariwal (2021)  
**Estimated time:** 20–30 minutes  

---

## Background

In the tutorial you trained a DDPM on the **S-Curve** dataset using a **linear** noise schedule.

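Nichol & Dhariwal (2021) showed that a **cosine schedule** destroys signal more gradually than the linear one: under the linear schedule the final timesteps are already almost pure noise, whereas the cosine schedule spreads the loss of signal more evenly across timesteps. They found this can improve sample quality, especially for low-resolution images (64×64 and 32×32).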

## Your Task

**Train the same DDPM architecture on the `make_moons` dataset using a cosine noise schedule**, then compare results to the linear schedule.

Specifically:

1. **Generate the Moons dataset** using `sklearn.datasets.make_moons` (2D, ~10,000 points, normalized)
2. **Implement the cosine schedule** for $\bar{\alpha}_t$ as defined in Nichol & Dhariwal (2021)
3. **Train the same `SimpleNoisePredictor`** model on this data with the cosine schedule  
4. **Sample and visualize** generated points vs real data
5. **Compare:** Also train with the linear schedule and show both results side by side (the bonus part of Task 4), then answer the short question in Task 5

## Hint Before You Start

**Most of the code you need is already in the tutorial!** This homework is designed as a small tweak — not a rewrite. Specifically:

- The model classes (`SinusoidalEmbedding`, `SimpleNoisePredictor`), `q_sample`, and `sample_ddpm` can all be **copied directly** from the tutorial with no changes.
- The training loop is also the same; you only swap the dataset and the schedule.
- The only truly new code you need to write is the **cosine schedule** (Task 2), including the step that derives `betas` from it.

So: open the tutorial side by side, reuse what you need, and focus your energy on understanding the cosine schedule and comparing results.

---
### Task 1: Load and plot the Moons dataset (2 pts)

Generate 10,000 points from `make_moons` with `noise=0.05`. Normalize the data to zero mean and unit standard deviation, then plot it.
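
One way to prepare the data is sketched below. It assumes scikit-learn and NumPy are installed. The fixed `random_state`, the per-coordinate standardization, and the `float32` cast are illustrative choices rather than requirements, so match whatever normalization the tutorial applied to the S-Curve data. Plotting is left to you.

```python
import numpy as np
from sklearn.datasets import make_moons

# random_state is an assumption, fixed here only for reproducibility.
X, _ = make_moons(n_samples=10_000, noise=0.05, random_state=0)

# Standardize each coordinate to zero mean and unit standard deviation.
X = (X - X.mean(axis=0)) / X.std(axis=0)
X = X.astype(np.float32)  # float32 is a common dtype for neural-network training

print(X.shape)  # (10000, 2)
print(X.mean(axis=0), X.std(axis=0))
```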

In [None]:
# YOUR CODE HERE


---
### Task 2: Implement the cosine noise schedule (3 pts)

Implement the cosine schedule from Nichol & Dhariwal (2021):

$$\bar{\alpha}_t = \frac{f(t)}{f(0)}, \quad f(t) = \cos\left(\frac{t/T + s}{1+s} \cdot \frac{\pi}{2}\right)^2, \quad s = 0.008$$

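Then derive `betas` from `alpha_bar`. Because $\bar{\alpha}_t = \prod_{i=1}^{t} (1 - \beta_i)$, consecutive values satisfy $\bar{\alpha}_t / \bar{\alpha}_{t-1} = 1 - \beta_t$, so $\beta_t = 1 - \bar{\alpha}_t / \bar{\alpha}_{t-1}$ for $t = 1, \dots, T$. Clip each $\beta_t$ to at most 0.999 to avoid a singularity near $t = T$, where $\bar{\alpha}_T$ is almost zero.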

Plot both the linear and cosine $\bar{\alpha}_t$ on the same axes.
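
The formulas above can be sketched in NumPy as follows. The function names are hypothetical, and the linear range `beta_start=1e-4`, `beta_end=0.02` is the default from Ho et al. (2020) for `T=1000`; it is only an assumption here, so substitute the values the tutorial actually used. The plot itself is left out.

```python
import numpy as np

def cosine_alpha_bar(T, s=0.008):
    """alpha_bar_t for t = 0, ..., T under the cosine schedule."""
    t = np.arange(T + 1, dtype=np.float64)
    f = np.cos((t / T + s) / (1 + s) * np.pi / 2) ** 2
    return f / f[0]  # alpha_bar_0 = 1 by construction

def betas_from_alpha_bar(alpha_bar, max_beta=0.999):
    """beta_t = 1 - alpha_bar_t / alpha_bar_{t-1} for t = 1, ..., T, clipped."""
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    return np.clip(betas, 0.0, max_beta)

def linear_alpha_bar(T, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t for t = 1, ..., T under a linear beta schedule (assumed range)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

T = 300
alpha_bar_cos = cosine_alpha_bar(T)              # shape (T + 1,), includes t = 0
betas_cos = betas_from_alpha_bar(alpha_bar_cos)  # shape (T,)
alpha_bar_lin = linear_alpha_bar(T)              # shape (T,)

# Rebuilding alpha_bar from the betas recovers the schedule everywhere
# except at t = T, where clipping changes the value.
rebuilt = np.cumprod(1.0 - betas_cos)
print(np.allclose(rebuilt[:-1], alpha_bar_cos[1:-1]))  # True
```

Note the indexing: `alpha_bar_cos` starts at $t = 0$, while `betas_cos` and `alpha_bar_lin` start at $t = 1$, so align them before plotting on shared axes.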

In [None]:
# YOUR CODE HERE


---
### Task 3: Train the model with the cosine schedule (3 pts)

Use `T=300`, `batch_size=256`, `n_epochs=100`, same model architecture as the tutorial. Plot the training loss.
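
How the schedule enters training can be sketched without the model. The snippet below uses the standard closed-form forward process $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$. Here `noise_batch` is a hypothetical NumPy stand-in for the tutorial's `q_sample`, whose actual signature may differ, and the random `x0` is only a placeholder for a batch of moons points.

```python
import numpy as np

def noise_batch(x0, t_idx, alpha_bar, rng):
    """Closed-form forward process: x_t = sqrt(ab_t) * x0 + sqrt(1 - ab_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    ab = alpha_bar[t_idx][:, None]  # (batch, 1), broadcasts over the 2 coordinates
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps
    return x_t, eps

rng = np.random.default_rng(0)
T, batch_size = 300, 256

# Cosine schedule for t = 1, ..., T (index 0 holds t = 1), rebuilt from clipped betas.
t = np.arange(T + 1)
f = np.cos((t / T + 0.008) / 1.008 * np.pi / 2) ** 2
betas = np.clip(1.0 - f[1:] / f[:-1], 0.0, 0.999)
alpha_bar = np.cumprod(1.0 - betas)

x0 = rng.standard_normal((batch_size, 2))    # placeholder for a batch of data points
t_idx = rng.integers(0, T, size=batch_size)  # one random timestep index per example
x_t, eps = noise_batch(x0, t_idx, alpha_bar, rng)

# A noise-prediction model would be trained to output eps from (x_t, t_idx);
# the training loss is the mean squared error between its output and eps.
print(x_t.shape, eps.shape)  # (256, 2) (256, 2)
```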

In [None]:
# YOUR CODE HERE


---
### Task 4: Sample and compare (2 pts)

Generate 5,000 samples from your trained model. Create a side-by-side plot: real moons (left) vs generated samples (right).

**Bonus:** Also train a model with the linear schedule and show a 3-panel comparison: Real | Linear | Cosine.
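
A small plotting helper for the two-panel and three-panel figures might look like the sketch below. `plot_panels` is a hypothetical name, it assumes Matplotlib is installed, and the random arrays are placeholders to be replaced with the real data and your generated samples.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_panels(panels, point_size=2):
    """Scatter-plot each (title -> (n, 2) array) entry in its own panel with shared axes."""
    fig, axes = plt.subplots(1, len(panels), figsize=(4 * len(panels), 4),
                             sharex=True, sharey=True, squeeze=False)
    for ax, (title, pts) in zip(axes[0], panels.items()):
        ax.scatter(pts[:, 0], pts[:, 1], s=point_size, alpha=0.5)
        ax.set_title(title)
    fig.tight_layout()
    return fig

# Placeholder arrays; replace them with the real data and your generated samples.
rng = np.random.default_rng(0)
fig = plot_panels({
    "Real": rng.standard_normal((5000, 2)),
    "Linear": rng.standard_normal((5000, 2)),
    "Cosine": rng.standard_normal((5000, 2)),
})
```

Sharing the axes across panels keeps the scales identical, which makes differences in spread or stray points easier to spot.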

In [None]:
# YOUR CODE HERE


---
### Task 5: Short answer (2 pts)

Answer in 2–3 sentences:
- Do you notice any difference in sample quality between the linear and cosine schedules on this dataset?
- Why might the cosine schedule help (or not help much) on this simple 2D data?

*Your answer here:*

