The smallest repo that takes you from "what is a noise schedule" to a working classifier-free-guided sampler · without skipping the math.
Diffusion is the most consequential generative paradigm of the decade, yet most engineers know it only as a black box behind a `pipe(prompt)` call. This repo derives it from scratch · forward process, score / ε-prediction parameterization, classifier-free guidance, DDIM sampler · and trains a small UNet on CIFAR-10 and CelebA, with sample grids checked in.
Status: scaffolding the noise schedule and forward process. First milestone: produce non-garbage CIFAR-10 samples.
Forward process · gradually destroy the signal:
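In standard DDPM notation (assumed here: $\beta_t$ is the per-step noise variance, $\alpha_t = 1 - \beta_t$, and $\bar\alpha_t = \prod_{s \le t} \alpha_s$), one step of the forward process and its closed form from $x_0$ can be written as:

```math
q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big),
\qquad
x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\quad \epsilon \sim \mathcal{N}(0, I)
```

As $t$ grows, $\bar\alpha_t$ shrinks toward 0, so $x_t$ approaches pure Gaussian noise.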
Reverse process · learn ε with a UNet, denoise step by step:
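A common way to write this, assuming the ε-prediction objective of Ho et al. and the deterministic ($\eta = 0$) DDIM update, with $\epsilon_\theta$ the UNet, $c$ the class label, and $\bar\alpha_t$ the cumulative product of $1 - \beta_t$:

```math
\mathcal{L}(\theta) = \mathbb{E}_{x_0,\,c,\,t,\,\epsilon}\Big[\big\lVert \epsilon - \epsilon_\theta(x_t, t, c) \big\rVert^2\Big],
\qquad
\hat{x}_0 = \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t, t, c)}{\sqrt{\bar\alpha_t}},
\qquad
x_{t-1} = \sqrt{\bar\alpha_{t-1}}\,\hat{x}_0 + \sqrt{1-\bar\alpha_{t-1}}\,\epsilon_\theta(x_t, t, c)
```

The ancestral DDPM sampler follows the same idea but adds fresh Gaussian noise at every step.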
Classifier-free guidance · at sample time, blend conditional and unconditional ε:
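One common convention, assumed here, writes the blend with $\varnothing$ denoting the dropped (null) label and $w$ the guidance scale:

```math
\tilde\epsilon_\theta(x_t, t, c) = \epsilon_\theta(x_t, t, \varnothing) + w\,\big(\epsilon_\theta(x_t, t, c) - \epsilon_\theta(x_t, t, \varnothing)\big)
```

Under this convention $w = 0$ is unconditional sampling, $w = 1$ is plain conditional sampling, and $w > 1$ extrapolates past the conditional prediction. Ho & Salimans write the same blend as $(1 + w)\,\epsilon_\theta(x_t, t, c) - w\,\epsilon_\theta(x_t, t, \varnothing)$, which shifts $w$ by one; which of the two the `--w` flag follows is an assumption.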
That's the whole game. The repo implements each line above with a pointer back to this section.
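A minimal, framework-free sketch of the forward noising and the guidance blend, using flat Python lists in place of image tensors. The function names (`q_sample`, `predict_x0`, `cfg_blend`) and the guidance convention `eps_u + w * (eps_c - eps_u)` are assumptions for illustration, not the repo's API.

```python
import math
import random


def q_sample(x0, alpha_bar_t, eps):
    """Closed-form forward noising: x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps."""
    a = math.sqrt(alpha_bar_t)
    s = math.sqrt(1.0 - alpha_bar_t)
    return [a * x + s * e for x, e in zip(x0, eps)]


def predict_x0(x_t, alpha_bar_t, eps_hat):
    """Invert the forward formula given an estimate of the noise."""
    a = math.sqrt(alpha_bar_t)
    s = math.sqrt(1.0 - alpha_bar_t)
    return [(x - s * e) / a for x, e in zip(x_t, eps_hat)]


def cfg_blend(eps_uncond, eps_cond, w):
    """Guided noise estimate (assumed convention): eps_u + w * (eps_c - eps_u)."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


rng = random.Random(0)
x0 = [0.5, -0.25, 1.0]                    # toy "image" with three pixels
eps = [rng.gauss(0.0, 1.0) for _ in x0]   # the noise a UNet would be trained to predict
x_t = q_sample(x0, alpha_bar_t=0.3, eps=eps)

# With the true noise, the clean signal is recovered up to float error.
x0_rec = predict_x0(x_t, 0.3, eps)
```

The round trip through `predict_x0` is why ε-prediction is enough to denoise: a good noise estimate pins down an estimate of `x0` at every timestep.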
```mermaid
flowchart LR
    X0[x_0 image] -->|forward q| XT[x_t noisy]
    T[timestep t] --> N
    XT --> N[ε-pred UNet]
    C[class label c<br/>10% drop] --> N
    N --> EP[predicted ε]
    EP --> L[MSE loss]
    subgraph Sampling
        direction LR
        XS["x_T ~ N(0, I)"] --> NS[ε-pred UNet]
        NS --> CFG[CFG blend]
        CFG --> RS[DDIM step]
        RS -.-> NS
        RS --> X0H[x_0]
    end
```
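The sampling loop in the diagram can be sketched without any framework. Everything here is a hypothetical stand-in: `oracle_eps` replaces the trained UNet with a closed-form noise predictor that knows a per-class target, the linear β range (1e-4 to 0.02 over 1000 steps) follows Ho et al., and the guidance convention `eps_u + w * (eps_c - eps_u)` is assumed.

```python
import math
import random


def ddim_step(x_t, eps_hat, ab_t, ab_prev):
    """One deterministic DDIM update (eta = 0) from alpha_bar_t to alpha_bar_prev."""
    x0_hat = [(x - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
              for x, e in zip(x_t, eps_hat)]
    return [math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * e
            for x0, e in zip(x0_hat, eps_hat)]


def sample(eps_model, alpha_bars, steps, dim, label, w, seed=0):
    """Run `steps` (>= 2) DDIM updates over evenly spaced training timesteps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]            # x_T ~ N(0, I)
    T = len(alpha_bars)
    ts = [round(i * (T - 1) / (steps - 1)) for i in range(steps)][::-1]
    for i, t in enumerate(ts):
        eps_u = eps_model(x, t, None)                          # unconditional branch
        eps_c = eps_model(x, t, label)                         # conditional branch
        eps_hat = [u + w * (c - u) for u, c in zip(eps_u, eps_c)]   # CFG blend
        # After the last timestep, jump to alpha_bar = 1, i.e. the clean sample.
        ab_prev = alpha_bars[ts[i + 1]] if i + 1 < len(ts) else 1.0
        x = ddim_step(x, eps_hat, alpha_bars[t], ab_prev)
    return x


# Linear schedule: cumulative product of (1 - beta_t).
T = 1000
alpha_bars, prod = [], 1.0
for t in range(T):
    prod *= 1.0 - (1e-4 + (0.02 - 1e-4) * t / (T - 1))
    alpha_bars.append(prod)

# Hypothetical stand-in for the UNet: the exact noise if the clean image were TARGETS[label].
TARGETS = {None: [0.0, 0.0], 3: [1.0, -1.0]}


def oracle_eps(x_t, t, label):
    ab = alpha_bars[t]
    return [(x - math.sqrt(ab) * m) / math.sqrt(1.0 - ab)
            for x, m in zip(x_t, TARGETS[label])]


plain = sample(oracle_eps, alpha_bars, steps=50, dim=2, label=3, w=1.0)
guided = sample(oracle_eps, alpha_bars, steps=50, dim=2, label=3, w=3.0)
```

With this toy oracle, `w=1.0` lands on the class target and `w=3.0` lands on three times the target, which makes the extrapolating nature of guidance visible: large scales push samples away from the unconditional prediction, past the conditional one.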
| Component | Choice | Why |
|---|---|---|
| Noise schedule | linear + cosine | reproduce DDPM (linear) and Nichol & Dhariwal (cosine) |
| Backbone | UNet (GroupNorm + SiLU + sinusoidal time embed) | canonical |
| Conditioning | class-embedding + drop-prob 0.1 (CFG) | enables guidance scale at sample time |
| Sampler | ancestral DDPM + deterministic DDIM | quality + speed dial |
| Datasets | CIFAR-10, CelebA-HQ 64x64 | small enough to iterate |
| Eval | FID via clean-fid | comparable to literature |
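The two noise schedules in the table can be sketched in a few lines. The defaults are the values reported in the papers (β from 1e-4 to 0.02 in Ho et al.; offset `s = 0.008` and β clipped at 0.999 in Nichol & Dhariwal); whether the repo uses the same values, and the function names, are assumptions.

```python
import math


def linear_alpha_bars(T, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for linearly spaced betas."""
    out, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def cosine_alpha_bars(T, s=0.008):
    """alpha_bar_t = f(t) / f(0) with f(t) = cos^2(((t / T + s) / (1 + s)) * pi / 2)."""
    def f(t):
        return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2
    return [f(t + 1) / f(0) for t in range(T)]


def betas_from_alpha_bars(alpha_bars, max_beta=0.999):
    """Recover per-step betas, clipped to avoid a singular final step."""
    betas, prev = [], 1.0
    for ab in alpha_bars:
        betas.append(min(1.0 - ab / prev, max_beta))
        prev = ab
    return betas


lin = linear_alpha_bars(1000)
cos = cosine_alpha_bars(1000)
cos_betas = betas_from_alpha_bars(cos)
```

At the midpoint of a 1000-step schedule the cosine variant keeps far more signal than the linear one (`cos[499]` is much larger than `lin[499]`), which is the motivation Nichol & Dhariwal give for it on low-resolution images.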
```bash
# train DDPM on CIFAR-10 (single GPU, roughly a few hours)
uv run train.py --config configs/cifar10.yaml
# sample an 8x8 grid with CFG scale 3.0
uv run sample.py --ckpt out/cifar10/ema.pt --grid 8 --w 3.0
# DDIM 50-step sampling
uv run sample.py --ckpt out/cifar10/ema.pt --sampler ddim --steps 50
# eval FID against 50k real samples
uv run eval_fid.py --ckpt out/cifar10/ema.pt
```

| Dataset | Model | Steps | Sampler | FID ↓ | Grid |
|---|---|---|---|---|---|
| CIFAR-10 | small UNet | 1000 | DDPM | TBD | TBD |
| CIFAR-10 | small UNet | 50 | DDIM | TBD | TBD |
| CelebA 64 | small UNet | 1000 | DDPM | TBD | TBD |
- Forward process + cosine/linear schedule + sanity-check forward animation
- UNet backbone (GroupNorm, sinusoidal time embed, attention at low res)
- DDPM training loop + EMA weights
- DDPM ancestral sampler
- DDIM sampler (deterministic, 50-step)
- Classifier-free guidance + ablation over guidance scale w
- FID eval harness
- CIFAR-10 sample grid + checkpoint on HF
- CelebA-HQ 64x64 sample grid + checkpoint on HF
- Companion blog post deriving DDPM from the forward process
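Two of the training-loop items above, EMA weights and the label dropout that makes classifier-free guidance possible, are small enough to sketch directly. This is a framework-free illustration under assumptions: parameters are a plain dict of floats rather than tensors, `None` stands for the null label, and the names and the `decay` value are hypothetical; only the 10% drop probability comes from the design table.

```python
import random


def ema_update(ema_params, model_params, decay=0.999):
    """In-place exponential moving average: ema <- decay * ema + (1 - decay) * model."""
    for name, value in model_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


def maybe_drop_label(label, rng, drop_prob=0.1):
    """Replace the class label with the null label with probability drop_prob."""
    return None if rng.random() < drop_prob else label


# EMA: after n updates toward a constant value of 1.0, the average is 1 - decay**n.
ema = {"w": 0.0}
for _ in range(10):
    ema_update(ema, {"w": 1.0}, decay=0.9)

# Label dropout: roughly 10% of training examples are seen without their label.
rng = random.Random(0)
labels = [maybe_drop_label(7, rng) for _ in range(10_000)]
drop_fraction = labels.count(None) / len(labels)
```

Training on the dropped-label examples is what lets a single network provide both the conditional and unconditional noise predictions at sample time.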
- Lilian Weng · What are diffusion models? · single best derivation
- lucidrains/denoising-diffusion-pytorch · the reference implementation
- Ho et al. · DDPM (2020)
- Song et al. · DDIM (2021)
- Ho & Salimans · Classifier-free guidance (2022)
- Karras et al. · Elucidating the Design Space of Diffusion-Based Generative Models (EDM, 2022)