Umarfarook1/Tiny-diffusion

The smallest repo that takes you from "what is a noise schedule" to a working classifier-free-guided sampler · without skipping the math.


Why this repo exists

Diffusion is the most consequential generative paradigm of the decade, yet most engineers know it only as a black box behind a pipe(prompt) call. This repo derives it from scratch (forward process, score / ε-prediction parameterization, classifier-free guidance, DDIM sampler) and trains a small UNet on CIFAR-10 and CelebA, with sample grids checked in.

Status: scaffolding the noise schedule and forward process. First milestone: produce non-garbage CIFAR-10 samples.

The math (in one screen)

Forward process · gradually destroy the signal:

$$ q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right) \qquad\Rightarrow\qquad x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\, \varepsilon $$
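A minimal NumPy sketch of the closed-form forward process, not the repo's code. It assumes $\alpha_t = 1-\beta_t$, $\bar\alpha_t = \prod_{s \le t} \alpha_s$, and $\varepsilon \sim \mathcal{N}(0, I)$; the function names `linear_beta_schedule` and `q_sample` are illustrative, and the schedule endpoints are the defaults from the DDPM paper.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta_t schedule (endpoints assumed from the DDPM paper)."""
    return np.linspace(beta_start, beta_end, T, dtype=np.float64)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form for a batch.

    x0: (B, ...) array; t: (B,) integer timesteps in [0, T).
    Returns (x_t, eps) so eps can later serve as the regression target.
    """
    eps = rng.standard_normal(x0.shape)
    # Reshape the per-sample coefficient so it broadcasts over image dims.
    a = alpha_bar[t].reshape(-1, *([1] * (x0.ndim - 1)))
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return x_t, eps

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # running product of (1 - beta_s)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(4, 3, 32, 32))  # stand-in for CIFAR-10 images
t = np.array([0, 250, 500, 999])
x_t, eps = q_sample(x0, t, alpha_bar, rng)
```

Because $\bar\alpha_t$ shrinks toward zero as $t$ grows, the sample at `t = 999` is almost pure noise while the one at `t = 0` is almost the original image.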

Reverse process · learn ε with a UNet, denoise step by step:

$$ \mathcal{L}(\theta) = \mathbb{E}_{x_0, \varepsilon, t} \left[\, \lVert \varepsilon - \varepsilon_\theta(x_t, t) \rVert^2 \,\right] $$
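The loss above can be estimated with one random timestep per sample. The sketch below is an assumption-laden illustration rather than the repo's training loop: `eps_prediction_loss` and `zero_model` are hypothetical names, and `zero_model` is only a stand-in for the UNet.

```python
import numpy as np

def eps_prediction_loss(eps_model, x0, alpha_bar, rng):
    """One Monte Carlo estimate of E || eps - eps_theta(x_t, t) ||^2.

    np.mean averages per element, so the value equals the squared norm
    up to a constant factor (1 / number of pixels).
    """
    B = x0.shape[0]
    t = rng.integers(0, len(alpha_bar), size=B)     # t uniform over timesteps
    eps = rng.standard_normal(x0.shape)             # eps ~ N(0, I)
    a = alpha_bar[t].reshape(-1, *([1] * (x0.ndim - 1)))
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps  # closed-form forward process
    return np.mean((eps - eps_model(x_t, t)) ** 2)

# Hypothetical stand-in for the UNet: always predicts zero noise.
def zero_model(x_t, t):
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
x0 = rng.uniform(-1.0, 1.0, size=(8, 3, 32, 32))
loss = eps_prediction_loss(zero_model, x0, alpha_bar, rng)
```

A model that always predicts zero scores close to 1 here, since that is the per-element variance of ε. That gives a useful baseline: a trained network should land well below it.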

Classifier-free guidance · at sample time, blend conditional and unconditional ε:

$$ \tilde\varepsilon_\theta(x_t, c) = (1+w)\,\varepsilon_\theta(x_t, c) - w\,\varepsilon_\theta(x_t, \emptyset) $$
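A sketch of both halves of classifier-free guidance, under stated assumptions: the null condition $\emptyset$ is represented by a reserved label index (`null_label = 10` for a 10-class dataset is an assumption, not something the repo specifies), and `drop_labels`, `cfg_eps`, and `toy_model` are hypothetical names. Label dropout at training time is what lets a single network produce the unconditional prediction.

```python
import numpy as np

def drop_labels(c, null_label, p_drop, rng):
    """Training time: replace each label with null_label with probability p_drop."""
    mask = rng.random(c.shape) < p_drop
    return np.where(mask, null_label, c)

def cfg_eps(eps_model, x_t, t, c, w, null_label):
    """Sample time: (1 + w) * eps(x_t, c) - w * eps(x_t, null)."""
    eps_cond = eps_model(x_t, t, c)
    eps_uncond = eps_model(x_t, t, np.full_like(c, null_label))
    return (1.0 + w) * eps_cond - w * eps_uncond

# Hypothetical stand-in for the conditional UNet: output equals the label value.
def toy_model(x_t, t, c):
    return np.ones_like(x_t) * c.reshape(-1, *([1] * (x_t.ndim - 1)))

rng = np.random.default_rng(0)
x_t = np.zeros((2, 4))
t = np.array([10, 10])
c = np.array([2, 5])

train_labels = drop_labels(c, null_label=10, p_drop=0.1, rng=rng)
guided = cfg_eps(toy_model, x_t, t, c, w=3.0, null_label=10)
```

With this convention `w = 0` recovers the plain conditional prediction and `w = -1` the unconditional one; larger `w` pushes the sample further along the direction that separates the two.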

That's the whole game. The repo implements each line above with a pointer back to this section.

Pipeline

```mermaid
flowchart LR
    X0[x_0 image] -->|forward q| XT[x_t noisy]
    T[timestep t] --> N
    XT --> N[ε-pred UNet]
    C[class label c<br/>10% drop] --> N
    N --> EP[predicted ε]
    EP --> L[MSE loss]

    subgraph Sampling
        direction LR
        XS["x_T ~ N(0, I)"] --> NS[ε-pred UNet]
        NS --> CFG[CFG blend]
        CFG --> RS[DDIM step]
        RS -.-> NS
        RS --> X0H[x_0]
    end
```
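The sampling loop in the diagram can be sketched as a deterministic DDIM update (η = 0) applied over a strided subset of timesteps. This is an illustration under assumptions, not the repo's sampler: `ddim_step`, `ddim_sample`, and `oracle_eps` are hypothetical names, and in practice `eps_fn` would be the CFG-blended UNet output. The oracle below cheats by knowing the target image, which makes it a convenient correctness check.

```python
import numpy as np

def ddim_step(x_t, eps, a_t, a_prev):
    """Deterministic DDIM update (eta = 0) from alpha_bar a_t to a_prev."""
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps

def ddim_sample(eps_fn, shape, alpha_bar, steps, rng):
    """Run `steps` DDIM updates over an evenly strided subset of timesteps."""
    T = len(alpha_bar)
    ts = np.linspace(T - 1, 0, steps).round().astype(int)  # T-1 down to 0
    x = rng.standard_normal(shape)                         # x_T ~ N(0, I)
    for i, t in enumerate(ts):
        a_t = alpha_bar[t]
        # After the last timestep, step to alpha_bar = 1, i.e. a clean image.
        a_prev = alpha_bar[ts[i + 1]] if i + 1 < steps else 1.0
        x = ddim_step(x, eps_fn(x, t), a_t, a_prev)
    return x

rng = np.random.default_rng(0)
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
target = rng.uniform(-1.0, 1.0, size=(2, 3, 8, 8))

# Hypothetical oracle: the exact eps that maps `target` to x_t at step t.
def oracle_eps(x_t, t):
    a = alpha_bar[t]
    return (x_t - np.sqrt(a) * target) / np.sqrt(1.0 - a)

sample = ddim_sample(oracle_eps, target.shape, alpha_bar, steps=50, rng=rng)
```

With a perfect ε predictor every step's `x0_pred` equals the target, so the loop returns it regardless of how many steps are used. With a learned network, the step count trades quality against speed.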

Stack

| Component | Choice | Why |
| --- | --- | --- |
| Noise schedule | linear + cosine | reproduce DDPM and Nichol & Dhariwal |
| Backbone | UNet (GroupNorm + SiLU + sinusoidal time embed) | canonical |
| Conditioning | class-embedding + drop-prob 0.1 (CFG) | enables guidance scale at sample time |
| Sampler | ancestral DDPM + deterministic DDIM | quality + speed dial |
| Datasets | CIFAR-10, CelebA-HQ 64x64 | small enough to iterate |
| Eval | FID via clean-fid | comparable to literature |
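For the cosine option in the table, here is a sketch of the schedule as defined by Nichol & Dhariwal: $\bar\alpha_t = f(t)/f(0)$ with $f(t) = \cos^2\!\left(\frac{t/T + s}{1 + s} \cdot \frac{\pi}{2}\right)$. The offset `s = 0.008` and the clip at `0.999` follow that paper; the function names are illustrative, not the repo's.

```python
import numpy as np

def cosine_alpha_bar(T=1000, s=0.008):
    """alpha_bar_t = f(t) / f(0), f(t) = cos^2(((t/T + s) / (1 + s)) * pi / 2)."""
    steps = np.arange(T + 1, dtype=np.float64)
    f = np.cos(((steps / T + s) / (1.0 + s)) * np.pi / 2.0) ** 2
    return f / f[0]  # length T + 1, starting at alpha_bar_0 = 1

def cosine_beta_schedule(T=1000, s=0.008, max_beta=0.999):
    """beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}, clipped near t = T."""
    ab = cosine_alpha_bar(T, s)
    betas = 1.0 - ab[1:] / ab[:-1]
    return np.clip(betas, 0.0, max_beta)

betas = cosine_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # drop-in replacement for the linear schedule
```

Compared with the linear schedule, this one destroys information more slowly in the early timesteps, which is the motivation given for it on small images.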

Quickstart (coming soon)

```bash
# train DDPM on CIFAR-10 (single GPU, ~few hours)
uv run train.py --config configs/cifar10.yaml

# sample an 8x8 grid with CFG scale 3.0
uv run sample.py --ckpt out/cifar10/ema.pt --grid 8 --w 3.0

# DDIM 50-step sampling
uv run sample.py --ckpt out/cifar10/ema.pt --sampler ddim --steps 50

# eval FID against 50k real samples
uv run eval_fid.py --ckpt out/cifar10/ema.pt
```

Sample gallery (populated as runs complete)

| Dataset | Model | Steps | Sampler | FID ↓ | Grid |
| --- | --- | --- | --- | --- | --- |
| CIFAR-10 | small UNet | 1000 | DDPM | TBD | TBD |
| CIFAR-10 | small UNet | 50 | DDIM | TBD | TBD |
| CelebA 64 | small UNet | 1000 | DDPM | TBD | TBD |

Roadmap

  • Forward process + cosine/linear schedule + sanity-check forward animation
  • UNet backbone (GroupNorm, sinusoidal time embed, attention at low res)
  • DDPM training loop + EMA weights
  • DDPM ancestral sampler
  • DDIM sampler (deterministic, 50-step)
  • Classifier-free guidance + ablation over guidance scale w
  • FID eval harness
  • CIFAR-10 sample grid + checkpoint on HF
  • CelebA-HQ 64x64 sample grid + checkpoint on HF
  • Companion blog post deriving DDPM from the forward process

Inspiration & required reading

