# Diffusion Models



Three techniques, DDIM sampling, distillation, and classifier-free guidance, are the main reasons diffusion models have become practical enough for production use.

## 🔹 1. DDIM (Denoising Diffusion Implicit Models)

* **Problem it solves:** Standard diffusion requires hundreds or thousands of denoising steps → very slow generation.
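* **Key idea:** Instead of simulating the full stochastic reverse process, DDIM reformulates reverse diffusion as a **deterministic mapping** that can run on a short subsequence of the training timesteps. It reuses the already-trained model, so no retraining is needed, and quality is largely preserved.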
* **How it works:**

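  * In regular (DDPM) sampling, every reverse step injects fresh Gaussian noise.
  * DDIM uses a non-Markovian formulation that allows you to **“jump” across steps deterministically**: each update estimates the clean image from the current noise prediction, then re-noises that estimate to the next, possibly much earlier, timestep.
  * This means you can sample with **20–50 steps instead of 1000**.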
* **Impact:** Much faster sampling, often with comparable image quality. DDIM and related fast samplers are commonly used in real-world deployments of Stable Diffusion.
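
The deterministic update described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production sampler: it assumes a linear beta schedule, the fully deterministic setting (often written as eta = 0), and a hypothetical `eps_model(x, t)` callable standing in for the trained noise-prediction network. All function names here are made up for the example.

```python
import numpy as np

def make_alpha_bar(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update: no fresh noise is injected."""
    # Estimate the clean sample implied by the current noise prediction.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Re-noise that estimate to the target timestep, reusing the same eps_pred.
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_pred

def ddim_sample(eps_model, shape, num_steps=50, num_train_steps=1000, seed=0):
    """Run DDIM on an evenly spaced subsequence of the training timesteps."""
    alpha_bar = make_alpha_bar(num_train_steps)
    # e.g. 50 timesteps chosen out of 1000, visited from noisy to clean.
    timesteps = np.linspace(num_train_steps - 1, 0, num_steps).round().astype(int)
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # the only randomness: the starting noise
    for i, t in enumerate(timesteps):
        a_t = alpha_bar[t]
        # After the last timestep we target the clean image (alpha_bar = 1).
        a_prev = alpha_bar[timesteps[i + 1]] if i + 1 < num_steps else 1.0
        x = ddim_step(x, eps_model(x, t), a_t, a_prev)
    return x

# Toy stand-in for a trained network, only to show the call pattern.
toy_eps_model = lambda x, t: 0.1 * x
sample = ddim_sample(toy_eps_model, shape=(2, 3), num_steps=20)
```

Because the loop never draws new noise, the same seed always maps to the same output, which is what makes the sampler a deterministic mapping from starting noise to image.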

---

## 🔹 2. Distillation

* **Problem it solves:** Even with DDIM, inference is still slower than GANs (multiple steps needed).
* **Key idea:** Train a **faster “student” model** to mimic the outputs of the large, multi-step diffusion model. The student is often the same size as the teacher; the speedup comes mainly from needing far fewer steps.
* **Types of distillation in diffusion:**

  * **Progressive distillation:** Train a model that can do in *N/2* steps what the original did in *N*. Repeat until very few steps remain.
  * **One-step distillation:** Train a model to approximate the final output of a full diffusion run in a single forward pass.
* **Impact:** Sampling speed can **approach that of GANs** while keeping much of diffusion’s stability and fidelity, though very low step counts usually cost some quality. Still an active research area.
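
To make the step-halving idea concrete, here is a rough sketch of progressive distillation. It is a simplification under stated assumptions: the student is trained so that one of its deterministic (DDIM-style) steps lands where two teacher steps would, using a plain mean-squared error. Published methods use different parameterizations and loss weightings. The names `halving_schedule`, `teacher_two_step_target`, and `distillation_loss` are hypothetical, and `teacher_eps` / `student_eps` stand in for noise-prediction networks.

```python
import numpy as np

def halving_schedule(teacher_steps, target_steps=4):
    """Sampler step counts after each distillation round (each round halves)."""
    steps = [teacher_steps]
    while steps[-1] > target_steps and steps[-1] % 2 == 0:
        steps.append(steps[-1] // 2)
    return steps

def ddim_step(x_t, eps_pred, a_t, a_prev):
    """Deterministic update from noise level a_t to a_prev (cumulative alphas)."""
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps_pred) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps_pred

def teacher_two_step_target(x_t, teacher_eps, t, t_mid, t_prev, alpha_bar):
    """Where two consecutive teacher steps (t -> t_mid -> t_prev) end up."""
    x_mid = ddim_step(x_t, teacher_eps(x_t, t), alpha_bar[t], alpha_bar[t_mid])
    return ddim_step(x_mid, teacher_eps(x_mid, t_mid), alpha_bar[t_mid], alpha_bar[t_prev])

def distillation_loss(student_eps, teacher_eps, x_t, t, t_mid, t_prev, alpha_bar):
    """MSE between one student step (t -> t_prev) and the two-step teacher target."""
    target = teacher_two_step_target(x_t, teacher_eps, t, t_mid, t_prev, alpha_bar)
    student_out = ddim_step(x_t, student_eps(x_t, t), alpha_bar[t], alpha_bar[t_prev])
    return float(np.mean((student_out - target) ** 2))

print(halving_schedule(1024))  # [1024, 512, 256, 128, 64, 32, 16, 8, 4]
```

Starting from a 1024-step teacher, eight rounds of halving reach a 4-step sampler. After each round the trained student becomes the teacher for the next one.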

---

## 🔹 3. Classifier-Free Guidance (CFG)

* **Problem it solves:** Diffusion models conditioned on prompts sometimes produce **weakly aligned outputs** (the image doesn’t match the text well).
* **Key idea:** Train a single model both with and without conditioning by randomly replacing the text prompt with a null (empty) prompt for a fraction of training examples. At inference:

  * Generate two predictions (with prompt and without).
  * Combine them with a guidance weight:

    $$
    \hat{\epsilon} = \hat{\epsilon}_\text{uncond} + s \cdot (\hat{\epsilon}_\text{cond} - \hat{\epsilon}_\text{uncond})
    $$

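    where $s$ is the **guidance scale** (usually 5–10). Setting $s = 1$ recovers the plain conditional prediction, $s = 0$ gives the unconditional one, and $s > 1$ extrapolates past the conditional prediction, away from the unconditional one.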
* **Impact:** Strongly improves alignment between prompt and output (e.g., “a cat in a spacesuit” actually looks like one).
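* **Trade-off:** Higher scale = better alignment, but less diversity (images start looking too similar), and very high scales can introduce oversaturation or artifacts. Each guided step also needs two model predictions instead of one.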
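
The guidance formula above translates almost directly into code. The sketch below assumes a hypothetical `model(x, t, cond)` callable that returns a noise prediction, and hypothetical `prompt_emb` / `null_emb` arrays for the prompt and null-prompt embeddings. It is meant to show the arithmetic, not any particular library's API.

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, scale=7.5):
    """Classifier-free guidance: start at the unconditional prediction and
    move `scale` times along the direction toward the conditional one."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

def guided_eps(model, x, t, prompt_emb, null_emb, scale=7.5):
    """Run the (hypothetical) model twice per step and blend the outputs."""
    eps_cond = model(x, t, prompt_emb)    # prediction with the prompt
    eps_uncond = model(x, t, null_emb)    # prediction with the null prompt
    return cfg_combine(eps_uncond, eps_cond, scale)

eps_uncond = np.array([0.0, 1.0])
eps_cond = np.array([1.0, 1.0])
print(cfg_combine(eps_uncond, eps_cond, scale=1.0))  # [1. 1.]  (plain conditional)
print(cfg_combine(eps_uncond, eps_cond, scale=7.5))  # [7.5 1. ] (pushed past conditional)
```

Only the component where the two predictions disagree is amplified. Where they agree (the second entry here), the scale has no effect.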

---

## ✅ Why These Matter

* **DDIM → Faster inference** (practical for interactive apps).
* **Distillation → Near real-time generation** (closing gap with GANs).
* **Classifier-Free Guidance → Better controllability** (makes outputs useful for actual creative tasks).

Together, they turn diffusion models from a **research novelty** into **deployable systems** that power tools like Stable Diffusion, Midjourney, and Photoshop’s AI features.

