# Diffusion Models History

## 1. What Are Diffusion Models?
- A family of generative models used mainly for image generation.
- During training, images are corrupted by **adding noise** step by step, and the model **learns to reverse** this corruption to recover clean images.
- Modern text-to-image systems (Stable Diffusion, DALL-E 2, Midjourney, Imagen) use this technique.
- They often produce higher-quality samples than VAEs and autoregressive models such as PixelCNN, and they train more stably than GANs.

---

## 2. Intuition Behind Diffusion Models
- Think of a drop of ink diffusing in water: structure slowly disappears into noise.
- Diffusion models imitate this:
  - Clean image → corrupted with noise → becomes pure Gaussian noise.
  - The model then learns the **reverse process**: noise → structure → final image.

---

## 3. Main Applications
- Image generation.
- Inpainting (restoring missing regions).
- Super-resolution.
- Extensions to audio, molecules, and 3D generation.

---

# History and Theory (Simplified)

## 4. Two Independent Origins

### A. Physics-Inspired Diffusion (Sohl-Dickstein et al., 2015)
- Modeled data corruption as a diffusion process, inspired by non-equilibrium thermodynamics.
- Transform data into noise, then reverse the process.
- Required modeling probability densities with normalization constants, which is computationally difficult.

### B. Score-Based Models (Song & Ermon, 2019)
- Modeled the **score**, the gradient of the log data density, with a neural network \(s_\theta\):  
  $$s_\theta(x) \approx \nabla_x \log p(x)$$
- Avoided computing normalization constants.
- Used Gaussian perturbations and Langevin dynamics.
- These ideas were later unified with diffusion theory.
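
To make the role of the score concrete, here is a minimal sketch of Langevin dynamics on a toy 1-D Gaussian. It assumes the score is known analytically (`gaussian_score` is a hypothetical stand-in for a learned network \(s_\theta\)), and the step size and step count are illustrative choices, not values from the papers above.

```python
import numpy as np

def gaussian_score(x, mean=2.0, std=0.5):
    """Exact score of N(mean, std^2): d/dx log p(x) = -(x - mean) / std^2.

    Stand-in for a learned score network (an assumption for this toy example).
    """
    return -(x - mean) / std**2

def langevin_sample(score_fn, n_samples=5000, n_steps=500, step_size=0.01, seed=0):
    """Run unadjusted Langevin dynamics using only the score function."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)  # arbitrary starting points
    for _ in range(n_steps):
        noise = rng.standard_normal(n_samples)
        # Update: x <- x + (step / 2) * score(x) + sqrt(step) * noise
        x = x + 0.5 * step_size * score_fn(x) + np.sqrt(step_size) * noise
    return x

samples = langevin_sample(gaussian_score)
print(samples.mean(), samples.std())
```

The sampler never evaluates \(p(x)\) itself, only its gradient, which is why no normalization constant is needed. The printed mean and standard deviation should land close to the target values of 2.0 and 0.5.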

---

## 5. DDPMs (2020–2021): The Breakthrough
- Ho et al. (2020) introduced **Denoising Diffusion Probabilistic Models (DDPMs)**.
- Showed that diffusion models can reach sample quality competitive with GANs.
- The model predicts noise instead of pixels.
- Dhariwal & Nichol (2021): “Diffusion Models Beat GANs on Image Synthesis”.

---

## 6. Latent Diffusion (Stable Diffusion, 2022)
- Instead of diffusing in pixel space, diffusion is performed in latent space.
- Reduces computation, enabling fast and efficient image generation.
- Allowed wide adoption in creative applications.

---

# How Diffusion Models Work (Simplified)

## 7. Three Main Stages
1. **Forward Process (Noise Addition)**  
   Adds Gaussian noise over \(T\) steps until the image becomes pure noise.

2. **Reverse Process (Learning to Denoise)**  
   A neural network learns how to remove noise step by step.

3. **Image Generation**  
   Start from random noise and iteratively denoise.

---

## 8. Forward Diffusion
- A Markov chain:  
  $$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right)$$
- Noise level controlled by a schedule \(\beta_t\).
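- With \(\alpha_t = 1 - \beta_t\) and \(\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s\), the closed-form reparameterization  
  $$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$  
  allows jumping directly to any step \(t\).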
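
The closed-form jump can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a linear \(\beta_t\) schedule from \(10^{-4}\) to \(0.02\) over \(T = 1000\) steps (the values used in Ho et al., 2020) and a random array standing in for an image; the function names are hypothetical and the arrays are 0-indexed.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise schedule beta_1..beta_T (assumed for this sketch)."""
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t directly from x_0 using the closed form (t is a 0-based index)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)        # cumulative product of alpha_s = 1 - beta_s

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))   # stand-in for a normalized image

x_mid, _ = q_sample(x0, 499, alpha_bar, rng)   # partially noised
x_last, _ = q_sample(x0, 999, alpha_bar, rng)  # almost pure noise
print(alpha_bar[499], alpha_bar[999])
```

Because \(\bar{\alpha}_t\) shrinks toward zero as \(t\) grows, the signal term fades and the noise term dominates, so the last step is very close to a sample from \(\mathcal{N}(0, I)\).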

---

## 9. Reverse Diffusion (Denoising)
- Model predicts noise \(\epsilon_\theta(x_t, t)\) instead of the clean image.
- Training objective:  
  $$L = \mathbb{E}\|\epsilon - \epsilon_\theta(x_t, t)\|^2$$
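- Sampling step (one reverse update):  
  $$x_{t-1} = \frac{1}{\sqrt{1-\beta_t}}\left(x_t - \beta_t \frac{\epsilon_\theta(x_t, t)}{\sqrt{1-\bar{\alpha}_t}}\right) + \sigma_t z, \quad z \sim \mathcal{N}(0, I)$$  
  where \(\sigma_t\) sets the scale of the noise added back at each step (for example \(\sigma_t^2 = \beta_t\)), and no noise is added at the final step.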
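
The objective and the update above might be sketched as follows. This is a hedged illustration rather than a full training loop: `zero_model` is a hypothetical placeholder for a trained network \(\epsilon_\theta\), the loss is averaged over elements instead of summed, the schedule is an assumed linear one, and the noise scale is assumed to be \(\sqrt{\beta_t}\), one common choice.

```python
import numpy as np

def noise_prediction_loss(eps_model, x0, t, alpha_bar, rng):
    """Monte Carlo estimate of E||eps - eps_theta(x_t, t)||^2 (mean over elements)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return np.mean((eps - eps_model(x_t, t)) ** 2)

def reverse_step(eps_model, x_t, t, betas, alpha_bar, rng):
    """One reverse update from index t to t-1 (0-based; no noise at the last step)."""
    eps_hat = eps_model(x_t, t)
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(1.0 - betas[t])
    if t == 0:
        return mean
    sigma = np.sqrt(betas[t])  # assumed noise scale, sigma_t^2 = beta_t
    return mean + sigma * rng.standard_normal(x_t.shape)

# Hypothetical placeholder for a trained network: always predicts zero noise.
zero_model = lambda x_t, t: np.zeros_like(x_t)

betas = np.linspace(1e-4, 0.02, 1000)      # assumed linear schedule
alpha_bar = np.cumprod(1.0 - betas)
rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(32, 32))

loss = noise_prediction_loss(zero_model, x0, 500, alpha_bar, rng)
print(loss)
```

A model that ignores its input and predicts zero gets a loss near 1, the variance of the true noise, so a trained network should score well below that.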

---

# Image Generation
- Start with random Gaussian noise:  
  $$x_T \sim \mathcal{N}(0, I)$$
- Apply reverse denoising for \(T\) steps to obtain \(x_0\).
- Fewer steps → faster but less detailed.  
- More steps → higher quality but slower.
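
As a minimal end-to-end sketch of this loop, the example below generates 1-D "data" instead of images. It assumes the data distribution is a Gaussian \(\mathcal{N}(\mu, s^2)\), for which the ideal noise prediction \(\mathbb{E}[\epsilon \mid x_t]\) has a closed form, so no network needs to be trained. The function names, the linear schedule, and the choice of \(\sqrt{\beta_t}\) as the noise scale are assumptions for illustration.

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear beta schedule and its cumulative alpha products."""
    betas = np.linspace(beta_start, beta_end, T)
    return betas, np.cumprod(1.0 - betas)

def ideal_eps_for_gaussian(x_t, t, alpha_bar, mu=2.0, s=0.5):
    """Closed-form E[eps | x_t] when the data is N(mu, s^2).

    Replaces a trained network in this toy setting (an assumption).
    """
    var_t = alpha_bar[t] * s**2 + (1.0 - alpha_bar[t])
    return np.sqrt(1.0 - alpha_bar[t]) * (x_t - np.sqrt(alpha_bar[t]) * mu) / var_t

def sample(eps_model, n_samples, betas, alpha_bar, seed=0):
    """Start from Gaussian noise and apply the reverse update for every step."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)            # x_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):       # t = T-1, ..., 0 (0-based)
        eps_hat = eps_model(x, t, alpha_bar)
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(1.0 - betas[t])
        if t > 0:                                 # no noise on the final step
            x = x + np.sqrt(betas[t]) * rng.standard_normal(n_samples)
    return x

betas, alpha_bar = make_schedule()
samples = sample(ideal_eps_for_gaussian, 2000, betas, alpha_bar)
print(samples.mean(), samples.std())
```

The generated samples should have a mean and standard deviation close to the target \(\mu = 2.0\) and \(s = 0.5\). An image model follows the same loop, with a neural network in place of the closed-form predictor.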

---

# Guided Diffusion

## 10. Why Guidance?
- Users need control over the generated output.

### Types of Guidance
**Classifier Guidance:**  
Uses gradients from a separately trained classifier to steer sampling; limited to the classifier's predefined categories.

**Classifier-Free Guidance (Stable Diffusion):**  
- No external classifier.  
- Works with arbitrary text prompts.  
- Uses conditional and unconditional predictions:  
  $$\epsilon_{\text{guided}} = \epsilon_{\text{uncond}} + w(\epsilon_{\text{cond}} - \epsilon_{\text{uncond}})$$  
- \(w\) controls how strongly the text influences the generation: \(w = 1\) recovers the conditional prediction, and larger values follow the prompt more closely.
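
The guidance formula is a simple linear combination, sketched below. The two input arrays are made-up numbers standing in for the network's unconditional and text-conditioned noise predictions, and the function name is hypothetical.

```python
import numpy as np

def classifier_free_guidance(eps_uncond, eps_cond, w):
    """Combine the two predictions: eps_uncond + w * (eps_cond - eps_uncond)."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Made-up predictions for illustration only.
eps_uncond = np.array([0.0, 1.0, -1.0])
eps_cond = np.array([1.0, 1.0, 0.0])

for w in (0.0, 1.0, 7.5):
    print(w, classifier_free_guidance(eps_uncond, eps_cond, w))
```

With \(w = 0\) the prompt is ignored, with \(w = 1\) the result equals the conditional prediction, and with \(w > 1\) the result is pushed past the conditional prediction in the direction the prompt suggests.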

---

# Latent Diffusion Models (LDMs)

## 11. Why Latent Diffusion?
- Pixel space is extremely large and costly.
- Latent space is compressed and efficient.

### Process
1. Training: image → encoder → latent \(z\).
2. Diffusion (noising and denoising) happens in latent space on \(z_t\).
3. The final denoised latent is passed through the decoder to produce an image.

This is the core mechanism behind **Stable Diffusion**.
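
To illustrate why working in latent space is cheaper, here is a toy sketch of the encode, noise, decode path. It is only an illustration: real latent diffusion models use a learned autoencoder, whereas `encode` and `decode` below are hypothetical stand-ins based on average pooling and nearest-neighbour upsampling, and the noise level is an arbitrary choice.

```python
import numpy as np

def encode(image, factor=8):
    """Toy encoder: average-pool each factor x factor patch into one latent value."""
    h, w = image.shape
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def decode(latent, factor=8):
    """Toy decoder: nearest-neighbour upsampling back to pixel resolution."""
    return np.repeat(np.repeat(latent, factor, axis=0), factor, axis=1)

rng = np.random.default_rng(0)
image = rng.uniform(-1.0, 1.0, size=(64, 64))   # stand-in for a grayscale image

z0 = encode(image)                               # 64x64 pixels -> 8x8 latent
alpha_bar_t = 0.5                                # hypothetical noise level
z_t = np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * rng.standard_normal(z0.shape)
recon = decode(z0)                               # back to 64x64 pixels

print(image.size, z0.size)
```

Every diffusion step here touches 64 latent values instead of 4096 pixels, which is the source of the savings. The denoising network then only has to operate on the small latent, and the decoder is applied once at the end.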

---

# Final Simple Summary
- Diffusion models generate data by **learning to reverse noise**.
- Inspired by physics and probabilistic modeling.
- DDPMs connected diffusion and score-based ideas in a powerful generative framework.
- Latent diffusion, popularized by Stable Diffusion, enabled fast, accessible image generation.
- Guidance methods give users control over the output, for example through text prompts.
- Diffusion remains a leading approach to image generation today.
