🎨 Let’s turn noise into art. We’re diving into one of the **most influential generative model classes** of the 2020s: **diffusion models**.

---

# 🧪 `09_lab_diffusion_model_toy_image_gen.ipynb`  
### 📁 `04_advanced_architectures`  
> Implement a **toy Denoising Diffusion Probabilistic Model (DDPM)** on MNIST.  
Step-by-step from **noise schedule → sampling → image recovery**.  
Fully Colab + CPU/GPU friendly.

---

## 🎯 Learning Objectives

- Understand **forward diffusion** (add noise)  
- Train a model to **reverse it** (denoise)  
- Visualize **denoising steps as images emerge**  
- Generate **new digits from noise**

---

## 💻 Runtime Design

| Feature             | Spec                        |
|---------------------|-----------------------------|
| Dataset             | MNIST (28×28) ✅            |
| Runtime             | Colab / Laptop CPU/GPU ✅   |
| Model               | Tiny UNet-ish CNN ✅        |
| VRAM                | < 2GB ✅                    |
| Training time       | ~5 mins ✅                  |

---

## 🔧 Section 1: Imports & Dataset

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import numpy as np
```

```python
# Load MNIST
transform = T.Compose([
    T.ToTensor(),
    T.Lambda(lambda x: x * 2 - 1)  # scale to [-1, 1]
])

ds = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
dl = DataLoader(ds, batch_size=64, shuffle=True)
```

---

## 🧮 Section 2: Noise Scheduler

```python
T_steps = 1000
betas = torch.linspace(1e-4, 0.02, T_steps)
alphas = 1 - betas
alphas_bar = torch.cumprod(alphas, dim=0)

def q_sample(x0, t, noise=None):
    """Sample x_t from q(x_t | x_0) in one shot (closed-form forward diffusion)."""
    if noise is None:
        noise = torch.randn_like(x0)
    # Index the schedule on the same device as t, so this also works on GPU.
    ab = alphas_bar.to(t.device)[t]
    sqrt_ab = ab**0.5
    sqrt_1ab = (1 - ab)**0.5
    return sqrt_ab[:, None, None, None] * x0 + sqrt_1ab[:, None, None, None] * noise
```
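
To get a feel for the schedule, it helps to print how much of the clean image survives at a few timesteps. The sketch below rebuilds the same linear schedule with only the standard library, so it can be run outside the notebook. The chosen timesteps and the variable names `signal` and `noise` are illustrative, not part of the lab code.

```python
from itertools import accumulate
from operator import mul

T_steps = 1000
beta_start, beta_end = 1e-4, 0.02

# Linear schedule, matching torch.linspace(beta_start, beta_end, T_steps).
betas = [beta_start + (beta_end - beta_start) * i / (T_steps - 1) for i in range(T_steps)]
# Running product of (1 - beta), i.e. the alphas_bar used by q_sample.
alphas_bar = list(accumulate((1.0 - b for b in betas), mul))

for t in (0, 249, 499, 749, 999):
    signal = alphas_bar[t] ** 0.5         # weight on the clean image x0
    noise = (1.0 - alphas_bar[t]) ** 0.5  # weight on the Gaussian noise
    print(f"t={t:4d}  signal={signal:.4f}  noise={noise:.4f}")
```

The signal weight falls monotonically from nearly 1 at `t=0` to nearly 0 at the last step, which is why sampling can start from pure Gaussian noise.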

---

## 🤖 Section 3: Denoising Model

```python
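class SimpleDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        # Two input channels: the noisy image plus a constant "timestep" plane.
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1)
        )

    def forward(self, x, t):
        # Without any timestep input the network cannot tell how noisy x is,
        # so broadcast t / T_steps into an extra channel (a crude stand-in for
        # a learned or sinusoidal timestep embedding).
        t_plane = (t.float() / T_steps).view(-1, 1, 1, 1).expand(-1, 1, *x.shape[2:])
        return self.net(torch.cat([x, t_plane], dim=1))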
```
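
The model's `forward` takes a timestep argument, and a common way to turn an integer timestep into a feature vector is a sinusoidal embedding. The following is a minimal sketch, not part of the lab model: the function name `sinusoidal_embedding`, the default `dim=32`, and the base constant 10000 are assumptions chosen for illustration, and `dim` is assumed to be even.

```python
import math
import torch

def sinusoidal_embedding(t, dim=32):
    """Map integer timesteps of shape (B,) to float features of shape (B, dim)."""
    half = dim // 2
    # Geometrically spaced frequencies, starting at 1 and decaying toward 1/10000.
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :].to(t.device)
    # First half holds sines, second half holds cosines.
    return torch.cat([torch.sin(args), torch.cos(args)], dim=1)

emb = sinusoidal_embedding(torch.tensor([0, 10, 500, 999]))
print(emb.shape)  # torch.Size([4, 32])
```

Such a vector would typically be passed through a small linear layer and added to the convolutional feature maps, which gives the network a richer notion of "how noisy is this input" than a single scalar.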

---

## 🏋️ Section 4: Training

```python
device = "cuda" if torch.cuda.is_available() else "cpu"
model = SimpleDenoiser().to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

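model.train()
for epoch in range(5):
    running_loss = 0.0
    for x0, _ in dl:
        x0 = x0.to(device)
        # Pick a random timestep per image and noise the image to that level.
        t = torch.randint(0, T_steps, (x0.size(0),), device=device)
        noise = torch.randn_like(x0)
        xt = q_sample(x0, t, noise=noise)
        # The network is trained to predict the noise that was added.
        pred_noise = model(xt, t)
        loss = loss_fn(pred_noise, noise)
        opt.zero_grad()
        loss.backward()
        opt.step()
        running_loss += loss.item()
    # Report the epoch mean; a single batch loss is noisy because t is random.
    print(f"Epoch {epoch+1} | Loss: {running_loss / len(dl):.4f}")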
```

---

## 🔄 Section 5: Reverse Process (Sampling)

```python
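@torch.no_grad()
def sample_images(num=4):
    model.eval()
    x = torch.randn((num, 1, 28, 28), device=device)  # start from pure noise
    for t in reversed(range(T_steps)):
        # No fresh noise is added on the final step (t == 0).
        z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        alpha = alphas[t]
        alpha_bar = alphas_bar[t]
        t_batch = torch.full((num,), t, device=device, dtype=torch.long)
        noise_pred = model(x, t_batch)
        # Posterior mean from the predicted noise, plus sqrt(beta_t) * z.
        x = (1 / alpha**0.5) * (x - (1 - alpha) / (1 - alpha_bar)**0.5 * noise_pred) + betas[t]**0.5 * z
    return x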

imgs = sample_images(4).cpu()
```
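
The update inside the loop of `sample_images` is the standard DDPM ancestral sampling step. Written with the notebook's names (`alphas[t]` as $\alpha_t$, `alphas_bar[t]` as $\bar{\alpha}_t$, `betas[t]` as $\beta_t$, `noise_pred` as $\epsilon_\theta(x_t, t)$), it reads:

$$
x_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}} \, \epsilon_\theta(x_t, t) \right) + \sqrt{\beta_t} \, z
$$

with $z \sim \mathcal{N}(0, I)$ on every step except the last, where $z = 0$. Setting the added-noise variance to $\sigma_t^2 = \beta_t$ is one common choice; other variance choices exist and are not covered in this lab.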

---

## 🎨 Section 6: Visualize Generated Images

```python
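# Map from [-1, 1] back to [0, 1] and clamp, since samples can overshoot the valid range.
grid = torchvision.utils.make_grid(((imgs + 1) / 2).clamp(0, 1), nrow=4)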
plt.imshow(grid.permute(1, 2, 0))
plt.title("Generated Digits (Diffusion)")
plt.axis("off")
plt.show()
```
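
To watch digits emerge step by step, the sampling loop can also record intermediate states. The sketch below is one possible way to do that, under stated assumptions: `sample_with_snapshots`, `snapshot_every`, and the stand-in `zero_eps` model are hypothetical names introduced here, and the stand-in predicts zero noise only so the example runs without a trained network.

```python
import torch

@torch.no_grad()
def sample_with_snapshots(eps_model, betas, shape, snapshot_every=200, device="cpu"):
    """DDPM ancestral sampling that also returns (t, x_t) pairs along the way.

    eps_model(x, t) must return predicted noise with the same shape as x.
    """
    betas = betas.to(device)
    alphas = 1.0 - betas
    alphas_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, device=device)
    snapshots = []
    for t in reversed(range(len(betas))):
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps = eps_model(x, t_batch)
        mean = (x - betas[t] / (1 - alphas_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise
        if t % snapshot_every == 0:
            snapshots.append((t, x.clone().cpu()))
    return x, snapshots

def zero_eps(x, t):
    # Stand-in for a trained denoiser; it only demonstrates the call signature.
    return torch.zeros_like(x)

final, snaps = sample_with_snapshots(zero_eps, torch.linspace(1e-4, 0.02, 1000), (4, 1, 28, 28))
print([t for t, _ in snaps])  # [800, 600, 400, 200, 0]
```

In the notebook, passing the trained `model` and the existing `betas` in place of the stand-ins, then plotting each snapshot with `torchvision.utils.make_grid` as in the cell above, would show the digits sharpening as `t` decreases.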

---

## ✅ Wrap-Up Summary

| Feature                        | ✅ |
|--------------------------------|----|
| Forward & reverse process      | ✅ |
| Trained on MNIST               | ✅ |
| Generated from pure noise      | ✅ |
| Visualized denoising steps     | ✅ |
| Colab-friendly, fast training  | ✅ |

---

## 🧠 What You Learned

- Diffusion models work by **learning to denoise** step-by-step  
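- Unlike GANs, they train with a plain **regression loss (MSE on the predicted noise)**, a simplified form of a variational bound on the likelihood, and training is typically **stable**  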
- Each generation step **peels noise away**, like a photograph slowly developing  
- You’ve now built a **tiny DDPM** from scratch 💥

---

Next lab:  
Shall we hit `05_model_optimization` next and explore `07_lab_weight_pruning_and_accuracy_tracking.ipynb`?  
Let’s see what happens when we **cut out neurons** and **track how accuracy behaves**.