# 🧠 **VAE vs GAN vs Diffusion: Image Generation Showdown** 🚀

Generative models are at the heart of modern **Deep Learning** and **Generative AI**, powering applications such as image synthesis, data augmentation, and creative AI systems.

In this notebook, we perform a **hands-on and fair comparison** of three popular generative modeling approaches:

- **Variational AutoEncoder (VAE)**
- **Generative Adversarial Network (GAN)**
- **Diffusion Model**

All models are trained on the **MNIST handwritten digit dataset** using the **same data source and similar constraints** to ensure a meaningful comparison.

---

## 🎯 **Objectives of This Notebook**

- Train **VAE, GAN, and Diffusion models** on MNIST
- Generate handwritten digits (0–9) using each approach
- Visually compare **generation quality and stability**
- Observe and analyze **training behavior**
- Highlight **strengths and limitations** of each model

---

## 🧩 **Why This Comparison Matters**

Each generative model follows a **fundamentally different learning philosophy**:

- **VAE** learns a structured latent space through probabilistic encoding  
- **GAN** learns via adversarial competition between generator and discriminator  
- **Diffusion models** generate data by gradually denoising random noise  

Understanding these differences is essential for anyone working in **Deep Learning, Computer Vision, or Generative AI**.
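
As a compact reference, the standard textbook objectives behind the three approaches can be written as below. The notation (encoder $q_\phi$, decoder $p_\theta$, generator $G$, discriminator $D$, noise predictor $\epsilon_\theta$, cumulative noise schedule $\bar\alpha_t$) is introduced here for illustration and is not defined elsewhere in this notebook.

$$
\begin{aligned}
\mathcal{L}_{\text{VAE}}(x) &= -\,\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] + D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big) \\
\min_G \max_D \; V(D, G) &= \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big] \\
\mathcal{L}_{\text{simple}} &= \mathbb{E}_{x_0,\, t,\, \epsilon}\Big[\big\lVert \epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1 - \bar\alpha_t}\,\epsilon,\; t\big)\big\rVert^2\Big]
\end{aligned}
$$

The simplified cells in this notebook only approximate these forms: the VAE uses a summed MSE as its reconstruction term, the GAN trains the generator by labeling fakes as real instead of using the minimax form directly, and the diffusion model uses a single noise level without a timestep $t$.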

---

## 📌 **Note**
This notebook focuses on **conceptual clarity and educational insight** rather than state-of-the-art performance.  
Each model is trained for only five epochs, which is enough to demonstrate learning behavior and sample outputs.

---

Let’s dive in and explore how these generative models compare! 🔍✨

# **🧱 Imports & Setup**

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import numpy as np

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

Using device: cuda


# 🧱 **Dataset (Same for All Models)**

In [3]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

dataset = datasets.MNIST(
    root="./data",
    train=True,
    transform=transform,
    download=True
)

loader = DataLoader(dataset, batch_size=128, shuffle=True)
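
The `Normalize((0.5,), (0.5,))` step maps pixel values from [0, 1] to [-1, 1], which is why the decoder and generator later end in `tanh`. The sketch below reproduces that arithmetic on a synthetic batch so it runs without downloading MNIST; the helper names `normalize` and `denormalize` are hypothetical and not part of the notebook.

```python
import torch

def normalize(x, mean=0.5, std=0.5):
    """Same arithmetic as transforms.Normalize((0.5,), (0.5,)) on one channel."""
    return (x - mean) / std

def denormalize(x, mean=0.5, std=0.5):
    """Inverse mapping back to [0, 1], useful before plotting generated samples."""
    return (x * std + mean).clamp(0.0, 1.0)

# Synthetic stand-in for a ToTensor() batch: values in [0, 1], shape (N, 1, 28, 28)
batch = torch.rand(128, 1, 28, 28)
normed = normalize(batch)

# The models in this notebook consume flattened vectors of length 784
flat = normed.view(-1, 784)
print(flat.shape, normed.min().item(), normed.max().item())
```

Applying `denormalize` to model outputs before calling `plt.imshow` should keep the displayed intensities in the range matplotlib expects.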



# **🧠 Variational Autoencoder (VAE)**

### **Model**

In [4]:
class VAE(nn.Module):
    def __init__(self, latent_dim=20):
        super().__init__()
        self.fc1 = nn.Linear(784, 400)
        self.fc_mu = nn.Linear(400, latent_dim)
        self.fc_logvar = nn.Linear(400, latent_dim)
        self.fc2 = nn.Linear(latent_dim, 400)
        self.fc3 = nn.Linear(400, 784)

    def encode(self, x):
        h = torch.relu(self.fc1(x))
        return self.fc_mu(h), self.fc_logvar(h)

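    def reparameterize(self, mu, logvar):
        # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
        # so gradients can flow through mu and logvar.
        std = torch.exp(0.5 * logvar)  # logvar = log(sigma^2), hence the 0.5
        eps = torch.randn_like(std)
        return mu + eps * std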

    def decode(self, z):
        h = torch.relu(self.fc2(z))
        return torch.tanh(self.fc3(h))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar
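
Once a VAE is trained, new digits are generated by drawing a latent vector from the standard normal prior and passing it through the decoder only. The following is a minimal sketch of that step, assuming a decoder that maps 20-dimensional latents to 784 values; `sample_images` and `stand_in_decoder` are hypothetical names, and the stand-in is untrained, so its output is noise.

```python
import torch
import torch.nn as nn

def sample_images(decode_fn, n_samples=16, latent_dim=20, device="cpu"):
    """Draw z ~ N(0, I) and decode it to images of shape (n, 1, 28, 28)."""
    z = torch.randn(n_samples, latent_dim, device=device)
    with torch.no_grad():  # sampling needs no gradients
        flat = decode_fn(z)
    return flat.view(-1, 1, 28, 28)

# Hypothetical stand-in with the same input/output sizes as the decoder above.
stand_in_decoder = nn.Sequential(
    nn.Linear(20, 400), nn.ReLU(), nn.Linear(400, 784), nn.Tanh()
)

images = sample_images(stand_in_decoder)
print(images.shape)
```

In the notebook itself, one would pass the trained model's `decode` method (for example `sample_images(vae.decode, device=device)`) after the training cell has run.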

### **Training**

In [6]:
vae = VAE(latent_dim=20).to(device)
optimizer = optim.Adam(vae.parameters(), lr=1e-3)

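def vae_loss(recon_x, x, mu, logvar):
    # Reconstruction term: squared error summed over pixels and over the batch
    recon = nn.functional.mse_loss(recon_x, x, reduction='sum')
    # Closed-form KL divergence between N(mu, sigma^2) and the N(0, I) prior
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl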

for epoch in range(5):
    for x, _ in loader:
        x = x.view(-1, 784).to(device)
        recon, mu, logvar = vae(x)
        loss = vae_loss(recon, x, mu, logvar)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Reports the summed loss of the final mini-batch only, not an epoch average
    print(f"VAE Epoch {epoch+1} | Loss: {loss.item():.2f}")

VAE Epoch 1 | Loss: 8430.38
VAE Epoch 2 | Loss: 6913.21
VAE Epoch 3 | Loss: 7440.52
VAE Epoch 4 | Loss: 7208.06
VAE Epoch 5 | Loss: 6782.68
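
The values printed above are the summed loss of the last mini-batch in each epoch. With 60,000 training images and a batch size of 128, that final batch holds 96 images, so the numbers are noisy and not directly comparable to a full-batch sum. One possible remedy is to track a per-sample average over the whole epoch. The `RunningAverage` class below is a hypothetical helper, and the numbers fed into it are made up for illustration.

```python
class RunningAverage:
    """Accumulates a summed loss and a sample count to report a per-sample mean."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, summed_loss, n_samples):
        # summed_loss is expected to be a sum over the batch (reduction='sum')
        self.total += float(summed_loss)
        self.count += int(n_samples)

    @property
    def mean(self):
        return self.total / max(self.count, 1)

# Made-up numbers for illustration only (not results from this notebook)
meter = RunningAverage()
meter.update(summed_loss=12800.0, n_samples=128)  # full batch: 100 per sample
meter.update(summed_loss=9600.0, n_samples=96)    # smaller final batch: 100 per sample
print(meter.mean)  # → 100.0
```

Inside the training loop this would be called as `meter.update(loss.item(), x.size(0))`, with a fresh meter created at the start of each epoch.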


# 🧠 **GAN**

### **Model**

In [7]:
class Generator(nn.Module):
    def __init__(self, latent_dim=20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 784),
            nn.Tanh()
        )

    def forward(self, z):
        return self.net(z)


class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(784, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.net(x)

### **Training**

In [8]:
G = Generator().to(device)
D = Discriminator().to(device)

opt_G = optim.Adam(G.parameters(), lr=2e-4)
opt_D = optim.Adam(D.parameters(), lr=2e-4)
criterion = nn.BCELoss()

for epoch in range(5):
    for x, _ in loader:
        x = x.view(-1, 784).to(device)
        bs = x.size(0)

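        # Train Discriminator: real images labeled 1, generated images labeled 0
        z = torch.randn(bs, 20, device=device)  # 20 matches Generator's default latent_dim
        fake = G(z)

        real_loss = criterion(D(x), torch.ones(bs, 1, device=device))
        # detach() keeps this step from back-propagating into the generator
        fake_loss = criterion(D(fake.detach()), torch.zeros(bs, 1, device=device))
        d_loss = real_loss + fake_loss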

        opt_D.zero_grad()
        d_loss.backward()
        opt_D.step()

        # Train Generator: push D to label fakes as real (non-saturating loss)
        g_loss = criterion(D(fake), torch.ones(bs, 1, device=device))
        opt_G.zero_grad()
        g_loss.backward()
        opt_G.step()

    print(f"GAN Epoch {epoch+1} | D Loss: {d_loss.item():.2f}")

GAN Epoch 1 | D Loss: 0.76
GAN Epoch 2 | D Loss: 0.69
GAN Epoch 3 | D Loss: 0.50
GAN Epoch 4 | D Loss: 0.99
GAN Epoch 5 | D Loss: 0.63
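
The discriminator loss of one batch says little about whether the two networks are balanced. A common complement is to look at the discriminator's average output and accuracy on real and generated batches separately. The sketch below assumes sigmoid outputs in [0, 1], as produced by the `Discriminator` above; `discriminator_stats` is a hypothetical helper and the probabilities are made up for illustration.

```python
import torch

def discriminator_stats(d_real, d_fake, threshold=0.5):
    """Summarise discriminator probabilities on a real and a fake batch."""
    return {
        "mean_real": d_real.mean().item(),  # ideally not stuck at 1.0
        "mean_fake": d_fake.mean().item(),  # ideally not stuck at 0.0
        "acc_real": (d_real > threshold).float().mean().item(),
        "acc_fake": (d_fake <= threshold).float().mean().item(),
    }

# Made-up probabilities for illustration only
d_real = torch.tensor([[0.9], [0.8], [0.4], [0.7]])
d_fake = torch.tensor([[0.2], [0.6], [0.1], [0.3]])

stats = discriminator_stats(d_real, d_fake)
print(stats)
```

Accuracies that stay near 1.0 on both batches would suggest the discriminator is overpowering the generator, while values drifting toward 0.5 suggest it can no longer tell the two apart. Logging `g_loss` next to `d_loss` gives a similar picture.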


# **🧠 Diffusion Model (Simplified)**

### **⚠️ Educational lightweight diffusion (gold-friendly)**

In [9]:
class SimpleDiffusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(784, 512),
            nn.ReLU(),
            nn.Linear(512, 784)
        )

    def forward(self, x):
        return self.net(x)

### **Training**

In [10]:
diffusion = SimpleDiffusion().to(device)
optimizer = optim.Adam(diffusion.parameters(), lr=1e-3)

for epoch in range(5):
    for x, _ in loader:
        x = x.view(-1, 784).to(device)
        # Single fixed noise level (unit variance); no timestep or noise schedule
        noise = torch.randn_like(x)
        noisy_x = x + noise

        pred_noise = diffusion(noisy_x)
        loss = nn.functional.mse_loss(pred_noise, noise)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(f"Diffusion Epoch {epoch+1} | Loss: {loss.item():.4f}")

Diffusion Epoch 1 | Loss: 0.9566
Diffusion Epoch 2 | Loss: 0.9424
Diffusion Epoch 3 | Loss: 0.9408
Diffusion Epoch 4 | Loss: 0.9306
Diffusion Epoch 5 | Loss: 0.9241
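
The training cell above adds unit-variance noise at a single level and does not tell the network how noisy its input is. A DDPM-style setup instead samples a random timestep per image and mixes signal and noise according to a schedule. The sketch below shows only that forward (noising) step. The schedule values (1000 steps, betas from 1e-4 to 0.02) are commonly used defaults assumed here, not settings from this notebook, and `make_schedule` and `q_sample` are hypothetical names.

```python
import torch

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and the cumulative product of (1 - beta)."""
    betas = torch.linspace(beta_start, beta_end, num_steps)
    alpha_bars = torch.cumprod(1.0 - betas, dim=0)
    return betas, alpha_bars

def q_sample(x0, t, alpha_bars, noise=None):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    if noise is None:
        noise = torch.randn_like(x0)
    abar = alpha_bars[t].view(-1, 1)  # one value per sample, broadcast over pixels
    x_t = abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise
    return x_t, noise

betas, alpha_bars = make_schedule()
x0 = torch.rand(8, 784) * 2 - 1          # stand-in batch in [-1, 1]
t = torch.randint(0, len(betas), (8,))   # one random timestep per sample
x_t, noise = q_sample(x0, t, alpha_bars)
print(x_t.shape, alpha_bars[0].item(), alpha_bars[-1].item())
```

To train on these samples, the network would also need `t` as an input (for example through a timestep embedding), and generation would run the learned denoiser backwards over all steps. Neither part is shown here.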


# **🔍 Training Behavior Observations**

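- **VAE loss** trends downward overall (from 8430.38 to 6782.68) but rises between epochs 2 and 3. The printed value is the summed loss of the last mini-batch of each epoch, so the fluctuation likely reflects batch-to-batch variation more than any specific effect of the KL-divergence term.
- **GAN discriminator loss** oscillates between 0.50 and 0.99 with no clear trend. Oscillation is common in adversarial training, but one last-batch discriminator loss per epoch is not enough on its own to confirm that the generator and discriminator are balanced.
- **Diffusion loss** decreases slowly and monotonically (from 0.9566 to 0.9241). Because the target is unit-variance noise, always predicting zero would give an MSE near 1.0, so these values suggest the network has learned only a little about the noise in five epochs.

> These behaviors are broadly consistent with what is typically reported for each model family, though five epochs of last-batch losses support only tentative conclusions.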
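
The diffusion losses reported earlier sit just below 1.0. For context, the sketch below estimates the MSE of a predictor that always outputs zero against standard Gaussian noise, the same target distribution used in the diffusion training cell. The sample size of 10,000 is an arbitrary choice for this check.

```python
import torch

torch.manual_seed(0)
noise = torch.randn(10_000, 784)      # same distribution as the training target
zero_pred = torch.zeros_like(noise)   # a "model" that has learned nothing

# MSE against zero is the mean of noise**2, i.e. an estimate of the variance
baseline = torch.nn.functional.mse_loss(zero_pred, noise).item()
print(f"MSE of always predicting zero: {baseline:.3f}")
```

The result should come out close to 1.0, the variance of the noise, which gives a reference point for judging how far a noise-prediction loss has actually moved.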
