# Fashion Generative Studio
## A Complete Hands-On Project on Generative Models

---

## Project Title
**Fashion Generative Studio: Building and Comparing VAE and GAN Models**

---

## Project Overview

In this project, participants will build a complete **Generative AI Studio** capable of producing new fashion images inspired by the Fashion-MNIST dataset.

Students will implement and train two major families of generative models:

1. **Variational AutoEncoder (VAE)**
2. **Generative Adversarial Network (GAN / DCGAN)**

Both models will be trained on the same dataset and evaluated using the same metrics, enabling a fair and educational comparison.

The goal is not only to generate images, but to deeply understand how each generative paradigm learns, how stable it is during training, and how its outputs differ in quality and diversity.

---

## Learning Objectives

By completing this project, students will:

- Understand the concept of **latent representations**
- Learn how **probabilistic encoders and decoders** work
- Understand **adversarial training dynamics**
- Gain experience with **training stability challenges**
- Learn how to **evaluate generative models quantitatively**
- Build a complete reproducible generative AI pipeline

---

## Dataset

**Fashion-MNIST**

- 70,000 grayscale images
- Image size: 28 × 28
- 10 clothing categories
- Well-suited for rapid experimentation
- Visually interpretable for comparison

---

## Project Structure

The project is divided into five main phases:

1. Environment and Data Preparation
2. Variational AutoEncoder (VAE)
3. Generative Adversarial Network (GAN)
4. Evaluation and Final Comparison
5. Generative Playground (Final Demo)

---


### Fashion-MNIST Label Map
| Label | Clothing Item |
|--------|--------------|
| 0 | T-shirt / Top |
| 1 | Trouser |
| 2 | Pullover |
| 3 | Dress |
| 4 | Coat |
| 5 | Sandal |
| 6 | Shirt |
| 7 | Sneaker |
| 8 | Bag |
| 9 | Ankle boot |
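
For labelling plots and per-class statistics in later phases, the table can be mirrored in code. This is a minimal sketch; the names `FASHION_MNIST_LABELS` and `label_name` are illustrative and do not appear elsewhere in the notebook.

```python
# Class names in label order, mirroring the label map table.
FASHION_MNIST_LABELS = [
    "T-shirt / Top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def label_name(label: int) -> str:
    """Return the clothing item for an integer label in 0..9."""
    if not 0 <= label < len(FASHION_MNIST_LABELS):
        raise ValueError(f"label must be in 0..9, got {label}")
    return FASHION_MNIST_LABELS[label]

print(label_name(9))  # Ankle boot
```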




# Phase 1 — Environment and Data Preparation

In [None]:
# Phase 1: Environment and Data Preparation
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, utils
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import numpy as np
import os

# Create output directory
os.makedirs('studio_outputs', exist_ok=True)

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Data preparation
batch_size = 128

# VAE uses [0, 1] normalization
transform_vae = transforms.Compose([
    transforms.ToTensor(),
])

# GAN uses [-1, 1] normalization
transform_gan = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Load Fashion-MNIST
train_dataset_vae = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform_vae)
train_dataset_gan = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform_gan)

train_loader_vae = DataLoader(train_dataset_vae, batch_size=batch_size, shuffle=True)
train_loader_gan = DataLoader(train_dataset_gan, batch_size=batch_size, shuffle=True)

# Visualize real samples
real_batch, _ = next(iter(train_loader_vae))
grid = utils.make_grid(real_batch[:64], nrow=8, normalize=True)
plt.figure(figsize=(8,8))
# make_grid returns (C, H, W); matplotlib expects (H, W, C)
plt.imshow(grid.permute(1, 2, 0).numpy(), cmap='gray')
plt.title("Real Samples")
plt.axis('off')
plt.show()



## Goal
Establish a unified environment and data pipeline for all models.

## Steps

1. Install required libraries (Python, PyTorch, Torchvision)
2. Download the Fashion-MNIST dataset
3. Convert images to tensors
4. Normalize data:
   - VAE: values in [0, 1]
   - GAN: values in [-1, 1]
5. Create DataLoaders with fixed batch size
6. Save a reference grid of real images for later comparison

## Deliverables

- Functional DataLoader
- Saved grid of real Fashion-MNIST samples

---




# Phase 2 — Variational AutoEncoder (VAE)

In [None]:
# Phase 2: Variational AutoEncoder (VAE) Implementation
class VAE(nn.Module):
    def __init__(self, latent_dim=20):
        super(VAE, self).__init__()
        self.encoder = nn.Sequential(
            nn.Linear(784, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
        )
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_logvar = nn.Linear(256, latent_dim)
        
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 784),
            nn.Sigmoid()
        )

    def encode(self, x):
        h = self.encoder(x.view(-1, 784))
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5*logvar)
        eps = torch.randn_like(std)
        return mu + eps*std

    def decode(self, z):
        return self.decoder(z).view(-1, 1, 28, 28)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def vae_loss(recon_x, x, mu, logvar):
    BCE = nn.functional.binary_cross_entropy(recon_x.view(-1, 784), x.view(-1, 784), reduction='sum')
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE + KLD

vae_latent_dim = 20
vae_model = VAE(vae_latent_dim).to(device)
vae_optimizer = optim.Adam(vae_model.parameters(), lr=1e-3)

# Training loop
epochs = 10
for epoch in range(epochs):
    vae_model.train()
    total_loss = 0
    for data, _ in train_loader_vae:
        data = data.to(device)
        vae_optimizer.zero_grad()
        recon, mu, logvar = vae_model(data)
        loss = vae_loss(recon, data, mu, logvar)
        loss.backward()
        vae_optimizer.step()
        total_loss += loss.item()
    print(f"VAE Epoch {epoch+1}, Avg Loss: {total_loss/len(train_loader_vae.dataset):.4f}")

# Visualization
vae_model.eval()
with torch.no_grad():
    z = torch.randn(64, vae_latent_dim).to(device)
    samples = vae_model.decode(z).cpu()
    grid = utils.make_grid(samples, nrow=8, normalize=True)
    plt.figure(figsize=(8,8))
    # make_grid returns (C, H, W); matplotlib expects (H, W, C)
    plt.imshow(grid.permute(1, 2, 0).numpy(), cmap='gray')
    plt.title("VAE Generated Samples")
    plt.axis('off')
    plt.show()



## Goal
Learn structured latent representations and image reconstruction.

## Concept Summary

A VAE consists of:

- **Encoder**: maps input image to mean and variance vectors
- **Reparameterization Trick**: samples latent vector from distribution
- **Decoder**: reconstructs image from latent vector
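
The reparameterization trick matters because it keeps sampling differentiable: the randomness is isolated in `eps`, so gradients reach `mu` and `logvar`. The following small check, using toy 4-dimensional tensors rather than the notebook's encoder outputs, illustrates this.

```python
import torch

torch.manual_seed(0)
mu = torch.zeros(4, requires_grad=True)
logvar = torch.zeros(4, requires_grad=True)

# z = mu + eps * std, with the randomness isolated in eps.
std = torch.exp(0.5 * logvar)
eps = torch.randn_like(std)
z = mu + eps * std

z.sum().backward()
print(mu.grad)      # dz/dmu = 1 for every element
print(logvar.grad)  # dz/dlogvar = 0.5 * eps * std
```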

## Steps

1. Build an Encoder network that outputs mean and variance
2. Apply reparameterization to sample latent vector
3. Build a Decoder network that reconstructs images
4. Define the VAE loss function:
   - Reconstruction Loss
   - KL Divergence
5. Train the VAE model on Fashion-MNIST
6. Visualize reconstructed images vs original
7. Generate new samples from random latent vectors
8. Explore latent space by varying single latent dimensions
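
The KL term in step 4 is the closed-form divergence between the encoder's diagonal Gaussian and a standard normal prior, which is what `vae_loss` computes. As a sanity check, the sketch below compares that expression with `torch.distributions.kl_divergence` on random toy values; the shapes (batch of 8, latent size 20) are chosen only for illustration.

```python
import torch
from torch.distributions import Normal, kl_divergence

torch.manual_seed(0)
mu = torch.randn(8, 20)
logvar = torch.randn(8, 20)

# Closed form, as written in vae_loss.
kld_closed = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

# Same quantity from torch.distributions, summed over batch and dimensions.
q = Normal(mu, torch.exp(0.5 * logvar))
p = Normal(torch.zeros_like(mu), torch.ones_like(mu))
kld_reference = kl_divergence(q, p).sum()

print(torch.allclose(kld_closed, kld_reference, rtol=1e-4))
```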

## Deliverables

- Reconstruction comparison grid
- Generated image samples
- Latent traversal visualization
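
The Phase 2 cell does not include a latent traversal, so here is one hedged way to produce it: hold a base vector fixed and sweep a single coordinate. The function name `latent_traversal` and the untrained stand-in decoder are assumptions made so the sketch runs on its own. In the notebook, `decode` would be `vae_model.decode` and `device` would be the notebook's `device`.

```python
import torch
import torch.nn as nn

def latent_traversal(decode, latent_dim, dim, values, device="cpu"):
    """Decode copies of z = 0 in which only coordinate `dim` is varied."""
    z = torch.zeros(len(values), latent_dim, device=device)
    z[:, dim] = torch.as_tensor(values, dtype=z.dtype, device=device)
    with torch.no_grad():
        return decode(z)

# Untrained stand-in decoder (hypothetical), output in [0, 1] like the VAE.
stand_in = nn.Sequential(nn.Linear(20, 784), nn.Sigmoid())
decode = lambda z: stand_in(z).view(-1, 1, 28, 28)

values = torch.linspace(-3, 3, steps=7)
images = latent_traversal(decode, latent_dim=20, dim=0, values=values)
print(images.shape)  # torch.Size([7, 1, 28, 28])
```

Passing `images` to `utils.make_grid(images, nrow=7)` would give one row per traversed dimension.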

---



# Phase 3 — Generative Adversarial Network (GAN)

In [None]:
# Phase 3: Generative Adversarial Network (GAN) Implementation
nz = 100
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.main = nn.Sequential(
            nn.ConvTranspose2d(nz, 256, 4, 1, 0, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.ConvTranspose2d(64, 1, 4, 2, 3, bias=False),
            nn.Tanh()
        )
    def forward(self, input):
        return self.main(input)

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.main = nn.Sequential(
            nn.Conv2d(1, 64, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 7, 1, 0, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 1, 1, 1, 0, bias=False),
            nn.Sigmoid()
        )
    def forward(self, input):
        return self.main(input).view(-1, 1)

netG = Generator().to(device)
netD = Discriminator().to(device)
criterion = nn.BCELoss()
optimizerD = optim.Adam(netD.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=0.0002, betas=(0.5, 0.999))

# Training loop
epochs = 15
for epoch in range(epochs):
    for i, (data, _) in enumerate(train_loader_gan):
        netD.zero_grad()
        real = data.to(device)
        b_size = real.size(0)
        label = torch.full((b_size,), 1.0, device=device)
        output = netD(real).view(-1)
        errD_real = criterion(output, label)
        errD_real.backward()
        
        noise = torch.randn(b_size, nz, 1, 1, device=device)
        fake = netG(noise)
        label.fill_(0.0)
        output = netD(fake.detach()).view(-1)
        errD_fake = criterion(output, label)
        errD_fake.backward()
        optimizerD.step()
        
        netG.zero_grad()
        label.fill_(1.0)
        output = netD(fake).view(-1)
        errG = criterion(output, label)
        errG.backward()
        optimizerG.step()
    print(f"GAN Epoch {epoch+1}, Loss_D: {errD_real+errD_fake:.4f}, Loss_G: {errG:.4f}")

# Visualization
netG.eval()
with torch.no_grad():
    fixed_noise = torch.randn(64, nz, 1, 1, device=device)
    fake = netG(fixed_noise).cpu()
    grid = utils.make_grid(fake, nrow=8, normalize=True)
    plt.figure(figsize=(8,8))
    # make_grid returns (C, H, W); matplotlib expects (H, W, C)
    plt.imshow(grid.permute(1, 2, 0).numpy(), cmap='gray')
    plt.title("GAN Generated Samples")
    plt.axis('off')
    plt.show()



## Goal
Learn image generation through adversarial training.

## Concept Summary

A GAN consists of:

- **Generator**: produces fake images from random noise
- **Discriminator**: distinguishes real images from fake ones
- Both networks play a minimax adversarial game
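
For reference, the minimax game is usually written as the objective below, where $D(x)$ is the Discriminator's probability that $x$ is real and $G(z)$ is the Generator's output for noise $z$. Note that the training loop in this notebook updates the Generator with real labels on fake images, which corresponds to the commonly used non-saturating generator loss on the second line rather than to the literal minimax form.

```latex
\min_G \max_D \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim \mathcal{N}(0, I)}\big[\log\big(1 - D(G(z))\big)\big]

\mathcal{L}_G^{\text{non-sat}}
  = -\,\mathbb{E}_{z \sim \mathcal{N}(0, I)}\big[\log D(G(z))\big]
```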

## Steps

1. Build the Generator network
2. Build the Discriminator network
3. Define adversarial loss function
4. Train the Discriminator on real and fake images
5. Train the Generator to fool the Discriminator
6. Monitor training stability and losses
7. Save generated samples at regular intervals
8. Detect and discuss mode collapse if it occurs
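
When building the networks in steps 1 and 2, it helps to confirm that the layer settings actually produce 28 × 28 images. The sketch below applies the standard output-size formulas for `ConvTranspose2d` and `Conv2d` (dilation 1, no output padding) to the kernel, stride, and padding values used in the Phase 3 cell. The helper names are illustrative.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output size of ConvTranspose2d with dilation=1 and output_padding=0."""
    return (size - 1) * stride - 2 * padding + kernel

def conv_out(size, kernel, stride, padding):
    """Output size of Conv2d with dilation=1."""
    return (size + 2 * padding - kernel) // stride + 1

# Generator: (kernel, stride, padding) per layer, starting from 1x1 noise.
gen_sizes, size = [], 1
for k, s, p in [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 3)]:
    size = conv_transpose_out(size, k, s, p)
    gen_sizes.append(size)

# Discriminator: starting from a 28x28 image.
disc_sizes, size = [], 28
for k, s, p in [(4, 2, 1), (4, 2, 1), (7, 1, 0), (1, 1, 0)]:
    size = conv_out(size, k, s, p)
    disc_sizes.append(size)

print(gen_sizes)   # [4, 8, 16, 28]
print(disc_sizes)  # [14, 7, 1, 1]
```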

## Deliverables

- Progressive generation GIF
- Generator and Discriminator loss curves
- Final generated image grid
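
The progressive generation GIF is not produced by the Phase 3 cell. One possible approach, assuming Pillow is available (the notebook does not import it), is to collect one grid image per epoch from the same fixed noise and write them as animation frames. The helper `save_gif` and the random stand-in frames are illustrative only.

```python
import os
import tempfile

import numpy as np
from PIL import Image

def save_gif(frames, path, ms_per_frame=300):
    """Write a list of HxW uint8 arrays as an animated, looping GIF."""
    images = [Image.fromarray(f) for f in frames]
    images[0].save(path, save_all=True, append_images=images[1:],
                   duration=ms_per_frame, loop=0)

# Stand-in frames; in the notebook each frame would be the grid generated
# from the same fixed noise at the end of an epoch, scaled to 0..255.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, size=(64, 64), dtype=np.uint8) for _ in range(5)]

gif_path = os.path.join(tempfile.mkdtemp(), "gan_progress.gif")
save_gif(frames, gif_path)
print(os.path.exists(gif_path))
```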

---



# Phase 4 — Evaluation and Comparison

In [None]:
# Phase 4: Evaluation and Comparison
class Evaluator(nn.Module):
    def __init__(self):
        super(Evaluator, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64*11*11, 10)
        )
    def forward(self, x):
        return self.conv(x)

eval_net = Evaluator().to(device)
eval_opt = optim.Adam(eval_net.parameters(), lr=0.001)
eval_crit = nn.CrossEntropyLoss()

# Train evaluator on real data
for epoch in range(5):
    for data, target in train_loader_vae:
        data, target = data.to(device), target.to(device)
        eval_opt.zero_grad()
        loss = eval_crit(eval_net(data), target)
        loss.backward()
        eval_opt.step()
print("Evaluator trained.")

def get_stats(model, model_type='vae'):
    eval_net.eval()
    with torch.no_grad():
        if model_type == 'vae':
            z = torch.randn(1000, vae_latent_dim).to(device)
            samples = model.decode(z)
        else:
            z = torch.randn(1000, nz, 1, 1, device=device)
            samples = (model(z) + 1) / 2
        
        logits = eval_net(samples)
        probs = torch.softmax(logits, dim=1)
        max_probs, _ = torch.max(probs, dim=1)
        return max_probs.mean().item()

vae_conf = get_stats(vae_model, 'vae')
gan_conf = get_stats(netG, 'gan')

print(f"VAE Avg Confidence: {vae_conf:.4f}")
print(f"GAN Avg Confidence: {gan_conf:.4f}")



## Goal
Quantitatively and visually compare both models.

## Steps

1. Train a simple classifier on real Fashion-MNIST
2. Generate large image sets from each model (VAE and GAN)
3. Compute evaluation metrics:
   - Fréchet Inception Distance (FID)
   - Sample diversity per class
4. Measure training and sampling time
5. Create a comparison table
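
The Phase 4 cell reports average classifier confidence, not FID. As a hedged starting point for step 3, the sketch below computes the Fréchet distance between Gaussians fitted to two feature sets. Standard FID takes those features from an Inception-v3 network; here the features are assumed to come from some other extractor, such as a hidden layer of the project's own classifier, so the result is FID-like, not comparable to published FID scores. The random features are placeholders.

```python
import torch

def frechet_distance(feats_a: torch.Tensor, feats_b: torch.Tensor) -> float:
    """Fréchet distance between Gaussians fitted to two (N, D) feature sets."""
    a, b = feats_a.double(), feats_b.double()
    mu_a, mu_b = a.mean(dim=0), b.mean(dim=0)
    cov_a, cov_b = torch.cov(a.T), torch.cov(b.T)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative in theory.
    eigvals = torch.linalg.eigvals(cov_a @ cov_b).real.clamp(min=0)
    trace_sqrt = eigvals.sqrt().sum()
    dist = (mu_a - mu_b).pow(2).sum() + cov_a.trace() + cov_b.trace() - 2 * trace_sqrt
    return dist.item()

torch.manual_seed(0)
real_feats = torch.randn(500, 8)
d_same = frechet_distance(real_feats, real_feats)
d_shift = frechet_distance(real_feats, real_feats + 3.0)
print(d_same)   # close to 0
print(d_shift)  # close to 8 * 3**2 = 72
```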

## Deliverables

- Evaluation report
- Quantitative comparison table
- Visual comparison gallery
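
For the diversity column of the comparison table, one simple option is to classify generated samples and look at how the predicted classes are distributed. A near-uniform histogram suggests good coverage, while a histogram concentrated on one or two classes is a possible sign of mode collapse. The function `class_diversity` is illustrative; the synthetic logits stand in for `eval_net(samples)`.

```python
import torch

def class_diversity(logits: torch.Tensor, num_classes: int = 10):
    """Return predicted-class counts and the entropy (nats) of their distribution."""
    preds = logits.argmax(dim=1)
    counts = torch.bincount(preds, minlength=num_classes)
    probs = counts.float() / counts.sum()
    nonzero = probs[probs > 0]
    entropy = max(0.0, -(nonzero * nonzero.log()).sum().item())
    return counts, entropy

# Synthetic logits standing in for eval_net(samples).
uniform_logits = torch.eye(10).repeat(10, 1)          # 100 samples, 10 per class
collapsed_logits = torch.eye(10)[:1].repeat(100, 1)   # every sample is class 0

uniform_counts, uniform_entropy = class_diversity(uniform_logits)
collapsed_counts, collapsed_entropy = class_diversity(collapsed_logits)
print(uniform_counts.tolist(), uniform_entropy)      # entropy is ln(10), about 2.3026
print(collapsed_counts.tolist(), collapsed_entropy)  # entropy is 0
```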

---



# Phase 5 — Generative Playground (Final Demo)

In [None]:
# Phase 5: Generative Playground Demo
import ipywidgets as widgets
from IPython.display import display

def preview(model_type, seed):
    torch.manual_seed(seed)
    with torch.no_grad():
        if model_type == 'VAE':
            z = torch.randn(1, vae_latent_dim).to(device)
            img = vae_model.decode(z).view(28, 28).cpu().numpy()
        else:
            z = torch.randn(1, nz, 1, 1, device=device)
            img = netG(z).view(28, 28).cpu().numpy()
            img = (img + 1) / 2
    
    plt.imshow(img, cmap='gray')
    plt.title(f"{model_type} (Seed {seed})")
    plt.axis('off')
    plt.show()

model_drop = widgets.Dropdown(options=['VAE', 'GAN'], description='Model:')
seed_input = widgets.IntSlider(min=0, max=999, description='Seed:')
ui = widgets.HBox([model_drop, seed_input])
out = widgets.interactive_output(preview, {'model_type': model_drop, 'seed': seed_input})
display(ui, out)



## Goal
Build an interactive interface that allows real-time generation and comparison of images from the VAE and GAN models.
This phase transforms the trained models into a live demonstration tool, enabling users to explore generative behavior through simple controls.

---
## Concept Summary

The Generative Playground is a lightweight application where users can:

- Select a generative model (VAE, GAN)
- Adjust latent or noise parameters
- Change random seed values
- Instantly visualize newly generated samples

This creates a practical showcase of how different generative paradigms respond to input changes.

---

## Interface Requirements

The interface should provide:
- Model selection dropdown (VAE / GAN)
- Seed input field
- Latent slider (for VAE exploration)
- Generate button
- Image grid output area
---



## Implementation Options

### Option 1 — Streamlit App
- Simple web-based interface
- Runs locally or on cloud
- Interactive sliders and buttons
- Automatic refresh of generated samples

### Option 2 — Interactive Notebook

- Jupyter widgets for controls
- Inline image display
- Suitable for classroom demonstrations

---

## Functional Workflow

1. Load trained model checkpoints
2. Wait for user selection
3. Generate latent/noise vector using chosen seed
4. Run sampling pipeline
5. Display generated images immediately
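
The workflow can be sketched end to end with a tiny stand-in model. The Phase 5 cell reuses the in-memory `vae_model` and `netG`, so the checkpoint file name, the `make_model` helper, and the `sample` function below are assumptions for illustration only. A local `torch.Generator` is used so that the seed affects only the latent vector, not global random state.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Tiny stand-in generator; the notebook would use `vae_model` or `netG`.
def make_model():
    return nn.Sequential(nn.Linear(20, 784), nn.Sigmoid())

# After training: save only the weights.
ckpt_path = os.path.join(tempfile.mkdtemp(), "stand_in.pt")
torch.save(make_model().state_dict(), ckpt_path)

# Step 1: load the checkpoint into a freshly built model, switch to eval mode.
model = make_model()
model.load_state_dict(torch.load(ckpt_path, map_location="cpu"))
model.eval()

def sample(seed: int, n: int = 4) -> torch.Tensor:
    # Step 3: a local generator makes the latent vectors depend only on the seed.
    gen = torch.Generator().manual_seed(seed)
    z = torch.randn(n, 20, generator=gen)
    # Step 4: run the sampling pipeline without tracking gradients.
    with torch.no_grad():
        return model(z).view(-1, 1, 28, 28)

print(torch.equal(sample(7), sample(7)))  # True: same seed, same images
print(torch.equal(sample(7), sample(8)))  # False: different seed
```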

---

## User Interactions

- Changing seed → produces different random samples
- Changing latent slider → explores VAE latent space
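
The Phase 5 cell implements the seed control but not the latent slider. One hedged way to combine the two is to draw a seeded base vector and let the slider override a single coordinate, so that moving the slider changes only that dimension. The function `slider_latent` is illustrative; in the notebook its output would be passed to `vae_model.decode` after moving it to `device`.

```python
import torch

def slider_latent(seed: int, dim: int, value: float, latent_dim: int = 20) -> torch.Tensor:
    """Seeded base vector with one coordinate overridden by the slider value."""
    gen = torch.Generator().manual_seed(seed)
    z = torch.randn(1, latent_dim, generator=gen)
    z[0, dim] = value
    return z

z_low = slider_latent(seed=0, dim=3, value=-2.0)
z_high = slider_latent(seed=0, dim=3, value=2.0)

# Only the slider-controlled coordinate differs between the two vectors.
changed = (z_low != z_high).nonzero()[:, 1].tolist()
print(changed)  # [3]
```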
---
# Final Presentation

Each team presents:

- Explanation of how each model learns
- Challenges faced during training
- Comparison of outputs
- Key lessons learned

---

# Expected Outcomes

After completing this project, students will be able to:

- Understand modern generative AI systems
- Implement and train generative models
- Diagnose training instability
- Evaluate generative models objectively
- Transition confidently to large-scale diffusion models such as Stable Diffusion

---

# Project Value

This project bridges theory and practice in Generative AI, giving students strong foundations to understand and build real-world generative systems.

---