# GANs for Image Generation
This notebook was developed for a Kaggle-inspired assessment as part of the CU Boulder MSDS Degree Program, based on the "I’m Something of a Painter Myself" competition. The goal is to build an end-to-end pipeline for a Generative Adversarial Network (GAN) that generates photo-realistic images resembling the `photo_jpg` dataset. I implemented a Deep Convolutional GAN (DCGAN) for its relative training stability and effectiveness in image generation tasks. This notebook covers data preparation, exploratory data analysis (EDA), model architecture, training, hyperparameter tuning, results analysis, and submission preparation. Generated images are evaluated with the Fréchet Inception Distance (FID).

## Project Topic: Generative Deep Learning and GANs
**Generative Deep Learning Models**: Generative models aim to learn the underlying distribution of a dataset to generate new samples resembling the training data. Unlike discriminative models that classify inputs, generative models create data. Examples include Variational Autoencoders (VAEs), Autoregressive Models, and Generative Adversarial Networks (GANs). GANs, introduced by Goodfellow et al. (2014), consist of two neural networks: a **Generator** that produces synthetic data from random noise, and a **Discriminator** that distinguishes real data from fake. The two are trained adversarially in a minimax game, where the Generator improves by trying to "fool" the Discriminator, and the Discriminator improves by better identifying fakes. The objective is:
$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log (1 - D(G(z)))]
$$
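
To make the objective concrete, the sketch below evaluates a sample estimate of $V(D, G)$ for a few hand-picked discriminator outputs. The function names and the example probabilities are assumptions made for illustration, not part of the notebook's pipeline. It also shows the non-saturating generator loss, $-\mathbb{E}[\log D(G(z))]$, which is what a BCE loss against "real" labels computes for the Generator.

```python
import math

def gan_value(d_real, d_fake):
    """Sample estimate of V(D, G) from discriminator outputs in (0, 1)."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def non_saturating_g_loss(d_fake):
    """-E[log D(G(z))]: BCE of D's outputs on fakes against the "real" label."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs, chosen only for illustration.
confident = gan_value(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
undecided = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])

print(f"V with a confident D:  {confident:.4f}")
print(f"V with an undecided D: {undecided:.4f}")
print(f"G loss when D rejects fakes: {non_saturating_g_loss([0.1, 0.2]):.4f}")
```

A Discriminator that outputs 0.5 everywhere gives $V = -\log 4 \approx -1.386$, the value reached when the Generator matches the data distribution. A confident Discriminator pushes $V$ higher, which is the direction the inner maximization rewards.
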
**Why DCGAN?** I chose a Deep Convolutional GAN (DCGAN) because it leverages convolutional layers for improved image generation, replacing fully connected layers with strided convolutions (Discriminator) and transposed convolutions (Generator). DCGANs incorporate batch normalization and specific activation functions (ReLU for the Generator, LeakyReLU for the Discriminator) to enhance stability and quality, making them suitable for generating 64x64 RGB images for this task.

## Dataset Overview
The dataset, provided by the Kaggle competition, includes two directories:
- **`photo_jpg/`**: Contains 7,049 RGB photographs (256x256 pixels, 3 channels, JPEG format). Used as the training set for generating photo-realistic images.
- **`monet_jpg/`**: Contains 300 Monet paintings (256x256 pixels, JPEG format). Not used for this task, as the goal is to generate realistic photos, not artistic styles.

**Data Description**:
- **Size**: 7,049 images in `photo_jpg/`.
- **Dimensions**: All images are 256x256 pixels, 3 channels (RGB), stored as JPEG files.
- **Consistency**: Images have consistent dimensions (256x256) according to the competition's dataset description. The EDA code in this notebook checks only the post-transform size (64x64).
- **Purpose**: The `photo_jpg` dataset is used to train the GAN to capture the distribution of real-world photographs.

In [None]:
import os
import torch
import torchvision
from PIL import Image
import numpy as np
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
import matplotlib.pyplot as plt
from scipy import linalg
import torch.nn as nn
from torchvision.models import inception_v3, Inception_V3_Weights
from torch.nn.functional import interpolate

# Define paths
DATA_DIR = "/kaggle/input/gan-getting-started/photo_jpg"

# Define image transformations
transform = transforms.Compose([
    transforms.Resize((64, 64)),  # Resize to 64x64 for DCGAN
    transforms.CenterCrop(64),
    transforms.ToTensor(),
    transforms.Normalize([0.5]*3, [0.5]*3)  # Normalize to [-1, 1]
])

# Custom dataset for flat folders
class FlatImageDataset(Dataset):
    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir
        self.transform = transform
        self.image_files = [
            os.path.join(root_dir, fname)
            for fname in os.listdir(root_dir)
            if fname.lower().endswith(('.jpg', '.jpeg', '.png'))
        ]

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, idx):
        img_path = self.image_files[idx]
        image = Image.open(img_path).convert("RGB")
        if self.transform:
            image = self.transform(image)
        return image

# Load dataset
train_dataset = FlatImageDataset(DATA_DIR, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=2)

## Exploratory Data Analysis (EDA)
The EDA aims to thoroughly understand the `photo_jpg` dataset (7,049 RGB images, 256x256) to ensure suitability for GAN training. Beyond visualizing samples, I analyzed image characteristics, pixel distributions, and dataset diversity.

In [None]:
# Sample images visualization
batch = next(iter(train_loader))
grid = torchvision.utils.make_grid(batch[:32], nrow=8, normalize=True)
plt.figure(figsize=(10, 5))
plt.imshow(grid.permute(1, 2, 0))
plt.title("Sample Training Images (Photo_jpg)")
plt.axis("off")
plt.show()

# Pixel intensity histograms
real_batch = batch[0] if isinstance(batch, (tuple, list)) else batch
pixel_values = real_batch.cpu().numpy().flatten() * 0.5 + 0.5  # Unnormalize to [0, 1]
plt.figure(figsize=(10, 5))
plt.hist(pixel_values, bins=50, color='blue', alpha=0.7)
plt.title("Pixel Intensity Distribution")
plt.xlabel("Pixel Value")
plt.ylabel("Frequency")
plt.show()

# Image size verification (using the first batch of loaded images, which are already resized)
# Since all images were originally 256x256 and then resized to 64x64 by the transform,
# verify the post-transform size directly from a batch.
if len(batch) > 0:
    # batch[0] gives one image (C, H, W). batch[0].shape[1:] gives (H, W)
    print(f"Size of images after transformation: {batch[0].shape[1:]} (H, W)")
else:
    print("No images loaded in the batch for size verification.")

# Image similarity metric (example: mean pixel value per image)
# Calculate mean pixel values from the first batch of loaded images
mean_pixels = []
for img in batch: # Use the first batch of images for analysis
    mean_pixels.append(img.mean().item())
plt.figure(figsize=(10, 5))
plt.hist(mean_pixels, bins=30, color='green', alpha=0.7)
plt.title("Distribution of Mean Pixel Values Across Images (from first batch)")
plt.xlabel("Mean Pixel Value")
plt.ylabel("Frequency")
plt.show()

**EDA Observations**:
- **Sample Images**: The grid shows diverse, high-quality photos (landscapes, objects, people), suitable for GAN training.
- **Pixel Intensity**: The histogram shows a balanced distribution of pixel values after mapping the normalized tensors back to [0, 1], indicating proper preprocessing.
- **Image Sizes**: The batch check confirms that every image is 64x64 after the transform. The source images are 256x256 before resizing.
- **Image Similarity**: Mean pixel values vary, suggesting diversity in brightness and content, ideal for learning a rich data distribution.
- **Potential Issues**: No corruption or significant imbalances detected, but resizing to 64x64 may lose fine details, a trade-off for computational efficiency.
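
The size check above inspects tensors after the transform, so it cannot confirm the original 256x256 dimensions. A minimal sketch of a check on the files themselves follows, assuming Pillow is available (the notebook already imports it). The helper name `count_image_sizes` is hypothetical, and the call is guarded so the snippet does nothing when the Kaggle path is absent.

```python
import os
from collections import Counter
from PIL import Image

def count_image_sizes(root_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Count images by (width, height) as reported by the file header."""
    sizes = Counter()
    for fname in sorted(os.listdir(root_dir)):
        if fname.lower().endswith(extensions):
            # Image.open reads the header lazily, so pixel data is not decoded here.
            with Image.open(os.path.join(root_dir, fname)) as im:
                sizes[im.size] += 1
    return sizes

DATA_DIR = "/kaggle/input/gan-getting-started/photo_jpg"
if os.path.isdir(DATA_DIR):
    for size, count in count_image_sizes(DATA_DIR).most_common():
        print(f"{size}: {count} images")
```

If the dataset is consistent, the output is a single line reporting one size for all files.
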

## Model Architecture
I implemented a **DCGAN** with a Generator and Discriminator, chosen for its convolutional architecture, stability, and suitability for 64x64 RGB images.

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.main = nn.Sequential(
            nn.ConvTranspose2d(100, 512, 4, 1, 0, bias=False),
            nn.BatchNorm2d(512),
            nn.ReLU(True),
            nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False),
            nn.Tanh()
        )

    def forward(self, input):
        return self.main(input)

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.main = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 4, 2, 1, bias=False),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 512, 4, 2, 1, bias=False),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input).view(-1, 1).squeeze(1)

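# Alternative Discriminator (WGAN-GP critic): no Sigmoid, outputs an unbounded score.
# Note: the WGAN-GP paper advises against BatchNorm in the critic, because the
# gradient penalty is defined per sample; the BatchNorm layers below are kept as written.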
class DiscriminatorWGAN(nn.Module):
    def __init__(self):
        super(DiscriminatorWGAN, self).__init__()
        self.main = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 4, 2, 1, bias=False),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 512, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 1, 4, 1, 0, bias=False)
        )

    def forward(self, input):
        return self.main(input).view(-1, 1).squeeze(1)

# Instantiate models
netG = Generator().to(device)
netD = Discriminator().to(device)
netD_wgan = DiscriminatorWGAN().to(device) # Not used in this training loop, but kept for context

# Loss functions
criterion = nn.BCELoss()
def wgan_loss(real_output, fake_output):
    # Critic loss for a Wasserstein GAN; the gradient-penalty term is not included here
    return -torch.mean(real_output) + torch.mean(fake_output)

# Optimizers
optimizerD = torch.optim.Adam(netD.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizerG = torch.optim.Adam(netG.parameters(), lr=0.0002, betas=(0.5, 0.999))
# WGAN optimizers are not used in this specific training loop, but kept for context
optimizerD_wgan = torch.optim.Adam(netD_wgan.parameters(), lr=0.0001, betas=(0.0, 0.9))
optimizerG_wgan = torch.optim.Adam(netG.parameters(), lr=0.0001, betas=(0.0, 0.9))

**Architecture Choice**:
- **Generator**: Uses transposed convolutions to upsample a 100-dimensional noise vector to a 3x64x64 image. ReLU and BatchNorm stabilize training, and Tanh keeps the output in [-1, 1], matching the input normalization.
- **Discriminator**: Uses strided convolutions to downsample an image to a single probability. LeakyReLU (slope 0.2) mitigates vanishing gradients, and Sigmoid produces the [0, 1] output that BCELoss expects.
- **Why DCGAN?**: Convolutional layers capture spatial features effectively, and DCGAN’s guidelines (e.g., BatchNorm, no pooling) improve stability.
- **Alternative: WGAN-GP**: I tested a Wasserstein GAN with gradient penalty (WGAN-GP), which replaces the Discriminator with a critic (no Sigmoid) whose loss estimates the Wasserstein distance between real and generated distributions. This often stabilizes training by avoiding vanishing gradients. The code above defines the critic and its loss, but the gradient-penalty term and the WGAN-GP training loop are not shown in this notebook.
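
The layer settings in the Generator and Discriminator can be checked with the standard output-size formulas for (transposed) convolutions. The sketch below is plain arithmetic with hypothetical helper names; it assumes PyTorch's defaults of dilation 1 and no output padding, and lists each layer as `(kernel, stride, padding)` in the order used above.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output size of ConvTranspose2d (dilation=1, output_padding=0)."""
    return (size - 1) * stride - 2 * padding + kernel

def conv_out(size, kernel, stride, padding):
    """Output size of Conv2d (dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) per layer, copied from the model definitions.
generator_layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]
discriminator_layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]

gen_sizes = [1]  # the noise vector enters as a 1x1 spatial map
for k, s, p in generator_layers:
    gen_sizes.append(conv_transpose_out(gen_sizes[-1], k, s, p))

disc_sizes = [64]  # the Discriminator receives 64x64 images
for k, s, p in discriminator_layers:
    disc_sizes.append(conv_out(disc_sizes[-1], k, s, p))

print("Generator spatial sizes:    ", gen_sizes)
print("Discriminator spatial sizes:", disc_sizes)
```

The Generator grows 1 → 4 → 8 → 16 → 32 → 64, and the Discriminator mirrors it back down to a single value per image.
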

**Hyperparameter Tuning**:
- **Learning Rate**: Tested 0.0002 (DCGAN) and 0.0001 (WGAN-GP). DCGAN’s rate balanced speed and stability; WGAN-GP’s lower rate suited its gradient penalty.
- **Betas**: DCGAN used (0.5, 0.999) per the paper; WGAN-GP used (0.0, 0.9) for better stability.
- **Batch Size**: 64 balanced memory and stability.
- **Epochs**: Tested 5, 10, 20 epochs. 5 shown here for demo; 50-100 recommended.

In [None]:
# Training Loop
num_epochs = 5
fixed_noise = torch.randn(64, 100, 1, 1, device=device)
results = {'epoch': [], 'loss_d': [], 'loss_g': [], 'fid': []}

class InceptionV3FeatureExtractor(nn.Module):
    def __init__(self):
        super(InceptionV3FeatureExtractor, self).__init__()
        self.inception = inception_v3(weights=Inception_V3_Weights.IMAGENET1K_V1, transform_input=True, aux_logits=True)
        # Remove the final classification layer and auxiliary classifier
        self.inception.fc = nn.Identity()
        if self.inception.AuxLogits is not None:
            self.inception.AuxLogits = nn.Identity()
        self.eval() 

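    def forward(self, x):
        # Map inputs from [-1, 1] back to [0, 1], then upsample to Inception's 299x299 input size.
        # Caveat: transform_input=True assumes ImageNet-normalized inputs, so these features
        # (and the resulting FID) are not directly comparable to standard FID implementations.
        x = (x * 0.5 + 0.5).clamp(0, 1)
        x = interpolate(x, size=(299, 299), mode='bilinear', align_corners=False)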
        
        with torch.no_grad():
            output = self.inception(x)
        
        # If the output is a tuple (main_output, aux_output), extract the first
        if isinstance(output, tuple):
            output = output[0]
    
        # If the output is an InceptionOutputs namedtuple, extract logits
        if isinstance(output, torchvision.models.inception.InceptionOutputs):
            output = output.logits
    
        return output


inception_model = InceptionV3FeatureExtractor().to(device)
inception_model.eval() # Ensure it's in evaluation mode

def calculate_fid(real_images, fake_images, inception_model):
    real_features = inception_model(real_images).detach().cpu().numpy()
    fake_features = inception_model(fake_images).detach().cpu().numpy()

    mu1, sigma1 = np.mean(real_features, axis=0), np.cov(real_features, rowvar=False)
    mu2, sigma2 = np.mean(fake_features, axis=0), np.cov(fake_features, rowvar=False)
    
    if sigma1.ndim == 0: # Handle scalar case
        sigma1 = np.array([[sigma1]])
    elif sigma1.ndim == 1: # Handle 1D array (vector of variances)
        sigma1 = np.diag(sigma1) # Make it a diagonal matrix
    
    if sigma2.ndim == 0: 
        sigma2 = np.array([[sigma2]])
    elif sigma2.ndim == 1: 
        sigma2 = np.diag(sigma2) 

    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1.dot(sigma2), disp=False)
    
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    fid = diff.dot(diff) + np.trace(sigma1 + sigma2 - 2 * covmean)
    return fid

for epoch in range(num_epochs):
    for i, data in enumerate(train_loader, 0):
        # Train Discriminator
        netD.zero_grad()
        real_images = data.to(device)
        batch_size = real_images.size(0)
        labels_real = torch.full((batch_size,), 1.0, dtype=torch.float, device=device)
        output_real = netD(real_images)
        lossD_real = criterion(output_real, labels_real)

        noise = torch.randn(batch_size, 100, 1, 1, device=device)
        fake_images = netG(noise)
        labels_fake = torch.full((batch_size,), 0.0, dtype=torch.float, device=device)
        output_fake = netD(fake_images.detach())
        lossD_fake = criterion(output_fake, labels_fake)

        lossD = lossD_real + lossD_fake
        lossD.backward()
        optimizerD.step()

        # Train Generator
        netG.zero_grad()
        output_gen = netD(fake_images)
        lossG = criterion(output_gen, labels_real)  # labels_real for fooling D
        lossG.backward()
        optimizerG.step()

        if i % 100 == 0:
            print(f"Epoch [{epoch+1}/{num_epochs}] Step [{i}/{len(train_loader)}] "
                  f"Loss_D: {lossD.item():.4f} Loss_G: {lossG.item():.4f}")

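    # Evaluation with FID at end of epoch.
    # Note: 64 real vs. 64 generated images is a very small sample for FID,
    # so treat the value as a rough signal rather than a precise score.
    netG.eval()
    with torch.no_grad():
        fake_eval = netG(fixed_noise)
        real_eval = next(iter(train_loader))[:64].to(device)
        fid = calculate_fid(real_eval, fake_eval, inception_model)
    netG.train()

    results['epoch'].append(epoch + 1)  # 1-based, matching the printed epoch
    results['loss_d'].append(lossD.item())  # loss on the last batch of the epoch
    results['loss_g'].append(lossG.item())  # loss on the last batch of the epoch
    results['fid'].append(fid)
    print(f"Epoch {epoch+1} FID: {fid:.2f}")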

## Results and Analysis

**Results Table**:
| Epoch | Loss_D | Loss_G | FID   |
|-------|--------|--------|-------|
| 1     | 0.6561 | 7.3703 | 272.23|
| 2     | 0.8170 | 2.4222 | 298.27|
| 3     | 0.3526 | 3.4852 | 251.39|
| 4     | 1.0608 | 5.7256 | 270.92|
| 5     | 0.5391 | 3.1120 | 196.40|

**Analysis**:
- **Performance**: FID (lower is better) fluctuated between epochs, rising from 272.23 to 298.27 at epoch 2 before reaching its best value, 196.40, at epoch 5. Each FID is computed from only 64 real and 64 generated images, so the values are noisy. The Generator showed some ability to improve image quality, but 5 epochs were not enough for consistent convergence.
- **Training Dynamics**: Loss_G peaked at epoch 1 (7.37), which indicates that the Discriminator was confidently rejecting generated samples early on, and then stayed in the 2.4-5.7 range. Discriminator loss remained below 1.1 throughout. Both losses are recorded from the last batch of each epoch, not averaged over the epoch.
- **Stability**: Sudden spikes and drops in both losses and FID indicate that training is not yet stable, possibly due to insufficient training time or batch variability.
- **Hyperparameters**: Learning rate of 0.0002 and betas (0.5, 0.999) follow DCGAN guidelines. Training stability may benefit from increased epochs or alternative loss formulations (e.g. Wasserstein).
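
FID estimated from small samples is noisy and biased upward. With 64 images and 2048-dimensional Inception features, each covariance matrix has rank at most 63. The sketch below illustrates the effect in one dimension, where the Fréchet distance between two Gaussians reduces to $(\mu_1 - \mu_2)^2 + (\sigma_1 - \sigma_2)^2$. It is a toy analogue with hypothetical function names, not the notebook's `calculate_fid`: both samples come from the same distribution, so the true distance is zero.

```python
import random
import statistics

def frechet_distance_1d(xs, ys):
    """Fréchet distance between 1-D Gaussians fitted to two samples."""
    mu1, mu2 = statistics.fmean(xs), statistics.fmean(ys)
    s1, s2 = statistics.stdev(xs), statistics.stdev(ys)
    return (mu1 - mu2) ** 2 + (s1 - s2) ** 2

def mean_self_distance(n, trials=200, seed=0):
    """Average estimated distance between two size-n samples of N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += frechet_distance_1d(xs, ys)
    return total / trials

small = mean_self_distance(8)
large = mean_self_distance(512)
print(f"n=8:   mean estimated distance = {small:.4f}")
print(f"n=512: mean estimated distance = {large:.4f}")
```

The estimate for the small sample is well above zero even though the two distributions are identical, which is why epoch-to-epoch differences in a 64-image FID should be read cautiously.
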

**Comparison**:
- **DCGAN (current)**: Quick to train and initially responsive, but unstable across epochs. The high FID values are consistent with low image quality or low diversity (including possible mode collapse), although FID alone cannot distinguish between the two.
- **Next Steps**: Consider switching to WGAN-GP for improved stability and smoother gradients. DCGAN is effective for initial experimentation, but longer training or improved architectures may be needed for better image quality.
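
For reference, the gradient-penalty term that distinguishes WGAN-GP from a plain Wasserstein critic loss can be sketched as follows. This is a minimal illustration, not the code used for the results above: the function name `gradient_penalty`, the parameter `lambda_gp`, and the tiny linear critic are assumptions made so the snippet runs on CPU in isolation. The weight of 10 follows the WGAN-GP paper's default.

```python
import torch
import torch.nn as nn

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """Penalize deviations of the critic's input-gradient norm from 1."""
    batch_size = real.size(0)
    # One random mixing coefficient per sample, broadcast over C, H, W.
    eps = torch.rand(batch_size, 1, 1, 1, device=real.device)
    mixed = (eps * real.detach() + (1.0 - eps) * fake.detach()).requires_grad_(True)
    scores = critic(mixed)
    grads = torch.autograd.grad(
        outputs=scores,
        inputs=mixed,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,  # keep the graph so the penalty can be backpropagated
    )[0]
    grad_norm = grads.reshape(batch_size, -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()

# Tiny stand-in critic (hypothetical), used only to exercise the function.
torch.manual_seed(0)
toy_critic = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 1))
real = torch.randn(4, 3, 8, 8)
fake = torch.randn(4, 3, 8, 8)

gp = gradient_penalty(toy_critic, real, fake)
# Wasserstein critic loss plus the penalty term.
critic_loss = toy_critic(fake).mean() - toy_critic(real).mean() + gp
critic_loss.backward()
print(f"gradient penalty: {gp.item():.4f}")
```

In a full training loop the critic is usually updated several times per Generator step, and the Generator loss is the negated mean critic score on generated images.
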

In [None]:
# Plot results
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(results['epoch'], results['loss_d'], label='Discriminator Loss')
plt.plot(results['epoch'], results['loss_g'], label='Generator Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Losses')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(results['epoch'], results['fid'], label='FID Score')
plt.xlabel('Epoch')
plt.ylabel('FID')
plt.title('FID Score Over Time')
plt.legend()
plt.show()

## Generate and Save Final Submission Images
This cell generates 7,000 images (64x64, PNG format) and packages them into `images.zip` for FID evaluation.

In [None]:
import shutil
import zipfile
from torchvision.utils import save_image

output_dir = "generated_images"
if os.path.exists(output_dir):
    shutil.rmtree(output_dir)
os.makedirs(output_dir, exist_ok=True)

num_images = 7000
netG.eval()
with torch.no_grad():
    for i in range(num_images):
        noise = torch.randn(1, 100, 1, 1, device=device)
        fake_img = netG(noise).cpu()
        # Map the Tanh output from [-1, 1] to [0, 1] before saving
        img = (fake_img.squeeze(0) * 0.5 + 0.5).clamp(0, 1)
        save_image(img, os.path.join(output_dir, f"{i:05}.png"))

print(f"Successfully generated {num_images} images in '{output_dir}/'.")

zip_file_name = "images.zip"
with zipfile.ZipFile(zip_file_name, 'w') as zf:
    for root, _, files in os.walk(output_dir):
        for file in files:
            file_path = os.path.join(root, file)
            zf.write(file_path, os.path.basename(file_path))

print(f"Submission file '{zip_file_name}' created successfully.")

## Conclusion
This notebook implemented a DCGAN to generate photo-realistic 64x64 RGB images from the `photo_jpg` dataset (7,049 images, 256x256, resized). The pipeline included data preprocessing, EDA, model training, and image generation.

**Results**:
- Generated images showed early photo-like features after 5 epochs (final FID 196.40, down from 272.23 at epoch 1), indicating learning but needing more training.
- Losses and FID fluctuated from epoch to epoch instead of stabilizing, so the run shows early progress, not convergence.

**Learnings**:
- DCGAN’s architecture is effective but sensitive to hyperparameters.
- WGAN-GP is generally more stable but tends to need more epochs; its results are not reported in this notebook.
- Proper normalization and resizing are critical for GAN performance.

**Challenges**:
- Short training (5 epochs) limited image quality.
- Resizing to 64x64 may have reduced fine details.

**Improvements**:
- **Increase Epochs**: Train for 50-100 epochs with periodic FID validation.
- **Advanced Architectures**: Explore StyleGAN or Progressive GAN for higher quality.
- **Data Augmentation**: Apply rotations/flips to enhance diversity.
- **Regularization**: Use spectral normalization or gradient penalties to prevent mode collapse.
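
Periodic FID validation is most useful when paired with a rule for keeping the best checkpoint. The sketch below tracks the lowest FID seen so far; the class name `BestFidTracker` is hypothetical, and the FID values are the ones from the results table above. In a real loop, a `True` return would be the point to save the Generator's weights.

```python
from dataclasses import dataclass

@dataclass
class BestFidTracker:
    best_fid: float = float("inf")
    best_epoch: int = -1

    def update(self, epoch, fid):
        """Return True when `fid` improves on the best value seen so far."""
        if fid < self.best_fid:
            self.best_fid, self.best_epoch = fid, epoch
            return True
        return False

# FID per epoch, taken from the results table in this notebook.
fid_by_epoch = {1: 272.23, 2: 298.27, 3: 251.39, 4: 270.92, 5: 196.40}

tracker = BestFidTracker()
improved_epochs = [e for e, f in fid_by_epoch.items() if tracker.update(e, f)]

print(f"Epochs that improved FID: {improved_epochs}")
print(f"Best epoch: {tracker.best_epoch} (FID {tracker.best_fid:.2f})")
```
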