### Regis University

**MSDS688_X70: Artificial Intelligence**  
Master of Science in Data Science Program

#### Week 6: Generative Adversarial Networks (GANs) for Image Generation  
*GPU Required*

## Lecture: Week 6 - Introduction to Deep Convolutional GANs (DCGAN)

### Overview

This week, we are diving into **Deep Convolutional Generative Adversarial Networks (DCGANs)**, a specific type of **Generative Adversarial Network (GAN)** that uses convolutional layers for image generation. This lecture first introduces the fundamental concepts of GANs. It then explores the DCGAN architecture in detail: the generator and discriminator networks, how each one functions, and how they work together.

---

### 1. **What is a GAN?**

A **Generative Adversarial Network (GAN)** consists of two neural networks — a **generator** and a **discriminator** — that compete against each other in a game-theoretic scenario. The goal of the generator is to create data (e.g., images) that can fool the discriminator into thinking the data is real, while the discriminator aims to correctly classify whether the data it receives is real (from the dataset) or fake (from the generator).

#### Key Elements:
- **Generator (G)**: Takes a random noise vector and transforms it into a data point, like an image.
- **Discriminator (D)**: A classifier that tries to distinguish between real data and generated data.
- **Latent Space**: The generator's input is a random vector (the **latent vector**) sampled from a simple probability distribution, usually a standard Gaussian. The space these vectors live in is called the **latent space**.

In essence, GANs work by having the two networks train each other in an adversarial process, pushing the generator to create better and more realistic images over time.
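To make the data flow concrete, here is a minimal sketch in PyTorch (the library used later in this notebook). It uses tiny fully connected networks rather than a DCGAN, and the layer sizes and `latent_dim = 16` are arbitrary choices for illustration. Noise is sampled from a standard Gaussian, the generator maps it to a flattened 28x28 image, and the discriminator maps that image to a probability of being real.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

latent_dim = 16      # illustrative only; DCGANs commonly use 100
data_dim = 28 * 28   # a flattened 28x28 grayscale image

# Generator: latent vector -> flattened image with values in [-1, 1]
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim), nn.Tanh())

# Discriminator: flattened image -> probability that the image is real
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())

z = torch.randn(8, latent_dim)  # 8 points sampled from N(0, I) in latent space
fake = G(z)                     # shape (8, 784)
p_real = D(fake)                # shape (8, 1), each value in (0, 1)
print(fake.shape, p_real.shape)
```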

---

### 2. **Why DCGAN?**

The original GAN formulation can generate images with fully connected layers, but these layers ignore the spatial layout of pixels and struggle to capture the intricate structure of image data. **DCGANs** address this by using **convolutional layers**, which are better suited to images because they learn spatial hierarchies: edges, then textures, then higher-level patterns.

DCGAN introduces several architectural enhancements specifically designed for generating images:
- **Convolutional Layers**: Both the generator and discriminator are based on convolutional layers, allowing them to better capture and process image features.
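- **Transposed Convolutions**: The generator uses transposed convolutions (also called fractionally strided convolutions, or loosely "deconvolutions", although they do not mathematically invert a convolution) to upsample latent vectors into images.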
- **Strided Convolutions**: The discriminator uses strided convolutions to downsample images and extract relevant features for classification.
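A quick way to see how these layers change spatial size is to pass a dummy tensor through them. This sketch assumes a 32x32 RGB input and uses the 4x4 kernel, stride 2, padding 1 configuration that the MNIST model later in this notebook also uses. A max-pooling layer is included for comparison.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # one 32x32 RGB image

# Strided convolution (discriminator-style): stride 2 halves height and width
down = nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1)
# Transposed convolution (generator-style): stride 2 doubles height and width
up = nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1)

h = down(x)
y = up(h)
print(h.shape)  # torch.Size([1, 64, 16, 16])
print(y.shape)  # torch.Size([1, 3, 32, 32])

# Max-pooling also halves the size, but it has no learnable parameters
pool = nn.MaxPool2d(2)
pool_params = sum(p.numel() for p in pool.parameters())
print(pool(x).shape, pool_params)  # torch.Size([1, 3, 16, 16]) 0
```

The transposed convolution restores the *shape* of the input, not its values. Here `y` is not a reconstruction of `x`, because both layers are randomly initialized and have learned nothing.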

---

### 3. **DCGAN Architecture Breakdown**

The architecture of a DCGAN is composed of two networks:
1. **Generator Network**: Responsible for creating images.
2. **Discriminator Network**: Responsible for judging whether an image is real (from the dataset) or generated.

#### 3.1 Generator Architecture

The **generator** transforms a randomly sampled vector (latent vector) into an image. The core idea behind the generator is to progressively **upsample** the input latent vector through **transposed convolutional layers** until it becomes a full-sized image.

##### Key Components of the Generator:
1. **Latent Vector (Input)**: The generator starts with a small latent vector, typically sampled from a standard normal distribution (e.g., a 100-dimensional vector sampled from \( N(0, 1) \)).
   - **Why random noise?**: The latent vector is the seed for generating the image. Starting from a random vector allows the generator to explore many possible outputs.

2. **Transposed Convolutions**: These layers reverse the *shape* change of an ordinary convolution, not its values. An ordinary strided convolution reduces the spatial size of a feature map. A **transposed convolution** instead **increases** it, upsampling the feature map step by step toward a full image. These layers learn how to build higher-resolution features from the noise vector.
   - **Kernel (filter) size**: The size of the neighborhood over which each input value is spread in the output. Together with stride and padding, it determines the output size.
   - **Stride**: The upsampling factor. A stride of 2 (for example, with a 4x4 kernel and padding 1) doubles the spatial resolution.
   
3. **Batch Normalization**: Every transposed convolutional layer except the output layer is followed by a **batch normalization** layer. Batch normalization normalizes each channel's activations over the mini-batch. This stabilizes training and typically allows higher learning rates. It was originally motivated as a fix for "internal covariate shift", meaning changes in the distribution of a layer's inputs as earlier layers update.
   - **Why Batch Norm?**: Keeping each layer's inputs in a stable range helps keep adversarial training stable, which is especially important for GANs.

4. **Activation Functions**:
   - **ReLU Activation**: All generator layers except the output use **ReLU** (Rectified Linear Unit). ReLU adds non-linearity, and because its gradient is 1 for positive inputs, it mitigates the vanishing gradients seen with saturating activations such as sigmoid.
   - **tanh Activation**: The final layer of the generator uses **tanh**, which bounds output pixel values to [-1, 1]. The real training images must therefore be normalized to the same range, for example with `transforms.Normalize([0.5], [0.5])` as the MNIST code in this notebook does.
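The output size of a transposed convolution follows a simple formula. For `nn.ConvTranspose2d` with dilation 1, the PyTorch documentation gives \( H_{out} = (H_{in} - 1) \cdot s - 2p + k + p_{out} \), where \( s \) is the stride, \( p \) the padding, \( k \) the kernel size, and \( p_{out} \) the output padding. The sketch below applies the formula to the layer settings of the MNIST generator defined later in this notebook and checks each result against PyTorch. The helper name `convtranspose_out` is illustrative.

```python
import torch
import torch.nn as nn

def convtranspose_out(h_in, kernel, stride, padding, output_padding=0):
    """Output height/width of nn.ConvTranspose2d with dilation=1."""
    return (h_in - 1) * stride - 2 * padding + kernel + output_padding

# (in_channels, out_channels, kernel, stride, padding) of the MNIST generator layers
layers = [(100, 128, 7, 1, 0), (128, 64, 4, 2, 1), (64, 1, 4, 2, 1)]

size = 1                      # the latent vector is shaped (N, 100, 1, 1)
x = torch.randn(2, 100, 1, 1)
for in_ch, out_ch, k, s, p in layers:
    size = convtranspose_out(size, k, s, p)
    x = nn.ConvTranspose2d(in_ch, out_ch, k, s, p, bias=False)(x)
    assert x.shape[-1] == size  # the formula agrees with PyTorch
    print(f"kernel={k}, stride={s}, padding={p} -> {size}x{size}")
```

The first layer (7x7 kernel, stride 1) turns the 1x1 input into a 7x7 map. Each stride-2 layer then doubles it: 7, 14, 28.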

##### Generator Architecture Summary:
- **Input**: Latent vector (e.g., 100-dimensional), reshaped to 100x1x1.
- **Transposed Conv Layer 1**: Upsamples the latent vector into a small feature map, followed by Batch Norm and ReLU.
- **Transposed Conv Layer 2** (and any further hidden layers): Further upsampling, each followed by Batch Norm and ReLU.
- **Final Transposed Conv Layer**: Produces the image with **tanh** activation and no Batch Norm.

By the end of this process, the generator has transformed a random noise vector into a high-dimensional image (e.g., 64x64x3 for a color image).
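The code below sketches a full 64x64x3 generator. It follows the layer pattern described above: a transposed convolution, then batch norm, then ReLU, with tanh at the output. The channel widths (`ngf = 64`, halving at each layer) and the class name are assumptions borrowed from common DCGAN implementations such as the PyTorch DCGAN tutorial. They are not requirements.

```python
import torch
import torch.nn as nn

class DCGANGenerator64(nn.Module):
    """Sketch of a DCGAN-style generator: (N, nz, 1, 1) noise -> (N, nc, 64, 64) image."""

    def __init__(self, nz=100, ngf=64, nc=3):
        super().__init__()
        self.main = nn.Sequential(
            # 1x1 -> 4x4
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # 4x4 -> 8x8
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # 8x8 -> 16x16
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # 16x16 -> 32x32
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # 32x32 -> 64x64; no batch norm on the output layer, tanh maps to [-1, 1]
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.main(z)

g = DCGANGenerator64()
out = g(torch.randn(2, 100, 1, 1))
print(out.shape)  # torch.Size([2, 3, 64, 64])
```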

#### 3.2 Discriminator Architecture

The **discriminator** is a **binary classifier** that takes an image as input (either a real image from the dataset or a fake image from the generator) and outputs a single value indicating whether the input is real or fake.

##### Key Components of the Discriminator:
1. **Input**: An image, either from the dataset or generated by the generator.
   - The input is a 2D image with three color channels (RGB) or a grayscale image.
   
2. **Convolutional Layers**: The discriminator uses standard **convolutional layers** to progressively **downsample** the input image. As the input passes through these layers, the spatial resolution is reduced while the depth (number of feature maps) increases, allowing the network to learn increasingly complex features of the image, like textures and shapes.
   - **Strided Convolutions**: The discriminator downsamples with **strided convolutions** instead of pooling. Max-pooling has no learnable parameters, whereas a strided convolution learns how to downsample.
   
3. **Leaky ReLU Activation**: Instead of the standard ReLU used in the generator, the discriminator uses **Leaky ReLU** (with a negative slope of 0.2 in DCGAN). ReLU has zero gradient for negative inputs. Leaky ReLU instead passes a small non-zero gradient, which prevents "dead" neurons that never activate.
   - **Why Leaky ReLU?**: The generator learns only through the gradients that flow back through the discriminator. Keeping those gradients non-zero even where activations are negative gives the generator a more useful training signal, especially early in training.

4. **Batch Normalization**: As in the generator, **batch normalization** follows each convolutional layer except the first (input) layer, stabilizing training and gradient flow.
   
5. **Final Layer**: The final convolutional layer is followed by a **sigmoid activation function**, which outputs a single scalar value between 0 and 1. This value represents the discriminator’s confidence that the input image is real (1) or fake (0).
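The difference between ReLU and Leaky ReLU is easiest to see in their gradients. This small sketch evaluates both on a few inputs and backpropagates. The negative slope of 0.2 matches the value used in the MNIST discriminator later in this notebook.

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 1.0], requires_grad=True)

# Gradient of sum(ReLU(x)) with respect to x
nn.ReLU()(x).sum().backward()
relu_grad = x.grad.clone()
x.grad = None  # reset before the second backward pass

# Gradient of sum(LeakyReLU(x)) with respect to x
nn.LeakyReLU(0.2)(x).sum().backward()
leaky_grad = x.grad.clone()

print(relu_grad)   # tensor([0., 0., 1.])  negative inputs get no gradient
print(leaky_grad)  # tensor([0.2000, 0.2000, 1.0000])  negative inputs still get 0.2
```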

##### Discriminator Architecture Summary:
- **Input**: Image (e.g., 64x64x3).
- **Conv Layer 1**: Downsampling with a strided convolution and Leaky ReLU (no Batch Norm on this first layer).
- **Conv Layer 2** (and any further hidden layers): Further downsampling, each followed by Batch Norm and Leaky ReLU.
- **Final Conv Layer**: Outputs a single scalar with sigmoid activation to classify the image as real or fake.
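A matching 64x64x3 discriminator can be sketched the same way. Strided convolutions halve the resolution at each step, and Leaky ReLU follows every hidden convolution. Batch norm is skipped on the first layer, and a final 4x4 convolution plus sigmoid produces one score per image. The widths (`ndf = 64`, doubling at each layer) and the class name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DCGANDiscriminator64(nn.Module):
    """Sketch of a DCGAN-style discriminator: (N, nc, 64, 64) image -> (N, 1) probability."""

    def __init__(self, nc=3, ndf=64):
        super().__init__()
        self.main = nn.Sequential(
            # 64x64 -> 32x32; no batch norm on the input layer
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # 32x32 -> 16x16
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # 16x16 -> 8x8
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # 8x8 -> 4x4
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # 4x4 -> 1x1 score, squashed to (0, 1)
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.main(x).view(-1, 1)

d = DCGANDiscriminator64()
scores = d(torch.randn(2, 3, 64, 64))
print(scores.shape)  # torch.Size([2, 1])
```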

---

### 4. **DCGAN Training Process**

The training process for a DCGAN involves the generator and discriminator competing in a **minimax game** over a single value function that measures how well the discriminator separates real images from generated ones:
- The discriminator tries to **maximize** this value by correctly classifying real vs. fake images. Equivalently, it minimizes its own classification loss.
- The generator tries to **minimize** this value by producing images that the discriminator labels as real.
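Written out, the standard GAN value function is shown below. Here \( p_{\text{data}} \) is the distribution of real images and \( p_z \) is the latent noise distribution (e.g., \( N(0, 1) \)):

\[
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
\]

The discriminator maximizes \( V \) by pushing \( D(x) \) toward 1 and \( D(G(z)) \) toward 0. The generator can only affect the second term, which it lowers by pushing \( D(G(z)) \) toward 1.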

#### Key Steps in Training:
1. **Training the Discriminator**: In each iteration, the discriminator is trained to maximize the probability of assigning the correct label (real/fake) to real and generated images. The discriminator gets two types of input:
   - Real images from the dataset.
   - Fake images from the generator.

   The discriminator is updated to become better at distinguishing real from fake images.

2. **Training the Generator**: After the discriminator is updated, the generator is trained. The generator aims to minimize the probability that the discriminator will correctly classify its output as fake. In other words, the generator tries to make the discriminator "believe" its fake images are real.

3. **Adversarial Process**: The two networks are trained in alternation, one update each per batch, rather than literally simultaneously. The generator keeps improving to fool the discriminator, while the discriminator keeps improving to detect the generator's fakes. Ideally, this dynamic produces increasingly realistic images as training progresses.
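With `nn.BCELoss`, each training step reduces to log terms of the GAN objective. Labeling reals 1 and fakes 0 gives the discriminator loss \( -\log D(x) - \log(1 - D(G(z))) \). Labeling fakes 1 gives the generator loss \( -\log D(G(z)) \), which is the common non-saturating form of the generator loss. The probabilities in this sketch are made-up numbers used only to check that arithmetic.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()  # mean of -[y*log(p) + (1 - y)*log(1 - p)]

d_real = torch.tensor([[0.9], [0.8]])  # hypothetical D(x) on two real images
d_fake = torch.tensor([[0.3], [0.1]])  # hypothetical D(G(z)) on two fake images

# Discriminator step: reals labeled 1, fakes labeled 0
loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
expected_d = -(torch.log(d_real).mean() + torch.log(1 - d_fake).mean())

# Generator step: fakes labeled 1, i.e. minimize -log D(G(z))
loss_g = bce(d_fake, torch.ones_like(d_fake))
expected_g = -torch.log(d_fake).mean()

print(torch.isclose(loss_d, expected_d).item(), torch.isclose(loss_g, expected_g).item())
```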

---

### 5. **Challenges in Training DCGANs**

Training DCGANs can be challenging due to the instability inherent in adversarial training. Common issues include:
- **Mode Collapse**: The generator starts producing very similar images, limiting diversity. This happens when the generator converges to a narrow solution where it produces only a subset of possible outputs.
- **Vanishing Gradients**: When the discriminator becomes too confident and the generator produces poor results, the gradients passed back to the generator can become very small, making it difficult for the generator to learn.
- **Balancing Training**: It’s crucial to maintain a balance between the discriminator and the generator. If one network becomes significantly better than the other, training can stagnate.
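The vanishing-gradient problem can be seen in a single number. Suppose the discriminator confidently rejects a fake, so its logit is \( a = -6 \) and \( D(G(z)) = \sigma(a) \approx 0.0025 \). The sketch below compares two gradients with respect to \( a \). The first comes from the original minimax generator loss \( \log(1 - D(G(z))) \). The second comes from the non-saturating loss \( -\log D(G(z)) \), which is the form the notebook's training loop uses via `criterion(outputs, real_labels)`. The value \( a = -6 \) is an arbitrary illustrative choice.

```python
import torch

# Discriminator logit for a fake it confidently rejects: D(G(z)) = sigmoid(-6), about 0.0025
a = torch.tensor(-6.0, requires_grad=True)

# Saturating (minimax) generator loss: minimize log(1 - D(G(z)))
saturating = torch.log(1 - torch.sigmoid(a))
grad_sat, = torch.autograd.grad(saturating, a)

# Non-saturating generator loss: minimize -log D(G(z))
non_saturating = -torch.log(torch.sigmoid(a))
grad_ns, = torch.autograd.grad(non_saturating, a)

print(f"saturating grad:     {grad_sat.item():.4f}")  # -0.0025
print(f"non-saturating grad: {grad_ns.item():.4f}")   # -0.9975
```

Both gradients point the same way, but the saturating one is roughly 400 times smaller. This is why the non-saturating form is the usual default.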

---

### Conclusion

This week’s assignment introduces you to the architecture and training process of **Deep Convolutional GANs (DCGANs)**. You will build and train a DCGAN, exploring how the generator and discriminator work together in an adversarial setting to produce realistic images. Pay close attention to the structure of both networks. Experiment with hyperparameters such as learning rate, batch size, and number of epochs, and with architectural choices such as batch normalization, to see their impact on performance.

---


## Assignment Part 1: Follow Me – DCGAN for Image Generation using MNIST

In this section, we will build and train a Deep Convolutional Generative Adversarial Network (DCGAN) to generate images from the MNIST dataset. We will visualize the generated images and explore the architecture of the model, focusing on how GANs work in generating realistic images.


In [None]:
!pip install torch torchvision matplotlib

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import numpy as np
import os

In [None]:
# Check for GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [None]:
# Create a directory for saving generated images
os.makedirs("generated_images", exist_ok=True)

In [None]:
# ---------------- DCGAN for MNIST ----------------
# Generator: (N, 100, 1, 1) noise -> (N, 1, 28, 28) image in [-1, 1]
class QuickMNISTGenerator(nn.Module):
    def __init__(self):
        super(QuickMNISTGenerator, self).__init__()
        self.main = nn.Sequential(
            # 1x1 -> 7x7
            nn.ConvTranspose2d(100, 128, 7, 1, 0, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(True),
            # 7x7 -> 14x14
            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            # 14x14 -> 28x28; 1 channel for grayscale, no batch norm on the output layer
            nn.ConvTranspose2d(64, 1, 4, 2, 1, bias=False),
            nn.Tanh()
        )

    def forward(self, x):
        return self.main(x)

In [None]:
# Discriminator for MNIST: (N, 1, 28, 28) image -> (N, 1) probability of "real"
class QuickMNISTDiscriminator(nn.Module):
    def __init__(self):
        super(QuickMNISTDiscriminator, self).__init__()
        self.main = nn.Sequential(
            # 28x28 -> 14x14; no batch norm on the input layer
            nn.Conv2d(1, 64, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # 14x14 -> 7x7
            nn.Conv2d(64, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            # 7x7 -> 1x1 score, squashed to (0, 1)
            nn.Conv2d(128, 1, 7, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.main(x).view(-1, 1)

In [None]:
# Initialize the models
latent_dim = 100  # must match the generator's 100 input channels
mnist_gen = QuickMNISTGenerator().to(device)
mnist_disc = QuickMNISTDiscriminator().to(device)
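The DCGAN paper initializes all weights from a zero-centered normal distribution with standard deviation 0.02. The models above use PyTorch's default initialization instead. Below is a sketch of the paper-style initialization. For batch norm it assumes the convention from the PyTorch DCGAN tutorial (scale drawn from \( N(1, 0.02) \), shift set to 0). Zeroing any convolution biases is also an assumption. The function name `weights_init` is illustrative.

```python
import torch
import torch.nn as nn

def weights_init(m):
    """DCGAN-style init: conv weights ~ N(0, 0.02); batch-norm scale ~ N(1, 0.02), shift = 0."""
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
        if m.bias is not None:  # assumption: zero any conv biases
            nn.init.zeros_(m.bias)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.normal_(m.weight, mean=1.0, std=0.02)
        nn.init.zeros_(m.bias)

# Example on a block shaped like the first layer of the MNIST generator
block = nn.Sequential(nn.ConvTranspose2d(100, 128, 7, 1, 0, bias=False), nn.BatchNorm2d(128), nn.ReLU(True))
block.apply(weights_init)  # apply() visits every submodule recursively
print(round(block[0].weight.std().item(), 3))  # close to 0.02
```

To try this optionally in the notebook, call `mnist_gen.apply(weights_init)` and `mnist_disc.apply(weights_init)` before training.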

In [None]:
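# Loss function and optimizers
# Binary cross-entropy on the discriminator's probabilities; Adam with lr=0.0002 and beta1=0.5 as in the DCGAN paper
criterion = nn.BCELoss()
optimizer_g = optim.Adam(mnist_gen.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizer_d = optim.Adam(mnist_disc.parameters(), lr=0.0002, betas=(0.5, 0.999))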

In [None]:
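# Load the MNIST dataset for training
# ToTensor scales pixels to [0, 1]; Normalize maps them to [-1, 1] to match the generator's tanh output
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize([0.5], [0.5])])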
mnist_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
dataloader = DataLoader(mnist_dataset, batch_size=64, shuffle=True)

In [None]:
# Training function for the DCGAN
# Uses the global `device` and `latent_dim` defined above.
def train_dcgan(dataloader, generator, discriminator, optimizer_g, optimizer_d, criterion, num_epochs=5):
    for epoch in range(num_epochs):
        for i, (imgs, _) in enumerate(dataloader):
            # ---- Update Discriminator: push D(real) toward 1 and D(fake) toward 0 ----
            optimizer_d.zero_grad()
            real_imgs = imgs.to(device)
            real_labels = torch.ones(imgs.size(0), 1, device=device)
            fake_labels = torch.zeros(imgs.size(0), 1, device=device)

            outputs = discriminator(real_imgs)
            # Average any spatial outputs to one score per image so the shape matches the labels (N, 1)
            outputs = outputs.view(real_imgs.size(0), -1).mean(1).unsqueeze(1)
            loss_real = criterion(outputs, real_labels)

            z = torch.randn(imgs.size(0), latent_dim, 1, 1, device=device)
            fake_imgs = generator(z)
            # detach() so the discriminator loss does not backpropagate into the generator
            outputs = discriminator(fake_imgs.detach())
            outputs = outputs.view(fake_imgs.size(0), -1).mean(1).unsqueeze(1)
            loss_fake = criterion(outputs, fake_labels)

            loss_d = loss_real + loss_fake
            loss_d.backward()
            optimizer_d.step()

            # ---- Update Generator: push D(fake) toward 1 (non-saturating loss) ----
            optimizer_g.zero_grad()
            outputs = discriminator(fake_imgs)
            outputs = outputs.view(fake_imgs.size(0), -1).mean(1).unsqueeze(1)
            loss_g = criterion(outputs, real_labels)
            loss_g.backward()
            optimizer_g.step()

        # Losses reported for the last batch of each epoch
        print(f"Epoch [{epoch + 1}/{num_epochs}] Loss D: {loss_d.item():.4f}, Loss G: {loss_g.item():.4f}")

In [None]:
# Train the GAN with only 5 epochs for faster results
train_dcgan(dataloader, mnist_gen, mnist_disc, optimizer_g, optimizer_d, criterion, num_epochs=5)

In [None]:
# Generate and save images to folder
def generate_images(generator, num_images=5):
    generator.eval()
    z = torch.randn(num_images, latent_dim, 1, 1).to(device)
    with torch.no_grad():
        generated_imgs = generator(z).cpu().numpy()
    generator.train()

    for i in range(num_images):
        img = np.transpose(generated_imgs[i], (1, 2, 0))
        img = (img + 1) / 2.0  # Rescale to [0, 1]
        plt.imsave(f"generated_images/generated_image_{i}.png", img.squeeze(), cmap="gray")
        plt.imshow(img.squeeze(), cmap="gray")
        plt.show()

In [None]:
generate_images(mnist_gen)

## Assignment Part 2: Your Turn – Build Your Own DCGAN with Custom Dataset

In this section, you'll use your own dataset and design your own DCGAN model to generate images. Follow the tasks below, which include hints for customizing the architecture and dataset. **A framework has been provided, and your job is to complete the TODOs.**

You can use any image dataset of your choice (e.g., CIFAR-10, CelebA, or a custom dataset of images).

In [None]:
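# A single mean/std of 0.5 is broadcast over all channels, so this maps both grayscale and RGB images to [-1, 1]
transform_custom = transforms.Compose([transforms.ToTensor(), transforms.Normalize([0.5], [0.5])])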

In [None]:
#### TODO: Use the CIFAR-10 dataset or replace it with your own dataset. Either option is fine. ####

# Example: CIFAR-10 dataset (you can use this or replace it with your own dataset)
# Note: CIFAR-10 images are 3x32x32, while the MNIST models above work with 1x28x28 images
custom_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_custom)
custom_dataloader = DataLoader(custom_dataset, batch_size=64, shuffle=True)

Task 2: Modify the Generator and Discriminator.

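Below is a skeleton for the Generator and Discriminator. Start from the MNIST architecture in Part 1 and adapt it to your data.

If you're using a dataset with 3-channel RGB images (like CIFAR-10), adjust the number of input/output channels. Also adjust the spatial size: CIFAR-10 images are 32x32, while the MNIST generator produces 28x28 images.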
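Before training, it is worth checking that the generator's output matches your dataset's image shape and that the discriminator returns one score per image. `check_shapes` below is a hypothetical helper. The example reuses the MNIST layer settings from Part 1 as stand-in models. For CIFAR-10 you would call it with your own models and `img_shape=(3, 32, 32)`. One possible option (not the only one) for reaching 32x32 from a 1x1 input is a 4x4 kernel at stride 1 (giving 4x4), followed by three stride-2 layers (8, 16, 32).

```python
import torch
import torch.nn as nn

def check_shapes(generator, discriminator, latent_dim=100, img_shape=(3, 32, 32), batch_size=4):
    """Hypothetical helper: verify that G and D agree on the image size before training."""
    device = next(generator.parameters()).device
    z = torch.randn(batch_size, latent_dim, 1, 1, device=device)
    fake = generator(z)
    assert fake.shape[1:] == img_shape, f"generator produced {tuple(fake.shape[1:])}, expected {img_shape}"
    scores = discriminator(fake)
    assert scores.shape == (batch_size, 1), f"discriminator produced {tuple(scores.shape)}, expected ({batch_size}, 1)"
    return fake.shape, scores.shape

# Stand-in models with the MNIST layer settings from Part 1 (they expect 1x28x28 images)
gen = nn.Sequential(
    nn.ConvTranspose2d(100, 128, 7, 1, 0, bias=False), nn.BatchNorm2d(128), nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False), nn.BatchNorm2d(64), nn.ReLU(True),
    nn.ConvTranspose2d(64, 1, 4, 2, 1, bias=False), nn.Tanh(),
)
disc = nn.Sequential(
    nn.Conv2d(1, 64, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(64, 128, 4, 2, 1, bias=False), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(128, 1, 7, 1, 0, bias=False), nn.Sigmoid(), nn.Flatten(),  # (N, 1, 1, 1) -> (N, 1)
)
print(check_shapes(gen, disc, img_shape=(1, 28, 28)))
```

Calling `check_shapes(custom_gen, custom_disc)` after you complete the TODOs would raise an `AssertionError` with a readable message if your generator does not produce 3x32x32 images.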

In [None]:
class CustomGenerator(nn.Module):
    def __init__(self):
        super(CustomGenerator, self).__init__()
        self.main = nn.Sequential(
            #### TODO: Create Model ####
        )

    def forward(self, x):
        return self.main(x)

In [None]:
class CustomDiscriminator(nn.Module):
    def __init__(self):
        super(CustomDiscriminator, self).__init__()
        self.main = nn.Sequential(
            #### TODO: Create Model ####
        )

    def forward(self, x):
        return self.main(x).view(-1, 1)

In [None]:
custom_gen = CustomGenerator().to(device)
custom_disc = CustomDiscriminator().to(device)

In [None]:
optimizer_g_custom = optim.Adam(custom_gen.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizer_d_custom = optim.Adam(custom_disc.parameters(), lr=0.0002, betas=(0.5, 0.999))

In [None]:
# Train your DCGAN with your dataset
train_dcgan(custom_dataloader, custom_gen, custom_disc, optimizer_g_custom, optimizer_d_custom, criterion, num_epochs=5)

In [None]:
generate_images(custom_gen)

### TODO: DCGAN Custom Dataset Analysis

Now that you've trained a Deep Convolutional Generative Adversarial Network (DCGAN) using your custom dataset, summarize your experience and insights by addressing the following questions:

- **Dataset Insights:**  
  How effectively did DCGAN generate realistic images from your chosen dataset? Discuss specific visual characteristics and the quality of generated images.

- **Training Observations:**  
  Did your dataset present any specific challenges during training (e.g., instability, mode collapse, convergence issues)? If so, how did you address these challenges?

- **Hyperparameter Impact:**  
  How did adjusting hyperparameters (e.g., learning rate, epochs, batch size) influence your model’s performance and generated image quality?

- **Practical Applications:**  
  Based on your results, what potential real-world applications can you envision for using DCGAN with datasets similar to yours?

**Action:**  
Write a concise summary (1-2 paragraphs) capturing your observations and insights clearly in a markdown cell below.
