# Assignment 4 - Generative Adversarial Networks

# Theory Questions on GANs

---

### **Q1: Explain the minimax loss function in GANs and how it ensures competitive training between the generator and discriminator.**

In **Generative Adversarial Networks (GANs)**, the minimax loss function establishes a competitive dynamic between the **generator (G)** and the **discriminator (D)**. This creates a two-player game:

- **Discriminator's Goal (D):** Maximize its ability to correctly distinguish real data from fake (generated) data.
- **Generator's Goal (G):** Minimize the discriminator's ability to detect that its outputs are fake.

The **minimax loss function** is formulated as:

\[
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
\]

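- **First term:** \( \log D(x) \) is large when the discriminator assigns real samples a high probability of being real.
- **Second term:** \( \log(1 - D(G(z))) \) is large when the discriminator assigns generated samples a low probability of being real.
- **D** maximizes both terms, while **G** appears only in the second term and minimizes it by pushing \( D(G(z)) \) toward 1.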

Through iterative updates:
- **D** becomes better at distinguishing real from fake.
- **G** improves in generating realistic data that fools **D**.

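This adversarial process pushes both models to improve. In the idealized analysis, the game's global optimum is reached when the generator's distribution \( p_g \) equals \( p_{\text{data}} \); the optimal discriminator then outputs \( D(x) = \frac{1}{2} \) everywhere, so **D** cannot differentiate real from generated data, and \( V(D, G) = -\log 4 \). In practice, **G** is usually trained to maximize \( \log D(G(z)) \) instead of minimizing \( \log(1 - D(G(z))) \) (the *non-saturating* loss), because the original term gives vanishing gradients early in training, when **D** easily rejects fakes. The `generator_loss` function in the code below uses this non-saturating form.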
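
To make the equilibrium value concrete, the NumPy sketch below estimates \( V(D, G) \) from a batch of discriminator probabilities on real and generated samples. `minimax_value` is a hypothetical helper, and the inputs are synthetic probabilities rather than outputs of a trained model. A discriminator that outputs 0.5 for everything gives \( -\log 4 \approx -1.386 \); a confident, correct discriminator pushes \( V \) toward its maximum of 0.

```python
import numpy as np

def minimax_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of V(D, G) from discriminator probabilities.

    d_real: D(x) for a batch of real samples, values in (0, 1)
    d_fake: D(G(z)) for a batch of generated samples, values in (0, 1)
    """
    d_real = np.asarray(d_real, dtype=np.float64)
    d_fake = np.asarray(d_fake, dtype=np.float64)
    # E_x[log D(x)] + E_z[log(1 - D(G(z)))]; eps guards against log(0)
    return float(np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps)))

# A discriminator that cannot tell real from fake outputs 0.5 everywhere
confused = minimax_value(np.full(64, 0.5), np.full(64, 0.5))
print(confused, -np.log(4))  # both approximately -1.386

# A confident, correct discriminator drives V toward 0
sharp = minimax_value(np.full(64, 0.99), np.full(64, 0.01))
print(sharp)  # approximately -0.02
```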

---

### **Q2: What is mode collapse, why can mode collapse occur during GAN training, and how can it be mitigated?**

**Mode collapse** happens when the **generator** produces limited, repetitive outputs, failing to capture the full diversity of the data distribution.

#### **Why Can Mode Collapse Occur?**
- The generator finds a small subset of outputs that consistently fool the discriminator and keeps producing them.
- The discriminator might not penalize the generator enough for the lack of variety, allowing this behavior to persist.

#### **How Can It Be Mitigated?**
1. **Improved Loss Functions:**  
   - Using **Wasserstein loss** (in WGANs) helps provide more stable training and informative gradients, reducing the risk of mode collapse.

2. **Feature Matching:**  
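   - Instead of only trying to fool the discriminator's final output, the generator is trained to match the average intermediate-layer features that the discriminator computes on real data. Matching these batch statistics gives a more stable target and discourages the generator from over-optimizing a single output.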

3. **Mini-batch Discrimination:**  
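   - The discriminator compares samples within a minibatch (e.g., through features measuring how similar each sample is to the others), so it can detect and penalize batches that lack variety, pushing the generator toward varied outputs.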

4. **Unrolled GANs:**  
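   - The generator's update is computed through several unrolled optimization steps of the discriminator, so it anticipates how **D** will react and gains little from collapsing onto outputs that **D** would quickly learn to reject.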

5. **Regularization Techniques:**  
   - Techniques like adding noise to the discriminator’s inputs can also help mitigate mode collapse.
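
Two of these ideas reduce to short formulas. The NumPy sketch below shows a feature-matching loss (squared distance between the mean features of a real and a generated batch) and a *minibatch standard deviation* statistic, a simplified relative of minibatch discrimination used in Progressive GANs, which a discriminator can receive as an extra feature so that a collapsed batch is easy to spot. The arrays stand in for intermediate discriminator activations, and both function names are hypothetical.

```python
import numpy as np

def feature_matching_loss(real_features, fake_features):
    """Squared L2 distance between mean intermediate features.

    Both arrays have shape (batch, num_features); in a real GAN they would be
    activations from an intermediate discriminator layer (assumed, not computed here).
    """
    mean_real = real_features.mean(axis=0)
    mean_fake = fake_features.mean(axis=0)
    return float(np.sum((mean_real - mean_fake) ** 2))

def minibatch_stddev(batch):
    """Average per-feature standard deviation across a batch.

    A collapsed generator produces near-identical samples, so this value is
    close to 0; a discriminator that sees it can penalize the lack of variety.
    """
    flat = batch.reshape(batch.shape[0], -1)
    return float(flat.std(axis=0).mean())

rng = np.random.default_rng(0)
diverse = rng.normal(size=(32, 8))
collapsed = np.repeat(diverse[:1], 32, axis=0)  # 32 copies of one sample

print(minibatch_stddev(diverse))    # clearly above 0
print(minibatch_stddev(collapsed))  # approximately 0
print(feature_matching_loss(diverse, collapsed))  # > 0: statistics differ
```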

---

### **Q3: Explain the role of the discriminator in adversarial training.**

The **discriminator (D)** in a GAN functions as a **binary classifier** that distinguishes between real data (from the dataset) and fake data (generated by the generator). Its main roles include:

1. **Providing Feedback to the Generator:**  
   - The discriminator evaluates how realistic the generator's outputs are, providing gradients that guide the generator to improve.

2. **Driving the Adversarial Process:**  
   - The discriminator forces the generator to continually improve by penalizing unrealistic outputs.

3. **Measuring Progress:**  
   - When the discriminator’s accuracy drops to around 50%, it suggests that the generator's outputs are nearly indistinguishable from real data, indicating progress.

4. **Preventing Overfitting:**  
   - By penalizing outputs that look unrealistic, the discriminator keeps the generator from settling on a narrow set of easy patterns. On its own it does not guarantee diversity, however: mode collapse (Q2) is exactly the case where this pressure fails.
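
The "around 50%" signal from point 3 can be tracked directly during training. The sketch below assumes the discriminator outputs raw logits, as `make_discriminator_model` later in this notebook does (its last layer is `Dense(1)` with no sigmoid), so a logit above 0 corresponds to a "real" prediction. `discriminator_accuracy` is a hypothetical helper.

```python
import numpy as np

def discriminator_accuracy(real_logits, fake_logits):
    """Fraction of samples the discriminator labels correctly.

    A logit > 0 means sigmoid(logit) > 0.5, i.e. a "real" prediction.
    """
    real_correct = np.asarray(real_logits) > 0
    fake_correct = np.asarray(fake_logits) <= 0
    return float(np.concatenate([real_correct, fake_correct]).mean())

# Hand-picked logits: D gets one real and one fake sample right out of four
print(discriminator_accuracy([2.0, -1.0], [-3.0, 1.0]))  # 0.5
```

Accuracy near 50% is only a rough signal: a discriminator that has stopped learning can also score around 50%, so it is best read alongside the losses and sample images.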

---

### **Q4: How do metrics like IS and FID evaluate GAN performance?**

GANs are evaluated using specialized metrics, with **Inception Score (IS)** and **Fréchet Inception Distance (FID)** being the most common.

#### **Inception Score (IS):**

- **Purpose:** Measures both the quality and diversity of generated images.
  
- **How It Works:**  
  - Uses a pre-trained **Inception network** to classify generated images.
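  - Good generated images should:
    - Each be classified with **high confidence**, i.e. a low-entropy \( p(y|x) \) (indicating quality).
    - Collectively cover **diverse classes**, i.e. a high-entropy marginal \( p(y) \) (indicating variety).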

- **Formula:**  
  \[
  IS = \exp(\mathbb{E}_{x} [KL(p(y|x) \parallel p(y))])
  \]
  - \( p(y|x) \): Predicted label distribution for a generated image.
  - \( p(y) \): Marginal distribution of labels across all generated images.

- **Limitations:**  
  - It does not directly measure how close generated images are to real data.
  - It depends on the Inception network's ImageNet classes, so it is most meaningful for images that resemble those classes and can be misleading for other datasets or domains.
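
The formula translates directly into NumPy. The sketch below assumes the class probabilities \( p(y|x) \) have already been computed by an Inception classifier (one row per generated image) and omits the common practice of averaging the score over several splits of the samples; `inception_score` is a hypothetical name. Two extreme cases show the behavior: ten images each confidently assigned to a different class score 10 (the number of classes), while ten images all assigned to the same class score 1.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS = exp(E_x[KL(p(y|x) || p(y))]) for a (num_images, num_classes) array."""
    probs = np.asarray(probs, dtype=np.float64)
    # Marginal label distribution p(y), averaged over all generated images
    p_y = probs.mean(axis=0, keepdims=True)
    # Per-image KL divergence between p(y|x) and p(y); eps avoids log(0)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

diverse_and_confident = np.eye(10)           # each image is a different class
collapsed = np.tile(np.eye(10)[0], (10, 1))  # every image is class 0

print(inception_score(diverse_and_confident))  # approximately 10
print(inception_score(collapsed))              # approximately 1
```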

---

#### **Fréchet Inception Distance (FID):**

- **Purpose:** Measures how close the distribution of generated images is to the real data distribution.

- **How It Works:**  
  - Compares the **mean** and **covariance** of feature representations (from the Inception network) of real and generated images.

- **Formula:**  
  \[
  FID = ||\mu_r - \mu_g||^2 + \text{Tr}(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2})
  \]
  - \( \mu_r, \Sigma_r \): Mean and covariance of real data features.
  - \( \mu_g, \Sigma_g \): Mean and covariance of generated data features.

- **Interpretation:**  
  - **Lower FID scores** indicate that the generated data distribution is closer to the real data distribution, suggesting better performance.

- **Advantages:**  
  - More robust than IS, as it directly compares generated data to real data.
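
Given feature matrices for real and generated images (in standard FID, 2048-dimensional pooling features from Inception v3; here simply arrays of shape `(num_images, num_features)`), the formula can be sketched with NumPy alone. Instead of a matrix square root, this sketch uses the fact that \( \text{Tr}\big((\Sigma_r \Sigma_g)^{1/2}\big) \) equals the sum of the square roots of the eigenvalues of \( \Sigma_r \Sigma_g \), which are real and non-negative for covariance matrices. `fid_from_features` and the synthetic data are illustrative.

```python
import numpy as np

def fid_from_features(real_feats, gen_feats):
    """FID between two feature sets, each of shape (num_images, num_features)."""
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_g = np.cov(gen_feats, rowvar=False)
    # Tr((Sigma_r Sigma_g)^{1/2}) = sum of sqrt(eigenvalues of Sigma_r Sigma_g);
    # clip tiny negative or imaginary round-off before taking the square root
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
print(fid_from_features(real, real))        # approximately 0
print(fid_from_features(real, real + 1.0))  # approximately 4: mean shift of 1 in 4 dims
```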

---


```python
import tensorflow as tf
from tensorflow.keras import layers
import matplotlib.pyplot as plt
import numpy as np
import os
```

```python
# Load and preprocess CIFAR-10 data (labels are discarded: the GAN is unconditional)
(train_images, _), (_, _) = tf.keras.datasets.cifar10.load_data()
train_images = train_images.astype('float32')
train_images = (train_images - 127.5) / 127.5  # Normalize to [-1, 1]

BUFFER_SIZE = 50000
BATCH_SIZE = 256
train_dataset = tf.data.Dataset.from_tensor_slices(train_images).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
```
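
The preprocessing maps pixel values from [0, 255] to [-1, 1] so they match the generator's `tanh` output range, and `generate_and_save_images` later applies the inverse (`x * 127.5 + 127.5`). The NumPy sketch below checks that round trip and works out the batch arithmetic: 50,000 images in batches of 256 give 196 batches, the last of which holds only 80 images. `normalize` and `denormalize` are illustrative helper names.

```python
import math
import numpy as np

def normalize(pixels):
    """Map pixel values in [0, 255] to [-1, 1], as in the cell above."""
    return (np.asarray(pixels, dtype=np.float32) - 127.5) / 127.5

def denormalize(values):
    """Inverse mapping, used when displaying generated images."""
    return np.asarray(values) * 127.5 + 127.5

pixels = np.array([0, 127.5, 255], dtype=np.float32)
print(normalize(pixels))               # -1, 0 and 1
print(denormalize(normalize(pixels)))  # back to 0, 127.5 and 255

num_images, batch_size = 50000, 256
num_batches = math.ceil(num_images / batch_size)
last_batch = num_images - (num_batches - 1) * batch_size
print(num_batches, last_batch)  # 196 80
```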

```python
# Generator: maps a 100-dim noise vector to a 32x32x3 image with values in [-1, 1]
def make_generator_model():
    model = tf.keras.Sequential([
        # Project the noise vector and reshape it into an 8x8x256 feature map
        layers.Dense(8*8*256, use_bias=False, input_shape=(100,)),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Reshape((8, 8, 256)),

        # 8x8x256 -> 8x8x128 (stride 1 keeps the spatial size)
        layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),

        # 8x8x128 -> 16x16x128
        layers.Conv2DTranspose(128, (5, 5), strides=(2, 2), padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),

        # 16x16x128 -> 32x32x64
        layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),

        # 32x32x64 -> 32x32x3; tanh matches the [-1, 1] input normalization
        layers.Conv2DTranspose(3, (5, 5), strides=(1, 1), padding='same', use_bias=False, activation='tanh')
    ])
    return model
```
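
With `padding='same'`, a Keras `Conv2DTranspose` layer multiplies the spatial size by its stride, so the generator's shape progression can be checked with plain arithmetic. The pure-Python sketch below uses a layer list transcribed by hand from `make_generator_model` (the names `GENERATOR_LAYERS` and `trace_generator_shapes` are hypothetical) and confirms that the two stride-2 layers take the 8x8 seed up to CIFAR-10's 32x32x3.

```python
# (filters, stride) for each Conv2DTranspose layer in make_generator_model
GENERATOR_LAYERS = [(128, 1), (128, 2), (64, 2), (3, 1)]

def trace_generator_shapes(start=(8, 8, 256), layers_spec=GENERATOR_LAYERS):
    """Return the feature-map shape after the Reshape and after each layer.

    Assumes padding='same', where a transposed conv outputs input_size * stride.
    """
    h, w, _ = start
    shapes = [start]
    for filters, stride in layers_spec:
        h, w = h * stride, w * stride
        shapes.append((h, w, filters))
    return shapes

for shape in trace_generator_shapes():
    print(shape)
# (8, 8, 256) -> (8, 8, 128) -> (16, 16, 128) -> (32, 32, 64) -> (32, 32, 3)
```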

```python
# Discriminator: maps a 32x32x3 CIFAR-10 image to a single real/fake logit
def make_discriminator_model():
    model = tf.keras.Sequential([
        # 32x32x3 -> 16x16x64
        layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same', input_shape=[32, 32, 3]),
        layers.LeakyReLU(),
        layers.Dropout(0.3),

        # 16x16x64 -> 8x8x128
        layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'),
        layers.LeakyReLU(),
        layers.Dropout(0.3),

        # 8x8x128 -> 4x4x256
        layers.Conv2D(256, (5, 5), strides=(2, 2), padding='same'),
        layers.LeakyReLU(),
        layers.Dropout(0.3),

        # No sigmoid: the raw logit is passed to a loss with from_logits=True
        layers.Flatten(),
        layers.Dense(1)
    ])
    return model
```
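
The discriminator runs the same arithmetic in reverse: a `Conv2D` with `padding='same'` outputs `ceil(input_size / stride)`, so three stride-2 layers shrink 32x32 down to 4x4. The pure-Python sketch below (layer list transcribed from `make_discriminator_model`; `trace_discriminator` is a hypothetical name) also works out the flattened feature size and the parameter count of the final `Dense(1)` layer.

```python
import math

# (filters, stride) for each Conv2D layer in make_discriminator_model
DISCRIMINATOR_LAYERS = [(64, 2), (128, 2), (256, 2)]

def trace_discriminator(start=(32, 32, 3), layers_spec=DISCRIMINATOR_LAYERS):
    """Return per-layer shapes, the flattened size, and the Dense(1) parameter count."""
    h, w, _ = start
    shapes = [start]
    for filters, stride in layers_spec:
        # padding='same': output size is ceil(input / stride)
        h, w = math.ceil(h / stride), math.ceil(w / stride)
        shapes.append((h, w, filters))
    flat = h * w * shapes[-1][2]
    dense_params = flat + 1  # one weight per input feature plus a bias
    return shapes, flat, dense_params

shapes, flat, dense_params = trace_discriminator()
print(shapes)              # [(32, 32, 3), (16, 16, 64), (8, 8, 128), (4, 4, 256)]
print(flat, dense_params)  # 4096 4097
```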

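```python
# Loss functions: the discriminator outputs raw logits, hence from_logits=True
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_output, fake_output):
    # Real images should be classified as 1, generated images as 0
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss

def generator_loss(fake_output):
    # Non-saturating loss: the generator wants its images classified as real (1)
    return cross_entropy(tf.ones_like(fake_output), fake_output)
```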
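
These losses are the minimax objective from Q1 written as binary cross-entropy on logits: with \( D = \sigma(\ell) \), `discriminator_loss` is \( -[\log D(x) + \log(1 - D(G(z)))] \) averaged over each batch, and `generator_loss` is the non-saturating \( -\log D(G(z)) \). The NumPy sketch below compares the gradient of each generator objective with respect to the fake logit \( \ell \) when the discriminator confidently rejects a fake (\( \ell = -5 \)): the original \( \log(1 - \sigma(\ell)) \) has gradient \( -\sigma(\ell) \approx -0.0067 \), while \( -\log \sigma(\ell) \) has gradient \( -(1 - \sigma(\ell)) \approx -0.993 \). All function names are illustrative.

```python
import numpy as np

def sigmoid(logit):
    return 1.0 / (1.0 + np.exp(-logit))

def saturating_grad(fake_logit):
    """d/d(logit) of log(1 - sigmoid(logit)), the original minimax generator term."""
    return -sigmoid(fake_logit)

def non_saturating_grad(fake_logit):
    """d/d(logit) of -log(sigmoid(logit)), the form used by generator_loss."""
    return -(1.0 - sigmoid(fake_logit))

def bce_with_logits(labels, logits):
    """Mean binary cross-entropy on logits, in a numerically stable form."""
    labels = np.asarray(labels, dtype=np.float64)
    logits = np.asarray(logits, dtype=np.float64)
    per_sample = np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
    return float(np.mean(per_sample))

fake_logit = -5.0  # D is confident this sample is fake
print(saturating_grad(fake_logit))      # about -0.0067: almost no signal for G
print(non_saturating_grad(fake_logit))  # about -0.9933: strong signal for G

# With label 1, the cross-entropy on this logit matches -log D(G(z))
print(bce_with_logits([1.0], [fake_logit]), -np.log(sigmoid(fake_logit)))
```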


```python
# Instantiate the models and their Adam optimizers (learning rate 1e-4)
generator = make_generator_model()
discriminator = make_discriminator_model()
generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)
```

```python
# Generate a 4x4 grid of images from a fixed noise batch and save it to disk
def generate_and_save_images(model, epoch, test_input):
    # training=False runs BatchNormalization in inference mode
    predictions = model(test_input, training=False)
    fig = plt.figure(figsize=(4, 4))

    for i in range(predictions.shape[0]):
        plt.subplot(4, 4, i + 1)
        # Map tanh outputs from [-1, 1] back to [0, 255] pixel values
        plt.imshow((predictions[i] * 127.5 + 127.5).numpy().astype(np.uint8))
        plt.axis('off')

    os.makedirs('generated_images', exist_ok=True)
    plt.savefig('generated_images/image_at_epoch_{:04d}.png'.format(epoch))
    plt.close(fig)
```
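
If a single image file without Matplotlib's per-subplot layout is preferred, the 16 samples can also be tiled into one array. The NumPy sketch below is an alternative, not what the cell above does: it assumes a batch of shape `(16, 32, 32, 3)` with values in [-1, 1] (for example `predictions.numpy()`), applies the same `x * 127.5 + 127.5` mapping, and arranges the images in a 4x4 grid. `make_image_grid` is a hypothetical helper name.

```python
import numpy as np

def make_image_grid(images, rows=4, cols=4):
    """Tile a (rows*cols, H, W, C) batch with values in [-1, 1] into one uint8 image."""
    images = np.asarray(images, dtype=np.float32)
    n, h, w, c = images.shape
    if n != rows * cols:
        raise ValueError("batch size must equal rows * cols")
    # Same mapping as generate_and_save_images; clip guards against round-off
    pixels = np.clip(images * 127.5 + 127.5, 0, 255).astype(np.uint8)
    # (rows, cols, H, W, C) -> (rows, H, cols, W, C) -> (rows*H, cols*W, C)
    grid = pixels.reshape(rows, cols, h, w, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(rows * h, cols * w, c)

batch = np.random.default_rng(0).uniform(-1, 1, size=(16, 32, 32, 3))
grid = make_image_grid(batch)
print(grid.shape, grid.dtype)  # (128, 128, 3) uint8
```

The resulting array can be written with `plt.imsave(path, grid)`.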

```python
# One training step: update the generator and discriminator on one batch
@tf.function
def train_step(images):
    # Size the noise batch to the real batch (the final CIFAR-10 batch holds 80 images, not 256)
    batch_size = tf.shape(images)[0]
    noise = tf.random.normal([batch_size, 100])

    # Record one forward pass on two tapes so each model gets its own gradients
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
```


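```python
# Training loop
def train(dataset, epochs):
    # Fixed noise, so saved snapshots show the same 16 samples evolving over training
    seed = tf.random.normal([16, 100])

    for epoch in range(epochs):
        for image_batch in dataset:
            train_step(image_batch)

        # Save a 4x4 grid of samples every 10 epochs
        if (epoch + 1) % 10 == 0:
            generate_and_save_images(generator, epoch + 1, seed)
```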
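
For planning, it helps to know what `train(train_dataset, epochs=250)` amounts to. The pure-Python sketch below counts the `train_step` calls (one per batch, with 50,000 images in batches of 256) and lists the snapshot files the loop writes, reusing the same filename format string. `training_schedule` is a hypothetical helper.

```python
import math

def training_schedule(num_images=50000, batch_size=256, epochs=250, save_every=10):
    """Return total train_step calls and the image files the training loop saves."""
    steps_per_epoch = math.ceil(num_images / batch_size)
    total_steps = steps_per_epoch * epochs
    saved = [
        'generated_images/image_at_epoch_{:04d}.png'.format(epoch + 1)
        for epoch in range(epochs)
        if (epoch + 1) % save_every == 0
    ]
    return total_steps, saved

total_steps, saved = training_schedule()
print(total_steps)           # 49000
print(len(saved), saved[0])  # 25 generated_images/image_at_epoch_0010.png
```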

```python
# Run the training
train(train_dataset, epochs=250)
```