### Task 1

Note: Carefully revisit the reading material, and choose additional sources if needed.

- a. Briefly explain why the Generator’s loss function only depends on the Discriminator's output for generated images. (Note: different representations.)

- b. What is “mode collapse” in GANs? Provide an intuitive example of how it might appear when generating images, and explain one strategy (theoretical or architectural) that can mitigate it. Provide a reference (from the provided reading material or beyond).

- c. What is the difference between Inception Score (IS) and Fréchet Inception Distance (FID)?

- d. Choose one application of GANs, and discuss potential ethical issue(s) that could arise.



Potential answers:



Task 1: Theory - GAN background
a. Generator's Loss Function Dependency
The generator's loss function only depends on the discriminator's output for generated images because:

The generator's objective is to fool the discriminator into classifying fake images as real

During generator training, we freeze the discriminator and only update generator parameters

The loss is usually written in its non-saturating form, L_G = -E_z[log D(G(z))], which is minimized when D(G(z)) → 1 (fakes classified as real)

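We don't use real images in the generator loss because the real-data term of the GAN objective, E_x[log D(x)], contains no generator parameters: its gradient with respect to the generator is zero, so dropping it leaves the generator update unchanged. The generator learns about real data only indirectly, through the discriminator's feedback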
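As a toy illustration of that last point (every name here is hypothetical): take a fixed logistic-regression discriminator `D`, a one-parameter generator `G(z) = theta * z`, and the value V = log D(x) + log(1 − D(G(z))) for one real sample and one noise sample. A central finite difference then shows that dV/dtheta equals the derivative of the fake-data term alone, and that the real-data term contributes exactly zero.

```python
import math


def D(x, w=2.0, b=-1.0):
    """Fixed toy discriminator: logistic regression on a scalar input."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def G(z, theta):
    """Toy generator with a single (hypothetical) parameter theta."""
    return theta * z


def value_terms(theta, x_real, z):
    """Return (real_term, fake_term) of V(D, G) for one real and one noise sample."""
    real_term = math.log(D(x_real))             # theta does not appear here
    fake_term = math.log(1.0 - D(G(z, theta)))  # the only place theta appears
    return real_term, fake_term


def finite_diff(f, theta, eps=1e-6):
    """Central finite-difference estimate of df/dtheta."""
    return (f(theta + eps) - f(theta - eps)) / (2.0 * eps)


x_real, z, theta = 1.5, 0.7, 0.3
grad_full = finite_diff(lambda t: sum(value_terms(t, x_real, z)), theta)
grad_fake = finite_diff(lambda t: value_terms(t, x_real, z)[1], theta)
grad_real = finite_diff(lambda t: value_terms(t, x_real, z)[0], theta)

print(f"dV/dtheta (full objective) = {grad_full:.6f}")
print(f"dV/dtheta (fake term only) = {grad_fake:.6f}")
print(f"dV/dtheta (real term only) = {grad_real:.6f}")
```

In a real GAN the same holds for every generator weight: autograd simply finds no path from those weights to log D(x).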

b. Mode Collapse
Definition: Mode collapse occurs when the generator produces limited varieties of samples, capturing only a few modes of the data distribution while ignoring others.

Intuitive Example: When generating human faces, a GAN suffering from mode collapse might only generate faces of middle-aged Caucasian males, ignoring other ethnicities, ages, and genders.

Mitigation Strategy: Minibatch discrimination (proposed in "Improved Techniques for Training GANs" by Salimans et al.)

The discriminator looks at multiple samples in a batch to detect if the generator is producing similar outputs

This allows the discriminator to penalize lack of diversity

Reference: Salimans, T., et al. "Improved Techniques for Training GANs." NeurIPS 2016.
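A simplified, plain-Python sketch of the minibatch-discrimination feature. Assumption: the learned projection tensor T from Salimans et al. is omitted, so L1 distances are taken directly between per-sample feature vectors. For each sample i it computes o(x_i) = Σ_j exp(−‖f_i − f_j‖₁); in the paper, such values are concatenated to the discriminator's intermediate features and passed to the next layer. A collapsed batch, where every sample is nearly identical, produces large values that the discriminator can learn to flag.

```python
import math


def minibatch_similarity(features):
    """Simplified minibatch-discrimination feature (learned projection T omitted).

    features: list of per-sample feature vectors (lists of floats).
    Returns o_i = sum_j exp(-||f_i - f_j||_1) for every sample i. The sum
    includes j == i, as in the original formulation, so each o_i >= 1.
    """
    out = []
    for f_i in features:
        total = 0.0
        for f_j in features:
            l1 = sum(abs(a - b) for a, b in zip(f_i, f_j))
            total += math.exp(-l1)
        out.append(total)
    return out


collapsed = [[0.5, 0.5]] * 4                                # generator repeats one sample
diverse = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [3.0, 3.0]]  # well-spread samples

print(minibatch_similarity(collapsed))  # [4.0, 4.0, 4.0, 4.0]
print(minibatch_similarity(diverse))    # values close to 1 (little similarity)
```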

c. IS vs FID Comparison
Inception Score (IS):

Measures quality and diversity using a pre-trained Inception network

Scores high when each image gets a confident class prediction (quality) and the predictions are spread across many classes over all samples (diversity)

Only uses generated images
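A toy sketch of the score itself, IS = exp(E_x[KL(p(y|x) ‖ p(y))]), computed from a hand-written matrix of class probabilities. Assumption: in practice each row p(y|x) would come from a pre-trained Inception-v3 applied to one generated image, over many images and the ImageNet classes; here there are four "images" and two classes so the extreme values are easy to verify.

```python
import math


def inception_score(probs):
    """IS = exp(mean_x KL(p(y|x) || p(y))) for a list of per-image class distributions."""
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal label distribution p(y), averaged over all generated images
    p_y = [sum(row[k] for row in probs) / n for k in range(num_classes)]
    kl_total = 0.0
    for row in probs:
        # Terms with p(y|x) = 0 contribute 0 (limit of p * log p)
        kl_total += sum(p * math.log(p / q) for p, q in zip(row, p_y) if p > 0)
    return math.exp(kl_total / n)


# Confident and evenly spread over 2 classes: the maximum for 2 classes
good = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
# Confident, but every image gets the same class (mode collapse)
collapsed = [[1.0, 0.0]] * 4

print(inception_score(good))       # approximately 2.0
print(inception_score(collapsed))  # 1.0
```

Note that the real data never enters the computation, which is why IS cannot tell whether the generated classes match the real data distribution.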

Fréchet Inception Distance (FID):

Compares the statistics (mean and covariance) of Inception features of real and generated images

Lower FID = better quality and diversity

Uses both real and generated images

Generally considered more reliable than IS
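The general formula is FID = ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}), where μ and Σ are the mean and covariance of Inception features of real (r) and generated (g) images. The sketch below is a deliberately simplified 1-D version (assumption: one scalar "feature" instead of the 2048-dimensional Inception activations), where the formula reduces to (μ_r − μ_g)² + σ_r² + σ_g² − 2σ_rσ_g. It shows that FID needs real images and that it penalizes a collapsed generator even when the means match.

```python
import math
import statistics


def fid_1d(real_feats, fake_feats):
    """FID restricted to one scalar feature: a Gaussian N(mu, var) is fitted to each set."""
    mu_r, mu_g = statistics.fmean(real_feats), statistics.fmean(fake_feats)
    var_r = statistics.pvariance(real_feats)
    var_g = statistics.pvariance(fake_feats)
    # ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)), written for one dimension
    return (mu_r - mu_g) ** 2 + var_r + var_g - 2.0 * math.sqrt(var_r * var_g)


real = [0.0, 1.0, 2.0, 3.0]
print(fid_1d(real, real))                  # 0.0: identical statistics
print(fid_1d(real, [1.5, 1.5, 1.5, 1.5]))  # 1.25: same mean, but no diversity
```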

d. Ethical Issues in GAN Applications
Application: Deepfake generation for entertainment/media

Ethical Issues:

Identity theft and impersonation: Creating convincing fake videos of public figures

Non-consensual pornography: Generating explicit content using people's likeness without consent

Misinformation: Creating fake news or events that never happened

Legal evidence tampering: Potential to undermine trust in video evidence



Another potential answer:


🧠 Task 1 — Theory
a. Why the Generator’s loss depends only on the Discriminator’s output for generated images

The generator never sees real data; it only receives feedback through the discriminator.
Formally,

$$\min_{G}\; \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]$$

The generator adjusts its parameters θ_G so that D(G(z)) → 1 (i.e., fakes are classified as real).
Hence the loss depends only on D(G(z)), the discriminator’s evaluation of generated samples.
All gradient information about realism flows through that scalar output; the generator’s job is simply to make D believe its samples are real.

Alternative “non-saturating” form

$$\max_{G}\; \mathbb{E}_{z \sim p_z}\big[\log D(G(z))\big]$$

This avoids vanishing gradients early in training (when D confidently rejects fakes, log(1 − D(G(z))) saturates and its gradient becomes tiny) but still depends solely on D(G(z)).
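A quick numeric check of the saturation argument, written in terms of the discriminator's logit a, where D(G(z)) = σ(a). The derivatives are standard calculus: d/da log(1 − σ(a)) = −σ(a) and d/da log σ(a) = 1 − σ(a). When D confidently rejects a fake (D(G(z)) ≈ 0), the first vanishes while the second stays close to 1. The value D(G(z)) = 0.01 is an arbitrary illustration.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def logit(p):
    return math.log(p / (1.0 - p))


def saturating_grad(a):
    """d/da of log(1 - sigmoid(a)), the term the minimax generator minimizes."""
    return -sigmoid(a)


def non_saturating_grad(a):
    """d/da of log(sigmoid(a)), the term the non-saturating generator maximizes."""
    return 1.0 - sigmoid(a)


a = logit(0.01)  # D is 99% sure the fake is fake
print(f"saturating:     |grad| = {abs(saturating_grad(a)):.4f}")      # 0.0100
print(f"non-saturating: |grad| = {abs(non_saturating_grad(a)):.4f}")  # 0.9900
```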

b. Mode Collapse

Definition:
Mode collapse happens when the generator produces limited diversity — e.g., it discovers a few patterns that reliably fool the discriminator and repeats them.

Example:
In CIFAR-10 training, instead of generating ten diverse object classes, G outputs only one convincing “car-like” image with small variations.

Mitigation strategies

Architectural: Use Minibatch Discrimination (Salimans et al., 2016) → D looks at feature diversity across a batch.

Objective modification: Wasserstein GAN with Gradient Penalty (WGAN-GP; Gulrajani et al., 2017) stabilizes training and discourages collapse by providing smooth, meaningful gradients.

Training trick: Label smoothing, noise injection, or unrolling the discriminator.

Reference:
Gulrajani et al., “Improved Training of Wasserstein GANs,” NIPS 2017.
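A toy sketch of the WGAN-GP gradient penalty, λ·(‖∇f(x̂)‖₂ − 1)², evaluated at x̂ = ε·x_real + (1 − ε)·x_fake. Assumptions: the critic f is a hypothetical quadratic f(x) = Σ w_i·x_i², chosen only because its gradient 2·w_i·x_i can be written by hand (a real implementation would get the gradient from autograd); λ = 10 follows Gulrajani et al.; ε is passed in explicitly instead of being sampled from U[0, 1], so the result is reproducible.

```python
import math


def critic_grad(x, w):
    """Gradient of the toy critic f(x) = sum_i w_i * x_i**2 (hypothetical critic)."""
    return [2.0 * wi * xi for wi, xi in zip(w, x)]


def gradient_penalty(x_real, x_fake, w, eps, lam=10.0):
    """lam * (||grad f(x_hat)||_2 - 1)^2 at a point on the real-fake segment."""
    x_hat = [eps * r + (1.0 - eps) * f for r, f in zip(x_real, x_fake)]
    grad = critic_grad(x_hat, w)
    grad_norm = math.sqrt(sum(g * g for g in grad))
    return lam * (grad_norm - 1.0) ** 2


x_real, x_fake = [1.0, 0.0], [0.0, 0.0]
# eps = 0.5 gives x_hat = [0.5, 0.0]; with w = [1, 0] the gradient there is [1, 0] (norm 1)
print(gradient_penalty(x_real, x_fake, w=[1.0, 0.0], eps=0.5))  # 0.0
# A steeper critic (w = [2, 0]) has gradient norm 2 at x_hat and is penalized
print(gradient_penalty(x_real, x_fake, w=[2.0, 0.0], eps=0.5))  # 10.0
```

In WGAN-GP this term is added to the critic's loss only; the generator's loss is unchanged.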

c. Inception Score (IS) vs Fréchet Inception Distance (FID)
| Metric | What it measures | Computation | Notes / limitations |
|---|---|---|---|
| IS | Image quality + diversity based on Inception-v3 classifier outputs | $\exp\big(\mathbb{E}_x[\mathrm{KL}(p(y \mid x) \,\Vert\, p(y))]\big)$ | Uses only generated images, so it never compares them with real data |
| FID | How close generated images are to real ones in feature space | Fréchet distance between multivariate Gaussians fitted to Inception features of real vs. fake images | More robust; penalizes mode collapse and poor diversity |

✅ Lower FID is better: the generated distribution is closer to the real one.
d. Application + Ethical Issues

Example Application: Face generation / deepfakes

Ethical concerns:

Misinformation → fake videos or portraits used for fraud or defamation.

Consent → use of people’s faces without permission.

Bias → training on imbalanced datasets amplifies societal biases.

Mitigations: Watermarking, detectable synthetic data tags, ethics review for dataset use.



### Task 2

Training a GAN can be challenging. In this exercise we invite you to try it for yourself, and see if you can succeed. We provide a sample notebook, which you can use as a starting point. You probably need to download some data first.

Please note: In the provided sample notebook script_gan.ipynb, we use CIFAR10. You are free to use another dataset, e.g. MNIST or CelebA. You might find it helpful to read more about the PyTorch DataLoader.

Note: Remember to change the data path in the notebook.

- a. Complete the notebook by adding the implementation for the generator and discriminator.
- b. Which parts of the notebook have to be changed if you want to use another dataset? How do you need to change them?
- c. Please document your findings when executing the notebook. Did your training converge? What did the intermediate generated images look like?
- d. Name several things which can be changed to receive a different result after training.
- e. Choose something to change, e.g. setting, preprocessing or parameters, and run the notebook (at least) 2 more times with different configurations. Save your results. Did the training converge? What do the generated images look like? How do the three different models perform in comparison?
Non-mandatory: Comment on your suggestions to change the code.

In [2]:
# Import necessary libraries
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, Subset
import torchvision.utils as vutils
import matplotlib.pyplot as plt
import pandas as pd



In [3]:
# Define a simple generator and discriminator for CIFAR-10
class Generator(nn.Module):
    def __init__(self, latent_dim):
        super(Generator, self).__init__()
        self.latent_dim = latent_dim

        self.model = nn.Sequential(
            nn.Linear(latent_dim, 128 * 8 * 8),
            nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
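            # NOTE: PyTorch's `momentum` is the weight of the *current* batch statistic in the
            # running-stats update (Keras uses the complement), so 0.78 tracks recent batches closely
            nn.BatchNorm2d(128, momentum=0.78),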
            nn.ReLU(),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64, momentum=0.78),
            nn.ReLU(),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),
            nn.Tanh()
        )

    def forward(self, x):
        return self.model(x)

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()

        self.model = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.25),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ZeroPad2d((0, 1, 0, 1)),
            nn.BatchNorm2d(64, momentum=0.82),
            nn.LeakyReLU(0.25),
            nn.Dropout(0.25),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(128, momentum=0.82),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.25),
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(256, momentum=0.8),
            nn.LeakyReLU(0.25),
            nn.Dropout(0.25),
            nn.Flatten(),
            nn.Linear(256 * 5 * 5, 1),  # 256 channels x 5x5 map (32 -> 16 -> 8 -> pad 9 -> 5 -> 5)
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.model(x)
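The `256 * 5 * 5` input size of the discriminator's final `nn.Linear` can be checked by hand with the standard convolution output-size formula, out = ⌊(n + 2p − k)/s⌋ + 1, plus the one extra row and column added by `nn.ZeroPad2d((0, 1, 0, 1))`. A plain-Python sketch that walks through the layers (transcribed by hand from the class above):

```python
def conv_out(n, kernel, stride, padding):
    """Spatial output size of a Conv2d layer (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


size = 32                                             # CIFAR-10 images are 32x32
size = conv_out(size, kernel=3, stride=2, padding=1)  # Conv2d(3, 32)    -> 16
size = conv_out(size, kernel=3, stride=2, padding=1)  # Conv2d(32, 64)   -> 8
size = size + 1                                       # ZeroPad2d((0, 1, 0, 1)) -> 9
size = conv_out(size, kernel=3, stride=2, padding=1)  # Conv2d(64, 128)  -> 5
size = conv_out(size, kernel=3, stride=1, padding=1)  # Conv2d(128, 256) -> 5

flat_features = 256 * size * size
print(size, flat_features)  # 5 6400
```

Starting instead from `size = 28` (e.g. MNIST), the same steps end at 4, so the linear layer would need `256 * 4 * 4` inputs (and the first `Conv2d` would need 1 input channel).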

In [4]:
# Sanity check: verify the input/output shapes of both models
def verify_model_dimensions():
    print("=== VERIFYING MODEL DIMENSIONS ===")

    # Test Generator
    latent_dim = 100
    generator = Generator(latent_dim)

    # Test with batch of 4 noise vectors
    test_noise = torch.randn(4, latent_dim)
    print(f"Generator input shape: {test_noise.shape}")

    try:
        generated_images = generator(test_noise)
        print(f"Generator output shape: {generated_images.shape}")

        # Check if output matches 3x32x32
        expected_shape = (4, 3, 32, 32)
        if generated_images.shape == expected_shape:
            print("✅ Generator CORRECT: Output matches 3x32x32")
        else:
            print(f"❌ Generator INCORRECT: Expected {expected_shape}, got {generated_images.shape}")
    except Exception as e:
        print(f"❌ Generator ERROR: {e}")

    # Test Discriminator
    discriminator = Discriminator()

    # Create a proper 32x32 test image
    test_images = torch.randn(4, 3, 32, 32)
    print(f"\nDiscriminator input shape: {test_images.shape}")

    try:
        discriminator_output = discriminator(test_images)
        print(f"Discriminator output shape: {discriminator_output.shape}")

        # Check if input is 3x32x32 and output is single number per image
        if test_images.shape == (4, 3, 32, 32) and discriminator_output.shape == (4, 1):
            print("✅ Discriminator CORRECT: Input=3x32x32, Output=single number")
        else:
            print("❌ Discriminator INCORRECT")
    except Exception as e:
        print(f"❌ Discriminator ERROR: {e}")

# Run verification
verify_model_dimensions()

=== VERIFYING MODEL DIMENSIONS ===
Generator input shape: torch.Size([4, 100])
Generator output shape: torch.Size([4, 3, 32, 32])
✅ Generator CORRECT: Output matches 3x32x32

Discriminator input shape: torch.Size([4, 3, 32, 32])
Discriminator output shape: torch.Size([4, 1])
✅ Discriminator CORRECT: Input=3x32x32, Output=single number


In [7]:
# Data loading and preprocessing (using CIFAR-10 dataset)

# train on GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
dataset = datasets.CIFAR10(root='./data', train=True, transform=transform, download=True)


print(f'Train: {len(dataset)} samples')

dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

100%|██████████| 170M/170M [00:14<00:00, 11.8MB/s] 


Train: 50000 samples
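Two small arithmetic checks on this data pipeline, sketched in plain Python. First, `transforms.Normalize` with mean 0.5 and std 0.5 computes (x − 0.5)/0.5, which maps the [0, 1] range produced by `ToTensor` to [−1, 1], the same range as the generator's final `Tanh`; the inverse mapping is needed before displaying images. Second, with the default `drop_last=False`, the number of iterations per epoch is ⌈N / batch_size⌉, i.e. 1563 for `batch_size=32`. The training log further below reports 3125 batches per epoch, which corresponds to `batch_size=16`.

```python
import math


def normalize(x, mean=0.5, std=0.5):
    """What transforms.Normalize does to one channel value: (x - mean) / std."""
    return (x - mean) / std


def denormalize(y, mean=0.5, std=0.5):
    """Inverse mapping, e.g. to turn Tanh outputs back into [0, 1] pixels."""
    return y * std + mean


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Iterations per epoch of a DataLoader over a map-style dataset."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


print(normalize(0.0), normalize(1.0))  # -1.0 1.0
print(denormalize(normalize(0.25)))    # 0.25
print(batches_per_epoch(50000, 32))    # 1563
print(batches_per_epoch(50000, 16))    # 3125
```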


In [1]:
# Initialize the models

latent_dim = 100
lr = 0.0002
beta1 = 0.5
beta2 = 0.999
num_epochs = 10

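# Move both models to the GPU if available (`device` is defined in the data-loading cell)
generator = Generator(latent_dim).to(device)
discriminator = Discriminator().to(device)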


print(f"Generator parameters: {sum(p.numel() for p in generator.parameters()):,}")
print(f"Discriminator parameters: {sum(p.numel() for p in discriminator.parameters()):,}")


# Define loss function and optimizers
criterion = nn.BCELoss()
optimizer_G = optim.Adam(generator.parameters(), lr=lr, betas=(beta1, beta2))
optimizer_D = optim.Adam(discriminator.parameters(), lr=lr, betas=(beta1, beta2))

# Lists to store losses for plotting
d_losses = []
g_losses = []



# Training loop
# def train_gan(generator, discriminator, dataloader, num_epochs):
#     for epoch in range(num_epochs):
#         for i, data in enumerate(dataloader):
#             real_images, _ = data
#             batch_size = real_images.size(0)
#             real_images = real_images.view(batch_size, -1)
#             real_labels = torch.ones(batch_size, 1)
#             fake_labels = torch.zeros(batch_size, 1)
#
#             # Train the discriminator
#             optimizer_D.zero_grad()
#             outputs = discriminator(real_images)
#             d_loss_real = criterion(outputs, real_labels)
#             d_loss_real.backward()
#
#             z = torch.randn(batch_size, 100)
#             fake_images = generator(z)
#             outputs = discriminator(fake_images.detach())
#             d_loss_fake = criterion(outputs, fake_labels)
#             d_loss_fake.backward()
#             d_loss = d_loss_real + d_loss_fake
#             optimizer_D.step()
#
#             # Train the generator
#             optimizer_G.zero_grad()
#             outputs = discriminator(fake_images)
#             g_loss = criterion(outputs, real_labels)
#             g_loss.backward()
#             optimizer_G.step()
#
#             d_losses.append(d_loss.item())
#             g_losses.append(g_loss.item())
#
#             if (i + 1) % 100 == 0:
#                 print(f'Epoch [{epoch+1}/{num_epochs}], Batch [{i+1}/{len(dataloader)}], D Loss: {d_loss.item():.4f}, G Loss: {g_loss.item():.4f}')
#
#         # Generate and save a sample of fake images
#         if (epoch + 1) % 10 == 0:
#             with torch.no_grad():
#                 z = torch.randn(32, 100)
#                 fake_samples = generator(z)
#                 vutils.save_image(fake_samples, f'fake_cifar_samples_epoch_{epoch+1}.png', normalize=True)
#
#         # Plot the loss curves
#         plt.figure(figsize=(10, 5))
#         plt.title("Generator and Discriminator Loss")
#         plt.plot(g_losses, label="G Loss")
#         plt.plot(d_losses, label="D Loss")
#         plt.xlabel("Iterations")
#         plt.ylabel("Loss")
#         plt.legend()
#         plt.savefig(f'loss_plot_epoch_{epoch+1}.png')
#         plt.show()


# Training loop
def train_gan(generator, discriminator, dataloader, num_epochs):
    # Run on whatever device the models were moved to
    device = next(generator.parameters()).device
    # Sample the noise for the progress grid ONCE, so every epoch shows the same latent points
    fixed_z = torch.randn(32, latent_dim, device=device)

    for epoch in range(num_epochs):
        for i, data in enumerate(dataloader):
            real_images, _ = data
            batch_size = real_images.size(0)
            real_images = real_images.to(device)

            real_labels = torch.ones(batch_size, 1, device=device)
            fake_labels = torch.zeros(batch_size, 1, device=device)

            # Train the discriminator
            optimizer_D.zero_grad()

            # Real images
            outputs_real = discriminator(real_images)
            d_loss_real = criterion(outputs_real, real_labels)

            # Fake images (detached, so this step does not backpropagate into G)
            z = torch.randn(batch_size, latent_dim, device=device)
            fake_images = generator(z)
            outputs_fake = discriminator(fake_images.detach())
            d_loss_fake = criterion(outputs_fake, fake_labels)

            d_loss = (d_loss_real + d_loss_fake) / 2
            d_loss.backward()
            optimizer_D.step()

            # Train the generator: BCE against "real" labels = non-saturating loss -log D(G(z))
            optimizer_G.zero_grad()
            outputs = discriminator(fake_images)
            g_loss = criterion(outputs, real_labels)
            g_loss.backward()
            optimizer_G.step()

            d_losses.append(d_loss.item())
            g_losses.append(g_loss.item())

            if (i + 1) % 10 == 0:
                print(f'Epoch [{epoch+1}/{num_epochs}], Batch [{i+1}/{len(dataloader)}], '
                      f'D Loss: {d_loss.item():.4f}, G Loss: {g_loss.item():.4f}')

        # Save image grids: fixed noise (progress across epochs) and fresh noise
        with torch.no_grad():
            fake_fixed = generator(fixed_z)
            vutils.save_image(fake_fixed, f'fixed_progress_epoch_{epoch+1:02d}.png',
                              normalize=True, nrow=8)
            z = torch.randn(32, latent_dim, device=device)
            fake_samples = generator(z)
            vutils.save_image(fake_samples, f'fake_samples_epoch_{epoch+1:02d}.png',
                              normalize=True, nrow=8)
            print(f'Saved samples for epoch {epoch+1}')

        if (epoch + 1) % 10 == 0:
            torch.save({
                'epoch': epoch + 1,
                'generator_state_dict': generator.state_dict(),
                'discriminator_state_dict': discriminator.state_dict(),
                'optimizer_G_state_dict': optimizer_G.state_dict(),
                'optimizer_D_state_dict': optimizer_D.state_dict(),
                'g_losses': g_losses,
                'd_losses': d_losses
            }, f'checkpoint_epoch_{epoch+1:02d}.pt')
            print(f"Checkpoint saved for epoch {epoch+1}")

        # Every 2 epochs: save the loss arrays to CSV for later plotting, and plot the curves
        if (epoch + 1) % 2 == 0:
            pd.DataFrame({'G_loss': g_losses, 'D_loss': d_losses}) \
                .to_csv(f'losses_epoch_{epoch+1:02d}.csv', index=False)
            plt.figure(figsize=(10, 5))
            plt.title(f"Generator and Discriminator Loss - Epoch {epoch+1}")
            plt.plot(g_losses, label="G Loss")
            plt.plot(d_losses, label="D Loss")
            plt.xlabel("Iterations")
            plt.ylabel("Loss")
            plt.legend()
            plt.savefig(f'loss_plot_epoch_{epoch+1:02d}.png')
            plt.close()


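Per-iteration GAN losses are very noisy, so judging convergence (Task 2c) is easier on a smoothed curve. A minimal sketch, assuming a CSV with the `G_loss` and `D_loss` columns written by `train_gan` above; it uses only the standard library and a trailing moving average. The demo writes a tiny file with made-up placeholder numbers (not training results) to a temporary directory so the snippet runs on its own.

```python
import csv
import os
import tempfile


def moving_average(values, window):
    """Trailing moving average; the first window-1 points average what is available."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


def load_losses(path):
    """Read the G_loss / D_loss columns written by pandas.DataFrame.to_csv."""
    with open(path, newline="") as fh:
        rows = list(csv.DictReader(fh))
    return [float(r["G_loss"]) for r in rows], [float(r["D_loss"]) for r in rows]


# Self-contained demo with placeholder numbers
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "losses_epoch_02.csv")
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["G_loss", "D_loss"])
        writer.writerows([[1.0, 0.6], [2.0, 0.4], [3.0, 0.5], [4.0, 0.7]])
    g, d = load_losses(path)

print(moving_average(g, window=2))  # [1.0, 1.5, 2.5, 3.5]
```

For the real CSV files, a window of a few hundred iterations is a reasonable starting point before plotting; the exact value is a matter of taste.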

In [None]:
# Main training loop
print("Starting training...")

train_gan(generator, discriminator, dataloader, num_epochs=num_epochs)
print("Training completed!")

Starting training...
Epoch [1/10], Batch [20/3125], D Loss: 0.6065, G Loss: 0.9074
Epoch [1/10], Batch [40/3125], D Loss: 0.4467, G Loss: 1.5799
Epoch [1/10], Batch [60/3125], D Loss: 0.6644, G Loss: 1.3866
Epoch [1/10], Batch [80/3125], D Loss: 0.5484, G Loss: 1.6885
Epoch [1/10], Batch [100/3125], D Loss: 0.4316, G Loss: 1.6188
Epoch [1/10], Batch [120/3125], D Loss: 0.4619, G Loss: 1.4403
Epoch [1/10], Batch [140/3125], D Loss: 0.4939, G Loss: 1.5904
Epoch [1/10], Batch [160/3125], D Loss: 0.6754, G Loss: 1.3510
Epoch [1/10], Batch [180/3125], D Loss: 0.6325, G Loss: 1.1295
Epoch [1/10], Batch [200/3125], D Loss: 0.6710, G Loss: 0.9767
Epoch [1/10], Batch [220/3125], D Loss: 0.6282, G Loss: 0.9186
Epoch [1/10], Batch [240/3125], D Loss: 0.7062, G Loss: 0.8775
Epoch [1/10], Batch [260/3125], D Loss: 0.5164, G Loss: 1.0150
Epoch [1/10], Batch [280/3125], D Loss: 0.6876, G Loss: 1.0355
Epoch [1/10], Batch [300/3125], D Loss: 0.4625, G Loss: 1.2416
Epoch [1/10], Batch [320/3125], D Loss