## Part 0: Repurposing the DC-GAN Discriminator as a Classifier

The code below trains a binary classifier that detects whether the person in a photo has bangs. Its architecture follows the convolutional discriminator used in DCGANs, which suits image classification because it is already built to reduce an image to a single yes/no decision.

In an actual DC-GAN, the discriminator competes with the generator through adversarial training in order to tell real images from fake ones. In a similar fashion, I wanted a classifier that can distinguish a person with a specific feature (in this case, bangs) from a person without it.

The goal is to use this classifier as an accuracy metric for the images produced later by Stable Diffusion and by DC-GANs trained with different hyperparameter combinations.

In [5]:
!pip install torchvision matplotlib



In [6]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
from torchvision.utils import make_grid
import matplotlib.pyplot as plt
import numpy as np
import os

In [15]:
# Load CelebA with the attribute labels and apply the image transformations
transform = transforms.Compose([transforms.Resize(64), transforms.CenterCrop(64), transforms.ToTensor(), transforms.Normalize([0.5]*3, [0.5]*3)])
celeba_dataset = datasets.CelebA(root='./data', split='train', target_type='attr', transform=transform, download=True)

# Extract Bangs attribute (chosen arbitrarily) & convert attributes to 0 or 1
attr_idx = celeba_dataset.attr_names.index("Bangs")
celeba_dataset.attr = (celeba_dataset.attr[:, attr_idx : attr_idx + 1] == 1).long()

# DataLoader
batch_size = 128
train_loader = DataLoader(celeba_dataset, batch_size=batch_size, shuffle=True, num_workers=2)

FileURLRetrievalError: Failed to retrieve file url:

	Too many users have viewed or downloaded this file recently. Please
	try accessing the file again later. If the file you are trying to
	access is particularly large or is shared with many people, it may
	take up to 24 hours to be able to view or download the file. If you
	still can't access a file after 24 hours, contact your domain
	administrator.

You may still be able to access the file from the browser:

	https://drive.google.com/uc?id=0B7EVK8r0v71pZjFTYXZWM3FlRnM

but Gdown can't. Please check connections and permissions.

In [8]:
# Using discriminator architecture to develop a DCGAN binary classifier (bangs or not)
class DCGANClassifier(nn.Module):
    def __init__(self):
        super(DCGANClassifier, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1),
            nn.LeakyReLU(0.2, inplace=True),

            nn.Conv2d(64, 128, 4, 2, 1),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),

            nn.Conv2d(128, 256, 4, 2, 1),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),

            nn.Conv2d(256, 512, 4, 2, 1),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),

            nn.Flatten(),
            nn.Linear(8192, 1)
        )

    def forward(self, x):
        return self.model(x)
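The input size of `nn.Linear(8192, 1)` follows from the convolution arithmetic: each `Conv2d(..., 4, 2, 1)` layer halves the spatial size, so a 64x64 input shrinks to 4x4 with 512 channels. A minimal, dependency-free sketch of that calculation (the helper name `conv_out` is illustrative and not part of the notebook):

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Output side length of a square convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 64                      # input side length after Resize + CenterCrop
for _ in range(4):             # the four stride-2 convolution layers
    size = conv_out(size)

flat_features = 512 * size * size   # channels * height * width after Flatten
print(size, flat_features)          # 4 8192
```

If the input resolution or the number of convolution layers changes, the `Linear` layer's input size has to be recomputed the same way.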

In [9]:
# Training loop preparation variables
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = DCGANClassifier().to(device)

criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=2e-4, betas=(0.5, 0.999))

epochs = 10

In [None]:
# Forward pass: loss calculation, backpropagation, and accuracy tracking
for epoch in range(epochs):
    total_loss = 0
    correct = 0
    total = 0

    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device).float()

        logits = model(images)
        loss = criterion(logits, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        preds = torch.sigmoid(logits) > 0.5
        correct += (preds == labels).sum().item()
        total += labels.size(0)

    acc = 100 * correct / total
    avg_loss = total_loss / len(train_loader)  # mean batch loss, comparable across epochs
    print(f"Epoch {epoch+1}/{epochs} | Loss: {avg_loss:.4f} | Accuracy: {acc:.2f}%")

NameError: name 'train_loader' is not defined
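The accuracy tracking in the loop thresholds `sigmoid(logits)` at 0.5, which is the same decision as checking whether the raw logit is positive. The sketch below illustrates this with plain Python and made-up numbers (the logits and labels are invented for illustration, not taken from a training run):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

logits = [-2.0, -0.1, 0.3, 4.0]                       # hypothetical classifier outputs
by_probability = [sigmoid(x) > 0.5 for x in logits]   # what the training loop does
by_logit = [x > 0.0 for x in logits]                  # equivalent shortcut
assert by_probability == by_logit

labels = [0, 0, 1, 0]                                 # hypothetical ground truth
correct = sum(int(p) == y for p, y in zip(by_logit, labels))
accuracy = 100 * correct / len(labels)
print(accuracy)   # 75.0
```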

In [None]:
torch.save(model.state_dict(), "dcgan_bangs_classifier.pt")

## Part 1: Generate images with Stable Diffusion

This code creates Dataset #1, our Stable Diffusion baseline. It generates 120 synthetic images (30 per category: male/female, with/without bangs) that will later be evaluated to see how accurately Stable Diffusion produces images with the specified gender and bangs attributes. The results will be compared against several image sets generated by DC-GAN-based models.

**View the generated images [here.](https://drive.google.com/drive/folders/1lOI5gYbLT8DSywnhIT-NkQ047QE4AI-2?usp=drive_link)**
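One way the later evaluation could be organized is to derive the expected bangs label from each category folder name and compare it with the classifier's prediction. This is only a sketch under assumptions: the helper names are hypothetical, the predictions are made up, and the actual evaluation code is not part of this section.

```python
def expected_bangs(folder):
    """True if the category folder name denotes bangs (e.g. 'woman_bangs')."""
    return folder.endswith("_bangs") and not folder.endswith("_no_bangs")

def category_accuracy(predictions):
    """predictions maps folder name -> list of booleans (True = predicted bangs).
    Returns the accuracy per folder in percent."""
    results = {}
    for folder, preds in predictions.items():
        target = expected_bangs(folder)
        hits = sum(1 for p in preds if p == target)
        results[folder] = 100.0 * hits / len(preds)
    return results

# Made-up predictions purely for illustration (not results from this project).
demo = {
    "woman_bangs": [True, True, False, True],
    "man_no_bangs": [False, False, False, True],
}
scores = category_accuracy(demo)
print(scores)
```

Note that a bangs classifier only checks the bangs attribute; whether the generated person matches the requested gender would need a separate check.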

In [None]:
!pip install diffusers transformers accelerate torchvision



In [32]:
# Mounting Drive to permanently store all images -> https://drive.google.com/drive/u/0/folders/1OefCAcaPU6-Xk801LSuJufcC-z35mKEN
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
from diffusers import StableDiffusionPipeline
import torch
from PIL import Image
import os
from torchvision import transforms
from tqdm import tqdm

In [None]:
# Load the Stable Diffusion v1.5 model from Hugging Face; attention slicing reduces memory usage
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32)
pipe = pipe.to(device)
pipe.enable_attention_slicing()


In [None]:
# Prompt Engineering
prompts = {
    "woman_bangs": "a portrait photo of a woman with bangs",
    "woman_no_bangs": "a portrait photo of a woman without bangs",
    "man_bangs": "a portrait photo of a man with bangs",
    "man_no_bangs": "a portrait photo of a man without bangs" }

# Creating respective folders in Drive per category
drive_path = "/content/drive/MyDrive/COGS185-FinalProject/CelebA-StableDiffusion"
output_dir = "stable_diffusion_images"
os.makedirs(output_dir, exist_ok=True)
for label in prompts:
    os.makedirs(os.path.join(drive_path, label), exist_ok=True)

In [None]:
# Image generation with Stable Diffusion
num_images_per_prompt = 30
for label, prompt in prompts.items():
    print(f"Generating images for prompt: '{prompt}'")
    for i in tqdm(range(num_images_per_prompt)):
        image = pipe(prompt).images[0]
        image.save(os.path.join(drive_path, label, f"{label}_{i}.png"))

Generating images for prompt: 'a portrait photo of a woman with bangs'
100%|██████████| 30/30 [05:15<00:00, 10.53s/it]

Generating images for prompt: 'a portrait photo of a woman without bangs'
100%|██████████| 30/30 [05:15<00:00, 10.52s/it]

Generating images for prompt: 'a portrait photo of a man with bangs'
100%|██████████| 30/30 [05:14<00:00, 10.49s/it]

Generating images for prompt: 'a portrait photo of a man without bangs'
100%|██████████| 30/30 [05:17<00:00, 10.58s/it]


## Part 2.0: Setting up Conditional DC-GAN

The code below sets up a conditional DC-GAN so that we can generate images for a given combination of gender and presence of bangs. The actual generation happens in Parts 2.1-2.4; this section defines everything the DC-GANs need up front, so that we can train multiple iterations with different hyperparameters later.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets, transforms
import os
from tqdm import tqdm

In [None]:
# Converting class labels into 1-hot vectors
def one_hot(labels, num_classes=4):
    return F.one_hot(labels, num_classes).float()

In [None]:
# Takes both noise and class labels as input to specify generations
class ConditionalGenerator(nn.Module):
    def __init__(self, z_dim=100, num_classes=4, img_channels=3, feature_maps=64):
        super().__init__()
        self.z_dim = z_dim
        self.num_classes = num_classes
        self.input_dim = z_dim + num_classes # random noise + classes

        self.gen = nn.Sequential(
            nn.ConvTranspose2d(self.input_dim, feature_maps * 8, 4, 1, 0, bias=False),  # 4x4
            nn.BatchNorm2d(feature_maps * 8),
            nn.ReLU(True),

            nn.ConvTranspose2d(feature_maps * 8, feature_maps * 4, 4, 2, 1, bias=False),  # 8x8
            nn.BatchNorm2d(feature_maps * 4),
            nn.ReLU(True),

            nn.ConvTranspose2d(feature_maps * 4, feature_maps * 2, 4, 2, 1, bias=False),  # 16x16
            nn.BatchNorm2d(feature_maps * 2),
            nn.ReLU(True),

            nn.ConvTranspose2d(feature_maps * 2, feature_maps, 4, 2, 1, bias=False),  # 32x32
            nn.BatchNorm2d(feature_maps),
            nn.ReLU(True),

            nn.ConvTranspose2d(feature_maps, img_channels, 4, 2, 1, bias=False),  # 64x64
            nn.Tanh()
        )

    def forward(self, z, labels):
        one_hot_labels = one_hot(labels, self.num_classes).to(z.device)  # [B, 4]
        x = torch.cat([z, one_hot_labels], dim=1)  # [B, z_dim + 4]
        x = x.view(x.size(0), -1, 1, 1)  # [B, z_dim+4, 1, 1]
        return self.gen(x)
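The size comments in the generator (`# 4x4` up to `# 64x64`) can be checked with the transposed-convolution size formula. The sketch below is plain Python with an illustrative helper name (`deconv_out` is not part of the notebook) and assumes no `output_padding` and dilation 1, as in the layers above.

```python
def deconv_out(size, kernel, stride, padding):
    """Output side length of a square transposed convolution
    (no output_padding, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel

z_dim, num_classes = 100, 4
input_channels = z_dim + num_classes   # noise concatenated with the one-hot label

sizes = [1]                                   # generator input is [B, 104, 1, 1]
sizes.append(deconv_out(sizes[-1], 4, 1, 0))  # first layer: stride 1, no padding
for _ in range(4):                            # four stride-2 upsampling layers
    sizes.append(deconv_out(sizes[-1], 4, 2, 1))

print(input_channels, sizes)   # 104 [1, 4, 8, 16, 32, 64]
```

The final 64x64 output matches the resolution the real images are resized to, which is what lets the discriminator see real and generated images in the same shape.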

In [None]:
# Pre-processing data for labeling
# `transform` is re-declared here (same pipeline as in Part 0) so that this cell runs on its own.
transform = transforms.Compose([transforms.Resize(64), transforms.CenterCrop(64), transforms.ToTensor(), transforms.Normalize([0.5]*3, [0.5]*3)])
celeba = datasets.CelebA(root='./data', split='train', target_type='attr', download=True, transform=transform)
bangs_idx = celeba.attr_names.index("Bangs")
gender_idx = celeba.attr_names.index("Male")

# Female = 0, Male = 1; Bangs = 1, No Bangs = 0
celeba_classes = []
for attr in celeba.attr:
    gender = attr[gender_idx].item()
    bangs = attr[bangs_idx].item()
    if gender == 0 and bangs == 1: # woman with bangs
        label = 0
    elif gender == 0 and bangs == 0: # woman without bangs
        label = 1
    elif gender == 1 and bangs == 1: # man with bangs
        label = 2
    else:                           # man without bangs
        label = 3
    celeba_classes.append(label)
celeba.targets = celeba_classes
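The four-way labeling can also be written as a single arithmetic expression, which makes the encoding easy to verify. The sketch below is plain Python with illustrative function names, and it assumes the attribute values are 0/1, as the labeling code in this section does.

```python
def class_index(male, bangs):
    """Map 0/1 attributes to the 4-class label used in this project:
    0 woman with bangs, 1 woman without, 2 man with bangs, 3 man without."""
    return 2 * male + (1 - bangs)

def class_index_chain(male, bangs):
    """Reference version mirroring an explicit if/elif chain."""
    if male == 0 and bangs == 1:
        return 0
    elif male == 0 and bangs == 0:
        return 1
    elif male == 1 and bangs == 1:
        return 2
    return 3

table = {(m, b): class_index(m, b) for m in (0, 1) for b in (0, 1)}
print(table)
```

Under the same 0/1 assumption, the expression could be applied to whole attribute columns at once instead of looping over every image.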

In [None]:
# Takes both images and class labels to make authenticity judgments.
class ConditionalDiscriminator(nn.Module):
    def __init__(self, img_channels=3, num_classes=4, feature_maps=64):
        super().__init__()
        self.label_embed = nn.Embedding(num_classes, num_classes)

        self.disc = nn.Sequential(
            nn.Conv2d(img_channels + num_classes, feature_maps, 4, 2, 1),
            nn.LeakyReLU(0.2),

            nn.Conv2d(feature_maps, feature_maps * 2, 4, 2, 1),
            nn.BatchNorm2d(feature_maps * 2),
            nn.LeakyReLU(0.2),

            nn.Conv2d(feature_maps * 2, feature_maps * 4, 4, 2, 1),
            nn.BatchNorm2d(feature_maps * 4),
            nn.LeakyReLU(0.2),

            nn.Conv2d(feature_maps * 4, feature_maps * 8, 4, 2, 1),
            nn.BatchNorm2d(feature_maps * 8),
            nn.LeakyReLU(0.2),

            nn.Conv2d(feature_maps * 8, 1, 4, 1, 0),
            nn.Sigmoid()
        )

    def forward(self, x, labels):
        label_map = self.label_embed(labels)
        label_map = label_map.unsqueeze(2).unsqueeze(3)
        label_map = label_map.expand(-1, -1, 64, 64)

        x = torch.cat([x, label_map], dim=1)
        return self.disc(x)

In [None]:
# Adapts the original CelebA dataset to work with 4 class labels
class CelebAWithLabels(Dataset):
    def __init__(self, celeba_dataset):
        self.celeba = celeba_dataset
        self.targets = celeba_dataset.targets

    def __len__(self):
        return len(self.celeba)

    def __getitem__(self, idx):
        img, _ = self.celeba[idx]
        label = self.targets[idx]
        return img, torch.tensor(label)

train_loader = DataLoader(CelebAWithLabels(celeba), batch_size=128, shuffle=True, num_workers=2)

In [None]:
# After training, this function generates the evaluation image datasets.
def save_generated_images(generator, z_dim, label_map, save_dir, num_per_class=30):
    generator.eval()
    os.makedirs(save_dir, exist_ok=True)

    for folder in label_map.values():
        os.makedirs(os.path.join(save_dir, folder), exist_ok=True)

    with torch.no_grad():
        for class_label in range(len(label_map)):
            label = torch.tensor([class_label] * num_per_class).to(device)
            z = torch.randn(num_per_class, z_dim).to(device)
            imgs = generator(z, label)
            subfolder = label_map[class_label]

            for i, img in enumerate(imgs):
                img = (img * 0.5 + 0.5).clamp(0, 1)  # Denormalize
                path = os.path.join(save_dir, subfolder, f"{subfolder}_{i}.png")
                transforms.ToPILImage()(img.cpu()).save(path)
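The `img * 0.5 + 0.5` step undoes the `Normalize([0.5]*3, [0.5]*3)` transform applied to the training images, mapping the generator's Tanh range of [-1, 1] back to [0, 1] pixel values. A small per-value sketch of that round trip in plain Python (the function names are illustrative):

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-value effect of transforms.Normalize with mean 0.5 and std 0.5."""
    return (x - mean) / std

def denormalize(y, mean=0.5, std=0.5):
    """Inverse mapping, then clamp to the valid [0, 1] pixel range."""
    return min(1.0, max(0.0, y * std + mean))

# Round trip: denormalizing a normalized pixel recovers the original value.
for pixel in (0.0, 0.25, 1.0):
    assert abs(denormalize(normalize(pixel)) - pixel) < 1e-12

print(normalize(0.0), normalize(1.0))   # -1.0 1.0
print(denormalize(1.3))                 # 1.0 (clamped)
```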

## Part 2.1: Generate images with DC-GAN (Adam Optimizer, BCEWithLogitsLoss)

This is DCGAN Version 1 with the following hyperparameters:
- Loss Function: Binary Cross Entropy (BCE). The code uses `nn.BCELoss` on the discriminator's sigmoid output, which is mathematically equivalent to BCE with Logits Loss on raw scores.
- Optimizer: Adam
- Learning Rate: 2e-4
- Beta parameters: (0.5, 0.999)

For the purposes of this project, we try different loss functions and optimizers but keep the learning rate (and, for Adam, the beta parameters) constant.

In this iteration, we use the Adam optimizer with the BCE loss. The code below implements the training loop and image generation for the first DC-GAN version.

**View the generated images [here.](https://drive.google.com/drive/folders/1psrANSAa3upNZ4CTMR2kiGynUbGKtzuC?usp=drive_link)**
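Binary cross-entropy can be computed either on a probability (after a sigmoid) or directly on a raw logit, and the two give the same value. The sketch below checks this numerically in plain Python; the function names are illustrative and the logit formula is the standard numerically stable form.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_on_probability(p, target):
    """Binary cross-entropy on a probability p in (0, 1)."""
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def bce_on_logit(x, target):
    """Stable BCE on a raw logit: max(x, 0) - x * target + log(1 + exp(-|x|))."""
    return max(x, 0.0) - x * target + math.log1p(math.exp(-abs(x)))

for logit in (-3.0, -0.5, 0.0, 2.0):
    for target in (0.0, 1.0):
        a = bce_on_probability(sigmoid(logit), target)
        b = bce_on_logit(logit, target)
        assert abs(a - b) < 1e-9   # both routes agree
```

The practical consequence is that the loss has to match the network's last layer: a probability-based loss goes with a final sigmoid, a logit-based loss with raw outputs. Combining a final sigmoid with a logit-based loss squashes the output twice.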

In [None]:
z_dim = 100
epochs = 10

# Instantiating Generator and Discriminator
G = ConditionalGenerator(z_dim=z_dim).to(device)
D = ConditionalDiscriminator().to(device)

criterion = nn.BCELoss()
lr = 2e-4

opt_G = torch.optim.Adam(G.parameters(), lr=lr, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=lr, betas=(0.5, 0.999))

G.train()
D.train()

In [None]:
# Adversarial Training Loop
for epoch in range(epochs):
    loop = tqdm(train_loader, desc=f"Adam+BCE Epoch {epoch+1}/{epochs}")
    for real_images, labels in loop:
        batch_size = real_images.size(0)
        real_images = real_images.to(device)
        labels = labels.to(device)

        # Train Discriminator -> Generate fake images using current generator & calculate real/fake/total loss
        z = torch.randn(batch_size, z_dim).to(device)
        fake_images = G(z, labels)

        real_labels = torch.ones(batch_size, 1, 1, 1).to(device)
        fake_labels = torch.zeros(batch_size, 1, 1, 1).to(device)

        # Real loss
        D_real = D(real_images, labels)
        loss_real = criterion(D_real, real_labels)

        # Fake loss
        D_fake = D(fake_images.detach(), labels) # prevents gradients from flowing back to generator
        loss_fake = criterion(D_fake, fake_labels)

        loss_D = loss_real + loss_fake

        D.zero_grad()
        loss_D.backward()
        opt_D.step()

        # Train Generator -> Generate new fake images to make discriminator output "1" for fake images
        z = torch.randn(batch_size, z_dim).to(device)
        fake_images = G(z, labels)

        D_pred = D(fake_images, labels)
        loss_G = criterion(D_pred, real_labels)  # wants D to believe it's real

        G.zero_grad()
        loss_G.backward()
        opt_G.step()

        loop.set_postfix({
            "D_loss": loss_D.item(),
            "G_loss": loss_G.item()
        })

Epoch 1/10: 100%|██████████| 1272/1272 [05:10<00:00,  4.09it/s, D_loss=0.821, G_loss=2.14]
Epoch 2/10: 100%|██████████| 1272/1272 [04:57<00:00,  4.28it/s, D_loss=0.742, G_loss=2.79]
Epoch 3/10: 100%|██████████| 1272/1272 [04:51<00:00,  4.36it/s, D_loss=0.68, G_loss=1.7]
Epoch 4/10: 100%|██████████| 1272/1272 [04:50<00:00,  4.37it/s, D_loss=1.21, G_loss=4.12]
Epoch 5/10: 100%|██████████| 1272/1272 [04:44<00:00,  4.47it/s, D_loss=0.524, G_loss=1.37]
Epoch 6/10: 100%|██████████| 1272/1272 [04:43<00:00,  4.48it/s, D_loss=0.651, G_loss=3.75]
Epoch 7/10: 100%|██████████| 1272/1272 [04:39<00:00,  4.55it/s, D_loss=0.353, G_loss=3.86]
Epoch 8/10: 100%|██████████| 1272/1272 [04:38<00:00,  4.56it/s, D_loss=1.27, G_loss=1.78]
Epoch 9/10: 100%|██████████| 1272/1272 [04:40<00:00,  4.54it/s, D_loss=0.354, G_loss=6.23]
Epoch 10/10:  31%|███       | 392/1272 [01:24<03:07,  4.69it/s, D_loss=0.111, G_loss=4.92]

In [None]:
# Save model
torch.save(G.state_dict(), "cDCGAN_generator.pt")
torch.save(D.state_dict(), "cDCGAN_discriminator.pt")

In [None]:
# Generate images and store them to Drive
label_map = {
    0: "woman_bangs",
    1: "woman_no_bangs",
    2: "man_bangs",
    3: "man_no_bangs"
}

save_generated_images(G, z_dim=100, label_map=label_map, save_dir="/content/drive/MyDrive/COGS185-FinalProject/CelebA-DCGAN-Adam-BCEWithLogitsLoss")

## Part 2.2: Generate images with DC-GAN (Adam Optimizer, HingeLoss)

This is DCGAN Version 2 with the following hyperparameters:
- Loss Function: Hinge Loss
- Optimizer: Adam
- Learning Rate: 2e-4
- Beta parameters: (0.5, 0.999)

**View the generated images [here.](https://drive.google.com/drive/folders/1GaFuSu5kgxTPsspi7XJM9xvRKcuqUeB4?usp=drive_link)**

In [None]:
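# Same Adam optimizers as before
# Note: G and D are not re-instantiated in this cell, so as written training continues
# from whatever weights the previous section left in them.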
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

epochs = 10
z_dim = 100

G.train()
D.train()

ConditionalDiscriminator(
  (label_embed): Embedding(4, 4)
  (disc): Sequential(
    (0): Conv2d(7, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.2)
    (2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (4): LeakyReLU(negative_slope=0.2)
    (5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (6): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): LeakyReLU(negative_slope=0.2)
    (8): Conv2d(256, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (9): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (10): LeakyReLU(negative_slope=0.2)
    (11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1))
    (12): Sigmoid()
  )
)

In [None]:
for epoch in range(epochs):
    loop = tqdm(train_loader, desc=f"Adam+Hinge Epoch {epoch+1}/{epochs}")

    for real_images, labels in loop:
        batch_size = real_images.size(0)
        real_images = real_images.to(device)
        labels = labels.to(device)

        # Generate fake images
        z = torch.randn(batch_size, z_dim).to(device)
        fake_images = G(z, labels)

        # Train Discriminator
        D_real = D(real_images, labels)
        D_fake = D(fake_images.detach(), labels)

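        # Hinge loss for Discriminator
        # NOTE: D ends in nn.Sigmoid(), so D_real and D_fake lie in (0, 1). The hinge
        # margins assume unbounded scores, so loss_D cannot fall below 1 here.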
        loss_D_real = F.relu(1.0 - D_real).mean()
        loss_D_fake = F.relu(1.0 + D_fake).mean()
        loss_D = loss_D_real + loss_D_fake

        D.zero_grad()
        loss_D.backward()
        opt_D.step()

        # Train Generator
        z = torch.randn(batch_size, z_dim).to(device)
        fake_images = G(z, labels)

        D_fake_for_G = D(fake_images, labels)

        # Hinge loss for Generator
        loss_G = -D_fake_for_G.mean()

        G.zero_grad()
        loss_G.backward()
        opt_G.step()

        loop.set_postfix({
            "D_loss": loss_D.item(),
            "G_loss": loss_G.item()
        })

Adam+Hinge Epoch 1/10: 100%|██████████| 1272/1272 [04:44<00:00,  4.47it/s, D_loss=1, G_loss=-6.83e-6]
Adam+Hinge Epoch 2/10: 100%|██████████| 1272/1272 [04:37<00:00,  4.59it/s, D_loss=1, G_loss=-2.34e-6]
Adam+Hinge Epoch 3/10: 100%|██████████| 1272/1272 [04:40<00:00,  4.54it/s, D_loss=2, G_loss=-2.69e-10]
Adam+Hinge Epoch 4/10: 100%|██████████| 1272/1272 [04:48<00:00,  4.41it/s, D_loss=2, G_loss=-5.3e-10]
Adam+Hinge Epoch 5/10: 100%|██████████| 1272/1272 [04:44<00:00,  4.47it/s, D_loss=2, G_loss=-3.38e-8]
Adam+Hinge Epoch 6/10: 100%|██████████| 1272/1272 [04:29<00:00,  4.72it/s, D_loss=2, G_loss=-0.451]
Adam+Hinge Epoch 7/10: 100%|██████████| 1272/1272 [04:30<00:00,  4.71it/s, D_loss=2, G_loss=-0.524]
Adam+Hinge Epoch 8/10: 100%|██████████| 1272/1272 [04:28<00:00,  4.74it/s, D_loss=2, G_loss=-0.5]
Adam+Hinge Epoch 9/10: 100%|██████████| 1272/1272 [04:27<00:00,  4.75it/s, D_loss=2, G_loss=-0.463]
Adam+Hinge Epoch 10/10: 100%|██████████| 1272/1272 [04:28<00:00,  4.74it/s, D_loss=2, G_los
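Hinge loss assumes the discriminator returns unbounded raw scores, whereas `ConditionalDiscriminator` ends in `nn.Sigmoid()` and therefore outputs values in (0, 1). The dependency-free sketch below (plain Python, with illustrative helper names that are not part of the notebook) shows what that does to the discriminator loss. The saturated values it produces, 1 and 2, also appear in the `D_loss` column of the log above, which is consistent with the loss having bottomed out rather than converged.

```python
def relu(x):
    return max(0.0, x)

def hinge_d_loss(real_scores, fake_scores):
    """Discriminator hinge loss: mean(relu(1 - D(real))) + mean(relu(1 + D(fake)))."""
    real_term = sum(relu(1.0 - s) for s in real_scores) / len(real_scores)
    fake_term = sum(relu(1.0 + s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term

# Unbounded raw scores: both margins can be met, so the loss can reach 0.
print(hinge_d_loss([1.5, 2.0], [-1.2, -3.0]))   # 0.0

# Scores squashed into [0, 1]: the fake-side margin D(fake) <= -1 is unreachable,
# so the loss cannot drop below 1.
print(hinge_d_loss([1.0, 1.0], [0.0, 0.0]))     # 1.0
print(hinge_d_loss([0.5, 0.5], [0.5, 0.5]))     # 2.0
```

A common remedy, not applied in this notebook, is to drop the final sigmoid from the discriminator when training with hinge loss so that its outputs are raw scores.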

In [None]:
label_map = {
    0: "woman_bangs",
    1: "woman_no_bangs",
    2: "man_bangs",
    3: "man_no_bangs"
}

save_generated_images(G, z_dim=100, label_map=label_map, save_dir="/content/drive/MyDrive/COGS185-FinalProject/CelebA-DCGAN-Adam-HingeLoss")

## Part 2.3: Generate images with DC-GAN (RMSprop, BCEWithLogitsLoss)

This is DCGAN Version 3 with the following hyperparameters:
- Loss Function: Binary Cross Entropy (BCE) with Logits Loss
- Optimizer: RMSprop with weight decay + gradient clipping
- Learning Rate: 2e-4
- Beta parameters: not applicable (RMSprop takes no betas; the code leaves its smoothing constant `alpha` at the PyTorch default)

**View the generated images [here.](https://drive.google.com/drive/folders/1FxhzIk3ybwIdL9N5ohq01Pf-CG-eJOX3?usp=drive_link)**
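Gradient clipping limits the size of an update step by rescaling gradients whose overall norm exceeds a threshold. The sketch below shows norm-based clipping on a plain Python list. It is an illustration only: the function name and the threshold are hypothetical, and this section does not show which clipping call or threshold the training loop actually uses.

```python
import math

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale gradient values so their joint L2 norm is at most max_norm.
    Returns (clipped_values, original_norm)."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + eps))   # never scale gradients up
    return [g * scale for g in grads], total_norm

clipped, norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(norm)      # 5.0
print(clipped)   # same direction as [3.0, 4.0], rescaled to norm of about 1

small, _ = clip_by_global_norm([0.1, 0.2], max_norm=1.0)
print(small)     # unchanged, because the norm is already below the threshold
```

In PyTorch, norm-based clipping is available as `torch.nn.utils.clip_grad_norm_` and value-based clipping as `torch.nn.utils.clip_grad_value_`; either would be called between `loss.backward()` and `optimizer.step()`.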

In [None]:
# RMSprop optimizer instead of Adam
opt_G = torch.optim.RMSprop(G.parameters(), lr=2e-4, weight_decay=1e-5)
opt_D = torch.optim.RMSprop(D.parameters(), lr=2e-4, weight_decay=1e-5)

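# Same criterion as the Adam + BCE run.
# Caveat: the discriminator printed below ends in Sigmoid(), while BCEWithLogitsLoss
# expects raw logits, so a sigmoid is effectively applied twice.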
criterion = nn.BCEWithLogitsLoss()

G.train()
D.train()

ConditionalDiscriminator(
  (label_embed): Embedding(4, 4)
  (disc): Sequential(
    (0): Conv2d(7, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.2)
    (2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (4): LeakyReLU(negative_slope=0.2)
    (5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (6): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): LeakyReLU(negative_slope=0.2)
    (8): Conv2d(256, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (9): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (10): LeakyReLU(negative_slope=0.2)
    (11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1))
    (12): Sigmoid()
  )
)

In [None]:
for epoch in range(epochs):
    loop = tqdm(train_loader, desc=f"RMSprop+BCE Epoch {epoch+1}/{epochs}")

    for real_images, labels in loop:
        batch_size = real_images.size(0)
        real_images = real_images.to(device)
        labels = labels.to(device)

        # Train Discriminator
        z = torch.randn(batch_size, z_dim).to(device)
        fake_images = G(z, labels)

        real_labels = torch.ones(batch_size, 1, 1, 1).to(device)
        fake_labels = torch.zeros(batch_size, 1, 1, 1).to(device)

        D_real = D(real_images, labels)
        D_fake = D(fake_images.detach(), labels)

        loss_real = criterion(D_real, real_labels)
        loss_fake = criterion(D_fake, fake_labels)
        loss_D = loss_real + loss_fake

        D.zero_grad()
        loss_D.backward()

        # Gradient clipping for D
        torch.nn.utils.clip_grad_norm_(D.parameters(), max_norm=1.0)

        opt_D.step()

        # Train Generator
        z = torch.randn(batch_size, z_dim).to(device)
        fake_images = G(z, labels)
        D_fake_pred = D(fake_images, labels)

        loss_G = criterion(D_fake_pred, real_labels)  # G wants D to believe fakes are real

        G.zero_grad()
        loss_G.backward()

        # Gradient clipping for G
        torch.nn.utils.clip_grad_norm_(G.parameters(), max_norm=1.0)

        opt_G.step()

        loop.set_postfix({
            "D_loss": loss_D.item(),
            "G_loss": loss_G.item()
        })

RMSprop+BCE Epoch 1/10: 100%|██████████| 1272/1272 [04:46<00:00,  4.43it/s, D_loss=1.39, G_loss=0.693]
RMSprop+BCE Epoch 2/10: 100%|██████████| 1272/1272 [04:44<00:00,  4.46it/s, D_loss=1.39, G_loss=0.693]
RMSprop+BCE Epoch 3/10: 100%|██████████| 1272/1272 [04:42<00:00,  4.51it/s, D_loss=1.39, G_loss=0.693]
RMSprop+BCE Epoch 4/10: 100%|██████████| 1272/1272 [04:36<00:00,  4.61it/s, D_loss=1.39, G_loss=0.693]
RMSprop+BCE Epoch 5/10: 100%|██████████| 1272/1272 [04:27<00:00,  4.76it/s, D_loss=1.39, G_loss=0.693]
RMSprop+BCE Epoch 6/10: 100%|██████████| 1272/1272 [04:27<00:00,  4.76it/s, D_loss=1.39, G_loss=0.693]
RMSprop+BCE Epoch 7/10: 100%|██████████| 1272/1272 [04:41<00:00,  4.52it/s, D_loss=1.39, G_loss=0.693]
RMSprop+BCE Epoch 8/10: 100%|██████████| 1272/1272 [04:39<00:00,  4.55it/s, D_loss=1.39, G_loss=0.691]
RMSprop+BCE Epoch 9/10: 100%|██████████| 1272/1272 [04:28<00:00,  4.73it/s, D_loss=1.38, G_loss=0.679]
RMSprop+BCE Epoch 10/10: 100%|██████████| 1272/1272 [04:35<00:00,  4.62it
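
The logged values sit close to 2 ln 2 ≈ 1.386 and ln 2 ≈ 0.693. Those are the values binary cross-entropy on logits takes when the input it receives is 0, regardless of the target. One possible reading is that the discriminator's output stayed near zero for most of training, though the logs alone cannot confirm this. A small pure-Python check of the arithmetic follows; `bce_with_logits` is a hypothetical stand-in for the per-element formula, not the PyTorch function itself.

```python
import math

def bce_with_logits(logit, target):
    """Per-element binary cross-entropy on a raw logit (numerically stable form)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

loss_real = bce_with_logits(0.0, 1.0)   # real image, target 1
loss_fake = bce_with_logits(0.0, 0.0)   # fake image, target 0
loss_D = loss_real + loss_fake
loss_G = bce_with_logits(0.0, 1.0)      # generator wants fakes labelled 1
print(round(loss_D, 3), round(loss_G, 3))   # 1.386 0.693
```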

In [None]:
label_map = {
    0: "woman_bangs",
    1: "woman_no_bangs",
    2: "man_bangs",
    3: "man_no_bangs"
}

save_generated_images(G, z_dim=100, label_map=label_map, save_dir="/content/drive/MyDrive/COGS185-FinalProject/CelebA-DCGAN-RMSProp-BCEWithLogitsLoss")

## Part 2.4: Generate images with DC-GAN (RMSprop, HingeLoss)

This is DCGAN Version 4 with the following hyperparameters:
- Loss Function: Hinge Loss
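- Optimizer: RMSprop with weight decay (1e-5) and gradient clipping (max norm 1.0)
- Learning Rate: 2e-4
- Beta parameters: not used (RMSprop takes no `betas` argument; the optimizers below are created without one)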

**View the generated images [here.](https://drive.google.com/drive/folders/1dLsHrPqnk2krRTkXqreYNa3vriFB3R-R?usp=drive_link)**
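
As a rough illustration of the hinge objective named above, the sketch below evaluates the discriminator and generator terms on a handful of made-up scores. The helper names and the score values are assumptions chosen for the example. It also shows one consequence worth keeping in mind: if the discriminator's outputs are squashed into (0, 1), as the `Sigmoid()` layer in the printed architecture would do, the fake term `relu(1 + D_fake)` can never drop below 1.

```python
def relu(x):
    return x if x > 0 else 0.0

def mean(xs):
    return sum(xs) / len(xs)

def hinge_d_loss(d_real, d_fake):
    """Discriminator hinge loss: push real scores above +1 and fake scores below -1."""
    return mean([relu(1.0 - s) for s in d_real]) + mean([relu(1.0 + s) for s in d_fake])

def hinge_g_loss(d_fake):
    """Generator hinge loss: raise the discriminator's score on fakes."""
    return -mean(d_fake)

# Unbounded scores: a confident discriminator can reach zero loss.
print(hinge_d_loss([1.5, 2.0], [-1.2, -3.0]))   # 0.0
# Scores confined to (0, 1): the fake term alone is at least 1.
print(hinge_d_loss([0.9, 0.8], [0.1, 0.2]))
```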

In [None]:
opt_G = torch.optim.RMSprop(G.parameters(), lr=2e-4, weight_decay=1e-5)
opt_D = torch.optim.RMSprop(D.parameters(), lr=2e-4, weight_decay=1e-5)

epochs = 10
z_dim = 100

G.train()
D.train()

ConditionalDiscriminator(
  (label_embed): Embedding(4, 4)
  (disc): Sequential(
    (0): Conv2d(7, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.2)
    (2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (4): LeakyReLU(negative_slope=0.2)
    (5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (6): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): LeakyReLU(negative_slope=0.2)
    (8): Conv2d(256, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (9): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (10): LeakyReLU(negative_slope=0.2)
    (11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1))
    (12): Sigmoid()
  )
)

In [None]:
for epoch in range(epochs):
    loop = tqdm(train_loader, desc=f"[RMSprop+Hinge] Epoch {epoch+1}/{epochs}")

    for real_images, labels in loop:
        batch_size = real_images.size(0)
        real_images = real_images.to(device)
        labels = labels.to(device)

        # Train Discriminator
        z = torch.randn(batch_size, z_dim).to(device)
        fake_images = G(z, labels)

        D_real = D(real_images, labels)
        D_fake = D(fake_images.detach(), labels)

        # Hinge loss
        loss_D_real = F.relu(1.0 - D_real).mean()
        loss_D_fake = F.relu(1.0 + D_fake).mean()
        loss_D = loss_D_real + loss_D_fake

        D.zero_grad()
        loss_D.backward()
        torch.nn.utils.clip_grad_norm_(D.parameters(), max_norm=1.0)
        opt_D.step()

        # Train Generator
        z = torch.randn(batch_size, z_dim).to(device)
        fake_images = G(z, labels)
        D_fake_pred = D(fake_images, labels)

        loss_G = -D_fake_pred.mean()

        G.zero_grad()
        loss_G.backward()
        torch.nn.utils.clip_grad_norm_(G.parameters(), max_norm=1.0)
        opt_G.step()

        loop.set_postfix({
            "D_loss": loss_D.item(),
            "G_loss": loss_G.item()
        })

[RMSprop+Hinge] Epoch 1/10: 100%|██████████| 1272/1272 [05:09<00:00,  4.11it/s, D_loss=1.56, G_loss=-0.066]
[RMSprop+Hinge] Epoch 2/10: 100%|██████████| 1272/1272 [04:50<00:00,  4.38it/s, D_loss=1.34, G_loss=-0.00588]
[RMSprop+Hinge] Epoch 3/10: 100%|██████████| 1272/1272 [04:30<00:00,  4.70it/s, D_loss=1.43, G_loss=-0.0135]
[RMSprop+Hinge] Epoch 4/10: 100%|██████████| 1272/1272 [04:44<00:00,  4.47it/s, D_loss=1.58, G_loss=-0.0176]
[RMSprop+Hinge] Epoch 5/10: 100%|██████████| 1272/1272 [04:31<00:00,  4.68it/s, D_loss=1.19, G_loss=-0.00662]
[RMSprop+Hinge] Epoch 6/10: 100%|██████████| 1272/1272 [04:44<00:00,  4.47it/s, D_loss=1.15, G_loss=-0.281]
[RMSprop+Hinge] Epoch 7/10: 100%|██████████| 1272/1272 [04:26<00:00,  4.77it/s, D_loss=1.26, G_loss=-0.305]
[RMSprop+Hinge] Epoch 8/10: 100%|██████████| 1272/1272 [04:24<00:00,  4.80it/s, D_loss=1.27, G_loss=-0.00401]
[RMSprop+Hinge] Epoch 9/10: 100%|██████████| 1272/1272 [04:33<00:00,  4.64it/s, D_loss=1.37, G_loss=-0.00509]
[RMSprop+Hinge] Ep

In [None]:
label_map = {
    0: "woman_bangs",
    1: "woman_no_bangs",
    2: "man_bangs",
    3: "man_no_bangs"
}

save_generated_images(G, z_dim=100, label_map=label_map, save_dir="/content/drive/MyDrive/COGS185-FinalProject/CelebA-DCGAN-RMSProp-HingeLoss")

## Part 3.0: Fréchet Inception Distance (FID) Score

Goal: Measure distributional similarity between generated and real images

TLDR: Lower score is better.
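
For intuition, FID fits a Gaussian to the Inception features of each image set and compares the two fits: `FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^(1/2))`, where `mu` and `C` are the feature mean and covariance of the real (`r`) and generated (`g`) sets. The sketch below computes that quantity for arbitrary feature arrays with NumPy. It is only an illustration on random vectors (the function name `frechet_distance` and the toy data are assumptions), whereas the `pytorch-fid` tool used in this notebook extracts actual Inception features.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two (n_samples, dim) feature arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the eigenvalues
    # of cov_a @ cov_b, which are real and non-negative for covariance matrices.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(size=(2000, 8))
same = real.copy()
shifted = real + 3.0   # same covariance, mean moved by 3 in each of 8 dimensions

print(frechet_distance(real, same))      # close to 0
print(frechet_distance(real, shifted))   # close to 8 * 3**2 = 72
```

Because the covariance term is estimated from samples, the score becomes noisy when a set contains few images relative to the feature dimension.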

In [36]:
!pip install pytorch-fid

Collecting pytorch-fid
  Downloading pytorch_fid-0.3.0-py3-none-any.whl.metadata (5.3 kB)
Downloading pytorch_fid-0.3.0-py3-none-any.whl (15 kB)
Installing collected packages: pytorch-fid
Successfully installed pytorch-fid-0.3.0


In [None]:
from torchvision import datasets, transforms
from torchvision.utils import save_image
from PIL import Image  # used when loading the real images below
import os

In [None]:
# Creating local folders for FID comparison - one for real images, one for each generative model.
!mkdir -p /content/fid_real
!mkdir -p /content/fid_sd
!mkdir -p /content/fid_dcgan_adam_bcewithlogitsloss/
!mkdir -p /content/fid_dcgan_adam_hingeloss/
!mkdir -p /content/fid_dcgan_rmsprop_bcewithlogitsloss/
!mkdir -p /content/fid_dcgan_rmsprop_hingeloss/

In [None]:
''' --> FileURLRetrievalError: Too many users have viewed or downloaded this file recently...
# Pre-processing the real-test images set since Inception-v3 expects 299×299 inputs
celeba_test = datasets.CelebA(root='./data', split='test', target_type='attr',
                              transform=transforms.Compose([
                                  transforms.Resize(299),
                                  transforms.CenterCrop(299),
                                  transforms.ToTensor()
                              ]), download=True)

# Save 1000 sample real images
os.makedirs("/content/fid_real", exist_ok=True)
for i in range(1000):
    img, _ = celeba_test[i]
    save_image(img, f"/content/fid_real/{i}.png")
'''

FileURLRetrievalError: Failed to retrieve file url:

	Too many users have viewed or downloaded this file recently. Please
	try accessing the file again later. If the file you are trying to
	access is particularly large or is shared with many people, it may
	take up to 24 hours to be able to view or download the file. If you
	still can't access a file after 24 hours, contact your domain
	administrator.

You may still be able to access the file from the browser:

	https://drive.google.com/uc?id=0B7EVK8r0v71pZjFTYXZWM3FlRnM

but Gdown can't. Please check connections and permissions.

In [33]:
image_dir = "/content/drive/MyDrive/COGS185-FinalProject/CelebA"

output_dir = "/content/fid_real"
os.makedirs(output_dir, exist_ok=True)

# Inception-prep transform
transform = transforms.Compose([
    transforms.Resize(299),
    transforms.CenterCrop(299),
    transforms.ToTensor()
])

# Loop through and save transformed images (filter by extension first so that up to
# 1000 actual images are kept and the output filenames stay consecutive)
image_files = [f for f in sorted(os.listdir(image_dir)) if f.endswith((".jpg", ".png"))]
for i, filename in enumerate(image_files[:1000]):
    img = Image.open(os.path.join(image_dir, filename)).convert("RGB")
    img = transform(img)
    save_image(img, os.path.join(output_dir, f"{i}.png"))

In [34]:
# Flattening folder structure into single folders for FID calculation.
!cp /content/drive/MyDrive/COGS185-FinalProject/CelebA-StableDiffusion/*/*.png /content/fid_sd/
!cp /content/drive/MyDrive/COGS185-FinalProject/CelebA-DCGAN-Adam-HingeLoss/*/*.png /content/fid_dcgan_adam_hingeloss/
!cp /content/drive/MyDrive/COGS185-FinalProject/CelebA-DCGAN-Adam-BCEWithLogitsLoss/*/*.png /content/fid_dcgan_adam_bcewithlogitsloss/
!cp /content/drive/MyDrive/COGS185-FinalProject/CelebA-DCGAN-RMSProp-HingeLoss/*/*.png /content/fid_dcgan_rmsprop_hingeloss/
!cp /content/drive/MyDrive/COGS185-FinalProject/CelebA-DCGAN-RMSProp-BCEWithLogitsLoss/*/*.png /content/fid_dcgan_rmsprop_bcewithlogitsloss/

In [37]:
print("Stable Diffusion FID: ")
!python -m pytorch_fid /content/fid_real /content/fid_sd

print("DC-GAN with Adam + BCELogitsLoss FID: ")
!python -m pytorch_fid /content/fid_real /content/fid_dcgan_adam_bcewithlogitsloss/

print("DC-GAN with Adam + HingeLoss FID: ")
!python -m pytorch_fid /content/fid_real /content/fid_dcgan_adam_hingeloss/

print("DC-GAN with RMSProp + BCELogitsLoss FID: ")
!python -m pytorch_fid /content/fid_real /content/fid_dcgan_rmsprop_bcewithlogitsloss/

print("DC-GAN with RMSProp + HingeLoss FID: ")
!python -m pytorch_fid /content/fid_real /content/fid_dcgan_rmsprop_hingeloss/

Stable Diffusion FID: 
Downloading: "https://github.com/mseitzer/pytorch-fid/releases/download/fid_weights/pt_inception-2015-12-05-6726825d.pth" to /root/.cache/torch/hub/checkpoints/pt_inception-2015-12-05-6726825d.pth
100% 91.2M/91.2M [00:00<00:00, 127MB/s]
100% 20/20 [00:08<00:00,  2.42it/s]
100% 3/3 [00:03<00:00,  1.29s/it]
FID:  164.5020520887915
DC-GAN with Adam + BCELogitsLoss FID: 
100% 20/20 [00:05<00:00,  3.63it/s]
100% 3/3 [00:00<00:00,  5.00it/s]
FID:  153.09701859305036
DC-GAN with Adam + HingeLoss FID: 
100% 20/20 [00:05<00:00,  3.63it/s]
100% 3/3 [00:00<00:00,  4.96it/s]
FID:  342.9682573608184
DC-GAN with RMSProp + BCELogitsLoss FID: 
100% 20/20 [00:05<00:00,  3.46it/s]
100% 3/3 [00:00<00:00,  4.88it/s]
FID:  169.4088613197128
DC-GAN with RMSProp + HingeLoss FID: 
100% 20/20 [00:06<00:00,  3.05it/s]
100% 3/3 [00:00<00:00,  4.28it/s]
FID:  149.9808134021584


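### Results:
- RMSProp + Hinge had the lowest (best) FID at 150.0, narrowly ahead of Adam + BCE at 153.1, so its generated images were statistically closest to the real CelebA faces.
- Adam + Hinge had by far the worst FID (343.0), suggesting severe mode collapse or poor alignment with the real distribution.
- Stable Diffusion (164.5) beat only two of the four GANs, possibly due to domain mismatch (CelebA ≠ its training distribution).
- Each generated set filled only 3 batches in the progress bars, versus 20 batches for the 1,000 real images, so the smaller gaps between scores should be read with caution.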

## Part 3.1: Inception Score

Goal: Quantify the balance between image quality and diversity.

TLDR: Higher score is better.
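
In formula terms, `IS = exp(E_x[KL(p(y|x) || p(y))])`, where `p(y|x)` is the classifier's prediction for one image and `p(y)` is the average prediction over the whole set. The toy sketch below uses hand-written probability rows rather than Inception outputs (the function name and inputs are illustrative assumptions) to show the two extremes: confident, varied predictions push the score toward the number of classes, while identical predictions give a score of 1.

```python
import math

def inception_score_from_probs(probs):
    """IS for a list of per-image class-probability rows (each row sums to 1)."""
    n, k = len(probs), len(probs[0])
    marginal = [sum(row[j] for row in probs) / n for j in range(k)]   # p(y)
    kls = []
    for row in probs:
        # KL(p(y|x) || p(y)); terms with zero probability contribute nothing.
        kls.append(sum(p * math.log(p / marginal[j]) for j, p in enumerate(row) if p > 0))
    return math.exp(sum(kls) / n)

# Confident and diverse: each image maps to a different class.
diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Collapsed: every image gets the same prediction.
collapsed = [[1.0, 0.0, 0.0]] * 3
print(inception_score_from_probs(diverse), inception_score_from_probs(collapsed))
```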

In [10]:
import torch
import torch.nn.functional as F
from torchvision.models import inception_v3
from torchvision import transforms
from torch.utils.data import DataLoader
from PIL import Image
import numpy as np
from scipy.stats import entropy
import os
from tqdm import tqdm

In [11]:
class ImageFolder(torch.utils.data.Dataset):
    def __init__(self, folder):
        self.paths = [os.path.join(folder, fname) for fname in os.listdir(folder) if fname.endswith('.png')]
        self.transform = transforms.Compose([
            transforms.Resize(299),
            transforms.CenterCrop(299),
            transforms.ToTensor(),
            transforms.Normalize([0.5]*3, [0.5]*3),
        ])

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        img = Image.open(self.paths[i]).convert("RGB")
        return self.transform(img)

def inception_score(imgs, batch_size=32, splits=5):
    """Return the mean and std of the Inception Score over `splits` chunks of `imgs`."""
    N = len(imgs)
    dataloader = torch.utils.data.DataLoader(imgs, batch_size=batch_size)

    # Inputs are already scaled to [-1, 1] by the dataset, so no extra input transform.
    inception = inception_v3(pretrained=True, transform_input=False).to(device)
    inception.eval()

    # Collect softmax class probabilities p(y|x) for every image.
    preds = []
    with torch.no_grad():
        for batch in tqdm(dataloader, desc="Computing IS"):
            batch = batch.to(device)
            pred = inception(batch)
            pred = F.softmax(pred, dim=1).cpu().numpy()
            preds.append(pred)

    preds = np.concatenate(preds, axis=0)
    split_scores = []

    for k in range(splits):
        # Images beyond splits * (N // splits) are dropped.
        part = preds[k * (N // splits): (k+1) * (N // splits), :]
        py = np.mean(part, axis=0)                    # marginal p(y) within this split
        scores = [entropy(pyx, py) for pyx in part]   # KL(p(y|x) || p(y)) per image
        split_scores.append(np.exp(np.mean(scores)))

    return np.mean(split_scores), np.std(split_scores)

In [13]:
# For Stable Diffusion
sd_dataset = ImageFolder("/content/fid_sd")
sd_mean, sd_std = inception_score(sd_dataset)
print(f"Stable Diffusion IS: {sd_mean:.2f} ± {sd_std:.2f}")

# For DC-GAN with Adam + BCEWithLogitsLoss
dcgan_dataset = ImageFolder("/content/fid_dcgan_adam_bcewithlogitsloss")
dcgan_mean, dcgan_std = inception_score(dcgan_dataset)
print(f"DC-GAN with Adam + BCEWithLogitsLoss IS: {dcgan_mean:.2f} ± {dcgan_std:.2f}")

# For DC-GAN with Adam + HingeLoss
dcgan_dataset = ImageFolder("/content/fid_dcgan_adam_hingeloss")
dcgan_mean, dcgan_std = inception_score(dcgan_dataset)
print(f"DC-GAN with Adam + HingeLoss IS: {dcgan_mean:.2f} ± {dcgan_std:.2f}")

# For DC-GAN with RMSProp + BCEWithLogitsLoss
dcgan_dataset = ImageFolder("/content/fid_dcgan_rmsprop_bcewithlogitsloss")
dcgan_mean, dcgan_std = inception_score(dcgan_dataset)
print(f"DC-GAN with RMSProp + BCEWithLogitsLoss IS: {dcgan_mean:.2f} ± {dcgan_std:.2f}")

# For DC-GAN with RMSProp + HingeLoss
dcgan_dataset = ImageFolder("/content/fid_dcgan_rmsprop_hingeloss")
dcgan_mean, dcgan_std = inception_score(dcgan_dataset)
print(f"DC-GAN with RMSProp + HingeLoss IS: {dcgan_mean:.2f} ± {dcgan_std:.2f}")

Computing IS: 100%|██████████| 4/4 [00:02<00:00,  1.46it/s]


Stable Diffusion IS: 3.25 ± 0.15


Computing IS: 100%|██████████| 4/4 [00:00<00:00,  5.18it/s]


DC-GAN with Adam + BCEWithLogitsLoss IS: 2.20 ± 0.20


Computing IS: 100%|██████████| 4/4 [00:00<00:00,  4.88it/s]


DC-GAN with Adam + HingeLoss IS: 1.92 ± 0.08


Computing IS: 100%|██████████| 4/4 [00:00<00:00,  4.91it/s]


DC-GAN with RMSProp + BCEWithLogitsLoss IS: 2.05 ± 0.28


Computing IS: 100%|██████████| 4/4 [00:00<00:00,  4.87it/s]


DC-GAN with RMSProp + HingeLoss IS: 2.09 ± 0.07


### Results:
- Stable Diffusion has the highest IS (3.25 ± 0.15), indicating more varied outputs and more confident class predictions.
- Among the DCGANs, Adam + BCE scores highest (2.20 ± 0.20), although RMSProp + Hinge (2.09 ± 0.07) and RMSProp + BCE (2.05 ± 0.28) fall within its spread.
- Again, Adam + Hinge performs worst (1.92 ± 0.08), which is consistent with unstable training.

Stable Diffusion produces higher-quality and more diverse images (per IS), though its outputs are statistically further from CelebA than those of the two best DCGANs (higher FID), likely due to its differing training distribution.

The best DCGANs, while producing less sharp or varied images, better mimic the distributional footprint of CelebA (lower FID), likely because they learn directly from that dataset.

In [None]:
label_map = {
    "woman_bangs": 1,
    "man_bangs": 1,
    "woman_no_bangs": 0,
    "man_no_bangs": 0
}

In [None]:
from torch.utils.data import Dataset
from torchvision import transforms
from PIL import Image
import os

class BangsDataset(Dataset):
    def __init__(self, root_dir):
        self.paths = []
        self.labels = []
        self.transform = transforms.Compose([
            transforms.Resize(64),
            transforms.CenterCrop(64),
            transforms.ToTensor(),
            transforms.Normalize([0.5]*3, [0.5]*3)
        ])

        for subfolder in os.listdir(root_dir):
            full_path = os.path.join(root_dir, subfolder)
            if os.path.isdir(full_path):
                label = label_map[subfolder]
                for fname in os.listdir(full_path):
                    if fname.endswith(".png"):
                        self.paths.append(os.path.join(full_path, fname))
                        self.labels.append(label)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        image = Image.open(self.paths[idx]).convert("RGB")
        return self.transform(image), torch.tensor(self.labels[idx]).float()

In [None]:
import torch.nn as nn

class BangsCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1),   # 64x32x32
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2),

            nn.Conv2d(64, 128, 4, 2, 1), # 128x16x16
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2),

            nn.Conv2d(128, 256, 4, 2, 1), # 256x8x8
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2),

            nn.Flatten(),
            nn.Linear(256 * 8 * 8, 1)
        )

    def forward(self, x):
        return self.conv(x)

In [None]:
train_dataset = CelebAWithLabels(celeba)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

model = BangsCNN().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
criterion = nn.BCEWithLogitsLoss()

epochs = 5
for epoch in range(epochs):
    model.train()
    total, correct = 0, 0
    for imgs, labels in train_loader:
        imgs, labels = imgs.to(device), labels.unsqueeze(1).float().to(device)

        logits = model(imgs)
        loss = criterion(logits, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        preds = (torch.sigmoid(logits) > 0.5).float()
        correct += (preds == labels).sum().item()
        total += labels.size(0)

    acc = 100 * correct / total
    print(f"Epoch {epoch+1}/{epochs} | Train Accuracy: {acc:.2f}%")

Epoch 1/5 | Train Accuracy: 46.23%
Epoch 2/5 | Train Accuracy: 46.20%
Epoch 3/5 | Train Accuracy: 46.27%
Epoch 4/5 | Train Accuracy: 46.28%
Epoch 5/5 | Train Accuracy: 46.32%


In [None]:
from collections import Counter
print(Counter(celeba.targets))

Counter({1: 75612, 3: 62473, 0: 18897, 2: 5788})
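
The counts above show four class ids rather than two. If those ids follow the `label_map` used when saving generated images (0 = woman_bangs, 1 = woman_no_bangs, 2 = man_bangs, 3 = man_no_bangs), then a binary bangs classifier trained with `BCEWithLogitsLoss` would need them collapsed to 0/1 first. The sketch below is one way to do that under that assumption. `to_bangs_target` is a hypothetical helper, and whether `CelebAWithLabels` already performs this conversion is not visible here. The same counts also give the share of raw labels equal to 1, which can be compared against the training accuracy printed earlier.

```python
from collections import Counter

# Class counts copied from the Counter output above.
counts = Counter({1: 75612, 3: 62473, 0: 18897, 2: 5788})

# Assumed ids for the two "bangs" classes, following the generation label_map.
BANGS_IDS = {0, 2}   # woman_bangs, man_bangs

def to_bangs_target(class_id):
    """Collapse a 4-class id into a binary bangs target (1 = bangs, 0 = no bangs)."""
    return 1 if class_id in BANGS_IDS else 0

total = sum(counts.values())
bangs_share = sum(n for cid, n in counts.items() if to_bangs_target(cid) == 1) / total
raw_label_one_share = counts[1] / total

print(f"share of images with bangs: {bangs_share:.2%}")
print(f"share of raw labels equal to 1: {raw_label_one_share:.2%}")
```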


In [None]:
def evaluate_on_generated(model, root_dir):
    model.eval()

    all_imgs = []
    all_labels = []

    transform = transforms.Compose([
        transforms.Resize(64),
        transforms.CenterCrop(64),
        transforms.ToTensor(),
        transforms.Normalize([0.5]*3, [0.5]*3)
    ])

    for subfolder in os.listdir(root_dir):
        full_path = os.path.join(root_dir, subfolder)
        if os.path.isdir(full_path) and subfolder in label_map:
            label = label_map[subfolder]
            for fname in os.listdir(full_path):
                if fname.endswith(".png"):
                    img_path = os.path.join(full_path, fname)
                    img = Image.open(img_path).convert("RGB")
                    img_tensor = transform(img)
                    all_imgs.append(img_tensor)
                    all_labels.append(label)

    imgs_tensor = torch.stack(all_imgs)
    labels_tensor = torch.tensor(all_labels).unsqueeze(1).float()

    dataset = torch.utils.data.TensorDataset(imgs_tensor, labels_tensor)
    loader = DataLoader(dataset, batch_size=32)

    correct, total = 0, 0
    with torch.no_grad():
        for imgs, labels in loader:
            imgs, labels = imgs.to(device), labels.to(device)
            logits = model(imgs)
            preds = (torch.sigmoid(logits) > 0.5).float()
            correct += (preds == labels).sum().item()
            total += labels.size(0)

    acc = 100 * correct / total
    return acc

In [None]:
acc_sd = evaluate_on_generated(model, "/content/drive/MyDrive/COGS185-FinalProject/CelebA-StableDiffusion")
acc_dcgan = evaluate_on_generated(model, "/content/drive/MyDrive/COGS185-FinalProject/CelebA-DCGAN")

print(f"Stable Diffusion Bangs Accuracy: {acc_sd:.2f}%")
print(f"DCGAN Bangs Accuracy: {acc_dcgan:.2f}%")


Stable Diffusion Bangs Accuracy: 50.00%
DCGAN Bangs Accuracy: 50.00%


## Part 4: Evaluating with DC-GAN-inspired Classifier

In [14]:
model = DCGANClassifier().to(device)
model.load_state_dict(torch.load("dcgan_bangs_classifier.pt"))
model.eval()

FileNotFoundError: [Errno 2] No such file or directory: 'dcgan_bangs_classifier.pt'

In [None]:
label_map = {
    "woman_bangs": 1,
    "man_bangs": 1,
    "woman_no_bangs": 0,
    "man_no_bangs": 0
}

In [None]:
def evaluate_generated_images(image_folder, label_map, classifier, device):
    transform = transforms.Compose([
        transforms.Resize(64),
        transforms.CenterCrop(64),
        transforms.ToTensor(),
        transforms.Normalize([0.5]*3, [0.5]*3)  # → [-1, 1]
    ])

    total = 0
    correct = 0

    for subfolder in os.listdir(image_folder):
        true_label = label_map[subfolder]
        subfolder_path = os.path.join(image_folder, subfolder)
        for filename in os.listdir(subfolder_path):
            if not filename.endswith(".png"): continue
            img_path = os.path.join(subfolder_path, filename)
            img = transform(Image.open(img_path).convert("RGB")).unsqueeze(0).to(device)

            with torch.no_grad():
                output = classifier(img)  # use the classifier passed in, not the global `model`
                pred = (torch.sigmoid(output) > 0.5).item()

            correct += (pred == true_label)
            total += 1

    return correct / total

In [None]:
folders = {
    # Folder names match the save_dir values used when the images were generated.
    "Adam+BCE": "/content/drive/MyDrive/COGS185-FinalProject/CelebA-DCGAN-Adam-BCEWithLogitsLoss",
    "Adam+Hinge": "/content/drive/MyDrive/COGS185-FinalProject/CelebA-DCGAN-Adam-HingeLoss",
    "RMSprop+BCE": "/content/drive/MyDrive/COGS185-FinalProject/CelebA-DCGAN-RMSProp-BCEWithLogitsLoss",
    "RMSprop+Hinge": "/content/drive/MyDrive/COGS185-FinalProject/CelebA-DCGAN-RMSProp-HingeLoss",
    "StableDiffusion": "/content/drive/MyDrive/COGS185-FinalProject/CelebA-StableDiffusion"
}

for name, path in folders.items():
    acc = evaluate_generated_images(path, label_map, model, device)
    print(f"{name} classifier accuracy: {acc * 100:.2f}%")