<a href="https://colab.research.google.com/github/RDGopal/IB9AU-2026/blob/main/SD3_GAN_Illustration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs) are a class of generative models, typically trained without labels, in which two neural networks compete against each other in a zero-sum game.

### How GANs Work:

GANs consist of two main components:
1.  **Generator (G)**: This network learns to create new data instances that resemble the training data. It takes random noise as input and outputs synthetic data (e.g., images).
2.  **Discriminator (D)**: This network learns to distinguish between real data (from the training set) and fake data (generated by the Generator). It acts as a binary classifier.

The training process is adversarial:
*   The Generator tries to fool the Discriminator by producing increasingly realistic fake data.
*   The Discriminator tries to get better at identifying fake data.

This continuous competition drives both networks to improve, resulting in a Generator that can produce highly realistic data.
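
Formally, the original GAN formulation expresses this game as a minimax objective over a value function $V(D, G)$. The symbols $p_{\text{data}}$ (data distribution) and $p_z$ (noise distribution) are notation introduced here for illustration:

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]
$$

The Discriminator pushes $D(x)$ toward 1 and $D(G(z))$ toward 0, while the Generator pushes $D(G(z))$ toward 1.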

### GANs vs. VAEs (Variational Autoencoders)

Both GANs and VAEs are generative models capable of creating new data, but they achieve this through different mechanisms and have distinct characteristics:

| Feature           | GANs                                      | VAEs                                          |
| :---------------- | :---------------------------------------- | :-------------------------------------------- |
| **Architecture**  | Two competing networks: Generator & Discriminator | Encoder-Decoder architecture                  |
| **Training**      | Adversarial (minimax game)                | Maximizes evidence lower bound (ELBO)         |
| **Output Quality**| Can produce very sharp, realistic samples | Often produces blurrier samples               |
| **Latent Space**  | Fixed noise prior; no encoder, structure not explicitly enforced | Learned by an encoder and regularized toward a prior (typically Gaussian) |
| **Sampling**      | Pass random noise through the Generator   | Decode a draw from the latent prior           |
| **Control**       | Less direct control over generated features | More amenable to controlling features via latent space manipulation |
| **Mode Collapse** | Prone to mode collapse (generating limited variety) | Less prone to mode collapse                 |

In essence, GANs prioritize sample realism by pitting two networks against each other. VAEs focus on learning a structured, continuous latent representation of the data, which is useful for tasks like interpolation and reconstruction, even if their generated samples are sometimes less sharp than those from GANs.
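
For reference, the ELBO that a VAE maximizes is usually written as below, with encoder $q_\phi(z \mid x)$, decoder $p_\theta(x \mid z)$, and prior $p(z)$. These symbols are assumed here and are not defined elsewhere in this notebook:

$$
\mathrm{ELBO}(x) = \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] - \mathrm{KL}\left(q_\phi(z \mid x) \,\|\, p(z)\right)
$$

The first term rewards accurate reconstruction. The KL term pulls the latent codes toward the prior, which is what gives the VAE latent space its structure.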

In [None]:
%%capture
!pip install torch torchvision matplotlib

### Hyperparameters and Data Loading

This section initializes key parameters for the GAN and loads the MNIST dataset. The `latent_dim` is the size of the noise vector fed into the generator. `lr` is the learning rate for the optimizers, `batch_size` determines how many samples are processed at once, and `epochs` specifies the number of training cycles over the entire dataset. The MNIST dataset, consisting of handwritten digits, is loaded and transformed into tensors suitable for PyTorch models.
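
One detail worth checking is the pixel range: `transforms.ToTensor()` scales images to [0, 1], whereas a Generator ending in `Tanh` outputs values in [-1, 1]. A common way to align the two is `transforms.Normalize((0.5,), (0.5,))`, which computes `(x - 0.5) / 0.5` per channel. The sketch below applies that arithmetic to a synthetic tensor (not real MNIST data) to show the resulting range and the inverse mapping.

```python
import torch

# Synthetic stand-in for a ToTensor() image: one channel, values in [0, 1]
img = torch.linspace(0.0, 1.0, steps=28 * 28).view(1, 28, 28)

# Same arithmetic as transforms.Normalize((0.5,), (0.5,)): (x - mean) / std
normalized = (img - 0.5) / 0.5

# Inverse mapping, useful before displaying generated images
restored = normalized * 0.5 + 0.5

print(normalized.min().item(), normalized.max().item())
```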

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt

# Hyperparameters
latent_dim = 100
lr = 0.0002
batch_size = 64
epochs = 5

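# MNIST Data Loader
# ToTensor scales pixels to [0, 1]; Normalize maps them to [-1, 1] to match the Generator's Tanh output
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)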

### Generator and Discriminator Network Architectures

Here, we define the two core neural networks of the GAN:

*   **Generator (`Generator`)**: This network takes a random noise vector of size `latent_dim` (100 here) and transforms it through linear layers with `LeakyReLU` activations. The final layer uses `Tanh`, so output pixel values lie in [-1, 1]; this is the usual choice when training images are normalized to the same range. The output is then reshaped to `(batch_size, 1, 28, 28)`, i.e. a batch of 28x28 grayscale images.

*   **Discriminator (`Discriminator`)**: This network takes a flattened 28x28 image (784 pixels) as input. It uses linear layers with `LeakyReLU` activations and `Dropout` layers to prevent overfitting. The final layer uses `Sigmoid` activation to output a single probability score between 0 and 1, indicating whether the input image is real (closer to 1) or fake (closer to 0).

Once defined, the networks are instantiated and moved to the appropriate device (GPU if available, otherwise CPU). `BCELoss` (Binary Cross-Entropy Loss) is chosen as the loss function, and a separate `Adam` optimizer is set up for each of the Generator and Discriminator.
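
To make the loss concrete, the sketch below evaluates `nn.BCELoss` on a single hand-picked Discriminator output (0.9 is an illustrative value, not a result from this notebook). With target 1 the loss is `-log(p)`, and with target 0 it is `-log(1 - p)`.

```python
import math
import torch
import torch.nn as nn

criterion = nn.BCELoss()

# Hypothetical Discriminator output: 90% confident the image is real
p = torch.tensor([[0.9]])

loss_if_real = criterion(p, torch.ones(1, 1))   # target 1 -> -log(0.9)
loss_if_fake = criterion(p, torch.zeros(1, 1))  # target 0 -> -log(1 - 0.9)

print(f"{loss_if_real.item():.4f}, {loss_if_fake.item():.4f}")
```

A confident, correct prediction is cheap, while a confident, wrong one is expensive. This asymmetry is what drives both networks during training.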

In [None]:
# Generator Network
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 784),
            nn.Tanh()  # Output in [-1, 1]; rescale with (x + 1) / 2 to get [0, 1] pixel values
        )
    def forward(self, z):
        return self.model(z).view(-1, 1, 28, 28)

# Discriminator Network
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 512),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )
    def forward(self, img):
        img_flat = img.view(-1, 784)
        return self.model(img_flat)

# Initialize Networks
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
G = Generator().to(device)
D = Discriminator().to(device)

# Loss and Optimizers
criterion = nn.BCELoss()
optimizer_G = optim.Adam(G.parameters(), lr=lr)
optimizer_D = optim.Adam(D.parameters(), lr=lr)

### Training Loop

This is the core training logic of the GAN, executed over a specified number of epochs. Within each epoch, the loop iterates through the `dataloader` to get batches of real images. Training proceeds in two phases for each batch:

1.  **Train Discriminator**:
    *   First, the Discriminator is updated to correctly classify real images as real and fake images (generated by the current Generator) as fake.
    *   Random noise is fed to the Generator to create `fake_imgs`. Crucially, `.detach()` is used to prevent gradients from flowing back to the Generator during this phase, ensuring only the Discriminator learns.
    *   The Discriminator's loss is calculated using `BCELoss` for both real and fake images, and the average is backpropagated to update its weights.

2.  **Train Generator**:
    *   Next, the Generator is updated to produce images that can fool the Discriminator.
    *   New random noise is used to generate `fake_imgs`.
    *   The Discriminator's output for these fake images is then compared against `real_labels` (ones), so the Generator's loss is low when the Discriminator classifies its fakes as real. Gradients flow back through the Discriminator, but only `optimizer_G.step()` is called, so only the Generator's weights are updated, making it generate more convincing fakes.

The print statement after each epoch shows the losses from the last batch of that epoch for both the Discriminator (D) and Generator (G), giving a rough indication of training progress.
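
The role of `.detach()` in the Discriminator phase can be checked on a toy pair of networks. In this sketch, `toy_G` and `toy_D` are hypothetical stand-ins with arbitrary sizes, not the networks defined above. After backpropagating the Discriminator loss, only the Discriminator holds gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins for the two networks (sizes chosen only for illustration)
toy_G = nn.Linear(4, 8)
toy_D = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())
criterion = nn.BCELoss()

z = torch.randn(16, 4)
fake = toy_G(z).detach()  # Cut the graph between G and D

loss = criterion(toy_D(fake), torch.zeros(16, 1))
loss.backward()

# D received gradients; G did not, because the graph was cut
print(toy_D[0].weight.grad is not None)
print(toy_G.weight.grad is None)
```

Removing `.detach()` would also populate `toy_G.weight.grad`, which costs extra computation even though the Discriminator optimizer never touches those weights.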

In [None]:
# Training Loop
for epoch in range(epochs):
    for i, (real_imgs, _) in enumerate(dataloader):
        real_imgs = real_imgs.to(device)
        cur_batch = real_imgs.size(0)  # Actual batch size; avoids overwriting the batch_size hyperparameter

        # Train Discriminator
        z = torch.randn(cur_batch, latent_dim, device=device)
        fake_imgs = G(z).detach()  # Detach to avoid training G here
        real_labels = torch.ones(cur_batch, 1, device=device)
        fake_labels = torch.zeros(cur_batch, 1, device=device)

        outputs_real = D(real_imgs)
        loss_real = criterion(outputs_real, real_labels)

        outputs_fake = D(fake_imgs)
        loss_fake = criterion(outputs_fake, fake_labels)

        loss_D = (loss_real + loss_fake) / 2
        optimizer_D.zero_grad()
        loss_D.backward()
        optimizer_D.step()

        # Train Generator
        z = torch.randn(cur_batch, latent_dim, device=device)
        fake_imgs = G(z)
        outputs = D(fake_imgs)
        loss_G = criterion(outputs, real_labels)  # Trick D into thinking fake is real

        optimizer_G.zero_grad()
        loss_G.backward()
        optimizer_G.step()

    print(f"Epoch [{epoch+1}/{epochs}] | Loss D: {loss_D.item():.4f} | Loss G: {loss_G.item():.4f}")

### Generate and Visualize Samples

After training, this section demonstrates the Generator's ability to create new images.

*   A batch of 16 random noise vectors (`z`) is created.
*   These vectors are fed into the trained Generator (`G`), which outputs 16 synthetic images.
*   `.detach().cpu()` detaches the generated images from the computation graph and moves them from the GPU (if used) to the CPU, making them regular tensors suitable for visualization.
*   Finally, `matplotlib` is used to display these 16 generated images in a 4x4 grid. Each image is squeezed (removing the channel dimension) and shown in grayscale. This allows us to visually inspect the quality and diversity of the images the GAN has learned to produce.

In [None]:
# Generate and Visualize Samples
z = torch.randn(16, latent_dim).to(device)
generated = G(z).detach().cpu()
fig, axes = plt.subplots(4, 4, figsize=(8,8))
for i, ax in enumerate(axes.flat):
    ax.imshow(generated[i].squeeze(), cmap='gray')
    ax.axis('off')
plt.show()

# Required Task 12
Download the zip file `floorplans_v2-20251223T170650Z-3-001.zip`, which contains a large sample of floorplan images. Your task is to train a GAN on these images and write code that generates new synthetic floorplans.

## Optional Task
The current Generative Adversarial Network (GAN) you've implemented is an unconditional GAN. This means it learns to generate images that resemble the training data (MNIST digits) but doesn't have a mechanism to control which digit it generates.

To generate images for a chosen digit, like '7', you would need to implement a Conditional Generative Adversarial Network (CGAN). In a CGAN, the class label (e.g., the digit '7') is provided as an input to both the Generator and the Discriminator during training. This conditions the generation process on the desired output.

Give it a shot - try to develop a CGAN!
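
As one possible starting point, here is a minimal sketch of a conditional Generator only. The class name `ConditionalGenerator`, the embedding size, and the choice to concatenate a label embedding with the noise vector are assumptions made for illustration. Other conditioning schemes exist, and the Discriminator needs a matching change to accept the label.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Hypothetical CGAN generator: noise + digit label -> 28x28 image."""
    def __init__(self, latent_dim=100, n_classes=10, embed_dim=10):
        super().__init__()
        self.label_embed = nn.Embedding(n_classes, embed_dim)
        self.model = nn.Sequential(
            nn.Linear(latent_dim + embed_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 784),
            nn.Tanh(),
        )

    def forward(self, z, labels):
        # Concatenate the noise with a learned embedding of the requested digit
        x = torch.cat([z, self.label_embed(labels)], dim=1)
        return self.model(x).view(-1, 1, 28, 28)

# Request four images of the digit 7 (untrained, so the output is just noise)
cond_G = ConditionalGenerator()
z = torch.randn(4, 100)
labels = torch.full((4,), 7, dtype=torch.long)
imgs = cond_G(z, labels)
print(imgs.shape)
```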