

#### **8.1. Introduction to Generative Models**
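- Generative models learn the probability distribution of the input data, so that new samples can be drawn from the learned distribution.
- They can be broadly classified into two categories:
  - **Implicit Models**: Learn a procedure for sampling from the data distribution without ever writing down a density. The model can generate data but cannot directly evaluate how likely a given sample is (e.g., GANs).
  - **Explicit Models**: Define a probability density (or a tractable bound on it) and are trained by maximizing likelihood (e.g., VAEs, which optimize a lower bound on the log-likelihood).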
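
The distinction can be made concrete with a toy one-dimensional sketch. Everything here is illustrative: `ExplicitGaussian` and `implicit_sampler` are hypothetical names, the data values are made up, and the "generator" is a hand-picked function standing in for a trained network.

```python
import math
import random


class ExplicitGaussian:
    """Explicit model: a fitted density that can be evaluated and sampled."""

    def __init__(self, data):
        self.mean = sum(data) / len(data)
        var = sum((x - self.mean) ** 2 for x in data) / len(data)
        self.std = math.sqrt(var)

    def log_prob(self, x):
        # Log-density of a 1-D Gaussian with the fitted mean and std
        z = (x - self.mean) / self.std
        return -0.5 * z * z - math.log(self.std) - 0.5 * math.log(2 * math.pi)

    def sample(self, rng):
        return rng.gauss(self.mean, self.std)


def implicit_sampler(rng):
    """Implicit model: turns noise into a sample; no density is available."""
    z = rng.gauss(0.0, 1.0)   # latent noise
    return 2.0 + 0.5 * z      # stand-in for a trained generator network


rng = random.Random(0)
data = [1.5, 2.0, 2.5, 1.8, 2.2]  # made-up training data
explicit = ExplicitGaussian(data)

print("log-density at 2.0:", explicit.log_prob(2.0))
print("explicit sample:", explicit.sample(rng))
print("implicit sample:", implicit_sampler(rng))
```

Both models can produce samples, but only the explicit one can answer "how likely is this value?".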

---



#### **8.2. Generative Adversarial Networks (GANs)**
- **GANs** are a class of generative models introduced by Ian Goodfellow and his colleagues in 2014. They consist of two neural networks: the **generator** and the **discriminator**.
  
**Key Components of GANs**:
1. **Generator**: This network generates new data samples from random noise. Its goal is to produce samples that are indistinguishable from real data.
2. **Discriminator**: This network receives both real samples (from the dataset) and fake samples (from the generator) and estimates the probability that each one is real.
3. **Adversarial Training**: The two networks are trained together, in alternating steps, in a game-theoretic setup: the generator tries to fool the discriminator, while the discriminator tries to correctly separate real from fake samples.
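
The adversarial game above is usually written as a minimax objective. The notation below is not used elsewhere in this section, so treat the symbols as assumptions: $G$ is the generator, $D$ the discriminator, $p_{\text{data}}$ the data distribution, and $p_z$ the noise distribution.

$$
\min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
$$

The discriminator increases $V$ by pushing $D(x)$ toward 1 on real data and $D(G(z))$ toward 0 on generated data; the generator decreases $V$ by pushing $D(G(z))$ toward 1.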

---

**8.2.1. GAN Training Process**
- The training process involves:
  1. Sampling random noise and generating a fake sample using the generator.
  2. Sampling real data from the dataset.
  3. Training the discriminator on both real and fake samples to improve its ability to distinguish between them.
  4. Training the generator to improve its ability to create realistic samples that can fool the discriminator.
  
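In theory, this process continues until the generator's samples are indistinguishable from real data and the discriminator can do no better than guessing. In practice, training runs for a fixed number of epochs and the generated samples are inspected visually.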

---



#### **8.3. Building a GAN**
We will now implement a simple GAN to generate images from the MNIST dataset, which consists of handwritten digits.

---

**8.3.1. Example: Building and Training a GAN**

In [1]:
import torch
import torch.nn as nn  # Neural network modules
import torch.optim as optim  # Optimization algorithms like Adam
import torchvision  # PyTorch's computer vision package
import torchvision.transforms as transforms  # Tools for transforming data
import matplotlib.pyplot as plt  # Used for visualizing generated images

# Step 1: Define the generator network
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),  # Input is a noise vector of size 100 (latent space)
            nn.ReLU(),  # Activation function: ReLU
            nn.Linear(256, 512),  # Hidden layer with 512 units
            nn.ReLU(),
            nn.Linear(512, 1024),  # Hidden layer with 1024 units
            nn.ReLU(),
            nn.Linear(1024, 28 * 28),  # Output size is 28x28 (MNIST image flattened)
            nn.Tanh()  # Tanh activation scales output to range [-1, 1], which matches the normalized input data
        )

    def forward(self, z):
        return self.model(z).view(-1, 1, 28, 28)  # Reshape the output to match the image size (batch_size, 1, 28, 28)


In [2]:

# Step 2: Define the discriminator network
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(28 * 28, 1024),  # Input is a flattened 28x28 image
            nn.LeakyReLU(0.2),  # Leaky ReLU activation with slope 0.2 for negative values
            nn.Linear(1024, 512),  # Hidden layer with 512 units
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),  # Hidden layer with 256 units
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),  # Output layer produces a single score (real or fake)
            nn.Sigmoid()  # Sigmoid activation to output probability (between 0 and 1)
        )

    def forward(self, x):
        return self.model(x.view(-1, 28 * 28))  # Flatten input image for fully connected layers


In [3]:

# Step 3: Initialize the generator and discriminator networks
generator = Generator()  # Instantiate the generator
discriminator = Discriminator()  # Instantiate the discriminator


In [4]:

# Step 4: Define loss function and optimizers
criterion = nn.BCELoss()  # Binary cross-entropy loss for classification (real/fake)
optimizer_G = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))  # Optimizer for generator
optimizer_D = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))  # Optimizer for discriminator


In [5]:

# Step 5: Load the MNIST dataset
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert images to tensors
    transforms.Normalize((0.5,), (0.5,))  # Normalize images to range [-1, 1] to match the output of the generator (Tanh)
])
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)  # DataLoader for batching and shuffling
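
`transforms.Normalize((0.5,), (0.5,))` applies `(x - mean) / std` to each pixel, which is why images in `[0, 1]` end up in `[-1, 1]`, the same range as the generator's `Tanh` output. The plain-Python sketch below shows that arithmetic and its inverse, which is handy when displaying generated images. The helper names `normalize` and `denormalize` are hypothetical and not part of torchvision.

```python
def normalize(x, mean=0.5, std=0.5):
    """Map a pixel value from [0, 1] to [-1, 1]."""
    return (x - mean) / std


def denormalize(y, mean=0.5, std=0.5):
    """Inverse mapping: bring a value in [-1, 1] back to [0, 1] for display."""
    return y * std + mean


for pixel in (0.0, 0.5, 1.0):
    y = normalize(pixel)
    print(pixel, "->", y, "->", denormalize(y))
```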



In [7]:

# Step 6: Train the GAN
num_epochs = 5  # Number of epochs for training
for epoch in range(num_epochs):
    for i, (real_images, _) in enumerate(train_loader):  # Loop over batches of real images

        # Create labels for real and fake images
        real_labels = torch.ones(real_images.size(0), 1)  # Real images have label 1
        fake_labels = torch.zeros(real_images.size(0), 1)  # Fake images have label 0

        # Step 6.1: Train the discriminator
        optimizer_D.zero_grad()  # Zero the gradients of the discriminator

        # Forward pass for real images
        outputs = discriminator(real_images)  # Get discriminator's prediction on real images
        d_loss_real = criterion(outputs, real_labels)  # Calculate loss for real images
        d_loss_real.backward()  # Backpropagate the loss for real images

        # Generate fake images
        noise = torch.randn(real_images.size(0), 100)  # Generate noise vectors (latent space) of size 100
        fake_images = generator(noise)  # Use generator to create fake images from noise

        # Forward pass for fake images
        outputs = discriminator(fake_images.detach())  # Get discriminator's prediction on fake images (detach to avoid updating generator during this pass)
        d_loss_fake = criterion(outputs, fake_labels)  # Calculate loss for fake images
        d_loss_fake.backward()  # Backpropagate the loss for fake images

        optimizer_D.step()  # Update the discriminator's weights

        # Step 6.2: Train the generator
        optimizer_G.zero_grad()  # Zero the gradients of the generator

        # Forward pass for fake images (generator wants discriminator to classify them as real)
        outputs = discriminator(fake_images)  # Get discriminator's prediction for fake images
        g_loss = criterion(outputs, real_labels)  # Generator's goal is to fool the discriminator, so we use real labels for fake images
        g_loss.backward()  # Backpropagate the loss for the generator

        optimizer_G.step()  # Update the generator's weights

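    # Print the losses from the last batch of each epoch (not an epoch average)
    print(f'Epoch [{epoch+1}/{num_epochs}], D Loss: {d_loss_real.item() + d_loss_fake.item():.4f}, G Loss: {g_loss.item():.4f}')

    # Step 7: Generate and display sample images every 10 epochs
    # Note: with num_epochs = 5 above, this condition is never true;
    # raise num_epochs or lower the interval to see samples.
    if (epoch + 1) % 10 == 0: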
        with torch.no_grad():  # Disable gradient calculation for generating images
            sample_noise = torch.randn(16, 100)  # Generate a batch of 16 noise vectors
            generated_images = generator(sample_noise)  # Generate images from the noise
            generated_images = generated_images.view(-1, 1, 28, 28)  # Reshape the images to 28x28 size

            # Create a grid of generated images
            grid = torchvision.utils.make_grid(generated_images, nrow=4, normalize=True)
            plt.imshow(grid.permute(1, 2, 0).numpy())  # Permute dimensions to match the format for plotting (HWC)
            plt.title(f'Epoch {epoch + 1}')
            plt.axis('off')  # Remove axis labels
            plt.show()  # Display the generated images


Epoch [1/5], D Loss: 0.3720, G Loss: 2.5657
Epoch [2/5], D Loss: 0.4795, G Loss: 2.1411
Epoch [3/5], D Loss: 0.8254, G Loss: 2.4208
Epoch [4/5], D Loss: 0.6313, G Loss: 1.7668
Epoch [5/5], D Loss: 0.9390, G Loss: 1.6317



**Explanation**:
- **Generator Network**: A simple feedforward neural network that takes a noise vector and generates a 28x28 image.
- **Discriminator Network**: A neural network that takes an image (real or fake) and outputs a probability indicating whether the image is real.
- **Training Loop**:
  - The discriminator is trained to differentiate between real and fake images.
  - The generator is trained to produce images that can fool the discriminator.
  - Losses are calculated using Binary Cross Entropy, and the networks are optimized using the Adam optimizer.
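- **Sample Generation**: Every 10 epochs, the loop generates and displays a 4x4 grid of sample images from the generator. With `num_epochs = 5` as set in the code, this branch is never reached; train for at least 10 epochs or lower the interval to see samples.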
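
To see why the generator step scores fake images against `real_labels`, it helps to work through the binary cross-entropy arithmetic by hand. The sketch below uses plain Python and the same formula that `nn.BCELoss` applies with its default mean reduction; the discriminator outputs are made-up numbers for illustration only.

```python
import math


def bce(probs, targets):
    """Mean binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)] averaged over the batch."""
    total = 0.0
    for p, y in zip(probs, targets):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


# Hypothetical discriminator outputs (probability of "real")
d_on_real = [0.9, 0.8]  # D(x) on two real images
d_on_fake = [0.2, 0.1]  # D(G(z)) on two generated images

# Discriminator loss: real images labelled 1, fake images labelled 0
d_loss = bce(d_on_real, [1, 1]) + bce(d_on_fake, [0, 0])

# Generator loss: the same fake outputs, scored against "real" labels,
# which reduces to the mean of -log(D(G(z)))
g_loss = bce(d_on_fake, [1, 1])

print(f"d_loss = {d_loss:.4f}")
print(f"g_loss = {g_loss:.4f}")
```

With a target of 1, the loss reduces to `-log(D(G(z)))`, so the generator is rewarded directly for raising the discriminator's score on its samples. This is often called the non-saturating generator loss, and it gives stronger gradients early in training than minimizing `log(1 - D(G(z)))`.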

---



#### **8.4. Variational Autoencoders (VAEs)**
- **Variational Autoencoders (VAEs)** are another class of generative models that learn to encode input data into a latent space and then decode it back to the original space.
- VAEs are designed to generate new data points by sampling from the learned latent space, capturing the distribution of the input data.

**Key Components of VAEs**:
1. **Encoder**: Maps input data to a latent representation. Rather than a single point, it outputs the parameters of a latent distribution (typically the mean and variance of a Gaussian).
2. **Latent Space**: A compressed representation of the input data, which captures the essential features of the input.
3. **Decoder**: Reconstructs data from a latent vector. At generation time, new samples are produced by drawing a latent vector from the prior (a standard normal distribution) and passing it through the decoder.
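
The three components can be sketched in PyTorch as follows. This is a minimal illustration rather than a reference implementation: the class name `TinyVAE`, the layer sizes, and the latent dimension of 20 are all assumptions. The encoder predicts the log-variance instead of the variance (a common choice for numerical stability), and sampling uses the reparameterization `z = mu + sigma * eps` so that gradients can flow through the sampling step.

```python
import torch
import torch.nn as nn


class TinyVAE(nn.Module):
    """Illustrative VAE for flattened 28x28 images; all sizes are assumptions."""

    def __init__(self, input_dim=28 * 28, hidden_dim=256, latent_dim=20):
        super().__init__()
        self.latent_dim = latent_dim
        # Encoder: input -> hidden -> (mean, log-variance) of the latent Gaussian
        self.enc = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder: latent vector -> hidden -> pixel intensities in [0, 1]
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),
        )

    def encode(self, x):
        h = self.enc(x.view(x.size(0), -1))  # flatten images
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps keeps sampling differentiable w.r.t. mu and logvar
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def decode(self, z):
        return self.dec(z)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar


vae = TinyVAE()
x = torch.rand(4, 1, 28, 28)  # stand-in batch of images in [0, 1]
recon, mu, logvar = vae(x)

# Generation: sample latent vectors from the standard normal prior and decode
with torch.no_grad():
    new_samples = vae.decode(torch.randn(8, vae.latent_dim))

print(recon.shape, mu.shape, new_samples.shape)
```

Because the decoder ends in a `Sigmoid`, this sketch expects inputs scaled to `[0, 1]`, unlike the GAN example, which normalizes to `[-1, 1]`.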

---

**8.4.1. VAE Training Process**
- The training process for a VAE involves two main losses:
  1. **Reconstruction Loss**: Measures how well the decoder reconstructs the input from the latent space. This is often calculated using mean squared error or binary cross-entropy.
  2. **Kullback-Leibler Divergence (KL Divergence)**: Measures how much the learned latent distribution deviates from a standard normal distribution. This encourages the model to learn a more structured latent space.

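The overall loss is the sum of the reconstruction loss and the KL divergence (the negative evidence lower bound, or ELBO), so minimizing it trades reconstruction quality against the regularity of the latent space.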
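
As a worked example of the two terms, the plain-Python sketch below computes a binary cross-entropy reconstruction loss and the closed-form KL divergence between a diagonal Gaussian and the standard normal. The function names and all numbers are made up for illustration, and the equal weighting of the two terms is an assumption.

```python
import math


def reconstruction_bce(x, x_hat):
    """Summed binary cross-entropy between target pixels x and reconstruction x_hat."""
    return -sum(
        xi * math.log(xh) + (1 - xi) * math.log(1 - xh)
        for xi, xh in zip(x, x_hat)
    )


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


# Made-up values: three "pixels" and a two-dimensional latent code
x = [1.0, 0.0, 1.0]       # target pixels
x_hat = [0.9, 0.2, 0.7]   # decoder output
mu = [0.5, -0.3]          # encoder mean
logvar = [0.0, -1.0]      # encoder log-variance

recon = reconstruction_bce(x, x_hat)
kl = kl_to_standard_normal(mu, logvar)
loss = recon + kl

print(f"reconstruction = {recon:.4f}, KL = {kl:.4f}, total = {loss:.4f}")
```

The KL term is zero only when the encoder outputs exactly the prior (`mu = 0`, `logvar = 0`) and grows as the latent distribution drifts away from it.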

---



#### **8.6. Applications of GANs and VAEs**
- **Image Generation**: GANs are widely used to generate realistic images, such as faces or artworks; well-trained models can produce images that are difficult to distinguish from real ones.
- **Data Augmentation**: Both GANs and VAEs can be used to augment datasets by generating additional synthetic data, which is particularly useful in scenarios with limited data.
- **Text and Audio Generation**: VAEs have been applied to generate sequences in NLP and to synthesize audio signals, producing realistic speech or music.
- **Style Transfer**: GANs and VAEs are employed in tasks like style transfer, where the model learns to apply the style of one image to the content of another.

---



#### **8.7. Observations on Generative Models**
- **GANs**:
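  - **Training Stability**: Training GANs can be unstable because the two networks must stay roughly balanced. If the discriminator becomes too accurate too early, the generator receives vanishing gradients; if the generator overpowers the discriminator, it gets little useful feedback. Techniques like gradient penalty and historical averaging have been proposed to stabilize training.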
  - **Mode Collapse**: A common problem where the generator learns to produce a limited variety of outputs. Various techniques, such as mini-batch discrimination, can mitigate this issue.

- **VAEs**:
  - **Latent Space Representation**: VAEs provide a well-structured latent space, which allows for meaningful interpolation between data points.
  - **Continuous Generation**: The continuous nature of the latent space in VAEs enables smooth transitions between generated samples, making them suitable for tasks requiring variability.
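
Latent-space interpolation can be sketched in a few lines. The example below only builds the path of latent vectors; in a real model each vector would be passed through the trained decoder to obtain an image. The function name `interpolate` and the three-dimensional latent vectors are illustrative assumptions.

```python
def interpolate(z_start, z_end, steps):
    """Return `steps` latent vectors evenly spaced on the line from z_start to z_end.

    Requires steps >= 2 so that both endpoints are included.
    """
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # goes from 0.0 to 1.0
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


z_a = [0.0, 1.0, -1.0]  # made-up latent code of one sample
z_b = [2.0, -1.0, 1.0]  # made-up latent code of another sample

path = interpolate(z_a, z_b, steps=5)
for z in path:
    print(z)  # each z would be decoded into an image by a trained VAE
```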

---



### Continuity to the Next Section
- In the next section, we will explore **Reinforcement Learning**, a powerful paradigm where agents learn to make decisions by interacting with an environment. We will cover key concepts, algorithms, and applications in various domains.

This section provided an in-depth overview of generative models, including GANs and VAEs. We discussed their architectures, training processes, and applications in image and data generation. These models represent a significant advancement in machine learning, enabling the creation of realistic and complex data distributions.