# Generate synthetic data using GAN

Here's a simple example of how you can implement a generative adversarial network (GAN) in Python with PyTorch to generate synthetic data.

In [7]:
!pip install torch torchvision --quiet

We import the Python modules needed to build the networks and load the MNIST dataset:

- torch is the main PyTorch library for tensor computations and neural network operations. 
- torch.nn provides classes and functions for building neural networks. 
- torchvision.datasets contains popular datasets like MNIST.
- torchvision.transforms provides utilities for data transformations.



In [8]:
import torch
import torch.nn as nn
import torchvision.datasets as datasets
import torchvision.transforms as transforms

This block defines the Generator network as a PyTorch module. 

The __init__ method initializes the network with the specified latent_dim (size of the random noise vector) and output_dim (size of the output data, e.g., 784 for MNIST images). 

The Generator is a simple feed-forward network with two linear layers: a LeakyReLU activation follows the first, and a Tanh activation at the output keeps every generated value in [-1, 1]. The forward method takes a random noise vector z as input and passes it through the network to generate synthetic data.

In [9]:
# Define the Generator Network
class Generator(nn.Module):
    def __init__(self, latent_dim, output_dim):
        super(Generator, self).__init__()
        self.latent_dim = latent_dim
        self.output_dim = output_dim

        self.generator = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.LeakyReLU(0.2),
            nn.Linear(128, output_dim),
            nn.Tanh()
        )

    def forward(self, z):
        return self.generator(z)
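
As a quick sanity check, the sketch below feeds a batch of noise through a stand-in network with the same layer stack as the Generator and inspects the output. The names `gen`, `fake`, and `images` are illustrative and not part of the notebook; the batch size of 16 is an arbitrary choice.

```python
import torch
import torch.nn as nn

# Stand-in with the same layers as the Generator class (illustrative only).
latent_dim, output_dim = 64, 784
gen = nn.Sequential(
    nn.Linear(latent_dim, 128),
    nn.LeakyReLU(0.2),
    nn.Linear(128, output_dim),
    nn.Tanh(),
)

z = torch.randn(16, latent_dim)       # 16 noise vectors drawn from N(0, 1)
with torch.no_grad():                 # no gradients needed for inspection
    fake = gen(z)

images = fake.view(-1, 1, 28, 28)     # reshape flat vectors to image layout
print(fake.shape, images.shape)
print(fake.min().item(), fake.max().item())  # Tanh bounds values to [-1, 1]
```

An untrained network produces noise, but the shapes and value range already match what the rest of the pipeline expects.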

This block defines the Discriminator network as a PyTorch module. 

The __init__ method initializes the network with the input_dim (size of the input data). 

The Discriminator network is also a simple feed-forward neural network with two linear layers, a LeakyReLU activation, and a Sigmoid activation at the output. The forward method takes input data x (either real or synthetic) and passes it through the network to produce a scalar output between 0 and 1, representing the probability that the input data is real.

In [10]:
# Define the Discriminator Network
class Discriminator(nn.Module):
    def __init__(self, input_dim):
        super(Discriminator, self).__init__()
        self.input_dim = input_dim

        self.discriminator = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.LeakyReLU(0.2),
            nn.Linear(128, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.discriminator(x)
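
The following sketch shows what the Discriminator's input and output look like in practice. It uses a stand-in network with the same layers and random tensors in place of real images; `disc`, `images`, and `probs` are hypothetical names introduced for illustration.

```python
import torch
import torch.nn as nn

# Stand-in with the same layers as the Discriminator class (illustrative only).
input_dim = 784
disc = nn.Sequential(
    nn.Linear(input_dim, 128),
    nn.LeakyReLU(0.2),
    nn.Linear(128, 1),
    nn.Sigmoid(),
)

# Random stand-in "images" in [-1, 1], shaped like a batch of 8 MNIST digits.
images = torch.rand(8, 1, 28, 28) * 2 - 1
flat = images.view(images.size(0), -1)   # flatten each image to 784 values

with torch.no_grad():
    probs = disc(flat)                   # one score per image

print(probs.shape)                       # one column: probability of "real"
print(probs.squeeze(1))
```

Because the linear layers expect flat vectors, image batches have to be reshaped to (batch, 784) before they reach the network.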


Set the hyperparameters for the GAN: latent_dim is the size of the random noise vector, and output_dim is the size of the output data (784 for MNIST images, which are 28x28 pixels). 

In [11]:
# Hyperparameters
latent_dim = 64
output_dim = 784  # For MNIST dataset

In [12]:
# Initialize the networks
generator = Generator(latent_dim, output_dim)
discriminator = Discriminator(output_dim)


Next, define the loss function and the optimizers used to train the GAN:

- nn.BCELoss is the Binary Cross-Entropy loss, which is typically used for binary classification tasks like discriminating between real and fake data. 

- The Adam optimizer is used for both the Generator and Discriminator networks, with a learning rate of 0.0002.

In [13]:
# Loss functions and optimizers
criterion = nn.BCELoss()
gen_optimizer = torch.optim.Adam(generator.parameters(), lr=0.0002)
disc_optimizer = torch.optim.Adam(discriminator.parameters(), lr=0.0002)
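
To make the loss concrete, here is a small sketch that compares nn.BCELoss with the same quantity computed by hand. The values in `d_out` are made-up discriminator outputs, not results from this notebook.

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()

# Hypothetical discriminator outputs for three samples.
d_out = torch.tensor([[0.9], [0.2], [0.6]])
real_label = torch.ones_like(d_out)
fake_label = torch.zeros_like(d_out)

# Target 1: BCE reduces to the mean of -log(p).
loss_as_real = criterion(d_out, real_label)
manual_real = -torch.log(d_out).mean()

# Target 0: BCE reduces to the mean of -log(1 - p).
loss_as_fake = criterion(d_out, fake_label)
manual_fake = -torch.log(1 - d_out).mean()

print(loss_as_real.item(), manual_real.item())
print(loss_as_fake.item(), manual_fake.item())
```

Scoring an output against a label of 1 penalizes low probabilities, and scoring it against 0 penalizes high ones. This is why the same criterion can serve both networks simply by switching the target labels.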

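First, a transformation is defined to convert the images to PyTorch tensors with pixel values in [0, 1] and then normalize them with a mean of 0.5 and a standard deviation of 0.5. Normalize subtracts 0.5 and divides by 0.5, which rescales the pixels to [-1, 1], the same range as the Generator's Tanh output.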



In [14]:
# Load the MNIST dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
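
The arithmetic behind this normalization can be sketched in plain torch. The snippet mirrors the per-pixel computation (x - mean) / std with mean = std = 0.5 and also shows the inverse mapping, which is handy when generated images need to be brought back to [0, 1] for display. The sample values in `pixels` are arbitrary.

```python
import torch

# A few sample pixel intensities as produced by ToTensor(), i.e. in [0, 1].
pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])
mean, std = 0.5, 0.5

# Same per-pixel arithmetic as Normalize((0.5,), (0.5,)).
normalized = (pixels - mean) / std
print(normalized)            # values now span [-1, 1]

# Inverse mapping, e.g. for displaying generator output as an image.
restored = normalized * std + mean
print(restored)              # back in [0, 1]
```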

The datasets.MNIST class downloads and loads the MNIST dataset using the specified root directory (here, ./data).
The train=True argument selects the training set, and download=True downloads the dataset only if it is not already present in the root directory. A DataLoader is then created from train_dataset with a batch size of 128 and shuffling enabled.

In [16]:
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=128, shuffle=True)
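
One detail worth noting is that the final batch of an epoch can be smaller than batch_size. The MNIST training set has 60,000 images, so with a batch size of 128 the last batch holds only 96. The sketch below illustrates the effect with a small random stand-in dataset (300 samples, an arbitrary choice) so that it runs without downloading anything; `images`, `labels`, and `loader` are illustrative names.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-in for MNIST: 300 "images" of shape 1x28x28 with dummy labels.
images = torch.randn(300, 1, 28, 28)
labels = torch.zeros(300, dtype=torch.long)
loader = DataLoader(TensorDataset(images, labels), batch_size=128, shuffle=True)

# 300 = 128 + 128 + 44, so the last batch is smaller than the others.
batch_sizes = [batch.size(0) for batch, _ in loader]
print(batch_sizes)           # [128, 128, 44]

# Each batch is flattened to (batch, 784) before it reaches a linear layer.
first_batch, _ = next(iter(loader))
flat = first_batch.view(-1, 784)
print(flat.shape)
```

Sizing label and noise tensors from the actual batch (for example with real_data.size(0)) avoids shape mismatches on that last batch. Passing drop_last=True to the DataLoader is an alternative that discards it instead.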


The training loop runs for num_epochs epochs (50 in this case). In each epoch, it iterates over the dataloader to obtain batches of real MNIST images (real_data).

Train the Discriminator:

- Reset the Discriminator's optimizer gradients.
- Create labels for real and fake data.
- Pass real data through the Discriminator to get its output.
- Calculate loss comparing Discriminator's output with real labels.
- Generate random noise for fake data.
- Generate fake data using the Generator.
- Pass fake data through the Discriminator to get its output.
- Calculate loss comparing Discriminator's output with fake labels.
- Calculate total loss as the average of real and fake losses.
- Backpropagate the loss and update Discriminator's weights.

Train the Generator:

- Reset the Generator's optimizer gradients.
- Generate new random noise.
- Generate fake data using the Generator.
- Pass fake data through the Discriminator to get its output.
- Calculate loss comparing Discriminator's output with real labels, so the Generator's loss falls when the Discriminator is fooled into scoring fake data as real.
- Backpropagate the loss and update Generator's weights.
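
One way to sanity-check the alternating updates described above is to confirm that each phase changes only the network it is meant to train. The sketch below does this with tiny stand-in networks and random data. The helpers `snapshot` and `changed`, the layer sizes, and the random "real" batch are all hypothetical, and detaching the fake batch in the discriminator phase is a common optional optimization rather than a requirement.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim, data_dim, batch = 8, 16, 32

# Tiny stand-in networks (not the notebook's architectures).
gen = nn.Sequential(nn.Linear(latent_dim, data_dim), nn.Tanh())
disc = nn.Sequential(nn.Linear(data_dim, 1), nn.Sigmoid())
criterion = nn.BCELoss()
gen_opt = torch.optim.Adam(gen.parameters(), lr=0.0002)
disc_opt = torch.optim.Adam(disc.parameters(), lr=0.0002)

def snapshot(model):
    """Copy the current parameter values."""
    return [p.detach().clone() for p in model.parameters()]

def changed(model, before):
    """True if any parameter differs from its snapshot."""
    return any(not torch.equal(p, b) for p, b in zip(model.parameters(), before))

real = torch.rand(batch, data_dim) * 2 - 1      # random stand-in for real data
ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

# Discriminator phase: only disc_opt.step() is called.
gen_before, disc_before = snapshot(gen), snapshot(disc)
disc_opt.zero_grad()
fake = gen(torch.randn(batch, latent_dim)).detach()  # no graph into the generator
disc_loss = (criterion(disc(real), ones) + criterion(disc(fake), zeros)) / 2
disc_loss.backward()
disc_opt.step()
disc_step = (changed(gen, gen_before), changed(disc, disc_before))

# Generator phase: gradients flow through disc, but only gen_opt.step() is called.
gen_before, disc_before = snapshot(gen), snapshot(disc)
gen_opt.zero_grad()
gen_loss = criterion(disc(gen(torch.randn(batch, latent_dim))), ones)
gen_loss.backward()
gen_opt.step()
gen_step = (changed(gen, gen_before), changed(disc, disc_before))

print("discriminator phase (gen changed, disc changed):", disc_step)
print("generator phase     (gen changed, disc changed):", gen_step)
```

Each optimizer only holds the parameters of its own network, so calling step() on one leaves the other untouched even when gradients have been computed for both.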

In [19]:
# Training loop
num_epochs = 50
for epoch in range(num_epochs):
    for i, (real_data, _) in enumerate(dataloader):
        # Train the discriminator
        disc_optimizer.zero_grad()
        # Size the labels from the actual batch; the last batch may be smaller than 128
        real_label = torch.ones(real_data.size(0), 1)
        fake_label = torch.zeros(real_data.size(0), 1)
        # Flatten each 1x28x28 image to a 784-dimensional vector
        real_output = discriminator(real_data.view(-1, output_dim))
        real_loss = criterion(real_output, real_label)
        z = torch.randn(real_data.size(0), latent_dim)
        # detach() avoids computing generator gradients that this step never uses
        fake_data = generator(z).detach()
        fake_output = discriminator(fake_data)
        fake_loss = criterion(fake_output, fake_label)
        disc_loss = (real_loss + fake_loss) / 2
        disc_loss.backward()
        disc_optimizer.step()
        # Train the generator
        gen_optimizer.zero_grad()
        z = torch.randn(real_data.size(0), latent_dim)
        fake_data = generator(z)
        fake_output = discriminator(fake_data)
        # Real labels: the loss is low when the discriminator scores fakes as real
        gen_loss = criterion(fake_output, real_label)
        gen_loss.backward()
        gen_optimizer.step()

    # Print the losses from the last batch of the epoch and generate synthetic data
    print(f'Epoch [{epoch+1}/{num_epochs}], Generator Loss: {gen_loss.item():.4f}, Discriminator Loss: {disc_loss.item():.4f}')
    with torch.no_grad():
        z = torch.randn(16, latent_dim)
        synthetic_data = generator(z).view(-1, 1, 28, 28)
        # Save or visualize the generated synthetic data

Epoch [1/50], Generator Loss: 1.4403, Discriminator Loss: 0.4601
Epoch [2/50], Generator Loss: 1.6008, Discriminator Loss: 0.4184
Epoch [3/50], Generator Loss: 1.8119, Discriminator Loss: 0.2253
Epoch [4/50], Generator Loss: 0.9599, Discriminator Loss: 0.5667
Epoch [5/50], Generator Loss: 0.9471, Discriminator Loss: 0.6062
Epoch [6/50], Generator Loss: 0.8735, Discriminator Loss: 0.6687
Epoch [7/50], Generator Loss: 1.2985, Discriminator Loss: 0.5231
Epoch [8/50], Generator Loss: 1.0226, Discriminator Loss: 0.5434
Epoch [9/50], Generator Loss: 1.2984, Discriminator Loss: 0.4454
Epoch [10/50], Generator Loss: 1.6485, Discriminator Loss: 0.3565
Epoch [11/50], Generator Loss: 1.4496, Discriminator Loss: 0.4481
Epoch [12/50], Generator Loss: 1.2997, Discriminator Loss: 0.4426
Epoch [13/50], Generator Loss: 1.4082, Discriminator Loss: 0.4128
Epoch [14/50], Generator Loss: 1.1078, Discriminator Loss: 0.5760
Epoch [15/50], Generator Loss: 1.3003, Discriminator Loss: 0.4328
Epoch [16/50], Gene