# 7b - Generative Adversarial Networks (GANs) in 2D

The goal of a Generative Adversarial Network (GAN) is to learn a **generative model**, where
1. the low-dimensional latent space distribution $p$ is explicitly fixed;
2. the quality of the generated samples is evaluated by an auxiliary model (classification network) driving the optimization of the generative model (decoder network).

This is in contrast to **Variational Auto-Encoders (VAEs)** seen previously, where the latent representation is optimized during training to match the desired distribution, and the quality of the generated samples is evaluated by a reconstruction loss that is agnostic to the data distribution.

---

Reference: "Generative Adversarial Networks", Goodfellow, Ian J and Pouget-Abadie, Jean and Mirza, Mehdi and Xu, Bing and Warde-Farley, David and Ozair, Sherjil and Courville, Aaron and Bengio, Yoshua. In Advances in Neural Information Processing Systems (NeurIPS), 2014.

julien (dot) rabin @ greyc.ensicaen.fr 2025

## Quick recap on GANs

### Architecture
A GAN consists of two neural networks: a generative network (a decoder called **generator G**), which is the only one needed at inference, and a discriminative network (a **classifier** called **discriminator D**), which is used only during training. The generative network learns to generate new data samples that resemble the training data, while the discriminative network learns to distinguish between real data samples and those generated by the generative network:

- The generator $G : z \in \mathbb R^{D} \to y = G(z) \in \mathbb R^{d} $ maps points in the latent space to the original data space. 
- The discriminator $D : x \in \mathbb R^{d} \to D(x) \in [0,1]$ maps the input data to a scalar value between 0 and 1, representing the probability that the input $x$ is a real data sample (from the training distribution $p_{\text{data}}$) rather than a generated sample from the distribution $p_G := G_\# p$.

Like many other generative models, GANs typically impose that the latent variables follow a Gaussian distribution: $z \sim \mathcal N(0, I_D)$, as it is easy to sample from (as seen earlier with GMMs and VAEs).
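
A minimal sketch of these two mappings and their shapes. It assumes stand-in networks (a single `nn.Linear` layer each, chosen only for illustration and not the MLPs asked for in the exercises) with illustrative dimensions $D = 3$ for the latent space and $d = 2$ for the data space:

```python
import torch
from torch import nn

torch.manual_seed(0)
latent_dim, data_dim, n = 3, 2, 5  # D, d and batch size (illustrative values)

# Stand-in networks (assumption: single linear layers, for shape checking only)
generator = nn.Linear(latent_dim, data_dim)                          # G : R^D -> R^d
discriminator = nn.Sequential(nn.Linear(data_dim, 1), nn.Sigmoid())  # D : R^d -> [0, 1]

z = torch.randn(n, latent_dim)  # z ~ N(0, I_D)
y = generator(z)                # generated samples y ~ p_G = G_# p
prob = discriminator(y)         # probability that each sample is real

print(z.shape, y.shape, prob.shape)
```

Only the latent sampling step is random: once $G$ is fixed, the distribution of the generated samples is entirely determined by the prior $p$.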

The key idea of GAN is to train the generator and discriminator in a **minimax game**:
- The generator tries to minimize the probability of the discriminator correctly identifying generated samples as fake, i.e., it tries to maximize the discriminator's error.
- The discriminator tries to maximize its ability to correctly classify real and generated samples, i.e., it tries to minimize its error.

### Training

Contrary to VAEs and GMMs, GANs are not based on a probabilistic model optimized by maximizing the likelihood of the data. Instead, they are trained using a minimax game where the generator and discriminator are trained jointly in an adversarial manner.

From the point of view of the discriminator, the problem is similar to a binary classification problem
where the criterion is the **binary cross-entropy loss** between the predicted probabilities $p_i = D(x_i) \in [0,1]$ and the true labels $y_i \in \{0,1\}$.
However, contrary to a conventional classification problem, this problem does not require supervision: labels can be defined automatically (e.g. 1 for real samples $x \sim p_{\text{data}}$, 0 for generated samples $x \sim p_{G}$), which makes it an **unsupervised learning** problem:
$$  
   \min_D \mathcal L (D) := - \mathbb E_{x \sim p_{\text{data}}} \left[ \log D(x) \right] - \mathbb E_{x' \sim p_G} \left[ \log (1 - D(x')) \right]
$$
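
As a sanity check, this loss can be evaluated by hand and compared with PyTorch's `nn.BCELoss`. The sketch below uses made-up probabilities in place of a real discriminator (an assumption for illustration). Note that `nn.BCELoss` averages over all $2n$ samples of the labelled mixture, so it returns $\mathcal L(D)/2$ when both batches have the same size.

```python
import torch
from torch import nn

torch.manual_seed(0)
n = 4
# Made-up discriminator outputs in (0, 1) (assumption: stand-ins for D(x) and D(x'))
d_real = torch.rand(n, 1) * 0.98 + 0.01
d_fake = torch.rand(n, 1) * 0.98 + 0.01

# Loss written as in the formula: -E[log D(x)] - E[log(1 - D(x'))]
loss_manual = -torch.log(d_real).mean() - torch.log(1 - d_fake).mean()

# Same quantity through BCELoss on the labelled mixture (1 = real, 0 = generated)
preds = torch.vstack((d_real, d_fake))
labels = torch.vstack((torch.ones(n, 1), torch.zeros(n, 1)))
loss_bce = nn.BCELoss()(preds, labels)

print(loss_manual.item(), 2 * loss_bce.item())  # the two values agree
```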

Now, considering the fact that the distribution of generated samples $p_G = G_\# p$ is parametrized by the generator $G$ (the prior latent distribution $p$ is fixed), we can rewrite the loss function as a function of the generator parameters, whose role is to maximize the discriminator's error:
$$  
     \min_{D} \max_{G} \mathcal L (G,D) = - \mathbb E_{x \sim p_{\text{data}}} \left[ \log D(x) \right] - \mathbb E_{z \sim p} \left[ \log (1 - D(G(z))) \right]
$$
This way, the quality of the generated samples is automatically evaluated by the discriminator, which drives the optimization of the generator.

To train the GAN, we alternate between two steps:
1. **Train the discriminator**: Update the discriminator parameters $D$ by minimizing the loss $ \mathcal L (D) $.

*Note that the discriminator should have good performance to be relevant, so it is often trained for several iterations before updating the generator.*

2. **Train the generator**: Update the generator parameters $G$ by maximizing the loss $\mathcal L (G,D)$, which reduces to minimizing the log-likelihood of the generated samples being classified as fake by the discriminator:
$\mathbb E_{z \sim p} \left[ \log (1 - D(G(z))) \right]$

*Note that only randomly generated samples are now required to optimize the generator!*


### Implementation
Here we implement a simple GAN generator and discriminator based on MLPs using PyTorch.

First we review the training of a simple MLP classifier to discriminate between the data distribution and an untrained generative model.
Then, we train the generator in an adversarial manner.


### Useful imports, definitions and data setup

In [None]:
import torch 
from torch import nn, Tensor

import numpy as np

import matplotlib.pyplot as plt

from tqdm import tqdm

Data loader

In [None]:
def sample_data(n: int, model = 'moons') -> Tensor:
    data_dim = 2  # Dimension of the data
    if model == 'moons':
        from sklearn.datasets import make_moons
        return Tensor(make_moons(n_samples=n, noise=0.05)[0])
    elif model == 'circles':
        from sklearn.datasets import make_circles
        return Tensor(make_circles(n_samples=n, noise=0.05, factor=0.5)[0])
    elif model == '2gmm':
        n_samples = n
        n = n//3
        X1 = torch.randn(n, data_dim) @ torch.tensor([[.05, -0.02],[-0.02, .4]]) + torch.tensor([[.0, 1.0]]).view(1, data_dim)
        X2 = torch.randn(n_samples - n, data_dim) @ torch.tensor([[.3, 0.05],[0.05, .05]]) + torch.tensor([[-1.0, 0.]]).view(1, data_dim)
        return torch.cat((X1,X2), dim=0)  # Concatenate the two sets of samples
    elif model == 'radial_gmm':
        K = 8
        n_samples = n
        samples_per_component = n_samples // K
        remainder = n_samples % K
        all_samples = []

        for k in range(K):
            radius = 3.
            # Angle for the mean on a circle
            theta = 2 * np.pi * k / K
            cs = np.cos(theta)
            sn = np.sin(theta)

            # Radial direction unit vector
            radial = torch.tensor([cs, sn], dtype=float)#.view(data_dim, 1)
            tangential = torch.tensor([-sn, cs], dtype=float)#.view(data_dim, 1)

            mean = radius * radial
            
            # Covariance matrix: elongated along radial direction
            cov = 0.3 * torch.outer(radial, radial).to(float) + 0.05 * torch.outer(tangential, tangential).to(float)

            # Generate samples
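            n = samples_per_component + (remainder if k == K - 1 else 0)  # the last component takes the remainder
            # With cov = L @ L.T (Cholesky), row samples z @ L.T have covariance cov
            samples = torch.randn(n, data_dim, dtype=float) @ torch.linalg.cholesky(cov).T.to(float) + mean.to(float)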
            all_samples.append(samples)

        return torch.cat(all_samples, dim=0).to(torch.float32)
    else:
        raise ValueError(f"Unknown model: {model}.")
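
A note on Cholesky-based Gaussian sampling, as used for `radial_gmm`: `torch.linalg.cholesky` returns the lower-triangular factor $L$ with $\Sigma = L L^\top$, so row-vector samples with covariance $\Sigma$ are obtained as `z @ L.T`. A quick empirical check, with an arbitrary covariance matrix chosen for illustration:

```python
import torch

torch.manual_seed(0)
cov = torch.tensor([[0.30, 0.10],
                    [0.10, 0.05]], dtype=torch.float64)  # illustrative covariance
L = torch.linalg.cholesky(cov)                           # cov = L @ L.T

z = torch.randn(200_000, 2, dtype=torch.float64)         # rows z ~ N(0, I)
x = z @ L.T                                              # rows x ~ N(0, cov)

emp_cov = torch.cov(x.T)  # torch.cov expects variables in rows
print(emp_cov)            # close to cov
```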


Define useful plot functions

In [None]:
def plot_data_comparison(x_data : np.ndarray, x_model : np.ndarray, colors_data = None, colors_model = None):
    if colors_data is None:
        colors_data = 'C0' # blue
        colors_model = 'C1' # orange
        
    fig, axes = plt.subplots(1, 2, figsize=(20, 10), sharex=True, sharey=True)

    # Original data
    axes[0].scatter(x_data[:, 0], x_data[:, 1], s=10, c=colors_data)
    axes[0].set_title('Data Samples')
    axes[0].set_xlim(-3.0, 3.0)
    axes[0].set_ylim(-3.0, 3.0)

    # GAN samples
    axes[1].scatter(x_model[:, 0], x_model[:, 1], s=10, c=colors_model)
    axes[1].set_title('GAN Output Samples')
    axes[1].set_xlim(-3.0, 3.0)
    axes[1].set_ylim(-3.0, 3.0)

    # Global title
    fig.suptitle("Original vs Model samples", fontsize=24)
    plt.tight_layout()
    plt.subplots_adjust(top=0.88)

    return fig, axes


In [None]:
def define_colors(x_data : np.ndarray, x_label : np.ndarray) -> np.ndarray:
    cmap = plt.get_cmap("viridis")
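    values = np.asarray(x_label, dtype=float).ravel()  # Use the label for coloring, flattened to shape (n,)
    values = (values - values.min()) / max(values.max() - values.min(), 1e-12)  # rescale to [0,1], guarding against constant labels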
    colors = cmap(values)
    return colors[:, :3]

# Exercise 1 : definition of the Neural Networks Architecture

Complete the following code to define the generator and discriminator architectures as simple MLPs with hidden layers.

Recall that the discriminator outputs 
- either a scalar probability in [0,1], which can be obtained using a sigmoid activation in the last layer. Such a classifier can be trained using the binary cross-entropy loss : `nn.BCELoss()`
- or a logit (real value) without any activation in the last layer. Such a classifier can be trained using the binary cross-entropy with logits loss : `nn.BCEWithLogitsLoss()`

In this lab, we will use the **second option**.

In [None]:
class GAN(nn.Module):
    def __init__(self, dim: int = 2, h: int = 64, latent_dim: int = 2):
        super().__init__()
        
        self.lat_dim = latent_dim
        
        # Classifier / Critic
        self.discriminator = ...
        
        # Decoder
        self.generator = ...
    
    def discriminate(self, x: Tensor) -> Tensor:
        # Note : input x is a batch of fake or real points in the data space of shape (batch_size, dim)
        return self.discriminator(x)
    
    def sample(self, n: int) -> Tensor:
        # Sample from the latent space distribution N(0, I)
        z = ...
        y = self.generator(z)
        return y
    

Test the (untrained) GAN : generate random samples

In [None]:
# Random GAN model
latent_dim = 1
model = GAN(dim=2, h=16, latent_dim=latent_dim)

In [None]:
# Number of trainable parameters
for name, net in [("Generator",model.generator), ('Discriminator', model.discriminator)]:
    print(f"Network: {name}")
    n_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
    print(f"Number of trainable parameters in {net.__class__.__name__}: {n_params}")

In [None]:
torch.manual_seed(42)
n = 1_000  # Number of samples
dataset_name = 'moons' # moons, circles, 2gmm, radial_gmm

# Generate synthetic data
x = ...

# Generate samples from the untrained GAN
y = ...

# use the discriminator logits to label the samples
x_label = nn.Sigmoid()(model.discriminate(x)).detach().numpy() # sigmoid is used to get probabilities in [0,1]
y_label = nn.Sigmoid()(model.discriminate(y)).detach().numpy()

x_colors = define_colors(x.numpy(), x_label)  
y_colors = define_colors(y.numpy(), y_label)  
fig, axes = plot_data_comparison(x,y, x_colors, y_colors)

fig.suptitle("*UNTRAINED* GAN generation of random samples", fontsize=24)
fig.colorbar(axes[0].collections[0], ax=axes[0], label='Data Probability', orientation='vertical')

As we are dealing with a 2D classifier (discriminator), it is interesting to visualize its decision boundary (before, after, but also during training!).

In [None]:
def plot_GAN_critic (gan : GAN) : # plot training data + GAN samples with discriminator values in background
    
    torch.manual_seed(42)
    n = 100  # Number of samples
    
    # Generate synthetic data
    x = ...
    # Generate samples from the GAN (trained or not)
    y = ...

    n = 100 # n**2 points for the grid
    t = np.linspace(-3, 3, n)
    X,Y = np.meshgrid(t, t)
    Z = gan.discriminate(Tensor(np.c_[X.ravel(), Y.ravel()])) # in R
    Z = nn.Sigmoid()(Z).detach().numpy() # convert to probability in [0,1]
    Z = Z.reshape(X.shape)
    fig, axes = plt.subplots(1, 1, figsize=(5, 5), sharex=True, sharey=True)
    axes = [axes]
    # Decision boundary 
    axes[0].contourf(X, Y, Z, levels=50, cmap='coolwarm', alpha=0.5)
    fig.colorbar(axes[0].collections[0], ax=axes[0], label='Discriminator Output', orientation='horizontal')
   
    # Original data points
    axes[0].scatter(x[:, 0], x[:, 1], s=10, alpha=0.5, c='C1', label='Data Samples')
    axes[0].set_xlim(-3.0, 3.0)
    axes[0].set_ylim(-3.0, 3.0)
    
    # GAN samples points
    axes[0].scatter(y[:, 0], y[:, 1], s=10, alpha=0.5, c='C2', label='GAN Samples')
    axes[0].set_title('Data and GAN Samples')
    axes[0].legend()

    plt.show()
    
    return fig, axes

In [None]:
fig, axes = plot_GAN_critic(model)
axes[0].set_title("*UNTRAINED* GAN Critic Decision Boundary")

Is this result what you expect from an untrained discriminator?

# Warm-Up : training the discriminator only



Note: the classifier neural network gives logits as output (batch of real values) rather than probabilities, so we need to apply the sigmoid function to get probabilities.

We use here directly the `torch.nn.BCEWithLogitsLoss` loss function which combines a sigmoid layer and the binary cross-entropy loss in one single class. This is more numerically stable than using a plain Sigmoid followed by a BCELoss when the input is very large (or very small).
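
A small sketch of this stability issue, using hand-picked logits (illustrative values, not outputs of the lab's discriminator). For moderate logits both formulations agree. For a large logit on a sample labelled 0, the sigmoid rounds to exactly 1 in float32 and the plain `Sigmoid` + `BCELoss` combination no longer returns the true loss, which is about 50 here.

```python
import torch
from torch import nn

labels = torch.tensor([[1.0], [0.0]])

# Moderate logits: both formulations give the same value
logits = torch.tensor([[2.0], [-1.0]])
loss_naive = nn.BCELoss()(torch.sigmoid(logits), labels)
loss_stable = nn.BCEWithLogitsLoss()(logits, labels)
print(loss_naive.item(), loss_stable.item())

# Large logit on a sample labelled 0: sigmoid(50) is exactly 1 in float32,
# so log(1 - sigmoid(50)) cannot be evaluated accurately by the naive version
big_logit = torch.tensor([[50.0]])
zero_label = torch.tensor([[0.0]])
big_naive = nn.BCELoss()(torch.sigmoid(big_logit), zero_label)
big_stable = nn.BCEWithLogitsLoss()(big_logit, zero_label)
print(big_naive.item(), big_stable.item())  # only the second one is close to 50
```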


In [None]:
n_iter = ...
learning_rate = ...
optimizer_dis = ... # setup the optimizer for the discriminator
criterion = ...
dis_losses = []

batch_size = 64

for _ in tqdm(range(n_iter), desc="Training Discriminator"):
    # Sample/predict/label real data
    x = ...
    x_pred = ... # use the discriminator to predict on real data
    x_label = torch.ones((batch_size, 1))  # Real labels are 1
    
    # Sample/predict/label fake data (from untrained generator)
    y = ... # create fake data samples
    y_pred = ...
    y_label = torch.zeros((batch_size, 1))  # Fake labels are 0
    
    # Concatenate real and fake data
    xy_pred = torch.vstack((x_pred, y_pred))
    labels = torch.vstack((x_label, y_label))
    
    # Compute BCE loss
    loss = criterion(xy_pred, labels)
    
    # Backward pass and optimization
    optimizer_dis.zero_grad()
    loss.backward()
    optimizer_dis.step()
    
    dis_losses.append(loss.item())

fig, axes = plt.subplots(1, 1, figsize=(10, 5))
axes.plot(dis_losses, label='Discriminator Loss', color='C0')
axes.set_title('Discriminator Loss Over Iterations')


Plot the decision boundary and the data/samples colored by the *trained* discriminator output probabilities. 

Is it what you expect from a trained discriminator?

In [None]:
torch.manual_seed(42)
n = ...
x = ...

y = ...

x_label = nn.Sigmoid()(model.discriminate(x)).detach().numpy()
y_label = nn.Sigmoid()(model.discriminate(y)).detach().numpy()

x_colors = define_colors(x.numpy(), x_label)  
y_colors = define_colors(y.numpy(), y_label)  

fig, axes = plot_data_comparison(x,y, x_colors, y_colors)

fig.suptitle("Untrained generator with trained discriminator", fontsize=24)
fig.colorbar(axes[0].collections[0], ax=axes[0], label='Data Probability', orientation='vertical')

In [None]:
fig, axes = plot_GAN_critic(model)
axes[0].set_title("Untrained Generator with Trained Discriminator")

# Exercise 2 : Training the full GAN

The optimization problem is a min-max problem w.r.t. the generator $G$ and the discriminator $D$:
$$\min_G \max_D \mathbb E_{x \sim p_{\text{data}}} [\log D(x)] + \mathbb E_{z \sim p_z} [\log (1 - D(G(z)))]$$

where $p_{\text{data}}$ is the distribution of the training data and $p_z = \mathcal N (0, I_D)$ is the latent distribution in dimension $D$ (input to the generator), i.e. the fixed prior denoted $p$ in the recap.

For a fixed generator $G$, the discriminator $D$ is trained to maximize the probability of assigning the correct label to both real (1) and generated samples (0), which is equivalent to minimizing the Binary Cross Entropy (BCE) loss of the mixture of real (x, label 1) and generated (y, label 0) samples:
$$
	\min_D - \mathbb E_{x \sim p_{\text{data}}} [\log D(x)] - \mathbb E_{y \sim p_G} [\log (1 - D(y))]
$$
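
A useful reference value for this loss: a discriminator that cannot tell the two distributions apart outputs $D = 0.5$ everywhere, in which case each of the two terms equals $\log 2 \approx 0.693$ (so their sum is $2 \log 2$ and their average is $\log 2$). A minimal check with `nn.BCEWithLogitsLoss`, where a logit of 0 corresponds to $D = 0.5$:

```python
import math
import torch
from torch import nn

criterion = nn.BCEWithLogitsLoss()
n = 8
logits = torch.zeros(n, 1)  # logit 0 <=> D = sigmoid(0) = 0.5 for every sample

loss_real = criterion(logits, torch.ones(n, 1))   # -E[log D(x)]       with D = 0.5
loss_fake = criterion(logits, torch.zeros(n, 1))  # -E[log(1 - D(y))]  with D = 0.5

print(loss_real.item(), loss_fake.item(), math.log(2))
```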

Using the (fixed) discriminator $D$ as a criterion, the generator $G$ is trained to minimize the probability of the discriminator assigning the correct label (0) to generated samples:
$$
	\min_G \mathbb E_{y \sim p_G} [\log (1 -  D(y))]
$$
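To avoid vanishing gradients (a good discriminator gives $D(y)\approx 0$, where the gradient of $\log(1 - D(y))$ with respect to the discriminator logit vanishes), the generator is in practice trained with an alternative objective that pursues the same goal: maximizing the probability of the discriminator assigning the desired label (i.e. 1) to generated samples: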
$$
	\min_G - \mathbb E_{y \sim p_G} [\log D(y)]
$$
This is equivalent to minimizing the BCE loss of the generated samples $y$ but using the desired label 1 (instead of 0).
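
The difference between the two generator objectives can be seen on a single discriminator logit $s$, with $D(y) = \sigma(s)$. The sketch below picks $s = -10$ (an illustrative value for a generated sample that the discriminator confidently rejects) and compares the gradients of both losses with respect to $s$:

```python
import torch
import torch.nn.functional as F

# Logit of the discriminator on a generated sample; very negative <=> D(y) close to 0
s = torch.tensor([-10.0], requires_grad=True)

# Saturating loss: log(1 - D(y)) = log(1 - sigmoid(s)) = logsigmoid(-s)
loss_sat = F.logsigmoid(-s).sum()
(grad_sat,) = torch.autograd.grad(loss_sat, s)

# Non-saturating loss: -log D(y) = -logsigmoid(s), i.e. BCE with the label 1
loss_ns = -F.logsigmoid(s).sum()
(grad_ns,) = torch.autograd.grad(loss_ns, s)

print(grad_sat.item(), grad_ns.item())  # tiny gradient vs gradient close to -1
```

The first gradient equals $-\sigma(s)$ and vanishes when the discriminator is confident, whereas the second equals $-(1 - \sigma(s))$ and stays close to $-1$, so the generator keeps receiving a useful signal.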

In [None]:
torch.manual_seed(0) # For reproducibility

model = ... # create a new GAN model
criterion = ...

learning_rate = ...
optimizer_gen = ... # setup the optimizer for the generator
optimizer_dis = ... # setup the optimizer for the discriminator

In [None]:
n_iter = ...
n_dis = ...  # Number of discriminator updates per generator update (more than 1 !)
n_show = max(1, n_iter // 10)  # plot about 10 times during training (at least 1 to avoid a modulo by zero)
batch_size = ...

gen_losses = []
dis_losses = []

pbar = tqdm(range(n_iter), desc="Training GAN")

x_labels = torch.ones((batch_size, 1))  # Real data labels (1 for real)
y_labels = torch.zeros((batch_size, 1))  # Fake data labels (0 for fake)
        
for it in pbar :
    for _ in range(n_dis):
        x = ... # Sample real data
        y = ...  # Sample from the generator
        
        # Train discriminator for a fixed generator (with y.detach())
        optimizer_dis.zero_grad()
        dis_loss_real = criterion(model.discriminate(x), x_labels)  # Loss for real data 
        dis_y = model.discriminate(y.detach())  # Detach to avoid backpropagation through generator
        dis_loss_fake = criterion(dis_y, y_labels)  # Loss for fake data
        dis_loss = (dis_loss_real + dis_loss_fake)/2  # Total discriminator loss
        dis_loss.backward()
        optimizer_dis.step()
        
    dis_losses.append(dis_loss.item())
    
    # Train generator to fool the discriminator on a new batch
    x = ... # Sample real data
    y = ...  # Sample from the generator
    
    optimizer_gen.zero_grad()
    dis_y = model.discriminate(y)
    gen_loss = criterion(dis_y, x_labels)  # Generator loss using BCE with real labels
    gen_loss.backward()
    optimizer_gen.step()
    
    gen_losses.append(gen_loss.item())
    
    pbar.set_postfix({
        'gen loss'  : f"{gen_losses[-1]:.4f}",
        'dis loss'  : f"{dis_losses[-1]:.4f}"
    })
    
    if it % n_show == 0:
        fig, axes = plot_GAN_critic(model)
        axes[0].set_title(f"GAN @ Iteration {it}")
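
The role of `y.detach()` in the discriminator step can be checked in isolation. The sketch below assumes stand-in networks (single `nn.Linear` layers named `gen` and `dis`, hypothetical and for illustration only) and inspects which parameters receive a gradient in each step:

```python
import torch
from torch import nn

torch.manual_seed(0)
# Stand-in networks (assumption: single linear layers, for illustration only)
gen = nn.Linear(1, 2)
dis = nn.Linear(2, 1)
criterion = nn.BCEWithLogitsLoss()

z = torch.randn(16, 1)
y = gen(z)

# Discriminator step: y.detach() cuts the graph, so the generator gets no gradient
criterion(dis(y.detach()), torch.zeros(16, 1)).backward()
gen_untouched = gen.weight.grad is None
dis_has_grad = dis.weight.grad is not None

# Generator step: no detach, the gradient flows through the discriminator to the generator
dis.zero_grad()
criterion(dis(y), torch.ones(16, 1)).backward()
gen_has_grad = gen.weight.grad is not None

print(gen_untouched, dis_has_grad, gen_has_grad)
```

During the generator step the discriminator parameters also accumulate gradients, but they are not updated since only `optimizer_gen.step()` is called, and they are cleared by the next `optimizer_dis.zero_grad()`.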

In [None]:
fig, ax = plt.subplots(1, figsize=(10, 5))
ax.plot(gen_losses, label='GEN Loss', color='blue')
ax.plot(dis_losses, label='DIS Loss', color='orange')
ax.plot(np.log(2)*np.ones_like(dis_losses), '--', label='BCE for D=0.5', color='orange')  # dis_loss is the average of two terms, each equal to log(2) when D=0.5
ax.set_title('Losses during GAN Training')
ax.set_xlabel('Iteration')
ax.set_ylabel('Loss Value')
ax.legend()
plt.show()

# Exercise 3 : Sampling the model and comparing to the data

In [None]:
torch.manual_seed(42)
n = 1_000
x = ...

# Generate samples from the trained GAN
y = ...

# Define colors based on discriminator output

x_label = nn.Sigmoid()(model.discriminate(x)).detach().numpy()  # sigmoid converts logits to probabilities in [0,1]
y_label = nn.Sigmoid()(model.discriminate(y)).detach().numpy()

x_colors = define_colors(x.numpy(), x_label= x_label)  
y_colors = define_colors(y.numpy(), x_label= y_label)  
fig, axes = plot_data_comparison(x,y, x_colors, y_colors)

fig.suptitle("GAN generation of random samples", fontsize=24)
fig.colorbar(axes[0].collections[0], ax=axes[0], label='Data Probability', orientation='vertical')


In [None]:
fig, axes = plot_GAN_critic(model)
axes[0].set_title("Trained GAN Critic Decision Boundary")

## Conclusion and Discussion

In this notebook, we have implemented a simple Generative Adversarial Network (GAN) using PyTorch. 
The generative model is composed of a decoder that maps a small latent space with a simple prior distribution to the (high-dimensional) data space.
To train such a model, the GAN requires a discriminator that classifies the training and generated data and which is trained jointly with the generator. 

Advantages and Drawbacks of GANs compared to other models:
+ ✅ ...
- ❌ ...


In the next notebooks, we will explore more advanced generative models based on likelihood maximization that address some of these limitations, such as Normalizing Flows and Diffusion Models.