# Generative Adversarial Network

- **Generative modeling** estimates the probability distribution (i.e. density estimation) that explains a collection of input training examples.
- **Generative Adversarial Network (GAN)** is composed of a generator (G) and a discriminator (D). It is used to generate new samples from a learned latent space.



![](gan.png)

[@imgsource](https://www.kdnuggets.com/wp-content/uploads/generative-adversarial-network.png)

## GAN-MNIST dataset ( PyTorch )

In [1]:
import os

import torch
import torch.nn as nn
import torchvision
from torchvision import transforms
from torchvision.utils import save_image
from torch.autograd import Variable
import torchvision.datasets as datasets
 
import matplotlib.pyplot as plt
import numpy as np
import pylab

### Hyperparameters

In [2]:
latent_size = 64
hidden_size = 256
image_size = 784
num_epochs = 30
batch_size = 32
sample_dir = 'samples'
save_dir = 'save'

# Create a directory if not exists
if not os.path.exists(sample_dir):
    os.makedirs(sample_dir)

if not os.path.exists(save_dir):
    os.makedirs(save_dir)

### Preprocessing

In [3]:
# Image processing
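transform = transforms.Compose([
    transforms.ToTensor(),                # PIL image -> float tensor with values in [0, 1]
    transforms.Normalize([0.5], [0.5]),   # (x - 0.5) / 0.5 -> values in [-1, 1], the range of Tanh
])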
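
To make the effect of the normalization concrete, here is a minimal sketch that applies the same arithmetic, `(x - mean) / std`, by hand. The tensor `x` is a made-up stand-in for a `ToTensor()` output, not real MNIST data.

```python
import torch

# Hypothetical stand-in for a ToTensor() output: pixel values in [0, 1]
x = torch.tensor([0.0, 0.25, 0.5, 1.0])

# Normalize([0.5], [0.5]) computes (x - mean) / std per channel
mean, std = 0.5, 0.5
x_norm = (x - mean) / std

print(x_norm)  # values: -1.0, -0.5, 0.0, 1.0
```

The resulting range [-1, 1] is the same range that a `Tanh` output layer produces, which is why the two are usually paired.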

### Dataset

In [4]:
mnist = torchvision.datasets.MNIST(root='./data'
                                   , train=True
                                   , download=True
                                   , transform=transform)   # reuse the transform defined above

data_loader = torch.utils.data.DataLoader(dataset=mnist
                                          , batch_size=batch_size 
                                          , shuffle=True)

## Generator

- It generates fake data $G(z)$ from random noise $z$; with each iteration the fakes become harder to distinguish from real data.
- For a real image $x$, the ideal discriminator output is $D(x) = 1$ (the probability of being real). The generator tries to increase $D(G(z))$, i.e. the probability that its output is judged to be real data.
- Training G: maximize the probability of $D$ making mistakes by generating data that is as realistic as possible.

In [5]:
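class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.main = nn.Sequential(
            
            nn.Linear(latent_size, hidden_size)
            , nn.ReLU()
            , nn.Linear(hidden_size, hidden_size)
            , nn.ReLU()
            , nn.Linear(hidden_size, image_size)
            , nn.Tanh()    # outputs in [-1, 1], matching the normalized images
        )

    def forward(self, z):
        # Map latent vectors (N, latent_size) to flattened images (N, image_size)
        return self.main(z)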
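
As a quick sanity check of the generator's input and output shapes, the following sketch pushes a few random latent vectors through the same layer stack. It is written inline so it runs on its own; the name `g_demo` and the batch size of 8 are illustrative choices, and the weights are untrained.

```python
import torch
import torch.nn as nn

latent_size, hidden_size, image_size = 64, 256, 784  # values from the hyperparameter cell

# Same layer stack as the generator described above (untrained weights)
g_demo = nn.Sequential(
    nn.Linear(latent_size, hidden_size), nn.ReLU(),
    nn.Linear(hidden_size, hidden_size), nn.ReLU(),
    nn.Linear(hidden_size, image_size), nn.Tanh(),
)

z = torch.randn(8, latent_size)   # 8 latent vectors drawn from a standard normal
with torch.no_grad():
    fake = g_demo(z)

print(fake.shape)                                # torch.Size([8, 784])
print(fake.min().item(), fake.max().item())      # both lie inside [-1, 1] because of Tanh
```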

## Discriminator

- It is a binary classifier that decides whether its input is real data or a fake produced by the generator.
- For a real image $x$ the target is $D(x) = 1$. The discriminator tries to decrease $D(G(z))$, the probability it assigns to fake data being real.

In [6]:
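class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.main = nn.Sequential(
            
            nn.Linear(image_size, hidden_size)
            , nn.LeakyReLU(0.2)
            , nn.Linear(hidden_size, hidden_size)
            , nn.LeakyReLU(0.2)
            , nn.Linear(hidden_size, 1)
            , nn.Sigmoid()    # probability that the input is real
        )

    def forward(self, x):
        # Map flattened images (N, image_size) to probabilities (N, 1)
        return self.main(x)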
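
The discriminator expects flattened images, so a batch shaped like MNIST, `(N, 1, 28, 28)`, has to be reshaped to `(N, 784)` first. The sketch below illustrates this with a random stand-in batch; `d_demo` and `batch` are hypothetical names and the weights are untrained, so the probabilities themselves carry no meaning.

```python
import torch
import torch.nn as nn

hidden_size, image_size = 256, 784  # values from the hyperparameter cell

# Same layer stack as the discriminator described above (untrained weights)
d_demo = nn.Sequential(
    nn.Linear(image_size, hidden_size), nn.LeakyReLU(0.2),
    nn.Linear(hidden_size, hidden_size), nn.LeakyReLU(0.2),
    nn.Linear(hidden_size, 1), nn.Sigmoid(),
)

# Random stand-in for a normalized MNIST batch with values in [-1, 1]
batch = torch.rand(8, 1, 28, 28) * 2 - 1
flat = batch.view(batch.size(0), -1)   # (8, 1, 28, 28) -> (8, 784)

with torch.no_grad():
    p_real = d_demo(flat)

print(flat.shape)    # torch.Size([8, 784])
print(p_real.shape)  # torch.Size([8, 1])
```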

In [8]:
# Discriminator
D = nn.Sequential(
    nn.Linear(image_size, hidden_size),
    nn.LeakyReLU(0.2),
    nn.Linear(hidden_size, hidden_size),
    nn.LeakyReLU(0.2),
    nn.Linear(hidden_size, 1),
    nn.Sigmoid())

# Generator 
G = nn.Sequential(
    nn.Linear(latent_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, image_size),
    nn.Tanh())

## MinMax Game | *Adversarial Learning*
- After enough training, D and G stop improving against each other.
- Generator (G) wins (i.e. learns to create realistic data) when Discriminator (D) can't differentiate generated data from real data.
- The discriminator maximizes the objective below by pushing $D(x)$ towards 1 and $D(G(z))$ towards 0, while the generator minimizes it by pushing $D(G(z))$ towards 1. Here $x$ is real data and $G(z)$ is generated data.

$$
    \boxed{\min_G \max_D V(D, G)= \mathbb{E}_{x\sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z\sim p_z(z)}[\log(1 - D(G(z)))]} 
$$
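
To get a feel for the value function, the sketch below evaluates $\log D(x) + \log(1 - D(G(z)))$ for a single real/fake pair with hand-picked discriminator outputs. The helper `value` and the numbers are illustrative only.

```python
import math

def value(d_real, d_fake):
    """V(D, G) for one real/fake pair: log D(x) + log(1 - D(G(z)))."""
    return math.log(d_real) + math.log(1.0 - d_fake)

# Discriminator that is confident and correct: value close to 0 (its maximum)
print(value(0.99, 0.01))

# Discriminator that cannot tell real from fake (outputs 0.5 for both):
# value is -log(4), about -1.386
print(value(0.5, 0.5))
```

The second case corresponds to the situation where the generator "wins": the discriminator's best guess is 0.5 for every input.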


### Binary Cross Entropy Loss( `BCE-Loss` ):

- v : inputs, w: weights, y : targets, N : batch size

$$
    \boxed{L = {\{l_1, ... , l_N\}}^T, \quad l_i = -w_i\left[ y_i \cdot \log(v_i) + (1-y_i)\cdot \log(1-v_i)\right]}
$$
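
The formula can be checked against `nn.BCELoss` directly. The sketch below uses made-up inputs `v` and targets `y`, sets all weights $w_i = 1$, and averages the per-element losses, which is the default reduction of `nn.BCELoss`.

```python
import torch
import torch.nn as nn

v = torch.tensor([0.9, 0.2, 0.6])   # made-up predictions in (0, 1)
y = torch.tensor([1.0, 0.0, 1.0])   # made-up targets

# Per-element loss l_i with all weights w_i = 1
l = -(y * torch.log(v) + (1 - y) * torch.log(1 - v))

manual = l.mean()                   # default reduction of nn.BCELoss is the batch mean
builtin = nn.BCELoss()(v, y)

print(manual.item(), builtin.item())  # both about 0.2798
```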


In [9]:
# Binary cross entropy loss and optimizer
criterion = nn.BCELoss()
d_optimizer = torch.optim.Adam(D.parameters(), lr=0.0002)
g_optimizer = torch.optim.Adam(G.parameters(), lr=0.0002)

def denorm(x):
    out = (x + 1) / 2
    return out.clamp(0, 1)

def reset_grad():
    d_optimizer.zero_grad()
    g_optimizer.zero_grad()

### Discriminator Loss:

$$
    \boxed{D_{o}=\frac{1}{m} \sum_{i=1}^{m}\left[ \log D(x^{(i)}) + \log (1-D(G(z^{(i)}))) \right]} 
$$

1. If $v_i = D(x_i)$ and $y_i = 1 \; \forall i$ in the `BCE-Loss` above: loss related to real images.
2. If $v_i = D(G(z_i))$ and $y_i = 0 \; \forall i$: loss related to fake images.
3. Sum of `1` and `2`: **`minibatch-loss`** for the Discriminator. This sum equals $-D_o$, so minimizing it maximizes $D_o$.
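
The three steps above can be sketched as follows, using made-up discriminator outputs for $m = 4$ real and 4 fake samples. The comparison at the end illustrates that the summed BCE loss is the negative of the boxed objective $D_o$.

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()

# Made-up discriminator outputs (not from a trained model)
d_real = torch.tensor([[0.9], [0.8], [0.95], [0.7]])   # D(x_i)
d_fake = torch.tensor([[0.1], [0.3], [0.05], [0.4]])   # D(G(z_i))

d_loss_real = criterion(d_real, torch.ones_like(d_real))    # step 1: targets y_i = 1
d_loss_fake = criterion(d_fake, torch.zeros_like(d_fake))   # step 2: targets y_i = 0
d_loss = d_loss_real + d_loss_fake                          # step 3: minibatch loss

# The boxed objective D_o, computed directly from its definition
d_o = (torch.log(d_real) + torch.log(1 - d_fake)).mean()

print(d_loss.item(), -d_o.item())  # the two numbers agree
```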

### Generator Loss:

$$
    \boxed{G_{o}=\frac{1}{m} \sum_{i=1}^{m}\log \left({1 - D\left(G\left(z^{(i)}\right)\right)}\right)} 
$$

- If $v_i = D(G(z_i))$ and $y_i = 1 \; \forall i$: the BCE loss becomes $-\log D(G(z_i))$, which is the loss that is minimized in practice.
- Train the generator to maximize $\log D(G(z))$ (provides stronger gradients early in training, [Ref.@Section3](https://arxiv.org/pdf/1406.2661.pdf)) rather than minimizing $\log \left( 1- D(G(z))\right)$.
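
The difference in gradient strength can be illustrated numerically. The sketch below assumes the discriminator output is a sigmoid of some pre-activation `a` (as in the networks in this notebook) and picks the hypothetical value `a = -5`, which mimics early training where the discriminator confidently rejects fakes. It then compares the gradient of both generator losses with respect to `a`.

```python
import torch

# Hypothetical pre-sigmoid discriminator output for a fake sample:
# a = -5 gives D(G(z)) = sigmoid(-5), roughly 0.0067
a = torch.tensor(-5.0, requires_grad=True)

# Saturating loss from the boxed formula: minimize log(1 - D(G(z)))
saturating = torch.log(1 - torch.sigmoid(a))
(grad_sat,) = torch.autograd.grad(saturating, a)

# Non-saturating alternative: minimize -log D(G(z))
non_saturating = -torch.log(torch.sigmoid(a))
(grad_ns,) = torch.autograd.grad(non_saturating, a)

print(abs(grad_sat.item()))  # about 0.0067: almost no learning signal
print(abs(grad_ns.item()))   # about 0.9933: strong learning signal
```

Both gradients point in the same direction (towards larger `a`), but the non-saturating form is far larger exactly when the generator is doing badly.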



## Training

In [10]:
# Statistics to be saved
d_losses = np.zeros(num_epochs)
g_losses = np.zeros(num_epochs)
real_scores = np.zeros(num_epochs)
fake_scores = np.zeros(num_epochs)

total_step = len(data_loader)
for epoch in range(num_epochs):
    for i, (images, _) in enumerate(data_loader):
        # Flatten each image to a vector; use the actual batch size so a smaller final batch also works
        images = images.view(images.size(0), -1)
        # Create the labels which are later used as input for the BCE loss
        real_labels = torch.ones(images.size(0), 1)
        fake_labels = torch.zeros(images.size(0), 1)

        # ================================================================== #
        #                      Train the discriminator                       #
        # ================================================================== #

        # Compute BCE_Loss using real images where BCE_Loss(x, y): - y * log(D(x)) - (1-y) * log(1 - D(x))
        # Second term of the loss is always zero since real_labels == 1
        outputs = D(images)
        d_loss_real = criterion(outputs, real_labels)
        real_score = outputs
        
        # Compute BCELoss using fake images
        # First term of the loss is always zero since fake_labels == 0
        z = torch.randn(images.size(0), latent_size)
        fake_images = G(z)
        outputs = D(fake_images.detach())   # detach: G needs no gradients in this step
        d_loss_fake = criterion(outputs, fake_labels)
        fake_score = outputs
        
        # Backprop and optimize; D is updated on every step
        d_loss = d_loss_real + d_loss_fake
        reset_grad()
        d_loss.backward()
        d_optimizer.step()
        # ================================================================== #
        #                        Train the generator                         #
        # ================================================================== #

        # Compute loss with fake images
        z = torch.randn(images.size(0), latent_size)
        fake_images = G(z)
        outputs = D(fake_images)
        
        # We train G to maximize log(D(G(z))) instead of minimizing log(1-D(G(z)))
        # For the reason, see the last paragraph of section 3. https://arxiv.org/pdf/1406.2661.pdf
        g_loss = criterion(outputs, real_labels)
        
        # Backprop and optimize; G is updated on every step
        reset_grad()
        g_loss.backward()
        g_optimizer.step()
        # =================================================================== #
        #                          Update Statistics                          #
        # =================================================================== #
        # Running mean over the steps of the current epoch; .item() converts 0-dim tensors to Python floats
        d_losses[epoch] = d_losses[epoch]*(i/(i+1.)) + d_loss.item()*(1./(i+1.))
        g_losses[epoch] = g_losses[epoch]*(i/(i+1.)) + g_loss.item()*(1./(i+1.))
        real_scores[epoch] = real_scores[epoch]*(i/(i+1.)) + real_score.mean().item()*(1./(i+1.))
        fake_scores[epoch] = fake_scores[epoch]*(i/(i+1.)) + fake_score.mean().item()*(1./(i+1.))
        
        if (i+1) % 200 == 0:
            print('Epoch [{}/{}], Step [{}/{}], d_loss: {:.4f}, g_loss: {:.4f}, D(x): {:.2f}, D(G(z)): {:.2f}' 
                  .format(epoch, num_epochs, i+1, total_step, d_loss.data, g_loss.data, 
                          real_score.mean().data, fake_score.mean().data))
    
    # Save real images
    if (epoch+1) == 1:
        images = images.view(images.size(0), 1, 28, 28)
        save_image(denorm(images.data), os.path.join(sample_dir, 'real_images.png'))
    
    # Save sampled images
    fake_images = fake_images.view(fake_images.size(0), 1, 28, 28)
    save_image(denorm(fake_images.data), os.path.join(sample_dir, 'fake_images-{}.png'.format(epoch+1)))
    
    # Save and plot Statistics
    np.save(os.path.join(save_dir, 'd_losses.npy'), d_losses)
    np.save(os.path.join(save_dir, 'g_losses.npy'), g_losses)
    np.save(os.path.join(save_dir, 'fake_scores.npy'), fake_scores)
    np.save(os.path.join(save_dir, 'real_scores.npy'), real_scores)
    
    plt.figure()
    pylab.xlim(0, num_epochs + 1)
    plt.plot(range(1, num_epochs + 1), d_losses, label='d loss')
    plt.plot(range(1, num_epochs + 1), g_losses, label='g loss')    
    plt.legend()
    plt.savefig(os.path.join(save_dir, 'loss.pdf'))
    plt.close()

    plt.figure()
    pylab.xlim(0, num_epochs + 1)
    pylab.ylim(0, 1)
    plt.plot(range(1, num_epochs + 1), fake_scores, label='fake score')
    plt.plot(range(1, num_epochs + 1), real_scores, label='real score')    
    plt.legend()
    plt.savefig(os.path.join(save_dir, 'accuracy.pdf'))
    plt.close()

    # Save model at checkpoints
    if (epoch+1) % 50 == 0:
        torch.save(G.state_dict(), os.path.join(save_dir, 'G--{}.ckpt'.format(epoch+1)))
        torch.save(D.state_dict(), os.path.join(save_dir, 'D--{}.ckpt'.format(epoch+1)))

# Save the model checkpoints 
torch.save(G.state_dict(), 'G.ckpt')
torch.save(D.state_dict(), 'D.ckpt')

Epoch [0/30], Step [200/1875], d_loss: 0.0474, g_loss: 4.2258, D(x): 0.99, D(G(z)): 0.04
Epoch [0/30], Step [400/1875], d_loss: 0.9275, g_loss: 3.6391, D(x): 0.84, D(G(z)): 0.38
Epoch [0/30], Step [600/1875], d_loss: 0.0271, g_loss: 4.8383, D(x): 0.99, D(G(z)): 0.02


KeyboardInterrupt: 

### Generated Samples

#### Epoch 1. (Random noise)

![](./samples/fake_images-1.png)

. </br>
[.](https://aihubprojects.com/gan-implementation-on-mnist-dataset-pytorch/) </br>
. </br>

#### Epoch 30. 

![](./samples/fake_images-30.png)
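
Once training has finished, new digits can be sampled from the generator alone. The sketch below assumes the `G.ckpt` file written by the training cell is present in the working directory; if it is not, the code falls back to untrained weights so that it still runs (and then produces noise). The name `G_sample` and the count of 16 samples are illustrative.

```python
import os
import torch
import torch.nn as nn

latent_size, hidden_size, image_size = 64, 256, 784  # values from the hyperparameter cell

# Must match the architecture of G used during training
G_sample = nn.Sequential(
    nn.Linear(latent_size, hidden_size), nn.ReLU(),
    nn.Linear(hidden_size, hidden_size), nn.ReLU(),
    nn.Linear(hidden_size, image_size), nn.Tanh(),
)

# Assumption: 'G.ckpt' was written by the training cell; otherwise keep random weights
if os.path.exists('G.ckpt'):
    G_sample.load_state_dict(torch.load('G.ckpt'))
G_sample.eval()

with torch.no_grad():
    z = torch.randn(16, latent_size)
    samples = G_sample(z).view(-1, 1, 28, 28)    # back to image layout
    samples = ((samples + 1) / 2).clamp(0, 1)    # same rescaling as denorm()

print(samples.shape)  # torch.Size([16, 1, 28, 28])
```

The resulting tensor has the layout that `save_image` was given in the training loop, so it could be written to disk the same way.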

*** 