# Generative Adversarial Networks

Generative Adversarial Networks (GANs) are a type of neural network architecture originally designed to generate samples that are close to the original sample space. As the name suggests, a GAN is a generative model, which means that no supervised learning is required. The general structure of a GAN is shown in the figure below. The figure shows the two networks present in the framework, commonly named the Generator Network and the Discriminator Network.

The two networks can be seen as adversaries of each other. The purpose of the Generator Network is to learn a data distribution that is almost equivalent to the original data distribution. The Generator starts from a noise distribution and learns from the feedback provided by the Discriminator Network. The Discriminator can in general be any differentiable function. The Generator is trained on latent samples whose dimension is less than or equal to that of the original data. This strategy lets the Generator learn generic features and represent the sample space in fewer dimensions, which makes it easier to create further samples that are not in the original data. Inputs to the Generator are randomly sampled from the model's prior over the latent variables.

The Discriminator Network is provided with two types of data: the original data and the generated data (i.e., data from the Generator Network). The Discriminator Network can be thought of as a binary classifier whose task is to distinguish original data from generated data. It outputs 1 when it judges that the data comes from the original distribution and 0 when it judges that the data comes from the Generator Network.

The overall GAN framework can be seen as a two-player game in which the Generator Network tries to get past the Discriminator Network by making it believe that a generated sample comes from the original distribution. The Discriminator Network, on the other hand, tries its best not to be fooled by the Generator Network. In the ideal case, if both models have sufficient capacity, the Nash equilibrium of this game corresponds to G(z) being drawn from the same distribution as the training data, and D(x) = 1/2 for all x.

**GANs are mostly similar to VAEs. The most salient difference is that, if relying on standard backprop, VAEs cannot have discrete variables at the input to the generator, while GANs cannot have discrete variables at the output of the generator. The GAN approximation is subject to the failures of supervised learning: overfitting and underfitting.**

## Training and Cost Functions

Training alternates between two parts. In the first part, the Discriminator is fed a batch in which half of the samples are original data and half are fake. The goal of the Discriminator is to identify as many original samples as possible and output 1 for them.

In the second part, the Generator is fed latent variables sampled from the model's prior. The Generator creates fake data items, which are passed to the Discriminator Network. In this part, the goal of the Generator is to push the Discriminator's output close to 1, whereas the Discriminator tries to keep its output close to 0.

There are two cost functions in this framework: one for the Generator Network and one for the Discriminator Network. The commonly used cost function for the Discriminator Network is a modified cross-entropy that accounts for the two types of input.

$$C_{D} = -\frac{1}{2}\mathbb{E}_{x\sim p_{data}}\log D(x) -\frac{1}{2}\mathbb{E}_{z}\log(1- D(G(z)))$$

As can be seen from the equation above, minimizing this cost pushes D(x) toward 1 for samples from the original data and pushes D(G(z)) toward 0 for samples generated by the Generator Network.
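As a small numerical illustration of the Discriminator cost, the sketch below estimates it on a batch with NumPy. The function name `discriminator_cost`, the clipping constant `eps`, and the example output values are assumptions made for this illustration only; they are not part of the training code used later.

```python
import numpy as np

def discriminator_cost(d_real, d_fake, eps=1e-12):
    """Batch estimate of C_D = -1/2 E[log D(x)] - 1/2 E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on original samples.
    d_fake: discriminator outputs D(G(z)) on generated samples.
    """
    # Clip to avoid log(0); eps is an illustrative choice.
    d_real = np.clip(d_real, eps, 1.0 - eps)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return -0.5 * np.mean(np.log(d_real)) - 0.5 * np.mean(np.log(1.0 - d_fake))

# Hypothetical outputs of a discriminator that separates the two classes well.
confident = discriminator_cost(np.array([0.99, 0.98]), np.array([0.01, 0.02]))

# Hypothetical outputs of a discriminator that cannot tell them apart.
undecided = discriminator_cost(np.array([0.5, 0.5]), np.array([0.5, 0.5]))

print("confident:", confident)
print("undecided:", undecided)
```

A Discriminator that outputs 1/2 everywhere has a cost of exactly log 2, while a Discriminator that separates the two kinds of samples well has a cost close to zero.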

By training the Discriminator, we obtain an estimate of the ratio of the two densities:

$$\frac{p_{data}(x)}{p_{model}(x)}$$

This is the key approximation technique that sets GANs apart from variational autoencoders and Boltzmann machines.
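To make the ratio estimate concrete: for a fixed Generator, the Discriminator that minimizes the cost above is D*(x) = p_data(x) / (p_data(x) + p_model(x)), so the ratio can be read off as D*(x) / (1 - D*(x)). The sketch below checks this numerically. The two one-dimensional Gaussian densities are assumptions chosen only for illustration, and `gaussian_pdf` is a hypothetical helper.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian (illustrative helper)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

x = np.linspace(-3.0, 3.0, 7)

# Assumed toy densities standing in for p_data and p_model.
p_data = gaussian_pdf(x, mean=0.0, std=1.0)
p_model = gaussian_pdf(x, mean=0.5, std=1.0)

# Optimal discriminator for a fixed generator.
d_star = p_data / (p_data + p_model)

# The density ratio recovered from the discriminator output alone.
ratio_from_d = d_star / (1.0 - d_star)
print(np.allclose(ratio_from_d, p_data / p_model))

# If the model matched the data exactly, D* would be 1/2 everywhere.
d_matched = p_data / (p_data + p_data)
print(d_matched)
```

The last lines also illustrate the equilibrium described earlier: when the two distributions coincide, the best the Discriminator can do is output 1/2 for every x.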


For the Generator Network, a simple cost function is the negative of the Discriminator cost function:

$$C_{G} = - C_{D}$$

The advantage of this approach is that we only need to compute a single value function for the adversarial game. We can then solve a minimax problem over that one cost function, which optimizes the whole framework.

$$\theta_{G}^{*} = \arg\min_{\theta_{G}} \max_{\theta_{D}} C_{G}$$

However, this cost function also has a big disadvantage. Imagine a scenario where the Discriminator successfully rejects the Generator's samples with high confidence; in this case the Generator's gradient vanishes. This can be seen from the Discriminator cost function above: when D(G(z)) is zero or near zero, the term log(1 - D(G(z))) is almost zero and saturates, so very little gradient is propagated back through the Discriminator to the Generator.

To overcome this problem, we use a different cost function for the Generator Network:

$$C_{G} = -\frac{1}{2}\mathbb{E}_{z}\log D(G(z))$$

Intuitively, this means that the Generator maximizes the log-probability of the Discriminator being mistaken.
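The difference between the two Generator costs can be sketched numerically. The example assumes that the Discriminator ends in a sigmoid, so that D(G(z)) = sigmoid(a) for some pre-activation value a (called the logit here), and compares the derivative of each cost with respect to a. The function names are hypothetical and the logit values are chosen only for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def grad_saturating(a):
    # Generator term of C_G = -C_D is 1/2 * log(1 - sigmoid(a)).
    # Its derivative with respect to the logit a is -1/2 * sigmoid(a).
    return -0.5 * sigmoid(a)

def grad_non_saturating(a):
    # Alternative cost is -1/2 * log(sigmoid(a)).
    # Its derivative with respect to the logit a is -1/2 * (1 - sigmoid(a)).
    return -0.5 * (1.0 - sigmoid(a))

# Negative logits mean the discriminator confidently rejects the sample.
for a in (-10.0, -5.0, 0.0, 5.0):
    print("logit %6.1f  D(G(z)) = %.5f  saturating: %.6f  non-saturating: %.6f"
          % (a, sigmoid(a), grad_saturating(a), grad_non_saturating(a)))
```

When the Discriminator rejects a generated sample with high confidence (a very negative logit), the first gradient is close to zero while the second stays close to -1/2, which is why the second cost keeps providing a learning signal early in training.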

### DCGAN

Deep Convolutional GAN (DCGAN) is the most common architecture used in modern applications. DCGANs were initially used to generate high-resolution images. They perform this task in a single shot, whereas LAPGANs, which were used prior to DCGANs, required a multi-stage generation process.

Best practices for training DCGANs can be found at https://github.com/soumith/ganhacks



#### Get packages

In [141]:
from __future__ import print_function, division

from keras.datasets import mnist
from keras.layers import Input, Dense, Reshape, Flatten, Dropout
from keras.layers import BatchNormalization, Activation, ZeroPadding2D
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import UpSampling2D, Conv2D, Conv2DTranspose
from keras.models import Sequential, Model
from keras.optimizers import Adam
import keras.backend as k
import keras.callbacks as callbacks
import tensorflow as tf

import matplotlib.pyplot as plt

import sys

import numpy as np

#### Define initial variables

In [116]:
# Input shape
img_rows = 28
img_cols = 28
channels = 1
img_shape = (img_rows, img_cols, channels)
latent_dim = 64

optimizer = Adam(0.0002, 0.5)

The paper by Radford et al. gives some hints about what makes a good DCGAN architecture:

* Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator). In other words, when the generator needs to increase the spatial dimension of the representation, it uses transposed convolution with a stride greater than 1

* Use Batch Normalization in both the generator (except at the output layer) and the discriminator (except at the input layer)
* Remove fully connected hidden layers for deeper architectures
* Use ReLU activation in generator for all layers except for the output
* Use LeakyReLU activation in the discriminator for all layers
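The first guideline can be illustrated with simple shape arithmetic. With `padding="same"`, a strided convolution produces a spatial size of ceil(input / stride), and a transposed convolution produces input * stride. The sketch below applies these rules to the stride values used in the generator and discriminator layer definitions in this document; the helper names are hypothetical and no Keras model is built.

```python
import math

def conv_same_out(size, stride):
    """Spatial output size of a strided convolution with padding="same"."""
    return math.ceil(size / stride)

def conv_transpose_same_out(size, stride):
    """Spatial output size of a transposed convolution with padding="same"."""
    return size * stride

# Generator: starts from a 7x7 map and upsamples with transposed convolutions.
size = 7
gen_sizes = [size]
for stride in (2, 2, 1, 1):
    size = conv_transpose_same_out(size, stride)
    gen_sizes.append(size)
print("generator sizes:", gen_sizes)

# Discriminator: starts from a 28x28 image and downsamples with strided convolutions.
size = 28
dis_sizes = [size]
for stride in (2, 1, 1):
    size = conv_same_out(size, stride)
    dis_sizes.append(size)
print("discriminator sizes:", dis_sizes)
```

No pooling layer is needed in either network: the stride alone changes the spatial resolution, so the downsampling and upsampling are learned together with the filters.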

To build the generator, we use Conv2DTranspose (transposed convolution) layers with batch normalization applied.

<img src="./files/dc-gan-flow.png" 
alt="DC-GAN flow diagram" border="10" />

In [134]:
def buildGenerator():

    model = Sequential()
    d1 = 7
    d2 = 1

    model.add(Dense(d1 * d1 * d2, activation="relu", input_dim=latent_dim))
    model.add(Dropout(0.25))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Reshape((d1, d1, d2)))
    model.add(Conv2DTranspose(64, kernel_size=5, strides=2, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Activation("relu"))
    model.add(Dropout(0.25))
    model.add(Conv2DTranspose(64, kernel_size=5, strides=2, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Activation("relu"))
    model.add(Dropout(0.25))
    model.add(Conv2DTranspose(64, kernel_size=5,strides=1, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Activation("relu"))
    model.add(Dropout(0.25))
    model.add(Conv2DTranspose(channels, kernel_size=5, strides=1, padding="same"))
    model.add(Activation("tanh"))

    model.summary()

    noise = Input(shape=(latent_dim,))
    img = model(noise)

    return Model(noise, img)

The discriminator is built from Conv2D layers, with the LeakyReLU activation function and batch normalization applied.

In [102]:
def buildDiscriminator():

    model = Sequential()

    model.add(Conv2D(64, kernel_size=5, strides=2, input_shape=img_shape, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Conv2D(64, kernel_size=5, strides=1, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Conv2D(64, kernel_size=5, strides=1, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dense(1, activation='sigmoid'))

    model.summary()

    img = Input(shape=img_shape)
    validity = model(img)

    return Model(img, validity)

In [135]:
# Build and compile the discriminator
discriminator = buildDiscriminator()
discriminator.compile(loss='binary_crossentropy',
    optimizer=optimizer,
    metrics=['accuracy'])

# Build the generator
generator = buildGenerator()

# The generator takes noise as input and generates imgs
z = Input(shape=(latent_dim,))
img = generator(z)

# For the combined model we will only train the generator
discriminator.trainable = False

# The discriminator takes generated images as input and determines validity
valid = discriminator(img)

# The combined model  (stacked generator and discriminator)
# Trains the generator to fool the discriminator
combined = Model(z, valid)
combined.compile(loss='binary_crossentropy', optimizer=optimizer)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_97 (Conv2D)           (None, 14, 14, 64)        1664      
_________________________________________________________________
batch_normalization_162 (Bat (None, 14, 14, 64)        256       
_________________________________________________________________
leaky_re_lu_110 (LeakyReLU)  (None, 14, 14, 64)        0         
_________________________________________________________________
dropout_174 (Dropout)        (None, 14, 14, 64)        0         
_________________________________________________________________
conv2d_98 (Conv2D)           (None, 14, 14, 64)        102464    
_________________________________________________________________
batch_normalization_163 (Bat (None, 14, 14, 64)        256       
_________________________________________________________________
leaky_re_lu_111 (LeakyReLU)  (None, 14, 14, 64)        0         
__________

In [104]:
def save_imgs(epoch):
        r, c = 5, 5
        noise = np.random.normal(0, 1, (r * c, latent_dim))
        gen_imgs = generator.predict(noise)

        # Rescale images 0 - 1
        gen_imgs = 0.5 * gen_imgs + 0.5

        fig, axs = plt.subplots(r, c)
        cnt = 0
        for i in range(r):
            for j in range(c):
                axs[i,j].imshow(gen_imgs[cnt, :,:,0], cmap='gray')
                axs[i,j].axis('off')
                cnt += 1
        fig.savefig("images/mnist_%d.png" % epoch)
        plt.close()

In [143]:
def train(epochs, batch_size=128, save_interval=50):
    res = []
    # Load the dataset
    (X_train, _), (_, _) = mnist.load_data()

    # Rescale -1 to 1
    X_train = X_train / 127.5 - 1.
    X_train = np.expand_dims(X_train, axis=3)

    # Adversarial ground truths
    valid = np.ones((batch_size, 1))
    fake = np.zeros((batch_size, 1))

    for epoch in range(epochs):

        # ---------------------
        #  Train Discriminator
        # ---------------------

        # Select a random batch of real images
        idx = np.random.randint(0, X_train.shape[0], batch_size)
        imgs = X_train[idx]

        # Sample noise and generate a batch of new images
        noise = np.random.normal(0, 1, (batch_size, latent_dim))
        gen_imgs = generator.predict(noise)   
        # Train the discriminator (real classified as ones and generated as zeros)
        d_loss_real = discriminator.train_on_batch(imgs, valid)
        d_loss_fake = discriminator.train_on_batch(gen_imgs, fake)
        d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

        # ---------------------
        #  Train Generator
        # ---------------------
    
        # Train the generator (wants discriminator to mistake images as real)
        g_loss = combined.train_on_batch(noise, valid)

        stats = {}
        stats['d_loss'] = d_loss[0]
        stats['d_acc'] = d_loss[1]
        stats['g_loss'] = g_loss
        
        res.append(stats)

        # If at save interval => save generated image samples
        if epoch % save_interval == 0:
            # Plot the progress
            print ("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" % (epoch, d_loss[0], 100*d_loss[1], g_loss))
            save_imgs(epoch)
    return res

In [144]:
res = train(epochs=60001, batch_size=64, save_interval=500)

  'Discrepancy between trainable weights and collected trainable'


0 [D loss: 0.134999, acc.: 97.66%] [G loss: 2.306590]
500 [D loss: 0.165022, acc.: 93.75%] [G loss: 5.646105]
1000 [D loss: 0.141802, acc.: 92.97%] [G loss: 7.505753]
1500 [D loss: 0.369489, acc.: 85.94%] [G loss: 7.366264]
2000 [D loss: 0.098820, acc.: 96.09%] [G loss: 7.605535]
2500 [D loss: 0.112871, acc.: 96.09%] [G loss: 8.644616]
3000 [D loss: 0.052448, acc.: 97.66%] [G loss: 8.247178]
3500 [D loss: 0.066384, acc.: 96.88%] [G loss: 7.525047]
4000 [D loss: 0.039341, acc.: 99.22%] [G loss: 8.743761]
4500 [D loss: 0.044297, acc.: 99.22%] [G loss: 8.098145]
5000 [D loss: 0.149638, acc.: 95.31%] [G loss: 10.335606]
5500 [D loss: 0.038596, acc.: 99.22%] [G loss: 8.984094]
6000 [D loss: 0.104113, acc.: 96.88%] [G loss: 9.162931]
6500 [D loss: 0.224159, acc.: 92.97%] [G loss: 8.550970]


KeyboardInterrupt: 

In [74]:
res
np.save('28-14-7-stride-gen-28-14-7-stride-dis', res)
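Because `res` is a list of dictionaries, `np.save` stores it as a pickled object array, and reading it back requires `allow_pickle=True`. The sketch below shows one way to reload such a history and turn it into plain arrays. The example entries, the temporary directory, and the file name `history_example` are assumptions made so that the snippet runs on its own; they are not the results of the training run above.

```python
import os
import tempfile

import numpy as np

# Hypothetical history in the same format as the list returned by train().
res_example = [
    {"d_loss": 0.13, "d_acc": 0.97, "g_loss": 2.3},
    {"d_loss": 0.16, "d_acc": 0.93, "g_loss": 5.6},
]

# np.save appends ".npy" when the file name has no extension.
path = os.path.join(tempfile.mkdtemp(), "history_example")
np.save(path, res_example)

# Object arrays are pickled, so loading needs allow_pickle=True.
loaded = np.load(path + ".npy", allow_pickle=True)

# Collect each statistic into its own array, e.g. for plotting.
g_loss = np.array([entry["g_loss"] for entry in loaded])
d_acc = np.array([entry["d_acc"] for entry in loaded])
print(g_loss, d_acc)
```

The resulting arrays can be passed directly to `plt.plot` to inspect how the generator loss and discriminator accuracy evolve over the iterations.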