# Understanding GAN (with Keras)

This tutorial aims to get you started with Generative Adversarial Networks (GAN). We will first see what each term in GAN means. After that, we will implement one variant of GAN called [Deep Convolutional GAN](https://arxiv.org/abs/1511.06434) (DCGAN), explaining each implementation step along the way. Throughout this tutorial, we will use the [CelebA dataset](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) to run our GAN models.


## Outline of the Tutorial:


* Understand GAN through basic concepts of Generative Models and Adversarial Networks
* DCGAN Architecture
* Preparing CelebA dataset
* Build Generator model for DCGAN
* Build Discriminator model for DCGAN
* Understand loss functions for both generator and discriminator
* Build Complete Training Model for DCGAN
* Understand training limitations of naive DCGAN (e.g. mode collapse)
* Learn improved techniques for training GAN with implementation
* Conclusion & Future Scope


## Dependencies:


* tqdm==4.17.0
* opencv_python==3.3.0.10
* numpy==1.13.3
* matplotlib==2.0.2
* Keras==2.0.8
* Tensorflow==1.3.0
* h5py==2.7.0
* parmap==1.5.1



## What is GAN?


To understand [GAN](https://arxiv.org/abs/1406.2661) in detail, let's first look at what each part of the term Generative Adversarial Networks (GAN) means.
#### Generative models
A [generative model](https://arxiv.org/pdf/1406.2661.pdf) is any model that takes a training set, consisting of samples drawn from a distribution **$p_{data}$**, and learns to represent an estimate of that distribution. The result is a probability distribution **$p_{model}$**. In the case of GAN, the model generates samples from the estimated probability distribution **$p_{model}$**.
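
As a minimal sketch of this idea (not a GAN), the snippet below treats a one-dimensional Gaussian as $p_{data}$, estimates its parameters from a training set to obtain $p_{model}$, and then draws new samples from the estimate. The Gaussian and its parameters are assumptions chosen purely for illustration.

```python
import random
import statistics

random.seed(0)

# Training set: samples drawn from p_data, here an assumed Gaussian N(2.0, 0.5^2).
training_set = [random.gauss(2.0, 0.5) for _ in range(10000)]

# "Learning" p_model: estimate the Gaussian parameters from the samples.
mu_hat = statistics.mean(training_set)
sigma_hat = statistics.stdev(training_set)

# Generating: draw new samples from the estimated distribution p_model.
generated = [random.gauss(mu_hat, sigma_hat) for _ in range(5)]
print("estimated mean %.3f, std %.3f" % (mu_hat, sigma_hat))
print("generated samples:", generated)
```

A GAN plays the same role for images, except that the distribution is far too complex to describe with two parameters, so a neural network learns to produce the samples instead.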

#### Adversarial Networks

Adversarial networks are implemented as a system of two neural networks competing against each other in a **zero-sum game** framework. Adversarial training allows the two opponents to learn from each other's mistakes. In the case of GAN, this zero-sum game is played between the generator and the discriminator.

Let us take a **real-life analogy** to explain the concept:

If you want to get better at a game, say chess, what would you do? You would play against an opponent who is better than you. Then you would analyze what you did wrong (generator's training) and what the opponent did right (discriminator's training), and think about what you could do to beat the opponent in the next game (generator's learning).

You would repeat this step until you defeat the opponent. This concept can be used to build better learning models. Simply put, to build a powerful hero (generator), we need a more powerful opponent (discriminator)!

![GAN](https://deeplearning4j.org/img/gan_schema.png)


Now, let's understand the roles of the two players, the generator and the discriminator. Let's say we are trying to mimic celebrity faces based on the CelebA dataset. A GAN takes the following steps:

* The generator takes in random numbers as input and returns an image.
* The generated image is fed into the discriminator along with a batch of images taken from the real dataset that the GAN is trying to mimic.
* The discriminator takes in both real and fake images and returns a probability for each, a number between 0 and 1, with 1 representing a prediction of authenticity and 0 representing fake.

So we have a double feedback loop in GAN:

* The discriminator is in a feedback loop with the real dataset.
* The generator is in a feedback loop with the discriminator, using its output to improve the generator model.


Now that we have a basic understanding of how GAN works, let's move on to one particular variant of GAN called Deep Convolutional GAN (DCGAN).


## DCGAN Architecture

![DCGAN](images/DCGANArch.png)

[DCGAN](https://arxiv.org/abs/1511.06434) stands for "Deep Convolutional GAN". The key insights from the DCGAN architecture are:
* The network structure is based on the idea of the all-convolutional network. It has no pooling layers. When the generator needs to increase the spatial dimensionality of the representation, it uses transposed convolution with a stride greater than 1.
* The architecture uses batch normalization in all layers of the discriminator and generator except the first layer of the discriminator and the last layer of the generator.
* The architecture uses the Adam optimizer rather than SGD with momentum.
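
To make the upsampling point concrete, here is a naive one-dimensional transposed convolution written in NumPy. It is only an illustrative sketch: the helper name `transposed_conv1d` and the all-ones kernel are assumptions for this example, and real layers learn their kernels and work on 2-D feature maps.

```python
import numpy as np

def transposed_conv1d(x, kernel, stride):
    """Naive 1-D transposed convolution (no padding), for illustration only."""
    k = len(kernel)
    out = np.zeros((len(x) - 1) * stride + k)
    for i, value in enumerate(x):
        # Each input value "stamps" a scaled copy of the kernel into the output.
        out[i * stride:i * stride + k] += value * kernel
    return out

x = np.array([1.0, 2.0, 3.0, 4.0])
kernel = np.array([1.0, 1.0])  # hypothetical kernel; its size equals the stride, so stamps do not overlap
y = transposed_conv1d(x, kernel, stride=2)
print(y)  # 8 values: each input value appears twice
```

With a stride of 2, the output is twice as long as the input, which is the same doubling the generator relies on in each of its upsampling blocks.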

Now, let's start implementing the DCGAN architecture before delving deeper into GAN.





In [0]:
import numpy as np
from scipy.io import loadmat
from scipy.misc import imresize
from glob import glob
!pip install tqdm
from tqdm import tqdm
import tensorflow as tf
import cv2
import keras
import h5py
import keras.backend as K
!pip install parmap
import parmap
from keras.initializers import RandomNormal
from keras.layers import Input, Concatenate
from keras.models import Model
from keras.layers.core import Flatten, Dense, Dropout, Activation, Lambda, Reshape
from keras.layers.convolutional import Conv2D, Deconv2D, ZeroPadding2D, UpSampling2D, Conv2DTranspose
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.normalization import BatchNormalization
from keras.optimizers import Adam,SGD
import matplotlib.pyplot as plt
import time
import os
%matplotlib inline

## Loading & Preprocessing CelebA dataset

**CelebFaces Attributes Dataset (CelebA)** is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. In our case, we will not need the attribute annotations. We will load the celebrity images and feed them into the discriminator model after pre-processing.

### Steps to download CelebA Dataset

- Go to http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
- In the Downloads section, select Align&Cropped images.
- In the dropbox page that follows, download the Anno, Eval and Img folders.
- Extract the zip files.

You should have the following folder structure:

    ├── Anno
        ├── list_attr_celeba.txt  
        ├── list_bbox_celeba.txt  
        ├── list_landmarks_align_celeba.txt  
        ├── list_landmarks_celeba.txt
    ├── Eval
        ├──list_eval_partition.txt
    ├── img_align_celeba
        ├──lots of images



Once you have the dataset in place on your local system, load all the filenames. When you execute the code snippet below, it will ask you to upload images. Based on the number of images you upload, adjust the **chunk_size** variable, which divides the data into chunks of roughly that size so that each chunk can be pre-processed in parallel.

In [0]:
from google.colab import files

uploaded = files.upload()

# for fn in uploaded.keys():
#   print('User uploaded file "{name}" with length {length} bytes'.format(
#       name=fn, length=len(uploaded[fn]))
filenames = np.array(glob('*.jpg'))
print(filenames)
num_files = len(filenames)
chunk_size = 5
num_chunks = max(1, num_files // chunk_size)  # whole number of chunks, at least one
print(num_chunks)
arr_chunks = np.array_split(np.arange(num_files), num_chunks)
print(arr_chunks)
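
To see what the chunking produces, the following sketch runs the same `np.array_split` call on a made-up file count (12 files is an assumption for illustration). Note that `np.array_split` balances the chunks, so their size is close to `chunk_size` but not necessarily equal to it.

```python
import numpy as np

num_files = 12   # assumed number of uploaded images
chunk_size = 5
num_chunks = max(1, num_files // chunk_size)  # whole number of chunks

arr_chunks = np.array_split(np.arange(num_files), num_chunks)
for chunk in arr_chunks:
    print(chunk)
```

Each array holds indices into `filenames`, so every chunk can later be turned into a list of image paths.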


Now, let's define a function that takes an image path and an output size as input parameters. This function performs two main pre-processing steps:
* Slice the image to center it around the face
* Resize the image to the size provided as an input parameter (64 in this case)

In [0]:
def preprocess_image(img_path, size=64):
    # OpenCV loads images in BGR channel order
    img_color = cv2.imread(img_path)
    # Reverse the channel axis: BGR -> RGB
    img_color = img_color[:, :, ::-1]
    # Slice image to center around face
    img_color = img_color[30:-30, 20:-20, :]
    img_color = cv2.resize(img_color, (size, size), interpolation=cv2.INTER_AREA)
    # Add a batch axis and move channels first: (1, size, size, 3) -> (1, 3, size, size)
    img_color = img_color.reshape((1, size, size, 3)).transpose(0, 3, 1, 2)
    print(img_color.shape)
    return img_color
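
The shape bookkeeping in this function can be checked without OpenCV. The sketch below uses zero-filled arrays as stand-ins, assuming the aligned CelebA images are 218 pixels high and 178 pixels wide.

```python
import numpy as np

# Dummy stand-in for an aligned CelebA image: height 218, width 178, 3 channels.
img = np.zeros((218, 178, 3), dtype=np.uint8)

# Same crop as in preprocess_image.
cropped = img[30:-30, 20:-20, :]
print(cropped.shape)  # (158, 138, 3)

# Stand-in for the resized 64x64 image (the real code uses cv2.resize).
resized = np.zeros((64, 64, 3), dtype=np.uint8)
batched = resized.reshape((1, 64, 64, 3)).transpose(0, 3, 1, 2)
print(batched.shape)  # (1, 3, 64, 64)
```

The crop is not square, so resizing it to 64 by 64 changes the aspect ratio slightly.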

Let's create the training dataset of face images using the preprocess_image function defined above.

We will use an HDF5 file to create a single dataset using the [h5py](https://www.h5py.org/) library. An HDF5 file is a container for two kinds of objects: datasets, which are array-like collections of data, and groups, which are folder-like containers that hold datasets and other groups.

We will use the parmap library to preprocess the images of each chunk in parallel.

Finally, we will append the preprocessed images to the dataset.

In [0]:
with h5py.File("CelebA_sample100.h5", "w") as hfw:
  X_train = hfw.create_dataset("data",
                                  (0, 3, 64, 64),
                                  maxshape=(None, 3, 64, 64),
                                  dtype=np.uint8)

  for chunk_idx in tqdm(arr_chunks):
      print(chunk_idx)
      list_img_path = filenames[chunk_idx].tolist()
      output = parmap.map(preprocess_image, list_img_path, 64, pm_parallel=True)
      arr_img_color = np.concatenate(output, axis=0)
      X_train.resize(X_train.shape[0] + arr_img_color.shape[0], axis=0)
      X_train[-arr_img_color.shape[0]:] = arr_img_color.astype(np.uint8)

As we will see later on, the generator uses a $tanh$ activation in its output layer, so we need to normalize the image data into the range between -1 and 1. We will also add an inverse normalization function.


In [0]:
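def normalization(X):
    # Map pixel values from [0, 255] to [-1, 1]
    return X / 127.5 - 1
  
def inverse_normalization(X):
    # Map values from [-1, 1] to [0, 1] (e.g. for plotting), not back to [0, 255]
    return (X + 1.) / 2.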
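
A quick check of the two functions on the extreme and middle pixel values shows the ranges involved. The functions are copied from the cell above so that the snippet runs on its own.

```python
import numpy as np

def normalization(X):
    return X / 127.5 - 1

def inverse_normalization(X):
    return (X + 1.) / 2.

pixels = np.array([0.0, 127.5, 255.0])
normalized = normalization(pixels)
restored = inverse_normalization(normalized)
print(normalized)  # values in [-1, 1]
print(restored)    # values in [0, 1]
```

Note that `inverse_normalization` maps back to the range [0, 1] rather than [0, 255], which is a convenient range for displaying float images.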

In [0]:
with h5py.File("CelebA_sample100.h5", "r") as hf:
    X_real_train = hf["data"][:].astype(np.float32).transpose(0, 2, 3, 1)
    X_real_train = normalization(X_real_train)

## Build Generator model for DCGAN

![Generator](https://cdn-images-1.medium.com/max/1600/1*Tv7wjpBTB0Pg6rWfLm4YSA.png) 

The generator takes a latent sample (100 randomly generated numbers) and produces a color face image that should look like one from the CelebA dataset. 

The generator model shown in the image above is self-explanatory if you understand convolutional neural networks. Since this tutorial aims to explore GAN, we will not cover the basics of convolution. If you want more details about convolution-based NNs, refer to [this](http://cs231n.github.io/convolutional-networks/) tutorial.

#### A few highlights from the Generator Model:

 * We will use a $tanh$ activation at the last layer of the model, as suggested [here](https://github.com/soumith/ganhacks).
 * We will use transposed convolution instead of regular convolution to transform in the opposite direction of a regular convolution while maintaining a connectivity pattern compatible with convolution. For more details, refer to [this](https://towardsdatascience.com/up-sampling-with-transposed-convolution-9ae4f2df52d0) tutorial.
 
 

In [0]:
def generator_deconv(noise_dim, img_dim, batch_size):

    s = img_dim[1]
    f = 1024
    start_dim = int(s / 16)
    nb_upconv = 4

    reshape_shape = (start_dim, start_dim, f)
    output_channels = img_dim[-1]

    gen_input = Input(shape=noise_dim, name="generator_input")

    x = Dense(f * start_dim * start_dim, input_dim=noise_dim)(gen_input)
    x = Reshape(reshape_shape)(x)
    x = BatchNormalization(axis=-1)(x)
    x = Activation("relu")(x)

    # Transposed conv blocks
    for i in range(nb_upconv - 1):
        nb_filters = int(f / (2 ** (i + 1)))
        s = start_dim * (2 ** (i + 1))
        o_shape = (batch_size, s, s, nb_filters)
        x = Conv2DTranspose(nb_filters, (3, 3), strides=(2, 2), padding="same")(x)
        x = BatchNormalization(axis=-1)(x)
        x = Activation("relu")(x)

    # Last block
    s = start_dim * (2 ** (nb_upconv))
    o_shape = (batch_size, s, s, output_channels)
    x = Conv2DTranspose(output_channels, (3, 3), strides=(2, 2), padding="same")(x)
    x = Activation("tanh")(x)

    generator_model = Model(inputs=[gen_input], outputs=[x])

    return generator_model
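
The sizes that flow through `generator_deconv` can be traced with plain arithmetic. The sketch below mirrors the constants used in the function (1024 starting filters, 4 upsampling blocks, a start size of one sixteenth of the image) and assumes that each stride-2 transposed convolution with "same" padding doubles the height and width. The helper name `generator_shapes` is made up for this example.

```python
def generator_shapes(img_size=64, base_filters=1024, nb_upconv=4, output_channels=3):
    """Trace (height, width, channels) through the generator, assuming
    stride-2 transposed convolutions with "same" padding (each doubles H and W)."""
    start_dim = img_size // 16
    shapes = [(start_dim, start_dim, base_filters)]
    # Intermediate blocks: double the spatial size, halve the number of filters.
    for i in range(nb_upconv - 1):
        side = start_dim * 2 ** (i + 1)
        shapes.append((side, side, base_filters // 2 ** (i + 1)))
    # Last block: final doubling, down to the number of image channels.
    side = start_dim * 2 ** nb_upconv
    shapes.append((side, side, output_channels))
    return shapes

for shape in generator_shapes():
    print(shape)
```

For a 64 by 64 RGB image this gives a 4 by 4 by 1024 tensor after the reshape, growing to 64 by 64 by 3 at the output.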


## Build Discriminator model for DCGAN

The discriminator model for DCGAN consists of 4 blocks of layers, each containing convolution, BatchNorm and LeakyReLU layers. The output of the last block can be fed directly to a Dense layer, which maps it to the final probability distribution. The discriminator used here is just a traditional convolutional NN.

You might wonder about the role of the function named **minb_disc**, which is part of the discriminator model. This function tries to mitigate a problem called **mode collapse**, which occurs while training the GAN itself.

### Mode Collapse 
During the training process of GAN, the generator may collapse to a parameter setting where it mostly or always produces the same set of generated images. When collapse to a single mode is about to occur, the gradients of the discriminator may point in similar directions for many similar points: because the discriminator processes each generated sample independently, it has no way to tell the generator that its outputs are too similar to one another.

In such a scenario, the generator will exhibit very poor diversity among generated samples, which limits the usefulness of the learnt GAN.

One way to reduce the possibility of mode collapse is to look at and process multiple images at a time instead of just one image. To achieve this, we will use a technique called mini-batch discrimination, explained below:

### Mini-batch Discrimination
[Mini-batch discrimination](https://arxiv.org/abs/1606.03498) allows the discriminator to look at multiple examples in combination rather than in isolation, which can help avoid collapse of the generator. With mini-batch discrimination, the discriminator can take into account the relationship between data points in one batch, instead of processing each point independently.

In a mini-batch layer, we approximate the closeness between every pair of samples, $c(x_i,x_j)$, and get an overall summary of one data point by summing up how close it is to the other samples in the same batch, $o(x_i)=\sum_j c(x_i,x_j)$. Then $o(x_i)$ is concatenated with the input of the mini-batch layer (the output of the previous layer).
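
The summary $o(x_i)$ can be sketched in NumPy. This is an illustrative re-implementation, not the Keras layer itself. It assumes that each sample has already been projected to a matrix of shape `(num_kernels, dim_per_kernel)` and that closeness is the negative exponential of the L1 distance, as in the original paper. The name `minibatch_features` is hypothetical.

```python
import numpy as np

def minibatch_features(m):
    """m has shape (batch, num_kernels, dim_per_kernel).
    Returns o of shape (batch, num_kernels)."""
    # Pairwise differences between samples: (batch, batch, num_kernels, dim_per_kernel)
    diffs = m[:, None, :, :] - m[None, :, :, :]
    # L1 distance per kernel: (batch, batch, num_kernels)
    l1 = np.abs(diffs).sum(axis=3)
    # Closeness c(x_i, x_j) = exp(-L1 distance), summed over j (includes j == i)
    return np.exp(-l1).sum(axis=1)

rng = np.random.RandomState(0)
diverse = rng.normal(size=(4, 3, 5))
collapsed = np.repeat(diverse[:1], 4, axis=0)  # four identical samples

print(minibatch_features(diverse))
print(minibatch_features(collapsed))  # every entry equals the batch size
```

When all samples in a batch are identical, every feature reaches its maximum value (the batch size), which gives the discriminator a clear signal that the generator has collapsed.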




In [0]:
def DCGAN_discriminator(noise_dim, img_dim):

    disc_input = Input(shape=img_dim, name="discriminator_input")

    list_f = [64, 128, 256]

    # First conv
    x = Conv2D(32, (3, 3), strides=(2, 2), name="disc_Conv2D_1", padding="same")(disc_input)
    x = BatchNormalization(axis=-1)(x)
    x = LeakyReLU(0.2)(x)

    # Next convs
    for i, f in enumerate(list_f):
        name = "disc_Conv2D_%s" % (i + 2)
        x = Conv2D(f, (3, 3), strides=(2, 2), name=name, padding="same")(x)
        x = BatchNormalization(axis=-1)(x)
        x = LeakyReLU(0.2)(x)

    x = Flatten()(x)

    def minb_disc(x):
        diffs = K.expand_dims(x, 3) - K.expand_dims(K.permute_dimensions(x, [1, 2, 0]), 0)
        abs_diffs = K.sum(K.abs(diffs), 2)
        x = K.sum(K.exp(-abs_diffs), 2)
        return x

    def lambda_output(input_shape):
        return input_shape[:2]

    num_kernels = 100
    dim_per_kernel = 5

    M = Dense(num_kernels * dim_per_kernel, use_bias=False, activation=None)
    MBD = Lambda(minb_disc, output_shape=lambda_output)

    x_mbd = M(x)
    x_mbd = Reshape((num_kernels, dim_per_kernel))(x_mbd)
    x_mbd = MBD(x_mbd)
    x = Concatenate(axis=-1)([x, x_mbd])

    x = Dense(2, activation='softmax', name="disc_dense_2")(x)

    discriminator_model = Model(inputs=[disc_input], outputs=[x])

    return discriminator_model


def DCGAN(generator, discriminator_model, noise_dim, img_dim):

    noise_input = Input(shape=noise_dim, name="noise_input")

    generated_image = generator(noise_input)
    DCGAN_output = discriminator_model(generated_image)

    DCGAN = Model(inputs=[noise_input],
                  outputs=[DCGAN_output],
                  name="DCGAN")

    return DCGAN

We have now defined the generator and discriminator models. Before we start with the training module, we will define some utility functions that will be useful in the training process.

Most of the utility functions, such as **gen_batch** and **sample_noise**, are self-explanatory. One utility function that requires our attention is **get_disc_batch**.

The **get_disc_batch** function generates batches for training the discriminator. Each batch contains only generated or only real images, alternating between the two, as suggested [here](https://github.com/soumith/ganhacks). As you can see in the function, there are two cases based on the value of batch_counter:
1) For a batch of fake images, we sample random noise vectors and obtain fake images from the generator model. We set the labels to fake (0).
2) For a batch of real images, we use actual images and set the labels to real (1).

You might notice a few extras such as label_smoothing and label_flipping. Let's understand them one by one.

#### One-Sided Label Smoothing
Label smoothing replaces the 0 and 1 targets for the classifier (the discriminator in our case) with smoothed values such as $0.1$ and $0.9$ respectively. Label smoothing has been shown to reduce the vulnerability of neural networks to adversarial examples, as suggested [here](https://arxiv.org/abs/1606.03498). Note that we only smooth the positive labels for the discriminator, which is why the technique is called one-sided.

#### Label Flipping
Label flipping occasionally flips the labels for the discriminator. It helps stabilize GAN training, as suggested [here](https://github.com/soumith/ganhacks). We decide whether to flip the labels of a batch by sampling from a binomial distribution, as shown in the function.
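
The two tricks can be sketched in isolation with NumPy. The helper `make_labels` and its parameter names are hypothetical and exist only for this example. It follows the two-column label layout used in this tutorial (column 0 for fake, column 1 for real) and assumes `flip_prob` is a probability between 0 and 1.

```python
import numpy as np

def make_labels(batch_size, real, label_smoothing=False, flip_prob=0.0, rng=np.random):
    """Two-column labels: column 0 = fake, column 1 = real."""
    # Float dtype so that smoothed values such as 0.93 are not truncated.
    y = np.zeros((batch_size, 2), dtype=np.float32)
    if real:
        # One-sided smoothing: only the positive (real) targets are softened.
        y[:, 1] = rng.uniform(0.9, 1.0, size=batch_size) if label_smoothing else 1.0
    else:
        y[:, 0] = 1.0
    # With probability flip_prob, swap the columns for the whole batch.
    if flip_prob > 0 and rng.binomial(1, flip_prob) > 0:
        y = y[:, ::-1].copy()
    return y

rng = np.random.RandomState(0)
print(make_labels(3, real=True, label_smoothing=True, rng=rng))
print(make_labels(3, real=False, rng=rng))
```

Keeping the labels in a floating-point array matters here: smoothed targets between 0.9 and 1 would be lost if they were stored as integers.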



In [0]:
def gen_batch(X, batch_size):

    while True:
        idx = np.random.choice(X.shape[0], batch_size, replace=False)
        yield X[idx]


def sample_noise(noise_scale, batch_size, noise_dim):

    return np.random.normal(scale=noise_scale, size=(batch_size, noise_dim[0]))


def get_disc_batch(X_real_batch, generator_model, batch_counter, batch_size, noise_dim,
                   noise_scale=0.5, label_smoothing=False, label_flipping=0):

    # Create X_disc: alternatively only generated or real images
    if batch_counter % 2 == 0:
        # Pass noise to the generator
        noise_input = sample_noise(noise_scale, batch_size, noise_dim)
        # Produce an output
        X_disc = generator_model.predict(noise_input)
        y_disc = np.zeros((X_disc.shape[0], 2), dtype=np.uint8)
        y_disc[:, 0] = 1

        if label_flipping > 0:
            p = np.random.binomial(1, label_flipping)
            if p > 0:
                y_disc[:, [0, 1]] = y_disc[:, [1, 0]]

    else:
        X_disc = X_real_batch
        # Float dtype: with uint8, smoothed labels in [0.9, 1) would be truncated to 0
        y_disc = np.zeros((X_disc.shape[0], 2), dtype=np.float32)
        if label_smoothing:
            y_disc[:, 1] = np.random.uniform(low=0.9, high=1, size=y_disc.shape[0])
        else:
            y_disc[:, 1] = 1

        if label_flipping > 0:
            p = np.random.binomial(1, label_flipping)
            if p > 0:
                y_disc[:, [0, 1]] = y_disc[:, [1, 0]]

    return X_disc, y_disc


def get_gen_batch(batch_size, noise_dim, noise_scale=0.5):

    X_gen = sample_noise(noise_scale, batch_size, noise_dim)
    y_gen = np.zeros((X_gen.shape[0], 2), dtype=np.uint8)
    y_gen[:, 1] = 1

    return X_gen, y_gen


## Training GAN

Now that we have the models and utility functions in place, we can start building the training procedure for GAN. Training is the most important and challenging part of working with GANs. We will first go through the loss functions for GAN. Later, we will cover the training flow and important highlights of GAN training.

#### Loss Functions
First, we will start with the discriminator loss function. It is just the standard cross-entropy cost, which is minimized.

$\qquad$$\qquad$$J_{D}(D,G) = - \sum_{x \sim p_{data}} [\log D(x)] - \sum_{z \sim p_{z}} [\log(1 - D(G(z)))]$

where $D$ is the discriminator function and $G$ is the generator function. Also, $p_{data}$ refers to the real data distribution and $p_{z}$ refers to the prior distribution of the noise $z$ that is fed to the generator.


The only difference from standard cross-entropy is that the classifier is trained on two mini-batches instead of one: one coming from the generator, where the labels are 0, and one coming from the real images, where the labels are 1 for all examples. Most GAN variants to date use the above loss function for the discriminator. They differ in the generator loss function, which we will see now.
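
As a small worked example, the discriminator loss above can be evaluated by hand for a few outputs. The probabilities below are made-up values chosen for illustration, and `discriminator_loss` is a hypothetical helper, not part of the training code.

```python
import math

def discriminator_loss(d_real, d_fake):
    """J_D for discriminator outputs on a real batch and a generated batch."""
    real_term = -sum(math.log(p) for p in d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake)
    return real_term + fake_term

# Hypothetical discriminator outputs D(x) and D(G(z)), chosen for illustration.
confident = discriminator_loss(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
fooled = discriminator_loss(d_real=[0.9, 0.95], d_fake=[0.9, 0.95])
print(confident, fooled)
```

The loss is small when the discriminator assigns high probability to real images and low probability to generated ones, and it grows sharply when generated images are mistaken for real ones.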


The theoretical analysis in the [original GAN paper](https://arxiv.org/abs/1406.2661) is based on a **zero-sum game** in which the generator attempts to generate samples that have a low probability of being classified as fake, by minimizing the following objective function:

$\qquad$$\qquad$$J_{G}(G) = \sum_{z \sim p_{z}} [\log(1 - D(G(z)))]$


However, the loss function above may not provide sufficient gradient for the generator to learn well. When $G$ is poor early in training, $D$ rejects generated samples with high confidence, which saturates the loss function above.

In practice, the [original GAN paper](https://arxiv.org/abs/1406.2661) recommends an alternative loss function, which ensures that generated samples have a high probability of being classified as real, by minimizing the following alternative objective function:

$\qquad$$\qquad$$J_{G}(G) = - \sum_{z \sim p_{z}} [\log D(G(z))]$



We will use the non-saturating objective shown above (so called because its gradient does not saturate) in our implementation. For more details about the loss functions of GAN, refer to [this](https://arxiv.org/abs/1701.00160) tutorial.
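
The difference between the two generator objectives shows up in their gradients. The sketch below assumes the discriminator ends in a sigmoid, so that $D(G(z)) = \sigma(a)$ for some logit $a$, and compares the gradient of each objective with respect to $a$ for a single sample.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_saturating(a):
    # d/da log(1 - sigmoid(a)) = -sigmoid(a)
    return -sigmoid(a)

def grad_non_saturating(a):
    # d/da [-log sigmoid(a)] = -(1 - sigmoid(a))
    return -(1.0 - sigmoid(a))

for a in (-6.0, 0.0, 6.0):
    print("D(G(z))=%.4f  saturating=%.4f  non-saturating=%.4f"
          % (sigmoid(a), grad_saturating(a), grad_non_saturating(a)))
```

When the discriminator confidently rejects a generated sample ($D(G(z))$ close to 0), the gradient of the first objective almost vanishes, while the gradient of the alternative objective stays close to its maximum magnitude.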

#### Optimality for GAN training

When the generated probability distribution equals the real probability distribution, the GAN generates images that are indistinguishable from real images. More precisely, when the discriminator is optimal, the loss function of GAN quantifies the similarity between the generated data distribution $p_{model}$ and the real data distribution $p_{data}$ through the **Jensen–Shannon divergence**. For the mathematical proof, refer to [this](https://arxiv.org/abs/1406.2661) paper.
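
This relationship can be checked numerically on small discrete distributions. The sketch below uses the value function from the original paper written with expectations (the negative of the discriminator loss), plugs in the optimal discriminator $D^*(x) = p_{data}(x) / (p_{data}(x) + p_{model}(x))$, and compares the result with $-\log 4 + 2 \cdot JSD$. The two toy distributions are assumptions chosen for illustration.

```python
import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def value_at_optimal_d(p_data, p_model):
    """E_{p_data}[log D*(x)] + E_{p_model}[log(1 - D*(x))] with D* = p_data / (p_data + p_model)."""
    total = 0.0
    for pd, pm in zip(p_data, p_model):
        d_star = pd / (pd + pm)
        if pd > 0:
            total += pd * math.log(d_star)
        if pm > 0:
            total += pm * math.log(1.0 - d_star)
    return total

p_data = [0.5, 0.3, 0.2]   # toy distributions, assumed for illustration
p_model = [0.2, 0.3, 0.5]
print(value_at_optimal_d(p_data, p_model), -math.log(4) + 2 * js_divergence(p_data, p_model))
print(value_at_optimal_d(p_data, p_data), -math.log(4))
```

Both printed pairs agree. When the two distributions are equal, the divergence is zero, the optimal discriminator outputs 0.5 everywhere, and the value reaches $-\log 4$.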


#### Training Highlights and Process

* We will use the SGD optimizer to train the discriminator and the Adam optimizer to train the DCGAN model, as suggested [here](https://github.com/soumith/ganhacks).
* We will first train the discriminator alone on a batch of either generated or real images. After that, we will freeze the discriminator and train the DCGAN model, which trains the generator based on feedback from the discriminator.




In [0]:
def train(**kwargs):
    """
    Train model
    Load the whole train data in memory for faster operations
    args: **kwargs (dict) keyword arguments that specify the model hyperparameters
    """

    # Roll out the parameters
    batch_size = kwargs["batch_size"]
    n_batch_per_epoch = kwargs["n_batch_per_epoch"]
    nb_epoch = kwargs["nb_epoch"]
    model_name = kwargs["model_name"]
    image_data_format = kwargs["image_data_format"]
    img_dim = kwargs["img_dim"]
    label_smoothing = kwargs["label_smoothing"]
    label_flipping = kwargs["label_flipping"]
    noise_scale = kwargs["noise_scale"]
    use_mbd = kwargs["use_mbd"]
    epoch_size = n_batch_per_epoch * batch_size

    # Setup environment (logging directory etc)
    # general_utils.setup_logging(model_name)

    img_dim = X_real_train.shape[-3:]
    noise_dim = (100,)

    try:

        # Create optimizers
        opt_dcgan = Adam(lr=1E-3, beta_1=0.5, beta_2=0.999, epsilon=1e-08)
        opt_discriminator = SGD(lr=1E-3, momentum=0.9, nesterov=True)

        # Load generator model
        generator_model = generator_deconv(noise_dim,
                                      img_dim,
                                      batch_size)
        # Load discriminator model
        discriminator_model = DCGAN_discriminator(noise_dim,
                                          img_dim)

        generator_model.compile(loss='mse', optimizer=opt_discriminator)
        discriminator_model.trainable = False

        DCGAN_model = DCGAN(generator_model,
                                   discriminator_model,
                                   noise_dim,
                                   img_dim)

        loss = ['binary_crossentropy']
        loss_weights = [1]
        DCGAN_model.compile(loss=loss, loss_weights=loss_weights, optimizer=opt_dcgan)

        discriminator_model.trainable = True
        discriminator_model.compile(loss='binary_crossentropy', optimizer=opt_discriminator)

        gen_loss = 100
        disc_loss = 100

        # Start training
        print("Start training")
        for e in range(nb_epoch):
            # Initialize progbar and batch counter
            #progbar = generic_utils.Progbar(epoch_size)
            batch_counter = 1
            start = time.time()

            for X_real_batch in gen_batch(X_real_train, batch_size):

                # Create a batch to feed the discriminator model
                X_disc, y_disc = get_disc_batch(X_real_batch,
                                                           generator_model,
                                                           batch_counter,
                                                           batch_size,
                                                           noise_dim,
                                                           noise_scale=noise_scale,
                                                           label_smoothing=label_smoothing,
                                                           label_flipping=label_flipping)

                # Update the discriminator
                disc_loss = discriminator_model.train_on_batch(X_disc, y_disc)

                # Create a batch to feed the generator model
                X_gen, y_gen = get_gen_batch(batch_size, noise_dim, noise_scale=noise_scale)

                # Freeze the discriminator
                discriminator_model.trainable = False
                gen_loss = DCGAN_model.train_on_batch(X_gen, y_gen)
                # Unfreeze the discriminator
                discriminator_model.trainable = True

                batch_counter += 1
                #progbar.add(batch_size, values=[("D logloss", disc_loss),
                #                               ("G logloss", gen_loss)])

                # Save images for visualization
#                 if batch_counter % 100 == 0:
#                     data_utils.plot_generated_batch(X_real_batch, generator_model,
#                                                     batch_size, noise_dim, image_data_format)

                if batch_counter >= n_batch_per_epoch:
                    break

            print("")
            print('Epoch %s/%s, Time: %s' % (e + 1, nb_epoch, time.time() - start))

            if e % 5 == 0:
                gen_weights_path = os.path.join('%s_gen_weights_epoch%s.h5' % (model_name, e))
                generator_model.save_weights(gen_weights_path, overwrite=True)

                disc_weights_path = os.path.join('%s_disc_weights_epoch%s.h5' % (model_name, e))
                discriminator_model.save_weights(disc_weights_path, overwrite=True)

                DCGAN_weights_path = os.path.join('%s_DCGAN_weights_epoch%s.h5' % (model_name, e))
                DCGAN_model.save_weights(DCGAN_weights_path, overwrite=True)

    except KeyboardInterrupt:
        pass


In [0]:
# Change parameters based on size of training dataset

d_params = {"mode": "train_GAN",
              "batch_size": 2,
              "n_batch_per_epoch": 5,
              "nb_epoch": 50,
              "model_name": "CNN",
              "do_plot": False,
              "image_data_format": "channels_last",
              "bn_mode": 2,
              "img_dim": 64,
              "label_smoothing": True,
              # Probability of flipping a batch's labels (illustrative value);
              # passing True here would flip the labels of every batch
              "label_flipping": 0.05,
              "noise_scale": 0.5,
              "use_mbd": True,
            }

# Launch training
train(**d_params)


## Results

![GANResults](images/GANResults.png)

The first two rows of the results show GAN-generated images and the last two rows show real images from the CelebA dataset.

## Future Scope 

#### Use Better Metric of Distribution Similarity
The loss function of the vanilla GAN measures the JS divergence between the real data distribution and the generated distribution. This metric fails to provide a meaningful value when the two distributions are disjoint, which is frequently the case for GAN.

To overcome this, the [Wasserstein metric](https://arxiv.org/abs/1701.07875) was proposed, which varies much more smoothly even when the distributions are disjoint. If you are further interested in Wasserstein-based GANs, you can explore [WGAN with gradient penalties](https://arxiv.org/abs/1704.00028) and [DRAGAN](https://arxiv.org/abs/1705.07215).
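
A tiny example makes the contrast concrete. Assume the real distribution is a point mass at 0 and the generated distribution is a point mass at $\theta$. The closed-form values below follow from the definitions of the two quantities. The function names are made up for this sketch.

```python
import math

def js_point_masses(theta):
    """JS divergence between a point mass at 0 and a point mass at theta."""
    # Identical distributions give 0; disjoint ones always give log 2.
    return 0.0 if theta == 0 else math.log(2)

def wasserstein_point_masses(theta):
    """Wasserstein-1 distance: all mass must travel a distance |theta|."""
    return abs(theta)

for theta in (0.0, 0.01, 0.5, 2.0):
    print(theta, js_point_masses(theta), wasserstein_point_masses(theta))
```

The JS divergence jumps to $\log 2$ as soon as the distributions stop overlapping and then stays flat, so it gives no hint about how far apart they are. The Wasserstein distance shrinks steadily as $\theta$ approaches 0, which is what makes it a more useful training signal in this setting.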

#### Energy-based GANs

The main idea behind energy-based GANs is to treat the discriminator as an energy function that assigns low energies to regions near the data manifold and higher energies to other regions. One such GAN, [EBGAN](https://arxiv.org/abs/1609.03126), uses an auto-encoder as the discriminator, with the reconstruction error as the energy.

[BEGAN](https://arxiv.org/abs/1703.10717), which was state of the art for generating realistic faces, compares the auto-encoder reconstruction losses of real and generated images instead of comparing the data distributions directly.

#### GAN on Higher-resolution Images

In this tutorial, we worked on low-resolution face images ($64 \times 64$). [Progressive GAN](https://arxiv.org/abs/1710.10196) is able to generate HD face images of size $1024 \times 1024$. The key idea is to grow the generator and discriminator progressively, starting from low-resolution images.

Lastly, if you are interested in the equilibrium of GAN training, refer to [this](https://arxiv.org/abs/1710.08446) awesome paper.


