# GAN and Its Variations

GAN (Generative Adversarial Network) has been hailed as the most interesting idea in the last 10 years of Machine Learning, and countless variations of GAN have appeared since it came out in 2014. Here is a list of some of those variations and what each one specializes in.

_Explanations of the datasets mentioned can be found at the end of this notebook._

## Vanilla GAN - 10 June 2014

By Ian Goodfellow et al., without whose work the other variations would not exist. [arXiv](https://arxiv.org/abs/1406.2661)

The first proposed adversarial networks framework, composed of a generator and a discriminator. The generator is trained to generate images from noise input to fool the discriminator. Meanwhile, the discriminator is trained to discriminate real samples from fake samples.
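The two training objectives can be sketched in plain Python. This is a minimal illustration, assuming the discriminator outputs the probability that a sample is real; the function names and the example probabilities are made up for the sketch and are not taken from the paper's code.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Cross-entropy loss: push D(x) toward 1 on real samples and D(G(z)) toward 0 on fakes."""
    real_term = sum(-math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(-math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Minimax generator objective: minimise the mean of log(1 - D(G(z)))."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs for two real and two generated samples.
d_real = [0.9, 0.8]
d_fake_rejected = [0.1, 0.2]  # discriminator spots the fakes
d_fake_accepted = [0.6, 0.7]  # discriminator is partly fooled

print(discriminator_loss(d_real, d_fake_rejected))  # lower: D is doing well
print(discriminator_loss(d_real, d_fake_accepted))  # higher: D is being fooled
print(generator_loss(d_fake_accepted))              # lower than for rejected fakes
```

The generator's loss drops as the discriminator assigns higher "real" probability to its samples, which is the sense in which the two networks are adversaries.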

Vanilla GAN is highly unstable and difficult to train. In one respect, however, it is simpler to train than earlier generative models such as Boltzmann Machines: it needs no Markov chains or approximate inference, only backpropagation.

GAN is used on MNIST, TFD, and CIFAR-10 in the paper.

## CGAN (Conditional GAN) - 6 November 2014

By Mehdi Mirza and Simon Osindero. [arXiv](https://arxiv.org/abs/1411.1784)

CGAN enables GAN to generate data that satisfies a given condition by passing in some extra information. The extra information could be a class label, so that the generated data belongs to that specific class. For example, when generating MNIST data, CGAN could produce samples that represent the digit 7 exclusively.

CGAN feeds the extra information to both discriminator and generator.
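One simple way to feed the condition to both networks is to append a one-hot class label to each network's input. The sketch below assumes this concatenation approach and a tiny noise vector; `NOISE_DIM`, `generator_input`, and `discriminator_input` are hypothetical names chosen for illustration, and real implementations may combine the label with the input differently.

```python
import random

NUM_CLASSES = 10  # MNIST digits 0-9
NOISE_DIM = 4     # hypothetical size, kept tiny for readability


def one_hot(label, num_classes=NUM_CLASSES):
    """Encode a class label as a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def generator_input(noise, label):
    """Noise vector followed by the condition, as seen by the generator."""
    return list(noise) + one_hot(label)


def discriminator_input(sample, label):
    """Flattened sample followed by the same condition, as seen by the discriminator."""
    return list(sample) + one_hot(label)


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
g_in = generator_input(noise, label=7)
print(len(g_in))          # NOISE_DIM + NUM_CLASSES values
print(g_in[NOISE_DIM:])   # one-hot encoding of the digit 7
```

Because the discriminator sees the label too, a generated "3" presented with the label 7 can be rejected even if it looks like a realistic digit.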

CGAN is used on the MNIST dataset in the paper.

## DCGAN (Deep Convolutional GAN) - 19 November 2015

By Alec Radford et al. [arXiv](https://arxiv.org/abs/1511.06434)

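DCGAN builds the generator and the discriminator from stacks of convolutional layers instead of the fully connected layers (perceptrons) of the original GAN. In the generator, these layers upsample the noise input step by step into an image.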
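To see how a stack of convolutional layers can grow a small feature map into an image, the sketch below applies the standard output-size rule for a transposed convolution (no dilation, no output padding). The kernel size 4, stride 2, and padding 1 are assumptions that are common in implementations, chosen here for illustration; `transposed_conv_output_size` is a hypothetical helper name.

```python
def transposed_conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a transposed convolution without dilation or output padding."""
    return (size - 1) * stride - 2 * padding + kernel


# Start from a 4x4 feature map and apply four upsampling layers.
size = 4
sizes = [size]
for _ in range(4):
    size = transposed_conv_output_size(size, kernel=4, stride=2, padding=1)
    sizes.append(size)

print(sizes)  # each layer doubles the width and height
```

With these settings every layer doubles the resolution, so four layers take a 4x4 map to a 64x64 image.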

DCGAN is used exclusively on image datasets in the paper: LSUN, Imagenet-1K, and a custom Faces dataset.

## EBGAN (Energy Based GAN) - 11 September 2016

By Junbo Zhao et al. [arXiv](https://arxiv.org/abs/1609.03126v1)

EBGAN's discriminator outputs an energy value for the data instead of the probability that the data is real. Low energies are associated with regions near the data manifold.

The generator is trained to generate data with low energies, while the discriminator will assign high energies to generated data.
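The two objectives can be written as a margin loss. The sketch below assumes the discriminator returns a non-negative energy per sample and that a positive margin caps how far the energy of generated data is pushed up; the function names and example values are illustrative only.

```python
def ebgan_discriminator_loss(energy_real, energy_fake, margin):
    """Lower the energy of real data; raise the energy of fakes until it reaches the margin."""
    hinge = max(0.0, margin - energy_fake)
    return energy_real + hinge


def ebgan_generator_loss(energy_fake):
    """The generator simply wants its samples to receive low energy."""
    return energy_fake


# Hypothetical energies for one real and one generated sample.
print(ebgan_discriminator_loss(energy_real=0.2, energy_fake=0.5, margin=1.0))
print(ebgan_discriminator_loss(energy_real=0.2, energy_fake=3.0, margin=1.0))
print(ebgan_generator_loss(energy_fake=0.5))
```

Once a generated sample's energy exceeds the margin, the hinge term is zero and the discriminator stops pushing that sample's energy any higher.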

According to the paper, EBGAN trains more stably than a regular GAN, and a single-scale architecture can be trained to generate high-resolution images.

EBGAN is used on MNIST, LSUN, and CelebA datasets.

## iGAN - 12 September 2016

By Jun-Yan Zhu et al. [arXiv](https://arxiv.org/abs/1609.03552)

iGAN uses GAN to manipulate and even generate images in a realistic manner. iGAN has its own UI, a canvas where the user can transform, recolor, and even add details to images using rough sketches. iGAN then takes over and generates a realistic image based on the changes the user made on the canvas.

See the iGAN demonstration on [YouTube](https://www.youtube.com/watch?v=9c4z6YsBGQ0).

## LSGAN (Least Squares Generative Adversarial Networks) - 13 November 2016

By Xudong Mao et al. [arXiv](https://arxiv.org/abs/1611.04076)

LSGAN uses a least squares loss function for the discriminator, in place of the usual cross-entropy loss, to overcome the vanishing gradient problem during training. The paper's experiments show that it generates higher quality images and trains more stably than Vanilla GAN.
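A minimal sketch of the least squares objectives, assuming the common label choice of 0 for fake data and 1 for real data (the parameters `a`, `b`, and `c` below), with the discriminator returning an unbounded score rather than a probability. The function names and example scores are illustrative.

```python
def lsgan_discriminator_loss(d_real, d_fake, a=0.0, b=1.0):
    """Pull scores of real samples toward b and scores of fake samples toward a."""
    real_term = sum((d - b) ** 2 for d in d_real) / len(d_real)
    fake_term = sum((d - a) ** 2 for d in d_fake) / len(d_fake)
    return 0.5 * real_term + 0.5 * fake_term


def lsgan_generator_loss(d_fake, c=1.0):
    """Pull scores of generated samples toward c, the value the generator wants D to output."""
    return 0.5 * sum((d - c) ** 2 for d in d_fake) / len(d_fake)


# Hypothetical discriminator scores for generated samples.
print(lsgan_generator_loss([1.0]))   # exactly on target: no loss
print(lsgan_generator_loss([3.0]))   # far from target: still penalised
print(lsgan_generator_loss([-1.0]))  # on the wrong side: penalised
```

The quadratic penalty grows with the distance from the target value, so a sample keeps receiving a gradient however far its score is from the target, whereas a saturated sigmoid would give almost none.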

LSGAN is tested on LSUN and HWDB1.0 datasets.

## WGAN (Wasserstein GAN) - 26 January 2017

By Martin Arjovsky et al. [arXiv](https://arxiv.org/abs/1701.07875)

WGAN's discriminator is called a critic because it does not explicitly try to classify inputs as real or fake. Instead, it is used to estimate the Wasserstein distance between the real data distribution and the generated data distribution.

Wasserstein Distance is a measure of distance between two probability distributions, also known as Earth Mover's (EM) distance.

WGAN uses the EM distance in place of the Jensen-Shannon (JS) divergence behind the original GAN objective. With JS, the vanishing gradient problem arises as the discriminator gets better. The EM distance, on the other hand, is differentiable almost everywhere, so the critic gives clean gradients and avoids vanishing gradients.
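The difference between the two measures shows up in a toy case where the real and generated distributions do not overlap. The sketch below assumes discrete distributions on a unit-spaced 1D grid, each putting all its mass on a single point; the helper names are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions on the same support."""
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


def em_distance_1d(p, q):
    """Earth Mover's distance on a unit-spaced 1D grid: sum of absolute CDF differences."""
    total, cdf_gap = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        total += abs(cdf_gap)
    return total


def point_mass(position, size=10):
    """Distribution with all probability on one grid cell."""
    return [1.0 if i == position else 0.0 for i in range(size)]


real = point_mass(0)
for shift in (1, 3, 5):
    fake = point_mass(shift)
    print(shift, round(js_divergence(real, fake), 4), em_distance_1d(real, fake))
```

The JS divergence stays at log 2 no matter how far apart the two distributions are, so it gives the generator no signal about which direction to move. The EM distance grows with the shift, which is what makes its gradient useful.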

Read more about WGAN [in this article by alexirpan](https://www.alexirpan.com/2017/02/22/wasserstein-gan.html).

## DRAGAN (Deep Regret Analytic GAN) - 19 May 2017

By Naveen Kodali et al. [arXiv](https://arxiv.org/abs/1705.07215)

DRAGAN aims to improve upon WGAN.

DRAGAN treats GAN training as regret minimization to achieve faster training and improved stability. It adds a gradient penalty to the discriminator, applied in the neighbourhood of real data points, to reduce mode collapse and thus increase stability. As with WGAN's constraint on the critic, the aim is to keep the discriminator's gradients well behaved.
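A gradient penalty of this kind can be sketched as follows: perturb each real sample with noise, measure the norm of the discriminator's gradient at the perturbed point, and penalise its squared deviation from a target norm `k`. Everything here is an illustrative assumption rather than the authors' implementation: the toy logistic discriminator, the Gaussian perturbation, the finite-difference gradient (a real implementation would use automatic differentiation), and the default values of `lam` and `k`.

```python
import math
import random


def toy_discriminator(x, weights=(1.5, -0.5)):
    """Hypothetical stand-in for D: a logistic model on 2-D inputs."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-score))


def gradient_norm(f, x, eps=1e-5):
    """Norm of the gradient of f at x, estimated with central differences."""
    squared = 0.0
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        squared += ((f(up) - f(down)) / (2 * eps)) ** 2
    return math.sqrt(squared)


def gradient_penalty(f, real_batch, noise_std, lam=10.0, k=1.0, seed=0):
    """Mean squared deviation of the gradient norm from k near real samples, scaled by lam."""
    rng = random.Random(seed)
    total = 0.0
    for x in real_batch:
        perturbed = [xi + rng.gauss(0.0, noise_std) for xi in x]
        total += (gradient_norm(f, perturbed) - k) ** 2
    return lam * total / len(real_batch)


real_batch = [[0.1, 0.4], [0.3, -0.2], [-0.5, 0.2]]
print(gradient_penalty(toy_discriminator, real_batch, noise_std=0.1))
```

The penalty is added to the discriminator's usual loss, so the discriminator is discouraged from having very steep or very flat gradients close to the real data.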

DRAGAN is also tested on image datasets (CIFAR-10, MNIST, CelebA).

## CycleGAN - 30 March 2017

By Jun-Yan Zhu et al. [arXiv](https://arxiv.org/abs/1703.10593)

CycleGAN makes unpaired image-to-image translation possible. Unpaired image-to-image translation means that two image datasets with different styles, e.g. real photos and Van Gogh paintings, can be used for style transfer even if the images are not paired. Dataset A could be filled with scenery photographs and dataset B with Van Gogh-style paintings of cats; the model could still learn to do style transfer between datasets A and B.

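CycleGAN uses the concept of cycle consistency. In language, cycle consistency means getting the same sentence back after translating an English sentence to French and then back again to English. The same concept is applied to the image datasets: translating an image from dataset A to the style of dataset B and back again should reproduce the original image.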

CycleGAN has two generators and two discriminators: one generator for each translation direction and one discriminator for each dataset.
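The round-trip idea can be expressed as a reconstruction loss over both generators. The sketch below assumes an L1 (mean absolute) error and uses toy "generators" that only shift pixel values; `g_ab`, `g_ba`, and the brighten/darken functions are hypothetical stand-ins for trained networks.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(g_ab, g_ba, sample_a, sample_b):
    """Reconstruction error after the round trips A -> B -> A and B -> A -> B."""
    forward_cycle = l1_distance(g_ba(g_ab(sample_a)), sample_a)
    backward_cycle = l1_distance(g_ab(g_ba(sample_b)), sample_b)
    return forward_cycle + backward_cycle


# Toy generators: an "image" is a flat list of pixel intensities.
def brighten(image):
    return [p + 0.2 for p in image]


def darken(image):
    return [p - 0.2 for p in image]


def darken_too_much(image):
    return [p - 0.5 for p in image]


photo = [0.1, 0.5, 0.3]
painting = [0.6, 0.7, 0.4]

print(cycle_consistency_loss(brighten, darken, photo, painting))           # near zero
print(cycle_consistency_loss(brighten, darken_too_much, photo, painting))  # clearly positive
```

This loss is added to the usual adversarial losses from the two discriminators, so each generator has to produce images that both look like the target dataset and can be translated back to the original.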

CycleGAN is used on images for some interesting style transfers, such as horse to zebra, apple to orange, photograph to Monet, and many more.

## Dataset Explanation

### MNIST

A collection of handwritten digits, labelled 0-9. Image size is 28x28 pixels, with 1 channel (grayscale).

60,000 images in training set, 10,000 in test set.

### TFD (Toronto Face Database)

A collection of faces with different emotions, labelled Anger, Disgust, Fear, Happy, Sad, Surprise, Neutral. Image size is 48x48 pixels, with 1 channel (grayscale).

### LSUN (Large-scale Scene Understanding)

A collection of scenery photographs, labelled Bedroom, Bridge, Church Outdoor, Classroom, Conference Room, Dining Room, Kitchen, Living Room, Restaurant, Tower. Images are coloured.

### CelebA (Large-scale CelebFaces Attributes)

A collection of celebrity faces. Each image is labelled with multiple attributes, such as Glasses, Oval Face, Pointy Nose, Mustache. Images are coloured.

Each image also has 5 landmark locations and 40 binary attributes.

### HWDB1.0

A collection of handwritten Chinese characters, numbers, and symbols with a total of 3700+ classes. Images are grayscale.