# Introduction

Generative adversarial networks (GANs) were first introduced in 2014. These models consist of two networks, the generator and the discriminator (or adversary). The generator produces images, and the discriminator estimates whether each image came from the generator or from the training data.

This can also be viewed as a forgery detection task in which the generator makes forgeries and the discriminator learns to detect them. The objective is for the discriminator to be right only 50% of the time, which is equivalent to guessing at random whether an image is real or forged. As training goes on, the discriminator gets better at detecting fakes, and the generator gets better at fooling the discriminator.

The space from which the generator draws its inputs is known as the latent space. As the generator is updated during training, it learns to map points in this space to images of higher quality.

# History

## 2014 - Generative Adversarial Nets
GANs were first introduced in 2014 by Ian J. Goodfellow and a team of researchers at the University of Montreal. They showed that this two-model system could be used to train models on opposing but related tasks. In their work, they implemented the generator and the discriminator as simple multi-layer perceptrons.

They based the loss function on a two-player minimax game in which the discriminator tries to maximize its score (correct forgery detections) and the generator tries to minimize that same score (forgery detections).

<img src="./images/gan-minmax.png">

Where $G$ is the generator, $D$ is the discriminator, $x$ is the training data, and $z$ is noise. This is quite different from traditional deep learning models where the objective is to minimize some value. With generative adversarial networks, we are instead trying to balance the performance of the two networks without letting one dominate the other. This can make them difficult to train.
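To make the two sides of this game concrete, the value function from Goodfellow et al., $V(D, G) = \mathbb{E}_{x}[\log D(x)] + \mathbb{E}_{z}[\log(1 - D(G(z)))]$, can be estimated from a batch of discriminator outputs. The following is a minimal sketch rather than the authors' code; the function name `value_function` and the example probabilities are made up for illustration.

```python
import math

def value_function(d_real, d_fake):
    """Batch estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
undecided = value_function([0.5, 0.5], [0.5, 0.5])

# A confident, correct discriminator pushes the value up towards 0.
confident = value_function([0.99, 0.98], [0.01, 0.02])

print(undecided, confident)
```

The discriminator wants this number to be as large as possible, while the generator wants it to be as small as possible. The undecided case evaluates to $-2\log 2$, the value at which the discriminator is guessing at random.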

In order to avoid overfitting the discriminator, model training alternated between adjusting the discriminator and the generator. Specifically, the discriminator would update for $k$ steps before switching and updating the generator for one step.
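The alternating schedule described above can be sketched as a simple loop. This is only an illustration of the update order; `train`, `update_discriminator`, and `update_generator` are hypothetical names, and the placeholder updates below record calls instead of touching any model.

```python
def train(num_rounds, k, update_discriminator, update_generator):
    """Alternate k discriminator updates with one generator update."""
    for _ in range(num_rounds):
        for _ in range(k):
            update_discriminator()
        update_generator()

# Placeholder updates that only record the order in which they are called.
log = []
train(
    num_rounds=2,
    k=3,
    update_discriminator=lambda: log.append("D"),
    update_generator=lambda: log.append("G"),
)
print("".join(log))  # DDDGDDDG
```

In a real training loop each callback would sample a batch, compute the loss for its network, and apply one optimizer step.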

(Generative Adversarial Nets, Goodfellow et al.)

## 2015 - Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Deep Convolutional Generative Adversarial Networks (DCGANs) were introduced by a team of researchers at indico Research in Boston. They sought to apply convolutional neural networks (CNNs) to unsupervised learning using the two-network generator/discriminator approach put forth by Goodfellow et al. Up until this point, CNNs had primarily been used for supervised image classification.

Radford et al. had two major goals with this research. First, they wanted to train GANs to learn feature representations of unlabeled image and video data so that the representations could be transferred into a separate model that is trained in a supervised manner. Doing this would hopefully speed up the training of the image classification network because the filters would already be learnt. Second, the team wanted to visualize what filters the GANs learn and how the generator uses those filters to draw new images.

The team used three image sets to evaluate their architecture: Large-scale Scene Understanding (LSUN), ImageNet-1k, and a data set of faces. When trained on scenes, the generator was able to produce the following images of fake bedrooms.

<img src="./images/gan-bedrooms.png">

When training on face images, the team was able to perform vector arithmetic on sets of images to create classes of images that were not represented in the original data. For example, starting with a man with glasses, subtracting a general man, and adding a general woman produces images of a woman with glasses.
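The arithmetic described above operates on the generator's input vectors rather than on pixels. Below is a minimal sketch, assuming each concept is represented by the average of a few latent vectors whose generated images show that concept; the three-dimensional vectors are invented for illustration, and the helper names are hypothetical.

```python
def mean_vector(vectors):
    """Average a list of equal-length latent vectors component-wise."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def latent_arithmetic(a, b, c):
    """Return a - b + c, component-wise."""
    return [x - y + z for x, y, z in zip(a, b, c)]

# Invented latent vectors for images showing each concept.
man_with_glasses = mean_vector([[1.0, 2.0, 0.5], [1.2, 1.8, 0.7]])
man = mean_vector([[1.0, 0.0, 0.5], [1.2, 0.2, 0.7]])
woman = mean_vector([[-1.0, 0.0, 0.4], [-0.8, 0.2, 0.6]])

# The result is a new latent vector to feed to the generator.
woman_with_glasses = latent_arithmetic(man_with_glasses, man, woman)
print(woman_with_glasses)
```

Feeding the resulting vector to a trained generator is what would produce the new image; the sketch stops at the latent vector because no generator is defined here.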

<img src="./images/gan-vector-faces.png">

(Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, Radford et al.)

## 2016 - Generative Adversarial Text to Image Synthesis
An interesting unsolved problem is generating a photo-realistic image from a sentence describing that image. In other words, the task is to take a caption for an image and use that string to create the image. Researchers at the University of Michigan, Ann Arbor, created a generative adversarial network that does just that.

The network is trained using pre-captioned images. A convolutional-recurrent network creates a representation of the string that is given to both the generator and the discriminator. It is up to the discriminator to decide if the string is an accurate description of the image it is receiving. The generator also receives the string and creates an image that is meant to fool the discriminator.
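The way the text representation reaches both networks can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: it assumes the generator's input is the noise vector concatenated with the text representation, and that the discriminator is also shown real images paired with the wrong caption so that it must judge whether the text matches. Images are stood in for by strings, and all names are hypothetical.

```python
import random

def generator_input(noise, text_embedding):
    """Concatenate the noise vector and the text representation."""
    return list(noise) + list(text_embedding)

def discriminator_pairs(real_image, generated_image, text, mismatched_text):
    """Build (image, text) pairs with the label the discriminator should output."""
    return [
        ((real_image, text), 1),             # real image, matching caption
        ((real_image, mismatched_text), 0),  # real image, wrong caption (assumed)
        ((generated_image, text), 0),        # forged image, matching caption
    ]

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
text_embedding = [0.2, -0.1, 0.7]  # invented stand-in for the encoder output

g_in = generator_input(noise, text_embedding)
pairs = discriminator_pairs("real.png", "forged.png", "a red flower", "a blue bird")
print(len(g_in), [label for _, label in pairs])
```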

<img src="./images/gan-text-to-image-model.png">

Some examples are given below. It is important to consider the quality of the captions on the training data when creating these models. If you have a large group of undergrads caption your test images for you, they will likely use different terms to describe an image than the researcher who tries to generate a new image from a caption. In the fourth example below, the input string includes "raised orange stamen", but none of the generated images carry this feature. It is possible that it was not in the training set at all.

<img src="./images/gan-text-to-image.png">

(Generative Adversarial Text to Image Synthesis, Reed et al.)

## 2017 - Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
Geoffrey Hinton showed that deep convolutional neural networks could be used to compress and reconstruct images using an encoder and decoder pair that operate on a shared latent space (cite). However, this method fails to capture and reconstruct small details. A team of researchers at Twitter showed that the generators in GANs could be used to generate high-resolution, photo-realistic images.

The researchers generated data by taking high-resolution images and applying filters on them to obtain low-resolution versions of the same image. The generator was then trained to take the low-resolution versions and upsample them into images that fooled the discriminator when compared to the original, high-resolution images.
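The data generation step can be sketched as building (low-resolution, high-resolution) training pairs from a single image. The sketch below assumes a simple box filter that averages non-overlapping blocks, which may differ from the filter the researchers used; the image is a small grayscale grid of invented values.

```python
def downsample(image, factor):
    """Average non-overlapping factor x factor blocks of a 2-D grayscale image."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image dimensions must be divisible by factor")
    low_res = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block = [
                image[top + dy][left + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) / len(block))
        low_res.append(row)
    return low_res

high_res = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [8, 8, 0, 0],
    [8, 8, 0, 4],
]
low_res = downsample(high_res, 2)
training_pair = (low_res, high_res)  # generator input and its target
print(low_res)  # [[2.0, 6.0], [8.0, 1.0]]
```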

They used a perceptual loss function that combines an adversarial loss and a content loss. The content loss is based on the ReLU activation layers of VGG19, similar to the VGG loss used by Gatys et al. to minimize the perceptual loss in neural style transfer. The adversarial loss is defined by trying to minimize the probability of the discriminator deciding that a generated image is fake.
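The combination of the two terms can be sketched as a weighted sum. This is a simplified illustration under assumptions: the feature lists stand in for VGG19 activations of the original and generated images, the adversarial term is averaged over the batch, and the default `adversarial_weight` is an assumed value rather than one taken from the text above.

```python
import math

def content_loss(features_real, features_generated):
    """Mean squared error between two flat lists of feature activations."""
    n = len(features_real)
    return sum((a - b) ** 2 for a, b in zip(features_real, features_generated)) / n

def adversarial_loss(d_on_generated):
    """Mean of -log D(G(x)); small when the discriminator rates forgeries as real."""
    return -sum(math.log(p) for p in d_on_generated) / len(d_on_generated)

def perceptual_loss(features_real, features_generated, d_on_generated,
                    adversarial_weight=1e-3):
    """Content loss plus a weighted adversarial loss (weight is an assumption)."""
    return (content_loss(features_real, features_generated)
            + adversarial_weight * adversarial_loss(d_on_generated))

# Invented activations and discriminator outputs for a batch of two forgeries.
loss = perceptual_loss([1.0, 2.0], [1.0, 4.0], d_on_generated=[0.5, 0.5])
print(loss)
```

The content term pulls the generated image towards the original in feature space, while the adversarial term pushes it towards images the discriminator accepts as real.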

The resulting images were compared to the original images using mean opinion score testing. 26 image raters were presented the original high-resolution images and images generated using eight other encoding methods, including the GAN implementation. On a scale of 1 to 5, with 1 being poor quality and 5 being excellent quality, the original images were rated at around 4.3 to 4.4, and the GAN-generated images were rated 3.5 to 3.7.

(Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, Ledig et al.)

## 2018 - Progressive Growing of GANs for Improved Quality, Stability, and Variation
A problem with generating high-resolution images using GANs is that it becomes easier for the discriminator to detect fake images, because generating photo-realistic detail at a large scale is difficult. To address this, a team of researchers at NVIDIA introduced a method of generating progressively higher-resolution images by adding new layers to the networks over the course of training. Training starts with low-resolution images and progressively adds layers that increase the resolution. This allows for a faster training cycle because less time and memory are needed to generate each batch of images.

They applied this method to images of celebrities to generate dozens of photo-realistic images of fake celebrities. The generator and discriminator are mirror images of each other, and layers are added to both at the same time.
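The growth schedule can be sketched as a list of training resolutions. The sketch assumes that each added layer doubles the resolution and that training runs from 4x4 up to 1024x1024; these specifics are assumptions for illustration and are not stated in the summary above. The function name is hypothetical.

```python
def resolution_schedule(start, target):
    """List the resolutions visited when doubling from start up to target."""
    if start <= 0 or target < start:
        raise ValueError("need 0 < start <= target")
    resolutions = [start]
    while resolutions[-1] < target:
        resolutions.append(resolutions[-1] * 2)
    if resolutions[-1] != target:
        raise ValueError("target must be start times a power of two")
    return resolutions

schedule = resolution_schedule(4, 1024)
print(schedule)  # [4, 8, 16, 32, 64, 128, 256, 512, 1024]

# Pixels per image at each stage, relative to the final resolution.
relative_cost = [(r * r) / (schedule[-1] ** 2) for r in schedule]
```

The `relative_cost` list gives a rough sense of why early stages are cheap: a 4x4 image has a tiny fraction of the pixels of a 1024x1024 one, so batches at low resolution need far less time and memory.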

<img src="./images/gan-faces.png">

(Progressive Growing of GANs for Improved Quality, Stability, and Variation, Karras et al.)

## Present
GANs are promising for various other tasks, including speech synthesis.

# Common Problems

## Mode Collapse
Mode collapse can be summarized as a failure of the generator to express the variety of features in the data it is trying to imitate. If the data the generator is trying to mimic has multiple modes (bi-modal, tri-modal, etc.), a common problem when training GANs is that the generated data represents only one of those modes. This can lead to training oscillation: the generator produces data following one mode, so the discriminator learns that all data from that mode is fake and that data from any other mode is real. The generator then learns that the discriminator considers data from another mode real, so it switches to that mode, and so forth.

There are several methods for countering mode collapse. The first is to encourage diversity within each batch by letting the discriminator compare the elements of a batch to each other when deciding whether items are real or fake.
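One way to give the discriminator a view of the whole batch is to append a diversity feature to each sample. The sketch below is a simplified stand-in for such a feature, not a specific published layer: for each sample it computes the mean L1 distance to every other sample in the batch. The function name and the example batches are invented.

```python
def batch_diversity_features(batch):
    """For each sample, the mean L1 distance to all other samples in the batch."""
    if len(batch) < 2:
        raise ValueError("need at least two samples to compare")
    features = []
    for i, sample in enumerate(batch):
        distances = [
            sum(abs(a - b) for a, b in zip(sample, other))
            for j, other in enumerate(batch)
            if j != i
        ]
        features.append(sum(distances) / len(distances))
    return features

collapsed = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]  # generator stuck on one output
diverse = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]

print(batch_diversity_features(collapsed))  # [0.0, 0.0, 0.0]
print(batch_diversity_features(diverse))    # [3.0, 2.0, 3.0]
```

A batch of near-identical forgeries yields features close to zero, which gives the discriminator an easy signal that the batch is fake and so pushes the generator towards more varied output.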

Another method is using unrolled GANs so that the generator can see how the discriminator will react to a certain set of inputs. This will significantly increase the training time because the discriminator needs to be updated several times for a single generator update, and then those changes are discarded.

The method of experience replay shows old, forged samples to the discriminator every so often so it remembers what forged samples from a different mode look like.
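Experience replay can be sketched as a small buffer of past forgeries that is mixed into the discriminator's batches. This is a minimal illustration; the class name `ReplayBuffer`, its capacity, and the random replacement policy are assumptions, and forged samples are represented by strings.

```python
import random

class ReplayBuffer:
    """Keeps a bounded pool of old forged samples for the discriminator."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.samples = []
        self.rng = random.Random(seed)

    def add(self, fakes):
        """Store new forgeries, overwriting random old ones once full."""
        for fake in fakes:
            if len(self.samples) < self.capacity:
                self.samples.append(fake)
            else:
                self.samples[self.rng.randrange(self.capacity)] = fake

    def mix(self, current_fakes, num_replayed):
        """Return the current forgeries plus up to num_replayed old ones."""
        count = min(num_replayed, len(self.samples))
        return list(current_fakes) + self.rng.sample(self.samples, count)

buffer = ReplayBuffer(capacity=4, seed=0)
buffer.add(["old_a", "old_b"])                 # forgeries from an earlier mode
batch = buffer.mix(["new_a", "new_b"], num_replayed=1)
buffer.add(["new_a", "new_b"])                 # remember the new ones as well
print(batch)
```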

The last method for countering mode collapse is to accept that the generator will learn a single mode, and use multiple generators to cover each mode in the data. This way, one discriminator will keep seeing multiple forged images from each data mode, balancing its training. Again, this will substantially increase the training time because multiple networks need to be updated in each epoch.

(Mode collapse in GANs, Aiden Nibali)

## Overpowerment
Another common problem in training GANs is overpowerment, where one network learns how to easily exploit the other. This can happen in the generator, where the generator learns to put features in the forged images that frequently result in false negatives in the discriminator. It can also happen in the discriminator, where it starts outputting classification values close to 0 or 1, which drives the gradient toward zero and slows the generator's updates.
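The vanishing gradient in the second case can be illustrated numerically. The sketch assumes the discriminator ends in a sigmoid, so that $D(G(z)) = \sigma(a)$ for some logit $a$, and looks at the generator's term $\log(1 - D(G(z)))$ from the minimax loss; its derivative with respect to the logit is $-\sigma(a)$. The function names are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def saturating_generator_grad(logit):
    """Derivative of log(1 - sigmoid(logit)) with respect to the logit."""
    return -sigmoid(logit)

# As the discriminator grows more certain a forgery is fake, its output
# approaches 0 and the gradient passed back to the generator shrinks.
for logit in (0.0, -5.0, -10.0):
    print(logit, sigmoid(logit), saturating_generator_grad(logit))
```

At a logit of 0 the discriminator is undecided and the gradient magnitude is 0.5, while at a logit of -10 the magnitude is below 0.0001, so the generator receives almost no signal to improve.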

# GAN DeepFake Detection

Recent controversial work has been used to transfer the face of one subject to another. One particular implementation made its rounds on the internet as a way of taking a celebrity's face and putting it on content images or video frames.

These models show how machine learning is being used for nefarious applications. It would therefore be useful to have a model that is capable of determining whether an image of a celebrity is genuine or has been generated by a faceswap model.

GANs are an obvious candidate for this task because of their uses for forgery detection. The generator would forge faceswap images and the discriminator would learn how to detect if the picture is a real photo of someone or if it has been faceswapped.

## CNN Detection
To examine the performance of GAN faceswap detection, a simple CNN classifier is used to classify the real and fake images. The input data is a set of 4,000 real images and 10,000 images that have already had the face swapped. The swapped images have been swapped to new people and back to the original person. Additionally, some of the real images have been scaled down by a random percentage to simulate someone moving further away from the webcam.

When trained on real and a variety of fake images, the CNN is able to detect fake images with ~64% accuracy.

## GAN Detection
A generator network and a discriminator network are used to generate faceswapped images and to classify whether an image has been altered. Using this method, we were able to achieve a faceswap detection rate of ~68%.

# References

Generative Adversarial Nets, Ian J. Goodfellow et al.: https://arxiv.org/pdf/1406.2661.pdf

Introduction to generative adversarial networks reference notebook, Francois Chollet: https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/8.5-introduction-to-gans.ipynb

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, Alec Radford et al.: https://arxiv.org/pdf/1511.06434.pdf

Generative Adversarial Text to Image Synthesis, Scott Reed et al.: https://arxiv.org/pdf/1605.05396.pdf

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, Christian Ledig et al.: https://arxiv.org/pdf/1609.04802.pdf

Progressive Growing of GANs for Improved Quality, Stability, and Variation, Tero Karras et al.: http://research.nvidia.com/sites/default/files/pubs/2017-10_Progressive-Growing-of/karras2018iclr-paper.pdf

Mode collapse in GANs, Aiden Nibali: http://aiden.nibali.org/blog/2017-01-18-mode-collapse-gans/