# 07 - Deep Learning

## 08 - Generative Adversarial Networks and Adversarial Training

<img src="../media/07-gan/lisa.png">

In [5]:
from IPython.display import HTML

# I. What is a GAN ?

## 1. Definition

> **Big Picture** : Generative Adversarial Networks (GANs) are deep learning architectures made of 2 Neural Nets, fighting one against the other. 

- A first network is trained to take a random input and make it look like our target data.

- A second network is trained to recognize the fake inputs from the true inputs. 

We use the fact that these 2 networks fight against one another to learn the distribution of the input data and generate new data samples. Thus, GANs are said to be a **generative model**.

Therefore, GANs are able to generate :
- faces
- images
- videos
- voices
- data
- and even art...

Concretely, you can apply GANs to :
- Font generation
- Anime character generation
- Interactive Image generation
- Text2Image (text to image)
- 3D Object generation
- Image Editing
- Face Aging
- Human Pose Estimation
- Domain-transfer (e.g. style-transfer, pix2pix, sketch2image)
- Image Inpainting (hole filling)
- Super-resolution
- High-resolution image generation (large-scale image)
- Adversarial Examples (Defense vs Attack)
- Visual Saliency Prediction (attention prediction)
- Object Detection/Recognition
- Robotics
- Video (generation/prediction)
- Synthetic Data Generation
- Upscaling old 2D video games to 4K resolution
- ...


## 2. Illustration

**1)** Some art generated by GANs, from the Art and Artificial Intelligence Laboratory, Rutgers University. Separately, the GAN-generated painting Edmond de Belamy sold for 432'500 USD in 2018 (see the History section).

<img src="images/art.png">

**2)** Faces generated by GANs. None of these people actually exist. These faces were generated on https://www.thispersondoesnotexist.com/

<img src="images/faces.png">

**3)** Anime characters generated with StyleGAN, an architecture developed by NVIDIA

<img src="images/anime.png">

**4)** GAN-Generated Music by Aiva

In [48]:
%%html
<iframe width="100%" height="166" scrolling="no" frameborder="no" allow="autoplay" src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/522141435&color=%23ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false&show_teaser=true"></iframe>

**5)** GAN for data generation

https://poloclub.github.io/ganlab/

...

## 3. History

- 1959 : The general idea of learning via competition between players dates back to at least 1959, when **Arthur Samuel** demonstrated that algorithms could learn to play checkers via adversarial self-play.

- 1992 : Works by Jürgen Schmidhuber on Predictability Minimization, a form of Adversarial Networks https://youtu.be/HGYYEUSm-0Q?t=3779

- 2014 : Ian Goodfellow is recognized by several sources as having **invented GANs** in 2014. His paper included the first working implementation of a generative model based on adversarial networks, as well as game theoretic analysis establishing that the method is sound.
https://arxiv.org/abs/1406.2661

- 2017 : A GAN was used for **image enhancement** focusing on realistic textures rather than pixel-accuracy, producing a higher image quality at high magnification.

- 2017 : The first **faces** were generated. These were exhibited in February 2018 at the Grand Palais.

- 2017 : GANs entered the **fine arts** arena with the "CAN" (for "creative adversarial network"), a newly developed implementation which was said to be able to generate unique and appealing abstract paintings.

- 2018 : A GAN system was used to create the 2018 **painting** Edmond de Belamy, which sold for 432'500 USD.

- 2019 : Researchers at Samsung demonstrated a GAN-based system that produces **videos of a person** speaking given only a single photo of that person.

<img src='images/facebook.png'>

# II. How do GANs work ?

> GANs are used to generate data that follow a target distribution, from a random input.

<img src='images/GANprinciple.png'>

> GANs are based on the notion of adversarial training. In machine learning, adversarial techniques attempt to fool a model with deliberately crafted inputs; in a GAN, the generator plays this role against the discriminator.

## 1. Generative Models

There are 2 main ways to generate data :
- by directly comparing the generated data and the input data :  Generative Matching Networks
- by indirectly comparing the generated data and the input data : Generative Adversarial Networks

### a. Generative Matching Networks

Generative Matching Networks is a category of algorithms made of :
- Generative Moments Matching Networks
- Generative Features Matching Networks

> We **compare the true and the generated probability distributions** and backpropagate the difference (the error) through the network.

<img src='images/GMN.png'>

Note: the error is usually computed with the Maximum Mean Discrepancy (MMD).
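To make the idea of comparing two distributions directly more concrete, here is a minimal sketch of a (biased) squared Maximum Mean Discrepancy estimate between two sample sets. The Gaussian kernel, the bandwidth of 1.0, the toy 2D data and the helper names `gaussian_kernel` and `mmd_squared` are all assumptions made for illustration; they are not taken from a specific paper or library.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd_squared(real, fake, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    k_rr = gaussian_kernel(real, real, bandwidth).mean()
    k_ff = gaussian_kernel(fake, fake, bandwidth).mean()
    k_rf = gaussian_kernel(real, fake, bandwidth).mean()
    return k_rr + k_ff - 2.0 * k_rf

# Toy data (assumed): "real" samples and two candidate sets of generated samples
rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
close = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # same distribution as real
far = rng.normal(loc=3.0, scale=1.0, size=(200, 2))    # shifted distribution

mmd_close = mmd_squared(real, close)
mmd_far = mmd_squared(real, far)
print(mmd_close, mmd_far)
```

The shifted samples should give a clearly larger value than the samples drawn from the same distribution; in a matching network, this quantity would play the role of the error that is backpropagated.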

### b. Generative Adversarial Networks

> In GANs, we don't compute the error directly. We train a discriminator to tell the true images from the generated ones.

GANs are made of 2 Neural Nets :
- A Generator : `G(z)`
- A Discriminator : `D(x)`

We have, in a sense, 2 inputs :
- Real data
- Generated data (by the Generator)

We try to :
- teach the discriminator to recognize real data and generated data,
- teach the generator to make generated data look like real data

<img src='images/GANschema.png'>

In adversarial training, both networks try to beat each other and, in doing so, both keep getting better.

#### Architecture Overview

<img src='images/GANs.png'>

#### The Discriminator : D(x)

> If you pass a data point `x` through the discriminator, `D(x)` outputs a value between 0 and 1: the estimated probability that `x` comes from the original dataset.

If we successfully manage to generate data that look like the input data, our discriminator should have issues telling which image is real or fake. In this sense, we expect the probability that it outputs to tend to 0.5.
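The 0.5 value can be checked on a toy case. For a fixed generator, the original GAN paper shows that the best possible discriminator is `D*(x) = p_data(x) / (p_data(x) + p_g(x))`. The sketch below evaluates this formula for two 1D Gaussians; the Gaussian densities, the means and the helper names are assumptions chosen for illustration only.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1D normal distribution."""
    coeff = 1.0 / (std * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mean) / std) ** 2)

def optimal_discriminator(x, real_mean, fake_mean, std=1.0):
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)) for two 1D Gaussians."""
    p_data = gaussian_pdf(x, real_mean, std)
    p_g = gaussian_pdf(x, fake_mean, std)
    return p_data / (p_data + p_g)

# Generator far from the data: D* is very confident near the real mean
d_far = optimal_discriminator(x=0.0, real_mean=0.0, fake_mean=4.0)
# Generator matches the data: D* cannot do better than a coin flip
d_matched = optimal_discriminator(x=0.0, real_mean=0.0, fake_mean=0.0)
print(d_far, d_matched)
```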

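In [None]:
# Assumption: Keras is imported through TensorFlow; the original notebook does not show its imports
from tensorflow.keras.layers import Input, Dense, Flatten, LeakyReLU
from tensorflow.keras.models import Model, Sequential

# Assumed image shape (28x28 grayscale); not specified in the original notebook
img_shape = (28, 28, 1)

def discriminator():

    img = Input(shape=img_shape)

    # Create the sequential model
    model = Sequential()

    # Flatten the images taken as inputs
    model.add(Flatten(input_shape=img_shape))

    # First layer
    model.add(Dense(512))
    model.add(LeakyReLU(alpha=0.2))

    # Many layers ...

    # Output layer: probability that the input image is real
    model.add(Dense(1, activation='sigmoid'))

    # Get result
    validity = model(img)

    return Model(img, validity)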

<img src='images/disc.png'>

#### The Generator : G(z)

> The Generator `G(z)` takes as input a noise vector, called `z`, and learns to turn this noise into generated data. The output of `G(z)` is a matrix with the same dimensions as the true inputs.

Ideally, we want `G(z)` to output matrices which are indistinguishable from the original data (x) distribution.

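In [None]:
import numpy as np
# Assumption: Keras is imported through TensorFlow; the original notebook does not show its imports
from tensorflow.keras.layers import Input, Dense, Reshape, LeakyReLU, BatchNormalization
from tensorflow.keras.models import Model, Sequential

# Assumed image shape (28x28 grayscale); not specified in the original notebook
img_shape = (28, 28, 1)

def generator():

    # Input Data
    noise_shape = (100,)
    noise = Input(shape=noise_shape)

    # Create the sequential model
    model = Sequential()

    # Build the first layer
    model.add(Dense(256, input_shape=noise_shape))
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization(momentum=0.8))

    # Many layers...

    # Output layer: one tanh unit per pixel, reshaped to the image shape
    model.add(Dense(int(np.prod(img_shape)), activation='tanh'))
    model.add(Reshape(img_shape))

    # Get result
    img = model(noise)

    return Model(noise, img)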

<img src='images/gen.png'>

#### Practical considerations :

Here are some practical considerations for building your first GAN :
- Usually the discriminator “wins”
- Usually D is bigger and deeper than G
- Sometimes train D more often than G
- Do not try to limit D to avoid making it “too smart”
- Use a Gaussian distribution for the input noise
- Use non-saturating activations; prefer LeakyReLU over ReLU and MaxPool, whose sparse gradients can hurt stability (see Section III)
- Use soft and noisy labels

#### A bit of Maths

The objective function in GANs is the following : 

<img src='images/eq2.png'>

What does it mean ?

- We train **D** to **maximize** the probability of assigning the correct label to both training examples and samples from G. 
- We simultaneously train **G** to **minimize** `log(1 − D(G(z)))`

In other words, D and G play a two-player **minimax** game with value function `V(G, D)`

<img src='images/eq3.png'>
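For reference, in case the equation images do not render, the minimax objective from the original GAN paper (https://arxiv.org/abs/1406.2661), where the value function is written $V(D, G)$, reads:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

The first term rewards D for assigning a high probability to real samples `x`; the second rewards D for assigning a low probability to generated samples `G(z)`, and it is the only term that G can influence.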

- Optimizing D to completion in the inner loop of training is computationally prohibitive, and on finite datasets would result in overfitting.
- Instead, we alternate between `k` steps of optimizing D and `one` step of optimizing G.

This results in D being maintained near its optimal solution, so long as G changes slowly enough.

<img src='images/process.png'>
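The alternating scheme (`k` discriminator steps, then one generator step) can be sketched on a toy 1D problem with plain NumPy. Everything specific here is an assumption made for illustration: the linear generator `G(z) = a * z + b`, the logistic discriminator `D(x) = sigmoid(w * x + c)`, the real data drawn from N(3, 0.5), the learning rate, and the hand-derived gradients. It is only meant to show the structure of the loop, not a training recipe, and it makes no claim about convergence.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_toy_gan(n_iterations=200, k=2, batch_size=64, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    w, c = 0.1, 0.0  # discriminator D(x) = sigmoid(w * x + c)
    a, b = 1.0, 0.0  # generator G(z) = a * z + b
    d_steps = g_steps = 0

    for _ in range(n_iterations):
        # k gradient steps on the discriminator
        for _ in range(k):
            x_real = rng.normal(3.0, 0.5, size=batch_size)
            x_fake = a * rng.normal(size=batch_size) + b
            d_real = sigmoid(w * x_real + c)
            d_fake = sigmoid(w * x_fake + c)
            # Gradients of -mean(log D(x)) - mean(log(1 - D(G(z)))) w.r.t. w and c
            grad_w = np.mean((d_real - 1.0) * x_real) + np.mean(d_fake * x_fake)
            grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
            w -= lr * grad_w
            c -= lr * grad_c
            d_steps += 1

        # One gradient step on the generator, with D held fixed
        z = rng.normal(size=batch_size)
        d_fake = sigmoid(w * (a * z + b) + c)
        # Gradients of mean(log(1 - D(G(z)))) w.r.t. a and b
        grad_a = np.mean(-d_fake * w * z)
        grad_b = np.mean(-d_fake * w)
        a -= lr * grad_a
        b -= lr * grad_b
        g_steps += 1

    return {"w": w, "c": c, "a": a, "b": b, "d_steps": d_steps, "g_steps": g_steps}

result = train_toy_gan()
print(result)
```

With `k=2`, the discriminator is updated twice as often as the generator, which is the mechanism that keeps D close to its optimum while G moves slowly.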

# III. How to train a GAN ?

A full list of useful recommendations can be found here : https://github.com/soumith/ganhacks

New papers are constantly coming out, but here's a general list of recommendations to train a GAN :
- Normalize the inputs
- In GAN papers, the loss function to optimize G is : min log(1 − D), but in practice we use : max log D
- Sample the input noise `z` from a Gaussian distribution rather than a uniform one
- Construct different mini-batches for real and fake, i.e. each mini-batch needs to contain only real images or only generated images
- Avoid sparse gradients such as ReLU and MaxPool. LeakyReLU is good for both the generator and the discriminator.
    - For Downsampling, use: Average Pooling, Conv2d + stride
    - For Upsampling, use: PixelShuffle, ConvTranspose2d + stride
- Soft and Noisy labels : If you have two target labels: Real=1 and Fake=0, then for each incoming sample, if it is real, replace the label with a random number between 0.7 and 1.2, and if it is a fake sample, replace it with a random number between 0.0 and 0.3 (for example).
- Use Stochastic Gradient Descent for the discriminator and ADAM for the generator
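Two of the recommendations above can be illustrated numerically. The sketch below first compares the size of the generator's gradient signal under `min log(1 − D)` and under `max log D` when the discriminator confidently rejects a fake sample, then draws soft, noisy labels in the ranges quoted above. The helper names are hypothetical, and the gradients are taken with respect to the discriminator's pre-sigmoid output, which is an assumption about how D is parameterized.

```python
import numpy as np

def generator_grad_magnitudes(d_fake):
    """Gradient magnitudes w.r.t. the discriminator logit s, where d_fake = sigmoid(s).

    d_fake is D(G(z)), the discriminator output on a generated sample.
    """
    saturating = d_fake            # |d/ds log(1 - sigmoid(s))| = sigmoid(s)
    non_saturating = 1.0 - d_fake  # |d/ds -log(sigmoid(s))| = 1 - sigmoid(s)
    return saturating, non_saturating

def soft_noisy_labels(n_real, n_fake, rng):
    """Real labels in [0.7, 1.2) and fake labels in [0.0, 0.3), as suggested above."""
    real_labels = rng.uniform(0.7, 1.2, size=n_real)
    fake_labels = rng.uniform(0.0, 0.3, size=n_fake)
    return real_labels, fake_labels

# Early in training, D easily spots fakes: D(G(z)) is close to 0
sat, non_sat = generator_grad_magnitudes(0.01)
print(sat, non_sat)

# Labels for one all-real mini-batch and one all-fake mini-batch
rng = np.random.default_rng(0)
real_labels, fake_labels = soft_noisy_labels(4, 4, rng)
print(real_labels, fake_labels)
```

When `D(G(z))` is close to 0, the `log(1 − D)` form gives the generator almost no gradient while the `log D` form gives a strong one, which is the usual motivation for the switch.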


# IV. Types and variations of GANs

As you might guess, since GANs are really recent, there's an expanding and active literature on this topic. What we covered so far is only the basics of GANs. In this section, we'll cover some extensions and twists of GANs, what they bring and what they can be used for. There are many, many, many types of GANs, including :
- Auxiliary Classifier Generative Adversarial Network
- Adversarial Autoencoder
- Bidirectional GAN
- Boundary-Seeking GAN
- Conditional GAN
- Context-Conditional GAN
- CycleGAN
- Deep Convolutional GAN
- Semi-Supervised GAN
- ...


We obviously won't cover all of them, but here are the major improvements to classical GANs.

## 1. StarGAN and InfoGAN

https://arxiv.org/pdf/1711.09020.pdf

> StarGANs give non-random inputs to a GAN (i.e. an image), and learn multiple mappings at a time to avoid overfitting.

A related idea is **InfoGAN**, which feeds a structured latent code alongside the noise and maximizes the mutual information between that code and the generated output.

The official StarGAN implementation is in PyTorch. Pre-trained models exist in the context of face features generation (Blond, Brunette, but also emotions like happiness or sadness).

<img src='images/star.png'>

## 2. Deep Convolutional GANs

https://arxiv.org/pdf/1511.06434.pdf

> DCGANs improve on the original GANs by using convolutional networks (CNNs) instead of relying only on Dense layers.

They are more stable and generate higher quality images, since they use CNNs. In DCGAN, **batch normalization** is done in **both networks**, i.e the generator network and the discriminator network. They can be used for style transfer. For example, you can use a dataset of handbags to generate shoes in the same style as the handbags.

<img src='images/bags.png'>
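DCGAN generators upsample with strided transposed convolutions, starting from a small feature map (4x4 in the DCGAN paper) and doubling it until the image size (64x64) is reached. The arithmetic can be checked with the standard output-size formula for a transposed convolution. The kernel size of 4, stride of 2 and padding of 1 below are a common choice assumed for illustration, and the helper name is hypothetical.

```python
def conv_transpose_output_size(size, kernel=4, stride=2, padding=1):
    """Output side length of a transposed convolution (no output padding, no dilation)."""
    return (size - 1) * stride - 2 * padding + kernel

# Start from a 4x4 feature map and apply four upsampling layers
sizes = [4]
for _ in range(4):
    sizes.append(conv_transpose_output_size(sizes[-1]))

print(sizes)
```

With these settings each layer exactly doubles the spatial size, so four layers take a 4x4 map to a 64x64 image.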

## 3. Conditional GANs (pix2pix)

https://arxiv.org/pdf/1411.1784.pdf

> Generative adversarial nets can be extended to a conditional model if both the generator and discriminator are conditioned on some extra information y, for example a label.

These GANs use the extra label information to produce better quality images and to control how the generated images will look.

<img src='images/cgans.png'>
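One simple way to condition the generator on a label `y` is to concatenate its one-hot encoding to the noise vector before feeding it to the network. The sketch below shows only that input construction with NumPy; the 100-dimensional noise, the 10 classes and the helper names are assumptions for illustration, and the discriminator would receive the label in a similar way alongside the image.

```python
import numpy as np

def one_hot(labels, n_classes):
    """One-hot encode an array of integer class labels."""
    encoded = np.zeros((len(labels), n_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

def conditioned_generator_input(noise, labels, n_classes):
    """Concatenate each noise vector with the one-hot encoding of its label y."""
    return np.concatenate([noise, one_hot(labels, n_classes)], axis=1)

rng = np.random.default_rng(0)
noise = rng.normal(size=(3, 100))   # three noise vectors z
labels = np.array([0, 7, 2])        # the class we want each sample to belong to
g_input = conditioned_generator_input(noise, labels, n_classes=10)
print(g_input.shape)
```

At generation time, choosing `labels` is what lets you control which class of image the generator produces.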

## 4. StackGAN

https://arxiv.org/pdf/1612.03242.pdf
    
StackGANs synthesize high-quality images from text descriptions. The authors propose Stacked Generative Adversarial Networks (StackGAN) to generate 256x256 photo-realistic images conditioned on text descriptions.

<img src='images/sgans.png'>

## 5. Least Square GAN (LSGAN)

https://arxiv.org/pdf/1611.04076.pdf

Regular GANs may lead to the vanishing gradients problem, which therefore slows the learning process.

LSGAN attempts to overcome this problem by adopting the least squares loss function instead of the sigmoid cross entropy loss for the discriminator.
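As a minimal sketch of what the least squares losses look like, the functions below follow the form used in the LSGAN paper, with target values `a` for fake data, `b` for real data, and `c` for what the generator wants D to output on fake data. The choice `a=0`, `b=1`, `c=1` and the example discriminator outputs are assumptions for illustration.

```python
import numpy as np

def lsgan_discriminator_loss(d_real, d_fake, a=0.0, b=1.0):
    """Push D(x) toward b on real data and D(G(z)) toward a on fake data."""
    return 0.5 * np.mean((d_real - b) ** 2) + 0.5 * np.mean((d_fake - a) ** 2)

def lsgan_generator_loss(d_fake, c=1.0):
    """Push D(G(z)) toward c, the value G wants D to believe."""
    return 0.5 * np.mean((d_fake - c) ** 2)

# Example discriminator outputs (assumed values)
d_real = np.array([0.9, 0.8])
d_fake = np.array([0.1, 0.3])

d_loss = lsgan_discriminator_loss(d_real, d_fake)
g_loss = lsgan_generator_loss(d_fake)
print(d_loss, g_loss)
```

Unlike the sigmoid cross entropy, the squared error keeps penalizing generated samples in proportion to their distance from the target value, so the generator still receives a gradient when its samples are classified with high confidence.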

## 6. Auxiliary Classifier GAN (ACGAN)

https://arxiv.org/pdf/1610.09585.pdf

In ACGANs, every generated sample has a corresponding class label C in addition to the noise Z.

G uses both to generate images : $X=G(C,Z)$. The discriminator gives both a probability distribution over sources and a probability distribution over the class labels.

The authors demonstrate that adding more structure to the GAN latent space along with a specialized cost function results in higher quality samples.
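As a reminder of what that cost function looks like, the ACGAN paper splits the objective into a source term $L_S$ (real vs. fake) and a class term $L_C$ (correct class label), with $X_{fake} = G(C, Z)$:

$$L_S = \mathbb{E}\left[\log P(S = real \mid X_{real})\right] + \mathbb{E}\left[\log P(S = fake \mid X_{fake})\right]$$

$$L_C = \mathbb{E}\left[\log P(C = c \mid X_{real})\right] + \mathbb{E}\left[\log P(C = c \mid X_{fake})\right]$$

The discriminator is trained to maximize $L_S + L_C$, while the generator is trained to maximize $L_C - L_S$.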