## Generative Adversarial Networks (GANs)

* Generative Adversarial Nets (2014). Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio.  
https://papers.nips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf  
https://www.youtube.com/watch?v=eyxmSmjmNS0

* A generative model with two parts. The **Generator** creates images and the **Discriminator** classifies each image as real or fake. The goal is for the Generator to produce images realistic enough that the Discriminator cannot tell them from real ones.

* Generative models learn the joint distribution P(x,y); discriminative models learn the conditional distribution P(y|x).
    - We can sample new data from generative models.

* A zero-sum game between the Generator and the Discriminator.

![](GAN.png)

#### Generator

* It takes random noise drawn from some probability distribution as input and tries to generate a realistic output image.

* $G(z,\theta_g)$, with $z \sim{N(0,1)} \text{ or } z \sim{U(-1,1)}$: the noise is sampled from a Normal or Uniform distribution.
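
To make the noise input concrete, here is a minimal sketch of drawing a mini-batch of latent vectors from either distribution. The function name `sample_latent`, the batch size, and the latent dimension are illustrative assumptions, not taken from the paper.

```python
import random

def sample_latent(batch_size, dim, dist="normal", rng=None):
    """Return batch_size latent vectors z, each a list of length dim."""
    rng = rng or random.Random()
    if dist == "normal":
        draw = lambda: rng.gauss(0.0, 1.0)      # z ~ N(0, 1)
    elif dist == "uniform":
        draw = lambda: rng.uniform(-1.0, 1.0)   # z ~ U(-1, 1)
    else:
        raise ValueError(f"unknown distribution: {dist}")
    return [[draw() for _ in range(dim)] for _ in range(batch_size)]

# Hypothetical sizes: 4 latent vectors of dimension 3.
z_batch = sample_latent(batch_size=4, dim=3, dist="uniform", rng=random.Random(0))
```

Each row of `z_batch` would then be passed through the Generator to produce one fake sample.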


#### Discriminator

* It receives two kinds of input, alternately: real images from the training dataset and fake samples produced by the Generator. 
* It classifies the input image as real or fake (i.e. it comes from the Generator).

* $D(x,\theta_d)$  
* Input: a real sample $x \sim{p_{data}(x)}$ or a generated sample $G(z)$ with $z \sim{p_z(z)}$  
* Output: the probability that the input is real (target 1 = real, 0 = fake)
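
As a rough illustration of the interface only, the sketch below uses logistic regression as a stand-in Discriminator. A real Discriminator is a neural network; the parameter values and the 0.5 decision threshold here are assumptions chosen for the example.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def discriminator(x, theta_d):
    """Toy D(x, theta_d): logistic regression over a feature vector x."""
    weights, bias = theta_d
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(score)  # probability that x is real, strictly between 0 and 1

theta_d = ([0.5, -0.25], 0.1)                  # hypothetical parameters
p_real = discriminator([1.0, 2.0], theta_d)    # score = 0.5 - 0.5 + 0.1 = 0.1
label = "real" if p_real >= 0.5 else "fake"    # threshold the probability
```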



### Loss Functions

#### Discriminator Loss Function

* The Discriminator wants to maximize the following objective (equivalently, to minimize its negative, which is the binary cross-entropy loss).

$$J^{(D)} = \mathbb{E}_{x\sim{p_{data}}}\log(D(x)) + \mathbb{E}_{z\sim{p_z}}\log(1 - D(G(z)))$$


* $\mathbb{E}_{x\sim{p_{data}}}\log(D(x))$ is the term for inputs sampled from the real data. It is largest when $D(x)$ is close to 1.

* $\mathbb{E}_{z\sim{p_z}}\log(1 - D(G(z)))$ is the term for inputs sampled from the Generator. It is largest when $D(G(z))$ is close to 0.
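
A small numeric sketch of the two terms, estimated on a mini-batch. The Discriminator outputs below are made-up numbers, not results. With the sign convention of the formula above, a Discriminator that separates real from fake well scores close to 0, while one that always outputs 0.5 scores $-\log 4$.

```python
import math

def discriminator_objective(d_real, d_fake):
    """Mini-batch estimate of J^(D).

    d_real: values of D(x) on real samples.
    d_fake: values of D(G(z)) on generated samples.
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical outputs: a confident Discriminator vs. one that cannot decide.
confident = discriminator_objective([0.9, 0.8], [0.1, 0.2])
confused = discriminator_objective([0.5, 0.5], [0.5, 0.5])
```

Here `confident` is larger than `confused`, which is why the Discriminator update moves in the direction that increases this quantity.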


#### Generator Loss Function

* The Generator wants to minimize its loss, which means pushing $D(G(z))$ toward 1.

$$J^{(G)} = \mathbb{E}_{z\sim{p_z}}\log(1 - D(G(z)))$$
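
The same kind of mini-batch estimate for the Generator, again with made-up Discriminator outputs. It shows that the loss falls as the Discriminator is fooled more often.

```python
import math

def generator_loss(d_fake):
    """Mini-batch estimate of J^(G) = E[log(1 - D(G(z)))]."""
    return sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)

# Hypothetical values of D(G(z)) for two situations.
caught = generator_loss([0.05, 0.10])   # D is confident the samples are fake
fooled = generator_loss([0.90, 0.95])   # D believes the samples are real
```

`fooled` is lower than `caught`, so minimizing this loss rewards the Generator for samples the Discriminator rates as real.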

#### Combining the Generator and Discriminator Loss Functions

$$\underset{G}{\mathrm{min}}\text{ }\underset{D}{\mathrm{max}}V(G,D) = \mathbb{E}_{x\sim{p_{data}}}\log(D(x)) + \mathbb{E}_{z\sim{p_z}}\log(1 - D(G(z)))$$
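
As a worked step, following the standard analysis in the cited paper: for a fixed Generator, write $p_g$ for the distribution of $G(z)$. The value function can then be written as an integral over $x$:

$$V(G,D) = \int_x \left[ p_{data}(x)\log(D(x)) + p_g(x)\log(1 - D(x)) \right] dx$$

The integrand has the form $a\log(y) + b\log(1-y)$, which is maximized at $y = \frac{a}{a+b}$, so the optimal Discriminator is

$$D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$$

If the Generator matches the data exactly ($p_g = p_{data}$), then $D^*(x) = \frac{1}{2}$ everywhere and $V(G,D^*) = \log\frac{1}{2} + \log\frac{1}{2} = -\log 4$.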

### Training

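* These networks are hard to train because the Generator and the Discriminator have opposing objectives.

* The Generator and the Discriminator are trained separately: while one network is updated, the weights of the other are held fixed.

* They are trained sequentially (i.e. one after the other), alternating between the two over multiple epochs.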

#### Training Loop

* 1. Repeat for k steps, where k is a hyperparameter (the original paper uses k = 1):  
    - Sample a mini-batch of m noise samples $(z^{(1)},z^{(2)},...,z^{(m)})$ and transform them with the Generator
    - Sample a mini-batch of m samples from the real data, $(x^{(1)},x^{(2)},...,x^{(m)})$
    - Update the discriminator weights $\theta_d$ by **ascending** the stochastic gradient of its objective:
$$\nabla_{\theta_d}\frac{1}{m}\sum_{i=1}^m[\log(D(x^{(i)})) + \log(1 - D(G(z^{(i)})))]$$
    - The generator weights $\theta_g$ are locked during this step; only the discriminator weights $\theta_d$ are updated.
    
* 2. Sample a mini-batch of m noise samples $(z^{(1)},z^{(2)},...,z^{(m)})$ and transform them with the Generator
* 3. Update the generator weights $\theta_g$ by **descending** the stochastic gradient of its loss:
$$\nabla_{\theta_g}\frac{1}{m}\sum_{i=1}^m[\log(1 - D(G(z^{(i)})))]$$
    - The discriminator weights $\theta_d$ are locked during this step; only the generator weights $\theta_g$ are adjusted.
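
The loop above can be sketched end to end on a deliberately tiny problem. Everything specific here is an assumption made for illustration: the data are 1-D samples from N(4, 1), the Generator is the linear map `a*z + b`, the Discriminator is a logistic unit `sigmoid(w*x + c)`, the gradients are derived by hand, and the updates are plain SGD with a made-up learning rate. It is not the networks or optimizer from the paper.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_toy_gan(steps=200, m=32, k=1, lr=0.05, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # theta_g: G(z) = a*z + b
    w, c = 0.1, 0.0   # theta_d: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        # Step 1: k discriminator updates; theta_g is held fixed.
        for _ in range(k):
            fake = [a * rng.gauss(0.0, 1.0) + b for _ in range(m)]
            real = [rng.gauss(4.0, 1.0) for _ in range(m)]
            grad_w = grad_c = 0.0
            for x in real:               # gradient of log D(x)
                d = sigmoid(w * x + c)
                grad_w += (1.0 - d) * x
                grad_c += (1.0 - d)
            for g in fake:               # gradient of log(1 - D(G(z)))
                d = sigmoid(w * g + c)
                grad_w -= d * g
                grad_c -= d
            w += lr * grad_w / m         # gradient ascent
            c += lr * grad_c / m
        # Steps 2-3: one generator update; theta_d is held fixed.
        grad_a = grad_b = 0.0
        for _ in range(m):
            z = rng.gauss(0.0, 1.0)
            d = sigmoid(w * (a * z + b) + c)
            grad_a += -d * w * z         # chain rule through G(z) = a*z + b
            grad_b += -d * w
        a -= lr * grad_a / m             # gradient descent
        b -= lr * grad_b / m
    return (a, b), (w, c)

theta_g, theta_d = train_toy_gan()
```

In this toy setup the Generator offset `b` starts at 0 and is pushed toward the region the Discriminator rates as real. Real GAN training uses deep networks and automatic differentiation, but the alternation between the two updates has the same shape.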

#### Training Tricks 


* Training GANs is notoriously difficult. Below are a few tricks (i.e. heuristics) to try:

* Use tanh as the last activation in the generator, instead of the sigmoid.
* Sample points from the latent space using a Gaussian distribution, not a uniform one.
* Introduce randomness in two ways:
    - Use dropout in the discriminator.
    - Add some random noise to the labels for the discriminator.
* Use LeakyReLU instead of a ReLU activation to ease sparsity constraints by allowing small negative activation values.
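
Three of these tricks are small enough to sketch directly. The negative slope of 0.2 and the label-noise range of 0.05 are assumed values for illustration, and the tanh comment assumes the training images are scaled to [-1, 1].

```python
import math
import random

def leaky_relu(x, negative_slope=0.2):
    """Pass positive values through; scale negative values by a small slope."""
    return x if x >= 0 else negative_slope * x

def generator_output(pre_activation):
    """tanh maps the last layer into (-1, 1); assumes images scaled to [-1, 1]."""
    return math.tanh(pre_activation)

def noisy_labels(labels, noise=0.05, rng=None):
    """Perturb hard 0/1 labels by at most `noise`, clipped to stay in [0, 1]."""
    rng = rng or random.Random()
    return [min(1.0, max(0.0, y + rng.uniform(-noise, noise))) for y in labels]

# Hypothetical discriminator targets: two real samples, two fake samples.
targets = noisy_labels([1.0, 1.0, 0.0, 0.0], rng=random.Random(0))
```

Unlike ReLU, `leaky_relu` returns a small non-zero value for negative inputs, so those units still pass a gradient.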


### References

Vasilev, Slater, Spacagna, Roelants, Zocca (2019). Python Deep Learning, 2nd Edition.