# GANs: Generative Adversarial Networks

## Generative models: 

The most popular types are:
- **VAE**: Variational autoencoders [https://arxiv.org/abs/1804.00891]:
    - Work with two models, which are typically neural networks.
        - **Encoder**: Takes realistic images as input and represents each one as a vector in a hyperspherical latent space.
        - **Decoder**: Reconstructs the image that the encoder saw from the vector in the latent space.
    - After training, we lop off the encoder and generate new images by picking random points in the latent space and passing them to the decoder.
    - The variational part injects some noise into the model and the training process. Instead of encoding the image as a single point in the latent space, the encoder encodes it as a whole distribution, and a point sampled from that distribution is fed into the decoder to produce a realistic image. Because different points can be sampled from the same distribution, this adds a little noise.
    
    
- **GANs**: There are 2 models behind a GAN:
    - **Generator** (*art forger*): similar to the VAE decoder, but this time there is no guiding encoder that determines what the noise vector fed into the generator should look like. Instead, there's a discriminator.
        - Given noise and a class Y, its goal is to generate a set of features X that look realistic.
        - Learns to make fakes.
        - Isn't allowed to see real images.
        - Noise (random values) is used to make sure that what's generated is not the same each time.
        - Tries to capture *P(X|Y)*.

    - **Discriminator** (*art inspector*): looks at both fake and real images and tries to figure out which ones are real and which ones are fake.
        - Learns to distinguish real from fake.
        - It sees both real and generated images, but is not told in advance which is which.
        - It is a **classifier**, distinguishing between classes (real and fake). It models *P(Y|X)*.
    
    - They "fight" against each other, which pushes the generator to become so good that it produces realistic images. They learn from each other until they reach a point where we don't need the discriminator anymore, and the generator can take in any random noise and produce a realistic image.
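
The variational sampling step described for VAEs above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a Gaussian latent distribution (the common textbook choice, not the hyperspherical one from the linked paper), and `sample_latent`, `mu` and `log_var` are hypothetical names standing in for an encoder's output.

```python
import math
import random

def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

rng = random.Random(0)

# Hypothetical encoder output for one image: a mean and a log-variance per dimension.
mu = [0.5, -1.0]
log_var = [-2.0, -2.0]

# Two samples from the same distribution give two slightly different latent points,
# so the decoder would see slightly different inputs for the same image.
z1 = sample_latent(mu, log_var, rng)
z2 = sample_latent(mu, log_var, rng)
```

Each call returns a different point around `mu`, which is the "little bit of noise" that the variational part adds.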
    
<img src="images/model.png">

### Intuition Behind GANs:

1. We train the discriminator using real artwork. After it decides (real or fake), we tell it the truth.
2. When the generator starts creating, it learns in which direction to improve by looking at the scores the discriminator assigns to its work.
3. The discriminator also improves over time because it receives more and more realistic images from the generator at each round, until the generated images become too good to distinguish from real ones.
4. When we are happy with the result of this generator, the process ends. 

But how does the generator improve over time?

<img src="images/gan_seq.jpg">

The generated image $\hat{X}$ is fed into the discriminator, which outputs its prediction $\hat{Y}_d$ (how real or fake it thinks the generated image is). From these predictions we can compute a **cost function**, which measures how far the examples produced by the generator are from being considered real by the discriminator:

- The generator wants $\hat{Y}_d = 1$, as real as possible.
- The discriminator is trying to get $\hat{Y}_d = 0$, fake.

The gap between the discriminator's predictions and the generator's target (all ones) is used to update the parameters of the generator: the gradient of the cost tells it which direction to move its parameters to generate something that looks more real (and that will fool the discriminator). Once we are satisfied with the results, we can save the generator's parameters $\theta$, freezing them so that we can load them back up and sample new images.
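
As a rough illustration of this update, the sketch below performs a single gradient step on a toy one-dimensional generator while the discriminator is held fixed. Everything here is an assumption made for the example: the linear generator `a * z + b`, the logistic discriminator with fixed `w` and `c`, the learning rate, and all function names.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def generator(z, theta):
    """Toy generator: maps a noise value z to a 1-D 'image' x_hat."""
    a, b = theta
    return a * z + b

def discriminator(x, w=1.0, c=-2.0):
    """Toy frozen discriminator: probability that x is real."""
    return sigmoid(w * x + c)

def generator_loss_and_grad(noise, theta, w=1.0, c=-2.0):
    """BCE of the discriminator's predictions on fakes against the target label 1."""
    m = len(noise)
    loss, grad_a, grad_b = 0.0, 0.0, 0.0
    for z in noise:
        x_hat = generator(z, theta)
        y_hat = discriminator(x_hat, w, c)
        loss += -math.log(y_hat) / m
        # Chain rule through the discriminator: d(-log y_hat)/dx_hat = -(1 - y_hat) * w
        dloss_dx = -(1.0 - y_hat) * w
        grad_a += dloss_dx * z / m
        grad_b += dloss_dx / m
    return loss, (grad_a, grad_b)

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(64)]
theta = (1.0, 0.0)   # generator parameters (a, b)
lr = 0.5

loss_before, (ga, gb) = generator_loss_and_grad(noise, theta)
theta = (theta[0] - lr * ga, theta[1] - lr * gb)   # one gradient descent step
loss_after, _ = generator_loss_and_grad(noise, theta)
```

After the step, the same noise produces samples that the fixed discriminator scores as more real, so `loss_after` is lower than `loss_before`. Only the generator's parameters change here.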

The generator will try to approximate the real distribution seen in training. For example, if we are generating cats, the breeds that are most common in the training data are the most likely to be generated.

### Cost function

We will use the Binary Cross Entropy (BCE) cost function, as it is designed for classification tasks with two classes (here, real and fake):

$$
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h(x^{(i)},\theta)+(1-y^{(i)})\log\left(1-h(x^{(i)},\theta)\right)\right]
$$

Where:
- $m$ is the number of examples in the batch. We sum the loss over all examples and average it.
- $h$ denotes the predictions made by the model.
- $y$ is the true label of each example.
- $x$ are the features passed into the model to make the prediction.
- $\theta$ are the parameters of the model.
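
A minimal sketch of this cost in plain Python follows. The function name `bce_cost` and the clipping constant `eps` are assumptions added for the example; the clipping is there only to avoid evaluating `log(0)`.

```python
import math

def bce_cost(y_true, y_pred, eps=1e-12):
    """Binary cross entropy averaged over a batch of m examples."""
    m = len(y_true)
    total = 0.0
    for y, h in zip(y_true, y_pred):
        h = min(max(h, eps), 1.0 - eps)   # keep predictions strictly inside (0, 1)
        total += y * math.log(h) + (1 - y) * math.log(1.0 - h)
    return -total / m

labels = [1, 0, 1, 0]                               # 1 = real, 0 = fake
good = bce_cost(labels, [0.9, 0.1, 0.8, 0.2])       # predictions close to the labels
bad = bce_cost(labels, [0.1, 0.9, 0.2, 0.8])        # predictions far from the labels
```

Here `good` is much smaller than `bad`: predictions that agree with the labels are cheap, and confident wrong predictions are expensive.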

Looking at the left-hand term of the sum:

<img src="images/cost_left.png">

And looking at the right-hand term:

<img src="images/cost_right.png">

The predictions lie between 0 and 1, so both logarithms are negative or zero. The negative sign therefore ensures that $J \geq 0$: we want a high value to be bad, so that we can minimize it as we learn.
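
As a quick worked example, take a single example ($m = 1$) and assume natural logarithms. Only one of the two terms is active, depending on the label:

$$
\begin{aligned}
y = 1:&\quad J = -\log h, &\quad h = 0.9 \Rightarrow J \approx 0.105, &\quad h = 0.1 \Rightarrow J \approx 2.303 \\
y = 0:&\quad J = -\log(1-h), &\quad h = 0.1 \Rightarrow J \approx 0.105, &\quad h = 0.9 \Rightarrow J \approx 2.303
\end{aligned}
$$

In both cases the cost is close to 0 when the prediction agrees with the label and grows without bound as the prediction approaches the wrong label.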

### Training

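- Only one model is trained at a time, while the other one is held constant.
- Both models should improve together and should be kept at similar skill levels from the beginning of training. If the discriminator becomes much better than the generator, it labels every fake as fake with high confidence and the generator gets little useful feedback on how to improve.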
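
The alternation described above can be sketched with a toy one-dimensional GAN written in plain Python with hand-derived gradients. This is an illustration only, not a practical implementation: the linear generator, the logistic discriminator, the real data distribution `N(4, 1)`, the learning rate and all function names are assumptions chosen for the example.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def discriminator_step(d_params, g_params, real, noise, lr):
    """Update only the discriminator; the generator parameters are read, not changed."""
    w, c = d_params
    a, b = g_params
    fake = [a * z + b for z in noise]
    examples = [(x, 1.0) for x in real] + [(x, 0.0) for x in fake]
    grad_w = grad_c = 0.0
    for x, y in examples:
        err = sigmoid(w * x + c) - y   # derivative of BCE w.r.t. the pre-sigmoid score
        grad_w += err * x / len(examples)
        grad_c += err / len(examples)
    return (w - lr * grad_w, c - lr * grad_c)

def generator_step(g_params, d_params, noise, lr):
    """Update only the generator; the discriminator parameters are read, not changed."""
    w, c = d_params
    a, b = g_params
    grad_a = grad_b = 0.0
    for z in noise:
        y_hat = sigmoid(w * (a * z + b) + c)
        dloss_dx = -(1.0 - y_hat) * w  # BCE against target 1, chained through the discriminator
        grad_a += dloss_dx * z / len(noise)
        grad_b += dloss_dx / len(noise)
    return (a - lr * grad_a, b - lr * grad_b)

rng = random.Random(0)
d_params = (0.0, 0.0)   # discriminator (w, c)
g_params = (1.0, 0.0)   # generator (a, b)

for _ in range(30):
    real = [rng.gauss(4.0, 1.0) for _ in range(64)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(64)]
    d_params = discriminator_step(d_params, g_params, real, noise, lr=0.1)
    noise = [rng.gauss(0.0, 1.0) for _ in range(64)]
    g_params = generator_step(g_params, d_params, noise, lr=0.1)
```

Each round performs one discriminator update with the generator held constant, then one generator update with the discriminator held constant. Over these rounds the generator's offset `b` drifts away from 0 toward the real data.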

#### Discriminator:

<img src="images/discriminator.png">


#### Generator:

<img src="images/generator.png">