# Homework: Cycle Generative Adversarial Network (CycleGAN)

## Generative Adversarial Networks (GANs)
Introduction of GANs
- Applications
- Introduce generator and discriminator
- Adversarial/GAN loss

Generative models have seen a huge growth in popularity in recent years due to their unique properties. Their uses include applications that directly require generation, creating more training data for machine learning, and data augmentation motivated by privacy concerns. A popular family of generative models is the Generative Adversarial Network (GAN), which has been used for generating realistic photographs, image style transfer, face ageing in pictures and [many more impressive applications](https://machinelearningmastery.com/impressive-applications-of-generative-adversarial-networks/).

GANs are constructed by training two models, a generator and a discriminator, that compete against each other. The generator tries to learn the distribution of a given dataset and is used to generate fake samples following this distribution. The discriminator tries to determine whether a sample comes from the "fake" distribution or the true distribution. The generator and discriminator are therefore adversaries: the generator tries to fool the discriminator, while the discriminator tries not to be fooled. The [original GAN paper](https://arxiv.org/pdf/1406.2661.pdf) illustrates their roles with an analogy, where the generative model is a team of counterfeiters trying to produce fake currency and use it without detection, and the discriminator is the police trying to detect the counterfeit currency. The competition between the two encourages both to improve their methods until the counterfeit currency is indistinguishable from real currency.

We train the discriminator to maximize the probability of assigning the correct labels to both real samples and generated samples, and simultaneously train the generator to fool the discriminator. In other words, the generator and the discriminator play a minimax game with the value function $V(G, D)$:

$$\underset{G}{\min}\,\underset{D}{\max}\, V(G, D) = \mathbb{E}_{x\sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z\sim p_z(z)}[\log(1 - D(G(z)))]$$
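
As a rough illustration of the value function, the sketch below estimates $V(G, D)$ by averaging over a small batch of discriminator outputs. The function name `gan_value`, the `eps` guard and the example numbers are assumptions made for this illustration; they are not part of the homework code.

```python
import math

def gan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of V(G, D) from discriminator outputs.

    d_real: values D(x) for samples x drawn from the real data.
    d_fake: values D(G(z)) for generated samples.
    eps:    small constant (an assumption) to avoid log(0).
    """
    real_term = sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
v_confused = gan_value([0.5, 0.5], [0.5, 0.5])

# A confident and correct discriminator pushes V(G, D) up towards 0.
v_confident = gan_value([0.99, 0.98], [0.01, 0.02])

print(v_confused, v_confident)
```

The discriminator wants this number to be large (close to 0), while the generator wants it to be small, which is the minimax structure of the equation above.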

### Dataset
The generator and discriminator are both neural networks whose architecture depends greatly on the task at hand. Images are the most popular domain for GANs, so convolutional layers are primarily used. Due

- Introduction to Points dataset

In [1]:
# Code cell with imports

In [2]:
# Code cell where generator and discriminator are constructed

### GAN training

In [3]:
# Code cell with training loop. Complete blank parts

Mode collapse
- What is it
- Why does it happen
  - KL divergence

### What is mode collapse

A GAN is successfully trained when it reaches two goals:
- The generator can reliably generate samples that fool the discriminator
- The generator can generate samples that are as diverse as the real-world data distribution

Mode collapse means that the model fails the second goal and generates similar or even identical samples. For example, with the MNIST dataset we expect our model to generate digits from 0 to 9. If the results contain only 0 and miss the other digits, we say that the model suffers from mode collapse.
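
One simple way to make this concrete is to count how many of the expected modes appear among the generated samples. The sketch below assumes that each generated sample has already been assigned a class label (for MNIST, for instance by a separately trained digit classifier); the helper `mode_coverage` and the example label lists are hypothetical.

```python
from collections import Counter

def mode_coverage(labels, num_modes):
    """Fraction of the modes 0..num_modes-1 that appear at least once in labels."""
    counts = Counter(labels)
    covered = sum(1 for mode in range(num_modes) if counts[mode] > 0)
    return covered / num_modes

# Hypothetical labels of 100 generated digits.
collapsed = [0] * 100                    # the generator only produces zeros
healthy = [i % 10 for i in range(100)]   # every digit appears

print(mode_coverage(collapsed, 10))
print(mode_coverage(healthy, 10))
```

A coverage well below 1 suggests that some modes are missing, although a full coverage alone does not show that the modes appear in the right proportions.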

### Why does it happen

#### What is KL divergence

To measure the difference between two distributions over the same variable $x$, we can use a measure called the Kullback-Leibler (KL) divergence. Let $p(x)$ and $q(x)$ be two probability distributions of a continuous random variable $x$. The KL divergence is defined by the following equation:

$$D_{KL}(p(x)||q(x)) = \int_{-\infty}^\infty p(x)\ln\frac{p(x)}{q(x)}\,dx$$
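
For distributions over a finite set of outcomes, the integral becomes a sum, which is easy to compute directly. The following is a minimal sketch under that assumption; the function name `kl_divergence` and the two example distributions are made up for illustration.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions given as lists of probabilities."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue          # convention: 0 * ln(0 / q) = 0
        if q_i == 0.0:
            return math.inf   # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]

kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)
print(kl_pq, kl_qp)
```

The two printed values differ, which shows that the KL divergence is not symmetric: $D_{KL}(p||q)$ and $D_{KL}(q||p)$ penalize different kinds of mismatch.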

#### KL divergence in GAN
For a GAN, we can use the KL divergence to measure the similarity between the real data distribution $p_r(x)$ and the generated sample distribution $p_g(x)$:

$$D_{KL}(p_g(x)||p_r(x)) = \int_{x} p_g(x)\ln\frac{p_g(x)}{p_r(x)}\,dx$$

The generator can go wrong in two ways:
- It generates unrealistic samples. For these samples, $p_g(x) > 0$ and $p_r(x)\approx 0$, so their contribution to the KL divergence is very large and approaches $\infty$
- It fails to generate some real samples. For these samples, $p_r(x) > 0$ and $p_g(x) \approx 0$, so their contribution to the KL divergence is nearly 0
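
The asymmetry between these two situations can be checked numerically by evaluating the integrand $p_g(x)\ln\frac{p_g(x)}{p_r(x)}$ at a single point. The helper `kl_term` and the probability values below are illustrative assumptions.

```python
import math

def kl_term(p_g, p_r):
    """Contribution p_g * ln(p_g / p_r) of one point x to D_KL(p_g || p_r)."""
    if p_g == 0.0:
        return 0.0  # convention: 0 * ln(0 / p_r) = 0
    return p_g * math.log(p_g / p_r)

# Situation 1: the generator puts mass on a sample that is almost never real.
unreal = kl_term(p_g=0.5, p_r=1e-9)

# Situation 2: the generator almost never produces a sample that is real.
missing = kl_term(p_g=1e-9, p_r=0.5)

print(unreal)   # large positive penalty
print(missing)  # magnitude close to zero
```

The first case is penalized heavily, while the second contributes almost nothing, so dropping real modes is cheap under this divergence.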

A GAN tries to make the generated sample distribution similar to the real data distribution, so it needs a small KL divergence. The first situation leads to a huge KL divergence, so the generator avoids generating unrealistic samples. The second situation is barely penalized, so the generator has little incentive to cover the remaining real samples and may end up generating samples that are similar or identical. In short, the generator may find that generating a few similar samples is enough to fool the discriminator at low risk, which causes mode collapse.



#TODO   
explain that mode collapse happens in our model

## CycleGAN

- Unpaired datasets
- Explain the introduction of cycle consistency loss with images
    - How does it combat mode collapse? 

### Introduction

Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. To remove the constraint of expensive paired data collection, CycleGAN uses an algorithm that works with unpaired datasets; its goal is to translate an image from collection X into another collection Y (e.g. from horse to zebra). Besides the adversarial loss that a vanilla GAN uses, it also introduces a cycle consistency loss to overcome mode collapse. We introduce the key ideas of this model in the following parts.


### Unpaired dataset
For tasks like image-to-image translation, translation systems have traditionally been trained in a supervised setting, which means that they use datasets of image pairs. However, obtaining paired data is difficult and expensive. As such, there is a desire for techniques for training an image-to-image translation system that do not require paired examples. CycleGAN provides an algorithm for this problem: it can learn to translate between domains without paired input-output examples, converting a collection of images in domain X (e.g. real photos) into a different collection of images in domain Y (e.g. Monet-style paintings). Here, a domain refers to the general characteristics shared by the images in a collection, rather than to a single image.
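
For reference, the cycle consistency idea can be written as a loss. The notation here is an assumption that follows the CycleGAN paper rather than anything defined earlier in this document: $G: X \to Y$ and $F: Y \to X$ are the two generators, and $\|\cdot\|_1$ is the L1 norm.

$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x\sim p_{data}(x)}\left[\|F(G(x)) - x\|_1\right] + \mathbb{E}_{y\sim p_{data}(y)}\left[\|G(F(y)) - y\|_1\right]$$

The first term asks that translating $x$ into domain $Y$ and back again recovers $x$, and the second term asks the same for $y$. In training, this loss is added to the adversarial losses of the two generator and discriminator pairs, usually scaled by a weighting factor.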

In [4]:
# Code cell where two generators and two discriminators are constructed

In [5]:
# Code cell with training loop. Implement cycle consistency loss

- Limitations of CycleGAN
    - Geometric translations (e.g. cat to dog)