# GAN (Generative Adversarial Networks)

- A GAN consists of two neural networks
    - Discriminator and Generator
- The __Generator__ is trained to generate data
- The __Discriminator__ is trained to distinguish
    - whether its input is a real image or was produced by the generator

- Generative model vs Discriminative model
    - These are the two types of probabilistic models in ML
- e.g. Multiclass classification
    - Generative model
    - By sampling a latent factor from a generative model, you can generate new data.
        - Latent factor: a representation of the data in a latent space
        - In a multiclass classification problem, the generative process is as follows:
            1. sample a class label $y$ from the class prior $p(y)$
            2. sample $x$ from the class-conditional distribution $p(x \mid y)$
    - Generative models are often used for unsupervised learning on unlabeled data
        - $p(y=c \mid x, \Sigma, \pi, \mu)=\frac{p(x \mid y=c, \mu_{c}, \Sigma_{c})\,p(y=c \mid \pi)}{\sum_{l=1}^{C}p(x)_l}$
        - $p(x)_c = p(x \mid y=c, \mu_{c}, \Sigma_{c})\,p(y=c \mid \pi)$
        - __Softmax__ (when all classes share one covariance $\Sigma$) $p(y=c \mid x, \Sigma, \pi, \mu)=\frac{\exp(W_c^T x+W_{c0})}{\sum_{l=1}^{C}\exp(W_l^T x+W_{l0})}$
            - $W_c = \Sigma^{-1}\mu_c$
            - $W_{c0} = -\frac{1}{2}\mu_c^T\Sigma^{-1}\mu_c + \log{p(y=c \mid \pi)}$
    1. Obtain the maximum likelihood estimates of $\Sigma, \pi, \mu$
    2. Derive $W_c, W_{c0}$ by evaluating the above equations
    - we assume that the labels are given
        - class prior: $p(y=c \mid \pi) = \pi_c$
        - class conditional: $p(x \mid y=c, \mu_{c}, \Sigma_{c})$
            - Gaussian distribution
            - In Naive Bayes,
                - we assume that the attributes are independent of each other given the class ($y=c$)
                - Each dimension can be treated as a separate distribution
                - Hence $\Sigma_{c}$ is diagonal and differs from class to class
        - In linear discriminant analysis
            - we assume that every class conditional has the same covariance
            - Hence $\Sigma=\Sigma_{c1}=\Sigma_{c2}$
        - The posterior of the generative model is normalized by the sum of the class likelihoods (=__Softmax__)
    - Discriminative model: logistic regression
        - Let $W_c, W_{c0}$ be free variables
        - Learn $W_c, W_{c0}$ themselves directly
        - $W=\{W_c, W_{c0}\}_{c=1}^{C}$ (absorbing the bias)
        - Loss function = cross-entropy loss
        - Minimization problem of the negative log likelihood
            - __Softmax__ 
                - $p(y=c \mid x, W)=\frac{\exp(W_c^T x+W_{c0})}{\sum_{l=1}^{C}\exp(W_l^T x+W_{l0})}$
            - __Negative Log Likelihood__ 
                - $-\sum_{i=1}^{N}\log{p(y_i \mid x_i, W)}=-\sum_{i=1}^{N}\sum_{c=1}^{C}y_{ic}\log\left(\frac{\exp(W_c^T x_i+W_{c0})}{\sum_{l=1}^{C}\exp(W_l^T x_i+W_{l0})}\right)$
                - $y_{ic}$ is a one-hot encoding (the entry corresponding to the label of sample $i$ is 1, the others are 0)
            - In the generative model $W_c = \Sigma^{-1}\mu_c$; here $W_c$ is learned directly
            - $W_l^T x+W_{l0}$ is the prediction (score) for class $l$
            - Each prediction is normalized by the sum of the exponentiated predictions over all classes
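
The link between the two views can be checked numerically. The sketch below is only an illustration: it assumes a one-dimensional feature and a variance shared by all classes (the linear discriminant analysis case), and the means, priors, and variance are made-up values. It computes the posterior once with Bayes' rule and once as the softmax of linear scores; the two should agree.

```python
import math

# Hypothetical parameters for illustration: 3 classes, 1-D feature, shared variance.
means = [-1.0, 0.5, 2.0]
priors = [0.2, 0.5, 0.3]
variance = 0.8


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior_bayes(x):
    """p(y=c | x): class likelihood times prior, normalized over the classes."""
    joint = [gaussian_pdf(x, m, variance) * p for m, p in zip(means, priors)]
    total = sum(joint)
    return [j / total for j in joint]


def softmax(scores):
    """Softmax with the maximum subtracted for numerical stability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def posterior_softmax(x):
    """The same posterior written as softmax(W_c * x + W_c0)."""
    # 1-D version of W_c = Sigma^{-1} mu_c
    weights = [m / variance for m in means]
    # 1-D version of W_c0 = -1/2 mu_c^T Sigma^{-1} mu_c + log p(y=c)
    biases = [-(m * m) / (2.0 * variance) + math.log(p)
              for m, p in zip(means, priors)]
    return softmax([w * x + b for w, b in zip(weights, biases)])


def negative_log_likelihood(xs, labels):
    """Cross-entropy: sum over samples of -log p(y_i | x_i)."""
    return -sum(math.log(posterior_softmax(x)[y]) for x, y in zip(xs, labels))
```

A discriminative model would skip the Gaussian parameters and minimize `negative_log_likelihood` with respect to the weights and biases directly.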

- Architecture of generative adversarial networks
    - The role of the __generator__
        - __Objective__ try to get better at fooling the discriminator
        - Estimate the probability distribution of the real samples in order to provide generated samples resembling real data
        - Make the generated distribution as similar as possible to the real-world data distribution
    - The role of the __discriminator__
        - __Objective__ try to get better at identifying generated samples
        - Estimate the probability that a given sample came from the real data rather than from the generator (real or fake)
    - A generative adversarial network consists of these two models, which are trained to compete with each other.

## Tutorial
- https://realpython.com/generative-adversarial-networks/#implementing-the-generator

### Toy sample
- We consider data consisting of two features $x_1, x_2$
- Specifically, we want the GAN to learn samples that follow $x_2=\sin(x_1)$

![](fig_gan.webp)
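
A minimal sketch of how such training data could be generated. The range $[0, 2\pi]$ for $x_1$, the sample count, and the function name `make_toy_samples` are assumptions made for illustration.

```python
import math
import random


def make_toy_samples(n_samples, seed=0):
    """Return n_samples points (x1, x2) with x1 uniform in [0, 2*pi] and x2 = sin(x1)."""
    rng = random.Random(seed)  # seeded so that the data set is reproducible
    samples = []
    for _ in range(n_samples):
        x1 = rng.uniform(0.0, 2.0 * math.pi)
        samples.append((x1, math.sin(x1)))
    return samples


# Assumed size of the training set.
train_data = make_toy_samples(1024)
```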

### Structure of the generator
- Args:
    - input: random data from a latent space $z=(z_1, z_2)$
    - output: generated data resembling the real data
- can be an MLP or a CNN
- e.g. 
    - Decoder part of an autoencoder
    - MLP (Multilayer Perceptron)
    - CNN (Convolutional Neural Network)
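
For the toy data, a small MLP generator could look like the sketch below. It is written in plain Python to stay self-contained; the hidden size of 16, the ReLU activation, and all helper names are assumptions, not taken from the tutorial.

```python
import random


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one dense layer."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, layer):
    """Affine map: one output value per weight row."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def generator(z, layers):
    """Map a latent vector z = (z1, z2) to a generated sample (x1, x2)."""
    hidden = relu(dense(z, layers[0]))
    return dense(hidden, layers[1])  # linear output, so x1 and x2 are unbounded


rng = random.Random(0)
g_layers = [init_layer(2, 16, rng), init_layer(16, 2, rng)]
z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]  # latent sample from N(0, I)
fake_sample = generator(z, g_layers)  # untrained, so the values are arbitrary
```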

### Structure of the discriminator
- Args:
    - input: generated data or data sampled from the real-world data distribution
    - output: the probability that the input belongs to the real dataset
        - high prob: the input is from real-world
        - low prob: the input is generated by the generator
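
A matching sketch of a discriminator for the two-feature toy data: an MLP whose single output is squashed by a sigmoid into a probability. As with any such sketch, the layer sizes and helper names are assumptions.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one dense layer."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, layer):
    """Affine map: one output value per weight row."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(v):
    """Logistic function, written to avoid overflow for large |v|."""
    if v >= 0.0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def discriminator(x, layers):
    """Return the probability that the sample x = (x1, x2) is real."""
    hidden = relu(dense(x, layers[0]))
    logit = dense(hidden, layers[1])[0]
    return sigmoid(logit)


rng = random.Random(0)
d_layers = [init_layer(2, 16, rng), init_layer(16, 1, rng)]
p_real = discriminator([1.0, math.sin(1.0)], d_layers)  # untrained, so arbitrary
```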

### Minimax game
- D is adapted to minimize the discrimination error between real and generated samples
- G is adapted to maximize the probability of D making a mistake
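
In the standard GAN formulation, these two goals are written as a single value function $V(D, G)$ that D maximizes and G minimizes. The symbols $p_{\text{data}}$ (real-data distribution) and $p_z$ (latent prior) are introduced here and are not used elsewhere in these notes.

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
$$

Maximizing $V$ over D is the same as minimizing the binary cross-entropy with label 1 for real samples and label 0 for generated samples.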

To train the discriminator, at each iteration you label some real samples taken from the training data as 1 and some generated samples provided by the generator as 0.
The discriminator is then trained as a binary classifier by minimizing the binary cross-entropy.
![](fig_train_discriminator.webp)
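
The discriminator loss for one batch can be sketched as follows. The probabilities in `d_real` and `d_fake` are made-up discriminator outputs, and `binary_cross_entropy` is a hand-written helper rather than a library call.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Hypothetical discriminator outputs for 2 real and 2 generated samples.
d_real = [0.9, 0.8]
d_fake = [0.3, 0.1]

# Real samples are labeled 1, generated samples are labeled 0.
d_loss = binary_cross_entropy(d_real + d_fake, [1, 1, 0, 0])
```

The loss shrinks as the outputs for real samples approach 1 and the outputs for generated samples approach 0.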

After updating the parameters of the discriminator, we can train the generator to produce better generated samples. The output of the generator is connected to D, whose parameters are kept frozen.
![](fig_train_generator.webp)

Now we can consider this combined model as a single classifier such that
- input: random data z
- output: the probability $p(y=1 \mid z)$
- label associated with each input: 1
    - Every generated sample should be classified as class 1 (fool the discriminator)

With the parameters of the discriminator frozen and the binary cross-entropy as the loss function, we optimize the parameters of the generator to minimize the loss, i.e. to make the discriminator classify generated samples as real.
When the generator does a good enough job, the output probability should be close to 1.
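
Because every target label is 1, the binary cross-entropy reduces to the mean of $-\log D(G(z))$. A minimal sketch, with made-up discriminator outputs and a hypothetical helper name `generator_loss`:

```python
import math


def generator_loss(d_on_fake, eps=1e-12):
    """BCE of D(G(z)) against the target label 1: the mean of -log D(G(z))."""
    clipped = [min(max(p, eps), 1.0) for p in d_on_fake]  # keep log() finite
    return -sum(math.log(p) for p in clipped) / len(clipped)


# Hypothetical values of D(G(z)) early and late in training.
early = generator_loss([0.1, 0.2])   # discriminator spots the fakes: large loss
late = generator_loss([0.9, 0.95])   # discriminator is fooled: small loss
```

In an actual training loop, the gradient of this loss would flow through the frozen discriminator back into the generator parameters.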

- At the end
    - __Generator__ will be able to generate data that more closely resembles the real data
    - __Discriminator__ will have more trouble distinguishing between real and generated data.
