## Generative Adversarial Network (GAN)

A generative adversarial network (GAN) consists of two neural networks trained in opposition to one another.

The **generator G** takes as input a random noise vector z and outputs an image $X_{fake} = G(z)$.

The **discriminator D** receives as input either a training image or a synthesized image from the generator and outputs a probability distribution $P(S \mid X) = D(X)$ over possible image sources.

The discriminator is trained to **maximize** the log-likelihood it assigns to the correct source:

$$
L = E \left[ \log P(S=real \mid X_{real}) \right] + E \left[ \log P(S=fake \mid X_{fake}) \right]
$$

**Note:** Maximizing this objective is equivalent to minimizing the binary cross-entropy between the discriminator's prediction and the true source label. Only the log probability assigned to the correct label contributes.
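The link to cross-entropy can be checked numerically. The following is a minimal sketch, assuming the two-source case where the discriminator's output is summarized by a single number, the probability that the image is real. The function names and the probability values are made up for illustration.

```python
import math

def discriminator_objective(p_real_on_real, p_real_on_fake):
    """Estimate L = E[log P(S=real | X_real)] + E[log P(S=fake | X_fake)].

    Both arguments are lists of D's predicted probability that an image is real.
    """
    term_real = sum(math.log(p) for p in p_real_on_real) / len(p_real_on_real)
    term_fake = sum(math.log(1.0 - p) for p in p_real_on_fake) / len(p_real_on_fake)
    return term_real + term_fake

def binary_cross_entropy(probs, labels):
    """Mean binary cross-entropy, with label 1 = real and 0 = fake."""
    total = 0.0
    for p, y in zip(probs, labels):
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

# Hypothetical discriminator outputs on three real and three generated images.
p_real = [0.9, 0.8, 0.95]
p_fake = [0.2, 0.1, 0.3]

L = discriminator_objective(p_real, p_fake)
bce = binary_cross_entropy(p_real, [1, 1, 1]) + binary_cross_entropy(p_fake, [0, 0, 0])
print(L, -bce)  # the two numbers agree up to rounding
```

Because `L` is just the negated cross-entropy, maximizing one is the same as minimizing the other.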

The generator is trained to **minimize** the second term in the expression:

$$
L = E \left[ \log P(S=fake \mid X_{fake}) \right]
$$

I found this training objective difficult to understand at first. The generator is not trained in isolation; instead, the expression being optimized is $D(G(z))$, with the weights of the discriminator held fixed. The goal is a generator whose images are so similar to the real images that the discriminator can no longer classify them as fake. Minimizing the log probability that the discriminator assigns to the source "fake" for generated images expresses exactly that goal.
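The idea of optimizing $D(G(z))$ with a frozen discriminator can be sketched on a one-dimensional toy problem. Everything here is an assumption made for illustration: the "images" are single numbers, the discriminator is a fixed logistic function with hand-picked constants `W` and `B`, and the generator has one learnable parameter `theta` that shifts the noise.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Fixed toy discriminator: D(x) = P(S=real | x) = sigmoid(W * x + B).
# W and B are never updated while the generator is trained.
W, B = 2.0, -2.0

def generator(theta, z):
    # Toy generator: shifts the noise by a single learnable parameter.
    return theta + z

def generator_objective(theta, noise):
    # Estimate of E[log P(S=fake | G(z))] = E[log(1 - D(G(z)))].
    # Uses log(1 - sigmoid(a)) = -log(1 + exp(a)) for numerical safety.
    values = [-math.log1p(math.exp(W * generator(theta, z) + B)) for z in noise]
    return sum(values) / len(values)

def generator_gradient(theta, noise):
    # d/dtheta log(1 - sigmoid(a)) = -sigmoid(a) * W, with a = W * (theta + z) + B.
    grads = [-sigmoid(W * generator(theta, z) + B) * W for z in noise]
    return sum(grads) / len(grads)

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(256)]

theta = 0.0
objective_before = generator_objective(theta, noise)
for _ in range(50):
    theta -= 0.1 * generator_gradient(theta, noise)  # gradient descent on theta only
objective_after = generator_objective(theta, noise)

# Average probability of "real" that the frozen discriminator assigns to the fakes.
mean_d_after = sum(sigmoid(W * generator(theta, z) + B) for z in noise) / len(noise)
```

Only `theta` changes during the loop. The objective goes down because the generated samples move into the region that the frozen discriminator considers real.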

## Auxiliary classifier GAN (AC-GAN)

Every generated sample has a corresponding class label $c \sim p_c$ in addition to the noise z.

The generator uses both to generate images $X_{fake} = G(c, z)$.

The discriminator returns both a probability distribution over sources and a probability distribution over class labels: $P(S \mid X), P(C \mid X) = D(X)$.

The objective function has two parts: 
 * the log-likelihood of the correct source $L_S$
 * the log-likelihood of the correct class $L_C$
 
$$
L_S = E \left[ \log P(S=real \mid X_{real}) \right] + E \left[ \log P(S=fake \mid X_{fake}) \right]
$$

$$
L_C = E \left[ \log P(C=c \mid X_{real}) \right] + E \left[ \log P(C=c \mid X_{fake}) \right]
$$

D is trained to maximize $L_S + L_C$.

**Note:** The goal is a discriminator that is good at detecting fakes **and** good at predicting class labels.

G is trained to maximize $L_C - L_S$.

**Note:** This expression maximizes $L_C$ and minimizes $L_S$ at the same time. The goal is a generator that produces images that are typical for their class but cannot be identified as fakes.

**Another note:** Compared to the description of GANs above, the generator objective here minimizes the whole of $L_S$ instead of just its second part $E \left[ \log P(S=fake \mid X_{fake}) \right]$. The result is the same: the first term of $L_S$ depends only on real images, not on the generator, so it contributes no gradient when the generator is trained. In that context all images are fake, and only the second term of $L_S$ matters.
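To make the two AC-GAN objectives concrete, here is a small sketch that evaluates $L_S$, $L_C$, and the two combined objectives on made-up discriminator outputs. The function names, the three-class setup, and all probability values are assumptions for illustration only.

```python
import math

def mean(values):
    return sum(values) / len(values)

def source_objective(p_real_on_real, p_real_on_fake):
    # L_S = E[log P(S=real | X_real)] + E[log P(S=fake | X_fake)]
    return (mean([math.log(p) for p in p_real_on_real])
            + mean([math.log(1.0 - p) for p in p_real_on_fake]))

def class_objective(class_probs_real, labels_real, class_probs_fake, labels_fake):
    # L_C = E[log P(C=c | X_real)] + E[log P(C=c | X_fake)]
    real = mean([math.log(probs[c]) for probs, c in zip(class_probs_real, labels_real)])
    fake = mean([math.log(probs[c]) for probs, c in zip(class_probs_fake, labels_fake)])
    return real + fake

# Hypothetical discriminator outputs for 2 real and 2 generated images, 3 classes.
p_real_on_real = [0.9, 0.8]
class_probs_real = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
labels_real = [0, 1]
class_probs_fake = [[0.2, 0.2, 0.6], [0.5, 0.3, 0.2]]
labels_fake = [2, 0]  # the class labels c that were fed to the generator

def objectives(p_real_on_fake):
    l_s = source_objective(p_real_on_real, p_real_on_fake)
    l_c = class_objective(class_probs_real, labels_real, class_probs_fake, labels_fake)
    return l_s + l_c, l_c - l_s  # (maximized by D, maximized by G)

d_caught, g_caught = objectives([0.1, 0.2])  # D recognizes the fakes
d_fooled, g_fooled = objectives([0.6, 0.7])  # D is fooled by the fakes
```

With the class predictions held constant, fooling the discriminator lowers $L_S$, which raises the generator objective $L_C - L_S$ and lowers the discriminator objective $L_S + L_C$.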

