# Cost Function (part 1)<hr>

## "Opposite" Cost Functions
- The generator and discriminator are trying to optimize the **"opposite"** thing
- Let's start with the discriminator, since classification (supervised learning) is conceptually easier
- What does the discriminator have to do?
- Classify an image as *"real"* or *"fake"*
- 2 different labels -> binary classification
- We have seen this earlier in the course!
![binary_classify](../images/binary_classify.PNG)

#### Discriminator Cost (Binary Cross Entropy)
- Let t = 1 mean *"real"* and t = 0 mean *"fake"*
- Then \\(y = D(x) = p(\text{image is real} \mid \text{image}) \in (0, 1)\\)<br>In words, y equals D of x: a number between 0 and 1 representing the probability that the image x is real.<br>Writing D(x) is a common shorthand for the discriminator.
- If we want to explicitly mention the discriminator's params, we can say \\(y = D(x;\theta_D)\\)
- Note: x now represents any image (real or fake)
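
A minimal sketch of the binary cross-entropy for a single image, assuming the target convention above (t = 1 for real, t = 0 for fake). The function name `binary_cross_entropy` and the `eps` clipping are illustrative choices, not something from the course code.

```python
import numpy as np

def binary_cross_entropy(t, y, eps=1e-12):
    """Negative log-likelihood of target t (1 = real, 0 = fake) given y = D(x)."""
    y = np.clip(y, eps, 1.0 - eps)  # keep log() away from 0
    return -(t * np.log(y) + (1 - t) * np.log(1 - y))

# A confident, correct prediction is cheap; a confident, wrong one is expensive.
cost_real_good = binary_cross_entropy(t=1, y=0.9)  # real image, D says 0.9
cost_real_bad = binary_cross_entropy(t=1, y=0.1)   # real image, D says 0.1
cost_fake_good = binary_cross_entropy(t=0, y=0.1)  # fake image, D says 0.1
```

Only one of the two terms is active for any given image, which is why the target can later be dropped once real and fake images get their own symbols.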

#### Notation
- Let's put a hat on x to show that it's a fake image
- Now we can get rid of the target t, since the hat already tells us which images are fake
![binary_classify2](../images/binary_classify2.PNG)

#### Generator Notation
- Can we represent the output of the generator as a function?
- Seems reasonable to use G(z)
- z is a latent vector drawn from a prior (same as VAE), z ~ p(z)
- 2 steps to sample:<br>
1) z ~ p(z)<br>
2) x_hat = G(z)
- Show parameters of G explicitly: \\(G(z;\theta_G)\\)<hr>
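
The two sampling steps can be sketched as below. Everything concrete here is an assumption made for illustration: the prior is taken to be a standard normal, the sizes `latent_dim` and `image_dim` are made up, and `G` is a toy one-layer network rather than a real generator architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

latent_dim = 100   # hypothetical size of z
image_dim = 784    # hypothetical: a flattened 28x28 image
batch_size = 16

# Hypothetical generator parameters theta_G: a single dense layer.
W = rng.normal(scale=0.01, size=(latent_dim, image_dim))
b = np.zeros(image_dim)

def G(z):
    """Toy generator: one linear layer plus a sigmoid, so pixels land in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(z @ W + b)))

# Step 1: z ~ p(z), here assumed to be a standard normal.
z = rng.standard_normal((batch_size, latent_dim))
# Step 2: x_hat = G(z)
x_hat = G(z)
```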

## Discriminator Cost
- Now that we have some notation for discriminator (D) and generator (G), we can state the cost function differently:
![Discriminator_cost](../images/Discriminator_cost.PNG)

#### Batches of Data
- We'll be looking at batches of data during training, so summing the individual negative log-likelihoods gives us the total cost:
![Discriminator_cost_with_batch](../images/Discriminator_cost_with_batch.PNG)
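
A rough sketch of the batch cost, assuming it is the usual binary cross-entropy split into a real term and a fake term: the sum of -log D(x) over real images plus the sum of -log(1 - D(x_hat)) over fake images. The arrays `d_real` and `d_fake` are made-up discriminator outputs, not results from a trained model.

```python
import numpy as np

def discriminator_cost(d_real, d_fake, eps=1e-12):
    """J_D = -sum log D(x) - sum log(1 - D(x_hat)) over one batch."""
    d_real = np.clip(d_real, eps, 1.0 - eps)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return -np.sum(np.log(d_real)) - np.sum(np.log(1.0 - d_fake))

# Made-up outputs of D on 3 real and 3 fake images.
d_real = np.array([0.9, 0.8, 0.95])
d_fake = np.array([0.1, 0.3, 0.05])
j_d = discriminator_cost(d_real, d_fake)

# A discriminator that always answers 0.5 pays log(2) per image.
j_d_confused = discriminator_cost(np.full(3, 0.5), np.full(3, 0.5))
```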

#### What does it approximate?
- This is really an estimate of the expected value over all possible data
![Discriminator_cost2](../images/Discriminator_cost2.PNG)<hr>
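
A small sketch of why a batch cost approximates an expectation: the batch sum divided by the batch size is a sample mean, and it settles toward a fixed value as the batch grows. The discriminator `D` and the stand-in data distribution N(0, 1) are both invented for this illustration, and only the real-image term is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def D(x):
    """Hypothetical fixed discriminator: a sigmoid on a 1-D input."""
    return 1.0 / (1.0 + np.exp(-x))

def mean_real_cost(n):
    """Average of -log D(x) over n samples from a stand-in p(x) = N(0, 1)."""
    x = rng.standard_normal(n)
    return np.mean(-np.log(D(x)))

# Larger batches give estimates that fluctuate less around the expected value.
estimates = {n: mean_real_cost(n) for n in (10, 1_000, 100_000)}
for n, value in estimates.items():
    print(f"batch size {n:>7}: mean cost = {value:.4f}")
```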

## Generator Cost
- The discriminator wants to discriminate between real and fake images, so it minimizes the negative log-likelihood
- The generator wants to fool the discriminator
- Perhaps we can just try to maximize the discriminator's cost!
![Generator_cost](../images/Generator_cost.PNG)

#### Zero-Sum Game
- In game theory this is called a "zero-sum game" because the sum of all players' costs is always 0
- You don't need to know game theory to understand this course, but it's nice to make this connection if you already know game theory
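
To make the zero-sum idea concrete, here is a sketch that assumes the generator's cost is simply the negative of the discriminator's batch cost. The helper names and the sample discriminator outputs are made up for illustration.

```python
import numpy as np

def discriminator_cost(d_real, d_fake):
    """J_D = -sum log D(x) - sum log(1 - D(x_hat))."""
    return -np.sum(np.log(d_real)) - np.sum(np.log(1.0 - d_fake))

def generator_cost(d_real, d_fake):
    """Zero-sum choice: the generator's cost is the negative of the discriminator's."""
    return -discriminator_cost(d_real, d_fake)

d_real = np.array([0.9, 0.8])
d_fake_caught = np.array([0.1, 0.2])  # D spots the fakes
d_fake_fooled = np.array([0.8, 0.9])  # D is fooled by the fakes

j_g_caught = generator_cost(d_real, d_fake_caught)
j_g_fooled = generator_cost(d_real, d_fake_fooled)

# The two players' costs always cancel.
total = discriminator_cost(d_real, d_fake_fooled) + j_g_fooled
```

Fooling the discriminator raises its cost, which lowers the generator's cost by exactly the same amount.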

#### Minimax
- Zero-sum games are also called minimax games because the solution involves a min and a max
![Minmax](../images/Minmax.PNG)
- If you're interested you can check out Goodfellow 2014 (Generative Adversarial Nets)
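
For reference, the minimax objective from the cited paper can be written as follows. The notation in the figure above may differ slightly; here \(p_{\text{data}}\) stands for the distribution of real images and \(p(z)\) for the latent prior.

```latex
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

Read this way, V is the negative of the discriminator's expected cost, so the discriminator maximizes V while the generator minimizes it.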

<hr>
## Next Step: We have a cost -> Minimize it!
- Once we have a cost function, we can optimize it using any of the available optimizers in TensorFlow
- Interesting situation we haven't seen before: 2 different neural networks, 2 different costs, in the same script, so we need 2 different optimizers

#### Pseudocode
While not converged:<br>
&nbsp;&nbsp;&nbsp; X = get batch of real images<br>
&nbsp;&nbsp;&nbsp; X_hat = sample batch of fake images from G<br>
&nbsp;&nbsp;&nbsp; \\(\theta_D = \theta_D - \text{learning rate} \cdot \partial J^{(D)} / \partial \theta_D \\)<br>
&nbsp;&nbsp;&nbsp; \\(\theta_G = \theta_G - \text{learning rate} \cdot \partial J^{(G)} / \partial \theta_G \\)<br><br>
Note: in practice, some implementations run the generator update twice for every discriminator update
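
The pseudocode can be turned into a runnable toy with plain NumPy. This is only a sketch under heavy assumptions: the "images" are scalars drawn from N(4, 1), `D` is a logistic regression, `G` is an affine map, the gradients are derived by hand for these two toy models, and a fixed number of steps replaces the convergence check. A TensorFlow version would instead let two optimizers compute and apply these updates.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def D(x, theta_d):
    """Toy discriminator: logistic regression on a scalar input."""
    w, b = theta_d
    return sigmoid(w * x + b)

def G(z, theta_g):
    """Toy generator: an affine map of the latent sample."""
    a, c = theta_g
    return a * z + c

def discriminator_cost(theta_d, x, x_hat):
    """Batch-averaged binary cross-entropy J_D."""
    return -np.mean(np.log(D(x, theta_d))) - np.mean(np.log(1.0 - D(x_hat, theta_d)))

def discriminator_grad(theta_d, x, x_hat):
    """dJ_D / d(theta_D), derived by hand for the logistic discriminator."""
    d_real, d_fake = D(x, theta_d), D(x_hat, theta_d)
    grad_w = np.mean((d_real - 1.0) * x) + np.mean(d_fake * x_hat)
    grad_b = np.mean(d_real - 1.0) + np.mean(d_fake)
    return np.array([grad_w, grad_b])

def generator_grad(theta_g, theta_d, z):
    """dJ_G / d(theta_G) with J_G = -J_D; only the fake term depends on G."""
    w, _ = theta_d
    d_fake = D(G(z, theta_g), theta_d)
    grad_x_hat = -d_fake * w  # d/dx_hat of log(1 - D(x_hat))
    return np.array([np.mean(grad_x_hat * z), np.mean(grad_x_hat)])

theta_d = np.array([0.1, 0.0])  # (w, b)
theta_g = np.array([1.0, 0.0])  # (a, c)
learning_rate = 0.05
batch_size = 64

for step in range(200):  # fixed step count stands in for "while not converged"
    x = rng.normal(loc=4.0, scale=1.0, size=batch_size)  # batch of "real" data
    z = rng.standard_normal(batch_size)
    x_hat = G(z, theta_g)                                # batch of fake data

    # Two separate updates: one per network, each with its own cost.
    theta_d = theta_d - learning_rate * discriminator_grad(theta_d, x, x_hat)
    theta_g = theta_g - learning_rate * generator_grad(theta_g, theta_d, z)
```

Keeping the two parameter sets and their gradients separate mirrors the need for two optimizers: each update only touches the parameters of its own network.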