# FIT5230 Week 6: Generative Adversarial Networks (GANs)

## 1. Recap: Classification & Model Types

### Classification
* **Pattern Recognition**: The goal of classification is to predict a class label $y$ for a given sample $x$.
* **Statistical Approach**: This is done by learning a probability distribution from the features.
    * **Training**: Learn the probability of a class $Y$ given a set of features $F$, $P(Y|F)$.
    * **Testing**: For an unknown sample $x$, compute its features $F$ and predict the class $y$ with the highest posterior probability: $\hat{y} = \arg\max_{y} P(Y = y|F)$.

### Generative vs. Discriminative Models
There are two main statistical approaches to classification:

**1. Discriminative Models** 
* **Goal**: To learn the **decision boundary** between classes.
* **Mechanism**: Directly models the conditional probability $P(Y|F)$. It answers: "Given this sample's features, what is the probability it belongs to class Y?".

**2. Generative Models** 
* **Goal**: To learn the underlying **distribution of the data** itself.
* **Mechanism**: Models the joint probability $P(F, Y)$ or the conditional $P(F|Y)$. It answers: "What do samples of class Y look like?".
* **Application**: Can be used to "generate" new samples $x'$ that fit the learned distribution ($p_{model}$).
* **Bayes' Theorem**: To use a generative model for classification, it finds $P(F|Y)$ and then uses Bayes' theorem to flip it into the discriminative $P(Y|F)$:
$$P(Y|F) = \frac{P(F|Y)P(Y)}{P(F)}$$
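
As a small worked example of the Bayes flip, the sketch below turns class-conditional likelihoods $P(F|Y)$ and priors $P(Y)$ into posteriors $P(Y|F)$ and then picks the most probable class. The class names and all numbers are made up for illustration and are not from the lecture.

```python
def posterior(likelihoods, priors):
    """Turn P(F|Y) and P(Y) into P(Y|F) via Bayes' theorem."""
    # Joint P(F, Y) = P(F|Y) * P(Y) for each class.
    joint = {y: likelihoods[y] * priors[y] for y in priors}
    # Evidence P(F) = sum of the joint over all classes.
    evidence = sum(joint.values())
    return {y: joint[y] / evidence for y in joint}


# Hypothetical values for one observed feature vector F.
priors = {"cat": 0.5, "dog": 0.5}        # P(Y)
likelihoods = {"cat": 0.8, "dog": 0.2}   # P(F|Y)

post = posterior(likelihoods, priors)
prediction = max(post, key=post.get)     # argmax over classes of P(Y|F)
print(post, prediction)
```

With equal priors the posterior simply follows the likelihoods, so the predicted class here is `cat`.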

---

## 2. Generative Adversarial Networks (GANs)

A GAN is a specific type of generative model proposed by Ian Goodfellow and colleagues in 2014. It uses a "security game" setup, in which two networks compete, to learn the data distribution.

### The Two Players
A GAN consists of two competing neural networks:
* **Generator (G)**: The "counterfeiter." Its goal is to create fake samples $x'$ from random noise $z$. It wins if its fakes are so good that the Discriminator cannot tell them apart from real samples. Goal: **Indistinguishability (IND)**.
* **Discriminator (D)**: The "police." Its goal is to correctly identify samples as either real ($x$) or fake ($x'$). It outputs a probability that a sample is real. It wins if it can tell the difference. Goal: **Break Indistinguishability (xIND)**.


---

## 3. GANs & Game Theory

### The Minimax Game
The training of a GAN is a two-player **minimax game**. This is different from a standard optimization problem because each player's cost function depends on the *other* player's parameters.
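* **Optimization Solution**: A minimum of a single cost function.
* **Game Solution**: A **Nash Equilibrium**, a point where neither player can improve its own cost by changing only its own parameters.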

The entire game is captured by a single value function $V(D, G)$:
$$\min_{G} \max_{D} V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z)))]$$

**Conceptual Breakdown**:
* **$\max_{D} V(D,G)$ (Discriminator's Goal)**: The Discriminator tries to **maximize** this function.
    * $`\mathbb{E}_{x \sim p_{data}(x)}[\log D(x)]`$: The score on real data. D wants $D(x)$ to be 1 (100% real), which gives `log(1)` = 0, the largest value this term can take.
    * $`\mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z)))]`$: The score on fake data. D wants $D(G(z))$ to be 0 (100% fake), which gives `log(1 - 0)` = `log(1)` = 0, again the maximum.
* **$\min_{G} V(D,G)$ (Generator's Goal)**: The Generator tries to **minimize** this function.
    * G only influences the second term, because the first term does not depend on $G$. It wants to fool D into thinking its fakes are real, so it pushes $D(G(z))$ toward 1. As $D(G(z)) \rightarrow 1$, `log(1 - D(G(z)))` tends to $-\infty$, which minimizes the function.
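
To make the breakdown concrete, the sketch below estimates $V(D, G)$ from a handful of discriminator outputs. The function name `value_fn` and the probabilities are illustrative assumptions; in a real GAN these values would come from D evaluated on batches of real and generated samples.

```python
import math


def value_fn(d_real, d_fake):
    """Sample estimate of V(D, G) from discriminator outputs.

    d_real: D(x) for real samples; d_fake: D(G(z)) for generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# D is nearly perfect: V is close to its maximum of 0.
v_strong_d = value_fn([0.99, 0.98], [0.01, 0.02])
# G fools D: D(G(z)) near 1 drives the second term strongly negative.
v_fooled_d = value_fn([0.99, 0.98], [0.95, 0.97])
# D cannot tell real from fake and outputs 0.5 everywhere: V = -log(4).
v_undecided = value_fn([0.5, 0.5], [0.5, 0.5])
print(v_strong_d, v_fooled_d, v_undecided)
```

The three cases show the tug-of-war: D pulls the value up toward 0, G pulls it down, and a discriminator that outputs 0.5 for every input yields $-\log 4$.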

**Training Process**:
1.  **Train D**: Freeze G. Feed D real samples (labeled 1) and fake samples (labeled 0). Update D's weights ($\theta_D$) to minimize its classification loss (i.e., get better at telling real from fake).
2.  **Train G**: Freeze D. Feed G random noise and pass the fake output to D. The label is *flipped* to 1 (real). G updates its weights ($\theta_G$) to minimize the loss against that flipped label, which amounts to maximizing D's error on fakes (i.e., get better at fooling D).
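
The two steps above can be sketched as two loss functions built on binary cross-entropy. This is a minimal illustration on single scalar predictions, not a training loop; the names `bce`, `d_loss`, and `g_loss` are hypothetical. Note that scoring fakes against the flipped label 1 gives $-\log D(G(z))$, which is commonly called the non-saturating generator loss rather than the literal $\log(1 - D(G(z)))$ term of the value function.

```python
import math


def bce(pred, label):
    """Binary cross-entropy for one prediction strictly between 0 and 1."""
    return -(label * math.log(pred) + (1 - label) * math.log(1 - pred))


def d_loss(d_real, d_fake):
    # Step 1: real samples are labeled 1, fake samples are labeled 0.
    return bce(d_real, 1) + bce(d_fake, 0)


def g_loss(d_fake):
    # Step 2: the fake sample is scored against the flipped label 1.
    return bce(d_fake, 1)


# D's loss is low when it separates real from fake confidently.
print(d_loss(d_real=0.9, d_fake=0.1), d_loss(d_real=0.5, d_fake=0.5))
# G's loss is low when D assigns its fakes a high "real" probability.
print(g_loss(0.9), g_loss(0.1))
```

In a framework such as PyTorch, "freezing" a player usually means passing only the other player's parameters to the optimizer for that step.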

---

## 4. Key GAN Architectures

### A. DCGAN (Deep Convolutional GAN)
* **Concept**: The first major GAN to use Convolutional Neural Networks (CNNs) effectively, establishing a stable architecture for image generation.
* **Architecture**:
    * **No Pooling**: Replaces pooling layers with strided convolutions (in D) and **transposed convolutions** (in G) for upsampling.
    * **Batch Normalization (BN)**: Used in most layers to stabilize training.
    * **Activations**: ReLU in G, LeakyReLU in D. LeakyReLU allows a small, non-zero gradient for negative inputs, preventing "dying neurons".
    * **Optimizer**: Uses **Adam** instead of SGD for more stable weight updates.
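
The effect of replacing pooling with strided and transposed convolutions can be checked with the standard output-size formulas. The sketch below assumes dilation 1 and no output padding; the layer settings (kernel 4, stride 2, padding 1), the 4x4 and 64x64 sizes, and the LeakyReLU slope of 0.2 are illustrative assumptions rather than values stated in these notes.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a strided convolution (used for downsampling in D)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride, padding):
    """Output size of a transposed convolution (used for upsampling in G)."""
    return (size - 1) * stride - 2 * padding + kernel


def leaky_relu(x, slope=0.2):
    """LeakyReLU: identity for positive inputs, small slope for negative ones."""
    return x if x > 0 else slope * x


# Hypothetical generator path: a 4x4 feature map upsampled four times.
g_sizes = [4]
for _ in range(4):
    g_sizes.append(conv_transpose_out(g_sizes[-1], kernel=4, stride=2, padding=1))

# Hypothetical discriminator path: a 64x64 image downsampled four times.
d_sizes = [64]
for _ in range(4):
    d_sizes.append(conv_out(d_sizes[-1], kernel=4, stride=2, padding=1))

print(g_sizes)           # [4, 8, 16, 32, 64]
print(d_sizes)           # [64, 32, 16, 8, 4]
print(leaky_relu(-1.0))  # -0.2
```

Each layer doubles or halves the spatial size, so G and D mirror each other without any pooling layer.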

### B. CycleGAN
* **Concept**: Performs **unpaired** image-to-image translation. It learns a mapping between two domains ($X \rightarrow Y$) without needing "before" and "after" images (e.g., horse $\leftrightarrow$ zebra, photo $\leftrightarrow$ Monet painting).
* **Architecture**: Two Generators and two Discriminators.
    * $G: X \rightarrow Y$ (Horse $\rightarrow$ Zebra)
    * $F: Y \rightarrow X$ (Zebra $\rightarrow$ Horse)
    * $D_Y$: Discriminator for domain Y (Is this a real zebra?).
    * $D_X$: Discriminator for domain X (Is this a real horse?).
* **Losses**:
    1.  **Adversarial Loss**: The standard GAN loss (applied twice, once for each G/D pair) to make the images look realistic.
    2.  **Cycle Consistency Loss**: The key innovation. It ensures that if you translate an image and then translate it back, you recover the original: **$F(G(x)) \approx x$** and **$G(F(y)) \approx y$**. This forces the generators to preserve the underlying structure (semantics) rather than just generating a realistic-looking image of the target class.
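
A minimal sketch of the cycle consistency idea follows. It assumes an L1 (mean absolute) distance, which these notes do not specify, and it uses toy list-to-list functions as stand-ins for the generators; `F_inverse` and `F_lossy` are hypothetical names, not real networks.

```python
def l1(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_loss(G, F, x, y):
    """Cycle consistency: distance of F(G(x)) from x plus G(F(y)) from y."""
    return l1(F(G(x)), x) + l1(G(F(y)), y)


# Toy stand-ins for the two generators (assumptions for illustration).
G = lambda v: [2.0 * t for t in v]           # X -> Y
F_inverse = lambda v: [0.5 * t for t in v]   # Y -> X, exact inverse of G
F_lossy = lambda v: [0.0 for _ in v]         # Y -> X, discards its input

x = [1.0, 2.0, 3.0]   # a "sample" from domain X
y = [2.0, 4.0, 6.0]   # a "sample" from domain Y

loss_good = cycle_loss(G, F_inverse, x, y)
loss_bad = cycle_loss(G, F_lossy, x, y)
print(loss_good, loss_bad)
```

A mapping that throws away the input content is penalized even if its outputs would look plausible to a discriminator, which is the reason the loss preserves structure.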

---

## 5. Evaluation Metrics for GANs

How do you "score" a generator's output?

* **Inception Score (IS)**: Measures **quality** (is the image sharp?) and **diversity** (does it make many different things?). It uses a pre-trained Inception v3 classifier to check that $p(y|x)$ is peaked (each image is confidently assigned to one class) and that the marginal $p(y)$ is spread out (many classes are covered). **Higher IS is better**.
    * **Formula**: $\exp\left(\mathbb{E}_{x}\,\mathrm{KL}(p(y|x) \,\|\, p(y))\right)$
* **Fréchet Inception Distance (FID)**: Compares the statistical distribution (mean and covariance) of features from real images vs. fake images. **Lower FID is better**, meaning the distributions are similar.
    * **Formula**: $\|m - m_w\|_2^2 + \mathrm{Tr}\left(C + C_w - 2(CC_w)^{1/2}\right)$, where $(m, C)$ and $(m_w, C_w)$ are the feature mean and covariance of the two image sets.
* **Perceptual Path Length (PPL)**: Measures the "smoothness" of the latent space. A small change in the input noise vector $z$ should result in a small, smooth change in the output image. A low PPL means the latent space is well-structured and not entangled.
* **Precision & Recall**: Scores quality and diversity separately:
    * **Precision**: How many fake images are "realistic" (lie within the real image manifold)? 
    * **Recall**: How much of the real distribution's diversity is "covered" by the generator? 
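
The IS and FID formulas above can be tried on toy inputs. In the sketch below, hand-written class probabilities stand in for Inception v3 outputs, and `fid_1d` is the FID formula specialised to one-dimensional features, where the covariance is a single variance. Both simplifications are assumptions for illustration; real evaluations use network features on thousands of images.

```python
import math


def inception_score(probs):
    """exp(E_x[KL(p(y|x) || p(y))]) from per-image class probabilities."""
    n, k = len(probs), len(probs[0])
    # Marginal p(y): average of p(y|x) over the generated images.
    p_y = [sum(row[j] for row in probs) / n for j in range(k)]
    kl_sum = 0.0
    for row in probs:
        # Terms with p = 0 contribute nothing to the KL divergence.
        kl_sum += sum(p * math.log(p / p_y[j]) for j, p in enumerate(row) if p > 0)
    return math.exp(kl_sum / n)


def fid_1d(mean_a, var_a, mean_b, var_b):
    """FID formula for scalar features: C and C_w reduce to variances."""
    return (mean_a - mean_b) ** 2 + var_a + var_b - 2 * math.sqrt(var_a * var_b)


sharp_diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
collapsed = [[1.0, 0.0, 0.0]] * 3   # confident, but always the same class
blurry = [[1 / 3, 1 / 3, 1 / 3]] * 3  # no confident class for any image

print(inception_score(sharp_diverse))  # close to 3, the number of classes
print(inception_score(collapsed))      # 1.0
print(inception_score(blurry))         # 1.0
print(fid_1d(0.0, 1.0, 0.0, 1.0))      # 0.0 for identical statistics
print(fid_1d(0.0, 1.0, 3.0, 1.0))      # 9.0 when only the means differ
```

The score reaches the number of classes only when images are both confident and varied; losing either property drops it to 1.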

Inception Score: when it is high, each generated image is confidently classified into a single class.  
Perceptual Path Length: a high value means the output changes inconsistently as $z$ moves through the latent space.  

Given:  
Inception Score = High  
Fréchet Inception Distance = Moderate  
Perceptual Path Length = High  
Precision = High  
Recall = Low  

These are images that are convincing (high IS and precision) but limited in diversity (low recall).  