# FIT5230 Week 6: Generative Adversarial Networks (GANs)

---

## 1. Recap: Classification

### 1.1. The Pattern Recognition Pipeline
Classification is a core part of pattern recognition, which typically follows three steps:
1.  **Preprocessing**: Cleaning and preparing the input data.
2.  **Feature Extraction**: Computing relevant features from the preprocessed data.
3.  **Classification**: Assigning a class label to the sample based on its features.

### 1.2. Statistical Classification
-   **Goal**: Given an observed sample $x$, predict its class label $y$.
-   **Training**: Given a set of labeled samples $(x_i, y_i)$, we plot the frequency of feature values for each class. This helps us estimate the conditional probability $P(Y | (f_1, f_2))$, which is the probability of a class $Y$ given a set of features.
-   **Testing**: For a new, unknown sample $x$, we first compute its features $(f_1, f_2)$ and then output the class label $y$ that maximizes this conditional probability.
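
The training and testing steps above can be sketched with a simple frequency table. This is a minimal illustration, assuming two features that have already been discretized into bin indices; the function names, the bin count, and the toy data are all hypothetical.

```python
import numpy as np

def fit_histogram_classifier(features, labels, n_bins=4, n_classes=2):
    """Training: count how often each (f1, f2) bin occurs for each class."""
    counts = np.zeros((n_bins, n_bins, n_classes))
    for (f1, f2), y in zip(features, labels):
        counts[f1, f2, y] += 1
    return counts

def predict(counts, f1, f2):
    """Testing: return the class maximizing the estimated P(Y | (f1, f2))."""
    cell = counts[f1, f2]
    total = cell.sum()
    if total == 0:
        # Unseen feature combination: fall back to overall class frequencies.
        cell = counts.sum(axis=(0, 1))
        total = cell.sum()
    posterior = cell / total
    return int(np.argmax(posterior)), posterior

# Toy training set (made up): features are bin indices in 0..3.
train_features = [(0, 0), (0, 1), (0, 0), (3, 3), (3, 2), (0, 0), (3, 3)]
train_labels = [0, 0, 0, 1, 1, 1, 1]

counts = fit_histogram_classifier(train_features, train_labels)
label, posterior = predict(counts, 0, 0)
print(label, posterior)
```

For the bin `(0, 0)`, two of the three training samples belong to class 0, so the estimated posterior favors class 0.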

---

## 2. Classification Models: Generative vs. Discriminative

There are two main approaches to building classification models:

### 2.1. Discriminative Models
-   These models directly learn the conditional probability $P(Y|F)$, where $F$ represents the features.
-   They focus on finding the **decision boundary** that separates different classes.
-   The output is the class $y$ that maximizes $P(Y|F)$.

### 2.2. Generative Models
-   These models learn the joint probability distribution $P(F, Y)$ or the class-conditional probability $P(F|Y)$.
-   They model how the data for each class is generated.
-   To classify a new sample, they use **Bayes' theorem** to compute the desired conditional probability $P(Y|F)$:
    $$
    P(Y|F) = \frac{P(F|Y)P(Y)}{P(F)} 
    $$
   
-   **Use Cases**:
    1.  Given samples from a real distribution $p_{real}$, we can estimate a model of that distribution, $p_{model}$.
    2.  Given samples from $p_{real}$, we can generate *more* samples from that same distribution. This is what GANs do.
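
As a rough sketch of both ideas, the example below fits a one-dimensional Gaussian as $P(F|Y)$ for each class, classifies with Bayes' theorem, and then draws new samples from the fitted model. The Gaussian assumption, the equal priors, and the synthetic data are illustrative choices, not part of the lecture material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D feature F for two classes (illustration only).
f_class0 = rng.normal(loc=-2.0, scale=1.0, size=500)
f_class1 = rng.normal(loc=2.0, scale=1.0, size=500)

# "Training": model P(F|Y) as one Gaussian per class; assume equal priors P(Y).
params = {0: (f_class0.mean(), f_class0.std()),
          1: (f_class1.mean(), f_class1.std())}
prior = {0: 0.5, 1: 0.5}

def gaussian_pdf(f, mean, std):
    return np.exp(-0.5 * ((f - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def posterior(f):
    """P(Y|F=f) via Bayes' theorem; P(F) is the sum of the numerators."""
    joint = np.array([gaussian_pdf(f, *params[y]) * prior[y] for y in (0, 1)])
    return joint / joint.sum()

print(posterior(-2.0))  # strongly favors class 0
print(posterior(2.0))   # strongly favors class 1

# Use case 2: because P(F|Y) is modeled, new samples can be drawn from it.
new_samples_class1 = rng.normal(*params[1], size=5)
print(new_samples_class1)
```

A discriminative model would only learn the boundary near $F = 0$; the generative model also knows what typical samples of each class look like, which is what makes sampling possible.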

---

## 3. Generative Adversarial Networks (GANs)

A GAN is a type of generative model framed as a **security game** between two competing neural networks: the **Generator (G)** and the **Discriminator (D)**.

### 3.1. The Players
-   **Generator (G)** 🧠:
    -   Its goal is to create "fake" samples $x'$ that are so realistic they become indistinguishable from real samples $x$.
    -   It aims to achieve **INDistinguishability**, fooling the Discriminator into believing its creations are real.

-   **Discriminator (D)** 🕵️:
    -   Its goal is to correctly identify whether a given sample is real (from the actual dataset) or fake (created by the Generator).
    -   It tries to **break INDistinguishability**.
    -   It outputs a probability, $D(\cdot)$, indicating how likely it is that the input sample is real.



### 3.2. GANs and Game Theory
The training process of a GAN is a two-player **minimax game**, where the two players have opposing goals. The solution is not a simple minimum point (like in optimization) but a **Nash equilibrium**, where neither player can improve its outcome by unilaterally changing its strategy.

-   **D's Objective**: Maximize the probability of correctly identifying real data ($D(x)$) and fake data ($1 - D(G(z))$).
-   **G's Objective**: Minimize the probability that D correctly identifies its fake data, which is equivalent to maximizing $D(G(z))$.

This relationship is captured by the GAN **value function** $V(D, G)$:
$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z)))]
$$

-   The first term, $\mathbb{E}_{x \sim p_{data}(x)}[\log D(x)]$, is the Discriminator's score on **real** images. D wants $D(x)$ to be close to 1, which pushes $\log D(x)$ toward its maximum of 0.
-   The second term, $\mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z)))]$, is the score on **fake** images. D wants $D(G(z))$ to be close to 0 (so the term approaches 0), while G wants $D(G(z))$ to be close to 1 (so the term becomes very negative).
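
To make the two terms concrete, here is a small sketch that estimates $V(D, G)$ from batches of discriminator outputs. The function name and the example numbers are made up; in a real GAN, `d_real` and `d_fake` would come from running D on real and generated batches.

```python
import numpy as np

def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G).

    d_real: D(x) for a batch of real samples, values in (0, 1)
    d_fake: D(G(z)) for a batch of generated samples, values in (0, 1)
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A confident, correct discriminator: both terms are close to 0.
v_strong_d = value_function([0.99, 0.98], [0.01, 0.02])

# A fully fooled discriminator that outputs 0.5 everywhere: V = -log(4).
v_fooled_d = value_function([0.5, 0.5], [0.5, 0.5])

print(v_strong_d, v_fooled_d)
```

D tries to push the value up toward 0, while G tries to pull it down. When D can do no better than output 0.5 for every input, the value is $-\log 4$.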

---

## 4. Key GAN Architectures

### 4.1. DCGAN (Deep Convolutional GAN)
DCGAN introduced key architectural changes that made GANs more stable and effective for image generation.

-   **Architecture**:
    -   Replaces pooling layers with **strided convolutions** (in D) and **transposed convolutions** (in G) for downsampling and upsampling.
    -   Eliminates fully connected hidden layers in favor of a convolutional design.
    -   Uses **Batch Normalization** in most layers of both G and D to stabilize training.
-   **Activations**:
    -   Uses **ReLU** in the Generator (except for the final layer, which uses **Tanh**).
    -   Uses **LeakyReLU** in all layers of the Discriminator.
-   **Optimizer**: Uses the **Adam optimizer** instead of standard SGD.
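
The sketch below shows how strided and transposed convolutions change the spatial size of a feature map, which is how a DCGAN-style network resamples without pooling. The layer settings (kernel 4, stride 2, padding 1, and a 64x64 output) are assumptions borrowed from common DCGAN implementations, not values stated in these notes, and the formulas ignore dilation and output padding.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a strided convolution (downsampling in D)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride, padding):
    """Output size of a transposed convolution (upsampling in G)."""
    return (size - 1) * stride - 2 * padding + kernel

# Generator: 1x1 latent "image" -> 4x4, then four doublings up to 64x64.
g_sizes = [1]
g_sizes.append(conv_transpose_out(g_sizes[-1], kernel=4, stride=1, padding=0))
for _ in range(4):
    g_sizes.append(conv_transpose_out(g_sizes[-1], kernel=4, stride=2, padding=1))

# Discriminator: four halvings from 64x64 to 4x4, then a 4x4 conv down to 1x1.
d_sizes = [64]
for _ in range(4):
    d_sizes.append(conv_out(d_sizes[-1], kernel=4, stride=2, padding=1))
d_sizes.append(conv_out(d_sizes[-1], kernel=4, stride=1, padding=0))

print(g_sizes)  # [1, 4, 8, 16, 32, 64]
print(d_sizes)  # [64, 32, 16, 8, 4, 1]
```

With these settings the two networks mirror each other: each transposed convolution in G doubles the resolution, and each strided convolution in D halves it.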

### 4.2. CycleGAN
CycleGAN is designed for **unpaired image-to-image translation**, meaning it can learn to translate between two domains (e.g., horses to zebras, photos to Monet paintings) without needing direct pairs of corresponding images.

-   **Structure**:
    -   **Two Generators**: $G: X \to Y$ and $F: Y \to X$.
    -   **Two Discriminators**: $D_X$ (distinguishes real vs. fake in domain X) and $D_Y$ (distinguishes real vs. fake in domain Y).
-   **Key Innovation: Cycle Consistency Loss**
    -   This loss ensures that if you translate an image from one domain to another and back again, you should get something close to the original image.
    -   Forward cycle: $F(G(x)) \approx x$
    -   Backward cycle: $G(F(y)) \approx y$
-   **Loss Functions**:
    -   **Adversarial Loss**: Encourages the generators to create images that are indistinguishable from images in the target domain.
    -   **Cycle Consistency Loss**: Ensures the translation preserves the original image's core content.
    $$
    \mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[||F(G(x)) - x||_1] + \mathbb{E}_{y \sim p_{data}(y)}[||G(F(y)) - y||_1]
    $$
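
A minimal numerical sketch of the cycle consistency loss follows. The "generators" here are hypothetical toy functions on flattened 3-pixel images, chosen so the result can be checked by hand. The L1 norm is summed over pixels to match the formula; many implementations average over pixels instead, which only rescales the loss.

```python
import numpy as np

def cycle_consistency_loss(G, F, x_batch, y_batch):
    """Batch mean of ||F(G(x)) - x||_1 plus batch mean of ||G(F(y)) - y||_1."""
    forward = np.abs(F(G(x_batch)) - x_batch).sum(axis=1).mean()
    backward = np.abs(G(F(y_batch)) - y_batch).sum(axis=1).mean()
    return float(forward + backward)

def G(x):           # toy generator X -> Y
    return 2.0 * x

def F_good(y):      # exact inverse of G: both cycles return the input
    return 0.5 * y

def F_bad(y):       # not an inverse of G: cycles drift away from the input
    return y + 1.0

x_batch = np.array([[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]])
y_batch = np.array([[2.0, 0.0, 4.0], [0.0, 0.0, 2.0]])

loss_good = cycle_consistency_loss(G, F_good, x_batch, y_batch)
loss_bad = cycle_consistency_loss(G, F_bad, x_batch, y_batch)
print(loss_good, loss_bad)  # 0.0 16.0
```

The loss is zero only when each generator undoes the other, which is what discourages a generator from discarding the content of its input.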
   

---

## 5. Evaluation Metrics for GANs

Evaluating GANs is challenging because there isn't a single objective metric like accuracy. Several methods are used:

### 5.1. Inception Score (IS)
-   Measures both **image quality** (are the images sharp and clear?) and **diversity** (does the generator produce a wide variety of images?).
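-   It uses a pre-trained Inception v3 classifier to check two things: each generated image should be confidently assigned to a single class, so $p(y|x)$ is sharply peaked (high quality), and the classes predicted across all generated images should be spread evenly, so the marginal $p(y)$ is close to uniform (high diversity).
-   **Higher IS is better**.
-   Formula: $IS(G) = \exp(\mathbb{E}_{x} D_{KL}(p(y|x) || p(y)))$, where the expectation is over generated images $x = G(z)$ and $p(y)$ is the average of $p(y|x)$ over those images.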
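
The formula can be sketched directly from a matrix of class probabilities. In practice the rows would be Inception v3 softmax outputs for generated images; here they are hand-written toy values, and the small `eps` is an assumption added only to avoid `log(0)`.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS from an (N, num_classes) array whose rows are p(y|x)."""
    probs = np.asarray(probs, dtype=float)
    p_y = probs.mean(axis=0, keepdims=True)  # marginal p(y)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

# Confident predictions spread evenly over 3 classes: IS is close to 3.
sharp_and_diverse = np.eye(3)
# Confident predictions, but always the same class: IS is 1.
sharp_but_collapsed = np.tile([1.0, 0.0, 0.0], (3, 1))
# Uniform, unconfident predictions: IS is 1.
blurry = np.full((3, 3), 1.0 / 3.0)

print(inception_score(sharp_and_diverse))
print(inception_score(sharp_but_collapsed))
print(inception_score(blurry))
```

The score is bounded between 1 and the number of classes, and it only reaches the upper end when both conditions hold at once.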

### 5.2. Fréchet Inception Distance (FID)
-   Compares the statistics (mean and covariance) of feature embeddings from real images versus generated images.
-   It measures how similar the distribution of generated images is to the distribution of real images.
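-   Generally considered more robust than IS and better at detecting **mode collapse** (where the generator produces very limited variety), because it compares generated images against real data instead of scoring generated images alone.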
-   **Lower FID is better**.
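-   Formula: $FID = ||\mu_r - \mu_g||^2 + \mathrm{Tr}(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2})$, where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the mean and covariance of the real and generated feature embeddings.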
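
Below is a NumPy sketch of the FID formula. Random Gaussian vectors stand in for Inception feature embeddings, so the numbers are illustrative only. The trace of the matrix square root is computed from eigenvalues, which is one of several ways to evaluate that term and is an implementation choice made here.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """FID-style distance between two (N, d) arrays of feature embeddings."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)
    # Tr((Sigma_r Sigma_g)^{1/2}) is the sum of the square roots of the
    # eigenvalues of Sigma_r Sigma_g. For covariance matrices these are real
    # and non-negative up to numerical noise, hence the .real and the clip.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_g) ** 2)
    cov_term = np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt
    return float(mean_term + cov_term)

rng = np.random.default_rng(0)
real = rng.normal(size=(2000, 4))               # stand-in for real features
same = rng.normal(size=(2000, 4))               # same distribution as `real`
shifted = rng.normal(loc=3.0, size=(2000, 4))   # mean shifted in every dim

fid_same = frechet_distance(real, same)
fid_shifted = frechet_distance(real, shifted)
print(fid_same, fid_shifted)
```

Two samples from the same distribution give a value near 0, while shifting the mean by 3 in each of the 4 dimensions gives a value near $4 \times 3^2 = 36$.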

### 5.3. Perceptual Path Length (PPL)
-   Measures the "smoothness" of the generator's latent space.
-   If small changes in the input latent vector lead to drastic, nonsensical changes in the output image, the PPL will be high. A good generator should have smooth transitions.
-   **Lower PPL is better**.
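
The idea can be illustrated with a toy sketch: take a tiny step along a path between two latent vectors and measure how far the output moves. Everything here is a simplification. The real metric uses a learned perceptual distance between images, whereas this sketch substitutes squared L2 distance, linear interpolation, and two hypothetical linear "generators".

```python
import numpy as np

def path_length(generator, distance, z_pairs, eps=1e-4, seed=0):
    """Average of distance(G(z(t)), G(z(t + eps))) / eps**2 over latent pairs."""
    rng = np.random.default_rng(seed)
    scores = []
    for z1, z2 in z_pairs:
        t = rng.uniform(0.0, 1.0 - eps)
        z_a = (1.0 - t) * z1 + t * z2                  # point on the path
        z_b = (1.0 - (t + eps)) * z1 + (t + eps) * z2  # a tiny step further
        scores.append(distance(generator(z_a), generator(z_b)) / eps**2)
    return float(np.mean(scores))

def squared_l2(a, b):
    """Stand-in for a perceptual distance (an assumption of this sketch)."""
    return float(np.sum((a - b) ** 2))

def smooth_generator(z):
    return 1.0 * z

def jumpy_generator(z):
    return 10.0 * z  # the same latent step moves the output 10x further

rng = np.random.default_rng(1)
z_pairs = [(rng.normal(size=8), rng.normal(size=8)) for _ in range(100)]

ppl_smooth = path_length(smooth_generator, squared_l2, z_pairs)
ppl_jumpy = path_length(jumpy_generator, squared_l2, z_pairs)
print(ppl_smooth, ppl_jumpy)
```

The second generator reacts ten times more strongly to the same latent step, so its score is about 100 times higher under a squared distance.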

### 5.4. Precision & Recall
-   Adapts the concepts of precision and recall to evaluate GANs.
-   **Precision**: How realistic are the generated samples? (Fraction of generated images that are within the manifold of real images).
-   **Recall**: How diverse are the generated samples? (Fraction of real images whose features are covered by the manifold of generated images).
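
One way to estimate these two fractions is to approximate each manifold with k-nearest-neighbor balls around the feature vectors. The sketch below follows that idea on 2-D toy points; the choice of `k=3`, the function names, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def pairwise_dist(a, b):
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def manifold_coverage(reference, queries, k=3):
    """Fraction of `queries` inside the k-NN manifold estimate of `reference`.

    Each reference point gets a ball whose radius is the distance to its k-th
    nearest neighbor in `reference`; the union of balls is the manifold estimate.
    """
    ref_dist = pairwise_dist(reference, reference)
    radii = np.sort(ref_dist, axis=1)[:, k]  # index 0 is the point itself
    inside = pairwise_dist(queries, reference) <= radii[None, :]
    return float(inside.any(axis=1).mean())

rng = np.random.default_rng(0)
# Real data has two modes; the toy generator covers only one of them.
real = np.concatenate([rng.normal(-5.0, 1.0, size=(200, 2)),
                       rng.normal(5.0, 1.0, size=(200, 2))])
generated = rng.normal(-5.0, 0.5, size=(400, 2))

precision = manifold_coverage(real, generated)  # are the fakes realistic?
recall = manifold_coverage(generated, real)     # are the reals covered?
print(precision, recall)
```

Because the generated points all fall inside one real mode, precision is high, while recall cannot exceed 0.5 since the second real mode is never covered.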

---

## Appendix: Cross-Entropy Loss

-   **Cross-entropy loss** (or log loss) is a common loss function used in classification and in GANs. It's used to measure the difference between two probability distributions.
-   **Information**: A measure of surprise. A low-probability event is surprising and has high information content. $h(x) = -\log(p(x))$.
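-   **Entropy**: The expected value of information for a probability distribution. It's the average amount of surprise or uncertainty. $H(X) = \sum_i p(x_i) \log(1/p(x_i)) = -\sum_i p(x_i) \log(p(x_i))$.
-   **Cross-entropy**: The expected information when events follow the true distribution $p$ but are scored with a predicted distribution $q$: $H(p, q) = -\sum_i p(x_i) \log(q(x_i))$. It is smallest when $q$ matches $p$.
-   In GANs, cross-entropy is used to compare the distribution of the true labels (real=1, fake=0) with the distribution of the Discriminator's predicted probabilities.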
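
For two classes, cross-entropy reduces to the binary cross-entropy $-[y \log p + (1 - y) \log(1 - p)]$, where $y$ is the label and $p$ is the predicted probability of "real". The sketch below evaluates it on hand-picked discriminator outputs; the numbers are made up, and the clipping with `eps` is an assumption added to avoid `log(0)`.

```python
import numpy as np

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean of -[y log p + (1 - y) log(1 - p)] over a batch."""
    labels = np.asarray(labels, dtype=float)
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    per_sample = labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs)
    return float(-np.mean(per_sample))

labels = [1, 1, 0, 0]            # two real samples, two fake samples
good_d = [0.9, 0.8, 0.1, 0.2]    # predictions close to the labels
bad_d = [0.2, 0.1, 0.9, 0.8]     # predictions confidently wrong

loss_good = binary_cross_entropy(labels, good_d)
loss_bad = binary_cross_entropy(labels, bad_d)
loss_coin_flip = binary_cross_entropy(labels, [0.5] * 4)  # equals log(2)
print(loss_good, loss_coin_flip, loss_bad)
```

With equal numbers of real and fake samples, this loss is $-\tfrac{1}{2}$ times the batch estimate of the GAN value function, so a Discriminator that minimizes it is maximizing $V(D, G)$.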

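Notes on reading the metrics together:

-   **Inception Score**: when it is high, each generated image is confidently classified into a single class.
-   **Perceptual Path Length**: when it is high, the latent space is not smooth, so nearby latent vectors produce inconsistent outputs.

Given:

-   Inception Score = High
-   Fréchet Inception Distance = Moderate
-   Perceptual Path Length = High
-   Precision = High
-   Recall = Low

These are images that are convincing (high precision and Inception Score) but limited in diversity (low recall).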