# Chapter 17: Representation Learning and Generative Learning (Autoencoders and GANs)

## 1. Chapter Overview
**Goal:** Can machines be creative? In this chapter, we explore **Generative Learning**. We start with **Autoencoders**, which learn efficient representations (embeddings) of data by trying to copy inputs to outputs under constraints. Then we dive into **Generative Adversarial Networks (GANs)**, where two neural networks compete against each other: a generator tries to produce fake data, and a discriminator tries to detect it. This competition can yield highly realistic synthetic data.

**Key Concepts:**
* **Autoencoders:** Encoder (compresses) + Decoder (reconstructs).
* **Undercomplete Autoencoders:** Forcing the network to learn the most important features by limiting the latent size.
* **Stacked Autoencoders:** Deep networks for better feature extraction.
* **Denoising Autoencoders:** Learning robust features by recovering clean images from noisy ones.
* **Variational Autoencoders (VAEs):** Probabilistic autoencoders that can generate new instances by sampling from a latent distribution.
* **GANs (Generative Adversarial Networks):** Generator vs. Discriminator game.
* **DCGANs:** Deep Convolutional GANs for generating images.
* **Mode Collapse:** A common failure mode where the Generator produces limited varieties of samples.

**Practical Skills:**
* Building a Stacked Autoencoder for Fashion MNIST reconstruction.
* Visualizing the latent space (t-SNE).
* Implementing a Variational Autoencoder to generate new images.
* Training a DCGAN to generate Fashion MNIST images.

In [None]:
# Setup
import sys
assert sys.version_info >= (3, 5)

import sklearn
assert sklearn.__version__ >= "0.20"

import tensorflow as tf
from tensorflow import keras
assert tf.__version__ >= "2.0"

import numpy as np
import os
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt

np.random.seed(42)
tf.random.set_seed(42)

print("TensorFlow version:", tf.__version__)

## 2. Theoretical Explanation (In-Depth)

### 1. Autoencoders
An Autoencoder is a neural network trained to output its input ($output = input$). This sounds trivial (the Identity function), but the catch is that we impose **constraints**.
* **Bottleneck:** The hidden layer in the middle is much smaller than the input (e.g., input 784 $\rightarrow$ hidden 30 $\rightarrow$ output 784). The network *must* compress the data, learning the most efficient representation (codings).
* **Usage:** Dimensionality Reduction, Denoising, and Pretraining for supervised tasks.
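
The bottleneck idea can be illustrated without Keras. The following is a minimal sketch, assuming a purely linear encoder and decoder trained with plain gradient descent on synthetic data; the sizes 20 and 3 stand in for 784 and 30, and all names (`W_enc`, `W_dec`, `reconstruction_loss`) are illustrative rather than part of the chapter's models.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: 200 samples in 20 dimensions that really live on a 3-D subspace.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing
X = X / X.std()  # rescale so plain gradient descent stays well behaved

n_inputs, n_codings = X.shape[1], 3
W_enc = rng.normal(scale=0.1, size=(n_inputs, n_codings))  # encoder: 20 -> 3
W_dec = rng.normal(scale=0.1, size=(n_codings, n_inputs))  # decoder: 3 -> 20

def reconstruction_loss(X, W_enc, W_dec):
    """Mean squared reconstruction error per sample."""
    residual = X @ W_enc @ W_dec - X
    return np.mean(np.sum(residual ** 2, axis=1))

initial_loss = reconstruction_loss(X, W_enc, W_dec)
learning_rate = 0.01
for step in range(2000):
    codings = X @ W_enc                 # compress through the bottleneck
    residual = codings @ W_dec - X      # reconstruct and compare to the input
    grad_dec = 2 * codings.T @ residual / len(X)
    grad_enc = 2 * X.T @ (residual @ W_dec.T) / len(X)
    W_enc -= learning_rate * grad_enc
    W_dec -= learning_rate * grad_dec
final_loss = reconstruction_loss(X, W_enc, W_dec)

print(f"loss before: {initial_loss:.3f}, after: {final_loss:.5f}")
```

Because the synthetic data has only three underlying degrees of freedom, a 3-unit bottleneck is enough to reconstruct it well. Real images are not that simple, which is why reconstructions from a 30-unit bottleneck lose detail.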

### 2. Variational Autoencoders (VAEs)
Standard Autoencoders map an input to a fixed vector. VAEs map an input to a **probability distribution**, parameterized by a mean $\mu$ and a standard deviation $\sigma$ (in practice, the encoder outputs the log-variance $\log \sigma^2$).
* **Generation:** To generate a new instance, we sample a random coding vector from a standard Gaussian and feed it to the decoder.
* **Loss Function:** Reconstruction Loss + KL Divergence (pushes the coding distribution toward a standard Gaussian, so that random samples decode to plausible outputs).
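
When the coding distribution is a diagonal Gaussian and the target is a standard Gaussian, the KL term has a closed form: $-\frac{1}{2} \sum_i \left(1 + \log \sigma_i^2 - \sigma_i^2 - \mu_i^2\right)$. The sketch below evaluates it in NumPy, assuming the encoder outputs a mean and a log-variance as `codings_mean` and `codings_log_var` do in Section 3.2. The function name `kl_to_standard_normal` is illustrative.

```python
import numpy as np

def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, exp(log_var)) || N(0, 1) ), summed over the coding dimensions."""
    mean = np.asarray(mean, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return -0.5 * np.sum(1 + log_var - np.exp(log_var) - mean ** 2, axis=-1)

# Codings that already match the target distribution incur no penalty...
kl_match = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
# ...while shifting the mean or shrinking the variance is penalized.
kl_shifted = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])
kl_narrow = kl_to_standard_normal([0.0, 0.0], [-2.0, 0.0])

print(kl_match, kl_shifted, kl_narrow)
```

The penalty for a shrinking variance is what stops the encoder from collapsing each input to a single point, which would turn the VAE back into a plain autoencoder.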

### 3. Generative Adversarial Networks (GANs)
Proposed by Ian Goodfellow et al. in 2014, a GAN consists of two networks:
1.  **Generator:** Takes random noise and tries to generate realistic data (fake).
2.  **Discriminator:** Takes real data and fake data, and tries to distinguish them.

**Training (Min-Max Game):**
* The Discriminator tries to maximize its accuracy (detect fakes).
* The Generator tries to minimize the Discriminator's accuracy (fool it).
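* Ideally, they reach an equilibrium where the generator's fakes are indistinguishable from real data, so the discriminator can only guess (its output is about 0.5 for every input).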
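
This game is commonly written as a single value function. In the notation of the original GAN formulation (not used elsewhere in this chapter), $D(x)$ is the discriminator's estimated probability that $x$ is real, $G(z)$ is the generator's output for noise $z$, and $p_{data}$ and $p_z$ are the data and noise distributions:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]$$

Maximizing $V$ over $D$ is the same as minimizing the discriminator's binary cross-entropy on real images labeled 1 and fakes labeled 0. Strictly speaking, then, the quantity being optimized is a log-likelihood rather than accuracy.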

## 3. Code Reproduction

### 3.1 Stacked Autoencoder (Dimensionality Reduction)
We will compress Fashion MNIST from 784 dimensions down to 30, and then reconstruct it.

In [None]:
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()
X_train_full = X_train_full.astype(np.float32) / 255
X_test = X_test.astype(np.float32) / 255
X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]
y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]

# Stacked Autoencoder
stacked_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(30, activation="selu"),
])
stacked_decoder = keras.models.Sequential([
    keras.layers.Dense(100, activation="selu", input_shape=[30]),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])
stacked_ae = keras.models.Sequential([stacked_encoder, stacked_decoder])

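# Plain "accuracy" is not meaningful for continuous pixel targets, so round
# both targets and predictions to 0 or 1 before comparing them.
def rounded_accuracy(y_true, y_pred):
    return keras.metrics.binary_accuracy(tf.round(y_true), tf.round(y_pred))

stacked_ae.compile(loss="binary_crossentropy",
                   optimizer=keras.optimizers.SGD(learning_rate=1.5),
                   metrics=[rounded_accuracy])
# Note: autoencoders are trained without labels, so the target is X_train itself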
history = stacked_ae.fit(X_train, X_train, epochs=10, validation_data=(X_valid, X_valid))

In [None]:
# Visualize Reconstruction
def plot_image(image):
    plt.imshow(image, cmap="binary")
    plt.axis("off")

def show_reconstructions(model, images=X_valid, n_images=5):
    reconstructions = model.predict(images[:n_images])
    fig = plt.figure(figsize=(n_images * 1.5, 3))
    for image_index in range(n_images):
        plt.subplot(2, n_images, 1 + image_index)
        plot_image(images[image_index])
        plt.subplot(2, n_images, 1 + n_images + image_index)
        plot_image(reconstructions[image_index])

show_reconstructions(stacked_ae)

### 3.2 Variational Autoencoder (VAE)
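We need a custom sampling layer that draws codings from the latent distribution using $z = \mu + \sigma \cdot \epsilon$, where $\epsilon \sim \mathcal{N}(0, 1)$. The encoder outputs the log-variance $\gamma = \log \sigma^2$ rather than $\sigma$ itself, so the layer computes $\sigma = \exp(\gamma / 2)$.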

In [None]:
K = keras.backend

class Sampling(keras.layers.Layer):
    def call(self, inputs):
        mean, log_var = inputs
        return K.random_normal(tf.shape(log_var)) * K.exp(log_var / 2) + mean

codings_size = 10

# Encoder
inputs = keras.layers.Input(shape=[28, 28])
z = keras.layers.Flatten()(inputs)
z = keras.layers.Dense(150, activation="selu")(z)
z = keras.layers.Dense(100, activation="selu")(z)
codings_mean = keras.layers.Dense(codings_size)(z)
codings_log_var = keras.layers.Dense(codings_size)(z)
codings = Sampling()([codings_mean, codings_log_var])
variational_encoder = keras.models.Model(
    inputs=[inputs], outputs=[codings_mean, codings_log_var, codings])

# Decoder
decoder_inputs = keras.layers.Input(shape=[codings_size])
x = keras.layers.Dense(100, activation="selu")(decoder_inputs)
x = keras.layers.Dense(150, activation="selu")(x)
x = keras.layers.Dense(28 * 28, activation="sigmoid")(x)
outputs = keras.layers.Reshape([28, 28])(x)
variational_decoder = keras.models.Model(inputs=[decoder_inputs], outputs=[outputs])

# Full VAE
_, _, codings = variational_encoder(inputs)
reconstructions = variational_decoder(codings)
vae = keras.models.Model(inputs=[inputs], outputs=[reconstructions])

# Add KL Divergence Loss
latent_loss = -0.5 * K.sum(
    1 + codings_log_var - K.exp(codings_log_var) - K.square(codings_mean),
    axis=-1)
vae.add_loss(K.mean(latent_loss) / 784.0)
vae.compile(loss="binary_crossentropy", optimizer="rmsprop")
history = vae.fit(X_train, X_train, epochs=10, batch_size=128, validation_data=(X_valid, X_valid))

### 3.3 Generative Adversarial Network (GAN)
We will build a simple DCGAN (Deep Convolutional GAN).
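
Before reading the model definitions, it helps to check the spatial sizes by hand. With `padding="same"`, a transposed convolution multiplies the input size by its stride, and a regular convolution divides it by its stride, rounding up. The sketch below applies that arithmetic to the layers used in this section. The helper names are illustrative.

```python
import math

def conv_transpose_same(size, stride):
    """Output size of a transposed convolution with padding="same"."""
    return size * stride

def conv_same(size, stride):
    """Output size of a convolution with padding="same"."""
    return math.ceil(size / stride)

# Generator: 7x7 feature maps are upsampled twice with strides=2.
g_sizes = [7]
for stride in (2, 2):
    g_sizes.append(conv_transpose_same(g_sizes[-1], stride))

# Discriminator: 28x28 images are downsampled twice with strides=2.
d_sizes = [28]
for stride in (2, 2):
    d_sizes.append(conv_same(d_sizes[-1], stride))

# 128 feature maps of the final spatial size feed the last Dense layer.
flattened = d_sizes[-1] ** 2 * 128
print(g_sizes, d_sizes, flattened)
```

This is why the generator starts from a `7 * 7 * 128` dense layer: two stride-2 upsampling steps take 7 to 14 to 28, matching the image size.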

In [None]:
codings_size = 30

# Generator
generator = keras.models.Sequential([
    keras.layers.Dense(7 * 7 * 128, input_shape=[codings_size]),
    keras.layers.Reshape([7, 7, 128]),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2DTranspose(64, kernel_size=5, strides=2, padding="same", activation="selu"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2DTranspose(1, kernel_size=5, strides=2, padding="same", activation="tanh")
])

# Discriminator
discriminator = keras.models.Sequential([
    keras.layers.Conv2D(64, kernel_size=5, strides=2, padding="same",
                        activation=keras.layers.LeakyReLU(0.2),
                        input_shape=[28, 28, 1]),
    keras.layers.Dropout(0.4),
    keras.layers.Conv2D(128, kernel_size=5, strides=2, padding="same",
                        activation=keras.layers.LeakyReLU(0.2)),
    keras.layers.Dropout(0.4),
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation="sigmoid")
])

gan = keras.models.Sequential([generator, discriminator])

# Compile
discriminator.compile(loss="binary_crossentropy", optimizer="rmsprop")
discriminator.trainable = False # Freeze discriminator during GAN training
gan.compile(loss="binary_crossentropy", optimizer="rmsprop")

print("GAN defined.")

### 3.4 Custom Training Loop for GAN
GANs require a custom training loop with two phases per batch: first train the discriminator, then train the generator through the frozen discriminator.

In [None]:
batch_size = 32
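# The generator ends in tanh and outputs [28, 28, 1] images, so rescale the
# real images from [0, 1] to [-1, 1] and add a channel axis to match.
X_train_dcgan = X_train.reshape(-1, 28, 28, 1) * 2. - 1.
dataset = tf.data.Dataset.from_tensor_slices(X_train_dcgan).shuffle(1000)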
dataset = dataset.batch(batch_size, drop_remainder=True).prefetch(1)

def train_gan(gan, dataset, batch_size, codings_size, n_epochs=1):
    generator, discriminator = gan.layers
    for epoch in range(n_epochs):
        print(f"Epoch {epoch + 1}/{n_epochs}")
        for X_batch in dataset:
            # Phase 1: Train Discriminator
            noise = tf.random.normal(shape=[batch_size, codings_size])
            generated_images = generator(noise)
            X_fake_and_real = tf.concat([generated_images, X_batch], axis=0)
            y1 = tf.constant([[0.]] * batch_size + [[1.]] * batch_size)
            discriminator.trainable = True
            discriminator.train_on_batch(X_fake_and_real, y1)
            
            # Phase 2: Train Generator
            noise = tf.random.normal(shape=[batch_size, codings_size])
            y2 = tf.constant([[1.]] * batch_size) # Trick: Generator wants discriminator to say '1' (Real)
            discriminator.trainable = False
            gan.train_on_batch(noise, y2)

train_gan(gan, dataset, batch_size, codings_size, n_epochs=1)

# Generate an image
noise = tf.random.normal(shape=[1, codings_size])
generated_image = generator(noise)
plot_image(generated_image[0, :, :, 0])  # drop the channel axis before plotting

## 4. Step-by-Step Explanation

### 1. Autoencoder Compression
* **Input:** 784 pixels.
* **Hidden:** 30 neurons. This forces the model to discard noise and fine pixel-level detail and to keep coarse shapes (e.g., the outline of a shoe or a shirt).
* **Output:** 784 pixels. The reconstructed image is blurry because a 30-dimensional vector cannot hold all the details of the original image.
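
To put numbers on this, the short sketch below computes the compression ratio and the parameter count of the dense stack from Section 3.1 (784, 100, 30, 100, 784). The helper `dense_params` is illustrative.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

layer_sizes = [784, 100, 30, 100, 784]  # encoder, then decoder
total = sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))
compression = layer_sizes[0] / min(layer_sizes)

print(f"{total} parameters, {compression:.1f}x compression at the bottleneck")
```

Each image is squeezed to roughly 1/26 of its original size before being reconstructed.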

### 2. The Reparameterization Trick (VAE)
In a VAE, we want to sample $z$ from a Gaussian. But we cannot backpropagate gradients through a random sampling node. 
**Trick:** $z = \mu + \sigma \odot \epsilon$. We sample $\epsilon$ from a standard normal (fixed). Now the randomness is an *input* node (constant during backprop), and $\mu, \sigma$ are deterministic nodes that gradients can flow through.
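
A minimal NumPy sketch of the trick, assuming a mean and a log-variance as inputs (the same convention as the `Sampling` layer in Section 3.2). The function name `sample_codings` and the example values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_codings(mean, log_var, rng):
    """Reparameterized sample: all randomness comes from epsilon."""
    epsilon = rng.standard_normal(np.shape(mean))
    sigma = np.exp(np.asarray(log_var) / 2)
    return mean + sigma * epsilon, epsilon

mean = np.full(100_000, 2.0)
log_var = np.full(100_000, np.log(0.25))  # sigma = 0.5
z, epsilon = sample_codings(mean, log_var, rng)

# Once epsilon is fixed, z is a deterministic function of (mean, sigma),
# so the derivatives needed for backpropagation are well defined.
dz_dmean = np.ones_like(z)
dz_dsigma = epsilon

print(z.mean(), z.std())
```

The samples still follow the intended Gaussian (mean near 2.0, standard deviation near 0.5), but the path from $\mu$ and $\sigma$ to $z$ now contains only additions and multiplications.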

### 3. GAN Training Loop Dynamics
1.  **Train Discriminator:** We feed it a batch of real images (Label 1) and a batch of fake images generated by the generator (Label 0). It learns to tell them apart.
2.  **Train Generator:** We feed noise to the GAN. The generator creates an image, and the discriminator classifies it. We want the discriminator to output **1** (Real), so we set the target label to 1. Because the discriminator is frozen, the gradients flow back through it but update *only* the generator's weights, pushing its images toward what the discriminator considers "real".
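
The two phases use the same binary cross-entropy loss and differ only in the labels. The sketch below shows this with hypothetical discriminator outputs. The values in `d_on_fake` and `d_on_real` are made up for illustration and are not results from the model above.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy, clipped for numerical safety."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_pred)
                          + (1 - y_true) * np.log(1 - y_pred)))

# Hypothetical discriminator outputs (probability of "real") for one batch.
d_on_fake = np.array([0.1, 0.2, 0.3])
d_on_real = np.array([0.9, 0.8, 0.7])

# Phase 1: fakes are labeled 0 and real images are labeled 1.
d_loss = binary_crossentropy(
    np.concatenate([np.zeros(3), np.ones(3)]),
    np.concatenate([d_on_fake, d_on_real]))

# Phase 2: the same fakes are labeled 1, so the loss is high while the
# discriminator is not fooled and falls as D(G(z)) approaches 1.
g_loss = binary_crossentropy(np.ones(3), d_on_fake)

print(d_loss, g_loss)
```

Here the discriminator is doing well (low `d_loss`), which is exactly the situation that gives the generator a large loss to learn from.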

### 4. Mode Collapse
Sometimes the generator finds one image that fools the discriminator well (e.g., a specific shoe). It then starts producing *only* that shoe. The discriminator learns to block that shoe, so the generator switches to a shirt. They cycle endlessly without learning diverse data.
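
One rough way to spot this during training is to measure how different the images in a generated batch are from each other. The sketch below is a hypothetical heuristic, not a standard metric or something the chapter's code uses: it computes the mean pairwise distance within a batch, which drops to zero when every sample is identical.

```python
import numpy as np

def mean_pairwise_distance(samples):
    """Average Euclidean distance between all distinct pairs of samples."""
    flat = np.asarray(samples, dtype=float).reshape(len(samples), -1)
    diffs = flat[:, None, :] - flat[None, :, :]
    distances = np.sqrt(np.sum(diffs ** 2, axis=-1))
    n = len(flat)
    return distances.sum() / (n * (n - 1))

rng = np.random.default_rng(42)
diverse = rng.uniform(-1, 1, size=(16, 28, 28))   # stand-in for varied outputs
collapsed = np.repeat(diverse[:1], 16, axis=0)    # the same image 16 times

diverse_score = mean_pairwise_distance(diverse)
collapsed_score = mean_pairwise_distance(collapsed)
print(diverse_score, collapsed_score)
```

A score that keeps shrinking across epochs suggests the generator is narrowing its output, although a low value alone cannot distinguish collapse from a dataset that simply has little variety.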

## 5. Chapter Summary

* **Autoencoders** are excellent for dimensionality reduction and denoising.
* **VAEs** add a probabilistic twist, allowing generation of new samples by walking through the latent space.
* **GANs** can produce sharper, more realistic images than autoencoder-based models, but they are notoriously hard to train (unstable and prone to mode collapse).
* **DCGANs** use convolutional layers to scale GANs to image data.
* **Generative Learning** is the frontier of AI creativity.