# Chapter 17: Representation Learning and Generative Learning Using Autoencoders and GANs

**Reference:** Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (Aurélien Géron)

---

## 1. Chapter Introduction

Autoencoders are artificial neural networks capable of learning dense representations of the input data, called *latent representations* or *codings*, without any supervision (i.e., the training set is unlabeled). These codings typically have a much lower dimensionality than the input data, making autoencoders useful for dimensionality reduction, especially for visualization purposes. Autoencoders also act as feature detectors, and they can be used for unsupervised pretraining of deep neural networks. Lastly, some autoencoders are *generative models*: they are capable of randomly generating new data that looks very similar to the training data. For example, you could train an autoencoder on pictures of faces, and it would then be able to generate new faces.

Generative Adversarial Networks (GANs) are also generative models, and the faces they produce are now so convincing that it is hard to believe the people they depict do not exist. GANs are widely used for super resolution, colorizing black-and-white images, image editing, and turning sketches into photorealistic images.

In this chapter, we will start by exploring how autoencoders work and how to use them for dimensionality reduction, feature extraction, unsupervised pretraining, and visualization. Then we will move on to Variational Autoencoders (VAEs) and finally GANs.

## 2. Efficient Data Representations

An autoencoder always consists of two parts: an **encoder** (recognition network) that converts the inputs to a latent representation, and a **decoder** (generative network) that converts that latent representation into outputs that should match the inputs.

An autoencoder typically has the same architecture as a Multi-Layer Perceptron (MLP), except that the number of neurons in the output layer must be equal to the number of inputs. The outputs are often called the *reconstructions*, because the autoencoder tries to reconstruct its inputs. The cost function contains a *reconstruction loss* that penalizes the model when the reconstructions differ from the inputs.
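As a minimal sketch of what a reconstruction loss measures, the snippet below computes the mean squared error between an input and its reconstruction in plain Python. The function name `reconstruction_mse` and the sample values are made up for illustration; Keras computes an equivalent quantity when a model is compiled with `loss="mse"`.

```python
def reconstruction_mse(inputs, reconstructions):
    """Mean squared error between an input vector and its reconstruction."""
    if len(inputs) != len(reconstructions):
        raise ValueError("inputs and reconstructions must have the same length")
    squared_errors = [(x - r) ** 2 for x, r in zip(inputs, reconstructions)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical 3-feature instance and two candidate reconstructions.
inputs = [0.0, 0.5, 1.0]
perfect = [0.0, 0.5, 1.0]
imperfect = [0.1, 0.4, 0.8]

print(reconstruction_mse(inputs, perfect))    # 0.0: nothing to penalize
print(reconstruction_mse(inputs, imperfect))  # about 0.02
```

The loss is zero only when every output matches its input, and it grows as the reconstruction drifts away.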

If an autoencoder uses only linear activations and the cost function is the Mean Squared Error (MSE), then it ends up performing Principal Component Analysis (PCA). Let's build such a linear autoencoder that projects a 3D dataset down to 2D.

In [None]:
import tensorflow as tf
from tensorflow import keras
import numpy as np

# Generate 3D dataset
np.random.seed(4) # Reproducibility
m = 200
w1, w2 = 0.1, 0.3
noise = 0.1
angles = np.random.rand(m) * 3 * np.pi / 2 - 0.5
data = np.empty((m, 3))
data[:, 0] = np.cos(angles) + np.sin(angles)/2 + noise * np.random.randn(m) / 2
data[:, 1] = np.sin(angles) * 0.7 + noise * np.random.randn(m) / 2
data[:, 2] = data[:, 0] * w1 + data[:, 1] * w2 + noise * np.random.randn(m)

# Scale data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(data[:100])
X_test = scaler.transform(data[100:])

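# Build a linear autoencoder: Dense layers without an activation function are
# linear, so with the MSE loss the 2D codings span the same plane as PCA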
encoder = keras.models.Sequential([keras.layers.Dense(2, input_shape=[3])])
decoder = keras.models.Sequential([keras.layers.Dense(3, input_shape=[2])])
autoencoder = keras.models.Sequential([encoder, decoder])

autoencoder.compile(loss="mse", optimizer=keras.optimizers.SGD(learning_rate=1.5))
history = autoencoder.fit(X_train, X_train, epochs=20)
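The cell above trains the linear autoencoder but does not compare it with PCA. The following is a minimal sketch of such a check, written in plain Python on a hypothetical 2D dataset (not the 3D dataset above) so that the first principal direction has a closed form. The dataset, the initial weights, the learning rate, and the helper names `train_linear_autoencoder` and `cosine` are all assumptions made for illustration.

```python
import math
import random

rng = random.Random(42)

# Hypothetical 2D dataset: wide along a 30-degree axis, narrow across it.
theta = math.radians(30)
points = []
for _ in range(200):
    a, b = rng.gauss(0.0, 2.0), rng.gauss(0.0, 0.5)
    points.append((a * math.cos(theta) - b * math.sin(theta),
                   a * math.sin(theta) + b * math.cos(theta)))

# Center the data, as PCA assumes.
mx = sum(x for x, _ in points) / len(points)
my = sum(y for _, y in points) / len(points)
points = [(x - mx, y - my) for x, y in points]

# Closed-form first principal direction of the 2x2 covariance matrix.
sxx = sum(x * x for x, _ in points) / len(points)
syy = sum(y * y for _, y in points) / len(points)
sxy = sum(x * y for x, y in points) / len(points)
pc_angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
pc = (math.cos(pc_angle), math.sin(pc_angle))


def train_linear_autoencoder(points, lr=0.02, n_steps=3000):
    """Fit a 2 -> 1 -> 2 linear autoencoder (no biases) with gradient descent on the MSE."""
    e = [0.5, 0.1]  # encoder weights (arbitrary starting point)
    d = [0.1, 0.5]  # decoder weights (arbitrary starting point)
    n = len(points)
    for _ in range(n_steps):
        grad_e, grad_d = [0.0, 0.0], [0.0, 0.0]
        for x, y in points:
            coding = e[0] * x + e[1] * y
            # Reconstruction error for this point.
            rx, ry = coding * d[0] - x, coding * d[1] - y
            grad_d[0] += 2 * coding * rx / n
            grad_d[1] += 2 * coding * ry / n
            back = rx * d[0] + ry * d[1]  # error propagated back to the coding
            grad_e[0] += 2 * back * x / n
            grad_e[1] += 2 * back * y / n
        e = [e[0] - lr * grad_e[0], e[1] - lr * grad_e[1]]
        d = [d[0] - lr * grad_d[0], d[1] - lr * grad_d[1]]
    return e, d


def cosine(u, v):
    """Cosine of the angle between two 2D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(u[0], u[1]) * math.hypot(v[0], v[1]))


encoder_w, decoder_w = train_linear_autoencoder(points)
print("decoder vs. principal direction:", abs(cosine(decoder_w, pc)))
print("encoder vs. principal direction:", abs(cosine(encoder_w, pc)))
```

Both cosines should come out close to 1, meaning the trained weights point along the first principal component (up to sign and scale), which is the sense in which a linear autoencoder with an MSE loss performs PCA.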

## 3. Stacked Autoencoders

Just like other neural networks, autoencoders can have multiple hidden layers. In this case, they are called *Stacked Autoencoders* (or Deep Autoencoders). Adding more layers helps the autoencoder learn more complex codings. However, one must be careful not to make the autoencoder too powerful: it could then reconstruct the training data almost perfectly by memorizing it, without learning any useful representation along the way (a form of overfitting).

Let's build a stacked autoencoder for Fashion MNIST.

In [None]:
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()
X_train_full = X_train_full.astype(np.float32) / 255
X_test = X_test.astype(np.float32) / 255
X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]
y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]

# Stacked Autoencoder Architecture
# Encoder: 784 -> 100 -> 30
# Decoder: 30 -> 100 -> 784
stacked_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(30, activation="selu"),
])
stacked_decoder = keras.models.Sequential([
    keras.layers.Dense(100, activation="selu", input_shape=[30]),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])
stacked_ae = keras.models.Sequential([stacked_encoder, stacked_decoder])

# Compile: binary cross-entropy treats each pixel intensity (in [0, 1]) as a
# probability, so reconstruction becomes a per-pixel binary classification task
stacked_ae.compile(loss="binary_crossentropy",
                   optimizer=keras.optimizers.SGD(learning_rate=1.5))
history = stacked_ae.fit(X_train, X_train, epochs=10,
                         validation_data=(X_valid, X_valid))

### Visualizing Reconstructions

One way to ensure the autoencoder has learned properly is to compare the inputs and the outputs.

In [None]:
import matplotlib.pyplot as plt

def plot_image(image):
    plt.imshow(image, cmap="binary")
    plt.axis("off")

def show_reconstructions(model, images=X_valid, n_images=5):
    reconstructions = model.predict(images[:n_images])
    fig = plt.figure(figsize=(n_images * 1.5, 3))
    for image_index in range(n_images):
        plt.subplot(2, n_images, 1 + image_index)
        plot_image(images[image_index])
        plt.subplot(2, n_images, 1 + n_images + image_index)
        plot_image(reconstructions[image_index])

show_reconstructions(stacked_ae)

## 4. Convolutional Autoencoders

If you are dealing with images, autoencoders built from dense layers do not work very well because they ignore spatial structure, so convolutional autoencoders are usually a better fit. The encoder is a regular CNN: `Conv2D` layers combined with pooling layers (or strides) that reduce the spatial dimensions while the number of feature maps grows. The decoder does the reverse, here using `Conv2DTranspose` layers with a stride of 2 to upsample back to the original image size.

In [None]:
conv_encoder = keras.models.Sequential([
    keras.layers.Reshape([28, 28, 1], input_shape=[28, 28]),
    keras.layers.Conv2D(16, kernel_size=3, padding="SAME", activation="selu"),
    keras.layers.MaxPool2D(pool_size=2),
    keras.layers.Conv2D(32, kernel_size=3, padding="SAME", activation="selu"),
    keras.layers.MaxPool2D(pool_size=2),
    keras.layers.Conv2D(64, kernel_size=3, padding="SAME", activation="selu"),
    keras.layers.MaxPool2D(pool_size=2)
])
conv_decoder = keras.models.Sequential([
    keras.layers.Conv2DTranspose(32, kernel_size=3, strides=2, padding="VALID", activation="selu",
                                 input_shape=[3, 3, 64]),
    keras.layers.Conv2DTranspose(16, kernel_size=3, strides=2, padding="SAME", activation="selu"),
    keras.layers.Conv2DTranspose(1, kernel_size=3, strides=2, padding="SAME", activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])
conv_ae = keras.models.Sequential([conv_encoder, conv_decoder])

# Plain "accuracy" is not meaningful for continuous pixel targets, so only the loss is tracked
conv_ae.compile(loss="binary_crossentropy", optimizer=keras.optimizers.SGD(learning_rate=1.0))
# history = conv_ae.fit(X_train, X_train, epochs=5, validation_data=(X_valid, X_valid))
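The spatial sizes in the convolutional autoencoder above can be checked by hand. The sketch below reproduces the arithmetic in plain Python, assuming the standard Keras output-size rules for `Conv2D` with `"same"` padding, `MaxPool2D` with its default `"valid"` padding, and `Conv2DTranspose` without output padding or dilation. The helper names are made up for illustration.

```python
import math


def conv2d_same(size, stride=1):
    """Output size of Conv2D with padding="same"."""
    return math.ceil(size / stride)


def max_pool_valid(size, pool_size=2):
    """Output size of MaxPool2D with "valid" padding and stride equal to pool_size."""
    return (size - pool_size) // pool_size + 1


def conv2d_transpose(size, kernel_size, stride, padding):
    """Output size of Conv2DTranspose (no output padding, no dilation)."""
    if padding == "same":
        return size * stride
    if padding == "valid":
        return size * stride + max(kernel_size - stride, 0)
    raise ValueError(f"unsupported padding: {padding}")


# Encoder: three Conv2D("same") + MaxPool2D(2) stages.
size = 28
encoder_sizes = [size]
for _ in range(3):
    size = max_pool_valid(conv2d_same(size))
    encoder_sizes.append(size)

# Decoder: one "valid" and two "same" transposed convolutions, all with stride 2.
decoder_sizes = [size]
for padding in ("valid", "same", "same"):
    size = conv2d_transpose(size, kernel_size=3, stride=2, padding=padding)
    decoder_sizes.append(size)

print(encoder_sizes)  # [28, 14, 7, 3]
print(decoder_sizes)  # [3, 7, 14, 28]
```

This is why the first transposed convolution uses `"VALID"` padding: with `"SAME"` padding the 3 × 3 feature maps would grow to 6, 12, and 24 pixels, and the final `Reshape([28, 28])` would fail.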

## 5. Recurrent Autoencoders

If you want to build an autoencoder for sequences (e.g., time series or text), you can use Recurrent Neural Networks. The encoder is typically a sequence-to-vector RNN that compresses the input sequence into a single vector. The decoder is a vector-to-sequence RNN that does the reverse: the coding vector is fed to it at every time step and it generates the output sequence. In the code below, each 28 × 28 Fashion MNIST image is treated as a sequence of 28 rows with 28 pixels each.

In [None]:
recurrent_encoder = keras.models.Sequential([
    keras.layers.LSTM(100, return_sequences=True, input_shape=[None, 28]),
    keras.layers.LSTM(30)
])
recurrent_decoder = keras.models.Sequential([
    keras.layers.RepeatVector(28, input_shape=[30]),
    keras.layers.LSTM(100, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(28, activation="sigmoid"))
])
recurrent_ae = keras.models.Sequential([recurrent_encoder, recurrent_decoder])
# recurrent_ae.compile(...) # Compilation is similar to previous models
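To make the role of the `RepeatVector` layer in the decoder above concrete, here is a minimal plain-Python sketch of what it does to a single coding: it copies the same vector to every time step, so the decoder's LSTM receives the coding as its input at each step. The helper `repeat_vector` and the tiny 3-dimensional coding are illustrative assumptions, not part of Keras.

```python
def repeat_vector(vector, n_steps):
    """Mimic RepeatVector for one instance: copy the vector to every time step."""
    return [list(vector) for _ in range(n_steps)]


coding = [0.2, -1.0, 0.7]  # hypothetical 3-dimensional coding
decoder_input = repeat_vector(coding, n_steps=4)

for step, features in enumerate(decoder_input):
    print(step, features)
```

In the model above the same operation turns each 30-dimensional coding into a sequence of 28 identical vectors, one per image row to reconstruct.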

## 6. Denoising Autoencoders

Another way to force the autoencoder to learn useful features is to add noise to its inputs, training it to recover the original, noise-free inputs. This prevents the autoencoder from trivially copying its inputs to its outputs and forces it to find patterns in the data.

The noise can be pure Gaussian noise added to the inputs, or it can consist of randomly switching off inputs, just like dropout does.
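The two kinds of corruption can be sketched in a few lines of plain Python. The helper names, the pixel values, and the noise levels below are assumptions chosen for illustration. The rescaling of surviving values by `1 / (1 - rate)` mirrors what the Keras `Dropout` layer does during training.

```python
import random


def add_gaussian_noise(pixels, stddev, rng):
    """Return a copy of `pixels` with zero-mean Gaussian noise added to each value."""
    return [p + rng.gauss(0.0, stddev) for p in pixels]


def apply_dropout(pixels, rate, rng):
    """Zero each value with probability `rate` and scale survivors by 1 / (1 - rate)."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else p * keep_scale for p in pixels]


rng = random.Random(42)
clean = [0.0, 0.25, 0.5, 0.75, 1.0, 0.9, 0.1, 0.6]  # hypothetical pixel intensities

noisy = add_gaussian_noise(clean, stddev=0.2, rng=rng)
dropped = apply_dropout(clean, rate=0.5, rng=rng)

# A denoising autoencoder is fit on (corrupted input, clean target) pairs.
training_pairs = [(noisy, clean), (dropped, clean)]
```

In both cases the target stays the clean instance, which is what forces the network to learn structure rather than copy its input.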

In [None]:
dropout_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(30, activation="selu")
])
dropout_decoder = keras.models.Sequential([
    keras.layers.Dense(100, activation="selu", input_shape=[30]),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])
dropout_ae = keras.models.Sequential([dropout_encoder, dropout_decoder])
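# The Dropout layer corrupts the inputs only during training, so the clean images
# serve as both inputs and targets; plain "accuracy" is dropped because it is
# not meaningful for continuous pixel targets
dropout_ae.compile(loss="binary_crossentropy", optimizer=keras.optimizers.SGD(learning_rate=1.0))
history = dropout_ae.fit(X_train, X_train, epochs=10, validation_data=(X_valid, X_valid))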

## 7. Variational Autoencoders (VAEs)

Variational autoencoders differ from regular autoencoders in two ways:
1.  They are **probabilistic** autoencoders: their outputs are partly determined by chance, even after training.
2.  They are **generative** autoencoders: they can generate new instances that look like they were sampled from the training set.

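Instead of directly producing a coding for a given input, the encoder produces a *mean coding* $\mu$ and a *standard deviation* $\sigma$. The actual coding is then sampled from a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$, and the decoder decodes the sampled coding. In practice, as in the code below, the encoder usually outputs $\gamma = \log(\sigma^2)$ rather than $\sigma$, because this is more numerically stable; $\sigma$ is then recovered as $\exp(\gamma / 2)$.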
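The sampling step can be sketched in plain Python as follows. It draws standard Gaussian noise and then shifts and scales it, so that the randomness is kept separate from the mean and the log-variance. The function name `sample_coding` and the example values are assumptions for illustration, and the log-variance parameterization $\log(\sigma^2)$ is assumed here.

```python
import math
import random


def sample_coding(mean, log_var, rng):
    """Sample z = mean + sigma * epsilon, with sigma = exp(log_var / 2)."""
    return [m + math.exp(lv / 2) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


rng = random.Random(0)
mean = [1.5, -2.0]
log_var = [math.log(0.25), math.log(4.0)]  # standard deviations 0.5 and 2.0

samples = [sample_coding(mean, log_var, rng) for _ in range(20000)]

# Empirical mean and standard deviation of each coding dimension.
stats = []
for i in range(len(mean)):
    values = [s[i] for s in samples]
    avg = sum(values) / len(values)
    std = math.sqrt(sum((v - avg) ** 2 for v in values) / len(values))
    stats.append((avg, std))
    print(f"dimension {i}: mean ~ {avg:.2f}, std ~ {std:.2f}")
```

Over many draws the empirical statistics should approach the requested means and standard deviations, while any single coding remains random.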

The cost function is composed of two parts:
1.  **Reconstruction loss:** forces the autoencoder to reproduce its inputs.
2.  **Latent loss:** pushes the autoencoder to have codings that look as though they were sampled from a simple Gaussian distribution. It is the KL divergence between the target Gaussian distribution and the actual distribution of the codings.
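Assuming the encoder outputs a mean $\mu_i$ and a log-variance $\gamma_i = \log(\sigma_i^2)$ for each of the $K$ coding dimensions, and that the target is a standard Gaussian, the latent loss for one instance can be written as:

$$\mathcal{L}_{\text{latent}} = -\frac{1}{2} \sum_{i=1}^{K} \left[ 1 + \gamma_i - \exp(\gamma_i) - \mu_i^2 \right]$$

The plain-Python sketch below evaluates this expression for a few hand-picked codings. The function name `latent_loss` and the example values are illustrative assumptions.

```python
import math


def latent_loss(mean, log_var):
    """KL divergence from N(mean, exp(log_var)) to N(0, 1), summed over dimensions."""
    # Same expression as the formula, with the leading minus sign distributed.
    return 0.5 * sum(math.exp(lv) + m ** 2 - 1 - lv for m, lv in zip(mean, log_var))


print(latent_loss([0.0, 0.0], [0.0, 0.0]))          # 0.0: codings already standard Gaussian
print(latent_loss([1.0, 0.0], [0.0, 0.0]))          # 0.5: penalty for a shifted mean
print(latent_loss([0.0, 0.0], [math.log(4.0), 0.0]))  # positive: penalty for a variance of 4
```

The loss is zero only when every dimension has mean 0 and variance 1, and it grows as the codings drift away from that target.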

In [None]:
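class Sampling(keras.layers.Layer):
    """Samples a coding from N(mean, sigma^2), where log_var = log(sigma^2)."""
    def call(self, inputs):
        mean, log_var = inputs
        # Reparameterization trick: sigma = exp(log_var / 2), epsilon ~ N(0, 1)
        epsilon = keras.backend.random_normal(tf.shape(log_var))
        return epsilon * tf.exp(log_var / 2) + mean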

codings_size = 10

inputs = keras.layers.Input(shape=[28, 28])
z = keras.layers.Flatten()(inputs)
z = keras.layers.Dense(150, activation="selu")(z)
z = keras.layers.Dense(100, activation="selu")(z)
codings_mean = keras.layers.Dense(codings_size)(z)
codings_log_var = keras.layers.Dense(codings_size)(z)
codings = Sampling()([codings_mean, codings_log_var])
variational_encoder = keras.models.Model(
    inputs=[inputs], outputs=[codings_mean, codings_log_var, codings])

decoder_inputs = keras.layers.Input(shape=[codings_size])
x = keras.layers.Dense(100, activation="selu")(decoder_inputs)
x = keras.layers.Dense(150, activation="selu")(x)
x = keras.layers.Dense(28 * 28, activation="sigmoid")(x)
outputs = keras.layers.Reshape([28, 28])(x)
variational_decoder = keras.models.Model(inputs=[decoder_inputs], outputs=[outputs])

_, _, codings = variational_encoder(inputs)
reconstructions = variational_decoder(codings)
vae = keras.models.Model(inputs=[inputs], outputs=[reconstructions])

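# KL-divergence latent loss for each instance, summed over the coding dimensions
latent_loss = -0.5 * keras.backend.sum(
    1 + codings_log_var - keras.backend.exp(codings_log_var) - keras.backend.square(codings_mean),
    axis=-1)
# Keras averages binary cross-entropy over the 784 pixels instead of summing it,
# so the latent loss is divided by 784 to keep the two terms on the same scale
vae.add_loss(keras.backend.mean(latent_loss) / 784.)
vae.compile(loss="binary_crossentropy", optimizer="rmsprop")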
history = vae.fit(X_train, X_train, epochs=25, batch_size=128,
                  validation_data=(X_valid, X_valid))

## 8. Generative Adversarial Networks (GANs)

GANs were proposed in a 2014 paper by Ian Goodfellow et al. They consist of two networks:
* **Generator:** Takes random noise as input (typically sampled from a Gaussian distribution) and outputs some data (e.g., an image). Its goal is to trick the discriminator into believing the output is real.
* **Discriminator:** Takes either a fake image from the generator or a real image from the training set and outputs the estimated probability that the image is real. Its goal is to distinguish real from fake.

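Training is a zero-sum game between two networks with opposing goals: the discriminator tries to get better at telling real images from fakes, while the generator tries to produce images that the discriminator misclassifies as real. Each training iteration has two phases: the discriminator is trained first, then the generator.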
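The zero-sum game described above is usually written as the minimax objective from the original GAN paper, where $D(\mathbf{x})$ is the discriminator's estimated probability that $\mathbf{x}$ is real and $G(\mathbf{z})$ is the image generated from noise $\mathbf{z}$:

$$\min_G \max_D \; \mathbb{E}_{\mathbf{x} \sim p_{\text{data}}}\left[\log D(\mathbf{x})\right] + \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}}\left[\log\left(1 - D(G(\mathbf{z}))\right)\right]$$

The discriminator maximizes this quantity by pushing $D(\mathbf{x})$ toward 1 and $D(G(\mathbf{z}))$ toward 0, while the generator minimizes it by pushing $D(G(\mathbf{z}))$ toward 1.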

### Implementation (Simple GAN for Fashion MNIST)

In [None]:
codings_size = 30

generator = keras.models.Sequential([
    keras.layers.Dense(100, activation="selu", input_shape=[codings_size]),
    keras.layers.Dense(150, activation="selu"),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])

discriminator = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(150, activation="selu"),
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(1, activation="sigmoid")
])

gan = keras.models.Sequential([generator, discriminator])

discriminator.compile(loss="binary_crossentropy", optimizer="rmsprop")
discriminator.trainable = False # To train the generator without affecting the discriminator
gan.compile(loss="binary_crossentropy", optimizer="rmsprop")

dataset = tf.data.Dataset.from_tensor_slices(X_train).shuffle(1000)
dataset = dataset.batch(32, drop_remainder=True).prefetch(1)

def train_gan(gan, dataset, batch_size, codings_size, n_epochs=5):
    generator, discriminator = gan.layers
    for epoch in range(n_epochs):
        print(f"Epoch {epoch + 1}/{n_epochs}")
        for X_batch in dataset:
            # Phase 1 - Train Discriminator
            noise = tf.random.normal(shape=[batch_size, codings_size])
            generated_images = generator(noise)
            X_fake_and_real = tf.concat([generated_images, X_batch], axis=0)
            y1 = tf.constant([[0.]] * batch_size + [[1.]] * batch_size)
            discriminator.trainable = True
            discriminator.train_on_batch(X_fake_and_real, y1)
            
            # Phase 2 - Train Generator
            noise = tf.random.normal(shape=[batch_size, codings_size])
            y2 = tf.constant([[1.]] * batch_size) # We want discriminator to think these are real
            discriminator.trainable = False
            gan.train_on_batch(noise, y2)

train_gan(gan, dataset, 32, codings_size)
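The two training phases above differ only in which labels are paired with the discriminator's outputs. The plain-Python sketch below illustrates this with a hand-written binary cross-entropy and made-up discriminator outputs. The function `binary_cross_entropy`, its clipping constant, and the probabilities are assumptions for illustration, not the Keras implementation.

```python
import math


def binary_cross_entropy(targets, probabilities, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(targets, probabilities):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(targets)


# Hypothetical discriminator outputs: estimated probability that each image is real.
d_on_fake = [0.1, 0.3]  # D(G(z)) for two generated images
d_on_real = [0.9, 0.8]  # D(x) for two real images

# Phase 1: fakes are labeled 0 and real images 1, as in y1.
discriminator_loss = binary_cross_entropy([0.0, 0.0, 1.0, 1.0], d_on_fake + d_on_real)

# Phase 2: the same fakes are labeled 1, as in y2, so the loss is the mean of -log D(G(z)).
generator_loss = binary_cross_entropy([1.0, 1.0], d_on_fake)

print(f"discriminator loss: {discriminator_loss:.3f}")
print(f"generator loss:     {generator_loss:.3f}")
```

With these numbers the discriminator is doing well, so its loss is low while the generator's loss is high. The generator lowers its loss only by producing images that raise $D(G(\mathbf{z}))$.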