# Stacked Autoencoders

**Autoencoders** are artificial neural networks that learn dense representations of the input data without any supervision. The dense representations are called *latent representations* or *codings*. The codings typically have much lower dimensionality than the input data, which makes autoencoders useful for dimensionality reduction. Autoencoders can also serve as feature detectors (feature extraction), as a means of unsupervised pretraining for deep neural networks, and as generative models. As generative models, they can randomly generate new data that looks very similar to the training data.

Simply put, autoencoders are trained in an unsupervised manner to learn a compact representation of an input (the latent representations or codings), which is then used to reconstruct the original input. An autoencoder therefore consists of three components: encoder, latent representations (or codings), and decoder. The encoder compresses the input and produces the codings. The decoder then reconstructs the input from the codings.
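To make the encoder, codings, and decoder roles concrete, here is a minimal NumPy sketch. It is not the Keras model built later in this notebook: the weights are random and untrained, and the sizes (784 inputs, 32 codings) and function names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration: 784 input features, 32 codings.
n_inputs, n_codings = 784, 32

# Untrained, randomly initialized weights (illustration only).
w_enc = rng.normal(scale=0.01, size=(n_inputs, n_codings))
w_dec = rng.normal(scale=0.01, size=(n_codings, n_inputs))

def encode(x):
    """Compress inputs into codings (ReLU activation)."""
    return np.maximum(x @ w_enc, 0.0)

def decode(codings):
    """Reconstruct inputs from codings (sigmoid keeps outputs in [0, 1])."""
    return 1.0 / (1.0 + np.exp(-(codings @ w_dec)))

x = rng.random((4, n_inputs))   # a batch of 4 fake "images"
codings = encode(x)
reconstruction = decode(codings)

print(codings.shape)         # (4, 32)
print(reconstruction.shape)  # (4, 784)
```

Training would adjust both weight matrices so that the reconstruction gets close to the input.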

Resources:

https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798

https://www.tensorflow.org/tutorials/generative/autoencoder

https://blog.keras.io/building-autoencoders-in-keras.html

https://www.datacamp.com/community/tutorials/autoencoder-keras-tutorial

# Import **tensorflow** library

Import library and alias it:

In [None]:
import tensorflow as tf

# GPU Hardware Accelerator

To vastly speed up processing, we can use the GPU available from the Google Colab cloud service. Colab has offered a free Tesla K80 GPU with about 12 GB of memory, though the GPU model assigned can vary. It’s very easy to enable the GPU in a Colab notebook:

1. click **Runtime** in the top left menu
2. click **Change runtime type** from the drop-down menu
3. choose **GPU** from the Hardware accelerator drop-down menu
4. click **SAVE**

Verify that GPU is active:

In [None]:
tf.__version__, tf.test.gpu_device_name()

# Stacked Autoencoders

**Stacked autoencoders** have multiple hidden layers. The architecture is typically symmetrical with regard to the central hidden layer, which is called the *coding layer*.

## Load Data

Load Fashion-MNIST as NumPy arrays:

In [None]:
import tensorflow_datasets as tfds

(x_train_img, _), (x_test_img, _) = tfds.as_numpy(
    tfds.load('fashion_mnist', split=['train','test'],
              batch_size=-1, as_supervised=True,
              try_gcs=True))

Notice that we discard the labels because autoencoders are trained without supervision.

## Scale

Scale pixel intensities to the range 0 to 1 by dividing the datasets by 255, the maximum intensity of an 8-bit pixel:

In [None]:
import numpy as np

x_train, x_test = x_train_img.astype(np.float32) / 255.,\
                  x_test_img.astype(np.float32) / 255.
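As a quick check of the scaling step, this small sketch applies the same division to a tiny made-up array. The pixel values are invented for illustration and are not taken from Fashion-MNIST.

```python
import numpy as np

# A fake 2 x 2 grayscale "image" with 8-bit intensities.
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Same operation as in the cell above: cast to float32, divide by 255.
scaled = pixels.astype(np.float32) / 255.0

print(scaled.min(), scaled.max())  # 0.0 1.0
print(scaled.dtype)                # float32
```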

## Clear Previous Models and Generate Seed

Clear previous model sessions and generate a seed for reproducibility:

In [None]:
tf.keras.backend.clear_session()
np.random.seed(0)
tf.random.set_seed(0)

## Get Input Shape

Get input shape for use in the model:

In [None]:
in_shape = x_train.shape[1:]
in_shape

## Build Stacked Autoencoder

We split the autoencoder model into an encoder and a decoder, arranged symmetrically around the central coding layer.

Import libraries:

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten,\
  Reshape

In our example, the encoder accepts 28 x 28 pixel grayscale images, flattens them so that each image is represented as a vector of size 784, and processes the vectors through three Dense layers of diminishing sizes (128 units to 64 units to 32 units). The 32 unit layer is the coding layer (central hidden layer). For each input image, the encoder outputs a vector of size 32. 

In [None]:
stacked_encoder = Sequential([
  Flatten(input_shape=in_shape),
  Dense(128, activation='relu'),
  Dense(64, activation='relu'),
  Dense(32, activation='relu')
])

The decoder accepts codings of size 32 (output by the encoder) and processes them through three Dense layers of increasing sizes (64 units to 128 units to 784 units). It then reshapes the final vectors into 28 x 28 arrays so the decoder's outputs have the same shape as the encoder's inputs. 

In [None]:
stacked_decoder = Sequential([
  Dense(64, activation='relu'),
  Dense(128, activation='relu'),
  Dense(28 * 28, activation='sigmoid'),
  Reshape(in_shape)
])

## Create Stacked Autoencoder

Create stacked autoencoder based on stacked encoder and decoder:

In [None]:
stacked_ae = Sequential([stacked_encoder, stacked_decoder])

## Create Appropriate Metric

Create metric to track model performance:

In [None]:
def rounded_accuracy(y_true, y_pred):
    return tf.keras.metrics.binary_accuracy(tf.round(y_true),
                                            tf.round(y_pred))

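The built-in *accuracy* metric won't work properly here because it expects each label to be exactly 0 or 1, whereas our targets are pixel intensities between 0 and 1. Rounding both the targets and the predictions turns them into binary values that can be compared pixel by pixel.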
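The idea behind the metric can be sketched in plain NumPy. This is an analogue for intuition, not the Keras function itself; the name rounded_accuracy_np and the sample values are made up.

```python
import numpy as np

def rounded_accuracy_np(y_true, y_pred):
    """Fraction of pixels whose rounded target and rounded prediction agree."""
    return np.mean(np.round(y_true) == np.round(y_pred))

y_true = np.array([0.1, 0.9, 0.6, 0.4])    # rounds to 0, 1, 1, 0
y_pred = np.array([0.2, 0.8, 0.3, 0.45])   # rounds to 0, 1, 0, 0

print(rounded_accuracy_np(y_true, y_pred))  # 0.75
```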

## Compile

Use **binary crossentropy** as the loss function. We treat the reconstruction task as a multilabel binary classification problem in which each pixel intensity represents the probability that the pixel should be black.

In [None]:
opt = tf.keras.optimizers.SGD(learning_rate=1.5)

stacked_ae.compile(
    loss='binary_crossentropy',
    optimizer=opt, metrics=[rounded_accuracy])
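To see why binary crossentropy also makes sense for fractional pixel targets, here is a hedged NumPy sketch of the per-pixel formula. The function name and the eps clipping value are assumptions added here to avoid log(0); this is not the Keras implementation.

```python
import numpy as np

def binary_crossentropy_np(y_true, y_pred, eps=1e-7):
    """Mean per-pixel binary cross-entropy; targets may be fractional."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred)
                    + (1.0 - y_true) * np.log(1.0 - y_pred))

target = np.array([0.0, 1.0, 0.5])
good = np.array([0.1, 0.9, 0.5])   # close to the target
bad = np.array([0.9, 0.1, 0.5])    # far from the target

loss_good = binary_crossentropy_np(target, good)
loss_bad = binary_crossentropy_np(target, bad)
print(loss_good < loss_bad)  # True
```

The loss shrinks as each predicted intensity approaches its target intensity, which is the behavior we want from a reconstruction loss.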

## Train

Train the model using x_train as both the input and the target. The encoder will learn to compress the dataset from 784 dimensions to the 32-dimensional latent space, and the decoder will learn to reconstruct the original images from the codings.

In [None]:
sae_history = stacked_ae.fit(
    x_train, x_train, epochs=10,
    validation_data=(x_test, x_test))

## Visualize Performance

Import a plotting library:

In [None]:
import matplotlib.pyplot as plt

Create a visualization function:

In [None]:
def viz_history(training_history):
  loss = training_history.history['loss']
  val_loss = training_history.history['val_loss']
  accuracy = training_history.history['rounded_accuracy']
  val_accuracy = training_history.history['val_rounded_accuracy']
  plt.figure(figsize=(14, 4))
  plt.subplot(1, 2, 1)
  plt.title('Loss')
  plt.xlabel('Epoch')
  plt.ylabel('Loss')
  plt.plot(loss, label='Training set')
  plt.plot(val_loss, label='Test set', linestyle='--')
  plt.legend()
  plt.grid(linestyle='--', linewidth=1, alpha=0.5)
  plt.subplot(1, 2, 2)
  plt.title('Accuracy')
  plt.xlabel('Epoch')
  plt.ylabel('Accuracy')
  plt.plot(accuracy, label='Training set')
  plt.plot(val_accuracy, label='Test set', linestyle='--')
  plt.legend()
  plt.grid(linestyle='--', linewidth=1, alpha=0.5)
  plt.show()

Visualize:

In [None]:
viz_history(sae_history)

## Visualize the Reconstructions

Create a function to plot a grayscale 28x28 image:

In [None]:
import matplotlib.pyplot as plt

def plot_image(image):
    plt.imshow(image, cmap='binary')
    plt.axis('off')

Create a function to visualize original images and reconstructions:

In [None]:
def show_reconstructions(model, images, n_images):
  reconstructions = model.predict(images[:n_images])
  reconstructions = tf.squeeze(reconstructions) # drop '1' dimension
  fig = plt.figure(figsize=(n_images * 1.5, 3))
  for image_index in range(n_images):
    plt.subplot(2, n_images, 1 + image_index)
    plot_image(images[image_index])
    plt.subplot(2, n_images, 1 + n_images + image_index)
    plot_image(reconstructions[image_index])

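The predict() function returns reconstructions of shape 28 x 28 x 1 because the decoder reshapes its output to the input shape, so we squeeze out the size-1 dimension before plotting.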

Check dimensionality of test data:

In [None]:
x_test.shape

To visualize with imshow(), we must remove dimensions of size 1 from the shape of a tensor:

In [None]:
x_test_imgs = tf.squeeze(x_test)
x_test_imgs.shape

Visualize:

In [None]:
show_reconstructions(stacked_ae, x_test_imgs, 6)

Reconstructed images are generated from **test images** based on predictions from the trained model.

## Breakdown

Grab an image from the test set:

In [None]:
img = x_test[:1]

Since the prediction method computations are done in batches, we grab the first image as a batch of one.
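The difference between slicing and indexing can be checked on a stand-in array. The array below is made up (all zeros) and only assumed to share the 28 x 28 x 1 image shape of the test data.

```python
import numpy as np

# Stand-in for the test images: 10 fake images of shape 28 x 28 x 1.
fake_test = np.zeros((10, 28, 28, 1), dtype=np.float32)

print(fake_test[:1].shape)              # (1, 28, 28, 1) -> a batch of one
print(fake_test[0].shape)               # (28, 28, 1)    -> a single image, no batch axis
print(np.squeeze(fake_test[:1]).shape)  # (28, 28)       -> all size-1 axes removed
```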

Make a prediction based on the image batch:

In [None]:
reconstruction = stacked_ae.predict(img)

Drop the '1' dimension:

In [None]:
reconstruction = tf.squeeze(reconstruction)

Plot reconstruction:

In [None]:
plot_image(reconstruction)

Plot actual image:

In [None]:
plot_image(tf.squeeze(x_test[0]))

We squeeze out the '1' dimension from the image to plot.

## Visualize with Dimensionality Reduction

Dimensionality reduction itself needs no labels, but we use them to color the points in the visualization by class. So load labels from the **test data** set:

In [None]:
test = tfds.as_numpy(
    tfds.load('fashion_mnist', split=['test'],
              batch_size=-1, as_supervised=True,
              try_gcs=True))

Slice test labels from the test data set:

In [None]:
y_test = test[0][1]

Use the encoder to reduce dimensionality to 32:

In [None]:
from sklearn.manifold import TSNE

np.random.seed(0)
x_test_compressed = stacked_encoder.predict(x_test_imgs)
tsne = TSNE()
x_test_2D = tsne.fit_transform(x_test_compressed)
x_test_2D = (x_test_2D - x_test_2D.min()) /\
  (x_test_2D.max() - x_test_2D.min())

We used Scikit-Learn's implementation of the t-SNE algorithm to reduce dimensionality to 2D for visualization.

Visualize:

In [None]:
plt.scatter(x_test_2D[:, 0], x_test_2D[:, 1],
            c=y_test, s=10, cmap='tab10')
plt.axis('off')
plt.show()

Each class is represented by a different color.

Display a prettier visualization:

In [None]:
import matplotlib as mpl

plt.figure(figsize=(10, 8))
cmap = plt.cm.tab10
plt.scatter(x_test_2D[:, 0], x_test_2D[:, 1],
            c=y_test, s=10, cmap=cmap)
image_positions = np.array([[1., 1.]])
for index, position in enumerate(x_test_2D):
    dist = np.sum((position - image_positions) ** 2, axis=1)
    if np.min(dist) > 0.02: # if far enough from other images
        image_positions = np.r_[image_positions, [position]]
        imagebox = mpl.offsetbox.AnnotationBbox(
            mpl.offsetbox.OffsetImage(x_test_imgs[index],
                                      cmap='binary'),
            position, bboxprops={
                'edgecolor': cmap(y_test[index]), 'lw': 2})
        plt.gca().add_artist(imagebox)
plt.axis('off')
plt.show()

Adapted from https://scikit-learn.org/stable/auto_examples/manifold/plot_lle_digits.html
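The loop above only draws a thumbnail when its point is far enough from every thumbnail already drawn. A minimal sketch of that selection logic, separated from the plotting, might look like this. The helper name pick_spaced_points and the random 2D points are assumptions for illustration.

```python
import numpy as np

def pick_spaced_points(points, min_sq_dist=0.02):
    """Greedily keep indices of points whose squared distance to every
    previously kept point exceeds min_sq_dist."""
    kept_positions = np.array([[1.0, 1.0]])  # same starting sentinel as the cell above
    kept_indices = []
    for index, position in enumerate(points):
        sq_dist = np.sum((position - kept_positions) ** 2, axis=1)
        if np.min(sq_dist) > min_sq_dist:
            kept_positions = np.r_[kept_positions, [position]]
            kept_indices.append(index)
    return kept_indices

rng = np.random.default_rng(0)
points = rng.random((500, 2))  # fake 2D embedding in the unit square
kept = pick_spaced_points(points)

print(len(kept) < len(points))  # True
```

Because the comparison uses squared distances, the threshold 0.02 corresponds to a distance of roughly 0.14 in the normalized plot.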

# Tying Weights

When an autoencoder is neatly symmetrical, we can tie the weights of the decoder layers to the weights of the encoder layers. As a result, we roughly halve the number of weights in the model, which speeds up training and limits the risk of overfitting.
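A short calculation shows where the "roughly half" comes from for the layer sizes used in this notebook (784, 128, 64, 32). It assumes that each tied decoder layer keeps only its own bias vector and borrows its kernel from the encoder; the helper name dense_params is made up.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

# Layer sizes used in this notebook: 784 -> 128 -> 64 -> 32 and back.
sizes = [784, 128, 64, 32]
pairs = list(zip(sizes, sizes[1:]))

encoder = sum(dense_params(a, b) for a, b in pairs)
untied_decoder = sum(dense_params(b, a) for a, b in pairs)
tied_decoder = sum(sizes[:-1])  # biases only; kernels are shared with the encoder

print(encoder + untied_decoder)  # 222384
print(encoder + tied_decoder)    # 111792
```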

## Define a Custom Layer

To tie the weights of the encoder and the decoder, we use the transpose of the encoder's weights as the decoder weights:

In [None]:
class DenseTranspose(tf.keras.layers.Layer):
  def __init__(self, dense, activation=None, **kwargs):
    # initialize the base Layer before assigning attributes
    super().__init__(**kwargs)
    self.dense = dense  # the Dense layer whose kernel is reused
    self.activation = tf.keras.activations.get(activation)
  def build(self, batch_input_shape):
    # only the bias is new; its size is the tied layer's input size
    self.biases = self.add_weight(
        name='bias', shape=[self.dense.input_shape[-1]],
        initializer='zeros')
    super().build(batch_input_shape)
  def call(self, inputs):
    # multiply by the transpose of the tied layer's kernel
    z = tf.matmul(
        inputs, self.dense.weights[0], transpose_b=True)
    return self.activation(z + self.biases)

The class accepts an existing Dense layer from a model and an optional activation function. Instead of creating its own weight matrix, it multiplies its inputs by the transpose of that Dense layer's weight matrix, so the encoder layer and the decoder layer share the same weights. Only the bias vector is new, because the decoder layer's output size equals the encoder layer's input size rather than its output size.

Resource:

https://www.youtube.com/watch?v=QDpeRUIrb6U
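The shape bookkeeping behind the transposed kernel can be sketched in NumPy. This is an illustration with random, untrained values and assumed sizes (784 inputs, 128 units), not the Keras layer itself.

```python
import numpy as np

rng = np.random.default_rng(0)

kernel = rng.normal(size=(784, 128))  # shared kernel, shape (inputs, units)
enc_bias = np.zeros(128)              # encoder-side bias
dec_bias = np.zeros(784)              # decoder-side bias, owned by the tied layer

x = rng.random((2, 784))
hidden = np.maximum(x @ kernel + enc_bias, 0.0)  # encoder: 784 -> 128
restored = hidden @ kernel.T + dec_bias          # tied decoder: 128 -> 784

print(hidden.shape)    # (2, 128)
print(restored.shape)  # (2, 784)
```

The same matrix is used in both directions, so an update to it during training affects the encoder and the decoder at once.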

## Clear Models and Generate Seed

Clear previous model sessions and generate a seed for reproducibility:

In [None]:
tf.keras.backend.clear_session()
np.random.seed(0)
tf.random.set_seed(0)

## Create Dense Layers

Create three dense layers for the model:

In [None]:
dense_1 = Dense(128, activation='relu')
dense_2 = Dense(64, activation='relu')
dense_3 = Dense(32, activation='relu')

## Build the Encoder

Build the encoder with three dense layers:

In [None]:
tied_encoder = Sequential([
  Flatten(input_shape=in_shape),
  dense_1,
  dense_2,
  dense_3
])

## Build the Decoder

Build the decoder and tie weights with the encoder:

In [None]:
tied_decoder = Sequential([
  DenseTranspose(dense_3, activation='relu'),
  DenseTranspose(dense_2, activation='relu'),
  DenseTranspose(dense_1, activation='sigmoid'),
  Reshape([28, 28])
])

## Build Tied Model

Build the model with tied weights between the encoder and decoder:

In [None]:
tied_ae = Sequential([tied_encoder, tied_decoder])

## Compile

Compile with **binary crossentropy**:

In [None]:
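# use a fresh optimizer: optimizer state belongs to one model's variables
tied_ae.compile(loss='binary_crossentropy',
                optimizer=tf.keras.optimizers.SGD(learning_rate=1.5),
                metrics=[rounded_accuracy])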

## Train

Train for ten epochs:

In [None]:
tied_history = tied_ae.fit(
    x_train, x_train, epochs=10,
    validation_data=(x_test, x_test))

Visualize training performance:

In [None]:
viz_history(tied_history)

## Visualize Reconstructions

Show test image reconstructions based on predictions from the trained model:

In [None]:
show_reconstructions(tied_ae, x_test_imgs, 6)
plt.show()

# Denoising Autoencoders

An autoencoder can also be trained to remove noise from images. We can add noise to inputs and train to recover the original noise-free inputs.
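The training setup can be sketched in NumPy: the input is corrupted while the target stays clean. The data here is random and made up, and the noise is precomputed for clarity, whereas the model below adds it inside the network with a noise layer. The standard deviation 0.2 matches the value used later in this section.

```python
import numpy as np

rng = np.random.default_rng(0)

clean = rng.random((1000, 784)).astype(np.float32)  # fake images in [0, 1]
noise = rng.normal(loc=0.0, scale=0.2, size=clean.shape).astype(np.float32)
noisy = clean + noise            # what the network sees as input

inputs, targets = noisy, clean   # the target stays noise-free
print(inputs.shape == targets.shape)  # True
```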

## Clear Model and Generate Seed

Clear previous model sessions and generate a seed for reproducibility:

In [None]:
tf.keras.backend.clear_session()
np.random.seed(0)
tf.random.set_seed(0)

## Build the Encoder with Gaussian Noise

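Add pure Gaussian noise directly in the encoder. The GaussianNoise layer is active only during training. Note that dense_1, dense_2, and dense_3 are the same layer objects used in the tied autoencoder, so they start from their already-trained weights: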

In [None]:
from tensorflow.keras.layers import GaussianNoise

gaussian_encoder = Sequential([
  Flatten(input_shape=in_shape),
  GaussianNoise(0.2),
  dense_1,
  dense_2,
  dense_3
])

## Build the Decoder

Tie the weights of the decoder layers to the weights of the encoder layers:

In [None]:
gaussian_decoder = Sequential([
  DenseTranspose(dense_3, activation='relu'),
  DenseTranspose(dense_2, activation='relu'),
  DenseTranspose(dense_1, activation='sigmoid'),
  Reshape([28, 28])
])

## Build the Denoising Autoencoder

Build the denoising autoencoder from the gaussian encoder and decoder:

In [None]:
gaussian_ae = Sequential([gaussian_encoder, gaussian_decoder])

## Compile

Compile with **binary crossentropy**:

In [None]:
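# use a fresh optimizer: optimizer state belongs to one model's variables
gaussian_ae.compile(
    loss='binary_crossentropy',
    optimizer=tf.keras.optimizers.SGD(learning_rate=1.5),
    metrics=[rounded_accuracy])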

## Train

Train model for ten epochs:

In [None]:
gae_history = gaussian_ae.fit(
    x_train, x_train, epochs=10,
    validation_data=(x_test, x_test))

Visualize training performance:

In [None]:
viz_history(gae_history)

## Visualize Reconstructions

Add the same amount of Gaussian noise to **test** images:

In [None]:
noise = GaussianNoise(0.2)
# training=True is required; otherwise the layer returns its input unchanged
show_reconstructions(
    gaussian_ae, noise(x_test_imgs, training=True), 6)
plt.show()

## Build the Encoder with Dropout

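Add dropout directly into the encoder. During training, dropout corrupts the images by randomly setting a fraction of the input pixels (here 50%) to zero and scaling up the remaining ones.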

In [None]:
from tensorflow.keras.layers import Dropout

tf.keras.backend.clear_session()
np.random.seed(0)
tf.random.set_seed(0)

dropout_encoder = Sequential([
  Flatten(input_shape=in_shape),
  Dropout(0.5),
  dense_1,
  dense_2,
  dense_3
])
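What a dropout rate of 0.5 does to the flattened pixels during training can be sketched in NumPy. This is an illustrative analogue of inverted dropout, not the Keras layer; the function name dropout_np and the all-ones input are assumptions.

```python
import numpy as np

def dropout_np(x, rate, rng):
    """Inverted dropout: zero a random fraction of values, rescale the rest."""
    keep_mask = rng.random(x.shape) >= rate
    return np.where(keep_mask, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
pixels = np.ones((100, 784))           # fake images with every pixel set to 1
dropped = dropout_np(pixels, rate=0.5, rng=rng)

print(np.unique(dropped))  # [0. 2.]
```

About half of the pixels become 0 and the survivors are doubled, so the average intensity stays close to the original.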

## Build the Decoder

Tie the weights of the decoder layers to the weights of the encoder layers:

In [None]:
dropout_decoder = Sequential([
  DenseTranspose(dense_3, activation='relu'),
  DenseTranspose(dense_2, activation='relu'),
  DenseTranspose(dense_1, activation='sigmoid'),
  Reshape([28, 28])
])

As in the previous models, we tie the weights to keep the number of parameters at roughly half that of an untied autoencoder.

## Build the Dropout Autoencoder

Build the autoencoder from the dropout encoder and decoder:

In [None]:
dropout_ae = Sequential([dropout_encoder, dropout_decoder])

## Compile

Compile with **binary crossentropy**:

In [None]:
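# use a fresh optimizer: optimizer state belongs to one model's variables
dropout_ae.compile(
    loss='binary_crossentropy',
    optimizer=tf.keras.optimizers.SGD(learning_rate=1.5),
    metrics=[rounded_accuracy])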

## Train

Train the model for ten epochs:

In [None]:
drop_history = dropout_ae.fit(
    x_train, x_train, epochs=10,
    validation_data=(x_test, x_test))

Visualize performance:

In [None]:
viz_history(drop_history)

## Visualize Reconstructions

Add the same amount of dropout noise to test images:

In [None]:
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

dropout = Dropout(0.5)
# training=True is required; otherwise the layer returns its input unchanged
show_reconstructions(
    dropout_ae, dropout(x_test_imgs, training=True), 6)
plt.show()