Auto-Encoders

A typical use of a Neural Network is supervised learning. The training data contains an output label for every input, and the network tries to learn the mapping from the given input to that label. But what if the output label is replaced by the input vector itself? Then the network will try to find the mapping from the input to itself, which is the identity function: a trivial mapping.

But if the network is not allowed to simply copy the input, it is forced to capture only the salient features of the data. This constraint opens up a new field of applications for Neural Networks. The primary applications are dimensionality reduction and data-specific compression.

The network is trained on the given input. It tries to reconstruct the input from the features it has picked up and outputs an approximation of the input. Each training step computes the reconstruction error and backpropagates it. The typical architecture of an Auto-encoder resembles a bottleneck.

The encoder part of the network is used for encoding and sometimes even for data compression, although it is not very effective compared to general-purpose compression techniques like JPEG. The encoder has a decreasing number of hidden units in each layer, so it is forced to pick up only the most significant and representative features of the data. The second half of the network performs the decoding function: it has an increasing number of hidden units in each layer and tries to reconstruct the original input from the encoded data.
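To make the bottleneck shape concrete, here is a minimal NumPy sketch of a forward pass only, with no training. The layer widths 784 -> 128 -> 32 -> 128 -> 784 and the helper names `init_layers` and `forward` are hypothetical choices for illustration. They are not taken from any particular implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer widths: the encoder shrinks 784 -> 128 -> 32,
# and the decoder mirrors it back 32 -> 128 -> 784 (the "bottleneck" shape).
ENCODER_SIZES = [784, 128, 32]
DECODER_SIZES = [32, 128, 784]

def init_layers(sizes):
    """Create (weight, bias) pairs with small random weights and zero biases."""
    return [(rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, params):
    """Apply each dense layer followed by a ReLU activation."""
    for w, b in params:
        x = np.maximum(0.0, x @ w + b)
    return x

encoder = init_layers(ENCODER_SIZES)
decoder = init_layers(DECODER_SIZES)

x = rng.random((5, 784))        # 5 fake flattened 28x28 images
code = forward(x, encoder)      # compressed (encoded) representation
x_hat = forward(code, decoder)  # reconstruction from the code
print(x.shape, code.shape, x_hat.shape)  # (5, 784) (5, 32) (5, 784)
```

Only the 32-value `code` would need to be stored. The decoder's job is to expand it back to 784 values.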

Because the training target is the input itself, no labels are needed. Thus Auto-encoders are an unsupervised learning technique.

Training of an Auto-encoder for data compression: In a data compression procedure, the most important aspect is how reliably the compressed data can be reconstructed. This requirement dictates the bottleneck structure of the Auto-encoder.

Step 1: Encoding the input data

The Auto-encoder first tries to encode the data using the initialized weights and biases.

Step 2: Decoding the input data

The Auto-encoder tries to reconstruct the original input from the encoded data to test the reliability of the encoding.

Step 3: Backpropagating the error

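After the reconstruction, the loss function (for example, the mean squared error between the input and its reconstruction) is computed to measure the reliability of the encoding. The resulting error is backpropagated to update the weights and biases.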

The above-described training process is repeated until an acceptable level of reconstruction is reached.

After training, only the encoder part of the Auto-encoder is retained. It is then used to encode data of the same type as the training data.
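The three training steps can be sketched end to end in NumPy. This is a minimal illustration, and several choices are made only for the example: a purely linear Auto-encoder without biases, synthetic data lying on a 3-dimensional subspace, mean squared error as the loss, and plain gradient descent with hand-written backpropagation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data that truly lives on a 3-D subspace of R^10, so a
# 3-unit bottleneck can in principle reconstruct it exactly.
Z = rng.normal(size=(200, 3))
A = rng.normal(size=(3, 10)) / np.sqrt(3)
X = Z @ A

# Linear encoder (10 -> 3) and decoder (3 -> 10), no biases for brevity.
W_enc = rng.normal(0.0, 0.1, size=(10, 3))
W_dec = rng.normal(0.0, 0.1, size=(3, 10))
lr = 0.1
losses = []

for step in range(2000):
    # Step 1: encode the input
    code = X @ W_enc
    # Step 2: decode, i.e. reconstruct the input from the code
    X_hat = code @ W_dec
    # Step 3: compute the MSE loss and backpropagate it
    err = X_hat - X
    losses.append(np.mean(err ** 2))
    grad_X_hat = 2.0 * err / err.size   # d(loss)/d(X_hat)
    grad_W_dec = code.T @ grad_X_hat    # chain rule through the decoder
    grad_code = grad_X_hat @ W_dec.T    # error sent back to the code
    grad_W_enc = X.T @ grad_code        # chain rule through the encoder
    W_dec -= lr * grad_W_dec
    W_enc -= lr * grad_W_enc

print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.6f}")

# After training, only the encoder weights are needed to compress new data.
compressed = X[:5] @ W_enc  # shape (5, 3)
```

A real Auto-encoder adds biases and non-linear activations and lets a framework compute the gradients. The loop structure (encode, decode, compute the loss, backpropagate, update) stays the same.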

The different ways to constrain the network are:

Keep small Hidden Layers: If the size of each hidden layer is kept as small as possible, the network is forced to pick up only the representative features of the data, thus encoding it.
Regularization: A penalty term is added to the cost function that encourages the network to learn something other than a copy of the input.
Denoising: Noise is added to the input, and the network is taught to remove it, i.e. to reconstruct the clean data from the noisy version.
Tuning the Activation Functions: The activation functions of various nodes are changed so that most of the nodes are dormant, effectively reducing the size of the hidden layers.
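Two of these constraints, denoising and regularization, only change what goes into the loss. That means they can be sketched without a full network. The function names, the Gaussian noise, and the L1 penalty on the hidden code (one common way to encourage sparse activations) are assumptions for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, noise_std=0.3):
    """Denoising: add Gaussian noise to the *input*; the clean x remains the target."""
    return x + rng.normal(0.0, noise_std, size=x.shape)

def regularized_loss(x, x_hat, code, l1_weight=1e-3):
    """Reconstruction MSE plus an L1 penalty on the hidden code.

    The penalty pushes most hidden activations towards zero, which
    discourages the network from simply copying its input.
    """
    mse = np.mean((x_hat - x) ** 2)
    sparsity = l1_weight * np.mean(np.abs(code))
    return mse + sparsity
```

In a denoising setup, the network would receive `corrupt(x)` as input, but the loss would still compare its output with the clean `x`.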

The different variations of Auto-encoders are:

Denoising Auto-encoder: This type of auto-encoder works on a partially corrupted input and is trained to recover the original, undistorted input. As mentioned above, this method is an effective way to prevent the network from simply copying the input.
Sparse Auto-encoder: This type of auto-encoder typically contains more hidden units than inputs, but only a few are allowed to be active at once. This property is called the sparsity of the network. The sparsity of the network can be controlled by manually zeroing the required hidden units, by tuning the activation functions, or by adding a penalty term to the cost function.
Variational Auto-encoder: This type of auto-encoder makes strong assumptions about the distribution of the latent variables and uses the Stochastic Gradient Variational Bayes estimator in the training process. It assumes that the data is generated by a directed graphical model and learns an approximation $q_{\phi}(z|x)$ to the intractable true posterior $p_{\theta}(z|x)$, where $\phi$ and $\theta$ are the parameters of the encoder and the decoder respectively.
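For the Variational Auto-encoder, a common (but not the only) set of assumptions is a diagonal Gaussian $q_{\phi}(z|x)$ and a standard normal prior $p(z)$. Under those assumptions, two ingredients used in training can be sketched in NumPy:

- the reparameterization trick for sampling $z$;
- the closed-form KL divergence term of the loss.

The helper names are hypothetical, and the encoder is assumed to output a mean `mu` and a log-variance `log_var` per latent dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, sigma^2) as mu + sigma * eps with eps ~ N(0, I).

    Writing the sample this way keeps it differentiable with respect
    to mu and log_var, which is what lets gradients reach the encoder.
    """
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ), summed over the latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)
```

The full training loss would add this KL term to a reconstruction term computed from the decoder's output for the sampled `z`.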



AutoEncoder with TensorFlow 2.0

This tutorial demonstrates how to compress and reconstruct images of handwritten digits (MNIST) by training an AutoEncoder in TensorFlow 2.0, using graph mode execution via tf.function.
An AutoEncoder is a data compression and decompression algorithm implemented with Neural Networks and/or Convolutional Neural Networks. The data is compressed to a bottleneck that has a lower dimension than the initial input. The decompression uses this intermediate representation to regenerate the input image. Let us code up an AutoEncoder using TensorFlow 2.0, which executes eagerly by default, to understand the mechanism of this algorithm. AutoEncoders are considered a good prerequisite for more advanced generative models such as GANs and CVAEs.


In [1]:
# Install TensorFlow 2.0 by using one of the following commands
# For CPU installation
# pip install -q tensorflow==2.0
# For GPU installation (CUDA and CuDNN must be available)
# pip install -q tensorflow-gpu==2.0

# __future__ imports are no-ops on Python 3 (kept for Python 2 compatibility)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function


import tensorflow as tf
print(tf.__version__)


2.3.0


After confirming the installed TF version, import the other dependencies for data preprocessing and define the helper functions shown below. standard_scale standardizes each column (pixel) to zero mean and unit variance, using statistics computed on the training set only. get_random_block_from_data returns a random contiguous mini-batch. This is useful when writing a custom training loop with tf.GradientTape, which records operations to perform AutoDiff (Automatic Differentiation) and compute the gradients.

In [2]:
import numpy as np
import sklearn.preprocessing as prep
import tensorflow.keras.layers as layers

def standard_scale(X_train, X_test):
    # Fit the scaler on the training data only, then apply it to both sets
    preprocessor = prep.StandardScaler().fit(X_train)
    X_train = preprocessor.transform(X_train)
    X_test = preprocessor.transform(X_test)
    return X_train, X_test

def get_random_block_from_data(data, batch_size):
    # randint's upper bound is exclusive: + 1 lets the last full block be drawn
    start_index = np.random.randint(0, len(data) - batch_size + 1)
    return data[start_index:(start_index + batch_size)]
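
For readers who want to see what these helpers do without scikit-learn, here is a NumPy-only sketch. It is not used by the rest of the tutorial, and both function names are hypothetical.

- `standard_scale_np` mirrors StandardScaler's default behavior: it uses the population standard deviation and leaves constant columns (such as always-black MNIST border pixels) unscaled instead of dividing by zero.
- `get_random_block_np` uses `len(data) - batch_size + 1` as the exclusive upper bound, so the last full block can also be drawn.

```python
import numpy as np

def standard_scale_np(X_train, X_test):
    """Standardize each column using statistics from the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)      # population std (ddof=0), like StandardScaler
    std[std == 0.0] = 1.0          # constant columns: avoid division by zero
    return (X_train - mean) / std, (X_test - mean) / std

def get_random_block_np(data, batch_size, rng):
    """Return a random contiguous block of batch_size rows."""
    start = rng.integers(0, len(data) - batch_size + 1)  # upper bound is exclusive
    return data[start:start + batch_size]
```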


The intermediate representation of an AutoEncoder, also known as the compressed representation, is usually lossy. This dimensionality reduction is still useful in many applications where an exact, lossless reconstruction is not required. In other words, the encoder part of the AutoEncoder learns a dense, low-dimensional representation of the data. Here we use the TensorFlow Subclassing API to define custom layers for the encoder and decoder.

In [3]:
class Encoder(tf.keras.layers.Layer):
	'''Encodes a digit from the MNIST dataset'''
	
	def __init__(self,
				n_dims,
				name ='encoder',
				**kwargs):
		super(Encoder, self).__init__(name = name, **kwargs)
		self.n_dims = n_dims
		self.n_layers = 1
		self.encode_layer = layers.Dense(n_dims, activation ='relu')
		
	@tf.function	
	def call(self, inputs):
		return self.encode_layer(inputs)

class Decoder(tf.keras.layers.Layer):
	'''Decodes a digit from the MNIST dataset'''

	def __init__(self,
				n_dims,
				name ='decoder',
				**kwargs):
		super(Decoder, self).__init__(name = name, **kwargs)
		self.n_dims = n_dims
		self.n_layers = len(n_dims)
		self.decode_middle = layers.Dense(n_dims[0], activation ='relu')
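		# Linear output: standardized inputs can be negative or larger than 1,
		# which a sigmoid (range 0 to 1) could not reproduce
		self.recon_layer = layers.Dense(n_dims[1], activation = None)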
		
	@tf.function	
	def call(self, inputs):
		x = self.decode_middle(inputs)
		return self.recon_layer(x)


We then extend tf.keras.Model to define a custom model that uses the custom layers defined above to form the AutoEncoder. The call method is overridden: it defines the forward pass that runs when data is passed to the model object. Notice the @tf.function decorator. It traces the Python function into a TensorFlow graph, which can speed up execution.

In [4]:
class Autoencoder(tf.keras.Model):
	'''Vanilla Autoencoder for MNIST digits'''
	
	def __init__(self,
				n_dims =[200, 392, 784],
				name ='autoencoder',
				**kwargs):
		super(Autoencoder, self).__init__(name = name, **kwargs)
		self.n_dims = n_dims
		self.encoder = Encoder(n_dims[0])
		self.decoder = Decoder([n_dims[1], n_dims[2]])
		
	@tf.function	
	def call(self, inputs):
		x = self.encoder(inputs)
		return self.decoder(x)
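
As a quick sanity check on model size, the number of trainable parameters follows directly from the layer widths. Each Dense layer has n_in x n_out weights plus n_out biases. The sketch below assumes flattened 784-pixel inputs, as prepared later in this tutorial; `dense_params` is a hypothetical helper.

```python
def dense_params(n_in, n_out):
    """A Dense layer has an n_in x n_out weight matrix plus n_out biases."""
    return n_in * n_out + n_out

# Layer widths of Autoencoder([200, 392, 784]) on flattened 784-pixel inputs:
# encoder 784 -> 200, decoder 200 -> 392 -> 784.
layer_sizes = [784, 200, 392, 784]
counts = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
print(counts, sum(counts))  # [157000, 78792, 308112] 543904
```

The 200-unit code is 784 / 200 = 3.92 times smaller than the input, which is the compression this particular architecture provides.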



The following code block loads MNIST, flattens each 28x28 image into a 784-dimensional vector and standardizes the pixels before training the AutoEncoder.

In [5]:
mnist = tf.keras.datasets.mnist

# Labels are discarded: the AutoEncoder learns to reconstruct its own input
(X_train, _), (X_test, _) = mnist.load_data()

# Flatten each 28x28 image into a 784-dimensional float32 vector
X_train = X_train.reshape(X_train.shape[0], -1).astype(np.float32)
X_test = X_test.reshape(X_test.shape[0], -1).astype(np.float32)

X_train, X_test = standard_scale(X_train, X_test)


It is TensorFlow best practice to use tf.data.Dataset to build shuffled mini-batches from tensor slices for training. Shuffle before batching, so that individual examples (not just whole batches) are reordered each epoch. The following code block demonstrates the use of tf.data and also defines the hyperparameters for training the AutoEncoder model.

In [7]:
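# Shuffle individual examples first, then group them into batches
train_data = tf.data.Dataset.from_tensor_slices(
		X_train).shuffle(buffer_size = 1024).batch(128)
# The test set is only evaluated, so it does not need shuffling
test_data = tf.data.Dataset.from_tensor_slices(
		X_test).batch(128)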

n_samples = int(len(X_train) + len(X_test))
training_epochs = 20
batch_size = 128
display_step = 1

optimizer = tf.optimizers.Adam(learning_rate = 0.01)
mse_loss = tf.keras.losses.MeanSquaredError()
loss_metric = tf.keras.metrics.Mean()
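
The order of shuffle and batch matters. The following NumPy analogy contrasts the two orders. It is not tf.data itself, and the toy data and batch size are illustrative:

- batching first only permutes whole batches;
- shuffling first mixes individual examples across batches.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(12)  # 12 toy "examples"
batch = 4

# batch-then-shuffle: only the ORDER of batches changes;
# each batch still holds the same consecutive examples.
batches = [data[i:i + batch] for i in range(0, len(data), batch)]
order = rng.permutation(len(batches))
batch_then_shuffle = [batches[i] for i in order]

# shuffle-then-batch: examples are mixed across batches.
shuffled = rng.permutation(data)
shuffle_then_batch = [shuffled[i:i + batch] for i in range(0, len(shuffled), batch)]

print(batch_then_shuffle)
print(shuffle_then_batch)
```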



We have completed every prerequisite to train our AutoEncoder model! All we have left to do is to create an AutoEncoder object, compile it with the optimizer and loss, and call model.fit on it with the hyperparameters defined above. Voila! You can see the loss decreasing as the AutoEncoder improves its reconstructions!

In [None]:
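ae = Autoencoder([200, 392, 784])
# Reconstruction is a regression task: use MSE, not categorical cross-entropy
ae.compile(optimizer = optimizer, loss = mse_loss)
# The input doubles as the target
ae.fit(X_train, X_train, batch_size = batch_size, epochs = training_epochs)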
