# Chapter 14: Autoencoders

An autoencoder is a neural network that is trained to attempt to copy its input to its output

We can view it as having two parts: the <b>Encoder</b> and the <b>Decoder</b><br>
Encoder: $h = f(x)$<br>
Decoder: $r = g(h)$
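
As a rough illustration of the two parts, here is a minimal sketch in plain Python. The sigmoid encoder, the linear decoder, and all parameter values (`W`, `b`, `V`, `c`) are assumptions chosen for the example; the notes above only require some functions $f$ and $g$.

```python
import math

def encode(x, W, b):
    """Encoder f: map input x to code h = sigmoid(W x + b)."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + bi)))
        for row, bi in zip(W, b)
    ]

def decode(h, V, c):
    """Decoder g: map code h to reconstruction r = V h + c (linear)."""
    return [sum(v * hi for v, hi in zip(row, h)) + ci for row, ci in zip(V, c)]

# Hypothetical hand-picked parameters: 3-dimensional input, 2-dimensional code.
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b = [0.0, 0.1]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
c = [0.0, 0.0, 0.0]

x = [1.0, 2.0, 3.0]
h = encode(x, W, b)   # code: h = f(x)
r = decode(h, V, c)   # reconstruction: r = g(h)
```

With untrained parameters `r` will not resemble `x`; training adjusts the parameters so that $g(f(x))$ approximates $x$.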

<b>Traditionally, autoencoders were used for dimensionality reduction or
feature learning. Recently, theoretical connections between autoencoders and
latent variable models have brought autoencoders to the forefront of generative
modeling</b>

## 14.1: Undercomplete Autoencoders

An autoencoder whose code dimension is less than the input dimension is called undercomplete

An autoencoder trained to perform the copying task can fail to learn anything useful about
the dataset if the capacity of the autoencoder is allowed to become too great
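
To make the undercomplete case concrete, the sketch below trains a linear autoencoder with a 1-dimensional code on 2-dimensional toy data that lies on a line. The data, initial weights, learning rate, and epoch count are all hypothetical choices for illustration. Because the code is smaller than the input, the model can only reconstruct well by discovering the direction along which the data varies.

```python
# Toy data on a 1-D line inside 2-D space: x = (t, 2t).
data = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

w = [0.1, 0.1]   # encoder weights: h = w . x  (code dimension 1 < input dimension 2)
v = [0.1, 0.1]   # decoder weights: r = v * h
lr = 0.01        # hypothetical learning rate

def mean_loss(w, v):
    """Mean squared reconstruction error over the toy data set."""
    total = 0.0
    for x in data:
        h = w[0] * x[0] + w[1] * x[1]
        total += sum((v[j] * h - x[j]) ** 2 for j in range(2))
    return total / len(data)

initial_loss = mean_loss(w, v)

for epoch in range(1000):
    for x in data:
        h = w[0] * x[0] + w[1] * x[1]
        e = [v[j] * h - x[j] for j in range(2)]      # reconstruction error r - x
        grad_h = 2.0 * (e[0] * v[0] + e[1] * v[1])   # dL/dh
        for j in range(2):
            v[j] -= lr * 2.0 * e[j] * h              # gradient step on decoder
            w[j] -= lr * grad_h * x[j]               # gradient step on encoder

final_loss = mean_loss(w, v)
```

After training, `final_loss` should be far below `initial_loss`, since a single code unit is enough to describe data that varies along one direction.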

## 14.2: Regularized Autoencoders

A regularized autoencoder can be nonlinear and overcomplete but still learn something useful about the data distribution, even if the model capacity is great enough to learn a trivial identity function

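Nearly any generative model with latent variables and an inference procedure (for example, a variational autoencoder) can also be viewed as a form of autoencoder.
The encodings of such models are naturally useful because the models were trained to approximately maximize
the probability of the training data rather than to copy the input to the output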

## 14.2.1: Sparse Autoencoders

A sparse autoencoder is simply an autoencoder whose training criterion involves a
sparsity penalty Ω(h) on the code layer h, in addition to the reconstruction error
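
A minimal sketch of such a training criterion follows. It assumes squared error for the reconstruction term and an L1 penalty $\Omega(h) = \lambda \sum_i |h_i|$ for the sparsity term; both are common choices but are assumptions here, as are the function name and the default `lam`.

```python
def sparse_autoencoder_loss(x, r, h, lam=0.1):
    """Reconstruction error plus sparsity penalty on the code.

    x: input vector, r: reconstruction g(f(x)), h: code f(x).
    lam is the (hypothetical) weight of the L1 sparsity penalty.
    """
    reconstruction = sum((xi - ri) ** 2 for xi, ri in zip(x, r))
    sparsity = lam * sum(abs(hi) for hi in h)   # Omega(h) = lam * sum |h_i|
    return reconstruction + sparsity

# A perfect reconstruction still pays for a non-sparse code.
loss = sparse_autoencoder_loss([1.0, 2.0], [1.0, 2.0], [0.5, -0.5], lam=0.1)
```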

Sparse autoencoders are typically used to learn features for another task, such
as classification

We can think of the penalty Ω(h) simply as a regularizer term added to
a feedforward network whose primary task is to copy the input to the output
(unsupervised learning objective) and possibly also perform some supervised task
(with a supervised learning objective) that depends on these sparse features

Rather than thinking of the sparsity penalty as a regularizer for the copying
task, we can think of the entire sparse autoencoder framework as approximating maximum likelihood training of a generative model that has latent variables
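
One way to see this connection, sketched here under the assumption of a factorized Laplace prior over the code (the prior is an illustrative choice, not something fixed by the notes above): the joint log-likelihood decomposes as

$$\log p_{\text{model}}(h, x) = \log p_{\text{model}}(h) + \log p_{\text{model}}(x \mid h)$$

and with $p_{\text{model}}(h_i) = \frac{\lambda}{2} e^{-\lambda |h_i|}$ the prior term becomes

$$-\log p_{\text{model}}(h) = \sum_i \left( \lambda |h_i| - \log \frac{\lambda}{2} \right) = \Omega(h) + \text{const}$$

so an absolute-value sparsity penalty $\Omega(h) = \lambda \sum_i |h_i|$ plays the role of the negative log-prior, while the reconstruction error plays the role of $-\log p_{\text{model}}(x \mid h)$.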

## 14.2.2: Denoising Autoencoders

Rather than adding a penalty Ω to the cost function, we can obtain an autoencoder
that learns something useful by changing the reconstruction error term of the cost
function

Traditionally, autoencoders minimize some function: $L(x, g(f(x)))$

A denoising autoencoder (DAE) instead minimizes: $L(x, g(f(\tilde{x})))$ <br>
where $\tilde{x}$ is a copy of $x$ that has been corrupted by some form of noise. Denoising
autoencoders must therefore undo this corruption rather than simply copying their
input.
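
The difference between the two objectives can be sketched in a few lines of Python. Masking noise (zeroing components at random) and squared error are assumptions made for this example, and `corrupt`, `squared_error`, and `denoising_loss` are hypothetical helper names.

```python
import random

def corrupt(x, drop_prob, rng):
    """Masking noise: set each component to 0 with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else xi for xi in x]

def squared_error(x, r):
    return sum((xi - ri) ** 2 for xi, ri in zip(x, r))

def denoising_loss(x, f, g, drop_prob, rng):
    """L(x, g(f(x_tilde))): encode the corrupted input, compare with the clean x."""
    x_tilde = corrupt(x, drop_prob, rng)
    return squared_error(x, g(f(x_tilde)))

identity = lambda z: z          # a trivial "copy the input" autoencoder
rng = random.Random(0)
x = [1.0, 2.0, 3.0]

clean_loss = denoising_loss(x, identity, identity, 0.0, rng)   # no corruption
noisy_loss = denoising_loss(x, identity, identity, 1.0, rng)   # everything masked
```

With no corruption the identity map has zero loss, but once the input is corrupted the identity map is penalized, which is why copying is no longer a solution.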

## 14.2.3: Regularizing by Penalizing Derivatives

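Another strategy is to use a penalty that depends on the derivatives of the code with respect to the input: $\Omega(h, x) = \lambda \sum_i \|\nabla_x h_i\|^2$ <br>
This forces the model to learn a function that does not change much when $x$ changes slightly. An autoencoder regularized in this way is called a contractive autoencoder (CAE); see Section 14.7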

## 14.3: Representational Power, Layer Size and Depth

A deep autoencoder, with at least one additional hidden layer inside the encoder itself, can approximate any mapping from input to code arbitrarily well, given enough hidden units

Depth can exponentially reduce the computational cost of representing some
functions. Depth can also exponentially decrease the amount of training data
needed to learn some functions

## 14.4: Stochastic Encoders and Decoders

## 14.5: Denoising Autoencoders

The denoising autoencoder (DAE) is an autoencoder that receives a corrupted
data point as input and is trained to predict the original, uncorrupted data point
as its output
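
Written as an objective, and using $C(\tilde{x} \mid x)$ as notation for the corruption process (a symbol introduced here for illustration), training can be viewed as approximately minimizing

$$-\mathbb{E}_{x \sim \hat{p}_{\text{data}}(x)} \, \mathbb{E}_{\tilde{x} \sim C(\tilde{x} \mid x)} \log p_{\text{decoder}}(x \mid h = f(\tilde{x}))$$

In words: sample a training example $x$, sample a corrupted version $\tilde{x}$ from $C(\tilde{x} \mid x)$, and use the pair $(x, \tilde{x})$ as one training example for predicting $x$ from $\tilde{x}$.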

## 14.5.1: Estimating the Score

## 14.5.1.1: Historical Perspective

## 14.6: Learning Manifolds with Autoencoders

Like many other machine learning algorithms, autoencoders exploit the idea
that data concentrates around a low-dimensional manifold or a small set of such
manifolds

## 14.7: Contractive Autoencoders

Denoising autoencoders make the reconstruction function resist small but
finite-sized perturbations of the input, while contractive autoencoders make the
feature extraction function resist infinitesimal perturbations of the input
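
For a concrete sense of what resisting infinitesimal perturbations means, the sketch below computes the squared Frobenius norm of the encoder Jacobian, $\left\| \frac{\partial f(x)}{\partial x} \right\|_F^2$, which is the quantity a contractive penalty shrinks. It assumes a single-layer sigmoid encoder $h = \sigma(Wx + b)$, for which $\partial h_i / \partial x_j = h_i (1 - h_i) W_{ij}$; the function name and parameters are hypothetical.

```python
import math

def contractive_penalty(x, W, b):
    """Squared Frobenius norm of dh/dx for a sigmoid encoder h = sigmoid(W x + b)."""
    penalty = 0.0
    for row, bi in zip(W, b):
        a = sum(w * xi for w, xi in zip(row, x)) + bi
        h = 1.0 / (1.0 + math.exp(-a))
        # Row i of the Jacobian is h_i * (1 - h_i) * W[i], so its squared norm is:
        penalty += (h * (1.0 - h)) ** 2 * sum(w * w for w in row)
    return penalty

# One hidden unit with pre-activation 0, so h = 0.5 and h * (1 - h) = 0.25.
p = contractive_penalty([0.0, 0.0], [[1.0, 0.0]], [0.0])
```

In training, this value would be scaled by a coefficient and added to the reconstruction error.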

## 14.8: Predictive Sparse Decomposition

Predictive sparse decomposition (PSD) is a model that is a hybrid of sparse
coding and parametric autoencoders
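
One common way to write the PSD training objective, with $\lambda$ and $\gamma$ as weighting coefficients (notation introduced here for illustration), is

$$\|x - g(h)\|^2 + \lambda |h|_1 + \gamma \|h - f(x)\|^2$$

The first two terms are the sparse coding part, where $h$ is optimized directly as a free variable. The third term trains the parametric encoder $f$ to predict that optimized code, so that at test time $f(x)$ can be computed without running an optimization.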

## 14.9: Applications of Autoencoders

Lower-dimensional representations can improve performance on many tasks,
such as classification. Models of smaller spaces also consume less memory and runtime

One task that benefits even more than usual from dimensionality reduction is
information retrieval, the task of finding entries in a database that resemble a
query entry
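
A small sketch of retrieval over low-dimensional codes is shown below. It assumes the encoder output has been reduced to short binary codes and that similarity is measured by Hamming distance; the codes, the entry names, and the helper functions are all made up for the example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(ai != bi for ai, bi in zip(a, b))

def retrieve(query_code, database, k=2):
    """Return the names of the k entries whose codes are closest to the query."""
    ranked = sorted(database.items(), key=lambda item: hamming(query_code, item[1]))
    return [name for name, _ in ranked[:k]]

# Hypothetical 4-bit codes, as if produced by thresholding an encoder's output.
database = {
    "doc_a": (0, 0, 1, 1),
    "doc_b": (1, 1, 0, 0),
    "doc_c": (0, 1, 1, 1),
}

result = retrieve((0, 0, 1, 1), database, k=2)
```

Here `doc_a` matches the query exactly and `doc_c` differs in one bit, so those two are returned. Comparing short codes is much cheaper than comparing the original high-dimensional entries.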