# Autoencoders

<img src="autoencoder.png" width="550px;" alt="Example of the autoencoder algorithm." />

The __Autoencoder__ is an unsupervised learning algorithm. It uses a _hidden layer_ to encode its inputs and then decodes the _hidden layer_ to produce an output. Training consists of finding the weights that let the hidden layer reproduce the given input as closely as possible. While this initially sounds redundant, the __Autoencoder__ has many purposes; the most common are:

* Dimensionality Reduction
* Feature Extraction
* Feature Engineering
* File Compression

The training process is not covered in detail here because it closely resembles that of other artificial neural networks. The algorithm is fed multiple inputs and performs backpropagation to adjust the weights. The most common cost function is __Mean-Squared Error__, computed between the input and its reconstruction.
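To make the encode, decode, and backpropagation steps concrete, here is a minimal NumPy sketch of a one-hidden-layer autoencoder trained with plain gradient descent on Mean-Squared Error. The sigmoid hidden layer, the linear output layer, and every name and hyperparameter (`train_autoencoder`, `n_hidden`, `lr`, `epochs`) are assumptions chosen for illustration, not a prescribed implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden=2, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer autoencoder with gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n_inputs = X.shape[1]
    W_enc = rng.normal(0.0, 0.1, size=(n_inputs, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, size=(n_hidden, n_inputs))
    b_dec = np.zeros(n_inputs)
    losses = []
    for _ in range(epochs):
        # Forward pass: encode into the hidden layer, then decode.
        H = sigmoid(X @ W_enc + b_enc)
        X_hat = H @ W_dec + b_dec
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))  # reconstruction error

        # Backpropagation of the MSE gradient through both layers.
        grad_out = 2.0 * err / err.size
        grad_W_dec = H.T @ grad_out
        grad_b_dec = grad_out.sum(axis=0)
        grad_hidden = (grad_out @ W_dec.T) * H * (1.0 - H)
        grad_W_enc = X.T @ grad_hidden
        grad_b_enc = grad_hidden.sum(axis=0)

        # Gradient descent step.
        W_enc -= lr * grad_W_enc
        b_enc -= lr * grad_b_enc
        W_dec -= lr * grad_W_dec
        b_dec -= lr * grad_b_dec
    return (W_enc, b_enc, W_dec, b_dec), losses


# Toy data: 50 samples with 4 features, compressed to 2 hidden nodes.
X = np.random.default_rng(1).random((50, 4))
params, losses = train_autoencoder(X)
print(f"reconstruction error: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Note that the target of the loss is the input `X` itself, which is what makes the training unsupervised.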

__NOTE:__ The _hidden layers_ in an __Autoencoder__ are commonly referred to as the _bottleneck_. When an __Autoencoder__ has multiple hidden layers, the hidden layer with the fewest neurons is called the _latent space_ because it holds the most compressed representation of the data. Additionally, the error produced by the cost function is most commonly referred to as the _reconstruction error_.

- - - - 

## Biases

<img src="bias.png" width="400px;" alt="Bias representation within a neural network diagram." />

__Biases__ may also be shown within neural network diagrams as nodes connected through dashed lines. They are simply constants that are factored in by the next layer.
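As a small illustration of why a bias can be drawn as an extra node, the sketch below shows that adding a bias vector gives the same result as appending a constant input of `1` whose outgoing weights are the bias values. The numbers and variable names are made up for the example.

```python
import numpy as np

x = np.array([0.2, 0.7, 0.1])                           # one input sample
W = np.array([[0.5, -0.3], [0.1, 0.8], [-0.6, 0.4]])    # input -> next layer
b = np.array([0.05, -0.02])                             # one bias per receiving node

# Usual formulation: the bias is added by the next layer.
with_bias_vector = x @ W + b

# Diagram formulation: a constant node of value 1 whose weights are the biases.
x_aug = np.append(x, 1.0)
W_aug = np.vstack([W, b])
with_bias_node = x_aug @ W_aug

print(np.allclose(with_bias_vector, with_bias_node))
```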

- - - -

## Efficiency

<img src="stacked.png" width="400px;" alt="Multi-Layered Autoencoder" />

__Stacked Autoencoders__, which are autoencoders with multiple *bottlenecks*, have proven to be very powerful. Results from such encoders have been shown to surpass those of __Deep Belief Networks (DBNs)__.

- - - -

## Overcomplete Hidden Layers

<img src="overcomplete.png" width="400px;" alt="Diagram containing more nodes in the hidden layer than in the input/output layer." />

When using __Autoencoders__ for *feature engineering*, the _bottleneck_ often contains more nodes than the input and output layers, because more nodes allow more features to be detected. Unfortunately, a common problem with using more nodes in the _bottleneck_ is that a node in the _bottleneck_ will often map to a single node in the input layer. When that happens, the _bottleneck_ nodes simply replicate the values of the _input layer_ and do not learn any meaningful features of the dataset. That said, there are several solutions to this problem.
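The copying problem can be demonstrated directly. In the sketch below, a linear autoencoder with 3 inputs and 5 bottleneck nodes is given hand-picked weights (not learned ones) in which the first three bottleneck nodes each copy one input. The reconstruction error is zero even though nothing useful has been learned. The layer sizes and variable names are assumptions for illustration.

```python
import numpy as np

n_inputs, n_hidden = 3, 5

# Encoder: bottleneck node i copies input node i; the extra nodes stay at 0.
W_enc = np.zeros((n_inputs, n_hidden))
W_enc[:, :n_inputs] = np.eye(n_inputs)

# Decoder: read the copied values straight back out.
W_dec = W_enc.T

X = np.random.default_rng(0).random((4, n_inputs))
H = X @ W_enc          # bottleneck activations
X_hat = H @ W_dec      # reconstruction

copy_error = float(np.mean((X_hat - X) ** 2))
print(copy_error)
```

Because this trivial solution already minimizes the plain reconstruction error, training alone gives the network no reason to prefer more meaningful features.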

- - - -

### Sparse Autoencoders

<img src="sparse.png" width="400px;" alt="Only a few nodes are training in the hidden layer." />

The most commonly used type of autoencoder is the __Sparse Autoencoder__. These autoencoders constrain the _bottleneck_ so that only a small subset of its nodes is active for any given input, typically by adding a sparsity penalty to the _loss function_. As a result, the loss function returns a large value if many _bottleneck_ nodes are active at once, which discourages every _bottleneck_ node from directly mapping to an input node.
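One common way to encourage this kind of sparsity is to add an L1 penalty on the bottleneck activations to the reconstruction error. The sketch below assumes that formulation (a KL-divergence penalty is another option); `sparse_loss` and `sparsity_weight` are hypothetical names for illustration.

```python
import numpy as np


def sparse_loss(X, X_hat, H, sparsity_weight=0.1):
    """MSE reconstruction error plus an L1 penalty on bottleneck activations."""
    reconstruction = np.mean((X_hat - X) ** 2)
    sparsity = np.mean(np.abs(H))
    return float(reconstruction + sparsity_weight * sparsity)


X = np.array([[1.0, 0.0], [0.0, 1.0]])
X_hat = X.copy()  # pretend both networks reconstruct perfectly

# Every bottleneck node is active for every sample.
H_dense = np.full((2, 4), 0.9)
# Only one bottleneck node is active per sample.
H_sparse = np.array([[0.9, 0.0, 0.0, 0.0],
                     [0.0, 0.9, 0.0, 0.0]])

dense_value = sparse_loss(X, X_hat, H_dense)
sparse_value = sparse_loss(X, X_hat, H_sparse)
print(dense_value > sparse_value)
```

Both reconstructions are perfect here, so the difference in loss comes entirely from how many bottleneck nodes are active.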

- - - -

### Denoising Autoencoders

<img src="denoise.png" width="550px;" alt="Some of the inputs are turned to 0." />

With __Denoising Autoencoders__, some of the input values are randomly set to 0 before being fed into the autoencoder. Despite this, the loss is still calculated against the unmodified set of inputs. _Bottleneck_ nodes can't simply map to input nodes in this scenario because the corrupted input no longer matches the target, so copying it would leave a large reconstruction error.
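The corruption step and the loss against the clean input can be sketched as follows. The helper names (`corrupt`, `denoising_mse`), the `drop_prob` parameter, and the use of a plain callable as a stand-in for the network are all assumptions made for this example.

```python
import numpy as np


def corrupt(X, drop_prob, rng):
    """Randomly set a fraction of the input values to 0."""
    keep_mask = rng.random(X.shape) >= drop_prob
    return X * keep_mask


def denoising_mse(reconstruct, X, drop_prob, rng):
    """Feed corrupted inputs to the network, but score against the clean inputs."""
    X_noisy = corrupt(X, drop_prob, rng)
    X_hat = reconstruct(X_noisy)
    return float(np.mean((X_hat - X) ** 2))


rng = np.random.default_rng(0)
X = np.ones((100, 8))

# A "network" that just copies its input is penalized for every dropped value.
identity_loss = denoising_mse(lambda Z: Z, X, drop_prob=0.3, rng=rng)
print(identity_loss > 0.0)
```

A network that only copies its input pays a penalty for every value that was zeroed out, so it has to learn relationships between inputs to fill in the missing ones.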

- - - - 

### Contractive Autoencoders

<img src="contractive.png" width="400px;" alt="The contractive autoencoder uses a loss function meant to combat bottleneck nodes directly copying input nodes." />

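__Contractive Autoencoders__ have a special loss function that is tailored to the problem of _bottleneck_ nodes simply mapping to input nodes. In addition to the reconstruction error, it penalizes how strongly the _bottleneck_ activations change in response to small changes in the input.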
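In the usual formulation, the extra term is the squared Frobenius norm of the Jacobian of the bottleneck activations with respect to the input. The sketch below assumes a sigmoid encoder, for which that norm has a simple closed form; `contractive_penalty`, `contractive_loss`, and `penalty_weight` are hypothetical names for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def contractive_penalty(H, W_enc):
    """Mean squared Frobenius norm of dH/dX for a sigmoid encoder.

    For h = sigmoid(x @ W_enc + b), dh_j/dx_i = h_j * (1 - h_j) * W_enc[i, j].
    """
    dh_sq = (H * (1.0 - H)) ** 2           # (n_samples, n_hidden)
    w_sq = np.sum(W_enc ** 2, axis=0)      # (n_hidden,)
    return float(np.mean(dh_sq @ w_sq))    # average over samples


def contractive_loss(X, X_hat, H, W_enc, penalty_weight=0.1):
    """MSE reconstruction error plus the weighted contractive penalty."""
    reconstruction = float(np.mean((X_hat - X) ** 2))
    return reconstruction + penalty_weight * contractive_penalty(H, W_enc)


rng = np.random.default_rng(0)
X = rng.random((5, 3))
W_enc = rng.normal(size=(3, 4))
b_enc = np.zeros(4)
H = sigmoid(X @ W_enc + b_enc)

penalty = contractive_penalty(H, W_enc)
print(penalty)
```

A bottleneck node that copies an input node responds directly to every change in that input, so its Jacobian entries are large and the penalty pushes the network away from that solution.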
