<!-- ## Autoencoders -->

<h1 align='center'> 
    <b>
        <u>Autoencoders</u>
    </b> 
</h1>

**References:**
1. [TensorFlow Docs - Intro to Autoencoders](https://www.tensorflow.org/tutorials/generative/autoencoder) 
2. [Towards Data Science article by Arden Dertat](https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798)
3. [Jeremy Jordan - Intro to Autoencoders](https://www.jeremyjordan.me/autoencoders/)
4. [Autoencoder Feature Extraction - Machine Learning Mastery](https://machinelearningmastery.com/autoencoder-for-classification/)

## What are Autoencoders?

$\rightarrow \textbf{An Autoencoder}$ is a neural network model that seeks to learn a compressed representation of an input.

$\rightarrow \textbf{Autoencoders}$ are a specific type of feedforward neural network **where the target output is the same as the input.** 

- They compress the input into a lower-dimensional **code** and then reconstruct the output from this representation. The **code** is a compact “summary” or “compression” of the input, also called the **latent-space representation.**

- An autoencoder consists of 3 components: **encoder, code, and decoder**
    - **The encoder compresses the input and produces the code; the decoder then reconstructs the input using only this code.**

<div align='center'>
    <img src='images/autoencoders.png' width=400/>
    <img src='images/autoencoder_schema1.png' width=600/>
</div>

- To build an autoencoder we need 3 things: 
    - **an encoding method, decoding method, and a loss function** to compare the output with the target.
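
The three ingredients above can be sketched in a few lines of NumPy. This is only an illustration, not a trained model: the sizes (`input_dim = 8`, `code_size = 3`), the random weights, the sigmoid activation, and the names `encode`, `decode`, and `reconstruction_loss` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, code_size = 8, 3  # hypothetical sizes chosen for illustration

# Randomly initialised (untrained) weights for a one-layer encoder and decoder.
W_enc = rng.normal(scale=0.1, size=(input_dim, code_size))
b_enc = np.zeros(code_size)
W_dec = rng.normal(scale=0.1, size=(code_size, input_dim))
b_dec = np.zeros(input_dim)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def encode(x):
    """Encoding method: compress the input into the lower-dimensional code."""
    return sigmoid(x @ W_enc + b_enc)


def decode(code):
    """Decoding method: reconstruct the input using only the code."""
    return sigmoid(code @ W_dec + b_dec)


def reconstruction_loss(x, x_hat):
    """Loss function: mean squared error between input and reconstruction."""
    return float(np.mean((x - x_hat) ** 2))


x = rng.random((5, input_dim))   # 5 samples with values in [0, 1)
code = encode(x)                 # shape (5, 3): the compressed representation
x_hat = decode(code)             # shape (5, 8): same shape as the input
loss = reconstruction_loss(x, x_hat)
```

Training would then adjust the weights so that `loss` shrinks, i.e. so that `x_hat` gets closer to `x`.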
    
* **

## Properties of Autoencoders

**Autoencoders are mainly a dimensionality reduction (or compression) algorithm** with the following properties:

1. **Data-specific:** Autoencoders are only able to meaningfully compress data similar to what they have been trained on. So we can’t expect an autoencoder trained on handwritten digits to compress landscape photos.

2. **Lossy:** The output of the autoencoder will not be exactly the same as the input; it will be a close but degraded representation. 

3. **Unsupervised:** Autoencoders are considered an unsupervised learning technique since they don’t need explicit labels to train on; we simply pass in the raw input data. To be more precise, they are self-supervised, because they generate their own labels from the training data.

* **

## Architecture of Autoencoders

The **autoencoder** architecture aims to re-create the provided input data with minimal error. It is a dimensionality reduction technique similar to PCA.

So, while training an autoencoder, we pass the same input data as the target. It's like: 

$$\boxed{\large{\text{Autoencoder}(X:x, y:x) \rightarrow \hat{x} \text{ , where } \hat{x} \approx x}}$$ 

<div align='center'>
    <img src='images/autoencoders_ex.png' width=800/>
    <img src='images/autoencoder_schema.png'/>
</div>
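
The idea of passing the same data as both input and target can be sketched with a tiny training loop. This is a simplified example under stated assumptions: the toy data, the purely linear encoder and decoder (no activation functions or biases), the learning rate, and the number of steps are all chosen for illustration, and the gradients are written out by hand for the mean squared error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): 200 samples in 6 dimensions that mostly lie on a 2-D subspace.
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(200, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardise each feature

n, d = X.shape
code_size = 2
W_enc = rng.normal(scale=0.1, size=(d, code_size))
W_dec = rng.normal(scale=0.1, size=(code_size, d))

learning_rate = 0.05
losses = []
for step in range(2000):
    code = X @ W_enc        # encoder: 6 -> 2 dimensions
    X_hat = code @ W_dec    # decoder: 2 -> 6 dimensions
    error = X_hat - X       # the target is the input itself
    losses.append(float(np.mean(error ** 2)))

    # Backpropagation of the mean squared error through both layers.
    grad_out = 2.0 * error / (n * d)
    grad_W_dec = code.T @ grad_out
    grad_W_enc = X.T @ (grad_out @ W_dec.T)

    # Gradient descent update.
    W_dec -= learning_rate * grad_W_dec
    W_enc -= learning_rate * grad_W_enc
```

Comparing `losses[0]` with `losses[-1]` shows whether the reconstruction $\hat{x}$ moved closer to $x$ during training. No labels appear anywhere in the loop: `X` plays both roles.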

* **

- Both the **encoder and decoder** are fully-connected feedforward neural networks.
- **Code** is a single layer of an ANN with the dimensionality of our choice.
- The number of nodes in the code layer (**code size**) is a hyperparameter that we set before training the autoencoder.

<div align='center'>
    <img src='images/autoencoders.png' width=600/>
</div>

Above we can see the architecture of an autoencoder:
- First the input passes through the encoder, which is a fully-connected ANN, to produce the code. The decoder, which has a similar ANN structure, then produces the output using only the code. The goal is to get an output identical to the input.

>🗝️**Note that the decoder architecture is the mirror image of the encoder.** This is not a requirement but it’s typically the case. 
>>The only requirement is that the dimensionality of the input and output must be the same. Anything in the middle can be played with. For example, in the case of image data, the input image shape and the output shape must be the same.

<div align='center'>
    <img src='images/autoencoders_architecture.png' width=1000/>
</div>
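
The mirrored layout described above can be written down as a list of layer widths. The helper name `mirrored_layer_sizes` and the numbers used (784 inputs, as for a flattened 28×28 image, hidden layers of 128 and 64 nodes, and a code size of 32) are hypothetical and chosen only to illustrate the shape of the network.

```python
def mirrored_layer_sizes(input_dim, encoder_hidden, code_size):
    """Return layer widths from input to output for a symmetric autoencoder."""
    encoder = [input_dim, *encoder_hidden]
    decoder = encoder[::-1]  # mirror image of the encoder
    return encoder + [code_size] + decoder


sizes = mirrored_layer_sizes(784, [128, 64], 32)
# sizes == [784, 128, 64, 32, 64, 128, 784]
```

The first and last entries are equal, which is the one hard requirement: the output must have the same dimensionality as the input. Everything in between is a design choice.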

* **

### Hyperparameters

There are 4 hyperparameters that we need to set before training an autoencoder:

- **Code size:** the number of nodes in the middle layer. A smaller size results in more compression.

- **Number of layers:** the autoencoder can be as deep as we like. In the figure above we have 2 layers in both the encoder and decoder.

- **Number of nodes per layer:** the autoencoder architecture we’re working on is called a **stacked autoencoder** since the layers are stacked one after another. 
    - Usually stacked autoencoders look like a “sandwich”. The number of nodes per layer decreases with each subsequent layer of the encoder and increases back in the decoder. 
    
    - Also, the decoder is symmetric to the encoder in terms of layer structure. As noted above, this is not necessary, and we have total control over these parameters.<br>

- **Loss function:** we use either mean squared error (MSE) or binary cross-entropy. If the input values are in the range [0, 1], we typically use cross-entropy; otherwise we use the mean squared error. [more details - video](https://youtu.be/xTU79Zs4XKY)
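
As a rough sketch of the two loss functions mentioned above, here they are written out in NumPy. The function names, the clipping constant `eps`, and the example values are assumptions for illustration; deep learning libraries ship their own implementations.

```python
import numpy as np


def mean_squared_error(x, x_hat):
    """Average squared difference between input and reconstruction."""
    return float(np.mean((x - x_hat) ** 2))


def binary_cross_entropy(x, x_hat, eps=1e-7):
    """Binary cross-entropy for inputs and reconstructions in [0, 1]."""
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    x_hat = np.clip(x_hat, eps, 1.0 - eps)
    return float(-np.mean(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat)))


x = np.array([0.0, 1.0, 1.0, 0.0])       # input with values in [0, 1]
x_hat = np.array([0.1, 0.9, 0.8, 0.2])   # hypothetical reconstruction

mse = mean_squared_error(x, x_hat)
bce = binary_cross_entropy(x, x_hat)
```

Both losses shrink as `x_hat` approaches `x`. Binary cross-entropy is only defined when the reconstructions lie in [0, 1], which is why it is usually paired with a sigmoid output layer.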


**Autoencoders are trained the same way as ANNs via backpropagation.**

* **

## Dimensionality Reduction: Autoencoders vs. PCA

- PCA applies a linear transformation, which works when (most of) the variation in the data can be described with fewer dimensions than the original data. Autoencoders bring non-linearity to the table, which helps with dimensionality reduction of more complex data like images, where the spatial structure of the data also needs to be considered.

- Because neural networks are capable of learning nonlinear relationships, an autoencoder can be thought of as a more powerful (nonlinear) generalization of PCA. 

- Whereas PCA attempts to discover a lower dimensional hyperplane which describes the original data, autoencoders are capable of learning nonlinear manifolds (a manifold is defined in simple terms as a continuous, non-intersecting surface). The difference between these two approaches is visualized below.

<div align='center'>
    <img src='images/autoencoder_pca.png'/>
</div>
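
To make the hyperplane limitation concrete, PCA itself can be viewed as a linear "encode, then decode" step. The sketch below computes PCA with an SVD and measures the reconstruction error on two toy datasets. The helper name `pca_reconstruction_error` and the datasets are assumptions for illustration.

```python
import numpy as np


def pca_reconstruction_error(X, n_components):
    """Project X onto its top principal components and measure the MSE."""
    mean = X.mean(axis=0)
    X_centered = X - mean
    # Principal directions are the right singular vectors of the centred data.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    code = X_centered @ components.T      # "encode": linear projection
    X_hat = code @ components + mean      # "decode": map back to input space
    return float(np.mean((X - X_hat) ** 2))


t = np.linspace(-1.0, 1.0, 100)
X_line = np.column_stack([t, 2.0 * t])    # points on a straight line
X_curve = np.column_stack([t, t ** 2])    # points on a parabola

line_error = pca_reconstruction_error(X_line, 1)
curve_error = pca_reconstruction_error(X_curve, 1)
```

With a one-dimensional code, the straight line is reconstructed almost perfectly (`line_error` is at the level of floating-point rounding), while the parabola leaves a clearly nonzero `curve_error`. Both datasets are intrinsically one-dimensional, but only a nonlinear encoder and decoder can follow the curve.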

For higher dimensional data, autoencoders are capable of learning a complex representation of the data (manifold) which can be used to describe observations in a lower dimensionality and correspondingly decoded into the original input space.
<div align='center'>
    <img src='images/autoencoder_LinearNonLinear.png'/>
</div>

## More..
- Denoising and sparse autoencoders: see the reference links above.