An autoencoder is an unsupervised artificial neural network that learns to compress data into an efficient encoding, and then learns to reconstruct from that reduced representation an output that is as close to the original input as possible.

By design, an autoencoder reduces the dimensionality of the data by learning to ignore the noise in it.

![title](autoencoder.png)

Autoencoder Components:

An autoencoder consists of 4 main parts:

1. **Encoder**: the part of the model that learns how to reduce the input dimensions and compress the input data into an encoded representation.

2. **Bottleneck**: the layer that contains the compressed representation of the input data. It has the lowest dimensionality of any layer in the network.

3. **Decoder**: the part of the model that learns how to reconstruct the data from the encoded representation so that it is as close to the original input as possible.

4. **Reconstruction Loss**: the measure of how well the decoder is performing, that is, how close the output is to the original input.

Training then uses backpropagation to minimize the network's reconstruction loss.
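
To make the reconstruction loss concrete, here is a minimal sketch that assumes the loss is the mean squared error between the input and its reconstruction (a common choice, but not the only one). The helper `reconstruction_loss` and the sample arrays are made up for illustration.

```python
import numpy as np

def reconstruction_loss(original, reconstructed):
    """Mean squared error between the input and its reconstruction."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    return np.mean((original - reconstructed) ** 2)

# A single 4-dimensional input (illustrative values only)
x = np.array([[0.0, 1.0, 0.5, 0.25]])

perfect = x.copy()                          # ideal reconstruction
rough = np.array([[0.1, 0.9, 0.5, 0.25]])   # two features are off by 0.1

loss_perfect = reconstruction_loss(x, perfect)
loss_rough = reconstruction_loss(x, rough)
print(loss_perfect, loss_rough)
```

The closer the output is to the input, the smaller this value becomes, and this is the quantity that training tries to drive down.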

In [None]:
import matplotlib
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt
plt.style.use('ggplot')

import numpy as np

In [None]:
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

from sklearn.neural_network import MLPRegressor
from sklearn.decomposition import PCA

from sklearn.metrics import mean_squared_error, silhouette_score

In [None]:
colors = ['#1FC17B', '#78FECF', '#555B6E', '#CC998D', '#429EA6',
          '#153B50', '#8367C7', '#EE6352', '#C287E8', '#F0A6CA', 
          '#521945', '#361F27', '#828489', '#9AD2CB', '#EBD494', 
          '#53599A', '#80DED9', '#EF2D56', '#446DF6', '#AF929D']

Now we will use scikit-learn's <code>make_blobs</code> function to create a toy dataset. <code>make_blobs</code> generates a dataset from the parameters passed by the user. Here X holds the coordinates of each data point and y holds its cluster label.
<br>
<br>
**n_features** is the number of features each data point should have
<br>
**centers** is the number of cluster centers to generate, i.e. how many clusters the dataset should contain
<br>
**n_samples** is the total number of data points in the dataset
<br>
**cluster_std** sets the standard deviation of every cluster (0.2 here)
<br>
**center_box** sets the lower and upper bounds within which the cluster centers are drawn
<br>
**random_state** is for reproducibility
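
As a quick illustration of these parameters, the sketch below calls <code>make_blobs</code> with much smaller, made-up values and also asks for the generated centers via `return_centers=True`. The variable names ending in `_small` are only for this example.

```python
from sklearn.datasets import make_blobs

# Small illustrative dataset: 200 points, 3 features, 4 clusters
X_small, y_small, centers_small = make_blobs(
    n_features=3, centers=4, n_samples=200, cluster_std=0.2,
    center_box=(-1, 1), random_state=17, return_centers=True)

print(X_small.shape)        # one row per data point, one column per feature
print(y_small.shape)        # one cluster label per data point
print(centers_small.shape)  # one row per cluster center
print(centers_small.min(), centers_small.max())  # centers stay inside center_box
```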

In [None]:
X, y = make_blobs(n_features = 50, centers = 20, n_samples = 20000,
                 cluster_std = 0.2, center_box = [-1, 1], random_state = 17)

In [None]:
# Looking at the data
# There are 20,000 data points, each with 50 features
print(X[0])

print(f'Standard deviation across the features of the first data point: {X[0].std():.3f}')
print(X.shape)

In [None]:
# Splitting our data into train and test sets
# Scaling the dataset with MinMaxScaler so that each feature of the training data lies between 0 and 1

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 1)

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
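
The scaler is fitted on the training data only and then reused on the test data. A minimal sketch of what that implies, using made-up one-feature arrays: test values outside the training range end up slightly outside 0 and 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative data: the test set contains values outside the training range
train = np.array([[0.0], [5.0], [10.0]])
test = np.array([[-2.0], [5.0], [12.0]])

demo_scaler = MinMaxScaler()
train_scaled = demo_scaler.fit_transform(train)  # learns min and max from train
test_scaled = demo_scaler.transform(test)        # reuses the training min and max

print(train_scaled.ravel())
print(test_scaled.ravel())
```

Fitting on the training fold alone keeps information about the test fold out of the preprocessing step.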

In [None]:
# Creating a baseline model using PCA

pca = PCA(n_components = 2)
pca = pca.fit(X_train)

# Transform X_test with the PCA that was fitted on the training data (the model is not refitted on the test set)
results_pca = pca.transform(X_test)
results_pca
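
PCA can also be read as a linear encoder and decoder: `transform` compresses and `inverse_transform` reconstructs. The sketch below, on a small made-up dataset, measures the PCA reconstruction error with `mean_squared_error`, which gives a number that could be compared against an autoencoder's reconstruction loss. All variable names here are illustrative.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import mean_squared_error

# Small illustrative dataset, separate from the one used in this notebook
X_demo, _ = make_blobs(n_features=10, centers=5, n_samples=500,
                       cluster_std=0.2, center_box=(-1, 1), random_state=0)

pca_demo = PCA(n_components=2).fit(X_demo)
codes = pca_demo.transform(X_demo)              # "encode" to 2 dimensions
X_rebuilt = pca_demo.inverse_transform(codes)   # "decode" back to 10 dimensions

pca_loss = mean_squared_error(X_demo, X_rebuilt)
explained = pca_demo.explained_variance_ratio_.sum()

print(codes.shape, X_rebuilt.shape)
print(f'PCA reconstruction MSE: {pca_loss:.4f}')
print(f'Variance explained by 2 components: {explained:.2%}')
```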

In [None]:
# Plotting the results of the PCA on a scatter plot to determine how the PCA has performed

print(results_pca.shape)

unique_label = np.unique(y_test)

# Plotting the scatter by each cluster

for index, label in enumerate(unique_label):
    X_data = results_pca[y_test == label]
    
    # X_data[:, 0] is the first PCA dimension (x axis)
    # X_data[:, 1] is the second PCA dimension (y axis)
    # alpha gives each point some transparency so that overlapping points remain visible
    # One color for each cluster
    
    plt.scatter(X_data[:, 0], X_data[:, 1], alpha = 0.3, c = colors[index])

plt.xlabel('Principal Component #1')
plt.ylabel('Principal Component #2')
plt.title('PCA Results')

# Looking at the plot below, it seems that there are only 10 clear, well-separated clusters instead of 20.
# This could be an issue, since the remaining clusters overlap in the 2-D projection.

In [None]:
# Now we are going to build a multilayer perceptron (where each node in a hidden layer is a non-linear
# function of a weighted combination of all the nodes in the previous layer).

# alpha is the strength of the L2 regularization term; it is set very close to zero here so that the weights
# are effectively unregularized. (The learning rate is a separate parameter, learning_rate_init.)

# Defining the neural network architecture

autoencoder = MLPRegressor(alpha = 1.0e-15,
                          hidden_layer_sizes = (50, 100, 50, 2, 50, 100, 50),
                          random_state = 17,
                          max_iter = 10000)

autoencoder.fit(X_train, X_train)
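
Once an autoencoder like this is fitted, its reconstruction loss can be checked by predicting the input from itself. The sketch below does this on a deliberately tiny, made-up dataset and network so that it runs quickly. The names `X_toy` and `toy_autoencoder` and the layer sizes are assumptions for illustration, not the configuration used above.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

# Tiny illustrative dataset scaled to [0, 1]
X_toy, _ = make_blobs(n_features=8, centers=4, n_samples=400,
                      cluster_std=0.2, center_box=(-1, 1), random_state=0)
X_toy = MinMaxScaler().fit_transform(X_toy)

# Same idea as above: the input is also the target, with a 2-unit bottleneck
toy_autoencoder = MLPRegressor(alpha=1.0e-15, hidden_layer_sizes=(8, 2, 8),
                               random_state=0, max_iter=2000)
toy_autoencoder.fit(X_toy, X_toy)

X_toy_rebuilt = toy_autoencoder.predict(X_toy)
toy_loss = mean_squared_error(X_toy, X_toy_rebuilt)

print(X_toy_rebuilt.shape)
print(f'Reconstruction MSE: {toy_loss:.4f}')
```

In practice the same check would be run on held-out data, so that the loss reflects how well the network generalizes rather than how well it memorized the training set.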

In [None]:
# Now that we have built the autoencoder, we are going to pull out some of its attributes. The coefficients are the
# weights and the intercepts are the biases, like the constant c in the equation y = mx + c.

# The autoencoder contains both an encoder and a decoder; for this task we only need the encoder part (the
# first four weight matrices of the model), so we will pull it out of the model

W = autoencoder.coefs_
biases = autoencoder.intercepts_

for w in W:
    print(w.shape)
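
The shapes printed here follow directly from the layer sizes: each weight matrix connects one layer to the next. A short sketch of that bookkeeping, assuming 50 input and 50 output features as in this notebook (the variable names are illustrative):

```python
hidden_layer_sizes = (50, 100, 50, 2, 50, 100, 50)
n_inputs = n_outputs = 50  # the autoencoder maps 50 features back to 50 features

# Full list of layer widths, from the input layer to the output layer
layer_sizes = [n_inputs, *hidden_layer_sizes, n_outputs]

# Each weight matrix has shape (width of previous layer, width of next layer)
expected_shapes = list(zip(layer_sizes[:-1], layer_sizes[1:]))

for position, shape in enumerate(expected_shapes):
    part = 'encoder' if position < 4 else 'decoder'
    print(position, shape, part)
```

The fourth matrix, of shape (50, 2), is the one that leads into the 2-unit bottleneck, which is why the encoder is taken to be the first four matrices.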

In [None]:
encoder_weights = W[0:4]
encoder_biases = biases[0:4]
print(encoder_weights)
print(encoder_biases)

In [None]:
def encode(data, encoder_weights, encoder_biases):
    results_ae = data
    
    for index, (w, b) in enumerate(zip(encoder_weights, encoder_biases)):
        if index == len(encoder_weights) - 1:
            # Last encoder layer: keep the linear output as the latent representation
            # (inside the network itself, ReLU is applied to this layer as well)
            results_ae = results_ae@w + b
        else:
            # Earlier layers: ReLU activation; @ is the matrix multiplication operator
            results_ae = np.maximum(0, results_ae@w + b)
    
    return results_ae
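
One way to gain confidence in a hand-written forward pass such as `encode` is to run the same kind of loop over all layers and compare it with `predict`. The sketch below does that on a small, made-up network; it assumes the default `MLPRegressor` settings (ReLU on hidden layers, identity on the output layer). The names `check_model` and `forward` are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Small illustrative dataset and network
rng = np.random.default_rng(0)
X_check = rng.random((200, 6))

check_model = MLPRegressor(hidden_layer_sizes=(6, 2, 6), random_state=0, max_iter=500)
check_model.fit(X_check, X_check)

def forward(data, weights, biases):
    """Manual forward pass: ReLU on hidden layers, identity on the output layer."""
    out = data
    for position, (w, b) in enumerate(zip(weights, biases)):
        out = out @ w + b
        if position < len(weights) - 1:
            out = np.maximum(0, out)
    return out

manual = forward(X_check, check_model.coefs_, check_model.intercepts_)
matches = np.allclose(manual, check_model.predict(X_check))
print(matches)
```

If the two agree, then stopping the loop after the encoder's weight matrices gives the values at the bottleneck.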

In [None]:
res_ae = encode(X_test, encoder_weights, encoder_biases)

In [None]:
print(res_ae.shape)

In [None]:
for index, label in enumerate(unique_label):
    latent_space = res_ae[y_test == label]
    
    plt.scatter(latent_space[:, 0], latent_space[:, 1], alpha = 0.3, c = colors[index])

plt.xlabel('Latent X')
plt.ylabel('Latent Y')
plt.title('Encoder Results')

In [None]:
# Interpretation of the silhouette score - it ranges from -1 to +1, where -1 indicates very poor clustering and +1
# indicates perfect clustering
# Here the encoder works better than the PCA, since the encoder has a higher silhouette score

print(silhouette_score(X_test, y_test)) # Original data using all 50 dimensions
print(silhouette_score(results_pca, y_test)) # PCA
print(silhouette_score(res_ae, y_test)) # autoencoder

# The autoencoder seems to separate the labelled clusters even better than the original 50-dimensional data
# itself does
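
To get a feel for how the silhouette score behaves, the sketch below compares tightly packed blobs with heavily overlapping ones. The centers and spreads are made up for illustration and are unrelated to the dataset above.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three fixed, well-separated centers (illustrative values)
fixed_centers = [[0, 0], [5, 5], [-5, 5]]

# Same centers, very different spreads
X_tight, y_tight = make_blobs(n_samples=300, centers=fixed_centers,
                              cluster_std=0.3, random_state=0)
X_loose, y_loose = make_blobs(n_samples=300, centers=fixed_centers,
                              cluster_std=4.0, random_state=0)

score_tight = silhouette_score(X_tight, y_tight)
score_loose = silhouette_score(X_loose, y_loose)
print(f'tight: {score_tight:.3f}, loose: {score_loose:.3f}')
```

Compact, well-separated clusters push the score toward +1, while overlapping clusters pull it toward 0 or below.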

Autoencoders are really cool and powerful. You can encode data into a lower-dimensional space, as we did here, and end up with clusters that are better separated than in the original data, which is quite an amazing result! You can also use the decoder generatively, creating new data by sampling points inside the latent space.

Autoencoders can be used as compression algorithms, where the decoder recovers a compressed file, such as a photo. They can also be used to denoise data: by adding noise to the input and training the network to recreate the noise-free data, you get a model that removes noise.
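
The denoising idea can be sketched with the same scikit-learn building blocks used in this notebook: the noisy data goes in as input and the clean data is the target. Everything below (dataset, noise level, layer sizes, names such as `denoiser`) is an assumption chosen to keep the example small, not a tuned setup.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

# Small illustrative clean dataset scaled to [0, 1]
X_clean, _ = make_blobs(n_features=8, centers=4, n_samples=400,
                        cluster_std=0.2, center_box=(-1, 1), random_state=0)
X_clean = MinMaxScaler().fit_transform(X_clean)

# Corrupt the input with Gaussian noise (the noise level is an arbitrary choice)
rng = np.random.default_rng(0)
X_noisy = X_clean + rng.normal(scale=0.1, size=X_clean.shape)

# Noisy data as input, clean data as target
denoiser = MLPRegressor(hidden_layer_sizes=(8, 2, 8), random_state=0, max_iter=2000)
denoiser.fit(X_noisy, X_clean)

X_denoised = denoiser.predict(X_noisy)
print(X_denoised.shape)
```

How much noise is actually removed depends on the data, the noise level and the architecture, so the result is best checked by comparing the error of `X_noisy` and `X_denoised` against `X_clean`.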

