## Intro to Autoencoders in Python

Autoencoders are a special type of neural network, often described as a non-linear generalization of PCA (principal component analysis). They have several use cases. In this blog post we will discuss autoencoders in the context of data quality: we will use them to detect outliers in data. For example, given an image of a handwritten digit, an autoencoder first encodes the image into a lower-dimensional latent space, much as PCA does, and then decodes the latent representation back into an image. Autoencoders can also be used to compress images. In both cases the network is trained to minimize the *reconstruction error*.

The reconstruction error here means the difference (by some distance measure, such as the mean squared error used as the training loss later in this post) between the actual image and the reconstructed image.
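To make the encode, decode, and measure loop concrete, here is a minimal sketch that uses plain PCA (computed with NumPy's SVD) as a linear stand-in for an autoencoder. The synthetic data, the helper names `pca_reconstruct` and `reconstruction_error`, and the choice of mean squared error as the distance measure are all assumptions made for illustration; they are not part of the model built later in this post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 200 "images" of 784 pixels that lie close to
# a 10-dimensional subspace, plus a little noise.
basis = rng.normal(size=(10, 784))
codes = rng.normal(size=(200, 10))
images = codes @ basis + 0.01 * rng.normal(size=(200, 784))

# PCA via SVD of the centered data; the rows of vt are the principal directions.
mean = images.mean(axis=0)
_, _, vt = np.linalg.svd(images - mean, full_matrices=False)


def pca_reconstruct(x, n_components):
    """Encode x into n_components dimensions, then decode it back."""
    components = vt[:n_components]        # (n_components, 784)
    latent = (x - mean) @ components.T    # encode: 784 -> n_components
    return latent @ components + mean     # decode: n_components -> 784


def reconstruction_error(x, x_hat):
    """Mean squared error per image (one value per row)."""
    return np.mean((x - x_hat) ** 2, axis=1)


err_small = reconstruction_error(images, pca_reconstruct(images, 2)).mean()
err_large = reconstruction_error(images, pca_reconstruct(images, 10)).mean()
print(err_small, err_large)
```

Keeping more components lowers the reconstruction error, which is the same trade-off that the latent dimension controls in an autoencoder.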

## Import libraries

In [4]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf

In [7]:
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, losses
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Model

## Loading the data
The Fashion MNIST dataset is a familiar dataset that has been used in many tutorials to explain simple concepts about neural networks. We will do the same in this blog post. Each image in the Fashion MNIST dataset is 28x28 pixels.

In [9]:
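# Load the images only; the labels are discarded because the
# autoencoder is trained to reproduce its own input.
(x_train, _), (x_test, _) = fashion_mnist.load_data()

# Scale pixel values from [0, 255] to [0, 1].
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.


def gen_image(arr):
    """Plot one image given as 784 (or 28x28) values scaled to [0, 1]."""
    two_d = (np.reshape(arr, (28, 28)) * 255).astype(np.uint8)
    plt.imshow(two_d, interpolation='nearest')
    return plt

print(x_train.shape)
print(x_test.shape)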

(60000, 28, 28)
(10000, 28, 28)


We have printed the dimensions of the training and test datasets: there are 60,000 images in the training set and 10,000 in the test set.

We will define a very simple autoencoder consisting of an `encoder`, which compresses the input images into a 64-dimensional latent vector, and a `decoder`, which reconstructs the original image from the latent space. We define a simple class for this. The constructor of the class takes the dimension of the latent space. For the encoder, we will use `TensorFlow`'s `Sequential` API: we first flatten each image and then pass it to a dense layer. The decoder has one dense layer and then reshapes its output back to the original image dimensions.

In [16]:
latent_dim = 64

class Autoencoder(Model):
  def __init__(self, latent_dim):
    super().__init__()
    self.latent_dim = latent_dim
    # Encoder: flatten 28x28 -> 784, then compress to latent_dim values.
    self.encoder = tf.keras.Sequential([
      layers.Flatten(),
      layers.Dense(latent_dim, activation='relu'),
    ])
    # Decoder: expand latent_dim -> 784 values in [0, 1], then reshape to 28x28.
    self.decoder = tf.keras.Sequential([
      layers.Dense(784, activation='sigmoid'),
      layers.Reshape((28, 28))
    ])

  def call(self, x):
    encoded = self.encoder(x)
    decoded = self.decoder(encoded)
    return decoded
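To see what the two `Sequential` stacks compute, the forward pass can be sketched in plain NumPy. This is only an illustration under stated assumptions: the weights below are random rather than trained, and the names `w_enc`, `w_dec`, `encode`, and `decode` are hypothetical helpers, not part of the Keras model above.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 64

# Randomly initialised weights stand in for the trained Dense layers.
w_enc = rng.normal(scale=0.05, size=(784, latent_dim))
b_enc = np.zeros(latent_dim)
w_dec = rng.normal(scale=0.05, size=(latent_dim, 784))
b_dec = np.zeros(784)


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def encode(images):
    flat = images.reshape(len(images), -1)   # Flatten: (n, 28, 28) -> (n, 784)
    return relu(flat @ w_enc + b_enc)        # Dense(latent_dim, relu)


def decode(latent):
    flat = sigmoid(latent @ w_dec + b_dec)   # Dense(784, sigmoid)
    return flat.reshape(-1, 28, 28)          # Reshape back to (n, 28, 28)


batch = rng.random(size=(5, 28, 28)).astype("float32")
latent = encode(batch)
reconstructed = decode(latent)
print(latent.shape, reconstructed.shape)  # (5, 64) (5, 28, 28)
```

The sigmoid on the last layer keeps every output pixel in the range 0 to 1, which matches the scaling applied to the input images.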

In [18]:
# instantiate the Autoencoder class
autoencoder = Autoencoder(latent_dim)

In [19]:
autoencoder.compile(optimizer='adam', loss=losses.MeanSquaredError())

In [20]:
autoencoder.fit(x_train, x_train,
                epochs=10,
                shuffle=True,
                validation_data=(x_test, x_test))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f0145056c10>