# **Autoencoders**

In the previous step we described how methods such as neural networks or linear regression could be used to make predictions. We also discussed in previous steps how Principal Components could be used both to reduce dimensionality and to help remove noise.
Autoencoders are an unsupervised technique that uses neural networks to reduce dimensionality or remove noise: the input data is compressed through a hidden layer and then reconstructed as an approximation of the original input. Figure 1 shows how the network compresses the data before recreating it in the output layer.

<img src="https://www.computing.dcu.ie/~amccarren/mcm_images/autoencoder.png"/>

Figure 1 (source: [Jeremy Jordan](https://www.jeremyjordan.me/autoencoders/))

We examine this process a little more closely in Figure 2, where an "Encoder" is used to create "h", the latent representation. The latent representation can then be used as a compressed, dimensionally reduced dataset.

<img src="https://www.computing.dcu.ie/~amccarren/mcm_images/Autoencoder_2.png"/>

Figure 2 (source: [Towards Data Science](https://towardsdatascience.com/deep-inside-autoencoders-7e41f319999f))

Autoencoders are only useful when there are relationships between the input variables: if the variables were uncorrelated, it would be mathematically difficult to compress the features without substantial loss of information. If there is a relationship, however, its structure can be learned, and we can effectively compress our data into a hidden layer. The output variables are a transformed version of the hidden nodes. Because the output is rebuilt only from this compressed structure, it can be considered a denoised version of the input.
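
The effect of correlation on compressibility can be illustrated with a small NumPy sketch. It is only an illustration under stated assumptions: it uses a linear (PCA-style) compression as a stand-in for an autoencoder, and the synthetic data, the variable names and the helper `relative_reconstruction_error` are hypothetical and not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_latent = 1000, 10, 2

# Correlated data: 10 observed variables driven by 2 hidden factors plus a little noise.
factors = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
correlated = factors @ mixing + 0.05 * rng.normal(size=(n_samples, n_features))

# Uncorrelated data: 10 independent variables.
independent = rng.normal(size=(n_samples, n_features))

def relative_reconstruction_error(data, k):
    """Fraction of the total variance lost when only the top-k linear components are kept."""
    centered = data - data.mean(axis=0)
    # Squared singular values are proportional to the variance along each component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return variance[k:].sum() / variance.sum()

err_correlated = relative_reconstruction_error(correlated, n_latent)
err_independent = relative_reconstruction_error(independent, n_latent)
print(f"variance lost (correlated):  {err_correlated:.3f}")
print(f"variance lost (independent): {err_independent:.3f}")
```

With two latent dimensions, the correlated data should lose very little variance, while the independent data should lose most of it.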

We have now outlined that autoencoders can be used for dimensionality reduction, as we create fewer latent variables than there are original input variables. This is similar to PCA. So what are the differences?

* PCA is a linear transformation of the data and assumes the new latent variables are linear combinations of the original variables.

* PCA features are linearly uncorrelated with each other, whereas autoencoder features may be correlated.

* PCA is less computationally intensive than Autoencoders.

* A single layered autoencoder with a linear activation function is very similar to PCA.

* Autoencoders may require regularisation as they are prone to overfitting.
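
The first two points about PCA can be checked numerically. The sketch below computes PCA with NumPy's SVD on hypothetical random data (not the notebook's dataset) and shows that each latent variable is a linear combination of the original variables and that the latent variables are uncorrelated with each other.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 500 samples of 5 correlated variables.
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))
X_centered = X - X.mean(axis=0)

# PCA via SVD: the rows of Vt are the principal directions.
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
k = 2
components = Vt[:k]                  # shape (2, 5): weights of each linear combination
scores = X_centered @ components.T   # shape (500, 2): the PCA latent variables

# The correlation between the two PCA latent variables is zero up to rounding error.
corr = np.corrcoef(scores, rowvar=False)
print("correlation between PCA latent variables:", corr[0, 1])
```

An autoencoder gives no such guarantee: nothing in its training objective forces the hidden nodes to be uncorrelated.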

In the code below we build an autoencoder using Keras. The dataset is the MNIST dataset, whose training set consists of 60,000 images of handwritten digits. Each picture has 28x28 (784) pixels. We will flatten each picture into a single vector and then use these vectors as training data for our neural network.
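
As a quick illustration of the flattening step, the sketch below uses a few random arrays as a hypothetical stand-in for MNIST images. It shows that flattening changes only the shape, not the pixel values, so the pictures can be recovered later for display.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 5 random 28x28 "images" with values in 0..255.
rng = np.random.default_rng(2)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into one 784-element row vector.
flat = images.reshape(images.shape[0], -1)
print(flat.shape)

# Reshaping back recovers the original pictures exactly.
restored = flat.reshape(-1, 28, 28)
print(np.array_equal(images, restored))
```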






To understand the difference between the various [optimizers](http://tflearn.org/optimizers/), have a look at this article from [Towards Data Science](https://towardsdatascience.com/a-look-at-gradient-descent-and-rmsprop-optimizers-f77d483ef08b). It should give you a feel for how they work. The cell below builds the autoencoder framework that will process all our images.
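
To give a feel for what an optimizer such as RMSprop does, here is a minimal NumPy sketch of the textbook RMSprop update applied to a simple quadratic loss. It is not the Keras implementation, and the loss function, starting point and hyperparameter values are assumptions chosen purely for illustration.

```python
import numpy as np

def loss(w):
    # Simple bowl-shaped loss with its minimum at w = 0.
    return np.sum(w ** 2)

def grad(w):
    # Gradient of the loss above.
    return 2 * w

w = np.array([5.0, -3.0])          # hypothetical starting weights
initial_loss = loss(w)
avg_sq_grad = np.zeros_like(w)     # running average of squared gradients
learning_rate, rho, eps = 0.05, 0.9, 1e-7

for _ in range(500):
    g = grad(w)
    # Keep an exponentially decaying average of the squared gradient...
    avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g ** 2
    # ...and scale each weight's step by the root of that average.
    w = w - learning_rate * g / (np.sqrt(avg_sq_grad) + eps)

final_loss = loss(w)
print("initial loss:", initial_loss)
print("final loss:  ", final_loss)
```

Dividing by the running root-mean-square of the gradient gives every weight a step of similar size, regardless of how large or small its raw gradient is.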



In [None]:
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras import backend as K
from keras.utils import plot_model

(x_train, y_train), (x_test, y_test) = mnist.load_data()
img_rows, img_cols = 28, 28

# Reshape the images to add the channel dimension, which indicates how many color layers each pixel has.
# MNIST has 1 channel (grayscale), whereas a normal color photo has 3 channels (R, G, B).
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)

# Converting the pixel values in images from integers (0–255) to floating-point numbers (decimal values).
# By default, MNIST images are stored as unsigned 8-bit integers (uint8), meaning each pixel is an integer between 0 and 255.
# Neural networks work better with floating-point data, so we convert them to float32 type.
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
# Normalizing the data, which helps the neural network train faster and more stably.
# Large unscaled inputs (like 0–255) can make gradient-based optimization harder.
x_train /= 255
x_test /= 255


# Flattening each image into a single 784-element vector
x_train = x_train.reshape(x_train.shape[0], img_rows * img_cols)
x_test = x_test.reshape(x_test.shape[0], img_rows * img_cols)

# Defining the model's structure
model = Sequential() # Creates an empty neural network using Keras's Sequential API, which lets you stack layers one after another, like a pipeline.
model.add(Dense(128,activation='relu',input_dim=784)) # Encoder: compresses the 784-pixel input down to 128 features.
model.add(Dense(128,activation='relu')) # Refines the 128-feature representation learned by the first layer (its input size is inferred automatically).
model.add(Dense(784,activation='relu')) # Decoder: 784 output neurons, one per pixel, to reconstruct the original input image.

# Preparing the model for training by setting up the loss function and optimizer.
# Accuracy is a classification metric and is not meaningful for pixel reconstruction, so only the MSE loss is tracked.
model.compile(loss='mean_squared_error', optimizer='RMSprop')
# Showing the model's structure
plot_model(model, to_file='model_plot.png',  show_layer_names=True)


We now have our framework, which is shown above, and we can run our data through it using the `model.fit` method. We will also save the model to a file (here `/content/auto_en.keras`, in the Colab session's working directory; copy it to Google Drive if you want it to survive the session). This is worth doing regularly, as third-party platforms like Google Colab will disconnect you after a certain time and you may have to start your analysis again.

In [None]:
history=model.fit(x_train,x_train,verbose=1,epochs=2,batch_size=256)
model.save('/content/auto_en.keras')


In [None]:
print(history.history['loss'])

**In this code snippet we extract the data from the hidden layer (128 nodes). We could use this data as a reduced dataset for further analysis.**

In [None]:
from tensorflow.keras.models import load_model, Model
from tensorflow.keras.layers import Input

# Load the model
seq_model = load_model('/content/auto_en.keras')


inputs = Input(shape=(784,))
x = seq_model.layers[0](inputs)
x = seq_model.layers[1](x)
encoded_model = Model(inputs, x)

x_train_encoded = encoded_model.predict(x_train)
print("Encoded data shape:", x_train_encoded.shape)
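
To make clear what the encoder part computes, the following NumPy sketch reproduces the forward pass through two ReLU layers of the same shapes (784 to 128 to 128). The weights here are random and purely hypothetical. In a trained Keras model the real values could be read from each layer with `get_weights()`.

```python
import numpy as np

rng = np.random.default_rng(3)

def relu(z):
    # ReLU activation: negative values are set to zero.
    return np.maximum(z, 0.0)

# Hypothetical weights with the same shapes as the first two Dense layers.
W1, b1 = rng.normal(scale=0.05, size=(784, 128)), np.zeros(128)
W2, b2 = rng.normal(scale=0.05, size=(128, 128)), np.zeros(128)

def encode(x):
    """Map flattened images of shape (n, 784) to latent vectors of shape (n, 128)."""
    hidden = relu(x @ W1 + b1)
    return relu(hidden @ W2 + b2)

batch = rng.random(size=(4, 784))   # four fake flattened images with values in [0, 1)
latent = encode(batch)
print(latent.shape)
```

Each row of `latent` is the reduced, 128-value description of one image.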



In [None]:
from keras.models import load_model
import matplotlib.pyplot as plt

model = load_model('/content/auto_en.keras')

# Take one training image and reconstruct it with the autoencoder
test = x_train[1].reshape(1,784)
reconstructed = model.predict(test)  # named so that the y_test labels are not overwritten

# Reshape the flat 784-element vectors back into 28x28 images
inp_img = test.reshape(28,28)
out_img = reconstructed.reshape(28,28)

plt.title('Input Image')
plt.imshow(inp_img, cmap='gray')  # the images are single-channel, so use a grayscale colour map
plt.show()

plt.title('Output Image')
plt.imshow(out_img, cmap='gray')
plt.show()
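
Besides comparing the pictures by eye, the quality of a reconstruction can be summarised as a mean squared error per image. The sketch below is a hypothetical helper, not part of the notebook, and it uses made-up arrays in place of real inputs and model predictions.

```python
import numpy as np

def reconstruction_mse(original, reconstructed):
    """Mean squared error per image for arrays of shape (n_images, n_pixels)."""
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    return np.mean((original - reconstructed) ** 2, axis=1)

rng = np.random.default_rng(4)
original = rng.random(size=(3, 784))   # three fake flattened images
reconstructed = original + 0.1         # hypothetical reconstruction, off by 0.1 at every pixel
errors = reconstruction_mse(original, reconstructed)
print(errors)
```

Images with an unusually large error are the ones the autoencoder reconstructs worst.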

Hopefully by now you will have got the idea of an Autoencoder. Now I would like you to try a few things before we move on and they are as follows:

* Print out a larger number of pictures.

* Run a number of epochs and plot the epoch number against the loss (MSE). You may find this [code](https://keras.io/visualization/) useful.

* Do this analysis for the Boston housing data. Try to use the hidden layer as input to both a regression and a neural network to predict housing prices.
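
As a starting point for the last exercise, the sketch below shows one way latent features could be fed into a linear regression, using ordinary least squares in NumPy. Everything in it is hypothetical: the latent features and the target are synthetic stand-ins, not Boston housing data or real hidden-layer outputs.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical latent features (e.g. hidden-layer outputs) and a synthetic target.
n_samples, n_latent = 200, 4
latent = rng.normal(size=(n_samples, n_latent))
true_coef = np.array([1.5, -2.0, 0.5, 3.0])
target = latent @ true_coef + 10.0 + 0.1 * rng.normal(size=n_samples)

# Ordinary least squares with an intercept column.
design = np.column_stack([np.ones(n_samples), latent])
coef, *_ = np.linalg.lstsq(design, target, rcond=None)

predictions = design @ coef
r_squared = 1 - np.sum((target - predictions) ** 2) / np.sum((target - target.mean()) ** 2)
print("intercept:", coef[0])
print("R^2:", r_squared)
```

In the exercise itself, `latent` would be replaced by the encoder's output for the housing data and `target` by the house prices.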



In [None]:
import matplotlib.pyplot as plt

# Plot the training loss values (no validation data was passed to model.fit)
plt.plot(history.history['loss'])

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train'], loc='upper left')
plt.show()