# Technical: Autoencoders

In [10]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import keras
from keras import backend as bkend
from keras.datasets import cifar10, mnist
from keras.layers import Dense, BatchNormalization, Dropout, Flatten, convolutional, pooling
from keras import metrics
import tensorflow as tf
# from tensorflow.python.client import device_lib



## Plan


* Text:
    * Write out high-level intuition for AE (wiki)
    * Develop questions for later study (wiki)
    * Introduce each type of autoencoder
    * Utilise notes, Blog, DLwP, Hamaad's and Ajit's notes
    * Read DL book chapter (read through, don't take endless notes)
    * Write about what you discover in the code
    * Relate to GANs
* Code:
    * Vanilla, Denoising, Convolutional, Variational
    * Use MNIST and insurance; perhaps RR data set
    * Understand outputs of encoders and decoders 
    * Code in the models using variables; experiment with layers and other structures 
    * Code as a class, start with deconstructing and understanding Hamaad's
* Go through Chp 6 Deep Learning with Python notebook - loads of interesting stuff in there
* Understand which lessons to transfer to Adam:
    * generators
    * RNN/LSTM for time series 
    * autoencoders

## Outcomes 

* Understand autoencoders, PCA, GANs and how they relate to each other
* Understand the code required to implement them
* Apply to data and understand the benefit

## References

**Meteorology Datasets**

• Meteorology dataset used in RNN: https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/6.3-advanced-usage-of-recurrent-neural-networks.ipynb

• https://github.com/hamaadshah/hackathon_june_2018/blob/master/hackathon.ipynb

• https://github.com/hamaadshah/autoencoders/blob/master/Python/autoencoders.ipynb

**Representational Learning**

• https://github.com/hamaadshah/autoencoders_public/tree/master/Python

• https://blog.keras.io/building-autoencoders-in-keras.html 

**Bayesian Inference**

• https://towardsdatascience.com/automatic-feature-engineering-using-deep-learning-and-bayesian-inference-application-to-computer-7b2bb8dc7351

**GANs**

• https://github.com/hamaadshah/gan_public

• https://medium.com/@Moscow25/gans-will-change-the-world-7ed6ae8515ca

• https://towardsdatascience.com/automatic-feature-engineering-using-generative-adversarial-networks-8e24b3c16bf3

## Overview of Autoencoders

* Let's get our terminology right: we *encode* a representation of the data and then *decode* it to get back to a form of the data.
* You can pass the encoded data on to any ML algorithm, so an autoencoder can be built and included as one step of your pipeline.
* Why encode then decode? What is the difference between the encoded data and the normal input data?
    * DL Book: simply creating a copy of x tells us little. Setting the parameters so that the encoding represents the features in fewer dimensions is more useful.
    * But why choose to decode once we have encoded a representation of the data?
* What ways of doing it are there? How many methods? With what impact? 
* What are all the different types?
    * Vanilla
    * Denoising
    * Convolutional (1D and 2D)
    * Seq2Seq
* How are autoencoders coded/structured? What governs the choice of layers? What governs the activation functions, normalization, etc.? Is it similar to other DL models (i.e. some knowledge, some intuition)? How do you know they are successful? What is their objective function? What differs between the types of autoencoder? What does this mean for their success or otherwise?
* How do autoencoders compare with other methods of feature engineering/dimensionality reduction? Is it right to think of them in the same family, and if not why not?
 

**Overview**
* Attempts to automatically learn good features or representations 
* Autoencoders for automatic feature engineering. The idea is to automatically learn a set of features from raw data that can be useful in supervised learning tasks such as in computer vision and insurance.
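
A rough sketch of the encode/decode idea and how it relates to PCA. This is not a trained neural network: it treats PCA (via SVD) as a linear "autoencoder" on made-up toy data, and the helper names `encode` and `decode` are my own, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (an assumption): 200 samples, 5 features that lie close to a 2-D subspace.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))
X = X - X.mean(axis=0)  # centre the data, as PCA requires

# PCA via SVD: the rows of Vt are the principal directions.
_, _, Vt = np.linalg.svd(X, full_matrices=False)

def encode(X, k):
    """Project onto the first k principal directions (the 'encoder')."""
    return X @ Vt[:k].T

def decode(Z, k):
    """Map the k-dimensional code back to the original space (the 'decoder')."""
    return Z @ Vt[:k]

for k in (1, 2, 5):
    X_hat = decode(encode(X, k), k)
    mse = np.mean((X - X_hat) ** 2)
    print(f"k={k}: code shape {encode(X, k).shape}, reconstruction MSE {mse:.6f}")
```

The reconstruction error drops as the code gets wider, and with all 5 directions the "autoencoder" just copies x, which is the uninformative case the DL Book warns about. A nonlinear autoencoder can be thought of as relaxing the linear projection used here.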

## Loading the data

**Meteorology data**

In [4]:
data_dir = os.getcwd()
fname = os.path.join(data_dir, "jena_climate_2009_2016.csv")

# a with-block closes the file for us
with open(fname) as f:
    data = f.read()

# strip() drops the trailing newline so that no empty row reaches the float conversion below
lines = data.strip().split("\n")
header = lines[0].split(",")
lines = lines[1:]

float_data = np.zeros((len(lines), len(header) - 1))
for i, line in enumerate(lines):
    values = [float(x) for x in line.split(",")[1:]]
    float_data[i, :] = values

mean = float_data[:20000].mean(axis=0)
float_data -= mean
std = float_data[:20000].std(axis=0)
float_data /= std    
    
def generator(data,
              lookback,
              delay,
              min_index,
              max_index,
              shuffle=False,
              batch_size=25,
              step=6):
    if max_index is None:
        max_index = len(data) - delay - 1
    
    i = min_index + lookback
  
    while 1:
        if shuffle:
            rows = np.random.randint(min_index + lookback, max_index, size=batch_size)
        else:
            if i + batch_size >= max_index:
                i = min_index + lookback
            rows = np.arange(i, min(i + batch_size, max_index))
            i += len(rows)
      
        samples = np.zeros((len(rows), lookback // step, data.shape[-1]))
        targets = np.zeros((len(rows),))
    
        for j, row in enumerate(rows):
            indices = range(rows[j] - lookback, rows[j], step)
            samples[j] = data[indices]
            targets[j] = data[rows[j] + delay][1]

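        # return (not yield): the function hands back a single batch,
        # which the code below indexes as train_gen[0] and train_gen[1]
        return samples, targets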

lookback = 360 # We use 60 hours for our training window: 360 * 10 / 60 = 60 hours.
step = 6 # Each row represents hourly measurements: 6 * 10 / 60 = 1 hour.
delay = 144 # We predict the temperature 1 day ahead: 144 * 10 / 60 = 24 hours.

train_gen = generator(data=float_data,
                      lookback=lookback,
                      delay=delay,
                      min_index=0,
                      max_index=200000,
                      shuffle=True,
                      step=step,
                      batch_size=1000)

test_gen = generator(data=float_data,
                     lookback=lookback,
                     delay=delay,
                     min_index=300001,
                     max_index=None,
                     shuffle=False,
                     step=step,
                     batch_size=1000)

# these imports were missing, which caused: NameError: name 'Pipeline' is not defined
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# flatten each (time steps, features) window into a single row for the random forest
X_train = np.reshape(train_gen[0], [train_gen[0].shape[0], -1])
X_test = np.reshape(test_gen[0], [test_gen[0].shape[0], -1])

pipe_base = Pipeline(steps=[("model", RandomForestRegressor())])
pipe_base = pipe_base.fit(X=X_train, y=train_gen[1])

print("The MSE score for the meteorology regression task without autoencoders: %.6f."
      % mean_squared_error(y_true=test_gen[1], y_pred=pipe_base.predict(X=X_test)))

## PCA

In [None]:
# to be added

## Vanilla Autoencoder
* Autoencoder class developed by Hamaad
* Code an autoencoder as a function and then move on to other autoencoders. Provide a top-level description and play with the parameters. Look at DLwP, the Blog, and Guerillion
* Loop back to code the classes

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from autoencoders_keras.vanilla_autoencoder import VanillaAutoencoder

# flatten each (time steps, features) window into a single row
X_train = np.reshape(train_gen[0], [train_gen[0].shape[0], -1])
X_test = np.reshape(test_gen[0], [test_gen[0].shape[0], -1])

autoencoder = VanillaAutoencoder(n_feat=X_train.shape[1],
                                 n_epoch=50,
                                 batch_size=25,
                                 encoder_layers=5,
                                 decoder_layers=8,
                                 n_hidden_units=100,
                                 encoding_dim=50,
                                 denoising=None)

# summary() prints the table itself, so it does not need wrapping in print()
autoencoder.autoencoder.summary()

pipe_autoencoder = Pipeline(steps=[("autoencoder", autoencoder),
                                   ("scaler", StandardScaler()),
                                   ("model", RandomForestRegressor())])

pipe_autoencoder = pipe_autoencoder.fit(X=X_train, y=train_gen[1])

print("The MSE score for the meteorology regression task with an autoencoder: %.6f."
      % mean_squared_error(y_true=test_gen[1], y_pred=pipe_autoencoder.predict(X=X_test)))

**Questions from running the model above:**
* How does the fit function work across the pipeline? What is the shape we fit on?
    * I think we need to flatten the input (28, 60, 14) and feed that in
* How does the autoencoder train? It looks like it trains through epochs, like other models
* What does vanilla mean?
* Remind yourself of batch normalization and dense layers (I think this is just feed-forward)
* Why do we use a standard scaler in the pipeline as well as feature scaling in the first part of the code?
* How is the autoencoder output fed in to the regression model? (might require a look at the class as well as the code above)
* How have the parameters been used to represent the data in another dimension?
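
A small sketch to pin down the shape question, using the `lookback` and `step` values from the loading code. The batch size of 4 is arbitrary, and the 14 features are assumed from the Jena CSV (its columns minus the timestamp).

```python
import numpy as np

lookback, step = 360, 6   # values used in the loading code above
n_features = 14           # assumption: Jena columns minus the timestamp
batch_size = 4            # small, illustrative batch

time_steps = lookback // step   # 60 hourly steps per window
samples = np.zeros((batch_size, time_steps, n_features))

# Flatten each (time steps, features) window into one row, as the pipeline code does.
flat = samples.reshape(samples.shape[0], -1)
print(samples.shape, "->", flat.shape)   # (4, 60, 14) -> (4, 840)
```

So under these assumptions the `n_feat` passed to the autoencoder would be 60 * 14 = 840.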

### Coding a vanilla autoencoder
* start off by using the Keras blog code - this is a different dataset. 
* type up, understand each line and then apply to the weather dataset

In [5]:
# import statements
from keras.layers import Input, Dense
from keras.models import Model

# we need to specify the size of our encoded data, as a variable
encoding_dim = 32  # assuming 784 as the input_dim, this is a compression factor of 24.5

# input dimension placeholder
input_img = Input(shape=(784,))

# encoded is the encoded representation of the input
encoded = Dense(encoding_dim,activation='relu')(input_img)

# decoded is the lossy reconstruction of the input
decoded = Dense(784,activation='sigmoid')(encoded)

# this model maps an input to its reconstruction
# the Model class seems to link an input tensor to an output tensor generally
autoencoder = Model(input_img,decoded)

autoencoder.summary()

# learning: calling a Dense layer on a tensor returns a new tensor; Model turns the chain of tensors into a model

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 784)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 32)                25120     
_________________________________________________________________
dense_2 (Dense)              (None, 784)               25872     
Total params: 50,992
Trainable params: 50,992
Non-trainable params: 0
_________________________________________________________________


**Breaking down the characteristics of the model a little more...**
* we will see that we have created tensors of certain shapes
* these only become a model once they are passed to the Model class
* I would suggest Input is a placeholder of some sort, used to pass the values in to the later stages of the model
* encoded is a tensor too, but it is the output of a Dense layer (weights plus an activation function), as is decoded
* what is interesting is that nothing is a model until we call Model, which takes only the original input_img (once the model is fit, this will be the pixel values) and the decoded tensor
* so our encoded representation of the input is 'simply' the input tensor reduced in dimensions by a Dense layer with a *relu* activation function - i.e. it looks like there is no encoder Model here - *so how is this different from just a normal feedforward model?*
    * of course, the model is trained on itself: the input is also the target.
    * but the model will have layers of params, each representing part of the overall model, so how do we know which is the deconstruction (encoded) layer and which is the reconstruction (decoded) layer?
        * I believe this follows from the layer widths - we know 32 < 784 and that the output layer is 784, so the 32-unit layer has to be a deconstruction of the 784 inputs, and the 784-unit layer a reconstruction.
        * Presumably, through gradient descent, we update the parameters in the layers so that the output is as close as possible to the original input (could you see the coefficients mirroring each other in a two-layer model?) *check understanding of the layers*
        * So what is our encoded layer in multi-layer models? Do we get to decide that?
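
To make the two Dense layers less mysterious, here is a NumPy sketch of what they compute. It is not Keras itself: the weights are random stand-ins for trained values and the five input "images" are made up. The parameter counts should line up with the summary table above.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, encoding_dim = 784, 32

# Random weights stand in for trained ones (assumption, for illustration only).
W_enc = rng.normal(scale=0.01, size=(input_dim, encoding_dim))
b_enc = np.zeros(encoding_dim)
W_dec = rng.normal(scale=0.01, size=(encoding_dim, input_dim))
b_dec = np.zeros(input_dim)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.random((5, input_dim))               # five fake flattened images in [0, 1)
encoded = relu(x @ W_enc + b_enc)            # the computation behind Dense(32, activation='relu')
decoded = sigmoid(encoded @ W_dec + b_dec)   # the computation behind Dense(784, activation='sigmoid')

enc_params = W_enc.size + b_enc.size   # 784 * 32 + 32
dec_params = W_dec.size + b_dec.size   # 32 * 784 + 784
print(encoded.shape, decoded.shape)                      # (5, 32) (5, 784)
print(enc_params, dec_params, enc_params + dec_params)   # 25120 25872 50992
```

The "encoded layer" is simply whichever layer's output we choose to read off as the code; here that is the 32-wide one.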

In [22]:
# what is input_img?
#input_img.dtype
input_img

<tf.Tensor 'input_1:0' shape=(?, 784) dtype=float32>

In [21]:
# what is encoded?
encoded

<tf.Tensor 'dense_1/Relu:0' shape=(?, 32) dtype=float32>

In [23]:
decoded

<tf.Tensor 'dense_2/Sigmoid:0' shape=(?, 784) dtype=float32>

In [24]:
# Model does indeed call up some specific code 
autoencoder

<keras.engine.training.Model at 0x11aa86cc0>

**Creating a separate encoder model**

In [26]:
# we also create a separate encoder model
# note that its output is encoded, the same layer that is part of the autoencoder model,
# so after the autoencoder has been trained, the encoder uses the trained param values
encoder = Model(input_img,encoded)
# I believe that this only compresses the data, and doesn't reconstruct x
encoder.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 784)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 32)                25120     
Total params: 25,120
Trainable params: 25,120
Non-trainable params: 0
_________________________________________________________________


In [27]:
encoder

<keras.engine.training.Model at 0x11fcfe908>

**Question**: why do we use Model here and not just the encoded tensor from above? Would that not hold our representation at the end of training?
* I wonder if it is because you can't train just a tensor, or that you can't extract the params from just a tensor
* Also, how will this model end up with the same params as the model with a decoder layer, given that we are not modelling against the output (input) but against the compressed tensors?
    * It might be that, because we pass in *encoded*, if we train the full model first, we will get those values in this model as well.
    * But, again, why build this model at all, if all we need is the params at the encoder stage? What is it about it being a Model that is important?
        * Model gives you compile, fit and predict methods, plus other attributes
* the code here uses the functional Model API, which passes one layer's output to the next using the () at the end of the line of code
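
A toy analogue of the weight-sharing guess above, in plain NumPy rather than Keras. `DenseLayer` is a hypothetical stand-in class I made up; the point is only that two "models" built from the same layer object see the same weights.

```python
import numpy as np

rng = np.random.default_rng(1)

class DenseLayer:
    """Hypothetical stand-in for a dense layer: the layer object owns its weights."""
    def __init__(self, n_in, n_out):
        self.W = rng.normal(scale=0.1, size=(n_in, n_out))
        self.b = np.zeros(n_out)

    def __call__(self, x):
        return x @ self.W + self.b

enc_layer = DenseLayer(8, 3)
dec_layer = DenseLayer(3, 8)

def autoencoder(x):
    # full model: input -> code -> reconstruction
    return dec_layer(enc_layer(x))

def encoder(x):
    # sub-model: reuses the very same enc_layer object
    return enc_layer(x)

x = rng.random((2, 8))
before = encoder(x)

# Pretend a training step on the full autoencoder changed the encoder weights in place.
enc_layer.W += 0.5

after = encoder(x)
print(np.allclose(before, after))   # False: the encoder picks up the updated weights
```

If Keras behaves the same way (which is what the blog code relies on), the separate encoder never needs its own training; it is just a different entry and exit point on the same layers.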

**Let's create a separate decoder model**
* Looks like the main thing that matters is the input and output dimensions, we aren't modelling anything yet

In [None]:
# we also create a separate decoder model ## what does Model() enable us to do?

# create a placeholder for the encoded input
encoded_input = Input(shape=(encoding_dim,))
# retrieve the last layer of the autoencoder model (which was the decoded layer)
# note that the layer is retrieved without the (previous_layer) call; it is only
# connected to an input when we call it on encoded_input below
decoder_layer = autoencoder.layers[-1]
# create the decoder model
decoder = Model(encoded_input,decoder_layer(encoded_input))

decoder.summary()

In [None]:
# time to compile the model - described in the blog as 'configure' 

autoencoder.compile(optimizer='adadelta',loss='binary_crossentropy')

In [None]:
# we now import the dataset, remembering that we do not need the y labels
from keras.datasets import mnist
import numpy as np

(x_train,_),(x_test,_) = mnist.load_data()

In [None]:
# we normalise the values of the training and test sets and flatten the 28x28 images into vectors of 784
x_train = x_train.astype('float32')/255
x_test = x_test.astype('float32')/255
x_train = x_train.reshape((len(x_train),np.prod(x_train.shape[1:])))
# use x_test's own shape here; x_train has already been flattened on the line above
x_test = x_test.reshape((len(x_test),np.prod(x_test.shape[1:])))
print(x_train.shape)
print(x_test.shape)

In [None]:
# we now can fit and train the autoencoder
autoencoder.fit(x_train,x_train,
               epochs=50,
               batch_size=256,
               shuffle=True,
               validation_data=(x_test,x_test))

**Questions/Comments:**
* It looks like our basic autoencoder is just a model that compresses data to an arbitrary, specified dimension
    * Is it as easy as simply specifying the dims we want?
    * Do we have to keep in mind that the compressed dims are then trained back to the input data?
    * If so, how do we account for this in the above if we want to pull off the encoded values - i.e. make sure we don't pull off compressed dims that have only just been compressed?
* When instantiating and developing our models, why do we need dim placeholders?
    * *because this is just a model definition, it needs to be compiled and mapped to the dataset, so the dims need to match the anticipated dataset. Looks like Model requires the dimensions to be established upfront*
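
A bare-bones sketch of what "trained back to the input data" could mean, written in NumPy so every step is visible. It is a linear autoencoder on made-up data, trained by plain gradient descent on mean squared error. That is a deliberate simplification of the Keras model above, which uses relu/sigmoid, adadelta and binary cross-entropy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (an assumption): 256 samples of 6 features lying on a 2-D subspace.
X = rng.normal(size=(256, 2)) @ rng.normal(size=(2, 6))
X = (X - X.mean(axis=0)) / X.std()

n_samples, input_dim = X.shape
encoding_dim = 2
W_enc = rng.normal(scale=0.1, size=(input_dim, encoding_dim))
W_dec = rng.normal(scale=0.1, size=(encoding_dim, input_dim))

learning_rate = 0.05
losses = []
for step in range(1000):
    Z = X @ W_enc        # encode: compress to encoding_dim
    X_hat = Z @ W_dec    # decode: reconstruct the input
    err = X_hat - X      # the target is the input itself
    losses.append(np.mean(err ** 2))

    # Backpropagate the mean squared error through both layers.
    grad_out = 2.0 * err / err.size
    grad_W_dec = Z.T @ grad_out
    grad_W_enc = X.T @ (grad_out @ W_dec.T)

    W_dec -= learning_rate * grad_W_dec
    W_enc -= learning_rate * grad_W_enc

codes = X @ W_enc   # the encoded values we would "pull off" after training
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}, codes shape: {codes.shape}")
```

The compressed dims only become meaningful because the decoder has to rebuild the input from them. Reading `codes` off after training is the analogue of calling the separate encoder model once the full autoencoder has been fit.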
    
**Actions**
* Need to feed back into a DL/Keras intro how Model, Input and Dense work, their arguments, etc.
    * Looks like Model has fit, compile and predict methods attached to it
    * Why relu and sigmoid? What else works? (sessions on different activation functions)
    * Why did we choose the optimizer and loss function that we did?
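
As a starting point for the loss-function question, a NumPy sketch of per-pixel binary cross-entropy between an input and its reconstruction. The function below is my own minimal version (the `eps` clipping value is an illustrative choice), not the Keras implementation, and the "images" are random numbers in [0, 1).

```python
import numpy as np

def binary_crossentropy(x, x_hat, eps=1e-7):
    """Mean per-pixel binary cross-entropy; x and x_hat are expected to lie in [0, 1]."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))

rng = np.random.default_rng(0)
x = rng.random((3, 784))       # fake normalised pixel values
good = 0.98 * x + 0.01         # a reconstruction close to the input
bad = rng.random((3, 784))     # an unrelated "reconstruction"

print("close reconstruction:", binary_crossentropy(x, good))
print("unrelated reconstruction:", binary_crossentropy(x, bad))
```

This only makes sense when both input and output sit in [0, 1], which is one reason for dividing the pixels by 255 and ending the decoder with a sigmoid. Note that for non-binary pixel values the loss does not reach zero even for a perfect reconstruction; it is smallest there, though.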