An autoencoder network is an artificial neural network used for unsupervised learning of an efficient data representation (encoding), typically for the purpose of dimensionality reduction.

### Vanilla Autoencoders
The simplest autoencoder architecture, usually called a *vanilla autoencoder*, has three layers: the input, the hidden, and the output layer. Given an input, a vanilla autoencoder tries to reconstruct that same input at the output through a single hidden layer with fewer neurons than the input dimension. This creates a bottleneck in the flow of information through the network, so the hidden layer can be thought of as a bottleneck layer that restricts how much of the input can be stored.
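To make the bottleneck concrete, here is a minimal NumPy sketch of a vanilla autoencoder's forward pass. The layer sizes (784 inputs, 64 hidden units), the sigmoid activations and the random, untrained weights are assumptions chosen for illustration; training would adjust the weights to minimise the reconstruction error computed at the end.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_input, n_hidden = 784, 64        # hidden layer much narrower than the input: the bottleneck

# Random, untrained parameters (illustration only)
W_enc = rng.normal(scale=0.01, size=(n_input, n_hidden))
b_enc = np.zeros(n_hidden)
W_dec = rng.normal(scale=0.01, size=(n_hidden, n_input))
b_dec = np.zeros(n_input)

def encode(x):
    """Input layer -> hidden (bottleneck) layer."""
    return sigmoid(x @ W_enc + b_enc)

def decode(h):
    """Hidden layer -> output layer, which has the same size as the input."""
    return sigmoid(h @ W_dec + b_dec)

x = rng.random((10, n_input))       # a batch of 10 fake flattened 28x28 images in [0, 1)
h = encode(x)                       # compressed representation
x_hat = decode(h)                   # reconstruction of x
reconstruction_error = np.mean((x - x_hat) ** 2)   # the quantity training would minimise
print(h.shape, x_hat.shape)         # (10, 64) (10, 784)
```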

### Denoising Autoencoders
A denoising autoencoder is trained on corrupted (noisy) inputs: its encoder receives the noisy input, and the decoder's reconstruction is then compared with the original, clean input. The idea is that learning to remove the noise forces the network to capture the core features of the input.
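The essential detail is which input each part of the computation sees. In the minimal NumPy sketch below, the encoder only receives `x_noisy`, while the reconstruction is scored against the clean `x_clean`. The Gaussian corruption clipped to [0, 1], the tied weights and the mean-squared-error loss are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_input, n_hidden = 784, 64
W = rng.normal(scale=0.01, size=(n_input, n_hidden))   # tied weights: the decoder uses W.T
b_enc, b_dec = np.zeros(n_hidden), np.zeros(n_input)

def reconstruct(x):
    h = sigmoid(x @ W + b_enc)
    return sigmoid(h @ W.T + b_dec)

def denoising_loss(x_clean, noise_std=0.3):
    # 1. Corrupt the input (Gaussian noise, clipped to the pixel range [0, 1])
    x_noisy = np.clip(x_clean + rng.normal(scale=noise_std, size=x_clean.shape), 0.0, 1.0)
    # 2. The encoder and decoder only ever see the noisy version ...
    x_hat = reconstruct(x_noisy)
    # 3. ... but the reconstruction is compared with the clean original
    return np.mean((x_clean - x_hat) ** 2)

x = rng.random((10, n_input))
loss = denoising_loss(x)
```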

### Stacked Denoising Autoencoders
The encoder and decoder segments of the network can each have multiple layers. Deeper encoder and decoder networks allow the autoencoder to represent more complex features. The resulting structure is called a *stacked autoencoder* (or deep autoencoder): the features extracted by one encoder layer are passed on as input to the next. A stacked autoencoder can be trained as a whole network with the aim of minimising the reconstruction error.
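A NumPy sketch of how the layers chain together. The layer sizes `[784, 500, 300, 150]` are hypothetical, the weights are random rather than trained, and the decoder reuses the transposed encoder weights (the same tied-weight choice the `DeepAutoEncoder` class in this notebook makes); the point is only to show the data flowing down through the stack and back up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
sizes = [784, 500, 300, 150]   # hypothetical sizes: input -> hidden layers -> code

# One weight matrix per encoder layer; decoder layers reuse it transposed (tied weights)
weights = [rng.normal(scale=0.01, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
enc_biases = [np.zeros(n) for n in sizes[1:]]
dec_biases = [np.zeros(m) for m in sizes[:-1]]

def encode(x):
    for W, b in zip(weights, enc_biases):
        x = sigmoid(x @ W + b)        # output of one encoder layer is the next layer's input
    return x

def decode(h):
    for W, b in zip(reversed(weights), reversed(dec_biases)):
        h = sigmoid(h @ W.T + b)      # walk back through the layers in reverse order
    return h

x = rng.random((5, sizes[0]))
code = encode(x)
x_hat = decode(code)
print(code.shape, x_hat.shape)        # (5, 150) (5, 784)
```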

### TensorFlow Implementation
For the implementation, I use the [MNIST dataset](http://yann.lecun.com/exdb/mnist/) of handwritten digits.


<img style="float: center;" src="https://github.com/Ceppehr/NoveltyDetection/blob/master/Images/mnist.png?raw=true">



The inputs to the autoencoder are noisy digits, so as a first step, let's apply some noise to corrupt our input **x**:

<img style="float: center;" src="https://github.com/Ceppehr/NoveltyDetection/blob/master/Images/DAEN.png?raw=true">
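One way to produce such noisy inputs, sketched in NumPy. The notebook does not state which noise model or noise level it uses, so both functions are assumptions: additive Gaussian noise clipped back to the [0, 1] pixel range, and masking noise that zeroes a random fraction of pixels. The function names and default parameters are hypothetical.

```python
import numpy as np

def add_gaussian_noise(x, std=0.5, seed=None):
    """Additive Gaussian noise, clipped back to the valid pixel range [0, 1]."""
    rng = np.random.default_rng(seed)
    noisy = x + rng.normal(loc=0.0, scale=std, size=x.shape)
    return np.clip(noisy, 0.0, 1.0).astype(np.float32)

def add_masking_noise(x, drop_prob=0.3, seed=None):
    """Masking noise: each pixel is independently set to 0 with probability drop_prob."""
    rng = np.random.default_rng(seed)
    keep = rng.random(x.shape) >= drop_prob
    return (x * keep).astype(np.float32)

# Stand-in batch of 4 flattened 28x28 images with pixel values in [0, 1)
x = np.random.default_rng(0).random((4, 784)).astype(np.float32)
x_gauss = add_gaussian_noise(x, std=0.5, seed=1)
x_masked = add_masking_noise(x, drop_prob=0.3, seed=1)
```

With the MNIST arrays loaded in the Data cell, a call such as `trX_noisy = add_gaussian_noise(trX)` would give a corrupted training set to pass to `fit` alongside the clean `trX`; the casts to `float32` match the `'float'` placeholders used in the class.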



<h3><center> Preamble </center></h3>

```python
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import itertools
import math

import tensorflow as tf
import pandas as pd
import numpy as np
from math import pi
import matplotlib.pyplot as plt
from pylab import rcParams
import xlrd


# SKLearn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import IsolationForest

import warnings
warnings.filterwarnings('ignore')

%matplotlib inline
```

<h3><center> Methods and Classes </center></h3>

#### Autoencoder class

```python
class DeepAutoEncoder(object):
    def __init__(self, list1, eta=0.02):
        """
        :param list1: [input_dimension, hidden_layer_1, ..., hidden_layer_n]
        :param eta: Learning rate
        """
        N = len(list1) - 1
        self._m = list1[0]
        self.learning_rate = eta

        # Create the computational graph
        self._W = {}
        self._b = {}
        self._X = {}

        # -- Placeholders for the clean and the noisy inputs
        self._X['0'] = tf.placeholder('float', shape=[None, list1[0]])
        self._X_noisy = tf.placeholder('float', shape=[None, self._m])

        # -- Weights and biases
        for i in range(N):
            layer = '{0}'.format(i+1)
            print("AutoEncoder layer {0} : {1}-->{2}".format(layer, list1[i], list1[i+1]))
            self._W['E' + layer] = tf.Variable(tf.random_normal(shape=(list1[i], list1[i+1])),
                                               name='WtsEncoder' + layer)
            self._b['E' + layer] = tf.Variable(np.zeros(list1[i+1]).astype(np.float32),
                                               name='BiasEncoder' + layer)
            self._X[layer] = tf.placeholder('float', shape=[None, list1[i+1]])
            self._W['D' + layer] = tf.transpose(self._W['E' + layer])  # Shared (tied) weights
            self._b['D' + layer] = tf.Variable(np.zeros(list1[i]).astype(np.float32),
                                               name='BiasDecoder' + layer)

        # -- Layer-wise pretraining ops
        self.train_ops = {}
        self.out = {}
        for i in range(N):
            layer = '{0}'.format(i+1)
            prev_layer = '{0}'.format(i)
            self.train_ops[layer] = self.pretrain(self._X[prev_layer], layer)
            self.out[layer] = self.one_pass(self._X[prev_layer], self._W['E' + layer],
                                            self._b['E' + layer])

        # -- Fine-tuning: encode the noisy input, compare the reconstruction with the clean input
        self.y = self.encoder(self._X_noisy, N)  # Encoder output
        self.r = self.decoder(self.y, N)  # Decoder output

        optimiser = tf.train.AdamOptimizer(self.learning_rate)
        self._loss = tf.losses.mean_squared_error(self._X['0'], self.r)  # Reconstruction error
        self._opt = optimiser.minimize(self._loss)

    # Methods

    def encoder(self, X, N):
        x = X  # original input
        for i in range(N):
            layer = '{0}'.format(i+1)
            hiddenE = tf.nn.sigmoid(tf.add(tf.matmul(x, self._W['E' + layer]), self._b['E' + layer]))
            x = hiddenE  # new input for the next layer
        return x

    def decoder(self, X, N):
        x = X
        for i in range(N, 0, -1):
            layer = '{0}'.format(i)
            hiddenD = tf.nn.sigmoid(tf.add(tf.matmul(x, self._W['D' + layer]), self._b['D' + layer]))
            x = hiddenD
        return x

    def set_session(self, session):
        self.session = session

    def reconstruct(self, x, n_layers):
        # Feed x through the noisy-input placeholder instead of embedding it in the graph as a constant
        h = self.encoder(self._X_noisy, n_layers)
        r = self.decoder(h, n_layers)
        return self.session.run(r, feed_dict={self._X_noisy: x})

    def pretrain(self, X, layer):
        # Train a single layer as a shallow autoencoder that reconstructs its own input
        y = tf.nn.sigmoid(tf.add(tf.matmul(X, self._W['E' + layer]), self._b['E' + layer]))
        r = tf.nn.sigmoid(tf.add(tf.matmul(y, self._W['D' + layer]), self._b['D' + layer]))
        loss = tf.losses.mean_squared_error(X, r)
        # The decoder weights are the transposed encoder weights; the decoder bias is trained too
        opt = tf.train.AdamOptimizer(0.001).minimize(
            loss, var_list=[self._W['E' + layer], self._b['E' + layer], self._b['D' + layer]])
        return opt

    def one_pass(self, X, W, b):
        h = tf.nn.sigmoid(tf.add(tf.matmul(X, W), b))
        return h

    def getWeights(self, N):
        return self.session.run([self._W['E' + str(N)], self._W['D' + str(N)], self._b['E' + str(N)], self._b['D' + str(N)]])

    def fit(self, Xtrain, Xtr_noisy, layers, epochs=1, batch_size=100):
        N, D = Xtrain.shape
        num_batches = N // batch_size
        X = {'0': Xtrain}
        # Greedy layer-wise pretraining (5 epochs per layer) on the clean inputs
        for i in range(layers):
            Xin = X[str(i)]
            print('Pretraining Layer ', i+1)
            for e in range(5):
                for j in range(num_batches):
                    batch = Xin[j*batch_size:(j*batch_size+batch_size)]
                    self.session.run(self.train_ops[str(i+1)], feed_dict={self._X[str(i)]: batch})
            print("Pretraining Finished!")
            # The hidden representation of this layer becomes the next layer's training input
            X[str(i+1)] = self.session.run(self.out[str(i+1)], feed_dict={self._X[str(i)]: Xin})

        # Fine-tune the whole network: noisy batches in, clean batches as targets
        obj = []
        for i in range(epochs):
            for j in range(num_batches):
                batch = Xtrain[j*batch_size:(j*batch_size+batch_size)]
                batch_noisy = Xtr_noisy[j*batch_size:(j*batch_size+batch_size)]
                _, ob = self.session.run([self._opt, self._loss], feed_dict={self._X['0']: batch,
                                                                            self._X_noisy: batch_noisy})
                if j % 100 == 0:
                    print("Training epoch {0} batch {1} loss {2}".format(i, j, ob))
                obj.append(ob)
        return obj
```
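The `fit` method first pretrains greedily, one layer at a time: layer 1 learns to reconstruct the input, its hidden activations then become the training data for layer 2, and so on. The NumPy sketch below mimics only that pretraining stage, under simplifying assumptions: full-batch gradient descent with hand-written gradients instead of Adam, tied weights, sigmoid units and a mean-squared-error loss. `pretrain_layer` and `greedy_pretrain` are hypothetical helpers, not part of the class.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(X, n_hidden, epochs=100, lr=5.0, seed=0):
    """Fit one tied-weight sigmoid autoencoder layer to X by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W = rng.normal(scale=0.1, size=(n_in, n_hidden))  # shared by encoder (W) and decoder (W.T)
    b = np.zeros(n_hidden)                            # encoder bias
    c = np.zeros(n_in)                                # decoder bias
    losses = []
    for _ in range(epochs):
        h = sigmoid(X @ W + b)                        # encode
        r = sigmoid(h @ W.T + c)                      # decode with the transposed weights
        losses.append(np.mean((r - X) ** 2))          # mean squared reconstruction error
        # Backpropagate the MSE through both sigmoid layers
        dz2 = 2.0 * (r - X) / X.size * r * (1.0 - r)  # gradient at the decoder pre-activation
        dz1 = (dz2 @ W) * h * (1.0 - h)               # gradient at the encoder pre-activation
        W -= lr * (X.T @ dz1 + dz2.T @ h)             # W is used twice, so both terms contribute
        b -= lr * dz1.sum(axis=0)
        c -= lr * dz2.sum(axis=0)
    return W, b, losses

def greedy_pretrain(X, hidden_sizes):
    """Pretrain layers one at a time; each trained layer's output is the next layer's input."""
    params = []
    layer_input = X
    for i, n_hidden in enumerate(hidden_sizes):
        W, b, losses = pretrain_layer(layer_input, n_hidden, seed=i)
        print("Layer {0}: loss {1:.4f} -> {2:.4f}".format(i + 1, losses[0], losses[-1]))
        params.append((W, b))
        layer_input = sigmoid(layer_input @ W + b)    # hidden features feed the next layer
    return params, layer_input

X = np.random.default_rng(42).random((200, 64))       # stand-in data with values in [0, 1)
params, codes = greedy_pretrain(X, [32, 16])
```

After pretraining, the stacked `(W, b)` pairs would serve as the starting point for end-to-end fine-tuning, which is what the second loop in `fit` does with Adam on the noisy/clean pairs.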

#### ELM Class

```python
omega = 1.  # Regularisation constant: the identity matrix below is scaled by 1/omega

class ELM(object):
    def __init__(self, sess, batch_size, input_len, hidden_num, output_len):
        '''
        Args:
          sess : TensorFlow session.
          batch_size : The batch size (N)
          input_len : The length of input. (L)
          hidden_num : The number of hidden nodes. (K)
          output_len : The length of output. (O)
        '''

        self._sess = sess
        self._batch_size = batch_size
        self._input_len = input_len
        self._hidden_num = hidden_num
        self._output_len = output_len

        # for train
        self._x0 = tf.placeholder(tf.float32, [self._batch_size, self._input_len])
        self._t0 = tf.placeholder(tf.float32, [self._batch_size, self._output_len])

        # for test
        self._x1 = tf.placeholder(tf.float32, [None, self._input_len])
        self._t1 = tf.placeholder(tf.float32, [None, self._output_len])

        # Random input weights and biases stay fixed; only beta is computed (analytically)
        self._W = tf.Variable(
          tf.random_normal([self._input_len, self._hidden_num]),
          trainable=False, dtype=tf.float32)
        self._b = tf.Variable(
          tf.random_normal([self._hidden_num]),
          trainable=False, dtype=tf.float32)
        self._beta = tf.Variable(
          tf.zeros([self._hidden_num, self._output_len]),
          trainable=False, dtype=tf.float32)
        self._var_list = [self._W, self._b, self._beta]

        # Hidden layer output; no activation function is applied here
        self.H0 = tf.matmul(self._x0, self._W) + self._b  # N x K
        self.H0_T = tf.transpose(self.H0)

        self.H1 = tf.matmul(self._x1, self._W) + self._b  # N x K
        self.H1_T = tf.transpose(self.H1)

        # beta analytic solution : self._beta_s (K x O)
        # Both branches give the same beta; invert the smaller of the K x K and N x N matrices
        if self._hidden_num < self._batch_size:  # K < N
            identity = tf.constant(np.identity(self._hidden_num), dtype=tf.float32)
            # _beta_s = (H_T*H + I/om)^(-1)*H_T*T
            self._beta_s = tf.matmul(tf.matmul(tf.matrix_inverse(
                tf.matmul(self.H0_T, self.H0) + identity/omega), self.H0_T), self._t0)
        else:
            identity = tf.constant(np.identity(self._batch_size), dtype=tf.float32)
            # _beta_s = H_T*(H*H_T + I/om)^(-1)*T
            self._beta_s = tf.matmul(tf.matmul(self.H0_T, tf.matrix_inverse(
                tf.matmul(self.H0, self.H0_T) + identity/omega)), self._t0)

        self._assign_beta = self._beta.assign(self._beta_s)
        self._fx0 = tf.matmul(self.H0, self._beta)
        self._fx1 = tf.matmul(self.H1, self._beta)

        self._cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=self._fx0, labels=self._t0))

        self._init = False
        self._feed = False

        # for the mnist test
        self._correct_prediction = tf.equal(tf.argmax(self._fx1, 1), tf.argmax(self._t1, 1))
        self._accuracy = tf.reduce_mean(tf.cast(self._correct_prediction, tf.float32))

    def feed(self, x, t):
        '''
        Args :
          x : input array (N x L)
          t : label array (N x O)
        '''

        if not self._init:
            self.init()
        self._sess.run(self._assign_beta, {self._x0: x, self._t0: t})
        self._feed = True

    def init(self):
        self._sess.run(tf.variables_initializer(self._var_list))
        self._init = True

    def test(self, x, t=None):
        if not self._feed:
            raise RuntimeError("Not feed-forward trained: call feed() first")
        if t is not None:
            print("Accuracy: {:.9f}".format(self._sess.run(self._accuracy, {self._x1: x, self._t1: t})))
        else:
            return self._sess.run(self._fx1, {self._x1: x})
```
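The two branches for the output weights compute the same beta, because of the push-through identity `(H^T H + I/omega)^(-1) H^T T = H^T (H H^T + I/omega)^(-1) T`; only the size of the matrix being inverted (K x K versus N x N) differs. The small NumPy check below illustrates this with arbitrary sizes and random data chosen for the example. Like the class, it applies no activation to the hidden layer; unlike the class, it uses `np.linalg.solve` rather than an explicit matrix inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, K, O = 200, 20, 50, 3      # samples, input length, hidden nodes, outputs (arbitrary)
omega = 1.0                      # same role as omega in the ELM cell

X = rng.normal(size=(N, L))
T = np.eye(O)[rng.integers(0, O, size=N)]   # one-hot targets, N x O

W = rng.normal(size=(L, K))      # random input weights, never trained
b = rng.normal(size=K)
H = X @ W + b                    # hidden layer output, N x K (no activation, as in the class)

# K x K form: beta = (H^T H + I/omega)^(-1) H^T T
beta_kk = np.linalg.solve(H.T @ H + np.eye(K) / omega, H.T @ T)
# N x N form: beta = H^T (H H^T + I/omega)^(-1) T
beta_nn = H.T @ np.linalg.solve(H @ H.T + np.eye(N) / omega, T)

print(beta_kk.shape, np.allclose(beta_kk, beta_nn))   # (50, 3) True
predictions = np.argmax(H @ beta_kk, axis=1)          # predicted class index per sample
```

Since both forms agree, the cheaper one is whichever inverts the smaller matrix: the K x K form when there are more samples than hidden nodes, the N x N form otherwise.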

<h3><center> Data </center></h3>

```python
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST", one_hot=False)
trX, trY, teX, teY = mnist.train.images, mnist.train.labels, mnist.test.images, mnist.test.labels
```

```
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST/t10k-labels-idx1-ubyte.gz
```
