In this exercise, I experimented with autoencoders, an interesting neural network topic. The code is written in Python with TensorFlow as the backend. I implemented four types of autoencoders from scratch and ran a simple analysis comparing the networks in general and the autoencoder types in particular. The types are:
1. Shallow Autoencoder (simple network with one hidden layer)
2. Deep Autoencoder (larger network with a user-specified number of hidden layers)
3. Sparse Autoencoder (sparsity is introduced by encouraging only a small number of neurons to be active, to study the effect on performance)
4. Denoising Autoencoder (noise is added to the images and the network learns to recover the original images)

First, let's briefly review autoencoders, the algorithm and the different types, and then discuss the implementation and results.
Autoencoders:
An autoencoder is a neural network that is mostly used for dimensionality reduction: the image is encoded into a lower-dimensional representation at one end, and the original image is reconstructed from it at the decoding end. In that sense, autoencoders are similar to PCA, t-SNE and other dimensionality reduction techniques. They are also used for image denoising (explored in the denoising autoencoder section) and are believed to learn good lower-dimensional representations.

Auto-encoder image (image taken from Stanford UFLDL page)

![Alt text](imgs/AEN_image.PNG?raw=true "AEN_image")

An autoencoder usually has two main components, the encoder and the decoder:
Input --> Encoder --> Encoded output --> Decoder --> Output (Input_hat)

In this exercise, I first implemented the shallow and deep AENs from scratch, defining each of the layers: conv, max_pool, fc (fully connected) and deconv (deconvolution). I then rewrote the code to work directly with the tf.layers.conv2d functions. The three hyperparameters that control the output size of a conv layer are:
1. Output depth 2. Stride and 3. Zero-padding size
In this exercise, I designed my own filters and chose values for depth, stride and zero padding so as to reconstruct the original images.

In [None]:
import numpy as np
import os
import cv2
from scipy import ndimage
import scipy
import time
from math import sqrt, cos, pi
import matplotlib 
from matplotlib import pyplot as plt
import matplotlib.cm as cm
import pickle
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from skimage import transform
import tensorflow.contrib.layers as layers

class AEN_analysis:

    def __init__(self):
        self.train_data = []
        self.train_labels = []
        self.test_data = []
        self.test_labels = []
        self.data_classes = {}

        #MNIST Data
        self.data = []
        self.test = []
        self.eval_data = []

        # network_parameters initial setup (can be modified later if required
        # with the nw_params_modify function):
        self.batch_size = 1000
        self.image_shape = (28, 28, 1)
        self.image_shape_batch = (self.batch_size, self.image_shape[0], self.image_shape[1], self.image_shape[2])
        self.learning_rate = 0.01
        self.epoch_count = 10
        self.show_recon_num = 10
    
    #CIFAR dataset prelim options - load data, show histo of features and show original images:
    def unpickle(self, file):
        # CIFAR batches are pickled dicts with byte-string keys
        with open(file, 'rb') as fo:
            batch = pickle.load(fo, encoding='bytes')
        return batch

    def load_data_cifar(self):
        # Load the CIFAR dataset, there are 5 batches of training images, let's load
        # each of them and append them together
        print('loading data of batch: ')
        for i in range(5):
            print(str(i+1))
            this_data = self.unpickle('dataset_CIFAR/data_batch_'+str(i+1))
            # the dictionary has the following elements - batch_label, labels, data and filenames
            # we only need the data and the labels
            if i == 0:
                self.train_data = this_data[b'data']
                self.train_labels = this_data[b'labels']
                continue
            self.train_data = np.vstack((self.train_data, this_data[b'data']))
            self.train_labels.extend(this_data[b'labels'])
        # now load the test data
        test_all_data = self.unpickle('dataset_CIFAR/test_batch')
        self.test_data = test_all_data[b'data']
        self.test_labels = test_all_data[b'labels']
        # this data_class dict object in turn has following elements - num_cases_per_batch,
        # label_names and num_vis. We may need only label_names mostly, hence taking only that
        self.data_classes = self.unpickle('dataset_CIFAR/batches.meta')[b'label_names']
        return

    def show_original_data_cifar(self):
        
        # show the original images (num_eg images from each of the 10 categories)
        num_eg = 5
        indi = [np.where(np.array(self.train_labels)==lab)[0][:num_eg] for lab in range(0, len(self.data_classes))]
        # plotting the images
        fig, aa = plt.subplots(len(self.data_classes),num_eg,figsize=(32, 32))
        for class_id in range(len(self.data_classes)):
            for i in range(num_eg):
                # each CIFAR row stores 1024 red, then 1024 green, then 1024 blue values;
                # reshape to (3, 32, 32) and move the channel axis last for imshow
                img = self.train_data[indi[class_id][i]].reshape(3,32,32).transpose(1,2,0)
                aa[class_id, i].imshow(img)
                aa[class_id,i].axis('off')

        fig.suptitle('{} sample images from each class'.format(num_eg))
        plt.show()
        return

    def load_data_mnist(self):
        data = input_data.read_data_sets('MNIST_data')
        self.data = data.train.images
        self.test = data.test.images
        self.eval_data = data.validation
        return data
    
    # performance - accuracy, precision, recall and ROC (only for classification AEN all options):
    def performance(self):
        pass

    # loss comparison with different depths of aen (separate exp):
    def loss_compare(self):
        pass
    
    # creation of the different layers - convolution layer (conv), deconvolution layer(deconv),
    # max pool layer (max_pool) and fully connected layer (fc)
    def conv(self, inp, inp_name, fs, strde, pad='SAME', non_linearity=tf.nn.relu):
        Bi,Hi,Wi,Di = inp.shape
        Fh,Fw,Fd = fs
        Sh,Sw = strde
        with tf.variable_scope(inp_name):
            Weight_m = tf.get_variable('weights',[Fh,Fw,Di,Fd])
            Biases_m = tf.get_variable('biases',[Fd])
            conv_v = tf.nn.bias_add(tf.nn.conv2d(inp, Weight_m, [1,Sh,Sw,1],pad),Biases_m)
            final_conv = non_linearity(conv_v,inp_name)
            return final_conv

    def deconv(self, inp, inp_name, fs, strde, pad='SAME', non_linearity=tf.nn.relu):
        Bi,Hi,Wi,Di = inp.shape
        Fh,Fw,Fd = fs
        Sh,Sw = strde
        if pad=='VALID':
            Ho = (Hi-1)*Sh+Fh
            Wo = (Wi-1)*Sw+Fw
        elif pad=='SAME':
            Ho = Hi*Sh
            Wo = Wi*Sw
        output_shape = [self.batch_size, int(Ho), int(Wo), Fd]
        with tf.variable_scope(inp_name):
            Weight_m = tf.get_variable('weights',[Fh,Fw,Fd,Di])
            Biases_m = tf.get_variable('biases',[Fd])
            conv_v = tf.nn.bias_add(tf.nn.conv2d_transpose(inp, Weight_m, output_shape, [1,Sh,Sw,1],pad),Biases_m)
            final_deconv = non_linearity(conv_v,inp_name)
            return final_deconv

    def fc(self, inp, inp_name, output_shape, non_linearity = tf.nn.relu):
        with tf.variable_scope(inp_name):
            if len(inp.shape)==4:
                batch_size, Hi, Wi, Di = inp.shape
                input_shape = Hi*Wi*Di
                x = tf.reshape(inp,[batch_size, input_shape])
            else:
                input_shape = inp.shape[1]
                x = inp
            Weight_m = tf.get_variable('weights',[input_shape, output_shape])
            Biases_m = tf.get_variable('biases',[output_shape])
            # always apply the affine map x*W + b; the non-linearity is optional
            fc_output = tf.nn.xw_plus_b(x, Weight_m, Biases_m)
            if non_linearity:
                fc_output = non_linearity(fc_output, inp_name)
            return fc_output

    def max_pool(self, inp, fs, strde, pad='SAME'):
        # fs and strde are (height, width) tuples, as in conv and deconv
        Fh, Fw = fs
        Sh, Sw = strde
        pool_output = tf.nn.max_pool(inp, [1,Fh, Fw, 1],[1,Sh,Sw,1],pad)
        return pool_output
    
    # the two parts of autoencoder - encoder and decoder:
    #for normal autoencoder
    def encoder(self, inp_im, nh, fs, filt, strd):
        for i in range(nh):
            this_name = str('conv')+str(i+1)
            if i==0:
                this_conv = layers.conv2d(inp_im, filt[i], [fs,fs], stride=strd,padding='SAME')
                continue
            else:
                this_conv = layers.conv2d(this_conv, filt[i], [fs,fs], stride=strd,padding='SAME')
        print(this_conv.shape)
        return this_conv

    #for sparse auto encoder:
    '''def encoder(self, inp_im, nh, fs, filt, strd):
        for i in range(nh):
            this_name = str('conv')+str(i+1)
            if i==0:
                this_conv = tf.layers.conv2d(inp_im, filt[i], [fs,fs], strides=strd,padding='SAME', activity_regularizer=tf.contrib.layers.l2_regularizer(10e-5))
                continue
            else:
                this_conv = tf.layers.conv2d(this_conv, filt[i], [fs,fs], strides=strd,padding='SAME', activity_regularizer=tf.contrib.layers.l2_regularizer(10e-5))
        print(this_conv.shape)
        return this_conv'''

    def decoder(self, inp_im, nh, fs, filt, pad, strd):
        #Now the input that is coming here is of size 4 x 4 x 4 (for deep aen)
        for i in range(nh):
            this_name = str('deconv')+str(i+1)
            if i==0:
                this_deconv = layers.conv2d_transpose(inp_im, filt[i], [fs[i],fs[i]], stride=strd[i],padding=pad[i])
            else:
                this_deconv = layers.conv2d_transpose(this_deconv, filt[i], [fs[i], fs[i]], stride = strd[i], padding=pad[i])
        print(this_deconv.shape)
        return this_deconv
    
    #encoder and decoder for using my implementation of conv and other such layers:
    def encoder_m(self, inp_im, nh, fs, strde):
        #nh is the number of hidden layers
        # can modify these hyperparameters, stride, final_size and filter size (3 or 5 is common) if required
        final_size = 128
        for i in range(nh):
            this_name = str('conv')+str(i+1)
            if i==0:
                this_conv = self.conv(inp_im, this_name, fs[i], strde)
                continue
            this_conv = self.conv(this_conv, this_name, fs[i], strde)
        fc_im = self.fc(this_conv, 'fc_im', final_size, non_linearity= None)
        print(fc_im.shape)
        return fc_im

    def decoder_m(self, inp_im, nh, fs, strde, final_size, f_s):
        fc_im = self.fc(inp_im, 'fc_deconv', final_size)
        inp = tf.reshape(fc_im, [-1,f_s[0],f_s[1],f_s[2]]) #this makes each one of size 4 x 4 (with 4 feature maps)
        for i in range(nh):
            this_name = str('deconv')+str(i+1)
            if i==0:
                this_deconv = self.deconv(inp, this_name, fs[i], strde[i],pad='VALID')
                continue
            elif i==nh-1:
                this_deconv = self.deconv(this_deconv, this_name, fs[i], strde[i],pad='VALID', non_linearity= tf.sigmoid)
            else:
                this_deconv = self.deconv(this_deconv, this_name, fs[i], strde[i],pad='SAME')
        print(this_deconv.shape)
        return this_deconv
    
    # cross entropy loss may be better, see that
    def loss_v(self, orig_im, decd_im):
        diff = (orig_im-decd_im)**2
        return tf.div(tf.reduce_sum(diff),tf.constant(float(self.batch_size)))
        
    def nw_params_modify(self, bs, img_sh, learn_rt, epoch):
        if bs is not None:
            self.batch_size = bs
        if img_sh is not None:
            self.image_shape = img_sh
        self.image_shape_batch = (self.batch_size, self.image_shape[0], self.image_shape[1], self.image_shape[2])
        if learn_rt is not None:
            self.learning_rate = learn_rt
        if epoch is not None:
            self.epoch_count = epoch
        return

    #show the original vs. decoded image
    def show_reconstructed(self, orig, decoded_im):
        fold_imgsave = './results/'
        if not os.path.exists(fold_imgsave):
              os.makedirs(fold_imgsave)
        orig_1 = orig[:self.show_recon_num]
        recon_1 = decoded_im[:self.show_recon_num]
        print(recon_1)
        img_no=1
        for (o, r) in zip(orig_1, recon_1):
            orig1 = np.reshape(o, (self.image_shape[0],
                                     self.image_shape[1]))
            recon = np.reshape(r, (self.image_shape[0],
                                   self.image_shape[1]))
            f, ax = plt.subplots(1,2)
            ax[0].imshow(orig1, cmap='gray')
            ax[1].imshow(recon, cmap='gray')
            #plt.show()
            plt.savefig(fold_imgsave + "res_%d.png" % img_no)
            img_no+=1
        return
        
    # now different types of autoencoders experimented in this exercise:

    def shallow_aen(self):
        #step-0: In shallow aen, number of hidden layers is fixed at 1
        num_hidden_l = 1
        f_size = 5
        f_size_d = [5]
        filt_e = [16]
        filt_d = [1]
        pad_d = ['SAME']
        strde = 4
        strd = [4]
        
        #step-1: set the parameters of the network, if needed change values here:
        self.nw_params_modify(None,None,None,None)
        
        #step-1: load dataset
        self.load_data_mnist()

        #step-2: set routine for encoding the image
        input_im = tf.placeholder(tf.float32, shape=self.image_shape_batch, name='input_im')
        with tf.variable_scope('aen'):
            encoded_img = self.encoder(input_im, num_hidden_l, f_size, filt_e, strde)
            #step-3: set routine for decoding the image
            decoded_img = self.decoder(encoded_img, num_hidden_l, f_size_d, filt_d, pad_d, strd)
        
        #step-4: set to calculate loss
        loss = self.loss_v(input_im, decoded_img)

        #step-5: start session and get the tf graph
        #first, set optimizer
        optimizer_nw = tf.train.AdamOptimizer(self.learning_rate).minimize(loss)
        initialize = tf.global_variables_initializer()
        this_loss = 0
        cumul_loss = 0
        with tf.Session() as session:
            session.run(initialize)
            #the data has to be trained epoch_count times in size of batch_size, so
            #the total number of iterations in this case would be:
            num_iter = self.epoch_count*len(self.data)
            num_iter/= self.batch_size
            print('Total number of iterations is, ', num_iter)
            k=0
            # begin iteration and training:
            for i in range(int(num_iter)):
                this_data = np.reshape(self.data[self.batch_size*k:self.batch_size*(k+1)],self.image_shape_batch)
                recon_im, this_loss, this_opt = session.run([decoded_img, loss, optimizer_nw],feed_dict={input_im:this_data})
                #print the loss every 50th iteration :
                if i%50 == 0:
                    print("Iteration ", i, " : Loss = ", this_loss)
                cumul_loss += this_loss
                k+=1
                if (self.batch_size*k)==len(self.data):
                    k = 0
            #step-6: reconstruct the test images with the trained network and save a sample
            num_iter = len(self.test)/self.batch_size
            k=0
            for j in range(int(num_iter)):
                this_test = np.reshape(self.test[self.batch_size*k:self.batch_size*(k+1)],self.image_shape_batch)
                decoded_im = session.run(decoded_img, feed_dict={input_im:this_test})
                if j%100 == 0:
                    self.show_reconstructed(this_test, decoded_im)
                k+=1
                if (self.batch_size*k)==len(self.test):
                    k = 0
        return

    def deep_aen(self):
        #step-0: get the number of hidden layers:
        num_hidden_l = 3
        f_size = 3
        f_size_d = [7,3,3]
        filt_e = [16, 8, 4]
        filt_d = [8, 16, 1]
        pad_d = ['VALID', 'SAME', 'VALID']
        strd = [2,2,1]
        #step-1: set the parameters of the network, if needed change values here:
        self.nw_params_modify(None,None,None,None)
        
        #step-1: load dataset
        self.load_data_mnist()

        #step-2: set routine for encoding the image
        input_im = tf.placeholder(tf.float32, shape=self.image_shape_batch, name='input_im')
        with tf.variable_scope('aen'):
            encoded_img = self.encoder(input_im, num_hidden_l, f_size, filt_e, 2)
            #step-3: set routine for decoding the image
            decoded_img = self.decoder(encoded_img, num_hidden_l, f_size_d, filt_d, pad_d, strd)
        
        #step-4: set to calculate loss
        loss = self.loss_v(input_im, decoded_img)

        #step-5: start session and get the tf graph
        #first, set optimizer
        optimizer_nw = tf.train.AdamOptimizer(self.learning_rate).minimize(loss)
        initialize = tf.global_variables_initializer()
        this_loss = 0
        cumul_loss = 0
        with tf.Session() as session:
            session.run(initialize)
            #the data has to be trained epoch_count times in size of batch_size, so
            #the total number of iterations in this case would be:
            num_iter = self.epoch_count*len(self.data)
            num_iter/= self.batch_size
            print('Total number of iterations is, ', num_iter)
            k=0
            # begin iteration and training:
            for i in range(int(num_iter)):
                this_data = np.reshape(self.data[self.batch_size*k:self.batch_size*(k+1)],self.image_shape_batch)
                recon_im, this_loss, this_opt = session.run([decoded_img, loss, optimizer_nw],feed_dict={input_im:this_data})
                #print the loss every 50th iteration :
                if i%50 == 0:
                    print("Iteration ", i, " : Loss = ", this_loss)
                cumul_loss += this_loss
                k+=1
                if (self.batch_size*k)==len(self.data):
                    k = 0
            #step-6: reconstruct the test images with the trained network and save a sample
            num_iter = len(self.test)/self.batch_size
            k=0
            for j in range(int(num_iter)):
                this_test = np.reshape(self.test[self.batch_size*k:self.batch_size*(k+1)],self.image_shape_batch)
                decoded_im = session.run(decoded_img, feed_dict={input_im:this_test})
                if j%100 == 0:
                    self.show_reconstructed(this_test, decoded_im)
                k+=1
                if (self.batch_size*k)==len(self.test):
                    k = 0
        return

    def sparse_aen(self):
        #step-0: get the number of hidden layers:
        num_hidden_l = 3
        f_size = 3
        f_size_d = [7,3,3]
        filt_e = [16, 8, 4]
        filt_d = [8, 16, 1]
        pad_d = ['VALID', 'SAME', 'VALID']
        strd = [2,2,1]
        #step-1: set the parameters of the network, if needed change values here:
        self.nw_params_modify(None,None,None,None)
        
        #step-1: load dataset
        self.load_data_mnist()

        #step-2: set routine for encoding the image
        input_im = tf.placeholder(tf.float32, shape=self.image_shape_batch, name='input_im')
        with tf.variable_scope('aen'):
            encoded_img = self.encoder(input_im, num_hidden_l, f_size, filt_e, 2)
            mean_v = tf.reduce_mean(encoded_img)
            #step-3: set routine for decoding the image
            decoded_img = self.decoder(encoded_img, num_hidden_l, f_size_d, filt_d, pad_d, strd)
        
        #step-4: set to calculate loss
        loss = self.loss_v(input_im, decoded_img)

        #step-5: start session and get the tf graph
        #first, set optimizer
        optimizer_nw = tf.train.AdamOptimizer(self.learning_rate).minimize(loss)
        initialize = tf.global_variables_initializer()
        this_loss = 0
        cumul_loss = 0
        with tf.Session() as session:
            session.run(initialize)
            #the data has to be trained epoch_count times in size of batch_size, so
            #the total number of iterations in this case would be:
            num_iter = self.epoch_count*len(self.data)
            num_iter/= self.batch_size
            print('Total number of iterations is, ', num_iter)
            k=0
            # begin iteration and training:
            for i in range(int(num_iter)):
                this_data = np.reshape(self.data[self.batch_size*k:self.batch_size*(k+1)],self.image_shape_batch)
                recon_im, this_loss, this_opt, mean_va = session.run([decoded_img, loss, optimizer_nw, mean_v],feed_dict={input_im:this_data})
                #print the loss every 50th iteration :
                if i%50 == 0:
                    print("Iteration ", i, " : Loss = ", this_loss, " Encoded image mean: ", mean_va)
                cumul_loss += this_loss
                k+=1
                if (self.batch_size*k)==len(self.data):
                    k = 0
            #step-6: reconstruct the test images with the trained network and save a sample
            num_iter = len(self.test)/self.batch_size
            k=0
            for j in range(int(num_iter)):
                this_test = np.reshape(self.test[self.batch_size*k:self.batch_size*(k+1)],self.image_shape_batch)
                decoded_im = session.run(decoded_img, feed_dict={input_im:this_test})
                if j%100 == 0:
                    self.show_reconstructed(this_test, decoded_im)
                k+=1
                if (self.batch_size*k)==len(self.test):
                    k = 0
        return

    def denoise_aen(self):
         #step-1: set the parameters of the network, if needed change values here:
        self.nw_params_modify(None,None,None,None)
        
        #step-1: load dataset
        self.load_data_mnist()
        
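        # NOTE: input_data.read_data_sets already returns float32 pixels scaled to
        # [0, 1], so this extra division shrinks the noisy inputs to [0, 1/255]
        # while the clean targets (self.data) stay in [0, 1]; drop it if unintended.
        data_noisy = np.array(self.data).astype('float32')/255.
        test_noisy = np.array(self.test).astype('float32')/255.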
        data_noisy = np.reshape(data_noisy,(len(data_noisy),28,28,1))
        test_noisy = np.reshape(test_noisy,(len(test_noisy),28,28,1))

        mean = 0
        sigma = 0.002
        noise_1 = np.random.normal(mean, sigma, size=data_noisy.shape)
        noise_2 = np.random.normal(mean, sigma, size=test_noisy.shape)
        data_noisy += noise_1
        test_noisy += noise_2
        data_noisy = np.clip(data_noisy, 0.,1.)
        test_noisy = np.clip(test_noisy, 0.,1.)

        #visualize the noisy images:
        num_visual = 10
        plt.figure(figsize=(32, 2))
        for i in range(num_visual):
            ax = plt.subplot(1, num_visual, i+1)
            plt.imshow(data_noisy[i].reshape(28,28))
            plt.gray()
        plt.show()

        #now design the network :
        #trying shallow aen:
        #step-0: In shallow aen, number of hidden layers is fixed at 1
        num_hidden_l = 1
        f_size = 5
        f_size_d = [5]
        filt_e = [64]
        filt_d = [1]
        pad_d = ['SAME']
        strde = 4
        strd = [4]
        
        #step-2: set routine for encoding the image
        input_im = tf.placeholder(tf.float32, shape=self.image_shape_batch, name='input_im')
        origg_im = tf.placeholder(tf.float32, shape=self.image_shape_batch, name='origg_im')
        with tf.variable_scope('aen'):
            encoded_img = self.encoder(input_im, num_hidden_l, f_size, filt_e, strde)
            #step-3: set routine for decoding the image
            decoded_img = self.decoder(encoded_img, num_hidden_l, f_size_d, filt_d, pad_d, strd)
        
        #step-4: set to calculate loss
        loss = self.loss_v(origg_im, decoded_img)

        #step-5: start session and get the tf graph
        #first, set optimizer
        optimizer_nw = tf.train.AdamOptimizer(self.learning_rate).minimize(loss)
        initialize = tf.global_variables_initializer()
        this_loss = 0
        cumul_loss = 0
        with tf.Session() as session:
            session.run(initialize)
            #the data has to be trained epoch_count times in size of batch_size, so
            #the total number of iterations in this case would be:
            num_iter = self.epoch_count*len(self.data)
            num_iter/= self.batch_size
            print('Total number of iterations is, ', num_iter)
            k=0
            # begin iteration and training:
            for i in range(int(num_iter)):
                this_data = np.reshape(data_noisy[self.batch_size*k:self.batch_size*(k+1)],self.image_shape_batch)
                this_orig = np.reshape(self.data[self.batch_size*k:self.batch_size*(k+1)],self.image_shape_batch)
                recon_im, this_loss, this_opt = session.run([decoded_img, loss, optimizer_nw],feed_dict={input_im:this_data, origg_im: this_orig})
                #print the loss every 50th iteration :
                if i%50 == 0:
                    print("Iteration ", i, " : Loss = ", this_loss)
                cumul_loss += this_loss
                k+=1
                if (self.batch_size*k)==len(self.data):
                    k = 0
            #step-6: reconstruct the noisy test images with the trained network and save a sample
            num_iter = len(self.test)/self.batch_size
            k=0
            for j in range(int(num_iter)):
                this_test = np.reshape(test_noisy[self.batch_size*k:self.batch_size*(k+1)],self.image_shape_batch)
                decoded_im = session.run(decoded_img, feed_dict={input_im:this_test})
                if j%100 == 0:
                    self.show_reconstructed(this_test, decoded_im)
                k+=1
                if (self.batch_size*k)==len(self.test):
                    k = 0        
        return

    def var_aen(self):
        pass

    def main(self, choice):
        # Step 1: Try with MNIST and if it works, experiment with CIFAR-10 or some other dataset
        options = {'shallow_aen':self.shallow_aen,
                   'deep_aen':self.deep_aen,
                   'sparse_aen': self.sparse_aen,
                   'denoise_aen': self.denoise_aen,
                   'var_aen':self.var_aen,
            }
        options[choice]()
        return


if __name__ == "__main__":

    # creating an instance of the AEN class
    aen_class = AEN_analysis()

    # Call to the main function
    # The argument to the main function is the choice of autoencoder: shallow_aen, deep_aen, sparse_aen or denoise_aen (var_aen is only a placeholder)
    tic = time.time()
    print('Choices are : \n 1. Shallow (1 hidden layer) AEN (shallow_aen), \n 2. Deep AEN (deep_aen), \n 3. Sparse Autoencoder (sparse_aen) and \n 4. Denoising AEN(denoise_aen)')
    choice_v = input("Enter the choice: ")
    aen_class.main(choice_v)
    toc = time.time() - tic
    print("Running time: " + str(toc))


I worked with MNIST data for the most part and, as we can see from the results, it is a pretty easy dataset to work on. Other datasets take longer to train. I tried the CIFAR-10 dataset, but it took quite long to train, so I am sharing only the MNIST results in this exercise.
As we know, MNIST is a very popular dataset of handwritten digits with images of size (28, 28). The dataset is quite simple in the sense that the pixels are correlated, and it works well for testing classification, regression and dimensionality reduction algorithms.

1. Shallow AEN :
For the shallow AEN, the network was designed this way:
 There is only one hidden layer, so 
 Input --> Hidden Layer --> Output
The values of the hyperparameters are :
    a. Number of hidden layers = 1
    b. Size of filter = 5 x 5
    c. Number of filters (K) = 16
    d. Padding type = 'SAME' (zeros are padded around the image so that the output size depends only on the stride: input size / stride, rounded up, here 28/4 = 7)
    e. Stride = 4 (the stride specifies how many pixels the filter moves between applications)
This screenshot shows the reduction in loss as the network is trained. The loss used throughout this exercise is the squared reconstruction error, summed over pixels and averaged over the batch. Binary cross-entropy may be a better option and I would like to evaluate that as well.
![Alt text](imgs/loss_red.PNG?raw=true "loss_red")
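
To make the two loss options concrete, here is a minimal pure-Python sketch. `squared_error_loss` mirrors what `loss_v` computes (sum of squared pixel errors divided by the batch size); `binary_cross_entropy_loss` is the alternative mentioned above, which I have not evaluated in this exercise. The toy pixel values and both function names are made up for illustration, and the cross-entropy version assumes pixels and reconstructions lie in [0, 1].

```python
import math

def squared_error_loss(originals, reconstructions):
    """Sum of squared pixel errors divided by the batch size (as in loss_v)."""
    total = 0.0
    for orig, recon in zip(originals, reconstructions):
        total += sum((o - r) ** 2 for o, r in zip(orig, recon))
    return total / len(originals)

def binary_cross_entropy_loss(originals, reconstructions, eps=1e-7):
    """Binary cross-entropy summed over pixels and averaged over the batch.
    Assumes pixels and reconstructions lie in [0, 1]."""
    total = 0.0
    for orig, recon in zip(originals, reconstructions):
        for o, r in zip(orig, recon):
            r = min(max(r, eps), 1.0 - eps)  # clip to avoid log(0)
            total -= o * math.log(r) + (1.0 - o) * math.log(1.0 - r)
    return total / len(originals)

# Two toy "images" of four pixels each (hypothetical values).
batch = [[0.0, 1.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0]]
good = [[0.1, 0.9, 0.9, 0.1], [0.9, 0.1, 0.1, 0.9]]   # close reconstruction
bad = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]    # uninformative gray

print(squared_error_loss(batch, good), squared_error_loss(batch, bad))
print(binary_cross_entropy_loss(batch, good), binary_cross_entropy_loss(batch, bad))
```

Both losses rank the close reconstruction above the gray one; they differ in how strongly they penalize confident mistakes, which is the usual argument for trying cross-entropy with a sigmoid output.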

*Analysis on Filter Size:*
I experimented with filters of different sizes (1 x 1, 3 x 3, 5 x 5 and 7 x 7, although 7 x 7 is rarely used) and found that reconstruction worked best with the larger filters (5 x 5 and 7 x 7).
Here are the results with a 5 x 5 filter on both the encoding and decoding sides:
![Alt text](imgs/res_1.png?raw=true "res_1")
![Alt text](imgs/res_2.png?raw=true "res_2")
![Alt text](imgs/res_4.png?raw=true "res_4")

With 7x7 filters, the result was pretty similar
![Alt text](imgs/res_71.png?raw=true "res_71")
![Alt text](imgs/res_72.png?raw=true "res_72")
![Alt text](imgs/res_74.png?raw=true "res_74")

Now, let's look at the results from the 3 x 3 filters
![Alt text](imgs/res_31.png?raw=true "res_31")
![Alt text](imgs/res_32.png?raw=true "res_32")
![Alt text](imgs/res_34.png?raw=true "res_34")


And with the 1 x 1 filters,
![Alt text](imgs/res_11.png?raw=true "res_11")
![Alt text](imgs/res_12.png?raw=true "res_12")
![Alt text](imgs/res_14.png?raw=true "res_14")

We can only barely recognize the digit in this case
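
One possible explanation for the filter-size results (my reading of the arithmetic, not something the experiments above verify) is input coverage: with a stride of 4, a filter smaller than the stride never looks at some of the pixels. The sketch below counts how many positions along one 28-pixel axis are touched by at least one filter window, assuming TensorFlow's 'SAME' padding convention. The function name `covered_pixels_1d` is hypothetical.

```python
import math

def covered_pixels_1d(n, f, s):
    """Count input positions (out of n) touched by at least one filter window
    of size f and stride s, using 'SAME' padding along one axis."""
    out = math.ceil(n / s)                       # 'SAME' output length
    pad_total = max((out - 1) * s + f - n, 0)    # total zero padding needed
    pad_before = pad_total // 2                  # padding placed before the input
    covered = set()
    for o in range(out):
        start = o * s - pad_before
        for idx in range(start, start + f):
            if 0 <= idx < n:                     # ignore padded positions
                covered.add(idx)
    return len(covered)

for f in (1, 3, 5, 7):
    c = covered_pixels_1d(28, f, 4)
    # squaring the per-axis fraction gives the fraction of the 2-D image seen
    print(f, c, (c / 28) ** 2)
```

Under these assumptions, a 1 x 1 filter sees 7 of 28 positions per axis (1/16 of the image), a 3 x 3 filter sees 21 of 28 (about 56% of the image), and 5 x 5 and 7 x 7 filters see every pixel.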

*Analysis on # of filters (depth slice):*
I also found that increasing the number of filters improves the loss and reconstruction. In the hyperparameter setup above, K was 16, so the volume dimensions change as follows:
28x28x1 --> 7x7x16 --> 28x28x1
And when I change K to 64,
28x28x1 --> 7x7x64 --> 28x28x1
Output in this case,

![Alt text](imgs/res_9.png?raw=true "res_9")
![Alt text](imgs/res_10.png?raw=true "res_10")
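
When reading the K = 16 versus K = 64 comparison, it helps to count how many values the encoded volume holds relative to the 28 x 28 input. This is a small arithmetic sketch; `volume_size` is a name I made up.

```python
def volume_size(height, width, depth):
    """Number of values in a height x width x depth activation volume."""
    return height * width * depth

input_size = volume_size(28, 28, 1)
for k in (16, 64):
    size = volume_size(7, 7, k)
    print(k, size, size / input_size)
```

With K = 16 the 7x7x16 code holds 784 values, the same as the input, and with K = 64 it holds 3136, four times the input. So the better reconstruction at K = 64 comes from a larger code, not from a more compact one.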

2. Deep AEN:
For the deep AEN, I did not go very deep, since that would require a GPU or be very slow on a CPU. So I set the number of hidden layers to 3, which gives:
Input --> Hidden Layer1 --> Hidden Layer2 ---> Hidden Layer 3 ---> Deconv Hidden1 --> Deconv Hidden2 -->Deconv Hidden3 -->Output

Designing the filters in this case was a bit challenging. I had to try different values for quite some time before I found hyperparameters that converged stably in a few epochs. All of the experiments for the shallow and deep AEN were run for 10 epochs.
The hyperparameter values are :
    a. Number of hidden layers = 3 (each on encoding and decoding side)
    b. Size of filter at encoder = 3 x 3
    c. Size of filter at decoder (layer-wise) = 7x7,3x3,3x3
    d. Number of filters at encoder layer-wise (K) = [16,8,4]
    e. Number of filters at decoder layer-wise (K) = [8,16,1]
    f. Padding type = 'SAME' at encoder and, for the decoder (layer-wise), ['VALID', 'SAME', 'VALID'], where valid padding means there is no zero padding
    g. Stride = 2 at encoder and [2,2,1] at decoder 

Stride is usually used to reduce the spatial dimension. Using max_pool in addition would also have been a good idea.
On the encoder side, the output size of a conv layer is given by
Wo = (Wi - F + 2P)/S + 1
Ho = (Hi - F + 2P)/S + 1
where (Wo, Ho) is the output shape, (Wi, Hi) the input shape, F the filter size, P the padding size and S the stride.
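
As a quick check of the encoder arithmetic, here is a minimal sketch of the formula together with the 'SAME' and 'VALID' conventions TensorFlow uses (for 'SAME' the output length is the input length divided by the stride, rounded up). The function names are hypothetical; the 3 x 3 filter and stride 2 are the deep AEN encoder settings from this write-up.

```python
import math

def conv_output_size_explicit(size, f, s, p):
    """Wo = (Wi - F + 2P)/S + 1 with an explicit padding size P (floor division)."""
    return (size - f + 2 * p) // s + 1

def conv_output_size(size, f, s, padding):
    """Output length along one axis for TensorFlow-style padding modes."""
    if padding == 'SAME':
        return math.ceil(size / s)       # padding chosen so only the stride matters
    if padding == 'VALID':
        return (size - f) // s + 1       # the formula above with P = 0
    raise ValueError("padding must be 'SAME' or 'VALID'")

# Trace one axis through the three encoder layers of the deep AEN.
size = 28
trace = [size]
for _ in range(3):
    size = conv_output_size(size, f=3, s=2, padding='SAME')
    trace.append(size)
print(trace)
```

The trace goes 28, 14, 7, 4, matching the encoder half of the deep AEN.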

On the decoder side, a deconv (transposed convolution) layer expands the spatial dimensions with the stride:
For same padding,
Wo = Wi x Sw
Ho = Hi x Sh
For valid padding,
Wo = (Wi-1) x Sw + Fw
Ho = (Hi-1) x Sh + Fh
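
The decoder formulas can be checked the same way. This sketch applies them, as written above and as used in the `deconv` method, to the deep AEN decoder settings (filters 7, 3, 3; strides 2, 2, 1; padding VALID, SAME, VALID), starting from the 4 x 4 encoded map. `deconv_output_size` is a hypothetical helper name.

```python
def deconv_output_size(size, f, s, padding):
    """Output length along one axis of a transposed convolution."""
    if padding == 'SAME':
        return size * s
    if padding == 'VALID':
        return (size - 1) * s + f
    raise ValueError("padding must be 'SAME' or 'VALID'")

filters = [7, 3, 3]
strides = [2, 2, 1]
paddings = ['VALID', 'SAME', 'VALID']

size = 4
trace = [size]
for f, s, p in zip(filters, strides, paddings):
    size = deconv_output_size(size, f, s, p)
    trace.append(size)
print(trace)
```

The trace goes 4, 13, 26, 28, so the three layers bring the code back to the original 28 x 28 image size.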

With the hyperparameter values above, I obtained the following dimensions:
28x28x1 --> 14x14x16 --> 7x7x8 --> 4x4x4 --> 13x13x8 --> 26x26x16 --> 28x28x1

Output:
![Alt text](imgs/loss_myfilt2.png?raw=true "loss_myfilt2")
![Alt text](imgs/res_d1.png?raw=true "res_d1")
![Alt text](imgs/res_d9.png?raw=true "res_d9")

This result is from training the network for just 10 epochs. Since the network is larger, training for more epochs and tuning the number of filters and other hyperparameters should produce better results.
        
3. Sparse AEN:
For the sparse network, I introduced sparsity in the activations of the network through regularization. The conv2d function in TensorFlow has an optional argument, activity_regularizer; setting it to tf.contrib.layers.l2_regularizer(10e-5) (or another scale value) adds a penalty on the layer's activations, which is how sparsity was encouraged here.
Output:
![Alt text](imgs/res_sp4.png?raw=true "res_sp4")
![Alt text](imgs/res_sp10.png?raw=true "res_sp10")
As we can see, there is not much difference in the reconstructed images, but the activations are sparse.
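
To make the activity penalty concrete, here is a rough pure-Python sketch. An L2 regularizer with scale `s` computes `s * sum(a^2) / 2` over the activations `a`, and the penalty only influences training if it is added to the reconstruction loss, which this sketch assumes. The second helper is one simple way to quantify sparsity as the fraction of activations close to zero. Both function names, the tolerance and the toy activation values are made up for illustration.

```python
def l2_activity_penalty(activations, scale):
    """scale * sum(a^2) / 2, the quantity an L2 regularizer computes."""
    return scale * sum(a * a for a in activations) / 2.0

def fraction_near_zero(activations, tol=1e-3):
    """Share of activations whose magnitude is below tol (a sparsity measure)."""
    return sum(1 for a in activations if abs(a) < tol) / len(activations)

# Hypothetical activations of one encoded feature map.
dense = [0.8, 0.5, 0.9, 0.7]
sparse = [0.0, 0.0, 1.2, 0.0]

scale = 10e-5  # same value as in the commented-out sparse encoder
for name, acts in (("dense", dense), ("sparse", sparse)):
    print(name, l2_activity_penalty(acts, scale), fraction_near_zero(acts))
```

An L1 penalty (sum of absolute values) is the more common choice for driving activations to exactly zero; I mention it only as an alternative, since it is not what was used here.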

4. Denoising AEN:
This part was really interesting. I introduced noise (Gaussian and salt-and-pepper) into the images and trained the shallow autoencoder on the noisy images. The loss is computed with respect to the original image, and from the results below we can see that the AEN is able to recover the original images from the noisy ones.

![Alt text](imgs/res_de4.png?raw=true "res_de4")
![Alt text](imgs/res_de8.png?raw=true "res_de8")
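
For reference, here is a minimal standard-library sketch of the two noise types mentioned above, applied to a flat list of pixel values in [0, 1]. It is an illustration under my own assumptions and not the exact code used for the results: the function names, the `amount` parameter and the noise levels are hypothetical, and the script in this write-up only shows the Gaussian case (with NumPy).

```python
import random

def add_gaussian_noise(pixels, sigma, rng):
    """Add zero-mean Gaussian noise and clip the result back to [0, 1]."""
    return [min(max(p + rng.gauss(0.0, sigma), 0.0), 1.0) for p in pixels]

def add_salt_and_pepper(pixels, amount, rng):
    """Set a fraction `amount` of the pixels to 0 (pepper) or 1 (salt)."""
    noisy = []
    for p in pixels:
        u = rng.random()
        if u < amount / 2:
            noisy.append(0.0)      # pepper
        elif u < amount:
            noisy.append(1.0)      # salt
        else:
            noisy.append(p)        # pixel left unchanged
    return noisy

rng = random.Random(0)                        # seeded for repeatability
image = [0.0, 0.25, 0.5, 0.75, 1.0] * 20      # toy "image" of 100 pixels
noisy_gauss = add_gaussian_noise(image, 0.1, rng)
noisy_sp = add_salt_and_pepper(image, 0.2, rng)
print(noisy_gauss[:5])
print(noisy_sp[:5])
```

In a denoising setup the noisy list is fed to the encoder while the clean `image` is used as the target in the loss.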

I also observed that since the input is noisy, more epochs were needed to reconstruct the image. The above results are from training for 100 epochs. With 10 epochs the result is poor, as we can see below:
![Alt text](imgs/res_de14.png?raw=true "res_de14")
![Alt text](imgs/res_de7.png?raw=true "res_de7")

It would be interesting to work with variational autoencoders and GANs, but I ran out of time. Working with a different dataset (such as CIFAR-10) would also help improve and restructure the code.