<h1  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">Autoencoder for Dimensionality Reduction.</h1><a id = "1" ></a>

<div class="list-group" id="list-tab" role="tablist">
<h3 class="list-group-item list-group-item-action active" data-toggle="list" style='color:white; background:#0096FF ; border:0' role="tab" aria-controls="home"><center>Quick Navigation</center></h3>

* [Importing Libraries](#1.1)
* [Data Preprocessing](#2)
    - [Load Training Dataset](#2.1)
    - [Split Data](#2.2)
    - [Image Preprocessing](#2.3)
    
* [Data Visualization](#3)
* [Model Summary](#5)
    
* [**Autoencoder Architecture**](#5.3) 
    
* [Save Model](#6.2)
* [Plot Evaluation](#8)

**Objective**
* A stacked autoencoder is used for dimensionality reduction of the digit images from the MNIST dataset and the clothing images from the Fashion MNIST dataset, implemented using TensorFlow 2.0.

<h1  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">1. IMPORTING LIBRARIES</h1><a id = "1.1" ></a>

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn import decomposition

import tensorflow 
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input,Dense, Activation,Flatten, Conv2D, Dropout,Reshape
from tensorflow.keras.layers import MaxPool2D, LSTM, BatchNormalization,concatenate
from tensorflow.keras.callbacks import ReduceLROnPlateau,ModelCheckpoint
from tensorflow.keras.layers import ELU
from tensorflow.keras.losses import sparse_categorical_crossentropy, categorical_crossentropy
from tensorflow.keras import Model
from tensorflow.keras.optimizers import Adam,SGD
import warnings
warnings.filterwarnings('ignore')

from sklearn.manifold import TSNE

In [None]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

<h1 style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">2. DATA PREPROCESSING</h1><a id = "2" ></a>

<h2  style='color:#0096FF; background:white ; border:0' class="list-group-item list-group-item-action active">2.1 LOAD TRAINING DATASET</h2><a id = "2.1" ></a>

In [None]:
loc = '../input/digit-recognizer'
train_data = pd.read_csv(loc+'/train.csv')
fashion_train_data = pd.read_csv('../input/fashionmnist/fashion-mnist_train.csv')


In [None]:
test_file = loc+"/test.csv"
test_data = pd.read_csv(test_file)
fashion_test_data = pd.read_csv('../input/fashionmnist/fashion-mnist_test.csv')

<h3  style='color:#0096FF; background:#FFF ' class="list-group-item list-group-item-action active">Data Dimension</h3>

In [None]:
print(f"train.csv size is {train_data.shape}")
print(f"test.csv size is {test_data.shape}")

In [None]:
train_data.head()

In [None]:
train_data.describe()

**The `describe()` output suggests that some pixel columns are zero in every image, so it is worth checking how many such columns there are.**
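One way to run that check, sketched on a small synthetic array rather than the real data: a column is all zero exactly when none of its entries is non-zero. The helper name `count_all_zero_columns` and the array `toy_pixels` are illustrative assumptions; with the notebook's data one would presumably pass `train_data.drop(columns='label').values`.

```python
import numpy as np

def count_all_zero_columns(pixels):
    """Return how many columns of a 2-D array contain only zeros."""
    pixels = np.asarray(pixels)
    # any(axis=0) is True for a column with at least one non-zero entry.
    return int((~pixels.any(axis=0)).sum())

# Hypothetical toy data: 3 "images" with 5 "pixels" each.
toy_pixels = np.array([
    [0, 12, 0, 0, 255],
    [0,  0, 0, 7,   3],
    [0, 90, 0, 0,   0],
])
print(count_all_zero_columns(toy_pixels))  # columns 0 and 2 are all zero
```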

<h3  style='color:#0096FF; background:#FFF ' class="list-group-item list-group-item-action active">Missing values</h3>

In [None]:
train_data.isna().sum()

* No missing values

In [None]:
train_labels=train_data['label']

<h2  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">2.2 SPLIT DATA</h2><a id = "2.2" ></a>

In [None]:
img_rows, img_cols = 28, 28 # each image is reshaped to 28 x 28 pixels
num_classes = 10 # number of distinct labels

<h2  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">2.3 PREPROCESSING</h2><a id = "2.3" ></a>

<h3  style='color:#0096FF; background:#FFF ' class="list-group-item list-group-item-action active">One hot Encoding</h3>

One-hot encoding converts categorical labels into a form the network can use directly. Each class gets its own column, and every label is represented as a binary vector with a 1 in the column of its class and 0 everywhere else. With 10 classes, for example, the digit 3 becomes `[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]`.
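The idea can be sketched with NumPy alone. This is meant to mirror what `to_categorical` produces for integer labels, not to replace it; `one_hot` is a hypothetical helper name introduced only for this illustration.

```python
import numpy as np

def one_hot(labels, num_classes):
    """One-hot encode integer labels in the range [0, num_classes)."""
    labels = np.asarray(labels, dtype=int)
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes, dtype=np.float32)[labels]

encoded = one_hot([3, 0, 9], num_classes=10)
print(encoded.shape)            # (3, 10)
print(encoded[0])               # 1.0 at index 3, zeros elsewhere
print(encoded.argmax(axis=1))   # recovers the original labels
```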

In [None]:
def data_prep(raw):
    out_y = tensorflow.keras.utils.to_categorical(raw.label, num_classes)

    num_images = raw.shape[0]
    x_as_array = raw.values[:,1:]
    x_shaped_array = x_as_array.reshape(num_images, img_rows, img_cols, 1)
    # normalization
    out_x = x_shaped_array / 255
    return out_x, out_y

x, y = data_prep(train_data)
x_fashion,y_fashion = data_prep(fashion_train_data)

<h3  style='color:#0096FF; background:#FFF ' class="list-group-item list-group-item-action active">Data Split</h3>

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=42)#ratio 90:10
x_train, x_val, y_train, y_val= train_test_split(x_train, y_train, test_size = 1/9, random_state=42)

In [None]:
x_train_fashion, x_test_fashion, y_train_fashion, y_test_fashion = train_test_split(x_fashion, y_fashion, test_size=0.1, random_state=42)#ratio 90:10
x_train_fashion, x_val_fashion, y_train_fashion, y_val_fashion= train_test_split(x_train_fashion, y_train_fashion, test_size = 1/9, random_state=42)

* The first split keeps 90% of the data for training and holds out 10% as test data.
* The second split holds out 1/9 of that 90% as validation data, so the final proportions are 80% training, 10% validation, and 10% test.
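The effect of chaining the two `train_test_split` calls can be checked with exact fractions. This is only arithmetic on the `test_size` values used above, not a rerun of the split itself.

```python
from fractions import Fraction

test_fraction = Fraction(1, 10)              # first split: test_size=0.1
remaining = 1 - test_fraction                # 9/10 of the data is left
val_fraction = remaining * Fraction(1, 9)    # second split: test_size=1/9 of the remainder
train_fraction = remaining - val_fraction

print(train_fraction, val_fraction, test_fraction)  # 4/5 1/10 1/10
```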

<h3  style='color:#0096FF; background:#FFF ' class="list-group-item list-group-item-action active">Reshaping the Data to 4 dimension</h3>

In [None]:
# Normalization
test_data = test_data / 255  
# reshaping
test_data = test_data.values.reshape(-1,28,28,1)
test_data.shape
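A small sketch of what the scaling and reshaping do, using a synthetic array in place of the CSV data (the name `flat` and its contents are assumptions for illustration). Each flat row of 784 values is laid out row by row into a 28 x 28 grid with one channel.

```python
import numpy as np

img_rows, img_cols = 28, 28

# Hypothetical stand-in for two flattened images with values in 0..255.
flat = (np.arange(2 * img_rows * img_cols) % 256).reshape(2, img_rows * img_cols)

# Scale to [0, 1], then add the height, width and channel axes.
images = (flat / 255).reshape(-1, img_rows, img_cols, 1)

print(images.shape)  # (2, 28, 28, 1)
# Row-major order: flat pixel k lands at row k // 28, column k % 28.
k = 100
print(images[0, k // img_cols, k % img_cols, 0] == flat[0, k] / 255)  # True
```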

In [None]:
# The Fashion MNIST test CSV also carries a `label` column, so drop it before scaling
fashion_test_data = fashion_test_data.drop(columns='label') / 255
# reshaping
fashion_test_data = fashion_test_data.values.reshape(-1,28,28,1)
fashion_test_data.shape

<h3  style='color:#0096FF; background:#FFF ' class="list-group-item list-group-item-action active">Data Dimension</h3>

Dimensions for MNIST Digit dataset.

In [None]:
print(f"Training data size is {x_train.shape}")
print(f"Training labels size is {y_train.shape}")
print(f"Testing data size is {x_test.shape}")
print(f"Testing labels size is {y_test.shape}")

Dimensions of Fashion MNIST Dataset

In [None]:
print(f"Training data size is {x_train_fashion.shape}")
print(f"Training labels size is {y_train_fashion.shape}")
print(f"Testing data size is {x_test_fashion.shape}")
print(f"Testing labels size is {y_test_fashion.shape}")

<h1  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">3. VISUALIZE DATA</h1><a id = "3" ></a>

In [None]:
title=[j for i in range(1, 10) for j in range(0,10) if y_train[i][j] == 1]
title

In [None]:
title_fashion=[j for i in range(1, 10) for j in range(0,10) if y_train_fashion[i][j] == 1]
title_fashion

In [None]:
plt.figure(figsize=(7,9))
for i in range(1, 10):
    plt.subplot(330 + i)
    plt.imshow(x_train[i], cmap=plt.get_cmap('gray'))
    plt.title(title[i-1])
    
plt.tight_layout()

In [None]:
plt.figure(figsize=(9,9))
for i in range(1, 10):
    plt.subplot(330 + i)
    plt.imshow(x_train_fashion[i], cmap=plt.get_cmap('gray'))
    plt.title(title_fashion[i-1])
    
plt.tight_layout()

<h2  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">4. MODEL SUMMARY</h2><a id = "5" ></a>

<h2  style='color:#0096FF; background:#FFF ' class="list-group-item list-group-item-action active">Autoencoder Architecture</h2><a id = "5.3" ></a>

A stacked autoencoder compresses the images and reduces their dimensionality: it is trained to reproduce its own input after squeezing it through a narrow 40-unit bottleneck.
* The stacked autoencoder consists of a stacked encoder followed by a stacked decoder.

**Stacked Encoder**

* The stacked encoder has 3 layers: a Flatten layer that turns each 28 x 28 x 1 image into a 784-value vector, a Dense layer with 256 neurons, and a Dense layer with 40 neurons. Both Dense layers use the 'relu' activation function.

**Stacked Decoder**

* The stacked decoder has 3 layers: a Dense layer with 256 neurons (activation='relu'), a Dense layer with 784 neurons (activation='sigmoid'), and a Reshape layer that restores the 28 x 28 image rows and columns.

An autoencoder is one of several dimensionality reduction techniques; here the 40-value output of the encoder is the reduced representation of each image.
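To make the layer sizes concrete, here is a NumPy-only sketch of the forward pass through a 784 -> 256 -> 40 -> 256 -> 784 network. The weights are random and untrained, and the helper names `encode` and `decode` are assumptions for illustration, so this shows shapes and data flow only, not the behaviour of the trained Keras model.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [784, 256, 40, 256, 784]   # flatten -> encoder -> code -> decoder

# Random, untrained weights: this only illustrates shapes, not learned behaviour.
weights = [rng.normal(0, 0.05, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def relu(z):
    return np.maximum(z, 0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def encode(images):
    h = images.reshape(len(images), -1)        # Flatten: (n, 28, 28, 1) -> (n, 784)
    h = relu(h @ weights[0] + biases[0])       # Dense 256, relu
    return relu(h @ weights[1] + biases[1])    # Dense 40, relu: the compressed code

def decode(codes):
    h = relu(codes @ weights[2] + biases[2])   # Dense 256, relu
    h = sigmoid(h @ weights[3] + biases[3])    # Dense 784, sigmoid
    return h.reshape(-1, 28, 28)               # Reshape back to rows x columns

batch = rng.random((5, 28, 28, 1))             # hypothetical batch of 5 images
codes = encode(batch)
recon = decode(codes)
n_params = sum(w.size for w in weights) + sum(b.size for b in biases)

print(codes.shape, recon.shape)   # (5, 40) (5, 28, 28)
print(n_params)                   # 423224 weights and biases in total
print(784 / 40)                   # 19.6 times fewer values per image
```

The bottleneck is the only place where information is forced to be compact: everything the decoder knows about an image has to pass through those 40 numbers.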


In [None]:
stacked_encoder = Sequential()

stacked_encoder.add(Flatten(input_shape=(img_rows, img_cols, 1)))
stacked_encoder.add(Dense(256,activation='relu'))
stacked_encoder.add(Dense(40,activation='relu'))

stacked_decoder=Sequential()
stacked_decoder.add(Dense(256,activation='relu',input_shape=[40]))
stacked_decoder.add(Dense(28*28,activation='sigmoid'))
stacked_decoder.add(Reshape([28,28]))

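stacked_ae = Sequential([stacked_encoder,stacked_decoder])
# `model` and `stacked_ae` are two names for the same network object,
# so training through one name also changes the weights seen through the other
model = stacked_ae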

In [None]:
from tensorflow.keras.utils import plot_model
plot_model(model, to_file='model.png', show_shapes=True)
from IPython.display import Image
Image("model.png")

In [None]:
epochs = 20

In [None]:
# `learning_rate` replaces the deprecated `lr` argument
model.compile(optimizer = SGD(learning_rate=1.5),loss = 'binary_crossentropy' )
model_fit = model.fit(x=x_train,y=x_train, epochs=epochs,validation_data=(x_val,x_val) ,verbose =1)

In [None]:
# `learning_rate` replaces the deprecated `lr` argument
stacked_ae.compile(optimizer = SGD(learning_rate=1.5),loss = 'binary_crossentropy' )
stacked_ae_fit = stacked_ae.fit(x=x_train_fashion,y=x_train_fashion, epochs=epochs ,validation_data=(x_val_fashion,x_val_fashion) ,verbose =1)
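Both training cells pass the images as inputs and as targets, with `binary_crossentropy` as the reconstruction loss. The NumPy sketch below shows what that loss measures per pixel. It is a simplified illustration rather than Keras's exact implementation: the clipping constant `eps` and the plain mean over all pixels are assumptions of this sketch.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over all pixels of all images."""
    y_true = np.asarray(y_true, dtype=float).reshape(len(y_true), -1)
    y_pred = np.asarray(y_pred, dtype=float).reshape(len(y_pred), -1)
    # Clip to avoid log(0); the eps value is an assumption of this sketch.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    per_pixel = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return per_pixel.mean()

target = np.array([[0.0, 1.0, 1.0, 0.0]])        # hypothetical 4-pixel "image"
good = np.array([[0.05, 0.9, 0.95, 0.1]])        # close reconstruction
uninformative = np.full_like(target, 0.5)        # every pixel predicted as 0.5

print(binary_crossentropy(target, good) < binary_crossentropy(target, uninformative))  # True
print(np.isclose(binary_crossentropy(target, uninformative), np.log(2)))               # True
```

The loss is lowest when each sigmoid output matches its target pixel, which is why the decoder ends in a sigmoid layer and the pixels are scaled to the range 0 to 1.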

<h2  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">5. SAVE MODEL</h2><a id = "6.2" ></a>

In [None]:
!mkdir -p saved_model
# `save` writes the full model (architecture and weights), not JSON; `.h5` selects the HDF5 format
model.save('saved_model/auto_encoder.h5')
stacked_ae.save('saved_model/fashion_auto_encoder.h5')


<h1  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">6. PLOT & EVALUATION</h1><a id = "8" ></a>

Digit MNIST reconstruction: each test image is compressed to 40 values by the encoder and decoded back to 28 x 28 pixels.

In [None]:
x_test_compress = model.predict(x_test)

Fashion MNIST reconstruction: the same compress-and-decode step applied to the Fashion MNIST test images.

In [None]:
fashion_compress = stacked_ae.predict(x_test_fashion)

Plotting the original MNIST digit images (left column) next to their reconstructions (right column).

In [None]:
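fig, axs = plt.subplots(9,2,figsize=(30,50))  # 9 rows, one per plotted sample
for i in range(1, 10):
    # squeeze() drops the trailing channel axis so each image is a plain 28 x 28 array
    axs[i-1][0].imshow(x_test[i].squeeze(), cmap=plt.get_cmap('gray'))
    axs[i-1][1].imshow(x_test_compress[i].squeeze(), cmap=plt.get_cmap('gray'))
plt.tight_layout()
plt.show()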


Plotting the original Fashion MNIST images (left column) next to their reconstructions (right column).

In [None]:
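fig, axs = plt.subplots(9,2,figsize=(30,50))  # 9 rows, one per plotted sample
for i in range(1, 10):
    # squeeze() drops the trailing channel axis so each image is a plain 28 x 28 array
    axs[i-1][0].imshow(x_test_fashion[i].squeeze(), cmap=plt.get_cmap('gray'))
    axs[i-1][1].imshow(fashion_compress[i].squeeze(), cmap=plt.get_cmap('gray'))
plt.tight_layout()
plt.show()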
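The side-by-side plots give a qualitative picture; a per-image reconstruction error puts a number on it. The sketch below uses synthetic arrays, and `reconstruction_mse` is a hypothetical helper name. With the notebook's variables one could presumably call it as `reconstruction_mse(x_test, x_test_compress)`.

```python
import numpy as np

def reconstruction_mse(originals, reconstructions):
    """Per-image mean squared error between originals and reconstructions."""
    n = len(originals)
    # Flatten both so (n, 28, 28, 1) inputs and (n, 28, 28) outputs line up.
    a = np.asarray(originals, dtype=float).reshape(n, -1)
    b = np.asarray(reconstructions, dtype=float).reshape(n, -1)
    return ((a - b) ** 2).mean(axis=1)

rng = np.random.default_rng(0)
originals = rng.random((4, 28, 28, 1))            # synthetic stand-in images
perfect = originals.reshape(4, 28, 28)            # identical "reconstruction"
noisy = np.clip(perfect + rng.normal(0, 0.1, perfect.shape), 0, 1)

print(reconstruction_mse(originals, perfect))     # all zeros
print(reconstruction_mse(originals, noisy).shape) # (4,)
```

Sorting the test images by this error is one simple way to find the samples the 40-value code represents worst.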

##### If you like my work, please do upvote. Thanks - `@tejasurya`