### Build a DNN using Keras with `RELU` and `ADAM`

#### Load tensorflow

In [17]:
%tensorflow_version 2.x
import tensorflow as tf
print(tf.__version__)

2.1.0-rc1


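The cell above uses the Colab magic `%tensorflow_version 2.x` to select TensorFlow 2.x, then imports TensorFlow and prints the installed version (2.1.0-rc1 here).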

In [0]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#### Collect Fashion-MNIST data from tf.keras.datasets

In [0]:
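# Import from tf.keras so the dataset loader matches the TensorFlow version used for the model
from tensorflow.keras.datasets import fashion_mnist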
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

In [20]:
print('train dataset shape:')
print(x_train.shape)
print('test dataset shape:')
print(x_test.shape)

train dataset shape:
(60000, 28, 28)
test dataset shape:
(10000, 28, 28)


#### Change train and test labels into one-hot vectors

In [0]:
y_train_cat = tf.keras.utils.to_categorical(y_train)
y_test_cat = tf.keras.utils.to_categorical(y_test)

Because this is a multi-class problem, categorical cross-entropy is used as the loss function, which requires the labels to be one-hot encoded. With 10 classes, each integer label becomes a length-10 vector.
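To make the encoding concrete, here is a minimal NumPy sketch of one-hot encoding. It is an illustration only, not the implementation of `to_categorical`; the helper name `one_hot` and the sample labels are hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a 1 at each label's index."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype="float32")[labels]

# Hypothetical labels for illustration, not taken from the dataset
sample_labels = np.array([9, 0, 3])
encoded = one_hot(sample_labels, num_classes=10)

print(encoded.shape)  # (3, 10)
print(encoded[0])     # 1.0 at index 9, zeros elsewhere
```

Each row sums to 1, and `argmax` along the last axis recovers the original integer label.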

#### Build the Graph

#### Initialize model, reshape & normalize data

In [0]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.backend import clear_session
from tensorflow.keras.layers import Reshape
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import BatchNormalization

In [0]:
clear_session()

#Initialize model
model1 = Sequential()

#Reshape data from 2D to 1D -> 28x28 to 784
model1.add(Reshape((784,),input_shape=(28,28,)))

#Normalize the data
model1.add(BatchNormalization())

#### Add two fully connected layers with 200 and 100 neurons respectively with `relu` activations. Add a dropout layer with `p=0.25`

In [0]:
#Add First Dense layer
model1.add(Dense(200, activation='relu'))

#Add Second Dense layer
model1.add(Dense(100, activation='relu'))

#Add dropout layer
model1.add(Dropout(0.25))

#### Add a fully connected output layer with 10 neurons and `softmax` activation. Use `categorical_crossentropy` loss and the `adam` optimizer, train the network, and report the final validation result.

In [0]:
#Add Output layer
model1.add(Dense(10, activation='softmax'))

In [0]:
#Compile model with categorical_crossentropy loss and adam optimizer
model1.compile(optimizer='adam', 
              loss='categorical_crossentropy', metrics=['accuracy'])

In [27]:
model1.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
reshape (Reshape)            (None, 784)               0         
_________________________________________________________________
batch_normalization (BatchNo (None, 784)               3136      
_________________________________________________________________
dense (Dense)                (None, 200)               157000    
_________________________________________________________________
dense_1 (Dense)              (None, 100)               20100     
_________________________________________________________________
dropout (Dropout)            (None, 100)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1010      
Total params: 181,246
Trainable params: 179,678
Non-trainable params: 1,568
______________________________________________

The model summary above lists each layer with its output shape and parameter count.

The Reshape layer is the first (input) layer. It only flattens each 28x28 image into a 784-element vector, so it has no parameters; its output is passed to the batch normalization layer.

The BatchNormalization layer normalizes the output of the previous layer over each batch. It holds 784 * 4 = 3,136 parameters. Of these, 784 * 2 = 1,568 are non-trainable: the moving mean and moving variance (mu and sigma), which are updated from batch statistics rather than by gradient descent. The other 784 * 2 = 1,568 are trainable: the scale gamma and shift beta, which are learned by gradient descent. Placed first in the network, this layer standardizes the raw pixel values. Batch normalization can also have a mild regularizing effect, and at test time it applies the moving statistics, which reduces the effect of differences in input distribution between training and test data.
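The training-time computation can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the Keras implementation: the function name `batch_norm_train`, the random batch, and the `eps` value are chosen for the example, and the moving-average bookkeeping used at inference is omitted.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize each feature over the batch, then scale and shift."""
    mean = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                        # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)    # zero mean, unit variance
    return gamma * x_hat + beta                # trainable scale and shift

rng = np.random.default_rng(0)
batch = rng.uniform(0, 255, size=(32, 784))    # stand-in for 32 flattened images
gamma = np.ones(784)                           # initial scale
beta = np.zeros(784)                           # initial shift

out = batch_norm_train(batch, gamma, beta)
print(out.mean(axis=0)[:3])  # each close to 0
print(out.std(axis=0)[:3])   # each close to 1
```

With gamma at 1 and beta at 0, the output is simply the standardized input; training adjusts both per feature.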

After the batch normalization layer, the model has two fully connected (dense) hidden layers with 200 and 100 neurons respectively.
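The parameter counts reported in the summary can be checked by hand. The short sketch below reproduces them with plain arithmetic; the helper `dense_params` is a hypothetical name used only here.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output neuron."""
    return n_in * n_out + n_out

bn_total = 4 * 784          # gamma, beta, moving mean, moving variance
bn_non_trainable = 2 * 784  # moving mean and moving variance

dense_counts = [
    dense_params(784, 200),  # first hidden layer
    dense_params(200, 100),  # second hidden layer
    dense_params(100, 10),   # output layer
]

total = bn_total + sum(dense_counts)
trainable = total - bn_non_trainable

print(dense_counts)                        # [157000, 20100, 1010]
print(total, trainable, bn_non_trainable)  # 181246 179678 1568
```

These values match the `Param #` column and the totals printed by `model1.summary()`.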

The Dropout layer randomly sets 25% of the second hidden layer's outputs to zero at each training step; it is inactive at inference. This helps reduce overfitting.
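As a rough illustration of what dropout does during training, the NumPy sketch below zeroes a random 25% of activations and rescales the rest so the expected value is unchanged (often called inverted dropout). The function name `dropout_train` and the all-ones input are assumptions for the example, not part of the notebook.

```python
import numpy as np

def dropout_train(activations, rate, rng):
    """Zero a random fraction `rate` of activations and rescale the survivors."""
    keep_mask = rng.random(activations.shape) >= rate   # True for kept units
    return activations * keep_mask / (1.0 - rate)       # keep expected value the same

rng = np.random.default_rng(0)
hidden = np.ones((1000, 100))   # stand-in for second hidden layer outputs
dropped = dropout_train(hidden, rate=0.25, rng=rng)

print((dropped == 0).mean())    # fraction zeroed, close to 0.25
print(dropped.mean())           # close to 1.0 because survivors are scaled up
```

At inference no units are dropped and no rescaling is needed, which is why the layer contributes zero parameters to the summary.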






In [28]:
#Train the model
model1.fit(x_train,y_train_cat,          
          validation_data=(x_test,y_test_cat),
          epochs=5,
          batch_size=32)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7fa200137d30>

The model was trained for 5 epochs, meaning the full training set is passed through the model 5 times, with a batch size of 32, meaning the weights are updated after each batch of 32 training examples.
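The number of weight updates implied by these settings follows from simple arithmetic, sketched below with the sample count reported in the training output.

```python
import math

num_train = 60000   # training samples, from the output above
batch_size = 32
epochs = 5

# One weight update per batch; the last batch may be smaller if the division is not exact
steps_per_epoch = math.ceil(num_train / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 1875
print(total_updates)    # 9375
```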



In [29]:
print('First Model Evaluation on test data: ')
results = model1.evaluate(x_test, y_test_cat, verbose=0)
print('Test loss, Test Accuracy : ', results)

First Model Evaluation on test data: 
Test loss, Test Accuracy :  [0.3599478364944458, 0.8705]


Training for 5 epochs with a batch size of 32 gives a test accuracy of about 87% and a loss (categorical cross-entropy, i.e. multi-class log loss) of about 0.360.
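For intuition about the two reported numbers, here is a small NumPy sketch of how categorical cross-entropy and accuracy are computed from one-hot labels and predicted probabilities. The tiny 3-class arrays are made up for illustration, and the clipping constant `eps` is an assumption; this is not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Hypothetical one-hot labels and softmax outputs for two samples, three classes
y_true = np.array([[0, 0, 1],
                   [1, 0, 0]], dtype=float)
y_pred = np.array([[0.1, 0.2, 0.7],
                   [0.5, 0.3, 0.2]])

loss = categorical_crossentropy(y_true, y_pred)
accuracy = float(np.mean(y_pred.argmax(axis=1) == y_true.argmax(axis=1)))

print(round(loss, 4))  # 0.5249
print(accuracy)        # 1.0
```

Note that the loss is nonzero even when every prediction is correct: it penalizes low confidence in the true class, so accuracy and loss measure different things.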