<h1>Deep Neural Networks for MNIST Classification</h1>

Load TensorFlow

In [None]:
#Import TensorFlow
import tensorflow as tf
#Set the global random seed so weight initialization is repeatable across runs
tf.random.set_seed(42)

Collect Data

We will use the MNIST dataset for this exercise. It contains images of handwritten digits (0 to 9), and each image is a 28x28 grayscale picture. We will download the data using the TensorFlow (Keras) API. The dataset has 60,000 training examples and 10,000 test examples. Note that the images are already provided as NumPy arrays, with pixel values from 0 to 255.


In [None]:
#Download dataset
(trainX, trainY),(testX, testY) = tf.keras.datasets.mnist.load_data()

In [None]:
#Check number of training examples and size of each example
trainX.shape

In [None]:
testX.shape

Visualize the data

In [None]:
import matplotlib.pyplot as plt
import numpy as np

In [None]:
#function to display test image
def show_random_test_image():
  #Get a random integer between 0 and number of test examples
  img_num = np.random.randint(0, testX.shape[0])
  #Show the image from the test dataset, with its label as the title
  plt.imshow(testX[img_num],cmap='gray')
  plt.suptitle('Number: '+ str(testY[img_num]))
  plt.show()

In [None]:
#Run the function multiple times to look at different test images
show_random_test_image()

<h1>Convert Output Labels to Probabilities</h1>

Our model will predict 10 probabilities for each image (the probability that it shows 0, 1, 2, ..., 9), so each label also needs to be expressed as 10 target probabilities. We convert each single-number label into 10 numbers using one-hot encoding. In one-hot encoding, a label is represented by a vector whose length equals the number of classes (10 in this example): 9 of the entries are 0, and the entry at the position of the label is 1.
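As a rough sketch of what one-hot encoding does (not how `to_categorical` is implemented internally), the NumPy function below builds the 10-number vectors by hand. The helper name `one_hot` is hypothetical; for integer labels 0 to 9 it should produce the same values as `tf.keras.utils.to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Sketch of one-hot encoding: row i gets a 1 in column labels[i] and 0 elsewhere."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # Fancy indexing: in each row, set the column matching that row's label to 1
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

example = one_hot([7, 2, 1])
print(example[0])  # 1 at index 7, 0 everywhere else
```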

In [None]:
#Check the label before conversion: it is a single number
testY[0]

In [None]:
#Convert single-number labels to 10 numbers using one-hot encoding. In the Keras API,
#the 'to_categorical' method does this. We convert both the test
#and training labels.

trainY= tf.keras.utils.to_categorical(trainY, num_classes= 10)
testY = tf.keras.utils.to_categorical(testY, num_classes= 10)

In [None]:
testY[0]

For our first test image (which shows 7, as seen above), we now have 10 labels, i.e. 10 probabilities. The first number is the probability that the picture shows the digit '0', the second is the probability that it shows '1', and so on. All the probabilities are 0% except for the digit 7, which is 100%.

<h1>Build the Model (Graph)</h1>

In [None]:
#Initialize the Sequential Model
model = tf.keras.models.Sequential()

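**Reshape data** from 2D to 1D, i.e. from 28x28 to 784. This is needed because a Dense layer combines the values along the last axis of its input, so we flatten each example into a single vector (1D) of 784 pixels to let every neuron see the whole image. Also note that our input data shape is (28,28) for MNIST.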
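The Reshape layer should behave like NumPy's `reshape` applied to every example in a batch. The sketch below uses random pixel values as a hypothetical stand-in for `trainX` and shows that the 784-long vector is just the 28 rows of the image laid end to end (row-major order). Keras also provides a `tf.keras.layers.Flatten` layer that performs the same flattening.

```python
import numpy as np

# Hypothetical stand-in for trainX: 5 random 28x28 "images" with pixel values 0-255
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)

# Flatten each 28x28 image into a 784-long vector, keeping the batch axis
flat = images.reshape(images.shape[0], 784)
print(images.shape, "->", flat.shape)  # (5, 28, 28) -> (5, 784)

# Row-major order: pixel (row r, column c) ends up at position r * 28 + c
print(flat[0, 3 * 28 + 5] == images[0, 3, 5])  # True
```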

In [None]:
model.add(tf.keras.layers.Reshape((784,),input_shape=(28,28,)))

**Normalize the data**: From now on, we will normalize our data with a layer inside the model (normalization is just another math function). We use the BatchNormalization layer for this. Because the normalization statistics are stored as part of the model, we do not need to save a separate normalizer object (e.g. with pickle).
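A minimal NumPy sketch of the training-time computation of batch normalization, assuming a random batch of 32 flattened images as input: each of the 784 features is normalized with the mean and variance of the current batch, then scaled by a learned `gamma` and shifted by a learned `beta`. The `eps` of 1e-3 matches the Keras default `epsilon`. At inference time Keras instead uses moving averages of the mean and variance collected during training; those averages are saved with the model's weights.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize each feature (column) using the statistics of the current batch."""
    mean = x.mean(axis=0)  # per-feature mean over the batch
    var = x.var(axis=0)    # per-feature (biased) variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma (scale) and beta (shift) are learned during training
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# Hypothetical batch: 32 flattened images with raw pixel values in 0-255
batch = rng.integers(0, 256, size=(32, 784)).astype(np.float64)
out = batch_norm_train(batch, gamma=np.ones(784), beta=np.zeros(784))
# With gamma=1 and beta=0, every feature now has mean ~0 and std ~1
print(out.mean(), out.std())
```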

In [None]:
model.add(tf.keras.layers.BatchNormalization())

**Add hidden layers**: We will build a model with 4 hidden layers, with 200, 100, 60 and 30 neurons respectively. Both the number of hidden layers and the number of neurons in each hidden layer are hyperparameters, i.e. you can change these values to try to improve the model. The output of each neuron in a hidden layer is passed through an activation function; here we use sigmoid.
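To make "weighted sum plus activation" concrete, here is a hedged NumPy sketch of a forward pass through four sigmoid Dense layers of sizes 200, 100, 60 and 30. The weights are random placeholders scaled by 1/sqrt(number of inputs), an assumption for illustration only; Keras initializes Dense weights with its own default initializer (Glorot uniform) and then learns them during training.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def dense_sigmoid(x, W, b):
    """One fully connected layer: weighted sum of inputs plus bias, then sigmoid."""
    return sigmoid(x @ W + b)

rng = np.random.default_rng(42)
x = rng.normal(size=(4, 784))  # hypothetical batch of 4 normalized, flattened images

# Hidden layer sizes from the text; weights are random placeholders, not trained values
sizes = [784, 200, 100, 60, 30]
h = x
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    W = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out))
    b = np.zeros(n_out)
    h = dense_sigmoid(h, W, b)
    print(h.shape)  # (4, 200), then (4, 100), (4, 60), (4, 30)
```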

In [None]:
#Add 1st Hidden layer
model.add(tf.keras.layers.Dense(200, activation='sigmoid'))

In [None]:
#Add 2nd Hidden Layer
model.add(tf.keras.layers.Dense(100, activation='sigmoid'))

In [None]:
#Add 3rd Hidden Layer
model.add(tf.keras.layers.Dense(60, activation='sigmoid'))

In [None]:
#Add 4th Hidden Layer
model.add(tf.keras.layers.Dense(30, activation='sigmoid'))

**Add Output layer**: A Dense layer with 10 neurons creates **10 equations**, one per digit; softmax then turns their 10 outputs into 10 probabilities that sum to 1.
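Softmax exponentiates each of the 10 raw outputs and divides by their sum, so the results are positive and add up to 1. A minimal NumPy sketch follows; the raw scores are made-up values, and subtracting the maximum first is a common trick to avoid overflow that does not change the result.

```python
import numpy as np

def softmax(z):
    """Convert a vector of raw scores into probabilities that sum to 1."""
    z = z - np.max(z, axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

# Hypothetical raw outputs of the 10 output neurons for one image
scores = np.array([0.1, -1.2, 0.3, 0.0, -0.5, 0.2, -2.0, 3.1, 0.4, -0.7])
probs = softmax(scores)
print(probs.round(3), probs.sum())
print(np.argmax(probs))  # 7, the index of the largest raw score
```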

In [None]:
model.add(tf.keras.layers.Dense(10, activation='softmax'))
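A quick way to sanity-check the architecture is to count its parameters by hand. A Dense layer with `n_in` inputs and `n_out` neurons has `n_in * n_out` weights plus `n_out` biases, and BatchNormalization keeps four values per feature. Under these assumptions the totals below should match what `model.summary()` reports for this model.

```python
# Dense layer sizes: 784 inputs, 4 hidden layers, 10 outputs
layer_sizes = [784, 200, 100, 60, 30, 10]

# Each Dense layer: n_in * n_out weights plus n_out biases
dense_params = [n_in * n_out + n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
print(dense_params)       # [157000, 20100, 6060, 1830, 310]
print(sum(dense_params))  # 185300

# BatchNormalization on 784 features keeps 4 values per feature:
# gamma and beta (trainable) plus moving mean and moving variance (non-trainable)
bn_params = 4 * 784
print(bn_params)                      # 3136
print(sum(dense_params) + bn_params)  # 188436
```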

**Compile** the model. We will use a non-default learning rate for the SGD optimizer (the Keras default is 0.01), so we first create an optimizer object and specify the learning rate.

In [None]:
#Create optimizer with learning rate of 0.03
sgd_optimizer = tf.keras.optimizers.SGD(learning_rate=0.03)
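With the default settings (no momentum), each SGD update moves every weight a step of size `learning_rate * gradient` against the gradient. The sketch below shows a single such step in NumPy with made-up weights and gradients; it illustrates the update rule rather than reproducing Keras internals.

```python
import numpy as np

learning_rate = 0.03  # same value as used for sgd_optimizer

def sgd_step(weights, gradient, lr=learning_rate):
    """Plain SGD: move each weight a small step against its gradient."""
    return weights - lr * gradient

w = np.array([0.5, -0.2, 1.0])     # hypothetical current weights
grad = np.array([1.0, -2.0, 0.0])  # hypothetical gradient of the loss with respect to w
w_new = sgd_step(w, grad)
print(w_new)  # approximately [0.47, -0.14, 1.0]
```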

In [None]:
#Take the loss from tf.keras too, to avoid mixing the standalone keras package with tf.keras
model.compile(optimizer=sgd_optimizer,
              loss=tf.keras.losses.categorical_crossentropy,
              metrics=['accuracy'])
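For one-hot labels, categorical cross-entropy reduces to minus the log of the probability the model assigns to the true class. The NumPy sketch below (the function name `cross_entropy_sketch` and the predicted probabilities are hypothetical) shows that a confident correct prediction gets a small loss, while a uniform guess over 10 classes gets a loss of log(10), about 2.303. Keras similarly clips predictions away from 0 before taking the log.

```python
import numpy as np

def cross_entropy_sketch(y_true, y_pred, eps=1e-7):
    """Mean over examples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

y_true = np.zeros((1, 10))
y_true[0, 7] = 1.0                  # one-hot label for digit 7

confident = np.full((1, 10), 0.01)  # hypothetical good prediction: 0.91 on digit 7
confident[0, 7] = 0.91
uncertain = np.full((1, 10), 0.1)   # hypothetical uniform guess

print(cross_entropy_sketch(y_true, confident))  # -log(0.91), about 0.094
print(cross_entropy_sketch(y_true, uncertain))  # -log(0.1), about 2.303
```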

Train the Model

In [None]:
model.fit(trainX, trainY, #Training data: features and one-hot encoded labels
          validation_data=(testX, testY), #Test data, used here to check accuracy after each epoch
          epochs=50, #Number of passes over the full training set
          batch_size=32) #Train on 32 examples at a time (one weight update per batch). You can change this number to see if model accuracy improves.
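An epoch is one full pass over the training set, and the weights are updated once per batch. This short calculation, assuming the 60,000 MNIST training examples and the settings above, gives the number of updates per epoch (the step count Keras shows in its progress bar) and in total.

```python
import math

num_train_examples = 60_000  # size of the MNIST training set
batch_size = 32
epochs = 50

# One weight update (iteration) per batch; one epoch covers every batch once
steps_per_epoch = math.ceil(num_train_examples / batch_size)
print(steps_per_epoch)           # 1875
print(steps_per_epoch * epochs)  # 93750
```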

Save the Model

In [None]:
#Save the model in current directory
model.save('mnist_dnn_v1.h5')

## Model Prediction

Prediction on Test Image

The model expects input of shape (None, 28, 28): it can take any number of examples as input ('None' in the shape), and each example should have a 28x28 (2D) shape.

In [None]:
#Shape of each example in test dataset
testX[0].shape

In [None]:
#Make it 3-dimensional, i.e. shape (1,28,28): one example, and that example has a shape of 28x28
input_data = np.expand_dims(testX[0], axis=0)
input_data.shape
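`model.predict` expects a batch of examples, so a single image needs a leading batch axis. Besides `np.expand_dims`, NumPy indexing with `np.newaxis` or a one-element slice gives the same `(1, 28, 28)` shape, as this small sketch with a hypothetical all-zero stand-in for `testX` shows.

```python
import numpy as np

# Hypothetical stand-in for testX: 3 images of shape 28x28
images = np.zeros((3, 28, 28), dtype=np.uint8)

a = np.expand_dims(images[0], axis=0)  # insert a new axis at position 0
b = images[0][np.newaxis, ...]         # same thing with indexing syntax
c = images[0:1]                        # a one-element slice keeps the batch axis
print(a.shape, b.shape, c.shape)  # (1, 28, 28) (1, 28, 28) (1, 28, 28)
```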

Model Prediction

In [None]:
#Model prediction
pred = model.predict(input_data)
pred

In [None]:
#Model prediction shape
pred.shape

In [None]:
#Model prediction for first example
pred[0]

Find the digit with the highest predicted probability using NumPy's 'argmax' function.

In [None]:
#This gives us predicted label
np.argmax(pred[0])

In [None]:
#Actual label
np.argmax(testY[0])
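The same `argmax` idea extends to a whole batch: taking `argmax` along axis 1 of both the predictions and the one-hot labels, then comparing them, gives the accuracy. The sketch below uses a hypothetical toy array with 4 examples and 3 classes for brevity; applied to `model.predict(testX)` and `testY`, it should give a value close to the final validation accuracy Keras reports.

```python
import numpy as np

def accuracy_from_probs(pred_probs, one_hot_labels):
    """Fraction of examples whose highest-probability class matches the true class."""
    predicted = np.argmax(pred_probs, axis=1)   # predicted class per row
    actual = np.argmax(one_hot_labels, axis=1)  # true class per row
    return float(np.mean(predicted == actual))

# Hypothetical predictions for 4 examples (rows) over 3 classes
pred_probs = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.3, 0.3, 0.4],
                       [0.5, 0.4, 0.1]])
labels = np.eye(3)[[0, 1, 2, 1]]  # one-hot labels for classes 0, 1, 2, 1
print(accuracy_from_probs(pred_probs, labels))  # 0.75 (the last example is wrong)
```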

In [None]:
#Let's visualize the image as well
import matplotlib.pyplot as plt
plt.imshow(testX[0],cmap='gray')
plt.show()

As we can see, a deep learning model can achieve **much better results** than the logistic regression model we used earlier. Deep learning lets the model learn **better features** (called hidden features) and hence usually gives better results (note that this is not true for every dataset).

Some **questions** to ponder...
1. Why did training start slowly, i.e. why was the validation accuracy below 20% after the 1st epoch?
2. By the end of the training, the training accuracy is higher than the test accuracy. Is that OK?