# **MNIST dataset**

MNIST is a dataset of handwritten digits; the task is to recognise which digit each image shows.

It is often called the 'Hello World' problem of deep learning because:

1. It is a very visual problem
2. It is extremely common
3. It is easy to build up from it to CNNs
4. It is large and already preprocessed


# **About the dataset**

The dataset provides 70,000 images of handwritten digits, each 28x28 pixels (784 pixels in total). Each image can be thought of as a 28x28 matrix whose values range from 0 to 255.

On the greyscale:

0 --> purely black

255 --> purely white

# **About the Approach**
The approach for a deep neural network is to "flatten" each image into a vector of 784x1. The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits, it is a classification problem with 10 classes.
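To make the flattening step concrete, here is a minimal NumPy sketch. The random `image` is only a stand-in for a real MNIST digit, and the variable names are illustrative.

```python
import numpy as np

# Stand-in for one 28x28 greyscale digit with values in 0..255
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Flatten row by row into a vector of 28 * 28 = 784 values
flat = image.reshape(-1)

# Scale the pixel values to the range 0..1
scaled = flat.astype(np.float32) / 255.0

print(flat.shape)  # (784,)
```

Flattening goes row by row, so `flat[28]` is the first pixel of the second row of the image.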

Our model will have 2 hidden layers, since for a dataset like this 2 hidden layers are enough to reach a high accuracy.


# **Steps carried out:**
1. Prepare our data and preprocess it. Create training, validation and test datasets.
2. Outline our model and choose the activation functions.
3. Set an appropriate optimizer and loss function.
4. Make the algorithm learn through backpropagation, validating it at each epoch.
5. Finally, test the accuracy of our model.


In [1]:
#Importing relevant libraries
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

In [2]:
#loading the data
mnist_dataset, mnist_info = tfds.load(name = 'mnist',with_info=True, as_supervised=True)

In [3]:
# Creating training,validation and test datasets.
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']
# By default, the TensorFlow Datasets version of MNIST only contains train and test splits,
# so we have to carve the validation set out of the train data (which is much larger than the test data).
# train data --> 60,000
# test data --> 10,000
# Let's take 10% of the train data as validation data.
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)

# Scaling the data between 0 to 1.
def scale(image,label):
  image = tf.cast(image, tf.float32)
  image /=255
  return image,label

# This will scale all the data
scaled_train_and_validation_data = mnist_train.map(scale)
scaled_test_data = mnist_test.map(scale)


In [4]:
# Now we will shuffle the data (keeping the same samples, but in a different order) and create the validation set
# With an enormous dataset we can't shuffle everything at once, so shuffle() fills a buffer of 10,000 samples at a time and draws from it

BUFFER_SIZE = 10000

shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)

shuffled_train_and_validation_data 

<ShuffleDataset shapes: ((28, 28, 1), ()), types: (tf.float32, tf.int64)>
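To illustrate what a shuffle buffer does, here is a simplified pure-Python model of buffered shuffling. It is a sketch of the idea, not TensorFlow's actual implementation, and `buffered_shuffle` is a hypothetical helper name.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in a random order while holding at most buffer_size of them."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]        # emit a random element from the buffer
        buffer[idx] = item       # and replace it with the incoming one
    rng.shuffle(buffer)          # input exhausted: drain what is left
    yield from buffer

shuffled = list(buffered_shuffle(range(20), buffer_size=5))
print(shuffled)
```

Every item appears exactly once, but an item can only be emitted once it has entered the buffer. A buffer smaller than the dataset therefore gives a local rather than a fully uniform shuffle.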

In [5]:
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)
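One caveat with the cell above: `shuffle()` reshuffles on every pass through the data by default, so `take()` and `skip()` applied after the shuffle can select different samples each epoch. Training batches may then contain samples that are also in the validation set. A minimal sketch of one way to avoid this is to split first and shuffle only the training part, assuming it is acceptable to use the first 10% of the stored samples for validation. `Dataset.range(100)` is a toy stand-in for the scaled MNIST data.

```python
import tensorflow as tf

# Toy stand-in: 100 "samples" identified by their index
dataset = tf.data.Dataset.range(100)
num_validation = 10

# Split first, so the two subsets are fixed and cannot overlap
validation_data = dataset.take(num_validation)
# Shuffle only the training part; it is reshuffled on every pass
train_data = dataset.skip(num_validation).shuffle(90)

validation_ids = set(int(x) for x in validation_data.as_numpy_iterator())
for epoch in range(3):
    train_ids = set(int(x) for x in train_data.as_numpy_iterator())
    print(epoch, validation_ids.isdisjoint(train_ids))
```

With this ordering the training data is still reshuffled each epoch, while the validation samples stay the same throughout.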

In [6]:
# Performing batching
# batch size = 1 --> stochastic gradient descent (SGD)
# batch size = no. of samples --> (full-batch) gradient descent (GD)
# therefore, we want
# 1 < batch size < no. of samples --> mini-batch GD
batch_size = 100
train_data = train_data.batch(batch_size)
# Reuse the test data that was already scaled above
test_data = scaled_test_data
test_data = test_data.batch(num_test_samples)
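As a quick sanity check on these sizes, the arithmetic below works out how many mini-batches make up one epoch. The counts come from the comments above (60,000 train samples, 10% held out, batch size 100); the variable names are illustrative.

```python
import math

num_train_total = 60_000
validation_fraction = 0.1
batch_size = 100

num_validation = round(validation_fraction * num_train_total)  # 6,000
num_train = num_train_total - num_validation                   # 54,000
steps_per_epoch = math.ceil(num_train / batch_size)

print(steps_per_epoch)  # 540
```

This matches the `540/540` progress counter in the training log further down.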

In [7]:
# When batching, the loss and accuracy are averaged over each batch.
# The validation set is only forward-propagated, so we put all of it into a single batch
validation_data = validation_data.batch(num_validation_samples) # batching adds a leading batch dimension to the tensors

In [8]:
validation_inputs, validation_targets = next(iter(validation_data))

# **Outlining the model**
Width (the number of units per hidden layer) and depth (the number of hidden layers) are the hyperparameters of the model.

In [12]:
# Declaring 3 variables for width, inputs and outputs
hidden_layer_size = 100 # width of each hidden layer; a reasonable value, not necessarily the optimal one
input_size = 784
output_size = 10 # as we have 10 digits

# 1st layer -> input layer
# We are not using a CNN here, so we flatten each image with the Flatten layer
model = tf.keras.Sequential([
                             tf.keras.layers.Flatten(input_shape = (28,28,1)),
                             # each Dense layer computes the dot product of inputs and weights, adds the bias and applies the activation
                             tf.keras.layers.Dense(hidden_layer_size, activation='relu'), 
                             # so far we have the flattened inputs and our first hidden layer
                             tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # creating second hidden layer
                             # softmax turns the 10 outputs into class probabilities, as needed for classification
                             tf.keras.layers.Dense(output_size, activation='softmax')

])
model

<tensorflow.python.keras.engine.sequential.Sequential at 0x7f63ba12e650>
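To show what this stack of layers computes, here is a minimal NumPy sketch of the forward pass through a 784-100-100-10 network. The random weights and the helper names (`relu`, `softmax`, `forward`) are illustrative assumptions, not the trained Keras model.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
sizes = [784, 100, 100, 10]  # input, two hidden layers, output
# One (weights, bias) pair per Dense layer, randomly initialised
params = [(rng.normal(0.0, 0.05, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def forward(images):
    h = images.reshape(len(images), -1)  # Flatten: (batch, 28, 28, 1) -> (batch, 784)
    for w, b in params[:-1]:
        h = relu(h @ w + b)              # hidden layers: dot product, bias, ReLU
    w, b = params[-1]
    return softmax(h @ w + b)            # output layer: class probabilities

batch = rng.random((5, 28, 28, 1))
probs = forward(batch)
num_params = sum(w.size + b.size for w, b in params)

print(probs.shape)  # (5, 10)
print(num_params)   # 89610
```

Each row of `probs` sums to 1. The parameter count is (784·100 + 100) + (100·100 + 100) + (100·10 + 10) = 89,610.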

# **Choosing the optimizer and loss functions**

Three common cross-entropy losses in TensorFlow:
 1. Binary crossentropy: for binary classification
 2. Categorical crossentropy: expects targets that are already one-hot encoded
 3. Sparse categorical crossentropy: computes the same loss, but takes integer class labels directly, so no one-hot encoding is needed
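The relationship between the categorical and sparse categorical variants can be checked with a small NumPy sketch. The probabilities and labels below are made-up illustrative values, and the formulas are the textbook cross-entropy rather than TensorFlow's internal code.

```python
import numpy as np

# Predicted class probabilities for 2 samples and 3 classes (made-up values)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])  # integer class labels

# Categorical cross-entropy needs one-hot targets
one_hot = np.eye(probs.shape[1])[labels]
categorical = -np.sum(one_hot * np.log(probs), axis=1)

# The sparse variant indexes the probability of the true class directly
sparse = -np.log(probs[np.arange(len(labels)), labels])

print(np.allclose(categorical, sparse))  # True
```

Both give the same per-sample loss. The sparse form just skips building the one-hot matrix, which is why it suits MNIST's integer labels.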


In [13]:
model.compile(optimizer='adam', loss ='sparse_categorical_crossentropy', metrics=['accuracy'])


# **Training**

In [16]:
NUM_EPOCHS = 7 # arbitrarily set

# Now fit the model
# Training stops once the maximum number of epochs is reached.
model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose =2)

Epoch 1/7
540/540 - 5s - loss: 0.0197 - accuracy: 0.9936 - val_loss: 0.0314 - val_accuracy: 0.9900
Epoch 2/7
540/540 - 5s - loss: 0.0167 - accuracy: 0.9949 - val_loss: 0.0223 - val_accuracy: 0.9923
Epoch 3/7
540/540 - 5s - loss: 0.0165 - accuracy: 0.9945 - val_loss: 0.0182 - val_accuracy: 0.9933
Epoch 4/7
540/540 - 4s - loss: 0.0128 - accuracy: 0.9963 - val_loss: 0.0183 - val_accuracy: 0.9935
Epoch 5/7
540/540 - 4s - loss: 0.0117 - accuracy: 0.9965 - val_loss: 0.0135 - val_accuracy: 0.9947
Epoch 6/7
540/540 - 4s - loss: 0.0095 - accuracy: 0.9972 - val_loss: 0.0223 - val_accuracy: 0.9923
Epoch 7/7
540/540 - 4s - loss: 0.0116 - accuracy: 0.9962 - val_loss: 0.0132 - val_accuracy: 0.9957


<tensorflow.python.keras.callbacks.History at 0x7f63b9eb3d50>

This is the highest validation accuracy we reached after trying different numbers of epochs and hidden layer sizes.
Because those hyperparameters were tuned against the validation data, we may have overfitted the validation dataset, so this figure is an optimistic estimate of how well the model generalises.

Our validation accuracy comes out to be -> 99.6%

# **Testing the data**

The test data lets us check that our hyperparameters (width, depth, batch size, number of epochs, etc.) have not overfitted the validation data, because the model has never seen the test set.

In [17]:
test_loss, test_accuracy = model.evaluate(test_data)
print ('Test Loss: {0:.2f}. Test Accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100))

Test Loss: 0.09. Test Accuracy: 97.88%
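For reference, the accuracy metric is the fraction of samples whose highest-probability class matches the true label. A minimal NumPy sketch with made-up predictions (not the model's real outputs):

```python
import numpy as np

# Made-up predicted probabilities for 4 samples and 2 classes
probs = np.array([[0.1, 0.9],
                  [0.8, 0.2],
                  [0.3, 0.7],
                  [0.6, 0.4]])
labels = np.array([1, 0, 0, 0])  # true classes

predictions = probs.argmax(axis=1)        # most probable class per sample
accuracy = np.mean(predictions == labels)  # fraction of correct predictions

print(accuracy)  # 0.75
```

Three of the four predictions match their labels, giving 0.75.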


**Therefore, we come to the end of training our model, with a test accuracy of 97.9%**