# Deep Neural Network for MNIST Classification


The dataset is called MNIST and is a standard benchmark for handwritten digit recognition. You can find more about it on the website of Yann LeCun (Director of AI Research, Facebook). He is one of the pioneers of what we've been talking about and of more complex approaches that are widely used today, such as convolutional neural networks (CNNs).

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes. 

Our goal is to build a neural network with 2 hidden layers.

## Import the relevant packages

In [1]:
import numpy as np
import tensorflow as tf

In [5]:
import tensorflow_datasets as tfds

## Data

This is where we load and preprocess our data.

In [6]:
# Alternative without metadata: mnist_dataset = tfds.load(name='mnist', as_supervised=True)
# with_info=True makes tfds.load return a tuple (dataset, info), where info holds the version, features, and number of samples
# as_supervised=True loads the dataset in a 2-tuple structure (input, target)
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)


In [9]:
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']


# Set the number of samples to split off from the training dataset for validation.
# Common train:validation:test ratios are 7:2:1 or 8:1:1; here we take 10% of the training set.
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)



# Create a scaler to scale our data since we prefer to have inputs between 0 and 1
def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.
    
    return image, label



# Scale the train-and-validation data and the test data
scaled_train_and_validation_data = mnist_train.map(scale)

test_data = mnist_test.map(scale)



# Shuffle the data so that each batch is representative of the whole dataset.
# Shuffling works through a buffer: BUFFER_SIZE samples are held in memory at a time and drawn from at random,
# which keeps memory use bounded when the dataset is too large to shuffle all at once.
BUFFER_SIZE = 10000

shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)



# Split off the validation set: the first num_validation_samples go to validation, the rest to training.
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)

train_data = shuffled_train_and_validation_data.skip(num_validation_samples)



# We use mini-batch gradient descent, so we split the training data into batches of BATCH_SIZE samples
BATCH_SIZE = 100

train_data = train_data.batch(BATCH_SIZE)

validation_data = validation_data.batch(num_validation_samples)

# Batch the test dataset too: a single batch holding all test samples, so the data has the batched shape the model expects
test_data = test_data.batch(num_test_samples)


# Take the next batch (it is the only batch)
# Because as_supervised=True, each batch is a 2-tuple (inputs, targets)
validation__inputs, validation_tragets = next(iter(validation_data))
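
To make the role of `BUFFER_SIZE` more concrete, here is a minimal pure-Python sketch of a buffer-based shuffle. It only illustrates the idea behind `shuffle(BUFFER_SIZE)` and is not TensorFlow's actual implementation; the helper name `buffered_shuffle` and the toy data are assumptions made for this example.

```python
import random


def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in pseudo-random order using a fixed-size buffer.

    Hypothetical helper for illustration only; buffer_size must be >= 1.
    """
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer, nothing is emitted yet.
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain what is left in random order.
    rng.shuffle(buffer)
    yield from buffer


data = list(range(20))
shuffled = list(buffered_shuffle(data, buffer_size=5))

print(shuffled)                   # a reordering of 0..19
print(sorted(shuffled) == data)   # every element appears exactly once
```

In this sketch a buffer of size 1 leaves the order unchanged, while a buffer at least as large as the dataset gives a uniform shuffle. With 60,000 training images and a buffer of 10,000, the shuffle is therefore only approximate.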


### Outline the model
Building the model.

In [10]:
# Input size: 28 * 28 = 784 pixels per image
# Output size: 10 classes (the digits 0-9)

input_size = 784
output_size = 10

# Use same hidden layer size for both hidden layers
hidden_layer_size = 200

model = tf.keras.Sequential([
    
    # The first layer (the input layer)
    # Each observation is 28x28x1 pixels, so it is a tensor of rank 3
    # Flatten reshapes it into a 1D vector of 784 values that the dense layers can take as input
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    
    # tf.keras.layers.Dense implements: output = activation(dot(input, weight) + bias)
    
    # 1st hidden layer
    # ReLU = rectified linear unit, max(0, x)
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    
    # 2nd hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    
    # The final (output) layer
    # Softmax turns the 10 outputs into probabilities that sum to 1
    tf.keras.layers.Dense(output_size, activation='softmax')
        
])
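
As a rough illustration of what this stack of layers computes, the sketch below reproduces the forward pass in plain NumPy and counts the trainable parameters. The weights are random and untrained, the input is a fake batch rather than MNIST, and the helper names (`dense`, `relu`, `softmax`) are assumptions for this example, not Keras internals.

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_layer_size, output_size = 784, 200, 10


def dense(x, w, b):
    # Affine part of a dense layer: dot(input, weight) + bias
    return x @ w + b


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    # Subtract the row maximum for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


# Random, untrained weights (illustrative values only)
sizes = [input_size, hidden_layer_size, hidden_layer_size, output_size]
params = [(rng.normal(0.0, 0.05, (n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

images = rng.random((4, 28, 28, 1))      # fake batch of 4 "images" in [0, 1)
x = images.reshape(len(images), -1)      # Flatten: (4, 28, 28, 1) -> (4, 784)
for w, b in params[:-1]:
    x = relu(dense(x, w, b))             # the two hidden layers
w_out, b_out = params[-1]
probs = softmax(dense(x, w_out, b_out))  # output layer

num_params = sum(w.size + b.size for w, b in params)
print(probs.shape)   # (4, 10)
print(num_params)    # 199210
```

The parameter count follows from the layer sizes: 784·200 + 200 for the first hidden layer, 200·200 + 200 for the second, and 200·10 + 10 for the output layer.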

### Choose the optimizer and the loss function

In [11]:
# The optimization process updates the model parameters (weights and biases)
# based on the gradients of the loss function with respect to those parameters.
# The sparse variant of categorical cross-entropy takes integer class labels (0-9) rather than one-hot vectors.

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
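
For intuition about the loss, here is a small NumPy sketch of sparse categorical cross-entropy: the mean negative log-probability the model assigns to the correct class. It is a simplified illustration on made-up probabilities (three classes instead of ten) and omits details such as the clipping a real implementation may apply; the function defined here is a stand-in, not the Keras one.

```python
import math

import numpy as np


def sparse_categorical_crossentropy(probs, labels):
    """Mean negative log-probability of the true class (integer labels)."""
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(picked)))


# Made-up softmax outputs for 2 samples and 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])   # integer labels, not one-hot vectors

loss = sparse_categorical_crossentropy(probs, labels)
expected = 0.5 * (-math.log(0.7) - math.log(0.8))
print(abs(loss - expected) < 1e-12)   # True
```

Because the labels loaded here are plain integers, the sparse variant can use them directly. The non-sparse `categorical_crossentropy` would expect one-hot encoded targets instead.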

### Training

In [13]:
# Set the maximum number of epochs
NUM_EPOCHS = 5

# Fit the model
# The validation data is passed as an (inputs, targets) tuple; verbose=2 prints one line per epoch
model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation__inputs, validation_tragets), verbose=2)

Epoch 1/5
540/540 - 3s - loss: 0.2731 - accuracy: 0.9198 - val_loss: 0.1231 - val_accuracy: 0.9623 - 3s/epoch - 5ms/step
Epoch 2/5
540/540 - 1s - loss: 0.1043 - accuracy: 0.9678 - val_loss: 0.0900 - val_accuracy: 0.9727 - 1s/epoch - 3ms/step
Epoch 3/5
540/540 - 1s - loss: 0.0717 - accuracy: 0.9779 - val_loss: 0.0590 - val_accuracy: 0.9815 - 1s/epoch - 3ms/step
Epoch 4/5
540/540 - 1s - loss: 0.0504 - accuracy: 0.9839 - val_loss: 0.0529 - val_accuracy: 0.9837 - 1s/epoch - 3ms/step
Epoch 5/5
540/540 - 1s - loss: 0.0405 - accuracy: 0.9865 - val_loss: 0.0407 - val_accuracy: 0.9865 - 1s/epoch - 3ms/step


<keras.callbacks.History at 0x261be2d51f0>
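
The `540/540` in each epoch line is the number of batches per epoch. A quick check of that arithmetic, assuming the standard 60,000-image MNIST training split (the variable names below are chosen for this example):

```python
import math

num_train_examples = 60_000    # size of the MNIST 'train' split
validation_fraction = 0.1      # share of the training split held out for validation
batch_size = 100

num_validation = int(validation_fraction * num_train_examples)
num_training = num_train_examples - num_validation
steps_per_epoch = math.ceil(num_training / batch_size)

print(num_validation, num_training, steps_per_epoch)   # 6000 54000 540
```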

## Test the model
After training on the training data and validating on the validation data, we test the final predictive power of our model by running it on the test dataset, which the algorithm has NEVER seen before.

The test is the absolute final step. You should not test before you are completely done adjusting your model.

If you adjust your model after testing, you will start overfitting the test dataset, which will defeat its purpose.

In [14]:
test_loss, test_accuracy = model.evaluate(test_data)



In [15]:
# We can apply some nice formatting if we want to
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.07. Test accuracy: 97.78%
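
The accuracy metric itself is simple to state: the predicted class is the one with the highest softmax probability, and accuracy is the fraction of predictions that match the labels. A minimal NumPy sketch on made-up values (two classes and four samples, not the real model outputs):

```python
import numpy as np


def accuracy(probs, labels):
    """Fraction of rows whose highest-probability class equals the label."""
    predictions = np.argmax(probs, axis=1)
    return float(np.mean(predictions == labels))


# Made-up probabilities for 4 samples and 2 classes
probs = np.array([[0.1, 0.9],
                  [0.8, 0.2],
                  [0.3, 0.7],
                  [0.6, 0.4]])
labels = np.array([1, 0, 0, 0])

acc = accuracy(probs, labels)
print('Accuracy: {0:.2f}%'.format(acc * 100.))   # Accuracy: 75.00%
```

On the 10,000-image MNIST test split, a test accuracy of 97.78% corresponds to about 9,778 correctly classified digits.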
