# Practical Deep Learning - The MNIST Dataset

"Hello World" of Machine Learning

The images are grayscale, with pixel values from 0 to 255, where 0 corresponds to pure black and 255 to pure white.

Each picture has 28 x 28 = 784 pixels, so our input layer will take an array of 784 inputs.

We will create 2 hidden layers with the same width, and an output layer with 10 units (since there are 10 digits to identify)
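As a rough check on the size of this architecture, the sketch below counts the trainable parameters of a fully connected network. The helper name `count_dense_params` is made up for illustration, and the hidden width of 100 is the value this notebook sets further down. Each dense layer is assumed to have one weight per input-output pair plus one bias per unit.

```python
def count_dense_params(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights + biases of one dense layer
    return total


# 784 inputs, two hidden layers of 100 units (assumed width), 10 outputs
print(count_dense_params([784, 100, 100, 10]))  # 89610
```

Most of the parameters sit in the first layer, because that is where the 784 pixel inputs meet the hidden units.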

### Steps:

-   1 - Prepare and preprocess the data. Split the Training, Validation and Test datasets.

-   2 - Outline the model and choose the Activation Functions

-   3 - Set the appropriate advanced optimizers and loss function

-   4 - Train it and make it learn, validating in each epoch

-   5 - Test the accuracy

### Data preprocessing

Imports

In [1]:
import numpy as np
import tensorflow as tf

import tensorflow_datasets as tfds



Loading the dataset

In [2]:
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True) #loading the dataset from 'tensorflow_datasets'
# 'as_supervised=True' loads each example as a 2-tuple: (input, target)

#Dataset stored at C:\Users\Pichau\tensorflow_datasets

Splitting the data

In [3]:
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test'] # by default, the TFDS MNIST dataset is only split into train and test

#splitting the validation samples from the train data

num_val_samples = 0.1 * mnist_info.splits['train'].num_examples # this will return the total number of training samples divided by 10
num_val_samples = tf.cast(num_val_samples,tf.int64) # since the previous result may not be an integer, here we overwrite it with 'tf.cast', to convert it to an integer

num_test_samples = mnist_info.splits['test'].num_examples # this will return the total number of test samples
num_test_samples = tf.cast(num_test_samples, tf.int64) # this is just to guarantee the output will be an integer

    ! tf.cast converts a value to the specified data type
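The split arithmetic can be checked by hand. This is a minimal sketch that assumes the standard MNIST sizes (60,000 training and 10,000 test images) instead of reading them from `mnist_info`; the variable names mirror the ones above, and `num_remaining_train` is a made-up name.

```python
# Assumed split sizes of the standard MNIST release.
num_train_examples = 60_000
num_test_samples = 10_000

# Hold out 10% of the training split for validation.
# Integer division keeps the result an int, so no cast is needed here.
num_val_samples = num_train_examples // 10
num_remaining_train = num_train_examples - num_val_samples

print(num_val_samples, num_remaining_train, num_test_samples)  # 6000 54000 10000
```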

Function to scale the inputs

In [4]:
# function to scale the inputs

def scale(image,label):
  image= tf.cast(image,tf.float32) #ensuring the image input will be a float
  #255 is the maximum value each pixel can take
  image/=255. # this will scale the inputs to a range 0 -> 1. The dot at the end states once more that the value should be a float
  return image, label

scaled_train_validation_data = mnist_train.map(scale)

test_data = mnist_test.map(scale)
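To see what the scaling does to actual pixel values, here is a small NumPy sketch of the same idea outside of `tf.data`. The function name `scale_pixels` and the 2x2 "image" are made up for illustration.

```python
import numpy as np


def scale_pixels(image):
    """Map uint8 pixel values in [0, 255] to float32 values in [0, 1]."""
    return image.astype(np.float32) / 255.0


# Hypothetical 2x2 "image" covering both extremes and two gray levels.
image = np.array([[0, 128], [255, 64]], dtype=np.uint8)
scaled = scale_pixels(image)

print(scaled.dtype)                # float32
print(scaled.min(), scaled.max())  # 0.0 1.0
```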

Shuffle the data

Shuffling is very important: if the data is ordered, it prevents the batches from containing only some of the possible values

In [5]:
# shuffling the scaled data in case the data is ordered (which could compromise the train efficiency)

BUFFER_SIZE = 10000 # defines how many samples the shuffle buffer holds at a time, since the dataset may be too big to shuffle all at once

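# 'reshuffle_each_iteration=False' keeps the shuffled order fixed across passes over the data;
# with the default (True), 'take' and 'skip' below would select different samples in every epoch,
# so validation samples could leak into the training data
shuffled_train_validation_data = scaled_train_validation_data.shuffle(BUFFER_SIZE, reshuffle_each_iteration=False)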

validation_data = shuffled_train_validation_data.take(num_val_samples) # assigning for validation 10% of the shuffled train data

train_data = shuffled_train_validation_data.skip(num_val_samples) # will get all data except for the validation data
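The role of the buffer size can be illustrated with a simplified pure-Python model of a shuffle buffer. This is only a sketch of the general idea, not TensorFlow's implementation, and `buffered_shuffle` is a made-up name. The buffer is filled first, and then each output is drawn at random from the buffer and replaced by the next input element.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Simplified model of shuffling a stream through a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]        # emit a random buffered element
        buffer[idx] = item       # and replace it with the new one
    rng.shuffle(buffer)          # input exhausted: drain the rest
    yield from buffer


stream = list(range(20))
shuffled = list(buffered_shuffle(stream, buffer_size=5, seed=0))

num_val = 4
validation = shuffled[:num_val]  # plays the role of take(num_val)
train = shuffled[num_val:]       # plays the role of skip(num_val)
print(validation, train)
```

With a buffer smaller than the dataset the shuffle is only local: the first emitted element always comes from the first `buffer_size` inputs. The two slices are disjoint here because they are cut from one fixed shuffled order. If the order were regenerated on every pass, the same slicing could pick different elements each time.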

In [6]:
# 'batch' groups consecutive samples together, adding a leading dimension to the tensors that sets how many samples are taken in each batch

BATCH_SIZE = 100

train_data = train_data.batch(BATCH_SIZE) # combines consecutive elements into batches

#for validation and test we don't need to separate the batches, but we still need to put the whole set into a single batch
validation_data = validation_data.batch(num_val_samples)

test_data = test_data.batch(num_test_samples)

validation_inputs, validation_targets = next(iter(validation_data)) # 'next' loads the next element of an iterable object, and 'iter' makes the data iterable
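The batching step above can be pictured with plain Python lists. The helper `batch` below is a hypothetical stand-in that only mimics the grouping behaviour, not the `tf.data` API.

```python
def batch(items, batch_size):
    """Group consecutive items into lists of at most batch_size elements."""
    items = list(items)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


samples = list(range(10))

train_batches = batch(samples, 4)
print(train_batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# Using the full dataset size as the batch size yields exactly one batch,
# so next(iter(...)) returns the whole set at once.
single_batch = batch(samples, len(samples))
whole_set = next(iter(single_batch))
print(len(single_batch), whole_set == samples)  # 1 True
```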

### Outlining the model

The activation functions, optimizers, loss functions, etc. are all selected from the TF library.

It is better to create dedicated variables for some of the parameters so that we can easily modify them while optimizing the model.

In [7]:
input_size = 784
output_size = 10
hidden_layer = 100

model = tf.keras.Sequential([
    #input layer
    tf.keras.layers.Flatten(input_shape=(28,28,1)), #the input shape corresponds to the size of the image, which is flattened into a vector of 784 values
    #hidden layers
    tf.keras.layers.Dense(hidden_layer, activation='relu'), #'relu' = activation function
    tf.keras.layers.Dense(hidden_layer, activation='relu'),
    #output layer
    tf.keras.layers.Dense(output_size, activation='softmax') #'softmax' = activation function -> this function will transform the values into probabilities
])
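To make the shapes concrete, the following NumPy sketch pushes a small random batch through a network of the same layout. The weights are random placeholders and not the trained Keras weights, so only the shapes and the softmax normalization are meaningful. All helper names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def dense(x, w, b, activation):
    """One fully connected layer: activation(x @ w + b)."""
    return activation(x @ w + b)


# Hypothetical batch of 3 "images" with the MNIST shape (28, 28, 1).
images = rng.random((3, 28, 28, 1))
x = images.reshape(len(images), -1)  # flatten each image: (3, 784)

sizes = [784, 100, 100, 10]
weights = [rng.normal(0.0, 0.05, size=(m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

h = dense(x, weights[0], biases[0], relu)
h = dense(h, weights[1], biases[1], relu)
probs = dense(h, weights[2], biases[2], softmax)

print(x.shape, probs.shape)  # (3, 784) (3, 10)
print(probs.sum(axis=1))     # each row sums to 1 (up to rounding)
```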

### Optimizer and loss function

Loss functions:

-   binary_crossentropy -> used when the targets are binary (two classes)

-   categorical_crossentropy -> expects the targets to be already one-hot encoded

-   sparse_categorical_crossentropy -> expects the targets as integer class labels, so no explicit one-hot encoding is needed
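The relationship between the last two losses can be checked on a single sample. This is a minimal pure-Python sketch of the formulas, not the Keras implementation; the function names simply mirror the loss names, and the probabilities are made up.

```python
import math


def categorical_crossentropy(one_hot_target, probs):
    """Cross-entropy for a one-hot encoded target."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs))


def sparse_categorical_crossentropy(label, probs):
    """Cross-entropy for an integer class label."""
    return -math.log(probs[label])


probs = [0.1, 0.7, 0.2]    # hypothetical softmax output for 3 classes
label = 1                  # integer target
one_hot = [0.0, 1.0, 0.0]  # the same target, one-hot encoded

loss_dense = categorical_crossentropy(one_hot, probs)
loss_sparse = sparse_categorical_crossentropy(label, probs)
print(round(loss_dense, 4), round(loss_sparse, 4))  # 0.3567 0.3567
```

Both variants give the same value; they differ only in how the target is encoded. The MNIST labels here are integers from 0 to 9, which is why the sparse variant fits.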

In [8]:
# model.compile(optimizer, loss)

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', # ADAM = adaptive moment estimation
              metrics=['accuracy']) # include metrics that we wish to be calculated during the training and testing

### Training

In [9]:
NUM_EPOCHS = 100
early_stopping = tf.keras.callbacks.EarlyStopping(patience=2)

#the model will run the training data in batches for the set number of epochs 
#then it will run the validation data all at once since there is only one batch
model.fit(train_data, 
          epochs=NUM_EPOCHS,
          callbacks=[early_stopping],
          validation_data=(validation_inputs,validation_targets),
          verbose=2)


Epoch 1/100
540/540 - 2s - loss: 0.3369 - accuracy: 0.9034 - val_loss: 0.1664 - val_accuracy: 0.9518 - 2s/epoch - 4ms/step
Epoch 2/100
540/540 - 1s - loss: 0.1399 - accuracy: 0.9589 - val_loss: 0.1217 - val_accuracy: 0.9623 - 1s/epoch - 2ms/step
Epoch 3/100
540/540 - 1s - loss: 0.1013 - accuracy: 0.9693 - val_loss: 0.0882 - val_accuracy: 0.9742 - 1s/epoch - 2ms/step
Epoch 4/100
540/540 - 1s - loss: 0.0792 - accuracy: 0.9761 - val_loss: 0.0749 - val_accuracy: 0.9770 - 1s/epoch - 2ms/step
Epoch 5/100
540/540 - 1s - loss: 0.0621 - accuracy: 0.9811 - val_loss: 0.0672 - val_accuracy: 0.9785 - 1s/epoch - 2ms/step
Epoch 6/100
540/540 - 1s - loss: 0.0504 - accuracy: 0.9843 - val_loss: 0.0573 - val_accuracy: 0.9818 - 1s/epoch - 2ms/step
Epoch 7/100
540/540 - 1s - loss: 0.0424 - accuracy: 0.9866 - val_loss: 0.0420 - val_accuracy: 0.9872 - 1s/epoch - 2ms/step
Epoch 8/100
540/540 - 1s - loss: 0.0329 - accuracy: 0.9897 - val_loss: 0.0379 - val_accuracy: 0.9880 - 1s/epoch - 2ms/step
Epoch 9/100
540/

<keras.callbacks.History at 0x22183592220>

After setting an early stopping mechanism, validation accuracy reached 99.40% before the model started to overfit.
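The patience rule behind early stopping can be sketched in a few lines. This is a simplified model that assumes the monitored quantity is the validation loss (the Keras default for `EarlyStopping`) and ignores options such as `min_delta`. The function name and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None.

    Training stops once the validation loss has failed to improve on
    its best value for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses: no improvement in epochs 2 and 3.
losses = [0.50, 0.40, 0.41, 0.42, 0.30]
print(early_stopping_epoch(losses, patience=2))  # 3
```

With `patience=2`, one bad epoch is tolerated, but two in a row end the run, so the lower loss in the last epoch is never reached.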

## Parameters Analysis

### Baseline with the original code:

Validation accuracy: 97.13% (8.6 sec)

### Change on hidden layer size

    Hidden layer size changed from 50 to 100

Validation accuracy: 98.82% (8.1 sec)

    Hidden layer size changed from 100 to 200

Validation accuracy: 98.53% (11.3 sec) -> no improvement

### Change on number of hidden layers

    Hidden layers changed from 2 to 3

Validation accuracy: 98.17% (10.3 sec) -> no improvement

    Hidden layers changed from 2 to 5

Validation accuracy: 98.08% (11.5 sec) -> no improvement


### Change on activation function

    Changed from 'relu' to 'sigmoid'

Validation accuracy: 96.12% (9.9 sec) -> model regressed

    Changed from 'relu' to 'tanh' only on the second hidden layer

Validation accuracy: 98.02% (9.6 sec) -> no improvement
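For reference, the three activation functions compared here can be written as scalar functions. This is a plain-Python sketch of the standard definitions, useful mainly to compare their output ranges.

```python
import math


def relu(z):
    """max(0, z): zero for negative inputs, identity otherwise."""
    return max(0.0, z)


def sigmoid(z):
    """Squashes any input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Squashes any input into the interval (-1, 1)."""
    return math.tanh(z)


for z in (-2.0, 0.0, 2.0):
    print(z, relu(z), round(sigmoid(z), 4), round(tanh(z), 4))
```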

### Change on batch size

    Changed from 100 to 1000

Validation accuracy: 95.80% (6.5 sec) -> model regressed
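One possible reason for the regression is that a larger batch means fewer weight updates per epoch. The arithmetic below assumes 54,000 training samples (60,000 MNIST training images minus the 10% held out for validation); `steps_per_epoch` is a made-up helper name.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches, and so weight updates, in one pass over the data."""
    return math.ceil(num_samples / batch_size)


num_train_samples = 54_000  # assumed size of the training set after the split

for batch_size in (100, 1000):
    print(batch_size, steps_per_epoch(num_train_samples, batch_size))
# 100 540
# 1000 54
```

The value 540 matches the `540/540` step counter in the training log above.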

## Testing the model

Here we will be able to see whether the model has overfitted. After this point, we cannot keep changing the parameters, because tuning against the test set would make it part of the training process.

In [10]:
model.evaluate(test_data, verbose=1)




[0.115258127450943, 0.9736999869346619]

    Test Accuracy: 97.37%

Smaller than the validation accuracy, which is expected.
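Since the model was compiled with `metrics=['accuracy']`, the list returned by `model.evaluate` holds the loss followed by the accuracy. The sketch below formats that pair and compares it with the 99.40% validation accuracy reported earlier; the helper name `summarize_evaluation` is made up, and the numbers are copied from the output above.

```python
def summarize_evaluation(result, validation_accuracy):
    """Format an evaluate()-style [loss, accuracy] pair for reporting."""
    loss, accuracy = result
    gap = validation_accuracy - accuracy
    return (
        f"Test loss: {loss:.4f}, "
        f"test accuracy: {accuracy * 100:.2f}%, "
        f"gap to validation: {gap * 100:.2f} points"
    )


result = [0.115258127450943, 0.9736999869346619]  # values printed by the cell above
summary = summarize_evaluation(result, validation_accuracy=0.9940)
print(summary)
```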