# Deep Neural Network for MNIST Classification

MNIST is a dataset of handwritten digits, widely used for digit recognition. More information is available on the website of Yann LeCun (Director of AI Research, Facebook).

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 

The task is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes.

To solve it, we build a neural network with 2 hidden layers.
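
For a rough sense of scale, the sketch below counts the trainable parameters of a fully connected network of this shape. It assumes the 100-unit hidden layers used in the model cell later in this notebook; `dense_params` is a hypothetical helper, not part of TensorFlow.

```python
def dense_params(inputs, units):
    """Weights plus one bias per unit in a fully connected layer."""
    return inputs * units + units

# Flattened 28x28 input, two hidden layers of 100 units (assumed), 10 outputs
layer_sizes = [28 * 28, 100, 100, 10]
per_layer = [dense_params(i, o) for i, o in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer, total_params)
```

Most of the parameters sit in the first layer, because every one of the 784 input pixels connects to every hidden unit.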

## Import the relevant packages

In [1]:
import numpy as np
import tensorflow as tf

import tensorflow_datasets as tfds



## Data

This is where we load and preprocess the data.

In [2]:
mnist_dataset, mnist_info = tfds.load('mnist', with_info=True, as_supervised=True)

In [3]:
mnist_info

tfds.core.DatasetInfo(
    name='mnist',
    full_name='mnist/3.0.1',
    description="""
    The MNIST database of handwritten digits.
    """,
    homepage='http://yann.lecun.com/exdb/mnist/',
    data_path='~\\tensorflow_datasets\\mnist\\3.0.1',
    file_format=tfrecord,
    download_size=11.06 MiB,
    dataset_size=21.00 MiB,
    features=FeaturesDict({
        'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
    }),
    supervised_keys=('image', 'label'),
    disable_shuffling=False,
    splits={
        'test': <SplitInfo num_examples=10000, num_shards=1>,
        'train': <SplitInfo num_examples=60000, num_shards=1>,
    },
    citation="""@article{lecun2010mnist,
      title={MNIST handwritten digit database},
      author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
      journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
      volume={2},
      year={2010}
    }""",
)

In [4]:
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

# Hold out 10% of the training split for validation
num_validation_samples = tf.cast(0.1 * mnist_info.splits['train'].num_examples, tf.int64)
num_test_samples = tf.cast(mnist_info.splits['test'].num_examples, tf.int64)

def scale(image, label):
    '''Scale pixel values from [0, 255] to [0, 1].'''
    image = tf.cast(image, tf.float32)
    image /= 255.
    return image, label

scaled_train_and_validation_data = mnist_train.map(scale)
scaled_test_data = mnist_test.map(scale)

## Shuffle the data

In [5]:
# Size of the shuffle buffer, in samples. If BUFFER_SIZE = 1, no shuffling happens
BUFFER_SIZE = 10000
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)

validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)
# Note: shuffle() reshuffles on every iteration by default, so validation and training
# samples can overlap across epochs; shuffle(..., reshuffle_each_iteration=False) keeps the split fixed

BATCH_SIZE = 100

# batch() groups consecutive samples into batches of the given size
train_data = train_data.batch(BATCH_SIZE)
# Validation and test data each go into a single batch, since they are only used for evaluation
validation_data = validation_data.batch(num_validation_samples)
test_data = scaled_test_data.batch(num_test_samples)

validation_inputs, validation_targets = next(iter(validation_data))
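
The split sizes in the cell above follow from simple arithmetic. The sketch below reproduces them in plain Python and shows why the training log reports 540 steps per epoch. The helper name `split_sizes` is hypothetical and not part of the notebook.

```python
import math

def split_sizes(num_train_examples, validation_fraction, batch_size):
    """Return (validation samples, training samples, batches per epoch)."""
    num_validation = int(validation_fraction * num_train_examples)
    num_training = num_train_examples - num_validation
    # A final partial batch still counts as one step, hence the ceiling
    steps_per_epoch = math.ceil(num_training / batch_size)
    return num_validation, num_training, steps_per_epoch

# 60,000 training examples, 10% held out, batches of 100
num_validation, num_training, steps = split_sizes(60000, 0.1, 100)
print(num_validation, num_training, steps)
```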

In [6]:
train_data

<BatchDataset element_spec=(TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int64, name=None))>

## Create the model

In [26]:
input_size = 784
output_size = 10
hidden_layer_size = 100

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28,1)),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='tanh'),
    tf.keras.layers.Dense(output_size, activation='softmax')
])

optimizer_learning_rate = 0.002

'''
    binary_crossentropy - for binary (two-class) targets
    categorical_crossentropy - for one-hot encoded targets
    sparse_categorical_crossentropy - for integer class labels (no explicit one-hot encoding needed)
'''

custom_optimizer = tf.keras.optimizers.Adam(learning_rate=optimizer_learning_rate)
model.compile(optimizer=custom_optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
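
To illustrate the loss choice, the sketch below shows that the sparse and one-hot forms of cross-entropy give the same value for a single sample. It is a simplified plain-Python version with made-up scores, not Keras's implementation, and the function names are chosen here for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(label, probs):
    """Loss for an integer class label: -log of the true class probability."""
    return -math.log(probs[label])

def categorical_crossentropy(one_hot, probs):
    """Loss for a one-hot target: -sum(t * log(p)) over all classes."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

probs = softmax([2.0, 1.0, 0.1])  # made-up scores for a 3-class example
label = 0                         # integer label
one_hot = [1.0, 0.0, 0.0]         # the same label, one-hot encoded

sparse_loss = sparse_categorical_crossentropy(label, probs)
dense_loss = categorical_crossentropy(one_hot, probs)
print(sparse_loss, dense_loss)
```

Because the MNIST labels are integers from 0 to 9, the sparse form lets the notebook pass them to the model unchanged.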

## Training

In [27]:
NUM_EPOCHS = 5

model.fit(
    train_data,
    epochs=NUM_EPOCHS,
    validation_data=(validation_inputs, validation_targets),
    verbose=2
)

Epoch 1/5
540/540 - 4s - loss: 0.2584 - accuracy: 0.9230 - val_loss: 0.1230 - val_accuracy: 0.9643 - 4s/epoch - 8ms/step
Epoch 2/5
540/540 - 4s - loss: 0.1046 - accuracy: 0.9675 - val_loss: 0.0948 - val_accuracy: 0.9722 - 4s/epoch - 7ms/step
Epoch 3/5
540/540 - 4s - loss: 0.0706 - accuracy: 0.9776 - val_loss: 0.0758 - val_accuracy: 0.9758 - 4s/epoch - 7ms/step
Epoch 4/5
540/540 - 4s - loss: 0.0547 - accuracy: 0.9826 - val_loss: 0.0742 - val_accuracy: 0.9763 - 4s/epoch - 7ms/step
Epoch 5/5
540/540 - 4s - loss: 0.0460 - accuracy: 0.9859 - val_loss: 0.0511 - val_accuracy: 0.9853 - 4s/epoch - 7ms/step


<keras.callbacks.History at 0x2737f428100>

## Test the model

In [28]:
'''
When adjusting the hyperparameters to increase the validation accuracy, there is a risk of
overfitting the validation data, so it's necessary to test the model on the test data
'''

In [29]:
test_loss, test_accuracy = model.evaluate(test_data)
print('Test loss: {0:.2f} - Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy * 100))

Test loss: 0.08 - Test accuracy: 97.73%
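
The accuracy reported above is the fraction of test images whose most probable class matches the label. A minimal sketch with made-up softmax outputs follows; the `accuracy` helper is hypothetical, since Keras computes this metric internally.

```python
def accuracy(probabilities, labels):
    """Fraction of samples whose highest-probability class equals the label."""
    correct = 0
    for probs, label in zip(probabilities, labels):
        predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax
        correct += predicted == label
    return correct / len(labels)

# Made-up softmax outputs for four samples and three classes
probabilities = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
labels = [0, 1, 2, 1]

result = accuracy(probabilities, labels)
print(result)
```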


## Use the model

In [30]:
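import cv2  # provided by the opencv-python package
import matplotlib.pyplot as plt

# Read one channel of the image (assumes a 28x28 image of a dark digit on a light background)
image = cv2.imread('./digit.png')[:,:,0]
# Invert to match MNIST (bright digit on dark background), add batch and channel axes, scale to [0, 1]
img = np.invert(np.array([image]))[..., np.newaxis].astype(np.float32) / 255.
prediction = model.predict(img)
print(f'This digit is {np.argmax(prediction)}')
plt.imshow(img[0, :, :, 0], cmap=plt.cm.binary)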

ModuleNotFoundError: No module named 'cv2'