# Deep Neural Network for MNIST Classification


The dataset is called MNIST and consists of images of handwritten digits. You can find more about it on Yann LeCun's website (Director of AI Research, Facebook). He is one of the pioneers of deep learning and of more complex approaches that are widely used today, such as convolutional neural networks (CNNs).

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image).

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes.

My goal is to build a neural network with 2 hidden layers.

## Import the relevant packages

In [None]:
import numpy as np
import tensorflow as tf

import tensorflow_datasets as tfds


## Data

This is where I load and preprocess the data.

In [None]:

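# load MNIST; as_supervised=True yields (image, label) pairs, with_info=True also returns metadata
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

# reserve 10% of the training set for validation
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)


def scale(image, label):
    # convert pixel values from integers in [0, 255] to floats in [0, 1]
    image = tf.cast(image, tf.float32)
    image /= 255.
    return image, label


scaled_train_and_validation_data = mnist_train.map(scale)
test_data = mnist_test.map(scale)

# shuffle once before splitting; reshuffle_each_iteration=False keeps the order fixed across
# iterations, so the validation samples taken below never reappear in the training data
BUFFER_SIZE = 10000
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(
    BUFFER_SIZE, reshuffle_each_iteration=False)

validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)

# train in mini-batches of 100 samples
BATCH_SIZE = 100
train_data = train_data.batch(BATCH_SIZE)

# the validation and test sets are each served as a single batch
validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)

# extract the validation inputs and targets as tensors
validation_inputs, validation_targets = next(iter(validation_data))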

Downloading and preparing dataset 11.06 MiB (download: 11.06 MiB, generated: 21.00 MiB, total: 32.06 MiB) to /root/tensorflow_datasets/mnist/3.0.1...


Dataset mnist downloaded and prepared to /root/tensorflow_datasets/mnist/3.0.1. Subsequent calls will reuse this data.
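
The split sizes used above follow from simple arithmetic. The sketch below assumes the standard MNIST split of 60,000 training and 10,000 test images (the notebook reads these counts from `mnist_info` rather than hard-coding them); the variable names are illustrative only.

```python
import math

# Assumption: the standard MNIST split sizes (the notebook reads them from mnist_info)
num_train_examples = 60_000
num_test_examples = 10_000

validation_fraction = 0.1
batch_size = 100

# 10% of the training images are held out for validation
num_validation = round(validation_fraction * num_train_examples)
num_training = num_train_examples - num_validation

# Number of mini-batches the model sees in one epoch
steps_per_epoch = math.ceil(num_training / batch_size)

print(num_validation, num_training, steps_per_epoch)
```

Under that assumption there are 6,000 validation images, 54,000 training images, and 540 steps per epoch, which agrees with the `540/540` counter in the training log.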


## Model

### Outline the model
When thinking about a deep learning algorithm, we mostly imagine building the model. So, let's get started :)

In [None]:
input_size = 784
output_size = 10
hidden_layer_size = 50

# define what the model will look like
model = tf.keras.Sequential([

    # the first layer (the input layer) flattens each 28x28x1 image into a vector of 784 values
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # input layer

    # tf.keras.layers.Dense implements: output = activation(dot(input, weight) + bias)
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer

    # the final layer is no different, just make sure to activate it with softmax
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])
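
To make the formula `output = activation(dot(input, weight) + bias)` concrete, here is a minimal plain-Python sketch of a single dense layer on a tiny made-up input, followed by a count of the trainable parameters implied by the layer sizes 784, 50, 50 and 10. The helpers `relu`, `dense` and `dense_param_count` are hypothetical names for illustration, not part of Keras.

```python
def relu(values):
    # ReLU keeps positive values and replaces negative ones with zero
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases, activation):
    # weights[i][j] connects input i to unit j
    pre_activations = [
        sum(x * weights[i][j] for i, x in enumerate(inputs)) + biases[j]
        for j in range(len(biases))
    ]
    return activation(pre_activations)


# Tiny made-up example: 3 inputs feeding 2 units
inputs = [1.0, 2.0, 3.0]
weights = [[0.5, -1.0],
           [0.0, 1.0],
           [1.0, -1.0]]
biases = [0.5, 0.0]
outputs = dense(inputs, weights, biases, relu)
print(outputs)  # [4.0, 0.0]


def dense_param_count(num_inputs, num_units):
    # one weight per (input, unit) pair plus one bias per unit
    return num_inputs * num_units + num_units


layer_sizes = [784, 50, 50, 10]
total_params = sum(
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
)
print(total_params)  # 42310
```

The `Flatten` layer has no trainable parameters, so all 42,310 come from the three `Dense` layers (39,250 + 2,550 + 510).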

### Choose the optimizer and the loss function

In [None]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
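
The loss chosen here pairs naturally with the softmax output: for each image it is the negative log of the probability assigned to the correct digit, and "sparse" means the targets are integer labels rather than one-hot vectors. The following is a simplified plain-Python sketch of that idea for a single sample; it is not the Keras implementation, which also handles batching and numerical safeguards.

```python
import math


def softmax(logits):
    # subtract the maximum for numerical stability; the result is unchanged
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(probabilities, label):
    # label is an integer class index, not a one-hot vector
    return -math.log(probabilities[label])


# A model with no preference among the 10 digits
uniform = softmax([0.0] * 10)
loss_uniform = sparse_categorical_crossentropy(uniform, 3)

# A model that strongly favours digit 7, evaluated on a true label of 7
confident_logits = [0.0] * 10
confident_logits[7] = 5.0
confident = softmax(confident_logits)
loss_confident = sparse_categorical_crossentropy(confident, 7)

print(loss_uniform, loss_confident)
```

A uniform guess over 10 classes gives a loss of ln(10), about 2.30, and the loss shrinks toward 0 as the probability of the correct digit approaches 1.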

### Training
This is where I train the model I have built.

In [None]:
# set the maximum number of epochs
NUM_EPOCHS = 5
model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/5
540/540 - 13s - loss: 0.4222 - accuracy: 0.8789 - val_loss: 0.2272 - val_accuracy: 0.9360 - 13s/epoch - 24ms/step
Epoch 2/5
540/540 - 4s - loss: 0.1903 - accuracy: 0.9457 - val_loss: 0.1624 - val_accuracy: 0.9508 - 4s/epoch - 7ms/step
Epoch 3/5
540/540 - 4s - loss: 0.1455 - accuracy: 0.9572 - val_loss: 0.1306 - val_accuracy: 0.9602 - 4s/epoch - 6ms/step
Epoch 4/5
540/540 - 5s - loss: 0.1160 - accuracy: 0.9651 - val_loss: 0.1144 - val_accuracy: 0.9652 - 5s/epoch - 10ms/step
Epoch 5/5
540/540 - 4s - loss: 0.0969 - accuracy: 0.9709 - val_loss: 0.1010 - val_accuracy: 0.9697 - 4s/epoch - 7ms/step
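
`NUM_EPOCHS` is described as a maximum, which hints at stopping earlier once the validation loss stops improving. The notebook does not do this; the sketch below shows one common rule (patience-based early stopping) in plain Python, applied to the validation losses logged above. The helper `first_stopping_epoch` and the `patience` value are assumptions for illustration.

```python
def first_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best_loss = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# val_loss values copied from the training log above
logged_val_losses = [0.2272, 0.1624, 0.1306, 0.1144, 0.1010]
stop_epoch = first_stopping_epoch(logged_val_losses, patience=2)
print(stop_epoch)  # None: the validation loss improved in every epoch

# Made-up losses where improvement stalls after epoch 2
stalled_losses = [0.30, 0.20, 0.21, 0.22, 0.19]
stalled_stop_epoch = first_stopping_epoch(stalled_losses, patience=2)
print(stalled_stop_epoch)  # 4
```

In this run the rule would never trigger, since the validation loss was still falling at epoch 5. Keras offers a comparable mechanism through the `tf.keras.callbacks.EarlyStopping` callback.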


<keras.src.callbacks.History at 0x7c7787133460>

## Test the model


In [None]:
test_loss, test_accuracy = model.evaluate(test_data)



In [None]:
# apply some nice formatting
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.11. Test accuracy: 96.67%
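
The accuracy metric counts a prediction as correct when the class with the highest softmax probability equals the true label. Below is a minimal plain-Python sketch of that computation on made-up probabilities (three classes for brevity); `predict_digit` and `accuracy` are hypothetical helper names, not Keras functions.

```python
def predict_digit(probabilities):
    # index of the largest probability (argmax)
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


def accuracy(batch_probabilities, labels):
    correct = sum(
        predict_digit(probs) == label
        for probs, label in zip(batch_probabilities, labels)
    )
    return correct / len(labels)


# Made-up model outputs for four samples
batch_probabilities = [[0.1, 0.7, 0.2],
                       [0.8, 0.1, 0.1],
                       [0.3, 0.3, 0.4],
                       [0.2, 0.5, 0.3]]
labels = [1, 0, 1, 1]

batch_accuracy = accuracy(batch_probabilities, labels)
print(batch_accuracy)  # 0.75
```

By the same logic, 96.67% on a 10,000-image test set (assuming the standard MNIST test split) corresponds to roughly 9,667 correct predictions and 333 mistakes.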


Using the initial model and hyperparameters given in this notebook, the final test accuracy should be roughly 97%.

Each time the code is rerun, I get a different accuracy because the data is shuffled differently, the weights are initialized to different random values, and so on.
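
The usual way to reduce this run-to-run variation is to fix the random seed. The sketch below illustrates the principle with Python's standard `random` module and a hypothetical helper `shuffled_indices`; it does not touch TensorFlow. In TensorFlow, `tf.random.set_seed` and the `seed` argument of `Dataset.shuffle` play the analogous role, although fully deterministic training may need more than that.

```python
import random


def shuffled_indices(num_items, seed=None):
    # a dedicated generator keeps this example independent of global state
    rng = random.Random(seed)
    indices = list(range(num_items))
    rng.shuffle(indices)
    return indices


# The same seed reproduces the same order on every run
run_a = shuffled_indices(10, seed=42)
run_b = shuffled_indices(10, seed=42)
print(run_a == run_b)  # True

# Without a seed the order generally differs from run to run
run_c = shuffled_indices(10)
print(run_c)
```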

Finally, I have intentionally reached a suboptimal solution.