# Deep Neural Network for MNIST Classification

In this project, I aim to build the 'Hello World!' of deep learning. I hope to apply the knowledge I have gained in my courses to build my very first deep learning model.

The dataset, MNIST, is a benchmark for handwritten digit recognition. The goal is to build a deep neural network that classifies each handwritten image as the digit (0 to 9) it represents.

In [1]:
import numpy as np
import tensorflow as tf

import tensorflow_datasets as tfds

In [2]:
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

In [3]:
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']
# Note that the tfds MNIST dataset does not come with a validation split, so we will have to make one on our own.

num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)  # This method casts the first parameter to the data type provided in the second parameter

# Let's also get easier access to the number of test samples
num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)
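
As a quick sanity check on the split sizes, the arithmetic can be reproduced in plain Python. This is only a sketch: the counts assume the standard MNIST split of 60,000 training and 10,000 test images, and the variable names are illustrative rather than part of the notebook.

```python
# Assumed sizes of the standard MNIST splits
num_train_examples = 60_000
num_test_examples = 10_000

# Hold out 10% of the training images for validation
num_validation = num_train_examples // 10
num_training = num_train_examples - num_validation

print(num_validation, num_training, num_test_examples)
```

Under these assumptions, 6,000 images go to validation and 54,000 remain for training.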

## Preprocessing
We know that the MNIST dataset consists of grayscale images in which each pixel is an integer from 0 to 255 that encodes its intensity, with 0 meaning black and 255 meaning white. We need to scale these values to a common range before we apply any machine learning model.

In [4]:
def scale(image, label):
    image = tf.cast(image, tf.float32)  # Make sure the image is a float
    image = image / 255.0
    return image, label

In [5]:
scaled_train_and_validation_data = mnist_train.map(scale)   # Applies scale to every (image, label) pair
scaled_test_data = mnist_test.map(scale)
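
To make the effect of the scaling concrete, here is a minimal plain-Python sketch of the same division applied to a few pixel values. The helper `scale_pixels` and the sample values are made up for illustration and are not part of the pipeline above.

```python
def scale_pixels(pixels):
    """Map integer pixel intensities in [0, 255] to floats in [0.0, 1.0]."""
    return [p / 255.0 for p in pixels]

raw_pixels = [0, 51, 255]  # made-up sample values, not taken from the dataset
scaled_pixels = scale_pixels(raw_pixels)
print(scaled_pixels)
```

The darkest pixel maps to 0.0 and the brightest to 1.0, so every input feature ends up in the same range.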

Next, we shuffle and batch the data. Shuffling removes any ordering in the samples, so that each batch contains a mix of digits, which generally helps gradient descent converge.

In [6]:
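BUFFER_SIZE = 10000 # Keep a buffer of 10,000 samples and draw from it at random, refilling it as we go
# A buffer is needed because a large dataset may not fit in memory all at once.

# reshuffle_each_iteration=False keeps the shuffled order fixed, so that the take/skip split below
# selects the same samples on every pass and validation samples never leak into the training set
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE, reshuffle_each_iteration=False)

# Let's also split the training and validation sets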
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)
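
The way a shuffle buffer works can be sketched in plain Python. This is a simplified model of the idea, not TensorFlow's actual implementation, and `buffered_shuffle` is a hypothetical helper written for illustration only.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in an order drawn at random from a sliding buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and put the new item in its place
        index = rng.randrange(buffer_size)
        yield buffer[index]
        buffer[index] = item
    # Input exhausted: drain what is left in random order
    rng.shuffle(buffer)
    yield from buffer

shuffled = list(buffered_shuffle(range(10), buffer_size=3, seed=0))
print(shuffled)
```

Every item appears exactly once, but an item can only move as far forward as the buffer allows. A buffer smaller than the dataset therefore mixes samples locally rather than uniformly.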

In [7]:
BATCH_SIZE = 100

train_data = train_data.batch(BATCH_SIZE)
validation_data = validation_data.batch(num_validation_samples) # Technically, this just creates a single batch containing the entire validation set.
# We need this because the model expects a batch dimension, so the tensor must have one when we forward propagate later on
scaled_test_data = scaled_test_data.batch(num_test_samples)

validation_inputs, validation_targets = next(iter(validation_data)) # The next() function loads the next element of the iterable object, in this case our batches
# Since there is one batch, it will just load the inputs and targets
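
The batching arithmetic can be checked with a short plain-Python sketch. The training-set size below assumes the standard 60,000 MNIST training images minus the 6,000 held out for validation, and `batches` is a hypothetical helper used only for illustration.

```python
import math

def batches(items, batch_size):
    """Split a list into consecutive batches; the last one may be smaller."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

num_training = 54_000   # assumed: 60,000 MNIST training images minus 6,000 held out
batch_size = 100        # same value as BATCH_SIZE above

steps_per_epoch = math.ceil(num_training / batch_size)
print(steps_per_epoch)

# Tiny illustration with made-up data: 7 items in batches of 3
small_batches = batches(list(range(7)), 3)
print(small_batches)
```

With these numbers each epoch consists of 540 weight updates, which matches the `540/540` counter in the training log further down.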


## Model
So far we have preprocessed the data by:
1. Scaling all the pixel values to the range 0 to 1 by dividing them by 255.
2. Making sure that the training, validation and test sets are all batched.

We can now move on to building the model.

In [8]:
input_size = 784 # Total number of pixels in the image (our image is 28x28)
output_size = 10    # There are 10 digits to choose from (or classify into)
hidden_layer_size = 50  # The assumption is that all hidden layers are the same size

model = tf.keras.Sequential([
    # Each input is a tensor of shape 28x28x1, which a dense layer cannot work with directly
    # To resolve this, we flatten the tensor into a single vector of size 784
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # Takes the dot product of the inputs and the weights, adds the bias, then applies ReLU
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),    # Second hidden layer, stacked on the first

    tf.keras.layers.Dense(output_size, activation='softmax'),    # Our final layer (output)
    # Softmax turns the 10 outputs into probabilities that sum to 1
])
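
To get a feel for the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes the layer sizes defined above (784 inputs, two hidden layers of 50 units, 10 outputs); `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units

layer_sizes = [784, 50, 50, 10]  # input, two hidden layers, output
per_layer = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total = sum(per_layer)
print(per_layer, total)
```

The first hidden layer dominates the count because it is connected to all 784 pixels. The Flatten layer itself has no parameters.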


In [9]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
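# The 'sparse' variant of the loss accepts integer labels (0 to 9), so we do not need to one-hot encode the targets ourselves
# metrics lists what to report during training and evaluation, in addition to the loss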
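
The output activation and the loss fit together as follows. This is a minimal plain-Python sketch for a single sample, with made-up scores for a 3-class toy problem. It leaves out details such as the clipping that a library implementation may apply, and both functions are written here for illustration rather than taken from Keras.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(label, probabilities):
    """Loss for one sample: negative log of the probability given to the true class."""
    return -math.log(probabilities[label])

logits = [2.0, 1.0, 0.1]   # made-up scores for a 3-class toy problem
probs = softmax(logits)
loss = sparse_categorical_crossentropy(0, probs)  # the true class is index 0
print(probs, loss)
```

The label is used as an index into the probability vector, which is why integer targets are enough. The loss shrinks towards 0 as the probability assigned to the true class approaches 1.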

## Training the model

In [None]:
EPOCH_NUM = 5

model.fit(train_data, epochs=EPOCH_NUM, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/100
540/540 - 1s - 2ms/step - accuracy: 0.9762 - loss: 0.0778 - val_accuracy: 0.9785 - val_loss: 0.0778
Epoch 2/100
540/540 - 1s - 2ms/step - accuracy: 0.9795 - loss: 0.0704 - val_accuracy: 0.9767 - val_loss: 0.0744
Epoch 3/100
540/540 - 1s - 2ms/step - accuracy: 0.9818 - loss: 0.0613 - val_accuracy: 0.9748 - val_loss: 0.0761
Epoch 4/100
540/540 - 1s - 2ms/step - accuracy: 0.9836 - loss: 0.0553 - val_accuracy: 0.9803 - val_loss: 0.0642
Epoch 5/100
540/540 - 1s - 2ms/step - accuracy: 0.9847 - loss: 0.0506 - val_accuracy: 0.9808 - val_loss: 0.0667
Epoch 6/100
540/540 - 1s - 2ms/step - accuracy: 0.9859 - loss: 0.0465 - val_accuracy: 0.9832 - val_loss: 0.0562
Epoch 7/100
540/540 - 1s - 2ms/step - accuracy: 0.9871 - loss: 0.0424 - val_accuracy: 0.9858 - val_loss: 0.0501
Epoch 8/100
540/540 - 1s - 2ms/step - accuracy: 0.9884 - loss: 0.0374 - val_accuracy: 0.9885 - val_loss: 0.0448
Epoch 9/100
540/540 - 1s - 2ms/step - accuracy: 0.9903 - loss: 0.0329 - val_accuracy: 0.9868 - val_loss:

At first, this model reaches about 97% accuracy. I will now increase the number of epochs from 5 to 100.
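
For reference, the accuracy figures in the log are the share of samples for which the class with the highest predicted probability matches the label. A minimal sketch, using made-up predictions for a 3-class toy problem and a hypothetical `accuracy` helper:

```python
def accuracy(probabilities, labels):
    """Fraction of samples whose most probable class equals the true label."""
    correct = 0
    for probs, label in zip(probabilities, labels):
        predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax
        correct += predicted == label
    return correct / len(labels)

# Made-up predictions for four samples of a 3-class toy problem
toy_probabilities = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
toy_labels = [0, 1, 2, 1]
toy_accuracy = accuracy(toy_probabilities, toy_labels)
print(toy_accuracy)
```

Three of the four toy predictions are correct, so the accuracy here is 0.75.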