# Deep Neural Network for MNIST Classification

In this project, I aim to build the 'Hello World!' of deep learning. I hope to apply everything I have learned in my courses to build my very first deep learning model.

The MNIST dataset contains images of handwritten digits. The task is to classify each image as the digit (0 to 9) it shows, and the goal is to do this with a deep neural network.

In [1]:
import numpy as np
import tensorflow as tf

import tensorflow_datasets as tfds

In [2]:
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

In [3]:
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']
# TFDS only provides 'train' and 'test' splits for MNIST, so we carve a validation set out of the training data ourselves.

num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)  # tf.cast converts the float result to an integer count, which take() and skip() need

# Let's also get easier access to the number of test samples
num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)
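As a quick sanity check on these sizes, here is the same arithmetic in plain Python. It assumes the standard MNIST split of 60,000 training and 10,000 test images (the notebook reads these from `mnist_info` rather than hard-coding them) and the `BATCH_SIZE` of 100 used further down.

```python
# Standard MNIST split sizes (assumed here; the notebook reads them from mnist_info)
NUM_TRAIN_EXAMPLES = 60_000
NUM_TEST_EXAMPLES = 10_000
BATCH_SIZE = 100  # same batch size the notebook uses later

# Hold out 10% of the training split for validation
num_validation_samples = NUM_TRAIN_EXAMPLES // 10
num_train_samples = NUM_TRAIN_EXAMPLES - num_validation_samples

# Mini-batches per epoch (ceiling division would also cover a partial final batch)
steps_per_epoch = -(-num_train_samples // BATCH_SIZE)

print(num_validation_samples, num_train_samples, steps_per_epoch)
```

The 540 steps per epoch line up with the `540/540` counter in the training logs.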

## Preprocessing
We know that the MNIST dataset consists of 28x28 grayscale images in which each pixel is an integer intensity from 0 to 255, with 0 meaning black (the background) and 255 meaning white (the pen strokes). Before applying any machine learning model, we rescale these values to the range [0, 1] so that all inputs share a small, consistent scale.

In [4]:
def scale(image, label):
    image = tf.cast(image, tf.float32)  # Make sure the image is a float
    image = image / 255.0
    return image, label
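To see what `scale` does to raw pixel values, here is the same arithmetic in plain NumPy on a made-up 2x2 "image" (the values are illustrative, not taken from MNIST). Casting first means the division happens in floating point, so 0 maps to 0.0 and 255 maps to 1.0.

```python
import numpy as np

# A tiny illustrative image with MNIST's dtype (uint8, values 0-255)
raw_image = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Same steps as scale(): cast to float32 first, then divide by 255
scaled_image = raw_image.astype(np.float32) / 255.0

print(scaled_image.dtype, scaled_image.min(), scaled_image.max())
```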

In [5]:
scaled_train_and_validation_data = mnist_train.map(scale)   # map() applies scale() to every (image, label) pair
scaled_test_data = mnist_test.map(scale)

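Next, we shuffle the data so that each batch is a representative random sample rather than following whatever order the data happens to be stored in, and then batch it so that the weights are updated after every small group of examples instead of once per pass over the whole dataset.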
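`Dataset.shuffle(buffer_size)` does not shuffle the whole dataset at once. Per the TensorFlow documentation, it fills a buffer with `buffer_size` elements, samples one at random, and replaces it with the next incoming element. Below is a minimal pure-Python sketch of that idea; `buffered_shuffle` is a hypothetical helper that only approximates TensorFlow's implementation.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Hypothetical approximation of tf.data's buffered shuffle."""
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Buffer is full: emit a random element; the next input refills the freed slot
            index = rng.randrange(buffer_size)
            buffer[index], buffer[-1] = buffer[-1], buffer[index]
            yield buffer.pop()
    # Input exhausted: drain whatever is left in random order
    rng.shuffle(buffer)
    yield from buffer


order = list(buffered_shuffle(range(20), buffer_size=5))
print(order)
```

Each output position can only draw from a window of the next `buffer_size` inputs, so with `BUFFER_SIZE = 10000` and 60,000 images this is a partial shuffle; the documentation notes that a buffer at least as large as the dataset is needed for a perfect shuffle.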

In [6]:
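BUFFER_SIZE = 10000  # shuffle() keeps a buffer of 10,000 examples and draws each output at random from it
# A buffer smaller than the dataset saves memory, at the cost of a less thorough shuffle.

# reshuffle_each_iteration=False keeps the same order every epoch, so the take/skip split below stays fixed.
# With the default (True), the split is re-drawn each epoch and validation images leak into training.
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE, reshuffle_each_iteration=False)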

# Let's also split the training and validation sets
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)
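One subtlety with shuffling before `take`/`skip`: by default (`reshuffle_each_iteration=True`), `shuffle` produces a new order on every pass, so each epoch draws a different validation/training boundary, while `validation_inputs` is captured only once. The pure-Python sketch below shows the effect on a made-up 100-item dataset; a full `random.shuffle` stands in for the buffered shuffle, and `split` is a hypothetical helper mimicking `take(n)`/`skip(n)`.

```python
import random

DATASET = list(range(100))   # stand-in for the 60,000 training images
NUM_VALIDATION = 10          # stand-in for num_validation_samples


def split(order):
    """Mimic dataset.take(n) / dataset.skip(n) on an already-shuffled order."""
    return set(order[:NUM_VALIDATION]), set(order[NUM_VALIDATION:])


rng = random.Random(42)

# Epoch 1: the validation set is captured once, like next(iter(validation_data))
first_order = DATASET[:]
rng.shuffle(first_order)
validation_set, _ = split(first_order)

# Fixed shuffle: a later epoch reuses the same order, so the split is unchanged
_, train_fixed = split(first_order)

# Reshuffled shuffle: a later epoch sees a fresh order and a different split
second_order = DATASET[:]
rng.shuffle(second_order)
_, train_reshuffled = split(second_order)

# Number of validation items that also appear in that epoch's training data
print(len(validation_set & train_fixed), len(validation_set & train_reshuffled))
```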

In [7]:
BATCH_SIZE = 100

train_data = train_data.batch(BATCH_SIZE)
validation_data = validation_data.batch(num_validation_samples)  # Technically, this creates a single batch containing the entire validation set.
# We still need it: batching adds a leading batch dimension, giving the (batch, 28, 28, 1) shape the model expects during forward propagation
scaled_test_data = scaled_test_data.batch(num_test_samples)

validation_inputs, validation_targets = next(iter(validation_data))  # iter() creates an iterator over the batches and next() returns the first one
# Since there is only one batch, this gives us all the validation inputs and targets at once

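To make the batch dimension concrete, here is a NumPy sketch of what batching does to a stream of examples. `batch` is a hypothetical stand-in for `Dataset.batch`, and the 250 all-zero "images" are placeholders for real MNIST examples.

```python
import numpy as np


def batch(examples, batch_size):
    """Hypothetical stand-in for Dataset.batch: stack consecutive examples along a new axis 0."""
    for start in range(0, len(examples), batch_size):
        yield np.stack(examples[start:start + batch_size])


# 250 fake 28x28x1 images standing in for real MNIST examples
images = [np.zeros((28, 28, 1), dtype=np.float32) for _ in range(250)]

# Batches of 100: the last batch keeps the 50 leftover examples
train_batches = list(batch(images, 100))
print([b.shape for b in train_batches])

# Batching everything at once yields a single batch, which next(iter(...)) pulls out
validation_batch = next(iter(batch(images, len(images))))
print(validation_batch.shape)
```

Like `Dataset.batch` with its default `drop_remainder=False`, the sketch keeps the final partial batch.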


## Model
Now that we have preprocessed the data by:
1. Scaling all pixel values to the range [0, 1] by dividing them by 255,
2. Shuffling the data and splitting off a validation set, and
3. Batching the training, validation and test sets,

we can move on to actually building the model.

In [8]:
input_size = 784 # Total number of pixels in the image (our image is 28x28)
output_size = 10    # There are 10 digits to choose from (or classify into)
hidden_layer_size = 50  # For simplicity, every hidden layer has the same width

model = tf.keras.Sequential([
    # Each input is a 28x28x1 tensor, which Dense layers cannot take directly,
    # so we flatten it into a single vector of 784 values
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),  # Computes relu(dot(inputs, weights) + bias)
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),  # A second hidden layer stacked on the first

    tf.keras.layers.Dense(output_size, activation='softmax'),  # Our final layer (output)
    # Softmax turns the 10 outputs into probabilities that sum to 1
])
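To illustrate what this stack computes, here is a NumPy sketch of one forward pass through the same architecture (Flatten, two 50-unit ReLU layers, a 10-unit softmax layer). The random weights, zero biases and random input images are placeholders; Keras initializes and learns its own weights, so this mirrors only the shapes and operations, not a trained model. The helper names `dense`, `relu` and `softmax` are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_layer_size, output_size = 784, 50, 10


def dense(x, weights, bias, activation):
    """What a Dense layer computes: activation(x @ weights + bias)."""
    return activation(x @ weights + bias)


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    # Subtract the row max for numerical stability; each row then sums to 1
    exp = np.exp(z - z.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)


# Placeholder weights (small random values) and zero biases for each Dense layer
layer_sizes = [input_size, hidden_layer_size, hidden_layer_size, output_size]
params = [
    (rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

fake_images = rng.random((4, 28, 28, 1))          # 4 fake images with pixels in [0, 1)
x = fake_images.reshape(len(fake_images), -1)     # Flatten: (4, 28, 28, 1) -> (4, 784)
x = dense(x, *params[0], relu)
x = dense(x, *params[1], relu)
probabilities = dense(x, *params[2], softmax)

print(probabilities.shape)  # one probability per digit, for each image
```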


In [9]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
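# sparse_categorical_crossentropy accepts integer labels (0-9) directly, so we do not need to one-hot encode them ourselves
# metrics lists extra quantities to report during training and evaluation (the loss is always reported)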
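To check what "sparse" means here: `sparse_categorical_crossentropy` takes an integer label such as `7` and gives the same value as ordinary categorical cross-entropy computed against the one-hot version of that label. A small NumPy sketch with a made-up probability vector:

```python
import numpy as np

# Made-up softmax output for one image (10 digit probabilities summing to 1)
probabilities = np.array([0.01, 0.01, 0.02, 0.05, 0.01, 0.02, 0.03, 0.80, 0.03, 0.02])
label = 7  # integer target, as stored in the MNIST dataset

# Sparse form: negative log of the probability at the label's index
sparse_loss = -np.log(probabilities[label])

# Categorical form: one-hot encode the label, then sum -y * log(p) over all classes
one_hot = np.eye(10)[label]
categorical_loss = -np.sum(one_hot * np.log(probabilities))

print(sparse_loss, categorical_loss)
```

Both come out to -ln(0.8), about 0.223; the sparse version simply skips building the one-hot vector.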

## Training the model

In [10]:
EPOCH_NUM = 5

model.fit(train_data, epochs=EPOCH_NUM, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/5
540/540 - 1s - 3ms/step - accuracy: 0.8839 - loss: 0.4134 - val_accuracy: 0.9377 - val_loss: 0.2194
Epoch 2/5
540/540 - 1s - 2ms/step - accuracy: 0.9464 - loss: 0.1848 - val_accuracy: 0.9503 - val_loss: 0.1674
Epoch 3/5
540/540 - 1s - 2ms/step - accuracy: 0.9588 - loss: 0.1397 - val_accuracy: 0.9613 - val_loss: 0.1262
Epoch 4/5
540/540 - 1s - 2ms/step - accuracy: 0.9662 - loss: 0.1158 - val_accuracy: 0.9655 - val_loss: 0.1132
Epoch 5/5
540/540 - 1s - 2ms/step - accuracy: 0.9714 - loss: 0.0966 - val_accuracy: 0.9713 - val_loss: 0.0973



As we can see, our model reaches about 97% validation accuracy after five epochs! This is decent, but we may be able to improve it by tuning hyperparameters such as the number and width of the hidden layers, the learning rate and the number of epochs.
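Before adding layers, it helps to see how quickly the number of trainable parameters grows. This pure-Python sketch compares the two-layer, 50-unit network above with the ten-layer, 200-unit network built in the next cell. `dense_param_count` is a hypothetical helper; `Flatten` has no parameters, so only the Dense layers count, each contributing `inputs * units` weights plus `units` biases.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers.

    layer_sizes lists the input size followed by each Dense layer's width.
    """
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


INPUT_SIZE, OUTPUT_SIZE = 784, 10

baseline = dense_param_count([INPUT_SIZE] + [50] * 2 + [OUTPUT_SIZE])
deeper = dense_param_count([INPUT_SIZE] + [200] * 10 + [OUTPUT_SIZE])

print(baseline, deeper)
```

That is 42,310 versus 520,810 parameters, roughly 12 times as many, which gives the deeper network more capacity but also more room to overfit.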

## Improving the model

In [16]:
hidden_layer_size = 200

improved_model = tf.keras.Sequential([
    # Each input is a 28x28x1 tensor, which Dense layers cannot take directly,
    # so we flatten it into a single vector of 784 values
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),  # Computes relu(dot(inputs, weights) + bias)
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),  # Stacking further hidden layers of the same width
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),

    tf.keras.layers.Dense(output_size, activation='softmax'),  # Our final layer (output)
    # Softmax turns the 10 outputs into probabilities that sum to 1
])

custom_optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)

improved_model.compile(optimizer=custom_optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

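EPOCH_NUM = 10
# Train the deeper network. The log below came from an earlier version of this line that called model.fit,
# so it shows the first model training for 10 more epochs (epoch 1 already starts at ~97% accuracy).
improved_model.fit(train_data, epochs=EPOCH_NUM, validation_data=(validation_inputs, validation_targets), verbose=2)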

Epoch 1/10
540/540 - 1s - 2ms/step - accuracy: 0.9713 - loss: 0.1219 - val_accuracy: 0.9702 - val_loss: 0.1542
Epoch 2/10
540/540 - 1s - 2ms/step - accuracy: 0.9751 - loss: 0.1034 - val_accuracy: 0.9730 - val_loss: 0.0993
Epoch 3/10
540/540 - 1s - 2ms/step - accuracy: 0.9746 - loss: 0.1048 - val_accuracy: 0.9695 - val_loss: 0.1172
Epoch 4/10
540/540 - 1s - 2ms/step - accuracy: 0.9743 - loss: 0.1106 - val_accuracy: 0.9680 - val_loss: 0.1344
Epoch 5/10
540/540 - 1s - 2ms/step - accuracy: 0.9747 - loss: 0.1096 - val_accuracy: 0.9717 - val_loss: 0.1147
Epoch 6/10
540/540 - 1s - 2ms/step - accuracy: 0.9761 - loss: 0.1001 - val_accuracy: 0.9775 - val_loss: 0.0914
Epoch 7/10
540/540 - 1s - 2ms/step - accuracy: 0.9756 - loss: 0.1054 - val_accuracy: 0.9703 - val_loss: 0.1326
Epoch 8/10
540/540 - 1s - 2ms/step - accuracy: 0.9752 - loss: 0.1106 - val_accuracy: 0.9690 - val_loss: 0.1173
Epoch 9/10
540/540 - 1s - 2ms/step - accuracy: 0.9766 - loss: 0.0950 - val_accuracy: 0.9742 - val_loss: 0.1048
Epoch 10/10



I do not think I can get much better accuracy than this without rebuilding the model from scratch: validation accuracy hovers between roughly 96.8% and 97.8% across these epochs. Looking at the TensorFlow documentation, 97% already seems like a pretty good accuracy, as the solution provided there only reaches 92%.

## Test the Model

In [17]:
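# This evaluates the first network, `model`; call improved_model.evaluate(scaled_test_data) to test the deeper one
test_loss, test_accuracy = model.evaluate(scaled_test_data)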
print(test_loss, test_accuracy)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 182ms/step - accuracy: 0.9580 - loss: 0.2754
0.2754209041595459 0.9580000042915344


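OK, so the test accuracy is actually 95.8%, about a percentage point below the ~97% validation accuracy, and the test loss (0.275) is well above the final validation losses (around 0.1). Part of the gap is probably that I tuned the model while watching the validation set. Another likely contributor is that `shuffle` re-draws the `take`/`skip` split every epoch by default (`reshuffle_each_iteration=True`), so some validation images also ended up in the training data, making the validation numbers optimistic. Still, a decent model.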
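For reference, the `accuracy` metric is the fraction of images whose highest-probability class matches the integer label; as far as I can tell, with integer labels and a sparse loss Keras resolves `'accuracy'` to sparse categorical accuracy. A NumPy sketch with three made-up predictions, plus what 95.8% means on the 10,000-image test set:

```python
import numpy as np

# Made-up softmax outputs for three images (rows) over 10 digits (columns)
probabilities = np.full((3, 10), 0.02)
probabilities[0, 3] = 0.82  # predicts 3
probabilities[1, 1] = 0.82  # predicts 1
probabilities[2, 5] = 0.82  # predicts 5
labels = np.array([3, 1, 8])  # the last prediction is wrong

# Accuracy: compare the argmax of each row with the integer label
predicted = probabilities.argmax(axis=1)
accuracy = np.mean(predicted == labels)

# Test-set arithmetic: 95.8% of the 10,000 MNIST test images
correct_test_images = round(0.958 * 10_000)

print(predicted, accuracy, correct_test_images)
```

So roughly 9,580 test digits are classified correctly and about 420 are not.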

And with that, this concludes my experiments with the MNIST dataset! Thanks for reading along while I bumbled through all this new syntax and theory. With such deep possibilities, I will most likely keep playing around with deep neural networks!