# Detecting handwritten numbers using TensorFlow
In this project, I will create a deep learning model using TensorFlow to detect handwritten numbers. The model will be trained, validated, and tested using the MNIST dataset, which is a large collection of handwritten digits widely used for training various image processing systems.

In [9]:
import tensorflow as tf
import numpy as np
import tensorflow_datasets as tfds

## Step 1: Data Preparation
The MNIST dataset consists of 60,000 training images and 10,000 testing images of handwritten digits from 0 to 9. Each image is a 28x28 grayscale image. The dataset is already labeled, which means each image comes with the correct digit label.

In [60]:
mnist_dataset, mnist_info = tfds.load(
    name='mnist',
    split=['train', 'test'],
    as_supervised=True,
    with_info=True,
)

mnist_train, mnist_test = mnist_dataset[0], mnist_dataset[1]

# Setting the validation and test sample sizes
num_validation_samples = int(0.1 * mnist_info.splits["train"].num_examples)
num_test_samples = int(mnist_info.splits["test"].num_examples)

# Scaling inputs to be between 0 and 1
def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.
    return image, label

scaled_train_and_validation_data = mnist_train.map(scale)
scaled_test_data = mnist_test.map(scale)

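# Shuffling the data once. reshuffle_each_iteration=False keeps the take/skip split below
# fixed, so validation samples cannot end up in the training set on later epochs.
BUFFER_SIZE = 10000
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE, reshuffle_each_iteration=False)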

# Assigning the validation and training datasets
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)

# Setting the batch size for the training data and updating the training, validation and test datasets
BATCH_SIZE = 200
train_data = train_data.batch(BATCH_SIZE)
validation_data = validation_data.batch(num_validation_samples)
test_data = scaled_test_data.batch(num_test_samples)

validation_inputs, validation_targets = next(iter(validation_data))
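
The sizes implied by this split can be checked with a few lines of plain Python. This is a minimal sketch that assumes the standard MNIST counts quoted in the dataset description (60,000 training and 10,000 test images). The variable names are illustrative and are not part of the notebook.

```python
import math

# Assumed MNIST split sizes, taken from the dataset description above
num_train_and_validation = 60_000
num_test = 10_000

validation_fraction = 0.1  # same 10% fraction as the notebook
batch_size = 200           # same batch size as the notebook

# 10% of the original training split is held out for validation
num_validation = int(validation_fraction * num_train_and_validation)
num_train = num_train_and_validation - num_validation

# Number of mini-batches the model processes in one epoch
steps_per_epoch = math.ceil(num_train / batch_size)

print(num_validation, num_train, steps_per_epoch)
```

Under these assumptions the training set holds 54,000 images, which gives the 270 steps per epoch reported in the training log.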

## Step 2: Building the Model
I will use TensorFlow, an open-source deep learning framework, to build the neural network model. The architecture of the model will include:

Input Layer: This layer will accept the 28x28 pixel values of the images.
Hidden Layers: Three dense (fully connected) layers will be used, the first with a ReLU activation function and the other two with tanh. These layers let the model learn non-linear patterns in the data.
Output Layer: The final layer will have 10 neurons with a softmax activation function, representing the probability distribution of the 10 digit classes (0-9).
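
To make the three activation functions named above concrete, here is a minimal sketch in plain Python. It is only an illustration of the maths, not how TensorFlow implements them, and the example input values are made up.

```python
import math

def relu(x):
    """Rectified Linear Unit: passes positive values, zeroes out negatives."""
    return max(0.0, x)

def tanh(x):
    """Hyperbolic tangent: squashes any real value into the range (-1, 1)."""
    return math.tanh(x)

def softmax(logits):
    """Turns a list of raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for a 3-class problem
probs = softmax([2.0, 1.0, 0.1])
print(relu(-3.0), relu(2.5), tanh(0.0), probs)
```

The largest raw score always receives the largest probability, which is why the predicted digit is simply the index of the highest softmax output.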

In [83]:
input_size = 784
output_size = 10
hidden_layer_size = 200

# The model has 3 hidden layers and an output layer. The first hidden layer uses the Rectified Linear Unit (ReLU) activation and the other two use the hyperbolic tangent (tanh); the output layer uses the softmax activation function to produce a probability for each digit.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28,1)),
    tf.keras.layers.Dense(hidden_layer_size, activation="relu"),
    tf.keras.layers.Dense(hidden_layer_size, activation="tanh"),
    tf.keras.layers.Dense(hidden_layer_size, activation="tanh"),
    tf.keras.layers.Dense(output_size, activation="softmax")
])
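
As a rough sanity check on the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes each dense layer has one weight per input-output pair plus one bias per output unit, and that the flatten layer has no parameters. The helper name `dense_params` is hypothetical.

```python
# Layer widths: flattened 28x28 input, three hidden layers, 10-class output
layer_sizes = [784, 200, 200, 200, 10]

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of one dense layer."""
    return n_in * n_out + n_out

params_per_layer = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer, total_params)
```

Most of the parameters sit in the first dense layer, because it connects all 784 input pixels to 200 hidden units.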

## Step 3: Training and Validating the Model
The model will be trained on the MNIST training dataset. The process involves:

Loss Function: Sparse categorical crossentropy will be used as the loss function. It measures the difference between the predicted class probabilities and the actual integer labels.
Optimiser: The Adam optimiser will be used; it adapts the step size for each weight during training, which typically speeds up convergence.
Metrics: Accuracy will be the primary metric for evaluating the model's performance.
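
The loss described above can be illustrated for a single sample in plain Python. This is a simplified sketch: it omits the numerical safeguards a library implementation would add, and the 4-class probabilities are made-up values. It shows that the sparse form (integer label) gives the same result as the one-hot form.

```python
import math

def sparse_categorical_crossentropy(probs, label):
    """Negative log of the probability assigned to the true class (integer label)."""
    return -math.log(probs[label])

def categorical_crossentropy(probs, one_hot):
    """Same loss, but with the target given as a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

# Hypothetical softmax output for a 4-class problem; the true class is 2
probs = [0.05, 0.05, 0.80, 0.10]
label = 2
one_hot = [0, 0, 1, 0]

loss_sparse = sparse_categorical_crossentropy(probs, label)
loss_one_hot = categorical_crossentropy(probs, one_hot)

print(loss_sparse, loss_one_hot)
```

The loss shrinks towards 0 as the probability assigned to the correct class approaches 1, and grows quickly when the model is confidently wrong.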

To ensure the model is not overfitting and generalises well to unseen data, I will validate the model using a portion of the training dataset as a validation set. This will help in tuning hyperparameters and improving the model's performance.

In [84]:
# To optimise the model I'll use the Adaptive Moment Estimation (Adam) optimiser, as it combines the benefits of RMSprop and Stochastic Gradient Descent with momentum.
# The loss function is sparse categorical crossentropy, which takes the integer labels directly, so the targets do not need to be one-hot encoded to match the softmax output.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
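
To show what "combining momentum with RMSprop-style scaling" means in practice, here is a minimal sketch of the textbook Adam update rule for a single parameter. It is an illustration only and not Keras's exact implementation; the function name `adam_minimise` and the toy objective f(x) = x² are assumptions made for this example.

```python
import math

def adam_minimise(grad_fn, x, steps, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """Runs `steps` Adam updates on a single scalar parameter x."""
    m = 0.0  # running mean of gradients (the momentum part)
    v = 0.0  # running mean of squared gradients (the RMSprop part)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction compensates for m and v starting at zero
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

def grad(x):
    """Gradient of the toy objective f(x) = x**2."""
    return 2 * x

x_after_one_step = adam_minimise(grad, 1.0, steps=1, lr=0.1)
x_after_many_steps = adam_minimise(grad, 1.0, steps=200, lr=0.1)

print(x_after_one_step, x_after_many_steps)
```

Note that the first step has a size close to the learning rate regardless of the gradient's magnitude, because the gradient is divided by the square root of its own running square.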

In [85]:
EPOCHS_NUM = 5
# To train the model, I'll set the parameters as below. The training cycles (epochs) are set to 5 epochs. I'll also add the validation data and targets to validate the model.
model.fit(train_data, epochs=EPOCHS_NUM, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/5
270/270 - 3s - 11ms/step - accuracy: 0.9145 - loss: 0.2871 - val_accuracy: 0.9605 - val_loss: 0.1373
Epoch 2/5
270/270 - 2s - 9ms/step - accuracy: 0.9681 - loss: 0.1042 - val_accuracy: 0.9755 - val_loss: 0.0894
Epoch 3/5
270/270 - 2s - 7ms/step - accuracy: 0.9786 - loss: 0.0689 - val_accuracy: 0.9800 - val_loss: 0.0682
Epoch 4/5
270/270 - 2s - 8ms/step - accuracy: 0.9837 - loss: 0.0511 - val_accuracy: 0.9823 - val_loss: 0.0610
Epoch 5/5
270/270 - 2s - 8ms/step - accuracy: 0.9881 - loss: 0.0369 - val_accuracy: 0.9872 - val_loss: 0.0454



## Step 4: Testing the Model
Finally, the model will be tested on the MNIST test dataset to evaluate its accuracy and robustness. The performance metrics obtained from the test data will give an indication of how well the model can recognize handwritten digits in real-world scenarios.

In [86]:
model.evaluate(test_data)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 118ms/step - accuracy: 0.9766 - loss: 0.0793


[0.07927960157394409, 0.9765999913215637]
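
The accuracy figure reported above is the fraction of images whose highest-probability class matches the true label. The sketch below shows that calculation in plain Python on a made-up 3-class example; the helper names are hypothetical. It also converts the reported test accuracy into a count of correctly classified images, assuming the 10,000 test images stated in the dataset description.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predicted_probs, labels):
    """Fraction of samples whose most probable class equals the true label."""
    correct = sum(1 for p, y in zip(predicted_probs, labels) if argmax(p) == y)
    return correct / len(labels)

# Hypothetical softmax outputs for four samples of a 3-class problem
preds = [
    [0.7, 0.2, 0.1],  # predicts class 0
    [0.1, 0.8, 0.1],  # predicts class 1
    [0.3, 0.3, 0.4],  # predicts class 2
    [0.5, 0.4, 0.1],  # predicts class 0
]
labels = [0, 1, 2, 1]  # the last sample is misclassified

acc = accuracy(preds, labels)

# Reported test accuracy converted into a number of correct test images
correct_on_test = round(0.9765999913215637 * 10_000)

print(acc, correct_on_test)
```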

## Conclusion
The model's accuracy on the test dataset is 97.66%, compared with 98.81% and 98.72% on the training and validation datasets respectively, a gap of roughly one percentage point. The test loss is 0.0793, compared with 0.0369 and 0.0454 for the training and validation datasets respectively. Although the test loss is about twice the training loss, the accuracy figures remain close, which suggests that the model has not overfitted substantially during the training and validation process.
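
The comparison in this conclusion can be reproduced with simple arithmetic. The sketch below uses the final-epoch figures from the training log and the rounded test figures from the evaluation output; the variable names are illustrative.

```python
# Final-epoch training and validation figures, plus the test figures
train_acc, val_acc, test_acc = 0.9881, 0.9872, 0.9766
train_loss, val_loss, test_loss = 0.0369, 0.0454, 0.0793

# Accuracy gaps expressed in percentage points
train_test_gap_pp = (train_acc - test_acc) * 100
val_test_gap_pp = (val_acc - test_acc) * 100

# How much larger the test loss is than the training loss
loss_ratio = test_loss / train_loss

print(round(train_test_gap_pp, 2), round(val_test_gap_pp, 2), round(loss_ratio, 2))
```

A small accuracy gap alongside a larger relative loss gap is a common pattern: the loss also penalises low confidence on correctly classified images, so it is more sensitive than accuracy.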
This concludes the development of this model.