<b>The MNIST dataset consists of a collection of 28x28 pixel grayscale images of handwritten digits (0 through 9). Each image is labeled with the corresponding digit it represents. For example, an image might depict the handwritten digit "7", and its corresponding label would be "7".

<font color  = "red"><b>Problem Statement:

Given a set of these handwritten digit images and their corresponding labels, the task is to develop a machine learning or deep learning model that can accurately classify unseen handwritten digits into their respective categories (0 through 9). The goal is to train a model that generalizes well, classifying digits it has not seen before with high accuracy.

In [1]:
# Import necessary libraries for numerical operations
import numpy as np

# Import TensorFlow and Keras for deep learning functionalities
import tensorflow as tf
import tensorflow.keras as keras

# Import specific modules from Keras for defining neural network layers and models
from tensorflow.keras import layers, models

# Import function for splitting data into training and testing sets
from sklearn.model_selection import train_test_split

# Import OneHotEncoder for encoding categorical labels into one-hot vectors
from sklearn.preprocessing import OneHotEncoder


In [2]:
# Load the MNIST dataset from the Keras datasets module
mnist = tf.keras.datasets.mnist

# Load the training and testing data along with their respective labels
# The dataset is divided into training and testing sets, where x_train and x_test contain the images,
# and y_train and y_test contain the corresponding labels
(x_train, y_train), (x_test, y_test) = mnist.load_data()


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


<b> Let us check the dataset

In [3]:
x_train

array([[[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       ...,

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 

In [4]:
x_test

array([[[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       ...,

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 

<b> <font color="black">Checking the shape of the data

In [5]:
x_train.shape

(60000, 28, 28)

In [6]:
x_test.shape

(10000, 28, 28)

In [7]:
y_train.shape

(60000,)

In [8]:
y_test.shape

(10000,)

In [9]:
# Normalize the training and testing images by dividing each pixel value by 255.0
# This scales the pixel values to be in the range between 0 and 1, which is beneficial for training neural networks
x_train = x_train / 255.0
x_test = x_test / 255.0
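
As a quick sanity check on the scaling step, the sketch below applies the same division to a tiny made-up array. It assumes the pixels are stored as `uint8` values in the range 0 to 255; the name `fake_images` is hypothetical and only stands in for `x_train`.

```python
import numpy as np

# Hypothetical stand-in for a few MNIST pixels: uint8 values in [0, 255]
fake_images = np.array([[0, 128, 255]], dtype=np.uint8)

# Same operation as in the cell above: divide every pixel by 255.0
scaled = fake_images / 255.0

print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
```

Dividing an integer array by a Python float yields `float64`. If memory matters, casting to `float32` before dividing is a common alternative, though the notebook does not need it.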

In [10]:
# Reshape the training and testing images from 28x28 arrays to flattened vectors
# This flattening process converts each 28x28 image into a 1-dimensional array of length 784 (28*28)
# It reshapes the data to be compatible with the input layer of a neural network
x_train_flat = x_train.reshape(x_train.shape[0], -1)
x_test_flat = x_test.reshape(x_test.shape[0], -1)

In [11]:
# Print shapes of train and test data
print(x_train_flat.shape)
print(x_test_flat.shape)

(60000, 784)
(10000, 784)
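
To make the flattening step concrete, here is a minimal sketch on a made-up batch of two 28x28 arrays (the name `images` is hypothetical). It shows where each pixel ends up in the 784-long vector and that the reshape can be undone.

```python
import numpy as np

# Hypothetical mini-batch: 2 "images" of 28x28 with distinct values
images = np.arange(2 * 28 * 28).reshape(2, 28, 28)

# Same call as above: keep the sample axis, merge the remaining axes
flat = images.reshape(images.shape[0], -1)
print(flat.shape)  # (2, 784)

# Row-major order: pixel (row r, column c) lands at index r * 28 + c
r, c = 3, 5
assert flat[1, r * 28 + c] == images[1, r, c]

# No information is lost; reshaping back restores the original images
restored = flat.reshape(-1, 28, 28)
assert np.array_equal(restored, images)
```

The values survive flattening, but a dense layer treats the 784 inputs as an unordered vector, so the 2D neighbourhood of each pixel is no longer visible to the model.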


In [13]:
# Perform one-hot encoding on the training and testing labels
# One-hot encoding converts categorical labels (digits 0 through 9) into binary vectors
# Each binary vector has a length equal to the number of classes (10 in this case), with a 1 at the index corresponding to the class and 0s elsewhere
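# This format is what the 'categorical_crossentropy' loss expects: one target value per output unit of a 10-way softmax layer
# (Integer labels would also work if the loss were 'sparse_categorical_crossentropy' instead)
# Note: the sparse_output argument requires scikit-learn 1.2 or later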
onehot_encoder = OneHotEncoder(sparse_output=False)
y_train_onehot = onehot_encoder.fit_transform(y_train.reshape(-1, 1))
y_test_onehot = onehot_encoder.transform(y_test.reshape(-1, 1))

In [14]:
# Print the results
print("One-hot encoded y_train shape:", y_train_onehot.shape)
print("One-hot encoded y_test shape:", y_test_onehot.shape)

One-hot encoded y_train shape: (60000, 10)
One-hot encoded y_test shape: (10000, 10)
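
The encoding itself is simple enough to sketch with NumPy alone. The example below is an illustration rather than what `OneHotEncoder` does internally. It assumes the labels are integers from 0 to 9, and the `labels` array is made up.

```python
import numpy as np

labels = np.array([5, 0, 4, 1, 9])  # hypothetical integer labels
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
onehot = np.eye(num_classes)[labels]
print(onehot.shape)  # (5, 10)
print(onehot[0])     # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]

# argmax along the class axis recovers the original integer labels
recovered = onehot.argmax(axis=1)
assert np.array_equal(recovered, labels)
```

Unlike this sketch, `OneHotEncoder` learns its categories from the data passed to `fit_transform`, which is why the test labels are passed through `transform` only.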


In [15]:
# Define the architecture of the Artificial Neural Network (ANN) model using the Sequential API
# The Sequential API allows you to create a linear stack of layers for building the neural network model

model = models.Sequential([
    # Add a fully connected (dense) layer with 128 units
    # The 'relu' activation function is used to introduce non-linearity into the network
    # The input shape is specified as (784,), corresponding to the flattened input images
    layers.Dense(128, activation='relu', input_shape=(784,)),

    # Add a dropout layer with a dropout rate of 0.2
    # Dropout is a regularization technique used to prevent overfitting by randomly dropping a fraction of the input units during training
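    # A dropout rate of 0.2 means that 20% of the input units will be randomly set to 0 during each training iteration
    # The remaining units are scaled up by 1 / (1 - 0.2) so the expected sum is unchanged; dropout is inactive at inference time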
    layers.Dropout(0.2),

    # Add another fully connected (dense) layer with 10 units
    # The 'softmax' activation function is used to convert the raw outputs into probability scores for each class
    # Softmax ensures that the output probabilities sum up to 1, making it suitable for multi-class classification problems
    layers.Dense(10, activation='softmax')
])
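
For intuition about what this architecture computes, the following NumPy sketch reproduces the inference-time forward pass, where dropout does nothing, and counts the trainable parameters. The weights are random placeholders, not the values Keras would learn, and the names `W1`, `b1`, `W2`, `b2` and `forward` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random weights with the same shapes as the two Dense layers
# (placeholder values; a trained Keras model holds different numbers)
W1, b1 = rng.normal(scale=0.05, size=(784, 128)), np.zeros(128)
W2, b2 = rng.normal(scale=0.05, size=(128, 10)), np.zeros(10)

def forward(x):
    """Inference-time forward pass: Dense + ReLU, then Dense + softmax."""
    hidden = np.maximum(0.0, x @ W1 + b1)               # ReLU
    logits = hidden @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)          # softmax

x = rng.random((4, 784))  # 4 fake flattened images with values in [0, 1)
probs = forward(x)
print(probs.shape)        # (4, 10)

# Parameter count: (784 * 128 + 128) + (128 * 10 + 10)
n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)           # 101770
```

Each row of `probs` sums to 1, which is the property that lets the softmax output be read as a probability distribution over the ten digits.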


In [19]:
# Compile the neural network model with specified optimizer, loss function, and evaluation metrics

# Optimizer: 'adam'
# Adam is an adaptive learning rate optimization algorithm that combines the benefits of both AdaGrad and RMSProp
# It dynamically adjusts the learning rate during training, making it well-suited for a wide range of problems
# The 'adam' optimizer is commonly used for training neural networks due to its efficiency and effectiveness

# Loss Function: 'categorical_crossentropy'
# Categorical Crossentropy is a common loss function used for multi-class classification problems with one-hot encoded labels
# It calculates the difference between the predicted probability distribution and the true probability distribution
# The goal is to minimize this difference, effectively training the model to predict the correct class with high confidence

# Metrics: ["accuracy"]
# The accuracy metric is used to evaluate the performance of the model during training and testing
# It measures the proportion of correctly classified samples over the total number of samples
# Maximizing accuracy is typically the primary goal when training classification models

model.compile(optimizer="adam",   # Specify the optimizer as 'adam'
              loss='categorical_crossentropy',   # Use 'categorical_crossentropy' as the loss function
              metrics=["accuracy"])   # Evaluate the model's performance using accuracy as the metric
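
The loss described in the comments above can be written out in a few lines. This is a minimal NumPy sketch of categorical crossentropy averaged over samples, not the Keras implementation. The function name, the clipping constant `eps`, and the example arrays are all assumptions made for illustration.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Two samples, three classes (made-up numbers)
y_true = np.array([[0, 0, 1], [1, 0, 0]], dtype=float)
confident = np.array([[0.05, 0.05, 0.90], [0.80, 0.10, 0.10]])
unsure = np.array([[0.30, 0.30, 0.40], [0.40, 0.30, 0.30]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident)  # about 0.164
print(loss_unsure)     # about 0.916
```

Both sets of predictions pick the right class, so their accuracy is identical, yet the loss is much lower when the correct class receives a high probability. This is why loss and accuracy are reported side by side.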


In [21]:
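# Split the training data into training and validation sets (80% / 20%)
# The original test set (x_test_flat, y_test_onehot) is kept aside for the final evaluation
x_train_split, x_val, y_train_split, y_val = train_test_split(x_train_flat, y_train_onehot, test_size=0.2, random_state=5)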
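
The idea behind this hold-out split can be sketched with a shuffled index array. This illustrates the principle only: it will not reproduce the exact rows that `train_test_split` selects with `random_state=5`, and the variable names are hypothetical.

```python
import numpy as np

n_samples = 60000
test_size = 0.2

rng = np.random.default_rng(5)        # seeded so the shuffle is repeatable
indices = rng.permutation(n_samples)  # shuffled sample indices

n_val = int(round(n_samples * test_size))
val_idx, train_idx = indices[:n_val], indices[n_val:]

print(len(train_idx), len(val_idx))   # 48000 12000
```

Indexing both the images and the labels with the same index array keeps each image paired with its label, which `train_test_split` handles automatically when given several arrays.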

In [22]:
# Train the neural network model on the training data

# Training Data: (x_train_split, y_train_split)
# x_train_split: Training images after splitting from the original training data
# y_train_split: Corresponding one-hot encoded labels for the training images

# Validation Data: (x_val, y_val)
# x_val: Validation images used to monitor the performance of the model during training
# y_val: Corresponding one-hot encoded labels for the validation images

# Epochs: 10
# An epoch refers to one complete pass through the entire training dataset during training
# Training the model for multiple epochs allows it to learn from the data and improve its performance gradually
# Here, the model will be trained for 10 epochs

# Batch Size: 128
# Batch size refers to the number of samples processed before the model's parameters are updated during training
# Using a batch size of 128 means that the training data will be divided into batches of 128 samples each
# The model's parameters will be updated based on the average loss calculated over each batch

# Training Procedure:
# During training, the model will learn to minimize the specified loss function (categorical crossentropy) using the 'adam' optimizer
# The performance of the model will be evaluated on both the training and validation datasets using the accuracy metric

img_model = model.fit(x_train_split,   # Training images
                      y_train_split,   # Training labels
                      epochs=10,   # Number of training epochs
                      batch_size=128,   # Batch size
                      validation_data=(x_val, y_val))   # Validation data for monitoring performance


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
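
The epoch and batch-size settings translate directly into a number of parameter updates. The short calculation below assumes the 48000-sample training split produced above.

```python
import math

n_train = 48000   # 80% of the 60000 training images
batch_size = 128
epochs = 10

# One parameter update per batch; a final partial batch would still count
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 375
print(total_updates)    # 3750
```

Keras normally shows the per-epoch step count in its progress bar.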


In [26]:
# Evaluate the trained model on the testing data to assess its performance

# Testing Data: (x_test_flat, y_test_onehot)
# x_test_flat: Flattened testing images after normalization
# y_test_onehot: One-hot encoded labels for the testing images

# The 'evaluate' method computes the loss value and metrics (accuracy in this case) for the testing data
test_loss, test_accuracy = model.evaluate(x_test_flat, y_test_onehot)

# Print the test loss and test accuracy
print(f"Test loss: {test_loss}, Test accuracy: {test_accuracy}")


Test loss: 0.07770629972219467, Test accuracy: 0.9765999913215637


<font color = Green><b> Results

The results of evaluating the model on the testing data are as follows:

- **Test Loss**: 0.0777
- **Test Accuracy**: 0.9766 (97.66%)

Interpretation:
- **Test Loss**: The average loss (categorical crossentropy) calculated on the testing data is approximately 0.0777. This indicates how well the model's predictions match the true labels on average. A lower loss value suggests better performance.
  
- **Test Accuracy**: The accuracy of the model on the testing data is approximately 97.66%. This means that the model correctly classified 97.66% of the testing images into their respective classes. A higher accuracy indicates better performance.

Overall, the model reaches 97.66% accuracy with a loss of 0.0777 on the 10,000 test images, none of which were used for training or validation. This suggests that it has learned to generalize from the training data and make accurate predictions on new, unseen images.
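
To connect the accuracy figure to individual predictions, here is a minimal sketch of how accuracy can be computed from one-hot labels and predicted probabilities. It then converts the reported accuracy into counts. The helper `accuracy` and the toy arrays are hypothetical.

```python
import numpy as np

def accuracy(y_true_onehot, y_prob):
    """Fraction of samples whose most probable class matches the label."""
    return float(np.mean(y_prob.argmax(axis=1) == y_true_onehot.argmax(axis=1)))

# Tiny made-up example: 3 of 4 predictions are right
y_true = np.eye(3)[[0, 1, 2, 1]]
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.4, 0.3],   # wrong: predicts class 1, label is 2
                   [0.2, 0.5, 0.3]])
print(accuracy(y_true, y_prob))       # 0.75

# The reported accuracy expressed as counts on the 10000 test images
n_test = 10000
n_correct = round(0.9766 * n_test)
print(n_correct, n_test - n_correct)  # 9766 234
```

So the reported accuracy corresponds to 234 misclassified test images.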

In [27]:
# Use the trained model to make predictions on the testing data

# Testing Data: x_test_flat
# x_test_flat: Flattened testing images after normalization

# The 'predict' method generates predictions for the testing data using the trained model
# It returns the predicted probability distribution for each class for each testing sample
y_pred = model.predict(x_test_flat)




In [28]:
y_pred

array([[4.3260939e-06, 1.0490104e-08, 3.2743465e-05, ..., 9.9987388e-01,
        1.8458086e-06, 1.9532261e-05],
       [1.2957817e-07, 9.8363862e-06, 9.9997574e-01, ..., 4.3122684e-12,
        2.2468637e-06, 8.5945953e-14],
       [5.4935854e-06, 9.9658543e-01, 8.0584921e-04, ..., 1.6444027e-03,
        3.8950267e-04, 2.2102631e-05],
       ...,
       [3.6375561e-10, 1.0981247e-10, 1.2968375e-10, ..., 2.9031321e-06,
        3.1332111e-06, 2.0152076e-05],
       [2.3094890e-09, 4.7519128e-10, 7.6320478e-12, ..., 5.2811540e-08,
        5.0837748e-06, 5.0765139e-11],
       [3.2166458e-06, 2.8291688e-12, 8.2772829e-07, ..., 9.2499142e-12,
        2.2854125e-09, 1.0124429e-09]], dtype=float32)
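
Each row of this output is a probability distribution over the ten digits, so the predicted digit is the index of the largest entry. The sketch below uses two made-up rows, not the values printed above, to show the conversion.

```python
import numpy as np

# Hypothetical probability rows over 10 classes (each row sums to 1)
probs = np.array([
    [0.01, 0.00, 0.02, 0.00, 0.00, 0.00, 0.00, 0.95, 0.01, 0.01],
    [0.00, 0.01, 0.97, 0.01, 0.00, 0.00, 0.01, 0.00, 0.00, 0.00],
])

pred_labels = probs.argmax(axis=1)  # most likely digit per row
confidence = probs.max(axis=1)      # probability assigned to that digit

print(pred_labels)  # [7 2]
print(confidence)   # [0.95 0.97]
```

In the notebook, the same call on `y_pred`, that is `y_pred.argmax(axis=1)`, should give integer predictions that can be compared directly with `y_test`.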