## Handwritten Digit Classifier using a Simple Neural Network with 99.4% accuracy

### MNIST Dataset
The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. This dataset is considered to be the "hello world" dataset for Computer Vision.

I have written a blog post that explains the approach in more detail: https://medium.com/analytics-vidhya/get-started-with-your-first-deep-learning-project-7d989cb13ae5

It has a training set of 60,000 examples and a test set of 10,000 examples of handwritten digits, each with a fixed dimension of 28 x 28 pixels. The goal is to correctly identify the digits and find ways to improve the performance of the model. Let's dive in.

## Import the required libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation
from tensorflow.keras.utils import to_categorical                        

2024-09-11 09:57:10.825529: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


NumPy is Python's numerical computing library. Matplotlib is used to plot graphs and visualize data. From Keras we use the Sequential model, the basic layers (Dense, Dropout, Activation), and the to_categorical utility. Note that the cells below load images from local directories with image_dataset_from_directory rather than the copy of MNIST that ships with Keras.

## Load the Dataset

In [2]:
# Load local images from directory
train_dir = "/Users/zubairfaruqui/Downloads/CNN Data/Training Dataset"  # Specify the path to the training folder
test_dir = "/Users/zubairfaruqui/Downloads/CNN Data/Test Dataset"    # Specify the path to the testing folder

batch_size = 32
img_size = (960, 480)  

In [4]:
# Load train and test datasets from the directories, in grayscale mode
train_dataset = tf.keras.utils.image_dataset_from_directory(
    train_dir,
    image_size=img_size,  
    batch_size=batch_size,
    color_mode='grayscale',  # Load as grayscale images
    label_mode='categorical'
)

test_dataset = tf.keras.utils.image_dataset_from_directory(
    test_dir,
    image_size=img_size,
    batch_size=batch_size,
    color_mode='grayscale',  # Load as grayscale images
    label_mode='categorical'
)

Found 1854 files belonging to 2 classes.
Found 807 files belonging to 2 classes.


We load the dataset and verify the dimensions of the training and testing sets.

In [None]:
def dataset_to_numpy(dataset):
    images = []
    labels = []
    for image_batch, label_batch in dataset:
        images.append(image_batch.numpy())
        labels.append(label_batch.numpy())
    return np.concatenate(images), np.concatenate(labels)
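
To illustrate what the helper above does, here is a minimal NumPy-only sketch of the same collect-and-concatenate pattern. The batches are hypothetical stand-ins (tiny zero-filled arrays instead of real 960 x 480 images); only the sample count of 1854 and the batch size of 32 come from the cells above.

```python
import numpy as np

# Hypothetical stand-in for a batched dataset: 1854 samples in batches of 32.
# Tiny 4 x 2 x 1 "images" keep the example light; the real ones are 960 x 480 x 1.
n_samples, batch_size, n_classes = 1854, 32, 2
batches = []
for start in range(0, n_samples, batch_size):
    size = min(batch_size, n_samples - start)  # the last batch holds the remainder
    images = np.zeros((size, 4, 2, 1), dtype=np.float32)
    labels = np.zeros((size, n_classes), dtype=np.float32)
    batches.append((images, labels))

# Same pattern as dataset_to_numpy: collect per-batch arrays, then join on axis 0
all_images = np.concatenate([images for images, _ in batches])
all_labels = np.concatenate([labels for _, labels in batches])

print(len(batches), batches[-1][0].shape[0])  # number of batches, size of last batch
print(all_images.shape, all_labels.shape)
```

Because concatenation happens along the first axis, a smaller final batch is not a problem: every batch only needs to agree on the remaining dimensions.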

Here we are randomly selecting 9 images from the dataset and plotting them to get an idea of the handwritten digits and their respective classes.

## Data Preprocessing

For MNIST, instead of a 28 x 28 matrix, we build our network to accept a 784-length vector; the 960 x 480 images loaded above flatten to a 460,800-length vector in the same way. Pixel values range from 0 to 255, where 0 is black and 255 is pure white. We normalize these values by dividing them by 255 so that every pixel value lies in the range [0, 1].
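
As a small sketch of these two steps, the snippet below normalizes and flattens a made-up batch of images. The random batch and the names `images` and `flat` are assumptions for illustration; the same two operations apply to any image size.

```python
import numpy as np

# Hypothetical batch of 3 grayscale images, shape (N, height, width, channels)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(3, 28, 28, 1)).astype(np.float32)

# Scale pixel values from [0, 255] to [0, 1]
images = images / 255.0

# Flatten each image into one row: (N, 28, 28, 1) -> (N, 784)
flat = images.reshape(images.shape[0], -1)

print(flat.shape)
print(flat.min() >= 0.0, flat.max() <= 1.0)
```

Using `-1` in `reshape` lets NumPy infer the flattened length, so the same line works for 28 x 28 and 960 x 480 inputs alike.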

Note that we are working with grayscale images, which have a single channel. Color images have 3 channels for RGB, e.g. 28 x 28 x 3 for a 28 x 28 image, each with pixel values in the range 0 to 255.

In [60]:

X_train, y_train = dataset_to_numpy(train_dataset)
X_test, y_test = dataset_to_numpy(test_dataset)

# Normalize images (convert from range [0, 255] to [0, 1])
X_train = X_train / 255.0
X_test = X_test / 255.0

X_train = X_train.reshape(X_train.shape[0], X_train.shape[1] * X_train.shape[2]) 
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1] * X_test.shape[2]) 


# Check shapes
print("X_train shape", X_train.shape)
print("y_train shape", y_train.shape)
print("X_test shape", X_test.shape)
print("y_test shape", y_test.shape)

X_train shape (1854, 460800)
y_train shape (1854, 2)
X_test shape (807, 460800)
y_test shape (807, 2)


In [62]:

# Show 9 random sample images in a 3 x 3 grid
for i in range(9):
    plt.subplot(3, 3, i + 1)
    num = random.randint(0, len(X_train) - 1)  # randint includes both endpoints
    # X_train was flattened in the previous cell, so restore 960 x 480 for display
    plt.imshow(X_train[num].reshape(960, 480), cmap='gray', interpolation='none')
    plt.title("Class {}".format(np.argmax(y_train[num])))

plt.tight_layout()

# Flatten the images to 960 * 480 = 460800 values (a no-op if already flattened)
X_train = X_train.reshape(X_train.shape[0], 960 * 480)
X_test = X_test.reshape(X_test.shape[0], 960 * 480)

# Labels are already one-hot encoded (label_mode='categorical'), so they are used as-is;
# calling to_categorical on them again would produce shape (N, 2, 2)
no_classes = y_train.shape[1]
Y_train = y_train
Y_test = y_test

# Build the model
model = Sequential()
model.add(Dense(512, input_shape=(960 * 480,)))  # Input is a flattened 960 x 480 grayscale image
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(no_classes))
model.add(Activation('softmax'))

model.summary()

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
history = model.fit(X_train, Y_train, batch_size=32, epochs=1, verbose=1)

# Evaluate the model
score = model.evaluate(X_test, Y_test)
print('Test accuracy:', score[1])

Model: "sequential_12"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_36 (Dense)            (None, 512)               235930112 
                                                                 
 activation_36 (Activation)  (None, 512)               0         
                                                                 
 dropout_24 (Dropout)        (None, 512)               0         
                                                                 
 dense_37 (Dense)            (None, 512)               262656    
                                                                 
 activation_37 (Activation)  (None, 512)               0         
                                                                 
 dropout_25 (Dropout)        (None, 512)               0         
                                                                 
 dense_38 (Dense)            (None, 2)               

Since each image is classified as one of a fixed set of classes (10 for the MNIST digits), we use one-hot encoding to form the output (Y variable). Labels loaded with `label_mode='categorical'` are already one-hot encoded. Read more about one-hot encoding here - https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/
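
The following NumPy sketch shows what one-hot encoding does to a handful of made-up integer labels, and why encoding labels that are already one-hot is a mistake. It uses `np.eye` as a stand-in for `to_categorical`; the label values are assumptions for illustration.

```python
import numpy as np

labels = np.array([0, 1, 1, 0])  # hypothetical integer class labels
num_classes = 2

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype=np.float32)[labels]
print(one_hot.shape)  # (4, 2)

# Encoding an array that is already one-hot adds an extra axis
twice = np.eye(num_classes, dtype=np.float32)[one_hot.astype(int)]
print(twice.shape)  # (4, 2, 2)

# argmax along the class axis recovers the integer labels
recovered = np.argmax(one_hot, axis=1)
print(recovered)
```

A target of shape (N, 2, 2) cannot be compared against a softmax output of shape (N, 2), so labels should be encoded exactly once.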



## Building a 3-layer Neural Network

![alt text](https://chsasank.github.io/assets/images/crash_course/mnist_net.png)


In [None]:
# Plot accuracy and loss
fig = plt.figure()
plt.subplot(2, 1, 1)
plt.plot(history.history['accuracy'])
plt.title('Model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train'], loc='lower right')  # only the training curve is plotted

The sequential API allows you to create models layer-by-layer.

## First Hidden Layer

In [None]:
plt.subplot(2, 1, 2)
plt.plot(history.history['loss'])
plt.title('Model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train'], loc='upper right')  # only the training curve is plotted

plt.tight_layout()


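The first hidden layer has 512 nodes (neurons). For MNIST its input is a vector of size 784; for the 960 x 480 images used here it is a vector of size 460,800. Each node receives every element of the input vector, multiplies each element by its own weight, sums the results, and adds a bias.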
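
As a rough sketch of what such a layer computes (before the activation is applied), the snippet below implements the weighted sum in NumPy and counts the trainable parameters. The helper names `dense_forward` and `dense_param_count` are hypothetical and not part of Keras.

```python
import numpy as np

def dense_forward(x, W, b):
    """Affine part of a fully connected layer: every unit sees every input."""
    return x @ W + b

def dense_param_count(n_inputs, n_units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * n_units + n_units

rng = np.random.default_rng(0)
n_inputs, n_units = 784, 512
x = rng.random((4, n_inputs))                         # 4 flattened sample images
W = rng.normal(scale=0.01, size=(n_inputs, n_units))  # weights
b = np.zeros(n_units)                                 # biases

out = dense_forward(x, W, b)
print(out.shape)                          # (4, 512)

print(dense_param_count(784, 512))        # 401920
print(dense_param_count(960 * 480, 512))  # 235930112
print(dense_param_count(512, 512))        # 262656
```

The last two counts agree with the `Param #` column that `model.summary()` reports for the first and second Dense layers on 960 x 480 input.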

In [None]:
# Predictions on test data
predicted_classes = np.argmax(model.predict(X_test), axis=1)
true_classes = np.argmax(y_test, axis=1)  # y_test is one-hot encoded

correct_indices = np.nonzero(predicted_classes == true_classes)[0]
incorrect_indices = np.nonzero(predicted_classes != true_classes)[0]

# Visualize correct predictions in a 3 x 3 grid
plt.figure()
for i, correct in enumerate(correct_indices[:9]):
    plt.subplot(3, 3, i + 1)
    plt.imshow(X_test[correct].reshape(960, 480), cmap='gray', interpolation='none')
    plt.title("Predicted {}, Class {}".format(predicted_classes[correct], true_classes[correct]))

plt.tight_layout()

In artificial neural networks, the activation function of a node defines the output of that node given an input or set of inputs. ReLU stands for rectified linear unit, and is a type of activation function. $$\mathrm{ReLU}: f(x) = \max(0, x)$$
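
A minimal NumPy version of this function, applied to a few sample values chosen for illustration:

```python
import numpy as np

def relu(x):
    """Element-wise max(0, x)."""
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
y = relu(x)
print(y)  # negative inputs become 0, positive inputs pass through unchanged
```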

In [None]:
# Visualize incorrect predictions in a 3 x 3 grid
plt.figure()
for i, incorrect in enumerate(incorrect_indices[:9]):
    plt.subplot(3, 3, i + 1)
    plt.imshow(X_test[incorrect].reshape(960, 480), cmap='gray', interpolation='none')
    # y_test is one-hot encoded, so argmax gives the integer class
    plt.title("Predicted {}, Class {}".format(predicted_classes[incorrect], np.argmax(y_test[incorrect])))

plt.tight_layout()

Dropout randomly selects a fraction of the nodes during training (20% here, since the rate is 0.2) and sets their output to zero, which deactivates them for that step. This helps prevent the model from overfitting to the training dataset.
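
The idea can be sketched in NumPy as follows. This is an illustrative re-implementation, not the Keras source; it assumes the "inverted dropout" convention, in which the surviving activations are scaled up by 1 / (1 - rate) so their expected value stays the same, and the function name `dropout` is hypothetical.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Zero a random fraction `rate` of x and rescale the remaining values."""
    if not training or rate == 0.0:
        return x  # dropout is only active during training
    keep_mask = rng.random(x.shape) >= rate  # True for units that survive
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000, 512))
dropped = dropout(activations, rate=0.2, rng=rng)

print((dropped == 0).mean())  # fraction of zeroed units, close to 0.2
print(dropped.mean())         # close to 1.0 thanks to the rescaling
```

At evaluation time (`training=False`) the input is returned unchanged, which is why dropout affects training but not predictions.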

## Second Hidden Layer

In [None]:
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))

The second hidden layer also has 512 nodes. It takes its input from the 512 nodes of the previous layer and passes its output to the next layer.

## Final Output Layer

The final layer of 10 neurons is fully connected to the previous 512-node layer.
The number of neurons in the final layer should equal the number of desired output classes.

In [None]:
model.add(Dense(10))
model.add(Activation('softmax'))

The softmax activation represents a probability distribution over n different possible outcomes. Its values are all non-negative and sum to 1. For example, if the final output is [0, 0.94, 0, 0, 0, 0, 0, 0.06, 0, 0], then the image is most probably the digit 1.
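
A short NumPy sketch of softmax, applied to ten made-up raw scores (logits), shows both properties. The logit values are assumptions chosen so that class 1 dominates.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into non-negative values that sum to 1."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

logits = np.array([1.0, 6.0, 0.5, 0.0, 1.5, 0.2, 0.1, 3.2, 0.3, 0.4])
probs = softmax(logits)

print(np.round(probs, 2))
print(probs.sum())            # 1.0 up to floating-point error
print(int(np.argmax(probs)))  # 1
```

Subtracting the maximum before exponentiating does not change the result but avoids overflow for large scores.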

In [None]:
model.summary()

## Model Chart

In [None]:
from IPython.display import Image
from tensorflow.keras.utils import plot_model

# plot_model requires the pydot package and Graphviz to be installed
plot_model(model, to_file='model_chart.png', show_shapes=True, show_layer_names=True)
Image("model_chart.png")

## Compiling the model

When compiling a model, Keras asks you to specify your loss function and your optimizer.

In [None]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

The loss function we'll use here is called categorical cross-entropy and is a loss function well-suited to comparing two probability distributions. The cross-entropy is a measure of how different your predicted distribution is from the target distribution.

Optimizers are algorithms or methods used to change the attributes of the neural network such as weights and learning rate to reduce the losses. Optimizers are used to solve optimization problems by minimizing the loss function. In our case, we use the Adam Optimizer.
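
To make the loss concrete, here is a minimal NumPy sketch of categorical cross-entropy for one-hot targets. It is an illustration, not the Keras implementation; the clipping constant `eps` and the sample predictions are assumptions.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample loss: -sum over classes of y_true * log(y_pred)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([[0.0, 1.0],   # sample 0 belongs to class 1
                   [1.0, 0.0]])  # sample 1 belongs to class 0
y_pred = np.array([[0.1, 0.9],   # confident and correct
                   [0.2, 0.8]])  # confident and wrong

losses = categorical_crossentropy(y_true, y_pred)
print(losses)  # the wrong prediction gets the much larger loss
```

With one-hot targets the sum reduces to the negative log of the probability assigned to the true class, so the loss approaches 0 only as that probability approaches 1.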

In [None]:
history = model.fit(X_train, Y_train,
          batch_size=128, epochs=10,
          verbose=1)

The batch size determines how many samples are used per step to compute the loss and its gradients via backpropagation. Training accuracy should generally rise from epoch to epoch. The number of epochs needs to be balanced: too many epochs risk overfitting the model to the training set, which can lower accuracy on the test set.
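
The relationship between dataset size, batch size, and the number of weight updates per epoch is simple arithmetic. The helper name `steps_per_epoch` is hypothetical; the sample counts are the MNIST training set size and the 1854 files found by the loader above.

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Number of batches (gradient updates) in one pass over the data."""
    return math.ceil(n_samples / batch_size)  # the last batch may be smaller

print(steps_per_epoch(60000, 128))  # 469
print(steps_per_epoch(1854, 128))   # 15
print(steps_per_epoch(1854, 32))    # 58
```

A smaller batch size means more, noisier updates per epoch; a larger one means fewer, smoother updates and higher memory use per step.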

## Evaluate the model

We will now evaluate our model against the test dataset.

In [None]:
score = model.evaluate(X_test, Y_test)
print('Test accuracy:', score[1])

Plot the accuracy and loss metrics of the model.

In [None]:
fig = plt.figure()
plt.subplot(2,1,1)
plt.plot(history.history['accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train'], loc='lower right')  # only the training curve is plotted

plt.subplot(2,1,2)
plt.plot(history.history['loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train'], loc='upper right')  # only the training curve is plotted

plt.tight_layout()

In [None]:
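# NOTE: test_data is not defined earlier in this notebook; it is assumed to hold
# the unlabeled competition test images, preprocessed the same way as X_train
results = model.predict(test_data)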

In [None]:
results = np.argmax(results, axis=1)
results = pd.Series(results, name="Label")
# ImageId runs from 1 to the number of predictions (28,000 in the original code)
submission = pd.concat([pd.Series(range(1, len(results) + 1), name="ImageId"), results], axis=1)

submission.to_csv("submission.csv", index=False)

In [None]:
submission

Now let us inspect a few correctly and wrongly classified images to get a better understanding of where the model fails and, hopefully, take corrective measures to increase its accuracy.

In [None]:
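predicted_classes = np.argmax(model.predict(X_test), axis=1)
# y_test is one-hot encoded here, so recover the integer class labels first
true_classes = np.argmax(y_test, axis=1)

correct_indices = np.nonzero(predicted_classes == true_classes)[0]

incorrect_indices = np.nonzero(predicted_classes != true_classes)[0]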
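
The index bookkeeping used for this inspection can be tried on toy data first. The predictions and one-hot labels below are made up for illustration; the pattern of comparing integer classes and taking `np.nonzero` is the same as in the cell above.

```python
import numpy as np

predicted = np.array([0, 1, 1, 0, 1])  # hypothetical predicted classes
y_onehot = np.array([[1, 0], [0, 1], [1, 0], [1, 0], [0, 1]])  # hypothetical one-hot labels

true = np.argmax(y_onehot, axis=1)     # integer labels: [0, 1, 0, 0, 1]

correct_idx = np.nonzero(predicted == true)[0]    # positions where prediction matches
incorrect_idx = np.nonzero(predicted != true)[0]  # positions where it does not

accuracy = len(correct_idx) / len(true)
print(correct_idx, incorrect_idx, accuracy)
```

Comparing the predictions directly against the one-hot array would mix shapes (N,) and (N, 2), which is why the labels are converted with `argmax` first.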

In [None]:
# Correctly classified samples (images are 960 x 480; y_test is one-hot encoded)
plt.figure()
for i, correct in enumerate(correct_indices[:9]):
    plt.subplot(3, 3, i + 1)
    plt.imshow(X_test[correct].reshape(960, 480), cmap='gray', interpolation='none')
    plt.title("Predicted {}, Class {}".format(predicted_classes[correct], np.argmax(y_test[correct])))

plt.tight_layout()

# Wrongly classified samples
plt.figure()
for i, incorrect in enumerate(incorrect_indices[:9]):
    plt.subplot(3, 3, i + 1)
    plt.imshow(X_test[incorrect].reshape(960, 480), cmap='gray', interpolation='none')
    plt.title("Predicted {}, Class {}".format(predicted_classes[incorrect], np.argmax(y_test[incorrect])))

plt.tight_layout()

Congratulations on completing your first Deep Learning model. I hope you understood the basic concepts behind data pre-processing, model framing, training, and testing.

There are many ways to improve the performance of the model: tuning the hyperparameters, adding a validation set, augmenting the data, trying different optimizers, avoiding biased training data, and many more!

Let me know if you have any suggestions/doubts. Happy Kaggling :)