# Digit Recognition with Neural Networks

Digit recognition is a classic problem in the field of computer vision and machine learning, where the goal is to correctly identify digits from a given set of images. This task is fundamental in applications such as automated data entry, license plate recognition, and handwriting recognition.

MNIST (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits that is commonly used to train image processing systems. It also serves as a benchmark for evaluating the performance of these systems.

In this notebook, we apply two neural network architectures to this problem, a small fully connected network and a convolutional network (SimpleNet), and compare how effectively and efficiently they recognize handwritten digits.


# Loading the MNIST Data

The MNIST dataset consists of 70,000 images of handwritten digits, split into a training set of 60,000 images and a test set of 10,000 images. Each image is grayscale and 28x28 pixels in size.

Here's how we load and preprocess the data using TensorFlow and Keras:

In [99]:
import tensorflow as tf

# Loading our dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalizing the data (L2 normalization along axis 1 of each image, not a division by 255)
x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)
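
Note that `tf.keras.utils.normalize` performs an L2 normalization: along the chosen axis, each vector is divided by its Euclidean norm. The NumPy sketch below illustrates that behavior on a tiny made-up array. `l2_normalize` is a hypothetical helper written for this illustration, and the guard for all-zero vectors is an assumption about how zero norms are handled.

```python
import numpy as np

def l2_normalize(x, axis=1):
    """Divide x by its L2 norm along `axis` (hypothetical helper)."""
    x = np.asarray(x, dtype=np.float64)
    norms = np.linalg.norm(x, axis=axis, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero vectors unchanged
    return x / norms

# One made-up 2x2 "image" in a batch of shape (1, 2, 2).
batch = np.array([[[3.0, 0.0],
                   [4.0, 0.0]]])
normalized = l2_normalize(batch, axis=1)
print(normalized[0])
```

With `axis=1` on a batch of shape `(N, 28, 28)`, each column of each image is scaled to unit length, so the values end up between 0 and 1 but are not a uniform rescaling of the raw pixel intensities.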

# Basics of Fully Connected (Dense) Networks

The fully connected (dense) neural network used in this notebook is structured as follows:

1. **Input Layer**: Flattens the input image data, transforming it from a 2D array (28x28 pixels) into a 1D array (784 pixels).

2. **Hidden Layers**: Two dense layers with 16 neurons each, using ReLU (Rectified Linear Unit) as the activation function. These layers are responsible for learning the nonlinear relationships in the data.

3. **Output Layer**: A dense layer with 10 neurons (one for each digit from 0 to 9), using the softmax activation function to output probabilities for each class.
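
The three stages above can be sketched as a forward pass in plain NumPy. This is only an illustration of the data flow: the weights are random placeholders (a trained model would learn them), and the names `relu`, `softmax`, and `forward` are hypothetical helpers, not part of Keras.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Layer sizes from the description: 784 -> 16 -> 16 -> 10.
sizes = [784, 16, 16, 10]
# Random placeholder weights; a trained model would learn these.
weights = [rng.normal(0.0, 0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(images):
    """images: array of shape (batch, 28, 28); returns (batch, 10) probabilities."""
    a = images.reshape(len(images), -1)           # input layer: flatten to 784 values
    for w, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ w + b)                       # hidden layers with ReLU
    return softmax(a @ weights[-1] + biases[-1])  # output layer with softmax

probs = forward(rng.random((2, 28, 28)))
n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(probs.shape, n_params)
```

Each row of `probs` sums to 1, and the whole network has only about 13,000 parameters, which makes it a very small model by modern standards.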

Here is the corresponding Keras code:

In [106]:
# Defining our model to train
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))  # images are loaded as 28x28 arrays without a channel axis
model.add(tf.keras.layers.Dense(16, activation="relu"))
model.add(tf.keras.layers.Dense(16, activation="relu"))
model.add(tf.keras.layers.Dense(10, activation="softmax"))

# Compile and train the model
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3)


# Explanation of the SimpleNet Model

SimpleNet is a more complex neural network architecture built on convolutional layers; the version used here also applies batch normalization after each convolution. Here is a brief overview of its structure:

1. **Convolutional Layers**: Designed to capture spatial hierarchies in images by applying filters that scan the input images.

2. **Pooling Layers**: Reduce the spatial dimensions of the output from the convolutional layers, helping in making the representation smaller and more manageable.

3. **Flatten and Dense Layers**: Convert the 2D features into 1D features and then into classification probabilities.
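
To make the effect of these layers on the image size concrete, the sketch below traces the tensor shape through two convolution and pooling stages. It assumes 3x3 kernels without padding, 2x2 non-overlapping pooling, and 32 and 64 filters, matching the implementation that follows; `conv_out` and `pool_out` are hypothetical helper names.

```python
def conv_out(size, kernel, stride=1):
    """Output size of a convolution without padding ('valid')."""
    return (size - kernel) // stride + 1

def pool_out(size, pool):
    """Output size of non-overlapping pooling (stride equal to the pool size)."""
    return size // pool

size, channels = 28, 1
trace = [(size, size, channels)]
for filters in (32, 64):
    size, channels = conv_out(size, 3), filters   # 3x3 convolution
    trace.append((size, size, channels))
    size = pool_out(size, 2)                      # 2x2 max pooling
    trace.append((size, size, channels))

flattened = size * size * channels
for shape in trace:
    print(shape)
print("flattened length:", flattened)
```

Under these assumptions the 28x28 input shrinks to 5x5 feature maps with 64 channels, so the flatten step hands a vector of 1,600 values to the dense layers.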

The primary advantage of SimpleNet over fully connected networks is its ability to preserve spatial relationships between pixels, making it better suited for image data. Here is how it's implemented:

In [104]:
SimpleNet_model = tf.keras.models.Sequential()

SimpleNet_model.add(tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
SimpleNet_model.add(tf.keras.layers.BatchNormalization())
SimpleNet_model.add(tf.keras.layers.MaxPooling2D((2, 2)))
SimpleNet_model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))
SimpleNet_model.add(tf.keras.layers.BatchNormalization())
SimpleNet_model.add(tf.keras.layers.MaxPooling2D((2, 2)))
SimpleNet_model.add(tf.keras.layers.Flatten())
SimpleNet_model.add(tf.keras.layers.Dense(64, activation='relu'))
SimpleNet_model.add(tf.keras.layers.Dense(10, activation='softmax'))

SimpleNet_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
SimpleNet_model.fit(x_train, y_train, epochs=10, batch_size=64)





INFO:tensorflow:Assets written to: SimpleNet.model/assets




# Comparison of Fully Connected Networks and SimpleNet

When comparing the fully connected network with SimpleNet on the MNIST dataset, several key differences emerge:

- **Performance**: SimpleNet generally outperforms fully connected networks on image recognition tasks due to its convolutional layers that better capture the spatial structures within images.

- **Complexity and Training**: SimpleNet has more parameters and requires more computation per epoch than the small dense network, but the higher accuracy it can reach justifies the extra cost.

- **Generalization**: SimpleNet tends to generalize better to new, unseen images, which is crucial for robust machine learning models.
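
To put the difference in complexity into numbers, the following sketch counts the parameters of both models by hand from the layer sizes used in this notebook. The helper names are hypothetical, and the batch normalization count assumes four values per channel (scale, offset, moving mean, and moving variance), two of which are not trainable.

```python
def dense_params(n_in, n_out):
    return n_in * n_out + n_out            # weights + biases

def conv_params(kernel, c_in, c_out):
    return kernel * kernel * c_in * c_out + c_out

def batchnorm_params(channels):
    return 4 * channels                    # gamma, beta, moving mean, moving variance

dense_total = dense_params(784, 16) + dense_params(16, 16) + dense_params(16, 10)

simplenet_total = (
    conv_params(3, 1, 32) + batchnorm_params(32)
    + conv_params(3, 32, 64) + batchnorm_params(64)
    + dense_params(5 * 5 * 64, 64)         # 5x5x64 feature maps after the second pooling
    + dense_params(64, 10)
)

print("dense network:", dense_total)
print("SimpleNet:", simplenet_total)
```

Under these assumptions the convolutional model has roughly nine times as many parameters as the dense one; calling `model.summary()` on each model is a quick way to check the counts.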

In conclusion, while fully connected networks are a good starting point for understanding how neural networks operate, convolutional architectures like SimpleNet are better suited to image data and typically reach higher accuracy.


In [111]:
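# Evaluate the dense model on the test set; uncomment one of the lines below to evaluate a saved model instead
loss, accuracy = model.evaluate(x_test, y_test)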
# loss, accuracy = tf.keras.models.load_model("models/3Blue1Brown_1024.model").evaluate(x_test, y_test)
# loss, accuracy = tf.keras.models.load_model("models/3Blue1Brown_2048.model").evaluate(x_test, y_test)
# loss, accuracy = tf.keras.models.load_model("models/SimpleNet.model").evaluate(x_test, y_test)

print("Loss: ", loss)
print("Accuracy: ", accuracy)

Loss:  0.05779415741562843
Accuracy:  0.9886000156402588
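
The accuracy metric is the fraction of test images whose most probable class matches the true label. The sketch below computes it on made-up probabilities for three classes; `accuracy` is a hypothetical helper for illustration, not the Keras implementation.

```python
def accuracy(probabilities, labels):
    """Fraction of rows whose highest-probability class equals the label."""
    correct = 0
    for row, label in zip(probabilities, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax
        correct += int(predicted == label)
    return correct / len(labels)

# Made-up predictions over three classes for four samples.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
labels = [0, 1, 2, 1]
print(accuracy(probs, labels))
```

With the models in this notebook, the probabilities would come from `model.predict(x_test)` and the labels from `y_test`.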


# Useful Websites for Visualization and References

- **Distill**: https://distill.pub/2020/grand-tour/

- **3Blue1Brown**: https://www.3blue1brown.com/lessons/neural-network-analysis#analyzing-the-network

- **SimpleNet**: https://github.com/Coderx7/SimpleNet