Documentation: MNIST Handwritten Digit Classification with Deep MLP

1. Project Overview
This project builds, trains, and evaluates a deep Multi-Layer Perceptron (MLP) that classifies handwritten digits from the MNIST dataset. The goal is to exceed 98% accuracy on the test set while following common best practices: saving checkpoints, using early stopping, and visualizing learning curves with TensorBoard.

2. Problem Statement
The MNIST dataset consists of 28x28 grayscale images of handwritten digits (0-9). The task is to develop a machine learning model that assigns each image to the correct digit class.

3. Dataset Description

Source: The MNIST dataset is available in the tensorflow.keras.datasets module.

Training Set: 60,000 images.

Test Set: 10,000 images.

Features: Each image is a 28x28 matrix of grayscale pixel values (integers from 0 to 255).

Labels: Each image corresponds to a digit from 0 to 9 (10 classes).

Preprocessing Steps:

1. Normalize pixel values to the range [0, 1].

2. Convert the labels to one-hot encoding for multi-class classification.
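The two preprocessing steps can be sketched with NumPy alone. This is a minimal illustration, not the project's code (which uses keras.utils.to_categorical in Step 2); the preprocess function and the fake_images and fake_labels arrays are hypothetical stand-ins for real MNIST data.

```python
import numpy as np

def preprocess(images, labels, num_classes=10):
    """Scale uint8 pixels to [0, 1] and one-hot encode integer labels."""
    # Dividing by 255 maps the uint8 range 0..255 onto 0.0..1.0.
    x = images.astype("float32") / 255.0
    # Row i of the identity matrix is the one-hot vector for class i.
    y = np.eye(num_classes, dtype="float32")[labels]
    return x, y

# Two made-up 28x28 "images" (all black, all white) stand in for MNIST samples.
fake_images = np.zeros((2, 28, 28), dtype="uint8")
fake_images[1] = 255
fake_labels = np.array([3, 7])

x, y = preprocess(fake_images, fake_labels)
print(x.shape, x.min(), x.max())
print(y[0])  # a 1.0 at index 3, zeros elsewhere
```

Scaling keeps the inputs in a small, consistent range, and the one-hot labels match the 10-unit softmax output that categorical cross-entropy expects.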


4. Model Architecture

The MLP model includes:

1. Input Layer: Flattens each 28x28 input image into a vector of size 784.

2. Hidden Layers: Three Dense layers with ReLU activation, each followed by a Dropout layer for regularization (to reduce overfitting).

3. Output Layer: Dense layer with 10 units and softmax activation to predict class probabilities.




Model Summary:

| Layer Type      | Units | Activation | Dropout Rate |
|-----------------|-------|------------|--------------|
| Input (Flatten) | 784   | -          | -            |
| Dense           | 512   | ReLU       | 0.3          |
| Dense           | 256   | ReLU       | 0.3          |
| Dense           | 128   | ReLU       | 0.3          |
| Output (Dense)  | 10    | Softmax    | -            |
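As a rough size check, the number of trainable parameters implied by the table can be computed by hand. The sketch below assumes each Dense layer has one weight per input-output pair plus one bias per unit, and that the Flatten and Dropout layers add no trainable parameters; the names layer_sizes and dense_params are illustrative only.

```python
# Layer widths from the model summary table: input, three hidden layers, output.
layer_sizes = [784, 512, 256, 128, 10]

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Pair each layer with the next one to get the (inputs, outputs) of every Dense layer.
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [401920, 131328, 32896, 1290]
print(total)      # 567434
```

Most of the parameters sit in the first Dense layer, because it connects all 784 pixels to 512 units.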


5. Training Process

Hyperparameters:

Optimizer: Adam (adaptive learning rate optimization).

Loss Function: Categorical Cross-Entropy.

Metrics: Accuracy.

Batch Size: 128.

Epochs: Up to 50 (with early stopping).
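To make the loss function concrete, here is a small NumPy sketch of softmax followed by categorical cross-entropy. It is only an illustration of the formula, not how Keras computes it internally; the function names, the eps clipping value, and the three-class toy data are assumptions made for the example.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    # Subtracting the row maximum avoids overflow in exp without changing the result.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_prob))."""
    y_prob = np.clip(y_prob, eps, 1.0)  # guard against log(0)
    return float(-(y_true * np.log(y_prob)).sum(axis=-1).mean())

# Toy batch: 2 samples, 3 classes (made-up numbers).
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

probs = softmax(logits)
loss = categorical_cross_entropy(y_true, probs)
print(probs.sum(axis=1))
print(loss)
```

With one-hot labels only the probability assigned to the correct class contributes to the loss, so a model that guesses uniformly over 10 digits would score about ln(10) ≈ 2.30.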

Callbacks:

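Model Checkpoint: Saves the model to disk whenever validation accuracy reaches a new best, so only the best-performing version is kept.

Early Stopping: Stops training if validation loss doesn't improve for 5 consecutive epochs, then restores the weights from the best epoch.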

TensorBoard: Logs training and validation metrics for visualization.
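The early-stopping rule can be illustrated in plain Python. This is a simplified sketch of the patience logic, not the Keras implementation; early_stopping_epoch is a hypothetical helper. The first six values are the val_loss numbers from the training log in Step 6, and the five trailing 0.09 values are made up to show the rule triggering.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    stop_epoch is None if the patience limit is never reached.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Validation losses for epochs 1-6, taken from the training log.
logged = [0.1293, 0.1005, 0.0937, 0.0832, 0.0876, 0.0807]
print(early_stopping_epoch(logged))               # (None, 6)

# Hypothetical continuation: five more epochs with no improvement.
print(early_stopping_epoch(logged + [0.09] * 5))  # (11, 6)
```

In the second case training would stop after epoch 11, and restoring the best weights would return the model to its state at epoch 6.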


6. Implementation

Step 1: Import Libraries

In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard
import numpy as np
import os
import datetime

Step 2: Load and Preprocess Data

In [4]:
# Load and preprocess the MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale pixel values from 0..255 to the range [0, 1]
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# One-hot encode the integer labels (10 classes)
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

Step 3: Define the Model

In [5]:
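# Define the deep MLP model
model = keras.Sequential([
    keras.Input(shape=(28, 28)),        # explicit input: one 28x28 grayscale image
    Flatten(),                          # 28x28 -> vector of 784 values
    Dense(512, activation="relu"),
    Dropout(0.3),                       # randomly zero 30% of activations during training
    Dense(256, activation="relu"),
    Dropout(0.3),
    Dense(128, activation="relu"),
    Dropout(0.3),
    Dense(10, activation="softmax"),    # one probability per digit class
])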



Step 4: Compile the Model

In [6]:
# Compile the model
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

Step 5: Set Up Callbacks

In [10]:
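# Define callbacks
log_dir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
callbacks = [
    # Keep only the model with the best validation accuracy seen so far
    ModelCheckpoint("best_model.h5.keras", save_best_only=True, monitor="val_accuracy"),
    # Stop after 5 epochs without val_loss improvement and roll back to the best weights
    EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    # Write metrics and weight histograms (every epoch) for TensorBoard
    TensorBoard(log_dir=log_dir, histogram_freq=1),
]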

Step 6: Train the Model

In [11]:
history = model.fit(
    x_train, y_train,
    validation_split=0.2,
    epochs=50,
    batch_size=128,
    callbacks=callbacks,
)

Epoch 1/50
375/375 ━━━━━━━━━━━━━━━━━━━━ 21s 28ms/step - accuracy: 0.7710 - loss: 0.7126 - val_accuracy: 0.9608 - val_loss: 0.1293
Epoch 2/50
375/375 ━━━━━━━━━━━━━━━━━━━━ 9s 23ms/step - accuracy: 0.9506 - loss: 0.1684 - val_accuracy: 0.9714 - val_loss: 0.1005
Epoch 3/50
375/375 ━━━━━━━━━━━━━━━━━━━━ 10s 22ms/step - accuracy: 0.9655 - loss: 0.1192 - val_accuracy: 0.9729 - val_loss: 0.0937
Epoch 4/50
375/375 ━━━━━━━━━━━━━━━━━━━━ 10s 20ms/step - accuracy: 0.9733 - loss: 0.0883 - val_accuracy: 0.9753 - val_loss: 0.0832
Epoch 5/50
375/375 ━━━━━━━━━━━━━━━━━━━━ 7s 19ms/step - accuracy: 0.9755 - loss: 0.0778 - val_accuracy: 0.9750 - val_loss: 0.0876
Epoch 6/50
375/375 ━━━━━━━━━━━━━━━━━━━━ 8s 20ms/step - accuracy: 0.9808 - loss: 0.0625 - val_accuracy: 0.9771 - val_loss: 0.0807
Epoch 7/50
[output truncated]
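The 375 steps per epoch shown in the log follow from the split and batch size passed to model.fit. The arithmetic below is a small sketch with illustrative variable names. Keras takes the validation samples from the end of the training arrays, before any shuffling.

```python
import math

n_total = 60_000          # MNIST training images
validation_split = 0.2    # fraction held out by model.fit
batch_size = 128

n_val = round(n_total * validation_split)   # samples held out for validation
n_train = n_total - n_val                   # samples actually used for training
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val, steps_per_epoch)  # 48000 12000 375
```

So each epoch trains on 48,000 images and validates on the remaining 12,000; the 10,000-image test set is not used during training.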

Step 7: Evaluate the Model

In [12]:
# Evaluate on the held-out test set (not used during training)
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"Test Accuracy {test_accuracy*100:.2f}%")

Test Accuracy 97.91%
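The accuracy metric itself is simple to reproduce: take the most probable class for each image and compare it with the true label. The sketch below uses NumPy and a made-up batch of 4 samples with 3 classes; accuracy_from_probs is a hypothetical helper, not part of Keras.

```python
import numpy as np

def accuracy_from_probs(y_true_onehot, y_prob):
    """Fraction of samples whose most probable class matches the one-hot label."""
    predicted = y_prob.argmax(axis=1)
    actual = y_true_onehot.argmax(axis=1)
    return float((predicted == actual).mean())

# Made-up example: true classes 0, 1, 2, 1.
y_true = np.eye(3)[[0, 1, 2, 1]]
y_prob = np.array([
    [0.8, 0.1, 0.1],   # predicts 0 (correct)
    [0.2, 0.7, 0.1],   # predicts 1 (correct)
    [0.3, 0.4, 0.3],   # predicts 1 (wrong, true class is 2)
    [0.1, 0.6, 0.3],   # predicts 1 (correct)
])

acc = accuracy_from_probs(y_true, y_prob)
print(f"{acc * 100:.2f}%")  # 75.00%
```

Applied to the result above, 97.91% on 10,000 test images corresponds to 9,791 correct predictions and 209 errors.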


Step 8: Save the Model

In [13]:
# Save the final model
model.save("final_mlp_model.h5")



7. Results

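Target Test Accuracy: ≥98%. The run shown above reached 97.91%, slightly below the target.

Best Model: Saved as best_model.h5.keras (the path passed to ModelCheckpoint).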

Training Metrics: Available through TensorBoard (tensorboard --logdir logs/).

8. Future Improvements

Experiment with advanced architectures, such as Convolutional Neural Networks (CNNs), which are better suited for image data.

Implement data augmentation to artificially expand the training dataset.

Optimize hyperparameters using techniques like grid search or Bayesian optimization.
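As one possible form of data augmentation, digits can be shifted by a few pixels to create extra training samples. The NumPy sketch below is a hypothetical starting point, not part of the project; shift_image and its sign convention (positive dx moves right, positive dy moves down) are assumptions made for illustration.

```python
import numpy as np

def shift_image(image, dx, dy):
    """Shift a 2-D image by (dx, dy) pixels, filling vacated pixels with 0."""
    h, w = image.shape
    shifted = np.zeros_like(image)
    # Source and destination windows that stay inside the image bounds.
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    return shifted

# A made-up 28x28 image with a single bright pixel at row 10, column 10.
image = np.zeros((28, 28), dtype="float32")
image[10, 10] = 1.0

moved = shift_image(image, dx=2, dy=-3)   # 2 pixels right, 3 pixels up
print(np.argwhere(moved == 1.0))          # [[ 7 12]]
```

Small shifts like this keep the digit recognizable while changing which input units it activates, which matters for an MLP because it has no built-in translation invariance.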

9. Project Files

final_mlp_model.h5: The final trained model.

best_model.h5.keras: The best-performing model during training (highest validation accuracy).

TensorBoard logs: Stored in the logs/ directory.