This project demonstrates a complete deep learning workflow for classifying handwritten digits from the MNIST dataset using a Convolutional Neural Network (CNN).
It covers every stage — from data preprocessing to hyperparameter optimization — to achieve a high-performing and well-regularized model.
The objective of this notebook is to build, train, evaluate, and optimize a CNN model capable of recognizing handwritten digits (0–9) from grayscale 28×28 pixel images.
Through systematic training and tuning, the model’s performance is enhanced for better accuracy and generalization.
- Loaded using TensorFlow’s built-in `tf.keras.datasets.mnist`.
- Split into 60,000 training and 10,000 testing images.
- Normalized pixel values to the range `[0, 1]`.
- Reshaped images to `(28, 28, 1)` to fit CNN input dimensions.
- Converted labels into integer-encoded classes.
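The preprocessing steps above can be sketched in NumPy as follows. This is a minimal sketch, not the notebook's code. To stay self-contained it runs on randomly generated `uint8` arrays with MNIST's shape and dtype. In the notebook, the arrays would instead come from `tf.keras.datasets.mnist.load_data()`. The function name `preprocess` is illustrative.

```python
import numpy as np

def preprocess(images: np.ndarray, labels: np.ndarray):
    """Scale uint8 images to [0, 1], add a channel axis, and return integer labels."""
    # uint8 pixels in 0..255 -> float32 in [0.0, 1.0]
    x = images.astype("float32") / 255.0
    # (n, 28, 28) -> (n, 28, 28, 1): Conv2D expects a trailing channel dimension
    x = x.reshape(-1, 28, 28, 1)
    # Integer class ids 0..9, the format sparse categorical cross-entropy expects
    y = labels.astype("int64")
    return x, y

# Stand-in data with MNIST's dtypes and per-image shape (the real set has 60,000 training images)
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(8, 28, 28), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=(8,), dtype=np.uint8)

x_train, y_train = preprocess(fake_images, fake_labels)
print(x_train.shape, x_train.dtype, y_train.dtype)  # (8, 28, 28, 1) float32 int64
```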
- Defined a baseline CNN using `tensorflow.keras.Sequential`.
- Layers included:
  - `Conv2D` and `MaxPooling2D` for feature extraction.
  - `Flatten` and `Dense` for classification.
- Designed to balance simplicity, performance, and interpretability.
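The README does not list the baseline's exact filter counts. The sketch below therefore assumes a hypothetical stack: one `Conv2D` with 32 filters of size 3×3 and `'valid'` padding, one 2×2 `MaxPooling2D`, `Flatten`, and a 10-unit `Dense` output. In plain Python, it works through each layer's output shape and trainable-parameter count. This is a quick way to sanity-check an architecture before building it.

```python
def conv2d(shape, filters, kernel, stride=1):
    """'valid' Conv2D: returns (output_shape, trainable_params)."""
    h, w, c = shape
    out_h = (h - kernel) // stride + 1
    out_w = (w - kernel) // stride + 1
    params = kernel * kernel * c * filters + filters  # weights + one bias per filter
    return (out_h, out_w, filters), params

def maxpool2d(shape, pool=2):
    """MaxPooling2D with stride equal to the pool size; no trainable parameters."""
    h, w, c = shape
    return (h // pool, w // pool, c), 0

def flatten(shape):
    h, w, c = shape
    return (h * w * c,), 0

def dense(shape, units):
    (n,) = shape
    return (units,), n * units + units  # weight matrix + bias vector

# Hypothetical baseline: Conv2D(32, 3x3) -> MaxPooling2D(2x2) -> Flatten -> Dense(10)
shape, total = (28, 28, 1), 0
layers = [
    ("Conv2D", lambda s: conv2d(s, filters=32, kernel=3)),
    ("MaxPooling2D", lambda s: maxpool2d(s, pool=2)),
    ("Flatten", flatten),
    ("Dense", lambda s: dense(s, units=10)),
]
for name, layer in layers:
    shape, params = layer(shape)
    total += params
    print(f"{name:<13} {str(shape):<14} params={params}")
print("total params:", total)
```

In this configuration, almost all parameters sit in the final `Dense` layer: 5,408 flattened features × 10 units, plus biases. Extra pooling or convolution before `Flatten` shrinks that layer considerably.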
- Optimizer: Adam
- Loss Function: Sparse Categorical Crossentropy
- Metric: Accuracy
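Sparse categorical cross-entropy takes integer labels directly. For one sample with predicted class probabilities `p` and true class `y`, the loss is `-log(p[y])`, averaged over the batch. The plain-Python sketch below uses illustrative function names, not the Keras API. It shows that this equals ordinary categorical cross-entropy with one-hot targets, which is why the labels can stay integer-encoded.

```python
import math

def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(p[y]) over the batch; labels are integer class ids."""
    return sum(-math.log(p[y]) for y, p in zip(labels, probs)) / len(labels)

def categorical_crossentropy(one_hot, probs):
    """The same loss written with one-hot targets: -sum_k t_k * log(p_k)."""
    return sum(
        -sum(t * math.log(pk) for t, pk in zip(target, p))
        for target, p in zip(one_hot, probs)
    ) / len(one_hot)

# Two samples, three classes for brevity (MNIST has ten)
probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
labels = [0, 2]
one_hot = [[1, 0, 0], [0, 0, 1]]

print(sparse_categorical_crossentropy(labels, probs))  # (-ln 0.7 - ln 0.8) / 2
print(categorical_crossentropy(one_hot, probs))        # same value
```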
- Trained the model on the training set and validated on the test set.
- Visualized accuracy and loss trends across epochs.
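Keras' `model.fit` returns a `History` object. Its `history` attribute is a dict of per-epoch lists, with keys such as `loss`, `accuracy`, `val_loss`, and `val_accuracy` when accuracy is tracked and validation data is passed. Alongside plotting these curves, a small helper can report the epoch with the lowest validation loss and the train/validation accuracy gap at that epoch, a common overfitting signal. The helper name `summarize_history` is hypothetical. The numbers below are made up for illustration and are not results from this project.

```python
def summarize_history(history):
    """Return (best_epoch, best_val_loss, accuracy_gap) from a Keras-style history dict.

    best_epoch is 1-based; accuracy_gap is train minus validation accuracy at that epoch.
    """
    val_loss = history["val_loss"]
    best = min(range(len(val_loss)), key=val_loss.__getitem__)
    gap = history["accuracy"][best] - history["val_accuracy"][best]
    return best + 1, val_loss[best], gap

# Illustrative values only: validation loss bottoms out at epoch 3, then rises
history = {
    "loss":         [0.30, 0.10, 0.06, 0.04, 0.03],
    "val_loss":     [0.12, 0.07, 0.05, 0.06, 0.08],
    "accuracy":     [0.91, 0.97, 0.99, 0.995, 0.998],
    "val_accuracy": [0.96, 0.98, 0.985, 0.984, 0.982],
}
epoch, loss, gap = summarize_history(history)
print(f"lowest val_loss {loss:.2f} at epoch {epoch}; train-val accuracy gap {gap:.3f}")
```

In the notebook, the input would be `model.fit(...).history`. A validation loss that rises while training loss keeps falling is the pattern the accuracy and loss plots are meant to expose.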
- Evaluated model performance on unseen test data.
- Displayed predictions vs. actual labels for sample images.
- Provided insights into strengths and misclassifications.
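A confusion matrix summarizes predictions against actual labels and shows which digits get confused with which. This NumPy sketch assumes `probs` is the `(n, 10)` array of softmax outputs that `model.predict` would return. Here it is a small hand-written array so the example runs on its own. The helper name `evaluate_predictions` is hypothetical.

```python
import numpy as np

def evaluate_predictions(probs, y_true, num_classes=10):
    """Return accuracy, confusion matrix (rows = true, cols = predicted), and error indices."""
    y_pred = probs.argmax(axis=1)  # most probable class per image
    confusion = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(confusion, (y_true, y_pred), 1)  # count each (true, predicted) pair
    accuracy = float((y_pred == y_true).mean())
    wrong = np.flatnonzero(y_pred != y_true)  # indices of images worth plotting
    return accuracy, confusion, wrong

# Four toy predictions over 10 classes; the third one mistakes a 4 for a 9
probs = np.full((4, 10), 0.01)
probs[0, 7] = probs[1, 2] = probs[2, 9] = probs[3, 1] = 0.91
y_true = np.array([7, 2, 4, 1])

acc, cm, wrong = evaluate_predictions(probs, y_true)
print(acc)       # 0.75
print(wrong)     # [2]
print(cm[4, 9])  # 1
```

The resulting matrix can be rendered with `seaborn.heatmap(cm, annot=True)`. The `wrong` indices select the misclassified images to display.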
To further improve performance, Keras Tuner’s Random Search was used to explore various configurations of the CNN model.
- Defined a hyperparameter search space for:
  - Number of filters and kernel sizes in `Conv2D` layers
  - Learning rate of the optimizer
  - Dropout rate for regularization
  - Batch size and dense layer units
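A search space like this is a set of per-hyperparameter ranges, and each trial draws one value from each. The sketch below uses only Python's `random` module. The ranges in `SEARCH_SPACE` are hypothetical because the README does not state the actual bounds, and `sample_config` is an illustrative name. The learning rate is sampled on a log scale, similar in spirit to Keras Tuner's `hp.Float(..., sampling="log")`, so each decade (1e-4 to 1e-3, 1e-3 to 1e-2) is explored about equally often.

```python
import math
import random

# Hypothetical search space; ranges are illustrative, not the project's actual bounds
SEARCH_SPACE = {
    "filters": [32, 64, 128],
    "kernel_size": [3, 5],
    "dense_units": [64, 128, 256],
    "batch_size": [32, 64, 128],
    "dropout": (0.0, 0.5),          # uniform float range
    "learning_rate": (1e-4, 1e-2),  # sampled log-uniformly
}

def sample_config(space, rng):
    """Draw one configuration: a choice for lists, uniform/log-uniform for ranges."""
    lo, hi = space["learning_rate"]
    return {
        "filters": rng.choice(space["filters"]),
        "kernel_size": rng.choice(space["kernel_size"]),
        "dense_units": rng.choice(space["dense_units"]),
        "batch_size": rng.choice(space["batch_size"]),
        "dropout": round(rng.uniform(*space["dropout"]), 2),
        # Uniform in log10 space, then mapped back to the linear scale
        "learning_rate": 10 ** rng.uniform(math.log10(lo), math.log10(hi)),
    }

rng = random.Random(42)  # seeded so the trials are reproducible
for trial in range(3):
    print(trial, sample_config(SEARCH_SPACE, rng))
```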
- Added Batch Normalization and Dropout layers for better regularization.
- Introduced an additional `Conv2D` layer to explore deeper networks.
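The NumPy sketch below shows what the two regularization layers compute at training time. Batch normalization normalizes each feature (the last, channel axis) by the batch mean and variance, then scales by `gamma` and shifts by `beta`. Inverted dropout zeroes a fraction `rate` of activations and rescales the survivors by `1 / (1 - rate)`, which leaves the expected activation unchanged. Keras' `BatchNormalization` and `Dropout` layers also handle inference mode (moving statistics, no dropout), which this sketch omits. The function names are illustrative.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize each feature (last axis) over all other axes, then scale and shift."""
    axes = tuple(range(x.ndim - 1))  # e.g. (0,) for dense outputs, (0, 1, 2) for NHWC maps
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

def inverted_dropout(x, rate, rng):
    """Zero each activation with probability `rate`; rescale survivors by 1 / (1 - rate)."""
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
activations = rng.normal(loc=5.0, scale=3.0, size=(256, 16))  # batch of 256, 16 features

normed = batch_norm_train(activations, gamma=np.ones(16), beta=np.zeros(16))
print(normed.mean(axis=0)[:3], normed.std(axis=0)[:3])  # per-feature means near 0, stds near 1

dropped = inverted_dropout(np.ones((1000, 100)), rate=0.25, rng=rng)
print((dropped == 0).mean(), dropped.mean())  # about 25% zeroed; mean stays near 1.0
```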
- Implemented a modular function `build_cnn_model(hp)` to dynamically construct models based on hyperparameters.
- Used `RandomSearch` to identify the optimal configuration based on validation accuracy.
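The control flow of a random search over a model-building function can be sketched in plain Python: sample a configuration, build a model from it, score it, and keep the best. In this sketch, `build_cnn_spec` is a hypothetical stand-in for `build_cnn_model(hp)`. It returns a layer list instead of a Keras model and optionally adds a deeper `Conv2D` block. `toy_objective` is a made-up scoring function standing in for the validation accuracy a real trial would produce after training.

```python
import random

def build_cnn_spec(hp):
    """Hypothetical stand-in for build_cnn_model(hp): returns a layer list, not a Keras model."""
    layers = [("Conv2D", hp["filters"], hp["kernel_size"]), ("BatchNormalization",), ("MaxPooling2D", 2)]
    if hp["extra_conv"]:  # optional additional Conv2D block for a deeper network
        layers += [("Conv2D", hp["filters"] * 2, 3), ("BatchNormalization",)]
    layers += [("Flatten",), ("Dense", hp["dense_units"]), ("Dropout", hp["dropout"]), ("Dense", 10)]
    return layers

def random_search(objective, max_trials, seed=0):
    """Sample max_trials configurations and keep the one with the highest score."""
    rng = random.Random(seed)
    best_hp, best_score = None, float("-inf")
    for _ in range(max_trials):
        hp = {
            "filters": rng.choice([32, 64]),
            "kernel_size": rng.choice([3, 5]),
            "extra_conv": rng.choice([False, True]),
            "dense_units": rng.choice([64, 128]),
            "dropout": rng.choice([0.2, 0.3, 0.5]),
        }
        score = objective(build_cnn_spec(hp))  # in practice: train, then read val_accuracy
        if score > best_score:
            best_hp, best_score = hp, score
    return best_hp, best_score

def toy_objective(layers):
    """Made-up score: rewards an extra Conv2D layer, penalizes heavy dropout."""
    n_conv = sum(1 for layer in layers if layer[0] == "Conv2D")
    dropout = next(layer[1] for layer in layers if layer[0] == "Dropout")
    return 0.97 + 0.01 * n_conv - 0.02 * dropout

best_hp, best_score = random_search(toy_objective, max_trials=10, seed=1)
print(best_hp, round(best_score, 4))
```

With Keras Tuner itself, a real `build_cnn_model(hp)` is passed to `keras_tuner.RandomSearch` together with `objective="val_accuracy"` and `max_trials`. `tuner.search(...)` runs the trials, and `tuner.get_best_hyperparameters()` returns the winning configuration.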
- Retrieved best hyperparameters from the tuner.
- Rebuilt and retrained the CNN using the optimal configuration.
- Evaluated on the full training and test datasets.
- The tuned CNN raised test accuracy from roughly 98% (baseline) to above 99% and generalized better.
- Both models were compared in terms of training behavior and evaluation metrics.
Key Findings:
- Regularization (Dropout, BatchNorm) effectively reduced overfitting.
- Learning rate tuning stabilized convergence.
- The additional convolutional layer improved feature extraction without causing vanishing gradients.
- 🐍 Python
- 🧠 TensorFlow / Keras
- 🎯 Keras Tuner
- 📊 NumPy, Matplotlib, Seaborn for data analysis and visualization
| Model Version | Test Accuracy | Validation Loss | Key Features |
|---|---|---|---|
| Baseline CNN | ~98% | Moderate | Basic CNN architecture |
| Tuned CNN | >99% | Lower | BatchNorm, Dropout, optimized hyperparams |
- Systematic hyperparameter tuning measurably improved CNN performance over the baseline.
- Regularization is critical for preventing overfitting.
- Visualization helps identify convergence issues and failure patterns.
- The MNIST dataset remains a powerful benchmark for experimentation.
- Extend to Fashion-MNIST or CIFAR-10 for more complex tasks.
- Use Bayesian Optimization or Hyperband for efficient tuning.
- Visualize feature maps to interpret learned representations.
- Apply transfer learning or quantization for deployment efficiency.
This project showcases the complete deep learning pipeline — from building a baseline CNN to performing rigorous hyperparameter optimization.
The final tuned model delivers high accuracy on the MNIST dataset and serves as a foundation for more advanced computer vision research.
ARIF RABBANI
Software Engineering Student | Machine Learning Enthusiast