A CNN-based handwritten digit classifier trained on both the MNIST and USPS datasets. This project explores how data augmentation techniques (noise injection and rotation) affect model robustness across different handwriting sources.
- Data Loading — MNIST digits via TensorFlow and USPS postal digits via DeepLake
- Preprocessing — USPS images (16x16) are padded to match MNIST dimensions (28x28)
- Data Augmentation — Random noise injection and rotation applied at configurable levels
- Model Training — Four CNN variants trained under different data conditions
- Evaluation — Models compared across clean, noisy, and rotated test sets using accuracy, precision, recall, and F1 score
| Model | Training Data | Purpose |
|---|---|---|
| Clean Model | Clean MNIST + USPS | Baseline performance |
| Mixed Model | Noisy MNIST + USPS | Robustness to noise |
| Rotated Clean Model | Rotated clean data | Robustness to rotation |
| Rotated Mixed Model | Rotated noisy data | Combined robustness |
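Comparing the four models across clean, noisy, and rotated test sets comes down to the metrics listed above. A minimal sketch using scikit-learn (the `evaluate` helper and macro averaging over the 10 digit classes are assumptions, not the notebook's exact code):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate(y_true, y_pred):
    """Compute the four comparison metrics, macro-averaged over the 10 classes."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
        "f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
    }
```

Running `evaluate` once per (model, test set) pair yields the grid of scores used to compare robustness.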
- 3 convolutional layers (32, 64, 64 filters) with ReLU activation and max pooling
- Fully connected dense layer (64 units)
- Output layer with 10 units (softmax) for digit classification
- Optimizer: Adam | Loss: Sparse Categorical Crossentropy
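In Keras, that architecture might look like the following. The filter counts, dense width, optimizer, and loss match the description above; the 3x3 kernel size and 2x2 pooling windows are assumptions, since the source does not specify them:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn():
    """3 conv layers (32, 64, 64 filters) with ReLU + max pooling, dense 64, softmax 10."""
    model = models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```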
- Python 3.11
- TensorFlow / Keras
- NumPy, SciPy
- scikit-learn
- Matplotlib
- Pillow
- DeepLake
```
├── HandwrittenDigitClassification.ipynb   # Full pipeline notebook
├── Models/                                # Saved trained models (.keras)
├── requirements.txt                       # Python dependencies
└── README.md
```
```bash
git clone https://github.com/yourusername/handwritten-digit-classification.git
cd handwritten-digit-classification
pip install -r requirements.txt
jupyter notebook HandwrittenDigitClassification.ipynb
```

MIT License