# Handwritten Digit Classification on the MNIST Dataset


## Introduction

This project applies the universal workflow of machine learning from *Deep Learning with Python* (Section 4.5, 1st edition) to the MNIST handwritten digit dataset. The goal is to build, analyze, and improve a neural network that can recognize digits from images.

The task is to develop a machine learning model that can accurately classify images of handwritten digits (0–9) from the MNIST dataset. This is a supervised learning problem, where the model is trained on labeled examples of images and their corresponding digit classes. Each input is a 28×28 grayscale image, and the model outputs a label indicating which digit (0–9) it predicts for that image. The MNIST dataset is a widely used benchmark in machine learning and computer vision, consisting of 70,000 images in total: 60,000 for training and 10,000 for testing.

MNIST is an ideal choice for this project because it is relatively small, easy to load and preprocess, and has been extensively studied, which makes it well suited for learning and comparing different model configurations.

In this project, the models are restricted to Keras/TensorFlow Sequential architectures built only from Dense and Dropout layers, without using more complex layers such as convolutional layers. This constraint reflects the course requirements and encourages a focus on understanding the core ideas of fully connected neural networks and regularization, rather than relying on more advanced architectures.
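As a rough illustration of what this constraint allows, the sketch below builds one possible Dense and Dropout model. The function name `build_dense_model`, the layer width of 512, the dropout rate of 0.5, and the `rmsprop` optimizer are all assumptions chosen for illustration, not the configuration this project settles on.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_dense_model(hidden_units=512, dropout_rate=0.5):
    """Build a small fully connected classifier for flattened 28x28 images.

    hidden_units and dropout_rate are illustrative defaults (assumptions).
    """
    model = keras.Sequential([
        keras.Input(shape=(28 * 28,)),                  # one flattened image per row
        layers.Dense(hidden_units, activation="relu"),
        layers.Dropout(dropout_rate),                   # only active during training
        layers.Dense(10, activation="softmax"),         # one probability per digit
    ])
    model.compile(
        optimizer="rmsprop",
        loss="sparse_categorical_crossentropy",         # expects integer labels 0-9
        metrics=["accuracy"],
    )
    return model


model = build_dense_model()
model.summary()
```

Wrapping the architecture in a function makes it easier to compare configurations later, since only the arguments change between runs.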

The main metric for success in this project is classification accuracy on a held‑out test set, measuring the proportion of correctly classified digit images. The goal is to build a model that achieves high accuracy while following the *Deep Learning with Python* (DLWP) universal workflow, rather than necessarily reaching state‑of‑the‑art performance. Additional metrics such as precision, recall, and F1‑score will be considered during evaluation to provide a more detailed picture of model performance, particularly if class imbalance or specific error types become relevant.

This project follows the universal workflow of machine learning from *Deep Learning with Python* step by step. The problem and dataset are defined first, along with the success metric and evaluation protocol. The data is then loaded, preprocessed, and split into training, validation, and test sets. A simple baseline model is built and evaluated, followed by a larger model that is deliberately allowed to overfit in order to explore how much capacity the task needs. Finally, regularization and hyperparameter tuning, guided by validation performance, are applied to improve generalization, and the best model is evaluated on the test set and discussed in terms of its strengths, limitations, and possible extensions.
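The preprocessing and splitting step of this workflow could look roughly like the following sketch. The helper names `prepare_images` and `split_validation`, the validation size, and the random seeds are hypothetical, and a small synthetic array stands in for MNIST so the example runs without downloading anything.

```python
import numpy as np


def prepare_images(images):
    """Flatten (n, 28, 28) uint8 images to (n, 784) floats scaled to [0, 1]."""
    flat = images.reshape(len(images), -1)
    return flat.astype("float32") / 255.0


def split_validation(x, y, val_size, seed=0):
    """Shuffle once, then hold out `val_size` samples for validation."""
    order = np.random.default_rng(seed).permutation(len(x))
    val_idx, train_idx = order[:val_size], order[val_size:]
    return (x[train_idx], y[train_idx]), (x[val_idx], y[val_idx])


# Synthetic stand-in (assumption): same dtype and per-image shape as MNIST.
rng = np.random.default_rng(42)
images = rng.integers(0, 256, size=(600, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=600)

x = prepare_images(images)
(x_train, y_train), (x_val, y_val) = split_validation(x, labels, val_size=100)
print(x_train.shape, x_val.shape)  # (500, 784) (100, 784)
```

With the real data, the training arrays returned by `keras.datasets.mnist.load_data()` would take the place of `images` and `labels`, and the 10,000 test images would stay untouched until the final evaluation.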

## 1. Problem definition and dataset

The problem addressed in this project is the classification of handwritten digits from the MNIST dataset. The MNIST dataset consists of 70,000 grayscale images of handwritten digits (0–9), each of size 28×28 pixels. It is split into a training set of 60,000 images and a test set of 10,000 images. This is a supervised multiclass classification problem, where the goal is to train a model that takes an image as input and outputs the correct digit label.

MNIST is a good choice for this project because it is widely used in the machine learning community, making it easy to compare results with existing work. It is also relatively small and straightforward to load and preprocess, which allows us to focus on the modeling aspects rather than on complex data handling. Additionally, the task of digit classification is a well‑defined and intuitive problem that serves as a gentle introduction to image classification and neural networks.

## 2. Measure of success

The main metric for success in this project is classification accuracy on a held‑out test set, which measures the proportion of correctly classified digit images.

$$
\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}
$$

The goal is to build a model that achieves high accuracy while following the *Deep Learning with Python* universal workflow, rather than necessarily reaching state‑of‑the‑art performance. Additional metrics such as precision, recall, and F1‑score will be considered during evaluation to provide a more detailed picture of model performance, particularly if class imbalance or specific error types become relevant.
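To make these metrics concrete, here is a minimal NumPy sketch of accuracy together with per-class precision, recall, and F1 computed from a confusion matrix. The function names `accuracy` and `per_class_report` and the toy labels are illustrative assumptions; a score is defined as 0 wherever its denominator is 0.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def per_class_report(y_true, y_pred, num_classes=10):
    """Return per-class precision, recall, and F1 as arrays of length num_classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    # cm[i, j] counts samples of true class i that were predicted as class j.
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how often each class truly occurred
    # Leave a score at 0 where its denominator is 0.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1


# Toy labels with three classes (made up for illustration).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

print("accuracy:", accuracy(y_true, y_pred))
precision, recall, f1 = per_class_report(y_true, y_pred, num_classes=3)
print("precision:", precision)
print("recall:", recall)
print("f1:", f1)
```

For a trained Keras classifier, `y_pred` would typically be obtained by taking the `argmax` over the predicted class probabilities for each image.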

## 3. Evaluation protocol

lorem ipsum

## 5. Baseline model: small dense network

lorem ipsum

## 6. Overfitting model: larger dense network

lorem ipsum

## 7. Regularization and hyperparameter tuning

lorem ipsum

## 8. Final model evaluation on the test set

lorem ipsum

## 9. Discussion and conclusions

lorem ipsum

## 10. References and code credits

lorem ipsum