
# PixelClassifier – LeNet style CNN (MNIST)

### Objective
In this notebook, I implement a simple LeNet-style Convolutional Neural Network (CNN) to classify handwritten digits from the MNIST dataset. I perform **5-fold Stratified Cross-Validation** over four fixed hyperparameter combinations, then retrain the best-performing configuration on the full training set and evaluate its test accuracy.

In [None]:

# Import core libraries
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, utils, optimizers
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score



## Loading and Preprocessing the MNIST Dataset
I load the MNIST dataset directly from Keras.  
Images are reshaped to include a single grayscale channel and normalized to [0, 1] for stable training.  
Labels are integer-encoded here and will be one-hot encoded **within each fold** to save memory.


In [None]:

# Load dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize and reshape
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)

num_classes = 10
print("Training data shape:", x_train.shape)
print("Test data shape:", x_test.shape)


Training data shape: (60000, 28, 28, 1)
Test data shape: (10000, 28, 28, 1)



## Defining the CNN Model (LeNet-style)
I designed a simple CNN that follows the classic LeNet structure:
- Two convolutional + pooling layers
- One dense hidden layer
- A softmax output layer

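The kernel size (3,3) and the dense layer size (128) are fixed. The number of filters and the learning rate are passed in as arguments so that they can be varied during cross-validation; the second convolutional layer always uses twice as many filters as the first.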


In [None]:
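def build_lenet(filters, learning_rate):
    """Build and compile a LeNet-style CNN for 28x28 grayscale digits."""
    model = models.Sequential([
        # Explicit Input layer instead of passing input_shape to the first Conv2D
        layers.Input(shape=(28, 28, 1)),
        # Block 1: `filters` 3x3 kernels followed by 2x2 max pooling
        layers.Conv2D(filters, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        # Block 2: twice as many filters
        layers.Conv2D(filters * 2, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        # One output probability per digit class
        layers.Dense(10, activation='softmax')
    ])
    # categorical_crossentropy expects one-hot encoded labels
    optimizer = optimizers.Adam(learning_rate=learning_rate)
    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    return model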


## 5-Fold Stratified Cross-Validation
I split the training set into five folds using `StratifiedKFold` and reuse the same splits for each of the four hyperparameter combinations.  
For each fold, a new CNN model is trained for **3 epochs** using one-hot encoded labels.  
After training, I record the validation accuracy for each fold and compute the mean accuracy for each combination.


In [None]:
param_combinations = [
    {'filters': 16, 'lr': 0.001},
    {'filters': 16, 'lr': 0.01},
    {'filters': 32, 'lr': 0.001},
    {'filters': 32, 'lr': 0.01}
]

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = []

for params in param_combinations:
    filters = params['filters']
    lr = params['lr']
    print(f"\nHyperparameter combination: filters = {filters}, learning rate = {lr}")
    fold_accuracies = []

    for fold, (train_idx, val_idx) in enumerate(skf.split(x_train, y_train), 1):
        # Split and encode
        X_train_fold, X_val_fold = x_train[train_idx], x_train[val_idx]
        y_train_fold, y_val_fold = y_train[train_idx], y_train[val_idx]
        y_train_fold_oh = utils.to_categorical(y_train_fold, num_classes)
        y_val_fold_oh = utils.to_categorical(y_val_fold, num_classes)

        # Build and train a fresh model for this fold
        model = build_lenet(filters, lr)
        model.fit(X_train_fold, y_train_fold_oh, epochs=3, batch_size=128,
                  validation_data=(X_val_fold, y_val_fold_oh), verbose=0)

        # Evaluate accuracy on the held-out fold
        val_preds = np.argmax(model.predict(X_val_fold, verbose=0), axis=1)
        fold_acc = accuracy_score(y_val_fold, val_preds)
        fold_accuracies.append(fold_acc)
        print(f"  Fold {fold} accuracy = {fold_acc:.4f}")

    avg_acc = np.mean(fold_accuracies)
    results.append({'filters': filters, 'lr': lr, 'mean_acc': avg_acc})
    print(f"Average accuracy = {avg_acc:.4f}")



Hyperparameter combination: filters = 16, learning rate = 0.001
  Fold 1 accuracy = 0.9835
  Fold 2 accuracy = 0.9827
  Fold 3 accuracy = 0.9826
  Fold 4 accuracy = 0.9804
  Fold 5 accuracy = 0.9814
Average accuracy = 0.9821

Hyperparameter combination: filters = 16, learning rate = 0.01
  Fold 1 accuracy = 0.9827
  Fold 2 accuracy = 0.9815
  Fold 3 accuracy = 0.9869
  Fold 4 accuracy = 0.9823
  Fold 5 accuracy = 0.9758
Average accuracy = 0.9819

Hyperparameter combination: filters = 32, learning rate = 0.001
  Fold 1 accuracy = 0.9876
  Fold 2 accuracy = 0.9858
  Fold 3 accuracy = 0.9858
  Fold 4 accuracy = 0.9834
  Fold 5 accuracy = 0.9826
Average accuracy = 0.9850

Hyperparameter combination: filters = 32, learning rate = 0.01
  Fold 1 accuracy = 0.9853
  Fold 2 accuracy = 0.9831
  Fold 3 accuracy = 0.9820
  Fold 4 accuracy = 0.9812
  Fold 5 accuracy = 0.9757
Average accuracy = 0.9815



## Selecting the Best Hyperparameter Combination and Retraining
After all combinations have been tested, I identify the one with the highest mean validation accuracy.  
Then I retrain the model using that configuration on the **entire training dataset** for 3 epochs and report final test accuracy.


In [None]:
best_params = max(results, key=lambda x: x['mean_acc'])
print("\nBest hyperparameters found:")
print(best_params)

# One-hot encode full dataset
y_train_oh = utils.to_categorical(y_train, num_classes)
y_test_oh = utils.to_categorical(y_test, num_classes)

# Retrain best model on full training data
final_model = build_lenet(best_params['filters'], best_params['lr'])
final_model.fit(x_train, y_train_oh, epochs=3, batch_size=128, verbose=1)

# Evaluate on test data
test_loss, test_acc = final_model.evaluate(x_test, y_test_oh, verbose=0)
print(f"\nFinal Test Accuracy: {test_acc:.4f}")


Best hyperparameters found:
{'filters': 32, 'lr': 0.001, 'mean_acc': np.float64(0.9850166666666667)}

Epoch 1/3
469/469 ━━━━━━━━━━━━━━━━━━━━ 42s 87ms/step - accuracy: 0.8565 - loss: 0.4845
Epoch 2/3
469/469 ━━━━━━━━━━━━━━━━━━━━ 41s 87ms/step - accuracy: 0.9805 - loss: 0.0606
Epoch 3/3
469/469 ━━━━━━━━━━━━━━━━━━━━ 42s 88ms/step - accuracy: 0.9870 - loss: 0.0409

Final Test Accuracy: 0.9868


# Final Report

## 1. Dataset and Preparation

For this lab, I used the MNIST dataset, which consists of 70,000 grayscale images of handwritten digits (0–9). Each image is 28×28 pixels in size. The dataset is split into 60,000 training samples and 10,000 test samples.

Before training, I normalized all pixel values by dividing them by 255 so that they fall in the range [0, 1]. This normalization helps the model converge faster. Each image was reshaped to include a single channel, resulting in an input shape of (28, 28, 1). The class labels were one-hot encoded within each fold of the cross-validation loop to ensure efficient memory usage and proper label formatting for categorical cross-entropy.
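
The two preprocessing steps, and the reason one-hot labels pair naturally with categorical cross-entropy, can be illustrated with a minimal pure-Python sketch. The helper names `one_hot` and `categorical_cross_entropy` are illustrative only; the notebook itself relies on NumPy division and `utils.to_categorical`.

```python
import math

def one_hot(label, num_classes=10):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def categorical_cross_entropy(target, probs, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    # Only the true class contributes, because every other target entry is 0.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))

# Normalization: raw 8-bit pixel values are mapped into [0, 1].
raw_pixels = [0, 128, 255]
scaled_pixels = [p / 255.0 for p in raw_pixels]

# Toy 3-class example: the true class is 1 and the model assigns it 0.7.
target = one_hot(1, num_classes=3)
loss = categorical_cross_entropy(target, [0.1, 0.7, 0.2])
print(scaled_pixels, target, loss)
```

With a one-hot target the loss reduces to the negative log of the probability assigned to the correct digit, which is why the labels must be in this format for `categorical_crossentropy`.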


## 2. CNN Model Architecture

The model implemented in this lab follows a simplified LeNet-style CNN structure that is lightweight yet expressive enough for MNIST. The layer sequence is as follows:

Input → Conv2D (ReLU) → MaxPooling2D → Conv2D (ReLU) → MaxPooling2D → Flatten → Dense (ReLU) → Dense (Softmax)

- The first convolutional layer extracts local spatial features using a 3×3 kernel.  
- The max-pooling layer reduces feature map dimensions and adds translation invariance.  
- A second convolution–pool block increases feature depth for stronger representations.  
- The Flatten layer converts the 2D feature maps into a 1D vector.  
- This is followed by a Dense layer with 128 units (ReLU activation), and finally a Dense output layer with 10 neurons (Softmax) for classification.
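
The layer sequence above can be traced by hand to see how the feature maps shrink and where the parameters sit. The sketch below does that arithmetic in plain Python, assuming the Keras defaults used in the notebook (`'valid'` padding and stride 1 for `Conv2D`, non-overlapping 2×2 windows for `MaxPooling2D`). The function `trace_lenet` is a hypothetical helper, not part of Keras.

```python
def trace_lenet(filters=32, dense_units=128, num_classes=10, kernel=3, pool=2):
    """Return (layer name, output shape, trainable parameters) for each layer."""
    size, channels = 28, 1  # MNIST input: 28x28 pixels, one grayscale channel
    rows = []
    for name, out_channels in (("conv1", filters), ("conv2", filters * 2)):
        # A 'valid' convolution with stride 1 shrinks each side by kernel - 1.
        size = size - kernel + 1
        params = (kernel * kernel * channels + 1) * out_channels  # +1 for the bias
        channels = out_channels
        rows.append((name, (size, size, channels), params))
        # Non-overlapping pooling divides each side by the pool size (floor).
        size //= pool
        rows.append((name.replace("conv", "pool"), (size, size, channels), 0))
    flat = size * size * channels
    rows.append(("flatten", (flat,), 0))
    rows.append(("dense", (dense_units,), (flat + 1) * dense_units))
    rows.append(("output", (num_classes,), (dense_units + 1) * num_classes))
    return rows

rows = trace_lenet(filters=32)
total_params = sum(params for _, _, params in rows)
for name, shape, params in rows:
    print(f"{name:8s} {str(shape):14s} {params:>8d}")
print("total trainable parameters:", total_params)
```

Under these assumptions, most of the parameters in the 32-filter configuration sit in the first Dense layer rather than in the convolutions.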

The architecture is deliberately compact to keep computation time short while maintaining high accuracy. It reflects the same design principles as LeNet [1], with modern substitutions such as ReLU activations, max pooling, and the Adam optimizer.

## 3. Hyperparameter Exploration

To analyze the impact of network capacity and learning rate, I evaluated four fixed hyperparameter configurations using 5-fold Stratified Cross-Validation. Each configuration combined:

- Filters: {16, 32}  
- Learning rate: {0.001, 0.01}  

Each fold was trained for 3 epochs, balancing speed with sufficient convergence.

The configurations tested were:

| Filters | Learning Rate |
|:--------:|:--------------:|
| 16 | 0.001 |
| 16 | 0.01  |
| 32 | 0.001 |
| 32 | 0.01  |
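
Each of these configurations was scored on stratified folds, meaning every fold keeps roughly the same digit distribution as the full training set. The idea can be sketched in plain Python as below. This is a simplified illustration under my own assumptions (shuffle each class, then deal its indices round-robin into folds), not scikit-learn's actual `StratifiedKFold` implementation, and `stratified_folds` is a hypothetical name.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(labels, n_splits=5, seed=42):
    """Split sample indices into n_splits folds with balanced class counts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's samples round-robin so each fold gets an equal share.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds

# Toy data set: 50 samples of each digit 0-9 (500 samples in total).
labels = [digit for digit in range(10) for _ in range(50)]
folds = stratified_folds(labels)

for number, fold in enumerate(folds, 1):
    counts = Counter(labels[i] for i in fold)
    print(f"fold {number}: {len(fold)} samples, per-class counts {dict(sorted(counts.items()))}")
```

In each round, one fold serves as the validation set and the remaining four are used for training, so with MNIST's 60,000 training images each model trains on 48,000 and validates on 12,000.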

## 4. Results and Evaluation

The mean accuracies across folds for each configuration were:

| Filters | Learning Rate | Mean Accuracy |
|:--------:|:--------------:|:--------------:|
| 16 | 0.001 | 0.9821 |
| 16 | 0.01  | 0.9819 |
| 32 | 0.001 | **0.9850** |
| 32 | 0.01  | 0.9815 |

Based on these results, the best-performing combination was:
> filters = 32, learning rate = 0.001

This configuration achieved a mean validation accuracy of 98.5% across folds.
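
The selection step can be reproduced from the per-fold accuracies printed in the cross-validation output. The sketch below uses those rounded four-decimal values, so the recomputed means may differ from the table in the last digit. The standard deviation is an extra I added to show the fold-to-fold spread; it is not computed in the notebook.

```python
from statistics import mean, stdev

# Per-fold validation accuracies, keyed by (filters, learning rate).
fold_accuracies = {
    (16, 0.001): [0.9835, 0.9827, 0.9826, 0.9804, 0.9814],
    (16, 0.01):  [0.9827, 0.9815, 0.9869, 0.9823, 0.9758],
    (32, 0.001): [0.9876, 0.9858, 0.9858, 0.9834, 0.9826],
    (32, 0.01):  [0.9853, 0.9831, 0.9820, 0.9812, 0.9757],
}

# Mean and sample standard deviation across the five folds.
summary = {cfg: (mean(accs), stdev(accs)) for cfg, accs in fold_accuracies.items()}

# The winner is the configuration with the highest mean accuracy.
best_config = max(summary, key=lambda cfg: summary[cfg][0])

for (filters, lr), (avg, spread) in summary.items():
    print(f"filters={filters:2d} lr={lr:<5} mean={avg:.4f} std={spread:.4f}")
print("best:", best_config)
```

Comparing the spread with the gaps between the means gives a rough sense of how decisive the ranking is after only 3 epochs per fold.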

I then retrained this configuration on the full 60,000-image training set for 3 epochs and evaluated it on the 10,000-image test set. The model achieved a final test accuracy of 98.68%, a strong result for a lightweight CNN whose final training run took about two minutes (three epochs at roughly 42 seconds each).

## 5. Discussion

This experiment shows that even a relatively simple CNN can come within about one percentage point of deeper models on MNIST without long training times. For context, deeper models trained with more epochs (20–50) or additional regularization such as dropout and batch normalization can reach 99.2–99.4% accuracy. My model, trained for just 3 epochs with only two convolutional layers, reached 98.68% on the test set. This is close to its 98.50% mean cross-validation accuracy, which suggests that the model generalizes well and that the cross-validation estimate was reliable.
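
To make these percentages concrete, they can be converted into misclassified images on the 10,000-image test set. The sketch below does that arithmetic. Applying the 99.2–99.4% range to the same test set is an assumption made purely for comparison, and `expected_errors` is an illustrative helper.

```python
def expected_errors(accuracy, n_samples=10_000):
    """Number of misclassified samples implied by an accuracy on n_samples."""
    return round((1.0 - accuracy) * n_samples)

this_model = expected_errors(0.9868)   # accuracy reported for this notebook
deeper_low = expected_errors(0.992)    # lower end of the quoted range
deeper_high = expected_errors(0.994)   # upper end of the quoted range

print(f"this model: {this_model} errors")
print(f"deeper models: between {deeper_high} and {deeper_low} errors")
```

In these terms the gap is 132 misclassified digits against roughly 60 to 80.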

Some ways the accuracy could be further improved include:
- Adding another convolution–pool block for more feature extraction.
- Increasing training epochs.
- Introducing small data augmentations to improve robustness.
- Trying batch normalization or dropout to reduce minor overfitting.

## 6. Conclusion

Through this lab, I successfully implemented and evaluated a compact LeNet-style CNN on the MNIST dataset using 5-fold Stratified Cross-Validation.
The best-performing configuration (filters = 32, learning rate = 0.001) produced a final test accuracy of 98.68% after retraining on the full dataset.  


## References

[1] Yann LeCun, Léon Bottou, Yoshua Bengio and Patrick Haffner: Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, 86(11):2278–2324, 1998

[2] Keras Conv2D and Convolutional Layers (https://pyimagesearch.com/2018/12/31/keras-conv2d-and-convolutional-layers/)

[3] Convolutional Neural Network (CNN) (https://blog.gopenai.com/convolutional-neural-network-cnn-054ac70d40ec)