# CNN classifier for the MNIST dataset

## Project Overview
The goal of this project is to build and train a Convolutional Neural Network (CNN) to classify handwritten digits from the famous MNIST dataset. 

Using **TensorFlow** and **Keras**, we will:
1. Load and preprocess the image data.
2. Design a CNN architecture suitable for image classification.
3. Train the model and visualize the learning performance (Accuracy/Loss).
4. Evaluate the model on unseen test data.

**Stack:** Python, TensorFlow 2 (Keras), NumPy, Pandas, Matplotlib.

In [1]:
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

print(f"TensorFlow version: {tf.__version__}")

## 1. Data Loading and Preprocessing

The MNIST dataset contains 60,000 training images and 10,000 test images of handwritten digits. Each image is a 28x28 grayscale image, size-normalized and centered, with pixel values from 0 to 255.
We scale the pixel values to the range [0, 1] to help training converge. We also add a channel dimension so the data matches the input format Keras expects for CNNs: (Height, Width, Channels).
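The two preprocessing steps can be checked on a small synthetic batch before touching the real data. This is a minimal NumPy sketch. It assumes random uint8 arrays as a stand-in for MNIST, which the notebook actually loads with `tf.keras.datasets.mnist`.

```python
import numpy as np

# Synthetic stand-in for MNIST (NOT the real data): 5 images of 28x28 uint8 pixels in [0, 255].
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)

# Step 1: scale intensities from [0, 255] to [0.0, 1.0] as float32.
scaled = raw.astype(np.float32) / 255.0

# Step 2: append a single grayscale channel axis: (N, H, W) -> (N, H, W, 1).
batched = scaled[..., np.newaxis]

print(raw.shape, "->", batched.shape)  # (5, 28, 28) -> (5, 28, 28, 1)
```

The same two operations, applied to the full arrays, produce the `(60000, 28, 28, 1)` training tensor printed below.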

#### Load and preprocess the data

In [2]:
def load_and_process_data():
    """
    Loads the MNIST dataset and performs necessary preprocessing steps for CNN training.
    
    Steps:
    1. Load raw data (split into train/test).
    2. Normalize pixel intensity values to range [0, 1].
    3. Reshape input tensors to include the channel dimension (Grayscale).
    
    Returns:
        tuple: ((train_images, train_labels), (test_images, test_labels))
    """
    # Load data from TensorFlow/Keras datasets
    mnist_data = tf.keras.datasets.mnist
    (train_images, train_labels), (test_images, test_labels) = mnist_data.load_data()
    
    # ---------------------------------------------------------
    # NORMALIZATION
    # ---------------------------------------------------------
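    # Gradient-based training tends to converge faster and more stably when inputs are small and on a consistent scale.
    # We scale the pixel values from [0, 255] to [0.0, 1.0] and store them as float32, Keras' default float type.
    train_images = train_images.astype('float32') / 255.0
    test_images = test_images.astype('float32') / 255.0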
    
    # ---------------------------------------------------------
    # RESHAPING FOR CNN
    # ---------------------------------------------------------
    # Keras Conv2D layers expect a 4D tensor input: (Batch_Size, Height, Width, Channels).
    # Since MNIST is grayscale, we add a single channel dimension at the end.
    # Shape transformation: (60000, 28, 28) -> (60000, 28, 28, 1)
    train_images = train_images[..., np.newaxis]
    test_images = test_images[..., np.newaxis]
    
    return (train_images, train_labels), (test_images, test_labels)

# Execute loading
(scaled_train_images, train_labels), (scaled_test_images, test_labels) = load_and_process_data()

print(f"Training set tensor shape: {scaled_train_images.shape}")
print(f"Test set tensor shape: {scaled_test_images.shape}")

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


## 2. Neural Network Architecture

We construct a Sequential CNN model with the following layers:
* **Conv2D**: 8 filters, 3x3 kernel, ReLU activation, 'same' padding (the 28x28 spatial size is preserved).
* **MaxPooling2D**: Downsampling with a 2x2 window (28x28 becomes 14x14).
* **Flatten**: Converts the 3D feature maps into a 1D vector per image.
* **Dense**: Two fully connected layers with 64 units each (ReLU).
* **Output**: Dense layer with 10 units and softmax activation (one probability per digit 0-9).
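The layer list above fixes every tensor shape and parameter count. The plain-Python sketch below works through that arithmetic. The helper names `conv2d_params` and `dense_params` are hypothetical. It assumes the standard Keras convention of one bias per filter or unit, and that pooling and flattening add no parameters. Under those assumptions, the numbers should line up with the "Param #" column of `model.summary()`.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter has a kernel_h x kernel_w x in_channels weight tensor plus one bias.
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(in_units, out_units):
    # A full in_units x out_units weight matrix plus one bias per output unit.
    return in_units * out_units + out_units

h, w, c = 28, 28, 1                      # one MNIST image after preprocessing
conv = conv2d_params(3, 3, c, 8)         # 'same' padding, stride 1: output is 28x28x8
pooled = (h // 2) * (w // 2) * 8         # 2x2 max-pooling -> 14x14x8, flattened to 1568
dense1 = dense_params(pooled, 64)
dense2 = dense_params(64, 64)
out = dense_params(64, 10)
total = conv + dense1 + dense2 + out

print(pooled, conv, dense1, dense2, out, total)
# 1568 80 100416 4160 650 105306
```

Almost all of the roughly 105k parameters sit in the first Dense layer. That layer connects every one of the 1,568 flattened features to each of its 64 units.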

In [None]:
def build_cnn_model(input_shape):
    """
    Constructs a Sequential Convolutional Neural Network (CNN).
    
    Architecture Design:
    - Feature Extraction Block: Conv2D + MaxPooling
    - Classifier Block: Flatten + Dense layers
    
    Args:
        input_shape (tuple): The shape of a single input image (H, W, C).
        
    Returns:
        tf.keras.Model: The uncompiled Keras model.
    """
    model = tf.keras.Sequential([
        # =================================================================
        # FEATURE EXTRACTION LAYERS
        # =================================================================
        # Input: declares the per-sample shape (H, W, C) so the model is built immediately.
        tf.keras.Input(shape=input_shape),

        # Conv2D: Learns spatial filters (edges, textures) from the input image.
        # - filters=8: We learn 8 different feature maps.
        # - kernel_size=(3,3): Standard size for local feature detection.
        # - padding='same': With the default stride of 1, the output keeps the input's height and width.
        tf.keras.layers.Conv2D(8, (3, 3), padding='same', activation='relu'),
        
        # MaxPooling2D: Downsamples the input representation.
        # This reduces computational cost and helps make the model robust to small translations.
        tf.keras.layers.MaxPooling2D((2, 2)),

        # =================================================================
        # CLASSIFICATION LAYERS
        # =================================================================
        # Flatten: Unrolls the 3D output of the convolutional part into a 1D vector.
        tf.keras.layers.Flatten(),

        # Dense (Hidden): Fully connected layers to interpret the features.
        # ReLU activation introduces non-linearity, allowing the model to learn complex patterns.
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(64, activation='relu'),

        # Output Layer: 10 neurons correspond to the 10 digit classes (0-9).
        # Softmax activation converts raw scores (logits) into probabilities summing to 1.
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    
    return model

# Instantiate and visualize the model architecture
model = build_cnn_model(scaled_train_images[0].shape)
model.summary()
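The NumPy sketch below makes the feature-extraction block concrete. It shows what a single Conv2D filter followed by ReLU and MaxPooling2D does to one 28x28 image. It is illustrative only:
* The helper names are hypothetical.
* The 3x3 vertical-edge kernel is hand-picked rather than learned.
* The bias is omitted.

Like Keras, it computes a cross-correlation, so the kernel is not flipped.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation, stride 1, zero 'same' padding (odd kernel sizes only)."""
    kh, kw = kernel.shape
    pad_h, pad_w = kh // 2, kw // 2
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)))  # zero padding on every side
    out = np.zeros_like(image, dtype=np.float64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # Weighted sum of the kernel-sized window centred on (i, j).
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max-pooling with stride 2 (assumes even height and width)."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.random.default_rng(0).random((28, 28))          # stand-in for one scaled digit
edge_kernel = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=float)            # illustrative, not learned
feature_map = np.maximum(conv2d_same(image, edge_kernel), 0.0)  # ReLU
pooled = max_pool_2x2(feature_map)

print(image.shape, feature_map.shape, pooled.shape)  # (28, 28) (28, 28) (14, 14)
```

The real layer repeats this for 8 learned kernels, which yields the 28x28x8 feature map before pooling.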

In [None]:
# ---------------------------------------------------------
# COMPILATION
# ---------------------------------------------------------
# Optimizer: 'Adam' adapts the step size for each parameter during training
# and is a widely used default that usually works well on image tasks without tuning.
# Loss Function: 'sparse_categorical_crossentropy' is used because:
#   1. It is a multi-class classification problem.
#   2. Our targets (labels) are integers (e.g., 5), not one-hot vectors.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# ---------------------------------------------------------
# TRAINING
# ---------------------------------------------------------
# We fit the model to the training data.
# Epochs: Number of full passes through the dataset.
print("Starting training process...")
history = model.fit(
    scaled_train_images, 
    train_labels, 
    epochs=5, 
    verbose=1  # Displays a progress bar
)
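The compilation comments contrast integer labels with one-hot vectors. This NumPy sketch uses toy logits and hypothetical helper names. It shows that sparse categorical cross-entropy on integer labels gives the same per-sample loss as categorical cross-entropy on the equivalent one-hot targets. The Keras losses additionally average over the batch and guard against `log(0)`; the sketch leaves both out.

```python
import numpy as np

def softmax(logits):
    # Subtracting the row max improves numerical stability without changing the result.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sparse_categorical_crossentropy(labels, probs):
    # Integer labels pick out the probability of the true class in each row.
    return -np.log(probs[np.arange(len(labels)), labels])

def categorical_crossentropy(one_hot, probs):
    # One-hot targets: sum of -y * log(p) over the classes.
    return -np.sum(one_hot * np.log(probs), axis=-1)

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])   # toy scores for 2 samples, 3 classes
labels = np.array([0, 2])              # integer targets, like MNIST's train_labels
probs = softmax(logits)
one_hot = np.eye(3)[labels]            # the same targets as one-hot vectors

print(np.allclose(sparse_categorical_crossentropy(labels, probs),
                  categorical_crossentropy(one_hot, probs)))  # True
```

Because the results are identical, the sparse variant avoids building a 60,000 x 10 one-hot matrix without changing what is optimized.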

## 3. Training Performance

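Plotting accuracy and loss per epoch shows whether the model is learning: training accuracy should rise and training loss should fall. These curves are computed on the training data only, so on their own they cannot reveal overfitting. Detecting overfitting requires comparing them against a held-out validation set, for example via the `validation_data` or `validation_split` arguments of `model.fit`.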
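Detecting overfitting needs a validation set that the model never trains on. The sketch below carves out a shuffled hold-out split with NumPy. It assumes a hypothetical helper `train_val_split`, a 10% validation fraction, and zero-filled stand-in arrays with MNIST-like shapes.

```python
import numpy as np

def train_val_split(x, y, val_fraction=0.1, seed=0):
    """Shuffle sample indices once and hold out the first `val_fraction` of them for validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_val = int(len(x) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return (x[train_idx], y[train_idx]), (x[val_idx], y[val_idx])

# Stand-in arrays shaped like the preprocessed MNIST tensors (not the real data).
x = np.zeros((1000, 28, 28, 1), dtype=np.float32)
y = np.arange(1000) % 10
(x_tr, y_tr), (x_val, y_val) = train_val_split(x, y)

print(x_tr.shape, x_val.shape)  # (900, 28, 28, 1) (100, 28, 28, 1)
```

In the notebook, the split arrays could be passed as `model.fit(x_tr, y_tr, epochs=5, validation_data=(x_val, y_val))`. `history.history` then also records `val_loss` and `val_accuracy`, which can be plotted next to the training curves. Keras' `validation_split=0.1` is a shortcut, but it takes the last 10% of the arrays before shuffling rather than a random sample.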

In [None]:
# ---------------------------------------------------------
# LEARNING CURVES VISUALIZATION
# ---------------------------------------------------------
# We use Pandas to easily manage the history dictionary returned by Keras.
# The history object contains the loss and metric values for each epoch.
frame = pd.DataFrame(history.history)

# Initialize a figure with two subplots side-by-side
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# ---------------------------------------------------------
# ACCURACY PLOT
# ---------------------------------------------------------
# A rising curve indicates the model is correctly learning to classify the training data.
axes[0].plot(frame['accuracy'], label='Train Accuracy')
axes[0].set_title('Accuracy vs Epochs')
axes[0].set_xlabel('Epochs')
axes[0].set_ylabel('Accuracy')
axes[0].grid(True)  # Grid for readability
axes[0].legend()

# ---------------------------------------------------------
# LOSS PLOT
# ---------------------------------------------------------
# The loss (error) should decrease over time.
# A curve that flattens out (a plateau) means the loss has stopped improving, either
# because training has converged or because optimization has stalled.
axes[1].plot(frame['loss'], color='orange', label='Train Loss')
axes[1].set_title('Loss vs Epochs')
axes[1].set_xlabel('Epochs')
axes[1].set_ylabel('Loss')
axes[1].grid(True)
axes[1].legend()

plt.tight_layout() # Adjusts subplot params so that subplots are nicely fit in the figure
plt.show()
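The plateau reading of the loss curve can be made quantitative. The hypothetical helper below flags a plateau when the best loss of the last few epochs is not meaningfully lower than the best loss before them. This is similar in spirit to the `patience` and `min_delta` arguments of Keras' `EarlyStopping` callback, but it is not that callback's exact logic. The loss sequences in the usage lines are made-up values for illustration, not results from this notebook.

```python
def has_plateaued(losses, patience=3, min_delta=1e-3):
    """True if the best loss in the last `patience` epochs improved on the
    best earlier loss by less than `min_delta` (hypothetical helper)."""
    if len(losses) <= patience:
        return False  # not enough history to judge
    best_before = min(losses[:-patience])
    best_recent = min(losses[-patience:])
    return best_before - best_recent < min_delta

print(has_plateaued([0.9, 0.5, 0.3, 0.2, 0.15]))              # False: still improving
print(has_plateaued([0.9, 0.5, 0.30, 0.3000, 0.2999, 0.3001]))  # True: gains below 1e-3
```

With the notebook's history, the call would be `has_plateaued(frame['loss'].tolist())`. With only 5 epochs and `patience=3`, though, the check has very little history to work with.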

## 4. Model Evaluation & Predictions

Finally, we assess the model's performance on the unseen test set and visualize specific predictions along with their confidence scores.
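The prediction panel below reduces each softmax output to a label and a confidence, using `argmax` and `max`. This sketch applies the same reduction to a made-up batch of three probability vectors, not model output. It then computes accuracy as the fraction of correct argmax predictions, which is in effect what the `accuracy` metric reports for integer labels.

```python
import numpy as np

# Made-up softmax outputs for 3 samples over 10 classes (each row sums to 1).
probs = np.full((3, 10), 0.02)
probs[0, 7] = 0.82
probs[1, 2] = 0.82
probs[2, 1], probs[2, 7] = 0.44, 0.40   # an uncertain sample: 1 narrowly beats 7

true_labels = np.array([7, 2, 7])
pred_labels = probs.argmax(axis=1)       # most probable class per sample
confidence = probs.max(axis=1)           # probability assigned to that class
accuracy = np.mean(pred_labels == true_labels)

print(pred_labels)  # [7 2 1]
print(confidence, accuracy)
```

The third sample is misclassified (1 instead of 7), so the accuracy is 2/3. Its low confidence is the kind of signal the bar charts below make visible.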

In [None]:
# ---------------------------------------------------------
# EVALUATION ON TEST SET
# ---------------------------------------------------------
# The test set measures generalization to data never seen during training.
# A test accuracy well below the final training accuracy would indicate overfitting.
test_loss, test_accuracy = model.evaluate(scaled_test_images, test_labels, verbose=0)

print("-" * 30)
print(f"Final Test Loss: {test_loss:.4f}")
print(f"Final Test Accuracy: {test_accuracy*100:.2f}%")
print("-" * 30)

# ---------------------------------------------------------
# PREDICTION VISUALIZATION
# ---------------------------------------------------------
# Visualize model confidence on a few random samples

# Randomly select 4 images from the test set
num_test_images = scaled_test_images.shape[0]
random_inx = np.random.choice(num_test_images, 4, replace=False)  # 4 distinct indices
random_test_images = scaled_test_images[random_inx, ...]
random_test_labels = test_labels[random_inx, ...]

# Get probability distributions
predictions = model.predict(random_test_images)

fig, axes = plt.subplots(4, 2, figsize=(16, 12))
fig.subplots_adjust(hspace=0.4, wspace=-0.2)

for i, (prediction, image, label) in enumerate(zip(predictions, random_test_images, random_test_labels)):
    # Plot the image
    axes[i, 0].imshow(np.squeeze(image), cmap='gray') # Remove channel dim for plotting
    axes[i, 0].axis('off')
    axes[i, 0].text(10., -1.5, f'True Label: {label}', fontsize=12, color='blue')
    
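    # Identify the predicted class and its probability
    pred_label = np.argmax(prediction)
    confidence = np.max(prediction)

    # Plot the confidence bar chart and highlight the predicted class
    # (green if it matches the true label, red otherwise)
    bars = axes[i, 1].bar(np.arange(len(prediction)), prediction, color='gray')
    bars[pred_label].set_color('green' if pred_label == label else 'red')
    axes[i, 1].set_xticks(np.arange(len(prediction)))
    axes[i, 1].set_ylim(0, 1)

    title = f"Prediction: {pred_label} ({confidence:.1%})"
    axes[i, 1].set_title(title, fontsize=12, fontweight='bold')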
    
plt.show()