# MNIST Digit Classification with Feature Extraction and MLP Models

This notebook implements a digit classification task on the Reduced MNIST dataset. We explore three feature extraction techniques: Principal Component Analysis (PCA), the Discrete Cosine Transform (DCT), and an autoencoder (AE). Each resulting feature set is then evaluated with Multi-Layer Perceptron (MLP) models of varying depth.

In [2]:
# Import necessary libraries
import os
import cv2
import keras
import time
import numpy as np
from sklearn.decomposition import PCA
from scipy.fft import dct
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras import regularizers
from keras.utils import to_categorical
from sklearn.utils import shuffle

2025-03-26 20:02:23.791981: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.

## 1. Data Loading and Preprocessing

In this section, we load the Reduced MNIST dataset from the training and testing directories defined below. Each image is a 28x28 grayscale array, and the name of its parent folder (0-9) is its label. After loading, we shuffle each set with a fixed seed so that samples are no longer grouped by class and the order is reproducible.

In [3]:
# Define paths to training and testing data directories
train_data_dir = './Reduced MNIST Data/Reduced Training data'
test_data_dir = './Reduced MNIST Data/Reduced Testing data'

# Get list of subdirectories (each representing a digit class)
train_class_dirs = os.listdir(train_data_dir)
test_class_dirs = os.listdir(test_data_dir)

print("Training class directories:", train_class_dirs)
print("Testing class directories:", test_class_dirs)

# Initialize lists to store images and labels
train_images = []
train_labels = []

# Load training data
for digit_class in train_class_dirs:
    class_path = os.path.join(train_data_dir, digit_class)
    for image_file in os.listdir(class_path):
        image_path = os.path.join(class_path, image_file)
        # Read image in grayscale (0-255 pixel values)
        image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        train_images.append(image)
        train_labels.append(digit_class)  # Folder name is the label

# Convert lists to NumPy arrays
train_images = np.array(train_images)
train_labels = np.array(train_labels)

print("Training images shape:", train_images.shape)
print("Training labels shape:", train_labels.shape)

# Load testing data
test_images = []
test_labels = []

for digit_class in test_class_dirs:
    class_path = os.path.join(test_data_dir, digit_class)
    for image_file in os.listdir(class_path):
        image_path = os.path.join(class_path, image_file)
        image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        test_images.append(image)
        test_labels.append(digit_class)

# Convert lists to NumPy arrays
test_images = np.array(test_images)
test_labels = np.array(test_labels)

print("Testing images shape:", test_images.shape)
print("Testing labels shape:", test_labels.shape)

# Shuffle training and testing data for randomness
train_images, train_labels = shuffle(train_images, train_labels, random_state=4)
test_images, test_labels = shuffle(test_images, test_labels, random_state=4)

Training class directories: ['0', '7', '5', '2', '6', '8', '4', '9', '3', '1']
Testing class directories: ['0', '7', '5', '2', '6', '8', '4', '9', '3', '1']
Training images shape: (10000, 28, 28)
Training labels shape: (10000,)
Testing images shape: (2000, 28, 28)
Testing labels shape: (2000,)
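
The two loading loops above differ only in the directory they read, so they could be folded into one helper. The following is a minimal sketch, not the notebook's actual code: `load_digit_folder` and its `read_image` parameter are hypothetical names, and the helper assumes every subdirectory name is an integer label. It sorts the directory listings so the load order does not depend on the filesystem, and it skips files the reader cannot decode (`cv2.imread` returns `None` in that case).

```python
import os
import numpy as np

def load_digit_folder(data_dir, read_image):
    """Load images and integer labels from data_dir/<label>/<file>.

    read_image is any callable mapping a file path to a 2D array
    (or None if the file cannot be decoded).
    """
    images, labels = [], []
    for class_name in sorted(os.listdir(data_dir)):
        class_path = os.path.join(data_dir, class_name)
        if not os.path.isdir(class_path):
            continue  # ignore stray files next to the class folders
        for file_name in sorted(os.listdir(class_path)):
            image = read_image(os.path.join(class_path, file_name))
            if image is None:
                continue  # unreadable file: skip it instead of storing None
            images.append(image)
            labels.append(int(class_name))  # assumes folder names are digits
    return np.array(images), np.array(labels)
```

With OpenCV, this could be called as `load_digit_folder(train_data_dir, lambda p: cv2.imread(p, cv2.IMREAD_GRAYSCALE))`. Returning integer labels also removes the need to convert string labels later.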


## 2. Feature Extraction Techniques

We apply three feature extraction methods to reduce dimensionality and extract meaningful features from the 28x28 images (784 dimensions):
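- **PCA**: Keeps the smallest number of principal components that explains at least 95% of the variance in the training data.
- **DCT**: Keeps the top-left 15x15 block of 2D DCT coefficients, i.e. the lowest spatial frequencies (225 components).
- **Autoencoder**: Learns a compressed representation at the bottleneck of an encoder-decoder network (64 components, the `num_components` default used by the call in the code below).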
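
To make the 95% variance criterion concrete, the sketch below picks the number of components from the cumulative explained-variance ratio, using plain NumPy on synthetic data. It illustrates the idea behind passing a fractional `n_components` to scikit-learn's `PCA`; the helper name `components_for_variance` and the toy data are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def components_for_variance(data, target=0.95):
    """Smallest k whose top-k principal components explain >= target of the variance."""
    centered = data - data.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance captured by each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    cumulative = np.cumsum(variances) / variances.sum()
    # Index of the first cumulative ratio that reaches the target, as a count.
    return int(np.searchsorted(cumulative, target) + 1)

rng = np.random.default_rng(0)
# Toy data: 3 strong directions plus low-amplitude noise in 20 dimensions.
latent = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 20))
toy = latent + 0.01 * rng.normal(size=(500, 20))
print(components_for_variance(toy, 0.95))
```

Because almost all of the toy data's variance lives in three directions, the returned count is at most 3 even though the data has 20 columns.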

In [4]:
# --- PCA Feature Extraction ---
desired_variance = 0.95
pca_model = PCA(n_components=desired_variance)

# Flatten images to 784-dimensional vectors for PCA
train_images_flat = train_images.reshape(train_images.shape[0], 28 * 28)
test_images_flat = test_images.reshape(test_images.shape[0], 28 * 28)

# Fit PCA on training data and transform both sets
train_pca_features = pca_model.fit_transform(train_images_flat)
test_pca_features = pca_model.transform(test_images_flat)

print(f"Original dimensions: {28*28}, PCA dimensions: {train_pca_features.shape[1]}")

# --- DCT Feature Extraction ---
def extract_dct_features(images, num_components=225):
    """Extract DCT features from flattened images."""
    sqrt_img_size = int(np.sqrt(images.shape[1]))  # 28
    sqrt_components = int(np.sqrt(num_components))  # 15
    dct_features = []
    
    for img in images:
        # Reshape to 28x28 for 2D DCT
        img_2d = img.reshape(sqrt_img_size, sqrt_img_size)
        # Apply 2D DCT with orthogonal normalization
        dct_img = dct(dct(img_2d, axis=0, norm='ortho'), axis=1, norm='ortho')
        # Take top-left 15x15 coefficients and flatten
        dct_features.append(dct_img[:sqrt_components, :sqrt_components].flatten())
    
    return np.array(dct_features)

# Normalize images to [0, 1] range
train_images_normalized = train_images_flat / 255.0
test_images_normalized = test_images_flat / 255.0

# Extract DCT features
train_dct_features = extract_dct_features(train_images_normalized)
test_dct_features = extract_dct_features(test_images_normalized)

print("DCT training features shape:", train_dct_features.shape)
print("DCT testing features shape:", test_dct_features.shape)

# --- Autoencoder Feature Extraction ---
def extract_autoencoder_features(train_data, test_data, num_components=64, epochs=10, batch_size=64):
    """Train an autoencoder and extract features from the bottleneck layer."""
    input_dim = train_data.shape[1]  # 784, assuming MNIST-like data
    
    # Define encoder architecture
    input_layer = Input(shape=(input_dim,))
    encoded1 = Dense(512, activation='relu')(input_layer)  # 784 -> 512
    encoded2 = Dense(256, activation='relu')(encoded1)     # 512 -> 256
    encoded3 = Dense(128, activation='relu')(encoded2)     # 256 -> 128
    bottleneck = Dense(num_components, activation='relu')(encoded3)  # 128 -> num_components (default 64)
    
    # Define decoder architecture
    decoded1 = Dense(128, activation='relu')(bottleneck)   # num_components -> 128
    decoded2 = Dense(256, activation='relu')(decoded1)     # 128 -> 256
    decoded3 = Dense(512, activation='relu')(decoded2)     # 256 -> 512
    output_layer = Dense(input_dim, activation='sigmoid')(decoded3)  # 512 -> 784
    
    # Build and compile autoencoder
    autoencoder = Model(inputs=input_layer, outputs=output_layer)
    autoencoder.compile(optimizer='adam', loss='mean_squared_error')
    
    # Build encoder for feature extraction
    encoder = Model(inputs=input_layer, outputs=bottleneck)
    
    # Train the autoencoder
    autoencoder.fit(train_data, train_data,
                    epochs=epochs,
                    batch_size=batch_size,
                    shuffle=True,
                    validation_data=(test_data, test_data),
                    verbose=1)
    
    # Extract features
    train_ae_features = encoder.predict(train_data)
    test_ae_features = encoder.predict(test_data)
    
    return train_ae_features, test_ae_features, encoder, autoencoder

# Extract autoencoder features
train_ae_features, test_ae_features, ae_encoder, ae_model = extract_autoencoder_features(
    train_images_normalized, test_images_normalized, epochs=10
)

Original dimensions: 784, PCA dimensions: 262
DCT training features shape: (10000, 225)
DCT testing features shape: (2000, 225)
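
The per-image loop in `extract_dct_features` could also be written as one batched call. The sketch below is an alternative formulation, assuming SciPy's `scipy.fft.dctn` is available; the function name `extract_dct_features_batched` and its `block` and `side` parameters are hypothetical. It applies the orthonormal type-II DCT over the two image axes of the whole stack and keeps the same top-left low-frequency block.

```python
import numpy as np
from scipy.fft import dct, dctn

def extract_dct_features_batched(images_flat, block=15, side=28):
    """2D DCT of every image at once, keeping the top-left block x block coefficients."""
    stack = images_flat.reshape(-1, side, side)
    # Transform only the two image axes; axis 0 indexes the samples.
    coeffs = dctn(stack, axes=(1, 2), norm='ortho')
    return coeffs[:, :block, :block].reshape(len(stack), block * block)

# Sanity check against the loop formulation on random data.
rng = np.random.default_rng(0)
sample = rng.random((4, 28 * 28))
looped = np.array([
    dct(dct(img.reshape(28, 28), axis=0, norm='ortho'), axis=1, norm='ortho')[:15, :15].flatten()
    for img in sample
])
print(np.allclose(extract_dct_features_batched(sample), looped))
```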




Epoch 1/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 5s 23ms/step - loss: 0.1095 - val_loss: 0.0508
Epoch 2/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 4s 27ms/step - loss: 0.0465 - val_loss: 0.0397
Epoch 3/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 2s 14ms/step - loss: 0.0375 - val_loss: 0.0332
Epoch 4/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 3s 18ms/step - loss: 0.0319 - val_loss: 0.0302
Epoch 5/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 4s 23ms/step - loss: 0.0290 - val_loss: 0.0285
Epoch 6/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 3s 16ms/step - loss: 0.0271 - val_loss: 0.0272
Epoch 7/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 2s 14ms/step - loss: 0.0251 - val_loss: 0.0260
Epoch 8/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 2s 13ms/step - loss: 0.0235 - val_loss: 0.0249
Epoch 9/10
157/157

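The autoencoder above reports `val_loss` on the test images. If the test set should stay untouched until the final evaluation, one option is to carve a validation subset out of the training data instead. The following is a sketch of that idea, not what the notebook does; `split_train_validation`, the default 10% fraction, and the seed are illustrative assumptions.

```python
import numpy as np

def split_train_validation(data, val_fraction=0.1, seed=4):
    """Randomly split the rows of data into (train_part, val_part)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))
    n_val = int(round(len(data) * val_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return data[train_idx], data[val_idx]

demo = np.arange(20, dtype=float).reshape(10, 2)
train_part, val_part = split_train_validation(demo, val_fraction=0.2)
print(train_part.shape, val_part.shape)
```

The two parts could then be passed to the existing training call as `autoencoder.fit(train_part, train_part, validation_data=(val_part, val_part), ...)`.
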
## 3. MLP Model Training and Evaluation

We train MLP models with 1, 3, and 5 hidden layers on each feature set (PCA, DCT, Autoencoder) and record test accuracy, training time, evaluation time, and single-sample inference time. All times are measured in milliseconds.

In [None]:
def train_and_evaluate_mlp(features_train, labels_train, features_test, labels_test, 
                          hidden_layer_configs, input_dim, epochs, batch_size, method_name):
    """Train and evaluate MLP models with varying hidden layers."""
    results = {}
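    # Convert labels to one-hot encoding. The labels are folder names (strings),
    # so cast them to integers explicitly and fix the number of classes at 10.
    labels_train_encoded = to_categorical(labels_train.astype(int), num_classes=10)
    labels_test_encoded = to_categorical(labels_test.astype(int), num_classes=10)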
    
    for layer_sizes in hidden_layer_configs:
        # Define model using Functional API
        inputs = keras.Input(shape=(input_dim,))
        x = inputs
        for size in layer_sizes:
            x = Dense(size, activation='relu')(x)
        outputs = Dense(10, activation='softmax')(x)
        model = keras.Model(inputs=inputs, outputs=outputs)
        
        # Compile model
        model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
        
        # Train model and measure time
        start_time = time.time()
        model.fit(features_train, labels_train_encoded, epochs=epochs, 
                  batch_size=batch_size, shuffle=True, verbose=0)
        training_time = (time.time() - start_time)* 1000  # Convert to ms
        
        # Evaluate model
        eval_start = time.time()
        loss, accuracy = model.evaluate(features_test, labels_test_encoded, verbose=0)
        eval_time = (time.time() - eval_start)* 1000  # Convert to ms
        
        # Measure inference time for a single sample
        pred_start = time.time()
        model.predict(features_test[0].reshape(1, input_dim))
        inference_time = (time.time() - pred_start) * 1000  # Convert to ms
        
        # Store results
        model_key = f"MLP With {len(layer_sizes)} Hidden Layer{'s' if len(layer_sizes) > 1 else ''} ({method_name})"
        results[model_key] = {
            'training_time': round(training_time, 1),
            'evaluation_time': round(eval_time, 1),
            'inference_time': round(inference_time, 1),
            'total_time': round(training_time + eval_time, 1),
            'accuracy': round(accuracy * 100, 2)
        }
        
        # Display results (all times were converted to milliseconds above)
        print(f"----- {model_key} -----")
        print(f"Training Time: {results[model_key]['training_time']} ms")
        print(f"Evaluation Time: {results[model_key]['evaluation_time']} ms")
        print(f"Inference Time (single sample): {results[model_key]['inference_time']} ms")
        print(f"Total Time: {results[model_key]['total_time']} ms")
        print(f"Test Accuracy: {results[model_key]['accuracy']} %\n")
    
    return results

# Define MLP architectures
hidden_layer_configs = [
    [512],          # 1 hidden layer
    [512, 256, 128], # 3 hidden layers
    [512, 256, 128, 64, 32]  # 5 hidden layers
]

# Train and evaluate MLPs for each feature set
pca_results = train_and_evaluate_mlp(train_pca_features, train_labels, test_pca_features, test_labels,
                                    hidden_layer_configs, train_pca_features.shape[1], epochs=40, batch_size=64, method_name="PCA")
dct_results = train_and_evaluate_mlp(train_dct_features, train_labels, test_dct_features, test_labels,
                                    hidden_layer_configs, train_dct_features.shape[1], epochs=40, batch_size=64, method_name="DCT")
ae_results = train_and_evaluate_mlp(train_ae_features, train_labels, test_ae_features, test_labels,
                                   hidden_layer_configs, train_ae_features.shape[1], epochs=40, batch_size=64, method_name="Autoencoder")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 37ms/step
----- MLP With 1 Hidden Layer (PCA) -----
Training Time: 20240.2 ms
Evaluation Time: 209.6 ms
Inference Time (single sample): 65.9 ms
Total Time: 20449.8 ms
Test Accuracy: 96.2 %

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step
----- MLP With 3 Hidden Layers (PCA) -----
Training Time: 30213.7 ms
Evaluation Time: 286.3 ms
Inference Time (single sample): 77.3 ms
Total Time: 30499.9 ms
Test Accuracy: 95.5 %

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 57ms/step
----- MLP With 5 Hidden Layers (PCA) -----
Training Time: 33127.4 ms
Evaluation Time: 353.5 ms
Inference Time (single sample): 98.4 ms
Total Time: 33480.9 ms
Test Accuracy: 95.95 %

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step
----- MLP With 1 Hidden Layer (DCT) -----
Training Time: 18224.4 ms
Evaluation Time: 212.4 ms
Inference Time (single sample): 72.7 ms
Total Time: 18436.8 ms
Test Accuracy: 97.3 %
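
The single-sample inference times above come from one timed `predict` call per model, which can include one-off setup cost on the first call. A common way to reduce that effect is to make a few untimed warm-up calls and then report the median of several timed calls. The helper below is a sketch of that pattern using only the standard library; `time_call_ms` and its parameters are hypothetical, and the notebook's numbers were not produced this way.

```python
import statistics
import time

def time_call_ms(fn, warmup=2, repeats=10):
    """Median wall-clock time of fn() in milliseconds, after untimed warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up: result discarded, not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# Stand-in workload; in the notebook this would wrap the model's predict call.
elapsed = time_call_ms(lambda: sum(range(10_000)))
print(f"median: {elapsed:.3f} ms")
```

`time.perf_counter` is used here because it is a monotonic, high-resolution clock intended for measuring short durations.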



## 4. Results Comparison

Here, we compare the performance of PCA, DCT, and Autoencoder features across the three MLP architectures.

In [6]:
print("\n--- Comparison of Feature Extraction Methods ---")
for layers in hidden_layer_configs:
    layer_name = f"MLP With {len(layers)} Hidden Layer{'s' if len(layers) > 1 else ''}"
    pca_key = f"{layer_name} (PCA)"
    dct_key = f"{layer_name} (DCT)"
    ae_key = f"{layer_name} (Autoencoder)"
    
    print(f"\n{layer_name} Comparison:")
    print(f"PCA - Accuracy: {pca_results[pca_key]['accuracy']}%, "
          f"Training: {pca_results[pca_key]['training_time']}ms, "
          f"Eval: {pca_results[pca_key]['evaluation_time']}ms, "
          f"Inference: {pca_results[pca_key]['inference_time']}ms, "
          f"Total: {pca_results[pca_key]['total_time']}ms")
    print(f"DCT - Accuracy: {dct_results[dct_key]['accuracy']}%, "
          f"Training: {dct_results[dct_key]['training_time']}ms, "
          f"Eval: {dct_results[dct_key]['evaluation_time']}ms, "
          f"Inference: {dct_results[dct_key]['inference_time']}ms, "
          f"Total: {dct_results[dct_key]['total_time']}ms")
    print(f"AE  - Accuracy: {ae_results[ae_key]['accuracy']}%, "
          f"Training: {ae_results[ae_key]['training_time']}ms, "
          f"Eval: {ae_results[ae_key]['evaluation_time']}ms, "
          f"Inference: {ae_results[ae_key]['inference_time']}ms, "
          f"Total: {ae_results[ae_key]['total_time']}ms")


--- Comparison of Feature Extraction Methods ---

MLP With 1 Hidden Layer Comparison:
PCA - Accuracy: 96.2%, Training: 20240.2ms, Eval: 209.6ms, Inference: 65.9ms, Total: 20449.8ms
DCT - Accuracy: 97.3%, Training: 18224.4ms, Eval: 212.4ms, Inference: 72.7ms, Total: 18436.8ms
AE  - Accuracy: 96.1%, Training: 12710.5ms, Eval: 234.1ms, Inference: 85.3ms, Total: 12944.7ms

MLP With 3 Hidden Layers Comparison:
PCA - Accuracy: 95.5%, Training: 30213.7ms, Eval: 286.3ms, Inference: 77.3ms, Total: 30499.9ms
DCT - Accuracy: 97.55%, Training: 30026.0ms, Eval: 322.3ms, Inference: 97.8ms, Total: 30348.3ms
AE  - Accuracy: 95.15%, Training: 24727.7ms, Eval: 611.9ms, Inference: 76.9ms, Total: 25339.7ms

MLP With 5 Hidden Layers Comparison:
PCA - Accuracy: 95.95%, Training: 33127.4ms, Eval: 353.5ms, Inference: 98.4ms, Total: 33480.9ms
DCT - Accuracy: 97.2%, Training: 31431.5ms, Eval: 319.1ms, Inference: 93.1ms, Total: 31750.6ms
AE  - Accuracy: 95.65%, Training: 28342.9ms, Eval: 267.0ms, Inference: 94.
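
The nested result dictionaries could also be rendered as one aligned table rather than as separate lines per method. The sketch below uses only the standard library; `format_results_table` is a hypothetical helper, and the sample dictionary reuses the 1-hidden-layer accuracy and training-time values printed above purely as demo input.

```python
def format_results_table(results, columns):
    """Render {row_name: {column: value}} as an aligned plain-text table."""
    header = ["model"] + list(columns)
    rows = [[name] + [str(metrics[c]) for c in columns]
            for name, metrics in results.items()]
    table = [header] + rows
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    lines = ["  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
             for row in table]
    return "\n".join(lines)

demo = {
    "PCA": {"accuracy": 96.2, "training_time": 20240.2},
    "DCT": {"accuracy": 97.3, "training_time": 18224.4},
    "AE": {"accuracy": 96.1, "training_time": 12710.5},
}
print(format_results_table(demo, ["accuracy", "training_time"]))
```

Since the keys of `pca_results`, `dct_results`, and `ae_results` already include the method name, the three dictionaries could be merged with `{**pca_results, **dct_results, **ae_results}` and passed to the same helper.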

## Conclusion

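This notebook applies PCA, DCT, and an autoencoder for feature extraction on the Reduced MNIST dataset, followed by classification with MLPs of 1, 3, and 5 hidden layers. In the recorded runs, DCT features gave the highest test accuracy for every architecture (97.2% to 97.55%), while PCA and autoencoder features ranged from 95.15% to 96.2%. The autoencoder features had the shortest MLP training time in all three configurations. Adding hidden layers increased training time without a consistent accuracy gain. These figures come from single runs, so differences of a few tenths of a percentage point should be read with caution.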