# MNIST Classification: Demo Notebook
This notebook compares three models for MNIST digit classification:
1. **Random Forest (RF)** - Traditional machine learning model.
2. **Feed-Forward Neural Network (NN)** - A simple neural network.
3. **Convolutional Neural Network (CNN)** - Designed for image processing.

Goals:
1. Load and preprocess the MNIST dataset  
2. Train and evaluate each model  
3. Compare performance metrics  
4. Test models with edge cases (e.g., noisy, rotated images)  


In [9]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [10]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

In [11]:
from mnist_classifier import MnistClassifier


Libraries used:
- `numpy`, `matplotlib`, `seaborn` - Data handling and visualization
- `torch`, `torchvision` - Deep learning models (NN and CNN)
- `sklearn.ensemble.RandomForestClassifier` - Random Forest model
- `sklearn.metrics` (`accuracy_score`, `confusion_matrix`) - Evaluation
- `MnistClassifier` - custom model wrapper
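
The source of `mnist_classifier` is not shown in this notebook, so the following is only a guess at the general shape such a wrapper could take: a common `train`/`predict` interface plus a lookup from algorithm name to backend. All names here (`MnistModel`, `NearestCentroidModel`, `ClassifierWrapper`) are hypothetical, and the nearest-centroid backend is a stand-in, not one of the three models used below.

```python
from abc import ABC, abstractmethod

import numpy as np


class MnistModel(ABC):
    """Hypothetical common interface that each backend would implement."""

    @abstractmethod
    def train(self, X, y):
        ...

    @abstractmethod
    def predict(self, X):
        ...


class NearestCentroidModel(MnistModel):
    """Stand-in backend (an assumption): stores one mean vector per class."""

    def train(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])

    def predict(self, X):
        # Squared Euclidean distance from every sample to every centroid.
        dists = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[dists.argmin(axis=1)]


class ClassifierWrapper:
    """Hypothetical wrapper: picks a backend by name, forwards train/predict."""

    _registry = {"centroid": NearestCentroidModel}

    def __init__(self, algorithm):
        if algorithm not in self._registry:
            raise ValueError(f"Unknown algorithm: {algorithm!r}")
        self.model = self._registry[algorithm]()

    def train(self, *args):
        # *args lets backends accept either (X, y) arrays or a single loader.
        self.model.train(*args)

    def predict(self, X):
        return self.model.predict(X)


# Tiny two-class example with 2-D points instead of images.
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
y_demo = np.array([0, 0, 1, 1])
clf = ClassifierWrapper("centroid")
clf.train(X_demo, y_demo)
demo_pred = clf.predict(np.array([[0.0, 0.5], [9.0, 10.0]]))
print(demo_pred)
```

A registry keyed by name keeps the calling code identical across models, which matches how `MnistClassifier("rf")`, `MnistClassifier("nn")`, and `MnistClassifier("cnn")` are used in the cells below.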

Before training the models, the data must be prepared:
1. Loading the MNIST dataset.
2. Normalizing pixel values (to the 0 to 1 range for Random Forest; `Normalize((0.5,), (0.5,))` further maps the PyTorch inputs to the -1 to 1 range).
3. Flattening images (for Random Forest).
4. Converting data into PyTorch tensors (for NN and CNN).
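
The scaling and flattening steps above can be sketched with NumPy alone. The array `images` is a synthetic stand-in for the MNIST pixel data (an assumption for illustration), and the tensor conversion in step 4 is left out so the sketch runs without PyTorch.

```python
import numpy as np

# Synthetic stand-in for MNIST: 4 grayscale images, 28x28, uint8 in [0, 255].
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Scale to [0, 1]; ToTensor applies the same scaling to uint8 images.
scaled = images.astype(np.float32) / 255.0

# Normalize((0.5,), (0.5,)) computes (x - mean) / std, which maps [0, 1] to [-1, 1].
normalized = (scaled - 0.5) / 0.5

# Flatten each 28x28 image into a 784-element row for Random Forest.
flat = scaled.reshape(len(scaled), -1)

print(flat.shape)
print(float(scaled.min()), float(scaled.max()))
print(float(normalized.min()), float(normalized.max()))
```

Note that the Random Forest inputs and the DataLoader inputs end up on different scales. That is harmless here because each model only ever sees its own version of the data.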


In [12]:
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_data = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_data = datasets.MNIST(root='./data', train=False, transform=transform, download=True)



In [13]:
# Flatten each 28x28 image to 784 features and scale pixels to [0, 1] for Random Forest.
X_train = train_data.data.numpy().reshape(len(train_data), -1) / 255.0
y_train = train_data.targets.numpy()
X_test = test_data.data.numpy().reshape(len(test_data), -1) / 255.0
y_test = test_data.targets.numpy()

In [None]:
train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=64, shuffle=False)

In [None]:
plt.imshow(train_data.data[0], cmap='gray')
plt.title(f"Label: {train_data.targets[0]}")
plt.show()

We train Random Forest on flattened MNIST images and evaluate its accuracy on test data.

In [None]:
rf_classifier = MnistClassifier("rf")
rf_classifier.train(X_train, y_train)

y_pred_rf = rf_classifier.predict(X_test)
print("Random Forest Accuracy:", accuracy_score(y_test, y_pred_rf))


In [None]:
nn_classifier = MnistClassifier("nn")
nn_classifier.train(train_loader)
y_pred_nn = nn_classifier.predict(test_loader)
print("Feed-Forward NN Accuracy:", accuracy_score(y_test, y_pred_nn))

We train the CNN on the unflattened (2D) MNIST images and compare its performance with RF and NN.

In [None]:
cnn_classifier = MnistClassifier("cnn")
cnn_classifier.train(train_loader)
y_pred_cnn = cnn_classifier.predict(test_loader)
print("CNN Accuracy:", accuracy_score(y_test, y_pred_cnn))
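
To compare the three models side by side, accuracies and per-class errors can be collected in one place. The sketch below uses only NumPy. The helper names `compare_models` and `confusion_counts` are hypothetical, and the toy labels stand in for `y_test`, `y_pred_rf`, `y_pred_nn`, and `y_pred_cnn`.

```python
import numpy as np


def compare_models(y_true, predictions):
    """Return {model name: accuracy} for a dict of predicted label arrays."""
    y_true = np.asarray(y_true)
    return {
        name: float(np.mean(np.asarray(y_pred) == y_true))
        for name, y_pred in predictions.items()
    }


def confusion_counts(y_true, y_pred, n_classes=10):
    """Count matrix with true labels as rows and predicted labels as columns."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


# Toy labels for illustration only (not real model output).
y_true_demo = np.array([0, 1, 2, 2, 1])
preds_demo = {
    "model_a": np.array([0, 1, 2, 1, 1]),
    "model_b": np.array([0, 0, 0, 0, 0]),
}

accuracies = compare_models(y_true_demo, preds_demo)
cm_a = confusion_counts(y_true_demo, preds_demo["model_a"], n_classes=3)
print(accuracies)
print(cm_a)
```

In the notebook itself, `sklearn.metrics.confusion_matrix(y_test, y_pred_cnn)` should give the same row and column convention, and the result can be passed to `sns.heatmap` for a visual comparison.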

# Evaluating Edge Cases
To see how the models generalize to unseen variations, we test them on:
1. **Noisy images** - Blurred or corrupted digits.
2. **Rotated images** - Handwritten digits tilted at different angles.
3. **Partially erased digits** - Incomplete numbers.
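
The rotated and partially erased variants can be generated with plain NumPy. This is a minimal sketch, not necessarily how the notebook produces them: `rotate_image` uses nearest-neighbour sampling around the image centre, `erase_patch` zeroes a square region, and both names and the toy stroke image are assumptions for illustration.

```python
import numpy as np


def rotate_image(image, angle_deg):
    """Rotate a 2-D image about its centre using nearest-neighbour sampling."""
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    x0, y0 = xs - cx, ys - cy
    # Inverse mapping: for each output pixel, find the source pixel to copy.
    src_x = np.rint(np.cos(theta) * x0 + np.sin(theta) * y0 + cx).astype(int)
    src_y = np.rint(-np.sin(theta) * x0 + np.cos(theta) * y0 + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(image)  # pixels that map outside the image stay black
    out[valid] = image[src_y[valid], src_x[valid]]
    return out


def erase_patch(image, top, left, size):
    """Return a copy of the image with a size x size square set to zero."""
    out = image.copy()
    out[top:top + size, left:left + size] = 0.0
    return out


# Toy "digit": a vertical stroke on a 28x28 black background.
stroke = np.zeros((28, 28))
stroke[4:24, 13:15] = 1.0

rotated = rotate_image(stroke, 30)
erased = erase_patch(stroke, top=10, left=10, size=8)
print(rotated.shape, erased.shape)
```

For real experiments, an interpolating rotation (for example from `scipy.ndimage` or `torchvision.transforms`) would give smoother edges than nearest-neighbour sampling.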

In [None]:
import cv2
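def add_noise(image):
    """Add Gaussian noise (std 0.3) and clip the result back to [0, 1]."""
    noise = np.random.normal(0, 0.3, image.shape)
    return np.clip(image + noise, 0, 1)

# Scale one test image to [0, 1] and corrupt it.
original_img = test_data.data[1].numpy() / 255.0
noisy_img = add_noise(original_img)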

plt.subplot(1, 2, 1)
plt.imshow(original_img, cmap='gray')
plt.title("Original")

plt.subplot(1, 2, 2)
plt.imshow(noisy_img, cmap='gray')
plt.title("Noisy")

plt.show()
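
Beyond inspecting a single noisy image, the same corruption can be applied to a whole test set to measure how accuracy degrades as the noise level grows. The helper `accuracy_under_noise` and the toy brightness classifier below are hypothetical. In this notebook `predict_fn` could be, for example, `rf_classifier.predict` with `X_test` and `y_test`, assuming it accepts a NumPy array as in the Random Forest cell.

```python
import numpy as np


def accuracy_under_noise(predict_fn, X, y, sigmas, seed=0):
    """Return {sigma: accuracy} after adding clipped Gaussian noise to X."""
    rng = np.random.default_rng(seed)
    results = {}
    for sigma in sigmas:
        noisy = np.clip(X + rng.normal(0.0, sigma, X.shape), 0.0, 1.0)
        results[sigma] = float(np.mean(predict_fn(noisy) == y))
    return results


# Toy data: "dark" rows (class 0) and "bright" rows (class 1), 50 features each.
X_toy = np.vstack([np.full((20, 50), 0.2), np.full((20, 50), 0.8)])
y_toy = np.array([0] * 20 + [1] * 20)


def brightness_classifier(X):
    """Stand-in model: predicts class 1 when the mean pixel value exceeds 0.5."""
    return (X.mean(axis=1) > 0.5).astype(int)


noise_results = accuracy_under_noise(brightness_classifier, X_toy, y_toy, [0.0, 0.1, 0.3])
for sigma, acc in noise_results.items():
    print(f"sigma={sigma}: accuracy={acc:.3f}")
```

The NN and CNN wrappers take a `DataLoader` in this notebook, so evaluating them this way would first require wrapping the noisy images in a dataset and loader.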


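# Conclusion
1. Random Forest is fast but less effective on images, since it treats each pixel as an independent feature.
2. Feed-Forward NN performs well but struggles with spatial information, because the image is flattened before it reaches the network.
3. CNN achieves the best accuracy, as its convolutional filters capture local spatial patterns effectively.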
