# MNIST Classifier Demo
This notebook demonstrates the implementation of three MNIST classifiers:
1. Random Forest
2. Feed-Forward Neural Network
3. Convolutional Neural Network (CNN)

Each model implements the `MnistClassifierInterface` with methods:
- `fit`
- `predict`

In [37]:
# ================================
# 1. Import libraries
# ================================

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import copy
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import accuracy_score, classification_report
from abc import ABC, abstractmethod

In [38]:
# ================================
# 2. Data preparation
# ================================

# Transform images to tensors for PyTorch models
transform = transforms.ToTensor()

# Download MNIST dataset
train_dataset = datasets.MNIST(root="data", train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root="data", train=False, download=True, transform=transform)

# DataLoader for NN and CNN
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# For RandomForest — convert to numpy arrays
X_train_rf = train_dataset.data.view(-1, 28*28).numpy().astype(np.float32) / 255.0
y_train_rf = train_dataset.targets.numpy().astype(np.int64)
X_test_rf  = test_dataset.data.view(-1, 28*28).numpy().astype(np.float32) / 255.0
y_test_rf  = test_dataset.targets.numpy().astype(np.int64)

In [39]:
# ================================
# 3. MnistClassifierInterface
# ================================
class MnistClassifierInterface(ABC):
    @abstractmethod
    def fit(self, train_data, train_labels=None):
        pass

    @abstractmethod
    def predict(self, x):
        pass

In [None]:
# ================================
# 4. Random Forest Model with Randomized Search
# ================================

# Random search parameters
param_dist = {
    'n_estimators': [150, 200],
    'max_depth': [None, 30],
    'max_features': ['sqrt'],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'bootstrap': [True],
    'criterion': ['gini', 'entropy']
}

# Model
rf = RandomForestClassifier(random_state=42)

# RandomizedSearchCV is faster than an exhaustive grid search because it evaluates only n_iter sampled combinations
random_search = RandomizedSearchCV(estimator=rf,
                                   param_distributions=param_dist,
                                   n_iter=20,  # 20 random combinations
                                   cv=3,
                                   scoring='accuracy',
                                   verbose=1,
                                   n_jobs=-1,
                                   random_state=42)

# Model training
random_search.fit(X_train_rf, y_train_rf)

# Result
print("The best parameters:", random_search.best_params_)
print("Best accuracy (CV):", random_search.best_score_)

# Test set score
best_model = random_search.best_estimator_
y_pred = best_model.predict(X_test_rf)
test_acc = accuracy_score(y_test_rf, y_pred)
print(f"Accuracy on the test: {test_acc:.4f}")

Fitting 3 folds for each of 20 candidates, totalling 60 fits
The best parameters: {'n_estimators': 150, 'min_samples_split': 2, 'min_samples_leaf': 1, 'max_features': 'sqrt', 'max_depth': None, 'criterion': 'gini', 'bootstrap': True}
Best accuracy (CV): 0.9655666666666667
Accuracy on the test: 0.9706
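
To make the cost saving of the randomized search concrete, the sketch below counts how many combinations the grid above contains and how many model fits each strategy would need. It only repeats the `param_dist`, `n_iter`, and `cv` values from the cell above; the variable names `all_combinations`, `grid_fits`, and `random_fits` are illustrative.

```python
from itertools import product

# Same grid as param_dist in the cell above
param_dist = {
    "n_estimators": [150, 200],
    "max_depth": [None, 30],
    "max_features": ["sqrt"],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "bootstrap": [True],
    "criterion": ["gini", "entropy"],
}

n_iter, cv = 20, 3

# Every combination an exhaustive grid search would have to try
all_combinations = list(product(*param_dist.values()))
grid_fits = len(all_combinations) * cv
random_fits = n_iter * cv

print(f"Grid size: {len(all_combinations)} combinations -> {grid_fits} fits")
print(f"Randomized search: {n_iter} combinations -> {random_fits} fits")
```

The 60 fits computed here match the "totalling 60 fits" line in the output above.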


In [17]:
# ================================
# Random Forest Implementation
# ================================

class RandomForestMnist(MnistClassifierInterface):
    def __init__(self):
        self.model = RandomForestClassifier(
            n_estimators=150, min_samples_split=2, min_samples_leaf=1,
            max_features='sqrt', max_depth=None,
            criterion='gini', bootstrap=True,
            random_state=42
        )

    def fit(self, train_data, train_labels=None, **kwargs):
        if train_labels is None:
            raise ValueError("train_labels required for RandomForestMnist.fit")
        self.model.fit(train_data, train_labels)

    def predict(self, x, **kwargs):
        return self.model.predict(x)

In [18]:
# ================================
# 5. Base class for trainable PyTorch models with training, validation, and early stopping
# ================================

class TrainableNN(nn.Module):
    def __init__(self):
        super().__init__()

    def fit(self, train_data, val_data=None, epochs=50, lr=0.001, device="cpu", early_stopping_patience=5):
        self.to(device)
        optimizer = torch.optim.Adam(self.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()

        best_loss = float('inf')
        best_model_wts = copy.deepcopy(self.state_dict())
        patience_counter = 0

        for epoch in range(epochs):
            # Train
            self.train()
            running_loss = 0.0
            num_batches = 0
            for images, labels in train_data:
                images, labels = images.to(device), labels.to(device)
                optimizer.zero_grad()
                outputs = self(images)
                loss = loss_fn(outputs, labels)
                loss.backward()
                optimizer.step()
                running_loss += loss.item()
                num_batches += 1
            train_loss = running_loss / max(1, num_batches)

            # Validation
            if val_data is not None:
                self.eval()
                val_loss = 0.0
                val_batches = 0
                with torch.no_grad():
                    for images, labels in val_data:
                        images, labels = images.to(device), labels.to(device)
                        outputs = self(images)
                        val_loss += loss_fn(outputs, labels).item()
                        val_batches += 1
                val_loss /= max(1, val_batches)

                print(f"Epoch [{epoch+1}/{epochs}] — Train Loss: {train_loss:.4f} — Val Loss: {val_loss:.4f}")

                # Save best
                if val_loss < best_loss:
                    best_loss = val_loss
                    best_model_wts = copy.deepcopy(self.state_dict())
                    patience_counter = 0
                else:
                    patience_counter += 1
                    if patience_counter >= early_stopping_patience:
                        print(f"Early stopping at epoch {epoch+1}")
                        break
            else:
                print(f"Epoch [{epoch+1}/{epochs}] — Train Loss: {train_loss:.4f}")

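        # Restore best weights only if validation was used;
        # otherwise best_model_wts still holds the initial (untrained) weights
        if val_data is not None:
            self.load_state_dict(best_model_wts)
            print(f"\nRestored best model (Val Loss: {best_loss:.4f})")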

    def predict(self, x, device="cpu"):
        # Make predictions on a batch or full dataset
        self.eval()
        self.to(device)
        x = x.to(device)
        with torch.no_grad():
            outputs = self(x)
            return torch.argmax(outputs, dim=1)

In [19]:
# ================================
# 6. Feed-Forward Neural Network
# ================================

class FeedForwardNN(TrainableNN):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 512)
        self.bn1 = nn.BatchNorm1d(512)
        self.fc2 = nn.Linear(512, 256)
        self.bn2 = nn.BatchNorm1d(256)
        self.fc3 = nn.Linear(256, 10)
        self.dropout = nn.Dropout(0.3)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = F.relu(self.bn1(self.fc1(x)))
        x = self.dropout(x)
        x = F.relu(self.bn2(self.fc2(x)))
        x = self.dropout(x)
        return self.fc3(x)

In [29]:
# Third attempt to improve the result
class CNNClassifierKerasStyle(TrainableNN):
    def __init__(self):
        super().__init__()
        # --- Convolutional layers ---
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)
        self.bn3 = nn.BatchNorm2d(128)

        # --- Pooling ---
        self.pool = nn.MaxPool2d(2, 2)

        # --- Dropout ---
        self.dropout_conv = nn.Dropout(0.25)
        self.dropout_fc = nn.Dropout(0.3)

        # --- Fully connected ---
        self.fc1 = nn.Linear(128 * 7 * 7, 256)
        self.fc2 = nn.Linear(256, 10)
        self.dropout_fc2 = nn.Dropout(0.2)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = self.pool(F.relu(self.bn2(self.conv2(x))))
        x = self.dropout_conv(x)
        x = self.pool(F.relu(self.bn3(self.conv3(x))))
        x = self.dropout_conv(x)

        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = self.dropout_fc(x)
        x = self.dropout_fc2(x)
        x = self.fc2(x)
        return x

In [24]:
# ================================
# 7. Convolutional Neural Network
# ================================

class CNNClassifierKerasStyle(TrainableNN):
    def __init__(self):
        super().__init__()
        # Conv layers
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)
        self.bn3 = nn.BatchNorm2d(128)

        # Pooling
        self.pool = nn.MaxPool2d(2, 2)

        # Dropout
        self.dropout_conv = nn.Dropout(0.25)
        self.dropout_fc = nn.Dropout(0.5)

        # Fully connected
        self.fc1 = nn.Linear(128 * 7 * 7, 256)  # 128 channels × 7 × 7
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.bn1(self.conv1(x))))  # 28 → 14
        x = self.pool(F.relu(self.bn2(self.conv2(x))))  # 14 → 7
        x = F.relu(self.bn3(self.conv3(x)))             # 7 → 7
        x = self.dropout_conv(x)
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = self.dropout_fc(x)
        return self.fc2(x)

In [30]:
# ==================
# 8. MnistClassifier
# ==================

class MnistClassifier:
    def __init__(self, algorithm):
        if algorithm == "rf":
            self.model = RandomForestMnist()
        elif algorithm == "nn":
            self.model = FeedForwardNN()
        elif algorithm == "cnn":
            self.model = CNNClassifierKerasStyle()
        else:
            raise ValueError("Unknown algorithm")

    def fit(self, train_data, train_labels=None, **kwargs):
        device = kwargs.get("device", "cpu")
        if isinstance(self.model, nn.Module):
            self.model.to(device)
            self.model.fit(
                train_data,
                val_data=kwargs.get("val_data"),
                epochs=kwargs.get("epochs", 50),
                lr=kwargs.get("lr", 0.001),
                device=device,
                early_stopping_patience=kwargs.get("early_stopping_patience", 5)
            )
        else:
            if train_labels is None:
                raise ValueError("train_labels required for non-torch models")
            self.model.fit(train_data, train_labels)

    def predict(self, x, device="cpu"):
        if isinstance(self.model, nn.Module):
            return self.model.predict(x, device=device)
        else:
            return self.model.predict(x)

In [32]:
# ================================
# 9. Evaluation Function
# ================================

def evaluate(classifier: MnistClassifier, loader, device="cpu"):
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = classifier.predict(images, device=device)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total

In [None]:
# ================================
# 10. Example Usage
# ================================

# ---- Random Forest ----
rf = RandomForestMnist()
rf.fit(X_train_rf, y_train_rf)

rf_preds = rf.predict(X_test_rf)
rf_acc = accuracy_score(y_test_rf, rf_preds)
print(f"Random Forest Accuracy: {rf_acc:.4f}")

print("\nClassification Report:")
print(classification_report(y_test_rf, rf_preds))

Random Forest Accuracy: 0.9706

Classification Report:
              precision    recall  f1-score   support

           0       0.97      0.99      0.98       980
           1       0.99      0.99      0.99      1135
           2       0.96      0.97      0.97      1032
           3       0.96      0.96      0.96      1010
           4       0.98      0.97      0.98       982
           5       0.97      0.97      0.97       892
           6       0.98      0.98      0.98       958
           7       0.97      0.96      0.97      1028
           8       0.96      0.96      0.96       974
           9       0.96      0.95      0.96      1009

    accuracy                           0.97     10000
   macro avg       0.97      0.97      0.97     10000
weighted avg       0.97      0.97      0.97     10000



Random Forest serves as a baseline classical ML model.
The model performs very well with an overall accuracy of 97% on the test set.

The F1-scores for all classes range between 0.96 and 0.99, indicating balanced precision and recall across digits.

Some digits are easier to recognize: 0, 1, and 6 have high recall (0.98 to 0.99), meaning the model correctly identifies almost all instances of these digits.

Some digits are slightly more challenging, e.g., 9 has a recall of 0.95, showing that it is occasionally misclassified.
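
As a reminder of how the F1 column relates to precision and recall, here is a minimal sketch that recomputes F1 for digit 0 from the rounded values in the report. The helper name `f1_score_from` is hypothetical. Because the report shows values rounded to two decimals, recomputing other rows this way may differ from the printed F1 in the last digit.

```python
def f1_score_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Digit 0 in the report: precision 0.97, recall 0.99
f1_digit_0 = f1_score_from(0.97, 0.99)
print(f"{f1_digit_0:.2f}")
```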

In [22]:
device = "cuda" if torch.cuda.is_available() else "cpu"

In [None]:
# --- Feed-Forward Neural Network ---

nn_model = MnistClassifier("nn")
nn_model.fit(train_loader, val_data=test_loader, epochs=30, lr=0.001, device=device, early_stopping_patience=7)
nn_acc = evaluate(nn_model, test_loader, device=device)
print(f"Feed-Forward NN Accuracy: {nn_acc:.4f}")

Epoch [1/30] — Train Loss: 0.2400 — Val Loss: 0.1019
Epoch [2/30] — Train Loss: 0.1212 — Val Loss: 0.0773
Epoch [3/30] — Train Loss: 0.0967 — Val Loss: 0.0664
Epoch [4/30] — Train Loss: 0.0793 — Val Loss: 0.0620
Epoch [5/30] — Train Loss: 0.0687 — Val Loss: 0.0578
Epoch [6/30] — Train Loss: 0.0651 — Val Loss: 0.0592
Epoch [7/30] — Train Loss: 0.0561 — Val Loss: 0.0572
Epoch [8/30] — Train Loss: 0.0499 — Val Loss: 0.0544
Epoch [9/30] — Train Loss: 0.0478 — Val Loss: 0.0609
Epoch [10/30] — Train Loss: 0.0424 — Val Loss: 0.0561
Epoch [11/30] — Train Loss: 0.0424 — Val Loss: 0.0581
Epoch [12/30] — Train Loss: 0.0377 — Val Loss: 0.0539
Epoch [13/30] — Train Loss: 0.0356 — Val Loss: 0.0498
Epoch [14/30] — Train Loss: 0.0329 — Val Loss: 0.0515
Epoch [15/30] — Train Loss: 0.0316 — Val Loss: 0.0511
Epoch [16/30] — Train Loss: 0.0297 — Val Loss: 0.0522
Epoch [17/30] — Train Loss: 0.0280 — Val Loss: 0.0556
Epoch [18/30] — Train Loss: 0.0252 — Val Loss: 0.0561
Epoch [19/30] — Train Loss: 0.0275 — 

Training loss steadily decreased from 0.2400 to 0.0246, indicating good learning of the patterns in the training set.

Validation loss reached its minimum of 0.0498 at epoch 13, showing the model generalizes well to unseen data without significant overfitting.
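
The early-stopping rule in `TrainableNN.fit` can be replayed on the logged validation losses without retraining. The sketch below mirrors that rule (reset the counter on improvement, stop once the counter reaches the patience); the function name `replay_early_stopping` is hypothetical, and only the 18 epochs that are fully visible in the log are used.

```python
def replay_early_stopping(val_losses, patience):
    """Return (best_epoch, best_loss, stop_epoch).

    stop_epoch is None if the patience is never exhausted.
    """
    best_loss = float("inf")
    best_epoch = None
    counter = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, counter = loss, epoch, 0
        else:
            counter += 1
            if counter >= patience:
                return best_epoch, best_loss, epoch
    return best_epoch, best_loss, None

# Validation losses for epochs 1 to 18 as printed in the log above
ffnn_val_losses = [
    0.1019, 0.0773, 0.0664, 0.0620, 0.0578, 0.0592, 0.0572, 0.0544, 0.0609,
    0.0561, 0.0581, 0.0539, 0.0498, 0.0515, 0.0511, 0.0522, 0.0556, 0.0561,
]

best_epoch, best_loss, stop_epoch = replay_early_stopping(ffnn_val_losses, patience=7)
print(best_epoch, best_loss, stop_epoch)
```

With `patience=7` the counter stands at 5 after epoch 18, so these 18 epochs alone do not trigger a stop; the best weights are those from epoch 13.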

The final accuracy on the test set is 98.54%.
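
Note that `val_data=test_loader` means the test set also drives early stopping, so the reported test accuracy is selected on the same data it is measured on. One common alternative is to hold out part of the training set for validation. The sketch below only builds the index split in plain Python; the helper name `split_indices`, the 10% fraction, and the seed are assumptions for illustration.

```python
import random

def split_indices(n_samples, val_fraction=0.1, seed=42):
    """Shuffle indices 0..n_samples-1 and split them into train and validation lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = round(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]

# The MNIST training set has 60,000 images
train_idx, val_idx = split_indices(60_000, val_fraction=0.1)
print(len(train_idx), len(val_idx))
```

In this notebook the two index lists could be wrapped with `torch.utils.data.Subset(train_dataset, train_idx)` and `Subset(train_dataset, val_idx)` before building the `DataLoader`s, leaving `test_loader` for the final `evaluate` call only.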

In [33]:
# --- Convolutional Neural Network ---

cnn_model = MnistClassifier("cnn")
cnn_model.fit(train_loader, val_data=test_loader, epochs=100, lr=0.0008, device=device, early_stopping_patience=10)
cnn_acc = evaluate(cnn_model, test_loader, device=device)
print(f"CNN Accuracy: {cnn_acc:.4f}")

Epoch [1/100] — Train Loss: 0.1754 — Val Loss: 0.0556
Epoch [2/100] — Train Loss: 0.0786 — Val Loss: 0.0274
Epoch [3/100] — Train Loss: 0.0615 — Val Loss: 0.0250
Epoch [4/100] — Train Loss: 0.0523 — Val Loss: 0.0241
Epoch [5/100] — Train Loss: 0.0457 — Val Loss: 0.0269
Epoch [6/100] — Train Loss: 0.0390 — Val Loss: 0.0212
Epoch [7/100] — Train Loss: 0.0364 — Val Loss: 0.0175
Epoch [8/100] — Train Loss: 0.0334 — Val Loss: 0.0178
Epoch [9/100] — Train Loss: 0.0283 — Val Loss: 0.0178
Epoch [10/100] — Train Loss: 0.0270 — Val Loss: 0.0202
Epoch [11/100] — Train Loss: 0.0250 — Val Loss: 0.0168
Epoch [12/100] — Train Loss: 0.0241 — Val Loss: 0.0180
Epoch [13/100] — Train Loss: 0.0225 — Val Loss: 0.0155
Epoch [14/100] — Train Loss: 0.0193 — Val Loss: 0.0179
Epoch [15/100] — Train Loss: 0.0201 — Val Loss: 0.0220
Epoch [16/100] — Train Loss: 0.0169 — Val Loss: 0.0165
Epoch [17/100] — Train Loss: 0.0170 — Val Loss: 0.0174
Epoch [18/100] — Train Loss: 0.0149 — Val Loss: 0.0176
Epoch [19/100] — Tr

## Results Summary

| Model | Test Accuracy |
|--------|----------------|
| Random Forest | 0.9706 |
| Feed-Forward NN | 0.9854 |
| CNN | 0.9956 |

CNN achieved the highest accuracy on MNIST, confirming the advantage of convolutional architectures for image data.
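
The gaps between models are easier to read as error counts. The short calculation below converts the accuracies in the table into misclassified test images, assuming the 10,000-image test set shown in the classification report; the variable names are illustrative.

```python
test_size = 10_000
accuracies = {"Random Forest": 0.9706, "Feed-Forward NN": 0.9854, "CNN": 0.9956}

# Number of misclassified test images per model
errors = {name: round((1 - acc) * test_size) for name, acc in accuracies.items()}
for name, n_err in errors.items():
    print(f"{name}: {n_err} misclassified ({n_err / test_size:.2%})")

# Relative error reduction of the CNN compared with the Random Forest baseline
rf_to_cnn_reduction = 1 - errors["CNN"] / errors["Random Forest"]
print(f"Error reduction RF -> CNN: {rf_to_cnn_reduction:.1%}")
```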

## Edge Cases

1. **Empty input tensor** → model should raise a clear error instead of crashing.  
2. **Wrong image size (e.g., 32×32)** → should be reshaped or rejected before training.  
3. **Too few epochs** → underfitting, accuracy <90%.  
4. **Too high learning rate** → unstable training, loss oscillates.  
5. **No early stopping** → risk of overfitting after ~20 epochs.
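
For the first two cases, a shape check before calling `predict` would give a clearer message than the low-level errors raised inside the network. This is a minimal sketch, not part of the classifiers above: the helper name `validate_mnist_batch_shape` is hypothetical, and it assumes inputs are batched as `(N, 1, 28, 28)`.

```python
def validate_mnist_batch_shape(shape):
    """Raise ValueError unless shape is (N, 1, 28, 28) with N >= 1."""
    shape = tuple(shape)
    if len(shape) != 4:
        raise ValueError(f"Expected a 4D batch (N, 1, 28, 28), got shape {shape}")
    n, c, h, w = shape
    if n < 1:
        raise ValueError("Batch is empty")
    if (c, h, w) != (1, 28, 28):
        raise ValueError(f"Expected 1x28x28 images, got {c}x{h}x{w}")
    return shape

# Example: a regular batch passes, a 32x32 batch is rejected
print(validate_mnist_batch_shape((64, 1, 28, 28)))
try:
    validate_mnist_batch_shape((1, 1, 32, 32))
except ValueError as e:
    print("Rejected:", e)
```

With PyTorch tensors the check could be called as `validate_mnist_batch_shape(images.shape)`, since a tensor's shape converts to a plain tuple.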

In [34]:
# Edge case:

# 1. Empty input tensor — expect runtime error
try:
    cnn_model.predict(torch.tensor([]))
except Exception as e:
    print("Handled empty input:", e)

Handled empty input: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [0]


In [35]:
# 2. Wrong image size (32x32) — should trigger shape mismatch
wrong_input = torch.randn(1, 1, 32, 32)
try:
    cnn_model.predict(wrong_input)
except Exception as e:
    print("Handled wrong shape:", e)

Handled wrong shape: mat1 and mat2 shapes cannot be multiplied (1x8192 and 6272x256)
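
The numbers in this error message follow from the layer arithmetic. The sketch below reproduces them, assuming 3×3 convolutions with `padding=1` (which preserve height and width) and two 2×2 max-pool layers with stride 2, as in the CNN defined earlier; the helper name `flattened_features` is hypothetical.

```python
def flattened_features(height, width, channels=128, num_pools=2):
    """Number of features entering fc1.

    Assumes size-preserving convolutions and num_pools 2x2 max-pool layers,
    each of which floor-halves the height and width.
    """
    for _ in range(num_pools):
        height //= 2
        width //= 2
    return channels * height * width

print(flattened_features(28, 28))  # 6272, the input size fc1 expects
print(flattened_features(32, 32))  # 8192, what a 32x32 image produces
```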


In [36]:
# 4. Too high learning rate (lr=0.1): expect unstable training
small_cnn = MnistClassifier("cnn")
small_cnn.fit(train_loader, val_data=test_loader, epochs=3, lr=0.1, device=device)

Epoch [1/3] — Train Loss: 6.1067 — Val Loss: 2.3206
Epoch [2/3] — Train Loss: 2.3103 — Val Loss: 2.3109
Epoch [3/3] — Train Loss: 2.3095 — Val Loss: 2.3111

Restored best model (Val Loss: 2.3109)
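
A validation loss stuck near 2.31 is close to the cross-entropy of a classifier that assigns equal probability to all 10 digits, which suggests the network has collapsed to near-chance predictions at this learning rate. The calculation below is a small check of that reference value; the variable names are illustrative.

```python
import math

num_classes = 10

# Cross-entropy when every class receives probability 1 / num_classes
chance_loss = -math.log(1 / num_classes)
print(f"Chance-level loss: {chance_loss:.4f}")

# Validation losses from the three epochs logged above
observed_val_losses = [2.3206, 2.3109, 2.3111]
gaps = [round(loss - chance_loss, 4) for loss in observed_val_losses]
print("Gap to chance level:", gaps)
```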
