# 👩‍💻 Classifying Handwritten Digits Using SVMs

## 📋 Overview
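In this lab, you'll train **Support Vector Machine (SVM)** classifiers with both **linear** and **RBF kernels** to classify handwritten digits. The data is scikit-learn's built-in **digits dataset** (`load_digits()`), a small MNIST-style collection of 1,797 images at 8×8 pixels, not the full 28×28 MNIST set. You'll evaluate, tune, and compare the models to understand the critical role of kernel selection in SVM performance.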

## 🎯 Learning Outcomes
By the end of this lab, you will be able to:

- Load and prepare image-based numerical data

- Train and evaluate linear and RBF kernel-SVMs

- Visualize and compare classification results

- Tune hyperparameters for SVM performance improvement


## Task 1: Load the Digits Dataset
**Context:** Understand how digit images are represented numerically.

**Steps:**
1. Load the digits dataset from Scikit-learn.
2. Preview the shape of `X` (features) and `y` (target labels).
3. Visualize a few digit images with `matshow()`.

**Prompting Questions:**

- What does each row in `X` represent?
- How is pixel intensity encoded?

💡 **Tip:** Pixel values in this dataset range from 0 to 16. We'll normalize them in Task 3, after splitting the data.
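
As a quick check on the prompting questions, the sketch below inspects how the images are stored. It assumes the `load_digits()` bundle from scikit-learn, where `digits.images` holds the 8×8 arrays and `digits.data` holds the same pixels flattened into 64 columns. The variable `same_pixels` is introduced here only for illustration.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
X, y = digits.data, digits.target

# Each row of X is one image flattened from 8x8 to 64 values
print("X shape:", X.shape)
print("y shape:", y.shape)
print("images shape:", digits.images.shape)

# Row 0 of X should match the first image read row by row
same_pixels = np.array_equal(digits.images[0].ravel(), X[0])
print("Row 0 equals flattened image 0:", same_pixels)

# Intensities are small integer values stored as floats
print("Pixel range:", X.min(), "to", X.max())
print("Classes:", np.unique(y))
```

Reshaping any row with `X[i].reshape(8, 8)` recovers an array you can pass to `matshow()`.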

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, ConfusionMatrixDisplay

# Load scikit-learn's 8x8 digits dataset (an MNIST-style dataset, not full MNIST)
digits = datasets.load_digits()

# Features and target
X = digits.data
y = digits.target

# Preview dataset
# <your code here>

## Task 2: Splitting Data
**Context:** Separate data into training and testing subsets.

**Steps:**

1. Use `train_test_split()` to divide data (e.g., 80% train, 20% test).
2. Use stratification (`stratify=y`) to maintain balanced class distributions.

**Prompting Questions:**

- Why is stratified sampling important in classification tasks?

💡 **Tip:** Set `random_state=42` for reproducibility.
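
To see what stratification buys you, the following sketch compares class proportions across the full data, the training split, and the test split. It is one possible check, not part of the required steps. The names `full_frac`, `train_frac`, `test_frac`, and `max_gap` are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_digits(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fraction of each digit class in the full data, train split, and test split
full_frac = np.bincount(y) / len(y)
train_frac = np.bincount(y_train) / len(y_train)
test_frac = np.bincount(y_test) / len(y_test)

for digit in range(10):
    print(f"{digit}: full={full_frac[digit]:.3f} "
          f"train={train_frac[digit]:.3f} test={test_frac[digit]:.3f}")

# Largest gap between test and full-data class proportions
max_gap = np.abs(test_frac - full_frac).max()
print("Max proportion gap:", round(max_gap, 4))
```

Re-running without `stratify=y` and comparing `max_gap` is a simple way to observe how much the class balance can drift in a purely random split.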

In [None]:
# Split the data into training and testing sets

## Task 3: Data Preparation
**Context:** Normalize features to help SVM training.

**Steps:**

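1. Fit `StandardScaler` on the training set (it centers each feature to mean 0 and scales it to unit variance), then transform both the training and test sets with it.
2. Keep in mind why: standardization stops features with large ranges from dominating the kernel's distance computations, which gives more stable decision boundaries.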

**Prompting Questions:**

- Are features roughly normalized?
- Why does SVM performance improve with scaling?

💡 **Tip:** Either `StandardScaler` or simple division by 16 can work. SVMs are sensitive to feature scales.
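
The arithmetic behind standardization can be sketched with NumPy alone. This is a hand-rolled illustration on a made-up toy matrix, not the `StandardScaler` implementation. The constant third column stands in for pixels that never vary, and the `std_safe` guard is an assumption added here to avoid dividing by zero.

```python
import numpy as np

# Toy "pixel" matrix: 4 samples x 3 features; the last column is constant
X_train = np.array([[0.0, 4.0, 0.0],
                    [8.0, 8.0, 0.0],
                    [16.0, 12.0, 0.0],
                    [8.0, 16.0, 0.0]])
X_test = np.array([[4.0, 10.0, 0.0]])

# Statistics come from the training data only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
std_safe = np.where(std == 0, 1.0, std)  # leave constant columns unscaled

X_train_scaled = (X_train - mean) / std_safe
X_test_scaled = (X_test - mean) / std_safe  # reuse the training statistics

print("Train means:", X_train_scaled.mean(axis=0))
print("Train stds: ", X_train_scaled.std(axis=0))
print("Scaled test row:", X_test_scaled[0])
```

Reusing the training `mean` and `std` for the test row is the key design choice: the test set is treated as unseen data and never contributes to the scaling statistics.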

In [None]:
# Normalize the pixel intensity features

## Task 4: Train SVM with a Linear Kernel
**Context:** Baseline SVM model with linear decision boundaries.

**Steps:**

1. Create an `SVC(kernel='linear')`.
2. Fit the model to the scaled training data.
3. Optionally, measure the training time.

**Prompting Questions:**

- How fast was training with a linear kernel?
- How complex are the decision boundaries?
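
One way to answer the timing question is a small helper built on `time.perf_counter()`. The function `timed_fit` below is a hypothetical helper written for this sketch, and the split and scaling steps simply repeat Tasks 2 and 3 so the block runs on its own.

```python
import time
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)

def timed_fit(model, X, y):
    """Fit the model and return (fitted_model, seconds_elapsed)."""
    start = time.perf_counter()
    model.fit(X, y)
    return model, time.perf_counter() - start

svc_linear, linear_seconds = timed_fit(SVC(kernel='linear'), X_train_scaled, y_train)
print(f"Linear kernel fit time: {linear_seconds:.3f} s")

# The support-vector count is a rough proxy for boundary complexity
print("Support vectors per class:", svc_linear.n_support_)
```

The same helper can wrap `SVC(kernel='rbf')` later, which makes the two training times directly comparable. Timings vary by machine, so treat a single run as indicative only.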

In [None]:
# Train an SVM with a linear kernel

## Task 5: Evaluate Linear SVM
**Context:** Assess how well the linear SVM performs.

**Steps:**

1. Predict labels for the test set.
2. Calculate accuracy and F1-score.
3. Generate a confusion matrix.

**Prompting Questions:**

- Which digits are most commonly confused?
- Is overall performance satisfactory?

💡 **Tip:** `classification_report()` gives precision, recall, and F1 easily.
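
To find the most commonly confused digits without reading the matrix by eye, you can rank its off-diagonal cells. The helper `most_confused_pairs` and the toy matrix below are illustrative assumptions. In the lab you would pass the output of `confusion_matrix(y_test, y_pred)`, whose rows are true labels and whose columns are predictions.

```python
import numpy as np

def most_confused_pairs(cm, top_k=3):
    """Return (true_label, predicted_label, count) for the largest off-diagonal cells."""
    errors = np.array(cm, dtype=int)
    np.fill_diagonal(errors, 0)  # ignore correct predictions
    flat_order = np.argsort(errors, axis=None)[::-1][:top_k]
    rows, cols = np.unravel_index(flat_order, errors.shape)
    return [(int(r), int(c), int(errors[r, c]))
            for r, c in zip(rows, cols) if errors[r, c] > 0]

# Toy 3-class confusion matrix (rows = true labels, columns = predictions)
toy_cm = [[50, 2, 0],
          [1, 45, 4],
          [0, 0, 48]]

pairs = most_confused_pairs(toy_cm)
print(pairs)  # [(1, 2, 4), (0, 1, 2), (1, 0, 1)]
```

Here the largest error cell says that true class 1 was predicted as class 2 four times.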

In [None]:
# Evaluate the linear kernel SVM

## Task 6: Train SVM with an RBF Kernel
**Context:** Move to a more flexible, non-linear SVM.

**Steps:**

1. Create an `SVC(kernel='rbf')`.
2. Optionally set parameters such as `C=1.0` and `gamma='scale'` as starting values.
3. Fit the model to the scaled training data.

**Prompting Questions:**

- Was RBF slower or faster to train?
- Does the RBF kernel capture non-linear patterns better?
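
It can help to see what the RBF kernel actually computes. The sketch below evaluates the kernel formula exp(-gamma * ||x - z||²) by hand, and reproduces the value that `gamma='scale'` resolves to in scikit-learn, namely `1 / (n_features * X.var())`. The function names and the toy points are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel_value(x, z, gamma):
    """RBF similarity: exp(-gamma * ||x - z||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def gamma_scale(X):
    """Value that gamma='scale' resolves to: 1 / (n_features * X.var())."""
    X = np.asarray(X, dtype=float)
    return 1.0 / (X.shape[1] * X.var())

X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
gamma = gamma_scale(X_toy)

same = rbf_kernel_value(X_toy[0], X_toy[0], gamma)
near = rbf_kernel_value(X_toy[0], X_toy[1], gamma)
far = rbf_kernel_value(X_toy[0], X_toy[2], gamma)

print("gamma:", gamma)
print("same point:", same)
print("near point:", near)
print("far point: ", far)
```

Similarity is 1 for identical points and decays toward 0 with distance. A larger `gamma` makes that decay faster, so each training point influences a smaller neighborhood.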

In [None]:
# Train an SVM with an RBF kernel

## Task 7: Evaluate and Compare SVM Kernels 
**Context:** Compare performance between linear and RBF kernels.

**Steps:**

1. Predict with the RBF model on test data.
2. Calculate accuracy and F1-score again.
3. Compare confusion matrices between kernels.

**Prompting Questions:**

- Which digits benefit most from non-linear decision boundaries?
- Is there a noticeable improvement?
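
A per-class comparison can make the first question concrete. The sketch below trains both kernels with default settings and ranks digits by F1 gain, using `f1_score(..., average=None)` to get one score per class. The names `per_class_f1` and `f1_gain` are illustrative, and the preparation steps repeat Tasks 2 and 3 so the block is self-contained.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# One F1 score per digit class for each kernel
per_class_f1 = {}
for kernel in ("linear", "rbf"):
    model = SVC(kernel=kernel).fit(X_train_scaled, y_train)
    y_pred = model.predict(X_test_scaled)
    per_class_f1[kernel] = f1_score(y_test, y_pred, average=None)

# Positive gain means the RBF kernel did better on that digit
f1_gain = per_class_f1["rbf"] - per_class_f1["linear"]
for digit in np.argsort(f1_gain)[::-1]:
    print(f"digit {digit}: linear={per_class_f1['linear'][digit]:.3f} "
          f"rbf={per_class_f1['rbf'][digit]:.3f} gain={f1_gain[digit]:+.3f}")
```

With only 360 test images, a difference of one or two predictions can shift a class's F1 noticeably, so small gains should be read cautiously.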

In [None]:
# Evaluate and compare the RBF kernel SVM

## Task 8: Tune Hyperparameters (Optional Extension)
**Context:** Further boost performance with careful tuning.

**Steps:**

1. Use `GridSearchCV` to search over `C` and `gamma` values.
2. Find the best model and re-evaluate on test data.

**Prompting Questions:**

- How sensitive is model performance to C and gamma?
- Is the search space wide enough?

💡 **Tip:** Try `C: [0.1, 1, 10]` and `gamma: ['scale', 0.01, 0.001]` for a quick grid.
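
One optional variation on the grid search is to wrap the scaler and the classifier in a `Pipeline`, so the scaler is refit on each cross-validation fold rather than once on the whole training set. This is an alternative sketch, not the approach the lab requires. The step names `"scaler"` and `"svc"` are arbitrary choices, and they determine the `svc__` prefix in the parameter grid.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling happens inside the pipeline, so pass the unscaled data
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(kernel="rbf")),
])

# Parameters of a pipeline step are addressed as <step name>__<parameter>
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.001],
}

grid_search = GridSearchCV(pipeline, param_grid, cv=3)
grid_search.fit(X_train, y_train)

print("Best parameters:", grid_search.best_params_)
print("Best CV accuracy:", grid_search.best_score_)

# score() uses the best pipeline, refit on the full training set
test_accuracy = grid_search.score(X_test, y_test)
print("Test accuracy:", test_accuracy)
```

Inspecting `grid_search.cv_results_["mean_test_score"]` alongside `cv_results_["params"]` shows how sensitive accuracy is to each `C` and `gamma` combination.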

In [None]:
# Tune hyperparameters for the RBF kernel SVM

## Task 9: Reflect on Kernel Impact
**Context:** Understand why kernels matter.

**Steps:**

1. Consider advantages and trade-offs of linear vs RBF kernels.
2. Reflect on model performance vs computational cost.

**Prompting Questions:**

- When would you prefer a linear kernel?
- When is RBF worth the extra computational cost?

In [None]:
# Reflect on Kernel Impact

## ✅ Success Checklist
- Dataset loaded and normalized
- Train/test split completed
- Linear SVM trained, evaluated, confusion matrix created
- RBF SVM trained, evaluated, confusion matrix created
- Comparison and reflection documented

## 🔍 Common Issues & Solutions
**Problem:** Training takes too long

 **Solution:** Try a smaller subset or adjust hyperparameters (especially C and gamma)
 
**Problem:** Poor accuracy with linear kernel

 **Solution:** RBF may better capture non-linear patterns in image data
 
**Problem:** Confusion matrix hard to read

 **Solution:** Use `ConfusionMatrixDisplay` from `sklearn.metrics`

## 🔑 Key Points
- SVMs with linear kernels are faster but limited in flexibility
- RBF kernels capture complex relationships but need careful tuning
- Scaling features is critical for SVM performance

## 💻 Exemplar Solution
After completing this activity (or if you get stuck!), review the exemplar solution below.
Many problems have more than one valid solution, so treat it as a learning tool: compare its techniques with your own approach and note what you could refine.


<details>

<summary><strong>Click HERE to see an exemplar solution</strong></summary>    
    
```python

# -------------------------------
# Task 1: Load and Explore the Digits Dataset
# -------------------------------

import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, ConfusionMatrixDisplay

# Load the digits dataset
digits = datasets.load_digits()

# Features and labels
X = digits.data
y = digits.target

# Preview the dataset
print(X.shape)  # (1797, 64)
print(y.shape)  # (1797,)
# Pass the colormap directly; calling plt.gray() first can open an extra empty figure
plt.matshow(digits.images[0], cmap='gray')
plt.title(f"Label: {digits.target[0]}")
plt.show()

# -------------------------------
# Task 2: Splitting Data
# -------------------------------

# Split into training and testing sets (train_test_split was imported in Task 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y  # unscaled X; the scaler is fit after the split
)
print("Training set shape:", X_train.shape) # Optional prints
print("Testing set shape:", X_test.shape)

# -------------------------------
# Task 3: Data Preparation (Normalize Features)
# -------------------------------

# Normalize the data using StandardScaler
scaler = StandardScaler()
# Fit the scaler ONLY on the training data and transform training data
X_train_scaled = scaler.fit_transform(X_train)
# Transform the test data using the scaler fitted on training data
X_test_scaled = scaler.transform(X_test)

# Optional: Print shapes after scaling (should be the same)
print("Training set shape after scaling:", X_train_scaled.shape)
print("Testing set shape after scaling:", X_test_scaled.shape)


# -------------------------------
# Task 4: Train SVM with a Linear Kernel
# -------------------------------

# Train SVM with a linear kernel
svc_linear = SVC(kernel='linear', random_state=42)
# Fit using the SCALED training data
svc_linear.fit(X_train_scaled, y_train)

# -------------------------------
# Task 5: Evaluate Linear SVM
# -------------------------------

# Predict and evaluate using the SCALED test data
y_pred_linear = svc_linear.predict(X_test_scaled)

# Accuracy and Classification Report
print("\nLinear Kernel SVM Accuracy:", accuracy_score(y_test, y_pred_linear))
print(classification_report(y_test, y_pred_linear))

# Confusion Matrix
cm_linear = confusion_matrix(y_test, y_pred_linear)
# Use the display class directly
disp = ConfusionMatrixDisplay(confusion_matrix=cm_linear)
disp.plot(cmap='Blues')
plt.title('Confusion Matrix - Linear SVM')
plt.show()

# -------------------------------
# Task 6: Train SVM with an RBF Kernel
# -------------------------------

# Train SVM with RBF kernel
# C=1.0 and gamma='scale' are already the SVC defaults, so this is equivalent to
# SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42)
svc_rbf = SVC(kernel='rbf', random_state=42)

# Fit using the SCALED training data
svc_rbf.fit(X_train_scaled, y_train)

# -------------------------------
# Task 7: Evaluate and Compare SVM Kernels
# -------------------------------

# Predict and evaluate RBF SVM using the SCALED test data
y_pred_rbf = svc_rbf.predict(X_test_scaled)

# Accuracy and Classification Report
print("\nRBF Kernel SVM Accuracy:", accuracy_score(y_test, y_pred_rbf))
print(classification_report(y_test, y_pred_rbf))

# Confusion Matrix
cm_rbf = confusion_matrix(y_test, y_pred_rbf)
# Use the display class directly
disp = ConfusionMatrixDisplay(confusion_matrix=cm_rbf)
disp.plot(cmap='Purples')
plt.title('Confusion Matrix - RBF SVM')
plt.show()

# -------------------------------
# Task 8: Tune Hyperparameters (Optional Extension)
# -------------------------------

# Set up grid search for hyperparameter tuning
param_grid = {
    'C': [0.1, 1, 10],
    'gamma': ['scale', 0.01, 0.001]
}

# Instantiate GridSearchCV with RBF kernel SVM
# Note: X_train_scaled was scaled with statistics from the whole training set, so each
# validation fold has slightly influenced the scaling. This is fine for a quick search;
# wrapping StandardScaler and SVC in a Pipeline would refit the scaler per fold instead.
grid_search = GridSearchCV(SVC(kernel='rbf', random_state=42), param_grid, cv=3)

# Fit Grid Search using the SCALED training data
grid_search.fit(X_train_scaled, y_train)

# Best parameters and performance
print("\nBest parameters from Grid Search:", grid_search.best_params_)
print("Best cross-validation score (on training data):", grid_search.best_score_)

# Evaluate best model found by Grid Search using the SCALED test data
best_model = grid_search.best_estimator_
y_pred_best = best_model.predict(X_test_scaled)

print("\nTuned RBF SVM Test Accuracy:", accuracy_score(y_test, y_pred_best))
print(classification_report(y_test, y_pred_best))

# -------------------------------
# Task 9: Reflect on Kernel Impact
# -------------------------------

"""Reflections:
- Linear kernel SVM performed reasonably well (~95% accuracy).
- RBF kernel significantly improved classification (~98%+ accuracy).
- Non-linear kernels like RBF capture more complex boundaries between digit classes.
- RBF kernel requires more computational power but achieves better results.
- Hyperparameter tuning (C and gamma) further optimized the RBF model for maximum accuracy.
- Visualized confusion matrices show that some digits (e.g., 3 vs 5) were better separated with RBF than linear.
"""

```