<img src="../images/cover.jpg" width="1920"/>

# Advanced Supervised Learning: Multi-class Classification

## Introduction
Multi-class classification is a supervised learning task where the model needs to classify instances into one of three or more classes. Unlike binary classification, the output has more than two possible class labels.

### Iris Flower Classification
The classic Iris dataset is a perfect example of multi-class classification:
- Input features: sepal length, sepal width, petal length, petal width
- Output classes: Setosa, Versicolor, Virginica

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd

In [None]:
# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names
target_names = iris.target_names

In [None]:
# Create a DataFrame for better visualization
iris_df = pd.DataFrame(X, columns=feature_names)
iris_df["species"] = pd.Categorical.from_codes(y, target_names)
iris_df["target"] = y
# or
# iris_df["species"] = [target_names[y_] for y_ in y]

In [None]:
print("Dataset Shape:", X.shape)
print("\nFeature Names:", feature_names)
print("\nTarget Classes:", target_names)
print("\nClass Distribution:")
print(iris_df["species"].value_counts())

In [None]:
iris_df.head()

## 1. Support Vector Machines (SVM)

### Introduction
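Support Vector Machines are powerful algorithms that find the maximum-margin hyperplane separating the classes. A single SVM is a binary classifier, so multi-class problems are handled by combining several binary SVMs with one of two strategies (scikit-learn's `SVC` uses One-vs-One internally):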
- One-vs-One (OvO): Creates binary classifiers for each pair of classes
- One-vs-Rest (OvR): Creates binary classifiers for each class against all others
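
The difference between the two strategies shows up in how many binary classifiers get trained. The sketch below is only an illustration: it uses a synthetic 4-class dataset (an assumption, not the Iris data) and scikit-learn's `OneVsOneClassifier` and `OneVsRestClassifier` wrappers so the count can be inspected directly.

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic 4-class problem (hypothetical data, used only for illustration)
X_demo, y_demo = make_classification(
    n_samples=200,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)

# Wrap a binary linear SVM in each multi-class strategy
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X_demo, y_demo)
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X_demo, y_demo)

n_classes = len(set(y_demo))
# OvO trains one classifier per pair of classes: n_classes * (n_classes - 1) / 2
print("OvO classifiers:", len(ovo.estimators_))
# OvR trains one classifier per class
print("OvR classifiers:", len(ovr.estimators_))
```

With three classes, as in Iris, both strategies train three classifiers, so the difference only becomes visible with four or more classes.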

<img src="../images/suport_vector_machine.png" width="1920"/>

### When to Use SVM
- Medium-sized datasets (up to ~10,000 samples)
- High-dimensional data
- Complex decision boundaries needed
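- When class predictions or decision scores are sufficient (SVMs do not produce probability estimates natively; `SVC(probability=True)` adds them through an extra, slower calibration step)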
- When you have non-linear relationships (using kernels)

[More on SVM](https://www.geeksforgeeks.org/support-vector-machine-algorithm/)

In [None]:
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

len(X_train), len(X_test)

In [None]:
# Standardize the data: fit the scaler on the training set only,
# then apply the same transformation to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [None]:
# Initialize and train the SVM model
svm_model = SVC(C=100, kernel="rbf", gamma="auto", random_state=42)
svm_model.fit(X_train, y_train)

In [None]:
# Make predictions
y_pred = svm_model.predict(X_test)

In [None]:
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

In [None]:
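# Perform 5-fold cross-validation on the full dataset.
# Note: cross_val_score refits a clone of the model on the raw (unscaled) X,
# so these scores are not directly comparable to the scaled train/test accuracy above.
cv_scores = cross_val_score(svm_model, X, y, cv=5, scoring="accuracy")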

In [None]:
# Print results
print("Cross-validation scores:", cv_scores)
print("Mean accuracy:", np.mean(cv_scores))

`GridSearchCV` in Scikit-learn is a hyperparameter optimization technique that systematically works through multiple combinations of hyperparameter values to find the optimal configuration for a machine learning model. It is essentially a brute-force approach to hyperparameter tuning.
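
To get a feel for what "brute force" means here, the number of model fits can be counted before running the search. The sketch below uses scikit-learn's `ParameterGrid` on the same grid as this section's SVM search; the names `demo_grid`, `n_candidates`, and `n_folds` are illustrative only.

```python
from sklearn.model_selection import ParameterGrid

# Same hyperparameter grid as the SVM search in this section
demo_grid = {
    "C": [0.1, 1, 10, 100],
    "kernel": ["sigmoid", "poly", "rbf"],
    "gamma": [0.1, 0.2, 0.01, "scale", "auto"],
}

# Every combination of values is one candidate configuration
n_candidates = len(ParameterGrid(demo_grid))  # 4 * 3 * 5
n_folds = 5

print("Candidate configurations:", n_candidates)
print("Model fits during the search:", n_candidates * n_folds)
```

The cost grows multiplicatively with each added hyperparameter, which is why grids are usually kept small.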

In [None]:
# Create the base SVM classifier (GridSearchCV fits it for every candidate)
model = SVC()

# Perform grid search
param_grid = {
    "C": [0.1, 1, 10, 100],
    "kernel": ["sigmoid", "poly", "rbf"],
    "gamma": [
        0.1,
        0.2,
        0.01,
        "scale",
        "auto",
    ],
}

grid_search = GridSearchCV(model, param_grid, cv=5, scoring="accuracy", verbose=1)

grid_search.fit(X, y)

print("Best parameters:", grid_search.best_params_)
print("Best cross-validation score:", grid_search.best_score_)

In [None]:
svm_model = SVC(C=0.1, gamma="scale", kernel="poly", random_state=42)
svm_model.fit(X_train, y_train)

In [None]:
# Perform cross-validation
cv_scores = cross_val_score(svm_model, X, y, cv=5, scoring="accuracy")

In [None]:
# Print results
print("Cross-validation scores:", cv_scores)
print("Mean accuracy:", np.mean(cv_scores))

In [None]:
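# Recompute predictions with the refitted model before building the confusion matrix
y_pred = svm_model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)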
plt.figure(figsize=(10, 8))
sns.heatmap(
    cm,
    annot=True,
    fmt="d",
    cmap="Blues",
    xticklabels=target_names,
    yticklabels=target_names,
)
plt.title("Confusion Matrix")
plt.ylabel("True Label")
plt.xlabel("Predicted Label")
plt.show()

This confusion matrix visualizes the performance of the model on three classes: "setosa," "versicolor," and "virginica." Here's how to interpret each part of it:

**True Positive Values (Diagonal)**:
   - The values on the diagonal (from top left to bottom right) represent correctly classified instances for each class.
   - **Setosa**: 23 instances were correctly classified as "setosa."
   - **Versicolor**: 19 instances were correctly classified as "versicolor."
   - **Virginica**: 17 instances were correctly classified as "virginica."

**False Positive and False Negative Values (Off-Diagonal)**:
   - The off-diagonal values show misclassifications:
     - 1 instance of "versicolor" was misclassified as "virginica."
     - No "setosa" or "virginica" instances were misclassified.

The model performed well on "setosa" and "virginica," with no misclassifications in these categories. But there was a slight error in classifying "versicolor," with one instance being misclassified as "virginica."
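
Per-class recall and precision can be read directly off a confusion matrix. The sketch below rebuilds the matrix from the counts quoted above (rows are true classes, columns are predicted classes); the variable names are illustrative.

```python
import numpy as np

# Confusion matrix rebuilt from the counts described in the text
# (rows = true class, columns = predicted class)
cm_demo = np.array(
    [
        [23, 0, 0],  # setosa
        [0, 19, 1],  # versicolor: one sample predicted as virginica
        [0, 0, 17],  # virginica
    ]
)
class_names = ["setosa", "versicolor", "virginica"]

correct = np.diag(cm_demo)
recall_demo = correct / cm_demo.sum(axis=1)  # correct / all true members of the class
precision_demo = correct / cm_demo.sum(axis=0)  # correct / all predictions of the class
accuracy_demo = correct.sum() / cm_demo.sum()

for name, r, p in zip(class_names, recall_demo, precision_demo):
    print(f"{name}: recall={r:.3f}, precision={p:.3f}")
print(f"accuracy={accuracy_demo:.3f}")
```

A single off-diagonal entry lowers the recall of its row's class and the precision of its column's class at the same time.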

## 2. K-Nearest Neighbors (KNN)

### Introduction
K-Nearest Neighbors is a simple but effective algorithm that classifies instances based on the majority class of their k nearest neighbors in the feature space.
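
The majority-vote idea fits in a few lines of NumPy. This is a minimal sketch for intuition, not scikit-learn's implementation: `knn_predict` and the toy data are hypothetical, distances are Euclidean, and ties are broken by whichever class is encountered first among the nearest neighbors.

```python
from collections import Counter

import numpy as np


def knn_predict(X_ref, y_ref, x_query, k=3):
    """Predict the label of x_query by majority vote among its k nearest points."""
    # Euclidean distance from the query to every reference point
    distances = np.linalg.norm(X_ref - x_query, axis=1)
    # Indices of the k closest reference points
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbors
    votes = Counter(y_ref[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters (hypothetical values)
X_toy = np.array(
    [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
)
y_toy = np.array([0, 0, 0, 1, 1, 1])

pred_low = knn_predict(X_toy, y_toy, np.array([1.1, 1.0]), k=3)
pred_high = knn_predict(X_toy, y_toy, np.array([5.0, 5.0]), k=3)
print(pred_low, pred_high)
```

Because every prediction measures the distance to all stored points, the training step is trivial while prediction cost grows with the size of the training set.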


<img src="../images/k_nearest_neighbors.png" width="1920"/>

### When to Use KNN
- Small to medium-sized datasets
- When you need a non-parametric model
- When the training data is moderately noisy (larger values of $k$ smooth out the effect of individual noisy points)
- When you want an interpretable model
- When computational resources during training are limited

### Implementation

In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from matplotlib import pyplot as plt

In [None]:
# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

len(X_train), len(X_test)

The code below evaluates a K-Nearest Neighbors (KNN) classifier for different values of $k$ to identify the $k$ that maximizes accuracy. It:

- Iterates over a range of $k$ values.
- Trains and tests the model for each $k$.
- Stores the accuracy for each $k$ in a list.
- Determines and prints the $k$ with the highest accuracy.
- Plots accuracy against $k$ to visualize the results.

In [None]:
# Define a range of odd values for the hyperparameter k (number of neighbors)
k_range = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21]

In [None]:
k_scores = []
for k in k_range:
    knn_model = KNeighborsClassifier(n_neighbors=k)
    knn_model.fit(X_train, y_train)
    y_pred = knn_model.predict(X_test)
    k_scores.append(accuracy_score(y_test, y_pred))

optimal_k = k_range[np.argmax(k_scores)]
print(f"Optimal k value: {optimal_k}")


plt.figure(figsize=(10, 6))
plt.plot(k_range, k_scores, "bo-")
plt.xlabel("K Value")
plt.ylabel("Accuracy Score")
plt.title("Accuracy vs K Value")
plt.grid(True)
plt.show()

This time we use cross-validation to assess and optimize a key hyperparameter $k$ for the KNN algorithm, providing a more reliable evaluation compared to a single train-test split.

In [None]:
k_scores = []
for k in k_range:
    knn_model = KNeighborsClassifier(n_neighbors=k)
    # Perform cross-validation
    cv_scores = cross_val_score(knn_model, X, y, cv=5, scoring="accuracy")
    cv_score = np.mean(cv_scores)
    k_scores.append(cv_score)

optimal_k = k_range[np.argmax(k_scores)]
print(f"Optimal k value: {optimal_k}")

plt.figure(figsize=(10, 6))
plt.plot(k_range, k_scores, "bo-")
plt.xlabel("K Value")
plt.ylabel("Accuracy Score")
plt.title("Accuracy vs K Value")
plt.grid(True)
plt.show()

Using Grid Search to find the best model hyperparameters:

In [None]:
# Build a pipeline that scales the features before the KNN classifier
pipeline = Pipeline([("scaler", StandardScaler()), ("knn", KNeighborsClassifier())])

# Perform hyperparameter tuning
param_grid = {
    "knn__n_neighbors": k_range,
    "knn__weights": ["uniform", "distance"],
    # Note: "minkowski" with the default p=2 is equivalent to "euclidean"
    "knn__metric": ["euclidean", "manhattan", "minkowski"],
}

grid_search = GridSearchCV(
    pipeline, param_grid, cv=5, scoring="accuracy", n_jobs=-1, verbose=1
)
grid_search.fit(X_train, y_train)

print("Best parameters:", grid_search.best_params_)
print("Best cross-validation score:", grid_search.best_score_)

In [None]:
knn_model = KNeighborsClassifier(n_neighbors=17, weights="distance", metric="euclidean")
knn_model.fit(X_train, y_train)

In [None]:
# Make predictions
y_pred = knn_model.predict(X_test)

# Print classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=target_names))

The model performs excellently overall, with:

- Perfect classification of setosa
- Slight confusion between versicolor and virginica
- High overall accuracy at 97%

The lower recall for versicolor (0.90) and the lower precision for virginica (0.91) both stem from the same single error: 1 of the 10 versicolor test samples was predicted as virginica, so versicolor recall is 9/10 and virginica precision is 10/11.

## Model Comparison and Best Practices

### 1. Algorithm Selection Guidelines
- **SVM**:
  - Better for complex decision boundaries
  - Works well with high-dimensional data
  - Requires careful parameter tuning
  - More computationally intensive

- **KNN**:
  - Simple and interpretable
  - Works well with low-dimensional data
  - Sensitive to feature scaling
  - Requires less parameter tuning
  - Memory-intensive during prediction

### 2. Cross-validation Best Practices
1. Use stratified k-fold for imbalanced classes
2. Choose appropriate number of folds (5-10 typically)
3. Consider computational resources
4. Use multiple metrics for evaluation
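
These practices can be combined in one call. The sketch below is one possible setup, assuming a scaled KNN pipeline with `n_neighbors=5` (an arbitrary choice) and two metrics; it uses scikit-learn's `StratifiedKFold` and `cross_validate`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_cv, y_cv = load_iris(return_X_y=True)

# Scaling lives inside the pipeline, so it is refit on each training fold
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# Stratified folds keep the class proportions the same in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Evaluate several metrics in a single pass over the folds
results = cross_validate(clf, X_cv, y_cv, cv=cv, scoring=["accuracy", "f1_macro"])

for metric in ("accuracy", "f1_macro"):
    scores = results[f"test_{metric}"]
    print(f"{metric}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the spread across folds alongside the mean gives a sense of how stable the estimate is.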

### 3. Hyperparameter Tuning Tips
1. Start with broad parameter ranges
2. Use RandomizedSearchCV for initial search
3. Refine with GridSearchCV in promising regions
4. Consider trade-offs between performance and complexity
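
A coarse-to-fine search along these lines might look like the sketch below. The search ranges, `n_iter=20`, and the refinement factors are assumptions chosen for illustration, not recommended values.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_hp, y_hp = load_iris(return_X_y=True)
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])

# Step 1: broad random search over log-scaled ranges
coarse = RandomizedSearchCV(
    pipe,
    {"svc__C": loguniform(1e-2, 1e3), "svc__gamma": loguniform(1e-4, 1e1)},
    n_iter=20,
    cv=5,
    scoring="accuracy",
    random_state=42,
)
coarse.fit(X_hp, y_hp)
best_C = coarse.best_params_["svc__C"]
best_gamma = coarse.best_params_["svc__gamma"]

# Step 2: small grid centered on the best random candidate
factors = (0.5, 1.0, 2.0)
fine = GridSearchCV(
    pipe,
    {
        "svc__C": [best_C * f for f in factors],
        "svc__gamma": [best_gamma * f for f in factors],
    },
    cv=5,
    scoring="accuracy",
)
fine.fit(X_hp, y_hp)

print("Coarse best:", coarse.best_params_, coarse.best_score_)
print("Fine best:  ", fine.best_params_, fine.best_score_)
```

Because the fine grid contains the best coarse candidate and uses the same folds, its best score cannot be lower than the coarse one; whether it is meaningfully higher depends on the data.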

### 4. Feature Engineering Considerations
1. Scale features appropriately
2. Handle categorical variables
3. Consider dimensionality reduction
4. Remove or handle outliers
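
Iris has only numeric features, so the categorical step never comes up in this notebook. As a sketch of how scaling and categorical handling can be combined, the example below uses a small made-up table (the `habitat` column and all values are hypothetical) with scikit-learn's `ColumnTransformer`.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: one numeric and one categorical feature
demo_df = pd.DataFrame(
    {
        "petal_length": [1.4, 4.7, 6.0, 5.1],
        "habitat": ["wet", "dry", "dry", "wet"],
    }
)

preprocessor = ColumnTransformer(
    [
        # Standardize numeric columns to zero mean and unit variance
        ("num", StandardScaler(), ["petal_length"]),
        # Expand each category into its own 0/1 column
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["habitat"]),
    ]
)

features = preprocessor.fit_transform(demo_df)
print(features.shape)  # one scaled column plus one column per category
```

Placing such a preprocessor as the first step of a `Pipeline` keeps the transformations inside cross-validation folds.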

### 5. Model Evaluation Checklist
1. Check for overfitting
2. Examine confusion matrix
3. Consider per-class performance
4. Look at misclassified examples
5. Validate on hold-out test set
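
Part of this checklist can be scripted. The sketch below, assuming a scaled KNN pipeline with `n_neighbors=5` (an arbitrary choice), compares training and test accuracy as a rough overfitting check and lists the misclassified test samples for inspection.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris_eval = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris_eval.data,
    iris_eval.target,
    test_size=0.2,
    random_state=42,
    stratify=iris_eval.target,
)

model_eval = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model_eval.fit(X_tr, y_tr)

# A large gap between training and test accuracy hints at overfitting
train_acc = model_eval.score(X_tr, y_tr)
test_acc = model_eval.score(X_te, y_te)
print(f"Train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")

# Look at the individual test samples the model got wrong
y_hat = model_eval.predict(X_te)
wrong = np.flatnonzero(y_hat != y_te)
for i in wrong:
    print(
        X_te[i],
        "true:", iris_eval.target_names[y_te[i]],
        "predicted:", iris_eval.target_names[y_hat[i]],
    )
```

Inspecting the feature values of the misclassified samples often shows whether errors cluster near a class boundary.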