# Multiclass Classification
Many classifiers are formulated for binary problems. When a classification problem involves three or more classes, we use multiclass classification strategies that either adapt binary models or employ algorithms that inherently handle multiple classes.

## 1. Strategies for Binary Classifiers (Decomposition Methods)
### 1.1. One-vs-Rest (OvR) or One-vs-All (OvA)
This is the most common and simplest strategy. If you have $N$ classes, OvR trains $N$ separate binary classifiers.
- Training: Each classifier is trained to distinguish one class (the "Positive" class) from all other classes combined (the "Negative" class).
  - Example (Classes A, B, C): Train three models: Model 1 (A vs. B and C), Model 2 (B vs. A and C), Model 3 (C vs. A and B).
- Prediction: A new sample is run through all $N$ models, and the final prediction is the class for which the corresponding model outputs the highest confidence score or probability.
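
The training and prediction steps above can be sketched by hand. This is a minimal illustration, assuming the Iris dataset and `LogisticRegression` as the binary model; the names `ovr_models`, `scores`, and `y_pred_ovr` are ours, not part of any library.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
classes = np.unique(y_train)

# Training: one binary model per class (this class = 1, all other classes = 0).
ovr_models = []
for c in classes:
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, (y_train == c).astype(int))
    ovr_models.append(clf)

# Prediction: score each sample with every model, then keep the class
# whose model reports the highest confidence score.
scores = np.column_stack([m.decision_function(X_test) for m in ovr_models])
y_pred_ovr = classes[scores.argmax(axis=1)]

print("Number of binary models:", len(ovr_models))
print("Test accuracy:", (y_pred_ovr == y_test).mean())
```

In practice, `sklearn.multiclass.OneVsRestClassifier` wraps this loop around any binary estimator, so the manual version is mainly useful for seeing what happens inside.
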
### 1.2. One-vs-One (OvO)
OvO trains a binary classifier for every possible pair of classes.
- Training: If you have $N$ classes, you train $\frac{N \times (N-1)}{2}$ classifiers.
  - Example (Classes A, B, C): Train three models: Model 1 (A vs. B), Model 2 (A vs. C), Model 3 (B vs. C).
- Prediction: When classifying a new sample, all models vote, and the class that receives the most votes wins.
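
The pairwise training and voting described above can be sketched as follows. This is an illustrative version, assuming the Iris dataset and `LogisticRegression` as the pairwise model; `pair_models` and `votes` are hypothetical names, and ties are broken in favor of the lowest class index, which is only one of several possible choices.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
classes = np.unique(y_train)
n_classes = len(classes)

# Training: one model per pair of classes, fitted only on samples of that pair.
pair_models = {}
for a, b in combinations(classes, 2):
    mask = (y_train == a) | (y_train == b)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train[mask], y_train[mask])
    pair_models[(a, b)] = clf

# Prediction: every pairwise model casts one vote per sample.
votes = np.zeros((len(X_test), n_classes), dtype=int)
rows = np.arange(len(X_test))
for clf in pair_models.values():
    winners = clf.predict(X_test)  # each entry is one of the two pair labels
    votes[rows, np.searchsorted(classes, winners)] += 1

# argmax returns the first maximum, so ties go to the lowest class index.
y_pred_ovo = classes[votes.argmax(axis=1)]

print("Number of pairwise models:", len(pair_models))
print("Test accuracy:", (y_pred_ovo == y_test).mean())
```

With $N = 3$ the count is $\frac{3 \times 2}{2} = 3$, the same as OvR; the difference grows quickly with $N$ (for example, 45 models for 10 classes). `sklearn.multiclass.OneVsOneClassifier` provides the same strategy as a ready-made wrapper.
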
## 2. Algorithms with Inherent Multiclass Capability
Some machine learning algorithms naturally handle three or more classes during their training phase:
- Decision Trees (and Ensemble Methods like Random Forest): These models partition the feature space until the leaf nodes are pure (or mostly pure) with respect to all $N$ classes simultaneously.
- K-Nearest Neighbors (KNN): KNN classifies a new point based on the majority class among its $K$ nearest neighbors, regardless of how many classes are present.
- Naive Bayes: This probabilistic model calculates the probability of a data point belonging to each of the $N$ classes and selects the one with the highest probability.
- Neural Networks: Use a softmax activation function in the output layer to directly output a probability distribution across $N$ classes.
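
As a quick check of the list above, the sketch below fits three of these models directly on a 3-class target (Iris) without any decomposition wrapper and inspects the per-class probabilities. The hyperparameters are arbitrary choices for illustration, and the `softmax` helper and its input scores are made up to show what a network's output layer computes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "decision tree": DecisionTreeClassifier(random_state=42),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "naive Bayes": GaussianNB(),
}

# Each model is fitted once on all three classes and returns one
# probability column per class.
probas = {}
for name, model in models.items():
    model.fit(X, y)
    probas[name] = model.predict_proba(X[:5])
    print(name, probas[name].shape)


def softmax(logits):
    """Turn raw scores into a probability distribution over classes."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


example_logits = np.array([2.0, 1.0, 0.1])  # made-up output-layer scores
print(softmax(example_logits))
```
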
## 3. Python Implementation (Scikit-learn)
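In scikit-learn, classifiers accept a multiclass target variable directly and choose a strategy internally, but the choice depends on the estimator. `LinearSVC` uses OvR by default, `SVC` trains one-vs-one classifiers internally, and `LogisticRegression` uses OvR with the `liblinear` solver but fits a single multinomial (softmax) model with the default `lbfgs` solver. To apply a strategy explicitly to any binary estimator, use the `OneVsRestClassifier` or `OneVsOneClassifier` wrappers from `sklearn.multiclass`.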


```python
# Multiclass classification on the Iris dataset (3 classes)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Load Iris data (3 classes: 0, 1, 2)
iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# With multi_class='auto' (the default), LogisticRegression uses 'ovr' when the
# solver is 'liblinear' and 'multinomial' for solvers that support it.
# We set 'ovr' explicitly here for demonstration.
# Note: the multi_class parameter is deprecated in recent scikit-learn releases
# (1.5 and later) and may emit a FutureWarning.
log_reg_ovr = LogisticRegression(solver='liblinear', multi_class='ovr', random_state=42)
log_reg_ovr.fit(X_train, y_train)

y_pred = log_reg_ovr.predict(X_test)

print("--- Multiclass Classification Report (using OvR strategy) ---")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

```text
--- Multiclass Classification Report (using OvR strategy) ---
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      0.92      0.96        13
   virginica       0.93      1.00      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45
```
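
The `multi_class` argument is specific to `LogisticRegression` and is deprecated in recent scikit-learn releases, so a wrapper-based variant may be more portable. The sketch below assumes the same data split and base model as the cell above and applies OvR through `OneVsRestClassifier`; the variable name `ovr_wrapper` is ours, and we do not claim the scores match the report above exactly.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# The wrapper fits one copy of the base estimator per class.
ovr_wrapper = OneVsRestClassifier(
    LogisticRegression(solver="liblinear", random_state=42)
)
ovr_wrapper.fit(X_train, y_train)

print("Fitted binary estimators:", len(ovr_wrapper.estimators_))
print("Test accuracy:", accuracy_score(y_test, ovr_wrapper.predict(X_test)))
```

Swapping `OneVsRestClassifier` for `OneVsOneClassifier` switches the strategy without changing the base estimator.
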

