# Ultimate Guide to Multiclass Classification With Sklearn
## Model selection, developing strategy and evaluation metrics
![](./images/pexels.jpg)
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://www.pexels.com/@sergiu-iacob-10475786?utm_content=attributionCopyText&utm_medium=referral&utm_source=pexels'>Sergiu Iacob</a>
        on 
        <a href='https://www.pexels.com/photo/wave-dark-abstract-motion-7868341/?utm_content=attributionCopyText&utm_medium=referral&utm_source=pexels'>Pexels</a>
    </strong>
</figcaption>

### What you will learn

Even though multi-class classification is not as common as binary classification, it certainly poses a much bigger challenge. Many of the well-known strategies for solving multi-class problems break the task down into several or multiple (yes, in this case, there is a difference and I will explain) binary classification problems.

After that, there is the issue of choosing an evaluation metric which accurately shows the model's performance across all classes. Since we are dealing with multiple binary classifiers, these metrics tend to get pretty complex. Finally, you must do hyperparameter tuning to optimize for a particular metric. Well, how can you do that if you don't know what you are optimizing for in the first place?

For these reasons, this article is an end-to-end tutorial on how to solve any multi-class supervised classification problem using Sklearn. You will learn:
- the methods Sklearn offers to binarize a multi-class problem
- a quick overview of the preprocessing steps required
- how to evaluate a default model of your choice
- the details of multi-class classification metrics and finally,
- how to maximize model performance for a particular metric

### Which strategy to choose: One-vs-One or One-vs-Rest?

Many algorithms, such as Logistic Regression and Support Vector Machines, are binary classifiers in their basic form and do not support multi-class classification natively. Even if you were blindly using such Sklearn classifiers for a multi-class problem, Sklearn may fit a *number* of binary classifiers to different versions of the training data under the hood.

This *number* depends on which strategy your model uses to binarize the problem. There are two strategies, and they produce different numbers of binary classifiers:
1. One-vs-One: this strategy splits a multi-class problem into one binary classification problem for each pair of classes (more on this in a bit). Sklearn implements this strategy in `OneVsOneClassifier` (OVO), which takes a binary classifier as input. Here is an example on a synthetic dataset involving a LogisticRegression estimator:

In [6]:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier

# Create a dataset
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=8, n_redundant=2, n_classes=4
)

# Init the strategy
clf = OneVsOneClassifier(estimator=LogisticRegression())
# Fit
_ = clf.fit(X, y)

Let's assume that the 4 classes we created above refer to lung, breast, kidney and brain cancers. In this case, OVO creates 6 individual LogisticRegression models:
- Classifier 1: lung vs breast
- Classifier 2: lung vs kidney
- Classifier 3: lung vs brain
- Classifier 4: breast vs kidney
- Classifier 5: breast vs brain
- Classifier 6: kidney vs brain

In [7]:
# Print the number of estimators created
print(len(clf.estimators_))

6


In general, the number of binary classifiers created for an N-class classification problem can be found using this formula:
![](./images/1.png)

As you might guess, this strategy can be computationally expensive because the number of binary classifiers grows quadratically with the number of classes, which adds up quickly when the target has high cardinality. Therefore, the second approach is often preferred.
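
As a quick sanity check on the count, here is a minimal sketch that compares the number of class pairs, N(N-1)/2, with the number of estimators OVO actually fits. The helper `n_ovo_classifiers`, the `random_state` and the `max_iter` value are assumptions added for illustration; they are not part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier


def n_ovo_classifiers(n_classes: int) -> int:
    """Number of class pairs, i.e. binary classifiers fitted by One-vs-One."""
    return n_classes * (n_classes - 1) // 2


# Same kind of synthetic data as above; random_state is an added assumption
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=8, n_redundant=2,
    n_classes=4, random_state=0,
)

ovo = OneVsOneClassifier(estimator=LogisticRegression(max_iter=1000)).fit(X, y)
fitted_pairs = len(ovo.estimators_)
expected_pairs = n_ovo_classifiers(4)
print(fitted_pairs, expected_pairs)

# The pair count grows much faster than the class count
for n in (4, 10, 50, 100):
    print(f"{n:>3} classes -> {n_ovo_classifiers(n):>4} pairwise classifiers")
```

Each pairwise classifier only sees the samples of its two classes, so the individual fits are smaller than a fit on the full dataset, but the number of fits still adds up.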

2. One-vs-All or One-vs-Rest (OVR). For an N-class classification problem, this strategy creates N binary classifiers, one for each class. For the cancer example with 4 target classes:
- Classifier 1: lung vs \[breast, kidney, brain\]
- Classifier 2: breast vs \[lung, kidney, brain\]
- Classifier 3: kidney vs \[lung, breast, brain\]
- Classifier 4: brain vs \[lung, breast, kidney\]

In the first problem, Sklearn treats the lung class as the positive class and encodes it as 1, while all the remaining classes are encoded as 0. The same pattern continues for every class in an N-class problem.
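
The encoding described above can be sketched with `label_binarize` from `sklearn.preprocessing`. This only illustrates the idea on a handful of made-up labels (`y_demo` and the class-to-integer mapping are hypothetical); it is not meant as a description of the exact code path inside the OVR wrapper.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Hypothetical integer labels: 0=lung, 1=breast, 2=kidney, 3=brain
y_demo = np.array([0, 1, 2, 3, 1, 0])

# One column per class: 1 where the sample belongs to that class, 0 otherwise
Y_binary = label_binarize(y_demo, classes=[0, 1, 2, 3])

# Column 0 is the target of the "lung vs rest" binary problem
lung_vs_rest = Y_binary[:, 0]

print(Y_binary)
print(lung_vs_rest)
```

Every column of `Y_binary` is the target of one binary problem, so a 4-class target yields four separate 0/1 targets.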

Its implementation in Sklearn can be found under `sklearn.multiclass`. Here is an example of OVR on our synthetic dataset:

In [19]:
from sklearn.multiclass import OneVsRestClassifier

# Init/fit
clf = OneVsRestClassifier(estimator=LogisticRegression())
_ = clf.fit(X, y)

In [20]:
clf.estimators_

[LogisticRegression(),
 LogisticRegression(),
 LogisticRegression(),
 LogisticRegression()]

Even though this strategy significantly lowers the computational cost, the fact that only one class is considered positive and the rest negative makes each binary problem an *imbalanced classification*. This problem is even more pronounced for classes with low proportions in the target.
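
To make the imbalance concrete, the sketch below computes the share of positive samples in each One-vs-Rest problem. The dataset is a hypothetical imbalanced variant of the synthetic data used earlier; the `weights` and `random_state` values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification

# Hypothetical imbalanced 4-class dataset; weights and random_state are illustrative
X_imb, y_imb = make_classification(
    n_samples=1000, n_features=10, n_informative=8, n_redundant=2,
    n_classes=4, weights=[0.55, 0.30, 0.10, 0.05], random_state=0,
)

# Share of positive samples in each One-vs-Rest binary problem
positive_share = np.bincount(y_imb, minlength=4) / len(y_imb)

for label, share in enumerate(positive_share):
    print(f"class {label} vs rest: {share:.1%} positive, {1 - share:.1%} negative")
```

Even with perfectly balanced classes, each of the four binary problems would have roughly 25% positives against 75% negatives; rare classes push that ratio further.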

In both approaches, depending on the passed estimator, the results of all binary classifiers can be combined in two ways:
- majority vote: each binary classifier predicts one class, and the class that gets the most votes from all classifiers is chosen
- argmax of class membership scores: classifiers such as LogisticRegression compute probability scores for each class (`.predict_proba()`). The scores are then combined across the binary classifiers, and the class with the highest score is chosen.
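
The argmax idea can be sketched for OVR by collecting the positive-class probability from each fitted binary classifier and picking the most confident one. This is an illustration under assumptions (`random_state`, `max_iter`, and the use of `predict_proba` for scoring are my choices); the wrapper's internals may score samples differently, for example through `decision_function`, so the result is compared with `predict` rather than assumed to be identical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Same kind of synthetic data as above; random_state is an added assumption
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=8, n_redundant=2,
    n_classes=4, random_state=0,
)

ovr = OneVsRestClassifier(estimator=LogisticRegression(max_iter=1000)).fit(X, y)

# One score column per binary classifier: P(class k) in the "k vs rest" problem
scores = np.column_stack([est.predict_proba(X)[:, 1] for est in ovr.estimators_])

# For every sample, pick the class whose binary classifier is most confident
manual_pred = ovr.classes_[scores.argmax(axis=1)]

agreement = (manual_pred == ovr.predict(X)).mean()
print(f"Agreement with ovr.predict: {agreement:.3f}")
```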

Note that tree-based and ensemble models support multi-class classification natively, so there is no need to wrap them in either OVO or OVR. However, regardless of model type, these strategies are still essential when we talk about evaluation metrics in the coming sections.
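
As a small illustration of native multi-class support, a random forest can be fitted directly on the 4-class target without any wrapper. `RandomForestClassifier` is used here only as one example of a tree-based ensemble, and the `random_state` and `n_estimators` values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same kind of synthetic data as above; random_state is an added assumption
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=8, n_redundant=2,
    n_classes=4, random_state=0,
)

# No OVO/OVR wrapper: a single model handles all four classes
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

proba = forest.predict_proba(X[:5])
print(forest.classes_)  # the class labels seen during fit
print(proba.shape)      # one probability column per class
```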