# Top 7 Multiclass Metrics Explained Neatly
![](images/pexels.jpg)
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://www.pexels.com/@deon-black-3867281?utm_content=attributionCopyText&utm_medium=referral&utm_source=pexels'>Deon Black</a>
        on 
        <a href='https://www.pexels.com/photo/long-fitness-health-measure-5915361/?utm_content=attributionCopyText&utm_medium=referral&utm_source=pexels'>Pexels</a>
    </strong>
</figcaption>

### Setup

```python
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

warnings.filterwarnings("ignore")
```

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay replaces it
from sklearn.metrics import (
    ConfusionMatrixDisplay,
    cohen_kappa_score,
    confusion_matrix,
    f1_score,
    log_loss,
    matthews_corrcoef,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Generate a synthetic 4-class dataset
X, y = make_classification(
    n_samples=200,
    n_features=10,
    n_redundant=3,
    n_informative=7,
    n_classes=4,
    random_state=1121218,
)

# Split into train/test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1121218
)

# Initialize and fit a logistic regression model
logreg = LogisticRegression()
_ = logreg.fit(X_train, y_train)

# Predicted class labels
y_pred = logreg.predict(X_test)
# Predicted class probabilities (one column per class)
y_preb_probs = logreg.predict_proba(X_test)
```

### Introduction

I recently published [my most challenging article](https://towardsdatascience.com/comprehensive-guide-to-multiclass-classification-with-sklearn-127cc500f362?source=your_stories_page-------------------------------------) on multiclass classification (MC). The difficulties I faced along the way were largely due to the sheer number of classification metrics I had to learn and explain. By the time I finished, I realized that these metrics deserved an article of their own. 

So, this post covers the 7 most commonly used MC metrics: precision, recall, F1 score, ROC AUC score, Cohen's kappa, Matthews correlation coefficient and log loss. You will learn how they are calculated, their nuances in Sklearn and how to use them in your own workflow.
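Each of these seven metrics maps to a function in `sklearn.metrics`. The sketch below computes all of them on a synthetic 4-class problem so you can see the calls side by side. The averaging choices (`average="macro"` for precision, recall and F1, `multi_class="ovr"` for ROC AUC), the stratified split and `max_iter=1000` are assumptions made for this illustration, not necessarily the settings used later in the article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    cohen_kappa_score,
    f1_score,
    log_loss,
    matthews_corrcoef,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic 4-class data, mirroring the setup cell above
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=7, n_redundant=3,
    n_classes=4, random_state=1121218,
)
# stratify=y (an assumption here) guarantees every class appears in the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1121218, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)         # hard labels: used by most metrics
y_proba = model.predict_proba(X_test)  # probabilities: used by ROC AUC and log loss

scores = {
    "precision": precision_score(y_test, y_pred, average="macro"),
    "recall": recall_score(y_test, y_pred, average="macro"),
    "f1": f1_score(y_test, y_pred, average="macro"),
    "roc_auc": roc_auc_score(y_test, y_proba, multi_class="ovr"),
    "cohen_kappa": cohen_kappa_score(y_test, y_pred),
    "mcc": matthews_corrcoef(y_test, y_pred),
    "log_loss": log_loss(y_test, y_proba),
}

for name, value in scores.items():
    print(f"{name:>12}: {value:.3f}")
```

Note the split in inputs: ROC AUC and log loss need predicted probabilities, while the other five work on hard class labels.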

### Interpreting an N by N confusion matrix

All of the metrics you will be introduced to today are associated with confusion matrices in one way or another. While a 2 by 2 confusion matrix is intuitive and easy to understand, larger confusion matrices can be *truly confusing*. For this reason, it is a good idea to get some exposure to larger, N by N matrices before diving into the metrics derived from them. 
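The key to reading an N by N matrix is that every class gets its own true positives, false positives, false negatives and true negatives. A minimal sketch, assuming scikit-learn's convention that rows are true classes and columns are predicted classes; the ten labels below are made up purely for illustration:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels for 4 classes (0, 1, 2, 3)
y_true = [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
y_hat = [0, 1, 1, 1, 2, 2, 2, 3, 0, 3]

# Rows = true class, columns = predicted class
cm = confusion_matrix(y_true, y_hat, labels=[0, 1, 2, 3])

tp = np.diag(cm)                # correctly predicted samples of each class
fp = cm.sum(axis=0) - tp        # predicted as the class, but actually another class
fn = cm.sum(axis=1) - tp        # belong to the class, but predicted as another class
tn = cm.sum() - (tp + fp + fn)  # neither belong to nor predicted as the class

for label in range(4):
    print(f"class {label}: TP={tp[label]}, FP={fp[label]}, FN={fn[label]}, TN={tn[label]}")
```

Diagonal cells are the hits; for any class, the rest of its column holds false positives and the rest of its row holds false negatives.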

Throughout this article, we will use the example of cancer classification. Specifically, the target contains 4 types of cancer: brain, lung, breast, and kidney. Evaluating any classifier on this cancer data will produce a 4 by 4 matrix: