There are three different APIs for evaluating the quality of a model’s predictions:

- Estimator score method: Estimators have a score method providing a default evaluation criterion for the problem they are designed to solve.
- Scoring parameter: Model-evaluation tools using cross-validation (such as model_selection.cross_val_score and model_selection.GridSearchCV) rely on an internal scoring strategy. This is discussed in the section The scoring parameter: defining model evaluation rules.
- Metric functions: The sklearn.metrics module implements functions assessing prediction error for specific purposes. These metrics are detailed in sections on Classification metrics, Multilabel ranking metrics, Regression metrics and Clustering metrics.
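A minimal sketch of the three APIs side by side may help. The synthetic dataset, the choice of LogisticRegression and the "f1" scoring string are assumptions made for illustration only; they are not taken from the notes above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data, used purely for illustration.
X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 1. Estimator score method: for classifiers this is the mean accuracy.
score_from_method = clf.score(X_test, y_test)

# 2. Scoring parameter: the metric is selected by name inside cross-validation.
cv_f1 = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, scoring="f1")

# 3. Metric function: called directly on true and predicted labels.
score_from_metric = accuracy_score(y_test, clf.predict(X_test))

print(score_from_method, score_from_metric, cv_f1.mean())
```

For a classifier, the first and third values should agree, since the default score is accuracy.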

# Classification metrics

## Accuracy score

In [11]:
import numpy as np
from sklearn.metrics import accuracy_score
y_pred=[0,2,1,3]
y_true=[0,1,1,3]
accuracy_score(y_true,y_pred)

0.75

In [12]:
accuracy_score(y_true,y_pred,normalize=False)

3

## Confusion matrix

In [13]:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
y_true=[2,0,2,2,0,1]
y_pred=[0,0,2,2,0,2]
confusion_matrix(y_true,y_pred)

array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]], dtype=int64)
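To make the layout of the matrix concrete, here is a small sketch (reusing the same labels) that reads class counts and accuracy off the confusion matrix. The variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred)

# Row i holds the samples whose true label is i,
# so the row sums are the per-class counts in y_true.
row_sums = cm.sum(axis=1)

# Correct predictions sit on the diagonal.
accuracy_from_cm = np.trace(cm) / cm.sum()

print(row_sums, accuracy_from_cm, accuracy_score(y_true, y_pred))
```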

## precision_score, recall_score, f1_score

In [4]:
from sklearn import metrics
y_true=[0,1,0,0]
y_pred=[0,1,0,1]
metrics.confusion_matrix(y_true,y_pred)
# TN,FP
# FN,TP

array([[2, 1],
       [0, 1]], dtype=int64)

In [5]:
metrics.precision_score(y_true,y_pred)
# TP/(TP+FP)
#1/(1+1)

0.5

In [6]:
metrics.recall_score(y_true,y_pred)
# TP/(TP+FN)
#1/(1+0)

1.0

In [7]:
metrics.f1_score(y_true,y_pred)
#2*((0.5*1)/(0.5+1))

0.6666666666666666

In [8]:
metrics.fbeta_score(y_true,y_pred,beta=0.2)
#(1+b**2)*(precision*recall)/(b**2*precision+recall)

0.5098039215686274

In [9]:
metrics.fbeta_score(y_true,y_pred,beta=0.5)

0.5555555555555556

In [10]:
metrics.fbeta_score(y_true,y_pred,beta=2) 

0.8333333333333334
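The values above can be checked by hand. The sketch below unpacks the binary confusion matrix with ravel() and applies the F-beta formula directly; fbeta_by_hand is a hypothetical helper written for this check, not a scikit-learn function.

```python
from sklearn import metrics

y_true = [0, 1, 0, 0]
y_pred = [0, 1, 0, 1]

# For binary labels the flattened matrix reads TN, FP, FN, TP.
tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)
recall = tp / (tp + fn)


def fbeta_by_hand(precision, recall, beta):
    """F-beta from precision and recall (hypothetical helper)."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


for beta in (0.2, 0.5, 1, 2):
    manual = fbeta_by_hand(precision, recall, beta)
    library = metrics.fbeta_score(y_true, y_pred, beta=beta)
    print(beta, manual, library)
```

With beta below 1 the score leans toward precision (0.5 here), and with beta above 1 it leans toward recall (1.0 here), which matches the ordering of the outputs above.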

In [11]:
y_true=[1,1,0,1,0,1]
y_pred=[0,1,0,1,1,0]
metrics.precision_score(y_true,y_pred)

0.6666666666666666

In [12]:
metrics.recall_score(y_true,y_pred)

0.5

In [13]:
1-metrics.fbeta_score(y_true,y_pred,beta=0.5)

0.375

In [14]:
y_true=[1,1,0,1,0,1]
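y_pred=[0,1,0,1,1,0]
metrics.precision_recall_fscore_support(y_true,y_pred,beta=0.5)
# returns four arrays, each with one entry per class (here: class 0, class 1):
# precision
# recall
# fbeta_score
# support (the number of occurrences of each class in y_true)
# the class-0 entries treat 0 as the positive label; they are not 1 minus the class-1 entries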

(array([0.33333333, 0.66666667]),
 array([0.5, 0.5]),
 array([0.35714286, 0.625     ]),
 array([2, 4], dtype=int64))

In [15]:
y_true=[1,1,0,0]
y_pred=[0,1,0,1]
metrics.precision_recall_fscore_support(y_true,y_pred,beta=0.5)

(array([0.5, 0.5]),
 array([0.5, 0.5]),
 array([0.5, 0.5]),
 array([2, 2], dtype=int64))

## precision_recall_curve and average_precision_score

In [22]:
import numpy as np
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import average_precision_score
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
precision, recall, threshold = precision_recall_curve(y_true, y_scores)

In [23]:
average_precision_score(y_true, y_scores)

0.8333333333333333
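As a rough illustration of where this number comes from, average precision can be reproduced from the curve itself as a sum of precisions weighted by the change in recall. This is a sketch under the assumption that the arrays are ordered as precision_recall_curve returns them (recall decreasing, ending at 0).

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# recall decreases along the arrays, so the differences are negative;
# the leading minus sign turns the sum into a positive weighted average.
ap_by_hand = -np.sum(np.diff(recall) * precision[:-1])

print(ap_by_hand, average_precision_score(y_true, y_scores))
```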

## Classification report

In [16]:
from sklearn.metrics import classification_report
y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 1, 0]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))

              precision    recall  f1-score   support

     class 0       0.67      1.00      0.80         2
     class 1       0.00      0.00      0.00         1
     class 2       1.00      0.50      0.67         2

    accuracy                           0.60         5
   macro avg       0.56      0.50      0.49         5
weighted avg       0.67      0.60      0.59         5
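The "macro avg" and "weighted avg" rows can be recomputed from the per-class values. The following sketch does this for precision only; the variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 1, 0]

# Per-class arrays, ordered by class label (0, 1, 2).
precision, recall, f1, support = precision_recall_fscore_support(y_true, y_pred)

# Macro average: plain mean over classes.
macro_precision = precision.mean()

# Weighted average: mean over classes, weighted by support.
weighted_precision = np.average(precision, weights=support)

print(round(macro_precision, 2), round(weighted_precision, 2))
```

The macro average treats every class equally, while the weighted average lets frequent classes count for more.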



## Hamming loss

#### The hamming_loss function computes the average Hamming loss (Hamming distance) between two sets of samples, i.e. the fraction of labels that are predicted incorrectly.

In [17]:
from sklearn.metrics import hamming_loss
y_pred = [1, 2, 3, 4]
y_true = [2, 2, 3, 4]
hamming_loss(y_true, y_pred)

0.25
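The same value can be obtained by counting mismatches directly, as in this sketch. The multilabel arrays in the second half are made up for illustration.

```python
import numpy as np
from sklearn.metrics import hamming_loss

y_true = np.array([2, 2, 3, 4])
y_pred = np.array([1, 2, 3, 4])

# Multiclass case: fraction of samples whose label differs.
by_hand = np.mean(y_true != y_pred)

# Multilabel case (illustrative data): fraction of individual label
# entries that differ, counted over all samples and all labels.
Y_true = np.array([[0, 1], [1, 1]])
Y_pred = np.zeros((2, 2), dtype=int)
by_hand_multilabel = np.mean(Y_true != Y_pred)

print(by_hand, hamming_loss(y_true, y_pred))
print(by_hand_multilabel, hamming_loss(Y_true, Y_pred))
```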

## Jaccard similarity coefficient score

### binary classification

In [25]:
import numpy as np
from sklearn.metrics import jaccard_score
y_true = np.array([[0, 1, 1],
                   [1, 1, 0]])
y_pred = np.array([[1, 1, 1],
                   [1, 0, 0]])
jaccard_score(y_true[0], y_pred[0])

0.6666666666666666
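For the binary case the score is the size of the intersection of the positive sets divided by the size of their union. A short sketch of that computation, using the first row of the arrays above:

```python
import numpy as np
from sklearn.metrics import jaccard_score

y_true = np.array([0, 1, 1])
y_pred = np.array([1, 1, 1])

# Positions labelled positive in both vectors.
intersection = np.sum((y_true == 1) & (y_pred == 1))

# Positions labelled positive in at least one vector.
union = np.sum((y_true == 1) | (y_pred == 1))

by_hand = intersection / union

print(by_hand, jaccard_score(y_true, y_pred))
```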

#### In the multilabel case with binary label indicators:

In [27]:
jaccard_score(y_true, y_pred, average='samples')
jaccard_score(y_true, y_pred, average='macro')
jaccard_score(y_true, y_pred, average=None)

array([0.5, 0.5, 1. ])

#### Multiclass problems are binarized and treated like the corresponding multilabel problem:

In [29]:
y_pred = [0, 2, 1, 2]
y_true = [0, 1, 2, 2]
jaccard_score(y_true, y_pred, average=None)
jaccard_score(y_true, y_pred, average='macro')
jaccard_score(y_true, y_pred, average='micro')

0.3333333333333333

# Receiver operating characteristic (ROC)

In [30]:
import numpy as np
from sklearn.metrics import roc_curve
y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, thresholds = roc_curve(y, scores, pos_label=2)
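The arrays returned by roc_curve are usually summarized by the area under the curve. A minimal sketch, assuming the same labels and scores, that feeds them to sklearn.metrics.auc:

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# pos_label=2 marks label 2 as the positive class.
fpr, tpr, thresholds = roc_curve(y, scores, pos_label=2)

# Area under the ROC curve via the trapezoidal rule.
roc_auc = auc(fpr, tpr)

print(fpr, tpr, roc_auc)
```

An area of 1.0 would mean every positive sample is scored above every negative one; 0.5 corresponds to random ranking.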