# Evaluation metrics
The main evaluation metrics for **classification** are
- Accuracy
- Precision (P)
- Recall (R)
- F1 score (F1)
- Area under the ROC (Receiver Operating Characteristic) curve, or simply AUC
- Log loss
- Precision at k (P@k)
- Average precision at k (AP@k)
- Mean average precision at k (MAP@k)

The main evaluation metrics for **regression** are
- Mean absolute error (MAE)
- Mean squared error (MSE)
- Root mean squared error (RMSE)
- Root mean squared logarithmic error (RMSLE)
- Mean percentage error (MPE)
- Mean absolute percentage error (MAPE)
- R-squared ($R^2$)

## Classification metrics

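Comparing predictions with ground truths in a binary classification problem yields 4 possible outcomes:
- True Positives (TP): positive samples predicted as positive
- True Negatives (TN): negative samples predicted as negative
- False Positives (FP): negative samples predicted as positive
- False Negatives (FN): positive samples predicted as negative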
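
As a minimal sketch, the four counts can be tallied by hand from two label lists. The helper name `confusion_counts` is hypothetical, the labels are toy data chosen for illustration, and class 1 is assumed to be the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN), treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for yt, yp in zip(y_true, y_pred):
        if yp == positive:
            # Predicted positive: correct if the truth is positive too
            if yt == positive:
                tp += 1
            else:
                fp += 1
        else:
            # Predicted negative: a miss if the truth is positive
            if yt == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn

# Toy labels, assumed here for illustration
y_true = [1,1,1,1,1,0,0,0,0,0]
y_pred = [1,1,1,1,1,1,1,0,0,0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f'TP: {tp}, TN: {tn}, FP: {fp}, FN: {fn}')
```

The four counts always sum to the number of samples, which is a quick sanity check on any hand-rolled tally.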

### Accuracy
Accuracy is the fraction of predictions that are correct over the whole dataset.
$$\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}$$

In [4]:
from sklearn.metrics import accuracy_score

y_true = [1,1,1,1,1,0,0,0,0,0]
y_pred = [1,1,1,1,1,1,1,0,0,0]

accuracy_score(y_true,y_pred)

0.8
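
To connect the formula to the library call, the sketch below plugs the four counts into the accuracy formula and compares the result with `accuracy_score`. It assumes binary labels in {0, 1}, for which scikit-learn's `confusion_matrix(...).ravel()` returns the counts in the order TN, FP, FN, TP.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1,1,1,1,1,0,0,0,0,0]
y_pred = [1,1,1,1,1,1,1,0,0,0]

# For binary labels {0, 1}, ravel() yields the counts as TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)
```

With these labels the counts are TP=5, TN=3, FP=2, FN=0, so the formula gives (5+3)/10, the same value returned by `accuracy_score`.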

The main problem with accuracy is that, if the dataset is heavily imbalanced, a model can ignore the minority class entirely and still obtain a high accuracy. That is generally a bad outcome, since the minority class is usually the "interesting" one, that is, the one the user most wants to classify correctly. In the example below, the model predicts class 1 for every sample and still scores 0.9.

In [3]:
from sklearn.metrics import accuracy_score

y_true = [1,1,1,1,1,1,1,1,1,0]
y_pred = [1,1,1,1,1,1,1,1,1,1]

accuracy_score(y_true,y_pred)

0.9
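
One way to expose the issue, sketched here under the assumption that class 0 is the minority class of interest, is to look at how many of the true class-0 samples the model actually finds. That is the recall of class 0, obtained by passing `pos_label=0` to `recall_score`.

```python
from sklearn.metrics import accuracy_score, recall_score

y_true = [1,1,1,1,1,1,1,1,1,0]
y_pred = [1,1,1,1,1,1,1,1,1,1]

accuracy = accuracy_score(y_true, y_pred)

# Recall with the minority class (0) treated as the positive class:
# the single class-0 sample is predicted as 1, so none are found
minority_recall = recall_score(y_true, y_pred, pos_label=0)

print(accuracy)
print(minority_recall)
```

The accuracy stays at 0.9 while the recall on class 0 is 0.0, which makes the failure on the minority class visible.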

### Precision
Precision is the fraction of data points predicted as belonging to a class that actually belong to that class.
$$\text{Precision}=\frac{TP}{TP+FP}$$

In binary classification, one class is considered "positive" and precision is computed only for this class. This is the default behaviour of scikit-learn (`pos_label=1`, `average='binary'`).

In [30]:
from sklearn.metrics import precision_score

y_true = [1,1,1,1,1,0,0,0,0,0]
y_pred = [1,1,1,1,1,1,1,0,0,0]

precision_score(y_true, y_pred, pos_label=1, average='binary')

0.7142857142857143

In [34]:
import numpy as np

y_true = np.array(y_true)
y_pred = np.array(y_pred)

# True positives: predicted 1 and actually 1
TP = ((y_pred == 1) & (y_true == 1)).sum()
# False positives: predicted 1 but actually 0
FP = ((y_pred == 1) & (y_true == 0)).sum()

print(f'TP: {TP}\nFP: {FP}\nprecision: {TP}/{TP + FP} = {TP / (TP + FP):.2f}')

TP: 5
FP: 2
precision: 5/7 = 0.71
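
Since binary precision depends on which class is treated as positive, the short sketch below computes it both ways on the same labels. Only the `pos_label` argument changes; the choice of class 0 as the alternative positive class is made purely for illustration.

```python
from sklearn.metrics import precision_score

y_true = [1,1,1,1,1,0,0,0,0,0]
y_pred = [1,1,1,1,1,1,1,0,0,0]

# Class 1 as positive: 7 samples predicted as 1, of which 5 are truly 1
p_class1 = precision_score(y_true, y_pred, pos_label=1)

# Class 0 as positive: 3 samples predicted as 0, all of them truly 0
p_class0 = precision_score(y_true, y_pred, pos_label=0)

print(p_class1)
print(p_class0)
```

The same predictions yield a precision of 5/7 for class 1 and 3/3 for class 0, so it is worth stating explicitly which class a reported precision refers to.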

In multiclass classification, different averaging approaches are possible:
- micro-precision: sum the true positives and false positives over all classes, then compute a single overall precision from those totals

In [22]:
from sklearn.metrics import precision_score

y_true = [2,1,1,1,0,0,0,0,0,0]
y_pred = [2,2,1,1,1,1,1,0,0,0]

precision_score(y_true, y_pred, average='micro')

0.6

In [28]:
import numpy as np

y_true = np.array(y_true)
y_pred = np.array(y_pred)

# Count how many samples belong to each class
np.unique(y_true,return_counts=True)

(array([0, 1, 2]), array([6, 3, 1], dtype=int64))
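
The micro-precision definition can be checked by hand. The sketch below, written with NumPy only, counts true positives and false positives for each class in turn, sums them, and applies the precision formula to the totals. The variable names are illustrative.

```python
import numpy as np

y_true = np.array([2,1,1,1,0,0,0,0,0,0])
y_pred = np.array([2,2,1,1,1,1,1,0,0,0])

total_tp = 0
total_fp = 0
# Treat each class in turn as the "positive" class
for c in np.unique(np.concatenate([y_true, y_pred])):
    tp = int(((y_pred == c) & (y_true == c)).sum())  # predicted c, truly c
    fp = int(((y_pred == c) & (y_true != c)).sum())  # predicted c, truly another class
    print(f'class {c}: TP={tp}, FP={fp}')
    total_tp += tp
    total_fp += fp

# Micro-precision: precision formula applied to the summed counts
micro_precision = total_tp / (total_tp + total_fp)
print(micro_precision)
```

Here the per-class counts are (TP=3, FP=0), (TP=2, FP=3) and (TP=1, FP=1), so the totals give 6/(6+4) = 0.6, the same value as `precision_score(..., average='micro')` above.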