## 3.4. Metrics and scoring: quantifying the quality of predictions


### 3.4.1. Which scoring function should I use?
Two questions come up before looking at the API:
- Which scoring function should I use?
- Which scoring function is a good one for my task?

It is useful to distinguish two steps:
- Predicting
- Decision making

**Predicting**: The response variable $Y$ is usually a random variable, in the sense that there is no deterministic function $Y = g(X)$ of the features $X$. Instead, $Y$ follows a probability distribution, and a point prediction targets a statistical property of that distribution, such as the mean, the median, or a quantile.

Once that statistical property is chosen, a scoring function that is consistent with it is best used for both:
- as the loss function for model training
- as the metric/score in model evaluation and model comparison

**Decision making**: The result is a single outcome. Many scoring functions measure different aspects of such a decision; most of them are covered by or derived from `metrics.confusion_matrix`.
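To illustrate how decision metrics derive from the confusion matrix, here is a minimal sketch for a binary problem. The labels are made-up toy data, not taken from the text above; `accuracy_score`, `precision_score`, and `recall_score` are the matching helpers in `sklearn.metrics`.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Toy binary labels (illustrative only).
y_true = [0, 1, 1, 0, 1, 1, 0, 0]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]

# Rows are true classes and columns are predicted classes, so for the
# labels {0, 1} the flattened matrix reads: tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# The same quantities, derived by hand from the four counts.
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print("accuracy :", accuracy, accuracy_score(y_true, y_pred))
print("precision:", precision, precision_score(y_true, y_pred))
print("recall   :", recall, recall_score(y_true, y_pred))
```

Each printed pair should agree, which shows that these scores are just different summaries of the same four counts.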

| Functional | Scoring or loss function | Response `y` | Prediction |
|------------|--------------------------|--------------|------------|
| **Classification** | | | |
| mode | zero-one loss | multi-class | `predict`, categorical |
| **Regression** | | | |
| mean | `mean_squared_error` | all reals | `predict`, all reals |
| median | `mean_absolute_error` | all reals | `predict`, all reals |
| quantile | `mean_pinball_loss` | all reals | `predict`, all reals |

Note: on a given dataset, $R^2$ gives the same ranking of models as squared error.


### 3.4.2. Scoring API overview
There are 3 different APIs for evaluating the quality of a model's predictions:
1. **Estimator score method**: Estimators (models) have a `score` method providing a default evaluation criterion:
   - classifiers: accuracy
   - regressors: $R^2$
2. **Scoring parameter**: Model-evaluation tools that use cross-validation (such as `model_selection.GridSearchCV`) rely on an internal scoring strategy, selected through the `scoring` parameter.
3. **Metric functions**: The `sklearn.metrics` module implements functions assessing prediction error for specific purposes.
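The three APIs can be compared side by side. This is a minimal sketch on synthetic data; the dataset, the `LinearRegression` model, and the choice of `"neg_mean_absolute_error"` are assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic regression problem (illustrative only).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# 1. Estimator score method: R^2 for regressors.
default_score = model.score(X_test, y_test)

# 2. Scoring parameter: scorers follow "higher is better", so error
#    metrics are exposed in negated form ("neg_...").
cv_scores = cross_val_score(
    LinearRegression(), X, y, cv=5, scoring="neg_mean_absolute_error"
)

# 3. Metric functions: called directly on true and predicted values.
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("score() :", default_score)
print("r2_score:", r2)
print("CV neg. MAE per fold:", cv_scores)
print("test MAE:", mae)
```

Here `model.score` and `r2_score` should report the same number, while the cross-validated values are negative because they are negated errors.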

### 3.4.6. Regression metrics

The `sklearn.metrics` module implements several loss, score, and utility functions to **measure regression performance**. Some of those have been enhanced to handle the **multioutput** case, including `mean_squared_error`, `mean_absolute_error`, and `r2_score`.

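These functions have a `multioutput` keyword argument that specifies how the scores or losses of the individual targets are combined:
- `'uniform_average'` (default): unweighted mean over the outputs
- `'raw_values'`: one score or loss per output, returned as an array of shape `(n_outputs,)`
- an array of shape `(n_outputs,)`: its entries are used as weights for a weighted average
- `'variance_weighted'` (only `r2_score` and `explained_variance_score`): each output is weighted by the variance of its target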
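A minimal sketch of how the `multioutput` argument changes the result, using `mean_absolute_error` on a small two-output toy target. The weights `[0.3, 0.7]` are arbitrary and chosen only for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Two outputs per sample (columns), three samples (rows).
y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]

# One loss per output column.
per_output = mean_absolute_error(y_true, y_pred, multioutput="raw_values")

# Default aggregation: unweighted mean of the per-output losses.
uniform = mean_absolute_error(y_true, y_pred)

# Custom aggregation: weighted mean with user-supplied weights.
weighted = mean_absolute_error(y_true, y_pred, multioutput=[0.3, 0.7])

print("per output:", per_output)
print("uniform   :", uniform, "==", per_output.mean())
print("weighted  :", weighted, "==", np.average(per_output, weights=[0.3, 0.7]))
```

The aggregated values are plain (weighted) averages of the `'raw_values'` array, so the per-output losses are the most informative starting point when the outputs have different scales.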


#### 3.4.6.1. R² score, the coefficient of determination
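The `r2_score` function computes the coefficient of determination, usually denoted $R^2$. It represents the **proportion** of the **variance** of $y$ that is explained by the model.
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
where $y_i$ is the true value of the $i$-th sample, $\hat{y}_i$ its predicted value, and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.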


Range: $(-\infty, 1]$
- $R^2 = 1$: perfect prediction
- $R^2 = 0$: no better than a constant model that always predicts the mean (all $\hat{y}_i = \bar{y}$)
- $R^2 < 0$: worse than always predicting the mean
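The formula and the three regimes can be checked with a direct implementation. In this sketch `r2_manual` is a hypothetical helper written for illustration, and the `mean_only` and `reversed_pred` prediction vectors are made-up cases.

```python
import numpy as np
from sklearn.metrics import r2_score


def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, computed straight from the definition."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


y_true = np.array([3.0, -0.5, 2.0, 7.0])
good = np.array([2.5, 0.0, 2.0, 8.0])              # close to the targets
mean_only = np.full_like(y_true, y_true.mean())    # always predicts the mean
reversed_pred = y_true[::-1]                       # deliberately poor predictions

for name, pred in [("good", good), ("mean", mean_only), ("reversed", reversed_pred)]:
    print(f"{name:9s} manual={r2_manual(y_true, pred):.4f}  "
          f"sklearn={r2_score(y_true, pred):.4f}")
```

The mean-only predictor scores 0 because its residual sum of squares equals the total sum of squares, and any predictor with a larger residual sum of squares goes negative.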

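When the denominator is 0 (`y_true` is constant), the raw score is not finite. By default it is replaced by a finite value:
- NaN (perfect predictions) -> 1.0
- -inf (imperfect predictions) -> 0.0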


```python
from sklearn.metrics import r2_score

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
r2_score(y_true, y_pred)  # 0.9486081370449679
```

```python
from sklearn.metrics import r2_score

y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]
# In a notebook cell only the value of the last expression is displayed.
r2_score(y_true, y_pred, multioutput='variance_weighted')
r2_score(y_true, y_pred, multioutput='uniform_average')
r2_score(y_true, y_pred, multioutput='raw_values')  # one score per output
r2_score(y_true, y_pred, multioutput=[0.3, 0.7])    # 0.9253456221198156
```

#### 3.4.6.2. Other regression metrics
- Mean Absolute Error (MAE)
$$
MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
$$

- Mean Squared Error (MSE)
$$
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

- Median Absolute Error (MedAE)
$$
MedAE = \text{median} \left( \left| y_i - \hat{y}_i \right| \right)
$$
    - MedAE is robust to outliers.


- Max Error
$$
\text{Max Error} = \max \left( \left| y_i - \hat{y}_i \right| \right)
$$
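To see how these four metrics react to a single outlier, the sketch below evaluates them on a small toy sample and again after corrupting one target. The data and the `summarize` helper are hypothetical, introduced only for illustration.

```python
from sklearn.metrics import (
    max_error,
    mean_absolute_error,
    mean_squared_error,
    median_absolute_error,
)


def summarize(y_true, y_pred):
    """Collect the four regression metrics in one dictionary."""
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mean_squared_error(y_true, y_pred),
        "MedAE": median_absolute_error(y_true, y_pred),
        "MaxError": max_error(y_true, y_pred),
    }


y_pred = [2.5, 0.0, 2.0, 8.0, 4.5]
y_clean = [3.0, -0.5, 2.0, 7.0, 4.0]
y_outlier = [3.0, -0.5, 2.0, 7.0, 104.0]  # last target replaced by an outlier

clean = summarize(y_clean, y_pred)
outlier = summarize(y_outlier, y_pred)

for name in clean:
    print(f"{name:8s} clean={clean[name]:10.3f}  with outlier={outlier[name]:10.3f}")
```

MAE, MSE, and Max Error all grow sharply with the corrupted target, while MedAE stays the same because the median ignores the single extreme error.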



#### 3.4.6.12. Visual evaluation of regression models
The `PredictionErrorDisplay` class lets you visually inspect the prediction errors of a model in two ways: actual vs. predicted values, and residuals vs. predicted values.