## Evaluation Metrics
 
### Classification Metrics Table
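Metric|	Description|	Scikit-learn function|
:----|:---|:---|
Precision|	Fraction of predicted positives that are truly positive|	from sklearn.metrics import precision_score|
Recall|	Fraction of actual positives that are predicted positive|	from sklearn.metrics import recall_score|
F1|	Harmonic mean of precision and recall|	from sklearn.metrics import f1_score|
Confusion Matrix|	Table of TP, FN, FP, TN counts|	from sklearn.metrics import confusion_matrix|
ROC|	ROC curve (TPR vs. FPR across thresholds)|	from sklearn.metrics import roc_curve|
AUC|	Area under the ROC curve|	from sklearn.metrics import auc|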

### Regression Metrics Table
Metric|	Description|	Scikit-learn function|
:-----|:-----|:---|
Mean Squared Error (MSE, RMSE)|	Mean of squared errors (RMSE is its square root)|	from sklearn.metrics import mean_squared_error|
Absolute Error (MAE, RAE)|	Mean or median of absolute errors|	from sklearn.metrics import mean_absolute_error, median_absolute_error|
R-Squared|	Coefficient of determination|	from sklearn.metrics import r2_score|

### MAE, MSE, R-Squared & RMSE


$$MAE=\frac{1}{n}\sum_{i=1}^{n}{|y_i-\hat{y_i}|}$$

$$MSE=\frac{1}{n}\sum_{i=1}^{n}{(y_i-\hat{y_i})^2}$$

$$R^2=1-\frac{\sum_{i=1}^{n}{(y_i-\hat{y_i})^2}}{\sum_{i=1}^{n}{(y_i-\bar{y})^2}}$$

$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{(y_i-\hat{y_i})^2}}$$
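The following is a quick cross-check of the four formulas above. It computes each one directly with NumPy and compares the result against the scikit-learn functions listed in the regression table. The arrays `y_true` and `y_pred` are made-up illustration data, and the sketch assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up illustration data
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

err = y_true - y_pred
mae = np.mean(np.abs(err))                      # MAE
mse = np.mean(err ** 2)                         # MSE
rmse = np.sqrt(mse)                             # RMSE = sqrt(MSE)
ss_res = np.sum(err ** 2)                       # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
r2 = 1 - ss_res / ss_tot                        # R^2

# The hand-written formulas agree with scikit-learn
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(mse, mean_squared_error(y_true, y_pred))
assert np.isclose(r2, r2_score(y_true, y_pred))
print(f"MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```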


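RMSE is much more sensitive than MAE to large errors, especially extreme values. If the absolute errors keep shrinking overall while RMSE moves the other way, the cause is a change in the error distribution: both of its tails have become slightly heavier. An apparent "inverse" relationship really means negative correlation, but the two metrics are generally positively correlated. You can certainly pick out a small subset of points in a scatter plot that goes against the overall trend, but that does not make the relationship a negative correlation.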
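Here is a small synthetic illustration of this effect. Model B has a smaller MAE than model A but a larger RMSE, because its error is concentrated in one large miss. The residual vectors are invented purely for illustration.

```python
import numpy as np

def mae(err):
    # Mean absolute error of a residual vector
    return np.mean(np.abs(err))

def rmse(err):
    # Root mean squared error of a residual vector
    return np.sqrt(np.mean(np.square(err)))

# Invented residuals (y - y_hat) for two models on the same 5 samples
err_a = np.array([1.0, -1.0, 1.0, -1.0, 1.0])  # uniform, moderate errors
err_b = np.array([0.0, 0.0, 0.0, 0.0, 4.0])    # mostly perfect, one large miss

print(mae(err_a), rmse(err_a))  # MAE = 1.0, RMSE = 1.0
print(mae(err_b), rmse(err_b))  # MAE = 0.8, RMSE ≈ 1.79
```

Squaring lets the single error of 4 dominate RMSE, while MAE weights it only linearly.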

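The closer $R^{2}$ is to 1, the larger the share of the variance of the target $y_{i}$ that the model explains from the explanatory variables $x_{i}$. For simple linear regression with an intercept, $R^{2}$ equals the squared Pearson correlation between $x$ and $y$, so there a larger $R^{2}$ means a stronger correlation.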
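The sketch below checks the simple-regression case numerically. It fits $y = ax + b$ by least squares with `np.polyfit`, computes $R^2$ from the formula above, and compares it with the squared Pearson correlation. The data are synthetic and randomly generated.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=100)  # synthetic linear data plus noise

# Ordinary least-squares fit of y = a*x + b (with intercept)
a, b = np.polyfit(x, y, deg=1)
y_hat = a * x + b

r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
r = np.corrcoef(x, y)[0, 1]  # Pearson correlation between x and y

print(r2, r ** 2)  # equal up to floating-point error
```

This equality does not hold for models without an intercept, or for $R^2$ measured on held-out data.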

### Confusion matrix
Actual \ Predicted | Positive | Negative |
:-----------------:|:--------:|:--------:|
Positive           | TP       | FN       |
Negative           | FP       | TN       |
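scikit-learn's `confusion_matrix` puts actual classes on the rows and predicted classes on the columns. It orders labels in sorted order, so for 0/1 labels the negative class comes first and the matrix is `[[TN, FP], [FN, TP]]`. Passing `labels=[1, 0]` reorders it to the positive-first layout of the table above. The labels in this sketch are made up.

```python
from sklearn.metrics import confusion_matrix

# Made-up binary labels: 1 = positive, 0 = negative
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]

# Default ordering (sorted labels 0, 1) is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Positive class first, matching the table: [[TP, FN], [FP, TN]]
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)
```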

### Accuracy

$$Accuracy = \frac{TP + TN}{TP + FN + FP + TN}$$
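The next sketch counts the four cells directly from made-up labels, applies the accuracy formula, and cross-checks the result against scikit-learn's `accuracy_score`.

```python
from sklearn.metrics import accuracy_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]  # made-up labels, 1 = positive
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]

pairs = list(zip(y_true, y_pred))  # (actual, predicted) pairs
tp = pairs.count((1, 1))
tn = pairs.count((0, 0))
fp = pairs.count((0, 1))
fn = pairs.count((1, 0))

accuracy = (tp + tn) / (tp + fn + fp + tn)
print(accuracy, accuracy_score(y_true, y_pred))  # both 0.625
```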


### Precision

$$P = \frac{TP}{TP + FP}$$

### Recall

$$R = \frac{TP}{TP + FN}$$
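The following applies the precision and recall formulas to the same kind of made-up labels and compares them with `precision_score` and `recall_score`. Both functions treat label 1 as the positive class by default.

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]  # made-up labels, 1 = positive
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp)  # 3 / 5 = 0.6
recall = tp / (tp + fn)     # 3 / 4 = 0.75

assert abs(precision - precision_score(y_true, y_pred)) < 1e-12
assert abs(recall - recall_score(y_true, y_pred)) < 1e-12
```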

### F1, Fβ
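Precision and recall often pull against each other, so the F measure (a harmonic mean) is used to balance them. Compared with the arithmetic mean $(\frac{P+R}{2})$ and the geometric mean $(\sqrt{P*R})$, the harmonic mean gives more weight to the smaller value. For multi-class problems there are two options. One is to average the per-class confusion matrices and compute P and R from the result (micro-averaging). The other is to directly average the per-class precision P and recall R (macro-averaging).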
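The sketch below compares the three means for a lopsided classifier with high precision and very low recall. It shows that the harmonic mean (F1) stays close to the smaller value. The precision and recall values are arbitrary.

```python
import math

def means(p, r):
    arithmetic = (p + r) / 2
    geometric = math.sqrt(p * r)
    harmonic = 2 * p * r / (p + r)  # this is F1
    return arithmetic, geometric, harmonic

# Arbitrary lopsided example: precision 0.9, recall 0.1
print(means(0.9, 0.1))  # ≈ (0.5, 0.3, 0.18)
```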


$$\frac{1}{F1} = \frac{1}{2} (\frac{1}{P} + \frac{1}{R})$$

$$F1 = \frac{2*P*R}{P + R}$$

$$\frac{1}{F_\beta} = \frac{1}{1+ \beta^2} (\frac{1}{P} + \frac{\beta^2}{R})$$

$$F_\beta = \frac{(1+ \beta^2)*P*R}{(\beta^2 *P) + R}$$

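For $\beta>0$, $\beta$ measures the importance of recall R relative to precision P. Setting $\beta=1$ gives F1, $\beta>1$ gives recall more influence, and $\beta<1$ gives precision more influence.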
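The next sketch evaluates the closed-form $F_\beta$ formula for several values of $\beta$ and checks each against scikit-learn's `fbeta_score`. `f_beta` is a hypothetical helper name, and the labels are made up. Here recall (0.75) is higher than precision (0.6), so increasing $\beta$ pulls the score toward recall.

```python
from sklearn.metrics import fbeta_score, precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]  # made-up labels
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]

p = precision_score(y_true, y_pred)  # 0.6
r = recall_score(y_true, y_pred)     # 0.75

def f_beta(p, r, beta):
    """Hypothetical helper: F_beta from the closed-form formula above."""
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)

for beta in (0.5, 1.0, 2.0):
    manual = f_beta(p, r, beta)
    assert abs(manual - fbeta_score(y_true, y_pred, beta=beta)) < 1e-9
    print(beta, round(manual, 4))
```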

### TPR & FPR

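$$TPR\ (\text{True Positive Rate}) = \frac{TP}{TP + FN}$$

$$FPR\ (\text{False Positive Rate}) = \frac{FP}{FP + TN}$$

TPR is the same quantity as recall $R$. FPR is the fraction of actual negatives that are wrongly predicted positive.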

### ROC & AUC

[ROC from Scikit-learn:](http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#sphx-glr-auto-examples-model-selection-plot-roc-py)

Example of Receiver Operating Characteristic (ROC) metric to evaluate classifier output quality.

ROC curves typically feature true positive rate on the Y axis, and false positive rate on the X axis. This means that the top left corner of the plot is the “ideal” point - a false positive rate of zero, and a true positive rate of one. This is not very realistic, but it does mean that a larger area under the curve (AUC) is usually better.

The “steepness” of ROC curves is also important, since it is ideal to maximize the true positive rate while minimizing the false positive rate.

Procedure for a finite set of samples (my description):

Take $m^+$ positive and $m^-$ negative samples and sort them by predicted probability in descending order. Start with the threshold set above the highest score, so every sample is predicted negative and both TPR and FPR are 0; the curve starts at (0, 0). Then lower the threshold just enough to turn the samples positive one at a time. Let $(x, y)$ be the previous point. If the current sample is a true positive, plot $(x, y + \frac{1}{m^+})$; if it is a false positive, plot $(x + \frac{1}{m^-}, y)$ instead.
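The procedure above can be sketched in NumPy and cross-checked against scikit-learn's `roc_curve`. The check passes `drop_intermediate=False` so that sklearn keeps every point. `roc_points` is a hypothetical helper, and the labels and scores are made up. The sketch assumes all scores are distinct: with tied scores, the curve should move diagonally in a single step, which this simple loop does not do.

```python
import numpy as np
from sklearn.metrics import roc_curve

def roc_points(y_true, scores):
    """Trace the ROC curve by lowering the threshold one sample at a time.

    Assumes distinct scores (no ties).
    """
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores))  # indices sorted by score, descending
    m_pos = np.sum(y_true == 1)
    m_neg = np.sum(y_true == 0)

    x, y = 0.0, 0.0                # threshold above the max score: all predicted negative
    points = [(x, y)]
    for label in y_true[order]:    # each sample in turn becomes predicted positive
        if label == 1:             # true positive: step up by 1/m+
            y += 1 / m_pos
        else:                      # false positive: step right by 1/m-
            x += 1 / m_neg
        points.append((x, y))
    return points

y_true = [1, 0, 1, 1, 0, 0]               # made-up labels
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.3]  # made-up predicted probabilities
pts = roc_points(y_true, scores)

fpr, tpr, _ = roc_curve(y_true, scores, drop_intermediate=False)
print(np.allclose(np.array(pts), np.column_stack([fpr, tpr])))
```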

If one classifier's ROC curve completely encloses another's, the first classifier is better. If the curves cross, neither is better at every operating point, so compare the area under the curves (AUC) instead.
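The sketch below computes the AUC of an ROC curve in three ways that should agree: scikit-learn's `auc` applied to the `roc_curve` points (trapezoidal rule), a hand-written trapezoidal sum, and `roc_auc_score` computed directly from the scores. The labels and scores are made up.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

y_true = [1, 0, 1, 1, 0, 0]               # made-up labels
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.3]  # made-up predicted probabilities

fpr, tpr, _ = roc_curve(y_true, scores)
area = auc(fpr, tpr)  # trapezoidal rule over the ROC points

# Same trapezoidal rule written out by hand
manual = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)

print(area, manual, roc_auc_score(y_true, scores))
```

To compare two classifiers whose curves cross, you would run the same computation on each classifier's scores and compare the resulting areas.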

![image](http://scikit-learn.org/stable/_images/sphx_glr_plot_roc_001.png)

![image](http://scikit-learn.org/stable/_images/sphx_glr_plot_roc_002.png)