## R Squared (the best evaluation metric)

$\Large{R^{2}\quad=\quad 1-\frac{SS_{residual}}{SS_{total}}}$  
that is,  
$\Large{R^{2}\quad=\quad 1-\frac{\displaystyle\sum_{i}(\hat{y}^{(i)}-y^{(i)})^{2}}{\displaystyle\sum_{i}(\bar{y}-y^{(i)})^{2}}}$

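The numerator and denominator have the following meanings:  
Numerator: the error produced by predicting with our model  
Denominator: the error produced by predicting with $\quad y=\bar{y}\quad$ (the baseline model)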

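>* $R^{2}\quad\leq\quad1$
>* The larger $R^{2}$ is, the better. When our model makes no errors at all, $R^{2}$ reaches its maximum value of 1  
>* When our model performs exactly like the baseline model, $R^{2}$ is 0
>* If $R^{2}\quad<\quad 0$, the model we learned is worse than the baseline model. In that case, it is quite likely that our data has no linear relationship at all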
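
The three cases above can be checked numerically. The following is a minimal sketch using only NumPy; the helper name `r_squared` and the toy data are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_residual / SS_total (illustrative helper, not from playML)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_residual = np.sum((y_pred - y_true) ** 2)       # error of the model
    ss_total = np.sum((y_true.mean() - y_true) ** 2)   # error of the baseline y = mean(y)
    return 1 - ss_residual / ss_total

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

perfect = r_squared(y, y)                            # model makes no errors
baseline = r_squared(y, np.full_like(y, y.mean()))   # model always predicts the mean
worse = r_squared(y, y[::-1])                        # predictions in reversed order

print(perfect, baseline, worse)  # 1.0 0.0 -3.0
```

The reversed predictions give a residual sum of squares of 40 against a total sum of squares of 10, so $R^{2} = 1 - 40/10 = -3$: worse than simply predicting the mean.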

## Relationship to MSE and variance

$\Large{R^{2}\quad=\quad 1-\frac{\displaystyle\sum_{i}(\hat{y}^{(i)}-y^{(i)})^{2}}{\displaystyle\sum_{i}(\bar{y}-y^{(i)})^{2}}}\quad=\quad 1-\frac{(\displaystyle\sum_{i=1}^{m}(\hat{y}^{(i)}-y^{(i)})^{2})/m}{(\displaystyle\sum_{i=1}^{m}(y^{(i)}-\bar{y})^{2})/m}\quad=\quad 1-\frac{MSE(\hat{y},y)}{Var(y)}$
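
As a quick sanity check of this identity, the sketch below compares the two forms on synthetic data. The random data and variable names are assumptions made for illustration. Note that the identity relies on the population variance (division by $m$), which is what `np.var` computes by default (`ddof=0`).

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=50)                       # synthetic targets
y_hat = y + rng.normal(scale=0.5, size=50)    # synthetic noisy predictions

# Form 1: ratio of sums of squares
ss_form = 1 - np.sum((y_hat - y) ** 2) / np.sum((y.mean() - y) ** 2)

# Form 2: MSE divided by variance (np.var divides by m when ddof=0)
mse = np.mean((y_hat - y) ** 2)
var_form = 1 - mse / np.var(y)

print(np.isclose(ss_form, var_form))  # True
```

With `np.var(y, ddof=1)` the denominator would be divided by $m-1$ instead, and the two forms would no longer agree.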

## R Squared

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

In [2]:
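# Note: load_boston was removed in scikit-learn 1.2; this cell needs an older version
boston = datasets.load_boston()
x = boston.data[:,5] # use only the number-of-rooms feature (RM)
y = boston.target

# drop the samples whose target sits at the 50.0 cap
x = x[y < 50.0]
y = y[y < 50.0]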

In [3]:
from playML.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, seed=666)

In [4]:
from playML.SimpleLinearRegression import SimpleLinearRegression

reg = SimpleLinearRegression()
reg.fit(x_train, y_train)

SimpleLinearRegression()

In [5]:
reg.a_

7.8608543562689555

In [6]:
reg.b_

-27.459342806705543

In [7]:
y_predict = reg.predict(x_test)

In [8]:
from playML.metrics import mean_squared_error

1 - mean_squared_error(y_test, y_predict)/np.var(y_test)

0.6129316803937322

## Wrapping it up as r2_score

In [9]:
from playML.metrics import r2_score

r2_score(y_test, y_predict)

0.6129316803937322
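
The source of `playML.metrics` is not shown here, so the following is only a sketch of what such a wrapper might look like, assuming it follows the $1 - MSE/Var$ formula derived earlier. The function bodies and the toy arrays are assumptions, not the actual playML code.

```python
import numpy as np

def mean_squared_error(y_true, y_predict):
    """Mean squared error between y_true and y_predict (assumed implementation)."""
    assert len(y_true) == len(y_predict), \
        "y_true and y_predict must have the same length"
    return np.sum((y_true - y_predict) ** 2) / len(y_true)

def r2_score(y_true, y_predict):
    """R^2 computed as 1 - MSE / Var(y_true) (assumed implementation)."""
    return 1 - mean_squared_error(y_true, y_predict) / np.var(y_true)

# Toy usage: MSE = 0.25 and Var = 5, so R^2 = 1 - 0.25 / 5 = 0.95
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_predict = np.array([2.5, 5.5, 7.5, 8.5])
score = r2_score(y_true, y_predict)
print(score)
```

Written this way, the wrapper returns the same value as the manual expression `1 - mean_squared_error(y_test, y_predict)/np.var(y_test)` used in the earlier cell.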

## R score in scikit-learn

In [10]:
from sklearn.metrics import r2_score
r2_score(y_test, y_predict)

0.6129316803937324

In [11]:
reg.score(x_test, y_test)

0.6129316803937322