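## Spot-Checking Regression Algorithms

- How to spot-check regression algorithms in machine learning
- How to spot-check four linear regression algorithms
- How to spot-check three nonlinear regression algorithms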

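### Algorithm Overview

This section covers seven regression algorithms.

Four linear algorithms:
- Linear regression
- Ridge regression
- Lasso regression
- Elastic Net regression

Three nonlinear algorithms:
- K-nearest neighbors (KNN)
- Classification and regression trees (CART)
- Support vector machines (SVM)

The Boston housing price dataset is used to spot-check each algorithm. The data is split with 10-fold cross-validation, and each model is evaluated by mean squared error (MSE). Because the code uses the `neg_mean_squared_error` scorer, the reported scores are negated MSE values: they are always less than or equal to zero, and a score closer to zero indicates a better fit.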
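The evaluation procedure described above can be sketched by hand. The example below is a minimal illustration, not the notebook's own code: it uses synthetic data from `make_regression` as a stand-in for the housing data (an assumption, so that the block runs without `housing.csv`), and the `_demo` variable names are hypothetical. It fits a model on nine folds, scores it on the held-out fold, and checks that the result agrees with `cross_val_score`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the housing data (assumption: 200 rows, 13 features)
X_demo, y_demo = make_regression(n_samples=200, n_features=13,
                                 noise=10.0, random_state=0)

kfold_demo = KFold(n_splits=10)  # contiguous folds, no shuffling

# Manual loop: fit on nine folds, evaluate on the held-out fold
manual_scores = []
for train_idx, test_idx in kfold_demo.split(X_demo):
    fold_model = LinearRegression().fit(X_demo[train_idx], y_demo[train_idx])
    mse = mean_squared_error(y_demo[test_idx],
                             fold_model.predict(X_demo[test_idx]))
    manual_scores.append(-mse)  # negated, so that larger means better
manual_scores = np.array(manual_scores)

# The same evaluation through cross_val_score
auto_scores = cross_val_score(LinearRegression(), X_demo, y_demo,
                              cv=kfold_demo, scoring='neg_mean_squared_error')

print('Scores agree:', np.allclose(manual_scores, auto_scores))
print('Mean negated MSE: %.3f' % manual_scores.mean())
```

The negation is why every score in this section is negative: scikit-learn scorers follow a "higher is better" convention, so error metrics are reported with the sign flipped.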

In [9]:
# Four linear algorithms for regression problems

# Linear regression
import pandas as pd
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression
import warnings
warnings.filterwarnings('ignore')

# Load the Boston housing data (whitespace-delimited, no header row)
filename = 'housing.csv'
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
         'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
data = pd.read_csv(filename, names=names, sep=r'\s+')
array = data.values
X = array[:, 0:13]  # the 13 input features
Y = array[:, 13]    # the target, MEDV
n_splits = 10
# No random_state here: it only takes effect with shuffle=True, and recent
# scikit-learn versions raise a ValueError if it is set without shuffling.
kfold = KFold(n_splits=n_splits)
model = LinearRegression()
scoring = 'neg_mean_squared_error'
result = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print('Linear Regression %.3f ' % result.mean())

Linear Regression -34.705 
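The negated MSE is easier to interpret after converting it to a root mean squared error (RMSE), which is in the same units as the target (MEDV is recorded in thousands of dollars). A minimal sketch using the mean score reported above; note that the square root of the averaged MSE is only an approximation of the average per-fold RMSE:

```python
import math

neg_mse = -34.705            # mean score reported above for LinearRegression
rmse = math.sqrt(-neg_mse)   # flip the sign, then take the square root
print('RMSE %.3f' % rmse)    # → RMSE 5.891
```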


In [13]:
# Ridge regression

from sklearn.linear_model import Ridge
model = Ridge()
result = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print('Ridge Regression %.3f ' %  result.mean())

Ridge Regression -34.078 


In [14]:
# Lasso regression

from sklearn.linear_model import Lasso

model = Lasso()
result = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print('Lasso Regression %.3f ' %  result.mean())

Lasso Regression -34.464 


In [15]:
# Elastic Net regression

from sklearn.linear_model import ElasticNet

model = ElasticNet()
result = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print('ElasticNet Regression %.3f ' %  result.mean())

ElasticNet Regression -31.165 


In [16]:
# Nonlinear algorithms

# K-nearest neighbors (KNN)
from sklearn.neighbors import KNeighborsRegressor
model = KNeighborsRegressor()
result = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print('KNeighbors Regression %.3f ' %  result.mean())

KNeighbors Regression -107.287 


In [17]:
# Classification and regression trees (CART)

from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor()
result = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print('DecisionTree Regression %.3f ' %  result.mean())

DecisionTree Regression -39.528 


In [18]:
# Support vector machine (SVM)

from sklearn.svm import SVR
model = SVR()
result = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print('SVM %.3f ' %  result.mean())

SVM -91.048 
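The seven cells above repeat the same three lines with a different estimator each time. One possible way to run the whole spot-check in a single pass is sketched below. This is an illustration rather than the notebook's method: it uses synthetic data from `make_regression` in place of `housing.csv` (an assumption for self-containment), so its scores will not match the outputs above, and the dictionary keys and `_demo` names are hypothetical.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (assumption: 200 rows, 13 features)
X_demo, y_demo = make_regression(n_samples=200, n_features=13,
                                 noise=10.0, random_state=0)

# All seven estimators with default hyperparameters, as in the cells above
models = {
    'Linear': LinearRegression(),
    'Ridge': Ridge(),
    'Lasso': Lasso(),
    'ElasticNet': ElasticNet(),
    'KNN': KNeighborsRegressor(),
    'CART': DecisionTreeRegressor(random_state=7),  # fixed seed for repeatability
    'SVM': SVR(),
}

cv = KFold(n_splits=10)
mean_scores = {}
for name, estimator in models.items():
    scores = cross_val_score(estimator, X_demo, y_demo, cv=cv,
                             scoring='neg_mean_squared_error')
    mean_scores[name] = scores.mean()

# Print from best (closest to zero) to worst
for name, score in sorted(mean_scores.items(), key=lambda kv: kv[1], reverse=True):
    print('%-10s %.3f' % (name, score))
```

Keeping the estimators in a dictionary makes it straightforward to add or remove candidates without duplicating the evaluation code.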


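In scikit-learn, evaluation metrics are usually computed through the `cross_val_score` function, and the `scoring` parameter selects which metric to use. The available `scoring` values are listed in the tables below: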
![](./img/13-1.png)
![](./img/13-2.png)