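# Linear Regression
- Solves regression problems
- Simple and easy to implement
- The foundation of many powerful nonlinear models
- Produces interpretable results
- Embodies many important ideas in machine learning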

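## Simple Linear Regression (a single feature and one corresponding label)
- Find a fitted expression (a line y = ax + b) such that the distance between the predicted values and the true values is as small as possible
- Find the optimal parameters by minimizing a loss function or maximizing a utility function (**the optimization principle**)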
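The two points above can be made concrete with ordinary least squares, which is one common choice of loss (the notes do not name a specific one, so this is an assumption). The sketch below minimizes the sum of squared errors for a single feature using the closed-form solution; the function name `fit_simple_linear_regression` and the toy data are illustrative only.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Return slope a and intercept b minimizing sum((y - (a*x + b))**2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Closed-form least-squares solution for one feature
    a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b = y_mean - a * x_mean
    return a, b

# Toy data lying exactly on y = 2x + 1 (illustrative values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0
a, b = fit_simple_linear_regression(x, y)
print(a, b)
```

On noise-free data such as this, the recovered slope and intercept should match the generating line up to floating-point error.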

# Regression in Scikit-Learn

In [2]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

In [3]:
# Note: load_boston was removed in scikit-learn 1.2; this cell requires an older version
boston = datasets.load_boston()
X = boston.data
y = boston.target

X = X[y < 50.0]
y = y[y < 50.0]       # load the data and keep only samples with target below 50.0

In [4]:
X.shape

(490, 13)

In [5]:
from sklearn.model_selection import train_test_split     # splits data into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 666)

In [6]:
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()                     # create the linear regression model

In [7]:
lin_reg.fit(X_train, y_train)              # fit the model on the training set

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [8]:
lin_reg.coef_      # coefficients, one per feature

array([-1.14235739e-01,  3.12783163e-02, -4.30926281e-02, -9.16425531e-02,
       -1.09940036e+01,  3.49155727e+00, -1.40778005e-02, -1.06270960e+00,
        2.45307516e-01, -1.23179738e-02, -8.80618320e-01,  8.43243544e-03,
       -3.99667727e-01])

In [9]:
lin_reg.intercept_    # intercept

32.64566083965322
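To show where a coefficient vector and an intercept like these come from, here is a minimal NumPy sketch of a least-squares fit with several features. It uses synthetic data rather than the Boston set, and the names `X_demo`, `y_demo`, `true_coef` and `true_intercept` are made up for illustration. It is not a description of how `LinearRegression` is implemented internally.

```python
import numpy as np

# Synthetic, noise-free data with known parameters (illustrative values)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
true_coef = np.array([1.5, -2.0, 0.5])
true_intercept = 4.0
y_demo = X_demo @ true_coef + true_intercept

# Prepend a column of ones so the intercept is learned as the first parameter
X_b = np.hstack([np.ones((X_demo.shape[0], 1)), X_demo])

# Solve the least-squares problem min ||X_b @ theta - y||^2
theta, *_ = np.linalg.lstsq(X_b, y_demo, rcond=None)
intercept, coef = theta[0], theta[1:]
print(intercept, coef)
```

Because the data is noise-free, `coef` and `intercept` should recover the generating values. On real data they are the parameters that minimize the squared error on the training set.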

In [10]:
lin_reg.score(X_test, y_test)    # R^2 (coefficient of determination) on the test set

0.8008916199519098
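For regressors, `score` returns the coefficient of determination R². The sketch below computes it by hand on a tiny made-up target vector; the helper name `r2` is illustrative.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0, 9.0])
perfect = r2(y_true, y_true)                                # perfect predictions
baseline = r2(y_true, np.full_like(y_true, y_true.mean()))  # always predict the mean
print(perfect, baseline)
```

A perfect model scores 1, a model that always predicts the mean scores 0, and a model worse than that baseline scores below 0.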

# kNN Regressor

In [12]:
from sklearn.neighbors import KNeighborsRegressor        # k-nearest-neighbors regressor

knn_reg = KNeighborsRegressor()        # create the regressor with default hyperparameters
knn_reg.fit(X_train, y_train)         # fit on the training set
knn_reg.score(X_test, y_test)         # R^2 on the test set

0.602674505080953
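As a rough illustration of what a kNN regressor does with uniform weights and Euclidean distance, the sketch below predicts by averaging the targets of the k nearest training points. The function `knn_predict` and the toy arrays are hypothetical; the real `KNeighborsRegressor` uses more efficient neighbor searches.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=5):
    """Predict the mean target of the k nearest training samples (Euclidean)."""
    distances = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    nearest = np.argsort(distances)[:k]   # indices of the k closest samples
    return y_train[nearest].mean()

# Toy one-feature data (illustrative values)
X_toy = np.array([[0.0], [1.0], [2.0], [10.0]])
y_toy = np.array([0.0, 1.0, 2.0, 10.0])

pred = knn_predict(X_toy, y_toy, np.array([1.1]), k=3)
print(pred)
```

Here the three nearest neighbors of 1.1 are the samples at 1, 2 and 0, so the prediction is their mean target. The distant sample at 10 has no influence.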

# Tuning kNN Hyperparameters with Grid Search

In [13]:
from sklearn.model_selection import GridSearchCV

param_grid = [
    {
        "weights": ["uniform"],
        "n_neighbors": [i for i in range(1, 11)]
    },
    {
        "weights": ["distance"],
        "n_neighbors": [i for i in range(1, 11)],
        "p": [i for i in range(1, 6)]
    }
]            # parameter grid for the search
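The `p` entry in the grid is the exponent of the Minkowski distance that kNN uses to rank neighbors. A minimal sketch of that distance, with a hypothetical helper `minkowski` and made-up points:

```python
import numpy as np

def minkowski(u, v, p):
    """Minkowski distance of order p between two vectors."""
    return np.sum(np.abs(u - v) ** p) ** (1.0 / p)

u = np.array([0.0, 0.0])
v = np.array([3.0, 4.0])

manhattan = minkowski(u, v, 1)   # p = 1: sum of absolute differences
euclidean = minkowski(u, v, 2)   # p = 2: ordinary straight-line distance
print(manhattan, euclidean)
```

With `p=1` the distance is the Manhattan distance and with `p=2` it is the Euclidean distance; larger `p` puts more weight on the largest coordinate difference.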

In [14]:
knn_reg = KNeighborsRegressor()
grid_search = GridSearchCV(knn_reg, param_grid, n_jobs = -1, verbose = 1)
grid_search.fit(X_train, y_train)

Fitting 3 folds for each of 60 candidates, totalling 180 fits


[Parallel(n_jobs=-1)]: Done  40 tasks      | elapsed:    1.8s
[Parallel(n_jobs=-1)]: Done 180 out of 180 | elapsed:    2.3s finished


GridSearchCV(cv=None, error_score='raise',
       estimator=KNeighborsRegressor(algorithm='auto', leaf_size=30, metric='minkowski',
          metric_params=None, n_jobs=1, n_neighbors=5, p=2,
          weights='uniform'),
       fit_params=None, iid=True, n_jobs=-1,
       param_grid=[{'weights': ['uniform'], 'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}, {'weights': ['uniform'], 'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'p': [1, 2, 3, 4, 5]}],
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=1)
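The "60 candidates" and "180 fits" in the log follow from the shape of the grid: each dictionary is expanded into the Cartesian product of its value lists, and every candidate is fitted once per fold. The sketch below reproduces that count in plain Python; `expand` is an illustrative helper, not part of scikit-learn, and the fold count of 3 is taken from the log above.

```python
from itertools import product

param_grid = [
    {"weights": ["uniform"], "n_neighbors": list(range(1, 11))},
    {"weights": ["distance"], "n_neighbors": list(range(1, 11)), "p": list(range(1, 6))},
]

def expand(grid):
    """Expand a list of parameter dictionaries into individual candidates."""
    candidates = []
    for sub_grid in grid:
        keys = sorted(sub_grid)
        for values in product(*(sub_grid[k] for k in keys)):
            candidates.append(dict(zip(keys, values)))
    return candidates

candidates = expand(param_grid)
n_folds = 3                                # folds reported in the fit log
total_fits = len(candidates) * n_folds
print(len(candidates), total_fits)
```

The first dictionary contributes 10 candidates and the second 10 × 5 = 50, which gives 60 candidates and 180 fits over 3 folds.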

In [15]:
grid_search.best_params_                     # best hyperparameter combination found

{'n_neighbors': 3, 'p': 1, 'weights': 'uniform'}

In [16]:
grid_search.best_score_                    # best mean cross-validated score

0.5732541716798997

In [17]:
grid_search.best_estimator_.score(X_test, y_test)       # R^2 of the best estimator on the held-out test set

0.733942168303894