# Linear Regression
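- Linear models (in the broad sense) cover two cases:
    - Linear in the features: $y = w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_n x_n + b$
    - Linear in the parameters, even when the features enter nonlinearly: $y = w_1 x_1 + w_2 x_1^2 + w_3 x_1^3 + \dots + b$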
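Because the second case is linear in $w$ and $b$, it can be fit with ordinary least squares once $x_1, x_1^2, x_1^3$ are treated as separate input columns. The sketch below demonstrates this with NumPy's `np.linalg.lstsq` rather than scikit-learn. The noise-free synthetic data and the coefficient values are assumptions chosen for illustration only.

```python
import numpy as np

# Synthetic, noise-free data (an assumption for illustration): a cubic in one feature x_1
rng = np.random.default_rng(0)
x1 = rng.uniform(-2, 2, size=200)
true_w = np.array([1.5, -0.8, 0.3])  # w_1, w_2, w_3
true_b = 2.0
y = true_w[0] * x1 + true_w[1] * x1**2 + true_w[2] * x1**3 + true_b

# Nonlinear in x_1 but linear in (w_1, w_2, w_3, b):
# expand into columns x_1, x_1^2, x_1^3 plus a column of ones for the bias b
X = np.column_stack([x1, x1**2, x1**3, np.ones_like(x1)])

# Ordinary least squares on the expanded columns
params, *_ = np.linalg.lstsq(X, y, rcond=None)
w_hat, b_hat = params[:3], params[3]
print("w:", w_hat, "b:", b_hat)
```

Since the data has no noise, the recovered `w_hat` and `b_hat` should match `true_w` and `true_b` up to floating-point error.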

## Boston Housing Price Prediction

In [12]:
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, SGDRegressor

In [7]:
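# 1) Load the data (load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so this cell needs an older version)
boston = load_boston()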


    The Boston housing prices dataset has an ethical problem. You can refer to
    the documentation of this function for further details.

    The scikit-learn maintainers therefore strongly discourage the use of this
    dataset unless the purpose of the code is to study and educate about
    ethical issues in data science and machine learning.

    In this special case, you can fetch the dataset from the original
    source::

        import pandas as pd
        import numpy as np


        data_url = "http://lib.stat.cmu.edu/datasets/boston"
        raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
        data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
        target = raw_df.values[1::2, 2]

    Alternative datasets include the California housing dataset (i.e.
    :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
    dataset. You can load the datasets as follows::

        from sklearn.datasets import fetch_california_h
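The loading snippet in the warning above relies on the raw CMU file storing each record across two physical lines. Judging from the slicing, the first line holds 11 feature values and the second holds the remaining 2 features followed by the target. The sketch below reproduces that slicing on a small hypothetical array (standing in for `raw_df.values`, with `NaN` padding where a short line has no values) so it runs without network access.

```python
import numpy as np

# Hypothetical stand-in for raw_df.values: 3 records, each split over two rows.
# Row 2k holds features 1-11; row 2k+1 holds features 12-13 and the target,
# with the remaining columns padded with NaN.
n_records, n_cols = 3, 11
raw = np.full((2 * n_records, n_cols), np.nan)
for k in range(n_records):
    raw[2 * k, :] = np.arange(11) + 100 * k                      # features 1-11
    raw[2 * k + 1, :3] = [11 + 100 * k, 12 + 100 * k, 1000 + k]  # features 12-13, target

# Same slicing as in the warning's snippet
data = np.hstack([raw[::2, :], raw[1::2, :2]])  # even rows + first 2 columns of odd rows
target = raw[1::2, 2]                           # third column of odd rows
print(data.shape, target)
```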

In [8]:
# 2) Split into training and test sets (test_size defaults to 0.25)
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, random_state=0)
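Conceptually, a random train/test split shuffles the row indices and holds out a fraction of them. The helper below, `simple_train_test_split`, is a hypothetical sketch and not scikit-learn's implementation; it assumes a 25% test fraction rounded up, and it will not reproduce the exact rows that `train_test_split(..., random_state=0)` selects.

```python
import math
import numpy as np

def simple_train_test_split(X, y, test_size=0.25, seed=0):
    """Hypothetical helper: shuffle row indices, hold out ceil(test_size * n) rows for testing."""
    n = len(X)
    n_test = math.ceil(test_size * n)
    idx = np.random.default_rng(seed).permutation(n)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    # Index X and y with the same permutation so feature rows and targets stay aligned
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X = np.arange(506 * 2, dtype=float).reshape(506, 2)  # 506 rows, like the Boston data
y = np.arange(506, dtype=float)
x_tr, x_te, y_tr, y_te = simple_train_test_split(X, y)
print(len(x_tr), len(x_te))  # 379 127
```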

In [9]:
# 3) Standardize: fit the scaler on the training set only, then apply the same transform to the test set
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)
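Standardization maps each feature to $z = (x - \mu) / \sigma$, where $\mu$ and $\sigma$ are that column's mean and standard deviation. The sketch below assumes $\sigma$ is the population standard deviation (`ddof=0`) and, like the cell above, computes the statistics from the training data only and reuses them for the test data. The helper names `fit_standardizer` and `standardize` are hypothetical.

```python
import numpy as np

def fit_standardizer(X_train):
    """Learn per-column mean and population standard deviation (ddof=0) from training data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return mean, std

def standardize(X, mean, std):
    """Apply the training-set statistics to any data (train or test)."""
    return (X - mean) / std

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[4.0, 40.0]])
mean, std = fit_standardizer(X_train)
print(standardize(X_train, mean, std))
print(standardize(X_test, mean, std))
```

Reusing the training statistics on the test set keeps test information out of the preprocessing step, which is why the cell above calls `fit_transform` on `x_train` but only `transform` on `x_test`.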

### Solving with the normal equation
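The normal equation gives the least-squares parameters in closed form. If $X$ is the design matrix with a leading column of ones (so that $\theta_0 = b$), then

$$\hat{\theta} = (X^\top X)^{-1} X^\top y$$

The NumPy sketch below implements this formula, assuming $X^\top X$ is invertible. It solves the linear system $(X^\top X)\,\theta = X^\top y$ rather than forming the inverse explicitly. scikit-learn's `LinearRegression` also relies on a least-squares solver instead of a literal matrix inverse, so this code is an illustration of the math and not its implementation. The synthetic data is an assumption.

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least squares with a bias column prepended; returns (weights, bias).

    Assumes X^T X is invertible (X has full column rank).
    """
    Xb = np.column_stack([np.ones(len(X)), X])
    theta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)  # solves (X^T X) theta = X^T y
    return theta[1:], theta[0]

# Synthetic data: true weights [2, -1, 0.5], true bias 3, small Gaussian noise
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + 0.1 * rng.normal(size=100)
w, b = normal_equation(X, y)
print("weights:", w, "bias:", b)
```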

In [17]:
# 4) Create and fit the estimator
estimator = LinearRegression()
estimator.fit(x_train, y_train)

# 5) Inspect the learned model
print("Normal equation - coefficients:\n", estimator.coef_)
print("Normal equation - intercept:\n", estimator.intercept_)

# 6) Evaluate the model
y_predict = estimator.predict(x_test)
print("Predicted prices:\n", y_predict)
error = mean_squared_error(y_test, y_predict)
print("Normal equation - mean squared error:\n", error)

Normal equation - coefficients:
 [-0.97100092  1.04667838 -0.04044753  0.59408776 -1.80876877  2.60991991
 -0.19823317 -3.00216551  2.08021582 -1.93289037 -2.15743759  0.75199122
 -3.59027047]
Normal equation - intercept:
 22.6087071240106
Predicted prices:
 [24.95233283 23.61699724 29.20588553 11.96070515 21.33362042 19.46954895
 20.42228421 21.52044058 18.98954101 19.950983    4.92468244 16.09694058
 16.93599574  5.33508402 39.84434398 32.33549843 22.32772572 36.54017819
 31.03300611 23.32172503 24.92086498 24.26106474 20.71504422 30.45072552
 22.45009234  9.87470006 17.70324412 17.974775   35.69932012 20.7940972
 18.10554174 17.68317865 19.71354713 23.79693873 29.06528958 19.23738284
 10.97815878 24.56199978 17.32913052 15.20340817 26.09337458 20.87706795
 22.26187518 15.32582693 22.85847963 25.08887173 19.74138819 22.70744911
  9.66708558 24.46175926 20.72654169 17.52545047 24.45596997 30.10668865
 13.31250981 21.52052342 20.65642932 15.34285652 13.7741129  22.07429287
 17.53293957 21.60707766 32.91050188 31.32796114 17.64346364 32
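The evaluation step uses the mean squared error, $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, which is what `mean_squared_error` returns with its default settings (no sample weights). A minimal NumPy sketch of the same formula, with a hypothetical helper name `mse`:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Residuals are 0.5, 0.0 and -1.5, so MSE = (0.25 + 0 + 2.25) / 3
print(mse([3.0, 5.0, 2.5], [2.5, 5.0, 4.0]))
```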

### Solving iteratively with gradient descent

In [16]:
# 4) Create and fit the estimator: stochastic gradient descent with a constant learning rate eta0=0.01
#    penalty="l1" adds L1 regularization; random_state is not set, so results vary between runs
estimator = SGDRegressor(learning_rate="constant", eta0=0.01, max_iter=10000, penalty="l1")
estimator.fit(x_train, y_train)

# 5) Inspect the learned model
print("Gradient descent - coefficients:\n", estimator.coef_)
print("Gradient descent - intercept:\n", estimator.intercept_)

# 6) Evaluate the model
y_predict = estimator.predict(x_test)
print("Predicted prices:\n", y_predict)
error = mean_squared_error(y_test, y_predict)
print("Gradient descent - mean squared error:\n", error)

Gradient descent - coefficients:
 [-0.47205029  1.10725269  0.05410314  0.28081675 -1.78527002  1.77455881
 -0.08942039 -2.8651933   1.66702671 -1.58228433 -2.323182    0.9453406
 -3.19365073]
Gradient descent - intercept:
 [22.78949303]
Predicted prices:
 [24.91063154 23.94190406 28.07677819 14.93685994 21.71602061 19.79459443
 20.94954532 21.52421358 20.62044807 19.09144342  9.3555977  16.12361095
 17.35019088  9.59804256 37.02599036 31.54749929 22.53982333 35.79894836
 30.30424434 23.51273338 24.42118012 25.53911426 21.02422277 29.98441346
 23.07732047 13.01989041 18.26711967 19.47773766 33.24442105 21.71407229
 19.07585109 18.43342586 19.9620659  24.37791037 28.87442009 18.782455
 13.59340277 23.86704012 17.27804142 16.13574172 26.35862349 21.59517368
 22.62999716 17.03188565 23.09770558 25.00244661 21.02343699 22.48709896
 13.19701201 23.82849072 20.04334461 18.5447548  24.77548255 27.37607998
 14.42544503 22.55480476 21.83574996 16.17108839 16.43954887 21.67805879
 18.43708762 22.01459436 31.83435802 30.32243035 19.09356238 32.3868

Gradient descent - coefficients:
 [-1.34213142  1.26366272  0.          0.95304239 -1.67910252  2.56846379
 -0.12698283 -3.09700549  1.61011855 -1.79256309 -2.59026424  0.7942378
 -3.36366318]
Gradient descent - intercept:
 [22.4506148]
Predicted prices:
 [24.86562739 22.31512551 31.0235234  10.3822708  21.11086712 18.76851881
 19.14595154 20.84726119 18.9305347  19.31257835  0.95312001 14.7087872
 16.06828895  4.11343833 41.83599731 32.85448199 20.9282513  37.78634403
 30.96803363 22.86459529 24.30464499 25.18081007 20.38335172 30.53248255
 22.24824336  8.51352112 17.25638196 19.32474445 35.39051298 21.20237146
 16.96195846 16.65825071 19.24807408 24.05355137 29.27698871 19.88075309
  9.58633331 24.75779786 16.04630921 14.35535179 26.15583936 20.7910872
 22.24750167 14.26762801 23.28027034 25.0414426  20.11319588 25.5063678
 11.53704093 23.90648415 23.2812327  17.52570109 24.35460794 29.09575264
 13.26188055 22.17111396 21.11134244 15.0083283  10.41080556 23.63581983
 17.44075499 21.58747581 32.91036051 30.84443635 16.13663535 33.213617
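To show what "iterative" means here, the sketch below runs plain stochastic gradient descent on the squared error with a constant learning rate, updating the weights one sample at a time. It is a simplified stand-in, not `SGDRegressor`'s implementation: it has no L1 penalty (unlike the cell above, which uses `penalty="l1"`), no stopping criterion, and a fixed number of epochs. The per-sample loss $\tfrac{1}{2}(x^\top w + b - y)^2$, the helper name `sgd_linear_regression`, and the synthetic data are all assumptions.

```python
import numpy as np

def sgd_linear_regression(X, y, eta=0.01, epochs=50, seed=0):
    """Plain SGD for linear regression with a constant learning rate eta.

    For the per-sample loss 0.5 * (x.w + b - y)^2, the gradient is residual * x
    for w and residual for b.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):  # visit samples in a fresh random order each epoch
            residual = X[i] @ w + b - y[i]
            w -= eta * residual * X[i]
            b -= eta * residual
    return w, b

# Synthetic data: true weights [2, -1, 0.5], true bias 3, small Gaussian noise
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + 0.1 * rng.normal(size=200)
w, b = sgd_linear_regression(X, y)
print("weights:", w, "bias:", b)
```

With a constant learning rate, the iterates keep fluctuating slightly around the least-squares solution instead of converging exactly. This is one reason why the gradient-descent coefficients above differ from the normal-equation ones (the L1 penalty is another).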