# Linear Regression
- Linear models in the broad sense
    - Linear in the features (a linear relationship): $y = w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_n x_n + b$
    - Linear in the parameters: $y = w_1 x_1 + w_2 x_1^2 + w_3 x_1^3 + \dots + b$
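
The second case can be illustrated with a small sketch: a cubic in $x_1$ is still a linear model, because the prediction is a weighted sum of the columns $x_1, x_1^2, x_1^3$. The data below is synthetic and noise-free, and the coefficients (2, -0.5, 0.1, 1) are assumptions chosen only for illustration.

```python
import numpy as np

# Synthetic, noise-free data from a cubic: y = 2x - 0.5x^2 + 0.1x^3 + 1
x = np.linspace(-3, 3, 50)
y = 2.0 * x - 0.5 * x**2 + 0.1 * x**3 + 1.0

# Design matrix with columns [x, x^2, x^3, 1].
# The model is nonlinear in x but linear in the weights.
X = np.column_stack([x, x**2, x**3, np.ones_like(x)])

# Ordinary least squares recovers the weights directly
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("Recovered [w1, w2, w3, b]:", w)
```

Because the powers of $x_1$ are treated as ordinary feature columns, the same solvers used for plain linear regression apply unchanged.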

## Boston House Price Prediction

In [1]:
import joblib
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, SGDRegressor, Ridge

In [2]:
# 1) Load the data
boston = joblib.load('data/boston.pkz')

In [3]:
# 2) Split the dataset
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, random_state=0)

In [4]:
# 3) Standardize: fit the scaler on the training set only, then apply it to the test set
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)
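
As a rough check of what the standardization step does, the sketch below reproduces `StandardScaler` by hand on synthetic data. The names `x_train_demo` and `x_test_demo` are hypothetical stand-ins, not the Boston arrays. The point it shows is that the mean and standard deviation come from the training set alone and are then reused on the test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the training and test features (assumption)
rng = np.random.default_rng(0)
x_train_demo = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
x_test_demo = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

scaler = StandardScaler()
train_scaled = scaler.fit_transform(x_train_demo)  # learns mean and std, then scales
test_scaled = scaler.transform(x_test_demo)        # reuses the training statistics

# The same transform by hand, using training-set statistics only
mean = x_train_demo.mean(axis=0)
std = x_train_demo.std(axis=0)  # population std (ddof=0), as StandardScaler uses
manual_test = (x_test_demo - mean) / std

print(np.allclose(test_scaled, manual_test))
```

Calling `fit_transform` on the test set instead would leak test-set statistics into preprocessing, which is why the cell above calls only `transform` on `x_test`.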

### Solving with the Normal Equation

In [5]:
# 4) Estimator
estimator = LinearRegression()
estimator.fit(x_train, y_train)

# 5) Inspect the fitted model
print("Normal equation - weight coefficients:\n", estimator.coef_)
print("Normal equation - bias:\n", estimator.intercept_)

# 6) Evaluate the model
y_predict = estimator.predict(x_test)
print("Predicted house prices:\n", y_predict)
error = mean_squared_error(y_test, y_predict)
print("Normal equation - mean squared error:\n", error)

Normal equation - weight coefficients:
 [-0.97100092  1.04667838 -0.04044753  0.59408776 -1.80876877  2.60991991
 -0.19823317 -3.00216551  2.08021582 -1.93289037 -2.15743759  0.75199122
 -3.59027047]
Normal equation - bias:
 22.6087071240106
Predicted house prices:
 [24.95233283 23.61699724 29.20588553 11.96070515 21.33362042 19.46954895
 20.42228421 21.52044058 18.98954101 19.950983    4.92468244 16.09694058
 16.93599574  5.33508402 39.84434398 32.33549843 22.32772572 36.54017819
 31.03300611 23.32172503 24.92086498 24.26106474 20.71504422 30.45072552
 22.45009234  9.87470006 17.70324412 17.974775   35.69932012 20.7940972
 18.10554174 17.68317865 19.71354713 23.79693873 29.06528958 19.23738284
 10.97815878 24.56199978 17.32913052 15.20340817 26.09337458 20.87706795
 22.26187518 15.32582693 22.85847963 25.08887173 19.74138819 22.70744911
  9.66708558 24.46175926 20.72654169 17.52545047 24.45596997 30.10668865
 13.31250981 21.52052342 20.65642932 15.34285652 13.7741129  22.07429287
 17.53293957 21.60707766 32.91050188 31.32796114 17.64346364 32
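
For reference, the normal equation itself can be sketched in a few lines of NumPy. This is an illustration on synthetic data (`X_demo` and `y_demo` are hypothetical names), not a description of how `LinearRegression` is implemented internally. It solves $(X^T X)\,w = X^T y$ with a column of ones appended so that the bias is learned as the last weight.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic regression problem (assumption, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=200)

# Append a column of ones so the bias becomes the last weight
X_aug = np.column_stack([X_demo, np.ones(len(X_demo))])

# Normal equation: solve (X^T X) w = X^T y rather than forming an explicit inverse
w = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y_demo)

# Compare against scikit-learn's least-squares fit
model = LinearRegression().fit(X_demo, y_demo)
print("weights match:", np.allclose(w[:-1], model.coef_))
print("bias matches:", np.isclose(w[-1], model.intercept_))
```

Using `np.linalg.solve` avoids computing $(X^T X)^{-1}$ explicitly, which is generally the more numerically stable choice.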

### Solving Iteratively with Stochastic Gradient Descent (SGD)

In [6]:
# 4) Estimator: constant learning rate of 0.01 with an L1 penalty
estimator = SGDRegressor(learning_rate="constant", eta0=0.01, max_iter=10000, penalty="l1")
estimator.fit(x_train, y_train)

# 5) Inspect the fitted model
print("SGD - weight coefficients:\n", estimator.coef_)
print("SGD - bias:\n", estimator.intercept_)

# 6) Evaluate the model
y_predict = estimator.predict(x_test)
print("Predicted house prices:\n", y_predict)
error = mean_squared_error(y_test, y_predict)
print("SGD - mean squared error:\n", error)

SGD - weight coefficients:
 [-1.46124731  1.49170463 -0.29389253  0.21034827 -1.9164089   2.82281295
 -0.3754204  -2.87927849  1.70835098 -1.86467885 -2.6691835   0.43298099
 -3.70587629]
SGD - bias:
 [22.55362399]
Predicted house prices:
 [ 2.64027434e+01  2.14941667e+01  2.79651856e+01  8.37008742e+00
  2.15901367e+01  1.90958213e+01  1.88105801e+01  2.14485971e+01
  1.88653101e+01  2.06315530e+01 -3.28942062e-02  1.48740891e+01
  1.50408598e+01  1.37217576e+00  3.91706542e+01  3.60790025e+01
  2.07321640e+01  4.09390724e+01  3.22006563e+01  2.35188965e+01
  2.51467700e+01  2.45261321e+01  2.00916121e+01  3.21264046e+01
  2.30683421e+01  7.57332571e+00  1.70471732e+01  1.81460716e+01
  3.69759722e+01  2.06576064e+01  1.58238902e+01  1.55830840e+01
  2.05129005e+01  2.52482128e+01  2.97663419e+01  1.56876669e+01
  7.42480908e+00  2.09503650e+01  1.63561677e+01  1.33465240e+01
  2.71807931e+01  2.14661069e+01  2.35955209e+01  1.30871792e+01
  2.44500272e+01  2.57132675e+01  2.04040414e+01  2.13278934e+01
  8.84082296e+0
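
The update rule behind stochastic gradient descent can be sketched by hand. The loop below is a simplified illustration on synthetic, noise-free data: it uses a constant learning rate like the cell above, but it omits the L1 penalty and the stopping criteria that `SGDRegressor` applies, and the names `X_demo`, `true_w`, and `true_b` are hypothetical.

```python
import numpy as np

# Synthetic, noise-free data so that SGD can converge to the exact weights (assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))
true_w, true_b = np.array([1.5, -2.0, 0.5]), 4.0
y_demo = X_demo @ true_w + true_b

w = np.zeros(3)
b = 0.0
eta = 0.01  # constant learning rate

for epoch in range(50):
    # Visit the samples in a fresh random order each epoch
    for i in rng.permutation(len(X_demo)):
        residual = X_demo[i] @ w + b - y_demo[i]
        # Gradient of 0.5 * residual**2 with respect to w and b
        w -= eta * residual * X_demo[i]
        b -= eta * residual

print("learned weights:", w)
print("learned bias:", b)
```

Each step uses a single sample, so the cost per update does not grow with the dataset size; the trade-off is a noisier path to the solution than the normal equation's direct solve.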

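### Ridge Regression, SAG Gradient Descent
Ridge regression is linear regression with L2 regularization. The cell below leaves `solver` at its default (`"auto"`); passing `solver="sag"` to `Ridge` would select stochastic average gradient descent explicitly.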

In [7]:
# 4) Estimator
estimator = Ridge(alpha=0.5, max_iter=10000)
estimator.fit(x_train, y_train)

# 5) Inspect the fitted model
print("Ridge regression - weight coefficients:\n", estimator.coef_)
print("Ridge regression - bias:\n", estimator.intercept_)

# 6) Evaluate the model
y_predict = estimator.predict(x_test)
print("Predicted house prices:\n", y_predict)
error = mean_squared_error(y_test, y_predict)
print("Ridge regression - mean squared error:\n", error)

Ridge regression - weight coefficients:
 [-0.96636823  1.03707659 -0.05484361  0.5961581  -1.79076794  2.61533765
 -0.20151622 -2.98347851  2.03979874 -1.89483198 -2.15345569  0.75186843
 -3.58183387]
Ridge regression - bias:
 22.6087071240106
Predicted house prices:
 [24.99701487 23.58394125 29.18989951 11.95643128 21.3455463  19.48310372
 20.40226637 21.51615935 18.97886231 19.96108826  4.9523142  16.08213485
 16.93741294  5.33362446 39.83996682 32.34682652 22.30323121 36.54677335
 31.02760048 23.31641189 24.92108324 24.24870714 20.72127373 30.42248958
 22.44338793  9.83347533 17.71608717 17.99980732 35.70717018 20.80780865
 18.08944729 17.68209178 19.72791202 23.79078401 29.05294293 19.24205085
 10.98573938 24.53993681 17.30881302 15.18778328 26.08181569 20.87586601
 22.31126635 15.30055867 22.93288602 25.08958204 19.73842651 22.7442464
  9.69941834 24.47367734 20.76644087 17.54641628 24.43913951 30.14512678
 13.32649567 21.53226191 20.67618802 15.39692536 13.75908897 22.07387433
 17.5932237  21.61494186 32.89808227 31.30908176 17.62663382 32.6
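
The effect of the L2 penalty can be seen in the closed-form ridge solution, $w = (X_c^T X_c + \alpha I)^{-1} X_c^T y_c$, where $X_c$ and $y_c$ are the centered features and targets so that the bias is not penalized. The sketch below is an illustration on synthetic data (`X_demo` and `y_demo` are hypothetical names) and compares the result with `Ridge` at its default solver setting.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression problem (assumption, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=200)
alpha = 0.5

# Center the data so the bias is left out of the penalty
X_mean, y_mean = X_demo.mean(axis=0), y_demo.mean()
Xc, yc = X_demo - X_mean, y_demo - y_mean

# Closed-form ridge solution: solve (Xc^T Xc + alpha * I) w = Xc^T yc
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(Xc.shape[1]), Xc.T @ yc)
b = y_mean - X_mean @ w

model = Ridge(alpha=alpha).fit(X_demo, y_demo)
print("weights match:", np.allclose(w, model.coef_))
print("bias matches:", np.isclose(b, model.intercept_))
```

Setting `alpha = 0` reduces this to ordinary least squares, and larger values shrink the weights toward zero. This is consistent with the ridge coefficients above being slightly smaller in magnitude than most of the normal-equation coefficients.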