# Multiple Linear Regression

* Linear relationship: $y = \theta_0 + \theta_1x_1 + \theta_2x_2 + \cdots + \theta_nx_n$
* Parameters: $\theta_0, \theta_1, \theta_2, \cdots, \theta_n$
   * Intercept: $\theta_0$
   * Coefficients: $\theta_1, \theta_2, \cdots, \theta_n$

$$\Theta = 
\begin{Bmatrix}
\theta_0\\
\theta_1\\
\vdots\\
\theta_n
\end{Bmatrix}
$$
* Features: one row per sample ($m$ samples), one column per feature ($n$ features)

$$
X=
\begin{Bmatrix}
X_1^{(1)}&X_2^{(1)}&\cdots&X_n^{(1)}\\
X_1^{(2)}&X_2^{(2)}&\cdots&X_n^{(2)}\\
\vdots&\vdots&\ddots&\vdots\\
X_1^{(m)}&X_2^{(m)}&\cdots&X_n^{(m)}
\end{Bmatrix}
$$

* $X_b$: $X$ with a column $X_0$ prepended whose entries are all 1, so that it multiplies $\theta_0$

$$
X_b = 
\begin{Bmatrix}
X_0^{(1)}&X_1^{(1)}&X_2^{(1)}&\cdots&X_n^{(1)}\\
X_0^{(2)}&X_1^{(2)}&X_2^{(2)}&\cdots&X_n^{(2)}\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
X_0^{(m)}&X_1^{(m)}&X_2^{(m)}&\cdots&X_n^{(m)}
\end{Bmatrix}
$$
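
As a minimal sketch of this construction in NumPy, a column of ones can be stacked in front of the feature matrix. The helper name `add_bias_column` and the demo data are made up for illustration and do not come from the original notes.

```python
import numpy as np

def add_bias_column(X):
    """Return X_b: X with a leading column of ones (the X_0 column)."""
    X = np.asarray(X, dtype=float)
    ones = np.ones((X.shape[0], 1))   # one entry of 1 per sample
    return np.hstack([ones, X])       # shape (m, n + 1)

# Made-up data: 3 samples, 2 features
X_demo = np.array([[2.0, 3.0],
                   [4.0, 5.0],
                   [6.0, 7.0]])
X_b_demo = add_bias_column(X_demo)
print(X_b_demo.shape)
```

The first column of `X_b_demo` is all ones and the remaining columns are the original features, unchanged.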

* Vector form of the linear relationship

$$
\hat{y} = X_b \cdot \Theta
$$
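
The vector form can be checked against the explicit "intercept plus coefficients" form with a small NumPy sketch. All values below are made up for illustration.

```python
import numpy as np

theta_demo = np.array([1.0, 2.0, 3.0])      # made-up [theta_0, theta_1, theta_2]
X_small = np.array([[1.0, 1.0],
                    [2.0, 0.0],
                    [0.0, 4.0]])            # made-up features: 3 samples, 2 features
X_b_small = np.hstack([np.ones((len(X_small), 1)), X_small])

y_hat = X_b_small @ theta_demo                            # vector form: X_b . Theta
y_hat_explicit = theta_demo[0] + X_small @ theta_demo[1:]  # theta_0 + sum(theta_j * x_j)
print(y_hat)
```

Both expressions give the same predictions, which is why the column of ones lets the intercept be treated like any other parameter.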

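* Objective

   * Make $\sum_{i}{(y^{(i)} - \hat{y}^{(i)})^2}$ as small as possible
   * Vector form: make $(y - X_b \cdot \Theta)^T(y - X_b \cdot \Theta)$ as small as possible
* Normal equation solution
   * Drawback: high time complexity, $O(n^3)$ in the number of features $n$ (optimized algorithms can reach about $O(n^{2.4})$)

$$
\Theta = (X_b^TX_b)^{-1}X_b^Ty
$$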
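
For reference, the normal equation follows from the vector-form objective in a few steps, assuming $X_b^TX_b$ is invertible. Here $J(\Theta)$ is only a name introduced for the objective: expand it, set its gradient with respect to $\Theta$ to zero, and solve.

$$
\begin{aligned}
J(\Theta) &= (y - X_b\Theta)^T(y - X_b\Theta) = y^Ty - 2\Theta^TX_b^Ty + \Theta^TX_b^TX_b\Theta\\
\nabla_\Theta J &= -2X_b^Ty + 2X_b^TX_b\Theta = 0\\
X_b^TX_b\Theta &= X_b^Ty\\
\Theta &= (X_b^TX_b)^{-1}X_b^Ty
\end{aligned}
$$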


In [46]:
%run lib/linearregression

In [47]:
from sklearn import datasets

In [48]:
# load_boston was removed in scikit-learn 1.2, so this cell needs an older release
boston = datasets.load_boston()
X = boston.data
y = boston.target
X = X[y < 50]
y = y[y < 50]

In [49]:
from sklearn.model_selection import train_test_split

In [50]:
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [51]:
X.shape

(490, 13)

In [52]:
# Custom regression class
from lib.linearregression import LinearRegression

* Create an instance of the class

In [53]:
reg = LinearRegression()

* Fit the instance to the training set to compute the coefficients and the intercept

In [54]:
reg.fit_normal(X_train, y_train)

linearregression
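
The source of `lib/linearregression` is not shown in these notes. As a rough sketch of what a `fit_normal` method could look like, the hypothetical class below applies the normal equation with NumPy. The class name `NormalEquationRegression`, the attribute names, and the synthetic data are all assumptions made for illustration. It solves the linear system with `np.linalg.solve` instead of forming the inverse explicitly, which gives the same $\Theta$ when $X_b^TX_b$ is invertible.

```python
import numpy as np

class NormalEquationRegression:
    """Hypothetical stand-in for the custom class; not the original source."""

    def __init__(self):
        self.intercept_ = None   # theta_0
        self.coef_ = None        # theta_1 .. theta_n

    def fit_normal(self, X_train, y_train):
        # Build X_b by prepending a column of ones
        X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
        # Solve (X_b^T X_b) theta = X_b^T y
        theta = np.linalg.solve(X_b.T @ X_b, X_b.T @ y_train)
        self.intercept_ = theta[0]
        self.coef_ = theta[1:]
        return self

    def predict(self, X):
        return self.intercept_ + X @ self.coef_

# Synthetic, noise-free data so the true parameters are known
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(100, 3))
y_syn = 4.0 + X_syn @ np.array([1.5, -2.0, 0.5])
model = NormalEquationRegression().fit_normal(X_syn, y_syn)
print(model.intercept_, model.coef_)
```

Because the synthetic targets contain no noise, the fitted intercept and coefficients match the values used to generate them up to floating-point error.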

* Compute predictions

In [55]:
reg.predict(X_test)

array([19.81738102, 30.37263029, 10.09167814, 15.81900162, 18.43888527,
       25.72163117, 24.00542704, 25.73569286, 37.53003554, 16.03659048,
       10.82771493, 27.82950918, 21.32951437, 17.46182283, 17.86347411,
       22.4422624 , 17.67024953, 26.92420225,  9.04326954, 20.01260542,
       24.91973656, 26.98899625, 25.50067576, 30.72277912, 19.05317021,
       36.11044336, 20.81592698, 14.40664662, 25.71331414, 17.29310014,
       26.90131185, 18.10858551, 22.7251759 , 20.06535413, 25.10318667,
       15.70269694, 23.29277343, 26.01539282, 29.48998936, 30.03040757,
       23.61414266, 12.85739261, 26.40217487, 10.92652752, 21.60545459,
       25.31819341, 22.14824722, 15.40663952, 29.40416637, 16.48414967,
       13.4338411 , 13.22955304, 32.34714502, 14.32228204, 22.05084074,
       21.67743979, 27.09292259, 32.56318831, 32.26601607, 25.08782424,
       28.32958294, 21.68227495, 21.33466596, 32.64255116, 17.73874153,
       22.45128567, 22.39249891, 34.48436499, 24.02484103, 14.72

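* Prediction accuracy. The call below compares the predictions with themselves, so it returns a perfect score by construction; pass `y_test` as the second argument, `reg.score(X_test, y_test)`, to measure accuracy on the held-out data.

In [56]:
reg.score(X_test, reg.predict(X_test))

1.0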
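
The custom class's `score` method is not shown. Assuming it computes the coefficient of determination $R^2$, as the `score` method of scikit-learn regressors does, it could be sketched as follows. The function name `r2` and the demo arrays are made up for illustration.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_demo = np.array([1.0, 2.0, 3.0, 5.0])
score_demo = r2(y_true_demo, y_pred_demo)
score_self = r2(y_pred_demo, y_pred_demo)   # predictions scored against themselves
print(score_demo, score_self)
```

Under this definition, a score of exactly 1.0 arises whenever the predictions equal the targets, which is always the case when predictions are scored against themselves.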