# Multiple Linear Regression

$$y = \theta_0 + \theta_1x_1 + \theta_2x_2 + ... + \theta_nx_n$$

$$\hat{y}^{(i)} = \theta_0 + \theta_1X_1^{(i)} + \theta_2X_2^{(i)} + ... + \theta_nX_n^{(i)}$$

**Goal:** find $$\theta_0,\theta_1,\theta_2,...,\theta_n$$ that make

$$\sum_{i=1}^m(y^{(i)} - \hat{y}^{(i)})^2$$

as small as possible.

### Rewriting the expression
$$\hat{y}^{(i)} = \theta_0 + \theta_1X_1^{(i)} + \theta_2X_2^{(i)} + ... + \theta_nX_n^{(i)}$$

$$\theta = (\theta_0, \theta_1, \theta_2,...,\theta_n)^T$$

$$\hat{y}^{(i)} = \theta_0X_0^{(i)} + \theta_1X_1^{(i)} + \theta_2X_2^{(i)} + ... + \theta_nX_n^{(i)}, X_0^{(i)}\equiv1$$

$$X^{(i)} = (X_0^{(i)},X_1^{(i)},X_2^{(i)},...,X_n^{(i)})$$

$$\hat{y}^{(i)} = X^{(i)}·\theta$$

$$\mathbf{X}_b =
\left( \begin{array}{ccccc}
1 & X_1^{(1)} & X_2^{(1)} & \ldots & X_n^{(1)}\\\\
1 & X_1^{(2)} & X_2^{(2)} & \ldots & X_n^{(2)}\\\\
\vdots &  &  & & \vdots \\\\
1 & X_1^{(m)} & X_2^{(m)} & \ldots & X_n^{(m)}
\end{array} \right)$$

$$\mathbf{\theta} =
\left( \begin{array} {c}
\theta_0\\\\
\theta_1\\\\
\theta_2\\\\
\vdots \\\\
\theta_n
\end{array}\right)$$

**The prediction simplifies to**

$$\hat{y} = X_b · \theta$$
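The matrix form can be checked numerically. The following is a minimal sketch with made-up toy values (the arrays `X` and `theta` are illustrative assumptions, not data from these notes): it builds $$X_b$$ by prepending a column of ones and compares the single matrix-vector product with the sample-by-sample formula.

```python
import numpy as np

# Toy data: m = 3 samples, n = 2 features (values are made up for illustration).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
theta = np.array([0.5, 2.0, -1.0])  # (theta_0, theta_1, theta_2)

# Prepend the constant column X_0 = 1 to obtain X_b with shape (m, n + 1).
X_b = np.hstack([np.ones((X.shape[0], 1)), X])

# All m predictions in one matrix-vector product: y_hat = X_b . theta
y_hat = X_b.dot(theta)

# The same predictions, computed one sample at a time from the scalar formula.
y_hat_loop = np.array([theta[0] + theta[1] * x[0] + theta[2] * x[1] for x in X])

print(y_hat)       # vectorized predictions
print(y_hat_loop)  # should match the line above
```

Both approaches give the same numbers; the vectorized form is simply shorter and lets NumPy do the loop.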

<br/>

The goal then simplifies to: make
$$(y - X_b·\theta)^T(y - X_b·\theta)$$
as small as possible.
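One way to see where the minimizer comes from is to expand the objective and set its gradient to zero. The sketch below introduces the label $$J(\theta)$$ for the objective, which is a name assumed here for convenience, and it assumes $$X_b^TX_b$$ is invertible.

$$J(\theta) = (y - X_b\theta)^T(y - X_b\theta) = y^Ty - 2\theta^TX_b^Ty + \theta^TX_b^TX_b\theta$$

$$\nabla_\theta J(\theta) = -2X_b^Ty + 2X_b^TX_b\theta = 0 \;\Rightarrow\; X_b^TX_b\theta = X_b^Ty$$

Multiplying both sides of the last equation by $$(X_b^TX_b)^{-1}$$ isolates $$\theta$$.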

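From this one can derive
$$\theta = (X_b^TX_b)^{-1}X_b^Ty$$
which is the Normal Equation solution of multiple linear regression (it requires $$X_b^TX_b$$ to be invertible).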
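The formula translates almost directly into NumPy. The class below is a hypothetical minimal sketch, not the contents of the `LinearRegression.py` file used later in these notes; its method and attribute names (`fit_normal`, `coef_`, `interception_`) are chosen only to mirror the calls that appear there, and the synthetic data is made up for the demonstration.

```python
import numpy as np


class NormalEquationRegressor:
    """Hypothetical minimal model fitted with the normal equation."""

    def __init__(self):
        self.coef_ = None          # theta_1 .. theta_n
        self.interception_ = None  # theta_0
        self._theta = None         # full parameter vector

    def fit_normal(self, X_train, y_train):
        # Build X_b by prepending a column of ones, then apply
        # theta = (X_b^T X_b)^{-1} X_b^T y.
        X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
        self._theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)
        self.interception_ = self._theta[0]
        self.coef_ = self._theta[1:]
        return self

    def predict(self, X):
        X_b = np.hstack([np.ones((len(X), 1)), X])
        return X_b.dot(self._theta)


# Noise-free synthetic data generated from y = 4 + 3*x1 - 2*x2.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 4.0 + 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1]

reg_demo = NormalEquationRegressor().fit_normal(X_demo, y_demo)
print(reg_demo.interception_, reg_demo.coef_)
```

Because the demo data contains no noise, the fitted intercept and coefficients should recover the generating values up to floating-point error.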

<br/>

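Drawback: high time complexity. Inverting $$X_b^TX_b$$ costs O(n^3) in the number of features n (about O(n^2.4) with optimized algorithms).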

Advantage: the data does not need to be normalized (no feature scaling is required).
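The scaling claim can be checked with a small experiment. This is a sketch on made-up synthetic data; the helper names `normal_equation` and `predict` are assumptions introduced for the demonstration. Rescaling each feature column changes the fitted coefficients by the inverse factor, so the predictions stay the same up to floating-point error.

```python
import numpy as np


def normal_equation(X, y):
    # theta = (X_b^T X_b)^{-1} X_b^T y, with a column of ones prepended to X.
    X_b = np.hstack([np.ones((len(X), 1)), X])
    return np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)


def predict(X, theta):
    X_b = np.hstack([np.ones((len(X), 1)), X])
    return X_b.dot(theta)


# Made-up data: two features plus a little noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Put the two features on very different scales.
scale = np.array([100.0, 0.1])
X_scaled = X * scale

theta_raw = normal_equation(X, y)
theta_scaled = normal_equation(X_scaled, y)

pred_raw = predict(X, theta_raw)
pred_scaled = predict(X_scaled, theta_scaled)

print(np.allclose(pred_raw, pred_scaled))                   # same predictions
print(np.allclose(theta_raw[1:], theta_scaled[1:] * scale))  # coefficients absorb the scale
```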

### Implementing the multiple linear regression model

In [None]:
import numpy as np
from sklearn import datasets

In [None]:
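# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version.
boston = datasets.load_boston()

X = boston.data
y = boston.target

# Keep only samples with a target below 50 (values at 50 appear to be capped).
X = X[y < 50]
y = y[y < 50]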

In [None]:
%run ../util/model_selection.py

X_train, X_test, y_train, y_test = train_test_split(X, y, seed=666)
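The file `../util/model_selection.py` is not shown in these notes. As a rough idea of what a helper with this call shape could look like, here is a hypothetical sketch: the `test_ratio` parameter, its default of 0.2, and the use of `np.random.default_rng` are all assumptions, and only the `seed` keyword comes from the call above.

```python
import numpy as np


def train_test_split(X, y, test_ratio=0.2, seed=None):
    """Hypothetical helper: shuffle the row indices, then cut off a test portion."""
    assert X.shape[0] == y.shape[0], "X and y must have the same number of rows"
    assert 0.0 <= test_ratio <= 1.0, "test_ratio must be between 0 and 1"

    # A fixed seed makes the shuffle, and therefore the split, reproducible.
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(X))

    test_size = int(len(X) * test_ratio)
    test_idx = shuffled[:test_size]
    train_idx = shuffled[test_size:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Toy data: row i of X_demo is [2i, 2i + 1] and y_demo[i] is i.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, seed=666)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X_demo, y_demo, seed=666)
print(X_tr.shape, X_te.shape)
```

Calling the function twice with the same seed should return identical splits, which is the point of passing `seed=666`.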

In [None]:
%run LinearRegression.py
reg = LinearRegression()
reg.fit_normal(X_train, y_train)

In [None]:
reg.coef_

In [None]:
reg.interception_

In [None]:
reg.score(X_test, y_test)
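The notes do not show how `score` is defined in `LinearRegression.py`. A common choice for regression models is the coefficient of determination R^2, which is also what scikit-learn's `LinearRegression.score` returns. Assuming that is what is meant here, a minimal sketch looks like this (the function name `r2_score` and the toy values are illustrative):

```python
import numpy as np


def r2_score(y_true, y_predict):
    """R^2 = 1 - MSE(y_true, y_predict) / Var(y_true)."""
    y_true = np.asarray(y_true, dtype=float)
    y_predict = np.asarray(y_predict, dtype=float)
    mse = np.mean((y_true - y_predict) ** 2)
    return 1.0 - mse / np.var(y_true)


y_true = np.array([3.0, 5.0, 7.0, 9.0])

# A perfect prediction scores 1; always predicting the mean scores 0.
perfect = r2_score(y_true, y_true)
baseline = r2_score(y_true, np.full(4, y_true.mean()))
print(perfect, baseline)
```

Under this definition, a score close to 1 on the test set means the model explains most of the variance in `y_test`.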