# Linear Regression
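Ordinary least squares (OLS) linear regression.
Given an input $𝐱 = (𝑥₁, …, 𝑥ᵣ)$, assume that $𝑦$ depends linearly on $𝐱$: $𝑦 = 𝛽₀ + 𝛽₁𝑥₁ + ⋯ + 𝛽ᵣ𝑥ᵣ + 𝜀$. This is the regression equation.
$𝛽₀, 𝛽₁, …, 𝛽ᵣ$ are the regression coefficients and $𝜀$ is the random error.
Least squares measures the fit by the sum of squared residuals (SSR), that is, the squared differences between the observed and the predicted responses, and chooses the coefficients that minimize it. If $𝜀$ is additionally assumed to be i.i.d. zero-mean Gaussian, this minimizer coincides with the maximum likelihood estimate.
Because the SSR is a convex quadratic function of the coefficients, its minimum is attained where its gradient is zero, so setting the partial derivatives of the loss to zero yields the coefficients.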
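The zero-gradient condition can be sketched numerically. The block below is a minimal illustration, assuming a small noise-free data set invented for this purpose (`X_demo` and `y_demo` are not part of the original notebook). It solves the resulting linear system, often called the normal equations, with NumPy.

```python
import numpy as np

# Illustrative, noise-free data (an assumption, not from the notebook):
# y = 1 + 2*x1 + 3*x2
X_demo = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y_demo = 1.0 + 2.0 * X_demo[:, 0] + 3.0 * X_demo[:, 1]

# Prepend a column of ones so that beta[0] plays the role of the intercept
A = np.column_stack([np.ones(len(X_demo)), X_demo])

# Zero gradient of ||y - A @ beta||^2  <=>  (A^T A) beta = A^T y
beta = np.linalg.solve(A.T @ A, A.T @ y_demo)
print(beta)  # approximately [1. 2. 3.]
```

Solving the normal equations directly is fine for a small, well-conditioned example like this one; for ill-conditioned problems a routine such as `np.linalg.lstsq` is usually the safer choice.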

In [3]:
import numpy as np
from sklearn.linear_model import LinearRegression


In [22]:
# Prepare the data
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])
print(x)
print(y)


[[ 5]
 [15]
 [25]
 [35]
 [45]
 [55]]
[ 5 20 14 32 22 38]


In [23]:
# Create the linear regression model
model = LinearRegression()
model

LinearRegression()

Optional parameters:
- **fit_intercept** is a Boolean (True by default) that decides whether to calculate the intercept 𝑏₀ (True) or consider it equal to zero (False).
- **normalize** is a Boolean (False by default) that decides whether to normalize the input variables (True) or not (False). This parameter was deprecated in scikit-learn 1.0 and removed in 1.2.
- **copy_X** is a Boolean (True by default) that decides whether to copy (True) or overwrite the input variables (False).
- **n_jobs** is an integer or None (default) and represents the number of jobs used in parallel computation. None usually means one job, and -1 means using all processors.
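To illustrate the effect of `fit_intercept`, here is a small sketch that fits the notebook's data with the intercept switched off. The names `model_origin` and `slope_by_hand` are introduced only for this example. With no intercept, the least-squares slope reduces to $\sum x_i y_i / \sum x_i^2$, which the block checks by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

# fit_intercept=False forces the fitted line through the origin
model_origin = LinearRegression(fit_intercept=False).fit(x, y)
print("intercept:", model_origin.intercept_)
print("slope:", model_origin.coef_)

# With no intercept, the least-squares slope is sum(x*y) / sum(x^2)
slope_by_hand = np.sum(x.squeeze() * y) / np.sum(x * x)
print("slope by hand:", slope_by_hand)
```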


In [24]:
# Fit the model and get the results
model.fit(x, y)
r_sq = model.score(x, y)
print("coefficient of determination:", r_sq)

coefficient of determination: 0.7158756137479542
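The value returned by `.score()` is the coefficient of determination $R^2 = 1 - SS_{res} / SS_{tot}$. As a sanity check, it can be recomputed from the residuals. This is a sketch using the same data; `ss_res`, `ss_tot`, and `r_sq_manual` are names chosen for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])
model = LinearRegression().fit(x, y)

# Residual sum of squares: squared gaps between observed and predicted y
ss_res = np.sum((y - model.predict(x)) ** 2)
# Total sum of squares: squared gaps between observed y and its mean
ss_tot = np.sum((y - y.mean()) ** 2)

r_sq_manual = 1 - ss_res / ss_tot
print(r_sq_manual, model.score(x, y))
```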


In [25]:
# Model attributes: .intercept_ is the intercept b0, .coef_ holds the slope b1
print("intercept: ", model.intercept_)
print("slope:", model.coef_)

intercept:  5.633333333333329
slope: [0.54]


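Verification: setting the partial derivatives of the sum of squared residuals $\sum_i (y_i - b_0 - b_1 x_i)^2$ with respect to $b_0$ and $b_1$ to zero gives

$$b_0 = \frac{1}{n}\sum_i (y_i - b_1 x_i)$$

$$\sum_i x_i y_i - b_0 \sum_i x_i - b_1 \sum_i x_i^2 = 0$$

Solving these two equations yields $b_1 = 0.54$ and $b_0 = 5.63333$, which matches the model results above.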

In [29]:
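# Sums that appear in the two equations above
n = y.shape[0]
x_mean = np.sum(x) / n            # mean of x
y_mean = np.sum(y) / n            # mean of y
sum_xy = np.sum(x.squeeze() * y)  # sum of x_i * y_i
sum_x = np.sum(x)                 # sum of x_i
sum_x2 = np.sum(x * x)            # sum of x_i^2
print(sum_x2)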



7150
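The cell above stops at the individual sums. One way to finish the check is to eliminate $b_0$ from the two equations, which gives $b_1 = (n\sum x_i y_i - \sum x_i \sum y_i) / (n\sum x_i^2 - (\sum x_i)^2)$, and then substitute back for $b_0$. The following is a minimal, self-contained sketch of that calculation; the variable names are chosen for this example.

```python
import numpy as np

x = np.array([5, 15, 25, 35, 45, 55])
y = np.array([5, 20, 14, 32, 22, 38])

n = len(y)
sum_x, sum_y = x.sum(), y.sum()
sum_xy = (x * y).sum()
sum_x2 = (x * x).sum()

# Slope: obtained by substituting the b0 equation into the second equation
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# Intercept: b0 = (sum(y) - b1 * sum(x)) / n
b0 = (sum_y - b1 * sum_x) / n

print("b0:", b0)
print("b1:", b1)
```

Up to floating-point rounding, these values agree with `model.intercept_` and `model.coef_`.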


In [30]:
# Predict the response
y_pred = model.predict(x)
print("predicted response: ",y_pred,sep='\n')

predicted response: 
[ 8.33333333 13.73333333 19.13333333 24.53333333 29.93333333 35.33333333]
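For a single feature, `.predict()` amounts to evaluating $b_0 + b_1 x$. The sketch below reproduces the predictions by hand and also applies the fitted model to new inputs. The array `x_new` is a hypothetical example and not data from the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])
model = LinearRegression().fit(x, y)

# Same predictions as model.predict(x), computed from the fitted attributes
y_pred_manual = model.intercept_ + model.coef_[0] * x.squeeze()
print(y_pred_manual)

# Hypothetical new inputs 0, 1, 2, 3, 4 (must be 2-D, one column per feature)
x_new = np.arange(5).reshape((-1, 1))
y_new = model.predict(x_new)
print(y_new)
```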


Multiple linear regression model


