## LinearRegression in sklearn.linear_model

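- The only built-in evaluation metric is $R^2$, returned by `score`.

- The `positive` parameter of `LinearRegression` constrains the coefficients to be non-negative.

- `fit` accepts an optional `sample_weight`.

- `fit_intercept=True`: whether to fit a constant (intercept) term.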
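
The options listed above can be tried out on toy data. The following is a minimal sketch, not a reference implementation: the data, the variable names (`free`, `nonneg`, `no_const`, `weighted`), and the outlier at `(3, 10)` are all made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: y = 2*x0 - 1*x1 + 0.5, no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5

free = LinearRegression().fit(X, y)                         # unconstrained fit
nonneg = LinearRegression(positive=True).fit(X, y)          # every coefficient >= 0
no_const = LinearRegression(fit_intercept=False).fit(X, y)  # no constant term

# sample_weight: a zero weight removes the influence of the outlier (3, 10),
# so the fit follows the remaining points, which lie on y = x.
X1 = np.array([[0.0], [1.0], [2.0], [3.0]])
y1 = np.array([0.0, 1.0, 2.0, 10.0])
weighted = LinearRegression().fit(X1, y1, sample_weight=np.array([1.0, 1.0, 1.0, 0.0]))

print(free.coef_, free.intercept_)
print(nonneg.coef_, nonneg.intercept_)
print(no_const.coef_, no_const.intercept_)
print(weighted.coef_, weighted.intercept_)
```

With `positive=True` the coefficient on `x1` cannot go negative, so that fit is deliberately worse than the unconstrained one on this data.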

In [1]:
from sklearn.linear_model import LinearRegression
reg = LinearRegression().fit([[0, 0], 
                              [1, 1], 
                              [2, 2]], [0, 1, 2])

reg.coef_

array([0.5, 0.5])

In [2]:
import numpy as np
from sklearn.linear_model import LinearRegression
X = np.array([[1, 1], 
              [1, 2], 
              [2, 2], 
              [2, 3]])
# y = 3 + 1 * x_0 + 2 * x_1
y = X @ np.array([1, 2]) + 3
reg = LinearRegression(fit_intercept=True).fit(X, y)

print(reg.score(X, y))  # R^2 value
print(reg.intercept_, reg.coef_)
# print(reg.get_params())
print(reg.predict(np.array([[3, 5]])))

1.0
3.000000000000001 [1. 2.]
[16.]
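
Because `score` only reports $R^2$, other error measures have to be computed from the predictions. One way to do that, sketched here under assumptions, is with the helpers in `sklearn.metrics`. The small perturbation added to `y` is invented for illustration so that the errors are not all zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

X = np.array([[1, 1],
              [1, 2],
              [2, 2],
              [2, 3]])
# Same relation as above, plus a small made-up perturbation.
y = X @ np.array([1, 2]) + 3 + np.array([0.1, -0.1, -0.1, 0.1])

reg = LinearRegression().fit(X, y)
y_hat = reg.predict(X)

mse = mean_squared_error(y, y_hat)   # mean of squared residuals
mae = mean_absolute_error(y, y_hat)  # mean of absolute residuals
r2 = r2_score(y, y_hat)              # same quantity that reg.score(X, y) returns

print(mse, mae, r2)
```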


## $\bigstar$ statsmodels

> https://www.statsmodels.org/stable/examples/notebooks/generated/ols.html

- Array interface (the one I usually use)

- Formula interface

In [5]:
# Array interface

import statsmodels.api as sm
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np

X = np.array([[1, 1], 
              [1, 2], 
              [2, 2], 
              [2, 3]])
# y = 3 + 1 * x_1 + 2 * x_2
y = X @ np.array([1, 2]) + 3
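# Add a column of ones for the constant term before fitting;
# without it, OLS fits a homogeneous model (no intercept).
X = sm.add_constant(X)
res = sm.OLS(y, X).fit()  # regressors are named x1, x2, ... automatically
print(res.params)
print(res.summary())

print(res.predict([1, 3, 5]))  # one row: const=1, x1=3, x2=5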

[3. 1. 2.]
                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 1.194e+29
Date:                Mon, 05 Sep 2022   Prob (F-statistic):           2.05e-15
Time:                        15:10:19   Log-Likelihood:                 127.26
No. Observations:                   4   AIC:                            -248.5
Df Residuals:                       1   BIC:                            -250.4
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          3.0000   1.22e-14   2.45e+



In [4]:
# Formula interface

import statsmodels.api as sm
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np

X = np.array([[1, 1], 
              [1, 2], 
              [2, 2], 
              [2, 3]])
# y = 3 + 1 * x_1 + 2 * x_2
y = X @ np.array([1, 2]) + 3
data = pd.DataFrame({"x1": X[:, 0], "x2": X[:, 1], "y": y})
# print(data.describe())
res = smf.ols("y ~ x1 + x2", data).fit()

print(res.params)
print(res.summary())


Intercept    3.0
x1           1.0
x2           2.0
dtype: float64
                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 1.194e+29
Date:                Mon, 05 Sep 2022   Prob (F-statistic):           2.05e-15
Time:                        14:43:10   Log-Likelihood:                 127.26
No. Observations:                   4   AIC:                            -248.5
Df Residuals:                       1   BIC:                            -250.4
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------
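
With the formula interface the intercept is added automatically, and prediction takes a `DataFrame` whose column names match the formula. The sketch below illustrates both points. The `- 1` term is the usual formula syntax for dropping the intercept, and `new` and `res0` are names chosen here for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

X = np.array([[1, 1],
              [1, 2],
              [2, 2],
              [2, 3]])
y = X @ np.array([1, 2]) + 3
data = pd.DataFrame({"x1": X[:, 0], "x2": X[:, 1], "y": y})

res = smf.ols("y ~ x1 + x2", data).fit()       # intercept included by default
res0 = smf.ols("y ~ x1 + x2 - 1", data).fit()  # "- 1" removes the intercept

# Predict for a new row; no constant column is needed here.
new = pd.DataFrame({"x1": [3], "x2": [5]})
pred = res.predict(new)

print(res.params)
print(res0.params)
print(pred)
```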

