## Machine Learning 

### Linear Regression

[1 Linear regression formula](#1-公式1)
$$ y = w \cdot x + b $$

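&emsp;&emsp;Here $w$ and $b$ are the unknown parameters. In machine learning, $x$ is the input data (the <mark>features</mark>) and $y$ is the quantity to be predicted. <br>
&emsp;&emsp;The goal is to use the available $x$ data to learn a `function` that captures the relationship between $y$ and $x$ as well as possible.<br>
&emsp;&emsp;Let $y$ be the true value and $y^{'} = w \cdot x + b$ the predicted value. We want to find a pair $(w, b)$ that makes $|y - y^{'}|$ as small as possible.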
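
For a single feature, the search for $(w, b)$ has a closed-form answer when "small" is measured by squared error. The following is a minimal sketch under that assumption; `fit_line` and the toy data are hypothetical and are not part of the notebook.

```python
def fit_line(xs, ys):
    """Return (w, b) that minimizes the sum of squared errors for y = w * x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    b = y_mean - w * x_mean
    return w, b


# Hypothetical toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(f"w = {w}, b = {b}")
```

Because the toy data lie exactly on a line, the recovered parameters match the generating ones. With several features, as in the housing data used later, `LinearRegression` solves the analogous least-squares problem.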

&emsp;&emsp; MAE (Mean Absolute Error): $e = \frac{1}{n}\sum_{i=1}^{n} |y_{i} - y_{i}^{'}|$ <br>
&emsp;&emsp; MSE (Mean Squared Error): $e = \frac{1}{n}\sum_{i=1}^{n} (y_{i} - y_{i}^{'})^{2}$, where $n$ is the number of samples.
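
To make the two error measures concrete, here is a small sketch that computes both on made-up numbers. The helper names `mae_score` and `mse_score` and the sample values are illustrative assumptions, not part of the notebook.

```python
def mae_score(y_true, y_pred):
    """Mean absolute error: average of |y - y'| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse_score(y_true, y_pred):
    """Mean squared error: average of (y - y')^2 over all samples."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true values and predictions
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

print("MAE:", mae_score(y_true, y_pred))  # (1 + 0 + 2 + 1) / 4 = 1.0
print("MSE:", mse_score(y_true, y_pred))  # (1 + 0 + 4 + 1) / 4 = 1.5
```

The single error of size 2 contributes 4 to the squared sum, which is why MSE penalizes large errors more heavily than MAE.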



#### Import the required packages

In [7]:
# load_boston was removed in scikit-learn 1.2, so the data is read from a local CSV below
# from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import pandas as pd

#### Load the data

In [12]:
# Load the Boston housing dataset
boston = pd.read_csv('./datasets/boston_housing.csv')
# Features (every column except the price column)
X = boston.drop('PRICE', axis=1)

# Target variable (the price column)
y = boston['PRICE']

# Show the first few rows
X.head()

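| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 |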


In [14]:
def calEvaluation(y_test, y_pred):
    # Compute the evaluation metrics: mean squared error and R^2
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    return mse, r2
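
The $R^2$ score returned alongside the MSE is the coefficient of determination, $1 - SS_{res} / SS_{tot}$. As a rough illustration of what `r2_score` computes, here is a dependency-free sketch; the helper name `r2_by_hand` and the sample values are assumptions made for this example.

```python
def r2_by_hand(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean of y_true
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical true values and predictions
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

print("R2:", r2_by_hand(y_true, y_pred))
print("R2 of a perfect prediction:", r2_by_hand(y_true, y_true))
print("R2 of always predicting the mean:", r2_by_hand(y_true, [4.25] * 4))
```

A perfect prediction scores 1, always predicting the mean scores 0, and a model that does worse than the mean scores below 0.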
    

In [17]:


# Split the data into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the linear regression model and fit it to the training data
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Compute the evaluation metrics
mse, r2 = calEvaluation(y_test, y_pred)
print(f" MSE : {mse} ")
print(f" R2  : {r2} ")

# Get the model coefficients (one per feature column)
coefficients = model.coef_

mse, r2, coefficients


 MSE : 24.29111947497345 
 R2  : 0.6687594935356329 


(24.29111947497345,
 0.6687594935356329,
 array([-1.13055924e-01,  3.01104641e-02,  4.03807204e-02,  2.78443820e+00,
        -1.72026334e+01,  4.43883520e+00, -6.29636221e-03, -1.44786537e+00,
         2.62429736e-01, -1.06467863e-02, -9.15456240e-01,  1.23513347e-02,
        -5.08571424e-01]))
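
The coefficient array is easier to read when each value is paired with its feature name. The sketch below copies the values printed above and assumes they follow the column order shown by `X.head()`; the variable names are chosen for this example only.

```python
# Feature names in the column order shown by X.head()
feature_names = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT",
]

# Coefficients copied from the output above
coefficients = [
    -1.13055924e-01, 3.01104641e-02, 4.03807204e-02, 2.78443820e+00,
    -1.72026334e+01, 4.43883520e+00, -6.29636221e-03, -1.44786537e+00,
    2.62429736e-01, -1.06467863e-02, -9.15456240e-01, 1.23513347e-02,
    -5.08571424e-01,
]

# Sort by absolute value so the largest coefficients come first
ranked = sorted(zip(feature_names, coefficients), key=lambda pair: abs(pair[1]), reverse=True)

for name, coef in ranked:
    print(f"{name:>8}: {coef: .4f}")
```

Each coefficient is the change in predicted price per unit change of that feature with the others held fixed. The features are on very different scales, so a larger magnitude here does not by itself mean a more important feature.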