# Model Evaluation: Common Metrics for Regression Models

#### Sample error: how accurately the model predicts a single sample

Sample error = predicted value - actual value

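#### The most commonly used metric: mean squared error (MSE)

Definition: the mean of the squared sample errors over all samples.

Interpretation: the closer the MSE is to 0, the more accurate the model. Because the errors are squared, the MSE is expressed in squared units of the dependent variable and penalizes large errors heavily.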
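
As a minimal sketch of the definition above, the MSE can be computed by hand with NumPy. The arrays `y_true` and `y_pred` are made-up toy values for illustration only and do not come from the housing dataset.

```python
import numpy as np

# Toy values chosen only for illustration (not from the dataset)
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

sample_error = y_pred - y_true      # per-sample error: predicted minus actual
mse = np.mean(sample_error ** 2)    # mean of the squared errors
print(mse)
```

The single error of 2 contributes 4 of the 5.25 total squared error, which shows how strongly the MSE reacts to large misses.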

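#### An easier metric to interpret: mean absolute error (MAE)

Definition: the mean of the absolute sample errors over all samples.

Interpretation: the MAE has the same unit as the dependent variable; the closer it is to 0, the more accurate the model.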
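
The same kind of hand computation works for the MAE. This is only a sketch on made-up toy arrays (`y_true`, `y_pred` are illustrative names and values, not the notebook's data).

```python
import numpy as np

# Toy values chosen only for illustration (not from the dataset)
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

abs_error = np.abs(y_pred - y_true)  # absolute value of each sample error
mae = np.mean(abs_error)             # mean of the absolute errors
print(mae)
```

If the target were a price in some currency, the result would read directly as the average miss in that currency.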

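#### A metric derived from MAE: mean absolute percentage error (MAPE)

Definition: the mean, over all samples, of the absolute sample error divided by the actual value.

Interpretation: the MAPE is a unitless ratio, so it can be compared across dependent variables measured on different scales; the closer it is to 0, the more accurate the model. It is undefined when an actual value is 0.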
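
A small sketch of the MAPE computation, again on made-up toy arrays. It assumes every actual value is nonzero and takes the absolute value of the denominator, which makes no difference for positive targets such as prices.

```python
import numpy as np

# Toy values chosen only for illustration (not from the dataset)
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Assumption: no actual value is 0, otherwise the ratio is undefined
assert np.all(y_true != 0)

ape = np.abs(y_pred - y_true) / np.abs(y_true)  # absolute percentage error per sample
mape = np.mean(ape)                             # a fraction; multiply by 100 for percent
print(mape)
```

Note that the error of 2 on an actual value of 2 yields a ratio of 1.0 and dominates the average, so samples with small actual values can weigh heavily on the MAPE.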

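#### Explanatory power of the model: R squared (R²)

Definition: the proportion of the variance of the dependent variable that is explained by the independent variables.

Interpretation: the closer the value is to 1, the better the independent variables explain the dependent variable.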
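
One common way to compute R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up toy arrays and hypothetical variable names (`ss_res`, `ss_tot`) for illustration.

```python
import numpy as np

# Toy values chosen only for illustration (not from the dataset)
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # variation left unexplained by the model
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
r2 = 1 - ss_res / ss_tot
print(r2)
```

A model that always predicts the mean of `y_true` gets an R² of 0 under this formula, and a model that fits worse than that gets a negative value.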

In [None]:
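# Use sklearn to inspect the metrics of a regression model
import pandas as pd
import matplotlib.pyplot as plt
# Load the sample data
df = pd.read_excel('realestate_sample_preprocessed.xlsx')
# Based on the collinearity matrix, keep the daytime population (the variable most
# correlated with housing price) and turn the nighttime population and the nighttime
# population aged 20-39 into a single ratio
def age_percent(row):
    # Share of the nighttime population aged 20-39; 0 when there is no nighttime population
    if row['nightpop'] == 0:
        return 0
    else:
        return row['night20-39']/row['nightpop']
df['per_a20_39'] = df.apply(age_percent,axis=1)
df = df.drop(columns=['nightpop','night20-39'])
# Basic overview of the dataset
print(df.shape)
print(df.dtypes)
print(df.isnull().sum())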

In [None]:
# Split the dataset into features (x) and target (y)
x = df[['complete_year','area', 'daypop', 'sub_kde',
       'bus_kde', 'kind_kde','per_a20_39']]
y = df['average_price']
print(x.shape)
print(y.shape)

In [None]:
# Build the regression model
# Use a Pipeline to chain standardization, power transformation,
# polynomial feature expansion, and the Lasso model
import numpy as np
from sklearn.linear_model import LinearRegression, LassoCV
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler, PowerTransformer
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
# Assemble the modeling workflow
pipe_lm = Pipeline([
    ('sc', StandardScaler()),
    ('power_trans', PowerTransformer()),
    ('polynom_trans', PolynomialFeatures(degree=3)),
    ('lasso_regr', LassoCV(
        alphas=list(np.arange(8, 10) * 10),  # candidate alphas: 80 and 90
        cv=KFold(n_splits=3, shuffle=True),  # no random_state, so folds differ between runs
        n_jobs=-1)),
])
print(pipe_lm)

In [None]:
# Check model performance
# Note: the metrics below are computed on the same data the model was fit on
import warnings
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
warnings.filterwarnings('ignore')
pipe_lm.fit(x,y)
y_predict = pipe_lm.predict(x)
print(f'mean squared error is: {mean_squared_error(y,y_predict)}')
print(f'mean absolute error is: {mean_absolute_error(y,y_predict)}')
print(f'R Squared is: {r2_score(y,y_predict)}')
# Compute MAPE by hand
check = df[['average_price']].copy()  # copy so new columns are not written to a slice of df
check['y_predict'] = y_predict
check['abs_err'] = (check['y_predict'] - check['average_price']).abs()
check['ape'] = check['abs_err'] / check['average_price']
mape = check['ape'].mean()
print(f'mean absolute percentage error is: {mape}')
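
The hand-computed MAPE above can be cross-checked against `sklearn.metrics.mean_absolute_percentage_error`, which is available in scikit-learn 0.24 and later and returns a fraction rather than a percentage. The sketch below uses made-up toy arrays rather than the notebook's data, so the printed values are not the model's results.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

# Toy values chosen only for illustration (not from the dataset)
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Manual computation, mirroring the 'ape' column built in the notebook
manual_mape = np.mean(np.abs(y_pred - y_true) / np.abs(y_true))
# Library computation; the argument order is (y_true, y_pred)
sklearn_mape = mean_absolute_percentage_error(y_true, y_pred)
print(manual_mape, sklearn_mape)
```

With nonzero actual values the two numbers should agree up to floating-point rounding.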

## Thanks for watching