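## Model generalization (the ability to carry what was learned over to unseen data, i.e. prediction)
> **An overfitted model fits the original samples very well but deviates heavily on new sample points.**

> **In regression problems, accuracy on the training data says little about model quality. Measure the error on a held-out test set instead, since that is what reflects the model's ability to generalize.**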

<img src='./picture/4-1.png'>

<img src='./picture/4-3.png'>
<img src='./picture/4-4.png'>
<img src='./picture/4-2.png'>


## The purpose of train test split

---
Using linear regression

In [2]:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.uniform(-3, 3, size=100)  # 100 values drawn uniformly from [-3, 3)
X = x.reshape(-1, 1)
y = 0.5 * x ** 2 + x + 2 + np.random.normal(0, 1, size=100)

In [3]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)
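The call above passes no `test_size`, so it relies on scikit-learn's default of holding out 25% of the samples. A minimal sketch to confirm the resulting split sizes, assuming a seeded generator (`rng` and its seed are not part of the original notebook and are used only to make the example repeatable):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Assumption: a seeded generator, used here only for repeatability.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
X = x.reshape(-1, 1)
y = 0.5 * x ** 2 + x + 2 + rng.normal(0, 1, size=100)

# No test_size given: 25% of the rows go to the test set by default.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

print(X_train.shape, X_test.shape)  # (75, 1) (25, 1)
```

The model only ever sees the 75 training rows; the 25 test rows play the role of new data.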

In [4]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
y_predict = lin_reg.predict(X_test)
mean_squared_error(y_test, y_predict)

2.074441318724263

---
Using polynomial regression

In [5]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import PolynomialFeatures  # import the class; it is used the same way as the scaler (fit, then transform)

def PolynomialRegression(my_degree):
    return Pipeline([
        ('poly', PolynomialFeatures(degree=my_degree)),
        ('std_scaler', StandardScaler()),
        ('lin_reg', LinearRegression())
        ])
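To make the `'poly'` step of the pipeline concrete, here is a small sketch of what `PolynomialFeatures(degree=2)` produces for a single input column. The two-row array `X_demo` is a made-up illustration, not data from the notebook:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical toy input: one feature, two samples.
X_demo = np.array([[2.0], [3.0]])

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X_demo)

# Each row becomes [1, x, x^2].
print(X_poly)
# [[1. 2. 4.]
#  [1. 3. 9.]]
```

With a higher degree the same column is expanded into more powers of `x`, which is what lets the linear model at the end of the pipeline fit a curve.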

In [6]:
poly2_reg = PolynomialRegression(my_degree=2)
poly2_reg.fit(X_train, y_train)
y2_predict = poly2_reg.predict(X_test)
mean_squared_error(y_test, y2_predict)

0.8325453570216215

In [7]:
poly10_reg = PolynomialRegression(my_degree=10)
poly10_reg.fit(X_train, y_train)
y10_predict = poly10_reg.predict(X_test)
mean_squared_error(y_test, y10_predict)

0.7563032275644646

In [9]:
poly20_reg = PolynomialRegression(my_degree=20)
poly20_reg.fit(X_train, y_train)
y20_predict = poly20_reg.predict(X_test)
mean_squared_error(y_test, y20_predict)

88254.06867209304
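The results above report only the test error. A rough sketch that prints the training and test MSE side by side for each degree can make the overfitting pattern easier to see: the training error tends to keep shrinking as the degree grows, while the test error can rise sharply. The seeded generator `rng`, the helper name `polynomial_regression`, and the `results` dictionary are assumptions for this illustration, so the printed numbers will not match the outputs above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Assumption: a seeded generator, used here only for repeatability.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
X = x.reshape(-1, 1)
y = 0.5 * x ** 2 + x + 2 + rng.normal(0, 1, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)


def polynomial_regression(degree):
    """Hypothetical helper: same pipeline as PolynomialRegression above."""
    return Pipeline([
        ('poly', PolynomialFeatures(degree=degree)),
        ('std_scaler', StandardScaler()),
        ('lin_reg', LinearRegression()),
    ])


results = {}
for degree in (1, 2, 10, 20):
    model = polynomial_regression(degree)
    model.fit(X_train, y_train)
    # Error on the data the model was fitted to.
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    # Error on held-out data, which reflects generalization.
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

A model that looks best on the training column is not necessarily the one to pick; the test column is the one to compare when choosing the degree.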