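## How to normalize the test dataset
> **When normalizing test data, do not use the test set's own mean and standard deviation; use the mean and standard deviation of the training data.**

> **In a real environment the statistics of all test data cannot be obtained: test data often arrives one vector at a time, and no meaningful mean or standard deviation can be computed from a single sample.**

> **The mean and standard deviation computed from the training set must therefore be saved; scikit-learn does this with a Scaler object.**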

<img src='./picture/8-1.png' style='width:300px;height:300px;float:middle'>

<img src='./picture/8-2.png' style='width:300px;height:300px;float:middle'>

<img src='./picture/8-3.png' style='width:800px;height:300px;float:middle'>
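
The rule above can be sketched with plain NumPy. The arrays below are hypothetical toy data, not the iris data used later in this notebook, and the variable names are assumptions made for illustration. The point is only that the test set, and even a single incoming sample, is scaled with statistics taken from the training set.

```python
import numpy as np

# Hypothetical toy data: 6 training samples and 2 test samples, 2 features each.
X_train = np.array([[1.0, 200.0],
                    [2.0, 300.0],
                    [3.0, 400.0],
                    [4.0, 500.0],
                    [5.0, 600.0],
                    [6.0, 700.0]])
X_test = np.array([[2.5, 350.0],
                   [7.0, 100.0]])

# The statistics are computed from the training set only.
train_mean = X_train.mean(axis=0)
train_std = X_train.std(axis=0)

# Both sets are scaled with the same training statistics.
X_train_scaled = (X_train - train_mean) / train_std
X_test_scaled = (X_test - train_mean) / train_std

# A single incoming sample is scaled the same way; a mean and standard
# deviation computed from that one sample alone would be meaningless.
single_sample = np.array([3.5, 450.0])
single_scaled = (single_sample - train_mean) / train_std
print(single_scaled)
```

Because `single_sample` happens to equal the training mean in this toy setup, it maps to the origin after scaling.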

## Scaler in scikit-learn
> **<font color='red'>Important! Once the training set has been normalized, the test set must be normalized too, with the same scaler.</font>**

In [1]:
import numpy as np
from sklearn import datasets
iris = datasets.load_iris()

In [2]:
X = iris.data
y = iris.target

In [3]:
X[:10, :]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1]])

In [4]:
from sklearn.model_selection import train_test_split

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=666)

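## StandardScaler in scikit-learn
> **1) Create a StandardScaler object.**

> **2) Fit it on the training set, then query the results through the object's attributes (the trailing underscore in `mean_` marks a value that was computed during fitting rather than passed in by the user).**

> **3) Call the `transform` method to normalize both the training set and the test set.**

> **<font color='red'>Important! Once the training set has been normalized, the test set must be normalized too, with the same scaler.</font>**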
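
To make the fit/transform split and the trailing-underscore convention concrete, here is a minimal hand-written sketch. `SimpleStandardScaler` is a hypothetical class written for illustration; it is not scikit-learn's implementation, and it assumes no feature column is constant (otherwise the division would hit a zero).

```python
import numpy as np


class SimpleStandardScaler:
    """Hypothetical minimal scaler, for illustration only."""

    def __init__(self):
        # Trailing-underscore attributes are filled in by fit(), not by the caller.
        self.mean_ = None
        self.scale_ = None

    def fit(self, X):
        # Learn per-feature mean and standard deviation from the given data.
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0)
        return self

    def transform(self, X):
        # Apply the stored statistics; assumes scale_ contains no zeros.
        if self.mean_ is None or self.scale_ is None:
            raise RuntimeError("fit must be called before transform")
        X = np.asarray(X, dtype=float)
        return (X - self.mean_) / self.scale_


# Hypothetical toy data.
X_train_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test_demo = np.array([[2.0, 40.0]])

scaler = SimpleStandardScaler().fit(X_train_demo)   # statistics from training data only
train_scaled = scaler.transform(X_train_demo)
test_scaled = scaler.transform(X_test_demo)         # reuses the training statistics
print(scaler.mean_, scaler.scale_)
print(test_scaled)
```

Keeping `fit` and `transform` separate is what lets the same stored statistics be reused on data that arrives later.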

In [6]:
from sklearn.preprocessing import StandardScaler

In [8]:
standardScaler = StandardScaler()

In [10]:
standardScaler.fit(X_train)

StandardScaler(copy=True, with_mean=True, with_std=True)

In [11]:
standardScaler.mean_  # per-feature means learned from the training set

array([5.83416667, 3.0825    , 3.70916667, 1.16916667])

In [13]:
standardScaler.scale_  # per-feature standard deviations (not variances; the variance is stored in var_)

array([0.81019502, 0.44076874, 1.76295187, 0.75429833])
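
A quick check can confirm what `scale_` holds. The sketch below rebuilds the same split as above (same `test_size` and `random_state`) so that it runs on its own; the name `scaler` is chosen here for illustration. It compares `scale_` with the per-column standard deviation and `var_` with its square.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=666)

scaler = StandardScaler().fit(X_train)

# scale_ is compared with the population standard deviation (ddof=0) of each column.
scale_is_std = np.allclose(scaler.scale_, X_train.std(axis=0))
# var_ is compared with scale_ squared, i.e. the variance.
var_is_scale_squared = np.allclose(scaler.var_, scaler.scale_ ** 2)
print(scale_is_std, var_is_scale_squared)
```

Both comparisons should come out true, which is why `scale_` is best read as a standard deviation.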

In [15]:
standardScaler.transform(X_train)

array([[-0.90616043,  0.94720873, -1.30982967, -1.28485856],
       [-1.15301457, -0.18717298, -1.30982967, -1.28485856],
       [-0.16559799, -0.64092567,  0.22169257,  0.17345038],
       [ 0.45153738,  0.72033239,  0.95909217,  1.49918578],
       [-0.90616043, -1.3215547 , -0.40226093, -0.0916967 ],
       [ 1.43895396,  0.2665797 ,  0.56203085,  0.30602392],
       [ 0.3281103 , -1.09467835,  1.07253826,  0.30602392],
       [ 2.1795164 , -0.18717298,  1.63976872,  1.2340387 ],
       [-0.78273335,  2.30846679, -1.25310662, -1.4174321 ],
       [ 0.45153738, -2.00218372,  0.44858475,  0.43859746],
       [ 1.80923518, -0.41404933,  1.46959958,  0.83631808],
       [ 0.69839152,  0.2665797 ,  0.90236912,  1.49918578],
       [ 0.20468323,  0.72033239,  0.44858475,  0.571171  ],
       [-0.78273335, -0.86780201,  0.10824648,  0.30602392],
       [-0.53587921,  1.40096142, -1.25310662, -1.28485856],
       [-0.65930628,  1.40096142, -1.25310662, -1.28485856],
       [-1.0295875 ,  0.

In [16]:
X_train = standardScaler.transform(X_train)

In [17]:
X_test_standard = standardScaler.transform(X_test)
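
As a sanity check on the two `transform` calls, the following self-contained sketch inspects the column statistics after scaling. It repeats the setup so it can run alone, and it uses `X_train_standard` (a name chosen here) instead of overwriting `X_train`. The transformed training set should have per-feature mean close to 0 and standard deviation close to 1; the test set generally will not, because it was scaled with the training statistics rather than its own.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=666)

scaler = StandardScaler().fit(X_train)          # fit on the training set only
X_train_standard = scaler.transform(X_train)
X_test_standard = scaler.transform(X_test)      # same scaler, no refitting

# Training set: mean close to 0 and standard deviation close to 1 per feature.
print(X_train_standard.mean(axis=0))
print(X_train_standard.std(axis=0))

# Test set: near 0 and 1, but not exactly, since the statistics come from the training set.
print(X_test_standard.mean(axis=0))
print(X_test_standard.std(axis=0))
```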

---
Classifying the normalized data

In [18]:
from sklearn.neighbors import KNeighborsClassifier

In [19]:
knn_clf = KNeighborsClassifier(n_neighbors = 3)

In [20]:
knn_clf.fit(X_train, y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=3, p=2,
           weights='uniform')

In [21]:
knn_clf.score(X_test_standard, y_test)

1.0
