## **KNN Regression**

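We again use the coefficient of determination (R²) to measure the regression model's performance, and introduce two further metrics for regression tasks: **mean absolute error (MAE)** and **mean squared error (MSE)**.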

In [1]:
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_absolute_error,mean_squared_error,r2_score

In [17]:
X_train = np.array([
    [158,1],
    [170,1],
    [183,1],
    [191,1],
    [155,0],
    [163,0],
    [180,0],
    [158,0],
    [170,0]
])
y_train = [64,86,84,80,49,59,67,54,67]

In [18]:
X_test=np.array([
    [168,1],
    [180,1],
    [160,0],
    [169,0]
])
y_test=[65,96,52,67]

In [4]:
K=3
clf=KNeighborsRegressor(n_neighbors=K)
clf.fit(X_train,y_train)
predictions = clf.predict(X_test)
print('Predicted weights:%s'%predictions)
print('Coefficient of determination:%s'%r2_score(y_test,predictions))    # coefficient of determination (R^2)
print('Mean absolute error:%s'%mean_absolute_error(y_test,predictions))  # mean absolute error (MAE)
print('Mean squared error:%s'%mean_squared_error(y_test,predictions))    # mean squared error (MSE)

Predicted weights:[70.66666667 79.         59.         70.66666667]
Coefficient of determination:0.6290565226735438
Mean absolute error:8.333333333333336
Mean squared error:95.8888888888889
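
As a check on the three scores printed above, the sketch below recomputes them from their definitions with NumPy: MAE is the mean of |y − ŷ|, MSE is the mean of (y − ŷ)², and R² is 1 − SS_res / SS_tot. The variable names (`errors`, `ss_res`, `ss_tot`) are illustrative and not part of the original notebook, and the predictions are copied from the output above rather than recomputed by the model.

```python
import numpy as np

# True test weights and the K=3 predictions printed above
y_true = np.array([65, 96, 52, 67], dtype=float)
y_pred = np.array([212 / 3, 79.0, 59.0, 212 / 3])  # 212/3 = 70.666...

errors = y_true - y_pred

mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error

ss_res = np.sum(errors ** 2)                     # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1 - ss_res / ss_tot                         # coefficient of determination

print('MAE: %s' % mae)
print('MSE: %s' % mse)
print('R^2: %s' % r2)
```

Because MSE squares each error, the single large miss on the second test instance (96 predicted as 79) dominates it far more than it dominates MAE.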


## **Feature Scaling**

In [5]:
from scipy.spatial.distance import euclidean

In [10]:
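# Heights in millimeters (named X_train_mm so this cell does not overwrite the regression X_train)
X_train_mm = np.array([
    [1700,1],
    [1600,0]
])
# scipy's euclidean expects 1-D vectors, so the test instance is not reshaped to 2-D
x_test_mm = np.array([1640,1])
print(euclidean(X_train_mm[0,:],x_test_mm))
print(euclidean(X_train_mm[1,:],x_test_mm))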

60.0
40.01249804748511


In [14]:
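# Heights in meters (named X_train_m so this cell does not overwrite the regression X_train)
X_train_m = np.array([
    [1.7,1],
    [1.6,0]
])
# scipy's euclidean expects 1-D vectors, so the test instance is not reshaped to 2-D
x_test_m = np.array([1.64,1])
print(euclidean(X_train_m[0,:],x_test_m))
print(euclidean(X_train_m[1,:],x_test_m))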

0.06000000000000005
1.0007996802557444


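The StandardScaler class is a transformer for feature scaling. By default it centers each feature to zero mean and scales it to unit variance.
Data with zero mean and unit variance is called standardized data.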
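
To make the definition concrete, here is a minimal sketch of the same computation done by hand: subtract each column's mean and divide by its population standard deviation (`ddof=0`), then compare with `StandardScaler`. The array `X` repeats the training features from the regression example; the names `X_manual` and `X_sklearn` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Training features from the regression example: [height in cm, gender flag]
X = np.array([
    [158, 1], [170, 1], [183, 1], [191, 1],
    [155, 0], [163, 0], [180, 0], [158, 0], [170, 0]
], dtype=float)

mean = X.mean(axis=0)        # per-feature mean
std = X.std(axis=0)          # per-feature population standard deviation (ddof=0)
X_manual = (X - mean) / std  # z-score each column

X_sklearn = StandardScaler().fit_transform(X)

print(X_manual[0])
print(np.allclose(X_manual, X_sklearn))
```

The mean and standard deviation are learned from the training data only; test data should be transformed with those same statistics, not with its own.
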
## Standardizing the features of the regression problem above

In [15]:
from sklearn.preprocessing import StandardScaler

In [19]:
ss = StandardScaler()
X_train_scaled = ss.fit_transform(X_train)
print(X_train)
print(X_train_scaled)

[[158   1]
 [170   1]
 [183   1]
 [191   1]
 [155   0]
 [163   0]
 [180   0]
 [158   0]
 [170   0]]
[[-0.9908706   1.11803399]
 [ 0.01869567  1.11803399]
 [ 1.11239246  1.11803399]
 [ 1.78543664  1.11803399]
 [-1.24326216 -0.89442719]
 [-0.57021798 -0.89442719]
 [ 0.86000089 -0.89442719]
 [-0.9908706  -0.89442719]
 [ 0.01869567 -0.89442719]]


In [22]:
X_test_scaled=ss.transform(X_test)
clf.fit(X_train_scaled,y_train)
predictions=clf.predict(X_test_scaled)
print('Coefficient of determination:%s'%r2_score(y_test,predictions))
print('Mean absolute error:%s'%mean_absolute_error(y_test,predictions))
print('Mean squared error:%s' % mean_squared_error(y_test,predictions))

Coefficient of determination:0.6706425961745109
Mean absolute error:7.583333333333336
Mean squared error:85.13888888888893


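Our model performs better on the standardized data: R² rises from about 0.629 to 0.671, MAE falls from about 8.33 to 7.58, and MSE falls from about 95.89 to 85.14. After scaling, the gender feature contributes more to the distance between instances, which lets the model make better predictions.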
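
One way to see the effect of the gender feature is to look at which training instances are chosen as neighbors before and after scaling. The sketch below does this for the first test instance, `[168, 1]`, using `kneighbors`. The names `knn_raw`, `knn_scaled`, and `x_query` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

X_train = np.array([
    [158, 1], [170, 1], [183, 1], [191, 1],
    [155, 0], [163, 0], [180, 0], [158, 0], [170, 0]
])
y_train = [64, 86, 84, 80, 49, 59, 67, 54, 67]
x_query = np.array([[168, 1]])  # first test instance

# Neighbors found on the raw features
knn_raw = KNeighborsRegressor(n_neighbors=3).fit(X_train, y_train)
_, idx_raw = knn_raw.kneighbors(x_query)

# Neighbors found on the standardized features
ss = StandardScaler().fit(X_train)
knn_scaled = KNeighborsRegressor(n_neighbors=3).fit(ss.transform(X_train), y_train)
_, idx_scaled = knn_scaled.kneighbors(ss.transform(x_query))

print('Raw neighbors:    %s' % X_train[idx_raw[0]].tolist())
print('Scaled neighbors: %s' % X_train[idx_scaled[0]].tolist())
```

On the raw features, two of the three neighbors have gender 0 even though the query has gender 1, because a few centimeters of height outweigh the 0/1 flag. After standardization, all three neighbors share the query's gender.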