### Implementing KNN in Python

We use the KNN implementation in [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html), which also supports KD-tree search. The cells below give a brief tour of its API.

#### Toy dataset

In [1]:
from sklearn.neighbors import KNeighborsClassifier

# Four 1-D samples: the two smallest belong to class 0, the two largest to class 1
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)

print(neigh.predict([[1.1]]))
print(neigh.predict_proba([[0.9]]))

[0]
[[0.66666667 0.33333333]]
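
To see where these numbers come from, the vote can be reproduced by hand. The following is a minimal sketch, assuming Euclidean distance and uniform weights (the `p=2` and `weights='uniform'` defaults of `KNeighborsClassifier`). The helper name `knn_vote` is hypothetical and is not part of sklearn.

```python
import numpy as np

def knn_vote(X, y, query, k=3):
    """Return (predicted label, class fractions) among the k nearest neighbors.

    Hypothetical helper for illustration only; not an sklearn function.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Euclidean distance from the query to every training sample
    dists = np.linalg.norm(X - np.asarray(query, dtype=float), axis=1)
    # Indices of the k closest training samples
    nearest = np.argsort(dists)[:k]
    # Fraction of those neighbors that fall in each class
    classes = np.unique(y)
    proba = np.array([np.mean(y[nearest] == c) for c in classes])
    return classes[np.argmax(proba)], proba

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
label, proba = knn_vote(X, y, [0.9])
print(label, proba)
```

For the query `0.9`, the three nearest samples are `1`, `0`, and `2`, with labels `0`, `0`, and `1`, so class 0 receives two of the three votes. This matches the `predict_proba` output above.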


#### Iris dataset

In [7]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
# Train the classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=3, p=2,
           weights='uniform')

In [9]:
# Accuracy on the training set and on the test set
print('Train Accuracy: ', knn.score(X_train, y_train))
print('Test Accuracy: ', knn.score(X_test, y_test))

Train Accuracy:  0.96
Test Accuracy:  0.98


In [15]:
# We can also compute the training and test accuracy by hand
import numpy as np
y_train_predict = knn.predict(X_train)
print('Train Accuracy: ', np.mean(y_train_predict == y_train))

y_test_predict = knn.predict(X_test)
print('Test Accuracy:', np.mean(y_test_predict == y_test))

Train Accuracy:  0.96
Test Accuracy: 0.98


#### Using a KD-Tree

In [16]:
# Simply set the `algorithm` parameter to 'kd_tree'
knn = KNeighborsClassifier(n_neighbors=3, algorithm='kd_tree')
knn.fit(X_train, y_train)

KNeighborsClassifier(algorithm='kd_tree', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=3, p=2,
           weights='uniform')
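
The `algorithm` parameter only changes how neighbors are searched, not which neighbors are found. As a rough illustration, the sketch below queries sklearn's standalone `KDTree` class directly and compares the result with a brute-force distance computation. The synthetic `points` array and the seed are assumptions made for this example; they are not the Iris data used above.

```python
import numpy as np
from sklearn.neighbors import KDTree

# Synthetic data, assumed for illustration: 200 random points in 4 dimensions
rng = np.random.default_rng(0)
points = rng.random((200, 4))

# Build the tree once, then query the 3 nearest neighbors of the first 5 points
tree = KDTree(points, leaf_size=30)
dist, ind = tree.query(points[:5], k=3)

# Brute-force reference: all pairwise Euclidean distances, keep the 3 smallest
full = np.linalg.norm(points[:5, None, :] - points[None, :, :], axis=2)
brute_dist = np.sort(full, axis=1)[:, :3]

print(np.allclose(dist, brute_dist))
```

Because the query points are themselves in the tree, the first neighbor of each is the point itself at distance 0. The comparison uses distances rather than indices so that ties cannot cause a spurious mismatch.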