### KNN Algorithm Implementation

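#### The KNN algorithm has three steps: (1) compute the distance between the sample to classify and every training sample; (2) find the K nearest neighbors; (3) assign the sample to the class that occurs most often among those K neighbors.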
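
The three steps can be traced on a tiny example. This is only a sketch: the toy arrays `X_toy`, `y_toy`, and `query` are made-up values for illustration and are not part of the Iris data.

```python
import numpy as np
from collections import Counter

# Hypothetical toy data: four 2-D training points with labels 0 or 1.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_toy = np.array([0, 0, 1, 1])
query = np.array([0.5, 0.5])
k = 3

# Step 1: distance from the query to every training sample.
distances = np.sqrt(((X_toy - query) ** 2).sum(axis=1))
# Step 2: indices of the k nearest neighbors.
nearest = np.argsort(distances)[:k]
# Step 3: majority vote among the neighbors' labels.
label = Counter(y_toy[nearest].tolist()).most_common(1)[0][0]
print(label)
```

Two of the three nearest neighbors carry label 0, so the query is assigned to class 0.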

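##### This notebook implements KNN on the Iris dataset. It defines `euc_dis`, which computes the Euclidean distance between two samples, and `knn_classify`, which predicts a sample's label by majority vote.
##### Alternatively, import `KNeighborsClassifier` with `from sklearn.neighbors import KNeighborsClassifier` to build and train the model.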

In [1]:
import numpy as np
from sklearn import datasets
from collections import Counter
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

#### Load the Iris dataset and split it into training and test sets

In [2]:
iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

In [3]:
def euc_dis(instance1, instance2):
    """
    计算两个样本instance1和instance2之间的欧式距离
    instance1:第一个样本,array型
    instance2:第二个样本,array型
    """
    dist = np.sqrt(sum((instance1 - instance2) ** 2))
    return dist
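
As a quick sanity check, the distance can be worked by hand for two small vectors and compared with `np.linalg.norm`, which should give the same value. The function is restated so the snippet runs on its own, and the vectors `a` and `b` are made-up values.

```python
import numpy as np

def euc_dis(instance1, instance2):
    # Same formula as above, restated so this snippet is self-contained.
    return np.sqrt(np.sum((instance1 - instance2) ** 2))

# Hypothetical vectors: differences are (3, 4, 0), so the distance is
# sqrt(9 + 16 + 0) = 5.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

d_manual = euc_dis(a, b)
d_norm = np.linalg.norm(a - b)  # L2 norm of the difference vector
print(d_manual, d_norm)
```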

In [9]:
def knn_classify(X, y, testInstance, k):
    """
    给定一个测试数据testInstance,通过KNN算法来预测它的标签
    X:训练数据的特征
    y:训练数据的标签
    testInstance:测试数据,这里假定一个测试数据为array型
    k:划分的类别数目
    """
    # 返回testInstance的预测标签={0, 1, 2}
    distances = [euc_dis(x, testInstance) for x in X]
    # 排序
    kneighbors = np.argsort(distances)[:k]
    # count是一个字典
    count = Counter(y[kneighbors])
    # count.most_common()[0][0]是票数最多的
    return count.most_common()[0][0]
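
One detail worth knowing is how ties are broken. In Python 3.7 and later, `Counter.most_common` is documented to order elements with equal counts by first occurrence. The labels passed to `Counter` above are ordered from nearest to farthest, so a tie goes to the tied class whose neighbor is closest. The sketch below uses a made-up label list to show this.

```python
from collections import Counter

# Hypothetical neighbor labels, already ordered from nearest to farthest.
labels_nearest_first = [2, 1, 1, 2]

votes = Counter(labels_nearest_first)
# Labels 1 and 2 both have two votes; the label seen first wins the tie.
winner = votes.most_common()[0][0]
print(votes[1], votes[2], winner)
```

With three classes, an odd `k` does not rule out ties (for example, one vote each when `k` is 3), so this ordering rule can affect the result.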

In [11]:
# Predict the test set. K is the number of neighbors that vote; it is set to 3
# here, a choice that is independent of Iris having three classes.
predictions = [knn_classify(X_train, y_train, data, 3) for data in X_test]
print(predictions[:5])
# Count how many predictions match the true labels
correct = np.count_nonzero(np.array(predictions) == y_test)
print(correct)
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
print("sklearn KNN-model's Accuracy is:%.3f"%(accuracy_score(y_test,
      clf.predict(X_test))))
print("My KNN-model's Accuracy is:%.3f"%(correct/len(X_test)))

[np.int64(2), np.int64(1), np.int64(0), np.int64(2), np.int64(0)]
37
sklearn KNN-model's Accuracy is:0.974
My KNN-model's Accuracy is:0.974
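
The accuracy printed above is the fraction of test samples whose prediction matches the true label. A minimal sketch of that calculation follows; `y_true` and `y_pred` are made-up arrays, not the notebook's results.

```python
import numpy as np

# Hypothetical labels: four of the five predictions are right.
y_true = np.array([2, 1, 0, 2, 0])
y_pred = np.array([2, 1, 0, 1, 0])

# Counting matches and dividing by the sample count ...
accuracy_count = np.count_nonzero(y_pred == y_true) / len(y_true)
# ... is the same as taking the mean of the boolean match array.
accuracy_mean = np.mean(y_pred == y_true)
print(accuracy_count, accuracy_mean)
```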
