## Hyperparameters

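Hyperparameters vs. model parameters
* Hyperparameter: a parameter that must be set before the algorithm runs
* Model parameter: a parameter that the algorithm learns during training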
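
To make the distinction concrete, here is a minimal pure-Python sketch of a k-nearest-neighbors vote on 1-D data. The helper `knn_predict` and the toy data are made up for illustration and are not part of scikit-learn. The point is that `k` is supplied by the caller before any prediction happens, which is what makes it a hyperparameter. kNN is a slightly unusual case in that it learns no model parameters at all: it only stores the training data.

```python
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration; works on 1-D points only.
    """
    # Rank training indices by absolute distance to the query
    order = sorted(range(len(train_points)),
                   key=lambda i: abs(train_points[i] - query))
    # Count the labels of the k closest points and return the most common one
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

points = [1.0, 2.0, 3.0, 6.0, 7.0]
labels = ["a", "b", "b", "a", "a"]

# Same training data, same query: only the hyperparameter k changes
print(knn_predict(points, labels, 1.2, k=1))  # a (nearest point is 1.0)
print(knn_predict(points, labels, 1.2, k=3))  # b (neighbors 1.0, 2.0, 3.0 vote a, b, b)
```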

In [1]:
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

In [2]:
# Load the handwritten-digit dataset bundled with scikit-learn
digits = datasets.load_digits()
x = digits.data
y = digits.target

In [3]:
# Hold out 20% of the samples as a test set; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=666)

In [4]:
# n_neighbors (k) is a hyperparameter: it is chosen before fitting
knn_clf = KNeighborsClassifier(n_neighbors=3)
knn_clf.fit(x_train, y_train)
knn_clf.score(x_test, y_test)  # accuracy on the test set

0.9888888888888889

## Finding the best k (try several values and keep the best)

In [5]:
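best_k = 0
best_score = 0.0
# Try k = 1..10 and keep the value with the highest test accuracy
for k in range(1, 11):
    knn_clf = KNeighborsClassifier(n_neighbors=k)
    knn_clf.fit(x_train, y_train)
    score = knn_clf.score(x_test, y_test)
    # Strict ">" keeps the smallest k when several values tie
    if score > best_score:
        best_k = k
        best_score = score
print("best_k:", best_k)
print("best_score:", best_score)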

best_k: 4
best_score: 0.9916666666666667


## Taking distance into account (the k nearest neighbors need not all carry the same voting weight)

In [6]:
best_k = 0
best_score = 0.0
best_method = ""
# Search over both the weighting scheme and k
for method in ["uniform", "distance"]:
    for k in range(1, 11):
        knn_clf = KNeighborsClassifier(n_neighbors=k, weights=method)
        knn_clf.fit(x_train, y_train)
        score = knn_clf.score(x_test, y_test)
        # Strict ">" keeps the first combination found when scores tie
        if score > best_score:
            best_k = k
            best_method = method
            best_score = score
print("best_k:", best_k)
print("best_method:", best_method)
print("best_score:", best_score)

best_k: 4
best_method: uniform
best_score: 0.9916666666666667
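
To see why the two `weights` settings can disagree, here is a minimal sketch of the vote itself. It assumes inverse-distance weighting (each neighbor votes with weight 1/distance), which is what scikit-learn's `weights="distance"` option uses. The function `vote` and the toy neighbors are made up for illustration.

```python
from collections import defaultdict

def vote(neighbors, weights="uniform"):
    """Return the winning label among (distance, label) pairs.

    Hypothetical helper for illustration; distances are assumed to be > 0.
    """
    totals = defaultdict(float)
    for distance, label in neighbors:
        # "uniform": every neighbor counts 1; "distance": closer neighbors count more
        totals[label] += 1.0 if weights == "uniform" else 1.0 / distance
    return max(totals, key=totals.get)

# Three nearest neighbors of some query point: one close "red", two farther "blue"
neighbors = [(1.0, "red"), (3.0, "blue"), (4.0, "blue")]

print(vote(neighbors, "uniform"))   # blue: 2 votes against 1
print(vote(neighbors, "distance"))  # red: 1/1 = 1.0 against 1/3 + 1/4 (about 0.58)
```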


## Euclidean, Manhattan, and Minkowski distance

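Euclidean distance: $\sqrt{\sum_{i=1}^{n}\left( x_{ai}-x_{bi}\right)^{2}}$ (Minkowski distance with $p=2$)

Manhattan distance: $\sum_{i=1}^{n}\left| x_{ai}-x_{bi}\right|$ (Minkowski distance with $p=1$)

Minkowski distance: $\left( \sum_{i=1}^{n}\left| x_{ai}-x_{bi}\right|^{p}\right)^{\frac{1}{p}}$

The exponent $p$ is therefore one more hyperparameter; `KNeighborsClassifier` exposes it as the `p` argument.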
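
As a quick check that Manhattan and Euclidean distance are special cases of the Minkowski distance, the formula can be sketched in plain Python. The function name `minkowski_distance` and the sample points are assumptions made for illustration. This is not how scikit-learn computes distances internally.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance between two equal-length sequences of numbers (p >= 1)."""
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)

a = [0.0, 0.0]
b = [3.0, 4.0]

print(minkowski_distance(a, b, 1))  # 7.0 (Manhattan: 3 + 4)
print(minkowski_distance(a, b, 2))  # 5.0 (Euclidean: sqrt(9 + 16))
print(minkowski_distance(a, b, 3))  # smaller than the Euclidean value
```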

In [7]:
best_k = 0
best_score = 0.0
best_p = 0
# weights is fixed to "distance" here, so this search covers only p and k
for p in range(1, 6):
    for k in range(1, 11):
        knn_clf = KNeighborsClassifier(n_neighbors=k, weights="distance", p=p)
        knn_clf.fit(x_train, y_train)
        score = knn_clf.score(x_test, y_test)
        # Strict ">" keeps the first combination found when scores tie
        if score > best_score:
            best_k = k
            best_p = p
            best_score = score
print("best_k:", best_k)
print("best_p:", best_p)
print("best_score:", best_score)

best_k: 5
best_p: 1
best_score: 0.9888888888888889
