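# Hyperparameters and Model Parameters

**Hyperparameter**: a parameter that must be set before the algorithm runs.

**Model parameter**: a parameter that the algorithm learns during training.

The KNN algorithm has no model parameters.

The k in KNN is a typical hyperparameter.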
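
To make the distinction concrete, here is a minimal sketch of a KNN classifier, assuming plain Python lists as input. The class name `TinyKNN` and its method `predict_one` are hypothetical and are not the `KNNClassifier` imported later in this notebook. Note that `fit` only stores the training data, so nothing is learned, while `k` is fixed when the object is created.

```python
import math
from collections import Counter


class TinyKNN:
    """Hypothetical minimal KNN classifier, for illustration only."""

    def __init__(self, k):
        # k is a hyperparameter: it is chosen before the algorithm runs.
        self.k = k

    def fit(self, X_train, y_train):
        # No model parameters are learned; the training set is simply stored.
        self.X_train = list(X_train)
        self.y_train = list(y_train)
        return self

    def predict_one(self, x):
        # Rank training points by Euclidean distance to x, then let the k nearest vote.
        order = sorted(range(len(self.X_train)),
                       key=lambda i: math.dist(self.X_train[i], x))
        votes = Counter(self.y_train[i] for i in order[:self.k])
        return votes.most_common(1)[0][0]


X_demo = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_demo = [0, 0, 0, 1, 1, 1]
clf = TinyKNN(k=3).fit(X_demo, y_demo)
print(clf.predict_one([0.5, 0.5]), clf.predict_one([5.5, 5.5]))
```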

## Finding Good Hyperparameters

* Domain knowledge
* Empirical values
* Experimental search

In [1]:
import numpy as np
from sklearn import datasets

In [2]:
digit = datasets.load_digits()

In [3]:
X = digit.data
y = digit.target

In [4]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=666)

In [5]:
from encapsulations.KNN import KNNClassifier
knn_clf = KNNClassifier(k=3)

In [6]:
knn_clf.fit(X_train,y_train)
knn_clf.score(X_test,y_test)

0.9888888888888889

In [7]:
from sklearn.neighbors import KNeighborsClassifier
my_knn_clf = KNeighborsClassifier(n_neighbors=3)
my_knn_clf.fit(X_train,y_train)
my_knn_clf.score(X_test,y_test)

0.9888888888888889

## Finding the Best k

In [9]:
best_score = 0.0
best_k = -1
for k in range(1,11):
    knn_clf = KNeighborsClassifier(n_neighbors=k)
    knn_clf.fit(X_train,y_train)
    score = knn_clf.score(X_test,y_test)
    if score > best_score:
        best_k = k
        best_score = score
print("best_k=",best_k," best_score=",best_score)

best_k= 4  best_score= 0.9916666666666667


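**If the search above had returned k = 10 as the best value, you would normally keep searching above 10, because a better k may lie just outside the boundary of the range that was tried.**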
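
The boundary check can be automated. The sketch below is one possible way to do it, not part of the original notebook: `search_k`, `score_fn`, `step`, and `max_k` are hypothetical names. `score_fn` stands for any callable that maps k to a score. In this notebook it would fit a `KNeighborsClassifier` and return its test score, but a toy function is used here so the example runs on its own.

```python
def search_k(score_fn, start=1, stop=11, step=10, max_k=101):
    """Search k in [start, stop) and widen the range while the best k sits on the upper edge.

    score_fn, step and max_k are illustrative assumptions, not notebook code.
    """
    best_k, best_score = -1, float("-inf")
    while True:
        for k in range(start, stop):
            score = score_fn(k)
            if score > best_score:
                best_k, best_score = k, score
        # Stop once the best k is strictly inside the range, or the cap is reached.
        if best_k < stop - 1 or stop >= max_k:
            return best_k, best_score
        start, stop = stop, min(stop + step, max_k)


# Toy score that peaks at k = 14, standing in for a fit-and-score call.
print(search_k(lambda k: -abs(k - 14)))
```

The first pass over 1 to 10 ends with the best k on the edge, so the range is extended to 11 to 20, where the peak is found.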

## Distance in KNN

### Weight votes by distance, or not?

**Weighting votes by distance helps resolve tied votes.**
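
A small sketch of how this works, assuming each neighbor's vote is weighted by the inverse of its distance (which is how scikit-learn documents `weights="distance"`). The helper `vote` and the sample neighbors are hypothetical, and the sketch assumes all distances are nonzero.

```python
from collections import defaultdict


def vote(neighbors, weighted):
    """neighbors: list of (distance, label) pairs for the k nearest points."""
    totals = defaultdict(float)
    for distance, label in neighbors:
        # Uniform: every neighbor counts 1. Weighted: closer neighbors count more.
        totals[label] += 1.0 / distance if weighted else 1.0
    best = max(totals.values())
    # Return every label that reaches the top total, so ties stay visible.
    return sorted(label for label, total in totals.items() if total == best)


# k = 4 with two neighbors per class.
neighbors = [(1.0, "red"), (4.0, "red"), (2.0, "blue"), (2.5, "blue")]
print(vote(neighbors, weighted=False))  # both labels are returned: a tie
print(vote(neighbors, weighted=True))   # a single label is returned
```

With uniform votes each class gets 2, so the result is a tie. With inverse-distance weights, red scores 1/1 + 1/4 = 1.25 and blue scores 1/2 + 1/2.5 = 0.9, so red wins.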


In [10]:
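best_score = 0.0
best_k = -1
best_method = ""
for method in ["uniform","distance"]:
    for k in range(1,11):
        knn_clf = KNeighborsClassifier(n_neighbors=k,weights=method)
        knn_clf.fit(X_train,y_train)
        score = knn_clf.score(X_test,y_test)
        if score > best_score:
            best_k = k
            best_score = score
            best_method = method  # record the weighting that produced the best score
print("best_method=",best_method,"\tbest_k=",best_k," best_score=",best_score)

best_method= uniform 	best_k= 4  best_score= 0.9916666666666667

Because the comparison is a strict `>`, `"distance"` replaces `"uniform"` only if it scores strictly higher. Here the best score equals the one already reached with the default uniform weights at k = 4, so `"uniform"` is reported.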


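## Exploring p in the Minkowski Distance

In scikit-learn, `p` is the exponent of the Minkowski metric used to find the nearest neighbors. It defaults to 2 (Euclidean distance) whether `weights` is `"uniform"` or `"distance"`. The search below fixes `weights="distance"` and varies `p` together with k.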
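
For reference, the Minkowski distance between two points a and b is (Σ |a_i − b_i|^p)^(1/p); p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance. A minimal sketch, where the function name `minkowski` is hypothetical and not a scikit-learn call:

```python
def minkowski(a, b, p):
    """Minkowski distance between two equal-length sequences of numbers."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


a, b = (0, 0), (3, 4)
print(minkowski(a, b, 1))  # Manhattan distance: 7.0
print(minkowski(a, b, 2))  # Euclidean distance: 5.0
```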

In [11]:
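best_p = -1
best_score = 0.0
best_k = -1
for k in range(1,11):
    for p in range(1,6):
        knn_clf = KNeighborsClassifier(n_neighbors=k,weights="distance",p=p)
        knn_clf.fit(X_train,y_train)
        score = knn_clf.score(X_test,y_test)
        if score > best_score:
            best_k = k
            best_score = score
            best_p = p
# Print the tracked best_p, not the loop variable p (which is always 5 after the loop).
print("best_p=",best_p,"\tbest_k=",best_k," best_score=",best_score)

best_p= 5 	best_k= 3  best_score= 0.9888888888888889

In this recorded output, the `best_p` field is the final loop value of `p`, because the run printed `p` rather than `best_p`. Only `best_k` and `best_score` should be read from it.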
