# 超参数

* 超参数
    在算法运行之前就需要决定的参数
    比如knn算法中的k值
* 模型参数
    在算法过程中学习的参数
    knn算法中没有模型参数

In [12]:
from sklearn import datasets
import numpy as np

In [2]:
digits = datasets.load_digits()
X = digits.data
y = digits.target

In [8]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.2, random_state = 666)

In [9]:
from sklearn.neighbors import KNeighborsClassifier
knn_classifier = KNeighborsClassifier(n_neighbors = 3)

In [10]:
knn_classifier.fit(X_train,y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=3, p=2,
           weights='uniform')

In [13]:
knn_classifier.score(X_test,y_test)

0.9888888888888889

## 寻找最好的k(调参)

In [14]:
#给定一个k的范围，然后循环比较出最优参数
best_score = 0.0
best_k = -1
#给定k的值为1到10
for k in range(1,11):
    knn_classifier = KNeighborsClassifier(n_neighbors = k)
    knn_classifier.fit(X_train,y_train)
    score = knn_classifier.score(X_test,y_test)
    if score > best_score:
        best_k = k
        best_score = score
print("best_k = ",best_k)
print("best_score = ",best_score)

best_k =  4
best_score =  0.9916666666666667


## 判断是否加入权重

In [16]:
best_method = ""
best_score = 0.0
best_k = -1
#knn中的weights参数是指定是否考虑距离权重
for method in ["uniform","distance"]:
    for k in range(1,11):
        knn_clf = KNeighborsClassifier(n_neighbors = k,weights = method)
        knn_clf.fit(X_train,y_train)
        score = knn_clf.score(X_test,y_test)
        if score > best_score:
            best_k = k
            best_score = score
            best_method = method
            
print("best_method = ",best_method)
print("best_k = ",best_k)
print("best_score = ",best_score)

best_method =  uniform
best_k =  4
best_score =  0.9916666666666667


### ps

距离
* 曼哈顿距离
    就是指两个点在每个维度上的距离的绝对值的和

![image.png](attachment:image.png)

上图中红蓝黄线都是这两个点的曼哈顿距离       
中间的绿线是欧拉距离      
在这个图中的两个点，他们的曼哈顿距离就是指他们在X轴上的距离加上在Y轴上的距离

其实欧拉距离和曼哈顿距离在形式上是有一定的一致性的

![image.png](attachment:image.png)

**这就是明可夫斯基距离(Minkowski metric)**      
也是一个新的超参数p  

### 探索明可夫斯基距离相应的最优值p

In [21]:
%%time
best_p = -1
best_k = -1
best_score = 0.0
for k in range(1,11):
    for p in range(1,6):
        knn_clf = KNeighborsClassifier(n_neighbors = k,weights="distance")
        knn_clf.fit(X_train,y_train)
        score = knn_clf.score(X_test,y_test)
        if score > best_score:
            best_k = k
            best_score = score
            best_p = p
            
print("best_p = ",best_p)
print("best_k = ",best_k)
print("best_score = ",best_score)
        

best_p =  1
best_k =  3
best_score =  0.9888888888888889
Wall time: 3.54 s
