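# Hyperparameters
#### Hyperparameter: a parameter that must be chosen before the algorithm runs
#### Model parameter: a parameter the algorithm learns during training

#### The KNN algorithm has no model parameters
#### k in KNN is a typical hyperparameter
#### When people say an algorithm engineer is "tuning parameters", they mean hyperparameters: the values that have to be set before the algorithm runs

### How do we find good hyperparameters? Usually from three sources:
#### 1. Domain knowledge  2. Empirical values  3. Experimental search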
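
To make the distinction concrete, here is a minimal sketch that inspects the hyperparameters of a `KNeighborsClassifier` through scikit-learn's `get_params()`. The choice of `n_neighbors=3` is only an illustration; the point is that these values are fixed at construction time and are not changed by `fit()`.

```python
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

# Hyperparameters are fixed when the estimator is constructed, before fit().
knn_clf = KNeighborsClassifier(n_neighbors=3)
params = knn_clf.get_params()
print(params["n_neighbors"], params["weights"], params["p"])

# fit() leaves them untouched: KNN stores the training data instead of
# learning model parameters from it.
digits = datasets.load_digits()
knn_clf.fit(digits.data, digits.target)
print(knn_clf.get_params() == params)
```
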

In [3]:
import numpy as np
from sklearn import datasets
from sklearn import model_selection
from sklearn.neighbors import KNeighborsClassifier



In [4]:
digits = datasets.load_digits()
X = digits.data
y = digits.target


In [5]:

_X_train,_X_test,_y_train,_y_test = model_selection.train_test_split(X,y)


In [6]:

knn_clf = KNeighborsClassifier(n_neighbors=3)
knn_clf.fit(_X_train,_y_train)
knn_clf.score(_X_test,_y_test)

0.9911111111111112

### Finding the best k

In [9]:
best_score = 0.0
best_k = -1
# Try k = 1..10 and keep the k with the highest accuracy on the test split
for k in range(1, 11):
    knn_clf = KNeighborsClassifier(n_neighbors=k)
    knn_clf.fit(_X_train, _y_train)
    score = knn_clf.score(_X_test, _y_test)
    if score > best_score:  # strict ">" keeps the smallest k on ties
        best_score = score
        best_k = k

print("best k is ",best_k)
print("best score is ",best_score)

best k is  3
best score is  0.9911111111111112


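## Another important KNN hyperparameter: distance weighting
### An example:
#### . The three nearest neighbors of element A are B1, B2, and B3
#### . B1 is labeled red; B2 and B3 are labeled blue
#### . The distance from A to B1 is 1; the distances from A to B2 and B3 are 3 and 4
#### . Distance-weighted vote for blue: 1/d(A,B2) + 1/d(A,B3) = 1/3 + 1/4 = 7/12
#### . Distance-weighted vote for red: 1/d(A,B1) = 1/1 = 1
#### . Red's weighted vote is larger than blue's, so blue loses even though it has two of the three nearest neighbors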
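
The arithmetic above can be checked with a few lines of plain Python. This is only a sketch of inverse-distance voting: the helper `weighted_vote` is a hypothetical name introduced here, and it assumes every distance is strictly positive.

```python
from collections import defaultdict

# (label, distance to A) for the three nearest neighbors in the example
neighbors = [("red", 1.0), ("blue", 3.0), ("blue", 4.0)]

def weighted_vote(neighbors):
    """Sum 1/distance per label and return the totals as a dict."""
    votes = defaultdict(float)
    for label, distance in neighbors:
        votes[label] += 1.0 / distance
    return dict(votes)

votes = weighted_vote(neighbors)
winner = max(votes, key=votes.get)  # label with the largest weighted vote
print(votes)
print(winner)
```
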

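### How distance weighting is exposed in sklearn:
#### KNeighborsClassifier(n_neighbors=k) accepts another parameter, weights, which can be "uniform" or "distance"
#### uniform: ignore distance, every neighbor gets an equal vote    distance: weight each neighbor by the inverse of its distance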
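
As a quick illustration, the sketch below rebuilds the A/B1/B2/B3 example as one-dimensional toy data and compares the two `weights` settings. The coordinates (A at 0, neighbors at 1, 3, and 4) are an assumption chosen only to reproduce the distances from the example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data: B1 (red) at 1, B2 and B3 (blue) at 3 and 4; the query A sits at 0.
X_toy = np.array([[1.0], [3.0], [4.0]])
y_toy = np.array(["red", "blue", "blue"])
A = np.array([[0.0]])

predictions = {}
for mode in ["uniform", "distance"]:
    clf = KNeighborsClassifier(n_neighbors=3, weights=mode)
    clf.fit(X_toy, y_toy)
    predictions[mode] = clf.predict(A)[0]

print(predictions)
```

With uniform weights the two blue neighbors outvote the single red one; with distance weights the close red neighbor wins, matching the hand calculation.
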

In [12]:
best_score = 0.0
best_k = -1
best_mode = ""

for mode in ["uniform","distance"]:
    for k in range (1,11):
        knn_clf = KNeighborsClassifier(n_neighbors=k,weights=mode)
        knn_clf.fit(_X_train,_y_train)
        score = knn_clf.score(_X_test,_y_test)
        if score > best_score:
            best_score = score
            best_k = k
            best_mode = mode
        
print("best mode is ",best_mode)
print("best k is ",best_k)
print("best score is ",best_score)

best mode is  distance
best k is  4
best score is  0.9933333333333333


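## The distance hyperparameter
#### Which distance do we mean by "distance"?
#### Euclidean distance, Manhattan distance, Minkowski distance
#### The straight-line distance between two points is the Euclidean distance; the sum of the absolute differences along each coordinate axis is the Manhattan distance; generalizing the two formulas into one gives the Minkowski distance
#### The Minkowski distance has a parameter p: p = 1 gives the Manhattan distance, p = 2 gives the Euclidean distance, and p > 2 gives other distances
#### Which value of p gives the most accurate model? That is yet another hyperparameter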
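
The relationship between the three distances can be sketched in NumPy. The function `minkowski_distance` below is a hypothetical helper written for this note, not a scikit-learn API; it computes (Σ|aᵢ − bᵢ|ᵖ)^(1/p) and assumes p ≥ 1.

```python
import numpy as np

def minkowski_distance(a, b, p):
    """Minkowski distance (sum_i |a_i - b_i|^p)^(1/p), assuming p >= 1."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))

a, b = [0.0, 0.0], [3.0, 4.0]
manhattan = minkowski_distance(a, b, 1)  # |3| + |4|
euclidean = minkowski_distance(a, b, 2)  # sqrt(3^2 + 4^2)
other = minkowski_distance(a, b, 3)      # neither Manhattan nor Euclidean
print(manhattan, euclidean, other)
```
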

### More distance and similarity definitions
###### Cosine similarity (vector space)
###### Adjusted cosine similarity
###### Pearson correlation coefficient
###### Jaccard coefficient
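
For reference, three of these measures can be sketched in a few lines. The function names are hypothetical helpers, not library APIs; the sketch assumes nonzero vectors for the cosine, non-constant vectors for Pearson, and a non-empty union for Jaccard. Adjusted cosine similarity is omitted because it depends on how the data is centered.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two nonzero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    return float(np.corrcoef(a, b)[0, 1])

def jaccard_coefficient(s, t):
    """Size of the intersection divided by the size of the union."""
    s, t = set(s), set(t)
    return len(s & t) / len(s | t)

print(cosine_similarity([1, 0], [0, 1]))
print(pearson_correlation([1, 2, 3], [2, 4, 6]))
print(jaccard_coefficient({1, 2, 3}, {2, 3, 4}))
```
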

In [14]:
best_score = 0.0
best_k = -1
best_mode = ""
best_p = -1

for p in range(1,5):
    for mode in ["uniform","distance"]:
        for k in range (1,11):
            knn_clf = KNeighborsClassifier(n_neighbors=k,weights=mode,p=p)
            knn_clf.fit(_X_train,_y_train)
            score = knn_clf.score(_X_test,_y_test)
            if score > best_score:
                best_score = score
                best_k = k
                best_mode = mode
                best_p = p
        
print("best p is ",best_p)
print("best mode is ",best_mode)
print("best k is ",best_k)
print("best score is ",best_score)

best p is  3
best mode is  distance
best k is  4
best score is  0.9955555555555555


In [15]:
best_score = 0.0
best_k = -1
best_p = -1

# weights is fixed to "distance" here, so only p and k are searched
for p in range(1,5):
    for k in range (1,11):
        knn_clf = KNeighborsClassifier(n_neighbors=k,weights="distance",p=p)
        knn_clf.fit(_X_train,_y_train)
        score = knn_clf.score(_X_test,_y_test)
        if score > best_score:
            best_score = score
            best_k = k
            best_p = p
        
print("best p is ",best_p)
print("best k is ",best_k)
print("best score is ",best_score)

best p is  3
best k is  4
best score is  0.9955555555555555
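
The nested loops above are a hand-written experimental search. scikit-learn also provides `GridSearchCV`, which runs the same kind of search and scores each combination by cross-validation on the training split rather than on the test split. The sketch below is one possible setup, not the method used in this notebook: `random_state=666` and `cv=3` are arbitrary assumptions, and because the scoring differs, its results are not directly comparable to the numbers printed above.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=666  # arbitrary seed (assumption)
)

# Same search space as the loops above: k in 1..10, both weight modes,
# and p in 1..4 when neighbors are weighted by distance.
param_grid = [
    {"weights": ["uniform"], "n_neighbors": list(range(1, 11))},
    {"weights": ["distance"], "n_neighbors": list(range(1, 11)),
     "p": list(range(1, 5))},
]

grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=3)
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)
print(grid_search.best_score_)                         # mean cross-validation accuracy
print(grid_search.best_estimator_.score(X_test, y_test))  # accuracy on the held-out test split
```
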
