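# Hyperparameters:
###  Parameters that must be set before the algorithm runs
# Model parameters:
###  Parameters that the algorithm learns while it runs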

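### kNN has no model parameters, only hyperparameters such as k
#### Tuning means adjusting the hyperparameters
#### The default hyperparameter values shipped with a library are usually empirical choices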
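As a quick illustration of "defaults shipped with the library", the sketch below lists the default values scikit-learn assigns to the hyperparameters this notebook goes on to tune. It relies only on `get_params()`, which every scikit-learn estimator provides; the variable names `default_clf` and `params` are chosen here for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hyperparameters are fixed when the estimator is constructed, before fit().
default_clf = KNeighborsClassifier()
params = default_clf.get_params()

# Print the library defaults for the hyperparameters discussed in this notebook.
for name in ["n_neighbors", "weights", "p", "metric"]:
    print(name, "=", params[name])
```

These defaults are only a starting point; the searches that follow check whether other values work better on this dataset.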

In [1]:
from sklearn import datasets
digits = datasets.load_digits()
X = digits.data  # feature matrix
y = digits.target

## Finding the best k (automated tuning)

In [2]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=100)

from sklearn.neighbors import KNeighborsClassifier  # load the algorithm
KNN_classifier = KNeighborsClassifier(n_neighbors=10)  # k = 10
KNN_classifier.fit(X_train,y_train)
KNN_classifier.score(X_test,y_test)

0.9805555555555555

In [3]:
best_score = 0.0
best_k = -1
for k in range(1,11):
    KNN_classifier = KNeighborsClassifier(n_neighbors=k)  # try each candidate k
    KNN_classifier.fit(X_train,y_train)
    score = KNN_classifier.score(X_test,y_test)
    if score > best_score:
        best_k = k
        best_score = score
print("best_k",best_k)
print("best_score",best_score)

best_k 3
best_score 0.9972222222222222


### Distance weighting: a hidden hyperparameter of kNN
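To see why the weighting scheme can change a prediction, here is a minimal sketch of the two voting rules in plain Python. The `vote` helper and the toy neighbor list are hypothetical and not part of scikit-learn; the inverse-distance rule mirrors what `weights="distance"` does, and the sketch assumes all distances are strictly positive.

```python
from collections import defaultdict

def vote(neighbors, weights="uniform"):
    """Return the winning label among (distance, label) pairs."""
    totals = defaultdict(float)
    for distance, label in neighbors:
        # uniform: every neighbor counts as 1
        # distance: each neighbor counts as 1 / distance, so closer ones weigh more
        totals[label] += 1.0 if weights == "uniform" else 1.0 / distance
    return max(totals, key=totals.get)

# Three nearest neighbors: one very close "red", two farther "blue".
neighbors = [(1.0, "red"), (3.0, "blue"), (4.0, "blue")]

uniform_winner = vote(neighbors, "uniform")    # blue wins 2 votes to 1
distance_winner = vote(neighbors, "distance")  # red wins 1.0 to 1/3 + 1/4
print(uniform_winner, distance_winner)
```

With uniform weights the majority label wins; with distance weights the single close neighbor outweighs the two distant ones.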

In [4]:
best_score = 0.0
best_method = ""
best_k = -1
for method in ["uniform","distance"]:  # without and with distance weighting
    for k in range(1,11):
        KNN_classifier = KNeighborsClassifier(n_neighbors=k,weights=method)  # try each (k, weights) pair
        KNN_classifier.fit(X_train,y_train)
        score = KNN_classifier.score(X_test,y_test)
        if score > best_score:
            best_k = k
            best_score = score
            best_method = method
            
print("best_k=",best_k)
print("best_method=",best_method)
print("best_score=",best_score)

best_k= 3
best_method= uniform
best_score= 0.9972222222222222


## Distance

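### Manhattan distance (sum of the distances along each dimension): `(np.sum(np.abs(A - B)**1))**(1/1)`
### Euclidean distance (straight-line distance): `(np.sum(np.abs(A - B)**2))**(1/2)`
### Generalizing gives the Minkowski distance: `(np.sum(np.abs(A - B)**p))**(1/p)`
#### This yields a new hyperparameter: p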
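The Minkowski family can be checked with a few lines of NumPy. The sketch below is illustrative only (the helper name `minkowski_distance` is made up here); note that the absolute value is taken before raising to the power `p`, otherwise differences of opposite sign would cancel for odd `p`.

```python
import numpy as np

def minkowski_distance(a, b, p):
    """Minkowski distance between two vectors; p=1 is Manhattan, p=2 is Euclidean."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b) ** p) ** (1 / p)

A = [0, 0]
B = [3, 4]

manhattan = minkowski_distance(A, B, 1)  # |3| + |4|
euclidean = minkowski_distance(A, B, 2)  # sqrt(3**2 + 4**2)
print(manhattan, euclidean)
```

For these two points the Manhattan distance is 7 and the Euclidean distance is 5.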

### Cosine similarity in vector space
### Adjusted cosine similarity
### Pearson correlation coefficient
### Jaccard coefficient
#### sklearn.neighbors.KNeighborsClassifier has one more hyperparameter, metric: a string naming a distance defined by the class sklearn.neighbors.DistanceMetric
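For reference, three of the similarity measures listed above can be sketched in NumPy and plain Python as follows. The function names are hypothetical helpers, not scikit-learn APIs; the sketch assumes non-zero vectors and non-empty sets, and it leaves out adjusted cosine similarity.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return cosine_similarity(a - a.mean(), b - b.mean())

def jaccard_coefficient(s, t):
    """Size of the intersection divided by size of the union of two sets."""
    s, t = set(s), set(t)
    return len(s & t) / len(s | t)

print(cosine_similarity([1, 2, 3], [2, 4, 6]))
print(pearson_correlation([1, 2, 3, 4], [2, 1, 4, 3]))
print(jaccard_coefficient({1, 2, 3}, {2, 3, 4}))
```

Unlike the Minkowski distances, these are similarities: larger values mean the two inputs are more alike.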

### Searching over p for the Minkowski distance

In [21]:
%%time

best_score = 0.0
best_p = -1
best_k = -1
for k in range(1,11):
    for p in range(1,6):
        KNN_classifier = KNeighborsClassifier(n_neighbors=k,weights="distance",p=p)  # try each (k, p) pair
        KNN_classifier.fit(X_train,y_train)
        score = KNN_classifier.score(X_test,y_test)
        if score > best_score:
            best_k = k
            best_score = score
            best_p = p
            
print("best_k=",best_k)
print("best_p=",best_p)
print("best_score=",best_score)

best_k= 1
best_p= 2
best_score= 0.9944444444444445
Wall time: 27.4 s


## Grid Search: finding the best hyperparameters

In [22]:
param_grid = [
    {
        'weights':['uniform'],
        'n_neighbors':list(range(1,11))
    },
    {
        'weights':['distance'],
        'n_neighbors':list(range(1,11)),
        'p':list(range(1,6))
    }
]
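It can help to count how many hyperparameter combinations a grid contains before starting a long search. The sketch below uses `ParameterGrid` from `sklearn.model_selection` to expand the same grid; the variable name `n_candidates` is chosen here for illustration.

```python
from sklearn.model_selection import ParameterGrid

param_grid = [
    {
        "weights": ["uniform"],
        "n_neighbors": list(range(1, 11)),
    },
    {
        "weights": ["distance"],
        "n_neighbors": list(range(1, 11)),
        "p": list(range(1, 6)),
    },
]

# Each dict is expanded separately: 10 uniform combinations + 10 * 5 distance combinations.
n_candidates = len(ParameterGrid(param_grid))
print(n_candidates)
```

The count is 60, which matches the "60 candidates" reported in the verbose search log later in this notebook. Every candidate is fitted once per cross-validation fold, so the total number of fits is the candidate count times the number of folds.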

In [23]:
%%time
from sklearn.model_selection import GridSearchCV  # grid search with cross-validation

knn = KNeighborsClassifier()
grid_search = GridSearchCV(knn,param_grid)  # search for the best hyperparameters

grid_search.fit(X_train,y_train)

Wall time: 3min 50s


In [24]:
grid_search.best_estimator_  # attributes the program computes itself from the user's input, rather than taking them as arguments, end with '_'

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=3,
           weights='distance')

In [25]:
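grid_search.best_score_  # best score: the mean cross-validated accuracy over the training folds, a more thorough evaluation than the single test-set accuracy used above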

0.9832985386221295
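The following sketch shows where `best_score_` comes from, using a deliberately small grid so that it runs quickly. It is an illustration under assumptions rather than a rerun of the search above: the grid `small_grid` is made up here, and `cv=3` is passed explicitly to match the 3 folds shown in this notebook's log, since the default number of folds differs between scikit-learn versions.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=100
)

# A small, hypothetical grid so the example runs in a few seconds.
small_grid = {"n_neighbors": [1, 3, 5]}
search = GridSearchCV(KNeighborsClassifier(), small_grid, cv=3)
search.fit(X_train, y_train)

# best_score_ is the highest mean accuracy across the cross-validation folds.
mean_cv_scores = search.cv_results_["mean_test_score"]
print(search.best_score_, mean_cv_scores.max())

# The held-out test set is never seen during the search, so this number can differ.
print(search.score(X_test, y_test))
```

Because `best_score_` is averaged over validation folds carved out of the training data, it is generally not equal to the accuracy measured on `X_test`.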

In [26]:
grid_search.best_params_  # best hyperparameters

{'n_neighbors': 5, 'p': 3, 'weights': 'distance'}

In [28]:
knn_clf = grid_search.best_estimator_  # the kNN classifier with the best hyperparameters
knn_clf.predict(X_test)
knn_clf.score(X_test,y_test)

0.9888888888888889

In [29]:
%%time
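# n_jobs: the hyperparameter search can run in parallel; n_jobs is the number of CPU cores (parallel jobs) to use, and n_jobs=-1 uses all available cores
# verbose=2: print progress information during the search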
grid_search = GridSearchCV(knn,param_grid,n_jobs=-1,verbose=2)
grid_search.fit(X_train,y_train)

Fitting 3 folds for each of 60 candidates, totalling 180 fits


[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:    3.9s
[Parallel(n_jobs=-1)]: Done 154 tasks      | elapsed:  1.2min


Wall time: 1min 30s


[Parallel(n_jobs=-1)]: Done 180 out of 180 | elapsed:  1.5min finished
