# KNN & Grid Search Example

This notebook shows how to tune a k-nearest neighbors (KNN) classifier with scikit-learn's `GridSearchCV`.

In [1]:
%load_ext watermark
%watermark -p scikit-learn,mlxtend,xgboost

scikit-learn: 0.24.1
mlxtend     : 0.19.0
xgboost     : not installed



# With PCA
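The notebook loads feature files that were presumably already reduced with PCA. Their names (`pcaX_train.csv`, `pcaX_test.csv`) and their 49 columns suggest this, but the notebook does not show how they were produced. The sketch below shows the usual pattern, assuming synthetic stand-in data from `make_classification` and a hypothetical 100 raw features. `PCA` is fit on the training split only, and the same projection is then applied to the test split, so no test information leaks into the features.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Hypothetical raw data standing in for the original (unknown) features
X_raw, y = make_classification(n_samples=1000, n_features=100,
                               n_informative=20, random_state=1)
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    X_raw, y, test_size=0.2, random_state=1, stratify=y)

# Fit PCA on the training split only, then reuse that projection on the test split
pca = PCA(n_components=49)
X_train_pca = pca.fit_transform(X_train_raw)
X_test_pca = pca.transform(X_test_raw)

print(X_train_pca.shape, X_test_pca.shape)  # (800, 49) (200, 49)
print(f"Explained variance retained: {pca.explained_variance_ratio_.sum():.3f}")
```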

In [8]:
import pandas as pd


X_train = pd.read_csv('pcaX_train.csv').values
y_train = pd.read_csv('pcay_train.csv').values.ravel().astype(int)

X_test = pd.read_csv('pcaX_test.csv').values
y_test = pd.read_csv('pcay_test.csv').values.ravel().astype(int)

print('X_train.shape:', X_train.shape)
print('y_train.shape:', y_train.shape)
print('X_test.shape:', X_test.shape)
print('y_test.shape:', y_test.shape)


from sklearn.model_selection import train_test_split
X_train_sub, X_valid, y_train_sub, y_valid = \
    train_test_split(X_train, y_train, test_size=0.2, random_state=1, stratify=y_train)

print('Train/Valid/Test sizes:', y_train_sub.shape[0], y_valid.shape[0], y_test.shape[0])

X_train.shape: (18592, 49)
y_train.shape: (18592,)
X_test.shape: (4649, 49)
y_test.shape: (4649,)
Train/Valid/Test sizes: 14873 3719 4649
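Passing `stratify=y_train` to `train_test_split` keeps the class proportions of `y_train` in both the sub-training part and the validation part. When `test_size` is a fraction, scikit-learn rounds the held-out size up. The small sketch below makes both behaviors visible. It uses hypothetical imbalanced labels, not the notebook's data.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 900 samples of class 0, 100 of class 1
y = np.array([0] * 900 + [1] * 100)
X = np.arange(len(y), dtype=float).reshape(-1, 1)  # dummy one-feature input

X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y)

print(len(y_tr), len(y_va))      # 800 200
print(y_tr.mean(), y_va.mean())  # class-1 share is 0.1 in both splits

# The held-out share is rounded up, which is why 20% of 18,592 rows gives 3,719
print(math.ceil(0.2 * 18592))    # 3719
```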


In [9]:
from sklearn.neighbors import KNeighborsClassifier


model = KNeighborsClassifier()
model.fit(X_train, y_train)

KNeighborsClassifier()
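`KNeighborsClassifier.fit` does little work because k-NN is a lazy learner. It stores the training data, possibly in a search tree, and defers the real computation to prediction time. The brute-force sketch below illustrates the idea with Euclidean distance and a majority vote. `SimpleKNN` is a hypothetical class written for illustration; it is not scikit-learn's implementation.

```python
from collections import Counter

import numpy as np


class SimpleKNN:
    """Minimal brute-force k-NN classifier (illustrative only)."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # "Training" only stores the data: k-NN is a lazy learner
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Squared Euclidean distance from every query to every training point
        d2 = ((X[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=2)
        # Indices of the k closest training points for each query
        idx = np.argsort(d2, axis=1)[:, :self.n_neighbors]
        # Majority vote among the neighbors' labels
        return np.array([Counter(self.y_[row]).most_common(1)[0][0] for row in idx])


X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = SimpleKNN(n_neighbors=3).fit(X, y)
print(clf.predict([[0.05], [4.9]]))  # [0 1]
```

Tie-breaking between equally frequent labels may differ from scikit-learn. The `(n_queries, n_train, n_features)` intermediate array also makes this sketch memory-hungry on data the size of `X_train`.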

In [10]:
from sklearn.model_selection import GridSearchCV


params = {
    'n_neighbors': [2, 3, 4, 5, 6, 7, 8]
}


grid = GridSearchCV(estimator=model,
                    param_grid=params,
                    cv=10,
                    n_jobs=1,
                    verbose=2)

grid.fit(X_train, y_train)

grid.best_score_

Fitting 10 folds for each of 7 candidates, totalling 70 fits
[CV] END ......................................n_neighbors=2; total time=   0.4s
[CV] END ......................................n_neighbors=2; total time=   0.4s
[CV] END ......................................n_neighbors=2; total time=   0.4s
[CV] END ......................................n_neighbors=2; total time=   0.4s
[CV] END ......................................n_neighbors=2; total time=   0.4s
[CV] END ......................................n_neighbors=2; total time=   0.4s
[CV] END ......................................n_neighbors=2; total time=   0.4s
[CV] END ......................................n_neighbors=2; total time=   0.4s
[CV] END ......................................n_neighbors=2; total time=   0.4s
[CV] END ......................................n_neighbors=2; total time=   0.4s
[CV] END ......................................n_neighbors=3; total time=   0.5s
...

0.7527974920034474
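`grid.best_score_` is the mean accuracy over the 10 cross-validation folds for the best `n_neighbors`; it is not a test-set score. The sketch below runs the same search as an explicit loop with `cross_val_score`, on synthetic data assumed via `make_classification`. For a classifier with `cv=10`, both approaches use stratified 10-fold splits and accuracy scoring, so the best mean scores should agree.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
cv = StratifiedKFold(n_splits=10)  # what cv=10 resolves to for a classifier

# Manual grid search: mean CV accuracy for each candidate k
manual = {}
for k in [2, 3, 4, 5, 6, 7, 8]:
    manual[k] = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=cv).mean()
best_k = max(manual, key=manual.get)

# The same search with GridSearchCV
grid = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': [2, 3, 4, 5, 6, 7, 8]}, cv=cv)
grid.fit(X, y)

print(best_k, grid.best_params_['n_neighbors'])
print(manual[best_k], grid.best_score_)
```

`GridSearchCV` clones the estimator it is given. The `model.fit` call in In [9] is therefore not required for the search itself.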

In [11]:
grid.best_params_

{'n_neighbors': 3}
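`best_params_` reports only the winner. To compare all seven candidates, `grid.cv_results_` can be loaded into a pandas DataFrame. The sketch below does this on synthetic data assumed via `make_classification`, keeping one row per `n_neighbors` value.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
grid = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': [2, 3, 4, 5, 6, 7, 8]}, cv=10)
grid.fit(X, y)

# One row per candidate; mean and std are taken over the 10 CV folds
results = pd.DataFrame(grid.cv_results_)
summary = results[['param_n_neighbors', 'mean_test_score', 'std_test_score', 'rank_test_score']]
print(summary.sort_values('rank_test_score').to_string(index=False))
```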

In [12]:
print(f"Training Accuracy: {grid.best_estimator_.score(X_train, y_train):0.5f}")
# Validation accuracy is not reported: X_valid was split off from X_train, which the grid search was trained on, so it would overestimate accuracy.
print(f"Test Accuracy: {grid.best_estimator_.score(X_test, y_test):0.5f}")

Training Accuracy: 0.87887
Test Accuracy: 0.76145
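Some of the gap between training accuracy (0.87887) and test accuracy (0.76145) is built into k-NN. When training points are scored, each point is normally its own nearest neighbor and votes for its own label. With `n_neighbors=1`, training accuracy is therefore 1.0 on data without duplicate points. The sketch below uses synthetic data with some label noise, assumed via `make_classification(flip_y=0.1)`, to show the effect for a few values of k.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.1, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1, stratify=y)

scores = {}
for k in [1, 3, 15]:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    # Training points are among the stored neighbors, so training accuracy is optimistic
    scores[k] = (knn.score(X_tr, y_tr), knn.score(X_te, y_te))
    print(f"k={k:2d}  train={scores[k][0]:.3f}  test={scores[k][1]:.3f}")
```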


In [13]:
%timeit model.fit(X_train, y_train)  # fit time of the untuned KNN from In [9] (default n_neighbors=5)
%timeit model.score(X_valid, y_valid)  # prediction + scoring time on the 3,719 validation samples

2.17 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.63 s ± 98 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
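The two timings differ by roughly three orders of magnitude: about 2 ms to fit versus about 1.6 s to score. The reason is that k-NN defers its work to prediction, where each query needs a nearest-neighbor search over the 18,592 stored training points. `model` here is the untuned classifier from In [9], not `grid.best_estimator_`. The sketch below reproduces the asymmetry outside IPython with `time.perf_counter`, on synthetic data of an assumed size. Absolute times depend on hardware and on the scikit-learn version.

```python
import time

from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in: 49 features to mirror the 49 PCA columns
X, y = make_classification(n_samples=5000, n_features=49, random_state=1)
X_query, y_query = X[:1000], y[:1000]

knn = KNeighborsClassifier()  # default n_neighbors=5, like `model` above

t0 = time.perf_counter()
knn.fit(X, y)  # stores the data (and may build a search structure)
fit_s = time.perf_counter() - t0

t0 = time.perf_counter()
knn.score(X_query, y_query)  # neighbor search for every query point
score_s = time.perf_counter() - t0

print(f"fit: {fit_s:.4f} s, score on {len(X_query)} queries: {score_s:.4f} s")
```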
