Hello, Data Scientists

In this episode I will take you deeper into the Scikit-Learn features we talked about in the previous episode (**Part I** of this series). If you haven't seen the previous episode, please visit it before this one. It is the foundation of this episode, so reading it first will be to your advantage.

So we will continue with the same features:

- Transformation Pipelines
- Cross Validation
- Hyperparameter Tuning with Grid Search and Randomized Search

We will be digging a little deeper into these features, so let's get started...

As usual, let's start with the common imports.

In [1]:
import numpy as np
import pandas as pd

np.random.seed(42) # To stabilise the notebook's output across all runs

Note that in this episode we will not be using the Titanic dataset as we did in the previous one. Instead, we will use **Scikit-Learn** to generate a synthetic dataset and use it to demonstrate some powerful applications of the features listed earlier...

In [2]:
# importing the dataset generator
from sklearn.datasets import make_classification

In [3]:
# generating the dataset
data = make_classification(n_samples=1000, n_features=10, random_state=42)

# n_samples => the rows, i.e. the number of rows we want our data to have
# n_features => the columns, i.e. the number of features/attributes we want our data to have

In [4]:
data

(array([[ 0.96479937, -0.06644898,  0.98676805, ..., -1.2101605 ,
         -0.62807677,  1.22727382],
        [-0.91651053, -0.56639459, -1.00861409, ..., -0.98453405,
          0.36389642,  0.20947008],
        [-0.10948373, -0.43277388, -0.4576493 , ..., -0.2463834 ,
         -1.05814521, -0.29737608],
        ...,
        [ 1.67463306,  1.75493307,  1.58615382, ...,  0.69272276,
         -1.50384972,  0.22526412],
        [-0.77860873, -0.83568901, -0.19484228, ..., -0.49735437,
          2.47213818,  0.86718741],
        [ 0.24845351, -1.0034389 ,  0.36046013, ...,  0.77323999,
          0.1857344 ,  1.41641179]]),
 array([0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0,
        1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1,
        0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0,
        0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1,
        0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0,
   

As we can see, make_classification returns two (2) arrays: the first holds the "Data" (the features) and the second holds the "Labels". There are only two label categories, 0 and 1, so our model is going to be a binary classifier.
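A quick way to confirm this structure is to unpack the tuple and inspect it. This is a small sketch that simply regenerates the same data; Counter (from Python's standard library) is just one convenient way to count the labels:

```python
from collections import Counter

from sklearn.datasets import make_classification

# Regenerate the same synthetic dataset used in this episode
data = make_classification(n_samples=1000, n_features=10, random_state=42)

# make_classification returns a (features, labels) tuple
X, y = data
print(X.shape)     # (1000, 10): 1000 rows, 10 features
print(y.shape)     # (1000,): one label per row
print(Counter(y))  # how many samples fall into each of the two classes
```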

We need to split the data into two (2) groups, the **Training set** and the **Test set** (some people also carve out a third **Validation set**, but we will not do that in this tutorial). We will use the train_test_split function that **Scikit-Learn** provides.

In [5]:
from sklearn.model_selection import train_test_split

X, y = data[0], data[1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
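For readers who do want the optional **Validation set**, a common approach is to call train_test_split twice. The sketch below assumes a 60/20/20 split, which is an arbitrary choice for illustration; the names X_val and y_val are ours, not part of this episode:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# First carve off the 20% test set, exactly as in this episode
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Then split the remaining 800 rows again: 25% of 800 = 200 validation rows
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```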

Let's write two helper functions that quickly give us a **Cross Validation** score and a **Test Score**, so that we can measure the accuracy of the models we will be working on:

In [6]:
from sklearn.model_selection import cross_val_score

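def cscore(model):
    # Mean accuracy over 10 cross-validation folds of the training set.
    # cross_val_score clones and refits the model on every fold by itself,
    # so a separate model.fit() call is not needed here.
    return cross_val_score(model, X_train, y_train, cv=10, scoring='accuracy').mean()

def score(model):
    # Fit on the whole training set, then report accuracy on the held-out test set
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)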
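To see what cross_val_score does under the hood, here is a hand-rolled equivalent. It is a sketch that relies on what Scikit-Learn documents for classifiers: an integer cv means StratifiedKFold without shuffling, and each fold trains a fresh clone of the model. The helper name manual_cv_accuracy is hypothetical:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

def manual_cv_accuracy(model, X, y, n_splits=10):
    """Hypothetical helper: a hand-rolled cross_val_score(..., cv=10)."""
    # For classifiers, an integer cv means StratifiedKFold without shuffling
    skf = StratifiedKFold(n_splits=n_splits)
    scores = []
    for train_idx, val_idx in skf.split(X, y):
        fold_model = clone(model)  # fresh, unfitted copy for every fold
        fold_model.fit(X[train_idx], y[train_idx])
        scores.append(fold_model.score(X[val_idx], y[val_idx]))
    return np.mean(scores)

knn = KNeighborsClassifier(n_neighbors=10)
manual = manual_cv_accuracy(knn, X_train, y_train)
builtin = cross_val_score(knn, X_train, y_train, cv=10, scoring='accuracy').mean()
print(manual, builtin)  # the two numbers should match
```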

We will compare two (2) models in this tutorial: the KNeighborsClassifier and the SGDClassifier.

So let's jump right in

In [7]:
from sklearn.linear_model import SGDClassifier

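# tol=None disables early stopping (the original used tol=-np.infty, but np.infty was removed in NumPy 2.0)
sgd_clf = SGDClassifier(tol=None, random_state=42)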
cscore(sgd_clf), score(sgd_clf)

(0.8687109313955306, 0.83)

It's not much, but it's a start. Let's see how the KNeighborsClassifier performs.

In [8]:
from sklearn.neighbors import KNeighborsClassifier

knn_clf = KNeighborsClassifier(n_neighbors=10)
cscore(knn_clf), score(knn_clf)

(0.8536314267854352, 0.835)

It seems our **KNeighborsClassifier** scored lower than the **SGDClassifier** in cross validation (0.854 vs 0.869) but did a tiny bit better on the test set (0.835 vs 0.83). Let's search for the **n_neighbors** value that gives the highest cross-validation accuracy and see whether it closes the gap.

In [9]:
# We had already imported KNeighborsClassifier so there is really no need to do it again

neighbors = range(1, 30)
items = []
for neighbor in neighbors:
    knn_clf = KNeighborsClassifier(n_neighbors=neighbor)
    items.append((cscore(knn_clf), score(knn_clf), neighbor))
    
sorted(items, reverse=True)

[(0.8612263634942959, 0.82, 25),
 (0.8599755821222066, 0.835, 17),
 (0.8599605407094859, 0.83, 20),
 (0.8599601500234412, 0.83, 19),
 (0.8587255821222065, 0.815, 7),
 (0.858725191436162, 0.805, 12),
 (0.8587105407094858, 0.825, 21),
 (0.8587105407094858, 0.82, 23),
 (0.8587097593373965, 0.825, 16),
 (0.857460540709486, 0.82, 26),
 (0.8574439365525863, 0.845, 9),
 (0.8573972495702453, 0.825, 29),
 (0.8562097593373965, 0.835, 15),
 (0.8561947179246758, 0.82, 27),
 (0.8549917955930615, 0.82, 24),
 (0.8549597593373965, 0.845, 18),
 (0.8549597593373963, 0.82, 13),
 (0.8549447179246756, 0.82, 22),
 (0.8536947179246758, 0.815, 28),
 (0.8536468588842008, 0.83, 11),
 (0.8536314267854352, 0.835, 10),
 (0.8512093686513518, 0.82, 14),
 (0.8499126816690108, 0.8, 5),
 (0.8486935458665416, 0.82, 8),
 (0.846241014220972, 0.81, 6),
 (0.8399740193780278, 0.79, 3),
 (0.8387089779653071, 0.795, 4),
 (0.8049720659478042, 0.8, 1),
 (0.8025033208313799, 0.765, 2)]
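The same search can be written more compactly with GridSearchCV, which we will use heavily below. This is a sketch over the same n_neighbors values as the loop; the exact scores it reports may vary slightly between Scikit-Learn versions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Same search space as the loop: n_neighbors from 1 to 29, 10-fold CV
search = GridSearchCV(KNeighborsClassifier(),
                      {'n_neighbors': list(range(1, 30))},
                      cv=10, scoring='accuracy')
search.fit(X_train, y_train)

print(search.best_params_, search.best_score_)
print(search.score(X_test, y_test))  # test accuracy of the refitted best model
```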

In [10]:
sorted(items, reverse=True)[0]

(0.8612263634942959, 0.82, 25)

Ouch! Even the best result we found (25 neighbors, with a cross-validation score of about 0.861) is no match for the SGDClassifier's 0.869.

How else can we boost the performance of our model?

Lightbulb! We will preprocess the data using unsupervised preprocessing techniques.

Let's start by using the KMeans Clustering algorithm to preprocess the data

Wait a minute... how do we find the optimal n_clusters (number of clusters) for KMeans to transform the data into? We can't rely on guesswork. Here is the trick: **WE CAN ALSO USE GRID SEARCH** to tune preprocessing steps. So how do we go about it?

In [11]:
from sklearn.cluster import KMeans
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline1 = Pipeline([
    ('kmeans', KMeans(random_state=42)),  # First, replace each sample with its distances to the cluster centres,
    ('scale', StandardScaler()),          # then scale those distances,
    ('knn_clf', KNeighborsClassifier())   # and finally train the classifier
])

params = dict(kmeans__n_clusters=range(2, 10), knn_clf__n_neighbors=range(1, 10))
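One detail worth knowing: inside a Pipeline, KMeans is used as a transformer, and its transform method replaces each sample with its distances to the cluster centres. So n_clusters also decides how many features reach the scaler and the classifier. A minimal sketch (n_init=10 is passed explicitly only to match the older default and avoid warnings in recent versions):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification

X, _ = make_classification(n_samples=1000, n_features=10, random_state=42)

# In a Pipeline, KMeans is called through fit/transform, not predict
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
X_dist = kmeans.fit_transform(X)

print(X.shape)       # (1000, 10): original features
print(X_dist.shape)  # (1000, 5): one distance per cluster centre
```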

We create a dictionary containing the hyperparameters we want to tune; here it is stored in the params variable.

Take kmeans__n_clusters=range(2, 10) from the code above as an example:

The **kmeans** part is the name we gave our KMeans transformer in the Pipeline ('kmeans').

Next comes a double underscore, __ , which separates the **step name** (here, the **KMeans transformer**) from the **hyperparameter name**.

After the double underscore we write the name of the hyperparameter we want to tune and assign it the range of values we want to test.

If the values don't follow a regular range, we list them explicitly, for example, [1, 10, 6, 4, 9] if we want to test only these five (5) values.

And if the values are strings, we also just list them; for example, if I were using an SVC model, I would assign the **kernel** parameter the values ['sigmoid', 'linear', 'rbf'].
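To check which names a pipeline accepts, and how many candidates a grid produces, you can ask Scikit-Learn directly. In this sketch the knn_clf__weights entry is just an illustrative string-valued hyperparameter, not something tuned in this episode:

```python
from sklearn.cluster import KMeans
from sklearn.model_selection import ParameterGrid
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline1 = Pipeline([
    ('kmeans', KMeans(random_state=42)),
    ('scale', StandardScaler()),
    ('knn_clf', KNeighborsClassifier())
])

# Every tunable name has the form <step name>__<hyperparameter name>
knn_names = [name for name in pipeline1.get_params() if name.startswith('knn_clf__')]
print(knn_names)  # includes 'knn_clf__n_neighbors'

# The grid used in this episode: 8 cluster counts x 9 neighbour counts
original_count = len(ParameterGrid(dict(kmeans__n_clusters=range(2, 10),
                                        knn_clf__n_neighbors=range(1, 10))))
print(original_count)  # 72 candidates; with cv=10 that is 720 fits

# A grid can mix ranges, hand-picked lists and strings
params = dict(kmeans__n_clusters=range(2, 10),
              knn_clf__n_neighbors=[1, 10, 6, 4, 9],
              knn_clf__weights=['uniform', 'distance'])
print(len(ParameterGrid(params)))  # 8 * 5 * 2 = 80 candidates
```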

Now let's introduce GridSearchCV

In [12]:
grid_search1 = GridSearchCV(pipeline1, params, cv=10, verbose=5, scoring='accuracy')
grid_search1.fit(X_train, y_train)

Fitting 10 folds for each of 72 candidates, totalling 720 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[CV] kmeans__n_clusters=2, knn_clf__n_neighbors=1 ....................
[CV]  kmeans__n_clusters=2, knn_clf__n_neighbors=1, score=0.827, total=   0.3s
[CV] kmeans__n_clusters=2, knn_clf__n_neighbors=1 ....................
[CV]  kmeans__n_clusters=2, knn_clf__n_neighbors=1, score=0.642, total=   0.1s
...
[Parallel(n_jobs=1)]: Done 720 out of 720 | elapsed:  2.0min finished


GridSearchCV(cv=10, error_score='raise-deprecating',
             estimator=Pipeline(memory=None,
                                steps=[('kmeans',
                                        KMeans(algorithm='auto', copy_x=True,
                                               init='k-means++', max_iter=300,
                                               n_clusters=8, n_init=10,
                                               n_jobs=None,
                                               precompute_distances='auto',
                                               random_state=42, tol=0.0001,
                                               verbose=0)),
                                       ('scale',
                                        StandardScaler(copy=True,
                                                       with_mean=True,
                                                       with_std=True)),
                                       ('knn_clf',
                                        KN

In [13]:
cvr = grid_search1.cv_results_
list(zip(cvr['mean_test_score'], cvr['params']))

[(0.75375, {'kmeans__n_clusters': 2, 'knn_clf__n_neighbors': 1}),
 (0.74625, {'kmeans__n_clusters': 2, 'knn_clf__n_neighbors': 2}),
 (0.78, {'kmeans__n_clusters': 2, 'knn_clf__n_neighbors': 3}),
 (0.79875, {'kmeans__n_clusters': 2, 'knn_clf__n_neighbors': 4}),
 (0.8075, {'kmeans__n_clusters': 2, 'knn_clf__n_neighbors': 5}),
 (0.81125, {'kmeans__n_clusters': 2, 'knn_clf__n_neighbors': 6}),
 (0.8125, {'kmeans__n_clusters': 2, 'knn_clf__n_neighbors': 7}),
 (0.815, {'kmeans__n_clusters': 2, 'knn_clf__n_neighbors': 8}),
 (0.815, {'kmeans__n_clusters': 2, 'knn_clf__n_neighbors': 9}),
 (0.83375, {'kmeans__n_clusters': 3, 'knn_clf__n_neighbors': 1}),
 (0.83875, {'kmeans__n_clusters': 3, 'knn_clf__n_neighbors': 2}),
 (0.8675, {'kmeans__n_clusters': 3, 'knn_clf__n_neighbors': 3}),
 (0.86, {'kmeans__n_clusters': 3, 'knn_clf__n_neighbors': 4}),
 (0.875, {'kmeans__n_clusters': 3, 'knn_clf__n_neighbors': 5}),
 (0.86125, {'kmeans__n_clusters': 3, 'knn_clf__n_neighbors': 6}),
 (0.87, {'kmeans__n_clust

In [14]:
grid_search1.best_score_, grid_search1.best_params_

(0.885, {'kmeans__n_clusters': 5, 'knn_clf__n_neighbors': 7})
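Scrolling through a long list of tuples gets tedious, so it can help to load cv_results_ into a pandas DataFrame and sort by rank. The sketch below reruns a deliberately smaller grid (with cv=5) so that it finishes quickly; the column names follow the param_<name> pattern that cv_results_ uses:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipeline1 = Pipeline([
    ('kmeans', KMeans(n_init=10, random_state=42)),  # n_init=10 mirrors the older default
    ('scale', StandardScaler()),
    ('knn_clf', KNeighborsClassifier())
])

# A deliberately smaller grid than in the episode, so this runs quickly
params = dict(kmeans__n_clusters=[3, 5, 7], knn_clf__n_neighbors=[5, 7, 9])
grid = GridSearchCV(pipeline1, params, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

# One row per candidate; keep the columns most useful for comparison
results = pd.DataFrame(grid.cv_results_)
cols = ['param_kmeans__n_clusters', 'param_knn_clf__n_neighbors',
        'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score').head())
```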

Wow, that's a big jump! The best cross-validation score is now 0.885, well above the SGDClassifier's 0.869.

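**NB**: We used cv=10 in **Grid Search**, so its score should match running the best pipeline (KMeans, StandardScaler and KNeighborsClassifier with these parameters) through our **cscore** function, which also uses 10-fold cross validation on the same training set.

And if you need proof, you can test it out...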

In [15]:
knn_new1 = grid_search1.best_estimator_
cscore(knn_new1), score(knn_new1)

(0.8849292858259104, 0.885)

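The two cross-validation scores differ only in the fourth decimal place. The small gap most likely comes from how the fold scores are averaged: the older version of GridSearchCV used here (note error_score='raise-deprecating' in its output) weighted each fold by its number of samples, while cross_val_score(...).mean() takes a plain average. Either way, the pipeline also reaches 0.885 on the test set, up from 0.835 for the plain KNeighborsClassifier.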

Dimensionality reduction is another unsupervised preprocessing step we can take. Let's set up another Pipeline and GridSearchCV; by now you have probably got the hang of it. We will use PCA (Principal Component Analysis), a very popular technique for dimensionality reduction.

In [16]:
from sklearn.decomposition import PCA

pipeline2 = Pipeline([
    ('pca', PCA(random_state=42)),
    ('scale', StandardScaler()),
    ('knn_clf', KNeighborsClassifier())
])

params = dict(pca__n_components=range(1, 10), knn_clf__n_neighbors=range(1, 20))
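Before searching, it can be useful to see how much variance each principal component captures. This is a sketch; note that make_classification's defaults (n_informative=2, n_redundant=2) mean only a few directions carry the class signal, so a small pca__n_components may be enough:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Keep all 10 components so we can inspect the full variance breakdown
pca = PCA(random_state=42).fit(X_train)
ratios = pca.explained_variance_ratio_

# Cumulative share of variance kept by the first k components
for k, cumulative in enumerate(np.cumsum(ratios), start=1):
    print(f"{k} components keep {cumulative:.1%} of the variance")
```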

In [17]:
grid_search2 = GridSearchCV(pipeline2, params, cv=10, verbose=5, scoring='accuracy')
grid_search2.fit(X_train, y_train)

Fitting 10 folds for each of 171 candidates, totalling 1710 fits
[CV] knn_clf__n_neighbors=1, pca__n_components=1 .....................


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  knn_clf__n_neighbors=1, pca__n_components=1, score=0.728, total=   0.2s
[CV] knn_clf__n_neighbors=1, pca__n_components=1 .....................
[CV]  knn_clf__n_neighbors=1, pca__n_components=1, score=0.679, total=   0.0s
[CV] knn_clf__n_neighbors=1, pca__n_components=1 .....................
[CV]  knn_clf__n_neighbors=1, pca__n_components=1, score=0.637, total=   0.1s
[CV] knn_clf__n_neighbors=1, pca__n_components=1 .....................
[CV]  knn_clf__n_neighbors=1, pca__n_components=1, score=0.675, total=   0.0s
[CV] knn_clf__n_neighbors=1, pca__n_components=1 .....................
[CV]  knn_clf__n_neighbors=1, pca__n_components=1, score=0.787, total=   0.0s
[CV] knn_clf__n_neighbors=1, pca__n_components=1 .....................
[CV]  knn_clf__n_neighbors=1, pca__n_components=1, score=0.675, total=   0.0s
[CV] knn_clf__n_neighbors=1, pca__n_components=1 .....................
[CV]  knn_clf__n_neighbors=1, pca__n_components=1, score=0.725, total=   0.0s
[CV] knn_clf__n_neighbors=1,

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:    0.3s remaining:    0.0s



[CV]  knn_clf__n_neighbors=1, pca__n_components=1, score=0.759, total=   0.8s
[CV] knn_clf__n_neighbors=1, pca__n_components=1 .....................
[CV]  knn_clf__n_neighbors=1, pca__n_components=1, score=0.696, total=   0.5s
[CV] knn_clf__n_neighbors=1, pca__n_components=2 .....................
[CV]  knn_clf__n_neighbors=1, pca__n_components=2, score=0.938, total=   0.7s
[CV] knn_clf__n_neighbors=1, pca__n_components=2 .....................
[CV]  knn_clf__n_neighbors=1, pca__n_components=2, score=0.877, total=   0.0s
[CV] knn_clf__n_neighbors=1, pca__n_components=2 .....................
[CV]  knn_clf__n_neighbors=1, pca__n_components=2, score=0.875, total=   0.0s
[CV] knn_clf__n_neighbors=1, pca__n_components=2 .....................
[CV]  knn_clf__n_neighbors=1, pca__n_components=2, score=0.863, total=   0.0s
[CV] knn_clf__n_neighbors=1, pca__n_components=2 .....................
[CV]  knn_clf__n_neighbors=1, pca__n_components=2, score=0.863, total=   0.0s
[CV] knn_clf__n_neighbors=1

[CV]  knn_clf__n_neighbors=1, pca__n_components=8, score=0.827, total=   0.0s
[CV] knn_clf__n_neighbors=1, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=1, pca__n_components=8, score=0.778, total=   0.0s
[CV] knn_clf__n_neighbors=1, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=1, pca__n_components=8, score=0.825, total=   0.0s
[CV] knn_clf__n_neighbors=1, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=1, pca__n_components=8, score=0.787, total=   0.0s
[CV] knn_clf__n_neighbors=1, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=1, pca__n_components=8, score=0.725, total=   0.0s
[CV] knn_clf__n_neighbors=1, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=1, pca__n_components=8, score=0.713, total=   0.0s
[CV] knn_clf__n_neighbors=1, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=1, pca__n_components=8, score=0.750, total=   0.0s
[CV] knn_clf__n_neighbors=1,

[CV]  knn_clf__n_neighbors=2, pca__n_components=5, score=0.790, total=   0.0s
[CV] knn_clf__n_neighbors=2, pca__n_components=5 .....................
[CV]  knn_clf__n_neighbors=2, pca__n_components=5, score=0.775, total=   0.0s
[CV] knn_clf__n_neighbors=2, pca__n_components=5 .....................
[CV]  knn_clf__n_neighbors=2, pca__n_components=5, score=0.838, total=   0.0s
[CV] knn_clf__n_neighbors=2, pca__n_components=5 .....................
[CV]  knn_clf__n_neighbors=2, pca__n_components=5, score=0.850, total=   0.0s
[CV] knn_clf__n_neighbors=2, pca__n_components=5 .....................
[CV]  knn_clf__n_neighbors=2, pca__n_components=5, score=0.825, total=   0.0s
[CV] knn_clf__n_neighbors=2, pca__n_components=5 .....................
[CV]  knn_clf__n_neighbors=2, pca__n_components=5, score=0.838, total=   0.0s
[CV] knn_clf__n_neighbors=2, pca__n_components=5 .....................
[CV]  knn_clf__n_neighbors=2, pca__n_components=5, score=0.762, total=   0.0s
[CV] knn_clf__n_neighbors=2,

[CV]  knn_clf__n_neighbors=3, pca__n_components=2, score=0.901, total=   0.0s
[CV] knn_clf__n_neighbors=3, pca__n_components=2 .....................
[CV]  knn_clf__n_neighbors=3, pca__n_components=2, score=0.889, total=   0.0s
[CV] knn_clf__n_neighbors=3, pca__n_components=2 .....................
[CV]  knn_clf__n_neighbors=3, pca__n_components=2, score=0.887, total=   0.0s
[CV] knn_clf__n_neighbors=3, pca__n_components=2 .....................
[CV]  knn_clf__n_neighbors=3, pca__n_components=2, score=0.900, total=   0.0s
[CV] knn_clf__n_neighbors=3, pca__n_components=2 .....................
[CV]  knn_clf__n_neighbors=3, pca__n_components=2, score=0.900, total=   0.0s
[CV] knn_clf__n_neighbors=3, pca__n_components=2 .....................
[CV]  knn_clf__n_neighbors=3, pca__n_components=2, score=0.938, total=   0.0s
[CV] knn_clf__n_neighbors=3, pca__n_components=2 .....................
[CV]  knn_clf__n_neighbors=3, pca__n_components=2, score=0.887, total=   0.0s
[CV] knn_clf__n_neighbors=3,

[CV]  knn_clf__n_neighbors=3, pca__n_components=7, score=0.810, total=   0.0s
[CV] knn_clf__n_neighbors=3, pca__n_components=7 .....................
[CV]  knn_clf__n_neighbors=3, pca__n_components=7, score=0.823, total=   0.0s
[CV] knn_clf__n_neighbors=3, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=3, pca__n_components=8, score=0.840, total=   0.0s
[CV] knn_clf__n_neighbors=3, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=3, pca__n_components=8, score=0.790, total=   0.0s
[CV] knn_clf__n_neighbors=3, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=3, pca__n_components=8, score=0.863, total=   0.0s
[CV] knn_clf__n_neighbors=3, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=3, pca__n_components=8, score=0.838, total=   0.0s
[CV] knn_clf__n_neighbors=3, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=3, pca__n_components=8, score=0.863, total=   0.0s
...
[CV] knn_clf__n_neighbors=4, pca__n_components=4 .....................
[CV]  knn_clf__n_neighbors=4, pca__n_components=4, score=0.810, total=   0.0s
[CV] knn_clf__n_neighbors=4, pca__n_components=5 .....................
[CV]  knn_clf__n_neighbors=4, pca__n_components=5, score=0.877, total=   0.0s
[CV] knn_clf__n_neighbors=4, pca__n_components=5 .....................
[CV]  knn_clf__n_neighbors=4, pca__n_components=5, score=0.852, total=   0.0s
[CV] knn_clf__n_neighbors=4, pca__n_components=5 .....................
[CV]  knn_clf__n_neighbors=4, pca__n_components=5, score=0.825, total=   0.0s
[CV] knn_clf__n_neighbors=4, pca__n_components=5 .....................
[CV]  knn_clf__n_neighbors=4, pca__n_components=5, score=0.850, total=   0.0s
[CV] knn_clf__n_neighbors=4, pca__n_components=5 .....................
[CV]  knn_clf__n_neighbors=4, pca__n_components=5, score=0.850, total=   0.0s
[CV] knn_clf__n_neighbors=4, pca__n_components=5 .....................
...
[CV]  knn_clf__n_neighbors=5, pca__n_components=2, score=0.912, total=   0.0s
[CV] knn_clf__n_neighbors=5, pca__n_components=2 .....................
[CV]  knn_clf__n_neighbors=5, pca__n_components=2, score=0.950, total=   0.0s
[CV] knn_clf__n_neighbors=5, pca__n_components=2 .....................
[CV]  knn_clf__n_neighbors=5, pca__n_components=2, score=0.950, total=   0.0s
[CV] knn_clf__n_neighbors=5, pca__n_components=2 .....................
[CV]  knn_clf__n_neighbors=5, pca__n_components=2, score=0.925, total=   0.0s
[CV] knn_clf__n_neighbors=5, pca__n_components=2 .....................
[CV]  knn_clf__n_neighbors=5, pca__n_components=2, score=0.850, total=   0.0s
[CV] knn_clf__n_neighbors=5, pca__n_components=2 .....................
[CV]  knn_clf__n_neighbors=5, pca__n_components=2, score=0.873, total=   0.0s
[CV] knn_clf__n_neighbors=5, pca__n_components=2 .....................
[CV]  knn_clf__n_neighbors=5, pca__n_components=2, score=0.886, total=   0.0s
...
[CV]  knn_clf__n_neighbors=5, pca__n_components=8, score=0.901, total=   0.0s
[CV] knn_clf__n_neighbors=5, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=5, pca__n_components=8, score=0.802, total=   0.0s
[CV] knn_clf__n_neighbors=5, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=5, pca__n_components=8, score=0.875, total=   0.0s
[CV] knn_clf__n_neighbors=5, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=5, pca__n_components=8, score=0.875, total=   0.0s
[CV] knn_clf__n_neighbors=5, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=5, pca__n_components=8, score=0.863, total=   0.0s
[CV] knn_clf__n_neighbors=5, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=5, pca__n_components=8, score=0.887, total=   0.0s
[CV] knn_clf__n_neighbors=5, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=5, pca__n_components=8, score=0.800, total=   0.0s
...
[CV]  knn_clf__n_neighbors=6, pca__n_components=4, score=0.823, total=   0.0s
[CV] knn_clf__n_neighbors=6, pca__n_components=5 .....................
[CV]  knn_clf__n_neighbors=6, pca__n_components=5, score=0.864, total=   0.0s
[CV] knn_clf__n_neighbors=6, pca__n_components=5 .....................
[CV]  knn_clf__n_neighbors=6, pca__n_components=5, score=0.852, total=   0.0s
[CV] knn_clf__n_neighbors=6, pca__n_components=5 .....................
[CV]  knn_clf__n_neighbors=6, pca__n_components=5, score=0.863, total=   0.0s
[CV] knn_clf__n_neighbors=6, pca__n_components=5 .....................
[CV]  knn_clf__n_neighbors=6, pca__n_components=5, score=0.887, total=   0.0s
[CV] knn_clf__n_neighbors=6, pca__n_components=5 .....................
[CV]  knn_clf__n_neighbors=6, pca__n_components=5, score=0.875, total=   0.0s
[CV] knn_clf__n_neighbors=6, pca__n_components=5 .....................
[CV]  knn_clf__n_neighbors=6, pca__n_components=5, score=0.925, total=   0.0s
...
[CV] knn_clf__n_neighbors=7, pca__n_components=1 .....................
[CV]  knn_clf__n_neighbors=7, pca__n_components=1, score=0.800, total=   0.0s
[CV] knn_clf__n_neighbors=7, pca__n_components=1 .....................
[CV]  knn_clf__n_neighbors=7, pca__n_components=1, score=0.713, total=   0.0s
[CV] knn_clf__n_neighbors=7, pca__n_components=1 .....................
[CV]  knn_clf__n_neighbors=7, pca__n_components=1, score=0.797, total=   0.0s
[CV] knn_clf__n_neighbors=7, pca__n_components=1 .....................
[CV]  knn_clf__n_neighbors=7, pca__n_components=1, score=0.722, total=   0.0s
[CV] knn_clf__n_neighbors=7, pca__n_components=2 .....................
[CV]  knn_clf__n_neighbors=7, pca__n_components=2, score=0.877, total=   0.0s
[CV] knn_clf__n_neighbors=7, pca__n_components=2 .....................
[CV]  knn_clf__n_neighbors=7, pca__n_components=2, score=0.914, total=   0.0s
[CV] knn_clf__n_neighbors=7, pca__n_components=2 .....................
...
[CV]  knn_clf__n_neighbors=7, pca__n_components=7, score=0.838, total=   0.0s
[CV] knn_clf__n_neighbors=7, pca__n_components=7 .....................
[CV]  knn_clf__n_neighbors=7, pca__n_components=7, score=0.812, total=   0.1s
[CV] knn_clf__n_neighbors=7, pca__n_components=7 .....................
[CV]  knn_clf__n_neighbors=7, pca__n_components=7, score=0.823, total=   0.0s
[CV] knn_clf__n_neighbors=7, pca__n_components=7 .....................
[CV]  knn_clf__n_neighbors=7, pca__n_components=7, score=0.810, total=   0.1s
[CV] knn_clf__n_neighbors=7, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=7, pca__n_components=8, score=0.889, total=   0.0s
[CV] knn_clf__n_neighbors=7, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=7, pca__n_components=8, score=0.827, total=   0.0s
[CV] knn_clf__n_neighbors=7, pca__n_components=8 .....................
[CV]  knn_clf__n_neighbors=7, pca__n_components=8, score=0.850, total=   0.0s
...
[CV]  knn_clf__n_neighbors=8, pca__n_components=4, score=0.887, total=   0.0s
[CV] knn_clf__n_neighbors=8, pca__n_components=4 .....................
[CV]  knn_clf__n_neighbors=8, pca__n_components=4, score=0.887, total=   0.0s
[CV] knn_clf__n_neighbors=8, pca__n_components=4 .....................
[CV]  knn_clf__n_neighbors=8, pca__n_components=4, score=0.863, total=   0.0s
[CV] knn_clf__n_neighbors=8, pca__n_components=4 .....................
[CV]  knn_clf__n_neighbors=8, pca__n_components=4, score=0.812, total=   0.0s
[CV] knn_clf__n_neighbors=8, pca__n_components=4 .....................
[CV]  knn_clf__n_neighbors=8, pca__n_components=4, score=0.886, total=   0.0s
[CV] knn_clf__n_neighbors=8, pca__n_components=4 .....................
[CV]  knn_clf__n_neighbors=8, pca__n_components=4, score=0.810, total=   0.0s
[CV] knn_clf__n_neighbors=8, pca__n_components=5 .....................
[CV]  knn_clf__n_neighbors=8, pca__n_components=5, score=0.827, total=   0.0s
...
[CV]  knn_clf__n_neighbors=8, pca__n_components=9, score=0.797, total=   0.0s
[CV] knn_clf__n_neighbors=9, pca__n_components=1 .....................
[CV]  knn_clf__n_neighbors=9, pca__n_components=1, score=0.790, total=   0.1s
[CV] knn_clf__n_neighbors=9, pca__n_components=1 .....................
[CV]  knn_clf__n_neighbors=9, pca__n_components=1, score=0.765, total=   0.0s
[CV] knn_clf__n_neighbors=9, pca__n_components=1 .....................
[CV]  knn_clf__n_neighbors=9, pca__n_components=1, score=0.725, total=   0.0s
[CV] knn_clf__n_neighbors=9, pca__n_components=1 .....................
[CV]  knn_clf__n_neighbors=9, pca__n_components=1, score=0.738, total=   0.0s
[CV] knn_clf__n_neighbors=9, pca__n_components=1 .....................
[CV]  knn_clf__n_neighbors=9, pca__n_components=1, score=0.775, total=   0.0s
[CV] knn_clf__n_neighbors=9, pca__n_components=1 .....................
[CV]  knn_clf__n_neighbors=9, pca__n_components=1, score=0.688, total=   0.0s
...
[CV]  knn_clf__n_neighbors=9, pca__n_components=7, score=0.887, total=   0.0s
[CV] knn_clf__n_neighbors=9, pca__n_components=7 .....................
[CV]  knn_clf__n_neighbors=9, pca__n_components=7, score=0.863, total=   0.0s
[CV] knn_clf__n_neighbors=9, pca__n_components=7 .....................
[CV]  knn_clf__n_neighbors=9, pca__n_components=7, score=0.900, total=   0.0s
[CV] knn_clf__n_neighbors=9, pca__n_components=7 .....................
[CV]  knn_clf__n_neighbors=9, pca__n_components=7, score=0.900, total=   0.0s
[CV] knn_clf__n_neighbors=9, pca__n_components=7 .....................
[CV]  knn_clf__n_neighbors=9, pca__n_components=7, score=0.838, total=   0.0s
[CV] knn_clf__n_neighbors=9, pca__n_components=7 .....................
[CV]  knn_clf__n_neighbors=9, pca__n_components=7, score=0.812, total=   0.0s
[CV] knn_clf__n_neighbors=9, pca__n_components=7 .....................
[CV]  knn_clf__n_neighbors=9, pca__n_components=7, score=0.835, total=   0.0s
...
[CV]  knn_clf__n_neighbors=10, pca__n_components=4, score=0.812, total=   0.0s
[CV] knn_clf__n_neighbors=10, pca__n_components=4 ....................
[CV]  knn_clf__n_neighbors=10, pca__n_components=4, score=0.873, total=   0.0s
[CV] knn_clf__n_neighbors=10, pca__n_components=4 ....................
[CV]  knn_clf__n_neighbors=10, pca__n_components=4, score=0.810, total=   0.0s
[CV] knn_clf__n_neighbors=10, pca__n_components=5 ....................
[CV]  knn_clf__n_neighbors=10, pca__n_components=5, score=0.852, total=   0.0s
[CV] knn_clf__n_neighbors=10, pca__n_components=5 ....................
[CV]  knn_clf__n_neighbors=10, pca__n_components=5, score=0.852, total=   0.0s
[CV] knn_clf__n_neighbors=10, pca__n_components=5 ....................
[CV]  knn_clf__n_neighbors=10, pca__n_components=5, score=0.850, total=   0.0s
[CV] knn_clf__n_neighbors=10, pca__n_components=5 ....................
[CV]  knn_clf__n_neighbors=10, pca__n_components=5, score=0.875, total=   0.0s
...
[CV]  knn_clf__n_neighbors=11, pca__n_components=1, score=0.725, total=   0.0s
[CV] knn_clf__n_neighbors=11, pca__n_components=1 ....................
[CV]  knn_clf__n_neighbors=11, pca__n_components=1, score=0.800, total=   0.0s
[CV] knn_clf__n_neighbors=11, pca__n_components=1 ....................
[CV]  knn_clf__n_neighbors=11, pca__n_components=1, score=0.750, total=   0.0s
[CV] knn_clf__n_neighbors=11, pca__n_components=1 ....................
[CV]  knn_clf__n_neighbors=11, pca__n_components=1, score=0.787, total=   0.0s
[CV] knn_clf__n_neighbors=11, pca__n_components=1 ....................
[CV]  knn_clf__n_neighbors=11, pca__n_components=1, score=0.700, total=   0.0s
[CV] knn_clf__n_neighbors=11, pca__n_components=1 ....................
[CV]  knn_clf__n_neighbors=11, pca__n_components=1, score=0.835, total=   0.0s
[CV] knn_clf__n_neighbors=11, pca__n_components=1 ....................
[CV]  knn_clf__n_neighbors=11, pca__n_components=1, score=0.759, total=   0.0s
...
[CV]  knn_clf__n_neighbors=11, pca__n_components=7, score=0.864, total=   0.0s
[CV] knn_clf__n_neighbors=11, pca__n_components=7 ....................
[CV]  knn_clf__n_neighbors=11, pca__n_components=7, score=0.827, total=   0.0s
[CV] knn_clf__n_neighbors=11, pca__n_components=7 ....................
[CV]  knn_clf__n_neighbors=11, pca__n_components=7, score=0.875, total=   0.0s
[CV] knn_clf__n_neighbors=11, pca__n_components=7 ....................
[CV]  knn_clf__n_neighbors=11, pca__n_components=7, score=0.887, total=   0.0s
[CV] knn_clf__n_neighbors=11, pca__n_components=7 ....................
[CV]  knn_clf__n_neighbors=11, pca__n_components=7, score=0.900, total=   0.0s
[CV] knn_clf__n_neighbors=11, pca__n_components=7 ....................
[CV]  knn_clf__n_neighbors=11, pca__n_components=7, score=0.850, total=   0.0s
[CV] knn_clf__n_neighbors=11, pca__n_components=7 ....................
[CV]  knn_clf__n_neighbors=11, pca__n_components=7, score=0.838, total=   0.0s
...
[CV]  knn_clf__n_neighbors=12, pca__n_components=3, score=0.873, total=   0.0s
[CV] knn_clf__n_neighbors=12, pca__n_components=3 ....................
[CV]  knn_clf__n_neighbors=12, pca__n_components=3, score=0.848, total=   0.0s
[CV] knn_clf__n_neighbors=12, pca__n_components=4 ....................
[CV]  knn_clf__n_neighbors=12, pca__n_components=4, score=0.901, total=   0.0s
[CV] knn_clf__n_neighbors=12, pca__n_components=4 ....................
[CV]  knn_clf__n_neighbors=12, pca__n_components=4, score=0.877, total=   0.0s
[CV] knn_clf__n_neighbors=12, pca__n_components=4 ....................
[CV]  knn_clf__n_neighbors=12, pca__n_components=4, score=0.887, total=   0.0s
[CV] knn_clf__n_neighbors=12, pca__n_components=4 ....................
[CV]  knn_clf__n_neighbors=12, pca__n_components=4, score=0.850, total=   0.0s
[CV] knn_clf__n_neighbors=12, pca__n_components=4 ....................
[CV]  knn_clf__n_neighbors=12, pca__n_components=4, score=0.887, total=   0.0s
...
[CV]  knn_clf__n_neighbors=12, pca__n_components=9, score=0.863, total=   0.0s
[CV] knn_clf__n_neighbors=12, pca__n_components=9 ....................
[CV]  knn_clf__n_neighbors=12, pca__n_components=9, score=0.875, total=   0.0s
[CV] knn_clf__n_neighbors=12, pca__n_components=9 ....................
[CV]  knn_clf__n_neighbors=12, pca__n_components=9, score=0.838, total=   0.0s
[CV] knn_clf__n_neighbors=12, pca__n_components=9 ....................
[CV]  knn_clf__n_neighbors=12, pca__n_components=9, score=0.812, total=   0.0s
[CV] knn_clf__n_neighbors=12, pca__n_components=9 ....................
[CV]  knn_clf__n_neighbors=12, pca__n_components=9, score=0.825, total=   0.0s
[CV] knn_clf__n_neighbors=12, pca__n_components=9 ....................
[CV]  knn_clf__n_neighbors=12, pca__n_components=9, score=0.861, total=   0.0s
[CV] knn_clf__n_neighbors=12, pca__n_components=9 ....................
[CV]  knn_clf__n_neighbors=12, pca__n_components=9, score=0.797, total=   0.0s
...
[CV]  knn_clf__n_neighbors=13, pca__n_components=5, score=0.848, total=   0.0s
[CV] knn_clf__n_neighbors=13, pca__n_components=5 ....................
[CV]  knn_clf__n_neighbors=13, pca__n_components=5, score=0.861, total=   0.0s
[CV] knn_clf__n_neighbors=13, pca__n_components=6 ....................
[CV]  knn_clf__n_neighbors=13, pca__n_components=6, score=0.877, total=   0.0s
[CV] knn_clf__n_neighbors=13, pca__n_components=6 ....................
[CV]  knn_clf__n_neighbors=13, pca__n_components=6, score=0.827, total=   0.0s
[CV] knn_clf__n_neighbors=13, pca__n_components=6 ....................
[CV]  knn_clf__n_neighbors=13, pca__n_components=6, score=0.825, total=   0.0s
[CV] knn_clf__n_neighbors=13, pca__n_components=6 ....................
[CV]  knn_clf__n_neighbors=13, pca__n_components=6, score=0.887, total=   0.0s
[CV] knn_clf__n_neighbors=13, pca__n_components=6 ....................
[CV]  knn_clf__n_neighbors=13, pca__n_components=6, score=0.875, total=   0.0s
...
[CV]  knn_clf__n_neighbors=14, pca__n_components=2, score=0.875, total=   0.0s
[CV] knn_clf__n_neighbors=14, pca__n_components=2 ....................
[CV]  knn_clf__n_neighbors=14, pca__n_components=2, score=0.861, total=   0.0s
[CV] knn_clf__n_neighbors=14, pca__n_components=2 ....................
[CV]  knn_clf__n_neighbors=14, pca__n_components=2, score=0.899, total=   0.0s
[CV] knn_clf__n_neighbors=14, pca__n_components=3 ....................
[CV]  knn_clf__n_neighbors=14, pca__n_components=3, score=0.877, total=   0.0s
[CV] knn_clf__n_neighbors=14, pca__n_components=3 ....................
[CV]  knn_clf__n_neighbors=14, pca__n_components=3, score=0.901, total=   0.0s
[CV] knn_clf__n_neighbors=14, pca__n_components=3 ....................
[CV]  knn_clf__n_neighbors=14, pca__n_components=3, score=0.887, total=   0.0s
[CV] knn_clf__n_neighbors=14, pca__n_components=3 ....................
[CV]  knn_clf__n_neighbors=14, pca__n_components=3, score=0.875, total=   0.0s
...
[CV]  knn_clf__n_neighbors=14, pca__n_components=8, score=0.838, total=   0.0s
[CV] knn_clf__n_neighbors=14, pca__n_components=8 ....................
[CV]  knn_clf__n_neighbors=14, pca__n_components=8, score=0.887, total=   0.0s
[CV] knn_clf__n_neighbors=14, pca__n_components=8 ....................
[CV]  knn_clf__n_neighbors=14, pca__n_components=8, score=0.863, total=   0.0s
[CV] knn_clf__n_neighbors=14, pca__n_components=8 ....................
[CV]  knn_clf__n_neighbors=14, pca__n_components=8, score=0.800, total=   0.0s
[CV] knn_clf__n_neighbors=14, pca__n_components=8 ....................
[CV]  knn_clf__n_neighbors=14, pca__n_components=8, score=0.838, total=   0.0s
[CV] knn_clf__n_neighbors=14, pca__n_components=8 ....................
[CV]  knn_clf__n_neighbors=14, pca__n_components=8, score=0.848, total=   0.0s
[CV] knn_clf__n_neighbors=14, pca__n_components=8 ....................
[CV]  knn_clf__n_neighbors=14, pca__n_components=8, score=0.785, total=   0.0s
...
[CV]  knn_clf__n_neighbors=15, pca__n_components=5, score=0.887, total=   0.0s
[CV] knn_clf__n_neighbors=15, pca__n_components=5 ....................
[CV]  knn_clf__n_neighbors=15, pca__n_components=5, score=0.812, total=   0.0s
[CV] knn_clf__n_neighbors=15, pca__n_components=5 ....................
[CV]  knn_clf__n_neighbors=15, pca__n_components=5, score=0.861, total=   0.0s
[CV] knn_clf__n_neighbors=15, pca__n_components=5 ....................
[CV]  knn_clf__n_neighbors=15, pca__n_components=5, score=0.861, total=   0.0s
[CV] knn_clf__n_neighbors=15, pca__n_components=6 ....................
[CV]  knn_clf__n_neighbors=15, pca__n_components=6, score=0.877, total=   0.0s
[CV] knn_clf__n_neighbors=15, pca__n_components=6 ....................
[CV]  knn_clf__n_neighbors=15, pca__n_components=6, score=0.852, total=   0.0s
[CV] knn_clf__n_neighbors=15, pca__n_components=6 ....................
[CV]  knn_clf__n_neighbors=15, pca__n_components=6, score=0.838, total=   0.0s
...
[CV] knn_clf__n_neighbors=16, pca__n_components=2 ....................
[CV]  knn_clf__n_neighbors=16, pca__n_components=2, score=0.900, total=   0.0s
[CV] knn_clf__n_neighbors=16, pca__n_components=2 ....................
[CV]  knn_clf__n_neighbors=16, pca__n_components=2, score=0.863, total=   0.0s
[CV] knn_clf__n_neighbors=16, pca__n_components=2 ....................
[CV]  knn_clf__n_neighbors=16, pca__n_components=2, score=0.861, total=   0.0s
[CV] knn_clf__n_neighbors=16, pca__n_components=2 ....................
[CV]  knn_clf__n_neighbors=16, pca__n_components=2, score=0.899, total=   0.0s
[CV] knn_clf__n_neighbors=16, pca__n_components=3 ....................
[CV]  knn_clf__n_neighbors=16, pca__n_components=3, score=0.889, total=   0.0s
[CV] knn_clf__n_neighbors=16, pca__n_components=3 ....................
[CV]  knn_clf__n_neighbors=16, pca__n_components=3, score=0.889, total=   0.0s
[CV] knn_clf__n_neighbors=16, pca__n_components=3 ....................
...
[CV]  knn_clf__n_neighbors=16, pca__n_components=8, score=0.852, total=   0.0s
[CV] knn_clf__n_neighbors=16, pca__n_components=8 ....................
[CV]  knn_clf__n_neighbors=16, pca__n_components=8, score=0.825, total=   0.0s
[CV] knn_clf__n_neighbors=16, pca__n_components=8 ....................
[CV]  knn_clf__n_neighbors=16, pca__n_components=8, score=0.838, total=   0.0s
[CV] knn_clf__n_neighbors=16, pca__n_components=8 ....................
[CV]  knn_clf__n_neighbors=16, pca__n_components=8, score=0.875, total=   0.0s
[CV] knn_clf__n_neighbors=16, pca__n_components=8 ....................
[CV]  knn_clf__n_neighbors=16, pca__n_components=8, score=0.825, total=   0.0s
[CV] knn_clf__n_neighbors=16, pca__n_components=8 ....................
[CV]  knn_clf__n_neighbors=16, pca__n_components=8, score=0.787, total=   0.0s
[CV] knn_clf__n_neighbors=16, pca__n_components=8 ....................
[CV]  knn_clf__n_neighbors=16, pca__n_components=8, score=0.863, total=   0.0s
...
[CV]  knn_clf__n_neighbors=17, pca__n_components=4, score=0.812, total=   0.2s
[CV] knn_clf__n_neighbors=17, pca__n_components=4 ....................
[CV]  knn_clf__n_neighbors=17, pca__n_components=4, score=0.873, total=   0.0s
[CV] knn_clf__n_neighbors=17, pca__n_components=4 ....................
[CV]  knn_clf__n_neighbors=17, pca__n_components=4, score=0.835, total=   0.0s
[CV] knn_clf__n_neighbors=17, pca__n_components=5 ....................
[CV]  knn_clf__n_neighbors=17, pca__n_components=5, score=0.852, total=   0.0s
[CV] knn_clf__n_neighbors=17, pca__n_components=5 ....................
[CV]  knn_clf__n_neighbors=17, pca__n_components=5, score=0.864, total=   0.0s
[CV] knn_clf__n_neighbors=17, pca__n_components=5 ....................
[CV]  knn_clf__n_neighbors=17, pca__n_components=5, score=0.887, total=   0.0s
[CV] knn_clf__n_neighbors=17, pca__n_components=5 ....................
[CV]  knn_clf__n_neighbors=17, pca__n_components=5, score=0.887, total=   0.0s
...
[CV]  knn_clf__n_neighbors=18, pca__n_components=1, score=0.738, total=   0.0s
[CV] knn_clf__n_neighbors=18, pca__n_components=1 ....................
[CV]  knn_clf__n_neighbors=18, pca__n_components=1, score=0.762, total=   0.0s
[CV] knn_clf__n_neighbors=18, pca__n_components=1 ....................
[CV]  knn_clf__n_neighbors=18, pca__n_components=1, score=0.800, total=   0.0s
[CV] knn_clf__n_neighbors=18, pca__n_components=1 ....................
[CV]  knn_clf__n_neighbors=18, pca__n_components=1, score=0.812, total=   0.0s
[CV] knn_clf__n_neighbors=18, pca__n_components=1 ....................
[CV]  knn_clf__n_neighbors=18, pca__n_components=1, score=0.812, total=   0.0s
[CV] knn_clf__n_neighbors=18, pca__n_components=1 ....................
[CV]  knn_clf__n_neighbors=18, pca__n_components=1, score=0.738, total=   0.0s
[CV] knn_clf__n_neighbors=18, pca__n_components=1 ....................
[CV]  knn_clf__n_neighbors=18, pca__n_components=1, score=0.823, total=   0.0s
...
[CV]  knn_clf__n_neighbors=18, pca__n_components=7, score=0.838, total=   0.0s
[CV] knn_clf__n_neighbors=18, pca__n_components=7 ....................
[CV]  knn_clf__n_neighbors=18, pca__n_components=7, score=0.838, total=   0.0s
[CV] knn_clf__n_neighbors=18, pca__n_components=7 ....................
[CV]  knn_clf__n_neighbors=18, pca__n_components=7, score=0.863, total=   0.0s
[CV] knn_clf__n_neighbors=18, pca__n_components=7 ....................
[CV]  knn_clf__n_neighbors=18, pca__n_components=7, score=0.787, total=   0.0s
[CV] knn_clf__n_neighbors=18, pca__n_components=7 ....................
[CV]  knn_clf__n_neighbors=18, pca__n_components=7, score=0.873, total=   0.1s
[CV] knn_clf__n_neighbors=18, pca__n_components=7 ....................
[CV]  knn_clf__n_neighbors=18, pca__n_components=7, score=0.759, total=   0.0s
[CV] knn_clf__n_neighbors=18, pca__n_components=8 ....................
[CV]  knn_clf__n_neighbors=18, pca__n_components=8, score=0.864, total=   0.0s
...
[CV]  knn_clf__n_neighbors=19, pca__n_components=4, score=0.823, total=   0.0s
[CV] knn_clf__n_neighbors=19, pca__n_components=5 ....................
[CV]  knn_clf__n_neighbors=19, pca__n_components=5, score=0.864, total=   0.0s
[CV] knn_clf__n_neighbors=19, pca__n_components=5 ....................
[CV]  knn_clf__n_neighbors=19, pca__n_components=5, score=0.877, total=   0.1s
[CV] knn_clf__n_neighbors=19, pca__n_components=5 ....................
[CV]  knn_clf__n_neighbors=19, pca__n_components=5, score=0.900, total=   0.1s
[CV] knn_clf__n_neighbors=19, pca__n_components=5 ....................
[CV]  knn_clf__n_neighbors=19, pca__n_components=5, score=0.887, total=   0.1s
[CV] knn_clf__n_neighbors=19, pca__n_components=5 ....................
[CV]  knn_clf__n_neighbors=19, pca__n_components=5, score=0.912, total=   0.1s
[CV] knn_clf__n_neighbors=19, pca__n_components=5 ....................
[CV]  knn_clf__n_neighbors=19, pca__n_components=5, score=0.875, total=   0.1s
...
[Parallel(n_jobs=1)]: Done 1710 out of 1710 | elapsed:   39.1s finished


GridSearchCV(cv=10, error_score='raise-deprecating',
             estimator=Pipeline(memory=None,
                                steps=[('pca',
                                        PCA(copy=True, iterated_power='auto',
                                            n_components=None, random_state=42,
                                            svd_solver='auto', tol=0.0,
                                            whiten=False)),
                                       ('scale',
                                        StandardScaler(copy=True,
                                                       with_mean=True,
                                                       with_std=True)),
                                       ('knn_clf',
                                        KNeighborsClassifier(algorithm='auto',
                                                             leaf_size=30,
                                                             metric='minkowski',
                                                             ...
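The "1710 out of 1710" in the last log line is simply the number of parameter combinations times the number of folds. The cell that defines the grid is not shown in this section, so the sketch below assumes a grid of `n_neighbors` from 1 to 19 and `n_components` from 1 to 9 (inferred from the log and the `cv_results_` listing) together with `cv=10` from the repr. It rebuilds the pipeline with the same step names and counts the fits without running them. Newer Scikit-Learn versions print a shorter repr than the one above, but the counting works the same way.

```python
from sklearn.decomposition import PCA
from sklearn.model_selection import ParameterGrid
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same step names as in the repr above: 'pca', 'scale', 'knn_clf'
pipeline = Pipeline([
    ("pca", PCA(random_state=42)),
    ("scale", StandardScaler()),
    ("knn_clf", KNeighborsClassifier()),
])

# Assumed grid, inferred from the log (not copied from the notebook)
param_grid = {
    "knn_clf__n_neighbors": list(range(1, 20)),  # 1..19
    "pca__n_components": list(range(1, 10)),     # 1..9
}

n_candidates = len(ParameterGrid(param_grid))  # every combination of the two lists
n_folds = 10                                   # cv=10, as shown in the repr
total_fits = n_candidates * n_folds
print(n_candidates, total_fits)

# Each grid key must be a valid "<step name>__<parameter>" of the pipeline
valid_keys = pipeline.get_params().keys()
assert all(key in valid_keys for key in param_grid)
```

The double-underscore naming is what lets a single grid reach into different pipeline steps: everything before `__` picks the step, everything after it picks that step's parameter.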

In [18]:
cvr = grid_search2.cv_results_
list(zip(cvr['mean_test_score'], cvr['params']))

[(0.70625, {'knn_clf__n_neighbors': 1, 'pca__n_components': 1}),
 (0.87125, {'knn_clf__n_neighbors': 1, 'pca__n_components': 2}),
 (0.85, {'knn_clf__n_neighbors': 1, 'pca__n_components': 3}),
 (0.84125, {'knn_clf__n_neighbors': 1, 'pca__n_components': 4}),
 (0.805, {'knn_clf__n_neighbors': 1, 'pca__n_components': 5}),
 (0.82125, {'knn_clf__n_neighbors': 1, 'pca__n_components': 6}),
 (0.7825, {'knn_clf__n_neighbors': 1, 'pca__n_components': 7}),
 (0.77875, {'knn_clf__n_neighbors': 1, 'pca__n_components': 8}),
 (0.76625, {'knn_clf__n_neighbors': 1, 'pca__n_components': 9}),
 (0.735, {'knn_clf__n_neighbors': 2, 'pca__n_components': 1}),
 (0.87625, {'knn_clf__n_neighbors': 2, 'pca__n_components': 2}),
 (0.85875, {'knn_clf__n_neighbors': 2, 'pca__n_components': 3}),
 (0.8425, {'knn_clf__n_neighbors': 2, 'pca__n_components': 4}),
 (0.8175, {'knn_clf__n_neighbors': 2, 'pca__n_components': 5}),
 (0.82125, {'knn_clf__n_neighbors': 2, 'pca__n_components': 6}),
 (0.79625, {...}),
 ...]

That's a lot of output, so let's cut to the chase:

In [19]:
grid_search2.best_score_, grid_search2.best_params_

(0.905, {'knn_clf__n_neighbors': 11, 'pca__n_components': 2})

Well, well, well. What do you know? Hands down, that's awesome: 11 neighbors on just 2 principal components gives the best mean cross-validation accuracy, 0.905.
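`best_score_` and `best_params_` are just the top-ranked row of `cv_results_`. Instead of scrolling the long list above, you can load `cv_results_` into a pandas DataFrame and sort it. Here is a minimal sketch on a deliberately small grid so it runs quickly. The 80/20 train/test split with `random_state=42` is an assumption, because the notebook's own split is not shown in this section, so the scores will not match the ones above exactly.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
# Assumption: an 80/20 split (the notebook's split is not shown here)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ("pca", PCA(random_state=42)),
    ("scale", StandardScaler()),
    ("knn_clf", KNeighborsClassifier()),
])

# A small illustrative grid (9 candidates x 10 folds = 90 fits)
param_grid = {
    "knn_clf__n_neighbors": [1, 5, 11],
    "pca__n_components": [2, 5, 9],
}
grid = GridSearchCV(pipeline, param_grid, cv=10)
grid.fit(X_train, y_train)

# One row per candidate; sorting by rank puts the best combination first
results = pd.DataFrame(grid.cv_results_)
ranked = results.sort_values("rank_test_score")[
    ["params", "mean_test_score", "std_test_score", "rank_test_score"]
]
print(ranked.head())
```

The `std_test_score` column is worth a glance too: two candidates with similar means can differ a lot in how much their score varies from fold to fold.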

In [20]:
knn_new2 = grid_search2.best_estimator_
cscore(knn_new2), score(knn_new2)

(0.9049154164713235, 0.865)

Although this model scores lower on the **Test Set** (0.865) than the KMeans-transformed model, it did great in cross-validation (about 0.905).

As we saw, these features apply in many situations and can be used in various ways.

For now, goodbye!

**Have Fun Coding :)**