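Grid search is a hyperparameter tuning technique. It evaluates every combination of a set of candidate hyperparameter values and selects the combination that gives the best model performance. This matters because hyperparameters are set before training rather than learned from the data, and a model's performance can depend heavily on the values chosen.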
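
The idea can be sketched by hand: loop over every combination of candidate values, score each one with cross-validation, and keep the best. This is a minimal sketch that assumes the same digits data and SVC used later in this notebook. The small grid of `C` and `gamma` values is illustrative only and is not the grid used below. `GridSearchCV` automates this loop.

```python
from itertools import product

from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

# Illustrative candidate values (an assumption for this sketch, not the grid used later)
C_values = [1, 10]
gamma_values = [0.001, 0.0001]

digits = datasets.load_digits()
X, y = digits.data[:1000], digits.target[:1000]

best_score, best_params = -1.0, None
# Try every (C, gamma) combination and keep the one with the highest mean CV accuracy
for C, gamma in product(C_values, gamma_values):
    scores = cross_val_score(svm.SVC(C=C, gamma=gamma, kernel='rbf'), X, y, cv=5)
    mean_score = scores.mean()
    if mean_score > best_score:
        best_score, best_params = mean_score, {'C': C, 'gamma': gamma}

print(best_params, round(best_score, 3))
```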

## Cross validation with parameter tuning using grid search

In [2]:
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn import datasets, svm
import matplotlib.pyplot as plt

Load the digits dataset. Each observation has 64 feature variables, one per pixel of an 8x8 image, and each feature is the darkness of that pixel.
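
To make the "64 features = 8x8 pixels" relationship concrete, the sketch below checks the array shapes and confirms that each row of `digits.data` is just the matching image from `digits.images` flattened. It assumes only scikit-learn's bundled `load_digits` dataset.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# 1797 observations, each with 64 pixel features
print(digits.data.shape)    # (1797, 64)

# The same pixels are also available as 8x8 images
print(digits.images.shape)  # (1797, 8, 8)

# Each row of digits.data is the matching 8x8 image flattened row by row
assert np.array_equal(digits.images[0].reshape(-1), digits.data[0])

# Pixel values are integers from 0 to 16 stored as floats
print(digits.data.min(), digits.data.max())
```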

In [3]:
#load the digit data
digits = datasets.load_digits()

In [4]:
#view the features of the first observation
digits.data[0:1]

array([[ 0.,  0.,  5., 13.,  9.,  1.,  0.,  0.,  0.,  0., 13., 15., 10.,
        15.,  5.,  0.,  0.,  3., 15.,  2.,  0., 11.,  8.,  0.,  0.,  4.,
        12.,  0.,  0.,  8.,  8.,  0.,  0.,  5.,  8.,  0.,  0.,  9.,  8.,
         0.,  0.,  4., 11.,  0.,  1., 12.,  7.,  0.,  0.,  2., 14.,  5.,
        10., 12.,  0.,  0.,  0.,  0.,  6., 13., 10.,  0.,  0.,  0.]])

The target data is a vector containing each image's true digit.
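
A quick way to inspect the target vector is to check its shape, the set of class labels, and how many images there are of each digit. This is a small sketch using standard NumPy and scikit-learn calls on the same dataset.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# One label per observation, aligned with the rows of digits.data
print(digits.target.shape)   # (1797,)
print(digits.target_names)   # [0 1 2 3 4 5 6 7 8 9]

# Number of images of each digit; the classes are roughly balanced
print(np.bincount(digits.target))
```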

In [5]:
#view the target of the first observation
digits.target[0:1]

array([0])

To demonstrate cross-validation and parameter tuning, we divide the dataset into two parts: data1 holds the first 1000 rows and data2 holds the remaining 797.

Note that this split is separate from the cross-validation that the grid search performs internally. It is done purely so that, at the end, we can check the chosen parameters on data the grid search has never seen.
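
The sketch below reproduces the split and confirms the sizes of the two parts. It also shows, as an alternative that this notebook does not use, a shuffled split with `train_test_split` that keeps the digit proportions equal in both parts; the names `X1`, `X2`, `y1`, `y2` and the `random_state` value are hypothetical choices for illustration.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# Same split as in this notebook: first 1000 rows vs. the remaining rows
data1_features, data1_target = digits.data[:1000], digits.target[:1000]
data2_features, data2_target = digits.data[1000:], digits.target[1000:]
print(data1_features.shape, data2_features.shape)  # (1000, 64) (797, 64)

# Alternative (not used here): shuffled split that preserves class proportions
X1, X2, y1, y2 = train_test_split(digits.data, digits.target, train_size=1000,
                                  stratify=digits.target, random_state=0)
```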

In [6]:
# Create dataset 1
data1_features = digits.data[:1000]
data1_target = digits.target[:1000]

# Create dataset 2
data2_features = digits.data[1000:]
data2_target = digits.target[1000:]

## Create parameter candidates

Before searching for the combination of parameter values that produces the most accurate model, we must specify the candidate values we want to try.

Below we specify two grids: a linear kernel with four candidate values of C, and an RBF kernel with the same four values of C combined with two values of gamma (gamma only affects the RBF kernel).

The grid search tries every combination of parameter values and selects the set of parameters that gives the highest mean cross-validated accuracy.

In [9]:
parameter_candidates = [
  {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
  {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]
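
To see exactly which combinations the grid search will try, scikit-learn's `ParameterGrid` expands a list of grids in the same way. The sketch below shows that the grid above yields 12 candidates (4 linear plus 4 x 2 RBF), so with 5-fold cross-validation the search performs 60 fits, plus one final refit of the best candidate.

```python
from sklearn.model_selection import ParameterGrid

parameter_candidates = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]

# ParameterGrid expands each dict into its Cartesian product and chains the results
grid = list(ParameterGrid(parameter_candidates))
print(len(grid))  # 12

# Show the first few candidate settings
for params in grid[:3]:
    print(params)
```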

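Let us run the grid search with GridSearchCV (grid search with cross-validation).

By default (scikit-learn 0.22 and later) it uses 5-fold cross-validation; earlier versions defaulted to 3 folds. Because SVC is a classifier and the targets are class labels, the folds are produced with StratifiedKFold, which keeps the proportion of each digit roughly the same in every fold. For regressors, plain KFold is used instead.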
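
GridSearchCV resolves its `cv` argument with scikit-learn's `check_cv` helper, so calling it directly shows which splitter the default produces. The sketch also shows how to pass an explicit splitter instead; the shuffling, `random_state`, and the reduced `param_grid` are illustrative assumptions, not settings from this notebook.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, StratifiedKFold, check_cv

digits = datasets.load_digits()
data1_target = digits.target[:1000]

# cv=None resolves to the default splitter; classifier=True mirrors GridSearchCV wrapping an SVC
default_cv = check_cv(None, data1_target, classifier=True)
print(default_cv)  # a 5-fold StratifiedKFold on scikit-learn >= 0.22

# To control the folds explicitly (e.g. shuffle them), pass a splitter object as cv
explicit_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = GridSearchCV(estimator=svm.SVC(),
                   param_grid={'C': [1, 10], 'kernel': ['linear']},
                   cv=explicit_cv)
```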

In [10]:
%%time
# Create a GridSearchCV object from the SVC estimator and the parameter candidates
clf = GridSearchCV(estimator=svm.SVC(), param_grid=parameter_candidates, n_jobs=-1)

# Fit the grid search on data1's features and targets
clf.fit(data1_features, data1_target)

CPU times: user 313 ms, sys: 89.1 ms, total: 402 ms
Wall time: 2.15 s


GridSearchCV(estimator=SVC(), n_jobs=-1,
             param_grid=[{'C': [1, 10, 100, 1000], 'kernel': ['linear']},
                         {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001],
                          'kernel': ['rbf']}])

The grid search has now been fitted on data1.

In [11]:
# best_score_ is the mean cross-validated accuracy of the best parameter combination on data1's held-out folds
print('Best score for data1:', clf.best_score_)

Best score for data1: 0.966
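
`best_score_` is one entry of the full results table that GridSearchCV stores in `cv_results_`. The sketch below refits the same search (without `n_jobs=-1`, so it runs sequentially) and lists every candidate from best to worst by mean cross-validated accuracy, using the `rank_test_score`, `mean_test_score` and `params` keys of `cv_results_`.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

digits = datasets.load_digits()
X, y = digits.data[:1000], digits.target[:1000]

parameter_candidates = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]
clf = GridSearchCV(estimator=svm.SVC(), param_grid=parameter_candidates)
clf.fit(X, y)

results = clf.cv_results_
# best_score_ is the mean test-fold accuracy of the best-ranked candidate
assert np.isclose(clf.best_score_, results['mean_test_score'][clf.best_index_])

# Print every candidate from best to worst by mean cross-validated accuracy
for i in np.argsort(results['rank_test_score']):
    print(results['rank_test_score'][i],
          round(results['mean_test_score'][i], 3),
          results['params'][i])
```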


In [12]:
#view the best parameters for the model found using grid search
print('Best C:', clf.best_estimator_.C)
print('Best Kernel:', clf.best_estimator_.kernel)
print('Best Gamma:', clf.best_estimator_.gamma)

Best C: 10
Best Kernel: rbf
Best Gamma: 0.001


Therefore, the best model found by the grid search uses C = 10, an RBF kernel and gamma = 0.001.

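We can use the second dataset to confirm that these parameters are the ones the model actually uses. First, we apply the classifier we just trained to data2. Because GridSearchCV refits the best parameter combination on the whole of data1 by default (refit=True), this uses the tuned model. Then we train a new support vector classifier from scratch with the parameters found by the grid search and compare the two scores.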

In [13]:
# Apply the classifier trained on data1 to data2 and view its accuracy score
clf.score(data2_features, data2_target)

0.9698870765370138

In [14]:
# Train a new classifier from scratch using the best parameters found by the grid search
best_svc = svm.SVC(C=10, kernel='rbf', gamma=0.001)
best_svc.fit(data1_features, data1_target)
best_svc.score(data2_features, data2_target)

0.9698870765370138
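
The two scores are identical because both models are the same SVC trained on the same data. Rather than hard-coding C, kernel and gamma, the new model can be built directly from the `best_params_` dictionary. The sketch below assumes the same split and grid as above and checks that the refit grid-search model and the from-scratch model score the same on data2; SVC has no random component here, so the results match exactly.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

digits = datasets.load_digits()
data1_features, data1_target = digits.data[:1000], digits.target[:1000]
data2_features, data2_target = digits.data[1000:], digits.target[1000:]

parameter_candidates = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]
clf = GridSearchCV(estimator=svm.SVC(), param_grid=parameter_candidates)
clf.fit(data1_features, data1_target)

# With the default refit=True, clf.score delegates to best_estimator_,
# an SVC refit on all of data1 with best_params_
grid_score = clf.score(data2_features, data2_target)

# Rebuild the same model from the best_params_ dict instead of hard-coding values
fresh = svm.SVC(**clf.best_params_).fit(data1_features, data1_target)
fresh_score = fresh.score(data2_features, data2_target)

print(clf.best_params_, grid_score, fresh_score)
```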