# Grid Search

##### Grid search is used when we want to find the best combination of hyperparameters. It is as simple as building a model for every combination of hyperparameter values and then choosing the setting that gives the highest accuracy.
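
As a minimal sketch of what "all combinations" means, the snippet below enumerates a small grid with the standard library. The candidate values and the names `grid` and `combinations` are made up for illustration and are not part of the original notebook.

```python
from itertools import product

# Hypothetical candidate values, chosen only for illustration
grid = {
    "C": [0.1, 1, 10],
    "gamma": [0.01, 0.1],
}

# Build one dictionary per combination: one value for each hyperparameter
names = list(grid)
combinations = [dict(zip(names, values)) for values in product(*grid.values())]

for params in combinations:
    print(params)
print("Number of combinations:", len(combinations))
```

The number of combinations is the product of the list lengths, which is why grids grow quickly as hyperparameters are added.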

In [5]:
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris
iris = load_iris()

Consider the case of a kernel SVM with an RBF (radial basis function) kernel, as
implemented in the SVC class. There are two important hyperparameters: C for regularization and gamma for controlling the width of the kernel. We wish to find the best combination for our model.

We can implement a simple grid search as two nested for loops over the parameters,
training and evaluating a classifier for each combination:

In [6]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

In [7]:
# Naive grid search implementation
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
print("Size of training set: {} size of test set: {}".format(X_train.shape[0], X_test.shape[0]))

best_score = 0

for gamma in [0.001, 0.01, 0.1, 1, 10, 100]:
    for C in [0.001, 0.01, 0.1, 1, 10, 100]:
        # For each combination of parameters, train an SVC
        svm = SVC(gamma=gamma, C=C)
        svm.fit(X_train, y_train)
        
        # Evaluate the SVC on the test set
        score = svm.score(X_test, y_test)
        
        # If we got a better score, store the score and parameters
        if score > best_score:
            best_score = score
            best_parameters = {'C': C, 'gamma': gamma}
            
print("Best score: {:.2f}".format(best_score))
print("Best parameters: {}".format(best_parameters))

Size of training set: 112 size of test set: 38
Best score: 0.97
Best parameters: {'C': 100, 'gamma': 0.001}


#### Danger of Overfitting

We made a serious mistake: we used the test set both to choose the hyperparameters and to evaluate the model, so no unseen data was left for estimating model performance. The reported accuracy is therefore likely to be too optimistic.
We can rectify this by creating a third set, the validation set. We then select the hyperparameters on the validation set and use the test set only for the final model evaluation.

In [8]:
from sklearn.svm import SVC

# Split data into train+validation set and test set
X_trainval, X_test, y_trainval, y_test = train_test_split(iris.data, iris.target, random_state=0)

# Split train+validation set into training and validation sets
X_train, X_valid, y_train, y_valid = train_test_split( X_trainval, y_trainval, random_state=1)

print("Size of training set: {}   size of validation set: {}   size of test set:"
        " {}\n".format(X_train.shape[0], X_valid.shape[0], X_test.shape[0]))

best_score = 0

for gamma in [0.001, 0.01, 0.1, 1, 10, 100]:
    for C in [0.001, 0.01, 0.1, 1, 10, 100]:
        # For each combination of parameters, train an SVC
        svm = SVC(gamma=gamma, C=C)
        svm.fit(X_train, y_train)
        # Evaluate the SVC on the validation set
        score = svm.score(X_valid, y_valid)
        # If we got a better score, store the score and parameters
        if score > best_score:
            best_score = score
            best_parameters = {'C': C, 'gamma': gamma}

# Rebuild a model on the combined training and validation set and evaluate it on the test set
svm = SVC(**best_parameters)
svm.fit(X_trainval, y_trainval)

test_score = svm.score(X_test, y_test)
print("Best score on validation set: {:.2f}".format(best_score))
print("Best parameters: ", best_parameters)
print("Test set score with best parameters: {:.2f}".format(test_score))

Size of training set: 84   size of validation set: 28   size of test set: 38

Best score on validation set: 0.96
Best parameters:  {'C': 10, 'gamma': 0.001}
Test set score with best parameters: 0.92


The best score on the validation set is 96%: slightly lower than before, probably
because we used less data to train the model (X_train is smaller now because we split
our dataset twice). However, the score on the test set—the score that actually tells us
how well we generalize—is even lower, at 92%. So we can only claim to classify new
data 92% correctly, not 97% correctly as we thought before!

## Grid Search with Cross-Validation

For a better estimate of the generalization performance, instead of
using a single split into a training and a validation set, we can use cross-validation to
evaluate the performance of each parameter combination.

In [9]:
# Reset the best score so the value from the previous cell does not carry over
best_score = 0

for gamma in [0.001, 0.01, 0.1, 1, 10, 100]:
    for C in [0.001, 0.01, 0.1, 1, 10, 100]:
        # For each combination of parameters, train an SVC
        svm = SVC(gamma=gamma, C=C)
        
        # Perform cross-validation
        scores = cross_val_score(svm, X_trainval, y_trainval, cv=5)
        
        # Compute mean cross-validation accuracy
        score = np.mean(scores)
        
        # If we got a better score, store the score and parameters
        if score > best_score:
            best_score = score
            best_parameters = {'C': C, 'gamma': gamma}
            
# Rebuild a model on the combined training and validation set
svm = SVC(**best_parameters)
svm.fit(X_trainval, y_trainval)

SVC(C=10, gamma=0.1)

Evaluating the accuracy of the SVM for one particular setting of C and gamma with
five-fold cross-validation requires training 5 models. Covering all 36 combinations in the grid therefore requires 36 * 5 = 180 models.

For each parameter setting, five accuracy values are computed,
one for each split in the cross-validation. Then the mean validation accuracy is
computed for each hyperparameter setting. The hyperparameters with the highest mean validation
accuracy are chosen.
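
To make the per-setting computation concrete, here is a self-contained sketch for a single setting. The setting C=10, gamma=0.1 is picked arbitrarily for illustration, and the variable names `n_settings` and `n_splits` are assumptions of this sketch.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_trainval, X_test, y_trainval, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# One hyperparameter setting, picked arbitrarily for illustration
svm = SVC(C=10, gamma=0.1)

# cv=5 fits five models, each validated on a different held-out fold
scores = cross_val_score(svm, X_trainval, y_trainval, cv=5)
print("Accuracy per split:", scores)
print("Mean validation accuracy: {:.3f}".format(np.mean(scores)))

# Cost of the full search: number of settings times fits per setting
n_settings = 6 * 6
n_splits = len(scores)
print("Models trained for the whole grid:", n_settings * n_splits)
```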

Because grid search with cross-validation is such a commonly used method to adjust
hyperparameters, scikit-learn provides the GridSearchCV class, which implements it in
the form of an estimator. To use the GridSearchCV class, you first need to specify the
hyperparameters you want to search over using a dictionary. GridSearchCV will then perform
all the necessary model fits. The keys of the dictionary are the names of the hyperparameters
we want to adjust, and the values are the settings we want to try.

In [10]:
param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],
              'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}

In [11]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
grid_search = GridSearchCV(SVC(), param_grid, cv=5)

GridSearchCV will use cross-validation in place of the split into a training and validation
set that we used before. However, we still need to split the data into a training
and a test set, to avoid overfitting the parameters:

In [12]:
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)

The grid_search object that we created behaves just like a classifier; we can call the
standard methods fit, predict, and score on it. A scikit-learn estimator that is built from another estimator, as GridSearchCV is, is called a meta-estimator.

In [13]:
grid_search.fit(X_train, y_train)

GridSearchCV(cv=5, estimator=SVC(),
             param_grid={'C': [0.001, 0.01, 0.1, 1, 10, 100],
                         'gamma': [0.001, 0.01, 0.1, 1, 10, 100]})

In [14]:
print("Test set score: {:.2f}".format(grid_search.score(X_test, y_test)))

Test set score: 0.97


Choosing the parameters using cross-validation, we actually found a model that achieves
97% accuracy on the test set. The important thing here is that we did not use the
test set to choose the hyperparameters. 

The hyperparameters that were found are stored in the best_params_ attribute, and the best cross-validation accuracy (the mean accuracy over the different splits for this hyperparameter setting) is stored in best_score_.
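
The following self-contained sketch repeats the search from the cells above and prints these attributes. It also shows best_estimator_, the model refit on the whole training set (available because refit=True is the default), and a call to predict. No output is shown because it was not run as part of this notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],
              'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}
grid_search = GridSearchCV(SVC(), param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Hyperparameter setting with the best mean cross-validation accuracy
print("Best parameters: {}".format(grid_search.best_params_))
# Mean accuracy over the five validation folds for that setting
print("Best cross-validation score: {:.2f}".format(grid_search.best_score_))
# Model refit on the whole training set with the best parameters
print("Best estimator: {}".format(grid_search.best_estimator_))
# predict delegates to that refit model
predictions = grid_search.predict(X_test[:5])
print("Predictions for the first five test samples:", predictions)
```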

### Analyzing the result of cross-validation

It is often helpful to visualize the results of cross-validation, to understand how the
model generalization depends on the hyperparameters we are searching. As grid searches
are quite expensive to run, often it is a good idea to start with a small grid. We can then inspect the results of the cross-validated grid search, and possibly expand our search. The results of a grid search can be found
in the cv_results_ attribute, which is a dictionary storing all aspects of the search.

In [15]:
import pandas as pd
# Convert to DataFrame
results = pd.DataFrame(grid_search.cv_results_)
# Show the first 5 rows
display(results.head())

| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_C | param_gamma | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.001198 | 0.001165 | 0.000662 | 0.00055 | 0.001 | 0.001 | {'C': 0.001, 'gamma': 0.001} | 0.347826 | 0.347826 | 0.363636 | 0.363636 | 0.409091 | 0.366403 | 0.022485 | 22 |
| 1 | 0.001829 | 0.002273 | 0.000823 | 0.001645 | 0.001 | 0.01 | {'C': 0.001, 'gamma': 0.01} | 0.347826 | 0.347826 | 0.363636 | 0.363636 | 0.409091 | 0.366403 | 0.022485 | 22 |
| 2 | 0.001845 | 0.001512 | 0.000208 | 0.000415 | 0.001 | 0.1 | {'C': 0.001, 'gamma': 0.1} | 0.347826 | 0.347826 | 0.363636 | 0.363636 | 0.409091 | 0.366403 | 0.022485 | 22 |
| 3 | 0.001683 | 0.001408 | 0.000406 | 0.000498 | 0.001 | 1.0 | {'C': 0.001, 'gamma': 1} | 0.347826 | 0.347826 | 0.363636 | 0.363636 | 0.409091 | 0.366403 | 0.022485 | 22 |
| 4 | 0.001301 | 0.000375 | 0.001005 | 1.3e-05 | 0.001 | 10.0 | {'C': 0.001, 'gamma': 10} | 0.347826 | 0.347826 | 0.363636 | 0.363636 | 0.409091 | 0.366403 | 0.022485 | 22 |
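
Since the grid here is two-dimensional, one way to inspect it is to arrange the mean cross-validation scores as a C-by-gamma table. The sketch below is self-contained and reruns the search. The names `C_values`, `gamma_values`, and `score_table` are introduced for illustration, and the reshape relies on the row order seen above, where gamma varies fastest within each value of C.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

C_values = [0.001, 0.01, 0.1, 1, 10, 100]
gamma_values = [0.001, 0.01, 0.1, 1, 10, 100]
param_grid = {'C': C_values, 'gamma': gamma_values}

grid_search = GridSearchCV(SVC(), param_grid, cv=5)
grid_search.fit(X_train, y_train)

# cv_results_ lists the combinations with gamma varying fastest within
# each C, so a 6x6 reshape gives rows = C and columns = gamma
mean_scores = np.array(grid_search.cv_results_['mean_test_score']).reshape(
    len(C_values), len(gamma_values))

score_table = pd.DataFrame(
    mean_scores,
    index=pd.Index(C_values, name='C'),
    columns=pd.Index(gamma_values, name='gamma'))
print(score_table.round(2))
```

If the best scores sit on the edge of such a table, that is usually a hint that the grid should be extended in that direction.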


### Search over spaces that are not grids

In some cases, searching over all possible combinations of hyperparameters simply does not make sense.

For example, SVC has a kernel hyperparameter, and
depending on which kernel is chosen, other hyperparameters will be relevant. If the linear kernel is used, then only the C hyperparameter is used. If the RBF kernel is used, then both the C and gamma hyperparameters are used.

To deal with these kinds of “conditional” parameters,
GridSearchCV allows the param_grid to be a list of dictionaries. Each dictionary in the
list is expanded into an independent grid.

In [16]:
param_grid = [{'kernel': ['rbf'],
               'C': [0.001, 0.01, 0.1, 1, 10, 100],
               'gamma': [0.001, 0.01, 0.1, 1, 10, 100]},
              {'kernel': ['linear'],
               'C': [0.001, 0.01, 0.1, 1, 10, 100]}]

In [17]:
grid_search = GridSearchCV(SVC(), param_grid, cv=5)
grid_search.fit(X_train, y_train)
print("Best parameters: {}".format(grid_search.best_params_))
print("Best cross-validation score: {:.2f}".format(grid_search.best_score_))

Best parameters: {'C': 10, 'gamma': 0.1, 'kernel': 'rbf'}
Best cross-validation score: 0.97
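
To see how a list of dictionaries is expanded, the sketch below enumerates the candidates with scikit-learn's ParameterGrid helper. The variable names `candidates`, `n_rbf`, and `n_linear` are introduced here for illustration.

```python
from sklearn.model_selection import ParameterGrid

param_grid = [
    {'kernel': ['rbf'],
     'C': [0.001, 0.01, 0.1, 1, 10, 100],
     'gamma': [0.001, 0.01, 0.1, 1, 10, 100]},
    {'kernel': ['linear'],
     'C': [0.001, 0.01, 0.1, 1, 10, 100]},
]

# Each dictionary is expanded on its own; the results are concatenated
candidates = list(ParameterGrid(param_grid))

n_rbf = sum(p['kernel'] == 'rbf' for p in candidates)
n_linear = sum(p['kernel'] == 'linear' for p in candidates)
print("RBF candidates:", n_rbf)
print("Linear candidates:", n_linear)
print("Total candidates:", len(candidates))

# Linear-kernel candidates carry no gamma entry at all
linear_candidates = [p for p in candidates if p['kernel'] == 'linear']
print(linear_candidates[0])
```

The linear kernel contributes 6 candidates rather than 36, because gamma is never combined with it.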


### Nested Cross-Validation

When using GridSearchCV, we have a single split of the data into training and test sets,
which might make our results unstable and make us depend too much on this single split of the data.

We can go a step further, and instead of splitting the original data
into training and test sets once, use multiple splits of cross-validation. This will result
in what is called nested cross-validation. In nested cross-validation, there is an outer
loop over splits of the data into training and test sets. For each of them, a grid search
is run (which might result in different best parameters for each split in the outer
loop). Then, for each outer split, the test set score using the best settings is reported.

The result of this procedure is a list of scores, not a model.
The scores tell us how well a model generalizes, given the best parameters found by the grid.
Because no single model comes out of it, nested cross-validation cannot be used to make predictions on new data; however, it can be useful for evaluating how well a given model works on a particular dataset.

Implementing nested cross-validation in scikit-learn is straightforward. We call
cross_val_score with an instance of GridSearchCV as the model:

In [18]:
scores = cross_val_score(GridSearchCV(SVC(), param_grid, cv=5), iris.data, iris.target, cv=5)
print("Cross-validation scores: ", scores)
print("Mean cross-validation score: ", scores.mean())

Cross-validation scores:  [0.96666667 1.         0.9        0.96666667 1.        ]
Mean cross-validation score:  0.9666666666666668
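
The one-line call above hides the two loops. As a rough sketch of what happens, the same idea can be written with an explicit outer loop. The reduced grid and the names `outer_cv`, `outer_scores`, and `chosen_params` are assumptions made here to keep the example short and fast; this is an illustration, not scikit-learn's internal implementation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

# A reduced grid (an assumption of this sketch, to keep it fast)
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1]}

outer_cv = StratifiedKFold(n_splits=5)
outer_scores = []
chosen_params = []

for train_idx, test_idx in outer_cv.split(X, y):
    # Inner loop: a grid search with its own cross-validation,
    # run only on the outer training portion
    inner_search = GridSearchCV(SVC(), param_grid, cv=5)
    inner_search.fit(X[train_idx], y[train_idx])
    chosen_params.append(inner_search.best_params_)

    # Outer loop: score the refit best model on the held-out outer fold
    outer_scores.append(inner_search.score(X[test_idx], y[test_idx]))

for params, score in zip(chosen_params, outer_scores):
    print(params, "{:.3f}".format(score))
print("Mean outer score: {:.3f}".format(np.mean(outer_scores)))
```

Printing the chosen parameters per outer split makes it visible when different splits prefer different settings.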


## Parallelizing cross-validation and grid search

While running a grid search over many parameters and on large datasets can be computationally
challenging, it is also embarrassingly parallel. This means that building a model using a particular parameter setting on a particular cross-validation split can
be done completely independently of the other parameter settings and splits.
You can make use of multiple cores in GridSearchCV and cross_val_score by setting
the n_jobs parameter to the number of
CPU cores you want to use. You can set n_jobs=-1 to use all available cores.
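
A minimal sketch of the n_jobs parameter follows, assuming a machine with at least two cores. It runs the same search once serially and once with two parallel workers and checks that both give the same scores. The names `serial` and `parallel` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],
              'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}

# Default: the fits run one after another
serial = GridSearchCV(SVC(), param_grid, cv=5)
serial.fit(X_train, y_train)

# n_jobs=2 spreads the fits over two workers; n_jobs=-1 would use all cores
parallel = GridSearchCV(SVC(), param_grid, cv=5, n_jobs=2)
parallel.fit(X_train, y_train)

same_scores = np.allclose(serial.cv_results_['mean_test_score'],
                          parallel.cv_results_['mean_test_score'])
print("Best parameters (parallel):", parallel.best_params_)
print("Same mean scores as the serial run:", same_scores)
```

On a dataset as small as iris the parallel run may not be faster, since starting the workers has its own overhead; the benefit shows up with larger datasets and grids.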

Be careful when nesting parallel operations.
If you set n_jobs on your model (for example, a random forest) and also on
GridSearchCV when searching over that model, the two levels compete for the same cores, which rarely helps and can slow things down. In practice, it is usually simplest to set n_jobs at one level only.