# Grid Search: Searching for estimator parameters

Parameters that are not directly learned by an estimator can be set by searching a parameter space for the combination that gives the best cross-validated score. Typical examples include **C**, **kernel** and **gamma** for the Support Vector Classifier and **alpha** for Lasso.

Such parameters are often referred to as hyperparameters (particularly in Bayesian learning), distinguishing them from the parameters optimized in a machine learning procedure.

A search consists of:
- an estimator (regressor or classifier such as sklearn.svm.SVC());
- a method for searching or sampling candidates;
- a cross-validation scheme; and
- a score function.
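The four ingredients listed above can be sketched as follows. This is a minimal illustration, not the run shown later in this tutorial: it assumes a scikit-learn release that provides `sklearn.model_selection`, and the small grid, the 3-fold `StratifiedKFold` splitter and the `"accuracy"` scorer are choices made only for this example.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# 1. The estimator whose parameters we want to tune.
estimator = SVC(kernel="rbf")

# 2. The candidates: here a small, exhaustive grid (illustrative values).
param_grid = {"C": [0.1, 1.0], "gamma": [0.0005, 0.001]}

# 3. The cross-validation scheme.
cv = StratifiedKFold(n_splits=3)

# 4. The score function, given by name.
scoring = "accuracy"

search = GridSearchCV(estimator, param_grid, cv=cv, scoring=scoring)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Keeping the four pieces in separate variables makes it easy to swap one of them (for example, a different splitter) without touching the rest.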

Two generic approaches to sampling search candidates are provided in scikit-learn: for given values, **GridSearchCV** exhaustively considers all parameter combinations, while **RandomizedSearchCV** can sample a given number of candidates from a parameter space with a specified distribution.
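To make the difference between the two approaches concrete, the sketch below enumerates the candidates each strategy would try, without fitting any model. It uses `ParameterGrid` and `ParameterSampler` from `sklearn.model_selection` and `scipy.stats.loguniform`; the value ranges and `n_iter=4` are assumptions chosen for illustration.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Exhaustive grid: every combination of the listed values (3 * 2 candidates).
grid_candidates = list(ParameterGrid({"C": [0.01, 0.1, 1.0],
                                      "gamma": [0.0001, 0.001]}))

# Random sampling: a fixed number of draws from continuous distributions.
distributions = {"C": loguniform(1e-5, 1e0),
                 "gamma": loguniform(1e-4, 1e-3)}
sampled_candidates = list(ParameterSampler(distributions, n_iter=4,
                                           random_state=0))

print(len(grid_candidates), "grid candidates")
print(len(sampled_candidates), "sampled candidates")
```

With a grid, the number of candidates grows multiplicatively with each added parameter; with sampling, the budget is set directly by `n_iter`.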

In this tutorial, an example is presented for GridSearchCV.

You can find more information about GridSearchCV here:
http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html

First, we load the digits dataset from scikit-learn.

In [1]:
from sklearn.datasets import load_digits

digits = load_digits()

X = digits.data
y = digits.target
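Before searching, it can help to check what was loaded. The short sketch below inspects the arrays; the variable names mirror the cell above, and the checks themselves are an addition for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
X = digits.data      # one flattened 8x8 image per row
y = digits.target    # the digit shown in each image

print(X.shape)               # (1797, 64)
print(digits.images.shape)   # (1797, 8, 8)
print(np.unique(y))          # the ten digit classes, 0 through 9
```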


In [24]:
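import numpy as np
from sklearn.svm import SVC
# In the scikit-learn release used for this run, GridSearchCV lives in
# sklearn.grid_search; later releases expose it from sklearn.model_selection.
from sklearn.grid_search import GridSearchCV

svc = SVC()

# 20 values of C, log-spaced between 1e-5 and 1
Cs = np.logspace(-5, 0, num=20).tolist()
# Candidate gamma values for the RBF kernel (0.0007 is listed twice)
my_gamma = [0.0001, 0.0002, 0.0003, 0.0004, 0.0005,
            0.0006, 0.0007, 0.0008, 0.0007, 0.0010]
# Two sub-grids: linear kernel (C only) and RBF kernel (C and gamma)
my_param_grid = [{'C': Cs, 'kernel': ['linear']},
                 {'C': Cs, 'kernel': ['rbf'], 'gamma': my_gamma}]

clf = GridSearchCV(estimator=svc,
                   param_grid=my_param_grid,
                   n_jobs=-1,    # use all available CPU cores
                   iid=True,     # average scores per sample, not per fold (argument removed in later releases)
                   cv=5,         # 5-fold cross-validation
                   refit=True)   # refit the best candidate on the full data
clf.fit(X, y)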

GridSearchCV(cv=5, error_score='raise',
       estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
       fit_params={}, iid=True, n_jobs=-1,
       param_grid=[{'C': [1e-05, 1.8329807108324375e-05, 3.359818286283781e-05, 6.158482110660267e-05, 0.00011288378916846884, 0.00020691380811147902, 0.000379269019073225, 0.0006951927961775605, 0.0012742749857031334, 0.002335721469090121, 0.004281332398719391, 0.007847599703514606, 0.01438449888287663, 0...[0.0001, 0.0002, 0.0003, 0.0004, 0.0005, 0.0006, 0.0007, 0.0008, 0.0007, 0.001], 'kernel': ['rbf']}],
       pre_dispatch='2*n_jobs', refit=True, scoring=None, verbose=0)
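To get a feel for the cost of this search, the sketch below counts the candidates in the grid used above and builds the same search with the newer import path. It assumes a scikit-learn release where `GridSearchCV` and `ParameterGrid` live in `sklearn.model_selection` and where the `iid` argument no longer exists; the search object is only constructed here, not fitted.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.svm import SVC

# Same grid as in the cell above, including the repeated 0.0007.
Cs = np.logspace(-5, 0, num=20).tolist()
my_gamma = [0.0001, 0.0002, 0.0003, 0.0004, 0.0005,
            0.0006, 0.0007, 0.0008, 0.0007, 0.0010]
my_param_grid = [{'C': Cs, 'kernel': ['linear']},
                 {'C': Cs, 'kernel': ['rbf'], 'gamma': my_gamma}]

# 20 linear candidates + 20 * 10 RBF candidates
n_candidates = len(ParameterGrid(my_param_grid))
n_folds = 5
# One fit per candidate and fold, plus the final refit on all the data.
n_fits = n_candidates * n_folds + 1

# Equivalent search object for newer releases (no iid argument).
clf = GridSearchCV(estimator=SVC(), param_grid=my_param_grid,
                   n_jobs=-1, cv=n_folds, refit=True)

print(n_candidates, "candidates,", n_fits, "fits")
```

Repeated values in a list are not removed by `ParameterGrid`, so the duplicated gamma is evaluated twice.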

In [25]:
clf.grid_scores_

[mean: 0.75626, std: 0.01617, params: {'C': 1e-05, 'kernel': 'linear'},
 mean: 0.88147, std: 0.03122, params: {'C': 1.8329807108324375e-05, 'kernel': 'linear'},
 mean: 0.90540, std: 0.02654, params: {'C': 3.359818286283781e-05, 'kernel': 'linear'},
 mean: 0.92766, std: 0.02884, params: {'C': 6.158482110660267e-05, 'kernel': 'linear'},
 mean: 0.93267, std: 0.02914, params: {'C': 0.00011288378916846884, 'kernel': 'linear'},
 mean: 0.94491, std: 0.02566, params: {'C': 0.00020691380811147902, 'kernel': 'linear'},
 mean: 0.94936, std: 0.02466, params: {'C': 0.000379269019073225, 'kernel': 'linear'},
 mean: 0.94825, std: 0.02689, params: {'C': 0.0006951927961775605, 'kernel': 'linear'},
 mean: 0.95326, std: 0.02383, params: {'C': 0.0012742749857031334, 'kernel': 'linear'},
 mean: 0.95103, std: 0.02364, params: {'C': 0.002335721469090121, 'kernel': 'linear'},
 mean: 0.94825, std: 0.02298, params: {'C': 0.004281332398719391, 'kernel': 'linear'},
 mean: 0.95214, std: 0.02185, params: {'C': 0.00
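In later scikit-learn releases the `grid_scores_` attribute is no longer available, and the same information is exposed through the `cv_results_` dictionary. The sketch below prints a comparable listing from `cv_results_`; it assumes such a newer release and uses a deliberately small grid and 3 folds (illustrative choices) so that it runs quickly.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# A small illustrative grid, not the one used in this tutorial.
small_grid = {'C': [0.1, 1.0], 'gamma': [0.0005, 0.001]}
search = GridSearchCV(SVC(kernel='rbf'), small_grid, cv=3)
search.fit(X, y)

# cv_results_ holds one entry per candidate under each key.
results = search.cv_results_
for mean, std, params in zip(results['mean_test_score'],
                             results['std_test_score'],
                             results['params']):
    print("mean: %.5f, std: %.5f, params: %r" % (mean, std, params))
```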

In [26]:
clf.best_params_

{'C': 1.0, 'gamma': 0.001, 'kernel': 'rbf'}

In [27]:
clf.best_estimator_

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma=0.001, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [28]:
clf.best_score_

0.97161936560934892
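The three attributes above are related: `best_score_` is the cross-validated score of the candidate stored in `best_params_`, and with `refit=True` the `best_estimator_` is that candidate retrained on all the data. The sketch below checks these relationships on a small illustrative grid, assuming a scikit-learn release that provides `sklearn.model_selection` and the `best_index_` attribute.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Small illustrative grid so the example runs quickly.
search = GridSearchCV(SVC(kernel='rbf'),
                      {'C': [0.1, 1.0], 'gamma': [0.001]},
                      cv=3, refit=True)
search.fit(X, y)

# best_score_ is the mean test score of the best-ranked candidate.
best_mean = search.cv_results_['mean_test_score'][search.best_index_]
same_score = np.isclose(search.best_score_, best_mean)

# With refit=True, predicting through the search object uses best_estimator_.
same_predictions = np.array_equal(search.predict(X[:10]),
                                  search.best_estimator_.predict(X[:10]))

print(search.best_params_, search.best_score_)
print(same_score, same_predictions)
```

Note that `best_score_` is an average over validation folds, not a score on a separate held-out test set.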