For every classifier, there are __a lot of__ parameters to optimize if we want the best performance from a given model.
These can include, among others:
- The type of kernel used in an SVM.
- Regularization terms and loss penalties (L1, L2, ...).
- Different feature transformations (normalizing, scaling, feature extraction, ...).

Tuning them is time-consuming (computation, storage, and reporting) but CRUCIAL for most (though not all) models!
Keep this extra cost in mind when choosing the best of several classifiers.

Scikit-learn provides grid search with cross-validation in `GridSearchCV`. It fits the estimator for every combination in a user-supplied parameter grid, scores each combination by cross-validation, and reports the best-scoring hyperparameters.

Note that this can take a long time: the number of combinations to test grows multiplicatively with each added parameter (e.g., three parameters with 2, 5, and 3 candidate values already give 2 × 5 × 3 = 30 combinations). You may want to run the grid search on a powerful multicore machine and be "smart" about it in general, tuning only the important parameters.
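
To get a feel for how quickly the grid grows, the number of candidates can be counted before any model is fit. The sketch below uses only the standard library and a hypothetical grid (`example_grid`, its values, and the assumed fold count `n_folds` are made up for illustration); it enumerates the combinations the way an exhaustive grid search would.

```python
import itertools
import math

# Hypothetical grid with 2, 5 and 3 candidate values (illustrative only).
example_grid = {
    'max_depth': [4, 6],
    'min_samples_leaf': [1, 3, 5, 9, 15],
    'learning_rate': [0.1, 0.05, 0.01],
}

# The number of combinations is the product of the list lengths.
n_combinations = math.prod(len(values) for values in example_grid.values())

# Enumerate every combination explicitly, as an exhaustive search would.
names = sorted(example_grid)
combinations = [
    dict(zip(names, values))
    for values in itertools.product(*(example_grid[name] for name in names))
]

n_folds = 5  # assumed number of cross-validation folds
print(n_combinations, 'combinations ->', n_combinations * n_folds, 'model fits')
```

Each combination is fit once per cross-validation fold, so the total number of model fits is the grid size times the number of folds.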

In [5]:
%matplotlib inline 

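import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn
from sklearn import datasets
from sklearn import ensemble
from sklearn import metrics
from sklearn import model_selection
import time

# `cross_validation` and `grid_search` were merged into `model_selection`
# in scikit-learn 0.18 and removed in 0.20; keep the old names as aliases.
cross_validation = model_selection
grid_search = model_selection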

plt.style.use('fivethirtyeight')

In [3]:
california_housing = datasets.fetch_california_housing()
california_housing_data = california_housing['data']
california_housing_labels = california_housing['target']  # 'target': median house value, the regression target
california_housing_feature_names = california_housing['feature_names']

downloading Cal. housing from http://lib.stat.cmu.edu/modules.php?op=modload&name=Downloads&file=index&req=getit&lid=83 to C:\Users\User\scikit_learn_data


In [4]:
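from sklearn.model_selection import train_test_split

# Hold out 20% of the samples for testing; the fixed seed makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(california_housing_data,
                                                    california_housing_labels,
                                                    test_size=0.2,
                                                    random_state=0)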

To create a grid search object, pass the estimator, the parameter grid to search for that estimator, and optionally `n_jobs` to parallelize the search. Grid search is embarrassingly parallel by nature, since different parameter combinations are independent of each other.
You can also pass a different scoring function through the `scoring` argument if you want to select the best parameters by a metric other than the estimator's default.
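
As a rough illustration of the `scoring` argument, the sketch below selects parameters by mean absolute error instead of the estimator's default score. The synthetic data from `make_regression`, the small grid `demo_grid`, and the reduced `n_estimators` are assumptions chosen only so the example runs in seconds; they are not the settings used in this notebook.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Small synthetic regression problem (illustrative stand-in for real data).
X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=10.0, random_state=0)

# Hypothetical, deliberately tiny grid.
demo_grid = {'max_depth': [2, 3], 'learning_rate': [0.1, 0.05]}

demo_search = GridSearchCV(
    GradientBoostingRegressor(n_estimators=50, random_state=0),
    demo_grid,
    scoring='neg_mean_absolute_error',  # negated so that higher is better
    cv=3,
    n_jobs=1,
)
demo_search.fit(X_demo, y_demo)

print(demo_search.best_params_)
print('Best mean absolute error: {:.2f}'.format(-demo_search.best_score_))
```

Scikit-learn scorers follow a "higher is better" convention, which is why error metrics are exposed in negated form and the sign is flipped back for reporting.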

In [None]:
param_grid = {
              'learning_rate': [0.1, 0.05, 0.01],
              'max_depth': [4, 6],
              'min_samples_leaf': [3, 9, 15],
              'n_estimators': [1000, 2000, 3000],
              }

from sklearn.model_selection import GridSearchCV

est = ensemble.GradientBoostingRegressor()

# n_jobs=-2 uses all CPU cores but one.
start_time = time.time()
gs_cv = GridSearchCV(est, param_grid, n_jobs=-2).fit(X_train, y_train)
end_time = time.time()

print('It took {:.2f} seconds'.format(end_time - start_time))
# best hyperparameter setting
gs_cv.best_params_

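The best cross-validated score is available as `best_score_`. To get the scores for every parameter combination, use the `cv_results_` attribute of the fitted `GridSearchCV` object. It is a dictionary of arrays (parameters, mean and standard deviation of the test scores, ranks, and timings) with one entry per combination; it replaces the `grid_scores_` attribute of older scikit-learn releases.

In [None]:
gs_cv.best_score_

In [None]:
gs_cv.cv_results_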
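
In current scikit-learn releases the per-combination scores live in the `cv_results_` dictionary, which is easier to read as a table. The sketch below is one way to tabulate it with pandas; the `Ridge` estimator, the synthetic data, and the `alpha` grid are assumptions picked to keep the example fast, not part of the housing experiment above.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Small synthetic problem and a hypothetical one-parameter grid.
X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=10.0, random_state=0)
demo_search = GridSearchCV(Ridge(), {'alpha': [0.1, 1.0, 10.0]}, cv=3)
demo_search.fit(X_demo, y_demo)

# One row per parameter combination, best-ranked combination first.
results = pd.DataFrame(demo_search.cv_results_)
summary = (results[['params', 'mean_test_score',
                    'std_test_score', 'rank_test_score']]
           .sort_values('rank_test_score'))
print(summary)
```

The same pattern should apply to a fitted search object such as `gs_cv`, where each row then corresponds to one combination from `param_grid`.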

# Randomized Parameter Search

When a comprehensive search is infeasible because the number of parameter combinations grows _really_ fast, we need to be _smarter_ than an exhaustive grid search, i.e., shrink the search without (hopefully) giving up too much performance. The resulting model may not match the one found by grid search, but it often comes close for a fraction of the computation time.

**RandomizedSearchCV** in scikit-learn does this for us: it evaluates only a fixed number (`n_iter`) of randomly sampled parameter combinations, so it often reaches results similar to `GridSearchCV` much faster. Many other optimization methods exist, notably Bayesian optimization (e.g., Spearmint).
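
To make the saving concrete, the sketch below compares the size of the full grid with a random sample of ten candidates, using scikit-learn's `ParameterGrid` and `ParameterSampler` helpers. The grid values mirror the ones used in this section; the name `demo_grid` and the seed are illustrative assumptions.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

demo_grid = {
    'learning_rate': [0.1, 0.05, 0.01],
    'max_depth': [4, 6],
    'min_samples_leaf': [1, 2, 3, 4, 9, 15],
    'n_estimators': [1000, 2000, 3000],
}

# Exhaustive search would evaluate every combination: 3 * 2 * 6 * 3.
full_grid_size = len(ParameterGrid(demo_grid))

# Randomized search evaluates only n_iter sampled combinations.
sampled = list(ParameterSampler(demo_grid, n_iter=10, random_state=0))

print('Full grid: {} candidates, random sample: {} candidates'.format(
    full_grid_size, len(sampled)))
```

When every parameter is given as a list, the sampler draws without replacement, so the ten candidates are distinct points of the full grid.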

In [None]:
param_grid = {
              'learning_rate': [0.1, 0.05, 0.01],
              'max_depth': [4, 6],
              'min_samples_leaf': [1, 2, 3, 4, 9, 15],
              'n_estimators': [1000, 2000, 3000],
              }

from sklearn.model_selection import RandomizedSearchCV

# The housing target is continuous, so a regressor is needed.
est = ensemble.GradientBoostingRegressor()

start_time = time.time()
# run randomized search: only n_iter_search sampled combinations are evaluated
n_iter_search = 10
randomized_search = RandomizedSearchCV(est, param_distributions=param_grid,
                                       n_iter=n_iter_search, n_jobs=4).fit(X_train, y_train)
end_time = time.time()

print('It took {:.2f} seconds'.format(end_time - start_time))
# best hyperparameter setting among the sampled combinations
randomized_search.best_params_
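
Randomized search is not limited to lists of values: `param_distributions` also accepts distribution objects that expose an `rvs` method, such as those in `scipy.stats`. The sketch below is a minimal example under assumed settings (synthetic data, a small `n_estimators`, and hypothetical ranges in `demo_distributions`), not a tuned configuration for the housing data.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Small synthetic regression problem so the example runs in seconds.
X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=10.0, random_state=0)

# Hypothetical search ranges, sampled instead of enumerated.
demo_distributions = {
    'learning_rate': uniform(loc=0.01, scale=0.09),  # uniform on [0.01, 0.10]
    'max_depth': randint(2, 7),                      # integers 2..6
    'min_samples_leaf': randint(1, 16),              # integers 1..15
}

demo_random_search = RandomizedSearchCV(
    GradientBoostingRegressor(n_estimators=50, random_state=0),
    param_distributions=demo_distributions,
    n_iter=5,          # number of sampled candidates
    cv=3,
    random_state=0,    # makes the sampled candidates reproducible
    n_jobs=1,
)
demo_random_search.fit(X_demo, y_demo)

print(demo_random_search.best_params_)
```

Sampling from continuous ranges lets the search try values that a hand-written grid would never contain, while `n_iter` keeps the cost fixed.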