# [Hyperparameter optimization](http://scikit-learn.org/stable/modules/grid_search.html)

- Grid search ([GridSearchCV](http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html#sklearn.grid_search.GridSearchCV))
    - Choose the values to try for each hyperparameter
    - Try every combination
- Random search ([RandomizedSearchCV](http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.RandomizedSearchCV.html#sklearn.grid_search.RandomizedSearchCV))
    - Choose a distribution for each hyperparameter
    - Draw samples from the distributions and try them
    - Works quite well when the hyperparameter space is high-dimensional ([Bergstra and Bengio, 2012](http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf))
- Bayesian optimization (sequential model-based optimization)
    - Currently in vogue. A Gaussian process predicts the performance on the validation set; the hyperparameter values with the highest Expected Improvement are found with a gradient-based method and tried one after another.
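As a rough illustration of the Bayesian-optimization step, here is a minimal sketch of the closed-form Expected Improvement for maximizing a validation score. It assumes the surrogate model (e.g. a Gaussian process) returns a Gaussian posterior with mean `mu` and standard deviation `sigma` at each candidate; `expected_improvement` and the exploration margin `xi` are illustrative names, not a scikit-learn API.

```python
import numpy as np
from scipy.stats import norm


def expected_improvement(mu, sigma, best, xi=0.0):
    """Closed-form EI for maximization under a Gaussian posterior N(mu, sigma^2).

    mu, sigma: posterior mean / std of the validation score per candidate
    best:      best validation score observed so far
    xi:        exploration margin (hypothetical parameter, default 0.0)
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    improvement = mu - best - xi
    ei = np.zeros_like(mu)
    # sigma == 0: the posterior is a point mass, so EI = max(improvement, 0)
    zero = sigma == 0
    ei[zero] = np.maximum(improvement[zero], 0.0)
    # sigma > 0: EI = improvement * Phi(z) + sigma * phi(z), z = improvement / sigma
    pos = ~zero
    z = improvement[pos] / sigma[pos]
    ei[pos] = improvement[pos] * norm.cdf(z) + sigma[pos] * norm.pdf(z)
    return ei


# Three hypothetical candidates: same mean, increasing uncertainty
ei = expected_improvement([0.95, 0.95, 0.95], [0.0, 0.01, 0.05], best=0.96)
print(ei)
```

Every mean is below the best score so far, yet the most uncertain candidate gets the largest EI; this is how EI trades off exploration against exploitation. A real optimizer maximizes this quantity over the hyperparameter space instead of a fixed list.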

## Checking an estimator's hyperparameters

Call `estimator.get_params()`.

In [1]:
from sklearn import svm
from sklearn import linear_model

In [2]:
model = svm.SVC(kernel='linear', C=1)

In [3]:
model.get_params()

{'C': 1,
 'cache_size': 200,
 'class_weight': None,
 'coef0': 0.0,
 'degree': 3,
 'gamma': 0.0,
 'kernel': 'linear',
 'max_iter': -1,
 'probability': False,
 'random_state': None,
 'shrinking': True,
 'tol': 0.001,
 'verbose': False}

In [4]:
model = linear_model.Lasso()

In [5]:
model.get_params()

{'alpha': 1.0,
 'copy_X': True,
 'fit_intercept': True,
 'max_iter': 1000,
 'normalize': False,
 'positive': False,
 'precompute': False,
 'random_state': None,
 'selection': 'cyclic',
 'tol': 0.0001,
 'warm_start': False}
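The keys of `get_params()` are the names a search expects in `param_grid`; the search applies each candidate to a copy of the estimator with `set_params`. A small sketch, assuming a recent scikit-learn (the exact keys differ between versions), that catches a misspelled name in a hypothetical grid before running a search:

```python
from sklearn import svm

model = svm.SVC(kernel='linear', C=1)
valid_names = set(model.get_params())

# Hypothetical grid with a typo: 'c' instead of 'C'
param_grid = {'C': [1, 10], 'c': [0.1]}
unknown = sorted(set(param_grid) - valid_names)
print("Unknown parameter names:", unknown)  # ['c']

# set_params uses the same names and returns the estimator itself
model.set_params(C=10, kernel='rbf')
print(model.C, model.kernel)  # 10 rbf

# An unknown name is rejected with a ValueError
try:
    model.set_params(c=0.1)
except ValueError as exc:
    print("Rejected:", exc)
```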

## [Grid search](http://scikit-learn.org/stable/modules/grid_search.html#exhaustive-grid-search) ([`GridSearchCV`](http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html#sklearn.grid_search.GridSearchCV))

Use [`sklearn.grid_search.GridSearchCV`](http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html#sklearn.grid_search.GridSearchCV).

In [6]:
param_grid = [
    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]},
    {'kernel': ['rbf'], 'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001]},
]
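To see what "try every combination" means for this `param_grid`, the candidates can be enumerated with `ParameterGrid`, which the grid search uses internally. This sketch assumes the current `sklearn.model_selection` module (older releases kept it in `sklearn.grid_search`). A list of dicts is expanded dict by dict, so the grid yields 4 linear settings plus 4 × 2 RBF settings:

```python
from sklearn.model_selection import ParameterGrid

param_grid = [
    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]},
    {'kernel': ['rbf'], 'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001]},
]

candidates = list(ParameterGrid(param_grid))
print(len(candidates))  # 4 + 4 * 2 = 12
for params in candidates[:3]:
    print(params)
```

With `cv=5`, each of the 12 candidates is fit 5 times, so the search below runs 60 fits plus one final refit of the best setting.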

In [7]:
# Original: 
# http://scikit-learn.org/stable/auto_examples/model_selection/grid_search_digits.html#example-model-selection-grid-search-digits-py

from __future__ import print_function

from sklearn import datasets
from sklearn.cross_validation import train_test_split
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.svm import SVC

print(__doc__)

# Loading the Digits dataset
digits = datasets.load_digits()

# To apply a classifier on this data, we need to flatten the image, to
# turn the data into a (samples, feature) matrix:
n_samples = len(digits.images)
X = digits.images.reshape((n_samples, -1))
y = digits.target

# Split the dataset in two equal parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Set the parameters by cross-validation
param_grid = [
    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]},
    {'kernel': ['rbf'], 'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001]},
]

# To compare several metrics, loop over a list such as
# ['precision_weighted', 'recall_weighted', 'f1_weighted'] instead
score = 'accuracy'

print("===========================")
print("# Tuning hyper-parameters for %s" % score)
print()

# This is the key step!
clf = GridSearchCV(SVC(), param_grid, cv=5,
                   scoring=score)
clf.fit(X_train, y_train)

print("Best parameters set found on development set:")
print()
print(clf.best_params_)
print()
print("Grid scores on development set:")
print()
for params, mean_score, scores in clf.grid_scores_:
    print("%0.3f (+/-%0.03f) for %r"
          % (mean_score, scores.std() * 1.96, params))
print()

print("Detailed classification report:")
print()
print("The model is trained on the full development set.")
print("The scores are computed on the full evaluation set.")
print()
y_true, y_pred = y_test, clf.predict(X_test)
print(classification_report(y_true, y_pred))
print()

# Note the problem is too easy: the hyperparameter plateau is too flat and the
# output model is the same for precision and recall with ties in quality.

Automatically created module for IPython interactive environment
# Tuning hyper-parameters for accuracy

Best parameters set found on development set:

{'kernel': 'rbf', 'C': 10, 'gamma': 0.001}

Grid scores on development set:

0.973 (+/-0.014) for {'kernel': 'linear', 'C': 1}
0.973 (+/-0.014) for {'kernel': 'linear', 'C': 10}
0.973 (+/-0.014) for {'kernel': 'linear', 'C': 100}
0.973 (+/-0.014) for {'kernel': 'linear', 'C': 1000}
0.986 (+/-0.020) for {'kernel': 'rbf', 'C': 1, 'gamma': 0.001}
0.958 (+/-0.029) for {'kernel': 'rbf', 'C': 1, 'gamma': 0.0001}
0.987 (+/-0.020) for {'kernel': 'rbf', 'C': 10, 'gamma': 0.001}
0.981 (+/-0.028) for {'kernel': 'rbf', 'C': 10, 'gamma': 0.0001}
0.987 (+/-0.020) for {'kernel': 'rbf', 'C': 100, 'gamma': 0.001}
0.981 (+/-0.026) for {'kernel': 'rbf', 'C': 100, 'gamma': 0.0001}
0.987 (+/-0.020) for {'kernel': 'rbf', 'C': 1000, 'gamma': 0.001}
0.981 (+/-0.026) for {'kernel': 'rbf', 'C': 1000, 'gamma': 0.0001}

Detailed classification report:

The model i
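The `sklearn.grid_search` and `sklearn.cross_validation` modules and the `grid_scores_` attribute used above were deprecated in scikit-learn 0.18 and removed in 0.20. Assuming a current version, here is a minimal sketch of the same digits search that reads the per-candidate scores from `cv_results_` instead, keeping the original mean ± 1.96 × std printout:

```python
from sklearn import datasets
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

digits = datasets.load_digits()
X = digits.images.reshape((len(digits.images), -1))  # flatten 8x8 images
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

param_grid = [
    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]},
    {'kernel': ['rbf'], 'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001]},
]
clf = GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy')
clf.fit(X_train, y_train)
print(clf.best_params_)

# cv_results_ is a dict of parallel arrays, one entry per candidate
results = clf.cv_results_
for mean, std, params in zip(results['mean_test_score'],
                             results['std_test_score'],
                             results['params']):
    print("%0.3f (+/-%0.03f) for %r" % (mean, std * 1.96, params))

# The best candidate is refit on the whole development set, then scored on the test half
print(classification_report(y_test, clf.predict(X_test)))
```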

## [Random search](http://scikit-learn.org/stable/modules/grid_search.html#randomized-parameter-optimization) ([RandomizedSearchCV](http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.RandomizedSearchCV.html#sklearn.grid_search.RandomizedSearchCV))

- Specify the distribution over each parameter with one of the distributions defined in scipy.stats, such as expon, gamma, uniform or randint
- To set the seed, use np.random.seed or np.random.set_state
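A sketch of how such distributions are sampled, assuming a current scikit-learn in which `ParameterSampler` (used internally by `RandomizedSearchCV`) takes a `random_state` and passes it to each distribution's `rvs`, so a fixed seed reproduces the draws without touching NumPy's global state. The distributions and scales below are illustrative choices:

```python
from scipy.stats import expon
from sklearn.model_selection import ParameterSampler

param_dist = {
    'C': expon(scale=100),         # continuous: sampled via .rvs()
    'gamma': expon(scale=0.001),
    'kernel': ['rbf', 'linear'],   # lists are sampled uniformly
}

first = list(ParameterSampler(param_dist, n_iter=5, random_state=0))
second = list(ParameterSampler(param_dist, n_iter=5, random_state=0))
for params in first:
    print(params)
print(first == second)  # same seed, same candidates
```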

In [8]:
# Original: http://scikit-learn.org/stable/auto_examples/model_selection/randomized_search.html#example-model-selection-randomized-search-py

#print(__doc__)

import numpy as np

from time import time
from operator import itemgetter
from scipy.stats import randint as sp_randint

from sklearn.grid_search import GridSearchCV, RandomizedSearchCV
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

# get some data
digits = load_digits()
X, y = digits.data, digits.target

# build a classifier
clf = RandomForestClassifier(n_estimators=20)

# Utility function to report best scores
def report(grid_scores, n_top=3):
    top_scores = sorted(grid_scores, key=itemgetter(1), reverse=True)[:n_top]
    for i, score in enumerate(top_scores):
        print("Model with rank: {0}".format(i + 1))
        print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
              score.mean_validation_score,
              np.std(score.cv_validation_scores)))
        print("Parameters: {0}".format(score.parameters))
        print("")


#################
# Random Search
#################
        
# specify parameters and distributions to sample from
param_dist = {"max_depth": [3, None],
              "max_features": sp_randint(1, 11),
              "min_samples_split": sp_randint(1, 11),
              "min_samples_leaf": sp_randint(1, 11),
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}

# run randomized search
n_iter_search = 20
random_search = RandomizedSearchCV(clf, param_distributions=param_dist,
                                   n_iter=n_iter_search)

start = time()
random_search.fit(X, y)
print("===============")
print("RandomizedSearchCV took %.2f seconds for %d candidates"
      " parameter settings." % ((time() - start), n_iter_search))
report(random_search.grid_scores_)


#################
# Grid Search
#################

# use a full grid over all parameters
param_grid = {"max_depth": [3, None],
              "max_features": [1, 3, 10],
              "min_samples_split": [1, 3, 10],
              "min_samples_leaf": [1, 3, 10],
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}

# run grid search
grid_search = GridSearchCV(clf, param_grid=param_grid)
start = time()
grid_search.fit(X, y)

print("===============")
print("GridSearchCV took %.2f seconds for %d candidate parameter settings."
      % (time() - start, len(grid_search.grid_scores_)))
report(grid_search.grid_scores_)

RandomizedSearchCV took 1.73 seconds for 20 candidates parameter settings.
Model with rank: 1
Mean validation score: 0.932 (std: 0.012)
Parameters: {'bootstrap': False, 'min_samples_leaf': 1, 'min_samples_split': 2, 'criterion': 'entropy', 'max_features': 4, 'max_depth': None}

Model with rank: 2
Mean validation score: 0.922 (std: 0.011)
Parameters: {'bootstrap': False, 'min_samples_leaf': 3, 'min_samples_split': 6, 'criterion': 'gini', 'max_features': 7, 'max_depth': None}

Model with rank: 3
Mean validation score: 0.917 (std: 0.025)
Parameters: {'bootstrap': True, 'min_samples_leaf': 2, 'min_samples_split': 2, 'criterion': 'gini', 'max_features': 9, 'max_depth': None}

GridSearchCV took 18.41 seconds for 216 candidate parameter settings.
Model with rank: 1
Mean validation score: 0.938 (std: 0.012)
Parameters: {'bootstrap': False, 'min_samples_leaf': 1, 'min_samples_split': 1, 'criterion': 'gini', 'max_features': 10, 'max_depth': None}

Model with rank: 2
Mean validation score: 0.931 
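Because `grid_scores_` no longer exists in current scikit-learn, the `report` helper above can be rewritten against `cv_results_`, which already contains a `rank_test_score` column. A minimal sketch under that assumption (`report_cv_results` is a hypothetical name), run on a small random search whose `min_samples_split` starts at 2 because current releases reject 1:

```python
import numpy as np
from scipy.stats import randint as sp_randint
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV


def report_cv_results(results, n_top=3):
    """Print the n_top best candidates from a search's cv_results_."""
    for rank in range(1, n_top + 1):
        # Tied candidates share a rank, so there can be several per rank
        for idx in np.flatnonzero(results['rank_test_score'] == rank):
            print("Model with rank: {0}".format(rank))
            print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
                results['mean_test_score'][idx],
                results['std_test_score'][idx]))
            print("Parameters: {0}".format(results['params'][idx]))
            print("")


X, y = load_digits(return_X_y=True)
param_dist = {"max_depth": [3, None],
              "max_features": sp_randint(1, 11),
              "min_samples_split": sp_randint(2, 11),
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}
search = RandomizedSearchCV(RandomForestClassifier(n_estimators=20),
                            param_distributions=param_dist,
                            n_iter=10, random_state=0)
search.fit(X, y)
report_cv_results(search.cv_results_)
```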

In [9]:
from sklearn.linear_model import Lasso

In [10]:
model = Lasso()

In [11]:
model.get_params()

{'alpha': 1.0,
 'copy_X': True,
 'fit_intercept': True,
 'max_iter': 1000,
 'normalize': False,
 'positive': False,
 'precompute': False,
 'random_state': None,
 'selection': 'cyclic',
 'tol': 0.0001,
 'warm_start': False}

In [12]:
param_grid = {'alpha': [0.1, 0.5, 1.0],
             'normalize': [True, False]}

In [13]:
grid_search = GridSearchCV(Lasso(), param_grid=param_grid, cv=10)

In [14]:
grid_search.fit(X_train, y_train)

GridSearchCV(cv=10, error_score='raise',
       estimator=Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False),
       fit_params={}, iid=True, loss_func=None, n_jobs=1,
       param_grid={'normalize': [True, False], 'alpha': [0.1, 0.5, 1.0]},
       pre_dispatch='2*n_jobs', refit=True, score_func=None, scoring=None,
       verbose=0)

In [15]:
grid_search.best_params_

{'alpha': 0.1, 'normalize': False}

In [16]:
grid_search.grid_scores_

[mean: -0.00904, std: 0.01105, params: {'normalize': True, 'alpha': 0.1},
 mean: 0.57697, std: 0.05760, params: {'normalize': False, 'alpha': 0.1},
 mean: -0.00904, std: 0.01105, params: {'normalize': True, 'alpha': 0.5},
 mean: 0.52793, std: 0.05738, params: {'normalize': False, 'alpha': 0.5},
 mean: -0.00904, std: 0.01105, params: {'normalize': True, 'alpha': 1.0},
 mean: 0.46233, std: 0.05712, params: {'normalize': False, 'alpha': 1.0}]

## [Tips for hyperparameter search](http://scikit-learn.org/stable/modules/grid_search.html#tips-for-parameter-search)

### Tip 1: Choose the objective function carefully

- By default, the metric defined by `estimator.score` is used
    - for classification, `sklearn.metrics.accuracy_score`
    - for regression, `sklearn.metrics.r2_score`
- When the classes are imbalanced, `accuracy_score` is a poor choice.
- Set an alternative metric with the `scoring` parameter of `GridSearchCV` or `RandomizedSearchCV` (e.g. `precision_weighted`, `recall_weighted`, `f1_weighted`)
- For other metrics, see: [The scoring parameter: defining model evaluation rules](http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter)
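A small sketch of why the metric matters, assuming a synthetic imbalanced problem generated with `make_classification` (roughly 90% / 10% classes, chosen for illustration). A baseline that always predicts the majority class looks good on accuracy but not on a class-averaged F1; the search itself only needs the `scoring` string:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Hypothetical imbalanced data: about 90% class 0, 10% class 1
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Always predicting the majority class already looks good on accuracy...
dummy = DummyClassifier(strategy='most_frequent')
acc = cross_val_score(dummy, X, y, cv=5, scoring='accuracy').mean()
# ...but not on F1 averaged over the classes (the minority class scores 0)
f1 = cross_val_score(dummy, X, y, cv=5, scoring='f1_macro').mean()
print("dummy accuracy: %.3f, dummy f1_macro: %.3f" % (acc, f1))

# Pass the metric to the search via `scoring`
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      {'C': [0.01, 0.1, 1, 10]}, cv=5, scoring='f1_weighted')
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The dummy's F1 run may emit an `UndefinedMetricWarning`, since it never predicts the minority class.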

### Tip 2: Use a Pipeline

- When several steps are chained (e.g. preprocessing followed by learning) and you want to search the hyperparameters of all steps jointly, use a Pipeline
- For details, see [Pipeline: chaining estimators](http://scikit-learn.org/stable/modules/pipeline.html#pipeline-chaining-estimators)
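A minimal sketch, assuming a current scikit-learn: scaling and an SVM are chained in a `Pipeline`, and the grid addresses each step's parameters as `<step name>__<parameter>`, so the choice of scaler and the SVM's `C`/`gamma` are searched together. The step names `scale` and `svc` are arbitrary:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

pipe = Pipeline([('scale', StandardScaler()), ('svc', SVC())])
param_grid = {
    'scale': [StandardScaler(), MinMaxScaler()],  # swap a whole step
    'svc__C': [1, 10],
    'svc__gamma': ['scale', 0.001],
}
search = GridSearchCV(pipe, param_grid, cv=3)  # 2 * 2 * 2 = 8 candidates
search.fit(X, y)
print(search.best_params_)
```

Because the scaler is part of the pipeline, it is refit on the training folds only, so statistics from the validation fold never leak into preprocessing.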

### Tip 3: Hold out a test set

- Split off test data with `sklearn.cross_validation.train_test_split` and run CV only on the remaining data ([apparently this is sometimes called the development set](http://scikit-learn.org/stable/modules/grid_search.html#model-selection-development-and-evaluation)); use the test data only for the final evaluation
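A sketch of this workflow with the current module layout (`train_test_split` now lives in `sklearn.model_selection`): the search only ever sees the development split, and the held-out test split is scored once at the end. The SVM settings are illustrative:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
# Hold out 25% as the evaluation (test) set; stratify keeps class ratios equal
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

search = GridSearchCV(SVC(gamma=0.001), {'C': [1, 10, 100]}, cv=5)
search.fit(X_dev, y_dev)  # cross-validation sees only the development set

cv_score = search.best_score_
# refit=True (the default) retrains the best model on all of X_dev;
# the test set is used exactly once, here
test_score = search.score(X_test, y_test)
print("CV accuracy on dev set: %.3f" % cv_score)
print("Accuracy on held-out test set: %.3f" % test_score)
```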

### Tip 4: Parallelize

- `GridSearchCV` and `RandomizedSearchCV` have an `n_jobs` parameter that sets the number of jobs used to run the CV in parallel. `n_jobs=-1` uses all cores.
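A minimal sketch, assuming a current scikit-learn: the same search runs once with the default single job and once with `n_jobs=-1`. Each (candidate, fold) fit becomes a separate job, so the selected parameters are identical and only the wall-clock time changes. On a dataset this small, worker start-up overhead can outweigh the gain, so no particular speed-up is claimed:

```python
from time import time

from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
param_grid = {'C': [1, 10, 100], 'gamma': [0.001, 0.0001]}

best = {}
for n_jobs in (1, -1):  # -1: one worker per available core
    search = GridSearchCV(SVC(), param_grid, cv=5, n_jobs=n_jobs)
    start = time()
    search.fit(X, y)
    best[n_jobs] = search.best_params_
    print("n_jobs=%2d took %.2f s, best: %r"
          % (n_jobs, time() - start, search.best_params_))
```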

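### Tip 5: Make the search robust to failures

- Some parameter combinations cannot be fit. By default, the resulting error stops the whole search...
- Setting `error_score=0` or `error_score=np.nan` assigns that score to the failed fit instead, so the hyperparameter search runs to the end even when errors occur (`np.nan` rather than `np.NaN`, which was removed in NumPy 2.0)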

## [Model-specific parameter search](http://scikit-learn.org/stable/modules/grid_search.html#alternatives-to-brute-force-parameter-search)

- For some models there are more efficient ways to search hyperparameters than brute-force grid search, so another option is to use those.
- For the models that support this, see: [Model specific cross-validation](http://scikit-learn.org/stable/modules/grid_search.html#model-specific-cross-validation)
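For example, `LassoCV` computes the regularization path with warm starts, which is usually much cheaper than refitting `Lasso` once per grid point as in the `GridSearchCV` example above. A sketch assuming a current scikit-learn and the built-in diabetes regression data (chosen for illustration), using the default grid of 100 `alpha` values:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV

X, y = load_diabetes(return_X_y=True)

# By default, 100 alphas along the path are evaluated with 5-fold CV
model = LassoCV(cv=5)
model.fit(X, y)
print("chosen alpha:", model.alpha_)
print("alphas tried:", len(model.alphas_))
print("MSE path shape (alphas, folds):", model.mse_path_.shape)
```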