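## [Assignment Focus]
Learn how to use hyper-parameter search in scikit-learn (sklearn) to find the best hyperparameters.

### Assignment
Pick a different dataset and apply hyper-parameter search to see whether you can find the best combination of hyperparameters.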

In [9]:
import numpy as np
from sklearn import datasets, metrics
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split, KFold, GridSearchCV, RandomizedSearchCV

In [3]:
wine = datasets.load_wine()

In [7]:
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.25, random_state=4)
gbdt = GradientBoostingClassifier()
gbdt.fit(x_train, y_train)
y_predict = gbdt.predict(x_test)
print("accuracy: {}".format(metrics.accuracy_score(y_true=y_test, y_pred=y_predict)))

accuracy: 0.9555555555555556


In [19]:
# Candidate values for the randomized search
n_estimators = list(range(100, 1100, 100))  # 100, 200, ..., 1000
max_features = ["auto", "sqrt"]  # the values recorded in the search output below
max_depth = list(range(5, 21))  # 5, 6, ..., 20
random_grid = dict(n_estimators=n_estimators, max_features=max_features, max_depth=max_depth)
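
To judge how much of the search space a randomized search can cover, it helps to count the candidate combinations first. The sketch below uses `ParameterGrid` from `sklearn.model_selection`, which this notebook does not import itself, and rebuilds the three value lists locally (with the `max_features` values as they appear in the recorded search output) so that it runs on its own. The name `search_space` is only illustrative.

```python
from sklearn.model_selection import ParameterGrid

# Local copy of the search space, rebuilt here so the snippet is self-contained.
search_space = {
    "n_estimators": list(range(100, 1100, 100)),  # 10 values
    "max_features": ["auto", "sqrt"],             # 2 values
    "max_depth": list(range(5, 21)),              # 16 values
}

# ParameterGrid enumerates the full Cartesian product of the value lists;
# it only counts combinations here and does not fit any model.
n_combinations = len(ParameterGrid(search_space))
print(n_combinations)  # 10 * 2 * 16 = 320
```

With `n_iter=100`, as used for the search in this notebook, roughly a third of those 320 combinations are sampled.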

In [21]:
random_search = RandomizedSearchCV(estimator=gbdt, param_distributions=random_grid, n_iter=100, cv=5, verbose=2)

In [22]:
random_search.fit(x_train, y_train)

Fitting 5 folds for each of 100 candidates, totalling 500 fits
[CV] n_estimators=200, max_features=auto, max_depth=15 ...............


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  n_estimators=200, max_features=auto, max_depth=15, total=   0.3s
[CV] n_estimators=200, max_features=auto, max_depth=15 ...............


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s


[CV]  n_estimators=200, max_features=auto, max_depth=15, total=   0.2s
[CV] n_estimators=200, max_features=auto, max_depth=15 ...............
[CV]  n_estimators=200, max_features=auto, max_depth=15, total=   0.2s
[CV] n_estimators=200, max_features=auto, max_depth=15 ...............
[CV]  n_estimators=200, max_features=auto, max_depth=15, total=   0.2s
[CV] n_estimators=200, max_features=auto, max_depth=15 ...............
[CV]  n_estimators=200, max_features=auto, max_depth=15, total=   0.2s
[CV] n_estimators=600, max_features=auto, max_depth=17 ...............
[CV]  n_estimators=600, max_features=auto, max_depth=17, total=   0.7s
[CV] n_estimators=600, max_features=auto, max_depth=17 ...............
[CV]  n_estimators=600, max_features=auto, max_depth=17, total=   0.5s
[CV] n_estimators=600, max_features=auto, max_depth=17 ...............
[CV]  n_estimators=600, max_features=auto, max_depth=17, total=   0.5s
[CV] n_estimators=600, max_features=auto, max_depth=17 ...............
[CV]  

[CV]  n_estimators=100, max_features=auto, max_depth=12, total=   0.1s
[CV] n_estimators=500, max_features=auto, max_depth=13 ...............
[CV]  n_estimators=500, max_features=auto, max_depth=13, total=   0.4s
[CV] n_estimators=500, max_features=auto, max_depth=13 ...............
[CV]  n_estimators=500, max_features=auto, max_depth=13, total=   0.4s
[CV] n_estimators=500, max_features=auto, max_depth=13 ...............
[CV]  n_estimators=500, max_features=auto, max_depth=13, total=   0.4s
[CV] n_estimators=500, max_features=auto, max_depth=13 ...............
[CV]  n_estimators=500, max_features=auto, max_depth=13, total=   0.4s
[CV] n_estimators=500, max_features=auto, max_depth=13 ...............
[CV]  n_estimators=500, max_features=auto, max_depth=13, total=   0.5s
[CV] n_estimators=100, max_features=auto, max_depth=16 ...............
[CV]  n_estimators=100, max_features=auto, max_depth=16, total=   0.0s
[CV] n_estimators=100, max_features=auto, max_depth=16 ...............
[CV]  

[CV] . n_estimators=200, max_features=sqrt, max_depth=9, total=   0.1s
[CV] n_estimators=200, max_features=sqrt, max_depth=9 ................
[CV] . n_estimators=200, max_features=sqrt, max_depth=9, total=   0.2s
[CV] n_estimators=200, max_features=sqrt, max_depth=9 ................
[CV] . n_estimators=200, max_features=sqrt, max_depth=9, total=   0.1s
[CV] n_estimators=400, max_features=sqrt, max_depth=13 ...............
[CV]  n_estimators=400, max_features=sqrt, max_depth=13, total=   0.4s
[CV] n_estimators=400, max_features=sqrt, max_depth=13 ...............
[CV]  n_estimators=400, max_features=sqrt, max_depth=13, total=   0.6s
[CV] n_estimators=400, max_features=sqrt, max_depth=13 ...............
[CV]  n_estimators=400, max_features=sqrt, max_depth=13, total=   0.5s
[CV] n_estimators=400, max_features=sqrt, max_depth=13 ...............
[CV]  n_estimators=400, max_features=sqrt, max_depth=13, total=   0.4s
[CV] n_estimators=400, max_features=sqrt, max_depth=13 ...............
[CV]  

[CV]  n_estimators=100, max_features=sqrt, max_depth=16, total=   0.1s
[CV] n_estimators=100, max_features=sqrt, max_depth=16 ...............
[CV]  n_estimators=100, max_features=sqrt, max_depth=16, total=   0.0s
[CV] n_estimators=100, max_features=sqrt, max_depth=16 ...............
[CV]  n_estimators=100, max_features=sqrt, max_depth=16, total=   0.1s
[CV] n_estimators=100, max_features=sqrt, max_depth=16 ...............
[CV]  n_estimators=100, max_features=sqrt, max_depth=16, total=   0.0s
[CV] n_estimators=100, max_features=sqrt, max_depth=16 ...............
[CV]  n_estimators=100, max_features=sqrt, max_depth=16, total=   0.1s
[CV] n_estimators=1000, max_features=sqrt, max_depth=11 ..............
[CV]  n_estimators=1000, max_features=sqrt, max_depth=11, total=   0.9s
[CV] n_estimators=1000, max_features=sqrt, max_depth=11 ..............
[CV]  n_estimators=1000, max_features=sqrt, max_depth=11, total=   0.9s
[CV] n_estimators=1000, max_features=sqrt, max_depth=11 ..............
[CV]

[CV] . n_estimators=600, max_features=auto, max_depth=8, total=   0.5s
[CV] n_estimators=600, max_features=auto, max_depth=8 ................
[CV] . n_estimators=600, max_features=auto, max_depth=8, total=   0.5s
[CV] n_estimators=200, max_features=auto, max_depth=17 ...............
[CV]  n_estimators=200, max_features=auto, max_depth=17, total=   0.1s
[CV] n_estimators=200, max_features=auto, max_depth=17 ...............
[CV]  n_estimators=200, max_features=auto, max_depth=17, total=   0.1s
[CV] n_estimators=200, max_features=auto, max_depth=17 ...............
[CV]  n_estimators=200, max_features=auto, max_depth=17, total=   0.1s
[CV] n_estimators=200, max_features=auto, max_depth=17 ...............
[CV]  n_estimators=200, max_features=auto, max_depth=17, total=   0.2s
[CV] n_estimators=200, max_features=auto, max_depth=17 ...............
[CV]  n_estimators=200, max_features=auto, max_depth=17, total=   0.2s
[CV] n_estimators=200, max_features=sqrt, max_depth=6 ................
[CV] .

[CV]  n_estimators=800, max_features=auto, max_depth=18, total=   0.7s
[CV] n_estimators=800, max_features=auto, max_depth=18 ...............
[CV]  n_estimators=800, max_features=auto, max_depth=18, total=   0.7s
[CV] n_estimators=800, max_features=auto, max_depth=18 ...............
[CV]  n_estimators=800, max_features=auto, max_depth=18, total=   0.7s
[CV] n_estimators=800, max_features=auto, max_depth=18 ...............
[CV]  n_estimators=800, max_features=auto, max_depth=18, total=   1.1s
[CV] n_estimators=900, max_features=sqrt, max_depth=10 ...............
[CV]  n_estimators=900, max_features=sqrt, max_depth=10, total=   1.2s
[CV] n_estimators=900, max_features=sqrt, max_depth=10 ...............
[CV]  n_estimators=900, max_features=sqrt, max_depth=10, total=   0.8s
[CV] n_estimators=900, max_features=sqrt, max_depth=10 ...............
[CV]  n_estimators=900, max_features=sqrt, max_depth=10, total=   0.8s
[CV] n_estimators=900, max_features=sqrt, max_depth=10 ...............
[CV]  

[CV]  n_estimators=500, max_features=auto, max_depth=19, total=   0.4s
[CV] n_estimators=300, max_features=auto, max_depth=9 ................
[CV] . n_estimators=300, max_features=auto, max_depth=9, total=   0.2s
[CV] n_estimators=300, max_features=auto, max_depth=9 ................
[CV] . n_estimators=300, max_features=auto, max_depth=9, total=   0.2s
[CV] n_estimators=300, max_features=auto, max_depth=9 ................
[CV] . n_estimators=300, max_features=auto, max_depth=9, total=   0.2s
[CV] n_estimators=300, max_features=auto, max_depth=9 ................
[CV] . n_estimators=300, max_features=auto, max_depth=9, total=   0.2s
[CV] n_estimators=300, max_features=auto, max_depth=9 ................
[CV] . n_estimators=300, max_features=auto, max_depth=9, total=   0.2s
[CV] n_estimators=300, max_features=sqrt, max_depth=7 ................
[CV] . n_estimators=300, max_features=sqrt, max_depth=7, total=   0.2s
[CV] n_estimators=300, max_features=sqrt, max_depth=7 ................
[CV] .

[CV]  n_estimators=400, max_features=auto, max_depth=17, total=   0.3s
[CV] n_estimators=400, max_features=auto, max_depth=17 ...............
[CV]  n_estimators=400, max_features=auto, max_depth=17, total=   0.3s
[CV] n_estimators=400, max_features=auto, max_depth=17 ...............
[CV]  n_estimators=400, max_features=auto, max_depth=17, total=   0.6s
[CV] n_estimators=200, max_features=auto, max_depth=14 ...............
[CV]  n_estimators=200, max_features=auto, max_depth=14, total=   0.3s
[CV] n_estimators=200, max_features=auto, max_depth=14 ...............
[CV]  n_estimators=200, max_features=auto, max_depth=14, total=   0.1s
[CV] n_estimators=200, max_features=auto, max_depth=14 ...............
[CV]  n_estimators=200, max_features=auto, max_depth=14, total=   0.2s
[CV] n_estimators=200, max_features=auto, max_depth=14 ...............
[CV]  n_estimators=200, max_features=auto, max_depth=14, total=   0.2s
[CV] n_estimators=200, max_features=auto, max_depth=14 ...............
[CV]  

[CV]  n_estimators=800, max_features=auto, max_depth=16, total=   0.7s
[CV] n_estimators=800, max_features=auto, max_depth=16 ...............
[CV]  n_estimators=800, max_features=auto, max_depth=16, total=   0.7s
[CV] n_estimators=800, max_features=auto, max_depth=16 ...............
[CV]  n_estimators=800, max_features=auto, max_depth=16, total=   0.7s
[CV] n_estimators=800, max_features=auto, max_depth=16 ...............
[CV]  n_estimators=800, max_features=auto, max_depth=16, total=   0.7s
[CV] n_estimators=800, max_features=auto, max_depth=16 ...............
[CV]  n_estimators=800, max_features=auto, max_depth=16, total=   0.7s
[CV] n_estimators=400, max_features=auto, max_depth=16 ...............
[CV]  n_estimators=400, max_features=auto, max_depth=16, total=   0.3s
[CV] n_estimators=400, max_features=auto, max_depth=16 ...............
[CV]  n_estimators=400, max_features=auto, max_depth=16, total=   0.3s
[CV] n_estimators=400, max_features=auto, max_depth=16 ...............
[CV]  

[Parallel(n_jobs=1)]: Done 500 out of 500 | elapsed:  4.7min finished


RandomizedSearchCV(cv=5, error_score='raise-deprecating',
          estimator=GradientBoostingClassifier(criterion='friedman_mse', init=None,
              learning_rate=0.1, loss='deviance', max_depth=3,
              max_features=None, max_leaf_nodes=None,
              min_impurity_decrease=0.0, min_impurity_split=None,
              min_samples_leaf=1, min_sampl...      subsample=1.0, tol=0.0001, validation_fraction=0.1,
              verbose=0, warm_start=False),
          fit_params=None, iid='warn', n_iter=100, n_jobs=None,
          param_distributions={'n_estimators': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000], 'max_features': ['auto', 'sqrt'], 'max_depth': [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]},
          pre_dispatch='2*n_jobs', random_state=None, refit=True,
          return_train_score='warn', scoring=None, verbose=2)

In [23]:
random_search.best_params_

{'n_estimators': 500, 'max_features': 'sqrt', 'max_depth': 19}
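
Besides `best_params_`, a fitted search also exposes `best_score_` and `cv_results_`, which show how close the runner-up candidates were. The following is a minimal sketch rather than a rerun of the search above: it assumes a deliberately small, hypothetical search space (`small_space`), uses `"sqrt"` and `"log2"` because newer scikit-learn releases no longer accept `"auto"` for this estimator, and fixes `random_state` values that the original notebook does not set.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=4
)

# Hypothetical, deliberately small search space so the example runs in seconds.
small_space = {
    "n_estimators": [20, 50],
    "max_features": ["sqrt", "log2"],
    "max_depth": [2, 3, 5],
}
search = RandomizedSearchCV(
    estimator=GradientBoostingClassifier(random_state=0),
    param_distributions=small_space,
    n_iter=5,
    cv=3,
    random_state=0,  # assumption: fixed only to make the sampling repeatable
)
search.fit(x_train, y_train)

# Mean cross-validated accuracy of the winning candidate.
print("best CV accuracy:", search.best_score_)

# List every sampled candidate from best to worst rank.
results = search.cv_results_
for i in np.argsort(results["rank_test_score"]):
    print(
        results["rank_test_score"][i],
        round(results["mean_test_score"][i], 4),
        round(results["std_test_score"][i], 4),
        results["params"][i],
    )
```

If several candidates land within one standard deviation of the best mean score, the simpler one (fewer trees, shallower depth) is often a reasonable choice.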

In [24]:
# Retrain a fresh model with the best parameters found by the search
gbdt_best_params = GradientBoostingClassifier(**random_search.best_params_)

In [30]:
gbdt_best_params.fit(x_train, y_train)
y_pred_best_params = gbdt_best_params.predict(x_test)
print("accuracy:{}".format(metrics.accuracy_score(y_true=y_test,
                                                  y_pred=y_pred_best_params)))

accuracy:0.9777777777777777
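
Because the search was created with `refit=True` (the default, visible in the printed `RandomizedSearchCV` repr), it already holds a model refit on the full training set with the best parameters, so the manual retrain is optional. The following is a minimal sketch of that shortcut. To stay quick and runnable on its own, it assumes a small hypothetical search space and fixed `random_state` values, neither of which comes from the original notebook.

```python
from sklearn import datasets, metrics
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=4
)

# Hypothetical small search space, chosen only to keep the run short.
small_space = {
    "n_estimators": [20, 50],
    "max_features": ["sqrt", "log2"],
    "max_depth": [2, 3, 5],
}
search = RandomizedSearchCV(
    estimator=GradientBoostingClassifier(random_state=0),
    param_distributions=small_space,
    n_iter=5,
    cv=3,
    random_state=0,
)
search.fit(x_train, y_train)

# best_estimator_ is the model refit on all of x_train with best_params_.
best_model = search.best_estimator_
y_pred = best_model.predict(x_test)
test_accuracy = metrics.accuracy_score(y_true=y_test, y_pred=y_pred)
print("accuracy: {}".format(test_accuracy))

# search.score() delegates to the refit estimator, so it reports the same value.
print(search.score(x_test, y_test))
```

Note that the test split here holds 45 samples, so the two accuracies reported in this notebook (0.9556 and 0.9778) correspond to 43 and 44 correct predictions. The improvement is a single sample, and since the models are fit without a fixed `random_state`, it may not reproduce on every run.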
