## [Assignment focus]
Learn how to use the hyper-parameter search tools in Sklearn to find the best hyperparameters.

### Assignment
Pick a different dataset and run a hyper-parameter search on it to see whether you can find the best combination of hyperparameters.

In [14]:
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split, KFold, GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier

In [15]:
# Load the wine dataset (classification)
wine = datasets.load_wine()
# Split into training and test sets (10% held out for testing)
fea_train, fea_test, label_train, label_test = train_test_split(wine.data, wine.target, test_size=0.1, random_state=1)
# Create the baseline model with default hyperparameters
gbc = GradientBoostingClassifier()
# Train the model
gbc.fit(fea_train, label_train)
# Predict on the test set
org_pred = gbc.predict(fea_test)
# Accuracy (accuracy_score expects y_true first, then y_pred)
print(f"Accuracy : {metrics.accuracy_score(label_test, org_pred)}")

Accuracy : 0.9444444444444444


In [16]:
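# Define the hyperparameter combinations to search over
n_estimators = [100, 200, 300]
max_depth = [1, 3, 5]
param_grid = dict(n_estimators=n_estimators, max_depth=max_depth)

# Create the search object from the model gbc and the parameter dictionary param_grid (n_jobs=-1 uses all CPU cores in parallel)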
"""
class GridSearchCV(estimator, param_grid, scoring=None, n_jobs=None, iid='warn', refit=True, cv='warn', verbose=0, pre_dispatch='2*n_jobs', error_score='raise-deprecating', return_train_score=False)
Exhaustive search over specified parameter values for an estimator.

Important members are fit, predict.

GridSearchCV implements a "fit" and a "score" method. It also implements "predict", "predict_proba", "decision_function", "transform" and "inverse_transform" if they are implemented in the estimator used.

The parameters of the estimator used to apply these methods are optimized by cross-validated grid-search over a parameter grid.
"""
para_search = GridSearchCV(gbc, param_grid, n_jobs=-1, verbose=1)
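# Start searching for the best parameters
grid_result = para_search.fit(fea_train, label_train)
# Note: this run uses 3-fold cross-validation, the default in this scikit-learn version (0.22 and later default to 5-fold).
# With 9 parameter combinations, 27 models are trained in total.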

Fitting 3 folds for each of 9 candidates, totalling 27 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  12 out of  27 | elapsed:    0.6s remaining:    0.7s
[Parallel(n_jobs=-1)]: Done  27 out of  27 | elapsed:    1.3s finished
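The "9 candidates, 27 fits" figure in the log can be checked without training anything. The sketch below is a minimal illustration, not part of the original notebook: it expands the same grid with `ParameterGrid` and multiplies by the fold count. The value `n_folds = 3` is an assumption taken from the log above; on a newer scikit-learn the default fold count differs.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the search cell
param_grid = {"n_estimators": [100, 200, 300], "max_depth": [1, 3, 5]}

# ParameterGrid enumerates every combination (the Cartesian product)
candidates = list(ParameterGrid(param_grid))
n_folds = 3  # assumption: matches the "3 folds" reported in the log

total_fits = len(candidates) * n_folds
print(f"{len(candidates)} candidates x {n_folds} folds = {total_fits} fits")
for params in candidates:
    print(params)
```

Because the number of fits grows multiplicatively with each extra hyperparameter, it can be worth counting the grid this way before launching a long search.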


In [17]:
# Print the best cross-validated score and the parameters that produced it
print(f"Best accuracy : {grid_result.best_score_}, by using parameters : {grid_result.best_params_}")

Best accuracy : 0.98125, by using parameters : {'max_depth': 1, 'n_estimators': 300}
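`best_score_` reports only the winning candidate. To see how close the other combinations came, the fitted search object also exposes `cv_results_`. The following is a hedged sketch rather than the notebook's own code: it uses a smaller grid (`small_grid`, a name introduced here), an explicit `cv=3`, and a fixed `random_state`, all chosen only to keep the example quick and repeatable.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

wine = datasets.load_wine()
fea_train, fea_test, label_train, label_test = train_test_split(
    wine.data, wine.target, test_size=0.1, random_state=1
)

# Assumption: a reduced grid, only so the sketch runs quickly
small_grid = {"n_estimators": [50, 100], "max_depth": [1, 3]}
search = GridSearchCV(GradientBoostingClassifier(random_state=0), small_grid, cv=3)
search.fit(fea_train, label_train)

results = search.cv_results_
# rank_test_score is 1 for the best candidate, so sort candidates by rank
order = sorted(range(len(results["params"])), key=lambda i: results["rank_test_score"][i])
for i in order:
    print(
        f"rank {results['rank_test_score'][i]}: "
        f"mean={results['mean_test_score'][i]:.4f} "
        f"std={results['std_test_score'][i]:.4f} "
        f"params={results['params'][i]}"
    )
```

When the mean scores of several candidates differ by less than their standard deviation across folds, the "best" combination is not clearly better than the runners-up.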


In [19]:
# Create a new model using the best parameters found by the search
gbc_bestpara = GradientBoostingClassifier(**grid_result.best_params_)
# Train the model
gbc_bestpara.fit(fea_train, label_train)
# Predict on the test set
bestpara_pred = gbc_bestpara.predict(fea_test)
# Accuracy (accuracy_score expects y_true first, then y_pred)
print(f"Accuracy : {metrics.accuracy_score(label_test, bestpara_pred)}")

Accuracy : 1.0
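Two notes on this last step. First, the test split holds only 18 samples (10% of the 178 wine samples), so the jump from 0.944 to 1.0 corresponds to a single additional correct prediction. Second, with the default `refit=True`, `GridSearchCV` already retrains a model with the best parameters on the full training set and exposes it as `best_estimator_`, so building a new classifier by hand is optional. The sketch below illustrates this under the same assumptions as before (a reduced grid named `small_grid`, `cv=3`, and a fixed `random_state`, none of which come from the original notebook).

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

wine = datasets.load_wine()
fea_train, fea_test, label_train, label_test = train_test_split(
    wine.data, wine.target, test_size=0.1, random_state=1
)

# Assumption: a reduced grid, only so the sketch runs quickly
small_grid = {"n_estimators": [50, 100], "max_depth": [1, 3]}
search = GridSearchCV(GradientBoostingClassifier(random_state=0), small_grid, cv=3)
search.fit(fea_train, label_train)

# refit=True (the default) means best_estimator_ is already trained
# on all of fea_train with the best parameters
best_model = search.best_estimator_
test_pred = best_model.predict(fea_test)
test_acc = accuracy_score(label_test, test_pred)

print(f"Test samples : {len(label_test)}")
print(f"Best params  : {search.best_params_}")
print(f"Test accuracy: {test_acc}")
```

Calling `search.predict(fea_test)` gives the same predictions, since the search object delegates to `best_estimator_`.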
