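## [Assignment Focus]
Learn how to use scikit-learn's hyper-parameter search to find the best hyperparameters.

### Assignment
Pick a different dataset and use hyper-parameter search to see whether a better combination of hyperparameters can be found.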

In [6]:
from sklearn import datasets, metrics
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split, GridSearchCV

# Load the handwritten digits dataset (8x8 images, 10 classes)
digits = datasets.load_digits()
X = digits.data
y = digits.target

# Hold out 20% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Baseline: gradient boosting with default hyperparameters
model_1 = GradientBoostingClassifier()
model_1.fit(X_train, y_train)
predict_train = model_1.predict(X_train)
predict_test = model_1.predict(X_test)

acc_train = metrics.accuracy_score(y_train, predict_train)
print('acc_train', acc_train)
acc_test = metrics.accuracy_score(y_test, predict_test)
print('acc_test', acc_test)

acc_train 1.0
acc_test 0.9638888888888889
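
The grid searches in this notebook report `best_score_`, which is a mean cross-validated accuracy on the training split, not the held-out `acc_test` printed above. To compare like with like, a cross-validated baseline can be computed first. The following is a minimal sketch, assuming a reduced `n_estimators=30` and `random_state=0` (both chosen here for speed and repeatability, not taken from the notebook):

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Assumption: a smaller, seeded model so the sketch runs quickly.
# Remove n_estimators=30 to evaluate the default configuration instead.
baseline = GradientBoostingClassifier(n_estimators=30, random_state=0)

# 3-fold CV accuracy on the training split only (same cv and scoring
# that the GridSearchCV calls in this notebook use).
cv_scores = cross_val_score(baseline, X_train, y_train, cv=3, scoring="accuracy")
print("cv accuracy per fold:", cv_scores)
print("mean cv accuracy:", cv_scores.mean())
```

The mean of `cv_scores` is the number to set against a later `best_score_`; the test set stays untouched until the final evaluation.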


In [13]:
### Tune learning_rate, n_estimators ###
learning_rate = [0.005, 0.01,0.05,0.1]
n_estimators = [30,50,70,100,130,150,180,200]
#max_depth = [2 ,3 ,5 ,7 ,10 , 12 ,15]
#min_samples_split = [2,5,8,11,14,17,20,23,26,29,32,35,38,41,44,47,50]
#min_samples_leaf = [1,3,5,7,9,11,13,15,17,19]
#max_features = []
#subsample = [0.6, 0.7, 0.8, 0.9, 1]
param_grid = dict(learning_rate=learning_rate, n_estimators=n_estimators)
#param_grid = dict(learning_rate=learning_rate, n_estimators=n_estimators, max_depth=max_depth, min_samples_split=min_samples_split, min_samples_leaf=min_samples_leaf, subsample=subsample)
model_1_grids = GridSearchCV(model_1, param_grid , scoring = 'accuracy', n_jobs = -1, verbose = 1 , cv = 3)
model_1_grids.fit(X_train, y_train)
print('best_params', model_1_grids.best_params_)
print('best_score',model_1_grids.best_score_)

Fitting 3 folds for each of 32 candidates, totalling 96 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:   27.5s
[Parallel(n_jobs=-1)]: Done  96 out of  96 | elapsed:  1.2min finished


best_params {'learning_rate': 0.1, 'n_estimators': 200}
best_score 0.9582463465553236
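
`best_params_` shows only the winner. The best `n_estimators` here sits on the edge of the grid (200), so it can help to look at how every candidate scored before moving on. A minimal sketch of reading `cv_results_`, assuming a deliberately tiny grid and a seeded model so it runs in seconds (the values below are illustrative, not the grid used above):

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Assumption: small illustrative grid, not the one from the notebook.
small_grid = {"learning_rate": [0.05, 0.1], "n_estimators": [10, 20]}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    small_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X_train, y_train)

# cv_results_ is a dict of arrays, one entry per candidate.
results = search.cv_results_
order = results["rank_test_score"].argsort()  # best candidate first
ranked = [
    (results["params"][i],
     float(results["mean_test_score"][i]),
     float(results["std_test_score"][i]))
    for i in order
]
for params, mean, std in ranked:
    print(f"{params}  mean={mean:.4f}  std={std:.4f}")
```

If the top candidates differ by less than the fold-to-fold standard deviation, the ranking between them is not very informative, and a best value on the boundary of the grid suggests extending the grid in that direction.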


In [14]:
### Tune max_depth, min_samples_split ###

# Fix the best learning_rate and n_estimators found by the previous search
model_2 = GradientBoostingClassifier(learning_rate = 0.1, n_estimators = 200)
max_depth = [2 ,3 ,5 ,7 ,10 , 12 ,15]
min_samples_split = [2,5,11,23,32,41,50]
param_grid = dict(max_depth=max_depth, min_samples_split=min_samples_split)
model_1_grids_2 = GridSearchCV(model_2, param_grid , scoring = 'accuracy', n_jobs = -1, verbose = 1 , cv = 3)
model_1_grids_2.fit(X_train, y_train)
print('best_params', model_1_grids_2.best_params_)
print('best_score',model_1_grids_2.best_score_)

Fitting 3 folds for each of 49 candidates, totalling 147 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:   32.8s
[Parallel(n_jobs=-1)]: Done 147 out of 147 | elapsed:  2.6min finished


best_params {'max_depth': 3, 'min_samples_split': 2}
best_score 0.9575504523312457


In [15]:
### Tune min_samples_leaf, subsample ###

# Fix the hyperparameters chosen by the two previous searches
model_3 = GradientBoostingClassifier(learning_rate = 0.1, n_estimators = 200, max_depth = 3, min_samples_split = 2)
min_samples_leaf = [1,3,5,9,13,15,19]
subsample = [0.6, 0.7, 0.8, 0.9, 0.99]
param_grid = dict(min_samples_leaf=min_samples_leaf, subsample=subsample)
model_1_grids_3 = GridSearchCV(model_3, param_grid , scoring = 'accuracy', n_jobs = -1, verbose = 1 , cv = 3)
model_1_grids_3.fit(X_train, y_train)
print('best_params', model_1_grids_3.best_params_)
print('best_score',model_1_grids_3.best_score_)

Fitting 3 folds for each of 35 candidates, totalling 105 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:   30.3s
[Parallel(n_jobs=-1)]: Done 105 out of 105 | elapsed:  1.4min finished


best_params {'min_samples_leaf': 15, 'subsample': 0.7}
best_score 0.9679888656924147
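
The three searches above tune two hyperparameters at a time and freeze the earlier winners, which keeps each grid small but cannot see interactions between stages (for example, between `learning_rate` and `subsample`). One alternative is to sample from the joint space with a fixed budget using `RandomizedSearchCV`. This is a sketch of that swapped-in technique, not the notebook's method; the candidate values and `n_iter=4` are assumptions kept small for speed:

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Assumption: illustrative candidate values covering all stages at once.
param_distributions = {
    "learning_rate": [0.05, 0.1],
    "n_estimators": [10, 20],
    "max_depth": [2, 3],
    "min_samples_leaf": [1, 5, 15],
    "subsample": [0.7, 0.8, 0.9],
}

random_search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions,
    n_iter=4,          # number of sampled combinations (the search budget)
    scoring="accuracy",
    cv=3,
    random_state=0,    # makes the sampled combinations repeatable
)
random_search.fit(X_train, y_train)
print("best_params", random_search.best_params_)
print("best_score", random_search.best_score_)
```

With a realistic budget, `n_iter` would be raised and the value lists widened; the cost grows with `n_iter * cv` rather than with the product of all list lengths.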


In [16]:
#### Test accuracy improved by about 1.4 percentage points (0.9639 -> 0.9778)

model_4 = GradientBoostingClassifier(learning_rate = 0.1, n_estimators = 200, max_depth = 3, min_samples_split =2, min_samples_leaf=15, subsample =0.7)
model_4.fit(X_train, y_train)
predict_train = model_4.predict(X_train)
predict_test = model_4.predict(X_test)

acc_train = metrics.accuracy_score(y_train , predict_train)
print('acc_train',acc_train)
acc_test = metrics.accuracy_score(y_test , predict_test)
print('acc_test',acc_test)

acc_train 1.0
acc_test 0.9777777777777777
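
The final cell rebuilds the model by copying the winning values by hand. With the default `refit=True`, `GridSearchCV` already retrains the best configuration on the full training split and exposes it as `best_estimator_`, so the copy step can be skipped. A minimal sketch, assuming a small seeded model and a two-by-two grid chosen here only to keep the run short:

```python
from sklearn import datasets, metrics
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Assumption: reduced model and grid for a quick illustration.
search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=20, random_state=0),
    {"min_samples_leaf": [1, 15], "subsample": [0.7, 1.0]},
    scoring="accuracy",
    cv=3,
)
search.fit(X_train, y_train)

# best_estimator_ is already fitted on all of X_train with best_params_.
best_model = search.best_estimator_
acc_test = metrics.accuracy_score(y_test, best_model.predict(X_test))
print("best_params", search.best_params_)
print("acc_test", acc_test)
```

`search.score(X_test, y_test)` returns the same accuracy, since it delegates to the refitted estimator with the configured scorer.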
