## [Assignment Focus]
Learn how to use hyper-parameter search in Sklearn to find the best hyperparameters.

### Assignment
Use a different dataset and apply hyper-parameter search to see whether you can find the best combination of hyperparameters.

In [1]:
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score

In [2]:
digits = datasets.load_digits()
x = digits.data
y = digits.target

In [3]:
# Baseline: random forest with hand-picked hyperparameters
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.2, random_state=42)
# No random_state on the model, so the scores below can vary slightly between runs
clf = RandomForestClassifier(n_estimators=100, max_depth=3)
clf.fit(train_x, train_y)
pred_y = clf.predict(test_x)
print(f'score: {accuracy_score(test_y, pred_y)}')

score: 0.8833333333333333


In [4]:
# Try every combination in param_grid with 3-fold cross-validation on the training set
param_grid = dict(n_estimators=[100, 200, 300], max_depth=[3, 5, 7, 9])
# cv=3 matches the output below; scikit-learn 0.22+ defaults to 5 folds
gridSearch = GridSearchCV(clf, param_grid, cv=3, n_jobs=-1, verbose=1)
grid_result = gridSearch.fit(train_x, train_y)
# best_score_ is the mean cross-validated accuracy of the best candidate, not the test accuracy
print(f'best score: {grid_result.best_score_}\nbest param: {grid_result.best_params_}')

Fitting 3 folds for each of 12 candidates, totalling 36 fits

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  36 out of  36 | elapsed:    3.8s finished

best score: 0.9672929714683368
best param: {'max_depth': 9, 'n_estimators': 300}
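The log line "Fitting 3 folds for each of 12 candidates, totalling 36 fits" describes what GridSearchCV does: it scores every combination in `param_grid` with k-fold cross-validation on the training set and keeps the one with the highest mean score. The loop below is a minimal hand-rolled sketch of that idea, not scikit-learn's internal implementation; it assumes a smaller grid and `random_state=0` so it runs quickly and reproducibly, so its numbers will not match the output above.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid, cross_val_score, train_test_split

digits = datasets.load_digits()
train_x, test_x, train_y, test_y = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42)

# Illustrative grid, smaller than the notebook's so the sketch runs fast
param_grid = dict(n_estimators=[50, 100], max_depth=[3, 9])

best_score, best_params = -1.0, None
for params in ParameterGrid(param_grid):  # every combination: 2 x 2 = 4 candidates
    # Fixed random_state (an assumption) so each candidate is scored reproducibly
    model = RandomForestClassifier(random_state=0, **params)
    # Mean accuracy over 3 folds of the training set only; test_x is never touched
    score = cross_val_score(model, train_x, train_y, cv=3).mean()
    if score > best_score:
        best_score, best_params = score, params

print(f'best score: {best_score}\nbest param: {best_params}')
```

Keeping the test set out of the search is what makes the later test-set score an honest estimate.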


In [5]:
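clf = grid_result.best_estimator_
# With refit=True (the default), best_estimator_ is already trained on all of train_x;
# fitting it again is redundant and, without a random_state, yields a slightly different forest
clf.fit(train_x, train_y)
pred_y = clf.predict(test_x)
print(f'score: {accuracy_score(test_y, pred_y)}')

score: 0.9777777777777777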
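Because GridSearchCV refits the winning combination on the whole training set by default (`refit=True`), the fitted search object can predict on the test set directly, with no extra `fit` call. A minimal sketch, assuming a reduced grid and `random_state=0` (neither is the notebook's setting):

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

digits = datasets.load_digits()
train_x, test_x, train_y, test_y = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42)

grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    dict(n_estimators=[50, 100], max_depth=[5, 9]), cv=3)
# refit=True (default): the best parameters are retrained on all of train_x
grid.fit(train_x, train_y)

# The search object delegates predict() to best_estimator_, so no second fit is needed
pred_y = grid.predict(test_x)
test_acc = accuracy_score(test_y, pred_y)
print(f'cv score: {grid.best_score_:.4f}  test score: {test_acc:.4f}')
```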


In [6]:
# Gradient boosting on the same train/test split
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.2, random_state=42)
clf = GradientBoostingClassifier(n_estimators=10, max_depth=3)
clf.fit(train_x, train_y)
pred_y = clf.predict(test_x)
print(f'score: {accuracy_score(test_y, pred_y)}')

score: 0.9222222222222223


In [7]:
# Search the gradient-boosting grid with 3-fold cross-validation on the training set
param_grid = dict(n_estimators=[10, 20, 30], max_depth=[3, 5, 7, 9])
# cv=3 matches the output below; scikit-learn 0.22+ defaults to 5 folds
gridSearch = GridSearchCV(clf, param_grid, cv=3, n_jobs=-1, verbose=1)
grid_result = gridSearch.fit(train_x, train_y)
print(f'best score: {grid_result.best_score_}\nbest param: {grid_result.best_params_}')

Fitting 3 folds for each of 12 candidates, totalling 36 fits

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  36 out of  36 | elapsed:    7.0s finished

best score: 0.9276270006958942
best param: {'max_depth': 3, 'n_estimators': 30}
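A single `best_score_` hides how close the other candidates were. `cv_results_` records the mean and standard deviation of the cross-validated accuracy for every combination, and `rank_test_score` orders them. The sketch below prints them by rank; it assumes a reduced grid and `random_state=0` so it runs quickly and reproducibly, so its numbers will not match the output above.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

digits = datasets.load_digits()
train_x, test_x, train_y, test_y = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42)

grid = GridSearchCV(GradientBoostingClassifier(random_state=0),
                    dict(n_estimators=[10, 20], max_depth=[2, 3]), cv=3)
grid.fit(train_x, train_y)

res = grid.cv_results_
# Visit candidates from rank 1 (best mean CV accuracy) downwards
for i in np.argsort(res['rank_test_score']):
    print(res['rank_test_score'][i], res['params'][i],
          f"{res['mean_test_score'][i]:.4f} +/- {res['std_test_score'][i]:.4f}")
```

If the top candidates differ by less than their standard deviations, the "best" choice among them is not clearly better than the runners-up.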


In [8]:
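clf = grid_result.best_estimator_
# Redundant refit: with refit=True (the default), best_estimator_ is already trained on all of train_x
clf.fit(train_x, train_y)
pred_y = clf.predict(test_x)
print(f'score: {accuracy_score(test_y, pred_y)}')

score: 0.9472222222222222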
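In both searches the selected values sit on the boundary of the grid: for the random forest, `max_depth=9` and `n_estimators=300` are the largest values tried; for gradient boosting, `n_estimators=30` is the largest and `max_depth=3` the smallest. When that happens, the best setting may lie outside the range searched. The helper below, `params_on_edge`, is a hypothetical name used only for illustration; it flags parameters whose best value is the smallest or largest candidate, using the grids and results printed above.

```python
def params_on_edge(param_grid, best_params):
    """Hypothetical helper: names of parameters whose best value is the min or max candidate."""
    edges = []
    for name, values in param_grid.items():
        if best_params[name] in (min(values), max(values)):
            edges.append(name)
    return edges

# Grids and best_params_ copied from the two searches above
rf_grid = dict(n_estimators=[100, 200, 300], max_depth=[3, 5, 7, 9])
gb_grid = dict(n_estimators=[10, 20, 30], max_depth=[3, 5, 7, 9])
print(params_on_edge(rf_grid, {'max_depth': 9, 'n_estimators': 300}))
print(params_on_edge(gb_grid, {'max_depth': 3, 'n_estimators': 30}))
```

A flagged parameter suggests widening its range (for example, larger `n_estimators`) and searching again.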
