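## [Assignment Focus]
Learn how to use the hyper-parameter search tools in scikit-learn to find the best hyperparameters.

### Assignment
Pick a different dataset and apply hyper-parameter search to see whether you can find the best combination of hyperparameters.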

In [1]:
from sklearn import datasets,metrics
from sklearn.model_selection import train_test_split, KFold, GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor

In [2]:
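# Load data
# Note: digits is a classification dataset; its labels (0-9) are treated
# as a numeric regression target here
digits = datasets.load_digits()
# Split into train/test sets
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.25, random_state=42)
# Build the model
clf = GradientBoostingRegressor(random_state=7)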

In [3]:
# Train and predict
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
# mean_squared_error expects (y_true, y_pred)
print(metrics.mean_squared_error(y_test, y_pred))

1.4109841521829944


In [4]:
# Define the hyperparameter combinations to try
n_estimators = [100, 200, 300]
max_depth = [1, 3, 5]
param_grid = dict(n_estimators=n_estimators, max_depth=max_depth)
# Build the search object from the model and the parameter grid
# (n_jobs=-1 uses all CPU cores in parallel)
grid_search = GridSearchCV(clf, param_grid, scoring="neg_mean_squared_error", n_jobs=-1, verbose=1)
# Run the search for the best parameters
grid_result = grid_search.fit(x_train, y_train)


Fitting 3 folds for each of 9 candidates, totalling 27 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  27 out of  27 | elapsed:   10.9s finished
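
The counts in the log above ("9 candidates", "27 fits") follow directly from the grid: 3 values of `n_estimators` times 3 values of `max_depth`, each evaluated on 3 folds. A minimal sketch of that arithmetic using scikit-learn's `ParameterGrid` helper is shown below; the fold count of 3 is an assumption taken from the log, since the default `cv` differs between scikit-learn versions.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the cell above
param_grid = dict(n_estimators=[100, 200, 300], max_depth=[1, 3, 5])

# ParameterGrid expands the dictionary into every combination of values
candidates = list(ParameterGrid(param_grid))

# Assumption: 3 folds, matching the "Fitting 3 folds" line in the log above
n_folds = 3

print(len(candidates))            # number of candidate combinations: 9
print(len(candidates) * n_folds)  # total number of model fits: 27

# Peek at a few of the combinations the search will try
for params in candidates[:3]:
    print(params)
```

Every extra value added to a parameter list multiplies the number of fits, which is why the search time grows quickly with the size of the grid.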


In [5]:
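# Print the best score and the best parameters
# (the score is the negative MSE, so values closer to 0 are better)
print("Best score (neg. MSE) : %f using %s" % (grid_result.best_score_, grid_result.best_params_))

Best score (neg. MSE) : -1.314280 using {'max_depth': 5, 'n_estimators': 300}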


In [6]:
grid_result.best_params_

{'max_depth': 5, 'n_estimators': 300}
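
Besides `best_params_`, a fitted `GridSearchCV` also exposes the per-candidate scores in `cv_results_` and, with the default `refit=True`, a model already retrained on the full training data in `best_estimator_`. The sketch below illustrates this on a small synthetic dataset; the dataset, the tiny grid, and names such as `search` and `best_cv_mse` are assumptions chosen only to keep the example fast, not part of the original notebook.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Small synthetic regression problem (assumption: used only to keep the sketch fast)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

search = GridSearchCV(
    GradientBoostingRegressor(random_state=7),
    param_grid={"n_estimators": [10, 20], "max_depth": [1, 2]},
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X, y)

# One entry per candidate: its parameters and its mean cross-validated score
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(params, score)

# best_score_ is a negative MSE; flip the sign to read it as an error
best_cv_mse = -search.best_score_

# With refit=True (the default), best_estimator_ is already retrained on all of X, y
best_model = search.best_estimator_
print(search.best_params_, best_cv_mse)
print(best_model.predict(X[:3]))
```

Looking at all rows of `cv_results_` rather than only the winner helps judge whether the best combination sits on the edge of the grid, which would suggest widening the search range.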

In [7]:
# Retrain the model using the best parameters found by the search
clf_Best = GradientBoostingRegressor(n_estimators=grid_result.best_params_['n_estimators'],
                                     max_depth=grid_result.best_params_['max_depth'])
clf_Best.fit(x_train, y_train)
y_pred = clf_Best.predict(x_test)
# mean_squared_error expects (y_true, y_pred)
print(metrics.mean_squared_error(y_test, y_pred))

0.9794535548428012
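
Grid search tries every combination, which becomes expensive as the grid grows. scikit-learn also offers `RandomizedSearchCV`, which evaluates only a fixed number of sampled combinations. The sketch below is one possible way to use it, not the approach taken in this notebook; the synthetic dataset, the small parameter lists, and `n_iter=4` are assumptions picked so the example runs quickly.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Small synthetic regression problem (assumption: used only to keep the sketch fast)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Candidate values to sample from (9 possible combinations in total)
param_distributions = {"n_estimators": [10, 20, 30], "max_depth": [1, 2, 3]}

random_search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=7),
    param_distributions=param_distributions,
    n_iter=4,                          # evaluate only 4 of the 9 combinations
    scoring="neg_mean_squared_error",
    cv=3,
    random_state=0,                    # make the sampled combinations reproducible
)
random_search.fit(X, y)

# Number of combinations actually evaluated
print(len(random_search.cv_results_["params"]))
# Best sampled combination and its cross-validated MSE
print(random_search.best_params_, -random_search.best_score_)
```

The trade-off is that a random search may miss the single best grid point, but it caps the cost at `n_iter` times the number of folds regardless of how large the search space is.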
