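## [Assignment Focus]
Learn how to use hyper-parameter search in scikit-learn to find the best hyperparameters.

### Assignment
Using a different dataset, apply hyper-parameter search and see whether you can find the best combination of hyperparameters.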

In [1]:
from sklearn import datasets, metrics
from sklearn.model_selection import KFold, train_test_split, GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor

In [2]:
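# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires an older release.
boston = datasets.load_boston()

x_train, x_test, y_train, y_test = train_test_split(
    boston.data, boston.target, test_size=0.25, random_state=42)

clf = GradientBoostingRegressor(random_state=7)
# Train a baseline model with the default hyperparameters
clf.fit(x_train, y_train)
pred = clf.predict(x_test)

# mean_squared_error expects (y_true, y_pred)
print('MSE:', metrics.mean_squared_error(y_test, pred))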

MSE: 8.379385957699272
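Since the assignment asks for a different dataset, and `load_boston` is unavailable in recent scikit-learn releases, the same baseline step can be sketched on another bundled dataset. The choice of `load_diabetes` and the variable names below are assumptions made for illustration, not part of the original notebook.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumption: load_diabetes stands in for the Boston data; it ships with scikit-learn.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Baseline model with default hyperparameters, same seed as the notebook
baseline = GradientBoostingRegressor(random_state=7)
baseline.fit(X_train, y_train)

baseline_mse = mean_squared_error(y_test, baseline.predict(X_test))
print("Baseline MSE:", baseline_mse)
```

The MSE here is on the scale of the diabetes target, so it is not comparable with the Boston number above.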


In [23]:
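# Define the hyperparameter combinations to try
n_estimators = [100, 200, 300, 250]
max_depth = [1, 3, 5, 2]
param_grid = dict(n_estimators=n_estimators, max_depth=max_depth)
# Set up the search object; cv=3 is passed explicitly because the default
# changed from 3 to 5 folds in scikit-learn 0.22
grid_search = GridSearchCV(clf, param_grid, scoring='neg_mean_squared_error',
                           cv=3, n_jobs=-1, verbose=1)
# Run the search to find the best parameters
grid_result = grid_search.fit(x_train, y_train)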

Fitting 3 folds for each of 20 candidates, totalling 60 fits


[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:    3.3s
[Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed:    4.0s finished
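The number of fits reported in the log is the number of grid candidates multiplied by the number of folds. As a rough sketch, `ParameterGrid` can be used to count the candidates before running anything. With the lists shown in the cell above the product is 4 x 4 = 16, whereas the logged run reports 20 candidates, so the log presumably comes from an earlier version of the grid. The sketch also passes an explicit `KFold` splitter (imported in the first cell but otherwise unused); the smaller grid and the `load_diabetes` data are assumptions chosen to keep the example quick.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold, ParameterGrid

# The grid as listed in the cell above
param_grid = {"n_estimators": [100, 200, 300, 250], "max_depth": [1, 3, 5, 2]}
n_candidates = len(ParameterGrid(param_grid))
n_splits = 3
print(n_candidates, "candidates x", n_splits, "folds =",
      n_candidates * n_splits, "fits")

# Hypothetical smaller grid so the search itself finishes quickly
small_grid = {"n_estimators": [50, 100], "max_depth": [1, 3]}

# Explicit, shuffled, seeded folds make the search reproducible
cv = KFold(n_splits=n_splits, shuffle=True, random_state=42)

X, y = load_diabetes(return_X_y=True)
search = GridSearchCV(
    GradientBoostingRegressor(random_state=7),
    small_grid,
    scoring="neg_mean_squared_error",
    cv=cv,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```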


In [24]:
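# Print the best score and parameters. With scoring='neg_mean_squared_error'
# the score is the negated MSE (closer to 0 is better), not an accuracy.
print("best score (neg MSE):%f using %s" % (grid_result.best_score_, grid_result.best_params_))

best score (neg MSE):-13.308562 using {'max_depth': 3, 'n_estimators': 300}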
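The score printed above is negative because scikit-learn scorers follow a "higher is better" convention, so the MSE is negated. A small sketch of converting it back, using the value reported by the run above (the variable names are illustrative):

```python
import math

best_score = -13.308562      # best_score_ as reported by the run above
cv_mse = -best_score         # flip the sign to recover the mean CV MSE
cv_rmse = math.sqrt(cv_mse)  # RMSE is in the same units as the target

print(f"CV MSE: {cv_mse:.6f}, CV RMSE: {cv_rmse:.3f}")
```

This cross-validated MSE is averaged over folds of the training split, so it is not directly comparable with an MSE measured on the held-out test split.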


In [11]:
grid_result.best_params_

{'max_depth': 3, 'n_estimators': 300}

In [14]:
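# Rebuild the model with the best parameters.
# Note: random_state is not set here (the baseline used random_state=7),
# so the result may vary between runs.
clf_best = GradientBoostingRegressor(n_estimators=grid_result.best_params_['n_estimators'],
                                     max_depth=grid_result.best_params_['max_depth'])
clf_best.fit(x_train, y_train)

y_pred = clf_best.predict(x_test)
# mean_squared_error expects (y_true, y_pred)
print("mse:", metrics.mean_squared_error(y_test, y_pred))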

mse: 7.934048422227054
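Rebuilding the model by hand is not strictly necessary: with the default `refit=True`, `GridSearchCV` refits the best candidate on the full training data and exposes it as `best_estimator_`, keeping the `random_state` of the estimator that was passed in. A minimal sketch, assuming the bundled `load_diabetes` data and a small hypothetical grid rather than the notebook's own:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: load_diabetes replaces the Boston data used in the notebook
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

search = GridSearchCV(
    GradientBoostingRegressor(random_state=7),
    {"n_estimators": [50, 100], "max_depth": [1, 3]},  # hypothetical grid
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)

# Already refit on all of X_train with the best parameters
best_model = search.best_estimator_
tuned_mse = mean_squared_error(y_test, best_model.predict(X_test))
print(search.best_params_, tuned_mse)
```

Because the seed carries over, a comparison against a baseline trained with the same `random_state` reflects the hyperparameters rather than a change of seed.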
