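## [Assignment focus]
Learn how to use the hyper-parameter search tools in scikit-learn to find the best hyperparameters.

### Assignment
Pick a different dataset and apply hyper-parameter search to see whether you can find the best combination of hyperparameters.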

In [21]:
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split, KFold, GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import GradientBoostingRegressor

In [22]:
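# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release.
boston = datasets.load_boston()
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.25, random_state=42)
# Baseline model with default hyperparameters and a fixed seed
reg = GradientBoostingRegressor(random_state=7)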

In [23]:
# Baseline: fit with default hyperparameters and report the test-set MSE
reg.fit(x_train, y_train)
y_pred = reg.predict(x_test)
print(metrics.mean_squared_error(y_test, y_pred))

8.913775994322064


In [24]:
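# Grid search: evaluate all 5 x 5 = 25 combinations of n_estimators and max_depth
n_estimators = [50, 100, 150, 200, 250]
max_depth = [1, 3, 5, 7, 9]
param_grid = dict(n_estimators=n_estimators, max_depth=max_depth)

# Scores are negated MSE, so values closer to 0 are better.
# cv is left at its default, which is 3 folds in the scikit-learn version used for this run.
grid_search = GridSearchCV(reg, param_grid, scoring="neg_mean_squared_error", n_jobs=1, verbose=1)

grid_result = grid_search.fit(x_train, y_train)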


Fitting 3 folds for each of 25 candidates, totalling 75 fits


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  75 out of  75 | elapsed:    6.4s finished


In [25]:
# best_score_ is the mean cross-validated negative MSE, not an accuracy
print("Best score (negative MSE): %f using %s" % (grid_result.best_score_, grid_result.best_params_))

Best score (negative MSE): -13.118704 using {'max_depth': 3, 'n_estimators': 200}
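
Since the search above uses `scoring="neg_mean_squared_error"`, `best_score_` is a negated mean squared error averaged over the folds, so flipping the sign gives a number comparable to `metrics.mean_squared_error`. The following is a minimal sketch of that conversion and of ranking every candidate through `cv_results_`. It assumes the diabetes dataset bundled with scikit-learn and a smaller 2 x 2 grid as stand-ins, neither of which comes from the original notebook.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Stand-in data (assumption): the diabetes regression set bundled with scikit-learn.
X, y = load_diabetes(return_X_y=True)

# A deliberately small grid (assumption) so the sketch runs quickly.
param_grid = {"n_estimators": [50, 100], "max_depth": [1, 3]}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=7),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X, y)

# best_score_ is the mean negated MSE across folds; flip the sign to read it as an MSE.
best_mse = -search.best_score_
best_rmse = np.sqrt(best_mse)
print("CV MSE: %.3f, CV RMSE: %.3f, params: %s" % (best_mse, best_rmse, search.best_params_))

# List all candidates from best to worst by mean cross-validated score.
results = search.cv_results_
order = np.argsort(results["rank_test_score"])
for i in order:
    print(results["params"][i], -results["mean_test_score"][i])
```

Note that this cross-validated MSE is computed on held-out folds of the training data, so it is not expected to match the test-set MSE exactly.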


In [26]:
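# Refit a fresh model on the full training set with the best parameters from the grid search.
# Note: random_state is not set here (unlike reg above), so the test MSE can vary between runs.
reg_bestparam = GradientBoostingRegressor(max_depth=grid_result.best_params_['max_depth'],
                                          n_estimators=grid_result.best_params_['n_estimators'])

reg_bestparam.fit(x_train, y_train)
y_pred = reg_bestparam.predict(x_test)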

In [27]:
print(metrics.mean_squared_error(y_test, y_pred))

8.264273803534515
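
The manual refit above can also be skipped: with `refit=True` (the default), `GridSearchCV` retrains the best candidate on the whole training set and exposes it as `best_estimator_`, which also keeps the base estimator's `random_state`. A rough sketch under assumed stand-ins (the bundled diabetes dataset and a small 2 x 2 grid, not the notebook's own data):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Stand-in data (assumption): the diabetes regression set bundled with scikit-learn.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

search = GridSearchCV(
    GradientBoostingRegressor(random_state=7),
    {"n_estimators": [50, 100], "max_depth": [1, 3]},
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)

# With refit=True (the default), the best candidate is refit on all of X_train.
best_model = search.best_estimator_
y_pred = best_model.predict(X_test)
test_mse = mean_squared_error(y_test, y_pred)
print("Test MSE with best_estimator_: %.3f" % test_mse)
```

Calling `search.predict(X_test)` delegates to the same refit model, so either form works.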


In [45]:
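# Random search over the same candidate values.
# Note: the space holds only 25 combinations, fewer than n_iter=100, so the search
# is capped at 25 candidates (see the log below) and is effectively exhaustive.
n_estimators = [50, 100, 150, 200, 250]
max_depth = [1, 3, 5, 7, 9]
param_grid = dict(n_estimators=n_estimators, max_depth=max_depth)
random_search = RandomizedSearchCV(reg, param_grid, n_iter=100, scoring="neg_mean_squared_error", n_jobs=-1, verbose=1)
random_result = random_search.fit(x_train, y_train)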



[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.


Fitting 3 folds for each of 25 candidates, totalling 75 fits


[Parallel(n_jobs=-1)]: Done  75 out of  75 | elapsed:    3.0s finished


In [46]:
# best_score_ is the mean cross-validated negative MSE, not an accuracy
print("Best score (negative MSE): %f using %s" % (random_result.best_score_, random_result.best_params_))

Best score (negative MSE): -13.118704 using {'n_estimators': 200, 'max_depth': 3}


In [47]:
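# Refit with the best parameters from the random search (the same ones the grid search found).
# Note: random_state is not set here, so the test MSE can differ from the earlier refit.
reg_bestparam = GradientBoostingRegressor(max_depth=random_result.best_params_['max_depth'],
                                          n_estimators=random_result.best_params_['n_estimators'])
reg_bestparam.fit(x_train, y_train)
y_pred = reg_bestparam.predict(x_test)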

In [48]:
print(metrics.mean_squared_error(y_test, y_pred))

8.880270672421146
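
In the run above, both searches fit the same 25 candidates, so the random search had nothing left to sample. Random search starts to differ from grid search once the space is larger than `n_iter`, for example when parameters are drawn from distributions. The sketch below illustrates that setup. The diabetes dataset, the `scipy.stats.randint` ranges, `n_iter=8`, and the seeds are all illustrative assumptions rather than part of the original notebook.

```python
from scipy.stats import randint
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Stand-in data (assumption): the diabetes regression set bundled with scikit-learn.
X, y = load_diabetes(return_X_y=True)

# Distributions instead of lists: randint(low, high) samples integers in [low, high).
param_distributions = {
    "n_estimators": randint(50, 251),
    "max_depth": randint(1, 10),
}
search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=7),
    param_distributions,
    n_iter=8,  # number of sampled candidates, far fewer than the full space
    scoring="neg_mean_squared_error",
    cv=3,
    random_state=0,  # makes the sampled candidates reproducible
)
search.fit(X, y)

sampled = search.cv_results_["params"]
print("Tried %d candidates; best: %s" % (len(sampled), search.best_params_))
```

Here `n_iter` sets the compute budget directly, which is the main practical reason to prefer random search when the parameter space is large.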
