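## [Assignment Focus]
Learn how to use the hyper-parameter search tools in scikit-learn to find the best hyperparameters.

### Assignment
Use a different dataset and apply hyper-parameter search to see whether you can find the best combination of hyperparameters.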

In [1]:
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier
from sklearn.metrics import mean_squared_error, accuracy_score

import warnings
warnings.filterwarnings('ignore')

In [2]:
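# Baseline: load both datasets and split them before fitting models with default hyperparameters.
# Note: load_boston was removed in scikit-learn 1.2, so this cell requires an older release.
wine = datasets.load_wine()
boston = datasets.load_boston()
# Lowercase names hold the boston (regression) split; uppercase names hold the wine (classification) split.
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.25, random_state=4)
X_train, X_test, Y_train, Y_test = train_test_split(wine.data, wine.target, test_size=0.25, random_state=4)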

gbr=GradientBoostingRegressor(random_state=7)
gbr.fit(x_train, y_train)
y_pred=gbr.predict(x_test)
print('The original mean squared error of boston:',mean_squared_error(y_test, y_pred))

gbc=GradientBoostingClassifier(random_state=7)
gbc.fit(X_train, Y_train)
Y_pred=gbc.predict(X_test)
print('The original accuracy score of wine:', accuracy_score(Y_test, Y_pred))

The original mean squared error of boston: 10.599669562491401
The original accuracy score of wine: 0.9555555555555556
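Since `load_boston` is no longer available from scikit-learn 1.2 onward, the regression baseline above cannot be reproduced on current releases. The following is a minimal sketch of the same baseline on `load_diabetes`, a regression dataset bundled with scikit-learn. The dataset choice and the name `baseline_mse` are assumptions made for illustration, and the resulting error is not comparable with the Boston figure recorded above.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumption: load_diabetes stands in for the removed load_boston dataset.
diabetes = datasets.load_diabetes()
x_train, x_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.25, random_state=4)

# Default hyperparameters, same random_state as the baseline above.
gbr = GradientBoostingRegressor(random_state=7)
gbr.fit(x_train, y_train)
baseline_mse = mean_squared_error(y_test, gbr.predict(x_test))
print('Baseline mean squared error of diabetes:', baseline_mse)
```

This also fits the assignment's request to try a different dataset.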


In [3]:
# Grid search: cross-validate every combination of n_estimators and max_depth for both models

n_estimators=[50, 100, 150]
max_depth=[1,3,5]
param_grid=dict(n_estimators=n_estimators, max_depth=max_depth)

grid_search=GridSearchCV(gbr, param_grid, scoring="neg_mean_squared_error", n_jobs=-1, verbose=1)
grid_search2=GridSearchCV(gbc, param_grid, scoring="accuracy", n_jobs=-1, verbose=1)

grid_result=grid_search.fit(x_train, y_train)
grid_result2=grid_search2.fit(X_train, Y_train)

# With scoring="neg_mean_squared_error", best_score_ is the negated MSE (closer to 0 is better).
print('Best negative MSE for boston:', grid_result.best_score_, 'by using:', grid_result.best_params_)

print('Best cross-validated accuracy for wine:', grid_result2.best_score_, 'by using:', grid_result2.best_params_)

Fitting 3 folds for each of 9 candidates, totalling 27 fits

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  27 out of  27 | elapsed:    4.6s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.

Fitting 3 folds for each of 9 candidates, totalling 27 fits

[Parallel(n_jobs=-1)]: Done  20 out of  27 | elapsed:    1.8s remaining:    0.6s

Best negative MSE for boston: -10.887597244141425 by using: {'max_depth': 3, 'n_estimators': 150}
Best cross-validated accuracy for wine: 0.9548872180451128 by using: {'max_depth': 1, 'n_estimators': 50}


[Parallel(n_jobs=-1)]: Done  27 out of  27 | elapsed:    2.4s finished
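The regression score above is negative because scikit-learn always maximizes the scoring function, so mean squared error is reported with its sign flipped. The sketch below shows one way to recover the MSE and to list every candidate by rank through `cv_results_`. It assumes `load_diabetes` as the dataset (so that it runs on current scikit-learn) and a smaller grid than the one above; the names `search`, `best_mse`, and `order` are illustrative.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Assumption: load_diabetes is used only so the sketch runs on current scikit-learn.
X, y = datasets.load_diabetes(return_X_y=True)

param_grid = {'n_estimators': [50, 100], 'max_depth': [1, 3]}
search = GridSearchCV(GradientBoostingRegressor(random_state=7), param_grid,
                      scoring='neg_mean_squared_error', cv=3)
search.fit(X, y)

# scikit-learn maximizes scores, so MSE is reported negated; flip the sign back.
best_mse = -search.best_score_
print('Best cross-validated MSE:', best_mse, 'with', search.best_params_)

# List every candidate from best to worst by its rank.
results = search.cv_results_
order = results['rank_test_score'].argsort()
for i in order:
    print(results['rank_test_score'][i], results['params'][i],
          -results['mean_test_score'][i])
```

Looking at the full ranking, rather than only `best_params_`, helps show whether the winner is clearly ahead or nearly tied with its neighbours in the grid.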


In [4]:
gbr_bestparam=GradientBoostingRegressor(random_state=7,max_depth=grid_result.best_params_['max_depth'],
                                       n_estimators=grid_result.best_params_['n_estimators'])
gbr_bestparam.fit(x_train, y_train)
y_pred=gbr_bestparam.predict(x_test)
print('The mean squared error of boston after adjusting:',mean_squared_error(y_test, y_pred))


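# Use the classifier's own search result (grid_result2), not the regressor's (grid_result).
# The accuracy recorded below came from a run with the regressor's max_depth=3, n_estimators=150.
gbc_bestparam=GradientBoostingClassifier(random_state=7,max_depth=grid_result2.best_params_['max_depth'],
                                       n_estimators=grid_result2.best_params_['n_estimators'])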
gbc_bestparam.fit(X_train, Y_train)
Y_pred=gbc_bestparam.predict(X_test)
print('The accuracy score of wine after adjusting:',accuracy_score(Y_test, Y_pred))

The mean squared error of boston after adjusting: 10.44030519062419
The accuracy score of wine after adjusting: 0.9555555555555556
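With the default `refit=True`, `GridSearchCV` refits the winning configuration on the whole training split and exposes it as `best_estimator_`, so rebuilding the model by hand is optional. Below is a minimal sketch on the wine data, using a smaller grid than above to keep it quick; the names `search`, `best_model`, and `test_accuracy` are illustrative.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

wine = datasets.load_wine()
X_train, X_test, Y_train, Y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=4)

param_grid = {'n_estimators': [50, 100], 'max_depth': [1, 3]}
search = GridSearchCV(GradientBoostingClassifier(random_state=7), param_grid,
                      scoring='accuracy', cv=3)
search.fit(X_train, Y_train)

# best_estimator_ is already fitted on all of X_train with best_params_.
best_model = search.best_estimator_
test_accuracy = accuracy_score(Y_test, best_model.predict(X_test))
print('Test accuracy with', search.best_params_, ':', test_accuracy)

# Calling predict on the search object delegates to best_estimator_.
assert (search.predict(X_test) == best_model.predict(X_test)).all()
```

Taking the model from the search object also avoids pairing one model with another model's parameters, since each search carries its own fitted estimator.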
