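## [Assignment Focus]
Learn how to use the hyper-parameter search tools in scikit-learn to find the best hyperparameters.

### Assignment
Using different datasets, apply hyper-parameter search and see whether it can find the best combination of hyperparameters.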

In [1]:
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split, KFold, GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier
import numpy as np

In [2]:
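# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version.
boston = datasets.load_boston()
b_train_x, b_test_x, b_train_y, b_test_y = train_test_split(boston.data, boston.target, test_size = 0.3, random_state = 42)
# Baseline: gradient boosting regressor with default hyperparameters
reg = GradientBoostingRegressor(random_state=7)
reg.fit(b_train_x, b_train_y)
b_pred_reg = reg.predict(b_test_x)
# mean_squared_error returns the MSE on the test set (lower is better), not an error rate
error = metrics.mean_squared_error(b_test_y, b_pred_reg)
print("MSE:{0}".format(error))

MSE:8.525289929280651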


In [3]:
n_estimators = np.arange(10, 100, 20)
max_depth = np.arange(1, 10, 2)
param_grid = dict(n_estimators = n_estimators, max_depth = max_depth)
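The grid defined above can be expanded to see exactly which combinations the search will try. The following is a minimal sketch using `ParameterGrid` from `sklearn.model_selection`; the variable names `candidates` and `n_candidates` are introduced here for illustration only.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same grid as in the cell above
n_estimators = np.arange(10, 100, 20)   # 10, 30, 50, 70, 90
max_depth = np.arange(1, 10, 2)         # 1, 3, 5, 7, 9
param_grid = dict(n_estimators=n_estimators, max_depth=max_depth)

# ParameterGrid enumerates the Cartesian product of all listed values
candidates = list(ParameterGrid(param_grid))
n_candidates = len(candidates)

print(n_candidates)      # 5 values x 5 values = 25 candidates
print(candidates[0])     # one candidate, a dict with both parameters
```

Each candidate is fitted once per cross-validation fold, so 25 candidates with 3 folds give the 75 fits reported by the search log.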

In [4]:
# With "neg_mean_squared_error" the scores are negated MSE values, so the best score is the one closest to zero
grid_search_reg = GridSearchCV(reg, param_grid, scoring="neg_mean_squared_error", n_jobs=-1, verbose=1)
grid_result_reg = grid_search_reg.fit(b_train_x, b_train_y)
print("Best CV score (negative MSE): %f using %s" % (grid_result_reg.best_score_, grid_result_reg.best_params_))

Fitting 3 folds for each of 25 candidates, totalling 75 fits
Best CV score (negative MSE): -13.652855 using {'max_depth': 3, 'n_estimators': 90}


[Parallel(n_jobs=-1)]: Done  75 out of  75 | elapsed:    2.1s finished
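The reported score is negative because scikit-learn scorers follow a "higher is better" convention, so the MSE is negated. The sketch below shows how to recover the positive cross-validated MSE and look at the per-candidate scores in `cv_results_`. It makes two assumptions that differ from the cell above: it uses `load_diabetes` (since `load_boston` is unavailable in recent scikit-learn releases) and a smaller 2 x 2 grid to keep it quick. `cv=3` is passed explicitly because the log above reports 3 folds, while newer releases default to 5.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: diabetes data stands in for the Boston data used above
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

search = GridSearchCV(
    GradientBoostingRegressor(random_state=7),
    {"n_estimators": [10, 50], "max_depth": [1, 3]},  # reduced grid for speed
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)

# Flip the sign to get the mean cross-validated MSE of the best candidate
best_cv_mse = -search.best_score_

# One mean score per candidate, in the same order as cv_results_["params"]
mean_scores = search.cv_results_["mean_test_score"]
for params, score in zip(search.cv_results_["params"], mean_scores):
    print(params, -score)
```

Note that this cross-validated MSE is computed on held-out folds of the training data, so it is not directly comparable to an MSE measured on the separate test split.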


In [8]:
# Retrain with the best parameters found by the grid search, then evaluate on the test set.
# No random_state is set here, so the result may vary slightly between runs.
reg_bestParam = GradientBoostingRegressor(n_estimators = grid_result_reg.best_params_['n_estimators'],
                                          max_depth = grid_result_reg.best_params_['max_depth'])
reg_bestParam.fit(b_train_x, b_train_y)
best_pred = reg_bestParam.predict(b_test_x)
best_error = metrics.mean_squared_error(b_test_y, best_pred)
print("Best MSE:{0}".format(best_error))

Best MSE:8.123915339569658
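Rebuilding the model by hand is not strictly required: with the default `refit=True`, `GridSearchCV` retrains the best candidate on the whole training set and exposes it as `best_estimator_`. The sketch below compares the two routes. As assumptions, it uses `load_diabetes` in place of the Boston data and a reduced grid, and it fixes `random_state=7` in both models so that the comparison is deterministic.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: diabetes data stands in for the Boston data used above
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

search = GridSearchCV(
    GradientBoostingRegressor(random_state=7),
    {"n_estimators": [10, 50], "max_depth": [1, 3]},  # reduced grid for speed
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)

# Route 1: predict through the search object, which delegates to best_estimator_
mse_via_search = mean_squared_error(y_test, search.predict(X_test))

# Route 2: rebuild the model by hand with the same parameters and seed
manual = GradientBoostingRegressor(random_state=7, **search.best_params_)
manual.fit(X_train, y_train)
mse_manual = mean_squared_error(y_test, manual.predict(X_test))

print(mse_via_search, mse_manual)
```

Unpacking `best_params_` with `**` also avoids listing each tuned parameter by name when the grid changes.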


In [34]:
# Classification example on the wine dataset (3 classes)
wine = datasets.load_wine()
w_train_x, w_test_x, w_train_y, w_test_y = train_test_split(wine.data, wine.target, test_size = 0.3, random_state = 4)
# Baseline: gradient boosting classifier with hand-picked hyperparameters
cla = GradientBoostingClassifier(n_estimators = 30, max_depth = 10, random_state=7)
cla.fit(w_train_x, w_train_y.astype(int))
w_pred_cla = cla.predict(w_test_x)
acc = metrics.accuracy_score(w_test_y.astype(int), w_pred_cla.astype(int))
print('Accuracy:{0}'.format(acc))

Accuracy:0.9259259259259259


In [35]:
# Reuse the same param_grid as in the regression search; accuracy is already "higher is better"
grid_search_cla = GridSearchCV(cla, param_grid, scoring="accuracy", n_jobs=-1, verbose=1)
grid_result_cla = grid_search_cla.fit(w_train_x, w_train_y.astype(int))
print("Best CV accuracy: %f using %s" % (grid_result_cla.best_score_, grid_result_cla.best_params_))

Fitting 3 folds for each of 25 candidates, totalling 75 fits
Best CV accuracy: 0.975806 using {'max_depth': 1, 'n_estimators': 30}


[Parallel(n_jobs=-1)]: Done  75 out of  75 | elapsed:    2.5s finished


In [38]:
# Retrain with the best parameters found by the grid search, then evaluate on the test set.
# No random_state is set here, so the result may vary slightly between runs.
cla_bestParam = GradientBoostingClassifier(n_estimators = grid_result_cla.best_params_['n_estimators'],
                                           max_depth = grid_result_cla.best_params_['max_depth'])
cla_bestParam.fit(w_train_x, w_train_y.astype(int))
best_pred_cla = cla_bestParam.predict(w_test_x)
best_acc_cla = metrics.accuracy_score(w_test_y.astype(int), best_pred_cla.astype(int))
print("Best Accuracy:{0}".format(best_acc_cla))

Best Accuracy:0.9444444444444444
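Grid search is not the only hyper-parameter search in scikit-learn. As a possible variation on the exercise, the sketch below uses `RandomizedSearchCV`, which samples a fixed number of candidates instead of trying all 25. The choices `n_iter=5`, `cv=3` and `random_state=0` are assumptions made for this illustration and are not part of the original notebook.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=4
)

# Same value ranges as the grid used above, given as plain lists
param_distributions = {
    "n_estimators": list(range(10, 100, 20)),
    "max_depth": list(range(1, 10, 2)),
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=7),
    param_distributions,
    n_iter=5,            # assumption: try only 5 of the 25 combinations
    scoring="accuracy",
    cv=3,
    random_state=0,      # makes the sampled candidates reproducible
)
search.fit(X_train, y_train)

n_tried = len(search.cv_results_["params"])
test_acc = accuracy_score(y_test, search.predict(X_test))
print(search.best_params_, search.best_score_, test_acc)
```

Random search trades completeness for speed, so it may miss the combination a full grid would find. It tends to be more useful when the grid is too large to evaluate exhaustively.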
