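## [Assignment focus]
Learn how to use hyper-parameter search in scikit-learn (sklearn) to find the best hyperparameters.

### Assignment
Using different datasets, apply hyper-parameter search and see whether it finds a better combination of hyperparameters.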
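
To make concrete what a grid search enumerates, here is a minimal sketch using `ParameterGrid` from `sklearn.model_selection`. The grid values mirror the ones used in the cells of this notebook. The fold count of 3 is taken from the logged output and is an assumption about the cross-validation setting (scikit-learn 0.22 and later default to 5 folds).

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the notebook cells: 3 values x 8 values
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': list(range(3, 11)),
}

# ParameterGrid expands the dict into every combination of values
candidates = list(ParameterGrid(param_grid))

# Assumption: 3-fold cross-validation, as reported in the notebook's log output
n_folds = 3
total_fits = len(candidates) * n_folds

print('%d candidates, %d fits in total' % (len(candidates), total_fits))
```

Each candidate is fitted once per fold, which is why the cost of an exhaustive grid grows quickly with every added parameter value.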

In [1]:
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.metrics import r2_score, accuracy_score

#### Boston dataset (regression problem)

In [2]:
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version
boston = datasets.load_boston()
x_train, x_test, y_train, y_test = train_test_split(
    boston.data, boston.target, test_size=0.25, random_state=3)

Check the results with default hyperparameters

In [3]:
# Baseline: gradient boosting regressor with default hyperparameters
rgs = GradientBoostingRegressor()
rgs.fit(x_train, y_train)
y_pred = rgs.predict(x_test)
result = r2_score(y_test, y_pred)
print('R2 score of this regression problem is %.3f' % result)

R2 score of this regression problem is 0.898


In [4]:
# Grid: 3 values of n_estimators x 8 values of max_depth = 24 candidates
n_estimators = [100, 200, 300]
max_depth = range(3, 11)
param_grid = dict(n_estimators=n_estimators, max_depth=max_depth)
# Candidates are ranked by negative MSE (higher is better), so best_score_ is negative
grid_search = GridSearchCV(rgs, param_grid,
                           scoring='neg_mean_squared_error', n_jobs=-1, verbose=1)
grid_result = grid_search.fit(x_train, y_train)
print('Best neg. MSE: %.3f using %s' % (grid_result.best_score_, grid_result.best_params_))
# Refit a fresh regressor with the best parameters and evaluate it on the held-out test set
rgsOptimized = GradientBoostingRegressor(n_estimators=grid_result.best_params_['n_estimators'],
                                         max_depth=grid_result.best_params_['max_depth'])
rgsOptimized.fit(x_train, y_train)
y_pred = rgsOptimized.predict(x_test)
result = r2_score(y_test, y_pred)
print('------Learning results after optimizing hyperparameters------')
print('R2 score of this regression problem is %.3f' % result)

Fitting 3 folds for each of 24 candidates, totalling 72 fits


[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:    6.3s
[Parallel(n_jobs=-1)]: Done  72 out of  72 | elapsed:    9.5s finished


Best neg. MSE: -10.984 using {'max_depth': 3, 'n_estimators': 200}
------Learning results after optimizing hyperparameters------
R2 score of this regression problem is 0.899
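
The first cell imports `RandomizedSearchCV` but the notebook never uses it. As a rough sketch of how it could replace the exhaustive grid, the block below samples a fixed number of candidates instead of trying all of them. Several things here are assumptions and not part of the original notebook: `load_diabetes` stands in for the Boston data (which newer scikit-learn versions no longer ship), and the smaller parameter ranges, `n_iter=4`, `cv=3`, and the `random_state` values were chosen only to keep the example quick and repeatable.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Assumption: the diabetes dataset is used as a stand-in regression problem
data = datasets.load_diabetes()
x_tr, x_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.25, random_state=3)

# Assumption: a reduced search space (12 combinations) to keep the sketch fast
param_distributions = {
    'n_estimators': [50, 100, 150],
    'max_depth': [2, 3, 4, 5],
}

# Sample only n_iter=4 of the 12 combinations instead of fitting all of them
search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions,
    n_iter=4,
    scoring='neg_mean_squared_error',
    cv=3,
    random_state=0,
)
search.fit(x_tr, y_tr)

# best_score_ is a negative MSE, so flip the sign to report a plain MSE
best_mse = -search.best_score_
print('Best CV MSE: %.3f using %s' % (best_mse, search.best_params_))

# With the default refit=True, predict() uses the best model refitted on all training data
test_r2 = r2_score(y_te, search.predict(x_te))
print('Test R2: %.3f' % test_r2)
```

A randomized search trades completeness for speed: it may miss the best grid point, but its cost is set by `n_iter` and not by the size of the grid.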


#### Breast cancer dataset (classification problem)

In [5]:
cancer = datasets.load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.25, random_state=3)

In [6]:
# Baseline: gradient boosting classifier with default hyperparameters
clf = GradientBoostingClassifier()
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
result = accuracy_score(y_test, y_pred)
print('Accuracy score of this classification problem is %.3f' % result)

Accuracy score of this classification problem is 0.944


In [7]:
n_estimators = [100, 200, 300]
max_depth = range(3, 11)
param_grid = dict(n_estimators=n_estimators, max_depth=max_depth)
# This search reuses the regressor `rgs`, scored by negative MSE on the 0/1 labels,
# so best_score_ is a negative MSE and not an accuracy.
# Passing `clf` with scoring='accuracy' would tune the classifier directly.
grid_search = GridSearchCV(rgs, param_grid,
                           scoring='neg_mean_squared_error', n_jobs=-1, verbose=1)
grid_result = grid_search.fit(x_train, y_train)
print('Best neg. MSE: %.3f using %s' % (grid_result.best_score_, grid_result.best_params_))
# Refit a fresh classifier with the selected parameters and evaluate it on the held-out test set
clfOptimized = GradientBoostingClassifier(n_estimators=grid_result.best_params_['n_estimators'],
                                          max_depth=grid_result.best_params_['max_depth'])
clfOptimized.fit(x_train, y_train)
y_pred = clfOptimized.predict(x_test)
result = accuracy_score(y_test, y_pred)
print('------Learning results after optimizing hyperparameters------')
print('Accuracy score of this classification problem is %.3f' % result)

Fitting 3 folds for each of 24 candidates, totalling 72 fits


[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:    6.0s
[Parallel(n_jobs=-1)]: Done  72 out of  72 | elapsed:    8.0s finished


Best neg. MSE: -0.040 using {'max_depth': 3, 'n_estimators': 200}
------Learning results after optimizing hyperparameters------
Accuracy score of this classification problem is 0.951
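
The search in the cell above passes the regressor `rgs` and the `'neg_mean_squared_error'` scorer to `GridSearchCV`, even though the model being tuned is a classifier. A minimal sketch of searching over the classifier itself, scored by accuracy, might look like the following. The reduced grid, `cv=3`, and `random_state=0` are assumptions made to keep the example short and repeatable. They are not the notebook's settings, so the numbers it prints will not match the outputs recorded above.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

cancer = datasets.load_breast_cancer()
x_tr, x_te, y_tr, y_te = train_test_split(
    cancer.data, cancer.target, test_size=0.25, random_state=3)

# Assumption: a reduced 2 x 2 grid so the sketch runs quickly
param_grid = {'n_estimators': [50, 100], 'max_depth': [2, 3]}

# Search over the classifier and rank candidates by cross-validated accuracy
clf_search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    scoring='accuracy',
    cv=3,
)
clf_search.fit(x_tr, y_tr)
print('Best CV accuracy: %.3f using %s'
      % (clf_search.best_score_, clf_search.best_params_))

# With the default refit=True, the best model is already refitted on the training split
test_acc = accuracy_score(y_te, clf_search.predict(x_te))
print('Test accuracy: %.3f' % test_acc)
```

With this setup `best_score_` is a mean accuracy between 0 and 1, so the printed label and the value agree.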
