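## [Assignment Focus]
Learn how to use scikit-learn's hyper-parameter search tools to find the best hyperparameters.

### Assignment
Pick a different dataset and apply hyper-parameter search to see whether you can find the best combination of hyperparameters.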

In [56]:
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split, KFold, GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier

In [57]:
# Load the wine dataset
wine = datasets.load_wine()
# Split the data into training and test sets
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.25, random_state=42)
# Build the model
clf = GradientBoostingClassifier(random_state=10)

clf.get_params()

{'ccp_alpha': 0.0,
 'criterion': 'friedman_mse',
 'init': None,
 'learning_rate': 0.1,
 'loss': 'deviance',
 'max_depth': 3,
 'max_features': None,
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'n_estimators': 100,
 'n_iter_no_change': None,
 'presort': 'deprecated',
 'random_state': 10,
 'subsample': 1.0,
 'tol': 0.0001,
 'validation_fraction': 0.1,
 'verbose': 0,
 'warm_start': False}

In [58]:
# Check the result obtained with the default parameters
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
acc = metrics.accuracy_score(y_test, y_pred)
print("Original accuracy: %f " %acc)

Original accuracy: 0.911111 


In [59]:
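# Set up the hyperparameter combinations to search

# Loss function to be optimized
# ('exponential' supports binary classification only; wine has three classes, so
# those candidates fail to fit. scikit-learn 1.1+ renames 'deviance' to 'log_loss'.)
loss = ['deviance', 'exponential']
# Maximum depth of the individual regression estimators
# (defined here but not added to param_grid below, so it is not searched)
max_depth = [3, 5, 7, 9]
# The fraction of samples to be used for fitting the individual base learners
subsample = [0.8, 0.9, 1.0]
# The number of boosting stages to perform
n_estimators = [50, 100, 150]

# Create the parameter grid: 2 * 3 * 3 = 18 combinations
param_grid = {
    'loss': loss,
    'subsample': subsample,
    'n_estimators': n_estimators
}

# Create the search objects
## Random Search (n_iter exceeds the 18 available combinations, so every one is evaluated)
random_search = RandomizedSearchCV(clf, param_grid, n_iter=100, verbose=2, random_state=42, n_jobs=-1)
## Grid Search (scored by accuracy, matching the classifier's default scorer used by the random search)
grid_search = GridSearchCV(clf, param_grid, scoring="accuracy", n_jobs=-1, verbose=1)

## Random Search: fit and search the parameters
random_result = random_search.fit(x_train, y_train)
## Grid Search: fit and search for the best parameters
grid_result = grid_search.fit(x_train, y_train)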

Fitting 5 folds for each of 18 candidates, totalling 90 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:    3.2s
[Parallel(n_jobs=-1)]: Done  90 out of  90 | elapsed:    4.7s finished


Fitting 5 folds for each of 18 candidates, totalling 90 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:    4.5s
[Parallel(n_jobs=-1)]: Done  90 out of  90 | elapsed:    5.2s finished
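
The "18 candidates, totalling 90 fits" figure in both logs can be checked by hand. The sketch below assumes the same three lists as the `param_grid` above and scikit-learn's default of 5-fold cross-validation. It counts the combinations with `ParameterGrid`, which only enumerates values and does not validate them.

```python
from sklearn.model_selection import ParameterGrid

# Same lists as the notebook's param_grid (max_depth is not part of it)
param_grid = {
    "loss": ["deviance", "exponential"],
    "subsample": [0.8, 0.9, 1.0],
    "n_estimators": [50, 100, 150],
}

n_candidates = len(ParameterGrid(param_grid))  # 2 * 3 * 3 combinations
n_folds = 5                                    # default cv for the search classes
total_fits = n_candidates * n_folds

print("candidates:", n_candidates)
print("total fits:", total_fits)
```

Because the random search was given `n_iter=100` and only 18 combinations exist, it evaluates all of them, so here it covers exactly the same candidates as the grid search.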


In [60]:
# Random Search
print(random_result.best_params_)

# Grid Search
print(grid_result.best_params_)

{'subsample': 0.8, 'n_estimators': 50, 'loss': 'deviance'}
{'loss': 'deviance', 'n_estimators': 50, 'subsample': 0.8}
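
`best_params_` reports only the winning combination. To see how every candidate scored, the fitted search object also exposes `cv_results_`. The following is a minimal sketch, not the notebook's exact run: `small_grid` and `cv=3` are illustrative assumptions chosen to keep it fast.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=42
)

# Hypothetical, deliberately small grid (an assumption for illustration)
small_grid = {"n_estimators": [10, 20], "subsample": [0.8, 1.0]}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=10),
    small_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(x_train, y_train)

# One entry per candidate: its parameters, mean/std CV score, and rank
results = search.cv_results_
for params, mean, std, rank in zip(
    results["params"],
    results["mean_test_score"],
    results["std_test_score"],
    results["rank_test_score"],
):
    print(f"rank {rank}: {params} mean={mean:.3f} std={std:.3f}")
```

Candidates whose scores differ by less than their standard deviation are effectively tied, which is worth checking before concluding that one combination is "best".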


In [61]:
# Rebuild a model with the parameters found by random search
# (random_state matches the estimator used during the search, for reproducibility)
clf_bestparam_random = GradientBoostingClassifier(random_state=10, **random_result.best_params_)
# Rebuild a model with the parameters found by grid search
clf_bestparam_grid = GradientBoostingClassifier(random_state=10, **grid_result.best_params_)
# Train the models
clf_bestparam_random.fit(x_train, y_train)
clf_bestparam_grid.fit(x_train, y_train)

# Predict on the test set
y_pred_random = clf_bestparam_random.predict(x_test)
y_pred_grid = clf_bestparam_grid.predict(x_test)
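
Rebuilding the model by hand is not strictly required. With the default `refit=True`, a fitted search object retrains the best configuration on the full training set and exposes it as `best_estimator_`. The sketch below illustrates this with a small hypothetical grid; the grid values and `cv=3` are assumptions, not the notebook's settings.

```python
from sklearn import datasets, metrics
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=42
)

# Hypothetical small grid, for illustration only
search = GridSearchCV(
    GradientBoostingClassifier(random_state=10),
    {"n_estimators": [10, 20], "subsample": [0.8, 1.0]},
    cv=3,
)
search.fit(x_train, y_train)

# The refitted winner is ready to use; no manual re-creation needed
best_model = search.best_estimator_
acc_best = metrics.accuracy_score(y_test, best_model.predict(x_test))
print("Best estimator test accuracy: %f" % acc_best)
```

Using `best_estimator_` also carries over every setting of the searched estimator, including `random_state`, which a hand-built copy can easily drop.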

In [62]:
# Accuracy of the model rebuilt from the random search parameters
acc_random = metrics.accuracy_score(y_test, y_pred_random)
# Accuracy of the model rebuilt from the grid search parameters
acc_grid = metrics.accuracy_score(y_test, y_pred_grid)

print("Random Search accuracy: %f " %acc_random)
print("\nGrid Search accuracy: %f " %acc_grid)
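
When every parameter is given as a short list, random search can do no more than grid search. It becomes a distinct strategy once it samples from distributions under a fixed budget. The sketch below is one possible setup, not a recommendation: the `scipy.stats` ranges, `n_iter=5`, and `cv=3` are all illustrative assumptions.

```python
from scipy.stats import randint, uniform
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=42
)

# Distributions instead of lists (ranges are assumptions for illustration)
param_distributions = {
    "n_estimators": randint(10, 40),   # integers from 10 to 39
    "subsample": uniform(0.7, 0.3),    # floats in [0.7, 1.0]
}
dist_search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=10),
    param_distributions,
    n_iter=5,          # budget: only 5 sampled combinations are tried
    cv=3,
    random_state=42,
)
dist_search.fit(x_train, y_train)

best = dist_search.best_params_
print(best)
print("Best CV accuracy: %f" % dist_search.best_score_)
```

Here `n_iter` really does limit the work: exactly five combinations are drawn and cross-validated, however wide the ranges are.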
