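## [Assignment Focus]
Learn how to use hyper-parameter search in scikit-learn to find the best hyperparameters.

### Assignment
Choose a different dataset and use hyper-parameter search to see whether you can find the best combination of hyperparameters.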
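As a rough illustration of what an exhaustive grid search enumerates, the sketch below expands the same parameter grid used later in this notebook with `ParameterGrid` and counts the resulting fits. The variable names (`param_grid`, `candidates`, `n_folds`) are chosen here for illustration only.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as the GradientBoostingClassifier search in this notebook
param_grid = {"max_depth": [3, 5],
              "min_samples_split": [10, 20],
              "min_samples_leaf": [1],
              "max_features": [None],
              "learning_rate": [0.1],
              "n_estimators": [100, 250]}

# Grid search tries every combination: 2 * 2 * 1 * 1 * 1 * 2 candidates
candidates = list(ParameterGrid(param_grid))
n_candidates = len(candidates)

# With 5-fold cross-validation, each candidate is fitted once per fold
n_folds = 5
total_fits = n_candidates * n_folds

print(f"{n_candidates} candidates x {n_folds} folds = {total_fits} fits")
for params in candidates[:3]:
    print(params)
```

Because the number of fits is the product of the list lengths times the number of folds, adding even one more value to a parameter can noticeably increase the search time.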

In [1]:
import warnings

warnings.simplefilter('ignore')

# Datasets
from sklearn import datasets

# Preprocessing
from sklearn.model_selection import train_test_split, KFold, GridSearchCV

# Model
from sklearn.ensemble import GradientBoostingClassifier

# Evaluation
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score

In [6]:
# Load the handwritten digits dataset
digits = datasets.load_digits()

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.25, random_state=0)

In [9]:
# Build the GradientBoostingClassifier model
gdb_clf = GradientBoostingClassifier()

# Model training and hyper-parameters tuning
gdb_clf_param_grid = {"max_depth": [3, 5],
                      "min_samples_split": [10, 20],
                      "min_samples_leaf": [1],
                      "max_features": [None],
                      "learning_rate": [0.1],
                      "n_estimators": [100, 250]}

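# Note: "neg_mean_squared_error" treats the digit labels as numeric values, so the
# CV score reported by this search is a negative MSE, not an accuracy.
# scoring="accuracy" is the more usual choice for a classifier.
gsgdb_clf = GridSearchCV(gdb_clf, param_grid=gdb_clf_param_grid, cv=5, scoring="neg_mean_squared_error", n_jobs=-1, verbose=1)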
gsgdb_clf.fit(X_train, y_train)


# Best score
print(f"Best CV score of GradientBoostingClassifier: {(gsgdb_clf.best_score_):.5f}")

# Best estimator (refit with the best parameters)
gsgdb_clf_best = gsgdb_clf.best_estimator_
print("Best parameters of GradientBoostingClassifier:\n", gsgdb_clf_best)

# Predict by model
y_pred = gsgdb_clf_best.predict(X_test)

# Accuracy
print(f"Accuracy of best GradientBoostingClassifier: {accuracy_score(y_test, y_pred):.5f}")

Fitting 5 folds for each of 8 candidates, totalling 40 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  40 out of  40 | elapsed:  1.5min finished


Best CV score of GradientBoostingClassifier: -0.65256
Best parameters of GradientBoostingClassifier:
 GradientBoostingClassifier(criterion='friedman_mse', init=None,
              learning_rate=0.1, loss='deviance', max_depth=3,
              max_features=None, max_leaf_nodes=None,
              min_impurity_decrease=0.0, min_impurity_split=None,
              min_samples_leaf=1, min_samples_split=10,
              min_weight_fraction_leaf=0.0, n_estimators=250,
              n_iter_no_change=None, presort='auto', random_state=None,
              subsample=1.0, tol=0.0001, validation_fraction=0.1,
              verbose=0, warm_start=False)
Accuracy of best GradientBoostingClassifier: 0.96889
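
The best CV score above (-0.65256) is a negative mean squared error over the digit labels, so it is not directly comparable to the test accuracy (0.96889). A minimal sketch of the same kind of search scored by accuracy might look like the following. The reduced grid (`small_grid`), `cv=3`, and `random_state=0` are assumptions made here so the example runs quickly; they are not the settings used above, and no results from this sketch are reported.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Same dataset and split as above
digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# Deliberately small grid (assumed values) so the sketch finishes quickly
small_grid = {"max_depth": [2, 3],
              "n_estimators": [10, 30]}

# scoring="accuracy" makes best_score_ a mean cross-validated accuracy
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid=small_grid,
                      cv=3,
                      scoring="accuracy")
search.fit(X_train, y_train)

print(f"Best CV accuracy: {search.best_score_:.5f}")
print("Best parameters:", search.best_params_)

# Mean CV accuracy of every candidate in the grid
results = search.cv_results_
for params, mean_score in zip(results["params"], results["mean_test_score"]):
    print(params, f"{mean_score:.5f}")

# With the default refit=True, predict() uses the best estimator
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print(f"Test accuracy: {test_accuracy:.5f}")
```

Reading `best_params_` instead of printing `best_estimator_` shows only the searched parameters, which is usually easier to scan than the full estimator representation.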
