## [Assignment Focus]
Learn how to use the hyper-parameter search tools in scikit-learn to find the best hyperparameters.

## [Reference]
- [Cambridge Coding: how to tune machine learning models](https://cambridgecoding.wordpress.com/2016/04/03/scanning-hyperspace-how-to-tune-machine-learning-models/)
- [Hyperparameter tuning the random forest in Python using scikit-learn](https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74)
- Random search often finds better results than grid search: [Smarter Parameter Sweeps (or Why Grid Search Is Plain Stupid)](https://medium.com/rants-on-machine-learning/smarter-parameter-sweeps-or-why-grid-search-is-plain-stupid-c17d97a0e881)
- [Complete Machine Learning Guide to Parameter Tuning in Gradient Boosting (GBM) in Python](https://www.analyticsvidhya.com/blog/2016/02/complete-guide-parameter-tuning-gradient-boosting-gbm-python/)

### Assignment
Pick a different dataset and use hyper-parameter search to see whether you can find the best combination of hyperparameters.

In [1]:
import pandas as pd
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split, KFold, GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

In [2]:
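# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs an older scikit-learn version to run as written.
boston = datasets.load_boston()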
x = boston.data
y = boston.target
print('Boston Housing Price Feature Array Shape:', x.shape)
print('Boston Housing Price Target Array Shape:', y.shape)

Boston Housing Price Feature Array Shape: (506, 13)
Boston Housing Price Target Array Shape: (506,)
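
Because `load_boston` is no longer available in recent scikit-learn releases, and the assignment asks for a different dataset anyway, one possible substitute is the bundled diabetes regression dataset. The sketch below only shows the loading step; the names `X_alt` and `y_alt` are illustrative and do not appear elsewhere in this notebook.

```python
from sklearn.datasets import load_diabetes

# Bundled regression dataset, so no download is needed.
# X_alt / y_alt are hypothetical names used only in this sketch.
X_alt, y_alt = load_diabetes(return_X_y=True)

print('Diabetes Feature Array Shape:', X_alt.shape)  # (442, 10)
print('Diabetes Target Array Shape:', y_alt.shape)   # (442,)
```

The rest of the notebook would then use `X_alt` and `y_alt` in place of `x` and `y`; the scores reported below would of course differ.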


In [3]:
# Column names: 13 features plus the target MEDV (the stray trailing space in 'CRIM ' is removed)
cols = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO',
        'B', 'LSTAT', 'MEDV']
df = pd.DataFrame(np.concatenate((x, y.reshape(len(y), 1)), axis=1), columns=cols)
df.head()

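|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |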


### [A summary of tuning gradient boosting trees (GBDT) in scikit-learn](https://www.cnblogs.com/pinard/p/6143927.html)
### [sklearn.ensemble.GradientBoostingRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html)
- loss : {'ls', 'lad', 'huber', 'quantile'}, optional (default='ls')
  Loss function to be optimized. 'ls' refers to least squares regression. 'lad' (least absolute deviation) is a highly robust loss function solely based on order information of the input variables. 'huber' is a combination of the two. 'quantile' allows quantile regression (use alpha to specify the quantile). From scikit-learn 1.0 onward, 'ls' and 'lad' are named 'squared_error' and 'absolute_error'.
- learning_rate : float, optional (default=0.1)
  Shrinkage coefficient applied to each weak learner's contribution, also called the step size. A smaller learning_rate means more boosting iterations are needed. The step size and the maximum number of iterations together determine how well the model fits, so n_estimators and learning_rate should be tuned together.
- n_estimators : int (default=100)
  The maximum number of boosting iterations, i.e. the maximum number of weak learners. Too small a value tends to underfit and too large a value can overfit, so a moderate value is usually chosen. In practice n_estimators is considered together with learning_rate (described above).
- alpha : float (default=0.9)
  The alpha-quantile of the huber loss function and the quantile loss function. Only used if loss='huber' or loss='quantile'. This parameter exists only on GradientBoostingRegressor. With the Huber loss or the quantile loss, it specifies the quantile; if the data contains many noisy points, lowering it somewhat can help.
- max_depth : integer, optional (default=3)
  Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
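
The interplay between learning_rate and n_estimators described above can be observed with a small experiment. The following is a minimal sketch on synthetic data (`make_regression` is an assumption made for illustration, not the dataset used in this notebook). It uses `staged_predict` to find, for each learning rate, the number of trees at which the held-out MSE is lowest.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data, used only for this illustration
X, y = make_regression(n_samples=400, n_features=10, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

best_rounds = {}
for lr in (0.05, 0.1, 0.5):
    model = GradientBoostingRegressor(learning_rate=lr, n_estimators=200,
                                      max_depth=3, random_state=0)
    model.fit(X_tr, y_tr)
    # staged_predict yields the held-out predictions after each boosting round
    test_mse = [mean_squared_error(y_te, pred) for pred in model.staged_predict(X_te)]
    best_rounds[lr] = int(np.argmin(test_mse)) + 1
    print(f"learning_rate={lr}: lowest held-out MSE {min(test_mse):.1f} "
          f"at {best_rounds[lr]} trees")
```

The exact numbers depend on the data, which is why the two parameters are usually searched jointly rather than one at a time.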

In [4]:
# Baseline: Gradient Boosting regressor with default hyperparameters
# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=2)
# Build the model
reg = GradientBoostingRegressor(random_state=1)
# Train the model
reg.fit(x_train, y_train)
# Predict on the test set
y_pred = reg.predict(x_test)
# Measure the gap between predictions and true values with MSE
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))

Mean squared error: 8.01


### GridSearchCV  
#### Reference
- Official API: [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)
- [sklearn Learning 8: GridSearchCV (automatic parameter tuning)](https://www.itread01.com/content/1529133726.html)
- [Machine Learning: GridSearchCV grid search](https://martychen920.blogspot.com/2017/09/ml-gridsearchcv.html)
#### Parameter Introduction
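- estimator: the estimator (classifier or regressor) to tune.
- param_grid: a dict, or a list of dicts, of the parameters to optimize and the values to try.
- [scoring](https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter)=None: when None, the estimator's default score method is used.
- n_jobs=None: n_jobs=-1 runs in parallel on all CPU cores. With -1 or a value greater than 1, a very large dataset may cause out-of-memory errors.
- iid='warn': if True, the score is averaged across folds weighted by the number of samples in each test fold (i.e. samples are assumed identically distributed across folds) instead of taking the plain mean over folds. This parameter was removed in scikit-learn 0.24.
- refit=True: if True, the estimator is refit on the whole dataset with the best parameters once the search finishes.
- cv='warn': the cross-validation strategy. The default was 3-fold in the version used here (5-fold from scikit-learn 0.22 onward). With an integer or None, StratifiedKFold is used when the estimator is a classifier and y is binary or multiclass; otherwise KFold is used.
- verbose=0: 0 prints nothing during training, 1 prints occasional progress, and values above 1 print output for every sub-model.
- pre_dispatch='2*n_jobs': controls how many parallel jobs are dispatched in total. When n_jobs is greater than 1, the data is copied for each dispatched job, which can lead to OOM (out of memory).
- error_score='raise-deprecating': value to assign to the score if an error occurs in estimator fitting.
- return_train_score=False: if False, cv_results_ does not include training scores; the validation (test) scores are still recorded.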
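
To make the parameters above concrete, here is a minimal sketch that sets several of them explicitly. The synthetic data, the tiny grid, and the `KFold` settings are assumptions chosen so the example runs quickly; they are not the settings used later in this notebook.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic data, for illustration only
X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)

search = GridSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [20, 40], "max_depth": [1, 2]},  # 2 x 2 = 4 candidates
    scoring="neg_mean_squared_error",  # higher (closer to 0) is better
    cv=KFold(n_splits=3, shuffle=True, random_state=0),  # explicit splitter instead of the default
    n_jobs=1,                  # single process; -1 would use all cores
    refit=True,                # refit the best candidate on all of X, y
    return_train_score=True,   # also record training scores in cv_results_
)
search.fit(X, y)

print("Number of CV splits:", search.n_splits_)
print("Training scores recorded:", "mean_train_score" in search.cv_results_)
```

Passing an explicit splitter to `cv` keeps the fold count stable across scikit-learn versions, since the default changed over time.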
#### Attributes
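- cv_results_: a dict recording the results for every parameter combination. It can be passed directly to pandas.
- grid.fit(train_x, train_y): runs the grid search (a method, listed here because the attributes below only exist after it has been called).
- grid_scores_: the evaluation results for each parameter combination in older versions; removed in scikit-learn 0.20 in favor of cv_results_.
- best_params_: the parameter combination that gave the best result.
- best_estimator_: the best estimator, refit with the best parameters. Not available when refit is False.
- best_score_: the mean cross-validated score of the best estimator.
- best_index_: the index in cv_results_ of the best parameter combination.
- n_splits_: the number of cross-validation splits (folds).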
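
As a rough illustration of how these attributes fit together, the sketch below loads `cv_results_` into a pandas DataFrame and looks up the best row through `best_index_`. The synthetic data and the small grid are assumptions made only to keep the example fast.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data, for illustration only
X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)

search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      {"n_estimators": [20, 40], "max_depth": [1, 2]},
                      scoring="neg_mean_squared_error", cv=3)
search.fit(X, y)

# cv_results_ is a dict of equal-length arrays, one entry per candidate
results = pd.DataFrame(search.cv_results_)
cols = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
print(results[cols].sort_values("rank_test_score"))

# best_index_ points at the row that produced best_params_ and best_score_
best_row = results.loc[search.best_index_]
print("Best params:", search.best_params_)
print("Best mean CV score:", search.best_score_)
```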
#### Methods
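- decision_function(X): Call decision_function on the estimator with the best found params. Only available when refit=True and the underlying estimator implements it.
- fit(X[, y, groups]): fits the search, exhaustively training and evaluating every combination in the parameter grid.
- get_params([deep]): get the parameters of this object.
- inverse_transform(Xt): Call inverse_transform on the estimator with the best found params. Only available when refit=True and the underlying estimator implements it. The meaning of inverse_transform differs between algorithms; see the scikit-learn API documentation.
- predict(X): Call predict on the estimator with the best found parameters. Worth noting explicitly: the prediction is made with the best parameters.
- predict_log_proba(X): returns the log of the class probabilities, again using the best parameters.
- predict_proba(X): returns the class probabilities, again using the best parameters.
- score(X[, y]): returns the score defined by the scoring parameter above; if scoring is None, the chosen estimator's own score is used.
- set_params(**params): set the parameters of this object.
- transform(X): Call transform on the estimator with the best found parameters. Again, the underlying estimator must implement it.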
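
The delegation described above can be checked directly: with refit=True, `predict` on the search object should match `predict` on `best_estimator_`, and `score` should follow the `scoring` setting. This is a minimal sketch on synthetic data (an assumption for illustration), not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data, for illustration only
X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      {"n_estimators": [20, 40], "max_depth": [1, 2]},
                      scoring="neg_mean_squared_error", cv=3, refit=True)
search.fit(X_tr, y_tr)

# predict() is forwarded to the refit best estimator
pred_search = search.predict(X_te)
pred_best = search.best_estimator_.predict(X_te)
print("Same predictions:", np.allclose(pred_search, pred_best))

# score() uses the scoring passed to the search, here negative MSE
neg_mse = search.score(X_te, y_te)
print("search.score:", neg_mse)
print("-MSE by hand:", -mean_squared_error(y_te, pred_search))
```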

In [5]:
# Define the hyperparameter combinations to search
random_state = [1]
loss = ['ls', 'lad', 'huber', 'quantile']
n_estimators = [50, 100, 125, 150]
max_depth = [1, 3, 5]
param_grid = dict(random_state=random_state,
                  loss=loss,
                  n_estimators=n_estimators,
                  max_depth=max_depth)

# Build the search object from the model and the parameter grid (n_jobs=-1 uses all CPU cores in parallel)
grid_search = GridSearchCV(reg, param_grid, scoring="neg_mean_squared_error", n_jobs=-1, verbose=1)

# Start searching for the best parameters
grid_result = grid_search.fit(x_train, y_train)

# By default this runs 3-fold cross-validation (5-fold from scikit-learn 0.22 onward)

Fitting 3 folds for each of 48 candidates, totalling 144 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:    5.2s
[Parallel(n_jobs=-1)]: Done 144 out of 144 | elapsed:   16.4s finished


In [6]:
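# Print the best score and the best parameters.
# The score is the negative MSE (scoring="neg_mean_squared_error"), not an accuracy; closer to 0 is better.
print("Best score (negative MSE): %f using %s" % (grid_result.best_score_, grid_result.best_params_))

Best score (negative MSE): -8.732962 using {'loss': 'ls', 'max_depth': 3, 'n_estimators': 150, 'random_state': 1}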
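
Because the search used `scoring="neg_mean_squared_error"`, the printed `best_score_` is a negated MSE averaged over the validation folds. A small sketch of converting it back to an MSE and an RMSE (in the units of the target), using the value printed above:

```python
import math

best_neg_mse = -8.732962   # best_score_ as printed above
best_mse = -best_neg_mse   # flip the sign to recover the mean CV MSE
best_rmse = math.sqrt(best_mse)

print(f"CV MSE: {best_mse:.3f}, CV RMSE: {best_rmse:.3f}")
```

Note that this is a cross-validated score on the training split, so it is not directly comparable to the test-set MSE reported elsewhere in this notebook.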


In [7]:
# Rebuild the model with the best parameters
# (best_params_ holds exactly the keys searched: random_state, loss, max_depth, n_estimators)
reg_bestparam = GradientBoostingRegressor(**grid_result.best_params_)

# Train the model
reg_bestparam.fit(x_train, y_train)

# Predict on the test set
y_pred = reg_bestparam.predict(x_test)
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))

Mean squared error: 8.00
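
One of the references at the top argues for random search over grid search. As a hedged sketch of that alternative, the example below uses `RandomizedSearchCV` with sampling distributions from `scipy.stats`. The diabetes dataset, the parameter ranges, and `n_iter=10` are all assumptions chosen for illustration; they are not taken from the original notebook.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# A different, bundled dataset (assumption: chosen only for this sketch)
X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=2)

# Distributions to sample from, instead of a fixed grid
param_distributions = {
    "n_estimators": randint(50, 151),      # integers in [50, 150]
    "max_depth": randint(1, 6),            # integers in [1, 5]
    "learning_rate": uniform(0.01, 0.29),  # floats in [0.01, 0.30]
}

random_search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=1),
    param_distributions,
    n_iter=10,                         # number of sampled candidates
    scoring="neg_mean_squared_error",
    cv=3,
    random_state=1,                    # makes the sampling reproducible
    n_jobs=1,
)
random_search.fit(X_tr, y_tr)

print("Best score (negative MSE): %f using %s"
      % (random_search.best_score_, random_search.best_params_))
```

Here the budget is fixed by `n_iter` rather than by the size of the grid, which makes it cheap to include continuous parameters such as learning_rate.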
