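## [Assignment Focus]
Learn how to use hyper-parameter search in scikit-learn to find the best hyperparameters.

### Assignment
Apply hyper-parameter search to a different dataset and see whether you can find the best combination of hyperparameters.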

In [17]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn import datasets, tree, metrics
from sklearn.linear_model import LogisticRegression,LinearRegression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV, cross_val_score
import warnings
warnings.filterwarnings('ignore')

In [15]:
# Note: load_boston was removed in scikit-learn 1.2; this cell needs an older release.
boston = datasets.load_boston()
ss_scaler = StandardScaler()
boston_data_ss = ss_scaler.fit_transform(boston.data) 
x_train_ss, x_test_ss, y_train_ss, y_test_ss = train_test_split(boston_data_ss, boston.target, random_state=4, test_size=0.25)

In [18]:
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, random_state=4, test_size=0.25)

In [48]:
# DecisionTreeRegressor baseline with default hyperparameters
LR_reg = tree.DecisionTreeRegressor()
result_LR_reg = cross_val_score(LR_reg,x_train_ss, y_train_ss, scoring='r2', cv=5, verbose=1, n_jobs=-1)
print('Training_r2:{} +/- {} with 95% CI:\n{}'.format(np.mean(result_LR_reg), np.std(result_LR_reg)*2, result_LR_reg))
LR_reg.fit(x_train_ss, y_train_ss)
y_pred_LR = LR_reg.predict(x_test_ss)
print('Tree_param', LR_reg.get_params())
print('Test_r2:{}'.format(metrics.r2_score(y_test_ss, y_pred_LR)))

Training_r2:0.7149332412365919 +/- 0.09534062173846354 with 95% CI:
[0.77716859 0.69559834 0.73925719 0.72749463 0.63514745]
Tree_param {'criterion': 'mse', 'max_depth': None, 'max_features': None, 'max_leaf_nodes': None, 'min_impurity_decrease': 0.0, 'min_impurity_split': None, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'presort': False, 'random_state': None, 'splitter': 'best'}
Test_r2:0.736403568674191


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:    0.0s finished
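
The `+/-` figure printed above is twice the standard deviation of the five fold scores. As a rough sketch, it can be reproduced from the printed fold scores alone. Note that with only five correlated folds, "mean plus or minus two standard deviations" is a loose stand-in for a 95% interval rather than a rigorous one, and that the label `Training_r2` actually refers to the cross-validated score.

```python
import numpy as np

# Fold scores copied from the cross_val_score output above (rounded to 8 decimals).
fold_scores = np.array([0.77716859, 0.69559834, 0.73925719, 0.72749463, 0.63514745])

mean_r2 = fold_scores.mean()
# np.std defaults to the population standard deviation (ddof=0),
# which is what the notebook's print statement uses.
spread = 2 * fold_scores.std()

print(f"CV r2: {mean_r2:.4f} +/- {spread:.4f}")
```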


In [46]:
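# GridSearchCV over max_depth and min_samples_split of the decision tree
param_grid = {'max_depth': list(range(1, 11)), 'min_samples_split': list(range(2, 6))}
# Note: the iid argument was removed in scikit-learn 0.24; drop it on newer releases.
tree_cv_reg = GridSearchCV(LR_reg, param_grid, scoring='r2',iid=False, cv=5, return_train_score=True, verbose=1)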
tree_cv_reg.fit(x_train, y_train)
print(tree_cv_reg.best_score_, tree_cv_reg.best_params_)
tree_reg = tree.DecisionTreeRegressor(max_depth=tree_cv_reg.best_params_['max_depth'],
                                      min_samples_split=tree_cv_reg.best_params_['min_samples_split'])
tree_reg.fit(x_train, y_train)
y_pred_tree = tree_reg.predict(x_test)
print('test r2:', metrics.r2_score(y_test, y_pred_tree))

Fitting 5 folds for each of 40 candidates, totalling 200 fits


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


0.780316712978648 {'max_depth': 4, 'min_samples_split': 4}
test r2: 0.7236428696031709


[Parallel(n_jobs=1)]: Done 200 out of 200 | elapsed:    0.5s finished
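
The log line "Fitting 5 folds for each of 40 candidates, totalling 200 fits" follows directly from the grid: 10 values of `max_depth` times 4 values of `min_samples_split` gives 40 candidates, each fitted on 5 folds. A minimal sketch that enumerates the same grid with `ParameterGrid` (the variable names `candidates` and `n_folds` are only illustrative):

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the GridSearchCV cell above.
param_grid = {'max_depth': list(range(1, 11)), 'min_samples_split': list(range(2, 6))}

# ParameterGrid expands the dict into every combination of values.
candidates = list(ParameterGrid(param_grid))
n_folds = 5

print(len(candidates), 'candidates,', len(candidates) * n_folds, 'fits')
print(candidates[0])
```

Because `GridSearchCV` refits the best candidate on the full training set by default (`refit=True`), `tree_cv_reg.best_estimator_` could also be used for prediction instead of building a new `DecisionTreeRegressor` by hand.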


In [47]:
# GradientBoostingRegressor with default hyperparameters
GB_reg = GradientBoostingRegressor()
result_GB_reg = cross_val_score(GB_reg,x_train, y_train, scoring='r2', cv=5, verbose=1, n_jobs=-1)
print('Training_r2:{} +/- {} with 95% CI:\n{}'.format(np.mean(result_GB_reg), np.std(result_GB_reg)*2, result_GB_reg))
GB_reg.fit(x_train, y_train)
y_pred_GB = GB_reg.predict(x_test)
print('Test_r2:{}'.format(metrics.r2_score(y_test, y_pred_GB)))

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.


Training_r2:0.8511010259628163 +/- 0.08601722729424201 with 95% CI:
[0.85567748 0.86519743 0.92059682 0.81951731 0.79451609]
Test_r2:0.8826427743313583


[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:    1.9s finished
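
The gradient boosting model above runs with default hyperparameters only. One way to search them, sketched here under some assumptions, is `RandomizedSearchCV`, which samples a fixed number of candidates instead of trying every combination. The dataset (`load_diabetes`, chosen because it ships with current scikit-learn releases), the parameter ranges, and `n_iter=10` are illustrative choices, not values taken from this notebook.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Assumption: a different bundled regression dataset stands in for Boston.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=4)

# Distributions to sample from (illustrative ranges).
param_distributions = {
    'n_estimators': randint(50, 201),      # integers in [50, 200]
    'max_depth': randint(1, 5),            # integers in [1, 4]
    'learning_rate': uniform(0.01, 0.19),  # floats in [0.01, 0.20]
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=4),
    param_distributions,
    n_iter=10,        # number of sampled candidates
    scoring='r2',
    cv=5,
    random_state=4,   # makes the sampled candidates reproducible
)
search.fit(X_train, y_train)

print('best CV r2:', search.best_score_, search.best_params_)
# best_estimator_ is already refit on the whole training split.
test_r2 = r2_score(y_test, search.best_estimator_.predict(X_test))
print('test r2:', test_r2)
```

Random search tends to be the cheaper option when the grid would be large, since the cost is set by `n_iter` rather than by the product of all value counts.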


In [26]:
# RandomForestRegressor with default hyperparameters
RF_reg = RandomForestRegressor()
result_RF_reg = cross_val_score(RF_reg,x_train, y_train, scoring='r2', cv=5, verbose=1, n_jobs=-1)
print('Training_r2:{} +/- {} with 95% CI:\n{}'.format(np.mean(result_RF_reg), np.std(result_RF_reg)*2, result_RF_reg))
RF_reg.fit(x_train, y_train)
y_pred_RF = RF_reg.predict(x_test)
print('Test_r2:{}'.format(metrics.r2_score(y_test, y_pred_RF)))

Training_r2:0.831752889236343 +/- 0.0783136948645489 with 95% CI:
[0.84380527 0.84093437 0.87756376 0.83754083 0.75892022]
Test_r2:0.8380448062547814


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:    0.0s finished
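
The random forest can be tuned in the same way as the decision tree. The sketch below also shows how to inspect every candidate through `cv_results_`, which helps judge whether the best score is clearly ahead or within noise of the others. The dataset (`load_diabetes`) and the small grid are assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: a bundled regression dataset used as the "different dataset".
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=4)

# Small illustrative grid: 2 x 3 = 6 candidates.
param_grid = {'n_estimators': [50, 100], 'max_depth': [3, 6, None]}
rf_search = GridSearchCV(RandomForestRegressor(random_state=4), param_grid, scoring='r2', cv=5)
rf_search.fit(X_train, y_train)

# cv_results_ holds one row per candidate; sort by rank to compare them.
results = pd.DataFrame(rf_search.cv_results_)
columns = ['param_n_estimators', 'param_max_depth',
           'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[columns].sort_values('rank_test_score'))

# score() evaluates the refit best estimator with the search's scoring metric (r2).
print('test r2:', rf_search.score(X_test, y_test))
```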
