## [作業重點]
了解如何使用 Sklearn 中的 hyper-parameter search 找出最佳的超參數

### 作業
請使用不同的資料集，並使用 hyper-parameter search 的方式，看能不能找出最佳的超參數組合

In [1]:
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split, GridSearchCV

house = datasets.fetch_california_housing()
data = pd.DataFrame(house['data'], columns=house['feature_names'])
target = pd.DataFrame(house['target'], columns=['Target'])
data = pd.concat([data, target], axis=1)
del target
data.head()

Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude,Target
0,8.3252,41.0,6.984127,1.02381,322.0,2.555556,37.88,-122.23,4.526
1,8.3014,21.0,6.238137,0.97188,2401.0,2.109842,37.86,-122.22,3.585
2,7.2574,52.0,8.288136,1.073446,496.0,2.80226,37.85,-122.24,3.521
3,5.6431,52.0,5.817352,1.073059,558.0,2.547945,37.85,-122.25,3.413
4,3.8462,52.0,6.281853,1.081081,565.0,2.181467,37.85,-122.25,3.422


#### RandomForestClassifier

In [2]:
#預設參數
x_train, x_test, y_train, y_test = train_test_split(data.iloc[:, :-1], data['Target'], test_size=0.25, random_state=42)
RFmodel = RandomForestRegressor()
RFmodel.fit(x_train, y_train)
y_pred = RFmodel.predict(x_test)

print("The basic MSE of RandomForest :",mean_squared_error(y_test, y_pred))



The basic MSE of RandomForest : 0.2793864001967313


In [3]:
#GridSearch
param_grid = {'n_estimators':[10, 100, 300],
             'max_depth':[None, 1, 3, 5]}
grid_search = GridSearchCV(RFmodel, param_grid, scoring="neg_mean_squared_error", n_jobs=-1, verbose=1)
grid_result = grid_search.fit(x_train, y_train)

print('Best Accuracy : {}\nUsing {}'.format(grid_result.best_score_, grid_result.best_params_))

Fitting 3 folds for each of 12 candidates, totalling 36 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  36 out of  36 | elapsed:   25.2s finished


Best Accuracy : -0.27022147836501104
Using {'max_depth': None, 'n_estimators': 300}


In [4]:
#調整參數
RFmodel_best = RandomForestRegressor(max_depth=grid_result.best_params_['max_depth'],
                                    n_estimators=grid_result.best_params_['n_estimators'])
RFmodel_best.fit(x_train, y_train)
y_pred = RFmodel_best.predict(x_test)

print('The accuracied MSE of RandomForest :',mean_squared_error(y_test, y_pred))

The accuracied MSE of RandomForest : 0.25230911381861987


#### GradientBoostingRegressior

In [5]:
#預設參數
GBmodel = GradientBoostingRegressor()
GBmodel.fit(x_train, y_train)
y_pred = GBmodel.predict(x_test)

print('The basic MSE of GradientBoostingRegressior :',mean_squared_error(y_test, y_pred))

The basic MSE of GradientBoostingRegressior : 0.28949041365583217


In [6]:
#GridSearch
param_grid = {'n_estimators':[100, 200, 300],
             'max_depth':[1, 3, 5]}
grid_search = GridSearchCV(GBmodel, param_grid, scoring='neg_mean_squared_error', n_jobs=-1, verbose=1)
grid_result = grid_search.fit(x_train, y_train)

print('Best Accuracy : {}\nUsing : {}'.format(grid_result.best_score_, grid_result.best_params_))

Fitting 3 folds for each of 9 candidates, totalling 27 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  27 out of  27 | elapsed:    8.8s finished


Best Accuracy : -0.2241876055366772
Using : {'max_depth': 5, 'n_estimators': 300}


In [7]:
#調整參數
GBmodel_best = GradientBoostingRegressor(max_depth=grid_result.best_params_['max_depth'],
                                        n_estimators=grid_result.best_params_['n_estimators'])
GBmodel_best.fit(x_train, y_train)
y_pred = GBmodel.predict(x_test)

print('The accuracy MSE of GradientBoostingRegressior :',mean_squared_error(y_test, y_pred))

The accuracy MSE of GradientBoostingRegressior : 0.28949041365583217
