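## [Assignment Focus]
Learn how to use hyper-parameter search in scikit-learn (sklearn) to find the best hyperparameters.

### Assignment
Using different datasets, apply hyper-parameter search and see whether it can find the best combination of hyperparameters.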

# Project 1 Data: Iris

In [34]:
# Load packages
import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score

In [35]:
# Load data
iris = pd.read_csv('C:/Users/hsu/Desktop/data/iris.csv')

In [36]:
# Separate the features (X) from the target (y)
X = iris.drop('Species', axis = 1)
y = iris['Species']

In [37]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 123)

In [38]:
# Build a baseline model with default hyperparameters
svm = SVC()
svm.fit(X_train, y_train)
irispred = svm.predict(X_test)
# accuracy_score expects (y_true, y_pred)
print(accuracy_score(y_test, irispred))

0.9666666666666667




In [39]:
# Grid search setup: cross-validate every SVC kernel
pattern = ['linear', 'poly', 'rbf', 'sigmoid']
svcp = dict(kernel = pattern)
grid_svm = GridSearchCV(svm, param_grid=svcp, scoring='accuracy')
grid_result = grid_svm.fit(X_train, y_train)



In [40]:
# Grid search results: best cross-validated score and parameters
print(grid_result.best_score_)
print(grid_result.best_params_)

0.9833333333333333
{'kernel': 'linear'}
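
Besides `best_score_` and `best_params_`, the fitted search object keeps the cross-validated score of every candidate in `cv_results_`. The sketch below is one way to compare the kernels side by side. It assumes scikit-learn's bundled `load_iris` data in place of the local CSV, so the numbers may differ from the output above.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: the bundled iris data stands in for the local iris.csv.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

param_grid = {"kernel": ["linear", "poly", "rbf", "sigmoid"]}
search = GridSearchCV(SVC(), param_grid=param_grid, scoring="accuracy")
search.fit(X_train, y_train)

# One row per candidate: mean and spread of the cross-validated accuracy.
columns = ["param_kernel", "mean_test_score", "std_test_score", "rank_test_score"]
results = pd.DataFrame(search.cv_results_)[columns].sort_values("rank_test_score")
print(results.to_string(index=False))
```

A small gap between the top candidates relative to `std_test_score` suggests the ranking may not be stable across different splits.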


In [41]:
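# Rebuild the model with the best kernel found by the grid search
svm = SVC(kernel=grid_result.best_params_['kernel'])
svm.fit(X_train, y_train)
svmpred = svm.predict(X_test)
# accuracy_score expects (y_true, y_pred)
print(accuracy_score(y_test, svmpred))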

0.9666666666666667
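
With the default `refit=True`, `GridSearchCV` already retrains the best candidate on the full training set, so the manual rebuild above can likely be skipped. A minimal sketch, again assuming the bundled `load_iris` data rather than the local CSV:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: the bundled iris data stands in for the local iris.csv.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

search = GridSearchCV(
    SVC(),
    param_grid={"kernel": ["linear", "poly", "rbf", "sigmoid"]},
    scoring="accuracy",
)
search.fit(X_train, y_train)

# best_estimator_ is the refitted winner; search.predict delegates to it.
best_model = search.best_estimator_
pred_from_model = best_model.predict(X_test)
pred_from_search = search.predict(X_test)
test_accuracy = accuracy_score(y_test, pred_from_search)
print(test_accuracy)
```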


# Project 2 Data: bike

In [62]:
# Load packages
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error

In [54]:
# Load data
bike = pd.read_csv('C:/Users/hsu/Desktop/data/train.csv')

In [55]:
# Data preparation: drop the timestamp and one-hot encode the categorical columns
bike = bike.drop('datetime', axis = 1)
df = pd.get_dummies(data=bike, columns=['season', 'holiday', 'workingday', 'weather'])

In [58]:
# Extract the features and the target
X = df.drop('count',axis = 1)
y = df['count']
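
Only `count` is dropped from the features here. If `train.csv` is the Kaggle Bike Sharing Demand file (an assumption; the source does not say), it would also contain `casual` and `registered`, which add up to `count` and would leak the target into `X`. The sketch below shows one way to check for and remove such columns, using a hypothetical three-row frame rather than the real data:

```python
import pandas as pd

# Hypothetical miniature frame; the column names assume the Kaggle
# Bike Sharing Demand layout, which the source does not confirm.
toy = pd.DataFrame({
    "temp": [9.8, 14.8, 20.5],
    "casual": [3, 8, 5],
    "registered": [13, 32, 27],
    "count": [16, 40, 32],
})

# If feature columns add up to the target, the model can read the answer directly.
leaks = bool((toy["casual"] + toy["registered"] == toy["count"]).all())
print(leaks)

# Drop the target and the suspected leaking columns;
# errors="ignore" skips any name that is not present.
X_safe = toy.drop(columns=["count", "casual", "registered"], errors="ignore")
y_safe = toy["count"]
print(list(X_safe.columns))
```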

In [60]:
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 123)

In [61]:
# Build a baseline model with default hyperparameters
rf = RandomForestRegressor()
rf.fit(X_train, y_train)
bikepred = rf.predict(X_test)
# mean_squared_error expects (y_true, y_pred)
print(mean_squared_error(y_test, bikepred))



10.642382920110196


In [64]:
# Hyperparameter grid: number of trees and maximum tree depth
ntree = [50, 100, 150]
maxDepth = [1,3,5,7]
pattern = dict(n_estimators = ntree, 
               max_depth = maxDepth)

In [68]:
# Grid search setup
clf = RandomForestRegressor()
gridset = GridSearchCV(clf, pattern, scoring='neg_mean_squared_error')
gridresult = gridset.fit(X_train, y_train)

In [71]:
# Grid search results: best cross-validated score and parameters
print(gridresult.best_score_)
print(gridresult.best_params_)

-45.52586733563594
{'max_depth': 7, 'n_estimators': 100}
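
scikit-learn scorers follow a higher-is-better convention, so `neg_mean_squared_error` reports the MSE with its sign flipped, which is why `best_score_` is negative. The arithmetic below converts the value printed above into an MSE and an RMSE in the units of `count`. The variable names are illustrative.

```python
import math

# Value printed by gridresult.best_score_ above.
best_score = -45.52586733563594

cv_mse = -best_score         # undo the sign flip: mean cross-validated MSE
cv_rmse = math.sqrt(cv_mse)  # square root of that MSE, in the target's own units

print(f"CV MSE:  {cv_mse:.2f}")
print(f"CV RMSE: {cv_rmse:.2f}")
```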


In [72]:
# Rebuild the model with the best hyperparameters found by the grid search
rfg = RandomForestRegressor(n_estimators=gridresult.best_params_['n_estimators'],
                            max_depth=gridresult.best_params_['max_depth'])
rfg.fit(X_train, y_train)
rfgpred = rfg.predict(X_test)
# mean_squared_error expects (y_true, y_pred)
print(mean_squared_error(y_test, rfgpred))

36.66078773076003
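
The tuned forest scores worse on the test set (36.66) than the default one (10.64). One plausible reason is that the grid caps `max_depth` at 7, while the default `max_depth=None` lets trees grow fully, so the default configuration was never a candidate. Below is a sketch of a grid that includes it. It uses synthetic `make_regression` data as a stand-in for the bike data, and the sample size, `cv=3`, and `random_state` values are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Assumption: synthetic data replaces the bike data to keep the sketch self-contained.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=8, noise=10.0, random_state=123
)

param_grid = {
    "n_estimators": [50, 100],
    # None means unlimited depth, the RandomForestRegressor default.
    "max_depth": [3, 7, None],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=123),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(-search.best_score_)  # sign flipped back to a plain MSE
```

Including the default values in the grid means the search result can only match or beat the baseline in cross-validation, which makes the before/after comparison easier to interpret.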
