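## [Assignment Focus]
Make sure you understand what each hyperparameter of the random forest model means, and observe how tuning the hyperparameters affects the results.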

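## Assignment

1. Try adjusting the parameters of RandomForestClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results against regression models and decision trees.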
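
For task 1, one way to explore the parameters systematically instead of by hand is a small grid search. The following is a minimal sketch, not the assignment's prescribed method: the grid values and the choice of `min_samples_leaf` as a third parameter are illustrative assumptions, and `GridSearchCV` is used here only as a convenient way to try every combination.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_wine(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Hypothetical grid: these values are illustrative, not tuned recommendations.
param_grid = {
    "n_estimators": [10, 50],       # number of trees in the forest
    "max_depth": [2, 5, None],      # None lets each tree grow until its leaves are pure
    "min_samples_leaf": [1, 5],     # minimum number of samples required in a leaf
}

# 3-fold cross-validation on the training set for every parameter combination
search = GridSearchCV(
    RandomForestClassifier(random_state=2),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(x_train, y_train)

print("Best parameters:", search.best_params_)
print("Mean CV accuracy of best setting:", search.best_score_)
print("Test accuracy:", search.score(x_test, y_test))
```

`search.cv_results_` holds the score of every combination, which helps to see which parameter actually moves the result.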

## Use RandomForestRegressor on BOSTON

In [1]:
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.model_selection import train_test_split

In [2]:
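# Load the Boston dataset
# (load_boston was removed in scikit-learn 1.2, so this cell needs an older release)
boston = datasets.load_boston()

# Split the data into training and test sets
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=1)

# Build a random forest model with 50 trees, each with a maximum depth of 10
rfr = RandomForestRegressor(n_estimators=50, max_depth=10, random_state=2)

# Fit the model on the training set
rfr.fit(x_train, y_train)

# Predict target values for the test set x_test
y_pred = rfr.predict(x_test)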

In [3]:
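# This is a regression problem, so print the mean_squared_error
# Observed while tuning: a larger max_depth gave a smaller MSE (with very little change beyond 10)
# Observed while tuning: a larger n_estimators gave a smaller MSE (but going beyond 50 trees was no better)
print('MSE = ', metrics.mean_squared_error(y_test, y_pred))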

MSE =  7.778141975190692
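
The tuning observations in the comments above can be reproduced with a small sweep. This is a sketch under stated assumptions: `load_diabetes` stands in for Boston (which newer scikit-learn releases no longer ship), the helper `forest_mse` and the swept values are hypothetical, and the numbers it prints will not match the Boston MSE shown above.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumption: load_diabetes is a stand-in regression dataset, not the Boston data.
X, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)


def forest_mse(n_estimators, max_depth):
    """Fit a forest with the given settings and return its test-set MSE."""
    model = RandomForestRegressor(
        n_estimators=n_estimators, max_depth=max_depth, random_state=2
    )
    model.fit(x_train, y_train)
    return mean_squared_error(y_test, model.predict(x_test))


results = {}

# Vary max_depth with the number of trees fixed at 50
for max_depth in (2, 5, 10, 20):
    results[("max_depth", max_depth)] = forest_mse(50, max_depth)

# Vary n_estimators with the maximum depth fixed at 10
for n_estimators in (1, 10, 50, 100):
    results[("n_estimators", n_estimators)] = forest_mse(n_estimators, 10)

for (name, value), mse in results.items():
    print(f"{name}={value}: MSE = {mse:.3f}")
```

Varying one parameter at a time keeps the effect of each readable. Because everything is scored on a single test split, small differences between neighbouring settings may be noise.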


In [4]:
# Repeat with a single decision tree and compare the results
# As expected, the random forest has the smaller MSE (7.78 vs. 33.32), i.e. the better result on this split
boston = datasets.load_boston()
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=1)
clf = DecisionTreeRegressor(random_state=2)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
print('MSE = ', metrics.mean_squared_error(y_test, y_pred))

MSE =  33.320392156862745


## Use RandomForestClassifier on WINE

In [5]:
# Load the wine dataset
wine = datasets.load_wine()

# Split the data into training and test sets
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.2, random_state=1)

# Build a random forest model with 10 trees, each with a maximum depth of 5
rfc = RandomForestClassifier(n_estimators=10, max_depth=5, random_state=2)

# Fit the model on the training set
rfc.fit(x_train, y_train)

# Predict target values for the test set x_test
y_pred = rfc.predict(x_test)

In [6]:
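# This is a classification problem, so use accuracy_score to check prediction accuracy
# With 1 tree and a maximum depth of 5, the forest behaves much like a single decision tree and gave the same Accuracy
# (not strictly identical: by default the forest still bootstraps rows and samples features at each split)
# With more trees, the result was better in this run
print('Accuracy = ', metrics.accuracy_score(y_test, y_pred))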

Accuracy =  0.9722222222222222
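
The comment above about a one-tree forest matching a decision tree can be checked directly. The sketch below compares a plain tree against a one-tree forest with default settings and against one with bootstrapping and feature subsampling switched off. The model labels are made up for illustration, and the accuracies are not guaranteed to coincide, since the forest derives its tree's random seed internally.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

models = {
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=2),
    # Defaults: bootstrap=True and max_features="sqrt" still randomize the single tree
    "forest, 1 tree (defaults)": RandomForestClassifier(
        n_estimators=1, max_depth=5, random_state=2
    ),
    # Closest to a plain tree: train on all rows, consider all features at each split
    "forest, 1 tree (no bootstrap, all features)": RandomForestClassifier(
        n_estimators=1, max_depth=5, bootstrap=False, max_features=None, random_state=2
    ),
}

scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(x_test))
    print(f"{name}: accuracy = {scores[name]:.4f}")
```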


In [7]:
# Repeat with a single decision tree and compare the results
# Accuracy is lower than the random forest's: many trees together beat a single tree here
# Use DecisionTreeClassifier, since wine is a classification task
wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.2, random_state=1)
clf = DecisionTreeClassifier(random_state=2)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
print('Accuracy = ', metrics.accuracy_score(y_test, y_pred))

Accuracy =  0.9444444444444444
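
The wine dataset has 178 samples, so a 20% test split holds 36 of them, and the gap between 0.9722 and 0.9444 is a single test sample (35 vs. 34 correct). A comparison over several splits is therefore more informative than one split. The following is a minimal sketch using 5-fold cross-validation; the fold count and the `candidates` dictionary are assumptions made for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

candidates = {
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=2),
    "RandomForestClassifier": RandomForestClassifier(
        n_estimators=10, max_depth=5, random_state=2
    ),
}

cv_scores = {}
for name, model in candidates.items():
    # For classifiers, an integer cv uses stratified folds, so each fold keeps the class balance
    cv_scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean = {cv_scores[name].mean():.4f}, std = {cv_scores[name].std():.4f}")
```

If the mean accuracies differ by less than the spread across folds, the single-split ranking should be read with caution.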
