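## [Key points]
Make sure you understand what each hyperparameter of the random forest model means, and observe how tuning them affects the results.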

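## Homework

1. Try adjusting the parameters of RandomForestClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with those of a regression model and a decision tree.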

In [1]:
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

In [5]:
# read iris dataset
iris = datasets.load_iris()

# split the data into training and test subsets
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target,
                                                   test_size=0.25, random_state=4)

# build model
clf = RandomForestClassifier(n_estimators=30, max_depth=6)

# train model
clf.fit(x_train, y_train)

# predict with model
y_pred = clf.predict(x_test)

In [6]:
acc = metrics.accuracy_score(y_test, y_pred)
print('Accuracy: ', acc)

Accuracy:  0.9473684210526315


In [7]:
print(iris.feature_names)

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']


In [8]:
print('Feature importance: ', clf.feature_importances_)

Feature importance:  [0.11933546 0.04010559 0.41890634 0.42165261]
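
The raw importance array is easier to read when each value is paired with its feature name. The following is a minimal sketch of that pairing, not part of the original run: `random_state=0` on the classifier and the name `ranked` are assumptions added so the ranking is repeatable, so the values it prints will not match the output above exactly.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4)

# random_state=0 is an assumption added here so the ranking is repeatable
clf = RandomForestClassifier(n_estimators=30, max_depth=6, random_state=0)
clf.fit(x_train, y_train)

# pair each feature name with its importance, most important first
ranked = sorted(zip(iris.feature_names, clf.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f'{name:<20} {score:.4f}')
```

The importances of a fitted forest are normalized to sum to 1, so each value can be read as that feature's share of the total impurity reduction.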


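#### Changing the model's hyperparameters changed the results: the accuracy score dropped compared with the previous settings, and the importance of sepal width (cm) rose.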
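
One way to check an observation like this is to train several settings on the same split with a fixed seed, so that differences come from the hyperparameters and not from the forest's own randomness. The sketch below is illustrative only: the `settings` grid and `random_state=0` are assumptions, not values from the original run. Keep in mind that the test split holds only 38 samples (an accuracy of 0.947 is 36 of 38 correct), so a single misclassified flower shifts the score by about 2.6 percentage points.

```python
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4)

# hypothetical (n_estimators, max_depth) pairs, chosen only for illustration
settings = [(10, 2), (30, 6), (100, None)]

results = {}
for n_estimators, max_depth in settings:
    # fixed seed (an assumption) so runs differ only by hyperparameters
    clf = RandomForestClassifier(n_estimators=n_estimators,
                                 max_depth=max_depth, random_state=0)
    clf.fit(x_train, y_train)
    acc = metrics.accuracy_score(y_test, clf.predict(x_test))
    results[(n_estimators, max_depth)] = acc
    print(f'n_estimators={n_estimators:>3}, max_depth={str(max_depth):>4}: '
          f'accuracy={acc:.4f}')
```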

## Trying the boston dataset

In [10]:
from sklearn.ensemble import RandomForestRegressor

In [12]:
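# load the boston data
# note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs an older scikit-learn release
boston = datasets.load_boston()

# split the data into training and test subsets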
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target,
                                                    test_size=0.25, random_state=4)

# build model
clf = RandomForestRegressor(n_estimators=20, max_depth=4)

# train model
clf.fit(x_train, y_train)

# predict with model
y_pred = clf.predict(x_test)

In [15]:
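# scikit-learn metrics expect (y_true, y_pred); r2_score is not symmetric,
# so the order matters there (the R-square printed below was recorded with
# the arguments reversed)
mae = metrics.mean_absolute_error(y_test, y_pred)
mse = metrics.mean_squared_error(y_test, y_pred)
r2 = metrics.r2_score(y_test, y_pred)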
print('MAE: ', mae)
print('MSE: ', mse)
print('R-square: ', r2)

MAE:  2.7543256440624297
MSE:  15.863412504090885
R-square:  0.8027156766654855
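
scikit-learn's metric functions take their arguments in `(y_true, y_pred)` order. MAE and MSE give the same value either way, but R-square is normalized by the variance of whichever array is passed first, so swapping the arguments changes the score. A minimal sketch with toy numbers (these values are for illustration only and are not taken from the boston run):

```python
from sklearn import metrics

# toy values for illustration only
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# MAE is symmetric: argument order does not change the value
mae_fwd = metrics.mean_absolute_error(y_true, y_pred)
mae_rev = metrics.mean_absolute_error(y_pred, y_true)

# R-square is not symmetric: the first argument is treated as ground truth
r2_fwd = metrics.r2_score(y_true, y_pred)
r2_rev = metrics.r2_score(y_pred, y_true)

print('MAE (true, pred):', mae_fwd, ' MAE (pred, true):', mae_rev)
print('R2  (true, pred):', r2_fwd, ' R2  (pred, true):', r2_rev)
```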


In [17]:
print(boston.feature_names)

['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']


In [18]:
print("Feature importance: ", clf.feature_importances_)

Feature importance:  [4.97020698e-02 8.58911836e-06 2.07779296e-03 6.76497777e-05
 1.96351349e-02 4.80821553e-01 4.14814728e-03 5.13663809e-02
 5.10034621e-03 3.96634664e-03 2.45482849e-02 4.11939963e-03
 3.54438305e-01]
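
The second homework item also names the wine dataset and asks for a comparison with a regression model and a decision tree. The sketch below is one possible way to set that up, not the original author's solution. Several choices in it are assumptions: logistic regression stands in for the "regression model" because wine is a classification task, the features are standardized before fitting it, and `random_state=0` and `max_iter=1000` are picked only to keep the run repeatable and converging.

```python
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=4)

# three models trained and scored on the same split
models = {
    # scaling helps logistic regression converge on the raw wine features
    'logistic regression': make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    'decision tree': DecisionTreeClassifier(random_state=0),
    'random forest': RandomForestClassifier(n_estimators=30, max_depth=6,
                                            random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    scores[name] = metrics.accuracy_score(y_test, model.predict(x_test))
    print(f'{name:<20} accuracy = {scores[name]:.4f}')
```

A single train/test split on a dataset this small gives noisy rankings, so small gaps between the three scores should not be over-interpreted.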
