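## [Assignment Focus]
Make sure you understand what each hyperparameter of the random forest model means, and observe how tuning the hyperparameters affects the results.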
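
As a quick reference for the hyperparameters discussed here, the sketch below prints the default values of a few commonly tuned `RandomForestClassifier` parameters. It assumes a recent scikit-learn. Some defaults have changed across versions, for example `n_estimators` (10 before 0.22) and `max_features`, so compare the output against your installed version.

```python
from sklearn.ensemble import RandomForestClassifier

# A few commonly tuned hyperparameters and what they control
param_notes = {
    "n_estimators": "number of trees in the forest",
    "max_depth": "maximum depth of each tree (None = grow until leaves are pure)",
    "max_features": "number of features considered at each split",
    "min_samples_leaf": "minimum number of samples required in a leaf",
    "bootstrap": "whether each tree is trained on a bootstrap sample",
}

# get_params() returns the current (here: default) value of every hyperparameter
defaults = RandomForestClassifier().get_params()
for name, note in param_notes.items():
    print(f"{name:>16} = {defaults[name]!r:<8} ({note})")
```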

## Assignment

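1. Try adjusting the parameters of `RandomForestClassifier(...)` and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with those of the regression model and the decision tree.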

```python
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import tree
```

### 1: Regression model (this cell actually fits a `RandomForestClassifier`)

```python
# Load the breast cancer dataset
breast = datasets.load_breast_cancer()
# Split into training/test sets
x_train, x_test, y_train, y_test = train_test_split(breast.data, breast.target, test_size=0.25, random_state=4)

# Build the model (20 trees, each with a maximum depth of 4)
# No random_state is set, so the results vary from run to run
clf1 = RandomForestClassifier(n_estimators=20, max_depth=4)
# Train the model
clf1.fit(x_train, y_train)

# Predict on the test set
y_pred = clf1.predict(x_test)
```

```python
acc = metrics.accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)
print()
print("Feature names: ", breast.feature_names)
print()
print("Feature importance: ", clf1.feature_importances_)
```

```
Accuracy:  0.9230769230769231

Feature names:  ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']

Feature importance:  [0.04430746 0.00501213 0.10832328 0.146433   0.00098037 0.00621102
 0.04181459 0.01987926 0.00099688 0.00346292 0.01579157 0.00214538
 0.00614885 0.00479081 0.00414574 0.00326158 0.00321969 0.00214744
 0.00323715 0.0012525  0.20558122 0.00710826 0.12702122 0.06369325
 0.00867965 0.00482621 0.0217203  0.12232096 0.00601467 0.00947262]
```
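
The section 1 cell is labelled a regression model but trains the same `RandomForestClassifier` as section 3. If the intent was a regression-family baseline, one option is logistic regression on standardized features, fitted on the same split. This is an assumption for illustration, not the original notebook's code.

```python
from sklearn import datasets, metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

breast = datasets.load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    breast.data, breast.target, test_size=0.25, random_state=4)

# Standardize the features, then fit logistic regression (a linear classifier)
logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
logreg.fit(x_train, y_train)

# Evaluate on the same test set used by the other models
y_pred_lr = logreg.predict(x_test)
lr_acc = metrics.accuracy_score(y_test, y_pred_lr)
print("Accuracy: ", lr_acc)
```

Standardizing first helps the solver converge. Raising `max_iter` above the default avoids convergence warnings on unscaled variants.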


### 2: Decision Tree

```python
# Build the model
clf2 = DecisionTreeClassifier()
# Train the model
clf2.fit(x_train, y_train)
# Predict on the test set
y_pred = clf2.predict(x_test)
```

```python
acc = metrics.accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)
print()
print("Feature names: ", breast.feature_names)
print()
print("Feature importance: ", clf2.feature_importances_)
```

```
Accuracy:  0.8811188811188811

Feature names:  ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']

Feature importance:  [0.         0.         0.         0.         0.         0.
 0.         0.         0.00660957 0.         0.00793148 0.
 0.00191154 0.01243987 0.         0.         0.         0.
 0.00970241 0.         0.         0.04163662 0.75416545 0.06715844
 0.05147346 0.         0.0092534  0.03771776 0.         0.        ]
```
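
A `DecisionTreeClassifier` with default settings keeps splitting until its leaves are pure, so it usually fits the training set almost perfectly. Comparing training and test accuracy shows this overfitting. The sketch below also tries a depth-limited tree for contrast. It fixes `random_state=0`, which the original cell does not, so the runs are repeatable. The depth value 4 is an arbitrary choice.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

breast = datasets.load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    breast.data, breast.target, test_size=0.25, random_state=4)

results = {}
for depth in (None, 4):  # None = unrestricted depth (the default)
    tree_clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree_clf.fit(x_train, y_train)
    train_acc = metrics.accuracy_score(y_train, tree_clf.predict(x_train))
    test_acc = metrics.accuracy_score(y_test, tree_clf.predict(x_test))
    results[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}")
```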


### 3: Random Forest

```python
# Build the model (20 trees, each with a maximum depth of 4)
clf3 = RandomForestClassifier(n_estimators=20, max_depth=4)

# Train the model
clf3.fit(x_train, y_train)

# Predict on the test set
y_pred = clf3.predict(x_test)
```

```python
acc = metrics.accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)
```

```
Accuracy:  0.9440559440559441
```
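
Sections 1 and 3 train `RandomForestClassifier(n_estimators=20, max_depth=4)` with identical settings, yet they report 0.923 and 0.944. No `random_state` is set, so each fit draws different bootstrap samples and feature subsets. The minimal sketch below refits the same configuration with several seeds (the values 0 to 9 are arbitrary) to show how much test accuracy moves from randomness alone.

```python
import statistics

from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

breast = datasets.load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    breast.data, breast.target, test_size=0.25, random_state=4)

accs = []
for seed in range(10):  # arbitrary seeds 0..9
    rf = RandomForestClassifier(n_estimators=20, max_depth=4, random_state=seed)
    rf.fit(x_train, y_train)
    accs.append(metrics.accuracy_score(y_test, rf.predict(x_test)))

print("min / mean / max:", min(accs), statistics.mean(accs), max(accs))
```

If the spread is comparable to the gap between two configurations, a single run cannot tell them apart.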


```python
print(breast.feature_names)
print()
print("Feature importance: ", clf3.feature_importances_)
```

```
['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']

Feature importance:  [0.03236825 0.02394826 0.03627399 0.05895644 0.0008488  0.01828513
 0.04963515 0.1190815  0.00203901 0.00843916 0.00564041 0.00463316
 0.00800855 0.01400204 0.00276574 0.0004048  0.00089513 0.00133968
 0.00542207 0.00074417 0.17326292 0.01472105 0.22540041 0.09071942
 0.0124593  0.01033257 0.03772692 0.02681151 0.01157727 0.00325719]
```
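
The raw `feature_importances_` array is hard to read next to a separately printed list of names. Pairing the two and sorting makes the ranking explicit. This sketch refits the section 3 configuration with `random_state=0`, a value not used in the original. The exact numbers will therefore differ from the output above.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

breast = datasets.load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    breast.data, breast.target, test_size=0.25, random_state=4)

rf = RandomForestClassifier(n_estimators=20, max_depth=4, random_state=0)
rf.fit(x_train, y_train)

# Pair each feature name with its importance and sort, largest first
ranked = sorted(zip(breast.feature_names, rf.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked[:5]:
    print(f"{name:<25} {importance:.4f}")
```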


### Ans1
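- Case 1: With the number of trees left at its default value (100 in scikit-learn 0.22 and later, 10 before), accuracy peaked at a maximum tree depth of 15. Increasing or decreasing the depth from 15 lowered accuracy.
- Case 2: With the maximum depth fixed at 15, using fewer trees than the default raised accuracy.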
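
The tuning above was done by hand on a single train/test split without a fixed seed, so small accuracy differences may be noise. A more systematic approach searches `n_estimators` and `max_depth` together using `GridSearchCV` with 5-fold cross-validation on the training set. The sketch below assumes an arbitrary grid of values and `random_state=0`.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

breast = datasets.load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    breast.data, breast.target, test_size=0.25, random_state=4)

# Arbitrary grid for illustration; widen or narrow it as needed
param_grid = {"n_estimators": [10, 50, 100], "max_depth": [4, 15, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(x_train, y_train)  # cross-validates on the training set only

print("best params:", search.best_params_)
print("best CV accuracy:", search.best_score_)
print("test accuracy:", search.score(x_test, y_test))
```

Choosing parameters by cross-validation on the training set keeps the test set untouched for the final comparison.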
      


### Ans2
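In these runs, the random forest in section 3 had the highest accuracy (0.944). The decision tree scored 0.881 and the section 1 model scored 0.923. However, the section 1 "regression model" is also a `RandomForestClassifier` with the same settings, so its gap to section 3 reflects run-to-run randomness rather than a different model family.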
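
The assignment also asks for the same comparison on another dataset. `load_boston` was removed in scikit-learn 1.2, so this sketch uses `load_wine` only. The following choices are assumptions for illustration:

- The model settings: logistic regression on standardized features, a default decision tree, and the 20-tree, depth-4 forest.
- The fixed seeds.
- 5-fold cross-validation in place of a single split, which reduces run-to-run noise.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

wine = datasets.load_wine()

models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=20, max_depth=4, random_state=0),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 stratified folds (the default for classifiers)
    scores[name] = cross_val_score(model, wine.data, wine.target, cv=5).mean()
    print(f"{name:<20} {scores[name]:.3f}")
```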