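## Homework

1. Try adjusting the parameters of RandomForestClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with those of the regression model and the decision tree.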

In [1]:
from sklearn import datasets, metrics
# For a classification problem use RandomForestClassifier; for a regression problem use RandomForestRegressor
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score

### RandomForestClassifier

In [2]:
# load breast cancer wisconsin dataset
breast_cancer = datasets.load_breast_cancer()
print(breast_cancer.data.shape)
print(breast_cancer.feature_names)

(569, 30)
['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']


In [3]:
# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(breast_cancer.data, breast_cancer.target, test_size=0.1, random_state=42)
# Build the model
clf = RandomForestClassifier(random_state = 42)
# Train the model
clf.fit(x_train, y_train)
# Predict on the test set
y_pred = clf.predict(x_test)

# Accuracy
acc = accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)

Accuracy:  0.9649122807017544




In [4]:
# Feature importances
print(breast_cancer.feature_names)
print("Feature importance: ", clf.feature_importances_)

['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
Feature importance:  [0.06116707 0.01201478 0.08393627 0.01358041 0.00658298 0.00412596
 0.02394689 0.03881919 0.00426022 0.00388559 0.00819336 0.00276693
 0.00404347 0.05941063 0.00617311 0.00342939 0.01418106 0.01515916
 0.0035336  0.00262853 0.08229006 0.01496994 0.08487118 0.09118309
 0.00256669 0.05475003 0.07604696 0.20032745 0.01568315 0.00547286]
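The two printed arrays above are hard to match up by eye. A minimal sketch of one way to pair each feature name with its importance and list the largest ones follows. It refits the same model so that it runs on its own; the names `data`, `order`, and `top5` are illustrative and not part of the original notebook, and the exact ranking may vary with the scikit-learn version.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Same dataset, split, and seed as the cells above
data = datasets.load_breast_cancer()
x_tr, x_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.1, random_state=42)
clf_rank = RandomForestClassifier(random_state=42).fit(x_tr, y_tr)

# Feature indices sorted from most to least important
order = np.argsort(clf_rank.feature_importances_)[::-1]
top5 = [(str(data.feature_names[i]), float(clf_rank.feature_importances_[i]))
        for i in order[:5]]

for name, score in top5:
    print(f"{name:<25s} {score:.4f}")
```

Impurity-based importances sum to 1, so each value can be read as that feature's share of the total.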


### RandomForestRegressor

In [5]:
# Load the Boston house-prices dataset
# (load_boston was removed in scikit-learn 1.2, so this cell needs an older release)
boston = datasets.load_boston()
print(boston.data.shape)
print(boston.feature_names)

(506, 13)
['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']


In [6]:
# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.1, random_state=42)
# Build a random forest regression model
regr = RandomForestRegressor(random_state = 42)
# Fit the model on the training data
regr.fit(x_train, y_train)
# Predict on the test data
y_pred = regr.predict(x_test)

# Gap between predicted and actual values, measured with MSE
print(f'Mean squared error: {mean_squared_error(y_test, y_pred): .2f}')
# Gap between predicted and actual values, measured with the r2 score
print(f'r2_score: {r2_score(y_test, y_pred): .2f}')

Mean squared error:  4.58
r2_score:  0.93
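On scikit-learn releases that no longer ship `load_boston`, the same regression workflow can still be tried on another bundled dataset. The sketch below swaps in `load_diabetes` purely as an assumption for illustration; it is a different dataset, so its scores are not comparable with the Boston numbers above, and the names ending in `_alt` are hypothetical.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Substitute dataset (assumption): diabetes progression, bundled with scikit-learn
diabetes = datasets.load_diabetes()
x_tr_alt, x_te_alt, y_tr_alt, y_te_alt = train_test_split(
    diabetes.data, diabetes.target, test_size=0.1, random_state=42)

# Same model settings as the Boston cell
regr_alt = RandomForestRegressor(random_state=42)
regr_alt.fit(x_tr_alt, y_tr_alt)
pred_alt = regr_alt.predict(x_te_alt)

mse_alt = mean_squared_error(y_te_alt, pred_alt)
r2_alt = r2_score(y_te_alt, pred_alt)
print(f"Mean squared error: {mse_alt:.2f}")
print(f"r2_score: {r2_alt:.2f}")
```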




In [7]:
# Feature importances
print(boston.feature_names)
print("Feature importance: ", regr.feature_importances_)

['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']
Feature importance:  [0.04642013 0.00100506 0.00479196 0.00253557 0.01904088 0.52174224
 0.00998703 0.06047079 0.00291067 0.01685502 0.01288787 0.01000818
 0.2913446 ]
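To make the output above easier to read, the printed names and importances can be paired and sorted. This sketch copies the values from the output verbatim rather than refitting the model, so it runs without the dataset; the variable names are illustrative.

```python
# Values copied from the printed output above
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
         'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
importances = [0.04642013, 0.00100506, 0.00479196, 0.00253557, 0.01904088,
               0.52174224, 0.00998703, 0.06047079, 0.00291067, 0.01685502,
               0.01288787, 0.01000818, 0.2913446]

# Sort (name, importance) pairs from most to least important
ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name:<8s} {score:.4f}")

# Combined share of the two most important features
top2_share = ranked[0][1] + ranked[1][1]
print(f"Top two features: {top2_share:.2%}")
```

In this run RM and LSTAT together account for about 81% of the total importance, with DIS a distant third.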


### ANS1:

In [8]:
# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(breast_cancer.data, breast_cancer.target, test_size=0.1, random_state=42)
# Build the model with adjusted parameters
clf_ch = RandomForestClassifier(n_estimators = 100, criterion = 'entropy', min_samples_split = 3, min_samples_leaf = 2, random_state = 42)
# Train the model
clf_ch.fit(x_train, y_train)
# Predict on the test set
y_pred_ch = clf_ch.predict(x_test)

# Accuracy
acc_ch = accuracy_score(y_test, y_pred_ch)
print("Accuracy: ", acc_ch)

Accuracy:  0.9649122807017544


### No apparent improvement: accuracy stays at 0.9649
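With `test_size=0.1` the test set holds only 57 of the 569 samples, so 0.9649 corresponds to 55 of 57 correct and a single prediction shifts accuracy by about 1.75 points. Small parameter effects can easily be hidden at that resolution. One possible way to get a steadier comparison is k-fold cross-validation over the whole dataset. The sketch below assumes 5 folds, and the labels `default` and `adjusted` are illustrative; it is not part of the original assignment.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = datasets.load_breast_cancer(return_X_y=True)

# The two configurations used in the notebook
candidates = {
    "default": RandomForestClassifier(random_state=42),
    "adjusted": RandomForestClassifier(
        n_estimators=100, criterion='entropy',
        min_samples_split=3, min_samples_leaf=2, random_state=42),
}

cv_means = {}
for label, model in candidates.items():
    # Accuracy on each of 5 folds (fold count is an assumption)
    scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
    cv_means[label] = float(scores.mean())
    print(f"{label:<9s} mean accuracy: {scores.mean():.4f} (std {scores.std():.4f})")
```

If the two means differ by less than the spread across folds, the parameter change is probably not making a meaningful difference.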

### ANS2:
Compared with [Day_038_HW.ipynb](https://github.com/Lance0218/ML100-Days/blob/master/homework/Day_038_HW.ipynb) and [Day_042_HW.ipynb](https://github.com/Lance0218/ML100-Days/blob/master/homework/Day_042_HW.ipynb):  

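LR: regression model, DT: decision tree, RF: random forest (this notebook). The best value in each column is in bold.

Classification (accuracy):

| Model | Accuracy |
|-------|----------|
| LR | **0.9825** |
| DT | 0.9298 |
| RF | 0.9649 |

Regression:

| Model | MSE | r2_score |
|-------|-----|----------|
| LR | 15.00 | 0.76 |
| DT | 10.93 | 0.82 |
| RF | **4.58** | **0.93** |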
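The assignment also mentions the wine dataset, which the comparison above does not cover. A minimal sketch of how the three model families could be compared on it with the same split settings is shown below. The choice of `LogisticRegression` as the regression-style classifier, the `max_iter` value, and all variable names are assumptions for illustration; no results are quoted because they were not part of the original notebook.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

wine = datasets.load_wine()
xw_train, xw_test, yw_train, yw_test = train_test_split(
    wine.data, wine.target, test_size=0.1, random_state=42)

models = {
    # A large max_iter (assumption) helps the solver converge on unscaled features
    "LR": LogisticRegression(max_iter=10000),
    "DT": DecisionTreeClassifier(random_state=42),
    "RF": RandomForestClassifier(random_state=42),
}

wine_acc = {}
for label, model in models.items():
    model.fit(xw_train, yw_train)
    wine_acc[label] = accuracy_score(yw_test, model.predict(xw_test))
    print(f"{label}: {wine_acc[label]:.4f}")
```

The wine test split is very small at this `test_size`, so differences between models here should be read with caution.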