## Summary of D42 Decision Tree

Four steps to build a model

Building a machine learning model in Scikit-learn is straightforward. The workflow is roughly the following four steps:

    1. Load the data and check its shape (how many samples (rows), how many features (columns), and what type is the label?)
        Read a .csv file with pandas: pd.read_csv
        Read a .txt file with numpy: np.loadtxt
        Use a dataset bundled with Scikit-learn: sklearn.datasets.load_xxx
        Check the data size: data.shape (data should be an np.array or a DataFrame)
    2. Split the data into training (train) and test (test) sets
        x_train, x_test, y_train, y_test = train_test_split(x, y)
    3. Build the model and fit it to the training data
        clf = DecisionTreeClassifier()
        clf.fit(x_train, y_train)
    4. Feed the test features into the trained model to get predictions, then evaluate them against the test labels (y_test)
        y_pred = clf.predict(x_test)
        accuracy_score(y_test, y_pred)
        f1_score(y_test, y_pred)
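
The four steps above can be sketched end to end as follows. This is a minimal illustration, assuming the bundled iris dataset, a 75/25 split, and `random_state=0` (none of which the steps prescribe). Because iris has three classes, `f1_score` needs an explicit `average` argument; `"macro"` is one reasonable choice, not the only one.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Step 1: load the data and check its shape
iris = load_iris()
x, y = iris.data, iris.target
print(x.shape, y.shape)  # (150, 4) (150,)

# Step 2: split into train and test sets (split ratio and seed are assumptions)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0
)

# Step 3: build the model and fit it on the training data
clf = DecisionTreeClassifier(random_state=0)
clf.fit(x_train, y_train)

# Step 4: predict on the test features and evaluate against y_test
y_pred = clf.predict(x_test)
acc = accuracy_score(y_test, y_pred)
# Multiclass targets require an averaging strategy for F1
f1 = f1_score(y_test, y_pred, average="macro")
print(f"accuracy: {acc:.3f}, macro F1: {f1:.3f}")
```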

### [graphviz](https://medium.com/@rnbrown/creating-and-visualizing-decision-trees-with-python-f8e8fa394176)
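
As a rough sketch of what the linked article covers, a fitted tree can be exported to Graphviz DOT source with `sklearn.tree.export_graphviz`. The dataset, `max_depth=2`, and `random_state=0` are assumptions chosen for illustration. `export_text` is shown as a plain-text alternative that does not need Graphviz installed.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# With out_file=None, export_graphviz returns the DOT source as a string
dot_source = export_graphviz(
    clf,
    out_file=None,
    feature_names=list(iris.feature_names),
    class_names=list(iris.target_names),
    filled=True,
)
print(dot_source[:80])

# Plain-text rendering of the same tree; no Graphviz required
tree_text = export_text(clf, feature_names=list(iris.feature_names))
print(tree_text)
```

If the separate `graphviz` Python package and the Graphviz binaries are installed, `graphviz.Source(dot_source)` should render the DOT string as an image in a notebook.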

## Homework

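1. Try adjusting the parameters of DecisionTreeClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with those of a regression model.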

In [16]:
from sklearn import datasets, metrics, linear_model
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

In [8]:
iris = datasets.load_iris()

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.25, random_state=4)

clf = DecisionTreeClassifier(criterion='entropy', max_depth=2, max_features=2)

clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

acc = metrics.accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)

Accuracy:  0.868421052631579
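
For the first homework item, one way to see how the parameters affect the result is to sweep a few values and compare test accuracy. The sketch below reuses the same split as the cell above; the particular grid of `criterion` and `max_depth` values, and `random_state=0` for the classifier, are assumptions chosen for illustration.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4
)

# Map (criterion, max_depth) -> test accuracy
results = {}
for criterion in ("gini", "entropy"):
    for max_depth in (1, 2, 3, None):  # None lets the tree grow until leaves are pure
        clf = DecisionTreeClassifier(
            criterion=criterion, max_depth=max_depth, random_state=0
        )
        clf.fit(X_train, y_train)
        acc = metrics.accuracy_score(y_test, clf.predict(X_test))
        results[(criterion, max_depth)] = acc
        print(f"criterion={criterion}, max_depth={max_depth}: accuracy={acc:.3f}")
```

A single train/test split gives a noisy estimate on a dataset this small, so differences between settings should be read with caution.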


In [21]:
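# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell only runs on older versions.
boston = datasets.load_boston()
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=5)
# Note: from scikit-learn 1.0 on, criterion='mae' is spelled 'absolute_error'.
dtr = DecisionTreeRegressor(criterion='mae', max_depth=3)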
dtr.fit(X_train, y_train)
y_pred = dtr.predict(X_test)
r_score = r2_score(y_test, y_pred)
print(f'R2 of Decision Tree: {r_score}')
lasso = linear_model.Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
y_pred = lasso.predict(X_test)
r_score = r2_score(y_test, y_pred)
print(f'R2 of Lasso: {r_score}')

R2 of Decision Tree: 0.7220880845500938
R2 of Lasso: 0.7006377089925998
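
Since `load_boston` is no longer available in recent scikit-learn releases, the same tree-versus-Lasso comparison can be sketched on another bundled regression dataset. The example below swaps in `load_diabetes`, which is an assumption made for illustration and not the dataset used above, so its scores are not comparable with the Boston results. It also uses the default squared-error criterion to stay version independent.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Substitute dataset (assumption): diabetes instead of Boston housing
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=5
)

models = {
    "Decision Tree": DecisionTreeRegressor(max_depth=3, random_state=0),
    "Lasso": Lasso(alpha=0.1),
}

# Fit each model on the same split and record its R2 on the test set
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"R2 of {name}: {scores[name]:.3f}")
```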
