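## [Assignment focus]
By now you should have a clear picture of what the data in a dataset looks like, including its features and labels. Remember that in any future project, the data must be cleaned into this same format before it can be fed to a model for training.
Today's assignment introduces decision trees, a very important model. Make sure you understand what each hyperparameter of the model means, then try adjusting them and observe how they affect the final predictions.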

## Assignment

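1. Try adjusting the parameters of DecisionTreeClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with those of the regression models.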
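One way to approach the first task is to loop over a small grid of hyperparameter values and record the test accuracy for each combination. The sketch below assumes the same wine split used in this notebook. The grid values (`depths`, `leaf_sizes`) and the fixed `random_state=0` on the tree are illustrative choices, not part of the assignment.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=42
)

# Hypothetical grid of values to try; not taken from the assignment.
depths = [1, 2, 3, None]
leaf_sizes = [1, 2, 5]

results = {}
for depth in depths:
    for leaf in leaf_sizes:
        clf = DecisionTreeClassifier(
            criterion="gini",
            max_depth=depth,
            min_samples_leaf=leaf,
            random_state=0,  # fixed so repeated runs build the same tree
        )
        clf.fit(x_train, y_train)
        acc = metrics.accuracy_score(y_test, clf.predict(x_test))
        results[(depth, leaf)] = acc
        print("max_depth=%s, min_samples_leaf=%d -> accuracy %.2f" % (depth, leaf, acc))
```

Fixing `random_state` on the tree means any change in accuracy comes from the hyperparameters rather than from run-to-run randomness.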

In [1]:
from sklearn import datasets, metrics
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.model_selection import train_test_split

### wine

In [2]:
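# Load the wine dataset and hold out 25% of the samples for testing
wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size = 0.25, random_state = 42)
# Gini criterion; no random_state is set on the tree, so results can vary between runs
clf = DecisionTreeClassifier(criterion = 'gini', max_depth = None, min_samples_split = 3, min_samples_leaf = 2)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)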

In [3]:
acc = metrics.accuracy_score(y_test, y_pred)
print('Accuracy: %.2f' % acc)

Accuracy: 0.98


In [5]:
print('Feature:', wine.feature_names)

Feature: ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']


In [7]:
print('Feature importance:', clf.feature_importances_)

Feature importance: [0.03080041 0.         0.         0.         0.00866261 0.
 0.4206973  0.         0.         0.40797545 0.         0.
 0.13186424]
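The raw importance array is easier to read when each value is paired with its feature name. The sketch below copies the feature names and the Gini-tree importances from the output above and sorts them. With a fitted model, `wine.feature_names` and `clf.feature_importances_` could be zipped the same way.

```python
# Values copied from the notebook output above (Gini tree on wine).
feature_names = [
    "alcohol", "malic_acid", "ash", "alcalinity_of_ash", "magnesium",
    "total_phenols", "flavanoids", "nonflavanoid_phenols", "proanthocyanins",
    "color_intensity", "hue", "od280/od315_of_diluted_wines", "proline",
]
importances = [
    0.03080041, 0.0, 0.0, 0.0, 0.00866261, 0.0,
    0.4206973, 0.0, 0.0, 0.40797545, 0.0, 0.0,
    0.13186424,
]

# Sort (name, importance) pairs from most to least important.
ranked = sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)

# Print only the features the tree actually split on.
for name, score in ranked:
    if score > 0:
        print("%-30s %.4f" % (name, score))
```

In this run the tree relies almost entirely on `flavanoids`, `color_intensity`, and `proline`. The other features either receive very small weights or are never used for a split.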


In [8]:
# Same settings as before, but splits are chosen by entropy (information gain) instead of Gini
clf = DecisionTreeClassifier(criterion = 'entropy', max_depth = None, min_samples_split = 3, min_samples_leaf = 2)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)

In [9]:
acc = metrics.accuracy_score(y_test, y_pred)
print('Accuracy: %.2f' % acc)

Accuracy: 0.87
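To make the difference between the `'gini'` and `'entropy'` criteria concrete, the sketch below computes both impurity measures for a few small label lists. These are the textbook definitions written in plain Python, not scikit-learn's internal implementation, and the label lists are made-up examples.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Return the fraction of samples belonging to each class."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2)."""
    return 1.0 - sum(p ** 2 for p in class_probabilities(labels))


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    return sum(-p * math.log2(p) for p in class_probabilities(labels))


# Hypothetical node contents, chosen only for illustration.
pure_node = [0, 0, 0, 0]
mixed_node = [0, 0, 1, 1]
three_class_node = [0, 1, 2]

for node in (pure_node, mixed_node, three_class_node):
    print(node, "gini=%.3f" % gini(node), "entropy=%.3f" % entropy(node))
```

Both measures are zero for a pure node and grow as the classes become more mixed. They scale differently, which is one reason the two criteria can pick different splits.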


In [10]:
print('Feature importance:', clf.feature_importances_)

Feature importance: [0.31841754 0.         0.00938614 0.         0.         0.
 0.         0.         0.02268516 0.02429553 0.0709232  0.4577226
 0.09656984]
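The second task asks for a comparison with a regression model. On the wine classification problem, one possible baseline is logistic regression. The sketch below assumes that choice and adds feature scaling through a pipeline so the solver converges. The pipeline, `max_iter=1000`, and the fixed `random_state=0` are illustrative assumptions, not part of the original notebook.

```python
from sklearn import datasets, metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=42
)

models = {
    # Same tree settings as the notebook, plus a fixed seed for repeatability.
    "decision tree": DecisionTreeClassifier(
        min_samples_split=3, min_samples_leaf=2, random_state=0
    ),
    # Scaling first helps logistic regression converge on these features.
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    scores[name] = metrics.accuracy_score(y_test, model.predict(x_test))
    print("%-20s accuracy %.2f" % (name, scores[name]))
```

Because both models are scored on the same held-out split, the two accuracies are directly comparable.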


### boston

In [11]:
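# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs an older scikit-learn version to run
boston = datasets.load_boston()
boston.target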

array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
       18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17.5, 20.2, 18.2, 13.6, 19.6,
       15.2, 14.5, 15.6, 13.9, 16.6, 14.8, 18.4, 21. , 12.7, 14.5, 13.2,
       13.1, 13.5, 18.9, 20. , 21. , 24.7, 30.8, 34.9, 26.6, 25.3, 24.7,
       21.2, 19.3, 20. , 16.6, 14.4, 19.4, 19.7, 20.5, 25. , 23.4, 18.9,
       35.4, 24.7, 31.6, 23.3, 19.6, 18.7, 16. , 22.2, 25. , 33. , 23.5,
       19.4, 22. , 17.4, 20.9, 24.2, 21.7, 22.8, 23.4, 24.1, 21.4, 20. ,
       20.8, 21.2, 20.3, 28. , 23.9, 24.8, 22.9, 23.9, 26.6, 22.5, 22.2,
       23.6, 28.7, 22.6, 22. , 22.9, 25. , 20.6, 28.4, 21.4, 38.7, 43.8,
       33.2, 27.5, 26.5, 18.6, 19.3, 20.1, 19.5, 19.5, 20.4, 19.8, 19.4,
       21.7, 22.8, 18.8, 18.7, 18.5, 18.3, 21.2, 19.2, 20.4, 19.3, 22. ,
       20.3, 20.5, 17.3, 18.8, 21.4, 15.7, 16.2, 18. , 14.3, 19.2, 19.6,
       23. , 18.4, 15.6, 18.1, 17.4, 17.1, 13.3, 17.8, 14. , 14.4, 13.4,
       15.6, 11.8, 13.8, 15.6, 14.6, 17.8, 15.4, 21

In [13]:
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size = 0.25, random_state = 42)
# 'mse' was renamed 'squared_error' in scikit-learn 1.0 (the old name was removed in 1.2)
rg = DecisionTreeRegressor(criterion = 'mse', max_depth = None, min_samples_split = 3, min_samples_leaf = 2)
rg.fit(x_train, y_train)
y_pred = rg.predict(x_test)

In [14]:
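mse = metrics.mean_squared_error(y_test, y_pred)
# '%2.f' means width 2 with zero decimals, so the MSE prints as a whole number ('%.2f' would give two decimals)
print('Mean squared error: %2.f' % mse)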

Mean squared error: 20


In [15]:
print('Feature importance:', rg.feature_importances_)

Feature importance: [3.61093563e-02 1.72740091e-03 2.49572988e-03 0.00000000e+00
 9.78203687e-03 6.02997204e-01 1.61734257e-02 7.19610959e-02
 4.27908652e-04 4.08545213e-03 2.79250264e-02 1.32981523e-02
 2.13017211e-01]


In [16]:
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size = 0.25, random_state = 42)
# 'mae' was renamed 'absolute_error' in scikit-learn 1.0 (the old name was removed in 1.2)
rg = DecisionTreeRegressor(criterion = 'mae', max_depth = None, min_samples_split = 3, min_samples_leaf = 2)
rg.fit(x_train, y_train)
y_pred = rg.predict(x_test)

In [17]:
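# The tree was grown with the 'mae' criterion, but it is still scored with MSE so the two runs are comparable
mse = metrics.mean_squared_error(y_test, y_pred)
# '%2.f' means width 2 with zero decimals, so the MSE prints as a whole number
print('Mean squared error: %2.f' % mse)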

Mean squared error: 16
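The `'mse'` and `'mae'` criteria differ in how heavily they penalise large errors. The sketch below computes both quantities by hand on a tiny example. The target and prediction values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values: two small errors of 1 and one large error of 10.
y_true = [20.0, 22.0, 50.0]
y_pred = [21.0, 21.0, 40.0]

print("MSE: %.2f" % mse(y_true, y_pred))  # (1 + 1 + 100) / 3
print("MAE: %.2f" % mae(y_true, y_pred))  # (1 + 1 + 10) / 3
```

The single large error dominates the squared-error average but contributes only linearly to the absolute-error average, so a tree grown with an absolute-error criterion is less influenced by outlying targets.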


In [18]:
print('Feature importance:', rg.feature_importances_)

Feature importance: [0.10835316 0.00709188 0.01050316 0.         0.02154495 0.44979577
 0.03649176 0.06319853 0.01175995 0.01714619 0.02244266 0.03375376
 0.21791822]
