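## [Key Points]
By now you should have a clear picture of how the data in a dataset is structured, including the features and the labels. Keep in mind that in any future project, the data must be cleaned into this same format before it can be fed to a model for training.
Today's assignment introduces the decision tree, a very important model. Make sure you understand what each hyperparameter means, then try adjusting them to see how they affect the final predictions.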
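
As a small illustration of both points, the sketch below prints the shapes of the feature and label arrays and then compares an unconstrained tree with a depth-limited one. The choice of `max_depth=2` and `random_state=0` is an assumption made for this example only, not a setting taken from the assignment.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4
)

# Features are a 2-D array (n_samples, n_features); labels are a 1-D array (n_samples,).
print(x_train.shape, y_train.shape)

# Compare a tree with no depth limit against one capped at depth 2 (assumed value).
for depth in (None, 2):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(x_train, y_train)
    # Report the requested limit, the depth actually reached, the leaf count, and test accuracy.
    print(depth, clf.get_depth(), clf.get_n_leaves(), clf.score(x_test, y_test))
```

The depth-limited tree ends up with fewer leaves; whether that helps or hurts test accuracy depends on the dataset and the split.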

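## Assignment

1. Try adjusting the parameters of DecisionTreeClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with regression models.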

In [1]:
from sklearn import datasets, metrics
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.model_selection import train_test_split

In [3]:
iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.25, random_state=4)
clf = DecisionTreeClassifier()
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)

In [4]:
acc = metrics.accuracy_score(y_test, y_pred)
print('Accuracy: ', acc)

Accuracy:  0.9736842105263158


In [5]:
print(iris.feature_names)

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']


In [6]:
print('Feature importance: ', clf.feature_importances_)

Feature importance:  [0.         0.01796599 0.05992368 0.92211033]
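
The importance values are listed in the same order as `iris.feature_names`, so the two can be paired to make the output easier to read. The following is a minimal sketch of that pairing; `random_state=0` is an assumption added here for repeatability, so the exact numbers may differ from the output above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4
)
clf = DecisionTreeClassifier(random_state=0).fit(x_train, y_train)

# Pair each feature name with its importance and sort from most to least important.
ranked = sorted(
    zip(iris.feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f'{name:20s} {importance:.3f}')
```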


In [12]:
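# Candidate values for each hyperparameter; every combination is tried below.
CRITERION = ['gini', 'entropy']       # impurity measure used to pick splits
MIN_SAMPLES_SPLIT = [10, 50, 100]     # minimum samples a node needs before it may be split
MIN_SAMPLES_LEAF = [5, 30, 50]        # minimum samples each leaf must keep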

for c in CRITERION:
    for split in MIN_SAMPLES_SPLIT:
        for leaf in MIN_SAMPLES_LEAF:
            print(f'{c}, {split}, {leaf}:')
            clf_temp = DecisionTreeClassifier(criterion=c, min_samples_split=split, min_samples_leaf=leaf)
            clf_temp.fit(x_train, y_train)
            print(metrics.accuracy_score(y_test, clf_temp.predict(x_test)))

gini, 10, 5:
0.9736842105263158
gini, 10, 30:
0.9736842105263158
gini, 10, 50:
0.7894736842105263
gini, 50, 5:
0.9736842105263158
gini, 50, 30:
0.9736842105263158
gini, 50, 50:
0.7894736842105263
gini, 100, 5:
0.6842105263157895
gini, 100, 30:
0.6842105263157895
gini, 100, 50:
0.7894736842105263
entropy, 10, 5:
0.9736842105263158
entropy, 10, 30:
0.9736842105263158
entropy, 10, 50:
0.7894736842105263
entropy, 50, 5:
0.9736842105263158
entropy, 50, 30:
0.9736842105263158
entropy, 50, 50:
0.7894736842105263
entropy, 100, 5:
0.6842105263157895
entropy, 100, 30:
0.6842105263157895
entropy, 100, 50:
0.7894736842105263
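
The nested loops above score every combination on the test set. An alternative is scikit-learn's `GridSearchCV`, which picks the combination by cross-validation on the training data and leaves the test set for a final check. The sketch below reuses the same grid; `cv=5` and `random_state=0` are assumptions chosen for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4
)

# Same candidate values as the manual loops.
param_grid = {
    'criterion': ['gini', 'entropy'],
    'min_samples_split': [10, 50, 100],
    'min_samples_leaf': [5, 30, 50],
}

# 5-fold cross-validation on the training set only (cv=5 is an assumed choice).
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(x_train, y_train)

print(search.best_params_)            # combination with the best mean CV accuracy
print(search.best_score_)             # that mean CV accuracy
print(search.score(x_test, y_test))   # accuracy of the refitted best model on the test set
```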


### Boston

In [13]:
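# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so this cell needs an older version.
boston = datasets.load_boston()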

In [18]:
boston.target[:5]

array([24. , 21.6, 34.7, 33.4, 36.2])

In [44]:
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.25, random_state=4)
# Columns printed: min_samples_split, min_samples_leaf, max_depth, R^2 on the test set
for s in [5, 10, 15]:
    for leaf in [5, 10, 15]:
        for d in [5, 10, 15]:
            reg = DecisionTreeRegressor(min_samples_split=s, min_samples_leaf=leaf, max_depth=d)
            reg.fit(x_train, y_train)
            y_pred = reg.predict(x_test)
            r2 = metrics.r2_score(y_test, y_pred)
            print(s, leaf, d, r2)

5 5 5 0.7614813863531334
5 5 10 0.782654772687425
5 5 15 0.7829784720689286
5 10 5 0.7464474103913383
5 10 10 0.7552993074439026
5 10 15 0.7552993074439025
5 15 5 0.7482729081445549
5 15 10 0.751358316594237
5 15 15 0.751358316594237
10 5 5 0.7614813863531336
10 5 10 0.782654772687425
10 5 15 0.7816950282616779
10 10 5 0.7464474103913383
10 10 10 0.7552993074439025
10 10 15 0.7552993074439025
10 15 5 0.7482729081445549
10 15 10 0.751358316594237
10 15 15 0.751358316594237
15 5 5 0.7680838565369525
15 5 10 0.7847672076650746
15 5 15 0.7825011674839009
15 10 5 0.7464474103913382
15 10 10 0.7552993074439026
15 10 15 0.7552993074439025
15 15 5 0.7482729081445549
15 15 10 0.751358316594237
15 15 15 0.751358316594237


In [60]:
from sklearn.linear_model import Lasso, Ridge
lasso = Lasso(alpha=0.1)
lasso.fit(x_train, y_train)
print(metrics.r2_score(y_test, lasso.predict(x_test)))

ridge = Ridge(alpha=0.1)
ridge.fit(x_train, y_train)
print(metrics.r2_score(y_test, ridge.predict(x_test)))

0.7166991429313736
0.7306867958261182
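
For readers on a scikit-learn release that no longer ships `load_boston`, the same tree-versus-linear comparison can be sketched on another bundled regression dataset. The example below uses `load_diabetes` as a stand-in; that substitution and the specific tree settings are assumptions made here, and the scores will not match the Boston numbers above.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Stand-in regression dataset (assumption: used only because load_boston may be unavailable).
data = load_diabetes()
x_train, x_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=4
)

models = {
    'tree': DecisionTreeRegressor(
        min_samples_split=15, min_samples_leaf=5, max_depth=10, random_state=0
    ),
    'lasso': Lasso(alpha=0.1),
    'ridge': Ridge(alpha=0.1),
}

# Fit each model on the same split and record its R^2 on the test set.
scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    scores[name] = r2_score(y_test, model.predict(x_test))
    print(name, scores[name])
```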


### Wine

In [14]:
wine = datasets.load_wine()

In [46]:
wine.target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2])

In [62]:
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.25, random_state=4)
clf = DecisionTreeClassifier()
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
acc = metrics.accuracy_score(y_test, y_pred)
print(acc)

0.9111111111111111


In [65]:
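# Lasso and Ridge are regressors: they treat the class indices 0/1/2 as numeric targets,
# so their R^2 scores are not directly comparable to the tree's accuracy.
lasso = Lasso(alpha=0.1)
lasso.fit(x_train, y_train)
print(metrics.r2_score(y_test, lasso.predict(x_test)))

ridge = Ridge(alpha=0.1)
ridge.fit(x_train, y_train)
print(metrics.r2_score(y_test, ridge.predict(x_test)))

# Note: r2_score is applied to predicted class labels here to match the lines above;
# metrics.accuracy_score is the usual metric for a classifier.
from sklearn.linear_model import LogisticRegression
logistic = LogisticRegression()
logistic.fit(x_train, y_train)
print(metrics.r2_score(y_test, logistic.predict(x_test)))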

0.8457754086493792
0.9021616990024355
0.9


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
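
The warning above suggests scaling the data or raising `max_iter`. A minimal sketch of doing both, and of scoring the classifier with accuracy so it can be compared with the decision tree on the same metric, might look like this. The pipeline and `max_iter=1000` are assumptions for illustration, not part of the original notebook.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

wine = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=4
)

# Standardize the features, then fit logistic regression with a larger iteration budget.
# The scaler is fitted on the training data only, inside the pipeline.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(x_train, y_train)

acc = accuracy_score(y_test, model.predict(x_test))
print('Accuracy: ', acc)
```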
