# Iris Classification

- Test ratio is 20%, random_state is 2021
- Apply MinMaxScaler
- cv=5
- Decision tree: tune max_depth and min_samples_split
- Compute the best parameters and the accuracy

-----------------

### Preprocessing

In [3]:
from sklearn.datasets import load_iris
iris = load_iris()

In [7]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
iris_scaled = scaler.fit_transform(iris.data)
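
MinMaxScaler rescales each feature column to the range [0, 1] by computing (x - min) / (max - min) per column. As a quick sanity check, here is a minimal sketch that compares it with the same arithmetic done by hand in NumPy. The names `X`, `manual`, and `scaled_check` are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

X = load_iris().data

# Column-wise min-max scaling done by hand
col_min = X.min(axis=0)
col_max = X.max(axis=0)
manual = (X - col_min) / (col_max - col_min)

# The same transformation through scikit-learn
scaled_check = MinMaxScaler().fit_transform(X)

print(np.allclose(manual, scaled_check))
print(scaled_check.min(axis=0), scaled_check.max(axis=0))
```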

### Train/test split

In [8]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    iris_scaled, iris.target, stratify=iris.target,
    test_size=0.2, random_state=2021
)

In [9]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((120, 4), (30, 4), (120,), (30,))
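
Because the split passes `stratify=iris.target`, each class should keep the same proportion in both parts. A small sketch that checks this by counting labels with `np.bincount`. The variable names here are illustrative, and the split is repeated on the raw features only to keep the block self-contained.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_check = load_iris()
_, _, y_tr, y_te = train_test_split(
    iris_check.data, iris_check.target, stratify=iris_check.target,
    test_size=0.2, random_state=2021
)

# Number of samples per class (0, 1, 2) in each part
train_counts = np.bincount(y_tr)
test_counts = np.bincount(y_te)
print(train_counts, test_counts)
```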

- Find the best model after training

In [11]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

In [19]:
params = {
    'max_depth': [2,3,4], 
    'min_samples_split': [2,3]
}
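
A grid like this expands into every combination of the listed values. As a rough illustration, `ParameterGrid` can enumerate the candidates that a grid search would try. `grid_spec` is a copy of the dictionary above under a different name, and the fold count of 5 is taken from the task description.

```python
from sklearn.model_selection import ParameterGrid

grid_spec = {'max_depth': [2, 3, 4], 'min_samples_split': [2, 3]}

# Every (max_depth, min_samples_split) combination in the grid
candidates = list(ParameterGrid(grid_spec))
for candidate in candidates:
    print(candidate)

# With 5-fold CV each candidate is fitted 5 times (plus one final refit)
n_cv_fits = len(candidates) * 5
print(len(candidates), n_cv_fits)
```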

In [20]:
dtc = DecisionTreeClassifier(random_state=2021)

In [21]:
grid_dtc = GridSearchCV(dtc, param_grid=params,
                        cv=5)

In [22]:
grid_dtc.fit(X_train, y_train)

GridSearchCV(cv=5, estimator=DecisionTreeClassifier(random_state=2021),
             param_grid={'max_depth': [2, 3, 4], 'min_samples_split': [2, 3]})

In [24]:
grid_dtc.best_score_

0.9666666666666668

In [23]:
grid_dtc.best_params_

{'max_depth': 4, 'min_samples_split': 2}
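
`best_score_` is the mean cross-validated score of the winning combination, so it can help to look at how the other candidates scored. The sketch below repeats the same steps in a self-contained form and prints the rank, mean score, and standard deviation for every candidate from `cv_results_`. The names `search` and `results` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X_scaled = MinMaxScaler().fit_transform(data.data)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_scaled, data.target, stratify=data.target,
    test_size=0.2, random_state=2021
)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=2021),
    param_grid={'max_depth': [2, 3, 4], 'min_samples_split': [2, 3]},
    cv=5,
)
search.fit(X_tr, y_tr)

# One entry per candidate: rank, mean CV accuracy, std across folds
results = search.cv_results_
for rank, mean, std, p in zip(results['rank_test_score'],
                              results['mean_test_score'],
                              results['std_test_score'],
                              results['params']):
    print(rank, round(mean, 4), round(std, 4), p)
```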

In [25]:
estimator = grid_dtc.best_estimator_
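
One caveat: the scaler in this notebook is fitted on all 150 rows before the split, so the test rows and the validation folds contribute to the scaling statistics. For a decision tree this probably makes little or no practical difference, since its splits depend only on the ordering of feature values. For scale-sensitive models, a common alternative is to put the scaler inside a `Pipeline`, so that it is refitted on the training folds only. A sketch under that assumption follows; the step names `"scale"` and `"tree"` are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

raw = load_iris()
# Split the raw (unscaled) features first
X_tr, X_te, y_tr, y_te = train_test_split(
    raw.data, raw.target, stratify=raw.target,
    test_size=0.2, random_state=2021
)

pipe = Pipeline([
    ("scale", MinMaxScaler()),
    ("tree", DecisionTreeClassifier(random_state=2021)),
])
# Parameters of a pipeline step are addressed as <step>__<param>
pipe_params = {
    "tree__max_depth": [2, 3, 4],
    "tree__min_samples_split": [2, 3],
}
pipe_grid = GridSearchCV(pipe, param_grid=pipe_params, cv=5)
pipe_grid.fit(X_tr, y_tr)

test_acc = pipe_grid.score(X_te, y_te)
print(pipe_grid.best_params_, test_acc)
```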

- Model evaluation

In [27]:
from sklearn.metrics import accuracy_score
pred = estimator.predict(X_test)
accuracy_score(y_test, pred)

0.9
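
A single accuracy figure on 30 test samples does not show which classes are being confused. As a possible follow-up, the sketch below repeats the workflow in self-contained form and prints a confusion matrix, where rows are true classes and columns are predicted classes. The names `best_tree`, `y_pred`, and `cm` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X_scaled = MinMaxScaler().fit_transform(data.data)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_scaled, data.target, stratify=data.target,
    test_size=0.2, random_state=2021
)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=2021),
    param_grid={'max_depth': [2, 3, 4], 'min_samples_split': [2, 3]},
    cv=5,
)
search.fit(X_tr, y_tr)
best_tree = search.best_estimator_

y_pred = best_tree.predict(X_te)
acc = accuracy_score(y_te, y_pred)

# Rows: true class, columns: predicted class
cm = confusion_matrix(y_te, y_pred)
print(data.target_names)
print(cm)
print(acc)
```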

### With one-hot encoding

In [29]:
iris.target.shape

(150,)

In [30]:
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder()
classes = encoder.fit_transform(iris.target.reshape(-1,1))
classes.shape

(150, 3)
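
By default `OneHotEncoder` returns a SciPy sparse matrix, which is why later cells call `.toarray()`. A short sketch showing the sparse result, its dense form, and the round trip back to integer labels with `inverse_transform`. The names `labels`, `enc`, `onehot`, and `restored` are illustrative.

```python
import numpy as np
from scipy import sparse
from sklearn.datasets import load_iris
from sklearn.preprocessing import OneHotEncoder

labels = load_iris().target

enc = OneHotEncoder()
onehot = enc.fit_transform(labels.reshape(-1, 1))
print(sparse.issparse(onehot))   # the encoder output is a sparse matrix

# Dense view: exactly one 1 per row
dense = onehot.toarray()
print(dense[:3])

# Map the indicator rows back to the original class labels
restored = enc.inverse_transform(onehot).ravel()
print(np.array_equal(restored, labels))
```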

In [31]:
X_train, X_test, Y_train, Y_test = train_test_split(
    iris_scaled, classes, stratify=classes.toarray(),
    test_size=0.2, random_state=2021
)

In [32]:
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape

((120, 4), (30, 4), (120, 3), (30, 3))

In [33]:
Y_test[:5].toarray()

array([[0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 1.]])

In [34]:
dtc = DecisionTreeClassifier(random_state=2021)
grid_dtc = GridSearchCV(dtc, param_grid=params, cv=5, scoring='accuracy')

In [35]:
grid_dtc.fit(X_train,Y_train.toarray())

GridSearchCV(cv=5, estimator=DecisionTreeClassifier(random_state=2021),
             param_grid={'max_depth': [2, 3, 4], 'min_samples_split': [2, 3]},
             scoring='accuracy')

In [36]:
estimator = grid_dtc.best_estimator_
pred = estimator.predict(X_test)

In [37]:
pred.shape

(30, 3)

In [38]:
accuracy_score(Y_test.toarray(), pred)

0.9333333333333333
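
This score is not directly comparable to the earlier 0.9. With a 2-D indicator target, scikit-learn treats the problem as multilabel, and `accuracy_score` reports subset accuracy: a row counts as correct only if every column matches. A multi-output tree is also not forced to predict exactly one 1 per row. The toy sketch below uses made-up arrays (not the notebook's predictions) to show the difference between subset accuracy and accuracy after collapsing rows back to class indices with `argmax`.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical one-hot targets and predictions for four samples
Y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])
Y_hat = np.array([[1, 0, 0],   # exact match
                  [0, 1, 0],   # exact match
                  [0, 0, 0],   # no class predicted at all
                  [0, 1, 1]])  # two classes predicted

# Subset accuracy: the whole row must match
subset_acc = accuracy_score(Y_true, Y_hat)

# Single-label accuracy after picking the first largest column per row
argmax_acc = accuracy_score(Y_true.argmax(axis=1), Y_hat.argmax(axis=1))

print(subset_acc, argmax_acc)
```

Note that `argmax` silently maps an all-zero row to class 0, so rows whose predicted sum is not 1 are worth checking before collapsing.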