# [E_02] Iris classification

## Table of contents
**1. [Library imports](#1.-Library-imports)**  
**2. [Classifying handwritten digits](#2.-Classifying-handwritten-digits)**  
**&#160;&#160; 2-1. [Exploring the data](#2-1.-Exploring-the-data)**  
**&#160;&#160; 2-2. [Data preprocessing](#2-2.-Data-preprocessing)**  
**&#160;&#160; 2-3. [Training & testing](#2-3.-Training-&-testing)**  
**3. [Classifying wine](#3.-Classifying-wine)**  
**&#160;&#160; 3-1. [Exploring the data](#3-1.-Exploring-the-data)**  
**&#160;&#160; 3-2. [Data preprocessing](#3-2.-Data-preprocessing)**  
**&#160;&#160; 3-3. [Training & testing](#3-3.-Training-&-testing)**  
**4. [Classifying breast cancer data](#4.-Classifying-breast-cancer-data)**  
**&#160;&#160; 4-1. [Exploring the data](#4-1.-Exploring-the-data)**  
**&#160;&#160; 4-2. [Data preprocessing](#4-2.-Data-preprocessing)**  
**&#160;&#160; 4-3. [Training & testing](#4-3.-Training-&-testing)**  


---
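### Rubric evaluation criteria
|**Evaluation item**|**Detailed criteria**|
|:---|:---|
|1. Were the three datasets set up in a reasonable way?|The data analysis used to select features and labels is carried out systematically|
|2. Were five models successfully applied to each of the three datasets?|Model training and testing ran correctly|
|3. Were the evaluation metrics chosen appropriately for the three datasets?|The choice of metrics and the reasoning behind it are sound|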


## 1. Library imports

In [1]:
import sklearn
from sklearn.datasets import load_digits
from sklearn.datasets import load_wine
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

print(sklearn.__version__)

1.0


## 2. Classifying handwritten digits
### 2-1. Exploring the data

In [2]:
digits = load_digits()

print(f'Attributes and methods of the digits object: {dir(digits)}')
print(f'Keys of the digits object: {digits.keys()}\n')

digits_data = digits.data
print(f'Features used for classification: {digits.feature_names}')
print(f'Number of features: {len(digits.feature_names)}')
print(f'Shape of digits_data: {digits_data.shape}')
print(f'Sample 0 of digits_data (reshaped from (64,) to (8,8)): \n{digits_data[0].reshape((8,8))}\n')

digits_label = digits.target
print(f'Target classes: {digits.target_names}')
print(f'Shape of digits_label: {digits_label.shape}')
print(f'Sample 0 of digits_label: {digits_label[0:1]}\n')
print('\n**************************************************\n')

# Description of the digits dataset
# print(digits.DESCR)

Attributes and methods of the digits object: ['DESCR', 'data', 'feature_names', 'frame', 'images', 'target', 'target_names']
Keys of the digits object: dict_keys(['data', 'target', 'frame', 'feature_names', 'target_names', 'images', 'DESCR'])

Features used for classification: ['pixel_0_0', 'pixel_0_1', 'pixel_0_2', 'pixel_0_3', 'pixel_0_4', 'pixel_0_5', 'pixel_0_6', 'pixel_0_7', 'pixel_1_0', 'pixel_1_1', 'pixel_1_2', 'pixel_1_3', 'pixel_1_4', 'pixel_1_5', 'pixel_1_6', 'pixel_1_7', 'pixel_2_0', 'pixel_2_1', 'pixel_2_2', 'pixel_2_3', 'pixel_2_4', 'pixel_2_5', 'pixel_2_6', 'pixel_2_7', 'pixel_3_0', 'pixel_3_1', 'pixel_3_2', 'pixel_3_3', 'pixel_3_4', 'pixel_3_5', 'pixel_3_6', 'pixel_3_7', 'pixel_4_0', 'pixel_4_1', 'pixel_4_2', 'pixel_4_3', 'pixel_4_4', 'pixel_4_5', 'pixel_4_6', 'pixel_4_7', 'pixel_5_0', 'pixel_5_1', 'pixel_5_2', 'pixel_5_3', 'pixel_5_4', 'pixel_5_5', 'pixel_5_6', 'pixel_5_7', 'pixel_6_0', 'pixel_6_1', 'pixel_6_2', 'pixel_6_3', 'pixel_6_4', 'pixel_6_5', 'pixel_6_6', 'pixel_6_7', 'pixel_7_0', 'pixel_7_1', 'pixel_7_2', 'pixel_7_3', 'p

### 2-2. Data preprocessing

In [3]:
X_train, X_test, y_train, y_test = train_test_split(digits_data, 
                                                    digits_label, 
                                                    test_size=0.2, 
                                                    random_state=7)

print(f'X_train size: {len(X_train)}, X_test size: {len(X_test)}')

X_train size: 1437, X_test size: 360
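
The split sizes printed above can be reproduced by hand. The sketch below assumes that `train_test_split` rounds the test share up and gives the remainder to the training set, which is consistent with the counts this notebook prints for all three datasets. `split_sizes` is a hypothetical helper written for illustration, not a scikit-learn function, and the digits total of 1797 is simply 1437 + 360.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the training set
    receives whatever is left.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# Sample counts taken from the outputs in this notebook
for name, n_samples in [("digits", 1797), ("wine", 178), ("breast_cancer", 569)]:
    print(name, split_sizes(n_samples, 0.2))
```

With `test_size=0.2` this gives (1437, 360), (142, 36), and (455, 114), matching the `X_train` and `X_test` sizes printed in sections 2-2, 3-2, and 4-2.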


### 2-3. Training & testing

In [4]:
print(f'Training set - X_train shape: {X_train.shape}, y_train shape: {y_train.shape}')
print(f'Test set - X_test shape: {X_test.shape}, y_test shape: {y_test.shape}')

Training set - X_train shape: (1437, 64), y_train shape: (1437,)
Test set - X_test shape: (360, 64), y_test shape: (360,)


### Decision tree

In [5]:
from sklearn.metrics import accuracy_score

from sklearn.tree import DecisionTreeClassifier

decision_tree = DecisionTreeClassifier(random_state=32)

decision_tree.fit(X_train, y_train)
y_pred = decision_tree.predict(X_test)

print(classification_report(y_test, y_pred))
accuracy = accuracy_score(y_test, y_pred)
print(f'Model accuracy: {accuracy*100}%')

              precision    recall  f1-score   support

           0       1.00      0.98      0.99        43
           1       0.81      0.81      0.81        42
           2       0.79      0.82      0.80        40
           3       0.79      0.91      0.85        34
           4       0.83      0.95      0.89        37
           5       0.90      0.96      0.93        28
           6       0.84      0.93      0.88        28
           7       0.96      0.82      0.89        33
           8       0.88      0.65      0.75        43
           9       0.78      0.78      0.78        32

    accuracy                           0.86       360
   macro avg       0.86      0.86      0.86       360
weighted avg       0.86      0.86      0.85       360

Model accuracy: 85.55555555555556%


### Random Forest

In [6]:
from sklearn.ensemble import RandomForestClassifier

random_forest = RandomForestClassifier(random_state=32)

random_forest.fit(X_train, y_train)
y_pred = random_forest.predict(X_test)

print(classification_report(y_test, y_pred))
accuracy = accuracy_score(y_test, y_pred)
print(f'Model accuracy: {accuracy*100}%')

              precision    recall  f1-score   support

           0       1.00      0.98      0.99        43
           1       0.93      1.00      0.97        42
           2       1.00      1.00      1.00        40
           3       1.00      1.00      1.00        34
           4       0.93      1.00      0.96        37
           5       0.90      0.96      0.93        28
           6       1.00      0.96      0.98        28
           7       0.94      0.97      0.96        33
           8       1.00      0.84      0.91        43
           9       0.94      0.94      0.94        32

    accuracy                           0.96       360
   macro avg       0.96      0.96      0.96       360
weighted avg       0.97      0.96      0.96       360

Model accuracy: 96.38888888888889%


### Support Vector Machine (SVM)

In [7]:
from sklearn import svm
svm_model = svm.SVC()

svm_model.fit(X_train, y_train)
y_pred = svm_model.predict(X_test)
print(classification_report(y_test, y_pred))
accuracy = accuracy_score(y_test, y_pred)
print(f'Model accuracy: {accuracy*100}%')

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        43
           1       0.95      1.00      0.98        42
           2       1.00      1.00      1.00        40
           3       1.00      1.00      1.00        34
           4       1.00      1.00      1.00        37
           5       0.93      1.00      0.97        28
           6       1.00      1.00      1.00        28
           7       1.00      1.00      1.00        33
           8       1.00      0.93      0.96        43
           9       1.00      0.97      0.98        32

    accuracy                           0.99       360
   macro avg       0.99      0.99      0.99       360
weighted avg       0.99      0.99      0.99       360

Model accuracy: 98.88888888888889%


### SGD (Stochastic Gradient Descent)

In [8]:
from sklearn.linear_model import SGDClassifier
sgd_model = SGDClassifier()

sgd_model.fit(X_train, y_train)
y_pred = sgd_model.predict(X_test)
print(classification_report(y_test, y_pred))
accuracy = accuracy_score(y_test, y_pred)
print(f'Model accuracy: {accuracy*100}%')

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        43
           1       0.97      0.81      0.88        42
           2       0.98      1.00      0.99        40
           3       0.97      0.88      0.92        34
           4       1.00      0.97      0.99        37
           5       0.68      1.00      0.81        28
           6       0.96      0.93      0.95        28
           7       0.97      0.97      0.97        33
           8       0.90      0.88      0.89        43
           9       0.90      0.88      0.89        32

    accuracy                           0.93       360
   macro avg       0.93      0.93      0.93       360
weighted avg       0.94      0.93      0.93       360

Model accuracy: 93.05555555555556%


### Logistic Regression

In [9]:
from sklearn.linear_model import LogisticRegression
logistic_model = LogisticRegression()

logistic_model.fit(X_train, y_train)
y_pred = logistic_model.predict(X_test)
print(classification_report(y_test, y_pred))
accuracy = accuracy_score(y_test, y_pred)
print(f'Model accuracy: {accuracy*100}%')

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        43
           1       0.95      0.95      0.95        42
           2       0.98      1.00      0.99        40
           3       0.94      0.97      0.96        34
           4       0.97      1.00      0.99        37
           5       0.82      0.96      0.89        28
           6       1.00      0.96      0.98        28
           7       0.97      0.97      0.97        33
           8       0.92      0.81      0.86        43
           9       0.97      0.91      0.94        32

    accuracy                           0.95       360
   macro avg       0.95      0.95      0.95       360
weighted avg       0.95      0.95      0.95       360

Model accuracy: 95.27777777777777%


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
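
The warning above reports that the solver stopped at its iteration limit and suggests either raising `max_iter` or scaling the data. The following is a minimal sketch of both suggestions combined, using the same split settings as section 2-2. The pipeline name `scaled_logistic` and the value `max_iter=1000` are illustrative choices, not part of the original notebook, and the resulting score may differ from the one reported above.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=7
)

# Scaling inside the pipeline means the scaler is fit on training data only.
# max_iter=1000 is an assumed value, chosen to give the solver more room.
scaled_logistic = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_logistic.fit(X_train, y_train)

scaled_accuracy = accuracy_score(y_test, scaled_logistic.predict(X_test))
print(f'Model accuracy with scaling: {scaled_accuracy*100}%')
```

Putting the scaler in a pipeline keeps test data out of the scaling statistics, which a separate `fit_transform` on the full dataset would not.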


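### Overall evaluation
Accuracy: the share of all predictions that are correct  
Precision: of the samples predicted positive (TP + FP), the share that are actually positive  
Recall: of the samples that are actually positive (TP + FN), the share that were predicted positive  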
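
As a worked example of these definitions, the sketch below computes precision and recall from raw counts. The counts are not printed anywhere in the notebook: they are inferred from the class-8 row of the decision tree report (precision 0.88, recall 0.65, support 43) and should be read as an assumption. `precision_recall` is a hypothetical helper, not a scikit-learn function.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) computed from raw counts."""
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    return precision, recall


# Assumed counts for digit 8: 28 true positives, 4 false positives,
# 15 false negatives (tp + fn = 43, the support of that class).
tp, fp, fn = 28, 4, 15
precision, recall = precision_recall(tp, fp, fn)
print(f'precision={precision:.3f}, recall={recall:.3f}')
# prints: precision=0.875, recall=0.651
```

A model can score high on one of the two and low on the other, as here, which is why the report lists both per class.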

### <center>Evaluation scores by model: digits</center>

||**Accuracy (%)**|**Precision (%)**|**Recall (%)**|
|:---|:---|:---|:---|
|**Decision tree**|86|86|86|
|**Random Forest**|96|96|96|
|**Support Vector Machine (SVM)**|99|99|99|
|**SGD (Stochastic Gradient Descent)**|93|93|93|
|**Logistic Regression**|95|95|95|

<br>

Apart from the decision tree, which scored lowest at about 86, every model scored 93 or higher and looks suitable.

## 3. Classifying wine

### 3-1. Exploring the data

In [10]:
wine = load_wine()

print(f'Attributes and methods of the wine object: {dir(wine)}')
print(f'Keys of the wine object: {wine.keys()}\n')

wine_data = wine.data
print(f'Features used for classification: {wine.feature_names}')
print(f'Number of features: {len(wine.feature_names)}')
print(f'Shape of wine_data: {wine_data.shape}')
print(f'Sample 0 of wine_data: \n{wine_data[0]}\n')

wine_label = wine.target
print(f'Target classes: {wine.target_names}')
print(f'Shape of wine_label: {wine_label.shape}')
print(f'First 10 samples of wine_label: {wine_label[0:10]}\n')
print('\n**************************************************\n')

# Description of the wine dataset
# print(wine.DESCR)

Attributes and methods of the wine object: ['DESCR', 'data', 'feature_names', 'frame', 'target', 'target_names']
Keys of the wine object: dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names'])

Features used for classification: ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']
Number of features: 13
Shape of wine_data: (178, 13)
Sample 0 of wine_data: 
[1.423e+01 1.710e+00 2.430e+00 1.560e+01 1.270e+02 2.800e+00 3.060e+00
 2.800e-01 2.290e+00 5.640e+00 1.040e+00 3.920e+00 1.065e+03]

Target classes: ['class_0' 'class_1' 'class_2']
Shape of wine_label: (178,)
First 10 samples of wine_label: [0 0 0 0 0 0 0 0 0 0]


**************************************************



### 3-2. Data preprocessing

In [11]:
X_train, X_test, y_train, y_test = train_test_split(wine_data, 
                                                    wine_label, 
                                                    test_size=0.2, 
                                                    random_state=7)

print(f'X_train size: {len(X_train)}, X_test size: {len(X_test)}')
print(f'Training set - X_train shape: {X_train.shape}, y_train shape: {y_train.shape}')
print(f'Test set - X_test shape: {X_test.shape}, y_test shape: {y_test.shape}')

X_train size: 142, X_test size: 36
Training set - X_train shape: (142, 13), y_train shape: (142,)
Test set - X_test shape: (36, 13), y_test shape: (36,)


### 3-3. Training & testing

In [12]:
# Decision tree
from sklearn.tree import DecisionTreeClassifier

decision_tree = DecisionTreeClassifier(random_state=32)

decision_tree.fit(X_train, y_train)
y_pred = decision_tree.predict(X_test)

print(classification_report(y_test, y_pred))
accuracy = accuracy_score(y_test, y_pred)
print(f'Model accuracy: {accuracy*100}%')


# Random Forest
from sklearn.ensemble import RandomForestClassifier

random_forest = RandomForestClassifier(random_state=32)

random_forest.fit(X_train, y_train)
y_pred = random_forest.predict(X_test)

print(classification_report(y_test, y_pred))
accuracy = accuracy_score(y_test, y_pred)
print(f'Model accuracy: {accuracy*100}%')


# Support Vector Machine (SVM)
from sklearn import svm
svm_model = svm.SVC()

svm_model.fit(X_train, y_train)
y_pred = svm_model.predict(X_test)
print(classification_report(y_test, y_pred))
accuracy = accuracy_score(y_test, y_pred)
print(f'Model accuracy: {accuracy*100}%')


# SGD (Stochastic Gradient Descent)
from sklearn.linear_model import SGDClassifier
sgd_model = SGDClassifier()

sgd_model.fit(X_train, y_train)
y_pred = sgd_model.predict(X_test)
print(classification_report(y_test, y_pred))
accuracy = accuracy_score(y_test, y_pred)
print(f'Model accuracy: {accuracy*100}%')


# Logistic Regression
from sklearn.linear_model import LogisticRegression
logistic_model = LogisticRegression()

logistic_model.fit(X_train, y_train)
y_pred = logistic_model.predict(X_test)
print(classification_report(y_test, y_pred))
accuracy = accuracy_score(y_test, y_pred)
print(f'Model accuracy: {accuracy*100}%')

              precision    recall  f1-score   support

           0       1.00      1.00      1.00         7
           1       0.89      1.00      0.94        17
           2       1.00      0.83      0.91        12

    accuracy                           0.94        36
   macro avg       0.96      0.94      0.95        36
weighted avg       0.95      0.94      0.94        36

Model accuracy: 94.44444444444444%
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         7
           1       1.00      1.00      1.00        17
           2       1.00      1.00      1.00        12

    accuracy                           1.00        36
   macro avg       1.00      1.00      1.00        36
weighted avg       1.00      1.00      1.00        36

Model accuracy: 100.0%
              precision    recall  f1-score   support

           0       0.86      0.86      0.86         7
           1       0.58      0.88      0.70        17
           2       0.33      0.08

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


### <center>Evaluation scores by model: wine</center>

||**Accuracy (%)**|**Precision (%)**|**Recall (%)**|
|:---|:---|:---|:---|
|**Decision tree**|94|96|94|
|**Random Forest**|100|100|100|
|**Support Vector Machine (SVM)**|61|59|61|
|**SGD (Stochastic Gradient Descent)**|61|52|57|
|**Logistic Regression**|97|98|95|

<br>

Apart from SVM and SGD, whose scores fall roughly in the 50 to 60 range, the remaining three models look suitable.
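
One possible reason for the low SVM and SGD scores is feature scale: the first wine sample printed in section 3-1 ranges from 0.28 to 1065, and both models are sensitive to such differences. The sketch below is an assumption-based experiment rather than part of the original notebook. It standardizes the features before fitting `SVC` on the same split settings, and `scaled_svm` is a name introduced here for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=7
)

# Standardize each feature to zero mean and unit variance, then fit the SVM.
# The scaler is fit on the training split only because it lives in the pipeline.
scaled_svm = make_pipeline(StandardScaler(), SVC())
scaled_svm.fit(X_train, y_train)

scaled_svm_accuracy = accuracy_score(y_test, scaled_svm.predict(X_test))
print(f'SVM accuracy with scaling: {scaled_svm_accuracy*100}%')
```

If scaling closes most of the gap, the low scores say more about preprocessing than about whether these models suit the wine data.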

## 4. Classifying breast cancer data
### 4-1. Exploring the data

In [13]:
breast_cancer = load_breast_cancer()

print(f'Attributes and methods of the breast_cancer object: {dir(breast_cancer)}')
print(f'Keys of the breast_cancer object: {breast_cancer.keys()}\n')

breast_cancer_data = breast_cancer.data
print(f'Features used for classification: {breast_cancer.feature_names}')
print(f'Number of features: {len(breast_cancer.feature_names)}')
print(f'Shape of breast_cancer_data: {breast_cancer_data.shape}')
print(f'Sample 0 of breast_cancer_data: \n{breast_cancer_data[0]}\n')

breast_cancer_label = breast_cancer.target
print(f'Target classes: {breast_cancer.target_names}')
print(f'Shape of breast_cancer_label: {breast_cancer_label.shape}')
print(f'First 10 samples of breast_cancer_label: {breast_cancer_label[0:10]}\n')
print('\n**************************************************\n')

# Description of the breast_cancer dataset
# print(breast_cancer.DESCR)

Attributes and methods of the breast_cancer object: ['DESCR', 'data', 'data_module', 'feature_names', 'filename', 'frame', 'target', 'target_names']
Keys of the breast_cancer object: dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

Features used for classification: ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
Number of features: 30
Shape of breast_cancer_data: (569, 30)
Sample 0 of breast_cancer_data: 
[1.799e+01 1.038e+01 1.228e+02 1.001e+03 1.184e-01 2.776e-01 3.001e-01
 1.471e-01 2.419e-01 7.871e-02 1.095e+00 9.053e-01 8.589e+

### 4-2. Data preprocessing

In [14]:
X_train, X_test, y_train, y_test = train_test_split(breast_cancer_data, 
                                                    breast_cancer_label, 
                                                    test_size=0.2, 
                                                    random_state=7)

print(f'X_train size: {len(X_train)}, X_test size: {len(X_test)}')
print(f'Training set - X_train shape: {X_train.shape}, y_train shape: {y_train.shape}')
print(f'Test set - X_test shape: {X_test.shape}, y_test shape: {y_test.shape}')

X_train size: 455, X_test size: 114
Training set - X_train shape: (455, 30), y_train shape: (455,)
Test set - X_test shape: (114, 30), y_test shape: (114,)


### 4-3. Training & testing

In [15]:
# Decision tree
from sklearn.tree import DecisionTreeClassifier

decision_tree = DecisionTreeClassifier(random_state=32)

decision_tree.fit(X_train, y_train)
y_pred = decision_tree.predict(X_test)

print(classification_report(y_test, y_pred))
accuracy = accuracy_score(y_test, y_pred)
print(f'Model accuracy: {accuracy*100}%')


# Random Forest
from sklearn.ensemble import RandomForestClassifier

random_forest = RandomForestClassifier(random_state=32)

random_forest.fit(X_train, y_train)
y_pred = random_forest.predict(X_test)

print(classification_report(y_test, y_pred))
accuracy = accuracy_score(y_test, y_pred)
print(f'Model accuracy: {accuracy*100}%')


# Support Vector Machine (SVM)
from sklearn import svm
svm_model = svm.SVC()

svm_model.fit(X_train, y_train)
y_pred = svm_model.predict(X_test)
print(classification_report(y_test, y_pred))
accuracy = accuracy_score(y_test, y_pred)
print(f'Model accuracy: {accuracy*100}%')


# SGD (Stochastic Gradient Descent)
from sklearn.linear_model import SGDClassifier
sgd_model = SGDClassifier()

sgd_model.fit(X_train, y_train)
y_pred = sgd_model.predict(X_test)
print(classification_report(y_test, y_pred))
accuracy = accuracy_score(y_test, y_pred)
print(f'Model accuracy: {accuracy*100}%')


# Logistic Regression
from sklearn.linear_model import LogisticRegression
logistic_model = LogisticRegression()

logistic_model.fit(X_train, y_train)
y_pred = logistic_model.predict(X_test)
print(classification_report(y_test, y_pred))
accuracy = accuracy_score(y_test, y_pred)
print(f'Model accuracy: {accuracy*100}%')

              precision    recall  f1-score   support

           0       0.92      0.82      0.87        40
           1       0.91      0.96      0.93        74

    accuracy                           0.91       114
   macro avg       0.91      0.89      0.90       114
weighted avg       0.91      0.91      0.91       114

Model accuracy: 91.22807017543859%
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        40
           1       1.00      1.00      1.00        74

    accuracy                           1.00       114
   macro avg       1.00      1.00      1.00       114
weighted avg       1.00      1.00      1.00       114

Model accuracy: 100.0%
              precision    recall  f1-score   support

           0       1.00      0.72      0.84        40
           1       0.87      1.00      0.93        74

    accuracy                           0.90       114
   macro avg       0.94      0.86      0.89       114
weighted avg       0.92      0.9

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


### <center>Evaluation scores by model: breast_cancer</center>

||**Accuracy (%)**|**Precision (%)**|**Recall (%)**|
|:---|:---|:---|:---|
|**Decision tree**|91|91|89|
|**Random Forest**|100|100|100|
|**Support Vector Machine (SVM)**|90|94|86|
|**SGD (Stochastic Gradient Descent)**|90|91|88|
|**Logistic Regression**|95|96|93|

<br>

Most models produced decent results (80 or above), but Random Forest and Logistic Regression, which scored highest, look the most suitable.
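
The precision and recall columns in the tables appear to come from the `macro avg` row of each report, which weights every class equally regardless of how many samples it has. As a worked example, the sketch below recomputes both averages for recall from the SVM report in section 4-3, using the per-class values as printed (0.72 and 1.00, with supports 40 and 74). Because those inputs are already rounded, the results are approximate.

```python
# Per-class recall and support copied from the SVM report (class 0, class 1)
recalls = [0.72, 1.00]
supports = [40, 74]

# Macro average: plain mean over classes
macro_recall = sum(recalls) / len(recalls)

# Weighted average: each class counts in proportion to its support
weighted_recall = sum(r * s for r, s in zip(recalls, supports)) / sum(supports)

print(f'macro={macro_recall:.2f}, weighted={weighted_recall:.2f}')
# prints: macro=0.86, weighted=0.90
```

The weighted recall matches the reported accuracy of 0.90, as it does by definition. The macro value of 0.86 is lower because it gives the smaller class 0, with recall 0.72, the same weight as class 1.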