## load_digits: Classifying Handwritten Digits

1. Import the required modules




In [1]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

2. Prepare the data

In [2]:
digits = load_digits()

3. Understand the data

In [3]:
# Feature data
digits_data = digits.data

# Label data
digits_label = digits.target

# Print the target names
print(digits.target_names)

# Print the dataset description
print(digits.DESCR)

[0 1 2 3 4 5 6 7 8 9]
.. _digits_dataset:

Optical recognition of handwritten digits dataset
--------------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 1797
    :Number of Attributes: 64
    :Attribute Information: 8x8 image of integer pixels in the range 0..16.
    :Missing Attribute Values: None
    :Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)
    :Date: July; 1998

This is a copy of the test set of the UCI ML hand-written digits datasets
https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits

The data set contains images of hand-written digits: 10 classes where
each class refers to a digit.

Preprocessing programs made available by NIST were used to extract
normalized bitmaps of handwritten digits from a preprinted form. From a
total of 43 people, 30 contributed to the training set and different 13
to the test set. 32x32 bitmaps are divided into nonoverlapping blocks of
4x4 and the number of on pixels a

* `target_names` confirms that the labels are the digits 0 through 9.
* `DESCR` shows that the dataset contains 1797 samples in total.
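
As a small sketch of what the 64 attributes are, the snippet below checks the array shapes and reshapes one feature row back into its 8x8 image. The variable name `first_image` is introduced here only for illustration.

```python
from sklearn.datasets import load_digits

digits = load_digits()

# Each row of `data` is one flattened 8x8 image (64 pixel intensities).
print(digits.data.shape)    # (n_samples, n_features)
print(digits.images.shape)  # the same pixels, kept as 8x8 arrays

# Reshaping a feature row recovers the image it came from.
first_image = digits.data[0].reshape(8, 8)
print(first_image)
print("label:", digits.target[0])
```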

4. Split into train and test data

In [4]:
# The test set is 20% of the full dataset
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(digits_data, 
                                                    digits_label, 
                                                    test_size=0.2, 
                                                    random_state=7)
print("X_train : ",X_train.shape)
print("X_test : ",X_test.shape)

X_train :  (1437, 64)
X_test :  (360, 64)


Splitting the data 8:2 into train and test sets gives 1437 training samples and 360 test samples.
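
The 1437/360 figures follow from the arithmetic of the split. A short sketch, assuming the same `test_size` and `random_state` as above (`expected_test` and `expected_train` are names introduced here):

```python
import math
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
n_samples = len(digits.data)

# With a float test_size, scikit-learn rounds the test count up.
expected_test = math.ceil(n_samples * 0.2)
expected_train = n_samples - expected_test
print(expected_train, expected_test)

X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=7
)
print(X_train.shape, X_test.shape)
```

Passing `stratify=digits.target` to `train_test_split` would additionally keep the class proportions similar in both parts; the notebook does not do this.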

5. Train a variety of models


> 1) Try a Decision Tree
* A decision tree classifies a sample by asking a sequence of questions about its features and following the answers down to a final prediction.
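
To make the "sequence of questions" concrete, here is a minimal sketch that prints the rules of a deliberately shallow tree. `max_depth=2` and the `pixel_<row>_<col>` feature names are illustrative assumptions, not settings used in this notebook.

```python
from sklearn.datasets import load_digits
from sklearn.tree import DecisionTreeClassifier, export_text

digits = load_digits()

# max_depth=2 is an illustrative choice to keep the printout short.
shallow_tree = DecisionTreeClassifier(max_depth=2, random_state=32)
shallow_tree.fit(digits.data, digits.target)

# Hypothetical feature names: pixel_<row>_<col> for the 8x8 grid.
names = [f"pixel_{r}_{c}" for r in range(8) for c in range(8)]
rules = export_text(shallow_tree, feature_names=names)
print(rules)
```

Each printed line is one question of the form "is this pixel value at most some threshold?", and the indentation shows which question comes next.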

In [5]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

decision_tree = DecisionTreeClassifier(random_state=32)
decision_tree.fit(X_train, y_train)
y_pred = decision_tree.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      0.98      0.99        43
           1       0.81      0.81      0.81        42
           2       0.79      0.82      0.80        40
           3       0.79      0.91      0.85        34
           4       0.83      0.95      0.89        37
           5       0.90      0.96      0.93        28
           6       0.84      0.93      0.88        28
           7       0.96      0.82      0.89        33
           8       0.88      0.65      0.75        43
           9       0.78      0.78      0.78        32

    accuracy                           0.86       360
   macro avg       0.86      0.86      0.86       360
weighted avg       0.86      0.86      0.85       360



In [6]:
from sklearn.metrics import accuracy_score  # check accuracy

decision_tree_accuracy = accuracy_score(y_test, y_pred)
decision_tree_accuracy

0.8555555555555555


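> 2) Try a Random Forest
* Random Forest is a machine learning method proposed by Breiman (2001). To reduce overfitting, each tree is trained on a bootstrap sample of the data and considers only a random subset of features when choosing each split.
* It builds many decision trees and combines their predictions; the name "forest" comes from this collection of trees.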
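
The sketch below shows that a fitted forest really is a collection of trees, and compares each tree with the combined model. `n_estimators=25` is an illustrative value (this notebook uses the default), and `tree_scores` is a name introduced here.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=7
)

# n_estimators=25 is an illustrative value, not the notebook's setting.
forest = RandomForestClassifier(n_estimators=25, random_state=32)
forest.fit(X_train, y_train)

# The fitted trees are stored in `estimators_`.
print("number of trees:", len(forest.estimators_))

# Accuracy of each individual tree versus the combined forest.
# The sub-trees predict class indices, which coincide with the labels 0-9 here.
tree_scores = [(tree.predict(X_test) == y_test).mean() for tree in forest.estimators_]
forest_score = forest.score(X_test, y_test)
print("best single tree:", max(tree_scores))
print("forest:", forest_score)
```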

In [7]:
from sklearn.ensemble import RandomForestClassifier

random_forest = RandomForestClassifier(random_state=32)
random_forest.fit(X_train, y_train)
y_pred = random_forest.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      0.98      0.99        43
           1       0.93      1.00      0.97        42
           2       1.00      1.00      1.00        40
           3       1.00      1.00      1.00        34
           4       0.93      1.00      0.96        37
           5       0.90      0.96      0.93        28
           6       1.00      0.96      0.98        28
           7       0.94      0.97      0.96        33
           8       1.00      0.84      0.91        43
           9       0.94      0.94      0.94        32

    accuracy                           0.96       360
   macro avg       0.96      0.96      0.96       360
weighted avg       0.97      0.96      0.96       360



In [8]:
from sklearn.metrics import accuracy_score  # check accuracy

random_forest_accuracy = accuracy_score(y_test, y_pred)
random_forest_accuracy

0.9638888888888889


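> 3) Try an SVM
* A support vector machine (SVM) is a model that defines a decision boundary, that is, the dividing line used for classification.
* When a new, unclassified point appears, the model classifies it by checking which side of the boundary it falls on.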
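
A minimal sketch of "which side of the boundary" on toy two-class data. The `make_blobs` data, the linear kernel, and the two `new_points` are assumptions made for illustration; the notebook itself uses the default `svm.SVC()` on the digits data.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

# Two well-separated 2-D clusters (toy data, not from the notebook).
X, y = make_blobs(n_samples=100, centers=[(-3, -3), (3, 3)],
                  cluster_std=1.0, random_state=0)

clf = svm.SVC(kernel="linear")
clf.fit(X, y)

new_points = np.array([[-2.5, -2.0], [2.0, 3.5]])

# decision_function gives a signed score relative to the boundary:
# negative scores fall on the classes_[0] side, positive on classes_[1].
scores = clf.decision_function(new_points)
side = clf.classes_[(scores > 0).astype(int)]

print(scores)
print(side, clf.predict(new_points))
```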

In [9]:
from sklearn import svm

svm_model = svm.SVC()
svm_model.fit(X_train, y_train)
y_pred = svm_model.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        43
           1       0.95      1.00      0.98        42
           2       1.00      1.00      1.00        40
           3       1.00      1.00      1.00        34
           4       1.00      1.00      1.00        37
           5       0.93      1.00      0.97        28
           6       1.00      1.00      1.00        28
           7       1.00      1.00      1.00        33
           8       1.00      0.93      0.96        43
           9       1.00      0.97      0.98        32

    accuracy                           0.99       360
   macro avg       0.99      0.99      0.99       360
weighted avg       0.99      0.99      0.99       360



In [10]:
from sklearn.metrics import accuracy_score  # check accuracy

svm_accuracy = accuracy_score(y_test, y_pred)
svm_accuracy

0.9888888888888889


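> 4) Try the SGD Classifier
* Stochastic gradient descent (SGD) is the most basic method for training neural networks, but it is not limited to them. scikit-learn's `SGDClassifier` uses it to fit a linear model (a linear SVM with the default hinge loss).
* At each step, the gradient of the loss with respect to the current parameters is computed (in a neural network this is done with backpropagation), multiplied by a learning rate, and subtracted from the existing parameters to update them.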
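
The update rule can be sketched in a few lines of NumPy. This is not how `SGDClassifier` is implemented: it is a hand-written illustration that uses the logistic (log) loss rather than the hinge loss, on toy data, with an illustrative learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary data: two Gaussian clusters in 2-D (not from the notebook).
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

w = np.zeros(2)
b = 0.0
learning_rate = 0.1  # illustrative value

for epoch in range(20):
    for i in rng.permutation(len(X)):              # one sample at a time
        p = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))  # predicted probability
        grad_w = (p - y[i]) * X[i]                 # gradient of the log loss
        grad_b = p - y[i]
        w -= learning_rate * grad_w                # parameter update
        b -= learning_rate * grad_b

accuracy = np.mean(((X @ w + b) > 0).astype(int) == y)
print(w, b, accuracy)
```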

In [11]:
from sklearn.linear_model import SGDClassifier

sgd_model = SGDClassifier()
sgd_model.fit(X_train, y_train)
y_pred = sgd_model.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        43
           1       0.84      0.90      0.87        42
           2       0.98      1.00      0.99        40
           3       0.82      0.97      0.89        34
           4       0.97      1.00      0.99        37
           5       0.97      1.00      0.98        28
           6       1.00      0.93      0.96        28
           7       0.94      0.97      0.96        33
           8       0.97      0.77      0.86        43
           9       0.97      0.91      0.94        32

    accuracy                           0.94       360
   macro avg       0.95      0.94      0.94       360
weighted avg       0.95      0.94      0.94       360



In [12]:
from sklearn.metrics import accuracy_score  # check accuracy

sgd_accuracy = accuracy_score(y_test, y_pred)
sgd_accuracy

0.9416666666666667

> 5) Try Logistic Regression
* Logistic regression is a supervised learning algorithm that uses regression to predict the probability, between 0 and 1, that a sample belongs to each class, and then assigns the sample to the class with the highest probability.
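
The probability view can be inspected directly with `predict_proba`. In this sketch the division by 16 and `max_iter=1000` are assumptions added to help the solver converge; they are not part of the notebook's cell.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

digits = load_digits()
X = digits.data / 16.0  # scale pixels to [0, 1]; an assumption to help convergence

# max_iter=1000 is an illustrative setting.
model = LogisticRegression(max_iter=1000)
model.fit(X, digits.target)

proba = model.predict_proba(X[:5])  # one row per sample, one column per class
print(proba.round(3))
print(proba.sum(axis=1))            # each row sums to 1

# The predicted class is the one with the highest probability.
most_probable = model.classes_[np.argmax(proba, axis=1)]
print(most_probable, model.predict(X[:5]))
```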

In [13]:
from sklearn.linear_model import LogisticRegression

logistic_model = LogisticRegression()
logistic_model.fit(X_train, y_train)
y_pred = logistic_model.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        43
           1       0.95      0.95      0.95        42
           2       0.98      1.00      0.99        40
           3       0.94      0.97      0.96        34
           4       0.97      1.00      0.99        37
           5       0.82      0.96      0.89        28
           6       1.00      0.96      0.98        28
           7       0.97      0.97      0.97        33
           8       0.92      0.81      0.86        43
           9       0.97      0.91      0.94        32

    accuracy                           0.95       360
   macro avg       0.95      0.95      0.95       360
weighted avg       0.95      0.95      0.95       360



STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
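
The warning above suggests two remedies: more iterations or scaled data. A hedged sketch that applies both with a pipeline follows; `max_iter=5000` is an untuned, illustrative value and `scaled_logistic` is a name introduced here.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=7
)

# Standardize features, then fit with a larger iteration budget.
# max_iter=5000 is an illustrative value, not a tuned one.
scaled_logistic = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
scaled_logistic.fit(X_train, y_train)

test_accuracy = scaled_logistic.score(X_test, y_test)
print("iterations used:", scaled_logistic[-1].n_iter_)
print("test accuracy:", test_accuracy)
```

Putting the scaler inside the pipeline means it is fitted on the training data only, so no information from the test set leaks into preprocessing.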


In [14]:
from sklearn.metrics import accuracy_score  # check accuracy

logistic_accuracy = accuracy_score(y_test, y_pred)
logistic_accuracy

0.9527777777777777

6. Evaluate the models

In [15]:
# Compare accuracy
print("decision_tree_accuracy : ",decision_tree_accuracy)
print("random_forest_accuracy : ",random_forest_accuracy)
print("svm_accuracy : ",svm_accuracy)
print("sgd_accuracy : ",sgd_accuracy)
print("logistic_accuracy : ",logistic_accuracy)

decision_tree_accuracy :  0.8555555555555555
random_forest_accuracy :  0.9638888888888889
svm_accuracy :  0.9888888888888889
sgd_accuracy :  0.9416666666666667
logistic_accuracy :  0.9527777777777777


SVM has the highest accuracy (about 0.989) on this split.
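
The five train/predict/score cells repeat the same steps, so the comparison can also be written as one loop. This is a sketch, not the notebook's code: the `random_state` on `SGDClassifier` and the larger `max_iter` on `LogisticRegression` are additions made here, so the numbers may differ slightly from those printed above.

```python
from sklearn import svm
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=7
)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=32),
    "random_forest": RandomForestClassifier(random_state=32),
    "svm": svm.SVC(),
    "sgd": SGDClassifier(random_state=32),          # seed added here for repeatability
    "logistic": LogisticRegression(max_iter=5000),  # larger budget than the default
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from most to least accurate.
for name, acc in sorted(accuracies.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:15s} {acc:.4f}")
```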

## load_wine: Classifying Wine

(1) Import the required modules




In [16]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

(2) Prepare the data

In [17]:
data = load_wine()
import pandas as pd
pd.DataFrame(data.data, columns=data.feature_names)

Unnamed: 0,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280/od315_of_diluted_wines,proline
0,14.23,1.71,2.43,15.6,127.0,2.80,3.06,0.28,2.29,5.64,1.04,3.92,1065.0
1,13.20,1.78,2.14,11.2,100.0,2.65,2.76,0.26,1.28,4.38,1.05,3.40,1050.0
2,13.16,2.36,2.67,18.6,101.0,2.80,3.24,0.30,2.81,5.68,1.03,3.17,1185.0
3,14.37,1.95,2.50,16.8,113.0,3.85,3.49,0.24,2.18,7.80,0.86,3.45,1480.0
4,13.24,2.59,2.87,21.0,118.0,2.80,2.69,0.39,1.82,4.32,1.04,2.93,735.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
173,13.71,5.65,2.45,20.5,95.0,1.68,0.61,0.52,1.06,7.70,0.64,1.74,740.0
174,13.40,3.91,2.48,23.0,102.0,1.80,0.75,0.43,1.41,7.30,0.70,1.56,750.0
175,13.27,4.28,2.26,20.0,120.0,1.59,0.69,0.43,1.35,10.20,0.59,1.56,835.0
176,13.17,2.59,2.37,20.0,120.0,1.65,0.68,0.53,1.46,9.30,0.60,1.62,840.0


(3) Inspect the data

In [18]:
# Feature data
wine_data = data.data

# Label data
wine_label = data.target

# Print the target names
print(data.target_names)

['class_0' 'class_1' 'class_2']
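
Beyond the target names, it can help to look at the range of each feature and the size of each class. A small sketch, assuming pandas is available as in the cell above (`df` is a name introduced here):

```python
import pandas as pd
from sklearn.datasets import load_wine

data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Minimum and maximum of every feature, to compare their scales.
print(df.describe().loc[["min", "max"]].T)

# Number of samples in each class.
print(pd.Series(data.target).map(lambda i: data.target_names[i]).value_counts())
```

The features sit on very different scales: `proline` takes values in the hundreds and thousands, while `hue` stays close to 1.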


(4) Split into train and test data

In [19]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(wine_data, 
                                                    wine_label, 
                                                    test_size=0.2, 
                                                    random_state=7)

(5) Train a variety of models


> Try a Decision Tree


In [20]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

decision_tree = DecisionTreeClassifier(random_state=32)
decision_tree.fit(X_train, y_train)
y_pred = decision_tree.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00         7
           1       0.89      1.00      0.94        17
           2       1.00      0.83      0.91        12

    accuracy                           0.94        36
   macro avg       0.96      0.94      0.95        36
weighted avg       0.95      0.94      0.94        36



In [21]:
from sklearn.metrics import accuracy_score  # check accuracy

decision_tree_accuracy = accuracy_score(y_test, y_pred)
decision_tree_accuracy

0.9444444444444444


> Try a Random Forest


In [22]:
from sklearn.ensemble import RandomForestClassifier

random_forest = RandomForestClassifier(random_state=32)
random_forest.fit(X_train, y_train)
y_pred = random_forest.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00         7
           1       1.00      1.00      1.00        17
           2       1.00      1.00      1.00        12

    accuracy                           1.00        36
   macro avg       1.00      1.00      1.00        36
weighted avg       1.00      1.00      1.00        36



In [23]:
from sklearn.metrics import accuracy_score  # check accuracy

random_forest_accuracy = accuracy_score(y_test, y_pred)
random_forest_accuracy

1.0


> Try an SVM


In [24]:
from sklearn import svm

svm_model = svm.SVC()
svm_model.fit(X_train, y_train)
y_pred = svm_model.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.86      0.86      0.86         7
           1       0.58      0.88      0.70        17
           2       0.33      0.08      0.13        12

    accuracy                           0.61        36
   macro avg       0.59      0.61      0.56        36
weighted avg       0.55      0.61      0.54        36



In [25]:
from sklearn.metrics import accuracy_score  # check accuracy

svm_accuracy = accuracy_score(y_test, y_pred)
svm_accuracy

0.6111111111111112


> Try the SGD Classifier

In [26]:
from sklearn.linear_model import SGDClassifier

sgd_model = SGDClassifier()
sgd_model.fit(X_train, y_train)
y_pred = sgd_model.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.88      1.00      0.93         7
           1       1.00      0.18      0.30        17
           2       0.48      1.00      0.65        12

    accuracy                           0.61        36
   macro avg       0.79      0.73      0.63        36
weighted avg       0.80      0.61      0.54        36



In [27]:
from sklearn.metrics import accuracy_score  # check accuracy

sgd_accuracy = accuracy_score(y_test, y_pred)
sgd_accuracy

0.6111111111111112

> Try Logistic Regression

In [28]:
from sklearn.linear_model import LogisticRegression

logistic_model = LogisticRegression()
logistic_model.fit(X_train, y_train)
y_pred = logistic_model.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      0.86      0.92         7
           1       0.94      1.00      0.97        17
           2       1.00      1.00      1.00        12

    accuracy                           0.97        36
   macro avg       0.98      0.95      0.96        36
weighted avg       0.97      0.97      0.97        36



STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


In [29]:
from sklearn.metrics import accuracy_score  # check accuracy

logistic_accuracy = accuracy_score(y_test, y_pred)
logistic_accuracy

0.9722222222222222

(6) Evaluate the models

In [30]:
print("decision_tree_accuracy : ",decision_tree_accuracy)
print("random_forest_accuracy : ",random_forest_accuracy)
print("svm_accuracy : ",svm_accuracy)
print("sgd_accuracy : ",sgd_accuracy)
print("logistic_accuracy : ",logistic_accuracy)

decision_tree_accuracy :  0.9444444444444444
random_forest_accuracy :  1.0
svm_accuracy :  0.6111111111111112
sgd_accuracy :  0.6111111111111112
logistic_accuracy :  0.9722222222222222


 The random_forest model has the highest accuracy, 1.0 (100%),<br>
 while the svm and sgd models show considerably lower accuracy (about 0.61 each).
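
One plausible contributor to the low svm score is that the wine features are on very different scales, which distance-based models such as the default RBF `SVC` are sensitive to. The sketch below compares `SVC` with and without standardization on the same split; it is an experiment to try, not a full explanation. `raw_svm` and `scaled_svm` are names introduced here.

```python
from sklearn import svm
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=7
)

# Same model twice: once on raw features, once behind a StandardScaler.
raw_svm = svm.SVC().fit(X_train, y_train)
scaled_svm = make_pipeline(StandardScaler(), svm.SVC()).fit(X_train, y_train)

raw_accuracy = raw_svm.score(X_test, y_test)
scaled_accuracy = scaled_svm.score(X_test, y_test)
print("SVC without scaling:", raw_accuracy)
print("SVC with StandardScaler:", scaled_accuracy)
```

Tree-based models split on one feature at a time, so they are largely unaffected by feature scale, which is consistent with the decision tree and random forest scoring well here.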

## load_breast_cancer: Diagnosing Breast Cancer

(1) Import the required modules




In [31]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

(2) Prepare the data

In [32]:
data = load_breast_cancer()

import pandas as pd
pd.DataFrame(data.data, columns=data.feature_names)

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,radius error,texture error,perimeter error,area error,smoothness error,compactness error,concavity error,concave points error,symmetry error,fractal dimension error,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,17.99,10.38,122.80,1001.0,0.11840,0.27760,0.30010,0.14710,0.2419,0.07871,1.0950,0.9053,8.589,153.40,0.006399,0.04904,0.05373,0.01587,0.03003,0.006193,25.380,17.33,184.60,2019.0,0.16220,0.66560,0.7119,0.2654,0.4601,0.11890
1,20.57,17.77,132.90,1326.0,0.08474,0.07864,0.08690,0.07017,0.1812,0.05667,0.5435,0.7339,3.398,74.08,0.005225,0.01308,0.01860,0.01340,0.01389,0.003532,24.990,23.41,158.80,1956.0,0.12380,0.18660,0.2416,0.1860,0.2750,0.08902
2,19.69,21.25,130.00,1203.0,0.10960,0.15990,0.19740,0.12790,0.2069,0.05999,0.7456,0.7869,4.585,94.03,0.006150,0.04006,0.03832,0.02058,0.02250,0.004571,23.570,25.53,152.50,1709.0,0.14440,0.42450,0.4504,0.2430,0.3613,0.08758
3,11.42,20.38,77.58,386.1,0.14250,0.28390,0.24140,0.10520,0.2597,0.09744,0.4956,1.1560,3.445,27.23,0.009110,0.07458,0.05661,0.01867,0.05963,0.009208,14.910,26.50,98.87,567.7,0.20980,0.86630,0.6869,0.2575,0.6638,0.17300
4,20.29,14.34,135.10,1297.0,0.10030,0.13280,0.19800,0.10430,0.1809,0.05883,0.7572,0.7813,5.438,94.44,0.011490,0.02461,0.05688,0.01885,0.01756,0.005115,22.540,16.67,152.20,1575.0,0.13740,0.20500,0.4000,0.1625,0.2364,0.07678
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
564,21.56,22.39,142.00,1479.0,0.11100,0.11590,0.24390,0.13890,0.1726,0.05623,1.1760,1.2560,7.673,158.70,0.010300,0.02891,0.05198,0.02454,0.01114,0.004239,25.450,26.40,166.10,2027.0,0.14100,0.21130,0.4107,0.2216,0.2060,0.07115
565,20.13,28.25,131.20,1261.0,0.09780,0.10340,0.14400,0.09791,0.1752,0.05533,0.7655,2.4630,5.203,99.04,0.005769,0.02423,0.03950,0.01678,0.01898,0.002498,23.690,38.25,155.00,1731.0,0.11660,0.19220,0.3215,0.1628,0.2572,0.06637
566,16.60,28.08,108.30,858.1,0.08455,0.10230,0.09251,0.05302,0.1590,0.05648,0.4564,1.0750,3.425,48.55,0.005903,0.03731,0.04730,0.01557,0.01318,0.003892,18.980,34.12,126.70,1124.0,0.11390,0.30940,0.3403,0.1418,0.2218,0.07820
567,20.60,29.33,140.10,1265.0,0.11780,0.27700,0.35140,0.15200,0.2397,0.07016,0.7260,1.5950,5.772,86.22,0.006522,0.06158,0.07117,0.01664,0.02324,0.006185,25.740,39.42,184.60,1821.0,0.16500,0.86810,0.9387,0.2650,0.4087,0.12400


(3) Inspect the data

In [33]:
# Feature data
breast_cancer_data = data.data

# Label data
breast_cancer_label = data.target

# Print the target names
print(data.target_names)

['malignant' 'benign']
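
The order of `target_names` matters, because the position of each name is the integer label stored in `data.target`. A short sketch that prints the mapping together with the class sizes:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# The index in target_names is the integer label used in data.target.
for label, name in enumerate(data.target_names):
    count = int(np.sum(data.target == label))
    print(label, name, count)
```

So label 0 means malignant and label 1 means benign, which is the opposite of the common "1 = positive case" convention.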


(4) Split into train and test data

In [34]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(breast_cancer_data, 
                                                    breast_cancer_label, 
                                                    test_size=0.2, 
                                                    random_state=7)

(5) Train a variety of models


> Try a Decision Tree


In [35]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

decision_tree = DecisionTreeClassifier(random_state=32)
decision_tree.fit(X_train, y_train)
y_pred = decision_tree.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.92      0.82      0.87        40
           1       0.91      0.96      0.93        74

    accuracy                           0.91       114
   macro avg       0.91      0.89      0.90       114
weighted avg       0.91      0.91      0.91       114



In [36]:
from sklearn.metrics import accuracy_score  # check accuracy

decision_tree_accuracy = accuracy_score(y_test, y_pred)
decision_tree_accuracy

0.9122807017543859


> Try a Random Forest


In [37]:
from sklearn.ensemble import RandomForestClassifier

random_forest = RandomForestClassifier(random_state=32)
random_forest.fit(X_train, y_train)
y_pred = random_forest.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        40
           1       1.00      1.00      1.00        74

    accuracy                           1.00       114
   macro avg       1.00      1.00      1.00       114
weighted avg       1.00      1.00      1.00       114



In [38]:
from sklearn.metrics import accuracy_score  # check accuracy

random_forest_accuracy = accuracy_score(y_test, y_pred)
random_forest_accuracy

1.0


> Try an SVM


In [39]:
from sklearn import svm

svm_model = svm.SVC()
svm_model.fit(X_train, y_train)
y_pred = svm_model.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      0.72      0.84        40
           1       0.87      1.00      0.93        74

    accuracy                           0.90       114
   macro avg       0.94      0.86      0.89       114
weighted avg       0.92      0.90      0.90       114



In [40]:
from sklearn.metrics import accuracy_score  # check accuracy

svm_accuracy = accuracy_score(y_test, y_pred)
svm_accuracy

0.9035087719298246


> Try the SGD Classifier

In [41]:
from sklearn.linear_model import SGDClassifier

sgd_model = SGDClassifier()
sgd_model.fit(X_train, y_train)
y_pred = sgd_model.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      0.65      0.79        40
           1       0.84      1.00      0.91        74

    accuracy                           0.88       114
   macro avg       0.92      0.82      0.85       114
weighted avg       0.90      0.88      0.87       114



In [42]:
from sklearn.metrics import accuracy_score  # check accuracy

sgd_accuracy = accuracy_score(y_test, y_pred)
sgd_accuracy

0.8771929824561403

> Try Logistic Regression

In [43]:
from sklearn.linear_model import LogisticRegression

logistic_model = LogisticRegression()
logistic_model.fit(X_train, y_train)
y_pred = logistic_model.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      0.82      0.90        40
           1       0.91      1.00      0.95        74

    accuracy                           0.94       114
   macro avg       0.96      0.91      0.93       114
weighted avg       0.94      0.94      0.94       114



STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


In [44]:
from sklearn.metrics import accuracy_score  # check accuracy

logistic_accuracy = accuracy_score(y_test, y_pred)
logistic_accuracy

0.9385964912280702

(6) Evaluate the models

In [45]:
print("decision_tree_accuracy : ",decision_tree_accuracy)
print("random_forest_accuracy : ",random_forest_accuracy)
print("svm_accuracy : ",svm_accuracy)
print("sgd_accuracy : ",sgd_accuracy)
print("logistic_accuracy : ",logistic_accuracy)

decision_tree_accuracy :  0.9122807017543859
random_forest_accuracy :  1.0
svm_accuracy :  0.9035087719298246
sgd_accuracy :  0.8771929824561403
logistic_accuracy :  0.9385964912280702


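The random_forest model again has the highest accuracy, 1.0 (100%).<br>
For breast_cancer, recall is a particularly important metric. A false negative (FN) means a malignant case is classified as benign, and a high FN rate lowers recall, so the model with the highest recall should be preferred. In this dataset malignant is label 0, so the relevant figure is the recall for class 0, and random_forest is also the highest on that measure.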
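
Because malignant is label 0 here, recall for the malignant class has to be requested explicitly. A minimal sketch using the same split and random forest settings as above; `missed_malignant` and `malignant_recall` are names introduced for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=7
)

model = RandomForestClassifier(random_state=32).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true labels, columns are predictions, ordered [malignant, benign].
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
missed_malignant = cm[0, 1]  # malignant cases predicted as benign (false negatives)

# pos_label=0 makes "malignant" the positive class for recall.
malignant_recall = recall_score(y_test, y_pred, pos_label=0)
print(cm)
print("missed malignant cases:", missed_malignant)
print("recall for malignant:", malignant_recall)
```

Without `pos_label=0`, `recall_score` treats label 1 (benign) as the positive class, which answers a different question.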

## Retrospective


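> I do not understand why the random_forest model reaches an accuracy of 1.0 (100%) on both wine and breast_cancer, or why the svm and sgd models show such low accuracy on the wine data.<br>
I also have not yet clearly worked out which model should be used for which kind of data.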
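
One way to probe the first question is to check whether the 1.0 score depends on the particular split. The test sets here are small (36 and 114 samples), so a single split can be optimistic. The sketch below uses 5-fold cross-validation; `cv=5` is an illustrative choice, and the result is a sanity check rather than an explanation.

```python
from sklearn.datasets import load_breast_cancer, load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

for name, loader in [("wine", load_wine), ("breast_cancer", load_breast_cancer)]:
    data = loader()
    model = RandomForestClassifier(random_state=32)
    # cv=5 is an illustrative choice; classifiers get stratified folds by default.
    scores = cross_val_score(model, data.data, data.target, cv=5)
    print(name, scores.round(3), "mean:", scores.mean().round(3))
```

If the fold scores vary noticeably, the single-split figure of 1.0 is better read as one favorable draw than as the model's typical accuracy.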