In [2]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [3]:
dataset = load_wine()
dir(dataset)

['DESCR', 'data', 'feature_names', 'frame', 'target', 'target_names']

In [4]:
print(dataset.feature_names)
print(dataset.target_names)

['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']
['class_0' 'class_1' 'class_2']


In [5]:
print(dataset.DESCR)

.. _wine_dataset:

Wine recognition dataset
------------------------

**Data Set Characteristics:**

    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
 		- Alcohol
 		- Malic acid
 		- Ash
		- Alcalinity of ash  
 		- Magnesium
		- Total phenols
 		- Flavanoids
 		- Nonflavanoid phenols
 		- Proanthocyanins
		- Color intensity
 		- Hue
 		- OD280/OD315 of diluted wines
 		- Proline

    - class:
            - class_0
            - class_1
            - class_2
		
    :Summary Statistics:
    
                                   Min   Max   Mean     SD
    Alcohol:                      11.0  14.8    13.0   0.8
    Malic Acid:                   0.74  5.80    2.34  1.12
    Ash:                          1.36  3.23    2.36  0.27
    Alcalinity of Ash:            10.6  30.0    19.5   3.3
    Magnesium:                    70.0 162.0    99.7  14.3
    Total Phenols:                0

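The task is to identify the type of wine.
The dataset contains 178 records.
The 13 features come from a chemical analysis of the wines: alcohol, malic acid, ash, alkalinity of ash, and other chemical measurements, plus hue and color intensity.
The label, the wine type, takes one of three values: class_0, class_1, or class_2.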
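
The record count, feature count, and class count stated above can be checked directly against the loaded data. The following is a minimal sketch; the variable name `class_counts` is introduced here for illustration and does not appear in the original notebook.

```python
import numpy as np
from sklearn.datasets import load_wine

dataset = load_wine()

# Shape of the feature matrix: (number of records, number of features)
print(dataset.data.shape)

# Number of samples per class, in the order of dataset.target_names
class_counts = np.bincount(dataset.target)
for name, count in zip(dataset.target_names, class_counts):
    print(name, count)
```

Printing the per-class counts also shows whether the classes are balanced, which is relevant when choosing an evaluation metric later.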

In [6]:
features = dataset.data
label = dataset.target

In [7]:
X_train, X_test, y_train, y_test = train_test_split(features,
                                                    label,
                                                    test_size=0.2,
                                                    random_state=777)

In [8]:
# Train a decision tree and predict the wine class
from sklearn.tree import DecisionTreeClassifier
decision_tree = DecisionTreeClassifier(random_state=2022)
decision_tree.fit(X_train, y_train)
y_pred = decision_tree.predict(X_test)
print(accuracy_score(y_test, y_pred))

0.9722222222222222


In [17]:
# Train a random forest and predict the wine class
from sklearn.ensemble import RandomForestClassifier
random_forest = RandomForestClassifier(random_state=111)
random_forest.fit(X_train, y_train)
y_pred = random_forest.predict(X_test)
print(accuracy_score(y_test, y_pred))

1.0


In [18]:
# Train a support vector machine (SVC) and predict the wine class
from sklearn import svm
svm_model = svm.SVC(random_state=33)
svm_model.fit(X_train, y_train)
y_pred = svm_model.predict(X_test)
print(accuracy_score(y_test, y_pred))

0.6944444444444444
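
One possible reason for the lower SVC score is that the features are on very different scales (compare the Min/Max columns in the summary statistics above), and the default RBF kernel is sensitive to feature scale. The sketch below standardizes the features before fitting. It is an assumption added for illustration, not a step from the original notebook, and it is not guaranteed to improve the score. `scaled_svm` and `scaled_acc` are hypothetical names.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

dataset = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.2, random_state=777)

# Assumption: standardization is added here for illustration only;
# the original cell fits SVC on the raw, unscaled features.
scaled_svm = make_pipeline(StandardScaler(), SVC())
scaled_svm.fit(X_train, y_train)  # the scaler is fit on the training data only

scaled_acc = accuracy_score(y_test, scaled_svm.predict(X_test))
print(scaled_acc)
```

Wrapping the scaler and the classifier in one pipeline keeps the test set out of the scaling statistics, so the comparison with the unscaled run uses the same split.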


In [15]:
# Train a linear classifier with SGD and predict the wine class
from sklearn.linear_model import SGDClassifier
sgd = SGDClassifier(random_state=31)
sgd.fit(X_train, y_train)
y_pred = sgd.predict(X_test)
print(accuracy_score(y_test, y_pred))

0.75


In [16]:
# Train a logistic regression model and predict the wine class
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(random_state=2221, max_iter=5000)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(accuracy_score(y_test, pred))

0.9444444444444444


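For the wine dataset, what matters is assigning each sample to the correct one of the three classes. Neither avoiding false positives for one particular class (precision) nor avoiding missed samples of one particular class (recall) is especially important here, so accuracy, the overall share of correct predictions, is a suitable evaluation metric. RandomForest gave the most accurate predictions on this 36-sample test split, with an accuracy of 1.0.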
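
To make the comparison between accuracy, precision, and recall concrete, the sketch below computes all of them for the random forest on the same split. The use of `average="macro"` (an unweighted mean over the three classes) is an assumption chosen for illustration; the original notebook imports these metric functions but does not call them.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

dataset = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.2, random_state=777)

random_forest = RandomForestClassifier(random_state=111)
random_forest.fit(X_train, y_train)
y_pred = random_forest.predict(X_test)

# Multiclass precision/recall/F1 need an averaging strategy;
# "macro" is an illustrative choice that weights each class equally.
scores = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision (macro)": precision_score(y_test, y_pred, average="macro"),
    "recall (macro)": recall_score(y_test, y_pred, average="macro"),
    "f1 (macro)": f1_score(y_test, y_pred, average="macro"),
}
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

When every class matters equally, as argued above, the macro-averaged scores and accuracy should tell a similar story; a large gap between them would suggest that one class is being handled much worse than the others.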