# Wine Classifier

- Load the wine dataset from sklearn.
- Build several classifiers and train them.
- Evaluate the trained models.

## 1. Import module

In [23]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
# Load the models to be used
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn import svm
from sklearn.linear_model import SGDClassifier
from sklearn.linear_model import LogisticRegression

## 2. Load dataset

In [24]:
wine = load_wine()
wine_data = wine.data
wine_label = wine.target
# Print the keys of the dataset object
print(f'Wine data keys: {wine.keys()}')
# First 20 label values
print(f'Wine data labels (first 20): {wine_label[:20]}')
# Shape of the feature matrix
print(f'Wine data shape: {wine_data.shape}')
# Feature names
print(f'Wine data feature names: {wine.feature_names}')
# Dataset description
print(wine.DESCR)


Wine data keys: dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names'])
Wine data labels (first 20): [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Wine data shape: (178, 13)
Wine data feature names: ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']
.. _wine_dataset:

Wine recognition dataset
------------------------

**Data Set Characteristics:**

    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
        - Alcohol
        - Malic acid
        - Ash
        - Alcalinity of ash
        - Magnesium
        - Total phenols
        - Flavanoids
        - Nonflavanoid phenols
        - Proanthocyanins
        - Color intensity
        - Hue
        - OD280/OD315 of diluted wines
        - Proline

    - class:
            - class_0
            - class_1
            - class_2
		
    :Summar
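
The label preview above shows only class 0 because the samples are stored in class order, so the first 20 labels say nothing about the other classes. A minimal sketch for checking how many samples each class has, using `numpy.bincount` (this check is not part of the original notebook):

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()

# Count how many samples carry each integer label (0, 1, 2).
class_counts = np.bincount(wine.target)

for name, count in zip(wine.target_names, class_counts):
    print(f"{name}: {count} samples")
```

If the counts turn out to be uneven, the class mix of a small test set is worth checking as well.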

## 3. Split dataset

In [25]:
X_train, X_test, y_train, y_test = train_test_split(wine_data, 
                                                    wine_label, 
                                                    test_size=0.2, 
                                                    random_state=7)
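
With 178 samples, a 20% test split holds only 36 samples, so the class mix in the test set can drift from that of the full dataset. One option, shown here as a sketch rather than what the notebook does, is to pass `stratify` to `train_test_split` so that each class keeps roughly the same proportion in both splits. The variable names with the `_s` suffix are introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

wine = load_wine()

# Same split settings as the notebook, plus stratification on the labels.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    wine.data,
    wine.target,
    test_size=0.2,
    random_state=7,
    stratify=wine.target,
)

# Compare class proportions in the full data and in the stratified test set.
print("full data :", np.round(np.bincount(wine.target) / len(wine.target), 2))
print("test split:", np.round(np.bincount(y_test_s) / len(y_test_s), 2))
```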

## 4-1. Load model & Fit (DecisionTree)

In [26]:
decision_tree = DecisionTreeClassifier(random_state=7)
decision_tree.fit(X_train, y_train)
y_pred = decision_tree.predict(X_test)

## 4-2. Evaluate

In [27]:
print(classification_report(y_test, y_pred))
confusion_matrix(y_test, y_pred)

              precision    recall  f1-score   support

           0       0.88      1.00      0.93         7
           1       0.89      0.94      0.91        17
           2       1.00      0.83      0.91        12

    accuracy                           0.92        36
   macro avg       0.92      0.92      0.92        36
weighted avg       0.92      0.92      0.92        36



array([[ 7,  0,  0],
       [ 1, 16,  0],
       [ 0,  2, 10]], dtype=int64)
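
The numbers in the classification report can be recomputed from the confusion matrix, which helps when reading the later sections. The sketch below copies the decision tree matrix from the output above and derives per-class precision and recall plus overall accuracy with plain NumPy; in scikit-learn's layout, rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix copied from the decision tree output above
# (rows = true class, columns = predicted class).
cm = np.array([[7, 0, 0],
               [1, 16, 0],
               [0, 2, 10]])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)  # column sums = predicted counts
recall = true_positives / cm.sum(axis=1)     # row sums = true counts (support)
accuracy = true_positives.sum() / cm.sum()

print("precision:", np.round(precision, 2))
print("recall   :", np.round(recall, 2))
print("accuracy :", round(float(accuracy), 2))
```

For example, class 0 has 7 correct predictions out of 8 samples predicted as class 0, which gives the 0.88 precision in the report.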

## 5-1. Load model & Fit (Random Forest)

In [28]:
random_forest = RandomForestClassifier(random_state=32)
random_forest.fit(X_train, y_train)
y_pred = random_forest.predict(X_test)

## 5-2. Evaluate

In [29]:
print(classification_report(y_test, y_pred))
confusion_matrix(y_test, y_pred)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00         7
           1       1.00      1.00      1.00        17
           2       1.00      1.00      1.00        12

    accuracy                           1.00        36
   macro avg       1.00      1.00      1.00        36
weighted avg       1.00      1.00      1.00        36



array([[ 7,  0,  0],
       [ 0, 17,  0],
       [ 0,  0, 12]], dtype=int64)

## 6-1. Load model & Fit (SVM)

In [30]:
svm_model = svm.SVC()
svm_model.fit(X_train, y_train)
y_pred = svm_model.predict(X_test)

## 6-2. Evaluate

In [31]:
print(classification_report(y_test, y_pred))
confusion_matrix(y_test, y_pred)

              precision    recall  f1-score   support

           0       0.86      0.86      0.86         7
           1       0.58      0.88      0.70        17
           2       0.33      0.08      0.13        12

    accuracy                           0.61        36
   macro avg       0.59      0.61      0.56        36
weighted avg       0.55      0.61      0.54        36



array([[ 6,  0,  1],
       [ 1, 15,  1],
       [ 0, 11,  1]], dtype=int64)
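
The SVC scores above are much lower than those of the tree-based models. A plausible cause, which this notebook does not test, is that the default RBF kernel is sensitive to feature scale, and the wine features span very different ranges (for example `proline` versus `hue`). A minimal sketch, assuming standardization is acceptable here, that compares the raw-feature SVC with a `StandardScaler` + `SVC` pipeline on the same split:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=7
)

# Baseline: SVC on raw features, as in the notebook.
raw_svm = SVC().fit(X_train, y_train)

# Variant: standardize features first; the scaler is fit on training data only.
scaled_svm = make_pipeline(StandardScaler(), SVC()).fit(X_train, y_train)

raw_accuracy = raw_svm.score(X_test, y_test)
scaled_accuracy = scaled_svm.score(X_test, y_test)
print("raw features   :", round(raw_accuracy, 2))
print("scaled features:", round(scaled_accuracy, 2))
```

Putting the scaler inside the pipeline keeps test-set statistics out of the fitting step.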

## 7-1. Load model & Fit (SGD)

In [32]:
sgd_model = SGDClassifier()
sgd_model.fit(X_train, y_train)
y_pred = sgd_model.predict(X_test)

## 7-2. Evaluate

In [33]:
print(classification_report(y_test, y_pred))
confusion_matrix(y_test, y_pred)

              precision    recall  f1-score   support

           0       0.47      1.00      0.64         7
           1       0.71      0.88      0.79        17
           2       0.00      0.00      0.00        12

    accuracy                           0.61        36
   macro avg       0.39      0.63      0.48        36
weighted avg       0.43      0.61      0.50        36




array([[ 7,  0,  0],
       [ 2, 15,  0],
       [ 6,  6,  0]], dtype=int64)
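
In the report above, class 2 has a precision and recall of 0.00 because the model never predicted it: the last column of the confusion matrix is all zeros. scikit-learn emits an undefined-metric warning in that situation. `SGDClassifier` is also created without `random_state`, so the result can change between runs. The sketch below fixes the seed, standardizes the features, and passes `zero_division=0` so the report's handling of never-predicted classes is explicit. The seed value and the scaling step are assumptions made for illustration, not the notebook's settings.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=7
)

# random_state=7 is an arbitrary choice made here for reproducibility.
sgd_pipeline = make_pipeline(StandardScaler(), SGDClassifier(random_state=7))
sgd_pipeline.fit(X_train, y_train)
y_pred_sgd = sgd_pipeline.predict(X_test)

# zero_division=0 reports 0.0, without a warning, for any class
# that is never predicted.
report = classification_report(y_test, y_pred_sgd, zero_division=0)
print(report)
```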

## 8-1. Load model & Fit (Logistic Regression)

In [34]:
logistic_model = LogisticRegression()
logistic_model.fit(X_train, y_train)
y_pred = logistic_model.predict(X_test)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
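
The warning above suggests two remedies: raise `max_iter` or scale the data. A sketch of both follows. The value `max_iter=5000` is a guess at a generous limit, not a setting taken from the notebook, and the variable names are introduced only for this example.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=7
)

# Option 1: keep the raw features but allow more solver iterations.
more_iter_logistic = LogisticRegression(max_iter=5000)
more_iter_logistic.fit(X_train, y_train)

# Option 2: standardize the features and keep the default iteration limit.
scaled_logistic = make_pipeline(StandardScaler(), LogisticRegression())
scaled_logistic.fit(X_train, y_train)

# n_iter_ holds the number of iterations the solver actually used.
print("raw, max_iter=5000 :", more_iter_logistic.n_iter_,
      more_iter_logistic.score(X_test, y_test))
print("scaled, default    :", scaled_logistic[-1].n_iter_,
      scaled_logistic.score(X_test, y_test))
```

Comparing `n_iter_` against the configured `max_iter` shows whether the solver stopped on its own or hit the limit.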


## 8-2. Evaluate

In [35]:
print(classification_report(y_test, y_pred))
confusion_matrix(y_test, y_pred)

              precision    recall  f1-score   support

           0       1.00      0.86      0.92         7
           1       0.94      1.00      0.97        17
           2       1.00      1.00      1.00        12

    accuracy                           0.97        36
   macro avg       0.98      0.95      0.96        36
weighted avg       0.97      0.97      0.97        36



array([[ 6,  1,  0],
       [ 0, 17,  0],
       [ 0,  0, 12]], dtype=int64)

# Conclusion
- Best accuracy: Random Forest (RF), 1.00
- Best precision: Random Forest (RF), 1.00
- Best recall: Random Forest (RF), 1.00
- Best f1-score: Random Forest (RF), 1.00
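
These rankings come from a single split with 36 test samples, so small differences between models may not be stable. A rough sketch, not part of the original notebook, that scores the same five model types with 5-fold cross-validation. The fold count, the seeds for SGD, and the `max_iter` value are illustrative assumptions.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

wine = load_wine()

# Same model types as the notebook; features are left unscaled, as above.
models = {
    "DecisionTree": DecisionTreeClassifier(random_state=7),
    "RandomForest": RandomForestClassifier(random_state=32),
    "SVM": SVC(),
    "SGD": SGDClassifier(random_state=7),
    "LogisticRegression": LogisticRegression(max_iter=5000),
}

cv_accuracy = {}
for name, model in models.items():
    # For classifiers, cv=5 uses stratified folds by default.
    scores = cross_val_score(model, wine.data, wine.target, cv=5)
    cv_accuracy[name] = scores.mean()
    print(f"{name:<20} mean accuracy = {scores.mean():.3f} (std {scores.std():.3f})")
```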