-----
<h1><font color="#f37626">[Experiment]</font> ML-xgboost Example</h1>

- For more detailed instructions on using the Accuinsight Python package, see the [Accuinsight guide homepage](https://accuinsight.cloudz.co.kr/#/intro) or the [Accuinsight Youtube channel](https://www.youtube.com/channel/UChFs-FAVxgG4C00h8C1MqoA).
- Analysis code that uses the Accuinsight package is available at [Accuinsight-github](https://github.com/AccuInsight/accuinsight_Lifecycle_example).

### Iris classification
----

In [1]:
# install package if not installed
# !pip install xgboost

Collecting xgboost
  Downloading xgboost-1.5.1-py3-none-manylinux2014_x86_64.whl (173.5 MB)
     |████████████████████████████████| 173.5 MB 47.9 MB/s
Installing collected packages: xgboost
Successfully installed xgboost-1.5.1


### 1. Import packages

In [1]:
from Accuinsight.Lifecycle.ML import accuinsight

accu = accuinsight()

2021-12-02 11:43:18.925100: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1


In [2]:
import pandas as pd

from xgboost import XGBClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

### 2. Data load and split

In [3]:
iris_data = pd.read_csv('../data/iris_data.csv')
iris_data.head()

   sepal_length  sepal_width  petal_length  petal_width        class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa


In [4]:
iris_data.describe()

       sepal_length  sepal_width  petal_length  petal_width
count         150.0        150.0         150.0        150.0
mean       5.843333        3.054      3.758667     1.198667
std        0.828066     0.433594       1.76442     0.763161
min             4.3          2.0           1.0          0.1
25%             5.1          2.8           1.6          0.3
50%             5.8          3.0          4.35          1.3
75%             6.4          3.3           5.1          1.8
max             7.9          4.4           6.9          2.5


In [4]:
iris_data.columns

Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'], dtype='object')

In [5]:
X_train, X_test, y_train, y_test = train_test_split(
    iris_data.drop('class', axis=1),
    iris_data[['class']],
    test_size=0.2,
    stratify=iris_data[['class']],
    random_state=0,
)

In [6]:
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

(120, 4)
(120, 1)
(30, 4)
(30, 1)
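
The `stratify` argument used in the split above keeps the class proportions the same in both partitions. The following is a minimal sketch that checks this, assuming scikit-learn's bundled Iris dataset (`load_iris`) as a stand-in for `../data/iris_data.csv`; the variable names are illustrative and not part of the original notebook.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled Iris data stands in for ../data/iris_data.csv.
X, y = load_iris(return_X_y=True)

# Same split settings as the notebook: 20% test, stratified on the label.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Count how many samples of each class ended up in each partition.
train_counts = Counter(y_tr.tolist())
test_counts = Counter(y_te.tolist())
print(train_counts)
print(test_counts)
```

With 50 samples per class, a stratified 80/20 split should place 40 samples of each class in the training set and 10 in the test set, which is consistent with the `value_counts()` outputs in the preprocessing section.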


In [7]:
X_train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 120 entries, 45 to 106
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  120 non-null    float64
 1   sepal_width   120 non-null    float64
 2   petal_length  120 non-null    float64
 3   petal_width   120 non-null    float64
dtypes: float64(4)
memory usage: 4.7 KB


### 3. Preprocessing

In [8]:
# scaling
scaler = MinMaxScaler()
scaler.fit(X_train)

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
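
The scaler above is fitted on the training data only and then applied to both partitions. As a rough sketch of what that means, the snippet below reproduces the default min-max formula, `(x - min) / (max - min)`, by hand with NumPy. The toy values are hypothetical and not taken from `iris_data.csv`, and unlike `MinMaxScaler` this sketch does not guard against a zero range.

```python
import numpy as np

# Hypothetical toy values for a single feature (not from iris_data.csv).
train = np.array([[4.0], [5.0], [6.0], [8.0]])
test = np.array([[3.0], [7.0], [9.0]])

# Learn the minimum and range from the training data only.
col_min = train.min(axis=0)
col_range = train.max(axis=0) - col_min

# Apply the same training statistics to both partitions.
train_scaled = (train - col_min) / col_range
test_scaled = (test - col_min) / col_range

print(train_scaled.ravel().tolist())  # [0.0, 0.25, 0.5, 1.0]
print(test_scaled.ravel().tolist())   # [-0.25, 0.75, 1.25]
```

Because the statistics come from the training set, test values can fall outside the [0, 1] interval. That is expected and avoids leaking test-set information into preprocessing.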

In [9]:
# class encoding
y_train.value_counts()

class          
Iris-setosa        40
Iris-versicolor    40
Iris-virginica     40
dtype: int64

In [10]:
class_dic = {"Iris-setosa": 0, "Iris-versicolor": 1, "Iris-virginica": 2}

In [11]:
y_train_num = y_train.replace({"class": class_dic})
y_train_num.value_counts()

class
0        40
1        40
2        40
dtype: int64

In [12]:
y_test_num = y_test.replace({"class": class_dic})
y_test_num.value_counts()

class
0        10
1        10
2        10
dtype: int64
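
Since the model is trained on integer codes, its predictions are also integers. One possible way to report them as class names again is to invert `class_dic`, as sketched below. The names `inverse_class_dic`, `example_pred`, and `example_labels` are hypothetical, and the prediction values are made up for illustration.

```python
class_dic = {"Iris-setosa": 0, "Iris-versicolor": 1, "Iris-virginica": 2}

# Invert the mapping so integer predictions can be reported as class names.
inverse_class_dic = {code: name for name, code in class_dic.items()}

# Hypothetical predictions, for illustration only.
example_pred = [0, 2, 1, 1]
example_labels = [inverse_class_dic[code] for code in example_pred]

print(example_labels)
# ['Iris-setosa', 'Iris-virginica', 'Iris-versicolor', 'Iris-versicolor']
```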

### 4. Build model

In [13]:
model = XGBClassifier(max_depth=2)
model.fit(X_train_scaled, y_train_num)



XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=2, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=8,
              num_parallel_tree=1, objective='multi:softprob', predictor='auto',
              random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=None,
              subsample=1, tree_method='exact', validate_parameters=1,
              verbosity=None)

### 6. Model evaluation

In [14]:
pred_train = model.predict(X_train_scaled)

train_report = classification_report(y_train_num, pred_train)
train_confusion_matrix = confusion_matrix(y_train_num, pred_train)

print(train_report)
print(train_confusion_matrix)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        40
           1       1.00      1.00      1.00        40
           2       1.00      1.00      1.00        40

    accuracy                           1.00       120
   macro avg       1.00      1.00      1.00       120
weighted avg       1.00      1.00      1.00       120

[[40  0  0]
 [ 0 40  0]
 [ 0  0 40]]


In [15]:
pred_test = model.predict(X_test_scaled)

test_report = classification_report(y_test_num, pred_test)
test_confusion_matrix = confusion_matrix(y_test_num, pred_test)

print(test_report)
print(test_confusion_matrix)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       0.83      1.00      0.91        10
           2       1.00      0.80      0.89        10

    accuracy                           0.93        30
   macro avg       0.94      0.93      0.93        30
weighted avg       0.94      0.93      0.93        30

[[10  0  0]
 [ 0 10  0]
 [ 0  2  8]]


In [16]:
accuracy_test = accuracy_score(y_test_num, pred_test)
print("Accuracy %.2f%%" % (accuracy_test * 100))

Accuracy 93.33%
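
The per-class figures in the test report can be derived directly from the test confusion matrix, where rows are true classes and columns are predicted classes. The following sketch recomputes them in plain Python from the matrix printed above; the variable names are illustrative.

```python
# Test-set confusion matrix from the notebook output
# (rows: true class, columns: predicted class).
cm = [
    [10, 0, 0],
    [0, 10, 0],
    [0, 2, 8],
]

n_classes = len(cm)
total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(n_classes))
accuracy = correct / total

per_class = []
for k in range(n_classes):
    predicted_k = sum(cm[i][k] for i in range(n_classes))  # column sum
    actual_k = sum(cm[k])                                   # row sum
    precision = cm[k][k] / predicted_k
    recall = cm[k][k] / actual_k
    f1 = 2 * precision * recall / (precision + recall)
    per_class.append((precision, recall, f1))
    print(f"class {k}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

print(f"accuracy={accuracy:.4f}")
```

Two virginica samples (class 2) were predicted as versicolor (class 1), which lowers the recall of class 2 to 0.80 and the precision of class 1 to 10/12, or about 0.83.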


### 7. Run model with experiment

__To use the data drift feature, pass the `model_monitor=True` option so that feature importances are stored.__  
This option cannot be used with unstructured data.

In [19]:
accu.send_message('[ML experiment] Iris classification training finished')

In [21]:
with accu.add_experiment(model, X_train_scaled, y_train_num, X_test_scaled, y_test_num, model_monitor=True) as exp:
    exp.log_params('max_depth')
    exp.log_metrics('accuracy', accuracy_test)

Using add_experiment(model_monitor=True)
