## vecstack: A Package for Stacking

@ http://j.mp/34Etidr & http://j.mp/2NJQ9NO

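### <span style='color:red;'>Install xgboost and vecstack with the commands below, then restart the Jupyter notebook kernel (Kernel Restart).</span>
- If an error occurs during installation, right-click cmd (Command Prompt), choose "Run as administrator", and enter the commands there.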

In [4]:
!pip install xgboost==1.5.2

In [3]:
!pip install vecstack==0.4.0
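
After restarting the kernel, it can help to confirm which versions were actually installed. The snippet below is a minimal sketch that uses only the standard library; `installed_version` is a hypothetical helper name introduced here for illustration, not part of xgboost or vecstack.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Print the version pip installed for each package (None means not installed).
for name in ("xgboost", "vecstack"):
    print(name, installed_version(name))
```

If either line prints `None`, the installation did not reach the environment that the notebook kernel is using.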

<br>
<br>

## 1. Usage. Functional API

In [15]:
from sklearn.datasets import load_iris 
from sklearn.model_selection import train_test_split 
from sklearn.metrics import accuracy_score 

from sklearn.ensemble import ExtraTreesClassifier 
from sklearn.ensemble import RandomForestClassifier 
from xgboost import XGBClassifier 

import warnings                 
warnings.filterwarnings('ignore')

In [6]:
from vecstack import stacking

In [7]:
# Load demo data 

iris = load_iris() 
X, y = iris.data, iris.target 

In [8]:
# Make train/test split 

X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size = 0.2, random_state = 0)

In [9]:
# Initialize 1st-level models. 
# Caution! All models and parameter values are for demonstration only 
# and should not be taken as recommendations. 

models = [ 
    ExtraTreesClassifier(random_state = 0, n_jobs = -1, n_estimators = 100, max_depth = 3), 
    RandomForestClassifier(random_state = 0, n_jobs = -1, n_estimators = 100, max_depth = 3), 
    XGBClassifier(seed = 0, n_jobs = -1, learning_rate = 0.1, n_estimators = 100, max_depth = 3)] 

In [10]:
# Compute stacking features

# Fit all three models on the training data (X_train, y_train) with 4-fold cross-validation, then:
# 1) Return the models' out-of-fold predictions for X_train, one row per data point. (S_train)
#    Shape of X_train == (120, 4) -> shape of S_train == (120, 3); each column holds one model's predictions.
# 2) Return the models' predictions for X_test, one row per data point. (S_test)
#    Shape of X_test == (30, 4) -> shape of S_test == (30, 3); each column holds one model's predictions.

S_train, S_test = stacking(models, 
                           X_train, y_train, X_test, 
                           regression = False, 
                           metric = accuracy_score, 
                           n_folds = 4, stratified = True, shuffle = True, 
                           random_state = 0, verbose = 2) 

task:         [classification]
n_classes:    [3]
metric:       [accuracy_score]
mode:         [oof_pred_bag]
n_models:     [3]

model  0:     [ExtraTreesClassifier]
    fold  0:  [1.00000000]
    fold  1:  [0.90000000]
    fold  2:  [1.00000000]
    fold  3:  [0.90000000]
    ----
    MEAN:     [0.95000000] + [0.05000000]
    FULL:     [0.95000000]

model  1:     [RandomForestClassifier]
    fold  0:  [0.96666667]
    fold  1:  [0.90000000]
    fold  2:  [1.00000000]
    fold  3:  [0.90000000]
    ----
    MEAN:     [0.94166667] + [0.04330127]
    FULL:     [0.94166667]

model  2:     [XGBClassifier]
    fold  0:  [0.93333333]
    fold  1:  [0.90000000]
    fold  2:  [1.00000000]
    fold  3:  [0.90000000]
    ----
    MEAN:     [0.93333333] + [0.04082483]
    FULL:     [0.93333333]
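
The log above reports `mode: [oof_pred_bag]`: the training features are out-of-fold predictions, and the test features are aggregated over the models fitted in each fold. The following is a simplified sketch of that idea built with scikit-learn only. It is not vecstack's implementation; `manual_oof_stacking` is a hypothetical helper, and the majority vote across fold models is an assumption about how class predictions are combined.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split


def manual_oof_stacking(models, X_train, y_train, X_test, n_folds=4, random_state=0):
    """Hypothetical helper: OOF predictions for train, majority-voted predictions for test."""
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=random_state)
    n_classes = len(np.unique(y_train))
    S_train = np.zeros((X_train.shape[0], len(models)), dtype=int)
    S_test = np.zeros((X_test.shape[0], len(models)), dtype=int)
    for col, model in enumerate(models):
        fold_test_preds = np.zeros((X_test.shape[0], n_folds), dtype=int)
        for fold, (fit_idx, oof_idx) in enumerate(skf.split(X_train, y_train)):
            fold_model = clone(model).fit(X_train[fit_idx], y_train[fit_idx])
            # Each training row is predicted only by the model that did not see it.
            S_train[oof_idx, col] = fold_model.predict(X_train[oof_idx])
            fold_test_preds[:, fold] = fold_model.predict(X_test)
        # Majority vote across the fold models for each test row.
        S_test[:, col] = [np.bincount(row, minlength=n_classes).argmax()
                          for row in fold_test_preds]
    return S_train, S_test


X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)
demo_models = [
    ExtraTreesClassifier(random_state=0, n_estimators=100, max_depth=3),
    RandomForestClassifier(random_state=0, n_estimators=100, max_depth=3),
]
S_tr_demo, S_te_demo = manual_oof_stacking(demo_models, X_tr, y_tr, X_te)
print(S_tr_demo.shape, S_te_demo.shape)  # (120, 2) (30, 2)
```

Because every value in the training feature matrix comes from a model that never saw that row, the 2nd-level model is trained on predictions that resemble what it will see at test time.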





In [29]:
# Initialize 2nd-level model 

model = XGBClassifier(seed = 0, n_jobs = -1, learning_rate = 0.1, n_estimators = 100, max_depth = 3, eval_metric='mlogloss') 

In [30]:
# Fit 2nd-level model 
# Train the model to predict y_train using S_train, the predictions of the 3 models, as features (3 columns).

model = model.fit(S_train, y_train) 

In [31]:
# Predict 
# Predict y_test using S_test, the 3 models' predictions on the test set, as features.

y_pred = model.predict(S_test) 

In [32]:
# Final prediction score 

print('Final prediction score: [%.8f]' % accuracy_score(y_test, y_pred))

Final prediction score: [0.96666667]


<br>
<br>

## 2. Usage. Scikit-learn API (recommended)

In [20]:
from vecstack import StackingTransformer

In [33]:
# Get your data

iris = load_iris() 
X, y = iris.data, iris.target 

X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size = 0.2, random_state = 0)

In [34]:
# Initialize 1st level estimators

estimators = [ 
    ('ExtraTrees', ExtraTreesClassifier(random_state = 0, n_jobs = -1, n_estimators = 100, max_depth = 3)),
    ('RandomForest', RandomForestClassifier(random_state = 0, n_jobs = -1, n_estimators = 100, max_depth = 3)),
    ('XGB', XGBClassifier(seed = 0, n_jobs = -1, learning_rate = 0.1, n_estimators = 100, max_depth = 3, eval_metric='mlogloss'))]

In [35]:
# Initialize StackingTransformer

stack = StackingTransformer(estimators, 
                            regression = False, 
                            metric = accuracy_score, 
                            n_folds = 4, stratified = True, shuffle = True, 
                            random_state = 0, verbose = 2) 

In [36]:
# Fit

stack = stack.fit(X_train, y_train)

task:         [classification]
n_classes:    [3]
metric:       [accuracy_score]
variant:      [A]
n_estimators: [3]

estimator  0: [ExtraTrees: ExtraTreesClassifier]
    fold  0:  [1.00000000]
    fold  1:  [0.90000000]
    fold  2:  [1.00000000]
    fold  3:  [0.90000000]
    ----
    MEAN:     [0.95000000] + [0.05000000]

estimator  1: [RandomForest: RandomForestClassifier]
    fold  0:  [0.96666667]
    fold  1:  [0.90000000]
    fold  2:  [1.00000000]
    fold  3:  [0.90000000]
    ----
    MEAN:     [0.94166667] + [0.04330127]

estimator  2: [XGB: XGBClassifier]
    fold  0:  [0.93333333]
    fold  1:  [0.90000000]
    fold  2:  [1.00000000]
    fold  3:  [0.90000000]
    ----
    MEAN:     [0.93333333] + [0.04082483]
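
The two bracketed numbers on each `MEAN:` line can be reproduced from the per-fold scores: the first is their mean, and the second is consistent with their population standard deviation (`ddof=0`, NumPy's default). The check below copies the fold scores from the log; reading the second number as a standard deviation is an inference from the values, not something the log states.

```python
import numpy as np

# Per-fold accuracy scores copied from the verbose output.
fold_scores = {
    "ExtraTrees": [1.0, 0.9, 1.0, 0.9],
    "RandomForest": [0.96666667, 0.9, 1.0, 0.9],
    "XGB": [0.93333333, 0.9, 1.0, 0.9],
}

summary = {}
for name, scores in fold_scores.items():
    scores = np.asarray(scores)
    # np.std uses ddof=0 by default, i.e. the population standard deviation.
    summary[name] = (scores.mean(), scores.std())
    print(f"{name}: mean={summary[name][0]:.8f} std={summary[name][1]:.8f}")
```

A large spread relative to the mean suggests that an estimator's score depends heavily on which rows land in each fold, which is expected with only 30 rows per fold.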



In [37]:
# Get your stacked features

S_train = stack.transform(X_train)
S_test = stack.transform(X_test)

Train set was detected.
Transforming...

estimator  0: [ExtraTrees: ExtraTreesClassifier]
    model from fold  0: done
    model from fold  1: done
    model from fold  2: done
    model from fold  3: done
    ----
    DONE

estimator  1: [RandomForest: RandomForestClassifier]
    model from fold  0: done
    model from fold  1: done
    model from fold  2: done
    model from fold  3: done
    ----
    DONE

estimator  2: [XGB: XGBClassifier]
    model from fold  0: done
    model from fold  1: done
    model from fold  2: done
    model from fold  3: done
    ----
    DONE

Transforming...

estimator  0: [ExtraTrees: ExtraTreesClassifier]
    model from fold  0: done
    model from fold  1: done
    model from fold  2: done
    model from fold  3: done
    ----
    DONE

estimator  1: [RandomForest: RandomForestClassifier]
    model from fold  0: done
    model from fold  1: done
    model from fold  2: done
    model from fold  3: done
    ----
    DONE

estimator  2: [XGB: XGBClass
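
The message `Train set was detected.` indicates that `transform` treats the data it was fitted on differently from new data. The sketch below illustrates one way such behavior can work, using scikit-learn only. `TinyStacker` is a hypothetical single-estimator class written for this example; the exact-equality check and the majority vote are assumptions and may differ from vecstack's actual detection and aggregation logic.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split


class TinyStacker:
    """Hypothetical one-estimator stacker illustrating train-set detection."""

    def __init__(self, estimator, n_folds=4, random_state=0):
        self.estimator = estimator
        self.n_folds = n_folds
        self.random_state = random_state

    def fit(self, X, y):
        skf = StratifiedKFold(n_splits=self.n_folds, shuffle=True,
                              random_state=self.random_state)
        self.folds_ = list(skf.split(X, y))
        # One fitted model per fold, each trained without its held-out rows.
        self.fold_models_ = [clone(self.estimator).fit(X[fit_idx], y[fit_idx])
                             for fit_idx, _ in self.folds_]
        self.train_X_ = X.copy()
        self.n_classes_ = len(np.unique(y))
        return self

    def transform(self, X):
        if X.shape == self.train_X_.shape and np.array_equal(X, self.train_X_):
            # Train set: each row is predicted by the fold model that never saw it.
            out = np.zeros(X.shape[0], dtype=int)
            for model, (_, oof_idx) in zip(self.fold_models_, self.folds_):
                out[oof_idx] = model.predict(X[oof_idx])
            return out.reshape(-1, 1)
        # Any other data: majority vote over all fold models.
        votes = np.column_stack([m.predict(X) for m in self.fold_models_]).astype(int)
        voted = [np.bincount(row, minlength=self.n_classes_).argmax() for row in votes]
        return np.asarray(voted).reshape(-1, 1)


X_all, y_all = load_iris(return_X_y=True)
X_fit, X_new, y_fit, y_new = train_test_split(X_all, y_all, test_size=0.2, random_state=0)
stacker = TinyStacker(RandomForestClassifier(random_state=0, n_estimators=50, max_depth=3))
stacker.fit(X_fit, y_fit)
print(stacker.transform(X_fit).shape, stacker.transform(X_new).shape)  # (120, 1) (30, 1)
```

Returning out-of-fold predictions for the training set keeps the stacked training features free of predictions made by a model on rows it was trained on.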

In [38]:
# Use 2nd level estimator with stacked features

model = XGBClassifier(seed = 0, n_jobs = -1, learning_rate = 0.1, n_estimators = 100, max_depth = 3, eval_metric='mlogloss') 
model = model.fit(S_train, y_train) 

y_pred = model.predict(S_test) 
print('Final prediction score: [%.8f]' % accuracy_score(y_test, y_pred))

Final prediction score: [0.96666667]
