The model to use depends on the type of the target (label) variable (sections 1-3),  
and the workflow also differs slightly depending on how the parameters are optimized (section 4).

1. **binary classification** --> use XGBClassifier
2. **multi-class classification** --> use XGBClassifier
3. **Regression** --> use XGBRegressor
4. **Parameter optimization** --> demonstrated with XGBRegressor

See: [sklearn - XGBoost Parameters](https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn)

---
Note.  
- **XGBClassifier :** the label is a **binary or multi-class (categorical) variable**
- **XGBRegressor  :** the label is **continuous data**
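The rule in the note above can be sketched as a small helper. This is only an illustration: `pick_estimator_name` is a hypothetical function introduced here, and it relies on scikit-learn's `type_of_target` to classify the label array.

```python
from sklearn.utils.multiclass import type_of_target


def pick_estimator_name(y):
    """Hypothetical helper: name the XGBoost sklearn estimator that fits the label type."""
    kind = type_of_target(y)  # e.g. 'binary', 'multiclass', 'continuous'
    if kind in ("binary", "multiclass"):
        return "XGBClassifier"
    if kind == "continuous":
        return "XGBRegressor"
    raise ValueError(f"unsupported target type: {kind}")


print(pick_estimator_name([0, 1, 1, 0]))        # binary labels
print(pick_estimator_name([0, 1, 2, 1]))        # multi-class labels
print(pick_estimator_name([21.6, 34.7, 33.4]))  # continuous target
```

One caveat: `type_of_target` treats floats that are all whole numbers (for example `[1.0, 2.0, 3.0]`) as class labels, so a continuous target that happens to contain only whole numbers needs a manual decision.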

# import modules

In [53]:
'''
Created on 1 Apr 2015
@author: Jamie Hall
'''

import pickle
import xgboost as xgb

import numpy as np
from sklearn.model_selection import KFold, train_test_split, GridSearchCV, ParameterGrid
from sklearn.metrics import confusion_matrix, mean_squared_error, f1_score
# Note: load_boston was removed in scikit-learn 1.2, so this import requires an older release
from sklearn.datasets import load_iris, load_digits, load_boston

import warnings
warnings.filterwarnings('ignore')

rng = np.random.RandomState(31337)
xgb.__version__

'1.3.3'

# 1. binary classification 

In [7]:
print("Zeros and Ones from the Digits dataset: binary classification")
digits = load_digits(n_class=2)

Zeros and Ones from the Digits dataset: binary classification


In [23]:
print('Shape of X : ', digits.data.shape)
print('Shape of y : ', digits.target.shape)

Shape of X :  (360, 64)
Shape of y :  (360,)


In [34]:
y = digits['target'] # label
X = digits['data']

# Create a K-fold cross-validation splitter
kf = KFold(n_splits=2,       # use 2 folds
           shuffle=True,     # shuffle the data before splitting
           random_state=rng) # fix the seed

**KFold (k-fold cross-validation)**: see also the related splitters below.  


- [StratifiedKFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html#sklearn.model_selection.StratifiedKFold)  
Takes class information into account to avoid building folds with imbalanced class distributions (for binary or multiclass classification tasks).
- [GroupKFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GroupKFold.html#sklearn.model_selection.GroupKFold)  
K-fold iterator variant with non-overlapping groups.
- [RepeatedKFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RepeatedKFold.html#sklearn.model_selection.RepeatedKFold)  
Repeats K-Fold n times.
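To see what stratification changes, the sketch below compares the class counts in each test fold for `KFold` and `StratifiedKFold`. The toy labels (8 zeros, 4 ones) are made up for illustration and are not part of the notebook's data.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Toy, imbalanced labels used only for this illustration
y_toy = np.array([0] * 8 + [1] * 4)
X_toy = np.zeros((len(y_toy), 1))  # features are irrelevant for splitting


def class_counts_per_fold(splitter):
    """Return [count of class 0, count of class 1] for every test fold."""
    return [np.bincount(y_toy[test_idx], minlength=2).tolist()
            for _, test_idx in splitter.split(X_toy, y_toy)]


kfold_counts = class_counts_per_fold(KFold(n_splits=2, shuffle=True, random_state=0))
strat_counts = class_counts_per_fold(StratifiedKFold(n_splits=2, shuffle=True, random_state=0))

print("KFold          :", kfold_counts)  # class ratio may vary between folds
print("StratifiedKFold:", strat_counts)  # class ratio is preserved in every fold
```

For the classification examples in this notebook, swapping `KFold` for `StratifiedKFold` would be a reasonable choice when the classes are imbalanced; note that `split` then needs the labels as its second argument.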

## Run XGBoost

In [35]:
for train_index, test_index in kf.split(X):
    
    xgb_model = xgb.XGBClassifier(n_jobs=1, use_label_encoder=False).fit(X[train_index], y[train_index])
    predictions = xgb_model.predict(X[test_index])
    actuals = y[test_index]
    
    print(confusion_matrix(actuals, predictions))

[[93  1]
 [ 0 86]]
[[84  0]
 [ 2 94]]


# 2. multi-class classification
## Load the data

In [36]:
print("Iris: multiclass classification")
iris = load_iris()
y = iris['target']
X = iris['data']
kf = KFold(n_splits=2, shuffle=True, random_state=rng)

Iris: multiclass classification


In [37]:
for train_index, test_index in kf.split(X):
    
    xgb_model = xgb.XGBClassifier(n_jobs=1).fit(X[train_index], y[train_index])
    predictions = xgb_model.predict(X[test_index])
    actuals = y[test_index]
    
    print(confusion_matrix(actuals, predictions))

[[24  0  0]
 [ 0 22  4]
 [ 0  0 25]]
[[26  0  0]
 [ 0 23  1]
 [ 0  2 23]]




# 3. Regression
## Load the data

In [38]:
print("Boston Housing: regression")
boston = load_boston()
y = boston['target']
X = boston['data']
kf = KFold(n_splits=2, shuffle=True, random_state=rng)

Boston Housing: regression


In [41]:
for train_index, test_index in kf.split(X):
    xgb_model = xgb.XGBRegressor(n_jobs=1).fit(X[train_index], y[train_index])
    predictions = xgb_model.predict(X[test_index])
    actuals = y[test_index]
    print(mean_squared_error(actuals, predictions))

11.241413840538742
15.104799766676079


# 4. Parameter Optimization
- **n_estimators(int) :** Number of gradient boosted trees. Equivalent to the number of boosting rounds.
- **max_depth(int) :** Maximum tree depth for base learners.

## using GridSearchCV

In [43]:
print("Parameter optimization01 : GridSearchCV")
y = boston['target']
X = boston['data']

xgb_model = xgb.XGBRegressor(n_jobs=1)
clf = GridSearchCV(xgb_model,
                   {'max_depth': [2, 4, 6],
                    'n_estimators': [50, 100, 200]}, verbose=1, n_jobs=1)

clf.fit(X, y)
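# best_score_ is the mean cross-validated score; for a regressor the default scoring is R^2, not MSE
print(clf.best_score_)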
print(clf.best_params_)


Parameter optimization
Fitting 5 folds for each of 9 candidates, totalling 45 fits
0.6839859272017424
{'max_depth': 2, 'n_estimators': 100}


## using ParameterGrid (***)

In [65]:
print("Parameter optimization01 : ParameterGrid")

y = boston['target']
X = boston['data']

Parameter optimization01 : ParameterGrid


### With K-fold Cross validation

Other cross-validation strategies are also available → [Many kinds of CV]()

In [85]:
# 5-fold, Shuffle
kf = KFold(n_splits=5, shuffle=True, random_state=rng)

#### [1st way]  CV --> Parameter

In [81]:
# Set up the parameter grid
XGB_parameter_grid = ParameterGrid({"max_depth": np.arange(2, 6),
                                  "n_estimators": np.arange(50, 200)})

# Outer loop: folds; inner loop: parameter sets.
# This reports the best parameter set separately for each fold.
for train_index, test_index in kf.split(X):  

    best_score = float("inf")  # reset for every fold
    for parameter in XGB_parameter_grid:

        model = xgb.XGBRegressor(n_jobs=1, verbosity=1, **parameter).fit(X[train_index], y[train_index])
        pred_Y = model.predict(X[test_index])
        score = mean_squared_error(y[test_index], pred_Y)

        if score < best_score:
            best_score = score
            best_parameter = parameter
        
    print(best_parameter, best_score)

{'max_depth': 2, 'n_estimators': 183} 7.966413949080774
{'max_depth': 3, 'n_estimators': 112} 6.451319132204588
{'max_depth': 5, 'n_estimators': 75} 7.804371671162519
{'max_depth': 5, 'n_estimators': 95} 10.340803882852418
{'max_depth': 5, 'n_estimators': 77} 16.58959935221183
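The grid used above is large: 4 depths times 150 values of `n_estimators`. The sketch below counts the candidates and shows a coarser grid as one possible way to cut the cost; the step of 25 is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same grid as in the notebook
full_grid = ParameterGrid({"max_depth": np.arange(2, 6),
                           "n_estimators": np.arange(50, 200)})

n_candidates = len(full_grid)  # 4 * 150 parameter combinations
n_fits = n_candidates * 5      # one fit per candidate per fold with 5-fold CV
print(n_candidates, n_fits)

# A coarser grid (assumption: a step of 25 trees is fine-grained enough)
coarse_grid = ParameterGrid({"max_depth": [2, 3, 4, 5],
                             "n_estimators": list(range(50, 200, 25))})
print(len(coarse_grid))
```

Each element yielded by a `ParameterGrid` is a plain dict, which is why it can be unpacked with `**parameter` into the estimator constructor.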


#### [2nd way] Parameter setting --> CV

In [96]:
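# Set up the parameter grid
XGB_parameter_grid = ParameterGrid({"max_depth": np.arange(2, 6),
                                  "n_estimators": np.arange(50, 200)})

# Initialize the best score once, before the parameter loop.
# Resetting it for every parameter set would let each set overwrite the
# previous best, so the last grid entry would always be reported.
best_score = float("inf")

# [1st Loop]
# Choose the parameter set to test
# Note: kf was built with random_state=rng (a RandomState instance), so kf.split()
# reshuffles on every call; pass an integer seed to give every parameter set the same folds.
for parameter in XGB_parameter_grid:

    _scores = []
    
    # [2nd Loop]
    # K-fold cross validation -> the mean score is the score of this parameter set
    for train_index, test_index in kf.split(X):

        model = xgb.XGBRegressor(n_jobs=1, verbosity=1, **parameter)
        model.fit(X[train_index], y[train_index],
                  verbose=1)
        pred_Y = model.predict(X[test_index])
        score = mean_squared_error(y[test_index], pred_Y)
        
        _scores.append(score)

    avr_score = np.mean(_scores)

    if avr_score < best_score:
        best_score = avr_score
        best_parameter = parameter
    
print('Best Parameter : ', best_parameter)
print('Best Score(MSE):', best_score)

The output below was recorded with `best_score` re-initialized inside the parameter loop, which is why it shows the last grid entry; with the initialization placed before the loop, the reported combination may differ.

Best Parameter :  {'max_depth': 5, 'n_estimators': 199}
Best Score(MSE): 9.488505972585271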
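The difference between the two approaches can be shown without fitting any model. In the sketch below the per-fold MSE values are hypothetical numbers chosen for illustration; keys are `(max_depth, n_estimators)` pairs.

```python
from statistics import mean

# Hypothetical per-fold MSE values for three parameter sets (illustrative numbers only)
fold_mse = {
    (2, 50): [8.0, 6.0, 9.0],
    (3, 100): [7.0, 7.0, 7.5],
    (5, 150): [6.5, 9.0, 10.0],
}
n_folds = 3

# 1st way: pick a winner separately inside each fold
per_fold_winners = [min(fold_mse, key=lambda p: fold_mse[p][k]) for k in range(n_folds)]

# 2nd way: average each parameter set over all folds, then pick a single winner
mean_mse = {p: mean(scores) for p, scores in fold_mse.items()}
overall_winner = min(mean_mse, key=mean_mse.get)

print(per_fold_winners)  # one winner per fold, possibly all different
print(overall_winner)    # a single parameter set for the whole data set
```

The first approach can return a different winner for every fold, so it does not by itself say which setting to use for the final model; the second approach yields exactly one setting.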


#### Pickle

In [94]:
# Save the model fitted with the best parameters
# using pickle

# Best model
best_boston_XGBM = xgb.XGBRegressor(n_jobs=1, verbosity=1, **best_parameter).fit(X, y)

# Pickling - dump (the file must be opened in binary mode)
with open("best_boston_XGBM.pkl", "wb") as f:
    pickle.dump(best_boston_XGBM, f)

# Pickling - load
with open("best_boston_XGBM.pkl", "rb") as f:
    Can_use_this_model = pickle.load(f)
Can_use_this_model

In [98]:
Can_use_this_model

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
             importance_type='gain', interaction_constraints='',
             learning_rate=0.300000012, max_delta_step=0, max_depth=5,
             min_child_weight=1, missing=nan, monotone_constraints='()',
             n_estimators=199, n_jobs=1, num_parallel_tree=1, random_state=0,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
             tree_method='exact', validate_parameters=1, verbosity=1)

### Using a single train/test split

In [68]:
# train_test_split
Train_X, Test_X, Train_Y, Test_Y = train_test_split(X, y, random_state = rng)

In [73]:
# Set up the parameter grid
XGB_parameter_grid = ParameterGrid({"max_depth": np.arange(2, 6),
                                  "n_estimators": np.arange(50, 200)})

best_score = float("inf")

for parameter in XGB_parameter_grid:

    model = xgb.XGBRegressor(n_jobs=1, verbosity=1, **parameter).fit(Train_X, Train_Y)
    pred_Y = model.predict(Test_X)
    score = mean_squared_error(Test_Y, pred_Y)

    if score < best_score:
        best_score = score
        best_parameter = parameter

print(best_parameter, best_score)

{'max_depth': 3, 'n_estimators': 64} 14.224654335528523


# Appendix.

## 1. Early Stopping

In [100]:
# Early stopping
# Early stopping can be enabled through the evaluation arguments passed to fit()

# data from 'digits'
X = digits['data']
y = digits['target']

# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

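# xgboost 1.3.x takes early_stopping_rounds and eval_metric in fit();
# xgboost 2.x expects them in the XGBClassifier constructor instead.
clf = xgb.XGBClassifier(n_jobs=1)
clf.fit(X_train, y_train, 
        early_stopping_rounds=10, 
        eval_metric="auc",
        eval_set=[(X_test, y_test)])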

[0]	validation_0-auc:0.99950
[1]	validation_0-auc:0.99975
[2]	validation_0-auc:0.99975
[3]	validation_0-auc:0.99975
[4]	validation_0-auc:0.99975
[5]	validation_0-auc:0.99975
[6]	validation_0-auc:1.00000
[7]	validation_0-auc:1.00000
[8]	validation_0-auc:1.00000
[9]	validation_0-auc:1.00000
[10]	validation_0-auc:1.00000
[11]	validation_0-auc:1.00000
[12]	validation_0-auc:1.00000
[13]	validation_0-auc:1.00000
[14]	validation_0-auc:1.00000
[15]	validation_0-auc:1.00000
[16]	validation_0-auc:1.00000


XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.300000012, max_delta_step=0, max_depth=6,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=100, n_jobs=1, num_parallel_tree=1, random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)
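The log above stops at round 16 because the AUC last improved at round 6 and then went 10 rounds without improvement. The following is a simplified sketch of such a patience rule, not XGBoost's actual implementation; `early_stopping_round` is a hypothetical helper, and `auc_log` repeats the values printed in the log.

```python
def early_stopping_round(scores, patience, maximize=True):
    """Return (last round run, best round) under a simple patience rule."""
    best_round, best_score = 0, scores[0]
    for rnd, score in enumerate(scores):
        improved = score > best_score if maximize else score < best_score
        if improved:
            best_round, best_score = rnd, score
        elif rnd - best_round >= patience:
            # No improvement for `patience` consecutive rounds: stop here
            return rnd, best_round
    return len(scores) - 1, best_round


# validation_0-auc values copied from the log above (rounds 0 to 16)
auc_log = [0.99950] + [0.99975] * 5 + [1.0] * 11

stopped_at, best_round = early_stopping_round(auc_log, patience=10)
print(stopped_at, best_round)
```

Ties do not count as improvements in this sketch, which is why the repeated 1.00000 values after round 6 do not reset the counter.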

## 2. Pickle
The fitted sklearn-API model can be saved and restored with pickle.

In [101]:
# The sklearn API models are picklable
print("Pickling sklearn API models")

# must open in binary format to pickle
with open("best_boston.pkl", "wb") as f:
    pickle.dump(clf, f)
with open("best_boston.pkl", "rb") as f:
    clf2 = pickle.load(f)
print(np.allclose(clf.predict(X), clf2.predict(X)))

Pickling sklearn API models
True
