* Gain is the relative contribution of a feature to the model, computed from that feature's contribution in each tree of the model. A feature with a higher gain than another feature is more important for generating a prediction.

* Cover is the relative number of observations related to a feature. For example, suppose you have 100 observations, 4 features, and 3 trees, and feature1 is used to decide the leaf node for 10, 5, and 2 observations in tree1, tree2, and tree3 respectively. The cover count for feature1 is then 10 + 5 + 2 = 17 observations. The same count is computed for all 4 features, and the Cover of feature1 is 17 expressed as a percentage of the summed cover counts of all features.

* Frequency (sometimes written Frequence) is the percentage representing the relative number of times a feature occurs in the trees of the model. In the example above, if feature1 is used in 2, 1, and 3 splits in tree1, tree2, and tree3 respectively, its weight is 2 + 1 + 3 = 6. The Frequency of feature1 is this weight expressed as a percentage of the summed weights of all features.
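
The arithmetic in the Cover and Frequency examples can be sketched as follows. The per-tree counts for feature1 come from the text; the totals for feature2 to feature4 are invented purely so that the percentages can be computed, and `as_percentages` is an illustrative helper rather than an XGBoost function.

```python
# Per-tree counts for feature1 are taken from the example above.
# Totals for feature2..feature4 are hypothetical, chosen only for illustration.
cover_counts = {
    "feature1": 10 + 5 + 2,  # 17 observations across tree1..tree3
    "feature2": 40,
    "feature3": 28,
    "feature4": 15,
}
split_counts = {
    "feature1": 2 + 1 + 3,   # 6 splits across tree1..tree3
    "feature2": 9,
    "feature3": 5,
    "feature4": 4,
}


def as_percentages(raw):
    """Express each raw count as a percentage of the total over all features."""
    total = sum(raw.values())
    return {name: 100.0 * value / total for name, value in raw.items()}


cover_pct = as_percentages(cover_counts)
frequency_pct = as_percentages(split_counts)

print(round(cover_pct["feature1"], 2))      # 17 out of 100 -> 17.0
print(round(frequency_pct["feature1"], 2))  # 6 out of 24 -> 25.0
```

With these made-up totals, feature1 accounts for 17% of the cover and 25% of the splits, which shows that the two metrics can rank the same feature quite differently.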

### Import modules, functions, and data

In [1]:
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import numpy as np
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import validation_curve
from pprint import pprint
import matplotlib.pyplot as plt
from hyperopt.pyll.stochastic import sample
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

from sklearn.ensemble import RandomForestClassifier # random forest classifier

In [2]:
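def F1(y_pred, dtrain):
    """Custom eval metric for xgb.cv: harmonic mean of per-class precision and recall."""
    labels = dtrain.get_label()

    # average=None returns one precision/recall value per class (4 classes here)
    pre = precision_score(y_true = labels, y_pred = y_pred, average=None)
    rec = recall_score(y_true = labels, y_pred = y_pred, average=None)
    # 8 = 4 precisions + 4 recalls, so this is the harmonic mean of all eight values
    f1_score = 8/(sum(1/pre) + sum(1/rec))

    return 'f1', f1_score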
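
The expression `8/(sum(1/pre) + sum(1/rec))` used in `F1` is the harmonic mean of eight numbers: the four per-class precisions and the four per-class recalls. A small check with made-up precision and recall values (not results from this notebook), using only the standard library:

```python
from statistics import harmonic_mean

# Hypothetical per-class precision and recall for the four classes.
pre = [0.80, 0.60, 0.70, 0.75]
rec = [0.85, 0.55, 0.65, 0.80]

# Same formula as in F1, written with plain Python lists.
score = 8 / (sum(1 / p for p in pre) + sum(1 / r for r in rec))

# Harmonic mean of all eight values, computed independently.
reference = harmonic_mean(pre + rec)

print(score, reference)
```

Because a harmonic mean is dominated by its smallest term, a single poorly predicted class pulls this score down sharply, and a class with zero precision or recall makes the formula divide by zero.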

In [3]:
def f1(X_val, y_val, model,mapping):
    """
    Model evaluation function for multiclass classification problem
    1) F-1 score, Precision, Recall
    2) ROC curve and PR curve: to be considered later
    """
 
    #### predict the value
    y_pred = model.predict(X_val)

    #print('-'*50)
    #print('2. F1-score')
    
    # per-class precision and recall (their inverses are summed below)
    pre = precision_score(y_true = y_val, y_pred = y_pred, average=None)
    rec = recall_score(y_true = y_val, y_pred = y_pred, average=None)

    # f1 measure
    f1_score = 8/(sum(1/pre) + sum(1/rec))
    
    # view - precision recall
    table = pd.DataFrame([])

    for i,k in enumerate(mapping.keys()):
        table[k] = [pre[i],rec[i]]
    table.index = ['precision','recall']
    # print(table)
    
    # view - f1
    #print('F1_score %.3f'%f1_score)
    #print('='*50)
    return f1_score

In [4]:
#### load data set
X_train = pd.read_csv('temp_data/train_every_features_0905.csv').drop('new_id',axis=1)
# X_train2 = pd.read_csv('temp_data/train_activity_cnt_mean_encoded.csv').drop('new_id',axis=1)

In [5]:
cols = pd.read_csv('sub_features.csv').features

In [6]:
#### load class
train_label = pd.read_csv('temp_data/train_label_lite.csv')
# hasher = pd.read_csv('test_id.csv')
label_map = {'retained':0,'2month':1,'month':2,'week':3}
y_train = pd.Series([label_map[l] for l in train_label.label])

In [7]:
inv_map = {label_map[k]:k for k in label_map.keys()}

In [8]:
X_train = X_train.loc[:,cols]

---

In [9]:
X_train.shape

(100000, 319)

In [17]:
#### xgb
grid_result = []
param = {}
#### XGB parameters
## General Parameters
param['n_gpus'] = -1
param['tree_method'] = 'gpu_hist'
param['silent'] = 0

## Booster Parameters
param['n_estimators'] = 1000 # here...
param['learning_rate'] = 0.1
param['min_child_weight'] = 2
param['max_depth'] = 10
param['gamma'] = 0
param['reg_alpha'] =0
param['reg_lambda'] = 0
param['subsample'] = 0.96
param['colsample_bytree'] = 0.69
param['scale_pos_weight'] = 1

## Learning task parameters
param['num_class'] = 4
param['objective'] = 'multi:softmax'
param['seed'] = 7

model = xgb.XGBClassifier(**param)

In [11]:
#### step 1 : tuning n_estimators with cross validation
print("===============================================")
print("Find the n_estimators")
xgtrain = xgb.DMatrix(X_train.values, label= y_train.values.reshape(-1,1))
cvresult = xgb.cv(param, xgtrain, num_boost_round = param['n_estimators'], nfold = 5, feval=F1, early_stopping_rounds = 50
                  ,stratified=True, shuffle=True)
print("Optimal n_estimators : %d"%cvresult.shape[0])

Find the n_estimators
Optimal n_estimators : 281


In [13]:
cvresult['test-f1-mean'].max()

0.72223740000000003

In [18]:
model.fit(X_train,y_train)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=0.69, gamma=0, learning_rate=0.1, max_delta_step=0,
       max_depth=10, min_child_weight=2, missing=None, n_estimators=1000,
       n_gpus=-1, n_jobs=1, nthread=None, num_class=4,
       objective='multi:softprob', random_state=0, reg_alpha=0,
       reg_lambda=0, scale_pos_weight=1, seed=7, silent=0, subsample=0.96,
       tree_method='gpu_hist')

In [21]:
import shap  # needed for TreeExplainer; not imported in the first cell
explainer = shap.TreeExplainer(model)

In [None]:
shap_values = explainer.shap_values(X_train)

In [None]:
shap.summary_plot(shap_values, X_train,plot_type="bar")

In [19]:
cvresult.head()

| | train-f1-mean | train-f1-std | train-merror-mean | train-merror-std | test-f1-mean | test-f1-std | test-merror-mean | test-merror-std |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.714553 | 0.002395 | 0.264482 | 0.001564 | 0.675235 | 0.003891 | 0.3012 | 0.003428 |
| 1 | 0.723411 | 0.002094 | 0.255102 | 0.001599 | 0.68247 | 0.002973 | 0.29322 | 0.002692 |
| 2 | 0.726759 | 0.001596 | 0.251348 | 0.001019 | 0.686482 | 0.003503 | 0.28868 | 0.003098 |
| 3 | 0.728296 | 0.001627 | 0.249262 | 0.001223 | 0.688568 | 0.003308 | 0.28567 | 0.002465 |
| 4 | 0.729207 | 0.001453 | 0.2483 | 0.001114 | 0.688793 | 0.003707 | 0.28573 | 0.002968 |


In [18]:
cvresult['test-f1-mean']

0       0.675235
1       0.682470
2       0.686482
3       0.688568
4       0.688793
5       0.689820
6       0.691143
7       0.691531
8       0.691888
9       0.692360
10      0.692441
11      0.693256
12      0.693979
13      0.693966
14      0.694328
15      0.694230
16      0.694265
17      0.694569
18      0.694525
19      0.694656
20      0.694807
21      0.694876
22      0.695220
23      0.695283
24      0.695231
25      0.695545
26      0.695563
27      0.695462
28      0.695483
29      0.695788
          ...   
1667    0.719694
1668    0.719740
1669    0.719699
1670    0.719682
1671    0.719655
1672    0.719618
1673    0.719636
1674    0.719674
1675    0.719718
1676    0.719694
1677    0.719710
1678    0.719692
1679    0.719744
1680    0.719721
1681    0.719771
1682    0.719761
1683    0.719726
1684    0.719715
1685    0.719703
1686    0.719747
1687    0.719737
1688    0.719781
1689    0.719822
1690    0.719816
1691    0.719847
1692    0.719851
1693    0.719855
1694    0.7198

In [9]:
model.fit(X_train,y_train)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=0.69, gamma=0, learning_rate=0.1, max_delta_step=0,
       max_depth=13, min_child_weight=9.55, missing=None, n_estimators=223,
       n_gpus=-1, n_jobs=1, nthread=None, num_class=4,
       objective='multi:softprob', random_state=0, reg_alpha=0,
       reg_lambda=0, scale_pos_weight=1, seed=7, silent=0, subsample=0.96,
       tree_method='gpu_hist')

In [15]:
gains = model.get_booster().get_score(importance_type='gain')

In [13]:
X_train['play_time_Count']

1

In [19]:
gain = pd.DataFrame([(k,gains[k]) for k in gains]).sort_values(by=1,ascending = False)
gain.columns = ['feature','inforamtion_gain']
gain.reset_index(drop=True,inplace=True)

In [17]:
covers = model.get_booster().get_score(importance_type='cover')

In [20]:
cover = pd.DataFrame([(k,covers[k]) for k in covers]).sort_values(by=1,ascending = False)
cover.columns = ['feature','cover']
cover.reset_index(drop=True,inplace=True)

In [43]:
len(gain)

541

In [44]:
len(cover)

541

In [47]:
X_train.columns

Index(['payment_amount_by_play_time', 'play_time_by_cnt_dt',
       'game_combat_time_by_play_time', 'get_money_by_play_time',
       'cnt_use_buffitem_by_game_combat_time', 'chats_by_play_time',
       '('npc_exp', 'game_combat_time')', '('quest_exp', 'game_combat_time')',
       '('get_money', 'cnt_use_buffitem')', '('npc_hongmun', 'item_hongmun')',
       ...
       'exc_last', 'give_first', 'give_second', 'give_third', 'rec_first',
       'rec_second', 'rec_third', 'exc_first', 'exc_second', 'exc_third'],
      dtype='object', length=568)

In [31]:
features = X_train.columns.tolist()

In [51]:
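# print the features that the booster never split on (absent from the gain table)
gain_features = set(gain.feature)
for c in features:
    if c not in gain_features:
        print(c)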

give_exc_weapon_ratio
first_week_cnt_clear_bam
give_acc_sum
give_gem_sum
give_weapon_sum
rec_acc_sum
rec_costume_sum
rec_gem_sum
rec_weapon_sum
give_exc_acc_sum_
give_exc_costume_sum_
give_exc_gem_sum
give_exc_weapon_sum
rec_exc_acc_sum
rec_exc_costume_sum
rec_exc_gem_sum
rec_exc_weapon_sum
give_gem_ratio
rec_weapon_ratio
give_exc_acc_ratio.1
give_exc_costume_ratio.1
give_exc_gem_ratio.1
give_exc_weapon_ratio.1
rec_exc_acc_ratio
rec_exc_costume_ratio
rec_exc_gem_ratio
rec_exc_weapon_ratio


In [87]:
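# rank shift per feature: position in `ruler` (cover/freq ranking) minus position in `gain`
shits = []
for c in features:
    # features the booster never used have no gain entry and are skipped
    if c in gain.feature.tolist():
        shits.append(ruler.loc[ruler.feature==c].index.tolist()[0]- gain.loc[gain.feature==c].index.tolist()[0])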

In [95]:
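# shits holds one entry per feature found in gain, so pair it with that same filtered list
gain_features = set(gain.feature)
used = [c for c in features if c in gain_features]
for name,x in zip(used,shits):
    if x < -300:
        print(name,end=',')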

payment_amount_by_play_time,payment_amount_median_stat,duel_cnt_min_stat,duel_cnt_max_stat,duel_cnt_range_stat,duel_cnt_median_stat,duel_cnt_sum_stat,cnt_enter_inzone_skilled_min_stat,cnt_enter_inzone_skilled_max_stat,cnt_enter_inzone_skilled_range_stat,cnt_enter_inzone_skilled_median_stat,cnt_enter_raid_range_stat,cnt_enter_raid_light_max_stat,cnt_enter_raid_light_range_stat,cnt_enter_raid_light_median_stat,district_chat_max_stat,district_chat_range_stat,district_chat_median_stat,district_chat_sum_stat,payment_amount_median_basic_time,payment_amount_var_basic_time,payment_amount_kurt_basic_time,payment_amount_MA_2_basic_time,payment_amount_MA_3_basic_time,payment_amount_MA_4_basic_time,payment_amount_cycle_basic_time,normal_chat_median_basic_time,normal_chat_MA_1_basic_time,normal_chat_MA_2_basic_time,normal_chat_MA_4_basic_time,guild_chat_var_basic_time,guild_chat_MA_1_basic_time,guild_chat_MA_4_basic_time,net_BetCen,net_EigCen,inzone_skilled_ratio,bam_ratio,party_cnt_with4_min,party

In [88]:
shits

[-425,
 496,
 178,
 383,
 366,
 369,
 96,
 51,
 137,
 99,
 209,
 199,
 44,
 448,
 248,
 280,
 302,
 161,
 120,
 275,
 6,
 236,
 -228,
 -228,
 -112,
 -343,
 -171,
 282,
 317,
 220,
 379,
 399,
 377,
 252,
 154,
 436,
 243,
 255,
 279,
 54,
 376,
 290,
 355,
 293,
 180,
 389,
 229,
 329,
 200,
 277,
 65,
 317,
 364,
 277,
 87,
 253,
 268,
 123,
 207,
 156,
 232,
 69,
 174,
 206,
 43,
 236,
 241,
 394,
 376,
 180,
 369,
 365,
 -20,
 307,
 102,
 349,
 232,
 -338,
 -354,
 -359,
 -409,
 -352,
 54,
 -3,
 -20,
 -55,
 -49,
 189,
 240,
 150,
 175,
 223,
 62,
 134,
 224,
 46,
 154,
 -493,
 -415,
 -377,
 -431,
 -299,
 129,
 37,
 -46,
 -35,
 4,
 -124,
 -259,
 -334,
 -11,
 -129,
 -168,
 -335,
 -345,
 -370,
 152,
 215,
 -98,
 127,
 111,
 196,
 45,
 293,
 176,
 -24,
 139,
 -271,
 -379,
 -390,
 -327,
 -305,
 -88,
 229,
 192,
 71,
 214,
 241,
 -58,
 -123,
 -175,
 -88,
 124,
 140,
 -250,
 119,
 105,
 -333,
 -234,
 -434,
 26,
 -359,
 -227,
 -436,
 -379,
 -334,
 -49,
 -338,
 78,
 83,
 252,
 -90,
 -64,
 150

In [67]:
ruler = pd.merge(cover,freq,on = 'feature')

In [68]:
ruler['rule'] = ruler.cover/ruler.freq

In [76]:
pd.merge(gain,ruler, on = 'feature').to_csv('feature_selection.csv',index = False)

In [83]:
ruler.rule

0      18822.009800
1       9469.567375
2       9327.874525
3       3635.276092
4       1946.954270
5       1478.092256
6       1302.818017
7        918.299646
8        770.213587
9        731.866259
10       606.292419
11       467.713766
12       461.708621
13       428.734003
14       344.084776
15       308.722186
16       300.293582
17       297.102611
18       270.340345
19       241.086329
20       194.405186
21       184.849513
22       164.693770
23       159.035126
24       146.363194
25       141.533399
26       107.148408
27       104.761029
28        97.285445
29        90.070560
           ...     
511        0.590076
512        0.580888
513        0.569278
514        0.541990
515        0.519861
516        0.519709
517        0.498595
518        0.495257
519        0.486920
520        0.474857
521        0.458724
522        0.441629
523        0.428791
524        0.421229
525        0.393177
526        0.389478
527        0.385183
528        0.382389
529        0.354705


In [82]:
ruler = ruler.sort_values(by='rule',ascending = False).reset_index(drop=True)

In [64]:
freq = model.get_booster().get_score(importance_type='weight')

In [59]:
gain

| | feature | inforamtion_gain |
|---|---|---|
| 0 | quest_exp_by_cnt_dt | 157.334095 |
| 1 | pattern_retained | 136.173674 |
| 2 | play_time_Count | 125.420638 |
| 3 | item_hongmun_mean_basic_time | 97.715113 |
| 4 | cnt_enter_raid_light_Count | 87.113909 |
| 5 | play_pattern_by_partial_sum_retained_prob | 79.396382 |
| 6 | game_combat_time_MA_5_time_series | 63.258112 |
| 7 | making_cnt_max_stat | 54.114842 |
| 8 | making_cnt_sum_stat | 49.865232 |
| 9 | cnt_dt_max_stat | 42.123701 |


In [60]:
cover

| | feature | cover |
|---|---|---|
| 0 | rec_gem_ratio | 18939.134750 |
| 1 | give_weapon_ratio | 18822.009800 |
| 2 | give_costume_sum | 18655.749050 |
| 3 | rec_acc_ratio | 18176.380460 |
| 4 | first_week_gathering_cnt | 14692.794337 |
| 5 | give_costume_ratio | 14330.998184 |
| 6 | gathering_cnt_Count | 13628.679889 |
| 7 | first_week_faction_chat | 11824.738045 |
| 8 | rec_costume_ratio | 8050.528845 |
| 9 | faction_chat_Count | 7881.801452 |


In [65]:
freq= pd.DataFrame([(k ,freq[k]) for k in freq.keys()]).sort_values(by=1,ascending = False)
freq.columns = ['feature','freq']
freq.reset_index(drop=True,inplace=True)

In [40]:
gain.loc[gain.feature=='quest_exp_by_cnt_dt'].index[0]

0

In [None]:
Xgboost objective call #9 cur_best_score=0.72345 cur_best_std=0.00281
{'colsample_bytree': 0.69, 'learning_rate': 0.1, 'max_depth': 13, 'min_child_weight': 9.55, 'n_estimators': 1000, 'n_gpus': -1, 'num_class': 4, 'objective': 'multi:softmax', 'reg_alpha': 0, 'reg_lambda': 0, 'seed': 7, 'silent': 0, 'subsample': 0.96, 'tree_method': 'gpu_hist'}
===============================================
Find the n_estimators
Optimal n_estimators : 223
5-fold of Xgboost F1: 0.72419 +/- 0.00252


---

### Feature selection

In [8]:
X_train.shape

(100000, 680)

In [9]:
#### RF model
model = RandomForestClassifier(criterion='gini',max_depth = 19, max_features = 290, min_samples_leaf = 1,n_estimators=300,random_state= 7, n_jobs=-1)
X_train_rf = X_train.fillna(0)
model.fit(X_train_rf,y_train)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=19, max_features=290, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=300, n_jobs=-1,
            oob_score=False, random_state=7, verbose=0, warm_start=False)

In [10]:
#### feature selection by RF
importances = model.feature_importances_
std = np.std([tree.feature_importances_ for tree in model.estimators_],axis=0)
indices = np.argsort(importances) # ascending

#### feature ranking
feature_ranking = [(indices[f],importances[indices[f]]) for f in range(X_train.shape[1])]

In [11]:
#### state-of-the-art features... about 300... adjust the importance threshold!
NUM_OF_FEATURES = len([(i,f) for i, f in feature_ranking if f > 0.0001])

In [12]:
NUM_OF_FEATURES

543

In [13]:
col = pd.DataFrame({'importance': model.feature_importances_, 'feature': X_train.columns}).sort_values(by=['importance'], ascending=[False])[:NUM_OF_FEATURES]['feature'].values

In [14]:
#### FEATURE SELECTION
X_train = X_train[col]

In [15]:
X_train.shape

(100000, 543)
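
The threshold-based selection above can be sketched without pandas or scikit-learn as follows. The feature names and importance values are made up for illustration, and `select_features` is a hypothetical helper; only the `0.0001` threshold is taken from the notebook.

```python
def select_features(names, importances, threshold=1e-4):
    """Return the names whose importance exceeds threshold, most important first."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return [name for name, importance in ranked if importance > threshold]


# Hypothetical importances; in the notebook they come from model.feature_importances_.
names = ["feat_a", "feat_b", "feat_c", "feat_d"]
importances = [0.62, 0.00005, 0.30, 0.07995]

selected = select_features(names, importances)
print(selected)  # feat_b falls below the threshold and is dropped
```

Sorting first and filtering second gives the same columns as counting the features above the threshold and then taking that many from the top of the ranking.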

---

In [16]:
#### xgb
grid_result = []
param = {}
#### XGB parameters
## General Parameters
param['n_gpus'] = -1
param['tree_method'] = 'gpu_hist'
param['silent'] = 0

## Booster Parameters
param['n_estimators'] = 5 # here...
param['learning_rate'] = 0.1
param['min_child_weight'] = 4
param['max_depth'] = 14
param['gamma'] = 0
param['reg_alpha'] = 0.058
param['reg_lambda'] = 0.049
param['subsample'] = 0.96
param['colsample_bytree'] = 0.69
param['scale_pos_weight'] = 1

## Learning task parameters
param['num_class'] = 4
param['objective'] = 'multi:softmax'
param['seed'] = 7

cv_folds = 5

In [19]:
#### step 1 : tuning n_estimators with cross validation
print("===============================================")
print("Find the n_estimators")
xgtrain = xgb.DMatrix(X_train.values, label= y_train.values.reshape(-1,1))
cvresult = xgb.cv(param, xgtrain, num_boost_round = param['n_estimators'], nfold = 5, feval=F1, early_stopping_rounds = 50
                  ,stratified=True, shuffle=True)
print("Optimal n_estimators : %d"%cvresult.shape[0])

Find the n_estimators
Optimal n_estimators : 5


### Hyperopt tuning for XGBoost

In [21]:
obj_call_count = 0
cur_best_score = 0
cur_best_std = 0

In [22]:
param_space = {
    'n_estimators': 1000,
    'learning_rate': 0.1,
    'min_child_weight': hp.quniform('min_child_weight',2,10,0.05),
    'max_depth':hp.choice('max_deph',[7,8,9,10,11,12,13]),

    'reg_alpha': 0,
    'reg_lambda': 0,
    'subsample': 0.96,
    'colsample_bytree': 0.69,
    
    'num_class':4,
    'objective': 'multi:softmax',
    'seed': 7,
    
    'n_gpus' : -1,
    'tree_method' : 'gpu_hist',
    'silent' : 0
    }
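
According to the hyperopt documentation, `hp.quniform(label, low, high, q)` returns a value like `round(uniform(low, high) / q) * q`. The sketch below imitates that rule with the standard library to show why sampled `min_child_weight` values such as `8.700000000000001` can appear in the logs. `quniform_like` is an illustrative helper, not part of hyperopt.

```python
import random


def quniform_like(rng, low, high, q):
    """Imitate the documented rule round(uniform(low, high) / q) * q."""
    return round(rng.uniform(low, high) / q) * q


rng = random.Random(7)  # fixed seed so the sketch is repeatable
samples = [quniform_like(rng, 2, 10, 0.05) for _ in range(5)]
print(samples)

# Multiplying an integer by q = 0.05 happens in binary floating point,
# so the result is not always the "clean" decimal one might expect.
print(174 * 0.05)
```

Each sample is a multiple of `0.05` up to floating-point error, so rounding the value before logging or comparing it is a reasonable precaution.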

In [23]:
def xgb_classifier(params): # hyperopt's objective function takes params as its input.
    
    global obj_call_count, cur_best_score, cur_best_std, X_train, y_train # the input data is accessed through global variables
    
    obj_call_count += 1
    print('\nXgboost objective call #{} cur_best_score={:7.5f} cur_best_std={:7.5f}'.format(obj_call_count,cur_best_score,cur_best_std) )
    
    #### sampling parameters from the hyperparameter params
    xgb_params = sample(params)
    
    print(xgb_params)
    
    #### step 1 : tuning n_estimators with cross validation
    print("===============================================")
    print("Find the n_estimators")
    xgtrain = xgb.DMatrix(X_train.values, label= y_train.values.reshape(-1,1))
    cvresult = xgb.cv(xgb_params, xgtrain, num_boost_round = xgb_params['n_estimators'], nfold = 5, feval=F1, early_stopping_rounds = 50
                      ,stratified=True, shuffle=True)
    print("Optimal n_estimators : %d"%(cvresult.shape[0]-50))
    
    
    f1_mean = cvresult['test-f1-mean'].iloc[-1]
    f1_std = cvresult['test-f1-std'].iloc[-1]
    
    print('5-fold of Xgboost F1: %.5f +/- %.5f' % (f1_mean,f1_std))
    
    if f1_mean > cur_best_score:
        cur_best_score = f1_mean
        cur_best_std = f1_std
        
    #### minimize metric
    loss = 1 - f1_mean
    loss_var = f1_std
    
    return {'loss': loss , 'loss_variance': loss_var ,'status':STATUS_OK ,'attachments':{'cvresult':cvresult}}

In [24]:
trials = Trials()

In [None]:
best = fmin(xgb_classifier, param_space, algo = tpe.suggest, max_evals=30,trials=trials)
print ('best:')
print (best)


Xgboost objective call #1 cur_best_score=0.00000 cur_best_std=0.00000
{'colsample_bytree': 0.69, 'learning_rate': 0.1, 'max_depth': 10, 'min_child_weight': 7.4, 'n_estimators': 1000, 'n_gpus': -1, 'num_class': 4, 'objective': 'multi:softmax', 'reg_alpha': 0, 'reg_lambda': 0, 'seed': 7, 'silent': 0, 'subsample': 0.96, 'tree_method': 'gpu_hist'}
Find the n_estimators
Optimal n_estimators : 297
5-fold of Xgboost F1: 0.72246 +/- 0.00248

Xgboost objective call #2 cur_best_score=0.72246 cur_best_std=0.00248
{'colsample_bytree': 0.69, 'learning_rate': 0.1, 'max_depth': 12, 'min_child_weight': 8.4, 'n_estimators': 1000, 'n_gpus': -1, 'num_class': 4, 'objective': 'multi:softmax', 'reg_alpha': 0, 'reg_lambda': 0, 'seed': 7, 'silent': 0, 'subsample': 0.96, 'tree_method': 'gpu_hist'}
Find the n_estimators
Optimal n_estimators : 178
5-fold of Xgboost F1: 0.72281 +/- 0.00308

Xgboost objective call #3 cur_best_score=0.72281 cur_best_std=0.00308
{'colsample_bytree': 0.69, 'learning_rate': 0.1, 'max

Optimal n_estimators : 498
5-fold of Xgboost F1: 0.71920 +/- 0.00258

Xgboost objective call #18 cur_best_score=0.72419 cur_best_std=0.00252
{'colsample_bytree': 0.69, 'learning_rate': 0.1, 'max_depth': 8, 'min_child_weight': 8.700000000000001, 'n_estimators': 1000, 'n_gpus': -1, 'num_class': 4, 'objective': 'multi:softmax', 'reg_alpha': 0, 'reg_lambda': 0, 'seed': 7, 'silent': 0, 'subsample': 0.96, 'tree_method': 'gpu_hist'}
Find the n_estimators
Optimal n_estimators : 559
5-fold of Xgboost F1: 0.72094 +/- 0.00289

Xgboost objective call #19 cur_best_score=0.72419 cur_best_std=0.00252
{'colsample_bytree': 0.69, 'learning_rate': 0.1, 'max_depth': 12, 'min_child_weight': 6.75, 'n_estimators': 1000, 'n_gpus': -1, 'num_class': 4, 'objective': 'multi:softmax', 'reg_alpha': 0, 'reg_lambda': 0, 'seed': 7, 'silent': 0, 'subsample': 0.96, 'tree_method': 'gpu_hist'}
Find the n_estimators
Optimal n_estimators : 166
5-fold of Xgboost F1: 0.72300 +/- 0.00279

Xgboost objective call #20 cur_best_sc