# HR Analytics

<img src = 'https://datahack-prod.s3.ap-south-1.amazonaws.com/__sized__/contest_cover/hr_1920x480_s5WuoZs-thumbnail-1200x1200-90.jpg'>

Practice Problem: https://datahack.analyticsvidhya.com/contest/wns-analytics-hackathon-2018-1/

## HR Analytics

HR analytics is changing the way human resources departments operate, leading to higher efficiency and better results overall. Human resources has been using analytics for years. However, the collection, processing and analysis of data have been largely manual, and given the nature of human resources dynamics and HR KPIs, this approach has been constraining HR. It is therefore surprising that HR departments woke up to the utility of machine learning so late in the game. Here is an opportunity to try predictive analytics to identify the employees most likely to get promoted.

## Problem Statement

Your client is a large MNC with 9 broad verticals across the organisation. One of the problems your client faces is identifying the right people for promotion *(only for manager position and below)* and preparing them in time. Currently, the process they follow is:

* They first identify a set of employees based on recommendations/past performance
* Selected employees go through a separate training and evaluation program for each vertical. These programs are based on the skills required in each vertical
* At the end of the program, an employee is promoted based on various factors such as training performance and KPI completion (only employees with more than 60% of KPIs completed are considered)

Under this process, the final promotions are only announced after the evaluation, which delays the transition to the new roles. Hence, the company needs your help in identifying the eligible candidates at a particular checkpoint so that it can expedite the entire promotion cycle.

<img src = 'https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2018/09/wns_hack_im_1.jpg'>

They have provided multiple attributes covering each employee's past and current performance, along with demographics. The task is to predict whether a potential promotee at the checkpoint in the test set will be promoted after the evaluation process.

## Evaluation Metric

The evaluation metric for this competition is F1 Score.
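
As a reminder of what is being optimised, F1 is the harmonic mean of precision and recall. The sketch below computes it by hand on a made-up set of labels; `y_true` and `y_pred` are illustrative values, not competition data.

```python
# Toy labels, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0, 1]

# Count true positives, false positives and false negatives.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp)   # share of predicted promotions that are correct
recall = tp / (tp + fn)      # share of actual promotions that are found
f1 = 2 * precision * recall / (precision + recall)

print(f'precision: {precision:.2f}  recall: {recall:.2f}  f1: {f1:.2f}')
```

Because F1 ignores true negatives, it is sensitive to how the decision threshold is set when the positive class is rare, as it is here.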

## Public and Private Split

The test data is further randomly divided into Public (40%) and Private (60%) data.

Your initial responses will be checked and scored on the Public data.
The final rankings will be based on your private score, which will be published once the competition is over.

## Environment

In [1]:
import sys
sys.version

'3.6.12 |Anaconda, Inc.| (default, Sep  8 2020, 17:50:39) \n[GCC Clang 10.0.0 ]'

In [2]:
!conda info --envs

# conda environments:
#
micromaster              /Users/manuel/.conda/envs/micromaster
                         /Users/manuel/.julia/conda/3
base                  *  /Users/manuel/opt/anaconda3
belcorp                  /Users/manuel/opt/anaconda3/envs/belcorp
courseragcp              /Users/manuel/opt/anaconda3/envs/courseragcp
iapucp                   /Users/manuel/opt/anaconda3/envs/iapucp
mitxpro                  /Users/manuel/opt/anaconda3/envs/mitxpro
style-transfer           /Users/manuel/opt/anaconda3/envs/style-transfer
taller-dmc               /Users/manuel/opt/anaconda3/envs/taller-dmc
udacity                  /Users/manuel/opt/anaconda3/envs/udacity



## Packages

In [3]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import os
import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm, tqdm_notebook
from pathlib import Path
import random
import warnings
import pickle

warnings.filterwarnings('ignore')


seed = 2020
random.seed(seed)

pd.set_option('display.max_columns', 1000)
pd.set_option('display.max_rows', 400)
sns.set()

DATA = Path('../../data') 
RAW  = DATA/'raw'
PROCESSED = DATA/'processed'
SUBMISSIONS = DATA/'submissions'    

MODEL = Path('../../model') 

In [4]:
pd.__version__

'1.1.3'

In [5]:
np.__version__

'1.19.1'

In [6]:
sklearn.__version__

'0.23.2'

In [7]:
id_columns = 'employee_id'
target = 'is_promoted'

## Reading the data

In [8]:
os.listdir(f'{PROCESSED}')

['preprocess_v2_smoteTomek20_train.csv',
 'preprocess_v2_knnimputation.pkl',
 'preprocess_v1_smote20_train.csv',
 'preprocess_v2_smote50_train.csv',
 'preprocess_v2_over50_train.csv',
 'preprocess_v2_ohe.pkl',
 'preprocess_v2_ohe_columns.pkl',
 '.DS_Store',
 'preprocess_v1_smoteTomek20_train.csv',
 'preprocess_v2_scalerimputation.pkl',
 'preprocess_v1_ohe.pkl',
 'preprocess_v1_smoteTomek50_train.csv',
 'preprocess_v2_scaler.pkl',
 'preprocess_v1_train.csv',
 'preprocess_v2_smote20_train.csv',
 'preprocess_v1_smote50_train.csv',
 'preprocess_v2_smoteTomek50_train.csv',
 'preprocess_v1_ohe_columns.pkl',
 'preprocess_v2_under50_train.csv',
 'preprocess_v2_train.csv',
 'preprocess_v1_scaler.pkl',
 'preprocess_v2_val.csv',
 '.ipynb_checkpoints',
 'preprocess_v1_impute_values.pkl',
 'preprocess_v1_over50_train.csv',
 'preprocess_v1_capping_values.pkl',
 'preprocess_v1_val.csv',
 'preprocess_v1_under50_train.csv',
 'preprocess_v2_capping_values.pkl']

## Training V1 without balancing

In [9]:
import xgboost as xgb
from sklearn.metrics import precision_recall_curve, roc_auc_score, f1_score

In [10]:
preproc_train = [file for file in os.listdir(f'{PROCESSED}') if file.endswith('train.csv')]
preproc_train

['preprocess_v2_smoteTomek20_train.csv',
 'preprocess_v1_smote20_train.csv',
 'preprocess_v2_smote50_train.csv',
 'preprocess_v2_over50_train.csv',
 'preprocess_v1_smoteTomek20_train.csv',
 'preprocess_v1_smoteTomek50_train.csv',
 'preprocess_v1_train.csv',
 'preprocess_v2_smote20_train.csv',
 'preprocess_v1_smote50_train.csv',
 'preprocess_v2_smoteTomek50_train.csv',
 'preprocess_v2_under50_train.csv',
 'preprocess_v2_train.csv',
 'preprocess_v1_over50_train.csv',
 'preprocess_v1_under50_train.csv']

In [11]:
preproc_val = [file for file in os.listdir(f'{PROCESSED}') if file.endswith('val.csv')]
preproc_val

['preprocess_v2_val.csv', 'preprocess_v1_val.csv']

In [12]:
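# Compare the row count and target rate of every training variant with the validation set.
# Note: preproc_val[0] is 'preprocess_v2_val.csv' in the listing above, so the same
# validation file is read for both the v1 and the v2 training files.
for train_file in sorted(preproc_train):
    df_train = pd.read_csv(f'{PROCESSED}/{train_file}', compression = 'zip')
    df_val = pd.read_csv(f'{PROCESSED}/{preproc_val[0]}', compression = 'zip')

    print(f'label: {train_file:35} \tnrows: {len(df_train)} \t%target train: {df_train[target].mean():.4f} \t%target val: {df_val[target].mean():.4f}')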

label: preprocess_v1_over50_train.csv      	nrows: 80224 	%target train: 0.5000 	%target val: 0.0852
label: preprocess_v1_smote20_train.csv     	nrows: 48134 	%target train: 0.1667 	%target val: 0.0852
label: preprocess_v1_smote50_train.csv     	nrows: 80224 	%target train: 0.5000 	%target val: 0.0852
label: preprocess_v1_smoteTomek20_train.csv 	nrows: 46412 	%target train: 0.1543 	%target val: 0.0852
label: preprocess_v1_smoteTomek50_train.csv 	nrows: 79638 	%target train: 0.5000 	%target val: 0.0852
label: preprocess_v1_train.csv             	nrows: 43846 	%target train: 0.0852 	%target val: 0.0852
label: preprocess_v1_under50_train.csv     	nrows: 7468 	%target train: 0.5000 	%target val: 0.0852
label: preprocess_v2_over50_train.csv      	nrows: 80224 	%target train: 0.5000 	%target val: 0.0852
label: preprocess_v2_smote20_train.csv     	nrows: 48134 	%target train: 0.1667 	%target val: 0.0852
label: preprocess_v2_smote50_train.csv     	nrows: 80224 	%target train: 0.5000 	%target v
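
The row counts printed above are consistent with simple arithmetic on the class sizes. The sketch below is a back-of-the-envelope check, assuming the class counts implied by the output (3,734 promoted and 40,112 not promoted in `preprocess_v1_train.csv`) and assuming that the `20` suffix means a minority-to-majority ratio of 0.2. Neither assumption is confirmed here, since the preprocessing code is not part of this notebook.

```python
# Class counts inferred from the printed output (an assumption, not read from the files):
# under50 has 7468 rows at a 50% target rate, so half of them are promoted employees.
n_minor = 7468 // 2          # 3734 promoted
n_major = 43846 - n_minor    # 40112 not promoted

base_rate = n_minor / (n_minor + n_major)

# "50" variants: both classes end up the same size.
over50_rows = 2 * n_major    # minority grown to the majority size
under50_rows = 2 * n_minor   # majority cut down to the minority size

# "20" variant, read here as minority = 0.2 * majority (hypothetical interpretation).
smote20_minor = int(0.2 * n_major)
smote20_rows = n_major + smote20_minor
smote20_rate = smote20_minor / smote20_rows

print(f'base rate:    {base_rate:.4f}')
print(f'over50 rows:  {over50_rows}')
print(f'under50 rows: {under50_rows}')
print(f'smote20 rows: {smote20_rows}  target rate: {smote20_rate:.4f}')
```

Under that reading, a 0.2 ratio yields a target rate of about 0.1667 rather than 0.20, which agrees with the `smote20` line in the output. The `smoteTomek` variants have slightly fewer rows, as would be expected if some samples are removed after oversampling.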

In [13]:
train_file = 'preprocess_v1_train.csv'
val_file = 'preprocess_v1_val.csv'

In [14]:
from sklearn.model_selection import ParameterGrid

In [15]:
cv_grid = {'objective': ['reg:logistic'], 
           'eval_metric': ['auc'],
           'seed': [seed],
           'max_depth': [3, 6, 9],
           'min_child_weight': [5, 10, 25, 50],
           'gamma': [0, 2.5, 5]}

params_grid = list(ParameterGrid(cv_grid))
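
`ParameterGrid` expands the dictionary into every combination of the listed values. The sketch below reproduces that expansion with `itertools.product` as a stand-in (so it runs without scikit-learn) to show where the 36 configurations come from: 3 depths times 4 child weights times 3 gamma values, with the remaining keys fixed.

```python
from itertools import product

seed = 2020
cv_grid = {'objective': ['reg:logistic'],
           'eval_metric': ['auc'],
           'seed': [seed],
           'max_depth': [3, 6, 9],
           'min_child_weight': [5, 10, 25, 50],
           'gamma': [0, 2.5, 5]}

# Cartesian product over the value lists, with keys in sorted order.
keys = sorted(cv_grid)
combos = [dict(zip(keys, values))
          for values in product(*(cv_grid[k] for k in keys))]

print(len(combos))
print(combos[0])
```

Every configuration is then trained once per training file, so the total number of fits is 36 times the number of files in `preproc_train`.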

In [16]:
df_results = pd.DataFrame(columns = ['preproc_label', 'model_label', 'método', 'parámetros', 'columnas_out',
                                     'auc_train', 'auc_val', 'threshold','f1_train', 'f1_val'])


for xgb_params in tqdm(params_grid):
    
    for train_file in sorted(preproc_train):

        preproc_label = train_file.split('_train')[0]

        print('----------------------------------------------------------------------')
        print(xgb_params)
        print(train_file)
        print('----------------------------------------------------------------------')

        df_train = pd.read_csv(f'{PROCESSED}/{train_file}', compression = 'zip')
        df_val = pd.read_csv(f'{PROCESSED}/{preproc_val[0]}', compression = 'zip')

        X_train, y_train = df_train.drop(target, axis = 1), df_train[target]
        X_val, y_val = df_val.drop(target, axis = 1), df_val[target]

        dtrain =  xgb.DMatrix(data=X_train, label = y_train)
        dval  = xgb.DMatrix(data=X_val,   label = y_val)

        watch_list  = [(dtrain,'train'),(dval,'val')]

        xgb_fit = xgb.train(params = xgb_params, dtrain = dtrain, 
                            num_boost_round = 1000, early_stopping_rounds = 100, 
                            evals = watch_list, verbose_eval=False)
        
        xgb_params_export = xgb_params.copy()
        xgb_params_export.update(xgb_fit.attributes())

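        # Score with the trees kept by early stopping. best_iteration is 0-based, so
        # ntree_limit = best_iteration uses one tree fewer than the best round
        # (in xgboost releases that accept ntree_limit, best_ntree_limit includes it).
        probs_train = xgb_fit.predict(dtrain, ntree_limit = xgb_fit.best_iteration)
        probs_val = xgb_fit.predict(dval, ntree_limit = xgb_fit.best_iteration)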

        auc_train = roc_auc_score(y_train, probs_train)
        auc_val = roc_auc_score(y_val, probs_val)

        # Best threshold: compute F1 at every cut-off returned by
        # precision_recall_curve on the training set and keep the maximum
        prec, recall, threshold = precision_recall_curve(y_train, probs_train)
        prec_recall = pd.DataFrame({'prec': prec[:-1], 'recall': recall[:-1], 'threshold': threshold})
        prec_recall['f1'] = 2*prec_recall['prec']*prec_recall['recall'] / (prec_recall['prec'] + prec_recall['recall'])
        prec_recall = prec_recall.sort_values(by = 'f1', ascending = False).head(1)

        # F1 scores: the training F1 is read from the table above; the same
        # threshold is then applied to the validation probabilities
        best_threshold = prec_recall['threshold'].values[0]
        f1_train = prec_recall['f1'].values[0]

        labels_val = np.where(probs_val >= best_threshold, 1, 0)
        f1_val = f1_score(y_val, labels_val)

        print(f'auc_train: {auc_train:.6f} \tauc_val: {auc_val:.6f} \tf1_train: {f1_train:.6f} \tf1_val: {f1_val:.6f}')

        results = [preproc_label, 'xgboost_baseline', 'fit', xgb_params, '',
                  auc_train, auc_val, best_threshold, f1_train, f1_val]


        df_results.loc[len(df_results)] = results
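
The threshold step in the loop above can be illustrated in isolation. This is a minimal pure-Python sketch on made-up scores (the `labels` and `probs` values and the `f1_at` helper are invented for illustration): it tries each distinct score as a cut-off, computes F1, and keeps the best one, which is the same idea as sorting the `precision_recall_curve` table by F1.

```python
def f1_at(threshold, labels, probs):
    """F1 of the rule `prob >= threshold` against binary labels."""
    preds = [int(p >= threshold) for p in probs]
    tp = sum(t == 1 and p == 1 for t, p in zip(labels, preds))
    fp = sum(t == 0 and p == 1 for t, p in zip(labels, preds))
    fn = sum(t == 1 and p == 0 for t, p in zip(labels, preds))
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)

# Toy data, invented for illustration only.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
probs = [0.05, 0.20, 0.35, 0.40, 0.60, 0.70, 0.75, 0.90]

# Every distinct score is a candidate cut-off.
candidates = sorted(set(probs))
best_threshold = max(candidates, key=lambda t: f1_at(t, labels, probs))
best_f1 = f1_at(best_threshold, labels, probs)

print(f'best threshold: {best_threshold}  f1: {best_f1:.2f}')
```

In the notebook the threshold is chosen on the training set and then applied to the validation set. When the training set has been resampled to a different class balance, a threshold tuned there may not transfer well, which is one possible reading of the gap between `f1_train` and `f1_val` in the log below.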

  0%|          | 0/36 [00:00<?, ?it/s]

----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 3, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.925740 	auc_val: 0.912398 	f1_train: 0.857260 	f1_val: 0.387935
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 3, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.963750 	auc_val: 0.902189 	f1_train: 0.809268 	f1_val: 0.303494
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 3, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote50_train.csv
---------------------------------------------

  3%|▎         | 1/36 [02:17<1:20:16, 137.61s/it]

auc_train: 0.940279 	auc_val: 0.910838 	f1_train: 0.867863 	f1_val: 0.389914
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 3, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.924304 	auc_val: 0.911213 	f1_train: 0.857913 	f1_val: 0.381606
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 3, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.965741 	auc_val: 0.905190 	f1_train: 0.813068 	f1_val: 0.329000
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 3, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 202

  6%|▌         | 2/36 [04:27<1:16:42, 135.38s/it]

auc_train: 0.930584 	auc_val: 0.908630 	f1_train: 0.858411 	f1_val: 0.381143
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 3, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.939701 	auc_val: 0.911976 	f1_train: 0.873549 	f1_val: 0.416644
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 3, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.961808 	auc_val: 0.904120 	f1_train: 0.798500 	f1_val: 0.330255
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 3, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 202

  8%|▊         | 3/36 [07:04<1:17:56, 141.70s/it]

auc_train: 0.935747 	auc_val: 0.908802 	f1_train: 0.863264 	f1_val: 0.375027
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 3, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.926849 	auc_val: 0.911873 	f1_train: 0.857002 	f1_val: 0.377475
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 3, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.969674 	auc_val: 0.902300 	f1_train: 0.824938 	f1_val: 0.342824
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 3, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 202

 11%|█         | 4/36 [10:42<1:27:53, 164.78s/it]

auc_train: 0.920961 	auc_val: 0.902479 	f1_train: 0.851376 	f1_val: 0.368421
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 6, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.943284 	auc_val: 0.911133 	f1_train: 0.875986 	f1_val: 0.409387
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 6, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.971265 	auc_val: 0.905146 	f1_train: 0.827900 	f1_val: 0.346206
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 6, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}


 14%|█▍        | 5/36 [12:33<1:16:41, 148.44s/it]

auc_train: 0.945743 	auc_val: 0.911164 	f1_train: 0.870445 	f1_val: 0.393272
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 6, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.945900 	auc_val: 0.911192 	f1_train: 0.879317 	f1_val: 0.413593
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 6, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.958415 	auc_val: 0.900162 	f1_train: 0.782801 	f1_val: 0.345929
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 6, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 202

 17%|█▋        | 6/36 [14:31<1:09:37, 139.26s/it]

auc_train: 0.954089 	auc_val: 0.907628 	f1_train: 0.882825 	f1_val: 0.386851
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 6, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.933648 	auc_val: 0.911449 	f1_train: 0.861616 	f1_val: 0.383096
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 6, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.980915 	auc_val: 0.898024 	f1_train: 0.854684 	f1_val: 0.359514
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 6, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 202

 19%|█▉        | 7/36 [16:57<1:08:23, 141.49s/it]

auc_train: 0.943449 	auc_val: 0.908770 	f1_train: 0.871427 	f1_val: 0.388980
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 6, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.937447 	auc_val: 0.910662 	f1_train: 0.867966 	f1_val: 0.395086
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 6, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.978644 	auc_val: 0.899807 	f1_train: 0.847117 	f1_val: 0.355870
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 6, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 202

 22%|██▏       | 8/36 [19:35<1:08:17, 146.36s/it]

auc_train: 0.932572 	auc_val: 0.901672 	f1_train: 0.858420 	f1_val: 0.370747
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 9, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.950668 	auc_val: 0.909227 	f1_train: 0.884852 	f1_val: 0.403357
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 9, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.990435 	auc_val: 0.890330 	f1_train: 0.897156 	f1_val: 0.351601
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 9, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}


 25%|██▌       | 9/36 [21:53<1:04:41, 143.75s/it]

auc_train: 0.948022 	auc_val: 0.908343 	f1_train: 0.870862 	f1_val: 0.387769
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 9, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.942642 	auc_val: 0.910050 	f1_train: 0.874297 	f1_val: 0.406859
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 9, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.971652 	auc_val: 0.886409 	f1_train: 0.825036 	f1_val: 0.348517
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 9, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 202

 28%|██▊       | 10/36 [23:52<59:04, 136.33s/it] 

auc_train: 0.934769 	auc_val: 0.908082 	f1_train: 0.859467 	f1_val: 0.375383
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 9, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.953262 	auc_val: 0.912526 	f1_train: 0.884963 	f1_val: 0.426127
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 9, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.989383 	auc_val: 0.896565 	f1_train: 0.891730 	f1_val: 0.356130
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 9, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 202

 31%|███       | 11/36 [26:20<58:20, 140.00s/it]

auc_train: 0.946205 	auc_val: 0.907410 	f1_train: 0.874703 	f1_val: 0.383467
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 9, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.948029 	auc_val: 0.912410 	f1_train: 0.878975 	f1_val: 0.420796
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 9, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.985916 	auc_val: 0.894252 	f1_train: 0.875915 	f1_val: 0.361451
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 0, 'max_depth': 9, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 202

 33%|███▎      | 12/36 [29:41<1:03:15, 158.14s/it]

auc_train: 0.933717 	auc_val: 0.901006 	f1_train: 0.862570 	f1_val: 0.368860
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 3, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.924907 	auc_val: 0.912159 	f1_train: 0.855979 	f1_val: 0.388067
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 3, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.961095 	auc_val: 0.900591 	f1_train: 0.801961 	f1_val: 0.333210
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 3, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 

 36%|███▌      | 13/36 [31:08<52:25, 136.77s/it]  

auc_train: 0.909875 	auc_val: 0.901553 	f1_train: 0.844214 	f1_val: 0.350532
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 3, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.929314 	auc_val: 0.910556 	f1_train: 0.861518 	f1_val: 0.393983
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 3, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.960777 	auc_val: 0.907762 	f1_train: 0.797869 	f1_val: 0.335196
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 3, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed

 39%|███▉      | 14/36 [32:35<44:40, 121.83s/it]

auc_train: 0.908027 	auc_val: 0.899835 	f1_train: 0.841957 	f1_val: 0.349423
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 3, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.924630 	auc_val: 0.912806 	f1_train: 0.855822 	f1_val: 0.383580
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 3, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.958426 	auc_val: 0.903851 	f1_train: 0.788577 	f1_val: 0.337509
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 3, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed

 42%|████▏     | 15/36 [34:01<38:56, 111.26s/it]

auc_train: 0.903066 	auc_val: 0.897700 	f1_train: 0.837617 	f1_val: 0.334386
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 3, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.927376 	auc_val: 0.911360 	f1_train: 0.857124 	f1_val: 0.380344
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 3, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.958253 	auc_val: 0.898067 	f1_train: 0.789210 	f1_val: 0.340232
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 3, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed

 44%|████▍     | 16/36 [35:37<35:34, 106.70s/it]

auc_train: 0.892192 	auc_val: 0.891731 	f1_train: 0.829172 	f1_val: 0.330554
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 6, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.945760 	auc_val: 0.911259 	f1_train: 0.879516 	f1_val: 0.399699
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 6, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.974018 	auc_val: 0.900525 	f1_train: 0.833695 	f1_val: 0.348588
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 6, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 

 47%|████▋     | 17/36 [37:19<33:19, 105.24s/it]

auc_train: 0.945610 	auc_val: 0.911305 	f1_train: 0.873014 	f1_val: 0.377002
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 6, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.931837 	auc_val: 0.910440 	f1_train: 0.863519 	f1_val: 0.384417
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 6, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.969668 	auc_val: 0.904362 	f1_train: 0.824351 	f1_val: 0.349941
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 6, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed

 50%|█████     | 18/36 [39:01<31:14, 104.12s/it]

auc_train: 0.933517 	auc_val: 0.906733 	f1_train: 0.858701 	f1_val: 0.377460
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 6, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.934392 	auc_val: 0.913633 	f1_train: 0.863379 	f1_val: 0.392649
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 6, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.965296 	auc_val: 0.893406 	f1_train: 0.809205 	f1_val: 0.359385
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 6, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed

 53%|█████▎    | 19/36 [40:40<29:06, 102.73s/it]

auc_train: 0.923381 	auc_val: 0.905472 	f1_train: 0.851358 	f1_val: 0.374282
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 6, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.937907 	auc_val: 0.911913 	f1_train: 0.867908 	f1_val: 0.401104
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 6, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.966649 	auc_val: 0.894645 	f1_train: 0.815445 	f1_val: 0.361129
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 6, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed

 56%|█████▌    | 20/36 [42:39<28:42, 107.64s/it]

auc_train: 0.909713 	auc_val: 0.897396 	f1_train: 0.842156 	f1_val: 0.351444
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 9, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.951977 	auc_val: 0.912966 	f1_train: 0.886152 	f1_val: 0.425020
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 9, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.980049 	auc_val: 0.890587 	f1_train: 0.851981 	f1_val: 0.341928
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 9, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 

 58%|█████▊    | 21/36 [44:36<27:34, 110.27s/it]

auc_train: 0.952413 	auc_val: 0.910498 	f1_train: 0.879212 	f1_val: 0.392756
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 9, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.955849 	auc_val: 0.909336 	f1_train: 0.891379 	f1_val: 0.411190
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 9, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.969953 	auc_val: 0.894114 	f1_train: 0.822198 	f1_val: 0.362486
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 9, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed

 61%|██████    | 22/36 [46:31<26:06, 111.90s/it]

auc_train: 0.929102 	auc_val: 0.906534 	f1_train: 0.853790 	f1_val: 0.371601
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 9, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.940286 	auc_val: 0.911935 	f1_train: 0.869263 	f1_val: 0.408269
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 9, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.971107 	auc_val: 0.894725 	f1_train: 0.823375 	f1_val: 0.350039
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 9, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed

 64%|██████▍   | 23/36 [48:33<24:52, 114.80s/it]

auc_train: 0.930164 	auc_val: 0.905762 	f1_train: 0.855135 	f1_val: 0.373670
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 9, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.941658 	auc_val: 0.912404 	f1_train: 0.870914 	f1_val: 0.405133
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 9, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.968332 	auc_val: 0.891090 	f1_train: 0.819398 	f1_val: 0.359369
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 2.5, 'max_depth': 9, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed

 67%|██████▋   | 24/36 [50:43<23:53, 119.48s/it]

auc_train: 0.908350 	auc_val: 0.897352 	f1_train: 0.841029 	f1_val: 0.352247
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 3, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.929197 	auc_val: 0.911995 	f1_train: 0.860944 	f1_val: 0.389744
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 3, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.957650 	auc_val: 0.894663 	f1_train: 0.786046 	f1_val: 0.325565
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 3, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}


 69%|██████▉   | 25/36 [51:57<19:23, 105.76s/it]

auc_train: 0.904072 	auc_val: 0.899766 	f1_train: 0.840354 	f1_val: 0.350926
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 3, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.927413 	auc_val: 0.912038 	f1_train: 0.860305 	f1_val: 0.383481
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 3, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.956670 	auc_val: 0.902937 	f1_train: 0.783078 	f1_val: 0.317505
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 3, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 202

 72%|███████▏  | 26/36 [53:13<16:08, 96.86s/it] 

auc_train: 0.899024 	auc_val: 0.894211 	f1_train: 0.836656 	f1_val: 0.342135
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 3, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.930271 	auc_val: 0.911999 	f1_train: 0.860136 	f1_val: 0.398439
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 3, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.954828 	auc_val: 0.903155 	f1_train: 0.777327 	f1_val: 0.340979
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 3, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 202

 75%|███████▌  | 27/36 [54:30<13:37, 90.85s/it]

auc_train: 0.899890 	auc_val: 0.894652 	f1_train: 0.835645 	f1_val: 0.333023
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 3, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.928097 	auc_val: 0.911094 	f1_train: 0.858268 	f1_val: 0.382063
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 3, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.953499 	auc_val: 0.890413 	f1_train: 0.773590 	f1_val: 0.333272
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 3, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 202

 78%|███████▊  | 28/36 [55:48<11:36, 87.06s/it]

auc_train: 0.885124 	auc_val: 0.883581 	f1_train: 0.827586 	f1_val: 0.322232
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 6, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.940442 	auc_val: 0.911731 	f1_train: 0.873782 	f1_val: 0.398811
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 6, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.945058 	auc_val: 0.892256 	f1_train: 0.749678 	f1_val: 0.348153
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 6, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}


 81%|████████  | 29/36 [57:17<10:14, 87.76s/it]

auc_train: 0.926428 	auc_val: 0.909658 	f1_train: 0.854454 	f1_val: 0.373665
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 6, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.934991 	auc_val: 0.912196 	f1_train: 0.865403 	f1_val: 0.392194
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 6, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.961237 	auc_val: 0.904132 	f1_train: 0.796927 	f1_val: 0.352166
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 6, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 202

 83%|████████▎ | 30/36 [58:45<08:46, 87.81s/it]

auc_train: 0.916945 	auc_val: 0.903502 	f1_train: 0.843746 	f1_val: 0.357231
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 6, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.934280 	auc_val: 0.912253 	f1_train: 0.863448 	f1_val: 0.400000
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 6, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.953261 	auc_val: 0.894359 	f1_train: 0.773387 	f1_val: 0.352302
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 6, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 202

 86%|████████▌ | 31/36 [1:00:12<07:17, 87.59s/it]

auc_train: 0.912385 	auc_val: 0.901814 	f1_train: 0.842939 	f1_val: 0.359867
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 6, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.940406 	auc_val: 0.912093 	f1_train: 0.869288 	f1_val: 0.403933
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 6, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.960413 	auc_val: 0.887951 	f1_train: 0.793084 	f1_val: 0.358609
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 6, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 202

 89%|████████▉ | 32/36 [1:01:49<06:00, 90.17s/it]

auc_train: 0.893577 	auc_val: 0.891354 	f1_train: 0.833488 	f1_val: 0.336293
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 9, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.958815 	auc_val: 0.910388 	f1_train: 0.895692 	f1_val: 0.430229
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 9, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.971711 	auc_val: 0.889719 	f1_train: 0.830281 	f1_val: 0.326814
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 9, 'min_child_weight': 5, 'objective': 'reg:logistic', 'seed': 2020}


 92%|█████████▏| 33/36 [1:03:44<04:53, 97.70s/it]

auc_train: 0.922846 	auc_val: 0.907806 	f1_train: 0.848900 	f1_val: 0.362725
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 9, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.942597 	auc_val: 0.910620 	f1_train: 0.873800 	f1_val: 0.402681
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 9, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.967670 	auc_val: 0.893509 	f1_train: 0.819087 	f1_val: 0.369059
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 9, 'min_child_weight': 10, 'objective': 'reg:logistic', 'seed': 202

 94%|█████████▍| 34/36 [1:05:38<03:24, 102.49s/it]

auc_train: 0.921712 	auc_val: 0.905125 	f1_train: 0.848180 	f1_val: 0.373444
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 9, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.940970 	auc_val: 0.911430 	f1_train: 0.867336 	f1_val: 0.404523
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 9, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.965295 	auc_val: 0.900184 	f1_train: 0.807353 	f1_val: 0.354293
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 9, 'min_child_weight': 25, 'objective': 'reg:logistic', 'seed': 202

 97%|█████████▋| 35/36 [1:07:29<01:45, 105.31s/it]

auc_train: 0.915384 	auc_val: 0.903283 	f1_train: 0.844309 	f1_val: 0.365591
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 9, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.939347 	auc_val: 0.911762 	f1_train: 0.869260 	f1_val: 0.416577
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 9, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 2020}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.961805 	auc_val: 0.885820 	f1_train: 0.800281 	f1_val: 0.360307
----------------------------------------------------------------------
{'eval_metric': 'auc', 'gamma': 5, 'max_depth': 9, 'min_child_weight': 50, 'objective': 'reg:logistic', 'seed': 202

100%|██████████| 36/36 [1:09:28<00:00, 115.78s/it]

auc_train: 0.899683 	auc_val: 0.893918 	f1_train: 0.835035 	f1_val: 0.337800





In [19]:
df_results

| Unnamed: 0 | preproc_label | model_label | método | parámetros | columnas_out | auc_train | auc_val | threshold | f1_train | f1_val |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | preprocess_v1_over50 | xgboost_baseline | fit | `{'eval_metric': 'auc', 'gamma': 0, 'max_depth'...` | | 0.925740 | 0.912398 | 0.519916 | 0.857260 | 0.387935 |
| 1 | preprocess_v1_smote20 | xgboost_baseline | fit | `{'eval_metric': 'auc', 'gamma': 0, 'max_depth'...` | | 0.963750 | 0.902189 | 0.363386 | 0.809268 | 0.303494 |
| 2 | preprocess_v1_smote50 | xgboost_baseline | fit | `{'eval_metric': 'auc', 'gamma': 0, 'max_depth'...` | | 0.994993 | 0.903298 | 0.397081 | 0.970218 | 0.316904 |
| 3 | preprocess_v1_smoteTomek20 | xgboost_baseline | fit | `{'eval_metric': 'auc', 'gamma': 0, 'max_depth'...` | | 0.984355 | 0.896496 | 0.335935 | 0.873732 | 0.352239 |
| 4 | preprocess_v1_smoteTomek50 | xgboost_baseline | fit | `{'eval_metric': 'auc', 'gamma': 0, 'max_depth'...` | | 0.993239 | 0.902251 | 0.425842 | 0.967290 | 0.307300 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 499 | preprocess_v2_smote50 | xgboost_baseline | fit | `{'eval_metric': 'auc', 'gamma': 5, 'max_depth'...` | | 0.993078 | 0.909270 | 0.449836 | 0.966119 | 0.519499 |
| 500 | preprocess_v2_smoteTomek20 | xgboost_baseline | fit | `{'eval_metric': 'auc', 'gamma': 5, 'max_depth'...` | | 0.972369 | 0.909576 | 0.357857 | 0.834015 | 0.543329 |
| 501 | preprocess_v2_smoteTomek50 | xgboost_baseline | fit | `{'eval_metric': 'auc', 'gamma': 5, 'max_depth'...` | | 0.993535 | 0.909469 | 0.429405 | 0.966711 | 0.522215 |
| 502 | preprocess_v2 | xgboost_baseline | fit | `{'eval_metric': 'auc', 'gamma': 5, 'max_depth'...` | | 0.917067 | 0.909057 | 0.256593 | 0.542299 | 0.531928 |
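With roughly 500 result rows, it can help to rank the configurations rather than read them one by one. The following is a minimal sketch, not the notebook's own code: it rebuilds a small frame from a few rows of the preview above (the names `preview`, `ranked` and the `f1_gap` column are illustrative) and sorts by validation F1, adding the train/validation F1 gap as a rough overfitting indicator.

```python
import pandas as pd

# A few rows copied from the df_results preview; only the columns used here.
rows = [
    ("preprocess_v1_over50", 0.925740, 0.912398, 0.857260, 0.387935),
    ("preprocess_v1_smote20", 0.963750, 0.902189, 0.809268, 0.303494),
    ("preprocess_v1_smote50", 0.994993, 0.903298, 0.970218, 0.316904),
    ("preprocess_v2_smoteTomek20", 0.972369, 0.909576, 0.834015, 0.543329),
    ("preprocess_v2", 0.917067, 0.909057, 0.542299, 0.531928),
]
cols = ["preproc_label", "auc_train", "auc_val", "f1_train", "f1_val"]
preview = pd.DataFrame(rows, columns=cols)

# Hypothetical helper column: a large train/validation F1 gap hints at overfitting.
preview["f1_gap"] = preview["f1_train"] - preview["f1_val"]

# Rank by validation F1, best first.
ranked = preview.sort_values("f1_val", ascending=False).reset_index(drop=True)
print(ranked[["preproc_label", "auc_val", "f1_val", "f1_gap"]])
```

On the full table the same two lines (`f1_gap` and `sort_values`) could be applied to `df_results` directly. Among the rows shown here, the resampled variants have a much larger train/validation F1 gap than plain `preprocess_v2`, while validation AUC stays within about 0.01 across all of them.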


In [21]:
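# Output directory for model results; DATA is assumed to be a pathlib.Path defined earlier in the notebook
MODELS = DATA / 'models'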

In [22]:
# Save the results table without the DataFrame index
df_results.to_csv(MODELS / 'xgboost_baseline.csv', index=False)

In [20]:
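# Also save the full results table to the working directory
# (no index=False here, so the index is written as an extra unnamed first column)
df_results.to_csv('resultados_xgboost_total.csv')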
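The `Unnamed: 0` column in the `df_results` preview is what pandas typically produces when a CSV written with its index is read back with default settings. The sketch below reproduces that round trip on a toy frame, using an in-memory buffer instead of a file; the frame and variable names are illustrative, not taken from the notebook.

```python
import io
import pandas as pd

# Toy frame standing in for df_results.
df = pd.DataFrame({"auc_val": [0.912398, 0.902189]})

# Default to_csv writes the index as a first column with an empty header,
# which read_csv then names "Unnamed: 0".
with_index = df.to_csv()
reloaded = pd.read_csv(io.StringIO(with_index))
print(list(reloaded.columns))

# Writing with index=False keeps the extra column out of the file.
no_index = df.to_csv(index=False)
clean = pd.read_csv(io.StringIO(no_index))
print(list(clean.columns))
```

If a file has already been written with the index, passing `index_col=0` to `pd.read_csv` restores it as the index instead of a data column.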