# HR Analytics

<img src = 'https://datahack-prod.s3.ap-south-1.amazonaws.com/__sized__/contest_cover/hr_1920x480_s5WuoZs-thumbnail-1200x1200-90.jpg'>

Practice Problem: https://datahack.analyticsvidhya.com/contest/wns-analytics-hackathon-2018-1/

## HR Analytics

HR analytics is revolutionising the way human resources departments operate, leading to higher efficiency and better results overall. Human resources has been using analytics for years. However, the collection, processing and analysis of data has been largely manual, and given the nature of human resources dynamics and HR KPIs, the approach has been constraining HR. Therefore, it is surprising that HR departments woke up to the utility of machine learning so late in the game. Here is an opportunity to try predictive analytics in identifying the employees most likely to get promoted.

## Problem Statement

Your client is a large MNC and they have 9 broad verticals across the organisation. One of the problem your client is facing is around identifying the right people for promotion *(only for manager position and below)* and prepare them in time. Currently the process, they are following is:

* They first identify a set of employees based on recommendations/ past performance
* Selected employees go through the separate training and evaluation program for each vertical. These programs are based on the required skill of each vertical
* At the end of the program, based on various factors such as training performance, KPI completion (only employees with KPIs completed greater than 60% are considered) etc., employee gets promotion

For above mentioned process, the final promotions are only announced after the evaluation and this leads to delay in transition to their new roles. Hence, company needs your help in identifying the eligible candidates at a particular checkpoint so that they can expedite the entire promotion cycle. 

<img src = 'https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2018/09/wns_hack_im_1.jpg'>

They have provided multiple attributes around Employee's past and current performance along with demographics. Now, The task is to predict whether a potential promotee at checkpoint in the test set will be promoted or not after the evaluation process.

## Evaluation Metric

The evaluation metric for this competition is F1 Score.

## Public and Private Split

Test data is further randomly divided into Public (40%) and Private (60%) data.

Your initial responses will be checked and scored on the Public data.
The final rankings would be based on your private score which will be published once the competition is over.

## Entorno

In [1]:
import sys
sys.version

'3.7.9 (default, Aug 31 2020, 17:10:11) [MSC v.1916 64 bit (AMD64)]'

In [2]:
!conda info --envs

# conda environments:
#
micromaster              /Users/manuel/.conda/envs/micromaster
                         /Users/manuel/.julia/conda/3
base                  *  /Users/manuel/opt/anaconda3
belcorp                  /Users/manuel/opt/anaconda3/envs/belcorp
courseragcp              /Users/manuel/opt/anaconda3/envs/courseragcp
iapucp                   /Users/manuel/opt/anaconda3/envs/iapucp
mitxpro                  /Users/manuel/opt/anaconda3/envs/mitxpro
style-transfer           /Users/manuel/opt/anaconda3/envs/style-transfer
taller-dmc               /Users/manuel/opt/anaconda3/envs/taller-dmc
udacity                  /Users/manuel/opt/anaconda3/envs/udacity



## Paquetes

In [33]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import os
import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm, tqdm_notebook
from pathlib import Path
import random
import warnings
import pickle

warnings.filterwarnings('ignore')


seed = 2020
random.seed(seed)

pd.set_option('display.max_columns', 1000)
pd.set_option('display.max_rows', 400)
sns.set()

DATA = Path('../../data') 
RAW  = DATA/'raw'
PROCESSED = DATA/'processed'
SUBMISSIONS = DATA/'submissions'    

MODEL = Path('../../model') 

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [34]:
pd.__version__

'1.1.3'

In [35]:
np.__version__

'1.19.2'

In [36]:
sklearn.__version__

'0.23.2'

In [37]:
id_columns = 'employee_id'
target = 'is_promoted'

## Lectura de datos

In [38]:
os.listdir(f'{PROCESSED}')

['.DS_Store',
 'preprocess_v1_capping_values.pkl',
 'preprocess_v1_impute_values.pkl',
 'preprocess_v1_ohe.pkl',
 'preprocess_v1_ohe_columns.pkl',
 'preprocess_v1_over50_train.csv',
 'preprocess_v1_scaler.pkl',
 'preprocess_v1_smote20_train.csv',
 'preprocess_v1_smote50_train.csv',
 'preprocess_v1_smoteTomek20_train.csv',
 'preprocess_v1_smoteTomek50_train.csv',
 'preprocess_v1_train.csv',
 'preprocess_v1_under50_train.csv',
 'preprocess_v1_val.csv',
 'preprocess_v2_capping_values.pkl',
 'preprocess_v2_knnimputation.pkl',
 'preprocess_v2_ohe.pkl',
 'preprocess_v2_ohe_columns.pkl',
 'preprocess_v2_over50_train.csv',
 'preprocess_v2_scaler.pkl',
 'preprocess_v2_scalerimputation.pkl',
 'preprocess_v2_smote20_train.csv',
 'preprocess_v2_smote50_train.csv',
 'preprocess_v2_smoteTomek20_train.csv',
 'preprocess_v2_smoteTomek50_train.csv',
 'preprocess_v2_test.csv',
 'preprocess_v2_train.csv',
 'preprocess_v2_under50_train.csv',
 'preprocess_v2_val.csv']

## Entrenamiento V1 sin balanceo

In [39]:
from sklearn.linear_model import LogisticRegression 
from sklearn.metrics import precision_recall_curve, roc_auc_score, f1_score

In [40]:
preproc_train = [file for file in os.listdir(f'{PROCESSED}') if file.endswith('train.csv')]
preproc_train

['preprocess_v1_over50_train.csv',
 'preprocess_v1_smote20_train.csv',
 'preprocess_v1_smote50_train.csv',
 'preprocess_v1_smoteTomek20_train.csv',
 'preprocess_v1_smoteTomek50_train.csv',
 'preprocess_v1_train.csv',
 'preprocess_v1_under50_train.csv',
 'preprocess_v2_over50_train.csv',
 'preprocess_v2_smote20_train.csv',
 'preprocess_v2_smote50_train.csv',
 'preprocess_v2_smoteTomek20_train.csv',
 'preprocess_v2_smoteTomek50_train.csv',
 'preprocess_v2_train.csv',
 'preprocess_v2_under50_train.csv']

In [41]:
preproc_val = [file for file in os.listdir(f'{PROCESSED}') if file.endswith('val.csv')]
preproc_val

['preprocess_v1_val.csv', 'preprocess_v2_val.csv']

In [42]:
for train_file in sorted(preproc_train):
    df_train = pd.read_csv(f'{PROCESSED}/{train_file}', compression = 'zip')
    df_val = pd.read_csv(f'{PROCESSED}/{preproc_val[0]}', compression = 'zip')
    
    print(f'label: {train_file:35} \tnrows: {len(df_train)} \t%target train: {df_train[target].mean():.4f} \t%target val: {df_val[target].mean():.4f}')

label: preprocess_v1_over50_train.csv      	nrows: 80224 	%target train: 0.5000 	%target val: 0.0852
label: preprocess_v1_smote20_train.csv     	nrows: 48134 	%target train: 0.1667 	%target val: 0.0852
label: preprocess_v1_smote50_train.csv     	nrows: 80224 	%target train: 0.5000 	%target val: 0.0852
label: preprocess_v1_smoteTomek20_train.csv 	nrows: 46412 	%target train: 0.1543 	%target val: 0.0852
label: preprocess_v1_smoteTomek50_train.csv 	nrows: 79638 	%target train: 0.5000 	%target val: 0.0852
label: preprocess_v1_train.csv             	nrows: 43846 	%target train: 0.0852 	%target val: 0.0852
label: preprocess_v1_under50_train.csv     	nrows: 7468 	%target train: 0.5000 	%target val: 0.0852
label: preprocess_v2_over50_train.csv      	nrows: 80224 	%target train: 0.5000 	%target val: 0.0852
label: preprocess_v2_smote20_train.csv     	nrows: 48134 	%target train: 0.1667 	%target val: 0.0852
label: preprocess_v2_smote50_train.csv     	nrows: 80224 	%target train: 0.5000 	%target v

In [44]:
train_file = 'preprocess_v1_train.csv'
val_file = 'preprocess_v1_val.csv'


In [45]:
from sklearn.model_selection import ParameterGrid

In [47]:
cv_grid = {'penalty': ['l1','l2'],'solver': ['liblinear','saga'],
              'C': [0.001,0.01,0.1,1,10,100,1000],
            'random_state': [seed]}

params_grid = list(ParameterGrid(cv_grid))

In [48]:
df_results = pd.DataFrame(columns = ['preproc_label', 'model_label', 'método', 'parámetros', 'columnas_out',
                                     'auc_train', 'auc_val', 'threshold','f1_train', 'f1_val'])


for xgb_params in tqdm(params_grid):
    
    for train_file in sorted(preproc_train):

        preproc_label = train_file.split('_train')[0]

        print('----------------------------------------------------------------------')
        print(xgb_params)
        print(train_file)
        print('----------------------------------------------------------------------')

        df_train = pd.read_csv(f'{PROCESSED}/{train_file}', compression = 'zip')
        df_val = pd.read_csv(f'{PROCESSED}/{preproc_val[0]}', compression = 'zip')

        X_train, y_train = df_train.drop(target, axis = 1), df_train[target]
        X_val, y_val = df_val.drop(target, axis = 1), df_val[target]

        logi = LogisticRegression(solver = xgb_params["solver"],penalty =xgb_params["penalty"], C =xgb_params["C"], random_state= xgb_params["random_state"])
        logi_fit = logi.fit( X_train,y_train )
                        
        
        #xgb_params_export = xgb_params.copy()
        #xgb_params_export.update(logi_fit.attributes())

        probs_train = logi_fit.predict(X_train)
        probs_val = logi_fit.predict(X_val)

        auc_train = roc_auc_score(y_train, probs_train)
        auc_val = roc_auc_score(y_val, probs_val)

        #best threshold
        prec, recall, threshold = precision_recall_curve(y_train, probs_train)
        prec_recall = pd.DataFrame({'prec': prec[:-1], 'recall': recall[:-1], 'threshold': threshold})
        prec_recall['f1'] = 2*prec_recall['prec']*prec_recall['recall'] / (prec_recall['prec'] + prec_recall['recall'])
        prec_recall = prec_recall.sort_values(by = 'f1', ascending = False).head(1)

        #f1 scores
        best_threshold = prec_recall['threshold'].values[0]
        f1_train = prec_recall['f1'].values[0]

        labels_val = np.where(probs_val >= best_threshold, 1, 0)
        f1_val = f1_score(y_val, labels_val)

        print(f'auc_train: {auc_train:.6f} \tauc_val: {auc_val:.6f} \tf1_train: {f1_train:.6f} \tf1_val: {f1_val:.6f}')

        results = [preproc_label, 'LogisticRegression', 'fit', xgb_params, '',
                  auc_train, auc_val, best_threshold, f1_train, f1_val]


        df_results.loc[len(df_results)] = results

  0%|          | 0/28 [00:00<?, ?it/s]

----------------------------------------------------------------------
{'C': 0.001, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.772961 	auc_val: 0.779456 	f1_train: 0.784061 	f1_val: 0.346537
----------------------------------------------------------------------
{'C': 0.001, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.543655 	auc_val: 0.551785 	f1_train: 0.285704 	f1_val: 0.157028
----------------------------------------------------------------------
{'C': 0.001, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.788791 	auc_val: 0.781288 	f1_train: 0.802107 	f1_val: 0.348910
-----------------------

  4%|▎         | 1/28 [00:18<08:23, 18.66s/it]

auc_train: 0.713846 	auc_val: 0.712636 	f1_train: 0.723866 	f1_val: 0.286592
----------------------------------------------------------------------
{'C': 0.001, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.766117 	auc_val: 0.777858 	f1_train: 0.773844 	f1_val: 0.351086
----------------------------------------------------------------------
{'C': 0.001, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.540414 	auc_val: 0.548337 	f1_train: 0.285704 	f1_val: 0.157028
----------------------------------------------------------------------
{'C': 0.001, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.782235 	auc_val: 0.772940

  7%|▋         | 2/28 [00:40<08:33, 19.74s/it]

auc_train: 0.713444 	auc_val: 0.711941 	f1_train: 0.722726 	f1_val: 0.286892
----------------------------------------------------------------------
{'C': 0.001, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.782297 	auc_val: 0.785752 	f1_train: 0.792264 	f1_val: 0.356996
----------------------------------------------------------------------
{'C': 0.001, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.591449 	auc_val: 0.606785 	f1_train: 0.310626 	f1_val: 0.334129
----------------------------------------------------------------------
{'C': 0.001, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.796470 	au

 11%|█         | 3/28 [00:58<07:59, 19.19s/it]

auc_train: 0.759373 	auc_val: 0.759604 	f1_train: 0.775066 	f1_val: 0.320993
----------------------------------------------------------------------
{'C': 0.001, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.777747 	auc_val: 0.780852 	f1_train: 0.785382 	f1_val: 0.358013
----------------------------------------------------------------------
{'C': 0.001, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.584007 	auc_val: 0.600774 	f1_train: 0.289973 	f1_val: 0.321633
----------------------------------------------------------------------
{'C': 0.001, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.792182 	auc_val: 0.782511

 14%|█▍        | 4/28 [01:18<07:45, 19.40s/it]

auc_train: 0.744376 	auc_val: 0.742274 	f1_train: 0.746985 	f1_val: 0.323779
----------------------------------------------------------------------
{'C': 0.01, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791534 	auc_val: 0.797508 	f1_train: 0.798690 	f1_val: 0.379075
----------------------------------------------------------------------
{'C': 0.01, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.644304 	auc_val: 0.655805 	f1_train: 0.436617 	f1_val: 0.424201
----------------------------------------------------------------------
{'C': 0.01, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.807489 	auc_v

 18%|█▊        | 5/28 [02:11<11:17, 29.46s/it]

auc_train: 0.769282 	auc_val: 0.777467 	f1_train: 0.780649 	f1_val: 0.347144
----------------------------------------------------------------------
{'C': 0.01, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791434 	auc_val: 0.797235 	f1_train: 0.798380 	f1_val: 0.379765
----------------------------------------------------------------------
{'C': 0.01, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.644030 	auc_val: 0.655120 	f1_train: 0.436050 	f1_val: 0.422267
----------------------------------------------------------------------
{'C': 0.01, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806941 	auc_val: 0.798474 	f

 21%|██▏       | 6/28 [02:57<12:34, 34.29s/it]

auc_train: 0.760846 	auc_val: 0.770775 	f1_train: 0.768172 	f1_val: 0.346925
----------------------------------------------------------------------
{'C': 0.01, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.792032 	auc_val: 0.797122 	f1_train: 0.799259 	f1_val: 0.378958
----------------------------------------------------------------------
{'C': 0.01, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.639529 	auc_val: 0.648024 	f1_train: 0.426371 	f1_val: 0.409373
----------------------------------------------------------------------
{'C': 0.01, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.807128 	auc_v

 25%|██▌       | 7/28 [03:18<10:35, 30.24s/it]

auc_train: 0.783878 	auc_val: 0.783414 	f1_train: 0.793183 	f1_val: 0.357176
----------------------------------------------------------------------
{'C': 0.01, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791845 	auc_val: 0.796936 	f1_train: 0.798858 	f1_val: 0.379209
----------------------------------------------------------------------
{'C': 0.01, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.637485 	auc_val: 0.647003 	f1_train: 0.421871 	f1_val: 0.407459
----------------------------------------------------------------------
{'C': 0.01, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806417 	auc_val: 0.799482 	f

 29%|██▊       | 8/28 [03:41<09:24, 28.21s/it]

auc_train: 0.778388 	auc_val: 0.780120 	f1_train: 0.785594 	f1_val: 0.359375
----------------------------------------------------------------------
{'C': 0.1, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791571 	auc_val: 0.794748 	f1_train: 0.797887 	f1_val: 0.379941
----------------------------------------------------------------------
{'C': 0.1, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.670283 	auc_val: 0.682304 	f1_train: 0.485654 	f1_val: 0.454657
----------------------------------------------------------------------
{'C': 0.1, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806679 	auc_val:

 32%|███▏      | 9/28 [06:13<20:39, 65.24s/it]

auc_train: 0.791377 	auc_val: 0.798445 	f1_train: 0.798864 	f1_val: 0.382643
----------------------------------------------------------------------
{'C': 0.1, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791521 	auc_val: 0.794797 	f1_train: 0.797805 	f1_val: 0.380035
----------------------------------------------------------------------
{'C': 0.1, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.670470 	auc_val: 0.682304 	f1_train: 0.486022 	f1_val: 0.454657
----------------------------------------------------------------------
{'C': 0.1, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806691 	auc_val: 0.796199 	f1_t

 36%|███▌      | 10/28 [07:33<20:56, 69.79s/it]

auc_train: 0.791243 	auc_val: 0.797873 	f1_train: 0.798396 	f1_val: 0.382789
----------------------------------------------------------------------
{'C': 0.1, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791683 	auc_val: 0.795183 	f1_train: 0.798042 	f1_val: 0.380153
----------------------------------------------------------------------
{'C': 0.1, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.668575 	auc_val: 0.678084 	f1_train: 0.482837 	f1_val: 0.449410
----------------------------------------------------------------------
{'C': 0.1, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806803 	auc_val:

 39%|███▉      | 11/28 [07:54<15:35, 55.03s/it]

auc_train: 0.792849 	auc_val: 0.797062 	f1_train: 0.800413 	f1_val: 0.380648
----------------------------------------------------------------------
{'C': 0.1, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791646 	auc_val: 0.795383 	f1_train: 0.797984 	f1_val: 0.380529
----------------------------------------------------------------------
{'C': 0.1, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.668500 	auc_val: 0.678034 	f1_train: 0.482674 	f1_val: 0.449132
----------------------------------------------------------------------
{'C': 0.1, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806666 	auc_val: 0.797120 	f1_t

 43%|████▎     | 12/28 [08:25<12:46, 47.92s/it]

auc_train: 0.791778 	auc_val: 0.797075 	f1_train: 0.799070 	f1_val: 0.381281
----------------------------------------------------------------------
{'C': 1, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791384 	auc_val: 0.794797 	f1_train: 0.797547 	f1_val: 0.380035
----------------------------------------------------------------------
{'C': 1, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.674259 	auc_val: 0.684382 	f1_train: 0.492658 	f1_val: 0.456034
----------------------------------------------------------------------
{'C': 1, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806517 	auc_val: 0.796

 46%|████▋     | 13/28 [13:22<30:38, 122.54s/it]

auc_train: 0.789636 	auc_val: 0.799419 	f1_train: 0.795788 	f1_val: 0.385746
----------------------------------------------------------------------
{'C': 1, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791397 	auc_val: 0.794847 	f1_train: 0.797556 	f1_val: 0.380129
----------------------------------------------------------------------
{'C': 1, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.674222 	auc_val: 0.683847 	f1_train: 0.492616 	f1_val: 0.455097
----------------------------------------------------------------------
{'C': 1, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806492 	auc_val: 0.796448 	f1_train: 

 50%|█████     | 14/28 [14:00<22:43, 97.39s/it] 

auc_train: 0.789904 	auc_val: 0.799419 	f1_train: 0.796048 	f1_val: 0.385746
----------------------------------------------------------------------
{'C': 1, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791384 	auc_val: 0.794797 	f1_train: 0.797547 	f1_val: 0.380035
----------------------------------------------------------------------
{'C': 1, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.673885 	auc_val: 0.683847 	f1_train: 0.492009 	f1_val: 0.455097
----------------------------------------------------------------------
{'C': 1, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806554 	auc_val: 0.796

 54%|█████▎    | 15/28 [14:22<16:09, 74.56s/it]

auc_train: 0.789904 	auc_val: 0.799469 	f1_train: 0.796101 	f1_val: 0.385842
----------------------------------------------------------------------
{'C': 1, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791397 	auc_val: 0.794797 	f1_train: 0.797556 	f1_val: 0.380035
----------------------------------------------------------------------
{'C': 1, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.673885 	auc_val: 0.683847 	f1_train: 0.492009 	f1_val: 0.455097
----------------------------------------------------------------------
{'C': 1, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806554 	auc_val: 0.796448 	f1_train: 

 57%|█████▋    | 16/28 [14:54<12:22, 61.91s/it]

auc_train: 0.789770 	auc_val: 0.799469 	f1_train: 0.795945 	f1_val: 0.385842
----------------------------------------------------------------------
{'C': 10, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791284 	auc_val: 0.794847 	f1_train: 0.797435 	f1_val: 0.380129
----------------------------------------------------------------------
{'C': 10, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.674533 	auc_val: 0.684917 	f1_train: 0.493144 	f1_val: 0.456970
----------------------------------------------------------------------
{'C': 10, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806504 	auc_val: 0.

 61%|██████    | 17/28 [16:05<11:50, 64.62s/it]

auc_train: 0.790037 	auc_val: 0.799382 	f1_train: 0.796046 	f1_val: 0.386307
----------------------------------------------------------------------
{'C': 10, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791334 	auc_val: 0.794847 	f1_train: 0.797489 	f1_val: 0.380129
----------------------------------------------------------------------
{'C': 10, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.674484 	auc_val: 0.684917 	f1_train: 0.493063 	f1_val: 0.456970
----------------------------------------------------------------------
{'C': 10, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806529 	auc_val: 0.796498 	f1_trai

 64%|██████▍   | 18/28 [16:44<09:28, 56.89s/it]

auc_train: 0.790037 	auc_val: 0.799332 	f1_train: 0.796099 	f1_val: 0.386210
----------------------------------------------------------------------
{'C': 10, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791347 	auc_val: 0.794847 	f1_train: 0.797498 	f1_val: 0.380129
----------------------------------------------------------------------
{'C': 10, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.674484 	auc_val: 0.684917 	f1_train: 0.493063 	f1_val: 0.456970
----------------------------------------------------------------------
{'C': 10, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806529 	auc_val: 0.

 68%|██████▊   | 19/28 [17:05<06:56, 46.27s/it]

auc_train: 0.789904 	auc_val: 0.799382 	f1_train: 0.795889 	f1_val: 0.386307
----------------------------------------------------------------------
{'C': 10, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791309 	auc_val: 0.794847 	f1_train: 0.797455 	f1_val: 0.380129
----------------------------------------------------------------------
{'C': 10, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.674484 	auc_val: 0.684917 	f1_train: 0.493063 	f1_val: 0.456970
----------------------------------------------------------------------
{'C': 10, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806517 	auc_val: 0.796498 	f1_trai

 71%|███████▏  | 20/28 [17:38<05:37, 42.14s/it]

auc_train: 0.790037 	auc_val: 0.799332 	f1_train: 0.796046 	f1_val: 0.386210
----------------------------------------------------------------------
{'C': 100, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791284 	auc_val: 0.794847 	f1_train: 0.797435 	f1_val: 0.380129
----------------------------------------------------------------------
{'C': 100, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.674508 	auc_val: 0.685453 	f1_train: 0.493065 	f1_val: 0.457904
----------------------------------------------------------------------
{'C': 100, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806517 	auc_val:

 75%|███████▌  | 21/28 [18:26<05:07, 43.93s/it]

auc_train: 0.789904 	auc_val: 0.798697 	f1_train: 0.795889 	f1_val: 0.385614
----------------------------------------------------------------------
{'C': 100, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791334 	auc_val: 0.794847 	f1_train: 0.797489 	f1_val: 0.380129
----------------------------------------------------------------------
{'C': 100, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.674508 	auc_val: 0.684917 	f1_train: 0.493065 	f1_val: 0.456970
----------------------------------------------------------------------
{'C': 100, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806517 	auc_val: 0.796498 	f1_t

 79%|███████▊  | 22/28 [19:09<04:21, 43.58s/it]

auc_train: 0.790037 	auc_val: 0.799382 	f1_train: 0.796046 	f1_val: 0.386307
----------------------------------------------------------------------
{'C': 100, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791222 	auc_val: 0.794847 	f1_train: 0.797353 	f1_val: 0.380129
----------------------------------------------------------------------
{'C': 100, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.674508 	auc_val: 0.685453 	f1_train: 0.493065 	f1_val: 0.457904
----------------------------------------------------------------------
{'C': 100, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806517 	auc_val:

 82%|████████▏ | 23/28 [19:32<03:07, 37.47s/it]

auc_train: 0.789770 	auc_val: 0.799332 	f1_train: 0.795733 	f1_val: 0.386210
----------------------------------------------------------------------
{'C': 100, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791309 	auc_val: 0.794847 	f1_train: 0.797455 	f1_val: 0.380129
----------------------------------------------------------------------
{'C': 100, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.674446 	auc_val: 0.684917 	f1_train: 0.492944 	f1_val: 0.456970
----------------------------------------------------------------------
{'C': 100, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806529 	auc_val: 0.796498 	f1_t

 86%|████████▌ | 24/28 [20:07<02:26, 36.75s/it]

auc_train: 0.790037 	auc_val: 0.799382 	f1_train: 0.796046 	f1_val: 0.386307
----------------------------------------------------------------------
{'C': 1000, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791297 	auc_val: 0.794847 	f1_train: 0.797445 	f1_val: 0.380129
----------------------------------------------------------------------
{'C': 1000, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.674508 	auc_val: 0.685453 	f1_train: 0.493065 	f1_val: 0.457904
----------------------------------------------------------------------
{'C': 1000, 'penalty': 'l1', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806517 	auc_v

 89%|████████▉ | 25/28 [20:56<02:01, 40.53s/it]

auc_train: 0.789904 	auc_val: 0.798697 	f1_train: 0.795889 	f1_val: 0.385614
----------------------------------------------------------------------
{'C': 1000, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791309 	auc_val: 0.794847 	f1_train: 0.797455 	f1_val: 0.380129
----------------------------------------------------------------------
{'C': 1000, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.674508 	auc_val: 0.685453 	f1_train: 0.493065 	f1_val: 0.457904
----------------------------------------------------------------------
{'C': 1000, 'penalty': 'l1', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806517 	auc_val: 0.796498 	f

 93%|█████████▎| 26/28 [21:39<01:22, 41.22s/it]

auc_train: 0.790037 	auc_val: 0.799382 	f1_train: 0.796046 	f1_val: 0.386307
----------------------------------------------------------------------
{'C': 1000, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791235 	auc_val: 0.794847 	f1_train: 0.797362 	f1_val: 0.380129
----------------------------------------------------------------------
{'C': 1000, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.674508 	auc_val: 0.685453 	f1_train: 0.493065 	f1_val: 0.457904
----------------------------------------------------------------------
{'C': 1000, 'penalty': 'l2', 'random_state': 2020, 'solver': 'liblinear'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806517 	auc_v

 96%|█████████▋| 27/28 [22:01<00:35, 35.51s/it]

auc_train: 0.789770 	auc_val: 0.799332 	f1_train: 0.795733 	f1_val: 0.386210
----------------------------------------------------------------------
{'C': 1000, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_over50_train.csv
----------------------------------------------------------------------
auc_train: 0.791309 	auc_val: 0.794847 	f1_train: 0.797455 	f1_val: 0.380129
----------------------------------------------------------------------
{'C': 1000, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote20_train.csv
----------------------------------------------------------------------
auc_train: 0.674508 	auc_val: 0.685453 	f1_train: 0.493065 	f1_val: 0.457904
----------------------------------------------------------------------
{'C': 1000, 'penalty': 'l2', 'random_state': 2020, 'solver': 'saga'}
preprocess_v1_smote50_train.csv
----------------------------------------------------------------------
auc_train: 0.806517 	auc_val: 0.796498 	f

100%|██████████| 28/28 [22:35<00:00, 48.42s/it]

auc_train: 0.790037 	auc_val: 0.799382 	f1_train: 0.796046 	f1_val: 0.386307





In [49]:
df_results

Unnamed: 0,preproc_label,model_label,método,parámetros,columnas_out,auc_train,auc_val,threshold,f1_train,f1_val
0,preprocess_v1_over50,xgboost_baseline,fit,"{'C': 0.001, 'penalty': 'l1', 'random_state': ...",,0.772961,0.779456,1,0.784061,0.346537
1,preprocess_v1_smote20,xgboost_baseline,fit,"{'C': 0.001, 'penalty': 'l1', 'random_state': ...",,0.543655,0.551785,0,0.285704,0.157028
2,preprocess_v1_smote50,xgboost_baseline,fit,"{'C': 0.001, 'penalty': 'l1', 'random_state': ...",,0.788791,0.781288,1,0.802107,0.34891
3,preprocess_v1_smoteTomek20,xgboost_baseline,fit,"{'C': 0.001, 'penalty': 'l1', 'random_state': ...",,0.540606,0.548873,0,0.267336,0.157028
4,preprocess_v1_smoteTomek50,xgboost_baseline,fit,"{'C': 0.001, 'penalty': 'l1', 'random_state': ...",,0.792838,0.782408,1,0.805856,0.349722


In [51]:
MODELS = DATA/'models'

In [52]:
df_results.to_csv(f'{MODELS}/logistic_base.csv', index = False)

In [53]:
df_results.to_csv('logistic_base.csv')

In [58]:
df_results.to_excel(f'{MODELS}/logistic_base.xlsx', index = False)