# Hyperparameter Optimization: All Models

## Rationale:
To make a fair performance comparison, we tune the hyperparameters of every model so that each one is evaluated at its best. We then compare the tuned models to check whether the earlier comparison was distorted by a poor initial setup.

## Methodology:
We conducted hyperparameter optimization using the Optuna library, which provides a powerful and efficient platform for automated hyperparameter tuning.

We use Optuna's ```LightGBM Integration```, which optimizes the hyperparameters sequentially (one group at a time) and focuses on the parameters that have the highest impact on the performance of boosting models:



* **lambda_l1**
* **lambda_l2**
* **num_leaves**
* **feature_fraction**
* **bagging_fraction**
* **bagging_freq**
* **min_child_samples**

For **n_estimators** and **learning_rate** we reuse the values found in the best-model optimization.
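The sequential idea can be illustrated with a toy example. This is only a sketch of the general technique, not Optuna's implementation: `toy_score`, the starting values, and the candidate grids are hypothetical, and the real tuner samples several of its stages rather than walking fixed grids.

```python
# Toy illustration of stepwise (one-parameter-at-a-time) tuning.
# The objective and candidate grids are made up for the example.

def toy_score(params):
    """Hypothetical validation score; it peaks at the values in `target`."""
    target = {"feature_fraction": 0.4, "num_leaves": 31, "bagging_fraction": 0.8}
    return -sum((params[name] - best) ** 2 / best ** 2 for name, best in target.items())


def stepwise_search(score_fn, start, stages):
    """Tune one parameter per stage, keeping the winners of earlier stages fixed."""
    best_params = dict(start)
    best_score = score_fn(best_params)
    for name, candidates in stages:
        for value in candidates:
            trial = {**best_params, name: value}
            trial_score = score_fn(trial)
            if trial_score > best_score:
                best_params, best_score = trial, trial_score
    return best_params, best_score


start = {"feature_fraction": 1.0, "num_leaves": 127, "bagging_fraction": 1.0}
stages = [
    ("feature_fraction", [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]),
    ("num_leaves", [15, 31, 63, 127, 255]),
    ("bagging_fraction", [0.6, 0.7, 0.8, 0.9, 1.0]),
]

tuned, tuned_score = stepwise_search(toy_score, start, stages)
print(tuned)
```

Each stage only explores one parameter, so the total number of trials grows with the sum of the grid sizes rather than their product. The trade-off is that interactions between parameters tuned in different stages are not explored jointly.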

## Conclusions:
Hyperparameter optimization improved the performance of the LightGBM models: every model scored higher than in the initial model comparison (see the ```feature-selection``` notebook).


| Model              | out_of_fold | validation |
|--------------------|-------------|------------|
| fw [32]            | 0.863464    | 0.867463   |
| boruta [46]        | 0.864788    | 0.867128   |
| all features [115] | 0.862991    | 0.867067   |
| ensemble [55]      | 0.862760    | 0.865526   |
| base_features [10] | 0.864081    | 0.865510   |
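To put the table in perspective, the short sketch below ranks the models by validation score and computes how far apart they are. The numbers are copied from the table; the dictionary layout and variable names are only illustrative, and the bracketed values from the table are left out.

```python
# Scores copied from the results table: (out_of_fold, validation).
results = {
    "fw": (0.863464, 0.867463),
    "boruta": (0.864788, 0.867128),
    "all features": (0.862991, 0.867067),
    "ensemble": (0.862760, 0.865526),
    "base_features": (0.864081, 0.865510),
}

# Rank models by validation score, best first.
ranking = sorted(results, key=lambda name: results[name][1], reverse=True)

# Spread between the best and the worst validation score.
validation_scores = [validation for _, validation in results.values()]
spread = max(validation_scores) - min(validation_scores)

# Difference between validation and out-of-fold score for each model.
gaps = {name: round(val - oof, 6) for name, (oof, val) in results.items()}

print(ranking)
print(f"validation spread: {spread:.6f}")
print(gaps)
```

The spread across all five models is about 0.002, so the ranking should be read with care: the differences between feature sets are small relative to the scores themselves.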





In [16]:
%load_ext autoreload
%autoreload 2

import cloudpickle as cp

import numpy as np
import pandas as pd

import lightgbm as lgbm
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from copy import deepcopy

from sklearn.model_selection import KFold, cross_val_score
from sklearn.metrics import roc_auc_score

import warnings
warnings.filterwarnings("ignore")

from optuna import visualization as optunaviz

import optuna
import optuna.integration.lightgbm as lgb

from lightgbm import early_stopping
from lightgbm import log_evaluation
from optuna.logging import set_verbosity

set_verbosity(optuna.logging.ERROR)

import sys
sys.path.append("../")

# local imports
from src.learner_params import (
    MODEL_PARAMS,
    boruta_learner_params,
    params_all,
    params_ensemble,
    params_fw,
    params_original,
    target_column,
    test_params,
)
from utils.feature_selection_lists import fw_features, boruta_features, ensemble_features
from utils.features_lists import all_features_list, base_features
from utils.functions__training import model_pipeline
from utils.functions__utils import find_constraint

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [3]:
train_df = pd.read_pickle("../data/train_df.pkl")
validation_df = pd.read_pickle("../data/validation_df.pkl")

train, test = train_test_split(train_df, random_state=42, test_size=.2)

### All features

In [5]:
%%time

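# Build both datasets through the same module (the Optuna LightGBM integration)
dtrain = lgb.Dataset(train[all_features_list], label=train[target_column])
dval = lgb.Dataset(test[all_features_list], label=test[target_column])

# Fixed settings; n_estimators and learning_rate come from the best-model optimization
params = {
    "objective": "binary",
    "metric": "auc",
    "verbosity": -1,
    "boosting_type": "gbdt",
    "n_estimators": 5926,
    "learning_rate": 0.005603627873630697,
}

# lgb.train here is Optuna's stepwise tuner, not lightgbm.train;
# it returns the booster trained with the best parameters found
bst = lgb.train(
    params,
    dtrain,
    valid_sets=[dtrain, dval],
    callbacks=[early_stopping(150)],
    show_progress_bar=True,
)

# Score the tuned booster on the held-out split
preds = bst.predict(test[all_features_list], num_iteration=bst.best_iteration)
score = roc_auc_score(test[target_column], preds)

all_features_best_params = bst.params
print("Best params:", all_features_best_params)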



feature_fraction, val_score: -inf:   0%|                  | 0/7 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.871090:  14%|8     | 1/7 [00:12<01:15, 12.56s/it]

Early stopping, best iteration is:
[832]	valid_0's auc: 0.891391	valid_1's auc: 0.87109
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.871358:  29%|#7    | 2/7 [00:24<01:01, 12.25s/it]

Early stopping, best iteration is:
[891]	valid_0's auc: 0.891727	valid_1's auc: 0.871358
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.871358:  43%|##5   | 3/7 [00:38<00:51, 12.99s/it]

Early stopping, best iteration is:
[948]	valid_0's auc: 0.894863	valid_1's auc: 0.87108
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.871358:  57%|###4  | 4/7 [00:50<00:38, 12.69s/it]

Early stopping, best iteration is:
[848]	valid_0's auc: 0.891579	valid_1's auc: 0.87112
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.871358:  71%|####2 | 5/7 [01:02<00:24, 12.25s/it]

Early stopping, best iteration is:
[728]	valid_0's auc: 0.889162	valid_1's auc: 0.870549
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.871358:  86%|#####1| 6/7 [01:15<00:12, 12.69s/it]

Early stopping, best iteration is:
[900]	valid_0's auc: 0.893928	valid_1's auc: 0.871048
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.871358: 100%|######| 7/7 [01:28<00:00, 12.68s/it]


Early stopping, best iteration is:
[821]	valid_0's auc: 0.892251	valid_1's auc: 0.871029


num_leaves, val_score: 0.871358:   0%|                   | 0/20 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358:   5%|5          | 1/20 [00:14<04:28, 14.13s/it]

Early stopping, best iteration is:
[517]	valid_0's auc: 0.914698	valid_1's auc: 0.870372
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358:  10%|#1         | 2/20 [00:27<04:06, 13.70s/it]

Early stopping, best iteration is:
[266]	valid_0's auc: 0.914898	valid_1's auc: 0.869346
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358:  15%|#6         | 3/20 [00:42<04:02, 14.27s/it]

Early stopping, best iteration is:
[448]	valid_0's auc: 0.920448	valid_1's auc: 0.870322
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358:  20%|##2        | 4/20 [01:22<06:31, 24.45s/it]

Did not meet early stopping. Best iteration is:
[5926]	valid_0's auc: 0.86991	valid_1's auc: 0.870993
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358:  25%|##7        | 5/20 [01:37<05:15, 21.02s/it]

Early stopping, best iteration is:
[265]	valid_0's auc: 0.9214	valid_1's auc: 0.869202
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358:  30%|###3       | 6/20 [01:52<04:25, 18.97s/it]

Early stopping, best iteration is:
[1434]	valid_0's auc: 0.882211	valid_1's auc: 0.87105
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358:  35%|###8       | 7/20 [02:06<03:45, 17.34s/it]

Early stopping, best iteration is:
[547]	valid_0's auc: 0.915759	valid_1's auc: 0.87046
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358:  40%|####4      | 8/20 [02:23<03:27, 17.26s/it]

Early stopping, best iteration is:
[470]	valid_0's auc: 0.930159	valid_1's auc: 0.869869
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358:  45%|####9      | 9/20 [02:37<02:57, 16.12s/it]

Early stopping, best iteration is:
[664]	valid_0's auc: 0.909055	valid_1's auc: 0.870869
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358:  50%|#####     | 10/20 [02:52<02:37, 15.79s/it]

Early stopping, best iteration is:
[271]	valid_0's auc: 0.922904	valid_1's auc: 0.869222
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358:  55%|#####5    | 11/20 [03:03<02:10, 14.51s/it]

Early stopping, best iteration is:
[651]	valid_0's auc: 0.898487	valid_1's auc: 0.871264
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358:  60%|######    | 12/20 [03:15<01:49, 13.71s/it]

Early stopping, best iteration is:
[659]	valid_0's auc: 0.89824	valid_1's auc: 0.871252
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358:  65%|######5   | 13/20 [03:27<01:30, 13.00s/it]

Early stopping, best iteration is:
[649]	valid_0's auc: 0.894769	valid_1's auc: 0.871277
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358:  70%|#######   | 14/20 [03:43<01:25, 14.17s/it]

Early stopping, best iteration is:
[466]	valid_0's auc: 0.929432	valid_1's auc: 0.869852
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358:  75%|#######5  | 15/20 [03:55<01:06, 13.28s/it]

Early stopping, best iteration is:
[649]	valid_0's auc: 0.894769	valid_1's auc: 0.871277
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358:  80%|########  | 16/20 [04:07<00:52, 13.12s/it]

Early stopping, best iteration is:
[891]	valid_0's auc: 0.891727	valid_1's auc: 0.871358
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358:  85%|########5 | 17/20 [04:22<00:40, 13.42s/it]

Early stopping, best iteration is:
[578]	valid_0's auc: 0.916784	valid_1's auc: 0.870542
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358:  90%|######### | 18/20 [04:33<00:25, 12.86s/it]

Early stopping, best iteration is:
[865]	valid_0's auc: 0.887706	valid_1's auc: 0.871283
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358:  95%|#########5| 19/20 [04:45<00:12, 12.49s/it]

Early stopping, best iteration is:
[969]	valid_0's auc: 0.884572	valid_1's auc: 0.871268
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871358: 100%|##########| 20/20 [04:58<00:00, 14.91s/it]


Early stopping, best iteration is:
[579]	valid_0's auc: 0.910493	valid_1's auc: 0.870575


bagging, val_score: 0.871358:   0%|                      | 0/10 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871358:  10%|#4            | 1/10 [00:11<01:46, 11.79s/it]

Early stopping, best iteration is:
[856]	valid_0's auc: 0.891561	valid_1's auc: 0.871259
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871358:  20%|##8           | 2/10 [00:23<01:34, 11.79s/it]

Early stopping, best iteration is:
[875]	valid_0's auc: 0.891354	valid_1's auc: 0.870638
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871358:  30%|####2         | 3/10 [00:36<01:27, 12.50s/it]

Early stopping, best iteration is:
[821]	valid_0's auc: 0.890162	valid_1's auc: 0.871188
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871358:  40%|#####6        | 4/10 [00:49<01:14, 12.45s/it]

Early stopping, best iteration is:
[829]	valid_0's auc: 0.890196	valid_1's auc: 0.870869
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871358:  50%|#######       | 5/10 [01:03<01:04, 12.92s/it]

Early stopping, best iteration is:
[887]	valid_0's auc: 0.891558	valid_1's auc: 0.871323
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871358:  60%|########4     | 6/10 [01:17<00:53, 13.35s/it]

Early stopping, best iteration is:
[904]	valid_0's auc: 0.891865	valid_1's auc: 0.871191
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871682:  70%|#########7    | 7/10 [01:31<00:40, 13.60s/it]

Early stopping, best iteration is:
[951]	valid_0's auc: 0.893676	valid_1's auc: 0.871682
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871682:  80%|###########2  | 8/10 [01:45<00:27, 13.79s/it]

Early stopping, best iteration is:
[926]	valid_0's auc: 0.89291	valid_1's auc: 0.87125
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871682:  90%|############6 | 9/10 [01:58<00:13, 13.44s/it]

Early stopping, best iteration is:
[818]	valid_0's auc: 0.89022	valid_1's auc: 0.871385
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871682: 100%|#############| 10/10 [02:11<00:00, 13.15s/it]


Early stopping, best iteration is:
[869]	valid_0's auc: 0.891537	valid_1's auc: 0.871368


feature_fraction_stage2, val_score: 0.871682:   0%|       | 0/3 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


feature_fraction_stage2, val_score: 0.871682:  33%|3| 1/3 [00:14<00:29, 14.59s/i

Early stopping, best iteration is:
[951]	valid_0's auc: 0.894342	valid_1's auc: 0.871544
Training until validation scores don't improve for 150 rounds


feature_fraction_stage2, val_score: 0.871682:  67%|6| 2/3 [00:29<00:14, 14.54s/i

Early stopping, best iteration is:
[933]	valid_0's auc: 0.89413	valid_1's auc: 0.871394
Training until validation scores don't improve for 150 rounds


feature_fraction_stage2, val_score: 0.871682: 100%|#| 3/3 [00:43<00:00, 14.54s/i


Early stopping, best iteration is:
[952]	valid_0's auc: 0.893985	valid_1's auc: 0.87145


regularization_factors, val_score: 0.871682:   0%|       | 0/20 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871682:   5%| | 1/20 [00:14<04:31, 14.29s/i

Early stopping, best iteration is:
[954]	valid_0's auc: 0.893884	valid_1's auc: 0.871612
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871682:  10%|1| 2/20 [00:28<04:17, 14.33s/i

Early stopping, best iteration is:
[937]	valid_0's auc: 0.893411	valid_1's auc: 0.871619
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871682:  15%|1| 3/20 [00:43<04:04, 14.39s/i

Early stopping, best iteration is:
[951]	valid_0's auc: 0.893753	valid_1's auc: 0.871635
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871682:  20%|2| 4/20 [00:57<03:49, 14.34s/i

Early stopping, best iteration is:
[937]	valid_0's auc: 0.893375	valid_1's auc: 0.871681
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871682:  25%|2| 5/20 [01:10<03:30, 14.02s/i

Early stopping, best iteration is:
[863]	valid_0's auc: 0.891539	valid_1's auc: 0.871551
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871682:  30%|3| 6/20 [01:25<03:19, 14.22s/i

Early stopping, best iteration is:
[936]	valid_0's auc: 0.893338	valid_1's auc: 0.871642
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871682:  35%|3| 7/20 [01:39<03:04, 14.23s/i

Early stopping, best iteration is:
[937]	valid_0's auc: 0.893435	valid_1's auc: 0.871636
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871743:  40%|4| 8/20 [01:54<02:51, 14.33s/i

Early stopping, best iteration is:
[954]	valid_0's auc: 0.893849	valid_1's auc: 0.871743
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871743:  45%|4| 9/20 [02:08<02:35, 14.16s/i

Early stopping, best iteration is:
[865]	valid_0's auc: 0.884936	valid_1's auc: 0.871506
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871743:  50%|5| 10/20 [02:22<02:22, 14.29s/

Early stopping, best iteration is:
[951]	valid_0's auc: 0.893676	valid_1's auc: 0.871681
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871743:  55%|5| 11/20 [02:37<02:09, 14.36s/

Early stopping, best iteration is:
[954]	valid_0's auc: 0.893754	valid_1's auc: 0.871694
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871743:  60%|6| 12/20 [02:51<01:54, 14.35s/

Early stopping, best iteration is:
[951]	valid_0's auc: 0.893676	valid_1's auc: 0.871681
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871743:  65%|6| 13/20 [03:04<01:38, 14.04s/

Early stopping, best iteration is:
[867]	valid_0's auc: 0.891672	valid_1's auc: 0.871667
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871743:  70%|7| 14/20 [03:18<01:23, 13.88s/

Early stopping, best iteration is:
[878]	valid_0's auc: 0.891829	valid_1's auc: 0.871684
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871743:  75%|7| 15/20 [03:32<01:09, 13.99s/

Early stopping, best iteration is:
[936]	valid_0's auc: 0.893263	valid_1's auc: 0.87168
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871743:  80%|8| 16/20 [03:46<00:56, 14.10s/

Early stopping, best iteration is:
[936]	valid_0's auc: 0.893397	valid_1's auc: 0.871687
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871743:  85%|8| 17/20 [04:00<00:41, 13.92s/

Early stopping, best iteration is:
[867]	valid_0's auc: 0.891698	valid_1's auc: 0.871699
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871743:  90%|9| 18/20 [04:14<00:28, 14.06s/

Early stopping, best iteration is:
[951]	valid_0's auc: 0.893803	valid_1's auc: 0.871689
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871743:  95%|9| 19/20 [04:29<00:14, 14.24s/

Early stopping, best iteration is:
[936]	valid_0's auc: 0.893362	valid_1's auc: 0.871657
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871743: 100%|#| 20/20 [04:44<00:00, 14.20s/


Early stopping, best iteration is:
[936]	valid_0's auc: 0.893377	valid_1's auc: 0.871701


min_data_in_leaf, val_score: 0.871743:   0%|              | 0/5 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.871743:  20%|#2    | 1/5 [00:13<00:54, 13.70s/it]

Early stopping, best iteration is:
[892]	valid_0's auc: 0.897885	valid_1's auc: 0.871382
Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.871743:  40%|##4   | 2/5 [00:27<00:41, 13.69s/it]

Early stopping, best iteration is:
[855]	valid_0's auc: 0.888358	valid_1's auc: 0.871512
Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.871743:  60%|###6  | 3/5 [00:40<00:27, 13.57s/it]

Early stopping, best iteration is:
[863]	valid_0's auc: 0.893942	valid_1's auc: 0.871543
Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.871743:  80%|####8 | 4/5 [00:55<00:13, 13.89s/it]

Early stopping, best iteration is:
[937]	valid_0's auc: 0.892586	valid_1's auc: 0.871542
Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.871743: 100%|######| 5/5 [01:08<00:00, 13.61s/it]

Early stopping, best iteration is:
[779]	valid_0's auc: 0.885093	valid_1's auc: 0.871508





Best params: {'objective': 'binary', 'metric': 'auc', 'verbosity': -1, 'boosting_type': 'gbdt', 'learning_rate': 0.005603627873630697, 'feature_pre_filter': False, 'lambda_l1': 0.022801469789147367, 'lambda_l2': 0.004898476816241659, 'num_leaves': 31, 'feature_fraction': 0.4, 'bagging_fraction': 0.8079749072343313, 'bagging_freq': 5, 'min_child_samples': 20, 'num_iterations': 5926}
CPU times: user 40min 39s, sys: 5min 53s, total: 46min 33s
Wall time: 15min 14s
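The tuning log above is long, so it can help to parse it rather than read it. The sketch below extracts the best iteration and both AUC values from a few lines copied from the output. Because the cell passes `valid_sets=[dtrain, dval]`, `valid_0` is the training split and `valid_1` is the held-out split. The regular expression and the `parse_trials` helper are illustrative assumptions, not part of LightGBM or Optuna.

```python
import re

# A few lines copied from the tuning output (LightGBM separates fields with tabs).
log_text = """\
Early stopping, best iteration is:
[832]\tvalid_0's auc: 0.891391\tvalid_1's auc: 0.87109
Early stopping, best iteration is:
[951]\tvalid_0's auc: 0.893676\tvalid_1's auc: 0.871682
Did not meet early stopping. Best iteration is:
[5926]\tvalid_0's auc: 0.86991\tvalid_1's auc: 0.870993
Early stopping, best iteration is:
[954]\tvalid_0's auc: 0.893849\tvalid_1's auc: 0.871743
"""

# Matches lines such as "[832]<tab>valid_0's auc: 0.89<tab>valid_1's auc: 0.87".
RESULT_LINE = re.compile(
    r"^\[(\d+)\]\s+valid_0's auc: ([\d.]+)\s+valid_1's auc: ([\d.]+)$",
    re.MULTILINE,
)


def parse_trials(text):
    """Return one dict per trial with its best iteration and AUC values."""
    return [
        {"best_iteration": int(it), "train_auc": float(tr), "valid_auc": float(va)}
        for it, tr, va in RESULT_LINE.findall(text)
    ]


trials = parse_trials(log_text)
best = max(trials, key=lambda trial: trial["valid_auc"])
overfit_gap = best["train_auc"] - best["valid_auc"]

print(best)
print(f"train minus validation AUC: {overfit_gap:.6f}")
```

In the excerpt, the best trial stops at iteration 954, well below the cap of 5926 rounds, and only one of the four trials runs to the cap.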


In [6]:
all_features_best_params

{'objective': 'binary',
 'metric': 'auc',
 'verbosity': -1,
 'boosting_type': 'gbdt',
 'learning_rate': 0.005603627873630697,
 'feature_pre_filter': False,
 'lambda_l1': 0.022801469789147367,
 'lambda_l2': 0.004898476816241659,
 'num_leaves': 31,
 'feature_fraction': 0.4,
 'bagging_fraction': 0.8079749072343313,
 'bagging_freq': 5,
 'min_child_samples': 20,
 'num_iterations': 5926}
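A quick way to see what the tuner actually changed is to compare the tuned values with LightGBM's defaults. The tuned values below are copied from the output above; the defaults are the ones given in the LightGBM parameter documentation and should be checked against the installed version.

```python
# Tuned values copied from the `all_features_best_params` output.
tuned = {
    "lambda_l1": 0.022801469789147367,
    "lambda_l2": 0.004898476816241659,
    "num_leaves": 31,
    "feature_fraction": 0.4,
    "bagging_fraction": 0.8079749072343313,
    "bagging_freq": 5,
    "min_child_samples": 20,
}

# LightGBM defaults for the same parameters (assumed; verify for your version).
defaults = {
    "lambda_l1": 0.0,
    "lambda_l2": 0.0,
    "num_leaves": 31,
    "feature_fraction": 1.0,
    "bagging_fraction": 1.0,
    "bagging_freq": 0,
    "min_child_samples": 20,
}

# Map each changed parameter to its (default, tuned) pair.
changed = {name: (defaults[name], value) for name, value in tuned.items() if value != defaults[name]}
unchanged = sorted(set(tuned) - set(changed))

print("changed:", changed)
print("unchanged:", unchanged)
```

For the all-features model, `num_leaves` and `min_child_samples` end up at their default values, so the gain comes from feature and row subsampling plus a small amount of regularization.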

### MRMR features (`fw_features`)

In [7]:
%%time
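# Build both datasets through the same module (the Optuna LightGBM integration)
dtrain = lgb.Dataset(train[fw_features], label=train[target_column])
dval = lgb.Dataset(test[fw_features], label=test[target_column])

# Fixed settings; n_estimators and learning_rate come from the best-model optimization
params = {
    "objective": "binary",
    "metric": "auc",
    "verbosity": -1,
    "boosting_type": "gbdt",
    "n_estimators": 5926,
    "learning_rate": 0.005603627873630697,
}

# lgb.train here is Optuna's stepwise tuner, not lightgbm.train;
# it returns the booster trained with the best parameters found
bst = lgb.train(
    params,
    dtrain,
    valid_sets=[dtrain, dval],
    callbacks=[early_stopping(150)],
    show_progress_bar=True,
)

# Score the tuned booster on the held-out split
preds = bst.predict(test[fw_features], num_iteration=bst.best_iteration)
score = roc_auc_score(test[target_column], preds)

fw_features_best_params = bst.params
print("Best params:", fw_features_best_params)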



feature_fraction, val_score: -inf:   0%|                  | 0/7 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.870538:  14%|8     | 1/7 [00:10<01:00, 10.16s/it]

Early stopping, best iteration is:
[848]	valid_0's auc: 0.891962	valid_1's auc: 0.870538
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.870630:  29%|#7    | 2/7 [00:19<00:49,  9.86s/it]

Early stopping, best iteration is:
[767]	valid_0's auc: 0.890367	valid_1's auc: 0.87063
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.870922:  43%|##5   | 3/7 [00:30<00:40, 10.02s/it]

Early stopping, best iteration is:
[823]	valid_0's auc: 0.892276	valid_1's auc: 0.870922
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.871700:  57%|###4  | 4/7 [00:41<00:31, 10.66s/it]

Early stopping, best iteration is:
[986]	valid_0's auc: 0.892375	valid_1's auc: 0.8717
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.871700:  71%|####2 | 5/7 [00:52<00:21, 10.90s/it]

Early stopping, best iteration is:
[933]	valid_0's auc: 0.892719	valid_1's auc: 0.871696
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.871700:  86%|#####1| 6/7 [01:03<00:10, 10.72s/it]

Early stopping, best iteration is:
[854]	valid_0's auc: 0.891742	valid_1's auc: 0.871656
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.871700: 100%|######| 7/7 [01:13<00:00, 10.53s/it]


Early stopping, best iteration is:
[878]	valid_0's auc: 0.893095	valid_1's auc: 0.871408


num_leaves, val_score: 0.871700:   0%|                   | 0/20 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871700:   5%|5          | 1/20 [00:23<07:30, 23.71s/it]

Early stopping, best iteration is:
[3969]	valid_0's auc: 0.874572	valid_1's auc: 0.871653
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871817:  10%|#1         | 2/20 [00:34<04:52, 16.24s/it]

Early stopping, best iteration is:
[1040]	valid_0's auc: 0.887206	valid_1's auc: 0.871817
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871848:  15%|#6         | 3/20 [00:45<03:51, 13.61s/it]

Early stopping, best iteration is:
[1129]	valid_0's auc: 0.882338	valid_1's auc: 0.871848
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871848:  20%|##2        | 4/20 [01:03<04:10, 15.63s/it]

Early stopping, best iteration is:
[438]	valid_0's auc: 0.938945	valid_1's auc: 0.869117
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871848:  25%|##7        | 5/20 [01:19<03:56, 15.78s/it]

Early stopping, best iteration is:
[2208]	valid_0's auc: 0.880328	valid_1's auc: 0.871551
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871848:  30%|###3       | 6/20 [01:33<03:28, 14.91s/it]

Early stopping, best iteration is:
[683]	valid_0's auc: 0.912808	valid_1's auc: 0.871149
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871848:  35%|###8       | 7/20 [01:47<03:12, 14.80s/it]

Early stopping, best iteration is:
[683]	valid_0's auc: 0.919902	valid_1's auc: 0.870925
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871848:  40%|####4      | 8/20 [02:01<02:53, 14.43s/it]

Early stopping, best iteration is:
[974]	valid_0's auc: 0.906763	valid_1's auc: 0.871418
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871848:  45%|####9      | 9/20 [02:16<02:40, 14.55s/it]

Early stopping, best iteration is:
[480]	valid_0's auc: 0.925	valid_1's auc: 0.870145
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871848:  50%|#####     | 10/20 [02:29<02:22, 14.22s/it]

Early stopping, best iteration is:
[441]	valid_0's auc: 0.918297	valid_1's auc: 0.87009
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871848:  55%|#####5    | 11/20 [02:41<02:01, 13.51s/it]

Early stopping, best iteration is:
[856]	valid_0's auc: 0.899943	valid_1's auc: 0.871658
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871848:  60%|######    | 12/20 [02:57<01:54, 14.33s/it]

Early stopping, best iteration is:
[429]	valid_0's auc: 0.929077	valid_1's auc: 0.869578
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871848:  65%|######5   | 13/20 [03:08<01:32, 13.25s/it]

Early stopping, best iteration is:
[660]	valid_0's auc: 0.898931	valid_1's auc: 0.871342
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871848:  70%|#######   | 14/20 [03:23<01:22, 13.75s/it]

Early stopping, best iteration is:
[675]	valid_0's auc: 0.923292	valid_1's auc: 0.870814
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871848:  75%|#######5  | 15/20 [03:34<01:04, 12.98s/it]

Early stopping, best iteration is:
[994]	valid_0's auc: 0.887266	valid_1's auc: 0.871749
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871848:  80%|########  | 16/20 [03:45<00:49, 12.41s/it]

Early stopping, best iteration is:
[997]	valid_0's auc: 0.888463	valid_1's auc: 0.871727
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871848:  85%|########5 | 17/20 [03:59<00:38, 12.78s/it]

Early stopping, best iteration is:
[785]	valid_0's auc: 0.910552	valid_1's auc: 0.871334
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871848:  90%|######### | 18/20 [04:10<00:24, 12.36s/it]

Early stopping, best iteration is:
[974]	valid_0's auc: 0.893113	valid_1's auc: 0.871846
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871848:  95%|#########5| 19/20 [04:35<00:15, 15.98s/it]

Early stopping, best iteration is:
[3969]	valid_0's auc: 0.874572	valid_1's auc: 0.871653
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871848: 100%|##########| 20/20 [04:47<00:00, 14.36s/it]


Early stopping, best iteration is:
[974]	valid_0's auc: 0.896507	valid_1's auc: 0.871679


bagging, val_score: 0.871848:   0%|                      | 0/10 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871848:  10%|#4            | 1/10 [00:12<01:54, 12.76s/it]

Early stopping, best iteration is:
[1273]	valid_0's auc: 0.88522	valid_1's auc: 0.871844
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871848:  20%|##8           | 2/10 [00:24<01:38, 12.27s/it]

Early stopping, best iteration is:
[1172]	valid_0's auc: 0.883805	valid_1's auc: 0.871831
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.872193:  30%|####2         | 3/10 [00:41<01:38, 14.12s/it]

Early stopping, best iteration is:
[1534]	valid_0's auc: 0.887564	valid_1's auc: 0.872193
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.872193:  40%|#####6        | 4/10 [00:53<01:20, 13.42s/it]

Early stopping, best iteration is:
[1048]	valid_0's auc: 0.881382	valid_1's auc: 0.871871
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.872193:  50%|#######       | 5/10 [01:08<01:10, 14.01s/it]

Early stopping, best iteration is:
[1463]	valid_0's auc: 0.886309	valid_1's auc: 0.872088
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.872193:  60%|########4     | 6/10 [01:20<00:53, 13.46s/it]

Early stopping, best iteration is:
[1180]	valid_0's auc: 0.883022	valid_1's auc: 0.872027
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.872193:  70%|#########7    | 7/10 [01:35<00:41, 13.75s/it]

Early stopping, best iteration is:
[1423]	valid_0's auc: 0.886221	valid_1's auc: 0.87194
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.872193:  80%|###########2  | 8/10 [01:47<00:26, 13.17s/it]

Early stopping, best iteration is:
[1113]	valid_0's auc: 0.882296	valid_1's auc: 0.871795
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.872193:  90%|############6 | 9/10 [01:59<00:13, 13.05s/it]

Early stopping, best iteration is:
[1249]	valid_0's auc: 0.88408	valid_1's auc: 0.871904
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.872193: 100%|#############| 10/10 [02:12<00:00, 13.20s/it]


Early stopping, best iteration is:
[1146]	valid_0's auc: 0.883111	valid_1's auc: 0.871966


feature_fraction_stage2, val_score: 0.872193:   0%|       | 0/3 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


feature_fraction_stage2, val_score: 0.872193:  33%|3| 1/3 [00:15<00:31, 15.66s/i

Early stopping, best iteration is:
[1535]	valid_0's auc: 0.888048	valid_1's auc: 0.872108
Training until validation scores don't improve for 150 rounds


feature_fraction_stage2, val_score: 0.872193:  67%|6| 2/3 [00:33<00:16, 16.79s/i

Early stopping, best iteration is:
[1659]	valid_0's auc: 0.88989	valid_1's auc: 0.872181
Training until validation scores don't improve for 150 rounds


feature_fraction_stage2, val_score: 0.872193: 100%|#| 3/3 [00:48<00:00, 16.05s/i


Early stopping, best iteration is:
[1534]	valid_0's auc: 0.887564	valid_1's auc: 0.872193


regularization_factors, val_score: 0.872193:   0%|       | 0/20 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209:   5%| | 1/20 [00:17<05:23, 17.03s/i

Early stopping, best iteration is:
[1740]	valid_0's auc: 0.886349	valid_1's auc: 0.872209
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209:  10%|1| 2/20 [00:34<05:09, 17.21s/i

Early stopping, best iteration is:
[1731]	valid_0's auc: 0.884144	valid_1's auc: 0.872103
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209:  15%|1| 3/20 [00:46<04:11, 14.82s/i

Early stopping, best iteration is:
[1093]	valid_0's auc: 0.881703	valid_1's auc: 0.872062
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209:  20%|2| 4/20 [01:01<04:01, 15.09s/i

Early stopping, best iteration is:
[1427]	valid_0's auc: 0.879598	valid_1's auc: 0.872072
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209:  25%|2| 5/20 [01:14<03:31, 14.12s/i

Early stopping, best iteration is:
[1157]	valid_0's auc: 0.882948	valid_1's auc: 0.872169
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209:  30%|3| 6/20 [01:26<03:10, 13.57s/i

Early stopping, best iteration is:
[1157]	valid_0's auc: 0.882858	valid_1's auc: 0.872206
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209:  35%|3| 7/20 [01:38<02:50, 13.13s/i

Early stopping, best iteration is:
[1127]	valid_0's auc: 0.8825	valid_1's auc: 0.872181
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209:  40%|4| 8/20 [01:51<02:33, 12.81s/i

Early stopping, best iteration is:
[1157]	valid_0's auc: 0.882867	valid_1's auc: 0.872201
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209:  45%|4| 9/20 [02:03<02:20, 12.75s/i

Early stopping, best iteration is:
[1178]	valid_0's auc: 0.883083	valid_1's auc: 0.872138
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209:  50%|5| 10/20 [02:16<02:07, 12.72s/

Early stopping, best iteration is:
[1157]	valid_0's auc: 0.882531	valid_1's auc: 0.872069
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209:  55%|5| 11/20 [02:32<02:04, 13.86s/

Early stopping, best iteration is:
[1534]	valid_0's auc: 0.887562	valid_1's auc: 0.872195
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209:  60%|6| 12/20 [02:47<01:53, 14.23s/

Early stopping, best iteration is:
[1534]	valid_0's auc: 0.887562	valid_1's auc: 0.872197
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209:  65%|6| 13/20 [03:03<01:41, 14.56s/

Early stopping, best iteration is:
[1534]	valid_0's auc: 0.887562	valid_1's auc: 0.872197
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209:  70%|7| 14/20 [03:18<01:28, 14.77s/

Early stopping, best iteration is:
[1534]	valid_0's auc: 0.887562	valid_1's auc: 0.872196
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209:  75%|7| 15/20 [03:33<01:14, 14.87s/

Early stopping, best iteration is:
[1534]	valid_0's auc: 0.887562	valid_1's auc: 0.872191
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209:  80%|8| 16/20 [03:48<00:59, 14.91s/

Early stopping, best iteration is:
[1534]	valid_0's auc: 0.887562	valid_1's auc: 0.872193
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209:  85%|8| 17/20 [04:03<00:44, 14.96s/

Early stopping, best iteration is:
[1513]	valid_0's auc: 0.887339	valid_1's auc: 0.872191
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209:  90%|9| 18/20 [04:18<00:29, 14.92s/

Early stopping, best iteration is:
[1513]	valid_0's auc: 0.887341	valid_1's auc: 0.872196
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209:  95%|9| 19/20 [04:33<00:14, 14.99s/

Early stopping, best iteration is:
[1513]	valid_0's auc: 0.887342	valid_1's auc: 0.872193
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872209: 100%|#| 20/20 [04:46<00:00, 14.30s/


Early stopping, best iteration is:
[1157]	valid_0's auc: 0.882937	valid_1's auc: 0.8722


min_data_in_leaf, val_score: 0.872209:   0%|              | 0/5 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.872209:  20%|#2    | 1/5 [00:15<01:01, 15.33s/it]

Early stopping, best iteration is:
[1398]	valid_0's auc: 0.882478	valid_1's auc: 0.872133
Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.872209:  40%|##4   | 2/5 [00:30<00:45, 15.07s/it]

Early stopping, best iteration is:
[1386]	valid_0's auc: 0.883153	valid_1's auc: 0.872099
Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.872275:  60%|###6  | 3/5 [00:47<00:32, 16.29s/it]

Early stopping, best iteration is:
[1696]	valid_0's auc: 0.88563	valid_1's auc: 0.872275
Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.872275:  80%|####8 | 4/5 [01:00<00:14, 14.70s/it]

Early stopping, best iteration is:
[1093]	valid_0's auc: 0.879975	valid_1's auc: 0.872061
Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.872275: 100%|######| 5/5 [01:17<00:00, 15.48s/it]

Early stopping, best iteration is:
[1731]	valid_0's auc: 0.886282	valid_1's auc: 0.872206





Best params: {'objective': 'binary', 'metric': 'auc', 'verbosity': -1, 'boosting_type': 'gbdt', 'learning_rate': 0.005603627873630697, 'feature_pre_filter': False, 'lambda_l1': 4.1326356980799586e-05, 'lambda_l2': 0.6016241539010313, 'num_leaves': 18, 'feature_fraction': 0.4, 'bagging_fraction': 0.9403276152498784, 'bagging_freq': 4, 'min_child_samples': 50, 'num_iterations': 5926}
CPU times: user 38min 48s, sys: 5min 38s, total: 44min 26s
Wall time: 15min 5s
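
The progress-bar lines in the log above carry the running best validation AUC after each tuning stage. As a rough sketch (the `best_score_per_stage` helper is hypothetical and not part of the original notebook, and the sample lines are pasted from the log above), they can be condensed into one number per stage:

```python
import re

# A few progress-bar lines copied from the log above. They are truncated on the
# right by the terminal width, which does not matter: only the left part is parsed.
LOG = """\
feature_fraction_stage2, val_score: 0.872193: 100%|#| 3/3 [00:48<00:00, 16.05s/i
regularization_factors, val_score: 0.872193:   0%|       | 0/20 [00:00<?, ?it/s]
regularization_factors, val_score: 0.872209: 100%|#| 20/20 [04:46<00:00, 14.30s/
min_data_in_leaf, val_score: 0.872209:   0%|              | 0/5 [00:00<?, ?it/s]
min_data_in_leaf, val_score: 0.872275: 100%|######| 5/5 [01:17<00:00, 15.48s/it]
"""

# "<stage name>, val_score: <float>:" at the start of a progress-bar line.
PATTERN = re.compile(r"^(\w+), val_score: ([0-9.]+):")


def best_score_per_stage(log_text):
    """Return {stage: last reported val_score}, in order of first appearance."""
    best = {}
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if match:
            stage, score = match.group(1), float(match.group(2))
            best[stage] = score  # later lines overwrite earlier ones
    return best


stage_scores = best_score_per_stage(LOG)
for stage, score in stage_scores.items():
    print(f"{stage:<25} {score:.6f}")
```

Because the reported value is the best score seen so far, it carries over from one stage to the next, so a stage whose number matches the previous stage did not find an improvement.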


In [8]:
fw_features_best_params

{'objective': 'binary',
 'metric': 'auc',
 'verbosity': -1,
 'boosting_type': 'gbdt',
 'learning_rate': 0.005603627873630697,
 'feature_pre_filter': False,
 'lambda_l1': 4.1326356980799586e-05,
 'lambda_l2': 0.6016241539010313,
 'num_leaves': 18,
 'feature_fraction': 0.4,
 'bagging_fraction': 0.9403276152498784,
 'bagging_freq': 4,
 'min_child_samples': 50,
 'num_iterations': 5926}

### Ensemble features

In [9]:
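%%time
# Assumes `lgb` is Optuna's LightGBM integration (optuna.integration.lightgbm) and
# `early_stopping` is lightgbm's callback, both imported earlier in the notebook.
dtrain = lgbm.Dataset(train[ensemble_features], label=train[target_column])
dval = lgb.Dataset(test[ensemble_features], label=test[target_column])

# Fixed settings; the tuner searches the remaining hyperparameters stage by stage.
params = {
    "objective": "binary",
    "metric": "auc",
    "verbosity": -1,
    "boosting_type": "gbdt",
    "n_estimators": 5926,
    "learning_rate": 0.005603627873630697,
}

bst = lgb.train(
    params,
    dtrain,
    valid_sets=[dtrain, dval],
    callbacks=[early_stopping(150)],  # stop after 150 rounds without improvement
    show_progress_bar=True,
)

# AUC of the best booster on the held-out set
preds = bst.predict(test[ensemble_features], num_iteration=bst.best_iteration)
score = roc_auc_score(test[target_column], preds)

ensemble_features_best_params = bst.params
print("Best params:", ensemble_features_best_params)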



feature_fraction, val_score: -inf:   0%|                  | 0/7 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.870887:  14%|8     | 1/7 [00:11<01:11, 11.96s/it]

Early stopping, best iteration is:
[915]	valid_0's auc: 0.894268	valid_1's auc: 0.870887
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.871175:  29%|#7    | 2/7 [00:22<00:56, 11.36s/it]

Early stopping, best iteration is:
[785]	valid_0's auc: 0.890575	valid_1's auc: 0.871175
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.871211:  43%|##5   | 3/7 [00:33<00:44, 11.14s/it]

Early stopping, best iteration is:
[805]	valid_0's auc: 0.891699	valid_1's auc: 0.871211
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.871211:  57%|###4  | 4/7 [00:45<00:34, 11.37s/it]

Early stopping, best iteration is:
[887]	valid_0's auc: 0.893298	valid_1's auc: 0.871069
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.871211:  71%|####2 | 5/7 [00:55<00:21, 10.71s/it]

Early stopping, best iteration is:
[638]	valid_0's auc: 0.884674	valid_1's auc: 0.870968
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.871277:  86%|#####1| 6/7 [01:06<00:10, 10.96s/it]

Early stopping, best iteration is:
[842]	valid_0's auc: 0.891197	valid_1's auc: 0.871277
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.871277: 100%|######| 7/7 [01:16<00:00, 10.93s/it]


Early stopping, best iteration is:
[741]	valid_0's auc: 0.889737	valid_1's auc: 0.870713


num_leaves, val_score: 0.871277:   0%|                   | 0/20 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871277:   5%|5          | 1/20 [00:12<03:53, 12.28s/it]

Early stopping, best iteration is:
[257]	valid_0's auc: 0.916226	valid_1's auc: 0.869012
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871277:  10%|#1         | 2/20 [00:25<03:48, 12.70s/it]

Early stopping, best iteration is:
[231]	valid_0's auc: 0.918683	valid_1's auc: 0.868785
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871410:  15%|#6         | 3/20 [00:37<03:34, 12.61s/it]

Early stopping, best iteration is:
[1309]	valid_0's auc: 0.879207	valid_1's auc: 0.87141
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871410:  20%|##2        | 4/20 [01:07<05:11, 19.46s/it]

Did not meet early stopping. Best iteration is:
[5926]	valid_0's auc: 0.864692	valid_1's auc: 0.869252
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871460:  25%|##7        | 5/20 [01:19<04:11, 16.80s/it]

Early stopping, best iteration is:
[1212]	valid_0's auc: 0.881669	valid_1's auc: 0.87146
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871540:  30%|###3       | 6/20 [01:38<04:04, 17.45s/it]

Early stopping, best iteration is:
[2592]	valid_0's auc: 0.874209	valid_1's auc: 0.87154
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871540:  35%|###8       | 7/20 [01:53<03:36, 16.66s/it]

Early stopping, best iteration is:
[709]	valid_0's auc: 0.9192	valid_1's auc: 0.870644
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871540:  40%|####4      | 8/20 [02:05<03:01, 15.11s/it]

Early stopping, best iteration is:
[578]	valid_0's auc: 0.90641	valid_1's auc: 0.870509
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871540:  45%|####9      | 9/20 [02:16<02:33, 13.99s/it]

Early stopping, best iteration is:
[580]	valid_0's auc: 0.90453	valid_1's auc: 0.870838
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871540:  50%|#####     | 10/20 [02:28<02:11, 13.19s/it]

Early stopping, best iteration is:
[328]	valid_0's auc: 0.91061	valid_1's auc: 0.869845
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871540:  55%|#####5    | 11/20 [02:39<01:52, 12.55s/it]

Early stopping, best iteration is:
[835]	valid_0's auc: 0.889882	valid_1's auc: 0.871351
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871540:  60%|######    | 12/20 [02:50<01:36, 12.02s/it]

Early stopping, best iteration is:
[328]	valid_0's auc: 0.907941	valid_1's auc: 0.869961
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871540:  65%|######5   | 13/20 [03:00<01:20, 11.56s/it]

Early stopping, best iteration is:
[604]	valid_0's auc: 0.895837	valid_1's auc: 0.870894
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871540:  70%|#######   | 14/20 [03:12<01:10, 11.72s/it]

Early stopping, best iteration is:
[470]	valid_0's auc: 0.912101	valid_1's auc: 0.870287
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871540:  75%|#######5  | 15/20 [03:37<01:18, 15.66s/it]

Early stopping, best iteration is:
[3823]	valid_0's auc: 0.871593	valid_1's auc: 0.871435
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871540:  80%|########  | 16/20 [03:59<01:09, 17.45s/it]

Early stopping, best iteration is:
[3027]	valid_0's auc: 0.872707	valid_1's auc: 0.871262
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871540:  85%|########5 | 17/20 [04:10<00:46, 15.56s/it]

Early stopping, best iteration is:
[789]	valid_0's auc: 0.890677	valid_1's auc: 0.871486
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871540:  90%|######### | 18/20 [04:20<00:28, 14.03s/it]

Early stopping, best iteration is:
[638]	valid_0's auc: 0.894541	valid_1's auc: 0.871391
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871540:  95%|#########5| 19/20 [04:32<00:13, 13.30s/it]

Early stopping, best iteration is:
[781]	valid_0's auc: 0.895507	valid_1's auc: 0.871291
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.871540: 100%|##########| 20/20 [04:44<00:00, 14.25s/it]


Early stopping, best iteration is:
[337]	valid_0's auc: 0.91588	valid_1's auc: 0.869702


bagging, val_score: 0.871540:   0%|                      | 0/10 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871540:  10%|#4            | 1/10 [00:20<03:05, 20.58s/it]

Early stopping, best iteration is:
[2521]	valid_0's auc: 0.874034	valid_1's auc: 0.871494
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871540:  20%|##8           | 2/10 [00:40<02:42, 20.27s/it]

Early stopping, best iteration is:
[2480]	valid_0's auc: 0.87384	valid_1's auc: 0.871313
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871540:  30%|####2         | 3/10 [00:58<02:14, 19.15s/it]

Early stopping, best iteration is:
[2272]	valid_0's auc: 0.873832	valid_1's auc: 0.871365
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871540:  40%|#####6        | 4/10 [01:15<01:49, 18.25s/it]

Early stopping, best iteration is:
[2042]	valid_0's auc: 0.873859	valid_1's auc: 0.871022
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871540:  50%|#######       | 5/10 [01:37<01:38, 19.77s/it]

Early stopping, best iteration is:
[2758]	valid_0's auc: 0.874928	valid_1's auc: 0.871469
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871540:  60%|########4     | 6/10 [01:58<01:20, 20.19s/it]

Early stopping, best iteration is:
[2537]	valid_0's auc: 0.874333	valid_1's auc: 0.871422
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871540:  70%|#########7    | 7/10 [02:17<00:59, 19.76s/it]

Early stopping, best iteration is:
[2298]	valid_0's auc: 0.873402	valid_1's auc: 0.871331
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871540:  80%|###########2  | 8/10 [02:38<00:40, 20.24s/it]

Early stopping, best iteration is:
[2881]	valid_0's auc: 0.876127	valid_1's auc: 0.871333
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871540:  90%|############6 | 9/10 [03:01<00:20, 20.97s/it]

Early stopping, best iteration is:
[2814]	valid_0's auc: 0.875799	valid_1's auc: 0.871389
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.871540: 100%|#############| 10/10 [03:21<00:00, 20.16s/it]


Early stopping, best iteration is:
[2596]	valid_0's auc: 0.874969	valid_1's auc: 0.871267


feature_fraction_stage2, val_score: 0.871540:   0%|       | 0/6 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


feature_fraction_stage2, val_score: 0.871540:  17%|1| 1/6 [00:19<01:38, 19.70s/i

Early stopping, best iteration is:
[2676]	valid_0's auc: 0.874459	valid_1's auc: 0.871456
Training until validation scores don't improve for 150 rounds


feature_fraction_stage2, val_score: 0.871540:  33%|3| 2/6 [00:38<01:16, 19.15s/i

Early stopping, best iteration is:
[2592]	valid_0's auc: 0.874209	valid_1's auc: 0.87154
Training until validation scores don't improve for 150 rounds


feature_fraction_stage2, val_score: 0.871540:  50%|5| 3/6 [00:56<00:55, 18.56s/i

Early stopping, best iteration is:
[2449]	valid_0's auc: 0.873489	valid_1's auc: 0.8713
Training until validation scores don't improve for 150 rounds


feature_fraction_stage2, val_score: 0.871540:  67%|6| 4/6 [01:13<00:36, 18.02s/i

Early stopping, best iteration is:
[2320]	valid_0's auc: 0.873269	valid_1's auc: 0.871424
Training until validation scores don't improve for 150 rounds


feature_fraction_stage2, val_score: 0.871540:  83%|8| 5/6 [01:33<00:18, 18.72s/i

Early stopping, best iteration is:
[2717]	valid_0's auc: 0.874765	valid_1's auc: 0.871503
Training until validation scores don't improve for 150 rounds


feature_fraction_stage2, val_score: 0.871540: 100%|#| 6/6 [01:51<00:00, 18.50s/i


Early stopping, best iteration is:
[2506]	valid_0's auc: 0.873532	valid_1's auc: 0.871088


regularization_factors, val_score: 0.871540:   0%|       | 0/20 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871540:   5%| | 1/20 [00:19<06:01, 19.04s/i

Early stopping, best iteration is:
[2597]	valid_0's auc: 0.873987	valid_1's auc: 0.871361
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871540:  10%|1| 2/20 [00:41<06:15, 20.87s/i

Early stopping, best iteration is:
[3072]	valid_0's auc: 0.87259	valid_1's auc: 0.871463
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871540:  15%|1| 3/20 [01:01<05:49, 20.54s/i

Early stopping, best iteration is:
[2790]	valid_0's auc: 0.873067	valid_1's auc: 0.87152
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871540:  20%|2| 4/20 [01:28<06:07, 22.97s/i

Early stopping, best iteration is:
[3972]	valid_0's auc: 0.874922	valid_1's auc: 0.871509
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871586:  25%|2| 5/20 [01:55<06:07, 24.53s/i

Early stopping, best iteration is:
[3845]	valid_0's auc: 0.874664	valid_1's auc: 0.871586
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871586:  30%|3| 6/20 [02:16<05:28, 23.46s/i

Early stopping, best iteration is:
[3161]	valid_0's auc: 0.873561	valid_1's auc: 0.87156
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871586:  35%|3| 7/20 [02:41<05:09, 23.82s/i

Early stopping, best iteration is:
[3386]	valid_0's auc: 0.874007	valid_1's auc: 0.871551
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871586:  40%|4| 8/20 [03:02<04:35, 22.95s/i

Early stopping, best iteration is:
[3013]	valid_0's auc: 0.873399	valid_1's auc: 0.871501
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871586:  45%|4| 9/20 [03:23<04:04, 22.23s/i

Early stopping, best iteration is:
[2915]	valid_0's auc: 0.873352	valid_1's auc: 0.87151
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871586:  50%|5| 10/20 [03:48<03:52, 23.28s/

Early stopping, best iteration is:
[3586]	valid_0's auc: 0.874165	valid_1's auc: 0.87149
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871586:  55%|5| 11/20 [04:05<03:11, 21.33s/

Early stopping, best iteration is:
[2235]	valid_0's auc: 0.872794	valid_1's auc: 0.871308
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871586:  60%|6| 12/20 [04:24<02:45, 20.63s/

Early stopping, best iteration is:
[2692]	valid_0's auc: 0.874228	valid_1's auc: 0.871444
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871586:  65%|6| 13/20 [04:51<02:38, 22.64s/

Early stopping, best iteration is:
[3945]	valid_0's auc: 0.874774	valid_1's auc: 0.871512
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871586:  70%|7| 14/20 [05:10<02:09, 21.52s/

Early stopping, best iteration is:
[2668]	valid_0's auc: 0.874258	valid_1's auc: 0.871397
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871586:  75%|7| 15/20 [05:30<01:44, 20.86s/

Early stopping, best iteration is:
[2706]	valid_0's auc: 0.873873	valid_1's auc: 0.871517
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871586:  80%|8| 16/20 [05:48<01:20, 20.11s/

Early stopping, best iteration is:
[2553]	valid_0's auc: 0.873981	valid_1's auc: 0.871439
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871586:  85%|8| 17/20 [06:07<00:59, 19.89s/

Early stopping, best iteration is:
[2713]	valid_0's auc: 0.873357	valid_1's auc: 0.871513
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871586:  90%|9| 18/20 [06:25<00:38, 19.26s/

Early stopping, best iteration is:
[2515]	valid_0's auc: 0.873307	valid_1's auc: 0.871447
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871586:  95%|9| 19/20 [06:45<00:19, 19.39s/

Early stopping, best iteration is:
[2690]	valid_0's auc: 0.87375	valid_1's auc: 0.871469
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.871586: 100%|#| 20/20 [07:07<00:00, 21.35s/


Early stopping, best iteration is:
[2956]	valid_0's auc: 0.874098	valid_1's auc: 0.871373


min_data_in_leaf, val_score: 0.871586:   0%|              | 0/5 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.871586:  20%|#2    | 1/5 [00:22<01:29, 22.47s/it]

Early stopping, best iteration is:
[3058]	valid_0's auc: 0.873137	valid_1's auc: 0.871377
Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.871586:  40%|##4   | 2/5 [00:43<01:04, 21.43s/it]

Early stopping, best iteration is:
[2915]	valid_0's auc: 0.87304	valid_1's auc: 0.871467
Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.871586:  60%|###6  | 3/5 [01:05<00:43, 21.99s/it]

Early stopping, best iteration is:
[3115]	valid_0's auc: 0.873375	valid_1's auc: 0.871397
Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.871586:  80%|####8 | 4/5 [01:31<00:23, 23.35s/it]

Early stopping, best iteration is:
[3520]	valid_0's auc: 0.87406	valid_1's auc: 0.8715
Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.871586: 100%|######| 5/5 [01:56<00:00, 23.34s/it]

Early stopping, best iteration is:
[3526]	valid_0's auc: 0.874169	valid_1's auc: 0.871439





Best params: {'objective': 'binary', 'metric': 'auc', 'verbosity': -1, 'boosting_type': 'gbdt', 'learning_rate': 0.005603627873630697, 'feature_pre_filter': False, 'lambda_l1': 0.0006278595123135671, 'lambda_l2': 8.715856738389705, 'num_leaves': 6, 'feature_fraction': 0.5, 'bagging_fraction': 1.0, 'bagging_freq': 0, 'min_child_samples': 20, 'num_iterations': 5926}
CPU times: user 1h 1min 16s, sys: 7min 24s, total: 1h 8min 40s
Wall time: 20min 18s
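
Each "best iteration" line in the log reports two AUC values. Assuming the names follow the order of `valid_sets=[dtrain, dval]` in the cell above, `valid_0` is the training data and `valid_1` the validation data, so their difference gives a rough view of how much each trial overfits. A minimal sketch (the `auc_gaps` helper is hypothetical, and the sample lines are copied from the log above):

```python
import re

# Best-iteration lines copied from the ensemble-features log above
# (fields are tab-separated in the original output).
LOG = (
    "[257]\tvalid_0's auc: 0.916226\tvalid_1's auc: 0.869012\n"
    "[2592]\tvalid_0's auc: 0.874209\tvalid_1's auc: 0.87154\n"
    "[3845]\tvalid_0's auc: 0.874664\tvalid_1's auc: 0.871586\n"
)

PATTERN = re.compile(
    r"^\[(\d+)\]\s+valid_0's auc: ([0-9.]+)\s+valid_1's auc: ([0-9.]+)"
)


def auc_gaps(log_text):
    """Return (iteration, train_auc, val_auc, train_auc - val_auc) per matching line."""
    rows = []
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if match:
            iteration = int(match.group(1))
            train_auc = float(match.group(2))
            val_auc = float(match.group(3))
            rows.append((iteration, train_auc, val_auc, train_auc - val_auc))
    return rows


gaps = auc_gaps(LOG)
for iteration, train_auc, val_auc, gap in gaps:
    print(f"iter {iteration:>5}: train={train_auc:.6f} val={val_auc:.6f} gap={gap:+.6f}")
```

In these three sample lines, the trial that stopped at iteration 257 has both the largest train/validation gap and the lowest validation AUC.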


In [14]:
ensemble_features_best_params

{'objective': 'binary',
 'metric': 'auc',
 'verbosity': -1,
 'boosting_type': 'gbdt',
 'learning_rate': 0.005603627873630697,
 'feature_pre_filter': False,
 'lambda_l1': 0.0006278595123135671,
 'lambda_l2': 8.715856738389705,
 'num_leaves': 6,
 'feature_fraction': 0.5,
 'bagging_fraction': 1.0,
 'bagging_freq': 0,
 'min_child_samples': 20,
 'num_iterations': 5926}
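
To see which settings the tuner changed between feature sets, the two dictionaries printed so far can be compared key by key. The sketch below copies the values from the outputs above; the `diff_params` helper is hypothetical and not part of the original notebook:

```python
# Values copied from the forward-selection and ensemble outputs above.
fw_features_best_params = {
    "objective": "binary", "metric": "auc", "verbosity": -1,
    "boosting_type": "gbdt", "learning_rate": 0.005603627873630697,
    "feature_pre_filter": False,
    "lambda_l1": 4.1326356980799586e-05, "lambda_l2": 0.6016241539010313,
    "num_leaves": 18, "feature_fraction": 0.4,
    "bagging_fraction": 0.9403276152498784, "bagging_freq": 4,
    "min_child_samples": 50, "num_iterations": 5926,
}
ensemble_features_best_params = {
    "objective": "binary", "metric": "auc", "verbosity": -1,
    "boosting_type": "gbdt", "learning_rate": 0.005603627873630697,
    "feature_pre_filter": False,
    "lambda_l1": 0.0006278595123135671, "lambda_l2": 8.715856738389705,
    "num_leaves": 6, "feature_fraction": 0.5,
    "bagging_fraction": 1.0, "bagging_freq": 0,
    "min_child_samples": 20, "num_iterations": 5926,
}


def diff_params(left, right):
    """Return {key: (left_value, right_value)} for keys whose values differ."""
    keys = sorted(set(left) | set(right))
    return {
        key: (left.get(key), right.get(key))
        for key in keys
        if left.get(key) != right.get(key)
    }


changed = diff_params(fw_features_best_params, ensemble_features_best_params)
for key, (fw_value, ens_value) in changed.items():
    print(f"{key:<18} fw={fw_value!r:<24} ensemble={ens_value!r}")
```

Only the seven tuned hyperparameters differ; the fixed entries (objective, metric, learning rate, number of iterations) are identical in both runs.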

### Base model features

In [12]:
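%%time
# Assumes `lgb` is Optuna's LightGBM integration (optuna.integration.lightgbm) and
# `early_stopping` is lightgbm's callback, both imported earlier in the notebook.
dtrain = lgbm.Dataset(train[base_features], label=train[target_column])
dval = lgb.Dataset(test[base_features], label=test[target_column])

# Fixed settings; the tuner searches the remaining hyperparameters stage by stage.
params = {
    "objective": "binary",
    "metric": "auc",
    "verbosity": -1,
    "boosting_type": "gbdt",
    "n_estimators": 5926,
    "learning_rate": 0.005603627873630697,
}

bst = lgb.train(
    params,
    dtrain,
    valid_sets=[dtrain, dval],
    callbacks=[early_stopping(150)],  # stop after 150 rounds without improvement
    show_progress_bar=True,
)

# AUC of the best booster on the held-out set
preds = bst.predict(test[base_features], num_iteration=bst.best_iteration)
score = roc_auc_score(test[target_column], preds)

base_features_best_params = bst.params
print("Best params:", base_features_best_params)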



feature_fraction, val_score: -inf:   0%|                  | 0/7 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.871848:  14%|8     | 1/7 [00:07<00:46,  7.69s/it]

Early stopping, best iteration is:
[734]	valid_0's auc: 0.882724	valid_1's auc: 0.871848
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.871848:  29%|#7    | 2/7 [00:18<00:48,  9.75s/it]

Early stopping, best iteration is:
[1180]	valid_0's auc: 0.8889	valid_1's auc: 0.871635
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.872378:  43%|##5   | 3/7 [00:27<00:37,  9.30s/it]

Early stopping, best iteration is:
[784]	valid_0's auc: 0.881582	valid_1's auc: 0.872378
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.872378:  57%|###4  | 4/7 [00:35<00:26,  8.77s/it]

Early stopping, best iteration is:
[742]	valid_0's auc: 0.883057	valid_1's auc: 0.872273
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.872378:  71%|####2 | 5/7 [00:43<00:17,  8.52s/it]

Early stopping, best iteration is:
[748]	valid_0's auc: 0.883036	valid_1's auc: 0.872161
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.872378:  86%|#####1| 6/7 [01:00<00:11, 11.40s/it]

Early stopping, best iteration is:
[1797]	valid_0's auc: 0.892604	valid_1's auc: 0.871717
Training until validation scores don't improve for 150 rounds


feature_fraction, val_score: 0.872378: 100%|######| 7/7 [01:05<00:00,  9.41s/it]


Early stopping, best iteration is:
[410]	valid_0's auc: 0.875246	valid_1's auc: 0.872178


num_leaves, val_score: 0.872378:   0%|                   | 0/20 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872378:   5%|5          | 1/20 [00:06<02:04,  6.57s/it]

Early stopping, best iteration is:
[428]	valid_0's auc: 0.881718	valid_1's auc: 0.872274
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872488:  10%|#1         | 2/20 [00:15<02:27,  8.18s/it]

Early stopping, best iteration is:
[784]	valid_0's auc: 0.885125	valid_1's auc: 0.872488
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872488:  15%|#6         | 3/20 [00:24<02:24,  8.53s/it]

Early stopping, best iteration is:
[273]	valid_0's auc: 0.897734	valid_1's auc: 0.870867
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872488:  20%|##2        | 4/20 [00:44<03:29, 13.07s/it]

Early stopping, best iteration is:
[3497]	valid_0's auc: 0.870933	valid_1's auc: 0.870849
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872488:  25%|##7        | 5/20 [00:56<03:10, 12.72s/it]

Early stopping, best iteration is:
[273]	valid_0's auc: 0.911002	valid_1's auc: 0.870297
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872488:  30%|###3       | 6/20 [01:05<02:39, 11.42s/it]

Early stopping, best iteration is:
[405]	valid_0's auc: 0.892328	valid_1's auc: 0.871587
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872488:  35%|###8       | 7/20 [01:14<02:17, 10.56s/it]

Early stopping, best iteration is:
[273]	valid_0's auc: 0.897088	valid_1's auc: 0.870814
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872569:  40%|####4      | 8/20 [01:24<02:02, 10.20s/it]

Early stopping, best iteration is:
[1035]	valid_0's auc: 0.877329	valid_1's auc: 0.872569
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872569:  45%|####9      | 9/20 [01:39<02:08, 11.68s/it]

Early stopping, best iteration is:
[2422]	valid_0's auc: 0.870721	valid_1's auc: 0.871389
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872569:  50%|#####     | 10/20 [01:47<01:45, 10.58s/it]

Early stopping, best iteration is:
[427]	valid_0's auc: 0.888431	valid_1's auc: 0.871896
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872569:  55%|#####5    | 11/20 [01:55<01:29,  9.96s/it]

Early stopping, best iteration is:
[648]	valid_0's auc: 0.884547	valid_1's auc: 0.872399
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872569:  60%|######    | 12/20 [02:03<01:13,  9.16s/it]

Early stopping, best iteration is:
[273]	valid_0's auc: 0.890023	valid_1's auc: 0.871458
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872569:  65%|######5   | 13/20 [02:11<01:02,  8.95s/it]

Early stopping, best iteration is:
[649]	valid_0's auc: 0.883736	valid_1's auc: 0.872419
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872569:  70%|#######   | 14/20 [02:12<00:39,  6.67s/it]

Early stopping, best iteration is:
[293]	valid_0's auc: 0.848058	valid_1's auc: 0.854575
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872569:  75%|#######5  | 15/20 [02:19<00:33,  6.68s/it]

Early stopping, best iteration is:
[428]	valid_0's auc: 0.880804	valid_1's auc: 0.872309
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872569:  80%|########  | 16/20 [02:26<00:26,  6.62s/it]

Early stopping, best iteration is:
[427]	valid_0's auc: 0.879816	valid_1's auc: 0.872296
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872569:  85%|########5 | 17/20 [02:34<00:21,  7.06s/it]

Early stopping, best iteration is:
[427]	valid_0's auc: 0.888431	valid_1's auc: 0.871896
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872569:  90%|######### | 18/20 [02:42<00:14,  7.46s/it]

Early stopping, best iteration is:
[776]	valid_0's auc: 0.879234	valid_1's auc: 0.872438
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872569:  95%|#########5| 19/20 [02:52<00:08,  8.32s/it]

Early stopping, best iteration is:
[273]	valid_0's auc: 0.904286	valid_1's auc: 0.870736
Training until validation scores don't improve for 150 rounds


num_leaves, val_score: 0.872569: 100%|##########| 20/20 [03:01<00:00,  9.07s/it]


Early stopping, best iteration is:
[776]	valid_0's auc: 0.880415	valid_1's auc: 0.872378


bagging, val_score: 0.872569:   0%|                      | 0/10 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.872569:  10%|#4            | 1/10 [00:08<01:15,  8.36s/it]

Early stopping, best iteration is:
[883]	valid_0's auc: 0.875784	valid_1's auc: 0.87221
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.872569:  20%|##8           | 2/10 [00:18<01:15,  9.49s/it]

Early stopping, best iteration is:
[1170]	valid_0's auc: 0.878803	valid_1's auc: 0.872524
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.872619:  30%|####2         | 3/10 [00:29<01:11, 10.17s/it]

Early stopping, best iteration is:
[1282]	valid_0's auc: 0.879823	valid_1's auc: 0.872619
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.872619:  40%|#####6        | 4/10 [00:39<00:59,  9.93s/it]

Early stopping, best iteration is:
[1061]	valid_0's auc: 0.877561	valid_1's auc: 0.872521
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.872619:  50%|#######       | 5/10 [00:50<00:51, 10.38s/it]

Early stopping, best iteration is:
[1239]	valid_0's auc: 0.879387	valid_1's auc: 0.872553
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.872619:  60%|########4     | 6/10 [00:59<00:40, 10.08s/it]

Early stopping, best iteration is:
[1050]	valid_0's auc: 0.877536	valid_1's auc: 0.872597
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.872654:  70%|#########7    | 7/10 [01:09<00:29,  9.78s/it]

Early stopping, best iteration is:
[1011]	valid_0's auc: 0.87704	valid_1's auc: 0.872654
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.872654:  80%|###########2  | 8/10 [01:19<00:19,  9.99s/it]

Early stopping, best iteration is:
[1039]	valid_0's auc: 0.877629	valid_1's auc: 0.872556
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.872654:  90%|############6 | 9/10 [01:27<00:09,  9.45s/it]

Early stopping, best iteration is:
[784]	valid_0's auc: 0.874425	valid_1's auc: 0.872441
Training until validation scores don't improve for 150 rounds


bagging, val_score: 0.872654: 100%|#############| 10/10 [01:36<00:00,  9.70s/it]


Early stopping, best iteration is:
[928]	valid_0's auc: 0.875958	valid_1's auc: 0.872526


feature_fraction_stage2, val_score: 0.872654:   0%|       | 0/6 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


feature_fraction_stage2, val_score: 0.872654:  17%|1| 1/6 [00:09<00:46,  9.25s/i

Early stopping, best iteration is:
[1011]	valid_0's auc: 0.87704	valid_1's auc: 0.872654
Training until validation scores don't improve for 150 rounds


feature_fraction_stage2, val_score: 0.872654:  33%|3| 2/6 [00:26<00:56, 14.21s/i

Early stopping, best iteration is:
[2143]	valid_0's auc: 0.883634	valid_1's auc: 0.872078
Training until validation scores don't improve for 150 rounds


feature_fraction_stage2, val_score: 0.872654:  50%|5| 3/6 [00:36<00:35, 11.96s/i

Early stopping, best iteration is:
[1011]	valid_0's auc: 0.87704	valid_1's auc: 0.872654
Training until validation scores don't improve for 150 rounds


feature_fraction_stage2, val_score: 0.872656:  67%|6| 4/6 [00:45<00:21, 10.97s/i

Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656
Training until validation scores don't improve for 150 rounds


feature_fraction_stage2, val_score: 0.872656:  83%|8| 5/6 [00:54<00:10, 10.30s/i

Early stopping, best iteration is:
[1011]	valid_0's auc: 0.87704	valid_1's auc: 0.872654
Training until validation scores don't improve for 150 rounds


feature_fraction_stage2, val_score: 0.872656: 100%|#| 6/6 [01:04<00:00, 10.67s/i


Early stopping, best iteration is:
[1011]	valid_0's auc: 0.87704	valid_1's auc: 0.872654


regularization_factors, val_score: 0.872656:   0%|       | 0/20 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656:   5%| | 1/20 [00:09<03:02,  9.61s/i

Early stopping, best iteration is:
[1014]	valid_0's auc: 0.877685	valid_1's auc: 0.872648
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656:  10%|1| 2/20 [00:18<02:50,  9.45s/i

Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656:  15%|1| 3/20 [00:28<02:38,  9.32s/i

Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656:  20%|2| 4/20 [00:37<02:28,  9.30s/i

Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656:  25%|2| 5/20 [00:46<02:19,  9.29s/i

Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656:  30%|3| 6/20 [00:56<02:10,  9.31s/i

Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656:  35%|3| 7/20 [01:05<02:01,  9.34s/i

Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656:  40%|4| 8/20 [01:14<01:51,  9.32s/i

Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656:  45%|4| 9/20 [01:24<01:43,  9.39s/i

Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656:  50%|5| 10/20 [01:33<01:33,  9.37s/

Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656:  55%|5| 11/20 [01:42<01:24,  9.35s/

Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656:  60%|6| 12/20 [01:52<01:14,  9.34s/

Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656:  65%|6| 13/20 [02:01<01:05,  9.41s/

Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656:  70%|7| 14/20 [02:11<00:56,  9.44s/

Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656:  75%|7| 15/20 [02:20<00:47,  9.46s/

Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656:  80%|8| 16/20 [02:30<00:37,  9.40s/

Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656:  85%|8| 17/20 [02:39<00:28,  9.46s/

Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656:  90%|9| 18/20 [02:49<00:18,  9.46s/

Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656:  95%|9| 19/20 [02:58<00:09,  9.36s/

Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656
Training until validation scores don't improve for 150 rounds


regularization_factors, val_score: 0.872656: 100%|#| 20/20 [03:07<00:00,  9.38s/


Early stopping, best iteration is:
[1037]	valid_0's auc: 0.878109	valid_1's auc: 0.872656


min_data_in_leaf, val_score: 0.872656:   0%|              | 0/5 [00:00<?, ?it/s]

Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.872656:  20%|#2    | 1/5 [00:09<00:39,  9.85s/it]

Early stopping, best iteration is:
[1085]	valid_0's auc: 0.880248	valid_1's auc: 0.872429
Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.872656:  40%|##4   | 2/5 [00:19<00:29,  9.70s/it]

Early stopping, best iteration is:
[1040]	valid_0's auc: 0.877813	valid_1's auc: 0.872557
Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.872656:  60%|###6  | 3/5 [00:28<00:18,  9.48s/it]

Early stopping, best iteration is:
[1040]	valid_0's auc: 0.878923	valid_1's auc: 0.872598
Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.872656:  80%|####8 | 4/5 [00:38<00:09,  9.44s/it]

Early stopping, best iteration is:
[1028]	valid_0's auc: 0.877129	valid_1's auc: 0.872474
Training until validation scores don't improve for 150 rounds


min_data_in_leaf, val_score: 0.872656: 100%|######| 5/5 [00:48<00:00,  9.68s/it]

Early stopping, best iteration is:
[1117]	valid_0's auc: 0.877538	valid_1's auc: 0.8726





Best params: {'objective': 'binary', 'metric': 'auc', 'verbosity': -1, 'boosting_type': 'gbdt', 'learning_rate': 0.005603627873630697, 'feature_pre_filter': False, 'lambda_l1': 2.1179144447747353e-08, 'lambda_l2': 3.393255954585769e-08, 'num_leaves': 18, 'feature_fraction': 0.58, 'bagging_fraction': 0.9991117695058381, 'bagging_freq': 1, 'min_child_samples': 20, 'num_iterations': 5926}
CPU times: user 27min 44s, sys: 3min 47s, total: 31min 31s
Wall time: 10min 44s
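
The log above shows the search advancing one stage at a time (`num_leaves`, `bagging`, `feature_fraction_stage2`, `regularization_factors`, `min_data_in_leaf`), with the best `val_score` carried forward between stages. The sketch below imitates that greedy, stage-by-stage idea on a toy objective in plain Python. It is not Optuna's implementation: `toy_score`, the candidate grids, and `stepwise_search` are hypothetical stand-ins chosen only for illustration.

```python
def toy_score(params):
    """Hypothetical stand-in for a validation AUC; higher is better."""
    return (
        0.87
        - 0.00001 * (params["num_leaves"] - 18) ** 2
        - 0.01 * (params["feature_fraction"] - 0.6) ** 2
        - 0.0001 * abs(params["min_child_samples"] - 20)
    )


def stepwise_search(base_params, stages, score_fn):
    """Tune one parameter per stage, keeping earlier winners fixed."""
    best_params = dict(base_params)
    best_score = score_fn(best_params)
    for name, candidates in stages:
        for value in candidates:
            trial = {**best_params, name: value}
            score = score_fn(trial)
            # Keep a candidate only if it beats the best score so far.
            if score > best_score:
                best_params, best_score = trial, score
    return best_params, best_score


# Assumed starting point and candidate grids (illustrative values only).
base = {"num_leaves": 31, "feature_fraction": 1.0, "min_child_samples": 50}
stages = [
    ("feature_fraction", [0.4, 0.6, 0.8, 1.0]),
    ("num_leaves", [8, 18, 64, 128]),
    ("min_child_samples", [5, 10, 20, 50, 100]),
]

best_params, best_score = stepwise_search(base, stages, toy_score)
print(best_params, round(best_score, 6))
```

Because each stage only accepts a strict improvement, the best score never decreases from one stage to the next, which matches how `val_score` behaves in the progress bars.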


In [13]:
base_features_best_params

{'objective': 'binary',
 'metric': 'auc',
 'verbosity': -1,
 'boosting_type': 'gbdt',
 'learning_rate': 0.005603627873630697,
 'feature_pre_filter': False,
 'lambda_l1': 2.1179144447747353e-08,
 'lambda_l2': 3.393255954585769e-08,
 'num_leaves': 18,
 'feature_fraction': 0.58,
 'bagging_fraction': 0.9991117695058381,
 'bagging_freq': 1,
 'min_child_samples': 20,
 'num_iterations': 5926}
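
The dictionary above mixes settings that were held fixed with values chosen by the stepwise search. As a rough sketch, the helper below separates the two groups, assuming the tuned keys are the seven listed in the Methodology section. `TUNED_KEYS`, `split_params`, and `example_params` are illustrative names, not part of the notebook.

```python
# Illustrative helper; assumes the seven tuned keys from the Methodology section.
TUNED_KEYS = (
    "lambda_l1",
    "lambda_l2",
    "num_leaves",
    "feature_fraction",
    "bagging_fraction",
    "bagging_freq",
    "min_child_samples",
)


def split_params(params, tuned_keys=TUNED_KEYS):
    """Return (tuned, fixed) dictionaries from a LightGBM parameter dict."""
    tuned = {k: v for k, v in params.items() if k in tuned_keys}
    fixed = {k: v for k, v in params.items() if k not in tuned_keys}
    return tuned, fixed


# Values copied from the cell output above.
example_params = {
    "objective": "binary",
    "metric": "auc",
    "verbosity": -1,
    "boosting_type": "gbdt",
    "learning_rate": 0.005603627873630697,
    "feature_pre_filter": False,
    "lambda_l1": 2.1179144447747353e-08,
    "lambda_l2": 3.393255954585769e-08,
    "num_leaves": 18,
    "feature_fraction": 0.58,
    "bagging_fraction": 0.9991117695058381,
    "bagging_freq": 1,
    "min_child_samples": 20,
    "num_iterations": 5926,
}

tuned, fixed = split_params(example_params)
print("tuned:", sorted(tuned))
print("fixed:", sorted(fixed))
```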

### Model training

In [17]:
fw_logs = model_pipeline(
    train_df=train_df,
    validation_df=validation_df,
    params=params_fw,
    target_column=target_column,
    features=fw_features,
    cv=3,
    random_state=42,
    apply_shap=False,
    save_estimator_path=None,
)

2023-10-08T18:24:30 | INFO | Starting pipeline: Generating 3 k-fold training...
2023-10-08T18:24:30 | INFO | Training for fold 1
2023-10-08T18:24:58 | INFO | Training for fold 2
2023-10-08T18:25:26 | INFO | Training for fold 3
2023-10-08T18:25:55 | INFO | CV training finished!
2023-10-08T18:25:55 | INFO | Training the model in the full dataset...
2023-10-08T18:26:28 | INFO | Training process finished!
2023-10-08T18:26:28 | INFO | Calculating metrics...
2023-10-08T18:26:28 | INFO | Full process finished in 1.96 minutes.
2023-10-08T18:26:28 | INFO | Saving the predict function.
2023-10-08T18:26:28 | INFO | Predict function saved.


In [18]:
ensemble_logs = model_pipeline(
    train_df=train_df,
    validation_df=validation_df,
    params=test_params,
    target_column=target_column,
    features=ensemble_features,
    cv=3,
    random_state=42,
    apply_shap=False,
    save_estimator_path=None,
)

2023-10-08T18:26:28 | INFO | Starting pipeline: Generating 3 k-fold training...
2023-10-08T18:26:28 | INFO | Training for fold 1
2023-10-08T18:26:46 | INFO | Training for fold 2
2023-10-08T18:27:04 | INFO | Training for fold 3
2023-10-08T18:27:22 | INFO | CV training finished!
2023-10-08T18:27:22 | INFO | Training the model in the full dataset...
2023-10-08T18:27:42 | INFO | Training process finished!
2023-10-08T18:27:42 | INFO | Calculating metrics...
2023-10-08T18:27:42 | INFO | Full process finished in 1.24 minutes.
2023-10-08T18:27:42 | INFO | Saving the predict function.
2023-10-08T18:27:42 | INFO | Predict function saved.


In [19]:
all_logs = model_pipeline(
    train_df=train_df,
    validation_df=validation_df,
    params=params_all,
    target_column=target_column,
    features=all_features_list,
    cv=3,
    random_state=42,
    apply_shap=False,
    save_estimator_path=None,
)

2023-10-08T18:27:43 | INFO | Starting pipeline: Generating 3 k-fold training...
2023-10-08T18:27:43 | INFO | Training for fold 1
2023-10-08T18:28:21 | INFO | Training for fold 2
2023-10-08T18:29:01 | INFO | Training for fold 3
2023-10-08T18:29:41 | INFO | CV training finished!
2023-10-08T18:29:41 | INFO | Training the model in the full dataset...
2023-10-08T18:30:30 | INFO | Training process finished!
2023-10-08T18:30:30 | INFO | Calculating metrics...
2023-10-08T18:30:30 | INFO | Full process finished in 2.80 minutes.
2023-10-08T18:30:30 | INFO | Saving the predict function.
2023-10-08T18:30:30 | INFO | Predict function saved.


In [20]:
boruta_logs = model_pipeline(
    train_df=train_df,
    validation_df=validation_df,
    params=MODEL_PARAMS,
    target_column=target_column,
    features=boruta_features,
    cv=3,
    random_state=42,
    apply_shap=False,
    save_estimator_path=None,
)

2023-10-08T18:30:30 | INFO | Starting pipeline: Generating 3 k-fold training...
2023-10-08T18:30:30 | INFO | Training for fold 1
2023-10-08T18:32:08 | INFO | Training for fold 2
2023-10-08T18:33:47 | INFO | Training for fold 3
2023-10-08T18:35:26 | INFO | CV training finished!
2023-10-08T18:35:26 | INFO | Training the model in the full dataset...
2023-10-08T18:37:47 | INFO | Training process finished!
2023-10-08T18:37:47 | INFO | Calculating metrics...
2023-10-08T18:37:47 | INFO | Full process finished in 7.28 minutes.
2023-10-08T18:37:47 | INFO | Saving the predict function.
2023-10-08T18:37:47 | INFO | Predict function saved.


In [22]:
base_logs = model_pipeline(
    train_df=train_df,
    validation_df=validation_df,
    params=params_original,
    target_column=target_column,
    features=base_features,
    cv=3,
    random_state=42,
    apply_shap=False,
    save_estimator_path=None,
)

2023-10-08T18:41:34 | INFO | Starting pipeline: Generating 3 k-fold training...
2023-10-08T18:41:34 | INFO | Training for fold 1
2023-10-08T18:41:56 | INFO | Training for fold 2
2023-10-08T18:42:17 | INFO | Training for fold 3
2023-10-08T18:42:38 | INFO | CV training finished!
2023-10-08T18:42:38 | INFO | Training the model in the full dataset...
2023-10-08T18:43:04 | INFO | Training process finished!
2023-10-08T18:43:04 | INFO | Calculating metrics...
2023-10-08T18:43:04 | INFO | Full process finished in 1.49 minutes.
2023-10-08T18:43:04 | INFO | Saving the predict function.
2023-10-08T18:43:04 | INFO | Predict function saved.


### Model evaluation

In [23]:
# Collect each model's ROC AUC metrics, keyed by "<name> [<number of features>]".
model_metrics = {}
models = [base_logs, all_logs, fw_logs, boruta_logs, ensemble_logs]
names = ["base_features", "all features", "fw", "boruta", "ensemble"]
sizes = [
    len(base_features), len(all_features_list), len(fw_features),
    len(boruta_features), len(ensemble_features),
]

for model, name, size in zip(models, names, sizes):
    model_metrics[f"{name} [{size}]"] = model["metrics"]["roc_auc"]

# One row per model, best validation score first.
pd.DataFrame(model_metrics).T.sort_values(by="validation", ascending=False)

| Model               | out_of_fold | validation |
|---------------------|-------------|------------|
| fw [32]             | 0.863464    | 0.867463   |
| boruta [46]         | 0.864788    | 0.867128   |
| all features [115]  | 0.862991    | 0.867067   |
| ensemble [55]       | 0.862760    | 0.865526   |
| base_features [10]  | 0.864081    | 0.865510   |
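
To read these results at a glance, the gap between validation and out-of-fold AUC, and the spread of validation AUC across feature sets, can be computed directly. This is a minimal sketch in plain Python with the scores copied from the results above; the `scores` dictionary and the variable names are illustrative and not part of the notebook.

```python
# ROC AUC scores copied from the results above: (out_of_fold, validation).
scores = {
    "fw [32]": (0.863464, 0.867463),
    "boruta [46]": (0.864788, 0.867128),
    "all features [115]": (0.862991, 0.867067),
    "ensemble [55]": (0.862760, 0.865526),
    "base_features [10]": (0.864081, 0.865510),
}

# Gap between validation and out-of-fold AUC for each feature set.
gaps = {name: round(val - oof, 6) for name, (oof, val) in scores.items()}

# Spread of validation AUC between the best and worst feature set.
validation = [val for _, val in scores.values()]
spread = round(max(validation) - min(validation), 6)

# Feature set with the highest validation AUC.
best = max(scores, key=lambda name: scores[name][1])

for name, gap in gaps.items():
    print(f"{name:<20} gap = {gap:+.6f}")
print(f"best on validation: {best}, spread = {spread:.6f}")
```

On these numbers the best and worst feature sets differ by roughly 0.002 in validation AUC, so the ranking should be read with that small margin in mind.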
