# Hyperparameter Optimization

## Rationale:
Hyperparameter optimization is a critical step in machine learning model development. Properly tuned hyperparameters can significantly improve a model's performance and generalization. The choice of hyperparameters depends on the specific machine learning algorithm and dataset. In this study, we aim to optimize the hyperparameters of a LightGBM model, a popular gradient boosting framework, to enhance its predictive power.

## Methodology:
We ran the hyperparameter search with the Optuna library, an open-source framework for automated hyperparameter tuning. The study involved the following steps:

1. **Baseline Model**: We started by training a baseline LightGBM model with default hyperparameters. This served as a reference point for evaluating the impact of hyperparameter optimization.

2. **Optuna Integration**: We used Optuna to search for better hyperparameters with its Tree-structured Parzen Estimator sampler (`TPESampler`). The custom objective function trains a model on 80% of the training data and scores it by ROC-AUC on the remaining 20% hold-out split.

3. **Parameter Search Space**: We specified a search space covering the boosting type, number of estimators, learning rate, number of leaves, bagging frequency, minimum child samples, subsample ratio, feature fraction (`colsample_bytree`), and the L1 and L2 regularization terms. Optuna sampled from this space to find the best combination of hyperparameters.

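4. **Pruning**: To speed up the search, we configured the study with Optuna's `HyperbandPruner`. A pruner terminates poorly performing trials early and saves computational resources, but it only acts when the objective reports intermediate scores with `trial.report` and checks `trial.should_prune`. The objectives in this notebook return a single final score, so in practice no trial is stopped early.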

5. **Integration with LightGBM**: In addition to the custom objective, we tried Optuna's LightGBM integration (`optuna.integration.lightgbm`), whose `train` function tunes key LightGBM hyperparameters step by step. The aim was to streamline the tuning process and potentially achieve better results.

6. **Comparison**: We compared three models: the baseline model, a model tuned with the custom Optuna objective, and a model tuned with Optuna's LightGBM integration. Each was evaluated by ROC-AUC, both out-of-fold (3-fold cross-validation) and on a separate validation set.
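
The pruning step can be illustrated with successive halving, the building block that Hyperband repeats with different starting budgets. The sketch below is a simplified, standard-library illustration and not Optuna's implementation: `toy_score` is a hypothetical stand-in for a validation score after a given number of boosting rounds, and the names `successive_halving`, `min_budget`, and `eta` are chosen here for illustration only.

```python
import random


def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta of the configs at each rung and multiply the budget by eta."""
    survivors = list(configs)
    budget = min_budget
    history = []  # (budget, number of configs evaluated at that budget)
    while True:
        scored = [(evaluate(cfg, budget), cfg) for cfg in survivors]
        history.append((budget, len(survivors)))
        scored.sort(key=lambda pair: pair[0], reverse=True)  # higher score is better
        if len(survivors) == 1:
            return scored[0][1], history
        keep = max(1, len(survivors) // eta)
        survivors = [cfg for _, cfg in scored[:keep]]  # the rest are "pruned"
        budget *= eta


def toy_score(learning_rate, budget):
    """Hypothetical validation score after `budget` rounds (not a real model)."""
    return 1.0 - abs(learning_rate - 0.03) - 1.0 / (10 * budget)


rng = random.Random(0)
candidates = [10 ** rng.uniform(-3, -1) for _ in range(27)]
best, history = successive_halving(candidates, toy_score)
print(history)  # [(1, 27), (3, 9), (9, 3), (27, 1)]
print(best)
```

Most configurations only ever receive the smallest budget, which is where the savings come from. In Optuna, the equivalent decision is made from the intermediate values an objective passes to `trial.report`.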

## Conclusions:
The comparison table at the end of this notebook gives the following picture (all scores are ROC-AUC):

- The baseline model reached 0.864468 out-of-fold and 0.86672 on the validation set.
- The model tuned with the custom Optuna objective scored 0.864924 out-of-fold and 0.866677 on validation, essentially level with the baseline.
- The model tuned with Optuna's LightGBM integration had the best validation score, 0.867128, about 0.0004 above the baseline, with 0.864788 out-of-fold.
- Tuning was far more expensive than the gain suggests: the stepwise search took about 4.5 hours of wall time, and each tuned model took about 7 minutes to fit in the pipeline, against less than 1 minute for the baseline.

Overall, automated tuning with Optuna produced only marginal gains over the baseline on this dataset: the three models differ in the fourth decimal place of ROC-AUC. Whether that gain justifies the additional compute depends on the application.


In [1]:
%load_ext autoreload
%autoreload 2

import numpy as np
import pandas as pd

import lightgbm as lgbm
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from copy import deepcopy

from sklearn.model_selection import KFold, cross_val_score
from sklearn.metrics import roc_auc_score

import warnings
warnings.filterwarnings("ignore")

from optuna import visualization as optunaviz
import optuna

import sys
sys.path.append("../")

# local imports
from src.learner_params import target_column, test_params, monotone_const_dict
from utils.functions__utils import find_constraint

from utils.feature_selection_lists import fw_features, boruta_features, optuna_features, ensemble_features

from utils.functions__training import model_pipeline

In [2]:
train_df = pd.read_pickle("../data/train_df.pkl")
validation_df = pd.read_pickle("../data/validation_df.pkl")

In [3]:
from optuna.pruners import HyperbandPruner
from optuna.samplers import TPESampler
from optuna.logging import set_verbosity


def objective(trial):
    """Train LightGBM with trial-sampled parameters and return the hold-out ROC-AUC."""
    train, test = train_test_split(train_df, random_state=42, test_size=.2)
    dtrain = lgbm.Dataset(train[boruta_features],
                          label=train[target_column])

    params = {
        'verbose':-1,
        'objective':"binary",
        'metric':"binary_logloss",
        "monotone_constraints":list(monotone_const_dict.values()),
        "boosting_type":trial.suggest_categorical("boosting_type", ["gbdt", "dart"]),
        "n_estimators":trial.suggest_int("n_estimators", 2000, 5000),
        # suggest_float(..., log=True) replaces the deprecated suggest_loguniform
        "learning_rate":trial.suggest_float("learning_rate", 1e-3, 1e-1, log=True),
        'num_leaves': trial.suggest_int('num_leaves', 32, 264),
        "bagging_freq":trial.suggest_int("bagging_freq", 2,7),
        'min_child_samples': trial.suggest_int('min_child_samples', 50, 1024),
        'subsample': trial.suggest_float('subsample', .7, 1),
        'colsample_bytree': trial.suggest_float('colsample_bytree', .6, 1),
        'lambda_l1':trial.suggest_float('lambda_l1',1e-2,10, log = True),
        'lambda_l2':trial.suggest_float('lambda_l2',1e-2,10, log = True),
        'n_jobs': -1,
        'random_state': 42
      }
    bst = lgbm.train(params, dtrain)
    preds = bst.predict(test[boruta_features])

    score = roc_auc_score(test[target_column], preds)

    return score

set_verbosity(optuna.logging.ERROR)
study = optuna.create_study(direction="maximize",
                            pruner=HyperbandPruner(),
                            sampler=TPESampler(seed=0)
                           )
study.optimize(objective, n_trials=25,show_progress_bar=True)

In [4]:
study.best_params

{'boosting_type': 'dart',
 'n_estimators': 2019,
 'learning_rate': 0.03413575639681679,
 'num_leaves': 213,
 'bagging_freq': 7,
 'min_child_samples': 296,
 'subsample': 0.9944749447332815,
 'colsample_bytree': 0.7617198412806786,
 'lambda_l1': 2.2461696792562593,
 'lambda_l2': 0.396148965326315}
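
The learning rate and the two regularization terms in the objective are sampled on a log scale. A minimal sketch of why that matters, using only the standard library: with log-uniform sampling each decade of the range receives roughly the same share of draws, whereas plain uniform sampling almost never visits the small values. The helper `sample_log_uniform` is a hypothetical name written for this illustration, not an Optuna function.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
low, high = 1e-3, 1e-1  # learning-rate range used in the objective

log_draws = [sample_log_uniform(rng, low, high) for _ in range(10_000)]
lin_draws = [rng.uniform(low, high) for _ in range(10_000)]

# Share of draws that fall in the lowest decade, [1e-3, 1e-2)
frac_log = sum(x < 1e-2 for x in log_draws) / len(log_draws)
frac_lin = sum(x < 1e-2 for x in lin_draws) / len(lin_draws)
print(f"log-uniform: {frac_log:.2f}, uniform: {frac_lin:.2f}")
```

About half of the log-uniform draws land below 0.01, compared with roughly one in eleven for uniform sampling.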

In [5]:
study.best_value

0.8710331067802403
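
The value above is the ROC-AUC returned by the objective on the hold-out split. As a reminder of what the metric measures, the sketch below computes ROC-AUC directly from its pairwise definition: the share of (positive, negative) pairs in which the positive example receives the higher score. It is a small illustrative function (`roc_auc_pairwise` is a name made up here), quadratic in the number of samples, and not a replacement for `roc_auc_score`.

```python
def roc_auc_pairwise(y_true, y_score):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc_pairwise(labels, scores))  # 0.75: three of the four pairs are ordered correctly
```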

In [7]:
best_params_one = {
    'learner_params': {
        'learning_rate': 0.03413575639681679,
        'n_estimators': 2019,
        'extra_params': {
            'objective': 'binary',
            'metric': 'binary_logloss',
            'boosting_type': 'dart',
            'num_leaves': 213,
            'bagging_freq': 7,
            'min_child_samples': 296,
            'subsample': 0.9944749447332815,
            'colsample_bytree': 0.7617198412806786,
            'lambda_l1': 2.2461696792562593,
            'lambda_l2': 0.396148965326315,
            'monotone_constraints': list(monotone_const_dict.values()),
            'verbose': -1
        }
    }
}

In [9]:
challenger_one_logs = model_pipeline(train_df = train_df,
                            validation_df = validation_df,
                            params = best_params_one,
                            target_column = target_column,
                            features = boruta_features,
                            cv = 3,
                            random_state = 42,
                            apply_shap = False
                          )

2023-10-08T16:44:39 | INFO | Starting pipeline: Generating 3 k-fold training...
2023-10-08T16:44:39 | INFO | Training for fold 1
2023-10-08T16:46:09 | INFO | Training for fold 2
2023-10-08T16:47:42 | INFO | Training for fold 3
2023-10-08T16:49:16 | INFO | CV training finished!
2023-10-08T16:49:16 | INFO | Training the model in the full dataset...
2023-10-08T16:51:34 | INFO | Training process finished!
2023-10-08T16:51:34 | INFO | Calculating metrics...
2023-10-08T16:51:34 | INFO | Full process finished in 6.91 minutes.
2023-10-08T16:51:34 | INFO | Saving the predict function.
2023-10-08T16:51:34 | INFO | Predict function saved.


In [26]:
%%time
import optuna.integration.lightgbm as lgb

from lightgbm import early_stopping
from lightgbm import log_evaluation

# hold-out split used by the tuner for validation
train, test = train_test_split(train_df, random_state=42, test_size=.2)
dtrain = lgbm.Dataset(train[fw_features], label=train[target_column])
dval = lgbm.Dataset(test[fw_features], label=test[target_column])

params = {
    "objective": "binary",
    "metric": "auc",
    "verbosity": -1,
    "boosting_type": "dart",
    'monotone_constraints': list(monotone_const_dict.values()),
    "n_estimators":3073,
    "learning_rate": 0.0287322553886959
}

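# lgb.train here is Optuna's stepwise tuner, not lightgbm.train
bst = lgb.train(
    params,
    dtrain,
    valid_sets=[dtrain, dval],
    # note: LightGBM does not apply early stopping when boosting_type is "dart"
    callbacks=[early_stopping(150)],
)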

preds = bst.predict(test[fw_features],num_iteration=bst.best_iteration)
score = roc_auc_score(test[target_column], preds)

best_params = bst.params
print("Best params:", best_params)




Best params: {'objective': 'binary', 'metric': 'auc', 'verbosity': -1, 'boosting_type': 'dart', 'monotone_constraints': [-1.0, -1.0, -1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, 1.0, 1.0, -1.0, 1.0], 'learning_rate': 0.0287322553886959, 'feature_pre_filter': False, 'lambda_l1': 1.3295519436441523e-07, 'lambda_l2': 8.532928874611256, 'num_leaves': 9, 'feature_fraction': 0.7, 'bagging_fraction': 0.9638644341977596, 'bagging_freq': 1, 'min_child_samples': 20, 'num_iterations': 3073}
CPU times: user 1d 1h 4min 48s, sys: 51min 18s, total: 1d 1h 56min 6s
Wall time: 4h 28min 48s
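
Optuna's LightGBM integration tunes hyperparameters in a stepwise manner: one group of parameters at a time, keeping the best value found before moving to the next group. The sketch below illustrates that idea on a toy problem with a plain grid per step. Everything in it is an assumption made for illustration: `stepwise_search` and `toy_score` are hypothetical names, the grids are arbitrary, and the score surface is invented rather than measured.

```python
def stepwise_search(score, start, grids):
    """Tune one parameter at a time, keeping the best value before moving on."""
    best = dict(start)
    best_score = score(best)
    for name, candidates in grids:  # order matters: earlier steps are never revisited
        for value in candidates:
            trial = {**best, name: value}
            trial_score = score(trial)
            if trial_score > best_score:
                best, best_score = trial, trial_score
    return best, best_score


def toy_score(p):
    """Hypothetical validation score with its peak placed at (0.7, 9) for illustration."""
    return 0.87 - (p["feature_fraction"] - 0.7) ** 2 - 1e-5 * (p["num_leaves"] - 9) ** 2


start = {"feature_fraction": 1.0, "num_leaves": 31}
grids = [
    ("feature_fraction", [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]),
    ("num_leaves", [5, 9, 17, 31, 63]),
]
best, best_score = stepwise_search(toy_score, start, grids)
print(best, round(best_score, 4))
```

A stepwise search evaluates far fewer combinations than a joint search over all parameters, at the cost of missing interactions between parameters tuned in different steps.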


In [28]:
# new
best_params

{'objective': 'binary',
 'metric': 'auc',
 'verbosity': -1,
 'boosting_type': 'dart',
 'monotone_constraints': [-1.0, -1.0, -1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0,
  -1.0, -1.0, 1.0, 1.0, 1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0,
  1.0, 1.0, -1.0, 1.0],
 'learning_rate': 0.0287322553886959,
 'feature_pre_filter': False,
 'lambda_l1': 1.3295519436441523e-07,
 'lambda_l2': 8.532928874611256,
 'num_leaves': 9,
 'feature_fraction': 0.7,
 'bagging_fraction': 0.9638644341977596,
 'bagging_freq': 1,
 'min_child_samples': 20,
 'num_iterations': 3073}

In [29]:
bst.best_score

defaultdict(collections.OrderedDict,
            {'valid_0': OrderedDict([('auc', 0.8707383205856697)]),
             'valid_1': OrderedDict([('auc', 0.8699786100128452)])})

In [8]:
from optuna.logging import set_verbosity
import optuna
from optuna.pruners import HyperbandPruner
from optuna.samplers import TPESampler

def objective(trial):
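    """Tune only n_estimators and learning_rate with all other parameters fixed; return the hold-out ROC-AUC."""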
    train, test = train_test_split(train_df, random_state=42, test_size=.2)
    dtrain = lgbm.Dataset(train[boruta_features], label=train[target_column])

    params = {
        'verbose': -1,
        'objective': "binary",
        'metric': "binary_logloss",
        'boosting_type': 'gbdt',
        'lambda_l1': 4.937571654251182,
        'lambda_l2': 1.3115176714127536e-08,
        'num_leaves': 50,
        'feature_fraction': 0.8999999999999999,
        'bagging_fraction': 1.0,
        'bagging_freq': 0,
        'min_child_samples': 20,
        # only the two parameters below are searched; everything above is fixed
        "n_estimators": trial.suggest_int("n_estimators", 2000, 10000),
        # suggest_float(..., log=True) replaces the deprecated suggest_loguniform
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 1e-1, log=True),
        'n_jobs': -1,
        'random_state': 42
    }
    bst = lgbm.train(params, dtrain)
    preds = bst.predict(test[boruta_features])

    score = roc_auc_score(test[target_column], preds)

    return score

set_verbosity(optuna.logging.ERROR)
study = optuna.create_study(direction="maximize",
                            pruner=HyperbandPruner(),
                            sampler=TPESampler(seed=0)
                           )
study.optimize(objective, n_trials=50,show_progress_bar=True)

In [9]:
study.best_params

{'n_estimators': 5926, 'learning_rate': 0.005603627873630697}

In [10]:
best_params_two = {
    'learner_params': {
        'n_estimators': 3073,
        'learning_rate': 0.0287322553886959,
        'extra_params': {
            'objective': 'binary',
            'metric': 'binary_logloss',
            'boosting_type': 'dart',
            'lambda_l1': 1.3295519436441523e-07,
            'lambda_l2': 8.532928874611256,
            'num_leaves': 9,
            'feature_fraction': 0.7,
            'bagging_fraction': 0.9638644341977596,
            'bagging_freq': 1,
            'min_child_samples': 20,
            'n_jobs': -1,
            'random_state': 42,
            'monotone_constraints': list(monotone_const_dict.values()),
            'verbose': -1
        }
    }
}

In [12]:
challenger_two_logs = model_pipeline(train_df = train_df,
                            validation_df = validation_df,
                            params = best_params_two,
                            target_column = target_column,
                            features = boruta_features,
                            cv = 3,
                            random_state = 42,
                            apply_shap = False
                          )

2023-10-08T16:52:05 | INFO | Starting pipeline: Generating 3 k-fold training...
2023-10-08T16:52:05 | INFO | Training for fold 1
2023-10-08T16:53:42 | INFO | Training for fold 2
2023-10-08T16:55:21 | INFO | Training for fold 3
2023-10-08T16:57:00 | INFO | CV training finished!
2023-10-08T16:57:00 | INFO | Training the model in the full dataset...
2023-10-08T16:59:21 | INFO | Training process finished!
2023-10-08T16:59:21 | INFO | Calculating metrics...
2023-10-08T16:59:21 | INFO | Full process finished in 7.27 minutes.
2023-10-08T16:59:21 | INFO | Saving the predict function.
2023-10-08T16:59:21 | INFO | Predict function saved.


In [13]:
mc_params = deepcopy(test_params)
mc_params["learner_params"]["extra_params"]["monotone_constraints"] = list(monotone_const_dict.values())
base_logs = model_pipeline(train_df = train_df,
                            validation_df = validation_df,
                            params = mc_params,
                            target_column = target_column,
                            features = boruta_features,
                            cv = 3,
                            random_state = 42,
                            apply_shap = False
                          )

2023-10-08T16:59:21 | INFO | Starting pipeline: Generating 3 k-fold training...
2023-10-08T16:59:21 | INFO | Training for fold 1
2023-10-08T16:59:31 | INFO | Training for fold 2
2023-10-08T16:59:41 | INFO | Training for fold 3
2023-10-08T16:59:51 | INFO | CV training finished!
2023-10-08T16:59:51 | INFO | Training the model in the full dataset...
2023-10-08T17:00:05 | INFO | Training process finished!
2023-10-08T17:00:05 | INFO | Calculating metrics...
2023-10-08T17:00:05 | INFO | Full process finished in 0.73 minutes.
2023-10-08T17:00:05 | INFO | Saving the predict function.
2023-10-08T17:00:05 | INFO | Predict function saved.


## Performance comparison

In [15]:
model_metrics = {}
models = [base_logs, challenger_one_logs, challenger_two_logs]
names = ["boruta vanilla", "boruta + Optuna base", "boruta + Optuna integration"]

# collect each model's ROC-AUC (out-of-fold and validation) under its display name
for model, name in zip(models, names):
    model_metrics[name] = model["metrics"]["roc_auc"]
pd.DataFrame(model_metrics).T.sort_values(by="validation", ascending=False)

| model | out_of_fold | validation |
|---|---|---|
| boruta + Optuna integration | 0.864788 | 0.867128 |
| boruta vanilla | 0.864468 | 0.86672 |
| boruta + Optuna base | 0.864924 | 0.866677 |
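
To put the table in perspective, the short script below computes each model's difference from the baseline ("boruta vanilla"). The scores are copied from the table above; nothing else is assumed.

```python
# ROC-AUC values copied from the comparison table.
results = {
    "boruta + Optuna integration": {"out_of_fold": 0.864788, "validation": 0.867128},
    "boruta vanilla": {"out_of_fold": 0.864468, "validation": 0.86672},
    "boruta + Optuna base": {"out_of_fold": 0.864924, "validation": 0.866677},
}

baseline = results["boruta vanilla"]
for name, metrics in results.items():
    delta_val = metrics["validation"] - baseline["validation"]
    delta_oof = metrics["out_of_fold"] - baseline["out_of_fold"]
    print(f"{name:<28} validation {delta_val:+.6f}  out_of_fold {delta_oof:+.6f}")

# Gap between the best and worst validation scores
val_scores = [m["validation"] for m in results.values()]
spread = max(val_scores) - min(val_scores)
print(f"validation spread: {spread:.6f}")
```

The largest validation gap between any two models is about 0.00045, so the ranking could plausibly change with a different random seed or data split.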
