# <center>LightGBM hyperparameter tuning</center>

# 1. Choosing and importing the right libraries
[Optuna](https://optuna.org/) is one of the most convenient libraries for hyperparameter optimization. Its API gives a lot of flexibility in creating, stopping and editing trials, which makes it a very useful tool for Kaggle contests. Optuna also has a dedicated integration for LightGBM models, shipped in the separate `optuna-integration` package, so we install and import it as well.

In [1]:
!pip install optuna-integration

Collecting optuna-integration
  Downloading optuna_integration-3.6.0-py3-none-any.whl.metadata (10 kB)
Downloading optuna_integration-3.6.0-py3-none-any.whl (93 kB)
Installing collected packages: optuna-integration
Successfully installed optuna-integration-3.6.0


In [2]:
!pip install lightgbm --upgrade

Collecting lightgbm
  Downloading lightgbm-4.5.0-py3-none-manylinux_2_28_x86_64.whl.metadata (17 kB)
Downloading lightgbm-4.5.0-py3-none-manylinux_2_28_x86_64.whl (3.6 MB)
Installing collected packages: lightgbm
  Attempting uninstall: lightgbm
    Found existing installation: lightgbm 4.2.0
    Uninstalling lightgbm-4.2.0:
      Successfully uninstalled lightgbm-4.2.0
Successfully installed lightgbm-4.5.0


In [3]:
import warnings
import gc
import importlib
import os

warnings.filterwarnings("ignore", category=FutureWarning, module="seaborn")

import pandas as pd
pd.set_option('display.float_format', lambda x: '%.3f' % x)

from sklearn.metrics import matthews_corrcoef

import lightgbm as lgb

from multiprocessing import cpu_count

import optuna
opt_int = importlib.import_module("optuna_integration")

import pickle

import numpy as np

In [4]:
RAND = 42

In [5]:
import shutil
src_path = r"/kaggle/input/ps4e7-lgbmclassifier-tuner/lgbm.db"
dst_path = r"/kaggle/working/lgbm.db"

try:
    shutil.copy(src_path, dst_path)
    print('Copied')
except FileNotFoundError:
    print("lgbm.db not found!")

lgbm.db not found!


# 2. Preparing data for LightGBM model
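We will use LightGBM's native API, as it gives better flexibility with callbacks and early stopping, which are crucial during development. To do that, we need to wrap our data in `lgb.Dataset` objects (LightGBM's counterpart of XGBoost's DMatrix). The categorical columns are already integer-encoded in the input files. Note that `lgb.Dataset` treats a column as categorical only if it has the pandas `category` dtype or is listed in `categorical_feature`, so the integer columns below are passed as plain numeric features.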

In [6]:
X_train = pd.read_parquet('/kaggle/input/ps4e8-data-eng/train.parquet')
y_train = X_train.pop('class')

df_train = lgb.Dataset(X_train, label=y_train)

X_test = pd.read_parquet('/kaggle/input/ps4e8-data-eng/test.parquet')
y_test = X_test.pop('class')

df_test = lgb.Dataset(X_test, label=y_test, reference=df_train)

X_train.head()

| id | cap-diameter | cap-shape | cap-surface | cap-color | does-bruise-or-bleed | gill-attachment | gill-spacing | gill-color | stem-height | stem-width | stem-root | stem-surface | stem-color | veil-type | veil-color | has-ring | ring-type | spore-print-color | habitat | season |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2413129 | 16.16 | 0 | 6 | 1 | 0 | 4 | -1 | 0 | 16.89 | 17.66 | 3 | 0 | 0 | 0 | 1 | 1 | 5 | -1 | 0 | 2 |
| 1910266 | 3.42 | 4 | -1 | 10 | 0 | 5 | 2 | 5 | 3.62 | 15.31 | -1 | 3 | 8 | -1 | -1 | 0 | 0 | -1 | 0 | 2 |
| 1213509 | 8.11 | 0 | 1 | 8 | 0 | -1 | 0 | 0 | 11.25 | 15.03 | -1 | -1 | 0 | 0 | 1 | 1 | 5 | -1 | 0 | 2 |
| 387249 | 6.67 | 5 | 10 | 7 | 1 | 1 | 0 | 6 | 7.32 | 16.53 | -1 | -1 | 0 | -1 | -1 | 0 | 0 | -1 | 0 | 0 |
| 2498558 | 12.23 | 0 | 2 | 5 | 1 | 4 | -1 | 0 | 10.18 | 16.8 | -1 | -1 | 0 | -1 | -1 | 1 | -1 | -1 | 0 | 2 |


In [7]:
train_cols = X_train.columns
train_cols

Index(['cap-diameter', 'cap-shape', 'cap-surface', 'cap-color',
       'does-bruise-or-bleed', 'gill-attachment', 'gill-spacing', 'gill-color',
       'stem-height', 'stem-width', 'stem-root', 'stem-surface', 'stem-color',
       'veil-type', 'veil-color', 'has-ring', 'ring-type', 'spore-print-color',
       'habitat', 'season'],
      dtype='object')

In [8]:
X_train.shape

(2493556, 20)

In [9]:
X_train.dtypes

cap-diameter            float64
cap-shape                 int64
cap-surface               int64
cap-color                 int64
does-bruise-or-bleed      int64
gill-attachment           int64
gill-spacing              int64
gill-color                int64
stem-height             float64
stem-width              float64
stem-root                 int64
stem-surface              int64
stem-color                int64
veil-type                 int64
veil-color                int64
has-ring                  int64
ring-type                 int64
spore-print-color         int64
habitat                   int64
season                    int64
dtype: object

# 3. Defining a custom LightGBM metric
Since the contest is scored with a metric that is not built into LightGBM, we need to define it ourselves! The metric is the [Matthews correlation coefficient](https://en.wikipedia.org/wiki/Phi_coefficient), known in statistics as the phi coefficient.

In [10]:
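def mcc(preds: np.ndarray, data: lgb.Dataset):
    # LightGBM custom metric: returns (eval_name, eval_result, is_higher_better).
    # With the built-in 'binary' objective, preds are probabilities; rounding thresholds them at 0.5.
    return 'mcc', matthews_corrcoef(data.get_label(), np.round(preds)), True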
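
For reference, the value that `matthews_corrcoef` returns for binary labels can be written out directly from the confusion-matrix counts. The snippet below is a minimal pure-Python sketch, not scikit-learn's implementation; the helper name `mcc_from_labels` and the toy labels are made up for illustration.

```python
from math import sqrt


def mcc_from_labels(y_true, y_pred):
    """Matthews correlation coefficient for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (also followed by scikit-learn): return 0 when the
    # denominator vanishes, e.g. when only one class is ever predicted.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example: TP=1, TN=2, FP=0, FN=1 -> 2 / sqrt(12)
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
print(round(mcc_from_labels(y_true, y_pred), 4))  # 0.5774
```

The coefficient ranges from -1 to 1, with 0 corresponding to predictions no better than chance, which is why the custom metric reports `is_higher_better=True`.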

In [11]:
def objective(trial):
    params = {
        'num_leaves': trial.suggest_int('num_leaves', 10, 128),
        'max_depth': trial.suggest_int('max_depth', 3, 10),
        'learning_rate': 0.01,
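        # Note: LightGBM applies subsample (bagging_fraction) only when subsample_freq
        # (bagging_freq) > 0. It is not set here, so as written this parameter has no effect.
        'subsample': trial.suggest_float('subsample', 0.2, 1.0),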
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.2, 1.0),
        'reg_alpha': trial.suggest_int('reg_alpha', 1, 1000, log=True),
        'reg_lambda': trial.suggest_int('reg_lambda', 1, 1000, log=True),
        'objective': 'binary',
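        # 'logloss' is not a LightGBM metric name (the built-in one is 'binary_logloss') and the
        # training log only ever reports mcc, so state explicitly that no built-in metric is used.
        'metric': 'None',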
        'seed': RAND,
        'verbose': -1,
        'num_threads': cpu_count()
    }
    
    model = lgb.train(
        params,
        df_train,
        num_boost_round=2000,
        valid_sets=[df_test],
        feval=mcc,
        callbacks=[lgb.early_stopping(300, verbose=True, min_delta=1e-4), lgb.log_evaluation(period=20)]
    )
    
    return matthews_corrcoef(y_test, np.round(model.predict(X_test)))
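
The `lgb.early_stopping(300, min_delta=1e-4)` callback in the objective above stops training once the validation metric has failed to improve by more than `min_delta` for 300 consecutive rounds. The rule can be sketched in plain Python as follows. This is a simplified illustration for a single higher-is-better metric, not LightGBM's code; `early_stopping_round` and the score list are hypothetical.

```python
def early_stopping_round(scores, patience, min_delta=0.0):
    """Return (stop_iteration, best_iteration), both 1-based.

    stop_iteration is None if the stopping rule never triggers.
    """
    best_score = float("-inf")
    best_iter = 0
    for i, score in enumerate(scores, start=1):
        # A round counts as an improvement only if it beats the best
        # score seen so far by more than min_delta.
        if score > best_score + min_delta:
            best_score = score
            best_iter = i
        elif i - best_iter >= patience:
            return i, best_iter
    return None, best_iter


# Rounds 4-6 never beat round 3 by more than 1e-4, so with patience=3
# training stops at round 6 and round 3 is kept as the best iteration.
scores = [0.50, 0.60, 0.65, 0.65005, 0.65002, 0.64]
print(early_stopping_round(scores, patience=3, min_delta=1e-4))  # (6, 3)
```

With a learning rate of 0.01 the validation MCC keeps creeping upward for a long time, so a large patience such as 300 mostly acts as a safeguard against trials that have clearly plateaued.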

# 4. Using existing Optuna study
We can reuse a study that already exists. This way we don't have to worry about Kaggle's 12-hour limit: we can run this notebook several times, each run loading the study from storage and reusing the sampler and pruner saved by the previous one. [Learn more](https://optuna.readthedocs.io/en/stable/tutorial/20_recipes/001_rdb.html).

In [12]:
study_name = 'lgbm-tune'
storage = 'sqlite:///lgbm.db'

In [13]:
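try:
    # Use a context manager so the file handle is always closed.
    with open("/kaggle/input/ps4e7-lgbmclassifier-tuner/sampler.pkl", "rb") as fin:
        sampler = pickle.load(fin)
    print("Choosing saved sampler!")
except FileNotFoundError:
    sampler = None  # Optuna falls back to its default sampler
    print("Default sampler")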

Default sampler


In [14]:
try:
    # Use a context manager so the file handle is always closed.
    with open("/kaggle/input/ps4e7-lgbmclassifier-tuner/pruner.pkl", "rb") as fin:
        pruner = pickle.load(fin)
    print("Choosing saved pruner!")
except FileNotFoundError:
    pruner = optuna.pruners.HyperbandPruner()
    print("Default pruner")

Default pruner


In [15]:
study = optuna.create_study(
    study_name=study_name,
    storage=storage,
    direction='maximize',
    load_if_exists=True,
    sampler=sampler,
    pruner=pruner,
)
study.optimize(objective, n_trials=20)

[I 2024-08-08 10:49:33,242] A new study created in RDB with name: lgbm-tune


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.525939
[40]	valid_0's mcc: 0.585949
[60]	valid_0's mcc: 0.66847
[80]	valid_0's mcc: 0.732165
[100]	valid_0's mcc: 0.767145
[120]	valid_0's mcc: 0.784729
[140]	valid_0's mcc: 0.799377
[160]	valid_0's mcc: 0.79945
[180]	valid_0's mcc: 0.807205
[200]	valid_0's mcc: 0.810993
[220]	valid_0's mcc: 0.814103
[240]	valid_0's mcc: 0.820454
[260]	valid_0's mcc: 0.833051
[280]	valid_0's mcc: 0.838163
[300]	valid_0's mcc: 0.844857
[320]	valid_0's mcc: 0.848629
[340]	valid_0's mcc: 0.851061
[360]	valid_0's mcc: 0.856363
[380]	valid_0's mcc: 0.869478
[400]	valid_0's mcc: 0.871615
[420]	valid_0's mcc: 0.873935
[440]	valid_0's mcc: 0.875103
[460]	valid_0's mcc: 0.876137
[480]	valid_0's mcc: 0.876769
[500]	valid_0's mcc: 0.878065
[520]	valid_0's mcc: 0.87847
[540]	valid_0's mcc: 0.878217
[560]	valid_0's mcc: 0.880812
[580]	valid_0's mcc: 0.884607
[600]	valid_0's mcc: 0.88671
[620]	valid_0's mcc: 0.887837
[640]	valid_0's 

[I 2024-08-08 11:07:00,671] Trial 0 finished with value: 0.9604127819411883 and parameters: {'num_leaves': 115, 'max_depth': 4, 'subsample': 0.7777110917680685, 'colsample_bytree': 0.971475963881673, 'reg_alpha': 715, 'reg_lambda': 420}. Best is trial 0 with value: 0.9604127819411883.


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.454831
[40]	valid_0's mcc: 0.560945
[60]	valid_0's mcc: 0.574951
[80]	valid_0's mcc: 0.624202
[100]	valid_0's mcc: 0.645406
[120]	valid_0's mcc: 0.639195
[140]	valid_0's mcc: 0.647958
[160]	valid_0's mcc: 0.659827
[180]	valid_0's mcc: 0.684768
[200]	valid_0's mcc: 0.691742
[220]	valid_0's mcc: 0.693665
[240]	valid_0's mcc: 0.691419
[260]	valid_0's mcc: 0.68866
[280]	valid_0's mcc: 0.692034
[300]	valid_0's mcc: 0.701632
[320]	valid_0's mcc: 0.712056
[340]	valid_0's mcc: 0.731153
[360]	valid_0's mcc: 0.74003
[380]	valid_0's mcc: 0.747313
[400]	valid_0's mcc: 0.769772
[420]	valid_0's mcc: 0.773007
[440]	valid_0's mcc: 0.779016
[460]	valid_0's mcc: 0.781154
[480]	valid_0's mcc: 0.786983
[500]	valid_0's mcc: 0.793265
[520]	valid_0's mcc: 0.795029
[540]	valid_0's mcc: 0.803718
[560]	valid_0's mcc: 0.805373
[580]	valid_0's mcc: 0.807344
[600]	valid_0's mcc: 0.812373
[620]	valid_0's mcc: 0.818604
[640]	valid_0'

[I 2024-08-08 11:24:11,134] Trial 1 finished with value: 0.9128963702908467 and parameters: {'num_leaves': 12, 'max_depth': 3, 'subsample': 0.4211725054216306, 'colsample_bytree': 0.7766507288251865, 'reg_alpha': 23, 'reg_lambda': 292}. Best is trial 0 with value: 0.9604127819411883.


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.713806
[40]	valid_0's mcc: 0.92392
[60]	valid_0's mcc: 0.953496
[80]	valid_0's mcc: 0.962545
[100]	valid_0's mcc: 0.968084
[120]	valid_0's mcc: 0.971047
[140]	valid_0's mcc: 0.972874
[160]	valid_0's mcc: 0.974217
[180]	valid_0's mcc: 0.974909
[200]	valid_0's mcc: 0.975604
[220]	valid_0's mcc: 0.976075
[240]	valid_0's mcc: 0.976449
[260]	valid_0's mcc: 0.976994
[280]	valid_0's mcc: 0.977699
[300]	valid_0's mcc: 0.978114
[320]	valid_0's mcc: 0.9785
[340]	valid_0's mcc: 0.978705
[360]	valid_0's mcc: 0.979258
[380]	valid_0's mcc: 0.979347
[400]	valid_0's mcc: 0.979745
[420]	valid_0's mcc: 0.979894
[440]	valid_0's mcc: 0.980153
[460]	valid_0's mcc: 0.980421
[480]	valid_0's mcc: 0.980593
[500]	valid_0's mcc: 0.980774
[520]	valid_0's mcc: 0.980926
[540]	valid_0's mcc: 0.980998
[560]	valid_0's mcc: 0.981089
[580]	valid_0's mcc: 0.981218
[600]	valid_0's mcc: 0.981344
[620]	valid_0's mcc: 0.981428
[640]	valid_0's

[I 2024-08-08 11:47:33,422] Trial 2 finished with value: 0.9831417833415508 and parameters: {'num_leaves': 82, 'max_depth': 10, 'subsample': 0.9779859952396528, 'colsample_bytree': 0.3273036581339024, 'reg_alpha': 88, 'reg_lambda': 37}. Best is trial 2 with value: 0.9831417833415508.


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.790459
[40]	valid_0's mcc: 0.879867
[60]	valid_0's mcc: 0.897797
[80]	valid_0's mcc: 0.913134
[100]	valid_0's mcc: 0.924004
[120]	valid_0's mcc: 0.933549
[140]	valid_0's mcc: 0.938678
[160]	valid_0's mcc: 0.942723
[180]	valid_0's mcc: 0.944285
[200]	valid_0's mcc: 0.947138
[220]	valid_0's mcc: 0.950107
[240]	valid_0's mcc: 0.951168
[260]	valid_0's mcc: 0.954694
[280]	valid_0's mcc: 0.95683
[300]	valid_0's mcc: 0.957775
[320]	valid_0's mcc: 0.958813
[340]	valid_0's mcc: 0.96039
[360]	valid_0's mcc: 0.961454
[380]	valid_0's mcc: 0.963016
[400]	valid_0's mcc: 0.964592
[420]	valid_0's mcc: 0.965712
[440]	valid_0's mcc: 0.967416
[460]	valid_0's mcc: 0.968495
[480]	valid_0's mcc: 0.969032
[500]	valid_0's mcc: 0.969738
[520]	valid_0's mcc: 0.970723
[540]	valid_0's mcc: 0.971616
[560]	valid_0's mcc: 0.972114
[580]	valid_0's mcc: 0.972428
[600]	valid_0's mcc: 0.97268
[620]	valid_0's mcc: 0.973109
[640]	valid_0's

[I 2024-08-08 12:10:23,358] Trial 3 finished with value: 0.9807301071869163 and parameters: {'num_leaves': 37, 'max_depth': 9, 'subsample': 0.3551869432355755, 'colsample_bytree': 0.7628980641142333, 'reg_alpha': 104, 'reg_lambda': 369}. Best is trial 2 with value: 0.9831417833415508.


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.504104
[40]	valid_0's mcc: 0.785046
[60]	valid_0's mcc: 0.861698
[80]	valid_0's mcc: 0.888366
[100]	valid_0's mcc: 0.89841
[120]	valid_0's mcc: 0.907373
[140]	valid_0's mcc: 0.913129
[160]	valid_0's mcc: 0.918976
[180]	valid_0's mcc: 0.922091
[200]	valid_0's mcc: 0.92562
[220]	valid_0's mcc: 0.927853
[240]	valid_0's mcc: 0.929771
[260]	valid_0's mcc: 0.932767
[280]	valid_0's mcc: 0.93445
[300]	valid_0's mcc: 0.93679
[320]	valid_0's mcc: 0.938071
[340]	valid_0's mcc: 0.940087
[360]	valid_0's mcc: 0.941732
[380]	valid_0's mcc: 0.942991
[400]	valid_0's mcc: 0.944525
[420]	valid_0's mcc: 0.945546
[440]	valid_0's mcc: 0.947793
[460]	valid_0's mcc: 0.949757
[480]	valid_0's mcc: 0.951967
[500]	valid_0's mcc: 0.953649
[520]	valid_0's mcc: 0.954231
[540]	valid_0's mcc: 0.955109
[560]	valid_0's mcc: 0.956295
[580]	valid_0's mcc: 0.957457
[600]	valid_0's mcc: 0.958773
[620]	valid_0's mcc: 0.959671
[640]	valid_0's 

[I 2024-08-08 12:31:16,357] Trial 4 finished with value: 0.9783984331325429 and parameters: {'num_leaves': 125, 'max_depth': 6, 'subsample': 0.4780059932294009, 'colsample_bytree': 0.42085483798250717, 'reg_alpha': 7, 'reg_lambda': 940}. Best is trial 2 with value: 0.9831417833415508.


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.820979
[40]	valid_0's mcc: 0.935944
[60]	valid_0's mcc: 0.95734
[80]	valid_0's mcc: 0.962584
[100]	valid_0's mcc: 0.966324
[120]	valid_0's mcc: 0.968185
[140]	valid_0's mcc: 0.969135
[160]	valid_0's mcc: 0.970289
[180]	valid_0's mcc: 0.971723
[200]	valid_0's mcc: 0.972339
[220]	valid_0's mcc: 0.97257
[240]	valid_0's mcc: 0.972988
[260]	valid_0's mcc: 0.97348
[280]	valid_0's mcc: 0.973858
[300]	valid_0's mcc: 0.973902
[320]	valid_0's mcc: 0.974129
[340]	valid_0's mcc: 0.974595
[360]	valid_0's mcc: 0.974739
[380]	valid_0's mcc: 0.975249
[400]	valid_0's mcc: 0.975565
[420]	valid_0's mcc: 0.975836
[440]	valid_0's mcc: 0.976146
[460]	valid_0's mcc: 0.976402
[480]	valid_0's mcc: 0.976493
[500]	valid_0's mcc: 0.976619
[520]	valid_0's mcc: 0.976693
[540]	valid_0's mcc: 0.976935
[560]	valid_0's mcc: 0.977041
[580]	valid_0's mcc: 0.977354
[600]	valid_0's mcc: 0.977435
[620]	valid_0's mcc: 0.97765
[640]	valid_0's 

[I 2024-08-08 12:52:53,092] Trial 5 finished with value: 0.9794912284438911 and parameters: {'num_leaves': 102, 'max_depth': 10, 'subsample': 0.7563664713011369, 'colsample_bytree': 0.38704248032690347, 'reg_alpha': 675, 'reg_lambda': 3}. Best is trial 2 with value: 0.9831417833415508.


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.435734
[40]	valid_0's mcc: 0.728536
[60]	valid_0's mcc: 0.818152
[80]	valid_0's mcc: 0.847996
[100]	valid_0's mcc: 0.863908
[120]	valid_0's mcc: 0.87065
[140]	valid_0's mcc: 0.871473
[160]	valid_0's mcc: 0.883387
[180]	valid_0's mcc: 0.888595
[200]	valid_0's mcc: 0.893497
[220]	valid_0's mcc: 0.895186
[240]	valid_0's mcc: 0.896861
[260]	valid_0's mcc: 0.899008
[280]	valid_0's mcc: 0.902004
[300]	valid_0's mcc: 0.906779
[320]	valid_0's mcc: 0.907755
[340]	valid_0's mcc: 0.911423
[360]	valid_0's mcc: 0.913313
[380]	valid_0's mcc: 0.914891
[400]	valid_0's mcc: 0.91719
[420]	valid_0's mcc: 0.918803
[440]	valid_0's mcc: 0.920348
[460]	valid_0's mcc: 0.922713
[480]	valid_0's mcc: 0.923673
[500]	valid_0's mcc: 0.925077
[520]	valid_0's mcc: 0.925991
[540]	valid_0's mcc: 0.926791
[560]	valid_0's mcc: 0.927541
[580]	valid_0's mcc: 0.928488
[600]	valid_0's mcc: 0.930449
[620]	valid_0's mcc: 0.933203
[640]	valid_0'

[I 2024-08-08 13:13:00,955] Trial 6 finished with value: 0.9761177924819107 and parameters: {'num_leaves': 53, 'max_depth': 5, 'subsample': 0.3941567320942593, 'colsample_bytree': 0.4650112104376626, 'reg_alpha': 2, 'reg_lambda': 1}. Best is trial 2 with value: 0.9831417833415508.


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.711054
[40]	valid_0's mcc: 0.838429
[60]	valid_0's mcc: 0.857143
[80]	valid_0's mcc: 0.866026
[100]	valid_0's mcc: 0.888649
[120]	valid_0's mcc: 0.903339
[140]	valid_0's mcc: 0.908933
[160]	valid_0's mcc: 0.917598
[180]	valid_0's mcc: 0.920038
[200]	valid_0's mcc: 0.92095
[220]	valid_0's mcc: 0.922262
[240]	valid_0's mcc: 0.923629
[260]	valid_0's mcc: 0.926384
[280]	valid_0's mcc: 0.933596
[300]	valid_0's mcc: 0.93565
[320]	valid_0's mcc: 0.939166
[340]	valid_0's mcc: 0.941886
[360]	valid_0's mcc: 0.943685
[380]	valid_0's mcc: 0.945296
[400]	valid_0's mcc: 0.947412
[420]	valid_0's mcc: 0.950777
[440]	valid_0's mcc: 0.952111
[460]	valid_0's mcc: 0.953705
[480]	valid_0's mcc: 0.955668
[500]	valid_0's mcc: 0.957713
[520]	valid_0's mcc: 0.958884
[540]	valid_0's mcc: 0.959763
[560]	valid_0's mcc: 0.961008
[580]	valid_0's mcc: 0.961366
[600]	valid_0's mcc: 0.96145
[620]	valid_0's mcc: 0.961942
[640]	valid_0's

[I 2024-08-08 13:34:35,095] Trial 7 finished with value: 0.97994437896411 and parameters: {'num_leaves': 76, 'max_depth': 6, 'subsample': 0.42985815931908544, 'colsample_bytree': 0.7946608987130832, 'reg_alpha': 175, 'reg_lambda': 113}. Best is trial 2 with value: 0.9831417833415508.


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.697055
[40]	valid_0's mcc: 0.832668
[60]	valid_0's mcc: 0.857358
[80]	valid_0's mcc: 0.864511
[100]	valid_0's mcc: 0.884655
[120]	valid_0's mcc: 0.903424
[140]	valid_0's mcc: 0.908315
[160]	valid_0's mcc: 0.915035
[180]	valid_0's mcc: 0.917809
[200]	valid_0's mcc: 0.920155
[220]	valid_0's mcc: 0.921219
[240]	valid_0's mcc: 0.922367
[260]	valid_0's mcc: 0.925165
[280]	valid_0's mcc: 0.927205
[300]	valid_0's mcc: 0.933811
[320]	valid_0's mcc: 0.93662
[340]	valid_0's mcc: 0.940416
[360]	valid_0's mcc: 0.942629
[380]	valid_0's mcc: 0.944201
[400]	valid_0's mcc: 0.945696
[420]	valid_0's mcc: 0.947167
[440]	valid_0's mcc: 0.948648
[460]	valid_0's mcc: 0.951307
[480]	valid_0's mcc: 0.953302
[500]	valid_0's mcc: 0.955401
[520]	valid_0's mcc: 0.957036
[540]	valid_0's mcc: 0.958302
[560]	valid_0's mcc: 0.9596
[580]	valid_0's mcc: 0.960355
[600]	valid_0's mcc: 0.961469
[620]	valid_0's mcc: 0.962004
[640]	valid_0's

[I 2024-08-08 13:55:34,921] Trial 8 finished with value: 0.9793291013831603 and parameters: {'num_leaves': 85, 'max_depth': 6, 'subsample': 0.2351756282934642, 'colsample_bytree': 0.8034415523142606, 'reg_alpha': 38, 'reg_lambda': 378}. Best is trial 2 with value: 0.9831417833415508.


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.720223
[40]	valid_0's mcc: 0.886194
[60]	valid_0's mcc: 0.913755
[80]	valid_0's mcc: 0.92673
[100]	valid_0's mcc: 0.934752
[120]	valid_0's mcc: 0.940957
[140]	valid_0's mcc: 0.945974
[160]	valid_0's mcc: 0.949968
[180]	valid_0's mcc: 0.953691
[200]	valid_0's mcc: 0.956268
[220]	valid_0's mcc: 0.958174
[240]	valid_0's mcc: 0.959409
[260]	valid_0's mcc: 0.9605
[280]	valid_0's mcc: 0.961528
[300]	valid_0's mcc: 0.962819
[320]	valid_0's mcc: 0.963692
[340]	valid_0's mcc: 0.964412
[360]	valid_0's mcc: 0.965203
[380]	valid_0's mcc: 0.96603
[400]	valid_0's mcc: 0.96658
[420]	valid_0's mcc: 0.967261
[440]	valid_0's mcc: 0.969355
[460]	valid_0's mcc: 0.970306
[480]	valid_0's mcc: 0.970914
[500]	valid_0's mcc: 0.971258
[520]	valid_0's mcc: 0.971646
[540]	valid_0's mcc: 0.972298
[560]	valid_0's mcc: 0.973213
[580]	valid_0's mcc: 0.973713
[600]	valid_0's mcc: 0.974161
[620]	valid_0's mcc: 0.974562
[640]	valid_0's m

[I 2024-08-08 14:17:46,042] Trial 9 finished with value: 0.9815366014981884 and parameters: {'num_leaves': 38, 'max_depth': 10, 'subsample': 0.38004078751574416, 'colsample_bytree': 0.5439757005409899, 'reg_alpha': 4, 'reg_lambda': 320}. Best is trial 2 with value: 0.9831417833415508.


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.356654
[40]	valid_0's mcc: 0.73139
[60]	valid_0's mcc: 0.844764
[80]	valid_0's mcc: 0.884714
[100]	valid_0's mcc: 0.903812
[120]	valid_0's mcc: 0.922026
[140]	valid_0's mcc: 0.928169
[160]	valid_0's mcc: 0.935185
[180]	valid_0's mcc: 0.941853
[200]	valid_0's mcc: 0.946943
[220]	valid_0's mcc: 0.949327
[240]	valid_0's mcc: 0.953372
[260]	valid_0's mcc: 0.956257
[280]	valid_0's mcc: 0.958521
[300]	valid_0's mcc: 0.960382
[320]	valid_0's mcc: 0.962845
[340]	valid_0's mcc: 0.964906
[360]	valid_0's mcc: 0.965212
[380]	valid_0's mcc: 0.965732
[400]	valid_0's mcc: 0.966997
[420]	valid_0's mcc: 0.967909
[440]	valid_0's mcc: 0.968692
[460]	valid_0's mcc: 0.96966
[480]	valid_0's mcc: 0.970251
[500]	valid_0's mcc: 0.970583
[520]	valid_0's mcc: 0.970494
[540]	valid_0's mcc: 0.971058
[560]	valid_0's mcc: 0.971817
[580]	valid_0's mcc: 0.972293
[600]	valid_0's mcc: 0.972826
[620]	valid_0's mcc: 0.973148
[640]	valid_0'

[I 2024-08-08 14:39:46,764] Trial 10 finished with value: 0.9815777583425594 and parameters: {'num_leaves': 91, 'max_depth': 8, 'subsample': 0.9970946085232921, 'colsample_bytree': 0.2081845691739894, 'reg_alpha': 1, 'reg_lambda': 21}. Best is trial 2 with value: 0.9831417833415508.


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.357126
[40]	valid_0's mcc: 0.731612
[60]	valid_0's mcc: 0.845418
[80]	valid_0's mcc: 0.884614
[100]	valid_0's mcc: 0.904446
[120]	valid_0's mcc: 0.922395
[140]	valid_0's mcc: 0.928567
[160]	valid_0's mcc: 0.935299
[180]	valid_0's mcc: 0.94216
[200]	valid_0's mcc: 0.94687
[220]	valid_0's mcc: 0.949485
[240]	valid_0's mcc: 0.953289
[260]	valid_0's mcc: 0.95642
[280]	valid_0's mcc: 0.958666
[300]	valid_0's mcc: 0.960531
[320]	valid_0's mcc: 0.962813
[340]	valid_0's mcc: 0.964958
[360]	valid_0's mcc: 0.965459
[380]	valid_0's mcc: 0.965934
[400]	valid_0's mcc: 0.96714
[420]	valid_0's mcc: 0.968192
[440]	valid_0's mcc: 0.968903
[460]	valid_0's mcc: 0.969842
[480]	valid_0's mcc: 0.970424
[500]	valid_0's mcc: 0.970808
[520]	valid_0's mcc: 0.970687
[540]	valid_0's mcc: 0.971264
[560]	valid_0's mcc: 0.971967
[580]	valid_0's mcc: 0.972425
[600]	valid_0's mcc: 0.972901
[620]	valid_0's mcc: 0.97325
[640]	valid_0's m

[I 2024-08-08 15:01:52,047] Trial 11 finished with value: 0.9815198208734373 and parameters: {'num_leaves': 93, 'max_depth': 8, 'subsample': 0.9961617667119053, 'colsample_bytree': 0.21930033566896312, 'reg_alpha': 1, 'reg_lambda': 19}. Best is trial 2 with value: 0.9831417833415508.


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.340752
[40]	valid_0's mcc: 0.717344
[60]	valid_0's mcc: 0.835979
[80]	valid_0's mcc: 0.876726
[100]	valid_0's mcc: 0.897537
[120]	valid_0's mcc: 0.915509
[140]	valid_0's mcc: 0.92302
[160]	valid_0's mcc: 0.928724
[180]	valid_0's mcc: 0.935665
[200]	valid_0's mcc: 0.94134
[220]	valid_0's mcc: 0.944348
[240]	valid_0's mcc: 0.948317
[260]	valid_0's mcc: 0.951744
[280]	valid_0's mcc: 0.954101
[300]	valid_0's mcc: 0.956621
[320]	valid_0's mcc: 0.959386
[340]	valid_0's mcc: 0.961589
[360]	valid_0's mcc: 0.961554
[380]	valid_0's mcc: 0.962277
[400]	valid_0's mcc: 0.964024
[420]	valid_0's mcc: 0.965068
[440]	valid_0's mcc: 0.966272
[460]	valid_0's mcc: 0.967256
[480]	valid_0's mcc: 0.967896
[500]	valid_0's mcc: 0.968352
[520]	valid_0's mcc: 0.968312
[540]	valid_0's mcc: 0.969045
[560]	valid_0's mcc: 0.96977
[580]	valid_0's mcc: 0.970198
[600]	valid_0's mcc: 0.970798
[620]	valid_0's mcc: 0.971193
[640]	valid_0's

[I 2024-08-08 15:23:21,037] Trial 12 finished with value: 0.9810451124607008 and parameters: {'num_leaves': 63, 'max_depth': 8, 'subsample': 0.9926377878514643, 'colsample_bytree': 0.22173164994974603, 'reg_alpha': 8, 'reg_lambda': 25}. Best is trial 2 with value: 0.9831417833415508.


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.620724
[40]	valid_0's mcc: 0.890604
[60]	valid_0's mcc: 0.940683
[80]	valid_0's mcc: 0.954009
[100]	valid_0's mcc: 0.963189
[120]	valid_0's mcc: 0.966093
[140]	valid_0's mcc: 0.967881
[160]	valid_0's mcc: 0.970115
[180]	valid_0's mcc: 0.970724
[200]	valid_0's mcc: 0.971821
[220]	valid_0's mcc: 0.972144
[240]	valid_0's mcc: 0.972646
[260]	valid_0's mcc: 0.973167
[280]	valid_0's mcc: 0.974137
[300]	valid_0's mcc: 0.974731
[320]	valid_0's mcc: 0.975143
[340]	valid_0's mcc: 0.975381
[360]	valid_0's mcc: 0.975977
[380]	valid_0's mcc: 0.97616
[400]	valid_0's mcc: 0.976478
[420]	valid_0's mcc: 0.976685
[440]	valid_0's mcc: 0.976935
[460]	valid_0's mcc: 0.97705
[480]	valid_0's mcc: 0.977424
[500]	valid_0's mcc: 0.977825
[520]	valid_0's mcc: 0.978102
[540]	valid_0's mcc: 0.978445
[560]	valid_0's mcc: 0.978684
[580]	valid_0's mcc: 0.978898
[600]	valid_0's mcc: 0.979153
[620]	valid_0's mcc: 0.979288
[640]	valid_0'

[I 2024-08-08 15:46:26,470] Trial 13 finished with value: 0.9828214538247719 and parameters: {'num_leaves': 103, 'max_depth': 8, 'subsample': 0.8283125620195255, 'colsample_bytree': 0.33099651026953036, 'reg_alpha': 79, 'reg_lambda': 19}. Best is trial 2 with value: 0.9831417833415508.


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.696352
[40]	valid_0's mcc: 0.920922
[60]	valid_0's mcc: 0.954997
[80]	valid_0's mcc: 0.965031
[100]	valid_0's mcc: 0.97019
[120]	valid_0's mcc: 0.972383
[140]	valid_0's mcc: 0.973547
[160]	valid_0's mcc: 0.974642
[180]	valid_0's mcc: 0.97524
[200]	valid_0's mcc: 0.975831
[220]	valid_0's mcc: 0.976193
[240]	valid_0's mcc: 0.976683
[260]	valid_0's mcc: 0.977131
[280]	valid_0's mcc: 0.977795
[300]	valid_0's mcc: 0.978216
[320]	valid_0's mcc: 0.978514
[340]	valid_0's mcc: 0.978716
[360]	valid_0's mcc: 0.979078
[380]	valid_0's mcc: 0.979144
[400]	valid_0's mcc: 0.979442
[420]	valid_0's mcc: 0.979685
[440]	valid_0's mcc: 0.979918
[460]	valid_0's mcc: 0.980041
[480]	valid_0's mcc: 0.980226
[500]	valid_0's mcc: 0.980355
[520]	valid_0's mcc: 0.980526
[540]	valid_0's mcc: 0.98065
[560]	valid_0's mcc: 0.980773
[580]	valid_0's mcc: 0.980931
[600]	valid_0's mcc: 0.981119
[620]	valid_0's mcc: 0.981225
[640]	valid_0's

[I 2024-08-08 16:09:58,520] Trial 14 finished with value: 0.9829801236655065 and parameters: {'num_leaves': 108, 'max_depth': 9, 'subsample': 0.7977492014849863, 'colsample_bytree': 0.3294292971599561, 'reg_alpha': 102, 'reg_lambda': 6}. Best is trial 2 with value: 0.9831417833415508.


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.928383
[40]	valid_0's mcc: 0.966681
[60]	valid_0's mcc: 0.970193
[80]	valid_0's mcc: 0.973312
[100]	valid_0's mcc: 0.974558
[120]	valid_0's mcc: 0.974791
[140]	valid_0's mcc: 0.975255
[160]	valid_0's mcc: 0.975674
[180]	valid_0's mcc: 0.975808
[200]	valid_0's mcc: 0.975995
[220]	valid_0's mcc: 0.976229
[240]	valid_0's mcc: 0.97652
[260]	valid_0's mcc: 0.976737
[280]	valid_0's mcc: 0.977056
[300]	valid_0's mcc: 0.977173
[320]	valid_0's mcc: 0.977455
[340]	valid_0's mcc: 0.977584
[360]	valid_0's mcc: 0.97772
[380]	valid_0's mcc: 0.977908
[400]	valid_0's mcc: 0.977995
[420]	valid_0's mcc: 0.978167
[440]	valid_0's mcc: 0.978365
[460]	valid_0's mcc: 0.978491
[480]	valid_0's mcc: 0.978643
[500]	valid_0's mcc: 0.978711
[520]	valid_0's mcc: 0.978834
[540]	valid_0's mcc: 0.978944
[560]	valid_0's mcc: 0.979012
[580]	valid_0's mcc: 0.979054
[600]	valid_0's mcc: 0.979125
[620]	valid_0's mcc: 0.979215
[640]	valid_0'

[I 2024-08-08 16:31:52,771] Trial 15 finished with value: 0.9810575205495577 and parameters: {'num_leaves': 117, 'max_depth': 10, 'subsample': 0.6381929426438357, 'colsample_bytree': 0.6178402401244207, 'reg_alpha': 241, 'reg_lambda': 5}. Best is trial 2 with value: 0.9831417833415508.


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.673909
[40]	valid_0's mcc: 0.907885
[60]	valid_0's mcc: 0.943256
[80]	valid_0's mcc: 0.955562
[100]	valid_0's mcc: 0.963251
[120]	valid_0's mcc: 0.967268
[140]	valid_0's mcc: 0.969533
[160]	valid_0's mcc: 0.970963
[180]	valid_0's mcc: 0.971901
[200]	valid_0's mcc: 0.97302
[220]	valid_0's mcc: 0.973624
[240]	valid_0's mcc: 0.974149
[260]	valid_0's mcc: 0.974624
[280]	valid_0's mcc: 0.975356
[300]	valid_0's mcc: 0.975857
[320]	valid_0's mcc: 0.97631
[340]	valid_0's mcc: 0.976532
[360]	valid_0's mcc: 0.977092
[380]	valid_0's mcc: 0.977236
[400]	valid_0's mcc: 0.977589
[420]	valid_0's mcc: 0.977893
[440]	valid_0's mcc: 0.978384
[460]	valid_0's mcc: 0.978649
[480]	valid_0's mcc: 0.979017
[500]	valid_0's mcc: 0.979321
[520]	valid_0's mcc: 0.97957
[540]	valid_0's mcc: 0.979732
[560]	valid_0's mcc: 0.979917
[580]	valid_0's mcc: 0.980162
[600]	valid_0's mcc: 0.980394
[620]	valid_0's mcc: 0.980573
[640]	valid_0's

[I 2024-08-08 16:55:33,343] Trial 16 finished with value: 0.9832484155231614 and parameters: {'num_leaves': 73, 'max_depth': 9, 'subsample': 0.8704060497676991, 'colsample_bytree': 0.32575454125826914, 'reg_alpha': 47, 'reg_lambda': 66}. Best is trial 16 with value: 0.9832484155231614.


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.865872
[40]	valid_0's mcc: 0.948424
[60]	valid_0's mcc: 0.959055
[80]	valid_0's mcc: 0.962952
[100]	valid_0's mcc: 0.966736
[120]	valid_0's mcc: 0.967699
[140]	valid_0's mcc: 0.968396
[160]	valid_0's mcc: 0.96947
[180]	valid_0's mcc: 0.970392
[200]	valid_0's mcc: 0.971288
[220]	valid_0's mcc: 0.972016
[240]	valid_0's mcc: 0.97252
[260]	valid_0's mcc: 0.972908
[280]	valid_0's mcc: 0.97326
[300]	valid_0's mcc: 0.973494
[320]	valid_0's mcc: 0.973763
[340]	valid_0's mcc: 0.974092
[360]	valid_0's mcc: 0.974596
[380]	valid_0's mcc: 0.975295
[400]	valid_0's mcc: 0.975582
[420]	valid_0's mcc: 0.976034
[440]	valid_0's mcc: 0.976372
[460]	valid_0's mcc: 0.976643
[480]	valid_0's mcc: 0.976947
[500]	valid_0's mcc: 0.977288
[520]	valid_0's mcc: 0.977556
[540]	valid_0's mcc: 0.977833
[560]	valid_0's mcc: 0.978111
[580]	valid_0's mcc: 0.978325
[600]	valid_0's mcc: 0.97848
[620]	valid_0's mcc: 0.978681
[640]	valid_0's 

[I 2024-08-08 17:19:25,215] Trial 17 finished with value: 0.98286152081742 and parameters: {'num_leaves': 70, 'max_depth': 9, 'subsample': 0.8903600659887809, 'colsample_bytree': 0.5666651790828627, 'reg_alpha': 36, 'reg_lambda': 65}. Best is trial 16 with value: 0.9832484155231614.


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.788047
[40]	valid_0's mcc: 0.894442
[60]	valid_0's mcc: 0.916428
[80]	valid_0's mcc: 0.932724
[100]	valid_0's mcc: 0.94257
[120]	valid_0's mcc: 0.941736
[140]	valid_0's mcc: 0.942966
[160]	valid_0's mcc: 0.944468
[180]	valid_0's mcc: 0.945015
[200]	valid_0's mcc: 0.948075
[220]	valid_0's mcc: 0.950208
[240]	valid_0's mcc: 0.951153
[260]	valid_0's mcc: 0.952868
[280]	valid_0's mcc: 0.954543
[300]	valid_0's mcc: 0.956361
[320]	valid_0's mcc: 0.957634
[340]	valid_0's mcc: 0.959288
[360]	valid_0's mcc: 0.960359
[380]	valid_0's mcc: 0.961565
[400]	valid_0's mcc: 0.962633
[420]	valid_0's mcc: 0.963542
[440]	valid_0's mcc: 0.964635
[460]	valid_0's mcc: 0.966972
[480]	valid_0's mcc: 0.969172
[500]	valid_0's mcc: 0.970128
[520]	valid_0's mcc: 0.970725
[540]	valid_0's mcc: 0.971239
[560]	valid_0's mcc: 0.971751
[580]	valid_0's mcc: 0.972006
[600]	valid_0's mcc: 0.972323
[620]	valid_0's mcc: 0.973075
[640]	valid_0

[I 2024-08-08 17:41:12,124] Trial 18 finished with value: 0.9813462460068485 and parameters: {'num_leaves': 50, 'max_depth': 7, 'subsample': 0.665566312220514, 'colsample_bytree': 0.6498363720364115, 'reg_alpha': 12, 'reg_lambda': 57}. Best is trial 16 with value: 0.9832484155231614.


Training until validation scores don't improve for 300 rounds
[20]	valid_0's mcc: 0.579892
[40]	valid_0's mcc: 0.875019
[60]	valid_0's mcc: 0.925874
[80]	valid_0's mcc: 0.941711
[100]	valid_0's mcc: 0.949874
[120]	valid_0's mcc: 0.956535
[140]	valid_0's mcc: 0.961255
[160]	valid_0's mcc: 0.962777
[180]	valid_0's mcc: 0.963332
[200]	valid_0's mcc: 0.965424
[220]	valid_0's mcc: 0.966826
[240]	valid_0's mcc: 0.967439
[260]	valid_0's mcc: 0.968038
[280]	valid_0's mcc: 0.968721
[300]	valid_0's mcc: 0.969241
[320]	valid_0's mcc: 0.969668
[340]	valid_0's mcc: 0.970089
[360]	valid_0's mcc: 0.970196
[380]	valid_0's mcc: 0.970505
[400]	valid_0's mcc: 0.97096
[420]	valid_0's mcc: 0.971225
[440]	valid_0's mcc: 0.971437
[460]	valid_0's mcc: 0.97182
[480]	valid_0's mcc: 0.972272
[500]	valid_0's mcc: 0.972799
[520]	valid_0's mcc: 0.97388
[540]	valid_0's mcc: 0.974178
[560]	valid_0's mcc: 0.974523
[580]	valid_0's mcc: 0.974746
[600]	valid_0's mcc: 0.974998
[620]	valid_0's mcc: 0.975177
[640]	valid_0's

[I 2024-08-08 18:03:48,601] Trial 19 finished with value: 0.9802511914362402 and parameters: {'num_leaves': 78, 'max_depth': 10, 'subsample': 0.8685413593045525, 'colsample_bytree': 0.3134558519255893, 'reg_alpha': 335, 'reg_lambda': 77}. Best is trial 16 with value: 0.9832484155231614.


# 5. Saving sampler and pruner
To pick up the current study easily next time, we have to save our sampler and pruner. [Learn more](https://optuna.readthedocs.io/en/stable/tutorial/20_recipes/001_rdb.html#resume-study).

In [16]:
with open("sampler.pkl", "wb") as fout:
    pickle.dump(study.sampler, fout)
    print("Sampler saved!")
    
with open("pruner.pkl", "wb") as fout:
    pickle.dump(study.pruner, fout)
    print("Pruner saved!")

Sampler saved!
Pruner saved!
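
The save step here and the load-or-fall-back cells in section 4 form one round trip. A minimal sketch of that pattern, using only the standard library, is shown below. The helpers `load_or_default` and `save` are hypothetical names, and a plain `dict` stands in for the sampler or pruner, since any picklable object behaves the same way.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_default(path, default_factory):
    """Unpickle the object at `path`, or build a fresh one if the file is missing.

    Returns (object, restored_flag).
    """
    try:
        with open(path, "rb") as fin:
            return pickle.load(fin), True
    except FileNotFoundError:
        return default_factory(), False


def save(obj, path):
    """Pickle `obj` to `path`, overwriting any previous file."""
    with open(path, "wb") as fout:
        pickle.dump(obj, fout)


with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "sampler.pkl"

    # First run: nothing has been saved yet, so the default is used.
    state, restored = load_or_default(state_path, dict)
    print(restored)  # False
    state["trials_seen"] = 20
    save(state, state_path)

    # Next run: the saved state is picked up again.
    state, restored = load_or_default(state_path, dict)
    print(restored, state)  # True {'trials_seen': 20}
```

As with any pickle file, only load sampler and pruner files that you created yourself, because unpickling can execute arbitrary code.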


In [19]:
print(f"Best score: {study.best_value}")
print(study.best_params)

Best score: 0.9832484155231614
{'num_leaves': 73, 'max_depth': 9, 'subsample': 0.8704060497676991, 'colsample_bytree': 0.32575454125826914, 'reg_alpha': 47, 'reg_lambda': 66}
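
The timestamps in the Optuna log above also give a rough idea of how many trials fit into one session under Kaggle's 12-hour limit. The calculation below is a back-of-the-envelope sketch: it assumes every trial takes about as long as the average of this run and ignores package installation and data loading.

```python
from datetime import datetime

# Timestamps copied from the Optuna log in this notebook.
study_created = datetime(2024, 8, 8, 10, 49, 33)
last_trial_finished = datetime(2024, 8, 8, 18, 3, 48)
n_trials = 20

elapsed = (last_trial_finished - study_created).total_seconds()
seconds_per_trial = elapsed / n_trials

session_limit = 12 * 60 * 60  # 12-hour session limit, in seconds
max_trials = int(session_limit // seconds_per_trial)

print(f"{seconds_per_trial / 60:.1f} min per trial")  # 21.7 min per trial
print(max_trials)  # 33
```

At roughly 22 minutes per trial, `n_trials=20` leaves comfortable headroom in a single session, and further trials can be added by rerunning the notebook against the saved study.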
