* https://colab.research.google.com/github/optuna/optuna-examples/blob/main/visualization/plot_study.ipynb#scrollTo=W8MhH9ZXKraV
* https://neptune.ai/blog/hyperparameter-tuning-in-python-a-complete-guide-2020
* https://www.kaggle.com/alexandrnikitin/xgboost-hyperparameter-optimization
* https://www.kaggle.com/corochann/optuna-tutorial-for-hyperparameter-optimization


What is the difference between parameter and hyperparameter?
First, let’s understand the differences between a hyperparameter and a parameter in machine learning.

Model parameters: These are the parameters that are estimated by the model from the given data. For example the weights of a deep neural network. 
Model hyperparameters: These are the parameters that cannot be estimated by the model from the given data. These parameters are used to estimate the model parameters. For example, the learning rate in deep neural networks.

What is hyperparameter tuning and why it is important?
Hyperparameter tuning is the process of determining the right combination of hyperparameters that allows the model to maximize model performance. Setting the correct combination of hyperparameters is the only way to extract the maximum performance out of models.

How do I choose good hyperparameters?
Choosing the right combination of hyperparameters is not an easy task. There are two ways to set them.

Manual hyperparameter tuning: In this method, different combinations of hyperparameters are set (and experimented with) manually. This is a tedious process and cannot be practical in cases where there are many hyperparameters to try.
**Automated hyperparameter tuning**: In this method, optimal hyperparameters are found using an algorithm that automates and optimizes the process.
* Grid Search
* Random Search
* Bayesian Optimization


In [48]:
import os

import joblib
from optuna import create_study
from optuna import Trial
from optuna.visualization import plot_optimization_history
from optuna.visualization import plot_parallel_coordinate
import pandas as pd
from sklearn import metrics
from sklearn.pipeline import Pipeline
from sklearn.model_selection import KFold
from xgboost import XGBClassifier

from src.config import CATEGORICAL_COL, NUMERICAL_COL, LABEL_COL
from src.utils import plot_metric_curves
from src.transformer import preprocessor_numeric, preprocessor_full

In [57]:
def objective(trial):
    train = pd.read_csv("../data/users_train.csv")

    train.drop(columns=["user_first_engagement", "user_pseudo_id"], inplace=True)

    x_train, y_train = train.drop(columns=[LABEL_COL]), train[LABEL_COL]
    scale_pos_weight = 1 / y_train.mean()

    folds = 5
    shuffle = True
    seed = 42
    kf = KFold(n_splits=folds, shuffle=shuffle, random_state=seed)

    param = {
        "scale_pos_weight": scale_pos_weight,
        "use_label_encoder": False,
        "eval_metric": "logloss",
        "verbosity": 0,
        "objective": "binary:logistic",
        "tree_method": "exact",  # use exact for small dataset.
        "booster": trial.suggest_categorical("booster", ["gbtree", "gblinear", "dart"]),
        "lambda": trial.suggest_float("lambda", 1e-8, 1.0, log=True),
        "alpha": trial.suggest_float("alpha", 1e-8, 1.0, log=True),
        "subsample": trial.suggest_float("subsample", 0.2, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.2, 1.0),
    }

    if param["booster"] in ["gbtree", "dart"]:
        param["max_depth"] = trial.suggest_int("max_depth", 3, 11, step=1)
        param["min_child_weight"] = trial.suggest_int("min_child_weight", 2, 10)
        param["eta"] = trial.suggest_float("eta", 1e-8, 1.0, log=True)
        param["gamma"] = trial.suggest_float("gamma", 1e-8, 1.0, log=True)
        param["grow_policy"] = trial.suggest_categorical(
            "grow_policy", ["depthwise", "lossguide"]
        )

    if param["booster"] == "dart":
        param["sample_type"] = trial.suggest_categorical(
            "sample_type", ["uniform", "weighted"]
        )
        param["normalize_type"] = trial.suggest_categorical(
            "normalize_type", ["tree", "forest"]
        )
        param["rate_drop"] = trial.suggest_float("rate_drop", 1e-8, 1.0, log=True)
        param["skip_drop"] = trial.suggest_float("skip_drop", 1e-8, 1.0, log=True)

    score = []
    for train_idx, valid_idx in kf.split(x_train, y_train):
        train_data = x_train.iloc[train_idx,:], y_train[train_idx]
        valid_data = x_train.iloc[valid_idx,:], y_train[valid_idx]
        xgb_model = Pipeline(
            steps=[
                ("preprocessor", preprocessor_numeric()),
                (
                    "xgboost",
                    XGBClassifier(**param),
                ),
            ]
        )
        xgb_model.fit(x_train, y_train)

        y_pred = xgb_model.predict(x_val)
        f1_score = metrics.f1_score(y_val, y_pred)
        score.append(f1_score)

    return sum(score)/folds

In [58]:
study = create_study(direction="maximize")
study.optimize(objective, n_trials=100, timeout=600)

[32m[I 2021-07-16 00:01:04,007][0m A new study created in memory with name: no-name-5e2271b7-5f01-4caf-b162-8ee004e2f4c3[0m
[32m[I 2021-07-16 00:01:13,107][0m Trial 0 finished with value: 0.4868913857677903 and parameters: {'booster': 'dart', 'lambda': 3.9726446557390856e-05, 'alpha': 1.553324907528414e-08, 'subsample': 0.8145310443406415, 'colsample_bytree': 0.7074439398372714, 'max_depth': 9, 'min_child_weight': 3, 'eta': 0.0038800963185513728, 'gamma': 0.6453866915786873, 'grow_policy': 'depthwise', 'sample_type': 'uniform', 'normalize_type': 'forest', 'rate_drop': 0.12817395507684023, 'skip_drop': 0.0005199494080116552}. Best is trial 0 with value: 0.4868913857677903.[0m
[32m[I 2021-07-16 00:01:13,553][0m Trial 1 finished with value: 0.35469107551487417 and parameters: {'booster': 'gblinear', 'lambda': 0.364289547268627, 'alpha': 0.000880782816197024, 'subsample': 0.8224830473747724, 'colsample_bytree': 0.3356987638720009}. Best is trial 0 with value: 0.4868913857677903.[0

[32m[I 2021-07-16 00:02:10,335][0m Trial 17 finished with value: 0.45090909090909087 and parameters: {'booster': 'dart', 'lambda': 0.000526503793058168, 'alpha': 0.00020279050248193738, 'subsample': 0.20156070505994578, 'colsample_bytree': 0.8698405593985821, 'max_depth': 3, 'min_child_weight': 5, 'eta': 1.621258968403371e-08, 'gamma': 0.04775477949842343, 'grow_policy': 'lossguide', 'sample_type': 'uniform', 'normalize_type': 'forest', 'rate_drop': 0.00033020526531826316, 'skip_drop': 4.644576894913496e-06}. Best is trial 14 with value: 0.5316455696202532.[0m
[32m[I 2021-07-16 00:02:16,631][0m Trial 18 finished with value: 0.4491228070175438 and parameters: {'booster': 'dart', 'lambda': 9.44478744871194e-08, 'alpha': 2.377182463807316e-06, 'subsample': 0.33823259160307817, 'colsample_bytree': 0.6159660933299752, 'max_depth': 6, 'min_child_weight': 8, 'eta': 0.000620040721794146, 'gamma': 5.470965557393542e-05, 'grow_policy': 'depthwise', 'sample_type': 'uniform', 'normalize_type'

[32m[I 2021-07-16 00:03:50,919][0m Trial 32 finished with value: 0.6649874055415617 and parameters: {'booster': 'dart', 'lambda': 3.916011788653445e-08, 'alpha': 7.754975298310968e-06, 'subsample': 0.6133485917752196, 'colsample_bytree': 0.5343690202610041, 'max_depth': 9, 'min_child_weight': 2, 'eta': 0.809800850133268, 'gamma': 1.7334966806486466e-06, 'grow_policy': 'depthwise', 'sample_type': 'weighted', 'normalize_type': 'forest', 'rate_drop': 9.2865345440658e-05, 'skip_drop': 2.124594410838549e-06}. Best is trial 21 with value: 0.6838046272493572.[0m
[32m[I 2021-07-16 00:03:57,728][0m Trial 33 finished with value: 0.6113744075829384 and parameters: {'booster': 'dart', 'lambda': 1.0956469899913262e-08, 'alpha': 0.0001516242076661113, 'subsample': 0.6776407686883408, 'colsample_bytree': 0.5732508872951623, 'max_depth': 9, 'min_child_weight': 2, 'eta': 0.16714057486395248, 'gamma': 1.4480480014094882e-06, 'grow_policy': 'depthwise', 'sample_type': 'weighted', 'normalize_type': '

[32m[I 2021-07-16 00:05:24,248][0m Trial 49 finished with value: 0.5279999999999999 and parameters: {'booster': 'dart', 'lambda': 2.0586091468247585e-06, 'alpha': 0.0012527991155667894, 'subsample': 0.7990372426708638, 'colsample_bytree': 0.5134222671576972, 'max_depth': 11, 'min_child_weight': 4, 'eta': 0.045402200770480496, 'gamma': 3.050184902541971e-06, 'grow_policy': 'depthwise', 'sample_type': 'weighted', 'normalize_type': 'tree', 'rate_drop': 1.3087879792503197e-05, 'skip_drop': 3.411224376599786e-05}. Best is trial 42 with value: 0.6991869918699186.[0m
[32m[I 2021-07-16 00:05:31,700][0m Trial 50 finished with value: 0.5009633911368017 and parameters: {'booster': 'dart', 'lambda': 4.22102274865243e-07, 'alpha': 0.004923696799932832, 'subsample': 0.8752889978114174, 'colsample_bytree': 0.687886556176618, 'max_depth': 10, 'min_child_weight': 2, 'eta': 0.00013113462150254933, 'gamma': 6.813815968970045e-07, 'grow_policy': 'lossguide', 'sample_type': 'weighted', 'normalize_type

[32m[I 2021-07-16 00:07:09,789][0m Trial 65 finished with value: 0.72 and parameters: {'booster': 'dart', 'lambda': 1.00235104037869e-08, 'alpha': 0.13242111875370283, 'subsample': 0.940925770743056, 'colsample_bytree': 0.9882835102379555, 'max_depth': 10, 'min_child_weight': 2, 'eta': 0.42667890342646825, 'gamma': 1.5382839590626556e-08, 'grow_policy': 'depthwise', 'sample_type': 'weighted', 'normalize_type': 'forest', 'rate_drop': 4.986068724948666e-05, 'skip_drop': 0.051851497811587634}. Best is trial 62 with value: 0.75.[0m
[32m[I 2021-07-16 00:07:16,875][0m Trial 66 finished with value: 0.6467661691542289 and parameters: {'booster': 'dart', 'lambda': 3.981183083901084e-08, 'alpha': 0.18147472378146318, 'subsample': 0.9314855744152389, 'colsample_bytree': 0.9658108144821784, 'max_depth': 9, 'min_child_weight': 8, 'eta': 0.5269921908973835, 'gamma': 1.2309930405793975e-08, 'grow_policy': 'depthwise', 'sample_type': 'weighted', 'normalize_type': 'forest', 'rate_drop': 0.00018716

[32m[I 2021-07-16 00:09:11,225][0m Trial 80 finished with value: 0.7317073170731707 and parameters: {'booster': 'gbtree', 'lambda': 3.2293704505109904e-07, 'alpha': 0.014581885177705603, 'subsample': 0.9791204515014515, 'colsample_bytree': 0.9769578632436203, 'max_depth': 10, 'min_child_weight': 3, 'eta': 0.9068410960238443, 'gamma': 1.9246978899664133e-08, 'grow_policy': 'lossguide'}. Best is trial 62 with value: 0.75.[0m
[32m[I 2021-07-16 00:09:13,573][0m Trial 81 finished with value: 0.6502463054187192 and parameters: {'booster': 'gbtree', 'lambda': 3.1425852165578396e-07, 'alpha': 0.01178820666822787, 'subsample': 0.9757191619634681, 'colsample_bytree': 0.9799728945495987, 'max_depth': 10, 'min_child_weight': 3, 'eta': 0.17952427258937403, 'gamma': 2.009432769753629e-08, 'grow_policy': 'lossguide'}. Best is trial 62 with value: 0.75.[0m
[32m[I 2021-07-16 00:09:15,958][0m Trial 82 finished with value: 0.7318435754189945 and parameters: {'booster': 'gbtree', 'lambda': 8.39947

In [63]:
print("Number of finished trials: ", len(study.trials))
print("Best trial:")
trial = study.best_trial

print("  Value: {}".format(trial.value))
print("  Params: ")

trial.params

Number of finished trials:  100
Best trial:
  Value: 0.75
  Params: 


{'booster': 'dart',
 'lambda': 7.476528036300595e-08,
 'alpha': 0.009459705034491677,
 'subsample': 0.8353668146424433,
 'colsample_bytree': 0.8793099088911032,
 'max_depth': 11,
 'min_child_weight': 2,
 'eta': 0.3857463796507874,
 'gamma': 9.862600734559058e-07,
 'grow_policy': 'depthwise',
 'sample_type': 'weighted',
 'normalize_type': 'forest',
 'rate_drop': 1.2892684505104446e-05,
 'skip_drop': 0.0033355721268970603}

In [60]:
plot_optimization_history(study)

In [61]:
plot_parallel_coordinate(study)