<div style=" border-bottom: 8px solid #e3f56c; overflow: hidden; border-radius: 10px; height: 95%; width: 100%; display: flex;">
  <div style="height: 100%; width: 100%; background-color: #3800BB; float: left; text-align: center; display: flex; justify-content: left; align-items: center; font-size: 40px; ">
    <b><span style="color: #FFFFFF; padding: 20px 20px;">Hyperparameter Tuning with Optuna</span></b>
  </div>
</div>



<div class="alert" style="background-color: #FFFFFF; border-left: 8px solid #B12111; padding: 14px; border-radius: 8px; font-size: 14px; color: #000000;">

<div class="alert alert-danger">

**Contents** 
</div>

<hr>
  <p><font size="3" face="Arial" font-size="large">
  <ul type="square">

  <li> Basic Optuna concepts;  </li>
  <li> Implementation;  </li>
  <li> Visuals;  </li>
  <li> Pruning;  </li>
  <li> Conclusion;  </li>
  <li> Useful Resources  </li>
  
  </ul>
  </font></p>

</div>

<div class="alert" style="background-color:#E8F8F5; border-left: 8px solid #1ABC9C; padding: 14px; border-radius: 8px; font-size: 14px; color: #000000;">

* In previous lessons, we searched for optimal hyperparameters with classical methods: exhaustive grid search (`GridSearchCV`) and random search over a given distribution (`Random Search`).
* Although `Random Search` significantly speeds up the search, it can miss the set of hyperparameters where the model performs best.
* A natural idea follows: "What if we start with some random guesses, as in `Random Search`, and then sample more often in the regions where the model scored better?" This approach is called **Bayesian hyperparameter optimization**.
* The most popular libraries implementing this method are `HyperOpt` and `Optuna`. In our experience, `HyperOpt` often fails and behaves unstably, so this notebook focuses on **`Optuna`**.
</div>
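
<div class="alert" style="background-color: #E8F8F5; border-left: 8px solid #1ABC9C; padding: 14px; border-radius: 8px; font-size: 14px; color: #000000;">

The "guess first, then look more closely where the results were good" idea can be sketched in a few lines. This is only a toy illustration of the intuition. It is not Bayesian optimization proper and not what `Optuna` does internally; `toy_objective`, `guided_search`, and all constants are hypothetical.

```python
import random


def toy_objective(x):
    # Hypothetical stand-in for "model quality as a function of one hyperparameter"
    return -(x - 3.0) ** 2  # the best possible score is 0, reached at x = 3


def guided_search(objective, low, high, n_random=20, n_refine=80, seed=0):
    rng = random.Random(seed)
    # Phase 1: pure random guesses, exactly as in random search
    history = []
    for _ in range(n_random):
        x = rng.uniform(low, high)
        history.append((x, objective(x)))
    # Phase 2: sample around the best point seen so far, slowly narrowing the window
    width = (high - low) / 4
    for _ in range(n_refine):
        best_x, _ = max(history, key=lambda item: item[1])
        x = min(high, max(low, rng.gauss(best_x, width)))
        history.append((x, objective(x)))
        width *= 0.97
    return max(history, key=lambda item: item[1])


best_x, best_score = guided_search(toy_objective, low=-10.0, high=10.0)
print(f"best x = {best_x:.3f}, score = {best_score:.5f}")
```

Real Bayesian methods replace the "sample near the best point" rule with a probabilistic model of the objective, but the division of labor between exploration and exploitation is the same.
</div>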

<div class="alert" style="background-color: #E8F8F5; border-left: 8px solid #1ABC9C; padding: 14px; border-radius: 8px; font-size: 14px; color: #000000;">

<div class="alert alert-info">

**Key Features of the `Optuna` Framework**
</div>

* Lightweight and highly versatile - suitable for optimizing arbitrary functions and evaluation metrics.
* Incorporates state-of-the-art algorithms specifically adapted for hyperparameter search.
* Supports parallel execution and advanced pruning strategies.
* Includes built-in tools for result visualization.
* Offers seamless integration with many popular libraries and frameworks (e.g., boosting algorithms, **scikit-learn**, **PyTorch**, **Weights & Biases**, among others).

To understand how to use it effectively, we will examine the framework in detail.

</div>


In [1]:
import optuna

import numpy as np
import pandas as pd

from catboost import CatBoostClassifier
from sklearn.model_selection import KFold, train_test_split

<div class="alert" style="background-color: #E8F8F5; border-left: 8px solid #1ABC9C; padding: 14px; border-radius: 8px; font-size: 14px; color: #000000;">

**`Optuna` has two core concepts:**


<div class="alert alert-info">

**1. `Study`: an optimization based on an `Objective` function.**
</div>


The `Objective` function should contain the logic for calculating the metric to optimize. Optuna will call this function multiple times to search for the best set of parameters.


<div class="alert" style="background-color:rgb(0, 0, 0); border-left: 8px solid #B12111; padding: 14px; border-radius: 8px; font-size: 14px; color:rgb(255, 255, 255);">

```python
def objective(trial):
    # calculate the metric from the parameters suggested via `trial`
    score = ...  # placeholder for the actual computation
    return score
```
</div>

<div class="alert alert-info">

**2. `Trial`: a single execution of the `Objective` function.**
</div>


Through the `trial` object, we declare each parameter to tune with the `suggest_*` method that matches its type. For example:

<div class="alert" style="background-color:rgb(0, 0, 0); border-left: 8px solid #B12111; padding: 14px; border-radius: 8px; font-size: 14px; color:rgb(255, 255, 255);">

```python
# `suggest_float` method is used to tune float values within the range [0, 1.5]
param = trial.suggest_float('param', 0, 1.5) 

# Categorical value
loss_function = trial.suggest_categorical('loss', ['Logloss', 'CrossEntropy'])

# Integer value
depth = trial.suggest_int('depth', 5, 8)

# Uniform distribution (`suggest_uniform` is deprecated since Optuna 3.0;
# `suggest_float` samples uniformly by default)
learning_rate = trial.suggest_float('learning_rate', 0.0, 1.0)
```
</div>

</div>

<div class="alert" style="background-color: #E8F8F5; border-left: 8px solid #1ABC9C; padding: 14px; border-radius: 8px; font-size: 14px; color: #000000;">
    
[`Optuna`](https://optuna.readthedocs.io/en/stable/index.html) implements several parameter search methods (`samplers`), including classical ones:
* `GridSampler`
* `RandomSampler`
* `Tree-Structured Parzen Estimator` (`TPESampler` – the most popular and default one)
* `BruteForceSampler`
* And [several more](https://optuna.readthedocs.io/en/stable/reference/samplers/index.html#module-optuna.samplers); you can also implement a custom sampler.
</div>


<div class="alert" style="background-color: #E8F8F5; border-left: 8px solid #1ABC9C; padding: 14px; border-radius: 8px; font-size: 14px; color: #000000;">

<div class="alert alert-info">

**Let's go through a simple example to see how `Optuna` works:**
</div>

* Suppose we have a function `y = (x+1)(x+5)(x-9)`, and we want to find the value of `x` that minimizes it.
* We'll create an `objective` function that takes a `trial` object as its argument; through it, we'll define the search range for `x` as `[-2, 10]`.

<img src='../imgs/05.2.02_1.JPG' width='600px'>

</div>


In [2]:
# Limit logging to warnings and errors (optuna.logging.WARNING == 30)
optuna.logging.set_verbosity(optuna.logging.WARNING)

In [3]:
def objective(trial):
    x = trial.suggest_float("x", -2, 10)
    return (x + 1) * (x + 5) * (x - 9)


# create a study object; since we are searching for a minimum, the direction parameter is left at its default ("minimize")
study = optuna.create_study()

# run the optimization for 250 trials
study.optimize(objective,
               n_jobs=-1,
               n_trials=250,
               show_progress_bar=True)


In [4]:
optuna.visualization.plot_slice(study)

In [5]:
optuna.visualization.plot_optimization_history(study)

In [6]:
# display best parameters
print("Best parameters:", study.best_params)
# display best value
print("Best value:", study.best_value)

Best parameters: {'x': 5.156668354156372}
Best value: -240.32828831774358


<div class="alert" style="background-color: #FEF9E7; border-left: 8px solid #D4AC0D; padding: 14px; border-radius: 8px; font-size: 14px; color: #000000;">

The plots show the results of all trials. The best value was found at `x ≈ 5.16`, which is close to the true minimum of the function on the search range (`x ≈ 5.163`).

</div>
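
<div class="alert" style="background-color: #E8F8F5; border-left: 8px solid #1ABC9C; padding: 14px; border-radius: 8px; font-size: 14px; color: #000000;">

The result can be checked by hand. Expanding the product gives `y = x^3 - 3x^2 - 49x - 45`, so the derivative is `3x^2 - 6x - 49`, and its roots are the critical points. A short sketch of this check, using only the standard library:

```python
import math


def f(x):
    return (x + 1) * (x + 5) * (x - 9)


# Coefficients of the derivative f'(x) = 3x^2 - 6x - 49
a, b, c = 3.0, -6.0, -49.0
disc = math.sqrt(b * b - 4 * a * c)
critical_points = [(-b - disc) / (2 * a), (-b + disc) / (2 * a)]

# f''(x) = 6x - 6 is positive only at the larger root, so that root is the local minimum
x_min = max(critical_points)

# The interval ends must be checked too, because the search range is bounded
endpoints = {x: f(x) for x in (-2, 10)}

print(f"x_min = {x_min:.4f}, f(x_min) = {f(x_min):.4f}")
print(endpoints)
```

The local minimum lies at about `x = 5.163` with a value of about `-240.33`, and both interval ends give positive values, so it is also the minimum on `[-2, 10]`. The value found by `Optuna` differs from it only in the fourth decimal place.
</div>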

In [7]:
from classes import Paths
paths = Paths()
path = paths.quickstart_train
dft = pd.read_csv(path)
dft.head(10)

Unnamed: 0,car_id,model,car_type,fuel_type,car_rating,year_to_start,riders,year_to_work,target_reg,target_class,mean_rating,distance_sum,rating_min,speed_max,user_ride_quality_median,deviation_normal_count,user_uniq
0,y13744087j,Kia Rio X-line,economy,petrol,3.78,2015,76163,2021,109.99,another_bug,4.737759,12141310.0,0.1,180.855726,0.023174,174,170
1,O41613818T,VW Polo VI,economy,petrol,3.9,2015,78218,2021,34.48,electro_bug,4.480517,18039090.0,0.0,187.862734,12.306011,174,174
2,d-2109686j,Renault Sandero,standart,petrol,6.3,2012,23340,2017,34.93,gear_stick,4.768391,15883660.0,0.1,102.382857,2.513319,174,173
3,u29695600e,Mercedes-Benz GLC,business,petrol,4.04,2011,1263,2020,32.22,engine_fuel,3.88092,16518830.0,0.1,172.793237,-5.029476,174,170
4,N-8915870N,Renault Sandero,standart,petrol,4.7,2012,26428,2017,27.51,engine_fuel,4.181149,13983170.0,0.1,203.462289,-14.260456,174,171
5,b12101843B,Skoda Rapid,economy,petrol,2.36,2013,42176,2018,48.99,engine_ignition,4.351782,10855890.0,0.1,180.886289,-18.221832,174,173
6,Q-9368117S,Nissan Qashqai,standart,petrol,5.32,2012,24611,2014,54.72,engine_overheat,4.392126,8343280.0,0.1,174.984786,12.321364,174,167
7,O-2124190y,Tesla Model 3,premium,electro,3.9,2017,116872,2019,50.4,gear_stick,4.712356,9793288.0,0.1,95.890736,-8.939366,174,139
8,h16895544p,Kia Sportage,standart,petrol,3.5,2014,56384,2017,33.59,gear_stick,4.507759,16444050.0,0.32,101.798615,-1.16469,174,170
9,K77009462l,Smart ForFour,economy,petrol,4.56,2013,41309,2018,39.04,gear_stick,4.376839,6975742.0,0.1,125.254983,3.769684,174,173


In [8]:
cat_features = ["model", "car_type", "fuel_type"]  # categorical features
targets = ["target_class", "target_reg"]
features2drop = ["car_id"]  # features to be dropped

In [9]:
# selecting the final set of features for model use
filtered_features = [i for i in dft.columns if (i not in targets and i not in features2drop)]
num_features = [i for i in filtered_features if i not in cat_features]

In [10]:
CatBoostClassifier()

<catboost.core.CatBoostClassifier at 0x173624e60>

<div class="alert" style="background-color: #E8F8F5; border-left: 8px solid #1ABC9C; padding: 14px; border-radius: 8px; font-size: 14px; color: #000000;">

<div class="alert alert-info">

**Tips for hyperparameter tuning:**
</div>

- Understand what each parameter does and how strongly it affects the model.
- Set the number of `iterations` with a margin and keep it fixed; let `early_stopping_rounds` limit the actual training length.
- Look up or estimate sensible value ranges and step sizes.
- Exclude parameters that don't need tuning (`random_seed`, `eval_metric`, `thread_count`, etc.).
- Use insights from previous runs.

</div>


<div class="alert" style="background-color: #FEF9E7; border-left: 8px solid #D4AC0D; padding: 14px; border-radius: 8px; font-size: 14px; color: #000000;">

Let's define a training function for `CatBoost` that fits a model on one train/validation split (a single `KFold` fold) and returns the model together with its predictions on the validation part.

</div>

In [None]:
def fit_catboost(trial, train, val):
    X_train, y_train = train
    X_val, y_val = val

    param = {
        "iterations": 400,  # fixed with a margin, not tuned: early stopping limits the actual number of trees
        "learning_rate": trial.suggest_float("learning_rate", 0.001, 0.01),
        "l2_leaf_reg": trial.suggest_int("l2_leaf_reg", 2, 50),
        "colsample_bylevel": trial.suggest_float("colsample_bylevel", 0.01, 0.8),

        "auto_class_weights": trial.suggest_categorical("auto_class_weights", ["SqrtBalanced", "Balanced", "None"]),
        "depth": trial.suggest_int("depth", 3, 9),

        "boosting_type": trial.suggest_categorical("boosting_type", ["Ordered", "Plain"]),
        "bootstrap_type": trial.suggest_categorical("bootstrap_type", ["Bayesian", "Bernoulli", "MVS"]),
        "used_ram_limit": "14gb",
        "eval_metric": "Accuracy",  # fixed in advance, not tuned
    }

    
    if param["bootstrap_type"] == "Bayesian":
        param["bagging_temperature"] = trial.suggest_float("bagging_temperature", 0, 20)
        
    elif param["bootstrap_type"] == "Bernoulli":
        param["subsample"] = trial.suggest_float("subsample", 0.1, 1)
        

    clf = CatBoostClassifier(
        **param,
        thread_count=-1,
        random_seed=42,
        cat_features=cat_features,
    )

    clf.fit(
        X_train,
        y_train,
        eval_set=(X_val, y_val),
        verbose=0,
        plot=False,
        early_stopping_rounds=5,
    )

    y_pred = clf.predict(X_val)
    return clf, y_pred

<div class="alert" style="background-color: #FEF9E7; border-left: 8px solid #D4AC0D; padding: 14px; border-radius: 8px; font-size: 14px; color: #000000;">

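Let's define the `objective` function. It wraps `KFold` validation so that each set of hyperparameters is scored on a held-out part of the dataset. Note that the loop stops after the first fold (`break`), so each trial is scored on a single split.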

</div>

In [None]:
from sklearn.metrics import accuracy_score


def objective(trial, return_models=False):
    n_splits = 3
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    X_train = dft[filtered_features].drop(targets, axis=1, errors="ignore")
    y_train = dft["target_class"]

    scores, models = [], []
    
    for train_idx, valid_idx in kf.split(X_train):
        train_data = X_train.iloc[train_idx, :], y_train.iloc[train_idx]
        valid_data = X_train.iloc[valid_idx, :], y_train.iloc[valid_idx]

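        # Pass the trial to the training function so it can suggest hyperparameters
        model, y_pred = fit_catboost(trial, train_data, valid_data)  # defined above
        scores.append(accuracy_score(valid_data[1], y_pred))  # argument order: (y_true, y_pred)
        models.append(model)
        break  # only the first fold is evaluated; without this line all folds would be averaged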
         

    result = np.mean(scores)
    
    if return_models:
        return result, models
    else:
        return result
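
<div class="alert" style="background-color: #E8F8F5; border-left: 8px solid #1ABC9C; padding: 14px; border-radius: 8px; font-size: 14px; color: #000000;">

Because of the `break`, the loop above scores each trial on the first fold only. For comparison, here is a minimal sketch of the same loop averaging over all folds. It runs on synthetic data and uses `DummyClassifier` as a stand-in for `CatBoost`; both are assumptions made to keep the example fast and self-contained.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

rng = np.random.default_rng(42)
X = rng.normal(size=(90, 3))      # synthetic features, not the car dataset
y = rng.integers(0, 2, size=90)   # synthetic binary labels

kf = KFold(n_splits=3, shuffle=True, random_state=42)
scores = []
for train_idx, valid_idx in kf.split(X):
    model = DummyClassifier(strategy="most_frequent")  # placeholder for CatBoost
    model.fit(X[train_idx], y[train_idx])
    y_pred = model.predict(X[valid_idx])
    scores.append(accuracy_score(y[valid_idx], y_pred))

# One score per fold; their mean is what the objective would return
mean_score = float(np.mean(scores))
print(len(scores), mean_score)
```

Averaging over all folds gives a less noisy estimate per trial at roughly `n_splits` times the cost, so stopping after one fold trades reliability for speed.
</div>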

In [15]:
study = optuna.create_study(direction="maximize")
study.optimize(objective,
               n_trials=600,
               n_jobs=-1,
               show_progress_bar=True)


<div class="alert" style="background-color: #FEF9E7; border-left: 8px solid #D4AC0D; padding: 14px; border-radius: 8px; font-size: 14px; color: #000000;">

Let's display the best parameters.

</div>

In [16]:
print("Best trial: score {}, params {}".format(study.best_trial.value, study.best_trial.params))

Best trial: score 0.7971758664955071, params {'learning_rate': 0.007609476408946439, 'l2_leaf_reg': 7, 'colsample_bylevel': 0.2891035998013902, 'auto_class_weights': 'Balanced', 'depth': 9, 'boosting_type': 'Plain', 'bootstrap_type': 'MVS'}


<div class="alert" style="background-color: #FEF9E7; border-left: 8px solid #D4AC0D; padding: 14px; border-radius: 8px; font-size: 14px; color: #000000;">

Retrain a new model with the best parameters selected by Optuna.

</div>

In [17]:
valid_scores, models = objective(
    optuna.trial.FixedTrial(study.best_params),
    return_models=True,
)

In [18]:
valid_scores, len(models)

(0.7971758664955071, 1)