<a id="top"></a>
<div class="list-group" id="list-tab" role="tablist">
<h1 class="list-group-item list-group-item-action active" data-toggle="list" style='background:#005097; border:0' role="tab" aria-controls="home"><center>Hyperparameter Optimization with Optuna</center></h1>
</div>

Optuna is an open-source hyperparameter optimization framework that automates the search for good hyperparameters. It works with both classical machine learning models and deep learning models: you define an objective function, such as a cross-validated score, and Optuna searches for the hyperparameters that maximize or minimize it. The sections below cover how Optuna decides which hyperparameters to try next.

New hyperparameter values are proposed by a **sampler**, which is part of the **study configuration**. Which sampler fits best depends on the **hyperparameter types**, the search space, the number of trials, and whether prior knowledge is available.

## 1️⃣ Default Strategy: Tree-structured Parzen Estimator (TPE)
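- If no sampler is specified, Optuna uses **TPE (Tree-structured Parzen Estimator)** by default for single-objective studies.
- TPE **models two probability densities**, one over the hyperparameters of the best trials so far and one over the rest, and proposes new values that are likely under the first and unlikely under the second.
- It suits **non-convex or irregular search spaces**, where grid search scales poorly and random search spends many trials in unpromising regions.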
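
The split into good and bad trials can be illustrated with a toy one-dimensional version. This is a minimal sketch, not Optuna's implementation: the helper names `kde` and `tpe_suggest`, the fixed bandwidth, and the `gamma` quantile are assumptions chosen for illustration.

```python
import math
import random


def kde(points, bandwidth):
    """Build a Gaussian kernel density estimate over 1-D points."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm

    return density


def tpe_suggest(history, low, high, gamma=0.25, n_candidates=24, seed=0):
    """Propose the next x to try, given (x, score) pairs where higher is better."""
    rng = random.Random(seed)
    ranked = sorted(history, key=lambda pair: pair[1], reverse=True)
    n_good = max(1, math.ceil(gamma * len(ranked)))
    good = [x for x, _ in ranked[:n_good]]         # top-gamma fraction of trials
    bad = [x for x, _ in ranked[n_good:]] or good  # everything else
    bandwidth = (high - low) / 10                  # hypothetical fixed bandwidth
    good_density = kde(good, bandwidth)
    bad_density = kde(bad, bandwidth)
    # Draw candidates from the "good" density: pick a good point,
    # add Gaussian noise, and clip to the bounds.
    candidates = [
        min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
        for _ in range(n_candidates)
    ]
    # Keep the candidate that looks most "good" relative to "bad".
    return max(candidates, key=lambda x: good_density(x) / (bad_density(x) + 1e-12))


# Toy objective with its maximum at x = 2.
history_rng = random.Random(1)
history = [(x, -(x - 2) ** 2) for x in (history_rng.uniform(-10, 10) for _ in range(20))]
next_x = tpe_suggest(history, low=-10, high=10)
print(f"next x to try: {next_x:.2f}")
```

Optuna's `TPESampler` follows the same idea but handles multiple parameters, mixed types, and adaptive bandwidths.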

```python
import optuna
study = optuna.create_study(direction="maximize")  # Uses TPE by default
```

## 2️⃣ Explicitly Specifying a Sampler
You can override the default and choose a specific sampler:

| Sampler                  | When to Use?                                           | Example                                      |
|--------------------------|------------------------------------------------------|----------------------------------------------|
| **TPE (default)**        | Works well in most cases, adaptive Bayesian optimization | `optuna.samplers.TPESampler()`               |
| **Random Search**        | Good for benchmarking, large search spaces          | `optuna.samplers.RandomSampler()`           |
| **Grid Search**          | Small, discrete search spaces where every combination can be tried | `optuna.samplers.GridSampler(search_space)` |
| **CMA-ES**              | Good for continuous spaces, often used in reinforcement learning | `optuna.samplers.CmaEsSampler()` |

Example of choosing a sampler:

```python
import optuna

sampler = optuna.samplers.RandomSampler()  # Choose random search
study = optuna.create_study(direction="maximize", sampler=sampler)
```

## 3️⃣ How Optuna Adapts to the Problem
Optuna does not switch samplers on its own, but the way the search space is declared shapes how a sampler behaves:

- **Discrete vs. continuous parameters.** TPE handles categorical parameters (`trial.suggest_categorical`) well. For continuous parameters (`trial.suggest_float`), both CMA-ES and TPE are good choices.
- **Log-scaled vs. linear search space.** With `log=True` (e.g., for `learning_rate`), values are sampled in the log domain, so each order of magnitude gets similar coverage.
- **Early pruning.** A pruner stops unpromising trials early, so more of the trial budget goes to promising regions.
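
The effect of a log-scaled search space can be sketched without Optuna. For a purely random sampler, `log=True` amounts to drawing uniformly in log space and exponentiating. The helper `sample_log_uniform` and the learning-rate bounds below are assumptions for illustration only.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw one value uniformly in log space between low and high."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
low, high = 1e-5, 1e-1  # assumed learning-rate range, four decades wide
log_draws = [sample_log_uniform(low, high, rng) for _ in range(10_000)]
lin_draws = [rng.uniform(low, high) for _ in range(10_000)]

# Share of draws that land in the lower two decades (below 1e-3).
log_share = sum(v < 1e-3 for v in log_draws) / len(log_draws)
lin_share = sum(v < 1e-3 for v in lin_draws) / len(lin_draws)
print(f"log-uniform: {log_share:.2f} of draws below 1e-3")
print(f"linear:      {lin_share:.2f} of draws below 1e-3")
```

Log-uniform sampling puts about half of the draws in the lower two decades, while linear sampling puts only about 1% there. This is why learning rates and regularization strengths are usually searched on a log scale.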

## 4️⃣ Customizing the Sampling Strategy
You can combine samplers or switch strategies mid-experiment:

```python
sampler = optuna.samplers.TPESampler(n_startup_trials=10)  # Use random search for first 10 trials
study = optuna.create_study(sampler=sampler)
```


# Demo

In [32]:
# pip install optuna

In [None]:
import pandas as pd

data = pd.read_csv('../data/aug_train.csv')

# Split features and target
X = data.drop(columns=['id', 'Response'])
y = data['Response']

# Define categorical and numerical features
categorical_features = ['Gender', 'Vehicle_Age', 'Vehicle_Damage']
numerical_features = X.columns.difference(categorical_features)

In [27]:
%%time

import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
import optuna

# Preprocessing pipeline
preprocessor = ColumnTransformer([
    ('num', StandardScaler(), numerical_features),
    ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)  # tolerate categories unseen in a training fold
])

# Objective function for Optuna
def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 50, 300)
    max_depth = trial.suggest_int('max_depth', 3, 20)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 10)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 10)

    model = Pipeline([
        ('preprocessor', preprocessor),
        ('classifier', RandomForestClassifier(
            n_estimators=n_estimators,
            max_depth=max_depth,
            min_samples_split=min_samples_split,
            min_samples_leaf=min_samples_leaf,
            random_state=42
        ))
    ])
    
    inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    scores = cross_val_score(model, X, y, cv=inner_cv, scoring='f1', n_jobs=-1)
    
    return np.mean(scores)

N_TRIALS = 10

# Run the study: each trial is scored with the 5-fold CV inside objective().
# Note: this is a single CV loop, not nested cross-validation.
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=N_TRIALS)

# Best parameters
print("Best hyperparameters:", study.best_params)

[I 2025-01-31 12:30:22,756] A new study created in memory with name: no-name-2493e23e-ab8c-4e74-95de-c813b3ad4ccf
[I 2025-01-31 12:32:03,574] Trial 0 finished with value: 0.3762416412829145 and parameters: {'n_estimators': 258, 'max_depth': 17, 'min_samples_split': 4, 'min_samples_leaf': 6}. Best is trial 0 with value: 0.3762416412829145.
[I 2025-01-31 12:33:28,627] Trial 1 finished with value: 0.10200389577138821 and parameters: {'n_estimators': 259, 'max_depth': 10, 'min_samples_split': 2, 'min_samples_leaf': 7}. Best is trial 0 with value: 0.3762416412829145.
[I 2025-01-31 12:34:27,637] Trial 2 finished with value: 0.15861685616807358 and parameters: {'n_estimators': 165, 'max_depth': 12, 'min_samples_split': 8, 'min_samples_leaf': 9}. Best is trial 0 with value: 0.3762416412829145.
[I 2025-01-31 12:34:53,619] Trial 3 finished with value: 0.3981416554622904 and parameters: {'n_estimators': 64, 'max_depth': 19, 'min_samples_split': 2, 'min_samples_leaf': 7}. Best is trial 3 with valu

Best hyperparameters: {'n_estimators': 253, 'max_depth': 20, 'min_samples_split': 4, 'min_samples_leaf': 1}
CPU times: user 4.68 s, sys: 916 ms, total: 5.6 s
Wall time: 10min 44s


In [28]:
# Best parameters
best_params = study.best_params
print("Best hyperparameters:", best_params)

Best hyperparameters: {'n_estimators': 253, 'max_depth': 20, 'min_samples_split': 4, 'min_samples_leaf': 1}


In [29]:
%%time

from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

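# Train final model with the tuned hyperparameters.
# Note: max_depth is left commented out, so trees grow to full depth
# (scikit-learn's default) instead of using the tuned max_depth.
best_model = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(
        n_estimators=best_params['n_estimators'],
        # max_depth=best_params['max_depth'],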
        min_samples_split=best_params['min_samples_split'],
        min_samples_leaf=best_params['min_samples_leaf'],
        random_state=42
    ))
])

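# Split data into train and test sets.
# Note: the study was tuned on all of X, y, so this test split overlaps
# the tuning data and the score below is likely optimistic.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)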

# Train and evaluate the model
best_model.fit(X_train, y_train)
y_pred = best_model.predict(X_test)

# Evaluate performance
f1 = f1_score(y_test, y_pred)
print("F1-score on test set:", f1)

F1-score on test set: 0.4454172119680491
CPU times: user 1min 38s, sys: 505 ms, total: 1min 38s
Wall time: 1min 38s


In [31]:
# Baseline: RandomForest with default hyperparameters on the same split
default_model = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(random_state=42))
])

default_model.fit(X_train, y_train)
y_pred = default_model.predict(X_test)
# Evaluate performance
f1 = f1_score(y_test, y_pred)
print("F1-score on test set:", f1)

F1-score on test set: 0.43990114580993034
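
To put the two scores side by side, the gain from tuning can be computed directly. The numbers are copied from the two outputs above, and the variable names are hypothetical.

```python
tuned_f1 = 0.4454172119680491     # model with tuned hyperparameters
default_f1 = 0.43990114580993034  # RandomForest with default hyperparameters

absolute_gain = tuned_f1 - default_f1
relative_gain = absolute_gain / default_f1

print(f"absolute gain: {absolute_gain:.4f}")  # → absolute gain: 0.0055
print(f"relative gain: {relative_gain:.2%}")  # → relative gain: 1.25%
```

Both scores come from a single split and a single random seed, so a difference this small may be within run-to-run noise.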
