
# What Is Model Validation and Why a Train/Test Split Is Not Enough

When we build a machine learning model, our main goal is good performance on **new, unseen data**. To know whether we have achieved it, we need a reliable way to estimate that performance. This process is called **model validation**.

Model validation is the set of techniques used to:

* measure a model’s generalization performance,
* detect overfitting or underfitting,
* compare different models fairly,
* support decisions about model selection and hyperparameter tuning.

In other words: **validation tells us whether the model will work in the real world, not only on the data it has already seen.**

---

## Why is a simple train/test split not enough?

A train/test split divides the data into:

* **training set** – used to fit the model,
* **test set** – used to evaluate its performance.

While this is a good starting point, it has several important limitations:

### High variance of results

A single split means that the evaluation depends heavily on **how the data was divided**.
If the test set is “too easy” or “too hard”, the score will be misleading.
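
To see how much a single split can matter, the sketch below repeats the same train/test procedure with ten different random seeds. The dataset (`load_diabetes`), the model (`Ridge`) and the seed range are illustrative assumptions; only the split changes between runs.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

r2_scores = []
for seed in range(10):
    # Same data, same model: only the random split changes
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = Ridge(alpha=1.0).fit(X_train, y_train)
    r2_scores.append(model.score(X_test, y_test))

print("R2 per split:", [round(s, 3) for s in r2_scores])
print(f"min = {min(r2_scores):.3f}, max = {max(r2_scores):.3f}")
```

The gap between the minimum and maximum score is a rough indication of how far a single split can be from the typical result.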

### Too little data for training or testing

If the dataset is not very large, splitting it once may leave you with:

* not enough data to train the model well,
* not enough data to evaluate it reliably.

### Hyperparameter tuning contaminates the test set

When we use the test set repeatedly (e.g., to compare tuned models), it becomes part of the model selection process.
**This leaks information** from the test set into the model, so the test score is no longer an unbiased estimate of performance.
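
One common way to limit this leakage is to keep a separate validation set for model selection and to touch the test set only once at the end. The following is a minimal sketch of such a three-way split; the 60/20/20 proportions and the dataset are assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# First carve out the final test set (20%), which is not used during tuning
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
# Then split the remainder into training (about 60% of the total)
# and validation (about 20% of the total)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42
)

print("train / validation / test sizes:", len(X_train), len(X_val), len(X_test))
```

Candidate models are compared on `X_val`; `X_test` is evaluated once, after all choices are made.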

### No insight into model stability

A single score cannot tell us how consistent the model is across different subsets of data.
We want to know: Does the model perform similarly across all parts of the dataset?


# Cross-Validation Methods in scikit-learn

In practice, we often need more than a simple “random” K-fold split. Depending on the **type of data** and **problem structure**, different cross-validation strategies are more appropriate. Below we discuss:

* `KFold`
* `StratifiedKFold`
* `GroupKFold`
* `TimeSeriesSplit`

For each method, we first explain the idea, and then show an example in scikit-learn.

---

## KFold

**K-fold cross-validation** splits the dataset into *K* approximately equal “folds”:

1. Split data indices into K parts.
2. For each fold *k*:

   * use fold *k* as the validation set,
   * use the remaining K − 1 folds as the training set,
   * train the model and compute the validation score.
3. Average the K validation scores.

This reduces the dependence on a single train/test split and gives a more robust estimate of model performance.
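
The three steps above can be written out by hand in a few lines. This is a sketch using only NumPy on toy data (the data and the least-squares line fit are assumptions for illustration); in practice scikit-learn's `KFold` handles the index bookkeeping.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (an assumption for illustration): y = 3x + noise
X = rng.uniform(-1, 1, size=(40, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=40)

K = 5
# Step 1: split the (shuffled) indices into K parts
folds = np.array_split(rng.permutation(len(X)), K)

mse_scores = []
for k in range(K):
    # Step 2: fold k is the validation set, the other K - 1 folds are for training
    val_idx = folds[k]
    train_idx = np.concatenate([folds[j] for j in range(K) if j != k])

    # Fit a least-squares line (slope and intercept) on the training folds
    A_train = np.column_stack([X[train_idx, 0], np.ones(len(train_idx))])
    coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)

    # Score on the held-out fold
    A_val = np.column_stack([X[val_idx, 0], np.ones(len(val_idx))])
    mse_scores.append(np.mean((A_val @ coef - y[val_idx]) ** 2))

# Step 3: average the K validation scores
print("MSE per fold:", np.round(mse_scores, 4))
print("Mean MSE:", np.mean(mse_scores))
```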

### When is plain `KFold` appropriate?

* Regression tasks (continuous y).
* Classification tasks where **class distribution is balanced**.
* When the data points are i.i.d. (independent and identically distributed) and have no grouping or time structure.

If the dataset is imbalanced or has groups/time dependencies, we should prefer the more specialized strategies.

![KFold](https://scikit-learn.org/stable/_images/grid_search_cross_validation.png)

### Example: KFold

In [None]:
from sklearn.model_selection import KFold
import numpy as np

X = np.arange(8)

kfold = KFold(n_splits=4)

for fold, (train_idx, test_idx) in enumerate(kfold.split(X), start=1):
    print(f"Fold {fold}:")
    print(f"  Train: {train_idx}")
    print(f"  Test : {test_idx}")
    print()


Fold 1:
  Train: [2 3 4 5 6 7]
  Test : [0 1]

Fold 2:
  Train: [0 1 4 5 6 7]
  Test : [2 3]

Fold 3:
  Train: [0 1 2 3 6 7]
  Test : [4 5]

Fold 4:
  Train: [0 1 2 3 4 5]
  Test : [6 7]



### Example: regression on the diabetes dataset

In [None]:
from sklearn.datasets import load_diabetes
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import Ridge

# Load a real regression dataset
X, y = load_diabetes(return_X_y=True)

model = Ridge(alpha=1.0)

# Define 5-fold cross validation
kf = KFold(n_splits=5, shuffle=True, random_state=42)

scores = cross_val_score(model, X, y, cv=kf, scoring="neg_mean_squared_error")
print("MSE scores (per fold):", -scores)

MSE scores (per fold): [3077.41593883 3418.67550798 3633.48530506 3362.11149208 3674.17093153]


In [None]:
from sklearn.datasets import load_diabetes
from sklearn.model_selection import KFold, cross_validate
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True)

model = Ridge(alpha=1.0)

kf = KFold(n_splits=5, shuffle=True, random_state=42)

scoring = {
    "mse": "neg_mean_squared_error",
    "r2": "r2"
}

scores = cross_validate(model, X, y, cv=kf, scoring=scoring)

print("MSE per fold:", -scores["test_mse"])
print("R2 per fold:", scores["test_r2"])


MSE per fold: [3077.41593883 3418.67550798 3633.48530506 3362.11149208 3674.17093153]
R2 per fold: [0.41915293 0.45201323 0.33243928 0.50045558 0.34243974]


## StratifiedKFold – why it is important for classification

In classification, especially with **imbalanced classes**, a simple `KFold` split may produce folds with very different class distributions. This can lead to:

* unstable performance estimates,
* folds with almost no samples from a minority class.

**StratifiedKFold** ensures that **each fold preserves (approximately) the overall class distribution**. For example, if 10% of samples belong to class 1 in the full dataset, each fold will contain about 10% of class 1.
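
The difference is easiest to see on a deliberately bad case. In this sketch (synthetic labels, assumed for illustration) 10% of the samples belong to class 1 and the data is sorted by label; the helper `minority_share` is a hypothetical name introduced here.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# 100 samples, 10% minority class, sorted by label (a deliberately bad case)
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((100, 1))  # features are irrelevant for the split itself

def minority_share(splitter):
    """Fraction of class 1 in each validation fold."""
    return [float(y[val_idx].mean()) for _, val_idx in splitter.split(X, y)]

kfold_shares = minority_share(KFold(n_splits=5))
strat_shares = minority_share(StratifiedKFold(n_splits=5))

print("KFold:          ", kfold_shares)
print("StratifiedKFold:", strat_shares)
```

Without shuffling, plain `KFold` produces four validation folds with no minority samples and one fold where they make up half of the samples, while every `StratifiedKFold` fold contains 10% of class 1.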

### When to use `StratifiedKFold`?

* Almost always for **classification tasks**.
* Especially important when classes are imbalanced.

### Example with scikit-learn: classification on the breast cancer dataset

In [None]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import numpy as np

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('logreg', LogisticRegression(max_iter=1500))
])

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

print(f"Class distribution: {np.mean(y == 0) * 100:.2f}% class 0, {np.mean(y == 1) * 100:.2f}% class 1")

fold_num = 1

for train_idx, test_idx in skf.split(X, y):

    y_train = y[train_idx]
    y_test = y[test_idx]

    train_class0 = np.mean(y_train == 0) * 100
    train_class1 = np.mean(y_train == 1) * 100

    test_class0 = np.mean(y_test == 0) * 100
    test_class1 = np.mean(y_test == 1) * 100

    print(f"\nFold {fold_num}:")
    print(f" Train: class 0 = {train_class0:.2f}%, class 1 = {train_class1:.2f}%")
    print(f" Test:  class 0 = {test_class0:.2f}%, class 1 = {test_class1:.2f}%")

    fold_num += 1

from sklearn.model_selection import cross_val_score
scores = cross_val_score(pipe, X, y, cv=skf, scoring="accuracy")
print("\nAccuracy scores (per fold):", scores)
print("Average accuracy:", scores.mean())

Class distribution: 37.26% class 0, 62.74% class 1

Fold 1:
 Train: class 0 = 37.14%, class 1 = 62.86%
 Test:  class 0 = 37.72%, class 1 = 62.28%

Fold 2:
 Train: class 0 = 37.14%, class 1 = 62.86%
 Test:  class 0 = 37.72%, class 1 = 62.28%

Fold 3:
 Train: class 0 = 37.36%, class 1 = 62.64%
 Test:  class 0 = 36.84%, class 1 = 63.16%

Fold 4:
 Train: class 0 = 37.36%, class 1 = 62.64%
 Test:  class 0 = 36.84%, class 1 = 63.16%

Fold 5:
 Train: class 0 = 37.28%, class 1 = 62.72%
 Test:  class 0 = 37.17%, class 1 = 62.83%

Accuracy scores (per fold): [0.97368421 0.94736842 0.96491228 0.99122807 0.99115044]
Average accuracy: 0.9736686849868033


## GroupKFold – validation with dependent groups

Sometimes samples are not independent: they belong to **groups** that share some structure. For example:

* multiple measurements from the same patient,
* several transactions from the same customer,
* multiple images taken from the same subject.

If we randomly split such data, samples from the **same group** can end up in both train and validation sets. This leads to:

* overly optimistic performance estimates,
* information leakage from validation to training.

**GroupKFold** ensures that **all samples from a given group are placed entirely in either the training or validation set**.
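
The following sketch illustrates how large the optimism can be. It is an exaggerated, synthetic setup (all names and numbers are assumptions): each hypothetical patient has a random label that is unrelated to the features, so the only way to score well is to recognize a patient seen during training.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Hypothetical setup: 50 "patients", each with a random label and a random
# feature profile; every patient contributes 4 nearly identical measurements.
n_patients, n_repeats = 50, 4
profiles = rng.normal(size=(n_patients, 5))
labels = rng.integers(0, 2, size=n_patients)

groups = np.repeat(np.arange(n_patients), n_repeats)
X = profiles[groups] + rng.normal(scale=0.01, size=(n_patients * n_repeats, 5))
y = labels[groups]

model = KNeighborsClassifier(n_neighbors=1)

# Random split: measurements of the same patient land on both sides
random_scores = cross_val_score(
    model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0)
)
# Group-aware split: each patient is entirely in train or entirely in validation
group_scores = cross_val_score(model, X, y, cv=GroupKFold(n_splits=5), groups=groups)

print(f"KFold accuracy:      {random_scores.mean():.3f}")
print(f"GroupKFold accuracy: {group_scores.mean():.3f}")
```

With the random split the nearest neighbour is almost always another measurement of the same patient, so the score looks excellent; with `GroupKFold` the score reflects that the labels cannot actually be predicted for new patients.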

### When to use `GroupKFold`?

* Whenever you have **group identifiers** and you want **no group overlap** between train and validation.
* Typical in medical or user-based data.

### Example: synthetic “patients” groups

Here we create synthetic data where each “patient” has several measurements.

Note: If we used a random `KFold` without groups, we would likely get **higher but misleading** scores, because the model would see very similar samples from the same patient in both train and validation sets.

In [None]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GroupKFold
from sklearn.ensemble import RandomForestClassifier

# Create synthetic classification data
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Suppose we have 50 patients, each with 4 samples
n_patients = 50
groups = np.repeat(np.arange(n_patients), 4)  # 50 * 4 = 200

model = RandomForestClassifier(random_state=42)

gkf = GroupKFold(n_splits=5)

scores = []

fold = 1

for train_idx, val_idx in gkf.split(X, y, groups=groups):
    train_groups = np.unique(groups[train_idx])
    val_groups = np.unique(groups[val_idx])

    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))

    print(f"\n=== Fold {fold} ===")
    print("Train groups:", train_groups)
    print("Val groups:  ", val_groups)

    intersection = np.intersect1d(train_groups, val_groups)
    print("Intersection (should be empty):", intersection)

    fold += 1

print("\nGroupKFold accuracy scores:", scores)
print("Average accuracy:", np.mean(scores))



=== Fold 1 ===
Train groups: [ 0  1  2  3  5  6  7  8 10 11 12 13 15 16 17 18 20 21 22 23 25 26 27 28
 29 31 32 33 34 36 37 38 39 41 42 43 44 46 47 48]
Val groups:   [ 4  9 14 19 24 30 35 40 45 49]
Intersection (should be empty): []

=== Fold 2 ===
Train groups: [ 0  1  2  4  5  6  7  9 10 11 14 15 16 17 19 20 21 22 23 24 26 27 28 30
 31 32 33 35 36 37 38 40 41 42 43 45 46 47 48 49]
Val groups:   [ 3  8 12 13 18 25 29 34 39 44]
Intersection (should be empty): []

=== Fold 3 ===
Train groups: [ 0  1  3  4  5  6  8  9 10 12 13 14 15 16 18 19 20 21 23 24 25 27 29 30
 31 32 34 35 36 37 39 40 41 42 44 45 46 47 48 49]
Val groups:   [ 2  7 11 17 22 26 28 33 38 43]
Intersection (should be empty): []

=== Fold 4 ===
Train groups: [ 0  2  3  4  5  7  8  9 10 11 12 13 14 15 17 18 19 20 22 23 24 25 26 28
 29 30 31 33 34 35 36 38 39 40 41 43 44 45 46 49]
Val groups:   [ 1  6 16 21 27 32 37 42 47 48]
Intersection (should be empty): []

=== Fold 5 ===
Train groups: [ 1  2  3  4  6  7  8  9 1

## TimeSeriesSplit – validation for sequential / time-dependent data

### Concept

For time series or any time-ordered data, we must **never use future data to predict the past**. Regular K-fold violates this: with or without shuffling, most folds train on samples that come later than the validation samples.

**TimeSeriesSplit** is designed for this scenario. It:

* preserves time ordering,
* uses **earlier time points for training** and **later time points for validation**.

Example of 4 splits with TimeSeriesSplit:

* Split 1: train = [1,...,n1], test = [n1+1,...,n2]
* Split 2: train = [1,...,n2], test = [n2+1,...,n3]
* etc.

### When to use `TimeSeriesSplit`?

* For forecasting problems (e.g. stock prices, demand prediction).
* Whenever there is a **natural time order** and future data must not be used in training for earlier time points.

### Example: synthetic time series regression

Here, each split uses only **past data** to predict **future data**, which mimics a realistic forecasting scenario.

In [None]:
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Create a simple synthetic time series dataset
n_samples = 200
rng = np.random.RandomState(42)
time = np.arange(n_samples)
X = time.reshape(-1, 1)  # feature: time index
y = 0.5 * time + 5 + rng.normal(scale=5, size=n_samples)  # linear trend + noise

tscv = TimeSeriesSplit(n_splits=5)

model = LinearRegression()
mse_scores = []

split_num = 1

for train_idx, test_idx in tscv.split(X):

    print(f"\n=== Split {split_num} ===")
    print(f"Train indices: {train_idx[0]} .. {train_idx[-1]}")
    print(f"Test indices:  {test_idx[0]} .. {test_idx[-1]}")

    print("Is max(train) < min(test)?", train_idx[-1] < test_idx[0])

    print("Test idx:", test_idx)

    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse_scores.append(mean_squared_error(y_test, y_pred))

    split_num += 1

print("\nMSE scores (per split):", mse_scores)
print("Average MSE:", np.mean(mse_scores))



=== Split 1 ===
Train indices: 0 .. 34
Test indices:  35 .. 67
Is max(train) < min(test)? True
Test idx: [35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58
 59 60 61 62 63 64 65 66 67]

=== Split 2 ===
Train indices: 0 .. 67
Test indices:  68 .. 100
Is max(train) < min(test)? True
Test idx: [ 68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85
  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100]

=== Split 3 ===
Train indices: 0 .. 100
Test indices:  101 .. 133
Is max(train) < min(test)? True
Test idx: [101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118
 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133]

=== Split 4 ===
Train indices: 0 .. 133
Test indices:  134 .. 166
Is max(train) < min(test)? True
Test idx: [134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151
 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166]

=== Split 5 ===
Train indices: 0 .. 166
Test indices:  167 ..

# Hyperparameter Search: GridSearchCV and RandomizedSearchCV

Model performance in machine learning often depends heavily on the choice of **hyperparameters**. Hyperparameter tuning is therefore a crucial step in building a strong predictive model.

scikit-learn offers classical tools for systematic hyperparameter search, such as:

* **GridSearchCV** – exhaustive search
* **RandomizedSearchCV** – randomized search over a parameter distribution

Both methods combine **cross-validation** with search over the hyperparameter space.
However, they differ significantly in efficiency and practicality.

## GridSearchCV – exhaustive search

### What is GridSearch?

GridSearchCV takes a predefined grid of hyperparameters and tries **every possible combination**.
For example, a grid like:

```python
C = [0.1, 1, 10]
gamma = ["scale", "auto"]
kernel = ["linear", "rbf"]
```

contains $3 \times 2 \times 2 = 12$ combinations.
With 5-fold CV, this means **60 model fits**, plus one final refit of the best combination on the full data when `refit=True` (the default).

GridSearchCV trains all models, evaluates them using CV, and returns the combination with the best average score.
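
The count of combinations can be checked directly. The sketch below enumerates the grid from the example with the standard library only; it mirrors the arithmetic above and is not GridSearchCV's own code.

```python
from itertools import product

param_grid = {
    "C": [0.1, 1, 10],
    "gamma": ["scale", "auto"],
    "kernel": ["linear", "rbf"],
}
n_folds = 5

# Cartesian product of all value lists = every combination in the grid
combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]

print("Combinations:", len(combinations))  # 3 * 2 * 2
print("CV model fits:", len(combinations) * n_folds)
print("First combination:", combinations[0])
```

Adding one more hyperparameter with four candidate values would multiply both numbers by four, which is why grids become expensive quickly.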

---

### Advantages of GridSearch

* Simple and systematic.
* Guarantees evaluation of **every** combination in the grid.
* Good when hyperparameter space is:

  * small,
  * well understood,
  * low-dimensional.

---

### Disadvantages of GridSearch

* **Computationally expensive**: number of fits grows exponentially with the number of parameters.
* Does not scale to large parameter spaces.
* Often wastes time evaluating unpromising regions.
* If the chosen ranges are poor, the search may miss good values.

In practice: **GridSearch is often not the best choice for large searches**.

---

### Example: GridSearchCV on the Breast Cancer dataset

We tune a Support Vector Machine classifier (SVC).

In [None]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Load dataset
X, y = load_breast_cancer(return_X_y=True)

# Define model
svm = SVC()

# Define hyperparameter grid
param_grid = {
    "C": [0.1, 1, 10],
    "gamma": ["scale", "auto"],
    "kernel": ["rbf"]
}

# Stratified splits for classification
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

grid = GridSearchCV(
    estimator=svm,
    param_grid=param_grid,
    cv=cv,
    scoring="accuracy"
)

# Run search
grid.fit(X, y)

print("Best parameters:", grid.best_params_)
print("Best CV accuracy:", grid.best_score_)

Best parameters: {'C': 10, 'gamma': 'scale', 'kernel': 'rbf'}
Best CV accuracy: 0.920912901723335


## RandomizedSearchCV – randomized sampling of hyperparameters

### What is RandomizedSearchCV?

Instead of evaluating every possible parameter combination, RandomizedSearchCV **randomly samples** combinations from distributions you specify.

You choose:

* distributions for hyperparameters,
* number of iterations (trials).

Example:

```python
{
    "n_estimators": randint(50, 300),
    "max_depth": randint(2, 20),
}
```

If you run with `n_iter=30`, RandomizedSearchCV will try **30 random combinations**, regardless of how large the space is.
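
To get a feel for what such a draw looks like, scikit-learn's `ParameterSampler` can be used on its own. This is a small sketch with `n_iter=5` and a fixed seed (both chosen here for illustration); no model is trained.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

param_dist = {
    "n_estimators": randint(50, 300),  # integers in [50, 300)
    "max_depth": randint(2, 20),       # integers in [2, 20)
}

# Draw 5 random combinations; random_state makes the draw reproducible
samples = list(ParameterSampler(param_dist, n_iter=5, random_state=42))

for params in samples:
    print(params)
```

Note that `scipy.stats.randint(low, high)` excludes the upper bound.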

---

### Advantages of RandomizedSearchCV

* Much **faster** when the parameter space is large.
* Can explore a **wider range** of values.
* Often finds good solutions **much faster** than GridSearch.
* Allows using **continuous** distributions like log-uniform, normal, etc.
* Computational budget is fully under your control (`n_iter` parameter).

Often, RandomizedSearchCV finds a good or near-optimal solution with **a small fraction** of the cost of exhaustive search.
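
As an illustration of a continuous distribution, the sketch below samples the regularization strength `C` of a logistic regression log-uniformly. The dataset, the range `1e-3` to `1e2` and the budget of 10 iterations are assumptions made for this example.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("logreg", LogisticRegression(max_iter=2000)),
])

# C is sampled log-uniformly: each decade between 1e-3 and 1e2 is equally likely
param_dist = {"logreg__C": loguniform(1e-3, 1e2)}

search = RandomizedSearchCV(
    pipe,
    param_distributions=param_dist,
    n_iter=10,  # the whole budget: 10 candidates x 5 folds
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    scoring="accuracy",
    random_state=42,
)
search.fit(X, y)

print("Best C:", search.best_params_["logreg__C"])
print("Best CV accuracy:", search.best_score_)
```

A grid would have to fix a handful of `C` values in advance; here every candidate is a fresh value from the whole range.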

---

### Disadvantages of RandomizedSearchCV

* Does not guarantee evaluation of all possible combinations.
* The final result depends on the random draws: a different seed can return a different best configuration, so set `random_state` for reproducibility.
* You must choose good parameter distributions.

---

### When RandomizedSearchCV is significantly better

* When the hyperparameter search space is **large** (e.g., Random Forests, Gradient Boosting).
* When we expect only a **few hyperparameters to be important**, and their optimal values may lie anywhere within a broad range.
* When computation time is limited.
* When using **continuous hyperparameter spaces**.

---

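### Example: RandomizedSearchCV on the Wine dataset

We use the **Wine recognition** dataset that ships with scikit-learn (`load_wine`). It is a small classification dataset (178 samples, 3 classes), not the UCI Wine Quality data. Purely for illustration, the example treats the class label (0, 1, 2) as a numeric target.

We tune a Random Forest regressor.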

In [None]:
from sklearn.datasets import load_wine
from sklearn.model_selection import RandomizedSearchCV, KFold
from sklearn.ensemble import RandomForestRegressor
from scipy.stats import randint

# Load dataset
X, y = load_wine(return_X_y=True)

model = RandomForestRegressor(random_state=42)

# Parameter distributions
param_dist = {
    "n_estimators": randint(50, 300),
    "max_depth": randint(3, 20),
    "min_samples_split": randint(2, 10),
    "min_samples_leaf": randint(1, 5),
}

cv = KFold(n_splits=5, shuffle=True, random_state=42)

rnd_search = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_dist,
    n_iter=30,
    cv=cv,
    scoring="neg_mean_squared_error",
    random_state=42
)

rnd_search.fit(X, y)

print("Best parameters:", rnd_search.best_params_)
print("Best CV MSE:", -rnd_search.best_score_)

Best parameters: {'max_depth': 15, 'min_samples_leaf': 1, 'min_samples_split': 6, 'n_estimators': 64}
Best CV MSE: 0.03702027864952186


### Example: Pipeline + RandomizedSearchCV

We will use the *diabetes regression dataset* and a `GradientBoostingRegressor` model. We want to tune its hyperparameters using `RandomizedSearchCV` inside a Pipeline that also standardizes the input features.

In [None]:
from sklearn.datasets import load_diabetes
from sklearn.model_selection import RandomizedSearchCV, KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import GradientBoostingRegressor
from scipy.stats import uniform, randint

X, y = load_diabetes(return_X_y=True)

pipe = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("model", GradientBoostingRegressor(random_state=42))
])

param_dist = {
    "model__n_estimators": randint(50, 400),
    # scipy's uniform(loc, scale) samples from [loc, loc + scale], here [0.01, 0.31]
    "model__learning_rate": uniform(0.01, 0.3),
    "model__max_depth": randint(2, 8)
}

cv = KFold(n_splits=5, shuffle=True, random_state=42)

rnd = RandomizedSearchCV(
    estimator=pipe,
    param_distributions=param_dist,
    n_iter=25,
    cv=cv,
    scoring="neg_mean_squared_error",
    random_state=42
)

rnd.fit(X, y)

print("Best parameters:", rnd.best_params_)
print("Best CV MSE:", -rnd.best_score_)


Best parameters: {'model__learning_rate': 0.014789875666064259, 'model__max_depth': 3, 'model__n_estimators': 389}
Best CV MSE: 3266.6433557293653


# Introduction to Optuna: A Modern Framework for Hyperparameter Optimization

So far, we have explored two classical tools for hyperparameter tuning:

* **GridSearchCV** – exhaustive search
* **RandomizedSearchCV** – random sampling from parameter distributions

Both methods work well with scikit-learn Pipelines, but they share a limitation:
they do not *adapt* based on which regions of the hyperparameter space appear promising.

In real-world machine learning, especially with complex models (e.g., XGBoost, deep networks), this can be inefficient.

To address this, we introduce **Optuna**, a modern, efficient and highly flexible hyperparameter optimization framework.

---

## Why Optuna?

Optuna brings several advantages over traditional methods:

**Adaptive search:** It uses advanced algorithms (like TPE – Tree-structured Parzen Estimator) to explore promising regions of the search space more often.

**Pruning:** Optuna can stop unpromising trials early, saving computation.

**Flexible distributions:** You can define:

* continuous log-distributions,
* categorical choices,
* integer ranges,
* conditional parameters.

**Easy integration with scikit-learn Pipelines:** Optuna can act as a drop-in replacement for GridSearchCV through `OptunaSearchCV`.

**Lightweight and pure Python:** No special dependencies, cross-platform, easy to start.

---

## Example: Optimizing a Simple Quadratic Function

Although Optuna is usually used for hyperparameter tuning, it can optimize **any function**.
Let’s start with a minimal example: finding the value of `x` that minimizes:

$$
f(x) = (x - 2)^2
$$

The minimum is clearly at **x = 2**, but Optuna will discover this automatically through optimization.

---

In Optuna, the function being optimized is traditionally named **objective**.

A **Trial** represents one attempt in the optimization process.
`trial.suggest_float("x", -10, 10)` asks Optuna for a value of `x` in the range [-10, 10]; how that value is chosen depends on the sampler.
Each time Optuna calls the objective, it selects a new value of `x`, evaluates the function, and records the result.

In [None]:
import optuna

def objective(trial):
    # Suggest a value for x from the range [-10, 10]
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

A **Study** manages the entire optimization process.
`n_trials=100` means the objective function will be executed 100 times with different values of `x`.

In [None]:
study = optuna.create_study()
study.optimize(objective, n_trials=100)

[I 2025-11-23 12:06:09,782] A new study created in memory with name: no-name-03221795-dd0f-4267-baca-d4a464f3ecbe
[I 2025-11-23 12:06:09,783] Trial 0 finished with value: 2.211115202070267 and parameters: {'x': 3.4869819104717674}. Best is trial 0 with value: 2.211115202070267.
[I 2025-11-23 12:06:09,784] Trial 1 finished with value: 40.2612025422636 and parameters: {'x': -4.345171592814776}. Best is trial 0 with value: 2.211115202070267.
[I 2025-11-23 12:06:09,784] Trial 2 finished with value: 61.35542789878644 and parameters: {'x': 9.832970566699867}. Best is trial 0 with value: 2.211115202070267.
[I 2025-11-23 12:06:09,785] Trial 3 finished with value: 137.1529745915 and parameters: {'x': -9.71123283824124}. Best is trial 0 with value: 2.211115202070267.
[I 2025-11-23 12:06:09,785] Trial 4 finished with value: 123.59968032650845 and parameters: {'x': -9.117539310769647}. Best is trial 0 with value: 2.211115202070267.
[I 2025-11-23 12:06:09,785] Trial 5 finished with value: 5.5055953

Let's inspect the results.

In [None]:
best_params = study.best_params
found_x = best_params["x"]
print(f"Found x: {found_x}, (x - 2)^2: {(found_x - 2) ** 2}")
print(f"Get best hyperparameters: {study.best_params}")
print(f"Best objective value: {study.best_value}")
print(f"Number of finished trials: {len(study.trials)}")
print(f"Best trial: {study.best_trial}")
print("First two trials:")
for trial in study.trials[:2]:
    print(trial)

Found x: 2.003926324425818, (x - 2)^2: 1.541602349677571e-05
Get best hyperparameters: {'x': 2.003926324425818}
Best objective value: 1.541602349677571e-05
Number of finished trials: 100
Best trial: FrozenTrial(number=81, state=1, values=[1.541602349677571e-05], datetime_start=datetime.datetime(2025, 11, 23, 12, 6, 9, 906669), datetime_complete=datetime.datetime(2025, 11, 23, 12, 6, 9, 908141), params={'x': 2.003926324425818}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'x': FloatDistribution(high=10.0, log=False, low=-10.0, step=None)}, trial_id=81, value=None)
First two trials:
FrozenTrial(number=0, state=1, values=[2.211115202070267], datetime_start=datetime.datetime(2025, 11, 23, 12, 6, 9, 783476), datetime_complete=datetime.datetime(2025, 11, 23, 12, 6, 9, 783860), params={'x': 3.4869819104717674}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'x': FloatDistribution(high=10.0, log=False, low=-10.0, step=None)}, trial_id=0, value=

## Example: Continuing an Optimization Session

Optuna allows you to **resume** optimization simply by calling `optimize()` again:

In [None]:
study.optimize(objective, n_trials=100)  # adds 100 more trials
print(f"Updated number of trials: {len(study.trials)}")

found_x = study.best_params["x"]
print(f"Found x: {found_x}, (x - 2)^2: {(found_x - 2) ** 2}")

[I 2025-11-23 12:06:09,973] Trial 100 finished with value: 1.1826912425215954 and parameters: {'x': 3.087516088396671}. Best is trial 81 with value: 1.541602349677571e-05.
[I 2025-11-23 12:06:09,976] Trial 101 finished with value: 0.0011354972814490857 and parameters: {'x': 2.0336971405530067}. Best is trial 81 with value: 1.541602349677571e-05.
[I 2025-11-23 12:06:09,978] Trial 102 finished with value: 1.240190882666779 and parameters: {'x': 0.8863614218846498}. Best is trial 81 with value: 1.541602349677571e-05.
[I 2025-11-23 12:06:09,980] Trial 103 finished with value: 0.0013301084007529627 and parameters: {'x': 2.0364706512246897}. Best is trial 81 with value: 1.541602349677571e-05.
[I 2025-11-23 12:06:09,981] Trial 104 finished with value: 0.01065907809275538 and parameters: {'x': 2.103242811336942}. Best is trial 81 with value: 1.541602349677571e-05.
[I 2025-11-23 12:06:09,983] Trial 105 finished with value: 50.2709745845351 and parameters: {'x': 9.09020271815518}. Best is trial 

Updated number of trials: 200
Found x: 1.9996391590060876, (x - 2)^2: 1.3020622288770726e-07


## Defining Search Spaces in Optuna

One of Optuna’s biggest advantages over traditional methods (like GridSearchCV) is its **highly flexible and expressive search space definition**.
Instead of manually specifying grids or distributions in dictionaries, Optuna lets you define hyperparameters directly inside Python code using simple, intuitive APIs.

This makes the search space:

* more **natural to define**,
* more **dynamic** (can depend on other parameters),
* and more **compact** (no need to enumerate everything manually).

---

### Basic Parameter Types

Optuna offers three fundamental parameter sampling methods:

#### Categorical parameters

```python
trial.suggest_categorical("optimizer", ["MomentumSGD", "Adam"])
```

Use this for:

* selecting algorithms (e.g., SVC vs RandomForest),
* selecting activation functions,
* choosing model variants.

---

#### Integer parameters

```python
trial.suggest_int("num_layers", 1, 3)
```

Useful for:

* number of trees,
* number of layers,
* max depth.

Optuna can also sample integers **logarithmically** (useful when values vary across orders of magnitude):

```python
trial.suggest_int("num_channels", 32, 512, log=True)
```

Or **with a step**, to discretize:

```python
trial.suggest_int("num_units", 10, 100, step=5)
```
---

#### Floating-point parameters

```python
trial.suggest_float("dropout_rate", 0.0, 1.0)
```

Floating-point parameters can also be:

* **log-uniform** (learning rates, regularization terms):

  ```python
  trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
  ```

* **discretized**:

  ```python
  trial.suggest_float("drop_path_rate", 0.0, 1.0, step=0.1)
  ```

---

### Example: parameter types

In [None]:
import optuna

def objective(trial):
    # Categorical parameter
    optimizer = trial.suggest_categorical("optimizer", ["MomentumSGD", "Adam"])

    # Integer parameter
    num_layers = trial.suggest_int("num_layers", 1, 3)

    # Integer (log scale)
    num_channels = trial.suggest_int("num_channels", 32, 512, log=True)

    # Integer (step)
    num_units = trial.suggest_int("num_units", 10, 100, step=5)

    # Float
    dropout_rate = trial.suggest_float("dropout_rate", 0.0, 1.0)

    # Float (log scale)
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)

    # Float (step)
    drop_path_rate = trial.suggest_float("drop_path_rate", 0.0, 1.0, step=0.1)

    # Return a dummy value for example purposes
    return (num_layers - 2)**2 + dropout_rate

### Example: log scale

In [None]:
import optuna

def objective(trial):
    lr_linear = trial.suggest_float("lr_linear", 1e-5, 1e-2)
    lr_log = trial.suggest_float("lr_log", 1e-5, 1e-2, log=True)
    print(f"linear={lr_linear:.6f}, log={lr_log:.6f}")
    return 0

study = optuna.create_study(direction="minimize")

study.optimize(objective, n_trials=5)

[I 2025-11-25 09:02:25,131] A new study created in memory with name: no-name-99dbc23f-ce61-4a80-b006-c98d6118d977
[I 2025-11-25 09:02:25,144] Trial 0 finished with value: 0.0 and parameters: {'lr_linear': 0.004142552963043274, 'lr_log': 0.0001851408690684875}. Best is trial 0 with value: 0.0.
[I 2025-11-25 09:02:25,146] Trial 1 finished with value: 0.0 and parameters: {'lr_linear': 0.006006852167821104, 'lr_log': 0.0011356123747168533}. Best is trial 0 with value: 0.0.
[I 2025-11-25 09:02:25,147] Trial 2 finished with value: 0.0 and parameters: {'lr_linear': 0.009112084479331565, 'lr_log': 1.3766873775846494e-05}. Best is trial 0 with value: 0.0.
[I 2025-11-25 09:02:25,148] Trial 3 finished with value: 0.0 and parameters: {'lr_linear': 0.0055717652120061785, 'lr_log': 0.0006969882421369748}. Best is trial 0 with value: 0.0.
[I 2025-11-25 09:02:25,150] Trial 4 finished with value: 0.0 and parameters: {'lr_linear': 0.005837343842237796, 'lr_log': 4.077837065609708e-05}. Best is trial 0 w

linear=0.004143, log=0.000185
linear=0.006007, log=0.001136
linear=0.009112, log=0.000014
linear=0.005572, log=0.000697
linear=0.005837, log=0.000041


## Conditional Search Spaces

Unlike GridSearch or RandomizedSearch, Optuna can change its search space **based on the values already suggested in the same trial**.

This is extremely useful when different model types require different hyperparameters.

### Example: choose between SVC and RandomForest

In [None]:
import optuna
import sklearn.svm
import sklearn.ensemble
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


X, y = load_iris(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42
)

def objective(trial):
    classifier_name = trial.suggest_categorical("classifier", ["SVC", "RandomForest"])

    if classifier_name == "SVC":
        svc_c = trial.suggest_float("svc_c", 1e-5, 1e5, log=True)

        classifier_obj = sklearn.svm.SVC(C=svc_c, gamma="scale")
    else:
        rf_n_estimators = trial.suggest_int("rf_n_estimators", 20, 300)
        rf_max_depth = trial.suggest_int("rf_max_depth", 2, 32, log=True)

        classifier_obj = sklearn.ensemble.RandomForestClassifier(
            n_estimators=rf_n_estimators,
            max_depth=rf_max_depth,
            random_state=42
        )

    classifier_obj.fit(X_train, y_train)
    preds = classifier_obj.predict(X_valid)

    accuracy = accuracy_score(y_valid, preds)

    return accuracy

### Loop-Based Search Spaces

Optuna also allows generating parameters inside **loops**, enabling structures such as:

* variable-depth neural networks
* dynamic architectures
* hierarchical models

#### Example: dynamically constructing a neural network depending on trial-suggested values.

```python
import torch
import torch.nn as nn

def create_model(trial, in_size):
    n_layers = trial.suggest_int("n_layers", 1, 3)

    layers = []
    for i in range(n_layers):
        # Each layer can have different units
        n_units = trial.suggest_int(f"n_units_l{i}", 4, 128, log=True)
        layers.append(nn.Linear(in_size, n_units))
        layers.append(nn.ReLU())
        in_size = n_units

    layers.append(nn.Linear(in_size, 10))
    return nn.Sequential(*layers)
```

## Efficient Optimization in Optuna: Samplers and Pruners

Up to now, we’ve treated Optuna as a “black box” that tries different hyperparameter values and finds promising ones.
But behind the scenes, Optuna uses **state-of-the-art optimization algorithms** to:

* **sample new hyperparameters intelligently**, and
* **stop unpromising trials early** (to save computation).

This section explains how these mechanisms work and how you can control them.

---

### Sampling Algorithms (Samplers)

A **sampler** decides *which hyperparameters to try next*.
As Optuna runs more trials, samplers use past data (parameters + objective values) to narrow down the search toward more promising regions.

Optuna provides several sampling algorithms:

#### GridSampler

Exhaustive search — same idea as `GridSearchCV`.

#### RandomSampler

Independent random sampling from each parameter's distribution, similar to `RandomizedSearchCV`.

#### TPESampler (Tree-structured Parzen Estimator) (default)

A Bayesian optimization method that models good and bad regions separately.
Great for:

* noisy objectives
* mixed continuous/categorical spaces
* typical machine learning hyperparameter tuning

For most tuning tasks, the default `TPESampler` is a reasonable starting point.
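
The idea of modelling good and bad regions separately can be illustrated with a toy one-dimensional version. This is a simplified sketch, not Optuna's implementation: the function names, the fixed bandwidth, and the `gamma` split are all assumptions chosen to keep the code short.

```python
import math
import random


def kde(points, x, bandwidth):
    """Average of Gaussian bumps centred on `points`, evaluated at x."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm


def tpe_like_suggest(history, low, high, rng, gamma=0.25, n_candidates=24, bandwidth=1.0):
    """history is a list of (x, loss) pairs; lower loss is better."""
    ranked = sorted(history, key=lambda pair: pair[1])
    n_good = max(1, math.ceil(gamma * len(ranked)))
    good = [x for x, _ in ranked[:n_good]]
    bad = [x for x, _ in ranked[n_good:]]
    # Draw candidates near good points, then keep the one where the
    # "good" density most exceeds the "bad" density.
    candidates = [
        min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
        for _ in range(n_candidates)
    ]
    return max(
        candidates,
        key=lambda c: kde(good, c, bandwidth) / (kde(bad, c, bandwidth) + 1e-12),
    )


def loss(x):
    return (x - 2.0) ** 2


rng = random.Random(0)
# Random start-up phase, then model-guided suggestions.
history = [(x, loss(x)) for x in (rng.uniform(-10.0, 10.0) for _ in range(10))]
for _ in range(40):
    x = tpe_like_suggest(history, -10.0, 10.0, rng)
    history.append((x, loss(x)))

best_x, best_loss = min(history, key=lambda pair: pair[1])
print(f"best x = {best_x:.3f}, loss = {best_loss:.4f}")
```

The real sampler handles several parameters, categorical choices, and adaptive bandwidths, which this sketch leaves out.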

#### Other algorithms

* CmaEsSampler
* GPSampler
* PartialFixedSampler
* NSGAIISampler
* QMCSampler

---

### Switching Samplers

Every `Study` uses a sampler.
By default:

In [None]:
import optuna
study = optuna.create_study()
print(study.sampler.__class__.__name__)

[I 2025-11-23 12:06:10,197] A new study created in memory with name: no-name-ff911cc6-8e2c-4522-822c-1c0a467ba96b


TPESampler


To switch samplers:

In [None]:
import optuna

study = optuna.create_study(sampler=optuna.samplers.RandomSampler())
print(study.sampler.__class__.__name__)

study = optuna.create_study(sampler=optuna.samplers.CmaEsSampler())
print(study.sampler.__class__.__name__)

[I 2025-11-23 12:06:10,200] A new study created in memory with name: no-name-65724c5d-92c5-4cee-a4b8-c90d60deb07e
[I 2025-11-23 12:06:10,201] A new study created in memory with name: no-name-8750dbac-b6f8-4aa1-8974-ea7db9dd8def


RandomSampler
CmaEsSampler


### Pruning Algorithms (Pruners)

A **pruner** decides *when to stop unpromising trials early*.
This is especially valuable for:

* neural networks,
* iterative algorithms (SGD, boosting),
* any model where performance improves step-by-step.

Pruning saves a lot of time by not completing trials that will clearly perform poorly.

---

#### Available Pruners

##### MedianPruner

Stops a trial when its best intermediate result so far is worse than the median of the intermediate results that previous trials reported at the same step.
Simple and widely used.
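
The median rule can be sketched in a few lines of plain Python. This is an illustration of the idea for a minimized metric, not Optuna's code. The function name and the data layout are assumptions.

```python
from statistics import median


def should_prune_median(current_values, completed_curves, n_startup_trials=5):
    """Median stopping rule for a minimized metric.

    current_values: intermediate values reported so far by the running trial.
    completed_curves: one list of intermediate values per finished trial.
    """
    if len(completed_curves) < n_startup_trials:
        return False  # not enough history to compare against
    step = len(current_values) - 1
    peers = [curve[step] for curve in completed_curves if len(curve) > step]
    if not peers:
        return False
    # Compare the running trial's best value so far with the peers' median.
    return min(current_values) > median(peers)


finished = [
    [0.9, 0.6, 0.4],
    [0.8, 0.5, 0.3],
    [0.7, 0.6, 0.5],
    [0.9, 0.7, 0.6],
    [0.8, 0.4, 0.2],
]
print(should_prune_median([0.95, 0.9], finished))   # → True
print(should_prune_median([0.85, 0.45], finished))  # → False
```

At step 1 the finished trials have a median error of 0.6, so a trial still at 0.9 is stopped while one at 0.45 continues.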

##### NopPruner

Disables pruning (never prunes).

##### PatientPruner

Allows trials to be mediocre for several steps before pruning them. Similar to “patience” in deep learning.

##### Other algorithms

* PercentilePruner: Prunes trials below a given percentile of past trials.
* SuccessiveHalvingPruner: Efficient early-stopping algorithm. Strong baseline.
* HyperbandPruner: Multi-level successive halving. Often a strong choice for deep learning and other expensive models.
* ThresholdPruner: Prunes if performance is worse than a given threshold.
* WilcoxonPruner: Uses statistical testing (paired Wilcoxon signed-rank test).
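
To make the successive-halving idea concrete, here is a minimal synchronous, textbook-style sketch. It is not how Optuna's pruner is implemented (Optuna uses an asynchronous variant), and the function names and the toy loss are hypothetical.

```python
def successive_halving(configs, evaluate, min_budget=1, reduction_factor=3):
    """Keep the best 1/reduction_factor of the configs each round, then raise the budget.

    evaluate(config, budget) must return a loss (lower is better).
    """
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda cfg: evaluate(cfg, budget))
        keep = max(1, len(ranked) // reduction_factor)
        survivors = ranked[:keep]
        budget *= reduction_factor
    return survivors[0]


def toy_loss(learning_rate, budget):
    # Hypothetical loss: values near 0.1 are better, and more budget lowers the loss.
    return abs(learning_rate - 0.1) + 1.0 / budget


candidates = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]
print(successive_halving(candidates, toy_loss))  # → 0.1
```

Nine candidates are cut to three after the cheapest round and to one after the second, so most of the budget goes to the configurations that looked promising early.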

---

#### Example: Activating Pruners

When using pruning, you add two calls inside your training loop:

* `trial.report(value, step)`: Reports current performance.
* `trial.should_prune()`: Returns `True` if the trial should be stopped.

If pruning is triggered, raise:

```python
raise optuna.TrialPruned()
```

This example demonstrates pruning integrated with a scikit-learn model that trains iteratively.

In [None]:
import logging
import sys

import optuna
import sklearn.datasets
import sklearn.linear_model
import sklearn.model_selection

def objective(trial):
    iris = sklearn.datasets.load_iris()
    classes = list(set(iris.target))
    train_x, valid_x, train_y, valid_y = sklearn.model_selection.train_test_split(
        iris.data, iris.target, test_size=0.25, random_state=0
    )

    alpha = trial.suggest_float("alpha", 1e-5, 1e-1, log=True)
    clf = sklearn.linear_model.SGDClassifier(alpha=alpha)

    for step in range(100):
        clf.partial_fit(train_x, train_y, classes=classes)

        # Intermediate objective: classification error
        intermediate_value = 1.0 - clf.score(valid_x, valid_y)
        trial.report(intermediate_value, step)

        # Stop early when the pruner judges this trial unpromising
        if trial.should_prune():
            raise optuna.TrialPruned()

    return 1.0 - clf.score(valid_x, valid_y)

# Log pruning events
optuna.logging.get_logger("optuna").addHandler(logging.StreamHandler(sys.stdout))

study = optuna.create_study(pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=20)

[I 2025-11-23 12:06:10,206] A new study created in memory with name: no-name-be3361fb-ffb4-4474-a348-7b6927d2414b
[I 2025-11-23 12:06:10,269] Trial 0 finished with value: 0.052631578947368474 and parameters: {'alpha': 0.0001210024799678314}. Best is trial 0 with value: 0.052631578947368474.
[I 2025-11-23 12:06:10,330] Trial 1 finished with value: 0.3157894736842105 and parameters: {'alpha': 9.270737448982453e-05}. Best is trial 0 with value: 0.052631578947368474.
[I 2025-11-23 12:06:10,391] Trial 2 finished with value: 0.3157894736842105 and parameters: {'alpha': 0.0007267319738624614}. Best is trial 0 with value: 0.052631578947368474.
[I 2025-11-23 12:06:10,448] Trial 3 finished with value: 0.23684210526315785 and parameters: {'alpha': 0.029538503302102462}. Best is trial 0 with value: 0.052631578947368474.
[I 2025-11-23 12:06:10,506] Trial 4 finished with value: 0.13157894736842102 and parameters: {'alpha': 0.028223108646239642}. Best is trial 0 with value: 0.052631578947368474.
[I 2025-11-23 12:06:10,509] Trial 5 pruned.
[I 2025-11-23 12:06:10,512] Trial 6 pruned.
[I 2025-11-23 12:06:10,516] Trial 7 pruned.
[I 2025-11-23 12:06:10,518] Trial 8 pruned.
[I 2025-11-23 12:06:10,523] Trial 9 pruned.
[I 2025-11-23 12:06:10,587] Trial 10 finished with value: 0.26315789473684215 and parameters: {'alpha': 0.0033236143091570315}. Best is trial 0 with value: 0.052631578947368474.
[I 2025-11-23 12:06:10,590] Trial 11 pruned.
[I 2025-11-23 12:06:10,594] Trial 12 pruned.
[I 2025-11-23 12:06:10,600] Trial 13 pruned.
[I 2025-11-23 12:06:10,604] Trial 14 pruned.
[I 2025-11-23 12:06:10,610] Trial 15 pruned.
[I 2025-11-23 12:06:10,631] Trial 16 pruned.
[I 2025-11-23 12:06:10,638] Trial 17 pruned.
[I 2025-11-23 12:06:10,645] Trial 18 pruned.
[I 2025-11-23 12:06:10,649] Trial 19 pruned.


## Parallelization

As models grow more complex and datasets get larger, hyperparameter optimization becomes increasingly expensive.
A key strength of Optuna is that it supports **parallel optimization** in several different ways, depending on the available hardware:

* multi-threading (threads in one process),
* multi-processing (separate processes on one machine),
* multi-node (multiple machines working on the same study),
* large-scale distributed optimization (thousands of machines) with gRPC.

Optuna’s design allows all of these approaches with **minimal code changes**.

### Multi-Thread Optimization

To run trials in parallel threads, simply set:

```python
study.optimize(objective, n_trials=20, n_jobs=4)
```

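This means:
**run up to 4 trials at the same time within one process**.
Because the workers are threads, a CPU-bound pure-Python objective is still limited by CPython's global interpreter lock; threads help most when the objective spends its time in I/O or in native code that releases the lock.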

#### Example: Multi-threaded optimization

In [None]:
import optuna
from optuna.trial import Trial
import threading

def objective(trial: Trial):
    print(f"Running trial {trial.number=} in {threading.current_thread().name}")
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

study = optuna.create_study(
    storage="sqlite:///example.db",   # file-based storage
    load_if_exists=True,
)

study.optimize(objective, n_trials=20, n_jobs=4)

[I 2025-11-23 12:06:10,841] A new study created in RDB with name: no-name-584cee93-c2ca-4dd9-a52f-dc79fb66af33
[I 2025-11-23 12:06:10,900] Trial 0 finished with value: 65.0137841509579 and parameters: {'x': -6.063112559735099}. Best is trial 0 with value: 65.0137841509579.
Running trial trial.number=3 in ThreadPoolExecutor-0_1
Running trial trial.number=2 in ThreadPoolExecutor-0_3
Running trial trial.number=1 in ThreadPoolExecutor-0_2
Running trial trial.number=0 in ThreadPoolExecutor-0_0
[I 2025-11-23 12:06:10,902] Trial 1 finished with value: 25.22371909546931 and parameters: {'x': 7.022322082012394}. Best is trial 3 with value: 14.403641552247803.
[I 2025-11-23 12:06:10,907] Trial 3 finished with value: 14.403641552247803 and parameters: {'x': -1.7952129785096123}. Best is trial 3 with value: 14.403641552247803.
[I 2025-11-23 12:06:10,917] Trial 2 finished with value: 21.356158048690375 and parameters: {'x': -2.62127234089167}. Best is trial 3 with value: 14.403641552247803.
Running trial trial.number=5 in ThreadPoolExecutor-0_2
Running trial trial.number=4 in ThreadPoolExecutor-0_0
[I 2025-11-23 12:06:10,952] Trial 4 finished with value: 7.890561766330676 and parameters: {'x': -0.8090143763125663}. Best is trial 4 with value: 7.890561766330676.
Running trial trial.number=6 in ThreadPoolExecutor-0_1
Running trial trial.number=7 in ThreadPoolExecutor-0_3
[I 2025-11-23 12:06:10,953] Trial 5 finished with value: 61.699685765102515 and parameters: {'x': 9.854914752249226}. Best is trial 4 with value: 7.890561766330676.
[I 2025-11-23 12:06:10,966] Trial 6 finished with value: 51.98850792291448 and parameters: {'x': -5.2103056747210434}. Best is trial 4 with value: 7.890561766330676.
Running trial trial.number=8 in ThreadPoolExecutor-0_0
[I 2025-11-23 12:06:10,968] Trial 7 finished with value: 0.0397240253964298 and parameters: {'x': 2.1993088693370915}. Best is trial 7 with value: 0.0397240253964298.
[I 2025-11-23 12:06:10,988] Trial 8 finished with value: 40.79017137028563 and parameters: {'x': 8.386718356893908}. Best is trial 7 with value: 0.0397240253964298.
Running trial trial.number=9 in ThreadPoolExecutor-0_2
Running trial trial.number=10 in ThreadPoolExecutor-0_1
Running trial trial.number=11 in ThreadPoolExecutor-0_3
[I 2025-11-23 12:06:11,012] Trial 9 finished with value: 12.709376809850669 and parameters: {'x': -1.56502129164058}. Best is trial 7 with value: 0.0397240253964298.
Running trial trial.number=12 in ThreadPoolExecutor-0_0
[I 2025-11-23 12:06:11,016] Trial 10 finished with value: 18.483717143351356 and parameters: {'x': -2.2992693732018417}. Best is trial 7 with value: 0.0397240253964298.
[I 2025-11-23 12:06:11,022] Trial 11 finished with value: 1.3855712397226754 and parameters: {'x': 3.177102901076484}. Best is trial 7 with value: 0.0397240253964298.
[I 2025-11-23 12:06:11,026] Trial 12 finished with value: 3.8148912170233817 and parameters: {'x': 3.953174650926891}. Best is trial 7 with value: 0.0397240253964298.
Running trial trial.number=13 in ThreadPoolExecutor-0_2
Running trial trial.number=14 in ThreadPoolExecutor-0_1
Running trial trial.number=15 in ThreadPoolExecutor-0_3
[I 2025-11-23 12:06:11,069] Trial 15 finished with value: 0.4164480038963818 and parameters: {'x': 2.6453278266868567}. Best is trial 7 with value: 0.0397240253964298.
Running trial trial.number=16 in ThreadPoolExecutor-0_0
[I 2025-11-23 12:06:11,072] Trial 14 finished with value: 0.8412549261294983 and parameters: {'x': 2.9171995018148986}. Best is trial 7 with value: 0.0397240253964298.
[I 2025-11-23 12:06:11,076] Trial 13 finished with value: 1.8452819618927574 and parameters: {'x': 3.358411558362471}. Best is trial 7 with value: 0.0397240253964298.
[I 2025-11-23 12:06:11,093] Trial 16 finished with value: 0.5402690561498903 and parameters: {'x': 2.7350299695589904}. Best is trial 7 with value: 0.0397240253964298.
Running trial trial.number=17 in ThreadPoolExecutor-0_3
Running trial trial.number=18 in ThreadPoolExecutor-0_1
Running trial trial.number=19 in ThreadPoolExecutor-0_2
[I 2025-11-23 12:06:11,113] Trial 19 finished with value: 16.07500576048067 and parameters: {'x': 6.00936475772418}. Best is trial 7 with value: 0.0397240253964298.
[I 2025-11-23 12:06:11,118] Trial 17 finished with value: 140.14028771174253 and parameters: {'x': -9.8380863196609}. Best is trial 7 with value: 0.0397240253964298.
[I 2025-11-23 12:06:11,119] Trial 18 finished with value: 13.510583092217392 and parameters: {'x': 5.675674508470165}. Best is trial 7 with value: 0.0397240253964298.


### Multi-Process Optimization (Shared Storage)

Instead of threads, you can run **independent processes** that share the same backend:

* multiple Python scripts,
* multiple terminals,
* or `multiprocessing.Process`.

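The key requirement is:
→ **All processes must point to the same storage backend and the same study name.**

#### Example (conceptual)

Terminal 1:

```python
# Give the study an explicit name ("shared-study" is just an illustrative choice).
study = optuna.create_study(study_name="shared-study", storage="sqlite:///example.db")
study.optimize(objective, n_trials=50)
```

Terminal 2:

```python
# Attach to the study created in Terminal 1 (same name, same storage).
study = optuna.load_study(study_name="shared-study", storage="sqlite:///example.db")
study.optimize(objective, n_trials=50)
```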

Together these processes perform 100 trials.

## Visualizing Hyperparameter Optimization with Optuna

Hyperparameter optimization generates a **large amount of information**:

* how the objective improved over time,
* which hyperparameters mattered most,
* how hyperparameters interact,
* how individual trials behaved,
* and how long each trial took.

Optuna provides a dedicated visualization module to help analyze these results:
`optuna.visualization` (Plotly-based) and `optuna.visualization.matplotlib` (Matplotlib-based).

These visualizations are extremely valuable for understanding:

* the structure of your search space,
* the behavior of your models,
* which parameters are truly important,
* and whether you should adjust your hyperparameter ranges.

### Example: Optimization (FashionMNIST Neural Network)

*(We keep the code for completeness but the key focus is visualization, not model training.)*

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision


import optuna

# You can use Matplotlib instead of Plotly for visualization by simply replacing `optuna.visualization` with
# `optuna.visualization.matplotlib` in the following examples.
from optuna.visualization import plot_contour
from optuna.visualization import plot_edf
from optuna.visualization import plot_intermediate_values
from optuna.visualization import plot_optimization_history
from optuna.visualization import plot_parallel_coordinate
from optuna.visualization import plot_param_importances
from optuna.visualization import plot_rank
from optuna.visualization import plot_slice
from optuna.visualization import plot_timeline


SEED = 13
torch.manual_seed(SEED)

if torch.cuda.is_available():
    DEVICE = torch.device("cuda")
elif torch.backends.mps.is_available():
    DEVICE = torch.device("mps")
else:
    DEVICE = torch.device("cpu")
DIR = ".."
BATCHSIZE = 128
N_TRAIN_EXAMPLES = BATCHSIZE * 30
N_VALID_EXAMPLES = BATCHSIZE * 10


def define_model(trial):
    n_layers = trial.suggest_int("n_layers", 1, 2)
    layers = []

    in_features = 28 * 28
    for i in range(n_layers):
        out_features = trial.suggest_int("n_units_l{}".format(i), 64, 512)
        layers.append(nn.Linear(in_features, out_features))
        layers.append(nn.ReLU())

        in_features = out_features

    layers.append(nn.Linear(in_features, 10))
    layers.append(nn.LogSoftmax(dim=1))

    return nn.Sequential(*layers)


# Defines training and evaluation.
def train_model(model, optimizer, train_loader):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.view(-1, 28 * 28).to(DEVICE), target.to(DEVICE)
        optimizer.zero_grad()
        F.nll_loss(model(data), target).backward()
        optimizer.step()


def eval_model(model, valid_loader):
    model.eval()
    correct = 0
    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(valid_loader):
            data, target = data.view(-1, 28 * 28).to(DEVICE), target.to(DEVICE)
            pred = model(data).argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    accuracy = correct / N_VALID_EXAMPLES

    return accuracy

In [None]:
def objective(trial):
    train_dataset = torchvision.datasets.FashionMNIST(
        DIR, train=True, download=True, transform=torchvision.transforms.ToTensor()
    )
    train_loader = torch.utils.data.DataLoader(
        torch.utils.data.Subset(train_dataset, list(range(N_TRAIN_EXAMPLES))),
        batch_size=BATCHSIZE,
        shuffle=True,
    )

    val_dataset = torchvision.datasets.FashionMNIST(
        DIR, train=False, transform=torchvision.transforms.ToTensor()
    )
    val_loader = torch.utils.data.DataLoader(
        torch.utils.data.Subset(val_dataset, list(range(N_VALID_EXAMPLES))),
        batch_size=BATCHSIZE,
        shuffle=True,
    )
    model = define_model(trial).to(DEVICE)

    optimizer = torch.optim.Adam(
        model.parameters(), trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    )

    for epoch in range(10):
        train_model(model, optimizer, train_loader)

        val_accuracy = eval_model(model, val_loader)
        trial.report(val_accuracy, epoch)

        if trial.should_prune():
            raise optuna.exceptions.TrialPruned()

    return val_accuracy

After running the study:

In [None]:
study = optuna.create_study(
    study_name="fashionmnist-study",
    storage="sqlite:///example-study.db",
    load_if_exists=True,
    direction="maximize",
    sampler=optuna.samplers.TPESampler(seed=SEED),
    pruner=optuna.pruners.MedianPruner(),
)

study.optimize(objective, n_trials=30, timeout=300)

[I 2025-11-23 13:31:41,121] A new study created in RDB with name: fashionmnist-study
[I 2025-11-23 13:31:42,539] Trial 0 finished with value: 0.10234375 and parameters: {'n_layers': 2, 'n_units_l0': 170, 'n_units_l1': 434, 'lr': 0.07294521222846949}. Best is trial 0 with value: 0.10234375.
[I 2025-11-23 13:31:43,993] Trial 1 finished with value: 0.821875 and parameters: {'n_layers': 2, 'n_units_l0': 267, 'n_units_l1': 337, 'lr': 0.01265045244598441}. Best is trial 1 with value: 0.821875.
[I 2025-11-23 13:31:45,400] Trial 2 finished with value: 0.78203125 and parameters: {'n_layers': 2, 'n_units_l0': 388, 'n_units_l1': 79, 'lr': 0.0001562420338129347}. Best is trial 1 with value: 0.821875.
[I 2025-11-23 13:31:46,835] Trial 3 finished with value: 0.81484375 and parameters: {'n_layers': 1, 'n_units_l0': 448, 'lr': 0.0003100388458379453}. Best is trial 1 with value: 0.821875.
[I 2025-11-23 13:31:48,162] Trial 4 finished with value: 0.5828125 and parameters: {'n_layers': 2, 'n_units_l0': 179, 'n_units_l1': 220, 'lr': 1.0905638066174219e-05}. Best is trial 1 with value: 0.821875.
[I 2025-11-23 13:31:48,488] Trial 5 pruned.
[I 2025-11-23 13:31:48,662] Trial 6 pruned.
[I 2025-11-23 13:31:50,076] Trial 7 finished with value: 0.79921875 and parameters: {'n_layers': 1, 'n_units_l0': 346, 'lr': 0.03127906576004862}. Best is trial 1 with value: 0.821875.
[I 2025-11-23 13:31:51,553] Trial 8 finished with value: 0.83125 and parameters: {'n_layers': 1, 'n_units_l0': 399, 'lr': 0.017838760879309364}. Best is trial 8 with value: 0.83125.
[I 2025-11-23 13:31:52,886] Trial 9 finished with value: 0.83515625 and parameters: {'n_layers': 1, 'n_units_l0': 358, 'lr': 0.0010890524599308936}. Best is trial 9 with value: 0.83515625.
[I 2025-11-23 13:31:54,177] Trial 10 finished with value: 0.846875 and parameters: {'n_layers': 1, 'n_units_l0': 268, 'lr': 0.0022476705090733336}. Best is trial 10 with value: 0.846875.
[I 2025-11-23 13:31:55,474] Trial 11 finished with value: 0.8265625 and parameters: {'n_layers': 1, 'n_units_l0': 279, 'lr': 0.0028475731075628314}. Best is trial 10 with value: 0.846875.
[I 2025-11-23 13:31:55,617] Trial 12 pruned.
[I 2025-11-23 13:31:56,123] Trial 13 pruned.
[I 2025-11-23 13:31:56,468] Trial 14 pruned.
[I 2025-11-23 13:31:56,651] Trial 15 pruned.
[I 2025-11-23 13:31:57,874] Trial 16 finished with value: 0.83828125 and parameters: {'n_layers': 1, 'n_units_l0': 231, 'lr': 0.006967389538407044}. Best is trial 10 with value: 0.846875.
[I 2025-11-23 13:31:58,446] Trial 17 pruned.
[I 2025-11-23 13:31:58,594] Trial 18 pruned.
[I 2025-11-23 13:31:59,777] Trial 19 finished with value: 0.8375 and parameters: {'n_layers': 1, 'n_units_l0': 151, 'lr': 0.005748611429593445}. Best is trial 10 with value: 0.846875.
[I 2025-11-23 13:32:00,089] Trial 20 pruned.
[I 2025-11-23 13:32:00,936] Trial 21 pruned.
[I 2025-11-23 13:32:02,211] Trial 22 finished with value: 0.828125 and parameters: {'n_layers': 1, 'n_units_l0': 133, 'lr': 0.002216776917398689}. Best is trial 10 with value: 0.846875.
[I 2025-11-23 13:32:03,547] Trial 23 finished with value: 0.8109375 and parameters: {'n_layers': 1, 'n_units_l0': 208, 'lr': 0.007659105490307479}. Best is trial 10 with value: 0.846875.
[I 2025-11-23 13:32:04,176] Trial 24 pruned.
[I 2025-11-23 13:32:04,460] Trial 25 pruned.
[I 2025-11-23 13:32:05,811] Trial 26 finished with value: 0.83046875 and parameters: {'n_layers': 1, 'n_units_l0': 264, 'lr': 0.02150245982738356}. Best is trial 10 with value: 0.846875.
[I 2025-11-23 13:32:05,953] Trial 27 pruned.
[I 2025-11-23 13:32:06,102] Trial 28 pruned.
[I 2025-11-23 13:32:06,261] Trial 29 pruned.


…we can visualize the results.

---

### Visualization Functions

All visualizations work by simply passing the study object.
Below are the most commonly used plots, what they show, and why they matter.

---

#### Optimization History

Shows each trial's objective value together with the best value found so far. It helps detect:

* whether the optimization is converging,
* when improvements began to flatten,
* whether more trials are worthwhile.

In [None]:
from optuna.visualization import plot_optimization_history
plot_optimization_history(study)
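
The best-value line in this plot is essentially a running best over the trial values. As a small sketch, here it is computed by hand for the first five trials of the FashionMNIST log above (all five completed); the variable names are illustrative.

```python
from itertools import accumulate

# Final values of trials 0 to 4 in the log above.
values = [0.10234375, 0.821875, 0.78203125, 0.81484375, 0.5828125]

# The study uses direction="maximize", so the best-so-far curve is a running maximum.
best_so_far = list(accumulate(values, max))
print(best_so_far)
# → [0.10234375, 0.821875, 0.821875, 0.821875, 0.821875]
```

A long flat stretch in this curve is the usual sign that additional trials are bringing little improvement.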

#### Intermediate Values (Learning Curves)

Shows the intermediate performance reported during training (e.g., per epoch). It helps identify:

* how pruners made decisions,
* which trials improved quickly,
* noisy or unstable training runs.

##### Axes Explanation

**X-axis — Epoch / Step**

* Shows training progress (epoch or step number).
* **Each point on a line** = one call to `trial.report()`.

**Y-axis — Intermediate Metric Value**

* Shows the metric you reported during training
  (e.g., validation loss, accuracy, RMSE).
* The axis range covers the values that the trials actually reported.

**Lines**

* Each line = one Optuna trial and how its performance changed over training.

In [None]:
from optuna.visualization import plot_intermediate_values
plot_intermediate_values(study)

#### Parallel Coordinate Plot

Shows relationships among parameters and the objective value in high-dimensional spaces. It allows you to see:

* which hyperparameters matter most,
* where good regions cluster,
* interactions between parameters.

##### Axes Explanation

**X-axis — Parameters and Objective**

* Each vertical axis corresponds to **one hyperparameter** or the **objective value**.
* The X-axis itself simply lays out these vertical axes side by side.

**Y-axis — Values of Each Parameter**

* Each vertical axis has its **own scale**, showing the possible values of that parameter
  (e.g., learning rate, number of layers, dropout).
* The range of each axis covers the values actually sampled in the trials.

**Lines**

* Each line = one Optuna trial, connecting its parameter values across all axes.
* Patterns of lines show relationships, clusters, and which parameters impact performance.

In [None]:
from optuna.visualization import plot_parallel_coordinate
plot_parallel_coordinate(study)

Or select specific parameters (the names must match those passed to `trial.suggest_*` in your objective):

In [None]:
plot_parallel_coordinate(study, params=["lr", "n_layers"])

#### Contour Plot

Visualizes 2D slices of the hyperparameter space. It's useful for:

* understanding smoothness or ruggedness of the search space,
* identifying interactions between two parameters.

##### Axes Explanation

**X-axis & Y-axis — Two Selected Hyperparameters**

* Each contour plot shows **two hyperparameters** on the horizontal and vertical axes.
* Tick marks = the actual sampled values of those parameters from the trials.

**Color Map (Contours)**

* The colors represent the **objective value**.
* Smooth color changes = smooth search space.
* Sharp or irregular changes = rugged space.

**Points**

* Each point = one trial’s parameter combination and resulting objective value.

In [None]:
from optuna.visualization import plot_contour
plot_contour(study)

Or with selected parameters:

In [None]:
plot_contour(study, params=["lr", "n_layers"])

#### Slice Plot

Shows how the objective value changes with respect to each hyperparameter individually. It helps to find:

* good parameter ranges,
* monotonic relationships,
* noisy parameters.

##### Axes Explanation

**X-axis — One Hyperparameter**

* Each subplot uses a single hyperparameter on the horizontal axis.
* Tick marks = the sampled values of that parameter.

**Y-axis — Objective Value**

* Shows how the objective changes for different values of that hyperparameter.
* Tick marks = objective values achieved by the trials.

**Points**

* Each point = one trial.
* Patterns show good ranges, monotonic trends, or noisy behavior.

In [None]:
from optuna.visualization import plot_slice
plot_slice(study)

#### Hyperparameter Importance

Uses model-based techniques (e.g., fANOVA) to estimate how much each hyperparameter contributes to changes in the objective. It's very useful to:

* shrink your search space,
* understand which parameters actually influence the outcome,
* reduce model complexity.

In [None]:
from optuna.visualization import plot_param_importances
plot_param_importances(study)

You can also compute importance on **another target**, like *duration*:


In [None]:
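# Rank parameters by their effect on trial duration instead of the objective.
# Reuses plot_param_importances imported in the previous cell.
plot_param_importances(
    study,
    target=lambda t: t.duration.total_seconds(),
    target_name="duration",
)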

#### EDF Plot (Empirical Distribution Function)

Shows the distribution of objective values across trials. It's useful for:

* comparing samplers,
* comparing search spaces,
* understanding variability in trial results.

##### Axes Explanation

**X-axis — Objective Value**

* Shows the range of objective values obtained by the trials.
* Tick marks = actual values that appeared in the study.

**Y-axis — Cumulative Probability (0–1)**

* Shows the fraction of trials that achieved a given objective value or lower.
* Tick marks represent cumulative proportions of trials (e.g., 0.2 = 20% of trials).

**Curve**

* The EDF curve rises step-by-step as objective values increase.
* Steeper rises = many trials with similar objective values.
* Flatter areas = ranges of objective values that few trials reached.

In [None]:
from optuna.visualization import plot_edf
plot_edf(study)
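
The Y value of the curve at a given X can be computed by hand. The sketch below is a minimal illustration on a plain list; the helper name `edf` and the objective values are hypothetical.

```python
def edf(values, threshold):
    """Fraction of objective values that are less than or equal to threshold."""
    return sum(v <= threshold for v in values) / len(values)


# Hypothetical objective values from five trials.
objective_values = [0.8, 0.5, 0.6, 0.5, 0.9]

for x in (0.4, 0.5, 0.6, 0.9):
    print(x, edf(objective_values, x))
```

When minimizing, a curve that reaches high cumulative fractions at low objective values indicates a sampler or search space that finds good trials more often.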

#### Rank Plot

Scatter plots of parameter combinations colored by the rank of the objective value. It's useful for:

* clustering trials by similarity,
* seeing where good trials are located in parameter space.

##### Axes Explanation

**X-axis & Y-axis — Two Hyperparameters**

* Each subplot shows a pair of hyperparameters.
* Tick marks = the sampled values of those parameters.

**Color — Objective Value**

* Points are colored according to the rank of their objective value among all trials (good/bad performance).
* Color helps reveal where good trials cluster.

**Points**

* Each point = one trial and its parameter combination.

In [None]:
from optuna.visualization import plot_rank
plot_rank(study)


**Note:** `plot_rank` is experimental (supported from v3.2.0). The interface can change in the future.
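
Coloring by rank rather than by raw value keeps a few extreme trials from dominating the color scale. The sketch below shows one simple way to turn objective values into ranks; the helper name `rank_values` and the numbers are hypothetical, and Optuna's own tie handling and color normalization may differ.

```python
def rank_values(values):
    """Rank values from 0 (smallest) upward; tied values share the lowest rank."""
    ordered = sorted(values)
    return [ordered.index(v) for v in values]


# Hypothetical objective values, including one extreme outlier.
objective_values = [0.30, 0.10, 250.0, 0.10]
print(rank_values(objective_values))
```

The outlier `250.0` simply receives the highest rank, so the remaining trials stay distinguishable by color.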



#### Timeline Plot

Visualizes trial execution time and ordering. It helps to analyze:

* parallelization performance,
* slow vs. fast trials,
* pruner behavior.

In [None]:
from optuna.visualization import plot_timeline
plot_timeline(study)


**Note:** `plot_timeline` is experimental (supported from v3.2.0). The interface can change in the future.
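
The same information the timeline shows can be summarized numerically. The sketch below computes per-trial durations and the peak number of overlapping trials from (start, end) pairs. The helper name `peak_concurrency` and the timestamps are hypothetical; with a real study the pairs could be taken from each finished trial's `datetime_start` and `datetime_complete` attributes.

```python
from datetime import datetime, timedelta


def peak_concurrency(intervals):
    """Return the largest number of trials running at the same moment."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a trial starts
        events.append((end, -1))    # a trial ends
    # Sort by time; at equal times, ends (-1) are processed before starts (+1).
    events.sort(key=lambda e: (e[0], e[1]))
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


# Hypothetical start/end times for three trials.
t0 = datetime(2025, 1, 1, 12, 0, 0)
intervals = [
    (t0, t0 + timedelta(seconds=30)),
    (t0 + timedelta(seconds=10), t0 + timedelta(seconds=50)),
    (t0 + timedelta(seconds=30), t0 + timedelta(seconds=40)),
]

durations = [(end - start).total_seconds() for start, end in intervals]
print(durations)
print(peak_concurrency(intervals))
```

A peak well below the number of workers you launched suggests the workers are often idle, for example because of storage contention or uneven trial lengths.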



### Optuna Dashboard

In addition to Python visualizations, Optuna offers an interactive web UI.

Install:

```bash
pip install optuna-dashboard
```

Run:

```bash
optuna-dashboard sqlite:///example-study.db
```

You get:

* interactive plots,
* trial tables,
* hyperparameter statistics,
* search space exploration,
* pruning results,
* model comparisons.

A highly recommended tool for deeper analysis.

In [None]:
! optuna-dashboard sqlite:///example-study.db

Listening on http://127.0.0.1:8080/
Hit Ctrl-C to quit.
