In [None]:
'''
Optuna is an automatic hyperparameter optimization framework.
It combines sampling strategies such as the Tree-structured Parzen Estimator (TPE, a Bayesian optimization method) with pruning of unpromising trials to find good hyperparameters efficiently.

Key Features:

Define-by-run API → You can write tuning logic dynamically in Python (not fixed beforehand like GridSearchCV).

Efficient search → Uses smarter strategies instead of brute force grid/random search.

Pruning → Stops bad trials early (saves computation).

Integration → Works with scikit-learn, PyTorch, TensorFlow, LightGBM, XGBoost, etc.

🔹 Workflow of Optuna

The typical flow looks like this:

1. Define an objective function

This is what Optuna will optimize (e.g., minimize validation loss, maximize accuracy).

2. Suggest hyperparameters

Inside the objective function, Optuna samples hyperparameters (trial.suggest_*()).

3. Train & evaluate the model

Use the sampled hyperparameters to train your ML/DL model and return a metric.

4. Run optimization

Call study.optimize() to try many hyperparameter sets.

5. Analyze best results

Retrieve the best hyperparameters, then plot the search history, parameter importances, etc.


🔹 Important Optuna Concepts

Trial object
Provides functions to suggest hyperparameters:

trial.suggest_int("x", 1, 10) → integer range

trial.suggest_float("lr", 1e-5, 1e-1, log=True) → float range (log scale for learning rate)

trial.suggest_categorical("optimizer", ["adam", "sgd"]) → categorical choice

Study object

Holds all results of the optimization.

study.best_params → best hyperparameters

study.best_value → best score achieved

study.trials → list of all trials

Pruning

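Early stopping for unpromising trials (saves time). Built-in pruners live in optuna.pruners (e.g., MedianPruner).

Example: optuna.integration.PyTorchLightningPruningCallback (recent Optuna releases ship framework integrations in the separate optuna-integration package)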

Visualization
Optuna has built-in plots:

optuna.visualization.plot_optimization_history(study).show()
optuna.visualization.plot_param_importances(study).show()


Why Optuna over Grid/Random Search?

Grid Search → tries all combinations (very expensive).

Random Search → faster but not always efficient.

Optuna → learns from past trials → focuses on promising regions in hyperparameter space.




What is a Sampler in Optuna?

A Sampler is the algorithm Optuna uses to decide which hyperparameters to try next.

It guides the search process:

Grid Search → fixed exhaustive search

Random Search → purely random

Optuna Sampler → smarter (Bayesian, TPE, CMA-ES, etc.)

Think of it as the “brain” that chooses the next trial.

🔹 Available Samplers in Optuna
1. RandomSampler

Picks hyperparameters randomly within the defined ranges.

Similar to Random Search in scikit-learn.

Good as a baseline but not efficient for large spaces.

import optuna
sampler = optuna.samplers.RandomSampler()
study = optuna.create_study(direction="maximize", sampler=sampler)

2. TPESampler (Tree-structured Parzen Estimator) ✅ Default

Default sampler in Optuna.

A Bayesian optimization method.

Works by modeling good vs. bad hyperparameter configurations separately and sampling from the “good” regions more often.

Very effective for continuous/discrete search spaces.

Best for most use cases.

sampler = optuna.samplers.TPESampler()
study = optuna.create_study(direction="maximize", sampler=sampler)

3. CmaEsSampler (Covariance Matrix Adaptation Evolution Strategy)

An evolutionary algorithm.

Samples parameters from a multivariate normal distribution and adapts the covariance matrix over time.

Great for continuous hyperparameters with dependencies between them.

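Does not handle categorical parameters itself; those fall back to an independent sampler (random by default). Recent Optuna releases may also need the separate cmaes package installed.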

sampler = optuna.samplers.CmaEsSampler()

4. GridSampler

Exhaustive grid search over given parameter values.

Not scalable for many parameters, but useful for testing small, fixed sets.

search_space = {"lr": [0.01, 0.001], "batch_size": [16, 32]}
sampler = optuna.samplers.GridSampler(search_space)

5. NSGAIISampler (Genetic Algorithm for Multi-objective)

Based on the NSGA-II evolutionary algorithm.

Used for multi-objective optimization (e.g., maximize accuracy while minimizing training time).

sampler = optuna.samplers.NSGAIISampler()
study = optuna.create_study(directions=["maximize", "minimize"], sampler=sampler)

🔹 How a Sampler Works Internally

Initialization → At the start, it may use random trials to gather data.

Model building → Builds a probabilistic model of which hyperparameter regions look promising.

Sampling → Uses that model to suggest new hyperparameters (trial.suggest_*()).

Feedback loop → Updates the model after each trial’s result.

For example:

RandomSampler → no model, just random.

TPESampler → uses past results to sample more from “good” parameter regions.

CmaEsSampler → evolves distributions over generations.

🔹 Choosing the Right Sampler
Sampler                 Best For
RandomSampler           Small problems, debugging, baseline runs
TPESampler (default)    Most real-world cases, general-purpose
CmaEsSampler            Continuous search spaces with dependencies
GridSampler             Small, fixed search spaces
NSGAIISampler           Multi-objective optimization problems

🔹 Example with Custom Sampler
import optuna

def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

# Use TPE Sampler
sampler = optuna.samplers.TPESampler(seed=42)
study = optuna.create_study(direction="minimize", sampler=sampler)
study.optimize(objective, n_trials=50)

print("Best params:", study.best_params)


✅ In summary:

A Sampler is Optuna’s strategy for exploring the hyperparameter search space.

TPESampler is the default and works well in most scenarios.

For multi-objective problems, use NSGAIISampler (Optuna's default when a study has several directions).

For continuous, dependent parameters, consider CmaEsSampler.



What is Optuna Visualization?

Optuna provides built-in interactive plots to:

Track optimization progress

See how hyperparameters affect performance

Compare trials

Analyze parameter importance

These are super useful for debugging and reporting results.

📦 Install (if not already):

pip install optuna plotly

🔹 Common Optuna Visualization Functions

All functions take a study object and return a Plotly figure.
You can show them with .show() in Jupyter/Notebook, or .write_html("plot.html") for saving.

1. Optimization History

Shows how the objective value (loss/accuracy/etc.) changes over trials.

import optuna.visualization as vis
vis.plot_optimization_history(study).show()


📈 X-axis → trial number

📉 Y-axis → objective value (e.g., loss, accuracy)

Helps see if tuning is converging or still improving.

2. Intermediate Values

Useful when you report intermediate metrics (e.g., validation loss at each epoch).

vis.plot_intermediate_values(study).show()


Shows the trajectory of trials over time.

Useful when combined with pruners (early stopping).

3. Parallel Coordinate Plot

Shows relationships between multiple hyperparameters and objective value.

vis.plot_parallel_coordinate(study).show()


Each vertical axis = one hyperparameter

Colored by trial performance

Helps find interactions (e.g., learning rate works only if batch size is small).

4. Slice Plot

Shows objective value vs. each hyperparameter separately.

vis.plot_slice(study).show()


Each subplot = one hyperparameter

Scatter points show trial results

Helps detect trends (e.g., accuracy improves as learning rate decreases).

5. Parameter Importance

Estimates how much each hyperparameter influenced the final score.

vis.plot_param_importances(study).show()


Bars represent importance of each parameter

Helps identify which hyperparameters matter most

Useful for simplifying the search space

6. Contour Plot

Shows 2D interactions between pairs of hyperparameters.

vis.plot_contour(study).show()


Each plot shows contour lines of performance

Helps find combinations that work well (e.g., learning rate + hidden size).

7. Rank Plot

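Shows scatter plots of hyperparameters in which each point is colored by the rank of its objective value.

vis.plot_rank(study).show()


Makes good and bad regions easier to see when a few extreme objective values would otherwise dominate the color scale.

Available in recent Optuna releases; for multi-objective studies, a target must be specified.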

🔹 Example Workflow
import optuna
import optuna.visualization as vis

# Assumes an objective(trial) function is already defined
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)

# Visualization
vis.plot_optimization_history(study).show()
vis.plot_slice(study).show()
vis.plot_param_importances(study).show()

🔹 Choosing the Right Visualization
Plot                    Best for
Optimization History    See convergence over trials
Intermediate Values     See progress during training (with pruning)
Parallel Coordinate     Spot multi-parameter interactions
Slice Plot              See effect of single hyperparameters
Parameter Importance    Identify most critical hyperparameters
Contour Plot            Visualize 2D interactions between parameters
Rank Plot               See which parameter regions hold the best-ranked trials

✅ In summary:
Optuna visualization is your debugging and reporting toolkit. It makes hyperparameter tuning not just automatic but also easier to interpret: the plots show which parameters and value ranges were associated with better scores.




What is a Dynamic Search Space?

In many hyperparameter optimization tools (e.g., scikit-learn's GridSearchCV and RandomizedSearchCV), you define the full search space beforehand.
But in practice, the choice of one hyperparameter often depends on another.

👉 Example:

If you choose optimizer="adam", you want to tune lr.

If you choose optimizer="sgd", you also want to tune momentum.

With Optuna’s define-by-run approach, you can write conditional logic inside the objective function to dynamically create the search space.

🔹 Why is this useful?

Reduces waste (don’t search irrelevant parameters).

Allows hierarchical spaces (different models, optimizers, or architectures).

Makes the tuning process more realistic for complex ML/DL models.

🔹 Example 1: Optimizer Choice
import optuna

def objective(trial):
    optimizer_name = trial.suggest_categorical("optimizer", ["adam", "sgd"])

    if optimizer_name == "adam":
        lr = trial.suggest_float("adam_lr", 1e-5, 1e-1, log=True)
        optimizer_params = {"lr": lr}
    else:
        lr = trial.suggest_float("sgd_lr", 1e-3, 1.0, log=True)
        momentum = trial.suggest_float("momentum", 0.0, 0.99)
        optimizer_params = {"lr": lr, "momentum": momentum}

    # Dummy objective: smaller learning rate = better
    return 1.0 / lr

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print("Best Params:", study.best_params)


✅ Here the search space changes dynamically based on the chosen optimizer.

🔹 Example 2: Neural Network Architecture

Let’s say the number of hidden layers is a hyperparameter. If we pick 3 layers, then we need to sample 3 different hidden_units values.

import torch.nn as nn

def objective(trial):
    n_layers = trial.suggest_int("n_layers", 1, 3)
    layers = []

    in_features = 10
    for i in range(n_layers):
        hidden_size = trial.suggest_int(f"n_units_l{i}", 4, 128)
        layers.append(nn.Linear(in_features, hidden_size))
        layers.append(nn.ReLU())
        in_features = hidden_size

    layers.append(nn.Linear(in_features, 2))  # output layer
    model = nn.Sequential(*layers)

    # (Training loop omitted)
    # return validation accuracy
    return 0.8  # fake score


✅ The number of hyperparameters (n_units_l0, n_units_l1, etc.) depends on n_layers chosen by the trial.

🔹 Example 3: Model Selection

You can even choose between different models dynamically:

def objective(trial):
    model_name = trial.suggest_categorical("model", ["rf", "xgb"])

    # train_random_forest and train_xgboost are placeholder helpers that
    # train a model and return its validation score.
    if model_name == "rf":
        n_estimators = trial.suggest_int("rf_n_estimators", 50, 300)
        max_depth = trial.suggest_int("rf_max_depth", 2, 20)
        score = train_random_forest(n_estimators, max_depth)
    else:
        eta = trial.suggest_float("xgb_eta", 1e-3, 0.3, log=True)
        # Separate names per model: the two max_depth ranges differ.
        max_depth = trial.suggest_int("xgb_max_depth", 2, 10)
        score = train_xgboost(eta, max_depth)

    return score


✅ Optuna will try different models and tune their respective parameters.

🔹 Key Takeaways

Dynamic Search Space = search space depends on trial decisions.

Enabled by Optuna’s define-by-run style (trial.suggest_*() inside objective).

Allows:

Conditional hyperparameters

Model selection

Flexible NN architectures

✅ In summary:
Optuna’s dynamic search space is something static tools such as GridSearchCV cannot express directly: the set of hyperparameters is decided while each trial runs, which suits real ML pipelines with conditional choices.




'''

In [2]:
#!pip install optuna

Collecting optuna
  Downloading optuna-4.5.0-py3-none-any.whl.metadata (17 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Downloading optuna-4.5.0-py3-none-any.whl (400 kB)
Downloading colorlog-6.9.0-py3-none-any.whl (11 kB)
Installing collected packages: colorlog, optuna
Successfully installed colorlog-6.9.0 optuna-4.5.0


# Machine Learning

In [5]:
import optuna
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

In [6]:
# Load dataset
X, y = load_iris(return_X_y=True)

In [11]:
# Objective function for Optuna
def objective(trial):
    # Suggest hyperparameters
    n_estimators = trial.suggest_int("n_estimators", 50, 300)
    max_depth = trial.suggest_int("max_depth", 2, 20)
    min_samples_split = trial.suggest_int("min_samples_split", 2, 10)

    # Model
    clf = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        random_state=42
    )

    # Cross-validation accuracy
    score = cross_val_score(clf, X, y, n_jobs=-1, cv=3).mean()
    return score  # maximize accuracy

In [12]:
# Create Optuna study
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)

print("Best Score:", study.best_value)
print("Best Params:", study.best_params)

[I 2025-09-09 19:12:23,826] A new study created in memory with name: no-name-6ff05383-e8b7-4283-8cca-256a6645c50b
[I 2025-09-09 19:12:27,103] Trial 0 finished with value: 0.9666666666666667 and parameters: {'n_estimators': 235, 'max_depth': 6, 'min_samples_split': 7}. Best is trial 0 with value: 0.9666666666666667.
[I 2025-09-09 19:12:28,909] Trial 1 finished with value: 0.9666666666666667 and parameters: {'n_estimators': 290, 'max_depth': 20, 'min_samples_split': 8}. Best is trial 0 with value: 0.9666666666666667.
[I 2025-09-09 19:12:29,714] Trial 2 finished with value: 0.9666666666666667 and parameters: {'n_estimators': 134, 'max_depth': 11, 'min_samples_split': 5}. Best is trial 0 with value: 0.9666666666666667.
[I 2025-09-09 19:12:30,139] Trial 3 finished with value: 0.9666666666666667 and parameters: {'n_estimators': 103, 'max_depth': 18, 'min_samples_split': 6}. Best is trial 0 with value: 0.9666666666666667.
[I 2025-09-09 19:12:30,934] Trial 4 finished with value: 0.966666666666

Best Score: 0.9666666666666667
Best Params: {'n_estimators': 235, 'max_depth': 6, 'min_samples_split': 7}


# Deep Learning 1

In [None]:
'''
We want to classify whether a point belongs to Class 0 or Class 1 given two numerical features.
We’ll train a simple Artificial Neural Network (ANN) and tune hyperparameters such as hidden size, learning rate, and number of epochs.
'''

In [19]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import optuna

In [20]:
# Generate synthetic dataset
X = torch.randn(200, 2)
y = (X[:, 0] * X[:, 1] > 0).long()  # Label = 1 if product > 0 else 0
dataset = TensorDataset(X, y)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

In [21]:
# Objective function for Optuna
def objective(trial):
    # Hyperparameters
    hidden_size = trial.suggest_int("hidden_size", 4, 64)
    lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)
    epochs = trial.suggest_int("epochs", 5, 30)

    # Model
    model = nn.Sequential(
        nn.Linear(2, hidden_size),
        nn.ReLU(),
        nn.Linear(hidden_size, 2)
    )

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)

    # Training loop
    for epoch in range(epochs):
        for batch_X, batch_y in dataloader:
            optimizer.zero_grad()
            outputs = model(batch_X)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()

    # Evaluate accuracy on the training data itself (an optimistic estimate; use a held-out split for real tuning)
    with torch.no_grad():
        outputs = model(X)
        preds = outputs.argmax(dim=1)
        acc = (preds == y).float().mean().item()

    return acc  # maximize accuracy

In [22]:
# Run Optuna study
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)

print("Best Accuracy:", study.best_value)
print("Best Params:", study.best_params)

[I 2025-09-09 19:22:28,417] A new study created in memory with name: no-name-b46d00dc-2ad6-47f6-a457-9996efb40182
[I 2025-09-09 19:22:28,685] Trial 0 finished with value: 0.9950000047683716 and parameters: {'hidden_size': 38, 'lr': 0.02331349095429163, 'epochs': 17}. Best is trial 0 with value: 0.9950000047683716.
[I 2025-09-09 19:22:29,011] Trial 1 finished with value: 0.6650000214576721 and parameters: {'hidden_size': 35, 'lr': 0.0003627973025667947, 'epochs': 27}. Best is trial 0 with value: 0.9950000047683716.
[I 2025-09-09 19:22:29,354] Trial 2 finished with value: 0.8199999928474426 and parameters: {'hidden_size': 58, 'lr': 0.0002135540174041049, 'epochs': 28}. Best is trial 0 with value: 0.9950000047683716.
[I 2025-09-09 19:22:29,574] Trial 3 finished with value: 0.9950000047683716 and parameters: {'hidden_size': 32, 'lr': 0.01079910446179237, 'epochs': 24}. Best is trial 0 with value: 0.9950000047683716.
[I 2025-09-09 19:22:29,744] Trial 4 finished with value: 0.990000009536743

Best Accuracy: 1.0
Best Params: {'hidden_size': 12, 'lr': 0.06223979578076364, 'epochs': 12}


# Deep Learning 2



In [13]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import optuna

In [14]:
# Load MNIST dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root="./data", train=False, download=True, transform=transform)



In [15]:
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)

In [16]:
def create_model(trial):
    # Dynamic number of hidden layers
    n_layers = trial.suggest_int("n_layers", 1, 3)
    layers = []
    in_features = 28 * 28  # MNIST input size

    for i in range(n_layers):
        hidden_size = trial.suggest_int(f"n_units_l{i}", 32, 256, step=32)
        layers.append(nn.Linear(in_features, hidden_size))
        layers.append(nn.ReLU())
        in_features = hidden_size

    layers.append(nn.Linear(in_features, 10))  # output layer
    return nn.Sequential(*layers)

def objective(trial):
    # Create model
    model = create_model(trial)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    # Optimizer choice (Dynamic)
    optimizer_name = trial.suggest_categorical("optimizer", ["Adam", "SGD"])
    lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)

    if optimizer_name == "Adam":
        optimizer = optim.Adam(model.parameters(), lr=lr)
    else:
        momentum = trial.suggest_float("momentum", 0.0, 0.99)
        optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

    criterion = nn.CrossEntropyLoss()

    # Training loop (demo only: the break below stops after a single batch, not a full epoch)
    model.train()
    for batch_X, batch_y in train_loader:
        batch_X, batch_y = batch_X.view(batch_X.size(0), -1).to(device), batch_y.to(device)
        optimizer.zero_grad()
        output = model(batch_X)
        loss = criterion(output, batch_y)
        loss.backward()
        optimizer.step()
        break  # (remove break for full training, here just one batch for demo)

    # Evaluation
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for batch_X, batch_y in test_loader:
            batch_X, batch_y = batch_X.view(batch_X.size(0), -1).to(device), batch_y.to(device)
            output = model(batch_X)
            preds = output.argmax(dim=1)
            correct += (preds == batch_y).sum().item()
            total += batch_y.size(0)

    accuracy = correct / total
    return accuracy


In [17]:
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)

print("Best Accuracy:", study.best_value)
print("Best Params:", study.best_params)

[I 2025-09-09 19:16:29,142] A new study created in memory with name: no-name-c714b3c6-46a8-41bf-973e-2ec479e42e2b
[I 2025-09-09 19:16:32,684] Trial 0 finished with value: 0.1097 and parameters: {'n_layers': 3, 'n_units_l0': 256, 'n_units_l1': 160, 'n_units_l2': 160, 'optimizer': 'Adam', 'lr': 0.00348060925052619}. Best is trial 0 with value: 0.1097.
[I 2025-09-09 19:16:36,365] Trial 1 finished with value: 0.1656 and parameters: {'n_layers': 3, 'n_units_l0': 96, 'n_units_l1': 160, 'n_units_l2': 64, 'optimizer': 'Adam', 'lr': 0.004965946870862707}. Best is trial 1 with value: 0.1656.
[I 2025-09-09 19:16:38,577] Trial 2 finished with value: 0.1043 and parameters: {'n_layers': 2, 'n_units_l0': 64, 'n_units_l1': 192, 'optimizer': 'SGD', 'lr': 0.00025175309832080865, 'momentum': 0.02970495014455291}. Best is trial 1 with value: 0.1656.
[I 2025-09-09 19:16:40,410] Trial 3 finished with value: 0.1029 and parameters: {'n_layers': 3, 'n_units_l0': 224, 'n_units_l1': 128, 'n_units_l2': 96, 'optim

Best Accuracy: 0.2803
Best Params: {'n_layers': 1, 'n_units_l0': 224, 'optimizer': 'Adam', 'lr': 0.0032517606084675006}


In [18]:
import optuna.visualization as vis

# Optimization history (accuracy over trials)
vis.plot_optimization_history(study).show()

# Parameter importance
vis.plot_param_importances(study).show()

# Slice plots (effect of individual params)
vis.plot_slice(study).show()

# Parallel coordinates (interactions between params)
vis.plot_parallel_coordinate(study).show()

# Contour plot (2D param interactions)
vis.plot_contour(study).show()
