# Important functions in Optuna

* create_study: Initializes a new study for hyperparameter optimization, allowing configuration of the optimization process.
* sampler: Suggests hyperparameter values for trials based on defined strategies, balancing exploration and exploitation. **The default sampler is TPE.**
* pruner: Enables early stopping of trials that are unlikely to yield better results, improving computational efficiency. **The default pruner is `MedianPruner`.**
* trials: Represents individual trial runs, capturing all details about hyperparameter configurations and their results for analysis.

**Documentation**
* [Optuna](https://www.dropbox.com/scl/fo/uthgp753cd46dun5eaorz/AAVbbKcODdsOg-Kjn2blH-I/Section-13-Optuna?dl=0&preview=02-Optuna-Main-Functions.pdf&rlkey=26ugrcmf9wgcz4n28w3ig3jua&subfolder_nav_tracking=1)

# Algorithms covered

* Grid Search
* Randomized search
* Tree-structured Parzen Estimators
* CMA-ES

## Define the objective function


In [None]:
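from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

import optuna

# The objective function receives a trial object, samples hyperparameters
# from it, and returns the score to optimize.
# X_train and y_train are assumed to be defined earlier in the notebook.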

def objective(trial):

    rf_n_estimators = trial.suggest_int("rf_n_estimators", 100, 1000)
    rf_criterion = trial.suggest_categorical("rf_criterion", ['gini', 'entropy'])
    rf_max_depth = trial.suggest_int("rf_max_depth", 1, 4)
    rf_min_samples_split = trial.suggest_float("rf_min_samples_split", 0.01, 1)

    model = RandomForestClassifier(
        n_estimators=rf_n_estimators,
        criterion=rf_criterion,
        max_depth=rf_max_depth,
        min_samples_split=rf_min_samples_split,
    )

    score = cross_val_score(model, X_train, y_train, cv=3)
    accuracy = score.mean()
    return accuracy
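
The objective above relies on `X_train` and `y_train`, which are not defined in these notes. One possible way to prepare them is sketched below. The breast cancer dataset and the 70/30 split are assumptions chosen for illustration, not necessarily the data used originally.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Assumed example dataset: binary classification with numeric features
X, y = load_breast_cancer(return_X_y=True)

# Hold out 30% of the rows for a final evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(X_train.shape, X_test.shape)
```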

## Search algorithms within Optuna


In [None]:
# Step 1: Create a study for optimization
# Initialize an Optuna study to maximize the objective function using random sampling
study = optuna.create_study(
    direction="maximize",  # Objective is to maximize the performance metric
    sampler=optuna.samplers.RandomSampler(),  # Use random sampling for hyperparameter selection
)

# Step 2: Optimize the objective function
# Run the optimization process for a specified number of trials
study.optimize(objective, n_trials=20)  # Perform 20 optimization trials

# Step 3: Retrieve the best parameters and value
# Hyperparameters that resulted in the highest score
print(study.best_params)
# Best score achieved during the optimization
print(study.best_value)
# Trial results as a DataFrame for analysis
print(study.trials_dataframe())

# Step 4: Create a study using TPE sampling
# Initialize an Optuna study for maximizing the objective function with TPE sampling
study = optuna.create_study(
    direction="maximize",  # Objective is to maximize the performance metric
    sampler=optuna.samplers.TPESampler(
        n_startup_trials=10  # Number of initial random trials before TPE sampling
    ),
)

# Step 5: Optimize the objective function using TPE
# Run the optimization process for a specified number of trials
study.optimize(objective, n_trials=20)  # Perform 20 optimization trials
# Retrieve the best parameters and value from TPE optimization
print(study.best_params)
print(study.best_value)

# Step 6: Create a study using CMA-ES sampling
# Initialize an Optuna study for maximizing the objective function with CMA-ES sampling
# Note: CmaEsSampler needs the `cmaes` package and does not handle categorical
# parameters such as rf_criterion; those fall back to independent (random) sampling
study = optuna.create_study(
    direction="maximize",  # Objective is to maximize the performance metric
    sampler=optuna.samplers.CmaEsSampler(),  # Use CMA-ES for hyperparameter selection
)

# Step 7: Optimize the objective function using CMA-ES
# Run the optimization process for a specified number of trials
study.optimize(objective, n_trials=5)  # Perform 5 optimization trials
# Retrieve the best parameters and value from CMA-ES optimization
print(study.best_params)
print(study.best_value)

# Step 8: Define a search space for grid sampling
# Specify the hyperparameter values for grid search
search_space = {
    "rf_n_estimators": [100, 500, 1000],  # Possible values for number of trees
    "rf_criterion": ['gini', 'entropy'],  # Possible criteria for splitting
    "rf_max_depth": [1, 2, 3],  # Maximum depth of trees
    "rf_min_samples_split": [0.1, 1.0]  # Minimum samples required to split an internal node
}

# Step 9: Create a study using grid sampling
# Initialize an Optuna study for maximizing the objective function using grid sampling
study = optuna.create_study(
    direction="maximize",  # Objective is to maximize the performance metric
    sampler=optuna.samplers.GridSampler(search_space),  # Use grid sampling based on defined search space
)

# Step 10: Optimize the objective function using grid search
# No n_trials is given: GridSampler stops the study once every combination
# in search_space has been evaluated
study.optimize(objective)  # Perform optimization using grid sampling
# Retrieve the best parameters and value from grid search
print(study.best_params)
print(study.best_value)

**Using grid search (`GridSampler`) is not recommended.**
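
One reason grid search scales poorly is that the number of trials is the product of the number of candidate values per hyperparameter. The sketch below counts the combinations in the `search_space` defined above. The extra hyperparameter with 5 values at the end is hypothetical.

```python
from itertools import product

search_space = {
    "rf_n_estimators": [100, 500, 1000],
    "rf_criterion": ["gini", "entropy"],
    "rf_max_depth": [1, 2, 3],
    "rf_min_samples_split": [0.1, 1.0],
}

# Every combination of one value per hyperparameter is one grid point
grid = list(product(*search_space.values()))
n_combinations = len(grid)
print(n_combinations)  # 3 * 2 * 3 * 2 = 36

# A hypothetical fifth hyperparameter with 5 values multiplies the grid by 5
print(n_combinations * 5)  # 180
```

Each grid point is a full 3-fold cross-validation in the objective above, so the cost grows quickly while the grid still only tests the few values listed for each hyperparameter.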

# Tuning hyperparameters for different ML models


In [None]:
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier  # Import ensemble classifiers
from sklearn.linear_model import LogisticRegression  # Import logistic regression classifier
from sklearn.model_selection import cross_val_score  # Import cross-validation score
import optuna  # Import Optuna for hyperparameter optimization

# Step 1: Define the objective function for optimization
# The objective function takes a trial as input to suggest hyperparameters
def objective(trial):

    # Step 2: Suggest a classifier to optimize
    # Select one of the classifiers to optimize from a categorical list
    classifier_name = trial.suggest_categorical("classifier", ["logit", "RF", 'GBM'])

    if classifier_name == "logit":
        # Step 3: Suggest hyperparameters for Logistic Regression
        # Select penalty type for logistic regression
        logit_penalty = trial.suggest_categorical('logit_penalty', ['l1', 'l2'])
        # Select the regularization strength
        logit_c = trial.suggest_float('logit_c', 0.001, 10)
        logit_solver = 'saga'  # Solver that supports both penalties

        # Instantiate Logistic Regression model with selected hyperparameters
        model = LogisticRegression(
            penalty=logit_penalty,
            C=logit_c,
            solver=logit_solver,
        )

    elif classifier_name == "RF":
        # Step 4: Suggest hyperparameters for Random Forest
        # Select number of trees in the forest
        rf_n_estimators = trial.suggest_int("rf_n_estimators", 100, 1000)
        # Select the criterion for splitting
        rf_criterion = trial.suggest_categorical("rf_criterion", ['gini', 'entropy'])
        # Select maximum depth of the trees
        rf_max_depth = trial.suggest_int("rf_max_depth", 1, 4)
        # Select minimum number of samples required to split an internal node
        rf_min_samples_split = trial.suggest_float("rf_min_samples_split", 0.01, 1)

        # Instantiate Random Forest Classifier model with selected hyperparameters
        model = RandomForestClassifier(
            n_estimators=rf_n_estimators,
            criterion=rf_criterion,
            max_depth=rf_max_depth,
            min_samples_split=rf_min_samples_split,
        )

    else:
        # Step 5: Suggest hyperparameters for Gradient Boosting Classifier
        # Select number of boosting rounds
        gbm_n_estimators = trial.suggest_int("gbm_n_estimators", 100, 1000)
        # Select the criterion used to measure the quality of a split
        gbm_criterion = trial.suggest_categorical("gbm_criterion", ['squared_error', 'friedman_mse'])
        # Select maximum depth of the trees
        gbm_max_depth = trial.suggest_int("gbm_max_depth", 1, 4)
        # Select minimum number of samples required to split an internal node
        gbm_min_samples_split = trial.suggest_float("gbm_min_samples_split", 0.01, 1)

        # Instantiate Gradient Boosting Classifier model with selected hyperparameters
        model = GradientBoostingClassifier(
            n_estimators=gbm_n_estimators,
            criterion=gbm_criterion,
            max_depth=gbm_max_depth,
            min_samples_split=gbm_min_samples_split,
        )

    # Step 6: Evaluate the model using cross-validation
    # Perform cross-validation and calculate the accuracy score
    score = cross_val_score(model, X_train, y_train, cv=3)  # 3-fold cross-validation
    accuracy = score.mean()  # Average accuracy score

    return accuracy  # Return the accuracy for optimization

# Step 7: Create a study for optimization
# Initialize an Optuna study to maximize the accuracy of the model
study = optuna.create_study(
    direction="maximize",  # Objective is to maximize accuracy
    sampler=optuna.samplers.TPESampler(),  # TPE sampler is the default, can be omitted
)

# Step 8: Optimize the objective function
# Run the optimization process for a specified number of trials
study.optimize(objective, n_trials=20)  # Perform 20 optimization trials

# Step 9: Analyze the results
# Convert the study results into a DataFrame
results = study.trials_dataframe()

# Step 10: Display counts of each classifier used in the trials
results['params_classifier'].value_counts()

# Step 11: Group the results by classifier and calculate mean and standard deviation of the accuracy
results.groupby(['params_classifier'])['value'].agg(['mean', 'std'])

In this run, the search quickly identified GBM as the best-performing model, so it explored that model's hyperparameter space more than the others'.

# Optuna Plotting


* **Plot Optimization History**
This plot helps you visualize how the objective (such as accuracy) has changed across different trials. You can use this plot to see how the optimization process progresses and to spot trends or improvements over time.

* **Plot Contour** for Specific Parameters
A contour plot shows how pairs of parameters jointly relate to the objective value. It is useful for visualizing how combinations of parameters affect performance.

* **Plot Slice** for Specific Parameters
A slice plot shows the objective value of every trial against a single hyperparameter, one panel per parameter. The other parameters are not held fixed, so it gives a marginal view of each parameter's effect on the model's performance.

* **Plot Parameter Importances**
This plot shows the relative importance of different hyperparameters in the optimization process. You can use it to quickly identify which parameters have the most significant impact on performance.

* **Plot Parallel Coordinates** for Selected Parameters
A parallel coordinate plot visualizes how each trial performed by showing the relationships between multiple hyperparameters and their impact on the objective. It helps in spotting patterns and identifying successful configurations.

In [None]:
import optuna  # Import Optuna for hyperparameter optimization visualization

# Step 1: Plot the optimization history
# Create a figure showing the optimization history of the study
fig = optuna.visualization.matplotlib.plot_optimization_history(study)

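# Step 2: Plot contour for specific parameters
# Create a contour plot for the relationship between selected parameters
# NOTE: the names in `params` must be parameters of `study`; the ones used in
# Steps 2, 3, and 5 appear to come from a neural-network study, not the one above
optuna.visualization.matplotlib.plot_contour(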
    study,
    params=["num_conv_layers", "num_dense_layers", "optimizer_name", 'units'],
)

# Step 3: Plot the slice of the study
# Create a slice plot to visualize how the objective value changes with the selected parameters
optuna.visualization.matplotlib.plot_slice(
    study,
    params=["num_conv_layers", "num_dense_layers", "optimizer_name", 'units'],
)

# Step 4: Plot parameter importances
# Create a plot to visualize the importance of each hyperparameter in the study
optuna.visualization.matplotlib.plot_param_importances(study)

# Step 5: Plot parallel coordinate for selected parameters
# Create a parallel coordinate plot to visualize the relationship between parameters and objective value
optuna.visualization.matplotlib.plot_parallel_coordinate(
    study,
    params=["num_conv_layers", "num_dense_layers", "optimizer_name", 'units'],
)

**For better overview of how to plot**

* [Plotting](https://github.com/solegalli/hyperparameter-optimization/blob/master/Section-12-Optuna/05-Evaluating-the-search.ipynb)

# Successive Halving with Optuna


In [None]:
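import numpy as np  # Import NumPy for rounding predictions
import optuna  # Import Optuna for hyperparameter optimization
import xgboost as xgb  # Import the XGBoost library for gradient boosting
from sklearn.metrics import accuracy_score  # Import the accuracy metric

# dtrain and dtest are assumed to be xgb.DMatrix objects, and y_test the test labels,
# all defined earlier in the notebook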

def objective(trial):
    # Step 1: Define the hyperparameter space
    # Set up initial hyperparameters for the XGBoost model
    param = {
        "verbosity": 0,  # Suppress output messages
        "objective": "binary:logistic",  # Specify binary logistic regression for binary classification
        "eval_metric": "auc",  # Use AUC as the evaluation metric
        "booster": trial.suggest_categorical("booster", ["gbtree", "gblinear", "dart"]),  # Choose the type of booster
        "lambda": trial.suggest_float("lambda", 1e-8, 1.0, log=True),  # L2 regularization term
        "alpha": trial.suggest_float("alpha", 1e-8, 1.0, log=True),  # L1 regularization term
    }

    # Step 2: Conditional hyperparameter space
    # Additional hyperparameters that depend on the booster type
    if param["booster"] == "gbtree" or param["booster"] == "dart":
        param["max_depth"] = trial.suggest_int("max_depth", 1, 9)  # Maximum depth of the trees
        param["eta"] = trial.suggest_float("eta", 1e-8, 1.0, log=True)  # Learning rate
        param["gamma"] = trial.suggest_float("gamma", 1e-8, 1.0, log=True)  # Minimum loss reduction
        param["grow_policy"] = trial.suggest_categorical("grow_policy", ["depthwise", "lossguide"])  # Growth policy for tree construction

    if param["booster"] == "dart":
        param["sample_type"] = trial.suggest_categorical("sample_type", ["uniform", "weighted"])  # Sampling type for DART
        param["normalize_type"] = trial.suggest_categorical("normalize_type", ["tree", "forest"])  # Normalization type for DART
        param["rate_drop"] = trial.suggest_float("rate_drop", 1e-8, 1.0, log=True)  # Dropout rate for DART
        param["skip_drop"] = trial.suggest_float("skip_drop", 1e-8, 1.0, log=True)  # Skip dropout rate for DART

    # Step 3: Add pruning callback
    # Set up pruning callback to stop unpromising trials early based on validation AUC
    # (recent Optuna versions ship this callback in the separate optuna-integration package)
    pruning_callback = optuna.integration.XGBoostPruningCallback(trial, "validation-auc")

    # Step 4: Train the model
    # Train the XGBoost model with the specified parameters and evaluation dataset
    bst = xgb.train(param, dtrain, evals=[(dtest, "validation")], callbacks=[pruning_callback])

    # Step 5: Evaluate the model
    # Make predictions and calculate accuracy
    preds = bst.predict(dtest)  # Get predictions for the test dataset
    pred_labels = np.rint(preds)  # Round predictions to get binary labels
    accuracy = accuracy_score(y_test, pred_labels)  # Calculate accuracy score

    return accuracy  # Return accuracy for optimization

# Step 6: Create a study for optimization
# Initialize an Optuna study to maximize accuracy of the model
study = optuna.create_study(
    sampler=optuna.samplers.RandomSampler(),  # Use random sampling for hyperparameter selection
    pruner=optuna.pruners.SuccessiveHalvingPruner(  # Set up a pruning strategy for trials
        min_resource=1,  # Minimum validation rounds before stopping
        reduction_factor=3,  # Factor by which to reduce resources
        bootstrap_count=0,  # Minimum number of trials to complete before promoting
    ),
    direction="maximize",  # Objective is to maximize accuracy
)

# Step 7: Optimize the objective function
# Run the optimization process for a specified number of trials
study.optimize(
    objective,
    n_trials=30,  # Perform 30 optimization trials
)
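
To see what `min_resource=1` and `reduction_factor=3` imply, the following sketch simulates a simplified, synchronous version of successive halving in plain Python. It is not Optuna's implementation: Optuna's pruner is asynchronous and decides trial by trial as values are reported. The learning curves in `curves` are hypothetical.

```python
def rung_steps(min_resource, reduction_factor, max_step):
    """Steps at which a successive-halving rung is evaluated."""
    steps = []
    step = min_resource
    while step <= max_step:
        steps.append(step)
        step *= reduction_factor
    return steps


def successive_halving(scores_by_step, min_resource=1, reduction_factor=3):
    """Simplified synchronous successive halving for a maximization problem.

    scores_by_step maps a trial name to its list of scores, one per step
    (index 0 is step 1). Returns the trials that survive the last rung.
    """
    survivors = list(scores_by_step)
    max_step = min(len(scores) for scores in scores_by_step.values())
    for step in rung_steps(min_resource, reduction_factor, max_step):
        # Rank the survivors by their score at this rung
        ranked = sorted(
            survivors,
            key=lambda name: scores_by_step[name][step - 1],
            reverse=True,
        )
        # Keep the top 1/reduction_factor, but never fewer than one trial
        keep = max(1, len(ranked) // reduction_factor)
        survivors = ranked[:keep]
    return survivors


# Hypothetical validation AUC curves for 9 trials over 9 boosting rounds
curves = {
    f"trial_{i}": [0.50 + 0.04 * i + 0.005 * t for t in range(1, 10)]
    for i in range(9)
}

print(rung_steps(1, 3, 9))  # [1, 3, 9]
print(successive_halving(curves))  # ['trial_8']
```

With 9 trials, the first rung (step 1) keeps 3, the second rung (step 3) keeps 1, and only that trial uses the full budget.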

**Additional notes**

* Conditional dependencies: Which hyperparameters are defined depends on the choice of boosting method (booster). For instance, if you choose "gblinear", you won't need to set parameters like max_depth or eta, as they are irrelevant for linear models. The search space therefore adapts to the chosen booster.
* Structuring the hyperparameter space this way ensures that only the hyperparameters relevant to the chosen booster are suggested in each trial, which keeps the search space smaller and avoids spending trials on parameters that have no effect.

**Nested Hyperparameters**
* The first if block is applied to both gbtree and dart. These two boosters share certain hyperparameters, such as max_depth, eta, gamma, and grow_policy.
* The second if block specifically targets only dart because dart has additional hyperparameters (sample_type, normalize_type, rate_drop, and skip_drop) that are not relevant to gbtree.

If you removed the "or" condition from the first if block, dart trials would no longer have the shared hyperparameters (max_depth, eta, etc.) tuned, even though dart is tree-based and uses them as well.

# Hyperband with Optuna


In [None]:
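import optuna  # Import Optuna for hyperparameter optimization
import xgboost as xgb  # Import XGBoost for the model

# `objective` is the function defined in the Successive Halving section above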

# Step 1: Create an Optuna study
# Initialize a study for optimizing hyperparameters
study = optuna.create_study(

    # Step 2: Define the sampling method for hyperparameters
    # RandomSampler randomly samples hyperparameters
    sampler=optuna.samplers.RandomSampler(),

    # Step 3: Use a pruner to stop unpromising trials early
    # HyperbandPruner is used to prune less promising configurations during training
    pruner=optuna.pruners.HyperbandPruner(

        # Step 4: Minimum resource
        # Minimum number of validation rounds required before pruning
        min_resource=1,

        # Step 5: Maximum resource
        # The maximum budget per trial, i.e. the total number of validation rounds;
        # it should match the number of boosting rounds the objective actually runs
        max_resource=81,

        # Step 6: Reduction factor
        # Determines how many configurations will be promoted to the next round
        reduction_factor=3,
    ),

    # Step 7: Direction of optimization
    # We aim to maximize the objective value (e.g., accuracy)
    direction="maximize",
)

# Step 8: Optimize the objective function
# Perform optimization, limiting the number of trials (hyperparameter configurations)
study.optimize(
    objective,

    # Step 9: Number of trials
    # The number of hyperparameter configurations to try
    n_trials=50,
)

### Specific parameters description

In [None]:
# Step 4: Minimum resource
# This defines the smallest amount of resources (e.g., training steps, epochs, or validation rounds)
# that must be used before the pruner considers stopping a trial. Setting it to 1 means that
# every trial will run at least one validation round before any pruning decisions are made.
min_resource=1,

# Step 5: Maximum resource
# This sets the maximum budget allocated to a single trial, often in terms of the number of validation rounds
# or epochs. In this case, the pruner assumes that a trial runs for up to 81 validation rounds (or units of training).
# Trials may be stopped early based on performance. The pruner does not enforce this limit itself,
# so the value should match the number of rounds the objective actually runs.
max_resource=81,

# Step 6: Reduction factor
# This controls how aggressively trials are pruned. A reduction factor of 3 means that
# after each stage, only about one-third of the remaining configurations will continue to the next round.
# The rest will be pruned (stopped) to focus the computational resources on the more promising trials.
reduction_factor=3,

# Bootstrap count (set in the SuccessiveHalvingPruner example earlier)
# This parameter defines the minimum number of trials that must complete a stage (rung)
# before any trials are promoted to the next stage. Setting `bootstrap_count=0` allows
# trials to be promoted as soon as they finish without waiting for others.
# This speeds up the optimization process but might lead to premature promotions based on limited data.
bootstrap_count=0,
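
As a rough illustration of how `min_resource`, `max_resource`, and `reduction_factor` interact, the sketch below lists the steps at which pruning decisions would fall in each Hyperband bracket. It is plain Python written for these notes and is only assumed to approximate what `HyperbandPruner` does internally (several successive-halving brackets, each starting to prune later than the previous one). The helper name `hyperband_brackets` is hypothetical.

```python
def hyperband_brackets(min_resource, max_resource, reduction_factor):
    """Map each bracket index to the steps where pruning decisions are made."""
    # All rung steps between min_resource and max_resource: 1, 3, 9, ...
    steps = []
    step = min_resource
    while step <= max_resource:
        steps.append(step)
        step *= reduction_factor
    # Bracket s skips the first s rungs, so it prunes later and less aggressively
    return {bracket: steps[bracket:] for bracket in range(len(steps))}


brackets = hyperband_brackets(min_resource=1, max_resource=81, reduction_factor=3)
for bracket, steps in brackets.items():
    print(bracket, steps)
# 0 [1, 3, 9, 27, 81]
# 1 [3, 9, 27, 81]
# 2 [9, 27, 81]
# 3 [27, 81]
# 4 [81]
```

Under these assumptions, the settings above give 5 brackets: bracket 0 behaves like the most aggressive successive halving, while bracket 4 lets trials run to the full budget before comparing them.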