# Module 2 - Optuna for Hyperparameter Tuning
In this section we discuss how to use [Optuna](https://optuna.org/) to optimise the hyperparameters of our models. Optuna also has an integration with MLflow, so we will see how the two work together to log experiments too!

## Why Optuna?

Optuna is widely regarded as one of the leading tools for hyperparameter tuning. It stands out from the likes of sklearn, skopt and hyperopt because it plugs gaps that these have (such as visualisations, integrations, Bayesian search etc). Optuna's API is very similar to hyperopt's, so if you want to make the switch it should be fairly straightforward.

Optuna is built on 5 pillars that make it stand out as a hyperparameter optimisation library:

1. Lightweight, versatile, and platform agnostic architecture
    - Handle a wide variety of tasks with a simple installation that has few requirements.
2. Pythonic search spaces
    - Define search spaces using familiar Python syntax including conditionals and loops.
3. Efficient optimization algorithms
    - Adopt state-of-the-art algorithms for sampling hyperparameters and efficiently pruning unpromising trials.
4. Easy parallelization
    - Scale studies to tens or hundreds of workers with little or no changes to the code.
5. Quick visualization
    - Inspect optimization histories from a variety of plotting functions.

In [1]:
import sys

sys.path.append("/home/ubuntu/sh-mlops-zoomcamp/mlops_jupyter_book")
from utils.utils import ROOT_DIR, render_itable, init_jb_table_style
from itables import init_notebook_mode
import optuna
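import sklearn

# Import the submodules explicitly so that later calls such as
# sklearn.model_selection.train_test_split do not rely on side-effect imports.
import sklearn.datasets
import sklearn.linear_model
import sklearn.metrics
import sklearn.model_selection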
from plotly.subplots import make_subplots
import plotly.graph_objects as go
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.datasets import fetch_california_housing
import os

init_jb_table_style()
init_notebook_mode(all_interactive=True, connected=True)

## Ok Sounds Great, How Does it Work?
Optuna is built around 2 things that we define ourselves: an `Objective` and a `Study`.

### Objective
Think of the objective as what we are trying to solve for. Usually this means minimising or maximising a value (log loss, RMSE etc.). The objective function returns that value, and it is also where we define the search space for each of our hyperparameters:
- Objective returns the value we want to maximise / minimise
- Objective also defines the search space for all of our parameters

### Study
A study is where we put our objective into practice: it runs _n_ `Trials` of the objective function, each trial trying new hyperparameters to find the best ones for our objective.

The larger the number of trials, the longer the optimisation will take, but the more likely you are to have found the best combination (given the search spaces defined in the objective!).

## Different Search Spaces in Optuna

Below is a quick list of the different search spaces in Optuna and what they represent:

If your objective has a `trial` parameter, you can use the `suggest` functions:
- `suggest_int(name, low, high, step=1, log=False)` -> Random integer between low and high, step and log optional
- `suggest_categorical(name: str, choices: Sequence[None | bool | int | float | str])` -> Random category from the provided values. Choices can be strings or numbers
- `suggest_float(name, low, high, *, step=None, log=False)` -> Random float between low and high, step and log optional

If you are using Optuna with one of its integrations, or there is no `trial` object available, you can pass its distributions to the integration instead:
- `optuna.distributions.IntDistribution(low, high, log=False, step=1)`
- `optuna.distributions.CategoricalDistribution(choices)`
- `optuna.distributions.FloatDistribution(low, high, log=False, step=None)`


In [2]:
# Define an objective function to be minimized.
def objective(trial):
    # Invoke suggest methods of a Trial object to generate hyperparameters.
    regressor_name = trial.suggest_categorical("regressor", ["SVR", "RandomForest"])
    if regressor_name == "SVR":
        svr_c = trial.suggest_float("svr_c", 1e-2, 1e2, log=True)
        # svr_kernel = trial.suggest_categorical("kernel", ["linear", "poly", "rbf"])
        regressor_obj = SVR(C=svr_c)
    else:
        rf_max_depth = trial.suggest_int("rf_max_depth", 2, 16)
        regressor_obj = RandomForestRegressor(max_depth=rf_max_depth)

    X, y = fetch_california_housing(return_X_y=True)
    X_train, X_val, y_train, y_val = sklearn.model_selection.train_test_split(X, y, random_state=0)

    regressor_obj.fit(X_train, y_train)
    y_pred = regressor_obj.predict(X_val)

    error = sklearn.metrics.mean_squared_error(y_val, y_pred)

    return error  # An objective value linked with the Trial object.


study = optuna.create_study()  # Create a new study.
study.optimize(
    objective, n_trials=10, show_progress_bar=True, n_jobs=-1
)  # Invoke optimization of the objective function.

[I 2023-06-05 00:16:12,510] A new study created in memory with name: no-name-bbc7c957-74af-4ed8-8508-05f0261f08a6


  0%|          | 0/10 [00:00<?, ?it/s]

[I 2023-06-05 00:16:19,314] Trial 0 finished with value: 0.33047156633286323 and parameters: {'regressor': 'RandomForest', 'rf_max_depth': 9}. Best is trial 0 with value: 0.33047156633286323.
[I 2023-06-05 00:16:26,096] Trial 4 finished with value: 0.3307464950107637 and parameters: {'regressor': 'RandomForest', 'rf_max_depth': 9}. Best is trial 0 with value: 0.33047156633286323.
[I 2023-06-05 00:16:31,725] Trial 3 finished with value: 1.3652234231503118 and parameters: {'regressor': 'SVR', 'svr_c': 0.38110068482016557}. Best is trial 0 with value: 0.33047156633286323.
[I 2023-06-05 00:16:31,955] Trial 1 finished with value: 1.2776235719424631 and parameters: {'regressor': 'SVR', 'svr_c': 3.30629516406425}. Best is trial 0 with value: 0.33047156633286323.
[I 2023-06-05 00:16:32,632] Trial 2 finished with value: 0.9628589165378122 and parameters: {'regressor': 'SVR', 'svr_c': 20.724974309108617}. Best is trial 0 with value: 0.33047156633286323.
[I 2023-06-05 00:16:34,246] Trial 5 finish

In [3]:
print(f" Best params: {study.best_params} \n Best_value: {study.best_value} ")

 Best params: {'regressor': 'RandomForest', 'rf_max_depth': 13} 
 Best_value: 0.28476742790597126 


## Using Optuna with MLflow
Optuna has integrations with a [bunch of different 3rd party libraries](https://optuna.readthedocs.io/en/stable/reference/integration.html) such as sklearn, MLflow, W&B etc.
The MLflow integration allows you to log every trial you run to your MLflow server.

In [4]:
# Set the tracking URI (the database to use) and the experiment name.
# If the experiment already exists we log to it; if not, a new experiment is created.
import mlflow
from optuna.integration.mlflow import MLflowCallback

experiment_name = "optuna_test"
mlflow.set_tracking_uri("sqlite:///mlflow.db")
mlflow.set_experiment(experiment_name)

env: MLFLOW_TRACKING_URI=sqlite:///mlflow.db


<Experiment: artifact_location='/home/ubuntu/sh-mlops-zoomcamp/mlruns/4', creation_time=1685826086193, experiment_id='4', last_update_time=1685826086193, lifecycle_stage='active', name='optuna_test', tags={}>

In [5]:
mlflc = MLflowCallback(
    tracking_uri=os.environ.get("MLFLOW_TRACKING_URI"),
    metric_name="mean_squared_error",
)


MLflowCallback is experimental (supported from v1.4.0). The interface can change in the future.



In [6]:
# Define an objective function to be minimized and track it in MLflow
@mlflc.track_in_mlflow()
def objective(trial):
    # Invoke suggest methods of a Trial object to generate hyperparameters.
    regressor_name = trial.suggest_categorical("regressor", ["SVR", "RandomForest"])
    if regressor_name == "SVR":
        svr_c = trial.suggest_float("svr_c", 1e-2, 1e2, log=True)
        regressor_obj = SVR(C=svr_c)
    else:
        rf_max_depth = trial.suggest_int("rf_max_depth", 2, 32)
        regressor_obj = RandomForestRegressor(max_depth=rf_max_depth)

    X, y = fetch_california_housing(return_X_y=True)
    X_train, X_val, y_train, y_val = sklearn.model_selection.train_test_split(X, y, random_state=0)
    mlflow.log_param("model", regressor_name)
    mlflow.log_params(regressor_obj.get_params(deep=True))
    regressor_obj.fit(X_train, y_train)
    y_pred = regressor_obj.predict(X_val)

    error = sklearn.metrics.mean_squared_error(y_val, y_pred)
    mlflow.log_metric("mse", error)
    mlflow.sklearn.log_model(regressor_obj, "model")

    return error  # An objective value linked with the Trial object.


study = optuna.create_study(
    study_name=experiment_name, pruner=optuna.pruners.HyperbandPruner(), direction="minimize"
)  # Create a new study. Note: a pruner only takes effect if the objective calls trial.report().
study.optimize(
    objective, n_trials=20, show_progress_bar=True, n_jobs=-1, callbacks=[mlflc]
)  # Add the mlflow callback to the study.


track_in_mlflow is experimental (supported from v2.9.0). The interface can change in the future.

[I 2023-06-05 00:16:52,272] A new study created in memory with name: optuna_test


  0%|          | 0/20 [00:00<?, ?it/s]


Setuptools is replacing distutils.



[I 2023-06-05 00:17:13,595] Trial 0 finished with value: 0.9964742184325133 and parameters: {'regressor': 'SVR', 'svr_c': 18.214176515999654}. Best is trial 0 with value: 0.9964742184325133.
[I 2023-06-05 00:17:19,558] Trial 1 finished with value: 0.47999287347127256 and parameters: {'regressor': 'RandomForest', 'rf_max_depth': 5}. Best is trial 1 with value: 0.47999287347127256.
[I 2023-06-05 00:17:24,836] Trial 2 finished with value: 0.5371519934604444 and parameters: {'regressor': 'RandomForest', 'rf_max_depth': 4}. Best is trial 1 with value: 0.47999287347127256.
[I 2023-06-05 00:17:38,615] Trial 3 finished with value: 0.27155540343253526 and parameters: {'regressor': 'RandomForest', 'rf_max_depth': 28}. Best is trial 3 with value: 0.27155540343253526.
[I 2023-06-05 00:17:58,842] Trial 4 finished with value: 1.3799042769133372 and parameters: {'regressor': 'SVR', 'svr_c': 0.040151985047117054}. Best is trial 3 with value: 0.27155540343253526.
[I 2023-06-05 00:18:12,687] Trial 5 fin

```{admonition} Accessing MLflow Models
:class: tip
After you have finished an experiment, you can extract all of the results into a dataframe for easy viewing!
The `mlflow.search_runs()` function returns the runs as a pandas DataFrame, which you can then use to find specific run IDs
to use in later parts of your code.
```

In [7]:
df = mlflow.search_runs()
# Pick the run with the lowest logged MSE.
best_model_run_id = df.loc[df["metrics.mse"].idxmin(), "run_id"]
print(f"best model run id = {best_model_run_id}")

best model run id = 50407ff6ae594a87b5f2081254fda3db


In [68]:
# Inference after loading the logged model
model_uri = "runs:/{}/model".format(best_model_run_id)
loaded_model = mlflow.sklearn.load_model(model_uri)

X, y = fetch_california_housing(return_X_y=True)
loaded_model.predict(X)[0]

4.4283009000000035

# Additional Optuna Features

## Pruning
[Pruning](https://optuna.readthedocs.io/en/v2.0.0/tutorial/pruning.html) is a fantastic feature in Optuna that stops a trial early if its set of hyperparameters seems to be far worse than the best configuration we have already found. It works by assessing the model's performance at each training step and comparing it to the performance of our previous trials at the same step. This is a great way to cut unpromising trials short and speed up the hyperparameter search.

This works for any model that learns incrementally and exposes those incremental steps. Examples are:
- SGD regressors / classifiers
- XGBoost
- PyTorch
- CatBoost etc.

__NOTE__: You can implement pruning for other models such as Random Forest, Linear Regression etc. in sklearn, but there is no partial observability: you only know whether the trial should be pruned after the trial has finished, which defeats the purpose.
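
The comparison described above can be sketched in plain Python. This is a deliberately simplified version of a median-based rule, not Optuna's implementation: the function name `should_prune_median`, its arguments, and the numbers in `history` are made up, and Optuna's own pruners add options such as warm-up steps and check intervals.

```python
from statistics import median


def should_prune_median(current_value, previous_values_at_step, n_startup_trials=5):
    """Return True if a minimised trial looks worse than the median so far.

    previous_values_at_step holds the intermediate values that earlier
    trials reported at the same step.
    """
    # Not enough history yet: never prune.
    if len(previous_values_at_step) < n_startup_trials:
        return False
    return current_value > median(previous_values_at_step)


history = [0.30, 0.25, 0.40, 0.35, 0.20]  # made-up validation errors at one step
print(should_prune_median(0.50, history))  # worse than the median of 0.30
print(should_prune_median(0.10, history))  # better than the median
```

The rule needs a value at every step, which is why models without intermediate results cannot benefit from it.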

In [61]:
# Define an objective function to be minimized.
def objective(trial):
    iris = sklearn.datasets.load_iris()
    classes = list(set(iris.target))
    train_x, valid_x, train_y, valid_y = sklearn.model_selection.train_test_split(
        iris.data, iris.target, test_size=0.25, random_state=0
    )

    alpha = trial.suggest_float("alpha", 1e-5, 1e-1)
    loss = trial.suggest_categorical("loss", ["hinge", "log_loss", "perceptron", "huber"])
    clf = sklearn.linear_model.SGDClassifier(alpha=alpha, loss=loss, random_state=42)

    for step in range(100):
        clf.partial_fit(train_x, train_y, classes=classes)

        # Report intermediate objective value.
        intermediate_value = 1.0 - clf.score(valid_x, valid_y)
        trial.report(intermediate_value, step)

        # Handle pruning based on the intermediate value.
        if trial.should_prune():
            raise optuna.TrialPruned()

    return 1.0 - clf.score(valid_x, valid_y)


sampler = optuna.samplers.TPESampler(seed=10)
study = optuna.create_study(
    direction="minimize",
    sampler=sampler,
    pruner=optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=5, interval_steps=3),
)  # Create a new study.
study.optimize(
    objective,
    n_trials=20,
    show_progress_bar=True,
    n_jobs=-1,
)  # Invoke optimization of the objective function.

[I 2023-06-05 00:28:57,657] A new study created in memory with name: no-name-1b81733e-dcef-43ac-a6dd-0b26a733f811


  0%|          | 0/20 [00:00<?, ?it/s]

[I 2023-06-05 00:28:58,723] Trial 1 finished with value: 0.26315789473684215 and parameters: {'alpha': 0.009436863669364146, 'loss': 'perceptron'}. Best is trial 1 with value: 0.26315789473684215.
[I 2023-06-05 00:28:59,087] Trial 3 finished with value: 0.42105263157894735 and parameters: {'alpha': 0.06635642546476274, 'loss': 'huber'}. Best is trial 1 with value: 0.26315789473684215.
[I 2023-06-05 00:28:59,206] Trial 2 finished with value: 0.3421052631578947 and parameters: {'alpha': 0.05662170987898821, 'loss': 'log_loss'}. Best is trial 1 with value: 0.26315789473684215.
[I 2023-06-05 00:28:59,417] Trial 0 finished with value: 0.13157894736842102 and parameters: {'alpha': 0.007637790576813233, 'loss': 'log_loss'}. Best is trial 0 with value: 0.13157894736842102.
[I 2023-06-05 00:28:59,719] Trial 4 finished with value: 0.10526315789473684 and parameters: {'alpha': 0.027904362274394405, 'loss': 'perceptron'}. Best is trial 4 with value: 0.10526315789473684.
[I 2023-06-05 00:28:59,734]

In [62]:
# Calculating the pruned and completed trials
pruned_trials = [t for t in study.trials if t.state == optuna.trial.TrialState.PRUNED]
complete_trials = [t for t in study.trials if t.state == optuna.trial.TrialState.COMPLETE]

print("  Number of finished trials: ", len(study.trials))
print("  Number of pruned trials: ", len(pruned_trials))
print("  Number of complete trials: ", len(complete_trials))

  Number of finished trials:  20
  Number of pruned trials:  9
  Number of complete trials:  11


## Visualisations
When you complete a study using Optuna, you get a boatload of really useful plots that can both tell you about your current experiment and help you fine-tune future ones.
Below is a (non-exhaustive) table of useful plots that you get for free just by running a study:


| Name      | Use                                       | Function Name                             |
|-----------|-------------------------------------------|-------------------------------------------|
|Contour Plot               |Creates a 2D contour map to visualise hyperparameters Vs objective|optuna.visualization.plot_contour|
|Hyperparameter Importances |Bar chart showing normalised hyperparameter importances|optuna.visualization.plot_param_importances|
|Parallel Coordinates       |Coordinate plot showing the lineage of hyperparameters to the final objective value|optuna.visualization.plot_parallel_coordinate|
|Optimisation History       |History of the objective function value for each trial run|optuna.visualization.plot_optimization_history|
|Intermediate Values        |Intermediate values for each trial (only works with partial fit models)|optuna.visualization.plot_intermediate_values|

If you want to see the full list of available plots, it can be found [here](https://optuna.readthedocs.io/en/stable/reference/visualization/index.html).

Below is an example of 4 of the plots that can be rendered right out of the box using Optuna:

In [87]:
subplots = make_subplots(
    rows=2,
    cols=2,
    subplot_titles=(
        "Contour Plot of Alpha Vs Loss",
        "Parameter Importance",
        "Optimisation History",
        "Optimisation Pruning Plot",
    ),
)
contour = optuna.visualization.plot_contour(study, params=["alpha", "loss"]).data
param_importances = optuna.visualization.plot_param_importances(study).data
optim_hist = optuna.visualization.plot_optimization_history(study).data
intermediate_vals = optuna.visualization.plot_intermediate_values(study).data

subplots.add_traces(contour, 1, 1)
subplots.add_traces(param_importances, 1, 2)
subplots.add_traces(optim_hist, 2, 1)
subplots.add_traces(intermediate_vals, 2, 2)
subplots.update_layout(showlegend=False)
subplots.show()

## Further Reading

Some further reading in case you are interested in Optuna and want to explore a bit more :)
1. [What is Optuna and Examples YouTube Video](https://www.youtube.com/watch?v=P6NwZVl8ttc&t=997s&ab_channel=PyTorch)
2. [Optuna Examples Including Integrations](https://github.com/optuna/optuna-examples)
3. [Optuna Dashboard for Visualisations](https://optuna.org/#dashboard)
