# ML Experiments HPO usage

We assume that you are already familiar with the base usage. If not, please check the `base_usage.ipynb` notebook first.

Using `ml_experiments` for hyperparameter optimization (HPO) is very similar to the base usage. The main difference is that we use the `HPOExperiment` class instead of the `BaseExperiment` class. This class automatically handles many parts of the HPO process by leveraging the `optuna` library. As in the base usage, we need to define a class that inherits from `HPOExperiment`. The minimal implementation is as follows:

```python
class MyHPOExperiment(HPOExperiment):
    def _get_combinations_names(self) -> list[str]:
        raise NotImplementedError

    def _get_extra_params(self):
        raise NotImplementedError

    def _load_data(self, combination: dict, unique_params: dict, extra_params: dict, mlflow_run_id: str | None = None, **kwargs):
        raise NotImplementedError

    def training_fn(self, trial_dict: dict, combination: dict, unique_params: dict, extra_params: dict, mlflow_run_id: str | None = None, **kwargs) -> dict:
        raise NotImplementedError

    def get_search_space(self, combination: dict, unique_params: dict, extra_params: dict, mlflow_run_id: str | None = None, **kwargs) -> dict:
        raise NotImplementedError

    def get_default_values(self, combination: dict, unique_params: dict, extra_params: dict, mlflow_run_id: str | None = None, **kwargs) -> list:
        raise NotImplementedError
```

Note that these are not the same methods required by `BaseExperiment`. Behind the scenes, `HPOExperiment` inherits from `BaseExperiment` and already implements many of the methods needed for the base usage. The methods `_get_combinations_names`, `_get_extra_params` and `_load_data` are the same as in the base usage, so let's focus on the other methods.

The `training_fn` method is the core of the HPO process: it is the function called for each trial of the `optuna` optimization to train the model. It should return a dictionary that includes the metrics we want to optimize. Let's illustrate by continuing with the previous example of training a `GradientBoostingClassifier` on the iris dataset. Imagine that we want to optimize the `learning_rate` parameter of the model by maximizing the `accuracy` metric, using a bagging strategy in which we repeatedly split the dataset into train and test sets and average the results. We implement this logic in the `training_fn` method as follows:

(Note that this is only a simplified example to showcase how to use the `HPOExperiment` class. In practice we probably do not want to use a bagging strategy like this because of the data leakage problem.)

```python
def training_fn(self, trial_dict: dict, combination: dict, unique_params: dict, extra_params: dict, mlflow_run_id: str | None = None, **kwargs) -> dict:
    n_bagging = combination["n_bagging"]  # `n_bagging` is declared as a combination
    trial = trial_dict["trial"]
    learning_rate = trial.params["learning_rate"]
    metrics = []
    for i in range(n_bagging):
        base_experiment = ClassificationExperiment(
            seed=i,
            n_estimators=100,
            learning_rate=learning_rate,
            model_verbose=0
        )
        result = base_experiment.run(return_results=True)
        metrics.append(result[0]["evaluate_model_return"]["accuracy"])
    return {"accuracy": np.mean(metrics)}
```
Note that we are using the `ClassificationExperiment` defined in the previous example. We are also assuming that `n_bagging` is a combination declared in the `_get_combinations_names` method:

```python
def _get_combinations_names(self) -> list[str]:
    return ["n_bagging"]
```

The `get_search_space` method defines the search space for the hyperparameters that we want to optimize. It should return a dictionary whose keys are the parameter names and whose values are the `optuna.distributions` distributions to use for each parameter. For example, if we want to optimize the `learning_rate` parameter of the `GradientBoostingClassifier`, we can define the search space as follows:

```python
def get_search_space(self, combination: dict, unique_params: dict, extra_params: dict, mlflow_run_id: str | None = None, **kwargs) -> dict:
    return {
        "learning_rate": optuna.distributions.FloatDistribution(0.01, 0.5)
    }
```

The `get_default_values` method defines hyperparameter configurations that we want to evaluate before any others. It should return a list of dictionaries whose keys are the parameter names and whose values are the parameter values to try. For example, if we want to evaluate a `learning_rate` of 0.1 before others, we can define the default values as follows:

```python
def get_default_values(self, combination: dict, unique_params: dict, extra_params: dict, mlflow_run_id: str | None = None, **kwargs) -> list:
    return [
        {
            "learning_rate": 0.1
        }
    ]
```

In some cases, we might want to define a custom `get_trial_fn`. By default this function is defined as follows:

```python
def get_trial_fn(
        self, 
        study: Study,
        search_space: dict, 
        combination: dict,
        unique_params: dict,
        extra_params: dict,
        mlflow_run_id: Optional[str] = None,
        child_runs_ids: Optional[list] = None,
        **kwargs,
    ) -> dict:
        flatten_search_space = flatten_dict(search_space)
        trial = study.ask(flatten_search_space)
        trial_number = trial.number
        trial_key = "_".join([str(value) for value in combination.values()])
        trial_key = trial_key + f"-{trial_number}"  # unique key (trial number)
        child_run_id = child_runs_ids[trial_number] if child_runs_ids else None
        trial.set_user_attr('child_run_id', child_run_id)
        return dict(trial=trial, trial_key=trial_key, child_run_id=child_run_id)
```
The dictionary returned by this function is the `trial_dict` passed to the `training_fn` method.
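The two pure-Python steps of the default implementation can be sketched in isolation. The `flatten_dict_sketch` helper below is hypothetical: it assumes that `flatten_dict` joins nested keys with a separator, which may differ from the real helper in `ml_experiments`. The `make_trial_key` function mirrors the key construction shown above.

```python
def flatten_dict_sketch(nested: dict, parent_key: str = "", sep: str = "/") -> dict:
    """Flatten nested dictionaries by joining keys with `sep` (assumed behaviour)."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten_dict_sketch(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def make_trial_key(combination: dict, trial_number: int) -> str:
    """Join the combination values and append the trial number, as in get_trial_fn."""
    return "_".join(str(value) for value in combination.values()) + f"-{trial_number}"


# Tuples stand in for optuna distributions to keep the sketch dependency-free.
nested_space = {
    "model": {"learning_rate": (0.01, 0.5), "max_depth": (2, 8)},
    "n_bagging": (1, 5),
}
flat_space = flatten_dict_sketch(nested_space)
trial_key = make_trial_key({"seed": 0, "n_estimators": 100}, trial_number=3)

print(flat_space)
print(trial_key)
```

Because the key combines the combination values with the trial number, it stays unique across trials of the same combination.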

Note that by default we pass a `child_run_id` that can be used to log parameters and metrics to mlflow in a child run attached to the main mlflow run. This is useful for keeping track of the different trials and their results in a more organized way, but the user must manually log the parameters and metrics they want, for example:

```python
def training_fn(self, trial_dict: dict, combination: dict, unique_params: dict, extra_params: dict, mlflow_run_id: str | None = None, **kwargs) -> dict:
    n_bagging = combination["n_bagging"]  # `n_bagging` is declared as a combination
    trial = trial_dict["trial"]
    learning_rate = trial.params["learning_rate"]
    child_run_id = trial_dict["child_run_id"]

    if child_run_id is not None:
        mlflow.log_params(trial.params, run_id=child_run_id)

    metrics = []
    for i in range(n_bagging):
        base_experiment = ClassificationExperiment(
            seed=i,
            n_estimators=100,
            learning_rate=learning_rate,
            model_verbose=0
        )
        result = base_experiment.run(return_results=True)
        metrics.append(result[0]["evaluate_model_return"]["accuracy"])

    mean_metric = np.mean(metrics)
    if child_run_id is not None:
        mlflow.log_metric("accuracy", mean_metric, run_id=child_run_id)

    return {"accuracy": mean_metric}
```

Finally, it sometimes makes sense to define an HPO experiment that inherits from a base experiment, for example to share the same initialization logic or to couple some parameters of the HPO experiment to the base experiment. In this case, we should inherit first from the `HPOExperiment` class and then from the base experiment class. Otherwise, we would be using the methods defined in the base experiment class instead of those defined in the `HPOExperiment` class, which will not work as expected. For example, if we want to define a `HPOClassificationExperiment` that inherits from both `HPOExperiment` and `ClassificationExperiment`, we can do it as follows:

```python
class HPOClassificationExperiment(HPOExperiment, ClassificationExperiment):
    def __init__(
        self,
        *args,
        n_bagging: int = 5,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        self.n_bagging = n_bagging

    def _get_combinations_names(self) -> list[str]:
        combination_names = super()._get_combinations_names()
        combination_names.append("n_bagging")
        return combination_names

    def training_fn(self, trial_dict: dict, combination: dict, unique_params: dict, extra_params: dict, mlflow_run_id: str | None = None, **kwargs) -> dict:
        n_bagging = combination["n_bagging"]  # appended to the combination names above
        trial = trial_dict["trial"]
        learning_rate = trial.params["learning_rate"]
        n_estimators = combination["n_estimators"]
        child_run_id = trial_dict["child_run_id"]

        if child_run_id is not None:
            mlflow.log_params(trial.params, run_id=child_run_id)

        metrics = []
        for i in range(n_bagging):
            base_experiment = ClassificationExperiment(
                seed=i,
                n_estimators=n_estimators,
                learning_rate=learning_rate,
                model_verbose=0
            )
            result = base_experiment.run(return_results=True)
            metrics.append(result[0]["evaluate_model_return"]["accuracy"])

        mean_metric = np.mean(metrics)
        if child_run_id is not None:
            mlflow.log_metric("accuracy", mean_metric, run_id=child_run_id)

        return {"accuracy": mean_metric}

```
Note that the base experiment methods `_load_model`, `_get_metrics`, `_fit_model` and `_evaluate_model` are overridden by the `HPOExperiment` class. The combinations, however, are extended, so `seed` and `n_estimators` are still defined as combinations, and the HPO process is repeated for each pair of `seed` and `n_estimators` values. The combination's `seed` value is not used anywhere in `training_fn` (each bagging iteration uses `seed=i` instead), so this will produce identical results. Therefore we should either not pass several `seed` values or remove `seed` from the combination names. We could also redefine it as a unique parameter and iterate over its values in the `training_fn` method.

In any case, we should take care when inheriting from both the `HPOExperiment` and the base experiment class, as we might end up with unexpected results if we lose track of which methods are overridden and which are extended. Used carefully, this is a powerful feature that allows us to create more complex experiments by combining the base experiment logic with the HPO logic.
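The effect of the inheritance order follows from Python's method resolution order (MRO). The toy classes below are hypothetical stand-ins and do not reproduce the real `ml_experiments` classes; they only show which implementation wins for each ordering and how `super()` extends the combination names.

```python
class ToyBase:
    def fit(self) -> str:
        return "ToyBase.fit"

    def combination_names(self) -> list[str]:
        return []


class ToyClassification(ToyBase):
    def fit(self) -> str:
        return "ToyClassification.fit"

    def combination_names(self) -> list[str]:
        return super().combination_names() + ["seed", "n_estimators"]


class ToyHPO(ToyBase):
    def fit(self) -> str:
        return "ToyHPO.fit"


class HPOFirst(ToyHPO, ToyClassification):
    """HPO class listed first: its `fit` takes precedence."""

    def combination_names(self) -> list[str]:
        return super().combination_names() + ["n_bagging"]


class BaseFirst(ToyClassification, ToyHPO):
    """Base experiment listed first: the HPO `fit` is shadowed."""


print([cls.__name__ for cls in HPOFirst.__mro__])
print(HPOFirst().fit())
print(BaseFirst().fit())
print(HPOFirst().combination_names())
```

With the HPO class first, `fit` resolves to the HPO implementation while `combination_names` still reaches the classification class through `super()`, which is why the combinations end up extended and not replaced.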

As with `BaseExperiment`, some common parameters are already defined in the `HPOExperiment` class:

- `n_trials`: the number of trials to run in the HPO process.
- `timeout_hpo`: the maximum time in seconds to run the HPO process.
- `timeout_trial`: the maximum time in seconds to run each trial.
- `max_concurrent_trials`: the maximum number of concurrent trials to run if we are using a distributed setup (dask).
- `hpo_seed`: the seed to use for the HPO process. It is used to seed the `optuna` components (sampler, pruner, etc.).
- `sampler`: the sampler to use for the HPO process. It can be 'tpe', 'random' or 'grid', and defaults to 'tpe'.
- `pruner`: the pruner to use for the HPO process. It defaults to 'none' and can be set to 'hyperband' or 'sha'. In that case the user is expected to implement the interface between the model and `optuna` that performs the pruning.
- `hpo_metric`: the metric to optimize in the HPO process. It should be one of the metrics returned by the `training_fn` method.
- `direction`: the direction in which to optimize the metric. It can be 'minimize' or 'maximize', and defaults to 'minimize'.
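Since `direction` defaults to 'minimize', a metric such as accuracy needs `direction="maximize"` to be set explicitly. The sketch below illustrates the interplay between `hpo_metric` and `direction` on a list of hypothetical trial results. `select_best_trial` is an illustrative helper, not part of `ml_experiments`, and the numbers are made up for the example.

```python
def select_best_trial(trial_results: list[dict], hpo_metric: str, direction: str = "minimize") -> dict:
    """Return the result with the best `hpo_metric` for the given direction."""
    if direction not in {"minimize", "maximize"}:
        raise ValueError(f"Unknown direction: {direction!r}")
    chooser = max if direction == "maximize" else min
    return chooser(trial_results, key=lambda result: result[hpo_metric])


# Made-up results, one dictionary per trial.
trial_results = [
    {"learning_rate": 0.01, "accuracy": 0.90},
    {"learning_rate": 0.10, "accuracy": 0.97},
    {"learning_rate": 0.50, "accuracy": 0.93},
]

best_when_maximizing = select_best_trial(trial_results, "accuracy", "maximize")
best_when_minimizing = select_best_trial(trial_results, "accuracy")  # default direction

print(best_when_maximizing)
print(best_when_minimizing)
```

With the default direction, the trial with the lowest accuracy would be reported as the best one, which is rarely what we want for a score-like metric.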

Let's now illustrate the full implementation of the example:

```python
from ml_experiments import BaseExperiment
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score


class ClassificationExperiment(BaseExperiment):
    def __init__(
        self,
        *args,
        seed: int | list[int] = 42,
        n_estimators: int | list[int] = 100,
        learning_rate: float = 0.1,
        model_verbose: int = 1,
        **kwargs
    ):
        super().__init__(*args, **kwargs)
        self.seed = seed
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.model_verbose = model_verbose

    def _add_arguments_to_parser(self):
        self.parser.add_argument(
            "--seed",
            type=int,
            nargs="+",
            default=self.seed,
            help="Random seed for reproducibility.",
        )
        self.parser.add_argument(
            "--n_estimators",
            type=int,
            nargs="+",
            default=self.n_estimators,
            help="Number of estimators for the model.",
        )
        self.parser.add_argument(
            "--learning_rate",
            type=float,
            default=self.learning_rate,
            help="Learning rate for the model.",
        )
        # argparse rejects `type=int` combined with `action="store_true"`,
        # so the verbosity level is parsed as a plain integer.
        self.parser.add_argument(
            "--model_verbose",
            type=int,
            default=self.model_verbose,
            help="Verbosity level of the model training.",
        )

    def _unpack_parser(self):
        args = super()._unpack_parser()
        self.seed = args.seed
        self.n_estimators = args.n_estimators
        self.learning_rate = args.learning_rate
        self.model_verbose = args.model_verbose
        # Return the parsed arguments so that subclasses can extend this method.
        return args

    def _get_combinations_names(self) -> list[str]:
        return ["seed", "n_estimators"]

    def _get_unique_params(self):
        unique_params = super()._get_unique_params()
        unique_params.update(
            {
                "learning_rate": self.learning_rate,
            }
        )
        return unique_params

    def _get_extra_params(self):
        extra_params = super()._get_extra_params()
        extra_params.update(
            {
                "model_verbose": self.model_verbose,
            }
        )
        return extra_params

    def _load_data(
        self, combination: dict, unique_params: dict, extra_params: dict, mlflow_run_id: str | None = None, **kwargs
    ):
        # The split depends on the combination's seed.
        seed = combination["seed"]
        iris = load_iris()
        X, y = iris.data, iris.target
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=seed)
        return dict(
            X_train=X_train,
            X_test=X_test,
            y_train=y_train,
            y_test=y_test,
        )

    def _load_model(
        self, combination: dict, unique_params: dict, extra_params: dict, mlflow_run_id: str | None = None, **kwargs
    ):
        n_estimators = combination["n_estimators"]
        learning_rate = unique_params["learning_rate"]
        verbose = extra_params["model_verbose"]
        model = GradientBoostingClassifier(
            n_estimators=n_estimators,
            learning_rate=learning_rate,
            verbose=verbose,
        )
        return dict(model=model)

    def _get_metrics(
        self, combination: dict, unique_params: dict, extra_params: dict, mlflow_run_id: str | None = None, **kwargs
    ):
        return dict(accuracy=accuracy_score)

    def _fit_model(
        self, combination: dict, unique_params: dict, extra_params: dict, mlflow_run_id: str | None = None, **kwargs
    ):
        model = kwargs["load_model_return"]["model"]
        X_train = kwargs["load_data_return"]["X_train"]
        y_train = kwargs["load_data_return"]["y_train"]
        model.fit(X_train, y_train)
        return dict()

    def _evaluate_model(
        self, combination: dict, unique_params: dict, extra_params: dict, mlflow_run_id: str | None = None, **kwargs
    ):
        accuracy_fn = kwargs["get_metrics_return"]["accuracy"]
        model = kwargs["load_model_return"]["model"]
        X_test = kwargs["load_data_return"]["X_test"]
        y_test = kwargs["load_data_return"]["y_test"]
        y_pred = model.predict(X_test)
        accuracy = accuracy_fn(y_test, y_pred)
        return dict(accuracy=accuracy)
```

```python
from ml_experiments import HPOExperiment
import mlflow
import optuna
import numpy as np


class HPOClassificationExperiment(HPOExperiment):
    def __init__(
        self,
        *args,
        n_bagging: int | list[int] = 1,
        **kwargs
    ):
        super().__init__(*args, **kwargs)
        self.n_bagging = n_bagging

    def _get_combinations_names(self) -> list[str]:
        return ["n_bagging"]

    def _get_extra_params(self):
        return super()._get_extra_params()

    def _load_data(self, combination: dict, unique_params: dict, extra_params: dict, mlflow_run_id: str | None = None, **kwargs):
        # The data is loaded by the inner ClassificationExperiment runs.
        return dict()

    def training_fn(self, trial_dict: dict, combination: dict, unique_params: dict, extra_params: dict, mlflow_run_id: str | None = None, **kwargs) -> dict:
        n_bagging = combination["n_bagging"]
        trial = trial_dict["trial"]
        learning_rate = trial.params["learning_rate"]
        child_run_id = trial_dict["child_run_id"]

        if child_run_id is not None:
            mlflow.log_params(trial.params, run_id=child_run_id)

        # Train once per bagging iteration, each with a different split seed.
        metrics = []
        for i in range(n_bagging):
            base_experiment = ClassificationExperiment(
                seed=i,
                n_estimators=100,
                learning_rate=learning_rate,
                model_verbose=0,
                verbose=0,
            )
            result = base_experiment.run(return_results=True)
            metrics.append(result[0]["evaluate_model_return"]["accuracy"])

        mean_metric = np.mean(metrics)
        if child_run_id is not None:
            mlflow.log_metric("accuracy", mean_metric, run_id=child_run_id)

        return {"accuracy": mean_metric}

    def get_search_space(self, combination: dict, unique_params: dict, extra_params: dict, mlflow_run_id: str | None = None, **kwargs) -> dict:
        return {
            "learning_rate": optuna.distributions.FloatDistribution(0.01, 0.5)
        }

    def get_default_values(self, combination: dict, unique_params: dict, extra_params: dict, mlflow_run_id: str | None = None, **kwargs) -> list:
        return [
            {
                "learning_rate": 0.1,
            }
        ]
```

```python
# Run the HPO once per `n_bagging` value and keep the result of the first combination.
experiment = HPOClassificationExperiment(n_bagging=[5, 2], hpo_metric="accuracy", direction="maximize", n_trials=10, verbose=1)
result = experiment.run(return_results=True)[0]
```


```python
result
```

```text
{'work_dir': PosixPath('/home/belucci/code/ml_experiments/work/2f1a82f8c5'),
 'save_dir': None,
 'load_model_return': {'tuner': <ml_experiments.tuners.OptunaTuner at 0x754ff005acd0>},
 'max_memory_used_before_fit': 282.06,
 'fit_model_return': {'study': <optuna.study.study.Study at 0x754ff113ef50>,
  'elapsed_time': 9.52416181999979},
 'max_memory_used_after_fit': 284.62,
 'evaluate_model_return': {'best/accuracy': np.float64(0.9733333333333334),
  'best/value': 0.9733333333333334},
 'total_elapsed_time': 9.525035740000021,
 'combination': {'n_bagging': 5},
 'unique_params': {'timeout_fit': None,
  'timeout_combination': None,
  'hpo_framework': 'optuna',
  'n_trials': 10,
  'timeout_hpo': 0,
  'timeout_trial': 0,
  'max_concurrent_trials': 1,
  'hpo_seed': 0,
  'sampler': 'tpe',
  'pruner': 'none',
  'direction': 'maximize',
  'hpo_metric': 'accuracy'},
 'extra_params': {},
 'mlflow_run_id': None,
 'Finished': True}
```