# Benchmarking scikit-decide solvers

This notebook demonstrates how to run and compare scikit-decide solvers compatible with a given domain. 

This benchmark relies on [Ray Tune](https://docs.ray.io/en/latest/tune/index.html), a scalable Python library for experiment execution and hyperparameter tuning (including running experiments in parallel and logging results to TensorBoard).

Benchmarking matters because the most efficient solver can vary greatly from one domain to another.

## Create or load a domain

As an example, we will choose the Maze domain available in scikit-decide:

In [None]:
from skdecide import utils

MyDomain = utils.load_registered_domain("Maze")

## Select solvers to benchmark

We start by automatically detecting compatible solvers:

In [None]:
compatible_solvers = utils.match_solvers(MyDomain())
print(len(compatible_solvers), "compatible solvers:", compatible_solvers)

Optionally, filter out some of these solvers. Here we exclude the ones that we found, by trial and error, to run for too long in the cells below (and thus to block CPUs needed by other trials).

In [None]:
benchmark_solvers = [
    solver
    for solver in compatible_solvers
    if solver.__name__ not in ["AOstar", "ILAOstar", "MCTS", "POMCP", "UCT"]
]
print(len(benchmark_solvers), "solvers to benchmark:", benchmark_solvers)

## Define and run benchmark

First, customize the objective function to optimize (it will be used to rank the solutions found by the solvers). Here we choose *mean episode reward* to compare solvers, but we could also consider *reached goal ratio* or a mix of both.



In [None]:
# Note: most of this function could be replaced by a single call to the scikit-decide rollout utility (once that utility has been slightly extended)
def mean_episode_reward(solution, num_episodes=10, max_episode_steps=1000):
    domain = MyDomain()
    reward_sum = 0.0
    for _ in range(num_episodes):
        solution.reset()
        observation = domain.reset()
        episode_reward = 0.0
        step = 1
        while max_episode_steps is None or step <= max_episode_steps:
            action = solution.sample_action(observation)
            outcome = domain.step(action)
            observation = outcome.observation
            episode_reward += outcome.value.reward
            if outcome.termination:
                break
            step += 1
        reward_sum += episode_reward
    return reward_sum / num_episodes
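The *reached goal ratio* and the mix of both criteria mentioned above are not implemented in this notebook. A minimal sketch of how they could be computed is shown here, assuming each episode has already been summarized as a total reward plus a flag telling whether the goal was reached. The names `EpisodeSummary`, `reached_goal_ratio`, `mixed_score` and the `goal_weight` parameter are hypothetical and not part of scikit-decide, and the episode values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EpisodeSummary:
    """Hypothetical per-episode record (not a scikit-decide class)."""

    total_reward: float
    reached_goal: bool


def mean_reward(episodes: Sequence[EpisodeSummary]) -> float:
    # Average of the total rewards collected in each episode
    return sum(e.total_reward for e in episodes) / len(episodes)


def reached_goal_ratio(episodes: Sequence[EpisodeSummary]) -> float:
    # Fraction of episodes that ended by reaching the goal
    return sum(e.reached_goal for e in episodes) / len(episodes)


def mixed_score(episodes: Sequence[EpisodeSummary], goal_weight: float = 100.0) -> float:
    # Weighted sum of both criteria; goal_weight is an arbitrary illustrative value
    return mean_reward(episodes) + goal_weight * reached_goal_ratio(episodes)


# Made-up episode summaries, for illustration only
episodes = [
    EpisodeSummary(-10.0, True),
    EpisodeSummary(-1000.0, False),
    EpisodeSummary(-14.0, True),
    EpisodeSummary(-12.0, True),
]
print(mean_reward(episodes), reached_goal_ratio(episodes), mixed_score(episodes))
```

In the rollout loop above, the flag could for instance be set when `outcome.termination` is true before the step limit is hit. This only equals "goal reached" for domains that terminate exclusively at the goal, which is an assumption to check for each domain.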

Here we define the training function for each benchmark trial (this is fairly generic and should not change much from one benchmark to another):

In [None]:
from inspect import signature

from ray import tune


def training_function(config):
    # Get trial hyperparameters
    Solver = config["solver"]
    solver_args = config.get("solver_args", {}).get(Solver.__name__, {})
    # Note: this should not be necessary, but some solvers currently require
    # a domain_factory argument (until the underlying issue is solved)
    if "domain_factory" in signature(Solver.__init__).parameters:
        solver_args["domain_factory"] = MyDomain
    # Solve
    with Solver(**solver_args) as s:
        solution = MyDomain.solve_with(s)
        score = mean_episode_reward(solution)
    # Feed the score back to Tune
    tune.report(mean_episode_reward=score)
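The `domain_factory` check in the training function relies on `inspect.signature`, which exposes the parameters accepted by a callable. The standalone sketch below illustrates the mechanism with two hypothetical solver classes (`SolverWithFactory` and `SolverWithoutFactory` are placeholders, not scikit-decide solvers). The helper `build_solver_args` is also hypothetical. Unlike the training function above, it copies the user-supplied arguments before modifying them, which is a design choice of this sketch.

```python
from inspect import signature


class SolverWithFactory:
    """Hypothetical solver whose constructor requires a domain factory."""

    def __init__(self, domain_factory, depth=3):
        self.domain_factory = domain_factory
        self.depth = depth


class SolverWithoutFactory:
    """Hypothetical solver that does not take a domain factory."""

    def __init__(self, depth=3):
        self.depth = depth


def build_solver_args(solver_class, user_args, domain_factory):
    # Work on a copy so the caller's dictionary is left untouched
    args = dict(user_args)
    # Inject the factory only if the constructor declares such a parameter
    if "domain_factory" in signature(solver_class.__init__).parameters:
        args["domain_factory"] = domain_factory
    return args


user_args = {"depth": 5}
# dict is used here as a stand-in for a domain class
print(build_solver_args(SolverWithFactory, user_args, dict))
print(build_solver_args(SolverWithoutFactory, user_args, dict))
```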

Now it is time to run the benchmark. 

Some remarks: 
- By default, one free CPU will be allocated for each solver trial, but you can customize allocated CPUs/GPUs using the `resources_per_trial` argument.
- Some solvers will fail for various reasons (e.g. missing required arguments, as logged in individual error.txt files under the ~/ray_results directory tree), but this will not stop the benchmark from running the other ones. So do not be afraid of the numerous red lines below!
- You could fix most of the failing solvers by specifying the missing arguments with the `solver_args` option, as shown below for `StableBaseline`.
- To avoid a very long output, we use a progress reporter adapted to Jupyter notebooks that updates the status of the different jobs in place. As a side effect, error messages of failing solvers may be overwritten, but you can still have a look at the error files afterwards (see the "error file" column in the second table below).

In [None]:
from stable_baselines3 import PPO  # PPO is an RL algorithm

analysis = tune.run(
    training_function,
    config={
        "solver": tune.grid_search(benchmark_solvers),
        "solver_args": {  # Optional
            # Example of how to customize specific solver arguments (if needed):
            "StableBaseline": {
                "algo_class": PPO,
                "baselines_policy": "MlpPolicy",
                "learn_config": {"total_timesteps": 1000},
            }
        },
    },
    raise_on_failed_trial=False,
    progress_reporter=tune.JupyterNotebookReporter(overwrite=True),
    # time_budget_s = 60
)

# Print the best solver (or one of them in case of a tie), i.e. the one with maximum mean_episode_reward
best_config = analysis.get_best_config(metric="mean_episode_reward", mode="max")
print("==> Best solver:", best_config["solver"])
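Once the run has finished, the error.txt files mentioned in the remarks above can be listed with the standard library alone. The sketch below assumes the results are stored under `~/ray_results`, as stated above. The helper name `find_error_files` is hypothetical, and the exact layout of the sub-directories is not assumed, so the search is recursive.

```python
from pathlib import Path


def find_error_files(results_root):
    """Return all error.txt files found below results_root, sorted by path."""
    root = Path(results_root).expanduser()
    if not root.is_dir():
        # Nothing to report if the directory does not exist (e.g. no run yet)
        return []
    return sorted(root.rglob("error.txt"))


for error_file in find_error_files("~/ray_results"):
    print(error_file)
```

Each listed file can then be opened to see why the corresponding trial failed, for example to find out which solver argument was missing.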

## Analyze results

Let us get a dataframe for analyzing trial results and export it to a CSV file.

In [None]:
df = analysis.results_df
df = df[df.done.notnull()]  # remove failed runs (avoids rows filled with NaN)
df.to_csv("benchmark_results.csv")

Below we force displaying all columns, but the really interesting ones are the first two:
- `mean_episode_reward`: this is the objective function, namely the average reward over 10 episodes.
- `time_this_iter_s`: the computation time for the trial.
   Note that this covers the whole process coded in `training_function`, namely the solving time, but also the rollout time for computing `mean_episode_reward`, which can add some overhead depending on the domain and solver.


In [None]:
import pandas as pd
from IPython.display import HTML, display


def force_show_all(df):
    with pd.option_context(
        "display.max_rows", None, "display.max_columns", None, "display.width", None
    ):
        display(HTML(df.to_html()))


force_show_all(df)
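Since the two columns of interest are `mean_episode_reward` and `time_this_iter_s`, a natural follow-up is to rank trials by reward first and by computation time second. The sketch below does this on CSV data with the standard library only. The sample content is made up for illustration, and its column layout (in particular the `trial_id` and `config.solver` headers and the solver names) is an assumption about what the exported file looks like, to be adapted to the actual `benchmark_results.csv`.

```python
import csv
import io

# Made-up sample mimicking an exported results file (assumed column layout)
SAMPLE_CSV = """trial_id,mean_episode_reward,time_this_iter_s,config.solver
a1,-35.0,2.5,SolverA
b2,-35.0,0.8,SolverB
c3,-120.0,0.3,SolverC
"""


def rank_trials(csv_file):
    """Sort trials by decreasing reward, then by increasing computation time."""
    rows = list(csv.DictReader(csv_file))
    return sorted(
        rows,
        key=lambda row: (
            -float(row["mean_episode_reward"]),
            float(row["time_this_iter_s"]),
        ),
    )


ranked = rank_trials(io.StringIO(SAMPLE_CSV))
for row in ranked:
    print(row["config.solver"], row["mean_episode_reward"], row["time_this_iter_s"])
```

To rank real results, the `io.StringIO` object would be replaced by the opened CSV file. The tie-breaking on time is a design choice of this sketch, not something the benchmark itself does.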

Note that Ray Tune automatically generates TensorBoard files during `tune.run`; see the [documentation](https://docs.ray.io/en/latest/tune/user-guide.html#tensorboard-logging) for more details.

## Conclusion

This concludes this benchmarking notebook, but we have only scratched the surface of Ray Tune's possibilities. Feel free to experiment further, for instance by fine-tuning the hyperparameters of a specific solver to improve its results (the improvement can sometimes be very significant)!