# Benchmarking

## Introduction

**BTB** provides a benchmarking framework that allows users and developers to evaluate the
performance of the BTB Tuners, or of other tuning functions for Machine Learning Hyperparameter
Tuning, on hundreds of real-world classification problems and classical mathematical optimization problems.

### Prerequisites

To use the `benchmarking` module, you will have to fork the
**BTB** repository
and install it from source. Our
[Get Started](https://hdi-project.github.io/BTB/contributing.html#get-started)
tutorial explains how to clone and install the repository
from its source; follow it up to step 4.

### The Benchmarking process

The BTB Benchmarking process is built around two main concepts: Challenges and Tuning Functions.

#### Challenges

A Challenge of the BTB Benchmarking framework is a Python class which has a method that produces a
score that can be optimized by tuning a set of hyperparameters.
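
As an illustration of that idea, the sketch below defines a toy challenge in plain Python. The class name `SphereChallenge`, the method names `get_tunable_hyperparameters` and `evaluate`, and the layout of the hyperparameter dictionary are assumptions made for this example; they are not a description of the actual BTB challenge classes.

```python
class SphereChallenge:
    """Hypothetical challenge: maximize -(x^2 + y^2), whose optimum is 0 at (0, 0)."""

    def get_tunable_hyperparameters(self):
        # Assumed layout: one entry per hyperparameter, with its type and range.
        return {
            'x': {'type': 'float', 'default': 1.0, 'range': [-5.0, 5.0]},
            'y': {'type': 'float', 'default': 1.0, 'range': [-5.0, 5.0]},
        }

    def evaluate(self, x, y):
        # Higher scores are better, so the sphere function is negated.
        return -(x ** 2 + y ** 2)


challenge = SphereChallenge()
defaults = {
    name: spec['default']
    for name, spec in challenge.get_tunable_hyperparameters().items()
}
default_score = challenge.evaluate(**defaults)
```

A tuner working on this challenge would try to move `x` and `y` away from their defaults and towards `(0, 0)`, where the score is highest.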

#### Tuning Functions

In the context of the BTB Benchmarking, `Tuning Functions` are Python functions that, given a scoring
function and its tunable hyperparameters, search for the best hyperparameter values within
a given number of iterations.

If you want to add a tuner, it should follow the signature of a tuning function:

```python3
def tuning_function(
    scoring_function: callable,
    tunable_hyperparameters: dict,
    iterations: int,
) -> float:
    """Return the best score found within the given number of iterations."""
    ...
```
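
To make the signature concrete, here is a minimal sketch of a tuning function that does not depend on BTB at all: a plain random search. It assumes that each entry of the hyperparameter dictionary carries a `type` (`'int'` or `'float'`) and a two-element `range`; that layout, and the names `random_search` and `toy_score`, are assumptions for this example. Categorical and boolean hyperparameters are left out for brevity.

```python
import random


def random_search(scoring_function, tunable_hyperparameters, iterations):
    """Baseline tuning function: sample uniformly at random, keep the best score."""
    best_score = float('-inf')
    for _ in range(iterations):
        proposal = {}
        for name, spec in tunable_hyperparameters.items():
            low, high = spec['range']  # assumed dictionary layout
            if spec['type'] == 'int':
                proposal[name] = random.randint(low, high)
            else:
                proposal[name] = random.uniform(low, high)
        best_score = max(best_score, scoring_function(**proposal))
    return best_score


def toy_score(x):
    # Toy scoring function whose maximum (0) is reached at x = 2.
    return -(x - 2) ** 2


random.seed(0)
best = random_search(toy_score, {'x': {'type': 'float', 'range': [-5.0, 5.0]}}, 50)
```

A baseline like this can be useful as a reference point: a tuner that does not beat random search on a challenge is probably not helping there.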

## Creating a tuning function

Now let's create a tuning function that takes the following arguments as input:
- `scoring_function`: A function that, given the hyperparameter values defined in `tunable_hp` as keyword arguments, returns a score.
- `tunable_hp`: A dictionary representation of `HyperParams`.
- `iterations`: Number of tuning iterations to perform.

Our function will use the `GPEiTuner`, a [BTB Tuner](https://github.com/HDI-Project/BTB/blob/master/tutorials/01_Tuning.ipynb), which will iteratively:

1. Propose a new set of hyperparameters to be scored.
2. Score them against the `scoring_function`.
3. Update the `best_score` so far.

Finally, it will return the `best_score` obtained over the given iterations.

```python3
import numpy as np

from btb.tuning import GPEiTuner, Tunable


def tuning_function(scoring_function, tunable_hp, iterations):
    # Build the tunable search space and the tuner that explores it.
    tunable = Tunable.from_dict(tunable_hp)
    tuner = GPEiTuner(tunable)

    best_score = -np.inf

    for _ in range(iterations):
        # 1. Propose, 2. score, 3. record the result and update the best score.
        proposal = tuner.propose()
        score = scoring_function(**proposal)
        tuner.record(proposal, score)

        best_score = max(score, best_score)

    return best_score
```
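
Before benchmarking a new tuning function, it can help to check that it honours the contract described above: one call to the scoring function per iteration, and the best observed score as the return value. The helper below is a hypothetical sketch of such a check, not part of BTB. It assumes the `type`/`default`/`range` dictionary layout, and it uses a simple stand-in, `grid_tuning_function`, so that the example runs without any dependency.

```python
def check_tuning_function(tuning_function, iterations=5):
    """Run a tuning function on a toy problem and verify its basic contract."""
    observed = []

    def scoring_function(x):
        # Record every score so the returned value can be verified afterwards.
        score = -(x - 1.0) ** 2
        observed.append(score)
        return score

    tunable_hp = {'x': {'type': 'float', 'default': 0.0, 'range': [-3.0, 3.0]}}
    best_score = tuning_function(scoring_function, tunable_hp, iterations)

    assert len(observed) == iterations, 'expected one evaluation per iteration'
    assert best_score == max(observed), 'expected the best observed score'
    return best_score


def grid_tuning_function(scoring_function, tunable_hp, iterations):
    """Hypothetical stand-in: scans every range on an evenly spaced grid."""
    best_score = float('-inf')
    steps = max(iterations - 1, 1)
    for i in range(iterations):
        proposal = {}
        for name, spec in tunable_hp.items():
            low, high = spec['range']
            proposal[name] = low + (high - low) * i / steps
        best_score = max(best_score, scoring_function(**proposal))
    return best_score


best = check_tuning_function(grid_tuning_function, iterations=7)
```

The same helper could be called with the `tuning_function` defined above, provided BTB is installed.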

## Running the Benchmarking

The user API for the BTB Benchmarking is the `btb_benchmark.main.run_benchmark` function.

The `run_benchmark` function accepts the following arguments:

- `tuners`: list of tuners or tuning functions that will be benchmarked.
- `challenge_types`: list of the types of challenges that will be used for the benchmark (optional).
- `challenges`: list of the names of the challenges that will be benchmarked (optional).
- `sample`: if specified, run the benchmark on a subset of the available challenges of the given size (optional).
- `iterations`: the number of tuning iterations to perform per challenge and tuner.
- `output_path`: if given, store the benchmark results in the given path as a CSV file.
- `detailed_output`: if `True`, a dataframe with the elapsed time, score and iterations will be returned.


*Note*: to keep the usage examples simple, we will be demonstrating the benchmarking
functionality with a small `sample` and a low number of `iterations`.

The easiest way to run the benchmarking process is to import `run_benchmark` and run it with the
desired arguments:

```python3
from btb_benchmark import run_benchmark

scores = run_benchmark(sample=1, iterations=1)

scores
```

|  | BTB.GPEiTuner | BTB.GPTuner | BTB.UniformTuner | HyperOpt.rand | HyperOpt.tpe |
|---|---|---|---|---|---|
| XGBoostChallenge('rabe_97_1.csv') | 0.509524 | 0.352381 | 0.352381 | 0.757832 | 0.790909 |


#### Tuners

If you want to run the benchmark on your own tuner implementation, or on a subset of the BTB
tuners, you can pass them as a list to the `tuners` argument. Each element can be either a tuning
function or the name of one of the implemented tuners:

- `BTB.UniformTuner`
- `BTB.GPTuner`
- `BTB.GPEiTuner`
- `HyperOpt.rand`
- `HyperOpt.tpe`

For example, if we want to compare the performance of our tuning function and BTB.GPTuner, we can
call the `run_benchmark` function like this:

```python3
tuners = [
    tuning_function,
    'BTB.GPTuner',
]
results = run_benchmark(tuners=tuners, sample=1, iterations=5)
results
```

|  | BTB.GPTuner | tuning_function |
|---|---|---|
| XGBoostChallenge('wall-robot-navigation_3.csv') | 0.614916 | 0.720911 |


#### Challenges

If we want to run the benchmark on a subset of the challenges, we can pass their names to the
`challenges` argument. If a given challenge is the name of a mathematical optimization problem
function, the corresponding Mathematical Optimization Challenge will be executed.

If the given challenge is the name of a Machine Learning Classification problem, all the
implemented classifiers will be benchmarked on that dataset.

For example, if we want to run only on the `stock_1` dataset, we can call the `run_benchmark` function like this:

```python3
challenges = ['stock_1']
results = run_benchmark(challenges=challenges, iterations=5)
results
```

|  | BTB.GPEiTuner | BTB.GPTuner | BTB.UniformTuner | HyperOpt.rand | HyperOpt.tpe |
|---|---|---|---|---|---|
| XGBoostChallenge('stock_1') | 0.78677 | 0.789158 | 0.783963 | 0.612163 | 0.534408 |


Additionally, if we only want to run on a family of challenges or a specific Machine Learning
model, we can specify it by passing the `challenge_types` argument.

For example, if we want to run all the datasets on the XGBoost model, we can call the `run_benchmark`
function like this:

```python3
results = run_benchmark(challenge_types=['xgboost'], sample=1, iterations=5)
results
```

|  | BTB.GPEiTuner | BTB.GPTuner | BTB.UniformTuner | HyperOpt.rand | HyperOpt.tpe |
|---|---|---|---|---|---|
| XGBoostChallenge('PhishingWebsites_1.csv') | 0.966134 | 0.967414 | 0.967402 | 0.967123 | 0.969341 |


#### Detailed output

If we want a detailed output of the benchmarking, which contains the elapsed time of each benchmarking
process, we can set the argument `detailed_output` to `True`:

```python3
results = run_benchmark(sample=3, iterations=1, detailed_output=True)
results
```

|  | challenge | tuner | score | iterations | elapsed | hostname |
|---|---|---|---|---|---|---|
| 0 | XGBoostChallenge('steel-plates-fault_1.csv') | BTB.GPTuner | 1.0 | 1 | 00:00:14.852736 | lgn |
| 1 | XGBoostChallenge('wholesale-customers_1.csv') | BTB.GPTuner | 1.0 | 1 | 00:00:02.922315 | lgn |
| 2 | XGBoostChallenge('ar6_1.csv') | BTB.GPTuner | 0.459875 | 1 | 00:00:01.967125 | lgn |
| 3 | XGBoostChallenge('steel-plates-fault_1.csv') | BTB.GPEiTuner | 1.0 | 1 | 00:00:13.161275 | lgn |
| 4 | XGBoostChallenge('wholesale-customers_1.csv') | BTB.GPEiTuner | 1.0 | 1 | 00:00:02.741434 | lgn |
| 5 | XGBoostChallenge('ar6_1.csv') | BTB.GPEiTuner | 0.459875 | 1 | 00:00:01.977264 | lgn |
| 6 | XGBoostChallenge('steel-plates-fault_1.csv') | BTB.UniformTuner | 1.0 | 1 | 00:00:12.146413 | lgn |
| 7 | XGBoostChallenge('wholesale-customers_1.csv') | BTB.UniformTuner | 1.0 | 1 | 00:00:03.089263 | lgn |
| 8 | XGBoostChallenge('ar6_1.csv') | BTB.UniformTuner | 0.459875 | 1 | 00:00:01.600893 | lgn |
| 9 | XGBoostChallenge('steel-plates-fault_1.csv') | HyperOpt.tpe | 1.0 | 1 | 00:00:13.243347 | lgn |
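
One way to use the detailed output is to compare how much time each tuner spent overall. The sketch below sums the `elapsed` values per tuner using only the standard library. It assumes the values are available as `HH:MM:SS.ffffff` strings, as displayed in the table above; the helper name `parse_elapsed` is made up for this example, and the data is limited to rows 0 to 5 of that table.

```python
from collections import defaultdict
from datetime import timedelta


def parse_elapsed(text):
    """Convert an 'HH:MM:SS.ffffff' string into a timedelta."""
    hours, minutes, seconds = text.split(':')
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


# (tuner, elapsed) pairs copied from rows 0 to 5 of the table above.
rows = [
    ('BTB.GPTuner', '00:00:14.852736'),
    ('BTB.GPTuner', '00:00:02.922315'),
    ('BTB.GPTuner', '00:00:01.967125'),
    ('BTB.GPEiTuner', '00:00:13.161275'),
    ('BTB.GPEiTuner', '00:00:02.741434'),
    ('BTB.GPEiTuner', '00:00:01.977264'),
]

# Accumulate the total elapsed time per tuner, starting from a zero timedelta.
total_elapsed = defaultdict(timedelta)
for tuner, elapsed in rows:
    total_elapsed[tuner] += parse_elapsed(elapsed)

for tuner, total in sorted(total_elapsed.items()):
    print(f'{tuner}: {total.total_seconds():.2f}s')
```

If the returned dataframe already stores `elapsed` as time deltas rather than strings, the parsing step is unnecessary.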


#### Storing the results

If we want to store the obtained results directly into a file, we can pass the path where we
would like to save them using the `output_path` argument.

For example, if we want to store them as `my_results.csv` we can use:

```python3
run_benchmark(sample=1, iterations=1, output_path='my_results.csv')
```
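
Once stored, the results can be inspected without re-running the benchmark. The sketch below finds the best-scoring tuner for each challenge using the standard `csv` module. It assumes that the stored file has the same layout as the summary tables shown earlier (challenge names in the first column, one column per tuner), which has not been verified against the actual output; an in-memory string built from the `stock_1` results stands in for the file so that the example is self-contained.

```python
import csv
import io

# Stand-in for open('my_results.csv'); the content mirrors the `stock_1` table.
csv_text = (
    ",BTB.GPEiTuner,BTB.GPTuner,BTB.UniformTuner,HyperOpt.rand,HyperOpt.tpe\n"
    "XGBoostChallenge('stock_1'),0.78677,0.789158,0.783963,0.612163,0.534408\n"
)

best_tuner = {}
reader = csv.reader(io.StringIO(csv_text))
header = next(reader)
tuner_names = header[1:]  # the first column holds the challenge names

for row in reader:
    challenge = row[0]
    scores = [float(value) for value in row[1:]]
    # Keep the name of the tuner with the highest score for this challenge.
    best_tuner[challenge] = tuner_names[scores.index(max(scores))]
```

To read a real file, replace `io.StringIO(csv_text)` with a file object opened with `open('my_results.csv', newline='')`.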