# Running the ICML 2018 Experiments

In this series of notebooks, we will replicate and analyze the ICML 2018 experiments that were used for benchmarking in <a href="http://proceedings.mlr.press/v80/falkner18a.html" target="_blank">BOHB (Falkner et al. 2018)</a>.
In addition to <a href="https://github.com/automl/HpBandSter" target="_blank">HpBandSter</a>, we will use <a href="https://github.com/automl/CAVE" target="_blank">CAVE</a> to analyze and visualize the optimization process.

## About the frameworks

### SVM Surrogate

This benchmark optimizes a Support Vector Machine (SVM) on the MNIST dataset. From <a href="http://proceedings.mlr.press/v80/falkner18a.html" target="_blank">BOHB (Falkner et al. 2018)</a>: "This surrogate imitates the hyperparameter optimization of a support vector machine with a RBF kernel with two hyperparameters: the regularization parameter C and the kernel parameter γ. The budget is given by the number of training datapoints, where the minimum budget is 1 / 512 of the training data and the maximum budget is the full training data. For further details, we refer to <a href="https://arxiv.org/abs/1605.07079" target="_blank">Klein et al. (2017a)</a>."

### Installation requirements

To run the experiments, please install the <a href="https://github.com/automl/BOHBsCAVE/blob/master/examples/icml_2018_experiments/requirements.txt" target="_blank">requirements</a>, e.g. `pip install -r examples/icml_2018_experiments/requirements.txt`.

## Run the experiment


In [3]:
# We deactivate logging to ensure readability
import logging
logging.basicConfig(level=logging.ERROR)
# Also, we suppress warnings.
# If you run into problems executing this notebook, you might want to comment this out.
import warnings
warnings.filterwarnings("ignore")

### 1.1) References
*Worker*: We need a <a href="https://automl.github.io/HpBandSter/build/html/core/worker.html" target="_blank">Worker</a> to define the kind of computation we want to optimize. The worker used for the experiments is located in `workers/svm_surrogate.py`.



*ConfigSpace*: Every problem needs a description of the search space to be complete. In HpBandSter, a <a href="https://github.com/automl/ConfigSpace/tree/master/ConfigSpace" target="_blank">ConfigurationSpace</a>-object defines all hyperparameters, their ranges and dependencies between them.

In our example here, the search space consists of the hyperparameters:


|  Name     |  Type   |      Values  |
|:---------:|:-------:|:------------:|
| x0 | real | [-10.0, 10.0] |
| x1 | real | [-10.0, 10.0] |

### 1.3) Setting up the experiment and running the optimizer(s)

Please note that running the experiment might take up to a few days, depending on your hardware. You can also skip this step and proceed with the precomputed results saved in `opt_results/svm_surrogate`.

In [None]:
import os
from itertools import product
import numpy as np

import hpbandster.core.nameserver as hpns
import hpbandster.core.result as hpres
from hpbandster.optimizers import BOHB, RandomSearch, HyperBand

from workers.svm_surrogate import SVMSurrogateWorker

# Run the experiment
opt_methods = ["smac", "bohb", "randomsearch", "hyperband"]
num_iterations = 32
min_budget = 1/512
max_budget = 1

eta = 3

for opt_method in opt_methods:

    print(opt_method)
    
    output_dir = "opt_results/svm_surrogate/{}".format(opt_method)
    os.makedirs(output_dir, exist_ok=True)

    run_id = '0'  # Every run has to have a unique (at runtime) id.
    
    # create a worker
    worker = SVMSurrogateWorker(surrogate_path=None, measure_test_loss=True, run_id=run_id)
    configspace = worker.configspace

    if opt_method in ['randomsearch', 'bohb', 'hyperband']:
        # setup a nameserver
        NS = hpns.NameServer(run_id=run_id, host='localhost', port=0, working_directory=output_dir)
        ns_host, ns_port = NS.start()

        worker.load_nameserver_credentials(output_dir)
        worker.run(background=True)

        # instantiate and run optimizer
        opt = RandomSearch if opt_method == 'randomsearch' else BOHB if opt_method == 'bohb' else HyperBand

        result_logger = hpres.json_result_logger(directory=output_dir, overwrite=True)

        opt = opt(configspace, eta=eta,
                  working_directory=output_dir,
                  run_id=run_id,
                  min_budget=min_budget, max_budget=max_budget,
                  host=ns_host,
                  nameserver=ns_host,
                  nameserver_port=ns_port,
                  ping_interval=3600,
                  result_logger=result_logger)

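        result = opt.run(n_iterations=num_iterations)

        # shut down the optimizer, its background worker and the nameserver
        # before the next optimizer starts
        opt.shutdown(shutdown_workers=True)
        NS.shutdown()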
            
        # **NOTE:** Unfortunately, the configuration space is *not yet saved automatically* to file,
        # but this step is mandatory for the analysis with CAVE.
        # We recommend saving the configuration space every time you use BOHB.
        # Here we use the PCS writer of ConfigSpace (`pcs_new`).

        from ConfigSpace.read_and_write import pcs_new
        with open(os.path.join(output_dir, 'configspace.pcs'), 'w') as fh:
            fh.write(pcs_new.write(opt.config_generator.configspace))
    
    else:
        # the number of iterations for the blackbox optimizers must be increased so they have comparable total budgets
        bb_iterations = int(num_iterations * (1+(np.log(max_budget) - np.log(min_budget))/np.log(eta)))
        if opt_method == 'smac':
            result = worker.run_smac(bb_iterations, deterministic=True, working_directory=output_dir)
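
The scaling of `bb_iterations` in the cell above can be checked by hand. The following minimal sketch repeats the same expression with the settings used in this notebook, using `math` instead of `numpy`; the variable name `steps` is ours and only serves the illustration.

```python
import math

num_iterations = 32
min_budget = 1 / 512
max_budget = 1
eta = 3

# Number of eta-fold steps between the smallest and the largest budget (not rounded).
steps = (math.log(max_budget) - math.log(min_budget)) / math.log(eta)

# Same expression as in the experiment cell, with math instead of numpy.
bb_iterations = int(num_iterations * (1 + steps))

print(round(steps, 3))  # 5.678
print(bb_iterations)    # 213
```

With these settings, the blackbox optimizer (here SMAC) gets 213 function evaluations instead of 32.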

## Using the results in CAVE

### Instantiate CAVE

Analyzing the optimization results with CAVE is straightforward. If you want to use CAVE interactively in a notebook, set `show_jupyter=True`. Specify which optimization run you want to analyze via the `folders` argument and set `file_format='SMAC3'` or `file_format='BOHB'`, depending on which optimizer produced the results. To analyze how BOHB optimized the SVM surrogate, run:

In [1]:
from cave.cavefacade import CAVE

cave = CAVE(folders=["opt_results/svm_surrogate/bohb"],
            output_dir="CAVE_reports/svm_notebook",          # output for debug/images/etc
            ta_exec_dir=["."],                               # Only important for SMAC-results
            file_format='BOHB',                              # BOHB or SMAC3
            verbose_level='OFF',
            show_jupyter=True,
           )

To generate the HTML report, use the `analyze` method. The report is located in `output_dir/report.html`, so in this case in `CAVE_reports/svm_notebook/report.html`.

In [None]:
cave.analyze()
! firefox CAVE_reports/svm_notebook/report.html

CAVE is fully compatible with Jupyter notebooks. You can invoke the individual analysis methods as follows.

The most interesting plot for BOHB might be a visualization of the learning curves:

In [13]:
cave.bohb_learning_curves();

In [14]:
cave.overview_table();

|  |  |
|:---|:---:|
| # aggregated parallel BOHB runs | 1 |
| # parameters | 2 |
| Deterministic target algorithm | True |
| Optimized run objective | quality |

|  | budget 0.0 | budget 0.1 | budget 0.3 | budget 1 |
|:---|:---:|:---:|:---:|:---:|
| Total time spent evaluating configurations | 11.35 sec | 5.60 sec | 2.85 sec | 1.60 sec |
| Average time per configuration (mean / std) | 0.02 sec (± 0.01) | 0.02 sec (± 0.01) | 0.02 sec (± 0.01) | 0.02 sec (± 0.01) |
| # evaluated configurations | 459 | 243 | 126 | 72 |
| # changed parameters (default to incumbent) | 2 | 2 | 2 | 2 |
| Configuration origins | Acquisition Function : 405, Random : 54 | Acquisition Function : 209, Random : 34 | Acquisition Function : 108, Random : 18 | Acquisition Function : 62, Random : 10 |
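
The budget labels in these tables appear to be rounded to one decimal. As a rough sketch, assuming the optimizers use the usual Hyperband ladder of budgets `max_budget / eta**k` (the helper `budget_ladder` is ours and not part of HpBandSter or CAVE), the settings of this notebook give six budgets, three of which round to `0.0`:

```python
import math


def budget_ladder(min_budget, max_budget, eta):
    """Geometric budgets max_budget / eta**k that are not smaller than min_budget."""
    n_levels = int(math.log(max_budget / min_budget) / math.log(eta)) + 1
    # Smallest budget first, largest budget (k = 0) last.
    return [max_budget / eta ** k for k in range(n_levels - 1, -1, -1)]


budgets = budget_ladder(min_budget=1 / 512, max_budget=1, eta=3)
rounded = [round(b, 1) for b in budgets]

print(len(budgets))  # 6
print(rounded)       # [0.0, 0.0, 0.0, 0.1, 0.3, 1.0]
```

Under that assumption, the smallest budgets are 1/243, 1/81 and 1/27 of the training data, so a label such as `budget 0.0` may stand for more than one distinct budget.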


For each budget, we can list the incumbent (the best configuration found on that budget) and its cost:

In [15]:
cave.bohb_incumbents_per_budget();

|  | budget 0.0 | budget 0.0.1 | budget 0.0.2 | budget 0.1 | budget 0.3 | budget 1 |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| x0 | 9.47496 | 9.47496 | 9.47496 | 2.59728 | 9.47496 | 7.73635 |
| x1 | -3.45319 | -3.45319 | -3.45319 | -3.44764 | -3.45319 | -3.39846 |
| Cost | 0.054 | 0.054 | 0.054 | 0.031 | 0.025 | 0.015 |
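
What "incumbent per budget" means can be sketched without CAVE: for every budget, keep the configuration with the lowest loss. The sketch below assumes that the `results.json` file written by `json_result_logger` holds one JSON list per line in the order `[config_id, budget, timestamps, result, exception]`. That layout, the helper name `incumbents_per_budget` and the sample records are assumptions for illustration, not CAVE's implementation.

```python
import json


def incumbents_per_budget(lines):
    """Map each budget to the (config_id, loss) pair with the lowest loss."""
    best = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Assumed record layout: [config_id, budget, timestamps, result, exception]
        config_id, budget, _timestamps, result, _exception = json.loads(line)
        if result is None:
            continue  # evaluation failed, no loss recorded
        loss = result["loss"]
        if budget not in best or loss < best[budget][1]:
            best[budget] = (tuple(config_id), loss)
    return best


# Hypothetical records, made up for illustration only.
sample = [
    '[[0, 0, 0], 0.333, {}, {"loss": 0.040, "info": {}}, null]',
    '[[0, 0, 1], 0.333, {}, {"loss": 0.025, "info": {}}, null]',
    '[[0, 0, 1], 1.0, {}, {"loss": 0.015, "info": {}}, null]',
    '[[0, 0, 2], 1.0, {}, null, "worker crashed"]',
]

for budget, (config_id, loss) in sorted(incumbents_per_budget(sample).items()):
    print(budget, config_id, loss)
```

On real logs, you would pass the opened log file instead of `sample`, for example the `results.json` in `opt_results/svm_surrogate/bohb`, provided the file follows the assumed layout.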


For parameter-importance analysis, CAVE uses <a href="https://github.com/automl/ParameterImportance" target="_blank">PIMP</a>, a package that provides multiple approaches to parameter-importance analysis, all of which can be invoked via CAVE. To estimate the importance, random forests are used to predict the performance of configurations that were not executed. This is difficult for large budgets, on which only few configurations were evaluated.

We can access the individual budgets via the `run` keyword argument of each analysis method.

In [None]:
%%capture
cave.cave_fanova(run='budget_0.0');

In [None]:
cave.local_parameter_importance(run='budget_0.0');

For each budget, we can compare the different parameter-importance methods that have already been run:

In [25]:
cave.pimp_comparison_table(run='budget_0.0');

To analyze BOHB's behaviour, we can check out the configurator footprint, the cost over time and the parallel coordinates of the parameters:

In [26]:
cave.configurator_footprint(use_timeslider=True, num_quantiles=5);

In [27]:
cave.cost_over_time();