# Running the ICML 2018 Experiments

In this series of notebooks, we will replicate and analyze the ICML 2018 experiments that were used for benchmarking in <a href="http://proceedings.mlr.press/v80/falkner18a.html" target="_blank">BOHB (Falkner et al. 2018)</a>.
In addition to <a href="https://github.com/automl/HpBandSter" target="_blank">HpBandSter</a>, we will use <a href="https://github.com/automl/CAVE" target="_blank">CAVE</a> to analyze and visualize the optimization process.

## About the frameworks

### Paramnet Surrogates

This experiment optimizes (TODO: add info)

### Installation requirements

Make sure you have installed the requirements from both `requirements.txt` and `examples/icml_2018_experiments` (e.g. using `pip install -r PATH_FILE`).

## Run the experiment


In [1]:
# We deactivate logging to ensure readability
import logging
logging.basicConfig(level=logging.ERROR)
# Also, we suppress warnings.
# If there are problems for you executing this notebook, you might want to comment this out.
import warnings
warnings.filterwarnings("ignore")

### 1.1) Preparing BOHB
*Worker*: We need a <a href="https://automl.github.io/HpBandSter/build/html/core/worker.html" target="_blank">Worker</a> to define the kind of computation we want to optimize. The worker used for the experiments is located in `workers/paramnet_surrogates.py`.

*ConfigSpace*: Every problem needs a description of the search space to be complete. In HpBandSter, a ConfigurationSpace-object defines all hyperparameters, their ranges and dependencies between them.

In our example here, the search space consists of the hyperparameters:


|  Name     |  Type   |      Values  |
|:---------:|:-------:|:------------:|
|           |         |              |

In [2]:
from workers.paramnet_surrogates import ParamNetSurrogateWorker

### 1.2) Setting up the HpBandSter Nameserver and starting the optimization run

In [3]:
import os

dataset = 'adult'  # choice: ['adult', 'higgs', 'letter', 'mnist', 'optdigits', 'poker']
opt_method = "bohb"  # choice: ['randomsearch', 'bohb', 'hyperband']
output_dir = "results/paramnet_surrogates/"  # This is the destination for the BOHB-results
num_iterations = 1  # 16

# Create the output directory if it does not exist yet
os.makedirs(output_dir, exist_ok=True)

#### Step 1 - initiating communication, creating a nameserver:
The <a href="https://automl.github.io/HpBandSter/build/html/core/nameserver.html" target="_blank">nameserver</a> is a small service that keeps track of all running processes and their IP addresses and ports. It could be a 'static' server with a permanent address, but here it will be started for the local machine with a random port. 

In [4]:
import hpbandster.core.nameserver as hpns
import hpbandster.core.result as hpres
from hpbandster.optimizers import BOHB, RandomSearch, HyperBand

run_id = '0'  # Every run has to have a unique (at runtime) id.

# setup a nameserver
NS = hpns.NameServer(run_id=run_id, host='localhost', port=0, working_directory=output_dir)
ns_host, ns_port = NS.start()
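
Passing `port=0` is what requests the random port mentioned above. As a rough illustration of the idea (a standard-library sketch, not HpBandSter's actual code), binding a socket to port 0 lets the operating system choose any free port. The helper name `pick_free_port` is hypothetical.

```python
import socket


def pick_free_port(host="127.0.0.1"):
    """Bind to port 0 so the OS assigns an unused port, then report it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # getsockname() returns (address, port); the port is the OS's choice
        return sock.getsockname()[1]


port = pick_free_port()
print("the OS assigned port", port)
```

This is why the notebook keeps the values returned by `NS.start()`: the actual port is only known after the nameserver has started.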

#### Step 2 - creating the worker: 
The worker implements the actual problem that is optimized. Its `compute` method is called repeatedly by the BOHB optimizer with the sampled configurations and returns the computed loss (and additional information).

In [21]:
%%capture
worker = ParamNetSurrogateWorker(dataset=dataset, surrogate_path=None, measure_test_loss=False, run_id=run_id)
configspace = worker.configspace
min_budget, max_budget = worker.budgets[dataset]

worker.load_nameserver_credentials(output_dir)
worker.run(background=True)
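
To make the contract of `compute` concrete, here is a minimal stand-in that does not depend on HpBandSter. It only mimics the interface: a configuration and a budget go in, and a dictionary with the keys `loss` and `info` comes out. The class `ToyWorker`, the hyperparameter `x` and the loss formula are all hypothetical and unrelated to the ParamNet surrogates.

```python
class ToyWorker:
    """Stand-in that mimics the compute() contract of an HpBandSter worker."""

    def compute(self, config, budget, **kwargs):
        # Hypothetical objective: best at x = 0.5, and less noisy
        # (here: a smaller additive penalty) with a larger budget.
        loss = (config["x"] - 0.5) ** 2 + 1.0 / budget
        # 'loss' is the value to minimize; 'info' carries additional information
        return {"loss": loss, "info": {"budget": budget}}


toy_result = ToyWorker().compute(config={"x": 0.25}, budget=10)
print(toy_result)
```

A real worker would subclass HpBandSter's `Worker` and evaluate the actual problem inside `compute`, as `ParamNetSurrogateWorker` does.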

#### Step 3 - creating the optimizer and starting the run:
Create an optimizer object, which samples configurations from the ConfigurationSpace and uses successive halving to assign budgets for their execution. Further information on what qualifies as a budget <a href="https://automl.github.io/HpBandSter/build/html/quickstart.html#meaningful-budgets-and-number-of-iterations" target="_blank">can be found in the documentation.</a>

**NOTE:** BOHB does not build a new model at the beginning of every SuccessiveHalving run. Instead it collects all evaluations on all budgets and uses the largest budget with enough evaluations as a base for future evaluations.  

In [22]:
%%capture
eta = 3
opt = None

if opt_method == 'randomsearch':
    opt = RandomSearch      
elif opt_method == 'bohb':
    opt = BOHB              
elif opt_method == 'hyperband':
    opt = HyperBand
else:
    raise ValueError("Unknown method %s" % opt_method)

result_logger = hpres.json_result_logger(directory=output_dir, overwrite=True)

# Instantiate the selected optimizer class, reusing the eta defined above
opt = opt(configspace, eta=eta,
          working_directory=output_dir,
          run_id=run_id,
          min_budget=min_budget, max_budget=max_budget,
          host=ns_host,
          nameserver=ns_host,
          nameserver_port=ns_port,
          ping_interval=3600,
          result_logger=result_logger)

result = opt.run(n_iterations=num_iterations)
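
How successive halving assigns budgets can be sketched in a few lines of plain Python. This is a simplified illustration of a single, most aggressive bracket, not HpBandSter's implementation: each stage keeps roughly `1/eta` of the configurations and multiplies the budget by `eta`. The function name and the value `min_budget=300` are hypothetical; the latter is chosen only so that the stages line up with the budgets and configuration counts in the overview table further below.

```python
import math


def successive_halving_schedule(min_budget, max_budget, eta=3):
    """Return (number of configurations, budget) for each stage of one bracket."""
    # Number of stages that fit between min_budget and max_budget
    num_stages = int(math.log(max_budget / min_budget) / math.log(eta)) + 1
    schedule = []
    for stage in range(num_stages):
        exponent = num_stages - 1 - stage
        n_configs = eta ** exponent          # fewer configurations per stage
        budget = max_budget / eta ** exponent  # larger budget per stage
        schedule.append((n_configs, budget))
    return schedule


# min_budget=300 is an assumed value for illustration only
schedule = successive_halving_schedule(min_budget=300, max_budget=10000, eta=3)
for n_configs, budget in schedule:
    print(n_configs, "configurations at budget", budget)
```

With `num_iterations = 1`, only one such bracket is run, which is why a single run evaluates few configurations on the largest budget.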

**NOTE:** The configuration space is *not saved automatically* to file, but CAVE needs it for the analysis.  
We recommend saving the configuration space every time you use BOHB.
Here we write it in the pcs format, using ConfigSpace's `pcs_new` writer.

In [8]:
from ConfigSpace.read_and_write import pcs_new
with open(os.path.join(output_dir, 'configspace.pcs'), 'w') as fh:
    fh.write(pcs_new.write(opt.config_generator.configspace))

The <a href="https://automl.github.io/HpBandSter/build/html/core/result.html" target="_blank">Result-object</a> offers access to basic statistics, as well as the best configuration (incumbent) and the trajectory.
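
The notion of an incumbent trajectory can be illustrated without the library: it is the sequence of points in time at which a new best configuration was found. The sketch below uses made-up run records and a hypothetical helper `incumbent_trajectory`; it ignores budgets entirely, which the real Result-object does take into account.

```python
def incumbent_trajectory(runs):
    """Return the runs that improved on the best loss seen so far.

    runs: iterable of (finish_time, config_id, loss) tuples.
    """
    trajectory = []
    best_loss = float("inf")
    # Process the runs in the order in which they finished
    for finish_time, config_id, loss in sorted(runs):
        if loss < best_loss:
            best_loss = loss
            trajectory.append((finish_time, config_id, loss))
    return trajectory


# Made-up records for illustration only
toy_runs = [(1.0, "a", 0.9), (2.0, "b", 0.95), (3.0, "c", 0.4),
            (4.0, "d", 0.5), (5.0, "e", 0.1)]
trajectory = incumbent_trajectory(toy_runs)
print(trajectory)
```

The last entry of the trajectory is the final incumbent, i.e. the best configuration found during the run.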

## Using the results in CAVE

### 2.1) Creating a HTML-report with CAVE

Creating the report with CAVE is straightforward. Pass the output directory of the BOHB run to CAVE's `--folders` argument and set `--file_format` to `BOHB`. You can do this from the notebook (a leading `!` executes the rest of the line as if it were typed on the command line):


In [18]:
! cave --folders results/paramnet_surrogates --file_format BOHB --output CAVE_reports/paramnet_bash --verbose_level OFF

usage: cave --folders FOLDERS [--verbose_level {INFO, ..., OFF}] [--jupyter {on, off}] [--validation {validation, epm}]
            [--output OUTPUT] [--seed SEED] [--file_format {SMAC2, ..., BOHB}] [--validation_format {SMAC2, ..., NONE}]
            [--ta_exec_dir TA_EXEC_DIR] [--pimp_max_samples PIMP_MAX_SAMPLES] [--pimp_no_fanova_pairs]
            [--pimp_sort_table_by {average, ..., lpi}] [--cfp_time_slider {on, off}]
            [--cfp_number_quantiles CFP_NUMBER_QUANTILES] [--cfp_max_plot CFP_MAX_PLOT] [--pc_sort_by {all, ..., none}]
            [--parameter_importance {all, ..., none}] [--feature_analysis {all, ..., none}] [--no_tabular_analysis]
            [--no_ecdf] [--no_scatter_plots] [--no_cost_over_time] [--no_configurator_footprint]
            [--no_parallel_coordinates] [--no_algorithm_footprints] [-v] [-h]

CAVE: Configuration Assessment Vizualisation and Evaluation

Required Options:
  --folders FOLDERS                     path(s) to Configurator outpu

After CAVE has finished the report, you can open it in your favorite browser.

In [10]:
! firefox ./CAVE_reports/paramnet_bash/report.html

### 2.2) Using CAVE from within Python

You can also use CAVE at module level. Import and instantiate it, much as on the command line. By default, CAVE outputs all analysis results in a Jupyter-cell-compatible way. The HTML report is built alongside, so you don't have to rerun time-consuming analysis methods.

In [16]:
from cave.cavefacade import CAVE

cave = CAVE(folders=[output_dir],
            output_dir="CAVE_reports/paramnet_notebook",
            ta_exec_dir=["."],
            file_format='BOHB',
            verbose_level='OFF',
            show_jupyter=True
           )

The most interesting plot for BOHB might be a visualization of the learning curves:

In [17]:
cave.bohb_learning_curves()

<cave.analyzer.bohb_learning_curves.BohbLearningCurves at 0x7fafb40c1cf8>

We can access the individual budgets via the 'run'-keyword-argument of each analysis-method.

In [13]:
cave.overview_table()

|                                  |         |
|:---------------------------------|:-------:|
| # aggregated parallel BOHB runs  | 1       |
| # parameters                     | 6       |
| Deterministic target algorithm   | True    |
| Optimized run objective          | quality |

|   | budget 370.3703703703703 | budget 1111.111111111111 | budget 3333.333333333333 | budget 10000 |
|:--|:------------------------:|:------------------------:|:------------------------:|:------------:|
| Total time spent evaluating configurations | 0.26 sec | 0.06 sec | 0.02 sec | 0.01 sec |
| Average time per configuration (mean / std) | 0.01 sec (± 0.00) | 0.01 sec (± 0.00) | 0.01 sec (± 0.00) | 0.01 sec (± 0.00) |
| # evaluated configurations | 27 | 9 | 3 | 1 |
| # changed parameters (default to incumbent) | 6 | 6 | 6 | 6 |
| Configuration origins | Random : 20, Acquisition Function : 7 | Random : 6, Acquisition Function : 3 | Random : 1, Acquisition Function : 2 | Acquisition Function : 1 |


<cave.analyzer.overview_table.OverviewTable at 0x7fafb40d85f8>

For each budget, we can list the cost over incumbents:

In [None]:
cave.bohb_incumbents_per_budget()

For parameter-importance analysis, CAVE uses <a href="https://github.com/automl/ParameterImportance" target="_blank">PIMP</a>, a package that provides multiple approaches to parameter-importance analysis. We can invoke them directly via CAVE. To estimate the importance, random forests are used to predict the performance of configurations that were not executed. This is difficult for large budgets, on which only a few configurations were evaluated.

In [14]:
# The run has to be passed as the 'run' keyword argument and must be one of:
# ['budget_370.3703703703703', 'budget_1111.111111111111', 'budget_3333.333333333333', 'budget_10000']
# We pick the smallest budget here because it has the most evaluated configurations.
cave.cave_fanova(run='budget_370.3703703703703')

In [None]:
# 'run' must name one of the budgets of this run (see the overview table)
cave.local_parameter_importance(run='budget_370.3703703703703')

For each budget, we can compare the different parameter-importance-methods that have already been run:

In [None]:
# 'run' must name one of the budgets of this run (see the overview table)
cave.pimp_comparison_table(run='budget_370.3703703703703')

To analyze BOHB's behaviour, we can check out the configurator footprint, the cost over time and the parallel coordinates plot:

In [None]:
cave.configurator_footprint(use_timeslider=True, num_quantiles=5)

In [None]:
cave.cost_over_time()

In [None]:
# 'run' must name one of the budgets of this run (see the overview table)
cave.parallel_coordinates(run='budget_370.3703703703703')