# BOAH Tool Suite - Example 1
# Visualizing Hyperband-based optimization using the fmin function

In this notebook, we present an example of how to connect our tools <a href="https://github.com/automl/ConfigSpace" target="_blank">ConfigSpace</a>, <a href="https://github.com/automl/CAVE" target="_blank">CAVE</a> and <a href="https://github.com/automl/HpBandSter" target="_blank">HpBandSter</a> to efficiently optimize a neural network and subsequently analyze and visualize the optimization process.

## Running BOHB using the fmin interface

### 1.1) Define a configuration space

Every optimization problem needs a description of its search space. In HpBandSter, the <a href="https://github.com/automl/ConfigSpace" target="_blank">ConfigurationSpace</a> object defines all hyperparameters, their ranges, and the dependencies between them.

In our example, the search space consists of the following hyperparameters:

|         Name        |     Type    |      Values      |     Condition    |
|:-------------------:|:-----------:|:----------------:|:----------------:|
| activation-function | categorical | {'relu', 'tanh'} |                  |
|    learning-rate    |    float    |   [1e-6, 1e-2]   |                  |
|        solver       | categorical |  {'sgd', 'adam'} |                  |
|        beta_1       |    float    |      [0, 1]      | solver == 'adam' |
|        beta_2       |    float    |      [0, 1]      | solver == 'adam' |

### 1.2) Function to optimize

To perform a BOHB run on the local machine, we use the ``fmin`` interface provided by the BOAH tool suite.
It starts a local BOHB run that optimizes the given function over the given ``ConfigSpace``; the requirements for the function are described below.

The ``fmin`` interface lets you run BOHB quickly by initializing its core elements automatically. If you need more control, for example to run BOHB on a cluster, please refer to the <a href="https://automl.github.io/HpBandSter/" target="_blank">more detailed documentation of BOHB</a>.

To use the ``fmin`` interface, your function must satisfy the following requirements:

- accept a parameter ``budget``
- accept all hyperparameters defined in the configuration space as arguments
- return a single Python scalar: the objective value of your model to be minimized
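A minimal, hypothetical function that meets these requirements is sketched below. It does not train a model; it returns a synthetic loss so that the expected signature can be tried out without any data. The hyperparameter names are hypothetical, and so is the assumption that inactive conditional hyperparameters (`beta_1`, `beta_2` when `solver == 'sgd'`) are simply not passed and therefore default to `None`.

```python
import math


def toy_objective(budget, activation, learning_rate, solver, beta_1=None, beta_2=None):
    """Hypothetical stand-in for a target function such as optimize_mlp_on_digits.

    Returns a synthetic "validation error" in [0, 1]; lower is better.
    """
    # Penalize learning rates far from 1e-3 on a log10 scale
    lr_penalty = abs(math.log10(learning_rate) + 3) / 10
    # Pretend that 'relu' and 'adam' work slightly better on this task
    act_penalty = 0.0 if activation == 'relu' else 0.05
    solver_penalty = 0.0 if solver == 'adam' else 0.05
    if solver == 'adam':
        # beta_1 and beta_2 are only active (and, by assumption, only passed) for 'adam'
        solver_penalty += 0.1 * abs(beta_1 - 0.9) + 0.1 * abs(beta_2 - 0.999)
    # A larger budget (e.g. more training epochs) lowers the error, with diminishing returns
    budget_term = 1.0 / (1.0 + budget)
    loss = min(1.0, lr_penalty + act_penalty + solver_penalty + budget_term)
    # fmin expects a single Python scalar to minimize
    return float(loss)
```

For example, `toy_objective(budget=5, activation='tanh', learning_rate=1e-5, solver='sgd')` works without passing `beta_1` or `beta_2`.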

More information about using ``fmin`` is provided in the 
<a href="https://github.com/automl/BOAH/blob/master/boah/scripts/FMin.py" target="_blank">fmin documentation</a>.

Both the configuration space and the function to optimize are defined in `examples/mlp_on_digits` to keep this notebook readable.

**Note**:
For later analysis, it is crucial to save the configuration space to a file, for example with ConfigSpace's <a href="https://automl.github.io/ConfigSpace/master/API-Doc.html#module-ConfigSpace.read_and_write.json" target="_blank">json-serializer</a>. ``fmin`` does this automatically for you, but keep it in mind if you use the standard BOHB interface.

In [None]:
# Make the modules in the directory two levels up importable
import os
import sys
module_path = os.path.abspath(os.path.join('..', '..'))
if module_path not in sys.path:
    sys.path.append(module_path)

# "optimize_mlp_on_digits" is the target algorithm to optimize
# "get_configspace" returns the ConfigurationSpace object describing the search space
# "load_digits" loads the training and validation data
from examples.mlp_on_digits.helper_functions import optimize_mlp_on_digits, get_configspace, load_digits

### 1.3) Starting the optimization run

**NOTE:** While ``fmin`` runs the optimization, it stores the configuration space as well as the results obtained by BOHB to files. These files are required to analyze the run with CAVE.

We load the configuration space and the data, and create a directory to store the results.

In [None]:
import os

# Load the configuration space
config_space = get_configspace()

# Create an output directory
out_dir = 'example_mlp_on_digits'
os.makedirs(out_dir, exist_ok=True)

# Load the digits data (training and validation sets)
train, valid = load_digits()

To keep the output readable, we suppress all log messages below level ERROR.

In [None]:
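import logging
logging.basicConfig(level=logging.ERROR)
# basicConfig does nothing if handlers exist already, so also set the level directly
logging.getLogger().setLevel(logging.ERROR)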

**Let's start the optimization run**:

In [None]:
import warnings
from scripts.FMin import fmin

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    
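    inc_value, inc_cfg, result = fmin(
        optimize_mlp_on_digits, config_space,
        func_args=(train, valid),  # additional arguments for the target function
        eta=2,                     # each successive-halving stage keeps 1/eta of the configurations
        min_budget=5,              # minimum budget
        max_budget=200,            # maximum budget
        num_iterations=10,         # number of BOHB iterations (successive-halving brackets)
        num_workers=1,             # number of parallel workers
        output_dir=out_dir)        # where the results and the configuration space are stored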
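The budgets BOHB evaluates follow from `min_budget`, `max_budget` and `eta`. The sketch below reproduces in plain Python the formulas that, as far as we understand HpBandSter's BOHB/Hyperband implementation, determine the budget ladder and the number of configurations per successive-halving stage; the function names are hypothetical. With `min_budget=5`, `max_budget=200` and `eta=2`, the budgets are 6.25, 12.5, 25, 50, 100 and 200, which is why budgets such as 12.5 and 25 show up in the CAVE analysis further down.

```python
import math


def hyperband_budgets(min_budget, max_budget, eta):
    """Geometric budget ladder from min_budget (rounded) up to max_budget."""
    # Number of successive-halving stages in the largest bracket
    max_sh_iter = -int(math.log(min_budget / max_budget) / math.log(eta)) + 1
    # Budgets grow by a factor of eta from stage to stage, ending at max_budget
    return [float(max_budget) * eta ** (-i) for i in range(max_sh_iter - 1, -1, -1)]


def bracket(iteration, min_budget, max_budget, eta):
    """(num_configs, budget) pairs for the successive-halving bracket of one iteration."""
    budgets = hyperband_budgets(min_budget, max_budget, eta)
    max_sh_iter = len(budgets)
    # Brackets cycle from most aggressive (many cheap runs) to least aggressive
    s = max_sh_iter - 1 - (iteration % max_sh_iter)
    n0 = int(math.floor(max_sh_iter / (s + 1)) * eta ** s)
    # Keep 1/eta of the configurations at each stage, but at least one
    num_configs = [max(int(n0 * eta ** (-i)), 1) for i in range(s + 1)]
    return list(zip(num_configs, budgets[-s - 1:]))
```

For example, `bracket(0, 5, 200, 2)` gives `[(32, 6.25), (16, 12.5), (8, 25.0), (4, 50.0), (2, 100.0), (1, 200.0)]`: the first bracket starts 32 configurations on the smallest budget and keeps half of them at each stage, whereas `bracket(5, 5, 200, 2)` evaluates 6 configurations directly on the full budget.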

Besides the best configuration found and its loss value, ``fmin`` returns a Result object generated by HpBandSter.
The <a href="https://automl.github.io/HpBandSter/build/html/core/result.html" target="_blank">Result object</a> offers access to basic statistics as well as to the incumbent trajectory.

In [None]:
id2config = result.get_id2config_mapping()
incumbent_trajectory = result.get_incumbent_trajectory()

print('A total of {} unique configurations were sampled.\n'
      'A total of {} runs were executed.\n'
      'Best configuration found: {}'.format(
          len(id2config),
          len(result.get_all_runs()),
          id2config[result.get_incumbent_id()]['config']))

The ``incumbent trajectory`` is a dictionary with the IDs of the successive incumbent configurations,
the times at which their runs finished, their respective budgets,
and the corresponding losses. It can be used to plot the progress of the optimization process.
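To inspect the trajectory, a small helper can turn the dictionary into rows and, if matplotlib is installed, draw a step plot of the incumbent loss over time. The sketch assumes the layout of HpBandSter's `get_incumbent_trajectory()` output as we understand it (parallel lists under the keys `config_ids`, `times_finished`, `budgets` and `losses`) and that times are given in seconds; the helper names are hypothetical.

```python
def trajectory_rows(trajectory):
    """Turn an incumbent trajectory into (time_finished, budget, loss, config_id) rows."""
    return list(zip(trajectory['times_finished'], trajectory['budgets'],
                    trajectory['losses'], trajectory['config_ids']))


def plot_trajectory(trajectory, ax=None):
    """Step plot of the incumbent loss over time (requires matplotlib)."""
    # Imported lazily so that trajectory_rows works without matplotlib
    import matplotlib.pyplot as plt

    if ax is None:
        _, ax = plt.subplots()
    rows = trajectory_rows(trajectory)
    times = [t for t, _, _, _ in rows]
    losses = [loss for _, _, loss, _ in rows]
    # where='post' keeps each incumbent's loss until the next incumbent appears
    ax.step(times, losses, where='post')
    ax.set_xlabel('time [s]')
    ax.set_ylabel('incumbent loss')
    return ax
```

In this notebook, `plot_trajectory(incumbent_trajectory)` would plot the trajectory retrieved from the Result object, and `trajectory_rows(incumbent_trajectory)` lists the same data together with the budgets and configuration IDs.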

## Using the results in CAVE

### 2.1) Creating an HTML report with CAVE

Creating the report with CAVE is straightforward: provide the output directory of the BOHB run via CAVE's `--folders` argument and set `--file_format` to `BOHB`. You can do this from the command line (in a notebook, a leading '!' executes the rest of the line as a shell command):


In [None]:
%%capture
! cave --folders example_mlp_on_digits --file_format BOHB --output CAVE_bash_mlp_on_digits --verbose_level OFF

Once CAVE has finished the report, you can open it in your favorite browser.

In [None]:
! firefox ./CAVE_bash_mlp_on_digits/report.html
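Firefox may not be installed on every machine. As a portable alternative, Python's standard `webbrowser` module can open the report in the default browser; the helper names below are hypothetical. Note that on a remote Jupyter server, this opens a browser on the server, not on your own machine.

```python
import pathlib
import webbrowser


def report_uri(report_dir, filename='report.html'):
    """Return the absolute file:// URI of a report inside report_dir."""
    return (pathlib.Path(report_dir) / filename).resolve().as_uri()


def open_report(report_dir, filename='report.html'):
    """Open the report in the default browser; returns False if no browser was launched."""
    return webbrowser.open(report_uri(report_dir, filename))
```

For example, `open_report('CAVE_bash_mlp_on_digits')` opens the report created by the command above.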

### 2.2) Using CAVE from within Python

You can also use CAVE as a Python module: import and instantiate it with arguments similar to the command-line options. By default, CAVE displays all analysis results directly in the Jupyter cell. At the same time, it builds the HTML report, so you don't have to rerun time-consuming analysis methods.

In [None]:
from cave.cavefacade import CAVE

cave = CAVE(folders=["example_mlp_on_digits"],
            output_dir="CAVE_python_mlp_on_digits",
            ta_exec_dir=["."],
            file_format='BOHB',
            verbose_level='OFF'
           )

The most interesting plot for BOHB might be a visualization of the learning curves:

In [None]:
cave.bohb_learning_curves()

Each analysis method accepts a `run` keyword argument that selects an individual budget.

In [None]:
cave.overview_table(run='budget_12.5')

For each budget, we can list the incumbents and their costs:

In [None]:
cave.bohb_incumbents_per_budget()

In [None]:
cave.performance_table(run='budget_25')

For parameter-importance analysis, CAVE uses <a href="https://github.com/automl/ParameterImportance" target="_blank">PIMP</a>, a package that implements several approaches to parameter importance, all of which can be invoked via CAVE. To estimate importance, random forests are trained to predict the performance of configurations that were not evaluated. This is difficult for large budgets, on which only few configurations were evaluated.
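Because the random forests can only learn from evaluated configurations, it helps to check how many runs are available per budget before choosing one for the importance analysis. The sketch below counts them; it assumes, as we understand HpBandSter, that each run returned by `result.get_all_runs()` has `budget` and `loss` attributes, with `loss` set to `None` for crashed runs. The helper name is hypothetical.

```python
from collections import Counter


def runs_per_budget(runs, only_successful=True):
    """Count evaluated runs per budget, sorted from the smallest to the largest budget.

    Runs whose loss is None (assumed to mean crashed) are skipped when
    only_successful is True.
    """
    counts = Counter(
        run.budget for run in runs
        if not only_successful or run.loss is not None
    )
    return dict(sorted(counts.items()))
```

In this notebook, `runs_per_budget(result.get_all_runs())` should show many runs on the small budgets and only a few on the largest ones.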

In [None]:
cave.cave_fanova(run='budget_12.5')

In [None]:
cave.local_parameter_importance(run='budget_12.5')

For each budget, we can compare the different parameter importance methods that have already been run:

In [None]:
cave.pimp_comparison_table(run='budget_12.5')

To analyze BOHB's behavior, we can look at the configurator footprint, the cost over time, and a parallel-coordinates plot of the parameters:

In [None]:
cave.configurator_footprint(run='budget_12.5', use_timeslider=True, num_quantiles=5)

In [None]:
cave.cost_over_time(run="budget_25")

In [None]:
cave.parallel_coordinates(run='budget_12.5')