Hydra-SMAC-Sweeper

This plugin enables Hydra applications to use SMAC for hyperparameter optimization.

SMAC (Sequential Model-Based Algorithm Configuration) is a tool for optimizing arbitrary algorithms. Its core combines Bayesian optimization with an aggressive racing mechanism to efficiently decide which of two configurations performs better. Note that SMAC is a minimizer.

The Hydra SMAC sweeper can parallelize the hyperparameter evaluations on your local machine or on a Slurm cluster. The sweeper supports every SMAC facade.

Installation

To install the Hydra-SMAC-Sweeper, first clone the repository:

git clone git@github.com:automl/hydra-smac-sweeper.git
cd hydra-smac-sweeper

In your virtual environment, install via pip:

pip install -e . --config-settings editable_mode=compat

⚠ The compat editable mode may be necessary so that the package is discovered correctly.

Please find standard approaches for configuring Hydra plugins here.

How the Hydra-SMAC-Sweeper Works and Setting Up the Cluster

When you optimize your Hydra application with the Hydra-SMAC-Sweeper, Hydra and SMAC start locally on your machine. Depending on your Dask client setup, the function evaluations then run either locally (possibly using multi-processing) or on a cluster. SMAC creates jobs/processes for the specified number of workers and keeps them open for the specified time frame; Dask then schedules the individual evaluations on these workers. This is especially useful when there are many cheap function evaluations, which would otherwise hurt job priorities on the cluster.
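As a rough illustration of this pattern (a generic Dask sketch, not the sweeper's internal code): a fixed pool of workers is created once, and many small evaluations are then scheduled onto it.

# Generic Dask sketch of the worker pattern described above (illustration only,
# not the sweeper's internal code).
from dask.distributed import Client, LocalCluster

def evaluate(x: float) -> float:
    # stand-in for one cheap function evaluation
    return (x - 2.0) ** 2

if __name__ == "__main__":
    cluster = LocalCluster(n_workers=4, threads_per_worker=1)  # pool stays open
    client = Client(cluster)
    futures = [client.submit(evaluate, float(x)) for x in range(20)]  # many small jobs
    print(client.gather(futures))
    client.close()
    cluster.close()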

Run on Cluster (Slurm Example)

In order to run SMAC's function evaluations on the cluster, we need to set up the Dask client and Dask cluster.

For the setup, add the Dask client configuration to smac_kwargs like so:

hydra:
  sweeper:
    smac_kwargs:
      dask_client:
        _target_: dask.distributed.Client
        address: ${create_cluster:${cluster},${hydra.sweeper.scenario.n_workers}}

⚠ Sometimes the cluster does not allow communication between Dask workers. In that case you can instead run a local Dask cluster inside a single (Slurm) job with enough cores.

The cluster is automatically created from the config node cluster and the number of workers defined in the scenario. This is an example configuration for the cluster itself, found in examples/configs/hpc.yaml.

# @package _global_
cluster:
  _target_: dask_jobqueue.SLURMCluster
  queue: cpu_short
  #  account: myaccount
  cores: 1
  memory: 1 GB
  walltime: 00:30:00
  processes: 1
  log_directory: tmp/smac_dask_slurm

You can specify any kwargs available in dask_jobqueue.SLURMCluster.
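For reference, the YAML above corresponds roughly to the following Python instantiation (a hedged sketch; in practice the sweeper builds the cluster via Hydra's _target_ mechanism and the create_cluster resolver, with the number of workers taken from scenario.n_workers):

# Rough Python equivalent of the example cluster config (sketch only).
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    queue="cpu_short",
    cores=1,
    memory="1 GB",
    walltime="00:30:00",
    processes=1,
    log_directory="tmp/smac_dask_slurm",
)
cluster.scale(jobs=4)  # e.g. as many workers as scenario.n_workers
print(cluster.scheduler_address)  # address handed to dask.distributed.Client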

Run Local

You can also run the optimization locally by setting the dask client to null, e.g.

python examples/multifidelity_mlp.py hydra.sweeper.smac_kwargs.dask_client=null -m


python examples/blackbox_branin.py -m

Or in the config file:

hydra:
  sweeper:
    smac_kwargs:
      dask_client: null

Usage

In your yaml-configuration file, set hydra/sweeper to SMAC:

defaults:
  - override hydra/sweeper: SMAC

You can also add hydra/sweeper=SMAC to your command line.

Hyperparameter Search Space

SMAC can optimize several types of hyperparameters (uniform floats, integers, categoricals) and can even handle conditions and forbidden clauses. The definition of the hyperparameters is based on ConfigSpace, and the syntax follows ConfigSpace's json serialization. Please see their user guide for more information on how to configure hyperparameters.

You can provide the search space either as a path to a json file stemming from ConfigSpace's serialization, or you can specify your search space directly in your yaml configuration files.
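If you go the json route, a search space can be built and serialized with ConfigSpace roughly as follows (a sketch using the ConfigSpace 0.x API; the file name search_space.json is just an example):

# Build a small search space with ConfigSpace and serialize it to json
# (ConfigSpace 0.x API; point the sweeper's search_space at the resulting file).
from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import (
    CategoricalHyperparameter,
    UniformIntegerHyperparameter,
)
from ConfigSpace.read_and_write import json as cs_json

cs = ConfigurationSpace(seed=0)
cs.add_hyperparameters([
    UniformIntegerHyperparameter("n_neurons", lower=8, upper=1024, log=True),
    CategoricalHyperparameter("activation", choices=["logistic", "tanh", "relu"]),
])

with open("search_space.json", "w") as f:
    f.write(cs_json.write(cs))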

Your yaml-configuration file must adhere to the following syntax:

hydra:
  sweeper:
    ...
    search_space:
      hyperparameters:  # required
        hyperparameter_name_0:
          ...
        hyperparameter_name_1:
          ...
        ...
      conditions:  # optional
        - ...
        - ...
      forbiddens:  # optional
        - ...
        - ...
      

The fields conditions and forbiddens are optional. Please see this example for an exemplary definition of conditions.

Defining a uniform integer parameter is easy:

n_neurons:
  type: uniform_int  # or have a float parameter by specifying 'uniform_float'
  lower: 8
  upper: 1024
  log: true  # optimize the hyperparameter in log space
  default_value: ${n_neurons}  # you can set your default value to the one normally used in your config

Same goes for categorical parameters:

activation:
  type: categorical
  choices: [logistic, tanh, relu]
  default_value: ${activation}

See below for two exemplary search spaces.

Examples

You can find examples in the examples directory.

Branin (Synthetic Function)

The first example optimizes (minimizes) a synthetic function (examples/blackbox_branin.py with the yaml-config examples/configs/branin.yaml). Branin has two (hyper-)parameters, x0 and x1, which we pass via the Hydra config. For the hyperparameter optimization (or sweep) we can easily define the search space for the uniform hyperparameters in the config file:

hydra:
  sweeper:
    ...
    search_space:
      hyperparameters:
        x0:
          type: uniform_float
          lower: -5
          upper: 10
          log: false
        x1:
          type: uniform_float
          lower: 0
          upper: 15
          log: false
          default_value: 2

To optimize Branin's hyperparameters, call

python examples/blackbox_branin.py --multirun
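The corresponding task function looks roughly like the sketch below (simplified; see examples/blackbox_branin.py for the actual code). The float returned by the Hydra main function is the value SMAC minimizes:

# Simplified sketch of the Branin task function (see examples/blackbox_branin.py
# for the real example). The returned float is minimized by SMAC.
import numpy as np
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="branin")
def branin(cfg: DictConfig) -> float:
    x0, x1 = cfg.x0, cfg.x1
    a, b, c = 1.0, 5.1 / (4 * np.pi**2), 5 / np.pi
    r, s, t = 6.0, 10.0, 1 / (8 * np.pi)
    return a * (x1 - b * x0**2 + c * x0 - r) ** 2 + s * (1 - t) * np.cos(x0) + s

if __name__ == "__main__":
    branin()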

Optimizing an MLP

This example optimizes a Multi-Layer Perceptron (MLP) using multiple budgets (examples/multifidelity_mlp.py with the yaml-config examples/configs/mlp.yaml). The search space is hierarchical: some options are only available if certain other categorical choices are selected. The budget variable is set by the intensifier for each run and can be specified in the sweeper config.

hydra:
  sweeper:
    budget_variable: max_epochs
    ...
    search_space:
      hyperparameters:
        ...
      conditions:
        - child: batch_size  # only adapt the batch size if we use sgd or adam as a solver
          parent: solver
          type: IN
          values: [sgd, adam]
        - child: learning_rate  # only adapt the learning_rate if we use sgd as the solver
          parent: solver
          type: EQ
          value: sgd
        - child: learning_rate_init
          parent: solver
          type: IN
          values: [sgd, adam]

Necessary Configuration Keys

In order to let SMAC successfully interact with your Hydra main function, you need to use the following configuration keys in your main DictConfig:

  • seed: SMAC will set DictConfig.seed and pass it to your main function.
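For illustration, a minimal sketch of reading the seed inside your task function (config and file names here are hypothetical):

# Minimal sketch: SMAC fills in cfg.seed for every evaluation; use it to make
# the run reproducible (illustration only, not the shipped example code).
import numpy as np
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="mlp")
def train(cfg: DictConfig) -> float:
    rng = np.random.default_rng(cfg.seed)  # seed provided by SMAC per run
    # ... train your model with this rng and return the validation loss ...
    return float(rng.random())  # placeholder objective

if __name__ == "__main__":
    train()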

Multi-Fidelity Optimization

In order to use multi-fidelity, you need to set the sweeper's budget_variable to the name of the config variable controlling the budget. See examples/multifidelity_mlp.py and examples/configs/mlp.yaml for how the budget variable is set. There we have budget_variable: max_epochs, indicating that max_epochs is the fidelity.
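A minimal sketch of consuming the budget inside the task function, assuming budget_variable: max_epochs as in the MLP example:

# The intensifier overrides cfg.max_epochs for each run (sketch only).
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="mlp")
def train(cfg: DictConfig) -> float:
    n_epochs = int(cfg.max_epochs)  # fidelity chosen by the intensifier
    loss = 1.0
    for _ in range(n_epochs):
        loss *= 0.95  # placeholder for one training epoch
    return loss

if __name__ == "__main__":
    train()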

Using Instances

In order to use instances, read cfg.instance in your main function to determine which instance to evaluate.
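A minimal sketch, assuming the sweeper writes the instance to evaluate into cfg.instance:

# cfg.instance identifies the problem instance for this evaluation (sketch only;
# here it is hypothetically used to pick a dataset split).
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="mlp")
def train(cfg: DictConfig) -> float:
    instance = cfg.instance  # set for each evaluation
    # ... load the dataset / problem identified by `instance` and evaluate ...
    return 0.0  # placeholder cost on this instance

if __name__ == "__main__":
    train()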
