This plugin enables Hydra applications to use SMAC for hyperparameter optimization.
SMAC (Sequential Model-Based Algorithm Configuration) can optimize arbitrary algorithms. Its core combines Bayesian optimization with an aggressive racing mechanism to efficiently decide which of two configurations performs better. SMAC is a minimizer.
The Hydra SMAC sweeper can parallelize the hyperparameter evaluations on your local machine or on a Slurm cluster. The sweeper supports every SMAC facade.
To install the Hydra-SMAC-Sweeper, first clone the repository:
git clone git@github.com:automl/hydra-smac-sweeper.git
cd hydra-smac-sweeper
In your virtual environment, install via pip:
pip install -e . --config-settings editable_mode=compat
⚠ The compat mode can be necessary for the package to be discovered correctly.
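To verify the editable install, you can query pip (assuming the distribution is named hydra-smac-sweeper, matching the repository name):
pip show hydra-smac-sweeper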
Please find standard approaches for configuring Hydra plugins here.
If you optimize your Hydra application with the Hydra-SMAC-Sweeper, Hydra and SMAC start locally on your machine. Depending on your dask client setup, the function evaluations then run either locally (possibly using multi-processing) or on a cluster. SMAC creates jobs/processes for the specified number of workers and keeps them open for the specified time frames; dask can then schedule smaller jobs on the created workers. This is especially useful when there are many cheap function evaluations that would otherwise hurt job priorities on the cluster.
In order to run SMAC's function evaluations on the cluster, we need to set up the dask client and dask cluster. For the setup, we add the dask client configuration to the smac_kwargs like so:
hydra:
  sweeper:
    smac_kwargs:
      dask_client:
        _target_: dask.distributed.Client
        address: ${create_cluster:${cluster},${hydra.sweeper.scenario.n_workers}}
⚠ Sometimes the cluster does not allow communication between dask workers. In that case, you can run a local dask cluster on your (Slurm) cluster inside a single job with enough cores, as sketched below.
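A minimal sketch of that workaround (worker counts are illustrative): when no address is given, dask.distributed.Client starts a local cluster itself and forwards keyword arguments such as n_workers to it.
hydra:
  sweeper:
    smac_kwargs:
      dask_client:
        _target_: dask.distributed.Client
        n_workers: 4            # illustrative; forwarded to the implicit local cluster
        threads_per_worker: 1   # one thread per worker process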
The cluster is automatically created from the config node cluster and the number of workers defined in the scenario.
This is an example configuration for the cluster itself, found in examples/configs/hpc.yaml.
# @package _global_
cluster:
  _target_: dask_jobqueue.SLURMCluster
  queue: cpu_short
  # account: myaccount
  cores: 1
  memory: 1 GB
  walltime: 00:30:00
  processes: 1
  log_directory: tmp/smac_dask_slurm
You can specify any kwargs available in dask_jobqueue.SLURMCluster.
You can also run it locally by specifying the dask client to be null, e.g.
python examples/multifidelity_mlp.py hydra.sweeper.smac_kwargs.dask_client=null -m
python examples/blackbox_branin.py -m
Or in the config file:
hydra:
  sweeper:
    smac_kwargs:
      dask_client: null
In your yaml-configuration file, set hydra/sweeper to SMAC:
defaults:
  - override hydra/sweeper: SMAC
You can also add hydra/sweeper=SMAC to your command line.
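For example (my_app.py is a placeholder for your own application):
python my_app.py hydra/sweeper=SMAC --multirun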
SMAC can optimize several types of hyperparameters (uniform floats, integers, and categoricals) and can even handle conditions and forbiddens. The definition of the hyperparameters is based on ConfigSpace. The hyperparameter syntax follows ConfigSpace's json serialization. Please see their user guide for more information on how to configure hyperparameters.
You can provide the search space either as a path to a json file created by ConfigSpace's serialization, or you can specify your search space directly in your yaml configuration files.
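A sketch of the json route, assuming ConfigSpace's read_and_write.json module (available in older ConfigSpace versions; newer versions offer equivalent to_json helpers):
from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import UniformIntegerHyperparameter
from ConfigSpace.read_and_write import json as cs_json

# build a space with a single log-scaled integer hyperparameter
cs = ConfigurationSpace()
cs.add_hyperparameter(UniformIntegerHyperparameter("n_neurons", lower=8, upper=1024, log=True))

# write the space in ConfigSpace's json serialization format
with open("search_space.json", "w") as f:
    f.write(cs_json.write(cs))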
Your yaml-configuration file must adhere to the following syntax:
hydra:
  sweeper:
    ...
    search_space:
      hyperparameters:  # required
        hyperparameter_name_0:
          ...
        hyperparameter_name_1:
          ...
        ...
      conditions:  # optional
        - ...
        - ...
      forbiddens:  # optional
        - ...
        - ...
The fields conditions and forbiddens are optional. Please see this example for an exemplary definition of conditions.
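For forbiddens, the syntax likewise mirrors ConfigSpace's json serialization. As an illustrative, unverified sketch (field names follow ConfigSpace's json format, values are made up), forbidding the combination solver=sgd with batch_size=16 might look like:
forbiddens:
  - type: AND
    clauses:
      - name: solver
        type: EQUALS
        value: sgd
      - name: batch_size
        type: EQUALS
        value: 16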
Defining a uniform integer parameter is easy:
n_neurons:
  type: uniform_int  # or have a float parameter by specifying 'uniform_float'
  lower: 8
  upper: 1024
  log: true  # optimize the hyperparameter in log space
  default_value: ${n_neurons}  # you can set your default value to the one normally used in your config
Same goes for categorical parameters:
activation:
  type: categorical
  choices: [logistic, tanh, relu]
  default_value: ${activation}
See below for two exemplary search spaces.
You can find examples in this directory.
The first example is optimizing (minimizing) a synthetic function (examples/blackbox_branin.py with the yaml-config examples/configs/branin.yaml).
Branin has two (hyper-)parameters, x0 and x1, which we pass via the hydra config.
For the hyperparameter optimization (or sweep) we can easily define the search
space for the uniform hyperparameters in the config file:
hydra:
  sweeper:
    ...
    search_space:
      hyperparameters:
        x0:
          type: uniform_float
          lower: -5
          upper: 10
          log: false
        x1:
          type: uniform_float
          lower: 0
          upper: 15
          log: false
          default_value: 2
To optimize Branin's hyperparameters, call
python examples/blackbox_branin.py --multirun
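For reference, the core of such a Hydra main function is a function decorated with @hydra.main that returns the value SMAC should minimize. A minimal sketch (the actual example file may differ in details):
import numpy as np
import hydra
from omegaconf import DictConfig


@hydra.main(config_path="configs", config_name="branin", version_base=None)
def branin(cfg: DictConfig) -> float:
    x0, x1 = cfg.x0, cfg.x1
    # the Branin function with its standard constants
    b = 5.1 / (4.0 * np.pi ** 2)
    c = 5.0 / np.pi
    t = 1.0 / (8.0 * np.pi)
    y = (x1 - b * x0 ** 2 + c * x0 - 6.0) ** 2 + 10.0 * (1.0 - t) * np.cos(x0) + 10.0
    return y  # SMAC minimizes the returned value


if __name__ == "__main__":
    branin()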
The second example optimizes a Multi-Layer Perceptron (MLP) using multiple budgets (examples/multifidelity_mlp.py with the yaml-config examples/configs/mlp.yaml).
The search space is hierarchical: some options are only available if certain categorical choices are selected.
The budget variable is set by the intensifier for each run and can be specified in the sweeper config.
hydra:
  sweeper:
    budget_variable: max_epochs
    ...
    search_space:
      hyperparameters:
        ...
      conditions:
        - child: batch_size  # only adapt the batch size if we use sgd or adam as the solver
          parent: solver
          type: IN
          values: [sgd, adam]
        - child: learning_rate  # only adapt the learning rate if we use sgd as the solver
          parent: solver
          type: EQ
          value: sgd
        - child: learning_rate_init
          parent: solver
          type: IN
          values: [sgd, adam]
In order to let SMAC successfully interact with your Hydra main function, you need to use the following configuration keys in your main DictConfig:
- seed: SMAC will set DictConfig.seed and pass it to your main function.
In order to use multi-fidelity, you need to set cfg.budget_variable to the name of the config variable controlling the budget. You can find an example in examples/multifidelity_mlp.py and examples/configs/mlp.yaml showing how we set the budget variable. Here we have budget_variable=max_epochs, indicating that max_epochs is the fidelity.
In order to use instances, you need to use cfg.instance to set your instance in your main function.
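Putting these keys together, a multi-fidelity main function might look like the following sketch (run_training is a hypothetical helper standing in for your training code):
import hydra
from omegaconf import DictConfig


@hydra.main(config_path="configs", config_name="mlp", version_base=None)
def train(cfg: DictConfig) -> float:
    seed = cfg.seed              # set by SMAC for every evaluation
    max_epochs = cfg.max_epochs  # the budget variable, set by the intensifier per run
    # instance = cfg.instance    # read this only when optimizing across instances

    # run_training is hypothetical: train for max_epochs epochs with the given seed
    validation_error = run_training(cfg, seed=seed, epochs=int(max_epochs))
    return validation_error      # SMAC minimizes the returned value


if __name__ == "__main__":
    train()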