# Bayesian Optimization Frameworks
> A comparison of packages for Bayesian optimization in Python (and R).

- toc: true
- categories: [Bayesian optimization]

Bayesian Optimization (BO) is a method for optimizing expensive black-box functions by means of a probabilistic surrogate model and an acquisition function that specifies a trade-off between exploration (improving the surrogate) and exploitation (finding the minimizer of the surrogate and thus of the black-box function).

The **surrogate model** is typically a Gaussian process (GP) or a tree-based model (RF, GBT, ...).
These models have their own parameters, which need to be estimated via maximum likelihood (ML) or integrated out.
The ML point estimate is typically found with a gradient-based local optimizer combined with a global optimization heuristic such as multiple restarts to address the non-convexity of the problem.
To integrate out the model parameters, either Markov chain Monte Carlo (MCMC) or approximate variational inference (VI) methods are used, together with an assumption on the prior distribution.
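
To make the ML step concrete, here is a minimal sketch of multi-start hyperparameter fitting for a 1-D GP with a squared-exponential kernel. It uses plain NumPy and SciPy rather than any of the packages compared below. The kernel, the toy data, the log-space bounds and the helper names (`rbf_kernel`, `neg_log_marginal_likelihood`, `fit_hyperparameters`) are assumptions made for illustration. Gradients are approximated by finite differences here, whereas real packages usually provide analytic or automatic gradients.

```python
import numpy as np
from scipy.optimize import minimize


def rbf_kernel(x1, x2, lengthscale, variance):
    """Squared-exponential kernel for 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def neg_log_marginal_likelihood(log_params, x, y):
    """Negative log marginal likelihood of a zero-mean GP.

    The parameters are optimized in log-space to keep them positive.
    """
    lengthscale, variance, noise = np.exp(log_params)
    K = rbf_kernel(x, x, lengthscale, variance) + (noise + 1e-8) * np.eye(len(x))
    try:
        L = np.linalg.cholesky(K)
    except np.linalg.LinAlgError:
        return 1e10  # treat numerically singular kernels as very bad fits
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(x) * np.log(2 * np.pi)


def fit_hyperparameters(x, y, n_restarts=10, seed=0):
    """Multi-start L-BFGS-B: keep the best of several local optima."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        start = rng.uniform(-3.0, 3.0, size=3)  # random start in log-space
        result = minimize(neg_log_marginal_likelihood, start, args=(x, y),
                          method="L-BFGS-B", bounds=[(-5.0, 5.0)] * 3)
        if best is None or result.fun < best.fun:
            best = result
    return np.exp(best.x), best.fun


rng = np.random.default_rng(1)
x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(20)
(lengthscale, variance, noise), nll = fit_hyperparameters(x_train, y_train)
print(f"lengthscale={lengthscale:.2f} variance={variance:.2f} noise={noise:.4f}")
```

Each restart can end in a different local optimum of the likelihood; keeping the best one is the simple global heuristic referred to above.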

**Search spaces** define the possible inputs.
When applying BO to tune hyperparameters of machine learning models, the inputs include continuous, discrete and categorical variables and may include conditionals (think of having to choose beta1 and beta2 only when selecting Adam).
When applying BO to the optimization of real-world processes, there are typically some equality or inequality constraints that need to be satisfied.
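
As an illustration of what such a search space looks like, the following sketch samples from a mixed space with one conditional branch and checks two simple constraints. All parameter names, ranges and constraints are hypothetical and not taken from any of the frameworks discussed here.

```python
import math
import random


def sample_configuration(rng):
    """Draw one random configuration from a mixed, conditional search space."""
    config = {
        # continuous, sampled log-uniformly between 1e-5 and 1e-1
        "learning_rate": 10 ** rng.uniform(-5.0, -1.0),
        # discrete (integer), both bounds inclusive
        "n_layers": rng.randint(1, 5),
        # categorical
        "optimizer": rng.choice(["sgd", "adam"]),
    }
    # conditional: beta1 and beta2 only exist when Adam is selected
    if config["optimizer"] == "adam":
        config["beta1"] = rng.uniform(0.8, 0.99)
        config["beta2"] = rng.uniform(0.9, 0.9999)
    return config


def is_feasible(x, total=1.0):
    """Example constraints for a process with three inputs."""
    equality = math.isclose(sum(x), total, abs_tol=1e-9)  # fractions sum to `total`
    inequality = x[0] <= x[1]                             # input 0 must not exceed input 1
    return equality and inequality


rng = random.Random(0)
for _ in range(3):
    print(sample_configuration(rng))
print(is_feasible([0.2, 0.3, 0.5]), is_feasible([0.5, 0.3, 0.2]))
```

A surrogate model and an acquisition optimizer both have to cope with such inputs, which is why search-space support differs so much between packages.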

**Acquisition functions** either have an analytic form or need to be approximated via Monte Carlo methods.
We want to find the minimizer of the acquisition function over the search space; this is the point at which the black-box function is evaluated next.
This is typically done using a gradient-free or gradient-based local optimizer or via MC sampling, depending on the nature of the acquisition function.
The optimizer needs to handle the search space including its constraints.
In multi-objective BO the black-box function has multiple outputs.
Here the acquisition function needs to guide the search towards exploring the Pareto front.
A comparison of acquisition functions is given here: [single-objective](https://davidwalz.github.io/blog/bayesian%20optimization/2020/06/20/bayesopt-acquisitions-single.html), [multi-objective](https://davidwalz.github.io/blog/bayesian%20optimization/2020/06/20/bayesopt-acquisitions-multi.html).
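
The difference between an analytic and a Monte Carlo acquisition can be seen on expected improvement (EI) at a single point. The sketch below assumes a Gaussian posterior with known mean and standard deviation and a minimization problem; the function names and the numbers are illustrative only. EI is written here as a quantity to maximize; minimizing its negative is equivalent.

```python
import random
from statistics import NormalDist


def expected_improvement(mean, std, best):
    """Analytic EI for minimization under a Gaussian posterior N(mean, std^2)."""
    if std <= 0.0:
        return max(best - mean, 0.0)
    z = (best - mean) / std
    standard_normal = NormalDist()
    return (best - mean) * standard_normal.cdf(z) + std * standard_normal.pdf(z)


def expected_improvement_mc(mean, std, best, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the same quantity: E[max(best - f, 0)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        f = rng.gauss(mean, std)  # one draw from the posterior at this point
        total += max(best - f, 0.0)
    return total / n_samples


analytic = expected_improvement(mean=0.3, std=0.5, best=0.0)
estimate = expected_improvement_mc(mean=0.3, std=0.5, best=0.0)
print(f"analytic EI = {analytic:.4f}, MC estimate = {estimate:.4f}")
```

The MC form is more expensive and noisy, but it carries over to acquisitions for which no closed form exists, for example when several points are proposed jointly.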

There are a number of general-purpose (not focussing only on hyperparameter tuning) BO frameworks available in Python and R. 
In this post the main frameworks are compared in terms of supported features and development & support activity.

## Feature comparison

All packages in this comparison support GP surrogates with analytic acquisitions EI/PI/CB over continuous search spaces.
Beyond that, it's interesting to note that no package currently provides all options in terms of surrogates, acquisitions and search spaces, so the best choice really depends on the type of problem you want to apply BO to.

| Name | Surrogates | Hyperparameter handling | Single-objective acquisitions | Multi-objective acquisitions | Search space |
| ------------------------------------------------------------------------- | ------------------------------------------------------------------------ | ----------------------------------------------------- | ---------------------------- | ----------------------------------------------- | --------------- |
| [mlrMBO](https://mlr-org.github.io/mlrMBO) | GP, RF ([mlr](https://mlr.mlr-org.com/)) | ML | EI, CB, AEI, EQI, AdaCB | DIB | continuous, integer, categorical + constraints |
| [pyGPGO](http://pygpgo.readthedocs.io) | GP (native), GBT, RF, ET ([scikit-learn](https://scikit-learn.org)) | ML, MCMC ([pymc3](https://docs.pymc.io/)) | EI, PI, CB | - | continuous, integer |
| [scikit-optimize](https://scikit-optimize.github.io) | GP, RF, GBT ([scikit-learn](https://scikit-learn.org)) | ML | EI, PI, CB | - | continuous, discrete, categorical + constraints |
| [GPyOpt](https://github.com/SheffieldML/GPyOpt) | GP ([GPy](http://sheffieldml.github.io/GPy)) | ML, MCMC | EI, PI, CB | - | continuous, discrete, categorical + constraints |
| [GPflowOpt](https://github.com/GPflow/GPflowOpt) | GP ([GPflow](https://github.com/GPflow/GPflow)) | ML | EI, PI, CB, MES, PF | HVPI | continuous |
| [BoTorch](https://botorch.org/) | GP ([GPyTorch](https://github.com/cornellius-gp/gpytorch)), extensible | ML, MCMC (Pyro)| EI, PI, CB, qMES, qKG, extensible | custom scalarizations, HVEI | continuous + linear constraints |
| [Emukit](https://github.com/amzn/emukit) | GP ([GPy](http://sheffieldml.github.io/GPy)), extensible | ML | EI, PI, CB, ES, MES, PF | - | continuous, integer, categorical + constraints |
| [DragonFly](https://github.com/dragonfly/dragonfly) | GP (native) | ML, posterior sampling | EI, CB, PI, TTEI, TS | scalarization with CB/TS | continuous, categorical |
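
The common denominator of the table (a GP surrogate, an analytic acquisition and a continuous search space) fits into a short loop. The sketch below uses plain NumPy with fixed GP hyperparameters, a confidence-bound acquisition and a discretized 1-D search space. The test function, the kernel settings and the exploration weight of 2 are assumptions; none of the listed packages is implemented this way.

```python
import numpy as np


def rbf(a, b, lengthscale=0.3):
    """Squared-exponential kernel with unit variance for 1-D inputs."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_test)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def black_box(x):
    """Stand-in for the expensive function (hypothetical test problem)."""
    return np.sin(3.0 * x) + 0.5 * x


rng = np.random.default_rng(0)
candidates = np.linspace(-2.0, 2.0, 401)  # continuous space, discretized for simplicity
x_obs = rng.uniform(-2.0, 2.0, size=3)    # initial design
y_obs = black_box(x_obs)

for _ in range(15):
    mean, std = gp_posterior(x_obs, y_obs, candidates)
    lcb = mean - 2.0 * std                # confidence bound for minimization
    x_next = candidates[np.argmin(lcb)]   # minimizer of the acquisition
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, black_box(x_next))

print(f"best x = {x_obs[np.argmin(y_obs)]:.3f}, best y = {y_obs.min():.3f}")
```

Everything the packages add on top (hyperparameter estimation, other surrogates and acquisitions, mixed and constrained search spaces, batching) replaces one of the simplifications made here.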

## Activity comparison

The following table gives an impression of the popularity and development activity of the frameworks, based on GitHub statistics.
Judged by stars, contributors, commits and issues taken together, the most active projects are scikit-optimize, BoTorch (together with Ax, which builds on BoTorch) and mlrMBO.

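```python
#hide_input
# `util` appears to be a helper module local to this post; it is not shown here.
import util

df = util.compare_repos([
    'mlr-org/mlrMBO',
    'SheffieldML/GPyOpt',
    'scikit-optimize/scikit-optimize',
    'GPflow/GPflowOpt',
    'pytorch/botorch',
    'amzn/emukit',
    'dragonfly/dragonfly',
    'josejimenezluna/pyGPGO',
    'facebook/Ax'
])
# Rank the repositories by the number of closed issues, most active first
df.sort_values(by='closed_issues', ascending=False)
```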

| name | stars | forks | contributors | commits | open_issues | closed_issues | created | last_commit | license |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| scikit-optimize | 1836 | 349 | 55 | 1500 | 150 | 769 | 2016-03-20 | 2020-05-18 | BSD-3-Clause |
| botorch | 1630 | 137 | 28 | 656 | 24 | 437 | 2018-07-30 | 2020-06-24 | MIT |
| mlrMBO | 162 | 40 | 14 | 1600 | 84 | 410 | 2013-10-23 | 2020-06-15 | NOASSERTION |
| Ax | 1179 | 113 | 46 | 608 | 21 | 318 | 2019-02-09 | 2020-06-29 | MIT |
| emukit | 218 | 64 | 20 | 282 | 31 | 278 | 2018-09-04 | 2020-06-03 | Apache-2.0 |
| GPyOpt | 621 | 190 | 35 | 504 | 89 | 235 | 2014-08-13 | 2020-03-19 | BSD-3-Clause |
| GPflowOpt | 213 | 51 | 5 | 426 | 22 | 97 | 2017-04-28 | 2018-09-12 | Apache-2.0 |
| dragonfly | 498 | 57 | 8 | 393 | 17 | 49 | 2018-04-20 | 2020-03-13 | MIT |
| pyGPGO | 182 | 47 | 2 | 292 | 7 | 18 | 2016-11-23 | 2019-06-15 | MIT |


## GPyOpt
* GPy [[code]](https://github.com/SheffieldML/GPy) [[doc]](https://gpy.readthedocs.io/en/latest/)
* GPyOpt [[code]](https://github.com/SheffieldML/GPyOpt) [[doc]](https://gpyopt.readthedocs.io/en/latest/)

GPyOpt is a BO package built on top of the hugely popular GPy for flexible GP modeling. Both are developed at the University of Sheffield.
Together with mlrMBO, GPyOpt has been around the longest. However, development of GPyOpt has somewhat stalled.

## Scikit-Optimize
[[code]](https://github.com/scikit-optimize/scikit-optimize)
[[doc]](https://scikit-optimize.github.io/)

Scikit-Optimize is an actively developed and well-polished BO package based on the GP and tree-based models in Scikit-Learn. Model parameter integration and multi-objective optimization are not supported. Compared to GPy, the GP modeling in Scikit-Learn is rather rudimentary. Mixed search spaces and constraints are supported, as well as external, delayed and batched evaluations. For hyperparameter tuning in Scikit-Learn there is a drop-in replacement for `GridSearchCV` / `RandomizedSearchCV` (`BayesSearchCV`).

## GPflowOpt (with TF & GPflow)
* GPflow [[code]](https://github.com/GPflow/GPflow) [[doc]](https://gpflow.readthedocs.io/en/latest/) [[paper]](https://arxiv.org/abs/1711.03845)
* GPflowOpt [[code]](https://github.com/GPflow/GPflowOpt) [[doc]](https://gpflowopt.readthedocs.io/en/latest/) [[paper]](http://jmlr.org/papers/v18/16-537.html)

GPflowOpt is a package built on top of GPflow, which in turn uses TensorFlow for fast linear algebra computations with GPU support and auto-differentiation. This makes it more extensible, as different models and acquisition functions can be implemented without having to define gradients for the optimizer.
The top-level API is inspired by GPy.

*Development of GPflowOpt seems to have stopped in late 2018.*

## BoTorch (with PyTorch, GPyTorch & Pyro)
* BoTorch [[code]](https://github.com/pytorch/botorch) [[doc]](https://botorch.org/) [[paper]](https://arxiv.org/abs/1910.06403)
* GPyTorch [[code]](https://github.com/cornellius-gp/gpytorch) [[doc]](https://gpytorch.ai/) [[paper]](https://arxiv.org/abs/1809.11165)
* Pyro [[code]](https://github.com/pyro-ppl/pyro) [[doc]](http://docs.pyro.ai) [[paper]](https://arxiv.org/abs/1810.09538)

BoTorch is a package built on top of GPyTorch (GP modeling), Pyro (MCMC and variational inference) and PyTorch as its computation framework. Hence, it benefits from GPU support and auto-differentiation. BoTorch is extremely flexible, e.g. in its first-class support for custom acquisition functions, which is made possible by MC integration (quasi-MC together with the reparametrisation trick) and auto-differentiation for gradient-based optimization. BoTorch supports:
* GP models (multi-fidelity, multi-task, ...), variational neural networks
* MC handling of model parameters, which is in principle supported via GPyTorch and Pyro but is not yet implemented
* analytic acquisitions (EI, PI, CB), MC acquisitions (knowledge gradient, max-value entropy search, posterior variance), cost awareness and custom acquisitions
* multi-objective optimization by passing a custom PyTorch function that scalarizes the objectives to the corresponding MC acquisition
* only continuous search spaces; categorical / ordinal variables need to be encoded beforehand
* parameter constraints (linear inequality constraints) and outcome constraints
* batched proposals
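
The MC integration with the reparametrisation trick mentioned above can be sketched without BoTorch. The following NumPy example estimates expected improvement for a batch of jointly Gaussian outcomes from fixed base samples. It is not BoTorch's API: the function name, the posterior mean and covariance, and the use of pseudo-random instead of quasi-random base samples are assumptions made for brevity.

```python
import numpy as np


def mc_expected_improvement(mean, cov, best, base_samples):
    """MC estimate of batch EI (minimization) for q jointly Gaussian outcomes.

    mean: (q,) posterior mean, cov: (q, q) posterior covariance,
    base_samples: (n, q) fixed draws from N(0, I).
    """
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(len(mean)))
    # Reparametrisation: f = mean + L z is a deterministic function of
    # (mean, L) once the base samples z are fixed.
    f = mean + base_samples @ L.T
    improvement = np.clip(best - f, 0.0, None)  # shape (n, q)
    return improvement.max(axis=1).mean()       # best point of the batch, averaged


rng = np.random.default_rng(0)
z = rng.standard_normal((50_000, 2))            # pseudo-random stand-in for quasi-MC samples
mean = np.array([0.2, -0.1])
cov = np.array([[0.25, 0.10],
                [0.10, 0.16]])
print(mc_expected_improvement(mean, cov, best=0.0, base_samples=z))
```

Because the base samples are fixed, the estimate is a deterministic function of the posterior mean and covariance. In a framework with auto-differentiation this is what allows gradients of the acquisition with respect to the candidate points to be obtained automatically.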

## Ax
[[code]](https://github.com/facebook/Ax)
[[doc]](https://ax.dev)
[[test]](ax.ipynb)

Ax is a high-level framework for Bayesian and bandit optimization.
For BO, Ax relies mostly on BoTorch from the same developers, hence the entire feature set of BoTorch is available in principle, but needs to be implemented first.
For bandit optimization, Thompson sampling is used.
Ax provides:
* a service-like API for managing and storing experiments as JSON files (local) or in SQL databases. The service only runs locally.
* transform pipelines to handle encoding of categorical and ordinal variables, scaling and log-transforms
* limited support for parameter constraints of type $x_1 \leq x_2$ and $\sum x_i \leq c$ as well as output constraints
* limited support for multi-objective optimization: only weighted sum scalarization and no tooling around it
* examples for human-in-the-loop optimization
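
To see why weighted-sum scalarization alone is a limitation, consider the following sketch in plain Python. The outcome values are made up, and the helpers `weighted_sum` and `pareto_front` are illustrative and not part of Ax.

```python
def weighted_sum(objectives, weights):
    """Collapse several objectives (all minimized) into one scalar."""
    return sum(w * f for w, f in zip(weights, objectives))


def pareto_front(points):
    """Return the non-dominated points (minimization in every objective)."""
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical outcomes (objective 1, objective 2) of five evaluated candidates
outcomes = [(1.0, 5.0), (2.0, 3.0), (2.9, 2.9), (4.0, 1.0), (4.5, 4.5)]
print("Pareto front:", pareto_front(outcomes))

for weights in [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]:
    best = min(outcomes, key=lambda f: weighted_sum(f, weights))
    print(weights, "->", best)
```

The point `(2.9, 2.9)` is Pareto-optimal but lies in a non-convex part of the front, so no choice of weights ever selects it. Acquisitions that target the Pareto front directly, such as hypervolume-based ones, do not have this blind spot.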

Overall, I think Ax needs more maturing.

## Emukit
[[code]](https://github.com/amzn/emukit)
[[doc]](https://amzn.github.io/emukit/)

Emukit is a high-level framework for Bayesian optimization and Bayesian quadrature. It is intended to be independent of the modeling framework, but provides first-class support for GPy. Similar built-in support for other frameworks is apparently not planned. Mixed search spaces and constraints are supported; multi-objective optimization currently is not.

Emukit provides abstraction layers for the individual components of Bayesian optimization in order to implement algorithms independently of the concrete modeling framework. While the idea is intriguing, concepts like auto-differentiation that make GPflowOpt and BoTorch powerful cannot be exploited this way. Instead, by focussing on GPy, Emukit seems to end up as a replacement for GPyOpt.

## Dragonfly
[[code]](https://github.com/dragonfly/dragonfly)
[[doc]](https://dragonfly-opt.readthedocs.io)
[[paper]](https://arxiv.org/abs/1903.06694)

Dragonfly is a package developed at Carnegie Mellon University. It has native implementations of GPs with the typical kernels, an optimizer (DOO), and MCMC samplers copied from pymc3 and pgmpy (Metropolis, Slice, NUTS, HMC). Multi-objective optimization is supported via random scalarizations; constraints are not supported.
Note that Thompson sampling from GPs seems to be incorrectly implemented, as points are sampled without keeping track of the previously sampled points.
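
The general point behind this remark can be illustrated independently of Dragonfly's code. A Thompson sample is one function drawn from the GP posterior, so its values at different inputs must be drawn jointly (or, equivalently, each new value must be conditioned on the values already drawn). The NumPy sketch below contrasts a joint draw on a grid with independent draws from the marginals; the kernel, data and grid are assumptions.

```python
import numpy as np


def rbf(a, b, lengthscale=0.5):
    """Squared-exponential kernel with unit variance for 1-D inputs."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)


def posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and full covariance of a zero-mean GP on x_test."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_test)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    cov = rbf(x_test, x_test) - K_s.T @ np.linalg.solve(K, K_s)
    return mean, cov


rng = np.random.default_rng(0)
x_train = np.array([-1.5, 0.0, 1.0])
y_train = np.array([0.5, -0.3, 0.8])
grid = np.linspace(-2.0, 2.0, 200)
mean, cov = posterior(x_train, y_train, grid)

# Consistent: one joint draw, so every value is correlated with all others
L = np.linalg.cholesky(cov + 1e-6 * np.eye(len(grid)))
joint_sample = mean + L @ rng.standard_normal(len(grid))

# Inconsistent: each point drawn from its marginal, ignoring the other draws
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
independent_sample = mean + std * rng.standard_normal(len(grid))

x_next = grid[np.argmin(joint_sample)]  # Thompson sampling proposal (minimization)
print(f"next evaluation at x = {x_next:.3f}")
print("roughness joint:      ", np.abs(np.diff(joint_sample)).mean())
print("roughness independent:", np.abs(np.diff(independent_sample)).mean())
```

The independent draws do not form a sample path of the GP: neighbouring values are uncorrelated, so the minimizer of such a "sample" is dominated by noise rather than by the posterior belief about the function.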

## Other projects
* [fmfn/BayesianOptimization](https://github.com/fmfn/BayesianOptimization) - Small BO package using GPs from Scikit-Learn 
* [Cornell-MOE](https://github.com/wujian16/Cornell-MOE) - Relatively old Python / C++ package for BO
* [ProcessOptimizer](https://github.com/bytesandbrains/ProcessOptimizer) - Fork of Scikit-Optimize for optimizing real-world processes. Provides classes for composable constraints. However, only rejection sampling is implemented.
* [Phoenics](https://github.com/aspuru-guzik-group/phoenics) - BO package using kernel density estimates and a specific multi-objective acquisition called CHIMERA
* [TS-EMO](https://github.com/Eric-Bradford/TS-EMO) - Matlab implementation of the TS-EMO algorithm
