# Bayesian Optimization Frameworks

Bayesian Optimization (BO) is a method for optimizing expensive black-box functions by means of a probabilistic surrogate model and an acquisition function that specifies a trade-off between exploration (improving the surrogate) and exploitation (finding the minimizer of the surrogate and thus of the black-box function).

**Surrogate models** are typically Gaussian processes (GP) or tree-based models (RF, GBT, ...). These have their own parameters, which need to be estimated via maximum likelihood or integrated out.
The maximum likelihood (ML) point estimate is typically found with a gradient-based local optimizer, combined with a heuristic such as multiple restarts to deal with the non-convexity of the problem.
Integrating out the model parameters requires either Markov chain Monte Carlo (MCMC) or approximate variational inference (VI) methods, together with an assumption on the prior distribution. As with ML estimation, the problem is non-convex.
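
The multi-restart ML estimation described above can be sketched in plain Python. This is a minimal illustration, not the implementation of any framework listed here: it assumes a 1-D GP with an RBF kernel, unit signal variance and a fixed noise level, and fits only the length scale. All function names are made up for the example, and the finite-difference gradient stands in for the analytic or auto-differentiated gradients that real packages use.

```python
import math
import random

def rbf(a, b, length_scale):
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a small positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def neg_log_marginal_likelihood(log_ls, X, y, noise=1e-2):
    """Negative log marginal likelihood as a function of the log length scale."""
    ls, n = math.exp(log_ls), len(X)
    K = [[rbf(X[i], X[j], ls) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    z = []  # forward substitution: solve L z = y, so that y^T K^-1 y = z^T z
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    half_log_det = sum(math.log(L[i][i]) for i in range(n))
    return 0.5 * sum(v * v for v in z) + half_log_det + 0.5 * n * math.log(2.0 * math.pi)

def local_minimize(f, x0, lo, hi, n_iter=200, eps=1e-5):
    """Projected gradient descent with a finite-difference gradient (assumption)."""
    x, fx, step = x0, f(x0), 0.5
    for _ in range(n_iter):
        grad = (f(x + eps) - f(x - eps)) / (2.0 * eps)
        x_new = min(hi, max(lo, x - step * grad))
        f_new = f(x_new)
        if f_new < fx:          # accept only improving steps
            x, fx = x_new, f_new
        else:                   # otherwise shrink the step size
            step *= 0.5
            if step < 1e-8:
                break
    return x, fx

def fit_length_scale(X, y, n_restarts=8, bounds=(-3.0, 3.0), seed=0):
    """Run the local optimizer from random starts and keep the best optimum."""
    rng = random.Random(seed)
    f = lambda t: neg_log_marginal_likelihood(t, X, y)
    starts = [rng.uniform(*bounds) for _ in range(n_restarts)]
    best_log_ls, best_nll = min((local_minimize(f, s, *bounds) for s in starts),
                                key=lambda r: r[1])
    return math.exp(best_log_ls), best_nll

X = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
y = [math.sin(6.0 * x) for x in X]
length_scale, nll = fit_length_scale(X, y)
```

Each restart can end in a different local optimum; keeping the best of them is a heuristic and gives no guarantee of finding the global one.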

**Search spaces** define the possible inputs. When applying BO for tuning hyperparameters of machine learning models the inputs include continuous, integer / ordinal and categorical variables and may include conditionals. When applying BO to optimization of real-world processes additional (in-)equality constraints may need to be satisfied.
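
A mixed search space with a conditional variable and an inequality constraint might look as follows. This is a hedged sketch using only the standard library; the variable names, bounds and the constraint are hypothetical and chosen purely for illustration. Feasible points are drawn by simple rejection sampling.

```python
import random

CATALYSTS = ["none", "A", "B"]  # hypothetical categorical levels

def sample_point(rng):
    """Draw one configuration from a mixed, conditional search space."""
    point = {
        "temperature": rng.uniform(20.0, 100.0),  # continuous
        "n_stages": rng.randint(1, 5),            # integer, bounds inclusive
        "catalyst": rng.choice(CATALYSTS),        # categorical
    }
    # Conditional variable: only defined when a catalyst is used.
    if point["catalyst"] != "none":
        point["catalyst_loading"] = rng.uniform(0.01, 0.10)
    return point

def is_feasible(point):
    """Hypothetical inequality constraint, e.g. an energy budget."""
    return point["temperature"] * point["n_stages"] <= 300.0

def sample_feasible(rng, max_tries=1000):
    """Rejection sampling: redraw until the constraint is satisfied."""
    for _ in range(max_tries):
        point = sample_point(rng)
        if is_feasible(point):
            return point
    raise RuntimeError("no feasible point found")

rng = random.Random(0)
initial_design = [sample_feasible(rng) for _ in range(5)]
```

Rejection sampling is easy to implement but becomes inefficient when the feasible region is small, and it cannot handle equality constraints, which need a dedicated parameterization or a constraint-aware optimizer.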

**Acquisition functions** either have an analytic form or need to be approximated via Monte Carlo methods.
The minimizer of the acquisition function over the search space is the point at which the black-box function is evaluated next.
It is typically found using a gradient-free or gradient-based local optimizer or via MC sampling, depending on the nature of the acquisition function.
The optimizer needs to handle the search space including its constraints.
In multi-objective BO the black-box function has multiple target outputs. Here the acquisition function needs to guide the search towards exploring the Pareto front.
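
As an illustration of an acquisition function with an analytic form, the sketch below implements expected improvement (EI) and the lower confidence bound (CB) for a minimization problem with a Gaussian posterior, plus a Monte Carlo estimate of EI for comparison. The candidate predictions are invented numbers standing in for a surrogate's output, and the next point is chosen from a finite candidate set rather than by a continuous optimizer.

```python
import math
import random

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, y_best):
    """Analytic EI for minimization: E[max(y_best - Y, 0)] with Y ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(y_best - mu, 0.0)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm_cdf(z) + sigma * norm_pdf(z)

def lower_confidence_bound(mu, sigma, kappa=2.0):
    """CB for minimization; a larger kappa puts more weight on exploration."""
    return mu - kappa * sigma

def mc_expected_improvement(mu, sigma, y_best, n_samples=20000, seed=0):
    """Monte Carlo estimate of the same expectation, by sampling the posterior."""
    rng = random.Random(seed)
    total = sum(max(y_best - rng.gauss(mu, sigma), 0.0) for _ in range(n_samples))
    return total / n_samples

# Hypothetical surrogate predictions (posterior mean and standard deviation).
candidates = [
    {"x": 0.1, "mu": 0.8, "sigma": 0.05},
    {"x": 0.5, "mu": 0.4, "sigma": 0.30},
    {"x": 0.9, "mu": 0.6, "sigma": 0.60},
]
y_best = 0.5  # best observed value so far

# EI is maximized, so its negative is minimized to match the convention above.
next_point = min(candidates,
                 key=lambda c: -expected_improvement(c["mu"], c["sigma"], y_best))
```

Here the most uncertain candidate wins even though its predicted mean is worse than the incumbent, which shows the exploration term at work.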

There are a number of general-purpose BO frameworks (not focusing only on hyperparameter tuning) available in Python and R. In the following, the main frameworks are compared in terms of supported features and development and support activity.

## Feature comparison

| Name | Surrogate | Hyperparameter handling | Acquisitions | Multi-objective | Search space | Optimizer |
| --- | --- | --- | --- | --- | --- | --- |
| [mlrMBO](https://mlr-org.github.io/mlrMBO) | GP, RF ([mlr](https://mlr.mlr-org.com/)) | ML | EI, CB, AEI, EQI, AdaCB | DIB | continuous, integer, categorical + constraints | |
| [pyGPGO](http://pygpgo.readthedocs.io) | GP (native), GBT, RF, ET ([scikit-learn](https://scikit-learn.org)) | ML, MCMC ([pymc3](https://docs.pymc.io/)) | EI, PI, CB | | continuous, integer | |
| [Scikit-Optimize](https://scikit-optimize.github.io) | GP, RF, GBT ([scikit-learn](https://scikit-learn.org)) | ML | EI, PI, CB | - | continuous, discrete, categorical + constraints | BFGS |
| [GPyOpt](https://github.com/SheffieldML/GPyOpt) | GP ([GPy](http://sheffieldml.github.io/GPy)) | ML, MCMC | EI, PI, CB | - | continuous, discrete, categorical + constraints | BFGS, DIRECT |
| [GPflowOpt](https://github.com/GPflow/GPflowOpt) | GP ([GPflow](https://github.com/GPflow/GPflow)) | ML | EI, PI, CB, MES, PF | HVPI | continuous | |
| [BoTorch](https://botorch.org/) | GP ([GPyTorch](https://github.com/cornellius-gp/gpytorch)) | MC, MCMC (Pyro) | | | | |
| [Emukit](https://github.com/amzn/emukit) | GP ([GPy](http://sheffieldml.github.io/GPy)) | ML | EI, PI, CB, ES, MES, PF | - | continuous, integer, categorical + constraints | |
| [Dragonfly](https://github.com/dragonfly/dragonfly) | GP (native) | ML, posterior sampling | EI, CB, PI, TTEI, TS | scalarization with CB/TS | continuous, categorical | DOO, DIRECT |

## Activity comparison
The ranking by open/closed issues and commits gives similar results.

```python
import datetime
print(datetime.datetime.now())
```

```
2020-03-05 11:54:28.129475
```


```python
# util: helper module not shown in this document
import util

df = util.compare_repos([
    'mlr-org/mlrMBO',
    'SheffieldML/GPyOpt',
    'scikit-optimize/scikit-optimize',
    'GPflow/GPflowOpt',
    'pytorch/botorch',
    'amzn/emukit',
    'dragonfly/dragonfly',
    'josejimenezluna/pyGPGO',
])
# Rank the repositories by the number of closed issues.
df.sort_values(by='closed_issues', ascending=False)
```

| name | stars | forks | contributors | commits | open_issues | closed_issues | created | last_commit | license |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| scikit-optimize | 1699 | 331 | 51 | 1474 | 137 | 752 | 2016-03-20 | 2020-02-28 | BSD-3-Clause |
| mlrMBO | 153 | 37 | 14 | 1589 | 82 | 403 | 2013-10-23 | 2020-02-28 | NOASSERTION |
| botorch | 1495 | 110 | 25 | 600 | 22 | 363 | 2018-07-30 | 2020-03-04 | MIT |
| emukit | 179 | 52 | 17 | 256 | 26 | 257 | 2018-09-04 | 2020-02-20 | Apache-2.0 |
| GPyOpt | 580 | 171 | 34 | 489 | 77 | 228 | 2014-08-13 | 2020-03-03 | BSD-3-Clause |
| GPflowOpt | 200 | 43 | 5 | 426 | 22 | 97 | 2017-04-28 | 2018-09-12 | Apache-2.0 |
| dragonfly | 462 | 48 | 8 | 391 | 13 | 46 | 2018-04-20 | 2020-03-02 | MIT |
| pyGPGO | 170 | 43 | 2 | 292 | 7 | 17 | 2016-11-23 | 2019-06-15 | MIT |


## GPyOpt
* GPy [[code]](https://github.com/SheffieldML/GPy) [[doc]](https://gpy.readthedocs.io/en/latest/)
* GPyOpt [[code]](https://github.com/SheffieldML/GPyOpt) [[doc]](https://gpyopt.readthedocs.io/en/latest/)

GPyOpt is a BO package built on top of the hugely popular GPy for flexible GP modeling. Both are developed at the University of Sheffield.
Together with mlrMBO, GPyOpt has been around the longest. However, development of GPyOpt has somewhat stalled.

## Scikit-Optimize
[[code]](https://github.com/scikit-optimize/scikit-optimize)
[[doc]](https://scikit-optimize.github.io/)

Scikit-Optimize is a stable and well-polished BO package based on the GP and tree-based models in Scikit-Learn. Model parameter integration and multi-objective optimization are not supported. Compared to GPy, the GP modeling in Scikit-Learn is rather rudimentary. Mixed search spaces and constraints are supported, as well as external, delayed and batched evaluations. For hyperparameter tuning in Scikit-Learn there is a drop-in replacement for `GridSearchCV` / `RandomizedSearchCV` (`BayesSearchCV`).

## GPflowOpt (with TensorFlow & GPflow)
* GPflow [[code]](https://github.com/GPflow/GPflow) [[doc]](https://gpflow.readthedocs.io/en/latest/) [[paper]](https://arxiv.org/abs/1711.03845)
* GPflowOpt [[code]](https://github.com/GPflow/GPflowOpt) [[doc]](https://gpflowopt.readthedocs.io/en/latest/) [[paper]](http://jmlr.org/papers/v18/16-537.html)

GPflowOpt is a BO package built on top of GPflow, which in turn uses TensorFlow for fast linear algebra computations with GPU support and auto-differentiation. This makes it more extensible, as different models and acquisition functions can be implemented without having to define gradients for the optimizer.
The top-level API is inspired by GPy. Development of GPflowOpt seems to have stopped completely.

## BoTorch (with PyTorch, GPyTorch & Pyro)
* BoTorch [[code]](https://github.com/pytorch/botorch) [[doc]](https://botorch.org/) [[paper]](https://arxiv.org/abs/1910.06403)
* GPyTorch [[code]](https://github.com/cornellius-gp/gpytorch) [[doc]](https://gpytorch.ai/) [[paper]](https://arxiv.org/abs/1809.11165)
* Pyro [[code]](https://github.com/pyro-ppl/pyro) [[doc]](http://docs.pyro.ai) [[paper]](https://arxiv.org/abs/1810.09538)

BoTorch is a BO package built on top of GPyTorch (GP modeling), Pyro (MCMC and variational inference) and PyTorch as its computation framework. Hence, like GPflowOpt, it profits from native GPU support and auto-differentiation. BoTorch is still in its beta phase and several features, such as integrating out model parameters via MCMC, are not yet implemented. Activity on the GitHub project is high though, and it seems to have overtaken GPflowOpt. BoTorch is extremely flexible, at the expense of requiring plenty of boilerplate code and in-depth knowledge of PyTorch when using it.

## Emukit
[[code]](https://github.com/amzn/emukit)
[[doc]](https://amzn.github.io/emukit/)
[[paper]]()

Emukit is a toolbox for BO and Bayesian quadrature, developed by Amazon. It is intended to be independent of the modeling framework, but provides first-class support for GPy. Similar built-in support for other frameworks is apparently not planned. Mixed search spaces and constraints are supported; multi-objective optimization currently is not.

## Dragonfly
[[code]](https://github.com/dragonfly/dragonfly)
[[doc]](https://dragonfly-opt.readthedocs.io)
[[paper]](https://arxiv.org/abs/1903.06694)

Dragonfly is a package developed at Carnegie Mellon University. It has native implementations of GPs with the typical kernels, an optimizer (DOO), and MCMC samplers copied from pymc3 and pgmpy (Metropolis, Slice, NUTS, HMC). Multi-objective optimization is supported via random scalarizations; constraints are not supported.
Note that Thompson sampling from GPs looks incorrectly implemented, as points are sampled without keeping track of the previously sampled points.
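
The idea behind random scalarizations can be sketched generically as follows. This is not Dragonfly's code: it assumes a linear scalarization with weights drawn uniformly from the simplex and a finite set of candidate objective vectors (invented numbers standing in for surrogate predictions), with all objectives to be minimized.

```python
import random

def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def random_weights(n, rng):
    """Uniform draw from the simplex via normalized exponential variates."""
    raw = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

def scalarize(objectives, weights):
    """Linear scalarization: weighted sum of the objective values."""
    return sum(w * f for w, f in zip(weights, objectives))

def select(points, rng):
    """Draw fresh weights and return the candidate minimizing the scalarized value."""
    weights = random_weights(len(points[0]), rng)
    return min(points, key=lambda p: scalarize(p, weights))

# Hypothetical predicted objective vectors for five candidates.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 5.0)]

rng = random.Random(0)
picks = {select(candidates, rng) for _ in range(200)}
```

Because the weights change from one iteration to the next, successive selections land on different parts of the Pareto front. A linear scalarization only reaches points on the convex part of the front, which is one reason why other scalarizations such as the Chebyshev form are also used.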

## Other projects
* [fmfn/BayesianOptimization](https://github.com/fmfn/BayesianOptimization) - Small BO package using GPs from Scikit-Learn 
* [Cornell-MOE](https://github.com/wujian16/Cornell-MOE) - Relatively old Python / C++ package for BO
* [ProcessOptimizer](https://github.com/bytesandbrains/ProcessOptimizer) - Fork of Scikit-Optimize for optimizing real-world processes. Provides classes for composable constraints. However, only rejection sampling is implemented.
* [Phoenics](https://github.com/aspuru-guzik-group/phoenics) - BO package using kernel density estimates and a specific multi-objective acquisition called CHIMERA
* [TS-EMO](https://github.com/Eric-Bradford/TS-EMO) - Matlab implementation of the TS-EMO algorithm
