# baptistar/BOCS

Bayesian Optimization of Combinatorial Structures

# Bayesian Optimization of Combinatorial Structures (BOCS)

## What is the BOCS algorithm?

The BOCS algorithm is a method for finding a global minimizer of an expensive-to-evaluate black-box function that is defined over discrete inputs given a finite budget of function evaluations. The algorithm combines an adaptive generative model and semidefinite programming techniques for scalability of the acquisition function to large combinatorial domains. For more information on the details of the algorithm, please see the ICML paper.

## Authors

Ricardo Baptista (MIT) and Matthias Poloczek (University of Arizona)

## Installation

The BOCS algorithm is implemented in MATLAB and only requires the CVX package to be available in the local path for performing convex optimization. The scripts for comparing to other discrete optimization methods additionally require an installation of SMAC, the Python wrapper pySMAC, and the `bayesopt` function in MATLAB for Bayesian optimization with the Expected Improvement acquisition function.

A Python implementation of the BOCS algorithm is also available in the BOCSpy folder. The code has been tested with Python 2.7 and 3.5 and requires the cvxpy and cvxopt packages.

## Example running BOCS on a benchmark problem

We provide an example of running the BOCS algorithm on the Ising model sparsification benchmark problem on a 9-node grid graph with a budget of 100 sample evaluations. The code can be run from MATLAB using the file `scripts/example_ising.m`.

The script first defines the input parameters in the `inputs` struct. These include the number of discrete variables (i.e., edges in the grid graph) `n_vars`, the sample evaluation budget `evalBudget`, the number of samples initially used to build the statistical model `n_init`, the regularization parameter `lambda`, and the `estimator` used for the regression model (here `horseshoe`). Other options for `estimator` are `bayes` and `mle`.

```
inputs = struct;
inputs.n_vars     = 12;
inputs.evalBudget = 100;
inputs.n_init     = 20;
inputs.lambda     = 1e-4;
inputs.estimator  = 'horseshoe';
```
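
As a quick check on `n_vars = 12`, the edge count can be worked out by hand. The sketch below assumes that the 9-node grid is a 3-by-3 lattice in which each node is connected only to its horizontal and vertical neighbors; the helper name `grid_edge_count` is hypothetical and not part of the repository.

```python
def grid_edge_count(rows: int, cols: int) -> int:
    """Count edges in a rows-by-cols grid graph with 4-neighbor connectivity."""
    horizontal = rows * (cols - 1)  # edges between left/right neighbors
    vertical = cols * (rows - 1)    # edges between up/down neighbors
    return horizontal + vertical


# Assumption: the 9-node grid is 3 x 3, so there are 6 + 6 = 12 edges.
n_vars = grid_edge_count(3, 3)
print(n_vars)
```

Under this assumption each binary variable corresponds to keeping or dropping one edge of the grid.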

We randomly instantiate the edge weights of the Ising model and define the objective function based on the KL divergence in `inputs.model`. We add an l_1 penalty scaled by the lambda parameter in `inputs.penalty`.

```
% Generate random graphical model
Theta   = rand_ising_grid(9);
Moments = ising_model_moments(Theta);

% Save objective function and regularization term
inputs.model    = @(x) KL_divergence_ising(Theta, Moments, x);
inputs.penalty  = @(x) inputs.lambda*sum(x,2);
```
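
The penalty term `inputs.lambda*sum(x,2)` sums each row of `x`, so for binary inputs it equals lambda times the number of active variables in that row. A minimal Python sketch of the same computation follows; the function name `l1_penalty` and the example rows are illustrative assumptions, not code from the repository.

```python
def l1_penalty(x_rows, lam):
    """Return lam * (row sum) for each row, mirroring lambda*sum(x,2) in MATLAB.

    For binary rows the sum equals the l_1 norm, i.e. the number of ones.
    """
    return [lam * sum(row) for row in x_rows]


# Illustrative inputs: one row with three active variables, one with none.
x = [[1, 0, 1, 1], [0, 0, 0, 0]]
penalties = l1_penalty(x, 1e-4)
print(penalties)
```

A larger `lambda` makes configurations with many active variables more expensive, which pushes the optimizer toward sparser solutions.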

We sample the initial values for the statistical model and evaluate the objective function at these samples. Using these inputs, we run the BOCS-SA and BOCS-SDP algorithms with the order-2 regression model.

```
% Generate initial samples for statistical models
inputs.x_vals   = sample_models(inputs.n_init, inputs.n_vars);
inputs.y_vals   = inputs.model(inputs.x_vals);

% Run BOCS-SDP and BOCS-SA (order 2)
B_SA  = BOCS(inputs.model, inputs.penalty, inputs, 2, 'SA');
B_SDP = BOCS(inputs.model, inputs.penalty, inputs, 2, 'sdp');
```
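
To give a sense of the size of the order-2 regression model, the sketch below draws random binary samples and expands each one into a constant term, the linear terms, and all pairwise products. It assumes that the order-2 model uses exactly these monomials and that initial samples are drawn uniformly at random; `sample_binary` and `order2_features` are hypothetical helpers and may differ from `sample_models` and the regression code in the repository.

```python
import random
from itertools import combinations


def sample_binary(n_samples, n_vars, seed=0):
    """Draw n_samples binary vectors of length n_vars uniformly at random."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(n_samples)]


def order2_features(x):
    """Expand x into [1, x_1, ..., x_n, x_i * x_j for all i < j]."""
    pairs = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return [1] + list(x) + pairs


x_vals = sample_binary(20, 12)          # n_init = 20, n_vars = 12 as above
features = [order2_features(x) for x in x_vals]
print(len(features[0]))                 # 1 + 12 + 66 = 79 regression terms
```

With 12 variables this expansion has 79 terms but only 20 initial samples are available to fit them, so under these assumptions some form of regularization or prior on the coefficients is needed.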

We also provide an example of running the Python implementation of BOCS on the binary quadratic programming benchmark problem. The code can be run using the command `python BOCSpy/example_bqp.py`. The script produces a plot with the convergence results from running BOCS-SA and BOCS-SDP on an instance of the problem.

## Comparison to other discrete optimization algorithms

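To compare BOCS to other algorithms, we provide files in the scripts folder that run all cases for the quadratic programming problem, the Ising model sparsification problem, and the contamination control problem. The compared methods include SMAC and Bayesian optimization with the Expected Improvement acquisition function (MATLAB's `bayesopt`), which the Installation section lists as requirements for these scripts.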

To run these examples, we provide three files in the scripts folder named `opt_runs_quad`, `opt_runs_ising` and `opt_runs_cont`, one for each benchmark problem. In each file, the `n_func` parameter sets the number of random instances of the test function and the `n_runs` parameter sets the number of repeated runs of each algorithm starting from different initial datasets. The `n_proc` parameter specifies the number of available processors used to parallelize these tests. Lastly, running `scripts/create_plots` plots the average output of these tests and presents statistics of the solutions found by all algorithms, while running `scripts/cross_validate` followed by `scripts/plot_crossval` validates the statistical models for each benchmark problem.
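
One way to picture how results over `n_func` instances and `n_runs` repetitions could be combined into an average convergence curve is sketched below. This is an assumption about the aggregation, not a description of what `scripts/create_plots` computes; the nested-list layout `results[instance][run]`, the helper names, and the toy numbers are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from itertools import accumulate


def best_so_far(values):
    """Running minimum of the objective values seen up to each evaluation."""
    return list(accumulate(values, min))


def average_curves(results):
    """Average best-so-far curves over all instances and runs.

    results[f][r] is the list of objective values from run r on instance f.
    """
    curves = [best_so_far(run) for instance in results for run in instance]
    n_evals = len(curves[0])
    return [sum(c[t] for c in curves) / len(curves) for t in range(n_evals)]


# Toy values for illustration only (n_func = 2, n_runs = 2, 3 evaluations).
toy = [
    [[3.0, 2.0, 4.0], [5.0, 1.0, 1.5]],  # instance 0
    [[2.0, 2.5, 0.5], [4.0, 3.0, 3.5]],  # instance 1
]
avg = average_curves(toy)
print(avg)  # [3.5, 2.0, 1.625]
```

Taking the running minimum before averaging means each curve is non-increasing, which makes the averaged curve easier to read as a convergence plot.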