# Bayesian Optimization with Trees in OMLT

This notebook introduces the gradient-boosted trees (GBT) functionality of `OMLT` and shows how such models can be incorporated into Bayesian optimization loops. For a more comprehensive framework that uses GBT models for Bayesian optimization, see another project from our group: [ENTMOOT](https://github.com/cog-imperial/entmoot).

## List of Python Imports
We start by importing a list of dependencies to implement the example. `OMLT` is compatible with all tree ensemble training libraries that support ONNX outputs. In this tutorial we use the `lightgbm` package.

In [2]:
import random
import tempfile
import numpy as np
import lightgbm as lgb
import pyomo.environ as pe
from onnxmltools.convert.lightgbm.convert import convert
from skl2onnx.common.data_types import FloatTensorType
from omlt.block import OmltBlock
from omlt.gbt import BigMFormulation, GradientBoostedTreeModel

from helpers import generate_gbt_data

random.seed(100)

## Define Dataset
We first define a simple dataset by sampling 100 random points from the 10D Rastrigin function. Every input feature of the Rastrigin function is bounded by `(-5.12, 5.12)`.

In [3]:
def f(X):
    # Rastrigin benchmark function
    x = np.asarray_chkfinite(X)
    n = len(x)
    res = 10*n + sum( x**2 - 10 * np.cos( 2 * np.pi * x ))
    return res

f_bnds = [(-5.12,5.12) for _ in range(10)]

# generate dataset
data = {'X': [], 'y': []}

for _ in range(100):
    sample = [random.uniform(*bnd) for bnd in f_bnds]
    
    data['X'].append(sample)
    data['y'].append(f(sample))
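
As a quick sanity check on the benchmark, the sketch below re-implements the Rastrigin function in vectorized NumPy and evaluates it at the origin, where the function has its known global minimum of 0. The name `rastrigin` and the use of `np.random.default_rng` are choices made for this illustration; the sampled points differ from those produced by `random.uniform` above.

```python
import numpy as np

def rastrigin(x):
    # Vectorized Rastrigin function; accepts any 1D sequence of floats
    x = np.asarray(x, dtype=float)
    return float(10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x)))

# Alternative sampling (an assumption for this sketch): draw all points at once
rng = np.random.default_rng(100)
X = rng.uniform(-5.12, 5.12, size=(100, 10))
y = np.array([rastrigin(row) for row in X])

print(rastrigin(np.zeros(10)))  # value at the global minimizer
print(X.shape, y.shape)
```

Every sampled value is non-negative, since each term `x**2 + 10 * (1 - cos(2 * pi * x))` is non-negative.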

## Train the Tree Ensemble
Next we define our training function to train the tree ensemble based on the data we generated.

In [4]:
def train_tree(data):
    FIXED_PARAMS = {'objective': 'regression',
                    'metric': 'rmse',
                    'boosting': 'gbdt',
                    'num_trees': 20,
                    'max_depth': 3,
                    'min_data_in_leaf': 2,
                    'min_data_per_group': 2,
                    'random_state': 100,
                    'verbose': -1}

    train_data = lgb.Dataset(data['X'], 
                             label=data['y'],
                             params={'verbose': -1})

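    # `verbose_eval` is omitted: it was removed in LightGBM 4.0, and
    # 'verbose': -1 in FIXED_PARAMS already silences the training log
    model = lgb.train(FIXED_PARAMS, 
                      train_data)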
    return model

## Handling Trees with ONNX
ONNX needs to know the number of features and their type. Currently, ONNX doesn't support categorical features, so we can only train models with continuous features in `lightgbm`. To handle categorical features, we recommend applying a one-hot encoding first.

In [5]:
def get_onnx_model(lgb_model):
    # export onnx model
    float_tensor_type = FloatTensorType([None, lgb_model.num_feature()])
    initial_type = [('float_input', float_tensor_type)]
    onnx_model = convert(lgb_model, 
                         initial_types=initial_type, 
                         target_opset=8)
    return onnx_model
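
The one-hot encoding recommended above can be sketched with plain NumPy as follows. The helper `one_hot_encode` and the example column `colors` are hypothetical and not part of `OMLT` or `lightgbm`; they only illustrate turning one categorical column into several 0/1 columns that can be passed to the model as continuous features.

```python
import numpy as np

def one_hot_encode(column):
    # Map each distinct category to its own 0/1 column (categories are sorted)
    column = np.asarray(column)
    categories, codes = np.unique(column, return_inverse=True)
    encoded = np.eye(len(categories), dtype=np.float32)[codes.reshape(-1)]
    return encoded, categories.tolist()

# Hypothetical categorical feature with three levels
colors = ['red', 'green', 'blue', 'green']
encoded, categories = one_hot_encode(colors)

print(categories)
print(encoded)
```

When such features enter the optimization model, one plausible choice (an assumption, not shown in this notebook) is to give the encoded columns bounds `(0, 1)` and a binary domain, and to require that the columns of one original feature sum to one.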

You can write the ONNX model to a file so that it can be inspected using a tool like [Netron](https://netron.app/).

In [6]:
# build lightgbm model and export to onnx
lgb_model = train_tree(data)
onnx_model = get_onnx_model(lgb_model)

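# use a name other than `f` so the Rastrigin function defined above is not overwritten
with tempfile.NamedTemporaryFile(suffix='.onnx', delete=False) as onnx_file:
    onnx_file.write(onnx_model.SerializeToString())
    print(f'Onnx model written to {onnx_file.name}')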

Onnx model written to /var/folders/05/s3jw9hyd1l95qtpg8jjbkv340000gn/T/tmpttvmw05j.onnx




## Build the Pyomo Model
We build the `Pyomo` model by first defining the input bounds and input domain.

In [7]:
# define problem specifications
input_bounds = f_bnds
input_domain = [pe.Reals for _ in range(len(input_bounds))]

def get_opt_model_core(input_domain, input_bounds):
    # init optimization model
    opt_model = pe.ConcreteModel()

    # define the optimization model
    opt_model.x = pe.Var(range(len(input_bounds)), 
                         domain=lambda m, i: input_domain[i], 
                         bounds=lambda m, i: input_bounds[i])
    return opt_model

opt_model = get_opt_model_core(input_domain, input_bounds)

We can print the model to check if everything worked correctly.

In [8]:
opt_model.pprint()

1 Set Declarations
    x_index : Size=1, Index=None, Ordered=False
        Key  : Dimen : Domain : Size : Members
        None :     1 :    Any :   10 : {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

1 Var Declarations
    x : Size=10, Index=x_index
        Key : Lower : Value : Upper : Fixed : Stale : Domain
          0 : -5.12 :  None :  5.12 : False :  True :  Reals
          1 : -5.12 :  None :  5.12 : False :  True :  Reals
          2 : -5.12 :  None :  5.12 : False :  True :  Reals
          3 : -5.12 :  None :  5.12 : False :  True :  Reals
          4 : -5.12 :  None :  5.12 : False :  True :  Reals
          5 : -5.12 :  None :  5.12 : False :  True :  Reals
          6 : -5.12 :  None :  5.12 : False :  True :  Reals
          7 : -5.12 :  None :  5.12 : False :  True :  Reals
          8 : -5.12 :  None :  5.12 : False :  True :  Reals
          9 : -5.12 :  None :  5.12 : False :  True :  Reals

2 Declarations: x_index x


We import the general `OMLT` block and the `GradientBoostedTreeModel` class. `OMLT` uses `BigMFormulation` to encode the tree ensembles as mixed-integer constraints. This formulation was adapted from Mišić (2020).

In [9]:
from omlt.block import OmltBlock
from omlt.gbt import BigMFormulation, GradientBoostedTreeModel
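
For orientation, the core of a Mišić-style tree ensemble encoding can be sketched as follows. The notation is chosen for this sketch and is an assumption: $T$ is the set of trees, $L_t$ the leaves of tree $t$ with leaf values $F_{t,\ell}$, $S_t$ its splits, and $v_{i,j}$ the $j$-th sorted breakpoint of feature $i$. The binary variable $y_{i,j}$ equals 1 when $x_i \le v_{i,j}$, and $z_{t,\ell}$ marks the active leaf. The constraints that link $y$ to the continuous inputs $x$ are omitted, and the implementation in `OMLT` may differ in detail.

$$
\begin{aligned}
\hat{f} &= \sum_{t \in T} \sum_{\ell \in L_t} F_{t,\ell}\, z_{t,\ell}, \\
\sum_{\ell \in L_t} z_{t,\ell} &= 1 && \forall t \in T, \\
\sum_{\ell \in \mathrm{left}(s)} z_{t,\ell} &\le y_{i(s),j(s)} && \forall t \in T,\ s \in S_t, \\
\sum_{\ell \in \mathrm{right}(s)} z_{t,\ell} &\le 1 - y_{i(s),j(s)} && \forall t \in T,\ s \in S_t, \\
y_{i,j} &\le y_{i,j+1} && \forall i,\ j, \\
y_{i,j} \in \{0, 1\}, &\quad z_{t,\ell} \ge 0.
\end{aligned}
$$

Here $i(s)$ and $j(s)$ denote the feature and breakpoint index used by split $s$. The first constraint selects exactly one leaf per tree, the next two allow a leaf only if every split on its path points toward it, and the ordering constraint keeps the breakpoint indicators of one feature consistent.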

To include the tree model as a `Pyomo` block, we create an `OmltBlock`, build the formulation from the ONNX model, and add everything to our optimization model.

In [10]:
def add_tree_model(opt_model, onnx_model, input_bounds):
    # init omlt block and gbt model based on the onnx format
    opt_model.gbt = OmltBlock()
    gbt_model = GradientBoostedTreeModel(onnx_model, 
                                         input_bounds=input_bounds)
    
    # omlt uses a big-m formulation to encode the tree models
    formulation = BigMFormulation(gbt_model)
    opt_model.gbt.build_formulation(formulation)
    opt_model.obj = pe.Objective(expr=opt_model.gbt.outputs[0])
    
add_tree_model(opt_model, onnx_model, input_bounds)

Finally, we solve the optimization problem with Gurobi.

In [11]:
import gurobipy
# the objective is already defined in add_tree_model, so it is not redefined here
solver = pe.SolverFactory('gurobi_direct')
solver.solve(opt_model, tee=True)

    AttributeError: module 'gurobipy' has no attribute 'gurobi'


ApplicationError: No Python bindings available for <class 'pyomo.solvers.plugins.solvers.gurobi_direct.GurobiDirect'> solver plugin

In [12]:
opt_model.x.pprint()

x : Size=10, Index=x_index
    Key : Lower : Value : Upper : Fixed : Stale : Domain
      0 : -5.12 :  None :  5.12 : False :  True :  Reals
      1 : -5.12 :  None :  5.12 : False :  True :  Reals
      2 : -5.12 :  None :  5.12 : False :  True :  Reals
      3 : -5.12 :  None :  5.12 : False :  True :  Reals
      4 : -5.12 :  None :  5.12 : False :  True :  Reals
      5 : -5.12 :  None :  5.12 : False :  True :  Reals
      6 : -5.12 :  None :  5.12 : False :  True :  Reals
      7 : -5.12 :  None :  5.12 : False :  True :  Reals
      8 : -5.12 :  None :  5.12 : False :  True :  Reals
      9 : -5.12 :  None :  5.12 : False :  True :  Reals
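
To show where the pieces above would sit inside an optimization loop, the following sketch implements only the loop skeleton. The function `propose_random` is a hypothetical placeholder that samples uniformly at random. In a tree-based loop it would be replaced by a proposer that calls `train_tree`, `get_onnx_model`, `get_opt_model_core` and `add_tree_model` on the current data, solves the resulting model, and returns the optimal values of `opt_model.x`. All names introduced here (`rastrigin`, `run_loop`, `loop_data`) are assumptions for illustration.

```python
import random
import numpy as np

def rastrigin(x):
    # Black-box objective used throughout this notebook
    x = np.asarray(x, dtype=float)
    return float(10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x)))

def propose_random(data, bounds, rng):
    # Hypothetical placeholder proposer: ignores the data and samples uniformly.
    # A surrogate-based proposer would train the tree ensemble on `data`,
    # encode it with OMLT, and return the minimizer found by the solver.
    return [rng.uniform(lo, hi) for lo, hi in bounds]

def run_loop(objective, bounds, data, propose, n_iter, rng):
    # Sequential loop: propose a point, evaluate it, add it to the dataset
    for _ in range(n_iter):
        x_next = propose(data, bounds, rng)
        data['X'].append(x_next)
        data['y'].append(objective(x_next))
    best = int(np.argmin(data['y']))
    return data['X'][best], data['y'][best]

rng = random.Random(100)
bounds = [(-5.12, 5.12) for _ in range(10)]
loop_data = {'X': [], 'y': []}
best_x, best_y = run_loop(rastrigin, bounds, loop_data, propose_random, 20, rng)
print(len(loop_data['y']), best_y)
```

Minimizing only the surrogate prediction is purely exploitative. Bayesian optimization methods usually add an exploration term to the acquisition function, which is the kind of extension the ENTMOOT project mentioned at the top of this notebook is meant to cover.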
