
Several Local Bayesian Optimization Implementations #730

Merged: 79 commits merged into automl:development on Jul 7, 2022

Conversation

@dengdifan (Contributor) commented on Jul 15, 2021:

This PR adds:

  1. a GPyTorch-based Gaussian process
  2. a partial sparse Gaussian process
  3. a Thompson sampling acquisition function
  4. TuRBO
  5. BOinG
  6. examples for TuRBO and BOinG

To run examples/fmin_rosenbrock_boing.py, you need to install pyrfr from automl/random_forest_run#64, which adds the required additional interfaces. (However, I've not written a test file for that PR.)

@dengdifan changed the title from "Several Local Bayesian Optimization Implementations and more EPM" to "Several Local Bayesian Optimization Implementations" on Jul 15, 2021
@dengdifan changed the base branch from master to development on Jul 16, 2021
@AndreBiedenkapp (Contributor) left a comment:

Please change the merge destination from master to development. Also, please add some tests.

requirements.txt Outdated
```diff
@@ -9,3 +9,6 @@ pyrfr>=0.8.0
 lazy_import
 dask
 distributed
+torch>=1.9.0
+gpytorch>=1.5.0
+pyro-ppl
```
Contributor:
Could you please also specify a minimal required version?
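For instance, the entry could become something like the following (the version floor here is illustrative only, not a verified minimum):

```
pyro-ppl>=1.6.0
```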

@renesass (Collaborator) commented:

python examples/fmin_rosenbrock_boing.py

File "/Users/rene/anaconda3/envs/SMAC/lib/python3.9/site-packages/smac/optimizer/local_bo/epm_chooser_boing.py", line 365, in choose_next bounds_ss_cont, bounds_ss_cat, ss_data_indices = subspace_extraction(X=X, File "/Users/rene/anaconda3/envs/SMAC/lib/python3.9/site-packages/smac/optimizer/local_bo/epm_chooser_boing.py", line 492, in subspace_extraction trees = model.rf.get_all_trees() AttributeError: 'binary_rss_forest' object has no attribute 'get_all_trees'

@AndreBiedenkapp (Contributor) left a comment:

Partial review, since I'm losing focus on the overall PR.
Since it is so large, it might be best to split it up into smaller PRs, one for each individual feature.

```python
gpytorch.settings.debug.off()


class ExactGPModel(ExactGP):
```
Contributor:
Without any docstring it's difficult to see at a glance what this is used for.
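For illustration, a documented GPyTorch exact-GP module might look like the following sketch (the kernel and mean choices here are examples, not necessarily the PR's actual configuration):

```python
import gpytorch
import torch
from gpytorch.models import ExactGP


class ExactGPModel(ExactGP):
    """Exact GP surrogate model.

    Wraps the training data and a Gaussian likelihood; ``forward`` returns
    the prior multivariate normal that GPyTorch conditions on the data to
    produce posterior predictions.
    """

    def __init__(self, train_x: torch.Tensor, train_y: torch.Tensor,
                 likelihood: gpytorch.likelihoods.GaussianLikelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.MaternKernel(nu=2.5)
        )

    def forward(self, x: torch.Tensor) -> gpytorch.distributions.MultivariateNormal:
        # Combine mean and covariance into the GP prior at the query points.
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )
```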

Comment on lines 47 to 48
```python
normalize_y: bool = True,
n_opt_restarts: int = 10,
```
Contributor:
Since they are not optional, I suggest moving them before `likelihood`.

Comment on lines 68 to 72
```python
bounds_cont: np.ndarray
    bounds of continuous hyperparameters
bounds_cat: typing.List[typing.List[typing.Tuple]],
    bounds of categorical hyperparameters; these need to be flattened, i.e. all
    possible categories need to be listed
```
Contributor:
I don't see why those are necessary. The `bounds` argument should already suffice to get these values, as the documentation above states: continuous hyperparameters have the bounds (lower, upper) and categoricals have (n_cat, np.nan).
You can also have a look at the `get_types` function in util_funcs.py, which takes care of getting the bounds.

```python
def get_types(
    config_space: ConfigurationSpace,
    instance_features: typing.Optional[np.ndarray] = None,
) -> typing.Tuple[typing.List[int], typing.List[typing.Tuple[float, float]]]:
    """TODO"""
    # Extract types vector for rf from config space and the bounds
    types = [0] * len(config_space.get_hyperparameters())
    bounds = [(np.nan, np.nan)] * len(types)
    for i, param in enumerate(config_space.get_hyperparameters()):
        parents = config_space.get_parents_of(param.name)
        if len(parents) == 0:
            can_be_inactive = False
        else:
            can_be_inactive = True
        if isinstance(param, (CategoricalHyperparameter)):
            n_cats = len(param.choices)
            if can_be_inactive:
                n_cats = len(param.choices) + 1
            types[i] = n_cats
            bounds[i] = (int(n_cats), np.nan)
        elif isinstance(param, (OrdinalHyperparameter)):
            n_cats = len(param.sequence)
            types[i] = 0
            if can_be_inactive:
                bounds[i] = (0, int(n_cats))
            else:
                bounds[i] = (0, int(n_cats) - 1)
        elif isinstance(param, Constant):
            # for constants we simply set types to 0 which makes it a numerical
            # parameter
            if can_be_inactive:
                bounds[i] = (2, np.nan)
                types[i] = 2
            else:
                bounds[i] = (0, np.nan)
                types[i] = 0
            # and we leave the bounds to be 0 for now
        elif isinstance(param, UniformFloatHyperparameter):
            # Are sampled on the unit hypercube thus the bounds
            # are always 0.0, 1.0
            if can_be_inactive:
                bounds[i] = (-1.0, 1.0)
            else:
                bounds[i] = (0, 1.0)
        elif isinstance(param, UniformIntegerHyperparameter):
            if can_be_inactive:
                bounds[i] = (-1.0, 1.0)
            else:
                bounds[i] = (0, 1.0)
        elif not isinstance(param, (UniformFloatHyperparameter,
                                    UniformIntegerHyperparameter,
                                    OrdinalHyperparameter,
                                    CategoricalHyperparameter)):
            raise TypeError("Unknown hyperparameter type %s" % type(param))
    if instance_features is not None:
        types = types + [0] * instance_features.shape[1]
    return types, bounds
```

Contributor:

Actually, I believe the resulting `self.bound_cat` and `self.bound_cont` are never used in the code anyway.

@dengdifan (Contributor, Author):

That is only used in another GP model; I will fix it.

```python
self.cont_dims = np.where(np.array(types) == 0)[0]

self.normalize_y = normalize_y
self.n_opt_restarts = n_opt_restarts
```
Contributor:
Please add a check somewhere that this parameter is a positive integer, or otherwise handle illegal values.
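A minimal validation sketch, assuming the check lives next to the assignment shown above (the helper name is hypothetical):

```python
def _validate_n_opt_restarts(n_opt_restarts: int) -> int:
    """Reject non-integer or non-positive restart counts."""
    if not isinstance(n_opt_restarts, int) or n_opt_restarts <= 0:
        raise ValueError(
            f"n_opt_restarts must be a positive integer, got {n_opt_restarts!r}"
        )
    return n_opt_restarts
```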


```python
self.num_points = 0

def _train(self, X: np.ndarray, y: np.ndarray, do_optimize: bool = True) -> 'GaussianProcess':
```
Contributor:

Suggested change:

```diff
-def _train(self, X: np.ndarray, y: np.ndarray, do_optimize: bool = True) -> 'GaussianProcess':
+def _train(self, X: np.ndarray, y: np.ndarray, do_optimize: bool = True) -> GaussianProcessGPyTorch:
```

Since it returns self, it should be type-hinted with the correct class.

```python
self.bounds_cat = bounds_cat,
self.num_inducing_points = num_inducing_points

def update_attribute(self, **kwargs: typing.Any):
```
Contributor:
The type hint is wrong: kwargs (i.e., keyword arguments) will always be a dictionary; only its entries can be of type typing.Any.

Comment on lines 652 to 653
# A modification of botorch.optim.utils._scipy_objective_and_grad;
# the key difference is that we do an additional natural gradient update here
Contributor:
Please make this an actual docstring.

Comment on lines 727 to 729
If both in and out are None: return an empty model
If only in_x and in_y are given: return a vanilla GP model
If in_x, in_y, out_x, out_y are given: return a partial sparse GP model.
Contributor:
What is the use case of an empty model?
I would rather suggest having in_x and in_y as required arguments and only out_x and out_y as optional.
Further, to be consistent with the naming in other classes, it would be good if you could rename the "x" variables to "in_X" and "out_X" respectively.

@dengdifan (Contributor, Author):

`smac.epm.base_gp.BaseModel` requires running `self._get_gp()` in `__init__`; I made all arguments optional to keep it compatible with the superclass.

Comment on lines 795 to 798
```python
except Exception as e:
    if i == n_tries - 1:
        raise e
    continue
```
Contributor:
Please try to be more specific about which exceptions are okay to skip, and add some documentation.
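One way to narrow the handler, sketched under the assumption that numerical failures (e.g. a non-positive-definite kernel matrix) are the retryable errors here; `fit_fn` and `n_tries` are illustrative names:

```python
from gpytorch.utils.errors import NotPSDError


def fit_with_retries(fit_fn, n_tries: int = 10):
    """Retry a GP fit on known numerical failures only."""
    for i in range(n_tries):
        try:
            return fit_fn()
        except (NotPSDError, RuntimeError) as e:
            # Only numerical issues are worth retrying; on the last
            # attempt the error is surfaced to the caller.
            if i == n_tries - 1:
                raise e
```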

Comment on lines 826 to 831
```python
def check_points_in_ss(X: np.ndarray,
                       cont_dims: np.ndarray,
                       cat_dims: np.ndarray,
                       bounds_cont: np.ndarray,
                       bounds_cat: typing.List[typing.List[typing.Tuple]],
                       ):
```
Contributor:
This might better live in the utils.
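If moved to the utils, a vectorized version might look roughly like this (a sketch: it assumes `bounds_cat` holds, per categorical dimension, the collection of admissible category values):

```python
import typing

import numpy as np


def check_points_in_ss(X: np.ndarray,
                       cont_dims: np.ndarray,
                       cat_dims: np.ndarray,
                       bounds_cont: np.ndarray,
                       bounds_cat: typing.List[typing.Tuple],
                       ) -> np.ndarray:
    """Return the rows of X that lie inside the given subspace bounds."""
    mask = np.ones(len(X), dtype=bool)
    if len(cont_dims) > 0:
        # All continuous coordinates must fall within [lower, upper].
        mask &= np.all(X[:, cont_dims] >= bounds_cont[:, 0], axis=1)
        mask &= np.all(X[:, cont_dims] <= bounds_cont[:, 1], axis=1)
    for dim, allowed in zip(cat_dims, bounds_cat):
        # Each categorical coordinate must be one of the allowed values.
        mask &= np.isin(X[:, dim], np.asarray(allowed))
    return X[mask]
```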

@codecov bot commented on Jul 16, 2021:

Codecov Report

Merging #730 (e46d898) into development (4399ada) will decrease coverage by 67.12%.
The diff coverage is 2.44%.

❗ The current head e46d898 differs from the pull request's most recent head dc24106. Consider uploading reports for commit dc24106 to get more accurate results.

```
@@               Coverage Diff                @@
##           development     #730       +/-   ##
================================================
- Coverage        87.27%   20.14%   -67.13%
================================================
  Files               72       77        +5
  Lines             6450     7600     +1150
================================================
- Hits              5629     1531     -4098
- Misses             821     6069     +5248
```
| Impacted Files | Coverage Δ |
| --- | --- |
| smac/epm/gp_kernels.py | 17.54% <0.00%> (-52.64%) ⬇️ |
| smac/epm/partial_sparse_gaussian_process.py | 0.00% <0.00%> (ø) |
| smac/epm/partial_sparse_gp_kernels.py | 0.00% <0.00%> (ø) |
| smac/facade/smac_ac_facade.py | 18.53% <0.00%> (-69.79%) ⬇️ |
| smac/initial_design/initial_design.py | 22.22% <0.00%> (-76.41%) ⬇️ |
| smac/optimizer/acquisition.py | 20.72% <ø> (-63.22%) ⬇️ |
| smac/optimizer/local_bo/abstract_subspace.py | 0.00% <0.00%> (ø) |
| smac/optimizer/local_bo/boing_subspace.py | 0.00% <0.00%> (ø) |
| smac/optimizer/local_bo/epm_chooser_boing.py | 0.00% <0.00%> (ø) |
| smac/optimizer/local_bo/epm_chooser_turbo.py | 0.00% <0.00%> (ø) |
| ... and 76 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a12ffa1...dc24106.

@AndreBiedenkapp (Contributor) left a comment:

@dengdifan As promised, I also provide some feedback on the abstract_subspace.
Except for a few minor things, the abstract_subspace looks all good to me :)

Comment on lines 51 to 54
```python
bounds_ss_cont: np.ndarray(D_cont, 2)
    subspace bounds of continuous hyperparameters; its length is the number of continuous hyperparameters
bounds_ss_cat: typing.List[typing.Tuple]
    subspace bounds of categorical hyperparameters; its length is the number of categorical hyperparameters
```
Contributor:
Could the bounds be used for this again, or is this a different set of bounds?

```python
if rng is None:
    self.rng = np.random.RandomState(1)
else:
    self.rng = np.random.RandomState(rng.randint(0, 2 ** 20))
```
Contributor:
I would have expected that if rng is not None, it would already be a numpy RandomState, at least according to the docstring above.

@dengdifan (Contributor, Author):

This is to avoid using the same rng from outside, since the passed rng is probably a shared reference.
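That reasoning as a small sketch (the helper name is hypothetical):

```python
import typing

import numpy as np


def derive_rng(rng: typing.Optional[np.random.RandomState]) -> np.random.RandomState:
    """Return an independent RandomState for the subspace.

    Seeding a fresh generator from the caller's rng means subsequent draws
    inside the subspace do not advance the caller's (shared, mutable)
    generator, while remaining reproducible from the outer seed.
    """
    if rng is None:
        return np.random.RandomState(1)
    return np.random.RandomState(rng.randint(0, 2 ** 20))
```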

Comment on lines 86 to 92
```python
if activate_dims is None:
    activate_dims = np.arange(n_hypers)

if activate_dims is None:
    activate_dims_cont = cont_dims
    activate_dims_cat = cat_dims
    self.activate_dims = np.arange(n_hypers)
```
Contributor:
The if statement checks the same condition twice; the second block can never execute, because the first one has already reassigned activate_dims.
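A possible fix, sketched from the diff above (the behavior for a user-supplied `activate_dims` is not shown in the hunk and is therefore omitted):

```python
if activate_dims is None:
    # One branch handles the default case; the duplicated check above
    # could never fire because activate_dims was already reassigned.
    activate_dims = np.arange(n_hypers)
    activate_dims_cont = cont_dims
    activate_dims_cat = cat_dims
self.activate_dims = activate_dims
```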


```python
self.config_origin = "subspace"

def check_points_in_ss(self,
```
Contributor:
Wasn't this function also implemented in the PartialSparseGP?

@renesass (Collaborator) commented on Jul 6, 2022:

I made some interface changes, as the new implementation caused some confusion:

  • We changed the location of Gaussian processes and random forests. They are in the folders epm/gp
    and epm/rf now.
  • Also, we restructured the optimizer folder and therefore the location of the acquisition functions
    and configuration choosers.
  • Multi-objective functions are located in the folder multi_objective.

@renesass merged commit 6b60582 into automl:development on Jul 7, 2022
@dengdifan deleted the boing branch on Jul 7, 2022
github-actions bot pushed a commit that referenced this pull request on Jul 7, 2022
renesass added a commit that referenced this pull request on Jul 14, 2022
## Features
* [BOinG](https://arxiv.org/abs/2111.05834): A two-stage Bayesian optimization approach that allows the
optimizer to focus on the most promising regions.
* [TuRBO](https://arxiv.org/abs/1910.01739): Reimplementation of the TuRBO-1 algorithm.
* Updated pSMAC: Can pass arbitrary SMAC facades now. Added an example and fixed tests.

## Improvements
* Enabled caching for multi-objectives (#872). Costs are now normalized in `get_cost`
or optionally in `average_cost`/`sum_cost`/`min_cost` to return a single float value. Therefore,
the cached cost values do not need to be updated every time a new entry is added to the runhistory.

## Interface changes
* We changed the location of Gaussian processes and random forests. They are in the folders
`epm/gaussian_process` and `epm/random_forest` now.
* Also, we restructured the optimizer folder and therefore the location of the acquisition functions
and configuration chooser.
* Multi-objective functions are located in the folder `multi_objective`.
* pSMAC facade was moved to the facade directory.

Co-authored-by: Difan Deng <deng@dengster.tnt.uni-hannover.de>
Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com>
Co-authored-by: Carolin Benjamins <benjamins@tnt.uni-hannover.de>
Co-authored-by: timruhkopf <timruhkopf@gmail.com>
github-actions bot pushed a commit that referenced this pull request on Jul 14, 2022
sharpe5 added a commit to sharpe5/SMAC3 that referenced this pull request on Aug 11, 2022