Fix Multi-Objective cache #872

Merged: renesass merged 16 commits into development from fix_mo_cache on Jul 7, 2022
Conversation

@renesass (Collaborator) commented on Jul 4, 2022:

Closes #852.

I slightly rearranged the runhistory so that it's more intuitive.

What else did I do?
`average_cost`/`min_cost`/`sum_cost` now return a list of floats (instead of a single float).
For example, in `average_cost` each objective is averaged separately: given two runs with costs [100, 200] and [0, 0], you get [50, 100]. However, once you call `get_cost`, this (cached) [50, 100] is normalized based on all entries seen so far, so `get_cost` still returns a single float in the end (see the sketch below).
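
A minimal NumPy sketch of the behaviour described above (illustration only, not SMAC's actual implementation; the mean aggregation at the end is an assumption):

```python
import numpy as np

# Two runs with two objectives each, as in the example above.
run_costs = np.array([[100.0, 200.0], [0.0, 0.0]])

# average_cost-style result: each objective is averaged separately -> [50., 100.]
avg = run_costs.mean(axis=0)

# get_cost-style result: normalize each objective over all entries seen so far,
# then aggregate the normalized vector into a single float (mean is assumed here).
lower = run_costs.min(axis=0)                          # [0., 0.]
upper = run_costs.max(axis=0)                          # [100., 200.]
normalized = (avg - lower) / (upper - lower + 1e-10)   # [0.5, 0.5]
scalar_cost = float(normalized.mean())                 # 0.5

print(avg, scalar_cost)
```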

Unfortunately, the intensifier and some other methods work with average_cost or sum_cost (which should actually be a private method), so I have to call normalize_cost in the intensifier again. Since _cost_per_config (the cache behind average_cost) always stores the current state with all objective values, we cache correctly and only normalize once the value is needed in the get_cost method.

Edit: I added an argument to average_cost/min_cost/sum_cost to optionally return the normalized (single float) value. That makes the code simpler and more scalable (see the sketch below).
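
A rough sketch of how such an argument could look (the standalone signature and the `normalize` flag name are assumptions for illustration, not SMAC's exact API):

```python
from typing import Union

import numpy as np


def average_cost(costs: list[list[float]], normalize: bool = False) -> Union[list[float], float]:
    """Illustrative only: per-objective average, optionally collapsed to one float."""
    arr = np.asarray(costs, dtype=float)
    avg = arr.mean(axis=0)

    if not normalize:
        return avg.tolist()

    # Min-max normalize per objective, then aggregate (mean aggregation assumed).
    span = arr.max(axis=0) - arr.min(axis=0) + 1e-10
    return float(((avg - arr.min(axis=0)) / span).mean())
```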

One more thing that is really weird: in runhistory2epm we iterate over the run dictionary (run key, run value), but there we use run.cost directly and hence have to call normalize_costs again in the multi-objective case (see the sketch below). I wonder why it is done this way? @mfeurer
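
For reference, a sketch of the pattern being questioned here (names such as `run_dict`, `run_value.cost`, and `objective_bounds` are assumptions, not the exact runhistory2epm code): iterating the run dictionary and normalizing each run's raw multi-objective cost on the fly.

```python
import numpy as np


def collect_costs(run_dict, objective_bounds):
    """Sketch only: gather (optionally normalized) costs from a run dictionary."""
    lower, upper = np.asarray(objective_bounds, dtype=float).T  # per-objective (min, max)
    costs = []
    for run_key, run_value in run_dict.items():
        cost = np.atleast_1d(run_value.cost).astype(float)
        if cost.size > 1:  # multi-objective case: normalize again right here
            cost = (cost - lower) / (upper - lower + 1e-10)
        costs.append(cost)
    return costs
```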

@renesass linked an issue on Jul 4, 2022 that may be closed by this pull request.
"""
Transform a multi-objective loss to a single loss.

Parameters
----------
values (np.ndarray): Normalized values.
values : list[float]
Collaborator:

I would argue that the normalization is part of the aggregation strategy and should probably be moved here instead.

@renesass (Collaborator, Author):

I don't see a reason why the values should not be normalized here.

@timruhkopf (Collaborator) left a comment:

Normalization returns a vector of ones if budget = None. This may be neither intuitive nor desirable.

  • We might want to make normalization part of the aggregation strategy instead (making it explicit); see the sketch after this list.

  • We could also call the AggregationStrategy a ScalarizationStrategy, implying that we want to turn the multi-objective costs into a single scalar objective.
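
A rough sketch of the suggestion above, with hypothetical names (not SMAC's actual classes): a scalarization strategy that owns the normalization step explicitly.

```python
from abc import ABC, abstractmethod

import numpy as np


class ScalarizationStrategy(ABC):
    """Hypothetical interface: turn a multi-objective cost vector into a scalar."""

    @abstractmethod
    def __call__(self, costs: np.ndarray) -> float:
        ...


class MeanScalarization(ScalarizationStrategy):
    """Normalizes per objective with given bounds, then averages (illustrative only)."""

    def __init__(self, bounds: list[tuple[float, float]]):
        self._lower = np.array([b[0] for b in bounds], dtype=float)
        self._upper = np.array([b[1] for b in bounds], dtype=float)

    def __call__(self, costs: np.ndarray) -> float:
        normalized = (costs - self._lower) / (self._upper - self._lower + 1e-10)
        return float(normalized.mean())
```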

@renesass merged commit 100e2f3 into development on Jul 7, 2022
@renesass deleted the fix_mo_cache branch on July 7, 2022 09:02

# Normalize so that all theta values sum up to 1
theta = theta / (np.sum(theta) + 1e-10)

# Weight the values
theta_f = theta * values

- return np.max(theta_f, axis=1) + self.rho * np.sum(theta_f, axis=1)
+ return np.max(theta_f, axis=0) + self.rho * np.sum(theta_f, axis=0)
Contributor:

I'm somewhat surprised to see that the summation over an axis changes without a respective test changing. Does this mean that there is no test for ParEGO?
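
For what it's worth, such a test could look roughly like this: a standalone copy of the scalarization from the snippet above, checked with a pytest-style assertion (the function name and the weight handling are illustrative, not SMAC's ParEGO class).

```python
import numpy as np


def parego_scalarize(values: np.ndarray, theta: np.ndarray, rho: float = 0.05) -> float:
    """Standalone copy of the scalarization from the snippet above, for a 1-D cost vector."""
    theta = theta / (np.sum(theta) + 1e-10)
    theta_f = theta * values
    return float(np.max(theta_f, axis=0) + rho * np.sum(theta_f, axis=0))


def test_parego_scalarize_axis():
    # A single (already normalized) two-objective cost vector.
    values = np.array([0.25, 0.75])
    theta = np.array([1.0, 1.0])  # equal weights -> 0.5 each after normalization

    result = parego_scalarize(values, theta, rho=0.05)

    # max(0.125, 0.375) + 0.05 * (0.125 + 0.375) = 0.375 + 0.025 = 0.4
    assert np.isclose(result, 0.4)
```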

numerator = data - min_value
normalized_values.append(numerator / denominator)
cost = p / q
costs += [cost]
Contributor:

Why not costs.append()?

renesass added a commit that referenced this pull request Jul 14, 2022
## Features
* [BOinG](https://arxiv.org/abs/2111.05834): A two-stage Bayesian optimization approach to allow the 
optimizer to focus on the most promising regions.
* [TurBO](https://arxiv.org/abs/1910.01739): Reimplementation of the TurBO-1 algorithm.
* Updated pSMAC: Can pass arbitrary SMAC facades now. Added example and fixed tests.

## Improvements
* Enabled caching for multi-objectives (#872). Costs are now normalized in `get_cost` 
or optionally in `average_cost`/`sum_cost`/`min_cost` to obtain a single float value. Therefore,
the cached cost values do not need to be updated every time a new entry is added to the runhistory.

## Interface changes
* We changed the location of Gaussian processes and random forests. They are in the folders
`epm/gaussian_process` and `epm/random_forest` now.
* Also, we restructured the optimizer folder and therefore the location of the acquisition functions
and configuration chooser.
* Multi-objective functions are located in the folder `multi_objective`.
* The pSMAC facade was moved to the facade directory.

Co-authored-by: Difan Deng <deng@dengster.tnt.uni-hannover.de>
Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com>
Co-authored-by: Carolin Benjamins <benjamins@tnt.uni-hannover.de>
Co-authored-by: timruhkopf <timruhkopf@gmail.com>
github-actions bot pushed a commit that referenced this pull request Jul 14, 2022
sharpe5 added a commit to sharpe5/SMAC3 that referenced this pull request Aug 11, 2022
Linked issue that may be closed by this pull request: Integrate Multi-Objective caching

3 participants