Surprising results of trial values over time #769

amueller · 2023-10-10T22:39:47Z

Thank you for your quick reply and the awesome plot_trials_over_time function.
I'm a bit surprised by the output I get on my first experiment, though:

I'm running a toy experiment on a benchmark dataset, trying to maximize accuracy (I don't have checkpointing enabled for now).
This is running MOBSTER and I would have expected that the orange trial at the bottom would be stopped pretty early on, but for some reason it keeps going. Do you have an explanation for that?

geoalgo · 2023-10-11T11:40:19Z

Hi Andreas,

I am speculating but based on your previous snippet:

config_space = {
    'learning-rate': loguniform(1e-5, 1e-1),
    'n-layers': randint(1, 10),
    'hidden-size': lograndint(4, 2048),
    'dropout-rate': uniform(0, 1),
    'weight_decay': loguniform(1e-7, 1e-1),
    'onehot': choice([True, False]),
    'epochs': 10000,
}

it seems that you are running at most 10000 epochs. What ASHA (and MOBSTER) do is let a configuration continue when not enough trials are available for comparison. This avoids the need for synchronization: you accept a potentially suboptimal decision early on, when little data is available, and in exchange you can parallelize in a simple fashion.
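To make the "not enough trials for comparison" point concrete, here is a minimal pure-Python sketch of a stopping-type rung decision. It is a simplified illustration, not Syne Tune's implementation: the names `rung_levels` and `StoppingRung` are hypothetical, and the minimum resource of 1 epoch and reduction factor of 3 are assumptions chosen for the example.

```python
def rung_levels(r_min, r_max, eta):
    """Resource levels (e.g. epochs) at which trials are compared."""
    levels, r = [], r_min
    while r < r_max:
        levels.append(r)
        r *= eta
    return levels


class StoppingRung:
    """One rung of a simplified stopping-type ASHA (metric is maximized).

    Hypothetical helper for illustration only, not part of Syne Tune.
    """

    def __init__(self, eta=3):
        self.eta = eta
        self.scores = []  # metric values of all trials that reached this rung

    def should_continue(self, score):
        # Record the new result, then keep the trial only if it ranks in the
        # top 1/eta of everything seen at this rung so far. With fewer than
        # eta results, only the best one so far is kept.
        self.scores.append(score)
        ranked = sorted(self.scores, reverse=True)
        n_keep = max(1, len(ranked) // self.eta)
        return score >= ranked[n_keep - 1]


# Assumed settings: minimum resource 1 epoch, maximum 10000, eta = 3.
print(rung_levels(1, 10000, 3))

rung = StoppingRung(eta=3)
print(rung.should_continue(0.30))  # poor trial arrives first, nothing to compare against
print(rung.should_continue(0.90))  # strong trial arrives next
print(rung.should_continue(0.50))  # mediocre trial arrives later
```

In this sketch the first trial to reach a rung is always allowed to continue, however poor its accuracy, because it is trivially the best result seen so far. A trial with the same score arriving after a stronger one would be stopped.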

If you do not want this to happen, you can use checkpointing in MOBSTER/ASHA, in which case trials are paused and resumed when not enough trials exist to compare results. For instance, see this FAQ item on how to enable trial checkpointing: https://syne-tune.readthedocs.io/en/latest/faq.html#how-can-i-enable-trial-checkpointing.

aaronkl · 2023-10-12T18:41:44Z

Thanks Andreas for your interest! To add to David's point, there are essentially two different ways to implement ASHA, which we call stopping and promotion. The stopping variant is similar to the median stopping rule, and the promotion variant follows the exact definition proposed in the original ASHA paper. More details about the two variants are described here. In Syne Tune the default is the stopping variant, which always runs one of the first N configurations until the end, where N is the number of workers (we use it as the default because it does not require checkpointing from the user). My guess is that this is also what we are seeing here.
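For contrast, the promotion variant can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not Syne Tune's code: `promotable` is a hypothetical helper, the reduction factor of 3 is assumed, and the bookkeeping of which trials were already promoted is left out.

```python
def promotable(rung_scores, eta=3):
    """Return the trial ids at a rung that may be promoted (metric is maximized).

    rung_scores maps a trial id to the metric it reported at this rung.
    Only the top len(rung_scores) // eta trials qualify, so with fewer than
    eta results nothing is promoted and the trials stay paused.
    """
    n_top = len(rung_scores) // eta
    ranked = sorted(rung_scores, key=rung_scores.get, reverse=True)
    return ranked[:n_top]


# A poor trial reaches the rung first: it is paused, not promoted.
print(promotable({"orange": 0.30}))
# Still too few results at this rung to promote anything.
print(promotable({"orange": 0.30, "blue": 0.90}))
# With three results, only the best one moves on.
print(promotable({"orange": 0.30, "blue": 0.90, "green": 0.50}))
```

Here a weak early trial waits at the rung instead of running on, which is why this variant relies on the training script being able to checkpoint and resume.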

amueller · 2023-10-19T22:38:43Z

Thank you, that seems a likely explanation, I'll check the paper for more details!

amueller closed this as completed Oct 19, 2023
