Surprising results of trial values over time #769

Closed
amueller opened this issue Oct 10, 2023 · 3 comments

Comments

@amueller

Thank you for your quick reply and the awesome plot_trials_over_time function.
I'm a bit surprised by the output I get on my first experiment, though:
[image: plot_trials_over_time output]
I'm running a toy experiment on a benchmark dataset, trying to maximize accuracy (and I don't have checkpointing enabled for now).
This is running MOBSTER, and I would have expected the orange trial at the bottom to be stopped pretty early on, but for some reason it keeps going. Do you have an explanation for that?

@geoalgo
Collaborator

geoalgo commented Oct 11, 2023

Hi Andreas,

I am speculating, but based on your previous snippet:

config_space = {
    'learning-rate': loguniform(1e-5, 1e-1),
    'n-layers': randint(1, 10),
    'hidden-size': lograndint(4, 2048),
    'dropout-rate': uniform(0, 1),
    'weight_decay': loguniform(1e-7, 1e-1),
    'onehot': choice([True, False]),
    'epochs': 10000,
}

it seems that you are running at most 10000 epochs. What ASHA (and MOBSTER) do is let a configuration continue when not enough trials are available for comparison. This avoids the need for synchronization: you accept a potentially suboptimal decision when little data is available at the beginning, but in exchange you can parallelize in a simple fashion.

If you do not want this to happen, you can use checkpointing in MOBSTER/ASHA, in which case trials are paused and resumed when not enough trials exist to compare results. For instance, see this item in the FAQ on how to support trial checkpointing: https://syne-tune.readthedocs.io/en/latest/faq.html#how-can-i-enable-trial-checkpointing.
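
To illustrate why a weak trial can keep running when too few trials are available for comparison, here is a minimal sketch of a stopping-type decision at a single rung. It is a simplification written for this thread, not Syne Tune's actual implementation: the function name `keep_running`, the `reduction_factor` default, and the exact cutoff rule are all assumptions.

```python
def keep_running(value, earlier_values, reduction_factor=3):
    """Simplified stopping-type decision at one rung (accuracy is maximized).

    value: metric of the trial that just reached the rung.
    earlier_values: metrics of trials that reached this rung before it.
    """
    if not earlier_values:
        # Nothing to compare against, so the trial is allowed to continue.
        return True
    ranked = sorted(earlier_values + [value], reverse=True)
    # Keep roughly the top 1/reduction_factor of the trials seen at this rung.
    n_keep = max(1, len(ranked) // reduction_factor)
    return value >= ranked[n_keep - 1]


# A weak trial that reaches a rung first is not stopped ...
first_at_rung = keep_running(0.10, [])
# ... but the same value is stopped once better trials are recorded there.
later_at_rung = keep_running(0.10, [0.80, 0.90])
print(first_at_rung, later_at_rung)
```

A trial that is always the first to arrive at each rung passes every check of this kind, which would match a poorly performing trial running all the way to the end.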

@aaronkl
Collaborator

aaronkl commented Oct 12, 2023

Thanks, Andreas, for your interest! To add to David's point, there are essentially two different ways to implement ASHA, which we call stopping and promotion. The stopping variant is similar to the median stopping rule, while the promotion variant follows the exact definition proposed in the original ASHA paper. More details about the two variants are described here. In Syne Tune, the default is the stopping variant, which always runs one of the first N configurations until the end, where N is the number of workers (we use this as the default because it does not require checkpointing from the user). My guess is that this is also what we are seeing here.
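
For contrast, the promotion variant can be sketched as follows. This is a rough illustration of the promotion rule from the ASHA paper (promote a paused trial only if it is in the top 1/reduction_factor of its rung), not Syne Tune code; `pick_promotion` and its arguments are hypothetical names.

```python
def pick_promotion(rung, promoted, reduction_factor=3):
    """Pick a paused trial to promote from one rung, or None.

    rung: dict mapping trial id to the metric recorded at this rung
          (accuracy, so larger is better).
    promoted: set of trial ids already promoted from this rung.
    """
    # Only the top 1/reduction_factor of recorded trials are candidates.
    n_top = len(rung) // reduction_factor
    ranked = sorted(rung, key=rung.get, reverse=True)
    for trial_id in ranked[:n_top]:
        if trial_id not in promoted:
            return trial_id
    return None


# With a single recorded trial there is no candidate, so it stays paused.
lonely = pick_promotion({"a": 0.10}, set())
# Once three trials are recorded, only the best one is promoted.
best = pick_promotion({"a": 0.10, "b": 0.90, "c": 0.50}, set())
print(lonely, best)
```

Under this rule a trial waits at its rung until enough other trials have reported, which is why the promotion variant needs checkpointing to pause and resume trials.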

@amueller
Author

Thank you, that seems a likely explanation, I'll check the paper for more details!
