Surprising results of trial values over time #769
Comments
Hi Andreas, I am speculating, but based on your previous snippet:

```python
from syne_tune.config_space import (
    loguniform, randint, lograndint, uniform, choice,
)

config_space = {
    'learning-rate': loguniform(1e-5, 1e-1),
    'n-layers': randint(1, 10),
    'hidden-size': lograndint(4, 2048),
    'dropout-rate': uniform(0, 1),
    'weight_decay': loguniform(1e-7, 1e-1),
    'onehot': choice([True, False]),
    'epochs': 10000,
}
```

it seems that you are running at most 10000 epochs. What ASHA (and MOBSTER) do is let a configuration continue when not enough trials are available for comparison. This avoids the need for synchronization: you accept making a potentially suboptimal decision early on, when not enough data is available, but in return you can parallelize in a simple fashion. If you do not want this to happen, you can use checkpointing with MOBSTER/ASHA, in which case trials are paused and resumed when not enough trials exist to compare results. See for instance this FAQ item on how to support trial checkpointing: https://syne-tune.readthedocs.io/en/latest/faq.html#how-can-i-enable-trial-checkpointing
Thanks Andreas for your interest! To add to David's point, there are essentially two different ways to implement ASHA, which we call stopping and promotion. The stopping variant is similar to the median stopping rule, while the promotion variant follows the exact definition proposed in the original ASHA paper. More details about the two variants are described here. In Syne Tune the default is the stopping variant, which always runs one of the first N configurations until the end, where N is the number of workers (we use it as the default because it does not require checkpointing from the user). My guess is that this is also what we are seeing here.
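If you want the promotion variant instead, the scheduler type can be selected when the scheduler is created. A sketch along these lines (argument names may differ slightly between Syne Tune versions; the metric and resource names are only assumed to match the snippet above):

```python
from syne_tune.optimizer.baselines import MOBSTER

# Promotion-type MOBSTER pauses trials at rung levels and later resumes
# (promotes) the most promising ones, which is only efficient if the
# training script supports checkpointing as sketched above.
scheduler = MOBSTER(
    config_space,
    metric="accuracy",
    mode="max",
    resource_attr="epoch",
    max_resource_attr="epochs",
    type="promotion",  # default is "stopping"
)
```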
Thank you, that seems a likely explanation, I'll check the paper for more details!
Thank you for your quick reply and the awesome `plot_trials_over_time` function. I'm a bit surprised by the output I get on my first experiment, though:
I'm running a toy experiment on a benchmark dataset, trying to maximize accuracy (I don't have checkpointing enabled for now).
This is running `MOBSTER`, and I would have expected the orange trial at the bottom to be stopped pretty early on, but for some reason it keeps going. Do you have an explanation for that?
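For completeness, a plot like the one above can be produced roughly as follows. This is only a sketch: "my-tuning-experiment" is a placeholder for the actual tuner name, and the exact signature of `plot_trials_over_time` may differ between Syne Tune versions:

```python
from syne_tune.experiments import load_experiment

# "my-tuning-experiment" is a placeholder for the tuner name printed when the
# tuning job starts (also the folder name under ~/syne-tune/).
experiment = load_experiment("my-tuning-experiment")

# Plot each trial's reported metric against wall-clock time; every colored
# curve corresponds to one trial.
experiment.plot_trials_over_time()
```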