Duplicate pending trials from parent/child for exc #631

Merged
merged 8 commits into from
Aug 20, 2021

bouthilx
Member

Fixes #576

Why:

An experiment cannot reserve trials of its parent experiment. This is very
problematic, as non-completed trials of parents cannot be executed anymore
unless the environment state is reverted to the one used for the parent
experiment (e.g., by resetting the code). The experiment should look for
executable trials across the EVC tree.

Running trials from parent experiments may cause issues if the child
experiment has a different script path, a different code version, or a
different cmdline call. We should attempt to run the trial with the
corresponding experiment configuration. It's not clear what to do if it
fails. If we simply leave the trial status as interrupted, the child
experiment will try it again.

Another option is to copy the trial to the child experiment and run it
with the child configuration. If the user checkpointed the trial state with
trial.hash_params, the checkpoint will be lost, as trial.hash_params
will change based on the experiment id. This is safe: it protects users
from resuming with a different code version.
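
To illustrate why the checkpoint is lost, here is a minimal sketch. It assumes the hash mixes the experiment id with the params; `hash_params` below is a hypothetical stand-in written for this example, not the library's actual implementation, and the ids and checkpoint path are made up.

```python
import hashlib


def hash_params(experiment_id, params):
    """Hypothetical stand-in: hash the experiment id together with the params."""
    payload = repr((experiment_id, sorted(params.items())))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


params = {"lr": 0.01, "momentum": 0.9}

# Same params, different experiments: the keys differ.
parent_key = hash_params("parent-exp", params)
child_key = hash_params("child-exp", params)

# A checkpoint saved under the parent's key is not found from the child.
checkpoints = {parent_key: "/tmp/ckpt/parent-trial"}
resumed = checkpoints.get(child_key)  # None, so the trial starts from scratch
```

Under this assumption, the copied trial cannot pick up state written by the parent's code version, which is the safety property described above.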

How:

Fetch trials from the EVC tree and duplicate any pending trials to the
current experiment. A hash of the params is used to avoid duplicating trials
that are already available in the current experiment.
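
The approach can be sketched roughly as follows. This is not the code from this PR: the `Trial` dataclass, `params_key`, `duplicate_pending_trials`, and the set of statuses treated as pending are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, replace

# Assumption: which statuses count as "pending" (not yet completed).
PENDING = {"new", "interrupted", "suspended"}


@dataclass(frozen=True)
class Trial:
    experiment: str
    params: tuple  # sorted (name, value) pairs
    status: str


def params_key(trial):
    """Hash of the params only, so equal params collide across experiments."""
    return hashlib.sha256(repr(trial.params).encode("utf-8")).hexdigest()


def duplicate_pending_trials(current_id, current_trials, tree_trials):
    """Return copies of pending tree trials that the current experiment lacks."""
    known = {params_key(t) for t in current_trials}
    copies = []
    for trial in tree_trials:
        if trial.experiment == current_id or trial.status not in PENDING:
            continue
        key = params_key(trial)
        if key in known:
            continue  # already available in the current experiment
        known.add(key)
        copies.append(replace(trial, experiment=current_id))
    return copies


parent_trials = [
    Trial("parent", (("lr", 0.1),), "interrupted"),
    Trial("parent", (("lr", 0.2),), "completed"),
    Trial("parent", (("lr", 0.3),), "new"),
]
child_trials = [Trial("child", (("lr", 0.3),), "new")]

copies = duplicate_pending_trials("child", child_trials, parent_trials)
```

In this example only the interrupted `lr=0.1` trial is copied: the `lr=0.2` trial is completed, and the `lr=0.3` trial already exists in the child.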

@bouthilx bouthilx added the bug Indicates an unexpected problem or unintended behavior label Jul 29, 2021
@bouthilx bouthilx added this to the v0.1.16 milestone Jul 29, 2021
Max and mean strategies were failing when none of the observed trials had a
valid objective.
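
A minimal sketch of this failure mode and one possible guard, assuming invalid objectives are represented as `None`. The `reduce_objectives` helper and its `default` fallback value are hypothetical, not the fix applied in this PR.

```python
from statistics import mean


def reduce_objectives(objectives, reduce=max, default=float("inf")):
    """Reduce valid objectives, falling back to `default` when none are valid.

    Calling max() or mean() directly on an empty list raises an exception,
    which is the kind of failure described above.
    """
    valid = [o for o in objectives if o is not None]
    if not valid:
        return default  # assumed fallback; the real default may differ
    return reduce(valid)


all_invalid = reduce_objectives([None, None])
max_value = reduce_objectives([1.0, None, 3.0])
mean_value = reduce_objectives([1.0, None, 3.0], reduce=mean)
```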
@bouthilx bouthilx force-pushed the hotfix/reserve_parent_trials branch from b720e5f to c6a9549 Compare July 29, 2021 20:51
@bouthilx bouthilx merged commit 23a4127 into Epistimio:develop Aug 20, 2021
@bouthilx bouthilx deleted the hotfix/reserve_parent_trials branch August 20, 2021 20:18
@bouthilx bouthilx mentioned this pull request Aug 23, 2021
Development

Successfully merging this pull request may close these issues.

Experiment cannot reserve trial of parent experiment
1 participant