Improve resiliency of trial reservation #693

Merged

bouthilx
Member

Why:

We had two levels of patience when reserving a trial. There was the
customizable `max_idle_time` used in `producer.produce()` to limit the
time spent trying to generate new trials, and there was `_max_depth` in
`reserve_trial` limiting the number of times a reservation would be
attempted and `producer.produce()` would be called. This led to
misleading error messages. For instance, with many workers, a worker
could repeatedly fail to reserve a trial because each time it
executed `producer.produce()` all other workers would reserve the trials
before the current worker had time to reserve one. In such a scenario
the error message would say that the algorithm was unable to sample new
points and was waiting for trials to complete. This is not true. We
should state the number of trials that were generated during these
reservation attempts and recommend increasing the pool-size and timeout.

How:

`Producer.produce` now attempts to produce `pool-size` trials only once
(calling `algo.suggest` only once) and returns the number of
successfully produced trials. All of the patience is moved to
`reserve_trial`, which keeps attempting to reserve and produce until it
reaches the timeout, at which point a helpful error message is raised.
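
The control flow described above can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the `reserve` and `produce` callables, the `ReservationTimeout` exception, the injectable `clock` parameter, and the wording of the error message are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import time


class ReservationTimeout(Exception):
    """Hypothetical exception: no trial could be reserved before the timeout."""


def reserve_trial(reserve, produce, pool_size, timeout=60.0, clock=time.monotonic):
    """Alternate between reserving and producing until the timeout expires.

    `reserve()` returns a trial, or None if none is available.
    `produce(pool_size)` makes a single production attempt and returns the
    number of trials it managed to register.
    """
    start = clock()
    produced = 0  # total trials generated across all attempts
    attempts = 0  # number of production attempts made
    while True:
        trial = reserve()
        if trial is not None:
            return trial
        if clock() - start >= timeout:
            break
        # One production attempt per iteration; all the patience lives in
        # this loop rather than inside produce().
        produced += produce(pool_size)
        attempts += 1
    raise ReservationTimeout(
        f"Unable to reserve a trial within {timeout} seconds. "
        f"{produced} trials were generated over {attempts} attempts but were "
        "reserved by other workers. Consider increasing the pool size and "
        "the timeout."
    )
```

Because the loop counts the trials produced, the error can distinguish a starved worker (many trials generated, none reserved) from an algorithm that cannot suggest anything (zero trials generated). A real implementation would probably also sleep briefly between attempts to avoid a busy loop.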

@bouthilx bouthilx added this to the v0.2 milestone Nov 23, 2021
@bouthilx bouthilx added this to In progress in Release v0.2.0 via automation Nov 23, 2021
@bouthilx bouthilx force-pushed the feature/improved_producer_resiliency branch 4 times, most recently from 91be352 to 8bcfd85 Compare November 23, 2021 17:13
@bouthilx bouthilx force-pushed the feature/improved_producer_resiliency branch from 8bcfd85 to 6720033 Compare November 23, 2021 19:08
@bouthilx bouthilx merged commit 9ca27cf into Epistimio:develop Nov 24, 2021
Release v0.2.0 automation moved this from In progress to Done Nov 24, 2021
@bouthilx bouthilx mentioned this pull request Nov 24, 2021
@bouthilx bouthilx added the enhancement Improves a feature or non-functional aspects (e.g., optimization, prettify, technical debt) label Nov 24, 2021