Improve resiliency of trial reservation #693

Merged

Commits on Nov 23, 2021

  1. Improve resiliency of trial reservation

    Why:
    
    We had two levels of patience when reserving a trial. There was the
    customizable `max_idle_time` used in producer.produce() to limit the
    time spent trying to generate new trials, and there was `_max_depth` in
    `reserve_trial` limiting the number of times a reservation would be
    attempted and producer.produce would be called. This led to misleading
    error messages. For instance, with many workers it happened that a
    worker would always be unable to reserve a trial because each time it
    executed producer.produce() all other workers would reserve the trials
    before the current worker had time to reserve one. In such a scenario the
    error message would be that the algorithm was unable to sample new points
    and is waiting for trials to complete. This is not true. We should state
    the number of trials that were generated during these reservation
    attempts and recommend increasing the pool-size and timeout.
    
    How:
    
    Producer.produce now makes a single attempt at producing `pool-size`
    trials (calling algo.suggest only once) and returns the number of
    successfully produced trials. All of the patience is moved to
    `reserve_trial`, which keeps attempting to reserve and produce until it
    reaches the timeout, at which point a helpful error message is raised.
    bouthilx committed Nov 23, 2021
    6720033
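
The reserve-then-produce loop described in the commit message could look roughly like the sketch below. This is not the code from the pull request: the `ReservationTimeout` exception, the `reserve` and `produce` callables, and every parameter name are hypothetical stand-ins chosen for illustration. It only assumes what the message states, namely that producing is attempted once per iteration and returns a count, and that all waiting is governed by a single timeout.

```python
import time


class ReservationTimeout(Exception):
    """Hypothetical exception raised when no trial is reserved in time."""


def reserve_trial(reserve, produce, timeout=60.0, pool_size=1, clock=time.monotonic):
    """Alternate between reserving and producing until `timeout` elapses.

    `reserve()` returns a trial, or None if none is available.
    `produce(pool_size)` makes one attempt and returns how many trials
    it produced. Both are stand-ins for the real experiment and producer.
    """
    start = clock()
    produced = 0  # total trials produced during these reservation attempts

    trial = reserve()
    while trial is None:
        if clock() - start > timeout:
            # Report how many trials were produced so the message is not
            # mistaken for "the algorithm cannot sample new points".
            raise ReservationTimeout(
                f"Could not reserve a trial within {timeout} seconds; "
                f"{produced} trial(s) were produced during these attempts. "
                "Consider increasing the pool size or the timeout."
            )
        produced += produce(pool_size)
        trial = reserve()

    return trial
```

Keeping the count of produced trials in the loop is what lets the error distinguish "other workers took the trials first" from "nothing could be produced at all".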