Minimize intermediate state transition for bundle manager #2438

Merged: 6 commits merged into master from termination-column on Jul 3, 2020

@candicegjing candicegjing commented Jun 19, 2020

Fixed #2485

Copied from the above ref:
There are essentially two issues here:

  1. The bundle manager keeps assigning bundles to a worker that is about to be terminated.
    a. A bundle assigned to a terminating worker stays in the starting state for 5 minutes until the bundle manager picks it up again.
    b. After the bundle manager picks up the starting bundle, the worker goes offline, but a record of the offline worker still exists in the database.
  2. The bundle manager keeps assigning bundles to a worker that is offline (the worker is disconnected, but its record still exists in the database).

Based on the above summary, there are differences and overlap between 1 and 2:

  1. overlap: 1.b and 2.
    1. goal: minimize the number of bundles that will be assigned to an offline worker.
    2. approach: detect the offline worker before dispatching a bundle.
  2. difference: 1.a
    1. goal: minimize the number of bundles that will be assigned to a terminating worker.
    2. approach: detect the terminating worker before dispatching a bundle.

This PR aims to fix 1.a: minimize the number of bundles that will be assigned to a worker which will be terminating soon.

In this PR, I added a new column, is_terminating, to the worker table (please bear with my English, and feel free to suggest a better name). As soon as the termination signal is sent, the worker updates the database with is_terminating=True. When dispatching bundles, the bundle manager always checks this field before starting a new bundle on a worker. This minimizes the number of bundles sent to a worker that is about to be terminated.
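To illustrate the idea, here is a minimal sketch of the flag-and-filter flow. It is not the code in this PR: it uses an in-memory SQLite table, and the function names (create_worker_table, mark_terminating, dispatchable_workers, demo) and the worker_id column are hypothetical. Only the is_terminating column name comes from the description above.

```python
import sqlite3


def create_worker_table(conn):
    # Hypothetical, simplified worker table; only is_terminating mirrors the PR.
    conn.execute(
        "CREATE TABLE worker ("
        " worker_id TEXT PRIMARY KEY,"
        " is_terminating INTEGER NOT NULL DEFAULT 0)"
    )


def mark_terminating(conn, worker_id):
    # Worker side: record the flag as soon as the termination signal arrives.
    conn.execute(
        "UPDATE worker SET is_terminating = 1 WHERE worker_id = ?",
        (worker_id,),
    )
    conn.commit()


def dispatchable_workers(conn):
    # Bundle manager side: skip workers that have announced termination.
    rows = conn.execute(
        "SELECT worker_id FROM worker WHERE is_terminating = 0 ORDER BY worker_id"
    )
    return [row[0] for row in rows]


def demo():
    conn = sqlite3.connect(":memory:")
    create_worker_table(conn)
    conn.executemany(
        "INSERT INTO worker (worker_id) VALUES (?)", [("w1",), ("w2",)]
    )
    mark_terminating(conn, "w2")  # w2 received its termination signal
    return dispatchable_workers(conn)
```

In this sketch, calling demo() leaves only w1 eligible for new bundles, so a bundle is never started on w2 and does not sit in the starting state waiting to be picked up again.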

  1. Pros:
    1. avoids bundles getting stuck in the starting state for 5 minutes (which disrupts the bundles' execution priority)
    2. minimizes the intermediate state transition between starting and staged
  2. Cons:
    1. one more database call, which adds performance overhead

@candicegjing candicegjing changed the title Minimize intermediate state transition for bundles Minimize intermediate state transition for bundle manager Jun 19, 2020
@nelson-liu nelson-liu merged commit 0a14c1c into master Jul 3, 2020
@nelson-liu nelson-liu deleted the termination-column branch July 3, 2020 06:11
adiprerepa pushed a commit that referenced this pull request May 27, 2021
Co-authored-by: Jing Ge <stanford@Stanfords-MacBook-Pro.local>
Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com>
Co-authored-by: Nelson Liu <nelson-liu@users.noreply.github.com>
Co-authored-by: Nelson Liu <nfliu@nelsonliu.me>

Successfully merging this pull request may close these issues.

Ensure dispatching jobs in the order that user specified