Minimize intermediate state transition for bundle manager #2438
Fixed #2485
Copied from the above ref:
There are essentially two issues here:
1. Bundles get stuck in the `starting` state for 5 minutes until they get picked up by the bundle manager.
2. After a `starting` bundle gets picked up by the bundle manager, the worker goes offline, but a record of this offline worker remains in the database.

Based on the above summary, issues 1 and 2 differ but also overlap:
This PR aims to fix 1.a: minimize the number of bundles assigned to a worker that is about to terminate.
In this PR, I added a new column, `is_terminating`, to the `worker` table (please bear with my poor English; feel free to suggest a better name). Whenever the termination signal is sent, the worker updates the database with `is_terminating=True` as early as possible. Then, when dispatching bundles, the bundle manager always checks this field before starting a new bundle on a worker. This way, we minimize the number of bundles sent to a worker that is about to be terminated.

1. Avoids bundles getting stuck in the `starting` state for 5 minutes (which messes up bundles' execution priority).
2. The intermediate state transition between `starting` and `staged`.
3. One more database call --> performance overhead.
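The flag-then-filter approach can be sketched roughly as follows. This is a minimal, self-contained illustration using an in-memory SQLite database, not the bundle manager's actual code: the trimmed `worker` schema, the `free_cpus` column, and the helpers `create_schema`, `mark_terminating`, `install_termination_handler`, and `pick_worker` are all hypothetical. Only the `is_terminating` flag and the idea of checking it at dispatch time come from this PR.

```python
import signal
import sqlite3


def create_schema(conn):
    # Hypothetical, trimmed-down worker table; the real one has more columns.
    conn.execute(
        "CREATE TABLE worker ("
        " worker_id TEXT PRIMARY KEY,"
        " free_cpus INTEGER NOT NULL,"
        " is_terminating INTEGER NOT NULL DEFAULT 0)"
    )


def mark_terminating(conn, worker_id):
    """Worker side: record as early as possible that this worker is going away."""
    with conn:  # commits the UPDATE when the block exits without error
        conn.execute(
            "UPDATE worker SET is_terminating = 1 WHERE worker_id = ?",
            (worker_id,),
        )


def install_termination_handler(conn, worker_id):
    """Hypothetical hook: set the flag as soon as SIGTERM arrives."""
    def handler(signum, frame):
        mark_terminating(conn, worker_id)
        # A real worker would continue its graceful shutdown here (not shown).
    signal.signal(signal.SIGTERM, handler)


def pick_worker(conn, cpus_needed):
    """Bundle-manager side: only consider workers that are not terminating."""
    row = conn.execute(
        "SELECT worker_id FROM worker"
        " WHERE is_terminating = 0 AND free_cpus >= ?"
        " ORDER BY free_cpus DESC LIMIT 1",
        (cpus_needed,),
    ).fetchone()
    return row[0] if row else None


# Usage: two idle workers, then each one starts terminating in turn.
conn = sqlite3.connect(":memory:")
create_schema(conn)
with conn:
    conn.executemany(
        "INSERT INTO worker (worker_id, free_cpus) VALUES (?, ?)",
        [("w1", 8), ("w2", 4)],
    )
print(pick_worker(conn, 2))  # w1
mark_terminating(conn, "w1")
print(pick_worker(conn, 2))  # w2
mark_terminating(conn, "w2")
print(pick_worker(conn, 2))  # None
```

Because the flag is written when the termination signal arrives rather than when the worker's stale record is eventually cleaned up, the window in which a bundle can still be dispatched to a dying worker shrinks to the time between the signal and the committed `UPDATE`. The trade-off is the extra write on shutdown plus the extra predicate in the dispatch query, which corresponds to the additional database call listed above.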