
Round-robin empty workers with queueing enabled #7222

Closed
wants to merge 2 commits into from


gjoseph92 (Collaborator) commented Oct 28, 2022

With queuing enabled, when several workers have 0 tasks, this PR picks among those empty workers round-robin instead of always picking the first one. This ensures that when you occasionally submit a few tasks to an idle cluster, the same workers aren't used every time.
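A minimal sketch of the selection rule, in plain Python. This is not the code from this PR: `Worker`, `RoundRobinQueueingPicker`, and `pick` are hypothetical stand-ins for the scheduler's worker state and selection logic, and the fallback (fewest tasks per thread when no worker is empty) is an assumption made for the example. The contrast is with a plain `min(...)`, which returns the first of several tied workers, so an idle cluster would always get the same one. Here the scan for an empty worker instead starts just after the previous pick.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Worker:
    # Hypothetical stand-in for the scheduler's per-worker state.
    address: str
    nthreads: int = 1
    processing: set[str] = field(default_factory=set)


class RoundRobinQueueingPicker:
    """Hypothetical sketch: prefer empty workers, rotating among them."""

    def __init__(self, workers: list[Worker]):
        self.workers = workers  # assumed to be in a stable order
        self._last = -1  # index of the previously chosen worker

    def pick(self) -> Worker | None:
        n = len(self.workers)
        # Scan forward from the slot after the last pick, wrapping around,
        # and take the first worker with no tasks.
        for j in range(1, n + 1):
            i = (self._last + j) % n
            if not self.workers[i].processing:
                self._last = i
                return self.workers[i]
        # Assumed fallback when no worker is empty: fewest tasks per thread.
        return min(
            self.workers,
            key=lambda ws: len(ws.processing) / ws.nthreads,
            default=None,
        )


workers = [Worker("tcp://a"), Worker("tcp://b"), Worker("tcp://c")]
picker = RoundRobinQueueingPicker(workers)
print([picker.pick().address for _ in range(4)])
# ['tcp://a', 'tcp://b', 'tcp://c', 'tcp://a']
```

Because `pick` does not assign tasks, every worker stays empty in this usage example and the picks simply rotate. Replacing the loop with `min(workers, key=...)` would return `tcp://a` all four times.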

See #4637, #4638 for equivalent behavior with queuing disabled.

Requires #7221.

Closes #7197.

  • Tests added / passed
  • Passes pre-commit run --all-files

min_v: float | None = None
for cws in dict.values(self.idle):
    # ^ micro-optimization: `SortedDict` inherits from plain `dict`; iterating
    # in non-sorted order is 10x faster and order doesn't matter here.
    ...
gjoseph92 (Collaborator Author):

The 10x figure comes from #4925 (comment).
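Why `dict.values(self.idle)` sidesteps the sorted iteration: `sortedcontainers.SortedDict` subclasses `dict`, so calling the base-class `dict.values` on it returns a plain view over the underlying hash table rather than `SortedDict`'s own sorted `values()` view. The sketch below is self-contained and uses a hypothetical `SortedValuesDict` subclass in place of `SortedDict`. It shows only the ordering difference, not the 10x timing, which comes from the measurement linked above.

```python
class SortedValuesDict(dict):
    """Hypothetical stand-in for sortedcontainers.SortedDict: a dict subclass
    whose values() yields values in key-sorted order, doing extra work."""

    def values(self):
        # Overridden method: sorts the keys on every call, then looks each one up.
        return [self[k] for k in sorted(self)]


idle = SortedValuesDict()
idle["tcp://b"] = "ws-b"
idle["tcp://a"] = "ws-a"
idle["tcp://c"] = "ws-c"

# Normal method lookup finds the subclass override: key-sorted order.
sorted_order = list(idle.values())
# Calling the base-class method skips the override and walks the plain dict
# storage in insertion order, which is cheaper when order doesn't matter.
fast_order = list(dict.values(idle))

print(sorted_order)  # ['ws-a', 'ws-b', 'ws-c']
print(fast_order)    # ['ws-b', 'ws-a', 'ws-c']
```

Both calls visit the same values. Only the order and the per-call overhead differ, which is why the trick is safe wherever ordering is irrelevant, as in the loop in the excerpt above.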

gjoseph92 (Collaborator Author):

Note that if we do #7221 and/or #6974, `self.idle` no longer needs to be a `SortedDict`, and we could drop this micro-optimization: the only code that relies on `self.idle` being sorted is the current round-robin logic.

gjoseph92 (Collaborator Author):

xref #7245

github-actions bot (Contributor) commented Oct 28, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

15 files ±0, 15 suites ±0, 6h 14m 14s ⏱️ -14m 35s
3 168 tests ±0: 3 082 passed ✔️ -1, 84 skipped 💤 +1, 2 failed ±0
23 440 runs ±0: 22 537 passed ✔️ +1, 901 skipped 💤 +1, 2 failed -2

For more details on these failures, see this check.

Results for commit c578cb2. ± Comparison against base commit 02b9430.

♻️ This comment has been updated with latest results.

gjoseph92 (Collaborator Author):

Another possible approach:

gjoseph92 (Collaborator Author):

I'm leaning towards #7248, which I think is more sensible. This special case isn't very important, and we're going to remove its test anyway: #7221 (comment)

gjoseph92 closed this Nov 3, 2022
Development

Successfully merging this pull request may close these issues.

Round-robin worker selection makes poor choices with worker-saturation > 1.0