
Revert idle classification when worker-saturation is set #7278

Merged · 3 commits · Nov 10, 2022

Conversation

@fjetter (Member) commented Nov 9, 2022

This restores the behavior of the idle set to what it was before #6614 whenever worker-saturation is set.

All code paths in the new queuing path use a separate idle set, based on a different definition of idleness.

Closes #7085

ref #7191
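For context, the pre-#6614 occupancy-based classification can be sketched roughly as follows. This is an illustrative standalone function, not the scheduler's actual API: `classify_worker` and its parameter names are made up here, while the thresholds mirror the `check_idle_saturated` snippet quoted later in this review thread.

```python
def classify_worker(p: int, nc: int, occ: float, avg: float) -> str:
    """Sketch of the occupancy-based idle/saturated classification.

    p:   number of tasks processing on the worker
    nc:  the worker's thread count (nthreads)
    occ: the worker's occupancy (estimated seconds of pending work)
    avg: cluster-wide average occupancy per thread

    Illustrative names/signature only, not the scheduler's API.
    """
    # Idle: fewer tasks than threads, or well under half the average load.
    if p < nc or occ < nc * avg / 2:
        return "idle"
    # Saturated: noticeably more pending work than the cluster average.
    if p > nc:
        pending = occ * (p - nc) / (p * nc)
        if 0.4 < pending > 1.9 * avg:
            return "saturated"
    return "neither"


print(classify_worker(p=2, nc=4, occ=1.0, avg=0.5))    # idle (p < nc)
print(classify_worker(p=10, nc=4, occ=20.0, avg=1.0))  # saturated
```

The point of this PR is that when worker-saturation is set, the queuing code paths classify idleness differently (based on `_worker_full`), so the occupancy-based set above is restored for the non-queuing consumers such as work stealing.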

@fjetter fjetter mentioned this pull request Nov 9, 2022
@github-actions bot commented Nov 9, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

```
    15 files  (+12)          15 suites (+12)       6h 26m 31s ⏱️ (+5h 32m 51s)
 3 175 tests  (+ 1 907)   3 089 ✔️ (+ 1 857)    83 💤 (+ 47)    3 ❌ (+3)
23 492 runs   (+19 691)  22 585 ✔️ (+18 892)   903 💤 (+795)    4 ❌ (+4)
```

For more details on these failures, see this check.

Results for commit 7215427. ± Comparison against base commit 88515db.

♻️ This comment has been updated with latest results.

@gjoseph92 (Collaborator) left a comment:

We should update test_queued_paused_new_worker and test_queued_paused_unpaused in test_scheduler.py, and any other tests which make explicit assertions about Scheduler.idle.

distributed/scheduler.py (outdated; resolved)

Comment on lines +3073 to 3076:

```python
saturated.discard(ws)
if self.is_unoccupied(ws, occ, p):
    if ws.status == Status.running:
        idle[ws.address] = ws
```
@gjoseph92 (Collaborator) commented:
Suggested change:

```diff
-saturated.discard(ws)
-if self.is_unoccupied(ws, occ, p):
-    if ws.status == Status.running:
-        idle[ws.address] = ws
+if self.is_unoccupied(ws, occ, p):
+    if ws.status == Status.running:
+        idle[ws.address] = ws
+saturated.discard(ws)
```

This is more consistent with the previous behavior. Notice that before, if a worker was occupied but not saturated, it wouldn't be removed from the saturated set. That was probably neither intentional nor correct, but we're trying to match previous behavior here.

@fjetter (Member, Author) replied:

1. `saturated.discard` was always called unless the worker was truly classified as saturated, see:

   ```python
   idle = self.idle
   saturated = self.saturated
   if p < nc or occ < nc * avg / 2:
       idle[ws.address] = ws
       saturated.discard(ws)
   else:
       idle.pop(ws.address, None)
       if p > nc:
           pending: float = occ * (p - nc) / (p * nc)
           if 0.4 < pending > 1.9 * avg:
               saturated.add(ws)
               return
       saturated.discard(ws)
   ```

   so my behavior is consistent with what it was before.
2. Other than dashboard visuals, saturated is only used in stealing to avoid sorting over all workers (https://github.com/fjetter/distributed/blob/a5d686572e3289e9d7ce71c063205cc35d4a06c2/distributed/stealing.py#L422-L431), and I'm not too concerned about this since stealing is a bit erratic either way.

distributed/scheduler.py (outdated; resolved)

```python
    else not _worker_full(ws, self.WORKER_SATURATION)
):
    saturated.discard(ws)
    if self.is_unoccupied(ws, occ, p):
```
@gjoseph92 (Collaborator) commented:

I'm quite concerned that we're now calling `is_unoccupied` every time, even when queuing is enabled. This significantly slows down the scheduler: #7256. The urgency of fixing that had been diminished by queuing being on by default, which let us skip that slow code path.

I'm not sure that a known and large scheduler performance degradation is worth avoiding hypothetical small changes to work-stealing behavior due to the changed definition of idle when queuing is on.

If we can fix #7256 before a release, then I'm happy with this change, otherwise I'd be concerned by this tradeoff.

@gjoseph92 (Collaborator) commented Nov 10, 2022:

After running some benchmarks, it looks like occupancy might not have as much of an effect on end-to-end runtime as I'd expected: #7256 (comment). So I'm happy with this if we want to go with it.

For performance reasons and practicality though, I'd like to consider #7280 as another solution to #7085.

Edit: that uses occupancy too, so there's a similar performance cost. I think doing both PRs would be a good idea.

gjoseph92 added a commit to gjoseph92/snakebench that referenced this pull request Nov 10, 2022
@fjetter (Member, Author) commented Nov 10, 2022

The performance regression around occupancy was already introduced in release 2022.10.0. I'm looking into it right now, but I won't hold off on merging this PR because of it.

@fjetter fjetter merged commit 27a91dd into dask:main Nov 10, 2022
@fjetter fjetter deleted the revert_idle_classification_rootish_tasks branch November 10, 2022 16:31
Successfully merging this pull request may close these issues.

worker-saturation impacts balancing in work-stealing
2 participants