[Adaptive] Do not allow workers to downscale if they are running long-running tasks (e.g. `worker_client`) #7481

fjetter · 2023-01-17T14:28:09Z

Having tasks flagged as long running protects the worker from being classified as idle. However, when scaling adaptively and Adapt recommends fewer workers than there are idle, this can cause workers with seceded tasks to be downscaled even though there are better options.

This PR proposes closing workers with long-running tasks only if there are no other options.

Considering that these workers may be driving a computation, we may even need to go as far as prohibiting the downscaling of those workers entirely.
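
For illustration, the "only if there are no other options" preference could be sketched roughly as below. This is not the scheduler's code: `pick_workers_to_close` and the plain dict of address to long-running task keys are hypothetical stand-ins, and the real selection logic takes more factors into account.

```python
def pick_workers_to_close(workers, n):
    """Return up to ``n`` worker addresses, preferring workers without long-running tasks.

    ``workers`` is a hypothetical mapping of address -> set of long-running task keys.
    """
    # False sorts before True, so workers with no long-running tasks come first;
    # the address is only a tie-breaker to keep the order deterministic.
    ordered = sorted(workers, key=lambda addr: (bool(workers[addr]), addr))
    return ordered[:n]


candidates = {"w1": {"driver-task"}, "w2": set(), "w3": set()}
picked_two = pick_workers_to_close(candidates, 2)
picked_all = pick_workers_to_close(candidates, 3)
```

With two workers requested, the worker holding the seceded task is left alone; it is only picked once every other candidate has been used up.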

github-actions · 2023-01-17T15:30:54Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

      27 files ±  0       27 suites ±0 11h 46m 14s ⏱️ + 13m 13s
  3 942 tests +  2   3 832 ✔️ +  2   108 💤 ±0 2 ❌ ±0
49 584 runs +26 47 312 ✔️ +25 2 270 💤 +1 2 ❌ ±0

For more details on these failures, see this check.

Results for commit 315b337. ± Comparison against base commit 415d4fa.

♻️ This comment has been updated with latest results.

mrocklin · 2023-01-17T16:54:00Z

> Considering that these workers may be driving a computation, we may even need to go as far as prohibiting the downscaling of those workers entirely.

I think that if there is a running task, long-running or otherwise, we should not choose to remove that worker when using adaptive scaling, even if it is mostly idle.

fjetter · 2023-01-23T12:49:26Z

> I think that if there is a running task, long-running or otherwise, we should not choose to remove that worker when using adaptive scaling, even if it is mostly idle.

I think I agree, given the current implementation. It would be possible to let workers conclude their work naturally by not assigning them any further tasks. This would be a mechanism similar to the one suggested in #3761 (comment): basically pausing the worker and stealing its remaining tasks.
For now, I don't want to go down that path; I merely want to point out the option in case stricter behavior causes problems.

fjetter · 2023-12-14T11:23:15Z

distributed/scheduler.py

+        valid_workers = [ws for ws in self.workers.values() if not ws.long_running]
+        groups = groupby(key, valid_workers)


This now prohibits downscaling of workers with long-running tasks, e.g. tasks-within-tasks.
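
A minimal, self-contained sketch of what the two added lines do, for readers without the scheduler in front of them. `FakeWorkerState` and `closable_groups` are hypothetical names made up for this example, and the `defaultdict` loop stands in for the `groupby` call in the diff; only the `long_running` filter is taken from the change itself.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FakeWorkerState:
    """Hypothetical stand-in for the scheduler's per-worker state."""

    address: str
    # Keys of tasks that seceded on this worker (e.g. via worker_client).
    long_running: set = field(default_factory=set)


def closable_groups(workers, key=lambda ws: ws.address):
    """Group workers by ``key``, skipping any worker with long-running tasks."""
    # Same filter as in the diff: an empty ``long_running`` set is falsy.
    valid_workers = [ws for ws in workers.values() if not ws.long_running]
    groups = defaultdict(list)  # stand-in for groupby(key, valid_workers)
    for ws in valid_workers:
        groups[key(ws)].append(ws)
    return dict(groups)


workers = {
    "tcp://a:1": FakeWorkerState("tcp://a:1", long_running={"driver-task"}),
    "tcp://b:1": FakeWorkerState("tcp://b:1"),
    "tcp://c:1": FakeWorkerState("tcp://c:1"),
}
groups = closable_groups(workers)
```

Because the worker at `tcp://a:1` is dropped before grouping, it never becomes a candidate for closing, no matter how many workers are requested.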

crusaderky

Mostly nits

distributed/scheduler.py

distributed/tests/test_scheduler.py

fjetter force-pushed the deprioritize_long_running_from_adaptive branch from 1b101df to d381e34 Compare December 14, 2023 11:20

fjetter self-assigned this Dec 14, 2023

fjetter added 3 commits December 14, 2023 18:06

Deprioritize long running tasks when adapting

9e1f35d

Never downscale workers with long running tasks

317a967

revert sort key

a1ae1a7

fjetter force-pushed the deprioritize_long_running_from_adaptive branch from 0f5b0da to a1ae1a7 Compare December 14, 2023 17:06

fjetter changed the title ~~Deprioritize long running tasks when adapting~~ Do not allow workers to downscale if they are running long-running tasks (worker-client) Dec 14, 2023

fjetter changed the title ~~Do not allow workers to downscale if they are running long-running tasks (worker-client)~~ Do not allow workers to downscale if they are running long-running tasks (e.g. worker_client) Dec 14, 2023

fjetter changed the title ~~Do not allow workers to downscale if they are running long-running tasks (e.g. worker_client)~~ [Adaptive] Do not allow workers to downscale if they are running long-running tasks (e.g. worker_client) Dec 14, 2023

fjetter added the adaptive All things relating to adaptive scaling label Dec 14, 2023

crusaderky reviewed Dec 14, 2023

crusaderky added 6 commits December 18, 2023 10:29

Update distributed/scheduler.py

f791944

Update distributed/tests/test_scheduler.py

c3d5d87

Update distributed/tests/test_scheduler.py

d43514e

Update distributed/tests/test_scheduler.py

3f0b322

Merge branch 'main' into deprioritize_long_running_from_adaptive

4b9a081

polish

315b337

crusaderky approved these changes Dec 18, 2023

crusaderky merged commit 53e95ec into dask:main Dec 18, 2023
30 of 34 checks passed

fjetter deleted the deprioritize_long_running_from_adaptive branch January 12, 2024 13:50
