
Use cases for work stealing #6600

Open
fjetter opened this issue Jun 20, 2022 · 3 comments
Labels
adaptive All things relating to adaptive scaling discussion Discussing a topic with no specific actions yet performance scheduling stability Issue or feature related to cluster stability (e.g. deadlock) stealing

Comments

fjetter commented Jun 20, 2022

Work stealing is a fairly complex machinery intended to redistribute tasks on a cluster to achieve a homogeneous occupancy, i.e. all workers will be busy for approximately the same time.
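As a rough illustration of the goal (a toy model, not the actual distributed implementation; the data layout and the "steal only if it narrows the gap" rule are simplifying assumptions), occupancy-based stealing can be modeled as repeatedly moving a queued task from the busiest worker to the idlest one:

```python
# Toy model of occupancy-based work stealing. This is NOT the
# distributed scheduler's algorithm; the dict-of-queues layout and
# the balancing rule are simplifying assumptions for illustration.

def steal_once(queues):
    """One stealing round: move one queued task duration from the
    busiest worker to the idlest worker, if that reduces the gap."""
    occupancy = {w: sum(q) for w, q in queues.items()}
    busiest = max(occupancy, key=occupancy.get)
    idlest = min(occupancy, key=occupancy.get)
    if busiest == idlest or not queues[busiest]:
        return False
    task = queues[busiest][-1]
    # Only steal if the thief ends up less busy than the victim was.
    if occupancy[idlest] + task < occupancy[busiest]:
        queues[idlest].append(queues[busiest].pop())
        return True
    return False

def rebalance(queues):
    """Steal until no further move improves the balance."""
    while steal_once(queues):
        pass
    return queues
```

For example, `rebalance({"a": [1, 1, 1, 1], "b": []})` ends with two unit tasks on each worker, i.e. homogeneous occupancy.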

I'm currently aware of two use cases for it

A) Adaptive scaling, or upscaling scenarios in general: adding more workers to a cluster requires some kind of load balancing. Without work stealing, a newly added worker would sit idle until the scheduler happened to assign it a task, which is not guaranteed and can work very poorly (e.g. #4471).
With work stealing enabled, we automatically ensure that any newly added worker can start working, since it gets tasks assigned via the stealing mechanism. However, this is known to not work well in several situations (list not exhaustive)

B) Another application is a workload with vastly different runtimes within a TaskGroup. This is particularly concerning if there are few tasks in the task group, or if the runtime distribution is so asymmetrical that even after running many tasks the differences do not cancel out. We would then end up with a few workers holding very large queues, effectively extending the overall runtime with a long tail at the end of the computation.

I am not entirely sure whether this use case is actually very relevant and would appreciate some additional information about it. If it is indeed relevant, we may benefit from improved runtime tracking, e.g. with error measurement (e.g. #4028), in combination with a simpler, more selective algorithm.
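The tail effect in B can be shown with a toy simulation (illustrative only, not Dask code: round-robin stands in for eager upfront assignment, and a "pull when free" loop stands in for idealized balancing):

```python
import heapq

def makespan_static(durations, nworkers):
    """Upfront round-robin assignment, no rebalancing afterwards."""
    loads = [0.0] * nworkers
    for i, d in enumerate(durations):
        loads[i % nworkers] += d
    return max(loads)

def makespan_dynamic(durations, nworkers):
    """Each task goes to whichever worker frees up first, i.e. an
    idealized form of perfect load balancing."""
    free_at = [0.0] * nworkers
    heapq.heapify(free_at)
    for d in durations:
        heapq.heappush(free_at, heapq.heappop(free_at) + d)
    return max(free_at)
```

With an asymmetric mix such as `[10, 1, 1, 1, 10, 1, 1, 1]` on two workers, the static assignment finishes at 22 while the balanced one finishes at 13; that gap is exactly the tail that stealing is meant to close.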

The current work stealing algorithm has a couple of issues. Currently open issues can be filtered by the label stealing.

Stealing is also known to be a trigger for deadlocks (at least four have been reported and fixed by now), since it requires a handshake that can cause timing issues (see e.g. https://github.com/dask/distributed/pulls?q=is%3Apr+is%3Aclosed+stealing+label%3Astealing+label%3Adeadlock).

There are even cases where work stealing is known to cause harm by reverting smart scheduler decisions, e.g. #6573.

I'm currently trying to estimate whether we should keep pursuing work stealing and try to make it robust, or abandon this extension in favor of a less general but more robust solution for A, and possibly B.

Thoughts?

cc @mrocklin @crusaderky @gjoseph92

@fjetter fjetter added performance discussion Discussing a topic with no specific actions yet stability Issue or feature related to cluster stability (e.g. deadlock) scheduling stealing labels Jun 20, 2022
fjetter commented Jun 20, 2022

There is currently a proposal open to introduce a third use case

C) Redistribute tasks that are currently assigned to paused workers (see #3761)

gjoseph92 commented

In practice, I feel like work stealing mostly exists to rebalance the 10s or 100s of thousands of root tasks in processing, especially when a new worker arrives.

This is a pretty different use case than moving a handful of downstream tasks to a new worker to increase parallelism.

I think if these root vs downstream cases were separate, we could have better algorithms for both.

Holding back root tasks on the scheduler (and implementing an alternative load-balancing approach for those held-back tasks) would do that. I think that with #6560, we might be okay without work stealing at all. I think we'd still want it eventually for some scenarios, but the immediate need for it would be much less.
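A toy model of why scheduler-side queuing covers the upscaling case without a stealing step (purely illustrative; the function and its parameters are assumptions, not distributed's API): if tasks stay on the scheduler, a worker joining mid-computation starts pulling work the moment it connects.

```python
import heapq

def makespan_with_late_worker(durations, start_workers, join_time):
    """Tasks are held on the scheduler and handed out one at a time
    to the next free worker; one extra worker becomes available at
    `join_time`. Illustrative model only, not the distributed API."""
    # Heap of (time the worker becomes free, worker id).
    free_at = [(0.0, w) for w in range(start_workers)]
    free_at.append((join_time, start_workers))  # the late joiner
    heapq.heapify(free_at)
    finish = 0.0
    for d in durations:
        t, w = heapq.heappop(free_at)
        finish = max(finish, t + d)
        heapq.heappush(free_at, (t + d, w))
    return finish
```

Ten unit tasks on a single worker take 10 time units; in this model, a second worker joining at t=2 brings the finish time down to 6 with no rebalancing logic at all, because the scheduler simply hands it the next queued task.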


mrocklin commented Jul 1, 2022

Sorry for missing this issue (I stopped tracking Dask issues a couple of weeks ago while on vacation).

Some thoughts:

  1. (for A) Upscaling is pretty common. We'll need to have a solution for it.
  2. (for B) It's not just "vastly different runtimes within a TaskGroup". When you get to the end of a long computation, there are stragglers. The imbalance here can be non-trivial.
  3. However, this becomes less of an issue if we're allocating tasks non-eagerly as @gjoseph92 suggests. The value of work stealing goes down significantly if we don't have a bunch of stuff on the workers.
  4. The deadlocks will likely still be around; tasks will get moved around for other reasons, I think. I could be wrong here though.

Anyway, I agree with @gjoseph92 that if we hold back tasks on the scheduler then work stealing becomes more optional than it is today. I'd be curious about up-scaling, but it's probably not a huge issue.

@fjetter fjetter added the adaptive All things relating to adaptive scaling label Aug 26, 2022