[WIP] Co assignment groups #7141

Draft · wants to merge 6 commits into main

Conversation

@fjetter (Member) commented Oct 17, 2022

This is an implementation of an algorithm we discussed in an offline work session. It tries to combine tasks into groups based on whether or not they should be co-located, in order to reduce network traffic and RAM usage.

I called these groups "co-assignment groups", or cogroups for short. The idea is basically to lean on dask.order and use "jumps" in priority to detect branches (see the sketch below). cc @eriknw, I would be very interested in whether something like this could be returned directly from one of the dask.order functions.

This leans on an earlier attempt at this in #7076.

This implementation is still incomplete. In particular, what's missing:

  • Integration with task queuing
  • Handling of underutilized clusters
  • All sorts of performance optimizations of the algorithm itself (e.g. there are still many sort calls in there, but nothing that could not simply be refactored away)
  • probably a couple of other things
Raw notes from the offline workshop (I'll open another issue shortly to summarize)

(Whiteboard photos attached: IMG_20221014_163728, IMG_20221014_155805, IMG_20221014_163731, IMG_20221014_163752)
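
For illustration only, here is a minimal sketch of the priority-jump idea in isolation. This is not the code in this PR; the function name, the assumption that TaskState.priority ends with the dask.order position, and the gap threshold of 1 are all mine.

from typing import Sequence

def cogroups_by_priority_jump(tasks: Sequence["TaskState"]) -> dict[int, set]:
    # `tasks` is assumed to be sorted by increasing priority; whenever the
    # dask.order position jumps by more than one, start a new cogroup.
    groups: dict[int, set] = {}
    group = 0
    prev_order = None
    for ts in tasks:
        order = ts.priority[-1]  # assumption: last tuple element is the dask.order position
        if prev_order is not None and order - prev_order > 1:
            group += 1  # a jump in priority hints at a branch boundary
        groups.setdefault(group, set()).add(ts)
        prev_order = order
    return groups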

Comment on lines +2073 to +2100
def cogroup_objective(self, cogroup: int, ws: WorkerState) -> tuple:
    # Cogroups are not always connected subgraphs but if we assume they
    # were, only the top prio task would need a transfer
    tasks_in_group = self.cogroups[cogroup]
    # TODO: this could be made more efficient / we should remember max if it is required
    ts_top_prio = max(tasks_in_group, key=lambda ts: ts.priority)
    dts: TaskState
    comm_bytes: int = 0
    cotasks_on_worker = 0
    for ts in tasks_in_group:
        if ts in ws.processing or ws in ts.who_has:
            cotasks_on_worker += 1
    for dts in ts_top_prio.dependencies:
        if (
            # This is new compared to worker_objective
            (dts not in tasks_in_group or dts not in ws.processing)
            and ws not in dts.who_has
        ):
            nbytes = dts.get_nbytes()
            comm_bytes += nbytes

    stack_time: float = ws.occupancy / ws.nthreads
    start_time: float = stack_time + comm_bytes / self.bandwidth

    if ts_top_prio.actor:
        raise NotImplementedError("Cogroup assignment for actors not implemented")
    else:
        return (-cotasks_on_worker, start_time, ws.nbytes)
Member Author:

This is a very naive way to decide where to put the task. We could also use an approach similar to #7076, but this felt minimally invasive.
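
For context, this is roughly how such an objective tuple could be used to pick a worker. It is a sketch under assumptions, not the PR's actual decide_worker_cogroup; the helper name is hypothetical, and it assumes Scheduler.running holds the currently running workers.

def pick_worker_for_cogroup(self, ts):
    # Hypothetical helper: choose the running worker that minimizes the
    # cogroup objective; min() compares the tuples lexicographically, so
    # later tuple elements break ties.
    if not self.running:
        return None
    return min(self.running, key=lambda ws: self.cogroup_objective(ts.cogroup, ws))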

Comment on lines +2292 to +2298
if ts.cogroup is not None:
    decider = self.decide_worker_cogroup
else:
    if not (ws := self.decide_worker_non_rootish(ts)):
        return {ts.key: "no-worker"}, {}, {}
    decider = self.decide_worker_non_rootish

if not (ws := decider(ts)):
    return {ts.key: "no-worker"}, {}, {}
Member Author:

As already stated, I haven't dealt with queuing yet. The structure of all the decide functions felt sufficiently confusing that I didn't know where to put the new logic. It should not be too difficult, but it will require some thought. I mostly wanted to verify the core logic quickly.

Comment on lines 4614 to 4615
cogroups = coassignmnet_groups(sorted_tasks[::-1], start=start)
self.cogroups.update(cogroups)
Member Author:

TODO: Somewhere we'd need to handle cleanup of Scheduler.cogroups
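
One possible shape for that cleanup, as a hedged sketch (not part of this PR), assuming Scheduler.cogroups maps a group index to a set of TaskStates and TaskState.cogroup stores that index:

def _remove_from_cogroup(self, ts):
    # Hypothetical hook to call when a task is forgotten.
    if ts.cogroup is None:
        return
    group = self.cogroups.get(ts.cogroup)
    if group is not None:
        group.discard(ts)
        if not group:
            del self.cogroups[ts.cogroup]
    ts.cogroup = None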

Comment on lines 8485 to 8491
while len(next.dependents) == 1:
    dep = list(next.dependents)[0]
    if len(dep.dependencies) != 1:
        # This algorithm has the shortcoming that groups may grow too large if the dependent of a group
        group_dependents_seen.add(dep)
        break
    next = dep
Member Author:

Two things where this deviates from the original whiteboard implementation:

  1. I ended up walking linear chains after all (see the toy example below). This may no longer be necessary after 2., I haven't checked.
  2. I'm breaking early by excluding any dependents of groups. This is a bit ugly but pragmatic.
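
For reference, a toy chain like the one below is what the linear-chain walk collapses into a single cogroup (illustrative only, not a test from this PR):

from dask import delayed

@delayed
def inc(x):
    return x + 1

chain = inc(inc(inc(1)))  # a chain of three tasks; each intermediate result has exactly one dependent
# After grouping, all three tasks would end up with the same cogroup index.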

    nthreads=[("", 1)] * 6,
    config={"distributed.scheduler.worker-saturation": 1.0},
)
async def test_utilization_over_co_assignment(c, s, *workers):
Member Author:

I copied these over from #7076 but they are not working yet

@gjoseph92 (Collaborator):

@fjetter we had the same idea for a fun weekend project; I also put together a prototype of this on the train a couple of days ago. I think you've gotten further than me, but I'll push up my branch since we did some things a little differently and it might be interesting to compare.

Overall, I got as far as discovering that it didn't do well with widely shared dependencies or fan-in tasks in tree reductions. I may have missed something in the implementation, though. I'll show you a couple of tests that were failing for those cases; let's see if they work on your branch.
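
For context, the two patterns mentioned above look roughly like this (illustrative only; these are not the failing tests):

import dask.array as da

# Widely shared dependency: the single chunk of `b` is needed by every
# chunk of the sum, so it does not naturally belong to any one cogroup.
a = da.random.random((1000, 1000), chunks=(100, 100))
b = da.random.random(1000, chunks=1000)  # one chunk, broadcast against `a`
shared = (a + b).sum()

# Fan-in tree reduction: partial sums are combined over several levels,
# and each fan-in task pulls data from multiple branches at once.
tree = a.sum(split_every=2)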

github-actions bot (Contributor) commented Oct 17, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

  • 14 files (−1), 14 suites (−1), 0s duration (−6h 17m 3s)
  • 696 tests (−2 447): 650 passed (−2 357), 21 skipped (−64), 25 failed (−26)
  • 2 332 runs (−20 919): 2 113 passed (−20 174), 173 skipped (−740), 46 failed (−5)

For more details on these failures, see this check.

Results for commit 1344f9c. Comparison against base commit 6002e72.

♻️ This comment has been updated with latest results.

distributed/scheduler.py: outdated review comment (resolved)
@@ -8408,3 +8452,50 @@ def transition(
        self.metadata[key] = ts.metadata
        self.state[key] = finish
        self.keys.discard(key)


def coassignmnet_groups(
Contributor:

Suggested change:
-def coassignmnet_groups(
+def coassignment_groups(

while len(next.dependents) == 1:
    dep = list(next.dependents)[0]
    if len(dep.dependencies) != 1:
        # This algorithm has the shortcoming that groups may grow too large if the dependent of a group
Contributor:

Incomplete comment? "If the dependent of a group ..."

Contributor:

So we find a task, then recursively walk its dependents (as long as there is only one dependent) and add them to the group, until we find a task that has more than a single dependency?

Member Author:

I fixed this already but didn't push the commit... 🤦

distributed/system_monitor.py: outdated review comment (resolved)
distributed/tests/test_coassignmnet_group.py: outdated review comment (resolved)
        break
    next = dep
max_prio = tasks.index(next) + 1
groups[group] = set(tasks[min_prio:max_prio])
Contributor:

Suggested change:
-groups[group] = set(tasks[min_prio:max_prio])
+tasks = set(tasks[min_prio:max_prio])
+for ts in tasks:
+    ts.cogroup = group
+groups[group] = tasks

Rationale: this connection between the TaskState and cogroup data structures must be maintained, and it is best to do so at construction time rather than having to remember to do it later.

Member Author:

I chose not to do this so that coassignment_groups stays a pure function, which is much easier to test and reason about. It might be slightly worse for performance, but I doubt that will be relevant.

Comment on lines +4616 to +4618
for gr_ix, tss in self.cogroups.items():
    for ts in tss:
        ts.cogroup = gr_ix
Contributor:

If one wants to go with a plain dict for maintaining cogroups, I think it would make more sense if this invariant were maintained in coassignment_groups (see below).

Member Author:

I haven't decided yet what to use to maintain this. Maintenance of this structure is not implemented yet (e.g. we're not cleaning it up again). For now I am using a dict for simplicity. I'm also not set on gr_ix being an integer, FWIW.



def coassignmnet_groups(
    tasks: Sequence[TaskState], start: int = 0
Contributor:

OK, so tasks is a list of TaskStates sorted in increasing priority order.

Member Author:

Yes. I wanted to add a TODO to verify this, but it is guaranteed in update_graph, so it works for this prototype.
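
A cheap guard that could be added while prototyping (a hypothetical sketch, not code from this PR), placed at the top of the grouping function over its tasks argument:

assert all(
    t1.priority <= t2.priority for t1, t2 in zip(tasks, tasks[1:])
), "expected tasks sorted by increasing priority"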

distributed/scheduler.py: two outdated review comments (resolved)
@gjoseph92 (Collaborator):

@fjetter @wence- this was my implementation: https://github.com/gjoseph92/distributed/pull/6/files. Just in case it's useful for comparison.
