
[Dask.order] Ignore data tasks when ordering #10706

Merged: hendrikmakait merged 3 commits into dask:main from fjetter:ignore_data_root_tasks on Dec 19, 2023

Conversation

@fjetter (Member) commented Dec 14, 2023

This supersedes #10619

This may be a little controversial... However, there are frequently topologies (particularly in the array space) that have a dummy task at the bottom of the graph that holds some metadata (e.g. for zarr). In the xarray world, these are frequently embedded numpy arrays.

I believe we should special case such tasks since they can throw off otherwise fine heuristics.

  1. They never have dependencies, so we can schedule them whenever we want.
  2. There is no point in prioritizing their dependents to release them sooner: we cannot release their data, since the data is embedded in the graph/run_spec (even a released task currently holds on to its run_spec).
  3. The data itself is typically very small; otherwise it would not be feasible to embed it in the graph.
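To make the notion of a "data task" concrete, here is a minimal sketch using a plain-dict dask-style graph. The `is_data_task` helper and the toy graph are hypothetical illustrations, not the actual dask.order code; they only rely on the raw-graph convention that a runnable task is a tuple whose first element is callable.

```python
def is_data_task(value):
    """True if the graph value is embedded data rather than a runnable task.

    By dask's raw-graph convention, a task is a tuple whose first element
    is callable; anything else stored under a key is literal data.
    """
    return not (isinstance(value, tuple) and value and callable(value[0]))

# Toy graph: "meta" mimics a zarr-style metadata blob baked into the graph.
graph = {
    "meta": {"chunks": (20,)},             # data task: no dependencies, tiny payload
    "read": (dict.get, "meta", "chunks"),  # real task consuming the embedded data
    "size": (len, "read"),                 # downstream compute
}

data_keys = {k for k, v in graph.items() if is_data_task(v)}
assert data_keys == {"meta"}  # only the embedded blob is ignorable for ordering
```

Such keys satisfy all three properties above: no dependencies, un-releasable payload, and a small footprint.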

So, why is this controversial?

With this change, ordering would differ for, say, da.from_array(np.zeros(100), chunks=20) and da.zeros(100, chunks=20), since the first literally embeds the numpy array into the dask graph while the latter generates the data whenever needed. I'm not sure this is such a bad thing. It may be a little surprising, but I don't think it will have negative effects. This can be seen in one of our tests, where I switched the data source to dask.array.random.
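The distinction can be illustrated with toy plain-dict graphs standing in for what the two constructors would build (these are hypothetical stand-ins, not the graphs dask actually generates):

```python
def is_data_task(value):
    # dask raw-graph convention: a runnable task is a tuple starting with a callable
    return not (isinstance(value, tuple) and value and callable(value[0]))

# from_array-style graph: the chunk values are literals baked into the graph
embedded = {"block": [0.0, 0.0, 0.0], "total": (sum, "block")}

# zeros-style graph: the chunk is produced by a task when the graph runs
generated = {"block": ((lambda n: [0.0] * n), 3), "total": (sum, "block")}

# Only the embedded variant contains a data task, so only it would be
# special-cased by the ordering change described in this PR.
assert is_data_task(embedded["block"])
assert not is_data_task(generated["block"])
```

Both graphs compute the same result, but only the first carries its payload inside the graph itself, which is why ordering may treat them differently.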

@fjetter (Member, Author) commented Dec 14, 2023

I'm not aware of any cases that perform better or worse with this, but I am reasonably certain that this graph transformation is a good thing; see the argument above. The only place I am aware of this pattern being used is the array world, where zarr et al. have this dummy task.

This should put an end to the conversation about whether or not they should inline those tasks, at least for the dask/dask part. There is still the distributed scheduler that could mess this up, of course.

@fjetter (Member, Author) commented Dec 14, 2023

Generally speaking, I could see us adding various equivalence transformations to a graph that we know won't impact performance but will simplify the topology. This is now the second one I've added, after #10697, which is basically the equivalent of this PR but for leaf nodes.
I am not aware of any further such transformations at the moment, but if more problems around ordering pop up, this might be an angle to investigate.

@dcherian (Collaborator) commented:

> This should put an end to the conversation about whether or not they should inline those tasks,

Woot! Woot!

@hendrikmakait (Member) left a comment:

Thanks, @fjetter!

Co-authored-by: Hendrik Makait <hendrik@makait.com>
@hendrikmakait hendrikmakait merged commit 131f89d into dask:main Dec 19, 2023
milesgranger pushed a commit to milesgranger/dask that referenced this pull request Jan 3, 2024
