
[WIP] Fixes for dask.order - Remove change of tactical goal in single dep path #10505

Closed

Conversation
Conversation

fjetter (Member) commented Sep 12, 2023

This is waaaay too early for a PR but considering how complex this subject is I wanted to share intermediate progress asap.

This is specifically looking at the graph taken from #10384. The simplified reproducer I provided over there is part of this test, and I started investigating why it is ordered poorly.

test_xarray_reduction

Main ordering

test_xarray_reduction_order-age

Pressure 8

Optimal ordering

I ordered the graph manually to what I consider an optimal ordering for this case. I don't necessarily strive to reproduce this exactly, but it should serve as an orientation; it is likely not unique. Note how source nodes have a homogeneous age and how terminal/end nodes are computed more quickly.

test_xarray_reduction_order-age-optimal

Pressure 6

This PR (WIP)

As a first step, I looked into why ("b", 1, 0) is scheduled way too late with priority 22. It should be scheduled immediately after ("a", 1, 0) to free up ("a", 0, 0).

It turns out that this task is scheduled poorly due to a "change of the tactical goal" that is based on the partition_keys. Removing this entirely "fixes" this specific section of the graph and allows ("b", 1, 0) to be scheduled much more quickly, reducing pressure from 8 to 6.

That's already nice, but the resulting ordering is still far from optimal and a little too heterogeneous for my taste, considering how symmetrical this problem is.

I haven't found out yet whether this "change of tactical goal" is a bad idea in general or whether the partition_key cost function is just bad. FWIW, this specific code branch is untested: while it is covered by test_map_overlap and test_array_store_final_order, removing this condition entirely does not have an impact on any asserted property. In fact, at least for test_map_overlap, removing this condition also reduces pressure and generates a better graph (the other graph is more difficult to analyze).

test_xarray_reduction_order-age

cc @eriknw

Comment on lines -366 to 368

```diff
-if not singles:
-    continue
+# if not singles:
+#     continue
+process_singles = True
```
fjetter (Member Author) commented Sep 12, 2023

Disabling this GOTO now actually generates the optimal order

fjetter (Member Author):

However, this also breaks some tests now. I have to investigate further.

fjetter (Member Author) commented Sep 12, 2023

I found two smart switches that actually cause this specific graph to be ordered worse. However, the linear growth described in #10384 (comment) is still a problem.

fjetter (Member Author) commented Sep 25, 2023

Note: I'm just documenting observations here. There are no deep insights, and I only recommend reading this if you are really interested and want to follow along.

Considering that #10384 (comment) shows that dataframes scale much better, I started looking into what the difference between the two graphs is.

  • Both graphs share the same keys; in fact, the array graph is a proper subset of the DataFrame graph (set(mean.__dask_graph__()).issubset(set(mean.to_dask_dataframe().__dask_graph__())) is True); see the sketch below.
  • Both graphs start ordering with the same key.
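For concreteness, here is a minimal sketch of how that comparison can be reproduced (assuming `mean` is the xarray reduction from #10384; the variable names below are illustrative only):

```python
# Sketch only: `mean` is assumed to be the xarray/dask object from #10384.
from dask.order import order

array_graph = dict(mean.__dask_graph__())
frame_graph = dict(mean.to_dask_dataframe().__dask_graph__())

# The array graph's keys are a subset of the DataFrame graph's keys ...
assert set(array_graph).issubset(set(frame_graph))

# ... yet the two orderings diverge almost immediately.
array_order = sorted(array_graph, key=order(array_graph).__getitem__)
frame_order = sorted(frame_graph, key=order(frame_graph).__getitem__)
print(array_order[0] == frame_order[0])  # same first pick
print(array_order[1], frame_order[1])    # different second pick
```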

After the initial pick, the two orderings already diverge with the choice of the second key. Effectively, the first key is a random-number generation task that branches into a pow and a mul task, and the decision about which one to pick next relies on the partition_key, which is defined as

dask/dask/order.py

Lines 221 to 233 in 3520077

```python
partition_keys = {
    key: (
        (min_dependencies - total_dependencies[key] + 1)
        * (total_dependents - min_heights)
    )
    for key, (
        total_dependents,
        min_dependencies,
        _,
        min_heights,
        _,
    ) in metrics.items()
}
```

This partition_keys function is also used to "change the tactical goal" that I disabled earlier. Inspecting the partition keys for both the DataFrame and array workloads reveals an interesting property:

| Task | Array | DataFrame |
| --- | --- | --- |
| pow | 84 | 300 |
| mul | 108 | 299 |

So, not only does the DataFrame choose a different path; the partition_keys are vastly different in the two cases even though the graphs are very similar (I'll get to the topological differences in a moment).

To inspect why that is, we look at the calculated graph metrics:

pow

| Metric | Array | DataFrame |
| --- | --- | --- |
| total_dependents | 5 | 8 |
| min_dependencies | 85 | 301 |
| max_dependencies | 85 | 301 |
| min_height | 4 | 7 |
| total_dependencies | 2 | 2 |

mul

| Metric | Array | DataFrame |
| --- | --- | --- |
| total_dependents | 5 | 8 |
| min_dependencies | 110 | 301 |
| max_dependencies | 110 | 301 |
| min_height | 4 | 7 |
| max_height | 4 | 7 |
| total_dependencies | 3 | 3 |

So, almost all of these numbers differ significantly between the two graphs. The partition_key is a function of most of those metrics: partition_key(min_dependencies, total_dependents, min_heights, total_dependencies); see the worked example below.
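Plugging the metric values from the tables into the formula quoted above reproduces the partition keys from the first table. This is just a worked illustration, and it assumes (as the numbers suggest) that the smaller key is preferred:

```python
# Worked example using the metrics listed above (illustration only).
def partition_key(min_dependencies, total_dependencies, total_dependents, min_heights):
    return (min_dependencies - total_dependencies + 1) * (total_dependents - min_heights)

# Array graph: pow is preferred (84 < 108)
assert partition_key(85, 2, 5, 4) == 84    # pow
assert partition_key(110, 3, 5, 4) == 108  # mul

# DataFrame graph: mul is preferred (299 < 300)
assert partition_key(301, 2, 8, 7) == 300  # pow
assert partition_key(301, 3, 8, 7) == 299  # mul
```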

In a very, very, very simplified version of this graph, we can see that the mul tasks require an additional dependency while the pow tasks could proceed further immediately

(image: simplified example graph)

Now, in terms of the overall differences between the graphs, here are the two versions (very small) below:

DataFrame

dataframe-small

Array

array-small

The most obvious difference between arrays and DataFrames is that the DataFrame graph has a single final task as output key while the array graph has three (the three sub-arrays of the xarray dataset). This may be relevant since dask.order actually implements a special case for this scenario:

dask/dask/order.py

Lines 128 to 135 in 3520077

```python
# Single root nodes that depend on everything. These cause issues for
# the current ordering algorithm, since we often hit the root node
# and fell back to the key tie-breaker to choose which immediate dependency
# to finish next, rather than finishing off subtrees.
# So under the special case of a single root node that depends on the entire
# tree, we skip processing it normally.
# See https://github.com/dask/dask/issues/6745
root_nodes = {k for k, v in dependents.items() if not v}
```

This likely also explains the significant differences in the metrics, particularly min_dependencies.
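To illustrate the distinction with a minimal sketch (hypothetical toy graphs, not the actual xarray graphs): a root node in this sense is a task with no dependents, and the special case above only applies when there is exactly one of them.

```python
from dask.core import get_dependencies, reverse_dict

def root_nodes(dsk):
    """Tasks that no other task depends on."""
    dependencies = {k: get_dependencies(dsk, k) for k in dsk}
    dependents = reverse_dict(dependencies)
    return {k for k, v in dependents.items() if not v}

# DataFrame-like: a single terminal output task -> the special case applies
frame_like = {"a": 1, "b": (sum, ["a"]), "final": (sum, ["b"])}
# Array-like: three terminal output tasks -> it does not
array_like = {"a": 1, "x": (sum, ["a"]), "y": (sum, ["a"]), "z": (sum, ["a"])}

print(root_nodes(frame_like))  # {'final'}
print(root_nodes(array_like))  # {'x', 'y', 'z'}
```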

to be continued...

fjetter (Member Author) commented Sep 25, 2023

I did a little time traveling and measured pressure for the DataFrame and array cases:

| Commit | PR | Dask Version | DataFrame | Array |
| --- | --- | --- | --- | --- |
| 9983c19 | #7929 | current | 51 | 511 |
| 439c4ab | #6779 | 2020.12.0 | 1011 | 1011 |
| 816cfa7 | #5872 | 2.17.0 | 1011 | 1011 |
| c63dc14 | #5584 | 2.9.2 | 514 | 513 |
| 6741495 | #3588 | 0.18.2 | 348 | 205 |
| 59d8c9c | #3652 | 0.18.2.dev | 348 | 348 |
| f4a7531 | #3303 | 0.17.2 | 135 | 135 |
| f577913 | #3298 | 0.17.2.dev | 1011 | 1011 |
| 11a50f0 | #3271 | 0.17.2.dev | 1011 | 1011 |

The good news is that we apparently are not at the worst case (which seems to be 1011 for this example), so this is no clear regression. However, it looks like the version at f4a7531 was doing pretty well for both cases (and it was symmetrical).

FWIW, these commits go back to version 0.17.something and represent all notable changes to dask.order since then.
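For reference, one plausible way to compute such a "pressure" number, sketched under the assumption that it means the maximum number of task results held in memory while executing the graph serially in dask.order priority (the actual measurement used for the table above may differ):

```python
from dask.core import get_dependencies, reverse_dict
from dask.order import order

def max_pressure(dsk):
    """Peak number of results held in memory when running dsk serially in priority order."""
    dependencies = {k: get_dependencies(dsk, k) for k in dsk}
    dependents = reverse_dict(dependencies)
    priorities = order(dsk)
    waiting = {k: set(v) for k, v in dependents.items()}  # dependents not yet executed
    in_memory, pressure = set(), 0
    for key in sorted(dsk, key=priorities.__getitem__):
        in_memory.add(key)
        pressure = max(pressure, len(in_memory))
        for dep in dependencies[key]:
            waiting[dep].discard(key)
            if not waiting[dep]:          # all dependents have run
                in_memory.discard(dep)    # so this result can be released
    return pressure
```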

mrocklin (Member) commented:
I'd be curious what this would look like applied to the A/B tests. Presumably the recent fixes were made for a reason. It seems like we weren't measuring well, though, except with unit tests. Applying these to a broader set of workflows would be interesting.

fjetter (Member Author) commented Sep 26, 2023

> I'd be curious what this would look like applied to the A/B tests.

Running. This will take a moment

fjetter (Member Author) commented Sep 26, 2023

Just looking over the version of 0.17.2, though, I doubt it is sufficient. There are many relatively obvious failure cases it does not handle. However, it is actually more robust for the cases I'm currently looking at (it even fixes #9995).
