
[WIP] Fixes for dask.order - Remove change of tactical goal in single dep path #10505

Closed

Conversation
Conversation

fjetter (Member) commented Sep 12, 2023

This is waaaay too early for a PR but considering how complex this subject is I wanted to share intermediate progress asap.

This is specifically looking at the graph taken from #10384. The simplified reproducer I provided over there is part of this test, and I started investigating why it is ordered poorly.

test_xarray_reduction

Main ordering

test_xarray_reduction_order-age

Pressure 8

Optimal ordering

I ordered the graph manually to what I consider an optimal ordering for this case. I don't necessarily strive to reproduce this exactly, but it should serve as an orientation; it is likely not unique. Note how source nodes have a homogeneous age and how terminal/end nodes are computed more quickly.

test_xarray_reduction_order-age-optimal

Pressure 6

This PR (WIP)

As a first step, I looked into why ("b", 1, 0) is scheduled way too late with priority 22. It should be scheduled immediately after ("a", 1, 0) to free up ("a", 0, 0).

It turns out that this task is scheduled poorly due to a "change of the tactical goal" that is based on the partition_keys. Removing this entirely "fixes" this specific section of the graph and allows ("b", 1, 0) to be scheduled much more quickly, reducing pressure from 8 to 6.

That's already nice, but the resulting ordering is still far from optimal and a little too heterogeneous for my taste, considering how symmetrical this problem is.

I haven't found out yet whether this "change of tactical goal" is a bad idea in general or whether the partition_key cost function is just bad. FWIW, this specific code branch is untested: while it is covered by test_map_overlap and test_array_store_final_order, removing this condition entirely does not have an impact on any asserted property. In fact, at least for test_map_overlap, removing this condition also reduces pressure and generates a better graph (the other graph is more difficult to analyze).

test_xarray_reduction_order-age

cc @eriknw

Comment on lines -366 to 368

```diff
-if not singles:
-    continue
+# if not singles:
+#     continue
+process_singles = True
```
fjetter (Member Author) commented Sep 12, 2023

Disabling this GOTO now actually generates the optimal order

fjetter (Member Author):

However, this also breaks some tests now. I have to investigate further.

fjetter (Member Author) commented Sep 12, 2023

I found two smart switches that actually cause this specific graph to be ordered worse. However, the linear growth described in #10384 (comment) is still a problem.

fjetter (Member Author) commented Sep 25, 2023

Note: I'm just documenting observations here. There are no deep insights, and I only recommend reading this if you are really interested and want to follow along.

Considering that #10384 (comment) shows that dataframes scale much better, I started looking into what the difference between the two graphs is.

  • Both graphs share the same keys; in fact, the array graph is a proper subset of the DataFrame graph (set(mean.__dask_graph__()).issubset(set(mean.to_dask_dataframe().__dask_graph__())) is True); see the sketch below.
  • Both graphs start ordering with the same key.
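For concreteness, here is a minimal sketch of how that comparison can be reproduced (assuming `mean` is the xarray reduction from #10384; the variable names below are illustrative only):

```python
# Sketch only: `mean` is assumed to be the xarray/dask object from #10384.
from dask.order import order

array_graph = dict(mean.__dask_graph__())
frame_graph = dict(mean.to_dask_dataframe().__dask_graph__())

# The array graph's keys are a subset of the DataFrame graph's keys ...
assert set(array_graph).issubset(set(frame_graph))

# ... yet the two orderings diverge almost immediately.
array_order = sorted(array_graph, key=order(array_graph).__getitem__)
frame_order = sorted(frame_graph, key=order(frame_graph).__getitem__)
print(array_order[0] == frame_order[0])  # same first pick
print(array_order[1], frame_order[1])    # different second pick
```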

After the initial pick, the two orderings already diverge with the choice of the second key. Effectively, the first key is a random-number generation task that branches into a pow and a mul task, and the decision about which one to pick next relies on the partition_key, which is defined as

dask/dask/order.py

Lines 221 to 233 in 3520077

```python
partition_keys = {
    key: (
        (min_dependencies - total_dependencies[key] + 1)
        * (total_dependents - min_heights)
    )
    for key, (
        total_dependents,
        min_dependencies,
        _,
        min_heights,
        _,
    ) in metrics.items()
}
```

This partition_keys function is also used to "change the tactical goal" that I disabled earlier. Inspecting the partition keys for both the DataFrame and array workloads reveals an interesting property:

| Task | Array | DataFrame |
| --- | --- | --- |
| pow | 84 | 300 |
| mul | 108 | 299 |

So, not only does the DataFrame choose a different path; the partition_keys are vastly different in the two cases even though the graphs are very similar (I'll get to the topological differences in a moment).

To inspect why that is, we look at the calculated graph metrics:

pow

| Metric | Array | DataFrame |
| --- | --- | --- |
| total_dependents | 5 | 8 |
| min_dependencies | 85 | 301 |
| max_dependencies | 85 | 301 |
| min_height | 4 | 7 |
| total_dependencies | 2 | 2 |

mul

| Metric | Array | DataFrame |
| --- | --- | --- |
| total_dependents | 5 | 8 |
| min_dependencies | 110 | 301 |
| max_dependencies | 110 | 301 |
| min_height | 4 | 7 |
| max_height | 4 | 7 |
| total_dependencies | 3 | 3 |

So, almost all of these numbers differ significantly between the two graphs. The partition_key is a function of most of those metrics: partition_key(min_dependencies, total_dependents, min_heights, total_dependencies); see the worked example below.
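Plugging the metric values from the tables into the formula quoted above reproduces the partition keys from the first table. This is just a worked illustration, and it assumes (as the numbers suggest) that the smaller key is preferred:

```python
# Worked example using the metrics listed above (illustration only).
def partition_key(min_dependencies, total_dependencies, total_dependents, min_heights):
    return (min_dependencies - total_dependencies + 1) * (total_dependents - min_heights)

# Array graph: pow is preferred (84 < 108)
assert partition_key(85, 2, 5, 4) == 84    # pow
assert partition_key(110, 3, 5, 4) == 108  # mul

# DataFrame graph: mul is preferred (299 < 300)
assert partition_key(301, 2, 8, 7) == 300  # pow
assert partition_key(301, 3, 8, 7) == 299  # mul
```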

In a very, very, very simplified version of this graph, we can see that the mul tasks require an additional dependency while the pow tasks could proceed further immediately

(image: simplified example graph)

Now, in terms of the overall differences between the graphs, here are the two versions (very small) below:

DataFrame

dataframe-small

Array

array-small

The most obvious difference between arrays and DataFrames is that the DataFrame graph has a single final task as output key while the array graph has three (the three sub-arrays of the xarray dataset). This may be relevant since dask.order actually implements a special case for this scenario:

dask/dask/order.py

Lines 128 to 135 in 3520077

```python
# Single root nodes that depend on everything. These cause issues for
# the current ordering algorithm, since we often hit the root node
# and fell back to the key tie-breaker to choose which immediate dependency
# to finish next, rather than finishing off subtrees.
# So under the special case of a single root node that depends on the entire
# tree, we skip processing it normally.
# See https://github.com/dask/dask/issues/6745
root_nodes = {k for k, v in dependents.items() if not v}
```

This likely also explains the significant differences in the metrics, particularly min_dependencies.
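To illustrate the distinction with a minimal sketch (hypothetical toy graphs, not the actual xarray graphs): a root node in this sense is a task with no dependents, and the special case above only applies when there is exactly one of them.

```python
from dask.core import get_dependencies, reverse_dict

def root_nodes(dsk):
    """Tasks that no other task depends on."""
    dependencies = {k: get_dependencies(dsk, k) for k in dsk}
    dependents = reverse_dict(dependencies)
    return {k for k, v in dependents.items() if not v}

# DataFrame-like: a single terminal output task -> the special case applies
frame_like = {"a": 1, "b": (sum, ["a"]), "final": (sum, ["b"])}
# Array-like: three terminal output tasks -> it does not
array_like = {"a": 1, "x": (sum, ["a"]), "y": (sum, ["a"]), "z": (sum, ["a"])}

print(root_nodes(frame_like))  # {'final'}
print(root_nodes(array_like))  # {'x', 'y', 'z'}
```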

to be continued...

fjetter (Member Author) commented Sep 25, 2023

I did a little time traveling and measured pressure for the DataFrame and array cases:

| Commit | PR | Dask Version | DataFrame | Array |
| --- | --- | --- | --- | --- |
| 9983c19 | #7929 | current | 51 | 511 |
| 439c4ab | #6779 | 2020.12.0 | 1011 | 1011 |
| 816cfa7 | #5872 | 2.17.0 | 1011 | 1011 |
| c63dc14 | #5584 | 2.9.2 | 514 | 513 |
| 6741495 | #3588 | 0.18.2 | 348 | 205 |
| 59d8c9c | #3652 | 0.18.2.dev | 348 | 348 |
| f4a7531 | #3303 | 0.17.2 | 135 | 135 |
| f577913 | #3298 | 0.17.2.dev | 1011 | 1011 |
| 11a50f0 | #3271 | 0.17.2.dev | 1011 | 1011 |

The good news is that we apparently are not at the worst case (which seems to be 1011 for this example), so this is no clear regression. However, it looks like the version at f4a7531 was doing pretty well for both cases (and it was symmetrical).

FWIW, these commits go back to version 0.17.something and represent all notable changes to dask.order since then.
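For reference, one plausible way to compute such a "pressure" number, sketched under the assumption that it means the maximum number of task results held in memory while executing the graph serially in dask.order priority (the actual measurement used for the table above may differ):

```python
from dask.core import get_dependencies, reverse_dict
from dask.order import order

def max_pressure(dsk):
    """Peak number of results held in memory when running dsk serially in priority order."""
    dependencies = {k: get_dependencies(dsk, k) for k in dsk}
    dependents = reverse_dict(dependencies)
    priorities = order(dsk)
    waiting = {k: set(v) for k, v in dependents.items()}  # dependents not yet executed
    in_memory, pressure = set(), 0
    for key in sorted(dsk, key=priorities.__getitem__):
        in_memory.add(key)
        pressure = max(pressure, len(in_memory))
        for dep in dependencies[key]:
            waiting[dep].discard(key)
            if not waiting[dep]:          # all dependents have run
                in_memory.discard(dep)    # so this result can be released
    return pressure
```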

mrocklin (Member) commented:
I'd be curious what this would look like applied to the A/B tests. Presumably the recent fixes were made for a reason. It seems like we weren't measuring well, though, except with unit tests. Applying these to a broader set of workflows would be interesting.

fjetter (Member Author) commented Sep 26, 2023

> I'd be curious what this would look like applied to the A/B tests.

Running. This will take a moment

fjetter (Member Author) commented Sep 26, 2023

Just looking over the version of 0.17.2, though, I doubt it is sufficient. There are many relatively obvious failure cases it does not handle. However, it is actually more robust for the cases I'm currently looking at (it even fixes #9995).
