
Ensure process_runnables is not too eager in the presence of multiple splits #11367

Merged · 2 commits · Sep 3, 2024

Conversation

@fjetter (Member) commented Sep 3, 2024

Well, this isn't fixing anything visible yet, but it fixes an internal branch of the algorithm that has been acting too eagerly. I'm still working on a test.

This relates to #11363

Critical path before: [image]

After: [image]

So, the initial version would start executing along the critical path until it hits the first wall. After that, it tries to process runnable tasks as well as possible. Since the root task is shared by all branches, every computation branch becomes a candidate (this is the essential dilemma of widely shared dependencies).
Following these candidates effectively causes a depth-first search. However, if we do not process them at all, we typically leave many intermediates in memory that we could otherwise release. Therefore, to allow some progress, these runnable paths are allowed to execute if they reduce into a task, i.e. we speculatively walk down those branches, and only if a speculatively walked branch reduces into a task is it allowed to execute for real.

What happens here is that those executable branches encounter multiple splits (we follow splits as well, since they often allow us to reduce things). After the second split we actually find an intermediate reducer, and this is where the algorithm on main decided to go ahead and execute. However, that still leaves us with one more task in memory than if we hadn't executed the branch, which is a suboptimal decision. Only if all of those splits reduce should we execute the branch.
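The decision rule can be sketched as a small recursive walk (a minimal sketch with hypothetical names — `branch_reduces`, the explicit `dependents`/`dependencies` dicts — not dask's actual implementation in `dask/order.py`): a split is a task with several dependents, a reduction is a task with several dependencies, and a branch is only worth executing if every arm of every split encountered eventually reduces.

```python
def branch_reduces(key, dependents, dependencies):
    """Speculatively walk downstream from `key`.

    Return True only if every split encountered on the way reduces on
    *all* of its arms; an intermediate reducer on a single arm is not
    enough (that is the overly eager behavior described above).
    """
    downstream = dependents[key]
    if not downstream:
        return False  # reached an output without ever reducing
    if len(downstream) > 1:  # a split: every arm must reduce
        return all(branch_reduces(d, dependents, dependencies) for d in downstream)
    (nxt,) = downstream
    if len(dependencies[nxt]) > 1:
        return True  # fan-in: the branch reduces into a single task here
    return branch_reduces(nxt, dependents, dependencies)


# Toy example: `split` fans out to a and b, and both arms reduce into r.
dependents = {"root": {"split"}, "split": {"a", "b"}, "a": {"r"}, "b": {"r"}, "r": set()}
dependencies = {"root": set(), "split": {"root"}, "a": {"split"}, "b": {"split"}, "r": {"a", "b"}}
print(branch_reduces("root", dependents, dependencies))  # True: all arms reduce
```

If one arm of the split dead-ends without a fan-in, the whole branch is rejected, which is the stricter behavior this PR argues for.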

In this specific example it doesn't change anything, but I believe it makes the algorithm more predictable.

Simplified graph (`f` stands in for the actual task callables; any placeholder such as `f = lambda *args: None` makes the dict well-defined):
dsk = {
        ("transpose-9c80c6b26d7ef5cfec7274ee9e6b091e", 0, 0, 0): (
            f,
            ("vindex-merge", 0, 0, 0),
        ),
        ("transpose-9c80c6b26d7ef5cfec7274ee9e6b091e", 0, 1, 0): (
            f,
            ("vindex-merge", 0, 1, 0),
        ),
        ("vindex-merge", 0, 0, 0): (
            f,
            ("vindex-slice", 0, 0, 0),
            ("vindex-slice", 1, 0, 0),
            ("vindex-slice", 2, 0, 0),
            ("vindex-slice", 3, 0, 0),
        ),
        ("vindex-slice", 2, 0, 0): (f, ("getitem-vindex", 2, 0, 0)),
        ("vindex-merge", 0, 1, 0): (
            f,
            ("vindex-slice", 0, 0, 1),
            ("vindex-slice", 1, 0, 1),
            ("vindex-slice", 2, 0, 1),
            ("vindex-slice", 3, 0, 1),
        ),
        ("vindex-slice", 2, 0, 1): (f, ("getitem-vindex", 2, 0, 1)),
        ("vindex-slice", 1, 0, 0): (f, ("getitem-vindex", 1, 0, 0)),
        ("getitem-vindex", 2, 0, 1): (
            f,
            ("shuffle-split", 321),
            ("shuffle-split", 322),
        ),
        ("vindex-slice", 3, 0, 1): (f, ("getitem-vindex", 3, 0, 1)),
        ("vindex-slice", 0, 0, 1): (f, ("getitem-vindex", 0, 0, 1)),
        ("getitem-vindex", 3, 0, 1): (
            f,
            ("shuffle-split", 299),
            ("shuffle-split", 300),
        ),
        ("getitem-vindex", 2, 0, 0): (f, ("concatenate", 0, 1, 9, 1)),
        ("vindex-slice", 3, 0, 0): (f, ("getitem-vindex", 3, 0, 0)),
        ("shuffle-split", 322): (f, ("concatenate", 0, 1, 9, 1)),
        ("vindex-slice", 0, 0, 0): (f, ("getitem-vindex", 0, 0, 0)),
        ("getitem-vindex", 0, 0, 1): (
            f,
            ("shuffle-split", 341),
            ("shuffle-split", 342),
        ),
        ("vindex-slice", 1, 0, 1): (f, ("getitem-vindex", 1, 0, 1)),
        ("shuffle-split", 321): (f, ("concatenate-getitem", 321)),
        ("shuffle-split", 299): (f, ("concatenate-getitem", 299)),
        ("getitem-vindex", 3, 0, 0): (f, ("concatenate", 0, 1, 8, 1)),
        ("getitem-vindex", 0, 0, 0): (f, ("concatenate", 0, 1, 10, 0)),
        ("getitem-vindex", 1, 0, 0): (f, ("concatenate", 0, 1, 9, 0)),
        ("concatenate-getitem", 299): (f, ("fetch-raster", 0, 0, 8, 1)),
        ("concatenate", 0, 1, 9, 1): (f, ("getitem-sub", 0, 1, 9, 1)),
        ("concatenate-getitem", 321): (f, ("fetch-raster", 0, 0, 9, 1)),
        ("fetch-raster", 0, 0, 8, 1): (f, ("asset_table", 0, 0)),
        ("concatenate", 0, 1, 8, 1): (f, ("getitem-sub", 0, 1, 8, 1)),
        ("fetch-raster", 0, 0, 9, 1): (f, ("asset_table", 0, 0)),
        ("concatenate", 0, 1, 9, 0): (f, ("getitem-sub", 0, 1, 9, 0)),
        ("shuffle-split", 300): (f, ("concatenate", 0, 1, 8, 1)),
        ("concatenate", 0, 1, 10, 0): (f, ("getitem-sub", 0, 1, 10, 0)),
        ("asset_table", 0, 0): (f, ("asset-table-data", 0, 0)),
        ("shuffle-split", 342): (f, ("concatenate", 0, 1, 10, 0)),
        ("getitem-vindex", 1, 0, 1): (
            f,
            ("shuffle-split", 319),
            ("shuffle-split", 320),
        ),
        ("getitem-sub", 0, 1, 10, 0): (f, ("fetch-raster", 0, 0, 10, 0)),
        ("getitem-sub", 0, 1, 8, 1): (f, ("fetch-raster", 0, 0, 8, 1)),
        ("shuffle-split", 341): (f, ("concatenate-getitem", 341)),
        ("getitem-sub", 0, 1, 9, 1): (f, ("fetch-raster", 0, 0, 9, 1)),
        ("concatenate-getitem", 341): (f, ("fetch-raster", 0, 0, 10, 0)),
        ("asset-table-data", 0, 0): (f,),
        ("getitem-sub", 0, 1, 9, 0): (f, ("fetch-raster", 0, 0, 9, 0)),
        ("fetch-raster", 0, 0, 10, 0): (f, ("asset_table", 0, 0)),
        ("shuffle-split", 320): (f, ("concatenate", 0, 1, 9, 0)),
        ("shuffle-split", 319): (f, ("concatenate-getitem", 319)),
        ("fetch-raster", 0, 0, 9, 0): (f, ("asset_table", 0, 0)),
        ("concatenate-getitem", 319): (f, ("fetch-raster", 0, 0, 9, 0)),
    }
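To see the widely shared dependency in a graph like the one above, the dependencies of each task can be read off the tuple values (a naive sketch over a small excerpt of the graph; `dask.core.get_dependencies` is the proper API, and the `get_deps` helper and the `f` placeholder here are illustrative only):

```python
f = lambda *args, **kwargs: None  # placeholder task callable, as in the graph above

def get_deps(dsk):
    """Map each key to the set of graph keys it depends on (naive tuple scan)."""
    return {key: {v for v in value[1:] if v in dsk} for key, value in dsk.items()}

# Excerpt of the graph above: every fetch-raster task depends on the single
# asset_table task, which is the widely shared dependency discussed in the PR.
excerpt = {
    ("asset-table-data", 0, 0): (f,),
    ("asset_table", 0, 0): (f, ("asset-table-data", 0, 0)),
    ("fetch-raster", 0, 0, 8, 1): (f, ("asset_table", 0, 0)),
    ("fetch-raster", 0, 0, 9, 1): (f, ("asset_table", 0, 0)),
    ("fetch-raster", 0, 0, 9, 0): (f, ("asset_table", 0, 0)),
    ("fetch-raster", 0, 0, 10, 0): (f, ("asset_table", 0, 0)),
}
deps = get_deps(excerpt)
n_dependents = sum(("asset_table", 0, 0) in d for d in deps.values())
print(n_dependents)  # 4: all four fetch-raster branches share asset_table
```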

github-actions bot (Contributor) commented Sep 3, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

13 files ±0 · 13 suites ±0 · 3h 4m 57s ⏱️ +2m 28s
13 173 tests +2: 12 117 ✅ +2, 1 056 💤 ±0, 0 ❌ ±0
137 296 runs +24: 118 268 ✅ +22, 19 028 💤 +2, 0 ❌ ±0

Results for commit 32901e0. ± Comparison against base commit f5b9811.

♻️ This comment has been updated with latest results.

Comment on lines +2569 to +2572
for k in final_keys:
    # Ensure that we're not processing the entire graph using
    # process_runnables (fractional values) but are using the critical path
    assert o[k].critical_path in {1, 2}
@fjetter (Member, Author) commented:

This test isn't great but it's something.

If nothing else, this test now adds this stackstac graph, which hasn't been in here before. I also managed to change the order in a way that increased pressure on this test while the rest of the test suite stayed green, so the test does add good signal even if the critical-path assertion is a bit off.

@phofl merged commit 4be4880 into dask:main on Sep 3, 2024 · 24 checks passed
@phofl (Collaborator) commented Sep 3, 2024

thx

@fjetter deleted the dask_order_refinements branch on September 4, 2024