Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid substituting keys that occur multiple times #10646

Merged
merged 1 commit into from Dec 12, 2023
Merged

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented Nov 23, 2023

Closes #10645

I haven't run any widespread tests, yet. However, for the cases that are affected, this triggers the computation of a task twice and apparently also prohibits some kind of numpy optimization as reported in #10645

The graph topology here doesn't strike me as rare or very special

image

so the add in this graph is basically fused into a fused into a matmul-transpose blob / subgraphcallable that accepts two inputs that are identical. I would argue that it is a bug of blockwise_fusion that generates such a blob in the first place since by rewriting that subgraph callable it would be possible to accept just a single dependency, restoring this fusion again.

I'm not very familiar with the blockwise fusion code and was able to find this first.

With this PR the above graph optimizes to
image

This change effectively stops fusing because it detects this situation. If the blockwise fusion bug is fixed, it would fuse into one task (again) but without recomputing the intermediate result twice

cc @rjzamora

@fjetter
Copy link
Member Author

fjetter commented Nov 23, 2023

I'm pretty sure this is a bug that would've been caught if this all was type annotated.

@fjetter
Copy link
Member Author

fjetter commented Nov 23, 2023

I'm running our benchmarks to see if anything pops up. We don't have a X + 1; X.T @ X test but I mostly want to make sure no other workloads are negatively affected

@fjetter
Copy link
Member Author

fjetter commented Nov 23, 2023

our benchmarks look good

@@ -227,6 +227,48 @@ def test_fuse_keys():
)


def test_donot_substitute_same_key_multiple_times():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_donot_substitute_same_key_multiple_times():
def test_donot_substitute_same_key_multiple_times() -> None:

If you add -> None to new tests mypy will be able to check them as well. I think it's good practice to just do it if you are worrying about the type hints.
It helps catch the edge cases that aren't always obvious when just type hinting the functions.

@fjetter fjetter merged commit 80520de into dask:main Dec 12, 2023
26 of 28 checks passed
@fjetter fjetter deleted the fix_fuse branch December 12, 2023 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

X.T @ X is almost 2 times slower if X has any dependencies in the task graph
2 participants