Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider reactivating low-level DataFrame optimization when not all layers are Blockwise #8447

Open
gjoseph92 opened this issue Dec 3, 2021 · 1 comment · May be fixed by #8451
Open

Consider reactivating low-level DataFrame optimization when not all layers are Blockwise #8447

gjoseph92 opened this issue Dec 3, 2021 · 1 comment · May be fixed by #8451

Comments

@gjoseph92
Copy link
Collaborator

Since #7620, we've seen a few instances where users have gotten burned by root-task overproduction (see dask/distributed#5555, dask/distributed#5223 for background) because certain DataFrame optimizations still use low-level graphs, and therefore aren't getting fused anymore. Examples:

We do want to get everything to Blockwise eventually, but our bandwith to track these down and fix them is limited. In the interim, I propose that by default, we still do low-level fusion when any of the layers in the graph are materialized.

cc @rjzamora @ian-r-rose @jrbourbeau

gjoseph92 added a commit to gjoseph92/dask that referenced this issue Dec 3, 2021
Change the DataFrame optimization default so that DataFrames that may need low-level fusion still get it.

Closes dask#8447
@gjoseph92 gjoseph92 linked a pull request Dec 3, 2021 that will close this issue
3 tasks
@rjzamora
Copy link
Member

rjzamora commented Dec 3, 2021

We do want to get everything to Blockwise eventually, but our bandwith to track these down and fix them is limited. In the interim, I propose that by default, we still do low-level fusion when any of the layers in the graph are materialized.

My motivation to finally refocus my efforts on HLG cleanup is certainly reaching a tipping point! Sorry for being somewhat MIA in these topics lately.

I agree that we probably want the middle ground you are suggesting. As I type this, I see that you submitted a PR - So, perhaps we can discuss the specific details there :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants