New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider reactivating low-level DataFrame optimization when not all layers are Blockwise #8447
Comments
Change the DataFrame optimization default so that DataFrames that may need low-level fusion still get it. Closes dask#8447
My motivation to finally refocus my efforts on HLG cleanup is certainly reaching a tipping point! Sorry for being somewhat MIA in these topics lately. I agree that we probably want the middle ground you are suggesting. As I type this, I see that you submitted a PR - So, perhaps we can discuss the specific details there :) |
Since #7620, we've seen a few instances where users have gotten burned by root-task overproduction (see dask/distributed#5555, dask/distributed#5223 for background) because certain DataFrame optimizations still use low-level graphs, and therefore aren't getting fused anymore. Examples:
partition_info
inmap_partitions
materializes the graph unnecessarily #8309map_partitions
in various DataFrame join methods #8306We do want to get everything to Blockwise eventually, but our bandwith to track these down and fix them is limited. In the interim, I propose that by default, we still do low-level fusion when any of the layers in the graph are materialized.
cc @rjzamora @ian-r-rose @jrbourbeau
The text was updated successfully, but these errors were encountered: