Make `repartition` a no-op when divisions match #9924
Does this close #9922?
No, this is separate from that issue, though that issue is what made me think about this use case.
I plan to merge this in a few hours if there are no further comments. I don't think the changes here are particularly controversial.
👍 🚀
I am the author of #9922. My request was to avoid adding unnecessary repartition nodes when the old and new divisions are equal, so to me this PR solves my issue. I mentioned
@@ -7696,6 +7696,10 @@ def repartition(df, divisions=None, force=False):
    >>> ddf = dd.repartition(df, [0, 5, 10, 20])  # doctest: +SKIP
    """

    # no-op fastpath for when we already have matching divisions
    if is_dask_collection(df) and df.divisions == divisions:
This check fails if `divisions` is a list; it only works with tuples.
It's an issue because if we concat a few Dask DataFrames with identical divisions, it will still repartition every ddf: `dask.dataframe.multi.concat()` calls `align_partitions()`, which calls `df.repartition()` with `divisions` given as a list.
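To make the list-versus-tuple point concrete, here is a minimal sketch in plain Python that does not import dask. It assumes, as the comment above implies, that the existing divisions are held as a tuple while a caller passes a list. The helper `divisions_match` is hypothetical and only illustrates one possible normalization; it is not the change made in this PR.

```python
def divisions_match(existing, requested):
    """Return True when both sequences hold the same division boundaries.

    Hypothetical helper for illustration; not part of dask.
    """
    if requested is None:
        return False
    # Normalize both sides so that a list and a tuple compare by content.
    return tuple(existing) == tuple(requested)


existing = (0, 5, 10, 20)   # assumed: existing divisions stored as a tuple
requested = [0, 5, 10, 20]  # a caller passing the same divisions as a list

# In Python a tuple never compares equal to a list, even with equal items,
# so a plain `==` check would miss the no-op fastpath here.
plain_equal = existing == requested
normalized_equal = divisions_match(existing, requested)

print(plain_equal, normalized_equal)
```

With a normalization along these lines, the fastpath would also trigger for list inputs; whether that belongs in `repartition` itself or in the caller is a design choice left to the separate issue.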
Shall I open a separate issue?
Separate issue would be good, thanks @epizut
No need to actually repartition anything if the input divisions are already equal to the existing divisions