fix groupby shift bug caused by unsorted partitions after shuffle#8782

Merged
jcrist merged 2 commits into dask:main from kori73:groupby-shift-bug-fix on Mar 8, 2022

@kori73 kori73 commented Mar 5, 2022

We now apply sort_index to each partition if the dataframe has been shuffled.

I have run the assertion 10 times because the result was non-deterministic. The probability of every run passing with the previous implementation should be low. If you have a better idea, I can improve the test.
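
For illustration, here is a minimal pandas-only sketch of the failure mode. It is not the actual dask patch: the column names `key` and `val` and the hand-picked row order are assumptions made up for this example. The idea is that a groupby shift depends on row order within each group, so a partition whose rows arrive out of order after a shuffle can give the wrong answer until it is sorted by index again.

```python
import pandas as pd

# Hypothetical data; the index 0..4 records the original row order.
df = pd.DataFrame(
    {"key": ["a", "a", "b", "b", "a"], "val": [1, 2, 3, 4, 5]}
)

# Reference result computed on correctly ordered rows.
expected = df.groupby("key")["val"].shift(1)

# Simulate a partition whose rows came back in arbitrary order
# after a shuffle (this particular order is chosen by hand).
shuffled = df.iloc[[4, 2, 0, 3, 1]]

# Shifting within groups on the unordered rows pairs each row
# with the wrong predecessor.
wrong = shuffled.groupby("key")["val"].shift(1)

# Restoring index order before the groupby gives the reference result.
fixed = shuffled.sort_index().groupby("key")["val"].shift(1)

print(wrong.sort_index().equals(expected))
print(fixed.equals(expected))
```

In a real shuffle the row order within a partition is not fixed, which is why the buggy result could vary from run to run.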

@jcrist jcrist left a comment

Thanks @kori73, this fix looks good to me. I pushed a small patch that cleans up the test a bit (comments on what changed below). Once tests pass I'll merge this.

@jcrist jcrist merged commit 9b53504 into dask:main Mar 8, 2022