Enforce boolean ascending for sort_values #8440

Merged
merged 5 commits into dask:main on Jan 24, 2022

Conversation

charlesbluca
Member

sort_values accepts a list of booleans for ascending, but _calculate_divisions (which computes divisions in ascending order) does not yet support that, so passing a list can cause undefined behavior.

This small PR adds a check that ascending is a single boolean before calling rearrange_by_divisions. The check runs after all single-partition cases are handled, because pandas itself can sort with a list of ascending booleans.
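
A minimal sketch of the order of checks described above, in plain Python. The helper name check_ascending, the npartitions argument, and the choice of NotImplementedError are illustrative assumptions, not the code in dask/dataframe/shuffle.py.

```python
def check_ascending(ascending, npartitions):
    """Hypothetical helper showing where the boolean check sits."""
    if npartitions == 1:
        # Single-partition case: the sort is delegated to pandas, which
        # accepts a bool or a list of bools, so nothing is rejected here.
        return ascending

    if not isinstance(ascending, bool):
        # Multi-partition case: divisions are computed in ascending order
        # only, so anything other than a single bool is refused before
        # the data is rearranged. The exception type is an assumption.
        raise NotImplementedError(
            f"only a single boolean is supported for ascending; got {ascending!r}"
        )
    return ascending
```

With this ordering, check_ascending([True, False], npartitions=1) passes the list through unchanged, while the same list with npartitions=4 raises before any shuffle work starts.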

  • Closes #xxxx
  • Tests added / passed
  • Passes pre-commit run --all-files

Collaborator

@ian-r-rose left a comment

Thanks @charlesbluca, can you also add a test demonstrating this behavior?

dask/dataframe/shuffle.py (outdated review comment, resolved)
@jsignell
Member

This looks nice! Just missing a test like Ian mentioned.

@jsignell
Member

@charlesbluca, I think if you merge or rebase off main, lots of these tests will pass.

@charlesbluca
Member Author

Thanks @jsignell, merged in the latest

@jsignell merged commit 1a0f01b into dask:main on Jan 24, 2022
@jsignell
Member

Thanks @charlesbluca!


3 participants