Allow naive concatenation of sorted dataframes #4725

Merged: 1 commit merged into dask:master on Apr 25, 2019

@mrocklin mrocklin commented Apr 22, 2019

Previously, when a user tried to concatenate dataframes that were sorted
(all inputs had known divisions) but whose divisions could not be
concatenated in order, we raised a somewhat confusing error pointing them
towards `interleave_partitions=True`. Now we allow this without complaint,
producing an unsorted dataframe, that is, one with unknown divisions.

This stops guiding users towards correct behavior, but also reduces the
confusion for novice users.

Fixes #4693

  • Tests added / passed
  • Passes flake8 dask

@mrocklin mrocklin commented Apr 22, 2019

- raise ValueError('All inputs have known divisions which '
-                  'cannot be concatenated in order. Specify '
-                  'interleave_partitions=True to ignore order')
+ divisions = [None] * (sum([df.npartitions for df in dfs]) + 1)
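
To make the replacement line concrete: a collection with n partitions carries n + 1 division boundaries, so the concatenated result needs one more placeholder than its total partition count. The sketch below uses plain Python and a hypothetical helper name, `unknown_divisions`, which is not part of dask.

```python
def unknown_divisions(npartitions_per_frame):
    """Hypothetical helper: placeholder divisions for a naive concatenation.

    Mirrors the expression in the diff above, taking partition counts
    directly instead of dataframe objects.
    """
    return [None] * (sum(npartitions_per_frame) + 1)


# Two inputs with 2 and 3 partitions give 5 output partitions,
# which need 6 boundaries, all unknown.
divisions = unknown_divisions([2, 3])
print(len(divisions))  # 6
```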

@mrocklin mrocklin Apr 22, 2019

Alternatively, we could warn here instead of silently dropping the divisions, but novice users would have no clear way to turn that warning off.
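
As a rough sketch of what that would ask of users: had a warning been emitted, silencing it would mean reaching for Python's `warnings` filters. The function `concat_with_warning` and its message are hypothetical stand-ins, not anything in dask.

```python
import warnings


def concat_with_warning():
    """Hypothetical stand-in for a concat that warns instead of raising."""
    warnings.warn(
        "divisions cannot be concatenated in order; result is unsorted",
        UserWarning,
    )
    return "unsorted result"


# A user who wants the naive behavior quietly would have to wrap the call
# in a filter like this.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    result = concat_with_warning()

print(result)
```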

@mrocklin mrocklin commented Apr 24, 2019

Merging tomorrow if there are no comments

@mrocklin mrocklin merged commit 9f21623 into dask:master Apr 25, 2019
2 checks passed
@mrocklin mrocklin deleted the relax-interleave_partitions branch Apr 25, 2019
jorge-pessoa pushed a commit to jorge-pessoa/dask that referenced this issue May 14, 2019