[Rust] [DataFusion] Implement "coalesce partitions" operator #26546

Closed
asfimport opened this issue Nov 14, 2020 · 1 comment

The coalesce partitions operator simply reduces the number of partitions to a specified target count.

The target partition count must be >= 1.

If the target partition count is >= the number of input partitions, then the operator is a no-op and can be optimized out of the plan.

The simplest implementation would assign one or more whole input partitions to each output partition. This works well when the number of input partitions is a multiple of the number of output partitions, e.g. going from 64 input partitions to 8 output partitions. In other cases the resulting partitions may be skewed, e.g. going from 3 partitions to 2. It would be possible to redistribute the data at the row level, but that would add a lot of overhead, and the "repartition" operator should be used for that case.
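
To make the partition-level assignment concrete, here is a minimal sketch in plain Rust. It is not DataFusion code: the function names `coalesce_assignments` and `is_noop` are hypothetical, and the round-robin assignment is just one assumed way to map whole input partitions onto output partitions.

```rust
/// Hypothetical helper: returns true when coalescing would change nothing,
/// i.e. the target count is at least the number of input partitions.
fn is_noop(num_input: usize, target: usize) -> bool {
    target >= num_input
}

/// Hypothetical helper: maps whole input partitions to output partitions.
/// Element `i` of the result lists the input partition indices that feed
/// output partition `i`. Assignment is round-robin (an assumption made for
/// this sketch); no rows are moved between partitions.
fn coalesce_assignments(num_input: usize, target: usize) -> Result<Vec<Vec<usize>>, String> {
    // The target partition count must be >= 1.
    if target == 0 {
        return Err("target partition count must be >= 1".to_string());
    }
    // Never produce more output partitions than there are inputs;
    // in that case every input keeps its own partition (the no-op case).
    let num_output = target.min(num_input);
    let mut groups: Vec<Vec<usize>> = vec![Vec::new(); num_output];
    for input in 0..num_input {
        groups[input % num_output].push(input);
    }
    Ok(groups)
}
```

With 64 inputs and a target of 8, every output partition receives exactly 8 inputs. With 3 inputs and a target of 2, the result is `[[0, 2], [1]]`: one output partition reads two inputs and the other reads one, which is the skew described above.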

Reporter: Andy Grove / @andygrove

Note: This issue was originally created as ARROW-10583. Please see the migration documentation for further details.

Andrew Lamb / @alamb:
Migrated to github: apache/datafusion#112
