Implement a way to preserve partitioning through UnionExec without losing ordering #10314
Comments
Hi @alamb, I am trying to work on this. I am not very familiar on the |
Hi @xinlifoobar -- this sounds like it is on the right track |
Hi @alamb, found another interesting case while testing. I am not very sure, do you think this could apply
With
|
Is your feature request related to a problem or challenge?
The EnforceDistribution physical optimizer pass in DataFusion will, in some cases, introduce an InterleaveExec to increase partitioning when data passes through a UnionExec:
datafusion/datafusion/core/src/physical_optimizer/enforce_distribution.rs
Lines 1196 to 1226 in 2231183
Here is what InterleaveExec does:
datafusion/datafusion/physical-plan/src/union.rs
Lines 286 to 317 in 4edbdd7
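To make the difference concrete, here is a toy model in plain Rust (not DataFusion code). It assumes each partition is a sorted list of integers and that all inputs have the same number of partitions. The names Partition, union_partitions, interleave_partitions and is_ordered are hypothetical, and the round-robin loop only stands in for "take a batch from whichever input has one ready".

```rust
/// One partition: a stream of rows, modelled here as a Vec of integers.
type Partition = Vec<i32>;

/// UnionExec-style combination: the output partitions are all of the input
/// partitions, one after another, so each keeps its internal order.
fn union_partitions(inputs: &[Vec<Partition>]) -> Vec<Partition> {
    inputs.iter().flatten().cloned().collect()
}

/// InterleaveExec-style combination: output partition `i` receives the rows of
/// partition `i` from every input. Round-robin is an assumption standing in
/// for readiness-based polling; either way no output ordering is guaranteed.
/// Assumes every input has the same number of partitions.
fn interleave_partitions(inputs: &[Vec<Partition>]) -> Vec<Partition> {
    let n = inputs.first().map_or(0, |input| input.len());
    (0..n)
        .map(|i| {
            let mut iters: Vec<_> = inputs.iter().map(|input| input[i].iter()).collect();
            let mut out = Vec::new();
            loop {
                let mut progressed = false;
                for it in iters.iter_mut() {
                    if let Some(v) = it.next() {
                        out.push(*v);
                        progressed = true;
                    }
                }
                if !progressed {
                    break;
                }
            }
            out
        })
        .collect()
}

/// True if the rows of a partition are in non-decreasing order.
fn is_ordered(p: &[i32]) -> bool {
    p.windows(2).all(|w| w[0] <= w[1])
}
```

With two inputs of two sorted partitions each, union_partitions returns four partitions that are all still ordered, while interleave_partitions returns two partitions whose rows are mixed and, in general, no longer ordered.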
However, this has the potential downside of destroying any pre-existing ordering, which is sometimes preferable to increasing / improving partitioning (e.g. see #10257 and the datafusion.optimizer.prefer_existing_sort setting).
Describe the solution you'd like
I would like there to be some way to preserve the partitioning after a UnionExec without losing the ordering information, and then remove the prefer_existing_union flag.
Describe alternatives you've considered
One possibility is to add a preserve_order flag to InterleaveExec, the same way as RepartitionExec has a preserve_order flag:
datafusion/datafusion/physical-plan/src/repartition/mod.rs
Lines 328 to 417 in 4edbdd7
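As a rough illustration of what an order-preserving interleave could do for one output partition, here is a std-only sketch of a k-way merge over already-sorted inputs. This is an assumption about the general technique, not the DataFusion implementation: the function name merge_preserving_order is hypothetical, rows are plain integers, and the real operator would work on streams of record batches with arbitrary sort keys.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge already-sorted inputs into one sorted output (a k-way merge).
/// Each input stands for partition `i` of one child of the union.
fn merge_preserving_order(inputs: &[Vec<i32>]) -> Vec<i32> {
    // Min-heap of (next value, input index, position within that input).
    let mut heap = BinaryHeap::new();
    for (idx, input) in inputs.iter().enumerate() {
        if let Some(&first) = input.first() {
            heap.push(Reverse((first, idx, 0usize)));
        }
    }

    let mut out = Vec::with_capacity(inputs.iter().map(Vec::len).sum());
    // Repeatedly emit the smallest head, then refill from the same input.
    while let Some(Reverse((value, idx, pos))) = heap.pop() {
        out.push(value);
        if let Some(&next) = inputs[idx].get(pos + 1) {
            heap.push(Reverse((next, idx, pos + 1)));
        }
    }
    out
}
```

The trade-off is that a merge has to wait for the next row from every input before it can emit anything, whereas a plain interleave can forward whichever input is ready first.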
Additional context
We encountered this while working on #10259. @mustafasrepo and @phillipleblanc pointed out that the config flag prefer_existing_union was effectively the same as prefer_existing_sort.