Repartition is being added incorrectly in some cases #4883

@alamb

Describe the bug
I am seeing a Repartition being added incorrectly in some of our IOx plans. This causes re-sorts to happen, which is a huge deal for us.

Expected behavior
If the data is already sorted, it should not be re-sorted.

Additional context

The repartition is being added by the physical optimizer pass in https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/physical_optimizer/repartition.rs
I previously fixed a similar issue in #1776, but DataFusion has become more sophisticated recently (in a good way). I believe that it somehow doesn't realize that the output of the UnionExec is sorted and thus should not be repartitioned.
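To illustrate why repartitioning a sorted, multi-partition input forces a re-sort, here is a minimal toy sketch. It is not DataFusion's RepartitionExec: `round_robin` and `is_sorted` are hypothetical helpers, rows are plain integers, and the order in which the inputs are drained is an assumption made for the example.

```rust
/// Toy round-robin repartition (hypothetical, not DataFusion code):
/// row `i` of every input goes to output `i % n_outputs`.
/// Inputs are drained one after another, which is an assumed arrival order.
fn round_robin(inputs: &[Vec<i32>], n_outputs: usize) -> Vec<Vec<i32>> {
    let mut outputs = vec![Vec::new(); n_outputs];
    for input in inputs {
        for (i, row) in input.iter().enumerate() {
            outputs[i % n_outputs].push(*row);
        }
    }
    outputs
}

/// Returns true if the rows are in non-decreasing order.
fn is_sorted(rows: &[i32]) -> bool {
    rows.windows(2).all(|w| w[0] <= w[1])
}
```

With two sorted inputs such as `[1, 3, 5, 7]` and `[2, 4, 6, 8]`, each output partition receives rows from both inputs, so the per-partition ordering is lost and any operator that needs sorted input has to sort again.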

To Reproduce
My suspicion is that #4714 / 899c86a is the specific change that is causing this (as relies_on_input_order, which I added explicitly for this case, is now ignored; see #4856).

I think the fix is to update DataFusion to be smarter about knowing how UnionExec is sorted in
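As a rough picture of what "smarter" could mean, here is a toy model, not DataFusion's actual ExecutionPlan API. The `Plan` enum, `output_ordering`, and `maybe_repartition` are all hypothetical names for this sketch. The assumption is that a union keeps an ordering when every child reports the same one, and that the optimizer skips the repartition when the parent relies on that ordering.

```rust
/// Hypothetical, simplified plan nodes (not DataFusion types).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    SortedScan { key: String },
    UnsortedScan,
    Union(Vec<Plan>),
    Repartition(Box<Plan>),
}

impl Plan {
    /// The sort key each output partition is ordered by, if any.
    fn output_ordering(&self) -> Option<&str> {
        match self {
            Plan::SortedScan { key } => Some(key.as_str()),
            Plan::UnsortedScan => None,
            // Assumption: a union is sorted only if all children share one ordering.
            Plan::Union(children) => {
                let first = children.first()?.output_ordering()?;
                if children.iter().all(|c| c.output_ordering() == Some(first)) {
                    Some(first)
                } else {
                    None
                }
            }
            // Toy assumption: repartitioning destroys any ordering.
            Plan::Repartition(_) => None,
        }
    }
}

/// Wrap `child` in a repartition unless the parent needs the child's ordering.
fn maybe_repartition(child: Plan, parent_needs_order: bool) -> Plan {
    if parent_needs_order && child.output_ordering().is_some() {
        child
    } else {
        Plan::Repartition(Box::new(child))
    }
}
```

In this sketch, a union of two scans sorted on the same key is left alone when its parent needs the order, while a union with an unsorted child still gets repartitioned.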

In fact, look at the tests that I wrote and that were changed in #4714:

https://github.com/apache/arrow-datafusion/blob/556282a8b6da6cb7d41d8c311211ae49b7ed82a7/datafusion/core/src/physical_optimizer/repartition.rs#L574-L586

You can see that the Repartition and the sort have been added.

Found while updating DataFusion in IOx: https://github.com/influxdata/influxdb_iox/pull/6483#discussion_r1065433368

Labels

bug (Something isn't working)
