Description
Describe the bug
I am seeing a Repartition being added incorrectly in some of our IOx plans. The repartition then forces the data to be re-sorted, which is a huge deal for us.
Expected behavior
If the data is already sorted, it should not be re-sorted.
Additional context
The repartition is being added by the physical optimizer pass in https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/physical_optimizer/repartition.rs
I previously fixed a similar issue in #1776, but DataFusion has become more sophisticated recently (in a good way). I believe that it somehow doesn't realize that the output of the UnionExec is sorted and thus should not be repartitioned.
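To illustrate why a repartition on top of sorted inputs forces a re-sort, here is a minimal sketch in plain Rust. It is a toy model, not DataFusion code: `Batch`, `Partition`, `round_robin_repartition`, and `partition_is_sorted` are hypothetical names made up for this example, and the batch interleaving is an assumed arrival order rather than what RepartitionExec actually does.

```rust
// Toy model (hypothetical types, not DataFusion's): a partition is a
// sequence of batches, and each batch is a sequence of values.
type Batch = Vec<i32>;
type Partition = Vec<Batch>;

/// Deal batches out to `n` output partitions in round-robin fashion,
/// pulling one batch at a time from each input partition in turn.
fn round_robin_repartition(inputs: &[Partition], n: usize) -> Vec<Partition> {
    let mut outputs: Vec<Partition> = vec![Vec::new(); n];
    let max_len = inputs.iter().map(|p| p.len()).max().unwrap_or(0);
    let mut next = 0;
    for i in 0..max_len {
        for input in inputs {
            if let Some(batch) = input.get(i) {
                outputs[next % n].push(batch.clone());
                next += 1;
            }
        }
    }
    outputs
}

/// True if the values of a partition, read batch by batch, are non-decreasing.
fn partition_is_sorted(partition: &Partition) -> bool {
    let flat: Vec<i32> = partition.iter().flatten().copied().collect();
    flat.windows(2).all(|w| w[0] <= w[1])
}

fn main() {
    // Two individually sorted partitions, standing in for the output of a
    // union over two sorted children.
    let inputs: Vec<Partition> = vec![
        vec![vec![1, 2], vec![3, 4], vec![5, 6]],
        vec![vec![10, 20], vec![30, 40], vec![50, 60]],
    ];
    assert!(inputs.iter().all(partition_is_sorted));

    let outputs = round_robin_repartition(&inputs, 3);

    // At least one output partition now mixes batches from both inputs out
    // of order, so anything downstream that needs sorted input must re-sort.
    assert!(!outputs.iter().all(partition_is_sorted));
    for (i, p) in outputs.iter().enumerate() {
        println!("output {}: {:?} sorted={}", i, p, partition_is_sorted(p));
    }
}
```

In this sketch every input partition is sorted, but after the repartition that guarantee no longer holds for every output partition, which is why the extra Repartition is so costly when the plan depends on sort order.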
To Reproduce
My suspicion is that #4714 / 899c86a is the specific code that is causing this change (as the relies_on_input_order, which I added explicitly for this case, is now ignored -- see #4856).
I think the fix is to update DataFusion to be smarter about knowing how UnionExec is sorted in
In fact, looking at the tests I wrote, which were changed in #4714, you can see that exactly this Repartition and sort have been added.
Found while updating DataFusion in IOx: https://github.com/influxdata/influxdb_iox/pull/6483#discussion_r1065433368