Describe the bug
We previously had a plan like the one below, in which a RepartitionExec was inserted beneath the FilterExec to increase parallelism.
However, after upgrading DataFusion, the RepartitionExec is no longer there. I think the new plan is slightly worse, because the filter can no longer run in parallel:
FilterExec: tag@2 = A
  RepartitionExec: partitioning=RoundRobinBatch(4)   <--- This RepartitionExec has been removed
    DeduplicateExec: [tag@2 ASC,time@3 ASC]
      SortPreservingMergeExec: [tag@2 ASC,time@3 ASC]
        UnionExec
          ParquetExec: limit=None, partitions={1 group: [[1/1/1/1/00000000-0000-0000-0000-000000000000.parquet]]}, predicate=tag = Dictionary(Int32, Utf8("A")), pruning_predicate=tag_min@0 <= A AND A <= tag_max@1, output_ordering=[tag@2 ASC, time@3 ASC], projection=[bar, foo, tag, time]
          SortExec: [tag@2 ASC,time@3 ASC]
            RecordBatchesExec: batches_groups=1 batches=1
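To make the parallelism argument concrete, here is a std-only Rust sketch (not DataFusion code) of what `RoundRobinBatch(4)` buys the filter: whole batches are dealt out to four partitions in turn, and each partition is then filtered independently. `Batch`, `round_robin`, and `parallel_filter` are hypothetical stand-ins; real DataFusion batches are Arrow `RecordBatch`es, and partitions are not run as one OS thread each as they are here.

```rust
use std::thread;

/// Hypothetical stand-in for a record batch: rows of (tag, time).
type Batch = Vec<(String, i64)>;

/// Deal whole batches out to `n` partitions in turn, the idea behind
/// `RoundRobinBatch(n)`. Rows inside a batch are never split.
fn round_robin(batches: Vec<Batch>, n: usize) -> Vec<Vec<Batch>> {
    assert!(n > 0, "need at least one partition");
    let mut parts: Vec<Vec<Batch>> = (0..n).map(|_| Vec::new()).collect();
    for (i, b) in batches.into_iter().enumerate() {
        parts[i % n].push(b);
    }
    parts
}

/// Apply the predicate `tag = <tag>` to every partition on its own thread,
/// then concatenate the results in partition order.
fn parallel_filter(parts: Vec<Vec<Batch>>, tag: &str) -> Vec<Batch> {
    let handles: Vec<_> = parts
        .into_iter()
        .map(|part| {
            let tag = tag.to_string();
            thread::spawn(move || {
                part.into_iter()
                    .map(|b| b.into_iter().filter(|(t, _)| *t == tag).collect::<Batch>())
                    .collect::<Vec<Batch>>()
            })
        })
        .collect();
    handles
        .into_iter()
        .flat_map(|h| h.join().expect("filter thread panicked"))
        .collect()
}

fn main() {
    // Eight batches, each holding one "A" row and one "B" row.
    let batches: Vec<Batch> = (0..8)
        .map(|i| vec![("A".to_string(), i), ("B".to_string(), i)])
        .collect();
    let parts = round_robin(batches, 4);
    let out = parallel_filter(parts, "A");
    let rows: usize = out.iter().map(|b| b.len()).sum();
    println!("{rows}"); // prints 8
}
```

Without the repartition step, all eight batches would pass through a single partition and be filtered sequentially, which mirrors the plan after the upgrade.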
To Reproduce
I am working on a reproducer.
Expected behavior
The optimizer should add a RepartitionExec beneath the FilterExec whenever doing so increases the parallelism of the filter.
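The expected behavior boils down to a simple rule. The sketch below assumes one: repartition only when the filter's input has fewer partitions than the configured target, and the target is greater than one. The parameter name `target_partitions` mirrors DataFusion's configuration option of that name, but `should_round_robin` is hypothetical and does not reproduce DataFusion's optimizer, which may weigh other factors as well.

```rust
/// Hypothetical rule: would a round-robin RepartitionExec beneath a filter
/// raise parallelism? Only if the input is narrower than the target and the
/// target actually allows more than one partition.
fn should_round_robin(input_partitions: usize, target_partitions: usize) -> bool {
    target_partitions > 1 && input_partitions < target_partitions
}

fn main() {
    // Single-partition input with 4 target partitions: the RoundRobinBatch(4) case.
    assert!(should_round_robin(1, 4));
    // Input already has as many partitions as the target: nothing to gain.
    assert!(!should_round_robin(4, 4));
    // Single-partition configuration: repartitioning cannot help.
    assert!(!should_round_robin(1, 1));
}
```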
Additional context
We found this while upgrading IOx:
https://github.com/influxdata/influxdb_iox/pull/6603 -- see https://github.com/influxdata/influxdb_iox/pull/6603/files#r1072606494