Sometimes Filters are not repartitioned when they could be #4967

@alamb

Describe the bug

We previously had a plan like this (where the RepartitionExec was added prior to a filter in order to increase parallelism).

However, after upgrading DataFusion, the RepartitionExec is no longer there. I think this is a slightly worse plan, because the filter can no longer run in parallel:

FilterExec: tag@2 = A
  RepartitionExec: partitioning=RoundRobinBatch(4)  <--- This RepartitionExec has been removed
    DeduplicateExec: [tag@2 ASC,time@3 ASC]
      SortPreservingMergeExec: [tag@2 ASC,time@3 ASC]
        UnionExec
          ParquetExec: limit=None, partitions={1 group: [[1/1/1/1/00000000-0000-0000-0000-000000000000.parquet]]}, predicate=tag = Dictionary(Int32, Utf8("A")), pruning_predicate=tag_min@0 <= A AND A <= tag_max@1, output_ordering=[tag@2 ASC, time@3 ASC], projection=[bar, foo, tag, time]
          SortExec: [tag@2 ASC,time@3 ASC]
            RecordBatchesExec: batches_groups=1 batches=1

To Reproduce
I am working on a reproducer.

Expected behavior
A RepartitionExec should be added below the FilterExec whenever doing so increases parallelism for filtering.
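
To make the parallelism argument concrete, here is a minimal stdlib-only Rust sketch of the idea. It is not DataFusion code and not the reproducer: `Batch`, `round_robin`, and `parallel_filter` are hypothetical names invented for illustration. It assumes a single input partition (like the output of DeduplicateExec above) whose batches are dealt out round-robin to several output partitions, so the `tag = A` filter can run on each partition concurrently.

```rust
use std::thread;

/// Hypothetical stand-in for a record batch: just a column of tag values.
type Batch = Vec<String>;

/// Deal the batches of one input partition out to `n` output partitions in
/// round-robin order (the idea behind `RoundRobinBatch(n)`).
fn round_robin(batches: Vec<Batch>, n: usize) -> Vec<Vec<Batch>> {
    let mut partitions: Vec<Vec<Batch>> = vec![Vec::new(); n];
    for (i, batch) in batches.into_iter().enumerate() {
        partitions[i % n].push(batch);
    }
    partitions
}

/// Run the filter `tag = wanted` on every partition in its own thread and
/// return the number of matching rows per partition.
fn parallel_filter(partitions: Vec<Vec<Batch>>, wanted: &str) -> Vec<usize> {
    let handles: Vec<_> = partitions
        .into_iter()
        .map(|partition| {
            let wanted = wanted.to_string();
            thread::spawn(move || {
                partition
                    .iter()
                    .flatten()
                    .filter(|tag| **tag == wanted)
                    .count()
            })
        })
        .collect();
    // Collect results in partition order.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Calling `round_robin(batches, 4)` followed by `parallel_filter(partitions, "A")` filters on up to four threads. Without the repartitioning step there is only one partition, so the same filter runs on a single thread.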

Additional context
We found this while upgrading IOx:

https://github.com/influxdata/influxdb_iox/pull/6603 -- see https://github.com/influxdata/influxdb_iox/pull/6603/files#r1072606494

Labels: bug (Something isn't working), optimizer (Optimizer rules)