
Bug: EnforceDistribution optimizer loses fetch (LIMIT) from CoalescePartitionsExec and SortPreservingMergeExec #21169

@zhuqi-lucas

Describe the bug

When LimitPushdown merges a GlobalLimitExec into a CoalescePartitionsExec (or SortPreservingMergeExec) as a fetch value, the EnforceDistribution optimizer rule strips and re-inserts distribution-changing operators without preserving the fetch. This causes queries with LIMIT over multi-partition sources to silently lose the limit and potentially return duplicate/extra rows.

Root cause

In enforce_distribution.rs, the function remove_dist_changing_operators strips CoalescePartitionsExec, SortPreservingMergeExec, and RepartitionExec from the plan tree. It does not capture or propagate any fetch value that was embedded in those operators. Later, when add_merge_on_top re-inserts a merge operator to satisfy SinglePartition distribution, the fetch is gone.
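
The strip-then-reinsert sequence described above can be illustrated with a toy model. This is only a sketch: the `Plan` enum and the functions `strip_dist_changing`, `add_merge_on_top`, and `root_fetch` below are simplified stand-ins written for this illustration, not DataFusion's actual types or signatures.

```rust
// Toy model of a physical plan. Illustrative only; NOT DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Leaf source with some number of output partitions.
    Scan { partitions: usize },
    /// Merges all input partitions into one, optionally stopping after `fetch` rows.
    Coalesce { input: Box<Plan>, fetch: Option<usize> },
    /// Order-preserving merge, optionally stopping after `fetch` rows.
    SortPreservingMerge { input: Box<Plan>, fetch: Option<usize> },
    /// Redistributes rows across partitions.
    Repartition { input: Box<Plan>, partitions: usize },
}

/// Strips distribution-changing operators from the top of the plan.
/// Any `fetch` they carried is discarded (the behavior this issue describes).
fn strip_dist_changing(plan: Plan) -> Plan {
    match plan {
        Plan::Coalesce { input, .. }
        | Plan::SortPreservingMerge { input, .. }
        | Plan::Repartition { input, .. } => strip_dist_changing(*input),
        other => other,
    }
}

/// Re-inserts a merge to satisfy a single-partition requirement.
/// It has no fetch to restore, so it always uses `None`.
fn add_merge_on_top(plan: Plan) -> Plan {
    Plan::Coalesce { input: Box::new(plan), fetch: None }
}

/// Returns the fetch carried by the root operator, if any.
fn root_fetch(plan: &Plan) -> Option<usize> {
    match plan {
        Plan::Coalesce { fetch, .. } | Plan::SortPreservingMerge { fetch, .. } => *fetch,
        _ => None,
    }
}
```

In this model, a plan that starts as `Coalesce { fetch: Some(1) }` over a four-partition `Scan` has `root_fetch` equal to `Some(1)`. After `add_merge_on_top(strip_dist_changing(plan))`, the root is again a `Coalesce`, but its fetch is `None`.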

To Reproduce

  1. Create a parquet table with multiple row groups / partitions.
  2. Run a query with LIMIT, e.g. SELECT * FROM t LIMIT 1.
  3. After LimitPushdown, the plan has CoalescePartitionsExec(fetch=1).
  4. EnforceDistribution strips the CoalescePartitionsExec and re-inserts one without fetch.
  5. The limit is silently lost.
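
To make the effect on the result set concrete, here is a minimal simulation of the steps above. It assumes a coalescing operator simply concatenates its input partitions and stops after `fetch` rows when a fetch is set. The `coalesce` function is a hypothetical stand-in, not the real CoalescePartitionsExec.

```rust
/// Hypothetical stand-in for a coalescing operator: concatenates partitions
/// in order and stops after `fetch` rows when a fetch is set.
fn coalesce(partitions: &[Vec<i32>], fetch: Option<usize>) -> Vec<i32> {
    let rows = partitions.iter().flatten().copied();
    match fetch {
        // With a fetch, at most `n` rows are produced (LIMIT n).
        Some(n) => rows.take(n).collect(),
        // Without a fetch, every row from every partition is produced.
        None => rows.collect(),
    }
}
```

With three partitions holding five rows in total, `coalesce(&parts, Some(1))` yields one row, while `coalesce(&parts, None)` yields all five. The second case corresponds to the plan after the fetch has been dropped.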

Expected behavior

EnforceDistribution should preserve the fetch value when removing and re-inserting distribution-changing operators.
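
One possible shape for this, sketched on a toy plan model rather than DataFusion's real types: the stripping step returns the fetch it removed, and the re-insertion step accepts it. All names here (`Plan`, `strip_keeping_fetch`, `add_merge_with_fetch`, `min_fetch`) are hypothetical. Combining nested fetches by taking the minimum is an assumption of this sketch, not something the issue specifies.

```rust
// Toy plan model for illustration only; NOT DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { partitions: usize },
    Coalesce { input: Box<Plan>, fetch: Option<usize> },
    Repartition { input: Box<Plan> },
}

/// Combines two optional limits, keeping the tighter one.
/// (Assumption of this sketch for the case of nested fetches.)
fn min_fetch(a: Option<usize>, b: Option<usize>) -> Option<usize> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x.min(y)),
        (x, None) => x,
        (None, y) => y,
    }
}

/// Strips distribution-changing operators and returns the fetch they carried,
/// so the caller can restore it later.
fn strip_keeping_fetch(plan: Plan) -> (Plan, Option<usize>) {
    match plan {
        Plan::Coalesce { input, fetch } => {
            let (inner, inner_fetch) = strip_keeping_fetch(*input);
            (inner, min_fetch(fetch, inner_fetch))
        }
        // A repartition carries no fetch in this model; just look through it.
        Plan::Repartition { input } => strip_keeping_fetch(*input),
        other => (other, None),
    }
}

/// Re-inserts a merge on top, restoring the previously captured fetch.
fn add_merge_with_fetch(plan: Plan, fetch: Option<usize>) -> Plan {
    Plan::Coalesce { input: Box::new(plan), fetch }
}
```

Stripping a `Coalesce { fetch: Some(1) }` over a `Repartition` over a `Scan` returns the bare `Scan` together with `Some(1)`. Passing both to `add_merge_with_fetch` rebuilds a `Coalesce` that still carries `fetch: Some(1)`.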

Additional context

This is analogous to the existing logic that preserves ordering through SortPreservingMergeExec: the fetch (the pushed-down limit) should receive the same treatment.
