Skip to content

Fix PushdownSort dropping LIMIT when eliminating SortExec#21744

Open
sgrebnov wants to merge 3 commits intoapache:mainfrom
spiceai:sgrebnov/0420-topK-ensure-limit
Open

Fix PushdownSort dropping LIMIT when eliminating SortExec#21744
sgrebnov wants to merge 3 commits intoapache:mainfrom
spiceai:sgrebnov/0420-topK-ensure-limit

Conversation

@sgrebnov
Copy link
Copy Markdown
Member

Which issue does this PR close?

When PushdownSort removes a SortExec because a source returns Exact (guaranteeing ordering), any fetch (LIMIT) on the SortExec is silently dropped if the underlying plan does not support with_fetch().

For example, ProjectionExec supports try_pushdown_sort (delegating to its child) but does not implement with_fetch(). A plan like SortExec(fetch=10) → ProjectionExec → source that gets sort-eliminated loses the limit.

What changes are included in this PR?

In the Exact branch of PushdownSort, when the eliminated SortExec carried a fetch:

  1. Try with_fetch() on the pushed-down source first
  2. If with_fetch() returns None, fall back to wrapping with GlobalLimitExec

Are these changes tested?

Yes. Three new unit tests:

  • test_sort_pushdown_exact_no_fetch_no_limit — Exact elimination without fetch: no limit wrapper added
  • test_sort_pushdown_exact_preserves_fetch_with_global_limit — Exact elimination with fetch, source does NOT support with_fetch(): GlobalLimitExec wrapper added
  • test_sort_pushdown_exact_preserves_fetch_with_source_support — Exact elimination with fetch, source supports with_fetch(): limit pushed into source directly

Are there any user-facing changes?

No.

Copy link
Copy Markdown
Member

@xudong963 xudong963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, would be better if there are some slt tests

Also, do you want to include it into new minor release?

Comment thread datafusion/physical-optimizer/src/pushdown_sort.rs
let inner = if let Some(fetch) = sort_child.fetch() {
inner.with_fetch(Some(fetch)).unwrap_or(inner)
inner.with_fetch(Some(fetch)).unwrap_or_else(|| {
Arc::new(GlobalLimitExec::new(inner, 0, Some(fetch)))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 102 mentions multi-partitioning (sort_child.preserve_partitioning()) but GlobalLimitExec requires single partitioning - https://docs.rs/datafusion-physical-plan/53.1.0/src/datafusion_physical_plan/limit.rs.html#170

Copy link
Copy Markdown
Member Author

@sgrebnov sgrebnov Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-g - great point, thank you. Updated to use LocalLimitExec. This is consistent with how enforce_sorting handles this.

https://github.com/apache/datafusion/blob/main/datafusion/physical-optimizer/src/enforce_sorting/mod.rs#L585-L594

            // If the sort has a fetch, we need to add a limit:
            if properties.output_partitioning().partition_count() == 1 {
                let mut global_limit =
                    GlobalLimitExec::new(Arc::clone(sort_input), 0, Some(fetch));
                global_limit.set_required_ordering(required_ordering);
                Arc::new(global_limit)
            } else {
                let mut local_limit = LocalLimitExec::new(Arc::clone(sort_input), fetch);
                local_limit.set_required_ordering(required_ordering);
                Arc::new(local_limit)
            }

Note: I didn't add set_required_ordering here because the Exact result means the source's plan properties already guarantee the ordering, but happy to add it for consistency / if required - please let me know.

@alamb alamb added performance Make DataFusion faster and removed performance Make DataFusion faster labels Apr 22, 2026
Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sgrebnov and @martin-g and @xudong963

Comment thread datafusion/physical-optimizer/src/pushdown_sort.rs
@sgrebnov
Copy link
Copy Markdown
Member Author

LGTM, would be better if there are some slt tests

@xudong963 — There's an existing SLT test (sort_pushdown.slt Test 1.3) that already covers the Exact pushdown + LIMIT case where the source supports with_fetch(). The specific fallback path this PR fixes (GlobalLimitExec/LocalLimitExec wrapping when the source doesn't support with_fetch()) can't be triggered through SLT because all built-in data sources support with_fetch(). Adding a custom test-only table provider to the SLT harness just for this seems like overkill, so I'd propose keeping the unit tests with the configurable TestScan. WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate optimizer Optimizer rules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants