
feat: support limit push down in datafusion #177

Merged
JingsongLi merged 3 commits into apache:main from luoyuxia:data-fusion-support-limit on Apr 3, 2026

@luoyuxia commented on Apr 1, 2026

Purpose

Linked issue: close #xxx
As part of #173.

Brief change log

Tests

API and Format

Documentation

@luoyuxia force-pushed the data-fusion-support-limit branch from 5d5c757 to 7aa0858 on April 1, 2026 22:50
let stream = futures::stream::once(fut).try_flatten();

// Apply limit if specified
let limited_stream: Pin<Box<dyn Stream<Item = DFResult<RecordBatch>> + Send>> =
Contributor:

If this only applies a limit to the stream, won't DataFusion do that itself?

Contributor (PR author):

Yes, DataFusion already applies the limit on its own; in the current PR, the scan source just stops early. My original plan was to leave limit pushdown into Paimon core for a follow-up PR, but we can also implement it directly in this PR if you prefer.
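To illustrate what "stops early at the scan source" means, here is a minimal std-only sketch. It is not the PR's code or a DataFusion API: `Batch` is a hypothetical stand-in for an Arrow `RecordBatch`, `LimitedBatches` and `limit_batches` are hypothetical names, and the real code wraps an async `futures` stream rather than a synchronous `Iterator`.

```rust
/// Hypothetical stand-in for an Arrow `RecordBatch`: one column of values.
type Batch = Vec<i64>;

/// Hypothetical adapter that yields batches from `inner` until `remaining`
/// rows have been produced, then stops without pulling from `inner` again.
struct LimitedBatches<I> {
    inner: I,
    remaining: usize,
}

impl<I: Iterator<Item = Batch>> Iterator for LimitedBatches<I> {
    type Item = Batch;

    fn next(&mut self) -> Option<Batch> {
        if self.remaining == 0 {
            // Limit reached: stop early instead of reading more from the source.
            return None;
        }
        let mut batch = self.inner.next()?;
        // Drop rows beyond the limit (a no-op if the batch already fits).
        batch.truncate(self.remaining);
        self.remaining -= batch.len();
        Some(batch)
    }
}

/// Wraps a batch source so that at most `limit` rows are produced.
fn limit_batches<I: Iterator<Item = Batch>>(inner: I, limit: usize) -> LimitedBatches<I> {
    LimitedBatches { inner, remaining: limit }
}
```

For example, with three 3-row batches and a limit of 5, the adapter yields `[1, 2, 3]` and `[4, 5]` and pulls only two batches from the source; the third is never read.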

Contributor:

You can go ahead and complete it in this PR. Please note that pushing down the limit needs special logic in data evolution mode.

@luoyuxia force-pushed the data-fusion-support-limit branch from 7aa0858 to ae5353a on April 2, 2026 09:05
@JingsongLi (Contributor):
Please rebase onto main.

@luoyuxia force-pushed the data-fusion-support-limit branch from ae5353a to 3d8e6e4 on April 3, 2026 01:34
@luoyuxia force-pushed the data-fusion-support-limit branch from 3d8e6e4 to 8605bf5 on April 3, 2026 02:16
Comment thread on crates/paimon/src/table/table_scan.rs (outdated)
// than waiting until splits are created.
//
// Reference: [AppendOnlyFileStoreScan.postFilterManifestEntries](https://github.com/apache/paimon/blob/release-1.3/paimon-core/src/main/java/org/apache/paimon/operation/AppendOnlyFileStoreScan.java#L91)
let limit_pushdown_at_manifest_level = self.table.schema.primary_keys().is_empty()
Contributor:

Maybe we don't need this; just apply the limit in apply_limit_pushdown.

Comment thread on crates/paimon/src/table/source.rs (outdated)
/// by overlapping ranges. Two ranges overlap if `current.start <= previous_group_end`.
///
/// Reference: [RangeHelper.mergeOverlappingRanges()](https://github.com/apache/paimon/blob/release-1.3/paimon-common/src/main/java/org/apache/paimon/utils/RangeHelper.java#L59)
fn merge_overlapping_row_id_ranges(files: &[DataFileMeta]) -> Vec<Vec<&DataFileMeta>> {
Contributor:

Could we just reuse group_by_overlapping_row_id?

Contributor (PR author):

Thanks, I hadn't noticed it.
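The doc comment in this thread and the earlier note about data evolution mode point at the same idea: files whose row-id ranges overlap have to be treated as one group when counting rows for a limit. A minimal sketch of that grouping, using the overlap rule quoted in the doc comment (`current.start <= previous_group_end`, after sorting by start). `FileMeta` is a hypothetical simplification of `DataFileMeta`, the function names are hypothetical (this is not the signature of Paimon's `group_by_overlapping_row_id`), and counting each group once as the span of its merged range assumes that files in a group store different columns of the same rows.

```rust
/// Hypothetical, simplified stand-in for `DataFileMeta`: the file covers
/// row ids `first_row_id ..= first_row_id + row_count - 1`.
#[derive(Debug, Clone, PartialEq)]
struct FileMeta {
    name: &'static str,
    first_row_id: i64,
    row_count: i64,
}

impl FileMeta {
    fn last_row_id(&self) -> i64 {
        self.first_row_id + self.row_count - 1
    }
}

/// Groups files whose inclusive row-id ranges overlap. After sorting by start,
/// a file joins the current group if `first_row_id <= group_end`.
fn group_by_overlapping_ranges(files: &[FileMeta]) -> Vec<Vec<&FileMeta>> {
    let mut sorted: Vec<&FileMeta> = files.iter().collect();
    sorted.sort_by_key(|f| f.first_row_id);

    let mut groups: Vec<Vec<&FileMeta>> = Vec::new();
    let mut group_end = i64::MIN;
    for file in sorted {
        if !groups.is_empty() && file.first_row_id <= group_end {
            // Overlaps the current group: extend it.
            groups.last_mut().unwrap().push(file);
            group_end = group_end.max(file.last_row_id());
        } else {
            // No overlap: start a new group.
            groups.push(vec![file]);
            group_end = file.last_row_id();
        }
    }
    groups
}

/// Rows covered by one group, counted once (assumes the files in a group
/// hold different columns of the same rows).
fn group_row_count(group: &[&FileMeta]) -> i64 {
    let start = group.iter().map(|f| f.first_row_id).min().unwrap_or(0);
    let end = group.iter().map(|f| f.last_row_id()).max().unwrap_or(-1);
    end - start + 1
}
```

With files `a` and `b` both covering row ids 0..=99 and `c` covering 100..=149, the sketch yields groups `[a, b]` and `[c]` with 100 and 50 rows. Summing per-file counts would instead report 250 rows for 150 actual rows, which for limit pushdown could stop collecting files before `limit` rows are really covered.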

Comment thread on crates/paimon/src/table/table_scan.rs (outdated)
///
/// This does not guarantee an exact final row count. If any split's
/// `merged_row_count()` is `None` (for example because of unknown deletion
/// cardinality), all remaining splits are kept and the caller or query
Contributor:

Are all the remaining splits kept?
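The doc comment under discussion describes best-effort, split-level pruning. A minimal sketch of that behaviour as the comment states it, with hypothetical `Split` and `apply_limit_pushdown` definitions standing in for the PR's code: splits are kept in order until their known `merged_row_count` values sum to at least `limit`; once a count is unknown (`None`), that split and every later split are kept, because the pruning can no longer prove the limit is covered. The reader or query engine still has to enforce the exact limit.

```rust
/// Hypothetical stand-in for a Paimon split. `merged_row_count` is `None`
/// when the row count after merging and deletions is not known up front.
#[derive(Debug, Clone, PartialEq)]
struct Split {
    id: u32,
    merged_row_count: Option<u64>,
}

/// Best-effort limit pushdown over splits, following the doc comment as written.
/// Returns a prefix of `splits` expected to cover at least `limit` rows; the
/// exact limit must still be enforced downstream.
fn apply_limit_pushdown(splits: Vec<Split>, limit: u64) -> Vec<Split> {
    let mut kept = Vec::new();
    let mut scanned: u64 = 0;
    let mut iter = splits.into_iter();
    while scanned < limit {
        let Some(split) = iter.next() else { break };
        match split.merged_row_count {
            Some(rows) => {
                // Known count: add it and keep the split.
                scanned = scanned.saturating_add(rows);
                kept.push(split);
            }
            None => {
                // Unknown count: the limit can no longer be proven,
                // so keep this split and all remaining ones.
                kept.push(split);
                kept.extend(iter.by_ref());
                break;
            }
        }
    }
    // Any splits still left in `iter` are pruned.
    kept
}
```

For example, with counts `Some(40), Some(70), Some(10)` and a limit of 100, only the first two splits are kept; with `Some(40), None, Some(70), Some(10)`, all four are kept. That second case is the one the question above is about.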

@luoyuxia force-pushed the data-fusion-support-limit branch from 5f9b620 to f0bfe40 on April 3, 2026 06:34
@luoyuxia force-pushed the data-fusion-support-limit branch from f0bfe40 to 55a3951 on April 3, 2026 06:37
@JingsongLi left a comment:

+1

@JingsongLi merged commit 1154bdb into apache:main on Apr 3, 2026
8 checks passed

2 participants