feat: support limit push down in datafusion#177
Conversation
5d5c757 to
7aa0858
Compare
| let stream = futures::stream::once(fut).try_flatten(); | ||
|
|
||
| // Apply limit if specified | ||
| let limited_stream: Pin<Box<dyn Stream<Item = DFResult<RecordBatch>> + Send>> = |
There was a problem hiding this comment.
If it's just a limit applied to this stream, will datafusion do it itself?
There was a problem hiding this comment.
Yes, DataFusion already does this on its own — it just stops early at the scan source in current pr. My original plan was to leave the limit pushdown to Paimon core for a follow-up PR, but we can also implement it directly in this PR if you prefer.
There was a problem hiding this comment.
You can continue to complete it, please note that there will be special logic for pushing down the limit in data evolution mode.
7aa0858 to
ae5353a
Compare
|
Please rebase master. |
ae5353a to
3d8e6e4
Compare
3d8e6e4 to
8605bf5
Compare
| // than waiting until splits are created. | ||
| // | ||
| // Reference: [AppendOnlyFileStoreScan.postFilterManifestEntries](https://github.com/apache/paimon/blob/release-1.3/paimon-core/src/main/java/org/apache/paimon/operation/AppendOnlyFileStoreScan.java#L91) | ||
| let limit_pushdown_at_manifest_level = self.table.schema.primary_keys().is_empty() |
There was a problem hiding this comment.
Maybe we don't need this. Just do limit in apply_limit_pushdown.
| /// by overlapping ranges. Two ranges overlap if `current.start <= previous_group_end`. | ||
| /// | ||
| /// Reference: [RangeHelper.mergeOverlappingRanges()](https://github.com/apache/paimon/blob/release-1.3/paimon-common/src/main/java/org/apache/paimon/utils/RangeHelper.java#L59) | ||
| fn merge_overlapping_row_id_ranges(files: &[DataFileMeta]) -> Vec<Vec<&DataFileMeta>> { |
There was a problem hiding this comment.
Just reuse group_by_overlapping_row_id?
There was a problem hiding this comment.
Thanks Not noticed it.
| /// | ||
| /// This does not guarantee an exact final row count. If any split's | ||
| /// `merged_row_count()` is `None` (for example because of unknown deletion | ||
| /// cardinality), all remaining splits are kept and the caller or query |
There was a problem hiding this comment.
all remaining splits are kept?
5f9b620 to
f0bfe40
Compare
f0bfe40 to
55a3951
Compare
Purpose
Linked issue: close #xxx
as a part of #173
Brief change log
Tests
API and Format
Documentation