[core] Optimize data-evolution scan with index #7305

Merged
JingsongLi merged 4 commits into apache:master from leaves12138:optimize-scan-with-index
Feb 26, 2026
Conversation

Contributor

@leaves12138 leaves12138 commented Feb 25, 2026

Purpose

Speed up scans that push down a large number of rowRanges.

Testing:
With 400,000 (40w) rowRanges pushed down, the scan took 3720 ms before this optimization and 400 ms+ after it (measured with the StarRocks engine).

Tests

API and Format

Documentation

Generative AI tooling

for (DataFileMeta file : dataSplit.dataFiles()) {
    fileRanges.add(file.nonNullRowIdRange());
}
Function<Split, List<IndexedSplit>> process =
Contributor

Introduce a static method for this?

return () -> indexedSplits;
}

private static int lowerBoundRangeTo(List<Range> ranges, long target) {
Contributor

You should use RowRangeIndex.

Contributor Author

OK

}

@VisibleForTesting
static final class RowRangeIndex {
Contributor

Extract this class to a separate class.

Contributor Author

OK

}

@VisibleForTesting
static boolean intersectsRowRanges(RowRangeIndex index, long fileFrom, long fileTo) {
Contributor

Move this method into RowRangeIndex.

Contributor Author

OK


// No matching indices found, skip this entry
return false;
private RowRangeIndex(long[] starts, long[] ends) {
Contributor

Pass a List<Range> ranges into here, and you should sort it here.

Contributor Author

OK

for (Range expected : rowRanges) {
if (Range.intersection(fileRowRange, expected) != null) {
return true;
private static int lowerBound(long[] sorted, long target) {
Contributor

Ditto.

Contributor Author

Done

}

@VisibleForTesting
static RowRangeIndex buildRowRangeIndex(List<Range> sortedAndMerged) {
Contributor

Move this into RowRangeIndex.

Contributor Author

OK
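
For readers following the thread, here is a minimal Java sketch of the kind of index the comments above describe: it takes a list of ranges, sorts and merges them in the constructor, and answers "does this file's row-id range hit any pushed-down range?" with a binary search instead of a loop over every range. The names `RowRangeIndexSketch`, the nested `Range` stand-in, and `intersects` are hypothetical and chosen for illustration; the class actually merged in this PR may differ in both shape and behavior. Bounds are assumed to be inclusive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch, not the class merged in this PR. */
final class RowRangeIndexSketch {

    /** Stand-in for a row range; both bounds are assumed inclusive. */
    static final class Range {
        final long from;
        final long to;

        Range(long from, long to) {
            this.from = from;
            this.to = to;
        }
    }

    private final long[] starts;
    private final long[] ends;

    /** Sorts by start and merges overlapping ranges, so both arrays end up ascending. */
    RowRangeIndexSketch(List<Range> ranges) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong((Range r) -> r.from));
        long[] s = new long[sorted.size()];
        long[] e = new long[sorted.size()];
        int n = 0;
        for (Range r : sorted) {
            if (n > 0 && r.from <= e[n - 1]) {
                // Overlaps the previously kept range: extend it.
                e[n - 1] = Math.max(e[n - 1], r.to);
            } else {
                s[n] = r.from;
                e[n] = r.to;
                n++;
            }
        }
        this.starts = Arrays.copyOf(s, n);
        this.ends = Arrays.copyOf(e, n);
    }

    /** True if [fileFrom, fileTo] overlaps at least one indexed range. */
    boolean intersects(long fileFrom, long fileTo) {
        // First range whose end is >= fileFrom; earlier ranges end too soon.
        int i = lowerBound(ends, fileFrom);
        return i < ends.length && starts[i] <= fileTo;
    }

    /** Index of the first element >= target, or sorted.length if there is none. */
    private static int lowerBound(long[] sorted, long target) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

Because the merged ranges are disjoint and ascending, one lookup per file costs O(log n) in the number of ranges, rather than O(n) for a loop that tests every range against every file. That is the likely source of the speed-up reported in the PR description, though the thread itself does not spell this out.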

- List<Split> splits = batchScan.withRowRanges(rowRanges).plan().splits();
- return wrapToIndexSplits(splits, rowRanges, scoreGetter);
+ List<Range> sortedPushDownRanges = Range.sortAndMergeOverlap(rowRanges, true);
+ List<Split> splits = batchScan.withRowRanges(sortedPushDownRanges).plan().splits();
Contributor

You can add a new method batchScan.withRowRanges(RowRangeIndex)

Contributor Author

Done

@leaves12138 leaves12138 force-pushed the optimize-scan-with-index branch from 2754909 to 7a87a0b on February 26, 2026 09:04
Contributor

@JingsongLi JingsongLi left a comment

+1

@JingsongLi JingsongLi merged commit e751b6c into apache:master Feb 26, 2026
11 of 13 checks passed