Conversation

@steFaiz steFaiz commented Dec 19, 2025

Purpose

This PR is a preliminary step for #6834, introducing a new access pattern for SstFileReader. Users can seek to a specified key and then iterate over the remaining records.
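The seek-then-scan access pattern described above can be illustrated with a minimal, self-contained sketch. The `SeekDemo` class and its sorted-map stand-in are hypothetical and only mirror the intended semantics (position at the first record whose key is equal to or greater than the target, then iterate); they are not the actual SstFileReader API.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for an SST file: a sorted key -> value map.
// seek(...) mirrors the intended semantics: position at the first record
// whose key is equal to or greater than the given key, after which the
// caller iterates over everything from there on.
public class SeekDemo {
    public static SortedMap<String, String> seek(TreeMap<String, String> sst, String key) {
        return sst.tailMap(key); // inclusive lower bound: first key >= key
    }

    public static void main(String[] args) {
        TreeMap<String, String> sst = new TreeMap<>();
        sst.put("a", "1");
        sst.put("c", "3");
        sst.put("e", "5");
        // Seeking to "b" lands on "c", the smallest key >= "b".
        System.out.println(seek(sst, "b")); // {c=3, e=5}
    }
}
```

The real reader works on byte[] keys with a configurable comparator; String keys are used here only to keep the sketch short.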

Tests

Please see org.apache.paimon.sst.SstFileTest

API and Format

No API modification

Documentation

No related documentation

@steFaiz steFaiz changed the title [core] SstFileReader supports range query. [core] SstFileReader supports range query Dec 19, 2025
@steFaiz steFaiz closed this Dec 19, 2025
@steFaiz steFaiz reopened this Dec 19, 2025

steFaiz commented Dec 19, 2025

Unrelated Maven errors; will try reopening it later.

@JingsongLi JingsongLi closed this Dec 19, 2025
@JingsongLi JingsongLi reopened this Dec 19, 2025

steFaiz commented Dec 22, 2025

@JingsongLi PTAL when you have some time! My concern: do we need to split lookup and range query into two different classes, e.g. SstFileLookupReader and SstFileScanReader, so that we do not need to detach the index iterator for each lookup?

private final Comparator<MemorySlice> comparator;
private final Path filePath;
private final BlockCache blockCache;
private final BlockIterator indexBlockIterator;
Contributor


Can you introduce a BlockMeta for this? Just store data, recordCount, comparator.
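A minimal sketch of the suggested BlockMeta value class, holding just the three pieces the comment names. The field types are assumptions: Paimon's actual MemorySlice-based types are replaced with plain byte[] and a generic comparator here.

```java
import java.util.Comparator;

// Hypothetical BlockMeta: an immutable holder for a block's data,
// its record count, and the key comparator used within the block.
public class BlockMeta {
    private final byte[] data;
    private final int recordCount;
    private final Comparator<byte[]> comparator;

    public BlockMeta(byte[] data, int recordCount, Comparator<byte[]> comparator) {
        this.data = data;
        this.recordCount = recordCount;
        this.comparator = comparator;
    }

    public byte[] data() { return data; }
    public int recordCount() { return recordCount; }
    public Comparator<byte[]> comparator() { return comparator; }
}
```

Grouping these three fields lets a block reader be constructed from one object instead of threading the comparator and counts through every call site.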

/**
 * Seek to the position of the record whose key is exactly equal to or greater than the
 * specified key.
 */
public void seekTo(byte[] key) throws IOException {
Contributor


Can you create a SstFileIterator for this? It can be:

class SstFileIterator {
    void seekTo(byte[] key);
    BlockIterator readBatch();
}
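To make the suggested shape concrete, here is a hypothetical in-memory mock of such an iterator. String keys and a single all-in-one batch stand in for the real byte[] keys and per-block batches; only the seekTo/readBatch contract from the sketch above is modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory mock of the suggested SstFileIterator shape.
// The real class would read blocks from the file; here a sorted map
// stands in for the file and one batch covers all remaining records.
public class MockSstFileIterator {
    private final TreeMap<String, String> records;
    private SortedMap<String, String> cursor;

    public MockSstFileIterator(TreeMap<String, String> records) {
        this.records = records;
        this.cursor = records;
    }

    // Position the cursor at the first record whose key is >= key.
    public void seekTo(String key) {
        this.cursor = records.tailMap(key);
    }

    // Return the next batch of keys; in this mock, everything remaining.
    public List<String> readBatch() {
        List<String> batch = new ArrayList<>(cursor.keySet());
        cursor = new TreeMap<>(); // exhausted after one batch in this mock
        return batch;
    }
}
```

Keeping seek state inside the iterator (rather than the reader) is what lets a range scan hold its position across batches without touching the reader's index iterator.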


steFaiz commented Dec 23, 2025

Maven repository errors; will reopen this PR to rerun the tests later.

@JingsongLi

+1

@JingsongLi JingsongLi merged commit dccbb57 into apache:master Dec 23, 2025
23 checks passed
jerry-024 added a commit to jerry-024/paimon that referenced this pull request Dec 23, 2025
* upstream/master: (30 commits)
  [core] Fix data evolution mode with limit manifest push down
  [core] optimize read append table with limit (apache#6848)
  [test] Fix unstable Data Evolution test
  Bump org.apache.logging.log4j:log4j-core from 2.17.1 to 2.25.3 (apache#6845)
  [core] SstFileReader supports range query (apache#6842)
  [hotfix] Remove useless roadmap
  [core] Refactor Index Reader in BlockReader (apache#6865)
  [python] Support read blob row by offsets in with_shard feature (apache#6863)
  [docs] Fixing documentation about lookup change log producers (apache#6860)
  [flink] Fix that action/procedure cannot remove unexisting files from manifests when dv enabled. (apache#6854)
  [spark] Fix SparkCatalog converts catalog option keys to lower case (apache#6708)
  [spark] Adding support for Iceberg compatibility options to be passed as table properties with dataframe APIs (apache#6803)
  [spark] Refactor metadata only delete (apache#6852)
  [spark] Refactor spark v2 DELETE (apache#6851)
  [core] Minor refactor the maintenance order in TableCommitImpl (apache#6830)
  [spark] Enable data-evolution table compact (apache#6839)
  [spark] Introduce source.split.target-size-with-column-pruning (apache#6837)
  [test] Fix failed test testWriteFixedBucketWithDifferentBucketNumber (apache#6843)
  [python] Refactor read and write options (apache#6808)
  [spark] Optimize MERGE INTO self-merge updates on dataEvolution table (apache#6827)
  ...
