Conversation

@steFaiz steFaiz commented Dec 19, 2025

Purpose

This PR is a preliminary step for #6834, introducing a new access pattern for SstFileReader. Users can seek to a specified key and then iterate over the remaining records.
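The seek-then-scan access pattern described above can be illustrated with a minimal, self-contained sketch. The `SeekDemo` class and its sorted-map stand-in are hypothetical and only mirror the intended semantics (position at the first record whose key is equal to or greater than the target, then iterate); they are not the actual SstFileReader API.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for an SST file: a sorted key -> value map.
// seek(...) mirrors the intended semantics: position at the first record
// whose key is equal to or greater than the given key, after which the
// caller iterates over everything from there on.
public class SeekDemo {
    public static SortedMap<String, String> seek(TreeMap<String, String> sst, String key) {
        return sst.tailMap(key); // inclusive lower bound: first key >= key
    }

    public static void main(String[] args) {
        TreeMap<String, String> sst = new TreeMap<>();
        sst.put("a", "1");
        sst.put("c", "3");
        sst.put("e", "5");
        // Seeking to "b" lands on "c", the smallest key >= "b".
        System.out.println(seek(sst, "b")); // {c=3, e=5}
    }
}
```

The real reader works on byte[] keys with a configurable comparator; String keys are used here only to keep the sketch short.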

Tests

Please see org.apache.paimon.sst.SstFileTest

API and Format

No API modification

Documentation

No related documentation

@steFaiz steFaiz changed the title [core] SstFileReader supports range query. [core] SstFileReader supports range query Dec 19, 2025
@steFaiz steFaiz closed this Dec 19, 2025
@steFaiz steFaiz reopened this Dec 19, 2025

steFaiz commented Dec 19, 2025

Unrelated Maven errors; will try reopening it later.

@JingsongLi JingsongLi closed this Dec 19, 2025
@JingsongLi JingsongLi reopened this Dec 19, 2025

steFaiz commented Dec 22, 2025

@JingsongLi PTAL when you have some time! My concern: do we need to split lookup and range query into two different classes, e.g. SstFileLookupReader and SstFileScanReader, so that we do not need to detach the index iterator for each lookup?

private final Comparator<MemorySlice> comparator;
private final Path filePath;
private final BlockCache blockCache;
private final BlockIterator indexBlockIterator;
Contributor


Can you introduce a BlockMeta for this? Just store data, recordCount, comparator.
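A minimal sketch of the suggested BlockMeta value class, holding just the three pieces the comment names. The field types are assumptions: Paimon's actual MemorySlice-based types are replaced with plain byte[] and a generic comparator here.

```java
import java.util.Comparator;

// Hypothetical BlockMeta: an immutable holder for a block's data,
// its record count, and the key comparator used within the block.
public class BlockMeta {
    private final byte[] data;
    private final int recordCount;
    private final Comparator<byte[]> comparator;

    public BlockMeta(byte[] data, int recordCount, Comparator<byte[]> comparator) {
        this.data = data;
        this.recordCount = recordCount;
        this.comparator = comparator;
    }

    public byte[] data() { return data; }
    public int recordCount() { return recordCount; }
    public Comparator<byte[]> comparator() { return comparator; }
}
```

Grouping these three fields lets a block reader be constructed from one object instead of threading the comparator and counts through every call site.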

/**
 * Seek to the position of the record whose key is exactly equal to or greater than the
 * specified key.
 */
public void seekTo(byte[] key) throws IOException {
Contributor


Can you create a SstFileIterator for this? It can be:

class SstFileIterator {
    void seekTo(byte[] key);
    BlockIterator readBatch();
}
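To make the suggested shape concrete, here is a hypothetical in-memory mock of such an iterator. String keys and a single all-in-one batch stand in for the real byte[] keys and per-block batches; only the seekTo/readBatch contract from the sketch above is modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory mock of the suggested SstFileIterator shape.
// The real class would read blocks from the file; here a sorted map
// stands in for the file and one batch covers all remaining records.
public class MockSstFileIterator {
    private final TreeMap<String, String> records;
    private SortedMap<String, String> cursor;

    public MockSstFileIterator(TreeMap<String, String> records) {
        this.records = records;
        this.cursor = records;
    }

    // Position the cursor at the first record whose key is >= key.
    public void seekTo(String key) {
        this.cursor = records.tailMap(key);
    }

    // Return the next batch of keys; in this mock, everything remaining.
    public List<String> readBatch() {
        List<String> batch = new ArrayList<>(cursor.keySet());
        cursor = new TreeMap<>(); // exhausted after one batch in this mock
        return batch;
    }
}
```

Keeping seek state inside the iterator (rather than the reader) is what lets a range scan hold its position across batches without touching the reader's index iterator.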


steFaiz commented Dec 23, 2025

Maven repository errors; will reopen this PR to rerun the tests later.

@JingsongLi

+1

@JingsongLi JingsongLi merged commit dccbb57 into apache:master Dec 23, 2025
23 checks passed
jerry-024 added a commit to jerry-024/paimon that referenced this pull request Dec 23, 2025
* upstream/master: (30 commits)
  [core] Fix data evolution mode with limit manifest push down
  [core] optimize read append table with limit (apache#6848)
  [test] Fix unstable Data Evolution test
  Bump org.apache.logging.log4j:log4j-core from 2.17.1 to 2.25.3 (apache#6845)
  [core] SstFileReader supports range query (apache#6842)
  [hotfix] Remove useless roadmap
  [core] Refactor Index Reader in BlockReader (apache#6865)
  [python] Support read blob row by offsets in with_shard feature (apache#6863)
  [docs] Fixing documentation about lookup change log producers (apache#6860)
  [flink] Fix that action/procedure cannot remove unexisting files from manifests when dv enabled. (apache#6854)
  [spark] Fix SparkCatalog converts catalog option keys to lower case (apache#6708)
  [spark] Adding support for Iceberg compatibility options to be passed as table properties with dataframe APIs (apache#6803)
  [spark] Refactor metadata only delete (apache#6852)
  [spark] Refactor spark v2 DELETE (apache#6851)
  [core] Minor refactor the maintenance order in TableCommitImpl (apache#6830)
  [spark] Enable data-evolution table compact (apache#6839)
  [spark] Introduce source.split.target-size-with-column-pruning (apache#6837)
  [test] Fix failed test testWriteFixedBucketWithDifferentBucketNumber (apache#6843)
  [python] Refactor read and write options (apache#6808)
  [spark] Optimize MERGE INTO self-merge updates on dataEvolution table (apache#6827)
  ...
