[core] SstFileReader supports range query #6842
Conversation
Unrelated Maven errors, will try to reopen it later.
@JingsongLi PTAL if you have some time! My concern is whether we need to split lookup and range query into two different classes, e.g. SstFileLookupReader and SstFileScanReader, so that we do not need to detach the index iterator for each lookup.
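To make the proposed split concrete, here is a minimal sketch. Only the two class names come from the comment above; the signatures, the Closeable base, and the readBatch contract are assumptions, not the actual Paimon API:

```java
import java.io.Closeable;
import java.io.IOException;

import javax.annotation.Nullable;

// Sketch only: signatures are assumptions, not the actual Paimon API.
// BlockIterator is the existing Paimon block iterator type.
interface SstFileLookupReader extends Closeable {
    /** Point lookup; can keep its index iterator attached between calls. */
    @Nullable
    byte[] lookup(byte[] key) throws IOException;
}

interface SstFileScanReader extends Closeable {
    /** Range scan: seek once, then drain the remaining records. */
    void seekTo(byte[] key) throws IOException;

    /** Next batch of records, or null when the file is exhausted (assumed contract). */
    @Nullable
    BlockIterator readBatch() throws IOException;
}
```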
```java
private final Comparator<MemorySlice> comparator;
private final Path filePath;
private final BlockCache blockCache;
private final BlockIterator indexBlockIterator;
```
Can you introduce a BlockMeta for this? Just store data, recordCount, comparator.
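For illustration, a minimal sketch of such a BlockMeta. The three fields are the ones named in the comment; MemorySlice and the comparator type are taken from the diff above, while the accessor names are placeholders:

```java
import java.util.Comparator;

// Sketch of the suggested BlockMeta, holding only the three fields named in the
// comment above. MemorySlice is the existing Paimon slice type from the diff.
public class BlockMeta {

    private final MemorySlice data;                    // raw bytes of the block
    private final int recordCount;                     // number of records in the block
    private final Comparator<MemorySlice> comparator;  // key ordering within the block

    public BlockMeta(MemorySlice data, int recordCount, Comparator<MemorySlice> comparator) {
        this.data = data;
        this.recordCount = recordCount;
        this.comparator = comparator;
    }

    public MemorySlice data() {
        return data;
    }

    public int recordCount() {
        return recordCount;
    }

    public Comparator<MemorySlice> comparator() {
        return comparator;
    }
}
```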
```java
/**
 * Seek to the position of the record whose key is exactly equal to or greater than the
 * specified key.
 */
public void seekTo(byte[] key) throws IOException {
```
Can you create a SstFileIterator for this? It can be:

```java
abstract class SstFileIterator {
    abstract void seekTo(byte[] key);
    abstract BlockIterator readBatch();
}
```
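For example, a caller might use the suggested iterator like this. The createIterator() factory, the null-at-EOF contract of readBatch(), and BlockIterator's hasNext()/next() methods are assumptions for the sake of the example:

```java
// Hypothetical usage of the suggested SstFileIterator.
SstFileIterator iterator = reader.createIterator();  // assumed factory method
iterator.seekTo(startKey);                           // first record with key >= startKey
BlockIterator batch;
while ((batch = iterator.readBatch()) != null) {     // assumed: null signals EOF
    while (batch.hasNext()) {
        process(batch.next());                       // placeholder for caller logic
    }
}
```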
Maven repository errors, will reopen this PR to rerun tests later.
Force-pushed from 52ad9e8 to bef2581.
+1
* upstream/master: (30 commits)
  [core] Fix data evolution mode with limit manifest push down
  [core] optimize read append table with limit (apache#6848)
  [test] Fix unstable Data Evolution test
  Bump org.apache.logging.log4j:log4j-core from 2.17.1 to 2.25.3 (apache#6845)
  [core] SstFileReader supports range query (apache#6842)
  [hotfix] Remove useless roadmap
  [core] Refactor Index Reader in BlockReader (apache#6865)
  [python] Support read blob row by offsets in with_shard feature (apache#6863)
  [docs] Fixing documentation about lookup change log producers (apache#6860)
  [flink] Fix that action/procedure cannot remove unexisting files from manifests when dv enabled. (apache#6854)
  [spark] Fix SparkCatalog converts catalog option keys to lower case (apache#6708)
  [spark] Adding support for Iceberg compatibility options to be passed as table properties with dataframe APIs (apache#6803)
  [spark] Refactor metadata only delete (apache#6852)
  [spark] Refactor spark v2 DELETE (apache#6851)
  [core] Minor refactor the maintenance order in TableCommitImpl (apache#6830)
  [spark] Enable data-evolution table compact (apache#6839)
  [spark] Introduce source.split.target-size-with-column-pruning (apache#6837)
  [test] Fix failed test testWriteFixedBucketWithDifferentBucketNumber (apache#6843)
  [python] Refactor read and write options (apache#6808)
  [spark] Optimize MERGE INTO self-merge updates on dataEvolution table (apache#6827)
  ...
Purpose
This PR is a pre-step for #6834, introducing a new access pattern for SstFileReader: users can seek to a specified key and then iterate over the remaining records.
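A hedged sketch of a range query over [startKey, endKey) built on this pattern: only seekTo(byte[]) appears in this PR's diff; readBatch(), the record accessor, and the comparator usage are illustrative assumptions:

```java
// Illustrative range query [startKey, endKey); everything except seekTo(byte[])
// is an assumption for the sake of the example.
reader.seekTo(startKey);                             // first record with key >= startKey
BlockIterator batch;
outer:
while ((batch = reader.readBatch()) != null) {       // assumed: null signals EOF
    while (batch.hasNext()) {
        MemorySlice key = batch.nextKey();           // assumed record accessor
        if (comparator.compare(key, endKey) >= 0) {
            break outer;                             // past the requested range
        }
        // consume the matching record's value here
    }
}
```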
Tests
Please see org.apache.paimon.sst.SstFileTest
API and Format
No API modification.
Documentation
No related documentation