
[SPARK-55997][SS] Set upper bound to prefixScan in RocksDB state store provider #54816

Closed
HeartSaVioR wants to merge 1 commit into apache:master from HeartSaVioR:SPARK-55997


@HeartSaVioR
Contributor

What changes were proposed in this pull request?

This PR proposes to set an upper bound on prefixScan (including the iterator with column family) in the RocksDB state store provider. The bound tells RocksDB to stop searching for the next valid key once it has reached the boundary, instead of continuing until it finds a valid key that lies beyond it.

Why are the changes needed?

For a prefix scan (and an iterator with column family) in the RocksDB state store provider, we create an iterator, seek to the first valid key for the prefix (or vcf), and call next until there are no more keys or the current key is out of bounds.

When next is called, RocksDB has to find the next valid key from the current position. The issue is the word "valid". Let's say there is a column family vcf1 which is set to perform prefix scan, and the prefixes of its keys are 'a', 'b', 'c' (for simplicity). After we remove all keys for the prefix 'b', a prefix scan of the prefix 'a' has to go through all tombstones for 'b' before it finally finds a valid key with the prefix 'c', and only then can the caller see that the key is out of bounds. This can take a lot of time if the number of keys for prefix 'b' was large.
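
To make the cost concrete, here is a toy model of the scenario above. It is only a sketch: `prefix_scan` and the `(key, is_tombstone)` layout are made up for illustration and do not reflect how RocksDB stores or iterates data. It just counts how many entries the iterator has to examine with and without an exclusive upper bound.

```python
from bisect import bisect_left


def prefix_scan(entries, prefix, upper_bound=None):
    """Toy prefix scan over sorted (key, is_tombstone) entries.

    Returns (live_keys, entries_examined). This imitates the cost described
    in the PR text only; it is not RocksDB's actual implementation.
    """
    keys = [k for k, _ in entries]
    pos = bisect_left(keys, prefix)      # seek to the first key >= prefix
    live, examined = [], 0
    while pos < len(entries):
        key, is_tombstone = entries[pos]
        if upper_bound is not None and key >= upper_bound:
            break                        # bound reached: stop without looking further
        examined += 1
        pos += 1
        if is_tombstone:
            continue                     # deleted entry: keep searching for a valid key
        if not key.startswith(prefix):
            break                        # caller sees an out-of-prefix key and stops
        live.append(key)
    return live, examined


entries = sorted(
    [(b"a%03d" % i, False) for i in range(3)]
    + [(b"b%05d" % i, True) for i in range(10_000)]   # every 'b' key was deleted
    + [(b"c%03d" % i, False) for i in range(3)]
)

unbounded = prefix_scan(entries, b"a")
bounded = prefix_scan(entries, b"a", upper_bound=b"b")
print(unbounded[1], bounded[1])
```

Both calls return the same three live keys for 'a'. The unbounded scan additionally walks every 'b' tombstone and the first 'c' key before it can stop, while the bounded scan stops as soon as it reaches the bound.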

Since we use virtual column families (vcf) and a vcf is identified by a key prefix, we have a similar problem "across" vcfs. Suppose there are two virtual column families vcf1 and vcf2, where vcf1 is set to perform prefix scan while vcf2 is used for range scan (with keys removed sequentially as the watermark/timestamp advances). If there is a prefix scan for the last prefix of vcf1, it has to go through the tombstones for vcf2 before it finally finds a valid key in vcf2.
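
For illustration, one common way to derive an exclusive upper bound from a byte prefix is to increment its last byte that is not 0xFF and drop everything after it. The sketch below is an assumption, not necessarily what this patch does: `exclusive_upper_bound` is a hypothetical helper, and the 2-byte vcf id in front of the key is an assumed layout.

```python
def exclusive_upper_bound(prefix: bytes):
    """Smallest byte string greater than every key that starts with `prefix`.

    Returns None when no such bound exists (empty prefix or all bytes 0xFF).
    Hypothetical helper for illustration only.
    """
    buf = bytearray(prefix)
    while buf and buf[-1] == 0xFF:
        buf.pop()                 # 0xFF cannot be incremented; carry to the left
    if not buf:
        return None
    buf[-1] += 1
    return bytes(buf)


# Assumed layout: a 2-byte vcf id followed by the user key prefix.
VCF1, VCF2 = b"\x00\x01", b"\x00\x02"

bound_mid = exclusive_upper_bound(VCF1 + b"a")       # prefix 'a' inside vcf1
bound_last = exclusive_upper_bound(VCF1 + b"\xff")   # last possible 1-byte prefix of vcf1
print(bound_mid, bound_last)
```

Under these assumptions, the bound for the last prefix of vcf1 carries into the vcf id and lands exactly on the first possible key of vcf2, so a bounded scan never has to enter vcf2's tombstones.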

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New tests at the RocksDB class level. Existing tests for prefix scan and iterator with column family should verify the change end to end.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: claude-4.6-opus

Contributor

@anishshri-db anishshri-db left a comment


lgtm

@HeartSaVioR
Contributor Author

Thanks! Merging to master.

