[python] Support snapshot_id and tag_name in Ray read_paimon API #7802
Merged
Lets callers time-travel a Ray scan to a specific snapshot id or a named tag through the top-level ``read_paimon`` facade, mirroring Java ``CoreOptions.SCAN_SNAPSHOT_ID`` / ``SCAN_TAG_NAME``. The two arguments are mutually exclusive at both the public entry point and the underlying ``CatalogSplitProvider``. Also fills in the ``scan.snapshot-id`` plumbing that was missing on the Python side: a new ``CoreOptions.SCAN_SNAPSHOT_ID`` config, a ``TimeTravelUtil`` branch resolving it via ``SnapshotManager``, and a matching path in ``TableScan._create_file_scanner``. The two existing callers of ``TimeTravelUtil.try_travel_to_snapshot`` (full-text scan, vector search scan) are updated to pass the snapshot manager. Tests: TimeTravelUtil unit tests, CatalogSplitProvider time-travel unit tests, and Ray ``read_paimon`` integration tests covering both snapshot-id and tag-name paths plus the mutual-exclusion guard.
Purpose
Add time-travel support to the top-level `pypaimon.ray.read_paimon` API, so a Ray scan can read a specific snapshot id or a named tag.
Why
Before this PR, `read_paimon` always read the latest snapshot; there was no way to reproduce a scan against a fixed point in history through the recommended public facade. Internally pypaimon already understood `scan.tag-name` (added with #7243), but the matching `scan.snapshot-id` plumbing was missing on the Python side even though the option exists in Java's `CoreOptions.SCAN_SNAPSHOT_ID`.
What changed
Public API (`pypaimon/ray/ray_paimon.py`):
- `read_paimon(..., snapshot_id=None, tag_name=None)` (`ValueError` if both set)
Backing plumbing:
- `pypaimon/read/datasource/split_provider.py`: `CatalogSplitProvider` takes the two new fields and applies them via `table.copy({"scan.snapshot-id": ..., "scan.tag-name": ...})` in `_ensure_table`. It repeats the mutual-exclusion guard as a defense-in-depth layer.
- `pypaimon/common/options/core_options.py`: new `SCAN_SNAPSHOT_ID` config (long type, no default), aligned with Java's `CoreOptions.SCAN_SNAPSHOT_ID`.
- `pypaimon/snapshot/time_travel_util.py`: `try_travel_to_snapshot` now accepts a `snapshot_manager` and resolves `scan.snapshot-id` against it.
- `pypaimon/read/table_scan.py`: `_create_file_scanner` routes `SCAN_SNAPSHOT_ID` through `snapshot_manager.get_snapshot_by_id` + `manifest_list_manager.read_all`, mirroring the existing `SCAN_TAG_NAME` branch.
- `TimeTravelUtil` callers (full-text scan, vector search scan) are updated to pass the snapshot manager.
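The mutual-exclusion guard and the option mapping described above can be sketched as follows. This is a minimal, standalone illustration and not the code from this PR: `time_travel_options` is a hypothetical helper, and the error message and the choice to stringify the snapshot id are assumptions. Only the two option keys come from the description.

```python
from typing import Dict, Optional

# Option keys named in this PR description.
SCAN_SNAPSHOT_ID = "scan.snapshot-id"
SCAN_TAG_NAME = "scan.tag-name"


def time_travel_options(snapshot_id: Optional[int] = None,
                        tag_name: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: build the dynamic options for a time-travel scan.

    Raises ValueError when both arguments are set, mirroring the guard the PR
    describes at the public entry point and in the split provider.
    """
    if snapshot_id is not None and tag_name is not None:
        raise ValueError("snapshot_id and tag_name are mutually exclusive")

    options: Dict[str, str] = {}
    if snapshot_id is not None:
        # Assumption: option values are carried as strings.
        options[SCAN_SNAPSHOT_ID] = str(snapshot_id)
    if tag_name is not None:
        options[SCAN_TAG_NAME] = tag_name
    return options
```

In this sketch, a dictionary like the one returned here stands in for what would be handed to `table.copy(...)`; an empty dictionary means the scan reads the latest snapshot.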
Docs (`docs/content/pypaimon/ray-data.md`): added a `Time travel` example block and parameter docs.
Tests
- `time_travel_util_test.py` (new, 6 cases): `SCAN_KEYS` contents, snapshot-id resolution, missing-id raise, missing-snapshot-manager raise, mutual exclusion at the util layer.
- `split_provider_test.py` (+3 cases): provider-level snapshot-id / tag-name time travel + ctor mutual-exclusion guard.
- `ray_integration_test.py` (+3 cases): `read_paimon` end-to-end with `snapshot_id` / `tag_name`, plus the public mutual-exclusion guard.
All read-path regression tests still pass (57/57 across reader-pk, reader-append-only, projection, time-travel, ray integration).
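The util-layer behaviours these tests name (snapshot-id resolution, missing id, missing snapshot manager, mutual exclusion) can be sketched with a stand-in snapshot manager. Everything here is illustrative: `FakeSnapshotManager` and `resolve_snapshot` are hypothetical names, the exception types and messages are assumptions, and tag resolution is left out. Only `get_snapshot_by_id` and the option keys are taken from the description.

```python
from typing import Dict, Optional


class FakeSnapshotManager:
    """Stand-in for SnapshotManager: maps snapshot ids to snapshot objects."""

    def __init__(self, snapshots: Dict[int, str]):
        self._snapshots = snapshots

    def get_snapshot_by_id(self, snapshot_id: int) -> Optional[str]:
        # Returns None when the id is unknown (an assumption of this sketch).
        return self._snapshots.get(snapshot_id)


def resolve_snapshot(options: Dict[str, str],
                     snapshot_manager: Optional[FakeSnapshotManager] = None
                     ) -> Optional[str]:
    """Hypothetical resolver for the scan.snapshot-id option."""
    snapshot_id = options.get("scan.snapshot-id")
    tag_name = options.get("scan.tag-name")

    if snapshot_id is not None and tag_name is not None:
        raise ValueError("scan.snapshot-id and scan.tag-name are mutually exclusive")
    if snapshot_id is None:
        # No snapshot id requested; tag handling is omitted from this sketch.
        return None
    if snapshot_manager is None:
        raise ValueError("scan.snapshot-id requires a snapshot manager")

    snapshot = snapshot_manager.get_snapshot_by_id(int(snapshot_id))
    if snapshot is None:
        raise ValueError(f"snapshot {snapshot_id} does not exist")
    return snapshot
```

Passing the snapshot manager explicitly, rather than looking it up inside the resolver, matches the description's note that existing callers were updated to supply it.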
Out of scope
`scan.mode=from-snapshot` etc.) is unchanged.
`CoreOptions.SCAN_SNAPSHOT_ID`; no Java changes needed.