Description
Search before asking
- I searched in the issues and found nothing similar.
Motivation
Fluss currently does not support full snapshot scans of the latest data on Primary Key tables in batch execution mode. This capability is important in many scenarios, such as OLAP queries and ad-hoc data inspection.
The existing LimitBatchScanner cannot scan the full snapshot, and the KvSnapshotBatchScanner cannot serve ad-hoc queries against the current data. Neither meets these requirements.
Solution
This work can be split into two subtasks:
- Fluss: support ad-hoc full snapshot scans of Primary Key tables
- Flink integration: support non-limited scans of Primary Key tables in batch execution mode
Anything else?
The newly introduced BatchScanner should fetch data in streaming mode, i.e. over multiple fetch RPCs that each return a bounded chunk, instead of returning the whole dataset in a single RPC. This avoids transferring a large dataset in one response.
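The streaming fetch requirement can be illustrated with a minimal, self-contained sketch. It assumes a hypothetical server that pins a point-in-time copy of a Primary Key table's rows and returns at most `maxRowsPerFetch` rows per simulated RPC. The client keeps fetching until the server reports `hasMore == false`. `FetchResponse`, `SnapshotScanServer`, and `StreamingBatchScanner` are illustrative names only, not Fluss APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical response of one fetch RPC: a bounded chunk of rows plus a flag
// telling the client whether more rows remain. Not a real Fluss class.
final class FetchResponse {
    final List<String> rows;
    final boolean hasMore;

    FetchResponse(List<String> rows, boolean hasMore) {
        this.rows = rows;
        this.hasMore = hasMore;
    }
}

// Hypothetical server side: holds a point-in-time copy of the table and
// serves it in chunks of at most maxRowsPerFetch rows per request.
final class SnapshotScanServer {
    private final List<String> snapshot;
    private final int maxRowsPerFetch;

    SnapshotScanServer(List<String> currentRows, int maxRowsPerFetch) {
        if (maxRowsPerFetch <= 0) {
            throw new IllegalArgumentException("maxRowsPerFetch must be positive");
        }
        // Copy so that later writes to the source list do not affect this scan.
        this.snapshot = Collections.unmodifiableList(new ArrayList<>(currentRows));
        this.maxRowsPerFetch = maxRowsPerFetch;
    }

    // One simulated RPC: return up to maxRowsPerFetch rows starting at offset.
    FetchResponse fetch(int offset) {
        int end = Math.min(offset + maxRowsPerFetch, snapshot.size());
        List<String> chunk = new ArrayList<>(snapshot.subList(offset, end));
        return new FetchResponse(chunk, end < snapshot.size());
    }
}

// Hypothetical client-side batch scanner: issues one fetch RPC per poll until
// the server reports that the snapshot is exhausted.
final class StreamingBatchScanner {
    private final SnapshotScanServer server;
    private int offset = 0;
    private boolean finished = false;
    int rpcCount = 0; // number of simulated RPCs issued so far

    StreamingBatchScanner(SnapshotScanServer server) {
        this.server = server;
    }

    // Returns the next chunk, or an empty list once the scan is complete.
    List<String> pollBatch() {
        if (finished) {
            return Collections.emptyList();
        }
        FetchResponse resp = server.fetch(offset);
        rpcCount++;
        offset += resp.rows.size();
        finished = !resp.hasMore;
        return resp.rows;
    }

    boolean isFinished() {
        return finished;
    }
}
```

With 10 rows and `maxRowsPerFetch = 4`, the client drains the snapshot in three RPCs (4, 4 and 2 rows), so no single response carries the whole table. The offset cursor here is only a simplification; a real implementation might instead keep a server-side scanner or iterator per scan and let the client fetch by scanner ID.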
Willingness to contribute
- I'm willing to submit a PR!
caozhen1937