[Umbrella] Support full scan in batch mode for PrimaryKey Table #1876

@platinumhamburg

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Currently, Fluss does not support full snapshot scans of the latest data on Primary-Key tables in batch execution mode. This capability is important in many scenarios, such as OLAP queries and ad-hoc data inspection.
Neither of the existing scanners meets this requirement: LimitBatchScanner does not support full snapshot inspection, and KvSnapshotBatchScanner does not support ad-hoc queries on the current dataset snapshot.

Solution

The work can be split into two subtasks:

  • Fluss: support ad-hoc full snapshot scanning for Primary Key Tables
  • Flink integration: support non-limited scanning for Primary Key Tables in batch execution mode

Anything else?

The newly introduced BatchScanner should be implemented in a streaming fetch mode, so that a large dataset is returned across multiple bounded fetches rather than scanned in a single RPC.
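
To make the streaming fetch idea concrete, here is a minimal, language-agnostic sketch written in Python. It is not Fluss code: `FakeTabletServer`, `StreamingBatchScanner`, `fetch`, `poll_batch`, the integer cursor, and `max_rows_per_fetch` are all hypothetical names chosen for illustration. It only shows the shape of the protocol, assuming the server can hold one consistent snapshot and hand it out in bounded chunks.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Row = Tuple[str, str]  # (primary key, value)


class FakeTabletServer:
    """Hypothetical stand-in for a server holding one bucket of a PK table."""

    def __init__(self, rows: Dict[str, str]) -> None:
        # Freeze a sorted copy so every fetch reads the same snapshot.
        self._snapshot: List[Row] = sorted(rows.items())
        self.rpc_count = 0

    def fetch(self, cursor: int, max_rows: int) -> Tuple[List[Row], Optional[int]]:
        """One bounded RPC: at most max_rows rows plus the next cursor.

        The next cursor is None once the snapshot is exhausted.
        """
        self.rpc_count += 1
        chunk = self._snapshot[cursor:cursor + max_rows]
        next_cursor = cursor + len(chunk)
        if next_cursor >= len(self._snapshot):
            return chunk, None
        return chunk, next_cursor


class StreamingBatchScanner:
    """Hypothetical client-side scanner that pulls one bounded chunk per call."""

    def __init__(self, server: FakeTabletServer, max_rows_per_fetch: int = 2) -> None:
        if max_rows_per_fetch < 1:
            raise ValueError("max_rows_per_fetch must be at least 1")
        self._server = server
        self._max_rows = max_rows_per_fetch
        self._cursor: Optional[int] = 0

    def poll_batch(self) -> Optional[List[Row]]:
        """Return the next chunk, or None when the scan is finished."""
        if self._cursor is None:
            return None
        rows, self._cursor = self._server.fetch(self._cursor, self._max_rows)
        return rows


def scan_all(scanner: StreamingBatchScanner) -> Iterator[Row]:
    """Drain the scanner, yielding rows chunk by chunk."""
    while (batch := scanner.poll_batch()) is not None:
        yield from batch


server = FakeTabletServer({"k3": "c", "k1": "a", "k2": "b", "k5": "e", "k4": "d"})
rows = list(scan_all(StreamingBatchScanner(server, max_rows_per_fetch=2)))
print(rows)              # five rows in primary-key order
print(server.rpc_count)  # 3 bounded fetches instead of one large RPC
```

The point of the design is that the size of any single response is capped by the fetch limit, not by the table size, so memory use and RPC latency stay bounded while the caller still sees the full snapshot.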

Willingness to contribute

  • I'm willing to submit a PR!
