[client] Implement adaptive fetch rate control for LogScanner to reduce overhead on partitioned tables #3006

@swuferhong

Description

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Consider a partitioned table with hourly partitions, 256 buckets per partition, and 24 hours of retained data. A streaming LogScanner subscribes to every partition, so it tracks 256 × 24 = 6144 buckets, yet only the 256 buckets of the latest partition actively receive data. The current LogFetcher implementation treats all buckets equally and sends fetch requests to every bucket in each round, whether or not the bucket has new data. As a result, CPU time, network bandwidth, and fetch requests are wasted on inactive partitions.
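
To make the overhead concrete, here is a minimal, language-agnostic sketch in Python of one way fetch-rate adaptation could work: a bucket that returns an empty fetch is skipped for an exponentially growing number of rounds, and the back-off resets as soon as it returns records. This is not the Fluss client API and not a proposed design; `BucketFetchState`, `AdaptiveFetchPlanner`, `simulate`, and the `max_skip_rounds` parameter are all hypothetical names chosen for illustration, and the assumption that the latest partition returns data on every round is a simplification.

```python
from dataclasses import dataclass


@dataclass
class BucketFetchState:
    """Hypothetical per-bucket back-off state (not part of any real client)."""
    skip_rounds: int = 0     # current back-off length, in fetch rounds
    rounds_to_wait: int = 0  # rounds left before this bucket is fetched again


class AdaptiveFetchPlanner:
    """Decides which buckets to include in each fetch round."""

    def __init__(self, max_skip_rounds=32):
        # Upper bound on the back-off, so an idle bucket is still polled.
        self.max_skip_rounds = max_skip_rounds
        self.states = {}

    def buckets_to_fetch(self, subscribed):
        """Return the buckets that are due this round; age the others."""
        due = []
        for bucket in subscribed:
            state = self.states.setdefault(bucket, BucketFetchState())
            if state.rounds_to_wait == 0:
                due.append(bucket)
            else:
                state.rounds_to_wait -= 1
        return due

    def on_fetch_result(self, bucket, record_count):
        """Reset the back-off on data; double it (up to the cap) when empty."""
        state = self.states.setdefault(bucket, BucketFetchState())
        if record_count > 0:
            state.skip_rounds = 0
        else:
            doubled = max(1, state.skip_rounds * 2)
            state.skip_rounds = min(doubled, self.max_skip_rounds)
        state.rounds_to_wait = state.skip_rounds


def simulate(rounds, partitions=24, buckets_per_partition=256, max_skip_rounds=32):
    """Count fetch requests when only the latest partition has data."""
    planner = AdaptiveFetchPlanner(max_skip_rounds)
    subscribed = [
        (p, b) for p in range(partitions) for b in range(buckets_per_partition)
    ]
    active_partition = partitions - 1  # assumption: only this one gets writes
    total_requests = 0
    for _ in range(rounds):
        due = planner.buckets_to_fetch(subscribed)
        total_requests += len(due)
        for bucket in due:
            has_data = bucket[0] == active_partition
            planner.on_fetch_result(bucket, 1 if has_data else 0)
    return total_requests


if __name__ == "__main__":
    rounds = 100
    naive = 24 * 256 * rounds  # every bucket fetched in every round
    print(f"naive: {naive}, adaptive: {simulate(rounds)}")
```

In this toy model, 100 rounds issue 66,816 bucket fetches instead of the 614,400 that fetching all 6144 buckets every round would need. The trade-off is latency: a backed-off bucket that starts receiving data again may go unnoticed for up to `max_skip_rounds` rounds, so the cap would need tuning against acceptable staleness.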

Solution

No response

Anything else?

No response

Willingness to contribute

  • I'm willing to submit a PR!
