Search before asking
Motivation
For partitioned tables (e.g., hourly partitions with 256 buckets, retaining 24 hours of data), a streaming LogScanner subscribes to all partitions, resulting in 256 × 24 = 6144 buckets. However, only the latest partition's 256 buckets actively receive data. The current LogFetcher implementation treats all buckets equally, sending fetch requests to every bucket each round regardless of whether they have data. This causes unnecessary CPU usage, network overhead, and wasted fetch requests on inactive partitions.
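To make the overhead concrete, here is a rough sketch that only works through the arithmetic of the example above. It is not Fluss code: `fetch_round_stats` is a hypothetical helper, and it assumes, as described, that every subscribed bucket is included in each fetch round while only the latest partition receives writes.

```python
# Illustrative arithmetic only; the numbers come from the example in this issue.
BUCKETS_PER_PARTITION = 256
RETAINED_PARTITIONS = 24  # hourly partitions, 24 hours of retention
ACTIVE_PARTITIONS = 1     # assumption: only the latest partition receives data


def fetch_round_stats(buckets_per_partition, retained_partitions, active_partitions):
    """Count how many subscribed buckets are active vs. idle in one fetch round."""
    total = buckets_per_partition * retained_partitions
    active = buckets_per_partition * active_partitions
    idle = total - active
    return {
        "total": total,
        "active": active,
        "idle": idle,
        "idle_ratio": idle / total,
    }


stats = fetch_round_stats(BUCKETS_PER_PARTITION, RETAINED_PARTITIONS, ACTIVE_PARTITIONS)
print(stats)
```

Under these assumptions, 5888 of the 6144 subscribed buckets (23 of every 24) are fetched each round even though they receive no new data.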
Solution
No response
Anything else?
No response
Willingness to contribute