Open
Description
Search before asking
- I searched in the issues and found nothing similar.
Motivation
Problem Description
When a Tiering job runs against a table with high write throughput, data synchronization cannot keep up with the write rate. Root cause analysis reveals two main issues:
- Parallelism is bounded by bucket count: Tiering job parallelism maps 1:1 to the bucket count, which limits scalability.
- Read and write operations cannot be pipelined: Reading from Fluss and writing to Paimon are executed sequentially, resulting in low CPU utilization.
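To make the first issue concrete, here is a minimal sketch of why adding parallelism beyond the bucket count does not help when each bucket yields exactly one split. The class name, the method name, and the round-robin assignment are assumptions for illustration only; this is not the actual Fluss split assigner.

```java
import java.util.HashSet;
import java.util.Set;

public class SplitParallelismSketch {

    /**
     * Counts how many subtasks receive at least one split when every bucket
     * produces exactly one split. Round-robin assignment is a hypothetical
     * stand-in for the real assigner.
     */
    static int busySubtasks(int bucketCount, int parallelism) {
        Set<Integer> busy = new HashSet<>();
        for (int bucket = 0; bucket < bucketCount; bucket++) {
            busy.add(bucket % parallelism); // one split per bucket
        }
        return busy.size();
    }

    public static void main(String[] args) {
        // With 4 buckets, raising parallelism from 4 to 16 leaves the busy count unchanged.
        System.out.println(busySubtasks(4, 4));
        System.out.println(busySubtasks(4, 16));
    }
}
```

Under these assumptions the number of busy subtasks can never exceed the bucket count, so any extra subtasks sit idle.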
Root Cause Analysis
- Split Granularity Equals Bucket Granularity: Each split covers exactly one bucket, which limits the maximum parallelism.
- Sequential Read-Write Pattern: The current implementation reads from Fluss and writes to Paimon synchronously.
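The difference between the sequential pattern and a pipelined one can be sketched as follows. This is only an illustration under stated assumptions: plain Java lists stand in for the Fluss source and the Paimon sink, and `TieringPipelineSketch`, `tierSequentially`, `tierPipelined`, and the `END` marker are hypothetical names, not Fluss or Paimon APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class TieringPipelineSketch {

    // Hypothetical end-of-input marker; assumes no real record equals it.
    private static final String END = "__END__";

    /** Sequential pattern: every record is read and then written on one thread. */
    static List<String> tierSequentially(List<String> source) {
        List<String> sink = new ArrayList<>();
        for (String record : source) { // stands in for a read from Fluss
            sink.add(record);          // stands in for a write to Paimon
        }
        return sink;
    }

    /** Pipelined pattern: a reader thread fills a bounded queue while the caller drains it. */
    static List<String> tierPipelined(List<String> source, int queueCapacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueCapacity);
        Thread reader = new Thread(() -> {
            try {
                for (String record : source) {
                    queue.put(record); // blocks when the writer falls behind
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();

        List<String> sink = new ArrayList<>();
        try {
            // The writer side consumes records as soon as they are available.
            for (String record = queue.take(); !record.equals(END); record = queue.take()) {
                sink.add(record);
            }
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while tiering", e);
        }
        return sink;
    }

    public static void main(String[] args) {
        List<String> records = List.of("r1", "r2", "r3", "r4");
        System.out.println(tierSequentially(records));
        System.out.println(tierPipelined(records, 2));
    }
}
```

The bounded queue lets reads and writes overlap while still applying backpressure, so a slow writer cannot cause unbounded buffering. Whether a queue, async I/O, or a different split design fits the Tiering job best is left open here.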
Solution
No response
Anything else?
No response
Willingness to contribute
- I'm willing to submit a PR!