[tiering] Tiering Job Performance: Read-Write Pipeline Optimization #2915

@beryllw

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Problem Description

When running a Tiering job with high write throughput, the data synchronization cannot keep up with the write speed. The root cause analysis reveals two main issues:

  1. Parallelism is bounded by bucket count: Tiering job parallelism maps 1:1 to the bucket count, which limits scalability.
  2. Read and write operations are not pipelined: reading from Fluss and writing to Paimon run sequentially, resulting in low CPU utilization.

Root Cause Analysis

  1. Split Granularity Equals Bucket Granularity: Each split covers exactly one bucket, which limits the maximum parallelism.
  2. Sequential Read-Write Pattern: The current implementation reads from Fluss and writes to Paimon synchronously.
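
To illustrate the second root cause, here is a minimal sketch of what a pipelined read-write pattern could look like, assuming a bounded in-memory queue between a reader thread and a writer thread. It does not use any Fluss or Paimon API: the class `PipelinedTieringSketch`, the `run` method, and the string records are hypothetical stand-ins for the real source reader and lake writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch only: not Fluss or Paimon API. */
public class PipelinedTieringSketch {
    // Sentinel marking end of input; compared by identity, never by value.
    private static final String END = new String("END");

    /** Reads records on one thread while another thread writes them. */
    public static List<String> run(List<String> source, int queueCapacity)
            throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueCapacity);
        List<String> sink = new ArrayList<>(); // stand-in for the lake writer

        // Reader: stand-in for fetching records from the source log.
        Thread reader = new Thread(() -> {
            try {
                for (String record : source) {
                    queue.put(record); // blocks when the writer falls behind
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Writer: drains the queue until it sees the sentinel.
        Thread writer = new Thread(() -> {
            try {
                for (String r = queue.take(); r != END; r = queue.take()) {
                    sink.add(r);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        writer.start();
        reader.join();
        writer.join(); // join makes the writer's updates to sink visible here
        return sink;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of("r1", "r2", "r3"), 2)); // prints [r1, r2, r3]
    }
}
```

The bounded queue gives backpressure: the reader stalls only when the writer is `queueCapacity` records behind, so both sides can stay busy instead of alternating. Error handling, batching, and checkpoint alignment are deliberately left out of this sketch.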

Solution

No response

Anything else?

No response

Willingness to contribute

  • I'm willing to submit a PR!
