[flink] Avoid creating empty lake writer in TieringSplitReader #3290

@luoyuxia

Search before asking

  • I searched in the issues and found nothing similar.

Fluss version

main (development)

Please describe the bug 🐞

In TieringSplitReader.forLogRecords, the lake writer is created before checking whether the current polled records actually contain any record with record.logOffset() < stoppingOffset.

If the polled batch only contains records whose offsets have already reached or passed the split stoppingOffset, the split can still be marked finished based on the last record offset, but nothing is written into the lake writer.

This shows up with logically empty batches that still advance offsets, for example first-row merge engine updates or deletes of a non-existent key. In that case, recordWriter.complete() may fail during the tiering commit with:

The size of CommitMessage must be 1, but got [].

A concrete sequence is:

  1. TieringSplitReader subscribes to a log split with a finite stoppingOffset.
  2. forLogRecords receives bucketScanRecords for that bucket, but every record has logOffset() >= stoppingOffset.
  3. A lake writer has already been created, although no record is written.
  4. The reader sees lastRecord.logOffset() >= stoppingOffset - 1, finishes the split, and calls completeLakeWriter().
  5. The underlying lake writer completes an empty write and fails; for example, Paimon throws "The size of CommitMessage must be 1, but got []."
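
The sequence above can be modelled in isolation. The following is a minimal sketch, not the actual Fluss code: EagerLakeWriterSketch, FakeLakeWriter and tierBatchEagerly are hypothetical stand-ins, and records are reduced to their log offsets. It only assumes what the steps describe: the writer is created up front, records at or beyond stoppingOffset are skipped, and completing an empty writer throws.

```java
import java.util.ArrayList;
import java.util.List;

public class EagerLakeWriterSketch {

    /** Hypothetical stand-in for a lake writer that rejects an empty commit. */
    public static final class FakeLakeWriter {
        private final List<Long> written = new ArrayList<>();

        public void write(long offset) {
            written.add(offset);
        }

        /** Returns the number of written records, or throws if nothing was written. */
        public int complete() {
            if (written.isEmpty()) {
                throw new IllegalStateException(
                        "The size of CommitMessage must be 1, but got [].");
            }
            return written.size();
        }
    }

    /**
     * Models steps 2 to 5: the writer exists before any record is inspected.
     * Returns the committed record count, or -1 if the split is not finished yet.
     */
    public static int tierBatchEagerly(List<Long> offsets, long stoppingOffset) {
        FakeLakeWriter writer = new FakeLakeWriter(); // step 3: created eagerly
        long lastOffset = -1L;
        for (long offset : offsets) {
            if (offset < stoppingOffset) {
                writer.write(offset);
            }
            lastOffset = offset;
        }
        // step 4: the split finishes based on the last record's offset
        if (!offsets.isEmpty() && lastOffset >= stoppingOffset - 1) {
            return writer.complete(); // step 5: throws when nothing was written
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(tierBatchEagerly(List.of(8L, 9L), 10L));
        try {
            tierBatchEagerly(List.of(10L, 11L), 10L);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

In this model the second call fails because every offset in the batch is at or past stoppingOffset, so the eagerly created writer is completed without a single write.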

Solution

Create the lake writer lazily, only when the first record satisfying record.logOffset() < stoppingOffset is encountered. If no record is written for the batch/split, keep the current completeLakeWriter() behavior and return a null write result.
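
A minimal sketch of the lazy approach follows. It is an illustration under assumptions rather than the proposed patch: LazyLakeWriterSketch, FakeLakeWriter, tierBatch and writerFactory are hypothetical names, and records are reduced to their log offsets. The writer is only obtained from the factory once a record below stoppingOffset is seen, and a null result is returned when nothing was written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class LazyLakeWriterSketch {

    /** Hypothetical stand-in for a lake writer that rejects an empty commit. */
    public static final class FakeLakeWriter {
        private final List<Long> written = new ArrayList<>();

        public void write(long offset) {
            written.add(offset);
        }

        /** Returns the number of written records, or throws if nothing was written. */
        public int complete() {
            if (written.isEmpty()) {
                throw new IllegalStateException(
                        "The size of CommitMessage must be 1, but got [].");
            }
            return written.size();
        }
    }

    /**
     * Writes every offset below stoppingOffset and completes the writer.
     * Returns the committed record count, or null if no record qualified,
     * in which case the factory is never invoked.
     */
    public static Integer tierBatch(
            List<Long> offsets, long stoppingOffset, Supplier<FakeLakeWriter> writerFactory) {
        FakeLakeWriter writer = null; // not created until it is needed
        for (long offset : offsets) {
            if (offset < stoppingOffset) {
                if (writer == null) {
                    writer = writerFactory.get(); // first qualifying record
                }
                writer.write(offset);
            }
        }
        // No writer means no write happened, so there is nothing to complete.
        return writer == null ? null : writer.complete();
    }

    public static void main(String[] args) {
        System.out.println(tierBatch(List.of(8L, 9L, 10L), 10L, FakeLakeWriter::new));
        System.out.println(tierBatch(List.of(10L, 11L), 10L, FakeLakeWriter::new));
    }
}
```

Passing the writer as a Supplier keeps creation in the caller's hands, which also makes it easy to check in a test that no writer is created for a batch with no qualifying record.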

Are you willing to submit a PR?

  • I am willing to submit a PR!
