Rollback log files for MOR (table v6) collide on retry due to UNKNOWN_WRITE_TOKEN #18827

@nsivabalan

Problem

When a rollback is retried on a Merge-on-Read (MOR) table at table version 6, either after a failure or because the rollback was re-driven, the rollback log files can collide. They inherit the existing log file's write token (often UNKNOWN_WRITE_TOKEN = 1-0-1) instead of using a per-task write token. Separate rollback attempts then produce files with the same name, which causes overwrites and metadata-table inconsistencies.

Root cause

In RollbackHelperV1#maybeDeleteAndCollectStats, each entry of the pre-computed log version map, a (latestVersion, existingWriteToken) pair, was applied to the new rollback log writer like this:

writerBuilder.withLogVersion(preComputedVersion.getLeft())
    .withLogWriteToken(preComputedVersion.getRight()); // <-- overrides per-task token

This overrode the per-task write token that CommonClientUtils.generateWriteToken(taskContextSupplier) had just set, so on rollover (HoodieLogFile.rollOver(rolloverLogWriteToken)) the new rollback log inherited the existing log's token. Retried rollbacks ran with the same token and ended up writing files with identical names.
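
To make the collision concrete, here is a minimal, dependency-free sketch of how two rollback attempts can end up with the same file name once the token is inherited. It does not use Hudi classes. The helper names (logFileName, buggyRolloverName), the file-name layout, and the sample file ID and instant are assumptions for illustration, as is the premise that both attempts observe the same latest log version.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative only: models how a rollback log file name could be composed.
// Nothing here is copied from Hudi; names and layout are assumptions.
final class RollbackLogNameCollision {

    // Token value quoted in the issue for UNKNOWN_WRITE_TOKEN.
    static final String UNKNOWN_WRITE_TOKEN = "1-0-1";

    // Hypothetical helper. Assumed layout: .<fileId>_<baseInstant>.log.<version>_<writeToken>
    static String logFileName(String fileId, String baseInstant, int version, String writeToken) {
        return "." + fileId + "_" + baseInstant + ".log." + version + "_" + writeToken;
    }

    // Buggy behaviour: the rolled-over file takes version latest + 1
    // but reuses the existing log file's token instead of a per-task one.
    static String buggyRolloverName(String fileId, String baseInstant, int latestVersion, String existingToken) {
        return logFileName(fileId, baseInstant, latestVersion + 1, existingToken);
    }

    public static void main(String[] args) {
        Set<String> written = new LinkedHashSet<>();
        // Two attempts that (by assumption) both see latest version 1.
        for (int attempt = 1; attempt <= 2; attempt++) {
            String name = buggyRolloverName("fg-1", "001", 1, UNKNOWN_WRITE_TOKEN);
            boolean fresh = written.add(name);
            System.out.println("attempt " + attempt + " -> " + name + (fresh ? "" : " (collision)"));
        }
    }
}
```

Because the name depends only on the file group, base instant, version, and token, nothing in it varies between attempts when the token is inherited.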

Fix

When an existing log file is found for the file group being rolled back:

  • Keep the per-task write token from CommonClientUtils.generateWriteToken (don't override it).
  • Explicitly bump the writer's log version to latest + 1 (so the new file lands at a fresh version even when the per-task token alone wouldn't differentiate it).
  • Only apply the bump in doDelete=true paths; in doDelete=false paths (stats-only) we let WriterBuilder.build() rediscover the existing version so the downstream storage.getPathInfo lookup still resolves.

Also: change the "no existing log file" sentinel in preComputeLogVersions from (LOGFILE_BASE_VERSION, UNKNOWN_WRITE_TOKEN) to (LOGFILE_BASE_VERSION, null) so the call site can distinguish "no log file" from "a real log file whose token happens to equal UNKNOWN_WRITE_TOKEN."
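
The decision described above can be sketched as a small pure function. This is not the actual RollbackHelperV1 code: RollbackLogWriterPlan, PreComputedVersion, WriterSettings, and plan are hypothetical names, the value of LOGFILE_BASE_VERSION is assumed, and the sample per-task token "7-12-0" is made up. A null logVersion in the result stands for "do not set the version; let the builder discover it".

```java
// Sketch only: hypothetical stand-ins for the call-site logic, not Hudi code.
final class RollbackLogWriterPlan {

    // Assumed value of the base log version constant.
    static final int LOGFILE_BASE_VERSION = 1;

    // One entry of the pre-computed map: (latestVersion, existingWriteToken).
    // A null token is the "no existing log file" sentinel.
    static final class PreComputedVersion {
        final int latestVersion;
        final String existingWriteToken;

        PreComputedVersion(int latestVersion, String existingWriteToken) {
            this.latestVersion = latestVersion;
            this.existingWriteToken = existingWriteToken;
        }
    }

    // What the writer would be configured with.
    static final class WriterSettings {
        final Integer logVersion;   // null = leave unset
        final String writeToken;    // always the per-task token

        WriterSettings(Integer logVersion, String writeToken) {
            this.logVersion = logVersion;
            this.writeToken = writeToken;
        }

        @Override
        public String toString() {
            return "version=" + logVersion + ", token=" + writeToken;
        }
    }

    static WriterSettings plan(PreComputedVersion pre, String perTaskToken, boolean doDelete) {
        // The null sentinel separates "no log file" from a real file
        // whose token happens to equal UNKNOWN_WRITE_TOKEN.
        boolean logFileExists = pre.existingWriteToken != null;
        // Bump only when a log file exists and this is a deleting rollback.
        Integer version = (logFileExists && doDelete) ? Integer.valueOf(pre.latestVersion + 1) : null;
        // The existing token is never propagated to the writer.
        return new WriterSettings(version, perTaskToken);
    }

    public static void main(String[] args) {
        String perTaskToken = "7-12-0"; // made-up per-task token
        System.out.println(plan(new PreComputedVersion(3, "1-0-1"), perTaskToken, true));
        System.out.println(plan(new PreComputedVersion(3, "1-0-1"), perTaskToken, false));
        System.out.println(plan(new PreComputedVersion(LOGFILE_BASE_VERSION, null), perTaskToken, true));
    }
}
```

How the real code handles the "no existing log file" case with doDelete=true is not spelled out in this issue; the sketch simply leaves the version unset there.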

Impact

  • Repeated rollback attempts no longer create colliding log files.
  • Metadata table no longer sees conflicting write tokens for rollback log files.
  • File-slice ordering remains consistent across retries.

Reproduction

A test that exercises this on master: force table version 6 on a MOR table, then write a base commit followed by an updates commit, leaving the second commit in inflight state. Execute a rollback via MergeOnReadRollbackActionExecutor, restore the inflight commit's timeline files and marker directory, and replay the rollback. With the bug, the second rollback's log files collide with the first attempt's; with the fix, each rollback attempt produces uniquely named log files.