Problem
When a rollback is retried on a Merge-on-Read (MOR) table at table version 6, either after a failure or because the rollback was re-driven, the rollback log files can collide: they inherit the existing log file's write token (often UNKNOWN_WRITE_TOKEN = 1-0-1) instead of using a per-task write token. Separate rollback attempts then produce files with the same name, causing overwrites and metadata-table inconsistencies.
Root cause
In RollbackHelperV1#maybeDeleteAndCollectStats, the pre-computed log version map ((latestVersion, existingWriteToken)) was being applied to the new rollback log writer like this:
writerBuilder.withLogVersion(preComputedVersion.getLeft())
.withLogWriteToken(preComputedVersion.getRight()); // <-- overrides per-task token
This overrode the per-task write token that CommonClientUtils.generateWriteToken(taskContextSupplier) had just set, so on rollover (HoodieLogFile.rollOver(rolloverLogWriteToken)) the new rollback log inherited the existing log's token. Retried rollbacks ran with the same token and ended up writing files with identical names.
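A minimal sketch of why an inherited token leads to identical names. It assumes the log file name layout .<fileId>_<baseCommitTime>.log.<version>_<writeToken> and the token layout <taskPartitionId>-<stageId>-<taskAttemptId>. The class and both helper methods are hypothetical stand-ins, not Hudi APIs, and the way each attempt arrives at its version is deliberately simplified.

```java
public class RollbackLogNameCollision {

    // Assumed name layout: .<fileId>_<baseCommitTime>.log.<version>_<writeToken>
    static String logFileName(String fileId, String baseCommitTime, int version, String writeToken) {
        return String.format(".%s_%s.log.%d_%s", fileId, baseCommitTime, version, writeToken);
    }

    // Assumed token layout: <taskPartitionId>-<stageId>-<taskAttemptId>
    static String writeToken(int taskPartitionId, int stageId, long taskAttemptId) {
        return String.format("%d-%d-%d", taskPartitionId, stageId, taskAttemptId);
    }

    public static void main(String[] args) {
        String inheritedToken = "1-0-1"; // token copied from the existing log file

        // Buggy behaviour (simplified): both attempts resolve the same version
        // and reuse the inherited token, so the names are identical.
        String firstAttempt = logFileName("fg-1", "001", 2, inheritedToken);
        String retriedAttempt = logFileName("fg-1", "001", 2, inheritedToken);
        System.out.println("inherited token collides: " + firstAttempt.equals(retriedAttempt));

        // With a per-task token, the task attempt id differs between attempts,
        // so the names differ even at the same version.
        String firstUnique = logFileName("fg-1", "001", 2, writeToken(3, 7, 41L));
        String retriedUnique = logFileName("fg-1", "001", 2, writeToken(3, 7, 42L));
        System.out.println("per-task token collides: " + firstUnique.equals(retriedUnique));
    }
}
```

The only inputs to the name are the file group, base commit, version, and token, so a retry that repeats all four necessarily targets the same path.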
Fix
When an existing log file is found for the file group being rolled back:
- Keep the per-task write token from CommonClientUtils.generateWriteToken (don't override it).
- Explicitly bump the writer's log version to latest + 1 (so the new file lands at a fresh version even when the per-task token alone wouldn't differentiate it).
- Only apply the bump in doDelete=true paths; in doDelete=false paths (stats-only) we let WriterBuilder.build() rediscover the existing version so the downstream storage.getPathInfo lookup still resolves.
Also: change the "no existing log file" sentinel in preComputeLogVersions from (LOGFILE_BASE_VERSION, UNKNOWN_WRITE_TOKEN) to (LOGFILE_BASE_VERSION, null) so the call site can distinguish "no log file" from "a real log file whose token happens to equal UNKNOWN_WRITE_TOKEN."
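The decision described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: the class and method names are hypothetical, a null existingWriteToken stands for the new "no log file" sentinel, and returning an empty OptionalInt stands for "leave the version to WriterBuilder.build()". The behaviour when no log file exists is assumed to be "no override", since the description only covers the case where one is found.

```java
import java.util.OptionalInt;

public class RollbackWriterSettingsSketch {

    // Returns the log version to set explicitly on the writer, or empty when
    // the builder should discover the version on its own (assumption).
    static OptionalInt explicitLogVersion(int latestVersion, String existingWriteToken, boolean doDelete) {
        // null token is the sentinel for "no existing log file"
        boolean logFileExists = existingWriteToken != null;
        if (logFileExists && doDelete) {
            return OptionalInt.of(latestVersion + 1); // land on a fresh version
        }
        return OptionalInt.empty(); // stats-only path, or nothing to bump past
    }

    // The per-task token is always kept; the existing file's token is never inherited.
    static String writeTokenFor(String existingWriteToken, String perTaskToken) {
        return perTaskToken;
    }

    public static void main(String[] args) {
        // A real log file whose token happens to equal 1-0-1 is still bumped,
        // because existence is signalled by non-null rather than by token value.
        System.out.println(explicitLogVersion(3, "1-0-1", true));
        // Stats-only path: no explicit version.
        System.out.println(explicitLogVersion(3, "1-0-1", false));
        // No existing log file: no explicit version.
        System.out.println(explicitLogVersion(1, null, true));
        System.out.println(writeTokenFor("1-0-1", "3-7-42"));
    }
}
```

Using null rather than UNKNOWN_WRITE_TOKEN as the sentinel is what lets the first case above be recognised as an existing file.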
Impact
- Repeated rollback attempts no longer create colliding log files.
- Metadata table no longer sees conflicting write tokens for rollback log files.
- File-slice ordering remains consistent across retries.
Reproduction
A test that exercises this on master: force table version 6 on a MOR table, write a base commit followed by an updates commit (leaving the second commit inflight), execute a rollback via MergeOnReadRollbackActionExecutor, then replay the rollback after restoring the inflight commit's timeline files and marker directory. With the bug, the second rollback's log files collide with those of the first attempt; with the fix, each rollback attempt produces uniquely named log files.