Skip to content

Fix chained RENAME COLUMN mutations on re-attached parts#96351

Merged
alexey-milovidov merged 22 commits intomasterfrom
fix-chained-rename-column-mutation
Mar 3, 2026
Merged

Fix chained RENAME COLUMN mutations on re-attached parts#96351
alexey-milovidov merged 22 commits intomasterfrom
fix-chained-rename-column-mutation

Conversation

@alexey-milovidov
Copy link
Member

@alexey-milovidov alexey-milovidov commented Feb 7, 2026

When a part was detached, multiple RENAME COLUMN mutations applied (e.g. c0 -> c3 -> c4), and the part re-attached, the system failed to properly resolve the rename chain, resulting in a LOGICAL_ERROR exception ("Cannot calculate columns sizes when columns or checksums are not initialized").

Four interrelated bugs contributed:

  1. makeCloneInDetached deleted metadata_version.txt during DETACH (keep_metadata_version was false for normal detach). The re-attached part then lost its original metadata version, and the fallback in loadColumnsChecksumsIndexes set it to the table's current version, making the system think all renames were already applied.

  2. loadPartAndFixMetadataImpl overwrote metadata_version.txt with the table's current version during ATTACH, also masking pending renames.

  3. getColumnsForNewDataPart in MutateTask.cpp skipped RENAME commands when the source column didn't exist in the part, breaking chains where the intermediate column name (e.g. c3) was never in the original part.

  4. AlterConversions::addMutationCommand stored each rename separately, so getColumnNewName("c0") returned the intermediate name "c3" instead of the final "c4", breaking on-fly column resolution during reads.

Fixes: preserve metadata_version.txt during DETACH, don't overwrite it during ATTACH, and properly chain renames in both mutation application and on-fly read paths.

This partially addresses the bugs described in #84295 — specifically the chained rename resolution in mutations (MutateTask.cpp) and on-fly reads (AlterConversions). The concurrent rename test (03352_concurrent_rename_alter) remains removed as it triggers a separate unfixed bug involving incorrect column type resolution when both original and renamed columns exist in the same part (see #84295 for details).

Found via BuzzHouse fuzzer (TSan/AMD):
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=96192&sha=fa3063dc3bbb4b5600d67dc5650e81851b1fdd66&name_0=PR&name_1=BuzzHouse%20%28amd_tsan%29

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix logical error on attaching a part in MergeTree if there were several chained renames between detaching and attaching.

alexey-milovidov and others added 2 commits February 8, 2026 00:45
When a part was detached, multiple RENAME COLUMN mutations applied
(e.g. `c0` -> `c3` -> `c4`), and the part re-attached, the system
failed to properly resolve the rename chain, resulting in a
`LOGICAL_ERROR` exception ("Cannot calculate columns sizes when
columns or checksums are not initialized").

Four interrelated bugs contributed:

1. `makeCloneInDetached` deleted `metadata_version.txt` during DETACH
   (`keep_metadata_version` was false for normal detach). The re-attached
   part then lost its original metadata version, and the fallback in
   `loadColumnsChecksumsIndexes` set it to the table's current version,
   making the system think all renames were already applied.

2. `loadPartAndFixMetadataImpl` overwrote `metadata_version.txt` with
   the table's current version during ATTACH, also masking pending renames.

3. `getColumnsForNewDataPart` in `MutateTask.cpp` skipped RENAME commands
   when the source column didn't exist in the part, breaking chains where
   the intermediate column name (e.g. `c3`) was never in the original part.

4. `AlterConversions::addMutationCommand` stored each rename separately,
   so `getColumnNewName("c0")` returned the intermediate name `"c3"`
   instead of the final `"c4"`, breaking on-fly column resolution during reads.

Fixes: preserve `metadata_version.txt` during DETACH, don't overwrite it
during ATTACH, and properly chain renames in both mutation application and
on-fly read paths.

Found via BuzzHouse fuzzer (TSan/AMD):
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=96192&sha=fa3063dc3bbb4b5600d67dc5650e81851b1fdd66&name_0=PR&name_1=BuzzHouse%20%28amd_tsan%29

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add more thorough test cases:
- Case 2: Three chained renames (c0 -> c1 -> c2 -> c3)
- Case 3: Five chained renames (c0 -> c1 -> c2 -> c3 -> c4 -> c5)
- Case 4: Two columns with independent rename chains (a -> a1 -> a2 -> a3, b -> b1 -> b2)

Use unique ZK paths per case to avoid stale metadata conflicts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@clickhouse-gh
Copy link
Contributor

clickhouse-gh bot commented Feb 7, 2026

Workflow [PR], commit [9599a37]

Summary:

@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Feb 7, 2026
@alexey-milovidov alexey-milovidov marked this pull request as draft February 8, 2026 03:51
alexey-milovidov and others added 2 commits February 8, 2026 05:16
After restoring ZK from scratch, the /metadata ZNode version starts
at 0, but parts may have higher metadata_version from prior ALTERs.
Bump the ZK version in `restoreMetadataInZooKeeper` to match the max
parts version so parts don't appear "from the future".

Also fix `test_attach_detach_partition` S3 test: since
`metadata_version.txt` is now preserved during DETACH
(`keep_metadata_version = true`), remove the object count adjustments
that accounted for its deletion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alexey-milovidov alexey-milovidov marked this pull request as ready for review February 8, 2026 07:56
alexey-milovidov and others added 4 commits February 10, 2026 11:14
The test fails in the private sync CI because randomized settings
enable `automatic_parallel_replicas_mode 2`, which takes a different
reading code path that does not properly apply alter conversions for
parts with pending rename mutations. This causes `NO_SUCH_COLUMN_IN_TABLE`
for the old column name during SELECT.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test `03352_concurrent_rename_alter` was disabled and then removed,
but the underlying bugs it exposed (wrong rename order in mutations
and reads) are being fixed in this PR. Re-add the test as
`03915_concurrent_rename_alter`.

See #84295

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The style check forbids the `timeout` command in .sh tests because it
leads to race conditions. Use the standard `SECONDS`/`TIMELIMIT` pattern
instead to bound the loop duration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
alexey-milovidov added a commit that referenced this pull request Feb 15, 2026
The `find_master_builds` function runs `git log origin/master` inside
Docker containers that run as root, but the repository is owned by the
`ubuntu` user. Git's ownership check blocks the command, returning empty
output, so no master commits are found and the job fails with
"Could not find master builds in S3".

Add `git config --global --add safe.directory` before the git command,
following the existing pattern in `build_clickhouse.py` and
`build_toolchain.py`.

#96351

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
alexey-milovidov and others added 6 commits February 16, 2026 16:01
This test was brought back from `03352_concurrent_rename_alter` (which was
previously disabled and removed), but it triggers an unfixed bug where
concurrent renames with type changes cause a `Logical error: Stream ... is
not found` exception. The underlying issue is tracked in #84295 and involves
incorrect column type resolution in `getColumnsForNewDataPart` when both the
original and renamed columns exist in the source part.

The deterministic chained rename test (`03905_chained_rename_column_mutation`)
remains and properly covers the detach/attach rename chain fix in this PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…column-mutation

# Conflicts:
#	src/Storages/StorageReplicatedMergeTree.cpp
@alexey-milovidov alexey-milovidov self-assigned this Feb 27, 2026
@alexey-milovidov alexey-milovidov merged commit 1a22797 into master Mar 3, 2026
148 of 149 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-chained-rename-column-mutation branch March 3, 2026 17:51
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants