
Fix fatal exception in loadDataPart for parts with outdated TIDs#96866

Merged: alexey-milovidov merged 1 commit into master from fix-outdated-tid-loadDataPart on Feb 14, 2026

@alexey-milovidov alexey-milovidov commented Feb 14, 2026

Summary

  • Fix a LOGICAL_ERROR (Trying to get CSN for too old TID) that shows up during ATTACH TABLE when loading parts with outdated TIDs from rolled-back transactions
  • Replace getCSNAndAssert with getCSN in loadDataPart for both creation and removal TID lookups — the existing code already correctly treats unknown CSNs as rolled-back transactions
  • Triggered by test 03915_intersecting_parts_rollback from PR #96635 (Fix intersecting parts on disk from rolled-back merge transactions): failed merge transactions on an Ordinary database leave parts with stale TIDs on disk, and other concurrent transactions later cause those TIDs to be cleaned up from the transaction log

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=e9b4a584c40ec53b02093d24df92975e0d72c87f&name_0=MasterCI&name_1=Stateless%20tests%20%28arm_binary%2C%20parallel%29&name_1=Stateless%20tests%20%28arm_binary%2C%20parallel%29

Test plan

  • Builds successfully (RelWithDebInfo)
  • Existing test 03915_intersecting_parts_rollback should pass without crashing the server
  • No regressions in stateless tests (arm_binary, parallel)

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

🤖 Generated with Claude Code

During part loading, `getCSNAndAssert` throws a fatal LOGICAL_ERROR when
encountering a part whose creation TID has been cleaned up from the
transaction log (tail_ptr moved past the TID's start_csn). This happens
when a merge transaction fails (e.g., on an Ordinary database that
doesn't support transactions) and leaves a part on disk with a TID that
was never committed. As other transactions advance the log, the cleanup
thread moves tail_ptr past the stale TID.
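
The sequence above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyLog`, `advanceTail`, `isTooOld`, and `failedMergeScenario` are hypothetical names invented for illustration, and the condition `start_csn < tail_ptr` is an assumed reading of "tail_ptr moved past the TID's start_csn", not a copy of the real `TransactionLog` code.

```cpp
#include <cstdint>
#include <map>

// Toy model only: names echo the commit message, not ClickHouse's real classes.
using CSN = std::uint64_t;

struct ToyLog
{
    CSN tail_ptr = 0;                      // entries up to this CSN have been cleaned up
    std::map<CSN, std::uint64_t> entries;  // commit CSN -> id of the committed transaction

    void commit(CSN csn, std::uint64_t txn_id) { entries[csn] = txn_id; }

    // Models the cleanup thread: drop every entry up to and including new_tail.
    void advanceTail(CSN new_tail)
    {
        entries.erase(entries.begin(), entries.upper_bound(new_tail));
        tail_ptr = new_tail;
    }

    // Assumption: a TID is "too old" once tail_ptr has moved past its start_csn.
    bool isTooOld(CSN start_csn) const { return start_csn < tail_ptr; }
};

struct Scenario
{
    bool too_old_before;
    bool too_old_after;
};

// A merge starts at snapshot CSN 10, fails, and never commits, so the log
// never records anything for it. Other transactions keep committing, and the
// cleanup eventually moves the tail past the failed merge's start_csn.
inline Scenario failedMergeScenario()
{
    ToyLog log;
    log.commit(10, 1);
    const CSN stale_start_csn = 10;

    const bool before = log.isTooOld(stale_start_csn);

    log.commit(11, 2);
    log.commit(12, 3);
    log.advanceTail(12);

    const bool after = log.isTooOld(stale_start_csn);
    return {before, after};
}
```

In this model the part left behind by the failed merge is not "too old" at first, and becomes so only because unrelated transactions advanced the log.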

Replace `getCSNAndAssert` with `getCSN` in `loadDataPart` for both
creation and removal TIDs. When `getCSN` returns `UnknownCSN` for an
outdated TID, the existing code correctly treats the part as from a
rolled-back transaction (`RolledBackCSN`), marking it as outdated for
removal. This is safe because if the transaction had been committed,
its CSN would have been persisted to the part's metadata before the
log entry was cleaned up.
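
The difference between the two lookups can be sketched as follows. This is a simplified model, not the actual ClickHouse implementation: `ToyTID`, `ToyTransactionLog`, `resolveCreationCSN`, and `strictLookupThrows` are hypothetical, the sentinel values chosen for `UnknownCSN` and `RolledBackCSN` are assumptions, and the real `getCSN` / `getCSNAndAssert` have different signatures.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <unordered_map>

using CSN = std::uint64_t;
constexpr CSN UnknownCSN = 0;                                   // assumed sentinel value
constexpr CSN RolledBackCSN = std::numeric_limits<CSN>::max();  // assumed sentinel value

// Hypothetical, reduced transaction id: snapshot CSN plus a local counter.
struct ToyTID
{
    CSN start_csn;
    std::uint64_t local_id;
};

struct ToyTransactionLog
{
    CSN tail_ptr = 0;
    std::unordered_map<std::uint64_t, CSN> committed;  // local_id -> commit CSN

    // Lenient lookup: a TID the log does not know about yields UnknownCSN.
    CSN getCSN(const ToyTID & tid) const
    {
        auto it = committed.find(tid.local_id);
        return it == committed.end() ? UnknownCSN : it->second;
    }

    // Strict lookup: an unknown TID older than tail_ptr is treated as a bug.
    CSN getCSNAndAssert(const ToyTID & tid) const
    {
        const CSN csn = getCSN(tid);
        if (csn == UnknownCSN && tid.start_csn < tail_ptr)
            throw std::logic_error("Trying to get CSN for too old TID");
        return csn;
    }
};

// Part-loading decision in the spirit of the fix: use the lenient lookup and
// map "unknown" to "rolled back", so the part is marked outdated for removal.
inline CSN resolveCreationCSN(const ToyTransactionLog & log, const ToyTID & tid)
{
    const CSN csn = log.getCSN(tid);
    return csn == UnknownCSN ? RolledBackCSN : csn;
}

// Helper for demonstrating the pre-fix behaviour.
inline bool strictLookupThrows(const ToyTransactionLog & log, const ToyTID & tid)
{
    try
    {
        log.getCSNAndAssert(tid);
    }
    catch (const std::logic_error &)
    {
        return true;
    }
    return false;
}
```

With a stale TID (unknown to the log and older than `tail_ptr`), the strict lookup throws while `resolveCreationCSN` returns `RolledBackCSN`. For a committed TID both paths return the recorded CSN.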

Fixes the server crash triggered by test `03915_intersecting_parts_rollback`
from PR #96635.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=e9b4a584c40ec53b02093d24df92975e0d72c87f&name_0=MasterCI&name_1=Stateless%20tests%20%28arm_binary%2C%20parallel%29&name_1=Stateless%20tests%20%28arm_binary%2C%20parallel%29

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

clickhouse-gh Bot commented Feb 14, 2026

Workflow [PR], commit [e0c510b]

@clickhouse-gh clickhouse-gh Bot added the pr-ci label Feb 14, 2026
@alexey-milovidov alexey-milovidov self-assigned this Feb 14, 2026
@alexey-milovidov alexey-milovidov merged commit 7dbecb0 into master Feb 14, 2026
134 of 135 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-outdated-tid-loadDataPart branch February 14, 2026 08:18
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud label (The PR is synced to the cloud repo) Feb 14, 2026