
lookup table by name not uuid that can be stale#102724

Open
seva-potapov wants to merge 1 commit into ClickHouse:master from seva-potapov:refresh-mv-unknown-table-race

Conversation

@seva-potapov
Contributor

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

During refresh of a materialized view, look up the target table by name instead of by UUID, which can be stale.

Details:

IdentifierResolver::resolveTableIdentifier calls resolveStorageID which stamps a UUID onto the StorageID, then passes it to RefreshTask::getAndLockTargetTable. Inside that function, DatabaseCatalog::getTable takes the UUID shortcut in getTableImpl (line 386), which throws UNKNOWN_TABLE when the UUID is stale (after EXCHANGE TABLES + DROP of old table). The 10-retry loop, designed exactly for this race, never fires because the exception propagates out.
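
To illustrate the race described above, here is a minimal toy model, not ClickHouse code. `ToyCatalog`, `ToyStorageID`, `replaceTarget`, and the integer UUIDs are all hypothetical stand-ins; the only behavior assumed from the description is that a lookup consults the UUID alone whenever the ID carries one.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Toy model only: none of these types are ClickHouse's real classes.
struct ToyStorageID
{
    std::string database;
    std::string table;
    bool has_uuid; // false means "look up by name"
    int uuid;      // an int stands in for a real UUID
};

class ToyCatalog
{
public:
    using Name = std::pair<std::string, std::string>;

    void create(const std::string & db, const std::string & table, int uuid)
    {
        by_name[Name(db, table)] = uuid;
        by_uuid[uuid] = Name(db, table);
    }

    // Models EXCHANGE TABLES followed by DROP of the old table:
    // the name now maps to a new UUID and the old UUID disappears.
    void replaceTarget(const std::string & db, const std::string & table, int new_uuid)
    {
        by_uuid.erase(by_name.at(Name(db, table)));
        create(db, table, new_uuid);
    }

    // If the ID carries a UUID, only the UUID is consulted (the "shortcut").
    int getTable(const ToyStorageID & id) const
    {
        if (id.has_uuid)
        {
            if (by_uuid.find(id.uuid) == by_uuid.end())
                throw std::runtime_error("UNKNOWN_TABLE");
            return id.uuid;
        }
        auto it = by_name.find(Name(id.database, id.table));
        if (it == by_name.end())
            throw std::runtime_error("UNKNOWN_TABLE");
        return it->second;
    }

private:
    std::map<Name, int> by_name;
    std::map<int, Name> by_uuid;
};

// The ID is resolved (UUID stamped) before the exchange, then used after it.
bool staleUuidLookupThrows()
{
    ToyCatalog catalog;
    catalog.create("db", "target", 1);
    ToyStorageID resolved{"db", "target", true, 1};
    catalog.replaceTarget("db", "target", 2);
    try { catalog.getTable(resolved); }
    catch (const std::runtime_error &) { return true; }
    return false;
}

// The same sequence, but the lookup ignores the UUID and goes by name.
int nameOnlyLookupAfterExchange()
{
    ToyCatalog catalog;
    catalog.create("db", "target", 1);
    catalog.replaceTarget("db", "target", 2);
    ToyStorageID name_only{"db", "target", false, 0};
    return catalog.getTable(name_only);
}
```

In this model the stale-UUID lookup throws, while the name-only lookup returns the replacement table, which is the behavior the change relies on.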

@clickhouse-gh
Contributor

clickhouse-gh Bot commented Apr 15, 2026

Workflow [PR], commit [8bd163b]


AI Review

Summary

This PR fixes a real race in refreshable materialized views where getAndLockTargetTable could receive a stale UUID and throw UNKNOWN_TABLE before the retry loop had a chance to recover. The change to resolve by name (database + table) in this path is consistent with the function’s retry design and with concurrent EXCHANGE TABLES behavior, and the updated stateless test now exercises reads while auto-refresh remains active. High-level verdict: the fix looks correct and complete for the scoped issue.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
No large/binary files
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-improvement label Apr 15, 2026
@tiandiwonder tiandiwonder self-assigned this Apr 15, 2026
Contributor

@tiandiwonder tiandiwonder left a comment


Changelog category to Bug Fix?

@seva-potapov
Contributor Author

Changelog category to Bug Fix?

No, because it's too hard to reproduce in a test, and it would fail verification on unpatched ClickHouse.

Comment on lines +1102 to +1109
/// Look up by name, not UUID. The caller (IdentifierResolver) may pass a UUID-bearing StorageID
/// from resolveStorageID, but that UUID can be stale: EXCHANGE TABLES during refresh replaces
/// the target table with a new UUID, and then the old table is dropped. If we pass the stale
/// UUID to getTable, it throws UNKNOWN_TABLE via the tryGetByUUID shortcut in getTableImpl,
/// bypassing our retry loop. Name-based lookup is safe: the database mutex is held across
/// applyState renames, so getTable blocks during the rename and resolves correctly afterward.
StorageID name_only_id(storage_id.database_name, storage_id.table_name);

Contributor


Would a shorter comment be better?

/// Look up by name (ignore UUID). During refresh, EXCHANGE TABLES may replace the target table
/// with a new UUID; using a stale UUID can fail fast and bypass the retry loop below.
StorageID name_only_id(storage_id.database_name, storage_id.table_name);

@clickhouse-gh
Contributor

clickhouse-gh Bot commented Apr 15, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.10% +0.00%
Functions 90.90% 90.90% +0.00%
Branches 76.60% 76.60% +0.00%

Changed lines: 75.00% (6/8)

Member

@tavplubix tavplubix left a comment


@seva-potapov
Contributor Author

@tavplubix getAndLockTargetTable is reached via RefreshSet::tryGetTaskForInnerTable, whose InnerTableMap is already keyed by DatabaseAndTableNameHash/Equal (RefreshSet.h:77), so we got here by name in the first place.

The function's own doc comment (line 1094) starts with "Get table by name", and the retry loop is documented as "retry until we see a different table by the same name". The UUID shortcut in getTableImpl was simply preventing that loop from running.

There's also precedent in this file for stripping the UUID when operating by name: exchangeTargetTable does exactly that at StorageMaterializedView.cpp:644.
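
As a rough sketch of the "retry until we see a different table by the same name" idea, the toy loop below re-resolves the name on each attempt. It is an illustration under assumptions, not the real getAndLockTargetTable: `ToyTable`, `getAndLockByName`, the `dropped` flag, and the `lookup_by_name` callback are hypothetical, and the limit of 10 attempts is taken from the "10-retry loop" mentioned in the PR description.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>

// Toy stand-in for a storage object; not a ClickHouse type.
struct ToyTable
{
    int uuid;
    bool dropped; // true if the table was dropped after we looked it up
};
using ToyTablePtr = std::shared_ptr<ToyTable>;

// Re-resolve the name on every attempt. A lookup by name can return a
// different table each time, which is what lets the loop make progress.
ToyTablePtr getAndLockByName(const std::function<ToyTablePtr()> & lookup_by_name, int max_attempts = 10)
{
    for (int attempt = 0; attempt < max_attempts; ++attempt)
    {
        ToyTablePtr table = lookup_by_name();
        if (!table)
            throw std::runtime_error("UNKNOWN_TABLE");
        if (!table->dropped)
            return table; // treated as "locked" in this toy model
        // The table was exchanged and dropped concurrently; look the name up again.
    }
    throw std::runtime_error("target table kept changing");
}

// Scenario: the first lookup returns the old, already dropped table,
// and the second returns its replacement under the same name.
int uuidLockedAfterOneExchange()
{
    auto old_table = std::make_shared<ToyTable>(ToyTable{1, true});
    auto new_table = std::make_shared<ToyTable>(ToyTable{2, false});
    int calls = 0;
    auto lookup = [&]() -> ToyTablePtr { return ++calls == 1 ? old_table : new_table; };
    return getAndLockByName(lookup)->uuid;
}
```

If the lookup were pinned to the old UUID instead, every attempt would fail the same way, so the loop only helps when each iteration resolves the name afresh.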


Labels

pr-improvement Pull request with some product improvements


3 participants