lookup table by name not uuid that can be stale#102724
lookup table by name not uuid that can be stale#102724seva-potapov wants to merge 1 commit intoClickHouse:masterfrom
Conversation
|
Workflow [PR], commit [8bd163b] Summary: ✅ AI ReviewSummaryThis PR fixes a real race in refreshable materialized views where ClickHouse Rules
Final Verdict
|
tiandiwonder
left a comment
There was a problem hiding this comment.
Changelog category to Bug Fix?
no, because it's too hard to replicate in test and it would fail verification on unpatched clickhouse |
| /// Look up by name, not UUID. The caller (IdentifierResolver) may pass a UUID-bearing StorageID | ||
| /// from resolveStorageID, but that UUID can be stale: EXCHANGE TABLES during refresh replaces | ||
| /// the target table with a new UUID, and then the old table is dropped. If we pass the stale | ||
| /// UUID to getTable, it throws UNKNOWN_TABLE via the tryGetByUUID shortcut in getTableImpl, | ||
| /// bypassing our retry loop. Name-based lookup is safe: the database mutex is held across | ||
| /// applyState renames, so getTable blocks during the rename and resolves correctly afterward. | ||
| StorageID name_only_id(storage_id.database_name, storage_id.table_name); | ||
|
|
There was a problem hiding this comment.
better to have a short comment?
/// Look up by name (ignore UUID). During refresh, EXCHANGE TABLES may replace the target table
/// with a new UUID; using a stale UUID can fail fast and bypass the retry loop below.
StorageID name_only_id(storage_id.database_name, storage_id.table_name);
LLVM Coverage Report
Changed lines: 75.00% (6/8) · Uncovered code |
tavplubix
left a comment
There was a problem hiding this comment.
See https://github.com/ClickHouse/ClickHouse/pull/98678/changes#r3094050289
Resolving by name can cause other issues
|
@tavplubix getAndLockTargetTable is reached via RefreshSet::tryGetTaskForInnerTable, whose InnerTableMap is already keyed by DatabaseAndTableNameHash/Equal (RefreshSet.h:77), so we got here by name in the first place. The function's own doc comment (line 1094) starts with "Get table by name", and the retry loop is documented as "retry until we see a different table by the same name" and the UUID shortcut in getTableImpl was just preventing that loop from running. There's also precedent for stripping the UUID when operating by name in this file: exchangeTargetTable does exactly that at StorageMaterializedView.cpp:644 |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
during refresh of materialized view we lookup table by name instead of uuid that can be stale
Details:
IdentifierResolver::resolveTableIdentifier calls resolveStorageID which stamps a UUID onto the StorageID, then passes it to RefreshTask::getAndLockTargetTable. Inside that function, DatabaseCatalog::getTable takes the UUID shortcut in getTableImpl (line 386), which throws UNKNOWN_TABLE when the UUID is stale (after EXCHANGE TABLES + DROP of old table). The 10-retry loop, designed exactly for this race, never fires because the exception propagates out.