release-26.2: sql/catkv: refresh stale name->ID mappings in SystemDatabaseCache by dt · Pull Request #169139 · cockroachdb/cockroach

dt · 2026-04-27T15:55:39Z

Backport 3/3 commits from #169044.

/cc @cockroachdb/release

The per-node SystemDatabaseCache holds name→ID mappings for system database namespace entries for the lifetime of the process. Until now it treated the first observation of any name as authoritative: once a name was cached, subsequent observations of the same name with a different ID were silently dropped. The comment on the type even codified this as an assumption ("for a given name and a given cluster version, the name to ID mapping will never change for the life of process").

That assumption does not hold for PCR reader tenants. A reader tenant is bootstrapped with its own dynamically-allocated IDs for system tables that lack a fixed reserved-range ID (e.g. system.privileges), and then SetupOrAdvanceStandbyReaderCatalog rewrites the namespace to point those names at the externalized descriptors replicated from the source tenant, which carry the source's IDs.

Any cache entry populated between reader tenant startup and that rewrite — and the very first user authentication populates one for system.privileges via the synthetic privilege cache — pins the bootstrap-time ID forever. Since the rewrite also deletes the descriptor at the bootstrap-time ID, every subsequent name resolution that hits the cache returns an ID that no descriptor exists for, and lookups fail with resolved <name> to <id> but found no descriptor with id <id>. Restarting the reader tenant masks the bug because it discards the cache.

This change permits the cache to replace an existing entry when an incoming observation has the same name but a different ID and a strictly newer MVCC timestamp than the cached entry. The MVCC timestamp guard ensures a stale read at an older AOST cannot regress a fresher mapping, and the same check is repeated under the write lock in update so a concurrent fresher write cannot be clobbered by a slower stale one. The warm-up entries inserted at construction carry a zero MVCC timestamp, so any real KV-sourced observation dominates them, which is the desired behavior.

Resolves #153555.

Epic: none
Release note (bug fix): A physical cluster replication reader tenant no longer fails authentication and other queries with errors of the form resolved <name> to <id> but found no descriptor with id <id> after the reader tenant ingests a system table at an ID different from the one it was bootstrapped with. Previously, a per-node namespace cache could pin the bootstrap-time ID and require a tenant restart to recover.

Release justification: bug fix for PCR reader tenant authentication failures

The per-node SystemDatabaseCache holds name->ID mappings for system database namespace entries for the lifetime of the process. Until now it treated the first observation of any name as authoritative: once a name was cached, subsequent observations of the same name with a different ID were silently dropped. The comment on the type even codified this as an assumption ("for a given name and a given cluster version, the name to ID mapping will never change for the life of process"). That assumption does not hold for PCR reader tenants. A reader tenant is bootstrapped with its own dynamically-allocated IDs for system tables that lack a fixed reserved-range ID (e.g. system.privileges), and then SetupOrAdvanceStandbyReaderCatalog rewrites the namespace to point those names at the externalized descriptors replicated from the source tenant, which carry the source's IDs. Any cache entry populated between reader tenant startup and that rewrite -- and the very first user authentication populates one for system.privileges via the synthetic privilege cache -- pins the bootstrap-time ID forever. Since the rewrite also deletes the descriptor at the bootstrap-time ID, every subsequent name resolution that hits the cache returns an ID that no descriptor exists for, and lookups fail with "resolved <name> to <id> but found no descriptor with id <id>". Restarting the reader tenant masks the bug because it discards the cache. This change permits the cache to replace an existing entry when an incoming observation has the same name but a different ID and a strictly newer MVCC timestamp than the cached entry. The MVCC timestamp guard ensures a stale read at an older AOST cannot regress a fresher mapping, and the same check is repeated under the write lock in update so a concurrent fresher write cannot be clobbered by a slower stale one. The warm-up entries inserted at construction carry a zero MVCC timestamp, so any real KV-sourced observation dominates them, which is the desired behavior. Resolves cockroachdb#153555. Epic: none Release note (bug fix): A physical cluster replication reader tenant no longer fails authentication and other queries with errors of the form "resolved <name> to <id> but found no descriptor with id <id>" after the reader tenant ingests a system table at an ID different from the one it was bootstrapped with. Previously, a per-node namespace cache could pin the bootstrap-time ID and require a tenant restart to recover.

Reverts 767fcb6 ("roachtest: delake pcr roachtests that offset reader tenant system tables") and 20ea8e1 ("roachest: ensure reader tenant accepts connections before restart"), which together restarted the reader tenant in PCR roachtests after the standby read poller had advanced. The restart was a workaround for the SystemDatabaseCache bug fixed in the previous commit: a stale name->ID mapping for system.privileges would survive in the per-node cache and cause queries to fail with "resolved privileges to <id> but found no descriptor". Restarting the tenant discarded the cache. With the cache itself refreshed correctly, the workaround is no longer needed and the test goes back to exercising the same flow a customer would see. Epic: none Release note: none

The doctype comment still claimed that the name->ID mapping never changes for the life of the process. The previous commit relaxed that to permit replacement when an incoming observation carries a different ID at a strictly newer MVCC timestamp. Update the comment to match. Epic: none Release note: None

blathers-crl · 2026-04-27T15:55:43Z

Thanks for opening a backport.

Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate.

trunk-io · 2026-04-27T15:55:43Z

😎 Merged successfully - details.

cockroach-teamcity · 2026-04-27T15:56:01Z

This change is

spilchen

LGTM

dt added 3 commits April 27, 2026 15:50

dt requested a review from msbutler April 27, 2026 15:55

dt requested a review from a team as a code owner April 27, 2026 15:55

blathers-crl Bot added backport Label PR's that are backports to older release branches T-disaster-recovery labels Apr 27, 2026

msbutler requested a review from spilchen April 27, 2026 20:05

msbutler approved these changes Apr 27, 2026

View reviewed changes

spilchen approved these changes Apr 27, 2026

View reviewed changes

trunk-io Bot merged commit 5780c2b into cockroachdb:release-26.2 Apr 27, 2026
33 of 34 checks passed

celeste-cockroachdb Bot added the target-release-26.2.1 label Apr 27, 2026

dt deleted the backport26.2-169044 branch April 29, 2026 20:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release-26.2: sql/catkv: refresh stale name->ID mappings in SystemDatabaseCache#169139

release-26.2: sql/catkv: refresh stale name->ID mappings in SystemDatabaseCache#169139
trunk-io[bot] merged 3 commits into
cockroachdb:release-26.2from
dt:backport26.2-169044

dt commented Apr 27, 2026

Uh oh!

blathers-crl Bot commented Apr 27, 2026

Uh oh!

trunk-io Bot commented Apr 27, 2026 •

edited

Loading

Uh oh!

cockroach-teamcity commented Apr 27, 2026

Uh oh!

spilchen left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

dt commented Apr 27, 2026

Uh oh!

blathers-crl Bot commented Apr 27, 2026

Uh oh!

trunk-io Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cockroach-teamcity commented Apr 27, 2026

Uh oh!

spilchen left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

trunk-io Bot commented Apr 27, 2026 •

edited

Loading