Skip to content

Fix race in ReplicatedAccessStorage that could evict a just-created entity (UNKNOWN_ROLE)#106525

Open
Slach wants to merge 3 commits into
ClickHouse:masterfrom
Slach:fix_replicated_rbac_race_condition
Open

Fix race in ReplicatedAccessStorage that could evict a just-created entity (UNKNOWN_ROLE)#106525
Slach wants to merge 3 commits into
ClickHouse:masterfrom
Slach:fix_replicated_rbac_race_condition

Conversation

@Slach
Copy link
Copy Markdown
Contributor

@Slach Slach commented Jun 5, 2026

Closes: #106521

Fixes a TOCTOU race in ReplicatedAccessStorage that could temporarily make a just-created access entity unresolvable by name.

ZooKeeperReplicator::refreshEntities(all=false), run by the background watch thread, read the /uuid children list via getChildrenWatch without holding mutex, and only then took the lock to call memory_storage.removeAllExcept(snapshot). A concurrent insertEntity creates the entity's znode first and only then takes mutex to put the entity into memory_storage, so it could commit an entity after the snapshot was taken but before the watch thread acquired the lock. removeAllExcept then evicted the freshly inserted entity from the local cache. Any statement resolving it by name in that window failed with There is no role ... (UNKNOWN_ROLE), even on the same session that had just created the role (e.g. CREATE USER ... DEFAULT ROLE, CREATE QUOTA ... TO).

The fix acquires mutex before reading the children list, so the snapshot is consistent with the in-memory state. This matches the all=true branch, which already reads entities from ZooKeeper under the lock.

A probabilistic stress regression test is added to test_replicated_users: it concurrently creates a role and immediately references it by name while background churn keeps the /uuid children list volatile and many seeded entities widen the eviction window. The detection logic was validated against an unpatched clickhouse-server 26.3.4.11, where it reproduces UNKNOWN_ROLE within ~17s; with the fix it never fails.

Originally observed as a flaky clickhouse-backup TestRBAC failure: https://github.com/Altinity/clickhouse-backup/actions/runs/26991435518/job/79652420544

Changelog category (leave one):

  • Critical Bug Fix (crash, data loss, RBAC)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed a race in ReplicatedAccessStorage where the background watch thread could evict a just-created access entity (role, user, quota, settings profile) from the local cache, making it temporarily unresolvable by name and failing statements with UNKNOWN_ROLE ("There is no role ...") even on the session that had just created it.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Slach and others added 3 commits June 5, 2026 09:01
…ntity

`ZooKeeperReplicator::refreshEntities(all=false)`, run by the background
watch thread, read the `/uuid` children list via `getChildrenWatch` without
holding `mutex` and only then took the lock to call
`memory_storage.removeAllExcept(snapshot)`. A concurrent `insertEntity`
(which creates the znode first and only then takes `mutex` to put the entity
into `memory_storage`) could commit an entity after the snapshot was taken
but before the watch thread acquired the lock, so `removeAllExcept` evicted
the freshly inserted entity. Any statement resolving it by name in that
window failed with `There is no role ... (UNKNOWN_ROLE)`, even on the same
session that had just created the role (e.g. `CREATE USER ... DEFAULT ROLE`,
`CREATE QUOTA ... TO`).

Acquire `mutex` before reading the children list so the snapshot is
consistent with the in-memory state, matching the `all=true` branch which
already reads entities from ZooKeeper under the lock.

Reproducer and analysis: ClickHouse#106521
Originally observed as a flaky clickhouse-backup TestRBAC failure:
https://github.com/Altinity/clickhouse-backup/actions/runs/26991435518/job/79652420544

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…race

Stress test in `test_replicated_users`: concurrently create a role and
immediately reference it by name (`CREATE USER ... DEFAULT ROLE`,
`CREATE QUOTA ... TO`), with background churn keeping the `/uuid` children
list volatile and many seeded entities widening the eviction window. On the
buggy build the freshly created role is evicted from the local cache by the
background watch refresh and one of the referencing statements fails with
`There is no role ... (UNKNOWN_ROLE)`; with the fix it never does.

The detection logic was validated locally against an unpatched
clickhouse-server 26.3.4.11: it reproduces `UNKNOWN_ROLE` within ~17s.

Related: ClickHouse#106521

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Explain in the test why the stress driver is an in-container bash script
with background jobs and `wait`, rather than Python threads driving
`node.query` from the host (a host<->container round trip per iteration
would lower the iteration rate and weaken reproduction of this narrow
race) or `xargs -P` (only replaces the `& ... wait` fan-out while forcing
the worker bodies into a third level of nested quoting).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ReplicatedAccessStorage: freshly created role evicted from local cache by background watch refresh (TOCTOU race) -> UNKNOWN_ROLE

1 participant