UNIQUE KEY: index cache (7/N) #104282
ClickHouse-side `CacheBase` adapter that implements RocksDB's `Cache` API for the UNIQUE KEY index SST block cache. Includes Context registration, server-side wiring, server settings, ProfileEvents, and CurrentMetrics.

Settings (read by Server.cpp at startup + reload):
- `unique_key_index_cache_policy` (default `SLRU`)
- `unique_key_index_cache_size_bytes` (default `1 GiB`)
- `unique_key_index_cache_size_ratio` (default `0.5`)

Setting the size to 0 disables the cache; `getUniqueKeyIndexCache` returns nullptr. On `USE_ROCKSDB=0` builds the registration is a graceful no-op.

ProfileEvents:
- `UniqueKeyIndexCacheLookupMicroseconds` — combined Lookup + Insert wall-clock inside the adapter.
- `UniqueKeyIndexCacheHits` / `UniqueKeyIndexCacheMisses` — per-lookup hit/miss counters.

CurrentMetrics:
- `UniqueKeyIndexCacheBytes`, `UniqueKeyIndexCacheEntries`.

`updateUniqueKeyIndexCacheConfiguration` logs an INFO line when the configured cache size changes.

Tests (`gtest_unique_key_index_cache.cpp`): 5 cases pinning wrapper-only behaviour — insert/lookup with hit/miss counter assertions, slice-key erase, release-erase semantics, identity-aware replacement protection, and a concurrency smoke test. Tests that re-verify the dependency contracts (CacheBase eviction, atomic ref counting) are intentionally not included.

Reviewer: PR 10 is the first production reader consumer of this cache; the SST-reader-cache invalidation race helpers (removeWithToken*, replaceIfPresent) and their gtest coverage are out of scope here and travel with PR 10.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Workflow [PR], commit [ecea231]

AI Review Summary: ❌ Blockers

Findings:
- Tests
- ClickHouse Rules
- Performance & Safety

Final Verdict
Status: Minimum required action: enforce strict-capacity admission against pinned/non-evictable memory in
- `SetStrictCapacityLimit`: rename parameter from `v` (cpp) and `strict_capacity_limit` (header) to `value` in both, satisfying readability-inconsistent-declaration-parameter-name and avoiding a shadow with the same-name member.
- `Release()`: capture the erased outcome inside the predicate (which runs under the cache lock) instead of comparing two `count()` snapshots. Closes a race where an unrelated concurrent insert/erase between the two `count()` calls could flip the return value.
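The pattern in the second bullet can be sketched outside CacheBase. This is a minimal illustration, not the adapter's real API: `MiniCache` and its members are hypothetical stand-ins showing why the "did we erase?" outcome must be recorded inside the critical section rather than inferred from before/after size snapshots.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

/// Hypothetical stand-in for a CacheBase-like container; it only illustrates
/// capturing the erase outcome while the lock is held.
class MiniCache
{
public:
    void insert(const std::string & key, std::shared_ptr<int> value)
    {
        std::lock_guard<std::mutex> lock(mutex);
        map[key] = std::move(value);
    }

    /// Erase `key` only if it still maps to `expected`. The outcome is
    /// captured under the lock, so a concurrent insert/erase elsewhere in the
    /// cache cannot flip the return value (unlike comparing two size()
    /// snapshots taken outside the lock).
    bool eraseIfIdentical(const std::string & key, const std::shared_ptr<int> & expected)
    {
        std::lock_guard<std::mutex> lock(mutex);
        auto it = map.find(key);
        if (it == map.end() || it->second != expected)
            return false;
        map.erase(it);
        return true;
    }

private:
    std::mutex mutex;
    std::map<std::string, std::shared_ptr<int>> map;
};
```

The same check also gives the identity-aware semantics described in the PR: a newer value inserted under the same key fails the `it->second != expected` comparison and survives.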
Unit tests (amd_llvm_coverage) saw both Hits/Misses deltas as 0 even though the asan/tsan/msan variants (with and without function_prop_fuzzer) all see the expected 1/1. The Lookup is happening (the ASSERT_NE / EXPECT_EQ on returned handles all pass on coverage too) — only the global_counters read returns 0. Coverage seems to leave the increments in a thread-local counters subtree that doesn't propagate to global. Gate the delta assertions behind !WITH_COVERAGE; six other Unit-test configs validate the wiring.
Mirror rocksdb::Cache's strict-capacity contract: refuse the standalone when strict_capacity_limit is set, the requested charge exceeds the cap, and the caller did not opt into an uncharged handle. With allow_uncharged=true (the typical RocksDB caller) the standalone handle is returned uncharged — consistent with the spec; CacheBase has no native standalone-tracked path for proper charging.
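That admission rule can be sketched as a pure decision function. The enum and function names below are illustrative, not the adapter's real API: under strict capacity, an over-cap standalone is refused unless the caller opted into an uncharged handle.

```cpp
#include <cassert>
#include <cstddef>

enum class Admission { Refused, Charged, Uncharged };

/// Illustrative decision function mirroring the described contract for a
/// standalone insert under rocksdb-style strict capacity: refuse when strict
/// mode is on, the charge exceeds the cap, and the caller did not allow an
/// uncharged handle; otherwise admit (uncharged if over the cap).
Admission admitStandalone(bool strict_capacity_limit, std::size_t charge,
                          std::size_t capacity, bool allow_uncharged)
{
    if (strict_capacity_limit && charge > capacity)
        return allow_uncharged ? Admission::Uncharged : Admission::Refused;
    return Admission::Charged;
}
```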
GetPinnedUsage now reports the sum of charges of currently-pinned entries (a per-entry pin_count gates the 0->1 add and the 1->0 remove on the shared total), matching the rocksdb::Cache spec. This closes the strict-cap bypass where pinned-but-evicted entries escaped both GetUsage and GetPinnedUsage.

CreateStandalone in the strict + over-cap + allow_uncharged=true path now sets effective_charge=0, so GetCharge / GetUsage(handle) / GetPinnedUsage all consistently report 0.

The Release(erase=true) predicate also requires pin_count==0 under the backing lock. This narrows the Lookup-vs-erase race to a small residual window where T2's get and pin_count.fetch_add are not jointly under the backing lock; closing that fully needs a CacheBase API extension and is out of scope here.
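The pin-count gating can be sketched as follows. `Entry`, `pin`, and `unpin` are illustrative names assuming a single shared pinned-usage counter: only the 0->1 and 1->0 transitions touch the shared total, so nested pins of the same entry are charged once.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

/// Illustrative cache entry: `charge` is fixed at insert time, `pin_count`
/// tracks outstanding handles.
struct Entry
{
    std::size_t charge = 0;
    std::atomic<std::size_t> pin_count{0};
};

/// Shared total reported by a GetPinnedUsage-style accessor.
std::atomic<std::size_t> pinned_charge_total{0};

void pin(Entry & entry)
{
    /// Only the 0 -> 1 transition adds the charge to the shared total.
    if (entry.pin_count.fetch_add(1, std::memory_order_acq_rel) == 0)
        pinned_charge_total.fetch_add(entry.charge, std::memory_order_relaxed);
}

void unpin(Entry & entry)
{
    /// Only the 1 -> 0 transition removes the charge again.
    if (entry.pin_count.fetch_sub(1, std::memory_order_acq_rel) == 1)
        pinned_charge_total.fetch_sub(entry.charge, std::memory_order_relaxed);
}
```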
```cpp
/// over the limit briefly during eviction which is fine for BLOCK cache use.
if (strict_capacity_limit.load(std::memory_order_relaxed))
{
    const size_t max = backing->maxSizeInBytes();
```
SetStrictCapacityLimit(true) is still bypassable with pinned handles because Insert only rejects when charge > capacity, but never accounts for memory already pinned outside backing.
Concrete trace with capacity=64:
1. `Insert(k1, charge=64, &h1)` succeeds (`GetPinnedUsage() == 64`).
2. `EraseUnRefEntries()` clears `backing`, but `h1` still keeps 64 bytes pinned.
3. `Insert(k2, charge=64, nullptr)` also succeeds (charge <= capacity), so live memory becomes ~128 bytes under strict mode.
That violates RocksDB strict-capacity semantics (inserts should fail when non-evictable pinned usage leaves insufficient room). Please include pinned memory in admission control, e.g. reject when charge + pinned_charge_total (+ current usage if needed) > capacity, and add a regression test for this scenario.
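The requested check could look roughly like this. It is a sketch with illustrative names (the real adapter would read these values under its lock): admission fails when the new charge plus already-pinned, non-evictable memory would exceed the cap, which rejects the second insert in the capacity=64 trace above.

```cpp
#include <cassert>
#include <cstddef>

/// Illustrative strict-capacity admission. `pinned_outside_backing` is memory
/// kept alive by pinned handles that the backing cache no longer tracks
/// (e.g. after EraseUnRefEntries evicted the entries).
bool admitUnderStrictCapacity(std::size_t charge, std::size_t usage_in_backing,
                              std::size_t pinned_outside_backing, std::size_t capacity)
{
    return charge + usage_in_backing + pinned_outside_backing <= capacity;
}
```

A regression test for the trace would assert that the first insert is admitted and the second, attempted while 64 pinned bytes have escaped the backing cache, is not.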
LLVM Coverage Report
Changed lines: 90.62% (319/352) | lost baseline coverage: 7 line(s) · Uncovered code
Part of #103486
ClickHouse `CacheBase` adapter for the RocksDB block cache, plus `Context` and `Server` registration and the matching metrics. The adapter implements RocksDB's `Cache` API — `Insert`/`Lookup`/`Erase`/`Ref`/`Release`, capacity, and eviction — directly on top of `CacheBase`, so it is testable in isolation against the `Cache` API contract without an SST reader.

Scope:
- `UniqueKeyIndexCache` adapter with identity-aware release/erase semantics: `Release(handle, erase_if_last_ref=true)` removes the entry only if the cached pointer still matches the handle, so a newer entry inserted under the same key is not evicted.
- `unique_key_index_cache_policy` (default `SLRU`), `unique_key_index_cache_size_bytes` (default 1 GiB; `0` disables and `getUniqueKeyIndexCache` returns `nullptr`), `unique_key_index_cache_size_ratio` (default `0.5`).

Changelog category (leave one):
Changelog entry: N/A (incremental slice; user-facing changelog ships with the last PR in the stack once the full feature is reviewable).
Documentation entry for user-facing changes