
UNIQUE KEY: index cache (7/N) #104282

Draft

murphy-4o wants to merge 5 commits into ClickHouse:master from murphy-4o:uk-pr-7-dense-index-cache

Conversation

@murphy-4o
Member

Part of #103486

This PR adds a ClickHouse CacheBase adapter for the RocksDB block cache, plus Context and Server registration and the matching metrics. The adapter implements RocksDB's Cache API (Insert / Lookup / Erase / Ref / Release, capacity, and eviction) directly on top of CacheBase, so it can be tested in isolation against the Cache API contract, without an SST reader.
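For orientation, a minimal skeleton of the adapter shape described above. This is a sketch only: the Cache virtuals are shown in their classic (pre-8.x) RocksDB form, signatures vary by vendored RocksDB version, and `Entry` and the backing key type are illustrative placeholders.

```cpp
#include <rocksdb/cache.h>
#include <Common/CacheBase.h>

/// Sketch: a rocksdb::Cache whose storage, eviction, sizing and locking
/// all come from a ClickHouse CacheBase.
class UniqueKeyIndexCache : public rocksdb::Cache
{
public:
    rocksdb::Status Insert(
        const rocksdb::Slice & key, void * value, size_t charge,
        void (*deleter)(const rocksdb::Slice &, void *),
        Handle ** handle, Priority priority) override;

    Handle * Lookup(const rocksdb::Slice & key, rocksdb::Statistics * stats) override;
    bool Ref(Handle * handle) override;
    bool Release(Handle * handle, bool erase_if_last_ref) override;
    void Erase(const rocksdb::Slice & key) override;

    void SetCapacity(size_t capacity) override;
    void SetStrictCapacityLimit(bool value) override;
    size_t GetUsage() const override;
    size_t GetPinnedUsage() const override;
    /// ... remaining Cache virtuals elided ...

private:
    struct Entry;  /// value + charge + pin_count (assumed layout)
    std::shared_ptr<DB::CacheBase<UInt128, Entry>> backing;
    std::atomic<bool> strict_capacity_limit{false};
    std::atomic<size_t> pinned_charge_total{0};
};
```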

Scope:

  • UniqueKeyIndexCache adapter with identity-aware release / erase semantics: Release(handle, erase_if_last_ref=true) removes the entry only if the cached pointer still matches the handle, so a newer entry inserted under the same key is not evicted (see the sketch after this list).
  • Server settings: unique_key_index_cache_policy (default SLRU), unique_key_index_cache_size_bytes (default 1 GiB; 0 disables and getUniqueKeyIndexCache returns nullptr), unique_key_index_cache_size_ratio (default 0.5).
  • Context wiring
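
What the identity check in the first bullet guarantees from the caller's side, as a hedged sketch against the classic rocksdb::Cache calling convention (`key`, `old_block`, `new_block`, `deleter` are placeholders):

```cpp
rocksdb::Cache::Handle * h1 = nullptr;
cache->Insert(key, old_block, /*charge=*/size, deleter, &h1);     /// h1 pins old_block
cache->Insert(key, new_block, /*charge=*/size, deleter, nullptr); /// replacement under the same key
cache->Release(h1, /*erase_if_last_ref=*/true);  /// cached pointer != h1's entry -> nothing erased
assert(cache->Lookup(key) != nullptr);           /// new_block is still cached
```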

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry: N/A (incremental slice; user-facing changelog ships with the last PR in the stack once the full feature is reviewable).

Documentation entry for user-facing changes

  • Documentation will land with the user-facing slice.

ClickHouse-side `CacheBase` adapter that implements RocksDB's `Cache`
API for the UNIQUE KEY index SST block cache. Includes Context
registration, server-side wiring, server settings, ProfileEvents, and
CurrentMetrics.

Settings (read by Server.cpp at startup + reload):
- `unique_key_index_cache_policy` (default `SLRU`)
- `unique_key_index_cache_size_bytes` (default `1 GiB`)
- `unique_key_index_cache_size_ratio` (default `0.5`)

Setting size to 0 disables the cache; `getUniqueKeyIndexCache` returns
nullptr. On `USE_ROCKSDB=0` builds the registration is a graceful no-op.
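
A hedged sketch of the wiring; the Context setter and the table-options hand-off are assumed names for illustration, not the actual patch:

```cpp
/// Server.cpp-style read at startup and config reload (Poco configuration API):
const std::string policy = config.getString("unique_key_index_cache_policy", "SLRU");
const size_t size_bytes  = config.getUInt64("unique_key_index_cache_size_bytes", 1ULL << 30);
const double size_ratio  = config.getDouble("unique_key_index_cache_size_ratio", 0.5);
global_context->setUniqueKeyIndexCache(policy, size_bytes, size_ratio);  /// assumed setter; nothing is created when size_bytes == 0

/// Consumer side: nullptr simply means "cache disabled".
if (auto cache = context->getUniqueKeyIndexCache())
    table_options.block_cache = cache;  /// e.g. rocksdb::BlockBasedTableOptions::block_cache
```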

ProfileEvents:
- `UniqueKeyIndexCacheLookupMicroseconds` — combined Lookup + Insert
  wall-clock inside the adapter.
- `UniqueKeyIndexCacheHits` / `UniqueKeyIndexCacheMisses` — per-lookup
  hit/miss counters.
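
Roughly how those counters are wired, sketched; the same Stopwatch pattern is assumed around Insert, since the event covers both calls:

```cpp
rocksdb::Cache::Handle * UniqueKeyIndexCache::Lookup(const rocksdb::Slice & key, rocksdb::Statistics *)
{
    Stopwatch watch;  /// one timer per adapter call; Insert is timed the same way
    auto mapped = backing->get(toBackingKey(key));  /// toBackingKey is an assumed helper
    ProfileEvents::increment(mapped ? ProfileEvents::UniqueKeyIndexCacheHits
                                    : ProfileEvents::UniqueKeyIndexCacheMisses);
    ProfileEvents::increment(ProfileEvents::UniqueKeyIndexCacheLookupMicroseconds,
                             watch.elapsedMicroseconds());
    return mapped ? wrapAndPin(mapped) : nullptr;  /// wrapAndPin is an assumed helper
}
```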

CurrentMetrics:
- `UniqueKeyIndexCacheBytes`, `UniqueKeyIndexCacheEntries`.

`updateUniqueKeyIndexCacheConfiguration` logs an INFO line when the
configured cache size changes.

Tests (`gtest_unique_key_index_cache.cpp`): 5 cases pinning wrapper-only
behaviour — insert/lookup with hit/miss counter assertion, slice-key
erase, release-erase semantics, identity-aware replacement protection,
concurrency smoke. Tests that re-verify the dependency contracts
(CacheBase eviction, atomic ref counting) are intentionally not included.

Note for reviewers: PR 10 is the first production read-path consumer of
this cache; the SST-reader-cache invalidation race helpers
(removeWithToken*, replaceIfPresent) and their gtest coverage are out of
scope here and travel with PR 10.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@clickhouse-gh
Contributor

clickhouse-gh Bot commented May 7, 2026

Workflow [PR], commit [ecea231]

AI Review

Summary

This PR adds a UniqueKeyIndexCache adapter that exposes CacheBase through RocksDB Cache, wires it into server/context settings, and adds unit tests. I found one correctness issue in strict-capacity handling: inserts can still exceed configured capacity when prior entries remain pinned.

Findings

❌ Blockers

  • [src/Storages/MergeTree/UniqueKey/UniqueKeyIndexCache.cpp:109] SetStrictCapacityLimit(true) is bypassable because Insert only checks charge > capacity and ignores already pinned memory (pinned_charge_total). With pinned handles alive, a new insert can be admitted and push live memory beyond capacity under strict mode.
    Suggested fix: include pinned/non-evictable memory in admission control (e.g., reject when charge + pinned_charge_total (+ current usage if needed) > capacity) and add a regression test for this scenario; a sketch of the check follows below.
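
A minimal sketch of that admission check; member names follow the review's terminology and the status choice is illustrative:

```cpp
/// Inside Insert(): under strict capacity, admission must count pinned
/// (non-evictable) memory, not just the incoming charge.
if (strict_capacity_limit.load(std::memory_order_relaxed))
{
    const size_t capacity = backing->maxSizeInBytes();
    const size_t pinned = pinned_charge_total.load(std::memory_order_relaxed);
    if (charge > capacity || charge + pinned > capacity)
        return rocksdb::Status::MemoryLimit("insert would exceed strict cache capacity");
}
```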
Tests
  • ⚠️ Add a unit test that pins one entry at full capacity, removes backing refs, then verifies a second Insert fails under strict capacity (SetStrictCapacityLimit(true)), to prevent regressions of strict-limit enforcement.
ClickHouse Rules

| Item | Status | Notes |
| --- | --- | --- |
| Deletion logging |  |  |
| Serialization versioning |  |  |
| Core-area scrutiny |  |  |
| No test removal |  |  |
| Experimental gate |  |  |
| No magic constants |  |  |
| Backward compatibility |  |  |
| SettingsChangesHistory.cpp |  |  |
| PR metadata quality |  |  |
| Safe rollout |  |  |
| Compilation time |  |  |
| No large/binary files |  |  |
Performance & Safety
  • Strict-capacity admission currently ignores pinned memory and can exceed configured cache memory under read workloads with long-lived pinned handles.
Final Verdict

Status: ⚠️ Request changes

Minimum required action: enforce strict-capacity admission against pinned/non-evictable memory in UniqueKeyIndexCache::Insert and cover it with a regression test.

@clickhouse-gh clickhouse-gh Bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label May 7, 2026
Comment thread src/Storages/MergeTree/UniqueKey/UniqueKeyIndexCache.cpp Outdated
murphy-4o added 2 commits May 7, 2026 18:05
- SetStrictCapacityLimit: rename the parameter from `v` (cpp) and
  `strict_capacity_limit` (header) to `value` in both, satisfying
  clang-tidy's readability-inconsistent-declaration-parameter-name and
  avoiding a shadow of the same-named member.
- Release(): capture the erased outcome inside the predicate (which
  runs under the cache lock) instead of comparing two count() snapshots.
  Closes a race where an unrelated concurrent insert/erase between the
  two count() calls could flip the return value; a before/after sketch
  follows.
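
Before and after, sketched; the predicate-taking remove on CacheBase is assumed from the commit's description, and `expected_entry` is a placeholder:

```cpp
/// Before (racy): infer erasure from two size snapshots. An unrelated
/// concurrent insert/erase between the count() calls flips the result.
const size_t before = backing->count();
backing->remove(key);
const bool erased_racy = backing->count() != before;

/// After: capture the outcome inside the predicate, which runs under the
/// cache lock, so the answer is exact.
bool erased = false;
backing->remove(key, [&](const auto & cached)
{
    erased = (cached.get() == expected_entry);  /// identity check, not just key match
    return erased;
});
```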
Unit tests (amd_llvm_coverage) saw both Hits/Misses deltas as 0 even
though the asan/tsan/msan variants (with and without function_prop_fuzzer)
all saw the expected 1/1. The Lookup does happen (the ASSERT_NE / EXPECT_EQ
on returned handles pass on coverage too); only the global_counters
read returns 0. Coverage appears to leave the increments in a thread-local
counters subtree that does not propagate to global. Gate the delta
assertions behind !WITH_COVERAGE; six other Unit-test configs validate
the wiring.
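
The gating pattern, sketched; the exact macro spelling in the tree may differ:

```cpp
#if !defined(WITH_COVERAGE)
    /// Delta assertions run on non-coverage builds only; the handle checks still run everywhere.
    EXPECT_EQ(hits_after - hits_before, 1);
    EXPECT_EQ(misses_after - misses_before, 1);
#endif
```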
Comment thread src/Storages/MergeTree/UniqueKey/UniqueKeyIndexCache.cpp
Mirror rocksdb::Cache's strict-capacity contract: refuse the standalone
when strict_capacity_limit is set, the requested charge exceeds the cap,
and the caller did not opt into an uncharged handle. With
allow_uncharged=true (the typical RocksDB caller) the standalone handle
is returned uncharged — consistent with the spec; CacheBase has no
native standalone-tracked path for proper charging.
Comment thread src/Storages/MergeTree/UniqueKey/UniqueKeyIndexCache.cpp Outdated
GetPinnedUsage now reports the sum of charges of currently-pinned
entries (a per-entry pin_count gates the 0->1 add and the 1->0 remove on
the shared total), matching the rocksdb::Cache spec. Closes the
strict-cap bypass where pinned-but-evicted entries escaped both GetUsage
and GetPinnedUsage.
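
The 0->1 / 1->0 gating, sketched with assumed field names; the real code must additionally coordinate these transitions with eviction:

```cpp
/// First pin adds the entry's charge to the shared pinned total; dropping
/// the last pin removes it. fetch_add/fetch_sub return the previous value.
void UniqueKeyIndexCache::pin(Entry & entry)
{
    if (entry.pin_count.fetch_add(1, std::memory_order_acq_rel) == 0)
        pinned_charge_total.fetch_add(entry.charge, std::memory_order_relaxed);
}

void UniqueKeyIndexCache::unpin(Entry & entry)
{
    if (entry.pin_count.fetch_sub(1, std::memory_order_acq_rel) == 1)
        pinned_charge_total.fetch_sub(entry.charge, std::memory_order_relaxed);
}

size_t UniqueKeyIndexCache::GetPinnedUsage() const
{
    return pinned_charge_total.load(std::memory_order_relaxed);
}
```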

CreateStandalone in the strict + over-cap + allow_uncharged=true path
now sets effective_charge=0 so GetCharge / GetUsage(handle) /
GetPinnedUsage all consistently report 0.

The Release(erase=true) predicate also requires pin_count==0 under the
backing lock. This narrows the Lookup-vs-erase race to a small residual
window where T2's get and pin_count.fetch_add are not jointly under
the backing lock; closing that fully needs a CacheBase API extension
and is out of scope here.
```cpp
/// over the limit briefly during eviction which is fine for BLOCK cache use.
if (strict_capacity_limit.load(std::memory_order_relaxed))
{
    const size_t max = backing->maxSizeInBytes();
```
Contributor

SetStrictCapacityLimit(true) is still bypassable with pinned handles because Insert only rejects when charge > capacity, but never accounts for memory already pinned outside backing.

Concrete trace with capacity=64:

  1. Insert(k1, charge=64, &h1) succeeds (GetPinnedUsage()==64).
  2. EraseUnRefEntries() clears backing, but h1 still keeps 64 bytes pinned.
  3. Insert(k2, charge=64, nullptr) also succeeds (charge <= capacity), so live memory becomes ~128 bytes under strict mode.

That violates RocksDB strict-capacity semantics (inserts should fail when non-evictable pinned usage leaves insufficient room). Please include pinned memory in admission control, e.g. reject when charge + pinned_charge_total (+ current usage if needed) > capacity, and add a regression test for this scenario (sketched below).
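
A regression-test sketch for that trace, against the classic Insert signature; `makeCache`, `key`, `block`, and `deleter` are placeholders, and the real fixture will differ:

```cpp
TEST(UniqueKeyIndexCache, StrictCapacityCountsPinnedMemory)
{
    auto cache = makeCache(/*capacity=*/64);  /// placeholder factory
    cache->SetStrictCapacityLimit(true);

    rocksdb::Cache::Handle * h1 = nullptr;
    ASSERT_TRUE(cache->Insert(key("k1"), block(64), /*charge=*/64, deleter, &h1).ok());
    cache->EraseUnRefEntries();  /// backing is cleared, but h1 still pins 64 bytes

    /// With 64 bytes pinned at a 64-byte cap, a second full-size insert must fail.
    EXPECT_FALSE(cache->Insert(key("k2"), block(64), /*charge=*/64, deleter, nullptr).ok());
    EXPECT_EQ(cache->GetPinnedUsage(), 64u);

    cache->Release(h1);
}
```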

@clickhouse-gh
Contributor

clickhouse-gh Bot commented May 8, 2026

LLVM Coverage Report

| Metric | Baseline | Current | Δ |
| --- | --- | --- | --- |
| Lines | 84.10% | 84.10% | +0.00% |
| Functions | 91.10% | 91.10% | +0.00% |
| Branches | 76.60% | 76.60% | +0.00% |

Changed lines: 90.62% (319/352) | lost baseline coverage: 7 line(s) · Uncovered code

Full report · Diff report
