Replace collision-chain eviction with global LRU in#97867

Open
crakjie wants to merge 1 commit into ClickHouse:master from crakjie:cache-dictionary-clock-lru

crakjie commented Feb 24, 2026

Description

The previous implementation coupled storage position to key hash, limiting
eviction to a 10-cell collision chain. Frequently accessed keys could be
evicted while cold keys in other chains survived.

Decouple storage from hashing by using a for key-to-slot lookup
and a flat array for cell storage. Eviction now uses a clock algorithm
(GCLOCK) that sweeps all cells globally, giving recently accessed entries
a higher survival count. This provides O(1) amortized eviction with
approximate LRU semantics.
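
The design described above can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: the class name `ClockCache`, its methods, the `max_count` cap of 3, and the choice to start new entries at a count of 0 are all assumptions made for illustration, and `std::unordered_map` stands in for whatever key-to-slot map the patch actually uses. The sketch also assumes a capacity greater than zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Illustrative GCLOCK cache. All names here are hypothetical and are not
// taken from the CacheDictionary implementation. Assumes capacity > 0.
class ClockCache
{
    struct Cell
    {
        uint64_t key = 0;
        int64_t value = 0;
        uint8_t count = 0;      // survival count, bumped on every hit
        bool occupied = false;
    };

public:
    explicit ClockCache(size_t capacity, uint8_t max_count_ = 3)
        : cells(capacity), max_count(max_count_) {}

    // A hit raises the cell's survival count, up to max_count.
    std::optional<int64_t> get(uint64_t key)
    {
        auto it = index.find(key);
        if (it == index.end())
            return std::nullopt;
        Cell & cell = cells[it->second];
        if (cell.count < max_count)
            ++cell.count;
        return cell.value;
    }

    // Insert or overwrite. A new key takes the slot chosen by the clock hand,
    // so the slot does not depend on the key's hash.
    void put(uint64_t key, int64_t value)
    {
        if (auto it = index.find(key); it != index.end())
        {
            cells[it->second].value = value;
            return;
        }
        size_t slot = findVictim();
        Cell & cell = cells[slot];
        if (cell.occupied)
            index.erase(cell.key);
        cell = Cell{key, value, 0, true};
        index[key] = slot;
    }

    bool contains(uint64_t key) const { return index.count(key) != 0; }

private:
    // Sweep the flat array globally: take the first empty or zero-count cell,
    // decrementing the count of every occupied cell passed on the way.
    size_t findVictim()
    {
        while (true)
        {
            size_t slot = hand;
            hand = (hand + 1) % cells.size();
            Cell & cell = cells[slot];
            if (!cell.occupied || cell.count == 0)
                return slot;
            --cell.count;
        }
    }

    std::vector<Cell> cells;                      // flat cell storage
    uint8_t max_count;                            // cap on the survival count
    size_t hand = 0;                              // clock hand position
    std::unordered_map<uint64_t, size_t> index;   // key-to-slot lookup
};
```

With a capacity of 3, inserting keys 1, 2 and 3, reading key 1 twice, and then inserting key 4 evicts key 2 in this sketch: the hand passes key 1 (decrementing its count) and stops at the first cell whose count is zero. Each sweep step consumes one count that an earlier hit added, which is where the amortized O(1) bound comes from.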

Changelog category (leave one):

  • Improvement

Changelog entry :

Replace collision-chain eviction with approximate LRU eviction in CacheDictionary. Which entries get evicted should now be more predictable.

Documentation entry for user-facing changes

When an entry in CacheDictionary is accessed, it is updated to keep track of recent usage. When a new entry needs room, the cache evicts one of the least recently used keys.

  • Motivation:
    With the previous implementation, a frequently accessed key can be evicted simply because its collision chain is full of other keys, while completely cold keys in other chains remain untouched. In other words, the cache does not provide LRU semantics: access frequency has little influence on which entries survive.

CLAassistant commented Feb 24, 2026

CLA assistant check
All committers have signed the CLA.

crakjie (Author) commented Feb 24, 2026

closes #62178

@crakjie crakjie force-pushed the cache-dictionary-clock-lru branch from 656ed00 to c9b7730 Compare February 24, 2026 17:13
@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Feb 25, 2026
clickhouse-gh bot (Contributor) commented Feb 25, 2026

Workflow [PR], commit [c9b7730]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| AST fuzzer (arm_asan) | | failure | | |
| | Logical error: 'std::exception. Code: 1001, type: std::out_of_range, e.what() = vector (version 26.3.1.13), Stack trace: (STID: 2508-3132) | FAIL | cidb | |
| BuzzHouse (arm_asan) | | failure | | |
| | Logical error: Stream A for column B with type C is not found (STID: 3190-573d) | FAIL | cidb | |

@clickhouse-gh clickhouse-gh bot added the pr-improvement Pull request with some product improvements label Feb 25, 2026
crakjie (Author) commented Mar 5, 2026

Can someone explain to me what the AST fuzzer and BuzzHouse are, and how to deal with these failures?

