OAK-12216: clear segment cache after successful compaction #2904
rishabhdaim wants to merge 4 commits into
The build is failing due to a resource leak in the DocumentNodeStoreIT test. To be fixed by #2905.
What the AI flagged (I'm not familiar with this part of the code, so I would need some time to read up on it before giving a proper review; since this was flagged as the riskiest PR, I'm adding it anyway): after a successful compaction (Oak's GC), this PR throws away the entire L2 cache so that the new generation of segments can fill it from scratch. Why it's problematic:
Thanks @ChlineSaurus for the review. I asked for a re-review of the AI-suggested review; the findings are below. Verdict on each AI claim:
1. Mostly correct, but overstated. DefaultCleanupStrategy.cleanup() (line 41) does unconditionally call context.getSegmentCache().clear(), and the PR adds a second clear in notifyCompactionSucceeded → However, "strictly worse" overstates it. The first clear's intent is legitimate: evict old-gen segments before new-gen reads start filling the cache, so that new-gen content doesn't compete with stale entries. The
2. Misleading. That comment is in CleanupFirstCompactionStrategy.java:60 and refers to the pre-compaction cleanup phase, when the new generation hasn't been committed yet and transient (in-flight) segments exist.
3. Technically true, but practically negligible. The cacheManager.evictOldGeneration() call happens inside compactionSucceeded(), which fires after the new code's cache clear. But after setHead() commits the new
4. Valid. notifyCompactionSucceeded() is called from both compactionSucceeded() (line 288) and compactionPartiallySucceeded() (line 292). For partial success, segments from the new partial generation were just
5. Overstated. The putSegment() implementation calls id.loaded(segment) before cache.put(id, segment). If a concurrent clear() fires between those two calls, the L1 memoization (SegmentId.segment) gets cleared by

P.S.: The one claim worth acting on is 4: notifyCompactionSucceeded() should probably skip the cache clear when !compacted.isComplete() (partial success). Addressed here: d2bdaea
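The partial-success guard proposed in the P.S. can be sketched as follows. This is a minimal, hypothetical illustration, not the code in d2bdaea: SketchSegmentCache, SketchCompactedState, CompactionNotifier and CLEAR_CACHE_ENABLED are stand-ins invented for this sketch; only the names notifyCompactionSucceeded and isComplete() come from the discussion above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the segment cache; only put/size/clear are modelled.
final class SketchSegmentCache {
    private final Map<String, byte[]> entries = new ConcurrentHashMap<>();

    void put(String segmentId, byte[] data) { entries.put(segmentId, data); }

    int size() { return entries.size(); }

    void clear() { entries.clear(); }
}

// Hypothetical stand-in for the compaction result handed to the callback.
final class SketchCompactedState {
    private final boolean complete;

    SketchCompactedState(boolean complete) { this.complete = complete; }

    boolean isComplete() { return complete; }
}

// Hypothetical notifier; the toggle field name is illustrative only.
final class CompactionNotifier {
    static final AtomicBoolean CLEAR_CACHE_ENABLED = new AtomicBoolean(true);

    private final SketchSegmentCache cache;

    CompactionNotifier(SketchSegmentCache cache) { this.cache = cache; }

    /** Clears the cache only after a complete compaction; returns whether it cleared. */
    boolean notifyCompactionSucceeded(SketchCompactedState compacted) {
        if (!CLEAR_CACHE_ENABLED.get()) {
            return false; // toggle off: leave eviction to the cache's own policy
        }
        if (!compacted.isComplete()) {
            return false; // partial success: keep the partial generation's freshly cached segments
        }
        cache.clear();
        return true;
    }
}
```

Because the same callback serves both compactionSucceeded() and compactionPartiallySucceeded(), checking completeness inside it keeps the two call sites unchanged.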
jsedding left a comment
I am skeptical that clearing the cache after compaction is a good idea.
On a running system, there may be open sessions that reference a pre-compacted head state. Clearing the cache may therefore cause an unexpected surge in loads of old segments. Given that all segments from the new compacted generation need to also get into the cache at the same time, I believe that this may cause more trouble than it's worth.
I would expect Caffeine's normal eviction mechanism to handle the GC generation change-over more smoothly than this change could.
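The open-sessions concern can be illustrated with a toy model. This is a hypothetical sketch using a plain LRU built on LinkedHashMap, not Caffeine's W-TinyLFU and not Oak's segment cache; CountingLruCache and ClearAfterCompactionDemo are invented names. It only counts how many old-generation segments must be reloaded when sessions still pinned to the pre-compacted head keep reading after a clear().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Toy read-through LRU cache that counts how often the backing store is hit.
final class CountingLruCache<K, V> {
    private final Function<K, V> loader;
    private final Map<K, V> map;
    private int loads;

    CountingLruCache(int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder=true: iteration order is least-recently-used first.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict the LRU entry once over capacity
            }
        };
    }

    V get(K key) {
        V value = map.get(key);
        if (value == null) {
            value = loader.apply(key); // miss: load from the (simulated) segment store
            loads++;
            map.put(key, value);
        }
        return value;
    }

    void clear() { map.clear(); }

    int loads() { return loads; }
}

final class ClearAfterCompactionDemo {
    /** Loads caused by open sessions re-reading old-generation segments after compaction. */
    static int reloadsOfOldSegments(int oldSegments, boolean clearAfterCompaction) {
        CountingLruCache<String, String> cache =
                new CountingLruCache<>(2 * oldSegments, id -> "data:" + id);
        for (int i = 0; i < oldSegments; i++) {
            cache.get("old-" + i); // warm cache before compaction
        }
        int before = cache.loads();
        if (clearAfterCompaction) {
            cache.clear();
        }
        for (int i = 0; i < oldSegments; i++) {
            cache.get("old-" + i); // sessions still reading the pre-compacted head
        }
        return cache.loads() - before;
    }
}
```

With ample capacity, the sessions' reads are all hits unless the cache is cleared, in which case every one of them becomes a load, on top of the loads needed for the new generation.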
@jsedding Yes, I agree. Also, based on the benchmarks, the improvement applies only to the first few thousand entries.
After compaction, old-generation segments remain in the Caffeine cache with saturated W-TinyLFU frequency counts (freq=15). New-generation segments enter at freq=0 and are auto-rejected by the admission gate (threshold=5), causing a read freeze until they accumulate enough frequency. Calling cache.clear() immediately after compaction evicts the old-generation incumbents, so new-generation segments can fill the cache directly without competing against saturated sketch counts. The change is guarded by the FeatureToggle FT_CLEAR_CACHE_OAK-12216 (enabled by default, as this is bug-fix behaviour); disable it at runtime if needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
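The admission behaviour described in this commit message can be modelled in a few lines. This is a deliberately simplified, hypothetical model of TinyLFU admission, not Caffeine's implementation: TinyLfuAdmissionModel is an invented name, it uses exact per-key counters capped at 15 instead of a count-min sketch, it omits counter aging and the randomized admission Caffeine applies to moderately frequent candidates, and it does not model the threshold of 5 quoted above.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified TinyLFU admission: exact per-key counters saturating at 15
// (like 4-bit sketch counters), with no periodic aging.
final class TinyLfuAdmissionModel {
    static final int MAX_FREQ = 15;

    private final Map<String, Integer> freq = new HashMap<>();

    void recordAccess(String key) {
        // Increment, but never past the saturation cap.
        freq.merge(key, 1, (old, one) -> Math.min(MAX_FREQ, old + one));
    }

    int frequency(String key) {
        return freq.getOrDefault(key, 0);
    }

    /** The candidate replaces the eviction victim only if it has been seen strictly more often. */
    boolean admit(String candidate, String victim) {
        return frequency(candidate) > frequency(victim);
    }
}
```

In this model a new-generation key that reaches the cap can at best tie a saturated incumbent, so it is never admitted against one; real TinyLFU implementations periodically halve all counters so that old popularity decays.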
Replaces the standalone FeatureToggle field with a public AtomicBoolean (FT_OAK_12216_ENABLE) as the shared state, plus a string constant for the toggle name (FT_CLEAR_CACHE_OAK_12216). SegmentNodeStoreRegistrar creates a FeatureToggle wrapping that AtomicBoolean and registers it on the Whiteboard, so that runtime tooling (JMX) can discover and flip it without a code change. notifyCompactionSucceeded() reads FT_OAK_12216_ENABLE.get() directly, so checking the toggle on the hot compaction path costs only a single volatile read. AbstractCompactionStrategy's visibility is widened to public so that the registrar (in a different package) can reference the constants.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
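The shared-state pattern described in this commit can be sketched without Oak's classes. This is a hypothetical illustration: SketchCompactionStrategy, SketchToggle and SketchRegistry stand in for AbstractCompactionStrategy, Oak's FeatureToggle and the Whiteboard respectively, and do not reproduce their real APIs; only the two constant names come from the commit message.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the strategy class that owns the shared toggle state.
final class SketchCompactionStrategy {
    static final String FT_CLEAR_CACHE_OAK_12216 = "FT_CLEAR_CACHE_OAK_12216";
    static final AtomicBoolean FT_OAK_12216_ENABLE = new AtomicBoolean(true);

    static boolean shouldClearCache() {
        return FT_OAK_12216_ENABLE.get(); // single volatile read on the compaction path
    }
}

// Hypothetical toggle that wraps (rather than copies) the shared AtomicBoolean.
final class SketchToggle {
    private final String name;
    private final AtomicBoolean state;

    SketchToggle(String name, AtomicBoolean state) {
        this.name = name;
        this.state = state;
    }

    String name() { return name; }

    boolean isEnabled() { return state.get(); }

    void setEnabled(boolean enabled) { state.set(enabled); }
}

// Hypothetical registry standing in for the Whiteboard lookup used by runtime tooling.
final class SketchRegistry {
    private final Map<String, SketchToggle> toggles = new ConcurrentHashMap<>();

    void register(SketchToggle toggle) { toggles.put(toggle.name(), toggle); }

    SketchToggle lookup(String name) { return toggles.get(name); }
}
```

Because the toggle holds a reference to the same AtomicBoolean, flipping it through the registry is immediately visible to shouldClearCache() without any extra indirection on the hot path.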
As discussed with @jsedding, we won't be fixing this.



No description provided.