## Bug

A rare `IllegalReferenceCountException: refCnt: 0, decrement: 1` occurs when `LogCacheBlock.free()` tries to release a `StreamRecordBatch` whose refCnt is already 0.

Root cause: `tryMerge` performs the expensive `mergeBlock` call outside the write lock for performance. During that window, `clearStreamRecords` can acquire the write lock and call `block.free(streamId)`, which removes a `StreamCache` from the block and schedules an async release of all its `StreamRecordBatch` records (refCnt → 0). When `tryMerge` then re-acquires the write lock, it only checks that `left`/`right` are still the same block objects in `blocks`; it does not detect that the blocks were mutated. So the merged `newBlock` (which shares the same record references via `mergeBlock`) replaces `left`/`right` in `blocks`. Later, when `newBlock.free()` is called, it tries to release records that are already at refCnt 0, causing the crash.

## Fix

Add a `freeOpsModCount` field to `LogCacheBlock` that is incremented whenever the block is mutated by a free operation (`free()` or `free(long streamId)`). In `tryMerge`, snapshot both blocks' `freeOpsModCount` inside the same write lock that selects `left` and `right`. After `mergeBlock` completes, re-acquire the write lock and verify the counts are unchanged before committing the swap. If either block was mutated in between, the merge is aborted and `newBlock` is simply discarded, which prevents stale record references from entering `blocks` and being double-freed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
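To make the failure mode in the Bug section concrete, here is a minimal sketch of how two containers sharing one ref-counted record end up releasing it twice. Everything in it is a hypothetical stand-in: `DoubleReleaseSketch` and `Record` are toy types, plain lists play the role of blocks, and `IllegalStateException` stands in for Netty's `IllegalReferenceCountException`. The steps run sequentially on one thread, in the order the race would interleave them.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: not the real StreamRecordBatch or LogCacheBlock.
class DoubleReleaseSketch {
    // Hypothetical ref-counted record that starts with one reference.
    static final class Record {
        private int refCnt = 1;

        void release() {
            if (refCnt == 0) {
                // Stand-in for IllegalReferenceCountException.
                throw new IllegalStateException("refCnt: 0, decrement: 1");
            }
            refCnt--;
        }
    }

    // Returns true if freeing the stale merged container hits an already released record.
    static boolean staleMergeDoubleReleases() {
        Record shared = new Record();
        List<Record> left = new ArrayList<>();
        left.add(shared);

        // Step 1: the merge copies references (no extra retain) while the lock is not held.
        List<Record> newBlock = new ArrayList<>(left);

        // Step 2: a clear operation frees the original container's records (refCnt -> 0).
        for (Record r : left) {
            r.release();
        }
        left.clear();

        // Step 3: the stale merged container is committed and later freed.
        try {
            for (Record r : newBlock) {
                r.release();
            }
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```

Under these assumptions `staleMergeDoubleReleases()` reports the second release as a failure, which is the situation the `freeOpsModCount` check is meant to rule out before the swap is committed.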
Pull request overview
Fixes a rare ref-count double-release race in LogCache.tryMerge() by detecting block mutations that can occur while the merge work runs outside the write lock.
Changes:
- Added `freeOpsModCount` to `LogCacheBlock`, incremented on `free()` and `free(streamId)` mutations.
- Snapshotted `freeOpsModCount` for merge candidates under the write lock, and validated it again before committing the merged block swap.
- Aborted merges when either candidate block was mutated during the merge window to prevent stale references from entering `blocks`.
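The three changes listed above can be sketched as a snapshot-then-validate guard. This is a simplified illustration under stated assumptions, not the actual `LogCache` code: `MergeGuardSketch`, `Block`, the string records, and the `duringMerge` hook are hypothetical, and only the names `freeOpsModCount`, `tryMerge`, and `clearStreamRecords` are taken from this pull request. The hook runs on the calling thread to simulate another thread acting during the unlocked merge window.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for LogCache.
class MergeGuardSketch {
    // Hypothetical, simplified stand-in for LogCacheBlock.
    static final class Block {
        final List<String> records = new ArrayList<>();
        // Bumped on every free-style mutation; accessed only under the write lock.
        int freeOpsModCount;

        Block(String... recs) {
            Collections.addAll(records, recs);
        }

        void free() {
            records.clear(); // real code would release ref-counted records here
            freeOpsModCount++;
        }
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    final List<Block> blocks = new ArrayList<>();

    void clearStreamRecords(Block block) {
        lock.writeLock().lock();
        try {
            block.free();
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Returns true if the merged block was committed, false if the merge was aborted.
    boolean tryMerge(Runnable duringMerge) {
        Block left;
        Block right;
        int leftCount;
        int rightCount;
        lock.writeLock().lock();
        try {
            if (blocks.size() < 2) {
                return false;
            }
            left = blocks.get(0);
            right = blocks.get(1);
            // Snapshot under the same lock that selects the candidates.
            leftCount = left.freeOpsModCount;
            rightCount = right.freeOpsModCount;
        } finally {
            lock.writeLock().unlock();
        }

        // Expensive merge work, done without holding the lock.
        Block merged = new Block();
        merged.records.addAll(left.records);
        merged.records.addAll(right.records);
        duringMerge.run(); // simulated concurrent activity in the merge window

        lock.writeLock().lock();
        try {
            boolean samePosition = blocks.size() >= 2
                && blocks.get(0) == left && blocks.get(1) == right;
            boolean untouched = left.freeOpsModCount == leftCount
                && right.freeOpsModCount == rightCount;
            if (!samePosition || !untouched) {
                return false; // abort: merged may hold stale references, so drop it
            }
            blocks.remove(0);      // drop left
            blocks.set(0, merged); // replace right with the merged block
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The identity check alone would pass in the mutated case, because `left` and `right` are still the same objects in `blocks`; the counter comparison is what distinguishes an untouched block from one that was freed in between.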
test(s3stream): add tests for freeOpsModCount merge guard in LogCache

Add two tests to cover the race condition fix in LogCache.tryMerge:
- testMergeAbortedWhenBlockMutatedDuringMerge: verifies that freeOpsModCount is incremented by clearStreamRecords and that tryMerge detects the mutation and aborts the block swap.
- testNoDoubleReleaseUnderConcurrentClearAndMerge: stress test that runs markFree and clearStreamRecords concurrently across 200 iterations, asserting no IllegalReferenceCountException occurs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
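A stress test of the kind described above needs some way to start two operations at the same moment, repeat that many times, and collect any exception either side throws. The sketch below shows one possible shape for such a harness. It is not the test code from this PR: `RaceHarnessSketch`, `race`, and `runAfterBarrier` are hypothetical names, and the two `Runnable` arguments stand in for whatever calls the real test makes (for example `markFree` and `clearStreamRecords`).

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical harness: runs two tasks concurrently, many times, and records failures.
class RaceHarnessSketch {
    static List<Throwable> race(int iterations, Runnable first, Runnable second) {
        List<Throwable> failures = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            for (int i = 0; i < iterations; i++) {
                // Both tasks wait on the barrier so they start as close together as possible.
                CyclicBarrier start = new CyclicBarrier(2);
                Future<?> f1 = pool.submit(() -> runAfterBarrier(start, first, failures));
                Future<?> f2 = pool.submit(() -> runAfterBarrier(start, second, failures));
                try {
                    f1.get(10, TimeUnit.SECONDS);
                    f2.get(10, TimeUnit.SECONDS);
                } catch (Exception e) {
                    if (e instanceof InterruptedException) {
                        Thread.currentThread().interrupt();
                    }
                    failures.add(e); // timeout or interruption: record it and stop
                    break;
                }
            }
        } finally {
            pool.shutdownNow();
        }
        return failures;
    }

    private static void runAfterBarrier(CyclicBarrier start, Runnable task,
                                        List<Throwable> failures) {
        try {
            start.await(5, TimeUnit.SECONDS);
            task.run();
        } catch (Throwable t) {
            failures.add(t); // includes anything the task itself throws
        }
    }
}
```

A test built on this would assert that the returned list is empty after, say, 200 iterations. A passing run only shows that the race was not hit in those iterations, so a deterministic test that forces the mutation inside the merge window remains the stronger check.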
Gezi-lzq approved these changes Apr 14, 2026
superhx added a commit that referenced this pull request Apr 14, 2026
superhx added a commit that referenced this pull request Apr 14, 2026
superhx added a commit that referenced this pull request Apr 14, 2026
…3316)
superhx added a commit that referenced this pull request Apr 14, 2026
…3315)
## Bug

A rare `IllegalReferenceCountException: refCnt: 0, decrement: 1` occurs when `LogCacheBlock.free()` tries to release a `StreamRecordBatch` whose refCnt is already 0.

`tryMerge` performs `mergeBlock` outside the write lock for performance. During that window, `clearStreamRecords` can acquire the write lock and call `block.free(streamId)`, which removes a `StreamCache` from the block and schedules an async release of all its `StreamRecordBatch` records (refCnt → 0). When `tryMerge` re-acquires the write lock, it only checks that `left`/`right` are still the same block objects in `blocks`; it does not detect that the blocks were mutated. So the merged `newBlock` (which shares the same record references via `mergeBlock`) replaces `left`/`right` in `blocks`. Later, when `newBlock.free()` is called, it tries to release records already at refCnt 0, causing the crash.

## Fix

Add a `freeOpsModCount` field to `LogCacheBlock` that is incremented whenever the block is mutated by a free operation (`free()` or `free(long streamId)`). In `tryMerge`, snapshot both blocks' `freeOpsModCount` inside the same write lock that selects `left` and `right`. After `mergeBlock` completes, re-acquire the write lock and verify the counts are unchanged before committing the swap. If either block was mutated in between, the merge is aborted, which prevents stale record references from entering `blocks` and being double-freed.

## Test plan

- `LogCache` unit tests pass
- No `IllegalReferenceCountException` under concurrent `clearStreamRecords` + `markFree` workloads

🤖 Generated with Claude Code