perf(chunker): reduce dirty tracking granularity from 4KB to 2MB (#2306)
The chunker Cache tracked fetched state in a sync.Map at blockSize (4KB)
granularity, producing 1024 entries per 4MB fetch. For large images this
wastes significant memory on dirty map overhead even when only a small
portion is read.
Switch chunker caches to track at 2MB granularity (ChunkerDirtyGranularity),
reducing entries per fetch from 1024 to 2. Overlay and process-memory caches
retain blockSize granularity for ExportToDiff correctness.
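The entry counts above follow directly from the sizes involved; a quick back-of-the-envelope check (constant names here are assumptions mirroring the description, not the actual code):

```go
package main

import "fmt"

// Sizes from the PR description; names are illustrative assumptions.
const (
	blockSize        = 4 << 10 // 4KB: old dirty-tracking granularity
	fetchSize        = 4 << 20 // 4MB: one fetch
	dirtyGranularity = 2 << 20 // 2MB: new ChunkerDirtyGranularity
)

func main() {
	fmt.Println(fetchSize / blockSize)        // dirty-map entries per fetch before
	fmt.Println(fetchSize / dirtyGranularity) // entries per fetch after
}
```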
Key changes:
- Add dirtyGranularity field to Cache, with NewCacheWithDirtyGranularity
- Rewrite isCached/setIsCached to use aligned dirty granularity keys
- StreamingChunker: defer dirty marking to runFetch after full chunk fetch;
waiter mechanism (bytesReady) still provides block-level notification
- Add sliceDirect for post-waiter mmap reads (bypasses isCached check)
- ExportToDiff guards against accidental use with coarse granularity
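A minimal sketch of what granularity-aligned isCached/setIsCached could look like. The Cache shape, field names, and key scheme here are assumptions based on the list above, not the actual implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a hypothetical stand-in for the chunker Cache.
type cache struct {
	dirty            sync.Map // aligned offset -> struct{}
	dirtyGranularity int64
}

// setIsCached marks every granularity-aligned key overlapping [off, off+length).
func (c *cache) setIsCached(off, length int64) {
	start := off - off%c.dirtyGranularity
	for k := start; k < off+length; k += c.dirtyGranularity {
		c.dirty.Store(k, struct{}{})
	}
}

// isCached reports whether every aligned key covering [off, off+length) is set.
func (c *cache) isCached(off, length int64) bool {
	start := off - off%c.dirtyGranularity
	for k := start; k < off+length; k += c.dirtyGranularity {
		if _, ok := c.dirty.Load(k); !ok {
			return false
		}
	}
	return true
}

func main() {
	c := &cache{dirtyGranularity: 2 << 20}
	c.setIsCached(0, 4<<20)              // one 4MB fetch stores just two 2MB keys
	fmt.Println(c.isCached(1<<20, 1024)) // inside the fetched range
	fmt.Println(c.isCached(4<<20, 1024)) // past the fetched range
}
```

Iterating with an exclusive upper bound (`k < off+length`) naturally avoids the kind of off-by-one boundary mistake a reviewer mentions below.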
BenchmarkRandomAccess results (identical timing, ~97% less memory):

| Benchmark | Before | After |
| --- | --- | --- |
| GCS/StreamingChunker | 45350 avg-us, 128564 B, 2487 allocs | 45122 avg-us, 4282 B, 72 allocs |
| GCS/FullFetchChunker | 66700 avg-us, 135076 B, 2464 allocs | 66168 avg-us, 3056 B, 53 allocs |
| NFS/StreamingChunker | 6162 avg-us, 127782 B, 2481 allocs | 6165 avg-us, 3795 B, 70 allocs |
| NFS/FullFetchChunker | 14160 avg-us, 134893 B, 2462 allocs | 14002 avg-us, 2826 B, 51 allocs |
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
matthewlouisbrockman left a comment:
Makes sense; it also fixes the off-by-one in setIsCached as a bonus.
The downside is that the read-chunker and pause paths are starting to diverge in how they use the cache, but I think that can be tackled in a later PR.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    	end := min(off+length, c.size)
    	return (*c.mmap)[off:end], nil
    }
sliceDirect missing bounds check can panic
Low Severity
sliceDirect computes end := min(off+length, c.size) but never validates that off < c.size. If off >= c.size, then end <= off, and the expression (*c.mmap)[off:end] panics with a slice bounds out of range. The original Slice method is protected by isCached which returns false for off >= c.size, preventing the mmap access. Since sliceDirect intentionally bypasses isCached, this implicit safety check is lost.
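One way to restore that safety is an explicit bounds check before slicing. This is a hedged sketch of the guard the report suggests; the `cache` type, `mmap` field, and error value are assumptions based on the quoted snippet:

```go
package main

import (
	"errors"
	"fmt"
)

// cache is a hypothetical stand-in for the chunker Cache in the snippet.
type cache struct {
	mmap *[]byte
	size int64
}

var errOutOfRange = errors.New("sliceDirect: offset out of range")

// sliceDirect returns a view of the backing buffer without consulting
// isCached, but validates off so an out-of-range offset returns an error
// instead of panicking on the slice expression.
func (c *cache) sliceDirect(off, length int64) ([]byte, error) {
	if off < 0 || off >= c.size {
		return nil, errOutOfRange
	}
	end := min(off+length, c.size) // clamp to the buffer size
	return (*c.mmap)[off:end], nil
}

func main() {
	buf := make([]byte, 16)
	c := &cache{mmap: &buf, size: 16}

	b, err := c.sliceDirect(8, 100) // length past the end is clamped
	fmt.Println(len(b), err)

	_, err = c.sliceDirect(16, 4) // off == size: previously a panic
	fmt.Println(err != nil)
}
```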