
perf(chunker): reduce dirty tracking granularity from 4KB to 2MB#2306

Merged
djeebus merged 4 commits into main from lev-chunker-resolution on Apr 6, 2026

Conversation

@djeebus
Contributor

@djeebus djeebus commented Apr 6, 2026

The chunker Cache tracked fetched state in a sync.Map at blockSize (4KB) granularity, producing 1024 entries per 4MB fetch. For large images this wastes significant memory on dirty map overhead even when only a small portion is read.

Switch chunker caches to track at 2MB granularity (ChunkerDirtyGranularity), reducing entries per fetch from 1024 to 2. Overlay and process-memory caches retain blockSize granularity for ExportToDiff correctness.
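The entry-count reduction follows directly from the granularity change; a minimal sketch of the arithmetic (the 2 MiB value for ChunkerDirtyGranularity is taken from the description; the constant names here are illustrative):

```go
package main

import "fmt"

func main() {
	const fetchSize = 4 << 20               // 4 MiB fetch
	const blockSize = 4 << 10               // 4 KiB, old tracking granularity
	const chunkerDirtyGranularity = 2 << 20 // 2 MiB, new tracking granularity

	fmt.Println(fetchSize / blockSize)               // 1024 dirty-map entries per fetch
	fmt.Println(fetchSize / chunkerDirtyGranularity) // 2 entries per fetch
}
```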

Key changes:

  • Add dirtyGranularity field to Cache, with NewCacheWithDirtyGranularity
  • Rewrite isCached/setIsCached to use aligned dirty granularity keys
  • StreamingChunker: defer dirty marking to runFetch after full chunk fetch; waiter mechanism (bytesReady) still provides block-level notification
  • Add sliceDirect for post-waiter mmap reads (bypasses isCached check)
  • ExportToDiff guards against accidental use with coarse granularity
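The aligned-key rewrite of isCached/setIsCached could look roughly like the following sketch. This is a hypothetical reconstruction, not the PR's actual code: the Cache fields, method signatures, and the choice of a sync.Map keyed by aligned start offsets are all assumptions based on the description above.

```go
package main

import (
	"fmt"
	"sync"
)

// Cache tracks which ranges of a backing file have been fetched,
// at a configurable dirty-tracking granularity.
type Cache struct {
	dirty            sync.Map // aligned granule offset -> struct{}
	dirtyGranularity int64
}

// alignDown rounds off down to the nearest granule boundary.
func (c *Cache) alignDown(off int64) int64 {
	return off - off%c.dirtyGranularity
}

// setIsCached marks every granule overlapping [off, off+length) as fetched.
func (c *Cache) setIsCached(off, length int64) {
	for k := c.alignDown(off); k < off+length; k += c.dirtyGranularity {
		c.dirty.Store(k, struct{}{})
	}
}

// isCached reports whether every granule overlapping [off, off+length) is fetched.
func (c *Cache) isCached(off, length int64) bool {
	for k := c.alignDown(off); k < off+length; k += c.dirtyGranularity {
		if _, ok := c.dirty.Load(k); !ok {
			return false
		}
	}
	return true
}

func main() {
	c := &Cache{dirtyGranularity: 2 << 20}
	c.setIsCached(0, 4<<20)            // one 4 MiB fetch -> only 2 map entries
	fmt.Println(c.isCached(1<<20, 512)) // true: inside the fetched range
	fmt.Println(c.isCached(4<<20, 512)) // false: past the fetched range
}
```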

BenchmarkRandomAccess results (identical timing, ~97% less memory):

| Test | Before: CPU | Memory | Allocs | After: CPU | Memory | Allocs |
|---|---|---|---|---|---|---|
| GCS/StreamingChunker | 45350 us | 128 KB | 2487 | 45122 us | 4 KB | 72 |
| GCS/FullFetchChunker | 66700 us | 135 KB | 2464 | 66168 us | 3 KB | 53 |
| NFS/StreamingChunker | 6162 us | 128 KB | 2481 | 6165 us | 4 KB | 70 |
| NFS/FullFetchChunker | 14160 us | 134 KB | 2462 | 14002 us | 3 KB | 51 |

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cursor

cursor bot commented Apr 6, 2026

PR Summary

Medium Risk
Changes cache coherency/availability tracking and streaming fetch synchronization, which could surface as false cache misses or racy reads if the new granularity/waiter assumptions are wrong. Impact is limited to the chunker cache layer and includes added guardrails/tests.

Overview
Updates the block cache to support configurable dirty tracking granularity and switches chunker caches to mark fetched data at a coarse 2MiB resolution to cut sync.Map overhead. To keep streaming reads correct with coarse marking, it adds a direct mmap slicing path after the waiter mechanism, defers dirty marking until a full chunk fetch completes, closes a small TOCTOU window around session creation/removal, and prevents ExportToDiff from running when dirty tracking isn’t block-level.

Reviewed by Cursor Bugbot for commit 688782f.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2740c3af84


Contributor

@matthewlouisbrockman matthewlouisbrockman left a comment


Makes sense; fixes that off-by-one in setIsCached as a bonus.

Downside: it looks like we're starting to get a bit of divergent use of the cache between the read-chunker / pause paths, but I think we can do that in another PR later.

@matthewlouisbrockman
Contributor

The io.EOF bug is real; we should fix that, though.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 688782f.

end := min(off+length, c.size)

return (*c.mmap)[off:end], nil
}

sliceDirect missing bounds check can panic

Low Severity

sliceDirect computes end := min(off+length, c.size) but never validates that off < c.size. If off >= c.size, then end <= off, and the expression (*c.mmap)[off:end] panics with a slice bounds out of range. The original Slice method is protected by isCached which returns false for off >= c.size, preventing the mmap access. Since sliceDirect intentionally bypasses isCached, this implicit safety check is lost.


@djeebus djeebus merged commit 3d9bb32 into main Apr 6, 2026
36 checks passed
@djeebus djeebus deleted the lev-chunker-resolution branch April 6, 2026 18:41