perf: memoize encoded inner chunk for scalar complete-shard writes #177

Merged
d-v-b merged 1 commit into perf/prepared-write-v2 from perf/prepared-write-v2-scalar-memo
May 30, 2026

@d-v-b d-v-b commented May 30, 2026

In ShardingCodec._encode_partial_sync's full-shard-rewrite loop, a scalar broadcast value produces byte-for-byte identical results for every complete inner chunk (same fill, same empty-check, same encoded bytes). Compute that outcome once and reuse it across all complete chunks instead of re-merging, re-checking write_empty_chunks, and re-encoding tens of thousands of identical chunks. Incomplete edge chunks still merge against their own data individually.
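A minimal sketch of the idea, not the actual ShardingCodec code. It assumes a 1-D shard of integer values; `encode_chunk`, `write_scalar`, and all parameter names are hypothetical stand-ins chosen for illustration. Complete inner chunks share one memoized outcome, while incomplete chunks go through the usual merge, empty-check, and encode path.

```python
import struct
from typing import Dict, List, Optional


def encode_chunk(values: List[int]) -> bytes:
    """Hypothetical stand-in for a real codec: pack values as little-endian int32."""
    return struct.pack(f"<{len(values)}i", *values)


def write_scalar(
    existing: Dict[int, List[int]],
    n_chunks: int,
    chunk_len: int,
    start: int,
    stop: int,
    scalar: int,
    fill_value: int = 0,
    write_empty_chunks: bool = False,
) -> Dict[int, Optional[bytes]]:
    """Write `scalar` into [start, stop) of a 1-D shard.

    Returns encoded bytes per inner chunk; None marks a chunk that is omitted
    because it equals the fill value and write_empty_chunks is False.
    """
    out: Dict[int, Optional[bytes]] = {}
    memo_ready = False
    memo: Optional[bytes] = None

    for i in range(n_chunks):
        c0, c1 = i * chunk_len, (i + 1) * chunk_len

        if start <= c0 and c1 <= stop:
            # Complete chunk: the outcome depends only on the scalar,
            # so compute it once and reuse it for every such chunk.
            if not memo_ready:
                if scalar == fill_value and not write_empty_chunks:
                    memo = None
                else:
                    memo = encode_chunk([scalar] * chunk_len)
                memo_ready = True
            out[i] = memo
            continue

        # Incomplete chunk: merge the scalar into this chunk's own data.
        data = list(existing.get(i, [fill_value] * chunk_len))
        for j in range(max(start, c0), min(stop, c1)):
            data[j - c0] = scalar
        if not write_empty_chunks and all(v == fill_value for v in data):
            out[i] = None
        else:
            out[i] = encode_chunk(data)

    return out
```

In this sketch the memo is guarded by a separate `memo_ready` flag rather than by checking `memo is None`, because `None` is itself a valid memoized outcome (the "empty chunk, skip it" case).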

Target case (fused, memory, chunks=100/shards=1M, no compression): write 92.26ms -> 21.59ms (4.3x). Pipeline parity (byte-identical to batched) and 956 tests pass under the fused pipeline; adversarial partial-overwrite/edge/compression/2D/aliasing checks pass.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@d-v-b d-v-b merged commit c9c8c26 into perf/prepared-write-v2 May 30, 2026
2 checks passed
@d-v-b d-v-b deleted the perf/prepared-write-v2-scalar-memo branch May 30, 2026 07:19