Background
The AVX2 and NEON scanners already implement a per-chunk in-string fast probe (landed in issue #5): when the scanner knows it is inside a string and detects no " or \ in the current chunk, it skips the structural classification work and moves to the next chunk. This drops per-chunk cost from ~25 to ~10 ops.
However, the scanner still pays ALU work for every 64-byte chunk inside a long string. For multi-MB single-string payloads (e.g. a JSON value that is a base64-encoded blob), a memchr2(b'"', b'\\') jump could skip directly to the next interesting byte, approaching memory-bandwidth throughput.
Proposal
After the per-chunk fast probe fires, check whether we are far from the end of the current string and use memchr2 to locate the next " or \, then advance i past all intervening chunks in one jump.
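A minimal sketch of the jump, under stated assumptions: the helper name jump_in_string, the CHUNK constant, and the u64 type for bs_carry are hypothetical and not taken from the scanner source. The search uses a std-only stand-in so the snippet has no dependencies; memchr::memchr2(b'"', b'\\', haystack) returns the same Option<usize> and would replace it in practice.

```rust
/// Chunk width processed per iteration, as described in the issue.
const CHUNK: usize = 64;

/// Std-only stand-in for `memchr::memchr2(b'"', b'\\', haystack)`.
fn find_quote_or_backslash(haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == b'"' || b == b'\\')
}

/// Hypothetical helper. `i` is the offset of the next chunk to scan and the
/// scanner is known to be inside a string. Returns the offset at which the
/// regular chunk loop should resume.
fn jump_in_string(buf: &[u8], i: usize, bs_carry: u64) -> usize {
    // A pending backslash escapes the first byte of the next chunk,
    // so leave that case to the regular per-chunk path.
    if bs_carry != 0 {
        return i;
    }
    match find_quote_or_backslash(&buf[i..]) {
        // Land on the start of the chunk that contains the hit. Every byte
        // skipped is neither `"` nor `\`, so the carry into the landing
        // chunk is still zero.
        Some(off) => i + (off / CHUNK) * CHUNK,
        // No hit before the end of input: skip all whole chunks and let the
        // existing tail handling deal with the remainder.
        None => i + ((buf.len() - i) / CHUNK) * CHUNK,
    }
}
```

Rounding down to a chunk boundary, rather than jumping to the exact byte, keeps the scanner's chunk alignment intact, so the landing chunk goes through the normal classification path unchanged.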
Why it was deferred
- No workload in the current bench harness triggers multi-MB single-string payloads.
- Requires careful handling of bs_carry across the jump boundary.
When to revisit
Once issue #24 (large bench fixtures) lands, check whether large_dump.json contains the kind of string-heavy payloads (large base64, embedded text) that would benefit. If profiling shows the in-string chunk loop as the hot path, implement the jump.
Implementation notes
- memchr2 is in the memchr crate (already a common indirect dependency). Evaluate whether adding it directly is worth the dependency.
- The bs_carry (backslash-carry) value must be zero before jumping; verify no escaped " is skipped.
- Add a proptest-driven regression case covering the cross-chunk boundary.
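One possible shape for the regression check, sketched with std only: a byte-by-byte oracle compared against a jump-based search on inputs where an escape straddles the 64-byte boundary. Both function names are hypothetical, and neither is the chunked scanner itself; a proptest version would generate the buffers randomly and assert the same equality against the real scanner.

```rust
/// Oracle: scan byte by byte from inside a string and return the index of
/// the first unescaped `"`, if any.
fn closing_quote_naive(buf: &[u8]) -> Option<usize> {
    let mut escaped = false;
    for (idx, &b) in buf.iter().enumerate() {
        if escaped {
            // This byte is consumed by the preceding backslash.
            escaped = false;
        } else if b == b'\\' {
            escaped = true;
        } else if b == b'"' {
            return Some(idx);
        }
    }
    None
}

/// Jump-based search: hop to the next `"` or `\` (the role memchr2 would
/// play), skipping the escaped byte after each backslash.
fn closing_quote_jump(buf: &[u8]) -> Option<usize> {
    let mut i = 0;
    while i < buf.len() {
        let hit = i + buf[i..].iter().position(|&b| b == b'"' || b == b'\\')?;
        if buf[hit] == b'"' {
            return Some(hit);
        }
        // A backslash escapes exactly one following byte; skip both.
        i = hit + 2;
    }
    None
}
```

The cases worth pinning down are a backslash at offset 63 followed by " at offset 64, and a run of two backslashes ending at the boundary, since those are where a non-zero carry changes the answer.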