perf(scan): memchr2 cross-chunk jump for very long string interiors #26

@membphis

Background

The AVX2 and NEON scanners already implement a per-chunk in-string fast probe (landed in issue #5): when the scanner knows it is inside a string and detects no " or \ in the current chunk, it skips the structural classification work and moves to the next chunk. This drops per-chunk cost from ~25 to ~10 ops.
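To make the existing behaviour concrete, here is a minimal scalar sketch of the per-chunk probe. It is not the project's SIMD code: the names `CHUNK`, `chunk_is_plain`, and `skip_plain_chunks` are hypothetical, and the byte-wise loop stands in for the vector compare.

```rust
/// Width of one scanner chunk in bytes (the issue describes 64-byte chunks).
const CHUNK: usize = 64;

/// Scalar stand-in for the SIMD probe: true if the chunk holds no `"` and no `\`.
fn chunk_is_plain(chunk: &[u8]) -> bool {
    !chunk.iter().any(|&b| b == b'"' || b == b'\\')
}

/// Starting at chunk-aligned index `i` inside a string, skip whole chunks that
/// contain neither `"` nor `\`. Returns the index of the first chunk that still
/// needs full classification (or the start of a trailing partial chunk).
fn skip_plain_chunks(buf: &[u8], mut i: usize) -> usize {
    while i + CHUNK <= buf.len() && chunk_is_plain(&buf[i..i + CHUNK]) {
        i += CHUNK; // one probe per chunk: this is the cost the jump would remove
    }
    i
}
```

Even in this form the loop touches every chunk once, which is the per-chunk work described above.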

However, the scanner still pays ALU work for every 64-byte chunk inside a long string. For multi-MB single-string payloads (e.g. a JSON value that is a base64-encoded blob), a memchr2(b'"', b'\\') jump could skip directly to the next interesting byte, approaching memory-bandwidth throughput.

Proposal

After the per-chunk fast probe fires, use memchr2 to locate the next " or \. If that byte is far enough ahead to make the jump worthwhile, advance i past all intervening chunks in one step and resume chunked scanning at the chunk that contains it.
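A rough sketch of the jump, under stated assumptions: `jump_in_string`, `MIN_JUMP`, and the `u64` type for `bs_carry` are hypothetical, and `find_quote_or_backslash` is a dependency-free stand-in for `memchr::memchr2(b'"', b'\\', haystack)`, which likewise returns `Option<usize>`.

```rust
const CHUNK: usize = 64;
/// Hypothetical threshold: only jump when at least this many bytes are skipped.
const MIN_JUMP: usize = 4 * CHUNK;

/// Stand-in for `memchr::memchr2(b'"', b'\\', haystack)`.
fn find_quote_or_backslash(haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == b'"' || b == b'\\')
}

/// Called while inside a string at chunk-aligned index `i` (requires `i <= buf.len()`).
/// Returns the chunk-aligned index where the chunked scanner should resume.
fn jump_in_string(buf: &[u8], i: usize, bs_carry: u64) -> usize {
    // A pending backslash could escape the first byte of the next chunk,
    // so do not jump until the carry has been consumed.
    if bs_carry != 0 {
        return i;
    }
    // Absolute position of the next `"` or `\`, or end of input if none.
    let hit = match find_quote_or_backslash(&buf[i..]) {
        Some(off) => i + off,
        None => buf.len(),
    };
    // Round down to the start of the chunk containing the hit, so the normal
    // scanner classifies that chunk and handles the quote or escape itself.
    let target = i + ((hit - i) / CHUNK) * CHUNK;
    if target - i >= MIN_JUMP { target } else { i }
}
```

Because the jump stops at the chunk containing the first `"` or `\`, every skipped byte is known to be neither, so no escape sequence can begin inside the skipped range.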

Why it was deferred

  • No workload in the current bench harness exercises multi-MB single-string payloads.
  • Requires careful handling of bs_carry across the jump boundary.

When to revisit

Once issue #24 (large bench fixtures) lands, check whether large_dump.json contains the kind of string-heavy payloads (large base64, embedded text) that would benefit. If profiling shows the in-string chunk loop as the hot path, implement the jump.

Implementation notes

  • memchr2 is in the memchr crate (already a common indirect dependency). Evaluate whether adding it directly is worth the dependency.
  • The bs_carry (backslash-carry) value must be zero before jumping; verify no escaped " is skipped.
  • Add a proptest-driven regression case covering the cross-chunk boundary.
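As an illustration of the property such a regression case could check, the sketch below compares a jump against the chunk-by-chunk loop on generated buffers. It is an assumption-laden stand-in: it uses a small deterministic generator instead of proptest strategies, all names are hypothetical, and it models only the `bs_carry == 0` case.

```rust
const CHUNK: usize = 64;

fn is_special(b: u8) -> bool {
    b == b'"' || b == b'\\'
}

/// Reference behaviour: probe one chunk at a time.
fn skip_plain_chunks(buf: &[u8], mut i: usize) -> usize {
    while i + CHUNK <= buf.len() && !buf[i..i + CHUNK].iter().any(|&b| is_special(b)) {
        i += CHUNK;
    }
    i
}

/// Candidate behaviour: one search, then round down to a chunk boundary.
fn jump_in_string(buf: &[u8], i: usize) -> usize {
    let dist = buf[i..]
        .iter()
        .position(|&b| is_special(b))
        .unwrap_or(buf.len() - i);
    i + (dist / CHUNK) * CHUNK
}

/// Deterministic buffer: mostly `a`, with occasional `"` and `\` bytes.
fn gen_buf(seed: u64, len: usize) -> Vec<u8> {
    let mut x = seed;
    (0..len)
        .map(|_| {
            // 64-bit linear congruential step; the high bits pick the byte.
            x = x.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
            match (x >> 33) % 512 {
                0 => b'"',
                1 => b'\\',
                _ => b'a',
            }
        })
        .collect()
}

/// Property: from every chunk-aligned start, the jump lands exactly where the
/// chunk loop would stop, so no `"` or `\` is ever skipped.
fn jump_matches_chunk_loop(seed: u64, len: usize) -> bool {
    let buf = gen_buf(seed, len);
    (0..=len)
        .step_by(CHUNK)
        .all(|i| jump_in_string(&buf, i) == skip_plain_chunks(&buf, i))
}
```

In a proptest version, `seed` and `len` would be replaced by generated byte vectors, ideally biased toward placing `\` at offsets 63 and 64 of a chunk pair, where the carry matters most.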

Labels: enhancement
