perf(scan): memchr-based fast path for in-string content #5

@membphis

Context

The AVX2 scanner's string-skip fast path (added in PR #3, src/scan/avx2.rs:35-43) detects when an entire 64-byte chunk lies inside a string and skips the structural-mask computation. The per-chunk work is still significant:

  • 2 × loadu (free)
  • backslash mask: ~6 ops
  • quote mask: ~6 ops
  • find_escape_mask_with_carry: ~10 scalar ALU ops + branches
  • fast-path branch

That is ~25 ops per "skip" chunk. In the multimodal bench's 10 MB scenario, ~95% of chunks lie inside the giant base64 strings, so this is the dominant cost.

Proposal

Replace the SIMD-mask path with a memchr-style search for the next interesting byte (" or \) while in_string holds:

  • 1 SIMD load + 1 cmpeq against either-of-two-bytes + 1 movemask + 1 test-and-skip per chunk
  • glibc's memchr peaks at ~30 GB/s. Our base64 payload has no quotes or backslashes mid-string, so the search finds 0 hits in each chunk and moves straight on to the next one
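
To make the proposed search concrete, here is a minimal sketch, not the scanner's actual code. The function names are hypothetical, and it assumes the caller already knows it is inside a string at `start`. It steps 32 bytes at a time (one AVX2 register) rather than the scanner's 64-byte chunks, and it uses two compares plus an OR for the "either-of-two-bytes" test, with a scalar fallback for the tail and for CPUs without AVX2.

```rust
/// Hypothetical helper: index of the first `"` or `\` at or after `start`.
/// Returns None if there is no such byte or `start` is out of range.
pub fn next_quote_or_backslash(buf: &[u8], start: usize) -> Option<usize> {
    if start > buf.len() {
        return None;
    }
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was checked at runtime just above.
            return unsafe { next_interesting_avx2(buf, start) };
        }
    }
    next_interesting_scalar(buf, start)
}

/// Scalar fallback, also used for the tail shorter than one vector.
fn next_interesting_scalar(buf: &[u8], start: usize) -> Option<usize> {
    buf.get(start..)?
        .iter()
        .position(|&b| b == b'"' || b == b'\\')
        .map(|i| start + i)
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn next_interesting_avx2(buf: &[u8], start: usize) -> Option<usize> {
    use std::arch::x86_64::*;

    let quote = _mm256_set1_epi8(b'"' as i8);
    let backslash = _mm256_set1_epi8(b'\\' as i8);
    let mut i = start;

    // Full 32-byte vectors: load, compare against both bytes, movemask, test.
    while i + 32 <= buf.len() {
        let v = _mm256_loadu_si256(buf.as_ptr().add(i) as *const __m256i);
        let hit = _mm256_or_si256(
            _mm256_cmpeq_epi8(v, quote),
            _mm256_cmpeq_epi8(v, backslash),
        );
        let mask = _mm256_movemask_epi8(hit) as u32;
        if mask != 0 {
            // Lowest set bit corresponds to the first matching byte.
            return Some(i + mask.trailing_zeros() as usize);
        }
        i += 32;
    }

    // Remaining bytes (< 32) are handled by the scalar loop.
    next_interesting_scalar(buf, i)
}
```

On a hit, the caller would presumably hand control back to the existing mask-based path at that offset so escapes and the closing quote are resolved as before. That hand-off is an assumption about how the fast path would be integrated, not something this issue specifies.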

Estimated impact (op-count analysis, not measured)

size             est. speedup
100 KB – 1 MB    ~1.5–2×
5 MB – 10 MB     ~3×

The 10 MB scan would drop from ~2.9 ms to ~1 ms per iteration (~3×). If cache or front-end effects dominate, the speedup could be as low as ~1.5×.
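
For comparison with the memchr figure quoted in the proposal, the per-iteration times can be converted to throughput. This is a small arithmetic sketch: the function name is hypothetical, and it assumes "10 MB" means 10^7 bytes.

```rust
/// Hypothetical helper: throughput in GB/s (10^9 bytes per second)
/// for `bytes` processed in `millis` milliseconds.
fn throughput_gb_per_s(bytes: f64, millis: f64) -> f64 {
    let seconds = millis * 1e-3;
    bytes / seconds / 1e9
}

/// Speedup factor when per-iteration time goes from `before_ms` to `after_ms`.
fn speedup(before_ms: f64, after_ms: f64) -> f64 {
    before_ms / after_ms
}
```

Under that assumption, ~2.9 ms corresponds to roughly 3.4 GB/s and ~1 ms to 10 GB/s, so even the estimated result would sit well below the ~30 GB/s memchr peak.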

Validation plan

  • Existing 2000-case scanner_crosscheck proptest (scalar/AVX2 parity)
  • New unit tests: long string spanning >100 chunks with no escapes; long string with periodic escapes
  • make bench 3-run median before/after, posted in PR
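
The two new unit-test inputs could be generated along these lines. This is a sketch only: both function names are hypothetical, and the scalar reference below stands in for whatever the project's scalar scanner reports. It is not the existing scanner_crosscheck code.

```rust
/// Hypothetical test-input builder: a JSON string literal whose body covers
/// at least `chunks` 64-byte chunks. With `escape_every = Some(n)`, a two-byte
/// `\"` escape starts at every body offset that is a multiple of `n`.
fn long_json_string(chunks: usize, escape_every: Option<usize>) -> Vec<u8> {
    let body_len = chunks * 64;
    let mut out = Vec::with_capacity(body_len + 3);
    out.push(b'"'); // opening quote
    let mut i = 0;
    while i < body_len {
        match escape_every {
            Some(n) if n > 0 && i % n == 0 => {
                out.extend_from_slice(b"\\\""); // escaped quote: \"
                i += 2;
            }
            _ => {
                out.push(b'A'); // filler byte, as in a base64 payload
                i += 1;
            }
        }
    }
    out.push(b'"'); // closing quote
    out
}

/// Scalar reference: index of the unescaped closing quote of a string
/// that opens at byte 0, or None if the string never closes.
fn closing_quote_scalar(buf: &[u8]) -> Option<usize> {
    if buf.first() != Some(&b'"') {
        return None;
    }
    let mut i = 1;
    while i < buf.len() {
        match buf[i] {
            b'\\' => i += 2, // skip the escaped byte
            b'"' => return Some(i),
            _ => i += 1,
        }
    }
    None
}
```

A test would build `long_json_string(101, None)` and `long_json_string(101, Some(7))`, then check that the AVX2 scanner and the scalar reference agree on where the string ends (the last byte of the buffer in both cases).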
