Make the long_boundary benchmark dominated by the patched code path #280

Merged
Kludex merged 1 commit into main from boost-long-boundary-bench on May 10, 2026

Kludex commented May 10, 2026

Summary

  • change LONG_BOUNDARY_BODY content from string.printable (contains \r) to b"abcdefgh" (no \r)
  • bump body size from 2 MiB to 8 MiB

Why

The existing test_multipart_long_boundary does measure the partial-boundary fallback path, but in instrumentation mode CodSpeed reports the change from PR #275 as essentially flat. Tracing it through valgrind-equivalent instruction counting turned up two issues:

  1. With a string.printable body, \r (the internal boundary[0]) appears every ~100 bytes. Each occurrence triggers the inner state-machine block, which advances index past 0 and then resets. That loop runs identically in the patched and unpatched versions, so it dominates the parser's instruction count and washes out the fallback delta.
  2. The 2 MiB body keeps absolute fallback work below CodSpeed's noise floor.
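The first issue is easy to check outside the parser. The following is a minimal sketch, assuming the old body was built by repeating string.printable and the new one by repeating b"abcdefgh"; the `build_body` helper is hypothetical and is not the benchmark's actual construction code.

```python
import string

MIB = 1024 * 1024


def build_body(pattern: bytes, size: int) -> bytes:
    """Repeat `pattern` and truncate to exactly `size` bytes (hypothetical helper)."""
    repeats = size // len(pattern) + 1
    return (pattern * repeats)[:size]


# Assumed shapes of the old and new benchmark bodies.
old_body = build_body(string.printable.encode("ascii"), 2 * MIB)
new_body = build_body(b"abcdefgh", 8 * MIB)

# string.printable is 100 characters long and contains "\r" exactly once,
# so a repeated-printable body has one boundary[0] candidate per 100 bytes.
old_cr = old_body.count(b"\r")
new_cr = new_body.count(b"\r")

print(f"old body: {len(old_body)} bytes, {old_cr} CR bytes")
print(f"new body: {len(new_body)} bytes, {new_cr} CR bytes")
```

Under these assumptions the old body offers roughly twenty thousand places for the inner state machine to engage, while the new body offers none.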

A body without \r lets the parser stay in the find-miss fallback path on every chunk: the unpatched Python while loop scans the whole 16 KiB look-back region per chunk, while the patched bytes.find does the same work in C. Bumping the body to 8 MiB gives 515 chunks of that work.
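To illustrate why the find-miss case separates the two versions so sharply, here is a rough sketch comparing an interpreted byte-by-byte scan with bytes.find over one 16 KiB region that contains no \r. The functions `scan_python` and `scan_find` are illustrative stand-ins, not the parser's actual code, and the timings depend on the machine.

```python
import timeit

# One 16 KiB look-back region with no "\r" in it (assumed shape).
CHUNK = b"abcdefgh" * (16 * 1024 // 8)
CR = b"\r"


def scan_python(data: bytes, target: bytes) -> int:
    """Byte-by-byte scan in interpreted Python; returns the index or -1."""
    wanted = target[0]
    i = 0
    n = len(data)
    while i < n:
        if data[i] == wanted:
            return i
        i += 1
    return -1


def scan_find(data: bytes, target: bytes) -> int:
    """The same search delegated to bytes.find, which runs in C."""
    return data.find(target)


# Both scans miss, so each one walks the entire region.
loop_s = timeit.timeit(lambda: scan_python(CHUNK, CR), number=20)
find_s = timeit.timeit(lambda: scan_find(CHUNK, CR), number=20)
print(f"while loop: {loop_s * 1e3:.3f} ms for 20 scans")
print(f"bytes.find: {find_s * 1e3:.3f} ms for 20 scans")
```

Both functions return the same result; only the cost of reaching it differs, which is the delta the benchmark is meant to expose.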

Local wall-clock (unpatched main vs PR #275):

| Body | Unpatched | Patched | Speedup |
|---|---|---|---|
| 2 MiB no-\r | 88 ms | 6 ms | 14x |
| 8 MiB no-\r | 334 ms | 7 ms | 50x |
| 16 MiB no-\r | 661 ms | 7.5 ms | 87x |

This should give CodSpeed instrumentation a clear, dominant signal on PR #275.

Test plan

AI Disclaimer

This PR was developed with the assistance of either Claude or Codex. I've reviewed and verified the changes.

Two changes so the partial-boundary fallback overhead dominates total
parser work, making the speedup visible under instrumentation mode:

- body pattern switches from string.printable (contains \r every ~100
  bytes) to b'abcdefgh' (no \r). With no boundary[0] candidates in
  the body, the inner state machine never engages and the parser's
  per-chunk cost is dominated by the find-miss fallback path.
- body size grows from 2 MiB to 8 MiB so the absolute work in that
  fallback is large enough to clear CodSpeed's instruction-count
  noise floor.

Local wall-clock comparison of unpatched vs patched: 661 ms -> 7.5 ms
(87x) at 16 MiB, 333 ms -> 6.6 ms at 8 MiB.

Kludex enabled auto-merge (squash) May 10, 2026 10:00
codspeed-hq Bot commented May 10, 2026

Merging this PR will degrade performance by 72.91%

❌ 1 regressed benchmark
✅ 5 untouched benchmarks
⏩ 6 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | test_multipart_long_boundary | 998.1 µs | 3,684.7 µs | -72.91% |

Comparing boost-long-boundary-bench (e8935e5) with main (a6467c9)
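
The reported percentage can be reproduced from the BASE and HEAD values. The sketch below assumes the figure is computed as (BASE - HEAD) / HEAD; that formula is inferred from the numbers in this report, not taken from CodSpeed's documentation. Since this PR also grows the benchmark body from 2 MiB to 8 MiB, the two runs do not process the same input.

```python
# Values copied from the CodSpeed report, in microseconds.
base_us = 998.1
head_us = 3684.7

# Assumed formula: relative change measured against HEAD.
efficiency_pct = (base_us - head_us) / head_us * 100

# How many times longer the HEAD run is than the BASE run.
slowdown = head_us / base_us

print(f"efficiency: {efficiency_pct:.2f}%")
print(f"HEAD / BASE: {slowdown:.2f}x")
```

The HEAD run takes a little under four times as long as BASE, against a body that is four times larger.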

Footnotes

  1. 6 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Kludex merged commit 09cb8c3 into main May 10, 2026
13 of 14 checks passed
Kludex deleted the boost-long-boundary-bench branch May 10, 2026 10:01