Speed up partial-boundary tail scan via `bytes.find` by Kludex · Pull Request #281 · Kludex/python-multipart

Kludex · 2026-05-10T10:18:09Z

Summary

Replace the Python-level byte-by-byte `while` loop in the multipart partial-boundary tail scan with a single `bytes.find` call.

Why

When a chunk does not contain the full boundary, the parser falls back to scanning the trailing region for boundary[0]. The fallback was a Python while loop, which dominates parse time for non-trivial boundary lengths. Routing the same scan through bytes.find keeps it at C speed without changing behavior.

Measured against a 2 MB body fed in 16 KB chunks:

| Boundary length | Before | After |
| --- | --- | --- |
| 16 B | 0.43 ms | 0.33 ms |
| 2 KB | 12 ms | 1.84 ms |
| 8 KB | 43 ms | 3.49 ms |
| 32 KB | 92 ms | 12.3 ms |

The post-condition of the original loop (either data[i] == boundary[0] or i == data_length - 1) is preserved exactly by the bytes.find replacement. All existing tests pass unchanged, including test_random_splitting which exhaustively exercises the partial-boundary path.
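To make the post-condition argument concrete, here is a minimal sketch of the two scans side by side. It is not the parser's actual code: the function names `tail_scan_loop`, `tail_scan_find`, and `check_equivalence` are hypothetical, and the sketch assumes a non-empty `data` with `start <= len(data) - 1`. It only illustrates why a `bytes.find` bounded to `[start, data_length - 1)` lands on the same index as the loop.

```python
import itertools


def tail_scan_loop(data: bytes, boundary: bytes, start: int) -> int:
    """Byte-by-byte scan in the style of the loop described above (hypothetical)."""
    data_length = len(data)
    i = start
    # Stop on the first byte equal to boundary[0], or at the last index.
    while i < data_length - 1 and data[i] != boundary[0]:
        i += 1
    return i


def tail_scan_find(data: bytes, boundary: bytes, start: int) -> int:
    """Same scan routed through bytes.find (hypothetical)."""
    data_length = len(data)
    # Search [start, data_length - 1). The last byte is excluded on purpose:
    # the loop's `i < data_length - 1` guard never inspects it either.
    j = data.find(boundary[0], start, data_length - 1)
    return j if j >= 0 else data_length - 1


def check_equivalence(max_len: int = 5) -> int:
    """Compare both scans on every short input over a two-byte alphabet."""
    boundary = b"--a"
    checked = 0
    for n in range(1, max_len + 1):
        for combo in itertools.product(b"-a", repeat=n):
            data = bytes(combo)
            for start in range(n):
                expected = tail_scan_loop(data, boundary, start)
                assert tail_scan_find(data, boundary, start) == expected
                checked += 1
    return checked


check_equivalence()
```

In this sketch, the fallback to `data_length - 1` when `find` returns `-1` is what reproduces the second half of the post-condition.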

On the CodSpeed report

CodSpeed will likely report this PR as "unchanged". That is expected and not a sign the optimization is missing.

CodSpeed instrumentation mode counts retired CPU instructions, not wall-clock time. This optimization is a constant-factor win: bytes.find (C-level memchr/SIMD) replaces a Python interpreter loop. The wall-clock difference is large because CPython's per-bytecode interpreter overhead is high; the retired-instruction-count difference is small because the parser still walks the byte stream interpretively in many surrounding paths that are unchanged. Local wall-clock measurements confirm 5-87x speedup on the test_multipart_long_boundary scenario depending on body size.
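The wall-clock side of this claim can be checked locally with a rough timing sketch like the one below. It is not the benchmark used for the numbers in this PR: `scan_loop`, `scan_find`, and `time_call` are hypothetical helpers, and the 32 KB tail with no candidate byte is an assumed worst case chosen for illustration. Absolute timings will vary by machine and Python version.

```python
import time


def scan_loop(data: bytes, first: int, start: int) -> int:
    """Interpreter-level scan: one bytecode round trip per byte (hypothetical)."""
    last = len(data) - 1
    i = start
    while i < last and data[i] != first:
        i += 1
    return i


def scan_find(data: bytes, first: int, start: int) -> int:
    """Same scan delegated to bytes.find, which runs in C (hypothetical)."""
    last = len(data) - 1
    j = data.find(first, start, last)
    return j if j >= 0 else last


def time_call(fn, *args, repeats: int = 20):
    """Return (result, best wall-clock seconds) over several runs."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        t0 = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - t0)
    return result, best


# Assumed worst case: the tail contains no byte equal to the boundary's first byte.
tail = b"x" * 32_768
first = ord("-")

loop_result, loop_s = time_call(scan_loop, tail, first, 0)
find_result, find_s = time_call(scan_find, tail, first, 0)

print(f"loop: {loop_s * 1e3:.3f} ms, find: {find_s * 1e3:.3f} ms")
```

Taking the best of several runs reduces scheduling noise. This only measures the isolated scan, so it says nothing about how much of the whole parse the scan accounts for.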

Test plan

pytest tests/ passes (147 tests)

AI Disclaimer

This PR was developed with the assistance of either Claude or Codex. I've reviewed and verified the changes.

The fallback path for the partial-boundary tail used a Python-level byte-by-byte while loop. Replace it with a single bytes.find call, which scans the same range at C speed. Behavior is identical: the loop's post-condition (i lands on boundary[0] or at data_length - 1) is preserved exactly.

codspeed-hq · 2026-05-10T10:18:53Z

Merging this PR will not alter performance

✅ 6 untouched benchmarks

Comparing speed-up-partial-boundary-scan-v2 (7ebad5f) with main (09cb8c3)

Kludex merged commit d1b5739 into main May 10, 2026
14 checks passed

Kludex deleted the speed-up-partial-boundary-scan-v2 branch May 10, 2026 10:32

Kludex mentioned this pull request May 10, 2026

Version 0.0.28 #284

Merged

Kludex merged 1 commit into main from speed-up-partial-boundary-scan-v2

Kludex commented May 10, 2026 •

edited

Loading

Uh oh!

codspeed-hq Bot commented May 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Kludex commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

On the CodSpeed report

Test plan

AI Disclaimer

Uh oh!

codspeed-hq Bot commented May 10, 2026

Merging this PR will not alter performance

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Kludex commented May 10, 2026 •

edited

Loading