Speed up multipart header parsing and callback dispatch#295
Conversation
Parse header field names and values with bytes.find/translate to jump to the delimiter instead of scanning byte by byte, and drop the per-callback logger.debug calls from the hot path. This roughly halves parse time for header-heavy and small forms (large_form ~3x, simple_form ~2x, querystring ~2.4x on CPython 3.13/3.14), with no behaviour change for valid input.
Merging this PR will improve performance by 34.1%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ⚡ | Simulation | test_querystring_large_form |
1,229.4 µs | 916.8 µs | +34.1% |
Tip
Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.
Comparing perf/hot-path-optimizations (0a64a18) with main (9d3ead5)
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0a64a18255
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| field = data[i:end] | ||
| if field.translate(None, TOKEN_CHARS): |
There was a problem hiding this comment.
Enforce header size before bulk-validating field names
For a malformed multipart part whose header field never contains : and is delivered in a large write() chunk, this slices and translates the entire remaining chunk before advance_header_size(end - i) runs below. That means the default max_header_size no longer bounds the work or temporary allocation for oversized header names; a request with megabytes of token characters in a header line will be scanned/copied in full instead of failing once the 4 KiB limit is crossed.
Useful? React with 👍 / 👎.
Summary
Two hot-path optimizations to
MultipartParser, both behaviour-preserving for valid input:find-based header parsing.HEADER_FIELDandHEADER_VALUEnow jump straight to the delimiter withdata.find()and validate the whole field-name span at once viabytes.translate(None, TOKEN_CHARS), instead of scanning byte by byte. This mirrors howPART_DATAand the querystring parser already usedata.find.logger.debug.BaseParser.callbackwas issuing twologger.debug(...)calls on every single callback (per part-data chunk, per header, per field), which dominated dispatch cost even when logging was disabled.A couple of incidental cleanups:
boundary_length = len(boundary)is hoisted out of the per-iterationPART_DATApath, andTOKEN_CHARSis exposed asbytes(withTOKEN_CHARS_SETderived from it) for the bulk validation.Benchmarks
Measured against
mainon CPython 3.13 and 3.14 (best-of-N, ns/op), using the existingtests/test_benchmarks.pyworkloads:large_formsimple_formquerystringfile_uploadworstcaseHeader-heavy and small forms benefit most; body-dominated cases (
file_upload,worstcase) already usedfindfor boundary scanning, so they only pick up the callback-dispatch savings.Correctness
main's implementation: identical callback event streams and identical errors across randomized bodies x every chunk-split strategy (including byte-by-byte), at both small and default header limits.tests/test_formparsers.py(40 tests) against this branch - all pass.The one observable difference is on a malformed-input error path: a header name that is both over
max_header_sizeand contains an invalid character may report the invalid-character error rather than the size-limit error (both still raiseMultipartParseError). This was deemed acceptable.AI Disclaimer
This PR was developed with the assistance of either Claude or Codex. I've reviewed and verified the changes.