refactor(dpi/http): case-fold header name without allocating#301
Merged
domcyrus merged 1 commit intoMay 21, 2026
Merged
Conversation
The header loop in `analyze_http` lowercased every header line via `to_lowercase()` only to compare the result against the two ASCII literals "host" and "user-agent". That allocates a fresh `String` per header line per packet, on the DPI hot path. HTTP/1.1 §3.2 makes field-names case-insensitive, so compare in place with `str::eq_ignore_ascii_case` — no allocation, behavior preserved. Add `test_http_mixed_case_host_and_user_agent_headers` covering four case mixes per header name so a future refactor that drops the case-insensitive compare fails this test instead of silently regressing — same shape as the regression tests added in domcyrus#290 (SSDP) and domcyrus#278 (NetBIOS). Refs domcyrus#300
|
This PR is lacking any proof that this is a performance gain and or that the compiler does not optimize the current code to the same as the proposed code. |
Owner
|
@laundmo That's a fair point but I do actually think that this code is actually slightly cleaner and more correct (ASCII-only matching vs. full Unicode case-folding. Hence I think this is a good change. |
Owner
|
@obchain Thanks for you PR, this LGTM! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The header loop in
analyze_http(src/network/dpi/http.rs) lowercased every header line into a freshStringonly to compare the result against the two ASCII literals"host"and"user-agent". For an HTTP request with N header lines this allocates NStrings per packet, on the DPI hot path.HTTP/1.1 §3.2 makes field-names case-insensitive, so compare in place with
str::eq_ignore_ascii_case— no allocation, behavior preserved:Same shape as the SSDP fix in #290 — the HTTP path is hotter (every detected packet on a TCP HTTP flow), and realistic header counts are 8–20+ per request, so the savings are larger.
Closes #300.
Why a regression test
The existing tests only exercised the canonical
Host:capitalisation, so dropping the case-insensitive compare would not have failed any test. The newtest_http_mixed_case_host_and_user_agent_headersexercises four case mixes per header name (Host,host,HOST,hOsTandUser-Agent,user-agent,USER-AGENT,User-AGENT) so a future refactor that drops the case fold will fail this test instead of silently regressing — same shape as the regression test added in #290 (SSDP) and #278 (NetBIOS).Verification
cargo test --lib: 362 passed, 0 failed (5 innetwork::dpi::http, incl. the new test).cargo clippy --all-targets -- -D warnings: clean.cargo fmt --check: clean.Notes
Real-world HTTP traffic uses varied capitalisation for header names: most clients send
Host/User-Agent, butcurl --headerlets the user send any casing, HTTP/2 → HTTP/1.1 gateways often lower-case the names, and RFC 7230 §3.2 explicitly makes the field-name case-insensitive. The previousto_lowercase()already accepted any casing — this change keeps that behavior without the per-header allocation.