fuzz: improve coverage with new serializers target and stronger invariant checks by anonrig · Pull Request #1104 · ada-url/ada

anonrig · 2026-03-28T01:34:11Z

Add a new fuzz/serializers.cc target that directly exercises code paths
that were only reachable indirectly before:

ada::serializers::ipv4() and ada::serializers::ipv6() with arbitrary
uint64_t / uint16_t[8] inputs, including a round-trip invariant check for
find_longest_sequence_of_ipv6_pieces.
ada::checkers::try_parse_ipv4_fast() with a serialisation round-trip
assertion (parse → serialise → re-parse must give the same address).
All six ada::unicode::percent_encode() character-set variants, plus the
template <false>/<true> forms, with a decode round-trip check.
Integration tests that embed the synthesised addresses in real URLs and
verify href idempotency.

Strengthen existing targets with new invariant-based checks:

parse.cc

FDP-controlled sequential setter testing: the fuzzer now picks an
arbitrary sequence of setter/value pairs and asserts that ada::url and
ada::url_aggregator stay in sync (same href) after every step.
Re-parse idempotency: parse(href).get_href() == href is asserted for
every successfully parsed URL.
url_search_params round-trip via URL: parses the fuzz input as a query
string, mutates via search-params, and verifies serialisation idempotency
before setting it back on the URL.

can_parse.cc

Adds href round-trip: if parse(source) succeeds, can_parse(href) must
be true and parse(href).get_href() must equal href.

url_search_params.cc

Adds serialisation idempotency: url_search_params(sp.to_string()).to_string() == sp.to_string() is checked for both constructed and mutated params.
Adds URL round-trip integration test.

idna.cc

Adds to_ascii stability: to_ascii(to_ascii(x)) == to_ascii(x).
Adds to_unicode stability.
Adds long-domain (≤270 char) inputs to exercise DNS-length boundary checks.
Adds Punycode decode→encode→decode round-trip with equality assertion.

url.dict

Adds hex/octal IPv4 forms, IPv4-mapped IPv6, Windows/UNC file paths,
data URIs with content types, tricky percent sequences (overlong UTF-8,
double-encoding, C0 controls), opaque-host URLs, and scheme case variants.

https://claude.ai/code/session_018zxmHQjAxYzQMdPzWzZjef

…iant checks Add a new `fuzz/serializers.cc` target that directly exercises code paths that were only reachable indirectly before: - `ada::serializers::ipv4()` and `ada::serializers::ipv6()` with arbitrary uint64_t / uint16_t[8] inputs, including a round-trip invariant check for `find_longest_sequence_of_ipv6_pieces`. - `ada::checkers::try_parse_ipv4_fast()` with a serialisation round-trip assertion (parse → serialise → re-parse must give the same address). - All six `ada::unicode::percent_encode()` character-set variants, plus the template `<false>`/`<true>` forms, with a decode round-trip check. - Integration tests that embed the synthesised addresses in real URLs and verify href idempotency. Strengthen existing targets with new invariant-based checks: `parse.cc` - FDP-controlled sequential setter testing: the fuzzer now picks an arbitrary sequence of setter/value pairs and asserts that `ada::url` and `ada::url_aggregator` stay in sync (same href) after every step. - Re-parse idempotency: `parse(href).get_href() == href` is asserted for every successfully parsed URL. - `url_search_params` round-trip via URL: parses the fuzz input as a query string, mutates via search-params, and verifies serialisation idempotency before setting it back on the URL. `can_parse.cc` - Adds href round-trip: if `parse(source)` succeeds, `can_parse(href)` must be true and `parse(href).get_href()` must equal `href`. `url_search_params.cc` - Adds serialisation idempotency: `url_search_params(sp.to_string()).to_string() == sp.to_string()` is checked for both constructed and mutated params. - Adds URL round-trip integration test. `idna.cc` - Adds `to_ascii` stability: `to_ascii(to_ascii(x)) == to_ascii(x)`. - Adds `to_unicode` stability. - Adds long-domain (≤270 char) inputs to exercise DNS-length boundary checks. - Adds Punycode decode→encode→decode round-trip with equality assertion. `url.dict` - Adds hex/octal IPv4 forms, IPv4-mapped IPv6, Windows/UNC file paths, data URIs with content types, tricky percent sequences (overlong UTF-8, double-encoding, C0 controls), opaque-host URLs, and scheme case variants. https://claude.ai/code/session_018zxmHQjAxYzQMdPzWzZjef

…rage url_pattern.cc - Add exec/test consistency invariant: exec() finds a match iff test() returns true, checked for both no-base and with-base variants, and for url_pattern_input (init-based) inputs. An abort is triggered on any disagreement. - Introduce exercise_match_result() helper that also walks the previously untested url_pattern_result::inputs vector. - Add testing for url_pattern_init::base_url, a field that was previously never exercised by any fuzzer (tested with a fixed and a fuzzed base). - Add random bitmask field selection so every subset (0..255) of the eight url_pattern_init component fields is reachable, covering combinations the previous single-field and all-fields cases missed. ada_c.c - Add C-API re-parse idempotency check: after all setter mutations the URL's final href is re-parsed via ada_parse() and compared byte-for-byte; an abort is triggered if parsing fails or produces a different href. serializers.cc - Add direct coverage of internal utility functions unreachable through the URL parsing pipeline alone: - ada::unicode::has_tabs_or_newline() - ada::checkers::is_ipv4() with cross-check: if is_ipv4 reports true, embedding the string as an http:// host must be accepted by the parser - ada::checkers::path_signature() - ada::checkers::is_windows_drive_letter() and is_normalized_windows_drive_letter() with a consistency assertion (normalised ⊆ drive-letter) - ada::unicode::to_lower_ascii() (in-place, length invariant asserted) - ada::unicode::contains_forbidden_domain_code_point() - ada::unicode::contains_forbidden_domain_code_point_or_upper() - ada::checkers::verify_dns_length() https://claude.ai/code/session_018zxmHQjAxYzQMdPzWzZjef

codecov · 2026-03-28T01:35:55Z

Codecov Report

❌ Patch coverage is 17.94872% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.60%. Comparing base (3b9bacb) to head (a616e41).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
src/ada_idna.cpp	8.57%	26 Missing and 6 partials ⚠️

❌ Your patch status has failed because the patch coverage (17.94%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1104      +/-   ##
==========================================
- Coverage   59.86%   59.60%   -0.27%     
==========================================
  Files          37       37              
  Lines        5811     5815       +4     
  Branches     2846     2848       +2     
==========================================
- Hits         3479     3466      -13     
- Misses        537      569      +32     
+ Partials     1795     1780      -15

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

tl::expected<T, E> cannot be compared to std::nullopt_t; use .has_value() instead. This was caught by the OSS-Fuzz CI build. https://claude.ai/code/session_018zxmHQjAxYzQMdPzWzZjef

…ents - Run clang-format --style=Google on all fuzz files to fix CI violations - Add fuzz/unicode.cc: new fuzzer for 15+ low-level unicode/checker/helper utilities (per-char classification, string helpers, prune_hash, trim, remove_ascii_tab_or_newline, delimiter finders, to_ascii internal API) with consistency invariants (forbidden_host⊆forbidden_domain, ascii_digit⊆ascii_hex_digit, lowercase_hex⊆ascii_hex_digit, etc.) - Add fuzz/unicode.options pointing at url.dict - Add unicode.cc compilation to fuzz/build.sh - fuzz/parse.cc: assert clear_port/clear_search/clear_hash postconditions - fuzz/url_search_params.cc: assert has(k) and get(k) consistency during range-for iteration; verify has(k,v)→has(k) invariant - fuzz/ada_c.c: validate ada_get_scheme_type ∈ [0,6], ada_get_host_type ∈ [0,2], and all ada_url_components offsets ≤ href.length when not omitted https://claude.ai/code/session_018zxmHQjAxYzQMdPzWzZjef

ada_get_components() returns const ada_url_components*, not a value. Use pointer variable + -> dereference for comps, and keep . access for the ada_string and ada_version_components value types. https://claude.ai/code/session_018zxmHQjAxYzQMdPzWzZjef

ada_url_components.port stores the port NUMBER (0-65535), not an offset into the href string. Checking port > href.length gives false positives for any URL whose port number exceeds the href length (e.g. http://example.com:8080/). Remove the erroneous port bounds check. The other offset fields (protocol_end, username_end, host_start, host_end, pathname_start, search_start, hash_start) are genuine byte offsets into the href and their bounds checks are correct. https://claude.ai/code/session_018zxmHQjAxYzQMdPzWzZjef

Both functions had an early-exit check `input.starts_with("xn--")` that was intended to prevent double-encoded ACE labels (a decoded Punycode label that itself starts with "xn--"). However the check was incorrect: Punycode always outputs all ASCII (basic) code points first in the encoded stream, regardless of their position in the original label. A mixed label like "CJK-chars xn--" encodes to content starting with "xn--" even though the decoded label starts with a non-ASCII character. Consequence: ada accepted such URLs (e.g. http://[cjk]xn--./), produced a normalized href with the host ACE-encoded as "xn--xn---...", then failed to re-parse that href -- a self-consistency violation. Fix: remove the premature starts_with check from punycode_to_utf32 and instead add a post-decode check that rejects labels whose decoded form actually starts with U+0078 U+006E U+002D U+002D ("xn--"). This correctly handles the dangerous double-encoding case while allowing valid mixed labels. verify_punycode is simplified to delegate to punycode_to_utf32, which carries the correct check, rather than duplicating the logic with the same bug. Fixes round-trip idempotency: ada_parse(href) now succeeds when href was produced by ada itself, for all tested inputs. https://claude.ai/code/session_018zxmHQjAxYzQMdPzWzZjef

codspeed-hq · 2026-03-28T12:20:20Z

Merging this PR will not alter performance

✅ 27 untouched benchmarks
⏩ 4 skipped benchmarks¹

_{Comparing claude/improve-fuzzing-coverage-qRMnO (a616e41) with main (3b9bacb)}

4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

The previous fix for the nested-ACE-label security check (whatwg/url#803) made verify_punycode delegate to punycode_to_utf32, which allocates a std::u32string. This regressed percent-encode / IDNA performance because: 1. verify_punycode now heap-allocates on every call. 2. The to_unicode call site called verify_punycode then immediately called punycode_to_utf32 on the same input -- two full decodes. Fix both issues: * Restore verify_punycode to be fully allocation-free: the new implementation mirrors punycode_to_utf32's decode loop but only tracks the first 4 code points (via a stack-allocated uint32_t[4]), which is all that is needed to check the "xn--" prefix invariant. * Collapse the nested verify_punycode + punycode_to_utf32 calls at the to_unicode call site into a single punycode_to_utf32 call. Both outer/inner else-branches did the same thing (append original label), so the two levels reduce to one. Also add a regression test that exercises the mixed-label case that triggered the original bug (a label like U+33FF U+33FD xn-- that encodes to Punycode content starting with "xn--" even though the decoded label starts with a non-ASCII code point). https://claude.ai/code/session_018zxmHQjAxYzQMdPzWzZjef

Both url::parse_host and url_aggregator::parse_host have fast paths for pure decimal IPv4 and simple ASCII domains that returned true without setting is_valid=true. If a prior setter call left is_valid=false (e.g. set_host with a forbidden code point), a subsequent set_host taking the fast path would leave is_valid=false, causing parse_port to skip updating the port silently — diverging url and url_aggregator state and triggering the sequential-setter href-mismatch abort in fuzz/parse.cc:534. Fix by explicitly setting is_valid=true before returning from each successful fast path in both implementations. https://claude.ai/code/session_018zxmHQjAxYzQMdPzWzZjef

Adds set_host_fast_path_restores_is_valid typed test covering the bug where a failed set_host (leaving is_valid=false) followed by a successful set_host taking the fast path would leave is_valid=false, causing parse_port to silently skip updating the port. https://claude.ai/code/session_018zxmHQjAxYzQMdPzWzZjef

https://claude.ai/code/session_018zxmHQjAxYzQMdPzWzZjef

Applies relevant changes from ada-url/ada#1104: - punycode_to_utf32: remove premature starts_with("xn--") early-exit that incorrectly rejected valid punycode whose payload begins with "xn--" (e.g. a mixed label like U+33FF U+33FD followed by ASCII "xn--"). Replace with a post-decode check that rejects labels whose *decoded* form starts with U'x' U'n' U'-' U'-', which correctly identifies double-encoded ACE. - verify_punycode: rewrite to be allocation-free by tracking only the first 4 decoded code points using a stack array, simulating the insert-at-position operation just for those slots. This avoids heap allocation while preserving the correctness of the double-ACE detection. - to_unicode: collapse the redundant verify_punycode + punycode_to_utf32 calls into a single punycode_to_utf32 call, eliminating the double-decode overhead. Adds a regression test covering the mixed-label case and verifying to_ascii idempotency. Fixes: whatwg/url#803 https://claude.ai/code/session_01C5dm8PKRZ7y7a8aTp53zBQ

claude added 2 commits March 28, 2026 00:50

claude added 5 commits March 28, 2026 01:37

fuzz: fix can_parse.cc compile error with tl::expected

3be317d

tl::expected<T, E> cannot be compared to std::nullopt_t; use .has_value() instead. This was caught by the OSS-Fuzz CI build. https://claude.ai/code/session_018zxmHQjAxYzQMdPzWzZjef

claude added 4 commits March 28, 2026 13:11

fix: clang-format the new regression test

a616e41

https://claude.ai/code/session_018zxmHQjAxYzQMdPzWzZjef

anonrig merged commit 5737046 into main Mar 28, 2026
54 of 55 checks passed

anonrig deleted the claude/improve-fuzzing-coverage-qRMnO branch March 28, 2026 23:10

anonrig mentioned this pull request Mar 28, 2026

Apply ada#1104: fix punycode xn-- check + eliminate per-label allocations ada-url/idna#72

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fuzz: improve coverage with new serializers target and stronger invariant checks#1104

fuzz: improve coverage with new serializers target and stronger invariant checks#1104
anonrig merged 11 commits intomainfrom
claude/improve-fuzzing-coverage-qRMnO

anonrig commented Mar 28, 2026

Uh oh!

codecov bot commented Mar 28, 2026 •

edited

Loading

Uh oh!

codspeed-hq bot commented Mar 28, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

anonrig commented Mar 28, 2026

Uh oh!

codecov bot commented Mar 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

codspeed-hq bot commented Mar 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Footnotes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov bot commented Mar 28, 2026 •

edited

Loading

codspeed-hq bot commented Mar 28, 2026 •

edited

Loading