Enforce max input length to the Ada URL parser #1126

Merged
lemire merged 7 commits into main from control_url_size on Apr 10, 2026

lemire (Member) commented Apr 8, 2026

Until now, we limited the size of the input string when parsing (to 4 GB). This limit was hardcoded and could not be changed. It was also weakly enforced: a string under 4 GB could produce a larger normalized string, and the setters enforced no length limit at all.
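To illustrate how normalization can push a URL past a limit that the raw input respected, here is a minimal, self-contained sketch. `toy_normalize` is a hypothetical stand-in that only percent-encodes spaces; it is not Ada's normalizer, and all names are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Toy normalizer (NOT Ada's): percent-encodes spaces only, so each space
// grows from 1 byte to 3 bytes ("%20").
std::string toy_normalize(std::string_view input) {
  std::string out;
  out.reserve(input.size());
  for (char c : input) {
    if (c == ' ') {
      out += "%20";
    } else {
      out += c;
    }
  }
  return out;
}

// True when the raw input fits within `limit` but its normalized form does not.
bool passes_input_check_but_exceeds_limit(std::string_view input,
                                          std::size_t limit) {
  return input.size() <= limit && toy_normalize(input).size() > limit;
}

// Example: a 5-byte input that grows to 9 bytes once normalized.
void demo_expansion() {
  assert(toy_normalize("a b c").size() == 9);
  assert(passes_input_check_but_exceeds_limit("a b c", 5));
}
```

A check on the input length alone would accept `"a b c"` under a 5-byte limit, which is why the limit also has to be checked against the normalized result.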

That's not much of a weakness because if you have a 4 GB URL, you have other problems.

BUT: we can cheaply let users set a different limit. Say you want all URLs to fit in 64 kB: if the normalized URL exceeds 64 kB, then you fail, and that's it.

To make this efficient, we need a get_href_size() function that reports the size of the href without building the string, which saves on some of the checks.
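The idea of measuring the href without allocating can be sketched as follows. This is a simplified model under stated assumptions: `toy_url`, its four fields, and the `href()` / `href_size()` member names are hypothetical and far simpler than Ada's real `url` class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified URL record (NOT Ada's url class).
struct toy_url {
  std::string scheme;  // e.g. "https"
  std::string host;    // e.g. "example.com"
  std::string path;    // e.g. "/a/b"
  std::string query;   // without the leading '?'; empty means absent

  // Builds the serialized URL; this allocates a new string.
  std::string href() const {
    std::string out = scheme + "://" + host + path;
    if (!query.empty()) {
      out += '?';
      out += query;
    }
    return out;
  }

  // Mirrors href() piece by piece but only adds up lengths: no allocation.
  std::size_t href_size() const noexcept {
    std::size_t n = scheme.size() + 3 + host.size() + path.size();
    if (!query.empty()) {
      n += 1 + query.size();
    }
    return n;
  }
};

// Example: the two functions must always agree on the size.
void demo_href_size() {
  toy_url u{"https", "example.com", "/a/b", "x=1"};
  assert(u.href() == "https://example.com/a/b?x=1");
  assert(u.href_size() == u.href().size());
}
```

The cost of this design is that the size function has to be kept in sync with the serializer, which is a natural thing to assert in tests.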

AI is telling me that once you get to 8kB or beyond, URLs become unusable in practice due to the limits set at the server level. I am not arguing we set a hard limit, but many users would definitely want to put a reasonable limit as part of their system's design. I cannot see any reason for a URL to exceed 2 MB in the real world. It is almost certainly a bug or an attack.

The fun thing is that we can now fuzz this easily. Set a short limit (like 512 bytes) and you can test it out and make sure that you fail before you exceed the limit.
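The invariant being fuzzed can be sketched with a toy parser and a deterministic input generator. Everything here is an assumption for illustration: `toy_parse` is not Ada's parser, and a real fuzzer would feed engine-generated bytes to the library rather than use this small linear congruential generator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Toy stand-in for a parser (NOT Ada's): percent-encodes spaces and fails
// as soon as the output would exceed `limit`.
std::optional<std::string> toy_parse(std::string_view input,
                                     std::size_t limit) {
  if (input.size() > limit) return std::nullopt;  // cheap early rejection
  std::string out;
  for (char c : input) {
    if (c == ' ') {
      out += "%20";
    } else {
      out += c;
    }
    if (out.size() > limit) return std::nullopt;  // grew past the limit
  }
  return out;
}

// Feeds `rounds` pseudo-random inputs to toy_parse and returns false on the
// first result that succeeds yet exceeds the limit.
bool invariant_holds(std::size_t limit, int rounds) {
  std::uint32_t state = 12345;  // fixed seed keeps the run reproducible
  for (int i = 0; i < rounds; ++i) {
    state = state * 1664525u + 1013904223u;  // LCG step
    const std::size_t len = (state >> 16) % (2 * limit + 1);
    std::string input;
    for (std::size_t j = 0; j < len; ++j) {
      state = state * 1664525u + 1013904223u;
      input += ((state >> 16) % 4 == 0) ? ' ' : 'a';
    }
    const auto result = toy_parse(input, limit);
    if (result && result->size() > limit) return false;
  }
  return true;
}

// Example: with a short 512-byte limit, no successful parse may exceed it.
void demo_fuzz() { assert(invariant_holds(512, 1000)); }
```

A short limit matters because it makes the failure path cheap to reach: inputs of a few hundred bytes are enough to exercise it.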

…r library, adding ada::set_max_input_length() and ada::get_max_input_length() functions for configurable limits (defaulting to 4GB) to prevent DoS attacks and excessive memory usage. Key changes include a new get_href_size() method for efficient size calculation without allocation, enforcement checks in all parsers and setters with automatic reversion on limit exceedance, and comprehensive tests including unit tests in max_input_length.cpp and a fuzzing simulation in max_length_fuzzer.cpp. The implementation uses thread-safe atomics, preserves ABI compatibility by only adding new functions, and covers edge cases like percent-encoding expansion and cumulative setter operations.
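A rough sketch of how a process-wide, thread-safe limit might be stored, assuming a 32-bit atomic whose default is the largest 32-bit value (about 4 GB). The `toy` namespace, the variable, and the integer type are assumptions for illustration; only the function names echo the ones mentioned above, and this is not Ada's actual code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <limits>

namespace toy {

// Process-wide limit. Assumption: 32-bit, defaulting to the maximum (~4 GB).
std::atomic<std::uint32_t> max_input_length{
    std::numeric_limits<std::uint32_t>::max()};

// Relaxed ordering suffices in this sketch: the value is an independent
// setting and does not guard any other shared data.
void set_max_input_length(std::uint32_t length) noexcept {
  max_input_length.store(length, std::memory_order_relaxed);
}

std::uint32_t get_max_input_length() noexcept {
  return max_input_length.load(std::memory_order_relaxed);
}

}  // namespace toy

// Example: lower the limit to 64 kB, then restore the previous value.
void demo_limit() {
  const std::uint32_t previous = toy::get_max_input_length();
  toy::set_max_input_length(64u * 1024u);
  assert(toy::get_max_input_length() == 64u * 1024u);
  toy::set_max_input_length(previous);
}
```

Because the setting is global, changing it affects every parse in the process, so callers such as tests would typically restore the previous value as `demo_limit` does.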
Copilot AI left a comment

Pull request overview

This PR adds a configurable maximum URL length guard to Ada, enforcing the limit not only on raw input but also on the normalized/serialized href produced by parsing and setters. It also introduces a get_href_size() API to measure href length without allocating.

Changes:

  • Add global ada::set_max_input_length() / ada::get_max_input_length() and enforce the limit during parsing (including post-normalization growth).
  • Enforce the same limit across URL setter APIs (both url and url_aggregator), rolling back changes when the limit would be exceeded.
  • Add get_href_size() plus new tests and fuzzers to validate length invariants and get_href_size() correctness.
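The rollback behavior described for the setters can be sketched like this. It is a minimal model, assuming a hypothetical `toy_url` with a fixed `https://` scheme and an explicit `limit` parameter; Ada's setters have different signatures and read the limit from the global configuration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical, simplified URL record (NOT Ada's url class).
struct toy_url {
  std::string host;
  std::string path;

  // Size of "https://" + host + path, computed without building the string.
  std::size_t href_size() const noexcept {
    return 8 + host.size() + path.size();
  }

  // Applies the new path, then rolls back if the URL would exceed `limit`.
  bool set_path(std::string_view new_path, std::size_t limit) {
    std::string candidate(new_path);
    path.swap(candidate);  // `candidate` now holds the old path
    if (href_size() > limit) {
      path.swap(candidate);  // restore the old path
      return false;
    }
    return true;
  }
};

// Example: an oversized path is rejected and the URL is left unchanged.
void demo_rollback() {
  toy_url u{"example.com", "/a"};
  assert(!u.set_path(std::string(100, 'x'), 32));
  assert(u.path == "/a");
}
```

Checking the size after applying the change keeps the check simple, at the cost of doing the work before discovering that it must be undone.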

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| src/implementation.cpp | Adds global max-length configuration storage and accessors. |
| src/parser.cpp | Enforces max length on input and on normalized output size after parsing. |
| src/url.cpp | Enforces max-length across setters for ada::url and adds rollback behavior. |
| src/url_aggregator.cpp | Enforces max-length across setters for ada::url_aggregator with rollback. |
| include/ada/implementation.h | Exposes the new max-length configuration API. |
| include/ada/url.h / include/ada/url-inl.h | Adds url::get_href_size() implementation mirroring get_href(). |
| include/ada/url_aggregator.h / include/ada/url_aggregator-inl.h | Adds url_aggregator::get_href_size() (buffer size). |
| tests/basic_tests.cpp | Adds tests asserting get_href_size() matches get_href().size(). |
| tests/max_input_length.cpp | Adds focused gtest coverage for max-length enforcement. |
| tests/basic_fuzzer.cpp | Extends existing fuzzer-style test with max-length invariant checks. |
| tests/max_length_fuzzer.cpp | Adds a standalone fuzzer-like executable to validate max-length invariants. |
| tests/CMakeLists.txt | Wires new tests into the build/test system. |
| fuzz/max_length.cc | Adds an OSS-Fuzz style target for max-length enforcement invariants. |
| README.md | Documents the new URL size limit behavior and get_href_size(). |

Comment thread tests/max_input_length.cpp
Comment thread tests/max_length_fuzzer.cpp
Comment thread tests/basic_fuzzer.cpp
Comment thread tests/basic_tests.cpp Outdated
Comment thread src/implementation.cpp
Comment thread README.md
Comment thread include/ada/implementation.h
Comment thread src/url.cpp Outdated
Comment thread fuzz/max_length.cc
codecov Bot commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 70.73171% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.34%. Comparing base (95895d6) to head (6f1efaa).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/url.cpp | 55.81% | 3 Missing and 16 partials ⚠️ |
| src/url_aggregator.cpp | 69.69% | 0 Missing and 10 partials ⚠️ |
| src/implementation.cpp | 37.50% | 2 Missing and 3 partials ⚠️ |
| src/parser.cpp | 71.42% | 0 Missing and 2 partials ⚠️ |

❌ Your patch status has failed because the patch coverage (70.73%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1126      +/-   ##
==========================================
+ Coverage   59.71%   60.34%   +0.62%     
==========================================
  Files          37       37              
  Lines        5958     6057      +99     
  Branches     2907     2955      +48     
==========================================
+ Hits         3558     3655      +97     
+ Misses        593      573      -20     
- Partials     1807     1829      +22     

codspeed-hq Bot commented Apr 8, 2026

Merging this PR will not alter performance

✅ 27 untouched benchmarks
⏩ 4 skipped benchmarks [1]


Comparing control_url_size (6f1efaa) with main (67b4245)

Footnotes

  1. 4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

anonrig (Member) commented Apr 10, 2026

clang-tidy seems to fail

lemire (Member, Author) commented Apr 10, 2026

> clang-tidy seems to fail

False positive. It cannot happen. The tool is not smart enough.

@lemire lemire merged commit aaa4055 into main Apr 10, 2026
51 of 52 checks passed
@lemire lemire deleted the control_url_size branch April 10, 2026 21:33
lemire (Member, Author) commented Apr 10, 2026

@anonrig Next release should be a major release.

bbayles commented Apr 13, 2026

For the Python bindings, I'm inclined to set a maximum of 256 KiB and document that it's possible to change it at the process level. Seems like a generous default to me, but if we're aware of applications that need more, I could choose something larger.

lemire (Member, Author) commented Apr 13, 2026

@bbayles gunicorn has different limits by default...

https://gunicorn.org/reference/settings/

For basic HTTP, it is going to be 4 kB. It seems that 256 KiB is plenty.
