Enforce max input length to the Ada URL parser#1126
Conversation
…r library, adding ada::set_max_input_length() and ada::get_max_input_length() functions for configurable limits (defaulting to 4GB) to prevent DoS attacks and excessive memory usage. Key changes include a new get_href_size() method for efficient size calculation without allocation, enforcement checks in all parsers and setters with automatic reversion on limit exceedance, and comprehensive tests including unit tests in max_input_length.cpp and a fuzzing simulation in max_length_fuzzer.cpp. The implementation uses thread-safe atomics, preserves ABI compatibility by only adding new functions, and covers edge cases like percent-encoding expansion and cumulative setter operations.
Pull request overview
This PR adds a configurable maximum URL length guard to Ada, enforcing the limit not only on raw input but also on the normalized/serialized href produced by parsing and setters. It also introduces a get_href_size() API to measure href length without allocating.
Changes:
- Add global `ada::set_max_input_length()` / `ada::get_max_input_length()` and enforce the limit during parsing (including post-normalization growth).
- Enforce the same limit across URL setter APIs (both `url` and `url_aggregator`), rolling back changes when the limit would be exceeded.
- Add `get_href_size()` plus new tests and fuzzers to validate length invariants and `get_href_size()` correctness.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| `src/implementation.cpp` | Adds global max-length configuration storage and accessors. |
| `src/parser.cpp` | Enforces max length on input and on normalized output size after parsing. |
| `src/url.cpp` | Enforces max-length across setters for `ada::url` and adds rollback behavior. |
| `src/url_aggregator.cpp` | Enforces max-length across setters for `ada::url_aggregator` with rollback. |
| `include/ada/implementation.h` | Exposes the new max-length configuration API. |
| `include/ada/url.h` / `include/ada/url-inl.h` | Adds `url::get_href_size()` implementation mirroring `get_href()`. |
| `include/ada/url_aggregator.h` / `include/ada/url_aggregator-inl.h` | Adds `url_aggregator::get_href_size()` (buffer size). |
| `tests/basic_tests.cpp` | Adds tests asserting `get_href_size()` matches `get_href().size()`. |
| `tests/max_input_length.cpp` | Adds focused gtest coverage for max-length enforcement. |
| `tests/basic_fuzzer.cpp` | Extends existing fuzzer-style test with max-length invariant checks. |
| `tests/max_length_fuzzer.cpp` | Adds a standalone fuzzer-like executable to validate max-length invariants. |
| `tests/CMakeLists.txt` | Wires new tests into the build/test system. |
| `fuzz/max_length.cc` | Adds an OSS-Fuzz style target for max-length enforcement invariants. |
| `README.md` | Documents the new URL size limit behavior and `get_href_size()`. |
Codecov Report

❌ Patch status failed: the patch coverage (70.73%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

```
@@           Coverage Diff            @@
##             main    #1126    +/-  ##
==========================================
+ Coverage   59.71%   60.34%   +0.62%
==========================================
  Files          37       37
  Lines        5958     6057      +99
  Branches     2907     2955      +48
==========================================
+ Hits         3558     3655      +97
+ Misses        593      573      -20
- Partials     1807     1829      +22
```
Merging this PR will not alter performance
clang-tidy seems to fail

False positive. It cannot happen. The tool is not smart enough.
@anonrig Next release should be a major release. |
|
For the Python bindings, I'm inclined to set a maximum of 256 KiB and document that it's possible to change it at the process level. Seems like a generous default to me, but if we're aware of applications that need more, I could choose something larger.
@bbayles gunicorn has different limits by default... https://gunicorn.org/reference/settings/ For basic HTTP, it is going to be 4 kB. It seems that 256 KiB is plenty.
Up until now, we limited the size of the input string when parsing (to 4 GB). This limit was hardcoded and could not be changed. It was also weakly enforced: an input under 4 GB could produce a larger normalized string. Moreover, the setters enforced no length limit at all.
That's not much of a weakness because if you have a 4 GB URL, you have other problems.
BUT: we can cheaply set a different default. Say: you want all URLs to fit in 64kB. If the normalized URL exceeds 64kB, then you fail, that's it.
To make this efficient, we need a `get_href_size()` to save on some of the checks.

AI is telling me that once you get to 8 kB or beyond, URLs become unusable in practice due to the limits set at the server level. I am not arguing we set a hard limit, but many users would definitely want to put a reasonable limit as part of their system's design. I cannot see any reason for a URL to exceed 2 MB in the real world. It is almost certainly a bug or an attack.
The fun thing is that we can now fuzz this easily. Set a short limit (like 512 bytes) and you can test it out and make sure that you fail before you exceed the limit.