-
Notifications
You must be signed in to change notification settings - Fork 806
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[wishlist] Option to limit recursion depth #1120
Comments
Having auxiliary data carried along with the input helps working around rust-bakery#1120.
Having auxiliary data carried along with the input helps working around rust-bakery#1120.
Having auxiliary data carried along with the input helps working around rust-bakery#1120.
Having auxiliary data carried along with the input helps working around rust-bakery#1120.
Having auxiliary data carried along with the input helps working around rust-bakery#1120.
Why don’t you just have something like cp which calls my_cp(0) and my_cp(n) which calls my_cp(n+1) in the alt? This way you can throw an error when n is too big without any modification in nom |
Thanks @Keruspe, it works! Two follow ups:
|
I'm not sure a custom input type would be the solution, as it would not handle having multiple parsers doing recursion (like hashes and arrays in JSON). And the suggestion from @Keruspe works only if the recursion is right in the same function, not multiple levels deeper. I was experimenting with an ugly hack, using a static variable: use nom::branch::alt;
use nom::bytes::complete::tag;
use nom::sequence::delimited;
use nom::{Parser, IResult, error::ParseError, error::ErrorKind};
fn cp(i: &str) -> nom::IResult<&str, &str> {
delimited(tag("("), alt((tag("x"), rec(cp))), tag(")"))(i)
}
fn rec<I, O, E: ParseError<I>, F>(mut parser: F) -> impl FnMut(I) -> IResult<I, O, E>
where F: Parser<I, O, E>
{
static mut REC: u32 = 0;
move |input: I| {
unsafe {
println!("calling rec(), REC = {}", REC);
REC += 1;
if REC > 10 {
return Err(nom::Err::Failure(E::from_error_kind(input, ErrorKind::TooLarge)));
}
}
let res = parser.parse(input);
unsafe {
REC -= 1;
}
res
}
}
#[test]
fn recursion_limit() {
let mut input: String = std::iter::repeat('(').take(10000).collect();
input += "x";
input.extend(std::iter::repeat(')').take(10000));
println!("{:?}", cp(input.as_str()));
println!("try again:");
println!("{:?}", cp(input.as_str()));
} I believe that could be a good approach if we can solve the following issues:
I think that can only be done in a macro, but maybe there's a better way |
If you're referring to #1125, then I don't see why not? As long as the parsers share the same input instance, they have a piece of shared auxiliary data they can all look at to decide to stop the recursions. |
Having auxiliary data carried along with the input helps working around rust-bakery#1120.
you might want to have different recursion limits depending on the parser, like some heavy top level objects have a depth of 3-4 max, but then lower objects get different limits |
Before I go on, I'd like to make it clear, that I'm not arguing that #1125 is and elegant, or a "good" solution. My original design goal was to keep the nom changes minimal, but still allow me to keep track of the recursion depth. Then @Keruspe showed me a better, non-intrusive solution :). But, on the powerful-vs-useful spectrum[1] #1125 is on the powerful side. I can see why you would not want to use it, but you can implement arbitrarily complex algorithms with it. In the parser test I added a simple integer as aux data, but if you want you can add a pair of integers, one keeping count of the "heavy" object depth, and another counting the "light" object depths. 1: http://blog.higher-order.com/blog/2014/12/21/maximally-powerful/ |
Having auxiliary data carried along with the input helps working around rust-bakery#1120.
In order to avoid StackOverflow one might want to look at https://lib.rs/crates/stacker |
@homersimpsons Well, that turns the SO into an OOM kill, which may or may not be an improvement. When processing potentially hostile data though, I'd still prefer a limit on recursion depth and and explicit parsing failure I can feed to my IDS. |
Conditional compilation requires I would not bet on the compiler to optimise away the counter in each function though, when there is no concept to set a const value to |
This would be great. It's tough to work around it right now, as it's also hard to thread state through the nom combinators. In our project we had to resort to a thread-local to keep track the recursion count. |
Issue
Deep recursions trigger a stack overflow, that canot be handled.
Consequence
Cannot use recursive nom parser on untrusted input, as cannot protect against DoS.
Demo
A minimal repro case:
With input:
reports:
Improvement Option
Intruduce a maximal depth limit, which when reached gracefuly terminates the parsing.
The text was updated successfully, but these errors were encountered: