Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wishlist] Option to limit recursion depth #1120

Open
kandykoma opened this issue Mar 13, 2020 · 10 comments
Open

[wishlist] Option to limit recursion depth #1120

kandykoma opened this issue Mar 13, 2020 · 10 comments
Milestone

Comments

@kandykoma
Copy link

Issue

Deep recursions trigger a stack overflow, that canot be handled.

Consequence

Cannot use recursive nom parser on untrusted input, as cannot protect against DoS.

Demo

A minimal repro case:

fn cp(i: &str) -> nom::IResult<&str, &str> {
    use nom::branch::alt;
    use nom::bytes::complete::tag;
    use nom::sequence::delimited;

    delimited(tag("("), alt((tag("x"), cp)), tag(")"))(i)
}

fn main() {
    let mut buffer = String::new();
    let stdin = std::io::stdin();
    let mut handle = stdin.lock();

    {
        use std::io::Read;
        handle.read_to_string(&mut buffer).expect("read failed");
    }
    println!("{:?}", cp(&buffer));
}

With input:

perl -e 'print("(" x 10000, "x", ")" x 10000, "\n")' | ./target/debug/main

reports:

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Aborted

Improvement Option

Intruduce a maximal depth limit, which when reached gracefuly terminates the parsing.

kandykoma added a commit to kandykoma/nom that referenced this issue Mar 21, 2020
Having auxiliary data carried along with the input helps working around rust-bakery#1120.
kandykoma added a commit to kandykoma/nom that referenced this issue Mar 22, 2020
Having auxiliary data carried along with the input helps working around rust-bakery#1120.
kandykoma added a commit to kandykoma/nom that referenced this issue Mar 22, 2020
Having auxiliary data carried along with the input helps working around rust-bakery#1120.
kandykoma added a commit to kandykoma/nom that referenced this issue Mar 22, 2020
Having auxiliary data carried along with the input helps working around rust-bakery#1120.
kandykoma added a commit to kandykoma/nom that referenced this issue Mar 22, 2020
Having auxiliary data carried along with the input helps working around rust-bakery#1120.
@Keruspe
Copy link
Contributor

Keruspe commented Mar 23, 2020

Why don’t you just have something like cp which calls my_cp(0) and my_cp(n) which calls my_cp(n+1) in the alt?

This way you can throw an error when n is too big without any modification in nom

@kandykoma
Copy link
Author

Thanks @Keruspe, it works!

Two follow ups:

  • is there a better way to trigger an error?
  • is there a way to not only signal a matching failure, but trigger an abort of the parsing process?
fn cp(n: u32) -> impl Fn(&str) -> nom::IResult<&str, &str> {
    use nom::branch::alt;
    use nom::bytes::complete::tag;
    use nom::sequence::delimited;
    move |i| {
        if n > 10 {
            tag("never matching string")(i)
        } else {
            delimited(tag("("), alt((tag("x"), cp(n+1))), tag(")"))(i)
        }
    }
}

fn main() {
    let mut buffer = String::new();
    let stdin = std::io::stdin();
    let mut handle = stdin.lock();

    {
        use std::io::Read;
        handle.read_to_string(&mut buffer).expect("read failed");
    }
    println!("{:?}", cp(0)(&buffer));
}

@Geal
Copy link
Collaborator

Geal commented Oct 11, 2020

I'm not sure a custom input type would be the solution, as it would not handle having multiple parsers doing recursion (like hashes and arrays in JSON). And the suggestion from @Keruspe works only if the recursion is right in the same function, not multiple levels deeper.

I was experimenting with an ugly hack, using a static variable:

use nom::branch::alt;
use nom::bytes::complete::tag;
use nom::sequence::delimited;
use nom::{Parser, IResult, error::ParseError, error::ErrorKind};

fn cp(i: &str) -> nom::IResult<&str, &str> {
    delimited(tag("("), alt((tag("x"), rec(cp))), tag(")"))(i)
}

fn rec<I, O, E: ParseError<I>, F>(mut parser: F) -> impl FnMut(I) -> IResult<I, O, E>
    where F: Parser<I, O, E>
    {
        static mut REC: u32 = 0;
        move |input: I| {
            unsafe {
                println!("calling rec(), REC = {}", REC);

                REC += 1;
                if REC > 10 {
                    return Err(nom::Err::Failure(E::from_error_kind(input, ErrorKind::TooLarge)));
                }
            }
            let res = parser.parse(input);
            unsafe {
                REC -= 1;
            }
            res
        }

}

#[test]
fn recursion_limit() {
    let mut input: String = std::iter::repeat('(').take(10000).collect();
    input += "x";
    input.extend(std::iter::repeat(')').take(10000));
    println!("{:?}", cp(input.as_str()));
    println!("try again:");
    println!("{:?}", cp(input.as_str()));
}

I believe that could be a good approach if we can solve the following issues:

  • use a safer solution than the static mut
  • have a different counter for each parser that recurses (in JSON, hash and array would each have their own)
  • the recursing parser can be declared as a function: in the example I pass rec(cp) as argument, but if the recursion is several level deepers and done from multiple parsers, each calling point should use the same counter. The easiest would be to enerate a function that wraps it

I think that can only be done in a macro, but maybe there's a better way

@Geal Geal added this to the 6.0 milestone Oct 11, 2020
@kandykoma
Copy link
Author

it would not handle having multiple parsers doing recursion

If you're referring to #1125, then I don't see why not? As long as the parsers share the same input instance, they have a piece of shared auxiliary data they can all look at to decide to stop the recursions.

kandykoma added a commit to kandykoma/nom that referenced this issue Oct 12, 2020
Having auxiliary data carried along with the input helps working around rust-bakery#1120.
@Geal
Copy link
Collaborator

Geal commented Oct 12, 2020

you might want to have different recursion limits depending on the parser, like some heavy top level objects have a depth of 3-4 max, but then lower objects get different limits

@kandykoma
Copy link
Author

Before I go on, I'd like to make it clear, that I'm not arguing that #1125 is and elegant, or a "good" solution. My original design goal was to keep the nom changes minimal, but still allow me to keep track of the recursion depth. Then @Keruspe showed me a better, non-intrusive solution :).

But, on the powerful-vs-useful spectrum[1] #1125 is on the powerful side. I can see why you would not want to use it, but you can implement arbitrarily complex algorithms with it. In the parser test I added a simple integer as aux data, but if you want you can add a pair of integers, one keeping count of the "heavy" object depth, and another counting the "light" object depths.

1: http://blog.higher-order.com/blog/2014/12/21/maximally-powerful/

kandykoma added a commit to kandykoma/nom that referenced this issue Oct 13, 2020
Having auxiliary data carried along with the input helps working around rust-bakery#1120.
@homersimpsons
Copy link
Contributor

In order to avoid StackOverflow one might want to look at https://lib.rs/crates/stacker

@kandykoma
Copy link
Author

@homersimpsons Well, that turns the SO into an OOM kill, which may or may not be an improvement. When processing potentially hostile data though, I'd still prefer a limit on recursion depth and and explicit parsing failure I can feed to my IDS.

@matu3ba
Copy link

matu3ba commented Jan 14, 2021

I think that can only be done in a macro, but maybe there's a better way

Conditional compilation requires cfg-macro to define the data layout, so there is no way around.
Even const-generics, which ideally can be optimised away in structures need at some point to conditionally set the const value.

I would not bet on the compiler to optimise away the counter in each function though, when there is no concept to set a const value to unused (in zig you have the value undefined for that during comptime evaluation).

@arthurprs
Copy link

This would be great. It's tough to work around it right now, as it's also hard to thread state through the nom combinators. In our project we had to resort to a thread-local to keep track the recursion count.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants