Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

many_m_n can succeed when min > max #1333

Closed
carado opened this issue Jun 26, 2021 · 14 comments
Closed

many_m_n can succeed when min > max #1333

carado opened this issue Jun 26, 2021 · 14 comments
Milestone

Comments

@carado
Copy link

carado commented Jun 26, 2021

  • nom version: 6.2.1
  • nom compilation features used: none

The following code:

fn main() {
  let res: nom::IResult<&str, Vec<char>> =
    nom::multi::many_m_n(4, 2, nom::character::complete::char('a'))("aaa");
  dbg!(res);
}

will succeed, consuming two 'a':

[src/main.rs:4] res = Ok(
    (
        "a",
        [
            'a',
            'a',
        ],
    ),
)

While it is unlikely that someone would hardcode a minimum value greater than a maximum value, they can be the result of more complex code, and a call to many_m_n in such conditions should probably systematically fail (as the constraint of parsing something at least min but at most max times would be impossible to satisfy).

Have a nice day

@Geal Geal added this to the 7.0 milestone Aug 9, 2021
@Geal
Copy link
Collaborator

Geal commented Aug 9, 2021

thanks for the report, it will be fixed in nom 7 with ab42ced

@cenodis
Copy link
Contributor

cenodis commented Aug 10, 2021

@Geal looking at ab42ced it looks like the many_m_n parser just fails instantly with a standard nom error. This makes it look like the parser works normally but didnt match anything. I think a panic would be more appropriate since min > max is nonsensical input and can never create a working parser. The panic should also occur in the function and not inside the closure since the parser creation itself should fail.

Additionally a nom error might also never reach the user since certain parser such as alt() might swallow it. This could lead to unintended behaviour since the parser runs successfully but doesnt produce the desired output. Since no error is reported finding the source of such an issue would be potentially difficult and time consuming.

@Geal
Copy link
Collaborator

Geal commented Aug 11, 2021

Parsers should never panic, and since parsers can be created dynamically from the output of other parsers, parser creation should not panic either (that could result in easy DoS attacks by messing with the format).
I think that error should be a failure instead, which would ensure it bubbles up through the parser chain and is not blocked by alt

@carado
Copy link
Author

carado commented Aug 11, 2021

@cenodis it's not like all parameters to parsers are always hardcoded; I made this issue because I myself wrote a parser where min and max were the result of some computations, and it indeed made sense that, when those results happen to lead to min > max, parsing silently fails.

If this function wasn't many_m_n(min: usize, max: usize) but was instead many_m_n<const MIN: usize, const MAX: usize>() I would agree with you because that would force the parameters to be always the same, and so a piece of code compiled with MIN > MAX would indeed necessarily be wrong.

It's also consistent with e.g. std::ops::RangeBounds::contains, which simply returns false instead of panicking — a range with min > max is simply seen as an empty range.

fn main() {
    dbg!((4..=2).is_empty());   // => true
    dbg!((4..=2).contains(&3)); // => false
}

@cenodis
Copy link
Contributor

cenodis commented Aug 11, 2021

@Geal I wasnt aware parsers should never panic, my apologies.

However, I would like to argue for Failure over Error in such cases. Especially with regards to @carado's post.

As already stated Error can be easily swallowed by a chain of parsers making it difficult to diagnose when it happens unintentionally. Especially when working with dynamic data where it can be easy to miss the fact that min might be greater than max.

The fact that the parser fails silently is also not really obvious from the outside. After all the function is called many_m_n and not many_m_n_except_if_n_greater_than_m. This makes the behaviour of this parser more complex than it needs to be and makes following the program flow more difficult.

I dont disagree that there are cases where having it fail silently is useful. But that should not be part of the many_m_n parser itself. I would recommend solving it with parser combinations:

cond_err(min <= max, many_m_n(parser, min, max))

Side note: I actually thought cond did this but it returns Ok when the condition is false. Hence the "cond_err" pseudoparser. Basically just cond but returns Error if the condition is false.

This is also more in line with the parser combinator principle (Combining multiple parsers with simple, obvious behaviour instead of having one parser with complex behaviour).
Additionally it documents the fact that max > min in the surrounding code is desired and not an accident.

I dont see much practical reason to be consistent with the RangeBounds of the std library. Especially since ranges are not consistent within Rust itself. For example you cant create a slice over a range with max > min. If it was really consistent with RangeBounds shouldnt it be an empty slice since the bound is empty? In the same way I dont see what meaningful information such a range would convey when used with this parser. It would just make it more confusing and easier to make mistakes.

Edit: fold_many_m_n should probably be handled the same way since its basically the same parser just organized a bit different.

@Geal
Copy link
Collaborator

Geal commented Aug 13, 2021

4a04c56 changes the Error to Failure and 37eedf3 adds the check to fold_many_m_n

@Geal
Copy link
Collaborator

Geal commented Aug 21, 2021

fix released in nom 7: https://crates.io/crates/nom/7.0.0

@Stargateur
Copy link
Contributor

I do not think the solution here is good, it's should be outside the closure and in my opinion panic on such case or return a IResult<Parser> if we want to allow dynamic behavior but I don't see any real world problem where this could be used dynamically.

Actually why not take a impl RangeBounds ?

@cenodis
Copy link
Contributor

cenodis commented Aug 21, 2021

@Stargateur

in my opinion panic

as already stated by Geal, parser creation should never fail. A panic is therefore unnaceptable.

return a IResult

IResult makes no sense here since it encodes a parser result. Im going to assume you meant Result<Parser>. While possible, it could make composition more difficult since you have to somehow unwrap the created parser.

not take a impl RangeBounds

To my knowledge RangeBounds does not forbid the creation of ranges with max > min. So it doesnt help in this case.

In #1356 I suggested replacing the Failure with a seperate error type specifically meant to represent a "broken" parser. That would be more in line with how parsers work in nom and would allow handling of such errors via combinators (or bubble up to the caller if not handled).
I feel like that would be the most appropriate way of handling such scenarios without breaking everything (most parsers already bubble up any errors they dont explicitly handle to cover cases like Failure and Incomplete) and stays in line with the "normal" way of nom error handling.

@Stargateur
Copy link
Contributor

IResult makes no sense here since it encodes a parser result. Im going to assume you meant Result<Parser>. While possible, it could make composition more difficult since you have to somehow unwrap the created parser.

Indeed, one would need something like many_m_n(5, 4).convert()?(input) but I believe it's make sense, the error is about call to many_m_n not the Parser created. Anyway that a detail I guess so maybe not very important and the cureent solution is ok.

@Stargateur
Copy link
Contributor

To my knowledge RangeBounds does not forbid the creation of ranges with max > min. So it doesnt help in this case.

No but https://doc.rust-lang.org/core/ops/trait.RangeBounds.html#method.contains would corretly report false:

fn main() {
    let r = 42..21;
    
    assert_eq!(r.contains(&30), false);
}

@cenodis
Copy link
Contributor

cenodis commented Aug 21, 2021

@Stargateur

No but https://doc.rust-lang.org/core/ops/trait.RangeBounds.html#method.contains would corretly report false:

What does that have to do with anything? max > min is invalid for this parser and Rangebound does not prevent such a range from being constructed.

@Stargateur
Copy link
Contributor

Stargateur commented Aug 21, 2021

@Stargateur

No but https://doc.rust-lang.org/core/ops/trait.RangeBounds.html#method.contains would corretly report false:

What does that have to do with anything? max > min is invalid for this parser and Rangebound does not prevent such a range from being constructed.

that would solve the original problem, but that would not allow your request to make it an hard error. Well yes That could depend on how you use it

@carado
Copy link
Author

carado commented Aug 22, 2021

I think (42..21).contains(&30) returning false is pretty consistent with many_m_n(parser, 42, 21)("…") returning a failure; neither panic, both accept the range and consider it de-facto empty (refusing all inputs).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants