`parse_lazy` has imprecise semantics #95

chengsun · 2017-03-30T13:38:34Z

So on attempting to fix a couple of other parser combinators' behaviours on error, I've come to realise that parse_lazy doesn't have the right semantics to correctly support what we want.

Let's look at a motivating example.

let mut emptyok_then_emptyerr = (value(()).message("not expected"), char('x'));

assert_eq!(emptyok_then_emptyerr.parse(State::new("hi")),
    Err(ParseError {
       position: SourcePosition { line: 1, column: 1 },
       errors: vec![Error::Unexpected('h'.into()),
                    Error::Expected('x'.into())],
    }));

// assert fails: actual error contains the message "not expected"

Here we sequentially chain two parsers, the first of which returns EmptyOk, and the second returns EmptyErr. Now this combined tuple parser will return EmptyErr, meaning "no input was consumed, and one of my child parsers failed". However due to the semantics of parse_lazy, this will also imply "the first of my child parsers failed". Hence, when add_error is called the message "not expected" will be added, which is clearly wrong.

What's the fix? parse_lazy should not conflate "no input was consumed" with "first parser failed." So it seems like we have two options:

Make parse_lazy semantics tighter. This will probably entail returning a separate flag "was actually lazy", which becomes the thing that specifies whether add_error should be called or not. (One thing which I'm not sure about in this scenario is whether the Consumed flag is still useful or not? Is it currently important anywhere other than for determining laziness?)
Keep the semantics of parse_lazy but rewrite some current parsers so they are not lazy. This loses us some amount of speed. Notably the tuple and then combinators will have to become non-lazy.

The text was updated successfully, but these errors were encountered:

Marwes · 2017-03-30T15:05:34Z

Ouch, that is a case I failed to consider :(.

An idea, if we add a method on Parser can_return_empty_ok(&self) -> bool. This would return true for any parser which could return EmptyOk. When add_error is called the sequencing parsers could then check if their first parser can_return_empty_ok and if its true then they call add_error on the second parser (and if the second parser can_return_empty_ok call add_error on the third parser etc. It's a bit boilerplate-y but It might work.

chengsun · 2017-03-30T15:29:30Z

That wouldn't work for parsers that can return both EmptyOk and EmptyErr.

chengsun · 2017-03-30T15:34:48Z

I like the idea of making parse_lazy return (ConsumedResult<...>, bool), where the errors were deferred (and add_error needs to be called) iff the boolean is true.

Then, in this case tuple(empty_ok, empty_err).parse_lazy() would add the second subparser's error immediately, and return false. tuple(empty_err, empty_err).parse_lazy() would still be lazy and return true, requiring add_error to be called.

chengsun · 2017-03-30T15:39:00Z

Rather, parse_lazy should return the following:

pub enum LazyResult<T, E> {
    ConsumedOk(T),
    EmptyOk(T),
    ConsumedErr(E),
    EmptyErr(E, bool),
}

Where EmptyErr(E, true) indicates that add_error should be used.

Marwes · 2017-03-30T17:08:30Z

That wouldn't work for parsers that can return both EmptyOk and EmptyErr.

Could you give an example how that wont work?

While I believe your idea might work it would be a breaking change which is unfortunate.

chengsun · 2017-03-30T17:36:39Z

I mean, it would work, just not in the way that you described.

By default all parsers would have to claim can_return_empty_ok() == true for safety. Sequencing parsers can only be lazy if their first child claims false.

Assume a sequencing parser returns EmptyErr. If the sequencing parser's first child says it can return EmptyOk, we have no idea whether it was the first or the second child which caused the error, because either

the first child returned EmptyErr;
the first child returned EmptyOk and then the second child EmptyErr.

Marwes · 2017-03-31T13:47:13Z

Assume a sequencing parser returns EmptyErr. If the sequencing parser's first child says it can return EmptyOk, we have no idea whether it was the first or the second child which caused the error, because either...

Yep, that's right, it wasn't very thought through idea by me :).

So basically we need some extra state somewhere to say what parser caused the error since parsers returning EmptyOk means that it is possible for the second (or third, etc) parser in a sequence to cause an EmptyErr to be returned (breaking my implicit assumption of EmptyErr only being generated from the first parser). Due to the nature of the library adding this state is definitely going to be breaking regardless of the solution.

Fixes #95

Marwes · 2017-08-05T16:22:52Z

A fix for this is coming in #105. It turned out to have a lot more edge cases than I initially had thought of but I am pretty sure the approach is sound. Only drawback of it is that it can result in slowdowns in some uses of parsers returning EmptyOk

Marwes added breaking change bug labels Apr 11, 2017

Marwes added this to the 3.0 milestone Apr 11, 2017

Marwes mentioned this issue Jul 23, 2017

fix: Fixes to try and expected #102

Merged

Marwes added a commit that referenced this issue Jul 25, 2017

fix: Add the correct errors after sequencing has returned EmptyOk

9973dc3

Fixes #95

Marwes added a commit that referenced this issue Jul 25, 2017

fix: Add the correct errors after sequencing has returned EmptyOk

42922b4

Fixes #95

Marwes added a commit that referenced this issue Aug 2, 2017

fix: Add the correct errors after sequencing has returned EmptyOk

f8ea882

Fixes #95

Marwes added a commit that referenced this issue Aug 2, 2017

fix: Add the correct errors after sequencing has returned EmptyOk

3992f47

Fixes #95

Marwes added a commit that referenced this issue Aug 4, 2017

fix: Add the correct errors after sequencing has returned EmptyOk

deadb4a

Fixes #95

Marwes closed this as completed in 54fecc6 Aug 7, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`parse_lazy` has imprecise semantics #95

`parse_lazy` has imprecise semantics #95

chengsun commented Mar 30, 2017

Marwes commented Mar 30, 2017

chengsun commented Mar 30, 2017

chengsun commented Mar 30, 2017 •

edited

Loading

chengsun commented Mar 30, 2017

Marwes commented Mar 30, 2017

chengsun commented Mar 30, 2017

Marwes commented Mar 31, 2017 •

edited

Loading

Marwes commented Aug 5, 2017

parse_lazy has imprecise semantics #95

parse_lazy has imprecise semantics #95

Comments

chengsun commented Mar 30, 2017

Marwes commented Mar 30, 2017

chengsun commented Mar 30, 2017

chengsun commented Mar 30, 2017 • edited Loading

chengsun commented Mar 30, 2017

Marwes commented Mar 30, 2017

chengsun commented Mar 30, 2017

Marwes commented Mar 31, 2017 • edited Loading

Marwes commented Aug 5, 2017

`parse_lazy` has imprecise semantics #95

`parse_lazy` has imprecise semantics #95

chengsun commented Mar 30, 2017 •

edited

Loading

Marwes commented Mar 31, 2017 •

edited

Loading