Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nom 5.0 conversion #878

Open
Geal opened this Issue Mar 6, 2019 · 3 comments

Comments

Projects
None yet
2 participants
@Geal
Copy link
Owner

Geal commented Mar 6, 2019

After some thought, I reached a satisfying new design for nom 5.0, that I tried in the nomfun repository.
This design uses functions instead of macros, with the same signature as macros combinators, mostly having I -> IResult<I,O,E> functions as arguments, and returning other functions, or applying them directly on some input.
As an example, here is how the pair combinator would be written:

pub fn pair<I, O1, O2, E, F, G>(first: F, second: G) -> IResult<I, (O1, O2), E>
  where F: Fn(I) -> IResult<I, O1, E>,
        G: Fn(I) -> IResult<I, O2, E> {

  move |input: I| {
    let (input, o1) = first(input)?;
    second(input).map(|(i, o2)| (i, (o1, o2)))
  }
}

pub fn pairc<I, O1, O2, E, F, G>(input: I, first: F, second: G) -> IResult<I, (O1, O2), E>
  where F: Fn(I) -> IResult<I, O1, E>,
        G: Fn(I) -> IResult<I, O2, E> {

  pair(first, second)(input)
}

This way we have two versions, one that combines two parsers and makes another one, and another that can take some input.

The macro version can then be rewritten that way:

macro_rules! pair(
  ($i:expr, $submac:ident!( $($args:tt)* ), $submac2:ident!( $($args2:tt)* )) => (
    pair!($i, |i| $submac!(i, $($args)*), |i| $submac2!(i, $($args2)*))
  );

  ($i:expr, $submac:ident!( $($args:tt)* ), $g:expr) => (
    pair!($i, |i| $submac!(i, $($args)*), $g);
  );

  ($i:expr, $f:expr, $submac:ident!( $($args:tt)* )) => (
    pair!($i, $f, |i| $submac!(i, $($args)*));
  );

  ($i:expr, $f:expr, $g:expr) => (
    $crate::pairc($i, $f, $g)
  );
);

As we can see currently in the 5.0 branch, most combinators are easy to replace:

  • extract the macro's code
  • put it in a function
  • add the proper trait bounds to the function
  • replace the macro's code with a function call

The resulting code is functionally equivalent, has less type inference issues and is much faster when built with link time optimization (I have seen results like 20% faster with the new design on some benchmarks).

Another benefit of this system is that it benefits from better import behaviour. Right now, even in edition 2018, macros that are exported are put at the top level (so the module import like macros use is actually a lie). So I cannot make variants of macros 'except by changing their name, to do stuff like separated_list and separated_list_complete.
This was an issue because we expect slightly different behaviour in streaming parsers (where we're not sure we'll get the whole data at once) or in complete parsers (where we're sure we have the whole input). In nom 4, I tried to solve this by introducing the CompleteByteSlice and CompleteStr input types, that would behave differently, so you could use the same macros but have different parsers depending on the input. This proved difficult to use, especially considering that we might want to switch back and forth between behaviours (streaming parsers using TLV will know that they have the complete data inside the TLV, complete parsers might want to use methods that work directly on &[u8] and &str. Also, most people did not bother reading the documentation about it and started directly using &[u8] and &str as input when they expected the other behaviour, which resulted in quite some time spent explaining it.

So with functions, we can actually make specialized versions of combinators. We could imagine having streaming and complete versions of many0, tag, etc. And we would let people use those versions by importing them directly (use nom::multi::streaming::many0, etc), and they could even use both versions in the same file.

The downside is that there's an enormous amount of work for this:

  • for most combinators (that are not affected by streaming or not), two functions to add:
    • a function returning a closure
    • a function that is called directly
  • for combinators affected by streaming, multiply that by 3:
    • a legacy function to keep backward compatibility inside the macro (although I'm not yet sure we should keep the old behaviour, I might just remove CompleteByteSlice and CompleteStr)
    • a function that works in streaming
    • a function that does not work in streaming

These functions will also require their own documentation and tests, and all of nom's documentation and examples should probably be adapted to this.
I'm making steady progress on converting the combinators, but there's still a lot to do. (TODO: make a checklist of which combinators were ported over or not)

Questions I have to solve now:

  • do I keep nom 4's behaviour with CompleteByteSlice and CompleteStr ?
  • do I keep macros related to streaming backward compatible with nom 4?
  • if not, which behaviour should I go with? Streaming or not?
  • do I make two versions of each combinators, or can I assume people will be able to write pair(first, second)(input) without any issues?
  • how do I port combinators that have a variable number of arguments, like do_parse (this one could probably be written directly with the ? syntax like this: https://github.com/Geal/nomfun/blob/master/benches/http.rs#L93-L102 ), tuple, permutation, alt, switch?
  • how do I reimplement ws?
  • should I extract method and ws in their own crates? They're not strictly necessary to nom and would make sense as separate libraries

@Geal Geal added this to the 5.0 milestone Mar 6, 2019

@Gedweb

This comment has been minimized.

Copy link

Gedweb commented Mar 13, 2019

Methods was marked as deprecated and reference to nonexistent nom-methods crate, so sad

@Geal

This comment has been minimized.

Copy link
Owner Author

Geal commented Mar 13, 2019

I have not pushed the crate yet. This major version will be a big cleanup, and it's a good occasion to extract things that are not as closely maintained as the rest

@Gedweb

This comment has been minimized.

Copy link

Gedweb commented Mar 13, 2019

How what about compatibility around a macros and functions moved to different crate?

BTW, in my opinion CompleteStr is useful, but not described on the documentation main page, in a cause many people try use &str directly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.