Writing a streaming parser #1160

Open
jjbarr opened this issue Jun 9, 2020 · 7 comments

@jjbarr

jjbarr commented Jun 9, 2020

While the Nom documentation briefly discusses streaming parsers, it's unclear how one would actually go about writing one. In particular, it's unclear how to deal with the case where additional data is needed, or where to deal with that.

@jeanga

jeanga commented Jul 22, 2020

I second that point.
As of today, it seems impossible to find solid guidance on how code that consumes nom should use the streaming API...

@fritschy

fritschy commented Nov 3, 2020

Streaming parsers can return Err::Incomplete: https://docs.rs/nom/6.0.0/nom/enum.Err.html#variant.Incomplete

So a simple parser that wants to parse a u32 but does not (initially) have enough data will ask for more with this error:

use nom::{error::Error, number::streaming::be_u32, Err, IResult, Needed};

fn main() {
    let data = b"\x11\x22\x33\x44";

    // Note: only part of the needed input is passed here.
    let i = &data[..2];
    let r: IResult<&[u8], u32, Error<&[u8]>> = be_u32(i);
    match r.unwrap_err() {
        Err::Incomplete(Needed::Size(n)) => println!("Need {} more bytes", n),
        e => println!("Some other error: {:?}", e),
    }

    // Now pass the complete input.
    let i = &data[..];
    let r: IResult<&[u8], u32, Error<&[u8]>> = be_u32(i);
    match r {
        Ok((_i, u)) => println!("Parsed {:x}", u),
        e => println!("Some other error: {:?}", e),
    }
}

That is, if you write a more complex parser, you can return Err::Incomplete to signal that you need more data. You can even omit the size by using Needed::Unknown, which just tells your caller that you need more without saying how much.
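
As a rough illustration of those two flavours of "need more data", here is a dependency-free sketch. `Needed`, `ParseErr` and `PResult` are hypothetical local stand-ins that only mirror the shape of nom's `Needed`, `Err::Incomplete` and `IResult` (the stand-in carries a plain `usize`), and `be_u32_streaming` and `line_streaming` are hand-written for this example; they are not nom functions.

```rust
/// Local stand-in mirroring the shape of nom's `Needed` (assumption: not the real type).
#[derive(Debug, PartialEq)]
enum Needed {
    /// More data is needed, but the parser cannot say how much.
    Unknown,
    /// Exactly this many additional bytes are needed.
    Size(usize),
}

/// Local stand-in for the `Incomplete` case of nom's `Err`.
#[derive(Debug, PartialEq)]
enum ParseErr {
    Incomplete(Needed),
}

/// On success: (remaining input, parsed value), in the spirit of nom's `IResult`.
type PResult<'a, T> = Result<(&'a [u8], T), ParseErr>;

/// Fixed-size value: the parser knows exactly how many bytes are missing.
fn be_u32_streaming<'a>(input: &'a [u8]) -> PResult<'a, u32> {
    if input.len() < 4 {
        return Err(ParseErr::Incomplete(Needed::Size(4 - input.len())));
    }
    let (head, rest) = input.split_at(4);
    let value = u32::from_be_bytes([head[0], head[1], head[2], head[3]]);
    Ok((rest, value))
}

/// Newline-terminated line: the parser cannot know how far away the
/// terminator is, so it reports `Needed::Unknown`.
fn line_streaming<'a>(input: &'a [u8]) -> PResult<'a, &'a [u8]> {
    match input.iter().position(|&b| b == b'\n') {
        // Return the line without its terminator; the rest starts after it.
        Some(pos) => Ok((&input[pos + 1..], &input[..pos])),
        None => Err(ParseErr::Incomplete(Needed::Unknown)),
    }
}
```

Calling `be_u32_streaming(&[0x11, 0x22])` reports that two more bytes are needed, while `line_streaming(b"abc")` can only say that more input is required. In both cases nothing is consumed, so the caller can retry with the same bytes plus whatever arrived in the meantime.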

Does this help?

@jeanga

jeanga commented Nov 3, 2020

Hi fritschy,

First and foremost, thank you for taking the time to answer this question 👍
I have read the docs, and saw Err::Incomplete.

What I would need to better understand how streaming is supposed to be implemented is a complete sample.

For instance, "you can return Err::Incomplete to signal that you need more data": I understand that, but what is the caller supposed to do with this information?
What becomes of the partially parsed information? Should I feed it to the parser again together with the missing bits I fetched?
How do I "resume" parsing when the missing information is available?

I have the feeling that, without this information, streaming support in nom is more theoretical than practical?

A sample/PoC parser reading a file 512 bytes at a time and parsing it would go a long way IMHO.

Hope this helps too :-)

@fritschy

fritschy commented Nov 3, 2020

Hey!

> [...]
> First and foremost, thank you for taking the time to answer this question 👍

You're welcome.

> I have read the docs, and saw Err::Incomplete.

Yes, I figured. See below.

> What I would need to better understand how streaming is supposed to be implemented is a complete sample.
>
> For instance, "you can return Err::Incomplete to signal that you need more data": I understand that, but what is the caller supposed to do with this information?
> What becomes of the partially parsed information? Should I feed it to the parser again together with the missing bits I fetched?
> How do I "resume" parsing when the missing information is available?

I was thinking that this might come up 👍

Well, the idea of nom (or of parser combinators in general?) is that a parser operates on its input and returns a result along with the remaining input. If, for example, your parser fails because it needs more data, it will not consume anything: your "view" of the input data is unchanged. So now you grow your buffer (or get a new one?) and call your parser again.
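
The "grow the buffer and call the parser again" idea can be sketched as a small driver loop. This is only a sketch under stated assumptions: `ParseErr` is a hypothetical local stand-in for nom's `Err::Incomplete`, `be_u32_streaming` is hand-written so that the example has no dependencies, and `parse_all` with its `chunk_size` parameter (for instance 512) is an invented name. With nom, the call to `be_u32_streaming` would presumably be replaced by a streaming parser such as `nom::number::streaming::be_u32`, and other error kinds would need their own match arms.

```rust
use std::io::Read;

/// Hypothetical stand-in for nom's `Err::Incomplete`; the payload is the
/// number of missing bytes, if known.
#[derive(Debug, PartialEq)]
enum ParseErr {
    Incomplete(Option<usize>),
}

/// Hand-written streaming parser for one big-endian u32.
/// Returns (remaining input, value); on failure nothing is consumed.
fn be_u32_streaming(input: &[u8]) -> Result<(&[u8], u32), ParseErr> {
    if input.len() < 4 {
        return Err(ParseErr::Incomplete(Some(4 - input.len())));
    }
    let (head, rest) = input.split_at(4);
    Ok((rest, u32::from_be_bytes([head[0], head[1], head[2], head[3]])))
}

/// Reads `source` in chunks of up to `chunk_size` bytes and parses every u32.
/// Returns the parsed values plus any trailing bytes that never formed a value.
fn parse_all<R: Read>(mut source: R, chunk_size: usize) -> std::io::Result<(Vec<u32>, Vec<u8>)> {
    let mut pending: Vec<u8> = Vec::new(); // unparsed bytes carried between reads
    let mut chunk = vec![0u8; chunk_size];
    let mut values = Vec::new();

    loop {
        // Parse as much as possible from what is already buffered.
        loop {
            match be_u32_streaming(&pending) {
                Ok((rest, value)) => {
                    values.push(value);
                    // The parser consumed everything before `rest`; drop it.
                    let consumed = pending.len() - rest.len();
                    pending.drain(..consumed);
                }
                // Nothing was consumed: keep `pending` as it is and fetch more.
                Err(ParseErr::Incomplete(_)) => break,
            }
        }

        let n = source.read(&mut chunk)?;
        if n == 0 {
            // End of input: whatever is left is a truncated record.
            return Ok((values, pending));
        }
        pending.extend_from_slice(&chunk[..n]);
    }
}
```

In this sketch there is no parser state to "resume": the partially received record simply stays in `pending`, and the parser is run again from the start of it once more bytes have been appended. For example, `parse_all(std::fs::File::open(path)?, 512)` would read a file 512 bytes at a time, where `path` is whatever file you want to parse.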

> I have the feeling that, without this information, streaming support in nom is more theoretical than practical?

Oh, it is not. I already successfully implemented a streaming parser for a binary log-stream reader, although that was with nom 3.x, I think...

> A sample/PoC parser reading a file 512 bytes at a time and parsing it would go a long way IMHO.

Let me see if I can dig something up.

Edit: There was also this issue: #1145 (comment) which might have some more clues; as in "support is incomplete" ...

epage added a commit to epage/winnow that referenced this issue Feb 17, 2023
@reinhrst

> A sample/PoC parser reading a file 512 bytes at a time and parsing it would go a long way IMHO.

@jeanga I saw your message a couple of weeks ago, and decided to pick up the challenge. The more I dove in, the more I understood that this is not so easy. In the end, I wrote down my experiences in a blog post (including a POC).

Be aware that because of issue #1582, using nom::*::streaming may or may not work for you.

@jeanga

jeanga commented Jun 19, 2023

Thank you for sharing your experience with this 😀
That is very nice of you.

@epage
Contributor

epage commented Jun 19, 2023

I will say that when writing my own streaming parser demo, I found circular to be helpful for the problem of "fill a little more data into the buffer to be able to parse" when dealing with a variable sized frame.
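
To make the "fill a little more data into the buffer" problem concrete, here is a hand-rolled, std-only sketch for variable-sized frames. It does not use the circular crate and is not its API: `FrameBuffer`, `frame` and `collect_frames` are hypothetical names, and the frame format (one length byte followed by that many payload bytes) is an assumption chosen for illustration.

```rust
/// Fixed-capacity byte buffer with a read position and a write position.
/// Hand-rolled for illustration; this is NOT the `circular` crate's API.
struct FrameBuffer {
    data: Vec<u8>,
    start: usize, // first unparsed byte
    end: usize,   // one past the last filled byte
}

impl FrameBuffer {
    fn with_capacity(capacity: usize) -> Self {
        FrameBuffer { data: vec![0; capacity], start: 0, end: 0 }
    }

    /// Unparsed bytes, to hand to the parser.
    fn unparsed(&self) -> &[u8] {
        &self.data[self.start..self.end]
    }

    /// Marks `count` bytes as parsed.
    fn consume(&mut self, count: usize) {
        self.start = (self.start + count).min(self.end);
    }

    /// Moves unparsed bytes to the front, then appends as much of `input`
    /// as fits. Returns how many bytes of `input` were copied.
    fn fill(&mut self, input: &[u8]) -> usize {
        self.data.copy_within(self.start..self.end, 0);
        self.end -= self.start;
        self.start = 0;
        let n = input.len().min(self.data.len() - self.end);
        self.data[self.end..self.end + n].copy_from_slice(&input[..n]);
        self.end += n;
        n
    }
}

/// Streaming parser for one frame: a length byte, then that many payload bytes.
/// Returns (bytes consumed, payload), or None when more data is needed.
fn frame(input: &[u8]) -> Option<(usize, &[u8])> {
    let (&len, body) = input.split_first()?;
    let len = len as usize;
    if body.len() < len {
        return None;
    }
    Some((1 + len, &body[..len]))
}

/// Feeds `stream` through the buffer `piece` bytes at a time and collects
/// every complete frame. A trailing partial frame is left unparsed.
fn collect_frames(stream: &[u8], capacity: usize, piece: usize) -> Vec<Vec<u8>> {
    let mut buf = FrameBuffer::with_capacity(capacity);
    let mut frames = Vec::new();
    let mut offset = 0;
    loop {
        // Take every complete frame that is currently buffered.
        while let Some((used, payload)) = frame(buf.unparsed()) {
            frames.push(payload.to_vec());
            buf.consume(used);
        }
        if offset == stream.len() {
            return frames;
        }
        // "Fill a little more": top up with the next piece of the stream.
        let upto = (offset + piece).min(stream.len());
        let copied = buf.fill(&stream[offset..upto]);
        offset += copied;
        if copied == 0 {
            // The pending frame does not fit in the buffer: give up.
            return frames;
        }
    }
}
```

The design choice here is that the buffer never grows: leftover bytes of an incomplete frame are moved to the front before new data is appended, so the capacity only has to cover the largest frame (at most 256 bytes with a one-byte length prefix).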
