zero-copy ini parser example #7
I am starting to get a feel for this and begin to see why parser combinators were invented - they seem to happen quite naturally. It's interesting to see how one has to implement everything by hand, for example handling EOF or that there might simply not be anything to parse. Depending on the complexity of the format, this approach might come most naturally, whereas in others combinators might appear 'easier' to use (especially with prior experience).
Unbelievably I was able to take the time and finish the first version of a basic ini parser. I am very much looking forward to your ideas on how to improve it to make it use Once I had the hang of it, it was quite easy to get productive, nearly without any surprises at all, even though it took me a moment to figure out how to more explicitly return my own error. When comparing this parser with the one provided in the combine examples I noticed that the error messages of For Thanks a lot for sharing your thoughts. |
I would say this wouldn't work due to the lifetime:
fn read_tag<'i, E>(r: &mut Reader<'i, E>, tag: u8) .. {
    r.consume(&[tag])?;
    // ..
}
so we do this:
fn read_tag<'i, E>(r: &mut Reader<'i, E>, tag: u8) .. {
    r.consume_u8(tag)?;
    // ..
}
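A minimal sketch of the lifetime problem, using a hypothetical `Reader` rather than the crate's real type. It assumes `consume` needs its argument to live as long as the input (`'i`), for example because the error keeps a reference to the expected bytes. A slice built from a local `tag` cannot satisfy that, while a by-value `consume_u8` has no such constraint. The method names mirror the snippet above; the signatures and the `Mismatch` error are assumptions for illustration.

```rust
/// Hypothetical stand-in for the reader; not the crate's actual type.
pub struct Reader<'i> {
    input: &'i [u8],
}

/// Hypothetical error that keeps the expected bytes borrowed for `'i`.
#[derive(Debug, PartialEq)]
pub struct Mismatch<'i> {
    pub expected: Option<&'i [u8]>,
}

impl<'i> Reader<'i> {
    pub fn new(input: &'i [u8]) -> Self {
        Reader { input }
    }

    /// Assumed signature: the prefix must live for `'i` because the error keeps it.
    pub fn consume(&mut self, prefix: &'i [u8]) -> Result<(), Mismatch<'i>> {
        if self.input.starts_with(prefix) {
            self.input = &self.input[prefix.len()..];
            Ok(())
        } else {
            Err(Mismatch { expected: Some(prefix) })
        }
    }

    /// Takes the byte by value, so nothing has to be borrowed for `'i`.
    pub fn consume_u8(&mut self, byte: u8) -> Result<(), Mismatch<'i>> {
        match self.input.split_first() {
            Some((first, rest)) if *first == byte => {
                self.input = rest;
                Ok(())
            }
            _ => Err(Mismatch { expected: None }),
        }
    }

    pub fn remaining(&self) -> &'i [u8] {
        self.input
    }
}

pub fn read_tag<'i>(r: &mut Reader<'i>, tag: u8) -> Result<(), Mismatch<'i>> {
    // `r.consume(&[tag])` would be rejected here: the temporary array
    // `[tag]` is dropped at the end of the statement, long before `'i` ends.
    r.consume_u8(tag)
}
```

With a literal such as `b"core"` the borrowed form still works, because the literal is `'static`; only values computed at runtime run into the problem.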
Totally doable and a good suggestion. A couple of thoughts:
The reason
fn read_ini<'i, E>(r: &mut Reader<'i, E>) -> Result<Document<'i>, E>
where
    E: Error<'i>,
{
    skip_whitespace_or_comment(r, ConsumeTo::NextToken);
    let document = match r.peek_u8() {
        // this feels super hacky just to get the retry requirement
        None => Err(r.read_u8().unwrap_err()),
        Some(b'[') => Ok(Document {
            globals: vec![],
            sections: read_sections(r)?,
        }),
        Some(_) => Ok(Document {
            globals: read_zero_or_more_properties_until_section(r)?,
            sections: read_sections(r)?,
        }),
    };
    // return the result built above
    document
}
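The peek-then-dispatch shape of `read_ini` can be tried without the crate. The following is a self-contained sketch over `&str`, not the implementation from this PR: the `ParseError` type, the `;`-only comment handling, and the helper names are assumptions made for illustration. Instead of the `read_u8().unwrap_err()` trick, the empty-input case returns an explicit `NeedMoreInput` variant.

```rust
/// Zero-copy document: every string borrows from the input.
#[derive(Debug, PartialEq)]
pub struct Document<'i> {
    pub globals: Vec<(&'i str, &'i str)>,
    pub sections: Vec<Section<'i>>,
}

#[derive(Debug, PartialEq)]
pub struct Section<'i> {
    pub name: &'i str,
    pub properties: Vec<(&'i str, &'i str)>,
}

#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// Nothing left to parse; a streaming caller could retry with more input.
    NeedMoreInput,
    Malformed,
}

/// Drops leading whitespace, blank lines and `;` comment lines.
fn skip_whitespace_or_comment(mut input: &str) -> &str {
    loop {
        input = input.trim_start();
        if !input.starts_with(';') {
            return input;
        }
        input = match input.find('\n') {
            Some(end) => &input[end..],
            None => "",
        };
    }
}

/// Splits off the first line (without its newline).
fn split_line(input: &str) -> (&str, &str) {
    match input.find('\n') {
        Some(end) => (&input[..end], &input[end + 1..]),
        None => (input, ""),
    }
}

/// Reads `key = value` lines until a section header or the end of input.
fn read_properties<'i>(
    mut input: &'i str,
) -> Result<(Vec<(&'i str, &'i str)>, &'i str), ParseError> {
    let mut properties = Vec::new();
    loop {
        input = skip_whitespace_or_comment(input);
        if input.is_empty() || input.starts_with('[') {
            return Ok((properties, input));
        }
        let (line, rest) = split_line(input);
        let (key, value) = line.split_once('=').ok_or(ParseError::Malformed)?;
        properties.push((key.trim(), value.trim()));
        input = rest;
    }
}

fn read_sections<'i>(mut input: &'i str) -> Result<Vec<Section<'i>>, ParseError> {
    let mut sections = Vec::new();
    while let Some(header) = input.strip_prefix('[') {
        let (name, rest) = header.split_once(']').ok_or(ParseError::Malformed)?;
        let (properties, rest) = read_properties(rest)?;
        sections.push(Section { name: name.trim(), properties });
        input = rest;
    }
    Ok(sections)
}

pub fn read_ini(input: &str) -> Result<Document<'_>, ParseError> {
    let input = skip_whitespace_or_comment(input);
    // Peek at the first byte and dispatch on it.
    match input.as_bytes().first() {
        None => Err(ParseError::NeedMoreInput),
        Some(b'[') => Ok(Document { globals: vec![], sections: read_sections(input)? }),
        Some(_) => {
            let (globals, rest) = read_properties(input)?;
            Ok(Document { globals, sections: read_sections(rest)? })
        }
    }
}
```

Calling `read_ini("name = demo\n[core]\nbare = true\n")` yields one global and one section, with every `&str` pointing into the original input.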
I'll admit most cases will just use Not having
I'm not exactly sure what you mean here sorry! There are only two other ways that aren't
What is missing from
I think I need to make this more clear in the docs! But this is exactly the use case :D
I'm going to sit on this problem this weekend as I have a similar use case. I'm not sure how deeply it is a I loved reading your thoughts and I'll have another look over your implementation this weekend! Thank you very much for the amount of time you have put into this and providing the feedback! :D |
There is a shortcoming here with enum variant tags of variable length. For example, say you have a lot of variant tags like
I believe I understand now that the
I like that too. The point I tried to make is that when 'playfully' changing a method that doesn't handle errors, it's easy to choose a method on the reader that does error handling, and even if the error won't leave the calling method, a trait bound is now required. That surprised me only at first, and I quickly figured out what I would have to do to fix it. This might be more of a Rust compiler/type system surprise than a surprise in
That's cool! I didn't see that one before :)!
It feels you kind of did, because controlling the display of input in an error case via environment variables would definitely help test cases. What happened to me is that during development, unit-tests that should not fail did actually produce an error. The input was displayed in hex, even though byte-ascii would have been preferred.
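To make the hex versus byte-ASCII idea concrete, here is a small sketch of a renderer that switches on a display mode read from the environment. Nothing here is the crate's API: the `INPUT_DISPLAY` variable name, the `InputDisplay` enum and both functions are invented for illustration.

```rust
use std::fmt::Write;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum InputDisplay {
    Hex,
    Ascii,
}

/// Reads the mode from a made-up `INPUT_DISPLAY` variable, defaulting to hex.
pub fn display_from_env() -> InputDisplay {
    match std::env::var("INPUT_DISPLAY").as_deref() {
        Ok("ascii") => InputDisplay::Ascii,
        _ => InputDisplay::Hex,
    }
}

/// Renders the offending input for an error message.
pub fn render_input(input: &[u8], display: InputDisplay) -> String {
    let mut out = String::new();
    match display {
        InputDisplay::Hex => {
            for (i, byte) in input.iter().enumerate() {
                if i > 0 {
                    out.push(' ');
                }
                // Writing to a `String` cannot fail.
                write!(out, "{:02x}", byte).unwrap();
            }
        }
        InputDisplay::Ascii => {
            for &byte in input {
                // Non-printable bytes are escaped, e.g. `\n` or `\xff`.
                out.extend(std::ascii::escape_default(byte).map(char::from));
            }
        }
    }
    out
}
```

A test run could then set the variable once and get `[a]\n` instead of `5b 61 5d 0a` in failure output.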
Totally agree! And I am looking forward to your take on it. With
I agree, even though thus far I might not have hit that complexity boundary. The most complex thing I have parsed with it were git-objects (trees, tags, commits), which is really when it worked well enough. However, when thinking about that error type it is using, my brain twists around itself, and that even without a streaming parser. Maybe doing things the
I agree, and also believe this doesn't need to be more than one function. After all, the top-level
Whoop whoop, and I loved reading yours! That alone makes me want to use Please feel free to push changes directly into this branch, this should save you some time writing changes down as comments even though I am certain you already have them locally. Looking at diffs is fine and it's easy to comment in case I have questions or suggestions.
True, I wonder how that ends up being implemented in |
@Byron Sorry, slow on uptake for this, will reply soon :) |
No worries, same same 😅. However, I do try to get the first pieces of git-config parsing done today using span computation based on the example you provided. I will put my changes into a PR that will be shared here, instead of pushing to master directly, to enable you to take an occasional look should you find the time :). |
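For readers unfamiliar with the term, one common reading of "span computation" in a zero-copy parser is recovering the byte range of a parsed sub-slice within the original input. The sketch below shows that idea with plain pointer arithmetic; `span_of` is a hypothetical helper, and whether the example referred to above works this way is an assumption.

```rust
use std::ops::Range;

/// Returns the byte range of `child` within `parent`, or `None` if
/// `child` does not lie entirely inside `parent`.
pub fn span_of(parent: &[u8], child: &[u8]) -> Option<Range<usize>> {
    let parent_start = parent.as_ptr() as usize;
    let child_start = child.as_ptr() as usize;
    // A child starting before the parent is not inside it.
    let start = child_start.checked_sub(parent_start)?;
    let end = start.checked_add(child.len())?;
    if end <= parent.len() {
        Some(start..end)
    } else {
        None
    }
}
```

Because parsed values borrow from the input, a span like this can be computed after the fact without storing offsets during parsing.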
Just a small update. For the time being I've decided against a I've improved the See PR #8 for these changes |
Thanks @avitex for your continued updates and improvements! I went in quickly, merged master, and believe I have picked more suitable peek methods. Also: Looking forward to the other improvements and cleanups you have planned!
No, thank you @Byron for your wonderful feedback! I'm going to merge this in :)
Rebased this and merged this locally. Closing and thanks! |
I just stumbled over this parser combinator crate which seems to do interesting chaining that I haven't seen before. No word about what errors look like, though, but maybe useful as inspiration in case you create a parser combinator crate yourself.
I appreciate that @Byron :D |
I thought it might be helpful for you to witness my humble beginnings, along with my raw notes, which shed some light on where I stumbled. And of course, I would be glad to learn about how you would do things differently.
Notes 2020-10-15
Please take everything as just my opinion based on possibly biased or downright wrong intuition. The following points are the thoughts in my head while trying to achieve a certain goal, thus might be quite 'raw' and maybe hard to follow.
skip_whitespace(…)
.skip_while(…) has no trait bound on the error, but skip(…) has. It's confusing as the skip_whitespace function taken from the 'json' example doesn't have an Error constraint, but now that seems very rarely to be possible. Note to self: It's probably best to just pass the Error<'i> constraint at all times.
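The difference in bounds follows from fallibility, which a simplified sketch can show. Here the error type is a parameter of the fallible method rather than of the reader itself, which is a simplification of the `Reader<'i, E>` shape seen earlier; the `Error` trait, its `end_of_input` constructor and the `Eof` type are all hypothetical.

```rust
/// Hypothetical error trait; stands in for the `Error<'i>` bound in the notes.
pub trait Error<'i> {
    fn end_of_input(remaining: &'i [u8], needed: usize) -> Self;
}

pub struct Reader<'i> {
    input: &'i [u8],
}

impl<'i> Reader<'i> {
    pub fn new(input: &'i [u8]) -> Self {
        Reader { input }
    }

    /// Infallible: skipping zero bytes is fine, so no error type is involved.
    pub fn skip_while(&mut self, pred: impl Fn(u8) -> bool) {
        let n = self.input.iter().take_while(|&&b| pred(b)).count();
        self.input = &self.input[n..];
    }

    /// Fallible: fewer than `n` bytes may be left, so the method must be able
    /// to construct an error, and that is what requires the bound on `E`.
    pub fn skip<E: Error<'i>>(&mut self, n: usize) -> Result<(), E> {
        if n > self.input.len() {
            return Err(E::end_of_input(self.input, n));
        }
        self.input = &self.input[n..];
        Ok(())
    }

    pub fn remaining(&self) -> &'i [u8] {
        self.input
    }
}

/// No `E` appears here because only the infallible method is used.
pub fn skip_whitespace(r: &mut Reader<'_>) {
    r.skip_while(|b| b.is_ascii_whitespace());
}

/// Switching to `skip` pulls the error type, and its bound, into the signature.
pub fn skip_header<'i, E: Error<'i>>(r: &mut Reader<'i>) -> Result<(), E> {
    r.skip(1)
}

/// Minimal error type used to exercise the sketch.
#[derive(Debug, PartialEq)]
pub struct Eof {
    pub needed: usize,
}

impl<'i> Error<'i> for Eof {
    fn end_of_input(_remaining: &'i [u8], needed: usize) -> Self {
        Eof { needed }
    }
}
```

Swapping `skip_while` for `skip` inside a helper is therefore enough to force the bound onto that helper, even if the error never leaves the caller.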