Please, provide guideline with example for creating a parser #85

Closed
jan-hudec opened this issue Feb 15, 2017 · 4 comments

@jan-hudec

I needed to parse something with a rather simple structure, but I needed one element to be a grapheme (the rest is mostly ASCII symbols with specific meanings). No parser library provides such a parser out of the box, which is not surprising: the functionality to split a string into graphemes is only provided by the unicode-segmentation crate.

I already had something slapped together in nom, but I was not (and still am not) happy with it: it should operate on &str, but has to operate on &[u8], because nom lacks &str variants of some important primitives. So I considered rewriting it in combine, but gave up, because I had trouble figuring out how to write the function that would work with parser(), regarding:

  • How generic or specific it can be. unicode-segmentation only works on &str (because of shortcomings of Rust iterators; another story), and I always have a &str to provide, but I didn't see any documentation of what might or might not be passed to the inner parser. The example in the description of parse() appears to take just &str in the closure, but I didn't see this described anywhere.

  • How to properly construct the error. The ParseError/Error/Info construct is pretty complicated and deserves some explanation, but I failed to find any. Perhaps there could also be some helpers to easily create the Expected and Unexpected primary errors from a stream state and a message (the message is needed only for Expected; Unexpected does not need one).

@Marwes
Owner

Marwes commented Feb 15, 2017

Something like this should work for &str.

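// Import paths assume combine 2.x and the unicode-segmentation crate
use combine::parser;
use combine::primitives::{Consumed, ParseError};
use unicode_segmentation::UnicodeSegmentation;

parser(|input: &str| {
    // `true` selects extended grapheme clusters
    let mut iter = input.graphemes(true);
    match iter.next() {
        Some(g) => {
            // We have consumed input to get to this point, so we need to tell the caller; that way it
            // knows whether to try other alternatives (if this errors later on)
            Ok((g, Consumed::Consumed(iter.as_str())))
        }
        // Nothing was consumed, so the error is wrapped in `Consumed::Empty`
        None => Err(Consumed::Empty(ParseError::end_of_input())),
    }
})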

For a more generic parser, I think we need to at least limit the parser to RangeStream<Range = &str>, since that gives the &str type which unicode-segmentation needs. RangeStream's current implementation does not support that case, however.

I am thinking that maybe I can add another method to RangeStream, as below.

// `remaining` returns the entire remaining range, which can be inspected to determine
// the length one needs to pass to `uncons_range`
fn remaining(&self) -> Self::Range;
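
To illustrate how the proposed method might be used, here is a minimal standalone sketch. It does not use combine: `RangeLike`, `StrStream` and `take_first_unit` are hypothetical names, `uncons_range` is simplified to return an `Option`, and a single `char` stands in for a grapheme so that the example needs only the standard library. With unicode-segmentation, the length would instead come from the first item of `graphemes(true)`.

```rust
/// Simplified stand-in for a range stream; NOT combine's actual trait.
trait RangeLike {
    type Range;
    /// Hypothetical method from the proposal: the whole unconsumed input.
    fn remaining(&self) -> Self::Range;
    /// Split off the first `len` bytes and advance the stream past them.
    fn uncons_range(&mut self, len: usize) -> Option<Self::Range>;
}

/// A stream over a string slice.
struct StrStream<'a>(&'a str);

impl<'a> RangeLike for StrStream<'a> {
    type Range = &'a str;

    fn remaining(&self) -> &'a str {
        self.0
    }

    fn uncons_range(&mut self, len: usize) -> Option<&'a str> {
        // `is_char_boundary` is false for out-of-range indices, so this never panics.
        if self.0.is_char_boundary(len) {
            let (head, tail) = self.0.split_at(len);
            self.0 = tail;
            Some(head)
        } else {
            None
        }
    }
}

/// Inspect the remaining input to decide how much to take, then take it.
/// A `char` stands in for a grapheme here (an assumption of this sketch).
fn take_first_unit<'a, S>(stream: &mut S) -> Option<&'a str>
where
    S: RangeLike<Range = &'a str>,
{
    let len = stream.remaining().chars().next()?.len_utf8();
    stream.uncons_range(len)
}
```

The point of the design is that the parser never needs to know the concrete stream type: it only requires that the range is a `&str`, looks at `remaining()`, and hands a byte length back to `uncons_range`.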

How to properly construct the error

There is really nothing special about any of these. expected just indicates what the parser expected to find, and unexpected indicates what the parser actually found but ended up rejecting. Info is just somewhere to store the information; none of its variants has any special meaning. Usually only expected errors need to be created by hand, as unexpected errors are created automatically from the Stream.
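
As a rough illustration of that split, the sketch below models the error types with the standard library only. The type names mirror those mentioned in this thread, but the definitions are simplified stand-ins rather than combine's actual ones, and the helpers `unexpected_at` and `expected` are hypothetical, in the spirit of the helpers requested above.

```rust
use std::fmt;

/// Where the information is stored; the variants carry no special meaning.
#[derive(Debug, Clone, PartialEq)]
enum Info {
    Token(char),
    Message(&'static str),
}

/// What the parser found but rejected, or what it wanted to find.
#[derive(Debug, Clone, PartialEq)]
enum Error {
    Unexpected(Info),
    Expected(Info),
}

#[derive(Debug, PartialEq)]
struct ParseError {
    position: usize,
    errors: Vec<Error>,
}

impl ParseError {
    /// Hypothetical helper: derive the "unexpected" part from the stream state alone.
    fn unexpected_at(input: &str, position: usize) -> ParseError {
        let found = match input.get(position..).and_then(|rest| rest.chars().next()) {
            Some(c) => Info::Token(c),
            None => Info::Message("end of input"),
        };
        ParseError { position, errors: vec![Error::Unexpected(found)] }
    }

    /// Hypothetical helper: the "expected" part needs a message from the caller.
    fn expected(mut self, what: &'static str) -> ParseError {
        self.errors.push(Error::Expected(Info::Message(what)));
        self
    }
}

impl fmt::Display for Info {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Info::Token(c) => write!(f, "'{}'", c),
            Info::Message(m) => write!(f, "{}", m),
        }
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Unexpected(info) => write!(f, "unexpected {}", info),
            Error::Expected(info) => write!(f, "expected {}", info),
        }
    }
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "parse error at byte {}", self.position)?;
        for error in &self.errors {
            write!(f, "; {}", error)?;
        }
        Ok(())
    }
}
```

In this model, `ParseError::unexpected_at("a;b", 1).expected("grapheme")` records that `';'` was found at byte 1 where a grapheme was wanted; only the expected part has to be supplied by the person writing the parser.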

@jan-hudec
Author

I was not asking for help, I was suggesting you enhance the documentation. The response is not useful here (I've already decided I am not rewriting the nom thing). It would be somewhat useful in the documentation though—as long as you improve documentation of ParseError—I see all of 2 lines for the struct and nothing for most methods.

@Marwes
Owner

Marwes commented Feb 20, 2017

Ok. ParseError's methods are only really used internally, so it seems to have slipped through the cracks. I will add a bit more though.

@jan-hudec
Author

Well, but it's what one arrives at when looking up the return type. So you either need to document them there, or you need to put documentation somewhere up the chain describing how to create errors. That would mean in ParseResult, because there the error type is directly Consumed<ParseError<I>>. The wrapping of ParseError in Consumed is already something that needs explanation.
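
For readers wondering why the wrapper exists at all, here is a simplified standalone model of the idea. It assumes Parsec-style semantics (an alternative is only tried when the first parser failed without consuming input) and is not combine's actual code: the error is a plain `String`, and `token`, `two` and `or` are hypothetical helper functions.

```rust
/// Records whether a parser consumed input before it succeeded or failed.
#[derive(Debug, PartialEq)]
enum Consumed<T> {
    Consumed(T),
    Empty(T),
}

impl<T> Consumed<T> {
    fn into_inner(self) -> T {
        match self {
            Consumed::Consumed(value) | Consumed::Empty(value) => value,
        }
    }
}

/// Same shape as the result type discussed above, with `String` as the error.
type ParseResult<'a, O> = Result<(O, Consumed<&'a str>), Consumed<String>>;

/// Match one specific character; a failure here has consumed nothing.
fn token<'a>(input: &'a str, want: char) -> ParseResult<'a, char> {
    match input.chars().next() {
        Some(c) if c == want => Ok((c, Consumed::Consumed(&input[c.len_utf8()..]))),
        _ => Err(Consumed::Empty(format!("expected '{}'", want))),
    }
}

/// Match two characters in sequence.
fn two<'a>(input: &'a str, first: char, second: char) -> ParseResult<'a, (char, char)> {
    // If the first token fails, its `Empty` error is propagated unchanged.
    let (a, rest) = token(input, first)?;
    match token(rest.into_inner(), second) {
        Ok((b, rest)) => Ok(((a, b), Consumed::Consumed(rest.into_inner()))),
        // The first token was already consumed, so this failure counts as consumed.
        Err(error) => Err(Consumed::Consumed(error.into_inner())),
    }
}

/// Try `q` only if `p` failed without consuming any input.
fn or<'a, O>(
    input: &'a str,
    p: impl Fn(&'a str) -> ParseResult<'a, O>,
    q: impl Fn(&'a str) -> ParseResult<'a, O>,
) -> ParseResult<'a, O> {
    match p(input) {
        Err(Consumed::Empty(_)) => q(input),
        other => other,
    }
}
```

With these definitions, `or("ac", |i| two(i, 'a', 'b'), |i| two(i, 'a', 'c'))` fails even though the second alternative would match, because the first alternative consumed `'a'` before failing. A parser that reports an error therefore has to say whether it consumed anything, which is what the wrapper around the error expresses.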
