Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nom4: take_while broken ? #766

Closed
chifflier opened this issue May 16, 2018 · 9 comments
Closed

nom4: take_while broken ? #766

chifflier opened this issue May 16, 2018 · 9 comments
Milestone

Comments

@chifflier
Copy link
Contributor

  • Rust version : rustc 1.27.0-nightly (ac3c2288f 2018-04-18)
  • nom version : 4.0
  • nom compilation features used: none

Test case

Example test case:

#[macro_use]
extern crate nom;

use nom::*;

// US-ASCII printable characters without comma
#[inline]
fn is_us_ascii(c: u8) -> bool {
    c >= 0x20 && c <= 0x7e && c != 0x2c
}

named!(parse_name<&[u8]>, take_while!(is_us_ascii));


fn main() {
        let res = parse_name(b"ssh-rsa");
        let expected = Ok((&b""[..],&b"ssh-rsa"[..]));
        assert_eq!(res, expected);

}

The above test fails (regression from nom3):

Error(Incomplete(Size(1))) at l.68 by ' take_while ! ( is_us_ascii ) '
thread 'ssh::tests::test_name' panicked at 'assertion failed: `(left == right)`
  left: `Err(Incomplete(Size(1)))`,
 right: `Ok(([], [115, 115, 104, 45, 114, 115, 97]))`',
@khernyo
Copy link
Contributor

khernyo commented May 16, 2018

I think this is expected: https://github.com/Geal/nom/blob/master/doc/upgrading_to_nom_4.md#dealing-with-incomplete-usage

Change your parser to require CompleteByteSlice and it should work:

// ...
named!(parse_name<CompleteByteSlice, CompleteByteSlice>, take_while!(is_us_ascii));

fn main() {
        let res = parse_name(CompleteByteSlice(b"ssh-rsa"));
        let expected = Ok((CompleteByteSlice(b""), CompleteByteSlice(b"ssh-rsa")));
        assert_eq!(res, expected);
}

@chifflier
Copy link
Contributor Author

It could work, except it creates an apocalypse of mismatched types in all calling functions and subparsers :(

@chifflier
Copy link
Contributor Author

Also, how can I use `str::from_utf8' on the result ?

map_res!(parse_name, str::from_utf8)

fails on a CompleteByteSlice, and

map_res!(parse_name, |s| str::from_utf8(&s)

also fails, with error [u8] does not have a constant size known at compile-time`

Parsing slices has become really difficult :/

@khernyo
Copy link
Contributor

khernyo commented May 16, 2018

How about this?

#[macro_use]
extern crate nom;

use nom::types::*;
use std::str;

#[inline]
fn is_us_ascii(c: u8) -> bool {
    c >= 0x20 && c <= 0x7e && c != 0x2c
}

named!(parse_name<CompleteByteSlice, &[u8]>, map!(take_while!(is_us_ascii), |b| b.0));

named!(parse_utf8<CompleteByteSlice, &str>, map_res!(parse_name, str::from_utf8));

fn main() {
    let res = parse_utf8(CompleteByteSlice(b"ssh-rsa"));
    let expected = Ok((CompleteByteSlice(b""), "ssh-rsa"));
    assert_eq!(res, expected);
}

Can you show what you are trying to parse? Maybe it's parsable without relying on eof, e.g. there might be a delimiter after the string or a length field before the string?

@chifflier
Copy link
Contributor Author

My use case is to upgrade the ssh-parser crate, and in particular this parsing code:
https://github.com/rusticata/ssh-parser/blob/master/src/ssh.rs#L55-L71
Note the parser does not use eof, but take_while.

The difficulty is that the text to parse comes from a binary parser, so using CompleteByteSlice everywhere is not an option: it would need to be propagated everywhere, resulting in type changes in structs, API etc.

Thanks to your example, I wrote a version of the parsing functions that uses CompleteByteSlice locally, while the rest of the code is unaffected:
https://github.com/rusticata/ssh-parser/blob/nom4/src/ssh.rs#L57-L76

I'm not sure yet this is the best solution, but the tests now pass.

@khernyo
Copy link
Contributor

khernyo commented May 18, 2018

I'm pretty sure I'm missing something, but would it be possible to change parse_string to do the str::from_utf8 conversion (and use parse_bytes where it really is a byte slice)? This way parse_name_list could simply split the string at , like this:

named!(parse_bytes<&[u8]>, do_parse!(
    len: be_u32 >>
    string: take!(len) >>
    ( string )
));

named!(parse_string<&str>, do_parse!(
    len: be_u32 >>
    string: map_res!(take!(len), str::from_utf8) >>
    ( string )
));

fn parse_name_list(i: &str) -> Vec<&str> {
    i.split(',').collect()
}

NOTE: it does not incorporate the is_us_ascii check anymore, so that still needs to be done somewhere.

It seems to me that it's possible, but I know nothing about the ssh protocol. I did try the rewrite and the tests still pass, but it might break something.

@ZerothLaw
Copy link

take_while! on &[u8] is unintuitive and creates breaking changes.

Here's a more minimal use case:

use nom::is_alphanumeric;

named!(alphanumeric, take_while!(is_alphanumeric));
fn main() {
    let r = alphanumeric(b"abcd");
    println!("r: {:?}", r);
}

Your documentation talks about &[u8] types everywhere, nor does anything really mention "CompleteByteSlice" in the basic guides.

@Lythenas
Copy link

To me this was really annoying too. I had to use CompleteStr everywhere.

Most cases where the parsers return Incomplete there is actually a partial result that could be used.

Maybe it would be better to add a distinction between a partial results and actual incomplete input errors. In my opinion the user of the library should decide if the partial result is an error or not.

@Geal
Copy link
Collaborator

Geal commented Jun 17, 2019

fixed in nom 5 where take_while has streaming or complete versions depending on what we might need

@Geal Geal closed this as completed Jun 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants