nom4: take_while broken ? #766

chifflier · 2018-05-16T09:30:03Z

Rust version : rustc 1.27.0-nightly (ac3c2288f 2018-04-18)
nom version : 4.0
nom compilation features used: none

Test case

Example test case:

#[macro_use]
extern crate nom;

use nom::*;

// US-ASCII printable characters without comma
#[inline]
fn is_us_ascii(c: u8) -> bool {
    c >= 0x20 && c <= 0x7e && c != 0x2c
}

named!(parse_name<&[u8]>, take_while!(is_us_ascii));


fn main() {
        let res = parse_name(b"ssh-rsa");
        let expected = Ok((&b""[..],&b"ssh-rsa"[..]));
        assert_eq!(res, expected);

}

The above test fails (regression from nom3):

Error(Incomplete(Size(1))) at l.68 by ' take_while ! ( is_us_ascii ) '
thread 'ssh::tests::test_name' panicked at 'assertion failed: `(left == right)`
  left: `Err(Incomplete(Size(1)))`,
 right: `Ok(([], [115, 115, 104, 45, 114, 115, 97]))`',

The text was updated successfully, but these errors were encountered:

khernyo · 2018-05-16T10:05:32Z

I think this is expected: https://github.com/Geal/nom/blob/master/doc/upgrading_to_nom_4.md#dealing-with-incomplete-usage

Change your parser to require CompleteByteSlice and it should work:

// ...
named!(parse_name<CompleteByteSlice, CompleteByteSlice>, take_while!(is_us_ascii));

fn main() {
        let res = parse_name(CompleteByteSlice(b"ssh-rsa"));
        let expected = Ok((CompleteByteSlice(b""), CompleteByteSlice(b"ssh-rsa")));
        assert_eq!(res, expected);
}

chifflier · 2018-05-16T11:13:53Z

It could work, except it creates an apocalypse of mismatched types in all calling functions and subparsers :(

chifflier · 2018-05-16T11:30:39Z

Also, how can I use `str::from_utf8' on the result ?

map_res!(parse_name, str::from_utf8)

fails on a CompleteByteSlice, and

map_res!(parse_name, |s| str::from_utf8(&s)

also fails, with error [u8] does not have a constant size known at compile-time`

Parsing slices has become really difficult :/

khernyo · 2018-05-16T12:47:17Z

How about this?

#[macro_use]
extern crate nom;

use nom::types::*;
use std::str;

#[inline]
fn is_us_ascii(c: u8) -> bool {
    c >= 0x20 && c <= 0x7e && c != 0x2c
}

named!(parse_name<CompleteByteSlice, &[u8]>, map!(take_while!(is_us_ascii), |b| b.0));

named!(parse_utf8<CompleteByteSlice, &str>, map_res!(parse_name, str::from_utf8));

fn main() {
    let res = parse_utf8(CompleteByteSlice(b"ssh-rsa"));
    let expected = Ok((CompleteByteSlice(b""), "ssh-rsa"));
    assert_eq!(res, expected);
}

Can you show what you are trying to parse? Maybe it's parsable without relying on eof, e.g. there might be a delimiter after the string or a length field before the string?

chifflier · 2018-05-18T07:13:15Z

My use case is to upgrade the ssh-parser crate, and in particular this parsing code:
https://github.com/rusticata/ssh-parser/blob/master/src/ssh.rs#L55-L71
Note the parser does not use eof, but take_while.

The difficulty is that the text to parse comes from a binary parser, so using CompleteByteSlice everywhere is not an option: it would need to be propagated everywhere, resulting in type changes in structs, API etc.

Thanks to your example, I wrote a version of the parsing functions that uses CompleteByteSlice locally, while the rest of the code is unaffected:
https://github.com/rusticata/ssh-parser/blob/nom4/src/ssh.rs#L57-L76

I'm not sure yet this is the best solution, but the tests now pass.

khernyo · 2018-05-18T10:23:54Z

I'm pretty sure I'm missing something, but would it be possible to change parse_string to do the str::from_utf8 conversion (and use parse_bytes where it really is a byte slice)? This way parse_name_list could simply split the string at , like this:

named!(parse_bytes<&[u8]>, do_parse!(
    len: be_u32 >>
    string: take!(len) >>
    ( string )
));

named!(parse_string<&str>, do_parse!(
    len: be_u32 >>
    string: map_res!(take!(len), str::from_utf8) >>
    ( string )
));

fn parse_name_list(i: &str) -> Vec<&str> {
    i.split(',').collect()
}

NOTE: it does not incorporate the is_us_ascii check anymore, so that still needs to be done somewhere.

It seems to me that it's possible, but I know nothing about the ssh protocol. I did try the rewrite and the tests still pass, but it might break something.

ZerothLaw · 2018-08-31T18:51:07Z

take_while! on &[u8] is unintuitive and creates breaking changes.

Here's a more minimal use case:

use nom::is_alphanumeric;

named!(alphanumeric, take_while!(is_alphanumeric));
fn main() {
    let r = alphanumeric(b"abcd");
    println!("r: {:?}", r);
}

Your documentation talks about &[u8] types everywhere, nor does anything really mention "CompleteByteSlice" in the basic guides.

Lythenas · 2018-08-31T19:51:37Z

To me this was really annoying too. I had to use CompleteStr everywhere.

Most cases where the parsers return Incomplete there is actually a partial result that could be used.

Maybe it would be better to add a distinction between a partial results and actual incomplete input errors. In my opinion the user of the library should decide if the partial result is an error or not.

Geal · 2019-06-17T14:36:36Z

fixed in nom 5 where take_while has streaming or complete versions depending on what we might need

Geal mentioned this issue Oct 6, 2018

usability: address changes introduced with Complete* and many* combinators #839

Closed

Geal added this to the 5.0 milestone Jun 17, 2019

Geal closed this as completed Jun 17, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nom4: take_while broken ? #766

nom4: take_while broken ? #766

chifflier commented May 16, 2018

khernyo commented May 16, 2018

chifflier commented May 16, 2018

chifflier commented May 16, 2018

khernyo commented May 16, 2018

chifflier commented May 18, 2018

khernyo commented May 18, 2018

ZerothLaw commented Aug 31, 2018

Lythenas commented Aug 31, 2018

Geal commented Jun 17, 2019

nom4: take_while broken ? #766

nom4: take_while broken ? #766

Comments

chifflier commented May 16, 2018

Test case

khernyo commented May 16, 2018

chifflier commented May 16, 2018

chifflier commented May 16, 2018

khernyo commented May 16, 2018

chifflier commented May 18, 2018

khernyo commented May 18, 2018

ZerothLaw commented Aug 31, 2018

Lythenas commented Aug 31, 2018

Geal commented Jun 17, 2019