separated_list is inconsistent about empty elements #1165

Closed
gparker42 opened this issue Jun 15, 2020 · 2 comments

@gparker42

In nom-5.1.2 and nom-6.0.0, the separated_list functions are inconsistent when the element parser allows an empty string.

Test:

extern crate nom;

use nom::multi::*;
use nom::character::complete::*;
use nom::Err::Error;
use nom::IResult;

fn test_empty_digit0(s: &'static str) {
    pretty_print(s, separated_list0(char(','), digit0)(s.as_bytes()))
}

fn test_nonempty_digit0(s: &'static str) {
    pretty_print(s, separated_list1(char(','), digit0)(s.as_bytes()))
}

fn test_empty_digit1(s: &'static str) {
    pretty_print(s, separated_list0(char(','), digit1)(s.as_bytes()))
}

fn test_nonempty_digit1(s: &'static str) {
    pretty_print(s, separated_list1(char(','), digit1)(s.as_bytes()))
}

fn main() {
    let inputs = vec!["$", "1$", "1,$", "1,,$", "1,2$", "1,,2$", ",$", ",2$", ",,2$"];

    println!("\nseparated_list0(char(','), digit0)");
    for i in &inputs { test_empty_digit0(i) }

    println!("\nseparated_list1(char(','), digit0)");
    for i in &inputs { test_nonempty_digit0(i) }

    println!("\nseparated_list0(char(','), digit1)");
    for i in &inputs { test_empty_digit1(i) }

    println!("\nseparated_list1(char(','), digit1)");
    for i in &inputs { test_nonempty_digit1(i) }
}

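// Prints one aligned line per input: the quoted input, then either the parsed
// list or the error kind, then the unconsumed rest.
fn pretty_print(input: &'static str, result: IResult<&[u8], Vec<&[u8]>>) {
    // Quote the input and pad it so the result columns line up.
    let quoted = format!("{:<7}", format!("\"{}\"", input));
    match result {
        Err(Error((rest, err))) =>
            println!("{}  error {:?}, rest \"{}\"",
                     quoted, err, String::from_utf8_lossy(rest).to_string()),
        Ok((rest, list))  =>
            println!("{}  ok [{}], rest \"{}\"", quoted,
                     list.iter().fold(String::new(), |acc, &s| acc+"\""+&String::from_utf8_lossy(s)+"\", ").trim_end_matches(", "),
                     String::from_utf8_lossy(rest).to_string()),
        // Any other result (Incomplete or Failure) is printed in Debug form.
        x =>
            println!("{}  {:?}", quoted, x)
    }
}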

Output:

separated_list0(char(','), digit0)
"$"      error SeparatedList, rest "$"
"1$"     ok ["1"], rest "$"
"1,$"    ok ["1", ""], rest "$"
"1,,$"   ok ["1", "", ""], rest "$"
"1,2$"   ok ["1", "2"], rest "$"
"1,,2$"  ok ["1", "", "2"], rest "$"
",$"     error SeparatedList, rest ",$"
",2$"    error SeparatedList, rest ",2$"
",,2$"   error SeparatedList, rest ",,2$"

separated_list1(char(','), digit0)
"$"      error SeparatedList, rest "$"
"1$"     ok ["1"], rest "$"
"1,$"    ok ["1", ""], rest "$"
"1,,$"   ok ["1", "", ""], rest "$"
"1,2$"   ok ["1", "2"], rest "$"
"1,,2$"  ok ["1", "", "2"], rest "$"
",$"     error SeparatedList, rest ",$"
",2$"    error SeparatedList, rest ",2$"
",,2$"   error SeparatedList, rest ",,2$"

separated_list0(char(','), digit1)
"$"      ok [], rest "$"
"1$"     ok ["1"], rest "$"
"1,$"    ok ["1"], rest ",$"
"1,,$"   ok ["1"], rest ",,$"
"1,2$"   ok ["1", "2"], rest "$"
"1,,2$"  ok ["1"], rest ",,2$"
",$"     ok [], rest ",$"
",2$"    ok [], rest ",2$"
",,2$"   ok [], rest ",,2$"

separated_list1(char(','), digit1)
"$"      error Digit, rest "$"
"1$"     ok ["1"], rest "$"
"1,$"    ok ["1"], rest ",$"
"1,,$"   ok ["1"], rest ",,$"
"1,2$"   ok ["1", "2"], rest "$"
"1,,2$"  ok ["1"], rest ",,2$"
",$"     error Digit, rest ",$"
",2$"    error Digit, rest ",2$"
",,2$"   error Digit, rest ",,2$"

Note that the digit0 cases return error SeparatedList when the first element is empty (such as ",2$"), but succeed and return empty strings when any subsequent element is empty (such as "1,$"). The expected result would be to always allow empty elements or always reject them.

The inconsistency is in the consume tests after the two uses of match f in separated_list. The first consume test, if i1 == i …, rejects an empty first element. The second consume test, if i2 == i …, is probably intended to reject subsequent empty elements. That test is always false, because by the time it runs the match sep clause has either advanced past i or exited, so i2 can never equal i.
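To make the dead test easier to see, here is a minimal standalone model of that control flow. It is a sketch, not nom's source: the `Parser` type, the `digit0` and `comma` stand-ins, and `separated_list_model` are hypothetical simplifications that work on `&str` and collapse nom's error types into `None` / `Err(rest)`.

```rust
// Hypothetical simplified parser type: Some((rest, matched)) on success,
// None on a recoverable error. This is NOT nom's IResult.
type Parser = fn(&str) -> Option<(&str, &str)>;

// Stand-in for digit0: matches zero or more ASCII digits, never fails.
fn digit0(i: &str) -> Option<(&str, &str)> {
    let end = i.find(|c: char| !c.is_ascii_digit()).unwrap_or(i.len());
    Some((&i[end..], &i[..end]))
}

// Stand-in for char(','): matches exactly one comma.
fn comma(i: &str) -> Option<(&str, &str)> {
    if i.starts_with(',') { Some((&i[1..], &i[..1])) } else { None }
}

// Model of the control flow described in the issue. Err(rest) stands in
// for the SeparatedList error.
fn separated_list_model<'a>(
    sep: Parser,
    f: Parser,
    input: &'a str,
) -> Result<(&'a str, Vec<&'a str>), &'a str> {
    let mut res = Vec::new();
    let mut i = input;
    match f(i) {
        None => return Ok((i, res)),
        Some((i1, o)) => {
            // First consume test: rejects an empty first element.
            if i1 == i { return Err(i1); }
            res.push(o);
            i = i1;
        }
    }
    loop {
        match sep(i) {
            None => return Ok((i, res)),
            Some((i1, _)) => {
                // Separator consume test: guards against infinite loops.
                if i1 == i { return Err(i1); }
                match f(i1) {
                    None => return Ok((i, res)),
                    Some((i2, o)) => {
                        // Second consume test: i1 is already strictly past i,
                        // and i2 is at or past i1, so this is never true.
                        if i2 == i { return Err(i2); }
                        res.push(o);
                        i = i2;
                    }
                }
            }
        }
    }
}

fn main() {
    for &s in &["1,$", ",2$"] {
        println!("{:?} -> {:?}", s, separated_list_model(comma, digit0, s));
    }
}
```

Under these assumptions the model reproduces the reported asymmetry: an empty first element is an error, while an empty later element is accepted.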

I see two possible fixes: reject empty elements more often, or don't reject empty elements.

To reject empty elements, change the consume test for the second match f to if i2 == i1 …. That change yields the following output:

separated_list0(char(','), digit0)
"$"      error SeparatedList, rest "$"
"1$"     ok ["1"], rest "$"
"1,$"    error SeparatedList, rest "$"
"1,,$"   error SeparatedList, rest ",$"
"1,2$"   ok ["1", "2"], rest "$"
"1,,2$"  error SeparatedList, rest ",2$"
",$"     error SeparatedList, rest ",$"
",2$"    error SeparatedList, rest ",2$"
",,2$"   error SeparatedList, rest ",,2$"

separated_list1(char(','), digit0)
"$"      error SeparatedList, rest "$"
"1$"     ok ["1"], rest "$"
"1,$"    error SeparatedList, rest "$"
"1,,$"   error SeparatedList, rest ",$"
"1,2$"   ok ["1", "2"], rest "$"
"1,,2$"  error SeparatedList, rest ",2$"
",$"     error SeparatedList, rest ",$"
",2$"    error SeparatedList, rest ",2$"
",,2$"   error SeparatedList, rest ",,2$"

The empty input case ("$") is still a bit weird after this change. Perhaps it would be better to handle that case specially to make separated_list0 return [] and separated_list1 return an error.
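A rough sketch of the "reject" variant, again as a standalone model rather than a patch against nom: `Parser`, `digit0`, `comma`, and `separated_list_reject` are hypothetical stand-ins working on `&str`, with `Err(rest)` playing the role of the SeparatedList error. The only change from the current behavior is that the element consume test inside the loop compares against i1 instead of i.

```rust
// Hypothetical simplified parser type: Some((rest, matched)) or None.
type Parser = fn(&str) -> Option<(&str, &str)>;

// Stand-in for digit0: zero or more ASCII digits, never fails.
fn digit0(i: &str) -> Option<(&str, &str)> {
    let end = i.find(|c: char| !c.is_ascii_digit()).unwrap_or(i.len());
    Some((&i[end..], &i[..end]))
}

// Stand-in for char(','): exactly one comma.
fn comma(i: &str) -> Option<(&str, &str)> {
    if i.starts_with(',') { Some((&i[1..], &i[..1])) } else { None }
}

// Model of the proposed fix that rejects every empty element.
fn separated_list_reject<'a>(
    sep: Parser,
    f: Parser,
    input: &'a str,
) -> Result<(&'a str, Vec<&'a str>), &'a str> {
    let mut res = Vec::new();
    let mut i = input;
    match f(i) {
        None => return Ok((i, res)),
        // Empty first element: error, as before.
        Some((i1, _)) if i1 == i => return Err(i1),
        Some((i1, o)) => { res.push(o); i = i1; }
    }
    loop {
        let i1 = match sep(i) {
            None => return Ok((i, res)),
            Some((i1, _)) if i1 == i => return Err(i1),
            Some((i1, _)) => i1,
        };
        match f(i1) {
            None => return Ok((i, res)),
            // Changed test: compare with i1 (input after the separator),
            // so an element that consumed nothing is now an error.
            Some((i2, _)) if i2 == i1 => return Err(i2),
            Some((i2, o)) => { res.push(o); i = i2; }
        }
    }
}

fn main() {
    for &s in &["1,$", "1,,$", "1,2$"] {
        println!("{:?} -> {:?}", s, separated_list_reject(comma, digit0, s));
    }
}
```

In this model the error carries the input after the separator, which matches the rest values shown in the output above (for example "$" for "1,$").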

To accept empty elements, delete the consume test from both uses of match f. (Infinite loops are still prevented by the consume test in match sep.) That change yields the following output:

separated_list0(char(','), digit0)
"$"      ok [""], rest "$"
"1$"     ok ["1"], rest "$"
"1,$"    ok ["1", ""], rest "$"
"1,,$"   ok ["1", "", ""], rest "$"
"1,2$"   ok ["1", "2"], rest "$"
"1,,2$"  ok ["1", "", "2"], rest "$"
",$"     ok ["", ""], rest "$"
",2$"    ok ["", "2"], rest "$"
",,2$"   ok ["", "", "2"], rest "$"

separated_list1(char(','), digit0)
"$"      ok [""], rest "$"
"1$"     ok ["1"], rest "$"
"1,$"    ok ["1", ""], rest "$"
"1,,$"   ok ["1", "", ""], rest "$"
"1,2$"   ok ["1", "2"], rest "$"
"1,,2$"  ok ["1", "", "2"], rest "$"
",$"     ok ["", ""], rest "$"
",2$"    ok ["", "2"], rest "$"
",,2$"   ok ["", "", "2"], rest "$"

The empty input case ("$") is again a bit weird. Perhaps it would be better to handle that case specially to make separated_list0 return [] and separated_list1 return [""].
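The "accept" variant can be sketched the same way. This is a standalone model under the same assumptions (`Parser`, `digit0`, `comma`, and `separated_list_accept` are hypothetical stand-ins, not nom's API): both element consume tests are gone, and only the separator consume test remains to stop a loop that makes no progress.

```rust
// Hypothetical simplified parser type: Some((rest, matched)) or None.
type Parser = fn(&str) -> Option<(&str, &str)>;

// Stand-in for digit0: zero or more ASCII digits, never fails.
fn digit0(i: &str) -> Option<(&str, &str)> {
    let end = i.find(|c: char| !c.is_ascii_digit()).unwrap_or(i.len());
    Some((&i[end..], &i[..end]))
}

// Stand-in for char(','): exactly one comma.
fn comma(i: &str) -> Option<(&str, &str)> {
    if i.starts_with(',') { Some((&i[1..], &i[..1])) } else { None }
}

// Model of the proposed fix that accepts empty elements everywhere.
fn separated_list_accept<'a>(
    sep: Parser,
    f: Parser,
    input: &'a str,
) -> Result<(&'a str, Vec<&'a str>), &'a str> {
    let mut res = Vec::new();
    let mut i = input;
    match f(i) {
        None => return Ok((i, res)),
        // No consume test: an empty first element is kept.
        Some((i1, o)) => { res.push(o); i = i1; }
    }
    loop {
        let i1 = match sep(i) {
            None => return Ok((i, res)),
            // The separator must still consume input; each iteration
            // therefore makes progress and the loop terminates.
            Some((i1, _)) if i1 == i => return Err(i1),
            Some((i1, _)) => i1,
        };
        match f(i1) {
            None => return Ok((i, res)),
            // No consume test: empty later elements are kept too.
            Some((i2, o)) => { res.push(o); i = i2; }
        }
    }
}

fn main() {
    for &s in &["$", ",$", ",,2$"] {
        println!("{:?} -> {:?}", s, separated_list_accept(comma, digit0, s));
    }
}
```

Note that with an element parser that never fails, such as digit0, this model returns a one-element list for empty input, which is the "$" oddity mentioned above.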

@gparker42
Author

Probably related to #1063.

@Geal
Collaborator

Geal commented Oct 25, 2020

now that #1170 is merged, this should be more consistent

@Geal Geal closed this as completed Oct 25, 2020
@Geal Geal added this to the 6.0 milestone Oct 25, 2020