Orgize validation fails when parsing certain unicode values #22

calmofthestorm · 2020-05-09T01:48:37Z

In general I expect weird unicode values to get "interesting" results, but I'm going to report this since it results in a panic when debug_assertions are enabled.

Each of these characters, alone, as input, results in a panic in debug builds. I recommend running the example below with --release as otherwise calling parse will panic.

Up to you whether it's worth fixing. I saw you had a fuzz test in the source tree, so I assume crashes like this might be of interest. I can also understand not wanting to go down the Unicode rabbit hole, and it's unclear to me how often these characters come up in real use.

The one or two I tested with org-element work correctly -- a headline containing them in the title is parsed correctly.

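fn main() {
    // Unicode whitespace characters that u8::is_ascii_whitespace does not accept.
    let s = "\u{000b}\u{0085}\u{00a0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200a}\u{2028}\u{2029}\u{202f}\u{205f}\u{3000}";

    for (i, c) in s.chars().enumerate() {
        // Parse each character on its own; in debug builds this call panics.
        let org = orgize::Org::parse_string(c.to_string());
        // An empty list of validation errors means the parsed tree is valid.
        println!("Validation ok for {}: {}", i, org.validate().is_empty());
    }
}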


PoiScript · 2020-05-09T07:03:21Z

Thanks for reporting. Orgize automatically validates the parsed struct and panics if any error occurs. This check is disabled in release mode to improve performance. As for the fuzz test, I believe it broke after I upgraded to the 2018 edition, but I keep forgetting to fix it.

PoiScript · 2020-05-09T17:17:35Z

Oh, I see. I only check for ASCII whitespace in some functions, but str::trim actually removes both ASCII and Unicode whitespace.
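For readers following along, the mismatch described here can be reproduced with the standard library alone. The sketch below is not orgize code; the helper name `is_unicode_only_whitespace` is hypothetical and only illustrates which characters `char::is_whitespace` and `str::trim` treat as whitespace while the ASCII-only check does not.

```rust
/// Hypothetical helper (not part of orgize): true when `c` counts as
/// whitespace for char::is_whitespace but not for the ASCII-only check.
fn is_unicode_only_whitespace(c: char) -> bool {
    c.is_whitespace() && !c.is_ascii_whitespace()
}

fn main() {
    // A few of the characters from the report above.
    let samples = ['\u{000b}', '\u{0085}', '\u{00a0}', '\u{3000}', ' ', 'a'];

    for c in samples {
        // str::trim strips Unicode whitespace, so a string holding only
        // such a character becomes empty after trimming.
        let trimmed_empty = c.to_string().trim().is_empty();
        println!(
            "U+{:04X}: unicode-only whitespace = {}, empty after trim = {}",
            c as u32,
            is_unicode_only_whitespace(c),
            trimmed_empty
        );
    }
}
```

Any code path that trims with `str::trim` but tests individual bytes with `u8::is_ascii_whitespace` can therefore disagree with itself on these inputs.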

PoiScript · 2020-05-09T17:55:01Z

This was fixed by ba9c83c. But I decided to keep this issue open as a reminder and will close it once we replace every u8::is_ascii_whitespace with char::is_whitespace.
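As a rough illustration of the replacement mentioned above, here is a minimal sketch assuming a helper that measures leading whitespace in bytes. Both functions are hypothetical and are not taken from orgize or from commit ba9c83c; they only show why the char-based predicate stays consistent with `str::trim_start` where the byte-based one does not.

```rust
/// Hypothetical helper: byte length of the leading whitespace,
/// using the ASCII-only check on raw bytes.
fn leading_ws_ascii(s: &str) -> usize {
    s.bytes().take_while(|b| b.is_ascii_whitespace()).count()
}

/// Hypothetical helper: the same measurement using char::is_whitespace,
/// summing UTF-8 lengths so the result is still a byte offset.
fn leading_ws_unicode(s: &str) -> usize {
    s.chars()
        .take_while(|c| c.is_whitespace())
        .map(char::len_utf8)
        .sum()
}

fn main() {
    // A no-break space (two bytes in UTF-8) followed by text.
    let input = "\u{00a0}title";
    let trimmed_away = input.len() - input.trim_start().len();

    println!("ascii check:   {}", leading_ws_ascii(input)); // prints 0
    println!("unicode check: {}", leading_ws_unicode(input)); // prints 2
    println!("trim_start:    {}", trimmed_away); // prints 2
}
```

With the ASCII-only helper, the computed offset and the trimmed string disagree by two bytes, which is the kind of inconsistency a validation pass would flag.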

PoiScript added a commit that referenced this issue May 9, 2020

fix(parser): use char::is_whitespace (#22)

ba9c83c

PoiScript added a commit that referenced this issue May 9, 2020

test: add test cases for issue #22

b482dcb

PoiScript mentioned this issue May 16, 2020

discuss: a new specifications #36

Open

PoiScript mentioned this issue Nov 17, 2023

Announcing v0.10 #70

Open

PoiScript closed this as completed Nov 24, 2023
