Orgize validation fails when parsing certain unicode values #22

Closed
calmofthestorm opened this issue May 9, 2020 · 3 comments

@calmofthestorm
Contributor

calmofthestorm commented May 9, 2020

In general I expect weird Unicode values to produce "interesting" results, but I'm going to report this since it results in a panic when debug_assertions are enabled.

Each of these characters, alone, as input, results in a panic in debug builds. I recommend running the example below with --release as otherwise calling parse will panic.

Up to you whether it's worth fixing. I saw you had a fuzz test in the source tree, so I assume crashes like this might be of interest, but I can also understand not wanting to go down the Unicode rabbit hole, and it's unclear to me how often these characters actually come up in real use.

The one or two I tested with org-element work correctly -- a headline containing them in the title is parsed correctly.

fn main() {
    let s = "\u{000b}\u{0085}\u{00a0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200a}\u{2028}\u{2029}\u{202f}\u{205f}\u{3000}";

    for (i, c) in s.chars().enumerate() {
        let org = orgize::Org::parse_string(c.to_string());
        println!("Validation ok for {}: {}", i, org.validate().is_empty());
    }
}
@PoiScript
Owner

Thanks for reporting. Orgize automatically validates the parsed struct and panics if any error occurs. This validation is disabled in release mode to improve performance. As for the fuzz test, I believe it broke after I upgraded to the 2018 edition, and I just keep forgetting to fix it.
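
For readers unfamiliar with the pattern, debug-only validation can be sketched roughly as follows. This is an illustrative sketch, not Orgize's actual code: `Doc`, its `validate` method, and `parse` are hypothetical stand-ins, and the "no untrimmed titles" rule is an assumed example invariant.

```rust
// Hypothetical stand-in for a parsed document; not Orgize's real type.
struct Doc {
    titles: Vec<String>,
}

impl Doc {
    // Assumed example invariant: a title has no leading or trailing whitespace.
    // Returns one message per violation, so an empty Vec means "valid".
    fn validate(&self) -> Vec<String> {
        self.titles
            .iter()
            .filter(|t| t.trim() != t.as_str())
            .map(|t| format!("untrimmed title: {:?}", t))
            .collect()
    }
}

// Hypothetical parser: every input line becomes a title, stored as-is.
fn parse(input: &str) -> Doc {
    let doc = Doc {
        titles: input.lines().map(|l| l.to_string()).collect(),
    };
    // `cfg!(debug_assertions)` is true in debug builds and false with --release,
    // so the check (and the possible panic) only happens in debug builds.
    if cfg!(debug_assertions) {
        let errors = doc.validate();
        assert!(errors.is_empty(), "validation failed: {:?}", errors);
    }
    doc
}

fn main() {
    let doc = parse("hello\nworld");
    println!("validation errors: {}", doc.validate().len());
}
```

With this shape, a release build returns the document unchecked, and the caller can still run `validate` explicitly, which is what the reproduction in the report does.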

@PoiScript
Owner

Oh, I see. I only check for ASCII whitespace in some functions, but str::trim actually removes both ASCII and Unicode whitespace.
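
The mismatch can be demonstrated with the standard library alone. The sketch below reuses the characters from the report; `REPORTED` and `unicode_only_whitespace` are names made up for this illustration and are not part of Orgize.

```rust
// The characters from the report, copied verbatim.
const REPORTED: &str = "\u{000b}\u{0085}\u{00a0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200a}\u{2028}\u{2029}\u{202f}\u{205f}\u{3000}";

// Hypothetical helper: collects the characters that `char::is_whitespace`
// accepts but `char::is_ascii_whitespace` rejects.
fn unicode_only_whitespace(s: &str) -> Vec<char> {
    s.chars()
        .filter(|c| c.is_whitespace() && !c.is_ascii_whitespace())
        .collect()
}

fn main() {
    let mismatched = unicode_only_whitespace(REPORTED);
    println!(
        "{} of {} reported characters are whitespace only by the Unicode definition",
        mismatched.len(),
        REPORTED.chars().count()
    );

    for c in REPORTED.chars() {
        // `str::trim` uses the Unicode definition, so each character trims to "".
        assert!(c.to_string().trim().is_empty());
    }
}
```

Note that U+000B (vertical tab) is an ASCII character but is not covered by `is_ascii_whitespace`, which is why it appears in the list alongside the non-ASCII code points.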

@PoiScript
Owner

This was fixed by ba9c83c. But I decided to keep this issue open as a reminder, and will close it once we replace every u8::is_ascii_whitespace with char::is_whitespace.
