Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nom consuming 100% cpu #27

Closed
filipegoncalves opened this issue Apr 9, 2015 · 15 comments
Closed

nom consuming 100% cpu #27

filipegoncalves opened this issue Apr 9, 2015 · 15 comments

Comments

@filipegoncalves
Copy link
Contributor

I am exploring the possibility of switching to nom in a project I am working on. I am not fully familiar with nom yet, so please bear with me.

For starters, I was trying to come up with a parser that matches strings of the form [a-zA-Z][-a-zA-Z0-9_]*. I wrote this:

#[macro_use]
extern crate nom;

use std::str::from_utf8;

use nom::{alpha, alphanumeric};
use nom::{IResult, Needed};
use nom::IResult::*;

named!(identifier<&[u8], String>,
       chain!(
           h: map_res!(alpha, from_utf8) ~
           t: many0!(alt!(alphanumeric | tag!("-") | tag!("_"))),
           || {
               let  s = h.to_string();
               t.into_iter().fold(s, |mut accum, slice| {
                   accum.push_str(from_utf8(slice).unwrap()); accum })}));

And I tested it with:

    #[test]
    fn id_name() {
        let a_setting = &b"miles"[..];
        let res = setting_name(a_setting);
        assert_eq!(res, Done(&b""[..], "miles".to_string()));
    }

When I run cargo test my PC completely hangs. With top I can see that it starts consuming more and more CPU and memory until the entire system is completely unusable and I have to hard reset.

Am I doing something wrong? Is this the best way to make a parser to match this type of strings?

@filipegoncalves filipegoncalves changed the title nom consuming 100% cpu nom consuming 100% cpu Apr 9, 2015
@filipegoncalves
Copy link
Contributor Author

Update:

This is harmless:

named!(setting_name<&[u8], String>,
       chain!(
           h: map_res!(alpha, from_utf8) ~
           t: map_res!(alt!(alphanumeric | tag!("-") | tag!("_")), from_utf8),
           || { let mut s = h.to_string(); s.push_str(t); s }));

Apparently, adding many0! on t is the issue here. When I do it, the problem is back. Weird.

@Geal
Copy link
Collaborator

Geal commented Apr 9, 2015

Can I get back to you on Saturday about this issue? I am currently at a conference and do not have much time.

Usually, when there was a lot of CPU and memory consumption, it was caused by the compiler having a hard time resolving macros. Since nom uses a lot of macros, it happens sometimes.

You could try making a named function for the many0 line and call it from the chain. Please let me know if it does not work even with that fix, that will mean I have something to fix :)
Which version of nom are you using?

@filipegoncalves
Copy link
Contributor Author

Sure, take your time.

I tried with latest nom version on crates.io (0.2.1) with rustc 1.0.0-nightly (6436e348e). I also forked the repo. and applied #25 - same thing happened (this time on rustc 1.0.0-beta).

I'll try to follow your suggestion and go with a function when I have the chance. I'll let you know when I have news.

Enjoy the conference!

@thehydroimpulse
Copy link

I've also experienced this with the many0 and many1 macros during runtime (tests just hang). Haven't dug into the issue yet.

@Geal
Copy link
Collaborator

Geal commented Apr 9, 2015

Ok, can you share the code? I'll investigate.
Le 10 avr. 2015 01:00, "Daniel Fagnan" notifications@github.com a écrit :

I've also experienced this with the many0 and many1 macros during runtime
(tests just hang). Haven't dug into the issue yet.


Reply to this email directly or view it on GitHub
#27 (comment).

@filipegoncalves
Copy link
Contributor Author

After some digging I have a minimal test case that reproduces the issue. It is triggered by combining many0!() with alphanumeric. The problem is that many0!() expands into a loop that only breaks when the underlying parser returns something that is not Done. However, alphanumeric always returns Done, thus the loop never terminates.

The same happens with alpha. In general, any parser that always returns Done will cause an infinite loop when wrapped inside a many0!.

This is enough to reproduce the bug:

#[macro_use]
extern crate nom;

use nom::alphanumeric;
use nom::IResult;
use nom::IResult::*;

named!(just_a_test<&[u8], Vec<&[u8]> >,
       many0!(alphanumeric));

fn main() {
    println!("Entered main()");

    let test = &b"1as32"[..];
    let res = just_a_test(test);

    println!("Parsed input");
    assert_eq!(res, Done(&b""[..], vec![&b"1"[..], &b"a"[..], &b"s"[..], &b"3"[..], &b"2"[..]]));
}

This will print Entered main() and then it hangs on just_a_test().

In your opinion, what would be the correct way to fix this? I thought about changing alphanumeric and alpha so as not to return Done, but I don't know if this makes sense from a design point of view.

Perhaps a better design would be for many0!() to terminate if the underlying parser returned Done but didn't consume any input (in addition to stopping if the parser doesn't return Done, as it is now). What do you think? I'd be glad to work on this bug.

@Geal
Copy link
Collaborator

Geal commented Apr 11, 2015

Good catch. I think alpha, alphanumeric, is_a and others should be able to return Error if they do not find any input, since we have opt! for the cases where we could ignore some input.

I will also update many to ignore parsers that accept empty input. I just checked Parsec, there's a special case for that in their code as well.

@filipegoncalves
Copy link
Contributor Author

Alright, good to know. I'll wait for your fix.

Keep up the good job, this library is really cool.

Geal added a commit that referenced this issue Apr 11, 2015
@Geal
Copy link
Collaborator

Geal commented Apr 11, 2015

Here is a fix, let me know if this works for you.

@filipegoncalves
Copy link
Contributor Author

Works like a charm. Thanks!

@Geal
Copy link
Collaborator

Geal commented Apr 12, 2015

Great!

@NilsIrl
Copy link
Contributor

NilsIrl commented Oct 14, 2021

Failing when nothing is parsed is probably not a great idea either, it may be better if many just stopped parsing.

For example, I'm trying to parse comments (in the context of a programming language), and have the following code:

fn space0(input: &str) -> IResult<&str, &str> {
    recognize(many0(tuple((
        complete::multispace0,
        opt(comment),
        complete::multispace0,
    ))))(input)
}

Unfortunately it fails when there is no whitespace, what can I do?

@Stargateur
Copy link
Contributor

@NilsIrl without detail of grammar of what you try to parse is gonna be hard to advice you.

@Geal
Copy link
Collaborator

Geal commented Oct 14, 2021

it should be possibl to make a separate function for tuple((complete::multispace0, opt(comment), complete::multispace0)), and in that one check the first character. If it's not a space or the first char of a comment, return an error

@NilsIrl
Copy link
Contributor

NilsIrl commented Oct 18, 2021

I was able to fix it like so:

fn space0(input: &str) -> IResult<&str, &str> {
    recognize(many0(verify(
        recognize(tuple((
            complete::multispace0,
            opt(comment),
            complete::multispace0,
        ))),
        |s: &str| !s.is_empty(),
    )))(input)
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants