Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xpath: xpath parser causes stack overflow #25

Closed
ArtemGolovko opened this issue Feb 8, 2024 · 10 comments · Fixed by #27
Closed

xpath: xpath parser causes stack overflow #25

ArtemGolovko opened this issue Feb 8, 2024 · 10 comments · Fixed by #27
Assignees

Comments

@ArtemGolovko
Copy link

I encountered stack overflow when parsing //head//script[not(@src) and not(contains(., "gtag"))];

I suspect that the parser might get in infinite loop. Or any long xpath expression can cause stack overflow.

@James-LG
Copy link
Owner

James-LG commented Feb 8, 2024

I'm not able to reproduce this. I have a test passing that looks like this

#[test]
fn parse_should_handle_multiple_nots_in_predicate() {
    // arrange
    let xpath_text = r###"//head//script[not(@src) and not(contains(., "gtag"))]"###;

    // act
    let xpath = crate::xpath::parse(xpath_text).unwrap();

    // assert
    assert_eq!(xpath.to_string(), xpath_text);
}

What function were you calling exactly to cause the stack overflow? Is the overflow when you are applying the expression maybe?

@ArtemGolovko
Copy link
Author

ArtemGolovko commented Feb 9, 2024

I'm sorry I did not include bug reproduction.

Here it goes.

use skyscraper::{html, xpath::{self, XpathItemTree}};

fn main() {
    let document = html::parse(include_str!("../annupay.htm")).unwrap();
    let xpath_item_tree = XpathItemTree::from(&document);
    let xpath = xpath::parse(r###"//head//script[not(@src) and not(contains(., "gtag"))]"###).unwrap();
}

You can comment out the last line of main and see overflow go away.

I think instead of my html file (annupay.html), any other html file, of similar size, can be used. In my case my html file is 12 kb. If it is not the case, I will make sure to investigate my html file.

@James-LG
Copy link
Owner

Strange, I created a new rust project and used your exact code with some random html file and it works fine for me.

Does it fail for you with just the third line?

fn main() {
    let xpath =
        xpath::parse(r###"//head//script[not(@src) and not(contains(., "gtag"))]"###).unwrap();
}

@ArtemGolovko
Copy link
Author

ArtemGolovko commented Feb 10, 2024

You're right, it's the last line of main, that causes the issue. The bug appears in debug build, but not in release build.

@James-LG
Copy link
Owner

Interesting, I think that means your right on the stack size limit and the release mode is optimizing it enough to fit in your stack. What kind of machine are you running the code on, is it a normal PC or some smaller device?

Try running the code in a thread with a higher stack size like in this stack overflow response https://stackoverflow.com/a/44042122

@ArtemGolovko
Copy link
Author

Thank you, will try increasing the stack size.

And here is my stats.

Stat Value
CPU Intel Core i5-6200U
RAM 12 GB
OS Windows 10
rustc version 1.74.0

I don't think that my hardware is the problem. It might be a rustc bug.

@ArtemGolovko
Copy link
Author

I check the stack size required is 1 mb 65 kb. For some reason in debug mode it fails to correctly allocate stack.

@James-LG
Copy link
Owner

I am able to reproduce this on Windows. I'll see what I can do.

@James-LG
Copy link
Owner

I think I have fixed this in #27

Try using version 0.6.1-beta.0 and let me know.

@ArtemGolovko
Copy link
Author

Thank you. I test your fix and it works fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants