Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check if search index out of bounds #21

Merged
merged 3 commits into from
Dec 16, 2023

Conversation

masc-it
Copy link
Contributor

@masc-it masc-it commented Dec 12, 2023

Problem

Let's assume I want to search for a strong[1], with a parent node //div, but some of the divs don't have it. As the search is implemented right now, the code will just panic, since it does not handle out of bound indexing.

Solution
Just add a simple check on the search index. I didn't fix it deeper, at the DocumentNodeSet level, since raw indexing is heavily used in a lot of places and would require more effort.

Copy link
Owner

@James-LG James-LG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for your PR.

FYI the indexing in general is currently broken and requires much larger changes; it indexes based on the entire set of nodes, rather than for each parent node.

For example, applying //node/div[1] to the html below should return <div>hello</div> and <div>world</div> but actually only returns <div>hello</div>.

<root>
    <node>
        <div>hello</div>
        <div>foo</div>
    </node>
    <node>
        <div>world</div>
        <div>bar</div>
    </node>
    <node>
        <p>not a div</p>
    </node>
</root>

I'm currently working on a full rewrite of the XPath side of the library since it was originally hacked together without regard for the official XPath specification as a "good enough" library for another project I had.

I'm putting my effort towards the rewrite (which is quite an undertaking) rather than fixing this code master...james/nom

Just to avoid the panic I'll accept these changes if you fix the off-by-one error and ensure if the index is greater than the matched nodes, none are returned rather than all.

src/xpath/search.rs Outdated Show resolved Hide resolved
@masc-it
Copy link
Contributor Author

masc-it commented Dec 15, 2023

@James-LG that's awesome news, thanks for your effort!

In the last week I was having a deep dive of the current main branch and noticed some things, which maybe you're already addressing in the new version:

  • contains(@attribute, 'value') filter
    • I've implemented a working solution offline btw
    • As an alternative, one could implement a way easier to parse contains symbol, CSS-style, with *= (even though is not XPath compliant, but still, maybe it's worth for a simpler implementation and a seamless transition in case of a CSS migration!)
  • and / or in predicates
  • possibility to use @text as attribute, just like you can in chrome. ( e.g. //div[contains(@text, 'nice')] )
  • and yeah, indexing has a very weak implementation but we already know it.

BTW, I'll take a look to the new nom branch, happy to help if needed :)

@James-LG James-LG merged commit bf171c3 into James-LG:master Dec 16, 2023
@James-LG
Copy link
Owner

BTW, I'll take a look to the new nom branch, happy to help if needed :)

I'd like to get the basic use-cases working first, so the structure is a bit more settled than it currently is, but after that support on things like the contains functions would be great! XPath is pretty huge so there's lots of parallel work once the basics are in place.

@masc-it
Copy link
Contributor Author

masc-it commented Dec 16, 2023

Clear! Do you have a roadmap in place? Or a discord channel to post updates?

@James-LG
Copy link
Owner

James-LG commented Dec 17, 2023

Since GitHub apparently doesn't have direct messaging I created a brand new discord channel https://discord.gg/jWK42bWK

As for roadmap, I don't have anything formal. Vaguely it will be getting basic steps working / and // (including initial occurrences which behave differently), then filtered expressions like /div[@class='hi'], and go from there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants