Check if search index out of bounds #21

masc-it · 2023-12-12T12:01:06Z

Problem

Suppose I want to search for strong[1] under a parent node //div, but some of the divs don't contain a strong element. As the search is currently implemented, the code simply panics, because it does not handle out-of-bounds indexing.

Solution
Add a simple bounds check on the search index. I didn't fix it at a deeper level, in DocumentNodeSet, since raw indexing is used heavily in many places and changing it would require more effort.

James-LG

Hi, thanks for your PR.

FYI the indexing in general is currently broken and requires much larger changes; it indexes based on the entire set of nodes, rather than for each parent node.

For example, applying //node/div[1] to the HTML below should return <div>hello</div> and <div>world</div>, but it actually returns only <div>hello</div>.

<root>
    <node>
        <div>hello</div>
        <div>foo</div>
    </node>
    <node>
        <div>world</div>
        <div>bar</div>
    </node>
    <node>
        <p>not a div</p>
    </node>
</root>
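
To make the difference concrete, here is a minimal sketch of the two indexing strategies applied to the document above. It does not use the library's real types: `Parent`, `sample_document`, `index_globally`, and `index_per_parent` are hypothetical names invented for illustration, and each `<node>` is modelled as a plain list of (tag, text) pairs.

```rust
/// Hypothetical stand-in for a parent element: its children as (tag, text) pairs.
struct Parent<'a> {
    children: Vec<(&'a str, &'a str)>,
}

/// Builds the three <node> elements from the example document.
fn sample_document() -> Vec<Parent<'static>> {
    vec![
        Parent { children: vec![("div", "hello"), ("div", "foo")] },
        Parent { children: vec![("div", "world"), ("div", "bar")] },
        Parent { children: vec![("p", "not a div")] },
    ]
}

/// Global indexing (the behaviour described as broken): flatten every
/// matching child into one list, then take the n-th entry overall.
fn index_globally<'a>(parents: &[Parent<'a>], tag: &str, n: usize) -> Vec<&'a str> {
    let all: Vec<&'a str> = parents
        .iter()
        .flat_map(|p| p.children.iter())
        .filter(|(t, _)| *t == tag)
        .map(|(_, text)| *text)
        .collect();
    n.checked_sub(1)
        .and_then(|i| all.get(i).copied())
        .into_iter()
        .collect()
}

/// Per-parent indexing (what XPath expects): take the n-th matching child
/// of each parent, skipping parents that have fewer than n matches.
fn index_per_parent<'a>(parents: &[Parent<'a>], tag: &str, n: usize) -> Vec<&'a str> {
    if n == 0 {
        // XPath positions are 1-based, so position 0 never matches.
        return Vec::new();
    }
    let i = n - 1;
    parents
        .iter()
        .filter_map(|p| {
            p.children
                .iter()
                .filter(|(t, _)| *t == tag)
                .map(|(_, text)| *text)
                .nth(i)
        })
        .collect()
}
```

With this model, `index_globally(&sample_document(), "div", 1)` yields only `hello`, while `index_per_parent(&sample_document(), "div", 1)` yields `hello` and `world`; the third node contributes nothing because it has no div child.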

I'm currently working on a full rewrite of the XPath side of the library. It was originally hacked together as a "good enough" library for another project of mine, without regard for the official XPath specification.

I'm putting my effort towards the rewrite (which is quite an undertaking) rather than fixing this code (branch comparison: master...james/nom).

Just to avoid the panic, I'll accept these changes if you fix the off-by-one error and ensure that, if the index is greater than the number of matched nodes, none are returned rather than all.
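
As a rough illustration of the requested behaviour (not the actual code in src/xpath/search.rs), the sketch below converts a 1-based XPath position to a 0-based index without underflow and returns an empty result when the position is out of range. The function name `select_position` and the use of a plain slice instead of the library's node set type are assumptions.

```rust
/// Hypothetical helper: select the node at a 1-based XPath position.
/// Returns an empty vector when the position is 0 or past the end,
/// instead of panicking or returning every matched node.
fn select_position<T: Clone>(matched: &[T], position: usize) -> Vec<T> {
    position
        .checked_sub(1) // 1-based to 0-based; position 0 becomes None
        .and_then(|i| matched.get(i)) // None when out of bounds, never panics
        .cloned()
        .into_iter()
        .collect()
}
```

Using `checked_sub` together with `slice::get` handles both failure modes (position 0 and a position past the end) in one expression, so no raw indexing is involved.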

src/xpath/search.rs

masc-it · 2023-12-15T08:12:34Z

@James-LG that's awesome news, thanks for your effort!

Over the last week I did a deep dive into the current main branch and noticed a few things that are missing, which you may already be addressing in the new version:

- contains(@attribute, 'value') filter
  - I've implemented a working solution offline, by the way.
  - As an alternative, one could implement a contains operator that is much easier to parse, CSS-style, with *= (it is not XPath compliant, but it may be worth it for a simpler implementation and a seamless transition in case of a CSS migration!)
- and / or in predicates
- the possibility to use @text as an attribute, just like you can in Chrome (e.g. //div[contains(@text, 'nice')])
- and yes, indexing has a very weak implementation, but we already know that.

BTW, I'll take a look at the new nom branch, happy to help if needed :)

James-LG · 2023-12-16T00:28:51Z

> BTW, I'll take a look at the new nom branch, happy to help if needed :)

I'd like to get the basic use cases working first, so the structure is a bit more settled than it currently is, but after that, support on things like the contains function would be great! XPath is pretty huge, so there's lots of parallel work once the basics are in place.

masc-it · 2023-12-16T07:58:00Z

Clear! Do you have a roadmap in place? Or a discord channel to post updates?

James-LG · 2023-12-17T03:29:52Z

Since GitHub apparently doesn't have direct messaging, I created a brand new Discord channel: https://discord.gg/jWK42bWK

As for a roadmap, I don't have anything formal. Roughly, the plan is to get the basic steps / and // working first (including initial occurrences, which behave differently), then filtered expressions like /div[@class='hi'], and go from there.

masc-it added 2 commits December 12, 2023 12:58

check if index out of bounds

79c2f4e

cargo fmt

8004e3d

James-LG requested changes Dec 15, 2023

src/xpath/search.rs

off-by-one error, return empty node set

d69fa99

James-LG approved these changes Dec 16, 2023

James-LG merged commit bf171c3 into James-LG:master Dec 16, 2023

James-LG mentioned this pull request Dec 29, 2023

BREAKING: Complete xpath module rewrite #24

Merged
