Matching in the context of a node returns (potentially) unexpected results #61

Closed
mna opened this issue Feb 24, 2024 · 2 comments

mna commented Feb 24, 2024

Hello Andy,

The following program returns the two <td> nodes under the first <tr>, even though the selector gives the impression that it should look for a .start class among the descendants of that <tr> (and should not find any):

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/PuerkitoBio/goquery"
	"github.com/andybalholm/cascadia"
)

var data = `
<!DOCTYPE html>
<html>
<body>
    <table class="start">
        <tbody>
            <tr>
                <td>test1</td>
                <td>test2</td>
            </tr>
            <tr>
            <td>
                <table>
                    <tbody>
                        <tr>
                           <td>test3</td>
                           <td>test4</td>
                        </tr>
                        <tr>
                           <td>test5</td>
                           <td>test6</td>
                        </tr>
                    </tbody>
                </table>
              </td>
            </tr>
        </tbody>
    </table>
</body>
</html>
`

func main() {
	doc, err := goquery.NewDocumentFromReader(strings.NewReader(data))
	if err != nil {
		log.Fatal(err)
	}

	// find outer tr
	rowSelection := doc.Find(".start > tbody > tr")
	fmt.Println("row selection length: ", len(rowSelection.Nodes))
	rowSelection.Each(func(i int, s *goquery.Selection) {
		fmt.Println(i, goquery.NodeName(s), s.AttrOr("class", ""))
	})
	fmt.Println()

	// get first outer <tr> and look for .start inside it
	tr0 := rowSelection.Get(0)

	cs := getMatcher(".start")
	matches := cascadia.QueryAll(tr0, cs)
	fmt.Println("expecting 0, returns 0: ", len(matches))

	cs = getMatcher(".start > tbody")
	matches = cascadia.QueryAll(tr0, cs)
	fmt.Println("expecting 0, returns 0: ", len(matches))

	cs = getMatcher(".start > tbody > tr")
	matches = cascadia.QueryAll(tr0, cs)
	fmt.Println("expecting 0, returns 0: ", len(matches))

	cs = getMatcher(".start > tbody > tr > td")
	matches = cascadia.QueryAll(tr0, cs)
	fmt.Println("expecting 0, returns 2: ", len(matches))
}

func getMatcher(s string) cascadia.Matcher {
	m, err := cascadia.ParseWithPseudoElement(s)
	if err != nil {
		log.Fatal(err)
	}
	return m
}

Correct me if I'm wrong, but I think this is working as intended in Cascadia, even though it differs from what people may be used to with jQuery: (IIUC) the selector is always matched starting from the root of the document, and only the matching descendants of the context node are returned.

This has come up in the context of PuerkitoBio/goquery#468, but after investigation and reading through some issues you closed, I have the feeling it is by design.

Thanks,
Martin

@andybalholm
Owner

Yes, you're understanding it correctly. Cascadia selectors have no concept of context. They work like vanilla JavaScript, not like jQuery. To see what I mean by that, open that HTML in a browser and type the following in the console:

tr0 = document.querySelector(".start > tbody > tr"); 
matches = tr0.querySelectorAll(".start > tbody > tr > td");
console.log(matches);
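
The difference between the two behaviors can also be modeled in plain Go, without goquery or Cascadia. This is only a rough sketch under stated assumptions: the `node` type, `matches`, `queryAll`, and `buildRow` are hypothetical helpers written for illustration (they are not part of either library), the selector is reduced to a chain of child combinators, and the tree mirrors only the first row of the table above. Passing `nil` as `stop` lets ancestor matching walk up to the document root, which is the behavior described in this issue; passing the context node requires every step of the selector to match below the context.

```go
package main

import "fmt"

// node is a hypothetical, minimal stand-in for an HTML element.
type node struct {
	tag      string
	class    string
	parent   *node
	children []*node
}

// add appends a new child element to n and returns the child.
func (n *node) add(tag, class string) *node {
	c := &node{tag: tag, class: class, parent: n}
	n.children = append(n.children, c)
	return c
}

// step matches one simple selector: ".name" tests the class, anything else the tag.
func step(n *node, s string) bool {
	if len(s) > 0 && s[0] == '.' {
		return n.class == s[1:]
	}
	return n.tag == s
}

// matches reports whether n satisfies a chain of child combinators, so
// []string{"a", "b", "c"} stands for "a > b > c". Ancestors are followed
// up to, but not including, stop. A nil stop walks to the document root.
func matches(n *node, chain []string, stop *node) bool {
	for i := len(chain) - 1; i >= 0; i-- {
		if n == nil || n == stop || !step(n, chain[i]) {
			return false
		}
		n = n.parent
	}
	return true
}

// queryAll returns the descendants of ctx (ctx itself excluded) that match chain.
func queryAll(ctx *node, chain []string, stop *node) []*node {
	var out []*node
	for _, c := range ctx.children {
		if matches(c, chain, stop) {
			out = append(out, c)
		}
		out = append(out, queryAll(c, chain, stop)...)
	}
	return out
}

// buildRow builds table.start > tbody > tr > (td, td) and returns the tr.
func buildRow() *node {
	table := &node{tag: "table", class: "start"}
	tr0 := table.add("tbody", "").add("tr", "")
	tr0.add("td", "")
	tr0.add("td", "")
	return tr0
}

func main() {
	tr0 := buildRow()
	chain := []string{".start", "tbody", "tr", "td"} // .start > tbody > tr > td

	// Ancestors outside the context still count toward the match.
	fmt.Println("document-rooted:", len(queryAll(tr0, chain, nil))) // prints "document-rooted: 2"

	// Every step of the selector must match below the context node.
	fmt.Println("context-scoped:", len(queryAll(tr0, chain, tr0))) // prints "context-scoped: 0"
}
```

In this model the only difference between the two calls is where the upward walk through the ancestors is allowed to end; the set of candidate nodes (the descendants of the context) is the same in both.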

Author

mna commented Feb 24, 2024

Thanks, makes sense. I'll make sure to document it clearly in goquery to manage expectations.

mna closed this as completed Feb 24, 2024