Handle XPath selectors #537

Closed
leeroybrun opened this issue Aug 25, 2017 · 41 comments · Fixed by #1620

@leeroybrun

leeroybrun commented Aug 25, 2017

Is it planned/wanted to implement XPath selectors?

This can be done quite easily, but I don't know whether leaving them out was a deliberate choice.

I've implemented two methods in my own code to handle them:

async waitForXpath(selector, options = { polling: 'mutation' }) {
  return this.waitForFunction(selector => {
    return null !== document.evaluate(selector, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
  }, options, selector);
}

async $XPath(selector) {
  const remoteObject = await this._rawEvaluate(selector => {
    return document.evaluate(selector, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
  }, selector);

  if (remoteObject.subtype === 'node')
    return new ElementHandle(this._client, remoteObject, this._mouse);

  await utils.releaseObject(this._client, remoteObject);
  return null;
}

I can make a PR if you're interested.

@aslushnikov
Contributor

There have been no plans to add XPath since it hasn't been requested much. However, #382 would make it possible to implement polyfills on top of the Puppeteer API in just a few lines of code. Would that be good enough for you?

@leeroybrun
Author

XPath can be really useful and more powerful in some situations (selecting an element by its text content is just one of the many use cases).

#382 could work, but XPath would still not be available "out-of-the-box" and users would have to implement it themselves, I guess?

@xprudhomme
Contributor

+1, I would also like XPath selectors to be handled.

XPath is so much more powerful than CSS selectors when it comes to querying the DOM. As has been said previously, it's a pain not being able to select an element based on its text content (which comes up a lot), or to select nodes based on ancestor predicates (for instance //*[@class='whateverclass'][not(ancestor::div[@id='abcdef'])] ).

Home-made methods have to be implemented to handle XPath. That does the job, but I'm pretty sure we are not the only ones with this need.
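
Selecting by text content usually means interpolating a string into the expression, and XPath 1.0 has no escape character for quotes inside a string literal. A minimal sketch of one way to handle that is shown here; `xpathLiteral` and `byExactText` are hypothetical helper names, not part of Puppeteer.

```javascript
// Hypothetical helper (not a Puppeteer API): quote arbitrary text as an
// XPath 1.0 string literal.
function xpathLiteral(text) {
  if (!text.includes("'")) return `'${text}'`;
  if (!text.includes('"')) return `"${text}"`;
  // Both quote kinds are present: split on single quotes and glue the
  // pieces back together with concat(), quoting each "'" with double quotes.
  const parts = text.split("'").map(part => `'${part}'`);
  return `concat(${parts.join(`, "'", `)})`;
}

// Hypothetical helper: match any element that has a text node child
// exactly equal to `text`.
function byExactText(text) {
  return `//*[text()=${xpathLiteral(text)}]`;
}
```

The resulting string can be passed to whichever XPath evaluation method is in use, for example one of the home-made methods shown in this thread.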

@kensoh
Contributor

kensoh commented Sep 6, 2017

XPath is really expressive: for example, and, or, and contains can be combined into very specific selection criteria that are hard or impossible to write with CSS selectors. However, it is not straightforward to implement, so it may be better to consider it after the initial stable release, to reduce the additional work, fixes, and updates needed before then.

I'll highlight two examples of why @leeroybrun's implementation, once merged, will not be the end of the story: more fixes and iterations will be needed as users try it out and report failing XPath use cases.

  1. The document context (the second argument in the call below) will not work, I believe, for elements in a frame. It would have to be changed to something like frame_element.contentDocument to reach elements within the frame: document.evaluate(selector, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue

  2. Also, to return multiple elements that can be iterated over, something like XPathResult.ORDERED_NODE_SNAPSHOT_TYPE may have to be used in place of XPathResult.FIRST_ORDERED_NODE_TYPE

I personally prefer XPath over CSS selectors any time; however, it offers many more use cases than can be easily implemented. Leeroy's implementation could be a great first iteration, but I believe more work will be needed before XPath works across all of Puppeteer's API.
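
The two points above can be sketched as a single in-page helper. This is only an illustration under stated assumptions: `queryAllByXPath` is a hypothetical name, the function is meant to run inside the page, and the frame case only works for same-origin frames whose contentDocument is accessible.

```javascript
// Numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE in the DOM spec.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical helper, intended to run inside the page. `root` is either a
// document or an <iframe> element (same-origin only, so that
// contentDocument is available).
function queryAllByXPath(expression, root) {
  // Point 1: use the frame's own document as the evaluation context.
  const doc = root.contentDocument || root;
  // Point 2: a snapshot result allows iterating over every match.
  const snapshot = doc.evaluate(expression, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  const nodes = [];
  for (let i = 0; i < snapshot.snapshotLength; ++i)
    nodes.push(snapshot.snapshotItem(i));
  return nodes;
}
```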

@leeroybrun
Author

leeroybrun commented Sep 7, 2017

My implementation was really simple, and only tested on one of my personal projects.
The goal was mainly to open the discussion and see if it was something needed by others too.

Regarding point 2, the $XPath function was the equivalent of the $ one in the API (selecting only one element).
I've made a separate $$XPath function (equivalent of the $$ one) using ORDERED_NODE_SNAPSHOT_TYPE and returning multiple elements.

It's basically a copy of the $$ function.
Not a very beautiful way to do it, but it works (again, in my use case).
If it were implemented natively, we could abstract some parts and avoid the code duplication between $$/$$XPath and $/$XPath.

Anyway, here it is, in case it is useful to someone else:

  async $$XPath(selector) {
    const remoteObject = await this._rawEvaluate(selector => {
      let results = [];
      let query = document.evaluate(selector, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
      for (let i=0, length = query.snapshotLength; i < length; ++i) {
        results.push(query.snapshotItem(i));
      }
      return results;
    }, selector);
    const response = await this._client.send('Runtime.getProperties', {
      objectId: remoteObject.objectId,
      ownProperties: true
    });
    const properties = response.result;
    const result = [];
    const releasePromises = [utils.releaseObject(this._client, remoteObject)];
    for (const property of properties) {
      if (property.enumerable && property.value.subtype === 'node')
        result.push(new ElementHandle(this._client, property.value, this._mouse));
      else
        releasePromises.push(utils.releaseObject(this._client, property.value));
    }
    await Promise.all(releasePromises);
    return result;
  }

In my opinion, having XPath implemented in the Puppeteer API would be a big advantage.
But I understand your position of wanting a stable release before adding more features.

@kaushiksundar

+1 for XPath

@mdeora

mdeora commented Sep 11, 2017

+1 for XPath

@matthewlilley

+1

@sradu

sradu commented Oct 7, 2017

@aslushnikov Now that JSHandles are merged, any advice on the best way to implement xpath?

I'm especially wondering about the best way to use it with waitForSelector, where visible is incredibly helpful.

@aslushnikov
Contributor

@sradu the following script works fine for me:

async function xpath(page, path) {
  const resultsHandle = await page.evaluateHandle(path => {
    let results = [];
    let query = document.evaluate(path, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
    for (let i=0, length = query.snapshotLength; i < length; ++i) {
      results.push(query.snapshotItem(i));
    }
    return results;
  }, path);
  const properties = await resultsHandle.getProperties();
  const result = [];
  const releasePromises = [];
  for (const property of properties.values()) {
    const element = property.asElement();
    if (element)
      result.push(element);
    else
      releasePromises.push(property.dispose());
  }
  await Promise.all(releasePromises);
  return result;
}

const pptr = require('puppeteer');
(async () => {
  const browser = await pptr.launch();
  const page = await browser.newPage();
  await page.setContent('<div>hello!</div><div>other!</div>');
  const [handle1, handle2] = await xpath(page, '//div');
  console.log(await page.evaluate(e => e.textContent, handle1));
  console.log(await page.evaluate(e => e.textContent, handle2));
  await page.close();
  await browser.close();
})();

@bootstraponline

XPath support would be awesome.

@jpap

jpap commented Oct 10, 2017

+1 to this. I'd also love to see a function that waits for an XPath to "disappear", i.e. the XPath no longer matches anything, e.g. after a popup is closed.

@huan

huan commented Oct 18, 2017

+1

@geekkun

geekkun commented Oct 27, 2017

+1

@aslushnikov
Contributor

If someone comes up with a puppeteer-xpath package, we'd be happy to point to it in our docs.

@xprudhomme
Contributor

xprudhomme commented Oct 31, 2017

@jpap I've developed a whole bunch of XPath methods, such as this one which might help you (similar to the Frame.waitForSelector method):

      /**
       * Wait for the XPath selector to appear in the page. If the selector already exists when the method is called,
       * the method returns immediately. If the selector doesn't appear within `timeout` milliseconds, the function throws.
       * @param {string} selector An XPath selector of an element to wait for
       * @param {Object} options Optional waiting parameters (same as in the waitForSelector method)
       * @return {Promise} Promise which resolves when the element specified by the XPath string is added to the DOM.
       */
      async waitForXpath(selector, options = {}) {

        const timeout = options.timeout || 30000;
        const waitForVisible = !!options.visible;
        const waitForHidden = !!options.hidden;
        const polling = waitForVisible || waitForHidden ? 'raf' : 'mutation';

        console.log(" [waitForXpath] About to wait for element: " + selector);

        return this.page.waitForFunction(predicate, {timeout, polling}, selector, waitForVisible, waitForHidden);

        /**
         * @param {string} selector XPath selector
         * @param {boolean} waitForVisible
         * @param {boolean} waitForHidden
         * @return {boolean}
         */
        function predicate(selector, waitForVisible, waitForHidden) {

          const node = document.evaluate(selector, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;

          if (!node)
            return waitForHidden;

          if (!waitForVisible && !waitForHidden)
            return true;

          const style = window.getComputedStyle(node);
          const isVisible = style && style.display !== 'none' && style.visibility !== 'hidden';

          return (waitForVisible === isVisible || waitForHidden === !isVisible);
        }
      }

@Mukeshcse31

+1

@divyamvn

divyamvn commented Nov 7, 2017

+1

@TildeWill

+1 we also implemented our own XPath selector for text as the first thing after installing Puppeteer

@stackflows

+1 We, too, are in dire need of XPath in puppeteer, as we are dealing with a so-called "web application" whose elements are - for the vast majority - not addressable by CSS-selectors.
Only XPath works.

@DavertMik

DavertMik commented Dec 18, 2017

At CodeceptJS we implemented XPath locator support for Puppeteer.
CodeceptJS is a high-level testing framework and uses Puppeteer as one of the supported backends, so you can try using it as a sane abstraction over the current API.

Anyway, native XPath support from Puppeteer would be appreciated :)

@paulirish
Contributor

#1620 is proposed and introduces:

  • page.xpath()
  • frame.xpath()
  • elementHandle.xpath()

⬇️ Mini-poll for the puppeteer community: ⬇️

Please 👍 this comment if you would use elementHandle.xpath(). Or you can 👎 this comment if you'd use the other two xpath methods but not this one. Thanks.

aslushnikov pushed a commit that referenced this issue Dec 20, 2017
This patch adds xpath support with the following methods:
- page.xpath
- frame.xpath
- elementHandle.xpath

Fixes #537

@ctsstc

ctsstc commented Apr 2, 2018

Has this not been added to the documentation yet? I see there's waitForXPath and $x

@aslushnikov
Contributor

> Has this not been added to the documentation yet? I see there's waitForXPath and $x

@ctsstc page.xpath was later renamed to page.$x.

@5t4rdu5t

+1

@heyAyushh

+1

@tindecken

+1 for XPath support

@aslushnikov
Contributor

As of Puppeteer v1.7.0, xpath is well-supported.
Query using xpath:

Wait for nodes using xpath:

@sputnick-dev

It's not as useful as casperjs's __utils__ functions:

https://github.com/casperjs/casperjs/blob/master/modules/clientutils.js#L590

__utils__.getElementsByXPath('//xpath/expression', 'optional node')

My 2 cents :)

@xprudhomme
Contributor

Why is it not as useful as casperjs's functions?

page.$x(expression) does exactly the same thing. Puppeteer does even more, the waitForXPath feature is really useful :)

@sputnick-dev

sputnick-dev commented Sep 7, 2018

A sample use case:

__utils__.getElementsByXPath('//xpath/expression')
.forEach((node) => {
    __utils__.getElementByXPath('//xpath/another_expression_on_specific_node', node)
})

How do you do it on a context node with page.$x?

@aslushnikov
Contributor

@sputnick-dev page.$x returns an array of element handles, so you can do the following:

const handles = await page.$x('//xpath/expression');
for (const e of handles)
  await e.$x('//xpath/another_expression_on_specific_node');
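
One caveat when querying from a context node: in XPath 1.0 an expression that starts with `//` is absolute and is resolved from the document root even when a context node is supplied, so an inner expression normally needs to start with `./` or `.//` to stay inside the outer element. A minimal sketch, assuming the page.$x and elementHandle.$x methods discussed in this thread; `queryNested` is a hypothetical helper name.

```javascript
// Hypothetical helper: for every node matching outerXPath, collect the
// nodes matching innerXPath. To restrict the inner query to the outer
// element, innerXPath should be relative, e.g. ".//li" rather than "//li".
async function queryNested(page, outerXPath, innerXPath) {
  const nested = [];
  for (const outer of await page.$x(outerXPath))
    nested.push(await outer.$x(innerXPath));
  return nested; // one array of handles per outer match
}
```

For example, `queryNested(page, '//ul', './/li')` would return one array of list-item handles per list.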

@haikyuu

haikyuu commented May 9, 2019

@aslushnikov I guess using count isn't supported:

  const count = await currentPage.$x(`count(//*[./text()="${text}"])`);

I get this error

Error: Evaluation failed: TypeError: Failed to execute 'evaluate' on 'Document':
The result is not a node set, and therefore cannot be converted to the desired type.
          at __puppeteer_evaluation_script__:3:37
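
The error message suggests that $x asks for a node-set result, while count(...) produces a number. One possible workaround is to evaluate the expression in the page with a numeric result type. This is a sketch rather than an official API: `xpathNumber` is a hypothetical helper name, and it relies only on page.evaluate and the standard document.evaluate.

```javascript
// Hypothetical helper: evaluate an XPath expression that yields a number,
// such as count(...), and return that number to the Node side.
async function xpathNumber(page, expression) {
  return page.evaluate(expr => {
    // This callback runs in the browser page, where `document` and
    // `XPathResult` are globals.
    const result = document.evaluate(expr, document, null, XPathResult.NUMBER_TYPE, null);
    return result.numberValue;
  }, expression);
}
```

A call such as `await xpathNumber(page, 'count(//div)')` would then resolve to a plain number instead of element handles.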

@subrato29

I tried to use the code below for waitForXpath(), but it didn't work. Can someone tell me what I am doing wrong, and please provide code for waitForXpath() if there is any?

async function waitForXpath(page, xpath) {
  const elements = await page.$x(xpath);
  await elements[0].waitForSelector()
}

@subrato29

I tried

await page.waitForXpath("//input[@value='Finish']", { timeout: 8000 })

but I get the error below:

TypeError: page.waitForXpath is not a function
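
The TypeError above is consistent with a spelling mismatch: earlier in this thread the method is named waitForXPath, with a capital P. A minimal sketch of waiting for a node and then using it, assuming page.waitForXPath resolves to a handle for the matched element; `clickWhenPresent` is a hypothetical wrapper name.

```javascript
// Hypothetical wrapper: wait for the first node matching `xpath`, then
// click it and return its handle.
async function clickWhenPresent(page, xpath, timeout = 8000) {
  // Note the capital "P": page.waitForXPath, not page.waitForXpath.
  const handle = await page.waitForXPath(xpath, { timeout });
  await handle.click();
  return handle;
}
```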

@xprudhomme
Contributor

xprudhomme commented Jun 18, 2019 via email

@subrato29

@xprudhomme : Yes it worked. Thanks much

@subrato29

I was trying to implement a select function in Puppeteer using XPath:

async function selectOptionByText(xpath, text) {
  await waitForXPath(xpath);
  const elements = await page.$x(xpath);
  await elements[0].select(text);
}

I am getting the error below:

TypeError: elements[0].select is not a function

Any idea what I am doing wrong?

@xprudhomme
Contributor

@subrato29

The $x method returns a Promise which, when resolved, yields an Array of ElementHandles.

There does not seem to be any "select" method on ElementHandle objects...

@subrato29

Okay, then how do I select a value from a dropdown? Could you please share the code if you have it? It would be really helpful for me.
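
One possible approach, offered as a sketch rather than a confirmed answer from the maintainers: locate the select element with page.$x, then set its value inside the page via page.evaluate, which accepts element handles as arguments (as in the earlier examples in this thread). The function below is a hypothetical helper, and it assumes the element found by the XPath is a standard `<select>`.

```javascript
// Hypothetical helper: choose the <option> whose visible text equals `text`
// inside the <select> located by `xpath`.
async function selectOptionByText(page, xpath, text) {
  const [select] = await page.$x(xpath);
  if (!select) throw new Error(`No element matches ${xpath}`);
  // The callback runs in the page; the handle arrives there as the element.
  const found = await page.evaluate((element, wanted) => {
    const option = Array.from(element.options)
      .find(candidate => candidate.textContent.trim() === wanted);
    if (!option) return false;
    element.value = option.value;
    // Fire the events that a real user selection would trigger.
    element.dispatchEvent(new Event('input', { bubbles: true }));
    element.dispatchEvent(new Event('change', { bubbles: true }));
    return true;
  }, select, text);
  if (!found) throw new Error(`No option with text "${text}"`);
}
```

Dispatching the input and change events is a design choice here: setting the value alone does not notify listeners that the page may have attached to the dropdown.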

@jevgenijusmarinuskinas

Did anyone encounter an issue with using brackets? For example,

"(//div[@id='content'])"

In my case it fails with page.waitFor, while the same expression without the brackets works perfectly fine...
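
A possible explanation, stated as an assumption based on the Puppeteer documentation of that period: page.waitFor treats a string argument as XPath only when it starts with `//`, and as a CSS selector otherwise, so an expression wrapped in parentheses would be handed to the CSS path. The sketch below mirrors that rule and sidesteps it by calling page.waitForXPath directly; both function names are hypothetical.

```javascript
// Mirrors the assumed page.waitFor dispatch rule: only strings that start
// with "//" are treated as XPath expressions.
function treatedAsXPath(selectorOrXPath) {
  return selectorOrXPath.startsWith('//');
}

// Hypothetical wrapper: always use the XPath-specific method, so that
// expressions such as "(//div[@id='content'])" are not taken for CSS.
async function waitForAnyXPath(page, expression, options = {}) {
  return page.waitForXPath(expression, options);
}
```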
