Handle XPath selectors #537

leeroybrun · 2017-08-25T08:18:32Z

Is it planned/wanted to implement XPath selectors?

This can be done quite easily, but I don't know if it's a choice to not handle them?

I've implemented two methods in my own code to handle them:

async waitForXpath(selector, options = { polling: 'mutation' }) {
  return this.waitForFunction(selector => {
    return null !== document.evaluate(selector, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
  }, options, selector);
}

async $XPath(selector) {
  const remoteObject = await this._rawEvaluate(selector => {
    return document.evaluate(selector, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
  }, selector);

  if (remoteObject.subtype === 'node')
    return new ElementHandle(this._client, remoteObject, this._mouse);

  await utils.releaseObject(this._client, remoteObject);
  return null;
}

I can make a PR if you're interested.

aslushnikov · 2017-08-25T17:47:04Z

There have been no plans to add XPath since it hasn't been requested much. However, #382 would make it possible to implement polyfills on top of the Puppeteer API in just a few lines of code. Would that be good enough for you?

leeroybrun · 2017-08-28T08:51:50Z

XPath can be really useful and more powerful in some situations (selecting an element by its text content is just one of the many use cases).

But the #382 could work, XPath will just not be "out-of-the-box" and users will still have to implement it themselves, I guess?

xprudhomme · 2017-09-06T09:21:30Z

+1, I would also like XPath selectors to be handled.

XPath is so much more powerful than CSS selectors when it comes to querying the DOM. As has been said previously, it's a pain not being able to select an element based on its text content (which is needed a lot), or to select nodes based on ancestor predicates (for instance //*[@class='whateverclass'][not(ancestor::div[@id='abcdef'])] ).

Home-made methods have to be implemented to handle XPath. They do the job, but I'm pretty sure we are not the only ones with this need.

kensoh · 2017-09-06T10:05:32Z

XPath is really expressive: for example, and, or, and contains can be combined into very specific selection criteria that are hard or impossible to write with CSS selectors. However, it is not straightforward to implement, so it may be better to consider it after the initial stable release, to avoid the additional work, fixes, and updates before then.

I'll highlight 2 examples of why @leeroybrun's implementation, once merged, will not be the end of the story: more fixes and iterations will be needed as users try it out and report failing XPath use cases.

1. The document context (the second argument to document.evaluate) will not work, I believe, for elements inside a frame. It would have to be changed to something like frame_element.contentDocument to reach elements within the frame:
   document.evaluate(selector, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue
2. To return multiple elements and iterate through them, something like XPathResult.ORDERED_NODE_SNAPSHOT_TYPE may have to be used in place of XPathResult.FIRST_ORDERED_NODE_TYPE.

I personally prefer XPath over CSS selectors any time, but it offers many more use cases than can be easily implemented. Leeroy's implementation could be a great first iteration, but I believe more work will be needed before XPath works fully across Puppeteer's API.

leeroybrun · 2017-09-07T12:57:49Z

My implementation was really simple, and only tested on one of my personal projects.
The goal was mainly to open the discussion and see if it was something needed by others too.

Regarding point 2, the $XPath function was the equivalent of the $ one in the API (selecting only one element).
I've made a separate $$XPath function (equivalent of the $$ one) using the ORDERED_NODE_SNAPSHOT_TYPE and returning multiple elements.

It's basically a copy of the $$ function.
Not a very beautiful way to do it, but it works (again, in my use case).
If it were implemented natively, we could abstract some parts and avoid the code duplication between $$/$$XPath and $/$XPath.

Anyway, here it is, in case it is useful to someone else:

  async $$XPath(selector) {
    const remoteObject = await this._rawEvaluate(selector => {
      let results = [];
      let query = document.evaluate(selector, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
      for (let i=0, length = query.snapshotLength; i < length; ++i) {
        results.push(query.snapshotItem(i));
      }
      return results;
    }, selector);
    const response = await this._client.send('Runtime.getProperties', {
      objectId: remoteObject.objectId,
      ownProperties: true
    });
    const properties = response.result;
    const result = [];
    const releasePromises = [utils.releaseObject(this._client, remoteObject)];
    for (const property of properties) {
      if (property.enumerable && property.value.subtype === 'node')
        result.push(new ElementHandle(this._client, property.value, this._mouse));
      else
        releasePromises.push(utils.releaseObject(this._client, property.value));
    }
    await Promise.all(releasePromises);
    return result;
  }

In my opinion, having XPath implemented in the Puppeteer API would be a big advantage.
But I understand your position of wanting a stable release before adding more features.

kaushiksundar · 2017-09-10T06:08:10Z

+1 for XPath

mdeora · 2017-09-11T14:27:36Z

+1 for XPath

matthewlilley · 2017-09-26T10:54:56Z

+1

sradu · 2017-10-07T00:50:52Z

@aslushnikov Now that JSHandles are merged, any advice on the best way to implement xpath?

I'm especially wondering about the best way to use it with waitForSelector, where visible is incredibly helpful.

aslushnikov · 2017-10-07T08:10:53Z

@sradu the following script works fine for me:

async function xpath(page, path) {
  const resultsHandle = await page.evaluateHandle(path => {
    let results = [];
    let query = document.evaluate(path, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
    for (let i=0, length = query.snapshotLength; i < length; ++i) {
      results.push(query.snapshotItem(i));
    }
    return results;
  }, path);
  const properties = await resultsHandle.getProperties();
  const result = [];
  const releasePromises = [];
  for (const property of properties.values()) {
    const element = property.asElement();
    if (element)
      result.push(element);
    else
      releasePromises.push(property.dispose());
  }
  await Promise.all(releasePromises);
  return result;
}

const pptr = require('puppeteer');
(async () => {
  const browser = await pptr.launch();
  const page = await browser.newPage();
  await page.setContent('<div>hello!</div><div>other!</div>');
  const [handle1, handle2] = await xpath(page, '//div');
  console.log(await page.evaluate(e => e.textContent, handle1));
  console.log(await page.evaluate(e => e.textContent, handle2));
  await page.close();
  await browser.close();
})();

bootstraponline · 2017-10-08T03:46:50Z

XPath support would be awesome.

jpap · 2017-10-10T22:14:58Z

+1 to this. I'd also love to see a function that waits for an XPath to "disappear", i.e. XPath not valid any longer, e.g. a popup is closed.

huan · 2017-10-18T19:16:48Z

+1

geekkun · 2017-10-27T11:10:01Z

+1

aslushnikov · 2017-10-31T06:54:01Z

If someone comes up with a puppeteer-xpath package, we'd be happy to point to it in our docs.

xprudhomme · 2017-10-31T09:00:26Z

@jpap I've developed a whole bunch of XPath methods, such as this one which might help you (similar to the Frame.waitForSelector method):

      /**
       * Wait for the XPath selector to appear in the page. If the selector already exists when the method is called,
       * the method returns immediately. If the selector doesn't appear within `timeout` milliseconds, the function throws.
       * @param {string} selector An XPath selector of an element to wait for
       * @param {Object} options Optional waiting parameters (same as in the waitForSelector method)
       * @return {Promise} Promise which resolves when the element specified by the XPath string is added to the DOM.
       */
      async waitForXpath(selector, options = {}) { // , options = { polling: 'mutation' }

        const timeout = options.timeout || 30000;
        const waitForVisible = !!options.visible;
        const waitForHidden = !!options.hidden;
        const polling = waitForVisible || waitForHidden ? 'raf' : 'mutation';

        console.log(" [waitForXpath] About to wait for element: " + selector);

        return this.page.waitForFunction(predicate, {timeout, polling}, selector, waitForVisible, waitForHidden);

        /**
         * @param {string} selector XPath selector
         * @param {boolean} waitForVisible
         * @param {boolean} waitForHidden
         * @return {boolean}
         */
        function predicate(selector, waitForVisible, waitForHidden) {

          const node = document.evaluate(selector, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;

          if (!node)
            return waitForHidden;

          if (!waitForVisible && !waitForHidden)
            return true;

          const style = window.getComputedStyle(node);
          const isVisible = style && style.display !== 'none' && style.visibility !== 'hidden';

          return (waitForVisible === isVisible || waitForHidden === !isVisible);
        }
      }
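
For the "disappear" case asked about earlier in the thread, the method above would presumably be called with { hidden: true }. The decision made by its in-page predicate can be sketched as a pure function. The name shouldResolve and its boolean parameters are hypothetical, introduced only so the logic can be checked outside a browser:

```javascript
// Mirrors the branch structure of the predicate above, with the DOM lookups
// replaced by two booleans (hypothetical helper, not part of Puppeteer).
function shouldResolve(nodeFound, isVisible, waitForVisible, waitForHidden) {
  // No matching node: only a "hidden" wait is satisfied.
  if (!nodeFound)
    return waitForHidden;
  // Plain existence wait: any match is enough.
  if (!waitForVisible && !waitForHidden)
    return true;
  return waitForVisible === isVisible || waitForHidden === !isVisible;
}

console.log(shouldResolve(false, false, false, true)); // true: popup is gone
console.log(shouldResolve(true, true, false, true));   // false: popup still visible
```

With hidden set, the wait ends either when the node leaves the DOM or when it stays in the DOM but is styled as not visible.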

Mukeshcse31 · 2017-11-02T00:46:31Z

+1

divyamvn · 2017-11-07T19:10:52Z

+1

TildeWill · 2017-11-14T17:59:06Z

+1 we also implemented our own XPath selector for text as the first thing after installing Puppeteer

stackflows · 2017-11-29T12:20:46Z

+1 We, too, are in dire need of XPath in puppeteer, as we are dealing with a so-called "web application" whose elements are - for the vast majority - not addressable by CSS-selectors.
Only XPath works.

DavertMik · 2017-12-18T01:16:31Z

At CodeceptJS we implemented XPath locator support for Puppeteer.
CodeceptJS is a high-level testing framework and uses Puppeteer as one of the supported backends. So you can try to use it as the sane abstraction over the current API.

Anyway, native XPath support from Puppeteer would be appreciated )

paulirish · 2017-12-19T00:53:46Z

#1620 is proposed and introduces:

page.xpath()
frame.xpath()
elementHandle.xpath()

⬇️ Mini-poll for the puppeteer community: ⬇️

Please 👍 this comment if you would use elementHandle.xpath(). Or you can 👎 this comment if you'd use the other two xpath methods but not this one. Thanks.

This patch adds xpath support with the following methods: - page.xpath - frame.xpath - elementHandle.xpath Fixes #537

ctsstc · 2018-04-02T23:05:32Z

Has this not been added to the documentation yet? I see there's waitForXPath and $x

aslushnikov · 2018-04-03T03:44:29Z

> Has this not been added to the documentation yet? I see there's waitForXPath and $x

@ctsstc page.xpath was later renamed to page.$x.

5t4rdu5t · 2018-05-16T08:45:43Z

+1

heyAyushh · 2018-07-05T16:09:22Z

+1

tindecken · 2018-09-04T14:40:53Z

+1 for XPath support

aslushnikov · 2018-09-04T15:50:37Z

As of Puppeteer v1.7.0, xpath is well-supported.
Query using xpath:

Wait for nodes using xpath:
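
The snippets for these two cases are not shown here. A minimal sketch of both, using only the page.$x and page.waitForXPath methods named in this thread, might look like the following. The helper name textsByXPath and its timeout default are hypothetical, and `page` is assumed to be a Puppeteer Page from a version that exposes both methods:

```javascript
// `page` is assumed to be a Puppeteer Page (v1.7.0 or a later version that
// still exposes page.$x and page.waitForXPath). Hypothetical helper.
async function textsByXPath(page, expression, timeout = 5000) {
  // Wait: resolves once at least one node matches the expression.
  await page.waitForXPath(expression, { timeout });
  // Query: resolves to an array of ElementHandles.
  const handles = await page.$x(expression);
  const texts = [];
  for (const handle of handles) {
    // Read the text inside the page, then release the handle.
    texts.push(await page.evaluate(el => el.textContent, handle));
    await handle.dispose();
  }
  return texts;
}
```

For the page used earlier in the thread (two div elements), calling textsByXPath(page, '//div') would be expected to resolve to the text of each div in document order.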

sputnick-dev · 2018-09-05T22:36:58Z

It's not as useful as casperjs's __utils__ functions:

https://github.com/casperjs/casperjs/blob/master/modules/clientutils.js#L590

__utils__.getElementsByXPath('//xpath/expression', 'optional node')

My 2 cents :)

xprudhomme · 2018-09-06T08:50:51Z

Why is it not as useful as casperjs's functions?

page.$x(expression) does exactly the same thing. Puppeteer does even more, the waitForXPath feature is really useful :)

sputnick-dev · 2018-09-07T19:46:41Z

A sample use case:

__utils__.getElementsByXPath('//xpath/expression')
.forEach((node) => {
    __utils__.getElementByXPath('//xpath/another_expression_on_specific_node', node)
})

How do you do it on a contextual node with page.$x?

aslushnikov · 2018-09-07T20:40:57Z

@sputnick-dev the page.$x returns an array of element handles. So you can do the following:

const handles = await page.$x('//xpath/expression');
for (const e of handles)
  await e.$x('//xpath/another_expression_on_specific_node');

haikyuu · 2019-05-09T13:45:50Z

@aslushnikov I guess using count isn't supported:

  const count = await currentPage.$x(`count(//*[./text()="${text}"])`);

I get this error:

Error: Evaluation failed: TypeError: Failed to execute 'evaluate' on 'Document':
The result is not a node set, and therefore cannot be converted to the desired type.
          at __puppeteer_evaluation_script__:3:37
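
The error message suggests that page.$x asks document.evaluate for a node set, while count(...) produces a number. One way around it, sketched here under that assumption, is to evaluate the expression in the page with XPathResult.NUMBER_TYPE. The helper names countByXPath and xpathLiteral are hypothetical; the second one quotes the interpolated text so that quotes inside it do not break the expression:

```javascript
// Hypothetical helper: quote arbitrary text as an XPath 1.0 string literal.
// XPath 1.0 has no escape character, so text containing both quote kinds
// has to be assembled with concat().
function xpathLiteral(text) {
  if (!text.includes('"')) return `"${text}"`;
  if (!text.includes("'")) return `'${text}'`;
  const parts = text.split('"').map(part => `"${part}"`);
  return 'concat(' + parts.join(`, '"', `) + ')';
}

// Hypothetical helper: count matches by requesting a number result
// instead of a node set. `page` is assumed to be a Puppeteer Page.
async function countByXPath(page, expression) {
  return page.evaluate(expr => {
    const result = document.evaluate(
        `count(${expr})`, document, null, XPathResult.NUMBER_TYPE, null);
    return result.numberValue;
  }, expression);
}

// Usage (inside an async function):
//   const n = await countByXPath(page, `//*[./text()=${xpathLiteral(text)}]`);
```

If only the number of matches is needed and the node set is small, (await page.$x(expression)).length is a simpler alternative.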

subrato29 · 2019-06-18T19:54:36Z

I tried to use the code below for waitForXpath(), but it didn't work. Can someone tell me what I am doing wrong, and please provide code for waitForXpath() if there is any?
async function waitForXpath(page,xpath) {
const elements = await page.$x(xpath);
await elements[0].waitForSelector()
}

subrato29 · 2019-06-18T20:20:25Z

I tried with
await page.waitForXpath("//input[@value='Finish']", { timeout: 8000 }), but I get the error below:
TypeError: page.waitForXpath is not a function

xprudhomme · 2019-06-18T22:04:35Z

The method name is case-sensitive: it is waitForXPath, not waitForXpath.


subrato29 · 2019-06-19T02:21:43Z

@xprudhomme : Yes it worked. Thanks much

subrato29 · 2019-06-19T16:26:56Z

I was trying to implement a select function in puppeteer using xpath.
async function selectOptionByText(xpath, text) {
await waitForXPath(xpath);
const elements = await page.$x(xpath);
await elements[0].select(text);
}
I get the error below:
TypeError: elements[0].select is not a function
Any idea where I am making a mistake?

xprudhomme · 2019-06-19T17:20:29Z

@subrato29

The $x method returns a Promise which resolves to an Array of ElementHandles.

There does not seem to be any "select" method on ElementHandle objects...

subrato29 · 2019-06-19T17:41:09Z

Okay, then how do I select a value from a dropdown? Can you please share the code if you have it? It would be really helpful for me.
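
One possible approach, assuming (as noted above) that the ElementHandle in use has no select method, is to pass the handle into page.evaluate and pick the option inside the page. This is a sketch, not a tested recipe. The signature differs from the attempt above in that `page` is passed explicitly:

```javascript
// Hypothetical helper: select the <option> whose visible text equals `text`
// in the <select> element matched by `xpath`. `page` is assumed to be a
// Puppeteer Page exposing page.$x and page.evaluate.
async function selectOptionByText(page, xpath, text) {
  const [select] = await page.$x(xpath);
  if (!select)
    throw new Error(`No element matches XPath: ${xpath}`);
  // The callback below runs inside the page, where `el` is the real element.
  return page.evaluate((el, wanted) => {
    const option = Array.from(el.options).find(o => o.text === wanted);
    if (!option)
      return false;
    option.selected = true;
    // Notify listeners, as a user-driven change would.
    el.dispatchEvent(new Event('input', { bubbles: true }));
    el.dispatchEvent(new Event('change', { bubbles: true }));
    return true;
  }, select, text);
}
```

The function resolves to false when no option carries the requested text, so the caller can decide whether that is an error.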

jevgenijusmarinuskinas · 2020-10-28T09:28:49Z

Did anyone encounter an issue with using brackets? For example,

"(//div[@id='content'])"

In my case it fails with page.waitFor, while the same selector without the brackets works perfectly fine...
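
A possible explanation, offered as an assumption rather than a confirmed diagnosis: page.waitFor may decide between XPath and CSS by checking whether the string starts with //, in which case an expression wrapped in brackets would be treated as a CSS selector. The function below only illustrates that assumed rule and is not Puppeteer's code:

```javascript
// Sketch of the dispatch rule assumed above; hypothetical, for illustration.
function waitForDispatch(selectorOrXPath) {
  return selectorOrXPath.startsWith('//') ? 'waitForXPath' : 'waitForSelector';
}

console.log(waitForDispatch("//div[@id='content']"));   // waitForXPath
console.log(waitForDispatch("(//div[@id='content'])")); // waitForSelector
```

If that is what happens, calling page.waitForXPath directly with the bracketed expression avoids the guess.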

aslushnikov mentioned this issue Oct 7, 2017

help me to $XPath() #969

Closed

gajus mentioned this issue Oct 20, 2017

How to return ElementHandle from page.evaluate? #1108

Closed

cboulanger mentioned this issue Nov 11, 2017

Universal IDs for widgets and other objects qooxdoo/qooxdoo#9422

Closed

aslushnikov assigned JoelEinbinder Dec 18, 2017

JoelEinbinder mentioned this issue Dec 19, 2017

feat: add page.xpath #1620

Merged

aslushnikov closed this as completed in #1620 Dec 20, 2017

aslushnikov pushed a commit that referenced this issue Dec 20, 2017

feat: add page.xpath (#1620)

60ba8c3

This patch adds xpath support with the following methods: - page.xpath - frame.xpath - elementHandle.xpath Fixes #537

cybairfly mentioned this issue Nov 5, 2019

Create generic function to perform login to page using Puppeteer apify/crawlee#230

Open
