Puppeteer is stripping £ from element content. #1704

Closed
jackfranklin opened this issue Jan 2, 2018 · 5 comments
Comments

@jackfranklin
Collaborator

Steps to reproduce

Tell us about your environment:

  • Puppeteer version: 1.0.0-rc (but also seen this in 0.13)
  • Platform / OS version: Mac OS X Sierra (and also Docker w/ Debian Jessie)
  • URLs (if applicable):
  • Node.js version: 8

What steps will reproduce the problem?

Please include code that reproduces the issue.

We have some HTML that looks like so:

<span itemprop="price" class="js-item-price">£20</span>

And we are accessing the contents of that element in Puppeteer with:

const element = await page.$('.js-item-price')
const text = await (await element.getProperty('textContent')).jsonValue()

And sometimes (probably about 50% of the time), we get back the string "20". Other times we get back "£20" as expected.

I've tried a variety of ways to get at this value via the Puppeteer API, including also doing page.$eval('.js-item-price', x => x.textContent) but they have all had this problem.

I'm not really sure if it's something I'm doing wrong in the test or a bug in Puppeteer but I wanted to report it in case anyone has any thoughts.

@JoelEinbinder
Collaborator

I wasn't able to duplicate that with this script:

const p = require('puppeteer');
p.launch().then(async browser => {
  for (let i = 0; i < 100; i++) {
    const page = await browser.newPage();
    await page.setContent(`<span itemprop="price" class="js-item-price">£20</span>`);
    const element = await page.$('.js-item-price');
    const text = await (await element.getProperty('textContent')).jsonValue();
    console.log(text);
    await page.close();
  }
  await browser.close();
});

@yujiosaka

Me neither.

@jackfranklin
Can you provide an actual URL? The only possibility I can think of is that the £ is added dynamically by JavaScript.
If Puppeteer reads the element before that JavaScript has run, the £ may appear to have been stripped.
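
If the £ really is added by a script after load, one way to rule out a race is to wait for the text to reach its final form before reading it. The following is only a rough sketch under that assumption: `textStartsWith` and `readTextOnceItStartsWith` are hypothetical helper names, the 5000 ms timeout is arbitrary, and `page` is expected to be a Puppeteer Page.

```javascript
// Runs inside the page: true once the element exists and its
// trimmed text starts with `prefix`.
function textStartsWith(selector, prefix) {
  const el = document.querySelector(selector);
  return el !== null && el.textContent.trim().startsWith(prefix);
}

// Hypothetical helper: wait for the prefix to appear, then return the text.
// Rejects with a timeout error if the prefix never shows up.
async function readTextOnceItStartsWith(page, selector, prefix, timeout = 5000) {
  await page.waitForFunction(textStartsWith, { timeout }, selector, prefix);
  return page.$eval(selector, el => el.textContent.trim());
}
```

Something like `await readTextOnceItStartsWith(page, '.js-item-price', '£')` would then either return the full text or time out, which at least distinguishes "read too early" from "symbol never rendered".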

@rustkt

rustkt commented Jan 3, 2018

AFAIK, frontend developers often put the price unit in a ::before pseudo-element:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta http-equiv="X-UA-Compatible" content="ie=edge">
  <title>Document</title>
  <style>
    .js-item-price::before {
      content: "£"
    }
  </style>
</head>
<body>
  <div class="js-item-price">20</div>
</body>
</html>

So the symbol is separate from the HTML content; in this case, we cannot capture the price unit through the div element's innerHTML property.
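
For pages that do use this pattern, the generated text can still be read through the computed style of the pseudo-element. This is a minimal sketch, not something from the Puppeteer API itself: `unquoteCssContent` and `readTextWithBefore` are hypothetical helper names, and it assumes the `content` value is a plain quoted string, as in the stylesheet above.

```javascript
// getComputedStyle reports a `content` string with its quotes, e.g. "£".
// Return the inner text, or '' for keyword values such as none or normal.
function unquoteCssContent(value) {
  const match = /^(["'])(.*)\1$/.exec(value);
  return match ? match[2] : '';
}

// Hypothetical helper: combine the ::before text with the element's own text.
// `page` is expected to be a Puppeteer Page; `selector` must match an element.
async function readTextWithBefore(page, selector) {
  const { before, text } = await page.$eval(selector, el => ({
    before: getComputedStyle(el, '::before').content,
    text: el.textContent,
  }));
  return unquoteCssContent(before) + text;
}
```

With the example document above, `await readTextWithBefore(page, '.js-item-price')` should yield the symbol followed by the number, whereas `textContent` alone gives only the number.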

@aslushnikov
Contributor

Thanks everybody for the investigation; if @indexmotion's suggestion is right, then this works as intended.
Closing this since no one can reproduce it otherwise.

@jackfranklin
Collaborator Author

Thanks everyone for taking the time to look into this; sorry I didn't reply sooner.

We are updating the span entirely with JS, but that wouldn't explain why it sometimes reads '20' rather than '£20'; I would expect it to read either all or nothing. We're not using the CSS trick (although that's a neat idea!).

I'll keep doing some digging and report back if I find anything.
