Puppeteer is stripping `£` from element content. #1704

Closed
jackfranklin opened this Issue Jan 2, 2018 · 5 comments

jackfranklin commented Jan 2, 2018

Steps to reproduce

Tell us about your environment:

  • Puppeteer version: 1.0.0-rc (but also seen this in 0.13)
  • Platform / OS version: Mac OS X Sierra (and also Docker w/ Debian Jessie)
  • URLs (if applicable):
  • Node.js version: 8

What steps will reproduce the problem?

Please include code that reproduces the issue.

We have some HTML that looks like so:

<span itemprop="price" class="js-item-price">£20</span>

And we are accessing the contents of that element in Puppeteer with:

const element = await page.$('.js-item-price')
const text = await (await element.getProperty('textContent')).jsonValue()

And sometimes (probably about 50% of the time), we get back the string "20". Other times we get back "£20" as expected.

I've tried a variety of ways to get at this value via the Puppeteer API, including page.$eval('.js-item-price', x => x.textContent), but they have all had this problem.

I'm not really sure if it's something I'm doing wrong in the test or a bug in Puppeteer but I wanted to report it in case anyone has any thoughts.

Collaborator

JoelEinbinder commented Jan 2, 2018

I wasn't able to duplicate that with this script:

const p = require('puppeteer');
p.launch().then(async browser => {
  for (let i = 0; i < 100; i++) {
    const page = await browser.newPage();
    await page.setContent(`<span itemprop="price" class="js-item-price">£20</span>`);
    const element = await page.$('.js-item-price');
    const text = await (await element.getProperty('textContent')).jsonValue();
    console.log(text);
    await page.close();
  }
  await browser.close();
});

Contributor

yujiosaka commented Jan 2, 2018

Me neither.

@jackfranklin
Can you provide an actual URL? The only possibility I can think of is that the £ is added dynamically by JavaScript.
If Puppeteer reads the element before that JavaScript has run, the £ would appear to be stripped.
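
One way to test this timing theory is to wait until the element's text actually starts with the symbol before reading it. The following is only a sketch: it assumes `page` is a Puppeteer Page, and the helper names `priceHasSymbol` and `readPriceWhenReady` and the 5000 ms timeout are made up for illustration.

```javascript
// Runs inside the page (Puppeteer serializes it), so it may only use its
// arguments and browser globals such as `document`.
function priceHasSymbol(selector, symbol) {
  const el = document.querySelector(selector);
  return el !== null && el.textContent.trim().startsWith(symbol);
}

// `page` is assumed to be a Puppeteer Page; the timeout value is arbitrary.
async function readPriceWhenReady(page, selector = '.js-item-price', symbol = '£') {
  // Poll in the page until the predicate returns true, or time out.
  await page.waitForFunction(priceHasSymbol, { timeout: 5000 }, selector, symbol);
  return page.$eval(selector, el => el.textContent);
}
```

If the wait times out on the runs that previously returned "20", the symbol is probably not being written into the element's text at all, which would point away from a simple race.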

indexmotion commented Jan 3, 2018

AFAIK, frontend developers often put the price unit in a ::before pseudo-element:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta http-equiv="X-UA-Compatible" content="ie=edge">
  <title>Document</title>
  <style>
    .js-item-price::before {
      content: "£"
    }
  </style>
</head>
<body>
  <div class="js-item-price">20</div>
</body>
</html>

The unit is then separate from the HTML content, so in this case we cannot capture it from the div element's innerHTML (or textContent) property.
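
If a page does use this pattern, the symbol can still be recovered from the computed style of the pseudo-element. This is a rough sketch, assuming `page` is a Puppeteer Page. The helpers `pseudoContentToText` and `readRenderedPrice` are hypothetical names, and only plain quoted strings in `content` are handled (no `attr()`, counters, or escapes).

```javascript
// Runs in Node: turns the computed `content` value (e.g. '"£"' or 'none')
// into plain text. Anything that is not a simple quoted string yields ''.
function pseudoContentToText(raw) {
  if (!raw || raw === 'none' || raw === 'normal') return '';
  const match = /^(["'])(.*)\1$/.exec(raw);
  return match ? match[2] : '';
}

// `page` is assumed to be a Puppeteer Page.
async function readRenderedPrice(page, selector = '.js-item-price') {
  // The callback runs in the browser, so it returns raw values only;
  // the unquoting happens back in Node.
  const { before, text } = await page.$eval(selector, el => ({
    before: window.getComputedStyle(el, '::before').content,
    text: el.textContent,
  }));
  return pseudoContentToText(before) + text;
}
```

Keeping the unquoting on the Node side avoids referencing a Node-scope helper from inside the serialized browser callback.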

Contributor

aslushnikov commented Jan 4, 2018

Thanks everybody for the investigation; if @indexmotion's suggestion is right, then this works as intended.
Closing this since no one can reproduce otherwise.

@aslushnikov aslushnikov closed this Jan 4, 2018

jackfranklin commented Jan 5, 2018

Thanks everyone for taking the time to look into this; sorry I didn't reply sooner.

We are updating the span entirely with JS, but that wouldn't explain why it sometimes reads '20' rather than '£20'; I would expect it to read either all or nothing. We're not using the CSS trick (although that's a neat idea!).

I'll keep doing some digging and report back if I find anything.
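
One possible way to dig further: if the script that updates the span writes its text in more than one step (an assumption, since that script is not shown in this thread), a read could land on an intermediate state. The sketch below samples the text repeatedly and keeps only the changes. `page` is assumed to be a Puppeteer Page; `distinctRuns`, `sampleText`, and the sample count and delay are illustrative choices.

```javascript
// Collapses consecutive duplicates so only the state changes remain.
function distinctRuns(samples) {
  return samples.filter((value, i) => i === 0 || value !== samples[i - 1]);
}

// `page` is assumed to be a Puppeteer Page; count and delay are arbitrary.
// Note that page.$eval rejects if the selector matches nothing.
async function sampleText(page, selector = '.js-item-price', count = 20, delayMs = 50) {
  const samples = [];
  for (let i = 0; i < count; i++) {
    samples.push(await page.$eval(selector, el => el.textContent));
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
  return distinctRuns(samples);
}
```

A result such as ['20', '£20'] would suggest the symbol arrives in a separate step, while a single-element result would suggest the text is stable once the element exists.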
