Page.setContent should wait for resources to be loaded #728

Closed
aslushnikov opened this issue Sep 9, 2017 · 37 comments

@aslushnikov
Contributor

commented Sep 9, 2017

(as mentioned in #486 and other places)

We need a way to wait for the page to load all of its resources after page.setContent.

The lifecycle events might help.

@aslushnikov

Contributor Author

commented Oct 4, 2017

Meanwhile, a good workaround for page.setContent that waits for all the resources to load:

await page.goto(`data:text/html,${html}`, { waitUntil: 'networkidle0' });
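
One caveat with interpolating raw HTML into a data: URL is that characters such as `#` and `%` carry URL meaning (`#` starts a fragment), so markup containing them may be cut off or misread. A minimal sketch of an encoded variant follows; `toDataUrl` is a hypothetical helper name, not part of Puppeteer.

```javascript
// Hypothetical helper (not part of Puppeteer): builds a data: URL from HTML.
// encodeURIComponent escapes characters such as '#', '%' and '?' that would
// otherwise be interpreted as URL syntax.
function toDataUrl(html) {
  return `data:text/html;charset=utf-8,${encodeURIComponent(html)}`;
}

// Usage with the workaround above (`page` is a Puppeteer Page):
//   await page.goto(toDataUrl(html), { waitUntil: 'networkidle0' });
```
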
@LeonineKing1199

commented Oct 10, 2017

It's amazing that you posted the workaround. I'm running exactly into this issue, attempting to use puppeteer as a PDF generating service from HTML.

Thank you for filing a formal issue.

kimmobrunfeldt added a commit to alvarcarto/url-to-pdf-api that referenced this issue Oct 13, 2017

aslushnikov added a commit to aslushnikov/puppeteer that referenced this issue Oct 23, 2017

refactor: migrate NavigatorWatcher to lifecycle events
This patch:
- migrates navigation watcher to use protocol-issued lifecycle events.
- removes `networkIdleTimeout` and `networkIdleInflight` options for
  `page.goto` method
- adds a new `networkidle0` value to the waitUntil option of navigation
  methods

References GoogleChrome#728.

BREAKING CHANGE:

As an implication of this new approach, the `networkIdleTimeout` and
`networkIdleInflight` options are no longer supported. Interested
clients should implement the behavior themselves using the `request` and
`response` events.

aslushnikov added a commit that referenced this issue Oct 24, 2017

refactor: migrate NavigatorWatcher to lifecycle events (#1141)
This patch:
- migrates navigation watcher to use protocol-issued lifecycle events.
- removes `networkIdleTimeout` and `networkIdleInflight` options for
  `page.goto` method
- adds a new `networkidle0` value to the waitUntil option of navigation
  methods

References #728.

BREAKING CHANGE:

As an implication of this new approach, the `networkIdleTimeout` and
`networkIdleInflight` options are no longer supported. Interested
clients should implement the behavior themselves using the `request` and
`response` events.
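
The breaking-change note leaves the old idle-detection behavior to client code. The following is only a sketch of how that could look, not Puppeteer's implementation. It assumes the page emits `request`, `requestfinished` and `requestfailed` events, and the option names `idleTime` and `maxInflight` are illustrative stand-ins for the removed `networkIdleTimeout` and `networkIdleInflight`.

```javascript
// A sketch, not Puppeteer's implementation. `page` is anything that emits
// 'request', 'requestfinished' and 'requestfailed' and supports on/off.
// Resolves once no more than `maxInflight` requests have been pending
// for `idleTime` milliseconds.
function waitForNetworkIdle(page, { idleTime = 500, maxInflight = 0 } = {}) {
  return new Promise(resolve => {
    let inflight = 0;
    let timer = null;

    const cleanup = () => {
      page.off('request', onStart);
      page.off('requestfinished', onEnd);
      page.off('requestfailed', onEnd);
    };
    // (Re)start the idle timer only while the network is quiet enough.
    const check = () => {
      clearTimeout(timer);
      if (inflight <= maxInflight) {
        timer = setTimeout(() => { cleanup(); resolve(); }, idleTime);
      }
    };
    const onStart = () => { inflight += 1; check(); };
    const onEnd = () => { inflight = Math.max(0, inflight - 1); check(); };

    page.on('request', onStart);
    page.on('requestfinished', onEnd);
    page.on('requestfailed', onEnd);
    check(); // covers the case where nothing is in flight at all
  });
}
```

A caller would typically start the wait before triggering the load, for example `const idle = waitForNetworkIdle(page); await page.setContent(html); await idle;`.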

aslushnikov added a commit to aslushnikov/puppeteer that referenced this issue Oct 24, 2017

feat(Page): teach Page.setContent to wait for resources to load
This patch adds "options" parameter to the `page.setContent` method. The
parameter is the same as a navigation parameter and allows to specify
maximum timeout to wait for resources to be loaded, as well as to
describe events that should be emitted before the setContent operation
would be considered successful.

Fixes GoogleChrome#728.

aslushnikov added a commit that referenced this issue Oct 24, 2017

feat(Page): teach Page.setContent to wait for resources to load (#1152)
This patch adds "options" parameter to the `page.setContent` method. The
parameter is the same as a navigation parameter and allows to specify
maximum timeout to wait for resources to be loaded, as well as to
describe events that should be emitted before the setContent operation
would be considered successful.

Fixes #728.

aslushnikov reopened this Nov 8, 2017

@murilozilli

commented Nov 22, 2017

I'm having trouble with this too on version 0.13-alpha, with the following navigation options:

"waitUntil": "networkidle2",
"timeout": 60000
@HanXHX

commented Dec 7, 2017

I confirm the same trouble.

@Padam87

commented Dec 13, 2017

With the latest release, this hangs too:

await page.goto(`data:text/html,${html}`, { waitUntil: 'networkidle' });

Wait for networkidle0 instead:

await page.goto(`data:text/html,${html}`, { waitUntil: 'networkidle0' });
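
For context, the `waitUntil` values documented for Puppeteer 1.x are `load`, `domcontentloaded`, `networkidle0` and `networkidle2`; the older `networkidle` value is not among them. A small sketch of a guard that fails fast on a legacy or misspelled value is shown here. `checkWaitUntil` is a hypothetical helper, and the list should be checked against the API docs of the version in use.

```javascript
// Values documented for `waitUntil` in Puppeteer 1.x. Treat this list as an
// assumption and verify it against the docs for your version.
const WAIT_UNTIL_VALUES = ['load', 'domcontentloaded', 'networkidle0', 'networkidle2'];

// Hypothetical helper: throws on a misspelled or legacy value such as
// 'networkidle', otherwise returns the value unchanged.
function checkWaitUntil(value) {
  const values = Array.isArray(value) ? value : [value];
  for (const v of values) {
    if (!WAIT_UNTIL_VALUES.includes(v)) {
      throw new Error(`Unsupported waitUntil value: ${v}`);
    }
  }
  return value;
}

// Usage (`page` is a Puppeteer Page):
//   await page.goto(url, { waitUntil: checkWaitUntil('networkidle0') });
```
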
@HanXHX

commented Dec 15, 2017

Hey @aslushnikov

You said in #1312 to wait for https://chromium-review.googlesource.com/c/chromium/src/+/747805
That patch has now been merged.

Is everything in place to solve this issue?

Cheers!

@aslushnikov

Contributor Author

commented Jan 10, 2018

@HanXHX this requires more work upstream: in order to reuse lifecycle events, page.setContent should initiate a navigation, which in turn should be plumbed through browser-side navigation aka "plznavigate".

@aslushnikov

Contributor Author

commented Sep 23, 2018

As I've spotted your other trick that involves request interception today, I'm wondering if this would enable me to pass HTML bigger than 2 MB to Chromium while keeping both networkidle0 and custom events working, all without having to manage the lifecycle of a newly created temporary file?
Are there any limitations related to this trick I should be aware of before using it?

@Mumeii I'm not aware of any limitations; it should just work.
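
The request-interception trick itself is not shown in this thread, so the following is only a guess at its general shape, sketched with Puppeteer's interception API (`page.setRequestInterception`, `request.respond`, `request.continue`). The names `gotoHtml` and `PLACEHOLDER_URL` are illustrative assumptions.

```javascript
// A sketch under stated assumptions, not the exact trick referenced above.
// PLACEHOLDER_URL is arbitrary; the navigation request to it is answered
// from memory and never reaches the network.
const PLACEHOLDER_URL = 'http://placeholder.invalid/';

async function gotoHtml(page, html, options = { waitUntil: 'networkidle0' }) {
  await page.setRequestInterception(true);
  let served = false;
  const handler = request => {
    if (!served && request.url() === PLACEHOLDER_URL) {
      served = true;
      // Answer the navigation request with the in-memory HTML.
      request.respond({ status: 200, contentType: 'text/html; charset=utf-8', body: html });
    } else {
      // Let subresources (CSS, images, scripts) load normally.
      request.continue();
    }
  };
  page.on('request', handler);
  try {
    await page.goto(PLACEHOLDER_URL, options);
  } finally {
    page.off('request', handler);
    await page.setRequestInterception(false);
  }
}
```

Because the HTML travels in a response body rather than in the URL, no temporary file is involved. Relative resource paths would resolve against the placeholder URL, so they would need their own interception rule or absolute URLs.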

@Jackychans

commented Oct 8, 2018

As I've spotted your other trick that involves request interception today, I'm wondering if this would enable me to pass HTML bigger than 2 MB to Chromium while keeping both networkidle0 and custom events working, all without having to manage the lifecycle of a newly created temporary file?

I'm running into the same issue of having to handle a newly created file. I don't think this is best practice: concurrent requests arriving at once can cause race conditions, and the hard disk keeps receiving many write requests.

I'm wondering if there is a workaround, such as streaming the content instead of saving a file and using the "file://" protocol.

@tzieleniewski

commented Oct 10, 2018

Hi Team!

Please advise, as the workaround is not working in our case (puppeteer 1.9.0).
I am trying to convert XHTML content, which I provide as an escaped inline string.
The generated document contains the raw XHTML, and no requests are made for external resources (in this case, CSS).

Example

'use strict'

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch()
    const page = await browser.newPage()
    await page.setRequestInterception(true)

    const xhtml = `<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta charset=utf-8"/> <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/> <link href="css.css" media="all" rel="stylesheet" type="text/css"/> </head> <body> some content </body> </html>`

    console.log(xhtml)

    page.on('request', request => {
        console.log(`Intercepted request with URL: ${request.url()}`)
        request.continue()
    });

    await page.goto(`data:text/html,${xhtml}`, {
        waitUntil: 'networkidle0'
    });
    await page.pdf({
        path: 'xhtml.pdf'
    })
    await browser.close()
})()

Here is the initial document content

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <meta charset=utf-8"/>
      <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
      <link href="css.css" media="all" rel="stylesheet" type="text/css"/>
   </head>
   <body>
      some content 
   </body>
</html>
@ObviouslyGreen

commented Nov 16, 2018

Is there a way to set the url if we end up using the data:text/html workaround? Do relative paths for resources work via this method?

aslushnikov added a commit to aslushnikov/puppeteer that referenced this issue Nov 16, 2018

feat(chromium): roll Chromium to r608752
This roll includes:
- https://crrev.com/608658 - DevTools: emit "init" lifecycle event when document gets opened

References GoogleChrome#728

aslushnikov added a commit that referenced this issue Nov 16, 2018

feat(chromium): roll Chromium to r608752 (#3555)
This roll includes:
- https://crrev.com/608658 - DevTools: emit "init" lifecycle event when document gets opened

References #728

aslushnikov added a commit to aslushnikov/puppeteer that referenced this issue Nov 16, 2018

feat(page): support wait options for `page.setContent`
This patch teaches `page.setContent` to await resources in
the new document.

**NOTE**: This patch changes behavior: currently, `page.setContent`
awaits the `"domcontentloaded"` event; with this patch, we can now await
other lifecycle events, and switched default to the `"load"` event.

The change is justified since current behavior made `page.setContent`
unusable for its main designated usecases, pushing our client
to use [dataURL workaround](GoogleChrome#728 (comment)).

Fixes GoogleChrome#728
@kamekazemaster

commented Nov 20, 2018

I had the exact same problem with external resources. So the workaround from @aslushnikov helped me a lot. But as @ObviouslyGreen points out it lacks the support of resolving relative paths. I investigated what puppeteer takes as "url" when using this workaround and it is the whole html (obviously).

I could solve the problem with relative paths (for me in CSS styles) with the following approach:

  1. create a folder (let's name it dist) in which all relative resources are placed
  2. generate the html as needed (paths should be relative to the root of dist)
  3. write the html file to the root of dist
  4. use the following code to load the html with all the relative resources resolved correctly:
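const path = require('path');

// Absolute path to the generated file inside dist; randomName is any unique file name.
const pathToHtml = path.join(__dirname, 'dist', `${randomName}.html`);

const page = await browser.newPage();
await page.goto(`file:${pathToHtml}`, { waitUntil: 'networkidle0' });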

Note that the html file needs to have the '.html' suffix for puppeteer to render the html properly (at least for me this was the case).
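
The steps above can be sketched as a small helper. This is one possible arrangement rather than the commenter's exact code: `writeHtmlToDist` is a hypothetical name, the random file name is one way to avoid collisions between concurrent jobs, and `pathToFileURL` (Node 10.12+) is swapped in for manual `file:` string building.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');
const { pathToFileURL } = require('url');

// Hypothetical helper: writes `html` into `distDir` under a random name and
// returns both the file path and a file:// URL that page.goto can open.
function writeHtmlToDist(distDir, html) {
  fs.mkdirSync(distDir, { recursive: true });
  const name = `${crypto.randomBytes(8).toString('hex')}.html`;
  const filePath = path.join(distDir, name);
  fs.writeFileSync(filePath, html, 'utf8');
  return { filePath, fileUrl: pathToFileURL(filePath).href };
}

// Usage (`page` is a Puppeteer Page):
//   const { filePath, fileUrl } = writeHtmlToDist(path.join(__dirname, 'dist'), html);
//   try {
//     await page.goto(fileUrl, { waitUntil: 'networkidle0' });
//     await page.pdf({ path: 'out.pdf' });
//   } finally {
//     fs.unlinkSync(filePath); // remove the temporary file
//   }
```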

aslushnikov added a commit to aslushnikov/puppeteer that referenced this issue Nov 20, 2018

aslushnikov added a commit that referenced this issue Nov 20, 2018

feat(page): support waitUntil option for `page.setContent` (#3557)
This patch teaches `page.setContent` to await resources in
the new document.

**NOTE**: This patch changes behavior: currently, `page.setContent`
awaits the `"domcontentloaded"` event; with this patch, we can now await
other lifecycle events, and switched default to the `"load"` event.

The change is justified since current behavior made `page.setContent`
unusable for its main designated usecases, pushing our client
to use [dataURL workaround](#728 (comment)).

Fixes #728
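
With this change, the data URL workaround should no longer be needed for the HTML-to-PDF case discussed in this thread. A minimal sketch, assuming a Puppeteer version where `page.setContent` accepts `waitUntil` and `timeout` (1.11 or later, going by the commits referenced here); `htmlToPdf` and the 30-second timeout are illustrative choices.

```javascript
// A sketch assuming page.setContent(html, options) supports `waitUntil`
// and `timeout`. `htmlToPdf` is an illustrative name, not a Puppeteer API.
async function htmlToPdf(page, html, pdfPath) {
  // Wait until the new document's resources have finished loading.
  await page.setContent(html, { waitUntil: 'networkidle0', timeout: 30000 });
  return page.pdf({ path: pdfPath });
}

// Usage:
//   const browser = await puppeteer.launch();
//   const page = await browser.newPage();
//   await htmlToPdf(page, '<h1>Report</h1>', 'report.pdf');
//   await browser.close();
```
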
@kamekazemaster

commented Nov 21, 2018

@aslushnikov Great to see the same options for page.setContent!

Is it now possible with page.setContent to load resources with relative paths, as I described in my workaround in the comment above?

@aslushnikov

Contributor Author

commented Nov 21, 2018

@kamekazemaster yeah, the paths should be resolved against the page's URL.

await page.goto('https://example.com');
// logo.png becomes https://example.com/logo.png
await page.setContent('<img src="/logo.png"></img>');
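
The resolution rule can be checked outside the browser with Node's WHATWG `URL` class, which follows the same URL Standard. The sketch below also illustrates why relative paths have nothing to resolve against under the data: URL workaround; the base URLs are made-up examples and `canResolve` is a hypothetical helper.

```javascript
// Made-up base URL for illustration.
const base = 'https://example.com/reports/2018/index.html';

console.log(new URL('/logo.png', base).href); // https://example.com/logo.png
console.log(new URL('logo.png', base).href);  // https://example.com/reports/2018/logo.png

// Hypothetical helper: reports whether `relative` can be resolved against `baseUrl`.
function canResolve(relative, baseUrl) {
  try {
    new URL(relative, baseUrl);
    return true;
  } catch (err) {
    return false;
  }
}

// A data: URL has no hierarchy, so a relative reference cannot be resolved.
console.log(canResolve('css.css', 'data:text/html,<p>hi</p>')); // false
```
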
@tzieleniewski

commented Nov 22, 2018

@aslushnikov when can we expect the next release with the updated setContent?

randytarampi added a commit to randytarampi/me that referenced this issue Dec 15, 2018

randytarampi added a commit to randytarampi/resume-cli that referenced this issue Dec 15, 2018

fix(export): Adjust `createPdf` for the `puppeteer^1.11.0` API.
Per GoogleChrome/puppeteer#728 and https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagesetcontenthtml-options.

The latest `puppeteer` seems to have broken PDF generation (with the [`page.goto`](GoogleChrome/puppeteer#728 (comment)) workaround), at least locally for me both here in `resume-cli` and in [`jsonresume-theme-randytarampi`](https://www.npmjs.com/package/jsonresume-theme-randytarampi).
