How to wait until all images completed loading? #338

Closed
petehouston opened this Issue Aug 17, 2017 · 15 comments

@petehouston

petehouston commented Aug 17, 2017

I'm trying to take a full-page screenshot of a website that contains lots of images. I wrote the script below:

const puppeteer = require('puppeteer');

(async() => {
	const browser = await puppeteer.launch();

	const page = await browser.newPage();

	await page.goto('https://blog.petehouston.com', {
		waitUntil: 'networkidle'
	});

	await page.waitForSelector('img');

	await page.screenshot({
		path: 'capture.jpg',
		fullPage: true
	});

	await browser.close();
})();

However, the resulting screenshot, capture.jpg, shows that Puppeteer hasn't finished loading all of the images on the site.

Is there any way to wait until all img elements have finished loading?
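One way to wait for images explicitly, sketched under assumptions: the helper name waitForImages is hypothetical, and it relies only on page.evaluate plus the standard img.complete property and load/error events. It covers images that are already in the DOM when it runs, so images that the page only starts loading on scroll are not waited for.

```javascript
// Hypothetical helper (not part of Puppeteer): resolves once every <img>
// currently in the document has either loaded or failed to load.
async function waitForImages(page) {
  await page.evaluate(async () => {
    const pending = Array.from(document.images)
      // Images that are already complete need no waiting.
      .filter(img => !img.complete)
      .map(img => new Promise(resolve => {
        // Resolve on error too, so one broken image cannot hang the wait.
        img.addEventListener('load', resolve, { once: true });
        img.addEventListener('error', resolve, { once: true });
      }));
    await Promise.all(pending);
  });
}
```

It would be called as await waitForImages(page) between page.goto(...) and page.screenshot(...).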

@Garbee

Contributor

Garbee commented Aug 17, 2017

The images are downloaded, but the page only shows them lazily, as each one is scrolled into the viewport. So you'd need to scroll the page (see #305 for how) to get the page itself to show the images.

You can turn headless mode off (the headless property in the launch() options) to see what is happening, and remove the browser.close() call to keep the browser open for manual inspection.
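A minimal sketch of that debugging setup, assuming the puppeteer module is passed in by the caller; the helper name openForInspection is hypothetical.

```javascript
// Hypothetical helper; `puppeteer` is the module returned by require('puppeteer').
async function openForInspection(puppeteer, url) {
  // headless: false opens a visible browser window instead of running hidden.
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto(url);
  // Deliberately no browser.close(): the window stays open for manual inspection.
  return { browser, page };
}
```

For example, openForInspection(require('puppeteer'), 'https://blog.petehouston.com') would leave the page open so the lazy loading can be observed while scrolling by hand.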

@aslushnikov aslushnikov added the bug label Aug 18, 2017

@pavelfeldman pavelfeldman added feature P1 and removed bug labels Aug 18, 2017

@ks07

ks07 commented Aug 25, 2017

As the title stands I agree that this is a feature request. However, surely this is a bug in the full page screenshot functionality if it captures elements before they have been rendered by the browser?

@Garbee

Contributor

Garbee commented Aug 25, 2017

However, surely this is a bug in the full page screenshot functionality if it captures elements before they have been rendered by the browser?

There is no "bug". The browser is doing exactly what the page tells it to do, and the screenshot is taken without accounting for lazy loading. The page waits for the images to be scrolled into view before rendering them, so if you don't scroll before taking the screenshot, they don't render.

IMO this is actually working exactly as intended since it is taking a screenshot of the page in its current state as requested.

@linpekka

linpekka commented Aug 30, 2017

I also ran into this; adding a scroll evaluation as suggested in #305 did not help at all. I'm open to other ideas, but an event that fires when lazy rendering is done would be nice, or even better, an option on screenshot to wait for lazy rendering.

page.evaluate(_ => {
  window.scrollBy(0, window.innerHeight);
});
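A possible reason a single scrollBy is not enough: it moves the page by only one viewport height, so content further down may never scroll into view. Below is a minimal sketch of scrolling all the way down, one viewport at a time. The helper name scrollThroughPage and its pauseMs and maxSteps parameters are hypothetical, and the pause length is a guess that would need tuning per site.

```javascript
// Hypothetical helper: scrolls down one viewport per step until the bottom
// of the page is reached, and returns the number of steps taken.
async function scrollThroughPage(page, pauseMs = 250, maxSteps = 200) {
  for (let step = 1; step <= maxSteps; step++) {
    const atBottom = await page.evaluate(() => {
      window.scrollBy(0, window.innerHeight);
      const viewportBottom = window.scrollY + window.innerHeight;
      // scrollHeight is re-read on every step because lazy content can make
      // the page grow; the 1px tolerance absorbs fractional scroll offsets.
      return viewportBottom >= document.documentElement.scrollHeight - 1;
    });
    // Give lazy-loading code a moment to react to the new scroll position.
    await new Promise(resolve => setTimeout(resolve, pauseMs));
    if (atBottom) return step;
  }
  // Safety net for pages that keep growing (infinite scroll).
  return maxSteps;
}
```

After it resolves, a window.scrollTo(0, 0) inside page.evaluate would return the page to the top before the screenshot is taken.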

@Sinequanonh

Sinequanonh commented Sep 30, 2017

Some lazy content only appears after a few seconds, without any scrolling. It'd be nice to have an option to wait for x seconds before taking the screenshot.

@jwillingham789

jwillingham789 commented Jan 3, 2018

@Sinequanonh

const puppeteer = require('puppeteer');

async function timeout(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

(async() => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('url');
    // Wait a fixed 5 seconds before capturing
    await timeout(5000);
    await page.screenshot({
        path: 'example.jpg',
        fullPage: true
    });
    await browser.close();
})();

You can add a timeout right after navigating to the page if you want to wait a set amount of time before taking the screenshot.

@shunwen

shunwen commented Feb 7, 2018

I applied both a scroll and a timeout to solve my problem; otherwise, my long pages were still not captured correctly:

    await page.evaluate(() => {
      window.scrollBy(0, window.innerHeight);
    });
    await timeout(5000);

@jacobACN

jacobACN commented Mar 5, 2018

Is there no way of doing this without a set timeout? If it takes 1 s to display the image, then this takes 4 s longer than necessary.
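One way to avoid a fixed delay is to poll a condition in the page and continue as soon as it holds. Below is a minimal sketch using page.waitForFunction. The helper name waitUntilImagesComplete and the 5000 ms upper bound are assumptions. Note that img.complete is also true for images that failed to load, so this does not guarantee that every image actually rendered.

```javascript
// Hypothetical helper: resolves as soon as every <img> currently in the DOM
// reports complete, and rejects if that takes longer than timeoutMs.
async function waitUntilImagesComplete(page, timeoutMs = 5000) {
  await page.waitForFunction(
    // Evaluated repeatedly inside the page until it returns a truthy value.
    () => Array.from(document.images).every(img => img.complete),
    { timeout: timeoutMs }
  );
}
```

With this, a page whose images take 1 s to appear continues after roughly 1 s, and the timeout only acts as an upper bound.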

@Worie

Worie commented Mar 22, 2018

I've been struggling with this for a few days, too.
I set the HTML content, CSS and JS manually, and I haven't found a good way to detect the moment when all assets have loaded; page.waitForNavigation seems to work only for real navigation between pages. The only thing that somewhat works is await page.waitFor(10000); which is far from optimal :/

@jacobACN

jacobACN commented Mar 22, 2018

I ended up with the following:

// HTML held in a string variable
const html = `<html> All the html here </html>`;
const page: Page = await browser.newPage();
// Encode the HTML so characters such as '#' or '%' cannot truncate or corrupt the data URL
await page.goto(`data:text/html;charset=utf-8,${encodeURIComponent(html)}`, { waitUntil: 'networkidle0' });
await page.addStyleTag({ path: stylesheet });
await page.evaluate(() => { window.scrollBy(0, window.innerHeight); });
const result = await page.pdf({ format: 'A4' });

My HTML is in a string variable, but I still use goto() with a data URL, because otherwise I can't wait for the network to be idle. I also scroll down by one viewport height to trigger any lazy-loaded content.

@Worie

Worie commented Mar 22, 2018

Big thanks @jacobACN, I hadn't thought of this!

@hanvyj

hanvyj commented Apr 1, 2018

Great solution! I'm also loading things from a string. Seems like setting the content should behave the same way to me. I just assumed networkidle0 was broken.
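For HTML held in a string, newer Puppeteer releases accept a waitUntil option on page.setContent directly, which may make the data URL detour unnecessary. Below is a minimal sketch, assuming a Puppeteer version whose page.setContent supports that option (the version used in this thread may not). The helper name renderHtmlToPdf is hypothetical.

```javascript
// Hypothetical helper: loads an HTML string, waits for the network to go
// idle, then renders the page to an A4 PDF.
async function renderHtmlToPdf(page, html) {
  // Assumption: this Puppeteer version honours waitUntil on setContent.
  await page.setContent(html, { waitUntil: 'networkidle0' });
  return page.pdf({ format: 'A4' });
}
```

Because the HTML is passed as a string argument, it needs no URL encoding.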

@omeneses

omeneses commented Jun 18, 2018

I solved this by editing the onReady.js file located at /engine_scripts/puppet/.

The final .js file looks like this:

module.exports = async (page, scenario, vp) => {
  function wait(ms) {
    return new Promise(resolve => setTimeout(() => resolve(), ms));
  }

  await page.goto(scenario.url, { waitUntil: 'load' });
  // Get the height of the rendered page
  const bodyHandle = await page.$('body');
  const { height } = await bodyHandle.boundingBox();
  await bodyHandle.dispose();

  // Scroll one viewport at a time, pausing to let content load
  const viewportHeight = page.viewport().height;
  let viewportIncr = 0;
  while (viewportIncr + viewportHeight < height) {
    await page.evaluate(_viewportHeight => {
      window.scrollBy(0, _viewportHeight);
    }, viewportHeight);
    await wait(2000);
    viewportIncr = viewportIncr + viewportHeight;
  }

  // Scroll back to top
  await page.evaluate(_ => {
    window.scrollTo(0, 0);
  });

  // Some extra delay to let images load
  await wait(2000);
};

I found this solution here: https://www.screenshotbin.com/blog/handling-lazy-loaded-webpages-puppeteer

@chenxiaochun

chenxiaochun commented Jul 13, 2018

This is my solution, you can try it: chenxiaochun/blog#35

@aslushnikov

Contributor

aslushnikov commented Sep 5, 2018

The key take-aways from the discussion above:

  • page.goto with waitUntil: 'networkidle0' appears to do the trick
  • you can use page.goto to navigate to HTML content using data URLs; this is the workaround for #728
  • you still need some logic to scroll the page and trigger lazy content loading, if there is any

Hope this helps.
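The first and third take-aways can be combined roughly as follows. This is a sketch under assumptions, not a definitive implementation: the helper name captureFullPage and the pauseMs and maxSteps parameters are hypothetical, and the per-step pause is only a crude allowance for lazy content to load.

```javascript
// Hypothetical helper combining navigation, scrolling and capture.
async function captureFullPage(page, url, path, pauseMs = 200, maxSteps = 100) {
  // 1. Navigate and wait until the network has gone idle.
  await page.goto(url, { waitUntil: 'networkidle0' });

  // 2. Scroll down one viewport at a time so lazy-loading code sees each region.
  await page.evaluate(async (pause, limit) => {
    for (let i = 0; i < limit; i++) {
      const before = window.scrollY;
      window.scrollBy(0, window.innerHeight);
      await new Promise(resolve => setTimeout(resolve, pause));
      // Stop once scrolling no longer moves the page: the bottom was reached.
      if (window.scrollY === before) break;
    }
    // Return to the top so the capture starts from the initial position.
    window.scrollTo(0, 0);
  }, pauseMs, maxSteps);

  // 3. Capture the whole page, not just the viewport.
  await page.screenshot({ path, fullPage: true });
}
```

The maxSteps cap keeps the loop from running forever on pages that keep appending content as they are scrolled.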
