fullPage screenshot duplicates page (doubles/triples page length) #1576

Open
grantstephens opened this issue Dec 11, 2017 · 34 comments
Labels
bug chromium Issues with Puppeteer-Chromium confirmed P3 upstream

Comments

@grantstephens

grantstephens commented Dec 11, 2017

So I'm having a weird problem with certain websites that I am trying to screenshot. Essentially the page is captured and then replicated a number of times down the PNG, producing a really long screenshot that contains all these replications. It's like somebody copy-pasted the page a couple of times onto the bottom of the original.

I have not been able to figure out which sites cause it, but the site below is one example. I can supply more if needed.

Steps to reproduce

Tell us about your environment:

What steps will reproduce the problem?

Sample code with fullPage: true, e.g.:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://bonobos.com/shop/tops');
  await page.screenshot({path: 'example.png', fullPage: true});

  await browser.close();
})();

What is the expected result?
A full-page screenshot, not just the visible part.

What happens instead?
A very long image that contains multiple copies of the full page screenshot.

example
Don't think the example will be visible, but worth a shot.

@hbakhtiyor

I have the same issue. Why has nobody reacted to this issue in the 9 days since it was opened?

@hbakhtiyor

hbakhtiyor commented Dec 20, 2017

It also happens on pages that are not very long when using the --use-gl=swiftshader option.

Chrome version: 65.0.3294.5 dev.

screenshot

@aslushnikov added the bug label Jan 4, 2018
@grantstephens
Author

So after some more playing around with this I have found that the problem seems to lie with fullPage: true. If you set the viewport to the size you want to capture and then take the screenshot with fullPage off, it seems to work, but only if the page is shorter than a certain length (10000px seems to work; I haven't checked the exact breaking point).
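
The workaround described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a confirmed fix: the helper name `screenshotViaViewport` and the `maxHeight` default (taken from the rough 10000 figure above) are hypothetical, and `page` is assumed to be a Puppeteer `Page`. Only `page.setViewport` and `page.screenshot` are actual Puppeteer calls.

```javascript
// Hypothetical helper: resize the viewport to the desired capture size and
// take a regular (non-fullPage) screenshot.
// `maxHeight` defaults to the rough 10000px figure reported above; the exact
// breaking point is not known.
async function screenshotViaViewport(page, width, height, maxHeight = 10000) {
  if (height > maxHeight) {
    throw new RangeError(
      `height ${height}px exceeds the ${maxHeight}px limit of this workaround`
    );
  }
  await page.setViewport({ width, height });
  // fullPage stays off, so only the (now page-sized) viewport is captured.
  return page.screenshot({ fullPage: false });
}
```

Taller pages would still need to be captured in pieces, since this only avoids the fullPage code path.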

@Ryuurock

When can this problem be fixed, please?

@Flamefork

+1

@murilozilli

+1

@luoyjx

luoyjx commented Feb 11, 2018

The workaround in these cases is to split the capture into pieces less than 16000px tall, then merge them.

@kosaa

kosaa commented Feb 15, 2018

+1

@joelgriffith
Contributor

I've got some inspiration here on how to split and merge (it cuts a site in half vertically and rejoins it): https://gist.github.com/joelgriffith/a9b2d72c0672fd3170bd9ba33cf17f37. There appears to be a finite limit of ~10 MB, which might be a WebSocket issue; however, "chunking" the image and recomposing it is likely the best thing to do right now.

@haeky

haeky commented Feb 28, 2018

I wonder if this is related to the issue I just opened #2123

@zxy198717

@aslushnikov Is there a plan to fix it?

@jiajunli

How can we solve this problem when the height of the web page is not known in advance?

@Ryuurock

@jiajunli

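  // Fragment from the body of an async function. Assumes `page` is a Puppeteer
  // Page and `contentSize` is the full page size as { width, height }.
  // `tmp` and `gm` are presumably the npm packages of the same names.
  const fs = require( 'fs' );
  const tmp = require( 'tmp' );
  const gm = require( 'gm' );

  // Height (in device pixels) at which the duplication bug appears.
  const bugMaxHeight = 16 * 1024;

  const dpr = page.viewport().deviceScaleFactor || 1;
  const maxScreenshotHeight = Math.floor( bugMaxHeight / dpr );
  const imgArr = [];
  // Pages shorter than 16 * 1024 px are captured in a single screenshot.
  if ( contentSize.height < maxScreenshotHeight ) {
    // Safety net: close the tab after 20 s so that a tab accidentally left
    // open cannot exhaust memory.
    let timeoutID = setTimeout( () => page.close(), 2e4 )

    return page.screenshot( {
      fullPage: true
    } ).then( buffer => ( clearTimeout( timeoutID ), page.close(), buffer ) );
  }
  // Taller pages are captured slice by slice; each slice is written to a
  // temporary file provided by the system.
  for ( let ypos = 0; ypos < contentSize.height; ypos += maxScreenshotHeight ) {
    const height = Math.min( contentSize.height - ypos, maxScreenshotHeight );
    const tmpName = tmp.tmpNameSync();
    fs.writeFileSync( tmpName, await page.screenshot( {
      clip: {
        x: 0,
        y: ypos,
        width: contentSize.width,
        height
      }
    } ) )
    imgArr.push( tmpName )
  }
  return new Promise( ( resolve, reject ) => {
    // Stitch the slices together with the gm package.
    gm( imgArr.shift() )
      .append( imgArr )
      .toBuffer( ( err, buffer ) => {
        err ? reject( err ) : resolve( buffer )
      } );
    // Close the tab exactly once, whether stitching succeeded or failed,
    // without swallowing the error.
  } ).finally( () => page.close() );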

@jiajunli

@Ryuurock
Thank you !
Stitching the screenshots together works.
There is still a small problem with Tmall detail pages, though: Chrome is redirected straight to the login page, so the login page is what gets captured.

@AutoSponge

I can replicate this issue on a page only 5657px tall (with ~80% fail rate).

clip = {x: 0, y: 0, width: 1062, height: 5687, scale: 0.5}

When inspected, the resulting image is only the height of the viewport (902px) but the viewport is repeated ~3.5 times within the image (width is 128px). This leads me to believe it's not based on web socket limits as previously suggested. But it's also not puppeteer's fault directly, IMO. I got these results with the debugger protocol directly.

@jiajunli

Hello everyone, are there any solutions for Puppeteer resources not being completely released?

@jiajunli

Hello everyone, regarding capturing in segments: for pages about 60000px tall, repeated tests show that sometimes only the background is captured, for example on http://cnemall.blog.hexun.com/

@arckalsun

@jiajunli Have you solved this problem? I have the same problem as you.
If this problem cannot be solved, then I can only use Firefox Developer Edition. It has a built-in extension for capturing the entire page. Unfortunately, I did not find an API for this feature, so you can only use "headless=false" to open a window and then simulate a mouse click on this extension to capture the entire page.

@aslushnikov added the chromium label Dec 6, 2018
@js-sli

js-sli commented Feb 25, 2019

This bug still reproduces on the latest Google Chromium build.

@DinisCruz

Hi, we had this exact same problem (with a small variation: it happened when running in our Lambda function, but worked OK when running on OS X).

The solution was to set the viewport height explicitly (before, we were only setting the width and letting Chrome calculate the height of the full screen automatically).

@BrianHung

Note: I found the same error when using the Chrome Debugger API and DevTools protocol, which shares some logic with puppeteer. However, I haven't tested whether this solution works for puppeteer yet.

If anyone is running into a duplication issue like in the image below, try adding a timeout or sleep function in between Emulation.setDeviceMetricsOverride and Page.captureScreenshot around line 911, something like

function sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}
await sleep(100);

My hypothesis is that it takes a few ms for the page to be emulated after Emulation.setDeviceMetricsOverride is called.
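
For context, the sequence described above might look something like the following when driven over the DevTools protocol. This is a hedged sketch: `captureWithDelay` is a hypothetical name, `client` is assumed to be a CDP session (for example from `page.target().createCDPSession()` in Puppeteer), and the 100 ms default simply mirrors the snippet above rather than a measured value.

```javascript
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Hypothetical helper: `client` is assumed to expose `send(method, params)`
// like a Puppeteer CDPSession.
async function captureWithDelay(client, metrics, delayMs = 100) {
  // Resize the emulated device to the desired capture size.
  await client.send('Emulation.setDeviceMetricsOverride', {
    width: metrics.width,
    height: metrics.height,
    deviceScaleFactor: metrics.deviceScaleFactor || 1,
    mobile: false,
  });
  // Give the page a moment to apply the override (the hypothesis above).
  await sleep(delayMs);
  const { data } = await client.send('Page.captureScreenshot', { format: 'png' });
  // The protocol returns the image as a base64-encoded string.
  return Buffer.from(data, 'base64');
}
```

A fixed delay is only a heuristic, and others in this thread report that it does not help in every environment.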

hello (18)

@whimboo
Collaborator

whimboo commented Jan 21, 2020

As far as I can see no-one filed a Chromium bug yet. So I did that now: https://bugs.chromium.org/p/chromium/issues/detail?id=1043959

@StephanBijzitter

I had the same issue.

The page is about 3000 pixels high; the viewport was just under 1000 pixels high.
The screenshot was the visible part of the page, repeated 3 times (and a little bit).

defaultViewport: null as a launch option resolved this for me.

@mclxly

mclxly commented Jun 18, 2020

@StephanBijzitter This does not work for me; the header part is still duplicated 3 times.

const browser = await puppeteer.launch({
  headless: false,
  executablePath: '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
  defaultViewport: null,
});

Chrome: 83.0.4103.106

@ihortkachuk

ihortkachuk commented Oct 23, 2020

Same here! I have a page about 12000px tall and it is duplicated twice. After investigating, I noticed that the issue might be in the property {deviceScaleFactor: 2}. If you set it to 1 and the viewport size equals the size of the whole page, you get the correct image. Does anyone have any idea how to solve it?

@paulirish
Collaborator

paulirish commented Nov 13, 2020

I've just added details about this bug to the upstream chromium bug: https://bugs.chromium.org/p/chromium/issues/detail?id=770769#c12
#359 is very related, where you can't get screenshots taller than 16384px aka (16*1024)px

Folks are correct that dpr/deviceScaleFactor is involved. :)
Your content will repeat every (16 * 1024) / deviceScaleFactor pixels.

This is ultimately a Chromium compositor bug. It cannot capture a texture larger than 16384px/dpr. So you'll need a workaround like taking smaller screenshots and stitching together.
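
To make the arithmetic above concrete, here is a small sketch that splits a page into clip rectangles that each stay within (16 * 1024) / deviceScaleFactor CSS pixels. The function name `planSlices` and its parameters are hypothetical; it only plans the slices and leaves capturing and stitching to whatever tooling is used.

```javascript
// Chromium's texture limit as described above: 16 * 1024 device pixels.
const MAX_TEXTURE_PX = 16 * 1024;

// Hypothetical helper: split a page of `pageHeight` CSS pixels into clip
// rectangles that each fit within the limit for the given deviceScaleFactor.
function planSlices(pageWidth, pageHeight, deviceScaleFactor = 1) {
  const maxSliceHeight = Math.floor(MAX_TEXTURE_PX / deviceScaleFactor);
  const slices = [];
  for (let y = 0; y < pageHeight; y += maxSliceHeight) {
    slices.push({
      x: 0,
      y,
      width: pageWidth,
      height: Math.min(maxSliceHeight, pageHeight - y),
    });
  }
  return slices;
}

// A 12000px page at deviceScaleFactor 2 needs two slices (8192px + 3808px).
console.log(planSlices(1000, 12000, 2));
```

Each rectangle could then be passed as the `clip` option of `page.screenshot`. At deviceScaleFactor 2 the per-slice limit is 8192 CSS pixels, which is consistent with the 12000px report earlier in this thread.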

I just found and tried out https://github.com/morteza-fsh/puppeteer-full-page-screenshot and had some success. (And put up a PR for a bug I ran into). The implementation seems straightforward enough.

@MaheshCasiraghi

Is there a way to fall back to the https://github.com/morteza-fsh/puppeteer-full-page-screenshot behavior in puppeteer when pages are longer than 16000px? I am running into issues with .jpg JIMP buffers with the above package.

@zubriktomas

zubriktomas commented Nov 25, 2020

It seems that with {deviceScaleFactor: 0} it actually works as expected.

Add
await page.setViewport({ width: 1000, height: 600, deviceScaleFactor: 0 });
after
const browser = await puppeteer.launch();
const page = await browser.newPage();
This solution works for pages up to approx. 8700px tall.

Because the issue hasn't been resolved yet, I suggest you use Playwright instead.
It uses very similar syntax (most of the functions have the same names) and the fullPage screenshot works like a charm without page height restrictions.
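
A minimal sketch of what the Playwright equivalent of the original repro might look like. The wrapper name `captureFullPage` is hypothetical, and the browser type is passed in as a parameter so the snippet does not assume which engine is installed; `launch`, `newPage`, `goto`, `screenshot` and `close` are Playwright API calls.

```javascript
// Hypothetical wrapper: `browserType` is assumed to be a Playwright browser
// type such as `require('playwright').chromium`.
async function captureFullPage(browserType, url, path) {
  const browser = await browserType.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // Same option name as in Puppeteer.
    await page.screenshot({ path, fullPage: true });
  } finally {
    // Always release the browser, even if navigation or capture fails.
    await browser.close();
  }
}

// Usage (requires the `playwright` package to be installed):
// captureFullPage(require('playwright').chromium, 'https://bonobos.com/shop/tops', 'example.png');
```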

@Doflatango

@zubriktomas Playwright works perfectly! Thanks!

@chenxingshuang

    return new Promise(resolve => setTimeout(resolve, ms));

I use the same logic as you in Electron, but it does not work. The duplicated page appears again.

@samuksilv

@zubriktomas thanks this works for me too. 😅

@stale

stale bot commented Jun 23, 2022

We're marking this issue as unconfirmed because it has not had recent activity and we weren't able to confirm it yet. It will be closed if no further activity occurs within the next 30 days.

@whimboo
Collaborator

whimboo commented Jun 24, 2022

This is a known and confirmed issue with CDP and has been reported via #1576 (comment).

@stale

stale bot commented Aug 30, 2022

We're marking this issue as unconfirmed because it has not had recent activity and we weren't able to confirm it yet. It will be closed if no further activity occurs within the next 30 days.
