fullPage screenshot duplicates page (doubles/triples page length) #1576
Comments
I have the same issue. Why has nobody reacted to it after 9 days? |
So after some more playing around with this I have found that the problem seems to lie with fullPage: true. If you set the viewport to the size you want to take a picture of and then take the screenshot with fullPage off, it seems to work, but only if the page is less than a certain length (10000 seems to work; I haven't checked the exact breaking point). |
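A minimal sketch of that workaround, assuming a puppeteer `page` object. The helper name `screenshotViaViewport` and the `MAX_SAFE_HEIGHT` cutoff are illustrative only; the 10000px figure is just the rough number mentioned above, not a verified limit.

```javascript
// Hypothetical helper: resize the viewport to the full document size and
// take a normal (non-fullPage) screenshot.
// MAX_SAFE_HEIGHT is an assumption based on the rough 10000px figure above.
const MAX_SAFE_HEIGHT = 10000;

async function screenshotViaViewport(page, width) {
  // Measure the rendered document height inside the page.
  const height = await page.evaluate(
    () => document.documentElement.scrollHeight
  );
  if (height > MAX_SAFE_HEIGHT) {
    // Beyond this height the workaround was reported to stop working.
    throw new Error(`Page is ${height}px tall; this workaround may not hold`);
  }
  // Make the viewport cover the whole page, then capture without fullPage.
  await page.setViewport({ width, height });
  return page.screenshot({ fullPage: false });
}
```

Usage would look like `const buffer = await screenshotViaViewport(page, 1280);` after the page has finished loading.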
When can you fix this problem, please |
+1 |
+1 |
The way to resolve these cases is to split the page into pieces with a height of less than 16000px each, then merge them. |
+1 |
I've got some inspiration here on how to split and merge (cuts a site in half vertically and rejoins): https://gist.github.com/joelgriffith/a9b2d72c0672fd3170bd9ba33cf17f37. There appears to be a finite limit of ~10mb, which might be a WebSocket issue, however "chunking" the image and re-composing is likely the best thing to do right now. |
I wonder if this is related to the issue I just opened #2123 |
@aslushnikov Is there a plan to fix it? |
How can we solve this problem when the height of the web page is not known in advance? |
const dpr = page.viewport().deviceScaleFactor || 1;
const maxScreenshotHeight = Math.floor( bugMaxHeight / dpr );
const imgArr = [];
// Pages less than 16 * 1024 pixels high are captured directly in a single screenshot
if ( contentSize.height < maxScreenshotHeight ) {
// Guard against a tab accidentally left open, which would blow up memory usage
let timeoutID = setTimeout( () => page.close(), 2e4 )
return page.screenshot( {
fullPage: true
} ).then( buffer => ( clearTimeout( timeoutID ), page.close(), buffer ) );
}
// Pages taller than 16 * 1024 pixels are captured in a loop, with each slice stored in a system-provided temp file
for ( let ypos = 0; ypos < contentSize.height; ypos += maxScreenshotHeight ) {
const height = Math.min( contentSize.height - ypos, maxScreenshotHeight );
const tmpName = tmp.tmpNameSync();
fs.writeFileSync( tmpName, await page.screenshot( {
clip: {
x: 0,
y: ypos,
width: contentSize.width,
height
}
} ) )
imgArr.push( tmpName )
}
return new Promise( ( resolve, reject ) => {
// Stitch the slices together with the gm package
gm( imgArr.shift() )
.append( imgArr )
.toBuffer( ( err, buffer ) => {
page.close()
err ? reject( err ) : resolve( buffer )
} );
} ).catch( () => page.close() ); |
@Ryuurock |
I can replicate this issue on a page only 5657px tall (with ~80% fail rate). clip = {x: 0, y: 0, width: 1062, height: 5687, scale: 0.5} When inspected, the resulting image is only the height of the viewport (902px) but the viewport is repeated ~3.5 times within the image (width is 128px). This leads me to believe it's not based on web socket limits as previously suggested. But it's also not puppeteer's fault directly, IMO. I got these results with the debugger protocol directly. |
Hello everyone, are there any solutions for puppeteer resources not being completely released? |
Hello everyone. Regarding the segmented capture approach: for pages about 60000px high, after repeated tests, sometimes only the background is captured, for example on http://cnemall.blog.hexun.com/ |
@jiajunli Have you solved this problem? I have the same problem as you. |
This bug is still reproducing on the latest google chromium build. |
Hi, we had this exact same problem (with the small variation that it happened when running on our Lambda function, but worked fine when running on OSX). The solution was to set the viewport height explicitly (before, we were only setting the width and letting Chrome calculate the height of the full page automatically). |
Note: I found the same error when using the Chrome Debugger API and DevTools protocol, which shares some logic with puppeteer. However, I haven't tested whether this solution works for puppeteer yet. If anyone is running into a duplication issue like in the image below, try adding a timeout or sleep function in between function sleep(ms) {
return new Promise(resolve => setTimeout(resolve, ms));
}
await sleep(100); My hypothesis is that it takes a few ms to emulate the page after |
As far as I can see no-one filed a Chromium bug yet. So I did that now: https://bugs.chromium.org/p/chromium/issues/detail?id=1043959 |
I had the same issue. The page is about 3000 pixels high; the viewport was just under 1000 pixels high.
|
@StephanBijzitter This does not work for me; the header part is still duplicated 3 times.
Chrome: 83.0.4103.106 |
Same here! I have a page about 12000 px and it duplicates two times. After investigating I've noticed that the issue might be in property |
I've just added details about this bug to the upstream chromium bug: https://bugs.chromium.org/p/chromium/issues/detail?id=770769#c12 Folks are correct that dpr/deviceScaleFactor is involved. :) This is ultimately a Chromium compositor bug. It cannot capture a texture larger than 16384px/dpr. So you'll need a workaround like taking smaller screenshots and stitching together. I just found and tried out https://github.com/morteza-fsh/puppeteer-full-page-screenshot and had some success. (And put up a PR for a bug I ran into). The implementation seems straightforward enough. |
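To make the 16384px/dpr limit concrete, here is a small sketch that plans the clip rectangles for a page of a given size. The helper name `planClips` is hypothetical; only the 16384 figure and the role of `deviceScaleFactor` come from the comment above. Stitching the resulting images is left out.

```javascript
// Reported compositor limit: a captured texture cannot be taller than
// 16384px / deviceScaleFactor (figure taken from the comment above).
const MAX_TEXTURE_PX = 16384;

// Hypothetical helper: split a page into clip rectangles that each stay
// within the limit for the given device scale factor.
function planClips(contentWidth, contentHeight, deviceScaleFactor = 1) {
  const maxHeight = Math.floor(MAX_TEXTURE_PX / deviceScaleFactor);
  const clips = [];
  for (let y = 0; y < contentHeight; y += maxHeight) {
    clips.push({
      x: 0,
      y,
      width: contentWidth,
      // The last slice may be shorter than maxHeight.
      height: Math.min(maxHeight, contentHeight - y),
    });
  }
  return clips;
}

// A 20000px page at dpr 2 needs slices of at most 8192px each.
console.log(planClips(1280, 20000, 2).length); // prints 3
```

Each rectangle can then be passed as the `clip` option of `page.screenshot`, and the resulting images appended vertically with an image library.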
Is there a way to fall back to the https://github.com/morteza-fsh/puppeteer-full-page-screenshot behavior in puppeteer when pages are longer than 16000px? I am running into issues with .jpg JIMP buffers with the above package. |
It seems that with Add Because the issue hasn't been resolved yet, I suggest using Playwright instead. |
@zubriktomas Playwright works perfectly! Thanks! |
I use the same logic as you in Electron, but it does not work. The duplicated page appears again. |
@zubriktomas thanks this works for me too. 😅 |
We're marking this issue as unconfirmed because it has not had recent activity and we weren't able to confirm it yet. It will be closed if no further activity occurs within the next 30 days. |
This is a known and confirmed issue with CDP and has been reported via #1576 (comment). |
So I'm having a weird problem with certain websites that I am trying to screenshot. Essentially the page is shot and then replicated a number of times further down in the png, making a really long screenshot that contains all these replications. It's like somebody copy-pasted the page a couple of times onto the bottom of the original.
I have not been able to figure out which sites cause it, but the one below is an example; I can supply more if needed.
Steps to reproduce
Tell us about your environment:
What steps will reproduce the problem?
Sample code with fullPage=True, i.e:
What is the expected result?
A full page screenshot, not just the visible part
What happens instead?
A very long image that contains multiple copies of the full page screenshot.
Don't think the example will be visible, but worth a shot.