This repository has been archived by the owner on May 30, 2023. It is now read-only.

page.paperSize is not accurate for .pdf #11590

Closed
ovidiuch opened this issue Aug 28, 2013 · 11 comments

@ovidiuch

I'm trying to export a single-page pdf by taking up the entire size of the rendered document, but it is proving impossible.

I created this test scenario to test this. Here's your usual rasterize.js script:

var page = require('webpage').create(),
    system = require('system');

var url = system.args[1];
var output = system.args[2];

// Not relevant, but for the sake of trying let's set it to the same size
page.viewportSize = {width: 400, height: 1200};
page.paperSize = {width: 400, height: 1200};

page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Unable to load the address ' + url + '!');
        phantom.exit();
    } else {
        window.setTimeout(function () {
            page.render(output);
            phantom.exit();
        }, 1000);
    }
});

With an HTML file that generates a 400x1200px grid (example output):

<html>
<head>
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/2.0.3/jquery.min.js"></script>
</head>
<body style="
  margin: 0;
  padding: 0;
">
  <div id="grid" style="
    position: relative;
    width: 400px;
    height: 1200px;
    background: #ccc;
    font-size: 10px;
  "></div>
  <script>
    var width = $('#grid').width();
    var height = $('#grid').height();
    for (var i = 0; i < width; i += 25) {
      for (var j = 0; j < height; j += 25) {
        $('#grid').append($('<div></div>').css({
          position: 'absolute',
          width: '24px',
          height: '24px',
          borderRight: '1px solid #444',
          borderBottom: '1px solid #444',
          left: i + 'px',
          top: j + 'px',
          lineHeight: '24px',
          textAlign: 'center'
        }).text(Math.max(i, j) + 25));
      }
    }
  </script>
</body>
</html>

After simply calling phantomjs rasterize.js grid.html grid.pdf, the pdf has more than one page, and is somewhat zoomed, fitting only around 315px * 945px inside a pdf page. (example output)

It turns out the magic ratio is 1.27x: multiplying both the width and the height by it renders a single page with the exact grid contents. But this ratio is not constant; it just happens to match this 400x1200 size.

Any insight into how this works, or is this a straight-up bug?
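
The figures reported above can be checked with a little arithmetic. The following is only a sketch of the workaround described in this comment, not a fix: `observedRatio` and `scaledPaperSize` are hypothetical helper names, and 1.27 is the empirical value for this 400x1200 case, which is reported not to hold for other sizes.

```javascript
// Hypothetical helpers; neither is part of PhantomJS.

// Ratio between the requested size and what actually fit on the PDF page.
function observedRatio(requestedPx, renderedPx) {
    return requestedPx / renderedPx;
}

// Scale a paper size by an empirical ratio and return it with px units.
function scaledPaperSize(width, height, ratio) {
    return {
        width: Math.round(width * ratio) + 'px',
        height: Math.round(height * ratio) + 'px'
    };
}

console.log(observedRatio(400, 315).toFixed(2));   // prints "1.27"
console.log(observedRatio(1200, 945).toFixed(2));  // prints "1.27"

var size = scaledPaperSize(400, 1200, 1.27);
console.log(size.width, size.height);              // prints "508px 1524px"
```

In the rasterize.js script this would presumably be applied as `page.paperSize = scaledPaperSize(400, 1200, 1.27);`, with the caveat that the factor has to be found again for every document size.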

@glowka

glowka commented Sep 26, 2013

I guess it is all about your system's default DPI setting. According to issue #10659, DPI is now set to 72 on all platforms except Windows. Your system may be set to 96 DPI, which would produce this ~1.25 ratio.

I'm not sure about this, as I'm trying to handle similar DPI problems myself, so better check it yourself. Anyway, I hope this points in a useful direction.
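
As a quick check of the DPI hypothesis, the ratio can be worked out directly. This is a minimal sketch; `dpiRatio` is a hypothetical helper, and the 96 and 72 values are simply the ones mentioned in this comment.

```javascript
// Hypothetical helper: scale factor implied by a DPI mismatch.
function dpiRatio(screenDpi, pdfDpi) {
    return screenDpi / pdfDpi;
}

var ratio = dpiRatio(96, 72);
console.log(ratio.toFixed(3));          // prints "1.333"

// Width that would fit on a 400px-wide page under a pure 96/72 mismatch.
console.log((400 / ratio).toFixed(0));  // prints "300"
```

A pure 96/72 mismatch would predict about 300px fitting on the page, while the report measured about 315px (a ratio of roughly 1.27), so the DPI setting alone may not fully account for the observed scaling.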

@ecsv
Contributor

ecsv commented Oct 19, 2013

Can you please test commits 966902a and 7d2b311? They will hopefully be pulled; in the meantime they are available from https://github.com/ecsv/phantomjs, branch pdf-accurate-pagesize.

andrey-p added a commit to andrey-p/apocalism-js that referenced this issue Dec 11, 2013
The fix involves some crazy magic numbers. This will be corrected at a later
date when the relevant phantomJS bug
(ariya/phantomjs#11590) gets fixed

Skipped the relevant test in images as well until this is resolved
@mateuszjarzewski

There is a very simple way to fix this.
Just set this:
page.zoomFactor = 0.821980

@ecsv
Contributor

ecsv commented Mar 7, 2014

@mateuszjarzewski This is not a solution, because the zoom is calculated dynamically and your zoomFactor only works for your specific problem/site. See my patch 7d2b311.

The other problem is the wrong conversion function, which you can see fixed in 966902a.

Unfortunately, neither the maintainer nor the issue reporter has replied to this issue for ~5 months. I have (in my eyes) working fixes, but no discussion is happening. As a result, these fixes/workarounds are not merged and the issue is still open.

@mateuszjarzewski

You're right. It's always about 0.82, but not exactly.
Can I somehow download a PhantomJS version with your patches?

@ecsv
Contributor

ecsv commented Mar 9, 2014

> Can I somehow download a PhantomJS version with your patches?

I don't know of any source where you can download precompiled versions with my patches. I think some people have asked various distributions (like Debian/Ubuntu) to include them, but I don't think that has been done yet.

But you can download the source from https://github.com/ecsv/phantomjs/archive/pdf-accurate-pagesize.zip and build it yourself.

@ecsv
Contributor

ecsv commented Mar 9, 2014

> For me it works on all websites.

@mateuszjarzewski But for me it doesn't work. Here is a screenshot of a website (actually, a slideshow) that works with png and with my patches. Without my patches and with your scale ("page.zoomFactor = 0.821980;"), a slide doesn't fit exactly on one pdf page; the single slide gets split across two pdf pages:

[Image: slide_split_on_two_pdf_pages]

If I used the zoom factor approach, I would end up with 0.79026....something. But even then the scale factor might differ between slides, so I would need a different zoomFactor for each of them. This is especially problematic when I am using svg images, which just "disappear" for some reason when the correct fixed width and height are not defined (it looks like they are rendered outside of the viewport).

@mateuszjarzewski

Sure man, you're right.
I'm currently working on a solution that uses Mozilla PDF.js to determine the number of pages in a test PDF file.
Each decimal place requires ~5 tries.
So to find a factor like 0.821980
my program needs to loop about 30 times.
The PDF files used in the test are pretty small (there is only one element, 1px wide and N px/mm high).
Each weighs about 2KB, so the program takes only ~500 ms to loop.
After that I can save my factor for the given document height and move along :)
I think it is a better solution than recompiling Phantom.
If you want that script, you can ask :)
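
The script itself was not posted, so the following is only a guess at what such a calibration loop might look like: a digit-by-digit search that refines one decimal place at a time. Everything here is an assumption: `findZoomFactor` is a hypothetical name, and `fitsOnOnePage(zoom)` stands in for "render a small test PDF at this zoomFactor and count its pages" (for example with PDF.js). The sketch further assumes the factor is below 1 and that a larger zoom never needs fewer pages.

```javascript
// Hypothetical digit-by-digit search for the largest zoom factor that
// still fits the test document on a single page.
function findZoomFactor(fitsOnOnePage, decimals) {
    // Work in integer units of 1/scale to avoid accumulating float error.
    var scale = 1;
    for (var i = 0; i < decimals; i++) {
        scale *= 10;
    }
    var units = 0;   // best known factor, expressed in units of 1/scale
    var tries = 0;   // number of test renders performed
    for (var step = scale / 10; step >= 1; step /= 10) {
        // Raise the current decimal digit until the page overflows.
        for (var digit = 1; digit <= 9; digit++) {
            tries++;
            if (!fitsOnOnePage((units + step) / scale)) {
                break;
            }
            units += step;
        }
    }
    return { factor: units / scale, tries: tries };
}

// Simulated check standing in for a real render-and-count-pages step.
var result = findZoomFactor(function (zoom) {
    return zoom <= 0.8219805;
}, 6);
console.log(result.factor, result.tries);  // prints "0.82198 33"
```

With the simulated limit above, the search needs 33 test renders for six decimal places, which is in line with the "about 30" loops mentioned in this comment. A binary search would need fewer renders for the same precision.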

@ecsv
Contributor

ecsv commented Mar 9, 2014

> If you want that script, you can ask :)

Nope, I prefer to have it recompiled to avoid these approaches, which seem to kill my svg images. Especially when I have a (imho) cleaner solution which doesn't have the svg problem and doesn't need the calibration rounds.

Btw. here is how it looks rendered with the modified phantomjs:

[Image: slide_on_one_page_patched_phantomjs]

And here is the rendering of the example given by the bug reporter with the patched phantomjs (converted from pdf to png with imagemagick's "convert" tool):

[Image: test]

@ariya and @takaki: Any comments regarding the issue/patches?

ariya pushed a commit that referenced this issue May 31, 2014
The unit px is one point inside the HTML source page but phantomjs handles it
without reason as 1/2.54 points. This makes the page smaller than expected when
trying to render a page as PDF.

#11590 ("page.paperSize is not
accurate for .pdf")
ariya pushed a commit that referenced this issue May 31, 2014
PDFs are not rendered like PNG or other image formats by phantomjs because it
uses the printer functionality of Qt+Webkit. But Webkit uses some printer
"optimization" to save paper by shrinking the output. Such shrinking results in
too small content on a page.

#11590 ("page.paperSize is not
accurate for .pdf")
ariya pushed a commit that referenced this issue May 31, 2014
The last reference test.pdf in the tests was using a complete different region
of the input "image.jpg" because the used phantomjs version used to generate it
had a bug. This bug was fixed in 833eb82
("Don't scale the unit px to 1/2.54 points for PDFs") and
1daa2eb ("Disable page shrinking for pdf
printing to create accurate output").

The new version should now use the same region of the input image as the other
generated files for the gif, jpg and png tests. The only difference is the
extra height is still displayed on a second page but this is currently expected
by phantomjs.

#11590 ("page.paperSize is not
accurate for .pdf")
@ariya ariya closed this as completed May 31, 2014
@milianw
Contributor

milianw commented Oct 23, 2014

The patch that was applied here will break a common use case of using PhantomJS as a testing platform for printing websites. The big issue I see is that it sets printingMaximumShrinkFactor to 1, thereby preventing any automatic rescaling of the website to fit. Try e.g. this command:

./bin/phantomjs --ssl-protocol=any ./examples/rasterize.js http://github.com/ariya/phantomjs/commit/1daa2eb4dd49efb848ff96e37f298a774520cc9b test.pdf A4

With this patch applied, the website is cut off on the right side. Printing the website from any normal browser would resize it to fit the page. Setting the maximum shrink factor to 1 prevents this behavior.

The issues described in this bug report (at least with the supplied html I could test) are solely related to printingMinimumShrinkFactor. Setting that alone to 1 makes the print behave as the reporter intends. But even then, I'd argue that the patch is too simplistic. What you probably want to do instead is change the code to check whether a website fits into the unscaled page. If so, the minimum shrink factor should not be applied. Yes, this is more work. But at least it ensures that printing a normal website with PhantomJS behaves the same as printing it from a browser.
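
The suggested check can be sketched as follows. This is an illustration of the logic only, written in JavaScript for readability; the real change would live in the Qt/WebKit printing code. `chooseShrinkFactor` is a hypothetical name, and the 1.25 and 2 values passed below are illustrative assumptions, not values taken from this thread.

```javascript
// Hypothetical sketch: pick the shrink factor for printing.
// A factor of 1 means the page is printed unscaled.
function chooseShrinkFactor(contentWidth, pageWidth, minShrink, maxShrink) {
    if (contentWidth <= pageWidth) {
        // Content already fits: skip the minimum shrink factor entirely.
        return 1;
    }
    // Content is too wide: shrink as much as needed, within the limits.
    var needed = contentWidth / pageWidth;
    return Math.min(Math.max(needed, minShrink), maxShrink);
}

console.log(chooseShrinkFactor(400, 400, 1.25, 2));   // prints "1"
console.log(chooseShrinkFactor(1190, 595, 1.25, 2));  // prints "2"
console.log(chooseShrinkFactor(3000, 595, 1.25, 2));  // prints "2"
```

Under this logic the reporter's 400px grid on a 400px page would stay unscaled, while a wide website printed on A4 would still be shrunk to fit, up to the maximum factor.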

@LiamKarlMitchell

For me, setting page.zoomFactor = 1; seemed to solve all my problems.
I am just using a binary of 1.9.7 on Linux.
