Why are dimensions rounded to two decimal places? #196

Open
PhilterPaper opened this issue Aug 18, 2023 · 8 comments
Labels
general discussion roadmaps, etc., discuss direction question how do I... ?

Comments

@PhilterPaper
Owner

PhilterPaper commented Aug 18, 2023

@sasozivanovic wrote in PDF::API2 ticket ssimms/pdfapi2/issues/67:

I'm using PDF::API2 in a script accompanying a TeX package I have developed (Memoize, currently in the process of submission to CTAN). The purpose of the script is to extract selected pages from the PDF generated by TeX. As a paranoid measure, I'm comparing the expected page size to the size of MediaBox. (I know that TeX (pt) and PDF (bp) points are not the same, and I'm performing the conversion.) Some TeX engines (pdfTeX, LuaTeX) can set the number of decimal points used in PDF dimensions (using \pdfdecimaldigits or \pdfvariable decimaldigits). However, when PDF::API2 reads the dimensions with more than two decimal digits, it rounds them down to two. Could this be remedied, or is there a deeper reason for this behaviour?

Sašo, I don't have a direct answer to your question. I can't post this in your PDF::API2 ticket, as I have been banned from that repository! Anyway, I maintain a fork of PDF::API2, which is PDF::Builder, and it presumably has the same issue. What I'm curious about is for what purpose you require more than two decimal places of precision in a PDF coordinate? 0.01pt/bp is only 0.00353mm (0.000139 inch), so does it have any practical visible effect? Is it just a matter of coordinates not exactly numerically matching? Is TeX normally using a smaller Point than the PostScript/PDF Big Point (1/72 inch)?

I presume that the original purpose of rounding was simply to reduce the size of a PDF file by rounding down coordinates to a "reasonable" precision. Two decimal places would likely be invisible to the eye.

By the way, when dealing with Perl, I have from time to time encountered problems caused by different Perl builds using different extended-precision libraries to get a few more digits of precision. Unfortunately, this results in incompatibilities among the different libraries, which may extend to differences with "standard" floating point precision (presumably IEEE-754). Johan Vromans (@sciurius) just reported a possible issue with this in #89, and I have seen it when comparing full-precision floating point numbers in t-tests (I ended up having to round them down to single-precision!).

@PhilterPaper PhilterPaper added question how do I... ? general discussion roadmaps, etc., discuss direction labels Aug 18, 2023
@sasozivanovic

sasozivanovic commented Aug 18, 2023 via email

@PhilterPaper
Owner Author

Is it interface-compatible with PDF::API2? If so, I could adopt it as an alternative.

It's very largely API-compatible. It has some new functions, but otherwise I try hard to keep it compatible (a functional superset). There may be a handful of cases where there are some differences (see INFO/KNOWN_INCOMPAT). It would be much appreciated if you listed PDF::Builder as an acceptable alternative to PDF::API2.

Generally speaking, I think it's just that the basic expectation is that if you save something somewhere, you get the same thing back when you load it.

The problem with Perl is that floating point operations frequently yield 15+ digits of precision, which would really bloat the PDF file if you included the full precision of each coordinate value. As I said, that's probably why the original authors chose to round the results. Frankly, unless you're tremendously magnifying the document, or tiling huge numbers of pages, I find it hard to believe that the rounding would actually result in a visible change.
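To make the size concern concrete, here is a minimal Perl sketch that prints one computed coordinate at full double precision and again rounded to two places. It is not taken from the PDF::API2 or PDF::Builder source; the helper name `fmt_coord` is hypothetical, and stripping trailing zeros is an assumption about what a compact writer might do.

```perl
use strict;
use warnings;

# Hypothetical helper: format a coordinate with a fixed number of
# decimal places, then drop trailing zeros and a dangling decimal point.
sub fmt_coord {
    my ($value, $places) = @_;
    my $s = sprintf("%.${places}f", $value);
    $s =~ s/0+$// if $s =~ /\./;   # 113.40 -> 113.4
    $s =~ s/\.$//;                 # 113.   -> 113
    return $s;
}

my $x = 4 / 2.54 * 72;             # 4cm expressed in bp
printf "full:    %.15g\n", $x;     # 15 significant digits of the double
printf "rounded: %s\n", fmt_coord($x, 2);
```

Every coordinate written at full precision costs roughly ten extra bytes compared with the two-place form, which adds up quickly in a content stream with many path operators.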

On the other hand, if you can come up with an example that shows visible effects from rounding (and round-tripping between PDFTeX and PDF::Builder), I'll be happy to think about some control on rounding -- to use the raw input precision or round to a specified amount. The tradeoff for a perfect round-trip (?) is much larger PDF files and probably a slowdown in processing speed. I just don't want to go down this rabbit-hole unless there is a clear benefit to be gained.

Well, TeX and floating point can't even be mentioned in the same sentence ;-)

If you won't, I will! What sort of precision is PDFTeX/PDFLaTeX producing PDFs at? I wouldn't be shocked if TeX is using scaled integers internally, given the timeframe of TeX's origin. Anyway, if it's not excessive, it might be feasible to allow higher precision in PDF::Builder, especially if it's processing PDFTeX-produced input (i.e., accept the full precision and don't round when writing it back). What sort of precision do PDF Readers (especially Adobe Acrobat) handle? If they're just rounding input to double (or even single) precision anyway, handling excessive precision may be an exercise in futility.

BTW, I have used PDFLaTeX to generate documentation, and found it quite nice (although I recall having a terrible time trying to get it to use TTF fonts).

@sasozivanovic

It's very largely API-compatible. It has some new functions, but otherwise I try hard to keep it compatible (a functional superset). There may be a handful of cases where there are some differences (see INFO/KNOWN_INCOMPAT). It would be much appreciated if you listed PDF::Builder as an acceptable alternative to PDF::API2.

I had to test right away, of course. The API indeed doesn't seem to be the problem, but I get an error: PDF Integrity Check: Root object 17.0 not found! At first I thought the problem was on my side (i.e. that my package produced an ill-formed PDF), but I then realized that the problem persists with any(?) LaTeX-generated file. I guess I should open another ticket?

Regarding PDF precision, I dug up the docs I was reading a while ago:

  • ISO 32000 (PDF), which I have great trouble understanding, states:
    • In Table C.1 – Architectural limits, entry for "real quantity": "Number of significant decimal digits of precision in fractional part (approximate) = 5".
    • Table 253 – Entries in a number format dictionary: I have no idea what the number format dictionary is, and the text is too convoluted to quote here, but the default values seem to boil down to two decimal digits.
  • XeTeX has the number of decimal digits fixed to 2, much like PDF::Builder, and in accord with the ISO default, if I understand that correctly.
  • pdfTeX allows from 0 to 4 decimal digits in the PDF output. The manual states that "in most cases the optimal value is 2". The default is \pdfdecimaldigits=4.
  • LuaTeX allows from 3 to 6 decimal digits. Again, the default is 4. My experiments show that setting \pdfvariable decimaldigits=6 in LuaTeX does not have the expected result. The resulting number of digits is 5. Perhaps the reason is the "architectural limit" in ISO, but again, I don't know if I understand that correctly.
  • TeXLive (the major TeX distribution) down-sets \pdfdecimaldigits=3 for both pdfTeX and LuaTeX.

On the other hand, if you can come up with an example that shows visible effects from rounding (and round-tripping between PDFTeX and PDF::Builder)

This was fun! I cooked up an "inner" document and two "outer" documents, where the outer one includes the inner one, with some vertical space in front. One of the outer documents includes the inner one using the MediaBox size, and the other uses the TeX size used to produce the inner document. I set the decimal precision used to produce the inner document to 2 (to simulate the PDF::Builder situation) and went hunting for the right amount (or rather, just the wrong amount) of vertical space which will cause a different page break. And indeed, at some point one of the outer documents produced two pages while the other one still created only one.

inner.tex:

\pdfdecimaldigits=2
%\pdfcompresslevel=0
\documentclass{article}
\usepackage[paperwidth=4cm,paperheight=4cm,margin=0pt,margin=5mm]{geometry}
\begin{document}
\begin{center}
4cm x 4cm,\\where\\4cm\\=\\
\dimen0=4cm\relax\the\dimen0\relax\\=\\113.38582bp
\end{center}
\end{document}

outer1.tex (including inner.pdf with MediaBox size):

\documentclass[a4paper]{article}
\begin{document}
\vspace*{469.785pt}%
\fbox{%
    \pdfximage page 1 mediabox {inner.pdf}%
    \pdfrefximage\pdflastximage
}%
\end{document}

outer2.tex (including inner.pdf exactly at 4x4cm):

\documentclass[a4paper]{article}
\begin{document}
\vspace*{469.785pt}%
\fbox{%
  \pdfximage page 1 mediabox {inner.pdf}%
  \setbox0=\hbox{\pdfrefximage\pdflastximage}%
  \wd0=4cm
  \ht0=4cm
  \dp0=0cm
  \box0
}%
\end{document}

Some calculations. TeX creates the inner document 4cm high. 4cm = 113.81102pt = 113.38582bp. In the MediaBox, this gets rounded to 113.39bp (I checked). Coming back to TeX, this is 113.81521pt. Diff: 0.00419pt. And indeed, this is the range of vertical space (about 469.785pt to 469.789pt) where the two documents have a different number of pages.
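The arithmetic above can be replayed with a short Perl sketch. It uses ordinary floating point and the standard ratios of 72.27 TeX points and 72 big points per inch, whereas TeX itself works in fixed-point scaled points, so the last printed digit may differ slightly from what TeX reports. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $PT_PER_IN = 72.27;   # TeX points per inch
my $BP_PER_IN = 72;      # PDF (big) points per inch

my $height_pt   = 4 / 2.54 * $PT_PER_IN;                 # 4cm in TeX pt
my $height_bp   = $height_pt * $BP_PER_IN / $PT_PER_IN;  # same length in bp
my $mediabox_bp = sprintf("%.2f", $height_bp);           # as a 2-digit writer stores it
my $back_pt     = $mediabox_bp * $PT_PER_IN / $BP_PER_IN;

printf "TeX height:   %.5fpt\n", $height_pt;
printf "MediaBox:     %sbp\n",   $mediabox_bp;
printf "read back as: %.5fpt\n", $back_pt;
printf "difference:   %.5fpt\n", $back_pt - $height_pt;
```

The difference comes out at roughly 0.0042pt, which is the width of the window in which the two outer documents break pages differently.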

Crucially, the issue disappears when I set \pdfdecimaldigits=3 in inner.tex. And not to leave PDF::Builder out of the picture, the issue reappears under the \pdfdecimaldigits=3 setting if I post-process inner.pdf with the following script:

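use strict;
use warnings;
use PDF::API2;

# Round-trip page 1 through PDF::API2, overwriting inner.pdf in place.
my $in  = PDF::API2->open("inner.pdf");
my $out = PDF::API2->new();
$out->import_page($in, 1);
$out->saveas("inner.pdf");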

(Sorry, I used PDF::API2, because as I said above, PDF::Builder refuses to load LaTeX-produced PDFs.)

I'll be happy to think about some control on rounding -- to use the raw input precision or round to a specified amount. The tradeoff for a perfect round-trip (?) is much larger PDF files and probably a slowdown in processing speed. I just don't want to go down this rabbit-hole unless there is a clear benefit to be gained.

That would make my inner perfectionist happy ;-) But the exact implementation is something to think about, indeed. The downsides you list are very valid. The raw input precision makes sense to me, although there is a problem with backward compatibility? Perhaps both raw input and specified amount, selectable, with the default on the latter and 2, would be best?

BTW, I have used PDFLaTeX to generate documentation, and found it quite nice (although I recall having a terrible time trying to get it to use TTF fonts).

Thankfully, those days are over. We now have LuaTeX, which has no problem with TTF, OTF, or whichever fonts. (And before we had LuaTeX, XeTeX filled the gap.)

@PhilterPaper
Owner Author

If your LaTeX-produced PDF does indeed include an object 17, that may be a bug in PDF::Builder (it does extra integrity checks that PDF::API2 doesn't, although I don't think any of them should be fatal -- was this one fatal?). Please open a ticket, including a PDF that shows this behavior.

If you think you've shown visible differences "in the wild" due to the rounding of coordinates, what sort of remedy would be useful to you? I don't want to never round coordinates, as this would unnecessarily bloat PDF file sizes, but I'm willing to add capability to not round from an input PDF file, or to round to a specified number of decimal places. Any newly-added material would probably still be rounded (just not imported material). Let me know what would be useful to you!

@sasozivanovic

If your LaTeX-produced PDF does indeed include an object 17,

Well, my PDF reading skills are not the best, but object 17 seems to be there.

that may be a bug in PDF::Builder (it does extra integrity checks that PDF::API2 doesn't, although I don't think any of them should be fatal -- was this one fatal?). Please open a ticket, including a PDF that shows this behavior.

Will do.

If you think you've shown visible differences "in the wild" due to the rounding of coordinates,

Well, my example above is obviously a constructed one (attaching the resulting PDFs now: outer1.pdf outer2.pdf) but in my experience, every problem eventually crops up in the wild, as well. And then, it's very hard to track down the source of the problem.

what sort of remedy would be useful to you? I don't want to never round coordinates, as this would unnecessarily bloat PDF file sizes, but I'm willing to add capability to not round from an input PDF file, or to round to a specified number of decimal places. Any newly-added material would probably still be rounded (just not imported material). Let me know what would be useful to you!

This sounds very reasonable. An imported PDF page is precisely where someone else already rounded the numbers for you!

@PhilterPaper
Owner Author

The API2 ticket was closed with the observation that the rounding was done to ensure consistency among different implementations:

PDF::API2 limits the number of decimal places so that its tests work properly across multiple architectures. Before it did this, one architecture might stop after ten decimals while another would include sixteen, resulting in different PDFs being generated during tests.

The rounding that PDF::API2 does should be close enough for real-world usage (e.g. 0.01pt = 0.0035mm, and most of the rounding is to four or more decimal places rather than two).

I will still think about giving users control of some sort over the use of rounding, but don't expect anything too soon, as I have a lot of stuff on my plate. We definitely want to do rounding on freshly-created coordinates, to avoid silly (and inconsistent) 15+ digit values, and if input values are already "reasonably" rounded, that could be good enough.

@sasozivanovic

The API2 reply, as well as your comments, made me rethink this whole issue. I realized that insisting on as tight a sanity check as possible can actually lead to false negatives. So I'll fix the size-check tolerance to two digits, and be done with it.
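A tolerance-based size check along these lines might look like the following Perl sketch. It is not the actual Memoize script: the function name `size_matches` is hypothetical, and the default tolerance of 0.01bp (one unit in the second decimal place) is an assumption about what "two digits" means here.

```perl
use strict;
use warnings;

# Hypothetical check: does a MediaBox extent (in bp) match the size TeX
# reports (in pt), within a tolerance expressed in bp?
sub size_matches {
    my ($expected_pt, $mediabox_bp, $tol_bp) = @_;
    $tol_bp //= 0.01;                          # assumed default tolerance
    my $expected_bp = $expected_pt * 72 / 72.27;
    return abs($expected_bp - $mediabox_bp) <= $tol_bp;
}

print size_matches(113.81102, 113.39) ? "match\n" : "mismatch\n";
print size_matches(113.81102, 113.45) ? "match\n" : "mismatch\n";
```

With this tolerance the check accepts a 4cm page whose MediaBox was rounded to 113.39bp, while still rejecting a page that is off by several hundredths of a point.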

@PhilterPaper
Owner Author

OK, if that doesn't place a huge burden on you, it works for me. If you tell me that you've gone ahead and accepted two-digit rounding, I will make any rounding control a lower priority request (and leave this ticket open for now).

  • Freshly created coordinates: round to a specified number of decimal places (default 2, which is the current behavior)
  • Read-in coordinates from a PDF input: by default, leave as-is (no further rounding), or round to a specified number of decimal places

I think the read-in coordinates are not preserved as text, but become floating point numbers, so I may end up having to somehow detect the input precision used, and round to that when writing (if I can't preserve them as text). It requires some thought!
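One way the "detect the input precision" idea could work is to count the decimal digits of each numeric token while it is still text, and reuse that count when writing. The Perl sketch below only illustrates that idea: `decimal_places` and `format_like` are hypothetical helpers, not part of PDF::Builder or PDF::API2, and it assumes the parser can still see the original token.

```perl
use strict;
use warnings;

# Count the digits after the decimal point in a numeric token as it
# appears in the PDF source text (PDF reals have no exponent form).
sub decimal_places {
    my ($token) = @_;
    return $token =~ /\.(\d+)$/ ? length($1) : 0;
}

# Write a value back using the precision its source token had.
sub format_like {
    my ($value, $token) = @_;
    return sprintf("%.*f", decimal_places($token), $value);
}

for my $token (qw(113.386 595.28 842 0.5)) {
    my $value = $token + 0;            # what the parser would keep as a float
    printf "%-8s -> %s\n", $token, format_like($value, $token);
}
```

Storing one small integer per value (or one maximum per imported page) would be enough to round-trip the input without keeping every token as text.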
