change: encode non-JPEG images as PNGs instead of JPEG2000 images #1481

aschmitz · 2021-10-29T05:44:42Z

This uses Pillow to re-encode any non-JPEG image as a PNG, then inline that
image's IDAT chunks as a FlateDecode value, which allows us to reuse the work
from the PNG encoder. This means we'll reencode some PNGs we could have passed
through directly, but that could be changed later. Alpha layers continue to be
handled separately, as appears to be required by the PDF spec.

Provides a potential fix for #1444, which works in local tests (and appears to pass the existing test suite, at least locally).

There's a lingering optimization where we could directly pass through PNGs that meet certain criteria:

Must have no alpha transparency (it has to be broken out into a separate layer).
May need to not be indexed / using a palette (I don't remember if PDF supports this directly, but if it does, it would require more work).
We would need to do more work to handle gamma correction (which PDF does support), color spaces (maybe a PDF feature), rotation, etc. if relevant for the image - although I don't actually know if Pillow is covering those for us now or not.
If we're particularly suspicious of the encoding of the PNG, optimize_size could lead us to push Pillow to re-encode the image, although I suspect this would rarely show a benefit and could increase sizes in already-extra-compressed images.

Because that requires more investigation and work, I haven't done it here, this is mostly just the minimum effort to get the change out the door. It could be worth filing an issue to investigate "PNG passthrough" for the future, but this solves my immediate concerns of JPEG2000 being slow. I expect - but haven't thoroughly tested - that it may also resolve issues like #1475. (For the example in #1462, the output PDF with this patch is 6.06 KiB, which is bigger than version 52, but much smaller than version 53.0.)

This uses Pillow to re-encode any non-JPEG image as a PNG, then inline that image's IDAT chunks as a FlateDecode value, which allows us to reuse the work from the PNG encoder. This means we'll reencode some PNGs we could have passed through directly, but that could be changed later. Alpha layers continue to be handled separately, as appears to be required by the PDF spec.

aschmitz · 2021-10-29T05:51:57Z

I see the failing test, but I don't understand what it's saying. Do you understand what the issue is? I suspect there's probably something up with the way we're lightly abusing BytesIO (possibly the .seek(0), although that seems to work fine), but I'll admit that Python is not a native language for me.

(Feel free to push a fix to my branch, of course, or fix-and-merge some other way if that's most expedient for you.)

liZe · 2021-10-30T09:53:33Z

💜 Thanks a lot for this pull request! 💜

There's a lingering optimization where we could directly pass through PNGs that meet certain criteria:

Sure. As you said, it can be done later, this solution is probably already faster than using JPEG2000.

I expect - but haven't thoroughly tested - that it may also resolve issues like #1475. (For the example in #1462, the output PDF with this patch is 6.06 KiB, which is bigger than version 52, but much smaller than version 53.0.)

#1475 is probably solved (@gnyers and @mpth, feel free to test when it’s merged!). #1462 uses a 1-bit image as far as I can remember, so it could be improved if we can use a palette in the future.

We’ll probably have to get a lot of different image formats for tests. We should have done this in version 53 already, but it would be a good topic for version 55.

I see the failing test, but I don't understand what it's saying. Do you understand what the issue is? I suspect there's probably something up with the way we're lightly abusing BytesIO (possibly the .seek(0), although that seems to work fine), but I'll admit that Python is not a native language for me.

I suppose that the linter doesn’t like document.py (maybe a useless import), and that a bug in a pytest plugin gives this error. Nothing to worry about, I’ll fix it!

liZe · 2021-10-30T10:15:08Z

(For the record: the bug is tholo/pytest-flake8#81.)

mpth · 2021-10-30T10:21:32Z

Do I have to clone the repo and build the binary myself or can I get it via pip or from elsewhere?

liZe · 2021-10-30T10:36:59Z

Do I have to clone the repo and build the binary myself or can I get it via pip or from elsewhere?

You can use pip install git+https://github.com/Kozea/WeasyPrint.

liZe merged commit c3136ad into Kozea:master Oct 30, 2021

liZe added the performance Too slow renderings label Oct 30, 2021

liZe added this to the 54.0 milestone Oct 30, 2021

liZe mentioned this pull request Oct 30, 2021

Raster image rendering performance dropped #1444

Closed

liZe mentioned this pull request Mar 4, 2022

Performance issue on long documents #578

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

change: encode non-JPEG images as PNGs instead of JPEG2000 images #1481

change: encode non-JPEG images as PNGs instead of JPEG2000 images #1481

aschmitz commented Oct 29, 2021

aschmitz commented Oct 29, 2021

liZe commented Oct 30, 2021

liZe commented Oct 30, 2021

mpth commented Oct 30, 2021

liZe commented Oct 30, 2021

change: encode non-JPEG images as PNGs instead of JPEG2000 images #1481

change: encode non-JPEG images as PNGs instead of JPEG2000 images #1481

Conversation

aschmitz commented Oct 29, 2021

aschmitz commented Oct 29, 2021

liZe commented Oct 30, 2021

liZe commented Oct 30, 2021

mpth commented Oct 30, 2021

liZe commented Oct 30, 2021