Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change: encode non-JPEG images as PNGs instead of JPEG2000 images #1481

Merged
merged 1 commit into from Oct 30, 2021

Conversation

aschmitz
Copy link
Contributor

This uses Pillow to re-encode any non-JPEG image as a PNG, then inline that
image's IDAT chunks as a FlateDecode value, which allows us to reuse the work
from the PNG encoder. This means we'll reencode some PNGs we could have passed
through directly, but that could be changed later. Alpha layers continue to be
handled separately, as appears to be required by the PDF spec.

Provides a potential fix for #1444, which works in local tests (and appears to pass the existing test suite, at least locally).

There's a lingering optimization where we could directly pass through PNGs that meet certain criteria:

  • Must have no alpha transparency (it has to be broken out into a separate layer).
  • May need to not be indexed / using a palette (I don't remember if PDF supports this directly, but if it does, it would require more work).
  • We would need to do more work to handle gamma correction (which PDF does support), color spaces (maybe a PDF feature), rotation, etc. if relevant for the image - although I don't actually know if Pillow is covering those for us now or not.
  • If we're particularly suspicious of the encoding of the PNG, optimize_size could lead us to push Pillow to re-encode the image, although I suspect this would rarely show a benefit and could increase sizes in already-extra-compressed images.

Because that requires more investigation and work, I haven't done it here, this is mostly just the minimum effort to get the change out the door. It could be worth filing an issue to investigate "PNG passthrough" for the future, but this solves my immediate concerns of JPEG2000 being slow. I expect - but haven't thoroughly tested - that it may also resolve issues like #1475. (For the example in #1462, the output PDF with this patch is 6.06 KiB, which is bigger than version 52, but much smaller than version 53.0.)

This uses Pillow to re-encode any non-JPEG image as a PNG, then inline that
image's IDAT chunks as a FlateDecode value, which allows us to reuse the work
from the PNG encoder. This means we'll reencode some PNGs we could have passed
through directly, but that could be changed later. Alpha layers continue to be
handled separately, as appears to be required by the PDF spec.
@aschmitz
Copy link
Contributor Author

I see the failing test, but I don't understand what it's saying. Do you understand what the issue is? I suspect there's probably something up with the way we're lightly abusing BytesIO (possibly the .seek(0), although that seems to work fine), but I'll admit that Python is not a native language for me.

(Feel free to push a fix to my branch, of course, or fix-and-merge some other way if that's most expedient for you.)

@liZe
Copy link
Member

liZe commented Oct 30, 2021

💜 Thanks a lot for this pull request! 💜

There's a lingering optimization where we could directly pass through PNGs that meet certain criteria:

Sure. As you said, it can be done later, this solution is probably already faster than using JPEG2000.

I expect - but haven't thoroughly tested - that it may also resolve issues like #1475. (For the example in #1462, the output PDF with this patch is 6.06 KiB, which is bigger than version 52, but much smaller than version 53.0.)

#1475 is probably solved (@gnyers and @mpth, feel free to test when it’s merged!). #1462 uses a 1-bit image as far as I can remember, so it could be improved if we can use a palette in the future.

We’ll probably have to get a lot of different image formats for tests. We should have done this in version 53 already, but it would be a good topic for version 55.

I see the failing test, but I don't understand what it's saying. Do you understand what the issue is? I suspect there's probably something up with the way we're lightly abusing BytesIO (possibly the .seek(0), although that seems to work fine), but I'll admit that Python is not a native language for me.

I suppose that the linter doesn’t like document.py (maybe a useless import), and that a bug in a pytest plugin gives this error. Nothing to worry about, I’ll fix it!

@liZe liZe merged commit c3136ad into Kozea:master Oct 30, 2021
@liZe liZe added the performance Too slow renderings label Oct 30, 2021
@liZe liZe added this to the 54.0 milestone Oct 30, 2021
@liZe
Copy link
Member

liZe commented Oct 30, 2021

(For the record: the bug is tholo/pytest-flake8#81.)

@mpth
Copy link

mpth commented Oct 30, 2021

Do I have to clone the repo and build the binary myself or can I get it via pip or from elsewhere?

@liZe
Copy link
Member

liZe commented Oct 30, 2021

Do I have to clone the repo and build the binary myself or can I get it via pip or from elsewhere?

You can use pip install git+https://github.com/Kozea/WeasyPrint.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Too slow renderings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants