
Embed jpeg images in pdf #6

Closed
mbarkhau opened this Issue · 14 comments

4 participants

@mbarkhau

Currently JPEG images are converted to PNG. Cairo supports embedding JPEGs via cairo_surface_set_mime_data.

However, this function doesn't seem to have been exposed via pycairo yet. It would be great if this were implemented: currently my PDF files are ca. 10x the size they need to be.

@SimonSapin
Owner

Yes, this would be nice to have. The docs say "See corresponding backend surface docs for details about which MIME types it can handle." I didn’t find such docs for PDF, but looking through the source shows some signs of supporting JPEG and JPEG2000, so that looks good.

As you say, it’s not on pycairo yet. There are a few options:

  1. Take this to the cairo mailing-list, wait for someone to feel like doing it, then wait for the next pycairo version.
  2. Take this to the cairo mailing-list, do it ourselves, then wait for the next pycairo version.
  3. Do it in C in a separate extension that uses the pycairo C API.
  4. Do it in Cython in a separate extension that uses the pycairo C API.

Which do you like most?

@mbarkhau

I don't think 3 or 4 are very good options. I've already written to the cairo mailing-list concerning 1; I'd like to see what they think before digging into the pycairo implementation myself.

@SimonSapin
Owner

Great. I subscribed to the cairo ML, we’ll see how it goes.

@SimonSapin
Owner

Actually this shouldn’t be too hard to do in pycairo. Accept any object with the buffer protocol, INCREF it in Surface.set_mime_data (on CAIRO_STATUS_SUCCESS) and register a destroy callback that does a DECREF.
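
The ownership rule described here can be modeled in pure Python. This is only a sketch and not pycairo code: `ModelSurface` is a hypothetical stand-in for a surface, and a `memoryview` plays the role of the INCREF, since it keeps the buffer's owner pinned until it is released (the analogue of the destroy callback's DECREF).

```python
class ModelSurface:
    """Hypothetical stand-in for a cairo surface; not part of pycairo."""

    def __init__(self):
        # MIME type -> memoryview that keeps the attached data alive.
        self._mime_data = {}

    def set_mime_data(self, mime_type, data):
        # memoryview() raises TypeError unless `data` supports the
        # buffer protocol, so nothing is replaced on failure.
        view = memoryview(data)
        # Only after success: drop the previous data, then keep the new one.
        self._destroy(mime_type)
        self._mime_data[mime_type] = view

    def _destroy(self, mime_type):
        # Analogue of the destroy callback: give up our reference.
        old = self._mime_data.pop(mime_type, None)
        if old is not None:
            old.release()

    def finish(self):
        # Release everything still attached to the surface.
        for mime_type in list(self._mime_data):
            self._destroy(mime_type)


surface = ModelSurface()
# Placeholder bytes for illustration only, not a decodable image.
jpeg_bytes = bytearray(b"\xff\xd8\xff\xe0 placeholder")
surface.set_mime_data("image/jpeg", jpeg_bytes)
```

While the surface holds the view, the `bytearray` cannot be resized; after `surface.finish()` it can be again. In a real C binding the same guarantee would come from the reference count rather than from a `memoryview`.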

@mbarkhau

It doesn't seem that anybody on the mailing list is picking up on this.

@mbarkhau

Looks like I spoke too soon.

The function should appear in the next version of pycairo. You could
open a bug report if you want to make sure it is not forgotten, and
submit a patch if you are really keen.

The question then is when the next version of pycairo will come out. By the looks of it, releases happen about once a year, so the next one is already overdue.

@SimonSapin
Owner

I’ll have a go at it unless you want to. Assuming we get a working patch, the question remains of when it will be in a release.
By the way, do you use Python 2 or 3? (py2cairo and pycairo are separate code bases.)

@mbarkhau

I won't have time for at least another month, so by all means have a go.
We're currently on python 2.

@liujuncn

Is there a patch for it?

@SimonSapin
Owner

@liujuncn not yet. The plan goes like this:

  1. Patch pycairo and py2cairo to add Surface.set_mime_data(), exposing cairo’s existing function.
  2. Patch WeasyPrint to use it if available (we still want to support older versions of pycairo).

(I think it’s a bad idea to start 2 speculatively before 1 is done at least in a git version of pycairo.)

I intend to work on this at some point but it’s low-priority for me at the moment. Anyone interested is welcome to start on 1.
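
As a rough illustration of "use it if available" in step 2, feature detection could look like the sketch below. The helper name `supports_mime_data` and the flag `CAN_EMBED_JPEG` are made up for this example and are not WeasyPrint's actual code; the only assumption about the binding is that the method ends up on `cairo.Surface` under the name `set_mime_data`, as planned in step 1.

```python
def supports_mime_data(surface_class):
    """Return True if the given class exposes a callable set_mime_data."""
    return callable(getattr(surface_class, "set_mime_data", None))


try:
    import cairo  # pycairo or py2cairo, if installed
except ImportError:
    cairo = None

# False when the binding is missing or too old to have the method.
CAN_EMBED_JPEG = cairo is not None and supports_mime_data(
    getattr(cairo, "Surface", None)
)
```

Code that draws images could then check `CAN_EMBED_JPEG` once and fall back to the existing behaviour when it is false.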

@SimonSapin
Owner

Hi @liujuncn .

For 1. see the CSS for Paged Media spec. (It’s down at the moment but should be back soon.) At some point I plan on writing some documentation that is more author-friendly than a spec, but it’s not there yet :) In brief: @page { size: A5 portrait; margin: 1cm }

  2. seems related to #9

However both of your questions are unrelated to this bug. If you have further questions, please start a new issue or write to the mailing list or IRC; keep this about embedding JPEG images in PDF.

@SimonSapin
Owner

The patch for pycairo is ready:
SimonSapin/py2cairo@ebe487c
http://lists.cairographics.org/archives/cairo/2012-December/023839.html

The change in WeasyPrint should be easy. What remains is having a pycairo release.

@seiflotfy

Assuming 1 gets accepted, what in WeasyPrint needs to be modified?

@SimonSapin closed this issue from a commit
Embed JPEG-encoded images in PDF. Fix #6
If an image is in JPEG format, embed it as-is in the PDF output.
This often results in smaller PDF file size.

(The image is still decoded however,
so there is no rendering speed improvement.)
f243dbc
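
One simple way to decide whether an image "is in JPEG format" is to look at its first bytes. The sketch below is an assumption about how such a check could be done, not a description of what the commit does; `looks_like_jpeg` is a hypothetical helper name.

```python
# A JPEG stream starts with the SOI marker (FF D8), followed by the
# first byte (FF) of the next marker.
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(data):
    """Return True if the byte string starts like a JPEG file."""
    return bytes(data[:3]) == JPEG_MAGIC
```

For example, `looks_like_jpeg(b"\x89PNG\r\n\x1a\n")` is false because that is the PNG signature.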
@SimonSapin
Owner

@mbarkhau, @liujuncn, @seiflotfy: using WeasyPrint’s git version with the py2cairo patch should Just Work (no change in client code). Please test :)

Ideally, run the test suite (see docs on the website). Otherwise, test with b'/Filter /DCTDecode' in HTML(…).write_pdf()
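
The quick check above can be wrapped in a small helper. This is a minimal sketch: `has_embedded_jpeg` is a hypothetical name, and the exact byte string is the one suggested in this comment, so it assumes the PDF writer emits the filter entry with that spacing.

```python
def has_embedded_jpeg(pdf_bytes):
    """Return True if the PDF contains a stream using the DCTDecode filter."""
    return b"/Filter /DCTDecode" in pdf_bytes
```

Pass it the bytes returned by write_pdf(). A false result with a JPEG in the document would suggest the image was still re-encoded rather than embedded as-is.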
