Problem with printing weasyprint generated PDF #565

Closed
edkirin opened this issue Jan 25, 2018 · 10 comments
Labels: bug

@edkirin edkirin commented Jan 25, 2018

I need to print a PDF file generated with WeasyPrint using the CUPS PDF printer on a Linux machine. I know it's redundant, but that's the requirement.

The problem is that printing a PDF file generated with WeasyPrint results in a crippled PDF file.

Here are the files.
generated_by_weasyprint.pdf
printed_with_cups.pdf

I'm printing the file with:
$ lp -d PDF generated_by_weasyprint.pdf

eden@sunce:/tmp> pdfinfo generated_by_weasyprint.pdf 
Title:          {{ title }}
Keywords:       
Author:         
Creator:        cairo 1.14.6 (http://cairographics.org)
Producer:       WeasyPrint 0.42 (http://weasyprint.org/)
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          1
Encrypted:      no
Page size:      595 x 841 pts
Page rot:       0
File size:      97265 bytes
Optimized:      no
PDF version:    1.3

The same problem appears when printing the same file from Python using the cups lib. I guess WeasyPrint generates an invalid PDF file somehow. I didn't have this problem with WeasyPrint v0.40 and earlier.

Environment:

  • Linux Mint 18.2
  • python 3.5.2
  • weasyprint 0.42
  • cups-pdf 2.6.1
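
For anyone trying to reproduce this from Python without the cups bindings, the `lp` call above can be wrapped with `subprocess`. This is only a minimal sketch, not the reporter's script: it assumes `lp` is on the PATH and that the CUPS-PDF queue is named `PDF` as in the command above. `build_lp_command` and `print_pdf` are hypothetical helper names.

```python
import shutil
import subprocess


def build_lp_command(pdf_path, printer="PDF"):
    """Return the argv list equivalent to `lp -d <printer> <file>`."""
    return ["lp", "-d", printer, pdf_path]


def print_pdf(pdf_path, printer="PDF"):
    """Send a PDF to a CUPS queue and return whatever lp printed.

    Raises subprocess.CalledProcessError if lp exits with a non-zero status.
    """
    if shutil.which("lp") is None:
        raise RuntimeError("lp not found; is the CUPS client installed?")
    output = subprocess.check_output(
        build_lp_command(pdf_path, printer),
        stderr=subprocess.STDOUT,
    )
    return output.decode(errors="replace")
```

Calling `print_pdf("generated_by_weasyprint.pdf")` should queue the same job as the shell command in the report.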

@edkirin edkirin commented Jan 25, 2018

Here are source files for html to reproduce the problem. Please note that file:///tmp/xx/src/ in html should be replaced by file path where those resource files are unpacked.

src.zip
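
The path replacement described above can also be scripted instead of edited by hand. A minimal sketch, assuming the HTML only refers to its resources through the literal prefix `file:///tmp/xx/src/`; `rebase_resource_urls` is a hypothetical helper name, not part of WeasyPrint.

```python
from pathlib import Path


def rebase_resource_urls(html_text, unpack_dir,
                         old_prefix="file:///tmp/xx/src/"):
    """Point hard-coded file:// resource URLs at the unpacked directory."""
    # Path.as_uri() only works on absolute paths, so resolve first.
    new_prefix = Path(unpack_dir).resolve().as_uri()
    # as_uri() normally has no trailing slash, but the old prefix does.
    if not new_prefix.endswith("/"):
        new_prefix += "/"
    return html_text.replace(old_prefix, new_prefix)
```

The rewritten string can then be passed to WeasyPrint in place of the original HTML.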

@liZe liZe added this to the 43 milestone Jan 26, 2018

@liZe liZe commented Jan 26, 2018

I thought that it was a duplicate of #550, but convert doesn't find any problem in your generated file. There must be something wrong with the files generated by pdfrw.

It's really frustrating to get these errors only with some implementations. At least ImageMagick (and thus GhostScript), Evince (and thus Poppler) and Google Chrome can read this PDF file perfectly. Is the problem in Cairo, WeasyPrint, pdfrw, or Cups? 😢


@jonlesser jonlesser commented Mar 24, 2018

I believe I am experiencing a similar issue. I am generating a PDF which I send to a label printer with the lp command. Some PDFs generated with WeasyPrint v41+ consistently print with missing parts.

I use Jinja2 to fill an HTML template string which I then pass to WeasyPrint to generate a PDF. When I inspect the HTML I always see the content as I expect. When I view the generated PDFs in a Chrome tab or in Preview, they always appear to have all of the content I expect.

There are not very many elements to my template: ID number, timestamp, item description, and a barcode that encodes the ID number. I can tweak these values in ways that will result in a fully printed label, but I don't understand why certain values result in fully printed labels and some result in partial labels. When it's partial, it consistently just prints the logo (svg) and a horizontal rule.

For example, a label with ID number 90809 results in a fully printed ticket, but simply changing the ID number to 92101 results in a partially printed label. I can likewise get some partial tickets by just tweaking the timestamp or the item description.

I initially observed this with 0.42.1. After reading this issue, I tried downgrading to previous versions. Version 0.41 also has this problem. Versions 0.40 and 0.39 do not have this problem. I cycled through versions with a simple "sudo pip install WeasyPrint==0.XX".
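
The version cycling described above could be automated roughly as follows. This is a sketch under assumptions: it installs into whatever environment runs the script (a virtualenv is probably safer than `sudo pip`), it renders from an HTML file rather than the Jinja2 template, and the helper names and the `label_<version>.pdf` output names are made up for illustration.

```python
import subprocess
import sys

# Versions mentioned in this thread.
VERSIONS = ["0.39", "0.40", "0.41", "0.42.1"]


def pip_install_command(version):
    """argv for installing one pinned WeasyPrint release."""
    return [sys.executable, "-m", "pip", "install", "WeasyPrint==" + version]


def render_command(html_path, pdf_path):
    """argv that renders html_path to pdf_path in a fresh interpreter."""
    # A new process guarantees the freshly installed version is imported.
    code = (
        "import sys, weasyprint; "
        "weasyprint.HTML(filename=sys.argv[1]).write_pdf(sys.argv[2])"
    )
    return [sys.executable, "-c", code, html_path, pdf_path]


def render_with_each_version(html_path, versions=VERSIONS):
    """Install each version in turn and render one PDF per version."""
    for version in versions:
        subprocess.check_call(pip_install_command(version))
        subprocess.check_call(
            render_command(html_path, "label_%s.pdf" % version))
```

Each resulting PDF can then be sent to the printer to see which versions produce partial labels.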

The attached zip file contains a PDF generated with 0.40 and one generated with 0.41 along with a photo of the resulting labels. I also attached the output of pip freeze and dpkg -l.

v41regression.zip
pip_freeze.txt
dpkg_versions.txt

Python snippet where I'm generating the PDF:

  import json
  import logging
  import subprocess
  import tempfile

  import weasyprint

  # `message`, `template` and `barcode_fetcher` are defined elsewhere.

  # Fill HTML template.
  data = json.loads(message.data)
  template_str = template.render(data=data)

  # Convert HTML to a PDF, save it to disk, and send it to the printer.
  html = weasyprint.HTML(string=template_str, url_fetcher=barcode_fetcher)
  # write_pdf() returns bytes, so the temporary file is opened in binary mode.
  with tempfile.NamedTemporaryFile(mode='wb') as tmp_file:
    tmp_file.file.write(html.write_pdf())
    tmp_file.file.flush()
    # If we can't print, we won't ack the message.
    try:
      output = subprocess.check_output(
          ['/usr/bin/lp', '-o', 'outputorder=reverse', tmp_file.name],
          stderr=subprocess.STDOUT)
      logging.info('lp output: %s', output)
    except subprocess.CalledProcessError as e:
      logging.error('lp complained. `%s` returned code %d. Output: %s',
                    e.cmd, e.returncode, e.output)

@liZe liZe commented Mar 24, 2018

I wonder if #596 is not a duplicate of this bug.

@edkirin @jonlesser Which version of Cairo do you use? If you have 1.14.x, could you please try 1.15.x?


@jonlesser jonlesser commented Mar 24, 2018

I currently have the libcairo2 1.14.6-1 package installed on my Ubuntu 16.04 system. That's the latest for 16.04. There is a 1.15.10-2 package for Ubuntu 18.04, but upgrading is not an option for me right now.


@liZe liZe commented Mar 24, 2018

I have no problem reading these files with various PDF viewers or converters. As the text is missing, I'm pretty sure now that this issue and #596, #550 and #523 are the same. They all appear with PDF files generated by Cairo 1.14 with a version of WeasyPrint based on pdfrw.

Based on how pdfrw works and on the different results I get using Python 2 and Python 3, I think that the problem is caused by the way pdfrw "shuffles" data. PDF files generated by Cairo 1.14 and modified by pdfrw are sometimes "wrong". PDF generation changed a lot in 1.15.4 and is now (hopefully) fixed.
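
To check whether a given system falls on the affected side of that Cairo change, something like the following could be used. It is a rough sketch: the 1.15.4 threshold is simply the version named above, and `parse_version` and `cairo_may_be_affected` are hypothetical helpers. The version string is passed in by the caller, for example copied from the `libcairo2` package version or from the `Creator` line of `pdfinfo`.

```python
# First Cairo release named in this thread as having the reworked PDF output.
FIXED_CAIRO = (1, 15, 4)


def parse_version(text):
    """Turn '1.14.6' or a package version like '1.14.6-1' into (1, 14, 6)."""
    upstream = text.strip().split("-")[0]  # drop a distro revision suffix
    return tuple(int(part) for part in upstream.split("."))


def cairo_may_be_affected(version_string):
    """True if the given Cairo version predates the 1.15.4 PDF changes."""
    return parse_version(version_string) < FIXED_CAIRO
```

With the versions mentioned in this thread, `1.14.6-1` is on the affected side and `1.15.10-2` is not.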

It's hard to know where the "real" bug is. The v41 PDF provided by @jonlesser works with many different implementations and is OK according to some validators I've found online. If anyone knows a PDF guru, we'd be happy to know what's wrong in these documents, or at least what's different between your documents with different IDs.

liZe added a commit that referenced this issue Apr 13, 2018
Related to #339, #565, #616.

@liZe liZe commented Apr 21, 2018

@jonlesser I've found that a bug has been fixed in Ghostscript 9.21, and Ubuntu 16.04 provides only 9.18. Are you able to update Ghostscript?


@liZe liZe commented Apr 21, 2018

@edkirin What's your version of Ghostscript?


@edkirin edkirin commented Apr 21, 2018

@liZe Ghostscript 9.18, the default one which comes with Linux Mint 18.2.


@liZe liZe commented Apr 21, 2018

> Ghostscript 9.18, the default one which comes with Linux Mint 18.2.

Then I'd be really interested to know if you get the same problem with Ghostscript 9.21+ (even on another computer if you can't upgrade Ghostscript on this one).
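
A quick way to see which side of 9.21 a machine is on, as a minimal sketch: it assumes the `gs` executable is on the PATH and that `gs --version` prints a bare version number such as `9.18`. The helper names are hypothetical.

```python
import re
import subprocess

# Ghostscript release mentioned above as containing the fix.
FIXED_GHOSTSCRIPT = (9, 21)


def parse_gs_version(output):
    """Extract (major, minor) from the text printed by `gs --version`."""
    match = re.search(r"(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no version number in %r" % output)
    return int(match.group(1)), int(match.group(2))


def installed_gs_version():
    """Ask the local gs binary for its version."""
    output = subprocess.check_output(["gs", "--version"])
    return parse_gs_version(output.decode())


def gs_predates_fix(version):
    """True if the (major, minor) version is older than 9.21."""
    return version < FIXED_GHOSTSCRIPT
```

On a system like the ones reported here, `gs_predates_fix(installed_gs_version())` would be expected to return `True`.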

liZe added a commit that referenced this issue Aug 6, 2018
pdfrw is a great piece of software, but we don't know PDF well enough to debug the
problems we've met. It's safer to use the new cairo API and get back to manual
editing for attachments and bleed boxes.

We only have two regressions for now:
- some internal links are broken,
- PDF producer is not overwritten.

A mail has been sent to cairo's mailing-list about that:
https://lists.cairographics.org/archives/cairo/2018-August/028694.html

Fix #639, #615, fix #596, fix #565.
@liZe liZe closed this in #665 Aug 17, 2018
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Nov 14, 2018
Version 43
----------

Released on 2018-11-09.

Bug fixes:

* `#726 <https://github.com/Kozea/WeasyPrint/issues/726>`_:
  Make empty strings clear previous values of named strings
* `#729 <https://github.com/Kozea/WeasyPrint/issues/729>`_:
  Include tools in packaging

This version also includes the changes from unstable rc1 and rc2 versions
listed below.

Version 43rc2
-------------

Released on 2018-11-02.

**This version is experimental, don't use it in production. If you find bugs,
please report them!**

Bug fixes:

* `#706 <https://github.com/Kozea/WeasyPrint/issues/706>`_:
  Fix text-indent at the beginning of a page
* `#687 <https://github.com/Kozea/WeasyPrint/issues/687>`_:
  Allow query strings in file:// URIs
* `#720 <https://github.com/Kozea/WeasyPrint/issues/720>`_:
  Optimize minimum size calculation of long inline elements
* `#717 <https://github.com/Kozea/WeasyPrint/issues/717>`_:
  Display <details> tags as blocks
* `#691 <https://github.com/Kozea/WeasyPrint/issues/691>`_:
  Don't recalculate max content widths when distributing extra space for tables
* `#722 <https://github.com/Kozea/WeasyPrint/issues/722>`_:
  Fix bookmarks and strings set on images
* `#723 <https://github.com/Kozea/WeasyPrint/issues/723>`_:
  Warn users when string() is not used in page margin


Version 43rc1
-------------

Released on 2018-10-15.

**This version is experimental, don't use it in production. If you find bugs,
please report them!**

Dependencies:

* Python 3.4+ is now needed, Python 2.x is not supported anymore
* Cairo 1.15.4+ is now needed, but 1.10+ should work with missing features
  (such as links, outlines and metadata)
* Pdfrw is not needed anymore

New features:

* `Beautiful website <https://weasyprint.org>`_
* `#579 <https://github.com/Kozea/WeasyPrint/issues/579>`_:
  Initial support of flexbox
* `#592 <https://github.com/Kozea/WeasyPrint/pull/592>`_:
  Support @font-face on Windows
* `#306 <https://github.com/Kozea/WeasyPrint/issues/306>`_:
  Add a timeout parameter to the URL fetcher functions
* `#594 <https://github.com/Kozea/WeasyPrint/pull/594>`_:
  Split tests using modern pytest features
* `#599 <https://github.com/Kozea/WeasyPrint/pull/599>`_:
  Make tests pass on Windows
* `#604 <https://github.com/Kozea/WeasyPrint/pull/604>`_:
  Handle target counters and target texts
* `#631 <https://github.com/Kozea/WeasyPrint/pull/631>`_:
  Enable counter-increment and counter-reset in page context
* `#622 <https://github.com/Kozea/WeasyPrint/issues/622>`_:
  Allow pathlib.Path objects for HTML, CSS and Attachment classes
* `#674 <https://github.com/Kozea/WeasyPrint/issues/674>`_:
  Add extensive installation instructions for Windows

Bug fixes:

* `#558 <https://github.com/Kozea/WeasyPrint/issues/558>`_:
  Fix attachments
* `#565 <https://github.com/Kozea/WeasyPrint/issues/565>`_,
  `#596 <https://github.com/Kozea/WeasyPrint/issues/596>`_,
  `#539 <https://github.com/Kozea/WeasyPrint/issues/539>`_:
  Fix many PDF rendering, printing and compatibility problems
* `#614 <https://github.com/Kozea/WeasyPrint/issues/614>`_:
  Avoid crashes and endless loops caused by a Pango bug
* `#662 <https://github.com/Kozea/WeasyPrint/pull/662>`_:
  Fix warnings and errors when generating documentation
* `#666 <https://github.com/Kozea/WeasyPrint/issues/666>`_,
  `#685 <https://github.com/Kozea/WeasyPrint/issues/685>`_:
  Fix many table layout rendering problems
* `#680 <https://github.com/Kozea/WeasyPrint/pull/680>`_:
  Don't crash when there's no font available
* `#662 <https://github.com/Kozea/WeasyPrint/pull/662>`_:
  Fix support of some align values in tables