Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite loop and memory leak with NBSP after a long word #614

Closed
greycat70 opened this issue Apr 11, 2018 · 12 comments
Closed

Infinite loop and memory leak with NBSP after a long word #614

greycat70 opened this issue Apr 11, 2018 · 12 comments
Labels
bug Existing features not working as expected crash Problems preventing documents from being rendered
Milestone

Comments

@greycat70
Copy link

greycat70 commented Apr 11, 2018

The following example causes weasyprint 0.42 (on Debian 9 amd64) to go into an infinite (or practically infinite) loop and consume all possible memory. I've worked around it in the application that generates the HTML, partially, but this is really quite nasty and my band-aid won't cover all possible user inputs. I've also had to add resource limits on the script that runs weasyprint.

This is a line.
This is another line.
----------------------------------------------------------------------------------------------------------------------------------------  
No more lines.

Ugh, the bug tracking system is rendering the HTML. I don't want it to render the HTML. Here, here's the HTML in a text file that you can download: http://wooledge.org/~greg/weasybug614

@liZe
Copy link
Member

liZe commented Apr 18, 2018

Thank you for this bug report.

I've tried to reproduce the bug, but everything works fine for me, even with older versions of WeasyPrint. Could you please:

  • tell me what is your default font (you can find it by rendering another document with no CSS and check what font is included in your PDF file, or just attach it here if you don't know how to do this),
  • try if possible the current master version of WeasyPrint (warning: it's Python 3 only) and tell me if the bug is gone.

@liZe liZe added crash Problems preventing documents from being rendered bug Existing features not working as expected labels Apr 18, 2018
@liZe liZe added this to the 43 milestone Apr 18, 2018
@greycat70
Copy link
Author

Here's the PDF result from http://wooledge.org/~greg/weasy614.html

weasy614.pdf

@greycat70
Copy link
Author

Following the "git clone" instructions with a python3 virtualenv, the problem still occurs. I get this warning: "UserWarning: There are known rendering problems with cairo < 1.15.4. WeasyPrint may work with older versions, but please read the note about the needed cairo version on the "Install" page of the documentation before reporting bugs."

Debian 9 has libcairo 1.14.8

@liZe
Copy link
Member

liZe commented Apr 18, 2018

Here's the PDF result from http://wooledge.org/~greg/weasy614.html

I have the same default font as you (DejaVu Serif) but I can't reproduce this bug, and in this situation it's hard to help you…

Debian 9 has libcairo 1.14.8

Your issue is probably not related to Cairo.

Do you have the problem when launching weasyprint --verbose http://wooledge.org/~greg/weasybug614 test.pdf?

@greycat70
Copy link
Author

greycat70 commented Apr 18, 2018

(env) wooledg:~$ weasyprint --verbose http://wooledge.org/~greg/weasybug614 test.pdf
/home/wooledg/weasy/WeasyPrint/weasyprint/document.py:32: UserWarning: There are known rendering problems with cairo < 1.15.4. WeasyPrint may work with older versions, but please read the note about the needed cairo version on the "Install" page of the documentation before reporting bugs.
  'There are known rendering problems with cairo < 1.15.4. WeasyPrint '
INFO: Step 1 - Fetching and parsing HTML - http://wooledge.org/~greg/weasybug614
INFO: Step 3 - Applying CSS
INFO: Step 4 - Creating formatting structure
INFO: Step 5 - Creating layout - Page 1
[…]
INFO: Step 5 - Creating layout - Page 111
^CTraceback (most recent call last):
  File "/home/wooledg/weasy/WeasyPrint/env/bin/weasyprint", line 11, in <module>
    load_entry_point('WeasyPrint', 'console_scripts', 'weasyprint')()
  File "/home/wooledg/weasy/WeasyPrint/weasyprint/__main__.py", line 173, in main
    getattr(html, 'write_' + format_)(output, **kwargs)
  File "/home/wooledg/weasy/WeasyPrint/weasyprint/__init__.py", line 185, in write_pdf
    font_config=font_config).write_pdf(
  File "/home/wooledg/weasy/WeasyPrint/weasyprint/__init__.py", line 146, in render
    font_config)
  File "/home/wooledg/weasy/WeasyPrint/weasyprint/document.py", line 329, in _render
    [Page(p, enable_hinting) for p in page_boxes],
  File "/home/wooledg/weasy/WeasyPrint/weasyprint/document.py", line 329, in <listcomp>
    [Page(p, enable_hinting) for p in page_boxes],
  File "/home/wooledg/weasy/WeasyPrint/weasyprint/layout/__init__.py", line 53, in layout_document
    context, root_box, html, cascaded_styles, computed_styles))
  File "/home/wooledg/weasy/WeasyPrint/weasyprint/layout/pages.py", line 600, in make_all_pages
    context, root_box, page_type, resume_at, page_number)
  File "/home/wooledg/weasy/WeasyPrint/weasyprint/layout/pages.py", line 519, in make_page
    positioned_boxes, positioned_boxes, adjoining_margins)
  File "/home/wooledg/weasy/WeasyPrint/weasyprint/layout/blocks.py", line 60, in block_level_layout
    adjoining_margins)
  File "/home/wooledg/weasy/WeasyPrint/weasyprint/layout/blocks.py", line 111, in block_box_layout
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/home/wooledg/weasy/WeasyPrint/weasyprint/layout/blocks.py", line 646, in block_container_layout
    adjoining_margins)
  File "/home/wooledg/weasy/WeasyPrint/weasyprint/layout/blocks.py", line 60, in block_level_layout
    adjoining_margins)
  File "/home/wooledg/weasy/WeasyPrint/weasyprint/layout/blocks.py", line 111, in block_box_layout
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/home/wooledg/weasy/WeasyPrint/weasyprint/layout/blocks.py", line 518, in block_container_layout
    for line, resume_at in lines_iterator:
  File "/home/wooledg/weasy/WeasyPrint/weasyprint/layout/inlines.py", line 50, in iter_line_boxes
    device_size, absolute_boxes, fixed_boxes, first_letter_style)
  File "/home/wooledg/weasy/WeasyPrint/weasyprint/layout/inlines.py", line 107, in get_next_linebox
    waiting_floats, line_children=[])
  File "/home/wooledg/weasy/WeasyPrint/weasyprint/layout/inlines.py", line 771, in split_inline_box
    last_letter + first, child.style['lang'])
  File "/home/wooledg/weasy/WeasyPrint/weasyprint/text.py", line 1188, in can_break_text
    return any(attr.is_line_break for attr in log_attrs[1:length - 1])
KeyboardInterrupt

@liZe
Copy link
Member

liZe commented Apr 18, 2018

I've tried WeasyPrint versions between 0.42 and current master branch, I've tried with Cairo 1.14.x (just in case), I've tried with the same font as you, I've tried to change the number of hyphens, I didn't reproduce your bug.

The only solution I can think of now is to find an old version of WeasyPrint that works for you and to use git bisect to find which commit caused this problem.

@liZe
Copy link
Member

liZe commented Apr 18, 2018

May be related to #585.

@greycat70
Copy link
Author

greycat70 commented Apr 19, 2018

WeasyPrint version 0.40 ("git checkout v0.40") works.
WeasyPrint version 0.42 fails.
WeasyPrint version 0.41 works.

Starting from there, git bisect leads me to:
883cbac is the first bad commit
commit 883cbac
Author: Guillaume Ayoub
Date: Mon Dec 11 17:22:54 2017 +0100

Fix more line-breaking use cases

@Tontyna
Copy link
Contributor

Tontyna commented Apr 19, 2018

Can reproduce the infinite loop. On Windows 7 (64bit), having Pango 1.40.11. With this reduced html:

<html><body>
<br>--------------------------------------------------------------------------------------------------------------------&nbsp;
<br>
</body></html>

The &nbsp; and the two <br> are required to force the bug. The long line must be wider than the page margins and it must follow the first <br> immediately. If it starts on a new line #585 is generated.

When I force the otherwise never-ending make_all_pages to return (e.g. at page_number==13) a PDF filled with pages like that is produced:
endlesslines

Tried to debug but must confess: The way WeasyPrint layouts the pages is a mystery to me. Anyway: It looks like Pango in can_break_text reports that the long line can be broken, but the PangoLayout, used in split_first_line fails to split the line and WeasyPrint gets stuck forever which this non-breakable breakable line.

Definitely resume_at in make_page never reaches None, it stays forever at [0, [0, [1, None]]], whatever that means.

@liZe
Copy link
Member

liZe commented Apr 21, 2018

Can reproduce the infinite loop. On Windows 7 (64bit), having Pango 1.40.11

The bug is in Pango and have been fixed in 1.40.13 (see bug 788115 and bug 785978).

It's becoming harder and harder to have recent versions of WeasyPrint working with older versions of Cairo, Pango, GhostScript, etc. Changes introduced in 0.41 and 0.42 rely on a lot of PDF- and Unicode-related features, and having a bleeding-edge system doesn't help me to find corner cases like this one.

I'm seriously thinking about upgrading the needed library versions on the install page (I already did with Cairo), and adding warnings at runtime and in the documentation if the needed library versions are not installed. WeasyPrint 0.40 seems to be a good candidate for users with older versions. If I had time, I would even backport some fixes of the 0.42.x branch on 0.40 and make the next 0.x versions from this hybrid version (basically with Python 2 support, no pdfrw, no #528).

@Tontyna
Copy link
Contributor

Tontyna commented Apr 21, 2018

Not to mention how hard it is for Windows users like me to keep their GTK up-to-date.

Looking at the Pango bugs and the Pango source, I'm not shure whether even with an up-to-date Pango there won't be similiar issues -- in this case: discrepancies between pango.pango_get_log_attrs(), reporting that a string can be wrapped and PangoLayout not wrapping it at all.

More debugging and experimenting revealed the problem and an attempt to solve it.

Problem

After asking Pango whether a text is breakable -- weasyprint.text.can_break_text() -- WeasyPrint relies on that information and assumes that the text actually will be wrapped when layouted in weasyprint.text.split_first_line().
But that's not always the case. E.g. in Pango 1.40.11 not for

  • xxxöxxx
  • xxx&nbsp;xxx
  • ---&nbsp;---

The last letter in those strings is attributed as is_line_break by pango.pango_get_log_attrs(). Dont know why the ö or the &nbsp; (and probably a lot more 'special chars') have that effect (in Pango < 1.40.13), but they have and as one can see in the Pango source code it obviously tries to fulfill an internationalization-Unicode-whatnot-standard, which looks rather complicated, so bugs are to be expected.
When those strings are layouted, PangoLayout does not wrap the last letter onto a new line -- which seems ok to me, but gives the impression that PangoLayout doesnt coordinate its job with PangoLogAttr 😁

This inconsistency is the cause of #585 and #614.

Idea to work around the Pango bug

If split_inline_level() returns a child_resume_at of None we know that PangoLayout decided not to wrap the text.
Not shure how to handle that case properly (as I said: I don't really understand WeasyPrint*s layout process), but the following patch, prevents crash and infinite loop.

diff --git a/weasyprint/layout/inlines.py b/weasyprint/layout/inlines.py
index 0270de50..57d0d126 100644
--- a/weasyprint/layout/inlines.py
+++ b/weasyprint/layout/inlines.py
@@ -809,8 +809,12 @@ def split_inline_box(context, box, position_x, max_x, skip_stack,
                         if (child.is_in_normal_flow() and
                                 can_break_inside(child)):
                             # This waiting child is in flow and can be broken,
-                            # let's break it!
-                            break_found = True
+                            # let's TRY TO break it!
+
+                            # cant be shure about that!
+                            # Thats what PangoLogAttr told us, have to wait for
+                            # PangoLayout
+                            # break_found = True

                             # We break the waiting child at its last possible
                             # breaking point.
@@ -826,6 +830,13 @@ def split_inline_box(context, box, position_x, max_x, skip_stack,
                                     absolute_boxes, fixed_boxes,
                                     line_placeholders, waiting_floats,
                                     line_children))
+                            # prevent #585 and #614
+                            # TODO: Expert's review required!
+                            break_found = not child_resume_at is None
+                            if child_resume_at is None:
+                                # PangoLayout decided NOT to break the child
+                                child_resume_at = (0, None)
+
                             children = children + waiting_children_copy
                             if child_new_child is None:
                                 # May be None where we have an empty TextBox.

As you can see, there are definitely issues with the <br> boxes -- expert required!

patched

@liZe liZe closed this as completed in c9d1a59 Apr 30, 2018
@liZe
Copy link
Member

liZe commented Apr 30, 2018

Idea to work around the Pango bug

According to what's in the Pango bugs, your workaround seems to be a perfect solution!

As you can see, there are definitely issues with the
boxes -- expert required!

I think you hit the only xfail test of WeasyPrint 😉. git blame following renamed files tells us that it was already known in 2012 (search xfail in d6b78b3).

liZe added a commit that referenced this issue Oct 21, 2018
Solution given by 🍰 @Tontyna 🍰

Fix #614, fix #585.
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Nov 14, 2018
Version 43
----------

Released on 2018-11-09.

Bug fixes:

* `#726 <https://github.com/Kozea/WeasyPrint/issues/726>`_:
  Make empty strings clear previous values of named strings
* `#729 <https://github.com/Kozea/WeasyPrint/issues/729>`_:
  Include tools in packaging

This version also includes the changes from unstable rc1 and rc2 versions
listed below.

Version 43rc2
-------------

Released on 2018-11-02.

**This version is experimental, don't use it in production. If you find bugs,
please report them!**

Bug fixes:

* `#706 <https://github.com/Kozea/WeasyPrint/issues/706>`_:
  Fix text-indent at the beginning of a page
* `#687 <https://github.com/Kozea/WeasyPrint/issues/687>`_:
  Allow query strings in file:// URIs
* `#720 <https://github.com/Kozea/WeasyPrint/issues/720>`_:
  Optimize minimum size calculation of long inline elements
* `#717 <https://github.com/Kozea/WeasyPrint/issues/717>`_:
  Display <details> tags as blocks
* `#691 <https://github.com/Kozea/WeasyPrint/issues/691>`_:
  Don't recalculate max content widths when distributing extra space for tables
* `#722 <https://github.com/Kozea/WeasyPrint/issues/722>`_:
  Fix bookmarks and strings set on images
* `#723 <https://github.com/Kozea/WeasyPrint/issues/723>`_:
  Warn users when string() is not used in page margin


Version 43rc1
-------------

Released on 2018-10-15.

**This version is experimental, don't use it in production. If you find bugs,
please report them!**

Dependencies:

* Python 3.4+ is now needed, Python 2.x is not supported anymore
* Cairo 1.15.4+ is now needed, but 1.10+ should work with missing features
  (such as links, outlines and metadata)
* Pdfrw is not needed anymore

New features:

* `Beautiful website <https://weasyprint.org>`_
* `#579 <https://github.com/Kozea/WeasyPrint/issues/579>`_:
  Initial support of flexbox
* `#592 <https://github.com/Kozea/WeasyPrint/pull/592>`_:
  Support @font-face on Windows
* `#306 <https://github.com/Kozea/WeasyPrint/issues/306>`_:
  Add a timeout parameter to the URL fetcher functions
* `#594 <https://github.com/Kozea/WeasyPrint/pull/594>`_:
  Split tests using modern pytest features
* `#599 <https://github.com/Kozea/WeasyPrint/pull/599>`_:
  Make tests pass on Windows
* `#604 <https://github.com/Kozea/WeasyPrint/pull/604>`_:
  Handle target counters and target texts
* `#631 <https://github.com/Kozea/WeasyPrint/pull/631>`_:
  Enable counter-increment and counter-reset in page context
* `#622 <https://github.com/Kozea/WeasyPrint/issues/622>`_:
  Allow pathlib.Path objects for HTML, CSS and Attachment classes
* `#674 <https://github.com/Kozea/WeasyPrint/issues/674>`_:
  Add extensive installation instructions for Windows

Bug fixes:

* `#558 <https://github.com/Kozea/WeasyPrint/issues/558>`_:
  Fix attachments
* `#565 <https://github.com/Kozea/WeasyPrint/issues/565>`_,
  `#596 <https://github.com/Kozea/WeasyPrint/issues/596>`_,
  `#539 <https://github.com/Kozea/WeasyPrint/issues/539>`_:
  Fix many PDF rendering, printing and compatibility problems
* `#614 <https://github.com/Kozea/WeasyPrint/issues/614>`_:
  Avoid crashes and endless loops caused by a Pango bug
* `#662 <https://github.com/Kozea/WeasyPrint/pull/662>`_:
  Fix warnings and errors when generating documentation
* `#666 <https://github.com/Kozea/WeasyPrint/issues/666>`_,
  `#685 <https://github.com/Kozea/WeasyPrint/issues/685>`_:
  Fix many table layout rendering problems
* `#680 <https://github.com/Kozea/WeasyPrint/pull/680>`_:
  Don't crash when there's no font available
* `#662 <https://github.com/Kozea/WeasyPrint/pull/662>`_:
  Fix support of some align values in tables
liZe added a commit that referenced this issue Sep 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Existing features not working as expected crash Problems preventing documents from being rendered
Projects
None yet
Development

No branches or pull requests

3 participants