Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnboundLocalError: local variable 'lower_guess' referenced before assignment #770

Closed
andreas-gruenbacher opened this Issue Jan 16, 2019 · 38 comments

Comments

3 participants
@andreas-gruenbacher
Copy link

andreas-gruenbacher commented Jan 16, 2019

I'm having issue #409 with weasyprint 44. That issue is closed and I don't see how I could reopen it. I've described what's happening in that issue for now.

@liZe liZe added crash bug labels Jan 16, 2019

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 18, 2019

Unfortunately, I can't reproduce your error, I get a beautiful PDF. Do you have the problem when setting other fonts in your CSS file?

@andreas-gruenbacher

This comment has been minimized.

Copy link
Author

andreas-gruenbacher commented Jan 18, 2019

That's interesting. The crash only triggers with white-space:nowrap and with either font-family:"Trebuchet MS" or font-family:Arial for me. The Trebuchet MS and Arial fonts are from Microsoft's old TrueType core fonts for the Web from a long time ago. Microsoft doesn't offer these for download anymore, but the Wayback Machine links still work. Extracted the exe files with cabextract.

It might be that the fonts themselves are broken, but even if so, that still shouldn't cause WeasyPrint to crash.

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 19, 2019

Even with the fonts I can't reproduce. I've set font-family to Trebuchet Ms or Arial, but everything works as expected, and the fonts embedded in the PDF files are the Microsoft fonts.

Is it possible for you to edit the /usr/local/lib/python3.7/site-packages/weasyprint/layout/tables.py file and add the line print(guess, assignable_width) on line 588 (before if upper_guess == lower_guess:, with the same number of spaces before)? Relaunching WeasyPrint should give you numbers that will help me to understand what's wrong.

@andreas-gruenbacher

This comment has been minimized.

Copy link
Author

andreas-gruenbacher commented Jan 19, 2019

Sure, here's what I'm getting immediately before the traceback:
[20.73333333333333, 21.9947265625, 141.26666666666665, 31.93333333333333, 114.06666666666666, 143.9333333333333, 20.2, 20.2, 20.2, 20.2, 25.53333333333333, 18.066666666666666, 18.066666666666666, 18.066666666666666, 18.066666666666666, 23.4] 662.5947265625002

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 20, 2019

Ooops, I mean print(guesses, assignable_width).

@andreas-gruenbacher

This comment has been minimized.

Copy link
Author

andreas-gruenbacher commented Jan 20, 2019

([20.73333333333333, 21.9947265625, 141.26666666666665, 31.93333333333333, 114.06666666666666, 143.9333333333333, 20.2, 20.2, 20.2, 20.2, 25.53333333333333, 18.066666666666666, 18.066666666666666, 18.066666666666666, 18.066666666666666, 23.4], [20.73333333333333, 21.9947265625, 141.26666666666665, 31.93333333333333, 114.06666666666666, 143.9333333333333, 20.2, 20.2, 20.2, 20.2, 25.53333333333333, 18.066666666666666, 18.066666666666666, 18.066666666666666, 18.066666666666666, 23.4], [20.73333333333333, 21.9947265625, 141.26666666666665, 31.93333333333333, 114.06666666666666, 143.9333333333333, 20.2, 20.2, 20.2, 20.2, 25.53333333333333, 18.066666666666666, 18.066666666666666, 18.066666666666666, 18.066666666666666, 23.4], [20.73333333333333, 21.9947265625, 141.26666666666665, 31.93333333333333, 100.73333333333333, 143.9333333333333, 20.2, 20.2, 20.2, 20.2, 25.53333333333333, 18.066666666666666, 18.066666666666666, 18.066666666666666, 18.066666666666666, 23.4]) 662.5947265625002

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 20, 2019

Thank you.

OK, we don't have a floating point rounding error, which is good news. Minimum content width is larger than maximum content width, I have to understand why.

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 20, 2019

I'm afraid it's a duplicate of #628. You only get the problem with white-space:nowrap because it makes the string ' A' (space + A) appear in some cells when calculating minimum size of cells, instead of putting the words starting with 'A' at the beginning of a new line.

There IS something wrong with some MS fonts and Pango.

@andreas-gruenbacher

This comment has been minimized.

Copy link
Author

andreas-gruenbacher commented Jan 20, 2019

I can reproduce #628 with font-family: "Trebuchet MS" as well, but that doesn't crash WeasyPrint. So why does WeasyPrint crash in this case?

@Tontyna

This comment has been minimized.

Copy link
Contributor

Tontyna commented Jan 20, 2019

Minimum content width is larger than maximum content width, I have to understand why.

Who calculates those widthes? Is this another outcome of a dissent between PangoLayout and PangoLogAttr like in #614?

@Tontyna

This comment has been minimized.

Copy link
Contributor

Tontyna commented Jan 20, 2019

BTW: Cannot reproduce the crash, either.

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 20, 2019

First of all, I'm not sure yet it's a duplicate of #628.

So why does WeasyPrint crash in this case?

Because this use case makes the table layout crash, but #628 doesn't involve the table layout.

Who calculates those widthes?

I have to check that.

I'd love to understand what's wrong. I can't reproduce #628 on my Gentoo Linux computer but can on my Ubuntu one. I'm not able to reproduce #770 anywhere.

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 20, 2019

@andreas-gruenbacher could you please tell what your Linux distribution is (with the version)?

@andreas-gruenbacher

This comment has been minimized.

Copy link
Author

andreas-gruenbacher commented Jan 20, 2019

Fedora 29. WeasyPrint has been installed with pip3 install WeasyPrint.

@Tontyna

This comment has been minimized.

Copy link
Contributor

Tontyna commented Jan 20, 2019

I'd love to understand what's wrong.

Looks like it's something special in the 5th column, but I don't see nothing special in the 5th column

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 20, 2019

Packages in Fedora 29 seem to be up to date, just as on my Gentoo computer. Why can't I reproduce this problem?

Looks like it's something special in the 5th column, but I don't see nothing special in the 5th column

Neither do I.

@andreas-gruenbacher One (last 😇) question: could you provide the whole traceback with the prints, so that we can know which table has the problem?

@andreas-gruenbacher

This comment has been minimized.

Copy link
Author

andreas-gruenbacher commented Jan 20, 2019

Sure, I've already posted that in #409.

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 20, 2019

Sure, I've already posted that in #409.

With the print(…) line added in tables.py, it would be perfect 😉.

@andreas-gruenbacher

This comment has been minimized.

Copy link
Author

andreas-gruenbacher commented Jan 20, 2019

Can anyone explain what the code below # We have to work around floating point rounding errors here in auto_table_layout is trying to achieve? This looks rather misguided.

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 20, 2019

Can anyone explain what the code below # We have to work around floating point rounding errors here in auto_table_layout is trying to achieve

We have a tuple like ([1, 2], [3, 4], [5, 6], [7, 8]), corresponding to possible column widths found according to various variables. We also have an "assignatable width", let's say 8. We want to guess the list whose sum is the greatest but lower than 8 (it's [3, 4]), and the list whose sum is the lowest but greater than 8 (it's [5, 6]).

The code only works when the tuple is sorted (sum([1, 2]) <= sum([3, 4]) <= …) and when assignatable_width is between the first and last lists. It should logically always be the case: if it's not, there's a bug somewhere.

We could of course silently ignore this bug (for example, setting an original value for lower_guess and upper_guess), but that wouldn't solve the root of our problem and only make further debugging even harder.

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 20, 2019

YEAH. I can reproduce the bug with Trebuchet.

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 20, 2019

Minimal HTML sample:

<table>
  <th style="font-family: Trebuchet MS; white-space: nowrap">
    Fahrzeug
  </th>
</table>

The famous mystery of the 5th column 😨

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 20, 2019

OK, I understand what's going on. The problem appears when two letters have a special kerning value ("Fa" here, but also crashes with famous couples like "LV", "FA", etc).

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 20, 2019

AND OF COURSE " A".

@Tontyna

This comment has been minimized.

Copy link
Contributor

Tontyna commented Jan 20, 2019

...and the solution is: DONT USE UPPERCASE, especially when there is a 5th column

@andreas-gruenbacher

This comment has been minimized.

Copy link
Author

andreas-gruenbacher commented Jan 20, 2019

We could of course silently ignore this bug

It seems like you're close to finding out what's causing the bug and so it can hopefully be fixed. Maybe the algorithm could be made a bit more robust to perhaps only warn when this kind of problem occurs in the future?

@Tontyna

This comment has been minimized.

Copy link
Contributor

Tontyna commented Jan 20, 2019

Good luck @liZe! Me myself get brain damages when reading the layout source code 😏

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 20, 2019

It's difficult to find where the bug comes from, but I'm now 99% sure it's not a bug in WeasyPrint. The same layout created twice with the same parameters doesn't give the same result. Setting this code twice in text.py (line 987) solves the problem:

        layout = create_layout(
            text, style, context, max_width, justification_spacing)
        first_line, index = layout.get_first_line()

It's probably a bug in Cairo / Pango / FontConfig / Freetype / etc. @andreas-gruenbacher Is it possible for you to give the version of these libraries on your system?

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 20, 2019

Maybe the algorithm could be made a bit more robust to perhaps only warn when this kind of problem occurs in the future?

Yes, if there's a bug in an external library, we'll have to add a workaround.

@andreas-gruenbacher

This comment has been minimized.

Copy link
Author

andreas-gruenbacher commented Jan 20, 2019

@liZe the versions of which packages do you want to know exactly? Isn't it good enough for now that you can reproduce?

I can reproduce your text.py hack from above. It also seems a bit weird that split_first_line is called six times in the minimal HTML example: with text = 'Fahrzeug ', ' Fahrzeug ', ' Fahrzeug', and then again 'Fahrzeug ', ' Fahrzeug ', ' Fahrzeug'. If I remove all the whitespace around 'Fahrzeug' in the minimal HTML example, that reduces to two calls with text = 'Fahrzeug'.

@Tontyna

This comment has been minimized.

Copy link
Contributor

Tontyna commented Jan 20, 2019

The same layout created twice with the same parameters doesn't give the same result

... first result being erroneous ... minimum column widths being calculated first ...

I vote for a workaround not along the lines of if pango-or-whatever-library-version == ???: but rather sth. like if sum(min-values) > sum(max-values): or an original value for lower_guess / upper_guess

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 20, 2019

the versions of which packages do you want to know exactly? Isn't it good enough for now that you can reproduce?

Cairo, Pango, FontConfig and Freetype. I'd like to compare with the ones on my systems to find where the bug comes from.

It also seems a bit weird that split_first_line is called six times in the minimal HTML example […]. If I remove all the whitespace around 'Fahrzeug' in the minimal HTML example, that reduces to two calls

You can't imagine the incredible amount of work needed to find the size of cells in a table (a hint). You at least need the min-content width and the max-content width of the cell, that's why you need at least two calls.

Handling white spaces also requires some work, that's why you get extra calls when there are spaces and empty lines. Of course, there are lots of "useless" function calls in WeasyPrint, but it's often hard to remove them 😄.

I vote for a workaround not along the lines of if pango-or-whatever-library-version == ???: but rather sth. like if sum(min-values) > sum(max-values): or an original value for lower_guess / upper_guess

Of course.

@andreas-gruenbacher

This comment has been minimized.

Copy link
Author

andreas-gruenbacher commented Jan 21, 2019

cairo-1.16.0-3.fc29.x86_64
cairo-gobject-1.16.0-3.fc29.x86_64
cairomm-1.12.0-9.fc29.x86_64
fontconfig-2.13.1-3.fc29.x86_64
freetype-2.9.1-6.fc29.x86_64
pango-1.42.4-1.fc29.x86_64
pangomm-2.40.1-6.fc29.x86_64
python3-cairo-1.17.1-2.fc29.x86_64
python3-cairocffi-0.7.2-15.fc29.noarch
python3-cairosvg-1.0.20-9.fc29.noarch

/usr/local/lib/python3.7/site-packages/ contains the following versioned directories:
cairocffi-0.9.0-py3.7.egg-info
cssselect2-0.2.1.dist-info
tinycss2-0.6.1.dist-info
WeasyPrint-44.dist-info

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 21, 2019

@andreas-gruenbacher thanks a lot.

I have exactly the same versions on my Gentoo box that works. Options left are font configuration and different compilation flags…

@liZe liZe closed this in 0083fce Jan 21, 2019

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 21, 2019

It should be fixed now, but I really need to know the real reason for this (and fix #628).

@andreas-gruenbacher

This comment has been minimized.

Copy link
Author

andreas-gruenbacher commented Jan 21, 2019

Almost, but the assertion below if upper_guess == lower_guess in tables.py triggers now. Once removed, WeasyPrint no longer crashes.

@liZe

This comment has been minimized.

Copy link
Member

liZe commented Jan 21, 2019

Almost, but the assertion below if upper_guess == lower_guess in tables.py triggers now. Once removed, WeasyPrint no longer crashes.

Fixed in 89041e8, thanks.

@andreas-gruenbacher

This comment has been minimized.

Copy link
Author

andreas-gruenbacher commented Jan 21, 2019

This is working for now, thanks.

Tontyna pushed a commit to Tontyna/WeasyPrint that referenced this issue Jan 21, 2019

@liZe liZe added this to the 45 milestone Feb 20, 2019

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Feb 21, 2019

py-weasyprint: Update to 45.
Version 45
----------

Released on 2019-02-20.

WeasyPrint now has a `code of conduct
<https://github.com/Kozea/WeasyPrint/blob/master/CODE_OF_CONDUCT.rst>`_.

A new website has been launched, with beautiful and useful graphs about speed
and memory use across versions: check `WeasyPerf
<https://kozea.github.io/WeasyPerf/index.html>`_.

Dependencies:

* Python 3.5+ is now needed, Python 3.4 is not supported anymore

Bug fixes:

* `798 <https://github.com/Kozea/WeasyPrint/pull/798>`_:
  Prevent endless loop and index out of range in pagination
* `767 <https://github.com/Kozea/WeasyPrint/issues/767>`_:
  Add a ``--quiet`` CLI parameter
* `784 <https://github.com/Kozea/WeasyPrint/pull/784>`_:
  Fix library loading on Alpine
* `791 <https://github.com/Kozea/WeasyPrint/pull/791>`_:
  Use path2url in tests for Windows
* `789 <https://github.com/Kozea/WeasyPrint/pull/789>`_:
  Add LICENSE file to distributed sources
* `788 <https://github.com/Kozea/WeasyPrint/pull/788>`_:
  Fix pending references
* `780 <https://github.com/Kozea/WeasyPrint/issues/780>`_:
  Don't draw patterns for empty page backgrounds
* `774 <https://github.com/Kozea/WeasyPrint/issues/774>`_:
  Don't crash when links include quotes
* `637 <https://github.com/Kozea/WeasyPrint/issues/637>`_:
  Fix a problem with justified text
* `763 <https://github.com/Kozea/WeasyPrint/pull/763>`_:
  Launch tests with Python 3.7
* `704 <https://github.com/Kozea/WeasyPrint/issues/704>`_:
  Fix a corner case with tables
* `804 <https://github.com/Kozea/WeasyPrint/pull/804>`_:
  Don't logger handlers defined before importing WeasyPrint
* `109 <https://github.com/Kozea/WeasyPrint/issues/109>`_,
  `748 <https://github.com/Kozea/WeasyPrint/issues/748>`_:
  Don't include punctuation for hyphenation
* `770 <https://github.com/Kozea/WeasyPrint/issues/770>`_:
  Don't crash when people use uppercase words from old-fashioned Microsoft
  fonts in tables, especially when there's an 5th column
* Use a `separate logger
  <https://weasyprint.readthedocs.io/en/latest/tutorial.htmllogging>`_ to
  report the rendering process
* Add a ``--debug`` CLI parameter and set debug level for unknown prefixed CSS
  properties
* Define minimal versions of Python and setuptools in setup.cfg

Documentation

* `796 <https://github.com/Kozea/WeasyPrint/pull/796>`_:
  Fix a small typo in the tutorial
* `792 <https://github.com/Kozea/WeasyPrint/pull/792>`_:
  Document no alignement character support
* `773 <https://github.com/Kozea/WeasyPrint/pull/773>`_:
  Fix phrasing in Hacking section
* `402 <https://github.com/Kozea/WeasyPrint/issues/402>`_:
  Add a paragraph about fontconfig error
* `764 <https://github.com/Kozea/WeasyPrint/pull/764>`_:
  Fix list of dependencies for Alpine
* Fix API documentation of HTML and CSS classes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.