skip_first_whitespace IndexError #783
Comments
Looks like in When the exception happens, the given skip_stack is result = skip_first_whitespace(box.children[index], next_skip_stack) the recursive call with The concerned InlineBox is the BTW: 113 is a prime number 😬
Lol, IIRC the real HTML crashed with (107, None) 😆
Edit: Confirmed, it's the prime number bug!
Yep, let's call it the prime number bug.
Another problem, probably related: <p>*<span>****************************************** *** **** ** ** ******* *********** ************************************************************************* <b>l</b></span><b>aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa</b></p>
Narrowing down the issue reveals: the prime number isn't meant to be the index of a child box, but the index of the character (a space) in a TextBox's text where the overlong text should be, or has been, broken. After splitting all the text snippets from the loooong text into separate LineBoxes, the Subsequently Painstaking preparation is required to trigger this erroneous skip_stack: a particular nesting of InlineBoxes and TextBoxes, combined with the right (wrong?) text content, page width and font size. The TextBox, its text extending close to the right margin, and a following InlineBox ( This fine-tuned snippet <p><span>
********************************* *******
********* *************** ***
********* *************** ***
********************************** ** ******* *** <b>l</b></span>x</p> crashes with prime number 73 😄
@liZe, your snippet is a different issue: it just doesn't break where it ought to. The skip_stack is OK throughout, with no leftovers from TextBoxes.
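To make the diagnosis above concrete, here is a minimal, hypothetical model (not WeasyPrint's actual box classes or its skip_first_whitespace) of how a nested skip stack is followed down the box tree. It assumes the (index, child_skip_stack) tuple shape seen in values like (107, None): in a container the index selects a child, while in a TextBox it is a character offset. If an offset such as 113 ends up one level too high, it is used as a child index and the lookup fails with IndexError.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


# Hypothetical, heavily simplified stand-ins for WeasyPrint's boxes;
# the real classes carry styles, positions and much more.
@dataclass
class TextBox:
    text: str


@dataclass
class InlineBox:
    children: list = field(default_factory=list)


# Assumed model: nested (index, child_skip_stack) tuples ending in None.
SkipStack = Optional[Tuple[int, "SkipStack"]]


def resume_point(box, skip_stack: SkipStack):
    """Follow a skip stack down the box tree (hypothetical helper).

    In a container, the index selects a child; in a TextBox, it is the
    character offset in the text where the line was broken.
    """
    if skip_stack is None:
        return box, 0
    index, child_skip_stack = skip_stack
    if isinstance(box, TextBox):
        # Leaf: the index is a position in the text, not a child index.
        return box, index
    # Container: if a text offset has leaked up to this level,
    # box.children[index] raises IndexError.
    return resume_point(box.children[index], child_skip_stack)


span = InlineBox([TextBox("x" * 120), InlineBox([TextBox("l")])])
box, offset = resume_point(span, (0, (113, None)))
print(type(box).__name__, offset)  # TextBox 113
try:
    resume_point(span, (113, None))  # text offset misread as a child index
except IndexError as error:
    print("IndexError:", error)
```

The span has only two children, so the stray offset 113 cannot be a valid child index; any offset at or above the number of children fails the same way, prime or not.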
Found the bug, but I can't fix it. It happens in the ugliest, dirtiest, most abominable part of the inline layout code: WeasyPrint/weasyprint/layout/inlines.py, lines 854 to 859 in fc089f5.
BTW: The broken_child-detection was introduced to fix #580. Under prime number bug conditions By forcibly setting So obviously the algorithm to detect This minimal snippet crashes with prime number 1: <p><span>
**************************************************************
***********
<b>fits</b></span>xxx</p> And yes, the three whitespace characters (here: line breaks) are vital for triggering the IndexError. Equally vital: no whitespace in front of the
Recipe to reproduce the IndexError: <p><span title="span required to trigger the bug">
*breakable*text* *followed*by*withespace*
<b>+fit+</b></span>CrossingTheMarginWithoutBreakRaisesIndexError</p> Though @liZe's snippet doesn't crash with an IndexError, it's related. Of course, he was right! <p><span title="span required to trigger the bug">
*non*breaking*text**followed*by*withespace*
<b>+fit+</b></span>CrossingTheMarginWithoutBreakExtendsTheLineUntil next whitespace goes on the next line</p> The crash is prevented because the check at WeasyPrint/weasyprint/layout/inlines.py, line 820 in fc089f5, returns False for *non*breaking*text**followed*by*withespace*, while for *breakable*text* *followed*by*withespace* it returns True and we enter "The dirty solution".
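The actual check at line 820 isn't reproduced in this thread. As a rough, hypothetical illustration only, the sketch below treats "can this text break?" as "is there whitespace inside the text, ignoring leading and trailing whitespace". That is enough to tell the two recipe strings apart, although real line breaking follows the Unicode line breaking algorithm (UAX #14), which WeasyPrint delegates to Pango.

```python
def has_inner_break_opportunity(text: str) -> bool:
    """Hypothetical, simplified stand-in for a "can this text break?" check.

    Only whitespace strictly inside the text counts as a break
    opportunity; leading and trailing whitespace (such as the line
    breaks around the text in the HTML recipes) is ignored. This is
    NOT the check WeasyPrint performs.
    """
    return any(character.isspace() for character in text.strip())


# The surrounding "\n" mirror the line breaks in the HTML recipes.
breakable = "\n*breakable*text* *followed*by*withespace*\n"
non_breaking = "\n*non*breaking*text**followed*by*withespace*\n"
print(has_inner_break_opportunity(breakable))     # True
print(has_inner_break_opportunity(non_breaking))  # False
```

Asterisks alone offer no break opportunity here, which matches the observation that only the text containing an inner space takes the "dirty solution" path.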
BTW: Taking "The dirty solution" path (i.e. calling
At least it's documented now; that saves me hours each time I have to understand it again 😉.
Exactly. We had to check that the split child is the same, not only at level 1 but for all nested children, until we find the same text box. It's fixed now.
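A hypothetical sketch of the difference between comparing the split child only at level 1 and comparing nested children all the way down to the text box. It is not the code from 8724bc3; the box classes, the (index, child_skip_stack) tuples and the helper names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


# Hypothetical, simplified boxes (not WeasyPrint's real classes).
@dataclass
class TextBox:
    text: str


@dataclass
class InlineBox:
    children: list = field(default_factory=list)


SkipStack = Optional[Tuple[int, "SkipStack"]]


def innermost_box(box, skip_stack: SkipStack):
    """Descend through nested children until a box without children
    (here, a TextBox) is reached; what is left of the stack would be
    the text offset."""
    while skip_stack is not None and hasattr(box, "children"):
        index, skip_stack = skip_stack
        box = box.children[index]
    return box


def same_child_at_level_1(line, stack_a, stack_b) -> bool:
    # Insufficient check: only compares the top-level child.
    return line.children[stack_a[0]] is line.children[stack_b[0]]


def same_text_box(line, stack_a, stack_b) -> bool:
    # Deeper check: compare all the way down to the split text box.
    return innermost_box(line, stack_a) is innermost_box(line, stack_b)


inner = InlineBox([TextBox("a b"), TextBox("c d")])
line = InlineBox([inner])
a = (0, (0, (1, None)))
b = (0, (1, (1, None)))
print(same_child_at_level_1(line, a, b))  # True: both start in `inner`
print(same_text_box(line, a, b))          # False: different text boxes
```

Both stacks select `inner` at level 1, so the shallow check wrongly reports the same child, while the deep check sees two different text boxes.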
8724bc3 fixes this bug too, but … there's another bug in
That's because That's because It's a pity that we must take
Yes, I've learned a lot of things fixing #301, and Unicode is really fascinating. I'm glad to have Pango 😄. We already have (almost) the whole logic in WeasyPrint:
Version 50
----------

Released on 2019-09-19.

New features:

* `#209 <https://github.com/Kozea/WeasyPrint/issues/209>`_: Make ``break-*`` properties work inside tables
* `#661 <https://github.com/Kozea/WeasyPrint/issues/661>`_: Make blocks with ``overflow: auto`` grow to include floating children

Bug fixes:

* `#945 <https://github.com/Kozea/WeasyPrint/issues/945>`_: Don't break pages between a list item and its marker
* `#727 <https://github.com/Kozea/WeasyPrint/issues/727>`_: Avoid tables lost between pages
* `#831 <https://github.com/Kozea/WeasyPrint/issues/831>`_: Ignore auto margins on flex containers
* `#923 <https://github.com/Kozea/WeasyPrint/issues/923>`_: Fix a couple of crashes when splitting a line twice
* `#896 <https://github.com/Kozea/WeasyPrint/issues/896>`_: Fix skip stack order when using a reverse flex direction

Contributors:

- grewn0uille
- Guillaume Ayoub

Version 49
----------

Released on 2019-09-11.

Performance:

* Speed and memory use have been largely improved.

New features:

* `#700 <https://github.com/Kozea/WeasyPrint/issues/700>`_: Handle ``::marker`` pseudo-selector
* `135dc06c <https://github.com/Kozea/WeasyPrint/commit/135dc06c>`_: Handle ``recto`` and ``verso`` parameters for page breaks
* `#907 <https://github.com/Kozea/WeasyPrint/pull/907>`_: Provide a clean way to build layout contexts

Bug fixes:

* `#937 <https://github.com/Kozea/WeasyPrint/issues/937>`_: Fix rendering of tables with empty lines and rowspans
* `#897 <https://github.com/Kozea/WeasyPrint/issues/897>`_: Don't crash when small columns are wrapped in absolute blocks
* `#913 <https://github.com/Kozea/WeasyPrint/issues/913>`_: Fix a test about gradient colors
* `#924 <https://github.com/Kozea/WeasyPrint/pull/924>`_: Fix title for document with attachments
* `#917 <https://github.com/Kozea/WeasyPrint/issues/917>`_: Fix tests with Pango 1.44
* `#919 <https://github.com/Kozea/WeasyPrint/issues/919>`_: Fix padding and margin management for column flex boxes
* `#901 <https://github.com/Kozea/WeasyPrint/issues/901>`_: Fix width of replaced boxes with no intrinsic width
* `#906 <https://github.com/Kozea/WeasyPrint/issues/906>`_: Don't respect table cell width when content doesn't fit
* `#927 <https://github.com/Kozea/WeasyPrint/pull/927>`_: Don't use deprecated ``logger.warn`` anymore
* `a8662794 <https://github.com/Kozea/WeasyPrint/commit/a8662794>`_: Fix margin collapsing between caption and table wrapper
* `87d9e84f <https://github.com/Kozea/WeasyPrint/commit/87d9e84f>`_: Avoid infinite loops when rendering columns
* `789b80e6 <https://github.com/Kozea/WeasyPrint/commit/789b80e6>`_: Only use in flow children to set columns height
* `615e298a <https://github.com/Kozea/WeasyPrint/commit/615e298a>`_: Don't include floating elements each time we try to render a column
* `48d8632e <https://github.com/Kozea/WeasyPrint/commit/48d8632e>`_: Avoid not in flow children to compute column height
* `e7c452ce <https://github.com/Kozea/WeasyPrint/commit/e7c452ce>`_: Fix collapsing margins for columns
* `fb0887cf <https://github.com/Kozea/WeasyPrint/commit/fb0887cf>`_: Fix crash when using currentColor in gradients
* `f66df067 <https://github.com/Kozea/WeasyPrint/commit/f66df067>`_: Don't crash when using ex units in word-spacing in letter-spacing
* `c790ff20 <https://github.com/Kozea/WeasyPrint/commit/c790ff20>`_: Don't crash when properties needing base URL use var functions
* `d63eac31 <https://github.com/Kozea/WeasyPrint/commit/d63eac31>`_: Don't crash with object-fit: non images with no intrinsic size

Documentation:

* `#900 <https://github.com/Kozea/WeasyPrint/issues/900>`_: Add documentation about semantic versioning
* `#692 <https://github.com/Kozea/WeasyPrint/issues/692>`_: Add a snippet about PDF magnification
* `#899 <https://github.com/Kozea/WeasyPrint/pull/899>`_: Add .NET wrapper link
* `#893 <https://github.com/Kozea/WeasyPrint/pull/893>`_: Fixed wrong nested list comprehension example
* `#902 <https://github.com/Kozea/WeasyPrint/pull/902>`_: Add ``state`` to the ``make_bookmark_tree`` documentation
* `#921 <https://github.com/Kozea/WeasyPrint/pull/921>`_: Fix typos in the documentation
* `#328 <https://github.com/Kozea/WeasyPrint/issues/328>`_: Add CSS sample for forms

Contributors:

- grewn0uille
- Guillaume Ayoub
- Raphael Gaschignard
- Stani
- Szmen
- Thomas Dexter
- Tontyna

Version 48
----------

Released on 2019-07-08.

Dependencies:

* CairoSVG 2.4.0+ is now needed

New features:

* `#891 <https://github.com/Kozea/WeasyPrint/pull/891>`_: Handle ``text-overflow``
* `#878 <https://github.com/Kozea/WeasyPrint/pull/878>`_: Handle ``column-span``
* `#855 <https://github.com/Kozea/WeasyPrint/pull/855>`_: Handle all the ``text-decoration`` features
* `#238 <https://github.com/Kozea/WeasyPrint/issues/238>`_: Don't repeat background images when it's not needed
* `#875 <https://github.com/Kozea/WeasyPrint/issues/875>`_: Handle ``object-fit`` and ``object-position``
* `#870 <https://github.com/Kozea/WeasyPrint/issues/870>`_: Handle ``bookmark-state``

Bug fixes:

* `#686 <https://github.com/Kozea/WeasyPrint/issues/686>`_: Fix column balance when children are not inline
* `#885 <https://github.com/Kozea/WeasyPrint/issues/885>`_: Actually use the content box to resolve flex items percentages
* `#867 <https://github.com/Kozea/WeasyPrint/issues/867>`_: Fix rendering of KaTeX output, including (1) set row baseline of tables when no cells are baseline-aligned, (2) set baseline for inline tables, (3) don't align lines larger than their parents, (4) force CairoSVG to respect image size defined by CSS.
* `#873 <https://github.com/Kozea/WeasyPrint/issues/873>`_: Set a minimum height for empty list elements with outside marker
* `#811 <https://github.com/Kozea/WeasyPrint/issues/811>`_: Don't use translations to align flex items
* `#851 <https://github.com/Kozea/WeasyPrint/issues/851>`_, `#860 <https://github.com/Kozea/WeasyPrint/issues/860>`_: Don't cut pages when content overflows a very little bit
* `#862 <https://github.com/Kozea/WeasyPrint/issues/862>`_: Don't crash when using UTC dates in metadata

Documentation:

* `#854 <https://github.com/Kozea/WeasyPrint/issues/854>`_: Add a "Tips & Tricks" section

Contributors:

- Gabriel Corona
- Guillaume Ayoub
- Manuel Barkhau
- Nathan de Maestri
- grewn0uille
- theopeek

Version 47
----------

Released on 2019-04-12.

New features:

* `#843 <https://github.com/Kozea/WeasyPrint/pull/843>`_: Handle CSS variables
* `#846 <https://github.com/Kozea/WeasyPrint/pull/846>`_: Handle ``:nth()`` page selector
* `#847 <https://github.com/Kozea/WeasyPrint/pull/847>`_: Allow users to use a custom SSL context for HTTP requests

Bug fixes:

* `#797 <https://github.com/Kozea/WeasyPrint/issues/797>`_: Fix underlined justified text
* `#836 <https://github.com/Kozea/WeasyPrint/issues/836>`_: Fix crash when flex items are replaced boxes
* `#835 <https://github.com/Kozea/WeasyPrint/issues/835>`_: Fix ``margin-break: auto``

Version 46
----------

Released on 2019-03-20.

New features:

* `#771 <https://github.com/Kozea/WeasyPrint/issues/771>`_: Handle ``box-decoration-break``
* `#115 <https://github.com/Kozea/WeasyPrint/issues/115>`_: Handle ``margin-break``
* `#821 <https://github.com/Kozea/WeasyPrint/issues/821>`_: Continuous integration includes tests on Windows

Bug fixes:

* `#765 <https://github.com/Kozea/WeasyPrint/issues/765>`_, `#754 <https://github.com/Kozea/WeasyPrint/issues/754>`_, `#800 <https://github.com/Kozea/WeasyPrint/issues/800>`_: Fix many crashes related to the flex layout
* `#783 <https://github.com/Kozea/WeasyPrint/issues/783>`_: Fix a couple of crashes with strange texts
* `#827 <https://github.com/Kozea/WeasyPrint/pull/827>`_: Named strings and counters are case-sensitive
* `#823 <https://github.com/Kozea/WeasyPrint/pull/823>`_: Shrink min/max-height/width according to box-sizing
* `#728 <https://github.com/Kozea/WeasyPrint/issues/728>`_, `#171 <https://github.com/Kozea/WeasyPrint/issues/171>`_: Don't crash when fixed boxes are nested
* `#610 <https://github.com/Kozea/WeasyPrint/issues/610>`_, `#828 <https://github.com/Kozea/WeasyPrint/issues/828>`_: Don't crash when preformatted text lines end with a space
* `#808 <https://github.com/Kozea/WeasyPrint/issues/808>`_, `#387 <https://github.com/Kozea/WeasyPrint/issues/387>`_: Fix position of some images
* `#813 <https://github.com/Kozea/WeasyPrint/issues/813>`_: Don't crash when long preformatted text lines end with ``\n``

Documentation:

* `#815 <https://github.com/Kozea/WeasyPrint/pull/815>`_: Add documentation about custom ``url_fetcher``
Hi there, thanks for creating WeasyPrint; it does a really good job!
I've been using it with CKEditor, which generates really ugly HTML (especially when someone pastes a Word document into it).
It worked fine until recently, when I hit a bug: it seems the combination of style and characters in this HTML breaks it. I stripped it down as much as I could (the original HTML was huge, with lots of attributes).
It seems to be a corner case in skip_first_whitespace.
I've created a minimal test (style + HTML) that fails; if I change some letters or remove the style, it works, so it seems to be an edge-case combination.
I'm not sure how to fix it properly. To keep rendering, I modified the code to catch the IndexError exception (it seems fine so far).
This is a test case to reproduce the bug:
knixeur@64b66f9
The "fix":
knixeur@340a8c7
Stack trace of the test when run against master
Let me know if I can help in any way. I tried to follow the code to find the real cause but couldn't, and I have to move on to other things.
Edit: fixed formatting
Edit2: inlined test and "fix"