Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve page break related to inline(-block) elements #2497

Closed
wants to merge 11 commits into from

Conversation

Mellthas
Copy link
Member

  • Inline frames now properly report their height, the same as text frames. This resolves inline frames extending into the page margins instead of breaking onto a new page.
  • The positioning of inline frames has also been improved, allowing line breaks within them in more instances (instead of breaking the whole frame).
  • Allow line breaks and page breaks before inline-block elements, which were not allowed before.

Nested inline frames with margin, border, or padding still cause bugs at line breaks, because they are added to the text children on-the-fly. To properly fix these cases, it would probably be best to refactor the code in such a way that margin, border, and padding are properly handled on the inline frame itself, instead of being added to first resp. last child. It might be even worth considering always containing text frames in an anonymous inline box if needed; that could allow handling these properties consistently while simplifying the code. But the changes should improve the situation for now, nonetheless.

I have a few test cases in preparation that will demonstrate several broken scenarios that are fixed by this PR.

@Mellthas
Copy link
Member Author

I have added one more commit that should improve auto-width calculation for inline-block and floating elements by sticking more closely to the spec.

@Mellthas
Copy link
Member Author

Mellthas commented Aug 1, 2021

Added another commit to trim trailing whitespace after line breaks in more cases, notably after inline and inline-block frames.

@Mellthas
Copy link
Member Author

Mellthas commented Aug 3, 2021

For a sample of the scenarios the commits here intend to fix, see the test cases included and listed in #2510.

@bsweeney
Copy link
Member

bsweeney commented Sep 11, 2021

Lots to review here, in the meantime when running through my standard test bed I noticed two issues. I haven't had a chance to look and see what's going on yet, so here's the errors.

First, input buttons render with no width:

<input type="submit" value="submit" />

Second, rendering http://eclecticgeek.com/dompdf/core_tests/encoding_utf-8_all.html throws the following exception:

[11-Sep-2021 19:33:25 America/New_York] 
[11-Sep-2021 19:33:25 America/New_York] TypeError: Return value of Dompdf\FrameReflower\Text::_collapse_white_space() must be of the type string, null returned in dompdf\src\FrameReflower\Text.php:86
Stack trace:
[0] dompdf\src\FrameReflower\Text.php(261): Dompdf\FrameReflower\Text->_collapse_white_space('D800  \xED\xA0\x80\xED\xA0\x81\xED\xA0\x82...')
[1] dompdf\src\FrameReflower\Text.php(396): Dompdf\FrameReflower\Text->_layout_line()
[2] dompdf\src\FrameDecorator\AbstractFrameDecorator.php(902): Dompdf\FrameReflower\Text->reflow(Object(Dompdf\FrameDecorator\Block))
[3] dompdf\src\FrameReflower\Block.php(872): Dompdf\FrameDecorator\AbstractFrameDecorator->reflow(Object(Dompdf\FrameDecorator\Block))
[4] dompdf\src\FrameDecorator\AbstractFrameDecorator.php(902): Dompdf\FrameReflower\Block->reflow(NULL)
[5] dompdf\src\FrameReflower\Page.php(141): Dompdf\FrameDecorator\AbstractFrameDecorator->reflow()
[6] dompdf\src\FrameDecorator\AbstractFrameDecorator.php(902): Dompdf\FrameReflower\Page->reflow(NULL)
[7] dompdf\src\Dompdf.php(837): Dompdf\FrameDecorator\AbstractFrameDecorator->reflow()
[8] test\run-tests.php(145): Dompdf\Dompdf->render()
[9] {main}

@Mellthas
Copy link
Member Author

The second error is due preg_replace returning null in case of an error. In case of https://eclecticgeek.com/dompdf/core_tests/encoding_utf-8_all.html, it seems to be related to the surrogate code points starting at D800. Notice how there is no output for these when running through the debug helper at https://eclecticgeek.com. On my system (Ubuntu 20.04 with the default PHP 7.4.3), preg_replace runs without errors in these cases, the resulting output has the characters included as question marks (which I think is as expected). See the attached output file.

I have pushed an update to resolve the error by explicitly retaining the current behavior: Effectively returning the empty string in case of a null return value of preg_replace. Whether that behavior is fine, or if it would make more sense to preserve the original string in case of an error is I think a discussion unrelated to the current pull request.

@Mellthas
Copy link
Member Author

By the way, there is a broken br tag down there in encoding_utf-8_all.html ;-).

@Mellthas
Copy link
Member Author

Input buttons rendering without width is probably related to the change to use shrink-to-fit width resolution for float and inline-block elements. Generated content is only set right before reflowing the generated-content frame, so it is not yet set when calling get_min_max(_content)_width. This already affects the width calculations for table cells.

@Mellthas
Copy link
Member Author

I don't see a straight-forward way for handling the generated content right now, so I am going to split up the PR.

For simplicity, return a `float` value consistently.
The `get_break_margins` method was unused.
Also, add type declarations to `_collapse_white_space`.
Consistent argument names and more type declarations.
@Mellthas Mellthas mentioned this pull request Sep 12, 2021
Always evaluated to `0` before, because their style's height is always
`auto`.

The new method is a copy of the one in `FrameDecorator/Text`. Woud be
nice to refactor this in the future.
The original code would:
* Not position inline frames properly, which lead to erroneous
interaction with page breaks
* Break inline frames to early, if the first word in the frame was
not the longest one (unless in a fixed-position parent)
* Not split inline frames properly, if they had text-node children
after other children
* Never move inline-block frames to a new line
* Never allow page breaks before inline-block frames
Fixes backgrounds or borders bleeding into the page margins at page-
break locations. This is achieved by fixing and using the existing
`remove_frames_from_line` method.
Originally introduced in commit 542483a
for calculating auto widths for float and inline-block frames.
Move the logic for trimming trailing whitespace after a line break to
the `add_line` method of `BlockFrameDecorator`, so it is always
applied when a new line box is added. This now also trims trailing
whitespace if a text frame is used in block formatting context, but
this may even be desirable.

Also, restore trailing whitespace removed in this way after a split
to address issues such as dompdf#2250.
The condition is wrong, as an `auto` width can resolve to more than
100% if the minimum width of the content is larger.
@Mellthas
Copy link
Member Author

The split is done: I would review #2545, #2546, #2547 first, then #2548. That finally leaves #2549, which might require more work to prevent the mentioned regressions related to generated content.

@bsweeney
Copy link
Member

By the way, there is a broken br tag down there in encoding_utf-8_all.html ;-).

heh, yes thanks

@bsweeney
Copy link
Member

The split is done: I would review #2545, #2546, #2547 first, then #2548. That finally leaves #2549, which might require more work to prevent the mentioned regressions related to generated content.

Are those replacements for this PR or in addition to it? Should I go ahead and finish reviewing this one before I move on to those?

@Mellthas
Copy link
Member Author

Those PRs replace this one completely, so no need to continue reviewing here.

@Mellthas Mellthas closed this Sep 12, 2021
@Mellthas Mellthas deleted the page-break branch September 16, 2021 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants