White space between inline elements lost #68
Just a thought: I'm not very familiar with all the special ways HTML treats white space, so the following might not be the ideal solution. But one approach could be to skip rendering a text node if it consists only of white space and its previous sibling is a block element, such as a paragraph:
```java
} else if (node instanceof TextNode) {
    String wholeText = ((TextNode) node).getWholeText();
    if (node.previousSibling() instanceof Element && ((Element) node.previousSibling()).isBlock() && wholeText.trim().isEmpty()) {
        // White space only text node after block element is ignored
        return;
    }
    renderText(wholeText);
}
```
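A rough way to check the proposed rule without Jsoup is to model the sibling nodes as a flat list. In this sketch (Java 16+), `Piece`, `BLOCK_TAGS`, and `renderedTexts` are hypothetical stand-ins, not poi-tl-ext or Jsoup API, and the block-tag set is an assumed, incomplete list; in the real code, Jsoup's `Element.isBlock()` decides what counts as a block element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SkipBlockWhitespace {

    // Assumed, incomplete set of block-level tags (hypothetical; Jsoup decides this via Element.isBlock()).
    static final Set<String> BLOCK_TAGS = Set.of("p", "div", "h1", "h2", "h3", "ul", "ol", "li", "table");

    /** Hypothetical stand-in for a Jsoup node: an element (tag set) or a text node (text set). */
    record Piece(String tag, String text) {
        static Piece ofElement(String tag) { return new Piece(tag, null); }
        static Piece ofText(String text) { return new Piece(null, text); }
        boolean isBlock() { return tag != null && BLOCK_TAGS.contains(tag); }
    }

    /** Returns the texts that would be rendered, skipping white-space-only text right after a block element. */
    static List<String> renderedTexts(List<Piece> siblings) {
        List<String> rendered = new ArrayList<>();
        Piece previous = null; // like previousSibling(): may be an element or a text node
        for (Piece piece : siblings) {
            if (piece.text() != null) {
                boolean afterBlock = previous != null && previous.isBlock();
                if (!(afterBlock && piece.text().trim().isEmpty())) {
                    rendered.add(piece.text());
                }
            }
            previous = piece;
        }
        return rendered;
    }

    public static void main(String[] args) {
        // <strong>bold</strong> <em>italic</em>: the space follows an inline element, so it is kept.
        System.out.println(renderedTexts(List.of(Piece.ofElement("strong"), Piece.ofText(" "), Piece.ofElement("em"))));
        // <p>one</p> newline <p>two</p>: the newline follows a block element, so it is skipped.
        System.out.println(renderedTexts(List.of(Piece.ofElement("p"), Piece.ofText("\n"), Piece.ofElement("p"))));
    }
}
```

With this rule the space in `<strong>bold</strong> <em>italic</em>` is still rendered, because its previous sibling is inline, while a newline between two paragraphs is dropped.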
In fact, the motivation for creating this library is not to handle arbitrary HTML strings, but to handle content generated by rich text editors and support exporting it to Word. As far as I know, rich text editors usually do not use the `white-space` style to handle whitespace characters (at least I have not encountered it; please correct me if I am wrong). The regular expression that removes whitespace between tags is mainly there to reduce the number of DOM nodes Jsoup has to parse, which improves efficiency. Taking the `white-space` style into account would greatly increase the complexity of rendering text nodes, and I need some time to think about it.
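For reference, the whitespace-removal step can be reproduced in isolation. As described in the issue's "Additional context", it replaces every match of `>\\s+<` with `><`; this sketch applies the same replacement with `String.replaceAll` to the two examples from the report. `stripInterTagWhitespace` is a hypothetical name, not a method of the library.

```java
public class InterTagWhitespaceDemo {

    /** Same replacement as described in the issue: any whitespace run between '>' and '<' is removed. */
    static String stripInterTagWhitespace(String html) {
        return html.replaceAll(">\\s+<", "><");
    }

    public static void main(String[] args) {
        String siblings = "<p><strong>bold</strong> <em>italic</em></p>";
        String nested = "<p><strong><span style=\"color: red;\">red-bold</span> </strong>bold</p>";
        System.out.println(stripInterTagWhitespace(siblings));
        // prints <p><strong>bold</strong><em>italic</em></p>
        System.out.println(stripInterTagWhitespace(nested));
        // prints <p><strong><span style="color: red;">red-bold</span></strong>bold</p>
    }
}
```

In both cases the only space separating the inline content is removed before Jsoup ever sees it, so no text node is left to render.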
I did some more testing, and it turns out the `white-space` style isn't the problem here. The space is lost even without it: `<p><em style="color: red; ">red-italic</em> <strong style="color: blue;">blue-bold</strong></p>`
Please try version 0.4.3.
Describe the bug
The white space between two inline elements (siblings or nested) gets lost.
HTML content:
Example 1 (siblings):
`<p style="white-space: pre-wrap;"><strong>bold</strong> <em>italic</em></p>`
Space between the `</strong>` end tag and the `<em>` start tag

Example 2 (nested):
`<p style="white-space: pre-wrap;"><strong><span style="color: red;">red-bold</span> </strong>bold</p>`
Space between the `</span>` end tag and the `</strong>` end tag

Expected behavior
The white space should be preserved (either as-is or collapsed, depending on the `white-space` CSS property).
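One way to honour the `white-space` property when rendering a text node is sketched below. It is a simplified, hypothetical helper (`applyWhiteSpace` is not part of poi-tl-ext): preserving values keep the text unchanged, `pre-line` collapses spaces and tabs but keeps line breaks, and everything else (`normal`, `nowrap`) collapses any whitespace run into a single space. It ignores line wrapping and the CSS rules that remove collapsible spaces at line starts and ends or across element boundaries.

```java
import java.util.Locale;

public class WhiteSpaceMode {

    /** Hypothetical helper: applies the white-space collapsing rules of the CSS property to a text node. */
    static String applyWhiteSpace(String text, String whiteSpace) {
        String mode = whiteSpace == null ? "normal" : whiteSpace.trim().toLowerCase(Locale.ROOT);
        switch (mode) {
            case "pre":
            case "pre-wrap":
            case "break-spaces":
                // White space is preserved exactly.
                return text;
            case "pre-line":
                // Drop spaces/tabs around line breaks, keep the breaks, then collapse remaining runs.
                return text.replaceAll("[ \\t]*\\n[ \\t]*", "\n").replaceAll("[ \\t]+", " ");
            default:
                // normal, nowrap (and unknown values): every whitespace run becomes one space.
                return text.replaceAll("\\s+", " ");
        }
    }

    public static void main(String[] args) {
        // A whitespace-only node between two inline elements survives as a single space.
        System.out.println("[" + applyWhiteSpace(" ", "normal") + "]");
        System.out.println("[" + applyWhiteSpace("a \n\t b", "normal") + "]");
    }
}
```

Under this scheme a whitespace-only text node between two inline elements becomes a single space instead of disappearing, which is the behavior the report asks for.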
Screenshots
Wrong result:
![lost-whitespace](https://user-images.githubusercontent.com/31596699/230926112-b305067d-29ea-4eab-b285-78b599a61edc.png)
poi-tl-ext version: 0.4.2-poi5
poi-tl version: 1.12.1
Additional context
The problem results from replacing all occurrences of the pattern `>\\s+<` with `><` in this line of code, which also removes the space between the `</strong>` end tag and the `<em>` start tag in example 1, and the space between the two end tags in example 2.

I assume the intention of that code is to remove empty lines between two HTML block elements? At least, removing that line leads to an unnecessary empty line between the end of one paragraph and the start of the next, as in this example:
(The cursor between the lines indicates an additional empty line between the two actual lines.)
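A possible middle ground between efficiency and correctness is to strip inter-tag whitespace only when it touches a block-level tag, leaving whitespace between inline tags alone. This is a hypothetical sketch, not the library's fix: `stripBlockWhitespace` and the `BLOCK` tag list are assumptions, and a regex like this cannot account for `<pre>` content, comments, or `white-space` styles.

```java
import java.util.regex.Pattern;

public class BlockAwareWhitespace {

    // Assumed, incomplete list of block-level tag names.
    private static final String BLOCK =
            "(?:p|div|h[1-6]|ul|ol|li|table|thead|tbody|tr|td|th|blockquote|pre)";

    // A block tag (opening or closing), then whitespace, then another tag: keep the tag, drop the whitespace.
    private static final Pattern AFTER_BLOCK =
            Pattern.compile("(</?" + BLOCK + "\\b[^>]*>)\\s+(?=<)", Pattern.CASE_INSENSITIVE);

    // Whitespace between '>' and an opening or closing block tag.
    private static final Pattern BEFORE_BLOCK =
            Pattern.compile("(?<=>)\\s+(?=</?" + BLOCK + "\\b)", Pattern.CASE_INSENSITIVE);

    /** Hypothetical replacement for the blanket ">\\s+<" -> "><" step. */
    static String stripBlockWhitespace(String html) {
        String result = AFTER_BLOCK.matcher(html).replaceAll("$1");
        return BEFORE_BLOCK.matcher(result).replaceAll("");
    }

    public static void main(String[] args) {
        // Inline siblings: the space is kept.
        System.out.println(stripBlockWhitespace("<p><strong>bold</strong> <em>italic</em></p>"));
        // Paragraph boundary: the newline is removed.
        System.out.println(stripBlockWhitespace("<p>one</p>\n<p>two</p>"));
    }
}
```

This keeps most of the node-count reduction for editor output (whitespace between paragraphs, list items, and table cells), while leaving the spaces from both examples in the report intact.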