White space between inline elements lost #68

Closed
TekkiWuff opened this issue Apr 10, 2023 · 4 comments

@TekkiWuff

Describe the bug
The white space between two inline elements (siblings or nested) gets lost.

HTML content:
Example 1 (siblings): <p style="white-space: pre-wrap;"><strong>bold</strong> <em>italic</em></p>
Space between </strong> end tag and the <em> start tag

Example 2 (nested): <p style="white-space: pre-wrap;"><strong><span style="color: red;">red-bold</span> </strong>bold</p>
Space between the </span> end tag and the </strong> end tag

Expected behavior
The white space should remain (either fully or collapsed, depending on the "white-space" CSS property).

Screenshots
Wrong result:
[Screenshot: lost-whitespace]

poi-tl-ext version:
0.4.2-poi5

poi-tl version:
1.12.1

Additional context
The problem comes from replacing all occurrences of the pattern >\\s+< with >< in this line of code. That also removes the space between the </strong> end tag and the <em> start tag in example 1 above, and the space between the two end tags in example 2.
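A minimal, stdlib-only reproduction of the effect. Only the regex and the replacement string come from the issue; the class and method names are made up for illustration.

```java
class WhitespaceStripDemo {

    // Same pattern and replacement as quoted in the issue: any whitespace run
    // sitting directly between a '>' and a '<' is deleted, regardless of
    // whether the surrounding tags belong to block or inline elements.
    static String stripBetweenTags(String html) {
        return html.replaceAll(">\\s+<", "><");
    }

    public static void main(String[] args) {
        String siblings = "<p style=\"white-space: pre-wrap;\"><strong>bold</strong> <em>italic</em></p>";
        String nested = "<p style=\"white-space: pre-wrap;\"><strong><span style=\"color: red;\">red-bold</span> </strong>bold</p>";

        // prints: <p style="white-space: pre-wrap;"><strong>bold</strong><em>italic</em></p>
        System.out.println(stripBetweenTags(siblings));
        // prints: <p style="white-space: pre-wrap;"><strong><span style="color: red;">red-bold</span></strong>bold</p>
        System.out.println(stripBetweenTags(nested));
    }
}
```

In both cases the only space separating the two words sits between two tags, so it is gone before Jsoup ever sees the markup.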
I assume the intention of that replacement is to remove empty lines between two HTML block elements? At least, taking the replacement out leads to an unnecessary empty line between the end of one paragraph and the start of the next, as in this example:

<p style="white-space: pre-wrap;"><strong>bold</strong> <em>italic</em></p>
<p style="white-space: pre-wrap;"><strong><span style="color: red;">red-bold</span> </strong>bold</p>

[Screenshot: additional-new-line]
(The cursor between the two lines indicates an additional empty line between the two actual lines.)
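For illustration only, a narrower replacement could strip whitespace only when it follows the end tag of a block element, which would drop the line break between the two paragraphs above while leaving spaces between inline elements alone. This is a hypothetical sketch, not what poi-tl-ext does: the block tag list is an assumed, incomplete selection, and whitespace after opening tags (e.g. between <ul> and its first <li>) is not touched.

```java
import java.util.regex.Pattern;

class BlockWhitespaceStripper {

    // Hypothetical, non-exhaustive list of block-level tag names.
    private static final String BLOCK_TAGS =
            "p|div|h[1-6]|ul|ol|li|table|thead|tbody|tr|td|th|blockquote|pre";

    // Matches a block element's end tag followed by whitespace and the next '<'.
    // Group 1 keeps the end tag; the whitespace run is dropped.
    private static final Pattern BETWEEN_BLOCKS =
            Pattern.compile("(</(?:" + BLOCK_TAGS + ")>)\\s+<", Pattern.CASE_INSENSITIVE);

    static String strip(String html) {
        return BETWEEN_BLOCKS.matcher(html).replaceAll("$1<");
    }

    public static void main(String[] args) {
        String html = "<p><strong>bold</strong> <em>italic</em></p>\n"
                + "<p><strong><span style=\"color: red;\">red-bold</span> </strong>bold</p>";
        // The "\n" after the first </p> is removed; the inline spaces survive.
        System.out.println(strip(html));
    }
}
```

The trade-off is that the renderer may still see whitespace-only text nodes in places the list does not cover, so this reduces the number of nodes Jsoup creates less aggressively than the original pattern.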

@TekkiWuff
Author

Just a thought: I'm not very familiar with all the special ways HTML treats white space, so the following idea might not be the ideal solution. One approach could be to skip rendering a text node that consists only of white space when the previous node is a block element, such as a paragraph.
Maybe something like this in the renderNode(Node node) method?

    ...
    } else if (node instanceof TextNode) {
        String wholeText = ((TextNode) node).getWholeText();
        if (node.previousSibling() instanceof Element && ((Element) node.previousSibling()).isBlock() && wholeText.trim().isEmpty()) {
            // White space only text node after block element is ignored
            return;
        }
        renderText(wholeText);
    }
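To check the proposed rule without pulling in Jsoup, here is a self-contained sketch that applies it to a list of sibling nodes. `Node`, `Element`, `Text` and `renderSiblings` are hypothetical stand-ins for Jsoup's `Node`, `Element`, `TextNode` and the library's rendering, not real APIs; elements are emitted as `[tag]` markers instead of being rendered recursively. It uses `String.isBlank()` (Java 11+) in place of `trim().isEmpty()`, and records plus pattern matching need Java 16+.

```java
import java.util.List;

class SkipWhitespaceAfterBlock {

    // Hypothetical minimal node model standing in for Jsoup's classes.
    interface Node {}
    record Element(String tag, boolean blockLevel) implements Node {}
    record Text(String wholeText) implements Node {}

    // Renders a run of sibling nodes, skipping any whitespace-only text node
    // whose previous sibling is a block element (the rule proposed above).
    static String renderSiblings(List<Node> siblings) {
        StringBuilder out = new StringBuilder();
        Node previous = null;
        for (Node node : siblings) {
            if (node instanceof Text text) {
                boolean afterBlock = previous instanceof Element e && e.blockLevel();
                if (!(afterBlock && text.wholeText().isBlank())) {
                    out.append(text.wholeText());
                }
            } else if (node instanceof Element element) {
                // Placeholder for rendering the element's own content.
                out.append('[').append(element.tag()).append(']');
            }
            previous = node;
        }
        return out.toString();
    }
}
```

With this rule, a newline between two `<p>` elements yields `[p][p]`, while the space between `<strong>` and `<em>` is kept and yields `[strong] [em]`.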

@draco1023
Owner

In fact, the motivation for creating this library is not to handle arbitrary HTML strings, but to handle content generated by rich text editors and support exporting it to Word. As far as I know, rich text editors usually do not use the white-space style to handle whitespace characters (at least I have not encountered it; please correct me if I am wrong). The regular expression that removes whitespace between tags is mainly there to reduce the number of DOM nodes Jsoup has to parse, which improves efficiency. If the white-space style has to be taken into account, rendering text nodes becomes much more complex, and I need some time to think about it.

@TekkiWuff
Author

I did some more testing, and it turns out the white-space style isn't the problem here.
Even the following basic paragraph, created in TinyMCE with just two differently formatted inline elements following each other, loses the space between the words.
HTML:

<p><em style="color: red; ">red-italic</em> <strong style="color: blue;">blue-bold</strong></p>

Result:
[Screenshot: MissingSpaceBetweenWords]

@draco1023
Owner

Please try version 0.4.3
