Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blank lines are added when serializing HTML between elements that should not have whitespace #1304

Closed
Frenchomatic opened this Issue Aug 1, 2018 · 9 comments

Comments

Projects
None yet
2 participants
@Frenchomatic
Copy link

Frenchomatic commented Aug 1, 2018

Class AMP_Content_Sanitizer in class-amp-content.php

$content containing >< from the editor is fine up until where it gets sanitized. At this point a blank line appears to get inserted into $sanitzed_content. If you trace it through I think there is a problem in class AMP_Content_Sanitizer with this file being called class-amp-dom-utils.php - it is doing something strange. In other words

<li><strong> is in $content (the editor) but in $sanitzed_content you get <li>BLANK LINE<strong>

Another example

<h2>Blal Bla <span class="green">more text</span></h2> gets rendered as 

<h2>Blal Bla <span class="green">more text</span>
</h2>
@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented Aug 1, 2018

I've noticed this as well. It's actually an issue with PHP's DOM library (which uses libxml) and the behavior of its HTML serializer. Whitespace is not being preserved exactly. This became a problem for us in one case where the original HTML had two adjoining elements without any whitespace that are laid out side-by-side with display:inline-block and width:50%. When the HTML is serialized and the newlines are added, then the two elements would no longer fit on the same line because of the additional space added.

The method in question for where serialization is happening is AMP_DOM_Utils ::get_content_from_dom_node().

In particular:

https://github.com/Automattic/amp-wp/blob/6607160506050db3df09eefc38d9041b2a92976a/includes/utils/class-amp-dom-utils.php#L424

That being said, the issue could potentially come down to how DOMDocument is constructed in the first place in AMP_DOM_Utils::get_dom(). There is supposed to be a preserveWhiteSpace option that is enabled by default, but it doesn't seem to preserve the whitespace completely.

We might need to find a workaround if there isn't something that can be done at the libxml level.

At the same time, for improved performance (by reducing byte size) it would actually be best if whitespace were collapsed and not preserved when serializing. That is, unless the whitespace is inside of a pre of course.

@westonruter westonruter changed the title Blank lines inserted in html - Sanitizer Blank lines are added when serializing HTML between elements that should not have whitespace Aug 1, 2018

@Frenchomatic

This comment has been minimized.

Copy link
Author

Frenchomatic commented Aug 1, 2018

Glad someone knows what I am talking about. However, only a few of those making AMP pages even bother to look at the html being generated. AMP is a dead duck if you can't even produce clean html from a wordpress editor. We are not even talking about complex html constructs but basic stuff.

This problem occurs not just in this plugin but AMP for WP (https://wordpress.org/plugins/accelerated-mobile-pages/ ) also as it uses the vendor code.

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented Aug 1, 2018

AMP and Genesis are not mutually exclusive. A Genesis theme should be fully compatible with the plugin's AMP theme support, which was introduced in 0.7 and is being improved for the 1.0 release; it is now in 1.0-beta1.

@Frenchomatic

This comment has been minimized.

Copy link
Author

Frenchomatic commented Aug 1, 2018

@westonruter sorry - that was probably not called for. What I mean is I don't get these types of issues with Genesis. If I put something in the editor then that is what I get. If I write bad html I get bad html on the output. If I write good html then I get good html. What we have here is good html being turned into broken html (which ever way you look at it). It effects this plugin and another I use - https://wordpress.org/plugins/accelerated-mobile-pages/ . My PHP skills are not good enough to fix it.

Again - sorry for the remark because the Genesis part is irrelevant. Just using a bog standard WP install and I get the same issue.

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented Aug 1, 2018

I understand. And it is a bug, as you've identified. So we'll try to fix it. It's difficult because it doesn't seem to be a PHP coding issue at all but a lower-level libxml issue.

@Frenchomatic

This comment has been minimized.

Copy link
Author

Frenchomatic commented Aug 1, 2018

thanks @westonruter - I thought it may just be me because I have been banging on about it with the other plugin developer but not had responses. I now understand that it is probably a really difficult problem to solve.

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented Aug 2, 2018

I can see that the issue is fixed in PHP 7.3 which is nearing release: https://3v4l.org/77dc4

But this won't help most users.

@westonruter

This comment has been minimized.

Copy link
Member

westonruter commented Aug 2, 2018

@Frenchomatic

This comment has been minimized.

Copy link
Author

Frenchomatic commented Aug 3, 2018

@westonruter Lovely so far on my end. Tested a few scenarios - in particular this one <h2>Bal Bla <span class="something">more text</span></h2> now renders as it should.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.