Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

autop: Use placeholders to protect script, style tags #15129

Closed
wants to merge 3 commits into from

Conversation

aduth
Copy link
Member

@aduth aduth commented Apr 23, 2019

Fixes #9056
Related: https://core.trac.wordpress.org/ticket/2833#comment:19

This pull request seeks to resolve an issue with the autop implementation where br and p elements are added within the contents of a script or style tag. It also avoids adding a wrapping p to script tags (already present for style).

The underlying issue is described at #9056 (comment) . While one possible solution would have been to fix the incorrectly-ported regular expression backreference, it would have left the remaining issue of paragraphs being added within the elements. For that reason, and inspired by prior art by @cmmarslender at Trac#2833, the solution here extends the approach used for pre tag placeholder substitution to apply to style and script tags as well.

Testing Instructions:

Repeat steps to reproduce from #9056, verifying that neither br nor p tags are inserted within the contents of a post consisting only of this content (i.e. no blocks content, equivalent to a pre-WordPress 5.0 post).

Ensure unit tests pass:

npm run test-unit packages/autop

@aduth aduth added [Type] Bug An existing feature does not function as intended [Package] Autop /packages/autop labels Apr 23, 2019
@aduth aduth requested a review from youknowriad as a code owner April 23, 2019 19:07
@aduth
Copy link
Member Author

aduth commented Apr 24, 2019

@azaozz indirectly left some feedback on this approach from my related comments on the PHP-equivalent Trac ticket:

https://core.trac.wordpress.org/ticket/2833#comment:37

From that, I'm wondering if the specific change to include script in the allBlocks pattern should be omitted as part of this fix, or at least so as to be conservative until a decision there can be reached, while still allowing the primary issue here to be resolved.

https://github.com/WordPress/gutenberg/pull/15129/files#diff-eb22f4279633ce3065fcbb5f18189916L169

@azaozz
Copy link
Contributor

azaozz commented Apr 26, 2019

I'm wondering if the specific change to include script in the allBlocks pattern should be omitted

Yes, thinking it would be better to not add it to allBlocks. "Unwrapping" them would be good, but only when the <script> tag is the only content of the <p> tag. Same for the <style> tags.

The problem is the <script> tag is allowed/can be added everywhere #text is allowed. This brings edge cases when having markup like:

<p><script>alert(1);</script>This is some text...</p>

<p>This is some more text...<script>alert(2);</script></p>

There may be other edge cases like:

<p><script>alert(1);</script>
This is some text...</p>

where the line break may turn into a redundant <br> (I know, this is getting pretty complex)...

As autop hopefully is "on its way out", thinking it will be sufficient to unwrap <script> and <style> tags only when there isn't other content in the <p>s. Perhaps something like:

// If a script or style tag is wrapped in <p>, unwrap it.
text = text.replace( /<p>(<(script|style)[/s/S]+?<\/\\2>)<\/p>/g, '$1' );

@azaozz
Copy link
Contributor

azaozz commented Apr 26, 2019

On second thought, can we just drop autop from Gutenberg completely? :)

The PHP wpautop() "display filter" would happily work with HTML containing <p> tags. There are couple of exceptions that may need fixing, like line breaks inside tags (between < and >), but that "rare edge case" should be fixable. So, when it is unclear if the content was filtered through pre_wpautop on saving, we can still run wpautop on displaying a post.

The edge case of having only one classic block can be solved perhaps by adding an "empty" block at the beginning? Something like <!-- wp:classic /--> would do it. Then in PHP has_blocks() will be true for really all posts that were edited in Gutenberg, no "hacks" :)

@youknowriad
Copy link
Contributor

@azaozz @aduth This PR seems to close a high-priority bug in Gutenberg. how can we move it forward? Is the current state good enough? Can we get it in and iterate on the ideal solution separately?

@youknowriad
Copy link
Contributor

Up again here :) Should we land this?

@azaozz
Copy link
Contributor

azaozz commented Jan 29, 2020

Looking again, the only potential problem I see is adding of the <script> tag to const allBlocks (this also doesn't match the PHP autop). Also, another tag to consider for this treatment is <svg>.

More context: "historically" the js version of autop was intended to work together with contenteditable and the TinyMCE's cleanup on loading content, and on saving content. There are/were some deviations from the PHP autop.

Perhaps having script in allBlocks will work in the js version as the content will either be converted to blocks or loaded in TinyMCE/contenteditable (that has additional "cleanup").

@aduth aduth force-pushed the fix/9056-autop-script-style branch from 256fab7 to 3a0aebf Compare January 29, 2020 19:40
@aduth
Copy link
Member Author

aduth commented Jan 29, 2020

Thanks for the bump, and apologies I missed your original comment from October @youknowriad . It must have been a casualty of my extended leave.

Thanks also @azaozz for the feedback. It's been a while since I looked at this, so I needed a refresher. Based on the prior conversation, it seemed to me that the original approach here in trying to consider script as a block-level element was misguided (it's not, nor is style for that matter`). But (per @azaozz's prior remarks about "unwrapping"), we'd still like for them to inherit the same unwrapping behavior we apply to block tags.

The additional changes in 3a0aebf should address this:

  • Don't consider script or style tags as block tags, in order to avoid them being pulled out
  • Do consider them for unwrapping, in addition to the block tags.

Also, another tag to consider for this treatment is <svg>.

Added this as well in dc39e37, using as a test case the example from Trac#9437.

@aduth
Copy link
Member Author

aduth commented Jan 29, 2020

Thinking on this some more, I'm starting to wonder how much this would really help, at least without a corresponding change in the PHP implementation (Trac#2833).

This code is relevant for only the classic block. Even if the output in the browser is "correct", the server will still likely reintroduce the same problems when it applies wpautop to content. The only exception would be if the post also has other blocks in addition to the Classic/Freeform block (source). That's a pretty small subset of cases we'd be having any improvement over.

Maybe I'm missing something. Is there anything specific about Gutenberg and the issues being described in #9056 which weren't already present in the 14 years since Trac#2833 was opened?

@aduth
Copy link
Member Author

aduth commented May 25, 2020

Note about the status here: The changes here should effectively be considered complete. However, as noted in #15129 (comment), this is only one part of the bigger issue with how autop works, since it depends on equivalent changes to the server-side implementation of wpautop for the content to not be altered unexpectedly in preparing for front-end display. A coordinated effort to port the changes to PHP would be required for this to be "solved".

@aduth aduth added the [Status] Blocked Used to indicate that a current effort isn't able to move forward label May 25, 2020
@youknowriad
Copy link
Contributor

@azaozz Do you think bringing those changes to PHP is something you could help with? or who would be the right person to help here? I admit I don't trust myself to make this kind of changes in WP :)

@azaozz
Copy link
Contributor

azaozz commented May 26, 2020

Do you think bringing those changes to PHP is something you could help with?

Sure. Have looked at both the JS and PHP autop many times over the years, and the changes in this PR make sense.

IMHO this is a good place to do some "tests first" coding. Will need to find all edge cases and write tests for them before changing the actual code. Looking at the current patches (both JS and PHP) they'll need quite a bit more testing before deemed "ready".

As far as I see there are three parts to this:

  1. Fix the existing regex (in JS) that replaces line breaks inside script tags with a placeholder, see https://github.com/WordPress/gutenberg/blob/master/packages/autop/src/index.js#L282. The fix is described here: Line Break Tags Added to Scripts and Styles #9056 (comment) (needs both the [\s\S]+ and fixing the backref). This will bring the JS autop inline with the PHP autop, and has the advantage of being 100% backwards compatible (i.e. it won't introduce new edge cases, but won't fix existing edge cases either).
  2. Add tests and implement the unwrapping (if needed after implementing # 1 above). Looking at the changes here, the tests should account for cases like <p>Text text.<script>alert(1)</script></p> which should not be unwrapped.
  3. Alternatively to # 1, implement the "whole tag exclusion" as the proposed patch here. This looks better than # 1, but might bring new edge cases, so a "tests first" approach seems best for both JS and PHP.

@gwwar
Copy link
Contributor

gwwar commented Feb 25, 2021

@mcsf @youknowriad @azaozz were folks still working on this one? What do y'all suggest for next steps? This one is pretty tricky so I think we'd at least need reviews from folks who are familiar with autop.

This came up in core-editor triage

@mcsf
Copy link
Contributor

mcsf commented Feb 26, 2021

were folks still working on this one? What do y'all suggest for next steps? This one is pretty tricky so I think we'd at least need reviews from folks who are familiar with autop.

No work happening from my side or Riad's. I don't know about @azaozz. Indeed, it's tricky, and Andrew raised good questions about the pertinence of the change in #15129 (comment). If we do want to go forward, it's a delicate task that someone would need to own.

Base automatically changed from master to trunk March 1, 2021 15:42
@annezazu annezazu closed this Jul 27, 2022
@youknowriad youknowriad deleted the fix/9056-autop-script-style branch September 7, 2022 09:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Package] Autop /packages/autop [Status] Blocked Used to indicate that a current effort isn't able to move forward [Type] Bug An existing feature does not function as intended
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Line Break Tags Added to Scripts and Styles
7 participants