Only emit CDATA wrappers for inline scripts for JavaScript #5925

Closed
wants to merge 4 commits into from

Conversation

westonruter
Member

@westonruter westonruter commented Jan 22, 2024

Trac ticket: https://core.trac.wordpress.org/ticket/60320

Commit message

Script Loader: Only emit CDATA wrapper comments in wp_get_inline_script_tag() for JavaScript.

This avoids erroneously adding CDATA wrapper comments for non-JavaScript scripts, including those for JSON such as the importmap for script modules in #56313.

Props westonruter, flixos90, mukesh27, dmsnell.
Fixes #60320.
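
As a rough illustration of the bug described in the commit message, the sketch below wraps a JSON importmap in comment-style CDATA markers and then tries to parse it. The helper name `example_wrap_in_cdata()`, the exact wrapper text, and the sample importmap are assumptions made for this example, not code taken from the patch.

```php
<?php
// Hypothetical helper: wraps inline script data in comment-style CDATA markers.
// The exact marker text is an assumption for illustration.
function example_wrap_in_cdata( string $data ): string {
	return "/* <![CDATA[ */\n" . $data . "\n/* ]]> */";
}

// Sample importmap payload (made up for this example).
$importmap = '{"imports":{"app":"/js/app.js"}}';

// The bare JSON parses without trouble.
$plain = json_decode( $importmap, true );

// JSON has no comment syntax, so the wrapped payload fails to parse.
$wrapped = json_decode( example_wrap_in_cdata( $importmap ), true );

var_dump( is_array( $plain ) ); // bool(true)
var_dump( $wrapped );           // NULL
```

The same wrapper is harmless around JavaScript, where `/* ... */` is a valid comment, which is why the fix limits it to JavaScript types.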


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

Comment on lines +2887 to +2890
str_contains( $attributes['type'], 'javascript' ) ||
str_contains( $attributes['type'], 'ecmascript' ) ||
str_contains( $attributes['type'], 'jscript' ) ||
str_contains( $attributes['type'], 'livescript' )
Member Author

This will account for a range of possible legacy types beyond the current text/javascript, such as application/javascript, text/x-javascript, text/javascript1.5, and so on.

For example, see this list from WP Rocket: https://github.com/wp-media/wp-rocket/blob/8d510d7b160011ff175488e17d9d6d1254ed16f9/inc/Engine/Optimization/DelayJS/HTML.php#L43-L59

See also a query on Bard: https://g.co/bard/share/eb1358920977
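
For readers who want to try the substring check from the diff in isolation, here is a minimal sketch. The function name `example_type_looks_like_javascript()` is hypothetical, and the sketch assumes PHP 8.0+ (or a `str_contains()` polyfill such as the one WordPress ships). Like the quoted lines, it compares case-sensitively.

```php
<?php
// Hypothetical standalone version of the four str_contains() checks in the diff.
function example_type_looks_like_javascript( string $type ): bool {
	return str_contains( $type, 'javascript' )
		|| str_contains( $type, 'ecmascript' )
		|| str_contains( $type, 'jscript' )
		|| str_contains( $type, 'livescript' );
}

// Legacy JavaScript MIME types are matched by substring.
var_dump( example_type_looks_like_javascript( 'text/javascript' ) );    // bool(true)
var_dump( example_type_looks_like_javascript( 'text/javascript1.5' ) ); // bool(true)
var_dump( example_type_looks_like_javascript( 'text/x-javascript' ) );  // bool(true)

// Non-JavaScript script types are not matched.
var_dump( example_type_looks_like_javascript( 'application/json' ) );   // bool(false)
var_dump( example_type_looks_like_javascript( 'importmap' ) );          // bool(false)
```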

Member

Is type="module" relevant here? I guess it may not be mutually exclusive, but in which world would someone not use HTML5 but JS modules? 😆

Member Author

Yes, it is relevant. A theme may not "support" HTML5 but a plugin may still add modules to the page.
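
One way to read this exchange: `module` contains none of the four substrings, so it would need its own comparison to be treated as JavaScript. The sketch below shows that under stated assumptions. Both function names are hypothetical, and the explicit `'module' === $type` comparison illustrates the point rather than quoting the patch.

```php
<?php
// Hypothetical helper: the substring test for legacy JavaScript MIME types.
function example_matches_legacy_js_type( string $type ): bool {
	foreach ( array( 'javascript', 'ecmascript', 'jscript', 'livescript' ) as $needle ) {
		if ( str_contains( $type, $needle ) ) {
			return true;
		}
	}
	return false;
}

// Hypothetical helper: additionally treats type="module" as JavaScript.
function example_is_javascript_or_module( string $type ): bool {
	return 'module' === $type || example_matches_legacy_js_type( $type );
}

var_dump( example_matches_legacy_js_type( 'module' ) );   // bool(false)
var_dump( example_is_javascript_or_module( 'module' ) );  // bool(true)
var_dump( example_is_javascript_or_module( 'importmap' ) ); // bool(false)
```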


Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

Member

@felixarntz felixarntz left a comment

@westonruter Looks great! Only a few minor points.

@dmsnell
Contributor

dmsnell commented Jan 22, 2024

The only situation where we could break out of a SCRIPT tag is when the JS contents include the string </script and then some content and then >. Would it make more sense simply to rip out the fake CDATA?

We would only need it I think in XML circumstances, meaning inside RSS feeds, which don't get SCRIPT elements.

Since these pages are truly delivering HTML5 even without theme support, this seems a bit of a practical choice. We could reject JavaScript code containing </script, or more robustly check for content that would exit the SCRIPT element and reject it. Reject, because we can't know how to split it, if it's in a JavaScript string, comment, etc…
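
A conservative version of the rejection check suggested here might look like the sketch below. The function name `example_could_close_script_element()` is hypothetical, and the check is deliberately broader than the condition described above: it flags any case-insensitive occurrence of `</script`, without trying to decide whether the text sits inside a JavaScript string or comment.

```php
<?php
// Hypothetical helper: flags inline script content that could terminate
// the enclosing SCRIPT element. Conservative: any "</script" is flagged.
function example_could_close_script_element( string $script_content ): bool {
	return false !== stripos( $script_content, '</script' );
}

// A closing tag inside a JavaScript string is still flagged.
var_dump( example_could_close_script_element( 'var s = "</script>";' ) ); // bool(true)

// The match is case-insensitive.
var_dump( example_could_close_script_element( 'x = "</SCRIPT >";' ) );    // bool(true)

// An ordinary less-than comparison is not flagged.
var_dump( example_could_close_script_element( 'if ( a < b ) { c(); }' ) ); // bool(false)
```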

@westonruter
Member Author

@dmsnell AFAIK the only purpose of the CDATA wrappers was to prevent scripts from breaking XML when they use <, for example in an if statement. The CDATA wrapper allows inline scripts to use a literal < instead of &lt; when the browser is in XML parsing mode. I don't think the intention was ever to guard against a script breaking out.

So I think more protection against accidentally (or not) breaking out of a script would be a good idea, but should probably be done in a separate ticket. And since json_encode() escapes slashes by default, I think this is already usually mitigated.
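
The point about `json_encode()` can be checked with a short script. The sample string is made up for illustration. With default flags PHP escapes `/` as `\/`, so the literal sequence `</script` does not appear in the output; passing `JSON_UNESCAPED_SLASHES` removes that protection.

```php
<?php
// Sample value containing a closing script tag (made up for this example).
$value = '</script><script>alert(1)</script>';

// Default flags: forward slashes are escaped as \/.
$escaped = json_encode( $value );

// With JSON_UNESCAPED_SLASHES the closing tag survives verbatim.
$unescaped = json_encode( $value, JSON_UNESCAPED_SLASHES );

echo $escaped, "\n";   // "<\/script><script>alert(1)<\/script>"
echo $unescaped, "\n"; // "</script><script>alert(1)</script>"
```

This only covers data that passes through `json_encode()` with default flags; hand-written inline JavaScript gets no such escaping.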

@dmsnell
Contributor

dmsnell commented Jan 23, 2024

@westonruter "AFAIK the only purpose of the CDATA wrappers was to prevent scripts from breaking XML when they use <, for example in an if statement." this was my point exactly. I'm suggesting that given how near-impossible it is to trigger the XML parsing mode within WordPress, this code isn't practically necessary and the breakage it introduces more than outweighs a hypothetical breakage we could see without any of it.

In fact, on the pages this serves, the CDATA wrappers are not letting inline scripts use &lt; or <. Scripts may already use < and &lt; will come across as the verbatim four-character string in the script. That is, based on your web querying and what I've found in my own tests, adding CDATA is more harmful than removing it.

Thus my question is should we rip it all out since it's not fulfilling the purpose for which it's there? (Okay technically it's fulfilling the purpose if and only if the page is properly sent as XML, which you confirmed "it is not.") In those cases the burden is on the extender to properly escape their own JavaScript code. This affects a fraction of the less-than 0.0001% of sites serving XML that also contain script data that would break without the CDATA. On the other hand, it seems like it's breaking a noticeable number of normative sites where the CDATA's presence or absence has zero impact on how the SCRIPT contents are interpreted.

@westonruter
Member Author

I think you're right that the CDATA wrappers should be removed. But maybe we should do it in a follow-up ticket so there is more visibility?

@westonruter
Member Author

Something else we should do is make it clear that these script functions are not exclusively for JavaScript. Currently it explicitly is mentioning "JavaScript" in the function description and in the initial argument.

@felixarntz
Member

+1 to discussing removal of CDATA wrappers separately. This PR is about a bug with the CDATA wrappers rather than removing them.

@westonruter
Member Author

Committed in r57341 (5139923)

@westonruter
Member Author

Something else we should do is make it clear that these script functions are not exclusively for JavaScript. Currently it explicitly is mentioning "JavaScript" in the function description and in the initial argument.

See Core-60331.

4 participants