Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 19 additions & 8 deletions src/wp-includes/html-api/class-wp-html-tag-processor.php
Original file line number Diff line number Diff line change
Expand Up @@ -3780,17 +3780,28 @@ public function set_modifiable_text( string $plaintext_content ): bool {

switch ( $this->get_tag() ) {
case 'SCRIPT':
/*
/**
* This is over-protective, but ensures the update doesn't break
* out of the SCRIPT element. A more thorough check would need to
* ensure that the script closing tag doesn't exist, and isn't
* also "hidden" inside the script double-escaped state.
* the HTML structure of the SCRIPT element.
*
* More thorough analysis could track the HTML tokenizer states
* and to ensure that the SCRIPT element closes at the expected
* SCRIPT close tag as is done in {@see ::skip_script_data()}.
*
* It may seem like replacing `</script` with `<\/script` would
* properly escape these things, but this could mask regex patterns
* that previously worked. Resolve this by not sending `</script`
* A SCRIPT element could be closed prematurely by contents
* like `</script>`. A SCRIPT element could be prevented from
* closing by contents like `<!--<script>`.
*
* The following strings are essential for dangerous content,
* although they are insufficient on their own. This trade-off
* prevents dangerous scripts from being sent to the browser.
* It is also unlikely to produce HTML that may confuse more
* basic HTML tooling.
*/
if ( false !== stripos( $plaintext_content, '</script' ) ) {
if (
false !== stripos( $plaintext_content, '</script' ) ||
false !== stripos( $plaintext_content, '<script' )
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dangerous thing here is to enter the double escaped state. I wrote extensively about that here.

This check could be for both <!-- and <script, which are necessary to enter the double escaped state, but I think the <script check is sufficient for now.

Strictly speaking, the double escaped state is something like /<!---*(?!>).*<script[ \t\n\r\f\/>]/i and the dangerous closer /<\/script[ \t\n\r\f\/>]/i.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with better understanding of this now it still makes me wonder about allowing vs. not allowing it. we can detect if we enter and exit the double-escaped state, so technically we could allow entering it before rejecting the update.

but assuming we can allow this, it carries the risk downstream to unaware code. not allowing it at all means simpler code doesn’t have to know about these nuances. but then again, we can’t always go out of our way to ensure that broken code doesn’t break.

no insight, just questions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been wondering about this as well. I'm confident in the logic around processing script tag contents. It would require some refactoring, but we could determine whether script tag contents are dangerous (with regards to HTML structure) in spec-compliant ways instead of these very basic tests.

  • 👍 This check ensures unsafe scripts are not allowed
  • 👎 This check disallows some safe scripts that appear dangerous.

A good example is that <!--<script></script> is perfectly fine. It includes both of the substrings that are dangerous, but when combined in the correct way they become safe. The HTML API is smart enough to recognize and allow this, but it's very likely to confuse other simpler processing such as regular expression based approaches.

My feeling, at least at this time, is that it's preferable to be more restrictive now and fail on some inputs that are not safe but look dangerous. In the future it should be easy to make these checks more accurate and allow more safe but dangerous-seeming input.

) {
return false;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -490,6 +490,7 @@ public static function data_unallowed_modifiable_text_updates() {
'Comment with --!>' => array( '<!-- this is a comment -->', 'Invalid but legitimate comments end in --!>' ),
'SCRIPT with </script>' => array( '<script>Replace me</script>', 'Just a </script>' ),
'SCRIPT with </script attributes>' => array( '<script>Replace me</script>', 'before</script id=sneak>after' ),
'SCRIPT with "<script " opener' => array( '<script>Replace me</script>', '<!--<script ' ),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

earmarking for follow-up: we could probably create a new unit test that better spells out what we’re wanting to enshrine here

/**
 * @param 'early-exit'|'prevents-exit' $violation Whether the violating contents close the SCRIPT element early or prevent its normal closure.
 */
public function rejects_script_contents_which_escape_the_script_element_boundaries( string $violating_script_contents, string $violation ) {
	…
}

);
}
}
Loading