Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Interactivity API store JSON encoding #6520

Closed
wants to merge 12 commits into from
33 changes: 32 additions & 1 deletion src/wp-includes/interactivity-api/class-wp-interactivity-api.php
Original file line number Diff line number Diff line change
Expand Up @@ -167,10 +167,41 @@ public function print_client_interactivity_data() {
}

if ( ! empty( $interactivity_data ) ) {
/*
* This data will be printed as JSON inside a script tag like this:
* <script type="application/json"></script>
*
* A script tag must be closed by a sequence beginning with `</`. It's impossible to
* close a script tag without using `<`. We ensure that `<` is escaped and `/` can
* remain unescaped, so `</script>` will be printed as `\u003C/script\u00E3`.
*
* - JSON_HEX_TAG: All < and > are converted to \u003C and \u003E.
* - JSON_UNESCAPED_SLASHES: Don't escape /.
*
* If the page will use UTF-8 encoding, it's safe to print unescaped unicode:
*
* - JSON_UNESCAPED_UNICODE: Encode multibyte Unicode characters literally (instead of as `\uXXXX`).
* - JSON_UNESCAPED_LINE_TERMINATORS: The line terminators are kept unescaped when
* JSON_UNESCAPED_UNICODE is supplied. It uses the same behaviour as it was
* before PHP 7.1 without this constant. Available as of PHP 7.1.0.
*
* The JSON specification requires encoding in UTF-8, so if the generated HTML page
* is not encoded in UTF-8 then it's not safe to include those literals. They must
* be escaped to avoid encoding issues.
*
* @see https://www.rfc-editor.org/rfc/rfc8259.html for details on encoding requirements.
* @see https://www.php.net/manual/en/json.constants.php for details on these constants.
* @see https://html.spec.whatwg.org/#script-data-state for details on script tag parsing.
*/
$json_encode_flags = JSON_HEX_TAG | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_LINE_TERMINATORS;
if ( ! is_utf8_charset() ) {
$json_encode_flags = JSON_HEX_TAG | JSON_UNESCAPED_SLASHES;
}

wp_print_inline_script_tag(
wp_json_encode(
$interactivity_data,
JSON_HEX_TAG | JSON_HEX_AMP
$json_encode_flags
),
array(
'type' => 'application/json',
Expand Down
70 changes: 64 additions & 6 deletions tests/phpunit/tests/interactivity-api/wpInteractivityAPI.php
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,10 @@ public function set_up() {
$this->interactivity = new WP_Interactivity_API();
}

public function charset_iso_8859_1() {
return 'iso-8859-1';
}

/**
* Tests that the state and config methods return an empty array at the
* beginning.
Expand Down Expand Up @@ -349,22 +353,76 @@ public function test_config_printed_correctly_with_nested_empty_array() {
* properly escaped.
*
* @ticket 60356
* @ticket 61170
*
* @covers ::state
* @covers ::config
* @covers ::print_client_interactivity_data
*/
public function test_state_and_config_escape_special_characters() {
$this->interactivity->state( 'myPlugin', array( 'amps' => 'http://site.test/?foo=1&baz=2' ) );
$this->interactivity->config( 'myPlugin', array( 'tags' => 'Tags: <!-- <script>' ) );
$this->interactivity->state(
'myPlugin',
array(
'ampersand' => '&',
'less-than sign' => '<',
'greater-than sign' => '>',
'solidus' => '/',
'line separator' => "\u{2028}",
'paragraph separator' => "\u{2029}",
'flag of england' => "\u{1F3F4}\u{E0067}\u{E0062}\u{E0065}\u{E006E}\u{E0067}\u{E007F}",
'malicious script closer' => '</script>',
'entity-encoded malicious script closer' => '&lt;/script&gt;',
)
);
$this->interactivity->config( 'myPlugin', array( 'chars' => '&<>/' ) );

$interactivity_data_markup = get_echo( array( $this->interactivity, 'print_client_interactivity_data' ) );
preg_match( '/<script type="application\/json" id="wp-interactivity-data">.*?(\{.*\}).*?<\/script>/s', $interactivity_data_markup, $interactivity_data_string );
preg_match( '~<script type="application/json" id="wp-interactivity-data">\s*(\{.*\})\s*</script>~s', $interactivity_data_markup, $interactivity_data_string );

$this->assertEquals(
'{"config":{"myPlugin":{"tags":"Tags: \u003C!-- \u003Cscript\u003E"}},"state":{"myPlugin":{"amps":"http:\/\/site.test\/?foo=1\u0026baz=2"}}}',
$interactivity_data_string[1]
$expected = <<<"JSON"
{"config":{"myPlugin":{"chars":"&\\u003C\\u003E/"}},"state":{"myPlugin":{"ampersand":"&","less-than sign":"\\u003C","greater-than sign":"\\u003E","solidus":"/","line separator":"\u{2028}","paragraph separator":"\u{2029}","flag of england":"\u{1F3F4}\u{E0067}\u{E0062}\u{E0065}\u{E006E}\u{E0067}\u{E007F}","malicious script closer":"\\u003C/script\\u003E","entity-encoded malicious script closer":"&lt;/script&gt;"}}}
JSON;
$this->assertEquals( $expected, $interactivity_data_string[1] );
}

/**
* Tests that special characters in the initial state and configuration are
* properly escaped when the blog_charset is not UTF-8 (unicode compatible).
*
* This this test, unicode and line terminators should be escaped to their
* JSON unicode sequences.
*
* @ticket 61170
*
* @covers ::state
* @covers ::config
* @covers ::print_client_interactivity_data
*/
public function test_state_and_config_escape_special_characters_non_utf8() {
add_filter( 'pre_option_blog_charset', array( $this, 'charset_iso_8859_1' ) );
$this->interactivity->state(
'myPlugin',
array(
'ampersand' => '&',
'less-than sign' => '<',
'greater-than sign' => '>',
'solidus' => '/',
'line separator' => "\u{2028}",
'paragraph separator' => "\u{2029}",
'flag of england' => "\u{1F3F4}\u{E0067}\u{E0062}\u{E0065}\u{E006E}\u{E0067}\u{E007F}",
'malicious script closer' => '</script>',
'entity-encoded malicious script closer' => '&lt;/script&gt;',
)
);
$this->interactivity->config( 'myPlugin', array( 'chars' => '&<>/' ) );

$interactivity_data_markup = get_echo( array( $this->interactivity, 'print_client_interactivity_data' ) );
preg_match( '~<script type="application/json" id="wp-interactivity-data">\s*(\{.*\})\s*</script>~s', $interactivity_data_markup, $interactivity_data_string );

$expected = <<<"JSON"
{"config":{"myPlugin":{"chars":"&\\u003C\\u003E/"}},"state":{"myPlugin":{"ampersand":"&","less-than sign":"\\u003C","greater-than sign":"\\u003E","solidus":"/","line separator":"\\u2028","paragraph separator":"\\u2029","flag of england":"\\ud83c\\udff4\\udb40\\udc67\\udb40\\udc62\\udb40\\udc65\\udb40\\udc6e\\udb40\\udc67\\udb40\\udc7f","malicious script closer":"\\u003C/script\\u003E","entity-encoded malicious script closer":"&lt;/script&gt;"}}}
JSON;
$this->assertEquals( $expected, $interactivity_data_string[1] );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea: This could use the HTML Processor to step over the tokens in $interactivity_data_markup as an additional check to ensure that the malicious script closer does not prematurely close the script. Maybe use the PHP's DOM API as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the same thing as I worked with these tests. However, in this case I think it makes sense to match and work with literal strings. I don't have much confidence in any of the parsers to do what I expect and not try to interpret any of this as HTML markup. I want to know exactly what characters are output and don't want any entities to be transformed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HTML Processor doesn't support SCRIPT tags right now, and the tag processor is much more rudimentary. I'm not sure either is ready to handle what you suggest. @dmsnell thoughts on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case the Tag Processor should be inscrutable for testing SCRIPT elements, because it does consume the entire thing in one go. you can compare get_modifiable_text() to your expectation to see what was inside the script

}

/**
Expand Down