-
Notifications
You must be signed in to change notification settings - Fork 3.2k
HTML API: Ensure correct encoding of modified class names #10591
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTML API: Ensure correct encoding of modified class names #10591
Conversation
Test using WordPress PlaygroundThe changes in this pull request can previewed and tested using a WordPress Playground instance. WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser. Some things to be aware of
For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation. |
e5a60c7 to
cf42da8
Compare
| 'expected' => '<pre foo="bar" id="<code" class="wp-block-code <code is poetry> firstTag"><code class="secondTag">This <is> a <strong is="true">thing.</code></pre><span>test</span>', | ||
| ), | ||
| 'HTML tag brackets in attribute values and data markup' => array( | ||
| 'input' => '<pre id="<code->-block->" class="wp-block-code <code is poetry>"><code>This <is> a <strong is="true">thing.</code></pre><span>test</span>', | ||
| 'expected' => '<pre foo="bar" id="<code->-block->" class="wp-block-code <code is poetry&gt; firstTag"><code class="secondTag">This <is> a <strong is="true">thing.</code></pre><span>test</span>', | ||
| 'expected' => '<pre foo="bar" id="<code->-block->" class="wp-block-code <code is poetry> firstTag"><code class="secondTag">This <is> a <strong is="true">thing.</code></pre><span>test</span>', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests seem to have been ensuring incorrect behavior.
In both cases updated here an input class like poetry> (decoded value poetry>) was changed to poetry&gt; (decoded value poetry>).
They were updated in #10143. This change restores their original (correct) assertion.
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the Core Committers: Use this line as a base for the props when committing in SVN: To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
| $this->html, | ||
| $this->attributes['class']->value_starts_at, | ||
| $this->attributes['class']->value_length | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve been staring at this for a bit now and have no idea how we missed it. Surely we didn’t overlook this since 6.2 in the original introduction?
but you know what? we did!
php > $p = new WP_HTML_Tag_Processor( '<div class="& abc">' );
php > $p->next_tag();
php > $p->add_class( '&' );
php > $p->add_class( 'abc' );
php > echo $p->get_updated_html();
<div class="& abc & abc">
php > $p->remove_class( 'abc' );
php > echo $p->get_updated_html();
<div class="& abc &">
php > $p->remove_class( '&' );
php > echo $p->get_updated_html();
<div class="& abc &">now I don’t remember why; maybe it was performance-related or maybe it was a choice to lean on already-existing bugs, but I suppose it’s proper here to fix it regardless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there's been a latent issue for a while that was covered up by the behavior of esc_html(). It's become very clear with the change in 6.9 to use more rigorous attribute encoding.
|
@ktmn this is a proposed fix for WordPress/gutenberg#73713. |
dmsnell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’d be nice to have performance evaluations of this, but that could be hard. It’s not directly in the hot path, and only happens when flushing updates. Either way, I prefer correctness than raw performance, and WP_HTML_Decoder::decode_attribute() is already checking for the presence of & so in most calls will be little more than a noop.
I tested and confirmed the broken behavior, that it originally appeared in 6.2 with the inception of the Tag Processor, and that the test is corrected. The theory checks out, so this should be solid.
Some class names with HTML character references could be mishandled, for example: - Failure to remove an existing class like `&` with `::remove_class( '&' )` - Double-encoding of an existing class like `&` after a modification, becoming `&` The second case manifested after double-encoding prevention was removed from `::set_attribute()` in [60919]. Developed in #10591. Props jonsurrell, dmsnell. Fixes #64340. git-svn-id: https://develop.svn.wordpress.org/trunk@61346 602fd350-edb4-49c9-b593-d223f7449a82
Some class names with HTML character references could be mishandled, for example: - Failure to remove an existing class like `&` with `::remove_class( '&' )` - Double-encoding of an existing class like `&` after a modification, becoming `&` The second case manifested after double-encoding prevention was removed from `::set_attribute()` in [60919]. Developed in WordPress/wordpress-develop#10591. Props jonsurrell, dmsnell. Fixes #64340. Built from https://develop.svn.wordpress.org/trunk@61346 git-svn-id: http://core.svn.wordpress.org/trunk@60658 1a063a9b-81f0-0310-95a4-ce76da25c4cd
Some class names with HTML character references could be mishandled, for example: - Failure to remove an existing class like `&` with `::remove_class( '&' )` - Double-encoding of an existing class like `&` after a modification, becoming `&` The second case manifested after double-encoding prevention was removed from `::set_attribute()` in [60919]. Developed in WordPress/wordpress-develop#10591. Props jonsurrell, dmsnell. Fixes #64340. Built from https://develop.svn.wordpress.org/trunk@61346 git-svn-id: https://core.svn.wordpress.org/trunk@60658 1a063a9b-81f0-0310-95a4-ce76da25c4cd
Ensure the class name modifications preserve correct HTML encoding when calling
add_classandget_updated_htmlrepeatedly.There was an inconsistency in this branching where
$existing_classwould get a correctly decoded attribute value if there was a value returned fromget_enqueued_attribute_value()but would otherwise get raw, unparsed HTML if there was a class attribute in the HTML:wordpress-develop/src/wp-includes/html-api/class-wp-html-tag-processor.php
Lines 2339 to 2350 in 3a87241
Note that
::get_enqueued_attribute_value()returns a decoded value:wordpress-develop/src/wp-includes/html-api/class-wp-html-tag-processor.php
Line 2734 in 3a87241
This inconsistent behavior was hidden by the use of
esc_attr()which avoids double-escaping character references, so when::set_attribute()was later called it would only encoded character references that did not appear to already be encoded.wordpress-develop/src/wp-includes/html-api/class-wp-html-tag-processor.php
Line 2479 in 3a87241
When
:set_attribute()was changed in r60919 (4892d46) to always escape attribute values, the double-escaping manifested.Trac ticket: https://core.trac.wordpress.org/ticket/64340
Originally reported in WordPress/gutenberg#73713.
This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.