Skip to content

Conversation

@sirreal
Copy link
Member

@sirreal sirreal commented Dec 3, 2025

Ensure the class name modifications preserve correct HTML encoding when calling add_class and get_updated_html repeatedly.

There was an inconsistency in this branching where $existing_class would get a correctly decoded attribute value if there was a value returned from get_enqueued_attribute_value() but would otherwise get raw, unparsed HTML if there was a class attribute in the HTML:

$existing_class = $this->get_enqueued_attribute_value( 'class' );
if ( null === $existing_class || true === $existing_class ) {
$existing_class = '';
}
if ( false === $existing_class && isset( $this->attributes['class'] ) ) {
$existing_class = substr(
$this->html,
$this->attributes['class']->value_starts_at,
$this->attributes['class']->value_length
);
}

Note that ::get_enqueued_attribute_value() returns a decoded value:

return WP_HTML_Decoder::decode_attribute( $enqueued_value );

This inconsistent behavior was hidden by the use of esc_attr() which avoids double-escaping character references, so when ::set_attribute() was later called it would only encoded character references that did not appear to already be encoded.

$this->set_attribute( 'class', $class );

When :set_attribute() was changed in r60919 (4892d46) to always escape attribute values, the double-escaping manifested.

WP_HTML_Tag_Processor and WP_HTML_Processor may incorrectly encode class names containing the characters &, <, >, ", or ' when modifying them via class methods like ::add_class() and calling ::get_updated_html().

For example:

$p = new WP_HTML_Tag_Processor('<div></div>');
$p->next_tag();
$p->add_class('&');
echo $p->get_updated_html() . "\n";
$p->add_class('OK');
echo $p->get_updated_html() . "\n";

Will print:

<div class="&amp;"></div>
<div class="&amp;amp; OK"></div>

Notice that the first pass is correct, & has been correctly encoded in the class attribute as &. However, after calling ::add_class() and ::get_updated_html() again, the & has incorrectly been double-encoded as &amp;amp;.

The same code in WordPress 6.8 would print:

<div class="&amp;"></div>
<div class="&amp; OK"></div>

This is related to r60919 that was released in WordPress 6.9. The double-encoding behavior was present before, but it was "corrected" in this case by the use of esc_attr() that avoids any double-encoding. When esc_attr() usage was removed in r60919, the double-escaping behavior manifests causing this issue.

Trac ticket: https://core.trac.wordpress.org/ticket/64340

Originally reported in WordPress/gutenberg#73713.


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

@github-actions
Copy link

github-actions bot commented Dec 3, 2025

Test using WordPress Playground

The changes in this pull request can previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

@sirreal sirreal force-pushed the trac-64340/html-api-double-encoded-class-names branch from e5a60c7 to cf42da8 Compare December 3, 2025 13:16
Comment on lines +2890 to +2894
'expected' => '<pre foo="bar" id="<code" class="wp-block-code &lt;code is poetry&gt; firstTag"><code class="secondTag">This &lt;is> a &lt;strong is="true">thing.</code></pre><span>test</span>',
),
'HTML tag brackets in attribute values and data markup' => array(
'input' => '<pre id="<code-&gt;-block-&gt;" class="wp-block-code <code is poetry&gt;"><code>This &lt;is> a &lt;strong is="true">thing.</code></pre><span>test</span>',
'expected' => '<pre foo="bar" id="<code-&gt;-block-&gt;" class="wp-block-code &lt;code is poetry&amp;gt; firstTag"><code class="secondTag">This &lt;is> a &lt;strong is="true">thing.</code></pre><span>test</span>',
'expected' => '<pre foo="bar" id="<code-&gt;-block-&gt;" class="wp-block-code &lt;code is poetry&gt; firstTag"><code class="secondTag">This &lt;is> a &lt;strong is="true">thing.</code></pre><span>test</span>',
Copy link
Member Author

@sirreal sirreal Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests seem to have been ensuring incorrect behavior.

In both cases updated here an input class like poetry&gt; (decoded value poetry>) was changed to poetry&amp;gt; (decoded value poetry&gt;).

They were updated in #10143. This change restores their original (correct) assertion.

@sirreal sirreal marked this pull request as ready for review December 3, 2025 13:38
@github-actions
Copy link

github-actions bot commented Dec 3, 2025

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell, dmsnell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@sirreal sirreal requested a review from dmsnell December 3, 2025 13:38
$this->html,
$this->attributes['class']->value_starts_at,
$this->attributes['class']->value_length
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve been staring at this for a bit now and have no idea how we missed it. Surely we didn’t overlook this since 6.2 in the original introduction?

but you know what? we did!

php > $p = new WP_HTML_Tag_Processor( '<div class="&amp; &#x61;bc">' );
php > $p->next_tag();
php > $p->add_class( '&' );
php > $p->add_class( 'abc' );
php > echo $p->get_updated_html();
<div class="&amp; &#x61;bc &amp; abc">
php > $p->remove_class( 'abc' );
php > echo $p->get_updated_html();
<div class="&amp; &#x61;bc &amp;">
php > $p->remove_class( '&' );
php > echo $p->get_updated_html();
<div class="&amp; &#x61;bc &amp;">

now I don’t remember why; maybe it was performance-related or maybe it was a choice to lean on already-existing bugs, but I suppose it’s proper here to fix it regardless.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there's been a latent issue for a while that was covered up by the behavior of esc_html(). It's become very clear with the change in 6.9 to use more rigorous attribute encoding.

@sirreal
Copy link
Member Author

sirreal commented Dec 3, 2025

@ktmn this is a proposed fix for WordPress/gutenberg#73713.

Copy link
Member

@dmsnell dmsnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’d be nice to have performance evaluations of this, but that could be hard. It’s not directly in the hot path, and only happens when flushing updates. Either way, I prefer correctness than raw performance, and WP_HTML_Decoder::decode_attribute() is already checking for the presence of & so in most calls will be little more than a noop.

I tested and confirmed the broken behavior, that it originally appeared in 6.2 with the inception of the Tag Processor, and that the test is corrected. The theory checks out, so this should be solid.

pento pushed a commit that referenced this pull request Dec 3, 2025
Some class names with HTML character references could be mishandled, for example:
- Failure to remove an existing class like `&` with `::remove_class( '&' )`
- Double-encoding of an existing class like `&` after a modification, becoming `&amp;`

The second case manifested after double-encoding prevention was removed from `::set_attribute()` in [60919].

Developed in #10591.

Props jonsurrell, dmsnell.
Fixes #64340.


git-svn-id: https://develop.svn.wordpress.org/trunk@61346 602fd350-edb4-49c9-b593-d223f7449a82
@github-actions
Copy link

github-actions bot commented Dec 3, 2025

A commit was made that fixes the Trac ticket referenced in the description of this pull request.

SVN changeset: 61346
GitHub commit: 74b60c2

This PR will be closed, but please confirm the accuracy of this and reopen if there is more work to be done.

@github-actions github-actions bot closed this Dec 3, 2025
@sirreal sirreal deleted the trac-64340/html-api-double-encoded-class-names branch December 3, 2025 17:28
markjaquith pushed a commit to markjaquith/WordPress that referenced this pull request Dec 3, 2025
Some class names with HTML character references could be mishandled, for example:
- Failure to remove an existing class like `&` with `::remove_class( '&' )`
- Double-encoding of an existing class like `&` after a modification, becoming `&amp;`

The second case manifested after double-encoding prevention was removed from `::set_attribute()` in [60919].

Developed in WordPress/wordpress-develop#10591.

Props jonsurrell, dmsnell.
Fixes #64340.

Built from https://develop.svn.wordpress.org/trunk@61346


git-svn-id: http://core.svn.wordpress.org/trunk@60658 1a063a9b-81f0-0310-95a4-ce76da25c4cd
github-actions bot pushed a commit to gilzow/wordpress-performance that referenced this pull request Dec 3, 2025
Some class names with HTML character references could be mishandled, for example:
- Failure to remove an existing class like `&` with `::remove_class( '&' )`
- Double-encoding of an existing class like `&` after a modification, becoming `&amp;`

The second case manifested after double-encoding prevention was removed from `::set_attribute()` in [60919].

Developed in WordPress/wordpress-develop#10591.

Props jonsurrell, dmsnell.
Fixes #64340.

Built from https://develop.svn.wordpress.org/trunk@61346


git-svn-id: https://core.svn.wordpress.org/trunk@60658 1a063a9b-81f0-0310-95a4-ce76da25c4cd
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants