Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML API: Remove all duplicate copies of an attribute when removing #4317

Conversation

dmsnell
Copy link
Contributor

@dmsnell dmsnell commented Apr 11, 2023

Description

Trac ticket #58119

When encountering an HTML tag with duplicate copies of an attribute the tag processor ignores the duplicate values, according to the specification. However, when removing an attribute it must remove all copies of that attribute lest one of the duplicates becomes the primary and it appears as if no attributes were removed.

In this patch we're adding tests that will be used to ensure that all attribute copies are removed from a tag when one is request to be removed.

Before

$p = new WP_HTML_Tag_Processor( '<br id=one id="two" id='three' id>' );
$p->next_tag();
$p->remove_attribute( 'id' );
$p->get_updated_html();
// <br id="two" id='three' id>

After

$p = new WP_HTML_Tag_Processor( '<br id=one id="two" id='three' id>' );
$p->next_tag();
$p->remove_attribute( 'id' );
$p->get_updated_html();
// <br>

Previously we have been overlooking duplicate attributes since they don't have an impact on what parses into the DOM. However, as one unit test affirmed (asserting the presence of the bug in the tag processor) when removing an attribute where duplicates exist this meant we ended up changing the value of an attribute instead of removing it.

In this patch we're tracking the text spans of the parsed duplicate attributes so that if we attempt to remove them then we'll have the appropriate information necessary to do so. When an attribute isn't removed we'll simply forget about the tracked duplicates. This involves some overhead for normal operation when in fact there are duplicate attributes on a tag, but that overhead is minimal in the form of integer pairs of indices for each duplicated attribute.

cc: @adamziel @ockham

@dmsnell dmsnell changed the title HTML API: Delete all copies of an attribute when removing and duplicates exist HTML API: Remove all duplicate copies of an attribute when removing Apr 12, 2023
@dmsnell dmsnell marked this pull request as ready for review April 12, 2023 13:23
@adamziel
Copy link
Contributor

The logic LGTM @dmsnell, thank you! I left a few nitpicks about formatting, but that's it.

@dmsnell dmsnell force-pushed the html-api/fix-remove-all-duplicated-attributes branch 4 times, most recently from a123d29 to 84cbdde Compare August 22, 2023 02:16
@dmsnell dmsnell force-pushed the html-api/fix-remove-all-duplicated-attributes branch from 84cbdde to f91eece Compare August 22, 2023 03:34
@dmsnell dmsnell force-pushed the html-api/fix-remove-all-duplicated-attributes branch 2 times, most recently from d493967 to 16555d0 Compare September 5, 2023 20:16
@dmsnell dmsnell force-pushed the html-api/fix-remove-all-duplicated-attributes branch from 16555d0 to ba10849 Compare September 18, 2023 23:55
Comment on lines +1306 to +1309
* While `$this->duplicated_attributes` could always be stored as an `array()`,
* which would simplify the logic here, storing a `null` and only allocating
* an array when encountering duplicates avoids needless allocations in the
* normative case of parsing tags with no duplicate attributes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I'm reading this correctly, but allocating one empty array (per tag processor) doesn't seem terribly expensive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. doesn't seem that it's any clearer though to be more expensive than less does it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe ever so slightly clearer, but fine to keep as-is 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at some point I'll try and make a reasonable benchmark. until then I'll leave it at avoiding allocation. I'm afraid of allocation

@@ -1067,7 +1061,7 @@ public function test_remove_first_when_duplicated_attribute() {
$p->remove_attribute( 'id' );

$this->assertSame(
'<div id="ignored-id"><span id="second">Text</span></div>',
'<div ><span id="second">Text</span></div>',
$p->get_updated_html(),
'First attribute (when duplicates exist) was not removed'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'First attribute (when duplicates exist) was not removed'
'First attribute (and duplicates) were not removed'

maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote the test to try and be more specific that this one test is solely about removing the first instance of the attribute, as when updating attributes it's the first that matters. Previously this test simply locked-in the behavior that the first was the one to go.

Copy link
Contributor

@ockham ockham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Left a few minor notes (mostly non-blocking).

Planning to land this (with updated @since PHPDocs) later tonight, so we have it ready for both 6.4 and 6.3.2.

dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Sep 25, 2023
…tes exist. (WordPress#4317)

When encountering an HTML tag with duplicate copies of an attribute the tag
processor ignores the duplicate values, according to the specification. However,
when removing an attribute it must remove all copies of that attribute lest
one of the duplicates becomes the primary and it appears as if no attributes
were removed.

In this patch we're adding tests that will be used to ensure that all attribute
copies are removed from a tag when one is requested to be removed.

Co-authored-by: Bernie Reiter <96308+ockham@users.noreply.github.com>
@dmsnell dmsnell force-pushed the html-api/fix-remove-all-duplicated-attributes branch from ba10849 to eecadb6 Compare September 25, 2023 18:21
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Sep 25, 2023
…tes exist. (WordPress#4317)

When encountering an HTML tag with duplicate copies of an attribute the tag
processor ignores the duplicate values, according to the specification. However,
when removing an attribute it must remove all copies of that attribute lest
one of the duplicates becomes the primary and it appears as if no attributes
were removed.

In this patch we're adding tests that will be used to ensure that all attribute
copies are removed from a tag when one is requested to be removed.

Co-authored-by: Bernie Reiter <96308+ockham@users.noreply.github.com>
@dmsnell dmsnell force-pushed the html-api/fix-remove-all-duplicated-attributes branch from eecadb6 to 3399f68 Compare September 25, 2023 18:26
…tes exist. (WordPress#4317)

When encountering an HTML tag with duplicate copies of an attribute the tag
processor ignores the duplicate values, according to the specification. However,
when removing an attribute it must remove all copies of that attribute lest
one of the duplicates becomes the primary and it appears as if no attributes
were removed.

In this patch we're adding tests that will be used to ensure that all attribute
copies are removed from a tag when one is requested to be removed.

Co-authored-by: Bernie Reiter <96308+ockham@users.noreply.github.com>
Copy link
Contributor

@ockham ockham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@ockham
Copy link
Contributor

ockham commented Sep 25, 2023

Committed to Core trunk in https://core.trac.wordpress.org/changeset/56684.

@ockham ockham closed this Sep 25, 2023
@ockham
Copy link
Contributor

ockham commented Sep 25, 2023

Committed to Core's 6.3 branch in https://core.trac.wordpress.org/changeset/56685.

@dmsnell dmsnell deleted the html-api/fix-remove-all-duplicated-attributes branch September 25, 2023 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants