Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML API: Remove all duplicate copies of an attribute when removing #4317

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
51 changes: 46 additions & 5 deletions src/wp-includes/html-api/class-wp-html-tag-processor.php
Original file line number Diff line number Diff line change
Expand Up @@ -406,6 +406,16 @@ class WP_HTML_Tag_Processor {
*/
private $attributes = array();

/**
* Tracks spans of duplicate attributes on a given tag, used for removing
* all copies of an attribute when calling `remove_attribute()`.
*
* @since 6.3.2
*
* @var (WP_HTML_Span[])[]|null
*/
private $duplicate_attributes = null;

/**
* Which class names to add or remove from a tag.
*
Expand Down Expand Up @@ -1286,6 +1296,25 @@ private function parse_next_attribute() {
$attribute_end,
! $has_value
);

return true;
}

/*
* Track the duplicate attributes so if we remove it, all disappear together.
*
* While `$this->duplicated_attributes` could always be stored as an `array()`,
* which would simplify the logic here, storing a `null` and only allocating
* an array when encountering duplicates avoids needless allocations in the
* normative case of parsing tags with no duplicate attributes.
Comment on lines +1306 to +1309
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I'm reading this correctly, but allocating one empty array (per tag processor) doesn't seem terribly expensive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. doesn't seem that it's any clearer though to be more expensive than less does it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe ever so slightly clearer, but fine to keep as-is 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at some point I'll try and make a reasonable benchmark. until then I'll leave it at avoiding allocation. I'm afraid of allocation

*/
$duplicate_span = new WP_HTML_Span( $attribute_start, $attribute_end );
if ( null === $this->duplicate_attributes ) {
$this->duplicate_attributes = array( $comparable_name => array( $duplicate_span ) );
} elseif ( ! array_key_exists( $comparable_name, $this->duplicate_attributes ) ) {
$this->duplicate_attributes[ $comparable_name ] = array( $duplicate_span );
} else {
$this->duplicate_attributes[ $comparable_name ][] = $duplicate_span;
}

return true;
Expand All @@ -1307,11 +1336,12 @@ private function skip_whitespace() {
*/
private function after_tag() {
$this->get_updated_html();
$this->tag_name_starts_at = null;
$this->tag_name_length = null;
$this->tag_ends_at = null;
$this->is_closing_tag = null;
$this->attributes = array();
$this->tag_name_starts_at = null;
$this->tag_name_length = null;
$this->tag_ends_at = null;
$this->is_closing_tag = null;
$this->attributes = array();
$this->duplicate_attributes = null;
}

/**
Expand Down Expand Up @@ -2080,6 +2110,17 @@ public function remove_attribute( $name ) {
''
);

// Removes any duplicated attributes if they were also present.
if ( null !== $this->duplicate_attributes && array_key_exists( $name, $this->duplicate_attributes ) ) {
foreach ( $this->duplicate_attributes[ $name ] as $attribute_token ) {
$this->lexical_updates[] = new WP_HTML_Text_Replacement(
$attribute_token->start,
$attribute_token->end,
''
);
}
}

return true;
}

Expand Down
69 changes: 59 additions & 10 deletions tests/phpunit/tests/html-api/wpHtmlTagProcessor.php
Original file line number Diff line number Diff line change
Expand Up @@ -1049,15 +1049,9 @@ public function test_next_tag_and_set_attribute_in_a_loop_update_all_tags_in_the
* Removing an attribute that's listed many times, e.g. `<div id="a" id="b" />` should remove
* all its instances and output just `<div />`.
*
* Today, however, WP_HTML_Tag_Processor only removes the first such attribute. It seems like a corner case
* and introducing additional complexity to correctly handle this scenario doesn't seem to be worth it.
* Let's revisit if and when this becomes a problem.
* @since 6.3.2 Removes all duplicated attributes as expected.
*
* This test is in place to confirm this behavior, which while incorrect, is well-defined.
* A later fix introduced to the Tag Processor should update this test to reflect the
* wanted and correct behavior.
*
* @ticket 56299
* @ticket 58119
*
* @covers WP_HTML_Tag_Processor::remove_attribute
*/
Expand All @@ -1066,8 +1060,8 @@ public function test_remove_first_when_duplicated_attribute() {
$p->next_tag();
$p->remove_attribute( 'id' );

$this->assertSame(
'<div id="ignored-id"><span id="second">Text</span></div>',
$this->assertStringNotContainsString(
'update-me',
$p->get_updated_html(),
'First attribute (when duplicates exist) was not removed'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'First attribute (when duplicates exist) was not removed'
'First attribute (and duplicates) were not removed'

maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote the test to try and be more specific that this one test is solely about removing the first instance of the attribute, as when updating attributes it's the first that matters. Previously this test simply locked-in the behavior that the first was the one to go.

);
Expand All @@ -1090,6 +1084,61 @@ public function test_remove_attribute_with_an_existing_attribute_name_removes_it
);
}

/**
* @ticket 58119
*
* @since 6.3.2 Removes all duplicated attributes as expected.
*
* @covers WP_HTML_Tag_Processor::remove_attribute
*
* @dataProvider data_html_with_duplicated_attributes
*/
public function test_remove_attribute_with_duplicated_attributes_removes_all_of_them( $html_with_duplicate_attributes, $attribute_to_remove ) {
$p = new WP_HTML_Tag_Processor( $html_with_duplicate_attributes );
$p->next_tag();

$p->remove_attribute( $attribute_to_remove );
$this->assertNull( $p->get_attribute( $attribute_to_remove ), 'Failed to remove all copies of an attribute when duplicated in modified source.' );

// Recreate a tag processor with the updated HTML after removing the attribute.
$p = new WP_HTML_Tag_Processor( $p->get_updated_html() );
dmsnell marked this conversation as resolved.
Show resolved Hide resolved
$p->next_tag();
$this->assertNull( $p->get_attribute( $attribute_to_remove ), 'Failed to remove all copies of duplicated attributes when getting updated HTML.' );
}

/**
* @ticket 58119
*
* @since 6.3.2 Removes all duplicated attributes as expected.
*
* @covers WP_HTML_Tag_Processor::remove_attribute
*/
public function test_previous_duplicated_attributes_are_not_removed_on_successive_tag_removal() {
dmsnell marked this conversation as resolved.
Show resolved Hide resolved
$p = new WP_HTML_Tag_Processor( '<span id=one id=two id=three><span id=four>' );
$p->next_tag();
$p->next_tag();
$p->remove_attribute( 'id' );

$this->assertSame( '<span id=one id=two id=three><span >', $p->get_updated_html() );
}

/**
* Data provider.
*
* @ticket 58119
*
* @return array[].
*/
public function data_html_with_duplicated_attributes() {
dmsnell marked this conversation as resolved.
Show resolved Hide resolved
return array(
'Double attributes' => array( '<div id=one id=two>', 'id' ),
'Triple attributes' => array( '<div id=one id=two id=three>', 'id' ),
'Duplicates around another' => array( '<img src="test.png" alt="kites flying in the wind" src="kites.jpg">', 'src' ),
'Case-variants of attribute' => array( '<button disabled inert DISABLED dISaBled INERT DisABleD>', 'disabled' ),
'Case-variants of attribute name' => array( '<button disabled inert DISABLED dISaBled INERT DisABleD>', 'DISABLED' ),
);
}

/**
* @ticket 56299
*
Expand Down