Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WP_HTML_Tag_Processor: Implement get_attribute_names() method #46685

Closed
wants to merge 8 commits into from

Conversation

ockham
Copy link
Contributor

@ockham ockham commented Dec 20, 2022

What?

Add a get_attribute_names() method to WP_HTML_Tag_Processor, which returns the names of all attributes in a given tag.

$p = new WP_HTML_Tag_Processor( '<div enabled class="test" data-test-id="14">Test</div>' );
$p->next_tag();
$p->get_attribute_names() === array( 'enabled', 'class', 'data-test-id' );

Alternative to adding a get_attribute_by_prefix method (#46672).

Why?

Getting all attributes comes in handy for larger classes of attributes without knowing exactly which ones to expect. Examples include prefixed attributes for datasets (data-) and custom namespaces (e.g. ng- or x-).

Furthermore, it helps with syntax like <img wp-bind:src="myblock.imageSource" />, which is a requirement for the Block Interactivity API (see).

More background: WordPress/block-interactivity-experiments#118 (comment).

How?

Basically returns array_keys( $this->attributes ) 🤷‍♂️

Testing Instructions

See included unit test.

@dmsnell
Copy link
Contributor

dmsnell commented Dec 20, 2022

At first sight I like this approach better than get_all_attributes(). It's relying on data we've already parsed and allocated, and it doesn't have the same worst-case scenario.

Still I wonder if get_attribute_names_with_prefix() might not be a good idea to entertain. The less we expose by default then I would guess the better the using code would be. I'm far less concerned here about allowing zero-length prefixes as it won't be the same foot-gun that get_attributes() is, plus, it may just be annoying enough that it pushes people to only ask for the attributes they care about, leading to less noise on the output.

foreach ( $p->get_attribute_names_prefixed_by( 'wp-' ) as $attribute ) {
	// we won't have to filter out the attributes we don't care about here
}

@adamziel
Copy link
Contributor

adamziel commented Dec 21, 2022

Oh hm, I guess $p->get_attribute_names_prefixed_by( '' ) would work. It is weird, though, isn't it? And being able to inquire about what was just parsed, as in the tag name, the attributes names, and the values, seems like a natural choice. Maybe it would make sense to have both get_attribute_names_prefixed_by and get_attributes_names...?

@dmsnell
Copy link
Contributor

dmsnell commented Dec 21, 2022

It is weird, though, isn't it?

yes, and I like that, because the first impression we get is that we should be asking specifically for what we want rather than asking for everything.

the use of the tag processor with wp directives is also pushing this system into a range of operation that we considered worst-case scenario when developing it. that is, on render, it visits every single HTML tag in a document and inspects every single attribute (or in this case, every attribute name and a possibly-zero subset of those attributes).

if the system is a bit awkward to use in the more dire performance scenarios I think that's better than making the dangerous behaviors the easier approach.

it's more philosophical with this than with get_attributes_prefix_by() though because we've already allocated these attribute name values and can return them quickly with array_keys() though the addition of the prefix not only provides that subtle philosophical bump but its also seems like it can provide a convenience for calling code like in the wp directives because it eliminates incidental filtering by only getting what we want.

@ockham
Copy link
Contributor Author

ockham commented Dec 22, 2022

I agree with your reasoning @dmsnell. One thing to consider: I'm not sure if we'll have forever control over what methods people add to the class, and if something is missing that's all-too "obvious" or frequently requested, there's a chance that people might just go ahead and add it (a get_attributes() method that returns key/value pairs, in this case -- with all its performance implications). Now the question is if either get_attributes_by_prefix() or get_attribute_names() could be "good enough" so they wouldn't feel compelled to add get_attributes() 😬

I think there's a slightly higher chance that people will use get_attribute_names() to filter whatever attributes they're really interested in (especially if their filter criteria can't be expressed as a prefix) and then use get_attribute() on each individual attribute that they care about -- which would avoid the need for get_attributes().

(Then again, prefixes really seem to represent the bulk of filtering scenarios. I have yet to come up with a realistic non-prefix criterion. Anything that y'all can think off?)

it's more philosophical with this than with get_attributes_prefix_by() though because we've already allocated these attribute name values and can return them quickly with array_keys()

Yeah. IMO, the fact that the names are so readily available could prompt people to want to expose them via a get_attribute_names() method (see my initial argument at the top of this comment).


I don't really feel strongly one way or another. Based on the above, I might be slightly inclined towards get_attribute_names. In addition to those arguments, it's also simpler to implement and requires fewer computations by the class method. So I guess it's both "more obvious" and "cheaper", which might be strong enough arguments in its favor.

@adamziel
Copy link
Contributor

adamziel commented Dec 22, 2022

@dmsnell I don't mind asking for all the attributes – it's what I'm used to in many other APIs. As for the awkward method call – I like it because of the reasons you mentioned, and at the same time I don't like how it forces me, the API consumer, into using specific a naming pattern for my attributes.

All in all, I don't have a strong opinion here. I'd slightly prefer get_attribute_names(), but I'm happy to move forward with either.

* @covers WP_HTML_Tag_Processor::get_attribute_names
*/
public function test_get_attribute_names() {
$p = new WP_HTML_Tag_Processor( '<div enabled class="test" data-test-id="14">Test</div>' );
Copy link
Contributor

@adamziel adamziel Dec 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also test what happens when:

  • There are no attributes on a given tag
  • An attribute was added but not flushed
  • An attribute was added and flushed
  • We're on a tag closer
  • We're not on any tag (before the first next_tag)

Any other corner cases you can think of @dmsnell ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, these are all great suggestions -- I'll add them 👍

(Even if we proceed with get_attribute_names_prefixed_by(), we should be able to carry them over 😊 )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added tests for the following in ca06c64:

  • There are no attributes on a given tag
  • We're on a tag closer
  • We're not on any tag (before the first next_tag)

(Mostly by copying and modifying the corresponding get_attribute() tests.) This also included a test that navigates to a non-existing tag and verifies that get_attribute_names() returns null.)

Since there wasn't a test to check that get_attribute() returned null when in a closing tag, I've added one in 8c51859.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the remaining test cases

An attribute was added but not flushed
An attribute was added and flushed

...I'd like to wait for the outcome of #46680 before adding coverage for "added but not flushed", since we might need it.

"Added and flushed" I'll add now 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Added and flushed" I'll add now 👍

5b248c0

@dmsnell
Copy link
Contributor

dmsnell commented Dec 22, 2022

I have yet to come up with a realistic non-prefix criterion

this is probably a good enough reason for now to stick with get_attribute_names_prefixed_by() IMO since we're contrasting something that maintains relative performance characteristics and meets a known need vs. creating something that is arguably more familiar but doesn't have any known need at this time.

to be clear, I was not arguing for get_attributes_prefixed_by() but rather get_attribute_names_prefixed_by()

I'm not sure if we'll have forever control over what methods people add to the class, and if something is missing that's all-too "obvious" or frequently requested, there's a chance that people might just go ahead and add it

Yeah I get that, but while we have the privilege and responsibility I want to let the known needs drive the interface and how it evolves. The tag processor is defined so far by its constraints, and arguably it's already an inconvenient interface because of it (e.g. no "get parent node"), but those constraints are also what makes it possible to exist.

it's what I'm used to

that's no surprise! DOM APIs expose all sorts of conveniences and expectations that we don't have, but the tag processor is fundamentally different than those APIs. the fact that it looks different might actually help us out here by making it more obvious that our other familiarities and assumptions about how it works may not carry over; might help avoid people feeling burnt by this and give up on it.

given that we're not preventing people from reading all the attribute names with this it may even serve as a bridge to help educate people how the API works as they use it.

at the same time I don't like how it forces me, the API consumer, into using specific a naming pattern for my attributes.

I'm probably missing something here because I don't follow how this forces you to make any specific naming choices. the argument, as I understand it, is you probably already have prefixed attributes you care about because that's almost universally how all of these embedded DSLs work, and getting only those prefixed-attributes can actually remove a few steps from calling code because of it.

in the case you want them all you do the dirty thing and request a zero-length prefix. it's good that it stands out, because it provides a moment of pause to reflect on whether that's actually needed (usually isn't). so I'm looking at it like this - how can we preserve the freedom for someone to do this while making the likely-more-appropriate thing the clearer first choice?

@ockham
Copy link
Contributor Author

ockham commented Jan 2, 2023

Rebased (to include #46748).

@ockham
Copy link
Contributor Author

ockham commented Jan 2, 2023

to be clear, I was not arguing for get_attributes_prefixed_by() but rather get_attribute_names_prefixed_by()

My bad, I had indeed read that as get_attributes_by_prefix()!

Your reasoning makes sense to me -- I'll file a PR to implement get_attribute_names_prefixed_by() 😄

@ockham
Copy link
Contributor Author

ockham commented Jan 2, 2023

I'll file a PR to implement get_attribute_names_prefixed_by() 😄

#46840

@ockham
Copy link
Contributor Author

ockham commented Jan 10, 2023

Closing if favor of #46840.

@ockham ockham closed this Jan 10, 2023
@ockham ockham deleted the add/get-attribute-names branch January 10, 2023 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants