@brusdev
Member

@brusdev brusdev commented Jan 4, 2024

No description provided.

@jbertram
Contributor

jbertram commented Jan 4, 2024

At this point I think we should stick with the currently documented behavior, i.e. wildcards match (whole) words separated by a delimiter. The design of the matching is to be hierarchical which is relatively easy to understand and configure with words separated by a delimiter and wildcards that represent one (i.e. * by default) or more (i.e. # by default) words. The fact that partial matches work now (for whatever reason) is not a sufficient reason to change the documented functionality. I chalk this up to an implementation detail and not something that users should rely upon as it may change in the future. It is an undocumented, incidental behavior.
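For reference, the documented whole-word behavior can be sketched as follows. This is a minimal illustration and not the Artemis implementation: the class and method names are hypothetical, and the default delimiter `.` and the default wildcards `*` and `#` are hard-coded. `#` is treated as zero or more words, as stated later in this thread.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not Artemis code: wildcards match whole words only.
final class WordWildcardSketch {
    static final String DELIMITER = ".";   // default word delimiter
    static final String SINGLE_WORD = "*"; // exactly one word
    static final String ANY_WORDS = "#";   // zero or more words

    static boolean matches(String pattern, String address) {
        String[] p = pattern.split(Pattern.quote(DELIMITER), -1);
        String[] a = address.split(Pattern.quote(DELIMITER), -1);
        return matches(p, 0, a, 0);
    }

    private static boolean matches(String[] p, int i, String[] a, int j) {
        if (i == p.length) {
            // Pattern exhausted: succeed only if the address is exhausted too.
            return j == a.length;
        }
        if (p[i].equals(ANY_WORDS)) {
            // '#' may consume zero or more whole words.
            for (int k = j; k <= a.length; k++) {
                if (matches(p, i + 1, a, k)) {
                    return true;
                }
            }
            return false;
        }
        if (j == a.length) {
            return false;
        }
        // '*' consumes exactly one word; anything else must be an identical word.
        if (p[i].equals(SINGLE_WORD) || p[i].equals(a[j])) {
            return matches(p, i + 1, a, j + 1);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("news.*", "news.usa"));       // true
        System.out.println(matches("news.*", "news.usa.sport")); // false
        System.out.println(matches("news.#", "news.usa.sport")); // true
        System.out.println(matches("news.us*", "news.usa"));     // false: "us*" is just a literal word here
    }
}
```

In this sketch a token such as `us*` is compared as a literal word, so partial matches simply do not exist as a concept.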

I'm curious about others' thoughts.

@jbertram
Contributor

jbertram commented Jan 5, 2024

Thinking about this more...I'm more against it than before. I believe that opening this door is going to be bad for usability - both for users and developers.

Right now, * means a single word. If we start accepting "partial words" then what does * become? Is it a single word when used alone and then something else when used with a partial word? If the latter, is it any single character? Is it 0 or more of any character (e.g. as it might be in a regular expression)? Also, how do partial words compare to each other in a hierarchy where matches are ordered from general to specific? Would ab* be more specific than a* when matching abc? What about a*c? Is that even supported? Where do we draw the line?

Furthermore, what do we do with #? Should we support partial matches with it? If so, what does that mean? If not, why not?

The potential configurations start to expand very quickly and will no doubt add complication to the code, the test-suite, and the documentation.

The currently documented functionality is simple & powerful, and we should keep it that way.

If there's a bug here it's that undocumented behavior is allowed and somewhat functional leading folks to assume it's intentional. I'm not saying we should fix that necessarily, but we should at least consider it so we don't keep letting folks get confused.

@brusdev
Member Author

brusdev commented Jan 8, 2024

> The design of the matching is to be hierarchical which is relatively easy to understand and configure with words separated by a delimiter and wildcards that represent one (i.e. * by default) or more (i.e. # by default) words.

Partial words would also be hierarchical; that is, * would never match a delimiter.

> Right now, * means a single word. If we start accepting "partial words" then what does * become? Is it a single word when used alone and then something else when used with a partial word? If the latter, is it any single character? Is it 0 or more of any character (e.g. as it might be in a regular expression)?

My intent was to implement behavior similar to * in shells: matching zero or more characters, but never the delimiter, so that the hierarchy is respected.
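A minimal sketch of that shell-like reading, assuming the default delimiter `.` and ignoring `#` entirely. It is not the code from this PR: the class and method names are hypothetical, and translating to a regular expression is just one convenient way to illustrate the semantics.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed semantics, not the PR's implementation.
final class PartialWordSketch {
    static final char SINGLE_WORD = '*';

    // Translate a wildcard pattern into a regex in which '*' becomes
    // "zero or more characters other than the '.' delimiter".
    static Pattern compile(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == SINGLE_WORD) {
                flushLiteral(regex, literal);
                regex.append("[^.]*"); // never crosses a delimiter
            } else {
                literal.append(c);
            }
        }
        flushLiteral(regex, literal);
        return Pattern.compile(regex.toString());
    }

    // Emit any pending literal text, quoted so that '.' and friends stay literal.
    private static void flushLiteral(StringBuilder regex, StringBuilder literal) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    static boolean matches(String wildcard, String address) {
        return compile(wildcard).matcher(address).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("news.us*", "news.usa"));       // true
        System.out.println(matches("news.us*", "news.us"));        // true: '*' may match zero characters
        System.out.println(matches("news.us*", "news.usa.sport")); // false: '*' stops at the delimiter
    }
}
```

Because the character class excludes the delimiter, a `*` can never span two words, which is the hierarchy-preserving property described above.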

> Also, how do partial words compare to each other in a hierarchy where matches are ordered from general to specific? Would ab* be more specific than a* when matching abc? What about a*c? Is that even supported? Where do we draw the line?

Good catch. I hadn't thought of this use case, but if it were supported then a*c would be more specific than a*.

> Furthermore, what do we do with #? Should we support partial matches with it? If so, what does that mean? If not, why not?

In theory, * should be enough for any partial-match use case, because # already matches zero or more words.

> The potential configurations start to expand very quickly and will no doubt add complication to the code, the test-suite, and the documentation.

This is an important point from a development perspective. Are you thinking of any specific cases?

> If there's a bug here it's that undocumented behavior is allowed and somewhat functional leading folks to assume it's intentional. I'm not saying we should fix that necessarily, but we should at least consider it so we don't keep letting folks get confused.

My intent was to clarify this gray area without causing issues for users who already rely on this officially unsupported behavior.

Comment on lines +52 to +56
But `news.*` would _not_ match:

* `news.europe.sport`
* `news.usa`
* `news.usa.sport`
Member

Regardless of whether we should or shouldn't do partial matching (I've yet to think on that)... this bit of documentation seems incorrect. I'm guessing the noted address or the other wildcard wasn't updated as intended.


/**
* Compares to matches to see which one is more specific.
* A match on the any-words delimiter (#) is considered less specific than a match without it, i.e. abc.def.# is less specific than abc.def and abc.def and abc.d*
Member

Either repetition or some other issue: "is less specific than abc.def and abc.def"

== Matching a Single or Partial Word

The character `*` means "match a single word".
The character `*` means "match a single or partial word".
Member

It's not clear whether the "single-word" configuration item (detailed lower down in this documentation), which controls single-word matches, would also change this partial-word matching.

@jbertram
Contributor

jbertram commented Jan 8, 2024

> Good catch. I hadn't thought of this use case, but if it were supported then a*c would be more specific than a*.

What about multiple * characters? For example, would a*c* match abcd and ac and abc?
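To make the question concrete, here is a small backtracking matcher for a single word containing any number of `*` characters. It assumes the shell-like meaning suggested earlier in this thread (zero or more characters), which is only a proposal; the class and method names are hypothetical, and the delimiter and `#` are deliberately left out.

```java
// Hypothetical sketch: glob-style matching of ONE word with multiple '*' characters.
final class MultiStarSketch {
    static boolean wordMatches(String pattern, String word) {
        int p = 0;       // position in pattern
        int w = 0;       // position in word
        int starP = -1;  // position of the most recent '*' in pattern
        int starW = -1;  // word position where that '*' started matching
        while (w < word.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                // Remember the star and first try matching zero characters.
                starP = p++;
                starW = w;
            } else if (p < pattern.length() && pattern.charAt(p) == word.charAt(w)) {
                p++;
                w++;
            } else if (starP >= 0) {
                // Backtrack: let the last '*' absorb one more character.
                p = starP + 1;
                w = ++starW;
            } else {
                return false;
            }
        }
        // Only trailing stars may remain once the word is consumed.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    public static void main(String[] args) {
        for (String word : new String[] {"abcd", "ac", "abc", "ab"}) {
            System.out.println("a*c* vs " + word + ": " + wordMatches("a*c*", word));
        }
    }
}
```

Under that assumption `a*c*` matches `abcd`, `ac`, and `abc`, but not `ab`. The backtracking needed to support this hints at the extra test surface such a change would bring.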

> This is an important point from a development perspective. Are you thinking of any specific cases?

I'm not thinking about any specific case. I'm mainly thinking that the possible combinations that need to be tested will increase substantially with this change, especially if multiple * characters are supported.

This change will mean that, while * by itself still means a single word, combining * with other characters completely changes its meaning to zero or more characters. I think this will ultimately hurt usability.

@brusdev brusdev marked this pull request as draft January 12, 2024 07:36
@brusdev
Member Author

brusdev commented Jan 12, 2024

@gemmellr @jbertram thanks for your feedback, I converted this PR to draft because I need more time to think.

@brusdev brusdev closed this Mar 19, 2024