Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ComboboxControl: Added less restrictive search matching. #41952

Closed

Conversation

markbiek
Copy link
Contributor

@markbiek markbiek commented Jun 24, 2022

What?

Added less restrictive search matching to the ComboboxControl

This is related to Calypso issue Automattic/wp-calypso#64677

Why?

A number of places in Calypso use the ComboboxControl to display a combination of countries and states using the label format COUNTRY -- STATE. This makes matching difficult unless you type out the entire label.

How?

If the "starts-with" and "contains" searches fail, we attempt to match by removing all puncuation from the option. Any results are appended to the other results so we don't omit anything that would have previously been displayed.

Testing Instructions

There is a new Storybook story demonstrating the new prop:

http://localhost:50240/?path=/story/components-comboboxcontrol--loose-match

Screenshots or screencast

Current behavior

CleanShot.2022-06-24.at.13.13.52.mp4

Custom search behavior

CleanShot.2022-06-24.at.13.13.18.mp4

@markbiek markbiek requested a review from ajitbohra as a code owner June 24, 2022 17:02
@github-actions github-actions bot added the First-time Contributor Pull request opened by a first-time contributor to Gutenberg repository label Jun 24, 2022
@github-actions
Copy link

👋 Thanks for your first Pull Request and for helping build the future of Gutenberg and WordPress, @markbiek! In case you missed it, we'd love to have you join us in our Slack community, where we hold regularly weekly meetings open to anyone to coordinate with each other.

If you want to learn more about WordPress development in general, check out the Core Handbook full of helpful information.

@markbiek markbiek force-pushed the 64677-update/improved-combobox-search branch from 922d8ae to eda9e32 Compare June 24, 2022 17:05
@markbiek markbiek changed the title WIP: 64677 update/improved combobox search ComboboxControl: Added a new prop to allow for less restrictive search matching. Jun 24, 2022
@alexstine alexstine added [Type] Enhancement A suggestion for improvement. [Package] Components /packages/components labels Jun 25, 2022
@alexstine alexstine added this to In progress (owned) ⏳ in WordPress Components via automation Jun 25, 2022
Copy link
Member

@amustaque97 amustaque97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given minor comments.

Thank you for your contribution 👏🏻

packages/components/src/combobox-control/README.md Outdated Show resolved Hide resolved
packages/components/src/combobox-control/README.md Outdated Show resolved Hide resolved
@mirka mirka requested review from ciampo and mirka June 27, 2022 12:16
Copy link
Member

@mirka mirka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the improvement suggestion! It really helps us to see limitations in real-life use cases like this.

API-wise, I'd prefer we figure something out that is either a usability enhancement for the general case, or a customizable extension for any consumer.

For example, I might agree that normalizing em dashes and en dashes down to a hyphen is a usability enhancement for the general case, similar to how deburr is.

However, I think that removing a predefined set of characters from the string is too restrictive and arbitrary to warrant a boolean prop like looseSearch. Consumers will reasonably want a different set of characters to remove from their strings. If we want to enhance the search matching functionality in this direction, it would make sense to add a more flexible API. For example, a property in the option object that allows the consumer to provide a separate string just for matching. Or a prop that allows the consumer to bring their own string match logic.

What do folks think? If we agree that normalizing en/em dashes to hyphens is a generally good idea and is a non-breaking change, we can just implement that without adding a new prop. I feel like that would be the easiest way forward in this particular case, given the complexity of updating the logic that adds underlines to the matched string range.

packages/components/src/combobox-control/stories/index.js Outdated Show resolved Hide resolved
@markbiek markbiek force-pushed the 64677-update/improved-combobox-search branch 3 times, most recently from 84a2cba to 8124440 Compare June 29, 2022 14:06
@markbiek markbiek changed the title ComboboxControl: Added a new prop to allow for less restrictive search matching. ComboboxControl: Added custom match string handling to allow for less restrictive search matching. Jun 29, 2022
@amustaque97
Copy link
Member

LGTM 🚀

Copy link
Contributor

@ciampo ciampo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally agree with the points made by @mirka:

API-wise, I'd prefer we figure something out that is either a usability enhancement for the general case, or a customizable extension for any consumer.

With respect to the latest approach taken in the PR (adding an optional match key to the array of objects passed as the options prop), I'd like to point out that we're considering deprecating the options props from CustomSelectControl and related components like ComboboxControl in favour of a more declarative approach (see #41466 for more details).

Therefore, I'd prefer to look at alternatives approaches that will allows us to keep this feature even in case we deprecate the options in the future. Two alternatives come to mind:

  1. if we believe that "normalising" all hypens matches is a feature that will benefit all users of the component, we may just add this further normalisation in the private implementation of the component. We could even go a step forward and look at "normalising" all punctuation?
  2. if we want to make this feature more flexible, we could add a new prop that allows a user to add an intermediate layer of "string normalisation" that gets applied before trying to match the search string to the combobox items

In case this fix is urgent, I suggest we look at implementing option 1 for now.

Regarding option 2, we'd need to make sure that, in case we switch to using ariakit under the hood, we'll have a way to provide the same "custom string normalization" prop to ariakit's combobox component — @diegohaz , do you have any pointers on this, so that we can future-proof this feature and have it working in case we switch to using ariakit ?

packages/components/CHANGELOG.md Outdated Show resolved Hide resolved
@diegohaz
Copy link
Member

@diegohaz , do you have any pointers on this, so that we can future-proof this feature and have it working in case we switch to using ariakit ?

While Ariakit has a basic filtering API built-in, that's optional, and you have full control over the filtering logic as long as you render the matching items using ComboboxItem. Something like this works:

import {
  useComboboxState,
  Combobox,
  ComboboxPopover,
  ComboboxItem,
} from "ariakit/combobox";

function ComboboxControl() {
  const combobox = useComboboxState();
  const matchingSuggestions = useMemo(/* filtering logic here */);
  return (
    <>
      <Combobox state={combobox} />
      <ComboboxPopover state={combobox}>
        {matchingSuggestions.map((item) => (
          <ComboboxItem key={item.value} value={item.value}>{item.label}</ComboboxItem>
        ))}
      </ComboboxPopover>
    </>
  );
}

For reference, here's a video where I use an external API to filter the items: https://www.youtube.com/watch?v=dcm8fjBfro8

I'd like to point out that we're considering deprecating the options props from CustomSelectControl and related components like ComboboxControl in favour of a more declarative approach (see #41466 for more details).

I understand that's about deprecating options in favor of passing the items in JSX, right?

Since ComboboxControl filters the options internally, I'm not sure I'd get rid of options on this component. You'll need some data to get the matching suggestions. While you can do that with JSX, a plain array of strings or objects is generally better.

But I can think of some APIs to make it more flexible:

// Filtering and rendering logic is controlled by the user:
<ComboboxControl>
  <ComboboxControlOption value="a" label="A" />
  <ComboboxControlOption value="b" label="B" />
  <ComboboxControlOption value="c" label="C" />
</ComboboxControl>

// Filtering and rendering logic is controlled by the library:
const options = [
  { value: "a", label: "A" },
  { value: "b", label: "B" },
  { value: "c", label: "C" },
]
<ComboboxControl options={options} />

// Filtering logic is controlled by the library. Rendering logic, by the user:
<ComboboxControl
  options={options}
  renderOption={(option) => <ComboboxControlOption {...option} />}
/>

@markbiek
Copy link
Contributor Author

Thanks everyone! This is all really great feedback and I think the direction the control is heading is a good one, especially in terms of making things more declarative and moving to ariakit under the hood.

if we believe that "normalising" all hypens matches is a feature that will benefit all users of the component, we may just add this further normalisation in the private implementation of the component. We could even go a step forward and look at "normalising" all punctuation?

That said, I would love to implement the above and implement the removal of all punctuation when matching to resolve this particular issue.

Is that something that everyone would approve of in the short-term?

@ciampo
Copy link
Contributor

ciampo commented Jun 30, 2022

That said, I would love to implement the above and implement the removal of all punctuation when matching to resolve this particular issue.
Is that something that everyone would approve of in the short-term?

Hey @markbiek, I'd be ok with that.

What we could do is to:

  • match the search string against the list of options, like it happens on trunk
  • in parallel, match the search string also against a transformed version of the list of options, where we replace all punctuation with an empty string
  • merge the search results of the 2 lists of matches, and display it to the user

This way, we're just making the search slightly looser, but without omitting any search results that would have been previously displayed to the end user.

What do you think?


I will also add that, for the specific scenario highlighted in Automattic/wp-calypso#64677, the correct solution would be to use option groups for Countries, and options only for state/provinces.

@ciampo
Copy link
Contributor

ciampo commented Jun 30, 2022

@diegohaz , thank you so much for your quick reply — it's super useful!

While Ariakit has a basic filtering API built-in, that's optional, and you have full control over the filtering logic as long as you render the matching items using ComboboxItem

Thank you for the reference, I'll take a closer look at the examples and at the YouTube video.

I understand that's about deprecating options in favor of passing the items in JSX, right?

Yes, that's correct

Since ComboboxControl filters the options internally, I'm not sure I'd get rid of options on this component. You'll need some data to get the matching suggestions. While you can do that with JSX, a plain array of strings or objects is generally better.

But I can think of some APIs to make it more flexible:

Thank you again for your suggestion.

Probably the best thing to do at this point is go ahead with the shorter term fix as drafted in #41952 (comment), and discuss about the mid-long term approach separately from this PR (maybe in #41466)

@markbiek
Copy link
Contributor Author

markbiek commented Jul 5, 2022

Thanks again everyone for the feedback. I'll move forward with the short term fix described in #41952 (comment)

@markbiek markbiek force-pushed the 64677-update/improved-combobox-search branch 2 times, most recently from c88b161 to 8a95217 Compare July 5, 2022 20:37
@markbiek
Copy link
Contributor Author

markbiek commented Jul 5, 2022

@ciampo I pushed an updated implementation based on our discussion above. Let me know if you have any feedback!

@markbiek markbiek changed the title ComboboxControl: Added custom match string handling to allow for less restrictive search matching. ComboboxControl: Added less restrictive search matching. Jul 5, 2022
@mirka
Copy link
Member

mirka commented Jul 5, 2022

Sorry, I was AFK for a couple days and wasn't following this thread!

What's our plan for the "matched string underline" logic in this scenario? One reason I recommended normalization (as opposed to character removal) as the quickest fix is so we don't have to spend time reworking the underline logic 😆 It will work as before as long as the character count remains the same.

Not saying the underline is a must-have feature, but whatever UX regression we'd be introducing by potentially removing it altogether, or the complexity we'd need to add to make it work for fuzzy searches, should be a part of the cost/benefit analysis.

@ciampo
Copy link
Contributor

ciampo commented Jul 6, 2022

@mirka has definitely got a point (which I didn't consider in my previous message) — in fact, it's currently possible to reproduce the underlining inconsistency described above in the new Storybook example:

Screenshot 2022-07-06 at 12 23 10

Not saying the underline is a must-have feature, but whatever UX regression we'd be introducing by potentially removing it altogether, or the complexity we'd need to add to make it work for fuzzy searches, should be a part of the cost/benefit analysis.

This is very true. At this point, an internal string normalization (which only replaces characters, without changing the length of the string) seems like the most conservative option, since it won't introduce changes to the component's APIs and it won't cause regressions to the match highlighting logic. How do folks feel about this approach? Is there any better alternative, in terms of tradeoffs?

(@markbiek , I'm sorry if this PR is taking longer than expected and if you feel pulled in many directions! We need to ensure that we make the best overall decision, factoring in many use-cases 🙇 )


Finally, just noting that this PR needs a rebase after #42169 got merged.

@markbiek
Copy link
Contributor Author

@ciampo Can you give me more detail about this new approach? I'm not entirely sure I'm understanding. 🙏

const LooseMatchComboboxControl = () => {
const [ value, setValue ] = useState( null );

const countryStateOptions = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The match key is not needed anymore in this sample data object.

Also, I've added a couple of different dashes to showcase the new "dash normalization" feature

Click to expand
diff --git a/packages/components/src/combobox-control/stories/index.js b/packages/components/src/combobox-control/stories/index.js
index 5f1100a59d..11111ae609 100644
--- a/packages/components/src/combobox-control/stories/index.js
+++ b/packages/components/src/combobox-control/stories/index.js
@@ -291,56 +291,24 @@ const LooseMatchComboboxControl = () => {
 	const [ value, setValue ] = useState( null );
 
 	const countryStateOptions = [
-		{ value: 'AO:BGO', label: 'Angola — Bengo', match: 'angola bengo' },
-		{
-			value: 'AO:BLU',
-			label: 'Angola — Benguela',
-			match: 'angola — benguela',
-		},
-		{ value: 'AO:BIE', label: 'Angola — Bié', match: 'angola bié' },
-		{
-			value: 'AO:CAB',
-			label: 'Angola — Cabinda',
-			match: 'angola — cabinda',
-		},
-		{ value: 'AO:CNN', label: 'Angola — Cunene', match: 'angola cunene' },
-		{ value: 'AO:HUA', label: 'Angola — Huambo', match: 'angola huambo' },
-		{ value: 'AO:HUI', label: 'Angola — Huíla', match: 'angola huíla' },
-		{
-			value: 'AO:CCU',
-			label: 'Angola — Kuando Kubango',
-			match: 'angola kuando kubango',
-		},
-		{
-			value: 'AO:CNO',
-			label: 'Angola — Kwanza-Norte',
-			match: 'angola kwanza norte',
-		},
-		{
-			value: 'AO:CUS',
-			label: 'Angola — Kwanza-Sul',
-			match: 'angola kwanza sul',
-		},
-		{ value: 'AO:LUA', label: 'Angola — Luanda', match: 'angola luanda' },
-		{
-			value: 'AO:LNO',
-			label: 'Angola — Lunda-Norte',
-			match: 'angola lunda norte',
-		},
-		{
-			value: 'AO:LSU',
-			label: 'Angola — Lunda-Sul',
-			match: 'angola lunda sul',
-		},
-		{
-			value: 'AO:MAL',
-			label: 'Angola — Malanje',
-			match: 'angola malanje',
-		},
-		{ value: 'AO:MOX', label: 'Angola — Moxico', match: 'angola moxico' },
-		{ value: 'AO:NAM', label: 'Angola — Namibe', match: 'angola namibe' },
-		{ value: 'AO:UIG', label: 'Angola — Uíge', match: 'angola uíge' },
-		{ value: 'AO:ZAI', label: 'Angola — Zaire', match: 'angola zaire' },
+		{ value: 'AO:BGO', label: 'Angola ~ Bengo' },
+		{ value: 'AO:BLU', label: 'Angola - Benguela' },
+		{ value: 'AO:BIE', label: 'Angola – Bié' },
+		{ value: 'AO:CAB', label: 'Angola — Cabinda' },
+		{ value: 'AO:CNN', label: 'Angola — Cunene' },
+		{ value: 'AO:HUA', label: 'Angola — Huambo' },
+		{ value: 'AO:HUI', label: 'Angola — Huíla' },
+		{ value: 'AO:CCU', label: 'Angola — Kuando Kubango' },
+		{ value: 'AO:CNO', label: 'Angola — Kwanza-Norte' },
+		{ value: 'AO:CUS', label: 'Angola — Kwanza-Sul' },
+		{ value: 'AO:LUA', label: 'Angola — Luanda' },
+		{ value: 'AO:LNO', label: 'Angola — Lunda-Norte' },
+		{ value: 'AO:LSU', label: 'Angola — Lunda-Sul' },
+		{ value: 'AO:MAL', label: 'Angola — Malanje' },
+		{ value: 'AO:MOX', label: 'Angola — Moxico' },
+		{ value: 'AO:NAM', label: 'Angola — Namibe' },
+		{ value: 'AO:UIG', label: 'Angola — Uíge' },
+		{ value: 'AO:ZAI', label: 'Angola — Zaire' },
 	];
 
 	return (

Copy link
Contributor

@ciampo ciampo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ciampo Can you give me more detail about this new approach? I'm not entirely sure I'm understanding. 🙏

Sure, what I meant is something like this:

Click to expand
diff --git a/packages/components/src/combobox-control/index.js b/packages/components/src/combobox-control/index.js
index 3b86561f1b..7daad5d2d3 100644
--- a/packages/components/src/combobox-control/index.js
+++ b/packages/components/src/combobox-control/index.js
@@ -31,6 +31,76 @@ import withFocusOutside from '../higher-order/with-focus-outside';
 
 const noop = () => {};
 
+// Inspiration taken from https://jkorpela.fi/dashes.html
+const ALL_UNICODE_DASH_CHARACTERS = new RegExp(
+	`[${ [
+		// - (hyphen-minus)
+		'\u002d',
+		// ~ (tilde)
+		'\u007e',
+		// ­ (soft hyphen)
+		'\u00ad',
+		// ֊ (armenian hyphen)
+		'\u058a',
+		// ־ (hebrew punctuation maqaf)
+		'\u05be',
+		// ᐀ (canadian syllabics hyphen)
+		'\u1400',
+		// ᠆ (mongolian todo soft hyphen)
+		'\u1806',
+		// ‐ (hyphen)
+		'\u2010',
+		// non-breaking hyphen)
+		'\u2011',
+		// ‒ (figure dash)
+		'\u2012',
+		// – (en dash)
+		'\u2013',
+		// — (em dash)
+		'\u2014',
+		// ― (horizontal bar)
+		'\u2015',
+		// ⁓ (swung dash)
+		'\u2053',
+		// superscript minus)
+		'\u207b',
+		// subscript minus)
+		'\u208b',
+		// − (minus sign)
+		'\u2212',
+		// ⸗ (double oblique hyphen)
+		'\u2e17',
+		// ⸺ (two-em dash)
+		'\u2e3a',
+		// ⸻ (three-em dash)
+		'\u2e3b',
+		// 〜 (wave dash)
+		'\u301c',
+		// 〰 (wavy dash)
+		'\u3030',
+		// ゠ (katakana-hiragana double hyphen)
+		'\u30a0',
+		// ︱ (presentation form for vertical em dash)
+		'\ufe31',
+		// ︲ (presentation form for vertical en dash)
+		'\ufe32',
+		// ﹘ (small em dash)
+		'\ufe58',
+		// ﹣ (small hyphen-minus)
+		'\ufe63',
+		// - (fullwidth hyphen-minus)
+		'\uff0d',
+	].join( '' ) }]`,
+	'g'
+);
+
+const normalizeTextString = ( value ) => {
+	// TODO: swap deburr with `removeAccents` after rebasing
+	return deburr( value )
+		.toLocaleLowerCase()
+		.replace( ALL_UNICODE_DASH_CHARACTERS, '-' );
+};
+
 const DetectOutside = withFocusOutside(
 	class extends Component {
 		handleFocusOutside( event ) {
@@ -71,31 +141,18 @@ function ComboboxControl( {
 	const matchingSuggestions = useMemo( () => {
 		const startsWithMatch = [];
 		const containsMatch = [];
-		const looseMatch = [];
-		const match = deburr( inputValue.toLocaleLowerCase() );
+		const match = normalizeTextString( inputValue );
 
 		options.forEach( ( option ) => {
-			const optionLabel = deburr( option.label ).toLocaleLowerCase();
-			// For loose search matching, remove ASCII characters in the following ranges:
-			// ! through @
-			// [ through `
-			const looseOptionLabel = optionLabel
-				// First remove the characters described above.
-				.replace( /[!-@—]|[\[-`]/g, '' )
-				// Then make sure we don't have any double spaces.
-				.replace( /\s{2,}/g, ' ' );
-
-			const index = optionLabel.indexOf( match );
+			const index = normalizeTextString( option.label ).indexOf( match );
 			if ( index === 0 ) {
 				startsWithMatch.push( option );
 			} else if ( index > 0 ) {
 				containsMatch.push( option );
-			} else if ( looseOptionLabel.includes( match ) ) {
-				looseMatch.push( option );
 			}
 		} );
 
-		return startsWithMatch.concat( containsMatch, looseMatch );
+		return startsWithMatch.concat( containsMatch );
 	}, [ inputValue, options, value ] );
 
 	const onSuggestionSelected = ( newSelectedSuggestion ) => {

We may want to be less comprehensive on the list of dash-like characters that we normalize (for discussion).

@markbiek
Copy link
Contributor Author

markbiek commented Jul 13, 2022

@ciampo Thanks for the clarification, that's very helpful!

My only concern is that it doesn't solve the original issue we were attempting to fix.

Even with all of the hyphen-like characters being normalized to an ASCII -, it seems like we would still run into the issue:

Let's say have an option and the label is Angola - Cabinda.

When someone types angola c into the combo box, they won't get any search matches because of the hyphen.

@ciampo
Copy link
Contributor

ciampo commented Jul 13, 2022

As explained in details in previous messages, any matching logic that changes the length of the string that is being matched (i.e. by ignoring punctuation or collapsing whitespace) will inherently break the text-highliting functionality of the component.

We could implement this change (ignore punctuation and repeated whitespace) at the cost of removing the text-highlighting functionality — what do you think, @WordPress/gutenberg-design ?

This change would also need @mtias 's seal of approval, since it could be seen like a potential UX regression.

@jameskoster
Copy link
Contributor

In terms of the original issue, if you're searching for the Cabinda province specifically, isn't it more likely that you'd just type that rather than 'Angola ca...'?

Could there also be situations where the punctuation is critical to the search?

It seems a shame to lose the text highlighting, and I suspect that might be an a11y regression.

@markbiek
Copy link
Contributor Author

In terms of the original issue, if you're searching for the Cabinda province specifically, isn't it more likely that you'd just type that rather than 'Angola ca...'?

Could there also be situations where the punctuation is critical to the search?

It seems a shame to lose the text highlighting, and I suspect that might be an a11y regression.

@jameskoster I definitely agree that we don't want any regressions! That said, we opened this PR due to a reported issue with Calypso, albeit a minor way.

I think the best thing to do here is to implement the normalization described above and we'll take a different approach to fixing Automattic/wp-calypso#64677 on Calypso.

Does that seem fair to everyone?

@jameskoster
Copy link
Contributor

If it won't break the text highlighting that seems fair to me.

@ciampo
Copy link
Contributor

ciampo commented Jul 14, 2022

I think the best thing to do here is to implement the normalization described above and we'll take a different approach to fixing Automattic/wp-calypso#64677 on Calypso.

Does that seem fair to everyone?

If you're referring to this suggestion, we can go ahead as that won't introduce regressions with the text highlighting logic (with respect to my suggested changes, we may just want to understand how "comprehensive" we'd like to be with respect to the list "dash-like" characters to normalize).

@markbiek markbiek closed this Aug 3, 2022
@markbiek markbiek force-pushed the 64677-update/improved-combobox-search branch from 8a95217 to a5ca435 Compare August 3, 2022 14:39
WordPress Components automation moved this from In progress (owned) ⏳ to Abandoned ⛔️ Aug 3, 2022
@markbiek markbiek deleted the 64677-update/improved-combobox-search branch August 3, 2022 15:24
@markbiek
Copy link
Contributor Author

markbiek commented Aug 3, 2022

This was automatically closed and I can't reopen it.

The work normalizing hyphens can be found in #42942

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
First-time Contributor Pull request opened by a first-time contributor to Gutenberg repository [Package] Components /packages/components [Type] Enhancement A suggestion for improvement.
Projects
WordPress Components
Abandoned ⛔️
Development

Successfully merging this pull request may close these issues.

None yet

7 participants