Quick Open heuristic should prefer contiguous substrings #2068

njx · 2012-11-06T20:41:57Z

Open the brackets source folder (as of b455ec3)
Cmd-Shift-O
Type spec/live

Result: The highlighting in the first few entries suggests they're being sorted to the top because of the "v" and "e" being scattered throughout the path, which seems weird. It seems like the heuristic should prefer the contiguous "live" at the beginning of the path entry.


njx · 2012-11-06T20:43:47Z

Another example: do a quick open for samples/index.html. The first hit is for src/thirdparty/CodeMirror2/mode/ntriples/index.html, but it seems like ones that actually start with "samples" should be a higher priority.

njx · 2012-11-06T20:48:37Z

Assigning to @peterflynn

njx · 2012-11-07T00:36:02Z

Actually, I think the reason this happens is because we walk right-to-left, on the theory that the rightmost segments are more specific. That makes sense at the segment level, but I don't think it makes sense at the character level. What if we walked right-to-left through segments, but within each segment we walk left-to-right?
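A minimal sketch of the character-level difference, assuming a plain greedy, case-insensitive subsequence match. The function names are hypothetical and this is not the actual Quick Open code. It only shows why walking right-to-left inside one segment scatters the matched characters:

```javascript
// Greedy match walking the candidate from its right end.
// Returns the matched indexes in ascending order, or null if no match.
function matchRightToLeft(query, str) {
    const q = query.toLowerCase();
    const s = str.toLowerCase();
    const positions = [];
    let qi = q.length - 1;
    for (let si = s.length - 1; si >= 0 && qi >= 0; si--) {
        if (s[si] === q[qi]) {
            positions.unshift(si);
            qi--;
        }
    }
    return qi < 0 ? positions : null;
}

// Same greedy match, but walking the candidate from its left end.
function matchLeftToRight(query, str) {
    const q = query.toLowerCase();
    const s = str.toLowerCase();
    const positions = [];
    let qi = 0;
    for (let si = 0; si < s.length && qi < q.length; si++) {
        if (s[si] === q[qi]) {
            positions.push(si);
            qi++;
        }
    }
    return qi === q.length ? positions : null;
}

matchRightToLeft("live", "LiveDevelopment"); // [0, 1, 6, 12]
matchLeftToRight("live", "LiveDevelopment"); // [0, 1, 2, 3]
```

Within a single segment, the left-to-right walk picks up the contiguous "Live", while the right-to-left walk grabs a late "e" and "v" first.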

njx · 2012-11-07T00:55:37Z

Reviewed.

njx · 2012-11-08T19:20:53Z

Bumping to medium priority--this is really breaking some common cases. For example, if you do Quick Open on "Commands", you get "CommandManager.js" as the first hit instead of "Commands.js".

njx · 2012-11-10T02:36:02Z

Nominating for sprint 17 since the new heuristic has slightly worse behavior in some cases.

peterflynn · 2012-11-14T22:26:44Z

Yeah... the heuristic is definitely improved on the whole (the robustness to typos alone is worth it!), but there are definitely a lot of little cases where the top hit isn't what I'd expect. Here are some more examples:

Searching for "extensions" lists ExtensionLoader.js above items with the contiguous string "extensions" in their path (seems like the last-segment boost is outweighing the longer chain of contiguous chars)
Searching "@Doc" in DocumentManager doesn't list Document first. Matches starting at the start of a segment should probably get a bigger boost.
Searching "@get" in WorkingSetSort lists _handleSortWorkingSetByType before "get" (exact match) and "getCommandID" & "getCompareFn" (exact contiguous prefixes). Seems like the bonus for uppercase matches is outweighing the bonus for contiguous matches. (Same if you search "@init" in Menus.js -- init() is listed last, after three highly discontiguous matches that include an uppercase "I". Or if searching for "DMan": "DOMAgent" ranks higher than "DocumentManager").
"EUtil" lists EditorUtils first, but ExtensionUtils 6th, behind FileUtils/CodeHintUtils/LESSUtils. Seems like discontiguous matches should be penalized much less when the gap includes a separator (capitalization change, non-word char, etc.).
We should consider favoring shorter strings, all else being equal. E.g. "menus" favors "Menus-test.js" over "Menus.js", and "root/strings" will favor the strings.js buried deep under src/extensions/samples rather than the one closer to the top level in src/nls.
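The last suggestion (favor shorter strings, all else being equal) could be sketched as a tie-breaking comparator. The `label` and `score` fields are assumptions made for illustration, not the real result shape:

```javascript
// Hypothetical comparator: higher score first, then shorter label,
// then alphabetical order so the result is deterministic.
function compareResults(a, b) {
    if (a.score !== b.score) {
        return b.score - a.score;
    }
    if (a.label.length !== b.label.length) {
        return a.label.length - b.label.length;
    }
    return a.label.localeCompare(b.label);
}

const results = [
    { label: "Menus-test.js", score: 10 },
    { label: "Menus.js", score: 10 }
];
results.sort(compareResults);
// results[0].label is "Menus.js"
```

This only changes the order of equally scored entries, so it would not override any of the other bonuses discussed above.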

I like the idea of walking left-to-right within each segment, but I don't think it would fix any of these ranking problems (nor the ones NJ mentioned above). It does seem worth fixing, since it leads to odd highlighting (e.g. "menus" highlights the "Menu" of "Menus.js" plus the final "s" of ".js", when you'd expect the contiguous "Menus"). But it seems lower priority.

Also note that regardless of direction, because the matching is greedy there will always be cases where we fail to find the longest matching substring. E.g. left-to-right searching for "abcd" in "abcxxxabcd" would match the leading "abc" plus the final "d", skipping the contiguous "abcd" at the end, which isn't optimal either.
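To make the greedy failure concrete, here is a small sketch with hypothetical function names. It also shows one possible mitigation, checking for a contiguous occurrence before falling back to the greedy walk. Preferring the rightmost occurrence is an assumption for illustration only:

```javascript
// Greedy, case-insensitive left-to-right subsequence match.
function greedyLeftToRight(query, str) {
    const q = query.toLowerCase();
    const s = str.toLowerCase();
    const positions = [];
    let from = 0;
    for (const ch of q) {
        const hit = s.indexOf(ch, from);
        if (hit === -1) {
            return null;
        }
        positions.push(hit);
        from = hit + 1;
    }
    return positions;
}

// Try a contiguous occurrence first (rightmost one, by assumption),
// and only fall back to the greedy walk when none exists.
function contiguousFirst(query, str) {
    const start = str.toLowerCase().lastIndexOf(query.toLowerCase());
    if (start !== -1) {
        return Array.from({ length: query.length }, (_, k) => start + k);
    }
    return greedyLeftToRight(query, str);
}

greedyLeftToRight("abcd", "abcxxxabcd"); // [0, 1, 2, 9]
contiguousFirst("abcd", "abcxxxabcd");   // [6, 7, 8, 9]
```

The contiguous check only helps when the whole query appears as one run. Partial runs would still be subject to the greedy behavior.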

njx · 2012-11-26T19:43:15Z

Moving out to sprint 18. Somewhat risky, not an official feature.

dangoor · 2012-12-20T13:35:30Z

I've probably spent more time than was desired/anticipated on this, and I'm closer to a "solution" but not there yet. I put "solution" in quotes, because I don't think it's really possible to have the first matched item always be the one you'd anticipate, but it can be the case most of the time.

We can probably make things better by tweaking the scoring, but I think relying purely on right-to-left walking is going to leave us with many cases that feel funny.

My first thought was to find the longest contiguous substring and build a result around that. Unfortunately, finding the longest contiguous substring is slow (not just in running time complexity but in actual observed time on my machine). Beyond that, while I think having the longest contiguous substring would make things better, I think we'd still end up with suboptimal cases.

I observed what other editors do and found a rough comparison order that I like. I say "rough" because I haven't implemented it yet, and I know there are little gremlins. I believe that this would produce fundamentally better results than what we have now.

1. split the string up into segments (split on "/")
2. compare each character of the query string preferring matches in this order:
   1. start with last (most-specific) segment
   2. compare first character
   3. compare other special characters (first capital letter after a lower case, to help with camelCase, ".", or "-")
   4. failing that, leave the first segment and compare the first+special characters of the other segments
   5. failing that, start from the left and compare the non-special characters
3. for the next character of the query string, you only search for characters that are sequentially after that first one

I'm leaning toward checking first for contiguous strings and then following the order outlined in step 2.

If it tries to cram everything into the last segment and fails, it would need to start over not preferring the last segment.
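A rough sketch of the "special characters first" idea, under several assumptions: the helper names are hypothetical, the exact set of special positions is a guess (segment starts, camelCase humps, and the separators ".", "-", "/"), and the last-segment preference and the restart step are left out entirely:

```javascript
// Hypothetical helper: indexes treated as "special" in a candidate.
function findSpecialPositions(str) {
    const specials = [];
    for (let i = 0; i < str.length; i++) {
        const c = str[i];
        const prev = i > 0 ? str[i - 1] : "/";
        const isSeparator = c === "." || c === "-" || c === "/";
        const startsSegment = prev === "/";
        const isCamelHump = /[a-z]/.test(prev) && /[A-Z]/.test(c);
        if (isSeparator || startsSegment || isCamelHump) {
            specials.push(i);
        }
    }
    return specials;
}

// For each query character, prefer a special position after the
// previous match; otherwise take the next plain occurrence.
function matchPreferringSpecials(query, str) {
    const q = query.toLowerCase();
    const s = str.toLowerCase();
    const specials = findSpecialPositions(str);
    const positions = [];
    let last = -1;
    for (const ch of q) {
        let hit = specials.find((i) => i > last && s[i] === ch);
        if (hit === undefined) {
            const plain = s.indexOf(ch, last + 1);
            hit = plain === -1 ? undefined : plain;
        }
        if (hit === undefined) {
            return null;
        }
        positions.push(hit);
        last = hit;
    }
    return positions;
}

findSpecialPositions("ExtensionUtils.js");             // [0, 9, 14]
matchPreferringSpecials("EUtil", "ExtensionUtils.js"); // [0, 9, 10, 11, 12]
matchPreferringSpecials("EUtil", "FileUtils.js");      // [3, 4, 5, 6, 7]
```

With positions like these, a scorer could tell that "EUtil" hits two special characters in ExtensionUtils.js but only one in FileUtils.js. Note that jumping ahead to a special character can leave no room for the rest of the query, which is the "start over" case mentioned above.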

I don't know if I'll have this done in time for the end of the sprint, especially given review.

pthiess · 2012-12-20T17:16:48Z

I think it would be ok to move this out to sprint 19. Good feedback, Kevin!

njx · 2012-12-20T17:24:09Z

Good thinking here. I think the general approach of right-to-left for the whole list of segments and left-to-right-with-contiguity for individual segments makes sense. (I'm assuming that by "split the string up into segments", you're talking about the string to be matched, not the query.) A couple of other thoughts:

If the user types a query that itself has multiple segments (e.g. "src/index.html"), how should we deal with that? I'm guessing we should basically apply the steps you outline above for the last segment of their query. For the immediately previous segment of their query, apply the same steps except starting at the second-to-last segment of the match string. Then maybe add some weighting factor that prefers "contiguous segments" (i.e., if they type "src/index.html", prefer "foo/src/index.html" over "src/foo/index.html").
I wonder if it would be worth vetting the heuristics against a decent-sized set of representative use cases that we think will be most common. As you pointed out, we'll never get every case right, but we might want to try to ensure that a certain set of use cases are predictably correct. (I don't even know that we need to do this manually right now--we could just essentially build those cases as a set of unit tests, try the heuristics against them, and then keep tweaking the heuristics until they get them right.)

dangoor · 2012-12-20T17:34:27Z

@pthiess I'm making good progress now (TDD FTW in a case like this), but yeah, this looks like a "land early in sprint 19" thing given the timing. I am hoping to have a pull request up today.

@njx:

I don't know for sure yet, but I don't think we need to account for multiple segments in the query. "/" is considered a "special character" and automatically gets special treatment when searching and scoring in my new algorithm.

I totally agree with having a decent set of representative cases and I already have a test function that takes a query and a list of strings to test and verifies that the strings are scored to appear in the order that the test provides. I will feed that a bunch of cases (you and @peterflynn have provided a bunch of great ones here) to ensure sane results.
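A test helper of the kind described here might look roughly like the sketch below. Both `checkRanking` and `toyScore` are hypothetical names. The toy scorer is a stand-in used only to exercise the helper, not the real scoring algorithm:

```javascript
// Checks that scoreFn ranks the candidates in exactly the given order
// (strictly decreasing score). Returns which pair failed, if any.
function checkRanking(scoreFn, query, expectedOrder) {
    const scores = expectedOrder.map((str) => scoreFn(query, str));
    for (let i = 1; i < scores.length; i++) {
        if (!(scores[i - 1] > scores[i])) {
            return {
                ok: false,
                higher: expectedOrder[i - 1],
                lower: expectedOrder[i]
            };
        }
    }
    return { ok: true };
}

// Toy scorer, for illustration only: a contiguous, case-insensitive
// hit earns a flat bonus, and shorter strings score slightly higher.
function toyScore(query, str) {
    const contiguous = str.toLowerCase().includes(query.toLowerCase());
    return (contiguous ? 100 : 0) - str.length;
}

checkRanking(toyScore, "Commands", ["Commands.js", "CommandManager.js"]);
// { ok: true }
```

Each example reported in this thread could become one `checkRanking` call, so a scoring tweak that regresses a known case fails immediately.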

pthiess · 2012-12-20T17:48:24Z

@dangoor I moved it to Sprint 19 and deleted it from the Trello card for Sprint 18

dangoor · 2012-12-21T22:56:44Z

Another update: yesterday, I got the matching working and now I have the scoring working well. The results seem much better than the old algorithm.

There's one bug left to fix and I need to write a lot of comments.

njx · 2012-12-21T23:00:17Z

Sweet! Looking forward to it.

peterflynn · 2013-01-18T21:49:20Z

FBNC @njx

njx · 2013-01-18T21:54:56Z

Looks great. The only case that I found that seems a little weird to me is "cm.js" matches "CodeHintManager.js" before "CommandManager.js"--I would expect the latter to come first because the two uppercase characters are contiguous in the query. But we're never going to make Quick Open read my mind in every case :)

ghost assigned peterflynn Nov 6, 2012

ghost assigned dangoor Dec 12, 2012

dangoor mentioned this issue Jan 2, 2013

Fix for #2068: better QuickOpen heuristics #2462

Merged

dangoor mentioned this issue Jan 9, 2013

Make stringMatch faster, quickly #2496

Closed

peterflynn added a commit that referenced this issue Jan 18, 2013

Merge pull request #2462 from adobe/dangoor/fix-2068

289489d

Fix for #2068: better QuickOpen heuristics

ghost assigned njx Jan 18, 2013

njx closed this as completed Jan 18, 2013

core-ai-bot mentioned this issue Aug 29, 2021

[CLOSED] Quick Open heuristic should prefer contiguous substrings brackets-archive/bracketsIssues#1999

Open
