-
Notifications
You must be signed in to change notification settings - Fork 8.6k
Update command palette search to prioritize "longest substring" match. #18700
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Sorry for the spam open/close. Let me know if I you guys are interested in this and I can reopen it. |
Generally speaking, probably yes, but I personally haven't gotten a chance yet to look at it. Currently every member of the terminal team is unfortunately either working on another project or on sick leave, hence the overall delays since late last year. One of the things @zadjii-msft made was released just recently: https://learn.microsoft.com/en-us/windows/powertoys/command-palette/overview I plan to review this PR after I fixed the remaining major issues the the Preview/Canary version has. (Which is what has occupied what little time I had for Windows Terminal since then.) |
Sorry, didn't mean to rush you, was more worried I was spamming you guys with all of these fuzzy finding PRs. Please let me know if it doesn't make sense for what ever reason and I can close it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, I'd like to thank you so much for taking care of the licensing situation!
Unfortunately, there's still some issues with this code and non-English languages.
I'm also somewhat confused by the code's dissimilarity with the original fzf to be honest (that's not an issue in on itself; it just surprised me). It prompted me to try and use Copilot to transform the original Go code to C++ though. It almost one-shotted the entire translation. I've since patched up the issues it had and here's the current state: https://gist.github.com/lhecker/dd840ed2f74e8f66d0f43fd49e999b88
Unfortunately, it doesn't work correctly, and I ran out of time working on it. If you want to play around with that code, please feel free to do so!
That aside, for the upcoming project that I've worked on, I ported VS Code's fuzzy matcher to Rust. Here's the original: https://github.com/microsoft/vscode/blob/071eafaf5ab4de3814466b616fe00a66d2739ad2/src/vs/base/common/fuzzyScorer.ts
If you'd like I could generate a C++ port of it next week based on my Rust version. The benefit of that code is that it's a lot simpler. The disadvantage is that fzf is a leagues better algorithm and if we had it, it would definitely be superior of course. I guess it's a trade-off.
If we stick with this fzf port, I really don't think it has to be perfect for us to merge it. But I do think though that we should at least get rid of the towlower
calls and similar. The current version uses lstrcmpiW
to implement case-insensitive comparisons which do work correctly (even if it's honestly bad to abuse that function for this purpose haha).
Weight(_computeWeight()); | ||
HighlightedName(_computeHighlightedName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will execute the fzf matching logic twice. I think we should replace the two functions with a single one called _update
(or similar) which runs ParsePattern
only once and updates both members.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, I think I may have gone too far with this, I realized that ParsePattern was getting executed for every item and moved the pattern parsing outside of UpdateFilter which ended up bringing the snippets pane and suggestions UI into scope. Hopefully this isn't too invasive.
src/cascadia/TerminalApp/fzf/fzf.cpp
Outdated
enum CharClass | ||
{ | ||
NonWord = 0, | ||
CharLower = 1, | ||
CharUpper = 2, | ||
Digit = 3, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I finished reviewing this implementation today and it has me a little confused to be honest. It's quite different from the original fzf. Among others, the CharClass
enum is missing the Delimiter, Letter, and Whitespace types.
That aside, I'd turn this enum into a enum class
and assign it an uint8_t
size explicitly, so that the LUT I'll suggest later is going to be 4x more compact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
src/cascadia/TerminalApp/fzf/fzf.cpp
Outdated
size_t start = str.find_first_not_of(L" "); | ||
return (start == std::wstring::npos) ? std::wstring_view() : std::wstring_view(str).substr(start); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: find_first_not_of(L" ")
and find_first_not_of(L' ')
are not the same and the latter will be much faster. This can also exploit the fact that npos
is guaranteed to be the max. size_t
value:
const auto off = str.find_first_not_of(L' ');
return str.substr(std::min(off, str.size()));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
src/cascadia/TerminalApp/fzf/fzf.cpp
Outdated
if (std::iswlower(ch)) | ||
return CharLower; | ||
if (std::iswupper(ch)) | ||
return CharUpper; | ||
if (std::iswdigit(ch)) | ||
return Digit; | ||
return NonWord; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These 3 functions are documented to only work with ASCII. I recommend using u_charType
and a LUT. Something like this:
static constexpr auto lut = []() {
std::array<CharClass, U_CHAR_CATEGORY_COUNT> lut;
lut.fill(CharClass::NonWord);
lut[U_UPPERCASE_LETTER] = CharClass::Upper;
lut[U_LOWERCASE_LETTER] = CharClass::Lower;
lut[U_TITLECASE_LETTER] = CharClass::Letter;
lut[U_MODIFIER_LETTER] = CharClass::Letter;
lut[U_OTHER_LETTER] = CharClass::Letter;
lut[U_DECIMAL_DIGIT_NUMBER] = CharClass::Number;
lut[U_SPACE_SEPARATOR] = CharClass::White;
lut[U_LINE_SEPARATOR] = CharClass::White;
lut[U_PARAGRAPH_SEPARATOR] = CharClass::White;
return lut;
}();
return lut[u_charType(c)];
(This code is untested, and I wrote it for the fzf port I mentioned elsewhere.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, I didn't realize those only work with ASCII, I am you glad mentioned that, I didn't understand why lstrcmpi fixed the original issue until now...
src/cascadia/TerminalApp/fzf/fzf.cpp
Outdated
std::ranges::transform(textCopy, textCopy.begin(), [](wchar_t c) { | ||
return std::towlower(c); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same applies here. I believe that creating this copy is unnecessary, as you can just lowercase each char from the original text
within the loop below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated and then changed when I moved everything to UChar32.
src/cascadia/TerminalApp/fzf/fzf.cpp
Outdated
// We use the same comparison method as upon sorting to guarantee consistent behavior | ||
const WCHAR searchCharAsString[] = { currentPatternChar, L'\0' }; | ||
const WCHAR currentCharAsString[] = { currentChar, L'\0' }; | ||
auto isCurrentCharMatched = lstrcmpi(searchCharAsString, currentCharAsString) == 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are multiple uses of lstrcmpiW
throughout this file (and in the current matching logic before this PR) and I don't think that's a good way to implement this. Each call to lstrcmpiW
goes through multiple layers just to arrive at CompareStringEx
which is rather expensive for just comparing two characters with folding.
In this particular instance, currentChar
is already lowercase so the only problem is currentPatternChar
which could simply get lowercases with ICU and then compared against the currentChar
. Technically we then also need to take care of the difference between lowercasing and casefolding (e.g. the lowercase of Greek "sigma" has two variants), but I think that's a secondary issue.
FWIW, we also technically need to iterate by codepoint here instead of by character. Right now, this code is limited to UCS2 (no surrogate pair support), but I think that's similarly a secondary concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I "think" I have this in a better state now.
- Updated to use u_foldCase for the lower casing and folding. I think this works for everything except for some of the more complex folding, ex fold results in multiple code points
- The text and pattern both get converted to UChar32 to iterate over the codepoints.
- The positions get converted back to Utf16 positions for the highlighting
- I tried to add some test cases for the multi language support and surrogate pairs, but there are probably still some major holes in my understanding of Unicode, are the there other scenarios that I should be testing for?
src/cascadia/TerminalApp/fzf/fzf.cpp
Outdated
int whiteSpaces = 0; | ||
for (wchar_t c : text) | ||
{ | ||
if (c != L' ') | ||
break; | ||
whiteSpaces++; | ||
} | ||
return whiteSpaces; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could use the same find_first_not_of
logic as above, but without slicing the string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated and then changed when I moved everything to UChar32.
src/cascadia/TerminalApp/fzf/fzf.cpp
Outdated
while ((found = trimmedPatternStr.find(L' ', pos)) != std::wstring::npos) | ||
{ | ||
std::wstring term = std::wstring{ trimmedPatternStr.substr(pos, found - pos) }; | ||
std::ranges::transform(term, term.begin(), ::towlower); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's another 2 instances of towlower
FWIW.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated and then changed when I moved everything to UChar32.
b931738
to
87a9911
Compare
I think there are a couple of things that contributed this.
Should I try to update use the latest fzf logic? |
Totally understand if the VS Code algo is more maintainable etc, let me know what direction you guys want to go. I am happy to try to port the typescript or the rust code or incorporate your port of the rust code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lhecker do you happen to know how to run these tests. When I try with runut.cmd or VS, I keep getting an error about them being "blocked" from this Error: TAEF: [HRESULT: 0x80080204] Failed to create the test host process for out of process test execution. (Failed to find executable "TestHostApp.exe" specified in custom AppX manifest.)
I am sure this something that I am doing wrong but I am struggling to figure it out.
a5b854f
to
fbf76f2
Compare
Summary of the Pull Request
Addresses: #6693
Repurposed work from #16586
Feel free to close if it doesn't make sense for what ever reason (E.g. License file still an issue, wrong direction for fix, too complex, etc).
References and Relevant Issues
#6693
Detailed Description of the Pull Request / Additional comments
Validation Steps Performed
PR Checklist