Update fuzzaldrin-plus to 0.6.0 #328
Description of the Change
Brings in updated fuzzy sorting logic.
#21 is a bit of a mixed bag of user scenarios, but I think I address the latest issue and many of the others mentioned.
I know it's not the same repo but I would like to raise awareness of
No they aren't, but the issues are very easy to reproduce;
Open fuzzy finder and type
This is unexpected: the second result matches better than the first (e.g. it has fewer "missing" characters in between).
VS code sorts it as expected;
@jeancroy thanks for looking into it.
The second case might appear complicated, but it really isn't. It's basically a case where you know which folder your file is sitting in and you have multiple files with the same name.
This is a very common situation to get into once you start developing larger apps, as you'll have, for example, multiple different components, each of which might have a subcomponent with a common name like
So then when you want to target those files and type in the name of the component and then subcomponent, you expect the algorithm to prioritise those matches -- not something that has more different characters in the name.
In the boo/baz example, I think the problem is that Atom's fuzzy finder is trying to match the single letter
VS Code seems to do this well, where they don't even mark the letter
The idea being I think, that developers will know what they are searching for, and if they want
The fact that I type a folder name shouldn't demerit the use case. We're no longer encouraged to store a million files in sock drawers called
The other way of looking at it is to see which search result has the fewest non-matching characters between the first and last match.
Going back to that example again,
If you ignore symbols like path separators and dots, the example becomes even clearer, with the first result essentially being a perfect match and the second result having 4 non-matching letters. The perfect match should "win" in this case.
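To make the "fewest non-matching characters between the first and last match" idea concrete, here is a minimal sketch. It is not how fuzzaldrin-plus or VS Code score results; the function name `gap_count`, the set of ignored symbols, and the sample paths are all assumptions made for illustration. It uses a greedy leftmost match, so it does not search for the tightest possible span.

```python
from typing import Optional

# Hypothetical set of symbols that do not count as "missing" characters.
IGNORED = "/\\._- "


def gap_count(query: str, candidate: str) -> Optional[int]:
    """Count non-matching letters between the first and last matched character.

    Returns None when the query is not a subsequence of the candidate.
    Matching is case-insensitive and greedy (leftmost), which is a
    simplification rather than any editor's real algorithm.
    """
    text = candidate.lower()
    wanted = [ch for ch in query.lower() if ch not in IGNORED]

    positions = []
    start = 0
    for ch in wanted:
        idx = text.find(ch, start)
        if idx == -1:
            return None  # a query character is missing entirely
        positions.append(idx)
        start = idx + 1

    if not positions:
        return 0

    matched = set(positions)
    return sum(
        1
        for i in range(positions[0], positions[-1] + 1)
        if i not in matched and text[i] not in IGNORED
    )


# Hypothetical candidates: lower gap counts sort first.
candidates = ["axxbxc.js", "a/b/c.js"]
ranked = sorted(candidates, key=lambda path: gap_count("abc", path))
```

With these made-up inputs, the path whose matched letters are only separated by path separators ranks ahead of the one with unrelated letters in between.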
There's a bunch of things that Atom does better than VS Code, but this is one example where the algorithms in VS Code make a lot more sense, so you might want to take a look at those and see how they prioritise the results.
Let me know if you need any other feedback! Keen to help you out with these issues.
Hi @adamreisnz, I have now updated the library based on your report.
I'll have this to say.
c) I've increased penalty for long filename.
d) There's an online demo to play with latest version.
@jeancroy thanks for your work on this! I'll check this tomorrow and have a look.
Note that while my example cases didn't appear realistic, they were modelled after real use cases, so their principles were the same. But I will look into it and post here if anything still looks off.
I've had a play with your online demo. Here's the real-life data I used;
Looks like the case of
I stand by my idea that the 3rd result in the screenshot should be 2nd in the list, because it has fewer missing characters in the result, and as such is a closer match. Not sure how your algorithm prioritises
It works ok if I specify the dot though;
I think I just started using a space because I find that easier to type for a quick search. Can try to start using the dot for filenames, but I'd still recommend you to look into it some more because as mentioned VS Code's algorithm does pick up on it correctly as you can see here:
It puts the "shortest" results on top, clearly seen by the pyramid shape of results, and prioritises "file" names over "path" names.
I get the impression your search algorithm is quite generic, as indicated by the default example of movie names, but maybe it should be tailored more to file systems somehow if used inside a code editor.
This seems to be working as expected now:
This is working as expected now:
Another scenario that has always bothered me was the following;
Let's say I forgot whether I called the
This will yield no results, because there's one letter added that doesn't fit.
The algorithm handles missing letters reasonably well, but additional letters not at all. Not sure if there's anything that can be done about it, but thought I'd mention it.
Those are forces that contradict each other; that's what you see here.
I've updated the weight for your 4 booking ordering.
This is by design.
It's possible to have a more local definition of missing, e.g. between the first and last character of the match. But that tends to penalize someone who would like to match a folder near the start of the path. (And I don't want to assume such a usage bias.)
Anyway, the task of computing important missing characters is a bit hard to define and expensive to compute. I chose to use those CPU cycles to assess the quality of the present characters instead of the missing ones.
Things like the size of the filename, total path length, and directory depth are handled by "dumb" penalty multipliers that I try to tune against real-life paths. They are, however, bound to interact with small differences in match quality, like a single "s"
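As a rough illustration of how a penalty multiplier can outweigh a small difference in match quality, here is a minimal sketch. The formula, the weights, and the sample paths are assumptions made for this example and are not the values or the formula used by fuzzaldrin-plus.

```python
def penalized_score(
    base_score: float,
    path: str,
    length_weight: float = 0.01,  # hypothetical penalty per filename character
    depth_weight: float = 0.05,   # hypothetical penalty per directory level
) -> float:
    """Scale a match-quality score by a simple size/depth penalty."""
    filename = path.rsplit("/", 1)[-1]
    depth = path.count("/")
    multiplier = 1.0 / (1.0 + length_weight * len(filename) + depth_weight * depth)
    return base_score * multiplier


# A slightly better raw match on a long, deeply nested path...
deep = penalized_score(105.0, "src/components/very_long_name.js")
# ...can still rank below a slightly worse match on a short path.
shallow = penalized_score(100.0, "a.js")
print(deep < shallow)
```

Because the multiplier applies to the whole score, any tuning of the weights shifts the ordering of results whose raw scores are close together, which is the interaction described above.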
Oh, you mean an additional letter in the query.
I don't think it's necessarily the filename that needs to be short; what matters is the number of extra characters in the results that aren't present in the search query.
For example, searching for
This is not always the shortest filename; there might be longer filenames that are a better match, for instance searching for
Anyway, it can be argued which result should be the best match, and maybe this is a matter of preference.
We can leave it at this for now, if I see anything else coming up I'll let you know. Thanks!