This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Sorting search results #21

Open
6 tasks
corbanbrook opened this issue Mar 10, 2014 · 34 comments

Comments

@corbanbrook

Multiple schemes can be employed to achieve results which are most relevant to the user. Predicting which file a user wants out of a long list of possible matches and presenting it first can help speed up development time/maintain flow and train of thought. Here are some ideas to discuss:

  • Sort the least fuzzy results to the top, with fuzziness determined by the run length of the search term in the result. E.g., a search term of 'src' would give a higher fuzziness score to a file like .jshintrc than to a file called app_src.js or to files in the src/ directory: the run length in the first file is 2, while in the others it is 3.
  • Filename match or directory match. A match on a filename should be sorted at a higher priority than a match on a full path, but what about matches with a higher run length on the full path versus a lower run length on the filename? E.g., a search for 'src ap' currently displays images/destroy_discard.png higher than src/app.coffee. Although destroy_discard technically has more matched characters within the filename than src/app.coffee, src/app has the longer run length. (Something to think about.)
  • Dot files. Hidden files are less important than non-hidden files of equal score.
  • Ignored files (.gitignore) can be sorted at a lower priority than files with equal score. E.g., I want the index.html in my templates/ dir, not the one in my build/ dir, because the latter is in .gitignore. Editing a temporary file by mistake can cause confusion and lost work.
  • Sort files with most recent last modified date at a higher priority than those of equal score.
  • Filter out/sort to bottom files that are already open.
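
To make the run-length idea concrete, here is a minimal sketch. It assumes "run length" means the longest contiguous piece of the search term that appears in the candidate (a longest-common-substring measure); `longest_run` and `sort_least_fuzzy_first` are hypothetical names, not part of fuzzaldrin.

```python
def longest_run(query: str, candidate: str) -> int:
    """Length of the longest contiguous piece of `query` found in `candidate`."""
    q, c = query.lower(), candidate.lower()
    best = 0
    # prev[j] holds the length of the common run ending at q[i-2] and c[j-1].
    prev = [0] * (len(c) + 1)
    for i in range(1, len(q) + 1):
        curr = [0] * (len(c) + 1)
        for j in range(1, len(c) + 1):
            if q[i - 1] == c[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def sort_least_fuzzy_first(query: str, candidates: list[str]) -> list[str]:
    """Longer runs (less fuzzy matches) first; ties keep their input order."""
    return sorted(candidates, key=lambda c: -longest_run(query, c))


files = [".jshintrc", "app_src.js", "src/main.js"]
print(sort_least_fuzzy_first("src", files))
```

With the example above, .jshintrc gets a run length of 2 ('rc') and the other two files get 3 ('src'), so .jshintrc sorts last.
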
@hkdobrev

@corbanbrook AFAIK this issue should be filed against the fuzzaldrin library which is used by fuzzy-finder to filter and score results.

@corbanbrook
Author

fuzzaldrin simply does scoring and sorting of arrays of strings or objects. Some of my above recommendations might be outside the scope of the project.

One solution would be for fuzzaldrin to add an option for custom filter/sorting callbacks.
Another solution would be to simply use fuzzaldrin to provide the initial score, then apply further sorting schemes within this project, like dot file priority, ignored file priority, and last modified priority.
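
A rough sketch of that second option, assuming the fuzzy score has already been computed elsewhere. The `Candidate` fields and the order of the tie-breakers are assumptions for illustration only:

```python
import posixpath
from dataclasses import dataclass


@dataclass
class Candidate:
    path: str
    score: float   # assumed to come from the fuzzy matcher
    ignored: bool  # e.g. the path is covered by .gitignore
    mtime: float   # last-modified time, in seconds


def is_dot_file(path: str) -> bool:
    return posixpath.basename(path).startswith(".")


def sort_key(c: Candidate):
    # Primary: higher score first. The rest only break ties between equal
    # scores: non-ignored first, then non-hidden, then most recently modified.
    return (-c.score, c.ignored, is_dot_file(c.path), -c.mtime)


def rank(candidates: list[Candidate]) -> list[str]:
    return [c.path for c in sorted(candidates, key=sort_key)]


results = rank([
    Candidate("build/index.html", 10, ignored=True, mtime=5),
    Candidate(".index", 10, ignored=False, mtime=9),
    Candidate("templates/index.html", 10, ignored=False, mtime=1),
])
print(results)
```

Because the score stays the first element of the key, none of the secondary rules can push a weaker match above a stronger one.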

@jamby

jamby commented Mar 12, 2014

It would also be nice if the fuzzy finder could order results by importance for the type of project. For instance, in a Ruby on Rails project, if I start typing a model's name, the first result should usually be '/app/models/model_name.rb' rather than 'spec/models/model_name_spec.rb'.

Most times I want to deal with the model, not the spec.
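
One way this could look, as a sketch only: a per-project table of path prefixes with weights that scale the fuzzy score. The `RAILS_WEIGHTS` table, its values, and the function names are all hypothetical.

```python
# Hypothetical weights for a Rails-style layout; first matching prefix wins.
RAILS_WEIGHTS = [
    ("app/", 1.0),
    ("spec/", 0.8),
    ("test/", 0.8),
]
DEFAULT_WEIGHT = 0.9  # assumed weight for paths that match no prefix


def weight_for(path: str, rules=RAILS_WEIGHTS) -> float:
    relative = path.lstrip("/")
    for prefix, weight in rules:
        if relative.startswith(prefix):
            return weight
    return DEFAULT_WEIGHT


def rank(scored: list[tuple[str, float]]) -> list[str]:
    """`scored` holds (path, fuzzy_score) pairs; higher adjusted score first."""
    return [
        path
        for path, score in sorted(scored, key=lambda item: -item[1] * weight_for(item[0]))
    ]


print(rank([("spec/models/user_spec.rb", 1.0), ("app/models/user.rb", 0.9)]))
```

Here the spec file scores slightly higher on the raw match, but the weight puts the model first.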

@miletbaker

It would also be nice to give recency more weight in the find logic. Although it will mainly suggest the last file accessed, to allow quick switching between files (this does seem intermittent, especially if you switch to another app and back again), it would be good if it always gave precedence to files based on last access, making it easy to work between several files.

A good example of where this works well is Textmate's implementation of cmd-t find file. The sorting there works well.
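
For illustration, a small sketch of last-access precedence. This is not how TextMate or fuzzy-finder implements it; `RecentFiles` is a hypothetical helper that reorders an already-filtered match list.

```python
class RecentFiles:
    """Tracks access order and moves recently used matches to the front."""

    def __init__(self) -> None:
        self._order: list[str] = []  # least recent first, most recent last

    def touch(self, path: str) -> None:
        if path in self._order:
            self._order.remove(path)
        self._order.append(path)

    def rank(self, matches: list[str]) -> list[str]:
        position = {path: i for i, path in enumerate(self._order)}
        # Never-opened files get position -1 and keep the matcher's order,
        # because sorted() is stable.
        return sorted(matches, key=lambda path: -position.get(path, -1))


recent = RecentFiles()
for opened in ["a.rb", "b.rb", "a.rb"]:
    recent.touch(opened)
print(recent.rank(["c.rb", "b.rb", "a.rb", "d.rb"]))
```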

@dmnd

dmnd commented Jun 20, 2014

Here's an example where the order isn't great. The second result is what I want, and it's a much closer match, so I don't know why it's second.

image

@dmnd

dmnd commented Jun 24, 2014

Even worse:

image

I wanted the last result in this instance.

(I hope these examples are useful, apologies if they're noise)

@dmnd

dmnd commented Jul 12, 2014

Another one

image

@adammw

adammw commented Jul 16, 2014

Coming from ST3, the fuzzy matcher really drives me crazy that it lists the specs before the actual controllers I want.

fuzzy

Is there any config which changes how the fuzzy finder works, or do we need to improve the underlying fuzzy finding library to improve the searching?

@lewispb

lewispb commented Nov 17, 2014

+1 for this

@davepeck

I decided to play with Atom for the first time this weekend; I immediately found myself frustrated with the strange fuzzy ordering in Atom's select list views.

If we're going to improve fuzzy matching in Atom, there are lots of things to consider:

  • The "right" ordering is fundamentally subjective. It's clear from github issues and Atom forums that lots of people would like to see improvement, but it's equally clear that we won't ever fully agree on what's "best." At the very least, changes to fuzzaldrin should continue to respect the current scoring tests; these represent the only codified community judgment we have so far. We'll probably want to augment these tests with examples from real-world projects, too.
  • The "right" ordering is probably context dependent. There's a hint of this in fuzzaldrin's filter method, which takes the strange queryHasSlashes parameter and invokes the specialized scorer.basenameScore depending on it. I'd expect any update to filter (a) will need to be parameterizable by the caller (for example, to indicate separators, weights, etc.) and (b) will need sensible defaults so it can be invoked with nothing more than the needle and haystack. As an example, we might want path separators to have importance when invoking filter with a list of file names, but we might want the colon-space (': ') to have importance when invoking filter from the command palette.
  • There's great prior art to learn from. TextMate's ranking algorithm is highly regarded, although at first glance I find the implementation hard to grok. (It seems to have a dynamic programming component in matrix but lacks essentially any useful comments.) Command-T also has a well-liked algorithm. Gary Bernhardt's selecta ranking algorithm was based on some interesting discussion that considered this prior art.
  • Especially in fuzzy-finder's case, there's a lot of metadata we can and probably should use to improve ranking. The venerable PeepOpen ranking algorithm takes into account file modification times, last opened, git status, etc. Probably this more sophisticated ranking belongs strictly in fuzzy-finder, as a new "meta scoring" layer; fuzzaldrin should continue to just be about ranking a needle in a haystack of strings.
  • My smartscore branch tries to codify some basic intuitions about what makes a match "better". These include: touching the "starts of words" counts for more; some separators are worth more than others (in file contexts, '/' is probably worth more than '-' or ' '); on the whole, we should prefer fewer contiguous runs of longer length; full word matches along the way are always preferable; etc.
  • Performance is a consideration. Right now every call to filter starts fresh. But it seems to me that (a) it may prove desirable to pre-process each string in the haystack before ever invoking filter, and (b) if the user is simply appending characters to the query string, it might (?) be possible to iteratively re-score the results.
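
As a rough illustration of the "starts of words" and separator-weight intuitions from the list above, here is a toy scorer. It is not the smartscore branch or fuzzaldrin's scorer; the weights are made-up values and the matching is a plain leftmost-first scan.

```python
# Assumed weights: a match right after '/' counts for more than one after '-'.
SEPARATOR_WEIGHTS = {"/": 3.0, "_": 2.0, "-": 2.0, " ": 2.0, ".": 2.0}
START_OF_STRING_WEIGHT = 3.0
CONTIGUOUS_BONUS = 1.5


def score(query: str, candidate: str) -> float:
    q, c = query.lower(), candidate.lower()
    total = 0.0
    search_from = 0
    previous = None  # index of the previously matched character
    for ch in q:
        idx = c.find(ch, search_from)
        if idx == -1:
            return 0.0  # not a subsequence match at all
        total += 1.0
        if idx == 0:
            total += START_OF_STRING_WEIGHT
        else:
            # Bonus when the match starts a "word", i.e. follows a separator.
            total += SEPARATOR_WEIGHTS.get(c[idx - 1], 0.0)
        if previous is not None and idx == previous + 1:
            total += CONTIGUOUS_BONUS  # reward longer contiguous runs
        previous = idx
        search_from = idx + 1
    return total


print(score("ap", "src/app.coffee"), score("ap", "images/snap.png"))
```

A caller could pass a different separator table for the command palette than for file paths, which is the kind of parameterization suggested above. Note that camelCase word starts are not handled in this sketch.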

Alright — hopefully this is useful/interesting to someone. I plan to slowly work on improvements to both fuzzaldrin and fuzzy-finder in my personal branches. Suggestions and feedback are most welcome!

(For fun, I started by replacing fuzzaldrin's current score method with a coffeescript re-implementation of TextMate 2's ranking algorithm; it works and, after a minor tweak, passes all fuzzaldrin tests.)

@nj

nj commented Apr 14, 2015

👍 as the current solution is rather useless; it can even be faster to find the file manually

@matugm

matugm commented Apr 24, 2015

👍 Would be great if we could get some progress on this.

@walles

walles commented Jul 10, 2015

Improved sorting / scoring:
atom/fuzzaldrin#22

The above pull request addresses at least some of the issues raised here.

@ghost

ghost commented Jul 24, 2015

Since I'm working on a lot of Rails projects with ActiveAdmin, I'm often annoyed when I end up in an ActiveAdmin file for a particular resource instead of a model file.

I was thinking about improving this by sorting the fuzzy-finder results by usage, i.e. if files in some folder are worked on more often, they are ranked higher.

I'm happy to implement this experimentally and make a pull request if other people approve of this idea also.
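
A minimal sketch of what such usage-based ranking might look like, assuming usage is counted per folder and only used to break ties between equal fuzzy scores. `FolderUsage` is a hypothetical name.

```python
import posixpath
from collections import Counter


class FolderUsage:
    """Counts how often files in each folder are opened."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record_open(self, path: str) -> None:
        self.counts[posixpath.dirname(path)] += 1

    def rank(self, scored: list[tuple[str, float]]) -> list[str]:
        """`scored` holds (path, fuzzy_score) pairs from the matcher."""
        def key(item):
            path, score = item
            # Higher score first; among equal scores, busier folders first.
            return (-score, -self.counts[posixpath.dirname(path)])
        return [path for path, _ in sorted(scored, key=key)]


usage = FolderUsage()
usage.record_open("app/models/user.rb")
usage.record_open("app/models/post.rb")
print(usage.rank([("app/admin/user.rb", 5.0), ("app/models/user.rb", 5.0)]))
```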

@Soleone

Soleone commented Jul 30, 2015

👍 for some improvements that make finding commonly used files easier. sublime seemed to have done a better job putting the file i actually want to open at the top (using rails here as well)

@dmnd

dmnd commented Sep 8, 2015

Just in case further examples are helpful:

image

@kevinsimper

@jeancroy Did #22 solve this issue?

@jeancroy
Contributor

jeancroy commented Dec 2, 2015

There's now a "Use Alternate Scoring" option in fuzzy-finder that uses it.
It addresses many of the issues with searching by file name / path.

But it does not cover any knowledge about the files themselves, such as a preference for recent / frequent / certain files.

@r-owen

r-owen commented Dec 3, 2015

I just tried the Atom Beta with "Use Alternate Scoring" enabled and it's a huge improvement, though still not as good as Sublime Text. I have a project with a huge number of files, including Doxygen generated html files that I rarely want to look at. I tried to find a file named "matchOptimisticB.h". In SublimeText I can type "mob.h" and get the right file as the first suggestion. In Atom Beta it is the ninth choice, preceded by eight html files I have no interest in.

One thing that might help Atom: if the user provides a file type suffix, prefer names that match that suffix exactly over names that use that suffix as a prefix.
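
A sketch of that suffix rule, applied as a re-ordering step on top of whatever the matcher returns. The function names and the example paths are hypothetical.

```python
import posixpath


def extension(name: str) -> str:
    return posixpath.splitext(name)[1].lower()


def prefer_exact_extension(query: str, candidates: list[str]) -> list[str]:
    """Move candidates whose extension equals the query's extension to the front.

    Relative order is otherwise preserved, since sorted() is stable.
    """
    wanted = extension(query)
    if not wanted:
        return list(candidates)
    return sorted(candidates, key=lambda path: extension(path) != wanted)


matches = ["doc/html/matchOptimisticB_8h.html", "include/matchOptimisticB.h"]
print(prefer_exact_extension("mob.h", matches))
```

With the query "mob.h", the wanted extension is ".h", so the header is listed before the ".html" file even though ".h" is a prefix of ".html".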

Another thing that might help (though I really hope it won't come to this, and it's not needed by Sublime) is to allow the user to disable directory patterns. In my case I might eliminate searches of Doxygen-generated html files and would definitely eliminate .os files (why in the world is it showing binary libraries?).
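
For example, user-supplied exclude patterns could be applied before any fuzzy matching happens. This is only a sketch with made-up patterns; it uses shell-style globs from Python's `fnmatch` module, where `*` also matches `/`.

```python
from fnmatch import fnmatchcase

# Hypothetical user setting: glob patterns for paths that should never be searched.
EXCLUDE_PATTERNS = ["doc/html/*", "*.os"]


def searchable(paths: list[str], patterns: list[str] = EXCLUDE_PATTERNS) -> list[str]:
    """Drop every path that matches at least one exclude pattern."""
    return [
        path for path in paths
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


project = [
    "include/matchOptimisticB.h",
    "doc/html/matchOptimisticB_8h.html",
    "build/lib/match.os",
]
print(searchable(project))
```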

@jeancroy
Contributor

jeancroy commented Dec 3, 2015

OK, please open an issue on fuzzaldrin-plus and I can give it a look. I'd need the results that come before it to understand why they are preferred. The full path is also useful; if it's private, a mock-up with the same length and directory depth will do.


@r-owen

r-owen commented Dec 4, 2015

I just submitted this issue. I hope it helps.

jeancroy/fuzz-aldrin-plus#12

Thank you very much for trying to improve Atom’s fuzzy search.

— Russell


@tnrich

tnrich commented May 4, 2016

Does anyone know if there is an equivalent issue open discussing the Cmd-Shift-P search algorithm?

@jeancroy
Contributor

jeancroy commented May 4, 2016

You're speaking of the command palette? It should already be integrated. If you have a problem, you can try opening an issue on the fuzzaldrin-plus repo.


@adamreisnz

This is a pretty ancient issue, so I have little hope of improvement arriving any time soon, but here's my two cents:

image

Two things wrong with the way the fuzzy finder currently works both illustrated with the above example.

  1. Sort order (as already pointed out in this issue): based on my input, I would really expect the file member/edit/edit.html to show up on top. It does for some reason when I remove the l from html:

image

But with the full html it suddenly drops to second place which gets rather infuriating after having opened the wrong file several times.

So scoring should somehow take into account how close the search terms are to each other in the filename, and prioritize edit.html over edit-payment.html unless I include payment in my search query.

  2. It should really prioritize full word matches rather than scattered letters. If you look at the above example, it actually matches member because it's in the app/components/admin/member folder, instead of simply matching to the member part of the path, because that's a whole matching word.

These two tweaks would make the search algorithm a lot stronger.

@jeancroy
Contributor

jeancroy commented May 17, 2017

Hi @adamreisnz, out of curiosity, is this happening with alternate scoring turned on? There was a strong preference for word "togetherness" in that version.

From the screenshot I'm guessing it's not, but if it is I'll add a few of those to the test benchmark.
The previous algorithm would take the first occurrence of m, then the first e, then the first m, instead of waiting and trying for member.
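
The difference can be sketched as follows. This is a simplified illustration, not the actual fuzzaldrin or fuzzaldrin-plus code, and the path is a made-up stand-in for the one in the screenshot.

```python
def greedy_positions(query: str, candidate: str):
    """Take the first occurrence of each query character, left to right."""
    c = candidate.lower()
    positions, search_from = [], 0
    for ch in query.lower():
        idx = c.find(ch, search_from)
        if idx == -1:
            return None
        positions.append(idx)
        search_from = idx + 1
    return positions


def whole_word_first(query: str, candidate: str):
    """Prefer a contiguous occurrence of the whole query; else fall back to greedy."""
    idx = candidate.lower().find(query.lower())
    if idx != -1:
        return list(range(idx, idx + len(query)))
    return greedy_positions(query, candidate)


path = "app/components/admin/member/edit/edit.html"
print(greedy_positions("member", path))  # scattered across earlier folders
print(whole_word_first("member", path))  # the contiguous 'member' segment
```

The greedy version picks the m and e of components and the m of admin before it ever reaches member, which is the scattered highlighting described above.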

@adamreisnz

adamreisnz commented May 17, 2017

@jeancroy thanks for looking into it, but yes, it's in fact enabled:

image

The version I'm using is 1.18.0-dev-f4a83b238

@jeancroy
Contributor

jeancroy commented May 17, 2017

Another possibility is that the alternate score is used for ranking while the classic one is used for highlighting.
The whole component below fuzzy finder has been rewritten recently. If that's the case, the whole scattered-letter issue is a false trail.

One feature of the new one is a bias toward the file name (vs. the whole path) when we match the file extension exactly. I think you are battling against that when you use keywords from the path but end with an extension.

To sum up your request, you want the htm behavior to happen even in the html case? I'm not sure what the algorithm does because of how scrambled the highlight is.

@adamreisnz

adamreisnz commented May 17, 2017

Well, my use case as you might deduce from my example is that in a large project, there will be many components. Each component might have an edit sub component as in the example, and each of those components will have edit.html template, and edit.js module, and perhaps edit.ctrl.js controller.

So the way I tend to quickly open the file I want, is by specifying the parent component member, then the sub component edit and then extension if I know there's going to be more than one file.

This usually works fine, but in the above case it was messing it up due to the existence of another similar file in the same path (edit-payment.html).

I think my use case is fairly common, so I wouldn't expect to be "battling" against the fuzzy finder's system with it.

edit.html should still be preferred over edit-payment.html if you search for "edit html" imo, on account of it being the shorter and closer match.

@jeancroy
Contributor

jeancroy commented May 17, 2017

You're right on all accounts; in this case it seems the algorithm just likes the m of payment.
I guess the m of html manages to count twice; I'll see how to fix that.

The good news is that the issue is more constrained than, say, a lack of prioritizing "full word matches". (Here, - counts as a word boundary.)

untitled

I'll open a different issue for the highlight regression; it should group member appropriately.

@adamreisnz

Yeah that looks better in your screenshot, highlighting member properly. And interesting that it likes the m in payment and paykent is put at the bottom properly. Looks like it's just a few tweaks needed to fix those issues then 👍

@adamreisnz

adamreisnz commented Jun 9, 2017

Looks like in the latest version (just built Atom from master yesterday) there are still some scoring issues. For example this result:

image

It should not prioritise cards/club-details.js over cards/details.js for the same reason as above, where it shouldn't prioritise the edit-payment.html file. cards/details.js is a closer match, because it has fewer non-matching characters between the matches.

I did not type a c character and it already matched card, so it's a bit baffling why it tries to mark the c of club and give that result a higher score than the more sensible result below that.

Note that when I type cards it does prioritise correctly (but still marks the c in the second result):

image

I think once a search term has been used/matched in the path, it should not try to match it again for another part of the path. In addition, results with the least amount of non-matching characters between the matches should probably score highest.
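
To illustrate the second suggestion, here is a sketch that counts the non-matching characters inside the tightest span covering the whole query, and ranks by that count. The query "card details" is an assumption about what was typed; `min_gap` and `rank_by_gap` are hypothetical names, not anything in fuzzaldrin-plus.

```python
def min_gap(query: str, candidate: str):
    """Fewest non-matching characters inside any span that contains the
    query (spaces ignored) as a subsequence; None if there is no match."""
    q = query.lower().replace(" ", "")
    c = candidate.lower()
    if not q:
        return 0
    best = None
    for start in range(len(c)):
        if c[start] != q[0]:
            continue
        pos = start
        for ch in q[1:]:
            pos = c.find(ch, pos + 1)
            if pos == -1:
                # If matching fails from here, it fails from any later start too.
                return best
        gap = (pos - start + 1) - len(q)
        if best is None or gap < best:
            best = gap
    return best


def rank_by_gap(query: str, candidates: list[str]) -> list[str]:
    matched = [(min_gap(query, c), c) for c in candidates]
    return [c for gap, c in sorted(m for m in matched if m[0] is not None)]


print(rank_by_gap("card details", ["cards/club-details.js", "cards/details.js"]))
```

Under this measure cards/details.js has a gap of 2 and cards/club-details.js a gap of 7, so the shorter, closer match comes first.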

@adamreisnz

adamreisnz commented Aug 4, 2017

Another example in Atom 1.20 dev where prioritisation is not what one would expect:

image

@adamreisnz

Guys, any activity on this issue please? It's infuriating to keep opening the wrong files because the fuzzy finder sorting logic is off.

VSCode manages to do it correctly, why not Atom? Perhaps it would be worthwhile looking at their algorithm.

image

image

@winstliu
Contributor

@adamreisnz looks like this was fixed a month ago by @jeancroy but we're running an outdated version of fuzzaldrin-plus. Will create a PR.
