Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fuzzy match for typoes #34

Closed
danielb2 opened this issue Mar 14, 2020 · 16 comments
Closed

fuzzy match for typoes #34

danielb2 opened this issue Mar 14, 2020 · 16 comments

Comments

@danielb2
Copy link

Hi, I'm coming from using autojump after being recommended this from a coworker. I love the blazing speed of zoxide.

Below are some examples for fuzzy matching.

Screen Shot 2020-03-14 at 15 43 13

( Note: you can see the echo that's mentioned in #22 )

I also miss jo (open matching directory. for mac this is something like open (zoxide query $argv) )

@ajeetdsouza
Copy link
Owner

@cole-h and I did some work on this, and it seems like this feature might take a while, since none of the existing solutions for Rust seem to suit our purpose.

@bbqsrc
Copy link

bbqsrc commented May 30, 2020

Have you tried https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance ? It considers transpositions. The strsim crate has a lot of useful stuff for this. :)

@ajeetdsouza
Copy link
Owner

@bbqsrc I have tried Levenshtein distance, but I haven't yet found a satisfactory way to apply it. The major problem is that the number of characters in a query is typically 3-4, so even one edit can significantly change the word. Moreover, there can be multiple words in a query which have to match the path in order. Allowing all of them to be fuzzy can sometimes significantly alter results.

I do have a rough algorithm in mind, but I'd gladly appreciate any inputs!

@bbqsrc
Copy link

bbqsrc commented May 31, 2020

Note that Damerau-Levenshtein is a different algorithm derived from Levenshtein. You can introduce your own penalty weighting based on the distance of the result to device whether a 'typo' is sufficient to be considered the winner. You can also pass the decision to the user.

I don't really have a convenient answer for your use case, unfortunately. But I'd be happy to provide feedback on any proposed algorithm, particularly if you can provide example cases of what you would hope the behaviour to be. My day to day work is basically in supporting linguistic tooling so I hope i can be of some use. xD

@ajeetdsouza
Copy link
Owner

@bbqsrc Thanks, that would be great!

Regarding strsim, I don't think that fits my use case, since I'm trying to match a substring within the path, and I would require the last index of the match. I'll do some searching.

I have a hectic week ahead, but I'll get back to you with something concrete as soon as I can.

@NightMachinery
Copy link

I like a fuzzy mode like fzf. For example, js currently returns no result for me, while I have javascript in my zoxide db.

See this for turning the text into a fuzzy search and this for sorting them intelligently.

@ajeetdsouza
Copy link
Owner

@NightMachinary, I'm not sure that is the kind of fuzzy matching we're looking for. I think this issue about fuzziness as a solution to minor spelling errors, where the correct spelling is one or two characters off the query.

fzf expects each of the input characters to be present in the output in order (no errors), but allows an arbitrary number of characters in between each matching character. This works well for fzf as a selector, but I have my doubts about how well it will work for something like zoxide. According to me, there'd be a large number of paths in a typical database containing a j followed by an s, making such a search unreliable.

All the same, you'd get similar results if you tried something like z j s (with a space in between).

@NightMachinery
Copy link

@ajeetdsouza

I think that if you add a penalty to their score for drifting edit distances, things will be fine even if the fuzziness leads to lots of matches? (Also, on my system, I currently have no match for js at all. It seems to me that the assumption of big path DBs might not be all that true in lots of cases.)

Another interesting thing is to support a query like sc/ja for ~/scripts/javascript.

@ajeetdsouza
Copy link
Owner

@NightMachinary fzf already does add a penalty to edit distance, but still I doubt it would work well outside of this isolated case. However, since zoxide matches each keyword in order in the path, matching against sc/ja is already quite easy:

z sc ja

@ajeetdsouza
Copy link
Owner

Closing this - zoxide queries are almost always just a few characters long, and fuzzy matching would only lower their accuracy at this point. It also makes queries annoyingly unpredictable in the case where it's wrong.

@bjesus
Copy link

bjesus commented Apr 22, 2022

Sorry to bring this up again - but just raising the voice of us that uses zoxide differently. I'm never typing down to get to downloads, but I'm more likely to type downlad or something like that. I just don't think like that: if I want to go to ABC I type ABC, not AB, but I might make some mistakes. My queries aren't just a few characters long and could very much benefit from such an algorithm - but perhaps it's just me 🤔

@not-matthias
Copy link

not-matthias commented Jun 11, 2022

I'd also love such a feature. Especially because it's kind of annoying to rewrite the entire command just because of one small typo. You could only use fuzzy matching when there are no results.

@ThatXliner
Copy link

Maybe it could be configurable?

You could only use fuzzy matching when there are no results.

Also this

@danielb2
Copy link
Author

@ajeetdsouza I decided to check in again. I have some really long directory names, and the whole point of a directory jumper is being lazy and not typing in a full path. I don't really get the logic. Can you reconsider?

@AdrianArtiles
Copy link

since this is the top search result for zoxide fuzzy search I thought I'd share what I do for my interactive workflow (the main way I use zoxide) in case it is helpful for others.
I do one of two things:

  1. use zi and have a space between my characters that I want to fuzzy search. like zi ar doc to match /Archives/misc/documentation. I mainly do this in interactive mode after calling zi.
  2. use my custom zf command below that I've added to .zshrc. This is basically just using fzf but ordered by zoxide's algorithm. I use eza for directory preview, but it should be clear how to customize how you're calling fzf.
zf () {
  cd $(zoxide query --list --score | fzf --height 40% --layout reverse --info inline --border --preview "eza --all --group-directories-first --header --long --no-user --no-permissions --color=always {2}" --no-sort | awk '{print $2}')
}

@noelzubin
Copy link

noelzubin commented Jul 2, 2024

Can we do a search with levensteins distance with threshold incase when no results are found. This should keep zoxide as fast as it is now given a match exists. If not it can try and pick the closest one. Having used autojump all this time and Given how often i make typos this feature is indispensable. After all even if it slower than other algos, its still faster than me going back and editing my shell command. @ajeetdsouza

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants