Smart case search #224

Open
fenuks opened this issue Jun 6, 2021 · 13 comments
Labels
feature Request for a new feature.

Comments

@fenuks

fenuks commented Jun 6, 2021

Hello,
Vim has a very handy option, smartcase, that makes a search case-sensitive if the query contains any capital letter and case-insensitive otherwise. I have a directory called V, and if I type z V I am moved somewhere else, e.g. into the ~/.config/nvim directory. I have to use the longer z parent/V to give z a hint. Therefore, I think it makes sense for z to support smart case as well; it would make the tool more comfortable to use.
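
A minimal sketch of the smart-case rule described above. It assumes plain substring matching, which is a simplification and not zoxide's actual matcher; `smart_case_match` and the example paths are hypothetical.

```python
def smart_case_match(query: str, path: str) -> bool:
    """Return True if `query` occurs in `path` under smart-case rules."""
    if any(ch.isupper() for ch in query):
        # Any uppercase letter in the query makes the comparison case-sensitive.
        return query in path
    # An all-lowercase query is compared case-insensitively.
    return query.casefold() in path.casefold()


# Hypothetical paths, chosen to mirror the `z V` example.
print(smart_case_match("V", "/home/user/projects/V"))
print(smart_case_match("V", "/home/user/.config/nvim"))
print(smart_case_match("v", "/home/user/.config/nvim"))
```

Under this rule `z V` would no longer land in `~/.config/nvim`, while `z v` would still be free to match it.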

@ajeetdsouza
Owner

ajeetdsouza commented Jul 13, 2021

One question here is -- should z doc Foo match ~/Documents/Foo? I'd think not -- either the whole query should be smartcase or none of it should be.


I also think normalizing diacritics (café -> cafe) would be great. Possible queries should be:

Accent normalization:

  • z cafe matches ~/Pictures/Café
  • z café does not match ~/Pictures/Cafe

Case normalization:

  • z cafe matches ~/Pictures/Cafe
  • z Cafe does not match ~/Pictures/cafe
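
The four example queries above can be sketched as follows, treating accents the same way smart case treats capitals: a query without accents ignores them, while a query with accents must match them exactly. This is only an illustration using Python's standard `unicodedata` module; `strip_accents`, `has_accents`, and `smart_match` are hypothetical names, and substring matching is a simplification.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose characters and drop combining marks (e.g. é -> e)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def has_accents(text: str) -> bool:
    """Return True if the text contains any combining mark once decomposed."""
    decomposed = unicodedata.normalize("NFD", text)
    return any(unicodedata.combining(ch) for ch in decomposed)


def smart_match(query: str, path: str) -> bool:
    query = unicodedata.normalize("NFC", query)
    path = unicodedata.normalize("NFC", path)
    if not has_accents(query):
        # No accents typed: compare accent-insensitively.
        query, path = strip_accents(query), strip_accents(path)
    if not any(ch.isupper() for ch in query):
        # No capitals typed: compare case-insensitively.
        query, path = query.casefold(), path.casefold()
    return query in path


print(smart_match("cafe", "~/Pictures/Café"))
print(smart_match("café", "~/Pictures/Cafe"))
print(smart_match("cafe", "~/Pictures/Cafe"))
print(smart_match("Cafe", "~/Pictures/cafe"))
```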

@fenuks
Author

fenuks commented Jul 14, 2021

> One question here is -- should z doc Foo match ~/Documents/Foo? I'd think not -- either the whole query should be smartcase or none of it should be.

I agree, it should be all or nothing. Or perhaps there should be two types of switches: one that applies case-sensitive search globally if at least one word in the search query contains a capital letter, and another that infers smart case for each word of the query individually.
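
The two policies can be compared with a small sketch. It assumes each keyword is a plain substring test and ignores keyword order, which is a simplification; `match_global` and `match_per_word` are hypothetical names.

```python
def is_sensitive(word: str) -> bool:
    """A word requests case-sensitive matching if it contains a capital."""
    return any(ch.isupper() for ch in word)


def contains(word: str, path: str, sensitive: bool) -> bool:
    if sensitive:
        return word in path
    return word.casefold() in path.casefold()


def match_global(words: list[str], path: str) -> bool:
    """All or nothing: one capital anywhere makes every word case-sensitive."""
    sensitive = any(is_sensitive(word) for word in words)
    return all(contains(word, path, sensitive) for word in words)


def match_per_word(words: list[str], path: str) -> bool:
    """Each word decides its own case sensitivity."""
    return all(contains(word, path, is_sensitive(word)) for word in words)


query = ["doc", "Foo"]
print(match_global(query, "~/Documents/Foo"))
print(match_per_word(query, "~/Documents/Foo"))
```

With the global policy, `z doc Foo` does not match `~/Documents/Foo` because `doc` is compared case-sensitively against `Documents`; with the per-word policy it does match.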

> I also think accent normalization (café -> cafe) would be great. Possible queries should be:
>
> Accent normalization:
>
> * `z cafe` matches `~/Pictures/Café`
> * `z café` does not match `~/Pictures/Cafe`
>
> Case normalization:
>
> * `z cafe` matches `~/Pictures/Cafe`
> * `z Cafe` does not match `~/Pictures/cafe`

That would be great to have as well; I happen to speak a language with diacritics. But if I had to choose only one of the two, it would be smart case. ;)

@kidonng
Contributor

kidonng commented Aug 5, 2021

Somewhat related: #114

As for me, I would like zoxide to match case-insensitively all the time, with an option to enable case-sensitive matching. Smart case seems overkill.

@ajeetdsouza
Owner

@kidonng I'm curious as to why you'd want to disable smartcase matching. I wouldn't expect anyone to use uppercase in a query unless they were hoping for results with the same uppercase letters in them.

@PurpleMyst

I'm willing to work on this, if that's ok! I'll be trying to send in a PR by tomorrow.

@ajeetdsouza
Owner

@PurpleMyst there's already a pending PR on improving search which will very likely conflict with this. I haven't really had time to look into it yet, but for now, I'd recommend against creating a separate PR.

@lefth

lefth commented Nov 26, 2021

> @kidonng I'm curious as to why you'd want to disable smartcase matching. I wouldn't expect anyone to use uppercase in a query unless they were hoping for results with the same uppercase letters in them.

I seldom disable smart case in any program, but I sometimes want a truly case-insensitive search when I search for copied text that contains capitals. For example, if I want to search for "armv7_unknown" but copied the text from: CARGO_TARGET_ARMV7_UNKNOWN_LINUX_MUSLEABIHF_LINKER=arm-linux-musleabihf-gcc.

And strict case sensitivity (no smart case) is useful when there are many uppercase strings around but you want to find a lowercase one.

I suggest the option to disable smart case can be deferred until someone insistently asks for it, since (though it's important) it's so uncommonly used. It may never be an issue in zoxide since zoxide's use case is partly to avoid copying long strings. And because zoxide works on paths rather than codebases, there will be less collision between different cases. Paths don't have the same case convention issues as code (different kinds of tokens having different capitalization).
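
For reference, the three behaviours discussed here (smart case, always insensitive, always sensitive) could sit behind a single switch. This is a sketch only; `CaseMode` and `contains` are hypothetical names and do not correspond to any existing zoxide option.

```python
from enum import Enum


class CaseMode(Enum):
    SMART = "smart"              # sensitive only if the query has a capital
    INSENSITIVE = "insensitive"  # always ignore case
    SENSITIVE = "sensitive"      # always respect case


def contains(query: str, text: str, mode: CaseMode = CaseMode.SMART) -> bool:
    if mode is CaseMode.SMART:
        # Resolve smart case to one of the two concrete modes.
        has_upper = any(ch.isupper() for ch in query)
        mode = CaseMode.SENSITIVE if has_upper else CaseMode.INSENSITIVE
    if mode is CaseMode.SENSITIVE:
        return query in text
    return query.casefold() in text.casefold()


# Copied uppercase text only finds a lowercase name when case is ignored.
print(contains("ARMV7_UNKNOWN", "target/armv7_unknown_linux", CaseMode.SMART))
print(contains("ARMV7_UNKNOWN", "target/armv7_unknown_linux", CaseMode.INSENSITIVE))
```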

lefth added a commit to lefth/zoxide that referenced this issue Nov 27, 2021
A future option should be to turn off smart case, but that's not a priority,
for the reasons mentioned here: ajeetdsouza#224
@lefth

lefth commented Nov 27, 2021

BTW, I just implemented smart case matching in my branch. If you want to try my version with smart case and new keyword-based scoring (I think these features will make it into the official version at some point), you can install it with: cargo install zoxide --git https://github.com/lefth/zoxide

ajeetdsouza added the feature label Jan 20, 2022
@dedebenui

Somewhat related to that, a lot of characters with diacritics have several representations, for example é can be U+0065 U+0301 or simply U+00E9. It would be nice if zoxide could merge those, perhaps by running everything (both queries and file/dir lists) through some normalization transformation. Right now, at least on macOS, if my folder is named café (U+0065 U+0301, the preferred encoding when renaming things in Finder) and I z café (U+00E9, what actually gets typed in my shell), I get no match.
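
The mismatch between the two representations, and the effect of normalizing both sides, can be reproduced with Python's standard `unicodedata` module. NFC is used here as one possible normalization form; which form zoxide should use is left open, and `normalize` is a hypothetical helper.

```python
import unicodedata

decomposed = "cafe\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "caf\u00e9"   # single code point U+00E9

# The two strings look identical on screen but compare as different.
print(decomposed == precomposed)


def normalize(text: str) -> str:
    """Bring text to a single canonical form before comparing."""
    return unicodedata.normalize("NFC", text)


# After normalizing both the query and the stored name, they compare equal.
print(normalize(decomposed) == normalize(precomposed))
```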

@etiennepellegrini

Was progress made on this, or on merging @lefth's branch into the project?
I would really enjoy having the smartcase ability (less interested in diacritics for now)

@lefth

lefth commented Apr 17, 2024

@etiennepellegrini I haven't pushed to integrate my change, largely because I don't have a solid framework for deciding which matches are best.

We could start that effort and gradually improve it by adding a test that lists a lot of filenames and several input strings and confirms that each best match is correct. But there are a lot of arbitrary decisions to be made. Is the input string "Key" a better match for "Keystore" or "key" (or ".keyStore", for that matter)? Is a full word match, full string match, or exact case match considered best?

How do other flexible database search engines rank matches? Is there a term of art that describes this ranking so I can search for more info?

@etiennepellegrini

Those are good questions -- I think your idea of figuring out a list of edge cases is a good step.
Many (hopefully most?) of these decisions may already have been made on other projects where smartcase is used.

I don't know enough about the internal workings of zoxide to really help, but here's the way I think about zoxide and how I'd answer the questions you're asking:

  • the input string is used to filter the database
  • the return value is the database entry with the highest score (so the "quality" of the match isn't really determined by the input string)
  • then, if the input string contains any capital letters, only match directories with the exact same case
    (I picture it as if the database was a text file with all directories sorted according to their score, and I was using vim to search for the input key, returning the first match)

So in this scenario, if you have .keyStore, .keystore, and .Keystore directories:

  • z Key would only match .Keystore
  • z key would match all three and return the highest rated one
  • z kEy doesn't match anything
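
The model above (filter with smart case, then return the highest-scored survivor) can be sketched like this. The scores and paths are made up for illustration, `best_match` is a hypothetical name, and substring matching stands in for zoxide's real matcher.

```python
from typing import Optional


def best_match(query: str, entries: dict[str, float]) -> Optional[str]:
    """Filter entries with smart case, then return the highest-scored path."""
    sensitive = any(ch.isupper() for ch in query)

    def matches(path: str) -> bool:
        if sensitive:
            return query in path
        return query.casefold() in path.casefold()

    candidates = [(path, score) for path, score in entries.items() if matches(path)]
    if not candidates:
        return None
    # The score alone decides among the entries that survive the filter.
    return max(candidates, key=lambda item: item[1])[0]


# Hypothetical database: path -> score.
entries = {
    "/home/user/.keyStore": 5.0,
    "/home/user/.keystore": 9.0,
    "/home/user/.Keystore": 2.0,
}
print(best_match("Key", entries))
print(best_match("key", entries))
print(best_match("kEy", entries))
```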

@lefth

lefth commented Apr 17, 2024

That is reasonable, especially as a first step. But the engineer in me says that, at a minimum, the logic should be able to distinguish among directories named "Audio", "Audiobooks", and "Books". (Also, if that doesn't work, it's not even as capable as the proof-of-concept version in my branch.) So maybe we can start with a small test and a few heuristics and add more later if there's an issue.
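
One possible starting heuristic for the "Audio" / "Audiobooks" / "Books" case, paired with the kind of small test mentioned above. The tiers (exact name, then prefix, then substring) are an assumption made for illustration, not the logic in lefth's branch; `match_quality` and `pick_best` are hypothetical names.

```python
from pathlib import PurePosixPath
from typing import Optional


def match_quality(query: str, path: str) -> int:
    """Rank how well the query fits the last path component (0 = no match)."""
    name = PurePosixPath(path).name.casefold()
    needle = query.casefold()
    if name == needle:
        return 3  # exact name match
    if name.startswith(needle):
        return 2  # prefix match
    if needle in name:
        return 1  # match somewhere inside the name
    return 0


def pick_best(query: str, paths: list[str]) -> Optional[str]:
    ranked = [(match_quality(query, path), path) for path in paths]
    ranked = [item for item in ranked if item[0] > 0]
    return max(ranked)[1] if ranked else None


# A tiny table-driven check: (query, expected best match).
paths = ["/media/Audio", "/media/Audiobooks", "/media/Books"]
cases = [
    ("audio", "/media/Audio"),
    ("books", "/media/Books"),
    ("audiob", "/media/Audiobooks"),
]
for query, expected in cases:
    assert pick_best(query, paths) == expected
```

A real ranking would also have to weigh these tiers against the stored score and against case matches, which is exactly the set of arbitrary decisions raised earlier in the thread.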

7 participants