Unordered Tokens Search #378

Closed
Canop opened this issue Apr 30, 2021 · 18 comments

@Canop
Owner

Canop commented Apr 30, 2021

The goal is to be able to find paths like "/home/user/path2/subpath/Documents/"
by typing the "sub", "doc", and "2" discriminant tokens.

It would probably more naturally be applied to subpaths.

A syntax could probably be

up/sub,doc,2

with "up" the search mode selector, which the user would remap or remove at will (see https://dystroy.org/broot/conf_file/#search-modes)
and the comma separator could be changed to another character (with limited freedom here)
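
The matching goal described above can be sketched in a few lines of Rust. This is only a naive illustration, not broot's implementation: the function name `matches_unordered` is hypothetical, it only lowercases (no diacritics removal or Unicode normalization), and it does not prevent two tokens from matching the same characters.

```rust
/// Hypothetical helper: returns true when every token occurs somewhere
/// in `path`, in any order, ignoring case.
/// Naive on purpose: two tokens may match overlapping characters.
fn matches_unordered(path: &str, tokens: &[&str]) -> bool {
    let haystack = path.to_lowercase();
    tokens
        .iter()
        .all(|tok| haystack.contains(&tok.to_lowercase()))
}
```

With this sketch, `matches_unordered("/home/user/path2/subpath/Documents/", &["sub", "doc", "2"])` is true, whatever the order in which the three tokens are typed.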

@wolfisberg

If doable, I'd love for semicolon ; and / to be within that limited freedom!

@Canop
Owner Author

Canop commented Apr 30, 2021

The semicolon can replace the comma, but not the slash (already used as pattern delimiter).

@Canop Canop self-assigned this Apr 30, 2021
@Canop Canop added the enhancement New feature or request label Apr 30, 2021
@Canop
Owner Author

Canop commented Apr 30, 2021

A few notes.

Such a search would be very similar to the composite search

rp/sub/i&rp/doc/i&rp/2/i

but

  • nobody wants to type such a composite pattern on a regular basis
  • matching characters in a composite search can't highlight all the sub patterns
  • the "unordered tokens" search would ensure the matching parts aren't overlapping (and thus allowing proper highlighting)
  • the diacritics and case removals, as well as the unicode normalization in use in fuzzy patterns would be applied there too
  • a version with a low level of fuzziness might be useful (to be tested)
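
To illustrate the non-overlapping constraint from the list above, here is a minimal backtracking sketch. It is an assumption about how such a search could work, not the code in broot: `place_tokens` and `place_from` are hypothetical names, the haystack and tokens are assumed to be already normalized (lowercased, diacritics removed), and tokens are assumed to be non-empty.

```rust
use std::ops::Range;

/// Hypothetical helper: tries to give every token its own position in
/// `haystack` so that no two matches overlap. Returns one byte range per
/// token (in token order), which is what highlighting would need.
fn place_tokens(haystack: &str, tokens: &[&str]) -> Option<Vec<Range<usize>>> {
    let mut taken = Vec::with_capacity(tokens.len());
    if place_from(haystack, tokens, &mut taken) {
        Some(taken)
    } else {
        None
    }
}

/// Places the first token, then recurses on the others; backtracks when
/// a choice leaves no room for a later token.
fn place_from(haystack: &str, tokens: &[&str], taken: &mut Vec<Range<usize>>) -> bool {
    let Some((tok, rest)) = tokens.split_first() else {
        return true; // every token has been placed
    };
    let mut from = 0;
    while from < haystack.len() {
        let Some(pos) = haystack[from..].find(*tok) else {
            break;
        };
        let start = from + pos;
        let range = start..start + tok.len();
        // The next attempt starts one character later, so that every
        // occurrence of the token is considered.
        from = start + haystack[start..].chars().next().map_or(1, char::len_utf8);
        let free = taken
            .iter()
            .all(|r| range.end <= r.start || r.end <= range.start);
        if free {
            taken.push(range);
            if place_from(haystack, rest, taken) {
                return true;
            }
            taken.pop(); // undo and try the next occurrence
        }
    }
    false
}
```

For example, `place_tokens("abc", &["ab", "bc"])` returns `None` because both tokens need the `b`, while a plain "contains" test on each token would accept the path.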

@wolfisberg

wolfisberg commented Apr 30, 2021

That's a good point:

> the diacritics and case removals, as well as the unicode normalization in use in fuzzy patterns would be applied there too

I'd prefer it to be either case-insensitive or configurable.

Also, is the composite search unordered?

@Canop
Owner Author

Canop commented Apr 30, 2021

Composite search is unordered, yes. You can have parts apply to the name, the subpath, the file content, and the only impact of the order is the time it takes to do the search.
See https://dystroy.org/broot/input/#combining-filtering-patterns

@Canop
Owner Author

Canop commented May 4, 2021

The "unordered-tokens" branch contains a first implementation.

You can search on names with nt/, on paths with pt/ or just t/ (of course you can define this as the default).


There's no scoring function for now: either it matches or it doesn't. I'm not sure whether I'll add a real one (maybe just the total length as a penalty).
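
A length-based score of the kind hinted at here could look like the sketch below. Everything in it is an assumption made for illustration: the `score` function, the `BASE_SCORE` constant and its value are hypothetical and are not taken from broot.

```rust
/// Hypothetical constant: any value larger than realistic path lengths works.
const BASE_SCORE: i32 = 10_000;

/// Hypothetical scoring: `None` when a token is missing, otherwise a score
/// that only decreases with the total length of the candidate.
fn score(candidate: &str, tokens: &[&str]) -> Option<i32> {
    let haystack = candidate.to_lowercase();
    let all_found = tokens
        .iter()
        .all(|tok| haystack.contains(&tok.to_lowercase()));
    if !all_found {
        return None;
    }
    // The penalty is the number of characters of the candidate.
    let penalty = candidate.chars().count() as i32;
    Some(BASE_SCORE - penalty)
}
```

With such a score, shorter matching paths rank first, and no position needs to be compared with another one, which keeps the cost close to that of a plain match.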

Remaining tasks:

  • allow both ',' and ';' as separator (only works with the comma right now)
  • scoring (I don't know whether I'll try something sophisticated and slow or not)
  • bench and probably perf improvements
  • web documentation

Feedback very welcome

@Canop
Owner Author

Canop commented May 5, 2021

It's now on master

@wolfisberg

Man, you move fast. I'll gladly test it, but I won't get around to it before Saturday.

@wolfisberg

wolfisberg commented May 9, 2021

I've been testing the feature for a while now; here's my feedback.

$ br --version
broot 1.3.2-dev
  1. t/ works great, very much the way I envisioned it!
  2. It seems to behave case-insensitively (which I prefer over case-sensitive).
  3. Concerning "scoring (I don't know whether I'll try something sophisticated and slow or not)": I'm not sure any more ranking/scoring magic is actually required. The beauty of the unordered token search is that you can always append something to further narrow the match list. I'd probably not notice a more sophisticated scoring mechanism.
  4. The ? help menu seems a little broken for me. The word(s) [unordered] token are not displayed; instead it shows ???:

     prefix search
     ... ...
     rp/ regex search on sub path
     t/ ??? search on sub path

     (The same goes for the np/ | ??? search on file name entry, btw.)
  5. Because of the ???, it is unclear how to configure the searches in the config file. I assume the word that is masked by the question marks (token) is the config value as well?
  6. It's also not entirely clear from the help menu in what ways you can combine the search. Is it sub path only? Can it be a file name search as well? The wiki does not provide further information either.

Seems like a great experience so far, thanks for implementing it!

Cheers.

@Canop
Owner Author

Canop commented May 9, 2021

3: I'll start without more precise ranking because it's very expensive (if you want to rank, you have to test all possible positions)

4: Thanks for noticing this bug. I'm fixing it.

5: Yes

6: Yes, you can search on names with tokens, but it doesn't seem very useful. I could also very simply add content search on tokens, but I don't see the point.

@wolfisberg

Cool, thanks for elaborating.

One more thing: is the token delimiter ',' configurable? The comma is not that bad, although I'd prefer more options (especially the semicolon ';').

@Canop
Owner Author

Canop commented May 11, 2021

I finally decided against making this configurable, and I think I had a better idea: you can use both the comma and the semicolon and the first found is the separator.

Combined with filtering out empty tokens, this makes it possible to have one of them inside your tokens.

For example, when searching "a;b,b2;c", the tokens are "a", "b,b2", and "c".

When searching ",a;a2,b", the tokens are "a;a2" and "b".
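
The rule described in this comment (the first comma or semicolon found is the separator, and empty tokens are dropped) can be sketched as follows. The function name `split_tokens` is hypothetical, and this is an illustration of the rule as stated here, not the code in broot.

```rust
/// Hypothetical helper: splits a raw pattern into tokens. The first ','
/// or ';' found in the pattern becomes the only separator, and empty
/// tokens are filtered out.
fn split_tokens(raw: &str) -> Vec<&str> {
    match raw.chars().find(|c| *c == ',' || *c == ';') {
        Some(sep) => raw.split(sep).filter(|tok| !tok.is_empty()).collect(),
        // No separator at all: the whole pattern is a single token.
        None if raw.is_empty() => Vec::new(),
        None => vec![raw],
    }
}
```

Starting the pattern with the separator you don't need in your tokens, as in the second example, is what lets the other character appear inside a token.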

@wolfisberg

> I finally decided against making this configurable, and I think I had a better idea: you can use both the comma and the semicolon and the first found is the separator.

Works for me.

One more thing: I just tested the configuration, and the following config does not seem to work for me. The default is not the token search.

        search_modes: {
            <empty>:    token path
            e/:         exact path
            r/:         regex path
            f/:         fuzzy path

            en/:        exact name
            rn/:        regex name
            fn/:        fuzzy name

            ec/:        exact content
            rc/:        regex content
        }

@Canop
Owner Author

Canop commented May 12, 2021

It's "tokens", not "token".

I'll change this so that there's an error on launch when a search mode isn't understood.
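
A strict parsing of the search mode values, failing instead of silently ignoring an unknown word, might look like the sketch below. The `Kind` and `Target` enums, the `parse_search_mode` function and the fixed "kind then target" word order are assumptions based on the config quoted in this thread, not broot's actual types or rules.

```rust
#[derive(Debug, PartialEq)]
enum Kind {
    Exact,
    Fuzzy,
    Regex,
    Tokens,
}

#[derive(Debug, PartialEq)]
enum Target {
    Name,
    Path,
    Content,
}

/// Hypothetical parser for values such as "tokens path" or "regex content".
/// Anything not understood is reported as an error.
fn parse_search_mode(value: &str) -> Result<(Kind, Target), String> {
    let mut words = value.split_whitespace();
    let (Some(kind), Some(target), None) = (words.next(), words.next(), words.next()) else {
        return Err(format!("expected \"<kind> <target>\", got {value:?}"));
    };
    let kind = match kind {
        "exact" => Kind::Exact,
        "fuzzy" => Kind::Fuzzy,
        "regex" => Kind::Regex,
        "tokens" => Kind::Tokens,
        other => return Err(format!("unknown search kind {other:?}")),
    };
    let target = match target {
        "name" => Target::Name,
        "path" => Target::Path,
        "content" => Target::Content,
        other => return Err(format!("unknown search target {other:?}")),
    };
    Ok((kind, target))
}
```

With this sketch, "tokens path" parses while "token path" is rejected with a message naming the unknown word, which would have made the typo above visible at launch.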

@wolfisberg

> It's "tokens", not "token".

Thank you, tokens works. Is this documented somewhere?

Another thing I noticed:
The search seems to slow down with each added token. If I know what I'm looking for, two tokens are usually enough to find it and broot performs pretty well. But when I don't really know what I'm looking for (basically using br more as a search tool than as a navigation/file manager) and I enter a third or even fourth token, the search slows down significantly for some reason. I'd imagine adding a third token should actually be much faster than adding the first, because the base list to match against is already smaller. This is usually bearable (<1s) if I start from $HOME. When starting from root /, oddly, the search results for the first token are presented almost immediately, but adding a second token takes upwards of 1 second to complete and adding a third upwards of 3 seconds.

@Canop
Owner Author

Canop commented May 12, 2021

It's documented here: https://dystroy.org/broot/conf_file/#search-modes
(I just updated that doc)

@Canop
Owner Author

Canop commented May 12, 2021

related: 872c9cd

@Canop
Owner Author

Canop commented May 19, 2021

Closing the issue.
If something isn't quite OK, come to the chat or open a new one.

@Canop Canop closed this as completed May 19, 2021