New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Matching algorithm #320
Comments
To reduce the number of unnatural results, starting with next release, an acronym will match consecutive words only. For instance, search string |
Awesome, I'd bet that would significantly reduce the problem I've
experienced in this regard.
Does KP pay attention to capital letter input vs capital letter match?
IE: if I put in KP and I have two items `Keypirinha is a Launcher` and
`kpass.exe` I would think that 'KP' should prioritize the former and
'kp' the latter.
|
This is subject to debate but I would tend to disagree on that. For instance, you may expect different results for different plugins. However, regarding acronyms, they are currently matched only on capital letters, regardless of the case used in the search string. So that |
Ahh, I miswrote what I meant. I meant So given these two items:
I would rank them (personally, for this trite example) as:
I use an IDE called PhpStorm which uses this to great effect, basically capitals matching capitals at \b boundaries or lowercase/Uppercase boundary (as in DependencyInjectionContainer) gives higher weight to capital acronym matches, but otherwise slightly greater weight to consecutive letter matches. I don't know if you've ever used fzf but the fuzzy matching algorithm there is documented pretty clearly in comments, at least the thoughts behind them are pretty well. Incedentally I had worked on my own JS version of the fzf matching using purely regular expressions. I hadn't gotten to the point of testing it with large datasets (besides for speed). Anyways, just some food for thought. |
Case-sensitive search has been considered very early during KP's development and is even implemented as an option in some search-related functions internally, including the fuzzy matching algorithm, which also supports character mismatch as an option (permanently disabled). But in the context of Keypirinha, I'm not quite sure this would be of any benefit to somehow "require" user to enable capital letters in order to enforce a particular type of search that did not give the expected result in the first place using case-insensitive search. Especially once #221 will be implemented, which allows greater benefit at the cost of taking little time (once) to teach Keypirinha how you want things to be found. Perhaps case-sensitive search could be added as an option (disabled by default). But this still implies some changes internally that I'm quite reluctant to make ATM in addition to the fact that you are the first and only user to ever mention it at all. I do not wish to close the door to it though. Thanks for taking the time to elaborate on your use-case. |
no problem. I have quite a few things indexed, finding them is actually more difficult than I would expect. Being a CLI person I'm used to typing the first few letters and tab to auto-complete. I think the big problem I'm running into is that consecutive letter matches are just not ranked as highly as acronym finding. Also, you're right about #221 that would take care of most of my needs. Reading early on in the docs that it functions that way just meant I had to find it manually the first time. Sometimes it's not even in the 1st 200 results though (but rarely). |
I hear you. The aforementioned modification related to acronym matching might be useful in your case. On the other hand it's not trivial to find the right balance in score weighting so that it feels right to most use cases. Every time a user opened an issue here to indicate the search algorithm does not work as expected, their use case was different. I hope #221 will make at least some users happier. |
If it's possible, perhaps exposing those weights into the configuration would let some of us tweak it to the desired levels. Just a thought. Thanks again, I nominated Keypirinha to Life Hackers best app launcher yesterday. They had rescinded Executors #1 status and replaced it with Launchy but didn't even have Keypirinha mentioned. I was quite surprised. |
This leads to new questions like how to categorize weights and how to apply them. For instance, would you categorize per category of item (bookmark, URL, Calc result, ...) ? Type of file? Both? ... In that regard, Keypirinha is "item-type agnostic" in the sense that new item categories can be created via the plugin API, which means the user would have to acquire knowledge of the categories created by some plugins in order to have good weight balance in their final config. That's really a lot of time spent on configuring Keypirinha for the user for such little benefits IMHO. In addition to that, users would still pop up here for exactly the same issue (i.e. search algo gives unexpected results in some specific/corner cases). Probably because at some point, they may want to have those weight to vary for a particular kind of search. All in all I do not believe it is a good solution and to be honest it's been a long while this feature has been removed from project's to-do list.
Thanks for the nice feedback and for encouraging the project :) |
Definitely, I was going to get involved in a new JS launcher gaining popularity named 'hain' but the original developer didn't have the time to work on it and though he gave me access to work on it, I wasn't interested in taking it on by myself.
I would expose it as it is presently used/written. The values must be hard-coded now, no? I'm, making the assumption that the ItemCatalog is searched by a central matching algorithm, except in special cases where a 'live search' is being performed (ala WebSuggest).
Are you saying that each category has its own levers for weighting?
If 90% of the people are happy with it as is, then I wouldn't even publicize it much. For those that do ask, tell them how to get to the config for that. A thought I just had, that may be kindof fun (at least I think it would be), could be using the users interaction between their search and their selection to auto-adjust the weights. So if I type in ch and scroll down 50 items and launch something, the delta between the 'top hit' and the 'user hit' could be looked at and adjusted. Perhaps even using some standard ML library would be enough, dunno. Wish I could help you with the coding effort as I'm proficient in c-c#, JS, PHP, Python and some (keep it 10 feet away, please, Java), most of the K&R languages in general. |
Sorry for the lag. There are nice features in hain like this double pane thing to have content preview. I had several high CPU and memory usage issues with it though... If I may, why did choose not to take on its dev?
That is one of the questions raised by the idea of having such a feature implemented. But again, I'm not particularly fond of the idea itself which, I think, just artificially compensates a flaw in the search algorithm.
I tried something very similar - at least it sounds like it - during early stage of KP's development. While it seemed to be an improvement during the first quick tests, which were successful, it eventually shown itself to produce pretty weird results on the long run. Even by tweaking things a little to fit my own use case, at some point there were items being pushed up out of the blue. Bottom line is that it worked just fine technically-wise, but the algo felt incomplete/messy to the end-user. Unusable as-is.
Sincerely appreciated, thanks! KP's application (as opposed to its package) is written in C++ but is not open source ATM. |
Long story, I was very excited about it in the beginning, having written a pretty sophisticated plugin for it, submitted a few patches. Ultimately it was formed into an organization and I was made an administrator. Unfortunately, the original creator Heejin didn't have any time for it and I didn't have time to take on the project by myself. After some time of a few PR's I reviewed/merged a few developers expressed interest. I had suggested I was pretty familiar with the base code and re-writing it might be the best course, Heejin himself agreed as well. I ended up forming a new org with the new members and worked on some important sections of code for the new project (the primary of which was the matching algorithm). Sometime later, Heejin found time to work on it and afaik, it's been evolving here and there. I don't actively use it any longer now that I've found KP, it replaced both executor and hain for me. I built the fuzzy matching algorithm in JavaScript based on the amazing junegunn/fzf project. In particular his algo.go file has explicit detail on the levers, reasons for them, etc with a V1/V2 comparison. A good read none-the-less. My JS based version is here: regex_fzf.js though it's not battle-tested. My next step was to build a minimal-GUI to see a live-listed ranking but I haven't gotten back to that just yet. It may be a bit tough to follow as-is, it was a test-bed of sorts but leveraged the speed of native regex to score 100k lists in <100ms.
This would be the fzf/regex_fzf versions I was talking about, though they are not exposed to the end-user, they are levers none-the-less. See this line 68 to see the 'levers' I'm referring to. Mind you, these were not written of my own design entirely. If I were to expose algorithm controls to the end user they would be simpler indicators that affected these internal levers.
Perhaps you should try out fzf and see what you think, it's immensely useful anyways if your a keyboard jockey, available for windows/*nix and macOS probably. Dissecting and understanding the fzf algorithm was interesting to me. My own implementation of it was a foray into an area I had interest in anyways.
Perhaps there may be a way to expose a pluggable search algorithm in binary form for KP such that an intrepid OCD person such as myself could write an alternate pluggable algorithm. Perhaps it could just be a python plugin, I've often found Python/JS are plenty fast even for demanding 100k scoring algorithms. |
I'm closing this since v2.20 has been released and acronyms are now matched on consecutive words only, as described above. This does not mean I'm closing the door to this conversation, I'll get back later to it! |
@cpriest noticed that KP's matching algorithm produces unnatural results in some cases.
For instance, item named
RHI3 - Click (With Valid Hash) (Bookmark)
gets a better score thanGoogle Chrome Dev
when searching forch
.Ref: initial chat
The text was updated successfully, but these errors were encountered: