90% of the items returned by Keypirinha are irrelevant #158

Closed
robinhoodhimself opened this Issue Feb 7, 2017 · 4 comments

robinhoodhimself commented Feb 7, 2017

I have been using the application for a few months now. I find it really powerful and technically cleanly written. I like the text configuration file (it reminds me of Windows 3.x).
My problem is that I don't understand why more than 90% of the items returned are not relevant. It could be that some settings are incorrectly configured.

I'm using 2.11 with all the packages on Windows 7 64-bit.

For example, I type "Rogue" and Keypirinha returns 55 results, of which only one is relevant. I don't understand why Keypirinha returns the other 54 results. Only the first one should be returned.

The picture shows the first 11 results returned. Only the first result is relevant.

image

Here are the item properties for the second item returned. I don't understand why this item is returned at all.
Do you see "Rogue" anywhere here?

20:21:36.216 Item properties:
label: Bookmark: Vastgoed / Immobilier ERA Belgie / Belgique - Immobilien makelaar in onroerend goed. Huis te koop, appartement te huur. Agence bien immo . Maison a vendre , appartement á louer. (Firefox)
args:
short_desc:
target: http://www.era.be/splash.htm
category: url (40)
args_hint: forbidden
hit_hint: noargs
plugin: Bookmarks.Bookmarks
item_id: 2301936377849473762
loop_on_suggest: false
data_bag:

I look forward to your answer.
The console log follows.
20:47:40.970 Keypirinha 2.11 (66a575a) for x64
20:47:40.970 Installed mode.
20:47:40.970 System: WinNT-x64 6.1.7601-ws-0x0100 SP1.0
20:47:40.970 Official packages: C:\Program Files\Keypirinha\default\Packages
20:47:40.970 Profile dir: C:\Users\robin\AppData\Roaming\Keypirinha
20:47:40.970 Local dir: C:\Users\robin\AppData\Local\Keypirinha
20:47:40.970 Keyboard layout: 0000080C
20:47:40.970 Monitor #1: Name[\.\DISPLAY1] Rect[0, 0, 1920, 1200] DpiScale[1.00] PRIMARY
20:47:41.064 Python 3.5.2 (default, Jul 1 2016, 10:33:08) [MSC v.1800 64 bit (AMD64)]
20:47:42.499 Loaded 1267 items from history file.
20:47:42.515 Loaded 394 items from Catalog file.
20:47:42.546 Init done in 1.8408s, including 0.8137s (44%) for the Python interpreter.
20:47:42.546 Plugin loaded: Internal.About
20:47:42.593 Plugin loaded: Apps.CustomCmds (instance #1)
20:47:42.593 Plugin loaded: Apps.Desktop (instance #1)
20:47:42.593 Plugin loaded: Apps.EnvPath (instance #1)
20:47:42.593 Plugin loaded: Apps.ExtraPaths (instance #1)
20:47:42.593 Plugin loaded: Apps.StartMenu (instance #1)
20:47:42.671 Plugin loaded: Bookmarks.Bookmarks (instance #1)
20:47:42.686 Plugin loaded: Calc.Calc (instance #1)
20:47:42.686 Plugin loaded: ControlPanel.ControlPanel (instance #1)
20:47:42.686 Plugin loaded: Everything.Everything (instance #1)
20:47:42.702 Plugin loaded: FileBrowser.FileBrowser (instance #1)
20:47:42.749 Plugin loaded: FileZilla.FileZilla (instance #1)
20:47:42.764 Plugin loaded: GoogleTranslate.GoogleTranslate (instance #1)
20:47:42.764 Plugin loaded: PuTTY.PuTTY (instance #1)
20:47:42.764 Plugin loaded: RegBrowser.RegBrowser (instance #1)
20:47:42.780 Plugin loaded: SystemCommands.SystemCommands (instance #1)
20:47:42.780 Plugin loaded: TaskSwitcher.TaskSwitcher (instance #1)
20:47:42.780 Plugin loaded: URL.URL (instance #1)
20:47:42.780 Plugin loaded: WebSearch.WebSearch (instance #1)
20:47:42.795 Python: Traceback (most recent call last):
20:47:42.795 Python: File "C:\Program Files\Keypirinha\default\Packages\WebSuggest.keypirinha-package\websuggest.py", line 15, in
20:47:42.795 Python: ImportError: cannot import name 'websuggest_user_parsers'
20:47:42.795 Plugin loaded: WebSuggest.WebSuggest (instance #1)
20:47:42.795 Plugin loaded: Winamp.Winamp (instance #1)
20:47:42.795 Plugin loaded: WinSCP.WinSCP (instance #1)
20:47:42.920 Bookmarks.Bookmarks: Referenced 0 bookmarks
20:47:43.014 Apps.EnvPath: Cataloged 740 items in 0.1 seconds

polyvertex (Member) commented Feb 7, 2017

tl;dr Keypirinha implements a permissive matching algorithm (i.e. fuzzy matching), which is why there are 54 more items in the list. They actually match the sequence rogue, since the only requirement for a match is that all the characters you type must be present in the item's title, in the same order.
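
The rule described above (every typed character present in the title, in the same order) is a subsequence test. Here is a minimal Python sketch of that rule, assuming a case-insensitive comparison. The function name `is_fuzzy_match` is hypothetical, and this is not Keypirinha's actual implementation:

```python
def is_fuzzy_match(query, title):
    """Return True if every character of query appears in title, in order.

    Illustrative only: a plain, case-insensitive subsequence test.
    """
    remaining = iter(title.lower())
    # `ch in remaining` consumes the iterator up to and including the first
    # occurrence of ch, so each typed character has to be found *after*
    # the previous one.
    return all(ch in remaining for ch in query.lower())


# Beginning of the bookmark label quoted in the original report.
label = "Bookmark: Vastgoed / Immobilier ERA Belgie / Belgique"

print(is_fuzzy_match("rogue", label))    # the label does contain r, o, g, u, e in order
print(is_fuzzy_match("ffx", "Firefox"))
print(is_fuzzy_match("xff", "Firefox"))  # same letters, wrong order
```

Under this rule the bookmark from the report does match rogue: the r of "Bookmark", the o of "Vastgoed", the g of "Belgie", then the u and the e of "Belgique".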

More details: With an exact matching algorithm, you would not be able to type ffx to match Firefox, for example; you would have to type fire or fox or any other sequence that exactly matches at least part of the name. In Keypirinha, however, the item Firefox gets a better score with the search term fire than with the search term ffx.
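
To illustrate why fire can outscore ffx against the same title, here is a toy scoring function. It is only a sketch: Keypirinha's real scoring is not described in this thread, and the name `toy_score` and the point values are assumptions chosen for illustration. It gives one point per matched character and a bonus when a match directly follows the previous one.

```python
def toy_score(query, title):
    """Toy fuzzy score: None if no match, else higher means a tighter match.

    Hypothetical scoring, not Keypirinha's: +1 per matched character,
    +2 extra when the match is adjacent to the previous matched character.
    """
    q, t = query.lower(), title.lower()
    score = 0
    pos = -1  # index in t of the previously matched character
    for ch in q:
        found = t.find(ch, pos + 1)  # leftmost occurrence after the last match
        if found == -1:
            return None  # a typed character is missing: not a match at all
        score += 1
        if pos >= 0 and found == pos + 1:
            score += 2  # contiguous with the previous match
        pos = found
    return score


print(toy_score("fire", "Firefox"))  # contiguous run, so it collects bonuses
print(toy_score("ffx", "Firefox"))   # scattered letters, no bonus
```

Both terms match, but the contiguous term ends up with the higher score, which is the behavior described above.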

All of this is then balanced by the scoring system, which tends to give more weight to items you have already launched, even when their match score is lower in some cases.
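
One way to picture this balancing is a final score that adds a usage bonus to the match score. The sketch below is an assumption made for illustration: the function `rank`, the `usage_weight` value, and the item titles are all hypothetical and do not come from Keypirinha.

```python
def rank(items, launch_counts, usage_weight=5):
    """Order (title, match_score) pairs by match score plus a usage bonus.

    Hypothetical formula: final = match_score + usage_weight * launches.
    """
    def final_score(item):
        title, match_score = item
        return match_score + usage_weight * launch_counts.get(title, 0)

    return sorted(items, key=final_score, reverse=True)


items = [("Item A", 10), ("Item B", 6)]

# Without history, the better textual match comes first.
print(rank(items, {}))
# After three launches of "Item B", its bonus outweighs the weaker match.
print(rank(items, {"Item B": 3}))
```

With such a formula a frequently launched item can overtake a better textual match, which is the trade-off the comment describes.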

Indeed, it is not an exact science, and this algorithm, like any other, has its drawbacks. It may happen, for example, that the scoring system gets too involved in its task and pushes items too high in the results list, which may require you to scroll down the list a little. This is where the implicit keyword association mechanism comes into play, so that it feels a bit more natural the next time you aim for the same item.
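
The thread does not explain how the keyword association is stored or applied, so the following is purely an illustrative sketch of the general idea: remember which item was picked for a typed term, and move it to the top the next time the same term is typed. The class name `KeywordAssociations` and its methods are hypothetical.

```python
class KeywordAssociations:
    """Hypothetical store mapping a typed term to the item last chosen for it."""

    def __init__(self):
        self._chosen = {}  # lowercased typed term -> title picked last time

    def remember(self, term, title):
        """Record that the user picked `title` after typing `term`."""
        self._chosen[term.lower()] = title

    def reorder(self, term, titles):
        """Return titles with the remembered choice (if listed) moved first."""
        preferred = self._chosen.get(term.lower())
        if preferred in titles:
            return [preferred] + [t for t in titles if t != preferred]
        return list(titles)


assoc = KeywordAssociations()
results = ["Item A", "Item B", "Item C"]

print(assoc.reorder("xyz", results))  # nothing remembered yet: order unchanged
assoc.remember("xyz", "Item C")       # the user scrolled down and picked Item C
print(assoc.reorder("xyz", results))  # next time, Item C comes first
```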

So in the end, yes, most of the items we see in your example do not seem relevant, yet the item likely to be the most relevant has been pushed to the top of the list.

If the results list feels too long and noisy, you may want to try the max_height setting ([gui] section), which helps reduce the "visual noise":

[gui]
max_height = 10
robinhoodhimself commented Feb 8, 2017

Well, is it possible to set flags that select a different matching order, instead of getting very, very fuzzy results?
Thanks for the explanation, but the problem is not the length of the list; it is the order in which you return the results. A program is a tool: it should give a clear result to a clear question. The user is the brain and the computer is the muscle.

Maybe something like this (just an idea):

  1. Score + Exact match
    1.5 Exact match
  2. Score
  3. Effective fuzzy result
  4. Medium fuzzy result
  5. Very, very fuzzy result
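
The ordering proposed in this list could be sketched as a tiered sort, under several assumptions that are not in the thread: "exact match" is read as the typed term being a contiguous part of the title, "score" is reduced to a launch count, and the three fuzzy levels are collapsed into a single tier. All names are hypothetical and nothing here reflects how Keypirinha works.

```python
def is_subsequence(query, title):
    """True if the characters of query appear in title in the same order."""
    remaining = iter(title.lower())
    return all(ch in remaining for ch in query.lower())


def tier(query, title, launch_count):
    """Return a tier number (lower is better), or None when nothing matches."""
    q, t = query.lower(), title.lower()
    if q in t:  # assumption: "exact" means a contiguous substring
        return 0 if launch_count > 0 else 1
    if is_subsequence(q, t):  # fuzzy match only
        return 2 if launch_count > 0 else 3
    return None


def tiered_results(query, catalog):
    """catalog maps title -> launch count; returns matching titles, best first."""
    ranked = []
    for title, count in catalog.items():
        level = tier(query, title, count)
        if level is not None:
            # Within a tier, items launched more often come first.
            ranked.append((level, -count, title))
    return [title for _, _, title in sorted(ranked)]


catalog = {"Water filter box": 7, "Waterfox": 0, "Notepad": 3}
print(tiered_results("waterfox", catalog))
```

With this ordering the never-launched Waterfox still comes before the frequently launched fuzzy match, because the tier is compared before the launch count.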

Right now the algorithm gives results that I, as the user, cannot predict.

Here, for "waterfox", the program should be first.

image

Here, for "imdb", the keyword should be first, not second. If I had wanted the first result, I would have typed "A cure for li". On the last line you can see the wrong search that I often end up sending to imdb.

image

polyvertex (Member) commented Feb 8, 2017

There are plans to improve the algorithm for the exact-match case corresponding to your first "waterfox" example, so that the scoring system is a bit less involved in pushing already executed items. I've been considering adding an "exact vs. fuzzy" option, but don't count on it too much for now, as this was just a stone dropped into the deep well that is Keypirinha's todo list.

However, even after this improvement, you will always be able to find cases that you're not happy with. Your "imdb" example falls into this category, since the answer really depends on the usage context, and your statement would sound far less obvious with a different kind of item, for example.

Keypirinha tries to adapt to your usage profile, but it certainly does not claim the same level of accuracy as Google, and you should understand that designing such a search algorithm is a science of its own that does not boil down to 5 bullet points in a comment. For instance, if I strictly followed the idea behind your examples and refactored the search algorithm completely to comply as much as possible with your user profile, other users would complain in the same way, with the same arguments, but giving opposite examples to make their point.

Again, I encourage you to rely more on keyword-item association and to try to understand how it works, as it might feel a bit tricky in some cases as well (e.g. args vs. no-args items). Using your own analogy: Keypirinha is an improvable tool, it has its flaws, pros and cons, and it cannot adapt to your usage profile as well as you, the brain, are able to understand how it works in order to make the best of it.

I'm closing this as I got your point and there's nothing more to add on the subject that would not lead to an infinite loop of comments (no pun intended). Your feedback is appreciated but, alas, KP's author is not a search engine scientist :)


@polyvertex polyvertex closed this Feb 8, 2017

polyvertex (Member) commented Apr 26, 2017

The search algorithm has been improved in v2.15.
