Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign up[feature request] Flex/Fuzzy matching #47
Comments
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Jan 22, 2014
Member
The problem with flx is that to use it as intended, we'll have to feed it the full list of completions at point, matching an empty prefix (IOW, matching anything). In the general case, it will be slow, at two points:
- Transmitting the matches over the wire if we're using an external process. There can be a lot of them, they need to be encoded and decoded, and it could take some time.
- Even if all candidates are already known to Emacs (such as the case with completions in Emacs Lisp code), matching can still take a long time, because we do it in Elisp. If you try
ido-ubiquitoustogether withflx-ido-mode, you can notice that commands likedescribe-functionanddescribe-variabletake a fraction of a second when you start typing to cache the matches table. The invocations after that a fast because they can use the cache, but that won't happen with an arbitrary collection of strings.
To sum up, some backends could do it (and starting with version 0.6.13, released recently, Company allows backends to generate matches using any kind of algorithm), but each backend would need to support that matching algorithm explicitly. Backends using an external process would perform the flex matching before sending the results to Emacs, and in-process backends with fixed completion sets (like Emacs Lisp one) would be able to take advantage of caching.
Some further notes:
-
If you add
company-capftocompany-backends, or, better yet, just use Company with Emacs built from recent trunk, you can take advantage ofpartial-completion, the built-in matching algorithm that's one step stricter than flx (and, naturally, faster), but it still lets you save some keystrokes. It looks like this: http://i.imgur.com/USJgucj.pngOne can also define new matching algorithms by adding entries to
completion-stylesandcompletion-styles-alist, and writing a couple of new functions with predefined signatures. -
All symbols in your example use the initials-based matching. Maybe that's the algorithm you would prefer, instead of flx? It's easier to optimize as a general approach, because this way you at least have a first letter you can match against. Unlike flx, this algorithm should be more reasonable to use as a new completion style.
There is a general limitation in the completion styles mechanism: because it's implemented in a not very efficient way, any completion style other than the first one is only used if all previous styles haven't yielded any matches. So if you define
initials-completionas a new style and put it at the end ofcompletion-styles, it'll only be used if the input has no prefix matches, etc. This is different from YouCompleteMe, which merges all matches, but as long as the user keeps that in mind, it should work okay, too.Or one could go ahead and set
completion-stylesto just one (new) entry, with some hybrid matching algorithm that works like flx, but at least requires the first letter to match.
|
The problem with
To sum up, some backends could do it (and starting with version 0.6.13, released recently, Company allows backends to generate matches using any kind of algorithm), but each backend would need to support that matching algorithm explicitly. Backends using an external process would perform the flex matching before sending the results to Emacs, and in-process backends with fixed completion sets (like Emacs Lisp one) would be able to take advantage of caching. Some further notes:
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
PythonNut
Jan 22, 2014
Contributor
Thanks. I had feared that this was the case. Elisp is really too slow for this kind of thing (YCM uses a native binary). Ideally, the autocomplete library would have a source -> filter -> interface paradigm where flx could be used as the filter module, but that doesn't seem to exist anywhere. I'll look into the stuff you described. Perhaps flx is fast enough (the caching delay seems nonexistant on my system for describe function, and I have a function that iterates through all of the symbols in the file and computes an uncached flx score for each of them [used as a custom hippie-expand routine] that's also very fast, <0.1s on a 4k line file).
Just a few more ideas: [mostly notes to myself]
- Treat first letter as always being the first letter
- Inserting a character always reduces the match scope
- Only
flxmatch if the total matches are <500 - We could really use true asynchronous elisp
- Find an optimized
.*x.*y.*z.*regex and filter by that first
Oh, and thanks for the capf/partial-completion info. That solves most of what I'm talking about and I'm very glad to see it become part of company-mode. Another idea I'll have to look into is just reusing the YCM engine (with it's authors permission) and write a frontend in emacs. It'll be a while before my lisp skills are awesome enough for that.
And your last point, what do you mean? I'm looking around for mentions of custom completion-styles but I can't find anything.
Thanks for your time and, by extension, company-mode.
PythonNut
|
Thanks. I had feared that this was the case. Elisp is really too slow for this kind of thing (YCM uses a native binary). Ideally, the autocomplete library would have a Just a few more ideas: [mostly notes to myself]
Oh, and thanks for the capf/partial-completion info. That solves most of what I'm talking about and I'm very glad to see it become part of And your last point, what do you mean? I'm looking around for mentions of custom Thanks for your time and, by extension, PythonNut |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Jan 22, 2014
Member
the autocomplete library would have a source -> filter -> interface paradigm
completion-styles works as a certain kind of filter. We could add something like that to Company, but I'm not convinced, because, like mentioned, this approach is not performance-oriented.
the caching delay seems nonexistant on my system
It depends on the system, of course. The caching delay in describe-function is quite small on my current laptop with Core i7, SSD and Linux, but it was more noticeable on an older Windows machine. And we do want to support older systems.
Find an optimized ._x._y.z. regex and filter by that first
That would be [^x]*x[^y]*y[^z]*z, used for example in ido-set-matches-1 for Flex matching in recent Emacs.
reusing the YCM engine (with it's authors permission) and write a frontend in emacs
Erm, maybe? I'd expect it to have to Vim-specific code, though.
I'm looking around for mentions of custom completion-styles but I can't find anything.
There's probably none existing, but you should be able to write a new one anyway. completion-styles is a defcustom, look at its docstring and the docstring of completion-styles-alist mentioned there. Of course, creating a new style will require one to read the code of some of the existing ones.
Though please let me know if you try and find it too hard. I might try to do it myself.
Thanks for your time and, by extension, company-mode.
No problem!
It depends on the system, of course. The caching delay in
That would be
Erm, maybe? I'd expect it to have to Vim-specific code, though.
There's probably none existing, but you should be able to write a new one anyway. Though please let me know if you try and find it too hard. I might try to do it myself.
No problem! |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Jan 28, 2014
Member
By the way, note that completion-styles-alist already includes the initials style. It's not the same as flx, but you could try, for example, setting completino-styles to (initials basic partial-completion) and see how it works for you.
|
By the way, note that |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
sadboy
Feb 27, 2014
Contributor
I've been itching to have something like this ever since watching that vim demo, so I just hacked together a quick toy example of a flx version of company-search/filter, basically just using flx-score to rank/filter candidates in company-search/filter-mode, with no backend modification: https://github.com/sadboy/company-mode/tree/flx
To use you need to type some prefix, then press a key to initiate search/filter (or no prefix and just company-manual-begin). Not as good as YouCompleteMe of course, but better than nothing (in most cases). If there's interest I can probably fashion this approach into some kind of stop-gap measure, at least until non-prefix matching can be implemented. However, as already pointed out, performance is definitely a concern. On my 3 year old Core i5-760, elisp completion with no prefix (listing all symbols), typing the first characters in search/filter mode causes a very noticeable delay. On a slower AMD E-350, it's much worse. The flx algorithm should really be done in C.
|
I've been itching to have something like this ever since watching that vim demo, so I just hacked together a quick toy example of a |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Feb 27, 2014
Member
using flx-score to rank/filter candidates in company-search/filter-mode
Have you tried to do the sorting in company-transformers?
Have you tried to do the sorting in |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
sadboy
Feb 27, 2014
Contributor
the initials style. It's not the same as flx
No, and unfortunately it's much inferior. flx matching basically allows you to just type whatever pops into your head and it zeros in on your desired target very quickly. I've just been using this for half a day and I'm already getting the can't-live-without feeling ;) I'm quite surprised nobody has yet bothered to incorporate this algorithm into Emacs proper.
No, and unfortunately it's much inferior. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
sadboy
Feb 27, 2014
Contributor
Have you tried to do the sorting in company-transformers?
No, this is change is purely in company-search/filter mode, so it operates entirely in the frontend (after company-transformers has run)
No, this is change is purely in company-search/filter mode, so it operates entirely in the frontend (after |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Feb 27, 2014
Member
Wouldn't it work better in transformers? This doesn't sounds like a frontend feature to me.
The basic implementation would have to forego the highlighting, but maybe we could work out a convention for highlighting hints the tooltip could pick up and use. Or some new hook that would allow to additionally propertize each tooltip line.
|
Wouldn't it work better in transformers? This doesn't sounds like a frontend feature to me. The basic implementation would have to forego the highlighting, but maybe we could work out a convention for highlighting hints the tooltip could pick up and use. Or some new hook that would allow to additionally propertize each tooltip line. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Feb 27, 2014
Member
On my 3 year old Core i5-760, elisp completion with no prefix (listing all symbols), typing the first characters in search/filter mode causes a very noticeable delay.
You could only do flx scoring when the number of candidates is below a certain threshold.
The flx algorithm should really be done in C.
There was a discussion on emacs-devel some time ago, the suggestion was to implement it in Lisp, and if it's too slow, come back and discuss optimizations. Le hasn't returned to that discussion yet.
You could only do flx scoring when the number of candidates is below a certain threshold.
There was a discussion on emacs-devel some time ago, the suggestion was to implement it in Lisp, and if it's too slow, come back and discuss optimizations. Le hasn't returned to that discussion yet. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
PythonNut
Feb 27, 2014
Contributor
Hm... I know flx-ido already bakes a threshold in there somewhere (with flx-ido-threshold). Also, Le seems to have more efficient methods to flx match a group (using caching etc.), but I'm not sure how to use them. The ELisp is not simple...
|
Hm... I know |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
sadboy
Feb 28, 2014
Contributor
Wouldn't it work better in transformers? This doesn't sounds like a frontend feature to me.
Yes of course, but that approach takes more effort. Also it does not conflict with what I'm doing here, unless you're thinking of removing company-search-mode altogether, so it's not an either-or thing. The company-search-mode feature has been there forever but is of limited usefulness in its current form, this simple change makes it much more useful. Implementing non-prefix matching and backend filtering (what you're referring to) can be separate effort.
The basic implementation would have to forego the highlighting
That would be unfortunate, as highlighting actually makes quite a big difference in the usability of this feature.
I think instead of querying the backends, a better approach is to redesign company-candidates, so that instead of being just a string, each element becomes a (STRING . PLIST) pair. This allows auxilliary information to be attached to each candidate, which can be used for things like sorting and/or formatting. One thing that could potentially be improved is combining the old alphabetical sorting and the new company-sort-by-occurance in to one sorting pass, instead of two independent sorting passes.
Yes of course, but that approach takes more effort. Also it does not conflict with what I'm doing here, unless you're thinking of removing
That would be unfortunate, as highlighting actually makes quite a big difference in the usability of this feature. I think instead of querying the backends, a better approach is to redesign |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
sadboy
Feb 28, 2014
Contributor
Le seems to have more efficient methods to flx match a group (using caching etc.)
Aha! That did the trick, same scenario on the i5 is now instantaneous after the first invocation. I had thought the caching was automatic, turns out you have to explicitly pass in the cache to use. Thanks for the tip!
Aha! That did the trick, same scenario on the i5 is now instantaneous after the first invocation. I had thought the caching was automatic, turns out you have to explicitly pass in the cache to use. Thanks for the tip! |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Feb 28, 2014
Member
a better approach is to redesign company-candidates, so that instead of being just a string, each element becomes a (STRING . PLIST) pair. This allows auxilliary information to be attached to each candidate, which can be used for things like sorting and/or formatting.
Why don't you just use propertize? See company-clang--parse-output and company-clang--meta, for example.
Why don't you just use |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Feb 28, 2014
Member
Yes of course, but that approach takes more effort.
From what I see of your current code, you're coupling company-mode to flx (and missing a require, by the way), which isn't something I would accept in the codebase (flx isn't in GNU ELPA, for one). If you're just experimenting for yourself, that's fine, of course.
And as for whether flx belongs in company-search-mode... I'm just not sure. We could make the matching method customizable, though.
backend filtering
What's that?
That would be unfortunate, as highlighting actually makes quite a big difference in the usability of this feature.
Well, it's just harder to do without using flx explicitly.
One thing that could potentially be improved is combining the old alphabetical sorting and the new company-sort-by-occurance in to one sorting pass, instead of two independent sorting passes.
I don't think that's desirable: the initial sorting is performed so that the duplicates can be removed quickly. If there are still duplicates when company-sort-by-occurrence is called... well, you've just slowed it down considerably.
From what I see of your current code, you're coupling company-mode to flx (and missing a And as for whether flx belongs in
What's that?
Well, it's just harder to do without using
I don't think that's desirable: the initial sorting is performed so that the duplicates can be removed quickly. If there are still duplicates when |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
sadboy
Feb 28, 2014
Contributor
Why don't you just use propertize? See company-clang--parse-output and company-clang--meta, for example.
That would of course be better, if the properties are guaranteed to survive, I thought there were places which replace or strip off text properties from the candidates.
That would of course be better, if the properties are guaranteed to survive, I thought there were places which replace or strip off text properties from the candidates. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Feb 28, 2014
Member
strip off text properties from the candidates.
Only in company--insert-candidate and potentially in company-reformat.
Tooltip just calls add-text-properties, and not on the candidates themselves anyway.
Only in Tooltip just calls |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
sadboy
Feb 28, 2014
Contributor
which isn't something I would accept in the codebase
Oh, this isn't a pull request by any means, as I said it's just a quick proof-of-concept to see if it's worth my time pursuing further.
missing a
require
It's in there, I just tucked it down with the search-mode code. Again, not code for review, just a prototype to play around with.
We could make the matching method customizable, though.
Or a completely separate mode, company-flx-mode, that isolates the dependencies on flx. BTW I'm not sure what the long-term prospects of that project are, it's been stale for quite sometime.
What's that?
Well to do it like in YCM you need to teach the backends that the "prefix" is no longer a prefix but a subsequence filter.
Well, it's just harder to do without using flx explicitly.
Why? It's only a set of markers to delimit the subsequences to highlight. You can have a general convention that is understood by company-fill-propertize, and the company-flx adapter can translate the results from flx if needed.
so that the duplicates can be removed
Oh right, I forgot about deduping. Still, company-sort-by-occurance and this flx ranking thing (if implemented in the future) can probably be combined into one pass.
Oh, this isn't a pull request by any means, as I said it's just a quick proof-of-concept to see if it's worth my time pursuing further.
It's in there, I just tucked it down with the search-mode code. Again, not code for review, just a prototype to play around with.
Or a completely separate mode,
Well to do it like in
Why? It's only a set of markers to delimit the subsequences to highlight. You can have a general convention that is understood by
Oh right, I forgot about deduping. Still, |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Feb 28, 2014
Member
Still, company-sort-by-occurance and this flx ranking thing (if implemented in the future) can probably be combined into one pass.
And then you don't have the flexibility of company-transformers anymore.
There'd have to be a very compelling performance improvement evidence.
And then you don't have the flexibility of |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Feb 28, 2014
Member
Well to do it like in YCM you need to teach the backends that the "prefix" is no longer a prefix but a subsequence filter.
Forcing each backend to use flx is definitely not an option. But otherwise non-prefix matching works, the backend just has to return the right candidates and disable company caching.
Forcing each backend to use flx is definitely not an option. But otherwise non-prefix matching works, the backend just has to return the right candidates and disable company caching. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
sadboy
Feb 28, 2014
Contributor
Forcing each backend to use flx is definitely not an option. But otherwise non-prefix matching works, the backend just have to return the right candidates and disable company caching.
OK, but how would they know what are the right candidates to return in a non-prefix matching scenario?
OK, but how would they know what are the right candidates to return in a non-prefix matching scenario? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Feb 28, 2014
Member
Each one can provide a backend-specific user option. Or use completion-styles, like company-capf does.
|
Each one can provide a backend-specific user option. Or use |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
sadboy
Mar 1, 2014
Contributor
There's have to be a very compelling performance improvement evidence.
Yep, just ran some quick benchmarks, and the overhead of a couple of sorting runs over 10k candidates is practically negligible. That is good, one less thing to worry about.
Yep, just ran some quick benchmarks, and the overhead of a couple of sorting runs over 10k candidates is practically negligible. That is good, one less thing to worry about. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
sadboy
Mar 1, 2014
Contributor
Each one can provide a backend-specific user option. Or use completion-styles, like company-capf does.
OK, from what I see so far, here's a list of tasks that should to be done to achieve this feature:
- Decouple visualization from
company-search-string, instead, use ahighlightproperty on the candidate to indicate which portions of candidate to highlight. - Make backends
completion-styles-aware. - Implement new
subsequencecompletion style. - A new transformer
company-sort-by-flx-score. - A
company-flx-searchmode as an alternative tocompany-search, or refactorcompany-searchto use custom matcher and sorter (former option is much simpler).
4 and 5 are dependant on flx so can be isolated in a separate company-flx package. 2 probably takes the most effort.
Is this in line with what you have in mind?
OK, from what I see so far, here's a list of tasks that should to be done to achieve this feature:
4 and 5 are dependant on Is this in line with what you have in mind? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Mar 2, 2014
Member
Make backends completion-styles-aware.
I'm not sure all styles make sense for all backends. Would you use substring matching with dabbrev, for example?
And the biggest problem will be with backends that use external services. For example, retrieving a list of completions with empty-string prefix on target of unknown type is pretty slow in Robe (it does a full scan). But unless you have that list, you can't do substring filtering (except using the -search interface).
Anyway, no real need to move in that direction now. We'll get completion-styles support for free when all/most backends will be turned into completion-at-point-function fns. Which is the current consensus on emacs-devel.
No sure yet what to do about the efficiency concerns: obviously, using the implementations of different styles inside external services would be a lot faster in certain cases. Maybe completion functions would be able to say "no need to apply styles, I'm handling them myself".
Implement new subsequence completion style.
There's an existing substring style, not enabled by default. But you probably mean something like ido's flex, right?
A new transformer company-sort-by-flx-score.
It would use the search string as input, right? Then you'll only need to calculate the scores once, do the searching and adding of the highlight property (probably will have to to use propertize, not add-text-properties, because the strings might be used elsewhere).
Adding the highlight to every candidate could be premature, though: only a few of them are rendered at a time. Maybe just attach the raw information returned by flx, and then do the highlighting in a hook, somewhere in company-fill-propertize. Need to do some measurements.
A company-flx-search mode
At the moment, the search mode does no ordering, it just jumps between matches. If the matches are ordered, there's no real need to jump between them, they're all at the top.
Maybe do a mode with different semantics, like company-search-kill-others? It would sort the matches, and also update company-candidates-predicate on the fly.
I'm not sure all styles make sense for all backends. Would you use substring matching with dabbrev, for example? And the biggest problem will be with backends that use external services. For example, retrieving a list of completions with empty-string prefix on target of unknown type is pretty slow in Robe (it does a full scan). But unless you have that list, you can't do substring filtering (except using the -search interface). Anyway, no real need to move in that direction now. We'll get No sure yet what to do about the efficiency concerns: obviously, using the implementations of different styles inside external services would be a lot faster in certain cases. Maybe completion functions would be able to say "no need to apply styles, I'm handling them myself".
There's an existing
It would use the search string as input, right? Then you'll only need to calculate the scores once, do the searching and adding of the Adding the
At the moment, the search mode does no ordering, it just jumps between matches. If the matches are ordered, there's no real need to jump between them, they're all at the top. Maybe do a mode with different semantics, like |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
wasamasa
Mar 10, 2014
Anyway, no real need to move in that direction now. We'll get completion-styles support for free when all/most backends will be turned into completion-at-point-function fns. Which is the current consensus on emacs-devel.
How do you plan to do that? Will you try adding a "proper" completion-at-point-function upstream or will you make every backend add its own to the major mode's completion-at-point-functions ?
wasamasa
commented
Mar 10, 2014
How do you plan to do that? Will you try adding a "proper" |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Mar 10, 2014
Member
or will you make every backend add its own to the major mode's completion-at-point-functions ?
This, probably. Maybe not every backend, but the ones left behind won't get the completion-styles support. Doing only one completion-at-point-function would be essentially a double conversion, so it's not a direction I'd like the official code to go.
In the meantime, you could implement the "one c-a-p-f function" approach yourself, though (for your own setup, or at least to be packaged separately). Should be relatively easy.
This, probably. Maybe not every backend, but the ones left behind won't get the In the meantime, you could implement the "one c-a-p-f function" approach yourself, though (for your own setup, or at least to be packaged separately). Should be relatively easy. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
lewang
commented
Apr 7, 2014
|
Hi all. Do y'all need anything from me? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
wasamasa
Apr 21, 2014
It occurred to me that one of the original suggestions to use a fast binary that does the matching part and communicates with Emacs could be done by writing a new completion style in 24.4. That way it stays optional, doesn't interfere with company and should work for people who are willing to drop an extra binary they've compiled into their $PATH. How does that sound?
wasamasa
commented
Apr 21, 2014
|
It occurred to me that one of the original suggestions to use a fast binary that does the matching part and communicates with Emacs could be done by writing a new completion style in 24.4. That way it stays optional, doesn't interfere with company and should work for people who are willing to drop an extra binary they've compiled into their $PATH. How does that sound? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Apr 21, 2014
Member
The original suggestion was to implement the matching code in C, but not in external program. That could be an option, too, but someone will have to see whether the overhead is small enough, and if it makes caching unnecessary (or whether to launch that external program once and then keep it running).
|
The original suggestion was to implement the matching code in C, but not in external program. That could be an option, too, but someone will have to see whether the overhead is small enough, and if it makes caching unnecessary (or whether to launch that external program once and then keep it running). |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
lewang
Apr 21, 2014
The original suggestion was to implement the matching code in C, but not in external program. That could be an option, too,
Yep, I originally wanted to do it in C within Emacs. I'm not opposed to trying it out as an external program though. Do you guys know of an example of such a program that interacts with Emacs where speed is of the utmost concern?
The reason that it's not in C is because of time constraints and my unfamiliarity with the Emacs code base. The emacs-lisp version also proved to be fast enough for the vast majority of use-cases. Although to be fair, I don't use flx day-to-day. My impression is that many people do though.
but someone will have to see whether the overhead is small enough, and if it makes caching unnecessary (or whether to launch that external program once and then keep it running).
The interfacing with company-mode and fast searching are separate issues. When I tried with flx and ido, it was fast enough for ~10k items. Why not work on the interface first?
lewang
commented
Apr 21, 2014
Yep, I originally wanted to do it in C within Emacs. I'm not opposed to trying it out as an external program though. Do you guys know of an example of such a program that interacts with Emacs where speed is of the utmost concern? The reason that it's not in C is because of time constraints and my unfamiliarity with the Emacs code base. The emacs-lisp version also proved to be fast enough for the vast majority of use-cases. Although to be fair, I don't use flx day-to-day. My impression is that many people do though.
The interfacing with company-mode and fast searching are separate issues. When I tried with flx and ido, it was fast enough for ~10k items. Why not work on the interface first? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Apr 21, 2014
Member
Why not work on the interface first?
You (or anyone else) are welcome to implement the relevant completion-styles-alist elements. Should be relatively easy.
You (or anyone else) are welcome to implement the relevant |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
lewang
Apr 21, 2014
I can do that. Although timing is open.
Track completion style progress here: lewang/flx/issues/54
lewang
commented
Apr 21, 2014
|
I can do that. Although timing is open. Track completion style progress here: lewang/flx/issues/54 |
dgutov
added
the
wishlist
label
Jun 7, 2014
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
lazywei
commented
Jun 17, 2014
|
Really need this! |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
PythonNut
Oct 13, 2014
Contributor
Just a minor note, it occurred to me that we could build the completions cache for a lot of sources when the buffer is created. (Incurring minimal interruptions later). An example would be the lisp symbol completion, which is known almost entirely at buffer creation time.
|
Just a minor note, it occurred to me that we could build the completions cache for a lot of sources when the buffer is created. (Incurring minimal interruptions later). An example would be the lisp symbol completion, which is known almost entirely at buffer creation time. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
PythonNut
Jun 29, 2015
Contributor
Why not work on the interface first?
You (or anyone else) are welcome to implement the relevant completion-styles-alist elements. Should be relatively easy.
That part is now done, and in fact it doesn't require flx, because completion-style does not control sorting.
@dgutov The systems that control sorting in vanilla minibuffer completion don't seem to be used by company. How can we now add a custom sorting function?
That part is now done, and in fact it doesn't require @dgutov The systems that control sorting in vanilla minibuffer completion don't seem to be used by |
PythonNut
changed the title from
[feature request] flx (fuzzy) matching
to
[feature request] flx (flex/fuzzy) matching
Jun 29, 2015
PythonNut
changed the title from
[feature request] flx (flex/fuzzy) matching
to
[feature request] Flex/Fuzzy matching
Jun 29, 2015
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Jun 29, 2015
Member
How can we now add a custom sorting function?
Via company-transformers, for example. If you want it to apply to all backends.
Via |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Jun 29, 2015
Member
The systems that control sorting in vanilla minibuffer completion
What are those, by the way?
What are those, by the way? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
PythonNut
Jun 30, 2015
Contributor
@dgutov they're pretty ugly, and I admit, I still don't quite understand how they work.
- Displayed completions are sorted in
minibuffer-completion-help - When cycling, sorting is defined in
completion-all-sorted-completions
In both cases, the sorting comparison function is extracted from the completion metadata:
(let* ((completion--metadata (buffer-substring-no-properties
start (point))
base-size md
minibuffer-completion-table
minibuffer-completion-predicate)
(sort-fun (completion-metadata-get all-md 'cycle-sort-function)))
...)And the metadata comes from minibuffer-completion-table.
|
@dgutov they're pretty ugly, and I admit, I still don't quite understand how they work.
In both cases, the sorting comparison function is extracted from the completion metadata: (let* ((completion--metadata (buffer-substring-no-properties
start (point))
base-size md
minibuffer-completion-table
minibuffer-completion-predicate)
(sort-fun (completion-metadata-get all-md 'cycle-sort-function)))
...)And the metadata comes from |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Jun 30, 2015
Member
Oh right. company-capf supports display-sort-function, but it's defined for each completion function separately. Or "completion table", which is basically the same.
|
Oh right. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
PythonNut
Jun 30, 2015
Contributor
I'm having good success with:
(setq
company-flx-cache (flx-make-string-cache 'flx-get-heatmap-str)
company-transformers
(list
(lambda (cands)
(let ((num-cands (length cands)))
(mapcar #'car
(cl-sort (mapcar
(lambda (cand)
(cons
cand
(or (car (flx-score cand
company-prefix
company-flx-cache))
0)))
(cl-subseq (cl-sort cands
#'<
:key #'length)
0
(min 500 num-cands)))
#'>
:key #'cdr)))))In general, I'll guess that you don't want an exhaustive list of candidates if you're using fuzzy matching as the candidates quickly descend into chaos as the flx-scores decrease.
500 yields near-instantaneous performance for me. Of course it is possible in theory for a good match that is very long to slip through sort and truncate, but it's not a problem because: 1. it's very rare, and 2. you can always just type another character to help narrow down the candidates.
I also have an optimization to the completion style, but I'm not quite sure where to put it since the code itself is mostly someone else's.
|
I'm having good success with: (setq
company-flx-cache (flx-make-string-cache 'flx-get-heatmap-str)
company-transformers
(list
(lambda (cands)
(let ((num-cands (length cands)))
(mapcar #'car
(cl-sort (mapcar
(lambda (cand)
(cons
cand
(or (car (flx-score cand
company-prefix
company-flx-cache))
0)))
(cl-subseq (cl-sort cands
#'<
:key #'length)
0
(min 500 num-cands)))
#'>
:key #'cdr)))))In general, I'll guess that you don't want an exhaustive list of candidates if you're using fuzzy matching as the candidates quickly descend into chaos as the
I also have an optimization to the completion style, but I'm not quite sure where to put it since the code itself is mostly someone else's. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Jun 30, 2015
Member
This looks good. company can't depend on flx though, so you can either publish it as a package, or put the function somewhere in the wiki maybe.
|
This looks good. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
How will we reimplement |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Jun 30, 2015
Member
It needs some other work, but in the end it should become a completion-at-point-functions element as well.
|
It needs some other work, but in the end it should become a |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
PythonNut
Jun 30, 2015
Contributor
It should become a completion-at-point-functions element as well.
Is that the plan for all of the backends?
Is that the plan for all of the backends? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
In the long run, yes. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
bbatsov
Jul 4, 2015
Contributor
@dgutov I'd also like to see some development here (and not only me). Plenty of folks feel that the inability to deal with any sort of non-prefix candidates if the only major shortcoming for company. Can't we just handle the case when the backend handles the filtering of the candidates. E.g. CIDER supports both matching of word starts (wots -> with-output-to-string) and a true fuzzy matching, but the candidates returned by the backend are simply ignored by the UI, as they don't share the completion prefix. I'm thinking a simple solution be to just expand such leading characters into a full prefix when you receive some response from the backend. //cc @alexander-yakushev
|
@dgutov I'd also like to see some development here (and not only me). Plenty of folks feel that the inability to deal with any sort of non-prefix candidates if the only major shortcoming for company. Can't we just handle the case when the backend handles the filtering of the candidates. E.g. CIDER supports both matching of word starts (wots -> with-output-to-string) and a true fuzzy matching, but the candidates returned by the backend are simply ignored by the UI, as they don't share the completion prefix. I'm thinking a simple solution be to just expand such leading characters into a full prefix when you receive some response from the backend. //cc @alexander-yakushev |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Jul 4, 2015
Member
@bbatsov External fuzzy matching is a bit of a different issue.
Bad news there: completion-at-point doesn't support fuzzy matching performed by the external process. You can see that, for instance, by using M-x completion-at-point: there won't be any fuzzy matches there; and company-capf reuses its filtering logic.
Even if the completion function returns fuzzy matches, the "completion table" API obligates the consumer to call all-completions on it (because, for one thing, the table might not be pre-filtered). We've had a long-ish conversation with Stefan about it, but it's sort of hanging unfinished now.
So, as I see, your options are:
- Create a new CIDER Company backend, in addition to the
completion-at-point-functionselement. You can support fuzzy matching in it. - File an Emacs bug for
completion-at-point, asking to support "fuzzy" completion tables. I'll try to, at least, compile the highlights of the related discussion and post them in it. But naturally, anything that comes of that won't help in the currently released Emacs versions.
I'm thinking a simple solution be to just expand such leading characters into a full prefix when you receive some response from the backend.
That might also work, but it will look ugly in the completion popup.
|
@bbatsov External fuzzy matching is a bit of a different issue. Bad news there: Even if the completion function returns fuzzy matches, the "completion table" API obligates the consumer to call So, as I see, your options are:
That might also work, but it will look ugly in the completion popup. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
PythonNut
Jul 17, 2015
Contributor
I suppose the next question (other than the rearchitecture around capf) is how to support fuzzy highlighting in company. Is there any way to do that easily?
|
I suppose the next question (other than the rearchitecture around capf) is how to support fuzzy highlighting in company. Is there any way to do that easily? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
bbatsov
Jul 17, 2015
Contributor
You you can take some pointers from ido-flx I guess. There's highlighting of the matching characters there, which looks pretty good.
|
You you can take some pointers from |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
PythonNut
Jul 17, 2015
Contributor
@bbatsov I mean, how do we put arbitrary highlighting on company candidates?
|
@bbatsov I mean, how do we put arbitrary highlighting on |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Jul 17, 2015
Member
That might also be solved via a change to completion-at-point-funtions API. And a tweak to the handling of the match action in company-backends.
|
That might also be solved via a change to |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Fuco1
Jul 30, 2015
I haven't read all the discussion, but the part about a C external program caught my attention.
Some month ago I wrote https://github.com/Fuco1/yafmp which is something like flx but
- In C (maybe not an advantage :D)
- Uses much faster scoring algorithm (O(mn) dynamic programming versus O(mnumber-of-substring-matches) which grows exponential). Note that the logic behind is roughly the same (substr bonus is not implemented yet, but that can be added easily), so the quality of the matches is comparable to
flxalready. - Can be made to operate in a server mode (not yet implemented), when it caches the heatmaps and you just feed it lists of strings and it ranks them. On my tests I could run 10k strings under 100ms, and that wasn't even very optimized.
Having said that, flx is actually really fast in elisp as well, and could be made much faster using vectors instead of lists. The caching works per "candidate" not "group", so over time it builds the cache for all the strings you are getting back from your backend (each candidate is considered separately, this is the crucial insight). Plus, if speed is really a concern, one could pre-filter and only kick in ranking later (note that prefiltering could be done without regexp alltogether just looking for a subsequence match, which I'm sure could be written to be much faster than regexp as well).
The only real issue I see here which was menitioned is that the backend can take a long time to generate the candidates.
Fuco1
commented
Jul 30, 2015
|
I haven't read all the discussion, but the part about a C external program caught my attention. Some month ago I wrote https://github.com/Fuco1/yafmp which is something like
Having said that, The only real issue I see here which was menitioned is that the backend can take a long time to generate the candidates. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
Let's add @lewang to this discussion as well. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
lewang
Jul 30, 2015
I welcome all innovation in this field. The reality is I use helm without fuzzy matching. The interface is just too usable.
I've made several efforts to deeply integrate flx into helm and sadly have been distracted/discouraged somewhere along the way each time (working with Thierry is not the easiest, and I don't think he really sees the value of fuzzy matching).
@Fuco1 I haven't tried yafmp yet, but it could be another way forward. I will add that in actual usage, the ability for substring matches to overtake abbreviation matches is important. It means that you rarely have to change parts of the query already typed.
I would also welcome any PRs or ideas to improve performance of flx. IIRC I stopped tweaking with it when it was fast enough in the majority of the use cases.
lewang
commented
Jul 30, 2015
|
I welcome all innovation in this field. The reality is I use helm without fuzzy matching. The interface is just too usable. I've made several efforts to deeply integrate flx into helm and sadly have been distracted/discouraged somewhere along the way each time (working with Thierry is not the easiest, and I don't think he really sees the value of fuzzy matching). @Fuco1 I haven't tried yafmp yet, but it could be another way forward. I will add that in actual usage, the ability for substring matches to overtake abbreviation matches is important. It means that you rarely have to change parts of the query already typed. I would also welcome any PRs or ideas to improve performance of flx. IIRC I stopped tweaking with it when it was fast enough in the majority of the use cases. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Fuco1
Jul 30, 2015
I've made several efforts to deeply integrate flx into helm and sadly have been distracted/discouraged somewhere along the way each time (working with Thierry is not the easiest, and I don't think he really sees the value of fuzzy matching).
Which is one of the reasons I wrote sallet... and there is also ivy and other packages where people got tired of helm's rather peculiar philosopy. Helm is too ad-hoc on a lot of places, and tbh, I find it not too usable. But let's not get offtopic.
@lewang yafmp (as the name suggests) is mostly a proof of concept thing. I took the algorithm from flx (the heatmaps mostly, that is genius :)) and reworked the matching/scoring part, where I don't iterate over all matches but instead build the score incrementally in time proportional only to the length of the pattern, not number of matches.
I tried this in elisp too but elisp has really bad data structures for this kind of programming... you could do it with vectors but I haven't had the nerve to try and port it over :)
Having said that, in practice it rarely happens that the growth is exponential, so that is of relatively minor concern (it blows mostly if the candidates are long and repetitive). The C version is still thousands of times faster, but being an external program is a major PITA to distribute, so unless we could get it inside emacs I don't see that as a viable option.
Fuco1
commented
Jul 30, 2015
Which is one of the reasons I wrote sallet... and there is also ivy and other packages where people got tired of helm's rather peculiar philosopy. Helm is too ad-hoc on a lot of places, and tbh, I find it not too usable. But let's not get offtopic. @lewang yafmp (as the name suggests) is mostly a proof of concept thing. I took the algorithm from flx (the heatmaps mostly, that is genius :)) and reworked the matching/scoring part, where I don't iterate over all matches but instead build the score incrementally in time proportional only to the length of the pattern, not number of matches. I tried this in elisp too but elisp has really bad data structures for this kind of programming... you could do it with vectors but I haven't had the nerve to try and port it over :) Having said that, in practice it rarely happens that the growth is exponential, so that is of relatively minor concern (it blows mostly if the candidates are long and repetitive). The C version is still thousands of times faster, but being an external program is a major PITA to distribute, so unless we could get it inside emacs I don't see that as a viable option. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Jul 30, 2015
Member
unless we could get it inside emacs I don't see that as a viable option
We could, though someone should double-check that the piping overhead does not offset the performance gains too much.
On the other hand, like you said, flx is pretty fast already. Maybe it could incorporate the changes from your scoring algorithm, too.
We could, though someone should double-check that the piping overhead does not offset the performance gains too much. On the other hand, like you said, flx is pretty fast already. Maybe it could incorporate the changes from your scoring algorithm, too. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Fuco1
Jul 30, 2015
We could, though someone should double-check that the piping overhead does not offset the performance gains too much.
I ment that it would really be a part of emacs, exposed as a function string-score-flx (or some better name, indicating the algorithm used), so there would be no pipe involved.
On the other hand, like you said, flx is pretty fast already. Maybe it could incorporate the changes from your scoring algorithm, too.
That's definitely possible, I might get back to the idea of porting it there.
Fuco1
commented
Jul 30, 2015
I ment that it would really be a part of emacs, exposed as a function
That's definitely possible, I might get back to the idea of porting it there. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Jul 30, 2015
Member
it would really be a part of emacs, exposed as a function
Oh, that's also possible indeed.
Oh, that's also possible indeed. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
PythonNut
Jul 30, 2015
Contributor
First, I would absolutely love an even faster Elisp fuzzy matching method, (be it an improvement on flx or a new system entirely).
But one thing to definitely point out is that the fuzzy scoring runs on a sliding scale. If it's too slow, we can trivially back off on the number of candidates to consider. The strategy of sorting the candidates by length and only sorting the first n of them seems to work very well. On fast machines, we might have n=500 and on slow machines, we might have n=50. In some cases (esp, when n<50), it can keep flx from looking far into the future (that is, small inputs with very long matches). Faster scoring means a proportionally larger n.
Here's a fun plot based purely on casual empirical observations.

But this is all relatively secondary. Fast machines (Quad i7) can already achieve n=1000 with almost no lag, so all we really worry about is slower machines.
Slower machines have problems with fuzzy matching. Unlike sorting, matching is a hard limit because you must match against every possible candidate. I currently I, rather naïvely, implement matching this way:
;; here, `infix' is the input to match against
;; `candidates' is the list of fuzzy-matched candidates
(let*
(regexp (concat "\\`"
(mapconcat
(lambda (x)
(concat "[^" (string x) "]*?" (string x)))
infix
"")))
(candidates (cl-remove-if-not
(apply-partially 'string-match-p regexp)
(all-completions prefix table predicate))))With this code, slower laptops are already unusable (sorry, all I have in the slow department is a bunch of Core2Duos and the like). This is without scoring of any kind. I'm no Elisp expert, so I'd love it if some of you people could suggest ways of accelerating this step.
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Fuco1
Jul 30, 2015
Please, no sorting on length... that's terrible.
If you want further speedups, then start scoring only when the pattern is some number of letters long. On 2 or 3 letters it's still not super efficient.
Re your code: apply-partially is slow as HELL!!!, so is everything in CL. Use simple loops and simplest possible forms if you go for performance.
Fuco1
commented
Jul 30, 2015
|
Please, no sorting on length... that's terrible. If you want further speedups, then start scoring only when the pattern is some number of letters long. On 2 or 3 letters it's still not super efficient. Re your code: |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
PythonNut
Jul 30, 2015
Contributor
Please, no sorting on length... that's terrible.
I don't actually quite see how it's terrible. Could you elaborate?
Re your code: apply-partially is slow as HELL!!!, so is everything in CL. Use simple loops and simplest possible forms if you go for performance.
Good to know.
I don't actually quite see how it's terrible. Could you elaborate?
Good to know. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
wasamasa
Jul 30, 2015
@PythonNut The best candidate is not necessarily the shortest, especially if you do weighting on boundaries like flx does.
wasamasa
commented
Jul 30, 2015
|
@PythonNut The best candidate is not necessarily the shortest, especially if you do weighting on boundaries like flx does. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
PythonNut
Aug 25, 2015
Contributor
@dgutov do you know of a function I can wrap that will only set the completion-style for company? I'd like to start moving in the direction of a company-flx package, and I don't want to just override the completion style for everything.
|
@dgutov do you know of a function I can wrap that will only set the |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dgutov
Aug 25, 2015
Member
I don't want to just override the completion style for everything
I'd do just that (why not?), but you can add an around-advice to company-capf instead.
I'd do just that (why not?), but you can add an around-advice to |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
PythonNut
Jan 11, 2016
Contributor
Alright since fuzzy matching if working for company-capf, and we have a concrete way forward to bring this support to other backends (porting to company-capf), I think all of the conditions for this issue have been met.
If anyone is unhappy with my implementation, please file an issue on its own issue tracker.
|
Alright since fuzzy matching if working for If anyone is unhappy with my implementation, please file an issue on its own issue tracker. |
PythonNut commentedJan 21, 2014
flxis a very smart sublime-text like fuzzy/flex matching library. After using similar flex matching in Vim'sYouCompleteMeI miss it inemacs. I would seriously recommend trying it out for a bit. It's amazing how much code you can generate with a few key strokes using these advanced algorithms.Becomes
I already try to use
flxfor everything (minibuffer, hippie-expand etc.), and the only thing left is auto-completion. I can't get it to work myself because it's so complicated. I also have a feature request forauto-complete-mode, but they seem busy with their big 2.0 rewrite and it's no going anywhere.By the way, thanks for
company-modeit's very fast and gets out of my way. I think it's awesome!Thanks,
PythonNut