New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature request] Flex/Fuzzy matching #47

Closed
PythonNut opened this Issue Jan 21, 2014 · 66 comments

Comments

Projects
None yet
8 participants
@PythonNut
Contributor

PythonNut commented Jan 21, 2014

flx is a very smart sublime-text like fuzzy/flex matching library. After using similar flex matching in Vim's YouCompleteMe I miss it in emacs. I would seriously recommend trying it out for a bit. It's amazing how much code you can generate with a few key strokes using these advanced algorithms.

(kcdg<tab> eism<tab> "jj" ens<tab>)

Becomes

(key-chord-define-global evil-insert-state-map "jj" evil-normal-state)

I already try to use flx for everything (minibuffer, hippie-expand etc.), and the only thing left is auto-completion. I can't get it to work myself because it's so complicated. I also have a feature request for auto-complete-mode, but they seem busy with their big 2.0 rewrite and it's no going anywhere.

By the way, thanks for company-mode it's very fast and gets out of my way. I think it's awesome!

Thanks,
PythonNut

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Jan 22, 2014

Member

The problem with flx is that to use it as intended, we'll have to feed it the full list of completions at point, matching an empty prefix (IOW, matching anything). In the general case, it will be slow, at two points:

  1. Transmitting the matches over the wire if we're using an external process. There can be a lot of them, they need to be encoded and decoded, and it could take some time.
  2. Even if all candidates are already known to Emacs (such as the case with completions in Emacs Lisp code), matching can still take a long time, because we do it in Elisp. If you try ido-ubiquitous together with flx-ido-mode, you can notice that commands like describe-function and describe-variable take a fraction of a second when you start typing to cache the matches table. The invocations after that a fast because they can use the cache, but that won't happen with an arbitrary collection of strings.

To sum up, some backends could do it (and starting with version 0.6.13, released recently, Company allows backends to generate matches using any kind of algorithm), but each backend would need to support that matching algorithm explicitly. Backends using an external process would perform the flex matching before sending the results to Emacs, and in-process backends with fixed completion sets (like Emacs Lisp one) would be able to take advantage of caching.

Some further notes:

  1. If you add company-capf to company-backends, or, better yet, just use Company with Emacs built from recent trunk, you can take advantage of partial-completion, the built-in matching algorithm that's one step stricter than flx (and, naturally, faster), but it still lets you save some keystrokes. It looks like this: http://i.imgur.com/USJgucj.png

    One can also define new matching algorithms by adding entries to completion-styles and completion-styles-alist, and writing a couple of new functions with predefined signatures.

  2. All symbols in your example use the initials-based matching. Maybe that's the algorithm you would prefer, instead of flx? It's easier to optimize as a general approach, because this way you at least have a first letter you can match against. Unlike flx, this algorithm should be more reasonable to use as a new completion style.

    There is a general limitation in the completion styles mechanism: because it's implemented in a not very efficient way, any completion style other than the first one is only used if all previous styles haven't yielded any matches. So if you define initials-completion as a new style and put it at the end of completion-styles, it'll only be used if the input has no prefix matches, etc. This is different from YouCompleteMe, which merges all matches, but as long as the user keeps that in mind, it should work okay, too.

    Or one could go ahead and set completion-styles to just one (new) entry, with some hybrid matching algorithm that works like flx, but at least requires the first letter to match.

Member

dgutov commented Jan 22, 2014

The problem with flx is that to use it as intended, we'll have to feed it the full list of completions at point, matching an empty prefix (IOW, matching anything). In the general case, it will be slow, at two points:

  1. Transmitting the matches over the wire if we're using an external process. There can be a lot of them, they need to be encoded and decoded, and it could take some time.
  2. Even if all candidates are already known to Emacs (such as the case with completions in Emacs Lisp code), matching can still take a long time, because we do it in Elisp. If you try ido-ubiquitous together with flx-ido-mode, you can notice that commands like describe-function and describe-variable take a fraction of a second when you start typing to cache the matches table. The invocations after that a fast because they can use the cache, but that won't happen with an arbitrary collection of strings.

To sum up, some backends could do it (and starting with version 0.6.13, released recently, Company allows backends to generate matches using any kind of algorithm), but each backend would need to support that matching algorithm explicitly. Backends using an external process would perform the flex matching before sending the results to Emacs, and in-process backends with fixed completion sets (like Emacs Lisp one) would be able to take advantage of caching.

Some further notes:

  1. If you add company-capf to company-backends, or, better yet, just use Company with Emacs built from recent trunk, you can take advantage of partial-completion, the built-in matching algorithm that's one step stricter than flx (and, naturally, faster), but it still lets you save some keystrokes. It looks like this: http://i.imgur.com/USJgucj.png

    One can also define new matching algorithms by adding entries to completion-styles and completion-styles-alist, and writing a couple of new functions with predefined signatures.

  2. All symbols in your example use the initials-based matching. Maybe that's the algorithm you would prefer, instead of flx? It's easier to optimize as a general approach, because this way you at least have a first letter you can match against. Unlike flx, this algorithm should be more reasonable to use as a new completion style.

    There is a general limitation in the completion styles mechanism: because it's implemented in a not very efficient way, any completion style other than the first one is only used if all previous styles haven't yielded any matches. So if you define initials-completion as a new style and put it at the end of completion-styles, it'll only be used if the input has no prefix matches, etc. This is different from YouCompleteMe, which merges all matches, but as long as the user keeps that in mind, it should work okay, too.

    Or one could go ahead and set completion-styles to just one (new) entry, with some hybrid matching algorithm that works like flx, but at least requires the first letter to match.

@PythonNut

This comment has been minimized.

Show comment
Hide comment
@PythonNut

PythonNut Jan 22, 2014

Contributor

Thanks. I had feared that this was the case. Elisp is really too slow for this kind of thing (YCM uses a native binary). Ideally, the autocomplete library would have a source -> filter -> interface paradigm where flx could be used as the filter module, but that doesn't seem to exist anywhere. I'll look into the stuff you described. Perhaps flx is fast enough (the caching delay seems nonexistant on my system for describe function, and I have a function that iterates through all of the symbols in the file and computes an uncached flx score for each of them [used as a custom hippie-expand routine] that's also very fast, <0.1s on a 4k line file).

Just a few more ideas: [mostly notes to myself]

  • Treat first letter as always being the first letter
  • Inserting a character always reduces the match scope
  • Only flx match if the total matches are <500
  • We could really use true asynchronous elisp
  • Find an optimized .*x.*y.*z.* regex and filter by that first

Oh, and thanks for the capf/partial-completion info. That solves most of what I'm talking about and I'm very glad to see it become part of company-mode. Another idea I'll have to look into is just reusing the YCM engine (with it's authors permission) and write a frontend in emacs. It'll be a while before my lisp skills are awesome enough for that.

And your last point, what do you mean? I'm looking around for mentions of custom completion-styles but I can't find anything.

Thanks for your time and, by extension, company-mode.

PythonNut

Contributor

PythonNut commented Jan 22, 2014

Thanks. I had feared that this was the case. Elisp is really too slow for this kind of thing (YCM uses a native binary). Ideally, the autocomplete library would have a source -> filter -> interface paradigm where flx could be used as the filter module, but that doesn't seem to exist anywhere. I'll look into the stuff you described. Perhaps flx is fast enough (the caching delay seems nonexistant on my system for describe function, and I have a function that iterates through all of the symbols in the file and computes an uncached flx score for each of them [used as a custom hippie-expand routine] that's also very fast, <0.1s on a 4k line file).

Just a few more ideas: [mostly notes to myself]

  • Treat first letter as always being the first letter
  • Inserting a character always reduces the match scope
  • Only flx match if the total matches are <500
  • We could really use true asynchronous elisp
  • Find an optimized .*x.*y.*z.* regex and filter by that first

Oh, and thanks for the capf/partial-completion info. That solves most of what I'm talking about and I'm very glad to see it become part of company-mode. Another idea I'll have to look into is just reusing the YCM engine (with it's authors permission) and write a frontend in emacs. It'll be a while before my lisp skills are awesome enough for that.

And your last point, what do you mean? I'm looking around for mentions of custom completion-styles but I can't find anything.

Thanks for your time and, by extension, company-mode.

PythonNut

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Jan 22, 2014

Member

the autocomplete library would have a source -> filter -> interface paradigm

completion-styles works as a certain kind of filter. We could add something like that to Company, but I'm not convinced, because, like mentioned, this approach is not performance-oriented.

the caching delay seems nonexistant on my system

It depends on the system, of course. The caching delay in describe-function is quite small on my current laptop with Core i7, SSD and Linux, but it was more noticeable on an older Windows machine. And we do want to support older systems.

Find an optimized ._x._y.z. regex and filter by that first

That would be [^x]*x[^y]*y[^z]*z, used for example in ido-set-matches-1 for Flex matching in recent Emacs.

reusing the YCM engine (with it's authors permission) and write a frontend in emacs

Erm, maybe? I'd expect it to have to Vim-specific code, though.

I'm looking around for mentions of custom completion-styles but I can't find anything.

There's probably none existing, but you should be able to write a new one anyway. completion-styles is a defcustom, look at its docstring and the docstring of completion-styles-alist mentioned there. Of course, creating a new style will require one to read the code of some of the existing ones.

Though please let me know if you try and find it too hard. I might try to do it myself.

Thanks for your time and, by extension, company-mode.

No problem!

Member

dgutov commented Jan 22, 2014

the autocomplete library would have a source -> filter -> interface paradigm

completion-styles works as a certain kind of filter. We could add something like that to Company, but I'm not convinced, because, like mentioned, this approach is not performance-oriented.

the caching delay seems nonexistant on my system

It depends on the system, of course. The caching delay in describe-function is quite small on my current laptop with Core i7, SSD and Linux, but it was more noticeable on an older Windows machine. And we do want to support older systems.

Find an optimized ._x._y.z. regex and filter by that first

That would be [^x]*x[^y]*y[^z]*z, used for example in ido-set-matches-1 for Flex matching in recent Emacs.

reusing the YCM engine (with it's authors permission) and write a frontend in emacs

Erm, maybe? I'd expect it to have to Vim-specific code, though.

I'm looking around for mentions of custom completion-styles but I can't find anything.

There's probably none existing, but you should be able to write a new one anyway. completion-styles is a defcustom, look at its docstring and the docstring of completion-styles-alist mentioned there. Of course, creating a new style will require one to read the code of some of the existing ones.

Though please let me know if you try and find it too hard. I might try to do it myself.

Thanks for your time and, by extension, company-mode.

No problem!

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Jan 28, 2014

Member

By the way, note that completion-styles-alist already includes the initials style. It's not the same as flx, but you could try, for example, setting completino-styles to (initials basic partial-completion) and see how it works for you.

Member

dgutov commented Jan 28, 2014

By the way, note that completion-styles-alist already includes the initials style. It's not the same as flx, but you could try, for example, setting completino-styles to (initials basic partial-completion) and see how it works for you.

@sadboy

This comment has been minimized.

Show comment
Hide comment
@sadboy

sadboy Feb 27, 2014

Contributor

I've been itching to have something like this ever since watching that vim demo, so I just hacked together a quick toy example of a flx version of company-search/filter, basically just using flx-score to rank/filter candidates in company-search/filter-mode, with no backend modification: https://github.com/sadboy/company-mode/tree/flx
To use you need to type some prefix, then press a key to initiate search/filter (or no prefix and just company-manual-begin). Not as good as YouCompleteMe of course, but better than nothing (in most cases). If there's interest I can probably fashion this approach into some kind of stop-gap measure, at least until non-prefix matching can be implemented. However, as already pointed out, performance is definitely a concern. On my 3 year old Core i5-760, elisp completion with no prefix (listing all symbols), typing the first characters in search/filter mode causes a very noticeable delay. On a slower AMD E-350, it's much worse. The flx algorithm should really be done in C.

Contributor

sadboy commented Feb 27, 2014

I've been itching to have something like this ever since watching that vim demo, so I just hacked together a quick toy example of a flx version of company-search/filter, basically just using flx-score to rank/filter candidates in company-search/filter-mode, with no backend modification: https://github.com/sadboy/company-mode/tree/flx
To use you need to type some prefix, then press a key to initiate search/filter (or no prefix and just company-manual-begin). Not as good as YouCompleteMe of course, but better than nothing (in most cases). If there's interest I can probably fashion this approach into some kind of stop-gap measure, at least until non-prefix matching can be implemented. However, as already pointed out, performance is definitely a concern. On my 3 year old Core i5-760, elisp completion with no prefix (listing all symbols), typing the first characters in search/filter mode causes a very noticeable delay. On a slower AMD E-350, it's much worse. The flx algorithm should really be done in C.

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Feb 27, 2014

Member

using flx-score to rank/filter candidates in company-search/filter-mode

Have you tried to do the sorting in company-transformers?

Member

dgutov commented Feb 27, 2014

using flx-score to rank/filter candidates in company-search/filter-mode

Have you tried to do the sorting in company-transformers?

@sadboy

This comment has been minimized.

Show comment
Hide comment
@sadboy

sadboy Feb 27, 2014

Contributor

the initials style. It's not the same as flx

No, and unfortunately it's much inferior. flx matching basically allows you to just type whatever pops into your head and it zeros in on your desired target very quickly. I've just been using this for half a day and I'm already getting the can't-live-without feeling ;) I'm quite surprised nobody has yet bothered to incorporate this algorithm into Emacs proper.

Contributor

sadboy commented Feb 27, 2014

the initials style. It's not the same as flx

No, and unfortunately it's much inferior. flx matching basically allows you to just type whatever pops into your head and it zeros in on your desired target very quickly. I've just been using this for half a day and I'm already getting the can't-live-without feeling ;) I'm quite surprised nobody has yet bothered to incorporate this algorithm into Emacs proper.

@sadboy

This comment has been minimized.

Show comment
Hide comment
@sadboy

sadboy Feb 27, 2014

Contributor

Have you tried to do the sorting in company-transformers?

No, this is change is purely in company-search/filter mode, so it operates entirely in the frontend (after company-transformers has run)

Contributor

sadboy commented Feb 27, 2014

Have you tried to do the sorting in company-transformers?

No, this is change is purely in company-search/filter mode, so it operates entirely in the frontend (after company-transformers has run)

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Feb 27, 2014

Member

Wouldn't it work better in transformers? This doesn't sounds like a frontend feature to me.

The basic implementation would have to forego the highlighting, but maybe we could work out a convention for highlighting hints the tooltip could pick up and use. Or some new hook that would allow to additionally propertize each tooltip line.

Member

dgutov commented Feb 27, 2014

Wouldn't it work better in transformers? This doesn't sounds like a frontend feature to me.

The basic implementation would have to forego the highlighting, but maybe we could work out a convention for highlighting hints the tooltip could pick up and use. Or some new hook that would allow to additionally propertize each tooltip line.

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Feb 27, 2014

Member

On my 3 year old Core i5-760, elisp completion with no prefix (listing all symbols), typing the first characters in search/filter mode causes a very noticeable delay.

You could only do flx scoring when the number of candidates is below a certain threshold.

The flx algorithm should really be done in C.

There was a discussion on emacs-devel some time ago, the suggestion was to implement it in Lisp, and if it's too slow, come back and discuss optimizations. Le hasn't returned to that discussion yet.

Member

dgutov commented Feb 27, 2014

On my 3 year old Core i5-760, elisp completion with no prefix (listing all symbols), typing the first characters in search/filter mode causes a very noticeable delay.

You could only do flx scoring when the number of candidates is below a certain threshold.

The flx algorithm should really be done in C.

There was a discussion on emacs-devel some time ago, the suggestion was to implement it in Lisp, and if it's too slow, come back and discuss optimizations. Le hasn't returned to that discussion yet.

@PythonNut

This comment has been minimized.

Show comment
Hide comment
@PythonNut

PythonNut Feb 27, 2014

Contributor

Hm... I know flx-ido already bakes a threshold in there somewhere (with flx-ido-threshold). Also, Le seems to have more efficient methods to flx match a group (using caching etc.), but I'm not sure how to use them. The ELisp is not simple...

Contributor

PythonNut commented Feb 27, 2014

Hm... I know flx-ido already bakes a threshold in there somewhere (with flx-ido-threshold). Also, Le seems to have more efficient methods to flx match a group (using caching etc.), but I'm not sure how to use them. The ELisp is not simple...

@sadboy

This comment has been minimized.

Show comment
Hide comment
@sadboy

sadboy Feb 28, 2014

Contributor

Wouldn't it work better in transformers? This doesn't sounds like a frontend feature to me.

Yes of course, but that approach takes more effort. Also it does not conflict with what I'm doing here, unless you're thinking of removing company-search-mode altogether, so it's not an either-or thing. The company-search-mode feature has been there forever but is of limited usefulness in its current form, this simple change makes it much more useful. Implementing non-prefix matching and backend filtering (what you're referring to) can be separate effort.

The basic implementation would have to forego the highlighting

That would be unfortunate, as highlighting actually makes quite a big difference in the usability of this feature.

I think instead of querying the backends, a better approach is to redesign company-candidates, so that instead of being just a string, each element becomes a (STRING . PLIST) pair. This allows auxilliary information to be attached to each candidate, which can be used for things like sorting and/or formatting. One thing that could potentially be improved is combining the old alphabetical sorting and the new company-sort-by-occurance in to one sorting pass, instead of two independent sorting passes.

Contributor

sadboy commented Feb 28, 2014

Wouldn't it work better in transformers? This doesn't sounds like a frontend feature to me.

Yes of course, but that approach takes more effort. Also it does not conflict with what I'm doing here, unless you're thinking of removing company-search-mode altogether, so it's not an either-or thing. The company-search-mode feature has been there forever but is of limited usefulness in its current form, this simple change makes it much more useful. Implementing non-prefix matching and backend filtering (what you're referring to) can be separate effort.

The basic implementation would have to forego the highlighting

That would be unfortunate, as highlighting actually makes quite a big difference in the usability of this feature.

I think instead of querying the backends, a better approach is to redesign company-candidates, so that instead of being just a string, each element becomes a (STRING . PLIST) pair. This allows auxilliary information to be attached to each candidate, which can be used for things like sorting and/or formatting. One thing that could potentially be improved is combining the old alphabetical sorting and the new company-sort-by-occurance in to one sorting pass, instead of two independent sorting passes.

@sadboy

This comment has been minimized.

Show comment
Hide comment
@sadboy

sadboy Feb 28, 2014

Contributor

Le seems to have more efficient methods to flx match a group (using caching etc.)

Aha! That did the trick, same scenario on the i5 is now instantaneous after the first invocation. I had thought the caching was automatic, turns out you have to explicitly pass in the cache to use. Thanks for the tip!

Contributor

sadboy commented Feb 28, 2014

Le seems to have more efficient methods to flx match a group (using caching etc.)

Aha! That did the trick, same scenario on the i5 is now instantaneous after the first invocation. I had thought the caching was automatic, turns out you have to explicitly pass in the cache to use. Thanks for the tip!

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Feb 28, 2014

Member

a better approach is to redesign company-candidates, so that instead of being just a string, each element becomes a (STRING . PLIST) pair. This allows auxilliary information to be attached to each candidate, which can be used for things like sorting and/or formatting.

Why don't you just use propertize? See company-clang--parse-output and company-clang--meta, for example.

Member

dgutov commented Feb 28, 2014

a better approach is to redesign company-candidates, so that instead of being just a string, each element becomes a (STRING . PLIST) pair. This allows auxilliary information to be attached to each candidate, which can be used for things like sorting and/or formatting.

Why don't you just use propertize? See company-clang--parse-output and company-clang--meta, for example.

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Feb 28, 2014

Member

Yes of course, but that approach takes more effort.

From what I see of your current code, you're coupling company-mode to flx (and missing a require, by the way), which isn't something I would accept in the codebase (flx isn't in GNU ELPA, for one). If you're just experimenting for yourself, that's fine, of course.

And as for whether flx belongs in company-search-mode... I'm just not sure. We could make the matching method customizable, though.

backend filtering

What's that?

That would be unfortunate, as highlighting actually makes quite a big difference in the usability of this feature.

Well, it's just harder to do without using flx explicitly.

One thing that could potentially be improved is combining the old alphabetical sorting and the new company-sort-by-occurance in to one sorting pass, instead of two independent sorting passes.

I don't think that's desirable: the initial sorting is performed so that the duplicates can be removed quickly. If there are still duplicates when company-sort-by-occurrence is called... well, you've just slowed it down considerably.

Member

dgutov commented Feb 28, 2014

Yes of course, but that approach takes more effort.

From what I see of your current code, you're coupling company-mode to flx (and missing a require, by the way), which isn't something I would accept in the codebase (flx isn't in GNU ELPA, for one). If you're just experimenting for yourself, that's fine, of course.

And as for whether flx belongs in company-search-mode... I'm just not sure. We could make the matching method customizable, though.

backend filtering

What's that?

That would be unfortunate, as highlighting actually makes quite a big difference in the usability of this feature.

Well, it's just harder to do without using flx explicitly.

One thing that could potentially be improved is combining the old alphabetical sorting and the new company-sort-by-occurance in to one sorting pass, instead of two independent sorting passes.

I don't think that's desirable: the initial sorting is performed so that the duplicates can be removed quickly. If there are still duplicates when company-sort-by-occurrence is called... well, you've just slowed it down considerably.

@sadboy

This comment has been minimized.

Show comment
Hide comment
@sadboy

sadboy Feb 28, 2014

Contributor

Why don't you just use propertize? See company-clang--parse-output and company-clang--meta, for example.

That would of course be better, if the properties are guaranteed to survive, I thought there were places which replace or strip off text properties from the candidates.

Contributor

sadboy commented Feb 28, 2014

Why don't you just use propertize? See company-clang--parse-output and company-clang--meta, for example.

That would of course be better, if the properties are guaranteed to survive, I thought there were places which replace or strip off text properties from the candidates.

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Feb 28, 2014

Member

strip off text properties from the candidates.

Only in company--insert-candidate and potentially in company-reformat.

Tooltip just calls add-text-properties, and not on the candidates themselves anyway.

Member

dgutov commented Feb 28, 2014

strip off text properties from the candidates.

Only in company--insert-candidate and potentially in company-reformat.

Tooltip just calls add-text-properties, and not on the candidates themselves anyway.

@sadboy

This comment has been minimized.

Show comment
Hide comment
@sadboy

sadboy Feb 28, 2014

Contributor

which isn't something I would accept in the codebase

Oh, this isn't a pull request by any means, as I said it's just a quick proof-of-concept to see if it's worth my time pursuing further.

missing a require

It's in there, I just tucked it down with the search-mode code. Again, not code for review, just a prototype to play around with.

We could make the matching method customizable, though.

Or a completely separate mode, company-flx-mode, that isolates the dependencies on flx. BTW I'm not sure what the long-term prospects of that project are, it's been stale for quite sometime.

What's that?

Well to do it like in YCM you need to teach the backends that the "prefix" is no longer a prefix but a subsequence filter.

Well, it's just harder to do without using flx explicitly.

Why? It's only a set of markers to delimit the subsequences to highlight. You can have a general convention that is understood by company-fill-propertize, and the company-flx adapter can translate the results from flx if needed.

so that the duplicates can be removed

Oh right, I forgot about deduping. Still, company-sort-by-occurance and this flx ranking thing (if implemented in the future) can probably be combined into one pass.

Contributor

sadboy commented Feb 28, 2014

which isn't something I would accept in the codebase

Oh, this isn't a pull request by any means, as I said it's just a quick proof-of-concept to see if it's worth my time pursuing further.

missing a require

It's in there, I just tucked it down with the search-mode code. Again, not code for review, just a prototype to play around with.

We could make the matching method customizable, though.

Or a completely separate mode, company-flx-mode, that isolates the dependencies on flx. BTW I'm not sure what the long-term prospects of that project are, it's been stale for quite sometime.

What's that?

Well to do it like in YCM you need to teach the backends that the "prefix" is no longer a prefix but a subsequence filter.

Well, it's just harder to do without using flx explicitly.

Why? It's only a set of markers to delimit the subsequences to highlight. You can have a general convention that is understood by company-fill-propertize, and the company-flx adapter can translate the results from flx if needed.

so that the duplicates can be removed

Oh right, I forgot about deduping. Still, company-sort-by-occurance and this flx ranking thing (if implemented in the future) can probably be combined into one pass.

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Feb 28, 2014

Member

Still, company-sort-by-occurance and this flx ranking thing (if implemented in the future) can probably be combined into one pass.

And then you don't have the flexibility of company-transformers anymore.
There'd have to be a very compelling performance improvement evidence.

Member

dgutov commented Feb 28, 2014

Still, company-sort-by-occurance and this flx ranking thing (if implemented in the future) can probably be combined into one pass.

And then you don't have the flexibility of company-transformers anymore.
There'd have to be a very compelling performance improvement evidence.

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Feb 28, 2014

Member

Well to do it like in YCM you need to teach the backends that the "prefix" is no longer a prefix but a subsequence filter.

Forcing each backend to use flx is definitely not an option. But otherwise non-prefix matching works, the backend just has to return the right candidates and disable company caching.

Member

dgutov commented Feb 28, 2014

Well to do it like in YCM you need to teach the backends that the "prefix" is no longer a prefix but a subsequence filter.

Forcing each backend to use flx is definitely not an option. But otherwise non-prefix matching works, the backend just has to return the right candidates and disable company caching.

@sadboy

This comment has been minimized.

Show comment
Hide comment
@sadboy

sadboy Feb 28, 2014

Contributor

Forcing each backend to use flx is definitely not an option. But otherwise non-prefix matching works, the backend just have to return the right candidates and disable company caching.

OK, but how would they know what are the right candidates to return in a non-prefix matching scenario?

Contributor

sadboy commented Feb 28, 2014

Forcing each backend to use flx is definitely not an option. But otherwise non-prefix matching works, the backend just have to return the right candidates and disable company caching.

OK, but how would they know what are the right candidates to return in a non-prefix matching scenario?

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Feb 28, 2014

Member

Each one can provide a backend-specific user option. Or use completion-styles, like company-capf does.

Member

dgutov commented Feb 28, 2014

Each one can provide a backend-specific user option. Or use completion-styles, like company-capf does.

@sadboy

This comment has been minimized.

Show comment
Hide comment
@sadboy

sadboy Mar 1, 2014

Contributor

There's have to be a very compelling performance improvement evidence.

Yep, just ran some quick benchmarks, and the overhead of a couple of sorting runs over 10k candidates is practically negligible. That is good, one less thing to worry about.

Contributor

sadboy commented Mar 1, 2014

There's have to be a very compelling performance improvement evidence.

Yep, just ran some quick benchmarks, and the overhead of a couple of sorting runs over 10k candidates is practically negligible. That is good, one less thing to worry about.

@sadboy

This comment has been minimized.

Show comment
Hide comment
@sadboy

sadboy Mar 1, 2014

Contributor

Each one can provide a backend-specific user option. Or use completion-styles, like company-capf does.

OK, from what I see so far, here's a list of tasks that should to be done to achieve this feature:

  1. Decouple visualization from company-search-string, instead, use a highlight property on the candidate to indicate which portions of candidate to highlight.
  2. Make backends completion-styles-aware.
  3. Implement new subsequence completion style.
  4. A new transformer company-sort-by-flx-score.
  5. A company-flx-search mode as an alternative to company-search, or refactor company-search to use custom matcher and sorter (former option is much simpler).

4 and 5 are dependant on flx so can be isolated in a separate company-flx package. 2 probably takes the most effort.

Is this in line with what you have in mind?

Contributor

sadboy commented Mar 1, 2014

Each one can provide a backend-specific user option. Or use completion-styles, like company-capf does.

OK, from what I see so far, here's a list of tasks that should to be done to achieve this feature:

  1. Decouple visualization from company-search-string, instead, use a highlight property on the candidate to indicate which portions of candidate to highlight.
  2. Make backends completion-styles-aware.
  3. Implement new subsequence completion style.
  4. A new transformer company-sort-by-flx-score.
  5. A company-flx-search mode as an alternative to company-search, or refactor company-search to use custom matcher and sorter (former option is much simpler).

4 and 5 are dependant on flx so can be isolated in a separate company-flx package. 2 probably takes the most effort.

Is this in line with what you have in mind?

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Mar 2, 2014

Member

Make backends completion-styles-aware.

I'm not sure all styles make sense for all backends. Would you use substring matching with dabbrev, for example?

And the biggest problem will be with backends that use external services. For example, retrieving a list of completions with empty-string prefix on target of unknown type is pretty slow in Robe (it does a full scan). But unless you have that list, you can't do substring filtering (except using the -search interface).

Anyway, no real need to move in that direction now. We'll get completion-styles support for free when all/most backends will be turned into completion-at-point-function fns. Which is the current consensus on emacs-devel.

No sure yet what to do about the efficiency concerns: obviously, using the implementations of different styles inside external services would be a lot faster in certain cases. Maybe completion functions would be able to say "no need to apply styles, I'm handling them myself".

Implement new subsequence completion style.

There's an existing substring style, not enabled by default. But you probably mean something like ido's flex, right?

A new transformer company-sort-by-flx-score.

It would use the search string as input, right? Then you'll only need to calculate the scores once, do the searching and adding of the highlight property (probably will have to to use propertize, not add-text-properties, because the strings might be used elsewhere).

Adding the highlight to every candidate could be premature, though: only a few of them are rendered at a time. Maybe just attach the raw information returned by flx, and then do the highlighting in a hook, somewhere in company-fill-propertize. Need to do some measurements.

A company-flx-search mode

At the moment, the search mode does no ordering, it just jumps between matches. If the matches are ordered, there's no real need to jump between them, they're all at the top.

Maybe do a mode with different semantics, like company-search-kill-others? It would sort the matches, and also update company-candidates-predicate on the fly.

Member

dgutov commented Mar 2, 2014

Make backends completion-styles-aware.

I'm not sure all styles make sense for all backends. Would you use substring matching with dabbrev, for example?

And the biggest problem will be with backends that use external services. For example, retrieving a list of completions with empty-string prefix on target of unknown type is pretty slow in Robe (it does a full scan). But unless you have that list, you can't do substring filtering (except using the -search interface).

Anyway, no real need to move in that direction now. We'll get completion-styles support for free when all/most backends will be turned into completion-at-point-function fns. Which is the current consensus on emacs-devel.

No sure yet what to do about the efficiency concerns: obviously, using the implementations of different styles inside external services would be a lot faster in certain cases. Maybe completion functions would be able to say "no need to apply styles, I'm handling them myself".

Implement new subsequence completion style.

There's an existing substring style, not enabled by default. But you probably mean something like ido's flex, right?

A new transformer company-sort-by-flx-score.

It would use the search string as input, right? Then you'll only need to calculate the scores once, do the searching and adding of the highlight property (probably will have to to use propertize, not add-text-properties, because the strings might be used elsewhere).

Adding the highlight to every candidate could be premature, though: only a few of them are rendered at a time. Maybe just attach the raw information returned by flx, and then do the highlighting in a hook, somewhere in company-fill-propertize. Need to do some measurements.

A company-flx-search mode

At the moment, the search mode does no ordering, it just jumps between matches. If the matches are ordered, there's no real need to jump between them, they're all at the top.

Maybe do a mode with different semantics, like company-search-kill-others? It would sort the matches, and also update company-candidates-predicate on the fly.

@wasamasa

This comment has been minimized.

Show comment
Hide comment
@wasamasa

wasamasa Mar 10, 2014

Anyway, no real need to move in that direction now. We'll get completion-styles support for free when all/most backends will be turned into completion-at-point-function fns. Which is the current consensus on emacs-devel.

How do you plan to do that? Will you try adding a "proper" completion-at-point-function upstream or will you make every backend add its own to the major mode's completion-at-point-functions ?

wasamasa commented Mar 10, 2014

Anyway, no real need to move in that direction now. We'll get completion-styles support for free when all/most backends will be turned into completion-at-point-function fns. Which is the current consensus on emacs-devel.

How do you plan to do that? Will you try adding a "proper" completion-at-point-function upstream or will you make every backend add its own to the major mode's completion-at-point-functions ?

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Mar 10, 2014

Member

or will you make every backend add its own to the major mode's completion-at-point-functions ?

This, probably. Maybe not every backend, but the ones left behind won't get the completion-styles support. Doing only one completion-at-point-function would be essentially a double conversion, so it's not a direction I'd like the official code to go.

In the meantime, you could implement the "one c-a-p-f function" approach yourself, though (for your own setup, or at least to be packaged separately). Should be relatively easy.

Member

dgutov commented Mar 10, 2014

or will you make every backend add its own to the major mode's completion-at-point-functions ?

This, probably. Maybe not every backend, but the ones left behind won't get the completion-styles support. Doing only one completion-at-point-function would be essentially a double conversion, so it's not a direction I'd like the official code to go.

In the meantime, you could implement the "one c-a-p-f function" approach yourself, though (for your own setup, or at least to be packaged separately). Should be relatively easy.

@lewang

This comment has been minimized.

Show comment
Hide comment
@lewang

lewang Apr 7, 2014

Hi all. Do y'all need anything from me?

lewang commented Apr 7, 2014

Hi all. Do y'all need anything from me?

@wasamasa

This comment has been minimized.

Show comment
Hide comment
@wasamasa

wasamasa Apr 21, 2014

It occurred to me that one of the original suggestions to use a fast binary that does the matching part and communicates with Emacs could be done by writing a new completion style in 24.4. That way it stays optional, doesn't interfere with company and should work for people who are willing to drop an extra binary they've compiled into their $PATH. How does that sound?

wasamasa commented Apr 21, 2014

It occurred to me that one of the original suggestions to use a fast binary that does the matching part and communicates with Emacs could be done by writing a new completion style in 24.4. That way it stays optional, doesn't interfere with company and should work for people who are willing to drop an extra binary they've compiled into their $PATH. How does that sound?

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Apr 21, 2014

Member

The original suggestion was to implement the matching code in C, but not in external program. That could be an option, too, but someone will have to see whether the overhead is small enough, and if it makes caching unnecessary (or whether to launch that external program once and then keep it running).

Member

dgutov commented Apr 21, 2014

The original suggestion was to implement the matching code in C, but not in external program. That could be an option, too, but someone will have to see whether the overhead is small enough, and if it makes caching unnecessary (or whether to launch that external program once and then keep it running).

@lewang

This comment has been minimized.

Show comment
Hide comment
@lewang

lewang Apr 21, 2014

The original suggestion was to implement the matching code in C, but not in external program. That could be an option, too,

Yep, I originally wanted to do it in C within Emacs. I'm not opposed to trying it out as an external program though. Do you guys know of an example of such a program that interacts with Emacs where speed is of the utmost concern?

The reason that it's not in C is because of time constraints and my unfamiliarity with the Emacs code base. The emacs-lisp version also proved to be fast enough for the vast majority of use-cases. Although to be fair, I don't use flx day-to-day. My impression is that many people do though.

but someone will have to see whether the overhead is small enough, and if it makes caching unnecessary (or whether to launch that external program once and then keep it running).

The interfacing with company-mode and fast searching are separate issues. When I tried with flx and ido, it was fast enough for ~10k items. Why not work on the interface first?

lewang commented Apr 21, 2014

The original suggestion was to implement the matching code in C, but not in external program. That could be an option, too,

Yep, I originally wanted to do it in C within Emacs. I'm not opposed to trying it out as an external program though. Do you guys know of an example of such a program that interacts with Emacs where speed is of the utmost concern?

The reason that it's not in C is because of time constraints and my unfamiliarity with the Emacs code base. The emacs-lisp version also proved to be fast enough for the vast majority of use-cases. Although to be fair, I don't use flx day-to-day. My impression is that many people do though.

but someone will have to see whether the overhead is small enough, and if it makes caching unnecessary (or whether to launch that external program once and then keep it running).

The interfacing with company-mode and fast searching are separate issues. When I tried with flx and ido, it was fast enough for ~10k items. Why not work on the interface first?

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Apr 21, 2014

Member

Why not work on the interface first?

You (or anyone else) are welcome to implement the relevant completion-styles-alist elements. Should be relatively easy.

Member

dgutov commented Apr 21, 2014

Why not work on the interface first?

You (or anyone else) are welcome to implement the relevant completion-styles-alist elements. Should be relatively easy.

@lewang

This comment has been minimized.

Show comment
Hide comment
@lewang

lewang Apr 21, 2014

I can do that. Although timing is open. 😸

Track completion style progress here: lewang/flx/issues/54

lewang commented Apr 21, 2014

I can do that. Although timing is open. 😸

Track completion style progress here: lewang/flx/issues/54

@dgutov dgutov added the wishlist label Jun 7, 2014

@lazywei

This comment has been minimized.

Show comment
Hide comment
@lazywei

lazywei Jun 17, 2014

Really need this!

lazywei commented Jun 17, 2014

Really need this!

@PythonNut

This comment has been minimized.

Show comment
Hide comment
@PythonNut

PythonNut Oct 13, 2014

Contributor

Just a minor note, it occurred to me that we could build the completions cache for a lot of sources when the buffer is created. (Incurring minimal interruptions later). An example would be the lisp symbol completion, which is known almost entirely at buffer creation time.

Contributor

PythonNut commented Oct 13, 2014

Just a minor note, it occurred to me that we could build the completions cache for a lot of sources when the buffer is created. (Incurring minimal interruptions later). An example would be the lisp symbol completion, which is known almost entirely at buffer creation time.

@PythonNut

This comment has been minimized.

Show comment
Hide comment
@PythonNut

PythonNut Jun 29, 2015

Contributor

Why not work on the interface first?

You (or anyone else) are welcome to implement the relevant completion-styles-alist elements. Should be relatively easy.

That part is now done, and in fact it doesn't require flx, because completion-style does not control sorting.

@dgutov The systems that control sorting in vanilla minibuffer completion don't seem to be used by company. How can we now add a custom sorting function?

Contributor

PythonNut commented Jun 29, 2015

Why not work on the interface first?

You (or anyone else) are welcome to implement the relevant completion-styles-alist elements. Should be relatively easy.

That part is now done, and in fact it doesn't require flx, because completion-style does not control sorting.

@dgutov The systems that control sorting in vanilla minibuffer completion don't seem to be used by company. How can we now add a custom sorting function?

@PythonNut PythonNut changed the title from [feature request] flx (fuzzy) matching to [feature request] flx (flex/fuzzy) matching Jun 29, 2015

@PythonNut PythonNut changed the title from [feature request] flx (flex/fuzzy) matching to [feature request] Flex/Fuzzy matching Jun 29, 2015

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Jun 29, 2015

Member

How can we now add a custom sorting function?

Via company-transformers, for example. If you want it to apply to all backends.

Member

dgutov commented Jun 29, 2015

How can we now add a custom sorting function?

Via company-transformers, for example. If you want it to apply to all backends.

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Jun 29, 2015

Member

The systems that control sorting in vanilla minibuffer completion

What are those, by the way?

Member

dgutov commented Jun 29, 2015

The systems that control sorting in vanilla minibuffer completion

What are those, by the way?

@PythonNut

This comment has been minimized.

Show comment
Hide comment
@PythonNut

PythonNut Jun 30, 2015

Contributor

@dgutov they're pretty ugly, and I admit, I still don't quite understand how they work.

  • Displayed completions are sorted in minibuffer-completion-help
  • When cycling, sorting is defined in completion-all-sorted-completions

In both cases, the sorting comparison function is extracted from the completion metadata:

(let* ((completion--metadata (buffer-substring-no-properties
                                            start (point))
                                           base-size md
                                           minibuffer-completion-table
                                           minibuffer-completion-predicate)
    (sort-fun (completion-metadata-get all-md 'cycle-sort-function)))
    ...)

And the metadata comes from minibuffer-completion-table.

Contributor

PythonNut commented Jun 30, 2015

@dgutov they're pretty ugly, and I admit, I still don't quite understand how they work.

  • Displayed completions are sorted in minibuffer-completion-help
  • When cycling, sorting is defined in completion-all-sorted-completions

In both cases, the sorting comparison function is extracted from the completion metadata:

(let* ((completion--metadata (buffer-substring-no-properties
                                            start (point))
                                           base-size md
                                           minibuffer-completion-table
                                           minibuffer-completion-predicate)
    (sort-fun (completion-metadata-get all-md 'cycle-sort-function)))
    ...)

And the metadata comes from minibuffer-completion-table.

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Jun 30, 2015

Member

Oh right. company-capf supports display-sort-function, but it's defined for each completion function separately. Or "completion table", which is basically the same.

Member

dgutov commented Jun 30, 2015

Oh right. company-capf supports display-sort-function, but it's defined for each completion function separately. Or "completion table", which is basically the same.

@PythonNut

This comment has been minimized.

Show comment
Hide comment
@PythonNut

PythonNut Jun 30, 2015

Contributor

I'm having good success with:

(setq
    company-flx-cache (flx-make-string-cache 'flx-get-heatmap-str)
    company-transformers
    (list
      (lambda (cands)
        (let ((num-cands (length cands)))
          (mapcar #'car
            (cl-sort (mapcar
                       (lambda (cand)
                         (cons
                           cand
                           (or (car (flx-score cand
                                      company-prefix
                                      company-flx-cache))
                             0)))
                       (cl-subseq (cl-sort cands
                                    #'<
                                    :key #'length)
                         0
                         (min 500 num-cands)))
              #'>
              :key #'cdr)))))

In general, I'll guess that you don't want an exhaustive list of candidates if you're using fuzzy matching as the candidates quickly descend into chaos as the flx-scores decrease.

500 yields near-instantaneous performance for me. Of course it is possible in theory for a good match that is very long to slip through sort and truncate, but it's not a problem because: 1. it's very rare, and 2. you can always just type another character to help narrow down the candidates.

I also have an optimization to the completion style, but I'm not quite sure where to put it since the code itself is mostly someone else's.

Contributor

PythonNut commented Jun 30, 2015

I'm having good success with:

(setq
    company-flx-cache (flx-make-string-cache 'flx-get-heatmap-str)
    company-transformers
    (list
      (lambda (cands)
        (let ((num-cands (length cands)))
          (mapcar #'car
            (cl-sort (mapcar
                       (lambda (cand)
                         (cons
                           cand
                           (or (car (flx-score cand
                                      company-prefix
                                      company-flx-cache))
                             0)))
                       (cl-subseq (cl-sort cands
                                    #'<
                                    :key #'length)
                         0
                         (min 500 num-cands)))
              #'>
              :key #'cdr)))))

In general, I'll guess that you don't want an exhaustive list of candidates if you're using fuzzy matching as the candidates quickly descend into chaos as the flx-scores decrease.

500 yields near-instantaneous performance for me. Of course it is possible in theory for a good match that is very long to slip through sort and truncate, but it's not a problem because: 1. it's very rare, and 2. you can always just type another character to help narrow down the candidates.

I also have an optimization to the completion style, but I'm not quite sure where to put it since the code itself is mostly someone else's.

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Jun 30, 2015

Member

This looks good. company can't depend on flx though, so you can either publish it as a package, or put the function somewhere in the wiki maybe.

Member

dgutov commented Jun 30, 2015

This looks good. company can't depend on flx though, so you can either publish it as a package, or put the function somewhere in the wiki maybe.

@PythonNut

This comment has been minimized.

Show comment
Hide comment
@PythonNut

PythonNut Jun 30, 2015

Contributor

How will we reimplement company-dabbrev?

Contributor

PythonNut commented Jun 30, 2015

How will we reimplement company-dabbrev?

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Jun 30, 2015

Member

It needs some other work, but in the end it should become a completion-at-point-functions element as well.

Member

dgutov commented Jun 30, 2015

It needs some other work, but in the end it should become a completion-at-point-functions element as well.

@PythonNut

This comment has been minimized.

Show comment
Hide comment
@PythonNut

PythonNut Jun 30, 2015

Contributor

It should become a completion-at-point-functions element as well.

Is that the plan for all of the backends?

Contributor

PythonNut commented Jun 30, 2015

It should become a completion-at-point-functions element as well.

Is that the plan for all of the backends?

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Jun 30, 2015

Member

In the long run, yes.

Member

dgutov commented Jun 30, 2015

In the long run, yes.

@bbatsov

This comment has been minimized.

Show comment
Hide comment
@bbatsov

bbatsov Jul 4, 2015

Contributor

@dgutov I'd also like to see some development here (and not only me). Plenty of folks feel that the inability to deal with any sort of non-prefix candidates if the only major shortcoming for company. Can't we just handle the case when the backend handles the filtering of the candidates. E.g. CIDER supports both matching of word starts (wots -> with-output-to-string) and a true fuzzy matching, but the candidates returned by the backend are simply ignored by the UI, as they don't share the completion prefix. I'm thinking a simple solution be to just expand such leading characters into a full prefix when you receive some response from the backend. //cc @alexander-yakushev

Contributor

bbatsov commented Jul 4, 2015

@dgutov I'd also like to see some development here (and not only me). Plenty of folks feel that the inability to deal with any sort of non-prefix candidates if the only major shortcoming for company. Can't we just handle the case when the backend handles the filtering of the candidates. E.g. CIDER supports both matching of word starts (wots -> with-output-to-string) and a true fuzzy matching, but the candidates returned by the backend are simply ignored by the UI, as they don't share the completion prefix. I'm thinking a simple solution be to just expand such leading characters into a full prefix when you receive some response from the backend. //cc @alexander-yakushev

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Jul 4, 2015

Member

@bbatsov External fuzzy matching is a bit of a different issue.

Bad news there: completion-at-point doesn't support fuzzy matching performed by the external process. You can see that, for instance, by using M-x completion-at-point: there won't be any fuzzy matches there; and company-capf reuses its filtering logic.

Even if the completion function returns fuzzy matches, the "completion table" API obligates the consumer to call all-completions on it (because, for one thing, the table might not be pre-filtered). We've had a long-ish conversation with Stefan about it, but it's sort of hanging unfinished now.

So, as I see, your options are:

  • Create a new CIDER Company backend, in addition to the completion-at-point-functions element. You can support fuzzy matching in it.
  • File an Emacs bug for completion-at-point, asking to support "fuzzy" completion tables. I'll try to, at least, compile the highlights of the related discussion and post them in it. But naturally, anything that comes of that won't help in the currently released Emacs versions.

I'm thinking a simple solution be to just expand such leading characters into a full prefix when you receive some response from the backend.

That might also work, but it will look ugly in the completion popup.

Member

dgutov commented Jul 4, 2015

@bbatsov External fuzzy matching is a bit of a different issue.

Bad news there: completion-at-point doesn't support fuzzy matching performed by the external process. You can see that, for instance, by using M-x completion-at-point: there won't be any fuzzy matches there; and company-capf reuses its filtering logic.

Even if the completion function returns fuzzy matches, the "completion table" API obligates the consumer to call all-completions on it (because, for one thing, the table might not be pre-filtered). We've had a long-ish conversation with Stefan about it, but it's sort of hanging unfinished now.

So, as I see, your options are:

  • Create a new CIDER Company backend, in addition to the completion-at-point-functions element. You can support fuzzy matching in it.
  • File an Emacs bug for completion-at-point, asking to support "fuzzy" completion tables. I'll try to, at least, compile the highlights of the related discussion and post them in it. But naturally, anything that comes of that won't help in the currently released Emacs versions.

I'm thinking a simple solution be to just expand such leading characters into a full prefix when you receive some response from the backend.

That might also work, but it will look ugly in the completion popup.

@PythonNut

This comment has been minimized.

Show comment
Hide comment
@PythonNut

PythonNut Jul 17, 2015

Contributor

I suppose the next question (other than the rearchitecture around capf) is how to support fuzzy highlighting in company. Is there any way to do that easily?

Contributor

PythonNut commented Jul 17, 2015

I suppose the next question (other than the rearchitecture around capf) is how to support fuzzy highlighting in company. Is there any way to do that easily?

@bbatsov

This comment has been minimized.

Show comment
Hide comment
@bbatsov

bbatsov Jul 17, 2015

Contributor

You you can take some pointers from ido-flx I guess. There's highlighting of the matching characters there, which looks pretty good.

Contributor

bbatsov commented Jul 17, 2015

You you can take some pointers from ido-flx I guess. There's highlighting of the matching characters there, which looks pretty good.

@PythonNut

This comment has been minimized.

Show comment
Hide comment
@PythonNut

PythonNut Jul 17, 2015

Contributor

@bbatsov I mean, how do we put arbitrary highlighting on company candidates?

Contributor

PythonNut commented Jul 17, 2015

@bbatsov I mean, how do we put arbitrary highlighting on company candidates?

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Jul 17, 2015

Member

That might also be solved via a change to completion-at-point-funtions API. And a tweak to the handling of the match action in company-backends.

Member

dgutov commented Jul 17, 2015

That might also be solved via a change to completion-at-point-funtions API. And a tweak to the handling of the match action in company-backends.

@Fuco1

This comment has been minimized.

Show comment
Hide comment
@Fuco1

Fuco1 Jul 30, 2015

I haven't read all the discussion, but the part about a C external program caught my attention.

Some month ago I wrote https://github.com/Fuco1/yafmp which is something like flx but

  • In C (maybe not an advantage :D)
  • Uses much faster scoring algorithm (O(mn) dynamic programming versus O(mnumber-of-substring-matches) which grows exponential). Note that the logic behind is roughly the same (substr bonus is not implemented yet, but that can be added easily), so the quality of the matches is comparable to flx already.
  • Can be made to operate in a server mode (not yet implemented), when it caches the heatmaps and you just feed it lists of strings and it ranks them. On my tests I could run 10k strings under 100ms, and that wasn't even very optimized.

Having said that, flx is actually really fast in elisp as well, and could be made much faster using vectors instead of lists. The caching works per "candidate" not "group", so over time it builds the cache for all the strings you are getting back from your backend (each candidate is considered separately, this is the crucial insight). Plus, if speed is really a concern, one could pre-filter and only kick in ranking later (note that prefiltering could be done without regexp alltogether just looking for a subsequence match, which I'm sure could be written to be much faster than regexp as well).

The only real issue I see here which was menitioned is that the backend can take a long time to generate the candidates.

Fuco1 commented Jul 30, 2015

I haven't read all the discussion, but the part about a C external program caught my attention.

Some month ago I wrote https://github.com/Fuco1/yafmp which is something like flx but

  • In C (maybe not an advantage :D)
  • Uses much faster scoring algorithm (O(mn) dynamic programming versus O(mnumber-of-substring-matches) which grows exponential). Note that the logic behind is roughly the same (substr bonus is not implemented yet, but that can be added easily), so the quality of the matches is comparable to flx already.
  • Can be made to operate in a server mode (not yet implemented), when it caches the heatmaps and you just feed it lists of strings and it ranks them. On my tests I could run 10k strings under 100ms, and that wasn't even very optimized.

Having said that, flx is actually really fast in elisp as well, and could be made much faster using vectors instead of lists. The caching works per "candidate" not "group", so over time it builds the cache for all the strings you are getting back from your backend (each candidate is considered separately, this is the crucial insight). Plus, if speed is really a concern, one could pre-filter and only kick in ranking later (note that prefiltering could be done without regexp alltogether just looking for a subsequence match, which I'm sure could be written to be much faster than regexp as well).

The only real issue I see here which was menitioned is that the backend can take a long time to generate the candidates.

@bbatsov

This comment has been minimized.

Show comment
Hide comment
@bbatsov

bbatsov Jul 30, 2015

Contributor

Let's add @lewang to this discussion as well.

Contributor

bbatsov commented Jul 30, 2015

Let's add @lewang to this discussion as well.

@lewang

This comment has been minimized.

Show comment
Hide comment
@lewang

lewang Jul 30, 2015

I welcome all innovation in this field. The reality is I use helm without fuzzy matching. The interface is just too usable.

I've made several efforts to deeply integrate flx into helm and sadly have been distracted/discouraged somewhere along the way each time (working with Thierry is not the easiest, and I don't think he really sees the value of fuzzy matching).

@Fuco1 I haven't tried yafmp yet, but it could be another way forward. I will add that in actual usage, the ability for substring matches to overtake abbreviation matches is important. It means that you rarely have to change parts of the query already typed.

I would also welcome any PRs or ideas to improve performance of flx. IIRC I stopped tweaking with it when it was fast enough in the majority of the use cases.

lewang commented Jul 30, 2015

I welcome all innovation in this field. The reality is I use helm without fuzzy matching. The interface is just too usable.

I've made several efforts to deeply integrate flx into helm and sadly have been distracted/discouraged somewhere along the way each time (working with Thierry is not the easiest, and I don't think he really sees the value of fuzzy matching).

@Fuco1 I haven't tried yafmp yet, but it could be another way forward. I will add that in actual usage, the ability for substring matches to overtake abbreviation matches is important. It means that you rarely have to change parts of the query already typed.

I would also welcome any PRs or ideas to improve performance of flx. IIRC I stopped tweaking with it when it was fast enough in the majority of the use cases.

@Fuco1

This comment has been minimized.

Show comment
Hide comment
@Fuco1

Fuco1 Jul 30, 2015

I've made several efforts to deeply integrate flx into helm and sadly have been distracted/discouraged somewhere along the way each time (working with Thierry is not the easiest, and I don't think he really sees the value of fuzzy matching).

Which is one of the reasons I wrote sallet... and there is also ivy and other packages where people got tired of helm's rather peculiar philosopy. Helm is too ad-hoc on a lot of places, and tbh, I find it not too usable. But let's not get offtopic.

@lewang yafmp (as the name suggests) is mostly a proof of concept thing. I took the algorithm from flx (the heatmaps mostly, that is genius :)) and reworked the matching/scoring part, where I don't iterate over all matches but instead build the score incrementally in time proportional only to the length of the pattern, not number of matches.

I tried this in elisp too but elisp has really bad data structures for this kind of programming... you could do it with vectors but I haven't had the nerve to try and port it over :)

Having said that, in practice it rarely happens that the growth is exponential, so that is of relatively minor concern (it blows mostly if the candidates are long and repetitive). The C version is still thousands of times faster, but being an external program is a major PITA to distribute, so unless we could get it inside emacs I don't see that as a viable option.

Fuco1 commented Jul 30, 2015

I've made several efforts to deeply integrate flx into helm and sadly have been distracted/discouraged somewhere along the way each time (working with Thierry is not the easiest, and I don't think he really sees the value of fuzzy matching).

Which is one of the reasons I wrote sallet... and there is also ivy and other packages where people got tired of helm's rather peculiar philosopy. Helm is too ad-hoc on a lot of places, and tbh, I find it not too usable. But let's not get offtopic.

@lewang yafmp (as the name suggests) is mostly a proof of concept thing. I took the algorithm from flx (the heatmaps mostly, that is genius :)) and reworked the matching/scoring part, where I don't iterate over all matches but instead build the score incrementally in time proportional only to the length of the pattern, not number of matches.

I tried this in elisp too but elisp has really bad data structures for this kind of programming... you could do it with vectors but I haven't had the nerve to try and port it over :)

Having said that, in practice it rarely happens that the growth is exponential, so that is of relatively minor concern (it blows mostly if the candidates are long and repetitive). The C version is still thousands of times faster, but being an external program is a major PITA to distribute, so unless we could get it inside emacs I don't see that as a viable option.

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Jul 30, 2015

Member

unless we could get it inside emacs I don't see that as a viable option

We could, though someone should double-check that the piping overhead does not offset the performance gains too much.

On the other hand, like you said, flx is pretty fast already. Maybe it could incorporate the changes from your scoring algorithm, too.

Member

dgutov commented Jul 30, 2015

unless we could get it inside emacs I don't see that as a viable option

We could, though someone should double-check that the piping overhead does not offset the performance gains too much.

On the other hand, like you said, flx is pretty fast already. Maybe it could incorporate the changes from your scoring algorithm, too.

@Fuco1

This comment has been minimized.

Show comment
Hide comment
@Fuco1

Fuco1 Jul 30, 2015

We could, though someone should double-check that the piping overhead does not offset the performance gains too much.

I ment that it would really be a part of emacs, exposed as a function string-score-flx (or some better name, indicating the algorithm used), so there would be no pipe involved.

On the other hand, like you said, flx is pretty fast already. Maybe it could incorporate the changes from your scoring algorithm, too.

That's definitely possible, I might get back to the idea of porting it there.

Fuco1 commented Jul 30, 2015

We could, though someone should double-check that the piping overhead does not offset the performance gains too much.

I ment that it would really be a part of emacs, exposed as a function string-score-flx (or some better name, indicating the algorithm used), so there would be no pipe involved.

On the other hand, like you said, flx is pretty fast already. Maybe it could incorporate the changes from your scoring algorithm, too.

That's definitely possible, I might get back to the idea of porting it there.

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Jul 30, 2015

Member

it would really be a part of emacs, exposed as a function

Oh, that's also possible indeed.

Member

dgutov commented Jul 30, 2015

it would really be a part of emacs, exposed as a function

Oh, that's also possible indeed.

@PythonNut

This comment has been minimized.

Show comment
Hide comment
@PythonNut

PythonNut Jul 30, 2015

Contributor

First, I would absolutely love an even faster Elisp fuzzy matching method, (be it an improvement on flx or a new system entirely).

But one thing to definitely point out is that the fuzzy scoring runs on a sliding scale. If it's too slow, we can trivially back off on the number of candidates to consider. The strategy of sorting the candidates by length and only sorting the first n of them seems to work very well. On fast machines, we might have n=500 and on slow machines, we might have n=50. In some cases (esp, when n<50), it can keep flx from looking far into the future (that is, small inputs with very long matches). Faster scoring means a proportionally larger n.

Here's a fun plot based purely on casual empirical observations.
figure_1

But this is all relatively secondary. Fast machines (Quad i7) can already achieve n=1000 with almost no lag, so all we really worry about is slower machines.


Slower machines have problems with fuzzy matching. Unlike sorting, matching is a hard limit because you must match against every possible candidate. I currently I, rather naïvely, implement matching this way:

;; here, `infix' is the input to match against
;; `candidates' is the list of fuzzy-matched candidates
(let*
  (regexp (concat "\\`"
            (mapconcat
              (lambda (x)
                (concat "[^" (string x) "]*?" (string x)))
              infix
              "")))
  (candidates (cl-remove-if-not
                (apply-partially 'string-match-p regexp)
                (all-completions prefix table predicate))))

With this code, slower laptops are already unusable (sorry, all I have in the slow department is a bunch of Core2Duos and the like). This is without scoring of any kind. I'm no Elisp expert, so I'd love it if some of you people could suggest ways of accelerating this step.

Contributor

PythonNut commented Jul 30, 2015

First, I would absolutely love an even faster Elisp fuzzy matching method, (be it an improvement on flx or a new system entirely).

But one thing to definitely point out is that the fuzzy scoring runs on a sliding scale. If it's too slow, we can trivially back off on the number of candidates to consider. The strategy of sorting the candidates by length and only sorting the first n of them seems to work very well. On fast machines, we might have n=500 and on slow machines, we might have n=50. In some cases (esp, when n<50), it can keep flx from looking far into the future (that is, small inputs with very long matches). Faster scoring means a proportionally larger n.

Here's a fun plot based purely on casual empirical observations.
figure_1

But this is all relatively secondary. Fast machines (Quad i7) can already achieve n=1000 with almost no lag, so all we really worry about is slower machines.


Slower machines have problems with fuzzy matching. Unlike sorting, matching is a hard limit because you must match against every possible candidate. I currently I, rather naïvely, implement matching this way:

;; here, `infix' is the input to match against
;; `candidates' is the list of fuzzy-matched candidates
(let*
  (regexp (concat "\\`"
            (mapconcat
              (lambda (x)
                (concat "[^" (string x) "]*?" (string x)))
              infix
              "")))
  (candidates (cl-remove-if-not
                (apply-partially 'string-match-p regexp)
                (all-completions prefix table predicate))))

With this code, slower laptops are already unusable (sorry, all I have in the slow department is a bunch of Core2Duos and the like). This is without scoring of any kind. I'm no Elisp expert, so I'd love it if some of you people could suggest ways of accelerating this step.

@Fuco1

This comment has been minimized.

Show comment
Hide comment
@Fuco1

Fuco1 Jul 30, 2015

Please, no sorting on length... that's terrible.

If you want further speedups, then start scoring only when the pattern is some number of letters long. On 2 or 3 letters it's still not super efficient.

Re your code: apply-partially is slow as HELL!!!, so is everything in CL. Use simple loops and simplest possible forms if you go for performance.

Fuco1 commented Jul 30, 2015

Please, no sorting on length... that's terrible.

If you want further speedups, then start scoring only when the pattern is some number of letters long. On 2 or 3 letters it's still not super efficient.

Re your code: apply-partially is slow as HELL!!!, so is everything in CL. Use simple loops and simplest possible forms if you go for performance.

@PythonNut

This comment has been minimized.

Show comment
Hide comment
@PythonNut

PythonNut Jul 30, 2015

Contributor

Please, no sorting on length... that's terrible.

I don't actually quite see how it's terrible. Could you elaborate?

Re your code: apply-partially is slow as HELL!!!, so is everything in CL. Use simple loops and simplest possible forms if you go for performance.

Good to know.

Contributor

PythonNut commented Jul 30, 2015

Please, no sorting on length... that's terrible.

I don't actually quite see how it's terrible. Could you elaborate?

Re your code: apply-partially is slow as HELL!!!, so is everything in CL. Use simple loops and simplest possible forms if you go for performance.

Good to know.

@wasamasa

This comment has been minimized.

Show comment
Hide comment
@wasamasa

wasamasa Jul 30, 2015

@PythonNut The best candidate is not necessarily the shortest, especially if you do weighting on boundaries like flx does.

wasamasa commented Jul 30, 2015

@PythonNut The best candidate is not necessarily the shortest, especially if you do weighting on boundaries like flx does.

@PythonNut

This comment has been minimized.

Show comment
Hide comment
@PythonNut

PythonNut Aug 25, 2015

Contributor

@dgutov do you know of a function I can wrap that will only set the completion-style for company? I'd like to start moving in the direction of a company-flx package, and I don't want to just override the completion style for everything.

Contributor

PythonNut commented Aug 25, 2015

@dgutov do you know of a function I can wrap that will only set the completion-style for company? I'd like to start moving in the direction of a company-flx package, and I don't want to just override the completion style for everything.

@dgutov

This comment has been minimized.

Show comment
Hide comment
@dgutov

dgutov Aug 25, 2015

Member

I don't want to just override the completion style for everything

I'd do just that (why not?), but you can add an around-advice to company-capf instead.

Member

dgutov commented Aug 25, 2015

I don't want to just override the completion style for everything

I'd do just that (why not?), but you can add an around-advice to company-capf instead.

@PythonNut

This comment has been minimized.

Show comment
Hide comment
@PythonNut

PythonNut Jan 11, 2016

Contributor

Alright since fuzzy matching if working for company-capf, and we have a concrete way forward to bring this support to other backends (porting to company-capf), I think all of the conditions for this issue have been met.

If anyone is unhappy with my implementation, please file an issue on its own issue tracker.

Contributor

PythonNut commented Jan 11, 2016

Alright since fuzzy matching if working for company-capf, and we have a concrete way forward to bring this support to other backends (porting to company-capf), I think all of the conditions for this issue have been met.

If anyone is unhappy with my implementation, please file an issue on its own issue tracker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment