Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

counsel-rg(ag) command load cpu 100% #1976

Closed
Renkoru opened this issue Mar 21, 2019 · 13 comments
Closed

counsel-rg(ag) command load cpu 100% #1976

Renkoru opened this issue Mar 21, 2019 · 13 comments

Comments

@Renkoru
Copy link

@Renkoru Renkoru commented Mar 21, 2019

I am using counsel-rg command to search in a project.
After an update, I have noticed that this command loads CPU on 100%.

Precondition:

  1. You are on the latest (3954bfe) swiper commit

Steps to reproduce:

  1. Pick a medium project (3k files)
  2. Search with counsel-rg for something that will lead to a lot of results. Like 'item', 'var', 'const', ....
  3. Look at your CPU load

Notes:
The shell command is generated from counsel-rg is

rg -S --no-heading --line-number --color never --glob !TAGS --glob !.idea/ --glob !.ensime_cache/ --glob !.eunit/ --glob !.git/ --glob !.hg/ --glob !.fslckout/ --glob !_FOSSIL_/ --glob !.bzr/ --glob !_darcs/ --glob !.tox/ --glob !.svn/ --glob !.stack-work/ --pcre2 (?=.*const) .

The problem is connected with this part: (?=.*const)
If we search just by .*const everything works fine.

This look-around feature was added in this commit
80d40e4

@basil-conto
Copy link
Collaborator

@basil-conto basil-conto commented Mar 21, 2019

FWIW, see #1935 (comment).

@abo-abo
Copy link
Owner

@abo-abo abo-abo commented Mar 25, 2019

Tagging @CeleritasCelery and @dsedivec to weigh in on this issue.

@abo-abo
Copy link
Owner

@abo-abo abo-abo commented Mar 25, 2019

(setf (alist-get 'counsel-ag ivy-re-builders-alist)
#'ivy--regex-plus)

While this fixes the issue, it should be automatic, not a user's concern.

@abo-abo
Copy link
Owner

@abo-abo abo-abo commented Mar 25, 2019

@Renkoru Please review the referenced test case. Let's try to pinpoint where it goes wrong.

@CeleritasCelery
Copy link
Contributor

@CeleritasCelery CeleritasCelery commented Mar 25, 2019

Ripgrep was designed for Rusts DFA regex engine which does not support look arounds. In a later version, PCRE support was added. However it would not surprise me that PCRE support in rust is really slow (as seems to be the case).

From my own experience, I have used ag in repos of about 30k files with look arounds for years and have not found any performance issues.

It seems like we need to do a couple of things here.

Form the ripgrep side, Open an issue on the ripgrep issue tracker to disccus this. Because to a smart regex compiler, (?=.*const) .*const should be equivalent. However I get the feeling that ripgrep’s PCRE2 engine is poorly supported.

From the ivy side, seems like we have two options:

  1. Remove out-of-order matching for ripgrep entirely if we determine that it can’t handle it.

  2. Explicitly set it ivy--regex in ivy-re-builder-alistfor counsel-rg. However this will require the refactoring that @basil-conto proposed in #1935

@dsedivec
Copy link
Contributor

@dsedivec dsedivec commented Mar 25, 2019

(?=.*const) is very slow on files with long lines. For reasons unbeknownst to me, ^(?=.*const) is much faster than (?=.*const) without ^. For counsel's purposes, I believe both patterns produce the same results, but adding the ^ makes it ridiculously faster.

Here's a test: I have a directory of JavaScript including one ~500 kB minified JavaScript file where the entire content is a single line.

javascripts$ time rg -l --pcre2 '(?=.*const)' | sort > /tmp/before

real	4m43.216s
user	4m55.116s
sys	0m14.018s

javascripts$ time rg -l --pcre2 '^(?=.*const)' | sort > /tmp/after

real	0m0.047s
user	0m0.091s
sys	0m0.067s

javascripts$ diff -u /tmp/{before,after}
javascripts$

Here is a 500 KB JavaScript file on a single line that I tested with.

Note that I have not done any testing with the negative lookahead version, only positive lookahead.

Can anyone else see any reason the leading ^ would affect the results? Is this something counsel--elisp-to-pcre could add?

And if anyone knows why adding ^ improves performance so much, I'd be interested to know, for my own edification. Having just read too much of pcrepattern(3), I now greatly fear the authors of libpcre. If I had to guess why ^ improves the performance, I'm guessing something to do with preventing excessive backtracking. The only prior mentions I can find of this are a random StackOverflow comment and a post on perlmonks.org where the author uses ^ but doesn't say why.

@CeleritasCelery FYI I just installed ag 2.2.0 from MacPorts, built against PCRE 8.42:

javascripts$ time ag -l '(?=.*const)' | sort > /tmp/ag_before

real	4m57.890s
user	15m18.558s
sys	0m4.126s

javascripts$ time ag -l '^(?=.*const)' | sort > /tmp/ag_after

real	0m0.074s
user	0m0.061s
sys	0m0.089s

ag is also very slow without the ^, so I'm not sure if this is really a ripgrep problem? Could this be a "you normally only run ag over sane inputs that are not single-line 500 KB files because you are a sane person and not a web dev" problem? :)

@abo-abo
Copy link
Owner

@abo-abo abo-abo commented Mar 25, 2019

My point here was that if the input is e.g. const, then (?=.*const) should not be produced in the first place. Ivy should simplify the shell command accordingly.

Lookaheads should be used only when needed, i.e. when the ivy-text pattern has !.

@dsedivec
Copy link
Contributor

@dsedivec dsedivec commented Mar 25, 2019

@abo-abo Are you saying that counsel--elisp-to-pcre should special case a list of 1 pattern, as in

(counsel--elisp-to-pcre (ivy--regex-ignore-order "foo") t)
;; => "foo"  (currently returns "(?=.*foo)" instead)
(counsel--elisp-to-pcre (ivy--regex-ignore-order "foo bar") t)
;; => "(?=.*foo)(?=.*bar)"

@CeleritasCelery
Copy link
Contributor

@CeleritasCelery CeleritasCelery commented Mar 25, 2019

Here is my testing on a small repo with 1000 files

* rg ag rg pcre rg pcre w/ ^ ag pcre ag pcre w/ ^
time (seconds) 1.3 5.5 159 4.8 5.5 5.5
total results 31622 34128 31622 31622 34128 34128

So this confirms what @dsedivec is seeing.

Can anyone else see any reason the leading ^ would affect the results? Is this something counsel--elisp-to-pcre could add?

There is no reason we can't add it, and after seeing this, looks like we need to. I will do some more testing with it.

Could this be a "you normally only run ag over sane inputs that are not single-line 500 KB files because you are a sane person and not a web dev" problem?

Perhaps. In my testing adding the bol (^) did not make a difference for ag. But I was also not searching massive files.

My point here was that if the input is e.g. const, then (?=.*const) should not be produced in the first place. Ivy should simplify the shell command accordingly.

ivy's PCRE converter uses lookaheads because the underlying regex builders use cons (e.g. ivy--regex-ignore-order will return (("foo" . t)) if the pattern is "foo"). Seems like we should fix it in the re builders instead of trying to fix it in counsel--elisp-to-pcre.

@abo-abo
Copy link
Owner

@abo-abo abo-abo commented Mar 25, 2019

@dsedivec

;; => "foo" (currently returns "(?=.*foo)" instead)

Yes, that's exactly what I meant.

@abo-abo
Copy link
Owner

@abo-abo abo-abo commented Mar 25, 2019

Seems like we should fix it in the re builders instead of trying to fix it in counsel--elisp-to-pcre.

It should not be fixed in the re builders. They already have a stable API that's useful and fast:

(ivy--regex-plus "yes oui ! no non")
;; => (("\\(yes\\).*?\\(oui\\)" . t) ("no") ("non"))

(ivy--regex-ignore-order "yes non da tak")
;; => (("yes" . t) ("non" . t) ("da" . t) ("tak" . t))

This means:

  1. Reduce the sequence by applying a list of rules one on top of another.
  2. Each rule's car is a regex, each cdr is either nil or t.
  3. t means keep all candidates that match the pattern, nil means keep all candidates that don't match it.

Using this API is much faster than using lookaheads, and Elisp regex engine doesn't support them anyway. The trade off is that we have to make multiple passes when filtering.

All counsel--elisp-to-pcre has to do is:

  1. test if the list passed to it by a regex builder is of length 1
  2. in that case, don't translate to lookaheads

@CeleritasCelery
Copy link
Contributor

@CeleritasCelery CeleritasCelery commented Mar 25, 2019

They already have a stable API that's useful and fast: Using this API is much faster than using lookaheads, and Elisp regex engine doesn't support them anyway.

I think you misunderstood me.

in the current ivy API, (("foo" . t)) is equivalent to "foo". Depending on what re-builder you are using, it will give one of those two forms. For example

(ivy--regex "foo")     => "foo"

(ivy--regex-plus "foo")      => "foo"

(ivy--regex-ignore-order "foo")   =>   (("foo" . t))

My proposal was to change ivy--regex-ignore-order to return "foo" instead of (("foo" . t)) when there is only one non-negative pattern. And since those two forms are functionally equivalent, it will not change how anything behaves. Just make it simpler. There is no API change. I am not suggesting we try and use lookarounds in elisp.

All ivy--regex-ignore-order has to do is:

  1. test if there is only one pattern and that pattern is not negative
  2. in that case, return the pattern; else do its normal conversion

You are right that we could fix is downstream in counsel--elisp-to-pcre, but it seems more "correct" to just take care of this edge case in the original builder.

@Renkoru
Copy link
Author

@Renkoru Renkoru commented Mar 25, 2019

Thank you! You are great magicians!

@basil-conto for the workaround.
@abo-abo @CeleritasCelery @dsedivec for fast response, discussion and fixing this issue.
It is amazing and pleasant to see such a responsive community around emacs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

5 participants