[perf] rg does not use globs to prune recursion when it can #2789
Comments
I thought there was already an open issue for this, but I couldn't find it. This is a rather difficult optimization to do and is blocked on a rewrite of Your best bet is to use some other tool to filter out files first. Possibly even using your shell's glob support. Although those could in theory end up being slower than ripgrep even when ripgrep visits more than it needs to. It depends.
Alright, thanks for clarifying. Using It would be neat if
That's #273. But you shouldn't need it. You should be able to pipe the output of
I can use xargs yes but ripgrep won't start searching any of those paths until the EDIT: oh nvm i can make xargs chunk it up into multiple
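The workaround discussed above (expand the glob with another tool, then hand the paths to ripgrep in chunks so searching starts before enumeration finishes) could be sketched roughly as follows. This is only an illustration, not anything from ripgrep itself: `chunked`, `search_in_batches`, and the `runner` parameter are hypothetical names, and the batch size is an arbitrary assumption. It relies on Python's `glob.iglob`, whose `*` does not match dotfiles by default, and on ripgrep's `-H`, `-F`, and `-e` flags.

```python
import glob
import itertools
import subprocess


def chunked(iterable, size):
    """Yield lists of up to `size` items, consuming `iterable` incrementally."""
    iterator = iter(iterable)
    while True:
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            return
        yield chunk


def search_in_batches(needle, pattern, batch_size=1000, runner=subprocess.run):
    """Expand `pattern` incrementally and run one rg process per batch of paths.

    `runner` is a hypothetical hook (defaults to subprocess.run) so the
    function can be exercised without ripgrep installed.
    """
    matched = False
    for chunk in chunked(glob.iglob(pattern), batch_size):
        # -H: always print file names; -F: literal string; -e: the pattern.
        # `--` ends option parsing so odd file names are not read as flags.
        result = runner(["rg", "-H", "-F", "-e", needle, "--", *chunk])
        # rg exits with status 0 when it found at least one match.
        matched = matched or result.returncode == 0
    return matched
```

Whether this beats letting ripgrep walk the whole tree depends on the corpus and the filesystem, as noted above.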
@BGR360 Were you using -X, as that would explain it?
What version of ripgrep are you using?
ripgrep 14.1.0
How did you install ripgrep?
Build from source with the following patch applied:
What operating system are you using ripgrep on?
Linux 5.15.0-60-generic #66~20.04.1-Ubuntu SMP x86_64 GNU/Linux
Describe your bug.
I'm trying to search through a massive corpus of log files (~10M files), on a remote NFS mount, to see if a particular string is present in a certain type of log file. I have a glob that filters down to the log files I care about. The key point is that the files that match my glob are a small subset of all the files.
The corpus looks like this:
Expand for preview of corpus
With my glob being `*/*/*/perf_*/profile/0-trigger/oplogs/*.log`.
The problem is that ripgrep is not limiting its recursive walk to only the paths that could match the glob. It is enumerating directories that could not possibly match the glob, and the number of files that end up being considered really adds up. It's considering far more files than it needs to.
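To make the idea of pruning concrete, here is a minimal sketch of a walk that matches the glob one path component at a time and only descends into directories that match the current component. It is not how ripgrep is implemented; `pruned_glob_walk` and its `visited` parameter are hypothetical names, the sketch ignores `**` and ignore files entirely, and the tiny tree in the demo is made up for illustration.

```python
import fnmatch
import os
import tempfile


def pruned_glob_walk(root, pattern, visited=None):
    """Yield files under `root` matching a '/'-separated glob.

    Each directory level is compared against one glob component, so
    subtrees that cannot match are never listed. If `visited` is a list,
    every directory that gets listed is appended to it.
    """
    components = pattern.strip("/").split("/")

    def walk(directory, depth):
        if visited is not None:
            visited.append(directory)
        with os.scandir(directory) as it:
            entries = sorted(it, key=lambda e: e.name)
        is_last = depth == len(components) - 1
        for entry in entries:
            if not fnmatch.fnmatchcase(entry.name, components[depth]):
                continue  # prune: this name cannot be part of a match
            if is_last:
                if entry.is_file():
                    yield entry.path
            elif entry.is_dir():
                yield from walk(entry.path, depth + 1)

    yield from walk(root, 0)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Made-up miniature tree, not the real corpus.
        for rel in ("a/GOOD/x.log", "a/blah/y.log", "b/GOOD/z.log"):
            path = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
        visited = []
        for match in pruned_glob_walk(root, "*/GOOD/*.log", visited):
            print(os.path.relpath(match, root))
        print(len(visited), "directories listed")
```

In the demo tree the `blah` directory is skipped as soon as its name fails to match the `GOOD` component, so its contents are never enumerated.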
What are the steps to reproduce the behavior?
Create the following directory tree. It mimics my corpus.
Use the following glob search. It mimics my search. I only want to search through the directories I know will contain my interesting files.
What is the actual behavior?
ripgrep recurses into the `*/blah/` directories when there's no chance that they could match the glob. Problematic lines are emphasized with `>>>`.
Same result if I try `/*/GOOD/*.log`.
What is the expected behavior?
Ripgrep should skip recursing into directories that do not match the glob.
Something like this: