Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ripgrep significantly slower than grep #2715

Closed
1 task done
Mr-Pine opened this issue Jan 18, 2024 · 3 comments
Closed
1 task done

Ripgrep significantly slower than grep #2715

Mr-Pine opened this issue Jan 18, 2024 · 3 comments

Comments

@Mr-Pine
Copy link

Mr-Pine commented Jan 18, 2024

Please tick this box to confirm you have reviewed the above.

  • I have a different issue.

What version of ripgrep are you using?

ripgrep 14.1.0

features:-simd-accel,+pcre2
simd(compile):+SSE2,-SSSE3,-AVX2
simd(runtime):+SSE2,+SSSE3,+AVX2

PCRE2 10.42 is available (JIT is available)

How did you install ripgrep?

Pacman

What operating system are you using ripgrep on?

EndeavourOS (Arch)

Describe your bug.

Searching for system in rockyou.txt takes about ten times as long with ripgrep as with grep

hyperfine -- "rg system /usr/share/dict/rockyou.txt" "grep system /usr/share/dict/rockyou.txt"
Benchmark 1: rg system /usr/share/dict/rockyou.txt
  Time (mean ± σ):      24.9 ms ±  23.7 ms    [User: 10.7 ms, System: 10.0 ms]
  Range (min … max):    16.8 ms … 130.8 ms    22 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (130.8 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 2: grep system /usr/share/dict/rockyou.txt
  Time (mean ± σ):       1.8 ms ±   0.7 ms    [User: 1.3 ms, System: 1.2 ms]
  Range (min … max):     0.5 ms …   4.7 ms    653 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  grep system /usr/share/dict/rockyou.txt ran
   13.91 ± 14.27 times faster than rg system /usr/share/dict/rockyou.txt

Additional note: When concatting rockyou 64 times after itself the same operations take 50ms with grep and over 1s with rg

What are the steps to reproduce the behavior?

Run rg system <Path to your rockyou.txt>

What is the actual behavior?

For timings, see above.

rg --debug system /usr/share/dict/rockyou.txt

rg: DEBUG|rg::flags::parse|crates/core/flags/parse.rs:97: no extra arguments found from configuration file
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1260: found hostname for hyperlink configuration: small-forest
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1270: hyperlink format: ""
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:174: using 1 thread(s)
rg: DEBUG|grep_regex::config|crates/regex/src/config.rs:175: assembling HIR from 1 fixed string literals
rg: DEBUG|globset|crates/globset/src/lib.rs:453: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes

rest of the output in https://gist.github.com/Mr-Pine/87529294fa2175a7c69dd907008b483d (looks correct, only issue is perfomance)

What is the expected behavior?

ripgrep should be faster than grep

@blyxxyz
Copy link
Contributor

blyxxyz commented Jan 18, 2024

GNU grep can tell that its output is redirected to /dev/null and because of that it exits after the first match. (It still needs to use the exit code to report whether there was a match at all, but that's it.) If you send the output through cat then it cannot perform that optimization and ends up slightly slower than ripgrep:

$ hyperfine 'rg system rockyou.txt | cat' 'grep system rockyou.txt | cat'
Benchmark #1: rg system rockyou.txt | cat
  Time (mean ± σ):      50.4 ms ±  10.8 ms    [User: 37.6 ms, System: 14.2 ms]
  Range (min … max):    43.0 ms …  84.6 ms    34 runs
 
Benchmark #2: grep system rockyou.txt | cat
  Time (mean ± σ):      71.8 ms ±  13.3 ms    [User: 54.3 ms, System: 18.9 ms]
  Range (min … max):    60.4 ms …  97.5 ms    36 runs
 
Summary
  'rg system rockyou.txt | cat' ran
    1.42 ± 0.40 times faster than 'grep system rockyou.txt | cat'

@Mr-Pine
Copy link
Author

Mr-Pine commented Jan 18, 2024

Interesting, thanks for the explanation! Seems like I never looked at the execution time when getting an output...

@Mr-Pine Mr-Pine closed this as completed Jan 18, 2024
@BurntSushi
Copy link
Owner

Yeah @blyxxyz has it right. For hyperfine specifically, you can pass --output=pipe instead of adding | cat.

I have specifically avoided the "detect /dev/null and automatically enable -q/--quiet" optimization that GNU grep does. ripgrep could do it. It isn't hard to do. It just always seemed a little weird to me.

For simple literal searches like this, ripgrep should pretty much always be faster. ripgrep's substring search should be very nearly strictly superior to GNU grep's. If you can find a non-pathological case where GNU grep is noticeably faster (on x86-64 or aarch64) for real, I'd love to see it.

Repository owner locked and limited conversation to collaborators Jan 19, 2024
@BurntSushi BurntSushi converted this issue into discussion #2717 Jan 19, 2024

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants