New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

benchmark time on very small corpora #33

Closed
BurntSushi opened this Issue Sep 23, 2016 · 3 comments

Comments

Projects
None yet
1 participant
@BurntSushi
Owner

BurntSushi commented Sep 23, 2016

An end user reports that rg isn't as fast on ag on very small repositories. While it seems trivial, if this is because of startup time, then it's worth investigating and fixing.

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Sep 25, 2016

Owner

#77 is another report of grep beating rg on a single file because of startup time.

Owner

BurntSushi commented Sep 25, 2016

#77 is another report of grep beating rg on a single file because of startup time.

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Sep 26, 2016

Owner

Someone ran micobenchmarks on infinitesimally sized input. grep is twice as fast.

Owner

BurntSushi commented Sep 26, 2016

Someone ran micobenchmarks on infinitesimally sized input. grep is twice as fast.

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Nov 20, 2016

Owner

OK, I think I can call this one done. I think a variety of small improvements have mostly fixed this:

  • Switch from Docopt to clap (probably the biggest contributor).
  • Permit ignore handling/parsing to reuse previous results when applicable.
  • Use a parallel recursive directory iterator.

In particular, on the previously linked microbenchmark:

$ python -m timeit -n 1 -r 10 -v -s 'import os' 'os.system("cat /etc/group | grep 1000 > /dev/null")'
raw times: 0.0115 0.0104 0.0103 0.0104 0.00902 0.00929 0.00842 0.00919 0.00792 0.00861
1 loops, best of 10: 7.92 msec per loop
$ python -m timeit -n 1 -r 10 -v -s 'import os' 'os.system("cat /etc/group | rg 1000 > /dev/null")'
raw times: 0.0103 0.0103 0.00946 0.00881 0.00871 0.00809 0.00802 0.00764 0.00722 0.00714
1 loops, best of 10: 7.14 msec per loop

Compare this with ripgrep 0.2.6:

python -m timeit -n 1 -r 10 -v -s 'import os' 'os.system("cat /etc/group | rg-0.2.6 1000 > /dev/null")'
raw times: 0.0261 0.0251 0.0257 0.0217 0.0234 0.0239 0.0237 0.0235 0.022 0.0245
1 loops, best of 10: 21.7 msec per loop

I've also done some testing on small repos. This repo qualifies as quite small. There is a ton of variance. The following were ran in succession:

[andrew@Cheetah ripgrep] time rg ripgrep | wc -l
305

real    0m0.033s
user    0m0.257s
sys     0m0.007s
[andrew@Cheetah ripgrep] time rg ripgrep | wc -l
305

real    0m0.025s
user    0m0.157s
sys     0m0.023s
[andrew@Cheetah ripgrep] time rg ripgrep | wc -l
305

real    0m0.013s
user    0m0.017s
sys     0m0.030s
[andrew@Cheetah ripgrep] time rg ripgrep | wc -l
305

real    0m0.040s
user    0m0.270s
sys     0m0.007s

ag seems to have slightly less variance:

[andrew@Cheetah ripgrep] time ag ripgrep | wc -l
306

real    0m0.029s
user    0m0.007s
sys     0m0.010s
[andrew@Cheetah ripgrep] time ag ripgrep | wc -l
306

real    0m0.022s
user    0m0.007s
sys     0m0.010s
[andrew@Cheetah ripgrep] time ag ripgrep | wc -l
306

real    0m0.023s
user    0m0.020s
sys     0m0.003s
[andrew@Cheetah ripgrep] time ag ripgrep | wc -l
306

real    0m0.016s
user    0m0.013s
sys     0m0.003s

But, if we fix both programs to use a single thread, then variance becomes almost non-existent:

[andrew@Cheetah ripgrep] time rg -j1 ripgrep | wc -l
305

real    0m0.011s
user    0m0.003s
sys     0m0.007s
[andrew@Cheetah ripgrep] time rg -j1 ripgrep | wc -l
305

real    0m0.011s
user    0m0.007s
sys     0m0.007s
[andrew@Cheetah ripgrep] time rg -j1 ripgrep | wc -l
305

real    0m0.011s
user    0m0.000s
sys     0m0.010s
[andrew@Cheetah ripgrep] time rg -j1 ripgrep | wc -l
305

real    0m0.010s
user    0m0.003s
sys     0m0.003s

and ag:

[andrew@Cheetah ripgrep] time ag --workers 1 ripgrep | wc -l
306

real    0m0.018s
user    0m0.010s
sys     0m0.000s
[andrew@Cheetah ripgrep] time ag --workers 1 ripgrep | wc -l
306

real    0m0.018s
user    0m0.010s
sys     0m0.000s
[andrew@Cheetah ripgrep] time ag --workers 1 ripgrep | wc -l
306

real    0m0.018s
user    0m0.010s
sys     0m0.003s
[andrew@Cheetah ripgrep] time ag --workers 1 ripgrep | wc -l
306

real    0m0.021s
user    0m0.007s
sys     0m0.003s

In other words, on small corpora, ripgrep (and, to a lesser extent, ag) seem to be quite susceptible to the overhead of creating threads, which also seems to introduce large variance.

I'm not sure there's anything we can really do about it. The difference in this particular example, at least, is nominal. From a user's perspective, we don't really care about the difference between 10ms and 30ms. However, the place where this can hurt us is if folks use ripgrep like it was grep, for example, in an xargs pipeline:

$ time find ./ -name '*.[ch]' -print0 | xargs -0 -P8 grep PM_RESUME | wc -l
10

real    0m0.213s
user    0m0.423s
sys     0m0.330s
$ time find ./ -name '*.[ch]' -print0 | xargs -0 -P8 rg PM_RESUME | wc -l
10

real    0m0.402s
user    0m4.750s
sys     0m0.460s

The overhead of ripgrep launching threads ends up making it twice as slow. Of course, in this case, ripgrep launching threads doesn't really make much sense since xargs is doing it for us. But this isn't something that a user will obviously know. Of course, forcing ripgrep to use a single thread brings it back down to grep speeds:

time find ./ -name '*.[ch]' -print0 | xargs -0 -P8 rg -j1 PM_RESUME | wc -l
10

real    0m0.206s
user    0m0.507s
sys     0m0.487s

The other case to consider is using xargs without -P and having ripgrep handle parallelism:

time find ./ -name '*.[ch]' -print0 | xargs -0 rg PM_RESUME | wc -l
10

real    0m0.463s
user    0m1.297s
sys     0m0.597s

While worse than using xargs -P, compare with grep:

time find ./ -name '*.[ch]' -print0 | xargs -0 grep PM_RESUME | wc -l
10

real    0m0.655s
user    0m0.477s
sys     0m0.347s

I'm not sure there's really too much we can do here. An end user can use ripgrep in an xargs pipeline optimally, but it definitely requires a bit of intuition on the user's behalf to realize that ripgrep is already multithreaded where as grep is not.

(This is yet another example proving that benchmarking these tools is ridiculously hard.)

Owner

BurntSushi commented Nov 20, 2016

OK, I think I can call this one done. I think a variety of small improvements have mostly fixed this:

  • Switch from Docopt to clap (probably the biggest contributor).
  • Permit ignore handling/parsing to reuse previous results when applicable.
  • Use a parallel recursive directory iterator.

In particular, on the previously linked microbenchmark:

$ python -m timeit -n 1 -r 10 -v -s 'import os' 'os.system("cat /etc/group | grep 1000 > /dev/null")'
raw times: 0.0115 0.0104 0.0103 0.0104 0.00902 0.00929 0.00842 0.00919 0.00792 0.00861
1 loops, best of 10: 7.92 msec per loop
$ python -m timeit -n 1 -r 10 -v -s 'import os' 'os.system("cat /etc/group | rg 1000 > /dev/null")'
raw times: 0.0103 0.0103 0.00946 0.00881 0.00871 0.00809 0.00802 0.00764 0.00722 0.00714
1 loops, best of 10: 7.14 msec per loop

Compare this with ripgrep 0.2.6:

python -m timeit -n 1 -r 10 -v -s 'import os' 'os.system("cat /etc/group | rg-0.2.6 1000 > /dev/null")'
raw times: 0.0261 0.0251 0.0257 0.0217 0.0234 0.0239 0.0237 0.0235 0.022 0.0245
1 loops, best of 10: 21.7 msec per loop

I've also done some testing on small repos. This repo qualifies as quite small. There is a ton of variance. The following were ran in succession:

[andrew@Cheetah ripgrep] time rg ripgrep | wc -l
305

real    0m0.033s
user    0m0.257s
sys     0m0.007s
[andrew@Cheetah ripgrep] time rg ripgrep | wc -l
305

real    0m0.025s
user    0m0.157s
sys     0m0.023s
[andrew@Cheetah ripgrep] time rg ripgrep | wc -l
305

real    0m0.013s
user    0m0.017s
sys     0m0.030s
[andrew@Cheetah ripgrep] time rg ripgrep | wc -l
305

real    0m0.040s
user    0m0.270s
sys     0m0.007s

ag seems to have slightly less variance:

[andrew@Cheetah ripgrep] time ag ripgrep | wc -l
306

real    0m0.029s
user    0m0.007s
sys     0m0.010s
[andrew@Cheetah ripgrep] time ag ripgrep | wc -l
306

real    0m0.022s
user    0m0.007s
sys     0m0.010s
[andrew@Cheetah ripgrep] time ag ripgrep | wc -l
306

real    0m0.023s
user    0m0.020s
sys     0m0.003s
[andrew@Cheetah ripgrep] time ag ripgrep | wc -l
306

real    0m0.016s
user    0m0.013s
sys     0m0.003s

But, if we fix both programs to use a single thread, then variance becomes almost non-existent:

[andrew@Cheetah ripgrep] time rg -j1 ripgrep | wc -l
305

real    0m0.011s
user    0m0.003s
sys     0m0.007s
[andrew@Cheetah ripgrep] time rg -j1 ripgrep | wc -l
305

real    0m0.011s
user    0m0.007s
sys     0m0.007s
[andrew@Cheetah ripgrep] time rg -j1 ripgrep | wc -l
305

real    0m0.011s
user    0m0.000s
sys     0m0.010s
[andrew@Cheetah ripgrep] time rg -j1 ripgrep | wc -l
305

real    0m0.010s
user    0m0.003s
sys     0m0.003s

and ag:

[andrew@Cheetah ripgrep] time ag --workers 1 ripgrep | wc -l
306

real    0m0.018s
user    0m0.010s
sys     0m0.000s
[andrew@Cheetah ripgrep] time ag --workers 1 ripgrep | wc -l
306

real    0m0.018s
user    0m0.010s
sys     0m0.000s
[andrew@Cheetah ripgrep] time ag --workers 1 ripgrep | wc -l
306

real    0m0.018s
user    0m0.010s
sys     0m0.003s
[andrew@Cheetah ripgrep] time ag --workers 1 ripgrep | wc -l
306

real    0m0.021s
user    0m0.007s
sys     0m0.003s

In other words, on small corpora, ripgrep (and, to a lesser extent, ag) seem to be quite susceptible to the overhead of creating threads, which also seems to introduce large variance.

I'm not sure there's anything we can really do about it. The difference in this particular example, at least, is nominal. From a user's perspective, we don't really care about the difference between 10ms and 30ms. However, the place where this can hurt us is if folks use ripgrep like it was grep, for example, in an xargs pipeline:

$ time find ./ -name '*.[ch]' -print0 | xargs -0 -P8 grep PM_RESUME | wc -l
10

real    0m0.213s
user    0m0.423s
sys     0m0.330s
$ time find ./ -name '*.[ch]' -print0 | xargs -0 -P8 rg PM_RESUME | wc -l
10

real    0m0.402s
user    0m4.750s
sys     0m0.460s

The overhead of ripgrep launching threads ends up making it twice as slow. Of course, in this case, ripgrep launching threads doesn't really make much sense since xargs is doing it for us. But this isn't something that a user will obviously know. Of course, forcing ripgrep to use a single thread brings it back down to grep speeds:

time find ./ -name '*.[ch]' -print0 | xargs -0 -P8 rg -j1 PM_RESUME | wc -l
10

real    0m0.206s
user    0m0.507s
sys     0m0.487s

The other case to consider is using xargs without -P and having ripgrep handle parallelism:

time find ./ -name '*.[ch]' -print0 | xargs -0 rg PM_RESUME | wc -l
10

real    0m0.463s
user    0m1.297s
sys     0m0.597s

While worse than using xargs -P, compare with grep:

time find ./ -name '*.[ch]' -print0 | xargs -0 grep PM_RESUME | wc -l
10

real    0m0.655s
user    0m0.477s
sys     0m0.347s

I'm not sure there's really too much we can do here. An end user can use ripgrep in an xargs pipeline optimally, but it definitely requires a bit of intuition on the user's behalf to realize that ripgrep is already multithreaded where as grep is not.

(This is yet another example proving that benchmarking these tools is ridiculously hard.)

@BurntSushi BurntSushi closed this Nov 20, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment