Speedup ripgrep using profile guided optimization #1225

Closed
ghost opened this issue Mar 22, 2019 · 6 comments
Labels
question An issue that is lacking clarity on one or more points.

Comments

@ghost

ghost commented Mar 22, 2019

Request for improvement

What version of ripgrep are you using?

compiled current master (0913972)

How did you install ripgrep?

compiled current master (0913972)

What operating system are you using ripgrep on?

Linux Mint 19.1 with kernel 4.18.0-16-lowlatency

Describe your question, feature request, or bug.

Profile-guided optimization works by first building the binary with profiling instrumentation, then running the instrumented binary on a few test cases while saving runtime profiles, and finally using those profiles to optimize the code, for example by reordering functions to improve cache locality.

Using the very naive benchmark time sudo target/release/rg --with-filename --word-regexp --line-buffered a /home > /dev/null, ripgrep needed 3.7s on average without profile-guided optimization on my system and 3.3s with PGO.

This is how I compiled rg with PGO:

# make sure we get a profile for each run, %p will be replaced with the pid
export LLVM_PROFILE_FILE=./target/pgo/pgo-%p.profraw

# compile instrumented binary
RUSTFLAGS="-Z pgo-gen=llvm-profile-file-env-variable-overrides-this" cargo +nightly build --release

# run a few test cases (these need to be improved to cover more rg features)
target/release/rg --help > /dev/null
target/release/rg a > /dev/null
target/release/rg B --line-buffered > /dev/null
target/release/rg c --word-regexp > /dev/null
target/release/rg D --vimgrep > /dev/null
target/release/rg e --with-filename > /dev/null
target/release/rg f --unrestricted > /dev/null
target/release/rg '[A-Z]+_SUSPEND' --with-filename --word-regexp --line-buffered > /dev/null
target/release/rg h --unrestricted --with-filename --vimgrep --word-regexp --line-buffered > /dev/null

# merge profiles
rustup run nightly llvm-profdata merge -o target/pgo/pgo.profdata target/pgo/pgo*.profraw

# compile with profile in mind
RUSTFLAGS="-Z pgo-use=target/pgo/pgo.profdata" cargo +nightly build --release

PGO could speed up ripgrep quite a bit when done correctly.

@BurntSushi
Copy link
Owner

Neat. Could you please document the build dependencies here more thoroughly? Whether this is done or not almost completely depends on how much of a hassle it would be to add to the release process.

@ghost
Author

ghost commented Mar 22, 2019

I just checked this against Alpine since it is used in .travis.yml. The llvm-dev package would be required. The main problem is that -Z flags require nightly Rust, which is not available as an Alpine package.

@BurntSushi
Owner

Thanks for suggesting this. I just briefly tried this out, and while I could get it to work, I couldn't really see any noticeable performance improvement. Moreover, this would complicate and prolong the release process by quite a bit. Not only does ripgrep need to be built twice, but it needs to get some sizable corpus on which to search a number of times. Overall, I don't think it's worth it.

@BurntSushi added the question label Apr 14, 2019
@ArniDagur

As of Rust 1.37, PGO is stable.
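
For reference, a rough sketch of what the workflow above looks like on a stable toolchain, based on the rustc documentation (the paths and the single training command here are placeholders, not an exact recipe for ripgrep):

# build with instrumentation (stable replacement for the nightly -Z pgo-gen)
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo build --release

# run representative workloads; each run writes a .profraw file into /tmp/pgo-data
target/release/rg a > /dev/null

# merge the raw profiles (llvm-profdata is available via the llvm-tools-preview rustup component)
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# rebuild using the merged profile (stable replacement for the nightly -Z pgo-use)
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo build --release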

@zamazan4ik

zamazan4ik commented Nov 27, 2023

I haven't finished all the tests yet, but I already have some preliminary results.

On my Linux machine (Fedora 39, AMD Ryzen 5900X with Turbo Boost disabled, 48 GiB RAM, Rust 1.74, ripgrep built from the latest master branch at commit cd5440fb6230f72ab598916c1c5ab96686541d47) I get the following results:

hyperfine --warmup 5 --min-runs 20 '../target/rg_release -n "PM_RESUME" data/linux' '../target/rg_release_with_lto -n "PM_RESUME" data/linux' '../target/rg_optimized_with_lto -n "PM_RESUME" data/linux'
Benchmark 1: ../target/rg_release -n "PM_RESUME" data/linux
  Time (mean ± σ):     137.2 ms ±   3.4 ms    [User: 529.9 ms, System: 986.1 ms]
  Range (min … max):   129.3 ms … 142.4 ms    21 runs

Benchmark 2: ../target/rg_release_with_lto -n "PM_RESUME" data/linux
  Time (mean ± σ):     131.8 ms ±   2.2 ms    [User: 492.7 ms, System: 968.0 ms]
  Range (min … max):   126.9 ms … 135.8 ms    22 runs

Benchmark 3: ../target/rg_optimized_with_lto -n "PM_RESUME" data/linux
  Time (mean ± σ):     125.1 ms ±   2.9 ms    [User: 406.6 ms, System: 975.1 ms]
  Range (min … max):   119.5 ms … 132.5 ms    23 runs

Summary
  ../target/rg_optimized_with_lto -n "PM_RESUME" data/linux ran
    1.05 ± 0.03 times faster than ../target/rg_release_with_lto -n "PM_RESUME" data/linux
    1.10 ± 0.04 times faster than ../target/rg_release -n "PM_RESUME" data/linux

where:

  • rg_release - default Release build with cargo build --release
  • rg_release_with_lto - default Release build with cargo build --release, but with codegen-units = 1 and lto = true enabled
  • rg_optimized_with_lto - default Release build with cargo build --release, but with codegen-units = 1 and lto = true enabled, and additionally optimized with PGO

As a PGO training set, I used ripgrep's benchsuite.

Right now I cannot provide all the results, since for reasons I haven't figured out yet the provided benchsuite script doesn't work on my machine (it complains about missing dependencies even though they are downloaded), so I ran all the commands manually.

I ran several other commands from the benchsuite; the performance improvements were roughly the same in the cases I tested.

I deliberately built ripgrep without PCRE since it's a system dependency and cannot easily be optimized with PGO. Since I'm benchmarking PGO for ripgrep itself, I decided to reduce the influence of external shared libraries as much as I could. The results may still be interesting to some, since some people build ripgrep without PCRE support anyway.

I have some thoughts about these results:

  • I've seen a discussion before claiming that LTO doesn't bring huge improvements. According to my tests, a 5% improvement is a huge improvement (even before we talk about the binary size). So I think the LTO decision should be reevaluated, or at least LTO should be enabled in an additional profile.release-lto Cargo profile (see the sketch after this list).
  • PGO shows measurable improvements too. More tests should be evaluated with PGO, since maybe some cases are pessimized by it (who knows). It would be especially interesting to recompile the PCRE dependency with PGO as well.
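
For illustration, such an opt-in profile could look roughly like this in Cargo.toml (a sketch; the profile name and the exact settings are only a suggestion, not ripgrep's actual configuration):

# Cargo.toml: an opt-in profile that inherits release and enables heavier optimization
[profile.release-lto]
inherits = "release"
codegen-units = 1
lto = true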

A 5-10% improvement is really important for some people. My main use case is (rip)grepping hundreds of logs on our log storage nodes with some non-trivial patterns. Some queries can run for tens of minutes or even hours, and even a few percent of performance is definitely worth it (even if we need to recompile ripgrep ourselves, that's not a problem).

@BurntSushi Should I create a separate issue/discussion for the topic?

BurntSushi added a commit that referenced this issue Nov 28, 2023
The idea is to build ripgrep with as much optimization as possible.

This makes compilation times absolutely obscene. They jump from <10
seconds to 30+ seconds on my i9-12900K. I don't even want to know how
long CI would take with these.

I tried some ad hoc benchmarks and could not notice any meaningful
improvement with the LTO binary versus the normal release profile.
Because of that, I still don't think it's worth bloating the release
cycle times.

Ref #1225
@BurntSushi
Owner

@zamazan4ik See b6bac84

Basically, I'm still not convinced. I only tested LTO and not PGO, but PGO does not sound like something I'm keen on maintaining.

In any case, I've at least added a release-lto profile to ripgrep, but stopped short of using it in the actual release binaries.
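
For anyone who wants to opt in locally, a custom profile like that is normally selected with Cargo's --profile flag (a sketch, assuming the profile is named release-lto as above):

cargo build --profile release-lto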
