
Enable PGO for benchmarks #18

Open
djkoloski opened this issue Jun 29, 2021 · 3 comments

@djkoloski (Owner)

Enabling profile-guided optimization would produce numbers that are about as good as each framework can get. It might be worth separating these out from the general numbers so users can get an idea of how much they stand to gain for the effort.
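
For reference, a minimal sketch of the instrumented PGO flow using plain rustc flags; the profile directory and the exact bench invocation are illustrative, not this repo's setup:

```sh
# 1. Build with instrumentation and run the benchmarks to collect profiles.
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo +nightly bench

# 2. Merge the raw .profraw files (llvm-profdata comes with system LLVM or the
#    llvm-tools-preview rustup component).
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# 3. Rebuild with the merged profile and re-run the benchmarks.
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo +nightly bench
```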

@djkoloski (Owner, Author)

Look at AutoFDO with perf record.
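
Roughly, the sampling-based (AutoFDO) flow would look like the sketch below. The bench binary path is a placeholder, the create_llvm_prof flags follow the AutoFDO project's documentation, and the rustc sample-profile flag is nightly-only as far as I know, so treat it as an assumption to verify against current rustc docs:

```sh
# 1. Build the release benchmarks without running them.
cargo +nightly bench --no-run

# 2. Sample branches with perf while the benchmark binary runs
#    (requires hardware branch-sampling support).
perf record -b -- ./target/release/deps/<bench-binary> --bench

# 3. Convert the perf samples into an LLVM sample profile with AutoFDO's create_llvm_prof.
create_llvm_prof --profile=perf.data --binary=./target/release/deps/<bench-binary> --out=rust.prof

# 4. Rebuild feeding the sample profile to rustc (unstable flag; name/stability
#    may differ by toolchain).
RUSTFLAGS="-Zprofile-sample-use=rust.prof" cargo +nightly bench
```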

@zamazan4ik

I do ongoing PGO research on different applications; all results are available at https://github.com/zamazan4ik/awesome-pgo. I ran some PGO benchmarks on rust_serialization_benchmark too and want to share my results here.

Test environment

  • Fedora 39
  • Linux kernel 6.7.6
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Compiler: rustc 1.76
  • Repo version: master branch at commit ce821970017832f43d00bc5110462ff1a2c38e17
  • Turbo Boost disabled to improve consistency across runs

Benchmark

Release benchmarks are done with taskset -c 0 cargo +nightly bench, the PGO training phase with taskset -c 0 cargo +nightly pgo bench, and PGO optimization with taskset -c 0 cargo +nightly pgo optimize bench. taskset -c 0 is used for better benchmark consistency. All PGO-related routines are done with cargo-pgo. All benchmarks are run on the same machine, with the same hardware/software, and with the same background "noise" (as much as I can guarantee, of course).
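
Spelled out as a script, the sequence is roughly the following (cargo-pgo subcommands as documented by the cargo-pgo project; nothing here is specific to this repo):

```sh
# One-time setup: cargo-pgo drives the instrument/merge/optimize steps and
# needs llvm-profdata from the llvm-tools-preview component.
cargo install cargo-pgo
rustup component add llvm-tools-preview

# Baseline, training, and optimized runs, all pinned to one core.
taskset -c 0 cargo +nightly bench
taskset -c 0 cargo +nightly pgo bench
taskset -c 0 cargo +nightly pgo optimize bench
```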

Results

Here are the results:

At least in the benchmarks provided by the project, there are measurable improvements in many cases. However, there are also some regressions.

> Look at AutoFDO with perf record.

I recommend starting with regular PGO via instrumentation. AutoFDO is the sampling-based PGO approach. Starting with instrumentation is generally a better idea, since it has wider platform support and is easier to enable for a project than sampling-based PGO.

@finnbear (Contributor)

Great job presenting the PGO results! In general, it seems like PGO increases average performance by ~5% but introduces noise: not necessarily from run to run, but from benchmark to benchmark and version to version. It might make more sense to average PGO results over all datasets (and just live with the fact that there are only 4, so the average isn't totally immune to noise). You could also average over all crates and report a single PGO result. The goal is to give users an accurate answer to 1) which crate to use and 2) whether to try PGO.
