
Continuous benchmark tracking #27284

Open · dergoegge opened this issue Mar 20, 2023 · 19 comments
@dergoegge (Member)

It would be beneficial to have continuous tracking of our benchmark tests, because regressions (or unexpected improvements) otherwise go undetected (at least for a while). As far as I can tell, the only current use of our benchmarks is to evaluate changes as they are being proposed, but in my opinion that captures only about half of the value that benchmarks can provide.

I am imagining this to be a separate service (maybe integrated with @DrahtBot) that regularly runs the benchmarks in an environment configured for benchmarking. Regressions could be reported by the service through opening issues or sending emails. Additionally, a website that presents the benchmark data with some pretty graphs would be nice (example from firefox's infra).

Setting this up in a way that it is easy to replicate would be very beneficial.

@maflcko (Member) commented Mar 20, 2023

I think @jamesob set something up at one point, but it had to be queried manually, as there were no notifications. Also, I am not sure if it is running at all. See https://codespeed.bitcoinperf.com/timeline/

@jonatack (Contributor) commented Mar 20, 2023

Last week (Thu/Fri/Sat) I proposed to @LarryRuane that he check in with @jamesob about picking up https://bitcoinperf.com/, check with @0xB10C about potential cross-fertilization with tracepoints and their dashboards, and potentially hook it up to the CI or DrahtBot. Also #26957 (comment).

@jonatack (Contributor)

See also #26957 (comment) by @martinus for one nice way, with an example, to create and share detailed benchmark results.

@dergoegge (Member, Author)

Honestly, I think https://codespeed.bitcoinperf.com/ is pretty close to what we want here. It seems like it hasn't been running for a while, but getting it running again and adding some kind of notification system is probably all we need.

@LarryRuane (Contributor)

Yes, this would be very valuable. I'd like to attempt to get this going; @dergoegge, would that be okay? I made a related comment last week before I was aware of these websites (which are definitely better than what I suggested).

@dergoegge (Member, Author)

@LarryRuane cool, please do!

@epompeii

If using https://codespeed.bitcoinperf.com doesn't work out, I have created a continuous benchmarking tool for doing exactly this, Bencher: https://github.com/bencherdev/bencher

Bencher tracks changes over time. It can easily be run in CI as a GitHub Action, and it has statistical thresholds to detect deviations.

@aureleoules (Member)

I was not aware that this issue existed, but I've started working on monitoring benchmark results for pull requests on corecheck. For example: https://corecheck.dev/bitcoin/bitcoin/pulls/28674.
It is still experimental and I am still working on reducing the noise between runs, but as of today I usually don't see more than a 5–6% difference between identical bench runs.
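One consequence of that run-to-run noise is that a raw before/after comparison will produce false alarms; a regression should only be flagged once the slowdown exceeds the observed noise floor. A minimal sketch of that check (the ~6% threshold mirrors the noise level mentioned above; the function and parameter names are hypothetical):

```python
# Flag a benchmark as a regression only if it slows down beyond the
# noise floor observed between identical runs (~6% here, per the
# corecheck numbers above).
NOISE_FLOOR = 0.06  # relative run-to-run noise between identical runs

def is_regression(base_ns: float, pr_ns: float, noise: float = NOISE_FLOOR) -> bool:
    """Return True if the PR result is slower than base beyond the noise floor."""
    if base_ns <= 0:
        raise ValueError("baseline must be positive")
    return (pr_ns - base_ns) / base_ns > noise

print(is_regression(100.0, 104.0))  # → False: a 4% slowdown is within noise
print(is_regression(100.0, 110.0))  # → True: a 10% slowdown is not
```

A fixed threshold like this is the crudest option; tools such as Bencher replace it with statistical thresholds computed from the historical distribution of results.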

@epompeii

@aureleoules that looks really nice!

Would you be interested in plotting that data over time? If so, I can work on ingesting your results into Bencher, similar to how rustls does it: https://bencher.dev/perf/rustls-821705769

@aureleoules (Member)

> Would you be interested in plotting that data over time?

Yes, I plan to display on the homepage a plot of the benchmarks and the test coverage ratio of master over time!

@maflcko (Member) commented Dec 12, 2023

Agree that a plot over time would be useful. There were plots on https://codespeed.bitcoinperf.com/timeline/, but it hasn't run for some years now.

@epompeii

Sounds great!

If you want them to be live-updating, you can embed Bencher plots: click the Share button on the Perf Page and copy the Embed Perf Plot Link for the current plot. This is an example of what that could look like.


@0xB10C (Contributor) commented Apr 8, 2024

> If using https://codespeed.bitcoinperf.com doesn't work out, I have created a continuous benchmarking tool for doing exactly this, Bencher: https://github.com/bencherdev/bencher
>
> Bencher tracks changes over time. It can easily be run in CI as a GitHub Action, and it has statistical thresholds to detect deviations.

For Bitcoin Core, it would be useful to have an adapter for the nanobench JSON output. To track this, I've opened bencherdev/bencher#361.
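For context, running Bencher in CI typically looks like the following GitHub Actions sketch. The project slug, token secret name, and benchmark binary path are placeholders, not anything from this thread; the action and `bencher run` flags follow Bencher's upstream documentation:

```yaml
name: continuous-benchmarking
on:
  push:
    branches: [master]

jobs:
  bench:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Installs the bencher CLI (per Bencher's docs)
      - uses: bencherdev/bencher@main
      - name: Track benchmarks on master
        run: |
          bencher run \
            --project my-project-slug \
            --token "${{ secrets.BENCHER_API_TOKEN }}" \
            --branch master \
            --testbed ubuntu-latest \
            --adapter json \
            "./build/src/bench/bench_bitcoin"
```

The `--adapter json` line is where a nanobench adapter (or a template that emits Bencher's JSON format) would plug in.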

@0xB10C (Contributor) commented Apr 8, 2024

I just learned that nanobench is able to fill in an output format template. It might make sense to try that route first.

@epompeii commented Apr 8, 2024

> For Bitcoin Core, it would be useful to have an adapter for the nanobench JSON output. To track this, I've opened bencherdev/bencher#361.

@0xB10C I would be more than happy to implement a nanobench JSON output adapter. It is going to take me a couple of weeks or so to get to it, though. So you could either:

  1. Use the nanobench output format template to emit Bencher Metric Format
  2. Implement the adapter in Bencher and open a PR
  3. Wait a few weeks and I'll take care of it 😃
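As a rough illustration of what such a conversion involves, here is a standalone Python sketch. The nanobench JSON shape (a `results` array whose entries carry `name` and `median(elapsed)` in seconds) and the Bencher Metric Format keys (`latency` with a `value` in nanoseconds) are assumptions based on each project's documentation, and the sample benchmark name is made up:

```python
import json

def nanobench_to_bmf(nanobench_json: str) -> str:
    """Convert nanobench JSON output to Bencher Metric Format (BMF).

    Assumes nanobench's JSON template output:
      {"results": [{"name": ..., "median(elapsed)": <seconds>, ...}, ...]}
    BMF maps each benchmark name to measures; here "latency" in nanoseconds.
    """
    results = json.loads(nanobench_json)["results"]
    bmf = {
        r["name"]: {"latency": {"value": r["median(elapsed)"] * 1e9}}
        for r in results
    }
    return json.dumps(bmf, indent=2)

# Example with a made-up benchmark result (10 microseconds median):
sample = '{"results": [{"name": "ExampleBench", "median(elapsed)": 1e-05}]}'
print(nanobench_to_bmf(sample))
```

A proper adapter would also carry the error bounds (e.g. nanobench's median absolute percent error) into BMF's `lower_value`/`upper_value` fields rather than just the point estimate.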

@0xB10C (Contributor) commented Apr 10, 2024

I've been playing around with Bencher, running the bench_bitcoin binary in a GitHub Action as a PoC. A sample dashboard is here (however, it takes a while to load for me). While my branch needs a bit of cleanup, it works out of the box, without modifications to nanobench, using a nanobench output template and a custom Bencher metric (seconds instead of the default nanoseconds).
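For reference, nanobench's mustache-style output templates can emit custom formats directly, which is what makes the no-adapter route possible. A hypothetical template along these lines (tag names as in nanobench's template documentation; the output is shaped like Bencher Metric Format entries) could look like:

```
{{#result}}"{{name}}": { "latency": { "value": {{median(elapsed)}} } },
{{/result}}
```

The per-result trailing comma would still need trimming (and the whole thing wrapping in braces) in a post-processing step to form valid JSON.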

@epompeii

> A sample dashboard is here

For others who haven't created an account yet, this is the public perf page.
I have also created a tracking issue to make this sort of redirect the default behavior going forward: bencherdev/bencher#364

> (however, it takes a while till it loads for me)

Yes, my apologies for the long load times. I'm still trying to figure out, design-wise, how I want to handle displaying reports with a lot of benchmarks 😃
This has prompted me to create a tracking issue for this as well: bencherdev/bencher#363

@0xB10C (Contributor) commented Apr 29, 2024

I was made aware of https://bencher.dev/learn/engineering/sqlite-performance-tuning/ recently, and the dashboard seems to load nearly instantly now! Cool. I have this on my list to work on further (at some point). My WIP branch is here if someone else wants to give this a shot.

The next thing to look into is probably adding instruction counts as a measurement. nanobench supports this on Linux, but I'm not sure it is possible in our CI. Wall-clock time might not be an ideal metric to track on a public GitHub runner that might also be running other jobs in parallel and whose hardware may change over time. After that, probably setting up a master job for Statistical Continuous Benchmarking and a PR job for Relative Continuous Benchmarking, à la https://bencher.dev/docs/how-to/track-benchmarks/.
