-
Notifications
You must be signed in to change notification settings - Fork 36.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simple benchmarking framework #6733
Conversation
static void CODE_TO_TIME(benchmark::State& state) | ||
{ | ||
... do any setup needed... | ||
while (state.KeepRunning()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works well as long as 'stuff you want to time' does not run too quickly - otherwise the state.KeepRunning(), which does a gettimeofday
syscall, with the implied latency of that, will dominate.
(an alternative to calling the time function in the benchmark loop that is commonly used, is to first time a fixed # of runs of the loop, then from the result compute a number of iterations so that it runs for approximately the allotted time)
Concept ACK |
if ENABLE_WALLET | ||
bench_bench_bitcoin_LDADD += $(LIBBITCOIN_WALLET) | ||
endif | ||
bench_bench_bitcoin_LDADD += $(LIBBITCOIN_CONSENSUS) $(BDB_LIBS) $(SSL_LIBS) $(CRYPTO_LIBS) $(MINIUPNPC_LIBS) $(CURL_LIBS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rm $(CURL_LIBS)
? This seems to have be sneaked in from XT.
Concept ACK |
e91b47f
to
7500d59
Compare
Removed the $(CURL_LIBS) noticed by @jonasschnelli. And added the new libevent and libzmq dependencies. I also decided to make it disabled by default, so you have to configure --enable-bench, so it doesn't slow down compile times for everybody. @laanwj : RE: gettimeofday overhead and timing things that are very fast: timing how fast gettimeofday() is and then warning the developer if their benchmarking code runs into that overhead (maybe suggest that they loop 1,000 times inside the KeepRunning loop) would be another way of preventing that problem. Feel free to improve, I'm planning on using this to benchmark things that take microseconds to run, not nanoseconds. |
Disagree on default-disabled. Principle: Code that is not built by default bitrots rapidly, is not tested as much, receives less maintenance in general by random developers updating the code. |
@jgarzik : I KNEW you were going to say that... Would a Travis configuration that runs benchmarks be good enough to prevent code rot? Compiling and linking Yet Another Binary that approximately nobody will run (unless you're actively working on optimizing something) tickles my "features shouldn't cost anything unless you're using them" sensibility. |
This looks a bit strange in the benchmarking tool:
|
@@ -0,0 +1,45 @@ | |||
bin_PROGRAMS += bench/bench_bitcoin | |||
BENCH_SRCDIR = bench | |||
BENCH_BINARY=bench/bench_bitcoin$(EXEEXT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spaces around =
?
I think it should stay disabled by default as it is now, because benchmarking will be done on purpose by developers only. Concept ACK. I like it. This is a way to benchmark parts of code. Now we should identify parts that need some love and prepare benchmarks for them specifically. With this framework, people can report speed fixes with proper benchmark code which can be tested before and after applying the fix. @laanwj @gavinandresen I think that both ways to benchmark are useful and maybe we should add the second example benchmark there to illustrate. |
Compile by default but not run, seems fine to me. |
7500d59
to
8815533
Compare
Fixed @paveljanik 's spaces nits, and removed the unneeded SelectParams inherited from the unit test code. Data on the "compile by default or not" decision: It takes 8 seconds on my machine to compile and link the benchmarking code, single-processor, ccache cleared. |
8815533
to
b3a2e05
Compare
Damn you @laanwj, you inspired me to spend another hour implementing support for really fast benchmarks.... Picked @paveljanik 's CreateNewBlock nit (good catch, I copied this from a CreateNewBlock benchmark I haven't finished). I decided to rename MilliSleep.cpp to Examples.cpp, and added a "see how fast sin() runs" benchmark to test the really-fast-benchmark support (works nicely). And switched back to compile-by-default, which seems to be the consensus. I'm really and truly done tweaking this now, assuming Travis is happy. |
Concept ACK |
Yes, to be specific, I meant "compile by default but not run" |
concept ACK |
Tested. |
|
||
#include "bench.h" | ||
|
||
#include "chainparams.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do not need chainparams.h
now.
ACK |
Benchmarking framework, loosely based on google's micro-benchmarking library (https://github.com/google/benchmark) Wny not use the Google Benchmark framework? Because adding Even More Dependencies isn't worth it. If we get a dozen or three benchmarks and need nanosecond-accurate timings of threaded code then switching to the full-blown Google Benchmark library should be considered. The benchmark framework is hard-coded to run each benchmark for one wall-clock second, and then spits out .csv-format timing information to stdout. It is left as an exercise for later (or maybe never) to add command-line arguments to specify which benchmark(s) to run, how long to run them for, how to format results, etc etc etc. Again, see the Google Benchmark framework for where that might end up. See src/bench/MilliSleep.cpp for a sanity-test benchmark that just benchmarks 'sleep 100 milliseconds.' To compile and run benchmarks: cd src; make bench Sample output: Benchmark,count,min,max,average Sleep100ms,10,0.101854,0.105059,0.103881
Avoid calling gettimeofday every time through the benchmarking loop, by keeping track of how long each loop takes and doubling the number of iterations done between time checks when they take less than 1/16'th of the total elapsed time.
b3a2e05
to
7072c54
Compare
@@ -91,6 +91,11 @@ AC_ARG_ENABLE(tests, | |||
[use_tests=$enableval], | |||
[use_tests=yes]) | |||
|
|||
AC_ARG_ENABLE(bench, | |||
AS_HELP_STRING([--enable-bench],[compile benchmarks (default is yes)]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If enabled by default, please make it --disable-bench
(see #6739).
ACK |
Micro-benchmarking framework part 1 Cherry-picked from the following upstream PRs: - bitcoin/bitcoin#6733 - bitcoin/bitcoin#6770 - bitcoin/bitcoin#6892 - Excluding changes to `src/policy/policy.h` which we don't have yet. - bitcoin/bitcoin#7934 - Just the benchmark, not the performance improvements. - bitcoin/bitcoin#8039 - bitcoin/bitcoin#8107 - bitcoin/bitcoin#8115 - bitcoin/bitcoin#8914 - Required resolving several merge conflicts in code that had been refactored upstream. The changes were simple enough that I decided it was okay to impose merge conflicts on pulling in those refactors later. - bitcoin/bitcoin#9200 - bitcoin/bitcoin#9202 - Adds support for measuring CPU cycles, which is later removed in an upstream PR after the refactor. I am including it to reduce future merge conflicts. - bitcoin/bitcoin#9281 - Only changes to `src/bench/bench.cpp` - bitcoin/bitcoin#9498 - bitcoin/bitcoin#9712 - bitcoin/bitcoin#9547 - bitcoin/bitcoin#9505 - Just the benchmark, not the performance improvements. - bitcoin/bitcoin#9792 - Just the benchmark, not the performance improvements. - bitcoin/bitcoin#10272 - bitcoin/bitcoin#10395 - Only changes to `src/bench/` - bitcoin/bitcoin#10735 - Only changes to `src/bench/base58.cpp` - bitcoin/bitcoin#10963 - bitcoin/bitcoin#11303 - Only the benchmark backend change. - bitcoin/bitcoin#11562 - bitcoin/bitcoin#11646 - bitcoin/bitcoin#11654 This pulls in all changes to the micro-benchmark framework prior to December 2017, when it was rewritten. The rewrite depends on other upstream PRs we have not pulled in yet. This does not pull in all benchmarks prior to December 2017. It leaves out benchmarks that either test code we do not have yet (except for the `FastRandomContext` refactor, which I decided to pull in), or would require rewrites to work with our changes to the codebase.
3f3edde [Bench] Use PIVX address in Base58Decode test (random-zebra) 5a1be90 [Travis] Disable benchmark framework for trusty test (random-zebra) 1bd89ac Initialize recently introduced non-static class member lastCycles to zero in constructor (random-zebra) ec60671 Require a steady clock for bench with at least micro precision (random-zebra) 84069ce bench: prefer a steady clock if the resolution is no worse (random-zebra) 38367b1 bench: switch to std::chrono for time measurements (random-zebra) a24633a Remove countMaskInv caching in bench framework (random-zebra) 9e9bc22 Restore default format state of cout after printing with std::fixed/setprecision (random-zebra) 3dd559d Avoid static analyzer warnings regarding uninitialized arguments (random-zebra) e85f224 Replace boost::function with std::function (C++11) (random-zebra) 98c0857 Prevent warning: variable 'x' is uninitialized (random-zebra) 7f0d4b3 FastRandom benchmark (random-zebra) d9fa0c6 Add prevector destructor benchmark (random-zebra) e1527ba Assert that what might look like a possible division by zero is actually unreachable (random-zebra) e94cf15 bench: Fix initialization order in registration (random-zebra) 151c25f Basic CCheckQueue Benchmarks (random-zebra) 51aedbc Use std:thread:hardware_concurrency, instead of Boost, to determine available cores (random-zebra) d447613 Use real number of cores for default -par, ignore virtual cores (random-zebra) 9162a56 [Refactoring] Removed using namespace <xxx> from bench/ sources (random-zebra) 5c07f67 bench: Add support for measuring CPU cycles (random-zebra) 41ce1ed bench: Fix subtle counting issue when rescaling iteration count (random-zebra) 68ea794 Avoid integer division in the benchmark inner-most loop. (random-zebra) 3fa4f27 bench: Added base58 encoding/decoding benchmarks (random-zebra) 4442118 bench: Add crypto hash benchmarks (random-zebra) a5179b6 [Trivial] ensure minimal header conventions (random-zebra) 8607d6b Support very-fast-running benchmarks (random-zebra) 4aebb60 Simple benchmarking framework (random-zebra) Pull request description: Introduces the benchmarking framework, loosely based on google's micro-benchmarking library (https://github.com/google/benchmark), ported from Bitcoin, up to 0.16. The benchmark framework is hard-coded to run each benchmark for one wall-clock second, and then spits out .csv-format timing information to stdout. Backported PR: - bitcoin#6733 - bitcoin#6770 - bitcoin#6892 - bitcoin#8039 - bitcoin#8107 - bitcoin#8115 - bitcoin#9200 - bitcoin#9202 - bitcoin#9281 - bitcoin#6361 - bitcoin#10271 - bitcoin#9498 - bitcoin#9712 - bitcoin#9547 - bitcoin#9505 (benchmark only. Rest was in #1557) - bitcoin#9792 (benchmark only. Rest was in #643) - bitcoin#10272 - bitcoin#10395 (base58 only) - bitcoin#10963 - bitcoin#11303 (first commit) - bitcoin#11562 - bitcoin#11646 - bitcoin#11654 Current output of `src/bench/bench_pivx`: ``` #Benchmark,count,min(ns),max(ns),average(ns),min_cycles,max_cycles,average_cycles Base58CheckEncode,131072,7697,8065,7785,20015,20971,20242 Base58Decode,294912,3305,3537,3454,8595,9198,8981 Base58Encode,180224,5498,6020,5767,14297,15652,14994 CCheckQueueSpeed,320,3159960,3535173,3352787,8216030,9191602,8717388 CCheckQueueSpeedPrevectorJob,96,9184484,11410840,10823070,23880046,29668680,28140445 FastRandom_1bit,320,3143690,4838162,3199156,8173726,12579373,8317941 FastRandom_32bit,60,17097612,17923669,17367440,44454504,46602306,45156079 PrevectorClear,3072,334741,366618,346731,870340,953224,901516 PrevectorDestructor,2816,344233,368912,357281,895022,959187,928948 RIPEMD160,288,3404503,3693917,3577774,8851850,9604334,9302363 SHA1,384,2718128,2891558,2802513,7067238,7518184,7286652 SHA256,176,6133760,6580005,6239866,15948035,17108376,16223916 SHA512,240,4251468,4358706,4313463,11054006,11332826,11215186 Sleep100ms,10,100221470,100302411,100239073,260580075,260790726,260625870 ``` NOTE: Not all the tests have been pulled yet (as we might not have the code being tested, or it would require rewrites to work with our different code base), but the framework is updated to December 2017. ACKs for top commit: Fuzzbawls: ACK 3f3edde Tree-SHA512: c283311a9accf6d2feeb93b185afa08589ebef3f18b6e86980dbc3647b9845f75ac9ecce2f1b08738d25ceac36596a2c89d41e4dbf3b463502aa695611aa1f8e
Benchmarking framework, loosely based on google's micro-benchmarking
library (https://github.com/google/benchmark)
Wny not use the Google Benchmark framework? Because adding Even More Dependencies
isn't worth it. If we get a dozen or three benchmarks and need nanosecond-accurate
timings of threaded code then switching to the full-blown Google Benchmark library
should be considered.
The benchmark framework is hard-coded to run each benchmark for one wall-clock second,
and then spits out .csv-format timing information to stdout. It is left as an
exercise for later (or maybe never) to add command-line arguments to specify which
benchmark(s) to run, how long to run them for, how to format results, etc etc etc.
Again, see the Google Benchmark framework for where that might end up.
See src/bench/Examples.cpp for a sanity-test benchmark that just benchmarks
'sleep 100 milliseconds' and a benchmark for math.h's sin() function.
To compile and run benchmarks:
Sample output: