bench: Add support for measuring CPU cycles #9202

Merged
merged 1 commit into from Nov 29, 2016

@laanwj
Member

commented Nov 22, 2016

This adds cycle min/max/avg to the statistics.

Supported on x86 and x86_64 (natively through rdtsc), as well as for some other architectures on Linux (perf syscall). Will just show 0 on unsupported platforms.

Was tested on x86_64 and AARCH64.
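
As a rough illustration of the idea (not the code from this PR), a cycle counter read with a zero fallback plus a min/max/avg accumulator could look like the sketch below. `ReadCycles` and `CycleStats` are hypothetical names chosen for this example, and the x86 path assumes GCC or Clang, where `__rdtsc()` comes from `<x86intrin.h>`.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h> // __rdtsc() on GCC and Clang
#endif

// Hypothetical helper: read the time stamp counter on x86,
// or return 0 where no counter is available to this sketch.
inline uint64_t ReadCycles()
{
#if defined(__i386__) || defined(__x86_64__)
    return __rdtsc();
#else
    return 0;
#endif
}

// Hypothetical accumulator for the min/max/average cycle columns.
struct CycleStats {
    uint64_t min = std::numeric_limits<uint64_t>::max();
    uint64_t max = 0;
    uint64_t total = 0;
    uint64_t samples = 0;

    // Record the cycle count of one measured run.
    void Add(uint64_t cycles)
    {
        min = std::min(min, cycles);
        max = std::max(max, cycles);
        total += cycles;
        ++samples;
    }

    // Integer average; 0 if nothing was recorded.
    uint64_t Average() const { return samples ? total / samples : 0; }
};
```

A measurement would then take `ReadCycles()` before and after the benchmarked work and pass the difference to `CycleStats::Add`.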

bench: Add support for measuring CPU cycles
This adds cycle min/max/avg to the statistics.

Supported on x86 and x86_64 (natively through rdtsc), as well as Linux
(perf syscall).

laanwj added the Tests label on Nov 22, 2016

@jonasschnelli
Member

left a comment

Tested ACK (OSX) 3532818

Result with -O2 on OSX (2.6 GHz Intel Core i7):

#Benchmark,count,min,max,average,min_cycles,max_cycles,average_cycles
Base58CheckEncode,229376,0.000003975452273,0.000005225097993,0.000004511387877,10312,13553,11703
Base58Decode,851968,0.000001059088390,0.000001410629920,0.000001215919978,2747,3659,3154
Base58Encode,327680,0.000002935426892,0.000003485380148,0.000003217919584,7617,9042,8347
CCoinsCaching,90112,0.000009148381650,0.000012961449102,0.000011695591225,23730,33622,30338
CoinSelection,416,0.002168059349060,0.002760812640190,0.002422756873644,5623936,7161456,6284967
DeserializeAndCheckBlockTest,72,0.013411879539490,0.015962481498718,0.014648040135701,34790547,41406927,37996977
DeserializeBlockTest,88,0.010604500770569,0.012940049171448,0.011376557025042,27508165,33566117,29512372
LockedPool,512,0.001808419823647,0.003033317625523,0.002045841421932,4691069,7868411,5307190
MempoolEviction,15360,0.000059797894210,0.000087032094598,0.000065140891820,155116,225747,168975
RIPEMD160,384,0.002612933516502,0.002894565463066,0.002725711092353,6777939,7508452,7070852
RollingBloom-refresh,1,0.000611000000000,0.000611000000000,0.000611000000000
RollingBloom-refresh,1,0.000105000000000,0.000105000000000,0.000105000000000
RollingBloom-refresh,1,0.000101000000000,0.000101000000000,0.000101000000000
RollingBloom-refresh,1,0.000097000000000,0.000097000000000,0.000097000000000
RollingBloom-refresh,1,0.000113000000000,0.000113000000000,0.000113000000000
RollingBloom-refresh,1,0.000096000000000,0.000096000000000,0.000096000000000
RollingBloom-refresh,1,0.000096000000000,0.000096000000000,0.000096000000000
RollingBloom-refresh,1,0.000099000000000,0.000099000000000,0.000099000000000
RollingBloom-refresh,1,0.000096000000000,0.000096000000000,0.000096000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom-refresh,1,0.000128000000000,0.000128000000000,0.000128000000000
RollingBloom-refresh,1,0.000094000000000,0.000094000000000,0.000094000000000
RollingBloom-refresh,1,0.000151000000000,0.000151000000000,0.000151000000000
RollingBloom-refresh,1,0.000095000000000,0.000095000000000,0.000095000000000
RollingBloom-refresh,1,0.000106000000000,0.000106000000000,0.000106000000000
RollingBloom-refresh,1,0.000124000000000,0.000124000000000,0.000124000000000
RollingBloom-refresh,1,0.000115000000000,0.000115000000000,0.000115000000000
RollingBloom-refresh,1,0.000100000000000,0.000100000000000,0.000100000000000
RollingBloom-refresh,1,0.000100000000000,0.000100000000000,0.000100000000000
RollingBloom-refresh,1,0.000117000000000,0.000117000000000,0.000117000000000
RollingBloom-refresh,1,0.000101000000000,0.000101000000000,0.000101000000000
RollingBloom-refresh,1,0.000111000000000,0.000111000000000,0.000111000000000
RollingBloom,1310720,0.000000795478627,0.000000927659130,0.000000840967550,2063,2406,2181
SHA1,512,0.001935496926308,0.002218931913376,0.002032823860645,5020685,5755989,5273419
SHA256,208,0.004498481750488,0.005540251731873,0.004966990305827,11667956,14371842,12885041
SHA256_32b,4,0.345051527023315,0.346106529235840,0.345579028129578,895062320,897798922,896430621
SHA512,352,0.002845406532288,0.003299534320831,0.003069994124499,7380971,8558912,7963958
SipHash_32b,30,0.033124923706055,0.037207484245300,0.035290129979451,85925534,96516794,91547269
Sleep100ms,10,0.100992441177368,0.104498505592346,0.102697491645813,261974287,271068521,266396862
Trig,67108864,0.000000014460568,0.000000015428895,0.000000014972940,37,40,38
VerifyScriptBench,5632,0.000182222574949,0.000207984820008,0.000195238654586,472678,539492,506447
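
One way to sanity-check the new columns is to divide the average cycle count by the average time, which gives the effective counter frequency. The sketch below does this for the Sleep100ms row above; `CounterFrequencyHz` is a hypothetical helper written for this example. The result is roughly 2.59e9 Hz, in line with the 2.6 GHz CPU. That Sleep100ms still accumulates cycles suggests the counter keeps ticking at a fixed rate while the thread sleeps, which is how an invariant TSC behaves, though the thread does not confirm this.

```cpp
#include <cstdint>

// Hypothetical helper: average cycles divided by average seconds
// gives the effective counter frequency in Hz.
inline double CounterFrequencyHz(uint64_t average_cycles, double average_seconds)
{
    return static_cast<double>(average_cycles) / average_seconds;
}

// Values copied from the Sleep100ms row of the output above.
inline double SleepRowFrequencyHz()
{
    return CounterFrequencyHz(266396862, 0.102697491645813);
}
```

Rows with very small cycle counts (for example Trig) give a coarser estimate, since the cycle columns are integers.
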
@morcos

Member

commented Nov 22, 2016

@laanwj I was playing around with this type of timing earlier and read that I should be wary of rdtsc getting reordered with respect to other instructions and that if you can't use rdtscp instead, then you should add a serializing instruction first like cpuid. Also do you not have any issues with the thread migrating to another core? I had to set cpu affinity.

I couldn't find where I was reading all that, but here is one link:
http://blog.regehr.org/archives/330

@laanwj

Member Author

commented Nov 23, 2016

> I was playing around with this type of timing earlier and read that I should be wary of rdtsc getting reordered with respect to other instructions

Yes, both the compiler and the CPU pipeline may reorder it. In this specific case it's not too bad, though, because the call is already from a function (State::KeepRunning) called inside the benchmark. So there is quite some overhead already, making reordering by a few instructions probably unnoticeable in the noise.

> and that if you can't use rdtscp instead, then you should add a serializing instruction first like cpuid.

I didn't know that, although rdtscp does not seem to be available on all x86 processors. I'll leave that as a future improvement.
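
For reference, a possible shape of that future improvement is sketched below. It is not part of this PR. It assumes GCC or Clang on x86, checks for RDTSCP through CPUID (extended leaf 0x80000001, EDX bit 27), and otherwise falls back to executing CPUID as a serializing instruction before RDTSC. `HasRdtscp`, `ReadCyclesOrdered`, and the `SKETCH_HAVE_X86_TSC` macro are hypothetical names.

```cpp
#include <cstdint>

#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
#include <cpuid.h>     // __get_cpuid()
#include <x86intrin.h> // __rdtsc(), __rdtscp()
#define SKETCH_HAVE_X86_TSC 1 // hypothetical macro, local to this sketch
#endif

// Hypothetical helper: true if CPUID reports RDTSCP support.
inline bool HasRdtscp()
{
#ifdef SKETCH_HAVE_X86_TSC
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 if the requested leaf is not supported.
    if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx)) return false;
    return (edx & (1u << 27)) != 0;
#else
    return false;
#endif
}

// Hypothetical helper: read the TSC with some ordering against earlier instructions.
inline uint64_t ReadCyclesOrdered()
{
#ifdef SKETCH_HAVE_X86_TSC
    static const bool has_rdtscp = HasRdtscp();
    if (has_rdtscp) {
        unsigned int aux = 0; // receives IA32_TSC_AUX, unused here
        return __rdtscp(&aux);
    }
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    __get_cpuid(0, &eax, &ebx, &ecx, &edx); // CPUID serializes the pipeline
    return __rdtsc();
#else
    return 0; // no counter available to this sketch
#endif
}
```

CPUID is comparatively expensive, so the fallback path adds overhead of its own to every measurement.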

x86 is already precise and low-overhead compared to the ARM path which has to do a syscall (the instructions aren't available to user-space).
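
To illustrate what the syscall-based path involves, here is a minimal sketch of reading a hardware cycle counter through `perf_event_open` on Linux. It is an assumption-laden example rather than the PR's code: `PerfCycleCounter` is a hypothetical class, and the counter reads as 0 when the syscall is unavailable or not permitted (for example under a restrictive `perf_event_paranoid` setting).

```cpp
#include <cstdint>

#ifdef __linux__
#include <cstring>
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#endif

// Hypothetical wrapper around a per-thread hardware cycle counter.
class PerfCycleCounter
{
    int m_fd = -1;

public:
    PerfCycleCounter()
    {
#ifdef __linux__
        struct perf_event_attr attr;
        std::memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        attr.exclude_kernel = 1; // count user-space cycles only
        attr.exclude_hv = 1;
        // pid 0 and cpu -1: the calling thread on any CPU; no group leader, no flags.
        m_fd = static_cast<int>(syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0UL));
#endif
    }

    ~PerfCycleCounter()
    {
#ifdef __linux__
        if (m_fd >= 0) close(m_fd);
#endif
    }

    PerfCycleCounter(const PerfCycleCounter&) = delete;
    PerfCycleCounter& operator=(const PerfCycleCounter&) = delete;

    // True if the kernel handed out a counter.
    bool Available() const { return m_fd >= 0; }

    // Current counter value, or 0 if no counter is available.
    uint64_t Read() const
    {
#ifdef __linux__
        uint64_t count = 0;
        if (m_fd >= 0 && read(m_fd, &count, sizeof(count)) == static_cast<ssize_t>(sizeof(count))) {
            return count;
        }
#endif
        return 0;
    }
};
```

Each `Read()` is a system call, which is the overhead referred to above.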

> Also do you not have any issues with the thread migrating to another core? I had to set cpu affinity.

Indeed, calling bench with e.g. taskset -c 0 bench_bitcoin will likely get more precise cycle measurements.
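
The same pinning can be done from inside a program. The sketch below is Linux-only and not something this PR adds; `PinThreadToCpu` is a hypothetical helper built on `sched_setaffinity`.

```cpp
#ifdef __linux__
#include <sched.h> // sched_setaffinity, CPU_ZERO, CPU_SET (g++ defines _GNU_SOURCE by default)
#endif

// Hypothetical helper: restrict the calling thread to a single CPU.
// Returns false on failure or on platforms this sketch does not cover.
inline bool PinThreadToCpu(int cpu)
{
#ifdef __linux__
    if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // pid 0 means the calling thread.
    return sched_setaffinity(0, sizeof(set), &set) == 0;
#else
    (void)cpu;
    return false;
#endif
}
```

Calling `PinThreadToCpu(0)` at the start of a benchmark run would roughly correspond to `taskset -c 0`, provided CPU 0 is in the set the process is allowed to use.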

@fanquake

Member

commented Nov 23, 2016

Running on OSX (3.4GHz i7)

#Benchmark,count,min,max,average,min_cycles,max_cycles,average_cycles
Base58CheckEncode,262144,0.000003828128683,0.000004008295946,0.000003908591680,12985,13597,13260
Base58Decode,983040,0.000000999269105,0.000001138963853,0.000001040822341,3389,3863,3530
Base58Encode,425984,0.000002385859261,0.000002805812983,0.000002487964454,8093,9517,8440
CCoinsCaching,106496,0.000009394483641,0.000010205199942,0.000010016403394,31871,34618,33978
CoinSelection,480,0.002068780362606,0.002542287111282,0.002170727153619,7017890,8624174,7364277
DeserializeAndCheckBlockTest,96,0.010975986719131,0.011605978012085,0.011256289978822,37233600,39370829,38184392
DeserializeBlockTest,112,0.009186625480652,0.010600864887238,0.009532500590597,31163698,35960873,32339308
LockedPool,640,0.001598000526428,0.001764506101608,0.001673145219684,5420783,5985758,5675764
MempoolEviction,14336,0.000070976559073,0.000092454254627,0.000074292616253,240772,313630,252040
RIPEMD160,416,0.002447441220284,0.002628095448017,0.002492115474664,8302338,8915072,8453936
RollingBloom-refresh,1,0.000569000000000,0.000569000000000,0.000569000000000
RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
RollingBloom-refresh,1,0.000121000000000,0.000121000000000,0.000121000000000
RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
RollingBloom-refresh,1,0.000114000000000,0.000114000000000,0.000114000000000
RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
RollingBloom-refresh,1,0.000114000000000,0.000114000000000,0.000114000000000
RollingBloom-refresh,1,0.000115000000000,0.000115000000000,0.000115000000000
RollingBloom-refresh,1,0.000125000000000,0.000125000000000,0.000125000000000
RollingBloom-refresh,1,0.000111000000000,0.000111000000000,0.000111000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom-refresh,1,0.000113000000000,0.000113000000000,0.000113000000000
RollingBloom-refresh,1,0.000116000000000,0.000116000000000,0.000116000000000
RollingBloom-refresh,1,0.000114000000000,0.000114000000000,0.000114000000000
RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
RollingBloom-refresh,1,0.000116000000000,0.000116000000000,0.000116000000000
RollingBloom-refresh,1,0.000149000000000,0.000149000000000,0.000149000000000
RollingBloom-refresh,1,0.000113000000000,0.000113000000000,0.000113000000000
RollingBloom-refresh,1,0.000114000000000,0.000114000000000,0.000114000000000
RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
RollingBloom-refresh,1,0.000111000000000,0.000111000000000,0.000111000000000
RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
RollingBloom,1441792,0.000000725554855,0.000000796730092,0.000000755561731,2465,2702,2563
SHA1,576,0.001806784421206,0.001858018338680,0.001830946240160,6129109,6302896,6211558
SHA256,240,0.004244878888130,0.004624485969543,0.004340062538783,14399626,15688224,14722663
SHA256_32b,4,0.300903081893921,0.302513957023621,0.301708519458771,1020885060,1026209988,1023547524
SHA512,384,0.002580255270004,0.002831816673279,0.002633192886909,8752964,9606202,8932496
SipHash_32b,28,0.036777973175049,0.037643551826477,0.037094610077994,124759771,127697052,125844822
Sleep100ms,10,0.102699518203735,0.104491472244263,0.103782296180725,348388267,354464246,352058326
Trig,67108864,0.000000015213971,0.000000016026718,0.000000015468853,51,54,52
VerifyScriptBench,6144,0.000170339830220,0.000214513391256,0.000176954781637,577836,727691,600278
@jonasschnelli

Member

commented Nov 23, 2016

@fanquake: did you compile with -O2 or -O0 (--enable-debug)?

@paveljanik

Contributor

commented Nov 23, 2016

It looks like the RollingBloom-refresh bench has not been changed to the new output format.

@laanwj

Member Author

commented Nov 23, 2016

@paveljanik

Contributor

commented Nov 23, 2016

Agreed.

ACK 3532818

@fanquake

Member

commented Nov 23, 2016

@jonasschnelli

--enable-debug

Options used to compile and link:
  debug enabled = yes
  target os     = darwin
  build os      = darwin

  CC            = /usr/local/bin/ccache gcc
  CFLAGS        = -g -O2 -g3 -O0
  CPPFLAGS      = -Qunused-arguments  -DDEBUG -DDEBUG_LOCKORDER -DHAVE_BUILD_INFO -D__STDC_FORMAT_MACROS -I/usr/local/opt/berkeley-db4/include -DMAC_OSX
  CXX           = /usr/local/bin/ccache g++ -std=c++11
  CXXFLAGS      = -g -O2 -g3 -O0 -Wall -Wextra -Wformat -Wformat-security -Wno-unused-parameter -Wno-self-assign -Wno-unused-local-typedef -Wno-deprecated-register
  LDFLAGS       =  -Wl,-headerpad_max_install_names -Wl,-dead_strip
#Benchmark,count,min,max,average,min_cycles,max_cycles,average_cycles
Base58CheckEncode,30720,0.000034503871575,0.000035417033359,0.000034728871348,117174,120144,117818
Base58Decode,73728,0.000013721641153,0.000014259770978,0.000013993813708,46547,48372,47470
Base58Encode,40960,0.000023265369236,0.000028600101359,0.000024570309324,78931,97019,83356
CCoinsCaching,13312,0.000073989387602,0.000080669764429,0.000076363722865,250992,273651,259056
CoinSelection,104,0.009642988443375,0.011777520179749,0.009857095204867,32711658,40027011,33439357
DeserializeAndCheckBlockTest,12,0.087463498115540,0.089210510253906,0.088053584098816,296699541,302626002,298725543
DeserializeBlockTest,16,0.068791985511780,0.069242000579834,0.068972617387772,233362289,234886757,233973693
LockedPool,160,0.004223585128784,0.006983995437622,0.006671081483364,14328786,23691500,22631932
MempoolEviction,2560,0.000405104830861,0.000430928543210,0.000418359413743,1374232,1461820,1419187
RIPEMD160,20,0.053014993667603,0.053706049919128,0.053378355503082,179840289,182186794,181088217
RollingBloom,229376,0.000004121757229,0.000005482856068,0.000004432338756,13982,18599,15035
SHA1,56,0.018210709095001,0.019991517066956,0.018770660672869,61774970,67816841,63680293
SHA256,32,0.032343983650208,0.033765912055969,0.033258154988289,109720940,114543363,112829591
SHA256_32b,2,2.213187932968140,2.213187932968140,2.213187932968140,7508019029,7508019029,7508019029
SHA512,52,0.020092487335205,0.020573496818542,0.020253401536208,68158867,69791050,68705021
SipHash_32b,8,0.155891418457031,0.157407522201538,0.156427383422852,528826462,534114465,530680206
Sleep100ms,10,0.100543022155762,0.104717016220093,0.102999615669250,341066705,355370400,349431503
Trig,62914560,0.000000015808098,0.000000016690024,0.000000016066789,53,56,54
VerifyScriptBench,3584,0.000285433605313,0.000293343327940,0.000287477991411,968261,996237,975283

laanwj merged commit 3532818 into bitcoin:master on Nov 29, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
laanwj added a commit that referenced this pull request Nov 29, 2016
Merge #9202: bench: Add support for measuring CPU cycles
3532818 bench: Add support for measuring CPU cycles (Wladimir J. van der Laan)
codablock added a commit to codablock/dash that referenced this pull request Jan 17, 2018
Merge bitcoin#9202: bench: Add support for measuring CPU cycles
3532818 bench: Add support for measuring CPU cycles (Wladimir J. van der Laan)
andvgal added a commit to energicryptocurrency/energi that referenced this pull request Jan 6, 2019
Merge bitcoin#9202: bench: Add support for measuring CPU cycles
3532818 bench: Add support for measuring CPU cycles (Wladimir J. van der Laan)
CryptoCentric added a commit to absolute-community/absolute that referenced this pull request Feb 25, 2019
Merge bitcoin#9202: bench: Add support for measuring CPU cycles
3532818 bench: Add support for measuring CPU cycles (Wladimir J. van der Laan)
5 participants