New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid integer division in the benchmark inner-most loop. #8115

Merged
merged 1 commit into from May 31, 2016

Conversation

Projects
None yet
3 participants
@gmaxwell
Member

gmaxwell commented May 29, 2016

Previously the benchmark code used an integer division (%) with
a non-constant in the inner-loop. This is quite slow on many
processors, especially ones like ARM that lack a hardware divide.

Even on fairly recent x86_64 like haswell an integer division can
take something like 100 cycles-- making it comparable to the
runtime of siphash.

This change avoids the division by using bitmasking instead. This
was especially easy since the count was only increased by doubling.

This change also restarts the timing when the execution time was
very low this avoids mintimes of zero in cases where one execution
ends up below the timer resolution. It also reduces the impact of
the overhead on the final result.

The formatting of the prints is changed to not use scientific
notation make it more machine readable (in particular, gnuplot
croaks on the non-fixedpoint, and it doesn't sort correctly).

This also hoists out all the floating point divisions out of the
semi-hot path because it was easy to do so.

It might be prudent to break out the critical test into a macro
just to guarantee that it gets inlined. It might also make sense
to just save out the intermediate counts and times and get the
floating point completely out of the timing loop (because e.g.
on hardware without a fast hardware FPU like some ARM it will
still be slow enough to distort the results). I haven't done
either of these in this commit.

@fanquake

This comment has been minimized.

Show comment
Hide comment
@fanquake

fanquake May 29, 2016

Member

Master

Benchmark,count,min,max,average
RIPEMD160,415,0.00242996,0.00253913,0.00246098
RollingBloom-refresh,1,0.000801,0.000801,0.000801
RollingBloom-refresh,1,0.000111,0.000111,0.000111
RollingBloom-refresh,1,0.000107,0.000107,0.000107
RollingBloom-refresh,1,0.000107,0.000107,0.000107
RollingBloom-refresh,1,0.000111,0.000111,0.000111
RollingBloom-refresh,1,0.000107,0.000107,0.000107
RollingBloom-refresh,1,0.000108,0.000108,0.000108
RollingBloom-refresh,1,0.000108,0.000108,0.000108
RollingBloom-refresh,1,0.000107,0.000107,0.000107
RollingBloom-refresh,1,0.000131,0.000131,0.000131
RollingBloom-refresh,1,0.000111,0.000111,0.000111
RollingBloom-refresh,1,0.00011,0.00011,0.00011
RollingBloom-refresh,1,0.000109,0.000109,0.000109
RollingBloom-refresh,1,0.000108,0.000108,0.000108
RollingBloom-refresh,1,0.000129,0.000129,0.000129
RollingBloom-refresh,1,0.000109,0.000109,0.000109
RollingBloom-refresh,1,0.000108,0.000108,0.000108
RollingBloom-refresh,1,0.000114,0.000114,0.000114
RollingBloom-refresh,1,0.000134,0.000134,0.000134
RollingBloom-refresh,1,0.000109,0.000109,0.000109
RollingBloom-refresh,1,0.000108,0.000108,0.000108
RollingBloom-refresh,1,0.000108,0.000108,0.000108
RollingBloom-refresh,1,0.00012,0.00012,0.00012
RollingBloom-refresh,1,0.000111,0.000111,0.000111
RollingBloom,1441791,6.77479e-07,1.19209e-06,7.49941e-07
SHA1,575,0.00178885,0.00195658,0.0018124
SHA256,239,0.00424147,0.00436625,0.00429098
SHA512,383,0.00258148,0.00269461,0.00262054
Sleep100ms,10,0.100642,0.105008,0.103793
Trig,46137343,0,9.53674e-07,2.26862e-08

This PR

#Benchmark,count,min,max,average
RIPEMD160,448,0.001245033173334,0.002638196945190,0.002461894814457
RollingBloom-refresh,1,0.000635000000000,0.000635000000000,0.000635000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom-refresh,1,0.000107000000000,0.000107000000000,0.000107000000000
RollingBloom-refresh,1,0.000204000000000,0.000204000000000,0.000204000000000
RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
RollingBloom-refresh,1,0.000120000000000,0.000120000000000,0.000120000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
RollingBloom-refresh,1,0.000127000000000,0.000127000000000,0.000127000000000
RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
RollingBloom-refresh,1,0.000134000000000,0.000134000000000,0.000134000000000
RollingBloom-refresh,1,0.000113000000000,0.000113000000000,0.000113000000000
RollingBloom-refresh,1,0.000111000000000,0.000111000000000,0.000111000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom-refresh,1,0.000110000000000,0.000110000000000,0.000110000000000
RollingBloom-refresh,1,0.000107000000000,0.000107000000000,0.000107000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom,1310720,0.000000364444583,0.000000838096198,0.000000766858648
SHA1,640,0.000909024336207,0.001938136418660,0.001843086257577
SHA256,256,0.002209486499909,0.008500099182129,0.004300644621253
SHA512,384,0.001319904176016,0.002813005447388,0.002615700786312
Sleep100ms,10,0.205592155456543,0.210056066513062,0.104166316986084
Trig,67108864,0.000000014997003,0.000000015448112,0.000000015188842
Member

fanquake commented May 29, 2016

Master

Benchmark,count,min,max,average
RIPEMD160,415,0.00242996,0.00253913,0.00246098
RollingBloom-refresh,1,0.000801,0.000801,0.000801
RollingBloom-refresh,1,0.000111,0.000111,0.000111
RollingBloom-refresh,1,0.000107,0.000107,0.000107
RollingBloom-refresh,1,0.000107,0.000107,0.000107
RollingBloom-refresh,1,0.000111,0.000111,0.000111
RollingBloom-refresh,1,0.000107,0.000107,0.000107
RollingBloom-refresh,1,0.000108,0.000108,0.000108
RollingBloom-refresh,1,0.000108,0.000108,0.000108
RollingBloom-refresh,1,0.000107,0.000107,0.000107
RollingBloom-refresh,1,0.000131,0.000131,0.000131
RollingBloom-refresh,1,0.000111,0.000111,0.000111
RollingBloom-refresh,1,0.00011,0.00011,0.00011
RollingBloom-refresh,1,0.000109,0.000109,0.000109
RollingBloom-refresh,1,0.000108,0.000108,0.000108
RollingBloom-refresh,1,0.000129,0.000129,0.000129
RollingBloom-refresh,1,0.000109,0.000109,0.000109
RollingBloom-refresh,1,0.000108,0.000108,0.000108
RollingBloom-refresh,1,0.000114,0.000114,0.000114
RollingBloom-refresh,1,0.000134,0.000134,0.000134
RollingBloom-refresh,1,0.000109,0.000109,0.000109
RollingBloom-refresh,1,0.000108,0.000108,0.000108
RollingBloom-refresh,1,0.000108,0.000108,0.000108
RollingBloom-refresh,1,0.00012,0.00012,0.00012
RollingBloom-refresh,1,0.000111,0.000111,0.000111
RollingBloom,1441791,6.77479e-07,1.19209e-06,7.49941e-07
SHA1,575,0.00178885,0.00195658,0.0018124
SHA256,239,0.00424147,0.00436625,0.00429098
SHA512,383,0.00258148,0.00269461,0.00262054
Sleep100ms,10,0.100642,0.105008,0.103793
Trig,46137343,0,9.53674e-07,2.26862e-08

This PR

#Benchmark,count,min,max,average
RIPEMD160,448,0.001245033173334,0.002638196945190,0.002461894814457
RollingBloom-refresh,1,0.000635000000000,0.000635000000000,0.000635000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom-refresh,1,0.000107000000000,0.000107000000000,0.000107000000000
RollingBloom-refresh,1,0.000204000000000,0.000204000000000,0.000204000000000
RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
RollingBloom-refresh,1,0.000120000000000,0.000120000000000,0.000120000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
RollingBloom-refresh,1,0.000127000000000,0.000127000000000,0.000127000000000
RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
RollingBloom-refresh,1,0.000134000000000,0.000134000000000,0.000134000000000
RollingBloom-refresh,1,0.000113000000000,0.000113000000000,0.000113000000000
RollingBloom-refresh,1,0.000111000000000,0.000111000000000,0.000111000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom-refresh,1,0.000110000000000,0.000110000000000,0.000110000000000
RollingBloom-refresh,1,0.000107000000000,0.000107000000000,0.000107000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom,1310720,0.000000364444583,0.000000838096198,0.000000766858648
SHA1,640,0.000909024336207,0.001938136418660,0.001843086257577
SHA256,256,0.002209486499909,0.008500099182129,0.004300644621253
SHA512,384,0.001319904176016,0.002813005447388,0.002615700786312
Sleep100ms,10,0.205592155456543,0.210056066513062,0.104166316986084
Trig,67108864,0.000000014997003,0.000000015448112,0.000000015188842

@laanwj laanwj added the Tests label May 30, 2016

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj May 30, 2016

Member

utACK

Member

laanwj commented May 30, 2016

utACK

@gmaxwell

This comment has been minimized.

Show comment
Hide comment
@gmaxwell

gmaxwell May 30, 2016

Member

There is a bug in the min/max handling (see the sleep 100 timings). I'll fix.

Member

gmaxwell commented May 30, 2016

There is a bug in the min/max handling (see the sleep 100 timings). I'll fix.

Avoid integer division in the benchmark inner-most loop.
Previously the benchmark code used an integer division (%) with
 a non-constant in the inner-loop.  This is quite slow on many
 processors, especially ones like ARM that lack a hardware divide.

Even on fairly recent x86_64 like haswell an integer division can
 take something like 100 cycles-- making it comparable to the
 runtime of siphash.

This change avoids the division by using bitmasking instead. This
 was especially easy since the count was only increased by doubling.

This change also restarts the timing when the execution time was
 very low this avoids mintimes of zero in cases where one execution
 ends up below the timer resolution. It also reduces the impact of
 the overhead on the final result.

The formatting of the prints is changed to not use scientific
 notation make it more machine readable (in particular, gnuplot
 croaks on the non-fixedpoint, and it doesn't sort correctly).

This also hoists out all the floating point divisions out of the
 semi-hot path because it was easy to do so.

It might be prudent to break out the critical test into a macro
 just to guarantee that it gets inlined.  It might also make sense
 to just save out the intermediate counts and times and get the
 floating point completely out of the timing loop (because e.g.
 on hardware without a fast hardware FPU like some ARM it will
 still be slow enough to distort the results). I haven't done
 either of these in this commit.

@laanwj laanwj merged commit 63ff57d into bitcoin:master May 31, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

laanwj added a commit that referenced this pull request May 31, 2016

Merge #8115: Avoid integer division in the benchmark inner-most loop.
63ff57d Avoid integer division in the benchmark inner-most loop. (Gregory Maxwell)

codablock added a commit to codablock/dash that referenced this pull request Sep 16, 2017

Merge #8115: Avoid integer division in the benchmark inner-most loop.
63ff57d Avoid integer division in the benchmark inner-most loop. (Gregory Maxwell)

codablock added a commit to codablock/dash that referenced this pull request Sep 19, 2017

Merge #8115: Avoid integer division in the benchmark inner-most loop.
63ff57d Avoid integer division in the benchmark inner-most loop. (Gregory Maxwell)

codablock added a commit to codablock/dash that referenced this pull request Dec 21, 2017

Merge #8115: Avoid integer division in the benchmark inner-most loop.
63ff57d Avoid integer division in the benchmark inner-most loop. (Gregory Maxwell)

@dagurval dagurval referenced this pull request Apr 30, 2018

Merged

Benchmark cherries #408

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment