secp256k1: Optimize precomp values to use affine. #2690

Merged

@davecgh davecgh commented Jul 31, 2021

This optimizes the pre-computed byte points used to accelerate scalar base multiplication by storing them in affine coordinates instead of Jacobian coordinates. Storing two coordinates per point instead of three reduces the memory usage requirement to about 66% of what it currently requires, and it also has the important benefit of further speeding up the computation.

The speedup comes from two facts: projecting affine coordinates into Jacobian space is essentially free, and the point doubling and addition routines have optimizations that allow them to avoid additional operations when the Z coordinate is 1, which is the case for an initial affine projection.

Further, since the compressed table is stored in the string table of the binary, it also reduces the size of the final binary by ~385KiB.

There are also a couple of preparatory commits to ease the review process. They separate the code that loads the pre-computed byte points from the elliptic adaptor code, which further makes the internals of the package independent of the crypto/elliptic and crypto/ecdsa stdlib interfaces.

The following benchmark shows a before and after comparison of scalar base multiplication as well as how that translates to signature verification:

name                 old time/op   new time/op   delta
-------------------------------------------------------------------------
ScalarBaseMult       34.5µs ± 1%   24.7µs ± 1%   -28.43% (p=0.008 n=5+5)
ScalarBaseMultLarge  48.2µs ± 1%   38.0µs ± 1%   -21.08% (p=0.008 n=5+5)
SigVerify             181µs ± 5%    163µs ± 2%    -9.86% (p=0.008 n=5+5)

While 18 µs less per signature verification might not seem like much on the surface, consider that every transaction requires at least one signature operation, so a sync with checkpoints disabled performs a very large number of them. For a concrete number, verifying 100 million signatures would take 30 minutes less time (100,000,000 × 18 µs = 1,800 s).

@davecgh davecgh added this to the 1.7.0 milestone Jul 31, 2021
@davecgh davecgh force-pushed the secp256k1_optimize_precomps_and_base_mult branch 3 times, most recently from af58a2d to d886f32 Compare August 2, 2021 23:18

davecgh commented Aug 2, 2021

Rebased to the latest master. No changes.

@matheusd matheusd left a comment

Tested regenerating the compressedpoints.go file (after locally removing it) and it matches the one in the commit.

@rstaudt2 rstaudt2 left a comment

Looks good to me! Ran a full sync on mainnet with --nocheckpoints without issue.

dcrec/secp256k1/loadprecomputed.go (outdated review thread, resolved)
This separates the code that loads the pre-computed byte points used to
accelerate scalar base multiplication from the elliptic adaptor code
which further makes the internals of the package independent of the
crypto/elliptic and crypto/ecdsa stdlib interfaces.

It also takes this opportunity to improve the related code a bit by
making it less dependent on magic numbers and defining a proper type for
the data table.

Finally, it retains and improves the logic which only loads the data on
first use by making use of a closure to house and access the loaded data
so it is no longer possible to accidentally access the uninitialized
pointer.
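The load-on-first-use closure described above might look roughly like the following. This is a minimal sketch and not the code from this commit: `pointTable`, `getTable`, and `loadCount` are hypothetical names, the table is a tiny stand-in for the real data, and the use of `sync.Once` is an assumption about how the one-time load could be guarded.

```go
package main

import (
	"fmt"
	"sync"
)

// pointTable is a hypothetical, tiny stand-in for the real table type.
type pointTable [4][2]uint32

// loadCount exists only so the example can show the load ran once.
var loadCount int

// getTable returns the table, loading it on first use. The loaded pointer
// lives only inside the closure, so no other code can touch it before it
// has been initialized.
var getTable = func() func() *pointTable {
	var once sync.Once
	var table *pointTable
	return func() *pointTable {
		once.Do(func() {
			loadCount++
			t := new(pointTable)
			for i := range t {
				// Placeholder data; real code would decode the
				// compressed pre-computed points here.
				t[i] = [2]uint32{uint32(i), uint32(i * i)}
			}
			table = t
		})
		return table
	}
}()

func main() {
	first := getTable()
	second := getTable()
	fmt.Println(first == second, loadCount)
}
```

Because callers can only reach the data through the accessor, there is no package-level pointer that could be read while still nil.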
This removes the code that deals with only initializing the adaptor
instance on first use since the adaptor code no longer houses the
pre-computed byte points which motivated that behavior.
@davecgh davecgh force-pushed the secp256k1_optimize_precomps_and_base_mult branch from d886f32 to fdfae1a Compare August 11, 2021 19:10
@davecgh davecgh merged commit fdfae1a into decred:master Aug 11, 2021
@davecgh davecgh deleted the secp256k1_optimize_precomps_and_base_mult branch August 11, 2021 19:17