secp256k1: Optimize precomp values to use affine. #2690
Rebased to the latest master. No changes.
Tested regenerating the compressedpoints.go file (after locally removing it) and it matches the one in the commit.
Looks good to me! Ran a full sync on mainnet with --nocheckpoints without issue.
This separates the code that loads the pre-computed byte points used to accelerate scalar base multiplication from the elliptic adaptor code, which further makes the internals of the package independent of the crypto/elliptic and crypto/ecdsa stdlib interfaces. It also takes this opportunity to improve the related code a bit by making it less dependent on magic numbers and defining a proper type for the data table. Finally, it retains and improves the logic that loads the data only on first use by housing and accessing the loaded data in a closure, so it is no longer possible to accidentally access the uninitialized pointer.
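The closure-based, load-on-first-use pattern described in this commit can be sketched in Go as follows. This is a minimal sketch under stated assumptions, not the package's actual code: `bytePointTable`, `decodeTable`, `decodeCount`, and `s256BytePoints` are hypothetical names, the table is shrunk to a few dummy entries, and `sync.Once` is assumed as the first-use guard. What it illustrates is that the table pointer is captured inside the closure, so the only way to reach it is through the accessor, which always initializes it first.

```go
package main

import (
	"fmt"
	"sync"
)

// bytePointTable is a hypothetical, much-reduced stand-in for the real
// pre-computed table type; each entry holds an (x, y) pair.
type bytePointTable [4][2]uint64

// decodeCount records how many times the (normally expensive) decode runs.
var decodeCount int

// decodeTable is a hypothetical placeholder for decompressing and parsing
// the compressed points embedded in the binary.
func decodeTable() *bytePointTable {
	decodeCount++
	var t bytePointTable
	for i := range t {
		t[i] = [2]uint64{uint64(i), uint64(i * i)}
	}
	return &t
}

// s256BytePoints returns the table, decoding it on first use. The data
// pointer lives only inside the closure, so no caller can observe it
// before it has been initialized.
var s256BytePoints = func() func() *bytePointTable {
	var data *bytePointTable
	var once sync.Once
	return func() *bytePointTable {
		once.Do(func() { data = decodeTable() })
		return data
	}
}()

func main() {
	fmt.Println("decodes before first use:", decodeCount)
	a := s256BytePoints()
	b := s256BytePoints()
	fmt.Println("same table:", a == b, "decodes:", decodeCount, "entry 3:", a[3])
}
```

Because the pointer is unreachable except through the accessor, forgetting an explicit initialization step can no longer yield a nil table, and `sync.Once` additionally makes the first call safe under concurrent use.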
This removes the code that initializes the adaptor instance only on first use, since the adaptor code no longer houses the pre-computed byte points that motivated that behavior.
This optimizes the pre-computed byte points used to accelerate scalar base multiplication to store the data in affine coordinates instead of Jacobian coordinates. Since each entry now holds two field elements (X and Y) instead of three (X, Y, and Z), this reduces the memory requirement to about 66% of what it currently requires, and it also has the important benefit of further speeding up the computation.

This is the case because projecting affine coordinates into Jacobian space is essentially free, and the point doubling and addition routines have optimizations that allow them to avoid additional operations when the Z coordinate is 1, which is the case for an initial affine projection.

Further, since the compressed table is stored in the string table of the binary, it also reduces the size of the final binary by ~385KiB.

The following benchmark shows a before and after comparison of scalar base multiplication as well as how that translates to signature verification:

name                 old time/op    new time/op    delta
-------------------------------------------------------------------------
ScalarBaseMult       34.5µs ± 1%    24.7µs ± 1%    -28.43%  (p=0.008 n=5+5)
ScalarBaseMultLarge  48.2µs ± 1%    38.0µs ± 1%    -21.08%  (p=0.008 n=5+5)
SigVerify             181µs ± 5%     163µs ± 2%     -9.86%  (p=0.008 n=5+5)

While 18 µs less per signature verification might not seem like much on the surface, consider that every transaction requires at least one signature operation, so there are a ton of them when syncing without checkpoints. For a concrete number, verifying 100 million signatures would take 30 minutes less time (100,000,000 × 18 µs = 1,800 s).
There are also a couple of preparatory commits to ease the review process. They separate the code that loads the pre-computed byte points from the elliptic adaptor code, which further makes the internals of the package independent of the crypto/elliptic and crypto/ecdsa stdlib interfaces.