secp256k1: Reduce scalar base mult copies. #2898
Merged
davecgh merged 1 commit into decred:master on Mar 18, 2022
2bbfb34 to
9fcf7d6
Compare
Member
The PR description says this is rebased on #2888, but it looks like it is currently just based off master.
9fcf7d6 to
4037827
Compare
Member
Author
Ah, I guess I rebased it over master in the last round of updates. It's rebased over #2888 now as intended.
rstaudt2
approved these changes
Mar 14, 2022
4037827 to
9f550ed
Compare
JoeGruffins
approved these changes
Mar 17, 2022
matheusd
approved these changes
Mar 18, 2022
Profiling shows that around 7.5% of the time in scalar base multiplication is attributed to duffcopy. Upon further examination, this is the result of a combination of the range statement making copies of the bytes and the need to construct a Jacobian point from the individual field values stored in the in-memory byte points table.

This optimizes the function to avoid that as follows:

- Perform the conversion to Jacobian once when the affine byte table is decompressed from the stored values
- Make use of those Jacobian points directly
- Use an indexed for loop instead of a range over the bytes
- Perform the calculation using the result variable directly instead of via a local variable that is copied to the result

The following benchmark results show the speedup is in line with the expected gains per the profiling results:

name                    old time/op  new time/op  delta
------------------------------------------------------------------------------
ScalarBaseMultNonConst  24.1µs ±22%  22.5µs ± 2%  -6.97%  (p=0.000 n=98+96)
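For readers who want to see the shape of the change, here is a minimal sketch of the four points in the commit message. It is not the code from this PR: the types `fieldVal`, `affinePoint`, and `jacobianPoint`, the table dimensions, and every function name are assumptions made for illustration, and `toyAdd` is a placeholder that only sums limbs rather than performing real curve addition.

```go
package main

import "fmt"

// fieldVal stands in for a field element; the limb count is an assumption.
type fieldVal [10]uint32

// Hypothetical simplified point types, not the package's real ones.
type affinePoint struct{ x, y fieldVal }
type jacobianPoint struct{ x, y, z fieldVal }

// toyAdd is a placeholder for point addition (NOT real curve arithmetic).
// It adds the limbs of p into result in place.
func toyAdd(p, result *jacobianPoint) {
	for i := range result.x {
		result.x[i] += p.x[i]
		result.y[i] += p.y[i]
		result.z[i] += p.z[i]
	}
}

// newDemoAffineTable fills a table with deterministic dummy values.
func newDemoAffineTable() *[32][256]affinePoint {
	t := new([32][256]affinePoint)
	for i := range t {
		for j := range t[i] {
			t[i][j].x[0] = uint32(i*256 + j)
			t[i][j].y[0] = uint32(j)
		}
	}
	return t
}

// Copy-heavy shape: range over an array value with a value variable, a copy
// of each table entry, a Jacobian point built per iteration, and a local
// accumulator that is copied out on return.
func scalarBaseMultCopying(k [32]byte, affine *[32][256]affinePoint) jacobianPoint {
	var acc jacobianPoint
	for i, b := range k {
		p := affine[i][b] // copies the table entry
		var pt jacobianPoint
		pt.x, pt.y = p.x, p.y
		pt.z[0] = 1
		toyAdd(&pt, &acc)
	}
	return acc
}

// buildJacobianTable performs the affine-to-Jacobian conversion once, when
// the table is loaded, so the hot loop no longer has to do it.
func buildJacobianTable(affine *[32][256]affinePoint) *[32][256]jacobianPoint {
	table := new([32][256]jacobianPoint)
	for i := range affine {
		for j := range affine[i] {
			dst := &table[i][j]
			dst.x, dst.y = affine[i][j].x, affine[i][j].y
			dst.z[0] = 1
		}
	}
	return table
}

// Reduced-copy shape: indexed loop, table entries used through pointers, and
// the caller's result variable used directly as the accumulator.
func scalarBaseMult(k *[32]byte, table *[32][256]jacobianPoint, result *jacobianPoint) {
	*result = jacobianPoint{}
	for i := 0; i < len(k); i++ {
		toyAdd(&table[i][k[i]], result)
	}
}

func main() {
	affine := newDemoAffineTable()
	table := buildJacobianTable(affine)

	var k [32]byte
	for i := range k {
		k[i] = byte(i*7 + 3)
	}

	var got jacobianPoint
	scalarBaseMult(&k, table, &got)
	want := scalarBaseMultCopying(k, affine)
	fmt.Println("results match:", got == want)
}
```

Both functions compute the same value in this sketch; the second simply moves the per-iteration conversion to table load time and avoids the intermediate copies. Whether that yields a gain similar to the one reported above depends on the real point types and addition routine, so it should be confirmed with a benchmark.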
9f550ed to
aae0128
Compare
This is rebased on #2888.