New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pippenger_wnaf for multi-multiplication #486

Merged
merged 9 commits into from Dec 8, 2017

Conversation

Projects
None yet
4 participants
@jonasnick
Contributor

jonasnick commented Nov 13, 2017

This PR is based on #473 and adds a variant of "Pippengers algorithm" (see Bernstein et al., Faster batch forgery identification, page 15 and scipr-lab/libff#10) for point multi-multiplication that performs better with a large number of points than Strauss' algorithm.

aggsig

Thanks to @sipa for providing wnaf_fixed, benchmarking, and the crucial suggestion to use affine addition.

The PR also makes ecmult_multi decide which algorithm to use, based on the number of points and the available scratch space.
For restricted scratch spaces this can be further optimized in the future (f.e. a 35kB scratch space allows batches of 11 points with strauss or 95 points with pippenger; choosing pippenger would be 5% faster).

As soon as this PR has received some feedback I'll repeat the benchmarks to determine the optimal pippenger_bucket_window with the new benchmarking code in #473.

@apoelstra

This comment has been minimized.

Show comment
Hide comment
@apoelstra

apoelstra Nov 13, 2017

Member

Hooray!

Please rebase -- should be straightforward, we merged only some minor things last week.

Member

apoelstra commented Nov 13, 2017

Hooray!

Please rebase -- should be straightforward, we merged only some minor things last week.

@jonasnick

This comment has been minimized.

Show comment
Hide comment
@jonasnick

jonasnick Nov 14, 2017

Contributor

Rebased

Contributor

jonasnick commented Nov 14, 2017

Rebased

Show outdated Hide outdated src/ecmult_impl.h Outdated
@apoelstra

This comment has been minimized.

Show comment
Hide comment
@apoelstra

apoelstra Nov 16, 2017

Member

In secp256k1_ecmult_multi_split_strauss_wnaf we immediately convert a secp256k1_ge received from the callback to a secp256k1_gej. Would be better to change the callback API to just return a gej in the first place, otherwise the caller may be forced to do an expensive and pointless gej->ge conversion.

Edit Oh, I was looking at old code. I see that Strauss does the ge->gej conversion but Pippenger does not (and Pippenger derives a speed benefit from using ge). And our two main applications, aggsig and bulletproofs, don't need to do a gej->ge conversion anyway, at least during verification. Never mind.

Member

apoelstra commented Nov 16, 2017

In secp256k1_ecmult_multi_split_strauss_wnaf we immediately convert a secp256k1_ge received from the callback to a secp256k1_gej. Would be better to change the callback API to just return a gej in the first place, otherwise the caller may be forced to do an expensive and pointless gej->ge conversion.

Edit Oh, I was looking at old code. I see that Strauss does the ge->gej conversion but Pippenger does not (and Pippenger derives a speed benefit from using ge). And our two main applications, aggsig and bulletproofs, don't need to do a gej->ge conversion anyway, at least during verification. Never mind.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Nov 17, 2017

Contributor

@apoelstra Yes, I believe it was a conscious choice to making the interface pass affine points because no callers need more.

Contributor

sipa commented Nov 17, 2017

@apoelstra Yes, I believe it was a conscious choice to making the interface pass affine points because no callers need more.

Show outdated Hide outdated include/secp256k1.h Outdated
Show outdated Hide outdated src/scratch_impl.h Outdated
Show outdated Hide outdated src/ecmult_impl.h Outdated
Show outdated Hide outdated src/ecmult_impl.h Outdated
@@ -403,4 +484,540 @@ static void secp256k1_ecmult(const secp256k1_ecmult_context *ctx, secp256k1_gej
}
}
static void secp256k1_ecmult(const secp256k1_ecmult_context *ctx, secp256k1_gej *r, const secp256k1_gej *a, const secp256k1_scalar *na, const secp256k1_scalar *ng) {

This comment has been minimized.

@sipa

sipa Nov 25, 2017

Contributor

At some point this should be renamed to _var as well, but let's leave that for later.

@sipa

sipa Nov 25, 2017

Contributor

At some point this should be renamed to _var as well, but let's leave that for later.

Show outdated Hide outdated src/ecmult_impl.h Outdated
Show outdated Hide outdated src/ecmult_impl.h Outdated
Show outdated Hide outdated src/ecmult_impl.h Outdated
Show outdated Hide outdated src/ecmult_impl.h Outdated
Show outdated Hide outdated src/ecmult_impl.h Outdated
Show outdated Hide outdated src/ecmult_impl.h Outdated
Show outdated Hide outdated src/ecmult_impl.h Outdated
Show outdated Hide outdated src/ecmult_impl.h Outdated
Show outdated Hide outdated src/ecmult_impl.h Outdated
Show outdated Hide outdated src/scratch.h Outdated
@jonasnick

This comment has been minimized.

Show comment
Hide comment
@jonasnick

jonasnick Nov 30, 2017

Contributor

Pushed fixes

Contributor

jonasnick commented Nov 30, 2017

Pushed fixes

@peterdettman

This comment has been minimized.

Show comment
Hide comment
@peterdettman

peterdettman Nov 30, 2017

Collaborator

@jonasnick I opened a PR here: jonasnick#1 for suggested changes to save some additions in each _pippenger_wnaf window.

Collaborator

peterdettman commented Nov 30, 2017

@jonasnick I opened a PR here: jonasnick#1 for suggested changes to save some additions in each _pippenger_wnaf window.

@jonasnick

This comment has been minimized.

Show comment
Hide comment
@jonasnick

jonasnick Dec 1, 2017

Contributor

As expected, @peterdettman delivers. I see about 2% to 4% speedup. Merged the commit into this PR.

Contributor

jonasnick commented Dec 1, 2017

As expected, @peterdettman delivers. I see about 2% to 4% speedup. Merged the commit into this PR.

@sipa

utACK c772399 tree 6f9376c5ebdfe589593fabde10c5d61f5d07ebcc

Please squash fixes.

Show outdated Hide outdated src/ecmult_impl.h Outdated

yeastplume added a commit to mimblewimble/secp256k1-zkp that referenced this pull request Dec 6, 2017

@jonasnick

This comment has been minimized.

Show comment
Hide comment
@jonasnick

jonasnick Dec 6, 2017

Contributor

Squashed fixes and updated bucket windows. Previous bucket windows did not not include peterdettman's optimization, did not use bench_ecmult and had a bug where the number of points for a bucket window were twice of what they should have been. The result is that the performance without endomorphism is very similar, with endomorphism it's up to 8% faster for some number of points. The performance graph looks very similar to the one above.

Contributor

jonasnick commented Dec 6, 2017

Squashed fixes and updated bucket windows. Previous bucket windows did not not include peterdettman's optimization, did not use bench_ecmult and had a bug where the number of points for a bucket window were twice of what they should have been. The result is that the performance without endomorphism is very similar, with endomorphism it's up to 8% faster for some number of points. The performance graph looks very similar to the one above.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Dec 6, 2017

Contributor

ACK 50f0779

Contributor

sipa commented Dec 6, 2017

ACK 50f0779

Show outdated Hide outdated src/scratch.h Outdated
void *data;
size_t offset;
size_t init_size;
size_t max_size;

This comment has been minimized.

@apoelstra

apoelstra Dec 7, 2017

Member

Can we add an error_callback pointer to this structure and remove it from the arguments of secp256k1_scratch_resize? It would avoid the need to pass error callbacks into a bunch of higher level functions.

@apoelstra

apoelstra Dec 7, 2017

Member

Can we add an error_callback pointer to this structure and remove it from the arguments of secp256k1_scratch_resize? It would avoid the need to pass error callbacks into a bunch of higher level functions.

Show outdated Hide outdated src/ecmult.h Outdated
Show outdated Hide outdated src/tests.c Outdated
@jonasnick

This comment has been minimized.

Show comment
Hide comment
@jonasnick

jonasnick Dec 7, 2017

Contributor

Pushed fixes

Contributor

jonasnick commented Dec 7, 2017

Pushed fixes

Show outdated Hide outdated src/ecmult_impl.h Outdated
@apoelstra

This comment has been minimized.

Show comment
Hide comment
@apoelstra

apoelstra Dec 7, 2017

Member

ACK 5979a57 .. please squash. The only nit is the position of the secp256k1_ecmult_multi_func typedef, which I don't care about that much.

Member

apoelstra commented Dec 7, 2017

ACK 5979a57 .. please squash. The only nit is the position of the secp256k1_ecmult_multi_func typedef, which I don't care about that much.

@jonasnick

This comment has been minimized.

Show comment
Hide comment
@jonasnick

jonasnick Dec 7, 2017

Contributor

Squashed

Contributor

jonasnick commented Dec 7, 2017

Squashed

@sipa sipa merged commit d2f9c6b into bitcoin-core:master Dec 8, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

sipa added a commit that referenced this pull request Dec 8, 2017

Merge #486: Add pippenger_wnaf for multi-multiplication
d2f9c6b Use more precise pippenger bucket windows (Jonas Nick)
4c950bb Save some additions per window in _pippenger_wnaf (Peter Dettman)
a58f543 Add flags for choosing algorithm in ecmult_multi benchmark (Jonas Nick)
36b22c9 Use scratch space dependent batching in ecmult_multi (Jonas Nick)
355a38f Add pippenger_wnaf ecmult_multi (Jonas Nick)
bc65aa7 Add bench_ecmult (Pieter Wuille)
dba5471 Add ecmult_multi tests (Andrew Poelstra)
8c1c831 Generalize Strauss to support multiple points (Pieter Wuille)
548de42 add resizeable scratch space API (Andrew Poelstra)

Pull request description:

  This PR is based on #473 and adds a variant of "Pippengers algorithm" (see [Bernstein et al., Faster batch forgery identification](https://eprint.iacr.org/2012/549.pdf), page 15 and scipr-lab/libff#10) for point multi-multiplication that performs better with a large number of points than Strauss' algorithm.

  ![aggsig](https://user-images.githubusercontent.com/2582071/32731185-12c0f108-c881-11e7-83c7-c2432b5fadf5.png)

  Thanks to @sipa for providing `wnaf_fixed`, benchmarking, and the crucial suggestion to use affine addition.

  The PR also makes `ecmult_multi` decide which algorithm to use, based on the number of points and the available scratch space.
  For restricted scratch spaces this can be further optimized in the future (f.e. a 35kB scratch space allows batches of 11 points with strauss or 95 points with pippenger; choosing pippenger would be 5% faster).

  As soon as this PR has received some feedback I'll repeat the benchmarks to determine the optimal `pippenger_bucket_window` with the new benchmarking code in #473.

Tree-SHA512: 8e155107a00d35f412300275803f912b1d228b7adff578bc4754c5b29641100b51b9d37f989316b636f7144e6b199febe7de302a44f498bbfd8d463bdbe31a5c

@anargle anargle referenced this pull request May 29, 2018

Closed

Kernel half-aggregation #1093

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment