WIP: Add schnorrsig batch verification #760

Open
wants to merge 8 commits into
base: master

jonasnick
Contributor

This was part of #558 (for 20 months) to demonstrate the advantages of batch verification (see graph), but then removed to simplify #558 because there are still ongoing discussions:

@jonasnick changed the title from "Add schnorrsig batch verification" to "WIP: Add schnorrsig batch verification" on Jun 18, 2020
@gmaxwell
Contributor

Time to start rebasing on the nearly complete #558?

real-or-random added a commit that referenced this pull request Sep 11, 2020
…signatures

f431b3f valgrind_ctime_test: Add schnorrsig_sign (Jonas Nick)
16ffa9d schnorrsig: Add taproot test case (Jonas Nick)
8dfd53e schnorrsig: Add benchmark for sign and verify (Jonas Nick)
4e43520 schnorrsig: Add BIP-340 compatible signing and verification (Jonas Nick)
7332d2d schnorrsig: Add BIP-340 nonce function (Jonas Nick)
7a703fd schnorrsig: Init empty experimental module (Jonas Nick)
eabd9bc Allow initializing tagged sha256 (Jonas Nick)
6fcb5b8 extrakeys: Add keypair_xonly_tweak_add (Jonas Nick)
5825446 extrakeys: Add keypair struct with create, pub and pub_xonly (Jonas Nick)
f001034 Separate helper functions for pubkey_create and seckey_tweak_add (Jonas Nick)
910d9c2 extrakeys: Add xonly_pubkey_tweak_add & xonly_pubkey_tweak_add_test (Jonas Nick)
176bfb1 Separate helper function for ec_pubkey_tweak_add (Jonas Nick)
4cd2ee4 extrakeys: Add xonly_pubkey with serialize, parse and from_pubkey (Jonas Nick)
47e6618 extrakeys: Init empty experimental module (Jonas Nick)
3e08b02 Make the secp256k1_declassify argument constant (Jonas Nick)

Pull request description:

  This PR implements signing, verification and batch verification as described in [BIP-340](https://github.com/bitcoin/bips/blob/master/bip-0340.mediawiki) in an experimental module named `schnorrsig`. It includes the test vectors and a benchmarking tool.
  This PR also adds a module `extrakeys` that allows [BIP-341](https://github.com/bitcoin/bips/blob/master/bip-0341.mediawiki)-style key tweaking.

  (Adding ChaCha20 as a CSPRNG and batch verification was moved to PR #760).

  In order to enable the module run `./configure` with `--enable-experimental --enable-module-schnorrsig`.

  Based on apoelstra's work.

ACKs for top commit:
  gmaxwell:
    ACK f431b3f  (exactly matches the previous post-fixup version which I have already reviewed and tested)
  sipa:
    ACK f431b3f
  real-or-random:
    ACK f431b3f careful code review

Tree-SHA512: e15e849c7bb65cdc5d7b1d6874678e275a71e4514de9d5432ec1700de3ba92aa9f381915813f4729057af152d90eea26aabb976ed297019c5767e59cf0bbc693
@jonasnick
Contributor Author

rebased on master

@jonasnick
Contributor Author

schnorrsig_sign: min 25.7us / avg 25.8us / max 26.2us
schnorrsig_verify: min 57.5us / avg 57.7us / max 58.0us
schnorrsig_batch_verify_1: min 64.0us / avg 64.3us / max 64.7us
schnorrsig_batch_verify_2: min 50.4us / avg 50.7us / max 51.0us
schnorrsig_batch_verify_4: min 43.8us / avg 43.9us / max 44.1us
schnorrsig_batch_verify_8: min 40.4us / avg 40.5us / max 40.5us
schnorrsig_batch_verify_16: min 38.9us / avg 39.0us / max 39.1us
schnorrsig_batch_verify_32: min 38.2us / avg 38.4us / max 38.7us
schnorrsig_batch_verify_64: min 37.7us / avg 37.8us / max 37.9us
schnorrsig_batch_verify_128: min 35.2us / avg 35.3us / max 35.3us
schnorrsig_batch_verify_256: min 31.9us / avg 32.0us / max 32.1us
schnorrsig_batch_verify_512: min 29.2us / avg 29.4us / max 29.7us
schnorrsig_batch_verify_1024: min 27.5us / avg 27.5us / max 27.5us
schnorrsig_batch_verify_2048: min 25.8us / avg 25.9us / max 26.0us
schnorrsig_batch_verify_4096: min 24.5us / avg 24.7us / max 24.8us
schnorrsig_batch_verify_8192: min 23.5us / avg 23.5us / max 23.6us

@sipa
Contributor

sipa commented Sep 11, 2020

It's a bit unfortunate that this API doesn't really lend itself to cleanly supporting combined batches of BIP340 signature and taproot tweaks (which also need an EC multiplication).

Given that this internally builds a batch object anyway, would it be reasonable to have that in the external API as well? So an idea could be that you:

  • Construct an (opaque) batch object
  • Add BIP340 verifications to it, using a variant of secp256k1_schnorrsig_verify that either fails immediately (parsing/decompression failures), or succeeds when the check was added to a batch object.
  • Add tweak checks to it, using a variant of secp256k1_xonly_pubkey_tweak_add_check.
  • In the end, a batch_verify function can be called on the batch to do all checks, and return true or false.
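The four steps above can be sketched with a toy model. This is only an illustration of the proposed API shape, not library code: a "check" here is the claim `lhs == rhs (mod TOY_P)` standing in for a curve equation, every `toy_*` name is hypothetical, and the weights come from a non-cryptographic generator. The batch copies its inputs, add fails immediately on malformed input, and a single combined test runs at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: a "check" is the claim lhs == rhs (mod TOY_P), standing in for
 * a curve equation. Every toy_* name is hypothetical, not library API. */
#define TOY_P 65537u          /* small prime modulus */
#define TOY_BATCH_MAX 16u

typedef struct {
    uint32_t diff[TOY_BATCH_MAX]; /* (lhs - rhs) mod TOY_P, copied in */
    size_t len;
    uint32_t rng;                 /* state of a NON-cryptographic generator */
} toy_batch;

static void toy_batch_init(toy_batch *b, uint32_t seed) {
    assert(b != NULL);
    b->len = 0;
    b->rng = seed;
}

/* Returns 0 right away for malformed input or a full batch; otherwise
 * queues the check and returns 1 without deciding whether it holds. */
static int toy_batch_add(toy_batch *b, uint32_t lhs, uint32_t rhs) {
    if (lhs >= TOY_P || rhs >= TOY_P || b->len == TOY_BATCH_MAX) return 0;
    b->diff[b->len++] = (lhs + TOY_P - rhs) % TOY_P;
    return 1;
}

/* Nonzero weight in [1, TOY_P - 1] from a simple LCG (illustration only). */
static uint32_t toy_weight(toy_batch *b) {
    b->rng = b->rng * 1664525u + 1013904223u;
    return 1u + (b->rng >> 8) % (TOY_P - 1u);
}

/* One combined test: sum_i w_i * diff_i == 0 (mod TOY_P). */
static int toy_batch_verify(toy_batch *b) {
    uint64_t acc = 0;
    size_t i;
    for (i = 0; i < b->len; i++) {
        acc = (acc + (uint64_t)toy_weight(b) * b->diff[i]) % TOY_P;
    }
    return acc == 0;
}
```

In this toy, a caller would call `toy_batch_init`, then `toy_batch_add` for each check (treating a 0 return as an immediate failure), and finally `toy_batch_verify` once. A real design would have separate add functions for signatures and tweak checks, as the list describes.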

@elichai
Contributor

elichai commented Sep 12, 2020

> It's a bit unfortunate that this API doesn't really lend itself to cleanly supporting combined batches of BIP340 signature and taproot tweaks (which also need an EC multiplication).
>
> Given that this internally builds a batch object anyway, would it be reasonable to have that in the external API as well? So an idea could be that you:
>
> * Construct an (opaque) batch object
> * Add BIP340 verifications to it, using a variant of `secp256k1_schnorrsig_verify` that either fails immediately (parsing/decompression failures), or succeeds when the check was added to a batch object.
> * Add tweak checks to it, using a variant of `secp256k1_xonly_pubkey_tweak_add_check`.
> * In the end, a batch_verify function can be called on the batch to do all checks, and return true or false.

Ooh, I like that construction. It allows for lazy batching and verifying only when you're ready, which is also very useful for non-bitcoin applications that can verify things periodically when they have spare CPU time.

@jonasnick
Contributor Author

Sounds like a reasonable plan, in particular because it would be easy to add functions that manipulate the batch object for other schemes that need an EC mult at the end of verification. The current batch object only holds pointers to the elements, so care must be taken to ensure that they still exist at batch_verify time if this becomes a multi-step process.

@gmaxwell
Contributor

gmaxwell commented Sep 12, 2020

I think if it can be avoided it would be best to minimize holding pointers to caller provided objects, except in narrow cases (e.g. scratch)... lifetime management is hard for everyone.

An alternative might be to have a function that takes a sigs count and pointers to arrays of pubkeys/signatures/messagehashes, then a taproot count, and arrays for those. Less generic, but it would avoid needing to copy the inputs into library provided memory or retain pointers to caller provided objects.

@sipa
Contributor

sipa commented Sep 14, 2020

@gmaxwell The alternative is probably that the caller is going to do the copying into some batch object on their side instead, so I don't think it's that much of a difference.

I think having the batch object have its own storage is probably better. That may mean that the caller should be able to select a maximum size (and once exceeded, transparently run validation of the already-provided batch?)

@gmaxwell
Contributor

Sounds fine to me, though I hope it doesn't need 2x the memory to store both the input and the intermediate work. :)

@elichai
Contributor

elichai commented Sep 14, 2020

> I think having the batch object have its own storage is probably better. That may mean that the caller should be able to select a maximum size (and once exceeded, transparently run validation of the already-provided batch?)

I agree, but I'm somewhat worried about how to do it: this will probably require the caller to know the approximate size of the batch (or the number of sigs/tweaks) when starting the batch.
I'd love to see if there's some creative C API we can come up with.

@sipa
Contributor

sipa commented Sep 14, 2020

@elichai No, I mean the opposite!

The caller shouldn't need to predict how large the batch will become - if they knew that, they wouldn't need it, as they could just choose to stop after a certain size instead.

What I mean is that the caller gets to set a maximum memory usage limit, and when that limit would be exceeded, adding another entry to the batch just causes the batch validation to run on what was added so far - and remember the outcome of that.
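A minimal sketch of that behaviour, under loud assumptions: the "checks" are plain ints that pass when nonzero, the capacity is a fixed constant rather than a caller-chosen memory limit, and all `sketch_*` names are hypothetical. It only shows the control flow: adding to a full batch first verifies what is queued, and the outcome is remembered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: "checks" are plain ints that pass when nonzero. */
#define SKETCH_CAPACITY 4u

typedef struct {
    int items[SKETCH_CAPACITY];
    size_t len;
    int failed;        /* sticky: set once any verified chunk fails */
    size_t flushes;    /* how many times verification actually ran */
} sketch_batch;

static void sketch_batch_init(sketch_batch *b) {
    assert(b != NULL);
    b->len = 0;
    b->failed = 0;
    b->flushes = 0;
}

/* Verify whatever is queued, remember the outcome, release the storage. */
static void sketch_batch_flush(sketch_batch *b) {
    size_t i;
    for (i = 0; i < b->len; i++) {
        if (b->items[i] == 0) b->failed = 1;
    }
    b->len = 0;
    b->flushes++;
}

static void sketch_batch_add(sketch_batch *b, int check) {
    if (b->failed) return;                       /* outcome already known */
    if (b->len == SKETCH_CAPACITY) sketch_batch_flush(b);
    if (b->failed) return;                       /* the flush just failed */
    b->items[b->len++] = check;
}

/* Final verdict: verify any remainder, then report the remembered result. */
static int sketch_batch_verify(sketch_batch *b) {
    if (!b->failed && b->len > 0) sketch_batch_flush(b);
    return !b->failed;
}
```

With this shape the caller never has to predict the batch size: a smaller capacity just means verification runs more often on smaller chunks.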

@gmaxwell
Contributor

gmaxwell commented Sep 14, 2020

If what it processed so far failed, all further calls can be super fast because it's just going to return a fail. :P

@sipa
Contributor

sipa commented Sep 14, 2020

Taking short circuit evaluation of && to a next level.

@elichai
Contributor

elichai commented Sep 14, 2020

> (and once exceeded, transparently run validation of the already-provided batch?)

I like that :) It gives the caller a tradeoff between memory and CPU without crippling them if they mispredicted the max size.


over1 = secp256k1_scalar_check_overflow(r1);
over2 = secp256k1_scalar_check_overflow(r2);
over_count++;

BIP-0340 says we should also repeat if r1 or r2 are zero.

@@ -122,4 +122,9 @@ static SECP256K1_INLINE void secp256k1_scalar_cmov(secp256k1_scalar *r, const se
*r = (*r & mask0) | (*a & mask1);
}

SECP256K1_INLINE static void secp256k1_scalar_chacha20(secp256k1_scalar *r1, secp256k1_scalar *r2, const unsigned char *seed, uint64_t n) {
*r1 = (seed[0] + n) % EXHAUSTIVE_TEST_ORDER;
*r2 = (seed[1] + n) % EXHAUSTIVE_TEST_ORDER;

Perhaps the same goes for here. BIP-0340 says that 0 should be excluded.

@elichai
Contributor

elichai commented Mar 29, 2021

How do people feel about the following API:

int secp256k1_start_batch_size(size_t ops);
secp256k1_batch* secp256k1_start_batch(const secp256k1_context* ctx, secp256k1_scratch_space* scratch);
int secp256k1_batch_add_sig(ctx, batch, sig, msg, pubkey);
int secp256k1_batch_add_xpubkey_tweak_add_check(ctx, batch, parity, tweaked_pubkey, pubkey, tweak);
int secp256k1_batch_verify(ctx, batch);

All the add functions for secp256k1_batch will use something like that:

if (batch.len == batch.scratch_capacity) {
    if (batch.failed) { return; }
    batch.failed = !secp256k1_batch_verify(ctx, batch);
    // clear the rest of the state
}
// add to batch
batch.len++;
return;

(all the names are subject to bikeshedding)

@jonasnick
Contributor Author

I like this idea of batch verifying in an add function if the scratch space is full. It'll need quite a bit of refactoring in ecmult_multi to separate out scratch space allocation. @elichai that matches my understanding of the approach and looks good to me. What does secp256k1_start_batch_size do?

@roconnor-blockstream
Contributor

roconnor-blockstream commented Mar 29, 2021

I'm starting to think ecmult_multi_var is slightly too narrow an interface to be used for batch verification.
Currently it does a "Multi-multiply: R = inp_g_sc * G + sum_i ni * Ai."
But what I think we want is one that does "Multi-multiply: R = (sum_i gi) * G + sum_i ni * Ai."
So that we can stream a series of equations to be batch verified without needing to add up all the G coefficients in advance.

We have (attempted) this nice streamable API for ecmult_multi_var, but what's the point of it if we just have to allocate a new buffer for all the inputs upfront?
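For context, the reason the G coefficient is a sum can be seen from the BIP-340 batch equation. The symbols below ($s_i$, $R_i$, $e_i$, $P_i$, randomizers $a_i$, batch size $u$, group order $n$) follow BIP-340 rather than this thread, and mapping $a_i s_i$ onto the `gi` of the comment above is an interpretation, not something stated in the thread.

```latex
% Per-signature check (BIP-340): s_i G = R_i + e_i P_i
% Batch check with randomizers a_1 = 1 and a_i in [1, n-1] for i >= 2:
\Bigl(\sum_{i=1}^{u} a_i s_i\Bigr) G
  \;-\; \sum_{i=1}^{u} a_i R_i
  \;-\; \sum_{i=1}^{u} (a_i e_i)\, P_i \;=\; 0
```

Each signature contributes one term $a_i s_i$ to the scalar multiplying $G$, so an interface that takes a single precomputed G scalar requires the caller to have seen every signature before the multi-multiplication starts.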

@elichai
Contributor

elichai commented Mar 29, 2021

> What does secp256k1_start_batch_size do?

It tells you the size of the scratch space required for the number of signatures/tweaks you want to batch.

@roconnor-blockstream
Contributor

roconnor-blockstream commented Mar 29, 2021

Barring such an enhanced ecmult_multi_var interface I would propose the following API for batch verification:

typedef int (secp256k1_batch_verify_gi_callback)(secp256k1_scalar *gi, size_t idx, void *data);
typedef int (secp256k1_batch_verify_callback)(secp256k1_scalar *na, secp256k1_scalar *nb, secp256k1_ge *pta, secp256k1_ge *ptb, size_t idx, void *data);

/* Verifies na_i*A_i = nb_i*B_i + ng_i * G for all i < n (with high probability). */
static int secp256k1_batch_verify(ctx, scratch, secp256k1_batch_verify_gi_callback cb_gi, secp256k1_batch_verify_callback  cb, void *cbdata, size_t n);

secp256k1_batch_data_gi_from_sig(secp256k1_scalar *gi, sig);
secp256k1_batch_data_from_sig(secp256k1_scalar *na, secp256k1_scalar *nb, secp256k1_ge *pta, secp256k1_ge *ptb, sig, msg, pubkey);

secp256k1_batch_data_gi_from_xpubkey_tweak(secp256k1_scalar *gi);
secp256k1_batch_data_from_xpubkey_tweak(secp256k1_scalar *na, secp256k1_scalar *nb, secp256k1_ge *pta, secp256k1_ge *ptb, parity, tweaked_pubkey, pubkey, tweak);

Edit: There are a couple of possible variations here. We could drop the na scalar values, and instead verify A_i = nb_i*B_i + ng_i * G (though I think adding the na is fine as it comes nearly for free). We could also rearrange the verification equation to verify 0 = na_i*A_i + nb_i*B_i + ng_i * G or 0 = A_i + nb_i*B_i + ng_i * G. I don't have any strong feelings about these variants.

@roconnor-blockstream
Contributor

My proposal was based on the idea that batch_verify must use secp256k1_ecmult_multi_var, but this line of thinking was wrong. batch_verify can call secp256k1_ecmult_pippenger_wnaf and friends directly. I withdraw my proposal until I give things more consideration.

@sipa
Contributor

sipa commented Mar 29, 2021

@roconnor-blockstream I don't see what the issue is with the multi-multiplication interface. The batch interface can do the aggregation of scalars before calling the multi-multiplication code.

I also don't think we should be exposing a public interface for arbitrary EC operations/verifications. This library aims for a high-level interface of protocols.

Contributor Author

@jonasnick jonasnick left a comment

Rebased in order to benchmark with safegcd. Pre-rebase (EDIT: without endo):

$ ./bench_schnorrsig
schnorrsig_sign: min 28.0us / avg 28.4us / max 29.0us
schnorrsig_verify: min 63.1us / avg 64.1us / max 64.9us
schnorrsig_batch_verify_1: min 70.7us / avg 71.1us / max 71.5us
schnorrsig_batch_verify_2: min 55.5us / avg 56.1us / max 56.5us
schnorrsig_batch_verify_4: min 47.6us / avg 48.5us / max 48.9us
schnorrsig_batch_verify_8: min 44.3us / avg 44.4us / max 44.7us
schnorrsig_batch_verify_16: min 42.6us / avg 43.0us / max 43.4us
schnorrsig_batch_verify_32: min 42.2us / avg 42.4us / max 42.6us
schnorrsig_batch_verify_64: min 42.0us / avg 42.1us / max 42.2us
schnorrsig_batch_verify_128: min 38.4us / avg 39.0us / max 39.4us
schnorrsig_batch_verify_256: min 35.1us / avg 35.5us / max 35.8us
schnorrsig_batch_verify_512: min 32.1us / avg 32.4us / max 33.0us
schnorrsig_batch_verify_1024: min 30.2us / avg 30.4us / max 30.7us
schnorrsig_batch_verify_2048: min 28.4us / avg 28.7us / max 28.8us
schnorrsig_batch_verify_4096: min 27.4us / avg 27.4us / max 27.5us
schnorrsig_batch_verify_8192: min 26.5us / avg 26.8us / max 27.0us

Post-rebase:

$ ./bench_schnorrsig
schnorrsig_sign: min 26.6us / avg 27.2us / max 27.8us
schnorrsig_verify: min 46.4us / avg 47.4us / max 48.6us
schnorrsig_batch_verify_1: min 55.1us / avg 57.0us / max 58.4us
schnorrsig_batch_verify_2: min 47.9us / avg 49.4us / max 52.0us
schnorrsig_batch_verify_4: min 44.8us / avg 45.1us / max 45.6us
schnorrsig_batch_verify_8: min 42.3us / avg 43.4us / max 44.9us
schnorrsig_batch_verify_16: min 42.2us / avg 42.7us / max 43.0us
schnorrsig_batch_verify_32: min 42.2us / avg 42.4us / max 42.5us
schnorrsig_batch_verify_64: min 39.8us / avg 40.3us / max 41.2us
schnorrsig_batch_verify_128: min 36.7us / avg 37.2us / max 38.1us
schnorrsig_batch_verify_256: min 32.9us / avg 33.6us / max 34.7us
schnorrsig_batch_verify_512: min 30.9us / avg 31.4us / max 32.0us
schnorrsig_batch_verify_1024: min 29.2us / avg 29.5us / max 29.8us
schnorrsig_batch_verify_2048: min 27.9us / avg 28.1us / max 28.3us
schnorrsig_batch_verify_4096: min 26.8us / avg 26.9us / max 27.0us
schnorrsig_batch_verify_8192: min 26.6us / avg 26.9us / max 27.2us

@jonasnick
Contributor Author

jonasnick commented Mar 30, 2021

As @elichai noted on IRC, this is an unfair comparison because the pre-rebase benchmark was without endomorphism. So here's pre-rebase with endo enabled:

$ ./bench_schnorrsig 
schnorrsig_sign: min 28.3us / avg 28.6us / max 28.9us
schnorrsig_verify: min 45.2us / avg 45.7us / max 46.4us
schnorrsig_batch_verify_1: min 50.3us / avg 50.7us / max 51.2us
schnorrsig_batch_verify_2: min 46.3us / avg 46.7us / max 47.0us
schnorrsig_batch_verify_4: min 42.9us / avg 43.2us / max 43.5us
schnorrsig_batch_verify_8: min 41.5us / avg 41.6us / max 41.9us
schnorrsig_batch_verify_16: min 41.8us / avg 41.9us / max 42.0us
schnorrsig_batch_verify_32: min 41.4us / avg 41.6us / max 41.7us
schnorrsig_batch_verify_64: min 38.9us / avg 39.2us / max 39.6us
schnorrsig_batch_verify_128: min 35.7us / avg 35.7us / max 35.8us
schnorrsig_batch_verify_256: min 32.4us / avg 32.9us / max 33.7us
schnorrsig_batch_verify_512: min 30.5us / avg 30.6us / max 30.7us
schnorrsig_batch_verify_1024: min 28.5us / avg 28.6us / max 28.7us
schnorrsig_batch_verify_2048: min 27.1us / avg 27.3us / max 27.6us
schnorrsig_batch_verify_4096: min 25.9us / avg 26.4us / max 26.6us
schnorrsig_batch_verify_8192: min 26.1us / avg 26.2us / max 26.3us

EDIT: I cannot explain this performance regression right now; here's the pre-rebase branch I used.

@sipa
Contributor

sipa commented Mar 31, 2021

I can't reproduce those benchmark results.

All numbers on AMD Ryzen Threadripper 2950X 16-Core Processor, GCC 10.2.1.

old pre-safegcd branch with endo enabled and gmp enabled:

schnorrsig_sign: min 29.2us / avg 29.4us / max 30.0us
schnorrsig_verify: min 48.4us / avg 48.7us / max 49.2us
schnorrsig_batch_verify_1: min 55.0us / avg 55.2us / max 55.6us
schnorrsig_batch_verify_2: min 50.3us / avg 50.4us / max 50.4us
schnorrsig_batch_verify_4: min 46.7us / avg 46.8us / max 46.9us
schnorrsig_batch_verify_8: min 44.6us / avg 44.7us / max 44.7us
schnorrsig_batch_verify_16: min 44.2us / avg 44.3us / max 44.5us
schnorrsig_batch_verify_32: min 43.5us / avg 43.6us / max 43.6us
schnorrsig_batch_verify_64: min 41.1us / avg 41.1us / max 41.2us
schnorrsig_batch_verify_128: min 37.8us / avg 37.8us / max 37.9us
schnorrsig_batch_verify_256: min 33.9us / avg 34.2us / max 34.4us
schnorrsig_batch_verify_512: min 32.0us / avg 32.0us / max 32.0us
schnorrsig_batch_verify_1024: min 29.8us / avg 29.9us / max 30.0us
schnorrsig_batch_verify_2048: min 28.3us / avg 28.4us / max 28.4us
schnorrsig_batch_verify_4096: min 27.1us / avg 27.2us / max 27.3us
schnorrsig_batch_verify_8192: min 27.1us / avg 27.2us / max 27.3us

old pre-safegcd branch with endo enabled and gmp disabled:

schnorrsig_sign: min 29.1us / avg 29.2us / max 29.4us
schnorrsig_verify: min 52.2us / avg 52.4us / max 52.8us
schnorrsig_batch_verify_1: min 55.0us / avg 55.2us / max 55.4us
schnorrsig_batch_verify_2: min 50.6us / avg 50.7us / max 50.9us
schnorrsig_batch_verify_4: min 47.0us / avg 47.3us / max 47.5us
schnorrsig_batch_verify_8: min 44.9us / avg 44.9us / max 44.9us
schnorrsig_batch_verify_16: min 44.8us / avg 45.0us / max 45.4us
schnorrsig_batch_verify_32: min 43.5us / avg 43.6us / max 43.6us
schnorrsig_batch_verify_64: min 41.2us / avg 41.3us / max 41.3us
schnorrsig_batch_verify_128: min 37.8us / avg 37.8us / max 37.9us
schnorrsig_batch_verify_256: min 34.2us / avg 34.3us / max 34.4us
schnorrsig_batch_verify_512: min 32.4us / avg 33.1us / max 33.9us
schnorrsig_batch_verify_1024: min 30.3us / avg 30.4us / max 30.6us
schnorrsig_batch_verify_2048: min 28.3us / avg 28.4us / max 28.6us
schnorrsig_batch_verify_4096: min 27.1us / avg 27.3us / max 27.5us
schnorrsig_batch_verify_8192: min 27.1us / avg 27.2us / max 27.3us

new branch (endo and gmp are gone):

schnorrsig_sign: min 26.3us / avg 26.5us / max 26.7us
schnorrsig_verify: min 48.0us / avg 48.3us / max 48.6us
schnorrsig_batch_verify_1: min 55.1us / avg 55.3us / max 55.3us
schnorrsig_batch_verify_2: min 50.6us / avg 50.7us / max 50.9us
schnorrsig_batch_verify_4: min 47.0us / avg 47.2us / max 47.5us
schnorrsig_batch_verify_8: min 44.6us / avg 44.8us / max 45.0us
schnorrsig_batch_verify_16: min 44.3us / avg 44.6us / max 44.7us
schnorrsig_batch_verify_32: min 43.6us / avg 43.7us / max 43.8us
schnorrsig_batch_verify_64: min 41.2us / avg 41.4us / max 41.6us
schnorrsig_batch_verify_128: min 37.7us / avg 38.1us / max 38.8us
schnorrsig_batch_verify_256: min 34.0us / avg 34.2us / max 34.5us
schnorrsig_batch_verify_512: min 31.9us / avg 32.2us / max 32.4us
schnorrsig_batch_verify_1024: min 30.1us / avg 30.2us / max 30.3us
schnorrsig_batch_verify_2048: min 28.3us / avg 28.5us / max 28.7us
schnorrsig_batch_verify_4096: min 27.2us / avg 27.2us / max 27.3us
schnorrsig_batch_verify_8192: min 27.2us / avg 27.3us / max 27.5us

@sipa
Contributor

sipa commented Mar 31, 2021

Similar results with GCC 7.5.0 on the same hardware

pre-safegcd, with endo, with gmp:

schnorrsig_sign: min 28.8us / avg 29.0us / max 29.8us
schnorrsig_verify: min 48.3us / avg 48.6us / max 48.9us
schnorrsig_batch_verify_1: min 54.6us / avg 54.8us / max 55.1us
schnorrsig_batch_verify_2: min 50.1us / avg 50.2us / max 50.4us
schnorrsig_batch_verify_4: min 46.7us / avg 46.8us / max 46.9us
schnorrsig_batch_verify_8: min 44.3us / avg 44.3us / max 44.3us
schnorrsig_batch_verify_16: min 44.0us / avg 44.1us / max 44.2us
schnorrsig_batch_verify_32: min 43.4us / avg 43.8us / max 44.7us
schnorrsig_batch_verify_64: min 41.1us / avg 41.2us / max 41.4us
schnorrsig_batch_verify_128: min 37.5us / avg 37.6us / max 37.6us
schnorrsig_batch_verify_256: min 33.8us / avg 33.9us / max 33.9us
schnorrsig_batch_verify_512: min 31.8us / avg 31.9us / max 32.0us
schnorrsig_batch_verify_1024: min 29.7us / avg 29.8us / max 29.9us
schnorrsig_batch_verify_2048: min 28.2us / avg 28.2us / max 28.3us
schnorrsig_batch_verify_4096: min 27.0us / avg 27.2us / max 27.3us
schnorrsig_batch_verify_8192: min 27.0us / avg 27.1us / max 27.2us

pre-safegcd, with endo, without gmp:

schnorrsig_sign: min 29.0us / avg 29.2us / max 29.6us
schnorrsig_verify: min 51.9us / avg 52.5us / max 54.2us
schnorrsig_batch_verify_1: min 54.7us / avg 55.0us / max 55.5us
schnorrsig_batch_verify_2: min 50.2us / avg 50.4us / max 50.7us
schnorrsig_batch_verify_4: min 46.7us / avg 46.9us / max 47.1us
schnorrsig_batch_verify_8: min 44.4us / avg 44.5us / max 44.6us
schnorrsig_batch_verify_16: min 44.0us / avg 44.1us / max 44.2us
schnorrsig_batch_verify_32: min 43.3us / avg 43.4us / max 43.5us
schnorrsig_batch_verify_64: min 40.9us / avg 40.9us / max 41.0us
schnorrsig_batch_verify_128: min 37.5us / avg 37.5us / max 37.7us
schnorrsig_batch_verify_256: min 33.9us / avg 34.4us / max 34.8us
schnorrsig_batch_verify_512: min 31.8us / avg 31.8us / max 32.0us
schnorrsig_batch_verify_1024: min 29.6us / avg 29.7us / max 29.7us
schnorrsig_batch_verify_2048: min 28.1us / avg 28.2us / max 28.4us
schnorrsig_batch_verify_4096: min 26.9us / avg 27.0us / max 27.1us
schnorrsig_batch_verify_8192: min 27.0us / avg 27.2us / max 27.3us

post-safegcd:

schnorrsig_sign: min 25.9us / avg 26.0us / max 26.3us
schnorrsig_verify: min 47.9us / avg 48.1us / max 48.4us
schnorrsig_batch_verify_1: min 54.7us / avg 54.9us / max 55.0us
schnorrsig_batch_verify_2: min 50.2us / avg 50.4us / max 50.6us
schnorrsig_batch_verify_4: min 46.8us / avg 48.2us / max 50.6us
schnorrsig_batch_verify_8: min 44.5us / avg 45.1us / max 45.5us
schnorrsig_batch_verify_16: min 44.0us / avg 44.2us / max 44.5us
schnorrsig_batch_verify_32: min 43.6us / avg 43.6us / max 43.7us
schnorrsig_batch_verify_64: min 40.9us / avg 41.0us / max 41.2us
schnorrsig_batch_verify_128: min 37.6us / avg 37.9us / max 38.4us
schnorrsig_batch_verify_256: min 33.7us / avg 33.8us / max 34.1us
schnorrsig_batch_verify_512: min 31.8us / avg 32.1us / max 32.4us
schnorrsig_batch_verify_1024: min 29.6us / avg 29.6us / max 29.7us
schnorrsig_batch_verify_2048: min 28.1us / avg 28.2us / max 28.2us
schnorrsig_batch_verify_4096: min 27.0us / avg 27.3us / max 27.7us
schnorrsig_batch_verify_8192: min 27.1us / avg 27.2us / max 27.5us

@sipa
Contributor

sipa commented Mar 31, 2021

Did benchmarks on an i7-7820HQ CPU with the clock fixed at 2.6 GHz.

I do indeed observe a small regression on some GCC versions (7,8,10), but on clang it appears to go the other way around. I don't think there is much reason for concern here - we know there are variations in performance between compiler versions, and it's to be expected that different code will affect different compilers differently:

pre-safegcd ENDO=on GMP=off CC=gcc-7
schnorrsig_sign: min 38.5us / avg 38.5us / max 38.6us
schnorrsig_verify: min 66.4us / avg 66.6us / max 67.1us
schnorrsig_batch_verify_1: min 70.2us / avg 70.3us / max 70.5us
schnorrsig_batch_verify_2: min 64.0us / avg 64.0us / max 64.1us
schnorrsig_batch_verify_4: min 59.6us / avg 59.6us / max 59.7us
schnorrsig_batch_verify_8: min 57.2us / avg 57.2us / max 57.3us
schnorrsig_batch_verify_16: min 57.5us / avg 57.6us / max 57.6us
schnorrsig_batch_verify_32: min 57.1us / avg 57.3us / max 57.5us
schnorrsig_batch_verify_64: min 53.5us / avg 53.6us / max 53.6us
schnorrsig_batch_verify_128: min 49.1us / avg 49.1us / max 49.2us
schnorrsig_batch_verify_256: min 44.4us / avg 44.5us / max 44.5us
schnorrsig_batch_verify_512: min 42.1us / avg 42.1us / max 42.2us
schnorrsig_batch_verify_1024: min 39.3us / avg 39.4us / max 39.4us
schnorrsig_batch_verify_2048: min 37.4us / avg 37.5us / max 37.6us
schnorrsig_batch_verify_4096: min 36.0us / avg 36.0us / max 36.2us
schnorrsig_batch_verify_8192: min 36.0us / avg 36.0us / max 36.1us

post-safegcd CC=gcc-7
schnorrsig_sign: min 35.3us / avg 35.3us / max 35.5us
schnorrsig_verify: min 61.9us / avg 62.0us / max 62.3us
schnorrsig_batch_verify_1: min 70.6us / avg 70.7us / max 70.8us
schnorrsig_batch_verify_2: min 64.3us / avg 64.3us / max 64.4us
schnorrsig_batch_verify_4: min 59.7us / avg 59.8us / max 59.9us
schnorrsig_batch_verify_8: min 57.2us / avg 57.3us / max 57.4us
schnorrsig_batch_verify_16: min 57.7us / avg 57.7us / max 57.8us
schnorrsig_batch_verify_32: min 57.3us / avg 57.4us / max 57.5us
schnorrsig_batch_verify_64: min 53.6us / avg 53.7us / max 53.7us
schnorrsig_batch_verify_128: min 49.2us / avg 49.2us / max 49.3us
schnorrsig_batch_verify_256: min 44.5us / avg 44.5us / max 44.6us
schnorrsig_batch_verify_512: min 42.1us / avg 42.2us / max 42.3us
schnorrsig_batch_verify_1024: min 39.4us / avg 39.4us / max 39.5us
schnorrsig_batch_verify_2048: min 37.5us / avg 37.6us / max 37.7us
schnorrsig_batch_verify_4096: min 36.0us / avg 36.1us / max 36.2us
schnorrsig_batch_verify_8192: min 36.1us / avg 36.1us / max 36.2us


pre-safegcd ENDO=on GMP=off CC=gcc-8
schnorrsig_sign: min 38.3us / avg 38.4us / max 38.5us
schnorrsig_verify: min 66.7us / avg 66.8us / max 67.0us
schnorrsig_batch_verify_1: min 70.5us / avg 70.6us / max 70.7us
schnorrsig_batch_verify_2: min 64.3us / avg 64.4us / max 64.5us
schnorrsig_batch_verify_4: min 59.8us / avg 59.9us / max 60.0us
schnorrsig_batch_verify_8: min 57.4us / avg 57.5us / max 57.6us
schnorrsig_batch_verify_16: min 57.9us / avg 58.0us / max 58.1us
schnorrsig_batch_verify_32: min 57.4us / avg 57.5us / max 57.6us
schnorrsig_batch_verify_64: min 53.8us / avg 53.9us / max 53.9us
schnorrsig_batch_verify_128: min 49.3us / avg 49.4us / max 49.5us
schnorrsig_batch_verify_256: min 44.6us / avg 44.7us / max 44.7us
schnorrsig_batch_verify_512: min 42.3us / avg 42.3us / max 42.4us
schnorrsig_batch_verify_1024: min 39.5us / avg 39.5us / max 39.6us
schnorrsig_batch_verify_2048: min 37.6us / avg 37.7us / max 37.8us
schnorrsig_batch_verify_4096: min 36.1us / avg 36.2us / max 36.4us
schnorrsig_batch_verify_8192: min 36.2us / avg 36.3us / max 36.3us

post-safegcd CC=gcc-8
schnorrsig_sign: min 35.0us / avg 35.1us / max 35.2us
schnorrsig_verify: min 62.0us / avg 62.1us / max 62.6us
schnorrsig_batch_verify_1: min 70.7us / avg 70.7us / max 70.7us
schnorrsig_batch_verify_2: min 64.3us / avg 64.4us / max 64.4us
schnorrsig_batch_verify_4: min 59.8us / avg 59.9us / max 60.1us
schnorrsig_batch_verify_8: min 57.4us / avg 57.5us / max 57.5us
schnorrsig_batch_verify_16: min 57.9us / avg 57.9us / max 57.9us
schnorrsig_batch_verify_32: min 57.5us / avg 57.6us / max 57.6us
schnorrsig_batch_verify_64: min 53.8us / avg 53.8us / max 53.9us
schnorrsig_batch_verify_128: min 49.3us / avg 49.3us / max 49.4us
schnorrsig_batch_verify_256: min 44.6us / avg 44.7us / max 44.7us
schnorrsig_batch_verify_512: min 42.3us / avg 42.3us / max 42.3us
schnorrsig_batch_verify_1024: min 39.5us / avg 39.5us / max 39.6us
schnorrsig_batch_verify_2048: min 37.6us / avg 37.6us / max 37.7us
schnorrsig_batch_verify_4096: min 36.1us / avg 36.2us / max 36.3us
schnorrsig_batch_verify_8192: min 36.2us / avg 36.2us / max 36.2us


pre-safegcd ENDO=on GMP=off CC=gcc-9
schnorrsig_sign: min 38.4us / avg 38.4us / max 38.6us
schnorrsig_verify: min 66.8us / avg 66.9us / max 67.1us
schnorrsig_batch_verify_1: min 70.6us / avg 70.6us / max 70.7us
schnorrsig_batch_verify_2: min 64.5us / avg 64.5us / max 64.6us
schnorrsig_batch_verify_4: min 59.8us / avg 59.8us / max 59.9us
schnorrsig_batch_verify_8: min 57.4us / avg 57.5us / max 57.5us
schnorrsig_batch_verify_16: min 57.9us / avg 57.9us / max 58.0us
schnorrsig_batch_verify_32: min 57.5us / avg 57.5us / max 57.6us
schnorrsig_batch_verify_64: min 53.7us / avg 53.7us / max 53.8us
schnorrsig_batch_verify_128: min 49.3us / avg 49.3us / max 49.3us
schnorrsig_batch_verify_256: min 44.6us / avg 44.6us / max 44.6us
schnorrsig_batch_verify_512: min 42.2us / avg 42.2us / max 42.2us
schnorrsig_batch_verify_1024: min 39.4us / avg 39.4us / max 39.5us
schnorrsig_batch_verify_2048: min 37.5us / avg 37.6us / max 37.6us
schnorrsig_batch_verify_4096: min 36.1us / avg 36.2us / max 36.3us
schnorrsig_batch_verify_8192: min 36.2us / avg 36.2us / max 36.2us

post-safegcd CC=gcc-9
schnorrsig_sign: min 35.0us / avg 35.0us / max 35.2us
schnorrsig_verify: min 62.1us / avg 62.2us / max 62.9us
schnorrsig_batch_verify_1: min 70.6us / avg 70.7us / max 70.9us
schnorrsig_batch_verify_2: min 64.2us / avg 64.2us / max 64.3us
schnorrsig_batch_verify_4: min 59.6us / avg 59.6us / max 59.7us
schnorrsig_batch_verify_8: min 57.3us / avg 57.5us / max 57.7us
schnorrsig_batch_verify_16: min 57.9us / avg 57.9us / max 58.0us
schnorrsig_batch_verify_32: min 57.4us / avg 57.5us / max 57.5us
schnorrsig_batch_verify_64: min 53.9us / avg 53.9us / max 53.9us
schnorrsig_batch_verify_128: min 49.3us / avg 49.3us / max 49.4us
schnorrsig_batch_verify_256: min 44.6us / avg 44.6us / max 44.6us
schnorrsig_batch_verify_512: min 42.2us / avg 42.2us / max 42.2us
schnorrsig_batch_verify_1024: min 39.4us / avg 39.4us / max 39.5us
schnorrsig_batch_verify_2048: min 37.5us / avg 37.6us / max 37.7us
schnorrsig_batch_verify_4096: min 36.1us / avg 36.1us / max 36.2us
schnorrsig_batch_verify_8192: min 36.1us / avg 36.1us / max 36.2us


pre-safegcd ENDO=on GMP=off CC=gcc-10
schnorrsig_sign: min 39.3us / avg 39.4us / max 39.5us
schnorrsig_verify: min 66.6us / avg 66.6us / max 66.8us
schnorrsig_batch_verify_1: min 70.3us / avg 70.3us / max 70.4us
schnorrsig_batch_verify_2: min 64.2us / avg 64.2us / max 64.2us
schnorrsig_batch_verify_4: min 59.8us / avg 59.8us / max 59.8us
schnorrsig_batch_verify_8: min 57.3us / avg 57.3us / max 57.3us
schnorrsig_batch_verify_16: min 57.7us / avg 57.7us / max 57.8us
schnorrsig_batch_verify_32: min 57.3us / avg 57.3us / max 57.4us
schnorrsig_batch_verify_64: min 53.7us / avg 53.8us / max 53.8us
schnorrsig_batch_verify_128: min 49.4us / avg 49.4us / max 49.4us
schnorrsig_batch_verify_256: min 44.6us / avg 44.6us / max 44.7us
schnorrsig_batch_verify_512: min 42.2us / avg 42.3us / max 42.4us
schnorrsig_batch_verify_1024: min 39.4us / avg 39.5us / max 39.6us
schnorrsig_batch_verify_2048: min 37.6us / avg 37.6us / max 37.7us
schnorrsig_batch_verify_4096: min 36.1us / avg 36.2us / max 36.3us
schnorrsig_batch_verify_8192: min 36.2us / avg 36.2us / max 36.3us

post-safegcd CC=gcc-10
schnorrsig_sign: min 35.9us / avg 35.9us / max 36.2us
schnorrsig_verify: min 61.9us / avg 61.9us / max 62.1us
schnorrsig_batch_verify_1: min 70.5us / avg 70.5us / max 70.5us
schnorrsig_batch_verify_2: min 64.4us / avg 64.4us / max 64.4us
schnorrsig_batch_verify_4: min 60.1us / avg 60.1us / max 60.2us
schnorrsig_batch_verify_8: min 57.7us / avg 57.7us / max 57.7us
schnorrsig_batch_verify_16: min 57.8us / avg 57.9us / max 57.9us
schnorrsig_batch_verify_32: min 57.4us / avg 57.4us / max 57.5us
schnorrsig_batch_verify_64: min 53.7us / avg 53.7us / max 53.8us
schnorrsig_batch_verify_128: min 49.3us / avg 49.3us / max 49.4us
schnorrsig_batch_verify_256: min 44.6us / avg 44.6us / max 44.6us
schnorrsig_batch_verify_512: min 42.2us / avg 42.2us / max 42.3us
schnorrsig_batch_verify_1024: min 39.4us / avg 39.5us / max 39.5us
schnorrsig_batch_verify_2048: min 37.6us / avg 37.6us / max 37.7us
schnorrsig_batch_verify_4096: min 36.1us / avg 36.3us / max 36.5us
schnorrsig_batch_verify_8192: min 36.2us / avg 36.2us / max 36.2us


pre-safegcd ENDO=on GMP=off CC=clang-8
schnorrsig_sign: min 35.8us / avg 35.9us / max 36.1us
schnorrsig_verify: min 66.4us / avg 66.4us / max 66.6us
schnorrsig_batch_verify_1: min 70.6us / avg 70.7us / max 70.7us
schnorrsig_batch_verify_2: min 63.6us / avg 63.7us / max 63.8us
schnorrsig_batch_verify_4: min 58.8us / avg 58.8us / max 58.8us
schnorrsig_batch_verify_8: min 56.3us / avg 56.4us / max 56.4us
schnorrsig_batch_verify_16: min 56.6us / avg 56.7us / max 56.9us
schnorrsig_batch_verify_32: min 56.5us / avg 56.6us / max 56.6us
schnorrsig_batch_verify_64: min 52.5us / avg 52.5us / max 52.6us
schnorrsig_batch_verify_128: min 48.1us / avg 48.2us / max 48.2us
schnorrsig_batch_verify_256: min 43.6us / avg 43.6us / max 43.7us
schnorrsig_batch_verify_512: min 41.3us / avg 41.4us / max 41.4us
schnorrsig_batch_verify_1024: min 38.6us / avg 38.6us / max 38.7us
schnorrsig_batch_verify_2048: min 36.8us / avg 36.9us / max 36.9us
schnorrsig_batch_verify_4096: min 35.4us / avg 35.5us / max 35.6us
schnorrsig_batch_verify_8192: min 35.4us / avg 35.5us / max 35.5us

post-safegcd CC=clang-8
schnorrsig_sign: min 32.5us / avg 32.5us / max 32.6us
schnorrsig_verify: min 61.6us / avg 61.7us / max 62.3us
schnorrsig_batch_verify_1: min 69.8us / avg 69.8us / max 69.9us
schnorrsig_batch_verify_2: min 63.0us / avg 63.1us / max 63.1us
schnorrsig_batch_verify_4: min 58.3us / avg 58.3us / max 58.4us
schnorrsig_batch_verify_8: min 55.9us / avg 55.9us / max 56.0us
schnorrsig_batch_verify_16: min 56.4us / avg 56.4us / max 56.5us
schnorrsig_batch_verify_32: min 56.4us / avg 56.4us / max 56.5us
schnorrsig_batch_verify_64: min 52.6us / avg 52.7us / max 52.7us
schnorrsig_batch_verify_128: min 48.3us / avg 48.3us / max 48.3us
schnorrsig_batch_verify_256: min 43.7us / avg 43.8us / max 43.8us
schnorrsig_batch_verify_512: min 41.4us / avg 41.5us / max 41.5us
schnorrsig_batch_verify_1024: min 38.7us / avg 38.7us / max 38.8us
schnorrsig_batch_verify_2048: min 36.9us / avg 37.0us / max 37.1us
schnorrsig_batch_verify_4096: min 35.4us / avg 35.5us / max 35.6us
schnorrsig_batch_verify_8192: min 35.4us / avg 35.5us / max 35.5us


pre-safegcd ENDO=on GMP=off CC=clang-9
schnorrsig_sign: min 37.8us / avg 37.9us / max 38.5us
schnorrsig_verify: min 66.5us / avg 66.6us / max 67.4us
schnorrsig_batch_verify_1: min 70.0us / avg 70.1us / max 70.1us
schnorrsig_batch_verify_2: min 63.1us / avg 63.2us / max 63.3us
schnorrsig_batch_verify_4: min 58.4us / avg 58.5us / max 58.6us
schnorrsig_batch_verify_8: min 55.6us / avg 55.7us / max 55.9us
schnorrsig_batch_verify_16: min 56.1us / avg 56.2us / max 56.3us
schnorrsig_batch_verify_32: min 56.2us / avg 56.2us / max 56.3us
schnorrsig_batch_verify_64: min 52.5us / avg 52.5us / max 52.6us
schnorrsig_batch_verify_128: min 48.2us / avg 48.2us / max 48.2us
schnorrsig_batch_verify_256: min 43.7us / avg 43.7us / max 43.7us
schnorrsig_batch_verify_512: min 41.4us / avg 41.4us / max 41.4us
schnorrsig_batch_verify_1024: min 38.6us / avg 38.7us / max 38.7us
schnorrsig_batch_verify_2048: min 36.8us / avg 36.9us / max 37.0us
schnorrsig_batch_verify_4096: min 35.4us / avg 35.5us / max 35.6us
schnorrsig_batch_verify_8192: min 35.5us / avg 35.5us / max 35.5us

post-safegcd CC=clang-9
schnorrsig_sign: min 34.5us / avg 34.5us / max 34.6us
schnorrsig_verify: min 61.5us / avg 61.6us / max 61.8us
schnorrsig_batch_verify_1: min 69.8us / avg 69.8us / max 69.9us
schnorrsig_batch_verify_2: min 63.3us / avg 63.3us / max 63.4us
schnorrsig_batch_verify_4: min 58.6us / avg 58.6us / max 58.7us
schnorrsig_batch_verify_8: min 55.8us / avg 55.9us / max 55.9us
schnorrsig_batch_verify_16: min 56.3us / avg 56.3us / max 56.4us
schnorrsig_batch_verify_32: min 56.4us / avg 56.4us / max 56.5us
schnorrsig_batch_verify_64: min 52.5us / avg 52.5us / max 52.5us
schnorrsig_batch_verify_128: min 48.2us / avg 48.2us / max 48.2us
schnorrsig_batch_verify_256: min 43.7us / avg 43.7us / max 43.7us
schnorrsig_batch_verify_512: min 41.4us / avg 41.4us / max 41.5us
schnorrsig_batch_verify_1024: min 38.7us / avg 38.7us / max 38.8us
schnorrsig_batch_verify_2048: min 36.8us / avg 36.9us / max 37.0us
schnorrsig_batch_verify_4096: min 35.4us / avg 35.5us / max 35.6us
schnorrsig_batch_verify_8192: min 35.4us / avg 35.5us / max 35.6us


pre-safegcd ENDO=on GMP=off CC=clang-10
schnorrsig_sign: min 37.6us / avg 37.6us / max 37.7us
schnorrsig_verify: min 66.5us / avg 66.6us / max 66.7us
schnorrsig_batch_verify_1: min 70.8us / avg 70.9us / max 70.9us
schnorrsig_batch_verify_2: min 63.7us / avg 63.7us / max 63.8us
schnorrsig_batch_verify_4: min 58.7us / avg 58.8us / max 58.8us
schnorrsig_batch_verify_8: min 56.1us / avg 56.2us / max 56.3us
schnorrsig_batch_verify_16: min 56.6us / avg 56.6us / max 56.7us
schnorrsig_batch_verify_32: min 56.5us / avg 56.5us / max 56.6us
schnorrsig_batch_verify_64: min 52.5us / avg 52.5us / max 52.6us
schnorrsig_batch_verify_128: min 48.2us / avg 48.3us / max 48.3us
schnorrsig_batch_verify_256: min 43.7us / avg 43.7us / max 43.7us
schnorrsig_batch_verify_512: min 41.4us / avg 41.4us / max 41.4us
schnorrsig_batch_verify_1024: min 38.7us / avg 38.7us / max 38.8us
schnorrsig_batch_verify_2048: min 36.9us / avg 36.9us / max 37.0us
schnorrsig_batch_verify_4096: min 35.4us / avg 35.5us / max 35.6us
schnorrsig_batch_verify_8192: min 35.5us / avg 35.5us / max 35.5us

post-safegcd CC=clang-10
schnorrsig_sign: min 34.3us / avg 34.4us / max 34.4us
schnorrsig_verify: min 61.7us / avg 61.7us / max 61.9us
schnorrsig_batch_verify_1: min 69.8us / avg 69.8us / max 69.9us
schnorrsig_batch_verify_2: min 63.0us / avg 63.1us / max 63.1us
schnorrsig_batch_verify_4: min 58.3us / avg 58.3us / max 58.4us
schnorrsig_batch_verify_8: min 55.5us / avg 55.6us / max 55.7us
schnorrsig_batch_verify_16: min 55.9us / avg 56.0us / max 56.1us
schnorrsig_batch_verify_32: min 56.1us / avg 56.2us / max 56.3us
schnorrsig_batch_verify_64: min 52.6us / avg 52.6us / max 52.7us
schnorrsig_batch_verify_128: min 48.2us / avg 48.3us / max 48.4us
schnorrsig_batch_verify_256: min 43.7us / avg 43.8us / max 43.8us
schnorrsig_batch_verify_512: min 41.4us / avg 41.5us / max 41.5us
schnorrsig_batch_verify_1024: min 38.7us / avg 38.7us / max 38.8us
schnorrsig_batch_verify_2048: min 36.9us / avg 36.9us / max 37.1us
schnorrsig_batch_verify_4096: min 35.4us / avg 35.5us / max 35.7us
schnorrsig_batch_verify_8192: min 35.5us / avg 35.5us / max 35.6us


pre-safegcd ENDO=on GMP=off CC=clang-11
schnorrsig_sign: min 37.5us / avg 37.5us / max 37.6us
schnorrsig_verify: min 66.3us / avg 66.3us / max 66.5us
schnorrsig_batch_verify_1: min 70.4us / avg 70.4us / max 70.5us
schnorrsig_batch_verify_2: min 63.3us / avg 63.3us / max 63.4us
schnorrsig_batch_verify_4: min 58.5us / avg 58.5us / max 58.6us
schnorrsig_batch_verify_8: min 55.6us / avg 55.6us / max 55.7us
schnorrsig_batch_verify_16: min 56.1us / avg 56.2us / max 56.3us
schnorrsig_batch_verify_32: min 56.3us / avg 56.4us / max 56.4us
schnorrsig_batch_verify_64: min 52.5us / avg 52.5us / max 52.6us
schnorrsig_batch_verify_128: min 48.2us / avg 48.2us / max 48.2us
schnorrsig_batch_verify_256: min 43.6us / avg 43.7us / max 43.7us
schnorrsig_batch_verify_512: min 41.4us / avg 41.4us / max 41.4us
schnorrsig_batch_verify_1024: min 38.6us / avg 38.7us / max 38.7us
schnorrsig_batch_verify_2048: min 36.8us / avg 36.9us / max 36.9us
schnorrsig_batch_verify_4096: min 35.4us / avg 35.4us / max 35.5us
schnorrsig_batch_verify_8192: min 35.4us / avg 35.5us / max 35.5us

post-safegcd CC=clang-11
schnorrsig_sign: min 34.3us / avg 34.4us / max 34.4us
schnorrsig_verify: min 61.3us / avg 61.5us / max 61.7us
schnorrsig_batch_verify_1: min 69.5us / avg 69.5us / max 69.6us
schnorrsig_batch_verify_2: min 62.7us / avg 62.8us / max 62.8us
schnorrsig_batch_verify_4: min 58.1us / avg 58.2us / max 58.3us
schnorrsig_batch_verify_8: min 55.9us / avg 56.0us / max 56.1us
schnorrsig_batch_verify_16: min 56.5us / avg 56.6us / max 56.6us
schnorrsig_batch_verify_32: min 56.2us / avg 56.3us / max 56.3us
schnorrsig_batch_verify_64: min 52.6us / avg 53.4us / max 55.1us
schnorrsig_batch_verify_128: min 48.2us / avg 48.8us / max 49.6us
schnorrsig_batch_verify_256: min 43.7us / avg 43.8us / max 43.8us
schnorrsig_batch_verify_512: min 41.4us / avg 41.6us / max 41.7us
schnorrsig_batch_verify_1024: min 38.7us / avg 38.8us / max 38.8us
schnorrsig_batch_verify_2048: min 36.9us / avg 37.0us / max 37.0us
schnorrsig_batch_verify_4096: min 35.4us / avg 35.5us / max 35.6us
schnorrsig_batch_verify_8192: min 35.4us / avg 35.5us / max 35.5us
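
To compare the numbers above at a glance, the following is a minimal sketch that parses benchmark lines in this format and reports each batch size's speedup over single `schnorrsig_verify`. It assumes the batch figures are per-signature averages (as the decreasing trend suggests); the helper names `parse_bench` and `batch_speedups` are made up for illustration and are not part of libsecp256k1. The sample uses three lines from the post-safegcd clang-11 run.

```python
import re

# One benchmark line: "<name>: min <x>us / avg <y>us / max <z>us"
LINE = re.compile(r"^(\w+): min ([\d.]+)us / avg ([\d.]+)us / max ([\d.]+)us$")

def parse_bench(text):
    """Map benchmark name -> average time in microseconds."""
    result = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            result[m.group(1)] = float(m.group(3))
    return result

def batch_speedups(avg):
    """Speedup of each batch size relative to single schnorrsig_verify."""
    base = avg["schnorrsig_verify"]
    prefix = "schnorrsig_batch_verify_"
    return {int(name[len(prefix):]): base / t
            for name, t in avg.items() if name.startswith(prefix)}

sample = """\
schnorrsig_verify: min 61.3us / avg 61.5us / max 61.7us
schnorrsig_batch_verify_1: min 69.5us / avg 69.5us / max 69.6us
schnorrsig_batch_verify_8192: min 35.4us / avg 35.5us / max 35.5us
"""

for n, s in sorted(batch_speedups(parse_bench(sample)).items()):
    print(f"batch of {n}: {s:.2f}x")
```

On this sample, a batch of 1 is slower than single verification, while a batch of 8192 is roughly 1.7 times faster per signature.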

@jonasnick
Contributor Author

I noticed a higher variance in my benchmarks than I was used to, so I re-ran the experiment in a more controlled environment (gcc 10.2.0). I no longer see a performance degradation post-rebase. Single schnorrsig_verify was fastest post-rebase compared to pre-rebase (with endo, bignum=gmp and bignum=no), and batch verify performed very similarly across the three configurations.

@jonasnick
Contributor Author

I added a commit to reduce the batch verification randomizers to 128 bits. This gives up to a 9% speedup.
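
For readers unfamiliar with the role of the randomizers, here is a toy sketch of randomized Schnorr batch verification. It is only an illustration under loud assumptions: it uses a tiny multiplicative group (the order-1019 subgroup of the integers modulo 2039) instead of secp256k1, a simplified challenge hash instead of BIP-340's tagged hash, and hypothetical function names. It is not this PR's implementation. As in BIP-340, the first randomizer is fixed to 1 and the others are random; `randomizer_bits` only controls their size.

```python
import hashlib
import secrets

# Toy Schnorr group (NOT secp256k1): order-Q subgroup of Z_P^*, P = 2Q + 1.
P, Q, G = 2039, 1019, 4

def challenge(R, pub, msg):
    """Simplified challenge e = H(R || pub || msg) mod Q (not BIP-340's hash)."""
    data = f"{R}|{pub}|".encode() + msg
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def sign(x, msg):
    """Sign msg with secret key x; returns (R, s) with s = k + e*x mod Q."""
    k = secrets.randbelow(Q - 1) + 1
    R = pow(G, k, P)
    e = challenge(R, pow(G, x, P), msg)
    return R, (k + e * x) % Q

def verify(pub, msg, sig):
    """Single verification: G^s == R * pub^e."""
    R, s = sig
    return pow(G, s, P) == R * pow(pub, challenge(R, pub, msg), P) % P

def batch_verify(items, randomizer_bits=128):
    """items: list of (pub, msg, (R, s)); one combined check for all of them."""
    lhs_exp, rhs = 0, 1
    for i, (pub, msg, (R, s)) in enumerate(items):
        # First randomizer is 1; the rest are random in [1, 2^bits - 1].
        a = 1 if i == 0 else secrets.randbelow(2**randomizer_bits - 1) + 1
        e = challenge(R, pub, msg)
        lhs_exp = (lhs_exp + a * s) % Q
        rhs = rhs * pow(R, a, P) * pow(pub, a * e, P) % P
    # G^(sum a_i*s_i) == prod R_i^a_i * pub_i^(a_i*e_i)
    return pow(G, lhs_exp, P) == rhs
```

In a real curve setting, shorter randomizers mean shorter scalars in the multi-scalar multiplication, which is where the speedup comes from; in this toy group the exponents are reduced modulo 1019 anyway, so the sketch shows only the structure of the check.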

@jonasnick
Contributor Author

I'm intending to remove the batch verification speedup graph from BIP-340 and instead place it in libsecp's doc directory. Therefore, I added a commit that allows recreating said graph (originally proposed for BIP-340).

I removed the log fit from the graph and instead increased the granularity. The shape of the graph may change again once the optimal pippenger threshold/windows are updated to reflect the latest improvements.

@jonasnick
Contributor Author

Added two commits:

  1. A fix for a bug in the batch_verify benchmarks. Previously, the same signatures were checked in every iteration, so the same randomizers were used each time, and the optimization in 2. showed worse results for some numbers of signatures. Moreover, the graph in docs/speedup-batch/ looks much smoother now.
  2. An optimization to the range of randomizers by @roconnor-blockstream. Instead of choosing them from [0, 2^128-1], they are chosen from [-2^127, 2^127-1], which affects the scalars after the endomorphism split and leads to an improvement of 3 to 9 percent. With the former randomizer range, one of the two split scalars would be 0 only about 50% of the time. Now that scalar is always 0, which speeds up both Strauss' and Pippenger's algorithms.
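
The effect of the signed range on the endomorphism split can be sketched with the textbook GLV rounding method. This is an assumption-laden illustration, not libsecp256k1's code: the library uses a fixed-point variant of the split, so exact counts may differ. The lattice basis constants are the ones documented for secp256k1, and the function names are hypothetical.

```python
import secrets

# secp256k1 group order.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
# Short lattice basis (a1, b1), (a2, b2) with a + b*lambda == 0 (mod N),
# where lambda is the eigenvalue of the secp256k1 endomorphism.
A1 = 0x3086D221A7D46BCDE86C90E49284EB15
B1 = -0xE4437ED6010E88286F547FA90ABFE4C3
A2 = 0x114CA50F7A8E2F3F657C1108D9D44CFD8
B2 = A1

def round_div(a, b):
    """Nearest integer to a / b for b > 0 (exact integer arithmetic)."""
    return (2 * a + b) // (2 * b)

def split_lambda(k):
    """Return (k1, k2) with k1 + k2*lambda == k (mod N), textbook GLV rounding."""
    c1 = round_div(B2 * k, N)
    c2 = round_div(-B1 * k, N)
    k1 = k - c1 * A1 - c2 * A2
    k2 = -c1 * B1 - c2 * B2
    return k1, k2

def unsigned_randomizer():
    """Randomizer from [0, 2^128 - 1]."""
    return secrets.randbits(128)

def signed_randomizer():
    """Randomizer from [-2^127, 2^127 - 1]."""
    return secrets.randbits(128) - 2**127

def nonzero_k2_fraction(sampler, trials=10000):
    """Fraction of sampled randomizers whose second split scalar is nonzero."""
    return sum(split_lambda(sampler())[1] != 0 for _ in range(trials)) / trials

print("unsigned range, k2 != 0:", nonzero_k2_fraction(unsigned_randomizer))
print("signed range,   k2 != 0:", nonzero_k2_fraction(signed_randomizer))
```

With the signed range, both rounding quotients stay below one half in absolute value, so the split returns (k, 0) and the second multi-scalar term vanishes.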

@jonasnick
Contributor Author

Rebased the PR to (hopefully) fix CI issues.

This is in preparation for schnorrsig_batch_verify.
Without this commit, 8192 points require 2 batches.
This is just a commit for benchmarks and should be improved if 128 bit
randomizers are to be actually used.
1) it does not follow bip-schnorr batch verification
2) the randomizers are not uniformly distributed in [0, 2^128-1] for no reason
3) chacha output is thrown away
H/T roconnor-blockstream for this idea
@gmaxwell
Contributor

gmaxwell commented Jun 1, 2021

Would it be a win to skip applying the endomorphism where the corresponding scalar is equal to zero? (Or perhaps, for Strauss, more generally to apply the endomorphism only to digits that get used, of which not applying it at all for zero is a special case.)

@jonasnick
Copy link
Contributor Author

If there's concern that our current ecmult_multi implementation isn't robust enough yet to be used for consensus applications, we could do the following:

  1. Disable pippenger for now. This decreases complexity and Strauss' algo is already being used.
  2. Support only a fixed number of scratch sizes (e.g. small and large), which allows more exhaustive testing.

@Sajjon

Sajjon commented Feb 9, 2022

Hey! What is the status of this PR? What are the blockers to get it merged? :)

BIP340 mentions BatchVerify and schnorrsig/tests_impl.h on master contains some verify_batch (TODO) comments suggesting that BatchVerify will get merged, but not when.

I'm asking because I would like to know when I can offer this as part of my API in a Swift wrapper around libsecp256k1.

I guess there is no low-hanging fruit I can help with to get this merged? One does not simply do crypto...

Thanks!

@jonasnick
Contributor Author

Hey @Sajjon, this PR needs a significant overhaul before getting merged as discussed above. I proposed this as a project to https://www.summerofbitcoin.org/. If there are people wanting to do this project, my plan is that we will help them to get a PR ready this summer.

@bicrxm

bicrxm commented Feb 17, 2022

> Hey @Sajjon, this PR needs a significant overhaul before getting merged as discussed above. I proposed this as a project to https://www.summerofbitcoin.org/. If there are people wanting to do this project, my plan is that we will help them to get a PR ready this summer.

I am interested in this. Should I start from PR #558 to understand it?


7 participants