Batch affine msm + generic in msm #261

Merged
merged 43 commits into from
Nov 21, 2022

@gbotrel gbotrel commented Nov 9, 2022

Derived from #249 (by @0x0ece). The strategy is similar, but the implementation details differ somewhat.

This PR:

  • introduces generics in the MSM, which allows code reuse across MSM variants (extended Jacobian, batch affine, and the upcoming twisted Edwards variant). In short, a type is generated per bucket size (for example, type bucketG1AffineC4 [1 << (4 - 1)]G1Affine), and the innerMSM functions are parametrized with that type. This lets the buckets be allocated on the stack, which is critical for performance.
  • caps the window size (the c parameter) at 16 bits. With extended Jacobian coordinates, larger windows helped on large MSMs; they do not help with the new (msm-affine) method.
  • calls extended Jacobian for small c and batch affine for larger c.
  • makes partitionScalars return the digits as a []uint16 slice (instead of packing them into field elements).
  • fixes the benchmark: the scalars produced by fillBenchScalars were not uniformly distributed.
  • adjusts the number of chunks and the last window size to account for the exact number of bits used by scalars (fr.Bits) instead of fr.Limbs * 64.
  • note: the parameters and strategy for the batch affine version can (and hopefully will) be tuned with better heuristics.
  • generalizes the strategy of splitting the processing of a chunk across 2 goroutines when the chunk is overweight; previously, only the first chunk was checked (for cases where the input had many binary values).
  • adds more tests.
  • makes the code more readable than the previous version: code is no longer duplicated for each c (through templates + code generation); instead, only types are generated, and generics select the right methods.
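
To make the "one type per bucket size, generic inner functions" idea concrete, here is a minimal sketch. It is not the PR's code: the group is a toy (integers modulo a small prime under addition), and the names bucketsC2, bucketsC4, bucketArray, processChunk and msm, as well as the digit encoding, are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Toy group: integers modulo a small prime under addition, so "scalar * point"
// is a modular multiplication that is easy to check. Not a real curve.
const modulus = 1_000_003

// One array type per window size c. With signed digits, 2^(c-1) buckets suffice.
type bucketsC2 [1 << (2 - 1)]int64
type bucketsC4 [1 << (4 - 1)]int64

// bucketArray lists the bucket types the generic functions accept.
type bucketArray interface{ bucketsC2 | bucketsC4 }

// partitionScalars splits scalars into signed c-bit digits, chunk by chunk.
// Hypothetical encoding: magnitude<<1, low bit set when the digit is negative.
func partitionScalars(scalars []uint64, c, nbChunks int) []uint16 {
	n := len(scalars)
	digits := make([]uint16, n*nbChunks)
	full, mask := uint64(1)<<c, uint64(1)<<c-1
	for i, s := range scalars {
		carry := uint64(0)
		for k := 0; k < nbChunks; k++ {
			d := (s>>(k*c))&mask + carry
			carry = 0
			if d > full/2 { // use d - 2^c and carry 1 into the next window
				digits[k*n+i] = uint16(full-d)<<1 | 1
				carry = 1
			} else {
				digits[k*n+i] = uint16(d) << 1
			}
		}
	}
	return digits
}

// processChunk adds each point to the bucket selected by its digit, then
// combines the buckets as sum_j (j+1)*bucket[j] using a running sum.
func processChunk[B bucketArray](points []int64, digits []uint16) int64 {
	var buckets B // fixed-size array, so it does not need a heap allocation
	for i, d := range digits {
		mag := int(d >> 1)
		if mag == 0 {
			continue
		}
		if d&1 == 0 {
			buckets[mag-1] = (buckets[mag-1] + points[i]) % modulus
		} else {
			buckets[mag-1] = (buckets[mag-1] - points[i] + modulus) % modulus
		}
	}
	var running, total int64
	for j := len(buckets) - 1; j >= 0; j-- {
		running = (running + buckets[j]) % modulus
		total = (total + running) % modulus
	}
	return total
}

// msm computes sum_i scalars[i]*points[i]; the window size follows from B.
func msm[B bucketArray](points []int64, scalars []uint64, scalarBits int) int64 {
	var b B
	c := bits.Len(uint(len(b))) // len(b) is 2^(c-1), so its bit length is c
	nbChunks := scalarBits/c + 1 // the extra window absorbs the final carry
	digits := partitionScalars(scalars, c, nbChunks)
	n := len(points)
	var res int64
	for k := nbChunks - 1; k >= 0; k-- {
		for j := 0; j < c; j++ {
			res = res * 2 % modulus // c "doublings" between chunks
		}
		res = (res + processChunk[B](points, digits[k*n:(k+1)*n])) % modulus
	}
	return res
}

// naive is the reference result used to check msm.
func naive(points []int64, scalars []uint64) int64 {
	var res int64
	for i := range points {
		res = (res + int64(scalars[i]%modulus)*points[i]) % modulus
	}
	return res
}

func main() {
	points := []int64{5, 123456, 999999, 42}
	scalars := []uint64{0, 65535, 4660, 32768}
	fmt.Println(msm[bucketsC4](points, scalars, 16) == naive(points, scalars))
	fmt.Println(msm[bucketsC2](points, scalars, 16) == naive(points, scalars))
}
```

Selecting the window size amounts to picking the type argument (msm[bucketsC4] versus msm[bucketsC2]); the chunk-processing code itself is written once.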

TODO:

  • remove partitionScalarsOld (a separate issue will be created).
  • test the batchAffineAdd methods

Some remarks on the msm-affine

Roughly speaking, the idea is, as in the previous bucket method, to process "chunks" of the scalars (think: columns), each one c-bit window wide.
We can do an efficient batchAddition in affine coordinates (that is, compute n point-to-point additions, NOT sum n points), but the n additions must be independent. In our case, since we are adding points from the input vector to a smaller set of buckets, all we need to ensure is that within a single call to batchAddition we never add to the same bucket twice.

The larger the batch size, the fewer (costly) inversions we do, but a batch that is too large puts more pressure on memory and cache and, most importantly, increases the chance of conflicting additions (2 points going to the same bucket).
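
The reason batching saves inversions: an affine addition divides by a difference of x-coordinates, and n independent divisions can share one inversion through Montgomery's batch-inversion trick. Below is a minimal sketch of that trick alone, over a toy prime field; the modulus and the helper names (mulmod, inverse, batchInvert) are illustrative assumptions, not the library's API.

```go
package main

import "fmt"

// Toy prime modulus (assumption); real base fields are hundreds of bits wide.
const p = 1_000_003

func mulmod(a, b uint64) uint64 { return a * b % p }

// inverse computes a^(p-2) mod p (Fermat's little theorem). It stands in for
// the costly field inversion.
func inverse(a uint64) uint64 {
	result, base, e := uint64(1), a%p, uint64(p-2)
	for e > 0 {
		if e&1 == 1 {
			result = mulmod(result, base)
		}
		base = mulmod(base, base)
		e >>= 1
	}
	return result
}

// batchInvert inverts every element of xs (all assumed non-zero mod p) with a
// single call to inverse plus about 3 multiplications per element.
func batchInvert(xs []uint64) []uint64 {
	out := make([]uint64, len(xs))
	if len(xs) == 0 {
		return out
	}
	// Forward pass: out[i] = xs[0] * ... * xs[i-1].
	acc := uint64(1)
	for i, x := range xs {
		out[i] = acc
		acc = mulmod(acc, x)
	}
	inv := inverse(acc) // the only inversion
	// Backward pass: peel off one factor at a time.
	for i := len(xs) - 1; i >= 0; i-- {
		out[i] = mulmod(out[i], inv) // prefix product times (xs[0..i])^-1
		inv = mulmod(inv, xs[i])     // now inv = (xs[0..i-1])^-1
	}
	return out
}

func main() {
	xs := []uint64{2, 3, 5, 999999}
	for i, inv := range batchInvert(xs) {
		fmt.Println(xs[i], inv, mulmod(xs[i], inv) == 1)
	}
}
```

With a batch of n additions, the n denominators go through batchInvert together, so the inversion cost is amortized over the whole batch.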

The idea is that if we consider uniformly random scalars and take, say, a batchSize of 100 with 32000 buckets, the probability that an incoming point hits a bucket already used in its "batch-window" of 100 is low (at most 99/32000, about 0.3%). If that happens, we append the conflicting point to a queue and try to reprocess it later.
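
A rough back-of-the-envelope check of these numbers, assuming bucket indices are independent and uniform (the function name conflictProbability is made up for this sketch):

```go
package main

import "fmt"

// conflictProbability returns the probability that batchSize independent,
// uniformly random bucket indices (out of nbBuckets) contain a repeat.
func conflictProbability(batchSize, nbBuckets int) float64 {
	noConflict := 1.0
	for i := 0; i < batchSize; i++ {
		// The (i+1)-th point must avoid the i buckets already in the batch.
		noConflict *= 1 - float64(i)/float64(nbBuckets)
	}
	return 1 - noConflict
}

func main() {
	fmt.Printf("worst-case per-point conflict: %.4f\n", 99.0/32000.0)
	fmt.Printf("at least one conflict in a batch: %.4f\n", conflictProbability(100, 32000))
}
```

Under these assumptions a single point rarely conflicts, while the chance that a full batch of 100 contains at least one conflict comes out at roughly 14%, which is why the occasional conflicting point needs somewhere to go.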

new: with our current parameters, the queue should stay mostly empty; if it fills up, we are hitting an input vector that is unfriendly to msm-affine. This can happen in a SNARK context, for example, if many inputs have the same value, or if we keep finding m consecutive identical values, with m of roughly the same order as the batchSize. This would force us to process very small (not full) batch additions and make the algorithm perform terribly. To deal with that and other edge cases, when the queue is full, we flush it into another set of buckets, in extended Jacobian coordinates. In practice (for uniformly distributed points), the slowdown is ~5%, but it is worth it to avoid too many code paths and edge cases.
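
The control flow described above (conflict, queue, fallback when the queue is full) can be sketched with a toy model in which "points" are plain integers and a batch addition is a loop. The scheduler type, its fields and the constants are all hypothetical; the actual implementation works on curve points and its details may differ.

```go
package main

import "fmt"

const (
	nbBuckets = 8
	batchSize = 4
	queueCap  = 2
)

// op is one pending addition: add point to buckets[bucket].
type op struct {
	bucket uint16
	point  int64 // toy "point": an integer under addition
}

type scheduler struct {
	buckets      [nbBuckets]int64 // stands in for the affine buckets
	fallback     [nbBuckets]int64 // stands in for the extended Jacobian buckets
	inBatch      [nbBuckets]bool  // buckets touched by the current batch
	batch, queue []op
	fallbackAdds int
}

// add schedules one addition; conflicting additions are queued.
func (s *scheduler) add(o op) {
	if s.inBatch[o.bucket] {
		if len(s.queue) == queueCap {
			s.flushQueueToFallback()
		}
		s.queue = append(s.queue, o)
		return
	}
	s.inBatch[o.bucket] = true
	s.batch = append(s.batch, o)
	if len(s.batch) == batchSize {
		s.executeBatch()
	}
}

// executeBatch applies the (independent) additions, then retries the queue.
func (s *scheduler) executeBatch() {
	for _, o := range s.batch {
		s.buckets[o.bucket] += o.point
		s.inBatch[o.bucket] = false
	}
	s.batch = s.batch[:0]
	pending := s.queue
	s.queue = nil
	for _, o := range pending {
		s.add(o)
	}
}

// flushQueueToFallback empties the queue into the second set of buckets,
// where additions need not be independent.
func (s *scheduler) flushQueueToFallback() {
	for _, o := range s.queue {
		s.fallback[o.bucket] += o.point
		s.fallbackAdds++
	}
	s.queue = s.queue[:0]
}

// finish drains the batch and the queue, then merges both bucket sets.
func (s *scheduler) finish() [nbBuckets]int64 {
	for len(s.batch) > 0 || len(s.queue) > 0 {
		s.executeBatch()
	}
	var res [nbBuckets]int64
	for i := range res {
		res[i] = s.buckets[i] + s.fallback[i]
	}
	return res
}

func main() {
	var s scheduler
	// Unfriendly input: many consecutive additions to the same bucket.
	for i, b := range []uint16{3, 3, 3, 3, 3, 1, 2} {
		s.add(op{bucket: b, point: int64(i + 1)})
	}
	fmt.Println(s.finish(), "fallback additions:", s.fallbackAdds)
}
```

In this toy run, the repeated additions to bucket 3 overflow the two-slot queue once, so part of the work lands in the fallback buckets and is merged back at the end.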

Benchmarks

Run on an AWS hpc6a.48xlarge, comparing the develop branch against feat/msm-affine (both generate uniformly distributed scalars).

without split logic (we only use as many cores as nbChunks)

TL;DR: 30% to 60% speedup on large instances 😲 (most small sizes show no statistically significant change). Still need to benchmark on a low-cost device.

bls12-377
2022/11/18 01:50:30 comparing ../../ecc/bls12-377 MultiExp
name                           old time/op   new time/op   delta
MultiExpG1/32_points-96          381µs ±21%    351µs ± 9%     ~     (p=0.222 n=5+5)
MultiExpG1/64_points-96          442µs ±12%    486µs ±13%     ~     (p=0.222 n=5+5)
MultiExpG1/128_points-96         587µs ±35%    704µs ±41%     ~     (p=0.421 n=5+5)
MultiExpG1/256_points-96         717µs ±13%    713µs ±12%     ~     (p=1.000 n=5+5)
MultiExpG1/512_points-96         921µs ±23%    976µs ±12%     ~     (p=0.690 n=5+5)
MultiExpG1/1024_points-96       1.40ms ±30%   1.70ms ±46%     ~     (p=0.310 n=5+5)
MultiExpG1/2048_points-96       2.38ms ±86%   1.83ms ±20%     ~     (p=0.310 n=5+5)
MultiExpG1/4096_points-96       3.10ms ±23%   2.69ms ± 4%     ~     (p=0.151 n=5+5)
MultiExpG1/8192_points-96       6.10ms ±48%   5.29ms ±15%     ~     (p=1.000 n=5+5)
MultiExpG1/16384_points-96      10.0ms ±50%   11.0ms ±48%     ~     (p=1.000 n=5+5)
MultiExpG1/32768_points-96      22.2ms ±24%   25.6ms ±13%     ~     (p=0.310 n=5+5)
MultiExpG1/65536_points-96      42.3ms ±20%   30.2ms ± 9%  -28.46%  (p=0.016 n=5+4)
MultiExpG1/131072_points-96     69.0ms ±10%   68.0ms ±16%     ~     (p=0.841 n=5+5)
MultiExpG1/262144_points-96      161ms ±31%    143ms ±20%     ~     (p=0.421 n=5+5)
MultiExpG1/524288_points-96      268ms ± 2%    215ms ± 5%  -19.86%  (p=0.008 n=5+5)
MultiExpG1/1048576_points-96     513ms ± 5%    367ms ± 1%  -28.56%  (p=0.008 n=5+5)
MultiExpG1/2097152_points-96     983ms ± 9%    710ms ± 2%  -27.81%  (p=0.008 n=5+5)
MultiExpG1/4194304_points-96     2.97s ± 3%    1.39s ± 3%  -53.22%  (p=0.008 n=5+5)
MultiExpG1/8388608_points-96     5.61s ± 3%    2.84s ±20%  -49.44%  (p=0.008 n=5+5)
MultiExpG1/16777216_points-96    10.8s ± 2%     5.4s ± 3%  -50.04%  (p=0.008 n=5+5)
MultiExpG1/33554432_points-96    21.2s ± 4%    11.0s ± 5%  -48.02%  (p=0.008 n=5+5)
MultiExpG1Reference-96           480ms ± 3%    359ms ± 2%  -25.29%  (p=0.008 n=5+5)
ManyMultiExpG1Reference-96       566ms ± 4%    387ms ± 4%  -31.60%  (p=0.008 n=5+5)
MultiExpG2/32_points-96          709µs ± 5%    888µs ±24%     ~     (p=0.556 n=4+5)
MultiExpG2/64_points-96          926µs ±30%  1816µs ±150%     ~     (p=0.222 n=5+5)
MultiExpG2/128_points-96        1.31ms ±51%   1.19ms ± 5%     ~     (p=0.905 n=5+4)
MultiExpG2/256_points-96       3.22ms ±187%   2.58ms ±49%     ~     (p=0.690 n=5+5)
MultiExpG2/512_points-96        2.43ms ±68%   1.91ms ±13%     ~     (p=0.222 n=5+5)
MultiExpG2/1024_points-96       2.76ms ±16%   2.88ms ±51%     ~     (p=1.000 n=5+5)
MultiExpG2/2048_points-96       5.40ms ±26%   4.53ms ±17%     ~     (p=0.310 n=5+5)
MultiExpG2/4096_points-96       8.42ms ±51%   9.63ms ±45%     ~     (p=0.310 n=5+5)
MultiExpG2/8192_points-96       16.1ms ±45%   10.9ms ±19%  -32.48%  (p=0.008 n=5+5)
MultiExpG2/16384_points-96      31.1ms ±44%   29.0ms ±39%     ~     (p=1.000 n=5+5)
MultiExpG2/32768_points-96      59.1ms ±21%   41.6ms ±15%  -29.68%  (p=0.008 n=5+5)
MultiExpG2/65536_points-96       139ms ±11%     86ms ±16%  -37.80%  (p=0.008 n=5+5)
MultiExpG2/131072_points-96      228ms ±32%    143ms ± 5%  -37.31%  (p=0.008 n=5+5)
MultiExpG2/262144_points-96      401ms ±13%    275ms ± 4%  -31.30%  (p=0.008 n=5+5)
MultiExpG2/524288_points-96      788ms ± 3%    557ms ± 7%  -29.39%  (p=0.008 n=5+5)
MultiExpG2/1048576_points-96     1.48s ± 3%    0.98s ± 3%  -33.55%  (p=0.008 n=5+5)
MultiExpG2/2097152_points-96     2.85s ± 3%    1.84s ± 3%  -35.53%  (p=0.008 n=5+5)
MultiExpG2/4194304_points-96     7.83s ± 2%    3.82s ±20%  -51.28%  (p=0.008 n=5+5)
MultiExpG2/8388608_points-96     14.5s ± 2%     7.4s ± 8%  -48.78%  (p=0.008 n=5+5)
MultiExpG2/16777216_points-96    28.3s ± 1%    14.8s ± 9%  -47.50%  (p=0.008 n=5+5)
MultiExpG2/33554432_points-96    56.4s ± 3%    30.5s ± 6%  -45.92%  (p=0.008 n=5+5)
MultiExpG2Reference-96           1.47s ± 7%    0.98s ± 2%  -33.24%  (p=0.016 n=5+4)
ManyMultiExpG2Reference-96       1.70s ± 3%    1.20s ± 4%  -29.36%  (p=0.008 n=5+5)

bls12-381

2022/11/18 01:50:30 comparing ../../ecc/bls12-381 MultiExp
name                           old time/op   new time/op   delta
MultiExpG1/32_points-96          363µs ±12%    357µs ± 5%     ~     (p=0.690 n=5+5)
MultiExpG1/64_points-96          450µs ± 9%    446µs ± 8%     ~     (p=1.000 n=5+5)
MultiExpG1/128_points-96         624µs ± 7%    562µs ± 8%   -9.94%  (p=0.032 n=5+5)
MultiExpG1/256_points-96         919µs ±40%    897µs ±56%     ~     (p=0.841 n=5+5)
MultiExpG1/512_points-96         922µs ±23%    949µs ±25%     ~     (p=0.841 n=5+5)
MultiExpG1/1024_points-96       1.33ms ±57%   1.17ms ± 4%     ~     (p=1.000 n=5+5)
MultiExpG1/2048_points-96       2.05ms ±39%   2.40ms ±20%     ~     (p=0.222 n=5+5)
MultiExpG1/4096_points-96       3.07ms ±36%   5.18ms ±99%     ~     (p=0.095 n=5+5)
MultiExpG1/8192_points-96       5.28ms ±23%   4.65ms ±32%     ~     (p=0.095 n=5+5)
MultiExpG1/16384_points-96      11.9ms ±57%    8.8ms ±18%     ~     (p=0.095 n=5+5)
MultiExpG1/32768_points-96      18.0ms ± 7%   19.8ms ±22%     ~     (p=0.413 n=4+5)
MultiExpG1/65536_points-96      38.8ms ±21%   35.7ms ±54%     ~     (p=0.310 n=5+5)
MultiExpG1/131072_points-96     79.9ms ±20%   72.0ms ±18%     ~     (p=0.421 n=5+5)
MultiExpG1/262144_points-96      147ms ±21%    110ms ± 4%  -24.79%  (p=0.016 n=5+4)
MultiExpG1/524288_points-96      272ms ± 5%    202ms ± 1%  -25.47%  (p=0.008 n=5+5)
MultiExpG1/1048576_points-96     503ms ± 3%    362ms ± 2%  -28.12%  (p=0.008 n=5+5)
MultiExpG1/2097152_points-96     964ms ± 2%    704ms ± 5%  -27.04%  (p=0.008 n=5+5)
MultiExpG1/4194304_points-96     3.01s ± 7%    1.38s ± 3%  -54.11%  (p=0.008 n=5+5)
MultiExpG1/8388608_points-96     5.67s ± 3%    2.78s ± 4%  -51.03%  (p=0.008 n=5+5)
MultiExpG1/16777216_points-96    10.9s ± 2%     5.3s ± 4%  -51.34%  (p=0.008 n=5+5)
MultiExpG1/33554432_points-96    21.2s ± 4%    10.7s ±10%  -49.32%  (p=0.008 n=5+5)
MultiExpG1Reference-96           481ms ± 2%    355ms ± 1%  -26.22%  (p=0.008 n=5+5)
ManyMultiExpG1Reference-96       576ms ± 5%    373ms ± 2%  -35.13%  (p=0.008 n=5+5)
MultiExpG2/32_points-96          862µs ±46%    733µs ±24%     ~     (p=0.310 n=5+5)
MultiExpG2/64_points-96         1.11ms ±22%   0.82ms ±15%  -25.73%  (p=0.016 n=5+5)
MultiExpG2/128_points-96       1.71ms ±124%   1.11ms ±18%     ~     (p=0.548 n=5+5)
MultiExpG2/256_points-96        1.63ms ±75%  3.59ms ±128%     ~     (p=0.056 n=5+5)
MultiExpG2/512_points-96        1.98ms ±14%   7.87ms ±77%     ~     (p=0.151 n=5+5)
MultiExpG2/1024_points-96       2.76ms ±36%   2.43ms ±11%     ~     (p=0.310 n=5+5)
MultiExpG2/2048_points-96       3.96ms ±13%   4.57ms ±32%     ~     (p=0.222 n=5+5)
MultiExpG2/4096_points-96       6.92ms ±29%   8.51ms ±28%     ~     (p=0.222 n=5+5)
MultiExpG2/8192_points-96       17.2ms ±39%   10.2ms ± 3%  -40.49%  (p=0.008 n=5+5)
MultiExpG2/16384_points-96      30.0ms ±68%   25.7ms ±33%     ~     (p=0.841 n=5+5)
MultiExpG2/32768_points-96      46.7ms ±14%   56.6ms ±53%     ~     (p=0.841 n=5+5)
MultiExpG2/65536_points-96       113ms ±18%     81ms ±28%  -28.31%  (p=0.016 n=5+5)
MultiExpG2/131072_points-96      208ms ±48%    143ms ±15%  -31.05%  (p=0.008 n=5+5)
MultiExpG2/262144_points-96      347ms ± 2%    268ms ± 2%  -22.96%  (p=0.029 n=4+4)
MultiExpG2/524288_points-96      710ms ± 3%    516ms ± 4%  -27.35%  (p=0.008 n=5+5)
MultiExpG2/1048576_points-96     1.35s ± 5%    0.90s ± 3%  -33.21%  (p=0.008 n=5+5)
MultiExpG2/2097152_points-96     2.61s ± 8%    1.74s ± 5%  -33.52%  (p=0.008 n=5+5)
MultiExpG2/4194304_points-96     7.16s ± 2%    3.38s ± 9%  -52.79%  (p=0.008 n=5+5)
MultiExpG2/8388608_points-96     13.2s ± 3%     6.7s ± 3%  -48.99%  (p=0.008 n=5+5)
MultiExpG2/16777216_points-96    25.5s ± 3%    13.2s ± 6%  -48.26%  (p=0.008 n=5+5)
MultiExpG2/33554432_points-96    51.3s ± 4%    26.4s ±10%  -48.54%  (p=0.008 n=5+5)
MultiExpG2Reference-96           1.35s ± 7%    0.89s ± 2%  -33.73%  (p=0.008 n=5+5)
ManyMultiExpG2Reference-96       1.56s ± 5%    1.10s ± 3%  -29.43%  (p=0.016 n=5+4)

bn254

2022/11/18 01:50:30 comparing ../../ecc/bn254 MultiExp
name                           old time/op  new time/op  delta
MultiExpG1/32_points-96         241µs ± 5%   244µs ± 6%     ~     (p=0.222 n=5+5)
MultiExpG1/64_points-96         357µs ±24%   334µs ± 8%     ~     (p=0.690 n=5+5)
MultiExpG1/128_points-96        483µs ±46%   434µs ± 6%     ~     (p=0.841 n=5+5)
MultiExpG1/256_points-96        477µs ±17%   562µs ±14%     ~     (p=0.056 n=5+5)
MultiExpG1/512_points-96        673µs ±14%   635µs ±17%     ~     (p=0.690 n=5+5)
MultiExpG1/1024_points-96       826µs ±11%   856µs ±13%     ~     (p=0.548 n=5+5)
MultiExpG1/2048_points-96      1.13ms ±12%  1.21ms ± 5%     ~     (p=0.095 n=5+5)
MultiExpG1/4096_points-96      1.88ms ±39%  2.15ms ±29%     ~     (p=0.310 n=5+5)
MultiExpG1/8192_points-96      2.89ms ±16%  3.26ms ±33%     ~     (p=0.151 n=5+5)
MultiExpG1/16384_points-96     6.24ms ±33%  5.24ms ±12%     ~     (p=0.310 n=5+5)
MultiExpG1/32768_points-96     14.5ms ±25%  11.0ms ±37%     ~     (p=0.056 n=5+5)
MultiExpG1/65536_points-96     21.7ms ±18%  25.6ms ±65%     ~     (p=0.841 n=5+5)
MultiExpG1/131072_points-96    53.9ms ±25%  45.4ms ±61%     ~     (p=0.421 n=5+5)
MultiExpG1/262144_points-96    80.5ms ± 7%  65.8ms ± 5%  -18.26%  (p=0.008 n=5+5)
MultiExpG1/524288_points-96     166ms ± 2%   137ms ± 6%  -17.75%  (p=0.008 n=5+5)
MultiExpG1/1048576_points-96    317ms ± 8%   234ms ± 3%  -26.15%  (p=0.008 n=5+5)
MultiExpG1/2097152_points-96    575ms ± 3%   453ms ± 3%  -21.20%  (p=0.008 n=5+5)
MultiExpG1/4194304_points-96    2.33s ± 3%   0.86s ± 2%  -63.12%  (p=0.008 n=5+5)
MultiExpG1/8388608_points-96    4.39s ± 5%   1.68s ± 4%  -61.64%  (p=0.008 n=5+5)
MultiExpG1/16777216_points-96   8.77s ± 4%   3.41s ± 7%  -61.07%  (p=0.008 n=5+5)
MultiExpG1/33554432_points-96   17.3s ± 3%    6.7s ± 3%  -61.25%  (p=0.008 n=5+5)
MultiExpG1Reference-96          291ms ± 1%   228ms ± 1%  -21.57%  (p=0.008 n=5+5)
ManyMultiExpG1Reference-96      349ms ± 4%   239ms ± 1%  -31.44%  (p=0.008 n=5+5)
MultiExpG2/32_points-96         435µs ± 7%   434µs ± 8%     ~     (p=0.841 n=5+5)
MultiExpG2/64_points-96         555µs ±12%   507µs ± 4%     ~     (p=0.063 n=5+4)
MultiExpG2/128_points-96        701µs ±12%   656µs ± 7%     ~     (p=0.421 n=5+5)
MultiExpG2/256_points-96       1.14ms ±17%  1.12ms ±13%     ~     (p=1.000 n=5+5)
MultiExpG2/512_points-96       1.79ms ±51%  1.20ms ±15%     ~     (p=0.310 n=5+5)
MultiExpG2/1024_points-96      1.37ms ± 2%  1.54ms ±17%     ~     (p=0.190 n=4+5)
MultiExpG2/2048_points-96      2.72ms ±39%  2.84ms ±25%     ~     (p=0.690 n=5+5)
MultiExpG2/4096_points-96      5.28ms ±24%  4.56ms ±25%     ~     (p=0.421 n=5+5)
MultiExpG2/8192_points-96      7.44ms ±51%  5.70ms ± 7%     ~     (p=0.063 n=5+4)
MultiExpG2/16384_points-96     14.1ms ± 7%  12.2ms ±26%     ~     (p=0.421 n=5+5)
MultiExpG2/32768_points-96     28.8ms ± 6%  26.7ms ±31%     ~     (p=0.905 n=4+5)
MultiExpG2/65536_points-96     45.8ms ± 6%  45.7ms ±37%     ~     (p=0.556 n=4+5)
MultiExpG2/131072_points-96    95.9ms ±10%  71.2ms ± 6%  -25.72%  (p=0.008 n=5+5)
MultiExpG2/262144_points-96     252ms ±66%   154ms ±12%  -38.68%  (p=0.008 n=5+5)
MultiExpG2/524288_points-96     368ms ± 4%   267ms ± 2%  -27.47%  (p=0.008 n=5+5)
MultiExpG2/1048576_points-96    678ms ± 3%   497ms ± 8%  -26.69%  (p=0.008 n=5+5)
MultiExpG2/2097152_points-96    1.26s ± 2%   0.95s ±16%  -24.14%  (p=0.008 n=5+5)
MultiExpG2/4194304_points-96    3.81s ± 3%   1.80s ± 2%  -52.66%  (p=0.008 n=5+5)
MultiExpG2/8388608_points-96    7.25s ± 6%   3.56s ± 5%  -50.92%  (p=0.008 n=5+5)
MultiExpG2/16777216_points-96   13.9s ± 2%    6.9s ± 3%  -50.42%  (p=0.008 n=5+5)
MultiExpG2/33554432_points-96   27.2s ± 3%   14.2s ± 5%  -47.75%  (p=0.008 n=5+5)
MultiExpG2Reference-96          654ms ± 3%   498ms ± 7%  -23.82%  (p=0.008 n=5+5)
ManyMultiExpG2Reference-96      772ms ± 1%   528ms ± 7%  -31.56%  (p=0.008 n=5+5)

bw6-761

2022/11/18 01:50:30 comparing ../../ecc/bw6-761 MultiExp
name                           old time/op   new time/op    delta
MultiExpG1/32_points-96         1.28ms ±56%    1.17ms ±34%      ~     (p=0.690 n=5+5)
MultiExpG1/64_points-96         1.50ms ±28%   2.17ms ±104%      ~     (p=0.548 n=5+5)
MultiExpG1/128_points-96        1.37ms ±14%   2.54ms ±110%      ~     (p=0.063 n=4+5)
MultiExpG1/256_points-96        1.73ms ± 1%   2.90ms ±130%      ~     (p=1.000 n=4+5)
MultiExpG1/512_points-96        2.08ms ± 1%    2.78ms ±21%   +33.85%  (p=0.016 n=4+5)
MultiExpG1/1024_points-96       2.96ms ± 2%    6.32ms ±63%  +113.99%  (p=0.016 n=4+5)
MultiExpG1/2048_points-96       4.13ms ± 2%  11.02ms ±125%  +166.75%  (p=0.008 n=5+5)
MultiExpG1/4096_points-96       7.34ms ± 7%    7.38ms ±12%      ~     (p=0.841 n=5+5)
MultiExpG1/8192_points-96       13.3ms ±18%    14.7ms ± 7%      ~     (p=0.095 n=5+5)
MultiExpG1/16384_points-96      27.4ms ±13%    33.2ms ±43%      ~     (p=0.310 n=5+5)
MultiExpG1/32768_points-96      52.3ms ±24%    47.5ms ± 8%      ~     (p=0.413 n=5+4)
MultiExpG1/65536_points-96       203ms ±29%      93ms ±11%   -54.08%  (p=0.008 n=5+5)
MultiExpG1/131072_points-96      329ms ±18%     233ms ± 9%   -29.05%  (p=0.008 n=5+5)
MultiExpG1/262144_points-96      515ms ± 3%     446ms ±67%      ~     (p=0.151 n=5+5)
MultiExpG1/524288_points-96      933ms ± 2%     593ms ± 7%   -36.44%  (p=0.008 n=5+5)
MultiExpG1/1048576_points-96     1.83s ± 7%     1.10s ± 5%   -40.25%  (p=0.008 n=5+5)
MultiExpG1/2097152_points-96     3.43s ± 5%     2.20s ± 1%   -35.93%  (p=0.008 n=5+5)
MultiExpG1/4194304_points-96     7.09s ± 3%     4.48s ± 6%   -36.88%  (p=0.008 n=5+5)
MultiExpG1/8388608_points-96     14.0s ± 5%      8.9s ± 5%   -36.39%  (p=0.008 n=5+5)
MultiExpG1/16777216_points-96    30.2s ±11%     18.2s ± 3%   -39.71%  (p=0.008 n=5+5)
MultiExpG1/33554432_points-96    57.5s ± 4%     37.2s ± 7%   -35.25%  (p=0.008 n=5+5)
MultiExpG1Reference-96           1.72s ± 5%     1.13s ±14%   -34.25%  (p=0.008 n=5+5)
ManyMultiExpG1Reference-96       1.93s ± 2%     1.37s ± 5%   -28.87%  (p=0.016 n=5+4)
MultiExpG2/32_points-96         1.21ms ±31%   1.54ms ±104%      ~     (p=1.000 n=5+5)
MultiExpG2/64_points-96         1.92ms ±43%    1.77ms ±72%      ~     (p=0.841 n=5+5)
MultiExpG2/128_points-96       2.48ms ±134%   3.18ms ±168%      ~     (p=0.841 n=5+5)
MultiExpG2/256_points-96        2.81ms ±77%    2.96ms ±68%      ~     (p=0.310 n=5+5)
MultiExpG2/512_points-96        2.61ms ±30%    2.50ms ±23%      ~     (p=0.548 n=5+5)
MultiExpG2/1024_points-96       3.29ms ±10%    4.50ms ±46%   +36.81%  (p=0.032 n=5+5)
MultiExpG2/2048_points-96       5.53ms ±35%    6.04ms ±70%      ~     (p=1.000 n=5+5)
MultiExpG2/4096_points-96       8.77ms ±36%    8.44ms ±44%      ~     (p=0.841 n=5+5)
MultiExpG2/8192_points-96       15.3ms ±30%    16.2ms ±38%      ~     (p=0.841 n=5+5)
MultiExpG2/16384_points-96      28.1ms ±26%    26.5ms ±16%      ~     (p=0.690 n=5+5)
MultiExpG2/32768_points-96      66.3ms ±22%    52.0ms ±25%   -21.58%  (p=0.032 n=5+5)
MultiExpG2/65536_points-96       190ms ± 8%      89ms ± 4%   -53.04%  (p=0.008 n=5+5)
MultiExpG2/131072_points-96      317ms ±26%     261ms ±12%      ~     (p=0.151 n=5+5)
MultiExpG2/262144_points-96      522ms ±11%     402ms ±31%      ~     (p=0.056 n=5+5)
MultiExpG2/524288_points-96      912ms ± 3%     622ms ±17%   -31.80%  (p=0.008 n=5+5)
MultiExpG2/1048576_points-96     1.77s ± 9%     1.08s ± 3%   -38.85%  (p=0.008 n=5+5)
MultiExpG2/2097152_points-96     3.32s ± 7%     2.27s ±24%   -31.72%  (p=0.008 n=5+5)
MultiExpG2/4194304_points-96     6.77s ± 9%     4.39s ± 6%   -35.16%  (p=0.008 n=5+5)
MultiExpG2/8388608_points-96     13.3s ± 5%      8.8s ± 2%   -33.73%  (p=0.008 n=5+5)
MultiExpG2/16777216_points-96    26.9s ± 5%     17.8s ± 8%   -33.70%  (p=0.008 n=5+5)
MultiExpG2/33554432_points-96    56.2s ± 7%     38.1s ±11%   -32.23%  (p=0.008 n=5+5)
MultiExpG2Reference-96           1.69s ± 8%     1.10s ± 6%   -34.88%  (p=0.008 n=5+5)
ManyMultiExpG2Reference-96       1.90s ± 1%     1.37s ± 7%   -27.68%  (p=0.008 n=5+5)

with split logic (more cores == we split the msm)

TL;DR: the gain holds for most sizes (10% to 50%) and shrinks for the largest MSMs, probably because the window size is now capped at c=16. Some small G2 sizes regress significantly; the batchSize and the choice of c need tuning for those.

bls12-377
2022/11/17 23:08:58 comparing ../../ecc/bls12-377 MultiExp
name                           old time/op  new time/op  delta
MultiExpG1/32_points-96         459µs ± 5%   363µs ± 9%  -21.07%  (p=0.008 n=5+5)
MultiExpG1/64_points-96         609µs ±12%   431µs ± 6%  -29.19%  (p=0.008 n=5+5)
MultiExpG1/128_points-96        778µs ±15%   726µs ±34%     ~     (p=0.421 n=5+5)
MultiExpG1/256_points-96       1.02ms ±32%  0.90ms ±21%     ~     (p=0.690 n=5+5)
MultiExpG1/512_points-96        985µs ±18%  1031µs ±32%     ~     (p=0.690 n=5+5)
MultiExpG1/1024_points-96      1.16ms ±16%  1.18ms ±27%     ~     (p=1.000 n=5+5)
MultiExpG1/2048_points-96      1.52ms ± 8%  1.66ms ± 6%     ~     (p=0.056 n=5+5)
MultiExpG1/4096_points-96      2.30ms ±23%  2.45ms ± 8%     ~     (p=0.421 n=5+5)
MultiExpG1/8192_points-96      2.88ms ± 4%  3.55ms ±21%  +23.34%  (p=0.008 n=5+5)
MultiExpG1/16384_points-96     4.88ms ±10%  5.54ms ±25%     ~     (p=0.222 n=5+5)
MultiExpG1/32768_points-96     8.35ms ± 0%  8.71ms ±13%     ~     (p=0.690 n=5+5)
MultiExpG1/65536_points-96     10.6ms ±12%  13.1ms ±31%     ~     (p=0.151 n=5+5)
MultiExpG1/131072_points-96    18.3ms ± 2%  15.9ms ± 2%  -13.14%  (p=0.008 n=5+5)
MultiExpG1/262144_points-96    35.4ms ± 3%  35.3ms ±39%     ~     (p=0.151 n=5+5)
MultiExpG1/524288_points-96    75.5ms ±16%  56.1ms ± 3%  -25.73%  (p=0.008 n=5+5)
MultiExpG1/1048576_points-96    149ms ± 2%   105ms ± 1%  -29.09%  (p=0.016 n=4+5)
MultiExpG1/2097152_points-96    316ms ± 1%   227ms ± 0%  -28.31%  (p=0.029 n=4+4)
MultiExpG1/4194304_points-96    601ms ± 8%   419ms ± 2%  -30.26%  (p=0.016 n=4+5)
MultiExpG1/8388608_points-96    1.08s ± 3%   0.81s ±10%  -25.23%  (p=0.008 n=5+5)
MultiExpG1/16777216_points-96   1.88s ± 7%   1.53s ± 3%  -18.62%  (p=0.008 n=5+5)
MultiExpG1/33554432_points-96   3.83s ±17%   3.06s ± 5%  -20.08%  (p=0.008 n=5+5)
MultiExpG1/67108864_points-96   6.65s ± 1%   5.92s ± 5%  -10.90%  (p=0.016 n=4+5)
MultiExpG1Reference-96          148ms ± 8%   105ms ± 1%  -29.05%  (p=0.008 n=5+5)
ManyMultiExpG1Reference-96      347ms ± 4%   266ms ± 2%  -23.21%  (p=0.016 n=5+4)
MultiExpG2/32_points-96        1.15ms ±76%  0.87ms ±33%     ~     (p=0.548 n=5+5)
MultiExpG2/64_points-96        1.04ms ±12%  1.11ms ±18%     ~     (p=0.556 n=5+4)
MultiExpG2/128_points-96       1.24ms ±16%  1.14ms ±13%     ~     (p=0.421 n=5+5)
MultiExpG2/256_points-96       2.78ms ±68%  1.85ms ±14%     ~     (p=0.095 n=5+5)
MultiExpG2/512_points-96       1.93ms ±46%  2.00ms ±19%     ~     (p=0.310 n=5+5)
MultiExpG2/1024_points-96      2.08ms ±11%  2.91ms ±48%     ~     (p=0.056 n=5+5)
MultiExpG2/2048_points-96      2.94ms ± 7%  3.03ms ± 3%     ~     (p=0.548 n=5+5)
MultiExpG2/4096_points-96      4.31ms ±18%  6.13ms ±66%  +42.30%  (p=0.032 n=5+5)
MultiExpG2/8192_points-96      7.13ms ± 4%  7.97ms ±14%  +11.82%  (p=0.016 n=5+5)
MultiExpG2/16384_points-96     12.9ms ± 6%  13.4ms ±24%     ~     (p=0.690 n=5+5)
MultiExpG2/32768_points-96     23.6ms ± 1%  22.0ms ±19%     ~     (p=0.151 n=5+5)
MultiExpG2/65536_points-96     30.2ms ±25%  20.3ms ± 2%  -32.88%  (p=0.016 n=5+4)
MultiExpG2/131072_points-96    57.2ms ± 4%  39.8ms ± 4%  -30.47%  (p=0.008 n=5+5)
MultiExpG2/262144_points-96     113ms ± 1%    88ms ± 4%  -21.56%  (p=0.016 n=4+5)
MultiExpG2/524288_points-96     225ms ± 2%   154ms ± 9%  -31.86%  (p=0.008 n=5+5)
MultiExpG2/1048576_points-96    461ms ± 5%   287ms ± 3%  -37.73%  (p=0.008 n=5+5)
MultiExpG2/2097152_points-96    907ms ±31%   639ms ± 6%  -29.57%  (p=0.008 n=5+5)
MultiExpG2/4194304_points-96    1.43s ± 7%   1.20s ± 5%  -16.14%  (p=0.029 n=4+4)
MultiExpG2/8388608_points-96    2.69s ± 3%   2.28s ± 5%  -15.14%  (p=0.008 n=5+5)
MultiExpG2/16777216_points-96   5.06s ± 2%   4.41s ± 2%  -12.84%  (p=0.008 n=5+5)
MultiExpG2/33554432_points-96   10.2s ±16%    8.6s ± 5%  -15.13%  (p=0.008 n=5+5)
MultiExpG2/67108864_points-96   17.7s ± 3%   17.6s ± 0%     ~     (p=0.905 n=5+4)
MultiExpG2Reference-96          468ms ± 6%   282ms ± 1%  -39.84%  (p=0.016 n=5+4)
ManyMultiExpG2Reference-96      1.08s ± 3%   0.71s ± 1%  -34.21%  (p=0.008 n=5+5)

bls12-378
2022/11/17 23:08:58 comparing ../../ecc/bls12-378 MultiExp
name                           old time/op  new time/op   delta
MultiExpG1/32_points-96         463µs ± 8%    358µs ±12%  -22.51%  (p=0.008 n=5+5)
MultiExpG1/64_points-96         645µs ±23%    453µs ±16%  -29.75%  (p=0.032 n=5+5)
MultiExpG1/128_points-96        646µs ±16%    603µs ±17%     ~     (p=0.548 n=5+5)
MultiExpG1/256_points-96        792µs ±13%    629µs ± 3%  -20.51%  (p=0.016 n=5+4)
MultiExpG1/512_points-96       1.05ms ±12%   0.82ms ±15%  -21.62%  (p=0.008 n=5+5)
MultiExpG1/1024_points-96      1.22ms ±23%   1.15ms ±20%     ~     (p=0.548 n=5+5)
MultiExpG1/2048_points-96      1.78ms ±26%   1.70ms ±16%     ~     (p=0.841 n=5+5)
MultiExpG1/4096_points-96      2.24ms ±20%   3.21ms ±26%  +43.52%  (p=0.016 n=5+5)
MultiExpG1/8192_points-96      2.97ms ± 5%   3.72ms ±42%  +25.31%  (p=0.016 n=5+5)
MultiExpG1/16384_points-96     5.14ms ± 7%   4.87ms ±12%     ~     (p=0.310 n=5+5)
MultiExpG1/32768_points-96     8.69ms ± 3%   9.61ms ±28%     ~     (p=0.421 n=5+5)
MultiExpG1/65536_points-96     16.0ms ± 2%   18.3ms ±70%     ~     (p=0.730 n=4+5)
MultiExpG1/131072_points-96    21.5ms ± 9%   17.3ms ±14%  -19.41%  (p=0.016 n=5+5)
MultiExpG1/262144_points-96    36.5ms ± 5%   32.0ms ± 3%  -12.24%  (p=0.016 n=5+4)
MultiExpG1/524288_points-96    71.3ms ± 1%   58.9ms ± 6%  -17.36%  (p=0.008 n=5+5)
MultiExpG1/1048576_points-96    150ms ± 2%    107ms ± 1%  -28.77%  (p=0.016 n=5+4)
MultiExpG1/2097152_points-96    321ms ± 3%    234ms ± 3%  -27.12%  (p=0.008 n=5+5)
MultiExpG1/4194304_points-96    589ms ±10%    420ms ± 4%  -28.58%  (p=0.008 n=5+5)
MultiExpG1/8388608_points-96    1.08s ± 4%    0.84s ±12%  -22.11%  (p=0.008 n=5+5)
MultiExpG1/16777216_points-96   1.97s ± 4%    1.56s ± 5%  -20.51%  (p=0.008 n=5+5)
MultiExpG1/33554432_points-96   3.62s ± 1%    3.03s ± 1%  -16.34%  (p=0.016 n=4+5)
MultiExpG1/67108864_points-96   6.66s ± 0%    6.24s ±11%     ~     (p=0.190 n=4+5)
MultiExpG1Reference-96          148ms ± 2%    108ms ± 2%  -26.76%  (p=0.008 n=5+5)
ManyMultiExpG1Reference-96      352ms ± 1%    268ms ±10%  -23.89%  (p=0.016 n=5+4)
MultiExpG2/32_points-96         933µs ±17%    820µs ± 9%     ~     (p=0.222 n=5+5)
MultiExpG2/64_points-96        1.33ms ±29%  1.46ms ±123%     ~     (p=0.310 n=5+5)
MultiExpG2/128_points-96       1.61ms ±12%   1.23ms ±27%  -23.43%  (p=0.032 n=4+5)
MultiExpG2/256_points-96       1.58ms ±21%   1.69ms ±33%     ~     (p=0.841 n=5+5)
MultiExpG2/512_points-96       1.80ms ±28%   1.85ms ±14%     ~     (p=0.730 n=4+5)
MultiExpG2/1024_points-96      2.01ms ± 0%   2.65ms ± 5%  +31.58%  (p=0.029 n=4+4)
MultiExpG2/2048_points-96      2.80ms ± 8%   3.89ms ±24%  +38.73%  (p=0.008 n=5+5)
MultiExpG2/4096_points-96      4.13ms ± 9%   4.93ms ±10%  +19.37%  (p=0.032 n=5+5)
MultiExpG2/8192_points-96      7.43ms ± 6%   8.51ms ±12%  +14.63%  (p=0.016 n=5+5)
MultiExpG2/16384_points-96     12.3ms ± 1%   12.9ms ±14%     ~     (p=0.222 n=5+5)
MultiExpG2/32768_points-96     23.7ms ± 1%   24.5ms ±27%     ~     (p=0.690 n=5+5)
MultiExpG2/65536_points-96     35.0ms ±21%   26.6ms ±14%  -23.82%  (p=0.032 n=5+4)
MultiExpG2/131072_points-96    59.0ms ± 3%   60.7ms ±67%     ~     (p=0.690 n=5+5)
MultiExpG2/262144_points-96     114ms ± 1%     88ms ± 1%  -23.20%  (p=0.029 n=4+4)
MultiExpG2/524288_points-96     232ms ± 4%    164ms ±11%  -29.13%  (p=0.008 n=5+5)
MultiExpG2/1048576_points-96    513ms ± 1%    299ms ± 4%  -41.79%  (p=0.016 n=4+5)
MultiExpG2/2097152_points-96    1.16s ± 6%    0.68s ± 4%  -41.68%  (p=0.008 n=5+5)
MultiExpG2/4194304_points-96    1.94s ± 3%    1.24s ± 4%  -36.02%  (p=0.008 n=5+5)
MultiExpG2/8388608_points-96    3.20s ± 5%    2.27s ± 3%  -29.03%  (p=0.016 n=5+4)
MultiExpG2/16777216_points-96   5.69s ± 1%    4.48s ± 2%  -21.34%  (p=0.008 n=5+5)
MultiExpG2/33554432_points-96   9.64s ± 1%    8.59s ± 5%  -10.92%  (p=0.016 n=4+5)
MultiExpG2/67108864_points-96   17.7s ± 0%    17.2s ± 2%   -2.85%  (p=0.016 n=4+5)
MultiExpG2Reference-96          466ms ± 5%   619ms ±187%     ~     (p=0.310 n=5+5)
ManyMultiExpG2Reference-96      1.08s ± 2%    0.72s ± 5%  -33.12%  (p=0.008 n=5+5)

bls12-381

2022/11/17 23:08:58 comparing ../../ecc/bls12-381 MultiExp
name                           old time/op   new time/op  delta
MultiExpG1/32_points-96          456µs ± 7%   384µs ±29%     ~     (p=0.151 n=5+5)
MultiExpG1/64_points-96          592µs ±15%   444µs ±26%  -24.99%  (p=0.032 n=5+5)
MultiExpG1/128_points-96         803µs ±42%   584µs ±17%  -27.35%  (p=0.032 n=5+5)
MultiExpG1/256_points-96         786µs ±18%   712µs ± 8%     ~     (p=0.222 n=5+5)
MultiExpG1/512_points-96         891µs ± 9%  1066µs ±34%     ~     (p=0.222 n=5+5)
MultiExpG1/1024_points-96       1.31ms ±19%  1.96ms ±45%     ~     (p=0.151 n=5+5)
MultiExpG1/2048_points-96       2.13ms ±23%  1.63ms ±24%  -23.41%  (p=0.016 n=5+5)
MultiExpG1/4096_points-96       2.11ms ±16%  2.15ms ±16%     ~     (p=0.310 n=5+5)
MultiExpG1/8192_points-96       3.01ms ± 5%  3.53ms ±23%  +17.22%  (p=0.032 n=5+5)
MultiExpG1/16384_points-96      4.83ms ± 7%  6.25ms ±30%     ~     (p=0.095 n=5+5)
MultiExpG1/32768_points-96      8.48ms ± 1%  9.34ms ±39%     ~     (p=0.841 n=5+5)
MultiExpG1/65536_points-96      11.9ms ±36%  13.8ms ±49%     ~     (p=0.421 n=5+5)
MultiExpG1/131072_points-96     22.0ms ±15%  26.3ms ±94%     ~     (p=0.690 n=5+5)
MultiExpG1/262144_points-96     35.5ms ± 1%  32.1ms ± 5%   -9.40%  (p=0.008 n=5+5)
MultiExpG1/524288_points-96     72.0ms ± 0%  65.2ms ±16%     ~     (p=0.190 n=4+5)
MultiExpG1/1048576_points-96     154ms ± 1%   116ms ± 1%  -24.80%  (p=0.008 n=5+5)
MultiExpG1/2097152_points-96     320ms ± 2%   235ms ± 5%  -26.61%  (p=0.008 n=5+5)
MultiExpG1/4194304_points-96     608ms ± 4%   427ms ± 3%  -29.70%  (p=0.008 n=5+5)
MultiExpG1/8388608_points-96     1.11s ±11%   0.83s ±11%  -24.94%  (p=0.008 n=5+5)
MultiExpG1/16777216_points-96    1.88s ± 3%   1.61s ± 4%  -14.22%  (p=0.008 n=5+5)
MultiExpG1/33554432_points-96    3.72s ± 1%   3.14s ± 7%  -15.63%  (p=0.008 n=5+5)
MultiExpG1/67108864_points-96    6.75s ± 0%   6.10s ± 6%   -9.59%  (p=0.008 n=5+5)
MultiExpG1Reference-96           146ms ± 1%   115ms ± 1%  -20.86%  (p=0.008 n=5+5)
ManyMultiExpG1Reference-96       351ms ± 1%   268ms ± 4%  -23.64%  (p=0.008 n=5+5)
MultiExpG2/32_points-96          813µs ±11%   755µs ±29%     ~     (p=0.310 n=5+5)
MultiExpG2/64_points-96        1.38ms ±109%  0.89ms ±24%     ~     (p=0.056 n=5+5)
MultiExpG2/128_points-96       1.86ms ±107%  1.74ms ±84%     ~     (p=0.841 n=5+5)
MultiExpG2/256_points-96       2.01ms ±130%  1.53ms ±32%     ~     (p=0.841 n=5+5)
MultiExpG2/512_points-96        1.79ms ±36%  1.57ms ± 7%     ~     (p=0.310 n=5+5)
MultiExpG2/1024_points-96       2.51ms ±72%  2.49ms ±26%     ~     (p=0.222 n=5+5)
MultiExpG2/2048_points-96       2.61ms ± 3%  3.52ms ±36%  +34.83%  (p=0.008 n=5+5)
MultiExpG2/4096_points-96       3.76ms ± 9%  4.67ms ±28%  +24.39%  (p=0.016 n=5+5)
MultiExpG2/8192_points-96       6.73ms ± 6%  7.29ms ±29%     ~     (p=0.548 n=5+5)
MultiExpG2/16384_points-96      11.0ms ± 0%  15.6ms ±23%  +42.05%  (p=0.016 n=4+5)
MultiExpG2/32768_points-96      21.2ms ± 1%  25.0ms ±23%     ~     (p=0.151 n=5+5)
MultiExpG2/65536_points-96      35.5ms ± 9%  30.7ms ±28%     ~     (p=0.151 n=5+5)
MultiExpG2/131072_points-96     51.6ms ± 2%  47.4ms ±30%     ~     (p=0.690 n=5+5)
MultiExpG2/262144_points-96      101ms ± 1%   121ms ±74%     ~     (p=0.690 n=5+5)
MultiExpG2/524288_points-96      204ms ± 1%   143ms ± 2%  -29.71%  (p=0.008 n=5+5)
MultiExpG2/1048576_points-96     425ms ± 3%   338ms ± 3%  -20.56%  (p=0.016 n=5+4)
MultiExpG2/2097152_points-96     1.25s ±40%   0.65s ± 3%  -48.05%  (p=0.008 n=5+5)
MultiExpG2/4194304_points-96     1.45s ± 4%   1.19s ± 3%  -18.09%  (p=0.016 n=4+5)
MultiExpG2/8388608_points-96     2.63s ± 5%   2.22s ± 4%  -15.25%  (p=0.008 n=5+5)
MultiExpG2/16777216_points-96    4.99s ± 2%   4.48s ±23%     ~     (p=0.151 n=5+5)
MultiExpG2/33554432_points-96    8.88s ± 1%   8.20s ± 3%   -7.64%  (p=0.016 n=4+5)
MultiExpG2/67108864_points-96    16.6s ± 3%   16.1s ± 5%     ~     (p=0.095 n=5+5)
MultiExpG2Reference-96           420ms ± 4%   418ms ±55%     ~     (p=0.151 n=5+5)
ManyMultiExpG2Reference-96       961ms ± 2%   690ms ± 3%  -28.18%  (p=0.008 n=5+5)


bn254

2022/11/17 23:08:59 comparing ../../ecc/bn254 MultiExp
name                           old time/op  new time/op  delta
MultiExpG1/32_points-96         299µs ±13%   248µs ± 3%  -16.81%  (p=0.008 n=5+5)
MultiExpG1/64_points-96         340µs ±14%   324µs ±13%     ~     (p=0.151 n=5+5)
MultiExpG1/128_points-96        512µs ±16%   443µs ±17%     ~     (p=0.222 n=5+5)
MultiExpG1/256_points-96        558µs ± 6%   521µs ±23%     ~     (p=0.151 n=5+5)
MultiExpG1/512_points-96        737µs ±12%   660µs ± 9%     ~     (p=0.095 n=5+5)
MultiExpG1/1024_points-96      1.00ms ±27%  0.85ms ± 9%     ~     (p=0.222 n=5+5)
MultiExpG1/2048_points-96      1.08ms ± 7%  1.24ms ±25%     ~     (p=0.222 n=5+5)
MultiExpG1/4096_points-96      1.59ms ±23%  1.61ms ±12%     ~     (p=0.421 n=5+5)
MultiExpG1/8192_points-96      2.51ms ±14%  2.10ms ± 6%  -16.48%  (p=0.008 n=5+5)
MultiExpG1/16384_points-96     3.43ms ±10%  3.95ms ±19%     ~     (p=0.056 n=5+5)
MultiExpG1/32768_points-96     5.66ms ± 3%  6.95ms ±66%     ~     (p=0.690 n=5+5)
MultiExpG1/65536_points-96     9.42ms ±19%  8.05ms ±12%     ~     (p=0.095 n=5+5)
MultiExpG1/131072_points-96    15.6ms ± 4%  15.2ms ± 8%     ~     (p=0.548 n=5+5)
MultiExpG1/262144_points-96    22.2ms ± 2%  21.0ms ±20%     ~     (p=0.151 n=5+5)
MultiExpG1/524288_points-96    44.5ms ± 0%  43.9ms ±19%     ~     (p=0.730 n=4+5)
MultiExpG1/1048576_points-96   93.0ms ± 1%  70.7ms ± 3%  -24.02%  (p=0.008 n=5+5)
MultiExpG1/2097152_points-96    201ms ± 2%   139ms ± 1%  -31.06%  (p=0.016 n=4+5)
MultiExpG1/4194304_points-96    407ms ±12%   253ms ± 2%  -37.64%  (p=0.008 n=5+5)
MultiExpG1/8388608_points-96    744ms ± 3%   510ms ± 8%  -31.53%  (p=0.008 n=5+5)
MultiExpG1/16777216_points-96   1.31s ± 6%   1.09s ±17%  -16.96%  (p=0.008 n=5+5)
MultiExpG1/33554432_points-96   2.81s ± 4%   1.91s ±16%  -31.98%  (p=0.008 n=5+5)
MultiExpG1/67108864_points-96   5.29s ± 5%   3.79s ± 9%  -28.45%  (p=0.008 n=5+5)
MultiExpG1Reference-96         88.8ms ± 1%  83.7ms ±29%     ~     (p=0.690 n=5+5)
ManyMultiExpG1Reference-96      214ms ± 2%   267ms ±44%     ~     (p=0.690 n=5+5)
MultiExpG2/32_points-96         566µs ±15%   417µs ± 3%  -26.39%  (p=0.008 n=5+5)
MultiExpG2/64_points-96         679µs ±16%   551µs ± 6%  -18.84%  (p=0.008 n=5+5)
MultiExpG2/128_points-96        899µs ±21%   761µs ±29%     ~     (p=0.222 n=5+5)
MultiExpG2/256_points-96        852µs ± 6%  1012µs ±24%     ~     (p=0.151 n=5+5)
MultiExpG2/512_points-96       1.16ms ±14%  0.99ms ±12%  -14.66%  (p=0.032 n=5+5)
MultiExpG2/1024_points-96      1.60ms ±24%  1.39ms ±21%     ~     (p=0.056 n=5+5)
MultiExpG2/2048_points-96      1.88ms ±16%  2.16ms ±33%     ~     (p=0.151 n=5+5)
MultiExpG2/4096_points-96      2.35ms ± 1%  3.03ms ±16%  +28.85%  (p=0.008 n=5+5)
MultiExpG2/8192_points-96      3.80ms ±13%  4.02ms ±15%     ~     (p=0.310 n=5+5)
MultiExpG2/16384_points-96     6.18ms ± 9%  6.43ms ± 9%     ~     (p=0.310 n=5+5)
MultiExpG2/32768_points-96     11.3ms ± 0%  11.4ms ± 9%     ~     (p=0.690 n=5+5)
MultiExpG2/65536_points-96     18.8ms ±14%  26.8ms ±82%     ~     (p=0.690 n=5+5)
MultiExpG2/131072_points-96    28.9ms ± 5%  31.5ms ±44%     ~     (p=0.690 n=5+5)
MultiExpG2/262144_points-96    49.7ms ± 1%  58.8ms ±78%     ~     (p=0.690 n=5+5)
MultiExpG2/524288_points-96     101ms ± 1%    79ms ± 8%  -21.14%  (p=0.008 n=5+5)
MultiExpG2/1048576_points-96    207ms ± 2%   156ms ± 2%  -24.63%  (p=0.008 n=5+5)
MultiExpG2/2097152_points-96    425ms ± 7%   348ms ± 5%  -18.21%  (p=0.008 n=5+5)
MultiExpG2/4194304_points-96    734ms ± 8%   614ms ± 5%  -16.40%  (p=0.008 n=5+5)
MultiExpG2/8388608_points-96    1.24s ± 7%   1.10s ± 7%  -11.56%  (p=0.008 n=5+5)
MultiExpG2/16777216_points-96   2.29s ± 4%   2.13s ± 7%     ~     (p=0.056 n=5+5)
MultiExpG2/33554432_points-96   4.84s ± 1%   4.04s ±11%  -16.50%  (p=0.008 n=5+5)
MultiExpG2/67108864_points-96   8.91s ± 0%   7.68s ± 2%  -13.84%  (p=0.008 n=5+5)
MultiExpG2Reference-96          209ms ± 2%   144ms ± 2%  -31.10%  (p=0.016 n=4+5)
ManyMultiExpG2Reference-96      482ms ± 1%   369ms ± 2%  -23.34%  (p=0.008 n=5+5)

bw6-633

2022/11/17 23:08:59 comparing ../../ecc/bw6-633 MultiExp
name                           old time/op   new time/op   delta
MultiExpG1/32_points-96          896µs ±20%   1113µs ±43%     ~     (p=0.556 n=5+4)
MultiExpG1/64_points-96         1.29ms ±23%  2.27ms ±153%     ~     (p=1.000 n=5+5)
MultiExpG1/128_points-96       2.54ms ±181%  2.54ms ±118%     ~     (p=0.841 n=5+5)
MultiExpG1/256_points-96        1.67ms ±69%   1.49ms ±19%     ~     (p=0.905 n=5+4)
MultiExpG1/512_points-96        1.99ms ±17%   2.49ms ±79%     ~     (p=0.548 n=5+5)
MultiExpG1/1024_points-96       2.48ms ±16%   2.30ms ± 7%     ~     (p=0.413 n=5+4)
MultiExpG1/2048_points-96       2.82ms ± 3%   3.39ms ±15%  +20.03%  (p=0.008 n=5+5)
MultiExpG1/4096_points-96       4.26ms ±10%   4.07ms ± 2%   -4.64%  (p=0.032 n=5+5)
MultiExpG1/8192_points-96       6.91ms ± 6%   6.09ms ± 1%  -11.81%  (p=0.008 n=5+5)
MultiExpG1/16384_points-96      11.5ms ±11%   15.2ms ±75%     ~     (p=0.151 n=5+5)
MultiExpG1/32768_points-96      18.5ms ± 5%   19.9ms ±10%     ~     (p=0.056 n=5+5)
MultiExpG1/65536_points-96      34.1ms ± 0%   43.5ms ±12%  +27.49%  (p=0.008 n=5+5)
MultiExpG1/131072_points-96     66.6ms ± 3%   86.2ms ±42%  +29.48%  (p=0.008 n=5+5)
MultiExpG1/262144_points-96      126ms ± 1%    117ms ± 1%   -6.88%  (p=0.008 n=5+5)
MultiExpG1/524288_points-96      307ms ± 2%    292ms ± 6%     ~     (p=0.056 n=5+5)
MultiExpG1/1048576_points-96     487ms ± 4%    306ms ± 1%  -37.14%  (p=0.008 n=5+5)
MultiExpG1/2097152_points-96     791ms ± 5%    535ms ± 2%  -32.43%  (p=0.008 n=5+5)
MultiExpG1/4194304_points-96     1.34s ± 3%    0.99s ± 2%  -26.16%  (p=0.008 n=5+5)
MultiExpG1/8388608_points-96     2.36s ± 6%    1.88s ± 2%  -20.46%  (p=0.008 n=5+5)
MultiExpG1/16777216_points-96    4.30s ± 5%    3.75s ± 7%  -12.82%  (p=0.008 n=5+5)
MultiExpG1/33554432_points-96    8.26s ± 1%    7.06s ± 2%  -14.58%  (p=0.008 n=5+5)
MultiExpG1/67108864_points-96    16.3s ± 1%    14.1s ± 3%  -13.40%  (p=0.008 n=5+5)
MultiExpG1Reference-96           479ms ± 1%    302ms ± 2%  -36.95%  (p=0.008 n=5+5)
ManyMultiExpG1Reference-96       1.12s ± 3%    0.80s ± 1%  -28.59%  (p=0.008 n=5+5)
MultiExpG2/32_points-96        1.28ms ±109%   0.72ms ± 7%  -43.74%  (p=0.008 n=5+5)
MultiExpG2/64_points-96         1.00ms ±17%  2.76ms ±103%     ~     (p=0.730 n=4+5)
MultiExpG2/128_points-96       2.17ms ±116%   1.26ms ±51%     ~     (p=0.095 n=5+5)
MultiExpG2/256_points-96       3.46ms ±124%   5.46ms ±73%     ~     (p=0.548 n=5+5)
MultiExpG2/512_points-96        1.70ms ±16%   1.77ms ± 7%     ~     (p=0.413 n=5+4)
MultiExpG2/1024_points-96       2.02ms ± 1%   2.37ms ±11%  +17.74%  (p=0.016 n=4+5)
MultiExpG2/2048_points-96       2.66ms ± 0%   3.25ms ± 5%  +22.32%  (p=0.029 n=4+4)
MultiExpG2/4096_points-96       3.88ms ± 5%   4.16ms ± 7%   +7.30%  (p=0.032 n=5+5)
MultiExpG2/8192_points-96       6.14ms ± 6%   6.08ms ± 6%     ~     (p=0.548 n=5+5)
MultiExpG2/16384_points-96      10.7ms ± 6%   15.1ms ± 9%  +40.36%  (p=0.008 n=5+5)
MultiExpG2/32768_points-96      18.4ms ± 2%   23.4ms ± 6%  +27.63%  (p=0.008 n=5+5)
MultiExpG2/65536_points-96      33.9ms ± 0%   53.7ms ±13%  +58.50%  (p=0.008 n=5+5)
MultiExpG2/131072_points-96     66.6ms ± 2%   71.1ms ± 7%     ~     (p=0.056 n=5+5)
MultiExpG2/262144_points-96      126ms ± 3%    136ms ±19%     ~     (p=0.095 n=5+5)
MultiExpG2/524288_points-96      550ms ±60%    317ms ± 6%     ~     (p=0.095 n=5+5)
MultiExpG2/1048576_points-96     506ms ± 3%    356ms ± 1%  -29.54%  (p=0.008 n=5+5)
MultiExpG2/2097152_points-96     828ms ± 4%    578ms ± 1%  -30.20%  (p=0.016 n=4+5)
MultiExpG2/4194304_points-96     1.32s ± 1%    1.05s ± 6%  -20.13%  (p=0.008 n=5+5)
MultiExpG2/8388608_points-96     2.33s ± 3%    1.93s ± 9%  -17.08%  (p=0.008 n=5+5)
MultiExpG2/16777216_points-96    4.58s ±31%    3.60s ± 2%  -21.47%  (p=0.016 n=5+4)
MultiExpG2/33554432_points-96    8.24s ± 2%    6.97s ± 4%  -15.35%  (p=0.008 n=5+5)
MultiExpG2/67108864_points-96    16.5s ± 2%    14.0s ± 3%  -15.03%  (p=0.008 n=5+5)
MultiExpG2Reference-96           485ms ± 4%    303ms ± 1%  -37.53%  (p=0.008 n=5+5)
ManyMultiExpG2Reference-96       1.11s ± 1%    0.80s ± 2%  -28.34%  (p=0.008 n=5+5)

bw6-761

2022/11/17 23:08:59 comparing ../../ecc/bw6-761 MultiExp
name                           old time/op   new time/op   delta
MultiExpG1/32_points-96         1.08ms ± 7%   1.29ms ±26%     ~     (p=0.310 n=5+5)
MultiExpG1/64_points-96         2.00ms ±56%   1.37ms ±47%     ~     (p=0.095 n=5+5)
MultiExpG1/128_points-96       3.32ms ±174%   3.65ms ±90%     ~     (p=0.841 n=5+5)
MultiExpG1/256_points-96       3.41ms ±145%   1.84ms ±11%     ~     (p=0.286 n=5+4)
MultiExpG1/512_points-96        2.37ms ± 9%   2.55ms ±19%     ~     (p=0.222 n=5+5)
MultiExpG1/1024_points-96       3.23ms ±12%   2.83ms ± 5%  -12.54%  (p=0.032 n=5+4)
MultiExpG1/2048_points-96       4.59ms ± 9%   4.41ms ±28%     ~     (p=0.310 n=5+5)
MultiExpG1/4096_points-96       7.53ms ± 9%   5.56ms ±19%  -26.18%  (p=0.008 n=5+5)
MultiExpG1/8192_points-96       11.5ms ±12%    7.7ms ± 4%  -32.67%  (p=0.016 n=5+4)
MultiExpG1/16384_points-96      21.3ms ±22%   13.1ms ± 2%  -38.57%  (p=0.008 n=5+5)
MultiExpG1/32768_points-96      37.2ms ± 3%   29.2ms ±18%  -21.51%  (p=0.008 n=5+5)
MultiExpG1/65536_points-96      65.3ms ± 5%   47.1ms ±13%  -27.85%  (p=0.008 n=5+5)
MultiExpG1/131072_points-96      109ms ± 2%     94ms ±19%     ~     (p=0.151 n=5+5)
MultiExpG1/262144_points-96      287ms ±93%    293ms ±18%     ~     (p=0.222 n=5+5)
MultiExpG1/524288_points-96      387ms ±12%    331ms ±24%     ~     (p=0.095 n=5+5)
MultiExpG1/1048576_points-96     641ms ±16%    513ms ± 5%  -19.99%  (p=0.008 n=5+5)
MultiExpG1/2097152_points-96     1.13s ±13%    0.81s ±16%  -28.44%  (p=0.008 n=5+5)
MultiExpG1/4194304_points-96     2.15s ± 5%    1.32s ± 9%  -38.73%  (p=0.008 n=5+5)
MultiExpG1/8388608_points-96     4.06s ± 3%    2.68s ±12%  -34.13%  (p=0.008 n=5+5)
MultiExpG1/16777216_points-96    7.98s ± 8%    4.79s ± 2%  -40.00%  (p=0.008 n=5+5)
MultiExpG1/33554432_points-96    15.3s ± 1%     9.4s ± 2%  -38.72%  (p=0.008 n=5+5)
MultiExpG1/67108864_points-96    30.8s ± 6%    18.3s ± 2%  -40.51%  (p=0.008 n=5+5)
MultiExpG1Reference-96           729ms ±24%  2010ms ±200%     ~     (p=0.690 n=5+5)
ManyMultiExpG1Reference-96       1.80s ± 2%    1.26s ± 6%  -30.07%  (p=0.008 n=5+5)
MultiExpG2/32_points-96        1.45ms ±109%   1.31ms ±77%     ~     (p=1.000 n=5+5)
MultiExpG2/64_points-96         6.02ms ±57%  2.29ms ±106%     ~     (p=0.056 n=5+5)
MultiExpG2/128_points-96       4.97ms ±103%   3.37ms ±81%     ~     (p=0.690 n=5+5)
MultiExpG2/256_points-96        2.11ms ±16%   2.00ms ± 4%     ~     (p=0.486 n=4+4)
MultiExpG2/512_points-96       5.80ms ±100%  4.14ms ±133%     ~     (p=0.690 n=5+5)
MultiExpG2/1024_points-96       3.00ms ±16%  8.83ms ±122%     ~     (p=0.421 n=5+5)
MultiExpG2/2048_points-96       4.29ms ±13%   4.03ms ± 7%     ~     (p=0.421 n=5+5)
MultiExpG2/4096_points-96       6.16ms ±15%   6.87ms ±70%     ~     (p=0.690 n=5+5)
MultiExpG2/8192_points-96       10.4ms ±17%    8.1ms ± 5%  -21.79%  (p=0.016 n=5+4)
MultiExpG2/16384_points-96      15.3ms ±12%   14.7ms ±15%     ~     (p=0.690 n=5+5)
MultiExpG2/32768_points-96      24.6ms ± 3%   25.8ms ±26%     ~     (p=0.548 n=5+5)
MultiExpG2/65536_points-96      46.4ms ± 1%   52.1ms ±23%     ~     (p=0.222 n=5+5)
MultiExpG2/131072_points-96      107ms ± 2%    105ms ±16%     ~     (p=1.000 n=5+5)
MultiExpG2/262144_points-96      255ms ±27%    264ms ± 3%     ~     (p=0.286 n=5+4)
MultiExpG2/524288_points-96      384ms ± 7%    312ms ±13%  -18.65%  (p=0.016 n=5+4)
MultiExpG2/1048576_points-96    1.09s ±108%    0.47s ±12%  -57.09%  (p=0.008 n=5+5)
MultiExpG2/2097152_points-96     1.26s ± 9%   1.49s ±112%     ~     (p=1.000 n=5+5)
MultiExpG2/4194304_points-96     2.28s ± 1%    1.39s ± 5%  -38.98%  (p=0.016 n=5+4)
MultiExpG2/8388608_points-96     4.30s ± 1%    2.71s ± 6%  -36.94%  (p=0.008 n=5+5)
MultiExpG2/16777216_points-96    8.26s ± 5%    5.15s ±14%  -37.65%  (p=0.008 n=5+5)
MultiExpG2/33554432_points-96    15.8s ± 1%     9.2s ± 2%  -41.35%  (p=0.016 n=4+5)
MultiExpG2/67108864_points-96    31.2s ± 3%    18.4s ± 1%  -40.99%  (p=0.008 n=5+5)
MultiExpG2Reference-96           705ms ± 6%   1756ms ±82%     ~     (p=0.690 n=5+5)
ManyMultiExpG2Reference-96       1.85s ± 4%    1.21s ±11%  -34.66%  (p=0.008 n=5+5)

@gbotrel gbotrel added the perf label Nov 9, 2022
@gbotrel gbotrel added this to the v0.9.0 milestone Nov 9, 2022
@gbotrel gbotrel marked this pull request as draft November 9, 2022 22:00
@gbotrel gbotrel marked this pull request as ready for review November 18, 2022 04:01
@gbotrel gbotrel changed the title [draft]batch affine msm + generic in msm Batch affine msm + generic in msm Nov 18, 2022
@0x0ece

0x0ece commented Nov 18, 2022

In practice (for uniformly distributed points), the slowdown is ~5%, but it is worth it to avoid too many code paths / edge cases.

Just FYI, this is the part in our paper where we essentially said "we don't know how to build an efficient + compact scheduler", as the 5% slowdown for small MSM defeats batch affine (in g1, on Intel -- I'm glad you guys found wider applications!).

The issue is that keeping track of which buckets are in use costs 1 memory write per iteration of the scheduler, and that's surprisingly non-trivial (you're storing a bool; we also tried a uint to avoid the reset, with a similar result). For what it's worth, since at most 100 bools can be set to true, another approach is to keep just the ids in the queue and compare against those... in hardware we do that because all the comparisons can run in parallel. In software, again, we weren't able to do it fast enough (faster than the current method, or faster than letting the queue grow without memory writes).

Anyway, I just wanted to remark, either for you or for anyone else lurking, that there might be an interesting trick here to get an "easy" 5% gain :)
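
To make the two bookkeeping strategies above concrete, here is a minimal Go sketch. It is not the code from this PR or from the paper: the type names, the `tryAdd`/`flush` methods and the `maxBatch` constant are all hypothetical, and the cap of 100 is only taken from the "at most 100" remark above. The first scheduler flags busy buckets in a bool slice (one memory write per scheduled addition, plus a reset on flush); the second keeps only the ids of the current batch and scans them.

```go
package main

import "fmt"

// maxBatch is a hypothetical cap on the number of additions per batch.
const maxBatch = 100

// flagScheduler marks busy buckets in a bool slice: O(1) conflict check,
// but one memory write per scheduled addition and a reset on every flush.
type flagScheduler struct {
	busy  []bool
	batch []uint16
}

func newFlagScheduler(nbBuckets int) *flagScheduler {
	return &flagScheduler{
		busy:  make([]bool, nbBuckets),
		batch: make([]uint16, 0, maxBatch),
	}
}

// tryAdd reports whether bucket id was scheduled in the current batch.
func (s *flagScheduler) tryAdd(id uint16) bool {
	if s.busy[id] || len(s.batch) == maxBatch {
		return false // conflict or full batch: the caller queues or flushes
	}
	s.busy[id] = true
	s.batch = append(s.batch, id)
	return true
}

// flush clears the flags of the current batch and empties it.
func (s *flagScheduler) flush() {
	for _, id := range s.batch {
		s.busy[id] = false
	}
	s.batch = s.batch[:0]
}

// scanScheduler stores only the ids of the current batch and compares
// against them: no per-bucket flag, but a linear scan per check.
type scanScheduler struct {
	batch []uint16
}

func (s *scanScheduler) tryAdd(id uint16) bool {
	if len(s.batch) == maxBatch {
		return false
	}
	for _, b := range s.batch {
		if b == id {
			return false
		}
	}
	s.batch = append(s.batch, id)
	return true
}

func (s *scanScheduler) flush() { s.batch = s.batch[:0] }

func main() {
	digits := []uint16{3, 7, 3, 1, 7, 2}
	f := newFlagScheduler(8)
	var sc scanScheduler
	for _, d := range digits {
		// Both schedulers accept or reject the same bucket ids.
		fmt.Println(d, f.tryAdd(d), sc.tryAdd(d))
	}
}
```

Both variants make the same accept/reject decisions; they differ only in where the cost lands (a write per addition versus a scan per check), which is the trade-off discussed above.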

Collaborator

@yelhousni yelhousni left a comment

@gbotrel This looks great!

Regarding the TODO on double() and unsafeFromJacExtended() when _p=0:

Normally the point at infinity has Z=0. In Jacobian coordinates it is any (t^2:t^3:0), so the convention is to take (1:1:0) (as we do in gnark-crypto), and in projective coordinates it is any (0:t:0), where the convention is to take (0:1:0). In cryptography, since we don't need this point to be on the curve (because we don't use it in formulas), we just check that Z=0. Some references take (0:1:0) for both coordinate systems.

Because of this, the formulas in double() always output Z=0 when fed a point at infinity. The same holds for unsafeFromJacExtended() (we can even rename it FromJacExtended() since the result is always (0:0:0)).
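
As a small illustration of the Z=0 argument, here is a Go sketch. It uses the textbook Jacobian doubling formulas for a curve with a=0 over a toy prime field, not gnark-crypto's extended Jacobian double(); the modulus `p = 101`, the `jac` type and the helper names are assumptions made only for this example. Since Z3 = 2·Y·Z, any input with Z=0 yields an output with Z=0.

```go
package main

import "fmt"

// p is a toy prime modulus, chosen only for illustration.
const p = 101

// jac is a point in Jacobian coordinates (X:Y:Z) with entries reduced mod p.
type jac struct{ X, Y, Z uint64 }

func add(a, b uint64) uint64 { return (a + b) % p }
func sub(a, b uint64) uint64 { return (a + p - b%p) % p }
func mul(a, b uint64) uint64 { return (a * b) % p }

// double applies the standard Jacobian doubling formulas for a curve
// y^2 = x^3 + b (a = 0). It never branches on Z.
func double(q jac) jac {
	A := mul(q.X, q.X)
	B := mul(q.Y, q.Y)
	C := mul(B, B)
	t := add(q.X, B)
	D := mul(2, sub(sub(mul(t, t), A), C))
	E := mul(3, A)
	F := mul(E, E)
	X3 := sub(F, mul(2, D))
	Y3 := sub(mul(E, sub(D, X3)), mul(8, C))
	Z3 := mul(2, mul(q.Y, q.Z)) // Z3 = 2*Y*Z, so Z = 0 forces Z3 = 0
	return jac{X3, Y3, Z3}
}

// isInfinity only checks Z, as described in the comment above.
func (q jac) isInfinity() bool { return q.Z == 0 }

func main() {
	for _, inf := range []jac{{1, 1, 0}, {0, 1, 0}} {
		d := double(inf)
		fmt.Println(inf, "->", d, "infinity:", d.isInfinity())
	}
}
```

With these formulas the conventional representative (1:1:0) doubles to itself, and (0:1:0) doubles to another triple with Z=0, so an isInfinity check on Z alone is enough.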

@gbotrel gbotrel merged commit 37c3c93 into develop Nov 21, 2022
@gbotrel gbotrel deleted the feat/msm-affine branch November 21, 2022 20:47