GLV framework + precomp scripts #296

jon-chuang · 2020-09-27T13:53:01Z

This PR adds the final piece of the puzzle: GLV. The patent expired 2 days ago.
I recommend reviewing this after the rest of the algorithmic optimisations have been merged.
A description of the approach taken for GLV, along with background knowledge, can be found in this hackmd article.

This is PR 3 of 3 in a series of PRs to scipr-lab/zexe.

Features:

Batch Affine ops
Batch and ordinary w-NAF
GLV framework, GLV precomputation script.
GLV impl for both batch affine and projective mul (only for BW6)
Batch bucket addition tree
Batched MSM (1.4-1.7x bump)
Batch subgroup verification based on addition to random buckets
2D test matrix feature-gating with axes (curve, test type)
Refactor tests to use common template
Code timing instrumentation with fine grained control of functions and callers
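
Batch affine ops are usually profitable because the one field inversion per affine addition can be shared across a whole batch, and the commit history mentions a batch inversion step. The block below is a minimal sketch of Montgomery's batch inversion trick, not the code in this PR. It assumes a toy prime field with modulus `P = 1_000_000_007` as a stand-in for a curve's base field, and the names `mul`, `pow`, `inv` and `batch_inverse` are hypothetical.

```rust
// Toy prime modulus (an assumption for illustration, not a curve parameter).
const P: u64 = 1_000_000_007;

/// Modular multiplication, widened to u128 to avoid overflow.
fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Square-and-multiply exponentiation mod P.
fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Single inversion via Fermat's little theorem (input must be non-zero mod P).
fn inv(a: u64) -> u64 {
    pow(a, P - 2)
}

/// Inverts every element in place using one inversion and about 3n multiplications.
/// All inputs are assumed to be non-zero mod P.
fn batch_inverse(values: &mut [u64]) {
    // prefix[i] holds the product of values[0..i].
    let mut prefix = Vec::with_capacity(values.len());
    let mut acc = 1u64;
    for &v in values.iter() {
        prefix.push(acc);
        acc = mul(acc, v);
    }
    // The only real inversion: the inverse of the product of all inputs.
    let mut inv_acc = inv(acc);
    // Walk backwards, peeling one factor off the running inverse at each step.
    for (v, p) in values.iter_mut().zip(prefix.iter()).rev() {
        let original = *v;
        *v = mul(inv_acc, *p);
        inv_acc = mul(inv_acc, original);
    }
}
```

Calling `batch_inverse` on a slice such as `[2, 3, 5]` replaces each entry with its inverse mod `P`. In a batched affine addition, the values being inverted would be the slope denominators of each pair of points.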

Combined speedup (with w-NAF):

Batched mul (with GLV): 2.3-2.7x.
Non-batch projective mul (with GLV): 2.1x.
Without GLV: 1.65-1.8x (batched) and 1.4x (non-batch projective).
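
For readers unfamiliar with the w-NAF recoding that these figures include, here is a minimal sketch of width-w non-adjacent form recoding for a 64-bit scalar. It is an illustration only, not the batch w-NAF code in this PR. The function names `wnaf` and `reconstruct` are hypothetical, and a real implementation would work on big integers.

```rust
/// Recodes `k` into width-`w` non-adjacent form, least significant digit first.
/// Every non-zero digit is odd and lies strictly between -2^(w-1) and 2^(w-1),
/// and any `w` consecutive digits contain at most one non-zero digit.
fn wnaf(k: u64, w: u32) -> Vec<i64> {
    assert!((2..=16).contains(&w), "window width outside the range of this sketch");
    let mut k = k as i128; // wide signed type so that `k - d` cannot overflow
    let modulus = 1i128 << w;
    let half = modulus >> 1;
    let mut digits = Vec::new();
    while k != 0 {
        if k & 1 == 1 {
            // Take k mod 2^w and map it into the signed range around zero.
            let mut d = k & (modulus - 1);
            if d >= half {
                d -= modulus;
            }
            digits.push(d as i64);
            k -= d; // k is now divisible by 2^w, so the next w-1 digits are zero
        } else {
            digits.push(0);
        }
        k >>= 1;
    }
    digits
}

/// Evaluates the digit string back to an integer (sum of d_i * 2^i).
fn reconstruct(digits: &[i64]) -> i128 {
    digits.iter().rev().fold(0i128, |acc, &d| 2 * acc + d as i128)
}
```

For example, `wnaf(7, 2)` yields `[-1, 0, 0, 1]`, that is 7 = 8 - 1. A scalar multiplication then needs one doubling per digit and one table lookup plus addition per non-zero digit, with the table holding the odd multiples of the base point.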

Speedup for subgroup verification:

Over the old implementation: 25-80x.
Over the new batch mul: 10-30x.

Future PRs:

Improved MSM scaling: free scaling to hundreds of threads for problem sizes > 2^25. (Need to investigate whether the serial portion of the code holds this back, per Amdahl's law.)
Recursive batch verification that gives ~1.5x speedup for large n.
Accel-based CUDA scalar mul, plus a load-balancing framework to split work between CPU and GPU. Performance is neck and neck between a weak GPU (GTX 1650) and a powerful CPU (16-thread Ryzen 4800H sustaining 4.2GHz), which is probably better than most Intel laptop CPUs.

Future work:

Extend GLV to all applicable curves https://github.com/scipr-lab/zexe/issues/267
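
To illustrate what each additional curve needs, here is a minimal sketch of the textbook GLV scalar decomposition by Babai rounding. It is not necessarily the method used in this PR. The group order `R = 9901`, the endomorphism eigenvalue `LAMBDA = 99` (which satisfies λ² + λ + 1 ≡ 0 mod R), and the short lattice basis `V1`, `V2` are toy values chosen by hand. In practice a precomputation script would derive such constants per curve. All names are hypothetical.

```rust
// Toy parameters (assumptions for illustration, not real curve constants).
const R: i128 = 9901; // group order
const LAMBDA: i128 = 99; // eigenvalue: LAMBDA^2 + LAMBDA + 1 == R
// Short basis of the lattice {(a, b) : a + b * LAMBDA == 0 mod R}.
const V1: (i128, i128) = (99, -1);
const V2: (i128, i128) = (1, 100);

/// Rounds n / d to the nearest integer, for d > 0.
fn round_div(n: i128, d: i128) -> i128 {
    (2 * n + d).div_euclid(2 * d)
}

/// Splits `k` into (k1, k2) with k1 + k2 * LAMBDA == k mod R,
/// where both halves are about half the bit length of R.
fn glv_decompose(k: i128) -> (i128, i128) {
    // Determinant of the basis; equals R for these parameters.
    let det = V1.0 * V2.1 - V2.0 * V1.1;
    // Express (k, 0) in the basis over the rationals, then round each coordinate.
    let c1 = round_div(k * V2.1, det);
    let c2 = round_div(-k * V1.1, det);
    // Subtract the nearby lattice point; the remainder is short.
    let k1 = k - c1 * V1.0 - c2 * V2.0;
    let k2 = -c1 * V1.1 - c2 * V2.1;
    (k1, k2)
}
```

With these toy values, `glv_decompose(5000)` returns `(49, -50)`, and 49 - 50 * 99 ≡ 5000 (mod 9901). On a curve with an efficient endomorphism φ acting as multiplication by λ, the product k·P can then be computed as k1·P + k2·φ(P) with two half-length scalars sharing one doubling chain.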


* First draft affine batch ops & wnaf * changes to mutability and lifetimes * delete superfluous files * crazy direction: Passing a FnMut to generate an iterator locally * unsuccessful further attempts * compile sucess using index approach * fixes for mutable borrows * Successfully passed scalar mul test * benchmarks + prefetching * stash * generic impl of batch arith for all affinecurves * batched affine formulas for TE - too expensive * improved TE affine * cleanup batch inversion * fmt... * fix minor error * remove debugging scaffolding * fmt... * delete batch arith bench as not suitable for criterion or bench * fix bench removal errors * fmt... * added missing coeff_a * refactor BatchGroupArithmetic to be separate trait * Batch verification with radix sort * Cache-locality & parallelisation * Successfully impl batch verify * added tests and bench for batch_ver, parallel_random_gen, ^ thread util * fmt * enabled missing test * remove voracious_radix_sort * commented unneeded Instant::now() * Fixed batch_ver tests for curves of small or unit cofactor * split recursive and non-recursive, tidy up shared functionality * reduce max_logn * adjust max_logn further * Batch MSM, speedup only for bw6 due to poor cache performance * fmt... 
* GLV iBiginteger * stash * stash * GLV with Parameter-based specialisation * GLV lattice basis script success * Successfully passed tests and benched * Improvments to MSM with and bucketed adds using lightweight index sort * changed rng to be external parameter for non-parallel batch veri * remove bench print scaffolding * remove old batch_bucketed_add using vectors instead of fixed offsets * retain parallel batch_add_split * Comments for batch arith * remove need for hashmap for no std for batch_bucketed_add * minor changes * cleanup * cleanup * fmt + use no_std Vec * removed std:: * add scratch space * Add GLV for non-batched SW mul * fix for glv_scalar_decomposition when k == MODULUS (subgroup check) * Fixed performance BUG: unnecessary table generation * GLV -> has_glv(), bigint slice bd check, refactor batch loops, u32 index * clean remove of batch_verify * fix mistake with elems indexing, unused arg for future recursion PR * trivial errors * more minor fixes * fix issues with batch_ver (.is_zero(), TE affine->proj mul) * fix issue with batch_bucketed_add_split * misname * Success in test and bench \(*v*)/ * tmp commit to cache experimental batch_add_write_shift_.. * remove batch_add_write_shift.. * optional dep, fmt... * undo accidental deletion of dlsd sort * fmt... * cleanup batch bucket add, unify impl * no std... * fixed tests * fixed unimplemented for TE, swapped wnaf table row/col for batchaddwrite * wnaf table generation uses fewer copies, remove timing instrumentation * Minor Cleanup * Add feature-activated timing instrumentation, reduce code bloat (wnaf) * unused var, no_std * Make timing macros defined globally, instrument more code * instrument w/ tid, better num_rounds est. 
f64, timing black/whitelisting * Minor changes * refactor tests, generic MSM test * 2D test matrix :) * batchaffine * tests * additive features * big_n feature for test-benching * prefetch unroll * minor adjustments * extension(s -> "")_fields * remove artifacts, fix asm * uncomment subgroup checks, glv param sources * Clean up GLV murkiness and add comments * Set defaults for glv_window_size * refactor glv to use examples * redact glv * ammend accidental failed resolve * remove all batch ops * ammend bititerator merge conflicts * ammend accidental deletion of batch ops and glv * Fix trait bound issues by adding `+ From` to ScalarField * fix inexplicable fp6 error * minor import errors * additional small errors * remove everything but test changes * yml * yml * remove batch arith from benches * minor changes * more minor fixes * fmt * fmt


Pratyush · 2020-11-20T02:23:02Z

Hi @jon-chuang, do you mind re-opening this PR against arkworks-rs/algebra and against arkworks-rs/curves?

(It'll probably be quite a rework, sorry for the inconvenience!)

jon-chuang added 30 commits August 1, 2020 00:46

First draft affine batch ops & wnaf

a64d7fb

changes to mutability and lifetimes

b7024dd

delete superfluous files

40ef5d7

crazy direction: Passing a FnMut to generate an iterator locally

0fa5eeb

unsuccessful further attempts

eebb12b

compile sucess using index approach

4d22acf

fixes for mutable borrows

bbbec75

Successfully passed scalar mul test

3a6e45c

benchmarks + prefetching

5c65917

stash

3bf2bc1

generic impl of batch arith for all affinecurves

4bb5ad5

batched affine formulas for TE - too expensive

67da071

improved TE affine

2e54f67

cleanup batch inversion

62df27d

fmt...

e6d28b6

fix minor error

74d9bb7

remove debugging scaffolding

908fb73

fmt...

c0a5a07

delete batch arith bench as not suitable for criterion or bench

5c89660

fix bench removal errors

6359f7c

fmt...

56b8181

added missing coeff_a

ec2decd

refactor BatchGroupArithmetic to be separate trait

bad37bd

Batch verification with radix sort

5b9cae9

Cache-locality & parallelisation

cbf8e49

Successfully impl batch verify

200f5fa

added tests and bench for batch_ver, parallel_random_gen, ^ thread util

ed7c4a7

fmt

0e612e4

enabled missing test

8819290

remove voracious_radix_sort

a8e9c18

jon-chuang added 16 commits September 11, 2020 15:11

prefetch unroll

f21f40a

minor adjustments

c605894

extension(s -> "")_fields

6a70b67

remove artifacts, fix asm

c83b29d

uncomment subgroup checks, glv param sources

3a8e853

Clean up GLV murkiness and add comments

d8c5d08

Set defaults for glv_window_size

a5f4521

refactor glv to use examples

6b65eda

Merge branch 'master' into jonch/trinity/glv

65ff04f

ammend accidental failed resolve

a4c738a

ammend bititerator merge conflicts

ec4cd02

ammend accidental deletion of batch ops and glv

9a875f8

Fix trait bound issues by adding + From to ScalarField

663adc3

fix inexplicable fp6 error

0f41347

edit yml not to use --all-features

757afc0

yml indent...?

c0845fb

jon-chuang changed the title ~~Jonch/trinity/glv~~ GLV framework + scripts Sep 27, 2020

jon-chuang changed the title ~~GLV framework + scripts~~ GLV framework + precomp scripts Sep 27, 2020

kobigurk and others added 3 commits October 5, 2020 14:40

Adds BW6 assembly (#14)

cecdebd

* adds CIOS assembly * fix cfg for asm unsafe code Co-authored-by: jon-chuang <9093549+jon-chuang@users.noreply.github.com>

Merge branch 'master' into jonch/trinity/glv

54132f1

jon-chuang force-pushed the jonch/trinity/glv branch from aa38655 to 54132f1 Compare October 6, 2020 11:05

jon-chuang added 3 commits October 6, 2020 19:09

Include negative subgroup verify tests

c9fe2f6

Merge branch 'master' into jonch/trinity/glv

39a9e75

mratsim mentioned this pull request Oct 13, 2020

Research zkSNARKS blocker: Benchmark and optimize proof time vacp2p/research#7

Open

Pratyush closed this Nov 20, 2020

jon-chuang mentioned this pull request Sep 22, 2021

Making use of the new algorithmic optimisations available in Zexe (soon to be arkworks) AleoHQ/snarkVM#412

Open

PatStiles mentioned this pull request Nov 3, 2023

[FEAT]: Utilize glv decomposition to speed up MSM operations ingonyama-zk/icicle#260

Closed
