GLV framework + precomp scripts #296

Closed
wants to merge 113 commits into from

Conversation

jon-chuang
Contributor

@jon-chuang jon-chuang commented Sep 27, 2020

The final piece of the puzzle: GLV. The patent expired 2 days ago.
Recommended to review after the rest of the algorithmic optimisations have been merged.
A description of the approach taken for GLV, along with background knowledge, can be found in this hackmd article.


3/3 of a series of PRs to scipr-lab/zexe.

Features:

  • Batch Affine ops
  • Batch and ordinary w-NAF
  • GLV framework, GLV precomputation script.
  • GLV impl for both batch affine and projective mul (only for BW6)
  • Batch bucket addition tree
  • Batched MSM (1.4-1.7x bump)
  • Batch subgroup verification based on addition to random buckets
  • 2D test matrix feature-gating with axes (curve, test type)
  • Refactor tests to use common template
  • Code timing instrumentation with fine grained control of functions and callers
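
For readers unfamiliar with why batching affine operations pays off: affine addition needs one field inversion per operation, and the commit log mentions a batch inversion step. A common way to amortise this is Montgomery's trick, which trades n inversions for one inversion plus roughly 3n multiplications. The sketch below is illustrative only. It works over a small toy prime field (`P`, `mul`, `pow`, `inv` and `batch_inverse` are hypothetical names, not the PR's API) and assumes every input is non-zero.

```rust
// Toy prime modulus (an assumption for illustration; real curves use
// multi-limb field elements, not u64).
const P: u64 = 1_000_000_007;

/// Modular multiplication. Inputs are < P < 2^30, so the product fits in u64.
fn mul(a: u64, b: u64) -> u64 {
    (a * b) % P
}

/// Square-and-multiply exponentiation mod P.
fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Single inversion via Fermat's little theorem (a^(P-2) mod P).
fn inv(a: u64) -> u64 {
    pow(a, P - 2)
}

/// Montgomery's trick: invert every element in place using ONE field
/// inversion. Assumes all elements are non-zero mod P.
fn batch_inverse(elems: &mut [u64]) {
    // prefix[i] = product of elems[0..i]
    let mut prefix = Vec::with_capacity(elems.len());
    let mut acc = 1u64;
    for &e in elems.iter() {
        prefix.push(acc);
        acc = mul(acc, e);
    }
    // acc_inv starts as the inverse of the product of all elements.
    let mut acc_inv = inv(acc);
    // Walk backwards, peeling off one element at a time.
    for (e, p) in elems.iter_mut().zip(prefix.iter()).rev() {
        let orig = *e;
        *e = mul(acc_inv, *p); // inverse of elems[i]
        acc_inv = mul(acc_inv, orig); // now the inverse of product elems[0..i]
    }
}
```

Calling `batch_inverse(&mut v)` on a vector of non-zero residues leaves each slot holding the inverse of its original value. In a batch affine setting, the elements being inverted would be the slope denominators of many independent additions.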

Combined speedup (with w-NAF) for

  • Batched mul (with GLV): 2.3-2.7x.
  • Non-batch projective mul (with GLV): 2.1x.
    Without GLV: 1.65-1.8x and 1.4x resp.
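
To make the GLV gain concrete: the technique splits a scalar k into two half-length scalars k1, k2 with k ≡ k1 + k2·λ (mod r), where λ is the eigenvalue of an efficient endomorphism, so that k·P can be computed as k1·P + k2·φ(P) with about half the doublings. The sketch below shows only the scalar decomposition, using a precomputed short lattice basis and rounding. All parameters are toy values chosen by hand for illustration (r = 79, λ = 23 with λ² + λ + 1 ≡ 0 mod 79); they are not the BW6 parameters, and `glv_decompose` is a hypothetical name rather than the function in this PR.

```rust
// Toy group order and endomorphism eigenvalue (assumptions for illustration).
const R: i64 = 79;
const LAMBDA: i64 = 23; // 23^2 + 23 + 1 = 553 = 7 * 79

// Short basis of the lattice {(x, y) : x + y * LAMBDA ≡ 0 (mod R)}.
// 10 + 3*23 = 79 and -3 + 7*23 = 158; the determinant is 10*7 - (-3)*3 = R.
const V1: (i64, i64) = (10, 3);
const V2: (i64, i64) = (-3, 7);

/// Round n / d to the nearest integer, for d > 0.
fn round_div(n: i64, d: i64) -> i64 {
    (2 * n + d).div_euclid(2 * d)
}

/// Decompose k into (k1, k2) with k1 + k2 * LAMBDA ≡ k (mod R) and both
/// components small, by subtracting the lattice vector closest to (k, 0).
fn glv_decompose(k: i64) -> (i64, i64) {
    // Rational solution of (k, 0) = c1 * V1 + c2 * V2, rounded to integers.
    let b1 = round_div(V2.1 * k, R);
    let b2 = round_div(-V1.1 * k, R);
    // (k, 0) - b1 * V1 - b2 * V2 stays congruent to k and is short.
    let k1 = k - b1 * V1.0 - b2 * V2.0;
    let k2 = -b1 * V1.1 - b2 * V2.1;
    (k1, k2)
}
```

With these toy parameters, `glv_decompose(50)` returns `(4, 2)`, and indeed 4 + 2·23 = 50. The components may be negative in general, which an implementation would typically handle by negating the corresponding point.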

Speedup for subgroup verification

  • Over old impl 25-80x.
  • Over new batch mul: 10-30x.

Future PRs:

  • Improved MSM scaling (free scaling to 100s of threads for problem sizes > 2^25). (Need to investigate whether this is held back by Amdahl's law in the serial portion of the code.)
  • Recursive batch verification that gives ~1.5x speedup for large n.
  • Accel-based CUDA scalar mul + load-balancing code/framework to balance between CPU and GPU. Performance is neck and neck between a weak GPU (GTX 1650) and a powerful CPU (16T Ryzen 4800H that sustains 4.2GHz). Probably better than most laptop Intels.

Future work:

@jon-chuang jon-chuang changed the title Jonch/trinity/glv GLV framework + scripts Sep 27, 2020
@jon-chuang jon-chuang changed the title GLV framework + scripts GLV framework + precomp scripts Sep 27, 2020
kobigurk and others added 3 commits October 5, 2020 14:40
* adds CIOS assembly

* fix cfg for asm unsafe code

Co-authored-by: jon-chuang <9093549+jon-chuang@users.noreply.github.com>
* First draft affine batch ops & wnaf

* changes to mutability and lifetimes

* delete superfluous files

* crazy direction: Passing a FnMut to generate an iterator locally

* unsuccessful further attempts

* compile sucess using index approach

* fixes for mutable borrows

* Successfully passed scalar mul test

* benchmarks + prefetching

* stash

* generic impl of batch arith for all affinecurves

* batched affine formulas for TE - too expensive

* improved TE affine

* cleanup batch inversion

* fmt...

* fix minor error

* remove debugging scaffolding

* fmt...

* delete batch arith bench as not suitable for criterion or bench

* fix bench removal errors

* fmt...

* added missing coeff_a

* refactor BatchGroupArithmetic to be separate trait

* Batch verification with radix sort

* Cache-locality & parallelisation

* Successfully impl batch verify

* added tests and bench for batch_ver, parallel_random_gen, ^ thread util

* fmt

* enabled missing test

* remove voracious_radix_sort

* commented unneeded Instant::now()

* Fixed batch_ver tests for curves of small or unit cofactor

* split recursive and non-recursive, tidy up shared functionality

* reduce max_logn

* adjust max_logn further

* Batch MSM, speedup only for bw6 due to poor cache performance

* fmt...

* GLV iBiginteger

* stash

* stash

* GLV with Parameter-based specialisation

* GLV lattice basis script success

* Successfully passed tests and benched

* Improvments to MSM with and bucketed adds using lightweight index sort

* changed rng to be external parameter for non-parallel batch veri

* remove bench print scaffolding

* remove old batch_bucketed_add using vectors instead of fixed offsets

* retain parallel batch_add_split

* Comments for batch arith

* remove need for hashmap for no std for batch_bucketed_add

* minor changes

* cleanup

* cleanup

* fmt + use no_std Vec

* removed std::

* add scratch space

* Add GLV for non-batched SW mul

* fix for glv_scalar_decomposition when k == MODULUS (subgroup check)

* Fixed performance BUG: unnecessary table generation

* GLV -> has_glv(), bigint slice bd check, refactor batch loops, u32 index

* clean remove of batch_verify

* fix mistake with elems indexing, unused arg for future recursion PR

* trivial errors

* more minor fixes

* fix issues with batch_ver (.is_zero(), TE affine->proj mul)

* fix issue with batch_bucketed_add_split

* misname

* Success in test and bench \(*v*)/

* tmp commit to cache experimental batch_add_write_shift_..

* remove batch_add_write_shift..

* optional dep, fmt...

* undo accidental deletion of dlsd sort

* fmt...

* cleanup batch bucket add, unify impl

* no std...

* fixed tests

* fixed unimplemented for TE, swapped wnaf table row/col for batchaddwrite

* wnaf table generation uses fewer copies, remove timing instrumentation

* Minor Cleanup

* Add feature-activated timing instrumentation, reduce code bloat (wnaf)

* unused var, no_std

* Make timing macros defined globally, instrument more code

* instrument w/ tid, better num_rounds est. f64, timing black/whitelisting

* Minor changes

* refactor tests, generic MSM test

* 2D test matrix :)

* batchaffine

* tests

* additive features

* big_n feature for test-benching

* prefetch unroll

* minor adjustments

* extension(s -> "")_fields

* remove artifacts, fix asm

* uncomment subgroup checks, glv param sources

* Clean up GLV murkiness and add comments

* Set defaults for glv_window_size

* refactor glv to use examples

* redact glv

* ammend accidental failed resolve

* remove all batch ops

* ammend bititerator merge conflicts

* ammend accidental deletion of batch ops and glv

* Fix trait bound issues by adding `+ From` to ScalarField

* fix inexplicable fp6 error

* minor import errors

* additional small errors

* remove everything but test changes

* yml

* yml

* remove batch arith from benches

* minor changes

* more minor fixes

* fmt

* fmt
* First draft affine batch ops & wnaf

* changes to mutability and lifetimes

* delete superfluous files

* crazy direction: Passing a FnMut to generate an iterator locally

* unsuccessful further attempts

* compile sucess using index approach

* fixes for mutable borrows

* Successfully passed scalar mul test

* benchmarks + prefetching

* stash

* generic impl of batch arith for all affinecurves

* batched affine formulas for TE - too expensive

* improved TE affine

* cleanup batch inversion

* fmt...

* fix minor error

* remove debugging scaffolding

* fmt...

* delete batch arith bench as not suitable for criterion or bench

* fix bench removal errors

* fmt...

* added missing coeff_a

* refactor BatchGroupArithmetic to be separate trait

* Batch verification with radix sort

* Cache-locality & parallelisation

* Successfully impl batch verify

* added tests and bench for batch_ver, parallel_random_gen, ^ thread util

* fmt

* enabled missing test

* remove voracious_radix_sort

* commented unneeded Instant::now()

* Fixed batch_ver tests for curves of small or unit cofactor

* split recursive and non-recursive, tidy up shared functionality

* reduce max_logn

* adjust max_logn further

* Batch MSM, speedup only for bw6 due to poor cache performance

* fmt...

* GLV iBiginteger

* stash

* stash

* GLV with Parameter-based specialisation

* GLV lattice basis script success

* Successfully passed tests and benched

* Improvments to MSM with and bucketed adds using lightweight index sort

* changed rng to be external parameter for non-parallel batch veri

* remove bench print scaffolding

* remove old batch_bucketed_add using vectors instead of fixed offsets

* retain parallel batch_add_split

* Comments for batch arith

* remove need for hashmap for no std for batch_bucketed_add

* minor changes

* cleanup

* cleanup

* fmt + use no_std Vec

* removed std::

* add scratch space

* Add GLV for non-batched SW mul

* fix for glv_scalar_decomposition when k == MODULUS (subgroup check)

* Fixed performance BUG: unnecessary table generation

* GLV -> has_glv(), bigint slice bd check, refactor batch loops, u32 index

* clean remove of batch_verify

* fix mistake with elems indexing, unused arg for future recursion PR

* trivial errors

* more minor fixes

* fix issues with batch_ver (.is_zero(), TE affine->proj mul)

* fix issue with batch_bucketed_add_split

* misname

* Success in test and bench \(*v*)/

* tmp commit to cache experimental batch_add_write_shift_..

* remove batch_add_write_shift..

* optional dep, fmt...

* undo accidental deletion of dlsd sort

* fmt...

* cleanup batch bucket add, unify impl

* no std...

* fixed tests

* fixed unimplemented for TE, swapped wnaf table row/col for batchaddwrite

* wnaf table generation uses fewer copies, remove timing instrumentation

* Minor Cleanup

* Add feature-activated timing instrumentation, reduce code bloat (wnaf)

* unused var, no_std

* Make timing macros defined globally, instrument more code

* instrument w/ tid, better num_rounds est. f64, timing black/whitelisting

* Minor changes

* refactor tests, generic MSM test

* 2D test matrix :)

* batchaffine

* tests

* additive features

* big_n feature for test-benching

* prefetch unroll

* minor adjustments

* extension(s -> "")_fields

* remove artifacts, fix asm

* uncomment subgroup checks, glv param sources

* Clean up GLV murkiness and add comments

* Set defaults for glv_window_size

* refactor glv to use examples

* redact glv

* ammend accidental failed resolve

* ammend bititerator merge conflicts

* ammend accidental deletion of batch ops and glv

* Fix trait bound issues by adding `+ From` to ScalarField

* fix inexplicable fp6 error

* minor import errors

* additional small errors

* yml

* yml
@Pratyush
Member

Hi @jon-chuang do you mind re-opening this PR against arkworks-rs/algebra and against arkworks-rs/curves?

(It'll probably be quite a rework, sorry for the inconvenience!)


3 participants