
CUDA Scalar Mul #17

Merged · 177 commits · Nov 10, 2020

Conversation

@jon-chuang commented Oct 6, 2020

This PR exposes a single function, cpu_gpu_scalar_mul:

impl<G: AffineCurve> GPUScalarMulSlice<G> for [G] {
    fn cpu_gpu_scalar_mul(
        &mut self,
        exps_h: &[<<G as AffineCurve>::ScalarField as PrimeField>::BigInt],
        cuda_group_size: usize,
        // size of the batch for cpu scalar mul
        cpu_chunk_size: usize,
    ) {
        // Run the static CPU/GPU partitioned kernel only when a CUDA device
        // is available and the `cuda` feature is enabled.
        if accel::Device::init() && cfg!(feature = "cuda") {
            <G as AffineCurve>::Projective::cpu_gpu_static_partition_run_kernel(
                self,
                exps_h,
                cuda_group_size,
                cpu_chunk_size,
            );
        } else {
            // CPU-only fallback: batch scalar mul over chunks, which
            // `cfg_chunks_mut!` runs in parallel when rayon is enabled.
            let mut exps_mut = exps_h.to_vec();
            cfg_chunks_mut!(self, cpu_chunk_size)
                .zip(cfg_chunks_mut!(exps_mut, cpu_chunk_size))
                .for_each(|(b, s)| {
                    // 4 is the window size used for the batched scalar mul.
                    b[..].batch_scalar_mul_in_place(&mut s[..], 4);
                });
        }
        }
    }
}
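
A hypothetical call site might look like the following (a minimal sketch; the curve choice, import paths, RNG setup, and the two size constants are assumptions for illustration, not part of this PR):

// Hypothetical usage: scalar-multiply a batch of affine points in place,
// using the GPU when available and falling back to the CPU otherwise.
use algebra::bls12_381::G1Affine; // assumed curve and import path
use algebra_core::curves::cuda::scalar_mul::GPUScalarMulSlice; // assumed path
use algebra_core::{AffineCurve, PrimeField, UniformRand};

fn scalar_mul_batch(points: &mut [G1Affine]) {
    let mut rng = rand::thread_rng();
    // One exponent per point, in the scalar field's BigInt representation.
    let exps: Vec<_> = points
        .iter()
        .map(|_| <G1Affine as AffineCurve>::ScalarField::rand(&mut rng).into_repr())
        .collect();
    // cuda_group_size = 32, cpu_chunk_size = 1 << 14: tunables chosen for
    // illustration, not values prescribed by the PR.
    points.cpu_gpu_scalar_mul(&exps, 32, 1 << 14);
}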

The majority of the PR lies in the algebra/curves/cuda folder (850 of the ~1400 LOC). sw_projective.rs is deleted, as it is not actually used anywhere and Pratyush mentioned he plans to drop it in a future version of zexe. The rest of the diff is boilerplate implementing the traits for the different curves, or code moved around (~300 LOC). There is also a test.

Potential TODOs:

  • Clean up the GPUScalarMul interface and choose what to expose (ctx?). One option: make GPUScalarMul pub(crate) instead of pub, and have GPUScalarMulSlice be the only pub trait.
  • Alternatively, add a new pub(crate) trait, GPUScalarMulInternal, and choose what to expose in the public GPUScalarMul interface (I prefer this; see the sketch below).
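
For concreteness, the second option could look roughly like this (a minimal sketch under assumed zexe-style import paths; the method set shown is illustrative, not the PR's actual interface):

use algebra_core::curves::AffineCurve; // assumed paths
use algebra_core::fields::PrimeField;

// Crate-private: all kernel plumbing (contexts, buffers, partitioning)
// stays behind this trait and never leaks into the public API.
pub(crate) trait GPUScalarMulInternal<G: AffineCurve> {
    fn cpu_gpu_static_partition_run_kernel(
        bases: &mut [G],
        exps: &[<G::ScalarField as PrimeField>::BigInt],
        cuda_group_size: usize,
        cpu_chunk_size: usize,
    );
}

// Public: only the curated entry point is visible to downstream users.
pub trait GPUScalarMul<G: AffineCurve> {
    fn cpu_gpu_scalar_mul(
        bases: &mut [G],
        exps: &[<G::ScalarField as PrimeField>::BigInt],
        cuda_group_size: usize,
        cpu_chunk_size: usize,
    );
}

// Blanket impl: any type providing the internal plumbing automatically
// exposes the public interface, forwarding to the internal method.
impl<G: AffineCurve, T: GPUScalarMulInternal<G>> GPUScalarMul<G> for T {
    fn cpu_gpu_scalar_mul(
        bases: &mut [G],
        exps: &[<G::ScalarField as PrimeField>::BigInt],
        cuda_group_size: usize,
        cpu_chunk_size: usize,
    ) {
        T::cpu_gpu_static_partition_run_kernel(bases, exps, cuda_group_size, cpu_chunk_size);
    }
}

This keeps the full kernel surface reachable inside the crate while downstream users only ever see the curated public trait.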
