
Fix degree bug, and add benchmarks of polynomial add/sub/neg #119

Merged: 7 commits into master on Dec 8, 2020

Conversation

@ValarDragon (Member) commented Dec 7, 2020

Description

Parallelize Add, Sub, AddAssign, SubAssign, Neg operations on DensePolynomial.

Furthermore, fix a degree-calculation bug that appears when dense polynomial addition / subtraction causes the leading coefficient to become 0.
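For illustration, here is a minimal standalone sketch of the bug being fixed, using plain integer coefficients and hypothetical helper names rather than the actual ark-poly code: when two dense polynomials are added, the leading coefficients can cancel, so the result must be truncated before its length is used to report the degree.

/// Drop trailing zero coefficients so that `coeffs.len() - 1` is the true degree.
fn truncate_leading_zeros(coeffs: &mut Vec<i64>) {
    while coeffs.last() == Some(&0) {
        coeffs.pop();
    }
}

/// Coefficient-wise addition of two dense polynomials (lowest degree first).
fn add_dense(a: &[i64], b: &[i64]) -> Vec<i64> {
    let mut result = vec![0i64; a.len().max(b.len())];
    for (i, c) in a.iter().enumerate() {
        result[i] += c;
    }
    for (i, c) in b.iter().enumerate() {
        result[i] += c;
    }
    // Without this truncation, (x^2 + 1) + (-x^2 + 1) would be reported as
    // degree 2 (coeffs [2, 0, 0]) instead of degree 0 (coeffs [2]).
    truncate_leading_zeros(&mut result);
    result
}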


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (main)
  • Linked to Github issue with discussion and accepted design OR have an explanation in the PR that describes this work.
  • Wrote unit tests
  • Updated relevant documentation in the code
  • Added a relevant changelog entry to the Pending section in CHANGELOG.md
  • Re-reviewed Files changed in the Github PR explorer

@Pratyush (Member) commented Dec 7, 2020

Hmm, I wonder if parallelizing these cheap operations would give better speedups if we used something like with_min_len to increase the amount of computation done on each thread.
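For concreteness, a rough standalone sketch of that idea using plain rayon rather than the cfg_iter! macros in this repo (the function name and minimum length are just illustrative):

use rayon::prelude::*;

// Illustrative only: add `src` into `dst` coefficient-wise, but tell rayon to
// hand each worker at least 1024 elements so the per-task scheduling overhead
// is amortized over more actual additions.
fn add_assign_coeffs(dst: &mut [u64], src: &[u64]) {
    dst.par_iter_mut()
        .zip(src.par_iter())
        .with_min_len(1024)
        .for_each(|(a, b)| *a = a.wrapping_add(*b));
}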

@ValarDragon (Member, Author) commented Dec 8, 2020

I'll add benchmarks to see how it behaves with and without that.

@ValarDragon (Member, Author) commented Dec 8, 2020

Based on the benchmarks, it definitely appears that we should be using that approach for small polynomials. Results of parallel addition speeds on the benchmark server:

"bls12_381" - add_polynomial/4
                        time:   [220.19 us 221.88 us 223.44 us]
"bls12_381" - add_polynomial/16
                        time:   [283.37 us 285.74 us 287.92 us]
"bls12_381" - add_polynomial/64
                        time:   [191.32 us 195.84 us 199.98 us]
"bls12_381" - add_polynomial/256
                        time:   [156.59 us 157.25 us 157.87 us]
"bls12_381" - add_polynomial/4096
                        time:   [308.35 us 313.17 us 317.55 us]
"bls12_381" - add_polynomial/1024
                        time:   [214.63 us 216.30 us 217.81 us]                                                          
"bls12_381" - add_polynomial/16384
                        time:   [570.16 us 572.52 us 574.73 us]
"bls12_381" - add_polynomial/65536
                        time:   [1.0938 ms 1.1022 ms 1.1127 ms]
"bls12_381" - add_polynomial/262144
                        time:   [3.6081 ms 3.6522 ms 3.6979 ms]

For reference, here are the serial times:

"bls12_381" - add_polynomial/4
                        time:   [37.483 ns 38.156 ns 38.833 ns]
"bls12_381" - add_polynomial/16
                        time:   [76.012 ns 77.167 ns 78.320 ns]
"bls12_381" - add_polynomial/64
                        time:   [272.49 ns 274.50 ns 276.44 ns]
"bls12_381" - add_polynomial/256
                        time:   [1.0792 us 1.0849 us 1.0905 us]
"bls12_381" - add_polynomial/1024
                        time:   [4.5605 us 4.5762 us 4.5916 us]
"bls12_381" - add_polynomial/4096
                        time:   [23.736 us 23.813 us 23.894 us]
"bls12_381" - add_polynomial/16384
                        time:   [156.44 us 157.11 us 158.31 us]
"bls12_381" - add_polynomial/65536
                        time:   [725.35 us 726.04 us 726.87 us]
"bls12_381" - add_polynomial/262144
                        time:   [3.2842 ms 3.2916 ms 3.3022 ms]

Should we block merging this PR until we update ark_std for this? The overhead of parallelization here is huge at small sizes.

@ValarDragon (Member, Author) commented Dec 8, 2020

with_min_len=64 benchmarks:
(We get similar numbers with with_min_len=4096; a lot of the overhead seems to come just from attempting parallelism.)

"bls12_381" - add_polynomial/1
                        time:   [95.391 us 96.408 us 97.278 us]
                        change: [-9.8377% -8.8342% -7.9433%] (p = 0.00 < 0.05)
"bls12_381" - add_polynomial/4
                        time:   [174.42 us 175.61 us 176.70 us]
                        change: [-11.489% -9.2890% -7.0652%] (p = 0.00 < 0.05)
bls12_381" - add_polynomial/16
                        time:   [223.92 us 225.63 us 227.20 us]
                        change: [-19.895% -18.767% -17.660%] (p = 0.00 < 0.05)
"bls12_381" - add_polynomial/64
                        time:   [212.25 us 216.25 us 219.94 us]
                        change: [+10.884% +13.254% +15.973%] (p = 0.00 < 0.05)
"bls12_381" - add_polynomial/256
                        time:   [244.27 us 247.81 us 251.16 us]
                        change: [+53.078% +55.495% +57.799%] (p = 0.00 < 0.05)
"bls12_381" - add_polynomial/1024
                        time:   [181.77 us 182.84 us 183.86 us]
                        change: [-15.485% -14.604% -13.712%] (p = 0.00 < 0.05)
"bls12_381" - add_polynomial/4096
                        time:   [199.15 us 202.84 us 206.36 us]
                        change: [-36.096% -34.938% -33.571%] (p = 0.00 < 0.05)                                                          
"bls12_381" - add_polynomial/16384
                        time:   [384.39 us 387.86 us 391.42 us]
                        change: [-33.035% -31.715% -30.393%] (p = 0.00 < 0.05)
"bls12_381" - add_polynomial/65536
                        time:   [965.96 us 975.14 us 985.47 us]
                        change: [-18.815% -14.592% -10.073%] (p = 0.00 < 0.05)                                                         
"bls12_381" - add_polynomial/262144
                        time:   [3.1964 ms 3.2494 ms 3.3045 ms]

@ValarDragon (Member, Author) commented Dec 8, 2020

Hrmm, looks like with_min_len doesn't really fix the problem.

We should probably add an ark_std macro cfg_iter_with_min_size!(v, minimum_length) that is equivalent to:

#[cfg(feature = "parallel")]
if v.len() > minimum_length {
    cfg_iter!(v).with_min_len(minimum_length)
} else {
    // sequential code
}

And the minimum length would be really large (like 2^20)?
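Rough sketch of what that macro could look like, loosely following the existing cfg_iter! pattern (hypothetical, not actual ark_std code). Note this only sets a minimum chunk length; actually falling back to fully serial code below a length threshold is trickier, since the parallel and serial branches produce different iterator types.

// Hypothetical sketch, not the actual ark_std implementation.
#[macro_export]
macro_rules! cfg_iter_with_min_size {
    ($e:expr, $min:expr) => {{
        #[cfg(feature = "parallel")]
        let result = $e.par_iter().with_min_len($min);

        #[cfg(not(feature = "parallel"))]
        let result = $e.iter();

        result
    }};
}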

Alternatively, I can just undo the parallelization in this PR and only copy over the polynomial benchmark infrastructure and the bug fixes.

@ValarDragon (Member, Author) commented:
Something seems very wrong with these add_polynomial benchmarks. Polynomial evaluation, which is N multiplications, is 8x faster:

"bls12_381" - add_polynomial/262144
                        time:   [3.1964 ms 3.2494 ms 3.3045 ms]
"bls12_381" - evaluate_polynomial/262144
                        time:   [406.14 us 408.14 us 410.51 us]

@Pratyush (Member) commented Dec 8, 2020

Hmm I think the overhead of parallelization is causing issues here.

I think for these ops it's fine to leave them as not parallel, because they'll be dominated by other stuff anyway.

@ValarDragon (Member, Author) commented:
Yeah, that sounds good to me. I'll remove the parallelization, but keep the bug fixes + benchmarks.

Review thread on the following diff excerpt:

.zip(&self.coeffs)
.for_each(|(a, b)| {
    *a += b;
});
@ValarDragon (Member, Author) commented:

Should I undo changes like this?

@Pratyush (Member) replied:

nah this is fine

@Pratyush changed the title from "Fix degree bug, and add parallel implementations of add/sub/neg" to "Fix degree bug, and add benchmarks of polynomial add/sub/neg" on Dec 8, 2020
@Pratyush merged commit 651a557 into master on Dec 8, 2020
@Pratyush deleted the parallelize_dense_poly_ops branch on December 8, 2020 at 22:08