Add polynomial benchmark infra, switch poly eval to horners methods #114

ValarDragon · 2020-12-06T22:59:28Z

Description

This adds infrastructure for benchmarking polynomial operations across polynomial degrees. It also switches polynomial evaluation to Horner's method, which makes evaluation run in constant memory and gives a 2x speed improvement on my laptop. Benchmark results for serial polynomial evaluation with Horner's method (with outlier messages trimmed):

"bls12_381" - evaluate_polynomial/32768
                        time:   [962.88 us 965.60 us 968.56 us]
                        change: [-52.089% -50.962% -50.033%] (p = 0.00 < 0.05)
                        Performance has improved.

 "bls12_381" - evaluate_polynomial/65536
                        time:   [1.9306 ms 1.9472 ms 1.9699 ms]
                        change: [-55.475% -55.072% -54.638%] (p = 0.00 < 0.05)
                        Performance has improved.

"bls12_381" - evaluate_polynomial/131072
                        time:   [3.8705 ms 3.8858 ms 3.9019 ms]
                        change: [-60.158% -59.669% -59.202%] (p = 0.00 < 0.05)
                        Performance has improved.
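For reference, the idea behind Horner's method can be sketched as below. This is a minimal illustration, not the code from this PR: the toy modulus `P` and the helpers `mul_mod`, `add_mod`, and `horner_evaluate` are assumptions that stand in for a real field type such as the BLS12-381 scalar field.

```rust
/// Toy prime modulus (2^61 - 1). Assumption: stands in for a real field.
const P: u64 = (1 << 61) - 1;

/// Multiply two residues modulo P, widening to u128 to avoid overflow.
fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Add two residues modulo P.
fn add_mod(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}

/// Evaluate sum(coeffs[i] * x^i) with Horner's method.
/// `coeffs[i]` is the coefficient of x^i (lowest degree first).
/// Only one accumulator is kept, so no vector of powers is allocated.
fn horner_evaluate(coeffs: &[u64], x: u64) -> u64 {
    coeffs
        .iter()
        .rev()
        .fold(0, |acc, &c| add_mod(mul_mod(acc, x), c))
}
```

Each step costs one multiplication and one addition, and the only state carried between steps is the accumulator, which is where the constant-memory property comes from.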

Additionally, this PR adds a fully parallelized implementation of single-point evaluation.

closes: #85

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

Targeted PR against correct branch (main)
Linked to Github issue with discussion and accepted design OR have an explanation in the PR that describes this work.
Wrote unit tests - existing tests cover correctness
Updated relevant documentation in the code
Added a relevant changelog entry to the Pending section in CHANGELOG.md
Re-reviewed Files changed in the Github PR explorer

ValarDragon · 2020-12-06T23:11:01Z

Ah, I didn't realize cfg_into_iter handled parallelization, and I forgot to test with the parallelization feature. I'll add parallelization in.

ValarDragon · 2020-12-07T00:37:53Z

The parallel implementation is even more of a speedup! On my 16-core laptop, parallel Horner's method is a 5x speedup over the current parallel method.

"bls12_381" - evaluate_polynomial/32768
                        time:   [202.18 us 204.50 us 207.14 us]
                        change: [-79.199% -78.846% -78.385%] (p = 0.00 < 0.05)
                        Performance has improved.
"bls12_381" - evaluate_polynomial/65536
                        time:   [381.50 us 386.26 us 391.48 us]
                        change: [-80.146% -79.783% -79.437%] (p = 0.00 < 0.05)
                        Performance has improved.
 "bls12_381" - evaluate_polynomial/131072
                        time:   [738.91 us 746.82 us 755.92 us]
                        change: [-80.641% -80.355% -79.991%] (p = 0.00 < 0.05)
                        Performance has improved.
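One way to parallelize Horner's method is to split the coefficients into chunks, run Horner's method on each chunk independently, scale chunk `i` by `x^(i * chunk_size)`, and add the results. The sketch below illustrates that idea only. It uses `std::thread::scope` so that it is self-contained, whereas the PR discussion refers to `par_chunks`. The modulus `P` and the names `pow_mod`, `horner_chunk`, and `parallel_horner` are hypothetical.

```rust
use std::thread;

/// Toy prime modulus (2^61 - 1). Assumption: stands in for a real field.
const P: u64 = (1 << 61) - 1;

fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

fn add_mod(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}

/// Square-and-multiply exponentiation modulo P.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base);
        }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

/// Serial Horner's method over one chunk (lowest-degree coefficient first).
fn horner_chunk(coeffs: &[u64], x: u64) -> u64 {
    coeffs
        .iter()
        .rev()
        .fold(0, |acc, &c| add_mod(mul_mod(acc, x), c))
}

/// Evaluate the polynomial at `x` using up to `num_threads` scoped threads.
fn parallel_horner(coeffs: &[u64], x: u64, num_threads: usize) -> u64 {
    if coeffs.is_empty() {
        return 0;
    }
    let threads = num_threads.max(1);
    // Ceiling division so that at most `threads` chunks are produced.
    let chunk_size = (coeffs.len() + threads - 1) / threads;
    thread::scope(|s| {
        // Spawn every worker before joining any of them.
        let handles: Vec<_> = coeffs
            .chunks(chunk_size)
            .enumerate()
            .map(|(i, chunk)| {
                s.spawn(move || {
                    // Chunk i holds the coefficients of x^(i * chunk_size) and up.
                    let shift = pow_mod(x, (i * chunk_size) as u64);
                    mul_mod(horner_chunk(chunk, x), shift)
                })
            })
            .collect();
        handles
            .into_iter()
            .fold(0, |acc, h| add_mod(acc, h.join().unwrap()))
    })
}
```

For example, `parallel_horner(&[1, 2, 3], 2, 2)` splits the input into `[1, 2]` and `[3]`, evaluates them to 5 and 3, scales the second by `2^2`, and returns 17, the same value serial evaluation gives.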

ValarDragon · 2020-12-07T01:24:20Z

Looking into why the parallel speedup was so high, it turns out that the prior polynomial.evaluate was only half parallelized: the powers {x^i} were all computed sequentially, and only the multiplication by the coefficients and the summation were parallelized.
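The shape of that earlier approach can be sketched roughly as follows. This is a reconstruction from the description above, not the removed code, and it runs serially; the modulus `P` and the name `evaluate_via_powers` are hypothetical.

```rust
/// Toy prime modulus (2^61 - 1). Assumption: stands in for a real field.
const P: u64 = (1 << 61) - 1;

fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

fn add_mod(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}

/// Evaluate by first materializing all powers of x, then combining.
fn evaluate_via_powers(coeffs: &[u64], x: u64) -> u64 {
    // Step 1: build [1, x, x^2, ...]. Each entry depends on the previous
    // one, so this loop is inherently sequential and allocates O(n) memory.
    let mut powers = Vec::with_capacity(coeffs.len());
    let mut cur = 1;
    for _ in 0..coeffs.len() {
        powers.push(cur);
        cur = mul_mod(cur, x);
    }
    // Step 2: multiply by the coefficients and sum. Per the comment above,
    // this is the only step that was parallelized before.
    coeffs
        .iter()
        .zip(&powers)
        .fold(0, |acc, (&c, &p)| add_mod(acc, mul_mod(c, p)))
}
```

Because step 1 performs about as many field multiplications as step 2, leaving it serial caps the achievable parallel speedup, which is consistent with the larger gain reported for the parallel benchmark.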

poly/src/polynomial/univariate/dense.rs

ValarDragon · 2020-12-07T20:09:39Z

Locally, using par_chunks resulted in a slowdown, but that could be due to system noise. I'll check on the benchmark server.

ValarDragon · 2020-12-07T20:47:45Z

On the benchmark server there was essentially no speed difference, so the par_chunks impl is good to use.


Pratyush · 2020-12-07T20:57:22Z

Awesome, thanks! Final point: should we extract the common core of both versions of the internal_evaluate algorithms into a separate method?

ValarDragon · 2020-12-07T21:00:12Z

Sure

Pratyush · 2020-12-07T21:07:12Z

Oh and finally, do you think if we use cfg_par_chunks we can unify the two versions? (if you'd like to keep it separate for clarity that's fine too)

ValarDragon · 2020-12-07T21:13:29Z

We'd have to add a cfg_par_chunks!(vector, min_num_elements_per_thread) to ark-std, but if we did that we probably could unify them. (Same for batch_inversion).

I'd prefer to merge this as is, and then update the usages here after such an update to ark-std.
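A macro along those lines might look like the sketch below. It is purely hypothetical: the name `cfg_par_chunks_sketch!` is invented here, the second argument is treated as a plain chunk size rather than a minimum number of elements per thread, and the `parallel` branch assumes rayon's `par_chunks` is in scope when that feature is enabled.

```rust
/// Hypothetical sketch of a chunking macro that switches on a `parallel`
/// feature. Not an existing ark-std macro.
macro_rules! cfg_par_chunks_sketch {
    ($e:expr, $size:expr) => {{
        // With the feature on, use the parallel chunk iterator
        // (assumes `rayon::prelude::*` has been imported by the caller).
        #[cfg(feature = "parallel")]
        let result = $e.par_chunks($size);

        // With the feature off, fall back to the standard serial chunks.
        #[cfg(not(feature = "parallel"))]
        let result = $e.chunks($size);

        result
    }};
}

/// Example caller: sum each chunk, whichever iterator the macro selected.
fn chunk_sums(values: &[u64], chunk_size: usize) -> Vec<u64> {
    cfg_par_chunks_sketch!(values, chunk_size)
        .map(|chunk| chunk.iter().sum::<u64>())
        .collect()
}
```

With a macro like this, a single `internal_evaluate` body could be written once against the returned iterator, since both iterator types offer `map` and `collect`.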

ValarDragon added 5 commits December 6, 2020 16:35

Add benchmark for dense polynomial evaluate, rename fft bench

350f826

Use horners method for polynomial evaluation

9fa6843

fix bug

e6d4b71

Fix lint

60db7d5

Add changelog

5dea19c

Add parallel horners method support

0458855

Fix lint, add some more comments

8b74f1c

ValarDragon requested a review from Pratyush December 7, 2020 00:42

Move import to feature gate

4ce7d11

Pratyush reviewed Dec 7, 2020


poly/src/polynomial/univariate/dense.rs

Refactor evaluate to use par_chunks

e5888ad

ValarDragon and others added 3 commits December 7, 2020 14:48

Merge branch 'master' into bench_dense_poly

1b04779

fix style

3e6c0e6

Merge branch 'bench_dense_poly' of github.com:arkworks-rs/algebra int…

0108c19

…o bench_dense_poly

Refactor common logic

4be36e4

Pratyush approved these changes Dec 7, 2020


ValarDragon merged commit 70ebfa6 into master Dec 7, 2020

ValarDragon deleted the bench_dense_poly branch December 7, 2020 21:25
