Fast Coset Extrapolate #212

aszepieniec · 2024-05-15T12:25:15Z

How do you go from a column in the AET to the interpolant's values in a small number of points? INTT and batch-evaluation comes to mind, but in some cases a direct method is faster: fast_coset_extrapolate.

Anticipates faster batch-evaluation. Also: ignore conventional commit tag "support".

Also: add `batch_multiply` and `par_batch_multiply` functions.

It did not work on polynomials with trailing zeros (_i.e._, on low degree terms) because coefficient reversal got rid of the shift. Solution is to keep track of the disappeared shift and put it back in.

coveralls · 2024-05-17T11:07:09Z

coverage: 97.66% (+0.8%) from 96.84%
when pulling 0a71c3e on asz/fast-extrapolate
into 6e2c012 on master.

jan-ferdinand

The interface of ZerofierTree can be simplified very easily, leading to a cleaner design. The interface of some methods in polynomial.rs should be simplified due to their current complexity.

Additionally, code coverage reports a drop in polynomial.rs. This might mean that some functions are not sufficiently tested (anymore?). Unfortunately, the coveralls service seems to be down at the moment. Consequently, I can't easily pin-point what exactly needs improving on that front. Let's look again in a few days.

I have left requests, questions, and suggestions in-line.

twenty-first/benches/extrapolation.rs

twenty-first/src/math/polynomial.rs

jan-ferdinand · 2024-05-21T11:09:43Z

The coveralls service seems to be up again. Here's a summary of uncovered functions:

ZerofierTree::new()
Polynomial::par_interpolate()
Polynomial::par_batch_evaluate()
Polynomial::par_evaluate()
Polynomial::par_batch_coset_extrapolate()
Polynomial::par_batch_fast_coset_extrapolate()
Polynomial::par_batch_naive_coset_extrapolate()

aszepieniec · 2024-05-21T16:25:03Z

@jan-ferdinand Can you help me out with this failing test? I don't know what to do.

error: creating test list failed

Caused by:
  for `bfieldcodec_derive`, command `'D:\a\twenty-first\twenty-first\target\debug\deps\bfieldcodec_derive-af0485fa0f560c8c.exe' --list --format terse` exited with code 0xc0000135: The specified module could not be found. (os error 126)
--- stdout:

--- stderr:

---
Error: Process completed with exit code 1.

jan-ferdinand · 2024-05-21T17:34:36Z

First hunch: did you add a benchmark that writes to stdout as opposed to stderr? Yes: use stderr (through eprint! and eprintln!) instead. No: let me know, I'll take a closer look.

aszepieniec · 2024-05-21T18:54:48Z

First hunch: did you add a benchmark that writes to stdout as opposed to stderr? Yes: use stderr (through eprint! and eprintln!) instead. No: let me know, I'll take a closer look.

No, none of the new (or old) benchmarks write to either stdout or stderr.

jan-ferdinand · 2024-05-21T20:29:47Z

This seems to be an issue in the test runner we use, cargo nextest. They seem to have found a fix, and there is a pull request that is probably minutes away from being merged. This should probably resolve itself in the next few days, and for the immiadate being, there seems to be a workaround through explicitly setting a rustc-related environment variable. I'll tinker with our CI script tomorrrow.

jan-ferdinand

Great improvements! I left a few more nits inline.

I understand this nitpicking can be annoying. However, I want to argue that these review iterations are important work: the easier our code is to understand and maintain, the faster we can keep moving not just now but especially in the future, and the lower our apprehension of improving on existing code.

twenty-first/benches/coset_extrapolation.rs

twenty-first/src/math/b_field_element.rs

twenty-first/src/math/zerofier_tree.rs

twenty-first/src/math/polynomial.rs

Method is faster for no practical parameter sizes and also incorrect.

- Use precomputed reduced zerofiers - Add docstring to interface function. - Precompute NTT-friendly multiple of modulus

- Benchmark only main processing phase - Don't forget to scale after INTT

Also: Dispatcher method into faster coset extrapolate method based on threshold

- Apples-to-apples: extract preprocessing out of intt-then-eval. - Include dispatcher. - Dop barycentric.

- Explain disabled benchmark - Add struct `PreprocessingData` to replace impromptu tuple (later removed) - Use `bfe_vec!` to reduce boilerplate - Remove explicit type annotation to improve readability - Drop needless recast to same type - Use `par_chunks` to avoid calculating chunk endpoints - Drop inferred type from const variable declaration - Drop needless allocation; avoid `into_iter` immediately following `collect_vec` - Delete deprecated code - Drop unnecessary pass-through function - Avoid configuring sample size twice - De-nest imports - De-abbreviate variable name for readability - Add docstrings to public API functions - Remove `pub` marker on local const - Drop always-slower barycentric benchmark (but keep mention of conclusion in comments) - Add link and improve explanation - Use `resize` as opposed to pushing in a loop Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash>

- Add `new` function to `Leaf` and use it - Add `new` function to `Branch` and use it - Derive `PartialEq` for zerofier tree types and use equality operator to match enums - Remove clunky accessors - De-nest imports - Use `Box<_>` instead of `Arc<OnceLock<_>>` The `Arc<OnceLock<_>>` serves two purposes - It introduces one level of indirection to create the recursive data structure. - It is thread-safe. In fact, `Box<_>` can serve the same purposes. The reason why you might want `Arc<OnceLock<_>>` sometimes is because you don't want to set the data and you might not even know when it is going to be set. In this case, we always set the data immediately. It follows that we do not need interior mutability (even once-mutability) and that the data structure is already thread-safe. To be extra careful, this commit adds compile-time tests that the `Sized + Send + Sync + Unpin` traits are implemented for the zerofier tree data types. Co-authored-by: Ferdinand Sauer <ferdinand@neptune.cash>

Simplifies type signatures, even though the involved functions are private. Better for maintenance and readability. Also: - Chunk factors into pairs in `batch_multiply` - Use `bfe_vec!` to reduce boilerplate - Drop explicit type declaration when inferrable - Add test verifying that `batch_multiply` and iterative `multiply` are identical - Remove restriction on number of elements in batch - Short-circuit when batch size is zero - Reduce visibility of internal methods - Disclaim panic condition

- Drop redundant variable prefix - Rename const and drop inferrable type info - Avoid needless clone - Test identity between zerofier tree and polynomial zerofiers - Reduce visibility of field `points` on `Leaf` - Drop needless type info from const literals in tests. - Drop needless type info from const literal in test - Drop explicit type declarations when inferrable - Improve readability of domain generation - Add test: zerofier tree can be empty - Move expected value first in assert statements - Simplify zero-check shortcut - Get magical const from `BFieldElement` implementation - test that const is correct - explain why no batch-inversion happens - Drop explicit type declaration - Use `try_from` instead of `try_into` - Avoid `as`-cast from isize to usize - Avoid unnecessary tuple assignment - Use idiomatic chunk size arithmetic - Happify clippy - Add tests for `par` variants of functions (increases test coverage) - catch edge cases - drop deprecated function - Describe design decision in `batch_multiply` - Drop double specification of sample size - Simplify const correctness test - Integrate reviewer feedback - Add newline - Use `let ... else` instead of `match` - Use `expect` instead of `unwrap_or_else` - Use meaningful variable names - Use `div_ceil` instead of computing chunk size manually. - Avoid needless allocations - Shrink comment widths - Integrate reviewer feedback - Hide import in doctest - Make docstring markdown-friendly - Make docstring markdown-friendly Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash>

aszepieniec added 5 commits May 14, 2024 10:33

bench: Add benchmark target

1c596b0

support: Add ZerofierTree

5be2c43

Anticipates faster batch-evaluation. Also: ignore conventional commit tag "support".

refactor: Separate parallel from sequential zerofier methods

881d441

Also: add `batch_multiply` and `par_batch_multiply` functions.

feat: Use zerofier tree in batch evaluate

4d8d792

fix: Fix structured_multiple_of_degree

4d86736

It did not work on polynomials with trailing zeros (_i.e._, on low degree terms) because coefficient reversal got rid of the shift. Solution is to keep track of the disappeared shift and put it back in.

aszepieniec requested a review from jan-ferdinand May 15, 2024 12:25

jan-ferdinand requested changes May 17, 2024

View reviewed changes

jan-ferdinand reviewed May 22, 2024

View reviewed changes

aszepieniec and others added 13 commits May 22, 2024 17:34

refactor: Drop fast_divide

3e978b6

Method is faster for no practical parameter sizes and also incorrect.

feat: Fast modular interpolate and extrapolate on coset

2c49134

bench: Benchmark coset extrapolate

3b90994

perf: Speed up with precomputation

510c4e7

- Use precomputed reduced zerofiers - Add docstring to interface function. - Precompute NTT-friendly multiple of modulus

bench: Update extrapolation benchmark

4c0ccd5

- Benchmark only main processing phase - Don't forget to scale after INTT

feat: Batch and parallel versions of coset extrapolate

9e7585d

Also: Dispatcher method into faster coset extrapolate method based on threshold

bench: Update benchmark

6abebca

- Apples-to-apples: extract preprocessing out of intt-then-eval. - Include dispatcher. - Dop barycentric.

style: Happify clippy

ba014ce

ci: Work around nextest bug

0a71c3e

aszepieniec force-pushed the asz/fast-extrapolate branch from 4db4726 to 0a71c3e Compare May 22, 2024 15:43

aszepieniec merged commit ea398c0 into master May 22, 2024
5 checks passed

aszepieniec deleted the asz/fast-extrapolate branch May 22, 2024 19:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fast Coset Extrapolate #212

Fast Coset Extrapolate #212

aszepieniec commented May 15, 2024

coveralls commented May 17, 2024 •

edited

Loading

jan-ferdinand left a comment

jan-ferdinand commented May 21, 2024

aszepieniec commented May 21, 2024

jan-ferdinand commented May 21, 2024

aszepieniec commented May 21, 2024

jan-ferdinand commented May 21, 2024 •

edited

Loading

jan-ferdinand left a comment •

edited

Loading

Fast Coset Extrapolate #212

Fast Coset Extrapolate #212

Conversation

aszepieniec commented May 15, 2024

coveralls commented May 17, 2024 • edited Loading

jan-ferdinand left a comment

Choose a reason for hiding this comment

jan-ferdinand commented May 21, 2024

aszepieniec commented May 21, 2024

jan-ferdinand commented May 21, 2024

aszepieniec commented May 21, 2024

jan-ferdinand commented May 21, 2024 • edited Loading

jan-ferdinand left a comment • edited Loading

Choose a reason for hiding this comment

coveralls commented May 17, 2024 •

edited

Loading

jan-ferdinand commented May 21, 2024 •

edited

Loading

jan-ferdinand left a comment •

edited

Loading