-
Notifications
You must be signed in to change notification settings - Fork 585
chore: pippenger int audit #19302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: pippenger int audit #19302
Conversation
…i/pippenger-audit-0
| uint32_t result = lo + (hi << lo_slice_bits); | ||
| return result; | ||
| size_t lo_bit = (hi_bit < slice_size) ? 0 : hi_bit - slice_size; | ||
| return scalar.get_bit_slice_raw(lo_bit, hi_bit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved this method to the field core primitives. it's very efficient and can't be replaced with existing primitives. I tried creating uint256_t and slicing it, it resulted in 4-6% regression
| cached_cost = total_cost; | ||
| target_bit_slice = bit_slice; | ||
| // Cost model: total_cost = num_rounds * (num_points + num_buckets * BUCKET_ACCUMULATION_COST) | ||
| auto compute_cost = [&](size_t bits) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit of renaming here
| const size_t num_rounds = numeric::ceil_div(NUM_BITS_IN_FIELD, bits_per_slice); | ||
| for (size_t i = 0; i < num_rounds; ++i) { | ||
| round_output = evaluate_pippenger_round(msm_data, i, affine_data, bucket_data, round_output, bits_per_slice); | ||
| evaluate_pippenger_round(msm_data, i, affine_data, bucket_data, msm_result, bits_per_slice); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't love void methods with a bunch of params, but i think it's clear that the method modifies msm_result
| result += round_output; | ||
| return result; | ||
| // Accumulate into running total | ||
| accumulate_round_result(msm_accumulator, bucket_result, round_index, bits_per_slice); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoiding duplication here
| * @param scratch_it Current scratch space iterator (modified in place) | ||
| */ | ||
| template <typename Curve> | ||
| __attribute__((always_inline)) static inline void process_single_point( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simply isolating shared logic from consume_point_schedule
| } | ||
|
|
||
| /** | ||
| * @brief Process a pair of points/buckets using branchless conditional moves |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
| MSM<Curve>::BucketAccumulators& bucket_data, | ||
| size_t num_input_points_processed, | ||
| size_t num_queued_affine_points) noexcept | ||
| MSM<Curve>::BucketAccumulators& bucket_data) noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
switched from recursive to iterative structure here
| template <typename Curve> | ||
| void MSM<Curve>::transform_scalar_and_get_nonzero_scalar_indices(std::span<typename Curve::ScalarField> scalars, | ||
| std::vector<uint32_t>& consolidated_indices) noexcept | ||
| std::vector<uint32_t>& nonzero_scalar_indices) noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renaming + added docs
| size_t thread_accumulated_work = 0; | ||
| size_t current_thread_idx = 0; | ||
| for (size_t i = 0; i < num_msms; ++i) { | ||
| BB_ASSERT_DEBUG(i < msm_scalar_indices.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant assert
| const size_t available_thread_work = total_thread_work - thread_accumulated_work; | ||
| const size_t work_to_assign = std::min(available_thread_work, msm_work_remaining); | ||
|
|
||
| work_units[current_thread_idx].push_back(MSMWorkUnit{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit cleaner when using min
| * @return constexpr size_t | ||
| */ | ||
| template <typename Curve> size_t MSM<Curve>::get_optimal_log_num_buckets(const size_t num_points) noexcept | ||
| template <typename Curve> uint32_t MSM<Curve>::get_optimal_log_num_buckets(const size_t num_points) noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slightly improved the clarity here
| template <typename Curve> bool MSM<Curve>::use_affine_trick(const size_t num_points, const size_t num_buckets) noexcept | ||
| { | ||
| if (num_points < 128) { | ||
| if (num_points < AFFINE_TRICK_THRESHOLD) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all constants are now living in the header
| * @param scratch_space coordinate field scratch space needed for batched inversion | ||
| **/ | ||
| template <typename Curve> | ||
| void MSM<Curve>::add_affine_points(typename Curve::AffineElement* points, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deduplicated in the follow-up
| */ | ||
| template <typename Curve> | ||
| typename Curve::Element MSM<Curve>::small_pippenger_low_memory_with_transformed_scalars(MSMData& msm_data) noexcept | ||
| typename Curve::Element MSM<Curve>::jacobian_pippenger_with_transformed_scalars(MSMData& msm_data) noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems clearer this way
|
|
||
| for (uint32_t round = 0; round < num_rounds; ++round) { | ||
| // Populate buckets using Jacobian accumulation | ||
| for (size_t i = 0; i < size; ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically unfolded evaluate_small_pippenger_round, the impl is pretty concise
| for (size_t i = 0; i < num_rounds; ++i) { | ||
| round_output = evaluate_pippenger_round(msm_data, i, affine_data, bucket_data, round_output, bits_per_slice); | ||
| } | ||
| // Per-call allocation for WASM compatibility (thread_local causes issues in WASM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure it was thread_local specifically or its combination with accidentally disabled multi-threading in Pippenger. But ivc integration test were failing, seemed worth documenting
| } else { | ||
| bucket_data.buckets[bucket_index] = points[nonzero_scalar_indices[i]]; | ||
| bucket_data.bucket_exists.set(bucket_index, true); | ||
| for (uint32_t round = 0; round < num_rounds; ++round) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, unfolded pippenger_round, just to be able to see the entire algo in one loop
| MSM<Curve>::BucketAccumulators& bucket_data, | ||
| size_t num_input_points_processed, | ||
| size_t num_queued_affine_points) noexcept | ||
| void MSM<Curve>::batch_accumulate_points_into_buckets(std::span<const uint64_t> point_schedule, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- somewhat more math-oriented name, makes it easier to match with the abstract description
- unfolded recursion
- both loops share bucket/point processing logic
| // unfortnately we need to remove const on this data type to prevent duplicating _scalars (which is typically | ||
| // large) We need to convert `_scalars` out of montgomery form for the MSM. We then convert the scalars back | ||
| // into Montgomery form at the end of the algorithm. NOLINTNEXTLINE(cppcoreguidelines-pro-type-const-cast) | ||
| // TODO(https://github.com/AztecProtocol/barretenberg/issues/1449): handle const correctness. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to make it cleaner but couldn't find something that would look way cleaner. Maybe can take a look in a followup PR. But added tests checking that batch msm preserves constness.
…i/pippenger-audit-0
barretenberg/cpp/src/barretenberg/ecc/fields/field_declarations.hpp
Outdated
Show resolved
Hide resolved
ledwards2225
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM - some great cleanup. Few issues with the get_bit_slice_raw method I think and some other minor comments
barretenberg/cpp/src/barretenberg/ecc/scalar_multiplication/bitvector.hpp
Outdated
Show resolved
Hide resolved
barretenberg/cpp/src/barretenberg/ecc/fields/field_declarations.hpp
Outdated
Show resolved
Hide resolved
barretenberg/cpp/src/barretenberg/ecc/fields/field_declarations.hpp
Outdated
Show resolved
Hide resolved
barretenberg/cpp/src/barretenberg/ecc/fields/field_declarations.hpp
Outdated
Show resolved
Hide resolved
| AffineElement result = scalar_multiplication::MSM<Curve>::msm(points, scalar_span); | ||
|
|
||
| AffineElement expected(base_point * scalar_sum); | ||
| EXPECT_EQ(result, expected); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the fact that it works expected even though handle_edge_cases is false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wasn't using enough points to trigger pippenger, actually, thanks for catching this
|
|
||
| ### Point Scheduling (Affine Variant Only) | ||
|
|
||
| Entries are packed as `(point_index << 32) | bucket_index` and sorted via **in-place MSD radix sort**. Sorting groups points by bucket, enabling efficient batch processing. The sort also detects entries with `bucket_index == 0` during the final radix pass, allowing zero-bucket entries to be skipped without a separate scan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's worth stressing that as c = 8, this is effectively appending the bucket index at the end of the index we use to pack the points
federicobarbacovi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Just some minor comments
barretenberg/cpp/src/barretenberg/ecc/scalar_multiplication/scalar_multiplication.cpp
Outdated
Show resolved
Hide resolved
BEGIN_COMMIT_OVERRIDE feat: support JSON input files for bb verify command (#19800) fix: update bootstrap.sh to use new JSON field names chore: Update `index.js` so that `HAS_ZK` and `PUBLIC_INPUTS` variables must always be set in tests (#19884) chore: pippenger int audit (#19302) chore: deduplicate batch affine addition trick (#19788) chore: transcript+codec+poseidon2 fixes (#19419) chore!: explicitly constrain inputs and intermediate witnesses (#19826) fix: exclude nlohmann/json from WASM builds in json_output.hpp chore: translator circuit builder and flavor audit (#19798) Revert "fix: exclude nlohmann/json from WASM builds in json_output.hpp" Revert "feat: support JSON input files for bb verify command (#19800)" Revert "fix: update bootstrap.sh to use new JSON field names" END_COMMIT_OVERRIDE
clean up + docs+ a couple of edge case tests Closes AztecProtocol/barretenberg#486 --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
clean up + docs+ a couple of edge case tests
Closes AztecProtocol/barretenberg#486