Conversation

@iakovenkos (Contributor) commented on Jan 5, 2026:

Clean up + docs + a couple of edge-case tests

Closes AztecProtocol/barretenberg#486

uint32_t result = lo + (hi << lo_slice_bits);
return result;
size_t lo_bit = (hi_bit < slice_size) ? 0 : hi_bit - slice_size;
return scalar.get_bit_slice_raw(lo_bit, hi_bit);
@iakovenkos (Contributor Author) commented on Jan 16, 2026:
Moved this method to the field core primitives. It's very efficient and can't be replaced with existing primitives. I tried creating a uint256_t and slicing it; that resulted in a 4-6% regression.
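As background for the slice-extraction snippet above (`lo + (hi << lo_slice_bits)`), here is a minimal sketch of extracting a bit slice from a multi-limb scalar, the kind of operation a `get_bit_slice_raw`-style primitive performs. The function name, limb layout, and the 32-bit slice assumption are illustrative, not Barretenberg's actual implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: extract bits [lo_bit, hi_bit) from a 256-bit scalar
// stored as four little-endian 64-bit limbs. When the slice straddles a limb
// boundary, the low part (from the first limb) and the high part (from the
// next limb) are recombined, mirroring `lo + (hi << lo_slice_bits)`.
static uint32_t get_bit_slice(const std::array<uint64_t, 4>& limbs, size_t lo_bit, size_t hi_bit)
{
    const size_t limb_idx = lo_bit / 64;
    const size_t bit_offset = lo_bit % 64;
    const size_t slice_bits = hi_bit - lo_bit; // assumed <= 32

    uint64_t slice = limbs[limb_idx] >> bit_offset;
    const size_t lo_slice_bits = 64 - bit_offset;
    if (lo_slice_bits < slice_bits && limb_idx + 1 < limbs.size()) {
        // The slice spans two limbs: splice the high part in above the low part.
        slice |= limbs[limb_idx + 1] << lo_slice_bits;
    }
    const uint64_t mask = (slice_bits >= 64) ? ~0ULL : ((1ULL << slice_bits) - 1);
    return static_cast<uint32_t>(slice & mask);
}
```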

cached_cost = total_cost;
target_bit_slice = bit_slice;
// Cost model: total_cost = num_rounds * (num_points + num_buckets * BUCKET_ACCUMULATION_COST)
auto compute_cost = [&](size_t bits) {
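The cost model quoted in the snippet above lends itself to a small search over candidate slice widths. The sketch below shows that shape; the field size, the `BUCKET_ACCUMULATION_COST` constant, and the helper names are illustrative assumptions, not Barretenberg's actual values.

```cpp
#include <cassert>
#include <cstddef>

// Hedged sketch of the cost model: for slice width `bits`, Pippenger runs
// ceil(NUM_BITS_IN_FIELD / bits) rounds, and each round costs one bucket-add
// per point plus a per-bucket accumulation term. Constants are placeholders.
constexpr size_t NUM_BITS_IN_FIELD = 254;
constexpr size_t BUCKET_ACCUMULATION_COST = 2;

static size_t compute_cost(size_t num_points, size_t bits)
{
    const size_t num_rounds = (NUM_BITS_IN_FIELD + bits - 1) / bits;
    const size_t num_buckets = size_t{ 1 } << bits;
    return num_rounds * (num_points + num_buckets * BUCKET_ACCUMULATION_COST);
}

// Pick the slice width minimizing the modeled cost over a small range.
static size_t best_bit_slice(size_t num_points)
{
    size_t best_bits = 1;
    size_t best_cost = compute_cost(num_points, 1);
    for (size_t bits = 2; bits <= 20; ++bits) {
        const size_t cost = compute_cost(num_points, bits);
        if (cost < best_cost) {
            best_cost = cost;
            best_bits = bits;
        }
    }
    return best_bits;
}
```

As expected from the model, larger MSMs favor wider slices: more points amortize the per-bucket accumulation cost of having more buckets.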
@iakovenkos (Contributor Author):

A bit of renaming here

const size_t num_rounds = numeric::ceil_div(NUM_BITS_IN_FIELD, bits_per_slice);
for (size_t i = 0; i < num_rounds; ++i) {
round_output = evaluate_pippenger_round(msm_data, i, affine_data, bucket_data, round_output, bits_per_slice);
evaluate_pippenger_round(msm_data, i, affine_data, bucket_data, msm_result, bits_per_slice);
@iakovenkos (Contributor Author):

I don't love void methods with a bunch of params, but I think it's clear that the method modifies msm_result.

result += round_output;
return result;
// Accumulate into running total
accumulate_round_result(msm_accumulator, bucket_result, round_index, bits_per_slice);
@iakovenkos (Contributor Author):

avoiding duplication here
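The `accumulate_round_result` call in the snippet above folds one round's bucket sum into the running MSM total. In textbook Pippenger, the accumulator is shifted up by `bits_per_slice` (i.e., doubled that many times over the curve group) before the next, less significant round is added. A minimal integer-arithmetic sketch of that shape, with plain integers standing in for curve points and all names hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hedged sketch: each round's output is modeled as an integer; the running
// accumulator is shifted by bits_per_slice before the next round is folded
// in. Over a group, the shift would be bits_per_slice point doublings.
static uint64_t accumulate_rounds(const std::vector<uint64_t>& round_outputs, size_t bits_per_slice)
{
    uint64_t accumulator = 0;
    for (uint64_t round_output : round_outputs) { // most significant round first
        accumulator <<= bits_per_slice;           // "double" bits_per_slice times
        accumulator += round_output;
    }
    return accumulator;
}
```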

* @param scratch_it Current scratch space iterator (modified in place)
*/
template <typename Curve>
__attribute__((always_inline)) static inline void process_single_point(
@iakovenkos (Contributor Author):

simply isolating shared logic from consume_point_schedule

}

/**
* @brief Process a pair of points/buckets using branchless conditional moves
@iakovenkos (Contributor Author):

Same here
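For readers unfamiliar with the "branchless conditional moves" mentioned in the doc comment above: the idea is to expand a condition into an all-ones/all-zeros mask so both inputs are read unconditionally and no data-dependent branch is emitted. A hedged minimal sketch of that style of select (not the library's actual helper):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of a branchless select (cmov): the boolean is widened
// to a mask of 0 or ~0, and the result is blended from both operands with
// bitwise ops, avoiding any data-dependent branch.
static uint64_t cmov(uint64_t a, uint64_t b, bool take_b)
{
    const uint64_t mask = 0ULL - static_cast<uint64_t>(take_b); // 0 or all-ones
    return (a & ~mask) | (b & mask);
}
```

This pattern matters in hot loops like bucket accumulation because mispredicted branches on data-dependent conditions are expensive.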

MSM<Curve>::BucketAccumulators& bucket_data,
size_t num_input_points_processed,
size_t num_queued_affine_points) noexcept
MSM<Curve>::BucketAccumulators& bucket_data) noexcept
@iakovenkos (Contributor Author):

switched from recursive to iterative structure here

template <typename Curve>
void MSM<Curve>::transform_scalar_and_get_nonzero_scalar_indices(std::span<typename Curve::ScalarField> scalars,
std::vector<uint32_t>& consolidated_indices) noexcept
std::vector<uint32_t>& nonzero_scalar_indices) noexcept
@iakovenkos (Contributor Author):

renaming + added docs

size_t thread_accumulated_work = 0;
size_t current_thread_idx = 0;
for (size_t i = 0; i < num_msms; ++i) {
BB_ASSERT_DEBUG(i < msm_scalar_indices.size());
@iakovenkos (Contributor Author):

redundant assert

const size_t available_thread_work = total_thread_work - thread_accumulated_work;
const size_t work_to_assign = std::min(available_thread_work, msm_work_remaining);

work_units[current_thread_idx].push_back(MSMWorkUnit{
@iakovenkos (Contributor Author):

A bit cleaner when using min
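The `std::min`-based split in the snippet above can be sketched end to end: each MSM's remaining work is carved into chunks no larger than the current thread's remaining capacity, advancing to the next thread as threads fill up. All names below are illustrative; only the `min()` pattern mirrors the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hedged sketch: greedily assign each MSM's work across threads. Each chunk
// is min(thread's remaining capacity, MSM's remaining work), so no thread
// ever exceeds total_thread_work and no MSM work is dropped.
static std::vector<std::vector<size_t>> assign_work(const std::vector<size_t>& msm_sizes,
                                                    size_t total_thread_work,
                                                    size_t num_threads)
{
    std::vector<std::vector<size_t>> work_units(num_threads);
    size_t thread_accumulated_work = 0;
    size_t current_thread_idx = 0;
    for (size_t msm_work_remaining : msm_sizes) {
        while (msm_work_remaining > 0) {
            const size_t available_thread_work = total_thread_work - thread_accumulated_work;
            const size_t work_to_assign = std::min(available_thread_work, msm_work_remaining);
            work_units[current_thread_idx].push_back(work_to_assign);
            msm_work_remaining -= work_to_assign;
            thread_accumulated_work += work_to_assign;
            if (thread_accumulated_work == total_thread_work && current_thread_idx + 1 < num_threads) {
                thread_accumulated_work = 0; // this thread is full, move on
                ++current_thread_idx;
            }
        }
    }
    return work_units;
}
```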

* @return constexpr size_t
*/
template <typename Curve> size_t MSM<Curve>::get_optimal_log_num_buckets(const size_t num_points) noexcept
template <typename Curve> uint32_t MSM<Curve>::get_optimal_log_num_buckets(const size_t num_points) noexcept
@iakovenkos (Contributor Author):

slightly improved the clarity here

template <typename Curve> bool MSM<Curve>::use_affine_trick(const size_t num_points, const size_t num_buckets) noexcept
{
if (num_points < 128) {
if (num_points < AFFINE_TRICK_THRESHOLD) {
@iakovenkos (Contributor Author):

all constants are now living in the header

* @param scratch_space coordinate field scratch space needed for batched inversion
**/
template <typename Curve>
void MSM<Curve>::add_affine_points(typename Curve::AffineElement* points,
@iakovenkos (Contributor Author) commented on Jan 20, 2026:

Deduplicated in the follow-up

*/
template <typename Curve>
typename Curve::Element MSM<Curve>::small_pippenger_low_memory_with_transformed_scalars(MSMData& msm_data) noexcept
typename Curve::Element MSM<Curve>::jacobian_pippenger_with_transformed_scalars(MSMData& msm_data) noexcept
@iakovenkos (Contributor Author):

Seems clearer this way


for (uint32_t round = 0; round < num_rounds; ++round) {
// Populate buckets using Jacobian accumulation
for (size_t i = 0; i < size; ++i) {
@iakovenkos (Contributor Author):

Basically unfolded evaluate_small_pippenger_round; the impl is pretty concise.
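The unfolded round structure (populate buckets, then fold them) can be sketched with integers standing in for Jacobian points. This is a hedged model of the standard Pippenger round, not the library's code; the running-sum fold is the classic trick that makes bucket `b` contribute `b` times without scalar multiplications.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hedged sketch of one Pippenger round, uint64_t standing in for curve
// points: each point is added into the bucket selected by its scalar slice,
// then buckets are folded highest-first with a running sum so that the
// result equals sum over b of b * buckets[b].
static uint64_t evaluate_round(const std::vector<uint64_t>& points,
                               const std::vector<uint32_t>& slices,
                               size_t num_buckets)
{
    std::vector<uint64_t> buckets(num_buckets, 0);
    for (size_t i = 0; i < points.size(); ++i) {
        if (slices[i] != 0) { // slice 0 contributes nothing
            buckets[slices[i]] += points[i];
        }
    }
    uint64_t running_sum = 0;
    uint64_t result = 0;
    for (size_t b = num_buckets; b-- > 1;) {
        running_sum += buckets[b]; // running_sum = buckets[b] + buckets[b+1] + ...
        result += running_sum;     // adds buckets[b] once per bucket index >= b
    }
    return result;
}
```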

for (size_t i = 0; i < num_rounds; ++i) {
round_output = evaluate_pippenger_round(msm_data, i, affine_data, bucket_data, round_output, bits_per_slice);
}
// Per-call allocation for WASM compatibility (thread_local causes issues in WASM)
@iakovenkos (Contributor Author):

Not sure whether it was thread_local specifically or its combination with accidentally disabled multi-threading in Pippenger, but the IVC integration tests were failing; seemed worth documenting.

} else {
bucket_data.buckets[bucket_index] = points[nonzero_scalar_indices[i]];
bucket_data.bucket_exists.set(bucket_index, true);
for (uint32_t round = 0; round < num_rounds; ++round) {
@iakovenkos (Contributor Author):

again, unfolded pippenger_round, just to be able to see the entire algo in one loop

MSM<Curve>::BucketAccumulators& bucket_data,
size_t num_input_points_processed,
size_t num_queued_affine_points) noexcept
void MSM<Curve>::batch_accumulate_points_into_buckets(std::span<const uint64_t> point_schedule,
@iakovenkos (Contributor Author):

  1. somewhat more math-oriented name, makes it easier to match with the abstract description
  2. unfolded recursion
  3. both loops share bucket/point processing logic

// Unfortunately we need to remove const on this data type to prevent duplicating _scalars (which is typically
// large). We need to convert `_scalars` out of Montgomery form for the MSM, then convert the scalars back
// into Montgomery form at the end of the algorithm.
// NOLINTNEXTLINE(cppcoreguidelines-pro-type-const-cast)
// TODO(https://github.com/AztecProtocol/barretenberg/issues/1449): handle const correctness.
@iakovenkos (Contributor Author):

Tried to make it cleaner but couldn't find anything that looked significantly better; maybe I can take another look in a follow-up PR. But I added tests checking that batch MSM preserves constness.
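One idiom that keeps the convert-out/convert-back contract honest is an RAII guard that restores the data on every exit path. The sketch below uses a toy "form" (XOR by a constant) in place of real Montgomery conversion; the struct and function names are hypothetical, not Barretenberg's API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hedged sketch: a guard converts scalars out of their storage form on
// construction and converts them back in its destructor, so a logically-const
// caller observes unchanged data even if the algorithm returns early. A toy
// XOR stands in for Montgomery conversion.
struct FormGuard {
    std::vector<uint64_t>& scalars;
    static constexpr uint64_t TOY_FORM = 0xABCDULL;

    explicit FormGuard(std::vector<uint64_t>& s)
        : scalars(s)
    {
        for (auto& x : scalars) {
            x ^= TOY_FORM; // "convert out of Montgomery form"
        }
    }
    ~FormGuard()
    {
        for (auto& x : scalars) {
            x ^= TOY_FORM; // "convert back" when the algorithm finishes
        }
    }
};

static uint64_t sum_in_raw_form(std::vector<uint64_t>& scalars)
{
    FormGuard guard{ scalars }; // scalars are in "raw" form inside this scope
    uint64_t total = 0;
    for (uint64_t x : scalars) {
        total += x;
    }
    return total;
}
```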

@iakovenkos removed the `ci-full` (Run all master checks) label on Jan 21, 2026
@ledwards2225 (Contributor) left a comment:

Overall LGTM - some great cleanup. A few issues with the get_bit_slice_raw method, I think, and some other minor comments.

AffineElement result = scalar_multiplication::MSM<Curve>::msm(points, scalar_span);

AffineElement expected(base_point * scalar_sum);
EXPECT_EQ(result, expected);
Contributor:

Is the fact that it works expected even though handle_edge_cases is false?

@iakovenkos (Contributor Author):

Wasn't using enough points to trigger Pippenger, actually; thanks for catching this.


### Point Scheduling (Affine Variant Only)

Entries are packed as `(point_index << 32) | bucket_index` and sorted via **in-place MSD radix sort**. Sorting groups points by bucket, enabling efficient batch processing. The sort also detects entries with `bucket_index == 0` during the final radix pass, allowing zero-bucket entries to be skipped without a separate scan.
Contributor:

Maybe it's worth stressing that as c = 8, this is effectively appending the bucket index at the end of the index we use to pack the points
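The packing described above is simple enough to sketch directly: with `c = 8` the bucket index occupies the low bits, so sorting the packed words groups entries by bucket while carrying the point index along. Helper names below are illustrative, not the library's.

```cpp
#include <cassert>
#include <cstdint>

// Hedged sketch of the schedule packing: (point_index << 32) | bucket_index.
// With c = 8 the bucket index sits in the low byte, effectively appended to
// the point index, so an MSD radix sort on the packed word groups by bucket.
static uint64_t pack_schedule_entry(uint32_t point_index, uint32_t bucket_index)
{
    return (static_cast<uint64_t>(point_index) << 32) | bucket_index;
}

static uint32_t schedule_point_index(uint64_t entry)
{
    return static_cast<uint32_t>(entry >> 32);
}

static uint32_t schedule_bucket_index(uint64_t entry)
{
    return static_cast<uint32_t>(entry & 0xFFFFFFFFULL);
}
```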

@federicobarbacovi (Contributor) left a comment:

Great work! Just some minor comments

@iakovenkos merged commit 644847c into merge-train/barretenberg on Jan 23, 2026
8 checks passed
@iakovenkos deleted the si/pippenger-audit-0 branch on January 23, 2026 at 15:37
github-merge-queue bot pushed a commit that referenced this pull request Jan 27, 2026
BEGIN_COMMIT_OVERRIDE
feat: support JSON input files for bb verify command (#19800)
fix: update bootstrap.sh to use new JSON field names
chore: Update `index.js` so that `HAS_ZK` and `PUBLIC_INPUTS` variables
must always be set in tests (#19884)
chore: pippenger int audit (#19302)
chore: deduplicate batch affine addition trick (#19788)
chore: transcript+codec+poseidon2 fixes (#19419)
chore!: explicitly constrain inputs and intermediate witnesses (#19826)
fix: exclude nlohmann/json from WASM builds in json_output.hpp
chore: translator circuit builder and flavor audit (#19798)
Revert "fix: exclude nlohmann/json from WASM builds in json_output.hpp"
Revert "feat: support JSON input files for bb verify command (#19800)"
Revert "fix: update bootstrap.sh to use new JSON field names"
END_COMMIT_OVERRIDE
danielntmd pushed a commit that referenced this pull request Jan 27, 2026
clean up + docs + a couple of edge case tests

Closes AztecProtocol/barretenberg#486

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>