
fix(types): clone participants bitlist when building aggregation bits #875

Merged: anshalshukla merged 5 commits into blockblaz:main from shariqnaiyer:fix/shariqnaiyer/aggregation-bits on May 13, 2026


@shariqnaiyer shariqnaiyer commented May 13, 2026

When running a devnet with ream, zeam, and ethlambda in 2 subnets, we get the following error, and ethlambda rejects zeam's blocks at certain slots, which leads to no finality in ethlambda during these devnets.

  ethlambda_s0_p0.log:
  WARN ethlambda_blockchain: Failed to process block slot=7  proposer=1 block_root=6526aa5a parent_root=fa625e7b err=Aggregated proof participants don't match attestation aggregation bits
  WARN ethlambda_blockchain: Failed to process block slot=16 proposer=4 block_root=a5631b2f parent_root=5884dc6c err=Aggregated proof participants don't match attestation aggregation bits

Some AI context:

  When zeam builds an aggregated attestation, it has to produce two parallel pieces of data:

  - attestation.aggregation_bits — lives in the block body, marks which validators are covered.
  - proof.participants — lives in the signature group on the signed block, marks the same set.

  Both are SSZ Bitlists. They are supposed to represent the same set of validators, and ethlambda (and the spec) require them to be
  byte-for-byte equal when verifying a block:

  // ethlambda/crates/blockchain/src/store.rs:1187
  if attestation.aggregation_bits != aggregated_proof.participants {
      return Err(StoreError::ParticipantsMismatch);
  }

  zeam built aggregation_bits from proof.participants like this (in block.zig:compactSingleProof and compactMultiProofWithPrep):

  var att_bits = try attestation.AggregationBits.init(allocator);  // length 0
  for (0..cloned_proof.participants.len()) |i| {
      if (cloned_proof.participants.get(i) catch false) {
          try attestation.aggregationBitsSet(&att_bits, i, true);  // grows to i+1
      }
  }

  aggregationBitsSet only extends the bitlist up to index + 1. So the resulting att_bits.len() equals (highest set index) + 1, while
  proof.participants.len() is whatever the FFI aggregate routine returned — which can include trailing zero bits.
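
To make the length difference concrete, here is a minimal Rust sketch that models a bitlist as a `Vec<bool>`. The names `Bitlist`, `set_bit`, `rebuild_from_set_bits`, and `clone_bits` are hypothetical stand-ins, not zeam's API; `set_bit` only assumes the grow-to-`index + 1` behaviour described above.

```rust
/// Toy stand-in for an SSZ Bitlist: the Vec's length is the bitlist's length.
/// (Hypothetical model, not zeam's actual type.)
type Bitlist = Vec<bool>;

/// Mirrors the described behaviour of aggregationBitsSet:
/// grow to `index + 1` if needed, then set the bit.
fn set_bit(bits: &mut Bitlist, index: usize, value: bool) {
    if index >= bits.len() {
        bits.resize(index + 1, false);
    }
    bits[index] = value;
}

/// The old approach: start empty and re-set only the true bits.
/// Trailing false bits of `participants` are never visited by `set_bit`,
/// so they are dropped from the result.
fn rebuild_from_set_bits(participants: &Bitlist) -> Bitlist {
    let mut out = Bitlist::new();
    for (i, &bit) in participants.iter().enumerate() {
        if bit {
            set_bit(&mut out, i, true);
        }
    }
    out
}

/// The approach this PR switches to: copy the bitlist, length included.
fn clone_bits(participants: &Bitlist) -> Bitlist {
    participants.clone()
}
```

With participants `101000` (length 6), `rebuild_from_set_bits` returns a bitlist of length 3, while `clone_bits` returns length 6. In this model the rebuild only matches its input when the input is empty or its last bit is set.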

  Why the lengths can differ

  proof.participants is the merged bitfield from a recursive aggregation. The FFI aggregate() call merges multiple child proofs and
  gossip signatures, and the merged bitlist's length is determined by the highest index across all inputs — including children that
  contributed to coverage but whose own highest bit was higher than the highest bit in this proof.

  Concretely: if a parent proof merges children where one child covers validator 5 (length 6) and the final selected subset only ends
  up with validators {0, 2} set, participants.len() can stay at 6 while the bit at index 5 is 0. Rebuilding att_bits by re-setting TRUE
   bits produces a bitlist of length 3 (highest set is 2 → length 2+1).

  In SSZ encoding, those two bitlists are different:
  - length 3, bits 101 → encoded with the length delimiter at bit position 3
  - length 6, bits 101000 → encoded with the length delimiter at bit position 6
  
  Same set bits, but different bytes on the wire, a different tree_hash_root, and a BitList == BitList comparison that returns false.
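
The two encodings can be checked by hand. The sketch below follows the bitlist serialization rule from the SSZ spec (bits packed least-significant-bit first, then a single delimiter bit at index `len`); the function name `ssz_encode_bitlist` is hypothetical and the code is not taken from zeam or ethlambda.

```rust
/// Serialize a bitlist as the SSZ spec describes: bit `i` goes into byte
/// `i / 8` at position `i % 8`, and a single 1 "delimiter" bit is set at
/// index `len` to mark where the list ends.
fn ssz_encode_bitlist(bits: &[bool]) -> Vec<u8> {
    let len = bits.len();
    // One extra byte is needed exactly when `len` is a multiple of 8,
    // which `len / 8 + 1` accounts for.
    let mut bytes = vec![0u8; len / 8 + 1];
    for (i, &bit) in bits.iter().enumerate() {
        if bit {
            bytes[i / 8] |= 1u8 << (i % 8);
        }
    }
    // Length delimiter bit.
    bytes[len / 8] |= 1u8 << (len % 8);
    bytes
}
```

For the example above, length 3 with bits `101` serializes to the single byte `0x0d` (`0b0000_1101`), and length 6 with bits `101000` serializes to `0x45` (`0b0100_0101`). The data bits are identical; only the delimiter position differs.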

  Why it was invisible inside zeam

  zeam's own block-verification path (chain.zig:2396) compares cardinality only:

  if (validator_indices.items.len != participant_indices.items.len) {
      ...
  }

  So zeam happily produced mismatched-length pairs and happily re-imported them. ream was also lenient enough not to fire. Only
  ethlambda's strict equality check caught the bug — but ethlambda then rejected every zeam block carrying an aggregated attestation,
  kept retrying it from the orphan queue, and cross-subnet attestations never reached ethlambda's chain. Justification stayed pinned to
   slot 0.
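
The gap between the two checks can be sketched as follows. This is an illustrative Rust model over `&[bool]` slices, with hypothetical function names; it is not the code in chain.zig or store.rs.

```rust
/// Count how many bits are set.
fn count_set(bits: &[bool]) -> usize {
    bits.iter().filter(|&&bit| bit).count()
}

/// Lenient check (models a cardinality-only comparison):
/// passes whenever both sides have the same number of set bits.
fn same_cardinality(a: &[bool], b: &[bool]) -> bool {
    count_set(a) == count_set(b)
}

/// Strict check (models bitlist equality):
/// requires the same length and the same bit at every index.
fn strictly_equal(a: &[bool], b: &[bool]) -> bool {
    a == b
}
```

For `101` versus `101000`, `same_cardinality` returns true and `strictly_equal` returns false, which matches the behaviour described above. In this model a cardinality-only check would also accept two different validator sets of the same size, such as `10` versus `01`.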

  Why slot 1/4 worked but slot 7/16 failed

  The path that diverges only fires when there's >0 attestations and the aggregation step actually runs. zeam blocks at slot 1 had
  attestation_count=0 (nothing to mismatch) and slot 4 had attestation_count=1 from a single-validator path that happened to produce a
  length-matched bitlist. From slot 7 onward, real aggregation kicked in and the lengths diverged.

I have tested this in the devnets and it fixes the issue. However, feel free to request changes or close this PR.

Copilot AI review requested due to automatic review settings May 13, 2026 13:49

Copilot AI left a comment

Pull request overview

Fixes incorrect construction of attestation aggregation_bits by cloning the underlying participants bitlist, preserving SSZ bitlist length semantics (including trailing false bits) when producing aggregated attestations/proofs.

Changes:

  • In pkgs/types, replace “rebuild bits by setting true indices” with sszClone when deriving AggregationBits from proof participants.
  • In pkgs/node, update forkchoice proposal attestation building to clone participants into AggregationBits instead of rebuilding.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pkgs/types/src/block.zig | Switches aggregation-bit derivation to cloning the participants bitlist (multiple call sites). |
| pkgs/node/src/forkchoice.zig | Clones AggregationBits from selected proofs’ participants during proposal attestation selection. |

Comments suppressed due to low confidence (1)

pkgs/node/src/forkchoice.zig:1076

  • This function still rebuilds AggregationBits elsewhere by initializing an empty bitlist and setting only the true indices, which can change the bitlist’s encoded length (dropping trailing false bits). Since this hunk switches to cloning via sszClone, it would be more consistent/safer to use the same approach for other AggregationBits copies in this path (e.g., when building candidate_atts).
                    var att_bits: types.AggregationBits = undefined;
                    try types.sszClone(self.allocator, types.AggregationBits, cloned_proof.participants, &att_bits);
                    errdefer att_bits.deinit();

Comment on lines +1074 to 1078
var att_bits: types.AggregationBits = undefined;
try types.sszClone(self.allocator, types.AggregationBits, cloned_proof.participants, &att_bits);
errdefer att_bits.deinit();

for (0..cloned_proof.participants.len()) |i| {
Comment thread pkgs/types/src/block.zig Outdated
Comment on lines 636 to 639
var att_bits_val: attestation.AggregationBits = undefined;
try utils.sszClone(allocator, attestation.AggregationBits, child.participants, &att_bits_val);
var att_bits: ?attestation.AggregationBits = att_bits_val;
defer if (att_bits) |*ab| ab.deinit();
Copilot AI review requested due to automatic review settings May 13, 2026 14:17

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

anshalshukla previously approved these changes May 13, 2026
@ch4r10t33r

Thanks for this, @shariqnaiyer. A few comments:

  • The same bug pattern is still present at forkchoice.zig:1119-1128.
  • The cardinality-only check at chain.zig:2450 is still wrong.
  • Could you please add a regression test for this change?
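
One possible shape for such a regression test, sketched in Rust over a `Vec<bool>` model rather than zeam's Zig types. All names here (`cases`, `derive_by_clone`, `derive_by_rebuild`, `first_mismatch`) are hypothetical; the property being checked is that the derived aggregation bits equal the proof's participants exactly, length included.

```rust
/// Participant patterns worth covering. The ones with trailing false bits
/// are the inputs where rebuilding from set bits changes the length.
fn cases() -> Vec<Vec<bool>> {
    vec![
        vec![],
        vec![true],
        vec![true, false, true],
        vec![true, false, true, false, false, false],
        vec![false, false, false],
    ]
}

/// Models the fixed behaviour: copy the bitlist as-is.
fn derive_by_clone(participants: &[bool]) -> Vec<bool> {
    participants.to_vec()
}

/// Models the old behaviour: start empty and re-set only the true bits.
fn derive_by_rebuild(participants: &[bool]) -> Vec<bool> {
    let mut out = Vec::new();
    for (i, &bit) in participants.iter().enumerate() {
        if bit {
            out.resize(i + 1, false);
            out[i] = true;
        }
    }
    out
}

/// Returns the first case whose derived bits differ from the participants,
/// or None if every case round-trips exactly.
fn first_mismatch(derive: fn(&[bool]) -> Vec<bool>) -> Option<Vec<bool>> {
    cases().into_iter().find(|p| derive(p) != *p)
}
```

A test built this way passes for the clone-based derivation and fails on the first trailing-zero case for the rebuild-based one, so it would catch a reintroduction of the old pattern.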

Copilot AI review requested due to automatic review settings May 13, 2026 14:24

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (3)

pkgs/types/src/block.zig:646

  • att_bits is declared as a non-optional attestation.AggregationBits, but it’s still used as optional (att_bits.?) and later assigned null. This won’t compile, and it also breaks the intended ownership-transfer pattern (avoiding defer deinit after appending into self.attestations). Either keep att_bits optional (as before) or switch to a non-optional value and use a cleanup flag / scoped defer so the bitlist isn’t deinitialized after ownership is transferred.

This issue also appears on line 713 of the same file.

                var att_bits: attestation.AggregationBits = undefined;
                try utils.sszClone(allocator, attestation.AggregationBits, child.participants, &att_bits);
                defer att_bits.deinit();

                // Clone the child proof for the result (original will be freed by deferred cleanup)
                var cloned_child: aggregation.AggregatedSignatureProof = undefined;
                try utils.sszClone(allocator, aggregation.AggregatedSignatureProof, child.*, &cloned_child);
                errdefer cloned_child.deinit();

                try self.attestations.append(.{ .aggregation_bits = att_bits.?, .data = data });
                att_bits = null; // ownership transferred to self.attestations

pkgs/types/src/block.zig:718

  • Same issue as above: att_bits is non-optional but is used as att_bits.? and later set to null. Beyond the compile error, the current defer att_bits.deinit() will also deinit memory that has been moved into self.attestations unless you explicitly disable it after the append (e.g., via optional + null, a cleanup flag, or a scoped defer).
            var att_bits: attestation.AggregationBits = undefined;
            try utils.sszClone(allocator, attestation.AggregationBits, proof.participants, &att_bits);
            defer att_bits.deinit();

            try self.attestations.append(.{ .aggregation_bits = att_bits.?, .data = data });
            att_bits = null; // ownership transferred to self.attestations

pkgs/state-transition/src/mock.zig:316

  • att_bits has errdefer att_bits.deinit() but is appended into agg_attestations before another fallible call (agg_signatures.append). If that second append fails, the function’s outer agg_att_cleanup will deinit the appended attestation (freeing att_bits) and this local errdefer will also run, risking a double free. Consider scoping the errdefer to only cover the pre-append region, or rely solely on the outer list cleanup / roll back on partial append.
            var att_bits: types.AggregationBits = undefined;
            try types.sszClone(allocator, types.AggregationBits, proof.participants, &att_bits);
            errdefer att_bits.deinit();

            try agg_attestations.append(.{ .aggregation_bits = att_bits, .data = att_data });
            try agg_signatures.append(proof);
        }

Comment on lines +1074 to 1080
var att_bits: types.AggregationBits = undefined;
try types.sszClone(self.allocator, types.AggregationBits, cloned_proof.participants, &att_bits);
errdefer att_bits.deinit();

for (0..cloned_proof.participants.len()) |i| {
    if (cloned_proof.participants.get(i) catch false) {
        try types.aggregationBitsSet(&att_bits, i, true);
        if (i >= covered.capacity()) {
Comment on lines +1046 to +1050
// Clone participant bits into proof.
var cloned_bits: types.AggregationBits = undefined;
types.sszClone(ctx.allocator, types.AggregationBits, aggregation_bits, &cloned_bits) catch |err| {
    std.debug.print(
        "fixture {s} case {s}{f}: failed to clone aggregation bits ({s})\n",
@shariqnaiyer

@anshalshukla @ch4r10t33r Apart from the "chain.zig:2450 cardinality-only check still wrong" item, I have addressed the others. Feel free to give it another review.

@shariqnaiyer

@anshalshukla Feel free to add any commits and merge this. I am going to be afk for a bit.


@ch4r10t33r ch4r10t33r left a comment

LGTM

@anshalshukla anshalshukla merged commit 98ef227 into blockblaz:main May 13, 2026
8 of 9 checks passed