Fixes #899 by constraining aggregate-signature production to the slot currently being aggregated.
The aggregate worker is scheduled for one slot at a time, but computeAggregatedSignatures previously unioned every AttestationData key present in the snapshot maps. On devnet those maps can retain stale/future attestation data from gossip/new payload caches, so a single slot aggregation could recursively build proofs for unrelated slots before returning the current slot's aggregates. That inflated lean_pq_sig_aggregated_signatures_building_time_seconds into the multi-second range reported in #899.
Changes
Add an optional slot_filter to AggregatedAttestationsResult.computeAggregatedSignatures.
Add ForkChoice.aggregateForSlot(...) and use it from submitAggregateOnInterval.
Keep existing ForkChoice.aggregate(...) behavior unchanged for generic/test callers.
widened production aggregation from strict current_slot to a bounded backfill window {current_slot - 1, current_slot} so a skipped ConcurrencyUnavailable interval can recover late/skipped-slot attestations on the next successful worker pass;
changed the internal filter shape to ?[]const Slot and added aggregateForSlots(...), leaving strict aggregateForSlot(...) only as a convenience/test wrapper;
documented the production-vs-backfill tradeoff and clarified that unfiltered aggregate(...) is for tests/explicit defensive backfills, not the slot worker path;
added filter-path regression tests covering: mixed-slot skip behavior, same-slot filtered vs unfiltered equivalence, and clean empty-filter output;
added a metric precision/phase-vs-total comment for the phase histogram helper.
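The filter semantics described above can be sketched in isolation. This is a minimal illustration, not the code in the PR: `Slot` is assumed to be a `u64`, and `slotMatches`, `backfillWindow`, `Window`, and `countSelected` are hypothetical names. A null filter stands for the unfiltered `aggregate(...)` path, an empty filter selects nothing, and the production path passes the window {current_slot - 1, current_slot}.

```zig
const std = @import("std");

// Hypothetical stand-in; the project's real Slot type may differ.
const Slot = u64;

/// Returns true when `slot` should be aggregated under `filter`.
/// A null filter means "no filtering"; an empty filter matches nothing.
fn slotMatches(filter: ?[]const Slot, slot: Slot) bool {
    const allowed = filter orelse return true;
    for (allowed) |s| {
        if (s == slot) return true;
    }
    return false;
}

/// Fixed-size holder for the bounded backfill window.
const Window = struct {
    buf: [2]Slot,
    len: usize,

    fn slots(self: *const Window) []const Slot {
        return self.buf[0..self.len];
    }
};

/// Builds {current_slot - 1, current_slot}.
/// Slot 0 has no predecessor, so its window holds only itself.
fn backfillWindow(current_slot: Slot) Window {
    if (current_slot == 0) return .{ .buf = .{ 0, 0 }, .len = 1 };
    return .{ .buf = .{ current_slot - 1, current_slot }, .len = 2 };
}

/// Counts how many snapshot entries a pass would process under `filter`.
fn countSelected(filter: ?[]const Slot, snapshot_slots: []const Slot) usize {
    var n: usize = 0;
    for (snapshot_slots) |s| {
        if (slotMatches(filter, s)) n += 1;
    }
    return n;
}
```

With a snapshot holding slots 3, 9, 10, 10, and 42, a worker pass for slot 10 would touch only the three entries at slots 9 and 10, while the unfiltered path would touch all five. That difference is the unrelated work the slot filter is meant to avoid.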
zig build test completed with the new block.test.computeAggregatedSignatures ... cases passing, and the rest of the logged test sequence, through the node tests, also passed.
Root cause: the SSE events integration test has an outer 480s deadline, but SSEClient.readEvent() performed a blocking stream read. If the simulator stopped emitting SSE bytes before node3 finalization was observed (the macOS CI case), the test never returned to the loop to check the deadline, so the Run all sim tests step could hang indefinitely.
Pushed 5fbe09f to bound SSE reads with poll(50ms) before reading. That makes idle SSE periods return null so the existing deadline can actually fire instead of blocking inside the read.
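Why a bounded read lets the deadline fire can be shown with a small simulation. This sketch does not use the real SSEClient or any socket API: `FakeSse`, `readEventTimed`, and `waitForFinalization` are hypothetical names, and time is modelled as a counter advancing 50 ms per read instead of a real clock.

```zig
const std = @import("std");

const poll_interval_ms: u64 = 50; // per-read bound, as in the patch description
const deadline_ms: u64 = 480_000; // outer test deadline (480s)

const Outcome = enum { finalized, timed_out };

/// Hypothetical stand-in for the SSE client: yields an event only on
/// the scripted tick; every other call reports an idle period as null.
const FakeSse = struct {
    event_at_tick: ?u64,
    tick: u64 = 0,

    fn readEventTimed(self: *FakeSse) ?[]const u8 {
        defer self.tick += 1;
        if (self.event_at_tick) |t| {
            if (t == self.tick) return "finalized";
        }
        return null;
    }
};

/// Because each read returns after at most one interval, control comes
/// back to the loop condition and the deadline check is always reached.
fn waitForFinalization(client: *FakeSse) Outcome {
    var elapsed_ms: u64 = 0;
    while (elapsed_ms < deadline_ms) : (elapsed_ms += poll_interval_ms) {
        if (client.readEventTimed() != null) return .finalized;
    }
    return .timed_out;
}
```

With a read that blocks indefinitely, the loop condition would never be re-evaluated once the stream went quiet. Here a silent stream produces a bounded number of null reads before the loop reports a timeout.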
Validation:
zig fmt pkgs/cli/test/integration.zig
git diff --check
timeout 600 zig build simtest --summary all passed locally (EXIT:0, simtest success in ~1m)
Follow-up on the still-running macOS sim job: I did not have macOS locally; my prior validation was Linux. The macOS CI run on 5fbe09f is still stuck, so the first poll-based patch was insufficient there.
I pushed d749471 to use std.Io's timed socket receive path directly for the SSE client instead of poll + Stream.Reader. Reason: the macOS job still appears to block inside the sim SSE path; using receiveTimeout(50ms) avoids the stream reader blocking after readiness and lets the existing 480s loop deadline advance.
Validation on Linux:
zig fmt pkgs/cli/test/integration.zig
git diff --check
timeout 600 zig build simtest --summary all passed (EXIT:0, simtest success in ~1m)
Summary
Fixes #899 by constraining aggregate-signature production to the slot currently being aggregated.
The aggregate worker is scheduled for one slot at a time, but computeAggregatedSignatures previously unioned every AttestationData key present in the snapshot maps. On devnet those maps can retain stale/future attestation data from gossip/new payload caches, so a single slot aggregation could recursively build proofs for unrelated slots before returning the current slot's aggregates. That inflated lean_pq_sig_aggregated_signatures_building_time_seconds into the multi-second range reported in #899.
Changes
Add an optional slot_filter to AggregatedAttestationsResult.computeAggregatedSignatures.
Add ForkChoice.aggregateForSlot(...) and use it from submitAggregateOnInterval.
Keep existing ForkChoice.aggregate(...) behavior unchanged for generic/test callers.
zeam_pq_sig_aggregated_signatures_building_phase_seconds{phase="snapshot|compute_ffi|commit"}
Validation
zig fmt pkgs/metrics/src/lib.zig pkgs/types/src/block.zig pkgs/node/src/forkchoice.zig pkgs/node/src/chain.zig
git diff --check
zig build test passed locally (EXIT:0 in /tmp/zeam-899-zig-build-test.log).