parallelise: review follow-ups (metrics, FFI hoist, determinism, prewarm) #796
Merged
…minism)

Addresses adversarial review on PR #780. The fixes are independent of one another; each targets a correctness or determinism risk introduced by moving signature verification and attestation compaction onto a shared work-stealing thread pool.

state-transition: per-task histogram observation

verifySignaturesParallel previously wrapped one batch_timer around the entire pool.scope call, so the lean_pq_sig_aggregated_signatures_verification_time_seconds histogram received exactly one sample per block, while the serial path observes once per attestation. Mixing the two granularities in the same histogram silently distorts P50/P99 comparisons between deployments. Each VerifyTask now records its own elapsed_ns inside the worker (using std.time.Timer for monotonic timing), and the post-pool emit calls Histogram.record once per verified task.

types/block: deterministic group ordering

std.AutoHashMap makes no guarantee about iteration order, which depends on the map's insertion and rehash history, so two validators holding identical attestation sets could emit byte-different blocks. group_entries is now sorted by AttestationData (slot, then head/target/source root bytes, then checkpoint slots) before processing in either the serial or the parallel branch.

types/block: hoist xmss.PublicKey.fromBytes out of parallel workers

Previously compactAttestationGroup called xmss.PublicKey.fromBytes inside each worker thread. Rust-side pubkey deserialization is not documented as Send, and setupVerifier (called transitively) carries first-time-init races. The new CompactGroupPrep is built serially before pool.scope: every fromBytes call happens on the main thread, and workers only invoke aggregate() on already-deserialized handles. The shared pubkey_wrappers ArrayList owns the wrappers' lifetime across the scope call.
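The ordering from the "deterministic group ordering" paragraph can be illustrated with a minimal sketch. It is not the code from this PR: Checkpoint, GroupKey, makeKey and sortedKeys are hypothetical stand-ins for AttestationData and group_entries, and the field layout is an assumption. It is written against the Zig 0.11 to 0.14 standard library (std.mem.sort, std.mem.order, std.math.order).

```zig
const std = @import("std");

// Hypothetical stand-ins for the real types; field names are assumptions.
const Checkpoint = struct { root: [32]u8, slot: u64 };
const GroupKey = struct {
    slot: u64,
    head: Checkpoint,
    target: Checkpoint,
    source: Checkpoint,
};

// Total order: slot, then head/target/source root bytes, then checkpoint slots.
fn compareKeys(a: GroupKey, b: GroupKey) std.math.Order {
    var ord = std.math.order(a.slot, b.slot);
    if (ord != .eq) return ord;
    ord = std.mem.order(u8, &a.head.root, &b.head.root);
    if (ord != .eq) return ord;
    ord = std.mem.order(u8, &a.target.root, &b.target.root);
    if (ord != .eq) return ord;
    ord = std.mem.order(u8, &a.source.root, &b.source.root);
    if (ord != .eq) return ord;
    ord = std.math.order(a.head.slot, b.head.slot);
    if (ord != .eq) return ord;
    ord = std.math.order(a.target.slot, b.target.slot);
    if (ord != .eq) return ord;
    return std.math.order(a.source.slot, b.source.slot);
}

fn lessThan(_: void, a: GroupKey, b: GroupKey) bool {
    return compareKeys(a, b) == .lt;
}

// Copies the map's keys into a slice and sorts it, so the result no longer
// depends on the hash map's internal layout. Caller frees the slice.
fn sortedKeys(
    allocator: std.mem.Allocator,
    map: *const std.AutoHashMap(GroupKey, u32),
) ![]GroupKey {
    const keys = try allocator.alloc(GroupKey, map.count());
    var it = map.keyIterator();
    var i: usize = 0;
    while (it.next()) |k| : (i += 1) keys[i] = k.*;
    std.mem.sort(GroupKey, keys, {}, lessThan);
    return keys;
}

// Illustration helper: one key whose three roots are filled with `fill`.
fn makeKey(slot: u64, fill: u8) GroupKey {
    var root: [32]u8 = undefined;
    @memset(&root, fill);
    const cp = Checkpoint{ .root = root, .slot = slot };
    return .{ .slot = slot, .head = cp, .target = cp, .source = cp };
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    var map = std.AutoHashMap(GroupKey, u32).init(allocator);
    defer map.deinit();
    try map.put(makeKey(7, 0xbb), 1);
    try map.put(makeKey(3, 0xff), 2);
    try map.put(makeKey(7, 0xaa), 3);

    const keys = try sortedKeys(allocator, &map);
    defer allocator.free(keys);
    for (keys) |k| std.debug.print("slot={d} head[0]=0x{x}\n", .{ k.slot, k.head.root[0] });
}
```

Whatever order the three entries are inserted in, the slot-3 group sorts first, followed by the two slot-7 groups ordered by their head root bytes.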
cli: pre-warm xmss.setupVerifier after pool init

Both the multi-node devnet path (pkgs/cli/src/main.zig) and the single-node Node struct path (pkgs/cli/src/node.zig) now call xmss.setupVerifier() once on the main thread immediately after ThreadPool.init. The Rust verifier setup is documented as idempotent but is not hardened against first-time-init races between concurrent callers; pre-warming on the main thread removes the race regardless of the Rust implementation.

node/chain: document thread_pool invariants

Expanded the doc comment on BeamChain.thread_pool to spell out the three thread-safety invariants new consumers must preserve:

1. chain.allocator must be safe for concurrent use (today: GPA).
2. xmss.setupVerifier must be called on the main thread before the first parallel verify.
3. xmss.PublicKeyCache is NOT thread-safe; cache access stays in the serial pre-phase only.
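The pre-warm rule (invariant 2) can be sketched as follows. This is an illustration under assumptions, not the project's code: setupVerifierStub models an idempotent but race-prone first call, Task and runAll are hypothetical, and plain std.Thread.spawn/join stands in for the project's ThreadPool and pool.scope.

```zig
const std = @import("std");

// Hypothetical stand-in for the FFI verifier state; not the real xmss API.
var verifier_ready: bool = false;
var setup_calls: u32 = 0;

// Models a setup routine that is idempotent but not safe to race on its
// first call: two concurrent first callers could both run the init body.
fn setupVerifierStub() void {
    if (verifier_ready) return;
    setup_calls += 1;
    verifier_ready = true;
}

const Task = struct { input: u32, ok: bool = false };

// Workers rely on the pre-warm having happened and only read shared state.
// Each worker writes to its own task slot, so no locking is needed.
fn worker(task: *Task) void {
    std.debug.assert(verifier_ready);
    task.ok = (task.input % 2 == 0); // placeholder for the real verify call
}

fn runAll(tasks: []Task) !void {
    var threads: [8]std.Thread = undefined;
    if (tasks.len > threads.len) return error.TooManyTasks;
    var spawned: usize = 0;
    // Join whatever was started, even if a later spawn fails.
    defer {
        for (threads[0..spawned]) |t| t.join();
    }
    for (tasks) |*task| {
        threads[spawned] = try std.Thread.spawn(.{}, worker, .{task});
        spawned += 1;
    }
}

pub fn main() !void {
    // Pre-warm once on the main thread, before any worker exists.
    setupVerifierStub();

    var tasks = [_]Task{ .{ .input = 2 }, .{ .input = 3 }, .{ .input = 4 } };
    try runAll(&tasks);
    for (tasks) |t| std.debug.print("input={d} ok={any}\n", .{ t.input, t.ok });
}
```

Because the setup call completes before the first spawn, no worker can observe a half-initialized verifier, whether or not the underlying setup routine is itself thread-safe.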
g11tech approved these changes on Apr 28, 2026
This was referenced Apr 30, 2026
Follow-up to #780 (now merged into main). Targets main. Each fix is independent: no consensus impact, and all corrections are local to the parallel paths introduced by #780.

Bot review classification
Adversarial review on #780 raised six concerns. Verdict + action:
- pool.scope synchronicity (Critical, memory safety): lib.zig:466-484 does defer self.scopeWait(&s) and the wait loop drains pending == 0 before return. Doc string explicitly says "block until every task spawned on scope has completed".
- […] xmss.setupVerifier() on main thread; hoist xmss.PublicKey.fromBytes out of workers in compactAttestations.
- cli/main.zig (devnet) and cli/node.zig (Node struct) are mutually exclusive code paths. No deployment runs both.
- […] chain.thread_pool so future consumers understand the contract.
- compactAttestations not fully reviewed (Medium, liveness risk): […] aggregation_bits derived from same proof.participants, cannot diverge). But real determinism gap missed by the bot: hashmap iteration order.
- […] group_entries deterministically by AttestationData.
- […] verifySignaturesParallel, matching serial granularity.

Bot also missed: shared allocator across parallel workers (production = GPA = thread-safe; documented), and setupVerifier first-call race (fixed by pre-warm).

Changes
pkgs/state-transition/src/transition.zig

- VerifyTask carries elapsed_ns: u64. Worker times the FFI verify call with std.time.Timer (monotonic) and stores ns into the task slot.
- […] observe() with one Histogram.record(elapsed_s) per verified task. Histogram percentiles now match the serial path's granularity.

pkgs/types/src/block.zig

- CompactGroupPrep carries pre-built per-child []*const HashSigPublicKey slices.
- compactSingleProof / compactMultiProofWithPrep / runCompactGroupPrep replace the monolithic compactAttestationGroup.
- xmss.PublicKey.fromBytes calls happen serially in a pre-phase before pool.scope. Workers receive prebuilt handles and only invoke aggregate() (which takes const handles).
- pubkey_wrappers ArrayList owns wrapper lifetime for the duration of the scope call; freed on unwind.
- group_entries sorted by AttestationData (slot → head/target/source root bytes → checkpoint slots) before any processing. Both serial and parallel branches consume the sorted list.

pkgs/cli/src/main.zig and pkgs/cli/src/node.zig

- Call xmss.setupVerifier() immediately after ThreadPool.init, before any consumer runs a parallel verify. Removes first-time-init race regardless of Rust implementation.

pkgs/node/src/chain.zig

- Expanded the doc comment on BeamChain.thread_pool to spell out the three invariants new consumers must preserve (allocator thread-safety, setupVerifier pre-warm, PublicKeyCache non-thread-safety).

Test plan

- zig build all: clean rebuild, EXIT=0.
- zig build test: all existing unit tests pass.
- zeam --help: built CLI starts and prints usage.
- […] lean_pq_sig_aggregated_signatures_verification_time_seconds percentiles look reasonable under the parallel path (should match serial baseline).
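The per-task timing described under Changes for transition.zig can be sketched as below. Only the elapsed_ns: u64 field and the use of std.time.Timer come from the description; the other VerifyTask fields, fakeVerify, the toy Histogram and emit are hypothetical, and the tasks run serially here to keep the example short. It is written against the Zig 0.11 to 0.14 standard library.

```zig
const std = @import("std");

// Hypothetical task slot; only elapsed_ns mirrors the description above.
const VerifyTask = struct {
    input: u64,
    ok: bool = false,
    elapsed_ns: u64 = 0,
};

// Placeholder for the FFI verify call.
fn fakeVerify(input: u64) bool {
    return input != 0;
}

// Body a worker would run: time only the verify call and store the
// result in the task's own slot, so no shared state is touched.
fn runTask(task: *VerifyTask) void {
    // Timer.start fails only when no monotonic clock is available;
    // in that case this sketch leaves elapsed_ns at 0.
    var timer = std.time.Timer.start() catch {
        task.ok = fakeVerify(task.input);
        return;
    };
    task.ok = fakeVerify(task.input);
    task.elapsed_ns = timer.read();
}

// Toy histogram that only tracks count and sum; a real one has buckets.
const Histogram = struct {
    count: u64 = 0,
    sum_seconds: f64 = 0,

    fn record(self: *Histogram, seconds: f64) void {
        self.count += 1;
        self.sum_seconds += seconds;
    }
};

// Post-pool emit: one sample per task instead of one per batch.
fn emit(hist: *Histogram, tasks: []const VerifyTask) void {
    for (tasks) |t| {
        const secs = @as(f64, @floatFromInt(t.elapsed_ns)) / std.time.ns_per_s;
        hist.record(secs);
    }
}

pub fn main() void {
    var tasks = [_]VerifyTask{ .{ .input = 1 }, .{ .input = 2 }, .{ .input = 3 } };
    for (&tasks) |*t| runTask(t);

    var hist = Histogram{};
    emit(&hist, &tasks);
    std.debug.print("samples={d} total_s={d}\n", .{ hist.count, hist.sum_seconds });
}
```

Recording after the pool has finished keeps the histogram itself out of the workers, so it does not need to be thread-safe for this pattern.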