fix(p2p): reduce flakiness in proposal tx collector benchmark #22240

Merged
PhilWindle merged 2 commits into merge-train/spartan from claudebox/38590d3cfe6a7000-2 on Apr 7, 2026

AztecBot commented Apr 1, 2026

Summary

Fixes flakiness in p2p_client.proposal_tx_collector.bench.test.ts caused by three compounding issues:

  1. chunkTxHashesRequest defaulted to chunkSize=1, creating 500 individual libp2p streams for the 500-tx send-batch-request case. The rapid stream churn overwhelms the connection, causing EPIPE cascades that kill the muxer. Bumped to chunkSize=8 as the existing TODO indicated.

  2. Peer scores persisted between benchmark cases, so hundreds of HighToleranceError penalties from EPIPE failures in one case degraded peer selection in subsequent cases. Added PeerScoring.resetAllScores() and called it in the worker before each benchmark run.

  3. There was no connectivity check between cases, so degraded connections from a previous case could silently affect the next. Added waitForConnectivity() to verify that the aggregator is connected to at least 80% of the expected peers before each case starts.
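
To make the effect of the first change concrete, here is a minimal sketch of hash chunking. The function name `chunkTxHashes`, its generic signature, and the sample hashes are illustrative assumptions; the real `chunkTxHashesRequest` in protocols/tx.ts may look different. It also assumes one request (and therefore one stream) per chunk, as the description above implies.

```typescript
// Hypothetical stand-in for chunkTxHashesRequest; not the actual implementation.
// Splits a list of tx hashes into groups of at most `chunkSize` entries.
function chunkTxHashes<T>(hashes: readonly T[], chunkSize = 8): T[][] {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new Error(`chunkSize must be a positive integer, got ${chunkSize}`);
  }
  const chunks: T[][] = [];
  for (let i = 0; i < hashes.length; i += chunkSize) {
    chunks.push(hashes.slice(i, i + chunkSize));
  }
  return chunks;
}

// 500 made-up hashes, matching the size of the send-batch-request case.
const hashes = Array.from({ length: 500 }, (_, i) => `0x${i.toString(16).padStart(4, '0')}`);

const perTx = chunkTxHashes(hashes, 1).length; // 500 chunks, one per tx
const batched = chunkTxHashes(hashes).length; // 63 chunks, i.e. ceil(500 / 8)
```

Under these assumptions the default of 8 cuts the number of requests for the 500-tx case from 500 to 63, with the last chunk carrying the remaining 4 hashes.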

Full analysis with CI log evidence: https://gist.github.com/AztecBot/e5af3238fbfefc29c51de2ee5deaa8ea

Changes

  • protocols/tx.ts: Change chunkTxHashesRequest default chunkSize from 1 to 8
  • peer_scoring.ts: Add resetAllScores() method
  • p2p_client_testbench_worker.ts: Reset peer scores before each bench case, add GET_PEER_COUNT IPC command
  • worker_client_manager.ts: Add waitForConnectivity() and getPeerCount() methods
  • p2p_client.proposal_tx_collector.bench.test.ts: Check connectivity in beforeEach
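
The connectivity check can be pictured as a simple polling loop. The sketch below is an assumption about its general shape, not the code in worker_client_manager.ts: the `getPeerCount` callback, the option names, the timeout, and the polling interval are all hypothetical, and only the 80% threshold comes from the description above.

```typescript
// Hypothetical callback that reports how many peers are currently connected.
type PeerCountFn = () => Promise<number>;

// Smallest peer count that satisfies the required fraction of expected peers.
function requiredPeers(expectedPeers: number, ratio = 0.8): number {
  return Math.ceil(expectedPeers * ratio);
}

// Polls until enough peers are connected, or throws once the timeout elapses.
async function waitForConnectivity(
  getPeerCount: PeerCountFn,
  expectedPeers: number,
  { ratio = 0.8, timeoutMs = 30000, intervalMs = 500 } = {},
): Promise<number> {
  const needed = requiredPeers(expectedPeers, ratio);
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const count = await getPeerCount();
    if (count >= needed) {
      return count;
    }
    if (Date.now() >= deadline) {
      throw new Error(`Only ${count}/${expectedPeers} peers connected (need ${needed})`);
    }
    // Sleep before the next poll.
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
  }
}
```

In a benchmark suite, a call like this would sit in `beforeEach`, so that a case starting from a degraded network fails with an explicit error instead of producing a misleading measurement.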

ClaudeBox log: https://claudebox.work/s/38590d3cfe6a7000?run=2

- Bump chunkTxHashesRequest default chunkSize from 1 to 8 (was a known TODO)
- Add PeerScoring.resetAllScores() and call it between benchmark cases
- Add connectivity check before each benchmark case to detect degraded state
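
As an illustration of why resetting scores between cases matters, the following sketch uses a hypothetical `PeerScoreBook` class. It is not the real PeerScoring class, whose internals are not shown in this PR; the penalty value and peer ID are made up.

```typescript
// Hypothetical, simplified score table; the real PeerScoring class may differ.
class PeerScoreBook {
  private scores = new Map<string, number>();

  // Lowers a peer's score by `penalty`.
  penalize(peerId: string, penalty: number): void {
    this.scores.set(peerId, this.getScore(peerId) - penalty);
  }

  // Unknown peers start from a neutral score of 0.
  getScore(peerId: string): number {
    return this.scores.get(peerId) ?? 0;
  }

  // Forgets every accumulated score so the next run starts from a clean slate.
  resetAllScores(): void {
    this.scores.clear();
  }
}

const scoring = new PeerScoreBook();

// Simulate hundreds of penalties accumulated during one benchmark case.
for (let i = 0; i < 300; i++) {
  scoring.penalize('peer-a', 2);
}
const before = scoring.getScore('peer-a'); // -600

scoring.resetAllScores();
const after = scoring.getScore('peer-a'); // 0
```

Without the reset, the next case would begin with `peer-a` already heavily penalized, which is the cross-case contamination described in the summary.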
AztecBot added the ci-draft (Run CI on draft PRs) and claudebox (Owned by claudebox; it can push to this PR) labels Apr 1, 2026

AztecBot commented Apr 7, 2026

Automatically closing this stale claudebox draft PR (no updates for 5+ days). Re-open if still needed.

AztecBot closed this Apr 7, 2026
PhilWindle reopened this Apr 7, 2026
PhilWindle marked this pull request as ready for review April 7, 2026 12:14
PhilWindle enabled auto-merge (squash) April 7, 2026 12:16
PhilWindle merged commit d26761f into merge-train/spartan Apr 7, 2026
12 checks passed
PhilWindle deleted the claudebox/38590d3cfe6a7000-2 branch April 7, 2026 12:34
github-merge-queue bot pushed a commit that referenced this pull request Apr 9, 2026
BEGIN_COMMIT_OVERRIDE
chore: fix mempool limit test (#22332)
fix(bot): bot fee juice funding (#21949)
fix(foundation): flush current batch on BatchQueue.stop() (#22341)
chore: (A-750) read JSON body then parse to avoid double stream consumption on error message (#22247)
chore: bump log level in stg-public (#22354)
chore: fix main.tf syntax (#22356)
chore: wire up spartan checks to make (#22358)
fix(p2p): reduce flakiness in proposal tx collector benchmark (#22240)
fix: disable sponsored fpc and test accounts for devnet (#22331)
chore: add v4-devnet-3 to tf network ingress (#22327)
chore: remove unused env var (#22365)
chore: add pdb (#22364)
chore: dispatch CB on failed deployments (#22367)
chore: (A-749) single character url join (#22269)
feat: support different docker image for HA validator nodes (#22371)
chore: fix the daily healthchecks (#22373)
chore: remove v4-devnet-2 references (#22372)
fix: rename #team-alpha → #e-team-alpha slack channel (#22374)
chore(pipeline): timetable adjustments under pipelining (#21076)
feat(pipeline): handle pipeline prunes (#21250)
fix: handle error types serialization errors (#22379)
feat(spartan): configurable HA validator replica count (#22384)
fix(e2e): increase prune timeout in epochs_mbps_pipeline test (#22392)
fix(epoch-cache): use TTL-based caching with finalization tracking and correct lag (#22204)
chore: deflake e2e ha sync test (#22403)
chore(ci): skip prunes-uncheckpointed test in epochs_mbps_pipeline (#22401)
refactor(slasher): remove empire slasher model (#21830)
fix: use strict equality in world-state ops queue (#22398)
fix: remove unused BLOCK reqresp sub-protocol (#22407)
refactor(sequencer): sign last block before archiver sync (#22117)
feat(world-state): add genesis timestamp support and GenesisData type (#22359)
fix: use Int64Value instead of Uint32Value for 64-bit map sizes (#22400)
chore: Reduce logging verbosity (#22423)
fix(p2p): include values in tx validation error messages (#22422)
END_COMMIT_OVERRIDE