
fix(ci): retry nargo dep + solc downloads to survive transient DNS drops #23490

Merged
alexghr merged 1 commit into merge-train/spartan from cb/2c63a974cfc6 on May 22, 2026


@AztecBot AztecBot (Collaborator) commented May 22, 2026

Why

Merge-train/spartan keeps failing on transient DNS resolution errors, e.g.:

Cloning into '/home/aztec-dev/nargo/github.com/noir-lang/poseidon/v0.3.0'...
fatal: unable to access 'https://github.com/noir-lang/poseidon/': Could not resolve host: github.com
Cannot read file .../poseidon/v0.3.0/Nargo.toml - does it exist?

Seven such failures in the last week: github.com ×3 via nargo, binaries.soliditylang.org ×1 via solc, and release-assets.githubusercontent.com ×3 (already fixed by #23333). The root cause is almost certainly the EC2 VPC resolver's cap of ~1024 packets/sec per ENI being exhausted by heavy parallel builds, so lookups are silently dropped.
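One way to check the suspected root cause on a runner, sketched under assumptions: on instances using the ENA driver, `ethtool -S <iface>` reports a `linklocal_allowance_exceeded` counter for packets dropped because traffic to link-local services (including the VPC DNS resolver) exceeded the per-interface limit. The helper name `linklocal_drops` is hypothetical, and whether the runners use ENA, and which interface name applies, is not stated in this PR.

```bash
# Hypothetical helper, not part of this PR.
# Reads `ethtool -S <iface>` output on stdin and prints the value of the
# linklocal_allowance_exceeded counter, or 0 if the driver does not report it.
linklocal_drops() {
  awk -F': *' '
    /linklocal_allowance_exceeded/ { gsub(/[ \t]/, "", $2); v = $2 }
    END { print v + 0 }
  '
}
```

Example use, assuming the interface is `eth0`: `ethtool -S eth0 | linklocal_drops`. A value that grows during heavy parallel builds would be consistent with the dropped-lookup explanation above.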

This is the cheap, immediate mitigation: retry the two un-retried network fetches that bite the merge train. It does not fix the root cause; a host-local caching resolver would (dnsmasq spike linked below, for the future).

What

  • noir-projects/bootstrap.sh: wrap the nargo dependency download (the prep step for fmt --check) in ci3/retry. On failure it wipes the partial dependency cache ($HOME/nargo) before retrying: a half-finished clone is exactly what produces the Cannot read file .../Nargo.toml error, so a naive retry would just hit the poisoned directory again. On success the warm cache is left intact.
  • l1-contracts/bootstrap.sh: wrap the solc download triggered by forge build --use svm in ci3/retry. The merge queue disables the S3 cache, so this download path runs on every merge-train build.

Both use the existing ci3/retry helper (3 attempts; RETRY_SLEEP=10 for the nargo step to give DNS a little longer to recover).
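The retry-then-wipe pattern described above can be sketched as follows. This is a minimal, hypothetical stand-in and not the actual ci3/retry helper, whose implementation is not shown in this PR; the function name `retry_with_cleanup` and its argument order are assumptions made for illustration.

```bash
# Hypothetical sketch, not the real ci3/retry.
# Usage: retry_with_cleanup <attempts> <sleep_seconds> <cleanup_cmd> <cmd> [args...]
# Runs <cmd> up to <attempts> times. After each failed attempt it runs
# <cleanup_cmd> (a command or function taking no arguments) to remove
# partial state, then sleeps before the next attempt.
retry_with_cleanup() {
  local attempts=$1 sleep_s=$2 cleanup=$3
  shift 3
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0                      # success: existing cache is left untouched
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    "$cleanup"                      # wipe the possibly half-written cache
    if ((i < attempts)); then
      sleep "$sleep_s"              # give DNS time to recover
    fi
  done
  return 1
}
```

With the values from the description, a call might look like `retry_with_cleanup 3 10 wipe_nargo_cache download_nargo_deps`, where `wipe_nargo_cache` (for example, a function running `rm -rf "$HOME/nargo"`) and `download_nargo_deps` are hypothetical names. Running the cleanup only on failure is what keeps a warm cache intact on the success path.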

Dropped

An earlier commit hardened the runner's /etc/resolv.conf (options timeout:1 attempts:5 rotate + a public fallback nameserver). It was removed: rotating to a public resolver is brittle and risks breaking VPC-private name resolution. Not worth the blast radius for a mitigation.

Future: root-cause fix

Host-local dnsmasq caching resolver on the runner — what it would look like: https://gist.github.com/AztecBot/a22cc18bd30ec0bd3dff72b70d675304

@AztecBot AztecBot added the ci and claudebox labels May 22, 2026
@AztecBot AztecBot changed the title fix(ci): retry nargo dep + solc downloads to survive transient DNS drops fix(ci): DNS relief for merge-train — retry flaky downloads + harden resolv.conf May 22, 2026
@AztecBot AztecBot changed the title fix(ci): DNS relief for merge-train — retry flaky downloads + harden resolv.conf fix(ci): retry nargo dep + solc downloads to survive transient DNS drops May 22, 2026
@alexghr alexghr marked this pull request as ready for review May 22, 2026 09:43
@alexghr alexghr enabled auto-merge (squash) May 22, 2026 09:43
@alexghr alexghr disabled auto-merge May 22, 2026 09:43
@alexghr alexghr enabled auto-merge (squash) May 22, 2026 09:43
@alexghr alexghr disabled auto-merge May 22, 2026 11:08
@alexghr alexghr enabled auto-merge (squash) May 22, 2026 11:08
@alexghr alexghr force-pushed the cb/2c63a974cfc6 branch from 9380c26 to 6cea711 May 22, 2026 11:08
@alexghr alexghr merged commit 9675beb into merge-train/spartan May 22, 2026
21 of 27 checks passed
@alexghr alexghr deleted the cb/2c63a974cfc6 branch May 22, 2026 11:09

Labels

ci, ci-skip, claudebox (owned by claudebox; it can push to this PR)

2 participants