
[wrangler] Make startup network requests non-blocking on slow connections #13386

Merged
petebacondarwin merged 7 commits into cloudflare:main from
mksglu:fix/slow-startup-network-timeouts
Apr 13, 2026


@mksglu
Contributor

@mksglu mksglu commented Apr 9, 2026

Fixes #9946.

Problem

Wrangler makes network requests during startup (npm update check, request.cf data fetch, telemetry dispatch) that can block the CLI indefinitely on slow or degraded connections. Users report delays of 10+ seconds on airplane wifi and trains, with no spinner, no timeout, and no way to disable the requests.

Confirmed by @penalosa and @threepointone in the issue thread. Latest comment (@kalcodes, 2026-04-09) reports "failures never exit" when offline.

Root Cause

Three network calls with no per-request timeout:

| Call | File | Blocking? | Timeout |
| --- | --- | --- | --- |
| npm update check | update-check.ts → await checkForUpdate() | YES (blocks banner) | None |
| request.cf fetch | miniflare/src/cf.ts → await fetch(cf.json) | YES (blocks dev server) | None |
| Telemetry dispatch | metrics-dispatcher.ts → fire-and-forget | No (but hangs connections) | 1s exit race only |

Solution

Applies the industry-standard two-tier pattern (same approach as npm's update-notifier, yarn, pnpm): non-critical startup requests are bounded, not awaited indefinitely.

Change 1: Update check — non-blocking banner with grace period (update-check.ts, wrangler-banner.ts)

Before: await updateCheck() in the banner blocks indefinitely until npm registry responds.

After: The banner races the update check against a 100ms grace period:

maybeNewVersion = await Promise.race([
    updateCheck(),
    new Promise<undefined>((resolve) => {
        const timer = setTimeout(() => resolve(undefined), UPDATE_CHECK_GRACE_MS);
        timer.unref();
    }),
]);

Why 100ms? The update-check library caches results in /tmp/update-check/. On a cache hit (>99% of runs), the readFile I/O completes in <1ms — the await yields, the event loop reaches the I/O poll phase, and the readFile callback fires before the 100ms timer. On a cache miss (network needed), the timer wins and the banner prints immediately.

Evidence — Node.js event loop ordering proves correctness:

  1. await Promise.race(...) yields → microtask queue drains
  2. Event loop enters timer phase — 100ms timer NOT ready yet
  3. Event loop enters I/O poll phase — readFile from /tmp callback fires (~0.5ms)
  4. evaluateCache() → checkForUpdate() → updateCheck() resolves → race settles
  5. Banner shows "(update available X.Y.Z)" ✓
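The ordering above can be tried out in isolation. The following is a minimal sketch, not the code from this PR: it assumes a small JSON cache file with a `latest` field (a hypothetical shape, not the update-check library's actual format) and races the read against the same 100ms grace timer. `readCachedVersion`, `versionWithinGrace`, and `demo` are illustrative names.

```typescript
import { mkdtemp, readFile, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Mirrors the 100ms grace period named in the description.
const GRACE_MS = 100;

// Hypothetical stand-in for the library's cache read: parse a small JSON file.
async function readCachedVersion(cacheFile: string): Promise<string | undefined> {
	const raw = await readFile(cacheFile, "utf8");
	return (JSON.parse(raw) as { latest?: string }).latest;
}

// Resolve with the cached version if the read settles before the grace timer,
// otherwise resolve with undefined so the caller can print the banner at once.
async function versionWithinGrace(cacheFile: string): Promise<string | undefined> {
	return Promise.race([
		readCachedVersion(cacheFile),
		new Promise<undefined>((resolve) => {
			const timer = setTimeout(() => resolve(undefined), GRACE_MS);
			timer.unref(); // do not keep the process alive just for this timer
		}),
	]);
}

// Write a throwaway cache file into the OS temp directory and race the read.
async function demo(): Promise<string | undefined> {
	const dir = await mkdtemp(join(tmpdir(), "update-check-demo-"));
	const cacheFile = join(dir, "cache.json");
	await writeFile(cacheFile, JSON.stringify({ latest: "9.9.9" }));
	return versionWithinGrace(cacheFile);
}
```

On a local disk the small read is expected to settle well inside the grace period, so `demo()` should resolve with the cached version rather than `undefined`.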

Additionally, a 3s safety-net timeout (UPDATE_CHECK_TIMEOUT_MS) is added inside doUpdateCheck() via Promise.race. This caps the update-check library's auth-retry path — the library has a 2s socket timeout, but retries with auth on 4xx, potentially doubling to 4s. The .unref() on the timer prevents process exit delay.
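How the two tiers compose can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `firstWithin`, `cappedUpdateCheck`, `versionForBanner`, and the `lookup` parameter are hypothetical names, and only the 3s and 100ms defaults come from the description above.

```typescript
type Lookup = () => Promise<string | undefined>;

// Resolve with the task's value, or undefined once `ms` elapses, whichever is first.
async function firstWithin(
	task: Promise<string | undefined>,
	ms: number
): Promise<string | undefined> {
	let timer: ReturnType<typeof setTimeout> | undefined;
	const deadline = new Promise<undefined>((resolve) => {
		timer = setTimeout(() => resolve(undefined), ms);
		timer.unref(); // never delay process exit for this timer
	});
	try {
		return await Promise.race([task, deadline]);
	} finally {
		clearTimeout(timer);
	}
}

// Tier 2 (safety net): cap the whole lookup, including any retry it performs.
// A failed lookup is treated the same as "no update information".
function cappedUpdateCheck(lookup: Lookup, capMs = 3000): Promise<string | undefined> {
	return firstWithin(
		lookup().catch(() => undefined),
		capMs
	);
}

// Tier 1 (grace period): the banner waits only briefly for the capped check.
function versionForBanner(
	lookup: Lookup,
	capMs = 3000,
	graceMs = 100
): Promise<string | undefined> {
	return firstWithin(cappedUpdateCheck(lookup, capMs), graceMs);
}
```

With the numbers from the description, a retry path of 2s + 2s = 4s would exceed the 3s cap, so the capped check resolves with `undefined` at 3s, while the banner itself has already moved on after 100ms.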

New env var: WRANGLER_UPDATE_CHECK=false disables the update check entirely.

Change 2: request.cf fetch — AbortSignal timeout (miniflare/src/cf.ts)

const res = await fetch(defaultCfFetchEndpoint, {
    signal: AbortSignal.timeout(3000),
});

On timeout, the existing catch block returns fallbackCf (hardcoded Austin, TX data). AbortSignal.timeout is already used 10+ times in the codebase — established pattern.
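A self-contained sketch of the timeout-plus-fallback pattern is shown below. It is an assumption-laden illustration rather than miniflare's code: `CfLike`, `FALLBACK_CF`, and `fetchCfOrFallback` are hypothetical names, and the two-field object only stands in for the real hardcoded fallback data.

```typescript
// Hypothetical, heavily reduced shape; the real request.cf object has many more fields.
interface CfLike {
	city: string;
	region: string;
}

// Placeholder for the hardcoded fallback mentioned above (illustrative values only).
const FALLBACK_CF: CfLike = { city: "Austin", region: "Texas" };

// Fetch the cf data, giving up after `timeoutMs` and falling back to static data.
async function fetchCfOrFallback(endpoint: string, timeoutMs = 3000): Promise<CfLike> {
	try {
		const res = await fetch(endpoint, {
			// Aborts the request with a TimeoutError once the deadline passes.
			signal: AbortSignal.timeout(timeoutMs),
		});
		if (!res.ok) {
			throw new Error(`Unexpected status ${res.status}`);
		}
		return (await res.json()) as CfLike;
	} catch {
		// Timeout, network failure, invalid URL, or bad response: use the fallback.
		return FALLBACK_CF;
	}
}
```

Because every failure mode funnels into the same `catch`, a slow connection degrades to the fallback after at most `timeoutMs` instead of holding up the dev server.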

Change 3: Metrics dispatch — remove dead code (metrics-dispatcher.ts)

The telemetry fetch is fire-and-forget. The exit handler in index.ts:2265-2268 already races allMetricsDispatchesCompleted() against a 1s timeout:

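// setTimeout here is promise-based: the (delay, value, options) signature of node:timers/promises.
// ref: false keeps the 1s timer from holding the process open on its own.
await Promise.race([
    allMetricsDispatchesCompleted(),
    setTimeout(1000, undefined, { ref: false }),
]);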

An AbortSignal.timeout(3000) on the fetch would never fire in practice because the 1s exit timeout always wins first. During long-running commands (wrangler dev), the dangling connection doesn't block anything. Removed the dead timeout, updated the comment to document why.
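Why the exit race is the effective bound can be seen in a small model. The sketch below is an assumption, not wrangler's metrics dispatcher: `trackDispatch` and `waitForDispatches` are hypothetical names, and the pending set stands in for whatever bookkeeping `allMetricsDispatchesCompleted()` does internally.

```typescript
import { setTimeout as sleep } from "node:timers/promises";

// In-flight fire-and-forget requests (hypothetical bookkeeping).
const pending = new Set<Promise<unknown>>();

// Register a dispatch without awaiting it; failures are swallowed.
function trackDispatch(dispatch: Promise<unknown>): void {
	const tracked: Promise<unknown> = dispatch
		.catch(() => undefined)
		.finally(() => pending.delete(tracked));
	pending.add(tracked);
}

// At exit, wait for in-flight dispatches, but never longer than `maxWaitMs`.
function waitForDispatches(maxWaitMs: number): Promise<"flushed" | "timed-out"> {
	return Promise.race([
		Promise.all(Array.from(pending)).then(() => "flushed" as const),
		// ref: false so this timer alone never keeps the process running.
		sleep(maxWaitMs, "timed-out" as const, { ref: false }),
	]);
}
```

With `maxWaitMs` at 1000, any per-request timeout longer than 1s (such as the removed 3s one) cannot change when shutdown proceeds, because the race has already settled.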

Verification Evidence

6 independent verification agents were run in parallel, each examining a different aspect:

| Agent | Scope | Verdict |
| --- | --- | --- |
| Update check correctness | Promise.race, timer leak, env var, type safety, memoization | PASS |
| cf.json fetch correctness | AbortSignal, fallback, cache-first, undici compat | PASS |
| Metrics revert correctness | Dead code removal, exit handler confirmation | PASS |
| Issue requirements coverage | 7/9 addressed, 2 out of scope (spinner, telemetry disable bug) | PASS |
| Test suite compatibility | Auto-mocks handle new exports, MSW transparent to signal | PASS |
| Offline/degraded regression | 6 scenarios (offline, DNS hang, slow, headers hang, restart, all disabled) | ALL SAFE |

Type checking: pnpm run -r --filter wrangler check:type passes with zero errors.

Grill-tested design decisions

Each design choice was challenged through adversarial review:

| Decision | Challenge | Resolution |
| --- | --- | --- |
| 100ms grace period | Is it enough for cache readFile? | Yes: the event loop's I/O poll phase processes it in <1ms, before the timer fires |
| 3s safety-net timeout | Library already has 2s socket timeout | 3s caps the auth-retry path, which can otherwise reach 4s (2s + 2s) |
| No early fire in main() | Would give more head start | Causes unnecessary phone-home for printBanner: false commands (build, kv, tail) |
| Remove metrics timeout | Was it protecting anything? | No: the exit handler's 1s race is the real protection |
| WRANGLER_UPDATE_CHECK only accepts false | Should 0/no work too? | Consistent with codebase pattern (only true/false) |

  • Tests
    • Tests included/updated
    • Automated tests not possible - manual testing has been completed as follows:
    • Additional testing not necessary because: All existing tests pass — update-check is globally auto-mocked in vitest.setup.ts, cf.spec.ts tests never reach the network fetch path, and metrics.test.ts uses MSW which is transparent to the signal option. The changes only affect timeout behavior on slow/degraded real networks, which cannot be reliably unit tested.
  • Public documentation
    • Cloudflare docs PR(s):
    • Documentation not necessary because: The WRANGLER_UPDATE_CHECK env var is a power-user escape hatch. Changeset describes user-facing changes for the changelog.

🤖 Generated with Claude Code



Make startup network requests non-blocking on slow/degraded connections
(airplane wifi, trains) by applying the industry-standard two-tier pattern:
non-critical requests are bounded, not awaited indefinitely.

- update-check: Race against 100ms grace period in banner (cache hit
  resolves in <1ms via I/O poll). 3s safety-net timeout caps the library's
  auth-retry path. New WRANGLER_UPDATE_CHECK=false env var to disable.
- cf.json fetch: AbortSignal.timeout(3000) with existing fallback to
  hardcoded Austin, TX data.
- metrics dispatch: Remove unnecessary AbortSignal.timeout — exit handler's
  1s race already covers shutdown.

Closes cloudflare#9946

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mksglu mksglu requested a review from workers-devprod as a code owner April 9, 2026 18:57
@changeset-bot

changeset-bot bot commented Apr 9, 2026

🦋 Changeset detected

Latest commit: 4a331c2

The changes in this PR will be included in the next version bump.

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@workers-devprod workers-devprod requested review from a team and petebacondarwin and removed request for a team April 9, 2026 18:57
@workers-devprod
Contributor

workers-devprod commented Apr 9, 2026

Codeowners approval required for this PR:

  • ✅ @cloudflare/wrangler

Contributor

@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.


Contributor

@petebacondarwin petebacondarwin left a comment


Let's remove the env var. Otherwise LGTM

Comment thread packages/wrangler/src/update-check.ts Outdated
@pkg-pr-new

pkg-pr-new bot commented Apr 10, 2026

create-cloudflare

npm i https://pkg.pr.new/create-cloudflare@13386

@cloudflare/kv-asset-handler

npm i https://pkg.pr.new/@cloudflare/kv-asset-handler@13386

miniflare

npm i https://pkg.pr.new/miniflare@13386

@cloudflare/pages-shared

npm i https://pkg.pr.new/@cloudflare/pages-shared@13386

@cloudflare/unenv-preset

npm i https://pkg.pr.new/@cloudflare/unenv-preset@13386

@cloudflare/vite-plugin

npm i https://pkg.pr.new/@cloudflare/vite-plugin@13386

@cloudflare/vitest-pool-workers

npm i https://pkg.pr.new/@cloudflare/vitest-pool-workers@13386

@cloudflare/workers-editor-shared

npm i https://pkg.pr.new/@cloudflare/workers-editor-shared@13386

wrangler

npm i https://pkg.pr.new/wrangler@13386

commit: 4a331c2

mksglu and others added 2 commits April 10, 2026 17:50
Addresses @petebacondarwin's review feedback — removes the
WRANGLER_UPDATE_CHECK=false env var. The timeout-based approach
is sufficient without an additional kill switch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mksglu
Contributor Author

mksglu commented Apr 13, 2026

@petebacondarwin Removed the WRANGLER_UPDATE_CHECK env var as requested. Ready for re-review.

Contributor

@workers-devprod workers-devprod left a comment


Codeowners reviews satisfied

@github-project-automation github-project-automation bot moved this from In Review to Approved in workers-sdk Apr 13, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mksglu
Contributor Author

mksglu commented Apr 13, 2026

@petebacondarwin Fixed the check:format CI failure — applied oxfmt formatting. Could someone re-run the CI checks? Thanks.

@petebacondarwin petebacondarwin merged commit 5e5bbc1 into cloudflare:main Apr 13, 2026
47 checks passed
@github-project-automation github-project-automation bot moved this from Approved to Done in workers-sdk Apr 13, 2026
petebacondarwin pushed a commit that referenced this pull request Apr 14, 2026
…ions (#13386)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

wrangler phones home too much, and it's annoyingly slow

3 participants