[miniflare] Recover from corrupted @puppeteer/browsers cache in launchBrowser #13980
When Miniflare's local Browser Run binding launches Chrome, it calls `install()` from `@puppeteer/browsers` to ensure the binary is present. If a previous `install()` was interrupted mid-extraction (test timeout, process kill, antivirus quarantine), the cache directory can be left partially populated: the folder exists but the executable inside it is missing. `install()` then throws `The browser folder (...) exists but the executable (...) is missing` on every subsequent call for the rest of the process, so every later Browser Run operation in the test session fails until the cache is cleared manually.

`launchBrowser` now catches that specific error, removes the corrupted cache directory, and retries `install()` once. If the corruption persists after cleanup, the original error is rethrown with a clearer message.

This complements #13971, which surfaced the original error from inside the binding worker. With that diagnostic in place and this self-healing layer, the previously intermittent "browser folder exists but executable missing" failure mode should no longer fail an entire CI run.

Manually verified by deleting the Chrome executable from the local Wrangler cache, then running the browser tests: the install error is detected, the cache directory is cleared, and Chrome is re-downloaded on the retry.
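The recovery flow described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code in the PR: only the regex is taken from the PR description, while the injected `doInstall` and `removeDir` parameters, the `extractCorruptedFolder` helper, and the wrapped error messages are hypothetical. In real use `doInstall` would wrap `install(options)` from `@puppeteer/browsers`.

```ts
import { rm } from "node:fs/promises";

// Regex quoted in the PR description. The lazy group captures the folder path
// between "The browser folder (" and ") exists".
const CORRUPTED_CACHE_RE =
	/The browser folder \((.+?)\) exists but the executable .+? is missing/;

// Hypothetical helper: returns the corrupted folder path, or undefined for any other error.
export function extractCorruptedFolder(error: unknown): string | undefined {
	const message = error instanceof Error ? error.message : String(error);
	return CORRUPTED_CACHE_RE.exec(message)?.[1];
}

// Hypothetical signature: `doInstall` stands in for `() => install(options)`,
// and `removeDir` defaults to a recursive, forced `rm` of the folder.
export async function installWithCorruptedCacheRecovery<T>(
	doInstall: () => Promise<T>,
	removeDir: (dir: string) => Promise<void> = (dir) =>
		rm(dir, { recursive: true, force: true }),
): Promise<T> {
	try {
		return await doInstall();
	} catch (e) {
		const corruptedPath = extractCorruptedFolder(e);
		// Unrelated failures, or a reworded message, are rethrown unchanged.
		if (corruptedPath === undefined) throw e;

		try {
			await removeDir(corruptedPath);
		} catch (cleanupError) {
			throw new Error(
				`Failed to clear corrupted Chrome cache at ${corruptedPath}. Manual cleanup may be required.`,
				{ cause: cleanupError },
			);
		}

		try {
			// Exactly one retry, so persistent corruption cannot loop forever.
			return await doInstall();
		} catch (retryError) {
			throw new Error(
				`Chrome install still failed after clearing corrupted cache at ${corruptedPath}`,
				{ cause: retryError },
			);
		}
	}
}
```

Injecting `removeDir` and `doInstall` keeps the sketch independent of the real cache; a call site would look something like `await installWithCorruptedCacheRecovery(() => install(options))`.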
🦋 Changeset detected
Latest commit: c5c1999
The changes in this PR will be included in the next version bump. This PR includes changesets to release 5 packages.
Codeowners approval required for this PR:
✅ All changesets look good
Now I have a full picture. Let me verify the type signature. The `install()` overloads are:

```ts
export function install(options: InstallOptions & {unpack?: true}): Promise<InstalledBrowser>;
export function install(options: InstallOptions & {unpack: false}): Promise<string>;
```

The wrapper function signature …

Now let me verify that the error message in the source matches the regex. The error in `@puppeteer/browsers` is `The browser folder (...) exists but the executable (...) is missing`, and the regex is `/The browser folder \((.+?)\) exists but the executable .+? is missing/`. This correctly captures the folder path. Let me also verify the captured error path (…).

One thing I notice: the `downloadProgressCallback` option is typed as:

```ts
downloadProgressCallback?: 'default' | ((downloadedBytes: number, totalBytes: number) => void);
```

The explicit annotations on the callback parameters are fine. They were likely needed because extracting the options object into a separate variable lost the contextual typing from `install()`'s parameter type.

Let me also check for any potential issues with the regex: it captures the path between `The browser folder (` and `) exists`.

Let me also verify that the error thrown in the cleanup-failure case preserves the original error information:

```ts
throw new Error(
  `Failed to clear corrupted Chrome cache at ${corruptedPath} after detecting "${(e as Error).message}". Manual cleanup may be required.`,
  { cause: cleanupError }
);
```

This preserves the cleanup error as `cause` … Actually, looking more carefully, this is a minor concern. The original error message is fully quoted in the new error message string, which is sufficient for diagnostics.

Overall this is a clean, well-scoped change. The logic is sound, the recovery is single-retry (no infinite loops), non-matching errors pass through unchanged, and the regex degradation mode (if `@puppeteer/browsers` changes the error message) is a safe no-op. LGTM
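The reviewer's point about contextual typing can be illustrated with a small, self-contained sketch. Only the callback type is copied from the review; `ProgressOptions` and `runWithProgress` are hypothetical stand-ins for `InstallOptions` and `install()`.

```ts
// Callback type as quoted in the review.
type DownloadProgressCallback =
	| "default"
	| ((downloadedBytes: number, totalBytes: number) => void);

// Hypothetical, trimmed-down stand-in for InstallOptions.
interface ProgressOptions {
	downloadProgressCallback?: DownloadProgressCallback;
}

// Hypothetical stand-in for install(): reports a single progress tick.
function runWithProgress(options: ProgressOptions): void {
	const cb = options.downloadProgressCallback;
	if (typeof cb === "function") cb(50, 100);
}

const seen: number[] = [];

// 1. Inline object literal: `done` and `total` are contextually typed as number.
runWithProgress({
	downloadProgressCallback: (done, total) => {
		seen.push(done / total);
	},
});

// 2. Object extracted into an unannotated variable: there is no contextual type,
//    so the parameters need explicit `: number` annotations (otherwise they are
//    implicitly `any`, which is an error under `noImplicitAny`).
const extracted = {
	downloadProgressCallback: (done: number, total: number) => {
		seen.push(done / total);
	},
};
runWithProgress(extracted);

// 3. Annotating the variable with the options type restores contextual typing.
const annotated: ProgressOptions = {
	downloadProgressCallback: (done, total) => {
		seen.push(done / total);
	},
};
runWithProgress(annotated);
```

Forms 2 and 3 both type-check; annotating the variable is an alternative to annotating each callback parameter.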
create-cloudflare
@cloudflare/kv-asset-handler
miniflare
@cloudflare/pages-shared
@cloudflare/unenv-preset
@cloudflare/vite-plugin
@cloudflare/vitest-pool-workers
@cloudflare/workers-editor-shared
@cloudflare/workers-utils
wrangler
workers-devprod left a comment
Codeowners reviews satisfied
- Stop the spinner with a failure message if `install()` throws after a download was already in progress, so the spinner does not keep spinning on error.
- Wrap the post-recovery retry `install()` error with a message that mentions the corruption-recovery context, preserving the original error via `Error.cause`.

Addresses Devin review comments on #13980.
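A minimal sketch of the spinner fix, assuming a hypothetical `Spinner` interface and a `doInstall` function that reports download progress through a callback (the role `downloadProgressCallback` plays in `@puppeteer/browsers`). The names and messages are illustrative, not the PR's code.

```ts
// Hypothetical minimal spinner interface; the real CLI spinner may differ.
interface Spinner {
	start(text: string): void;
	succeed(text: string): void;
	fail(text: string): void;
}

type ProgressCallback = (downloadedBytes: number, totalBytes: number) => void;

// Starts the spinner on the first progress tick and guarantees it is stopped,
// with a failure message, if the install throws mid-download.
async function installWithSpinner<T>(
	spinner: Spinner,
	doInstall: (onProgress: ProgressCallback) => Promise<T>,
): Promise<T> {
	let downloading = false;
	const onProgress: ProgressCallback = () => {
		if (!downloading) {
			downloading = true;
			spinner.start("Downloading Chrome...");
		}
	};
	try {
		const result = await doInstall(onProgress);
		if (downloading) spinner.succeed("Downloaded Chrome");
		return result;
	} catch (e) {
		// Without this branch the spinner would keep spinning after the error.
		if (downloading) spinner.fail("Failed to download Chrome");
		throw e;
	}
}
```

When the binary is already cached, no progress is reported, so the spinner is never started and nothing needs stopping.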
Browser Run tests in `packages/miniflare/test/plugins/browser/index.spec.ts` and the `fixtures/browser-run` fixture call into `@puppeteer/browsers` to ensure Chrome is downloaded into the global Wrangler cache. Every CI run currently re-downloads ~150 MB of Chrome from scratch because the cache directory is per-runner-instance and not shared between runs.

Add an `actions/cache@v4` step keyed on the OS plus the Chrome version hardcoded in `packages/miniflare/src/index.ts` (`"126.0.6478.182"`), restoring and saving:

- `~/.cache/.wrangler/chrome` (Linux)
- `~/Library/Caches/.wrangler/chrome` (macOS)
- `~/AppData/Local/xdg.cache/.wrangler/chrome` (Windows)

Benefits:

- Cuts ~150 MB and the associated download time off cold CI runs.
- Reduces the surface area for the intermittent partial-extraction race that surfaces as `The browser folder (...) exists but the executable (...) is missing` (see #13971 for the diagnostic that exposed this, and #13980 for the in-process recovery layer). When the cache is warm and the binary is already extracted, this race can't fire at all because `install()` short-circuits.

The cache step runs for the `packages-and-tools` suite on all three OSes and for the `fixtures` suite on macOS and Windows (the Browser Run fixture is excluded on Ubuntu because of AppArmor).

When the Chrome version in `packages/miniflare/src/index.ts` changes, the cache key here needs to be bumped manually. A miss only triggers a fresh download, with no functional impact.
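The commit adds an `actions/cache@v4` step to the workflow; the sketch below is a TypeScript illustration of the same key and path rules, not the workflow itself. The version string and the three relative paths come from the commit message, while `chromeCacheDir`, `chromeCacheKey`, and the key format are hypothetical.

```ts
import { homedir } from "node:os";
import { join } from "node:path";

// Chrome version hardcoded in packages/miniflare/src/index.ts (per the commit message).
const CHROME_VERSION = "126.0.6478.182";

// Per-OS cache locations from the commit message, relative to the home directory
// and keyed by Node's process.platform values.
const CHROME_CACHE_SUBDIRS: Record<string, string[] | undefined> = {
	linux: [".cache", ".wrangler", "chrome"],
	darwin: ["Library", "Caches", ".wrangler", "chrome"],
	win32: ["AppData", "Local", "xdg.cache", ".wrangler", "chrome"],
};

// Hypothetical helper: absolute Chrome cache directory for a given platform.
function chromeCacheDir(
	platform: string = process.platform,
	home: string = homedir(),
): string {
	const parts = CHROME_CACHE_SUBDIRS[platform];
	if (parts === undefined) {
		throw new Error(`No Wrangler Chrome cache location known for ${platform}`);
	}
	return join(home, ...parts);
}

// Hypothetical key format: OS plus Chrome version. A different version yields a
// different key, i.e. a cache miss and a fresh download.
function chromeCacheKey(os: string, version: string = CHROME_VERSION): string {
	return `${os}-wrangler-chrome-${version}`;
}
```

Keying on the version means a stale Chrome build is never restored under a new version's key; the cost is the manual bump noted above whenever `CHROME_VERSION` changes.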
No tracked issue; follow-up to #13971.
What
When Miniflare's local Browser Run binding launches Chrome, it calls `@puppeteer/browsers`' `install()` to ensure the binary is present. If a previous `install()` was interrupted mid-extraction (test timeout, process kill, antivirus quarantine), the cache directory can be left partially populated: the folder exists but the executable inside it is missing. `install()` then throws on every subsequent call within the same process for the rest of the test session:

`The browser folder (...) exists but the executable (...) is missing`

This was visible on a previous CI run for #13971 (job log) and recurs intermittently on Windows runners. With `retry: 3` configured on the affected tests, all four attempts hit the same corrupted state and all four fail.

How
Wrap the `install()` call in `launchBrowser` with `installWithCorruptedCacheRecovery`, which:

1. Calls `install(options)`.
2. If it throws an error matching `/The browser folder \((.+?)\) exists but the executable .+? is missing/`, captures the corrupted path.
3. `removeDir`s the corrupted path and retries `install()` once.

The regex is matched against the literal error string from `@puppeteer/browsers@2.10.6/src/install.ts:309-311`. If `@puppeteer/browsers` ever rewords that error, this branch becomes a no-op (we'd rethrow the original error, just without recovery), so there is no risk of silently swallowing unrelated install failures.

Test plan
- `pnpm -F miniflare check:type` ✅
- `pnpm check:lint` ✅
- `pnpm check:format` ✅
- `pnpm -F miniflare test:ci test/plugins/browser/index.spec.ts` ✅ (all 21 browser tests pass on the happy path)
- Deleted the Chrome executable from the local Wrangler cache (`~/Library/Caches/.wrangler/chrome/mac_arm-126.0.6478.182/.../Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing`), then ran `pnpm -F miniflare test:ci test/plugins/browser/index.spec.ts -t "it creates a browser session"`. The test passed: the install error was detected, the cache directory was cleared, and Chrome was re-downloaded on the retry.

Why no new tests
The recovery path is only hit when an external system (filesystem state, antivirus, OS) puts the cache into an invalid state. Writing a unit test that synthetically corrupts the cache would either (a) duplicate the `@puppeteer/browsers` state machine in mock code, or (b) actually invoke the real `install()` against a real cache, which makes it slow and flaky. The manual verification above exercises the real flow against a real corrupted cache, which is what we care about.

- Ran `pnpm -F miniflare test:ci test/plugins/browser/index.spec.ts -t "it creates a browser session"` and confirmed the install error was detected, the cache directory was cleared, and Chrome was re-downloaded. See "Test plan" above.
- …`@puppeteer/browsers` cache failure mode. No public API or user-facing behaviour change beyond a more helpful warning log when the recovery fires.