Skip to content

Clear stale SW caches before reloading on ChunkLoadError#92339

Merged
mountiny merged 15 commits into
mainfrom
michal-fix-safari-crash
Jun 8, 2026
Merged

Clear stale SW caches before reloading on ChunkLoadError#92339
mountiny merged 15 commits into
mainfrom
michal-fix-safari-crash

Conversation

@mjasikowski
Copy link
Copy Markdown
Contributor

@mjasikowski mjasikowski commented Jun 2, 2026

Both the manual Refresh path (usePageRefresh) and the automatic lazyRetry path were calling window.location.reload() without first clearing the Workbox service worker caches. In Safari PWA standalone mode the new SW (skipWaiting + clientsClaim) intercepts the reload and re-serves the same stale precached shell, reproducing the identical ChunkLoadError on every refresh attempt.

Route both reloads through clearWorkboxRecoveryCaches() first, which unregisters the SW and deletes all Cache Storage so the reload fetches a fresh, internally-consistent shell from the CDN.

Adds regression tests for both paths.

Explanation of Change

Fixed Issues

$ #92407
PROPOSAL:

Tests

  1. In Safari PWA, run the following script in console:
(async () => {
  const report = {controller: navigator.serviceWorker.controller?.scriptURL ?? 'NONE', patched: [], caches: {}};
  if (report.controller === 'NONE') {
    report.note = 'Page is not SW-controlled yet. Reload once, then run again.';
    return JSON.stringify(report, null, 2);
  }
  for (const name of await caches.keys()) {
    const cache = await caches.open(name);
    const keys = await cache.keys();
    report.caches[name] = keys.length;
    for (const req of keys) {
      if (!/\/runtime-[^/]*\.bundle\.js/.test(req.url)) continue;
      const res = await cache.match(req);
      const text = await res.text();
      if (!text.includes('.p="/"')) { report.patched.push(`${req.url} :: PATTERN NOT FOUND`); continue; }
      const headers = new Headers(res.headers);
      headers.delete('content-encoding');
      headers.delete('content-length');
      const broken = text.replace('.p="/"', '.p="/__chunk404__/"');
      await cache.put(req, new Response(broken, {status: res.status, statusText: res.statusText, headers}));
      report.patched.push(`${name} :: ${req.url} :: OK`);
    }
  }
  report.note = report.patched.some((p) => p.endsWith('OK'))
    ? 'Patched. Fully quit the PWA (swipe away) and reopen while logged in to see the ChunkLoadError loop.'
    : 'No runtime patched — paste this output back.';
  return JSON.stringify(report, null, 2);
})();
  1. Refresh the app
  2. Confirm that the app reloads fine

(if the QA fails and it breaks, run the following the script to fix)

(async () => { for (const n of await caches.keys()) await caches.delete(n); for (const r of await navigator.serviceWorker.getRegistrations()) await r.unregister(); location.reload(); })();
  • Verify that no errors appear in the JS console

Offline tests

QA Steps

Same as tests

  • Verify that no errors appear in the JS console

PR Author Checklist

  • I linked the correct issue in the ### Fixed Issues section above
  • I wrote clear testing steps that cover the changes made in this PR
    • I added steps for local testing in the Tests section
    • I added steps for the expected offline behavior in the Offline steps section
    • I added steps for Staging and/or Production testing in the QA steps section
    • I added steps to cover failure scenarios (i.e. verify an input displays the correct error message if the entered data is not correct)
    • I turned off my network connection and tested it while offline to ensure it matches the expected behavior (i.e. verify the default avatar icon is displayed if app is offline)
    • I tested this PR with a High Traffic account against the staging or production API to ensure there are no regressions (e.g. long loading states that impact usability).
  • I included screenshots or videos for tests on all platforms
  • I ran the tests on all platforms & verified they passed on:
    • Android: Native
    • Android: mWeb Chrome
    • iOS: Native
    • iOS: mWeb Safari
    • MacOS: Chrome / Safari
    • MacOS: Desktop
  • I verified there are no console errors (if there's a console error not related to the PR, report it or open an issue for it to be fixed)
  • I verified there are no new alerts related to the canBeMissing param for useOnyx
  • I followed proper code patterns (see Reviewing the code)
    • I verified that any callback methods that were added or modified are named for what the method does and never what callback they handle (i.e. toggleReport and not onIconClick)
    • I verified that comments were added to code that is not self explanatory
    • I verified that any new or modified comments were clear, correct English, and explained "why" the code was doing something instead of only explaining "what" the code was doing.
    • I verified any copy / text shown in the product is localized by adding it to src/languages/* files and using the translation method
      • If any non-english text was added/modified, I used JaimeGPT to get English > Spanish translation. I then posted it in #expensify-open-source and it was approved by an internal Expensify engineer. Link to Slack message:
    • I verified all numbers, amounts, dates and phone numbers shown in the product are using the localization methods
    • I verified any copy / text that was added to the app is grammatically correct in English. It adheres to proper capitalization guidelines (note: only the first word of header/labels should be capitalized), and is either coming verbatim from figma or has been approved by marketing (in order to get marketing approval, ask the Bug Zero team member to add the Waiting for copy label to the issue)
    • I verified proper file naming conventions were followed for any new files or renamed files. All non-platform specific files are named after what they export and are not named "index.js". All platform-specific files are named for the platform the code supports as outlined in the README.
    • I verified the JSDocs style guidelines (in STYLE.md) were followed
  • If a new code pattern is added I verified it was agreed to be used by multiple Expensify engineers
  • I followed the guidelines as stated in the Review Guidelines
  • I tested other components that can be impacted by my changes (i.e. if the PR modifies a shared library or component like Avatar, I verified the components using Avatar are working as expected)
  • I verified all code is DRY (the PR doesn't include any logic written more than once, with the exception of tests)
  • I verified any variables that can be defined as constants (ie. in CONST.ts or at the top of the file that uses the constant) are defined as such
  • I verified that if a function's arguments changed that all usages have also been updated correctly
  • If any new file was added I verified that:
    • The file has a description of what it does and/or why is needed at the top of the file if the code is not self explanatory
  • If a new CSS style is added I verified that:
    • A similar style doesn't already exist
    • The style can't be created with an existing StyleUtils function (i.e. StyleUtils.getBackgroundAndBorderStyle(theme.componentBG))
  • If new assets were added or existing ones were modified, I verified that:
    • The assets are optimized and compressed (for SVG files, run npm run compress-svg)
    • The assets load correctly across all supported platforms.
  • If the PR modifies code that runs when editing or sending messages, I tested and verified there is no unexpected behavior for all supported markdown - URLs, single line code, code blocks, quotes, headings, bold, strikethrough, and italic.
  • If the PR modifies a generic component, I tested and verified that those changes do not break usages of that component in the rest of the App (i.e. if a shared library or component like Avatar is modified, I verified that Avatar is working as expected in all cases)
  • If the PR modifies a component related to any of the existing Storybook stories, I tested and verified all stories for that component are still working as expected.
  • If the PR modifies a component or page that can be accessed by a direct deeplink, I verified that the code functions as expected when the deeplink is used - from a logged in and logged out account.
  • If the PR modifies the UI (e.g. new buttons, new UI components, changing the padding/spacing/sizing, moving components, etc) or modifies the form input styles:
    • I verified that all the inputs inside a form are aligned with each other.
    • I added Design label and/or tagged @Expensify/design so the design team can review the changes.
  • If a new page is added, I verified it's using the ScrollView component to make it scrollable when more elements are added to the page.
  • I added unit tests for any new feature or bug fix in this PR to help automatically prevent regressions in this user flow.
  • If the main branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test steps.

Both the manual Refresh path (usePageRefresh) and the automatic
lazyRetry path were calling window.location.reload() without first
clearing the Workbox service worker caches. In Safari PWA standalone
mode the new SW (skipWaiting + clientsClaim) intercepts the reload and
re-serves the same stale precached shell, reproducing the identical
ChunkLoadError on every refresh attempt.

Route both reloads through clearWorkboxRecoveryCaches() first, which
unregisters the SW and deletes all Cache Storage so the reload fetches
a fresh, internally-consistent shell from the CDN.

Adds regression tests for both paths.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 2, 2026

🚧 @mjasikowski has triggered a test Expensify/App build. You can view the workflow run here.

@github-actions

This comment has been minimized.

@mjasikowski
Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 🎉

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@mjasikowski
Copy link
Copy Markdown
Contributor Author

@MelvinBot create an issue for this

@MelvinBot
Copy link
Copy Markdown
Contributor

🤖 Created an issue for this: #92407

It describes the Safari PWA ChunkLoadError crash loop (stale SW precache re-served on reload) that this PR fixes. You can link it in the Fixed Issues section above.

index.html is served with Cache-Control: max-age=86400, so after the
service worker and Cache Storage are cleared, a plain window.location.reload()
can still be handed a stale shell from Safari's HTTP cache (or the CDN edge)
that references dead chunk hashes, keeping the loop alive on Safari PWA.

Add reloadWithCacheBust(), which navigates to a unique URL via location.replace
so the document misses the HTTP/edge cache and is fetched fresh, and use it in
both recovery paths after clearing caches. Falls back to a plain reload if the
navigation throws.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 3, 2026

🚧 @mjasikowski has triggered a test Expensify/App build. You can view the workflow run here.

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 3, 2026

Codecov Report

❌ Looks like you've decreased code coverage for some files. Please write tests to increase, or at least maintain, the existing level of code coverage. See our documentation here for how to interpret this table.

Files with missing lines Coverage Δ
src/components/LazyModalSlot.tsx 36.36% <ø> (-5.31%) ⬇️
src/hooks/usePageRefresh/index.ts 100.00% <100.00%> (+100.00%) ⬆️
src/libs/clearWorkboxRecoveryCaches/index.ts 21.42% <ø> (ø)
src/utils/lazyRetry.ts 100.00% <100.00%> (+100.00%) ⬆️
src/libs/Navigation/AppNavigator/index.tsx 0.00% <0.00%> (ø)
src/libs/isChunkLoadError.ts 75.00% <75.00%> (ø)
src/libs/Navigation/AppNavigator/AppNavigator.tsx 0.00% <0.00%> (ø)
src/pages/ErrorPage/GenericErrorPage.tsx 0.00% <0.00%> (ø)
... and 159 files with indirect coverage changes

@github-actions

This comment has been minimized.

The service-worker precache is the confirmed source of the stale shell, and
clearWorkboxRecoveryCaches() before a plain reload addresses it. A reload
navigation already bypasses Safari's local HTTP cache (cache-mode "reload"),
so the cache-busting URL added little beyond defeating a stale CDN edge entry
(better handled by edge no-cache headers) while leaving a forceReload query
param in the address bar. Revert it back to a plain window.location.reload().
A chunk load can fail from a transient network blip, not just a
stale post-deploy shell. Routing the first failure through
clearWorkboxRecoveryCaches() nuked all caches and re-precached
the full app unnecessarily for a problem a plain reload would fix.

First failure: plain window.location.reload() (cheap, handles blips).
Second failure: clearWorkboxRecoveryCaches() then reload (handles
the stale SW precache scenario where the plain reload did not help).

lazyRetry no longer rejects on second failure - it always reloads.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 3, 2026

🚧 @mjasikowski has triggered a test Expensify/App build. You can view the workflow run here.

@github-actions

This comment has been minimized.

@mjasikowski mjasikowski marked this pull request as ready for review June 3, 2026 11:54
@mjasikowski mjasikowski requested review from a team as code owners June 3, 2026 11:54
@melvin-bot melvin-bot Bot requested review from QichenZhu and heyjennahay and removed request for a team June 3, 2026 11:54
@mjasikowski mjasikowski added the InternalQA This pull request required internal QA label Jun 3, 2026
@mjasikowski mjasikowski added the Internal Requires API changes or must be handled by Expensify staff label Jun 3, 2026
@mjasikowski mjasikowski requested a review from a team June 3, 2026 11:55
Comment thread src/hooks/usePageRefresh/index.ts Outdated

window.location.reload();
sessionStorage.removeItem(CONST.SESSION_STORAGE_KEYS.LAST_REFRESH_TIMESTAMP);
clearWorkboxRecoveryCaches().then(() => window.location.reload());
Copy link
Copy Markdown
Contributor

@situchan situchan Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If caches.keys or navigator.serviceWorker.getRegistrations fails on safari (the exact platform this bug targets), this promise is never resolved so reload won't happen. Is this expected?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of them are wrapped in try/catch, so the promise will resolve

Comment thread src/utils/lazyRetry.ts Outdated
Comment on lines +18 to +23
function isChunkLoadError(error: unknown): boolean {
if (!(error instanceof Error)) {
return false;
}
return error.name === CONST.CHUNK_LOAD_ERROR || /Loading chunk \S+ failed/i.test(error.message);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is duplicated. Consider extracting a shared function and use it everywhere.

const isChunkLoadError = (error: Error) => error.name === 'ChunkLoadError' || /Loading chunk \S+ failed/i.test(error.message);

Also here:

const isChunkLoadError = error?.name === CONST.CHUNK_LOAD_ERROR || /Loading chunk [\d]+ failed/.test(error?.message ?? '');

One minor difference is [\d]+ vs \S+. Latter one seems better.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted isChunkLoadError.ts to a separate file

Comment thread src/hooks/usePageRefresh/index.ts Outdated

window.location.reload();
sessionStorage.removeItem(CONST.SESSION_STORAGE_KEYS.LAST_REFRESH_TIMESTAMP);
clearWorkboxRecoveryCaches().then(() => window.location.reload());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When isChunkLoadError is false but the user hits Refresh twice within ERROR_WINDOW_RELOAD_TIMEOUT, it now falls through to clearWorkboxRecoveryCaches() here even for non-chunk errors. Hope this is fine.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it so clearWorkboxRecoveryCaches() is now only called when isChunkLoadError is true. Non-chunk errors use a plain window.location.reload(). Also added a test to cover that scenario.

… path

isChunkLoadError was defined three times with slight differences
(LazyModalSlot.tsx, GenericErrorPage.tsx, lazyRetry.ts). Extract it to
src/libs/isChunkLoadError.ts and import from there in all three places.
Uses CONST.CHUNK_LOAD_ERROR and the broader \S+ pattern everywhere.

usePageRefresh was calling clearWorkboxRecoveryCaches() for all reload
paths including non-chunk errors. When Refresh was clicked twice within
ERROR_WINDOW_RELOAD_TIMEOUT for a generic render error the fall-through
branch would unnecessarily clear the service worker cache. Gate the
cache-clear on isChunkLoadError and use a plain reload otherwise.
@mjasikowski mjasikowski requested a review from situchan June 4, 2026 10:23
@situchan
Copy link
Copy Markdown
Contributor

situchan commented Jun 5, 2026

Reviewer Checklist

  • I have verified the author checklist is complete (all boxes are checked off).
  • I verified the correct issue is linked in the ### Fixed Issues section above
  • I verified testing steps are clear and they cover the changes made in this PR
    • I verified the steps for local testing are in the Tests section
    • I verified the steps for Staging and/or Production testing are in the QA steps section
    • I verified the steps cover any possible failure scenarios (i.e. verify an input displays the correct error message if the entered data is not correct)
    • I turned off my network connection and tested it while offline to ensure it matches the expected behavior (i.e. verify the default avatar icon is displayed if app is offline)
  • I checked that screenshots or videos are included for tests on all platforms
  • I included screenshots or videos for tests on all platforms
  • I verified that the composer does not automatically focus or open the keyboard on mobile unless explicitly intended. This includes checking that returning the app from the background does not unexpectedly open the keyboard.
  • I verified tests pass on all platforms & I tested again on:
    • Android: HybridApp
    • Android: mWeb Chrome
    • iOS: HybridApp
    • iOS: mWeb Safari
    • MacOS: Chrome / Safari
    • MacOS: Desktop
  • If there are any errors in the console that are unrelated to this PR, I either fixed them (preferred) or linked to where I reported them in Slack
  • I verified there are no new alerts related to the canBeMissing param for useOnyx
  • I verified proper code patterns were followed (see Reviewing the code)
    • I verified that any callback methods that were added or modified are named for what the method does and never what callback they handle (i.e. toggleReport and not onIconClick).
    • I verified that comments were added to code that is not self explanatory
    • I verified that any new or modified comments were clear, correct English, and explained "why" the code was doing something instead of only explaining "what" the code was doing.
    • I verified any copy / text shown in the product is localized by adding it to src/languages/* files and using the translation method
    • I verified all numbers, amounts, dates and phone numbers shown in the product are using the localization methods
    • I verified any copy / text that was added to the app is grammatically correct in English. It adheres to proper capitalization guidelines (note: only the first word of header/labels should be capitalized), and is either coming verbatim from figma or has been approved by marketing (in order to get marketing approval, ask the Bug Zero team member to add the Waiting for copy label to the issue)
    • I verified proper file naming conventions were followed for any new files or renamed files. All non-platform specific files are named after what they export and are not named "index.js". All platform-specific files are named for the platform the code supports as outlined in the README.
    • I verified the JSDocs style guidelines (in STYLE.md) were followed
  • If a new code pattern is added I verified it was agreed to be used by multiple Expensify engineers
  • I verified that this PR follows the guidelines as stated in the Review Guidelines
  • I verified other components that can be impacted by these changes have been tested, and I retested again (i.e. if the PR modifies a shared library or component like Avatar, I verified the components using Avatar have been tested & I retested again)
  • I verified all code is DRY (the PR doesn't include any logic written more than once, with the exception of tests)
  • I verified any variables that can be defined as constants (ie. in CONST.ts or at the top of the file that uses the constant) are defined as such
  • If a new component is created I verified that:
    • A similar component doesn't exist in the codebase
    • All props are defined accurately and each prop has a /** comment above it */
    • The file is named correctly
    • The component has a clear name that is non-ambiguous and the purpose of the component can be inferred from the name alone
    • The only data being stored in the state is data necessary for rendering and nothing else
    • For Class Components, any internal methods passed to components event handlers are bound to this properly so there are no scoping issues (i.e. for onClick={this.submit} the method this.submit should be bound to this in the constructor)
    • Any internal methods bound to this are necessary to be bound (i.e. avoid this.submit = this.submit.bind(this); if this.submit is never passed to a component event handler like onClick)
    • All JSX used for rendering exists in the render method
    • The component has the minimum amount of code necessary for its purpose, and it is broken down into smaller components in order to separate concerns and functions
  • If any new file was added I verified that:
    • The file has a description of what it does and/or why is needed at the top of the file if the code is not self explanatory
  • If a new CSS style is added I verified that:
    • A similar style doesn't already exist
    • The style can't be created with an existing StyleUtils function (i.e. StyleUtils.getBackgroundAndBorderStyle(theme.componentBG)
  • If the PR modifies code that runs when editing or sending messages, I tested and verified there is no unexpected behavior for all supported markdown - URLs, single line code, code blocks, quotes, headings, bold, strikethrough, and italic.
  • If the PR modifies a generic component, I tested and verified that those changes do not break usages of that component in the rest of the App (i.e. if a shared library or component like Avatar is modified, I verified that Avatar is working as expected in all cases)
  • If the PR modifies a component related to any of the existing Storybook stories, I tested and verified all stories for that component are still working as expected.
  • If the PR modifies a component or page that can be accessed by a direct deeplink, I verified that the code functions as expected when the deeplink is used - from a logged in and logged out account.
  • If the PR modifies the UI (e.g. new buttons, new UI components, changing the padding/spacing/sizing, moving components, etc) or modifies the form input styles:
    • I verified that all the inputs inside a form are aligned with each other.
    • I added Design label and/or tagged @Expensify/design so the design team can review the changes.
  • If a new page is added, I verified it's using the ScrollView component to make it scrollable when more elements are added to the page.
  • For any bug fix or new feature in this PR, I verified that sufficient unit tests are included to prevent regressions in this flow.
  • If the main branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test steps.
  • I have checked off every checkbox in the PR reviewer checklist, including those that don't apply to this PR.

Screenshots/Videos

Android: HybridApp
android.mov
Android: mWeb Chrome
mchrome.mov
iOS: HybridApp
ios.mov
iOS: mWeb Safari
msafari.mov
MacOS: Chrome / Safari
safari.mov
web.mov

@situchan
Copy link
Copy Markdown
Contributor

situchan commented Jun 5, 2026

Please merge main to fix failing tests

@mjasikowski
Copy link
Copy Markdown
Contributor Author

@MelvinBot merge main

Co-authored-by: Michał Jasikowski <mjasikowski@users.noreply.github.com>
@MelvinBot
Copy link
Copy Markdown
Contributor

🤖 Merged main into michal-fix-safari-crash (merge commit aabcf88d5c9) and pushed. The branch is now up to date with main with no conflicts.

@situchan
Copy link
Copy Markdown
Contributor

situchan commented Jun 5, 2026

Regression Analysis

🔴 1. usePageRefresh wipes the offline precache with no online guard — destroys the exact thing the PR protects in lazyRetry

The whole reason lazyRetry added && navigator.onLine is that clearing Cache Storage while offline nukes the only app shell keeping the PWA usable. usePageRefresh has no such guard.

-   window.location.reload();
+   clearWorkboxRecoveryCaches().then(() => window.location.reload());

This branch is taken whenever isChunkLoadError === true. Trace the offline path:

  1. User is offline, app shell chunk fails → error page renders (from SW cache).
  2. GenericErrorPage calls refreshPage(isChunkLoadError) with true.
  3. New code → clearWorkboxRecoveryCaches() deletes every Cache Storage entry and unregisters the SW — including the offline precache — then reloads with no network.

Before: plain reload, offline shell preserved → app still works.
After: offline shell gone → app is unusable until connectivity returns. This is precisely the failure lazyRetry's online guard was written to prevent, just on the sibling path. Add the same navigator.onLine guard here.

Summary: Please check if it regresses offline users on the exact scenario the PR set out to handle safely, and the fix is a one-line navigator.onLine guard mirroring lazyRetry

🟠 2. Reset-to-INITIAL on the terminal reject re-arms the full reload cycle (old code latched permanently)

Old final-failure branch did reject(component) and left the flag at 'true'. New code resets it:

    // else branch (exhausted / offline / non-chunk):
    sessionStorage.setItem(CONST.SESSION_STORAGE_KEYS.RETRY_LAZY_REFRESHED, RETRY_STATE.INITIAL);
    reject(...);

Consequence for a genuinely broken deploy (chunks still missing on CDN):

Before After
Land on error page flag stays 'true' flag reset to 'false'
User taps Refresh → reload next lazyRetry'true' branch → immediate reject, 1 reload total INITIAL→ reload, RELOADEDclear caches+reload, then reject → 2 automatic reloads + a cache wipe per click

So every manual refresh now triggers a 2-reload dance and another Cache Storage wipe before the error page returns, versus a single reload before.
The reset is also redundant for recovery — the success branch already sets INITIAL on the next good import.
So reset-on-reject buys nothing for the happy path and only weakens loop-prevention.
Recommend dropping the reset in the else branch (keep CACHE_CLEARED latched, like the old code latched 'true').

Summary: Lower-severity UX/cache-churn regression with a clean fix (don't reset on reject)

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aabcf88d5c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/utils/lazyRetry.ts Outdated
.then((component) => {
// Reset the retry status to 'false' on successful import
sessionStorage.setItem(CONST.SESSION_STORAGE_KEYS.RETRY_LAZY_REFRESHED, 'false'); // success so reset the refresh
sessionStorage.setItem(CONST.SESSION_STORAGE_KEYS.RETRY_LAZY_REFRESHED, RETRY_STATE.INITIAL);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve retry state across unrelated lazy successes

When a nested lazy import fails after the first reload, another lazy import can succeed first and reset the global RETRY_LAZY_REFRESHED flag here. In the inspected navigation path, AppNavigator is lazy-loaded before AuthScreens/PublicScreens, so a stale AuthScreens chunk can be treated as a first failure again after AppNavigator succeeds, meaning the new cache-clearing branch never runs and the app can keep plain-reloading against the stale SW cache. Keep this retry state until the failing import recovers, or scope it per chunk/import.

Useful? React with 👍 / 👎.

// The error page is shown after lazyRetry has already done a plain reload
// and it did not fix the problem. Clear the service worker cache so the
// next load fetches a fresh app shell from the CDN.
clearWorkboxRecoveryCaches().then(() => window.location.reload());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid clearing offline precache from Refresh

When lazyRetry rejects a second ChunkLoadError because navigator.onLine is false, the error page still calls refreshPage(true). Pressing Refresh while still offline now unconditionally deletes Cache Storage and unregisters the service worker before reloading, removing the offline app shell that the lazyRetry offline guard deliberately preserved. Gate this cache clear on being online, or fall back to the previous reload/reset behavior while offline.

Useful? React with 👍 / 👎.

@MelvinBot
Copy link
Copy Markdown
Contributor

🤖 Code review — 7 files, +275/−25. The fix is well-structured: the three-state lazyRetry machine, the navigator.onLine guard, the shared isChunkLoadError helper, and the regression tests all look solid. One issue is worth addressing before merge, plus a few smaller considerations.

🟠 The manual Refresh button clears the SW cache while offline — the exact scenario the automatic path guards against

The automatic path only clears the service-worker cache when online, and the code comment calls this guard "critical": clearing it offline "would destroy the cached app shell that is the only thing keeping the PWA usable until connectivity returns"src/utils/lazyRetry.ts:51.

But the manual Refresh button has no such guard. usePageRefresh clears caches whenever its isChunkLoadError flag is true (src/hooks/usePageRefresh/index.ts:21-25), and GenericErrorPage passes that flag straight through with no connectivity check (src/pages/ErrorPage/GenericErrorPage.tsx:67).

This is reachable end-to-end: when a chunk fails offline, lazyRetry (after the first plain reload) hits the else branch and rejects to the error boundary, which renders GenericErrorPage with the ChunkLoadError. If the still-offline user taps Refresh, clearWorkboxRecoveryCaches() deletes all Cache Storage and unregisters the SW, then reloads — but offline the reload can't fetch index.html or any chunk, so the PWA is bricked until connectivity returns. That is precisely the outcome the automatic guard was added to prevent.

Suggested fix: mirror the guard on the manual path, e.g. refreshPage(chunkLoadError && navigator.onLine) in GenericErrorPage, or check navigator.onLine inside usePageRefresh before clearing caches.

Smaller considerations (non-blocking)
  • Retry flag reset re-enables the whole cycle. On the terminal reject, lazyRetry resets RETRY_LAZY_REFRESHED back to INITIAL (src/utils/lazyRetry.ts:64), whereas the old code left the flag set after exhausting retries. For a persistently broken deploy, navigating between different lazy routes can re-trigger the full reload + cache-clear cycle rather than landing on the error page. The comment frames this as intentional — just worth confirming it can't cause reload churn across multiple lazy routes.

  • navigator.onLine is a weak signal. It reflects only whether a network interface exists, not real connectivity. On Safari / captive portals / flaky cellular it can report true while fetches fail, so even the automatic guard can clear the cache and brick the offline shell. An inherent limitation of the API, but relevant given the guard is the linchpin of the offline-safety argument.

  • Test variant resolution. The web-path tests force the web hook via jest.requireActual('../../src/hooks/usePageRefresh/index.ts'). This depends on that path resolving to the web file (not .native.ts) under jest-expo's platform resolution. A green CI run confirms it — the assertions require clear + reload, which the native variant (return resetBoundary) can't produce — so just make sure those tests actually execute rather than silently no-op.

Static review only — I did not run the app or the test suite. The headline finding is verified by reading the code paths; the items above are lower-confidence.

Copy link
Copy Markdown
Contributor

@situchan situchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Please check and address above comments if applicable.

@mountiny
Copy link
Copy Markdown
Contributor

mountiny commented Jun 5, 2026

@mjasikowski let me know once you resolve those last comments, thanks!

- Scope the retry state by a per-import key so a sibling chunk's success
  (e.g. AppNavigator resolving) can no longer reset the retry flag of a
  still-failing chunk (e.g. AuthScreens), which otherwise restarted that
  chunk's cycle forever instead of reaching the cache-clearing branch.
  Callers now pass a stable retryKey.
- Gate usePageRefresh's cache clear on navigator.onLine. When lazyRetry
  rejects a second ChunkLoadError offline, tapping Refresh would otherwise
  wipe the cached app shell that the offline guard deliberately preserved.
@mjasikowski
Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Another round soon, please!

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@mjasikowski
Copy link
Copy Markdown
Contributor Author

@mountiny I think we're good here

@MelvinBot
Copy link
Copy Markdown
Contributor

Regression review

No crashing/loop regressions found — the new retry state machine is bounded (max 2 automatic reloads per import key, then it rejects to the error boundary), all 3 lazyRetry call sites were updated with the new required retryKey, the new constants exist, and native/jsdom are guarded. The items below are behavioral-change risks worth a conscious sign-off, ranked by blast radius.


1. lazyRetry now wipes the entire SW/Cache Storage on environmental chunk failures, not just stale deploys ⚠️ (highest risk)

Previously lazyRetry never touched Cache Storage or the service worker — it only reloaded once, then rejected. Now, on the second failure when the error is a ChunkLoadError and the device is online, it calls clearWorkboxRecoveryCaches(), which unregisters every SW and deletes all Cache Storage for the origin (lazyRetry.ts:64-71clearWorkboxRecoveryCaches/index.ts:22-40).

The PR itself notes chunk failures are "largely environmental (deploy churn, ad-blockers, flaky networks)". The new code can't distinguish a stale-deploy chunk miss from an ad-blocker / content-filter / flaky-network chunk miss — any online second failure now nukes the offline PWA shell. For a user whose chunk is persistently blocked (ad-blocker), this leaves them with no SW and no offline support until they can fetch a fresh shell, which is strictly worse than before.

It's bounded (the CACHE_CLEARED guard means it happens at most once per import key per session) and self-heals on a later success, so it's a tradeoff rather than a bug — but it's a real broadening of blast radius. Worth confirming this is intended for the non-deploy cases.

2. Final reject now resets the retry flag to INITIAL (was 'true')

When all recovery is exhausted, the flag is reset to INITIAL before rejecting (lazyRetry.ts:72-79). Old code left it at 'true'. Consequence: after the user has already hit the error boundary once, the next independent lazy-import failure will trigger a full-page reload again instead of failing fast. Still bounded (no infinite loop), and arguably better recovery — just a more reload-happy posture than main. Flagging so it's a conscious choice.

3. Broadened chunk-error detection in GenericErrorPage

GenericErrorPage previously matched /Loading chunk [\d]+ failed/ (numeric ids only). The shared isChunkLoadError uses /Loading chunk \S+ failed/i (any token, case-insensitive). More generic errors now classify as chunk errors, so the Refresh button routes them through the cache-clearing reload path. Intended unification, but slightly wider than before.

4. Latent: clearWorkboxRecoveryCaches().then(reload) has no .catch

Both new call sites (lazyRetry.ts:71, usePageRefresh/index.ts:26) rely on clearWorkboxRecoveryCaches never rejecting. It's safe today — the function is async with internal try/catch around every await and can't throw synchronously — so the reload always fires. But it's a fragile coupling: if that function is ever changed to throw/reject, the .then(reload) silently never runs and the app hangs on a blank/error screen with no recovery. A .catch(() => window.location.reload()) would harden it.

Things I checked that are NOT regressions
  • All lazyRetry call sites updated — only 3 exist (AppNavigator/index.tsx, AppNavigator/AppNavigator.tsx ×2), all pass retryKey. No missed callers ending up with RETRY_LAZY_REFRESHED:undefined.
  • No infinite reload loop — every path either advances state (INITIALRELOADEDCACHE_CLEARED) or rejects. Max 2 automatic reloads per key.
  • sessionStorage survives the cache clearclearWorkboxRecoveryCaches only touches Cache Storage + SW registrations, not sessionStorage, so the CACHE_CLEARED guard persists across the reload (loop prevention holds).
  • Constants existCONST.CHUNK_LOAD_ERROR, CONST.SESSION_STORAGE_KEYS.RETRY_LAZY_REFRESHED.
  • Native + jsdom safeclearWorkboxRecoveryCaches early-returns on Platform.OS !== 'web'; the native usePageRefresh still returns resetBoundary and harmlessly ignores the new boolean arg.
  • clearWorkboxRecoveryCaches behavior unchanged for the existing Troubleshoot flow in App.ts — only its JSDoc was edited.
  • LazyModalSlot signature widening ((error: Error)(error: unknown)) is backward-compatible.

Caveat: I was not able to execute the Jest suite in this environment (npx jest required interactive approval), so I reviewed the new chunkLoadErrorRecoveryTest.ts logic statically rather than running it.


Next Steps: Reply with @MelvinBot <feedback> to dig deeper into any of these (e.g. "is #1 acceptable for ad-blocker users?").

@mjasikowski
Copy link
Copy Markdown
Contributor Author

Re: Melvin comments:

#1 - If ad-blocker blocks the chunk, it's a really bad ad-blocker and we can't be responsible for that.
I'll fix #2 and #4
#3 is not really a problem

- Do not reset the retry flag to INITIAL when lazyRetry exhausts recovery
  and rejects to the error boundary. Resetting made a later failure of the
  same import restart the full reload cycle; leaving the advanced state
  means it fails fast (already cache-cleared) or retries the cache clear
  once back online. A successful import still resets the flag.
- Add a .catch to clearWorkboxRecoveryCaches() before reload in both call
  sites so the reload still fires if that function is ever changed to
  reject (today it cannot, but the coupling was fragile).
Comment thread src/hooks/usePageRefresh/index.ts Outdated
Comment on lines +26 to +28
clearWorkboxRecoveryCaches()
.catch(() => undefined)
.then(() => window.location.reload());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think catch will never happen here as already caught in clearWorkboxRecoveryCaches.

clearWorkboxRecoveryCaches already swallows errors from caches.keys/delete
and serviceWorker.getRegistrations/unregister internally, so the promise it
returns never rejects. The .catch(() => undefined) was dead code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@mountiny mountiny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets give it a try

@mountiny mountiny merged commit 3781e94 into main Jun 8, 2026
36 checks passed
@mountiny mountiny deleted the michal-fix-safari-crash branch June 8, 2026 09:53
@OSBotify
Copy link
Copy Markdown
Contributor

OSBotify commented Jun 8, 2026

✋ This PR was not deployed to staging yet because QA is ongoing. It will be automatically deployed to staging after the next production release.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 8, 2026

🚧 @mountiny has triggered a test Expensify/App build. You can view the workflow run here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Internal Requires API changes or must be handled by Expensify staff InternalQA This pull request required internal QA

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants