[Performance] App hangs when OpenApp / ReconnectApp / GetMissingOnyxMessages fire in rapid stampedes #92541

@mountiny

Description

If you haven't already, check out our contributing guidelines.

Version Number: Multiple production versions observed (e.g. 9.3.54–9.3.64)
Reproducible in staging?: Unknown
Reproducible in production?: Yes
Email or phone of affected tester (no customers): N/A — reproduced via production telemetry, not a single test account
Logs: https://stackoverflow.com/c/expensify/questions/4856
Expensify/Expensify Issue URL: https://github.com/Expensify/Expensify/issues/644858
Issue reported by: Internal engineering investigation

What performance issue do we need to solve?

The JS thread blocks for 10+ seconds (dead clicks, fully unresponsive UI) when large OpenApp, ReconnectApp, and GetMissingOnyxMessages responses arrive in rapid succession. Sentry traces show that each burst drives a storm of Onyx updates and deep-equality comparisons across subscribers.

Symptom tracking (aggregate): https://expensify.sentry.io/issues/7323151341/ (~14k events / 90 days)

What is the impact on end-users?

  • App appears frozen after navigation, reconnect, or opening reports/search/settings
  • Affects iOS, Android, Windows, and Mac (not platform-specific)
  • Both new and established accounts

Benchmarks / evidence (from production traces)

Payload sizes observed: ~0.5 MB – 4 MB per response on sessions that should be lightweight.

Pattern A — Same-endpoint stampede

One session fires the same command many times in minutes:

Pattern B — Byte-identical duplicate responses (≤3s apart)

Server returns the same payload twice; client processes duplicate Onyx data:

Pattern C — Cross-command overlap (≤10s)
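
As a rough triage aid, the three patterns could be counted in exported response events with a sketch like the one below. The `ResponseEvent` shape, the `countPatterns` helper, and the `bodyHash` field are assumptions made for illustration; they are not taken from the App codebase or from Sentry's export format. Only the 3s and 10s windows come from the pattern definitions above, and comparing adjacent events only is a deliberate simplification.

```typescript
// Hypothetical shape of one response event exported from trace data.
type ResponseEvent = {
    command: string; // e.g. "OpenApp", "ReconnectApp", "GetMissingOnyxMessages"
    timestampMs: number; // arrival time in milliseconds
    bodyHash: string; // any stable hash of the response body (assumed to be available)
};

type PatternCounts = {
    sameCommandRepeats: number; // Pattern A
    identicalDuplicates: number; // Pattern B
    crossCommandOverlaps: number; // Pattern C
};

const DUPLICATE_WINDOW_MS = 3000; // Pattern B: identical payloads at most 3s apart
const OVERLAP_WINDOW_MS = 10000; // Pattern C: different commands at most 10s apart

function countPatterns(events: ResponseEvent[]): PatternCounts {
    const sorted = [...events].sort((a, b) => a.timestampMs - b.timestampMs);
    const counts: PatternCounts = {sameCommandRepeats: 0, identicalDuplicates: 0, crossCommandOverlaps: 0};
    const seenCommands = new Set<string>();
    let previous: ResponseEvent | undefined;

    for (const current of sorted) {
        // Pattern A: the same command fired again within the session.
        if (seenCommands.has(current.command)) {
            counts.sameCommandRepeats += 1;
        }
        seenCommands.add(current.command);

        if (previous) {
            const gapMs = current.timestampMs - previous.timestampMs;
            // Pattern B: the adjacent response carries an identical body.
            if (previous.bodyHash === current.bodyHash && gapMs <= DUPLICATE_WINDOW_MS) {
                counts.identicalDuplicates += 1;
            }
            // Pattern C: a different command landed right after the previous one.
            if (previous.command !== current.command && gapMs <= OVERLAP_WINDOW_MS) {
                counts.crossCommandOverlaps += 1;
            }
        }
        previous = current;
    }
    return counts;
}
```

Running this per session would give a quick count of how often each pattern occurs before digging into individual traces.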

Example hang timeline (iOS HybridApp 9.3.64-31)

  1. OpenApp on Home — https://expensify.sentry.io/explore/traces/trace/940b5545caec4d5ba1ac8ce1c0b8738b
  2. ReconnectApp on Onboarding_Purpose ~32s later — https://expensify.sentry.io/explore/traces/trace/0df88e7f01bc4a1ea048f491b87f021d
  3. APP HANG ~52s after open — https://expensify.sentry.io/issues/7323151341/events/8677a76e805d4948bd37c2412b90e3fd/

Proposed investigation / solution direction

Open questions (from internal triage):

  1. Why are payloads so large? Possibly whole-collection responses; correlate server output.size in logs for OpenApp / ReconnectApp.
  2. Why are commands fired repeatedly? Network flapping versus uncoordinated client callers (e.g. subscribeToFullReconnect.ts and other call sites).
  3. Sentry ↔ server correlation: confirm that requestID still appears in HTTP breadcrumbs for recent sessions.
  4. Delegate sessions: check for overlap with the known GetMissingOnyxMessages over-fetch in delegate mode (PR in review internally).
  5. Mitigation: client-side in-flight dedup keyed by updateIDFrom/updateIDTo may help, but finding the root cause (why duplicate and large calls happen) is the priority.
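
To make the mitigation in item 5 concrete, here is a minimal sketch of in-flight dedup keyed by the updateIDFrom/updateIDTo pair. The `dedupeInFlight` wrapper and the `MissingUpdatesFetcher` type are hypothetical names introduced for illustration; this does not describe how the App currently issues GetMissingOnyxMessages, and it only collapses concurrent calls for an identical range, not overlapping ranges.

```typescript
// Hypothetical signature for whatever function performs the GetMissingOnyxMessages request.
type MissingUpdatesFetcher<T> = (updateIDFrom: number, updateIDTo: number) => Promise<T>;

// Wraps a fetcher so that concurrent calls for the same range share one request.
function dedupeInFlight<T>(fetcher: MissingUpdatesFetcher<T>): MissingUpdatesFetcher<T> {
    const inFlight = new Map<string, Promise<T>>();

    return (updateIDFrom, updateIDTo) => {
        const key = `${updateIDFrom}:${updateIDTo}`;
        const existing = inFlight.get(key);
        if (existing) {
            // A request for this exact range is already pending; reuse it.
            return existing;
        }

        // Drop the entry once the request settles so later calls can refetch.
        const request = fetcher(updateIDFrom, updateIDTo).finally(() => {
            inFlight.delete(key);
        });
        inFlight.set(key, request);
        return request;
    };
}
```

A caller would wrap the real request function once and use the wrapped version everywhere. This reduces duplicate work on the client but leaves open why the duplicate calls are issued in the first place.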

Sentry traces above are shareable; customer emails and FullStory remain on the internal issue only.

Platforms

  • iOS: App
  • Android: App (inferred from cross-platform scope)
  • Windows: Chrome
  • MacOS: Chrome / Safari

Current Issue Owner: @adhorodyski

Metadata

Labels: Bug (Something is broken. Auto assigns a BugZero manager.), Daily (KSv2), Engineering, Internal (Requires API changes or must be handled by Expensify staff), Performance

Project status: CRITICAL