Skip to content

flaky tests - percySnapshot: cap upload wait at 25s, log phase timing#4806

Merged
habdelra merged 3 commits into
mainfrom
fix-percy-snapshot-flake
May 13, 2026
Merged

flaky tests - percySnapshot: cap upload wait at 25s, log phase timing#4806
habdelra merged 3 commits into
mainfrom
fix-percy-snapshot-flake

Conversation

@habdelra
Copy link
Copy Markdown
Contributor

The flake

Run 25775029967 (host shard 13) failed three tests in Acceptance | interact submode tests > 1 stack:

# Test Wall Failure
12 restoring the stack from query param 74907 ms Test took longer than 60000ms; test timed out.
14 restoring the stack from query param when card is in edit format 69514 ms Assertion occurred after test finished (referencing test 12's assert.dom(...).includesText("Person"))
17 search can be dismissed with escape 9755 ms Assertion occurred after test finished (referencing test 14's assert.dom('[data-test-field="firstName"] input').exists)

Mechanism

The local Percy SDK retries page navigation on a 30 s budget per attempt. Logs show two [percy] Retrying snapshot lines spaced ~30 s apart inside test 12 alone — those two stacked retries push await percySnapshot(...) past QUnit's 60 s test budget.

When QUnit times out test 12, the still-pending await eventually resolves late. The lines AFTER await percySnapshot(...) (test:148) — assert.dom(...).includesText('Person') — then run and push assertions onto a test QUnit already marked dead. QUnit reports that as a global failure attached to whichever test is currently running. That's how test 12's leftover assertion ends up failing test 14, and test 14's leftover assertion ends up failing test 17. The actual problem is always upstream: Percy was slow in test 12.

Fix

  • Race the actual originalPercySnapshot(...) call against a 25 s budget (well under QUnit's 60 s default). When the budget fires, the await returns; the test continues and exits cleanly within budget. The visual diff for that particular snapshot may be missing, but no late assertions contaminate later tests.
  • Attach .catch to the upstream Percy promise BEFORE the race, so if it eventually rejects after we've moved on it doesn't surface as an unhandled rejection during a later test.
  • Log per-phase timing (settled / fonts / images over pending count / percy) when a snapshot abandons OR when it runs over 5 s. If this flake recurs we'll see which phase blew up.

Test plan

  • CI host shards green on this branch.
  • If a percy snapshot still abandons, the [percy-snapshot] "<test name>" abandoned after Xms warning surfaces the timing breakdown so we can decide whether to bump the budget, fix Percy server config, or pursue something else.

🤖 Generated with Claude Code

When the local Percy SDK retries page navigation (30s budget per attempt),
two stacked retries push `await percySnapshot(...)` past QUnit's 60s test
budget. QUnit kills the test, then the still-pending await resolves late
and the lines after it push assertions onto a test QUnit already marked
dead — surfaced as "Assertion occurred after test finished" attached to
whichever test is currently running. That cross-test contamination is the
flake mode: a slow Percy call in test N looks like an unrelated failure
in test N+1.

Race the upload against a 25s budget so the test continues even when
Percy retries internally, attach a late-rejection catch so an abandoned
upload doesn't surface as an unhandled rejection in a later test, and
log per-phase timing (settled / fonts / images / percy) when a snapshot
abandons or runs over 5s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 633470fba5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/host/tests/helpers/percy-snapshot.ts Outdated
The previous commit unconditionally swallowed all rejections from the
upload promise. That hid real upload failures (Percy server down, bad
request, SDK misconfig) — those should propagate and fail the test.

Gate the swallow on an `abandoned` flag that flips only when the budget
timer fires. Rejections arriving before that flips still surface through
`Promise.race` and fail the test as before; rejections arriving after
log a warning and don't pollute a later test with an unhandled rejection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the percySnapshot test helper against Percy SDK upload stalls that can exceed QUnit’s per-test timeout, causing late-running assertions to leak into subsequent tests and create misleading failures.

Changes:

  • Adds a 25s upload budget for Percy snapshots via Promise.race, allowing tests to continue/finish even if Percy retries internally.
  • Attaches an early .catch handler to the Percy upload promise to prevent late rejections from surfacing as unhandled rejections after a snapshot is abandoned.
  • Logs per-phase timing (settled, fonts, images, percy) when snapshots are slow or abandoned to aid future debugging.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/host/tests/helpers/percy-snapshot.ts Outdated
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 13, 2026

Preview deployments

Host Test Results

    1 files      1 suites   6h 11m 57s ⏱️
2 658 tests 2 642 ✅ 15 💤 0 ❌ 1 🔥
8 031 runs  7 984 ✅ 45 💤 1 ❌ 1 🔥

Results for commit 2050137.

For more details on these errors, see this check.

Realm Server Test Results

    1 files  ±    0      1 suites  +1   12m 14s ⏱️ + 12m 14s
1 345 tests +1 345  1 345 ✅ +1 345  0 💤 ±0  0 ❌ ±0 
1 424 runs  +1 424  1 424 ✅ +1 424  0 💤 ±0  0 ❌ ±0 

Results for commit 2050137. ± Comparison against earlier commit 6744fdc.

If `upload` resolves or rejects before the budget elapses, the timer
keeps running and fires later — flipping a now-stale `abandoned` flag
inside a different test's runtime and holding the snapshot's local
closure live for the rest of the budget. Capture the handle and clear
it in `finally` so each invocation leaves no stray timers behind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@habdelra habdelra requested a review from a team May 13, 2026 13:35
@habdelra habdelra changed the title percySnapshot: cap upload wait at 25s, log phase timing flaky tests - percySnapshot: cap upload wait at 25s, log phase timing May 13, 2026
@habdelra
Copy link
Copy Markdown
Contributor Author

habdelra commented May 13, 2026

it seems silly that the test timeout budget is taken up by percy. it would be nice if percy uploads could just happen out of band so that they don't use up the test timeout budget. of course we'd have to think about how to block the test shard itself on waiting for all the percy uploads to complete, as well as to make sure that any exceptions during the upload don't crash into the context of other tests, but that they are still visible in teh logs for debugging. but i think the real work would have to be upstream in percy's API such that it would only await the time to take the snapshot and fire the request, as well as have a place to drain the async....

@burieberry
Copy link
Copy Markdown
Contributor

Are the 54 missing snapshots related to this time cap?

@habdelra
Copy link
Copy Markdown
Contributor Author

habdelra commented May 13, 2026

Are the 54 missing snapshots related to this time cap?

unsure. it not clear if the time we are waiting on is the request to percy or the response-

although bascially what we are doing is putting more value on the host test pass/fail signal vs the percy snapshot signal. percy sometimes takes a very long time to submit the snapshots, and all that time counts towards running the test. if percy takes too long the test will fail even though nothing is wrong with the host test. so there is a tradeoff, which is signal is more important to us: the host test pass/fail or the percy snapshot. generally 25s is a pretty high amount of time. but if percy has SLA issues of their own, then yes snapshots may be missing.

@habdelra
Copy link
Copy Markdown
Contributor Author

@burieberry i ran the host tests again and this time there are no missing snapshots.

@habdelra habdelra merged commit 2a45e3d into main May 13, 2026
129 of 131 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants