flaky tests - percySnapshot: cap upload wait at 25s, log phase timing#4806
Conversation
When the local Percy SDK retries page navigation (30s budget per attempt), two stacked retries push `await percySnapshot(...)` past QUnit's 60s test budget. QUnit kills the test, then the still-pending await resolves late and the lines after it push assertions onto a test QUnit already marked dead — surfaced as "Assertion occurred after test finished" attached to whichever test is currently running. That cross-test contamination is the flake mode: a slow Percy call in test N looks like an unrelated failure in test N+1. Race the upload against a 25s budget so the test continues even when Percy retries internally, attach a late-rejection catch so an abandoned upload doesn't surface as an unhandled rejection in a later test, and log per-phase timing (settled / fonts / images / percy) when a snapshot abandons or runs over 5s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 633470fba5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The previous commit unconditionally swallowed all rejections from the upload promise. That hid real upload failures (Percy server down, bad request, SDK misconfig) — those should propagate and fail the test. Gate the swallow on an `abandoned` flag that flips only when the budget timer fires. Rejections arriving before that flips still surface through `Promise.race` and fail the test as before; rejections arriving after log a warning and don't pollute a later test with an unhandled rejection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR hardens the percySnapshot test helper against Percy SDK upload stalls that can exceed QUnit’s per-test timeout, causing late-running assertions to leak into subsequent tests and create misleading failures.
Changes:
- Adds a 25s upload budget for Percy snapshots via
Promise.race, allowing tests to continue/finish even if Percy retries internally. - Attaches an early
.catchhandler to the Percy upload promise to prevent late rejections from surfacing as unhandled rejections after a snapshot is abandoned. - Logs per-phase timing (
settled,fonts,images,percy) when snapshots are slow or abandoned to aid future debugging.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Preview deploymentsHost Test Results 1 files 1 suites 6h 11m 57s ⏱️ Results for commit 2050137. For more details on these errors, see this check. Realm Server Test Results 1 files ± 0 1 suites +1 12m 14s ⏱️ + 12m 14s Results for commit 2050137. ± Comparison against earlier commit 6744fdc. |
If `upload` resolves or rejects before the budget elapses, the timer keeps running and fires later — flipping a now-stale `abandoned` flag inside a different test's runtime and holding the snapshot's local closure live for the rest of the budget. Capture the handle and clear it in `finally` so each invocation leaves no stray timers behind. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
it seems silly that the test timeout budget is taken up by percy. it would be nice if percy uploads could just happen out of band so that they don't use up the test timeout budget. of course we'd have to think about how to block the test shard itself on waiting for all the percy uploads to complete, as well as to make sure that any exceptions during the upload don't crash into the context of other tests, but that they are still visible in teh logs for debugging. but i think the real work would have to be upstream in percy's API such that it would only await the time to take the snapshot and fire the request, as well as have a place to drain the async.... |
|
Are the 54 missing snapshots related to this time cap? |
unsure. it not clear if the time we are waiting on is the request to percy or the response- although bascially what we are doing is putting more value on the host test pass/fail signal vs the percy snapshot signal. percy sometimes takes a very long time to submit the snapshots, and all that time counts towards running the test. if percy takes too long the test will fail even though nothing is wrong with the host test. so there is a tradeoff, which is signal is more important to us: the host test pass/fail or the percy snapshot. generally 25s is a pretty high amount of time. but if percy has SLA issues of their own, then yes snapshots may be missing. |
|
@burieberry i ran the host tests again and this time there are no missing snapshots. |
The flake
Run 25775029967 (host shard 13) failed three tests in
Acceptance | interact submode tests > 1 stack:Test took longer than 60000ms; test timed out.Assertion occurred after test finished(referencing test 12'sassert.dom(...).includesText("Person"))Assertion occurred after test finished(referencing test 14'sassert.dom('[data-test-field="firstName"] input').exists)Mechanism
The local Percy SDK retries page navigation on a 30 s budget per attempt. Logs show two
[percy] Retrying snapshotlines spaced ~30 s apart inside test 12 alone — those two stacked retries pushawait percySnapshot(...)past QUnit's 60 s test budget.When QUnit times out test 12, the still-pending
awaiteventually resolves late. The lines AFTERawait percySnapshot(...)(test:148) —assert.dom(...).includesText('Person')— then run and push assertions onto a test QUnit already marked dead. QUnit reports that as a global failure attached to whichever test is currently running. That's how test 12's leftover assertion ends up failing test 14, and test 14's leftover assertion ends up failing test 17. The actual problem is always upstream: Percy was slow in test 12.Fix
originalPercySnapshot(...)call against a 25 s budget (well under QUnit's 60 s default). When the budget fires, the await returns; the test continues and exits cleanly within budget. The visual diff for that particular snapshot may be missing, but no late assertions contaminate later tests..catchto the upstream Percy promise BEFORE the race, so if it eventually rejects after we've moved on it doesn't surface as an unhandled rejection during a later test.settled/fonts/imagesover pending count /percy) when a snapshot abandons OR when it runs over 5 s. If this flake recurs we'll see which phase blew up.Test plan
[percy-snapshot] "<test name>" abandoned after Xmswarning surfaces the timing breakdown so we can decide whether to bump the budget, fix Percy server config, or pursue something else.🤖 Generated with Claude Code