flaky tests - percySnapshot: cap upload wait at 25s, log phase timing by habdelra · Pull Request #4806 · cardstack/boxel

habdelra · 2026-05-13T07:21:34Z

The flake

Run 25775029967 (host shard 13) failed three tests in Acceptance | interact submode tests > 1 stack:

| # | Test | Wall | Failure |
|---|------|------|---------|
| 12 | restoring the stack from query param | 74907 ms | `Test took longer than 60000ms; test timed out.` |
| 14 | restoring the stack from query param when card is in edit format | 69514 ms | `Assertion occurred after test finished` (referencing test 12's `assert.dom(...).includesText("Person")`) |
| 17 | search can be dismissed with escape | 9755 ms | `Assertion occurred after test finished` (referencing test 14's `assert.dom('[data-test-field="firstName"] input').exists`) |

Mechanism

The local Percy SDK retries page navigation on a 30 s budget per attempt. Logs show two [percy] Retrying snapshot lines spaced ~30 s apart inside test 12 alone — those two stacked retries push await percySnapshot(...) past QUnit's 60 s test budget.

When QUnit times out test 12, the still-pending await eventually resolves late. The lines AFTER await percySnapshot(...) (test:148) — assert.dom(...).includesText('Person') — then run and push assertions onto a test QUnit already marked dead. QUnit reports that as a global failure attached to whichever test is currently running. That's how test 12's leftover assertion ends up failing test 14, and test 14's leftover assertion ends up failing test 17. The actual problem is always upstream: Percy was slow in test 12.
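
To make the contamination concrete, here is a toy model of the mechanism. It is not QUnit: `runTest`, `sleep`, and the log format are hypothetical stand-ins, and the timings are shrunk to milliseconds. It only illustrates that a runner can stop waiting on a test body but cannot cancel it, so the lines after a slow `await` still run later.

```typescript
// Toy runner, NOT QUnit: all names here are illustrative assumptions.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

type Assert = (message: string) => void;

const log: string[] = [];
let current = ''; // name of the test the runner is currently executing

async function runTest(
  name: string,
  timeoutMs: number,
  body: (assert: Assert) => Promise<void>,
): Promise<void> {
  current = name;
  let finished = false;
  const assert: Assert = (message) => {
    log.push(
      finished
        ? `late assertion from ${name} while ${current} is running`
        : `${name}: ${message}`,
    );
  };
  // The runner stops waiting at the timeout, but the body keeps running.
  await Promise.race([body(assert), sleep(timeoutMs)]);
  finished = true;
}

async function demo(): Promise<string[]> {
  // "test 12": the slow await (standing in for percySnapshot) outlives the timeout.
  await runTest('test 12', 5, async (assert) => {
    await sleep(30);
    assert('includes Person');
  });
  // "test 14": healthy, but test 12's leftover assertion lands while it runs.
  await runTest('test 14', 100, async (assert) => {
    await sleep(60);
    assert('field exists');
  });
  return log;
}
```

In this model the leftover assertion from the first test is recorded while the second test is the current one, which mirrors how the failure in the table gets attributed to a later test.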

Fix

Race the actual originalPercySnapshot(...) call against a 25 s budget (well under QUnit's 60 s default). When the budget fires, the await returns; the test continues and exits cleanly within budget. The visual diff for that particular snapshot may be missing, but no late assertions contaminate later tests.
Attach .catch to the upstream Percy promise BEFORE the race, so if it eventually rejects after we've moved on it doesn't surface as an unhandled rejection during a later test.
Log per-phase timing (settled / fonts / images over pending count / percy) when a snapshot abandons OR when it runs over 5 s. If this flake recurs we'll see which phase blew up.
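
A minimal sketch of the three steps above, under stated assumptions: `uploadWithinBudget`, its parameters, and the warning format are hypothetical and are not the code in `percy-snapshot.ts`. `startUpload` stands in for the call to `originalPercySnapshot(...)`, and `phases` stands in for timings collected before the upload.

```typescript
// Hypothetical helper: names, defaults, and log format are assumptions.
type Outcome = 'uploaded' | 'abandoned';

async function uploadWithinBudget(
  startUpload: () => Promise<unknown>,
  phases: Record<string, number>, // earlier phase timings in ms (e.g. settled, fonts, images)
  budgetMs = 25_000,
  slowMs = 5_000,
  warn: (message: string) => void = console.warn,
): Promise<Outcome> {
  const started = Date.now();
  const upload = startUpload();

  // Attach a handler before racing, so a rejection that arrives after the
  // budget fires already has a handler.
  upload.catch(() => {});

  const budget = new Promise<Outcome>((resolve) =>
    setTimeout(() => resolve('abandoned'), budgetMs),
  );
  const outcome = await Promise.race([
    upload.then((): Outcome => 'uploaded'),
    budget,
  ]);

  // Log the breakdown only when the snapshot was abandoned or slow overall.
  const percy = Date.now() - started;
  const total = Object.values(phases).reduce((sum, ms) => sum + ms, percy);
  if (outcome === 'abandoned' || total > slowMs) {
    warn(
      `[percy-snapshot] ${outcome} after ${total}ms ` +
        JSON.stringify({ ...phases, percy }),
    );
  }
  return outcome;
}
```

A caller would await this in place of the raw upload call and continue with its assertions whichever outcome comes back.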

Test plan

CI host shards green on this branch.
If a percy snapshot still abandons, the [percy-snapshot] "<test name>" abandoned after Xms warning surfaces the timing breakdown so we can decide whether to bump the budget, fix Percy server config, or pursue something else.

🤖 Generated with Claude Code

When the local Percy SDK retries page navigation (30s budget per attempt), two stacked retries push `await percySnapshot(...)` past QUnit's 60s test budget. QUnit kills the test, then the still-pending await resolves late and the lines after it push assertions onto a test QUnit already marked dead — surfaced as "Assertion occurred after test finished" attached to whichever test is currently running. That cross-test contamination is the flake mode: a slow Percy call in test N looks like an unrelated failure in test N+1. Race the upload against a 25s budget so the test continues even when Percy retries internally, attach a late-rejection catch so an abandoned upload doesn't surface as an unhandled rejection in a later test, and log per-phase timing (settled / fonts / images / percy) when a snapshot abandons or runs over 5s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 633470fba5

The previous commit unconditionally swallowed all rejections from the upload promise. That hid real upload failures (Percy server down, bad request, SDK misconfig) — those should propagate and fail the test. Gate the swallow on an `abandoned` flag that flips only when the budget timer fires. Rejections arriving before that flips still surface through `Promise.race` and fail the test as before; rejections arriving after log a warning and don't pollute a later test with an unhandled rejection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
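
One way to picture the gated swallow described in this commit message is sketched below. The function name, the `'done' | 'abandoned'` return values, and the warning text are assumptions for illustration; only the `abandoned` flag and the use of `Promise.race` come from the message itself.

```typescript
// Hypothetical sketch: not the PR's implementation.
async function snapshotWithGatedSwallow(
  upload: Promise<unknown>,
  budgetMs: number,
  warn: (message: string) => void = console.warn,
): Promise<'done' | 'abandoned'> {
  let abandoned = false;

  const guarded = upload.then(
    () => 'done' as const,
    (err: unknown) => {
      if (abandoned) {
        // The race already settled with 'abandoned'; log instead of throwing
        // so nothing leaks into a later test.
        warn(`[percy-snapshot] late rejection after abandon: ${String(err)}`);
        return 'done' as const; // ignored, the race is over
      }
      // Budget has not fired: this is a real upload failure, let it propagate.
      throw err;
    },
  );

  const budget = new Promise<'abandoned'>((resolve) =>
    setTimeout(() => {
      abandoned = true; // flips only when the budget timer fires
      resolve('abandoned');
    }, budgetMs),
  );

  return Promise.race([guarded, budget]);
}
```

With this shape, a rejection that arrives before the timer rejects the awaited race and fails the calling test, while one that arrives afterwards only produces a warning.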

Copilot

Pull request overview

This PR hardens the percySnapshot test helper against Percy SDK upload stalls that can exceed QUnit’s per-test timeout, causing late-running assertions to leak into subsequent tests and create misleading failures.

Changes:

Adds a 25s upload budget for Percy snapshots via Promise.race, allowing tests to continue/finish even if Percy retries internally.
Attaches an early .catch handler to the Percy upload promise to prevent late rejections from surfacing as unhandled rejections after a snapshot is abandoned.
Logs per-phase timing (settled, fonts, images, percy) when snapshots are slow or abandoned to aid future debugging.

github-actions · 2026-05-13T07:35:10Z

Preview deployments

Host staging preview

Host production preview

Host Test Results

1 files 1 suites 6h 11m 57s ⏱️
2 658 tests 2 642 ✅ 15 💤 0 ❌ 1 🔥
8 031 runs 7 984 ✅ 45 💤 1 ❌ 1 🔥

Results for commit 2050137.

For more details on these errors, see this check.

Realm Server Test Results

1 files ± 0 1 suites +1 12m 14s ⏱️ + 12m 14s
1 345 tests +1 345 1 345 ✅ +1 345 0 💤 ±0 0 ❌ ±0
1 424 runs +1 424 1 424 ✅ +1 424 0 💤 ±0 0 ❌ ±0

Results for commit 2050137. ± Comparison against earlier commit 6744fdc.

If `upload` resolves or rejects before the budget elapses, the timer keeps running and fires later — flipping a now-stale `abandoned` flag inside a different test's runtime and holding the snapshot's local closure live for the rest of the budget. Capture the handle and clear it in `finally` so each invocation leaves no stray timers behind. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
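
The timer cleanup described here can be sketched as follows. `raceAndClear` and `onAbandon` are hypothetical names; `onAbandon` stands in for whatever flips the `abandoned` flag. The point is only that the handle from `setTimeout` is captured and cleared in `finally`, whichever side of the race wins.

```typescript
// Hypothetical sketch: names are assumptions, not the PR's code.
type Outcome = 'settled' | 'abandoned';

async function raceAndClear(
  upload: Promise<unknown>,
  budgetMs: number,
  onAbandon: () => void,
): Promise<Outcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const budget = new Promise<Outcome>((resolve) => {
    timer = setTimeout(() => {
      onAbandon();
      resolve('abandoned');
    }, budgetMs);
  });

  try {
    return await Promise.race([
      upload.then((): Outcome => 'settled'),
      budget,
    ]);
  } finally {
    // Runs on resolve, reject, and abandon alike, so no timer outlives the call.
    clearTimeout(timer);
  }
}
```

If the upload settles first, the timer is cleared before it can fire, so `onAbandon` never runs during a later test.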

habdelra · 2026-05-13T15:10:13Z

it seems silly that the test timeout budget is taken up by percy. it would be nice if percy uploads could just happen out of band so that they don't use up the test timeout budget. of course we'd have to think about how to block the test shard itself on waiting for all the percy uploads to complete, as well as to make sure that any exceptions during the upload don't crash into the context of other tests, but that they are still visible in the logs for debugging. but i think the real work would have to be upstream in percy's API such that it would only await the time to take the snapshot and fire the request, as well as have a place to drain the async....
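
A rough sketch of what the out-of-band idea might look like on the test-helper side, purely as an illustration: `UploadQueue`, `enqueue`, and `drain` are hypothetical, nothing like this exists in the PR, and it assumes an upload API that hands back a promise as soon as the request is fired.

```typescript
// Hypothetical design sketch: not part of this PR or of Percy's API.
class UploadQueue {
  private pending: Promise<void>[] = [];
  readonly errors: string[] = [];

  // Called from a test: records the upload and returns immediately, so the
  // upload time does not count against the test's timeout.
  enqueue(label: string, upload: Promise<unknown>): void {
    this.pending.push(
      upload.then(
        () => undefined,
        (err: unknown) => {
          // Keep failures visible for debugging without throwing into
          // whichever test happens to be running.
          this.errors.push(`${label}: ${String(err)}`);
        },
      ),
    );
  }

  // Called once at the end of the shard: blocks until every upload settles.
  async drain(): Promise<void> {
    await Promise.all(this.pending);
    this.pending = [];
  }
}
```

The shard would call `drain()` after the last test and then decide what to do with `errors`, for example print them or fail the run.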

burieberry · 2026-05-13T15:46:11Z

Are the 54 missing snapshots related to this time cap?

habdelra · 2026-05-13T16:06:28Z

> Are the 54 missing snapshots related to this time cap?

unsure. it's not clear if the time we are waiting on is the request to percy or the response.

although basically what we are doing is putting more value on the host test pass/fail signal vs the percy snapshot signal. percy sometimes takes a very long time to submit the snapshots, and all that time counts towards running the test. if percy takes too long the test will fail even though nothing is wrong with the host test. so there is a tradeoff: which signal is more important to us, the host test pass/fail or the percy snapshot? generally 25s is a pretty high amount of time. but if percy has SLA issues of their own, then yes snapshots may be missing.

habdelra · 2026-05-13T17:35:45Z

@burieberry i ran the host tests again and this time there are no missing snapshots.

chatgpt-codex-connector Bot reviewed May 13, 2026

Comment thread packages/host/tests/helpers/percy-snapshot.ts Outdated

habdelra requested a review from Copilot May 13, 2026 07:27

Copilot started reviewing on behalf of habdelra May 13, 2026 07:29 View session

Copilot AI reviewed May 13, 2026

Comment thread packages/host/tests/helpers/percy-snapshot.ts Outdated

habdelra requested a review from a team May 13, 2026 13:35

habdelra changed the title ~~percySnapshot: cap upload wait at 25s, log phase timing~~ flaky tests - percySnapshot: cap upload wait at 25s, log phase timing May 13, 2026

lukemelia approved these changes May 13, 2026

habdelra merged commit 2a45e3d into main May 13, 2026
129 of 131 checks passed

habdelra merged 3 commits into main from fix-percy-snapshot-flake

4 participants
