matrix: Fix flaky host mode tests by backspace · Pull Request #5086 · cardstack/boxel

backspace · 2026-06-02T22:26:14Z

I saw tests flake here. This makes changes to increase stability:

- `domcontentloaded` lets interaction start sooner, so tests stay within their timeouts
- `beforeAll` sets up one realm shared between the read-only tests
- `createRealm` detects an error and retries the creation
- more prerenderer pages, because publishing in parallel can use up the default count

Ideally the errors in createRealm would be eliminated, but I didn’t want to get deep into that here.

To verify reduced flakiness, I had Claude hack CI to run 20 copies of the flaky shard and did that three times. That revealed some other problems with the publishing endpoint that are recorded in CS-11323, but the above will help at least.

Replace bare `page.goto` (default `waitUntil: 'load'`, which blocks on every subresource) with `waitUntil: 'domcontentloaded'` across the host-mode spec; the tests already wait on the meaningful `[data-test-*]` markers afterward, so the full load wait was the fragile part.

Warm up the print test by polling the prerendered HTML for the card marker before navigating cold, mirroring the existing `published card response` test, so navigation only races the publish fan-out once the card is render-ready.

Add timing diagnostics on the two observed-flaky navigations so a future timeout reveals whether prerender or the navigation itself was the slow part.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
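The navigation change described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `PageLike` is a hand-written stand-in for the two Playwright `Page` methods involved, and `gotoAndWaitForMarker`, its parameters, and the returned timing fields are hypothetical names chosen for this example.

```typescript
// Hypothetical structural type covering only the two Page methods used here,
// so the sketch does not need Playwright installed. It is intended to match
// the shape of Playwright's page.goto() and page.waitForSelector().
interface PageLike {
  goto(
    url: string,
    options?: { waitUntil?: 'load' | 'domcontentloaded' | 'networkidle' | 'commit' },
  ): Promise<unknown>;
  waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>;
}

// Hypothetical helper: navigate without blocking on every subresource, then
// wait for the marker the test actually asserts on. The two durations are
// returned so a failure report can show which phase was slow.
async function gotoAndWaitForMarker(
  page: PageLike,
  url: string,
  markerSelector: string,
  markerTimeoutMs = 30000,
): Promise<{ navigationMs: number; markerMs: number }> {
  const start = Date.now();
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  const afterNavigation = Date.now();
  await page.waitForSelector(markerSelector, { timeout: markerTimeoutMs });
  return {
    navigationMs: afterNavigation - start,
    markerMs: Date.now() - afterNavigation,
  };
}
```

In a spec, the returned durations could be logged or attached to the test report, which is one way to get the kind of timing diagnostics the commit message mentions.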

The host-mode `beforeEach` created a user, built a realm from ~7 card sources, and published it (a full reindex + prerender fan-out) before every test. With `fullyParallel` + 2 workers, those publish storms ran concurrently and starved individual renders, which is what made the `page.goto` navigations flaky.

Extract the setup into `createAndPublishHostModeRealm` and run it once per worker via `beforeAll` for the read-only tests, which now share a single published realm. The two routing-rule tests overwrite realm.json and re-publish, so they keep an isolated per-test realm in their own describe block.

Also fold the duplicated publish + warm-up polling into shared helpers (`publishRealm`, `waitForPublishedMarker`) and warm up the page-title test's cold navigation the same way the print test does.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Throwaway CI changes to surface the host-mode page.goto flake on the first attempt rather than waiting for it to recur. Revert this whole commit before merging.

ci.yaml:
- Run only Matrix shard 1, twenty times in parallel (matrix `copy` 1..20), with fail-fast off so every copy reports independently. Per-shard artifact names get the copy suffix so the parallel jobs don't collide on upload.
- Force every change-check suite output OFF except `matrix`, so nothing but the stress jobs runs. Editing ci.yaml otherwise flips every suite on via the shared paths-filter anchor.
- Disable the "Merge Matrix reports and publish" job: `playwright merge-reports` can't reconcile N blob reports that all claim to be shard 1/3, so it fails and would be the run's only red mark even when every stress shard passes.

ci-host.yaml, ci-software-factory.yaml:
- Gate the entry jobs with `if: false` so these suites don't run. Both are otherwise triggered for this branch: Software Factory by the packages/matrix/** path (host-mode.spec.ts), and CI Host because editing its own file matches its paths filter. Disabling the entry jobs cascades to skip everything downstream.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The 20× shard-1 stress run surfaced a setup flake distinct from the page.goto timeouts: under load, creating a workspace (which provisions a matrix room + personal realm) transiently errors, so the modal shows `data-test-error-message` and stays open instead of rendering the workspace tile. `createRealm` then waited the full 30s on a modal that would never resolve and failed. It hit head-tags.spec.ts too, via its own `createRealm` call.

Make `createRealm` wait for whichever happens first, the workspace tile or the error, and on a surfaced error, dismiss the modal and retry the whole form from clean state (bounded, with the server message in the final throw). A late-landing prior attempt is detected up front so we never try to create a duplicate endpoint.

Because the host-mode read-only tests share one realm via `beforeAll`, a setup failure there fails the whole group; previously Playwright's hook retry also wedged the hand-rolled browser context. Retry the setup in-hook with a fresh context (swallowing close errors so a wedged context can't mask success or block the retry) under a longer budget.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
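The "whichever happens first, then bounded retry" behaviour described for `createRealm` can be sketched like this. It is an illustration under assumptions, not the PR's implementation: the `CreateForm` interface, its method names, and `createWithRetry` are all hypothetical, standing in for whatever page interactions the real helper performs.

```typescript
type Outcome = { kind: 'created' } | { kind: 'error'; message: string };

// Hypothetical abstraction over the create-workspace modal.
interface CreateForm {
  alreadyExists(): Promise<boolean>; // did an earlier attempt land late?
  submit(): Promise<void>; // fill in and submit the form
  waitForTile(): Promise<void>; // resolves when the workspace tile renders
  waitForError(): Promise<string>; // resolves with the surfaced error message
  dismiss(): Promise<void>; // close the modal to restart from clean state
}

async function createWithRetry(form: CreateForm, maxAttempts = 3): Promise<void> {
  let lastMessage = 'no attempt made';
  for (let attemptNumber = 1; attemptNumber <= maxAttempts; attemptNumber++) {
    // Check up front so a late-landing prior attempt is never duplicated.
    if (await form.alreadyExists()) return;
    await form.submit();

    const tile = form.waitForTile().then((): Outcome => ({ kind: 'created' }));
    const error = form
      .waitForError()
      .then((message): Outcome => ({ kind: 'error', message }));
    // The losing wait may reject later (for example on its own timeout);
    // mark both as handled so that does not become an unhandled rejection.
    tile.catch(() => {});
    error.catch(() => {});

    // Settle on whichever signal appears first instead of waiting out a
    // timeout on a modal that will never resolve.
    const outcome = await Promise.race([tile, error]);
    if (outcome.kind === 'created') return;

    lastMessage = outcome.message;
    await form.dismiss();
  }
  throw new Error(`create failed after ${maxAttempts} attempts: ${lastMessage}`);
}
```

If neither the tile nor the error ever appears, both waits reject and `Promise.race` rethrows the first rejection, so a genuine hang still fails the caller rather than looping.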

Each matrix shard runs one prerender server, shared by both Playwright workers (`fullyParallel`). With the pool size unset it collapses to a fixed 4 tabs, so the shard's concurrent publish + index work (host-mode plus the other published-realm specs) exhausts it: the pool thrashes (`standby refill failed to produce a fresh tab`, cross-affinity steals) and realm-server requests stall, surfacing as 60s page.goto / _publish-realm timeouts that recover on retry.

Enable the dynamic pool envelope for the harness's prerender server: keep a 4-tab idle floor (no extra baseline memory) but let it burst to 8 tabs under load.

This is the residual flake the 20x shard-1 stress run surfaced after the navigation and createRealm fixes. The 20 copies are independent runners, so it reflects the real per-shard contention, not the stress harness itself.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The two host-mode routing-rules tests were the suite's heaviest and flakiest: each published a plain realm, then rewrote realm.json with the routing rule and re-published, which meant two full `_publish-realm` fan-outs plus an interact-mode boot per test. Under shard load the second publish (and the boot) repeatedly timed out on the first attempt and only passed on retry.

Seed the routing rule into realm.json before the single initial publish (new `routingRulePath` option on `createAndPublishHostModeRealm`), so each routing test does exactly one publish and no interact-mode navigation: it just polls the published realm and asserts. The assertions are unchanged; only the redundant republish/boot are gone.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

This reverts commit e8aa04e.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a840fb6033


The warm-up polls added for the async publish path used waitUntil's 10s default. Under CI load a published realm can still be indexing past 10s, so the poll would throw earlier than the old bare navigation (bounded by the 60s test timeout), failing a slow-but-eventually-ready realm instead of stabilizing it.

Budget the readiness polls at 45s (under the test timeout, with headroom for the navigation/assertions that follow). This also future-proofs the polls for when _publish-realm returns 202 before indexing finishes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
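A readiness poll with an explicit budget, of the kind this commit describes, might look like the sketch below. It is a generic illustration: `waitForMarkerInHtml`, the injected `fetchText` function, and the 500ms interval are assumptions made for the example, and it is not the `waitUntil` or `waitForPublishedMarker` helper from the repository. Only the 45s default budget comes from the commit message.

```typescript
// Injected so the sketch has no network or Playwright dependency.
type FetchText = (url: string) => Promise<string>;

// Hypothetical helper: poll a published URL until its HTML contains the
// marker, giving up once the next poll would exceed the budget.
// Returns the number of polls it took.
async function waitForMarkerInHtml(
  fetchText: FetchText,
  url: string,
  marker: string,
  { timeoutMs = 45000, intervalMs = 500 } = {},
): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  let polls = 0;
  for (;;) {
    polls++;
    try {
      const html = await fetchText(url);
      if (html.includes(marker)) return polls;
    } catch {
      // Treat a transient fetch failure as "not ready yet" and keep polling.
    }
    if (Date.now() + intervalMs > deadline) {
      throw new Error(
        `marker "${marker}" not found at ${url} within ${timeoutMs}ms (${polls} polls)`,
      );
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

With a runtime that provides a global `fetch`, `fetchText` could be `(u) => fetch(u).then((r) => r.text())`. Keeping the budget below the test timeout leaves room for the navigation and assertions that follow the poll.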

backspace and others added 3 commits June 2, 2026 16:34

backspace force-pushed the host-mode-matrix-test-flakiness-cs-11323 branch from 9edcfbc to e8aa04e Compare June 3, 2026 12:53

backspace and others added 5 commits June 3, 2026 09:26

Add empty commit

ceeca77

Revert "ci: TEMPORARY host-mode flake stress run (revert before merge)"

a840fb6

backspace marked this pull request as ready for review June 3, 2026 19:25

chatgpt-codex-connector Bot reviewed Jun 3, 2026

Comment thread packages/matrix/tests/host-mode.spec.ts

backspace requested a review from a team June 3, 2026 20:04

backspace mentioned this pull request Jun 3, 2026

ci: Remove Matrix as factory workflow dependency #5096

Merged

habdelra approved these changes Jun 3, 2026

backspace merged commit 6bc503d into main Jun 4, 2026
30 checks passed

backspace merged 9 commits into main from host-mode-matrix-test-flakiness-cs-11323


2 participants
