matrix: Fix flaky host mode tests #5086
Merged
Replace bare `page.goto` (default waitUntil: 'load', which blocks on every subresource) with `waitUntil: 'domcontentloaded'` across the host-mode spec; the tests already wait on the meaningful `[data-test-*]` markers afterward, so the full load wait was the fragile part. Warm up the print test by polling the prerendered HTML for the card marker before navigating cold, mirroring the existing `published card response` test, so navigation only races the publish fan-out once the card is render-ready. Add timing diagnostics on the two observed-flaky navigations so a future timeout reveals whether prerender or the navigation itself was the slow part. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
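The timing diagnostics described above could take roughly the following shape. This is a minimal, framework-free sketch, not the spec's code: `runTimedPhases` and the phase names are hypothetical, and the Playwright call appears only in a comment. Each phase's elapsed time is attached to any error it throws, so a failure reports whether the warm-up or the navigation ran long.

```typescript
// Hypothetical sketch: this helper and the phase names are illustrative.
type Phase = { name: string; run: () => Promise<unknown> };
type PhaseTiming = { name: string; ms: number };

// Runs phases in order and records how long each took. If a phase throws,
// the error is rethrown with every timing gathered so far (including the
// failing phase), so a timeout shows which phase was slow.
async function runTimedPhases(phases: Phase[]): Promise<PhaseTiming[]> {
  const timings: PhaseTiming[] = [];
  for (const phase of phases) {
    const start = Date.now();
    try {
      await phase.run();
    } catch (err) {
      timings.push({ name: phase.name, ms: Date.now() - start });
      const summary = timings.map((t) => `${t.name}=${t.ms}ms`).join(', ');
      const message = err instanceof Error ? err.message : String(err);
      throw new Error(`${message} [phase timings: ${summary}]`);
    }
    timings.push({ name: phase.name, ms: Date.now() - start });
  }
  return timings;
}

// In a Playwright test this might wrap the warm-up and the navigation
// (not executed here; `warmUp` is a placeholder):
// await runTimedPhases([
//   { name: 'warm-up', run: () => warmUp() },
//   { name: 'goto', run: () => page.goto(url, { waitUntil: 'domcontentloaded' }) },
// ]);
```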
The host-mode beforeEach created a user, built a realm from ~7 card sources, and published it (a full reindex + prerender fan-out) before every test. With fullyParallel + 2 workers, those publish storms ran concurrently and starved individual renders, which is what made the page.goto navigations flaky. Extract the setup into createAndPublishHostModeRealm and run it once per worker via beforeAll for the read-only tests, which now share a single published realm. The two routing-rule tests overwrite realm.json and re-publish, so they keep an isolated per-test realm in their own describe block. Also fold the duplicated publish + warm-up polling into shared helpers (publishRealm, waitForPublishedMarker) and warm up the page-title test's cold navigation the same way the print test does. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
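In the spec, Playwright's `beforeAll` provides the once-per-worker scoping. As a framework-free sketch of the same "set up once, share the result" idea, assuming an async factory stands in for creating and publishing the realm, concurrent callers can await one shared in-flight promise. `sharedSetup` is a hypothetical helper, not code from this PR.

```typescript
// Hypothetical sketch: memoize one async setup so every caller shares it.
// Concurrent callers get the same in-flight promise, so only one
// create-and-publish runs instead of one per test.
function sharedSetup<T>(factory: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!pending) {
      pending = factory().catch((err) => {
        // Forget the failed attempt so a later caller can retry from scratch.
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}
```

Clearing the cache on failure is a design choice in this sketch: the next caller retries instead of every test inheriting one failed setup.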
Throwaway CI changes to surface the host-mode page.goto flake on the first attempt rather than waiting for it to recur. Revert this whole commit before merging.

ci.yaml:
- Run only Matrix shard 1, twenty times in parallel (matrix `copy` 1..20), with fail-fast off so every copy reports independently. Per-shard artifact names get the copy suffix so the parallel jobs don't collide on upload.
- Force every change-check suite output OFF except `matrix`, so nothing but the stress jobs runs. Editing ci.yaml otherwise flips every suite on via the shared paths-filter anchor.
- Disable the "Merge Matrix reports and publish" job: `playwright merge-reports` can't reconcile N blob reports that all claim to be shard 1/3, so it fails and would be the run's only red mark even when every stress shard passes.

ci-host.yaml, ci-software-factory.yaml:
- Gate the entry jobs with `if: false` so these suites don't run. Both are otherwise triggered for this branch: Software Factory by the packages/matrix/** path (host-mode.spec.ts), and CI Host because editing its own file matches its paths filter. Disabling the entry jobs cascades to skip everything downstream.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
9edcfbc to e8aa04e
The 20× shard-1 stress run surfaced a setup flake distinct from the page.goto timeouts: under load, creating a workspace (which provisions a matrix room + personal realm) transiently errors, so the modal shows `data-test-error-message` and stays open instead of rendering the workspace tile. `createRealm` then waited the full 30s on a modal that would never resolve and failed. It hit head-tags.spec.ts too, via its own createRealm call. Make createRealm wait for whichever happens first — the workspace tile or the error — and on a surfaced error, dismiss the modal and retry the whole form from clean state (bounded, with the server message in the final throw). A late-landing prior attempt is detected up front so we never try to create a duplicate endpoint. Because the host-mode read-only tests share one realm via beforeAll, a setup failure there fails the whole group; previously Playwright's hook retry also wedged the hand-rolled browser context. Retry the setup in-hook with a fresh context (swallowing close errors so a wedged context can't mask success or block the retry) under a longer budget. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
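The "first signal wins, then retry from a clean state" flow might be sketched as follows. It is an illustration under assumptions: the tile and error checks are modeled as plain async callbacks rather than Playwright locators, and `createRealmWithRetry`, `CreateRealmSteps`, and `firstOf` are hypothetical names, not the helper's real API.

```typescript
// Hypothetical sketch; none of these names come from the actual spec.
type Outcome = { kind: 'created' } | { kind: 'error'; message: string };

interface CreateRealmSteps {
  // True if a realm from an earlier attempt has landed after all.
  alreadyExists(): Promise<boolean>;
  // Fills and submits the form, then resolves with whichever signal appears
  // first: the workspace tile or the error message.
  submitAndRace(): Promise<Outcome>;
  // Closes the modal so the next attempt starts from a clean form.
  dismissModal(): Promise<void>;
}

async function createRealmWithRetry(steps: CreateRealmSteps, maxAttempts = 3): Promise<void> {
  let lastError = 'no attempt was made';
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Before retrying, check whether a slow earlier attempt succeeded anyway,
    // so we never try to create a duplicate endpoint.
    if (attempt > 1 && (await steps.alreadyExists())) return;
    const outcome = await steps.submitAndRace();
    if (outcome.kind === 'created') return;
    lastError = outcome.message;
    await steps.dismissModal();
  }
  // Bounded: give up with the last server message in the thrown error.
  throw new Error(`createRealm failed after ${maxAttempts} attempts: ${lastError}`);
}

// Builds an Outcome from two waits; whichever settles first wins.
function firstOf(tileVisible: Promise<void>, errorText: Promise<string>): Promise<Outcome> {
  const created = tileVisible.then((): Outcome => ({ kind: 'created' }));
  const failed = errorText.then((message): Outcome => ({ kind: 'error', message }));
  // Mark the loser's eventual rejection (e.g. its wait timing out) as handled.
  created.catch(() => undefined);
  failed.catch(() => undefined);
  return Promise.race([created, failed]);
}
```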
Each matrix shard runs one prerender server, shared by both Playwright workers (fullyParallel). With the pool size unset it collapses to a fixed 4 tabs, so the shard's concurrent publish + index work (host-mode plus the other published-realm specs) exhausts it: the pool thrashes (`standby refill failed to produce a fresh tab`, cross-affinity steals) and realm-server requests stall, surfacing as 60s page.goto / _publish-realm timeouts that recover on retry. Enable the dynamic pool envelope for the harness's prerender server: keep a 4-tab idle floor (no extra baseline memory) but let it burst to 8 tabs under load. This is the residual flake the 20x shard-1 stress run surfaced after the navigation and createRealm fixes — and the 20 copies are independent runners, so it reflects the real per-shard contention, not the stress harness itself. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
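To make the "idle floor, burst cap" behavior concrete, here is a minimal generic pool. It is not the prerender server's implementation or configuration; `BurstPool` and its parameters are hypothetical, and only the 4/8 numbers come from the commit message. Callers beyond the cap wait for a release rather than taking a resource someone else is using.

```typescript
// Hypothetical sketch of a pool with an idle floor and a burst cap.
class BurstPool<T extends object> {
  private readonly idle: T[] = [];
  private inUse = 0;
  private readonly waiters: Array<(item: T) => void> = [];
  private readonly create: () => T;
  private readonly destroy: (item: T) => void;
  private readonly minIdle: number;
  private readonly max: number;

  constructor(create: () => T, destroy: (item: T) => void, minIdle: number, max: number) {
    this.create = create;
    this.destroy = destroy;
    this.minIdle = minIdle;
    this.max = max;
    // Pre-warm the idle floor so the baseline footprint is fixed.
    for (let i = 0; i < minIdle; i++) this.idle.push(create());
  }

  // Resources currently owned by the pool: idle plus checked out.
  get size(): number {
    return this.idle.length + this.inUse;
  }

  acquire(): Promise<T> {
    let item = this.idle.pop();
    // No idle resource: grow past the floor, but never beyond the cap.
    if (item === undefined && this.size < this.max) item = this.create();
    if (item !== undefined) {
      this.inUse++;
      return Promise.resolve(item);
    }
    // At the cap: queue instead of stealing a resource that is in use.
    return new Promise<T>((resolve) => this.waiters.push(resolve));
  }

  release(item: T): void {
    const next = this.waiters.shift();
    if (next) {
      // Hand the resource straight to a waiter; the in-use count is unchanged.
      next(item);
      return;
    }
    this.inUse--;
    // Refill the idle floor first; anything beyond it was burst capacity.
    if (this.idle.length < this.minIdle) this.idle.push(item);
    else this.destroy(item);
  }
}
```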
The two host-mode routing-rules tests were the suite's heaviest and flakiest: each published a plain realm, then rewrote realm.json with the routing rule and re-published — two full `_publish-realm` fan-outs plus an interact-mode boot per test. Under shard load the second publish (and the boot) repeatedly timed out on the first attempt and only passed on retry. Seed the routing rule into realm.json before the single initial publish (new `routingRulePath` option on createAndPublishHostModeRealm), so each routing test does exactly one publish and no interact-mode navigation — it just polls the published realm and asserts. The assertions are unchanged; only the redundant republish/boot are gone. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This reverts commit e8aa04e.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a840fb6033
The warm-up polls added for the async publish path used waitUntil's 10s default. Under CI load a published realm can still be indexing past 10s, so the poll would throw earlier than the old bare navigation (bounded by the 60s test timeout) — failing a slow-but-eventually-ready realm instead of stabilizing it. Budget the readiness polls at 45s (under the test timeout, with headroom for the navigation/assertions that follow). This also future-proofs the polls for when _publish-realm returns 202 before indexing finishes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
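A readiness poll with an explicit budget might look like the following. `pollUntil`, `PollOptions`, and `PUBLISH_READY_BUDGET_MS` are hypothetical names for illustration, not the spec's `waitUntil` helper; only the 45s budget and the 60s test timeout come from the commit message.

```typescript
// Hypothetical sketch: poll an async readiness check against an explicit
// budget instead of relying on a helper's short default.
interface PollOptions {
  timeoutMs: number;
  intervalMs: number;
  label: string;
}

async function pollUntil(check: () => Promise<boolean>, opts: PollOptions): Promise<void> {
  const deadline = Date.now() + opts.timeoutMs;
  for (;;) {
    if (await check()) return;
    if (Date.now() >= deadline) {
      throw new Error(`${opts.label} not ready after ${opts.timeoutMs}ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
}

// Budget for published-realm readiness: under the 60s test timeout, leaving
// headroom for the navigation and assertions that follow.
const PUBLISH_READY_BUDGET_MS = 45_000;

// Possible use (placeholders, not executed): poll the prerendered HTML until
// it contains the expected data-test marker.
// await pollUntil(async () => (await fetchHtml(url)).includes(marker),
//   { timeoutMs: PUBLISH_READY_BUDGET_MS, intervalMs: 500, label: 'published realm' });
```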
habdelra approved these changes Jun 3, 2026
I saw tests flake here. This makes changes to increase stability:

- `domcontentloaded` lets interaction start sooner, so tests stay more comfortably within their timeouts
- `beforeAll` sets up a realm to share between the read-only tests
- `createRealm` detects a creation error and tries again

Ideally the errors in `createRealm` would be eliminated, but I didn't want to get deep into that here.

To verify reduced flakiness, I had Claude hack CI to run 20 copies of the flaky shard, and I did that three times. That revealed some other problems with the publishing endpoint, which are recorded in CS-11323, but the above will help at least.