Skip to content

matrix: Fix flaky host mode tests #5086

Merged: backspace merged 9 commits into main from host-mode-matrix-test-flakiness-cs-11323 on Jun 4, 2026.

backspace commented Jun 2, 2026

I saw tests flake here. This PR makes the following changes to improve stability:

  • `waitUntil: 'domcontentloaded'` lets interaction start sooner, keeping tests further inside their timeouts
  • a `beforeAll` sets up one realm shared by the read-only tests
  • `createRealm` detects a creation error and tries again
  • the prerender server gets more pages, because publishing in parallel can exhaust the default count

Ideally the errors in createRealm would be eliminated, but I didn’t want to get deep into that here.

To verify that flakiness was reduced, I had Claude temporarily modify CI to run 20 copies of the flaky shard, and did that three times. That revealed other problems with the publishing endpoint, recorded in CS-11323, but the changes above should help in the meantime.

backspace and others added 3 commits June 2, 2026 16:34
Replace bare `page.goto` (default waitUntil: 'load', which blocks on
every subresource) with `waitUntil: 'domcontentloaded'` across the
host-mode spec; the tests already wait on the meaningful
`[data-test-*]` markers afterward, so the full load wait was the
fragile part. Warm up the print test by polling the prerendered HTML
for the card marker before navigating cold, mirroring the existing
`published card response` test, so navigation only races the publish
fan-out once the card is render-ready.
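The warm-up described above amounts to polling the prerendered HTML until the card's `[data-test-*]` marker shows up, and only then navigating. Below is a minimal TypeScript sketch under stated assumptions: `pollForMarker`, its option names, and the injected `fetchHtml` function are hypothetical (not the spec's real helper), and the fetch is injected so the sketch runs without a browser or realm server.

```typescript
// Hypothetical warm-up poller: re-fetch prerendered HTML until a marker appears.
// `fetchHtml` is injected (an assumption) so the sketch needs no real server.
type FetchHtml = () => Promise<string>;

interface PollOptions {
  timeoutMs: number; // total budget for the warm-up
  intervalMs: number; // pause between attempts
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Resolves with the number of attempts it took; rejects once the budget is spent.
async function pollForMarker(
  fetchHtml: FetchHtml,
  marker: string,
  { timeoutMs, intervalMs }: PollOptions,
): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  let attempts = 0;
  for (;;) {
    attempts++;
    try {
      const html = await fetchHtml();
      if (html.includes(marker)) return attempts; // card is render-ready
    } catch {
      // A failed fetch (e.g. realm still indexing) just means "not ready yet".
    }
    if (Date.now() >= deadline) {
      throw new Error(`marker ${marker} not seen within ${timeoutMs}ms`);
    }
    await sleep(intervalMs);
  }
}
```

In the spec, a successful poll would be followed by `page.goto(url, { waitUntil: 'domcontentloaded' })` and an explicit wait on the same marker. Treating fetch errors as "not ready yet" is a design choice of this sketch, not something the commit specifies.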

Add timing diagnostics on the two observed-flaky navigations so a
future timeout reveals whether prerender or the navigation itself was
the slow part.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
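The timing diagnostics can be pictured as a wrapper that records how long each phase took, including a phase that failed, so a timeout report says whether prerender or the navigation was slow. This is a hedged sketch; `timed`, `Timing`, and `summarize` are hypothetical names, not taken from the spec.

```typescript
// Hypothetical timing wrapper: records each phase's duration, even on failure.
interface Timing {
  label: string;
  ms: number;
  ok: boolean;
}

async function timed<T>(label: string, log: Timing[], fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    const result = await fn();
    log.push({ label, ms: Date.now() - start, ok: true });
    return result;
  } catch (err) {
    // Log the failed phase too, so a timeout still shows where the time went.
    log.push({ label, ms: Date.now() - start, ok: false });
    throw err;
  }
}

// One-line summary suitable for attaching to a test failure message.
function summarize(log: Timing[]): string {
  return log.map((t) => `${t.label}: ${t.ms}ms${t.ok ? '' : ' (failed)'}`).join(', ');
}
```

In a spec, the two wrapped phases would be the prerender warm-up and the `page.goto` call.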
The host-mode beforeEach created a user, built a realm from ~7 card
sources, and published it (a full reindex + prerender fan-out) before
every test. With fullyParallel + 2 workers, those publish storms ran
concurrently and starved individual renders, which is what made the
page.goto navigations flaky.

Extract the setup into createAndPublishHostModeRealm and run it once
per worker via beforeAll for the read-only tests, which now share a
single published realm. The two routing-rule tests overwrite
realm.json and re-publish, so they keep an isolated per-test realm in
their own describe block. Also fold the duplicated publish + warm-up
polling into shared helpers (publishRealm, waitForPublishedMarker) and
warm up the page-title test's cold navigation the same way the print
test does.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
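Playwright's `beforeAll` already provides the once-per-worker scheduling; the sketch below only models its effect, many read-only tests sharing one publish. `onceAsync` and `PublishedRealm` are hypothetical names used for illustration.

```typescript
// Hypothetical sketch: run an expensive async setup once and share the result,
// modelling what beforeAll gives the read-only host-mode tests.
interface PublishedRealm {
  url: string;
}

function onceAsync<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    // Cache the promise, not the value, so concurrent callers share one run.
    if (!cached) cached = factory();
    return cached;
  };
}
```

Because the promise itself is cached, a rejected setup is shared as well, so every test depending on it fails together. That matches the later commit's note that a setup failure in the shared hook fails the whole group.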
Throwaway CI changes to surface the host-mode page.goto flake on the
first attempt rather than waiting for it to recur. Revert this whole
commit before merging.

ci.yaml:
- Run only Matrix shard 1, twenty times in parallel (matrix `copy`
  1..20), with fail-fast off so every copy reports independently.
  Per-shard artifact names get the copy suffix so the parallel jobs
  don't collide on upload.
- Force every change-check suite output OFF except `matrix`, so nothing
  but the stress jobs runs. Editing ci.yaml otherwise flips every suite
  on via the shared paths-filter anchor.
- Disable the "Merge Matrix reports and publish" job: `playwright
  merge-reports` can't reconcile N blob reports that all claim to be
  shard 1/3, so it fails and would be the run's only red mark even when
  every stress shard passes.

ci-host.yaml, ci-software-factory.yaml:
- Gate the entry jobs with `if: false` so these suites don't run. Both
  are otherwise triggered for this branch — Software Factory by the
  packages/matrix/** path (host-mode.spec.ts), and CI Host because
  editing its own file matches its paths filter. Disabling the entry
  jobs cascades to skip everything downstream.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@backspace backspace force-pushed the host-mode-matrix-test-flakiness-cs-11323 branch from 9edcfbc to e8aa04e Compare June 3, 2026 12:53
backspace and others added 5 commits June 3, 2026 09:26
The 20× shard-1 stress run surfaced a setup flake distinct from the
page.goto timeouts: under load, creating a workspace (which provisions
a matrix room + personal realm) transiently errors, so the modal shows
`data-test-error-message` and stays open instead of rendering the
workspace tile. `createRealm` then waited the full 30s on a modal that
would never resolve and failed. It hit head-tags.spec.ts too, via its
own createRealm call.

Make createRealm wait for whichever happens first — the workspace tile
or the error — and on a surfaced error, dismiss the modal and retry the
whole form from clean state (bounded, with the server message in the
final throw). A late-landing prior attempt is detected up front so we
never try to create a duplicate endpoint.
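The retry logic above can be sketched with the Playwright interactions abstracted behind injected functions. Everything here (`createRealmWithRetry`, `CreateRealmDeps`, the default of 3 attempts) is hypothetical; it only illustrates the shape described in the commit: race the tile against the error, dismiss and retry on error, and check for a late-landing earlier attempt before each submit.

```typescript
// Hypothetical model of createRealm's retry loop. The deps stand in for
// Playwright interactions with the create-workspace modal (assumptions).
type Outcome = { kind: 'tile' } | { kind: 'error'; message: string };

interface CreateRealmDeps {
  tileExists: () => Promise<boolean>; // has a workspace tile already appeared?
  submit: () => Promise<void>; // fill in and submit the form
  waitForTile: () => Promise<void>; // resolves when the workspace tile renders
  waitForError: () => Promise<string>; // resolves with data-test-error-message text
  dismissModal: () => Promise<void>; // close the modal to start from clean state
}

// Wait for whichever happens first: the tile or the error.
function firstOutcome(deps: CreateRealmDeps): Promise<Outcome> {
  return Promise.race([
    deps.waitForTile().then((): Outcome => ({ kind: 'tile' })),
    deps.waitForError().then((message): Outcome => ({ kind: 'error', message })),
  ]);
}

// Resolves with the attempt number at which the workspace was found.
async function createRealmWithRetry(deps: CreateRealmDeps, maxAttempts = 3): Promise<number> {
  let lastError = '';
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // An earlier attempt may have landed after its error was shown;
    // never submit again if the workspace already exists.
    if (await deps.tileExists()) return attempt;
    await deps.submit();
    const outcome = await firstOutcome(deps);
    if (outcome.kind === 'tile') return attempt;
    lastError = outcome.message;
    await deps.dismissModal();
  }
  // Bounded: give up with the server's last message in the error.
  throw new Error(`createRealm failed after ${maxAttempts} attempts: ${lastError}`);
}
```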

Because the host-mode read-only tests share one realm via beforeAll, a
setup failure there fails the whole group; previously Playwright's hook
retry also wedged the hand-rolled browser context. Retry the setup
in-hook with a fresh context (swallowing close errors so a wedged
context can't mask success or block the retry) under a longer budget.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
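The in-hook retry can be sketched as a loop that opens a fresh context per attempt and closes it in a `finally` whose errors are swallowed. `Ctx`, `openContext`, `setup`, and `setupWithFreshContext` are hypothetical stand-ins for a Playwright browser context and the realm setup; the longer time budget mentioned in the commit is not modelled.

```typescript
// Hypothetical sketch: retry a beforeAll-style setup with a fresh context per
// attempt. `Ctx` stands in for a browser context (an assumption).
interface Ctx {
  close(): Promise<void>;
}

async function setupWithFreshContext<T>(
  openContext: () => Promise<Ctx>,
  setup: (ctx: Ctx) => Promise<T>,
  maxAttempts: number,
): Promise<T> {
  let lastErr: unknown = new Error('no attempts made');
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const ctx = await openContext(); // never reuse a possibly wedged context
    try {
      return await setup(ctx);
    } catch (err) {
      lastErr = err;
    } finally {
      // A wedged context may fail to close; swallow that so it neither
      // masks a successful setup nor blocks the next attempt.
      await ctx.close().catch(() => {});
    }
  }
  throw lastErr;
}
```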
Each matrix shard runs one prerender server, shared by both Playwright
workers (fullyParallel). With the pool size unset it collapses to a
fixed 4 tabs, so the shard's concurrent publish + index work
(host-mode plus the other published-realm specs) exhausts it: the pool
thrashes (`standby refill failed to produce a fresh tab`,
cross-affinity steals) and realm-server requests stall, surfacing as
60s page.goto / _publish-realm timeouts that recover on retry.

Enable the dynamic pool envelope for the harness's prerender server:
keep a 4-tab idle floor (no extra baseline memory) but let it burst to
8 tabs under load. This is the residual flake the 20x shard-1 stress
run surfaced after the navigation and createRealm fixes — and the 20
copies are independent runners, so it reflects the real per-shard
contention, not the stress harness itself.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
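The idle-floor / burst-ceiling envelope can be illustrated with a simple model in which the pool grows to match concurrent renders, clamped to the envelope, and any excess waits. The names and the "one tab per in-flight render" policy are assumptions for illustration, not the prerender server's actual configuration or algorithm.

```typescript
// Hypothetical model of an idle-floor / burst-ceiling tab pool. The names
// are illustrative, not the prerender server's real configuration keys.
interface PoolEnvelope {
  minTabs: number; // tabs kept warm even when idle
  maxTabs: number; // hard ceiling under load
}

// Tabs the pool would aim to hold for a given number of concurrent renders.
function targetTabs(inFlight: number, { minTabs, maxTabs }: PoolEnvelope): number {
  return Math.min(maxTabs, Math.max(minTabs, inFlight));
}

// Renders that must wait for a free tab at that pool size.
function queuedRenders(inFlight: number, envelope: PoolEnvelope): number {
  return Math.max(0, inFlight - targetTabs(inFlight, envelope));
}

const fixedPool: PoolEnvelope = { minTabs: 4, maxTabs: 4 }; // previous fixed size
const harnessPool: PoolEnvelope = { minTabs: 4, maxTabs: 8 }; // envelope described above
```

Under this model, six concurrent renders leave two queued with the fixed pool but none with the envelope, while idle memory stays at four tabs in both cases.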
The two host-mode routing-rules tests were the suite's heaviest and
flakiest: each published a plain realm, then rewrote realm.json with
the routing rule and re-published — two full `_publish-realm` fan-outs
plus an interact-mode boot per test. Under shard load the second
publish (and the boot) repeatedly timed out on the first attempt and
only passed on retry.

Seed the routing rule into realm.json before the single initial publish
(new `routingRulePath` option on createAndPublishHostModeRealm), so each
routing test does exactly one publish and no interact-mode navigation —
it just polls the published realm and asserts. The assertions are
unchanged; only the redundant republish/boot are gone.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
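The change to the routing tests is mainly about ordering: write every file, including a realm.json that already carries the rule, then publish once. A hedged sketch with injected I/O follows; `createAndPublish`, the `routingRule` parameter, and the realm.json fields are hypothetical, and only the "seed first, publish once" ordering comes from the commit.

```typescript
// Hypothetical model of seeding the routing rule before the single publish.
// The realm.json fields and the routingRule shape are assumptions.
type Files = Record<string, string>;

interface RealmDeps {
  writeFiles: (files: Files) => Promise<void>;
  publish: () => Promise<void>; // stands in for one _publish-realm call
}

async function createAndPublish(
  deps: RealmDeps,
  cardSources: Files,
  routingRule?: Record<string, unknown>,
): Promise<void> {
  const realmConfig: Record<string, unknown> = { name: 'Host mode test realm' };
  if (routingRule) {
    realmConfig['routingRule'] = routingRule; // hypothetical key
  }
  // Everything, including realm.json, is written before the first publish...
  await deps.writeFiles({ ...cardSources, 'realm.json': JSON.stringify(realmConfig) });
  // ...so one publish (one index + prerender fan-out) is enough.
  await deps.publish();
}
```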
backspace marked this pull request as ready for review June 3, 2026 19:25

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a840fb6033

Comment thread packages/matrix/tests/host-mode.spec.ts
The warm-up polls added for the async publish path used waitUntil's 10s
default. Under CI load a published realm can still be indexing past 10s,
so the poll would throw earlier than the old bare navigation (bounded by
the 60s test timeout) — failing a slow-but-eventually-ready realm instead
of stabilizing it. Budget the readiness polls at 45s (under the test
timeout, with headroom for the navigation/assertions that follow). This
also future-proofs the polls for when _publish-realm returns 202 before
indexing finishes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
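The budget argument can be checked on a virtual clock: a realm that becomes ready at 20s (an illustrative figure) is missed by a 10s poll but caught by a 45s one, which still leaves 15s of the 60s test timeout for navigation and assertions. `pollSucceeds` and the 500ms interval are hypothetical; only the 10s, 45s, and 60s figures come from the commit.

```typescript
// Hypothetical virtual-clock model of a readiness poll with a fixed budget.
function pollSucceeds(readyAtMs: number, budgetMs: number, intervalMs = 500): boolean {
  for (let now = 0; now <= budgetMs; now += intervalMs) {
    if (now >= readyAtMs) return true; // this poll sees the marker
  }
  return false; // budget exhausted: the real poll would throw here
}

const TEST_TIMEOUT_MS = 60_000;
const READINESS_BUDGET_MS = 45_000;
// Time left for the navigation and assertions after a worst-case poll.
const headroomMs = TEST_TIMEOUT_MS - READINESS_BUDGET_MS;
```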
backspace merged commit 6bc503d into main on Jun 4, 2026
30 checks passed