Fix flaky instantiate-validation Playwright spec#4782
Conversation
Seed helpers in instantiate-test-fixtures.ts used `client.waitForFile`, which polls GET on the source file only. Realm-side source POST indexing is async, so the file becomes readable before it shows up in `_federated-search` results. `InstantiateValidationStep` polls `discoverRealmSpecs` for at most 30s for the Spec to appear in search; under CI load this sometimes times out, and the step falls into the "modules exist but no Spec cards" branch (`passed: false`, no `details`), which then trips `expect(details).toBeTruthy()` in the e2e spec at line 132. Confirmed by an identical failure on a clean main commit (CI run 25694263747 against 03f6ae7), so this is a pre-existing flake, not a regression from the prerender-search-cacheonly PR (#4777) where it also surfaced. Fix: - New `awaitSpecSearchable` helper polls `client.search({type: spec})` for up to 60s after writing the Spec source, so downstream discovery sees it without depending on its own 30s budget. - `seedValidCardWithSpec` and `seedTagsCardWithBrokenExampleAndSpec` call the gate immediately after the Spec write. The tags fixture writes the bad example *after* the gate — the bad example deliberately fails indexing (`containsMany` got a string), and a still-queued indexer was what historically kept the Spec out of search results past the 30s budget. Diagnostic logging (orthogonal to the fix, helps if the flake recurs): - The "modules exist but no Spec cards" branch in `instantiate-step.ts` now logs the realm URL, file count, and filtered list of spec-like filenames at `warn`. Previously this case fired a generic info log that didn't reveal whether the search index was empty because the Spec wasn't written or because indexing hadn't caught up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR aims to de-flake the instantiate-validation Playwright e2e by ensuring newly written Spec cards are actually discoverable via _federated-search (not just readable via GET) before downstream validation runs, and by improving diagnostics when specs can’t be found.
Changes:
- Add an
awaitSpecSearchablehelper that polls federated search for Spec results (up to 60s) after seeding Spec files. - Update instantiate test fixtures to gate on Spec search readiness before proceeding (especially before writing an intentionally broken example).
- Add warn-level logging in
InstantiateValidationStepwhen modules exist but no Spec cards are found, including filtered “spec-like” filenames for triage.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/software-factory/tests/helpers/instantiate-test-fixtures.ts | Adds federated-search readiness polling for seeded Spec cards to prevent downstream discovery timing out. |
| packages/software-factory/src/validators/instantiate-step.ts | Adds diagnostic logging for the “modules exist but no specs” failure path, including filename context. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let lastResult: Awaited<ReturnType<typeof client.search>> | undefined; | ||
| let result = await retryWithPoll( | ||
| async () => { | ||
| lastResult = await client.search(realmUrl, { | ||
| filter: { type: specRef }, | ||
| }); | ||
| return lastResult; |
There was a problem hiding this comment.
[Claude Code] Good catch — lastResult was a leftover from an earlier draft that wanted to log the final result. With noUnusedLocals enabled the read-via-return kept it valid, but the variable served no real purpose. Replaced the closure with a direct () => client.search(...). Will land in the next push.
| try { | ||
| let filesResult = await this.fetchFilenamesFn(targetRealm); | ||
| hasModules = (filesResult.filenames ?? []).some( | ||
| filenames = filesResult.filenames ?? []; | ||
| hasModules = filenames.some( | ||
| (f) => f.endsWith('.gts') && !f.endsWith('.test.gts'), | ||
| ); |
There was a problem hiding this comment.
[Claude Code] Right — client.listFiles reports failures via a returned error field, not by throwing. The previous try { ... } catch only caught hard throws, so a soft error would silently leave filenames = [], set hasModules = false, and fall into the bootstrap (passed: true) branch instead of the diagnostic warn-log. Fixed in the next push: the function now reads filesResult.error and treats a soft list-files error the same as a thrown one — log it and fall through to the no-modules branch.
Original v1 of this PR treated `awaitSpecSearchable` as a hard assertion — fail the test if the seeded Spec didn't surface in `_federated-search` within the polling window. CI run 25724049661 revealed that doesn't actually fix anything: the realm sometimes never indexes the tags-card Spec for this fixture (likely an indexer race against the broken example's error_doc state), so the hard gate just relocates the same failure earlier with less context. Switch to a soft gate: - Bump the wait to 90s (gives the realm comfortable headroom) - On success, return normally — most CI runs hit this path quickly - On timeout, emit one warn-line capturing the realm's actual state (current search hits, spec source file readability, realm file listing filtered to spec-like paths), then return without asserting - The downstream `InstantiateValidationStep.discoverRealmSpecs` poll will still see the same empty result and the existing e2e assertion (`expect(details).toBeTruthy()`) will still fail — but now with two diagnostic logs explaining exactly what the realm saw at the time Also reverts the speculative example-before-Spec reorder. Both orderings reproduced the flake locally and in CI, so the original order stands and the gate's diagnostic dump documents what was observed. Addresses Copilot review feedback: - Removes the dead `lastResult` closure variable in `awaitSpecSearchable`. With `noUnusedLocals` the read-via-return kept it compiling, but the variable served no purpose. - `instantiate-step.ts` now reads `filesResult.error` instead of relying on `try/catch` alone. `client.listFiles` reports failures via a returned `error` field, not throwing — so a soft error would have silently fallen through to the bootstrap branch and skipped the diagnostic warn-log. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CI Lint job's `pnpm build:plugin` synopsis-staleness check has been failing repo-wide on every PR (4780, 4781, 4782) because the `boxel realm watch` command's surface changed (from positional `<realm-url> <local-dir>` to a subcommands-style entry point) but the generated synopsis at `plugin/skills/realm-sync/SKILL.md` wasn't regenerated. Run `pnpm build:plugin` and commit the result so the Lint check goes green. Not strictly related to the instantiate-validation flake this PR targets, but the Lint job is gating the merge button — unblocking it here so the flake fix can land. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Diagnosed the actual flake on PR #4782 by stripping linkedExamples from the Spec — the bare Spec then surfaced in `_federated-search` within seconds, confirming the realm indexer drops a Spec whenever its `linkedExamples` `loadLinks` walk can't settle on the target. That happens both when the target is missing (404 — the variant tried previously in this PR) and when the target is in error_doc state (containsMany got a string — the original CI flake). Either way, seeding the realm directly with a broken-shape example silently disqualifies the Spec from search. So: - Seed the realm-side `TagsCard/bad-example.json` with a well-formed empty-tags placeholder so the Spec's loadLinks walk completes and the Spec surfaces normally. - New `overwriteTagsExampleWithBadShape(workspaceDir)` helper writes the broken `containsMany: "not-an-array"` shape into the workspace copy. The validation pipeline reads example JSON from the workspace path (see `prepareExampleInstance` in instantiate-execution.ts), not from the realm index, so the bad shape only needs to live in the workspace at the moment the step or in-memory tool reads it. - Both `instantiate-validation.spec.ts:91` and `run-instantiate-in-memory.spec.ts:71` call the overwrite helper after `client.pull` so the pull doesn't immediately stomp the workspace file. Local verification: both previously flaking tests now pass deterministically across runs. Two unrelated tests in the same in-memory spec file (197, 295) still flake locally with realm-server "database does not exist" errors during harness teardown — separate infra issue, not from this change. The realm-indexer behavior (dropping a Spec from search whenever its linkedExamples target won't settle) is a real bug worth filing separately; this PR is just unblocking the e2e on the fixture side. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
tests/instantiate-validation.spec.ts:91(the "containsMany withnon-array value fails instantiation" e2e) trips
expect(details).toBeTruthy()at line 132 when the realm's_federated-searchhasn't picked up the seeded Spec card by thetime
InstantiateValidationStep.discoverRealmSpecsfinishes its30s poll budget. The step falls into the "modules exist but no
Spec cards" branch (returns
passed: falsewith nodetails),and the assertion downstream fails.
This is not a regression from any recent PR — the same failure
hit a clean
maincommit (CI run25694263747
against
03f6ae7c) before it also tripped on #4777's re-run.Root cause
The seed helpers in
tests/helpers/instantiate-test-fixtures.tsused
client.waitForFile, which pollsGET <path>only.Realm-side source POST indexing is async, so the source file
becomes readable before it appears in
client.search({ type: spec }). The downstream step's ownretryWithPoll(searchSpecsFn, ...)has a 30s default; under CIload (especially when the broken example deliberately fails
indexing right behind the Spec write), 30s isn't always enough.
Fix
awaitSpecSearchablehelper pollsclient.search({type: spec})for up to 60s after every Specwrite, so downstream discovery doesn't depend on its own 30s
budget.
seedValidCardWithSpecandseedTagsCardWithBrokenExampleAndSpecinvoke the gateimmediately after the Spec write. The tags fixture writes the
bad example after the gate — the bad example deliberately
fails indexing (
containsManygot a string), and astill-queued indexer was what historically kept the Spec out
of search results past the 30s budget.
Diagnostics (orthogonal to the fix)
Added a
warn-level log inInstantiateValidationStep's "modules exist but no Spec cards"branch that captures the realm URL, total file count, and a
filtered list of spec-like filenames. If the flake ever recurs
in CI, the log will distinguish "Spec was never written" from
"Spec is on disk but the indexer hasn't surfaced it in search."
Test plan
pnpm lint:jscleanpnpm lint:formatcleanpnpm lint:typesclean forsrc/andtests/(the pre-existing../base/*type errors are unrelated, present onmaintoo)pnpm test:playwright tests/instantiate-validation.spec.ts— running locally now🤖 Generated with Claude Code