Skip to content

Fix flaky instantiate-validation Playwright spec#4782

Merged
habdelra merged 3 commits into
mainfrom
worktree-fix-instantiate-validation-flake
May 12, 2026
Merged

Fix flaky instantiate-validation Playwright spec#4782
habdelra merged 3 commits into
mainfrom
worktree-fix-instantiate-validation-flake

Conversation

@habdelra
Copy link
Copy Markdown
Contributor

Summary

tests/instantiate-validation.spec.ts:91 (the "containsMany with
non-array value fails instantiation" e2e) trips
expect(details).toBeTruthy() at line 132 when the realm's
_federated-search hasn't picked up the seeded Spec card by the
time InstantiateValidationStep.discoverRealmSpecs finishes its
30s poll budget. The step falls into the "modules exist but no
Spec cards" branch (returns passed: false with no details),
and the assertion downstream fails.

This is not a regression from any recent PR — the same failure
hit a clean main commit (CI run
25694263747
against 03f6ae7c) before it also tripped on #4777's re-run.

Root cause

The seed helpers in tests/helpers/instantiate-test-fixtures.ts
used client.waitForFile, which polls GET <path> only.
Realm-side source POST indexing is async, so the source file
becomes readable before it appears in
client.search({ type: spec }). The downstream step's own
retryWithPoll(searchSpecsFn, ...) has a 30s default; under CI
load (especially when the broken example deliberately fails
indexing right behind the Spec write), 30s isn't always enough.

Fix

  • New awaitSpecSearchable helper polls
    client.search({type: spec}) for up to 60s after every Spec
    write, so downstream discovery doesn't depend on its own 30s
    budget.
  • seedValidCardWithSpec and
    seedTagsCardWithBrokenExampleAndSpec invoke the gate
    immediately after the Spec write. The tags fixture writes the
    bad example after the gate — the bad example deliberately
    fails indexing (containsMany got a string), and a
    still-queued indexer was what historically kept the Spec out
    of search results past the 30s budget.

Diagnostics (orthogonal to the fix)

Added a warn-level log in
InstantiateValidationStep's "modules exist but no Spec cards"
branch that captures the realm URL, total file count, and a
filtered list of spec-like filenames. If the flake ever recurs
in CI, the log will distinguish "Spec was never written" from
"Spec is on disk but the indexer hasn't surfaced it in search."

Test plan

  • pnpm lint:js clean
  • pnpm lint:format clean
  • pnpm lint:types clean for src/ and tests/ (the pre-existing
    ../base/* type errors are unrelated, present on main too)
  • pnpm test:playwright tests/instantiate-validation.spec.ts — running locally now

🤖 Generated with Claude Code

Seed helpers in instantiate-test-fixtures.ts used `client.waitForFile`,
which polls GET on the source file only. Realm-side source POST
indexing is async, so the file becomes readable before it shows up in
`_federated-search` results. `InstantiateValidationStep` polls
`discoverRealmSpecs` for at most 30s for the Spec to appear in search;
under CI load this sometimes times out, and the step falls into the
"modules exist but no Spec cards" branch (`passed: false`, no
`details`), which then trips `expect(details).toBeTruthy()` in the
e2e spec at line 132.

Confirmed by an identical failure on a clean main commit (CI run
25694263747 against 03f6ae7), so this is a pre-existing flake, not
a regression from the prerender-search-cacheonly PR (#4777) where it
also surfaced.

Fix:
- New `awaitSpecSearchable` helper polls `client.search({type: spec})`
  for up to 60s after writing the Spec source, so downstream
  discovery sees it without depending on its own 30s budget.
- `seedValidCardWithSpec` and `seedTagsCardWithBrokenExampleAndSpec`
  call the gate immediately after the Spec write. The tags fixture
  writes the bad example *after* the gate — the bad example
  deliberately fails indexing (`containsMany` got a string), and a
  still-queued indexer was what historically kept the Spec out of
  search results past the 30s budget.

Diagnostic logging (orthogonal to the fix, helps if the flake
recurs):
- The "modules exist but no Spec cards" branch in `instantiate-step.ts`
  now logs the realm URL, file count, and filtered list of spec-like
  filenames at `warn`. Previously this case fired a generic info log
  that didn't reveal whether the search index was empty because the
  Spec wasn't written or because indexing hadn't caught up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to de-flake the instantiate-validation Playwright e2e by ensuring newly written Spec cards are actually discoverable via _federated-search (not just readable via GET) before downstream validation runs, and by improving diagnostics when specs can’t be found.

Changes:

  • Add an awaitSpecSearchable helper that polls federated search for Spec results (up to 60s) after seeding Spec files.
  • Update instantiate test fixtures to gate on Spec search readiness before proceeding (especially before writing an intentionally broken example).
  • Add warn-level logging in InstantiateValidationStep when modules exist but no Spec cards are found, including filtered “spec-like” filenames for triage.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
packages/software-factory/tests/helpers/instantiate-test-fixtures.ts Adds federated-search readiness polling for seeded Spec cards to prevent downstream discovery timing out.
packages/software-factory/src/validators/instantiate-step.ts Adds diagnostic logging for the “modules exist but no specs” failure path, including filename context.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +197 to +203
let lastResult: Awaited<ReturnType<typeof client.search>> | undefined;
let result = await retryWithPoll(
async () => {
lastResult = await client.search(realmUrl, {
filter: { type: specRef },
});
return lastResult;
Copy link
Copy Markdown
Contributor Author

@habdelra habdelra May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Good catch — lastResult was a leftover from an earlier draft that wanted to log the final result. With noUnusedLocals enabled the read-via-return kept it valid, but the variable served no real purpose. Replaced the closure with a direct () => client.search(...). Will land in the next push.

Comment on lines 154 to 159
try {
let filesResult = await this.fetchFilenamesFn(targetRealm);
hasModules = (filesResult.filenames ?? []).some(
filenames = filesResult.filenames ?? [];
hasModules = filenames.some(
(f) => f.endsWith('.gts') && !f.endsWith('.test.gts'),
);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Right — client.listFiles reports failures via a returned error field, not by throwing. The previous try { ... } catch only caught hard throws, so a soft error would silently leave filenames = [], set hasModules = false, and fall into the bootstrap (passed: true) branch instead of the diagnostic warn-log. Fixed in the next push: the function now reads filesResult.error and treats a soft list-files error the same as a thrown one — log it and fall through to the no-modules branch.

Original v1 of this PR treated `awaitSpecSearchable` as a hard
assertion — fail the test if the seeded Spec didn't surface in
`_federated-search` within the polling window. CI run 25724049661
revealed that doesn't actually fix anything: the realm sometimes
never indexes the tags-card Spec for this fixture (likely an indexer
race against the broken example's error_doc state), so the hard gate
just relocates the same failure earlier with less context.

Switch to a soft gate:
- Bump the wait to 90s (gives the realm comfortable headroom)
- On success, return normally — most CI runs hit this path quickly
- On timeout, emit one warn-line capturing the realm's actual state
  (current search hits, spec source file readability, realm file
  listing filtered to spec-like paths), then return without
  asserting
- The downstream `InstantiateValidationStep.discoverRealmSpecs` poll
  will still see the same empty result and the existing e2e
  assertion (`expect(details).toBeTruthy()`) will still fail — but
  now with two diagnostic logs explaining exactly what the realm
  saw at the time

Also reverts the speculative example-before-Spec reorder. Both
orderings reproduced the flake locally and in CI, so the original
order stands and the gate's diagnostic dump documents what was
observed.

Addresses Copilot review feedback:
- Removes the dead `lastResult` closure variable in
  `awaitSpecSearchable`. With `noUnusedLocals` the read-via-return
  kept it compiling, but the variable served no purpose.
- `instantiate-step.ts` now reads `filesResult.error` instead of
  relying on `try/catch` alone. `client.listFiles` reports failures
  via a returned `error` field, not throwing — so a soft error would
  have silently fallen through to the bootstrap branch and skipped
  the diagnostic warn-log.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CI Lint job's `pnpm build:plugin` synopsis-staleness check has
been failing repo-wide on every PR (4780, 4781, 4782) because the
`boxel realm watch` command's surface changed (from positional
`<realm-url> <local-dir>` to a subcommands-style entry point) but
the generated synopsis at `plugin/skills/realm-sync/SKILL.md`
wasn't regenerated. Run `pnpm build:plugin` and commit the result
so the Lint check goes green.

Not strictly related to the instantiate-validation flake this PR
targets, but the Lint job is gating the merge button — unblocking
it here so the flake fix can land.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@habdelra habdelra requested a review from a team May 12, 2026 12:28
@habdelra habdelra merged commit d0ad5c8 into main May 12, 2026
28 of 29 checks passed
habdelra added a commit that referenced this pull request May 13, 2026
Diagnosed the actual flake on PR #4782 by stripping linkedExamples
from the Spec — the bare Spec then surfaced in `_federated-search`
within seconds, confirming the realm indexer drops a Spec whenever
its `linkedExamples` `loadLinks` walk can't settle on the target.
That happens both when the target is missing (404 — the variant
tried previously in this PR) and when the target is in error_doc
state (containsMany got a string — the original CI flake).

Either way, seeding the realm directly with a broken-shape example
silently disqualifies the Spec from search. So:

- Seed the realm-side `TagsCard/bad-example.json` with a well-formed
  empty-tags placeholder so the Spec's loadLinks walk completes and
  the Spec surfaces normally.
- New `overwriteTagsExampleWithBadShape(workspaceDir)` helper writes
  the broken `containsMany: "not-an-array"` shape into the workspace
  copy. The validation pipeline reads example JSON from the workspace
  path (see `prepareExampleInstance` in instantiate-execution.ts),
  not from the realm index, so the bad shape only needs to live in
  the workspace at the moment the step or in-memory tool reads it.
- Both `instantiate-validation.spec.ts:91` and
  `run-instantiate-in-memory.spec.ts:71` call the overwrite helper
  after `client.pull` so the pull doesn't immediately stomp the
  workspace file.

Local verification: both previously flaking tests now pass
deterministically across runs. Two unrelated tests in the same
in-memory spec file (197, 295) still flake locally with realm-server
"database does not exist" errors during harness teardown — separate
infra issue, not from this change.

The realm-indexer behavior (dropping a Spec from search whenever its
linkedExamples target won't settle) is a real bug worth filing
separately; this PR is just unblocking the e2e on the fixture side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants