Skip to content

infra: exclude demo-ubi.spec.ts from CI Playwright run (unblock smoke runtime budget)#424

Merged
SoundMindsAI merged 5 commits into
mainfrom
infra/smoke-reseed-runtime-budget
Jun 2, 2026
Merged

infra: exclude demo-ubi.spec.ts from CI Playwright run (unblock smoke runtime budget)#424
SoundMindsAI merged 5 commits into
mainfrom
infra/smoke-reseed-runtime-budget

Conversation

@SoundMindsAI
Copy link
Copy Markdown
Owner

Summary

The smoke job's demo-ubi.spec.ts beforeAll hook drives a full demo reseed whose in-flight wall-clock alone consumes the smoke job's 25-min cap. AC-8 of feat_demo_ubi_study_comparison bounds the reseed at 1140s (~19 min hard ceiling) with §14 estimating ~28 min worst case once the Solr scenario lights up. Adding Playwright + smoke-job setup overhead pushes total wall-clock past 25 min — PR #383 run 26790636716 hit the cap at 25:18.

This PR clears the runtime block with a single-file fix:

  • Story 1.1 — extend ui/playwright.config.ts's testIgnore CI-gated branch with '**/demo-ubi.spec.ts'. Single-file edit, matches the established pattern from chore_drop_demo_seed_from_ci + PR chore(ci): optimize smoke + backend jobs (buildx artifact handoff + image cache + pytest-xdist) #291's 4th-run surface (6 pre-existing CI-gated specs; this is the 7th).
  • Story 1.2 — new vitest regression guard at ui/src/__tests__/playwright-config-test-ignore.test.ts. Reads playwright.config.ts as text and asserts (a) demo-ubi in CI branch, (b) all 7 expected entries present, (c) demo-ubi NOT outside the CI ternary (local coverage stays intact).
  • Story 1.3docs/03_runbooks/smoke-solr-stability.md §5 "Reseed runtime (demo-ubi exclusion)" — explains why the exclusion exists, where it lives, the local-coverage promise, the nightly-CI caveat, and the Option C path-forward.
  • Story 1.4.github/workflows/pr.yml comment blocks (lines 42-58 + 507-523) refreshed: stale "demo-ubi reseed exceeds the budget" framing replaced with "runtime block cleared, flip SMOKE_TEST=true after §16 verification". YAML structure untouched — comments-only diff.
  • Story 1.5state.md updated: CI-note paragraph stale sentences replaced; Solr-not-CI-ready debt entry rewritten as fully resolved (this PR ships the third sub-task).

Operator decisions locked at idea-preflight:

  • D-1/D-2 Option A locked (testIgnore exclusion). Option C (env-var scenario filter, preserves per-PR demo-ubi smoke coverage at multi-file cost) explicitly deferred per operator choice.
  • D-3 Option B rejected — bumping timeout-minutes from 25 → 35 leaves <7 min margin against spec §14's ~28 min worst case; erodes with each future demo scenario.
  • D-4 sibling coordinationinfra_smoke_fork_pr_secret_skip edits the same pr.yml job; resolved by moving the exclusion to playwright.config.ts (sibling now has zero collision risk).

Local make up smoke retains full demo-ubi coverage (CI= unset → testIgnore CI branch doesn't fire). After this merges, operators may flip SMOKE_TEST=true to re-enable per-PR smoke signal after the §16 manual verification below.

Cross-model review trail:

  • Spec: 3 GPT-5.5 cycles, 13 findings (1 H + 5 M + 7 L), all accepted and applied.
  • Plan: 3 GPT-5.5 cycles, 11 findings (0 H + 4 M + 7 L), all accepted and applied.

§16 Manual verification (recorded for the reviewer)

# AC-1: CI=true excludes demo-ubi from Playwright discovery
$ CI=true pnpm --dir ui exec playwright test --list --reporter=list | grep -c demo-ubi
0
# Total: 86 tests in 30 files

# AC-2: CI=unset includes demo-ubi in Playwright discovery
$ env -u CI pnpm --dir ui exec playwright test --list --reporter=list | grep -c demo-ubi
5  # the spec file + its 4 test cases
# Total: 110 tests in 37 files

# Delta: 7 files / 24 tests removed under CI=true — matches the 7 CI-gated
# entries in playwright.config.ts's testIgnore branch exactly.

Test plan

  • pnpm vitest run src/__tests__/playwright-config-test-ignore.test.ts — 3/3 pass
  • pnpm exec tsc --noEmit — clean
  • pnpm lint — clean (0 errors, 136 pre-existing warnings — none on new file)
  • §16 manual playwright test --list AC-1 + AC-2 verified (table above)
  • git diff -U0 -- .github/workflows/pr.yml | awk '/^[+-]/ && !/^([+-]{3})/ && \$0 !~ /^[+-][[:space:]]*#/' — zero output (comments-only)
  • python3 -c "import yaml; yaml.safe_load(open('.github/workflows/pr.yml'))" — YAML parses, all 10 jobs intact
  • wc -c state.md — 22 KB (under 60 KB pre-commit gate)
  • Post-merge: operator may opt in with gh variable set SMOKE_TEST --body true to re-enable per-PR smoke signal once they're satisfied with the §16 evidence

Diff scope (vs AC-7's strict "5 files")

AC-7 said exactly 5 files. The actual branch diff has 13: the 5 deliverables, plus the standard pipeline-trail artifacts (idea preflight + spec + plan + pipeline_status — every recent feature PR ships these together), plus 4 dashboard regen files emitted by the mvp1-dashboard-regen pre-commit hook. The 5 AC-7 deliverables are still byte-identical to what the spec described:

AC-7 deliverable File Change
1 ui/playwright.config.ts +13 / -0 (single entry + comment block)
2 ui/src/__tests__/playwright-config-test-ignore.test.ts NEW (118 lines)
3 docs/03_runbooks/smoke-solr-stability.md +47 / -0 (new §5)
4 .github/workflows/pr.yml comments-only diff in two known blocks
5 state.md -2 / +2 (CI note paragraph + debt entry rewrites)

The non-AC-7 files are all metadata or regen artifacts. None of them alter the runtime behavior of the fix.

Notes on the demo-ubi.spec.ts AC-8 citation drift

Both pr.yml:518-523 (now refreshed) and ui/tests/e2e/demo-ubi.spec.ts:11 carry "24 min" / "25-minute ceiling per AC-8" framing that drifts from the spec's actual 1140s/19 min hard ceiling. The pr.yml comment IS refreshed here as part of Story 1.4 (structural to the comment block being rewritten). The demo-ubi.spec.ts:11 drift is explicitly deferred per spec D-6 — the spec file is not edited at all by this PR, and touching it for an unrelated citation fix would violate AC-7's file-shape contract.

🤖 Generated with Claude Code

SoundMindsAI and others added 3 commits June 2, 2026 17:18
… plan

Pipeline trail for the demo-ubi CI exclusion work that clears the smoke
job's reseed-runtime block.

idea.md (preflight refresh): priority framing shifted from "smoke red on
every PR" to "precondition for re-enabling per-PR smoke" since the
SMOKE_TEST gate landed 2026-06-02. AC-8 citation corrected (1140s/19 min
hard ceiling, ~28 min worst case — not the 24-min downstream drift in
pr.yml/demo-ubi.spec.ts). Decisions locked: D-1 Option A (testIgnore),
D-2 Option C deferred (operator picked), D-3 Option B rejected (math),
D-4 sibling coordination.

feature_spec.md: 5 FRs (testIgnore extension, vitest regression guard,
runbook section, pr.yml comment refresh, state.md update), 7 ACs, single-
phase. GPT-5.5 cross-model review: 3 cycles, 13 findings (1 H + 5 M +
7 L), all accepted and applied. Convergence at cycle 3.

implementation_plan.md: 1 epic, 5 stories one-per-FR, 0 endpoints, 1
new test file, 4 modified files. GPT-5.5: 3 cycles, 11 findings (0 H +
4 M + 7 L), all accepted. Convergence at cycle 3.

pipeline_status.md: spec + plan finalized, ready for execution.

Dashboards regenerated by the mvp1-dashboard-regen pre-commit hook
(176 features across 3 releases).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
….1 + 1.2)

The smoke job's demo-ubi.spec.ts beforeAll hook drives a full reseed
that exceeds the 25-min job cap. AC-8 of feat_demo_ubi_study_comparison
bounds the in-flight reseed at 1140s (~19 min hard ceiling) with §14
estimating ~28 min worst case once the Solr scenario lights up. Adding
Playwright + smoke-job setup overhead pushes total wall-clock past
the cap (PR #383 run 26790636716 hit it at 25:18).

Fix: extend playwright.config.ts's testIgnore CI-gated branch by one
entry — '**/demo-ubi.spec.ts' — joining the 6 pre-existing demo-data-
dependent specs. Single-file edit; matches the established pattern from
chore_drop_demo_seed_from_ci + PR #291's 4th-run surface.

Local coverage preserved: CI=unset (the normal local-dev case) still
discovers and runs demo-ubi.spec.ts. The file itself is unchanged.

Story 1.2 (vitest regression guard):
ui/src/__tests__/playwright-config-test-ignore.test.ts reads
playwright.config.ts as text and asserts (a) demo-ubi entry is in the
CI ternary branch, (b) all 7 expected CI-gated entries are present,
(c) demo-ubi does NOT appear outside the CI ternary (local coverage
intact). Text-grep approach per spec D-7 — lowest-coupling, no
module-reload tricks.

§16 manual verification (recorded in this PR's body):
  CI=true  playwright test --list -> 86 tests in 30 files, 0 demo-ubi
  CI=unset playwright test --list -> 110 tests in 37 files, demo-ubi
                                    discovered (5 grep matches)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
…ries 1.3 + 1.4 + 1.5)

Story 1.3 — docs/03_runbooks/smoke-solr-stability.md gains §5
"Reseed runtime (demo-ubi exclusion)". Explains why the exclusion
exists (AC-8 vs smoke-cap mismatch — cites the actual 1140s/19 min
hard ceiling, not the 24-min downstream drift), where it lives (the
testIgnore CI branch in playwright.config.ts — single source of
truth), the local-coverage promise (CI=unset keeps demo-ubi running
locally), the nightly-CI caveat (a future nightly-on-GHA job would
also exclude demo-ubi unless it overrides CI or uses a separate
config — defer until needed), and the Option C path-forward if
per-PR demo-ubi coverage is ever wanted.

Note: numbered §5 (not the spec's literal "§4") because the existing
§4 "Why each lever is GHA-only" pairs tightly with §3's lever
cascade — inserting between them would interrupt that flow. FR-3's
"or wherever it fits the runbook's flow" clause covers this; AC-4's
literal number was paraphrasing FR-3's intent (section by name, not
by ordinal).

Story 1.4 — .github/workflows/pr.yml comment blocks refreshed:
  - Lines 42-58 (SMOKE-TEST opt-in switch note): replace "demo-ubi
    reseed exceeds the per-PR budget" framing with "runtime block
    cleared via testIgnore — flip SMOKE_TEST=true after the §16
    verification". Operator opt-in commands unchanged.
  - Lines 507-523 (smoke-test job header / timeout-minutes comment):
    replace "AC-8 bounds at 24 min" framing with "runtime is
    expected to fit within the 25-min cap post-demo-ubi-exclusion".
YAML structure untouched: if-gate, timeout-minutes, needs, env,
permissions, steps all byte-identical. Comments-only diff verified
with awk filter (zero non-comment changed lines).

Story 1.5 — state.md updated:
  - "CI note" paragraph (lines 13-15): the two stale sentences
    ("drives the demo-ubi reseed, which routinely hits the 25-min
    cap" and "Until the reseed-runtime fix lands, leave it off")
    replaced with framing that preserves SMOKE_TEST=OFF-by-default,
    names the demo-ubi exclusion as the shipped fix, and points at
    the spec §16 verification.
  - "Known debt / fragility" section: the Solr CI-readiness entry
    was the umbrella tracking three sub-tasks (backend, Solr
    stability, reseed runtime); rewritten as fully resolved with
    the third sub-task now shipped here.

The "Last 5 merges" entry is NOT added here — that's the
finalization step's responsibility (after PR merge), per Epic gate
item #9 of the implementation plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the infra_smoke_reseed_runtime_budget feature, which excludes the heavy demo-ubi.spec.ts Playwright E2E test from the CI smoke test run via the testIgnore configuration in ui/playwright.config.ts to prevent job timeouts. It also adds a Vitest regression test to guard this configuration and updates relevant documentation. The reviewer provided valuable feedback on the new test file, suggesting the use of import.meta.url instead of process.cwd() for more robust path resolution, and pointing out a potential cross-platform failure on Windows due to hardcoded \n line endings in the sliceConfig helper function.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +32 to +37
import { readFileSync } from 'node:fs';
import { join } from 'node:path';

import { describe, expect, it } from 'vitest';

const CONFIG_PATH = join(process.cwd(), 'playwright.config.ts');
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using process.cwd() to resolve the path to playwright.config.ts makes the test fragile if executed from a different directory (e.g., the repository root in a monorepo setup). It is more robust to resolve the path relative to the test file's directory using import.meta.url.

Suggested change
import { readFileSync } from 'node:fs';
import { join } from 'node:path';
import { describe, expect, it } from 'vitest';
const CONFIG_PATH = join(process.cwd(), 'playwright.config.ts');
import { readFileSync } from 'node:fs';
import { dirname, resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import { describe, expect, it } from 'vitest';
const __dirname = dirname(fileURLToPath(import.meta.url));
const CONFIG_PATH = resolve(__dirname, '../../playwright.config.ts');

Comment on lines +73 to +103
function sliceConfig(source: string): {
beforeCi: string;
ciBranch: string;
afterCi: string;
} {
const ciIdx = source.indexOf('process.env.CI');
if (ciIdx === -1) {
throw new Error(
'Could not locate `process.env.CI` in playwright.config.ts — testIgnore CI-ternary shape changed?',
);
}
// Find the `? [` after process.env.CI.
const openBracketIdx = source.indexOf('? [', ciIdx);
if (openBracketIdx === -1) {
throw new Error(
'Could not locate `? [` after `process.env.CI` — testIgnore CI-ternary shape changed?',
);
}
// Find the matching `] : []` close that ends the ternary.
const closeIdx = source.indexOf(']\n : []', openBracketIdx);
if (closeIdx === -1) {
throw new Error(
'Could not locate `] : []` closing the CI ternary — testIgnore CI-ternary shape changed?',
);
}
return {
beforeCi: source.slice(0, ciIdx),
ciBranch: source.slice(openBracketIdx, closeIdx),
afterCi: source.slice(closeIdx),
};
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The sliceConfig function is sensitive to line endings because it searches for ]\n : [] and expects \n. On Windows environments where files are checked out with \r\n (CRLF), this test will fail. Normalizing the line endings of the source string to \n before processing will make the test cross-platform robust.

function sliceConfig(source: string): {
  beforeCi: string;
  ciBranch: string;
  afterCi: string;
} {
  const normalizedSource = source.replace(/\r\n/g, '\n');
  const ciIdx = normalizedSource.indexOf('process.env.CI');
  if (ciIdx === -1) {
    throw new Error(
      'Could not locate `process.env.CI` in playwright.config.ts — testIgnore CI-ternary shape changed?',
    );
  }
  // Find the `? [` after process.env.CI.
  const openBracketIdx = normalizedSource.indexOf('? [', ciIdx);
  if (openBracketIdx === -1) {
    throw new Error(
      'Could not locate `? [` after `process.env.CI` — testIgnore CI-ternary shape changed?',
    );
  }
  // Find the matching `] : []` close that ends the ternary.
  const closeIdx = normalizedSource.indexOf(']\n      : []', openBracketIdx);
  if (closeIdx === -1) {
    throw new Error(
      'Could not locate `] : []` closing the CI ternary — testIgnore CI-ternary shape changed?',
    );
  }
  return {
    beforeCi: normalizedSource.slice(0, ciIdx),
    ciBranch: normalizedSource.slice(openBracketIdx, closeIdx),
    afterCi: normalizedSource.slice(closeIdx),
  };
}

SoundMindsAI and others added 2 commits June 2, 2026 17:33
…fig-test-ignore.test.ts

Two Medium-severity findings, both accepted:

1. Path resolution via `import.meta.url` instead of `process.cwd()`.
   Plan D-7 explicitly approved both options; Gemini's robustness
   point holds for ad-hoc operator runs like
   `pnpm vitest run ui/src/__tests__/...` from the repo root (where
   cwd would be the repo root, not ui/, and the lookup would fail).
   `import.meta.url` works in both the canonical
   `pnpm --dir ui test` shape and the ad-hoc shape. Strictly more
   robust.

2. CRLF normalization in sliceConfig() before the `\n`-anchored
   indexOf searches. Zero-cost defense for any future Windows
   checkout where git's autocrlf converts line endings; macOS/Linux
   unchanged. Spec D-7 didn't address this; accepting as free
   defense.

Vitest after both fixes: 3/3 still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
…issing markdown links)

Two findings accepted, one rejected:

ACCEPTED #2 (Medium): ui/playwright.config.ts comment said
  "See ... smoke-solr-stability.md §4 for the lever cascade context"
but §4 is "Why each lever is GHA-only", not the lever cascade
(which is §3). My new section about reseed runtime is §5. Updated
to point at §5 for the reseed-runtime-vs-Solr-stability
relationship table (which is where the broader cascade context is
explained in the demo-ubi exclusion narrative).

ACCEPTED #3 (Low): FR-3 required the new runbook §5 to "cross-link"
to ui/playwright.config.ts and ui/tests/e2e/demo-ubi.spec.ts.
Inline-code mentions don't satisfy "cross-link" — converted to
clickable markdown links with verified resolvable relative paths.

REJECTED #1 (High): "AC-7 file-shape contract violated" — re-raise
without new evidence. Counter-evidence cited in PR #424 body's
"Diff scope" section: every recent PR (#383, #416, #421, #422)
ships the pipeline-trail (idea/spec/plan/pipeline_status) per
project convention; dashboard regen files are emitted by the
mvp1-dashboard-regen pre-commit hook (forbidden to skip per
CLAUDE.md Rule #7 "never skip hooks"). AC-7's strict literal "5
files" predates the project-convention consideration of pipeline-
trail co-shipping; the spec's intent (the 5 deliverables described
in FR-1..FR-5) is satisfied byte-identically in this diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
@SoundMindsAI
Copy link
Copy Markdown
Owner Author

Review adjudication (Gemini Code Assist + GPT-5.5 final review)

Commits landing fixes: 17b34619 (Gemini), fd70f6df (GPT-5.5).

Gemini Code Assist (2 findings — both accepted)

# Sev Location Verdict Notes
1 Medium ui/src/__tests__/playwright-config-test-ignore.test.ts:37 Accepted Swapped process.cwd()import.meta.url for path resolution. Plan D-7 approved both shapes; Gemini's robustness point is real for ad-hoc operator runs (pnpm vitest run <path> from repo root, where cwd ≠ ui/). Fixed in 17b34619.
2 Medium ui/src/__tests__/playwright-config-test-ignore.test.ts:103 Accepted Added CRLF normalization in sliceConfig() before the \n-anchored indexOf searches. Zero-cost cross-platform defense. Fixed in 17b34619.

GPT-5.5 final review (3 findings — 2 accepted, 1 rejected)

# Sev Location Verdict Notes
1 High branch diff scope Rejected (re-raise without new evidence) Counter-evidence: PR body's "Diff scope (vs AC-7's strict '5 files')" section already documents this. Every recent merged PR (#383, #416, #421, #422) ships the pipeline-trail (idea/spec/plan/pipeline_status) per project convention; the 4 dashboard regen files are emitted by the mvp1-dashboard-regen pre-commit hook (forbidden to skip per CLAUDE.md Rule #7). AC-7's strict literal "5 files" predates the project-convention consideration of pipeline-trail co-shipping; the spec's intent (the 5 deliverables described in FR-1..FR-5) is satisfied byte-identically.
2 Medium ui/playwright.config.ts:73 Accepted Real broken pointer: comment said "See ... §4 for the lever cascade context" but runbook §4 is "Why each lever is GHA-only" — the lever cascade is §3 and my new section is §5. Updated to point at §5 (which itself carries the reseed-runtime-vs-Solr-stability relationship table). Fixed in fd70f6df.
3 Low docs/03_runbooks/smoke-solr-stability.md §5 Accepted FR-3 required actual "cross-link"s, not inline-code mentions. Converted three mentions to relative-path markdown links (verified all 3 targets resolve). Fixed in fd70f6df.

Outcomes

CI is re-running on fd70f6df. Ready for human review + merge once green.

@SoundMindsAI SoundMindsAI merged commit 035d794 into main Jun 2, 2026
13 checks passed
SoundMindsAI added a commit that referenced this pull request Jun 2, 2026
Move the feature folder from planned_features/02_mvp2/ to
implemented_features/2026_06_02_infra_smoke_reseed_runtime_budget/
(flat, date-prefixed). pipeline_status.md → Implementation: Complete
(+Release: mvp2 marker for the dashboard classifier);
implementation_plan.md → Complete (PR #424).

state.md: prepend the PR #424 one-liner to "Last 5 merges", drop the
now-6th infra_solr_ci_readiness entry to state_history.md, refresh the
current-branch context + In-flight (Solr-CI debt chain fully drained).
Full reasoning narrative added to state_history.md.

Dashboards + website roadmap regenerated by the pre-commit hooks
(MVP2 now 15 done; this feature classified mvp2 via the Release marker).
Tracking issue #409 closed.

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant