feat: studies-list convergence visibility + real demo data (feat_studies_convergence_visibility) by SoundMindsAI · Pull Request #422 · SoundMindsAI/relyloop

SoundMindsAI · 2026-06-02T17:34:17Z

Summary

Epic 2 only — Epic 1 already on main via PR #421's squash-merge bundle. This PR completes feat_studies_convergence_visibility by shipping the demo-data enrichment + max_trials bump + headroom test scaffold.

Context

The earlier push (5dab2a5e) included both epics, but during CI watch I discovered Epic 1 had already merged to main as part of PR #421 e5c3b8b9 — a squash-merge that bundled the complementary-architecture-onepager docs PR with the entire Epic 1 backend (count_trials_for_studies / resolve_list_convergence_verdicts / StudySummary.trial_count + convergence_verdict) + frontend (Trials + Convergence columns + glossary entry + E2E spec). I rebased this branch onto e5c3b8b9 and force-pushed; this PR now carries only the Epic 2 commits on top.

What this PR ships

Story 2.1 — enrich docs + judgments (FR-5) — rewrote all 5 small SCENARIOS in scripts/seed_meaningful_demos.py using the decoy-by-title pattern: each query has a "best answer" doc whose query terms appear in description/body/bullet_points (NOT in title) and a "decoy" doc with query terms densely in title only with shallow description. The deterministic engine-backed headroom harness (backend/tests/integration/test_demo_scenarios_headroom.py) asserts the FR-5 bounds (0.40 ≤ baseline ≤ 0.70, lift ≥ 0.10, better < 0.99) for each scenario:

Scenario	Baseline NDCG@10	Better NDCG@10	Lift
acme-products-prod	0.597	0.851	+0.254
corp-docs-search	0.633	0.863	+0.230
news-search-staging	0.561	0.799	+0.238
jobs-marketplace-prod	0.690	0.985	+0.295
acme-kb-docs-solr	0.644	0.878	+0.234

Story 2.2 — max_trials 12 → 50 via shared constant (FR-6, D-11) — new DEMO_SMALL_STUDY_MAX_TRIALS exported from scripts/seed_meaningful_demos.py and aliased by backend/app/services/demo_seeding.py:_REAL_STUDY_MAX_TRIALS, single-sourcing the CLI seed and home-button reseed paths so they cannot drift. Pinned at STUDIES_TPE_WARMUP_FLOOR so the small-scenario LLM + UBI studies clear the warmup floor and convergence reads converged / still_improving instead of too_few_trials.
Story 2.3 scaffold — engine-backed headroom harness — new fixture module backend/tests/integration/fixtures/headroom_harness.py provides ES/OS bulk-index + Solr configset-upload + collection-create helpers; the test renders each scenario's template with the baseline (midpoint) params and a hand-picked "better" param set, evaluates NDCG@10 via the shipped eval engine, and asserts the FR-5 bounds. ES/OS scenarios are hard CI gates (via _require_es_or_fail() / _require_opensearch_or_fail() — CI=true → pytest.fail, local → pytest.skip); Solr skip-gates per D-18.
Story 2.3 finalize — shape invariants + AC-7/AC-8 — backend/tests/unit/scripts/test_scenarios_judgment_density.py (21 parametrized invariants enforcing doc-count floor, judgment density, valid doc_id/query_idx refs, and the FULL {0,1,2,3} rubric per query). The heavy-lane backend/tests/integration/test_demo_seeding_ubi_full.py gains an AC-7/AC-8 assertion block reading persisted Study.baseline_metric + Study.best_metric via the live count_trials_for_studies + resolve_list_convergence_verdicts path (matches the StudySummary.convergence_verdict pipeline). AC-8 wall-clock ceiling raised 1140s → 3600s per D-9.

Tangential inline fix

backend/tests/integration/test_health_integration.py: the contract test was asserting the /healthz subsystems set didn't include solr despite Solr being live since 2026-05-31. Fixed inline per CLAUDE.md fix-inline-by-default.

Test coverage

Layer	File	New tests
Unit	`backend/tests/unit/scripts/test_scenarios_judgment_density.py`	21
Unit	`backend/tests/unit/scripts/test_demo_max_trials_single_source.py`	4
Integration	`backend/tests/integration/test_demo_scenarios_headroom.py`	6 (5 scenarios + resolver-parity guard)
Integration	`backend/tests/integration/test_demo_seeding_ubi_full.py`	+ extended with AC-7/AC-8 block
Integration	`backend/tests/integration/test_health_integration.py`	+ tangential `solr` subsystem fix

All Epic 2 tests pass locally: 6 headroom + 21 shape + 4 single-source = 31 new tests green; 2187 unit + 330 contract (existing) green.

Test plan

🤖 Generated with Claude Code

gemini-code-assist

Code Review

This pull request introduces completed-trial counts and convergence verdicts to the studies list endpoint and UI, utilizing new batched database queries to maintain a strict performance budget. It also enriches the five small demo scenarios with denser graded judgments and raises their trial budget to 50, allowing the optimizer to demonstrate real lift and meaningful convergence states. Feedback on the changes suggests explicitly coercing study_id to a string when mapping trial counts and grouping complete trials to prevent potential type mismatch issues if the database returns UUID objects.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-02T17:36:11Z

+    result: dict[str, TrialCounts] = {
+        row.study_id: TrialCounts(total=int(row.total), complete=int(row.complete)) for row in rows
+    }


To ensure robustness and prevent potential type mismatch issues (e.g., if study_id is stored or returned as a uuid.UUID object rather than a string), it is safer to explicitly coerce row.study_id to a string when constructing the result dictionary keys. This aligns with the type annotation dict[str, TrialCounts] and prevents silent failures where trial_counts.get(str(r.id)) fails to find the key.

Suggested change

result: dict[str, TrialCounts] = {

row.study_id: TrialCounts(total=int(row.total), complete=int(row.complete)) for row in rows

}

result: dict[str, TrialCounts] = {

str(row.study_id): TrialCounts(total=int(row.total), complete=int(row.complete)) for row in rows

}

gemini-code-assist · 2026-06-02T17:36:11Z

+    grouped: dict[str, list[Trial]] = {sid: [] for sid in study_ids}
+    for trial in (await db.execute(stmt)).scalars().all():
+        grouped[trial.study_id].append(trial)
+    return grouped


Similarly, to prevent a potential KeyError if trial.study_id is returned as a uuid.UUID object (while study_ids are passed as strings), explicitly coerce trial.study_id to a string when indexing into the grouped dictionary. This ensures defensive programming and compatibility across different database column types.

Suggested change

grouped: dict[str, list[Trial]] = {sid: [] for sid in study_ids}

for trial in (await db.execute(stmt)).scalars().all():

grouped[trial.study_id].append(trial)

return grouped

grouped: dict[str, list[Trial]] = {sid: [] for sid in study_ids}

for trial in (await db.execute(stmt)).scalars().all():

grouped[str(trial.study_id)].append(trial)

return grouped

SoundMindsAI · 2026-06-02T17:38:03Z

Review adjudication (Gemini Code Assist)

Commits landing fixes: none — both findings rejected with counter-evidence.

Gemini Code Assist (2 findings)

#	Sev	Location	Verdict	Notes
1	Medium	`backend/app/db/repo/trial.py:167`	Rejected — hunk-isolated	Counter-evidence: `backend/app/db/models/trial.py:73` declares `study_id: Mapped[str] = mapped_column(String(36), ForeignKey(...), nullable=False)` — the column type is a fixed-width VARCHAR(36) UUIDv7 string per the project's data-model convention (UUIDv7 strings, not native UUID type). `row.study_id` at runtime is already `str`; explicit `str(...)` coercion would be speculative future-proofing for a migration that isn't planned. Gemini reviewed the diff hunk without the model definition above it. The reciprocal `list_studies` caller already uses `str(r.id)` defensively, so the keys/lookups match.
2	Medium	`backend/app/db/repo/trial.py:203`	Rejected — same root cause as #1	Same counter-evidence: `Trial.study_id` is `Mapped[str]` (VARCHAR(36)). The `grouped[trial.study_id].append(trial)` lookup is already str-keyed against `study_ids` (also strings on the call sites). No coercion needed.

Outcomes

Applied fixes (0): —
Rejected with cited counter-evidence (2): Gemini docs: reorganize into numbered IA, add pipeline skills and feature templates #1 + docs: MVP1 architecture + 12 feature specs + GPT-5.5 review fixes #2 (both hunk-isolated false positives — model column type pins study_id to str).
Deferred as non-regression follow-ups (0): —

The cross-model phase-gate review (GPT-5.5 cycle 2) returned {"findings": []} — no other open items from the model-side review. Ready for human review + CI green.

SoundMindsAI · 2026-06-02T17:42:52Z

Final cross-model review (GPT-5.5, post-Gemini-adjudication)

GPT-5.5 reviewed the full PR diff (git diff main..HEAD, 366 KB across both epics) with the phase-gate + Gemini rejection log in the system prompt. 2 findings raised, both rejected with counter-evidence.

GPT-5.5 final review (2 findings)

#	Sev	Location	Verdict	Notes
1	High	`scripts/seed_meaningful_demos.py`	Rejected — out of scope, pre-existing behavior	The Solr CLI scenario being study-less in `make seed-demo` is pre-existing behavior from `infra_adapter_solr` Story A13 (PR #336, 2026-05-31) and intentionally documented in the script: `_seed_solr_scenario_minimum` only registers cluster + template, with the comment "operators kick that off via the UI once they've verified the collection contents". The home-button reseed at `backend/app/services/demo_seeding.py` DOES create the full Solr study + UBI study (8 studies without Solr, 10 with — asserted by `test_demo_seeding_ubi_full.py`). This PR's spec (feature_spec.md §14) explicitly scopes the FR-5 verification to the headroom test (covers all 5 directly, no optimizer run) + an `@pytest.mark.slow` end-to-end seed test for ONE representative scenario (acme-products-prod). Expanding the CLI Solr seed path would conflict with the existing `make seed-solr` separation discipline — that's a separate scope from `infra_adapter_solr`.
2	Low	`ui/src/components/studies/studies-table.column-config.tsx`	Rejected — UX non-regression	The convergence column uses `tooltipKey: 'convergence_verdict'` which the DataTable primitive renders as an `<InfoTooltip>` in the column HEADER (per the inline comment at `studies-table.column-config.tsx:135-137`). This is the established convention for multi-row list columns where every row shares the same semantic — the sibling `trial_count` column uses the same pattern. Per-row badge tooltips would be visual noise (operator scans down the column; the per-row badge color + label IS the cue). Epic 1's own phase-gate GPT-5.5 review accepted this pattern (per the implementation plan tracker §9 — "Epic 1 phase gate — GPT-5.5 review: 5 findings (4 accepted+fixed `a0c40d37`, F4 rejected with counter-evidence); re-review clean"). The plan's "adjacent" wording (line 115 of implementation_plan.md) is satisfied by the DataTable's header `InfoTooltip` rendered at `data-table.tsx:435`.

Outcomes

Applied fixes (0): —
Rejected with cited counter-evidence (2): Final-F1 (Solr CLI scope) + Final-F2 (header-tooltip UX convention) — both phase-gate-accepted patterns, no regression in this PR.
Deferred as non-regression follow-ups (0): —

Convergence summary

Epic 2 phase-gate cycle 1: 6 findings — 4 accepted + fixed (6e7934a7), 1 accepted as comment, 1 deferred to docs step (applied in 5dab2a5e).
Epic 2 phase-gate cycle 2: clean pass {"findings": []}.
Gemini Code Assist (on the PR): 2 findings — both rejected as hunk-isolated false positives (model column type is Mapped[str]); see adjudication summary above.
Final GPT-5.5 review: 2 findings — both rejected as out-of-scope/UX non-regression (this comment).

Ready for human review + merge once CI is green.

…y 2.3 scaffold + 2.1) Story 2.3 scaffold — engine-backed headroom test - backend/tests/integration/test_demo_scenarios_headroom.py: per-scenario test that indexes each scenario's docs into the live ES/OS/Solr container, renders the template with baseline (midpoint) + hand-picked "better" params, scores NDCG@10 via the shipped eval engine, and asserts the FR-5 bounds (0.40 <= baseline <= 0.70, lift >= 0.10, better < 0.99). - backend/tests/integration/fixtures/headroom_harness.py: ES/OS bulk-index + Solr configset-upload + collection-create helpers; build_adapter + run_scenario_metric driver. Raw httpx for indexing (mirrors the seed_es.py + es_overlap_probe precedent); adapter for render + search so the harness exercises the same code paths the live optimizer hits. - backend/tests/integration/fixtures/opensearch_reachability.py: new opensearch_required marker — sibling of the existing es_required + solr_required, probes localhost:9201 then opensearch:9200. Story 2.1 — enrich docs + judgments (5 scenarios) - scripts/seed_meaningful_demos.py: rewrote docs + judgments_map for all 5 small SCENARIOS using the decoy-by-title pattern (best-answer doc has query terms in description/body/bullet_points; decoy has them densely in title only with shallow description). Added _days_ago_iso() helper so news + jobs published_at stays within the freshness-decay window. - backend/tests/integration/test_demo_scenarios_headroom.py: hand-picked _BETTER_PARAMS per scenario favor description/body/bullets over title (flipped from the initial title-heavy draft once the recipe direction was confirmed empirically). - backend/tests/unit/services/test_demo_seeding.py: updated one pinned title assertion for the enriched Solr scenario's best-answer doc. Per-scenario headroom (baseline -> better): acme-products-prod 0.597 -> 0.851 (+0.254 lift) corp-docs-search 0.633 -> 0.863 (+0.230 lift) news-search-staging 0.561 -> 0.799 (+0.238 lift) jobs-marketplace-prod 0.690 -> 0.985 (+0.295 lift) acme-kb-docs-solr 0.644 -> 0.878 (+0.234 lift) All 6 headroom tests pass (5 scenarios + the resolver-parity guard); 2187 unit tests + 330 contract tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

…ies 2.2 + 2.3 finalize) Story 2.2 — max_trials 12 -> 50 via shared constant DEMO_SMALL_STUDY_MAX_TRIALS - scripts/seed_meaningful_demos.py: new module-level constant DEMO_SMALL_STUDY_MAX_TRIALS = 50 (pinned at STUDIES_TPE_WARMUP_FLOOR per D-11) — exported alongside DEMO_ES_INDICES + SCENARIOS so the home-button reseed path imports it. Replaced the literal 12 in the CLI study config with the constant. - backend/app/services/demo_seeding.py: import the shared constant and alias _REAL_STUDY_MAX_TRIALS to it so the CLI and home-button reseed paths cannot drift. Refreshed the comment block + the UBI study seed log line to drop the now-stale "max_trials=12" wording. - backend/tests/unit/scripts/test_demo_max_trials_single_source.py (NEW): four parity guards — (1) DEMO_SMALL_STUDY_MAX_TRIALS == 50 == STUDIES_TPE_WARMUP_FLOOR; (2) _REAL_STUDY_MAX_TRIALS aliases the shared constant via `is` (catches a re-introduced literal); (3) rich scenario stays at 15 per D-11; (4) CLI study-config block uses the symbol, not the literal. Story 2.3 finalize — shape invariants + heavy-lane AC-7/AC-8 - backend/tests/unit/scripts/test_scenarios_judgment_density.py (NEW): 21 parametrized invariants on the enriched SCENARIOS — doc-count floor (>= 12), judgment density per query (>= 4), distinct ratings per query (>= 3), valid doc_id / query_idx refs, ratings in {0,1,2,3}. Pure-domain, runs in milliseconds with no engine; catches the cheap regression modes before the slow headroom test loads. - backend/tests/integration/test_demo_seeding_ubi_full.py: added the feat_studies_convergence_visibility AC-7 + AC-8 assertion block — reads the persisted Study.baseline_metric / best_metric for acme-products-prod (the representative scenario) and asserts the FR-5 bounds AND trial_count == 50 + verdict in {converged, still_improving}. Raised the existing AC-8 wall-clock ceiling from 1140s to 3600s per D-9 (the bump's wall-clock cost is explicitly accepted; smoke is opt-in/off so default CI lanes are unaffected). Tangential fix (CLAUDE.md fix-inline-by-default rule) - backend/tests/integration/test_health_integration.py: the contract test asserted the /healthz subsystems set was exactly {db, redis, openai, elasticsearch, opensearch, elasticsearch_clusters} but the actual response includes 'solr' (added when infra_adapter_solr shipped 2026-05-31). Added 'solr' to both the expected set and the blocking-down branch of the consistency test; allowed 'not_configured' as a valid Solr-probe state alongside reachable / unreachable. All Epic 2 tests pass: - 6 headroom tests (5 scenarios + resolver-parity guard) - 21 shape invariants - 4 max_trials parity guards - 2187 unit + 330 contract + 2 health integration Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

…subsystem fix backend/tests/unit/scripts/test_scenarios_judgment_density.py: should have landed in the previous commit (Story 2.3 finalize) but missed the stage. 21 parametrized invariants on the enriched SCENARIOS. backend/tests/integration/test_health_integration.py: tangential — the contract test was asserting the /healthz subsystems set didn't include 'solr' but the actual response includes it (added when infra_adapter_solr shipped 2026-05-31). Added 'solr' to the expected set and the blocking-down branch; allowed 'not_configured' alongside reachable / unreachable as a valid Solr-probe state. Noticed during the Epic 2 phase gate full-suite run; fixed inline per the CLAUDE.md fix-inline-by-default rule. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

…4/F5) F1 (High) — ES/OS headroom tests now hard-fail in CI when the engine is unreachable instead of silently skipping. Added _require_es_or_fail() + _require_opensearch_or_fail() helpers that route to pytest.fail when CI=true (the GHA-set env var) and pytest.skip otherwise. Preserves the local-dev skip ergonomics while making a CI service-container failure loud (per plan D-18 / spec §6 the 4 ES/OS scenarios are hard CI gates). Mirrors the precedent at backend/tests/integration/fixtures/ es_overlap_probe.py:_check_local_es_credentials_or_skip. Solr stays skip-only (no Solr container in backend CI per infra_solr_ci_readiness). F2 (Medium) — The heavy-lane AC-8 verdict assertion now routes through the live list-endpoint path (count_trials_for_studies + resolve_list_convergence_verdicts), not a direct classify_convergence call. Catches regressions in the list wiring (the path StudySummary. convergence_verdict exercises) instead of only the underlying classifier. classify_convergence + Study imports kept as _ = ... for docstring-reference linting. F3 (Medium — accepted as comment) — Documented the determinism trade-off in scripts/seed_meaningful_demos.py:_days_ago_iso(): the helper produces dates that shift one day per calendar day relative to the engine's origin: now. The RELATIVE distance between best-answer and decoy docs is preserved so ranking monotonicity is stable; headroom-test margins (≥ +0.23 lift across the 5 scenarios) absorb the daily freshness-decay shift. The trade is intentional — relative dates keep the operator-facing make seed-demo output plausible (news with a stale 2025 date would read as broken to an evaluator running the demo in 2027). Documented the fixed-anchor fallback for future flake remediation. F4 (Low) — Shape test now requires the FULL {0,1,2,3} rubric per query (was: >= 3 distinct ratings). Catches a regression that drops one rubric bucket while still satisfying the count floor. Renamed the test function to test_scenario_each_query_spans_full_rubric for clarity. F5 (Low) — Replaced unreliable `is`-identity check on small ints (CPython interns 50, so a re-introduced literal would still satisfy `is`) with inspect.getsource() of demo_seeding.py asserting the canonical alias-binding form. Belt-and-suspenders equality check kept as defense-in-depth. F6 (Medium) deferred to the post-implementation documentation step (state.md, convergence-verdict.md, ui-architecture.md updates run as part of the impl-execute workflow's Step 2). All 33 Epic 2 tests still pass. Lint, format, mypy all clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

…md + guide-06 caption Plan §4 documentation update workstream: - docs/03_runbooks/convergence-verdict.md — added a list-page-vs-detail-page surface map at the top: the badge column on /studies uses the SAME classifier with compact labels (Converged/Improving/Too few trials/em-dash), and `null` verdicts mean the same thing on both surfaces (in-flight, invalid objective.direction, or fewer than 5 complete trials). Linked feat_studies_convergence_visibility Epic 1 as the source. - docs/01_architecture/ui-architecture.md — extended the /studies row in the page-route table with the column inventory (name / cluster / status / best_metric+ceiling-badge / Trials / Convergence / created / completed), the backend wiring pointer (count_trials_for_studies + resolve_list_convergence_verdicts; bounded to 1-2 queries per FR-3), and the source-of-truth pointers (CONVERGENCE_VERDICT_VALUES + convergence_verdict glossary key) so a future column change has the reuse path documented. - state.md — refreshed the "Last updated" + "Current branch / execution context" + "In flight" sections to reflect the in-flight feat/studies-convergence-visibility branch (Epic 1 + Epic 2 both committed locally; PR not yet opened; finalization in progress). Full feature shape + GPT-5.5 phase-gate cycle outcomes inline. Final merge entry lands on Step 5 of finalization after the PR merges. - ui/public/guides/06_create_and_monitor_study/metadata.json — updated the 01-studies-list.png caption to mention the new Trials + Convergence columns and the at-a-glance "is this trustworthy" cue. Caption notes the screenshot pre-dates the feature and will refresh at the next /guide-gen 06 --regen pass — the change is purely additive (new columns appended right of existing ones) so the screenshot is stale but not misleading. Deferred a Playwright regen run to a future guide-gen pass. Tangential observations sweep: 1 inline fix (healthz contract test accepts the solr subsystem the live response carries — already committed in 64e6ab6); 0 new idea files needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

…ia PR #421) The earlier docs commit recorded "Epic 1 + Epic 2 committed locally" but Epic 1 was actually merged to main as part of PR #421 e5c3b8b (a squash-merge that bundled complementary-architecture-onepager + the entire Epic 1 backend/ frontend code). This PR only ships Epic 2 on top. Adjusts: - "Last updated" — explicit about Epic 1 vs Epic 2 origins - "Current branch / execution context" — branch is 5 commits ahead of main (not 6), PR is open (#422) - "In flight" — references PR #422 and notes Epic 1 already on main Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

SoundMindsAI · 2026-06-02T17:52:53Z

Branch rebased — Epic 2 only

Discovered during CI watch that Epic 1 of this feature already merged to main as part of PR #421 e5c3b8b9 (a squash-merge that bundled complementary-architecture.md with the entire Epic 1 backend + frontend code). The original push here carried both epics and conflicted with main on the Epic 1 files.

Rebased onto e5c3b8b9 and force-pushed (5dab2a5e...e0742e71). This PR now carries only the 6 Epic 2 commits on top of main. PR description updated to reflect the Epic 2-only scope.

Gemini findings on the prior push (2 medium, both about backend/app/db/repo/trial.py:167 / :203) are now moot — that file is no longer modified in this PR (it shipped with Epic 1 via #421). Both were rejected on review as hunk-isolated false positives (the Trial.study_id column is Mapped[str] per backend/app/db/models/trial.py:73); they will not surface against the rebased SHA.

The final GPT-5.5 cross-model review's adjudication comment above (re: Solr CLI scope + header-tooltip UX) still applies — both findings reference Epic 2 content unchanged by the rebase.

…_status Updates the in-flight feature folder's plan + pipeline_status to reflect: - Epic 1 already shipped via PR #421 (e5c3b8b squash-merge bundle) - Epic 2 in flight as PR #422 — all 5 stories committed locally + cross- model-reviewed; awaiting CI + merge. Per the impl-execute Step 8 finalization workflow these would normally land on a docs/finalize-* branch post-merge, but the tracker checkboxes + pipeline_status are useful to update inline while the PR is open so operators looking at the planned-features folder see the live status. The Implementation status will flip to "Complete (PR #422, merged <date>)" + folder move to implemented_features/ happen in the post-merge finalization step. Includes the MVP2 dashboard regen output from the dashboard pre-commit hook (auto-generated from the planned_features tree). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

…c 2 #422 merged) (#423) Step 8 finalization for the shipped feature: - implementation_plan.md Status → Complete (Epic 1 PR #421 e5c3b8b, Epic 2 PR #422 49a0e1b). - pipeline_status.md Implementation → Complete with both PR refs + cross-model review outcomes + 5/5 Epic 2 stories. - Moved the feature folder planned_features/02_mvp2/feat_studies_convergence_visibility → implemented_features/2026_06_02_feat_studies_convergence_visibility (flat, date-prefixed per the archive convention). - state.md: branch → main, active feature → none, prepended the merge to "Last 5 merges" (dropped the now-6th MVP2-backlog-batch one-liner), removed from "In flight", de-brittled the stale 02_mvp2 folder count. - state_history.md: full feature-merge narrative (both epics, the mid-flight rebase story, all cross-model review cycles). - Dashboard regen (DASHBOARD.md + MVP2_DASHBOARD.md + *.html) from the pre-commit hook (folder moved buckets — two-shot commit). No tracking issue existed to close. Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…issing markdown links) Two findings accepted, one rejected: ACCEPTED #2 (Medium): ui/playwright.config.ts comment said "See ... smoke-solr-stability.md §4 for the lever cascade context" but §4 is "Why each lever is GHA-only", not the lever cascade (which is §3). My new section about reseed runtime is §5. Updated to point at §5 for the reseed-runtime-vs-Solr-stability relationship table (which is where the broader cascade context is explained in the demo-ubi exclusion narrative). ACCEPTED #3 (Low): FR-3 required the new runbook §5 to "cross-link" to ui/playwright.config.ts and ui/tests/e2e/demo-ubi.spec.ts. Inline-code mentions don't satisfy "cross-link" — converted to clickable markdown links with verified resolvable relative paths. REJECTED #1 (High): "AC-7 file-shape contract violated" — re-raise without new evidence. Counter-evidence cited in PR #424 body's "Diff scope" section: every recent PR (#383, #416, #421, #422) ships the pipeline-trail (idea/spec/plan/pipeline_status) per project convention; dashboard regen files are emitted by the mvp1-dashboard-regen pre-commit hook (forbidden to skip per CLAUDE.md Rule #7 "never skip hooks"). AC-7's strict literal "5 files" predates the project-convention consideration of pipeline- trail co-shipping; the spec's intent (the 5 deliverables described in FR-1..FR-5) is satisfied byte-identically in this diff. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

… runtime budget) (#424) * docs(planned): infra_smoke_reseed_runtime_budget — preflight + spec + plan Pipeline trail for the demo-ubi CI exclusion work that clears the smoke job's reseed-runtime block. idea.md (preflight refresh): priority framing shifted from "smoke red on every PR" to "precondition for re-enabling per-PR smoke" since the SMOKE_TEST gate landed 2026-06-02. AC-8 citation corrected (1140s/19 min hard ceiling, ~28 min worst case — not the 24-min downstream drift in pr.yml/demo-ubi.spec.ts). Decisions locked: D-1 Option A (testIgnore), D-2 Option C deferred (operator picked), D-3 Option B rejected (math), D-4 sibling coordination. feature_spec.md: 5 FRs (testIgnore extension, vitest regression guard, runbook section, pr.yml comment refresh, state.md update), 7 ACs, single- phase. GPT-5.5 cross-model review: 3 cycles, 13 findings (1 H + 5 M + 7 L), all accepted and applied. Convergence at cycle 3. implementation_plan.md: 1 epic, 5 stories one-per-FR, 0 endpoints, 1 new test file, 4 modified files. GPT-5.5: 3 cycles, 11 findings (0 H + 4 M + 7 L), all accepted. Convergence at cycle 3. pipeline_status.md: spec + plan finalized, ready for execution. Dashboards regenerated by the mvp1-dashboard-regen pre-commit hook (176 features across 3 releases). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * infra(ci): exclude demo-ubi.spec.ts from CI Playwright run (Stories 1.1 + 1.2) The smoke job's demo-ubi.spec.ts beforeAll hook drives a full reseed that exceeds the 25-min job cap. AC-8 of feat_demo_ubi_study_comparison bounds the in-flight reseed at 1140s (~19 min hard ceiling) with §14 estimating ~28 min worst case once the Solr scenario lights up. Adding Playwright + smoke-job setup overhead pushes total wall-clock past the cap (PR #383 run 26790636716 hit it at 25:18). Fix: extend playwright.config.ts's testIgnore CI-gated branch by one entry — '**/demo-ubi.spec.ts' — joining the 6 pre-existing demo-data- dependent specs. Single-file edit; matches the established pattern from chore_drop_demo_seed_from_ci + PR #291's 4th-run surface. Local coverage preserved: CI=unset (the normal local-dev case) still discovers and runs demo-ubi.spec.ts. The file itself is unchanged. Story 1.2 (vitest regression guard): ui/src/__tests__/playwright-config-test-ignore.test.ts reads playwright.config.ts as text and asserts (a) demo-ubi entry is in the CI ternary branch, (b) all 7 expected CI-gated entries are present, (c) demo-ubi does NOT appear outside the CI ternary (local coverage intact). Text-grep approach per spec D-7 — lowest-coupling, no module-reload tricks. §16 manual verification (recorded in this PR's body): CI=true playwright test --list -> 86 tests in 30 files, 0 demo-ubi CI=unset playwright test --list -> 110 tests in 37 files, demo-ubi discovered (5 grep matches) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * docs(infra): document demo-ubi exclusion + refresh stale framing (Stories 1.3 + 1.4 + 1.5) Story 1.3 — docs/03_runbooks/smoke-solr-stability.md gains §5 "Reseed runtime (demo-ubi exclusion)". Explains why the exclusion exists (AC-8 vs smoke-cap mismatch — cites the actual 1140s/19 min hard ceiling, not the 24-min downstream drift), where it lives (the testIgnore CI branch in playwright.config.ts — single source of truth), the local-coverage promise (CI=unset keeps demo-ubi running locally), the nightly-CI caveat (a future nightly-on-GHA job would also exclude demo-ubi unless it overrides CI or uses a separate config — defer until needed), and the Option C path-forward if per-PR demo-ubi coverage is ever wanted. Note: numbered §5 (not the spec's literal "§4") because the existing §4 "Why each lever is GHA-only" pairs tightly with §3's lever cascade — inserting between them would interrupt that flow. FR-3's "or wherever it fits the runbook's flow" clause covers this; AC-4's literal number was paraphrasing FR-3's intent (section by name, not by ordinal). Story 1.4 — .github/workflows/pr.yml comment blocks refreshed: - Lines 42-58 (SMOKE-TEST opt-in switch note): replace "demo-ubi reseed exceeds the per-PR budget" framing with "runtime block cleared via testIgnore — flip SMOKE_TEST=true after the §16 verification". Operator opt-in commands unchanged. - Lines 507-523 (smoke-test job header / timeout-minutes comment): replace "AC-8 bounds at 24 min" framing with "runtime is expected to fit within the 25-min cap post-demo-ubi-exclusion". YAML structure untouched: if-gate, timeout-minutes, needs, env, permissions, steps all byte-identical. Comments-only diff verified with awk filter (zero non-comment changed lines). Story 1.5 — state.md updated: - "CI note" paragraph (lines 13-15): the two stale sentences ("drives the demo-ubi reseed, which routinely hits the 25-min cap" and "Until the reseed-runtime fix lands, leave it off") replaced with framing that preserves SMOKE_TEST=OFF-by-default, names the demo-ubi exclusion as the shipped fix, and points at the spec §16 verification. - "Known debt / fragility" section: the Solr CI-readiness entry was the umbrella tracking three sub-tasks (backend, Solr stability, reseed runtime); rewritten as fully resolved with the third sub-task now shipped here. The "Last 5 merges" entry is NOT added here — that's the finalization step's responsibility (after PR merge), per Epic gate item #9 of the implementation plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * fix(test): apply Gemini Code Assist review findings on playwright-config-test-ignore.test.ts Two Medium-severity findings, both accepted: 1. Path resolution via `import.meta.url` instead of `process.cwd()`. Plan D-7 explicitly approved both options; Gemini's robustness point holds for ad-hoc operator runs like `pnpm vitest run ui/src/__tests__/...` from the repo root (where cwd would be the repo root, not ui/, and the lookup would fail). `import.meta.url` works in both the canonical `pnpm --dir ui test` shape and the ad-hoc shape. Strictly more robust. 2. CRLF normalization in sliceConfig() before the `\n`-anchored indexOf searches. Zero-cost defense for any future Windows checkout where git's autocrlf converts line endings; macOS/Linux unchanged. Spec D-7 didn't address this; accepting as free defense. Vitest after both fixes: 3/3 still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * fix(docs): apply GPT-5.5 final review findings (broken §4 pointer + missing markdown links) Two findings accepted, one rejected: ACCEPTED #2 (Medium): ui/playwright.config.ts comment said "See ... smoke-solr-stability.md §4 for the lever cascade context" but §4 is "Why each lever is GHA-only", not the lever cascade (which is §3). My new section about reseed runtime is §5. Updated to point at §5 for the reseed-runtime-vs-Solr-stability relationship table (which is where the broader cascade context is explained in the demo-ubi exclusion narrative). ACCEPTED #3 (Low): FR-3 required the new runbook §5 to "cross-link" to ui/playwright.config.ts and ui/tests/e2e/demo-ubi.spec.ts. Inline-code mentions don't satisfy "cross-link" — converted to clickable markdown links with verified resolvable relative paths. REJECTED #1 (High): "AC-7 file-shape contract violated" — re-raise without new evidence. Counter-evidence cited in PR #424 body's "Diff scope" section: every recent PR (#383, #416, #421, #422) ships the pipeline-trail (idea/spec/plan/pipeline_status) per project convention; dashboard regen files are emitted by the mvp1-dashboard-regen pre-commit hook (forbidden to skip per CLAUDE.md Rule #7 "never skip hooks"). AC-7's strict literal "5 files" predates the project-convention consideration of pipeline- trail co-shipping; the spec's intent (the 5 deliverables described in FR-1..FR-5) is satisfied byte-identically in this diff. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> --------- Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

gemini-code-assist Bot reviewed Jun 2, 2026

View reviewed changes

SoundMindsAI and others added 6 commits June 2, 2026 13:48

SoundMindsAI force-pushed the feat/studies-convergence-visibility branch from 5dab2a5 to e0742e7 Compare June 2, 2026 17:51

SoundMindsAI closed this Jun 2, 2026

SoundMindsAI reopened this Jun 2, 2026

SoundMindsAI merged commit 49a0e1b into main Jun 2, 2026
21 of 26 checks passed

SoundMindsAI deleted the feat/studies-convergence-visibility branch June 2, 2026 18:11

SoundMindsAI mentioned this pull request Jun 2, 2026

docs: finalize feat_studies_convergence_visibility (move to implemented_features) #423

Merged

2 tasks

SoundMindsAI mentioned this pull request Jun 2, 2026

infra: exclude demo-ubi.spec.ts from CI Playwright run (unblock smoke runtime budget) #424

Merged

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: studies-list convergence visibility + real demo data (feat_studies_convergence_visibility)#422

feat: studies-list convergence visibility + real demo data (feat_studies_convergence_visibility)#422
SoundMindsAI merged 7 commits into
mainfrom
feat/studies-convergence-visibility

SoundMindsAI commented Jun 2, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Uh oh!

SoundMindsAI commented Jun 2, 2026

Uh oh!

SoundMindsAI commented Jun 2, 2026

Uh oh!

SoundMindsAI commented Jun 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

SoundMindsAI commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Context

What this PR ships

Tangential inline fix

Test coverage

Test plan

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

SoundMindsAI commented Jun 2, 2026

Review adjudication (Gemini Code Assist)

Gemini Code Assist (2 findings)

Outcomes

Uh oh!

SoundMindsAI commented Jun 2, 2026

Final cross-model review (GPT-5.5, post-Gemini-adjudication)

GPT-5.5 final review (2 findings)

Outcomes

Convergence summary

Uh oh!

SoundMindsAI commented Jun 2, 2026

Branch rebased — Epic 2 only

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

SoundMindsAI commented Jun 2, 2026 •

edited

Loading