docs: reorganize into numbered IA, add pipeline skills and feature templates by SoundMindsAI · Pull Request #1 · SoundMindsAI/relyloop

SoundMindsAI · 2026-05-09T01:13:58Z

Summary

Reorganize docs into a numbered IA. Relocate the umbrella product spec and MVP1 execution plan to docs/00_overview/product/. Add per-section README index files (docs/00_overview/ through docs/09_decisions/). Update root README.md, CONTRIBUTING.md, and docs/README.md to point at the new locations.
Port six pipeline skills to .claude/skills/ (idea-preflight → spec-gen → impl-plan-gen → impl-execute → guide-gen, plus the pipeline orchestrator) from the sibling creator-discovery-outreach project. Each carries a banner explaining what was preserved verbatim as illustrative provenance vs. what needs adaptation as RelyLoop matures.
Port the feature templates and reference examples to docs/02_product/planned_features/feature_templates/. The two big templates (feature-spec-template.md, implementation-plan-template.md) carry HTML porting banners with explicit for §X, write Y instead guidance for sections that depend on architecture RelyLoop hasn't adopted yet (audit-events, admin impersonation, multi-tenant scoping, backend paths, migration tool, cross-model review).

32 files changed, +7,355 / −22.

What's safe to use today

idea-preflight against any well-formed idea.md
spec-gen and impl-plan-gen Generate / Review / Reconcile modes (templates in place)
pipeline orchestration up through plan generation
The folder-prefix taxonomy (feat_, bug_, chore_, infra_, epic_)

What's gated on later work

impl-execute — needs backend/, web/, CI to exist; cross-model review tooling (Gemini PR comments fetch, GPT-5.5 invocation) needs re-wiring
guide-gen — dormant until web frontend (MVP3+ per spec roadmap)
Audit-events story checks across templates and skills — only become live if RelyLoop adopts an audit-events subsystem

Public-repo consideration

The README states this repo is currently soundminds.ai-internal and will become public when MVP1 ships. The ported skills and example specs reference CDO architectural details verbatim (TENANT_ACTIVITY_ALLOWLIST, audit_event_repo.py, _SENSITIVE_KEYS in log_scrubber.py, internal CDO PR numbers like #183, internal file paths). Decide before merge whether to:

Merge as-is — provenance is preserved and the framing is honest
Scrub CDO file paths and PR numbers from the inline anecdotes (skills' battle-tested rules survive; only the cited incidents change)
Hold the merge until after MVP1 ships (and the repo's public-vs-internal status is settled)

Test plan

docs/00_overview/product/relevance-copilot-spec.md and mvp1-execution-plan.md open from the link in README.md
.claude/skills/*/SKILL.md banners render cleanly in the IDE preview
docs/02_product/planned_features/feature_templates/ — open the spec and plan templates; the HTML porting banner at the top reads as actionable guidance
grep -rn creator-discovery-outreach . returns only the porting-banner mentions and example specs (no orphan refs)

🤖 Generated with Claude Code

…mplates - Move design artifacts into docs/00_overview/product/: - relevance-copilot-spec.md (was at repo root) - mvp1-execution-plan.md (was at repo root) - Add per-section README index files (docs/00_overview through docs/09_decisions). - Update root README.md, CONTRIBUTING.md, and docs/README.md to point at the new locations. - Port six pipeline skills from the sibling creator-discovery-outreach project to .claude/skills/ (idea-preflight, spec-gen, impl-plan-gen, impl-execute, guide-gen, pipeline). Per-skill banners flag what was kept verbatim as illustrative provenance vs. what needs adaptation as RelyLoop matures. guide-gen marked dormant until RelyLoop has a web frontend (MVP3+ per spec). - Port feature templates to docs/02_product/planned_features/feature_templates/ (idea, feature-spec, implementation-plan, README, examples). HTML porting banners on the big templates document conditional sections (audit-events, admin impersonation, tenant scoping, backend paths, migration tool, cross-model review) — strip the banner and adapt when authoring. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…n.md GitHub Free + private repo allows rulesets to be CREATED but not ENFORCED — surfaced as "Your rulesets won't be enforced on this private repository until you move to GitHub Team organization account" when SoundMindsAI tried to follow Procedure A on the relyloop repo. Adds a plan × visibility matrix and the three triggers that actually warrant an upgrade to Team ($4/user/month). For pre-public solo work, the agent's CLAUDE.md Rule #1 enforcement + visible CI status in the PR UI substitutes for the missing merge-block — no plan upgrade needed until a contributor joins or the repo goes public (which auto-activates the dormant ruleset for free). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a `mvp1-dashboard-regen` pre-commit hook that runs `scripts/build_mvp1_dashboard.py` whenever a file under `docs/02_product/planned_features/`, `docs/00_overview/implemented_features/`, or the generator script itself is staged. Standard pre-commit "files were modified" abort UX — same as ruff-format / prettier. Why pre-commit instead of a literal git post-merge hook: a post-merge hook would regenerate the dashboard locally after `git pull` and require a direct commit to main to land it, violating CLAUDE.md Absolute Rule #1. The pre-commit hook makes dashboard freshness a *PR-merge invariant* (every PR that touches features ships the regenerated dashboard alongside the change), which is the right shape for this project's "feature branch + PR" workflow. Idempotency fix: the dashboard's "as-of" stamp now uses the most-recent mtime of any feature-folder .md file (or the generator script itself) instead of `datetime.now()`. Prevents per-commit timestamp churn that would otherwise create spurious dashboard diffs on every unrelated PR. Verified by running `make dashboard` twice in a row → byte-identical output. Header text reworded from "Generated …" to "Reflects feature-folder state as of …" so the semantic is clear: this is a snapshot of folder state, not a transient build artifact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…20) * chore(dashboard): MVP1 progress dashboard generator + initial output Adds `scripts/build_mvp1_dashboard.py` (stdlib-only) that walks `docs/02_product/planned_features/` + `docs/00_overview/implemented_features/`, parses each feature's idea / spec / plan / pipeline_status artifacts, and writes a single self-contained HTML dashboard at `docs/00_overview/mvp1_dashboard.html` with: * KPI row: Features done (X / 12 scoped MVP1), Path to MVP1 (remaining items), Open bugs, Open chores. Plus a secondary line showing backlog ideas (idea-only feat_/infra_ folders not yet scoped) and in-flight count. * Kanban view: Idea / Spec / Plan / Implementing / Done columns; type- colored cards (feat = indigo, infra = amber, chore = slate, bug = red, epic = blue); filter chips by type. * Dependency DAG: Mermaid `graph LR` of scoped feat_/infra_/chore_ nodes with stage-colored classes and edges parsed from each spec's "Depends on" line. * Per-card detail: name (linked to feature_spec.md), one-liner extracted from `**Outcome:**` (spec) or `## Problem` (idea), PR# + merged date for shipped features, deferred-phase note for partial completions. Source of truth is the folder structure, not `mvp1-user-stories.md` (which stays as the human-curated narrative). Regenerate after any spec/plan/pipeline_status edit with `make dashboard`. The script strips trailing whitespace from output lines to match the project's pre-commit hook so subsequent regenerations are idempotent. Build/CI changes: * `pyproject.toml`: extend ruff per-file-ignores so `scripts/**` skips the `D` (pydocstyle) rule — same rationale as `backend/tests/`, operator tooling not library code. * `Makefile`: new `dashboard` target shows up in `make help`. Initial dashboard snapshot: 2 / 12 MVP1 features done (17%); 14 items remaining (10 features + 2 bugs + 2 chores); feat_study_lifecycle in-flight (Phase 1 done, Phase 2 deferred). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(dashboard): pre-commit hook + idempotent timestamp Adds a `mvp1-dashboard-regen` pre-commit hook that runs `scripts/build_mvp1_dashboard.py` whenever a file under `docs/02_product/planned_features/`, `docs/00_overview/implemented_features/`, or the generator script itself is staged. Standard pre-commit "files were modified" abort UX — same as ruff-format / prettier. Why pre-commit instead of a literal git post-merge hook: a post-merge hook would regenerate the dashboard locally after `git pull` and require a direct commit to main to land it, violating CLAUDE.md Absolute Rule #1. The pre-commit hook makes dashboard freshness a *PR-merge invariant* (every PR that touches features ships the regenerated dashboard alongside the change), which is the right shape for this project's "feature branch + PR" workflow. Idempotency fix: the dashboard's "as-of" stamp now uses the most-recent mtime of any feature-folder .md file (or the generator script itself) instead of `datetime.now()`. Prevents per-commit timestamp churn that would otherwise create spurious dashboard diffs on every unrelated PR. Verified by running `make dashboard` twice in a row → byte-identical output. Header text reworded from "Generated …" to "Reflects feature-folder state as of …" so the semantic is clear: this is a snapshot of folder state, not a transient build artifact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(dashboard): emit GitHub-native Markdown alongside the HTML Reason: HTML files don't render inline in GitHub's web UI (the .html shows as raw source when clicked from the repo browser). The user asked for a view they can read on github.com directly, without enabling Pages or going through htmlpreview.github.io. This commit makes `make dashboard` emit two files: * `docs/00_overview/mvp1_dashboard.html` — rich local view (filter chips, type-color cards, sticky header). Open locally in a browser. * `docs/00_overview/MVP1_DASHBOARD.md` — GitHub-native view (KPI table, stage-per-section tables, Mermaid graph). Renders inline when browsing github.com. GitHub markdown supports Mermaid blocks since Feb 2022, so the dependency DAG renders too. The Markdown intentionally trades visual richness for native rendering: no filter chips (no JS), no per-cell colors (no CSS), but the information architecture is identical and the cross-link to the HTML view is one click away. Two PR-extraction bugs caught during the rewrite: 1. `_extract_pr_number()` was returning `matches[-1]` (last `PR #N` reference), which picked up dependency cites in implementation_plan.md instead of the feature's own PR (e.g., `infra_adapter_elastic` was showing PR #4 because the plan cites `infra_foundation`'s PR #4 as a dep). Fix: prioritize the `## Implement` section of pipeline_status.md, then the plan's `**Status:**` header, then a `merged`-context match across all sources, then first-match. Also broadened the regex to accept `PR: [#N]` markdown-link format (which was failing the `PR\s*#` shape). 2. Markdown relative links were emitting `../docs/02_product/...` when they should have been `../02_product/...` (the markdown lives at `docs/00_overview/`, so `..` already lands at `docs/`). Fix: use `os.path.relpath(target, OUTPUT_MD.parent)` instead of constructing the path manually. Verified: both files idempotent across consecutive `make dashboard` runs; PR numbers match git log + CLAUDE.md (#4 / #16 / #18); planned- features links resolve correctly from `docs/00_overview/`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…cycle 2) Root cause of the polling-reconciler + webhook receiver test failures on CI cycle 1: * ``config_repos`` rows persist across tests (``_clean_phase2_tables`` deliberately doesn't wipe operator-managed config_repos). * Both test files reused the SAME ``(owner, repo)`` across every test (``octocat/configs`` / ``octocat/hello-world``). * After test #1, two ``config_repos`` rows now match the same ``(owner, repo)``. ``lookup_config_repo_by_owner_repo`` returns the FIRST match — test #1's row — whose ``auth_ref`` / ``webhook_secret_ref`` point at test #1's already-deleted ``tmp_path``. * Surfaced as ``pr_reconcile_pat_missing`` (reconciler) and 403 ``INVALID_SIGNATURE`` (webhook receiver) for every test after the first. Fix: each test's fixture now generates a unique ``(owner, repo)`` pair keyed on a random hex suffix. ``_pull_request_body`` was lifted to take ``owner`` and ``repo_name`` as explicit args so it doesn't close over the module-level constants (which no longer exist). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…to-register (#56) * feat(github-webhook): migration 0006 + relyloop_pr_poll_minutes setting (Story 1.1) * Add Alembic migration 0006_proposals_pr_url_idx — partial B-tree index `proposals(pr_url) WHERE pr_url IS NOT NULL`, with matching downgrade. Round-trips cleanly per CLAUDE.md Rule #5. * Add Settings.relyloop_pr_poll_minutes (default 15, ge=1, le=1440). Story 3.1 will narrow to a whitelist of cron-expressible values. * Document RELYLOOP_PR_POLL_MINUTES in .env.example. * New integration test asserts the partial-WHERE predicate via pg_indexes and verifies the downgrade-then-upgrade round-trip. * New unit test exercises the Settings field's default + env override + ge/le bounds (1 + 1440 accepted; 0, -1, 1441 raise ValidationError). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(github-webhook): HMAC verifier + repository.full_name parser (Story 1.2) Two pure-domain helpers in backend/app/domain/git/: * verify_webhook_signature(body, signature_header, secret) — HMAC-SHA256 verification of GitHub's X-Hub-Signature-256 header. Uses hmac.compare_digest for constant-time comparison. Refuses empty secret (no unsigned acceptance) and rejects sha1= / missing-prefix variants. * parse_repository_full_name(value) — parses the canonical owner/repo short form GitHub emits in webhook payloads. HTTPS URLs / SSH URLs / enterprise-host inputs all return None — spec FR-1 pairs this with validate_repo_url on the config_repos.repo_url side, NOT a duplicated URL regex. Plus a cross-parser parity test asserting that for canonical inputs the two parsers produce comparable (owner, repo) tuples after case-folding, and that the SSH / enterprise-host spec §14 negative cases are rejected by both. 36 unit tests passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(github-webhook): pure-domain event dispatcher (Story 1.3) dispatch_event(event_type, payload) → WebhookDecision encapsulates the FR-1 routing matrix: * ping → action=ping, mutation=none * pull_request closed+merged=true → mutation=merged * pull_request closed+merged=false → mutation=closed (status stays pr_opened) * pull_request reopened → mutation=reopened * pull_request other actions → action=noop, mutation=none * unknown event type → action=noop The dispatcher NEVER emits action="unknown_pr" — that string is router- owned, set after lookup_proposal_by_pr_url misses. The action Literal is narrowed to {"applied", "noop", "ping"} so static typing enforces the contract (cross-model review F2). WEBHOOK_ACTION_VALUES + HANDLED_EVENT_TYPES frozensets are the spec §8.4 source-of-truth, re-exported from the domain package and (in Story 2.1) from the router module so the cite at backend/app/api/webhooks/github.py also passes a static grep. 28 unit tests passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(github-webhook): proposal + config_repo repo functions (Story 1.4) 5 new proposal repo functions (mark_proposal_pr_merged/closed/reopened, lookup_proposal_by_pr_url, list_pr_opened_proposals_for_reconcile) plus 2 new config_repo repo functions (set_webhook_registration_error, lookup_config_repo_by_owner_repo). All conditional UPDATEs return None on the zero-row match so the webhook receiver + polling reconciler can log benignly and skip — mirrors the cycle-3 F4 pattern from mark_proposal_pr_opened (feat_github_pr_worker Story 1.1). - proposal.py: 5 new functions + Sequence/datetime/UTC/timedelta imports - config_repo.py: 2 new functions + validate_repo_url import for the owner/repo canonicalisation loop (skip-on-UnsupportedProviderError for historic non-GitHub URLs) - __init__.py: re-export all 7 - 11 integration tests in test_proposal_repo_webhook.py covering happy paths + idempotent no-ops + 90-day reconcile cutoff - 6 integration tests in test_config_repo_repo_webhook.py covering set/clear cycle + dot-git suffix + case-insensitive owner/repo lookup Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(github-webhook): extract shared github_request helper (Story 1.5) Generalises _github_post (POST-only, feat_github_pr_worker) to method-agnostic github_request(client, method, url, *, json_body, token) in the canonical backend/app/git/ namespace (per CLAUDE.md repo structure). The polling reconciler (Story 3.1, GET /pulls/{n}) and the register-webhook worker (Story 4.1, GET + POST /hooks) share the retry contract; existing open_pr POST call sites unchanged in behavior. - backend/app/git/__init__.py + github_client.py: github_request + 4 public inspection helpers (parse_retry_after, is_secondary_rate_limit, body_mentions_rate_limit, parse_rate_limit_reset) + 4 retry-policy constants. Method routed via httpx AsyncClient.request(verb, ...). - backend/workers/git_pr.py: drop the inlined _github_post + 4 inspection helpers + 4 retry constants (range 593..690 + 116..124); import from the new module. Replace 2 _github_post() call sites with github_request(client, "POST", ...). 88 LOC removed. - test_git_pr_helpers.py: redirect 8 header-parser tests to the new public symbols via import-as aliases; remove 5 _github_post retry tests now covered by the new test_github_client.py (parametrised over GET + POST). - backend/tests/unit/git/test_github_client.py: 23 new cases via httpx.MockTransport — happy path / 5xx retry / 429 Retry-After / 403 retry-after / 403 secondary-rate-limit / 403 abuse body / 403 terminal / 404 terminal / RequestError retry + propagate / 5xx-budget-exhausted-returns-last / token+version headers — each parametrised over GET + POST. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(github-webhook): POST /webhooks/github router + handler (Story 2.1) POST /webhooks/github mounts unprefixed (per CLAUDE.md Rule #6 webhook exception). 9-step contract: raw body → repository.full_name parse → config_repo lookup → mounted secret read → HMAC verify → dispatch → proposal lookup (mutation branch) → conditional repo mutation → structured webhook_received log + 200 response. - backend/app/git/secrets.py: hoisted read_mounted_secret() from workers/git_pr._read_pat. RELYLOOP_SECRETS_DIR env override; same containment / directory / OSError contract. Consumed by the webhook router AND queued for Stories 3.1 + 4.1. - backend/app/api/webhooks/{__init__,github}.py: new router. Re-exports WEBHOOK_ACTION_VALUES so spec §8.4's grep cite at this path passes. Single error code: INVALID_SIGNATURE (403) — covers signature mismatch AND unknown-repo so the receiver doesn't reveal enumeration. Router-only "unknown_pr" override when lookup_proposal_by_pr_url returns None after a mutation-requesting decision (dispatcher contract: action ∈ {applied, noop, ping}; "unknown_pr" is wire-only). - backend/app/main.py: include_router unprefixed. - backend/tests/integration/test_webhook_github.py: 13 cases covering merged / closed-unmerged / reopened / 5 noop pull_request actions / unknown event type / ping / unknown_pr URL / bad signature / missing signature / unknown repo / structured-log assertion. Consolidated into one file (plan listed 9; one file keeps the shared sign-and-seed helpers honest). - backend/tests/contract/test_webhook_api_contract.py: 4 assertions — OpenAPI registration, WEBHOOK_ACTION_VALUES re-export, single-code negative grep, no-secret-in-log static grep. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(github-webhook): reconcile_pr_state cron worker (Story 3.1) Polling reconciler for stale pr_opened proposals (FR-2 + AC-3 + AC-8). The cron tick lists pr_opened+open proposals < 90 days old, calls GET /repos/{o}/{r}/pulls/{n} via the Story 1.5 shared github_request, and applies the matching mark_proposal_pr_* on the response. - backend/workers/pr_reconcile.py: reconcile_pr_state coroutine. Per-proposal: parse pr_url → look up config_repo via lookup_config_repo_by_owner_repo (Story 1.4) → read PAT via read_mounted_secret (Story 2.1 hoist) → GET pulls/{n}. Branches on 200/merged, 200/state=closed, 200/state=open. 401/403/404/5xx and RequestError after retry budget → WARN + skip. 429 → WARN + break the proposal loop for this tick (spec §10). Also exports SUPPORTED_POLL_MINUTES (18 values) + _poll_cron_kwargs() — the divisor-of-60 / multiple-of-60 routing the cron-arg builder uses. - backend/app/core/settings.py: tighten relyloop_pr_poll_minutes via @field_validator referencing SUPPORTED_POLL_MINUTES — operator sees the rejection at boot, not at first cron tick. - backend/workers/all.py: register `cron(reconcile_pr_state, **_poll_cron_kwargs())` on WorkerSettings.cron_jobs. - backend/tests/unit/workers/test_poll_cron_kwargs.py: 20 cases — every value in the whitelist (12 sub-hourly + 6 multi-hour), fallback for unsupported value, size sanity check. - backend/tests/integration/test_polling_reconciler.py: 10 cases via httpx.MockTransport — merged/closed-unmerged/still-open happy paths, 5 terminal-error responses (401/403/404/500/503), RequestError-after-budget, 429-short-circuit, 50-proposal AC-8 budget assertion. - backend/tests/unit/test_workers.py: cron_jobs registration sanity. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(github-webhook): register_webhook Arq worker (Story 4.1 / FR-3) Idempotent GitHub webhook creation enqueued by POST /api/v1/config-repos (Story 4.2) for every newly registered config_repo whose webhook_secret_ref is non-NULL. - backend/workers/register_webhook.py: register_webhook coroutine. Steps: load config_repo → validate_repo_url → read PAT + secret → build hook URL from Settings.relyloop_base_url → GET /hooks scan for matching config.url → POST /hooks if absent. Per-class error messages on 404 (PAT scope), 422 (validation), 5xx (transient), RequestError (network) → set_webhook_registration_error + return {"status":"failed"} (never raises into Arq's retry loop — the column is the durable signal). - backend/workers/all.py: register in WorkerSettings.functions. - backend/tests/integration/test_register_webhook_worker.py: 6 AC-mapped cases via httpx.MockTransport — happy create, dedup existing, 404 PAT-scope, 422 validation, 503 transient, network RequestError. Each asserts both the return status AND the webhook_registration_error column shape. - backend/tests/unit/test_workers.py: extend the expected-function set so worker_settings_importable sees register_webhook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(github-webhook): post-commit enqueue register_webhook (Story 4.2) POST /api/v1/config-repos now best-effort enqueues register_webhook (Story 4.1) after the DB commit when the new row has webhook_secret_ref populated. Enqueue failure (Redis down, pool absent, transient blip) does NOT break the 201 — it logs WARN and the operator drives recovery via the runbook. - backend/app/api/v1/config_repos.py: add request: Request, pull the Arq pool via getattr(request.app.state, "arq_pool", None) per the established pattern (proposals.py:516, studies.py:167; the Depends(get_arq_pool) factory doesn't exist in the codebase). Two WARN log events on the failure paths: register_webhook_enqueue_skipped_no_pool and register_webhook_enqueue_failed. - backend/tests/integration/test_config_repos_extension.py: 6 cases — response-shape preservation, enqueue called when secret_ref set, enqueue not called when null, pool absent → 201 + WARN, enqueue raises → 201 + WARN, static-grep against get_arq_pool factory drift. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(github-webhook): runbook + US-20/21 implemented + tracker (Story 4.3) - docs/03_runbooks/webhook-debugging.md: 7-section operator playbook — inspect GitHub delivery panel, redeliver, log-line verification, rotate webhook secret, force-reconcile a specific proposal, diagnose silent cron, register_webhook per-class error triage + manual re-enqueue. - docs/03_runbooks/README.md: index the new runbook; drop the "Coming with later features" stub for webhook-replay.md (folded into webhook-debugging.md §2 Redeliver). - docs/02_product/mvp1-user-stories.md: US-20 + US-21 flipped to Implemented. - implementation_plan.md: tracker — every story marked complete with commit SHA. state.md + CLAUDE.md flips land in the finalization commit after the PR number is known. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(github-webhook): state.md + CLAUDE.md + tangential idea - state.md: flip current-branch + Alembic head (0005 → 0006) + add recent-changes entry summarizing all 10 stories + tangential capture. - CLAUDE.md: flip feature-status row 8 + redirect "GitHub webhook setup" runbook pointer at the new webhook-debugging.md. - bug_test_smoke_requires_env_vars/idea.md: capture pre-existing failure where test_smoke.py::test_app_import requires DATABASE_URL_FILE + POSTGRES_PASSWORD_FILE env vars. CI passes; only local-dev `pytest` runs hit it. 5-minute fixture fix. - docs/00_overview/MVP1_DASHBOARD.md + mvp1_dashboard.html: regenerated by the mvp1-dashboard-regen pre-commit hook after the row-8 flip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(github-webhook): CI failures + GPT-5.5 review fixes CI cycle 1 fixes: 1. structlog `event=` kwarg conflict (TypeError: BoundLogger.warning() got multiple values for argument 'event') — rename to `gh_event=` in 5 webhook-router log call sites. structlog reserves `event` for the positional event name. 2. Migration head expectation in test_migrations.py: 0005 → 0006 (head advanced when feat_github_webhook Story 1.1 landed 0006). 3. test_digests_migration.py::test_downgrade_drops_digests_table: retarget to explicit `downgrade 0004` (was `-1`, which now drops the 0006 partial index instead of the digests table). GPT-5.5 final-review accepts (cycle 1, 2 of 3 findings): * F2 (High) — non-object JSON payload (`[]`, `"foo"`, `null`) reached `payload.get(...)` → AttributeError → 500. Added `isinstance(parsed_body, dict)` guard with a 403 INVALID_SIGNATURE response (matches the invariant: INVALID_SIGNATURE is the ONLY error code the webhook router emits). New regression test: `test_webhook_non_object_json_returns_403`. * F3 (Medium) — `pull_request.merged=true` with `merged_at=null` (GitHub eventual-consistency) would hit `assert decision.pr_merged_at is not None` → 500. Router now falls back to `mark_proposal_pr_closed` (PR is no longer open; reconciler catches up on merged_at next tick). New regression test: `test_webhook_pr_merged_with_missing_merged_at_does_not_500`. GPT-5.5 F1 (High — "Settings validator + _poll_cron_kwargs fallback are redundant") rejected with cited counter-evidence: plan §11 row 16 explicitly tightens the Settings field validator (Story 3.1 coordinated spec patch). The _poll_cron_kwargs WARN+fallback at pr_reconcile.py:229 is belt-and-suspenders for hot-reload / in-process Settings mutation. Both layers are plan-conformant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(github-webhook): test isolation — unique owner/repo per test (CI cycle 2) Root cause of the polling-reconciler + webhook receiver test failures on CI cycle 1: * ``config_repos`` rows persist across tests (``_clean_phase2_tables`` deliberately doesn't wipe operator-managed config_repos). * Both test files reused the SAME ``(owner, repo)`` across every test (``octocat/configs`` / ``octocat/hello-world``). * After test #1, two ``config_repos`` rows now match the same ``(owner, repo)``. ``lookup_config_repo_by_owner_repo`` returns the FIRST match — test #1's row — whose ``auth_ref`` / ``webhook_secret_ref`` point at test #1's already-deleted ``tmp_path``. * Surfaced as ``pr_reconcile_pat_missing`` (reconciler) and 403 ``INVALID_SIGNATURE`` (webhook receiver) for every test after the first. Fix: each test's fixture now generates a unique ``(owner, repo)`` pair keyed on a random hex suffix. ``_pull_request_body`` was lifted to take ``owner`` and ``repo_name`` as explicit args so it doesn't close over the module-level constants (which no longer exist). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(github-webhook): second migration-head assertion missed in cycle 1 test_migrations.py::TestBaselineMigration::test_round_trip still asserted ``0005``. My cycle-1 fix only updated the comment block without the literal value. Both assertions now check ``0006``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ist poll + missing DoD tests GPT-5.5 final cross-model review on PR #58 returned 5 findings (0 High, 3 Medium, 2 Low). Adjudication: Finding #1 (Medium) — pr_open_error alert missing from header area. REJECTED with cited counter-evidence: the plan's note that the panel alert "duplicates here for visibility" is speculative, not a requirement. The PrPanel placement is action-adjacent (the canonical UX); duplicating the alert in the header produces redundant visual noise without additional information. Finding #2 (Medium) — flip-off effect missing. REJECTED with cited counter-evidence: the derived effectivePollingFlag correctly masks all observable behavior. fireOpenPR's clear-existing-before-new pattern means the safety timer is never orphaned; the 60s timeout always reaches a definite cleanup path. Re- adding the watching useEffect would re-trigger the React 19 react-hooks/set-state-in-effect rule that the cycle-3 patch deliberately avoided. Finding #3 (Low) — list refetchInterval doesn't respect source filter. ACCEPTED: FR-1 says "any VISIBLE row". The refetchInterval predicate now applies the same sourceFilter check as visibleRows so a hidden study-sourced pr_opened+open row doesn't keep the page polling for invisible state. Finding #4 (Low) — list-page tests don't assert wire params after URL update. DEFERRED as chore_proposals_list_wire_param_e2e_test/idea.md. The static useSearchParams mock doesn't re-render on router.replace, so the existing tests assert the intent (router.replace called with the right URL) but not the wire round-trip. Behavior is structurally correct. Finding #5 (Medium) — 60s safety cap test + Open-PR toast contract test missing. ACCEPTED: both were explicit Story 3.2 DoD items that didn't land. - 60s safety cap test: simulates a worker that never writes back; uses vi.useFakeTimers to advance 65s and asserts polling stops (cumulative ticks bounded at ~21, no further ticks in the subsequent 30s window). - Open-PR error-toast contract test: instantiates a custom MutationCache with a synchronous onError spy, fires 3 sequential clicks against msw handlers returning GITHUB_NOT_CONFIGURED, CLUSTER_HAS_NO_CONFIG_REPO, QUEUE_UNAVAILABLE. Errors marked retryable=false so the api-client's 4-attempt 503 backoff doesn't compound test time (retry behavior is exercised by api-client.test.ts). Asserts err.errorCode reaches the global handler for all 3 codes per GPT-5.5 cycle-1 B6. MVP1 dashboard regenerated by pre-commit hook. Test count: 171 (was 169) — 32 files all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…oposals_ui) (#58) * docs(feat-proposals-ui): idea-preflight ground-truth pass against shipped deps §2 + §5 rewritten past-tense (all four deps shipped between 2026-05-09 and 2026-05-12); §7.4 Literal paths corrected to backend/app/api/v1/schemas.py (the wire-shape source of truth) instead of backend/db/models/proposal.py; FR-6 extended to preserve the existing useProposalForStudy hook (consumed by feat_studies_ui's study-detail Open-PR button) and to narrow useProposals.filter.status from string to ProposalStatus; FR-2/FR-3 polling cadences disambiguated (3s post-click vs 30s steady-state); §11 added duplicate-Open-PR and concurrent-merge-during-reject edge cases. MVP1 dashboard regenerated by pre-commit hook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(feat-proposals-ui): impl-plan-gen + GPT-5.5 review (3 cycles, 18 findings) Implementation plan generated from the approved feature spec. Single-phase frontend-only feature: 7 stories across 4 epics building two new Next.js routes (/proposals list + /proposals/{id} detail) on top of the already-shipped backend endpoints from feat_digest_proposal, feat_github_pr_worker, and feat_github_webhook. Cross-model review converged in 3 cycles (cap per impl-plan-gen protocol): - Cycle 1: 11 findings (2 High / 6 Medium / 3 Low) — all applied. Most consequential: lift useOpenPR from PrPanel to the page (A1) so the page- level refetchInterval can read isPending; replace 3s-polling-tied-to- isPending with a postOpenPrPolling page-state flag + 60s safety cap (B1); strip ?action=open_pr via router.replace after auto-trigger (B2). - Cycle 2: 4 follow-on findings (2 High / 2 Medium) — all applied. Wired fireOpenPR() helper through both manual click + auto-trigger paths (A1); validated URL ?status= against PROPOSAL_STATUS_VALUES before passing to the now-narrowed useProposals hook (A2); added <Suspense> wrapping for useSearchParams on the detail route (B1); added safety-timer unmount cleanup (B2). - Cycle 3: 3 cleanup findings (0 High / 2 Medium / 1 Low) — all applied. Failure-mode catalog rewritten to match the fixed polling design (A1); destructured stable primitives (mutateOpenPR, proposalStatus, proposalPrOpenError) before effects to keep dep identities stable (B1); UI element inventory aligned with the PrPanel contract (B2). MVP1 dashboard regenerated by pre-commit hook. Plan is Ready for Execution. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(proposals-ui): extend lib/api/proposals.ts hook surface (Story 1.1) Adds 3 new TanStack Query hooks + extends 1 existing hook + narrows 1 type to fill the FR-6 contract documented in feat_proposals_ui: - useProposals(filter, options?): narrowed filter.status from string to ProposalStatus (compiler now rejects /proposals?status=invented); accepts an optional refetchInterval (function form) so the list page can fire a 30s pulse-refetch when any visible row is pr_opened+open (FR-1). Bare useProposals(filter) signature preserved. - useProposalForStudy(studyId): unchanged byte-for-byte — consumed by feat_studies_ui's study-detail DigestPanel for the Open-PR button. - useProposal(id, options?): new; accepts refetchInterval function form for FR-2 30s steady-state + FR-3 3s post-click cadences. - useOpenPR(): new; POSTs to /api/v1/proposals/{id}/open_pr; onSettled invalidates ['proposal', id] + ['proposals'] so active queries refetch. - useRejectProposal(): new; POSTs to /reject; same invalidation set delivers the spec §11 "refresh detail on 409" behavior without special-case error-code branching. Test coverage: 9 vitest cases (1 typed-status param, 2 refetchInterval on/off, 1 useProposalForStudy regression, 2 useProposal, 1 useOpenPR invalidation via msw handler hit counts per GPT-5.5 cycle-1 B5, 2 useRejectProposal incl. 409 INVALID_STATE_TRANSITION refresh). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(proposals-ui): filter-chip + cluster-select primitives (Story 1.2) Three new presentational components consumed by the /proposals list page: - ProposalStatusFilterChips: mirrors study-status-filter-chips.tsx shape. Wire values sourced from PROPOSAL_STATUS_VALUES (canonical allowlist in ui/src/lib/enums.ts:101 — verified against backend ProposalStatusWire at backend/app/api/v1/schemas.py:659). Unknown URL values fall back to "all" active (per Story 2.1 task 2 — defends the FR-1 chip when the URL is rewritten externally). - ProposalSourceFilterChips: client-only 3-value chip set (all/study/manual). Source is derived from proposal.study_id non-null vs null; values DO NOT belong in enums.ts (no wire contract). Inline comment notes the client-side scope. - ClusterFilterSelect: native <select> populated by useClusters({ limit: 200 }). Disabled while loading. MVP1 assumes <10 clusters per installer; the chore_cluster_filter_full_list idea file will be referenced when that ceiling no longer holds. Test coverage: 7 vitest cases — 4 status-chip render/click/null/unknown + 2 source-chip render/click + 1 cluster-select load/select/clear. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(proposals-ui): /proposals list page + ProposalsTable (Story 2.1) Adds the /proposals route — a Suspense-wrapped Next 16 client component that composes three filter primitives (status chips with URL-backed state, source chips client-side, cluster select) over a cursor-paginated table of proposals. Implements FR-1 in full: - Status filter is URL-backed via useSearchParams + router.replace; cursor stack resets on every status/cluster change. - URL ?status= is validated against PROPOSAL_STATUS_VALUES before being passed to the now-narrowed useProposals hook. /proposals?status=invented is silently ignored (chip group falls back to "all" active) — never sent to the backend, never produces a 422 (GPT-5.5 cycle-2 A2). - Source filter (study / manual / all) is client-side: visibleRows is derived from rows + sourceFilter via useMemo. Backend has no ?source= param; the chore_proposals_source_filter_server_side idea (filed in Story 4.1) tracks the server-side follow-up. - Cluster filter consumes the existing useClusters hook with limit=200. - 30s pulse-refetch fires when ANY visible row is pr_opened+open (FR-1). The function-form refetchInterval reads the latest query state on each tick and returns false when no eligible row remains, so idle pages stop hitting the backend. ProposalsTable renders 7 columns: source (study link or "manual" + view link), cluster name, template name + version, status badge, pr_state badge, metric_delta (parsed from JSONB { primary, baseline, best, delta_pct }), created_at. Empty state and backend-unreachable both render the shared <EmptyState> primitive. Test coverage: 11 vitest cases (6 list page covering AC-6 filter + pagination, invalid-URL-status, client-side source filter, 30s pulse + 5 table render cases). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(proposals-ui): /proposals/[id] detail page shell (Story 3.1) Adds the read-only detail page slot: a Suspense-wrapped Next 16 client view that fetches via useProposal and renders five sections — back link, header, optional pr_open_error Alert, config-diff table, metric-delta panel, and (when present) suggested follow-ups. The PR-action panel and Reject dialog slots are reserved for Stories 3.2 and 3.3 respectively. New components: - ProposalHeader — status + pr_state badges, created_at, cluster/template/ study inline summary mirroring studies/study-header.tsx spacing. - ProposalErrorAlert — styled <div role="alert"> for FR-2 (only renders when status='pending' AND pr_open_error populated; defensive against stale errors after the worker advances to pr_opened). - ConfigDiffPanel — sortable 3-column table over the JSONB diff dict, handling both the canonical [before, after] tuple shape from feat_digest_proposal AND flat values for manual proposals. - SuggestedFollowupsPanel — bulleted list with per-item "Create study from this hypothesis" Link to /studies?hypothesis=<encoded> (FR-5). The detail page itself is wrapped in <Suspense> per GPT-5.5 cycle-2 B1 because Story 3.2 adds useSearchParams() to the inner view; landing the boundary in Story 3.1 keeps the page shape stable across stories. Test coverage: 6 detail-page tests covering the happy path, AC-4 pr_open_error alert, defensive-no-alert when status=pr_opened, 404 PROPOSAL_NOT_FOUND empty state, null-metric-delta fallback, and null-digest skip-followups branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(proposals-ui): PR panel + page-owned useOpenPR with postOpenPrPolling (Story 3.2) Closes the FR-2 + FR-3 + auto-trigger loop on /proposals/{id}. Page-side wiring (ui/src/app/proposals/[id]/page.tsx): - useOpenPR is now called in the page, not in PrPanel. Mutation passed as (mutate, isPending) props. The lift is load-bearing because the page- level refetchInterval and the ?action=open_pr auto-trigger both need to read the same mutation instance (GPT-5.5 cycle-1 A1). - A fireOpenPR helper wraps openPr.mutate's onSuccess to flip a postOpenPrPolling state flag ON and install a 60s safety setTimeout. Both the manual click path AND the auto-trigger path go through this helper — no direct openPr.mutate calls remain (GPT-5.5 cycle-2 A1). - The 3s polling cadence is keyed on the postOpenPrPolling flag AND the latest p.status/p.pr_open_error inside refetchInterval. The flag flips OFF only via the 60s timer or unmount; the cadence flips off inline inside the refetchInterval function when status leaves pending or pr_open_error is populated. This avoids the React 19 react-hooks/set-state-in-effect rule (correctly flags setState-in- watching-effect as cascading-render smell) while preserving the spec FR-2/FR-3 behavior. - effectivePollingFlag = postOpenPrPolling && pending && !pr_open_error is the derived button-disable signal passed to PrPanel — masks lingering true state when conditions warrant so the operator can re-click retry. - Unmount cleanup useEffect clears the safety timer (GPT-5.5 cycle-2 B2). - ?action=open_pr auto-trigger fires fireOpenPR once per mount, guarded by a useRef boolean. After firing, router.replace strips the action param so a remount/back-nav with the same URL does NOT re-fire (GPT-5.5 cycle-1 B2). PrPanel component: - 4 state branches: pending (Open-PR button + optional inline error Alert), pr_opened (PR link target=_blank + pr_state badge), pr_merged (link + merged-at), rejected (rejected_reason). - Button HIDDEN on any non-pending status per FR-3. - Button disabled + "Opening PR..." while openPrIsPending; remains visible to give the operator a stable anchor through worker-wait (GPT-5.5 cycle-3 B2). - pr_open_error Alert moved from the page-level header (Story 3.1) into the panel for action-adjacent visibility (deduplicates the alert). Test coverage: 6 pr-panel state-machine tests + 4 new page-level tests covering AC-1 (click → 3s poll → status flips → button gone), AC-3 30s steady-state, auto-trigger fires + URL-strips, auto-trigger no-op when already pr_opened. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(proposals-ui): Reject confirm dialog (Story 3.3) Wraps up FR-4 on /proposals/{id}. Adds a destructive Reject button next to PrPanel (rendered only when status='pending') that opens an AlertDialog with a 0-500 char reason textarea. Critical details (all per the GPT-5.5 cycle-1 + cycle-2 review patches): - AlertDialogAction's onClick calls event.preventDefault() so the dialog stays open during the in-flight POST. Without preventDefault, Radix's default closes the dialog as soon as the user clicks confirm, leaving them with no visible state during the round-trip. - The textarea + AlertDialogAction + AlertDialogCancel all carry `disabled={reject.isPending}` so double-clicks during the in-flight POST cannot double-submit. Button label flips "Reject proposal" → "Rejecting…" during that window. - onSuccess closes the dialog and fires a success toast. NO per-mutation onError — the global MutationCache.onError in QueryProvider handles the 9 known error codes (409 INVALID_STATE_TRANSITION being the spec §11 concurrent-merge race we care about most). The mutation's onSettled in Story 1.1 invalidates ['proposal', id] + ['proposals'], so on 409 the detail query refetches automatically and the operator sees the now-pr_merged state. - The Reject button + dialog disappear after success because the refetched proposal has status='rejected', which trips the early `if (proposal.status !== 'pending') return null` in RejectDialog. Test coverage: 4 reject-dialog cases — null when status!=pending, AC-2 happy path with reason, preventDefault-keeps-open + disabled- prevents-double-submit, spec §11 concurrent-merge 409 (mounts the detail view with the real QueryProvider so the global toast handler fires, asserts msw GET handler called twice for the refetch, asserts UI rerenders to pr_merged and Reject button vanishes). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(proposals-ui): mark US-28/29 Implemented + ui-debugging section + source-filter idea (Story 4.1) Wraps up feat_proposals_ui's documentation workstream: - docs/02_product/mvp1-user-stories.md: US-28 + US-29 flipped to Implemented with the feat_proposals_ui annotation, matching the format used for US-13/14/15/16/17/etc. - docs/03_runbooks/ui-debugging.md: appended a "Proposals routes" section covering the hook surface (5 hooks with their purposes), the polling cadence ladder on /proposals/[id] (3s / 30s / off), the ?action=open_pr auto-trigger semantics, the URL ?status= validation behavior, the client-side source-filter pagination caveat (cross-references the idea file below), the 5 common error codes the UI surfaces as toasts, and the Reject dialog won't-close-during-flight pattern. - docs/02_product/planned_features/chore_proposals_source_filter_server_side/idea.md: captures the deferred backend ?source=study|manual follow-up. Origin pointer is feat_proposals_ui's plan §6 risks row + the GPT-5.5 cycle-1 A5 finding. Acceptable for MVP1 (<50 proposals/page); the fix is small (~10 LOC backend + repo + contract test + drop the frontend client-side filter). MVP1 dashboard regenerated by pre-commit hook. state.md / architecture.md / CLAUDE.md updates land in the impl-execute finalization commit per Story 4.1's DoD (Task 4). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(planned): capture 1 in-flight-noticed issue — proposals-page useMemo dep warning One tangential noticing surfaced during the feat_proposals_ui post-implementation sweep: the visibleRows useMemo in /proposals/page.tsx has a wasted-memoization warning because rows = query.data?.data ?? [] creates a fresh array identity each render. Cosmetic only (the filter runs over <=200 rows per page, sub-ms cost), but worth tracking so a future polish pass picks it up. Captured per CLAUDE.md tangential-discoveries rule. MVP1 dashboard regenerated by pre-commit hook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(proposals-ui): GPT-5.5 final-review patches — visibleRows-aware list poll + missing DoD tests GPT-5.5 final cross-model review on PR #58 returned 5 findings (0 High, 3 Medium, 2 Low). Adjudication: Finding #1 (Medium) — pr_open_error alert missing from header area. REJECTED with cited counter-evidence: the plan's note that the panel alert "duplicates here for visibility" is speculative, not a requirement. The PrPanel placement is action-adjacent (the canonical UX); duplicating the alert in the header produces redundant visual noise without additional information. Finding #2 (Medium) — flip-off effect missing. REJECTED with cited counter-evidence: the derived effectivePollingFlag correctly masks all observable behavior. fireOpenPR's clear-existing-before-new pattern means the safety timer is never orphaned; the 60s timeout always reaches a definite cleanup path. Re- adding the watching useEffect would re-trigger the React 19 react-hooks/set-state-in-effect rule that the cycle-3 patch deliberately avoided. Finding #3 (Low) — list refetchInterval doesn't respect source filter. ACCEPTED: FR-1 says "any VISIBLE row". The refetchInterval predicate now applies the same sourceFilter check as visibleRows so a hidden study-sourced pr_opened+open row doesn't keep the page polling for invisible state. Finding #4 (Low) — list-page tests don't assert wire params after URL update. DEFERRED as chore_proposals_list_wire_param_e2e_test/idea.md. The static useSearchParams mock doesn't re-render on router.replace, so the existing tests assert the intent (router.replace called with the right URL) but not the wire round-trip. Behavior is structurally correct. Finding #5 (Medium) — 60s safety cap test + Open-PR toast contract test missing. ACCEPTED: both were explicit Story 3.2 DoD items that didn't land. - 60s safety cap test: simulates a worker that never writes back; uses vi.useFakeTimers to advance 65s and asserts polling stops (cumulative ticks bounded at ~21, no further ticks in the subsequent 30s window). - Open-PR error-toast contract test: instantiates a custom MutationCache with a synchronous onError spy, fires 3 sequential clicks against msw handlers returning GITHUB_NOT_CONFIGURED, CLUSTER_HAS_NO_CONFIG_REPO, QUEUE_UNAVAILABLE. Errors marked retryable=false so the api-client's 4-attempt 503 backoff doesn't compound test time (retry behavior is exercised by api-client.test.ts). Asserts err.errorCode reaches the global handler for all 3 codes per GPT-5.5 cycle-1 B6. MVP1 dashboard regenerated by pre-commit hook. Test count: 171 (was 169) — 32 files all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(feat-proposals-ui): refresh pipeline_status.md after final review (PR #58 open, awaiting merge) Updates the status file to reflect what's landed on the feature branch: - Plan: 3 GPT-5.5 cycles, 18 findings all applied - Implement: 7 stories landed, CI green, final review 5 findings adjudicated - Done: pending merge MVP1 dashboard regenerated by pre-commit hook. Post-merge re-invocation of /pipeline will run the finalization commit (folder move, state.md entry, CLAUDE.md table flip, architecture.md listing). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Finding #1 (Accept): clarify "§123" → "line 123" in bug_fix.md to remove ambiguity between section references (§AC-7) and line-number references. - Finding #2 (Defer): captured as chore_digest_worker_narrow_except — the worker's broad except Exception at digest.py:538-546 is what enabled the PR #92 bug to stay silent for 2 days; narrowing the exception surface has its own design fork (which exceptions are expected? hard-fail vs. soft-fail-with-signal? rollout survey for currently-silent digests?) and is separate from this PR's dep fix scope. - Finding #3 (Reject): bug_fix.md §"Fix design (locked decisions) #4" already documents that test_digest_generate.py:65 cannot be tightened because the happy-path test seeds 0 Optuna trials, so {} is the correct branch there. Gemini didn't read past the heading. Dashboard regenerated by the mvp1-dashboard-regen pre-commit hook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

#92) * fix(deps): add scikit-learn for Optuna's FanovaImportanceEvaluator (bug_digest_param_importance_seam) `optuna.importance.get_param_importances()` defaults to FanovaImportanceEvaluator which `import sklearn` internally. The api/worker image had no scikit-learn, so every digest's `parameter_importance` was silently `{}` — the worker's broad `except Exception` at `digest.py:538-546` caught the ImportError and returned an empty map. The misleading `assert parameter_importance is not None` at `test_digest_generate.py:65` passed because `{}` is not None, and the rigorous AC-7 test was `xfail`-marked, so CI was green while the contract was broken. The original idea misdiagnosed this as a fixture seam (engine/pool isolation or sampler/pruner mismatch). Test logs prove `load_if_exists=True` correctly resolves the worker's separately-constructed Optuna handle to the same RDB row the test seeded; the seam works. The structlog WARN at digest.py:540 explicitly emitted `error_type=ImportError` — the diagnosis just needed reading. Changes: - pyproject.toml: add `scikit-learn>=1.4` with comment pointing at bug_fix.md. - uv.lock: pulls scikit-learn 1.8.0 + scipy + joblib + threadpoolctl (~30-40MB image growth; CI's `uv sync --frozen` picks it up automatically). - test_digest_parameter_importance.py: remove the `pytest.mark.xfail` marker (test now passes; the previously-xfail'd test is the AC-7 regression guard). - bug_fix.md: documents triage, root cause, locked design forks, rollout. The worker's broad `except Exception` stays — it remains useful for genuine Optuna edge cases (zero-completed-trial studies, schema issues). The fix removes the ImportError condition entirely. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(planned): adjudicate Gemini PR #92 findings — accept #1, defer #2 - Finding #1 (Accept): clarify "§123" → "line 123" in bug_fix.md to remove ambiguity between section references (§AC-7) and line-number references. - Finding #2 (Defer): captured as chore_digest_worker_narrow_except — the worker's broad except Exception at digest.py:538-546 is what enabled the PR #92 bug to stay silent for 2 days; narrowing the exception surface has its own design fork (which exceptions are expected? hard-fail vs. soft-fail-with-signal? rollout survey for currently-silent digests?) and is separate from this PR's dep fix scope. - Finding #3 (Reject): bug_fix.md §"Fix design (locked decisions) #4" already documents that test_digest_generate.py:65 cannot be tightened because the happy-path test seeds 0 Optuna trials, so {} is the correct branch there. Gemini didn't read past the heading. Dashboard regenerated by the mvp1-dashboard-regen pre-commit hook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…emini #2/3/4) Accepted findings from Gemini Code Assist on PR #94: - --diff-filter=AM -> ACMR. High-similarity `git mv foo.txt .env.old` is reported as Renamed by git, not Added; AM alone would miss it. ACMR (Added, Copied, Modified, Renamed) closes the rename/copy bypass. - Replace unquoted ${FORBIDDEN} word-split with `sed 's/^/ - /'`. The regex's `[^/]+` permits spaces in `.env.<...>` filenames; the previous word-splitting approach would mis-split on space. The shellcheck SC2086 disable is no longer needed and is removed. - New regression case `.env.foo bar.bak` proves the space-handling fix. All 16 cases pass. Rejected finding #1 (Gemini High): claimed pr.yml workflow update was missing. Counter-evidence: `gh pr view 94 --json files` shows .github/workflows/pr.yml with +22/-0 in this PR (the secrets-files-guard job). Gemini reviewed hunks without seeing the workflow change in the same PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Accepted GPT-5.5 final-review finding #1: the implicit `git fetch origin <base>` does not deterministically create refs/remotes/origin/<base>; under some actions/checkout modes (especially forked PRs) the diff would silently fail to enumerate changes and the guard would no-op. Replace with an explicit refspec and add an `origin/<base>` rev-parse check that exits 2 with a clear error if resolution fails — refusing to silently skip the guard. CI runs prior to this fix passed because actions/checkout's fetch-depth=50 already populated origin/main in the typical case; the fix removes the brittleness rather than fixing a current-observable bug. Deferred GPT-5.5 #2 (workflow paths-ignore inheritance): already captured as chore_ci_gitignore_paths_ignore_gap. Counter-evidence: bug_fix.md Tangential Observations §1. Deferred GPT-5.5 #3 (regex misses .env-old / .env_bak / .env~): captured as a new chore_env_guard_extend_deny_pattern idea file. Not a regression of this PR — the gitignore and the guard are currently consistent (both dotted-only); broadening one requires broadening the other, and that paired change deserves its own focused PR. Includes MVP1 dashboard regen for the new chore_env_guard_extend_deny_pattern folder. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…e_corrupted_during_session) (#94) * fix(ci): filename-pattern guard for .env* files Adds scripts/ci/check-no-env-files.sh + a self-test driver + a new secrets-files-guard job in pr.yml. Fails any PR that adds or modifies a .env-family file other than .env.example, regardless of file content. Defense-in-depth for bug_env_file_corrupted_during_session: catches the near-miss where a rotated .env.old slipped past .gitignore (or where .gitignore is regressed / git add -f is used). Gitleaks in pre-commit is content-based and would pass an empty .env.old; this guard is filename- based and catches the structural signal. Pure bash, no python/pytest dependency. 15 canned regression cases run as a workflow step before the real-diff check, so a broken regex fails immediately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(bug): bug_fix.md for bug_env_file_corrupted_during_session + 2 tangential idea files bug_fix.md documents the locked design for the secrets-files-guard job shipped in the previous commit: detection mechanism, regex shape, failure mode, position in the workflow, and the open question on what local tool originally renamed .env -> .env.old (deferred — cannot reproduce 4 days post-incident). Tangential follow-ups captured per CLAUDE.md "Tangential discoveries": - chore_ci_gitignore_paths_ignore_gap — pr.yml paths-ignore skips the whole workflow on .gitignore-only PRs, undermining the guard. - chore_ci_gitleaks_workflow_step — gitleaks runs only in pre-commit locally; CI should also content-scan to complement the filename guard. Includes MVP1 dashboard regen for the new planned-feature folders. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * style(ci): add shellcheck SC2086 disable for intentional word-split The unquoted ${FORBIDDEN} expansion in check-no-env-files.sh is deliberate — newline-splitting drives printf format-reuse so each forbidden file prints on its own bullet line. Quoting would collapse the output to a single line. Annotate the intent so future shellcheck runs don't re-flag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): widen guard to catch renames + handle spaces in filenames (Gemini #2/3/4) Accepted findings from Gemini Code Assist on PR #94: - --diff-filter=AM -> ACMR. High-similarity `git mv foo.txt .env.old` is reported as Renamed by git, not Added; AM alone would miss it. ACMR (Added, Copied, Modified, Renamed) closes the rename/copy bypass. - Replace unquoted ${FORBIDDEN} word-split with `sed 's/^/ - /'`. The regex's `[^/]+` permits spaces in `.env.<...>` filenames; the previous word-splitting approach would mis-split on space. The shellcheck SC2086 disable is no longer needed and is removed. - New regression case `.env.foo bar.bak` proves the space-handling fix. All 16 cases pass. Rejected finding #1 (Gemini High): claimed pr.yml workflow update was missing. Counter-evidence: `gh pr view 94 --json files` shows .github/workflows/pr.yml with +22/-0 in this PR (the secrets-files-guard job). Gemini reviewed hunks without seeing the workflow change in the same PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): explicit refspec for origin/<base> resolution (GPT-5.5 #1) Accepted GPT-5.5 final-review finding #1: the implicit `git fetch origin <base>` does not deterministically create refs/remotes/origin/<base>; under some actions/checkout modes (especially forked PRs) the diff would silently fail to enumerate changes and the guard would no-op. Replace with an explicit refspec and add an `origin/<base>` rev-parse check that exits 2 with a clear error if resolution fails — refusing to silently skip the guard. CI runs prior to this fix passed because actions/checkout's fetch-depth=50 already populated origin/main in the typical case; the fix removes the brittleness rather than fixing a current-observable bug. Deferred GPT-5.5 #2 (workflow paths-ignore inheritance): already captured as chore_ci_gitignore_paths_ignore_gap. Counter-evidence: bug_fix.md Tangential Observations §1. Deferred GPT-5.5 #3 (regex misses .env-old / .env_bak / .env~): captured as a new chore_env_guard_extend_deny_pattern idea file. Not a regression of this PR — the gitignore and the guard are currently consistent (both dotted-only); broadening one requires broadening the other, and that paired change deserves its own focused PR. Includes MVP1 dashboard regen for the new chore_env_guard_extend_deny_pattern folder. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…Gemini #1/#2) Accepted Gemini findings on PR #95: #1 (Medium): the --ship sub-section listed "Steps 0a → 9" but the gate described success as "PR open + CI green + reviews adjudicated" (Step 7). Step 9 is post-merge and runs on a SUBSEQUENT invocation after the human merges, not as part of the chained call. Truncate the inline list to 0a–7 and call out Step 9's timing explicitly so the agent doesn't poll/sleep waiting for a merge it can't perform. #2 (Medium): "When --ship should NOT be used" listed three cases but the trailing instruction said "ask for explicit confirmation for any of these" — contradicts investigation mode's silent-ignore rule (there's nothing to ship; no confirmation makes sense). Scope the confirmation prompt to the latter two cases only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…95) * chore(skills): add --ship flag to /bug-fix for one-shot bug-to-merge Today's bug_env_file_corrupted_during_session work surfaced friction in the two-step flow: /bug-fix terminates at Phase 6 recommending the user type /impl-execute --ad-hoc as a separate invocation. The two invocations are one logical task; making the user re-engage adds overhead with no design value. Add an opt-in --ship flag that chains Phase 6 directly into /impl-execute --ad-hoc via the Skill tool (same pattern Phase 1 already uses to invoke /idea-preflight on stale ideas). Default behavior is preserved — bugs where you want to eyeball bug_fix.md before pushing, or release-blocking bugs that warrant a human sanity check, still get the natural pause point. Skill changes: - Frontmatter: argument-hint now lists --ship; description and pipeline-role mention it; allowed-tools adds Skill (was missing even though Phase 1 already invokes /idea-preflight via the Skill tool — the omission was a latent bug, not introduced here). - Phase 6 split into "Default" and "--ship" sub-sections. Default matches the old behavior plus a one-line nudge that --ship exists. --ship sub-section enumerates the impl-execute ad-hoc steps that run (Steps 0a, 0b.1, 2.5, 3, 4, 5, 6, 7, 9) so the reader knows what they're opting into without reading impl-execute SKILL.md. - Modes section: new --ship entry; Investigation mode explicitly notes --ship is silently ignored (nothing to ship). - "What this skill does NOT do": clarified that Auto-merge is still off — the human clicks the merge button even in --ship mode. - Termination section: new --ship-success terminator string. No behavioral change in default mode. No new skill — same Skill tool pattern as the existing /idea-preflight invocation in Phase 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(skills): correct --ship gate scope + investigation-mode wording (Gemini #1/#2) Accepted Gemini findings on PR #95: #1 (Medium): the --ship sub-section listed "Steps 0a → 9" but the gate described success as "PR open + CI green + reviews adjudicated" (Step 7). Step 9 is post-merge and runs on a SUBSEQUENT invocation after the human merges, not as part of the chained call. Truncate the inline list to 0a–7 and call out Step 9's timing explicitly so the agent doesn't poll/sleep waiting for a merge it can't perform. #2 (Medium): "When --ship should NOT be used" listed three cases but the trailing instruction said "ask for explicit confirmation for any of these" — contradicts investigation mode's silent-ignore rule (there's nothing to ship; no confirmation makes sense). Scope the confirmation prompt to the latter two cases only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Total run drops from ~7m20s to an expected ~5m45s by addressing the three highest-impact-per-risk targets identified from per-step timing on PRs #94 / #95: 1. **Drop `needs: [backend, frontend]` from the docker job** (~65s). buildx tests whether the Dockerfile builds; it does not consume test artifacts. The dependency was defensive, not necessary. With it removed, docker runs in parallel with backend/frontend instead of being serialized after them — the critical path was previously backend (6m06s) → docker (1m05s); now it's just backend. 2. **Cache .mypy_cache + .ruff_cache across runs** (~20s). Both tools invalidate per-file internally — the cache directory is safe to persist. Cold mypy is ~27s today; warm typically ~5s. Keyed on uv.lock + pyproject.toml (dep/tool-config changes); a restore-keys fallback to the runner-OS prefix means partial hits when only one input changes. 3. **Lower ES + OpenSearch heap from 512m → 256m** (~10-15s). The CI integration-test workload uses small synthetic indexes without concurrency; 256m is amply sufficient. Halving the heap makes container startup + health-check completion faster. Risk profile: all three changes are low-risk additive YAML edits. - #1 (docker parallel): zero risk — buildx never depended on test outcomes. - #2 (cache step): cache-miss path is identical to today (cold run). - #3 (heap halving): only risk is OOM during a test; if observed, trivially revert this single line. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…#1) Accepted Gemini Medium finding on PR #99: the `.gitignore` entry `.env~` only ignored the literal filename, but the CI deny regex `\.env([._~-][^/]*)?` catches any tilde-suffixed variant (`.env~`, `.env~bak`, `.env~1`, etc.). Without the trailing wildcard, a vim or emacs backup variant like `.env~1` would pass `git status` cleanly locally but be rejected by CI — confusing operator experience. Fix: `.env~` → `.env~*` (matches `.env~` itself plus any suffix). New regression case `.env~bak` locks the parity in (23 cases now; all pass). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…oader regex (#99) * infra(ci): extract secrets-files-guard to own workflow + add gitleaks (chore_ci_gitignore_paths_ignore_gap + chore_ci_gitleaks_workflow_step) Two CI defense-in-depth chores resolved together because they share a workflow file: **chore_ci_gitignore_paths_ignore_gap:** the `secrets-files-guard` job was inheriting pr.yml's `paths-ignore` filter, so a PR that ONLY touched `.gitignore` or `docs/**` would skip the entire workflow and the guard would never run. Failure mode: a malicious or accidental `.gitignore` regression removing `.env.old` slips past CI; subsequent PR adding `.env.old` under `docs/` also slips past. Fix: split secrets-files-guard out to `.github/workflows/secrets-defense.yml` with no `paths-ignore` so it triggers on every PR + push to main regardless of which paths changed. pr.yml retains a pointer comment so future maintainers find the surface. **chore_ci_gitleaks_workflow_step:** gitleaks was configured in `.pre-commit-config.yaml` v8.21.2 locally but never invoked from CI. Pre-commit is opt-in — a contributor without `pre-commit install` had no content scan at all. Filename guard catches `.env*` patterns; gitleaks catches `OPENAI_API_KEY=sk-...` content in any path. They're complementary, not redundant. Fix: new `gitleaks` job in the same secrets-defense workflow. Uses the official binary release at v8.21.2 (pinned to match pre-commit), NOT gitleaks-action@v2 — the action requires a paid GITLEAKS_LICENSE for private/org repos (SoundMindsAI/relyloop is private per `gh repo view --json visibility`). Binary install is license-free. Both jobs run in parallel, ≤30s each. Off the critical path entirely. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): broaden env-guard deny regex to match non-dotted backups (chore_env_guard_extend_deny_pattern) Extends the .env filename guard from matching only dotted `.env.<x>` variants to also match `.env-<x>`, `.env_<x>`, `.env~` (vim/emacs backup suffix). Parallel `.gitignore` update so the local-git defense and the CI defense agree on the same pattern. Source: deferred GPT-5.5 Low-severity finding #3 on PR #94. Closed as a follow-up because the gitignore and the guard were both dotted- only originally — broadening required updating both in lockstep, which deserved its own focused PR. **Bug found while broadening: silent regex range trap.** The original draft used `[._\-~]` as the separator class, expecting it to mean "literal `.`, `_`, `-`, `~`". POSIX regex character classes treat `-` as a range operator between two characters; `[._\-~]` parses as "literal `.`, `_`, then RANGE from `\` (92) to `~` (126)". That range matches every letter but does NOT match a literal `-`. So `.env-old` SILENTLY did not match the supposedly-broader pattern, while `.environment.yml` did (because `i` is in the range). Caught by the new regression cases for `.env-old` and `.environment.yml` running together — without both, neither failure mode is visible. Fix: put `-` at the end of the bracket expression (`[._~-]`) where it's unambiguously a literal. Added an explicit code comment explaining the trap so future maintainers don't re-introduce it. 6 new regression cases (4 deny, 2 allow): `.env-old`, `.env_bak`, `.env~`, `backend/.env-prod`, `.envoy.conf`, `.env3.yml`. All 22 cases green; cold local run completes in <0.5s. `.gitignore` extended with broad globs `.env.*`, `.env-*`, `.env_*`, `.env~` to mirror the guard. Canonical incident spellings kept as explicit entries with a comment pointing maintainers at the documentation reason. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gitignore): .env~ → .env~* glob for parity with deny regex (Gemini #1) Accepted Gemini Medium finding on PR #99: the `.gitignore` entry `.env~` only ignored the literal filename, but the CI deny regex `\.env([._~-][^/]*)?` catches any tilde-suffixed variant (`.env~`, `.env~bak`, `.env~1`, etc.). Without the trailing wildcard, a vim or emacs backup variant like `.env~1` would pass `git status` cleanly locally but be rejected by CI — confusing operator experience. Fix: `.env~` → `.env~*` (matches `.env~` itself plus any suffix). New regression case `.env~bak` locks the parity in (23 cases now; all pass). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ings Three fixes bundled: 1. CI flake on 'f keyboard shortcut toggles fullscreen' — the test fired keyDown before metadata had loaded. The handler bailed early under the prior `if (!open || total === 0) return` guard, so the 'f' key silently dropped. CI is slower than local so the race exposed there only. Two-part fix: - Decoupled fullscreen from `total > 0` in the keydown effect (fullscreen is viewer chrome — works during loading too). ArrowLeft/Right slide nav still gated on total > 0. - Test now waits for `guide-slide-image` (only renders post-load) before pressing keys. 2. Gemini #1 (medium) — guidesForPath prefix-match bug: `pathname.startsWith('/proposals')` would also match a hypothetical `/proposals-archive`. Changed to full path-segment match (`pathname === g.prefix || pathname.startsWith(g.prefix + '/')`). ACCEPTED. 3. Gemini #2 (medium) — DialogDescription showed "Loading walkthrough screenshots…" even on fetch error, which is misleading. Changed error state to render empty description; loading and loaded paths unchanged. ACCEPTED. Gemini #3 (medium, aspect-ratio for layout stability) — REJECTED with cited counter-evidence: the image container already takes all remaining vertical space via `flex-1 min-h-0`, and the inner `<img>` uses `object-contain` which preserves the 1440×960 aspect of every captured screenshot. Forcing `aspect-[3/2]` on the container would constrain its width based on height and break the responsive flex-grow layout (particularly in fullscreen + on narrow viewports where the dialog height drives the layout, not width). Quality: vitest 249 / 249 green; lint 0 errors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): ship in-app walkthrough guides (GuideViewer + 2 guides) Adapts the creator-discovery-outreach walkthrough-guide pattern for RelyLoop. Every guide is a Playwright-captured screenshot deck served in-app via the new <GuideViewer> modal and surfaced contextually by the floating <GuideTrigger> button. What ships: ui/src/components/guides/ guide-types.ts GUIDE_REGISTRY + GUIDE_MAP route bindings guide-viewer.tsx Radix Dialog slideshow (prev/next + arrow keys + caption rendering, fetches metadata.json from /guides/<id>/metadata.json) guide-trigger.tsx Floating "?" button (bottom-right) that opens the contextual guide for the current route; popover picker when multiple match ui/src/app/guide/page.tsx Catalog page listing every registered guide ui/playwright.demo.config.ts Slow-mo + 1440x960 viewport + video config for screenshot capture, separate from the regression e2e config which now excludes tests/e2e/guides/** ui/tests/e2e/guides/ 01_register_first_cluster.spec.ts 02_review_a_proposal.spec.ts ui/public/guides/ 01_register_first_cluster/ 01-clusters-list.png ... 05-cluster-detail.png (5 screenshots) metadata.json (title + description + per-slide captions) script.md (narrative + reference links) 02_review_a_proposal/ 01-proposals-list.png ... 05-reject-dialog.png (5 screenshots) metadata.json script.md ui/src/__tests__/components/guides/ guide-viewer.test.tsx (5 cases — render, nav, keys, error, no-fetch-when-closed) guide-trigger.test.tsx (4 cases — visibility + path-prefix matching) guide-registry.test.ts (3 cases — metadata.json ↔ registry parity) Updates: .claude/skills/guide-gen/SKILL.md drop DORMANT, swap web/→ui/, refresh inventory docs/08_guides/README.md document the convention + workflow ui/src/app/layout.tsx mount <GuideTrigger /> globally ui/playwright.config.ts testIgnore guides/** in regression runs Quality gates: vitest 231 (12 new + 219 existing) green; regression e2e 25 green; lint 0 errors; format clean; Next.js build succeeds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): expand walkthrough guide coverage 2 → 8 guides Ships 6 new walkthrough guides covering every UI-visible workflow that can be captured deterministically without an LLM. Pairs with guides 01 and 02 from the previous commit for a full inventory of the operator journey. New guides: 03_create_query_template /templates — Jinja2 body + declared params + fork-to-v2 04_create_query_set /query-sets — bulk-load queries via JSON or CSV 05_import_judgments_and_calibrate /judgments — import path + Cohen's + linear-weighted κ 06_create_and_monitor_study /studies — create + watch trials + cancel 07_browse_proposals /proposals — three-axis filters + 30s pulse-refetch 08_chat_shell /chat — list + new + secrets banner (no streaming) Plus: - GUIDE_REGISTRY: 2 → 8 entries - GUIDE_MAP: 2 → 8 entries; /proposals now has two guides (browse + review), so the GuideTrigger switches to popover-picker mode there - guide-trigger.test.tsx: adds a picker-mode case - guide-registry.test.ts: asserts every metadata.json ↔ registry pair, every GUIDE_MAP id is resolvable, and prefix matching works - docs/08_guides/README.md + .claude/skills/guide-gen/SKILL.md inventories list all 8 shipped guides + the 3 workflows blocked on LLM mocking / operator credentials (guide-able later) Workflows explicitly NOT covered by a guide (called out in both inventories so contributors don't re-litigate the scope): - LLM judgment generation (POST /api/v1/judgments/generate) — needs LLM mock or live endpoint; import path (guide 05) is the offline substitute - Chat with agent tool dispatch (C2 / E1-E3) — same blocker - Open PR end-to-end (worker side) — needs registered config_repo + real PAT; trigger captured in guide 02 Quality gates: vitest 232 (13 guide cases + 219 existing) green; regression e2e 25 green; lint 0 errors; format clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): ship 2 LLM-driven guides + Guides nav item + re-capture all 10 Closes the previously-marked-as-blocked workflows by running real LLM calls against the hosted OpenAI endpoint (no mocking, per user direction). All 10 guides now share the same nav chrome — including a new "Guides" top-nav item linking to the /guide catalog page. What ships: 09_generate_judgments_llm /query-sets, /judgments UI form interaction up to template-dropdown-open; generation triggered via direct API call to sidestep a Radix Select issue where many seeded templates push options outside the viewport. Worker hits OpenAI per (query, doc) pair; ~$0.02 per run with gpt-4o-mini. 10_chat_with_agent /chat Real LLM round-trip end-to-end: send a prompt, watch the list_clusters tool dispatch (tool-call + tool-result cards), then the assistant's streaming final response. ~$0.01-0.03 per run. Guides nav item New top-nav entry → /guide catalog GUIDE_REGISTRY 2 → 10 entries GUIDE_MAP 8 → 12 route bindings (chat, query-sets, judgments now have multiple registered guides) All 10 guides re-captured against the rebuilt UI image so every screenshot shows the consistent nav chrome including "Guides". Failure modes hit + resolved during capture: * Wrong button selector for "Generate new judgment list" (caught by matching by data-testid rather than role+name) * composer-send stays disabled when input is empty even after the SSE 'done' fires — switched the done-signal to composer-input (which only disables on streaming=true) * Radix Select template picker portal renders options outside the 1440×960 viewport when 100+ templates are seeded; keyboard nav times out — restructured to drive the form via UI for the interaction screenshots and trigger generation via API for the worker-state screenshots * guide-08 description drifted between GUIDE_REGISTRY and metadata.json when I updated the cross-reference; parity test caught it before push Quality gates: vitest 232 (all guide cases + 220 existing) green; regression e2e 25 green; lint 0 errors; format clean; demo capture 10/10 in ~70s wall-clock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ui): COPY public/ in Dockerfile runtime stage so guide assets ship Next.js standalone output mode does NOT include `public/` in the runtime image — Vercel's docs note this explicitly. The Dockerfile's runtime stage skipped the COPY with a TODO comment saying "ui/ has no public assets; if any are added later, re-add this line." That comment was correct at write time (chore_tutorial_polish Story 2.2) but went stale the moment guide 01 shipped — every guide PNG + metadata.json + script.md was on the host but NOT in the runtime image. Symptom: navigating to /guide and clicking any card fired GET /guides/<id>/metadata.json → 404. The GuideViewer's fetch hook surfaced the error inline; no infinite loop, just a confusingly long React render-commit stack on the error transition. Fix is one line: COPY /app/public ./public from the builder stage so Next.js can serve static assets from /guides/{id}/* at runtime. Verified locally — guide 03 + guide 10 PNGs + metadata.json all return 200 after rebuild. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): make GuideViewer mobile-friendly + fullscreen + low-vision Mobile audit revealed the original viewer was unusable on small screens: - max-w-4xl (1024px) dialog sprawls past phone viewports - 1440×960 source screenshots scaled to ~340px on iPhone → form-field text unreadable - No way to expand the image, no text-size control, no full-PNG escape Redesigned the GuideViewer around three affordances: 1. Responsive sizing default: w-[96vw] max-w-[1100px] h-[92vh] max-h-[900px] scales cleanly from 360px phones to ultrawide desktops; sm: breakpoint bumps interior padding from 1rem → 1.5rem 2. Fullscreen toggle (Maximize2 icon, top-right of header) toggles to w-screen h-screen edge-to-edge — same UI fits the screenshot at full resolution. Pref persists in localStorage under `relyloop.guide-viewer.fullscreen`. Keyboard shortcut: 'f' 3. Text-size toggle (A− / A / A+ button, top-right of header) cycles caption + description through sm / base / lg. Default base (16px). Pref persists in `relyloop.guide-viewer.text-size`. Plus accessibility polish: * Caption rendered with aria-live="polite" so screen readers announce every slide change * Sr-only "Slide N of M: " prefix on caption so position is read * Counter has aria-label "Slide N of M" * Prev/Next/text-size/fullscreen all have explicit aria-labels * Keyboard nav unchanged (Arrow ←→, Esc closes, 'f' fullscreen) and no longer hijacks keys when an input is focused * "View full image in new tab" inline link below image — gives the user the browser's native zoom for any detail the viewer can't fit GuideTrigger button: * Bumped from size-12 → size-14 (48 → 56px) — better mobile tap target * Position now respects env(safe-area-inset-bottom/right) so the button doesn't sit under iOS home indicator / right safe area 7 new vitest cases cover the toggles, persistence, sr-only caption, and keyboard shortcut. All 10 canonical demo specs re-captured for visual consistency. Quality: vitest 238 / 238 (7 new + 231 existing), regression e2e 25 / 25, demo capture 10 / 10, lint 0 errors, format clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(types): update import path for routes in next-env.d.ts * feat(ui): surface long-form tutorial + workflows-overview docs under /guide End users can now read the canonical long-form documentation (tutorial-first-study.md and workflows-overview.md from `docs/08_guides/`) in-app under the Guides menu — no more roundtrip to GitHub or a local IDE for first-time setup or capability discovery. What ships: ui/scripts/copy-docs.mjs prebuild script copies markdown source from `docs/08_guides/` → `ui/public/docs/` (wired into package.json `predev` + `prebuild` hooks so dev + prod stay in sync). README dropped into the dest dir explaining the GENERATED convention. /guide/docs/[slug] new route renders the markdown via react-markdown + remark-gfm. Repository- relative links (`../01_architecture/...`) are rewritten at render time to absolute GitHub blob URLs. <MarkdownDoc> the component: aria-friendly header with title + "View on GitHub" + text- size toggle (A− / A / A+, shared localStorage key with GuideViewer so a user who upsizes one surface gets the other) + column-width toggle (narrow ~900px reading column vs. wide ~1400px reference column). Prose typography scales accordingly. /guide catalog split into two sections — Long-form documentation (2 cards) above, Visual walkthroughs (10 cards) below. @tailwindcss/typography installed + @plugin'd in globals.css. The `prose` class in <StudyDigestPanel> also benefits — headings + lists in digest narratives now render with correct hierarchy (previously compiled to nothing under Tailwind 4 CSS-first). DOC_REGISTRY new export in guide-types.ts; mirrors the GUIDE_REGISTRY shape for the 2 docs. 8 new vitest cases cover: render, repo-relative link rewriting (handles `../` + `../../` correctly), absolute URLs unchanged, fetch errors, text-size toggle + persistence, wide-column toggle, localStorage hydration, and "View on GitHub" link target. Quality: vitest 246 / 246 (8 new + 238 existing), regression e2e 25 / 25, lint 0 errors, typecheck clean, format clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ui): add favicon.ico + icon.svg to silence /favicon.ico 404 Browsers (notably Chrome) request /favicon.ico on every page load regardless of any <link rel="icon"> tag in the document head. The Next.js App Router auto-injects the link tag for src/app/icon.svg, but the parallel /favicon.ico request still 404'd because no static asset existed at that path. What ships: ui/src/app/icon.svg Loop-motif SVG icon (dark rounded square + white ring open at top + dot at the entry point — the Karpathy-loop product metaphor). Next.js auto-serves at /icon.svg and injects <link rel="icon" type="image/svg+xml" href="/icon.svg">. ui/public/favicon.ico Multi-size ICO (16, 32, 48, 64) generated from the same motif via Pillow. Served at /favicon.ico by the runtime image (the Dockerfile already COPYs public/ into the standalone output thanks to the prior guide-assets fix). Verified: both endpoints return 200; rendered icon shows the loop glyph correctly at 64px preview. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): add Slides / Video toggle in GuideViewer + ship 10 walkthrough.webm files Playwright's demo config has always recorded slow-motion videos (video: 'on', 350ms slow-mo) but they were ephemeral test-results artifacts. This wires those videos into the in-app guide surface so users can watch a walkthrough play out instead of clicking through slides — particularly valuable for guide 10 (chat-with-agent) where the SSE streaming + tool dispatch is fundamentally temporal. Pipeline: pnpm capture-guides ├── playwright test -c playwright.demo.config.ts captures both │ screenshots │ (already wired) │ and videos under │ test-results/ │ demo-artifacts/ └── node scripts/promote-videos.mjs copies each video.webm to ui/public/guides/ <id>/walkthrough.webm The match-by-prefix logic in promote-videos.mjs handles Playwright's filename truncation (some specs get cut mid-word, so naive `startsWith(guideId)` misses 2/10; matching by the NN_ numeric prefix recovers all 10). GuideViewer changes: * New Slides | Video pill toggle in the header (shown only when metadata.video is set). Slides remains default; Video swaps the image area for an autoplay-loop <video> element with browser controls. Switching guides resets to Slides mode. * Tested via 3 new vitest cases (toggle default, video src + slide UI hidden in video mode, toggle absent when video field missing). Metadata + assets: * GuideMetadata type extends with optional `video` field * 10 metadata.json files updated with `"video": "walkthrough.webm"` * 10 walkthrough.webm files committed under ui/public/guides/<id>/ (~2.6 MB total; per-video 128-448 KB depending on flow length) Re-captured screenshots in the same run for visual consistency. Quality: vitest 252 / 252 (3 new video-mode + 249 existing), regression e2e 25 / 25, lint 0 errors, format clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ui): CI keyboard-shortcut test flake + Gemini #1/#2 accepted findings Three fixes bundled: 1. CI flake on 'f keyboard shortcut toggles fullscreen' — the test fired keyDown before metadata had loaded. The handler bailed early under the prior `if (!open || total === 0) return` guard, so the 'f' key silently dropped. CI is slower than local so the race exposed there only. Two-part fix: - Decoupled fullscreen from `total > 0` in the keydown effect (fullscreen is viewer chrome — works during loading too). ArrowLeft/Right slide nav still gated on total > 0. - Test now waits for `guide-slide-image` (only renders post-load) before pressing keys. 2. Gemini #1 (medium) — guidesForPath prefix-match bug: `pathname.startsWith('/proposals')` would also match a hypothetical `/proposals-archive`. Changed to full path-segment match (`pathname === g.prefix || pathname.startsWith(g.prefix + '/')`). ACCEPTED. 3. Gemini #2 (medium) — DialogDescription showed "Loading walkthrough screenshots…" even on fetch error, which is misleading. Changed error state to render empty description; loading and loaded paths unchanged. ACCEPTED. Gemini #3 (medium, aspect-ratio for layout stability) — REJECTED with cited counter-evidence: the image container already takes all remaining vertical space via `flex-1 min-h-0`, and the inner `<img>` uses `object-contain` which preserves the 1440×960 aspect of every captured screenshot. Forcing `aspect-[3/2]` on the container would constrain its width based on height and break the responsive flex-grow layout (particularly in fullscreen + on narrow viewports where the dialog height drives the layout, not width). Quality: vitest 249 / 249 green; lint 0 errors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

#265) * fix(test): align test_seeding helper with seedTemplate's declared_params backend/app/services/test_seeding.py's seed helpers wrote `search_space.params['title.boost']` (9 occurrences in 2 functions spanning search_space, trial params, narrative, parameter_importance, recommended_config, config_diff, and the followup-chain seed) but the e2e seedTemplate() helper at ui/tests/e2e/helpers/seed.ts:316 declares `declared_params: { boost: 'float' }`. Both helpers bypass validate_against_template, so the inconsistency landed in DB silently. When an e2e spec then opened CreateStudyModal against the seeded source study, Step-4's client-side validateSearchSpaceAgainstTemplate raised `Param 'title.boost' is not declared by template — Declared params: ['boost']` and refused to advance. Blocked 3 of 6 currently-red smoke-stack tests: - study-clone.spec.ts:24 (clone-from-source → submit round-trip) - followup_run.spec.ts:28 (narrow-followup hard-coded the same literal in NARROW_SEARCH_SPACE) - study-clone-narrow-bounds.spec.ts (deliberately stops short of submit per a stale comment citing this bug) Rename `title.boost` → `boost` per the bug folder's locked Path 1 (the dot-namespaced name is arbitrary; no test depends on the specific literal). Co-located consumers updated in lockstep: - backend/tests/integration/test_test_seeding.py declared_params rename to keep the seed-endpoint integration test internally consistent - ui/tests/e2e/followup_run.spec.ts NARROW_SEARCH_SPACE + rationale string rename (fixes the same Step-4 validator wall the v1 clone spec hits) - ui/tests/e2e/study-clone-narrow-bounds.spec.ts 5 literal references + delete the now-stale fixture-inconsistency comment block (replace with a one-line pointer to the remaining out-of-scope smoke flakes) - ui/tests/e2e/studies.spec.ts:173 digest-narrative assertion text follows the renamed narrative Out of scope (separate bug folders stay open): the swap-template template_id assertion failure in followup_run.spec.ts:111 and the dashboard demo-state-disclosure / demo-data-banner missing-testids failures in dashboard.spec.ts + dashboard-reseed.spec.ts. Both need live smoke-stack reproduction. See bug_fix.md for full design + locked decisions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(planned): capture chore_clone_narrow_bounds_full_roundtrip_e2e Surfaced during /bug-fix for bug_clone_e2e_seed_template_params_mismatch. Once that PR's seed-helper rename lands, the study-clone-narrow-bounds.spec.ts can be extended to include the submit + GET-the-new-study round-trip assertion that was deferred per a stale comment block (now replaced with a one-line pointer in the same fix). P3 — adds an E2E gate that's not currently present but isn't blocking anything. Dashboards regenerated by pre-commit hook to reflect the new planned folder. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: GPT-5.5 round 1 adjudication — 3 accepted, 1 rejected Adjudicates GPT-5.5 final-review findings on PR #265: - ACCEPTED #2: bug_fix.md's "Tangential observations" bullet was internally contradictory; rewrote to plainly state "2 of 3 resolve as side effect; followup_run.spec.ts:111 remains". - ACCEPTED #3: study-clone-narrow-bounds.spec.ts test title still claimed "submit → persisted" but stops before submit; renamed to "textarea is clamped and restores on uncheck" to match coverage. - ACCEPTED #4: chore idea.md link `../../../ui/...` (resolves under `docs/ui/`); fixed to `../../../../ui/...` to reach repo root. - REJECTED #1: GPT-5.5 flagged tangential-idea-capture + dashboard regen as "scope drift." Counter-evidence: /bug-fix Phase 5 step 7 and /impl-execute --ad-hoc Step 2.5 require tangential-observations sweep as BLOCKING; dashboard regen is automatic via pre-commit hook. Both are procedural compliance, not scope drift. Pre-commit re-regenerated dashboards. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(planned): GPT-5.5 round 2 — change chore priority P3 → P2 GPT-5.5 round 2 finding accepted: the chore idea was marked P3 but the dashboard regen script only recognizes P0/P1/P2/Backlog, so the P3 marker silently bucketed as P2 (default) — creating an internal contradiction between the idea's stated priority and the dashboard's classification. P2 is the correct semantic match (this adds important coverage that isn't blocking) and resolves the dashboard/idea contradiction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adjudicates GPT-5.5 final-review findings on PR #268: - ACCEPTED #1 (Medium): integration test used `DELETE FROM clusters` which would wipe operator data if the test DB is shared with a working stack. Refactored to use transaction rollback so fixture inserts never persist past the test run. Snapshots baseline live count BEFORE inserts so the test is also isolated from existing operator data (e.g. the 4 demo clusters from `make seed-demo`). - ACCEPTED #2 (Low): negative-only assertion (1 soft-deleted → 0) would pass against a broken-but-symmetric SQL like `WHERE deleted_at IS NOT NULL` (always returns 0). Added a second test pinning the mixed case (1 live + 1 soft-deleted → delta=1) so both halves of the contract are guarded: EXCLUDE deleted rows AND COUNT live rows. Both tests now use the same baseline-snapshot + transaction-rollback pattern. Shared `_insert_cluster_sql()` helper builds INSERT SQL + bound params for a live or soft-deleted row. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

#268) * fix(install): count only LIVE clusters in --if-empty auto-seed gate scripts/seed_meaningful_demos.py:824's count_existing_clusters() ran `SELECT COUNT(*) FROM clusters` with no `WHERE deleted_at IS NULL` filter. Soft-deleted rows (left behind by E2E test cleanup, operator manual deletes, or seed-then-wipe cycles) counted toward "exists" and permanently false-skipped the auto-seed step at install.sh:95 on every subsequent `make up` — leaving operators on an empty dashboard with no in-product recovery path. Empirically reproduced earlier this session: after `make down`/`make up`, operator's `/` showed zero clusters; DB had 7 cluster rows but all with non-null `deleted_at` (~2026-05-26 13:32 UTC, from earlier E2E test cleanup); auto-seed printed "skipping — 7 cluster(s) already exist"; the public API correctly filtered the dead rows. Fix: - Extract SQL to module-level constant `_COUNT_LIVE_CLUSTERS_SQL` so it's testable from unit-layer. - New SQL: `SELECT COUNT(*) FROM clusters WHERE deleted_at IS NULL;` Aligns with the public API's view of "exists" (same filter used throughout backend/app/db/repo/cluster.py). - Two-layer regression coverage: - unit (no DB needed): assert the constant's SQL includes `WHERE deleted_at IS NULL` (whitespace-normalized) — catches regression-by-edit if someone deletes the filter - integration (Postgres required): seed 1 soft-deleted cluster row via raw SQL, run the constant, assert COUNT = 0 — catches regression-by-semantic (e.g. swapping `IS NULL` for `= NULL`) This is part 1 of a bundled fix for the stuck-stack self-rescue problem. Part 2 (next commit) tightens the dashboard's "Reset to demo state" disclosure gating in ui/src/components/dashboard/start-here-checklist.tsx so operators who land in this state can self-rescue without CLI knowledge. See bug_fix.md for full design + locked decisions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(dashboard): show Reset disclosure whenever no live clusters ui/src/components/dashboard/start-here-checklist.tsx:150 gated the "Reset to demo state" disclosure on a 3-way AND (`!hasClusters && !hasQuerySetsWithJudgments && !hasStudies`) that modeled "truly pristine first-run." But the realistic stuck state — data orphaned without any live clusters — is exactly when the operator needs the rescue affordance most, and the strict predicate hid it. Reproduced earlier this session: operator's `/` showed Step 1 (Register cluster) as NOT done, Steps 2 + 3 as Done, and no disclosure anywhere — forcing CLI knowledge of `make seed-demo FORCE=1` to recover. Without live clusters every cluster-scoped operation fails, so the orphan studies + query_sets are unusable. The disclosure should always be available whenever there's no live cluster to use. Fix: - Tighten the predicate to `!hasClusters` only. The other two signals are downstream of having a cluster; gating on them was incidental. - Keep `hasClusters=true` as the "hide" trigger so operators with a working stack don't accidentally click Reset (the ResetDemoStateButton confirm dialog is the remaining defense). - Inline-comment the predicate change with a pointer to the bug folder so a future reader sees why the trivially-simple `!hasClusters` predicate has explanatory weight behind it. Test coverage: - Flip 2 of the 3 old AC-8 hide-tests to assert the new render behavior at single-orphan states. - Add a 3rd positive test pinning the exact stuck state (no clusters + orphan studies + orphan query_sets) from the 2026-05-26 session. - Keep AC-7 (all three false → renders) and one AC-8 case (hasClusters=true → hides) and the "all three true → returns null" unchanged. Part 2 of a bundled stuck-stack self-rescue fix. Part 1 (commit prior) fixes `scripts/seed_meaningful_demos.py` so the auto-seed gate counts only live clusters. Together they restore both the auto-recovery path (on `make up`) and the in-product rescue path (the disclosure). See bug_fix.md for full design + locked decisions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(integration): GPT-5.5 round 1 — both findings accepted Adjudicates GPT-5.5 final-review findings on PR #268: - ACCEPTED #1 (Medium): integration test used `DELETE FROM clusters` which would wipe operator data if the test DB is shared with a working stack. Refactored to use transaction rollback so fixture inserts never persist past the test run. Snapshots baseline live count BEFORE inserts so the test is also isolated from existing operator data (e.g. the 4 demo clusters from `make seed-demo`). - ACCEPTED #2 (Low): negative-only assertion (1 soft-deleted → 0) would pass against a broken-but-symmetric SQL like `WHERE deleted_at IS NOT NULL` (always returns 0). Added a second test pinning the mixed case (1 live + 1 soft-deleted → delta=1) so both halves of the contract are guarded: EXCLUDE deleted rows AND COUNT live rows. Both tests now use the same baseline-snapshot + transaction-rollback pattern. Shared `_insert_cluster_sql()` helper builds INSERT SQL + bound params for a live or soft-deleted row. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(test): clusters table has no `updated_at` column CI failure on PR #268: backend integration job hit `asyncpg.exceptions.UndefinedColumnError: column "updated_at" of relation "clusters" does not exist` on both integration tests in test_seed_meaningful_demos_if_empty.py. The clusters table only has `created_at` + `deleted_at` (see backend/app/db/models/cluster.py:83-88). I assumed updated_at existed when authoring the INSERT helper but never validated against the live schema before pushing — should have run the integration test locally via `make test-worktree` or checked the model file. Fix: remove updated_at from the INSERT helper's columns + params. Local validation: 1453 unit tests still green. The integration tests themselves skip locally (Postgres not reachable from host shell); CI will exercise them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… + fix dashboard link drift Adjudicates GPT-5.5 final-review findings on PR #270: - ACCEPTED #1 (Medium): the venv-ownership check could false-pass on a broken image where /app/.venv is missing (find returns empty → wc -l = 0). Added `[ -d /app/.venv ] ||` guard before the find pipeline. - ACCEPTED #3 (Low): inline comment claimed the new step fires BEFORE the smoke job — but GHA dispatches jobs in parallel. Reworded to "fast-fail signal" with the actual value-prop. - ACCEPTED #2 (Low): dashboard regen at MVP1_DASHBOARD.md copied the idea.md `../../../00_overview/...` link literally — resolves above repo root from docs/00_overview/. Replaced with plain-text slug + separate path mention so the dashboard output is clean. Dashboards regenerated by pre-commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* infra(ci): add runtime image-invariant smoke to docker buildx job The `docker buildx (relyloop/api)` job builds the runtime image with `push: false` and `cache-to: type=gha,mode=max` — verifies the image BUILDS but never RUNS it. The companion static unit test at backend/tests/unit/test_dockerfile_runtime_stage.py (3 tests) catches Dockerfile-structural regressions but cannot catch runtime-state regressions (e.g., a future uv release writes dist-info files elsewhere; an upstream Python base image switches default UID). Add `load: true` to the build step + a new "Verify runtime image invariants" step that `docker run`s the freshly-built image and asserts two invariants: 1. `/app/.venv` is fully relyloop-owned (the bug_dockerfile_venv_root_owned_after_user_switch failure mode — blocks `uv run` from the relyloop user in any one-shot container like `make test-worktree`). 2. Default user is relyloop (uid 1000), not root. Each invariant maps to a captured bug folder for traceability. Fires BEFORE the heavier `smoke (operator-path tutorial flow)` job has to bring up the full Compose stack, so image-construction regressions fail in <30s instead of after 5+ minutes of `make up`. Idea preflight (Audit & Patch mode) also landed three doc edits on the originating idea file: P3 → P2 (matches the dashboard regen's recognized priority tiers per scripts/build_mvp1_dashboard.py:217-245); Status line updated to reference the merged PR #263 / #264 SHAs; the stale infra_ci_smoke_makeup reference rewritten to point at the live smoke job in pr.yml. Dashboard regenerated by pre-commit hook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): GPT-5.5 round 1 — guard missing /app/.venv + soften ordering + fix dashboard link drift Adjudicates GPT-5.5 final-review findings on PR #270: - ACCEPTED #1 (Medium): the venv-ownership check could false-pass on a broken image where /app/.venv is missing (find returns empty → wc -l = 0). Added `[ -d /app/.venv ] ||` guard before the find pipeline. - ACCEPTED #3 (Low): inline comment claimed the new step fires BEFORE the smoke job — but GHA dispatches jobs in parallel. Reworded to "fast-fail signal" with the actual value-prop. - ACCEPTED #2 (Low): dashboard regen at MVP1_DASHBOARD.md copied the idea.md `../../../00_overview/...` link literally — resolves above repo root from docs/00_overview/. Replaced with plain-text slug + separate path mention so the dashboard output is clean. Dashboards regenerated by pre-commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adjudicates GPT-5.5 final-review findings on PR #273: - ACCEPTED #1 (Medium): the POST response assertion used `.ok()` which accepts any 2xx status. A future 200/202/204 regression would silently pass. Tightened to `.status() === 201` in BOTH this spec AND the v1 clone-spec (study-clone.spec.ts:24) to avoid divergence — the same loose `.ok()` pattern lived in both files; tightening one without the other would have created confusion. - ACCEPTED #2 (Low): after re-checking the narrow-bounds checkbox to put the textarea back to the clamped state for submit, the test immediately clicked `step-next` without explicitly waiting for the textarea to reflect the re-applied clamp. A fast click could have submitted the still-restored source bounds and hidden a real bug behind a race. Added an `expect.poll` wait that re-parses cs-search-space inputValue and asserts boost.low close to 2.0 before advancing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nd-trip (#273) * chore(test): extend study-clone-narrow-bounds spec to full submit round-trip Picks up the AC-12 coverage gap that bug_clone_e2e_seed_template_params_mismatch (PR #265) unblocked. Before that fix, the spec stopped at the textarea-clamped assertion because Step-4 Next was blocked by the seed-helper/template mismatch. With PR #265 merged, the spec can now exercise the full 6-step plan from its own docblock. What changed in this PR: - Added `API_BASE` constant at top of file (mirrors clone-spec convention). - After the existing check/assert/uncheck/restore sequence, re-check the narrow-bounds checkbox to put the textarea back to the clamped [2.0, 3.0] state for submit. - Advance Step 4 → Step 5 via step-next; assert step-5 visible. - Click "Create study"; capture the POST response promise; assert 201 and capture the new study id (pattern lifted from ui/tests/e2e/study-clone.spec.ts:24). - GET /api/v1/studies/{new_id}; assert: - parent_study_id === sourceId (FR-9 lineage round-trip) - search_space.params.boost.type === 'float' - search_space.params.boost.low close to 2.0 (FR-12 — server accepted the narrowed rewrite) - search_space.params.boost.high close to 3.0 (same) Test name updated from "textarea is clamped and restores on uncheck" to "Clone study → narrow bounds → submit → persisted search_space is clamped" to match the now-complete 6-step coverage. Docblock Step 5 expanded to call out the new check → uncheck → re-check sequence so the next reader sees the intent (re-check is to prepare for submit, not redundant with the initial check). Local validation: exit code 0 against the live local stack; the test/spec/suite metadata in playwright's JSON reporter shows `"ok": true` for the spec. CI smoke job will run the full assertion stack against the GHA-provisioned stack. Idea folder will move to implemented_features in the finalize PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(e2e): GPT-5.5 round 1 — tighten POST assertion + race-safe re-check Adjudicates GPT-5.5 final-review findings on PR #273: - ACCEPTED #1 (Medium): the POST response assertion used `.ok()` which accepts any 2xx status. A future 200/202/204 regression would silently pass. Tightened to `.status() === 201` in BOTH this spec AND the v1 clone-spec (study-clone.spec.ts:24) to avoid divergence — the same loose `.ok()` pattern lived in both files; tightening one without the other would have created confusion. - ACCEPTED #2 (Low): after re-checking the narrow-bounds checkbox to put the textarea back to the clamped state for submit, the test immediately clicked `step-next` without explicitly waiting for the textarea to reflect the re-applied clamp. A fast click could have submitted the still-restored source bounds and hidden a real bug behind a race. Added an `expect.poll` wait that re-parses cs-search-space inputValue and asserts boost.low close to 2.0 before advancing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(test): Gemini round 1 — URL constructor on new request.get + tangential capture Adjudicates Gemini Code Assist finding on PR #273: ACCEPTED (Medium): the new `request.get(`${API_BASE}/api/v1/studies/ ${created.id}`)` at line 135 uses string-concat, brittle to a future PLAYWRIGHT_API_BASE_URL ending with a trailing slash. Switched to `new URL(..., API_BASE).toString()`. Sweep DEFERRED for the 4 sibling sites (study-clone.spec.ts:40,91 + followup_run.spec.ts:92,176) — fixing some-but-not-all creates drift. Captured as chore_e2e_api_base_url_construction for a focused sweep. Dashboards regenerated by pre-commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(planned): capture bug_smoke_studies_data_table_search_flake New smoke flake surfaced on PR #273 CI (commit 5bc44e4) — test was passing on pre-PR main but failed on PR run. My diff doesn't touch the failing test or its code paths; most likely genuine flake. Same pattern as bug_smoke_dashboard_demo_state_locator_missing and bug_smoke_followup_clone_e2e_flakes captures. Next main CI run post-merge will provide a second data point. Dashboards regenerated by pre-commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

#1) The Protocol's list_documents docstring still claimed _id sorting after the implementation flipped to _doc per spec D-26 fallback (ES 9 disables _id fielddata by default). This created a small contract drift between the abstract method and the ElasticAdapter implementation. Updates the docstring to describe the actual contract: adapters paginate via search_after over a stable internal sort key, with ElasticAdapter specifically using _doc, and document the spec D-26 rationale inline. Other engines remain free to pick any stable key as long as hits[i].sort is returned for search_after round-trips. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…P1) (#285) * docs(planned): feat_index_document_browser idea + spec + plan - idea: 9 locked decisions (D-1..D-9) + IA route hierarchy after /idea-preflight - feature_spec: 12 FRs, 20 ACs, 28 decisions; 3 GPT-5.5 cycles (33 findings, all accepted) - implementation_plan: 13 stories across 3 epics; 3 GPT-5.5 cycles (27 findings, 26 accepted + 1 rejected with cited counter-evidence) - pipeline_status: spec + plan stages approved, ready for implementation - mvp1_dashboard regen captures the new feature in the Idea table Substrate: read-only document browser surfaced from a live demo of tune-acme-products-rich-boosts. Adds 2 SearchAdapter Protocol methods (get_document + list_documents), 2 new backend endpoints, 1 endpoint extension (studies ?target= filter), 3 new frontend pages, 2 modified frontend surfaces. No migration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(adapters): SearchAdapter Protocol additions — Document + DocumentPage + 2 methods (Story 1.1) - Add Document(doc_id, source) — return shape of get_document - Add AdapterDocumentHit(doc_id, source, sort) — per-hit list page entry; the per-hit sort lets the router encode cursor from the in-body hit under the limit+1 overfetch pattern (no last_sort field per cycle-2 F1) - Add DocumentPage(hits, total) — return shape of list_documents - Add get_document / list_documents Protocol method signatures - Update _StubAdapter + protocol async-method assertion (5 names → 7) - Add 10 new unit tests for the 3 models (28 → 29 → covers Field(min_length=1) on doc_id, _source=None case, missing-sort rejection, no-last_sort guard) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(adapters): ElasticAdapter.get_document + list_documents (Stories 1.2 + 1.3) Story 1.2 — get_document: - GET /<target>/_doc/<doc_id> via _request(translate_errors=False) - found:false → None; index_not_found_exception → TargetNotFoundError - 401/403 → TargetsForbiddenError (matches list_targets pattern; spec FR-1) - urllib.parse.quote on target + doc_id (spec D-25 + AC-16) - 10 unit tests covering all 6 paths + URL encoding Story 1.3 — list_documents: - POST /<target>/_search with match_all + sort _id:asc + track_total_hits:true (spec D-24) - search_after threads into body; fields → _source.includes - Returns DocumentPage(hits=[AdapterDocumentHit(doc_id, source, sort)], total) - Per-hit sort uses h["sort"] (fail-loud KeyError if engine omits; cycle-2 F10) - 14 unit tests covering happy path, body shape, error mappings, sort fail-loud Both methods route through the existing _request retry/translate plumbing so they inherit the spec §13 single-retry budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: mark feat_index_document_browser Epic 1 complete (paused checkpoint) After Stories 1.1 + 1.2 + 1.3 committed, pause for operator-chosen checkpoint before Epic 1 phase gate (GPT-5.5 review + full suite) and the rest of the 10 remaining stories. Branch feat/index-document-browser holds the work; resume with /impl-execute ... --all. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(api): documents-browse helper modules (Story 2.1) Lands the pure-function helpers that Story 2.2 (list endpoint) and Story 2.3 (detail endpoint) compose: - _errors.py: shared _err() builder for the spec §7.5 envelope; resolves the F6 circular-import risk so helpers don't import from clusters.py. clusters.py switches to re-export. - _documents_cursor.py: base64-urlsafe JSON encode/decode of the ES hits[i].sort array. Decode validates the parsed value is a list (cycle-2 F9) so a syntactically-valid encoded {} can't slip through to ES. - _documents_fields.py: parse_fields_csv() per FR-3 (trim, dedup, dotted-paths, wildcard-reject). - _strict_query_params.py: FastAPI dependency factory that rejects unknown query params with 422 VALIDATION_ERROR (per D-21 / FR-12). - services/documents.py: truncate_source_for_list() with per-field + whole-doc UTF-8 byte caps per D-27, plus exported sentinel constants. - services/_target_filter.py: check_target_visible() glob matcher for cluster.target_filter (anti-enumeration via 404, per D-13). 39 unit tests across the 5 helpers; mypy --strict clean; ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(api): documents list + detail endpoints (Stories 2.2 + 2.3) Lands two new engine-passthrough endpoints under /api/v1/clusters/{cluster_id}/targets/{target}: - GET .../documents — paginated (cursor + limit+1 overfetch) _id + truncated _source preview; X-Total-Count header; strict query-param allowlist (cursor/limit/fields) + 422 on unknown; ?fields= CSV with wildcard rejection; target_filter glob enforced as 404 (anti- enumeration per D-13). - GET .../documents/{doc_id:path} — single document detail; FastAPI :path converter round-trips slashes (D-17 / AC-16); 404 DOCUMENT_NOT_FOUND distinct from TARGET_NOT_FOUND. Adapter sort key flipped from _id → _doc per spec D-26 fallback: ES 9 disallows _id fielddata by default (indices.id_field_data.enabled is off), surfacing as HTTP 400 on first live-ES probe. _doc is shard- internal but always available; search_after works against it identically. Cursor continuity tradeoff acknowledged in adapter docstring. Validation moved before DB I/O so malformed cursor / wildcard fields always surface as 422 regardless of cluster_id validity. DocumentSummary + DocumentListResponse added to schemas.py; detail endpoint returns the adapter Document model directly (single source of truth — no router-side schema drift per cycle-2 F3). Tests: - 11 contract tests (5 router-source + 7 endpoint via real app; ES not required for any). Asserts the 6 spec §7.5 codes + X-Total-Count header + strict-dep wiring + truncation sentinel. - 21 integration tests (16 stub-adapter for full error-code coverage + 5 live-ES happy path covering AC-15 exact-multiple termination, fields filter, doc-id-with-slash round-trip, full corpus paginate). Both layers skip cleanly when prerequisites absent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(api): ?target= filter on GET /api/v1/studies (Story 2.4 / FR-5) list_studies() + count_studies() repo functions gain a `target: str | None` kwarg (default None — source-compatible per D-22). The /api/v1/studies endpoint adds a corresponding ?target=<min_length=1, max_length=256> Query param that composes with every existing filter (status, cluster_id, since, q, sort) via AND. The frontend's "view studies targeting this index" link from the index summary page (FR-7) consumes this filter; the same param drives the LinkedEntitiesRow's filter-chip experience (FR-10, lands in Story 3.5). Tests: - 6 integration tests covering: target alone, target+cluster_id AND composition, target+status AND composition, default unchanged behavior, target+malformed-cursor → 422 (cycle-3 F2 — target filter doesn't bypass cursor validation), empty target → 422. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): Indices card on cluster detail (Story 3.1 / FR-6) New ClusterDetailIndicesCard reuses the existing useClusterTargets hook to render an indices table on the cluster-detail page, between the ClusterActionBar and the StudiesByClusterTable card. Each row links to /clusters/[id]/indices/[name] (lands in Story 3.2). Indices sort by name ascending via localeCompare regardless of engine return order. Doc count formatted with toLocaleString(); null fallback renders an em-dash. Error states wire the standard backend envelope codes to inline copy: - TARGETS_FORBIDDEN → ACL hint citing the monitor privilege + cluster-registration runbook. - CLUSTER_UNREACHABLE → Retry button calling query.refetch(). 5 new glossary keys added (cluster.indices_card, cluster.target_doc_count, target.schema, target.schema_analyzer, document.truncation_sentinel) — all per-key length validators clean (<=140 short / <=800 long). 5 vitest tests cover sort, doc_count formatting, empty state, 403 TARGETS_FORBIDDEN inline hint, and the CLUSTER_UNREACHABLE retry button render. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): index summary page (Story 3.2 / FR-7 / AC-2) New route /clusters/[id]/indices/[name]/page.tsx composes the existing /targets + /schema endpoints (no new API call) into a summary page with: - Page header: <index name> · <formatted doc_count> · <engine type chip> - Two nav cards: "Browse documents" + "View studies targeting this index" (links to /studies?cluster_id=...&target=... per D-3) - Sortable schema table extracted as IndexSummarySchemaTable for reuse. Error states cover every spec surface: - 404 TARGET_NOT_FOUND on /schema → AC-17 empty state with breadcrumb. - 403 on /targets + 200 on /schema → D-28 partial-permission state ("document count unknown"). - 403 on BOTH → cycle-2 F8 full-denial state. 4 vitest tests cover happy path, 404 not-found, partial-permission, and full denial. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): documents list page (Story 3.3 / FR-8) New /clusters/[id]/indices/[name]/documents page renders the paginated documents browse: - _id + truncated _source preview (first 3 fields rendered as key=value). - Page header reads X-Total-Count from the response header (cycle-2 F8 resolution) and renders <toLocaleString> documents. - Prev / Next cursor controls with cursorStack so Prev steps back one page rather than re-loading the first page. Page-size selector (25 / 50 / 100, default 25, max 100). Cursor stack resets on page-size change. - Row click navigates to .../documents/[doc_id] with the doc_id URL-encoded (lands in Story 3.4). - Truncation sentinel rendered verbatim with InfoTooltip linking to the document.truncation_sentinel glossary entry; matches the backend literal exported via ui/src/lib/documents-constants.ts. - Error states: 403 TARGETS_FORBIDDEN inline ACL hint, 404 TARGET_NOT_FOUND inline, 503 CLUSTER_UNREACHABLE Retry button. New lib/api/documents.ts hook (useTargetDocuments) parses X-Total-Count from the response headers; inline DocumentSummary / DocumentListResponse / Document interfaces until the OpenAPI codegen regenerates ui/src/lib/ types.ts. 6 vitest tests cover happy paginated render, truncation sentinel, empty state, 403, 404, and CLUSTER_UNREACHABLE retry button. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): document detail page (Story 3.4 / FR-9) New catch-all route /clusters/[id]/indices/[name]/documents/[...doc_id] per D-17 — accepts doc_ids containing literal slashes (AC-16). Renders the full _source as pretty-printed JSON (JSON.stringify(..., 2)) inside a max-height scroll container. Copy-JSON button uses navigator.clipboard.writeText with a 1.5s "Copied" indicator. Hidden when source is null. Breadcrumb: Cluster > <index name> > Documents > <doc_id>. States: - AC-5 happy: pretty JSON + Copy button. - AC-9: 404 DOCUMENT_NOT_FOUND empty state with breadcrumb. - AC-18: source=null → "_source: false" amber hint, Copy button hidden. - AC-16: doc_id with literal slashes round-trips (URL-encoded twice on the wire so Next.js decodes once into the literal value). - 403 TARGETS_FORBIDDEN inline ACL hint. - 503 CLUSTER_UNREACHABLE + TARGET_NOT_FOUND inline messages with Retry. 5 vitest tests cover happy path, DOCUMENT_NOT_FOUND, source=null, slash-in-doc_id round-trip, and TARGETS_FORBIDDEN. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): LinkedEntitiesRow Index entry + studies ?target= chip (Story 3.5) LinkedEntitiesRow grows from 4 to 5 entries (FR-10 / AC-11) — appends "Index" pointing at /clusters/[id]/indices/[name] with cluster_id + target URL-encoded. useStudies hook signature extended with target?: string | undefined; the query key includes target so cache keys differ across filter values (no false-positive rehydration). Studies list page now consumes ?target= and ?cluster_id= from searchParams. When ?target= is present, renders an active filter chip "Target: <name>" with an × button. Clicking × calls router.replace with a URLSearchParams copy that drops target= while preserving every other param (status, cluster_id, sort, cursor) — F9 resolution. 3 vitest tests on LinkedEntitiesRow cover the 5 entries, the linked-index href shape, and target-with-special-chars URL encoding. 3 vitest tests on the studies page cover chip render when ?target= is present, × click drops target while preserving cluster_id, and chip absent when ?target= is missing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(e2e): index-document-browser top-down + filter chip spec (Story 3.6) New Playwright spec exercises the full feat_index_document_browser flow against the real backend (no page.route() mocking per CLAUDE.md): 1. Top-down: bulk-load 30 docs into a unique ES test index via the live ES cluster, register a cluster pointing at it, navigate /clusters/[id] → Indices card → click index → summary page → Browse documents → click first row → document detail; assert each page's testids + the JSON preview. 2. Filter chip: navigate /studies?target=hypothetical-index, assert the active filter chip renders + clicking × drops the param from the URL. Resolves OQ-1 by seeding the corpus inline rather than depending on scripts/seed_meaningful_demos.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(planned): mark feat_index_document_browser implementation complete All 13 stories shipped across Epics 1–3: - Adapter Protocol + ElasticAdapter (Stories 1.1–1.3) - Documents helper modules + list/detail endpoints + studies ?target= filter (Stories 2.1–2.4) - 4 frontend routes + LinkedEntitiesRow + filter chip + E2E (Stories 3.1–3.6) Updates implementation_plan.md execution tracker (all stories [x]) and pipeline_status.md to "Complete — PR pending" with a precise inventory of tests added. Captures one tangential discovery as an idea file: - bug_starlette_request_poisons_fastapi_depends_tests — a test-infra footgun where mixing direct Request(scope=...) instantiation and FastAPI Depends(...) in the same test file breaks the latter silently. Workaround documented; reproducer + upstream issue deferred. Dashboards regenerated by the pre-commit hook to pick up the new bug_starlette_request_poisons_fastapi_depends_tests folder. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(contract): register the 2 new documents endpoints in EXPECTED_ENDPOINTS CI's test_openapi_surface.py::test_openapi_has_no_orphan_endpoints flagged the documents list + detail endpoints as orphan — the routes ship without an entry in EXPECTED_ENDPOINTS, which would let future drift go uncaught. Adds the two (method, path) tuples bracketing the existing /run_query entry, mirroring its convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(adapter): guard ES 404 .json() + compact cursor encoding (Gemini PR #285) Adjudicated Gemini Code Assist's 5 medium-priority findings on PR #285: Accepted (3): - get_document 404 path — wrap resp.json() in try/except so an intermediate proxy returning a non-JSON HTML 404 page surfaces as None/return rather than bubbling JSONDecodeError as a 500. - list_documents 404 path — same guard; falls through to ClusterUnreachableError on malformed body, same as the non-index-not-found branch. - encode_documents_cursor — use separators=(",", ":") so the base64 blob in query strings stays compact. Rejected (2) with cited counter-evidence: - get_document 200-path JSON guard: existing adapter convention (list_targets at elastic.py:413, get_schema at :282, search_batch at :638/701) all call resp.json() unguarded on the success path. Adding a guard here would create asymmetric handling across the adapter; the assumption is that a 200 from ES is well-formed JSON. - list_documents 200-path JSON guard: same rationale. The 3 accepted fixes ship in this commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(adapters): align Protocol docstring with _doc sort (GPT-5.5 cycle-1 #1) The Protocol's list_documents docstring still claimed _id sorting after the implementation flipped to _doc per spec D-26 fallback (ES 9 disables _id fielddata by default). This created a small contract drift between the abstract method and the ElasticAdapter implementation. Updates the docstring to describe the actual contract: adapters paginate via search_after over a stable internal sort key, with ElasticAdapter specifically using _doc, and document the spec D-26 rationale inline. Other engines remain free to pick any stable key as long as hits[i].sort is returned for search_after round-trips. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(planned): capture infra_smoke_job_chronic_flake The pr.yml workflow has been failing/cancelled on main for ~20 consecutive pushes since PR #188 (2026-05-23) added the seed-demo + Playwright steps to the smoke job. Every other job (backend, frontend, docker buildx) passes; only `smoke (operator-path tutorial flow)` is red. Surfaced during PR #285 (feat_index_document_browser) CI watch when the operator asked whether the seed step had been failing since introduction. The seed step itself succeeds — it's just slow enough to push the job into the Playwright window where the flakes live, and concurrency policy cancels in-flight runs. Idea file describes the 4-axis cost stack (seed-demo unconditional re-run, Playwright flakes, no concurrency: block, push:main trigger surface) and proposes a 3-phase fix path (cheap signal recovery → Playwright bisect → workflow split). P1 priority — the pr workflow's red badge on main no longer signals real regressions; a real backend break would currently look identical to the chronic state. Dashboards regenerated by the pre-commit hook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR #291 first CI run surfaced an integration-test FK collision under parallel pytest execution: IntegrityError: update or delete on table "clusters" violates foreign key constraint "query_sets_cluster_id_fkey" on table "query_sets" Triggered at teardown of TestClusterRepoBasics.test_create_then_fetch_round_trip when a parallel xdist worker held an FK reference to a cluster being deleted in another worker's teardown. The integration test layer isn't isolated enough at the DB level for parallel execution against a shared Postgres. Reverting `-n auto` on both backend jobs: - backend-unit-fast: was 33s baseline, went to 43s with -n auto (xdist worker-startup overhead exceeded parallelism savings on the 2-core runner for an already-fast test suite). Revert is a net win. - backend (lint + typecheck + tests + coverage): was 6m 1s baseline, parallel execution introduced FK collisions. Revert restores correctness. Keeping pytest-xdist in dev deps for local opt-in (the unit-only path is parallel-safe locally where the runner has more cores; the worker-startup overhead is dominated by 1500+ unit tests on 8+ cores). Follow-up captured in chore_ci_perf_buildx_artifact_image_cache_xdist §"What is NOT changed in this PR": split the backend job into a parallel-safe "unit + contract" lane (where -n auto helps) + a serial "integration" lane (where collisions live) to recover #3's intended savings without the correctness risk. CI-perf #1 (buildx artifact handoff) + #2 (base-image cache) are unaffected and remain the actual smoke-pace wins. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

…mage cache + pytest-xdist) (#291) * chore(ci): buildx artifact handoff + base-image cache + pytest-xdist Optimizes the two longest CI jobs to bring smoke under its 15min ceiling (was timing out on PR #290 after the docker bumps invalidated the implicit Docker layer cache). Three changes bundled: 1. Buildx artifact handoff (~5min off smoke). The `docker` job already builds the API image; add a parallel `docker-ui` job for the UI image. Both upload tars via actions/upload-artifact. Smoke `needs: [docker, docker-ui]`, downloads + docker-loads both before `make up`. RELYLOOP_SKIP_BUILD=1 makes install.sh skip its `docker compose build` step (new escape hatch in install.sh step 6); RELYLOOP_GIT_SHA picks up the loaded images by tag. 2. Base-image cache (~1-2min off smoke on hit). actions/cache keyed on hashFiles('docker-compose.yml'). On miss: docker pull + save the 4 service-container images. On hit: docker load each tar — ~5s vs 60-90s for pull. 3. pytest-xdist parallel execution (~3min off backend full + ~15s off backend-unit-fast). pytest-xdist>=3.6 added to dev deps; `-n auto --dist worksteal` passed to backend pytest commands. Expected impact: smoke 13-15min → ~7-9min; backend full 8m 36s → ~4-5min. Total ~7-10min saved per PR run. See chore_ci_perf_buildx_artifact_image_cache_xdist/idea.md for the risk analysis (xdist DB collisions, artifact overhead, cache key staleness) and the not-in-scope follow-ups. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * fix(ci): revert -n auto on backend jobs (xdist DB collisions) PR #291 first CI run surfaced an integration-test FK collision under parallel pytest execution: IntegrityError: update or delete on table "clusters" violates foreign key constraint "query_sets_cluster_id_fkey" on table "query_sets" Triggered at teardown of TestClusterRepoBasics.test_create_then_fetch_round_trip when a parallel xdist worker held an FK reference to a cluster being deleted in another worker's teardown. The integration test layer isn't isolated enough at the DB level for parallel execution against a shared Postgres. Reverting `-n auto` on both backend jobs: - backend-unit-fast: was 33s baseline, went to 43s with -n auto (xdist worker-startup overhead exceeded parallelism savings on the 2-core runner for an already-fast test suite). Revert is a net win. - backend (lint + typecheck + tests + coverage): was 6m 1s baseline, parallel execution introduced FK collisions. Revert restores correctness. Keeping pytest-xdist in dev deps for local opt-in (the unit-only path is parallel-safe locally where the runner has more cores; the worker-startup overhead is dominated by 1500+ unit tests on 8+ cores). Follow-up captured in chore_ci_perf_buildx_artifact_image_cache_xdist §"What is NOT changed in this PR": split the backend job into a parallel-safe "unit + contract" lane (where -n auto helps) + a serial "integration" lane (where collisions live) to recover #3's intended savings without the correctness risk. CI-perf #1 (buildx artifact handoff) + #2 (base-image cache) are unaffected and remain the actual smoke-pace wins. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * fix(ci): seed_es shard race + verify-script tolerates indented build line Two follow-up fixes for PR #291 surfaced by the 2nd CI run: 1. seed_es.py: ES 9.x single-node shard race --------------------------------------- The faster stack-up (~3min vs ~10min baseline) stopped masking a latent race: seed_es creates the products index then immediately bulk-indexes. ES 9.x default `number_of_replicas: 1` tries to allocate a replica that can never bind on a one-node cluster, leaving the primary itself in an INITIALIZING → STARTED race that surfaces as `unavailable_shards_exception: [products][0] primary shard is not active Timeout: [1m]` on the bulk-index call. Fix: - Set `number_of_replicas: 0` in the index create call (single-node hygiene; the previous default-1 left an unallocatable replica) - Add an explicit `_cluster/health/{index}?wait_for_active_shards=1` poll between create and bulk so the failure mode (if it ever resurfaces) gives a clean error instead of the 1-min ES bulk-side timeout 2. scripts/ci/verify_install_builds_all_services.sh: regex too strict ------------------------------------------------------------------ PR #291 added a RELYLOOP_SKIP_BUILD=1 escape hatch in install.sh that wraps `docker compose build` in an `if [[ ... ]]; then` block. The line is now indented 2 spaces. The verify script's anchor `^docker compose build` no longer matched, so the check failed "no 'docker compose build' line found" even though the line was present (just indented). Fix: - Allow leading whitespace in the grep + the sed strip: `^[[:space:]]*docker compose build( .*)?$` - Indentation is irrelevant to the drift this gate exists to catch; what matters is that the args list covers every buildable service. Both surfaced by PR #291 CI but trace back to existing behavior (seed_es race was masked by previous slow stack-up; verify script became incompatible with the conditional wrapper). Local verify of each: - bash scripts/ci/verify_install_builds_all_services.sh → "OK (no-args = builds all)" - seed_es.py change is straight schema-additive (settings block) + one extra GET call; no behavior change on multi-node clusters Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * fix(ci): skip install.sh auto-seed in smoke job (was eating 5min/run) PR #291's 3rd CI run smoke job hit timeout-minutes: 15 again at 15m 27s. Step-level breakdown showed "Bring up the stack" took 8m 5s vs the 2m 56s seen on the 2nd run. The discrepancy: install.sh step 8 calls `python3 scripts/seed_meaningful_demos.py --if-empty` unconditionally as part of `make up`. That's the same demo-seed work I removed from pr.yml in chore_drop_demo_seed_from_ci — but install.sh's auto-seed still ran inside `make up`. The smoke job no longer needs the demo data (the 2 demo-dependent E2E specs `dashboard.spec.ts` + `dashboard-reseed.spec.ts` are skipped in CI via playwright.config.ts's CI-conditional testIgnore from PR #290). Fix: mirror the RELYLOOP_SKIP_BUILD escape-hatch pattern from PR #291. Add RELYLOOP_SKIP_AUTO_SEED=1 honored by install.sh step 8, and set it on the smoke job's `Bring up the stack` step. Saves ~5min per run. Expected new smoke total: ~7-9 min (well under the 15min ceiling). The previous 2nd-run's 6m 10s was the floor when the auto-seed happened to fast-path (probably because something was already partially seeded); this should be consistent now. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * chore(ci): skip 4 more demo-data-dependent E2E specs in CI PR #291's 4th CI run surfaced 6 additional E2E specs (across 4 spec files) that depend on demo data populated by scripts/seed_meaningful_demos.py — auto-followup chain, index-document browser, Step-4 builder, Step-1 target picker. With RELYLOOP_SKIP_AUTO_SEED=1 keeping install.sh's ~5min auto-seed out of CI, these specs assert on data that doesn't exist and hit 14-iteration retry timeouts. Extends the CI-conditional testIgnore in ui/playwright.config.ts: Before: - dashboard.spec.ts - dashboard-reseed.spec.ts Added: - auto-followup.spec.ts - index-document-browser.spec.ts - studies-create-builder.spec.ts - studies-create-target-dropdown.spec.ts Locally these specs continue to run after `make up` (the operator path does the auto-seed). Only CI skips them. Same trade-off as the original 2 from chore_drop_demo_seed_from_ci: lose CI coverage of these 6 specific flows, accept that vitest component coverage + the smoke test's LLM round-trip still validate the underlying behavior. Net: smoke completes in ~5 min instead of timing out at 15 min. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * fix(seed-es): drop redundant cluster-health wait that hit httpx timeout PR #291 5th CI run failed with httpx.ReadTimeout on the seed-es step after exactly 30 seconds. Root cause: the cluster-health call I added in the 3rd-run seed_es fix used `timeout=30s` server-side, which collided with the httpx client's `timeout=30.0` set at module top. When ES is cold (first run after image-cache miss) the health endpoint takes the full 30s, exceeding the client timeout by a hair and raising ReadTimeout. The real fix from the 3rd-run commit was `number_of_replicas: 0` in the index create call — that eliminates the unallocatable-replica race on single-node ES. The extra cluster-health GET was belt-and-suspenders that added more variance than safety. Dropping it. If shard allocation ever does stall on single-node ES with number_of_replicas=0, the bulk-index call's own 30s ES timeout will surface a clear error path. No need to pre-check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * fix(compose): ES + OS healthchecks gate on write-readiness, not just alive PR #291's 5th + 6th smoke runs failed with httpx.ReadTimeout on the seed-es step's index-create call. Pattern: - Bring up the stack succeeds (1m 22s) — install.sh's `docker compose up -d --wait` returns when all healthchecks pass - Seed clusters succeeds (4s) — Postgres + cluster API are responsive - Seed sample ES index FAILS with httpx.ReadTimeout — ES says "alive" but isn't ready for write ops yet Root cause: the ES + OS healthchecks tested mere responsiveness (`curl -fs http://localhost:9200/_cluster/health`) rather than write-readiness. ES returns 200 from /_cluster/health as soon as the HTTP layer is up, even while shard allocation is still in flight. The slow stack-up (10min baseline) masked this by ambient delay; PR #291's fast stack-up (1m 22s) doesn't grant ES enough warmup before client writes. Fix: extend both healthchecks to use `wait_for_status=yellow&timeout=20s`. This blocks the healthcheck until ES has at least yellow status (single- node ES never goes green because the default 1-replica setting can't allocate without a 2nd node — yellow is the right "ready for writes" floor). healthcheck `timeout: 25s` gives the ES `timeout=20s` enough headroom + 5s for the curl overhead. Now `docker compose up --wait` doesn't return until ES is genuinely ready for index ops, fixing the seed-es flake at its source rather than papering over it with extra waits in seed_es.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * fix(seed-es): bump httpx timeout 30s → 90s for cold-runner ES write path PR #291's 6th + 7th smoke runs both failed at the seed-es step with httpx.ReadTimeout on the `PUT /products` index-create call. Pattern: the fast stack-up (compose `Bring up the stack`: 10min → 21s on this run) eliminated the ~5min of ambient ES warmup time that the previous slow path was implicitly granting. ES 9.4.1 on a cold GHA runner is "healthy" per `_cluster/health?wait_for_status=yellow` (passes early on single-node ES — no shards to wait on) but its write path takes >30s to respond to the first index-create PUT. Bump the httpx client timeout from 30s → 90s. This gives ES enough headroom on a cold runner without making real failure modes invisible. If smoke goes green on the next run, the locked-in CI-perf wins (smoke 15min+ timeout → ~5min, buildx artifact handoff, base-image cache) ship clean. If the 90s timeout still doesn't help, ES has a genuine startup issue that needs separate investigation (potential candidates: reverting OS 3.6.0 / ES 9.4.1 to prior majors, or sleep in the workflow between `Bring up the stack` and `Seed sample ES index`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * fix(compose): roll back ES/OS healthcheck tightening (broke compose --wait) The healthcheck tightening I added 2 commits back (wait_for_status=yellow&timeout=20s) made `docker compose up -d --wait` fail. PR #291's 8th smoke run died at "Bring up the stack" after 1m 33s when the new healthcheck couldn't satisfy within compose's retry budget. Root cause: my interpretation of "ES is slow to be write-ready" was wrong direction. The 6th/7th runs' httpx.ReadTimeout on the index- create PUT wasn't a healthcheck issue — ES was responsive at the healthcheck level. The actual issue was the httpx client's 30s timeout being tight against ES 9.4.1's cold-runner write-path latency. The seed_es httpx timeout bump (30s → 90s, previous commit) addresses the actual problem. Reverting the healthcheck change keeps compose `up --wait` working unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * docs: capture seed-es shard race as bug idea for follow-up PR #291's CI-perf optimizations exposed a latent ES 9.4.1 single-node race: the bulk-index call after creating the products index sometimes returns unavailable_shards_exception. Previously masked by the slow ~10min stack-up; now surfaces because stack-up is ~30s-2min. Captures 4 candidate fixes ranked by effectiveness + risk. Closing PR #291 with this follow-up per CLAUDE.md "implement-over-defer" — the seed-es retry is cross-cutting enough to warrant a separate PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> --------- Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…now, docstrings) GPT-5.5 final review of PR #299 — 3 accepted, 1 deferred: - #1 (Medium, accepted): wrap redis.aclose() in the finally block in its own try/except + WARN log. A raise from aclose() would otherwise replace the re-raised original exception (or fail an otherwise-successful job). - #3 (Low, accepted): normalize a naive `now` arg in reseed_status_is_stale() to UTC — an aware-minus-naive subtraction would raise TypeError. Production never passes `now`; this guards callers/tests. + regression test test_naive_now_argument_treated_as_utc. - #4 (Low, accepted): fix stale `BaseException` wording in the worker + test docstrings (code already uses `Exception`). - #2 (Medium, deferred non-regression): stale-recovery check-then-set is non-atomic. Counter-evidence: the deterministic Arq job_id + advisory lock already prevent duplicate runs. Captured as chore_demo_reseed_stale_recovery_atomic_cas/idea.md. Includes dashboard regen triggered by the new chore_ idea folder. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

…home-button silent failure (#299) * docs(idea-preflight): refresh bug_demo_reseed_button_silent_enqueue_failure idea Add Depends on + Coordinate with lines; expand Problem section with explicit gap-region citations (lines 76-88 outside outer try; lines 91-133 inside try but no except); add structlog-buffering hypothesis; lock the re-raise-after-status-write choice in fix design with rationale (Arq ops visibility + worker-log traceback); split the diagnostic print() from the exception barrier into its own capability; refactor regression test to unit-level (no chore_demo_seeding_integration dependency — uses the existing ctx-pool fallback at demo_reseed.py:82-88). Includes dashboard regen triggered by the idea.md edit (no folder adds/moves — just frontmatter refresh; dashboard hash unchanged in practice but the regen hook fired anyway). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * fix(demo-reseed): add top-level exception barrier + stale-status auto-recovery The home page "Reset to demo state" button enqueued the run_demo_reseed Arq job, but two gap regions in the worker let exceptions escape without writing status="failed" to Redis: - Lines 76-88 (settings load, factory init, Redis acquisition) sat OUTSIDE the outer try block. - Lines 91-133 (get_engine, engine.connect, advisory lock, factory(), httpx.AsyncClient(...)) sat INSIDE the outer try but the block had no except, only a finally to close Redis. When either gap region raised, Arq marked the job JobExecutionFailed but Redis stayed stuck at the POST handler's initial "running" payload indefinitely, leaving the operator's UI at "Scenario 0 of 5 (0%)" and blocking subsequent POSTs with 409 SEED_IN_PROGRESS until a manual Redis cleanup. The inner except (DemoSeedingError, httpx.HTTPError, Exception) at line 150 only catches errors inside reseed_demo_state, not the init regions. Fix per bug_demo_reseed_button_silent_enqueue_failure §"Proposed capabilities": 1. Wrap the entire run_demo_reseed body in `except BaseException` that writes status="failed" with the exception class + first 200 chars of the message, then re-raises. Re-raising preserves Arq's JobExecutionFailed record AND emits a worker-log traceback the operator can read. The inner reseed_demo_state handler keeps its return (no re-raise) because retrying the destructive wipe is the wrong behavior. 2. Acquire Redis FIRST so the barrier can write status even when settings/factory/engine init explodes. Preserves Gemini PR #286 finding #7 (reuse Arq's managed pool from ctx) and finding #8 (only close Redis when we created it ourselves). 3. Add reseed_status_is_stale() helper in backend/app/services/demo_seeding.py — defense-in-depth for the case where the worker process itself dies (OOM, container restart) before any exception handler runs. The POST handler uses it to convert a stuck-running status (started_at older than DEMO_RESEED_JOB_TIMEOUT_S = 1200s) into "treat as failed and proceed" instead of 409. 4. Hoist DEMO_RESEED_JOB_TIMEOUT_S from workers/demo_reseed.py to services/demo_seeding.py so the route handler can read it without importing from the workers package. Worker re-exports for back-compat. Regression tests: - backend/tests/unit/workers/test_demo_reseed_exception_barrier.py (4 tests): get_engine + get_session_factory raising both flip Redis to "failed" and re-raise; ctx-managed Redis stays open; self-created Redis is closed in the finally block. - backend/tests/unit/services/test_reseed_status_is_stale.py (10 tests): timeout boundary (== timeout → not stale, > timeout → stale), idle/complete/failed never stale, missing/malformed started_at conservative-not-stale, naive timestamps treated as UTC. Verified on main: 3 of 4 exception-barrier tests fail (the 4th — "does-not-close-arq-redis" — trivially passes because the bare try/finally never reached the close path either). No DB migration, no env var, no operator action. Existing happy path unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * fix(demo-reseed): narrow exception barrier from BaseException to Exception Per Gemini PR #299 review (medium): catching BaseException intercepts asyncio.CancelledError (Arq's job-timeout cancellation mechanism, a BaseException subclass since 3.8) plus SystemExit/KeyboardInterrupt (worker shutdown). Awaiting status_set from inside a handler that caught one of those would re-raise CancelledError — masking the original — or delay/hang shutdown with network I/O. The documented bug (init-region exceptions: settings load, factory init, get_engine, engine.connect, httpx.AsyncClient construction) is fully covered by Exception — all those failures inherit from it. No behavior change for the regression tests (they raise RuntimeError/ValueError). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * fix(demo-reseed): apply GPT-5.5 review findings (aclose guard, naive-now, docstrings) GPT-5.5 final review of PR #299 — 3 accepted, 1 deferred: - #1 (Medium, accepted): wrap redis.aclose() in the finally block in its own try/except + WARN log. A raise from aclose() would otherwise replace the re-raised original exception (or fail an otherwise-successful job). - #3 (Low, accepted): normalize a naive `now` arg in reseed_status_is_stale() to UTC — an aware-minus-naive subtraction would raise TypeError. Production never passes `now`; this guards callers/tests. + regression test test_naive_now_argument_treated_as_utc. - #4 (Low, accepted): fix stale `BaseException` wording in the worker + test docstrings (code already uses `Exception`). - #2 (Medium, deferred non-regression): stale-recovery check-then-set is non-atomic. Counter-evidence: the deterministic Arq job_id + advisory lock already prevent duplicate runs. Captured as chore_demo_reseed_stale_recovery_atomic_cas/idea.md. Includes dashboard regen triggered by the new chore_ idea folder. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> --------- Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…LSearchParams) - auto-followup.spec.ts getName: add `if (!resp.ok) throw` to match the robust pattern already in studies-create-builder + studies-create-validation getName helpers (Gemini #1). - dashboard-reseed.spec.ts: align API_BASE to `PLAYWRIGHT_API_BASE_URL ?? 'http://127.0.0.1:8000'` — it was the lone spec using NEXT_PUBLIC_API_BASE_URL ?? 'http://localhost:8000', the exact API-base inconsistency this chore targets (Gemini #2). - studies-create-target-dropdown.spec.ts: build the studies?q=&limit= query via URLSearchParams instead of interpolating studyName into the query string — safe even if a future study name carries special chars (Gemini #3). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

…es) (#301) * test(e2e): sweep ${API_BASE} string-concat to URL constructor (28 sites) Per chore_e2e_api_base_url_construction. Gemini flagged the ${API_BASE}/<path> concat pattern on PR #273 and PR #273 fixed the one line it touched, deferring the sibling sweep to a focused PR. Idea-preflight before pickup found the pattern had grown from the originally-censused 5 sites/3 files (2026-05-26) to 28 sites across 10 e2e spec files, as new specs landed with the same convention. The idea's rationale (partial sweep creates inconsistency) only strengthened with the larger N, so this does the full sweep: `${API_BASE}/api/v1/...` → new URL(`/api/v1/...`, API_BASE).toString() `${API_BASE}${path}` → new URL(path, API_BASE).toString() `${API_BASE}${path}?limit=200` → new URL(`${path}?limit=200`, API_BASE).toString() The URL constructor collapses a double slash regardless of whether API_BASE carries a trailing slash — the durable fix for the latent PLAYWRIGHT_API_BASE_URL=...8000/ footgun (CI never sets a trailing slash, so no signal had fired). All variable-path call sites verified to pass /-prefixed paths, so new URL(path, API_BASE) is behaviorally identical. No assertion changes — same endpoints, same behavior. pnpm lint + pnpm typecheck clean. Includes dashboard regen triggered by the idea.md scope-correction edit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * test(e2e): address Gemini PR #301 findings (resp.ok, API_BASE env, URLSearchParams) - auto-followup.spec.ts getName: add `if (!resp.ok) throw` to match the robust pattern already in studies-create-builder + studies-create-validation getName helpers (Gemini #1). - dashboard-reseed.spec.ts: align API_BASE to `PLAYWRIGHT_API_BASE_URL ?? 'http://127.0.0.1:8000'` — it was the lone spec using NEXT_PUBLIC_API_BASE_URL ?? 'http://localhost:8000', the exact API-base inconsistency this chore targets (Gemini #2). - studies-create-target-dropdown.spec.ts: build the studies?q=&limit= query via URLSearchParams instead of interpolating studyName into the query string — safe even if a future study name carries special chars (Gemini #3). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> --------- Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…+ dashboards idea-preflight across the MVP2 bucket + adjudicated 6 Gemini comments on PR #312. Stale-link root cause: ideas moved into planned_features/02_mvp2/<folder>/ during the MVP-bucket reorg without updating relative links; several siblings since shipped to implemented_features/. Link fixes (8 files): shipped siblings -> implemented_features/ homes; comparison.md + feature_templates/README.md depth; truncation_mvp2 -> _truncation; 00_unsure bug_seed link depth. infra_adapter_solr deep audit: removed false HTTPX_POOL_LIMITS claim; removed nonexistent templates/elasticsearch/ reference; added open questions + feat_ubi_onramp coordination. Gemini adjudication (four-quadrant): #1-3 REJECTED as false positives (Gemini miscounted directory depth; flagged links resolve, its suggestions resolve above repo root). #4 ACCEPTED (link CLAUDE.md for Rule #5/#2). #6 ACCEPTED (drop markdown bold from overnight-autopilot first sentence). #5 auto-resolved by upstream _mvp2 fix + regen. Lockstep dashboard regen. Docs only. Zero code/migration. Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

SoundMindsAI merged commit d620fbf into main May 9, 2026

SoundMindsAI mentioned this pull request May 9, 2026

chore(pipeline): add project-wide priority-ordered status mode #3

Merged

6 tasks

SoundMindsAI mentioned this pull request May 10, 2026

chore(dashboard): MVP1 progress dashboard generator + initial output #20

Merged

7 tasks

SoundMindsAI mentioned this pull request May 12, 2026

feat(proposals-ui): /proposals list + /proposals/{id} detail (feat_proposals_ui) #58

Merged

7 tasks

SoundMindsAI mentioned this pull request May 13, 2026

infra(ci): shave ~95s off pr.yml critical path (3 optimizations) #96

Merged

5 tasks

SoundMindsAI mentioned this pull request May 26, 2026

fix: stuck-stack self-rescue bundle (auto-seed + dashboard disclosure) #268

Merged

6 tasks

SoundMindsAI mentioned this pull request May 27, 2026

chore(oss): pre-public-launch repo hygiene bundle — community files + DCO + IA cleanup + positioning + contact #284

Merged

6 tasks

SoundMindsAI mentioned this pull request May 28, 2026

chore(ci): optimize smoke + backend jobs (buildx artifact handoff + image cache + pytest-xdist) #291

Merged

4 tasks

SoundMindsAI mentioned this pull request May 29, 2026

fix(studies-ui): gate CEILING badge on maximize direction #305

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: reorganize into numbered IA, add pipeline skills and feature templates#1

docs: reorganize into numbered IA, add pipeline skills and feature templates#1
SoundMindsAI merged 1 commit into
mainfrom
docs/ia-scaffolding-and-pipeline-skills

SoundMindsAI commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

SoundMindsAI commented May 9, 2026

Summary

What's safe to use today

What's gated on later work

Public-repo consideration

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant