Skip to content

docs: reorganize into numbered IA, add pipeline skills and feature templates#1

Merged
SoundMindsAI merged 1 commit into
mainfrom
docs/ia-scaffolding-and-pipeline-skills
May 9, 2026
Merged

docs: reorganize into numbered IA, add pipeline skills and feature templates#1
SoundMindsAI merged 1 commit into
mainfrom
docs/ia-scaffolding-and-pipeline-skills

Conversation

@SoundMindsAI
Copy link
Copy Markdown
Owner

Summary

  • Reorganize docs into a numbered IA. Relocate the umbrella product spec and MVP1 execution plan to docs/00_overview/product/. Add per-section README index files (docs/00_overview/ through docs/09_decisions/). Update root README.md, CONTRIBUTING.md, and docs/README.md to point at the new locations.
  • Port six pipeline skills to .claude/skills/ (idea-preflight → spec-gen → impl-plan-gen → impl-execute → guide-gen, plus the pipeline orchestrator) from the sibling creator-discovery-outreach project. Each carries a banner explaining what was preserved verbatim as illustrative provenance vs. what needs adaptation as RelyLoop matures.
  • Port the feature templates and reference examples to docs/02_product/planned_features/feature_templates/. The two big templates (feature-spec-template.md, implementation-plan-template.md) carry HTML porting banners with explicit for §X, write Y instead guidance for sections that depend on architecture RelyLoop hasn't adopted yet (audit-events, admin impersonation, multi-tenant scoping, backend paths, migration tool, cross-model review).

32 files changed, +7,355 / −22.

What's safe to use today

  • idea-preflight against any well-formed idea.md
  • spec-gen and impl-plan-gen Generate / Review / Reconcile modes (templates in place)
  • pipeline orchestration up through plan generation
  • The folder-prefix taxonomy (feat_, bug_, chore_, infra_, epic_)

What's gated on later work

  • impl-execute — needs backend/, web/, CI to exist; cross-model review tooling (Gemini PR comments fetch, GPT-5.5 invocation) needs re-wiring
  • guide-gen — dormant until web frontend (MVP3+ per spec roadmap)
  • Audit-events story checks across templates and skills — only become live if RelyLoop adopts an audit-events subsystem

Public-repo consideration

The README states this repo is currently soundminds.ai-internal and will become public when MVP1 ships. The ported skills and example specs reference CDO architectural details verbatim (TENANT_ACTIVITY_ALLOWLIST, audit_event_repo.py, _SENSITIVE_KEYS in log_scrubber.py, internal CDO PR numbers like #183, internal file paths). Decide before merge whether to:

  1. Merge as-is — provenance is preserved and the framing is honest
  2. Scrub CDO file paths and PR numbers from the inline anecdotes (skills' battle-tested rules survive; only the cited incidents change)
  3. Hold the merge until after MVP1 ships (and the repo's public-vs-internal status is settled)

Test plan

  • docs/00_overview/product/relevance-copilot-spec.md and mvp1-execution-plan.md open from the link in README.md
  • .claude/skills/*/SKILL.md banners render cleanly in the IDE preview
  • docs/02_product/planned_features/feature_templates/ — open the spec and plan templates; the HTML porting banner at the top reads as actionable guidance
  • grep -rn creator-discovery-outreach . returns only the porting-banner mentions and example specs (no orphan refs)

🤖 Generated with Claude Code

…mplates

- Move design artifacts into docs/00_overview/product/:
  - relevance-copilot-spec.md (was at repo root)
  - mvp1-execution-plan.md (was at repo root)
- Add per-section README index files (docs/00_overview through docs/09_decisions).
- Update root README.md, CONTRIBUTING.md, and docs/README.md to point at the
  new locations.

- Port six pipeline skills from the sibling creator-discovery-outreach project
  to .claude/skills/ (idea-preflight, spec-gen, impl-plan-gen, impl-execute,
  guide-gen, pipeline). Per-skill banners flag what was kept verbatim as
  illustrative provenance vs. what needs adaptation as RelyLoop matures.
  guide-gen marked dormant until RelyLoop has a web frontend (MVP3+ per spec).

- Port feature templates to docs/02_product/planned_features/feature_templates/
  (idea, feature-spec, implementation-plan, README, examples). HTML porting
  banners on the big templates document conditional sections (audit-events,
  admin impersonation, tenant scoping, backend paths, migration tool,
  cross-model review) — strip the banner and adapt when authoring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@SoundMindsAI SoundMindsAI merged commit d620fbf into main May 9, 2026
SoundMindsAI added a commit that referenced this pull request May 9, 2026
…n.md

GitHub Free + private repo allows rulesets to be CREATED but not
ENFORCED — surfaced as "Your rulesets won't be enforced on this
private repository until you move to GitHub Team organization account"
when SoundMindsAI tried to follow Procedure A on the relyloop repo.

Adds a plan × visibility matrix and the three triggers that actually
warrant an upgrade to Team ($4/user/month). For pre-public solo work,
the agent's CLAUDE.md Rule #1 enforcement + visible CI status in the
PR UI substitutes for the missing merge-block — no plan upgrade needed
until a contributor joins or the repo goes public (which auto-activates
the dormant ruleset for free).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 10, 2026
Adds a `mvp1-dashboard-regen` pre-commit hook that runs
`scripts/build_mvp1_dashboard.py` whenever a file under
`docs/02_product/planned_features/`, `docs/00_overview/implemented_features/`,
or the generator script itself is staged. Standard pre-commit
"files were modified" abort UX — same as ruff-format / prettier.

Why pre-commit instead of a literal git post-merge hook: a post-merge
hook would regenerate the dashboard locally after `git pull` and require
a direct commit to main to land it, violating CLAUDE.md Absolute Rule #1.
The pre-commit hook makes dashboard freshness a *PR-merge invariant*
(every PR that touches features ships the regenerated dashboard
alongside the change), which is the right shape for this project's
"feature branch + PR" workflow.

Idempotency fix: the dashboard's "as-of" stamp now uses the most-recent
mtime of any feature-folder .md file (or the generator script itself)
instead of `datetime.now()`. Prevents per-commit timestamp churn that
would otherwise create spurious dashboard diffs on every unrelated PR.
Verified by running `make dashboard` twice in a row → byte-identical
output.

Header text reworded from "Generated …" to "Reflects feature-folder
state as of …" so the semantic is clear: this is a snapshot of folder
state, not a transient build artifact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 10, 2026
…20)

* chore(dashboard): MVP1 progress dashboard generator + initial output

Adds `scripts/build_mvp1_dashboard.py` (stdlib-only) that walks
`docs/02_product/planned_features/` + `docs/00_overview/implemented_features/`,
parses each feature's idea / spec / plan / pipeline_status artifacts, and
writes a single self-contained HTML dashboard at
`docs/00_overview/mvp1_dashboard.html` with:

* KPI row: Features done (X / 12 scoped MVP1), Path to MVP1 (remaining
  items), Open bugs, Open chores. Plus a secondary line showing backlog
  ideas (idea-only feat_/infra_ folders not yet scoped) and in-flight
  count.
* Kanban view: Idea / Spec / Plan / Implementing / Done columns; type-
  colored cards (feat = indigo, infra = amber, chore = slate, bug = red,
  epic = blue); filter chips by type.
* Dependency DAG: Mermaid `graph LR` of scoped feat_/infra_/chore_ nodes
  with stage-colored classes and edges parsed from each spec's
  "Depends on" line.
* Per-card detail: name (linked to feature_spec.md), one-liner extracted
  from `**Outcome:**` (spec) or `## Problem` (idea), PR# + merged date
  for shipped features, deferred-phase note for partial completions.

Source of truth is the folder structure, not `mvp1-user-stories.md`
(which stays as the human-curated narrative). Regenerate after any
spec/plan/pipeline_status edit with `make dashboard`. The script strips
trailing whitespace from output lines to match the project's pre-commit
hook so subsequent regenerations are idempotent.

Build/CI changes:
* `pyproject.toml`: extend ruff per-file-ignores so `scripts/**` skips
  the `D` (pydocstyle) rule — same rationale as `backend/tests/`,
  operator tooling not library code.
* `Makefile`: new `dashboard` target shows up in `make help`.

Initial dashboard snapshot: 2 / 12 MVP1 features done (17%); 14 items
remaining (10 features + 2 bugs + 2 chores); feat_study_lifecycle
in-flight (Phase 1 done, Phase 2 deferred).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(dashboard): pre-commit hook + idempotent timestamp

Adds a `mvp1-dashboard-regen` pre-commit hook that runs
`scripts/build_mvp1_dashboard.py` whenever a file under
`docs/02_product/planned_features/`, `docs/00_overview/implemented_features/`,
or the generator script itself is staged. Standard pre-commit
"files were modified" abort UX — same as ruff-format / prettier.

Why pre-commit instead of a literal git post-merge hook: a post-merge
hook would regenerate the dashboard locally after `git pull` and require
a direct commit to main to land it, violating CLAUDE.md Absolute Rule #1.
The pre-commit hook makes dashboard freshness a *PR-merge invariant*
(every PR that touches features ships the regenerated dashboard
alongside the change), which is the right shape for this project's
"feature branch + PR" workflow.

Idempotency fix: the dashboard's "as-of" stamp now uses the most-recent
mtime of any feature-folder .md file (or the generator script itself)
instead of `datetime.now()`. Prevents per-commit timestamp churn that
would otherwise create spurious dashboard diffs on every unrelated PR.
Verified by running `make dashboard` twice in a row → byte-identical
output.

Header text reworded from "Generated …" to "Reflects feature-folder
state as of …" so the semantic is clear: this is a snapshot of folder
state, not a transient build artifact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(dashboard): emit GitHub-native Markdown alongside the HTML

Reason: HTML files don't render inline in GitHub's web UI (the .html
shows as raw source when clicked from the repo browser). The user asked
for a view they can read on github.com directly, without enabling
Pages or going through htmlpreview.github.io.

This commit makes `make dashboard` emit two files:

* `docs/00_overview/mvp1_dashboard.html` — rich local view (filter
  chips, type-color cards, sticky header). Open locally in a browser.
* `docs/00_overview/MVP1_DASHBOARD.md` — GitHub-native view (KPI
  table, stage-per-section tables, Mermaid graph). Renders inline when
  browsing github.com. GitHub markdown supports Mermaid blocks since
  Feb 2022, so the dependency DAG renders too.

The Markdown intentionally trades visual richness for native rendering:
no filter chips (no JS), no per-cell colors (no CSS), but the
information architecture is identical and the cross-link to the HTML
view is one click away.

Two PR-extraction bugs caught during the rewrite:

1. `_extract_pr_number()` was returning `matches[-1]` (last `PR #N`
   reference), which picked up dependency cites in implementation_plan.md
   instead of the feature's own PR (e.g., `infra_adapter_elastic` was
   showing PR #4 because the plan cites `infra_foundation`'s PR #4 as
   a dep). Fix: prioritize the `## Implement` section of
   pipeline_status.md, then the plan's `**Status:**` header, then a
   `merged`-context match across all sources, then first-match. Also
   broadened the regex to accept `PR: [#N]` markdown-link format
   (which was failing the `PR\s*#` shape).

2. Markdown relative links were emitting `../docs/02_product/...` when
   they should have been `../02_product/...` (the markdown lives at
   `docs/00_overview/`, so `..` already lands at `docs/`). Fix: use
   `os.path.relpath(target, OUTPUT_MD.parent)` instead of constructing
   the path manually.

Verified: both files idempotent across consecutive `make dashboard`
runs; PR numbers match git log + CLAUDE.md (#4 / #16 / #18); planned-
features links resolve correctly from `docs/00_overview/`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 12, 2026
…cycle 2)

Root cause of the polling-reconciler + webhook receiver test failures
on CI cycle 1:

* ``config_repos`` rows persist across tests (``_clean_phase2_tables``
  deliberately doesn't wipe operator-managed config_repos).
* Both test files reused the SAME ``(owner, repo)`` across every test
  (``octocat/configs`` / ``octocat/hello-world``).
* After test #1, two ``config_repos`` rows now match the same
  ``(owner, repo)``. ``lookup_config_repo_by_owner_repo`` returns the
  FIRST match — test #1's row — whose ``auth_ref`` /
  ``webhook_secret_ref`` point at test #1's already-deleted
  ``tmp_path``.
* Surfaced as ``pr_reconcile_pat_missing`` (reconciler) and
  403 ``INVALID_SIGNATURE`` (webhook receiver) for every test after
  the first.

Fix: each test's fixture now generates a unique ``(owner, repo)`` pair
keyed on a random hex suffix. ``_pull_request_body`` was lifted to
take ``owner`` and ``repo_name`` as explicit args so it doesn't close
over the module-level constants (which no longer exist).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 12, 2026
…to-register (#56)

* feat(github-webhook): migration 0006 + relyloop_pr_poll_minutes setting (Story 1.1)

* Add Alembic migration 0006_proposals_pr_url_idx — partial B-tree index
  `proposals(pr_url) WHERE pr_url IS NOT NULL`, with matching downgrade.
  Round-trips cleanly per CLAUDE.md Rule #5.
* Add Settings.relyloop_pr_poll_minutes (default 15, ge=1, le=1440). Story
  3.1 will narrow to a whitelist of cron-expressible values.
* Document RELYLOOP_PR_POLL_MINUTES in .env.example.
* New integration test asserts the partial-WHERE predicate via pg_indexes
  and verifies the downgrade-then-upgrade round-trip.
* New unit test exercises the Settings field's default + env override +
  ge/le bounds (1 + 1440 accepted; 0, -1, 1441 raise ValidationError).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(github-webhook): HMAC verifier + repository.full_name parser (Story 1.2)

Two pure-domain helpers in backend/app/domain/git/:

* verify_webhook_signature(body, signature_header, secret) — HMAC-SHA256
  verification of GitHub's X-Hub-Signature-256 header. Uses
  hmac.compare_digest for constant-time comparison. Refuses empty secret
  (no unsigned acceptance) and rejects sha1= / missing-prefix variants.
* parse_repository_full_name(value) — parses the canonical owner/repo
  short form GitHub emits in webhook payloads. HTTPS URLs / SSH URLs /
  enterprise-host inputs all return None — spec FR-1 pairs this with
  validate_repo_url on the config_repos.repo_url side, NOT a duplicated
  URL regex.

Plus a cross-parser parity test asserting that for canonical inputs the
two parsers produce comparable (owner, repo) tuples after case-folding,
and that the SSH / enterprise-host spec §14 negative cases are rejected
by both.

36 unit tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(github-webhook): pure-domain event dispatcher (Story 1.3)

dispatch_event(event_type, payload) → WebhookDecision encapsulates the
FR-1 routing matrix:
* ping → action=ping, mutation=none
* pull_request closed+merged=true → mutation=merged
* pull_request closed+merged=false → mutation=closed (status stays pr_opened)
* pull_request reopened → mutation=reopened
* pull_request other actions → action=noop, mutation=none
* unknown event type → action=noop

The dispatcher NEVER emits action="unknown_pr" — that string is router-
owned, set after lookup_proposal_by_pr_url misses. The action Literal is
narrowed to {"applied", "noop", "ping"} so static typing enforces the
contract (cross-model review F2).

WEBHOOK_ACTION_VALUES + HANDLED_EVENT_TYPES frozensets are the spec §8.4
source-of-truth, re-exported from the domain package and (in Story 2.1)
from the router module so the cite at backend/app/api/webhooks/github.py
also passes a static grep.

28 unit tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(github-webhook): proposal + config_repo repo functions (Story 1.4)

5 new proposal repo functions (mark_proposal_pr_merged/closed/reopened,
lookup_proposal_by_pr_url, list_pr_opened_proposals_for_reconcile) plus
2 new config_repo repo functions (set_webhook_registration_error,
lookup_config_repo_by_owner_repo). All conditional UPDATEs return None
on the zero-row match so the webhook receiver + polling reconciler can
log benignly and skip — mirrors the cycle-3 F4 pattern from
mark_proposal_pr_opened (feat_github_pr_worker Story 1.1).

- proposal.py: 5 new functions + Sequence/datetime/UTC/timedelta imports
- config_repo.py: 2 new functions + validate_repo_url import for the
  owner/repo canonicalisation loop (skip-on-UnsupportedProviderError
  for historic non-GitHub URLs)
- __init__.py: re-export all 7
- 11 integration tests in test_proposal_repo_webhook.py covering happy
  paths + idempotent no-ops + 90-day reconcile cutoff
- 6 integration tests in test_config_repo_repo_webhook.py covering
  set/clear cycle + dot-git suffix + case-insensitive owner/repo lookup

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(github-webhook): extract shared github_request helper (Story 1.5)

Generalises _github_post (POST-only, feat_github_pr_worker) to
method-agnostic github_request(client, method, url, *, json_body, token)
in the canonical backend/app/git/ namespace (per CLAUDE.md repo
structure). The polling reconciler (Story 3.1, GET /pulls/{n}) and
the register-webhook worker (Story 4.1, GET + POST /hooks) share the
retry contract; existing open_pr POST call sites unchanged in behavior.

- backend/app/git/__init__.py + github_client.py: github_request + 4
  public inspection helpers (parse_retry_after, is_secondary_rate_limit,
  body_mentions_rate_limit, parse_rate_limit_reset) + 4 retry-policy
  constants. Method routed via httpx AsyncClient.request(verb, ...).
- backend/workers/git_pr.py: drop the inlined _github_post + 4 inspection
  helpers + 4 retry constants (range 593..690 + 116..124); import from
  the new module. Replace 2 _github_post() call sites with
  github_request(client, "POST", ...). 88 LOC removed.
- test_git_pr_helpers.py: redirect 8 header-parser tests to the new
  public symbols via import-as aliases; remove 5 _github_post retry
  tests now covered by the new test_github_client.py (parametrised
  over GET + POST).
- backend/tests/unit/git/test_github_client.py: 23 new cases via
  httpx.MockTransport — happy path / 5xx retry / 429 Retry-After /
  403 retry-after / 403 secondary-rate-limit / 403 abuse body /
  403 terminal / 404 terminal / RequestError retry + propagate /
  5xx-budget-exhausted-returns-last / token+version headers — each
  parametrised over GET + POST.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(github-webhook): POST /webhooks/github router + handler (Story 2.1)

POST /webhooks/github mounts unprefixed (per CLAUDE.md Rule #6 webhook
exception). 9-step contract: raw body → repository.full_name parse →
config_repo lookup → mounted secret read → HMAC verify → dispatch →
proposal lookup (mutation branch) → conditional repo mutation →
structured webhook_received log + 200 response.

- backend/app/git/secrets.py: hoisted read_mounted_secret() from
  workers/git_pr._read_pat. RELYLOOP_SECRETS_DIR env override; same
  containment / directory / OSError contract. Consumed by the webhook
  router AND queued for Stories 3.1 + 4.1.
- backend/app/api/webhooks/{__init__,github}.py: new router. Re-exports
  WEBHOOK_ACTION_VALUES so spec §8.4's grep cite at this path passes.
  Single error code: INVALID_SIGNATURE (403) — covers signature
  mismatch AND unknown-repo so the receiver doesn't reveal enumeration.
  Router-only "unknown_pr" override when lookup_proposal_by_pr_url
  returns None after a mutation-requesting decision (dispatcher
  contract: action ∈ {applied, noop, ping}; "unknown_pr" is wire-only).
- backend/app/main.py: include_router unprefixed.
- backend/tests/integration/test_webhook_github.py: 13 cases covering
  merged / closed-unmerged / reopened / 5 noop pull_request actions /
  unknown event type / ping / unknown_pr URL / bad signature / missing
  signature / unknown repo / structured-log assertion. Consolidated
  into one file (plan listed 9; one file keeps the shared
  sign-and-seed helpers honest).
- backend/tests/contract/test_webhook_api_contract.py: 4 assertions —
  OpenAPI registration, WEBHOOK_ACTION_VALUES re-export, single-code
  negative grep, no-secret-in-log static grep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(github-webhook): reconcile_pr_state cron worker (Story 3.1)

Polling reconciler for stale pr_opened proposals (FR-2 + AC-3 + AC-8).
The cron tick lists pr_opened+open proposals < 90 days old, calls
GET /repos/{o}/{r}/pulls/{n} via the Story 1.5 shared github_request,
and applies the matching mark_proposal_pr_* on the response.

- backend/workers/pr_reconcile.py: reconcile_pr_state coroutine.
  Per-proposal: parse pr_url → look up config_repo via
  lookup_config_repo_by_owner_repo (Story 1.4) → read PAT via
  read_mounted_secret (Story 2.1 hoist) → GET pulls/{n}. Branches on
  200/merged, 200/state=closed, 200/state=open. 401/403/404/5xx and
  RequestError after retry budget → WARN + skip. 429 → WARN + break
  the proposal loop for this tick (spec §10).
  Also exports SUPPORTED_POLL_MINUTES (18 values) +
  _poll_cron_kwargs() — the divisor-of-60 / multiple-of-60 routing
  the cron-arg builder uses.
- backend/app/core/settings.py: tighten relyloop_pr_poll_minutes
  via @field_validator referencing SUPPORTED_POLL_MINUTES — operator
  sees the rejection at boot, not at first cron tick.
- backend/workers/all.py: register `cron(reconcile_pr_state,
  **_poll_cron_kwargs())` on WorkerSettings.cron_jobs.
- backend/tests/unit/workers/test_poll_cron_kwargs.py: 20 cases —
  every value in the whitelist (12 sub-hourly + 6 multi-hour),
  fallback for unsupported value, size sanity check.
- backend/tests/integration/test_polling_reconciler.py: 10 cases
  via httpx.MockTransport — merged/closed-unmerged/still-open happy
  paths, 5 terminal-error responses (401/403/404/500/503),
  RequestError-after-budget, 429-short-circuit, 50-proposal AC-8
  budget assertion.
- backend/tests/unit/test_workers.py: cron_jobs registration sanity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(github-webhook): register_webhook Arq worker (Story 4.1 / FR-3)

Idempotent GitHub webhook creation enqueued by POST /api/v1/config-repos
(Story 4.2) for every newly registered config_repo whose
webhook_secret_ref is non-NULL.

- backend/workers/register_webhook.py: register_webhook coroutine.
  Steps: load config_repo → validate_repo_url → read PAT + secret →
  build hook URL from Settings.relyloop_base_url → GET /hooks scan for
  matching config.url → POST /hooks if absent. Per-class error
  messages on 404 (PAT scope), 422 (validation), 5xx (transient),
  RequestError (network) → set_webhook_registration_error +
  return {"status":"failed"} (never raises into Arq's retry loop —
  the column is the durable signal).
- backend/workers/all.py: register in WorkerSettings.functions.
- backend/tests/integration/test_register_webhook_worker.py: 6
  AC-mapped cases via httpx.MockTransport — happy create, dedup
  existing, 404 PAT-scope, 422 validation, 503 transient, network
  RequestError. Each asserts both the return status AND the
  webhook_registration_error column shape.
- backend/tests/unit/test_workers.py: extend the expected-function
  set so worker_settings_importable sees register_webhook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(github-webhook): post-commit enqueue register_webhook (Story 4.2)

POST /api/v1/config-repos now best-effort enqueues register_webhook
(Story 4.1) after the DB commit when the new row has
webhook_secret_ref populated. Enqueue failure (Redis down, pool
absent, transient blip) does NOT break the 201 — it logs WARN and
the operator drives recovery via the runbook.

- backend/app/api/v1/config_repos.py: add request: Request, pull the
  Arq pool via getattr(request.app.state, "arq_pool", None) per the
  established pattern (proposals.py:516, studies.py:167; the
  Depends(get_arq_pool) factory doesn't exist in the codebase).
  Two WARN log events on the failure paths:
  register_webhook_enqueue_skipped_no_pool and
  register_webhook_enqueue_failed.
- backend/tests/integration/test_config_repos_extension.py: 6 cases —
  response-shape preservation, enqueue called when secret_ref set,
  enqueue not called when null, pool absent → 201 + WARN, enqueue
  raises → 201 + WARN, static-grep against get_arq_pool factory drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(github-webhook): runbook + US-20/21 implemented + tracker (Story 4.3)

- docs/03_runbooks/webhook-debugging.md: 7-section operator playbook —
  inspect GitHub delivery panel, redeliver, log-line verification,
  rotate webhook secret, force-reconcile a specific proposal,
  diagnose silent cron, register_webhook per-class error triage +
  manual re-enqueue.
- docs/03_runbooks/README.md: index the new runbook; drop the
  "Coming with later features" stub for webhook-replay.md (folded
  into webhook-debugging.md §2 Redeliver).
- docs/02_product/mvp1-user-stories.md: US-20 + US-21 flipped to
  Implemented.
- implementation_plan.md: tracker — every story marked complete with
  commit SHA.

state.md + CLAUDE.md flips land in the finalization commit after the
PR number is known.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(github-webhook): state.md + CLAUDE.md + tangential idea

- state.md: flip current-branch + Alembic head (0005 → 0006) + add
  recent-changes entry summarizing all 10 stories + tangential capture.
- CLAUDE.md: flip feature-status row 8 + redirect "GitHub webhook
  setup" runbook pointer at the new webhook-debugging.md.
- bug_test_smoke_requires_env_vars/idea.md: capture pre-existing
  failure where test_smoke.py::test_app_import requires
  DATABASE_URL_FILE + POSTGRES_PASSWORD_FILE env vars. CI passes; only
  local-dev `pytest` runs hit it. 5-minute fixture fix.
- docs/00_overview/MVP1_DASHBOARD.md + mvp1_dashboard.html: regenerated
  by the mvp1-dashboard-regen pre-commit hook after the row-8 flip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(github-webhook): CI failures + GPT-5.5 review fixes

CI cycle 1 fixes:

1. structlog `event=` kwarg conflict (TypeError: BoundLogger.warning()
   got multiple values for argument 'event') — rename to `gh_event=`
   in 5 webhook-router log call sites. structlog reserves `event` for
   the positional event name.
2. Migration head expectation in test_migrations.py: 0005 → 0006
   (head advanced when feat_github_webhook Story 1.1 landed 0006).
3. test_digests_migration.py::test_downgrade_drops_digests_table:
   retarget to explicit `downgrade 0004` (was `-1`, which now drops
   the 0006 partial index instead of the digests table).

GPT-5.5 final-review accepts (cycle 1, 2 of 3 findings):

* F2 (High) — non-object JSON payload (`[]`, `"foo"`, `null`) reached
  `payload.get(...)` → AttributeError → 500. Added
  `isinstance(parsed_body, dict)` guard with a 403 INVALID_SIGNATURE
  response (matches the invariant: INVALID_SIGNATURE is the ONLY
  error code the webhook router emits). New regression test:
  `test_webhook_non_object_json_returns_403`.
* F3 (Medium) — `pull_request.merged=true` with `merged_at=null`
  (GitHub eventual-consistency) would hit
  `assert decision.pr_merged_at is not None` → 500. Router now falls
  back to `mark_proposal_pr_closed` (PR is no longer open; reconciler
  catches up on merged_at next tick). New regression test:
  `test_webhook_pr_merged_with_missing_merged_at_does_not_500`.

GPT-5.5 F1 (High — "Settings validator + _poll_cron_kwargs fallback
are redundant") rejected with cited counter-evidence: plan §11 row 16
explicitly tightens the Settings field validator (Story 3.1
coordinated spec patch). The _poll_cron_kwargs WARN+fallback at
pr_reconcile.py:229 is belt-and-suspenders for hot-reload / in-process
Settings mutation. Both layers are plan-conformant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(github-webhook): test isolation — unique owner/repo per test (CI cycle 2)

Root cause of the polling-reconciler + webhook receiver test failures
on CI cycle 1:

* ``config_repos`` rows persist across tests (``_clean_phase2_tables``
  deliberately doesn't wipe operator-managed config_repos).
* Both test files reused the SAME ``(owner, repo)`` across every test
  (``octocat/configs`` / ``octocat/hello-world``).
* After test #1, two ``config_repos`` rows now match the same
  ``(owner, repo)``. ``lookup_config_repo_by_owner_repo`` returns the
  FIRST match — test #1's row — whose ``auth_ref`` /
  ``webhook_secret_ref`` point at test #1's already-deleted
  ``tmp_path``.
* Surfaced as ``pr_reconcile_pat_missing`` (reconciler) and
  403 ``INVALID_SIGNATURE`` (webhook receiver) for every test after
  the first.

Fix: each test's fixture now generates a unique ``(owner, repo)`` pair
keyed on a random hex suffix. ``_pull_request_body`` was lifted to
take ``owner`` and ``repo_name`` as explicit args so it doesn't close
over the module-level constants (which no longer exist).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(github-webhook): second migration-head assertion missed in cycle 1

test_migrations.py::TestBaselineMigration::test_round_trip still
asserted ``0005``. My cycle-1 fix only updated the comment block
without the literal value. Both assertions now check ``0006``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 12, 2026
…ist poll + missing DoD tests

GPT-5.5 final cross-model review on PR #58 returned 5 findings (0 High,
3 Medium, 2 Low). Adjudication:

Finding #1 (Medium) — pr_open_error alert missing from header area.
REJECTED with cited counter-evidence: the plan's note that the panel
alert "duplicates here for visibility" is speculative, not a requirement.
The PrPanel placement is action-adjacent (the canonical UX); duplicating
the alert in the header produces redundant visual noise without
additional information.

Finding #2 (Medium) — flip-off effect missing.
REJECTED with cited counter-evidence: the derived effectivePollingFlag
correctly masks all observable behavior. fireOpenPR's
clear-existing-before-new pattern means the safety timer is never
orphaned; the 60s timeout always reaches a definite cleanup path. Re-
adding the watching useEffect would re-trigger the React 19
react-hooks/set-state-in-effect rule that the cycle-3 patch
deliberately avoided.

Finding #3 (Low) — list refetchInterval doesn't respect source filter.
ACCEPTED: FR-1 says "any VISIBLE row". The refetchInterval predicate
now applies the same sourceFilter check as visibleRows so a hidden
study-sourced pr_opened+open row doesn't keep the page polling for
invisible state.

Finding #4 (Low) — list-page tests don't assert wire params after URL update.
DEFERRED as chore_proposals_list_wire_param_e2e_test/idea.md. The static
useSearchParams mock doesn't re-render on router.replace, so the
existing tests assert the intent (router.replace called with the right
URL) but not the wire round-trip. Behavior is structurally correct.

Finding #5 (Medium) — 60s safety cap test + Open-PR toast contract test missing.
ACCEPTED: both were explicit Story 3.2 DoD items that didn't land.

- 60s safety cap test: simulates a worker that never writes back; uses
  vi.useFakeTimers to advance 65s and asserts polling stops (cumulative
  ticks bounded at ~21, no further ticks in the subsequent 30s window).
- Open-PR error-toast contract test: instantiates a custom MutationCache
  with a synchronous onError spy, fires 3 sequential clicks against msw
  handlers returning GITHUB_NOT_CONFIGURED, CLUSTER_HAS_NO_CONFIG_REPO,
  QUEUE_UNAVAILABLE. Errors marked retryable=false so the api-client's
  4-attempt 503 backoff doesn't compound test time (retry behavior is
  exercised by api-client.test.ts). Asserts err.errorCode reaches the
  global handler for all 3 codes per GPT-5.5 cycle-1 B6.

MVP1 dashboard regenerated by pre-commit hook.

Test count: 171 (was 169) — 32 files all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 12, 2026
…oposals_ui) (#58)

* docs(feat-proposals-ui): idea-preflight ground-truth pass against shipped deps

§2 + §5 rewritten past-tense (all four deps shipped between 2026-05-09 and 2026-05-12);
§7.4 Literal paths corrected to backend/app/api/v1/schemas.py (the wire-shape source of
truth) instead of backend/db/models/proposal.py; FR-6 extended to preserve the existing
useProposalForStudy hook (consumed by feat_studies_ui's study-detail Open-PR button) and
to narrow useProposals.filter.status from string to ProposalStatus; FR-2/FR-3 polling
cadences disambiguated (3s post-click vs 30s steady-state); §11 added duplicate-Open-PR
and concurrent-merge-during-reject edge cases.

MVP1 dashboard regenerated by pre-commit hook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(feat-proposals-ui): impl-plan-gen + GPT-5.5 review (3 cycles, 18 findings)

Implementation plan generated from the approved feature spec. Single-phase
frontend-only feature: 7 stories across 4 epics building two new Next.js
routes (/proposals list + /proposals/{id} detail) on top of the already-shipped
backend endpoints from feat_digest_proposal, feat_github_pr_worker, and
feat_github_webhook.

Cross-model review converged in 3 cycles (cap per impl-plan-gen protocol):
- Cycle 1: 11 findings (2 High / 6 Medium / 3 Low) — all applied. Most
  consequential: lift useOpenPR from PrPanel to the page (A1) so the page-
  level refetchInterval can read isPending; replace 3s-polling-tied-to-
  isPending with a postOpenPrPolling page-state flag + 60s safety cap (B1);
  strip ?action=open_pr via router.replace after auto-trigger (B2).
- Cycle 2: 4 follow-on findings (2 High / 2 Medium) — all applied. Wired
  fireOpenPR() helper through both manual click + auto-trigger paths (A1);
  validated URL ?status= against PROPOSAL_STATUS_VALUES before passing to
  the now-narrowed useProposals hook (A2); added <Suspense> wrapping for
  useSearchParams on the detail route (B1); added safety-timer unmount
  cleanup (B2).
- Cycle 3: 3 cleanup findings (0 High / 2 Medium / 1 Low) — all applied.
  Failure-mode catalog rewritten to match the fixed polling design (A1);
  destructured stable primitives (mutateOpenPR, proposalStatus,
  proposalPrOpenError) before effects to keep dep identities stable (B1);
  UI element inventory aligned with the PrPanel contract (B2).

MVP1 dashboard regenerated by pre-commit hook.

Plan is Ready for Execution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(proposals-ui): extend lib/api/proposals.ts hook surface (Story 1.1)

Adds 3 new TanStack Query hooks + extends 1 existing hook + narrows 1 type
to fill the FR-6 contract documented in feat_proposals_ui:

- useProposals(filter, options?): narrowed filter.status from string to
  ProposalStatus (compiler now rejects /proposals?status=invented); accepts
  an optional refetchInterval (function form) so the list page can fire a
  30s pulse-refetch when any visible row is pr_opened+open (FR-1). Bare
  useProposals(filter) signature preserved.
- useProposalForStudy(studyId): unchanged byte-for-byte — consumed by
  feat_studies_ui's study-detail DigestPanel for the Open-PR button.
- useProposal(id, options?): new; accepts refetchInterval function form
  for FR-2 30s steady-state + FR-3 3s post-click cadences.
- useOpenPR(): new; POSTs to /api/v1/proposals/{id}/open_pr; onSettled
  invalidates ['proposal', id] + ['proposals'] so active queries refetch.
- useRejectProposal(): new; POSTs to /reject; same invalidation set
  delivers the spec §11 "refresh detail on 409" behavior without
  special-case error-code branching.

Test coverage: 9 vitest cases (1 typed-status param, 2 refetchInterval
on/off, 1 useProposalForStudy regression, 2 useProposal, 1 useOpenPR
invalidation via msw handler hit counts per GPT-5.5 cycle-1 B5, 2
useRejectProposal incl. 409 INVALID_STATE_TRANSITION refresh).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(proposals-ui): filter-chip + cluster-select primitives (Story 1.2)

Three new presentational components consumed by the /proposals list page:

- ProposalStatusFilterChips: mirrors study-status-filter-chips.tsx shape.
  Wire values sourced from PROPOSAL_STATUS_VALUES (canonical allowlist in
  ui/src/lib/enums.ts:101 — verified against backend ProposalStatusWire at
  backend/app/api/v1/schemas.py:659). Unknown URL values fall back to "all"
  active (per Story 2.1 task 2 — defends the FR-1 chip when the URL is
  rewritten externally).
- ProposalSourceFilterChips: client-only 3-value chip set (all/study/manual).
  Source is derived from proposal.study_id non-null vs null; values DO NOT
  belong in enums.ts (no wire contract). Inline comment notes the
  client-side scope.
- ClusterFilterSelect: native <select> populated by useClusters({ limit: 200 }).
  Disabled while loading. MVP1 assumes <10 clusters per installer; the
  chore_cluster_filter_full_list idea file will be referenced when that
  ceiling no longer holds.

Test coverage: 7 vitest cases — 4 status-chip render/click/null/unknown +
2 source-chip render/click + 1 cluster-select load/select/clear.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(proposals-ui): /proposals list page + ProposalsTable (Story 2.1)

Adds the /proposals route — a Suspense-wrapped Next 16 client component that
composes three filter primitives (status chips with URL-backed state, source
chips client-side, cluster select) over a cursor-paginated table of
proposals. Implements FR-1 in full:

- Status filter is URL-backed via useSearchParams + router.replace; cursor
  stack resets on every status/cluster change.
- URL ?status= is validated against PROPOSAL_STATUS_VALUES before being
  passed to the now-narrowed useProposals hook. /proposals?status=invented
  is silently ignored (chip group falls back to "all" active) — never sent
  to the backend, never produces a 422 (GPT-5.5 cycle-2 A2).
- Source filter (study / manual / all) is client-side: visibleRows is
  derived from rows + sourceFilter via useMemo. Backend has no ?source=
  param; the chore_proposals_source_filter_server_side idea (filed in
  Story 4.1) tracks the server-side follow-up.
- Cluster filter consumes the existing useClusters hook with limit=200.
- 30s pulse-refetch fires when ANY visible row is pr_opened+open (FR-1).
  The function-form refetchInterval reads the latest query state on each
  tick and returns false when no eligible row remains, so idle pages stop
  hitting the backend.

ProposalsTable renders 7 columns: source (study link or "manual" + view
link), cluster name, template name + version, status badge, pr_state
badge, metric_delta (parsed from JSONB { primary, baseline, best,
delta_pct }), created_at. Empty state and backend-unreachable both render
the shared <EmptyState> primitive.

Test coverage: 11 vitest cases (6 list page covering AC-6 filter +
pagination, invalid-URL-status, client-side source filter, 30s pulse +
5 table render cases).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(proposals-ui): /proposals/[id] detail page shell (Story 3.1)

Adds the read-only detail page slot: a Suspense-wrapped Next 16 client
view that fetches via useProposal and renders five sections — back link,
header, optional pr_open_error Alert, config-diff table, metric-delta
panel, and (when present) suggested follow-ups. The PR-action panel and
Reject dialog slots are reserved for Stories 3.2 and 3.3 respectively.

New components:
- ProposalHeader — status + pr_state badges, created_at, cluster/template/
  study inline summary mirroring studies/study-header.tsx spacing.
- ProposalErrorAlert — styled <div role="alert"> for FR-2 (only renders
  when status='pending' AND pr_open_error populated; defensive against
  stale errors after the worker advances to pr_opened).
- ConfigDiffPanel — sortable 3-column table over the JSONB diff dict,
  handling both the canonical [before, after] tuple shape from
  feat_digest_proposal AND flat values for manual proposals.
- SuggestedFollowupsPanel — bulleted list with per-item "Create study
  from this hypothesis" Link to /studies?hypothesis=<encoded> (FR-5).

The detail page itself is wrapped in <Suspense> per GPT-5.5 cycle-2 B1
because Story 3.2 adds useSearchParams() to the inner view; landing the
boundary in Story 3.1 keeps the page shape stable across stories.

Test coverage: 6 detail-page tests covering the happy path, AC-4
pr_open_error alert, defensive-no-alert when status=pr_opened, 404
PROPOSAL_NOT_FOUND empty state, null-metric-delta fallback, and
null-digest skip-followups branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(proposals-ui): PR panel + page-owned useOpenPR with postOpenPrPolling (Story 3.2)

Closes the FR-2 + FR-3 + auto-trigger loop on /proposals/{id}.

Page-side wiring (ui/src/app/proposals/[id]/page.tsx):
- useOpenPR is now called in the page, not in PrPanel. Mutation passed as
  (mutate, isPending) props. The lift is load-bearing because the page-
  level refetchInterval and the ?action=open_pr auto-trigger both need to
  read the same mutation instance (GPT-5.5 cycle-1 A1).
- A fireOpenPR helper wraps openPr.mutate's onSuccess to flip a
  postOpenPrPolling state flag ON and install a 60s safety setTimeout.
  Both the manual click path AND the auto-trigger path go through this
  helper — no direct openPr.mutate calls remain (GPT-5.5 cycle-2 A1).
- The 3s polling cadence is keyed on the postOpenPrPolling flag AND the
  latest p.status/p.pr_open_error inside refetchInterval. The flag flips
  OFF only via the 60s timer or unmount; the cadence flips off inline
  inside the refetchInterval function when status leaves pending or
  pr_open_error is populated. This avoids the React 19
  react-hooks/set-state-in-effect rule (correctly flags setState-in-
  watching-effect as cascading-render smell) while preserving the spec
  FR-2/FR-3 behavior.
- effectivePollingFlag = postOpenPrPolling && pending && !pr_open_error
  is the derived button-disable signal passed to PrPanel — masks lingering
  true state when conditions warrant so the operator can re-click retry.
- Unmount cleanup useEffect clears the safety timer (GPT-5.5 cycle-2 B2).
- ?action=open_pr auto-trigger fires fireOpenPR once per mount, guarded
  by a useRef boolean. After firing, router.replace strips the action
  param so a remount/back-nav with the same URL does NOT re-fire
  (GPT-5.5 cycle-1 B2).

PrPanel component:
- 4 state branches: pending (Open-PR button + optional inline error
  Alert), pr_opened (PR link target=_blank + pr_state badge), pr_merged
  (link + merged-at), rejected (rejected_reason).
- Button HIDDEN on any non-pending status per FR-3.
- Button disabled + "Opening PR..." while openPrIsPending; remains
  visible to give the operator a stable anchor through worker-wait
  (GPT-5.5 cycle-3 B2).
- pr_open_error Alert moved from the page-level header (Story 3.1) into
  the panel for action-adjacent visibility (deduplicates the alert).

Test coverage: 6 pr-panel state-machine tests + 4 new page-level tests
covering AC-1 (click → 3s poll → status flips → button gone), AC-3 30s
steady-state, auto-trigger fires + URL-strips, auto-trigger no-op when
already pr_opened.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(proposals-ui): Reject confirm dialog (Story 3.3)

Wraps up FR-4 on /proposals/{id}. Adds a destructive Reject button next
to PrPanel (rendered only when status='pending') that opens an
AlertDialog with a 0-500 char reason textarea.

Critical details (all per the GPT-5.5 cycle-1 + cycle-2 review patches):
- AlertDialogAction's onClick calls event.preventDefault() so the
  dialog stays open during the in-flight POST. Without preventDefault,
  Radix's default closes the dialog as soon as the user clicks confirm,
  leaving them with no visible state during the round-trip.
- The textarea + AlertDialogAction + AlertDialogCancel all carry
  `disabled={reject.isPending}` so double-clicks during the in-flight
  POST cannot double-submit. Button label flips "Reject proposal" →
  "Rejecting…" during that window.
- onSuccess closes the dialog and fires a success toast. NO per-mutation
  onError — the global MutationCache.onError in QueryProvider handles
  the 9 known error codes (409 INVALID_STATE_TRANSITION being the
  spec §11 concurrent-merge race we care about most). The mutation's
  onSettled in Story 1.1 invalidates ['proposal', id] + ['proposals'],
  so on 409 the detail query refetches automatically and the operator
  sees the now-pr_merged state.
- The Reject button + dialog disappear after success because the
  refetched proposal has status='rejected', which trips the early
  `if (proposal.status !== 'pending') return null` in RejectDialog.

Test coverage: 4 reject-dialog cases — null when status!=pending,
AC-2 happy path with reason, preventDefault-keeps-open + disabled-
prevents-double-submit, spec §11 concurrent-merge 409 (mounts the
detail view with the real QueryProvider so the global toast handler
fires, asserts msw GET handler called twice for the refetch, asserts
UI rerenders to pr_merged and Reject button vanishes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(proposals-ui): mark US-28/29 Implemented + ui-debugging section + source-filter idea (Story 4.1)

Wraps up feat_proposals_ui's documentation workstream:

- docs/02_product/mvp1-user-stories.md: US-28 + US-29 flipped to Implemented
  with the feat_proposals_ui annotation, matching the format used for
  US-13/14/15/16/17/etc.
- docs/03_runbooks/ui-debugging.md: appended a "Proposals routes" section
  covering the hook surface (5 hooks with their purposes), the polling
  cadence ladder on /proposals/[id] (3s / 30s / off), the ?action=open_pr
  auto-trigger semantics, the URL ?status= validation behavior, the
  client-side source-filter pagination caveat (cross-references the
  idea file below), the 5 common error codes the UI surfaces as toasts,
  and the Reject dialog won't-close-during-flight pattern.
- docs/02_product/planned_features/chore_proposals_source_filter_server_side/idea.md:
  captures the deferred backend ?source=study|manual follow-up. Origin
  pointer is feat_proposals_ui's plan §6 risks row + the GPT-5.5 cycle-1
  A5 finding. Acceptable for MVP1 (<50 proposals/page); the fix is
  small (~10 LOC backend + repo + contract test + drop the frontend
  client-side filter).

MVP1 dashboard regenerated by pre-commit hook.

state.md / architecture.md / CLAUDE.md updates land in the impl-execute
finalization commit per Story 4.1's DoD (Task 4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(planned): capture 1 in-flight-noticed issue — proposals-page useMemo dep warning

One tangential noticing surfaced during the feat_proposals_ui
post-implementation sweep: the visibleRows useMemo in /proposals/page.tsx
has a wasted-memoization warning because rows = query.data?.data ?? []
creates a fresh array identity each render. Cosmetic only (the filter
runs over <=200 rows per page, sub-ms cost), but worth tracking so a
future polish pass picks it up.

Captured per CLAUDE.md tangential-discoveries rule.

MVP1 dashboard regenerated by pre-commit hook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(proposals-ui): GPT-5.5 final-review patches — visibleRows-aware list poll + missing DoD tests

GPT-5.5 final cross-model review on PR #58 returned 5 findings (0 High,
3 Medium, 2 Low). Adjudication:

Finding #1 (Medium) — pr_open_error alert missing from header area.
REJECTED with cited counter-evidence: the plan's note that the panel
alert "duplicates here for visibility" is speculative, not a requirement.
The PrPanel placement is action-adjacent (the canonical UX); duplicating
the alert in the header produces redundant visual noise without
additional information.

Finding #2 (Medium) — flip-off effect missing.
REJECTED with cited counter-evidence: the derived effectivePollingFlag
correctly masks all observable behavior. fireOpenPR's
clear-existing-before-new pattern means the safety timer is never
orphaned; the 60s timeout always reaches a definite cleanup path. Re-
adding the watching useEffect would re-trigger the React 19
react-hooks/set-state-in-effect rule that the cycle-3 patch
deliberately avoided.

Finding #3 (Low) — list refetchInterval doesn't respect source filter.
ACCEPTED: FR-1 says "any VISIBLE row". The refetchInterval predicate
now applies the same sourceFilter check as visibleRows so a hidden
study-sourced pr_opened+open row doesn't keep the page polling for
invisible state.

Finding #4 (Low) — list-page tests don't assert wire params after URL update.
DEFERRED as chore_proposals_list_wire_param_e2e_test/idea.md. The static
useSearchParams mock doesn't re-render on router.replace, so the
existing tests assert the intent (router.replace called with the right
URL) but not the wire round-trip. Behavior is structurally correct.

Finding #5 (Medium) — 60s safety cap test + Open-PR toast contract test missing.
ACCEPTED: both were explicit Story 3.2 DoD items that didn't land.

- 60s safety cap test: simulates a worker that never writes back; uses
  vi.useFakeTimers to advance 65s and asserts polling stops (cumulative
  ticks bounded at ~21, no further ticks in the subsequent 30s window).
- Open-PR error-toast contract test: instantiates a custom MutationCache
  with a synchronous onError spy, fires 3 sequential clicks against msw
  handlers returning GITHUB_NOT_CONFIGURED, CLUSTER_HAS_NO_CONFIG_REPO,
  QUEUE_UNAVAILABLE. Errors marked retryable=false so the api-client's
  4-attempt 503 backoff doesn't compound test time (retry behavior is
  exercised by api-client.test.ts). Asserts err.errorCode reaches the
  global handler for all 3 codes per GPT-5.5 cycle-1 B6.

MVP1 dashboard regenerated by pre-commit hook.

Test count: 171 (was 169) — 32 files all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(feat-proposals-ui): refresh pipeline_status.md after final review (PR #58 open, awaiting merge)

Updates the status file to reflect what's landed on the feature branch:
- Plan: 3 GPT-5.5 cycles, 18 findings all applied
- Implement: 7 stories landed, CI green, final review 5 findings adjudicated
- Done: pending merge

MVP1 dashboard regenerated by pre-commit hook.

Post-merge re-invocation of /pipeline will run the finalization commit
(folder move, state.md entry, CLAUDE.md table flip, architecture.md
listing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 13, 2026
- Finding #1 (Accept): clarify "§123" → "line 123" in bug_fix.md to remove
  ambiguity between section references (§AC-7) and line-number references.
- Finding #2 (Defer): captured as chore_digest_worker_narrow_except — the
  worker's broad except Exception at digest.py:538-546 is what enabled the
  PR #92 bug to stay silent for 2 days; narrowing the exception surface
  has its own design fork (which exceptions are expected? hard-fail vs.
  soft-fail-with-signal? rollout survey for currently-silent digests?)
  and is separate from this PR's dep fix scope.
- Finding #3 (Reject): bug_fix.md §"Fix design (locked decisions) #4"
  already documents that test_digest_generate.py:65 cannot be tightened
  because the happy-path test seeds 0 Optuna trials, so {} is the correct
  branch there. Gemini didn't read past the heading.

Dashboard regenerated by the mvp1-dashboard-regen pre-commit hook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 13, 2026
#92)

* fix(deps): add scikit-learn for Optuna's FanovaImportanceEvaluator (bug_digest_param_importance_seam)

`optuna.importance.get_param_importances()` defaults to FanovaImportanceEvaluator
which `import sklearn` internally. The api/worker image had no scikit-learn,
so every digest's `parameter_importance` was silently `{}` — the worker's
broad `except Exception` at `digest.py:538-546` caught the ImportError and
returned an empty map. The misleading `assert parameter_importance is not None`
at `test_digest_generate.py:65` passed because `{}` is not None, and the
rigorous AC-7 test was `xfail`-marked, so CI was green while the contract
was broken.

The original idea misdiagnosed this as a fixture seam (engine/pool isolation
or sampler/pruner mismatch). Test logs prove `load_if_exists=True` correctly
resolves the worker's separately-constructed Optuna handle to the same RDB
row the test seeded; the seam works. The structlog WARN at digest.py:540
explicitly emitted `error_type=ImportError` — the diagnosis just needed
reading.

Changes:
- pyproject.toml: add `scikit-learn>=1.4` with comment pointing at bug_fix.md.
- uv.lock: pulls scikit-learn 1.8.0 + scipy + joblib + threadpoolctl
  (~30-40MB image growth; CI's `uv sync --frozen` picks it up automatically).
- test_digest_parameter_importance.py: remove the `pytest.mark.xfail` marker
  (test now passes; the previously-xfail'd test is the AC-7 regression guard).
- bug_fix.md: documents triage, root cause, locked design forks, rollout.

The worker's broad `except Exception` stays — it remains useful for genuine
Optuna edge cases (zero-completed-trial studies, schema issues). The fix
removes the ImportError condition entirely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(planned): adjudicate Gemini PR #92 findings — accept #1, defer #2

- Finding #1 (Accept): clarify "§123" → "line 123" in bug_fix.md to remove
  ambiguity between section references (§AC-7) and line-number references.
- Finding #2 (Defer): captured as chore_digest_worker_narrow_except — the
  worker's broad except Exception at digest.py:538-546 is what enabled the
  PR #92 bug to stay silent for 2 days; narrowing the exception surface
  has its own design fork (which exceptions are expected? hard-fail vs.
  soft-fail-with-signal? rollout survey for currently-silent digests?)
  and is separate from this PR's dep fix scope.
- Finding #3 (Reject): bug_fix.md §"Fix design (locked decisions) #4"
  already documents that test_digest_generate.py:65 cannot be tightened
  because the happy-path test seeds 0 Optuna trials, so {} is the correct
  branch there. Gemini didn't read past the heading.

Dashboard regenerated by the mvp1-dashboard-regen pre-commit hook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 13, 2026
…emini #2/3/4)

Accepted findings from Gemini Code Assist on PR #94:

- --diff-filter=AM -> ACMR. High-similarity `git mv foo.txt .env.old`
  is reported as Renamed by git, not Added; AM alone would miss it. ACMR
  (Added, Copied, Modified, Renamed) closes the rename/copy bypass.
- Replace unquoted ${FORBIDDEN} word-split with `sed 's/^/  - /'`. The
  regex's `[^/]+` permits spaces in `.env.<...>` filenames; the previous
  word-splitting approach would mis-split on space. The shellcheck SC2086
  disable is no longer needed and is removed.
- New regression case `.env.foo bar.bak` proves the space-handling fix.
  All 16 cases pass.

Rejected finding #1 (Gemini High): claimed pr.yml workflow update was
missing. Counter-evidence: `gh pr view 94 --json files` shows
.github/workflows/pr.yml with +22/-0 in this PR (the secrets-files-guard
job). Gemini reviewed hunks without seeing the workflow change in the
same PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 13, 2026
Accepted GPT-5.5 final-review finding #1: the implicit `git fetch origin
<base>` does not deterministically create refs/remotes/origin/<base>;
under some actions/checkout modes (especially forked PRs) the diff would
silently fail to enumerate changes and the guard would no-op. Replace
with an explicit refspec and add an `origin/<base>` rev-parse check that
exits 2 with a clear error if resolution fails — refusing to silently
skip the guard.

CI runs prior to this fix passed because actions/checkout's
fetch-depth=50 already populated origin/main in the typical case; the
fix removes the brittleness rather than fixing a current-observable bug.

Deferred GPT-5.5 #2 (workflow paths-ignore inheritance): already
captured as chore_ci_gitignore_paths_ignore_gap. Counter-evidence:
bug_fix.md Tangential Observations §1.

Deferred GPT-5.5 #3 (regex misses .env-old / .env_bak / .env~):
captured as a new chore_env_guard_extend_deny_pattern idea file. Not a
regression of this PR — the gitignore and the guard are currently
consistent (both dotted-only); broadening one requires broadening the
other, and that paired change deserves its own focused PR.

Includes MVP1 dashboard regen for the new chore_env_guard_extend_deny_pattern folder.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 13, 2026
…e_corrupted_during_session) (#94)

* fix(ci): filename-pattern guard for .env* files

Adds scripts/ci/check-no-env-files.sh + a self-test driver + a new
secrets-files-guard job in pr.yml. Fails any PR that adds or modifies a
.env-family file other than .env.example, regardless of file content.

Defense-in-depth for bug_env_file_corrupted_during_session: catches the
near-miss where a rotated .env.old slipped past .gitignore (or where
.gitignore is regressed / git add -f is used). Gitleaks in pre-commit is
content-based and would pass an empty .env.old; this guard is filename-
based and catches the structural signal.

Pure bash, no python/pytest dependency. 15 canned regression cases run
as a workflow step before the real-diff check, so a broken regex fails
immediately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(bug): bug_fix.md for bug_env_file_corrupted_during_session + 2 tangential idea files

bug_fix.md documents the locked design for the secrets-files-guard job
shipped in the previous commit: detection mechanism, regex shape,
failure mode, position in the workflow, and the open question on what
local tool originally renamed .env -> .env.old (deferred — cannot
reproduce 4 days post-incident).

Tangential follow-ups captured per CLAUDE.md "Tangential discoveries":
  - chore_ci_gitignore_paths_ignore_gap — pr.yml paths-ignore skips the
    whole workflow on .gitignore-only PRs, undermining the guard.
  - chore_ci_gitleaks_workflow_step — gitleaks runs only in pre-commit
    locally; CI should also content-scan to complement the filename guard.

Includes MVP1 dashboard regen for the new planned-feature folders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* style(ci): add shellcheck SC2086 disable for intentional word-split

The unquoted ${FORBIDDEN} expansion in check-no-env-files.sh is deliberate
— newline-splitting drives printf format-reuse so each forbidden file
prints on its own bullet line. Quoting would collapse the output to a
single line. Annotate the intent so future shellcheck runs don't re-flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): widen guard to catch renames + handle spaces in filenames (Gemini #2/3/4)

Accepted findings from Gemini Code Assist on PR #94:

- --diff-filter=AM -> ACMR. High-similarity `git mv foo.txt .env.old`
  is reported as Renamed by git, not Added; AM alone would miss it. ACMR
  (Added, Copied, Modified, Renamed) closes the rename/copy bypass.
- Replace unquoted ${FORBIDDEN} word-split with `sed 's/^/  - /'`. The
  regex's `[^/]+` permits spaces in `.env.<...>` filenames; the previous
  word-splitting approach would mis-split on space. The shellcheck SC2086
  disable is no longer needed and is removed.
- New regression case `.env.foo bar.bak` proves the space-handling fix.
  All 16 cases pass.

Rejected finding #1 (Gemini High): claimed pr.yml workflow update was
missing. Counter-evidence: `gh pr view 94 --json files` shows
.github/workflows/pr.yml with +22/-0 in this PR (the secrets-files-guard
job). Gemini reviewed hunks without seeing the workflow change in the
same PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): explicit refspec for origin/<base> resolution (GPT-5.5 #1)

Accepted GPT-5.5 final-review finding #1: the implicit `git fetch origin
<base>` does not deterministically create refs/remotes/origin/<base>;
under some actions/checkout modes (especially forked PRs) the diff would
silently fail to enumerate changes and the guard would no-op. Replace
with an explicit refspec and add an `origin/<base>` rev-parse check that
exits 2 with a clear error if resolution fails — refusing to silently
skip the guard.

CI runs prior to this fix passed because actions/checkout's
fetch-depth=50 already populated origin/main in the typical case; the
fix removes the brittleness rather than fixing a current-observable bug.

Deferred GPT-5.5 #2 (workflow paths-ignore inheritance): already
captured as chore_ci_gitignore_paths_ignore_gap. Counter-evidence:
bug_fix.md Tangential Observations §1.

Deferred GPT-5.5 #3 (regex misses .env-old / .env_bak / .env~):
captured as a new chore_env_guard_extend_deny_pattern idea file. Not a
regression of this PR — the gitignore and the guard are currently
consistent (both dotted-only); broadening one requires broadening the
other, and that paired change deserves its own focused PR.

Includes MVP1 dashboard regen for the new chore_env_guard_extend_deny_pattern folder.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 13, 2026
…Gemini #1/#2)

Accepted Gemini findings on PR #95:

#1 (Medium): the --ship sub-section listed "Steps 0a → 9" but the gate
described success as "PR open + CI green + reviews adjudicated"
(Step 7). Step 9 is post-merge and runs on a SUBSEQUENT invocation
after the human merges, not as part of the chained call. Truncate the
inline list to 0a–7 and call out Step 9's timing explicitly so the
agent doesn't poll/sleep waiting for a merge it can't perform.

#2 (Medium): "When --ship should NOT be used" listed three cases but
the trailing instruction said "ask for explicit confirmation for any
of these" — contradicts investigation mode's silent-ignore rule
(there's nothing to ship; no confirmation makes sense). Scope the
confirmation prompt to the latter two cases only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 13, 2026
…95)

* chore(skills): add --ship flag to /bug-fix for one-shot bug-to-merge

Today's bug_env_file_corrupted_during_session work surfaced friction in
the two-step flow: /bug-fix terminates at Phase 6 recommending the user
type /impl-execute --ad-hoc as a separate invocation. The two
invocations are one logical task; making the user re-engage adds
overhead with no design value.

Add an opt-in --ship flag that chains Phase 6 directly into
/impl-execute --ad-hoc via the Skill tool (same pattern Phase 1 already
uses to invoke /idea-preflight on stale ideas). Default behavior is
preserved — bugs where you want to eyeball bug_fix.md before pushing,
or release-blocking bugs that warrant a human sanity check, still get
the natural pause point.

Skill changes:
- Frontmatter: argument-hint now lists --ship; description and
  pipeline-role mention it; allowed-tools adds Skill (was missing
  even though Phase 1 already invokes /idea-preflight via the Skill
  tool — the omission was a latent bug, not introduced here).
- Phase 6 split into "Default" and "--ship" sub-sections. Default
  matches the old behavior plus a one-line nudge that --ship exists.
  --ship sub-section enumerates the impl-execute ad-hoc steps that
  run (Steps 0a, 0b.1, 2.5, 3, 4, 5, 6, 7, 9) so the reader knows
  what they're opting into without reading impl-execute SKILL.md.
- Modes section: new --ship entry; Investigation mode explicitly
  notes --ship is silently ignored (nothing to ship).
- "What this skill does NOT do": clarified that Auto-merge is still
  off — the human clicks the merge button even in --ship mode.
- Termination section: new --ship-success terminator string.

No behavioral change in default mode. No new skill — same Skill tool
pattern as the existing /idea-preflight invocation in Phase 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): correct --ship gate scope + investigation-mode wording (Gemini #1/#2)

Accepted Gemini findings on PR #95:

#1 (Medium): the --ship sub-section listed "Steps 0a → 9" but the gate
described success as "PR open + CI green + reviews adjudicated"
(Step 7). Step 9 is post-merge and runs on a SUBSEQUENT invocation
after the human merges, not as part of the chained call. Truncate the
inline list to 0a–7 and call out Step 9's timing explicitly so the
agent doesn't poll/sleep waiting for a merge it can't perform.

#2 (Medium): "When --ship should NOT be used" listed three cases but
the trailing instruction said "ask for explicit confirmation for any
of these" — contradicts investigation mode's silent-ignore rule
(there's nothing to ship; no confirmation makes sense). Scope the
confirmation prompt to the latter two cases only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 13, 2026
Total run drops from ~7m20s to an expected ~5m45s by addressing the
three highest-impact-per-risk targets identified from per-step timing
on PRs #94 / #95:

1. **Drop `needs: [backend, frontend]` from the docker job** (~65s).
   buildx tests whether the Dockerfile builds; it does not consume
   test artifacts. The dependency was defensive, not necessary. With
   it removed, docker runs in parallel with backend/frontend instead
   of being serialized after them — the critical path was previously
   backend (6m06s) → docker (1m05s); now it's just backend.

2. **Cache .mypy_cache + .ruff_cache across runs** (~20s).
   Both tools invalidate per-file internally — the cache directory is
   safe to persist. Cold mypy is ~27s today; warm typically ~5s.
   Keyed on uv.lock + pyproject.toml (dep/tool-config changes); a
   restore-keys fallback to the runner-OS prefix means partial hits
   when only one input changes.

3. **Lower ES + OpenSearch heap from 512m → 256m** (~10-15s).
   The CI integration-test workload uses small synthetic indexes
   without concurrency; 256m is amply sufficient. Halving the heap
   makes container startup + health-check completion faster.

Risk profile: all three changes are low-risk additive YAML edits.
- #1 (docker parallel): zero risk — buildx never depended on test outcomes.
- #2 (cache step): cache-miss path is identical to today (cold run).
- #3 (heap halving): only risk is OOM during a test; if observed,
  trivially revert this single line.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 13, 2026
…#1)

Accepted Gemini Medium finding on PR #99: the `.gitignore` entry
`.env~` only ignored the literal filename, but the CI deny regex
`\.env([._~-][^/]*)?` catches any tilde-suffixed variant (`.env~`,
`.env~bak`, `.env~1`, etc.). Without the trailing wildcard, a vim or
emacs backup variant like `.env~1` would pass `git status` cleanly
locally but be rejected by CI — confusing operator experience.

Fix: `.env~` → `.env~*` (matches `.env~` itself plus any suffix).

New regression case `.env~bak` locks the parity in (23 cases now;
all pass).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 13, 2026
…oader regex (#99)

* infra(ci): extract secrets-files-guard to own workflow + add gitleaks (chore_ci_gitignore_paths_ignore_gap + chore_ci_gitleaks_workflow_step)

Two CI defense-in-depth chores resolved together because they share a
workflow file:

**chore_ci_gitignore_paths_ignore_gap:** the `secrets-files-guard` job
was inheriting pr.yml's `paths-ignore` filter, so a PR that ONLY
touched `.gitignore` or `docs/**` would skip the entire workflow and
the guard would never run. Failure mode: a malicious or accidental
`.gitignore` regression removing `.env.old` slips past CI; subsequent
PR adding `.env.old` under `docs/` also slips past.

Fix: split secrets-files-guard out to `.github/workflows/secrets-defense.yml`
with no `paths-ignore` so it triggers on every PR + push to main
regardless of which paths changed. pr.yml retains a pointer comment so
future maintainers find the surface.

**chore_ci_gitleaks_workflow_step:** gitleaks was configured in
`.pre-commit-config.yaml` v8.21.2 locally but never invoked from CI.
Pre-commit is opt-in — a contributor without `pre-commit install` had
no content scan at all. Filename guard catches `.env*` patterns;
gitleaks catches `OPENAI_API_KEY=sk-...` content in any path. They're
complementary, not redundant.

Fix: new `gitleaks` job in the same secrets-defense workflow. Uses the
official binary release at v8.21.2 (pinned to match pre-commit), NOT
gitleaks-action@v2 — the action requires a paid GITLEAKS_LICENSE for
private/org repos (SoundMindsAI/relyloop is private per
`gh repo view --json visibility`). Binary install is license-free.

Both jobs run in parallel, ≤30s each. Off the critical path entirely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): broaden env-guard deny regex to match non-dotted backups (chore_env_guard_extend_deny_pattern)

Extends the .env filename guard from matching only dotted `.env.<x>`
variants to also match `.env-<x>`, `.env_<x>`, `.env~` (vim/emacs
backup suffix). Parallel `.gitignore` update so the local-git defense
and the CI defense agree on the same pattern.

Source: deferred GPT-5.5 Low-severity finding #3 on PR #94. Closed
as a follow-up because the gitignore and the guard were both dotted-
only originally — broadening required updating both in lockstep,
which deserved its own focused PR.

**Bug found while broadening: silent regex range trap.** The original
draft used `[._\-~]` as the separator class, expecting it to mean
"literal `.`, `_`, `-`, `~`". POSIX regex character classes treat
`-` as a range operator between two characters; `[._\-~]` parses as
"literal `.`, `_`, then RANGE from `\` (92) to `~` (126)". That range
matches every letter but does NOT match a literal `-`. So `.env-old`
SILENTLY did not match the supposedly-broader pattern, while
`.environment.yml` did (because `i` is in the range).

Caught by the new regression cases for `.env-old` and `.environment.yml`
running together — without both, neither failure mode is visible.

Fix: put `-` at the end of the bracket expression (`[._~-]`) where it's
unambiguously a literal. Added an explicit code comment explaining the
trap so future maintainers don't re-introduce it.

6 new regression cases (4 deny, 2 allow): `.env-old`, `.env_bak`,
`.env~`, `backend/.env-prod`, `.envoy.conf`, `.env3.yml`. All 22 cases
green; cold local run completes in <0.5s.

`.gitignore` extended with broad globs `.env.*`, `.env-*`, `.env_*`,
`.env~` to mirror the guard. Canonical incident spellings kept as
explicit entries with a comment pointing maintainers at the
documentation reason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gitignore): .env~ → .env~* glob for parity with deny regex (Gemini #1)

Accepted Gemini Medium finding on PR #99: the `.gitignore` entry
`.env~` only ignored the literal filename, but the CI deny regex
`\.env([._~-][^/]*)?` catches any tilde-suffixed variant (`.env~`,
`.env~bak`, `.env~1`, etc.). Without the trailing wildcard, a vim or
emacs backup variant like `.env~1` would pass `git status` cleanly
locally but be rejected by CI — confusing operator experience.

Fix: `.env~` → `.env~*` (matches `.env~` itself plus any suffix).

New regression case `.env~bak` locks the parity in (23 cases now;
all pass).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 14, 2026
…ings

Three fixes bundled:

1. CI flake on 'f keyboard shortcut toggles fullscreen' — the test fired
   keyDown before metadata had loaded. The handler bailed early under
   the prior `if (!open || total === 0) return` guard, so the 'f' key
   silently dropped. CI is slower than local so the race exposed there
   only. Two-part fix:
     - Decoupled fullscreen from `total > 0` in the keydown effect
       (fullscreen is viewer chrome — works during loading too).
       ArrowLeft/Right slide nav still gated on total > 0.
     - Test now waits for `guide-slide-image` (only renders post-load)
       before pressing keys.

2. Gemini #1 (medium) — guidesForPath prefix-match bug:
   `pathname.startsWith('/proposals')` would also match a hypothetical
   `/proposals-archive`. Changed to full path-segment match
   (`pathname === g.prefix || pathname.startsWith(g.prefix + '/')`).
   ACCEPTED.

3. Gemini #2 (medium) — DialogDescription showed "Loading walkthrough
   screenshots…" even on fetch error, which is misleading. Changed
   error state to render empty description; loading and loaded paths
   unchanged. ACCEPTED.

Gemini #3 (medium, aspect-ratio for layout stability) — REJECTED with
cited counter-evidence: the image container already takes all remaining
vertical space via `flex-1 min-h-0`, and the inner `<img>` uses
`object-contain` which preserves the 1440×960 aspect of every captured
screenshot. Forcing `aspect-[3/2]` on the container would constrain its
width based on height and break the responsive flex-grow layout
(particularly in fullscreen + on narrow viewports where the dialog
height drives the layout, not width).

Quality: vitest 249 / 249 green; lint 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 14, 2026
* feat(ui): ship in-app walkthrough guides (GuideViewer + 2 guides)

Adapts the creator-discovery-outreach walkthrough-guide pattern for
RelyLoop. Every guide is a Playwright-captured screenshot deck served
in-app via the new <GuideViewer> modal and surfaced contextually by the
floating <GuideTrigger> button.

What ships:

  ui/src/components/guides/
    guide-types.ts          GUIDE_REGISTRY + GUIDE_MAP route bindings
    guide-viewer.tsx        Radix Dialog slideshow (prev/next + arrow keys
                            + caption rendering, fetches metadata.json
                            from /guides/<id>/metadata.json)
    guide-trigger.tsx       Floating "?" button (bottom-right) that
                            opens the contextual guide for the current
                            route; popover picker when multiple match

  ui/src/app/guide/page.tsx Catalog page listing every registered guide

  ui/playwright.demo.config.ts
                            Slow-mo + 1440x960 viewport + video config
                            for screenshot capture, separate from the
                            regression e2e config which now excludes
                            tests/e2e/guides/**

  ui/tests/e2e/guides/
    01_register_first_cluster.spec.ts
    02_review_a_proposal.spec.ts

  ui/public/guides/
    01_register_first_cluster/
      01-clusters-list.png ... 05-cluster-detail.png  (5 screenshots)
      metadata.json (title + description + per-slide captions)
      script.md (narrative + reference links)
    02_review_a_proposal/
      01-proposals-list.png ... 05-reject-dialog.png  (5 screenshots)
      metadata.json
      script.md

  ui/src/__tests__/components/guides/
    guide-viewer.test.tsx (5 cases — render, nav, keys, error, no-fetch-when-closed)
    guide-trigger.test.tsx (4 cases — visibility + path-prefix matching)
    guide-registry.test.ts (3 cases — metadata.json ↔ registry parity)

Updates:
  .claude/skills/guide-gen/SKILL.md  drop DORMANT, swap web/→ui/, refresh inventory
  docs/08_guides/README.md           document the convention + workflow
  ui/src/app/layout.tsx              mount <GuideTrigger /> globally
  ui/playwright.config.ts            testIgnore guides/** in regression runs

Quality gates: vitest 231 (12 new + 219 existing) green; regression e2e
25 green; lint 0 errors; format clean; Next.js build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): expand walkthrough guide coverage 2 → 8 guides

Ships 6 new walkthrough guides covering every UI-visible workflow that
can be captured deterministically without an LLM. Pairs with guides 01
and 02 from the previous commit for a full inventory of the operator
journey.

New guides:

  03_create_query_template          /templates  — Jinja2 body + declared
                                                  params + fork-to-v2
  04_create_query_set               /query-sets — bulk-load queries via
                                                  JSON or CSV
  05_import_judgments_and_calibrate /judgments  — import path + Cohen's
                                                  + linear-weighted κ
  06_create_and_monitor_study       /studies    — create + watch trials
                                                  + cancel
  07_browse_proposals               /proposals  — three-axis filters +
                                                  30s pulse-refetch
  08_chat_shell                     /chat       — list + new + secrets
                                                  banner (no streaming)

Plus:
  - GUIDE_REGISTRY: 2 → 8 entries
  - GUIDE_MAP: 2 → 8 entries; /proposals now has two guides (browse +
    review), so the GuideTrigger switches to popover-picker mode there
  - guide-trigger.test.tsx: adds a picker-mode case
  - guide-registry.test.ts: asserts every metadata.json ↔ registry pair,
    every GUIDE_MAP id is resolvable, and prefix matching works
  - docs/08_guides/README.md + .claude/skills/guide-gen/SKILL.md
    inventories list all 8 shipped guides + the 3 workflows blocked on
    LLM mocking / operator credentials (guide-able later)

Workflows explicitly NOT covered by a guide (called out in both
inventories so contributors don't re-litigate the scope):
  - LLM judgment generation (POST /api/v1/judgments/generate) — needs
    LLM mock or live endpoint; import path (guide 05) is the offline
    substitute
  - Chat with agent tool dispatch (C2 / E1-E3) — same blocker
  - Open PR end-to-end (worker side) — needs registered config_repo +
    real PAT; trigger captured in guide 02

Quality gates: vitest 232 (13 guide cases + 219 existing) green;
regression e2e 25 green; lint 0 errors; format clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): ship 2 LLM-driven guides + Guides nav item + re-capture all 10

Closes the previously-marked-as-blocked workflows by running real LLM
calls against the hosted OpenAI endpoint (no mocking, per user
direction). All 10 guides now share the same nav chrome — including a
new "Guides" top-nav item linking to the /guide catalog page.

What ships:

  09_generate_judgments_llm  /query-sets, /judgments
    UI form interaction up to template-dropdown-open; generation
    triggered via direct API call to sidestep a Radix Select issue
    where many seeded templates push options outside the viewport.
    Worker hits OpenAI per (query, doc) pair; ~$0.02 per run with
    gpt-4o-mini.

  10_chat_with_agent  /chat
    Real LLM round-trip end-to-end: send a prompt, watch the
    list_clusters tool dispatch (tool-call + tool-result cards), then
    the assistant's streaming final response. ~$0.01-0.03 per run.

  Guides nav item       New top-nav entry → /guide catalog
  GUIDE_REGISTRY        2 → 10 entries
  GUIDE_MAP             8 → 12 route bindings (chat, query-sets,
                        judgments now have multiple registered guides)

  All 10 guides re-captured against the rebuilt UI image so every
  screenshot shows the consistent nav chrome including "Guides".

Failure modes hit + resolved during capture:

  * Wrong button selector for "Generate new judgment list" (caught by
    matching by data-testid rather than role+name)
  * composer-send stays disabled when input is empty even after the
    SSE 'done' fires — switched the done-signal to composer-input
    (which only disables on streaming=true)
  * Radix Select template picker portal renders options outside the
    1440×960 viewport when 100+ templates are seeded; keyboard nav
    times out — restructured to drive the form via UI for the
    interaction screenshots and trigger generation via API for the
    worker-state screenshots
  * guide-08 description drifted between GUIDE_REGISTRY and metadata.json
    when I updated the cross-reference; parity test caught it before
    push

Quality gates: vitest 232 (all guide cases + 220 existing) green;
regression e2e 25 green; lint 0 errors; format clean; demo capture
10/10 in ~70s wall-clock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): COPY public/ in Dockerfile runtime stage so guide assets ship

Next.js standalone output mode does NOT include `public/` in the
runtime image — Vercel's docs note this explicitly. The Dockerfile's
runtime stage skipped the COPY with a TODO comment saying "ui/ has no
public assets; if any are added later, re-add this line." That comment
was correct at write time (chore_tutorial_polish Story 2.2) but went
stale the moment guide 01 shipped — every guide PNG + metadata.json +
script.md was on the host but NOT in the runtime image.

Symptom: navigating to /guide and clicking any card fired
GET /guides/<id>/metadata.json → 404. The GuideViewer's fetch hook
surfaced the error inline; no infinite loop, just a confusingly long
React render-commit stack on the error transition.

Fix is one line: COPY /app/public ./public from the builder stage so
Next.js can serve static assets from /guides/{id}/* at runtime.
Verified locally — guide 03 + guide 10 PNGs + metadata.json all
return 200 after rebuild.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): make GuideViewer mobile-friendly + fullscreen + low-vision

Mobile audit revealed the original viewer was unusable on small screens:
- max-w-4xl (1024px) dialog sprawls past phone viewports
- 1440×960 source screenshots scaled to ~340px on iPhone → form-field
  text unreadable
- No way to expand the image, no text-size control, no full-PNG escape

Redesigned the GuideViewer around three affordances:

1. Responsive sizing
     default: w-[96vw] max-w-[1100px] h-[92vh] max-h-[900px]
     scales cleanly from 360px phones to ultrawide desktops; sm: breakpoint
     bumps interior padding from 1rem → 1.5rem

2. Fullscreen toggle (Maximize2 icon, top-right of header)
     toggles to w-screen h-screen edge-to-edge — same UI fits the
     screenshot at full resolution. Pref persists in
     localStorage under `relyloop.guide-viewer.fullscreen`. Keyboard
     shortcut: 'f'

3. Text-size toggle (A− / A / A+ button, top-right of header)
     cycles caption + description through sm / base / lg. Default base
     (16px). Pref persists in `relyloop.guide-viewer.text-size`.

Plus accessibility polish:
  * Caption rendered with aria-live="polite" so screen readers announce
    every slide change
  * Sr-only "Slide N of M: " prefix on caption so position is read
  * Counter has aria-label "Slide N of M"
  * Prev/Next/text-size/fullscreen all have explicit aria-labels
  * Keyboard nav unchanged (Arrow ←→, Esc closes, 'f' fullscreen) and
    no longer hijacks keys when an input is focused
  * "View full image in new tab" inline link below image — gives the
    user the browser's native zoom for any detail the viewer can't fit

GuideTrigger button:
  * Bumped from size-12 → size-14 (48 → 56px) — better mobile tap target
  * Position now respects env(safe-area-inset-bottom/right) so the
    button doesn't sit under iOS home indicator / right safe area

7 new vitest cases cover the toggles, persistence, sr-only caption, and
keyboard shortcut. All 10 canonical demo specs re-captured for visual
consistency.

Quality: vitest 238 / 238 (7 new + 231 existing), regression e2e
25 / 25, demo capture 10 / 10, lint 0 errors, format clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(types): update import path for routes in next-env.d.ts

* feat(ui): surface long-form tutorial + workflows-overview docs under /guide

End users can now read the canonical long-form documentation
(tutorial-first-study.md and workflows-overview.md from
`docs/08_guides/`) in-app under the Guides menu — no more roundtrip to
GitHub or a local IDE for first-time setup or capability discovery.

What ships:

  ui/scripts/copy-docs.mjs       prebuild script copies markdown source
                                 from `docs/08_guides/` → `ui/public/docs/`
                                 (wired into package.json `predev` +
                                 `prebuild` hooks so dev + prod stay in
                                 sync). README dropped into the dest dir
                                 explaining the GENERATED convention.

  /guide/docs/[slug]             new route renders the markdown via
                                 react-markdown + remark-gfm. Repository-
                                 relative links (`../01_architecture/...`)
                                 are rewritten at render time to
                                 absolute GitHub blob URLs.

  <MarkdownDoc>                  the component: aria-friendly header
                                 with title + "View on GitHub" + text-
                                 size toggle (A− / A / A+, shared
                                 localStorage key with GuideViewer so a
                                 user who upsizes one surface gets the
                                 other) + column-width toggle (narrow
                                 ~900px reading column vs. wide ~1400px
                                 reference column). Prose typography
                                 scales accordingly.

  /guide catalog                 split into two sections — Long-form
                                 documentation (2 cards) above, Visual
                                 walkthroughs (10 cards) below.

  @tailwindcss/typography        installed + @plugin'd in globals.css.
                                 The `prose` class in <StudyDigestPanel>
                                 also benefits — headings + lists in
                                 digest narratives now render with
                                 correct hierarchy (previously compiled
                                 to nothing under Tailwind 4 CSS-first).

  DOC_REGISTRY                   new export in guide-types.ts; mirrors
                                 the GUIDE_REGISTRY shape for the 2 docs.

8 new vitest cases cover: render, repo-relative link rewriting (handles
`../` + `../../` correctly), absolute URLs unchanged, fetch errors,
text-size toggle + persistence, wide-column toggle, localStorage
hydration, and "View on GitHub" link target.

Quality: vitest 246 / 246 (8 new + 238 existing), regression e2e 25 / 25,
lint 0 errors, typecheck clean, format clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): add favicon.ico + icon.svg to silence /favicon.ico 404

Browsers (notably Chrome) request /favicon.ico on every page load
regardless of any <link rel="icon"> tag in the document head. The
Next.js App Router auto-injects the link tag for src/app/icon.svg, but
the parallel /favicon.ico request still 404'd because no static asset
existed at that path.

What ships:

  ui/src/app/icon.svg   Loop-motif SVG icon (dark rounded square +
                        white ring open at top + dot at the entry
                        point — the Karpathy-loop product metaphor).
                        Next.js auto-serves at /icon.svg and injects
                        <link rel="icon" type="image/svg+xml"
                         href="/icon.svg">.

  ui/public/favicon.ico Multi-size ICO (16, 32, 48, 64) generated from
                        the same motif via Pillow. Served at
                        /favicon.ico by the runtime image (the
                        Dockerfile already COPYs public/ into the
                        standalone output thanks to the prior
                        guide-assets fix).

Verified: both endpoints return 200; rendered icon shows the loop
glyph correctly at 64px preview.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): add Slides / Video toggle in GuideViewer + ship 10 walkthrough.webm files

Playwright's demo config has always recorded slow-motion videos
(video: 'on', 350ms slow-mo) but they were ephemeral test-results
artifacts. This wires those videos into the in-app guide surface so
users can watch a walkthrough play out instead of clicking through
slides — particularly valuable for guide 10 (chat-with-agent) where
the SSE streaming + tool dispatch is fundamentally temporal.

Pipeline:

  pnpm capture-guides
    ├── playwright test -c playwright.demo.config.ts   captures both
    │                                                   screenshots
    │                                                   (already wired)
    │                                                   and videos under
    │                                                   test-results/
    │                                                   demo-artifacts/
    └── node scripts/promote-videos.mjs                copies each
                                                       video.webm to
                                                       ui/public/guides/
                                                       <id>/walkthrough.webm

The match-by-prefix logic in promote-videos.mjs handles Playwright's
filename truncation (some specs get cut mid-word, so naive
`startsWith(guideId)` misses 2/10; matching by the NN_ numeric prefix
recovers all 10).

GuideViewer changes:
  * New Slides | Video pill toggle in the header (shown only when
    metadata.video is set). Slides remains default; Video swaps the
    image area for an autoplay-loop <video> element with browser
    controls. Switching guides resets to Slides mode.
  * Tested via 3 new vitest cases (toggle default, video src + slide
    UI hidden in video mode, toggle absent when video field missing).

Metadata + assets:
  * GuideMetadata type extends with optional `video` field
  * 10 metadata.json files updated with `"video": "walkthrough.webm"`
  * 10 walkthrough.webm files committed under ui/public/guides/<id>/
    (~2.6 MB total; per-video 128-448 KB depending on flow length)

Re-captured screenshots in the same run for visual consistency.

Quality: vitest 252 / 252 (3 new video-mode + 249 existing), regression
e2e 25 / 25, lint 0 errors, format clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): CI keyboard-shortcut test flake + Gemini #1/#2 accepted findings

Three fixes bundled:

1. CI flake on 'f keyboard shortcut toggles fullscreen' — the test fired
   keyDown before metadata had loaded. The handler bailed early under
   the prior `if (!open || total === 0) return` guard, so the 'f' key
   silently dropped. CI is slower than local so the race exposed there
   only. Two-part fix:
     - Decoupled fullscreen from `total > 0` in the keydown effect
       (fullscreen is viewer chrome — works during loading too).
       ArrowLeft/Right slide nav still gated on total > 0.
     - Test now waits for `guide-slide-image` (only renders post-load)
       before pressing keys.

2. Gemini #1 (medium) — guidesForPath prefix-match bug:
   `pathname.startsWith('/proposals')` would also match a hypothetical
   `/proposals-archive`. Changed to full path-segment match
   (`pathname === g.prefix || pathname.startsWith(g.prefix + '/')`).
   ACCEPTED.

3. Gemini #2 (medium) — DialogDescription showed "Loading walkthrough
   screenshots…" even on fetch error, which is misleading. Changed
   error state to render empty description; loading and loaded paths
   unchanged. ACCEPTED.

Gemini #3 (medium, aspect-ratio for layout stability) — REJECTED with
cited counter-evidence: the image container already takes all remaining
vertical space via `flex-1 min-h-0`, and the inner `<img>` uses
`object-contain` which preserves the 1440×960 aspect of every captured
screenshot. Forcing `aspect-[3/2]` on the container would constrain its
width based on height and break the responsive flex-grow layout
(particularly in fullscreen + on narrow viewports where the dialog
height drives the layout, not width).

Quality: vitest 249 / 249 green; lint 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 26, 2026
#265)

* fix(test): align test_seeding helper with seedTemplate's declared_params

backend/app/services/test_seeding.py's seed helpers wrote
`search_space.params['title.boost']` (9 occurrences in 2 functions
spanning search_space, trial params, narrative, parameter_importance,
recommended_config, config_diff, and the followup-chain seed) but the
e2e seedTemplate() helper at ui/tests/e2e/helpers/seed.ts:316 declares
`declared_params: { boost: 'float' }`. Both helpers bypass
validate_against_template, so the inconsistency landed in DB silently.

When an e2e spec then opened CreateStudyModal against the seeded
source study, Step-4's client-side validateSearchSpaceAgainstTemplate
raised `Param 'title.boost' is not declared by template — Declared
params: ['boost']` and refused to advance. Blocked 3 of 6 currently-red
smoke-stack tests:

- study-clone.spec.ts:24 (clone-from-source → submit round-trip)
- followup_run.spec.ts:28 (narrow-followup hard-coded the same
  literal in NARROW_SEARCH_SPACE)
- study-clone-narrow-bounds.spec.ts (deliberately stops short of
  submit per a stale comment citing this bug)

Rename `title.boost` → `boost` per the bug folder's locked Path 1
(the dot-namespaced name is arbitrary; no test depends on the
specific literal). Co-located consumers updated in lockstep:

- backend/tests/integration/test_test_seeding.py declared_params
  rename to keep the seed-endpoint integration test internally
  consistent
- ui/tests/e2e/followup_run.spec.ts NARROW_SEARCH_SPACE + rationale
  string rename (fixes the same Step-4 validator wall the v1 clone
  spec hits)
- ui/tests/e2e/study-clone-narrow-bounds.spec.ts 5 literal references
  + delete the now-stale fixture-inconsistency comment block (replace
  with a one-line pointer to the remaining out-of-scope smoke flakes)
- ui/tests/e2e/studies.spec.ts:173 digest-narrative assertion text
  follows the renamed narrative

Out of scope (separate bug folders stay open): the swap-template
template_id assertion failure in followup_run.spec.ts:111 and the
dashboard demo-state-disclosure / demo-data-banner missing-testids
failures in dashboard.spec.ts + dashboard-reseed.spec.ts. Both need
live smoke-stack reproduction.

See bug_fix.md for full design + locked decisions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(planned): capture chore_clone_narrow_bounds_full_roundtrip_e2e

Surfaced during /bug-fix for
bug_clone_e2e_seed_template_params_mismatch. Once that PR's seed-helper
rename lands, the study-clone-narrow-bounds.spec.ts can be extended to
include the submit + GET-the-new-study round-trip assertion that was
deferred per a stale comment block (now replaced with a one-line
pointer in the same fix). P3 — adds an E2E gate that's not currently
present but isn't blocking anything.

Dashboards regenerated by pre-commit hook to reflect the new planned
folder.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: GPT-5.5 round 1 adjudication — 3 accepted, 1 rejected

Adjudicates GPT-5.5 final-review findings on PR #265:

- ACCEPTED #2: bug_fix.md's "Tangential observations" bullet was
  internally contradictory; rewrote to plainly state "2 of 3 resolve
  as side effect; followup_run.spec.ts:111 remains".
- ACCEPTED #3: study-clone-narrow-bounds.spec.ts test title still
  claimed "submit → persisted" but stops before submit; renamed to
  "textarea is clamped and restores on uncheck" to match coverage.
- ACCEPTED #4: chore idea.md link `../../../ui/...` (resolves under
  `docs/ui/`); fixed to `../../../../ui/...` to reach repo root.
- REJECTED #1: GPT-5.5 flagged tangential-idea-capture + dashboard
  regen as "scope drift." Counter-evidence: /bug-fix Phase 5 step 7
  and /impl-execute --ad-hoc Step 2.5 require tangential-observations
  sweep as BLOCKING; dashboard regen is automatic via pre-commit hook.
  Both are procedural compliance, not scope drift.

Pre-commit re-regenerated dashboards.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(planned): GPT-5.5 round 2 — change chore priority P3 → P2

GPT-5.5 round 2 finding accepted: the chore idea was marked P3 but
the dashboard regen script only recognizes P0/P1/P2/Backlog, so the
P3 marker silently bucketed as P2 (default) — creating an internal
contradiction between the idea's stated priority and the dashboard's
classification.

P2 is the correct semantic match (this adds important coverage that
isn't blocking) and resolves the dashboard/idea contradiction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 26, 2026
Adjudicates GPT-5.5 final-review findings on PR #268:

- ACCEPTED #1 (Medium): integration test used `DELETE FROM clusters`
  which would wipe operator data if the test DB is shared with a
  working stack. Refactored to use transaction rollback so fixture
  inserts never persist past the test run. Snapshots baseline live
  count BEFORE inserts so the test is also isolated from existing
  operator data (e.g. the 4 demo clusters from `make seed-demo`).

- ACCEPTED #2 (Low): negative-only assertion (1 soft-deleted → 0)
  would pass against a broken-but-symmetric SQL like `WHERE
  deleted_at IS NOT NULL` (always returns 0). Added a second test
  pinning the mixed case (1 live + 1 soft-deleted → delta=1) so both
  halves of the contract are guarded: EXCLUDE deleted rows AND COUNT
  live rows.

Both tests now use the same baseline-snapshot + transaction-rollback
pattern. Shared `_insert_cluster_sql()` helper builds INSERT SQL +
bound params for a live or soft-deleted row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 26, 2026
#268)

* fix(install): count only LIVE clusters in --if-empty auto-seed gate

scripts/seed_meaningful_demos.py:824's count_existing_clusters() ran
`SELECT COUNT(*) FROM clusters` with no `WHERE deleted_at IS NULL`
filter. Soft-deleted rows (left behind by E2E test cleanup, operator
manual deletes, or seed-then-wipe cycles) counted toward "exists"
and permanently false-skipped the auto-seed step at install.sh:95 on
every subsequent `make up` — leaving operators on an empty dashboard
with no in-product recovery path.

Empirically reproduced earlier this session: after `make down`/`make
up`, operator's `/` showed zero clusters; DB had 7 cluster rows but
all with non-null `deleted_at` (~2026-05-26 13:32 UTC, from earlier
E2E test cleanup); auto-seed printed "skipping — 7 cluster(s) already
exist"; the public API correctly filtered the dead rows.

Fix:
- Extract SQL to module-level constant `_COUNT_LIVE_CLUSTERS_SQL` so
  it's testable from unit-layer.
- New SQL: `SELECT COUNT(*) FROM clusters WHERE deleted_at IS NULL;`
  Aligns with the public API's view of "exists" (same filter used
  throughout backend/app/db/repo/cluster.py).
- Two-layer regression coverage:
  - unit (no DB needed): assert the constant's SQL includes
    `WHERE deleted_at IS NULL` (whitespace-normalized) — catches
    regression-by-edit if someone deletes the filter
  - integration (Postgres required): seed 1 soft-deleted cluster row
    via raw SQL, run the constant, assert COUNT = 0 — catches
    regression-by-semantic (e.g. swapping `IS NULL` for `= NULL`)

This is part 1 of a bundled fix for the stuck-stack self-rescue
problem. Part 2 (next commit) tightens the dashboard's "Reset to
demo state" disclosure gating in
ui/src/components/dashboard/start-here-checklist.tsx so operators
who land in this state can self-rescue without CLI knowledge.

See bug_fix.md for full design + locked decisions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(dashboard): show Reset disclosure whenever no live clusters

ui/src/components/dashboard/start-here-checklist.tsx:150 gated the
"Reset to demo state" disclosure on a 3-way AND
(`!hasClusters && !hasQuerySetsWithJudgments && !hasStudies`)
that modeled "truly pristine first-run." But the realistic stuck
state — data orphaned without any live clusters — is exactly when
the operator needs the rescue affordance most, and the strict
predicate hid it. Reproduced earlier this session: operator's `/`
showed Step 1 (Register cluster) as NOT done, Steps 2 + 3 as Done,
and no disclosure anywhere — forcing CLI knowledge of `make
seed-demo FORCE=1` to recover.

Without live clusters every cluster-scoped operation fails, so the
orphan studies + query_sets are unusable. The disclosure should
always be available whenever there's no live cluster to use.

Fix:
- Tighten the predicate to `!hasClusters` only. The other two signals
  are downstream of having a cluster; gating on them was incidental.
- Keep `hasClusters=true` as the "hide" trigger so operators with a
  working stack don't accidentally click Reset (the ResetDemoStateButton
  confirm dialog is the remaining defense).
- Inline-comment the predicate change with a pointer to the bug folder
  so a future reader sees why the trivially-simple `!hasClusters`
  predicate has explanatory weight behind it.

Test coverage:
- Flip 2 of the 3 old AC-8 hide-tests to assert the new render
  behavior at single-orphan states.
- Add a 3rd positive test pinning the exact stuck state (no clusters
  + orphan studies + orphan query_sets) from the 2026-05-26 session.
- Keep AC-7 (all three false → renders) and one AC-8 case
  (hasClusters=true → hides) and the "all three true → returns null"
  unchanged.

Part 2 of a bundled stuck-stack self-rescue fix. Part 1 (commit prior)
fixes `scripts/seed_meaningful_demos.py` so the auto-seed gate counts
only live clusters. Together they restore both the auto-recovery
path (on `make up`) and the in-product rescue path (the disclosure).

See bug_fix.md for full design + locked decisions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(integration): GPT-5.5 round 1 — both findings accepted

Adjudicates GPT-5.5 final-review findings on PR #268:

- ACCEPTED #1 (Medium): integration test used `DELETE FROM clusters`
  which would wipe operator data if the test DB is shared with a
  working stack. Refactored to use transaction rollback so fixture
  inserts never persist past the test run. Snapshots baseline live
  count BEFORE inserts so the test is also isolated from existing
  operator data (e.g. the 4 demo clusters from `make seed-demo`).

- ACCEPTED #2 (Low): negative-only assertion (1 soft-deleted → 0)
  would pass against a broken-but-symmetric SQL like `WHERE
  deleted_at IS NOT NULL` (always returns 0). Added a second test
  pinning the mixed case (1 live + 1 soft-deleted → delta=1) so both
  halves of the contract are guarded: EXCLUDE deleted rows AND COUNT
  live rows.

Both tests now use the same baseline-snapshot + transaction-rollback
pattern. Shared `_insert_cluster_sql()` helper builds INSERT SQL +
bound params for a live or soft-deleted row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): clusters table has no `updated_at` column

CI failure on PR #268: backend integration job hit
`asyncpg.exceptions.UndefinedColumnError: column "updated_at" of
relation "clusters" does not exist` on both integration tests in
test_seed_meaningful_demos_if_empty.py.

The clusters table only has `created_at` + `deleted_at` (see
backend/app/db/models/cluster.py:83-88). I assumed updated_at
existed when authoring the INSERT helper but never validated against
the live schema before pushing — should have run the integration
test locally via `make test-worktree` or checked the model file.

Fix: remove updated_at from the INSERT helper's columns + params.

Local validation: 1453 unit tests still green. The integration tests
themselves skip locally (Postgres not reachable from host shell);
CI will exercise them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 26, 2026
… + fix dashboard link drift

Adjudicates GPT-5.5 final-review findings on PR #270:

- ACCEPTED #1 (Medium): the venv-ownership check could false-pass on a
  broken image where /app/.venv is missing (find returns empty → wc -l = 0).
  Added `[ -d /app/.venv ] ||` guard before the find pipeline.

- ACCEPTED #3 (Low): inline comment claimed the new step fires BEFORE the
  smoke job — but GHA dispatches jobs in parallel. Reworded to
  "fast-fail signal" with the actual value-prop.

- ACCEPTED #2 (Low): dashboard regen at MVP1_DASHBOARD.md copied the
  idea.md `../../../00_overview/...` link literally — resolves above
  repo root from docs/00_overview/. Replaced with plain-text slug +
  separate path mention so the dashboard output is clean.

Dashboards regenerated by pre-commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 26, 2026
* infra(ci): add runtime image-invariant smoke to docker buildx job

The `docker buildx (relyloop/api)` job builds the runtime image with
`push: false` and `cache-to: type=gha,mode=max` — verifies the image
BUILDS but never RUNS it. The companion static unit test at
backend/tests/unit/test_dockerfile_runtime_stage.py (3 tests) catches
Dockerfile-structural regressions but cannot catch runtime-state
regressions (e.g., a future uv release writes dist-info files
elsewhere; an upstream Python base image switches default UID).

Add `load: true` to the build step + a new "Verify runtime image
invariants" step that `docker run`s the freshly-built image and
asserts two invariants:

1. `/app/.venv` is fully relyloop-owned (the
   bug_dockerfile_venv_root_owned_after_user_switch failure mode —
   blocks `uv run` from the relyloop user in any one-shot container
   like `make test-worktree`).
2. Default user is relyloop (uid 1000), not root.

Each invariant maps to a captured bug folder for traceability.
Fires BEFORE the heavier `smoke (operator-path tutorial flow)` job
has to bring up the full Compose stack, so image-construction
regressions fail in <30s instead of after 5+ minutes of `make up`.

Idea preflight (Audit & Patch mode) also landed three doc edits on
the originating idea file: P3 → P2 (matches the dashboard regen's
recognized priority tiers per scripts/build_mvp1_dashboard.py:217-245);
Status line updated to reference the merged PR #263 / #264 SHAs; the
stale infra_ci_smoke_makeup reference rewritten to point at the
live smoke job in pr.yml. Dashboard regenerated by pre-commit hook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): GPT-5.5 round 1 — guard missing /app/.venv + soften ordering + fix dashboard link drift

Adjudicates GPT-5.5 final-review findings on PR #270:

- ACCEPTED #1 (Medium): the venv-ownership check could false-pass on a
  broken image where /app/.venv is missing (find returns empty → wc -l = 0).
  Added `[ -d /app/.venv ] ||` guard before the find pipeline.

- ACCEPTED #3 (Low): inline comment claimed the new step fires BEFORE the
  smoke job — but GHA dispatches jobs in parallel. Reworded to
  "fast-fail signal" with the actual value-prop.

- ACCEPTED #2 (Low): dashboard regen at MVP1_DASHBOARD.md copied the
  idea.md `../../../00_overview/...` link literally — resolves above
  repo root from docs/00_overview/. Replaced with plain-text slug +
  separate path mention so the dashboard output is clean.

Dashboards regenerated by pre-commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 26, 2026
Adjudicates GPT-5.5 final-review findings on PR #273:

- ACCEPTED #1 (Medium): the POST response assertion used `.ok()`
  which accepts any 2xx status. A future 200/202/204 regression
  would silently pass. Tightened to `.status() === 201` in BOTH
  this spec AND the v1 clone-spec (study-clone.spec.ts:24) to avoid
  divergence — the same loose `.ok()` pattern lived in both files;
  tightening one without the other would have created confusion.

- ACCEPTED #2 (Low): after re-checking the narrow-bounds checkbox
  to put the textarea back to the clamped state for submit, the
  test immediately clicked `step-next` without explicitly waiting
  for the textarea to reflect the re-applied clamp. A fast click
  could have submitted the still-restored source bounds and hidden
  a real bug behind a race. Added an `expect.poll` wait that
  re-parses cs-search-space inputValue and asserts boost.low close
  to 2.0 before advancing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 26, 2026
…nd-trip (#273)

* chore(test): extend study-clone-narrow-bounds spec to full submit round-trip

Picks up the AC-12 coverage gap that bug_clone_e2e_seed_template_params_mismatch
(PR #265) unblocked. Before that fix, the spec stopped at the textarea-clamped
assertion because Step-4 Next was blocked by the seed-helper/template
mismatch. With PR #265 merged, the spec can now exercise the full 6-step
plan from its own docblock.

What changed in this PR:
- Added `API_BASE` constant at top of file (mirrors clone-spec convention).
- After the existing check/assert/uncheck/restore sequence, re-check
  the narrow-bounds checkbox to put the textarea back to the clamped
  [2.0, 3.0] state for submit.
- Advance Step 4 → Step 5 via step-next; assert step-5 visible.
- Click "Create study"; capture the POST response promise; assert 201
  and capture the new study id (pattern lifted from
  ui/tests/e2e/study-clone.spec.ts:24).
- GET /api/v1/studies/{new_id}; assert:
  - parent_study_id === sourceId (FR-9 lineage round-trip)
  - search_space.params.boost.type === 'float'
  - search_space.params.boost.low close to 2.0 (FR-12 — server
    accepted the narrowed rewrite)
  - search_space.params.boost.high close to 3.0 (same)

Test name updated from "textarea is clamped and restores on uncheck"
to "Clone study → narrow bounds → submit → persisted search_space is
clamped" to match the now-complete 6-step coverage. Docblock Step 5
expanded to call out the new check → uncheck → re-check sequence so
the next reader sees the intent (re-check is to prepare for submit,
not redundant with the initial check).

Local validation: exit code 0 against the live local stack; the
test/spec/suite metadata in playwright's JSON reporter shows `"ok":
true` for the spec. CI smoke job will run the full assertion stack
against the GHA-provisioned stack.

Idea folder will move to implemented_features in the finalize PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): GPT-5.5 round 1 — tighten POST assertion + race-safe re-check

Adjudicates GPT-5.5 final-review findings on PR #273:

- ACCEPTED #1 (Medium): the POST response assertion used `.ok()`
  which accepts any 2xx status. A future 200/202/204 regression
  would silently pass. Tightened to `.status() === 201` in BOTH
  this spec AND the v1 clone-spec (study-clone.spec.ts:24) to avoid
  divergence — the same loose `.ok()` pattern lived in both files;
  tightening one without the other would have created confusion.

- ACCEPTED #2 (Low): after re-checking the narrow-bounds checkbox
  to put the textarea back to the clamped state for submit, the
  test immediately clicked `step-next` without explicitly waiting
  for the textarea to reflect the re-applied clamp. A fast click
  could have submitted the still-restored source bounds and hidden
  a real bug behind a race. Added an `expect.poll` wait that
  re-parses cs-search-space inputValue and asserts boost.low close
  to 2.0 before advancing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): Gemini round 1 — URL constructor on new request.get + tangential capture

Adjudicates Gemini Code Assist finding on PR #273:

ACCEPTED (Medium): the new `request.get(`${API_BASE}/api/v1/studies/
${created.id}`)` at line 135 uses string-concat, brittle to a future
PLAYWRIGHT_API_BASE_URL ending with a trailing slash. Switched to
`new URL(..., API_BASE).toString()`.

Sweep DEFERRED for the 4 sibling sites (study-clone.spec.ts:40,91 +
followup_run.spec.ts:92,176) — fixing some-but-not-all creates drift.
Captured as chore_e2e_api_base_url_construction for a focused sweep.

Dashboards regenerated by pre-commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(planned): capture bug_smoke_studies_data_table_search_flake

New smoke flake surfaced on PR #273 CI (commit 5bc44e4) — test
was passing on pre-PR main but failed on PR run. My diff doesn't
touch the failing test or its code paths; most likely genuine
flake. Same pattern as bug_smoke_dashboard_demo_state_locator_missing
and bug_smoke_followup_clone_e2e_flakes captures. Next main CI run
post-merge will provide a second data point.

Dashboards regenerated by pre-commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 27, 2026
#1)

The Protocol's list_documents docstring still claimed _id sorting after
the implementation flipped to _doc per spec D-26 fallback (ES 9 disables
_id fielddata by default). This created a small contract drift between
the abstract method and the ElasticAdapter implementation.

Updates the docstring to describe the actual contract: adapters paginate
via search_after over a stable internal sort key, with ElasticAdapter
specifically using _doc, and document the spec D-26 rationale inline.
Other engines remain free to pick any stable key as long as
hits[i].sort is returned for search_after round-trips.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 27, 2026
…P1) (#285)

* docs(planned): feat_index_document_browser idea + spec + plan

- idea: 9 locked decisions (D-1..D-9) + IA route hierarchy after /idea-preflight
- feature_spec: 12 FRs, 20 ACs, 28 decisions; 3 GPT-5.5 cycles (33 findings, all accepted)
- implementation_plan: 13 stories across 3 epics; 3 GPT-5.5 cycles (27 findings, 26 accepted + 1 rejected with cited counter-evidence)
- pipeline_status: spec + plan stages approved, ready for implementation
- mvp1_dashboard regen captures the new feature in the Idea table

Substrate: read-only document browser surfaced from a live demo of
tune-acme-products-rich-boosts. Adds 2 SearchAdapter Protocol methods
(get_document + list_documents), 2 new backend endpoints, 1 endpoint
extension (studies ?target= filter), 3 new frontend pages, 2 modified
frontend surfaces. No migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(adapters): SearchAdapter Protocol additions — Document + DocumentPage + 2 methods (Story 1.1)

- Add Document(doc_id, source) — return shape of get_document
- Add AdapterDocumentHit(doc_id, source, sort) — per-hit list page entry; the
  per-hit sort lets the router encode cursor from the in-body hit under
  the limit+1 overfetch pattern (no last_sort field per cycle-2 F1)
- Add DocumentPage(hits, total) — return shape of list_documents
- Add get_document / list_documents Protocol method signatures
- Update _StubAdapter + protocol async-method assertion (5 names → 7)
- Add 10 new unit tests for the 3 models (28 → 29 → covers Field(min_length=1)
  on doc_id, _source=None case, missing-sort rejection, no-last_sort guard)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(adapters): ElasticAdapter.get_document + list_documents (Stories 1.2 + 1.3)

Story 1.2 — get_document:
- GET /<target>/_doc/<doc_id> via _request(translate_errors=False)
- found:false → None; index_not_found_exception → TargetNotFoundError
- 401/403 → TargetsForbiddenError (matches list_targets pattern; spec FR-1)
- urllib.parse.quote on target + doc_id (spec D-25 + AC-16)
- 10 unit tests covering all 6 paths + URL encoding

Story 1.3 — list_documents:
- POST /<target>/_search with match_all + sort _id:asc + track_total_hits:true (spec D-24)
- search_after threads into body; fields → _source.includes
- Returns DocumentPage(hits=[AdapterDocumentHit(doc_id, source, sort)], total)
- Per-hit sort uses h["sort"] (fail-loud KeyError if engine omits; cycle-2 F10)
- 14 unit tests covering happy path, body shape, error mappings, sort fail-loud

Both methods route through the existing _request retry/translate plumbing
so they inherit the spec §13 single-retry budget.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: mark feat_index_document_browser Epic 1 complete (paused checkpoint)

After Stories 1.1 + 1.2 + 1.3 committed, pause for operator-chosen
checkpoint before Epic 1 phase gate (GPT-5.5 review + full suite) and
the rest of the 10 remaining stories. Branch feat/index-document-browser
holds the work; resume with /impl-execute ... --all.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): documents-browse helper modules (Story 2.1)

Lands the pure-function helpers that Story 2.2 (list endpoint) and Story
2.3 (detail endpoint) compose:

- _errors.py: shared _err() builder for the spec §7.5 envelope; resolves
  the F6 circular-import risk so helpers don't import from clusters.py.
  clusters.py switches to re-export.
- _documents_cursor.py: base64-urlsafe JSON encode/decode of the ES
  hits[i].sort array. Decode validates the parsed value is a list
  (cycle-2 F9) so a syntactically-valid encoded {} can't slip through
  to ES.
- _documents_fields.py: parse_fields_csv() per FR-3 (trim, dedup,
  dotted-paths, wildcard-reject).
- _strict_query_params.py: FastAPI dependency factory that rejects
  unknown query params with 422 VALIDATION_ERROR (per D-21 / FR-12).
- services/documents.py: truncate_source_for_list() with per-field +
  whole-doc UTF-8 byte caps per D-27, plus exported sentinel constants.
- services/_target_filter.py: check_target_visible() glob matcher for
  cluster.target_filter (anti-enumeration via 404, per D-13).

39 unit tests across the 5 helpers; mypy --strict clean; ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): documents list + detail endpoints (Stories 2.2 + 2.3)

Lands two new engine-passthrough endpoints under
/api/v1/clusters/{cluster_id}/targets/{target}:

- GET .../documents — paginated (cursor + limit+1 overfetch) _id +
  truncated _source preview; X-Total-Count header; strict query-param
  allowlist (cursor/limit/fields) + 422 on unknown; ?fields= CSV with
  wildcard rejection; target_filter glob enforced as 404 (anti-
  enumeration per D-13).
- GET .../documents/{doc_id:path} — single document detail; FastAPI
  :path converter round-trips slashes (D-17 / AC-16); 404
  DOCUMENT_NOT_FOUND distinct from TARGET_NOT_FOUND.

Adapter sort key flipped from _id → _doc per spec D-26 fallback: ES 9
disallows _id fielddata by default (indices.id_field_data.enabled is
off), surfacing as HTTP 400 on first live-ES probe. _doc is shard-
internal but always available; search_after works against it
identically. Cursor continuity tradeoff acknowledged in adapter
docstring.

Validation moved before DB I/O so malformed cursor / wildcard fields
always surface as 422 regardless of cluster_id validity.

DocumentSummary + DocumentListResponse added to schemas.py; detail
endpoint returns the adapter Document model directly (single source
of truth — no router-side schema drift per cycle-2 F3).

Tests:
- 11 contract tests (5 router-source + 7 endpoint via real app; ES not
  required for any). Asserts the 6 spec §7.5 codes + X-Total-Count
  header + strict-dep wiring + truncation sentinel.
- 21 integration tests (16 stub-adapter for full error-code coverage +
  5 live-ES happy path covering AC-15 exact-multiple termination,
  fields filter, doc-id-with-slash round-trip, full corpus paginate).
  Both layers skip cleanly when prerequisites absent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): ?target= filter on GET /api/v1/studies (Story 2.4 / FR-5)

list_studies() + count_studies() repo functions gain a `target: str | None`
kwarg (default None — source-compatible per D-22). The /api/v1/studies
endpoint adds a corresponding ?target=<min_length=1, max_length=256>
Query param that composes with every existing filter (status, cluster_id,
since, q, sort) via AND.

The frontend's "view studies targeting this index" link from the index
summary page (FR-7) consumes this filter; the same param drives the
LinkedEntitiesRow's filter-chip experience (FR-10, lands in Story 3.5).

Tests:
- 6 integration tests covering: target alone, target+cluster_id AND
  composition, target+status AND composition, default unchanged
  behavior, target+malformed-cursor → 422 (cycle-3 F2 — target filter
  doesn't bypass cursor validation), empty target → 422.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): Indices card on cluster detail (Story 3.1 / FR-6)

New ClusterDetailIndicesCard reuses the existing useClusterTargets hook
to render an indices table on the cluster-detail page, between the
ClusterActionBar and the StudiesByClusterTable card.

Each row links to /clusters/[id]/indices/[name] (lands in Story 3.2).
Indices sort by name ascending via localeCompare regardless of engine
return order. Doc count formatted with toLocaleString(); null fallback
renders an em-dash.

Error states wire the standard backend envelope codes to inline copy:
- TARGETS_FORBIDDEN → ACL hint citing the monitor privilege +
  cluster-registration runbook.
- CLUSTER_UNREACHABLE → Retry button calling query.refetch().

5 new glossary keys added (cluster.indices_card, cluster.target_doc_count,
target.schema, target.schema_analyzer, document.truncation_sentinel) —
all per-key length validators clean (<=140 short / <=800 long).

5 vitest tests cover sort, doc_count formatting, empty state, 403
TARGETS_FORBIDDEN inline hint, and the CLUSTER_UNREACHABLE retry button
render.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): index summary page (Story 3.2 / FR-7 / AC-2)

New route /clusters/[id]/indices/[name]/page.tsx composes the existing
/targets + /schema endpoints (no new API call) into a summary page with:
- Page header: <index name> · <formatted doc_count> · <engine type chip>
- Two nav cards: "Browse documents" + "View studies targeting this
  index" (links to /studies?cluster_id=...&target=... per D-3)
- Sortable schema table extracted as IndexSummarySchemaTable for reuse.

Error states cover every spec surface:
- 404 TARGET_NOT_FOUND on /schema → AC-17 empty state with breadcrumb.
- 403 on /targets + 200 on /schema → D-28 partial-permission state
  ("document count unknown").
- 403 on BOTH → cycle-2 F8 full-denial state.

4 vitest tests cover happy path, 404 not-found, partial-permission, and
full denial.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): documents list page (Story 3.3 / FR-8)

New /clusters/[id]/indices/[name]/documents page renders the paginated
documents browse:
- _id + truncated _source preview (first 3 fields rendered as key=value).
- Page header reads X-Total-Count from the response header (cycle-2 F8
  resolution) and renders <toLocaleString> documents.
- Prev / Next cursor controls with cursorStack so Prev steps back one
  page rather than re-loading the first page. Page-size selector
  (25 / 50 / 100, default 25, max 100). Cursor stack resets on
  page-size change.
- Row click navigates to .../documents/[doc_id] with the doc_id
  URL-encoded (lands in Story 3.4).
- Truncation sentinel rendered verbatim with InfoTooltip linking to the
  document.truncation_sentinel glossary entry; matches the backend
  literal exported via ui/src/lib/documents-constants.ts.
- Error states: 403 TARGETS_FORBIDDEN inline ACL hint, 404
  TARGET_NOT_FOUND inline, 503 CLUSTER_UNREACHABLE Retry button.

New lib/api/documents.ts hook (useTargetDocuments) parses X-Total-Count
from the response headers; inline DocumentSummary / DocumentListResponse /
Document interfaces until the OpenAPI codegen regenerates ui/src/lib/
types.ts.

6 vitest tests cover happy paginated render, truncation sentinel,
empty state, 403, 404, and CLUSTER_UNREACHABLE retry button.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): document detail page (Story 3.4 / FR-9)

New catch-all route /clusters/[id]/indices/[name]/documents/[...doc_id]
per D-17 — accepts doc_ids containing literal slashes (AC-16).

Renders the full _source as pretty-printed JSON (JSON.stringify(..., 2))
inside a max-height scroll container. Copy-JSON button uses
navigator.clipboard.writeText with a 1.5s "Copied" indicator. Hidden
when source is null.

Breadcrumb: Cluster > <index name> > Documents > <doc_id>.

States:
- AC-5 happy: pretty JSON + Copy button.
- AC-9: 404 DOCUMENT_NOT_FOUND empty state with breadcrumb.
- AC-18: source=null → "_source: false" amber hint, Copy button hidden.
- AC-16: doc_id with literal slashes round-trips (URL-encoded twice on
  the wire so Next.js decodes once into the literal value).
- 403 TARGETS_FORBIDDEN inline ACL hint.
- 503 CLUSTER_UNREACHABLE + TARGET_NOT_FOUND inline messages with Retry.

5 vitest tests cover happy path, DOCUMENT_NOT_FOUND, source=null,
slash-in-doc_id round-trip, and TARGETS_FORBIDDEN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): LinkedEntitiesRow Index entry + studies ?target= chip (Story 3.5)

LinkedEntitiesRow grows from 4 to 5 entries (FR-10 / AC-11) — appends
"Index" pointing at /clusters/[id]/indices/[name] with cluster_id +
target URL-encoded.

useStudies hook signature extended with target?: string | undefined;
the query key includes target so cache keys differ across filter
values (no false-positive rehydration).

Studies list page now consumes ?target= and ?cluster_id= from
searchParams. When ?target= is present, renders an active filter chip
"Target: <name>" with an × button. Clicking × calls router.replace
with a URLSearchParams copy that drops target= while preserving every
other param (status, cluster_id, sort, cursor) — F9 resolution.

3 vitest tests on LinkedEntitiesRow cover the 5 entries, the
linked-index href shape, and target-with-special-chars URL encoding.
3 vitest tests on the studies page cover chip render when ?target= is
present, × click drops target while preserving cluster_id, and chip
absent when ?target= is missing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): index-document-browser top-down + filter chip spec (Story 3.6)

New Playwright spec exercises the full feat_index_document_browser flow
against the real backend (no page.route() mocking per CLAUDE.md):

1. Top-down: bulk-load 30 docs into a unique ES test index via the live
   ES cluster, register a cluster pointing at it, navigate
   /clusters/[id] → Indices card → click index → summary page → Browse
   documents → click first row → document detail; assert each page's
   testids + the JSON preview.
2. Filter chip: navigate /studies?target=hypothetical-index, assert
   the active filter chip renders + clicking × drops the param from
   the URL.

Resolves OQ-1 by seeding the corpus inline rather than depending on
scripts/seed_meaningful_demos.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(planned): mark feat_index_document_browser implementation complete

All 13 stories shipped across Epics 1–3:
- Adapter Protocol + ElasticAdapter (Stories 1.1–1.3)
- Documents helper modules + list/detail endpoints + studies ?target=
  filter (Stories 2.1–2.4)
- 4 frontend routes + LinkedEntitiesRow + filter chip + E2E
  (Stories 3.1–3.6)

Updates implementation_plan.md execution tracker (all stories [x]) and
pipeline_status.md to "Complete — PR pending" with a precise inventory
of tests added.

Captures one tangential discovery as an idea file:
- bug_starlette_request_poisons_fastapi_depends_tests — a test-infra
  footgun where mixing direct Request(scope=...) instantiation and
  FastAPI Depends(...) in the same test file breaks the latter
  silently. Workaround documented; reproducer + upstream issue
  deferred.

Dashboards regenerated by the pre-commit hook to pick up the new
bug_starlette_request_poisons_fastapi_depends_tests folder.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(contract): register the 2 new documents endpoints in EXPECTED_ENDPOINTS

CI's test_openapi_surface.py::test_openapi_has_no_orphan_endpoints flagged
the documents list + detail endpoints as orphan — the routes ship without
an entry in EXPECTED_ENDPOINTS, which would let future drift go uncaught.

Adds the two (method, path) tuples bracketing the existing
/run_query entry, mirroring its convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(adapter): guard ES 404 .json() + compact cursor encoding (Gemini PR #285)

Adjudicated Gemini Code Assist's 5 medium-priority findings on PR #285:

Accepted (3):
- get_document 404 path — wrap resp.json() in try/except so an intermediate
  proxy returning a non-JSON HTML 404 page surfaces as None/return rather
  than bubbling JSONDecodeError as a 500.
- list_documents 404 path — same guard; falls through to
  ClusterUnreachableError on malformed body, same as the non-index-not-found
  branch.
- encode_documents_cursor — use separators=(",", ":") so the base64 blob
  in query strings stays compact.

Rejected (2) with cited counter-evidence:
- get_document 200-path JSON guard: existing adapter convention (list_targets
  at elastic.py:413, get_schema at :282, search_batch at :638/701) all
  call resp.json() unguarded on the success path. Adding a guard here
  would create asymmetric handling across the adapter; the assumption is
  that a 200 from ES is well-formed JSON.
- list_documents 200-path JSON guard: same rationale.

The 3 accepted fixes ship in this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(adapters): align Protocol docstring with _doc sort (GPT-5.5 cycle-1 #1)

The Protocol's list_documents docstring still claimed _id sorting after
the implementation flipped to _doc per spec D-26 fallback (ES 9 disables
_id fielddata by default). This created a small contract drift between
the abstract method and the ElasticAdapter implementation.

Updates the docstring to describe the actual contract: adapters paginate
via search_after over a stable internal sort key, with ElasticAdapter
specifically using _doc, and document the spec D-26 rationale inline.
Other engines remain free to pick any stable key as long as
hits[i].sort is returned for search_after round-trips.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(planned): capture infra_smoke_job_chronic_flake

The pr.yml workflow has been failing/cancelled on main for ~20
consecutive pushes since PR #188 (2026-05-23) added the seed-demo +
Playwright steps to the smoke job. Every other job (backend, frontend,
docker buildx) passes; only `smoke (operator-path tutorial flow)` is
red.

Surfaced during PR #285 (feat_index_document_browser) CI watch when
the operator asked whether the seed step had been failing since
introduction. The seed step itself succeeds — it's just slow enough
to push the job into the Playwright window where the flakes live, and
concurrency policy cancels in-flight runs.

Idea file describes the 4-axis cost stack (seed-demo unconditional
re-run, Playwright flakes, no concurrency: block, push:main trigger
surface) and proposes a 3-phase fix path (cheap signal recovery →
Playwright bisect → workflow split).

P1 priority — the pr workflow's red badge on main no longer signals
real regressions; a real backend break would currently look identical
to the chronic state.

Dashboards regenerated by the pre-commit hook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 28, 2026
PR #291 first CI run surfaced an integration-test FK collision under
parallel pytest execution:

  IntegrityError: update or delete on table "clusters" violates foreign
  key constraint "query_sets_cluster_id_fkey" on table "query_sets"

Triggered at teardown of TestClusterRepoBasics.test_create_then_fetch_round_trip
when a parallel xdist worker held an FK reference to a cluster being
deleted in another worker's teardown. The integration test layer isn't
isolated enough at the DB level for parallel execution against a shared
Postgres.

Reverting `-n auto` on both backend jobs:
- backend-unit-fast: was 33s baseline, went to 43s with -n auto (xdist
  worker-startup overhead exceeded parallelism savings on the 2-core
  runner for an already-fast test suite). Revert is a net win.
- backend (lint + typecheck + tests + coverage): was 6m 1s baseline,
  parallel execution introduced FK collisions. Revert restores
  correctness.

Keeping pytest-xdist in dev deps for local opt-in (the unit-only path
is parallel-safe locally where the runner has more cores; the
worker-startup overhead is dominated by 1500+ unit tests on 8+ cores).

Follow-up captured in chore_ci_perf_buildx_artifact_image_cache_xdist
§"What is NOT changed in this PR": split the backend job into a
parallel-safe "unit + contract" lane (where -n auto helps) + a serial
"integration" lane (where collisions live) to recover #3's intended
savings without the correctness risk.

CI-perf #1 (buildx artifact handoff) + #2 (base-image cache) are
unaffected and remain the actual smoke-pace wins.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
SoundMindsAI added a commit that referenced this pull request May 28, 2026
…mage cache + pytest-xdist) (#291)

* chore(ci): buildx artifact handoff + base-image cache + pytest-xdist

Optimizes the two longest CI jobs to bring smoke under its 15min ceiling
(was timing out on PR #290 after the docker bumps invalidated the
implicit Docker layer cache).

Three changes bundled:

  1. Buildx artifact handoff (~5min off smoke). The `docker` job already
     builds the API image; add a parallel `docker-ui` job for the UI
     image. Both upload tars via actions/upload-artifact. Smoke
     `needs: [docker, docker-ui]`, downloads + docker-loads both before
     `make up`. RELYLOOP_SKIP_BUILD=1 makes install.sh skip its
     `docker compose build` step (new escape hatch in install.sh step 6);
     RELYLOOP_GIT_SHA picks up the loaded images by tag.

  2. Base-image cache (~1-2min off smoke on hit). actions/cache keyed on
     hashFiles('docker-compose.yml'). On miss: docker pull + save the 4
     service-container images. On hit: docker load each tar — ~5s vs
     60-90s for pull.

  3. pytest-xdist parallel execution (~3min off backend full + ~15s off
     backend-unit-fast). pytest-xdist>=3.6 added to dev deps; `-n auto
     --dist worksteal` passed to backend pytest commands.

Expected impact: smoke 13-15min → ~7-9min; backend full 8m 36s → ~4-5min.
Total ~7-10min saved per PR run.

See chore_ci_perf_buildx_artifact_image_cache_xdist/idea.md for the
risk analysis (xdist DB collisions, artifact overhead, cache key
staleness) and the not-in-scope follow-ups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* fix(ci): revert -n auto on backend jobs (xdist DB collisions)

PR #291 first CI run surfaced an integration-test FK collision under
parallel pytest execution:

  IntegrityError: update or delete on table "clusters" violates foreign
  key constraint "query_sets_cluster_id_fkey" on table "query_sets"

Triggered at teardown of TestClusterRepoBasics.test_create_then_fetch_round_trip
when a parallel xdist worker held an FK reference to a cluster being
deleted in another worker's teardown. The integration test layer isn't
isolated enough at the DB level for parallel execution against a shared
Postgres.

Reverting `-n auto` on both backend jobs:
- backend-unit-fast: was 33s baseline, went to 43s with -n auto (xdist
  worker-startup overhead exceeded parallelism savings on the 2-core
  runner for an already-fast test suite). Revert is a net win.
- backend (lint + typecheck + tests + coverage): was 6m 1s baseline,
  parallel execution introduced FK collisions. Revert restores
  correctness.

Keeping pytest-xdist in dev deps for local opt-in (the unit-only path
is parallel-safe locally where the runner has more cores; the
worker-startup overhead is dominated by 1500+ unit tests on 8+ cores).

Follow-up captured in chore_ci_perf_buildx_artifact_image_cache_xdist
§"What is NOT changed in this PR": split the backend job into a
parallel-safe "unit + contract" lane (where -n auto helps) + a serial
"integration" lane (where collisions live) to recover #3's intended
savings without the correctness risk.

CI-perf #1 (buildx artifact handoff) + #2 (base-image cache) are
unaffected and remain the actual smoke-pace wins.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* fix(ci): seed_es shard race + verify-script tolerates indented build line

Two follow-up fixes for PR #291 surfaced by the 2nd CI run:

1. seed_es.py: ES 9.x single-node shard race
   ---------------------------------------
   The faster stack-up (~3min vs ~10min baseline) stopped masking a
   latent race: seed_es creates the products index then immediately
   bulk-indexes. ES 9.x default `number_of_replicas: 1` tries to
   allocate a replica that can never bind on a one-node cluster,
   leaving the primary itself in an INITIALIZING → STARTED race that
   surfaces as `unavailable_shards_exception: [products][0] primary
   shard is not active Timeout: [1m]` on the bulk-index call.

   Fix:
   - Set `number_of_replicas: 0` in the index create call (single-node
     hygiene; the previous default-1 left an unallocatable replica)
   - Add an explicit `_cluster/health/{index}?wait_for_active_shards=1`
     poll between create and bulk so the failure mode (if it ever
     resurfaces) gives a clean error instead of the 1-min ES bulk-side
     timeout

2. scripts/ci/verify_install_builds_all_services.sh: regex too strict
   ------------------------------------------------------------------
   PR #291 added a RELYLOOP_SKIP_BUILD=1 escape hatch in install.sh
   that wraps `docker compose build` in an `if [[ ... ]]; then` block.
   The line is now indented 2 spaces. The verify script's anchor
   `^docker compose build` no longer matched, so the check failed
   "no 'docker compose build' line found" even though the line was
   present (just indented).

   Fix:
   - Allow leading whitespace in the grep + the sed strip:
     `^[[:space:]]*docker compose build( .*)?$`
   - Indentation is irrelevant to the drift this gate exists to catch;
     what matters is that the args list covers every buildable service.

Both surfaced by PR #291 CI but trace back to existing behavior
(seed_es race was masked by previous slow stack-up; verify script
became incompatible with the conditional wrapper). Local verify of
each:
- bash scripts/ci/verify_install_builds_all_services.sh → "OK (no-args
  = builds all)"
- seed_es.py change is straight schema-additive (settings block) +
  one extra GET call; no behavior change on multi-node clusters

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* fix(ci): skip install.sh auto-seed in smoke job (was eating 5min/run)

PR #291's 3rd CI run smoke job hit timeout-minutes: 15 again at 15m 27s.
Step-level breakdown showed "Bring up the stack" took 8m 5s vs the 2m 56s
seen on the 2nd run. The discrepancy: install.sh step 8 calls
`python3 scripts/seed_meaningful_demos.py --if-empty` unconditionally as
part of `make up`. That's the same demo-seed work I removed from pr.yml
in chore_drop_demo_seed_from_ci — but install.sh's auto-seed still ran
inside `make up`.

The smoke job no longer needs the demo data (the 2 demo-dependent E2E
specs `dashboard.spec.ts` + `dashboard-reseed.spec.ts` are skipped in CI
via playwright.config.ts's CI-conditional testIgnore from PR #290).

Fix: mirror the RELYLOOP_SKIP_BUILD escape-hatch pattern from PR #291.
Add RELYLOOP_SKIP_AUTO_SEED=1 honored by install.sh step 8, and set it
on the smoke job's `Bring up the stack` step. Saves ~5min per run.

Expected new smoke total: ~7-9 min (well under the 15min ceiling). The
previous 2nd-run's 6m 10s was the floor when the auto-seed happened to
fast-path (probably because something was already partially seeded);
this should be consistent now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* chore(ci): skip 4 more demo-data-dependent E2E specs in CI

PR #291's 4th CI run surfaced 6 additional E2E specs (across 4 spec
files) that depend on demo data populated by
scripts/seed_meaningful_demos.py — auto-followup chain, index-document
browser, Step-4 builder, Step-1 target picker. With
RELYLOOP_SKIP_AUTO_SEED=1 keeping install.sh's ~5min auto-seed out of
CI, these specs assert on data that doesn't exist and hit 14-iteration
retry timeouts.

Extends the CI-conditional testIgnore in ui/playwright.config.ts:

  Before:
    - dashboard.spec.ts
    - dashboard-reseed.spec.ts

  Added:
    - auto-followup.spec.ts
    - index-document-browser.spec.ts
    - studies-create-builder.spec.ts
    - studies-create-target-dropdown.spec.ts

Locally these specs continue to run after `make up` (the operator path
does the auto-seed). Only CI skips them.

Same trade-off as the original 2 from chore_drop_demo_seed_from_ci:
lose CI coverage of these 6 specific flows, accept that vitest
component coverage + the smoke test's LLM round-trip still validate
the underlying behavior. Net: smoke completes in ~5 min instead of
timing out at 15 min.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* fix(seed-es): drop redundant cluster-health wait that hit httpx timeout

PR #291 5th CI run failed with httpx.ReadTimeout on the seed-es step
after exactly 30 seconds. Root cause: the cluster-health call I added in
the 3rd-run seed_es fix used `timeout=30s` server-side, which collided
with the httpx client's `timeout=30.0` set at module top. When ES is
cold (first run after image-cache miss) the health endpoint takes the
full 30s, exceeding the client timeout by a hair and raising
ReadTimeout.

The real fix from the 3rd-run commit was `number_of_replicas: 0` in the
index create call — that eliminates the unallocatable-replica race on
single-node ES. The extra cluster-health GET was belt-and-suspenders
that added more variance than safety. Dropping it.

If shard allocation ever does stall on single-node ES with
number_of_replicas=0, the bulk-index call's own 30s ES timeout will
surface a clear error path. No need to pre-check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* fix(compose): ES + OS healthchecks gate on write-readiness, not just alive

PR #291's 5th + 6th smoke runs failed with httpx.ReadTimeout on the
seed-es step's index-create call. Pattern:
- Bring up the stack succeeds (1m 22s) — install.sh's
  `docker compose up -d --wait` returns when all healthchecks pass
- Seed clusters succeeds (4s) — Postgres + cluster API are responsive
- Seed sample ES index FAILS with httpx.ReadTimeout — ES says "alive"
  but isn't ready for write ops yet

Root cause: the ES + OS healthchecks tested mere responsiveness
(`curl -fs http://localhost:9200/_cluster/health`) rather than
write-readiness. ES returns 200 from /_cluster/health as soon as the
HTTP layer is up, even while shard allocation is still in flight. The
slow stack-up (10min baseline) masked this by ambient delay; PR #291's
fast stack-up (1m 22s) doesn't grant ES enough warmup before client
writes.

Fix: extend both healthchecks to use `wait_for_status=yellow&timeout=20s`.
This blocks the healthcheck until ES has at least yellow status (single-
node ES never goes green because the default 1-replica setting can't
allocate without a 2nd node — yellow is the right "ready for writes"
floor). healthcheck `timeout: 25s` gives the ES `timeout=20s` enough
headroom + 5s for the curl overhead.

Now `docker compose up --wait` doesn't return until ES is genuinely
ready for index ops, fixing the seed-es flake at its source rather
than papering over it with extra waits in seed_es.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* fix(seed-es): bump httpx timeout 30s → 90s for cold-runner ES write path

PR #291's 6th + 7th smoke runs both failed at the seed-es step with
httpx.ReadTimeout on the `PUT /products` index-create call. Pattern:
the fast stack-up (compose `Bring up the stack`: 10min → 21s on this
run) eliminated the ~5min of ambient ES warmup time that the previous
slow path was implicitly granting. ES 9.4.1 on a cold GHA runner is
"healthy" per `_cluster/health?wait_for_status=yellow` (passes early on
single-node ES — no shards to wait on) but its write path takes >30s
to respond to the first index-create PUT.

Bump the httpx client timeout from 30s → 90s. This gives ES enough
headroom on a cold runner without making real failure modes invisible.

If smoke goes green on the next run, the locked-in CI-perf wins
(smoke 15min+ timeout → ~5min, buildx artifact handoff, base-image
cache) ship clean. If the 90s timeout still doesn't help, ES has a
genuine startup issue that needs separate investigation (potential
candidates: reverting OS 3.6.0 / ES 9.4.1 to prior majors, or sleep
in the workflow between `Bring up the stack` and `Seed sample ES
index`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* fix(compose): roll back ES/OS healthcheck tightening (broke compose --wait)

The healthcheck tightening I added 2 commits back
(wait_for_status=yellow&timeout=20s) made `docker compose up -d --wait`
fail. PR #291's 8th smoke run died at "Bring up the stack" after
1m 33s when the new healthcheck couldn't satisfy within compose's
retry budget.

Root cause: my interpretation of "ES is slow to be write-ready" was
wrong direction. The 6th/7th runs' httpx.ReadTimeout on the index-
create PUT wasn't a healthcheck issue — ES was responsive at the
healthcheck level. The actual issue was the httpx client's 30s
timeout being tight against ES 9.4.1's cold-runner write-path latency.

The seed_es httpx timeout bump (30s → 90s, previous commit) addresses
the actual problem. Reverting the healthcheck change keeps compose
`up --wait` working unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* docs: capture seed-es shard race as bug idea for follow-up

PR #291's CI-perf optimizations exposed a latent ES 9.4.1 single-node
race: the bulk-index call after creating the products index sometimes
returns unavailable_shards_exception. Previously masked by the slow
~10min stack-up; now surfaces because stack-up is ~30s-2min.

Captures 4 candidate fixes ranked by effectiveness + risk. Closing
PR #291 with this follow-up per CLAUDE.md "implement-over-defer" —
the seed-es retry is cross-cutting enough to warrant a separate PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

---------

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 29, 2026
…now, docstrings)

GPT-5.5 final review of PR #299 — 3 accepted, 1 deferred:

- #1 (Medium, accepted): wrap redis.aclose() in the finally block in its
  own try/except + WARN log. A raise from aclose() would otherwise
  replace the re-raised original exception (or fail an
  otherwise-successful job).
- #3 (Low, accepted): normalize a naive `now` arg in
  reseed_status_is_stale() to UTC — an aware-minus-naive subtraction
  would raise TypeError. Production never passes `now`; this guards
  callers/tests. + regression test test_naive_now_argument_treated_as_utc.
- #4 (Low, accepted): fix stale `BaseException` wording in the worker +
  test docstrings (code already uses `Exception`).
- #2 (Medium, deferred non-regression): stale-recovery check-then-set is
  non-atomic. Counter-evidence: the deterministic Arq job_id +
  advisory lock already prevent duplicate runs. Captured as
  chore_demo_reseed_stale_recovery_atomic_cas/idea.md.

Includes dashboard regen triggered by the new chore_ idea folder.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
SoundMindsAI added a commit that referenced this pull request May 29, 2026
…home-button silent failure (#299)

* docs(idea-preflight): refresh bug_demo_reseed_button_silent_enqueue_failure idea

Add Depends on + Coordinate with lines; expand Problem section with
explicit gap-region citations (lines 76-88 outside outer try; lines
91-133 inside try but no except); add structlog-buffering hypothesis;
lock the re-raise-after-status-write choice in fix design with
rationale (Arq ops visibility + worker-log traceback); split the
diagnostic print() from the exception barrier into its own capability;
refactor regression test to unit-level (no chore_demo_seeding_integration
dependency — uses the existing ctx-pool fallback at demo_reseed.py:82-88).

Includes dashboard regen triggered by the idea.md edit (no folder
adds/moves — just frontmatter refresh; dashboard hash unchanged in
practice but the regen hook fired anyway).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* fix(demo-reseed): add top-level exception barrier + stale-status auto-recovery

The home page "Reset to demo state" button enqueued the run_demo_reseed
Arq job, but two gap regions in the worker let exceptions escape without
writing status="failed" to Redis:

  - Lines 76-88 (settings load, factory init, Redis acquisition) sat
    OUTSIDE the outer try block.
  - Lines 91-133 (get_engine, engine.connect, advisory lock, factory(),
    httpx.AsyncClient(...)) sat INSIDE the outer try but the block had
    no except, only a finally to close Redis.

When either gap region raised, Arq marked the job JobExecutionFailed
but Redis stayed stuck at the POST handler's initial "running" payload
indefinitely, leaving the operator's UI at "Scenario 0 of 5 (0%)" and
blocking subsequent POSTs with 409 SEED_IN_PROGRESS until a manual
Redis cleanup. The inner except (DemoSeedingError, httpx.HTTPError,
Exception) at line 150 only catches errors inside reseed_demo_state,
not the init regions.

Fix per bug_demo_reseed_button_silent_enqueue_failure §"Proposed
capabilities":

1. Wrap the entire run_demo_reseed body in `except BaseException` that
   writes status="failed" with the exception class + first 200 chars
   of the message, then re-raises. Re-raising preserves Arq's
   JobExecutionFailed record AND emits a worker-log traceback the
   operator can read. The inner reseed_demo_state handler keeps its
   return (no re-raise) because retrying the destructive wipe is the
   wrong behavior.

2. Acquire Redis FIRST so the barrier can write status even when
   settings/factory/engine init explodes. Preserves Gemini PR #286
   finding #7 (reuse Arq's managed pool from ctx) and finding #8
   (only close Redis when we created it ourselves).

3. Add reseed_status_is_stale() helper in
   backend/app/services/demo_seeding.py — defense-in-depth for the
   case where the worker process itself dies (OOM, container restart)
   before any exception handler runs. The POST handler uses it to
   convert a stuck-running status (started_at older than
   DEMO_RESEED_JOB_TIMEOUT_S = 1200s) into "treat as failed and
   proceed" instead of 409.

4. Hoist DEMO_RESEED_JOB_TIMEOUT_S from workers/demo_reseed.py to
   services/demo_seeding.py so the route handler can read it without
   importing from the workers package. Worker re-exports for back-compat.

Regression tests:

  - backend/tests/unit/workers/test_demo_reseed_exception_barrier.py
    (4 tests): get_engine + get_session_factory raising both flip
    Redis to "failed" and re-raise; ctx-managed Redis stays open;
    self-created Redis is closed in the finally block.
  - backend/tests/unit/services/test_reseed_status_is_stale.py
    (10 tests): timeout boundary (== timeout → not stale, > timeout →
    stale), idle/complete/failed never stale, missing/malformed
    started_at conservative-not-stale, naive timestamps treated as UTC.

Verified on main: 3 of 4 exception-barrier tests fail (the 4th —
"does-not-close-arq-redis" — trivially passes because the bare
try/finally never reached the close path either).

No DB migration, no env var, no operator action. Existing happy path
unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* fix(demo-reseed): narrow exception barrier from BaseException to Exception

Per Gemini PR #299 review (medium): catching BaseException intercepts
asyncio.CancelledError (Arq's job-timeout cancellation mechanism, a
BaseException subclass since 3.8) plus SystemExit/KeyboardInterrupt
(worker shutdown). Awaiting status_set from inside a handler that caught
one of those would re-raise CancelledError — masking the original — or
delay/hang shutdown with network I/O.

The documented bug (init-region exceptions: settings load, factory init,
get_engine, engine.connect, httpx.AsyncClient construction) is fully
covered by Exception — all those failures inherit from it. No behavior
change for the regression tests (they raise RuntimeError/ValueError).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* fix(demo-reseed): apply GPT-5.5 review findings (aclose guard, naive-now, docstrings)

GPT-5.5 final review of PR #299 — 3 accepted, 1 deferred:

- #1 (Medium, accepted): wrap redis.aclose() in the finally block in its
  own try/except + WARN log. A raise from aclose() would otherwise
  replace the re-raised original exception (or fail an
  otherwise-successful job).
- #3 (Low, accepted): normalize a naive `now` arg in
  reseed_status_is_stale() to UTC — an aware-minus-naive subtraction
  would raise TypeError. Production never passes `now`; this guards
  callers/tests. + regression test test_naive_now_argument_treated_as_utc.
- #4 (Low, accepted): fix stale `BaseException` wording in the worker +
  test docstrings (code already uses `Exception`).
- #2 (Medium, deferred non-regression): stale-recovery check-then-set is
  non-atomic. Counter-evidence: the deterministic Arq job_id +
  advisory lock already prevent duplicate runs. Captured as
  chore_demo_reseed_stale_recovery_atomic_cas/idea.md.

Includes dashboard regen triggered by the new chore_ idea folder.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

---------

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 29, 2026
…LSearchParams)

- auto-followup.spec.ts getName: add `if (!resp.ok) throw` to match the
  robust pattern already in studies-create-builder + studies-create-validation
  getName helpers (Gemini #1).
- dashboard-reseed.spec.ts: align API_BASE to
  `PLAYWRIGHT_API_BASE_URL ?? 'http://127.0.0.1:8000'` — it was the lone
  spec using NEXT_PUBLIC_API_BASE_URL ?? 'http://localhost:8000', the
  exact API-base inconsistency this chore targets (Gemini #2).
- studies-create-target-dropdown.spec.ts: build the studies?q=&limit=
  query via URLSearchParams instead of interpolating studyName into the
  query string — safe even if a future study name carries special chars
  (Gemini #3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
SoundMindsAI added a commit that referenced this pull request May 29, 2026
…es) (#301)

* test(e2e): sweep ${API_BASE} string-concat to URL constructor (28 sites)

Per chore_e2e_api_base_url_construction. Gemini flagged the
${API_BASE}/<path> concat pattern on PR #273 and PR #273 fixed the one
line it touched, deferring the sibling sweep to a focused PR.

Idea-preflight before pickup found the pattern had grown from the
originally-censused 5 sites/3 files (2026-05-26) to 28 sites across 10
e2e spec files, as new specs landed with the same convention. The
idea's rationale (partial sweep creates inconsistency) only strengthened
with the larger N, so this does the full sweep:

  `${API_BASE}/api/v1/...`        → new URL(`/api/v1/...`, API_BASE).toString()
  `${API_BASE}${path}`            → new URL(path, API_BASE).toString()
  `${API_BASE}${path}?limit=200`  → new URL(`${path}?limit=200`, API_BASE).toString()

The URL constructor collapses a double slash regardless of whether
API_BASE carries a trailing slash — the durable fix for the latent
PLAYWRIGHT_API_BASE_URL=...8000/ footgun (CI never sets a trailing
slash, so no signal had fired).

All variable-path call sites verified to pass /-prefixed paths, so
new URL(path, API_BASE) is behaviorally identical. No assertion changes
— same endpoints, same behavior. pnpm lint + pnpm typecheck clean.
Includes dashboard regen triggered by the idea.md scope-correction edit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* test(e2e): address Gemini PR #301 findings (resp.ok, API_BASE env, URLSearchParams)

- auto-followup.spec.ts getName: add `if (!resp.ok) throw` to match the
  robust pattern already in studies-create-builder + studies-create-validation
  getName helpers (Gemini #1).
- dashboard-reseed.spec.ts: align API_BASE to
  `PLAYWRIGHT_API_BASE_URL ?? 'http://127.0.0.1:8000'` — it was the lone
  spec using NEXT_PUBLIC_API_BASE_URL ?? 'http://localhost:8000', the
  exact API-base inconsistency this chore targets (Gemini #2).
- studies-create-target-dropdown.spec.ts: build the studies?q=&limit=
  query via URLSearchParams instead of interpolating studyName into the
  query string — safe even if a future study name carries special chars
  (Gemini #3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

---------

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request May 29, 2026
…+ dashboards

idea-preflight across the MVP2 bucket + adjudicated 6 Gemini comments on PR #312.

Stale-link root cause: ideas moved into planned_features/02_mvp2/<folder>/
during the MVP-bucket reorg without updating relative links; several siblings
since shipped to implemented_features/.

Link fixes (8 files): shipped siblings -> implemented_features/ homes;
comparison.md + feature_templates/README.md depth; truncation_mvp2 -> _truncation;
00_unsure bug_seed link depth.

infra_adapter_solr deep audit: removed false HTTPX_POOL_LIMITS claim; removed
nonexistent templates/elasticsearch/ reference; added open questions +
feat_ubi_onramp coordination.

Gemini adjudication (four-quadrant): #1-3 REJECTED as false positives (Gemini
miscounted directory depth; flagged links resolve, its suggestions resolve
above repo root). #4 ACCEPTED (link CLAUDE.md for Rule #5/#2). #6 ACCEPTED
(drop markdown bold from overnight-autopilot first sentence). #5 auto-resolved
by upstream _mvp2 fix + regen.

Lockstep dashboard regen. Docs only. Zero code/migration.

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant