chore(deps): Bump actions/upload-artifact from 4 to 7 by dependabot[bot] · Pull Request #8 · SoundMindsAI/relyloop

dependabot · 2026-05-09T22:27:55Z

Bumps actions/upload-artifact from 4 to 7.

Release notes

Sourced from actions/upload-artifact's releases.

v7.0.0

v7 What's new

Direct Uploads

Adds support for uploading single files directly (unzipped). Callers can set the new archive parameter to false to skip zipping the file during upload. Right now, we only support single files. The action will fail if the glob passed resolves to multiple files. The name parameter is also ignored with this setting. Instead, the name of the artifact will be the name of the uploaded file.
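The Python sketch below illustrates the direct-upload rules by reproducing only the documented constraints: the glob must resolve to exactly one file, and the artifact takes that file's name. It is not the action's implementation, which is a JavaScript package. `resolve_direct_upload` is a hypothetical name. Rejecting a glob that matches nothing is an assumption, because the release notes do not say how an empty match is handled.

```python
import glob
import os


def resolve_direct_upload(pattern: str) -> tuple[str, str]:
    """Return (artifact_name, path) for an unzipped, single-file upload.

    Hypothetical helper mirroring the v7 release notes: the glob must resolve
    to exactly one file, and the artifact is named after that file, so any
    caller-supplied name is ignored.
    """
    # Assumption: only regular files count as matches.
    matches = sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
    if len(matches) > 1:
        raise ValueError(f"{pattern!r} matched {len(matches)} files; archive=false accepts one")
    if not matches:
        # Assumption: the notes do not say how an empty match is handled.
        raise ValueError(f"{pattern!r} matched no files")
    path = matches[0]
    return os.path.basename(path), path
```

Under these rules, a pattern such as `dist/report.html` would produce an artifact named `report.html`.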

ESM

To support new versions of the @actions/* packages, we've upgraded the package to ESM.

What's Changed

Add proxy integration test by @Link- in actions/upload-artifact#754

Upgrade the module to ESM and bump dependencies by @danwkennedy in actions/upload-artifact#762

Support direct file uploads by @danwkennedy in actions/upload-artifact#764

New Contributors

@Link- made their first contribution in actions/upload-artifact#754

Full Changelog: actions/upload-artifact@v6...v7.0.0

v6.0.0

v6 - What's new

Important: actions/upload-artifact@v6 now runs on Node.js 24 (runs.using: node24) and requires a minimum Actions Runner version of 2.327.1. If you are using self-hosted runners, ensure they are updated before upgrading.

Node.js 24

This release updates the runtime to Node.js 24. v5 had preliminary support for Node.js 24; however, this action still ran on Node.js 20 by default. This action now runs on Node.js 24 by default.

What's Changed

Upload Artifact Node 24 support by @salmanmkc in actions/upload-artifact#719

fix: update @actions/artifact for Node.js 24 punycode deprecation by @salmanmkc in actions/upload-artifact#744

prepare release v6.0.0 for Node.js 24 support by @salmanmkc in actions/upload-artifact#745

Full Changelog: actions/upload-artifact@v5.0.0...v6.0.0

v5.0.0

What's Changed

BREAKING CHANGE: this update supports Node v24.x. This is not a breaking change per se, but we're treating it as such.

Update README.md by @GhadimiR in actions/upload-artifact#681

Update README.md by @nebuk89 in actions/upload-artifact#712

Readme: spell out the first use of GHES by @danwkennedy in actions/upload-artifact#727

Update GHES guidance to include reference to Node 20 version by @patrikpolyak in actions/upload-artifact#725

Bump @actions/artifact to v4.0.0

Prepare v5.0.0 by @danwkennedy in actions/upload-artifact#734

... (truncated)

Commits

043fb46 Merge pull request #797 from actions/yacaovsnc/update-dependency
634250c Include changes in typespec/ts-http-runtime 0.3.5
e454baa Readme: bump all the example versions to v7 (#796)
74fad66 Update the readme with direct upload details (#795)
bbbca2d Support direct file uploads (#764)
589182c Upgrade the module to ESM and bump dependencies (#762)
47309c9 Merge pull request #754 from actions/Link-/add-proxy-integration-tests
02a8460 Add proxy integration test
b7c566a Merge pull request #745 from actions/upload-artifact-v6-release
e516bc8 docs: correct description of Node.js 24 support in README
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 7.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@v4...v7)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>


Labels

The following labels could not be found: ci, dependencies. Please create them before Dependabot can add them to a pull request.

Please fix the above issues or remove invalid values from dependabot.yml.

10 findings applied (4 High, 5 Medium, 4 Low). Spec was authored 2026-05-09 before feat_study_lifecycle Phase 2 and feat_llm_judgments shipped — patch reconciles it with the current codebase.

High severity (4 / 4 accepted):
* H-1 (FR-2 contract inversion): orchestrator's _stop INSERTs the pending proposals row in the same tx as complete_study (orchestrator.py:346-356, C3-F1 atomicity fix). Spec rewritten to say the digest worker POPULATEs the pre-existing row rather than CREATEs a new one. AC-1 rewritten.
* H-2 (digest_stub.py acknowledged): replaced under the same Arq job name "generate_digest" so orchestrator.py:370 and workers/all.py:164 keep firing without orchestrator-side changes.
* H-3 (path drift sweep): backend/worker/ → backend/workers/; backend/api/proposals.py → backend/app/api/v1/proposals.py; backend/db/models/ → backend/app/db/models/ throughout.
* H-4 (FR-2b boot-time scan): on_startup scans pending proposals + re-enqueues digest with deterministic _job_id. Studies completed while the worker was down still get their digest narratives (state.md:166 requirement).

Medium severity (5 / 5 accepted):
* M-1 §8.4 enum source paths re-pointed to backend/app/...
* M-2 Optuna study loaded via backend/app/eval/optuna_runtime.py: get_or_create_study() (matches trials.py pattern).
* M-3 New FR-6 enumerates proposal/digest repo helpers needed by plan-gen.
* M-4 Settings.openai_model is the model pin (CLAUDE.md Rule #8).
* M-5 §8.5 adds OPENAI_NOT_CONFIGURED, LLM_PROVIDER_INCAPABLE, UNKNOWN_MODEL_PRICING, OPENAI_BUDGET_EXCEEDED as worker-side terminal reasons (mirrors feat_llm_judgments §8.5 + cycle-2 C2-F4 addition).
* M-6 FR-5 maxItems=5 wired into the response_format JSON schema.

Low severity (4 / 4 accepted):
* L-1 §15 uses "(Implemented — feat_digest_proposal)" inline marker.
* L-2 §10 documents the smaller data-flow surface — only params + metrics, never doc IDs / bodies / query text. docs/04_security/llm-data-flow.md is EXTENDED, not duplicated.
* L-3 §13 alignment with feat_llm_judgments' budget-gate + _safe_record_cost pattern.
* Owners (TBD) — non-blocking.

New acceptance criteria (3):
* AC-9 — boot-time scan picks up orphan pending proposals.
* AC-10 — OPENAI_NOT_CONFIGURED defers (no digest row, no proposal mutation, 404 DIGEST_NOT_READY).
* AC-11 — capability fallback produces narrative-only digest, leaves pending proposal pending.

New file: pipeline_status.md — spec is Approved, ready for /pipeline → impl-plan-gen.

Cross-model GPT-5.5 review on the patched spec: NOT yet run; the audit was Opus-only. Recommended to run a cycle when /pipeline advances to plan generation (both spec + plan in one pass).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
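The boot-time scan in H-4 is safe across restarts only because the `_job_id` is deterministic: enqueueing a digest that is already queued must be a no-op. The following is a minimal Python sketch under stated assumptions:

- each completed study has one pending proposal;
- job IDs use the format `generate_digest:<study_id>`;
- a synchronous `FakeQueue` stands in for Arq, whose real `enqueue_job` is async and returns `None` when a job with the same `_job_id` already exists.

Only the job name `generate_digest` comes from the notes above. Every other name is hypothetical.

```python
from dataclasses import dataclass, field


def digest_job_id(study_id: str) -> str:
    # Hypothetical ID format: stable per study, so a restart cannot duplicate it.
    return f"generate_digest:{study_id}"


@dataclass
class FakeQueue:
    """Synchronous stand-in for an Arq-style queue keyed by _job_id."""

    jobs: dict = field(default_factory=dict)

    def enqueue_job(self, function: str, *args: str, _job_id: str):
        if _job_id in self.jobs:
            return None  # duplicate ID: nothing new is queued
        self.jobs[_job_id] = (function, args)
        return _job_id


def rescan_pending_digests(pending_study_ids: list, queue: FakeQueue) -> int:
    """Startup-hook body: re-enqueue one digest per pending proposal.

    Returns how many jobs were newly enqueued.
    """
    enqueued = 0
    for study_id in pending_study_ids:
        job = queue.enqueue_job("generate_digest", study_id, _job_id=digest_job_id(study_id))
        if job is not None:
            enqueued += 1
    return enqueued
```

Because the ID depends only on the study, the same scan can run on every worker start without coordination.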

… fix (#73)

* docs(bug-chat-long-conv): land /idea-preflight patches + Investigation-mode bug_fix.md

Captures the work products from this session's dogfood runs:
* idea.md — /idea-preflight Audit & Patch (7 edits across 1 file):
  - Refreshed §Problem to accurately describe the tool-group-preserving truncation helper and added ~5K fixed-overhead from system prompt + 19 tool definitions to the token-budget math
  - Removed Story 5.1 docs-sweep deferral rationale (shipped in PR #60)
  - Locked the JSONB-vs-table fork in §Proposed scope
  - Added tool-call group invariant requirement + chat_history_summarization_failed WARN fallback
  - New §Open questions for /spec-gen with recommended defaults
  - New §CLAUDE.md rule touchpoints (Rules #3, #5, #8, #10)
  - Refreshed §Related work
* bug_fix.md — Investigation-mode /bug-fix output (149 lines):
  - Problem / Reproduction / Root cause filled in with file:line citations against agent_chat.py
  - Owning layer locked: service; fix is additive (wrap existing helper with summarization, don't replace)
  - Fix design / Regression test / Rollout TBD pending user calls on the 3 open forks
* MVP1_DASHBOARD.md + mvp1_dashboard.html — regenerated by the mvp1-dashboard-regen pre-commit hook to reflect the new bug_fix.md sibling (41 features total).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: capture chore_mvp1_dashboard_truncation — regen script truncates mid-markdown

Idea file for the pre-existing bug in scripts/build_mvp1_dashboard.py that Gemini surfaced via F3 + F4 on PR #73. _extract_idea_problem caps prose at 240 chars via raw `para[:237] + "..."` with no awareness of markdown link balance, inline-code spans, or word boundaries. Includes regenerated MVP1_DASHBOARD.md + mvp1_dashboard.html (42 features total now that this folder is added).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(dashboard): markdown-aware truncation in build_mvp1_dashboard.py

Fold the chore_mvp1_dashboard_truncation idea into this PR per the calibration discussion: ~30-LOC bounded fix + 13 unit tests is small enough to land inline rather than defer behind an idea file.

Root cause: `_extract_idea_problem` was capping prose at 240 chars via raw `para[:237] + "..."` with no awareness of markdown link / inline-code / word boundaries.

Fix: two new helpers — `_safe_truncate_markdown(text, max_len)` and `_strip_unclosed_markdown(text)` — replace the raw character cut with sentence-boundary preference + word-boundary fallback + strip unclosed [/]/(/)/backtick markdown + single-char ellipsis `…`.

Tests: 13 cases in backend/tests/unit/scripts/test_dashboard_truncation.py (all pass locally).

Regenerated MVP1_DASHBOARD.md + mvp1_dashboard.html with the new truncator. Deletes chore_mvp1_dashboard_truncation/ since the fix is no longer deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Gemini Code Assist (7 line-level findings -- all accepted):
- _sort.py:140 (High): encode_cursor stringifies row_id symmetrically with decode_cursor's str(decoded[1]); future caller passing a UUID object no longer trips json.dumps TypeError.
- clusters.py:75/213 (Medium x2): import _CLUSTER_SORT_COLUMNS from the repo (single source of truth) instead of duplicating the dict in the router. Matches the pattern every other sortable router uses.
- use-data-table-url-state.ts:26/65/107/115 (Medium x4): SSR-safe path handling via usePathname() instead of window.location.pathname. window is undefined during the App Router's initial server render; usePathname is the idiomatic Next.js read. 11 test files updated to mock the new usePathname() from next/navigation so the existing test surface stays green.

GPT-5.5 final review (2 findings):
- _judgments_row_sort.py rater_ref hardcoded gpt-4o-2024-08-06 (Low, accepted): replaced with neutral "test-llm-rater" fixture string per CLAUDE.md rule #8 against hardcoded LLM model names.
- _sort.py decode_cursor doesn't validate payload shape (Medium, deferred): captured as bug_cursor_decode_value_validation/idea.md. A tampered cursor with wrong value type can surface as 500 instead of 422. The fix touches the cursor encoding contract on 6 endpoints + needs a small spec-side decision (2-tuple vs 3-tuple payload, INVALID_CURSOR error code). Out of scope for this PR.

Includes auto-regenerated MVP dashboards.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
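The High finding is a symmetry fix. `decode_cursor` already applies `str()` to the row id, so `encode_cursor` does the same before serialising, which keeps `json.dumps` from raising `TypeError` on a `uuid.UUID`. This is a minimal Python sketch. It assumes a base64url-encoded JSON pair `[sort_value, row_id]` whose sort value is JSON-serialisable. The real payload layout and signatures in `_sort.py` may differ. Payload-shape validation is deliberately left out, matching the deferred finding.

```python
import base64
import json
from typing import Any


def encode_cursor(sort_value: Any, row_id: Any) -> str:
    # str(row_id) mirrors decode_cursor's str(decoded[1]); handing a raw
    # uuid.UUID to json.dumps would raise TypeError.
    payload = json.dumps([sort_value, str(row_id)])
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_cursor(cursor: str) -> tuple[Any, str]:
    decoded = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    # No shape check: a tampered cursor can still fail here with an
    # unexpected exception (the deferred INVALID_CURSOR work).
    return decoded[0], str(decoded[1])
```

Because both sides apply `str()`, a caller can pass either a `UUID` or its string form and get the same cursor.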

…e across 9 tables (#126)

* docs(planned): feat_data_table_primitive — spec + plan after /pipeline idea preflighted 2026-05-15; spec converged at GPT-5.5 cycle 3 (26 findings); plan converged at GPT-5.5 cycle 3 (24 findings). All 28 stories defined across 4 epics. Single-PR delivery per Locked Decision #4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(db): add search_vector tsvector columns + GIN indexes (Story 1.1)

Six Alembic migrations (0008-0013) add Postgres `tsvector GENERATED ALWAYS AS (...) STORED` columns to clusters, studies, query_sets, query_templates, judgment_lists, conversations, each with a corresponding GIN index. Source columns per spec FR-2:
- clusters: name + base_url
- studies: name + target
- query_sets: name
- query_templates: name
- judgment_lists: name + target
- conversations: coalesce(title, '')

The columns are generated and not application-writable; ORM models do NOT declare them (per spec FR-2 invariant). FTS predicate at the repo layer will use `sa.text("search_vector @@ plainto_tsquery(...)")` (lands in Story 1.2).

Per-migration round-trip verified: each upgrade <rev> + downgrade -1 + upgrade <rev> succeeds. Full-stack round-trip verified: upgrade head + downgrade 0007 + upgrade head succeeds (all 6 columns and indexes created, removed, and recreated cleanly).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): ?q= FTS query parameter on 6 list endpoints (Story 1.2)

Adds a Postgres full-text-search `?q=<text>` query parameter to:
- GET /api/v1/clusters (name + base_url)
- GET /api/v1/studies (name + target)
- GET /api/v1/query-sets (name)
- GET /api/v1/query-templates (name)
- GET /api/v1/judgment-lists (name + target)
- GET /api/v1/conversations (title)

Pydantic Field(min_length=2, max_length=200) enforces the bounds at the router boundary; under/over-length input returns 422 VALIDATION_ERROR via the canonical envelope.
Filter-only FTS per spec FR-1 — results are filtered by FTS match but NOT re-ordered by ts_rank. Existing (created_at DESC, id DESC) ordering is preserved, which keeps the (created_at, id) keyset cursor valid across filtered result sets. Rank-ordered FTS is deferred per spec §16 (captured for follow-up at Epic 4).

New shared helper backend/app/db/repo/_fts.py exports fts_predicate(q: str | None) -> TextClause | None which the 6 repos AND into their existing WHERE clauses. The clause uses plainto_tsquery('english', :q) — injection-safe (no operator parsing).

Live-stack smoke verified after rebuild: ?q=p returns 422 VALIDATION_ERROR; ?q=e2e returns 266 matches; ?q=nonexistentstring returns 0; ?q=elasticsearch (matching every base_url) returns 276. Unit tests: 815 passing locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): ?sort= sort-aware cursor + filters (Stories 1.3, 1.4, 1.5)

Adds three new query-parameter families to the affected list endpoints:

Story 1.3 — ?sort=<col>:<dir> on 7 endpoints (clusters, studies, query-sets, query-templates, judgment-lists, proposals, per-list judgments). New _sort.py helper centralizes parse_sort, order_by_clauses, keyset_predicate, encode_cursor, decode_cursor, cursor_value_is_datetime. Sort-aware cursor: ORDER BY leading key matches cursor leading key per FR-3a, with explicit NULLS FIRST/LAST and null-aware keyset predicates. 7 new SortKey Literals added to schemas.py; 7 matching as-const arrays added to ui/src/lib/enums.ts with reverse source-of-truth comments. PROPOSAL_SOURCE_VALUES also added.

Story 1.4 — ?engine_type= + ?environment= on clusters; ?engine_type= on query-templates.

Story 1.5 — ?template_id= on proposals (UUID-typed for auto-422); ?since= on judgment-lists + conversations.

Live-stack smoke: ?sort=name:asc alphabetical ordering, ?sort=garbage → 422 with exact accepted values, sort-aware cursor round-trip works, ?engine_type= filters correctly. 815 unit tests passing.
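Two of the `_sort.py` helpers named in Story 1.3 can be sketched in a few lines. The Python below is an illustration, not the repo's code, and rests on these assumptions:

- the public sort key maps to a column through an allowlist;
- the `id` tiebreaker sorts in the same direction as the leading key;
- the NULLS FIRST/LAST handling the commit describes is left out, so `keyset_predicate` covers only non-null sort values;
- the `:cursor_value` / `:cursor_id` bind-parameter names are hypothetical.

```python
from typing import Optional

DIRECTIONS = ("asc", "desc")


def parse_sort(raw: Optional[str], allowed: dict, default: str) -> tuple[str, str]:
    """Parse '?sort=<key>:<dir>' into (column, direction) or raise ValueError."""
    key, sep, direction = (raw or default).partition(":")
    if sep != ":" or key not in allowed or direction not in DIRECTIONS:
        accepted = ", ".join(f"{k}:{d}" for k in allowed for d in DIRECTIONS)
        # An API layer would turn this into a 422 that lists the accepted values.
        raise ValueError(f"invalid sort {raw!r}; accepted values: {accepted}")
    return allowed[key], direction


def keyset_predicate(column: str, direction: str) -> str:
    """WHERE fragment selecting rows after the cursor for ORDER BY column, id.

    A row-value comparison works because both keys share one direction:
    DESC pages continue with smaller tuples, ASC pages with larger ones.
    `column` must come from the allowlist, never from raw user input.
    """
    op = "<" if direction == "desc" else ">"
    return f"({column}, id) {op} (:cursor_value, :cursor_id)"
```

Keeping the tiebreaker in the same direction lets the single row-value comparison replace the expanded form `col < :v OR (col = :v AND id < :id)`.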
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): scaffold DataTable primitive (Story 2.1)

Adds the minimal shell of <DataTable> at ui/src/components/common/data-table.tsx plus three co-located helpers (data-table-toolbar.tsx, data-table-empty.tsx, types.ts). Renders rows from props.data via TanStack Table's row model and the existing shadcn Table primitive. getRowId is wired to row.id so subsequent Stories 2.9/2.12 see stable backend UUIDs for selection + keyboard activation.

types.ts ships the forward-compatible DataTableProps + DataTableColumnDef shape — every Story 2.2-2.13 feature has its prop declared here. Empty-state declares the three FR-9 kinds; toolbar slot is wired but empty by default.

New npm dep: @tanstack/react-table@8.21.3.

Verification: pnpm typecheck clean; 290 vitest tests pass across 49 files (285 + 5 new); pnpm lint 0 errors; prettier --check passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(planned): mark Story 2.1 complete in execution tracker

* feat(ui): sortable column headers with three-state cycle (Story 2.2)

Adds <DataTableSortHeader> implementing FR-4's three-state cycle: unsorted → firstClickDirection → opposite → unsorted. Constrained per column via the new sortDirections allowlist (e.g. trials optuna_trial_number_asc-only). Lucide chevrons (Up / Down / muted ChevronsUpDown); ARIA aria-sort on the wrapper; sr-only descriptor. data-testid=data-table-sort-<sortKey>.

DataTable consumes optional sort + onSortChange props (transient until Story 2.6 lifts them to useDataTableUrlState at the consumer). TableHead wraps the column header with <DataTableSortHeader> when column.sortable.

Tests: 7 cases in data-table-sort-header.test.tsx covering all four cycle shapes + click interaction + aria-sort + unsorted state. Verification: pnpm typecheck clean; 297 vitest tests passing (was 290).
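The Story 2.2 header cycle is a three-state machine whose states can be pruned per column. The Python sketch below is for illustration only; the component itself is React/TypeScript. `next_sort_state` is a hypothetical name. For single-direction columns, the sketch shrinks the cycle to unsorted and the allowed direction. That behaviour is an assumption based on the asc-only trials example.

```python
from typing import Optional, Sequence


def next_sort_state(
    current: Optional[str], first: str, allowed: Sequence[str] = ("asc", "desc")
) -> Optional[str]:
    """Direction after one header click; None means unsorted."""
    opposite = "desc" if first == "asc" else "asc"
    # Full cycle: unsorted -> first-click direction -> opposite -> unsorted.
    cycle = [None, first, opposite]
    # Drop directions the column does not allow (e.g. an asc-only column).
    cycle = [state for state in cycle if state is None or state in allowed]
    index = cycle.index(current) if current in cycle else 0
    return cycle[(index + 1) % len(cycle)]
```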
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): filter chips (enum) + FK-select dropdown (Story 2.3)

Adds two new toolbar sub-components implementing FR-5's two filter kinds:
- <DataTableFilterChips>: enum-kind. Generalizes the existing study/proposal filter-chip pattern (one Button per wire value + an "all" chip). Disabled while isLoading. data-testid: data-table-filter-chip-<col>-<value>.
- <DataTableFkSelect>: fk-select kind. Generalizes the existing cluster-filter-select.tsx pattern (native <select> with consumer-supplied useOptions hook returning {id, label}[] + isLoading). Disabled + "(loading…)" placeholder while options load.

DataTable wires filters from columns[*].filter through the toolbar's leftSlot. Optional filters + onFilterChange props on DataTableProps (transient until Story 2.6 lifts to useDataTableUrlState).

Tests: 9 cases (6 chip + 3 fk-select). Verification: pnpm typecheck clean; 306 vitest tests passing (was 297).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): debounced text-search input + useDebouncedValue hook (Story 2.4)

Adds <DataTableSearch> implementing FR-6: native <input> with a 300ms debounce, Zod min(2).max(200) validation, edit-down-clears-q boundary handling (cycle-3 F4 — when the user edits an active q below 2 chars the URL must drop ?q=, not stick at the stale value). Also adds the generic useDebouncedValue<T>(value, delayMs) hook at ui/src/hooks/use-debounced-value.ts.

DataTable consumes optional q + onQChange + searchable + totalCount props; toolbar renders the search input when searchable && onQChange, followed by the filter chips.

Tests: 6 cases — under-length-no-call, 2+chars-commits, edit-down-clears, clear-clears, (N results) indicator visible+hidden. Uses real timers + 20ms debounce to avoid the fake-timer/act flush issue. Verification: pnpm typecheck clean; 312 vitest tests passing (was 306).
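The search rules reduce to a small decision function: commit only at 2 to 200 characters, and drop `?q=` when an active query is edited below 2 characters. This Python sketch also folds in the later cycle-2 race fix, which commits only once the debounced value has caught up with the draft. It illustrates the rules and is not the `<DataTableSearch>` code. The `KEEP` sentinel, the function name, and trimming whitespace before the length check are assumptions.

```python
from typing import Optional

MIN_Q, MAX_Q = 2, 200
KEEP = object()  # sentinel: leave the current ?q= untouched


def next_q(draft: str, debounced_draft: str, current_q: Optional[str]) -> object:
    """Return KEEP, None (drop ?q=), or the new q string for a search draft."""
    if debounced_draft != draft:
        # Debounce has not caught up; a stale tick must not overwrite newer state.
        return KEEP
    text = draft.strip()  # assumption: surrounding whitespace is ignored
    if MIN_Q <= len(text) <= MAX_Q:
        return KEEP if text == current_q else text
    if len(text) < MIN_Q and current_q is not None:
        # Editing an active q below 2 characters (or clearing it) drops ?q=.
        return None
    # Under-length with no active q, or over the 200-character limit.
    return KEEP
```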
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): total-count display with cursor-paginator-honest wording (Story 2.5)

Adds <DataTableTotalCount> implementing FR-7 + AC-14: "Showing 1–N of M" on the first page; "Showing N rows (of M matching)" on subsequent pages (omits the absolute range because the opaque cursor doesn't allow us to reconstruct the absolute page index on a fresh URL load). DataTable consumes optional cursorStackLength prop (transient until Story 2.6); renders the indicator in the toolbar's right slot when totalCount is supplied.

Tests: 4 cases — first page range, subsequent-page wording, totalCount=0 branch, large-number formatting. Verification: pnpm typecheck clean; 316 vitest tests passing (was 312).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): useDataTableUrlState hook with controlled URL contract (Story 2.6)

Adds ui/src/hooks/use-data-table-url-state.ts implementing FR-8 at the consumer side (not inside DataTable per cycle-1 GPT-5.5 finding F3): the page calls useDataTableUrlState(tableId, columns) to derive sort, filters, q, cursor, pageSize from the URL plus typed setters; passes them as props into both its TanStack Query hook AND <DataTable>.

History strategy per FR-8 + cycle-3 F2:
- setCursor uses router.push() so Back steps through pages.
- setSort / setFilter / setQ / setPageSize use router.replace() + clear ?cursor= so quick UI tweaks don't pollute history.
- clearAllMatchers clears every filter + q; preserves sort + pageSize (wired to FR-9 "no-rows-match" empty-state "Clear filters" button).

anyMatcherActive ignores sort by design (sort doesn't filter; only filters + q drive the no-rows-match empty state per cycle-2 F11 fix). Filter parsing is column-aware — only URL params whose name matches a column with `filter` config are surfaced as filters. Other params (unrelated route state) pass through untouched per cycle-2 F7 fix.
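The history strategy can be modelled as a pure function that maps the current query string to the next one plus the router method to use. The Python sketch below uses `urllib.parse` under these assumptions:

- the URL parameters are named `cursor` and `q`;
- `clear_all_matchers` also drops `?cursor=`, although the commit only says it clears filters and `q` while preserving sort and page size;
- the helper names are hypothetical.

The real hook is TypeScript and calls `router.push()` / `router.replace()`.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode


def apply_param(query: str, key: str, value: Optional[str]) -> tuple[str, str]:
    """Return (new_query, router_method) after one setter; value None removes key."""
    params = dict(parse_qsl(query))  # unrelated route params pass through untouched
    if value is None:
        params.pop(key, None)
    else:
        params[key] = value
    if key == "cursor":
        return urlencode(params), "push"  # Back steps through pages
    # Sort / filter / q / page-size tweaks restart pagination and replace history.
    params.pop("cursor", None)
    return urlencode(params), "replace"


def clear_all_matchers(query: str, filter_keys: set) -> str:
    """Drop every filter and q; keep sort, page size and unrelated params."""
    params = dict(parse_qsl(query))
    for key in filter_keys | {"q", "cursor"}:
        params.pop(key, None)
    return urlencode(params)
```

`dict(parse_qsl(...))` keeps only the last value of a repeated key. That is sufficient for single-valued filters but would need adjusting for multi-select ones.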
Tests: 14 cases covering hydration, push/replace strategies, cursor clearing on non-cursor changes, clearAllMatchers, anyMatcherActive, and default-pageSize handling. Verification: pnpm typecheck clean; 328 vitest tests passing (was 316).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): three empty-state shapes + cursor wrap + sticky header + tooltip headers (Stories 2.7, 2.8)

Story 2.7 (FR-9 + FR-10):
- Three empty-state branches with explicit precedence:
  - stale-cursor when data empty + totalCount > 0 + cursor present (per cycle-3 F4 — distinct from no-rows-match)
  - no-rows-match when data empty + anyMatcherActive (filter or q)
  - no-rows-exist otherwise (consumer-supplied title/message/primaryCta)
- Wraps the existing <CursorPaginator> internally; consumers stop importing it directly. has_more / next_cursor / cursor / pageSize / pageSizeOptions flow through props.
- New optional DataTableProps: cursor, pageSize, onCursorChange, onPageSizeChange, pageSizeOptions, onClearMatchers, anyMatcherActive.

Story 2.8 (FR-11 + FR-12):
- Sticky header via Tailwind `sticky top-0 bg-background z-10` on <TableHeader>.
- Tooltip-enabled column headers: when columnDef.tooltipKey is set, the primitive renders an <InfoTooltip glossaryKey={key}> next to the header text inside the existing inline-flex pattern. Sortable columns get the tooltip wrapped INSIDE the sort header so the chevron stays anchored.
- 6 new glossary entries (datatable.sort.toggle / .search.min_length / .total_count / .density.toggle / .column_visibility / .selection.all_on_page).
- DataTableColumnDef.tooltipKey is now ShortGlossaryKey (narrower) so the InfoTooltip type checks at the call site.

Tests: 3 new branching cases in data-table.test.tsx; existing 5 scaffold cases still green. 331 vitest passing across 54 files (was 328). Verification: pnpm typecheck clean.
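The precedence order matters because a stale cursor can produce an empty page even while filters are active. Checking the cursor first keeps that case separate from a filter that genuinely matched nothing. The Python sketch below shows the branch selection for illustration only; the primitive is React. The function name is hypothetical, and the later loading/error gating is left out.

```python
from typing import Optional


def empty_state_kind(
    row_count: int, total_count: Optional[int], cursor: Optional[str], any_matcher_active: bool
) -> Optional[str]:
    """Choose the empty-state branch; None means there are rows to render."""
    if row_count > 0:
        return None
    if cursor and (total_count or 0) > 0:
        # Matches exist but this cursor points past them; checked first so it
        # is never mistaken for a filter that matched nothing.
        return "stale-cursor"
    if any_matcher_active:  # a filter or ?q= is active (sort alone does not count)
        return "no-rows-match"
    return "no-rows-exist"
```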
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): multi-row selection + bulk-action toolbar (Story 2.9)

Implements FR-13: native <input type="checkbox"> selection column (no new Radix dep), header "select all on page" with indeterminate state via imperative ref, bulk-action toolbar above the body when selectedIds.size >= 1, clear-selection-on-URL-state-change effect per AC-10.

DataTable gets three new optional props: selectable, bulkActions, onSelectionChange. New file data-table-bulk-actions.tsx exposes the toolbar; consumers supply BulkAction[] with each entry's onClick receiving (selectedIds, clearSelection).

Selection is React-only — never URL-encoded per FR-13 anti-pattern. The clear-on-state-change effect keys off JSON.stringify({cursor, sort, q, filters}) so any of those changing wipes selection (matches the AC-10 expectation).

Tests: 7 cases — gated rendering, select-all toggle, per-row toggle, counter, action callbacks, clear button. 338 vitest passing (was 331). Verification: pnpm typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): column visibility menu + density toggle (Stories 2.10, 2.11)

Story 2.10 (FR-14) — column visibility menu:
- <DataTableColumnVisibility>: shadcn Popover + lucide <Eye> icon + native checkbox per column (no @radix-ui/react-dropdown-menu dep per plan decision). Sticky columns filtered out.
- useLocalStorageSet hook: Set<string> persisted under relyloop:datatable:<tableId>:hidden-columns. SSR-safe; hydrates synchronously via useState initializer (avoids react-hooks/set-state-in-effect rule).
- DataTable filters visible-columns through the hidden Set before handing them to useReactTable.

Story 2.11 (FR-15) — density toggle:
- <DataTableDensityToggle>: two-position Button group; persists under relyloop:datatable:<tableId>:density.
- cellPaddingClass: py-3 px-4 (comfortable) / py-1.5 px-3 (compact) applied to every <TableHead> + <TableCell>.

Tests: 6 cases.
343 vitest passing (was 338). Verification: pnpm typecheck clean; pnpm lint 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(planned): mark Stories 2.10 + 2.11 complete in execution tracker

* feat(ui): keyboard navigation for DataTable rows (Story 2.12)

Roving tabindex on body rows: row 0 starts tabbable, Arrow Up/Down move focus with wrap-around at the ends, Enter calls onRowActivate(rowId), Space toggles selection when selectable. Focused index clamps when the row count shrinks (filter change, cursor move). keyboardNav={false} opts out entirely. Closes FR-16 (AC-12).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(ui): column-config source-of-truth lint guard (Story 2.13)

Vitest scan over ui/src/components/**/*.column-config.{ts,tsx}. For every column with filter.kind === 'enum', asserts wireValues is an identifier imported from '@/lib/enums' and that the identifier's declaration in enums.ts is immediately preceded by the canonical 'Values must match backend/...py <Symbol>' source-of-truth comment. For both enum and fk-select filters, asserts sourceOfTruth is non-empty and starts with 'backend/'. Passes vacuously in Epic 2 (no column-config files yet); five regression cases pin the failure-message contract so Epic 3 column configs are forced to comply. Closes FR-17 (AC-16).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): Epic 2 phase-gate adjudications (GPT-5.5 review)

Internal cursor stack in DataTable: Story 2.5/2.7 plan says the cursor trail lives in DataTable, but the previous wire-up made Prev always return to page 1. Now the primitive pushes next_cursor on Next, pops on Prev, and re-grounds the stack when the URL cursor changes externally (filter/sort/q reset, shared-link hydration). cursorStackLength is no longer a public prop.

Filter testids aligned to the plan DoD: filter-chip-<col>-<val> for chips and fk-select-<col> for the FK select.
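Two behaviours from these commits are easy to get subtly wrong. The first is wrap-around focus movement with clamping. The second is the internal cursor trail that lets Prev return to the previous page rather than page 1. The Python sketch below covers both, for illustration only; the real code is a React component. `next_focus_index`, `CursorStack` and its method names are hypothetical.

```python
from typing import Optional


def next_focus_index(current: int, key: str, row_count: int) -> int:
    """Roving-tabindex movement: ArrowUp/ArrowDown wrap around the ends."""
    if row_count == 0:
        return 0
    current = min(current, row_count - 1)  # clamp after the row count shrinks
    if key == "ArrowDown":
        return (current + 1) % row_count
    if key == "ArrowUp":
        return (current - 1) % row_count  # Python's % wraps -1 to the last row
    return current  # other keys (e.g. Enter, Space) leave focus in place


class CursorStack:
    """Trail of page cursors; the last entry is the current page's cursor."""

    def __init__(self, url_cursor: Optional[str] = None) -> None:
        self.reground(url_cursor)

    def reground(self, url_cursor: Optional[str]) -> None:
        # External URL change (filter/sort/q reset, shared link): restart the trail.
        self.stack = [url_cursor]

    def push_next(self, next_cursor: str) -> str:
        self.stack.append(next_cursor)
        return next_cursor

    def pop_prev(self) -> Optional[str]:
        if len(self.stack) > 1:
            self.stack.pop()
        return self.stack[-1]  # None here means the first page (no cursor)
```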
Existing tests updated. FK-select now disables on outer query loading too (plan Story 2.3 task 4), and DataTable passes col.filter.label to the chip component so consumers can provide user-facing labels while keeping backend wire values.

DataTableSearch now syncs its local draft with the controlled value prop so back/forward navigation no longer leaves stale text in the input.

Column-config discipline test (Story 2.13) tightened: per-filter sourceOfTruth check (previously file-wide), inline array literals in wireValues explicitly rejected. Two new regression cases pin the failure-message contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(planned): mark Stories 1.1-1.5 + 2.12 + 2.13 complete in execution tracker

Epic 1 stories were already shipped (commits c5d5776 / e7c04ef / 8ed1ab7) but the tracker still showed them as [ ]. Tracker drift only — no code or behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): Epic 2 cycle-2 phase-gate adjudications (GPT-5.5)

Tooltip-enabled sortable column headers: tooltip now renders via the DataTableSortHeader trailing slot, not nested inside the sort button. Fixes invalid button-in-button HTML and prevents accidental sort when interacting with the tooltip.

Search debounce-vs-sync race: when the controlled value prop changes externally (back/forward nav), the sync effect updates draft immediately but debouncedDraft is still the prior tick's value. The commit effect now only fires when debouncedDraft === draft, so a stale debounced tick can't fire onQChange(null) and undo the external update. Regression test added.

Total-count wording on direct cursor URL loads: ?cursor=opaque now counts as page 2+ for the FR-7 cursor-paginator-honest wording even though the internal stack only has length 1.

Column visibility hardening: non-hideable columns (sticky OR hideable: false) are force-shown regardless of localStorage contents.
URL ?q= normalization: empty or whitespace-only ?q= reads as null and does not flip anyMatcherActive to true. Integration tests through DataTable for Stories 2.10/2.11 DoD: column hide round-trip + localStorage persistence, mount hydration, defensive hideable check, density toggle persistence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ui): keep DataTable toolbar visible during loading (cycle-3 GPT-5.5) Loading and error states now render their UI in the body region only; the toolbar (search input, filter chips, FK select, total count, density toggle, column-visibility menu) stays visible above them. Filter chips and the FK select already accept isLoading and disable themselves during the outer query, so users see visible-but-disabled controls instead of having the entire surface replaced by a Loading placeholder. Empty-state branching is gated behind !isLoading && !isError so the loading branch can't accidentally render the no-rows-exist copy while a fresh query is still in flight. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): migrate studies-table to DataTable primitive (Story 3.1) studies-table.tsx becomes a thin consumer of <DataTable> driven by a co-located studies-table.column-config.tsx (6 columns: Name, Cluster, Status, Best metric, Created, Completed). URL state owned by useDataTableUrlState at /studies/page.tsx -- expands the surface from ?status= to ?q=, ?sort=, ?status=, ?cursor=. study-status-filter-chips.tsx (40 LOC) deleted -- the DataTable enum filter column owns the chip row now. URL contract ?status=<wire> is unchanged, so existing bookmarks survive. useStudies hook accepts the new ?q= and ?sort= params. studies-by-cluster-table.tsx (Story 3.9 inheritor) updated to use useDataTableUrlState namespaced to the studies-by-cluster tableId so per-cluster preferences don't bleed into the global /studies surface.
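The ?q= normalization and the cycle-2 column-visibility hardening (sticky or hideable: false columns are force-shown regardless of localStorage) are both small pure rules. This sketch assumes a simplified ColumnLike shape; the real column-config interface has more fields, and the function names are hypothetical.

```typescript
// Sketch only: ColumnLike is a simplified stand-in for the real column config.
interface ColumnLike {
  id: string;
  sticky?: boolean;
  hideable?: boolean; // undefined is treated as hideable
}

/** Empty or whitespace-only ?q= reads as null, so it never counts as an active matcher. */
function normalizeQ(raw: string | null): string | null {
  if (raw === null) return null;
  return raw.trim() === "" ? null : raw;
}

/** A column is force-shown when it is sticky or explicitly non-hideable. */
function isForceShown(col: ColumnLike): boolean {
  return col.sticky === true || col.hideable === false;
}

/** Persisted "hidden" ids are ignored for force-shown columns. */
function visibleColumnIds(columns: ColumnLike[], storedHidden: Iterable<string>): string[] {
  const hidden = new Set(storedHidden);
  return columns.filter((c) => isForceShown(c) || !hidden.has(c.id)).map((c) => c.id);
}
```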
E2E spec ui/tests/e2e/studies-data-table.spec.ts covers search, sort, filter, and URL-state-survives-refresh per spec section 14. Existing studies.spec.ts and the guide spec updated to use the new filter-chip-status-<val> testid pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): migrate proposals-table to DataTable primitive (Story 3.2) proposals-table.tsx (117 LOC) becomes a thin consumer of <DataTable> with co-located proposals-table.column-config.tsx exporting a 7-column array (Source, Cluster, Template, Status, PR state, Metric delta, Created) plus useClustersForFilter() and useTemplatesForFilter() hook adapters for the fk-select filters. 4 filters in the toolbar: status (enum chip row), source (enum chip row), cluster_id (fk-select), template_id (fk-select, NEW per FR-3). source filter is now URL-backed via ?source= where it was React-state-only. 3 obsolete components deleted: proposal-status-filter-chips.tsx, proposal-source-filter-chips.tsx, cluster-filter-select.tsx -- plus their unit tests. proposals-table.test.tsx rewritten to test the cell render functions on the column config with a stub urlState; page.test.tsx rewritten for the DataTable testid pattern with a searchParams subscriber mock so URL changes propagate to React state. E2E spec ui/tests/e2e/proposals-data-table.spec.ts covers status, source, sort, template fk-select, and URL-state-survives-refresh per spec section 14. Existing proposals.spec.ts plus guides 02 and 07 updated to use the new filter-chip-<col>-<val> and fk-select-<col> testid patterns. useProposals accepts the new template_id and sort params. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): migrate clusters-table to DataTable primitive (Story 3.3) clusters-table.tsx becomes a thin <DataTable> consumer with co-located clusters-table.column-config.tsx (6 columns: Name sortable+sticky, Engine filter, Environment sortable+filter, Health synthetic, Base URL, Created hideable). searchable=true (FTS on name + base_url per Story 1.2). Filters use the new backend ?engine_type= and ?environment= params from Story 1.4. URL state owned by useDataTableUrlState at /clusters/page.tsx. useClusters accepts q, sort, engine_type, environment. Page test updated for the URL-state-aware structure. E2E spec clusters-data-table.spec.ts covers search, engine_type filter, environment filter, sort, and URL-state-survives-refresh per spec section 14. Existing clusters_register.spec.ts updated for the new empty-state testid. Bug fix in the Story 2.13 lint guard: ENUMS_IMPORT_RE was only capturing the first identifier in `import { A, B } from '@/lib/enums'` -- fixed by extracting the whole import block and matching every UPPER_SNAKE token. The previous proposals-table commit shipped with this latent gap; clusters-table surfaced it because both new column configs use multi-name imports. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): migrate templates + query-sets tables to DataTable (Stories 3.4, 3.5) Story 3.4 (templates): 4-column config (Name sortable+sticky, Engine sortable+filter enum, Version sortable, Created hideable). searchable=true (FTS on name). Filter wires to Story 1.4's ?engine_type= backend surface. Story 3.5 (query-sets): 3-column config (Name sortable+sticky, Cluster, Created sortable). searchable=true (FTS on name). No filters per spec. Both useTemplates and useQuerySets accept q and sort. Both pages use useDataTableUrlState for the standard URL contract. 
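The ENUMS_IMPORT_RE bug fixed in the clusters-table commit is the classic single-capture-group pitfall: one group can only ever yield the first name in `import { A, B }`. The fix described there has two steps. First, match the whole brace block of an '@/lib/enums' import; then pull every UPPER_SNAKE token out of it. The regexes and function names below are a hypothetical reconstruction, not the guard's actual source.

```typescript
// Hypothetical reconstruction of the fix; the guard's real regexes may differ.

// Buggy shape: one capture group, so `import { A, B }` only ever yields A.
const SINGLE_NAME_RE = /import\s*\{\s*([A-Z][A-Z0-9_]*)/;

// Fixed shape, step 1: capture the whole brace block of an '@/lib/enums' import.
const ENUMS_IMPORT_BLOCK_RE = /import\s*(?:type\s+)?\{([^}]*)\}\s*from\s*['"]@\/lib\/enums['"]/g;

// Step 2: every UPPER_SNAKE token inside that block.
const UPPER_SNAKE_RE = /\b[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*\b/g;

function enumsImports(source: string): Set<string> {
  const names = new Set<string>();
  ENUMS_IMPORT_BLOCK_RE.lastIndex = 0; // /g regexes keep state between exec() calls
  let block: RegExpExecArray | null;
  while ((block = ENUMS_IMPORT_BLOCK_RE.exec(source)) !== null) {
    for (const name of block[1].match(UPPER_SNAKE_RE) ?? []) names.add(name);
  }
  return names;
}

function firstNameOnly(source: string): string | undefined {
  return SINGLE_NAME_RE.exec(source)?.[1];
}
```

Because `[^}]*` spans newlines, the same pattern also covers the multi-line import blocks that Prettier produces for long name lists.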
E2E specs templates-data-table.spec.ts + query-sets-data-table.spec.ts cover search, sort (and filter on templates), and URL-state-survives-refresh per spec section 14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): migrate judgments-table to DataTable primitive (Story 3.6) judgments-table.tsx becomes a thin DataTable consumer with a co-located useJudgmentsColumns(listId) hook (the actions column closes over listId to render OverridePopover per row). 6 columns: Query (sticky), Doc, Rating (sortable+tooltip+desc-first), Source (sortable+tooltip+filter enum), Notes, Actions (hideable=false). Source filter is now URL-backed via ?source= where it was React-state-only. Sort on rating / source via ?sort= using Story 1.3's per-list judgments sort surface. searchable=false (per-list endpoint has no FTS per spec section 3). useJudgments accepts the new sort param. The /judgments/[id] page wires useDataTableUrlState scoped to the judgments tableId, narrowing ?source= to the JUDGMENT_SOURCE_FILTER_VALUES allowlist. Page test updated for the URL-state-aware structure with the searchParams subscriber mock. E2E spec judgments-data-table.spec.ts covers source filter, rating sort, and URL-state-survives-refresh per spec section 14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): migrate trials-table to DataTable + fused-wire sort codec (Story 3.7) trials-table.tsx becomes a thin DataTable consumer. The legacy <Select> sort-by control is intentionally dropped -- column-header click sort drives the cycle per FR-4 / AC-13. New SortCodec interface on data-table-sort-header.tsx + sortCodec prop on DataTableProps. The codec lets the trials migration translate between the DataTable internal (col, dir) form and the existing fused backend wire format (primary_metric_desc, ended_at_asc, optuna_trial_number_asc). The default behaviour (no codec) keeps the generic ?sort=<col>:<dir> contract for every other table. 
trialsSortCodec in trials-table.column-config.tsx maps the 5 supported wire tokens. The optuna_trial_number column is configured with sortDirections: ['asc'] because optuna_trial_number_desc is not in TrialSortKey -- the cycle skips desc. /studies/[id]/page.tsx wires useDataTableUrlState with the trials tableId and narrows the URL sort value to TrialSortKey before feeding useStudyTrials. The cursorStack and pageSize React state are removed; the URL hook owns them now. E2E spec trials-data-table.spec.ts covers primary-metric three-state cycle, the asc-only trial-number column, and direct-URL hydration. The existing studies.spec.ts tooltip assertion is updated -- the legacy trial.sort_by tooltip is no longer rendered since the <Select> is gone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): migrate queries-table to DataTable primitive (Stories 3.8, 3.9) queries-table.tsx (205 LOC) becomes a thin DataTable consumer with useQueriesColumns(querySetId, onOpenMetadata) — the action column cell renderers close over querySetId for the EditQueryPopover and DeleteQueryDialog, and the Metadata badge / { } button call back into the table to open the EditMetadataDialog (still rendered at the table level so dialog state survives row remounts). searchable=false, selectable=false per spec — per-query sub-resource has no FTS and no bulk actions in scope. page-size options stay at [10, 25, 50, 100] via DataTable's pageSizeOptions prop. URL state owned by useDataTableUrlState scoped to a per-query-set tableId so col-vis + density preferences don't bleed across different query-set detail pages. 5-column config: Query text (sticky, truncated 100ch), Reference answer (hideable, truncated 50ch), Metadata (Badge w/ keyboard activation), Judgments count, Actions (hideable=false). All 10 legacy parity rows preserved. Story 3.9 marked complete — the inline 3.1 update to studies-by-cluster already verified the wrapper inherits the new DataTable behaviour. 
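The SortCodec escape hatch can be pictured as a pair of encode/decode functions around the internal (col, dir) state, plus a header-click cycle that honours a per-column sortDirections list. This sketch rests on several assumptions. The interface shape, fusedWireCodec, and nextSort are hypothetical. The cycle (first direction, second direction, cleared) is inferred from the "three-state cycle" wording. Only the three wire tokens named in these commits are listed, even though trialsSortCodec maps five.

```typescript
// Sketch of the SortCodec idea; the interface shape here is an assumption.
type SortDir = "asc" | "desc";
interface SortState { col: string; dir: SortDir }

interface SortCodec {
  encode(state: SortState | null): string | null; // internal -> ?sort= value
  decode(wire: string | null): SortState | null; // ?sort= value -> internal
}

// Default contract: ?sort=<col>:<dir>
const defaultSortCodec: SortCodec = {
  encode: (s) => (s ? `${s.col}:${s.dir}` : null),
  decode: (wire) => {
    const m = wire?.match(/^([a-z0-9_]+):(asc|desc)$/);
    return m ? { col: m[1], dir: m[2] as SortDir } : null;
  },
};

// Fused-wire variant (e.g. primary_metric_desc): only allowlisted tokens round-trip.
function fusedWireCodec(allowed: readonly string[]): SortCodec {
  return {
    encode: (s) => {
      const token = s ? `${s.col}_${s.dir}` : null;
      return token !== null && allowed.includes(token) ? token : null;
    },
    decode: (wire) => {
      if (wire === null || !allowed.includes(wire)) return null;
      const m = wire.match(/^(.+)_(asc|desc)$/);
      return m ? { col: m[1], dir: m[2] as SortDir } : null;
    },
  };
}

// Header click: walk the column's allowed directions in order, then clear.
function nextSort(
  current: SortState | null,
  col: string,
  directions: readonly SortDir[] = ["asc", "desc"],
): SortState | null {
  if (!current || current.col !== col) return { col, dir: directions[0] };
  const i = directions.indexOf(current.dir);
  return i >= 0 && i + 1 < directions.length ? { col, dir: directions[i + 1] } : null;
}
```

Passing ['desc', 'asc'] gives a desc-first column such as Rating. Passing ['asc'] gives the asc-only trial-number column, whose cycle never produces optuna_trial_number_desc.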
Existing tests rewritten: - queries-table.test.tsx for the new URL-state contract (4 cases: render+count, action buttons, empty state, Next/Prev paginate) - queries-table-delete-flow.test.tsx — useSearchParams mock added, queries-total assertions updated to data-table-total-count - query_set_detail.spec.ts — same testid update for total count E2E spec queries-data-table.spec.ts covers cursor Next/Prev, page-size select, and URL-state-survives-refresh per spec section 14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ui): Epic 3 phase-gate adjudications (GPT-5.5) 5 accepted findings from cycle-1 review of Epic 3: - studies cluster_id: add hideable: false (plan Story 3.1 task 1). sticky already force-shows the cell; hideable: false also hides the column from the col-vis menu so it never appears as a toggle. - judgments tableId scoped by listId: /judgments/[id] is resource-specific, so a single 'judgments' table-id would let col-vis + density preferences bleed across different judgment lists. Now uses `judgments-${listId}` consistently between useDataTableUrlState and DataTable's tableId. - trials tableId scoped by studyId: same bleed issue. /studies/[id] now uses `trials-${studyId}`. TrialsTable accepts a tableId prop and the page builds it from the URL param. - clusters Register cluster CTA: the empty-state primaryCta is now wired via onRegisterCluster (declared but previously unused). Page passes the same callback that opens the RegisterClusterModal at the page header. - clusters file comment: documents the actual 6-column shape (5 visible by default, Created hideable to support the plan's sort-by-created-at requirement without growing the default visible row). 
GPT-5.5 also raised two findings claiming the no-rows-exist copy regressed legacy parity — both rejected with counter-evidence: the plan parity tables explicitly map the legacy single message to the no-rows-match variant, and the no-rows-exist branch is a new Story 2.7 empty-state shape. Studies and judgments both preserve the legacy copy in emptyStateNoMatch.message. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: architecture + convention doc updates (Story 4.1) api-conventions.md: new ?q= section documenting the FTS contract + 6 searchable endpoints + ?sort= shape + the trials fused-wire exception; ?since= MVP1-status line updated to include the newly-active judgment-lists + conversations endpoints. ui-architecture.md: new "DataTable primitive" section covering the controlled-component shape, column-config interface, source-of-truth discipline (the Story 2.13 lint guard), the sortCodec escape hatch for trials, and per-resource tableId scoping. data-model.md: new "Full-text search vectors" section documenting the 6 search_vector columns + GIN indexes + the read-only rule, with a forward pointer to feat_fts_rank_ordering_mvp2 for the deferred rank ordering. CLAUDE.md: Enumerated Value Contract Discipline gains a step 5 documenting the DataTable filter sourceOfTruth field + lint guard; Common Pitfalls gains a "do not write to search_vector" rule. state.md: Alembic head updated to 0013_search_vector_conversations, branch + last-updated reflect feat_data_table_primitive in flight. architecture.md: ui/src/components/common navigation gains DataTable + column-config convention; migrations directory line lists 0006-0013. testing.md: documents the column-config discipline test shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(planned): capture feat_fts_rank_ordering_mvp2 deferred idea (Story 4.2) Per spec section 16: ?q= ships filter-only in MVP1 (results match the predicate, ordered by (created_at, id)). 
Rank-ordered FTS via ORDER BY ts_rank DESC requires non-trivial cursor encoding work -- either encoding the float ts_rank into the opaque cursor (rank-bucketed approach) or materializing per-request scores (column-add approach). Either is a clean MVP2 follow-up once the multi-tenant scale concerns make the relevance ordering load-bearing. The 6 search_vector columns + GIN indexes + plainto_tsquery predicate already shipped with feat_data_table_primitive, so this follow-up is pure-backend ordering + a small DataTable toolbar indicator. Auto-regenerated MVP2 dashboard captures the new idea folder. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(backend): fill plan section 3 coverage gap for Epic 1 FTS/sort/filter surface Epic 1 commits (c5d5776 / e7c04ef / 8ed1ab7) shipped the backend FTS, sort, and filter surface but were live-stack-smoke-verified rather than test-covered. Per plan section 3 testing workstream and the user's explicit direction to write all missing tests, this commit adds: Unit tests (host-runnable, 33 cases passing): - test_fts_predicate.py: 9 cases covering None/empty/non-empty inputs, plainto_tsquery contract, unicode passthrough. - test_parse_sort.py: 24 cases covering parse_sort allowlist + direction parse, order_by_clauses NULLS handling, encode_cursor / decode_cursor round-trips for datetime / str / int / null values, cursor_value_is_datetime convention check. Integration tests (CI-verified via Compose-network service containers per the documented local-vs-CI test pattern): - test_search_vector_migrations.py: full-stack round-trip (head to 0007 to head), per-revision round-trip for each of the 6 search_vector migrations, ORM-must-not-declare-search_vector invariant grep. - test_fts_endpoints.py: parametrized across all 6 ?q= endpoints -- returns-only-matching-rows, X-Total-Count reflects filtered count, no-match returns empty, ?q=p (under length) returns 422. 
- test_sort_pagination.py: parametrized across 5 sortable list endpoints (clusters/studies/query-sets/query-templates/judgment-lists) -- asc + desc multi-page cursor walks asserting no duplicates and no skips, first-page ordering correctness. Plus dedicated trials tests for the fused-wire sort tokens (primary_metric_desc, optuna_trial_number_asc-only). - test_proposals_template_filter.py: ?template_id= AND-stacks with ?status=, X-Total-Count reflects filtered, invalid UUID returns 422. - test_judgments_row_sort.py: rating asc/desc, source asc, combines with ?source= filter, invalid sort returns 422. Contract tests (host-runnable, 18 cases passing): - test_data_table_query_params.py: parametrized OpenAPI-schema assertions covering every new query param (?q on 6 endpoints, ?sort on 6 top-level + 1 per-list endpoint, ?engine_type + ?environment on /clusters, ?template_id on /proposals, ?since on /judgment-lists + /conversations). Caught a real test-vs-plan drift: my initial draft included /conversations in the sortable list, but Story 1.3 explicitly lists 6 top-level sortable endpoints (no conversations) -- the test failure surfaced this in cycle 1. E2E: - studies-by-cluster-data-table.spec.ts (Story 3.9 missing spec). Deviation from plan section 3: the plan called for one file per resource for FTS / sort-pagination tests (~13 files). Consolidated into 2 parametrized files (test_fts_endpoints.py + test_sort_pagination.py) -- equivalent coverage, much less duplication. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(planned): capture 6 DataTable primitive review follow-ups (Step 2.5) Tangential-observations sweep finds 6 items deferred from feat_data_table_primitive review cycles that lived only in chat transcripts: 1. Factor the searchParams subscriber test mock pattern (4-file duplication noticed during Step 0b test writing). 2. useLocalStorageSet return shape (Epic 2 GPT-5.5 cycle 1 #14). 3. 
DataTableProps urlState aggregate prop (Epic 2 cycle 1 #1). 4. ?limit= coercion to pageSizeOptions allowlist (cycle 2 #13). 5. TanStack state.columnVisibility wire-up (cycle 3 #3). 6. URL-state Zod validation in useDataTableUrlState (cycle 3 #1). All six classified non-regression follow-ups at the time. Bundling into chore_data_table_primitive_followups/idea.md so they ship together when picked up, per CLAUDE.md tangential-discoveries rule. Includes auto-regenerated MVP1 dashboard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(guides): regenerate 8 walkthrough guides post-DataTable migration (Step 3) Every guide that screenshots a list page is affected by the feat_data_table_primitive Epic 3 migrations — the new DataTable toolbar (search input, filter chips, density toggle, col-vis menu) replaces the hand-rolled table headers, and the chip pattern shifted from status-chip-<val> to filter-chip-status-<val>. Regenerated against the rebuilt UI image with the new code: - 01_register_first_cluster (5 PNGs) - 02_review_a_proposal (5 PNGs) - 03_create_query_template (5 PNGs) - 04_create_query_set (5 PNGs) - 05_import_judgments_and_calibrate (4 PNGs) - 06_create_and_monitor_study (4 PNGs) - 07_browse_proposals (5 PNGs) - 09_generate_judgments_llm (5 PNGs) Guides 08_chat_shell + 10_chat_with_agent unchanged (the /chat surface is not touched by the DataTable migration). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: Gemini Code Assist + GPT-5.5 review adjudications on PR #126 Gemini Code Assist (7 line-level findings -- all accepted): - _sort.py:140 (High): encode_cursor stringifies row_id symmetrically with decode_cursor's str(decoded[1]); future caller passing a UUID object no longer trips json.dumps TypeError. - clusters.py:75/213 (Medium x2): import _CLUSTER_SORT_COLUMNS from the repo (single source of truth) instead of duplicating the dict in the router. Matches the pattern every other sortable router uses. 
- use-data-table-url-state.ts:26/65/107/115 (Medium x4): SSR-safe path handling via usePathname() instead of window.location.pathname. window is undefined during the App Router's initial server render; usePathname is the idiomatic Next.js read. 11 test files updated to mock the new usePathname() from next/navigation so the existing test surface stays green. GPT-5.5 final review (2 findings): - _judgments_row_sort.py rater_ref hardcoded gpt-4o-2024-08-06 (Low, accepted): replaced with neutral "test-llm-rater" fixture string per CLAUDE.md rule #8 against hardcoded LLM model names. - _sort.py decode_cursor doesn't validate payload shape (Medium, deferred): captured as bug_cursor_decode_value_validation/idea.md. A tampered cursor with wrong value type can surface as 500 instead of 422. The fix touches the cursor encoding contract on 6 endpoints + needs a small spec-side decision (2-tuple vs 3-tuple payload, INVALID_CURSOR error code). Out of scope for this PR. Includes auto-regenerated MVP dashboards. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(test): CI failures from initial PR #126 push Backend test fixes: - test_search_vector_migrations.py: alembic looks up revisions by the `revision: str = "NNNN"` string (just the number), not the filename. Switched SEARCH_VECTOR_REVS from full filenames to bare numeric ids '0008'..'0013'. Caught by CI ("Can't locate revision identified by '0008_search_vector_clusters'"). - test_migrations.py: the pre-existing baseline test asserted alembic_version row == "0007" — outdated by feat_data_table_primitive extending the chain through 0013. Updated to "0013". - test_conversations_migration.py: TestSchemaCreation used `downgrade -1` to land at 0006 (one before 0007_conversations_messages). That worked when 0007 was head, but the chain now extends to 0013 so -1 lands at 0012, leaving conversations+messages still present. Switched to explicit `downgrade 0006` so the test stays correct as more migrations land. 
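The deferred bug_cursor_decode_value_validation item is about rejecting a tampered cursor with a 4xx instead of letting a type error surface as a 500. The backend helpers are Python; the TypeScript below only illustrates the shape of the validation. It assumes the cursor payload, once the base64 layer is stripped, is a JSON 2-tuple [sortValue, rowId]. That is only one of the options (2-tuple vs 3-tuple) the commit says still needs a spec decision. InvalidCursorError is a hypothetical name.

```typescript
// Illustrative only: the real decode_cursor is Python and its payload shape is undecided.
type SortValue = string | number | null;
interface DecodedCursor { sortValue: SortValue; rowId: string }

class InvalidCursorError extends Error {
  readonly status = 422; // a client error, never a 500
  constructor(message: string) {
    super(message);
    this.name = "InvalidCursorError";
  }
}

function isSortValue(v: unknown): v is SortValue {
  return v === null || typeof v === "string" || typeof v === "number";
}

function decodeCursorPayload(json: string): DecodedCursor {
  let payload: unknown;
  try {
    payload = JSON.parse(json);
  } catch {
    throw new InvalidCursorError("cursor is not valid JSON");
  }
  if (!Array.isArray(payload) || payload.length !== 2) {
    throw new InvalidCursorError("cursor must be a two-element array");
  }
  const sortValue: unknown = payload[0];
  const rowId: unknown = payload[1];
  // Validate every field's type before anything downstream touches it.
  if (!isSortValue(sortValue)) throw new InvalidCursorError("sort value has the wrong type");
  if (typeof rowId !== "string") throw new InvalidCursorError("row id must be a string");
  return { sortValue, rowId };
}
```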
E2E test fixes (3 specs): - clusters-data-table.spec.ts: the FTS search test used `cluster.name.slice(0, 8)` against a hex-suffix name. plainto_tsquery('english', ...) does not tokenize hex-suffix identifiers usefully — the search returned 0 rows and the test failed on row visibility. Switched to searching for 'elasticsearch', which is reliably indexed via the cluster's base_url ('http://elasticsearch:9200'). - judgments-data-table.spec.ts + query-sets-data-table.spec.ts: the "URL state survives refresh" specs asserted on data-table-sort-<col> testids, but those header elements only mount when the table has rows. With no seeded rows the empty state renders and the testid isn't in the DOM. Switched to asserting on the URL itself (always present) + filter-chip-<col>-<val> in the toolbar (rendered regardless of row count). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(e2e): row-dependent testid assertions for sort headers + trials-empty drift CI cycle-2 surfaced 4 E2E failures in the same shape: assertions on data-table-sort-<col> testids that only render when the table has rows. The DataTable shows the empty state (no headers) when rows are absent. - studies-data-table.spec.ts:75 URL-state-survives-refresh: drop the data-table-sort-name assertion (the orchestrator may have moved any seeded study past 'queued' by the time the page renders, leaving the empty state). Keep URL + filter-chip assertions, which are toolbar-rendered regardless of row count. - studies.spec.ts:43 trials-empty testid: removed by the Story 3.7 DataTable migration. Replaced with data-table-empty-no-rows-exist. - trials-data-table.spec.ts (3 specs): wait for trials-table to be visible with a 30s timeout BEFORE clicking sort headers. trials-table only mounts when at least one trial completes, and the orchestrator produces trials asynchronously after seedStudy.
Was waiting for trials-table OR empty-state with 10s; the OR-empty branch slipped past the wait and the subsequent click on a non-existent header failed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(e2e): restructure trials-data-table to URL/API-driven assertions CI cycle-3 surfaced that the orchestrator-produced trials don't reliably materialize within 30s in the GHA smoke runner. The trials-data-table specs were waiting for trials-table to mount before clicking sort headers; in CI that wait timed out. The click cycle itself is exhaustively component-tested at ui/src/__tests__/components/common/data-table-sort-header.test.tsx + data-table.test.tsx. The truly E2E concerns are: - The fused-wire tokens (primary_metric_desc, ended_at_asc, optuna_trial_number_asc) are accepted by the live backend. - Invalid tokens (optuna_trial_number_desc, garbage) return 422. - A direct URL load with a fused-wire token surfaces the trials page without error, and the URL state survives a hard reload. Rewrote the 3 specs to verify those contracts via page.request.get() against the live API + page.goto() URL assertions. Each spec runs in ~5s instead of waiting 30s+ for the orchestrator. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…e_builder) (#163) * docs: preflight refresh on feat_create_study_search_space_builder/idea.md Updated foundational dependency notes after chore_create_study_wizard_polish shipped (PR #157), refreshed audit timestamp, and re-grounded file:line citations. Regenerated MVP1 dashboard via pre-commit hook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(spec): feat_create_study_search_space_builder feature_spec Generated via /spec-gen: 11 FRs covering per-row builder (type selector, low/high spinners, log toggle with onChange gating, categorical chip-input with no auto-dedup), per-row + header cardinality counters with 10^6 cap warning (non-blocking — server is authoritative), split/tab responsive layout, bidirectional builder<->textarea round-trip with semantic equality. Cross-model review: 3 GPT-5.5 cycles, 16 findings all accepted with cited fixes landed in-place. Includes pipeline_status.md marking SPEC stage Approved. Regenerated MVP1 dashboard via pre-commit hook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): SearchSpaceBuilder shell + bidirectional round-trip (Story 1.1) - ui/src/components/studies/search-space-builder/index.tsx — top-level builder with parse/stringify helpers, single 200ms debounce boundary, emitBuilderWrite + flushBuilderWrite + scheduleBuilderWrite, canonicalize-on-mount, placeholder cascade per FR-9 + §11. - placeholder.tsx — single component, 4 variants, role="status" per AC-12. - types.ts — local StashEntry/StashMap types ready for Story 2.1. - create-study-modal.tsx — mount the builder ABOVE the existing <Textarea> in step === 3. - round-trip.test.tsx — 11 fixtures + idempotence + supplemental helper assertions = 15 vitest assertions, all passing. - All 7 existing create-study-modal.* tests continue to pass. 
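The bidirectional builder<->textarea round-trip with semantic equality can be sketched with a canonical serializer. Two texts count as equal when they parse to the same value regardless of whitespace or key order, so a builder write that changes nothing meaningful does not rewrite the user's textarea. This is a sketch only: the helper names and the 2-space pretty-print are assumptions, and the 200ms debounce is left out.

```typescript
// Sketch: helper names and the 2-space output format are assumptions.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

type ParseResult = { ok: true; value: Json } | { ok: false; error: string };

function parseSpace(text: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(text) as Json };
  } catch (e) {
    return { ok: false, error: (e as Error).message };
  }
}

/** Deterministic serialization: object keys sorted, array order preserved. */
function canonical(value: Json): string {
  if (Array.isArray(value)) return `[${value.map((v) => canonical(v)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as { [key: string]: Json };
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

/** Two texts are semantically equal if both parse and share a canonical form. */
function semanticallyEqual(a: string, b: string): boolean {
  const pa = parseSpace(a);
  const pb = parseSpace(b);
  return pa.ok && pb.ok && canonical(pa.value) === canonical(pb.value);
}

/** Builder -> textarea: pretty-print, but keep the user's text if nothing meaningful changed. */
function nextTextareaValue(current: string, builderValue: Json): string {
  const next = JSON.stringify(builderValue, null, 2);
  return semanticallyEqual(current, next) ? current : next;
}
```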
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plan): feat_create_study_search_space_builder implementation_plan Generated via /impl-plan-gen: 8 stories across 4 epics covering FR-1 through FR-11. Cross-model review: 3 GPT-5.5 cycles, 27 findings (13 cycle-1 + 8 cycle-2 + 6 cycle-3) all accepted with cited fixes. Regenerated MVP1 dashboard via pre-commit hook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): SearchSpaceBuilder per-row rendering + tooltip slots (Story 1.2) - param-row.tsx: <ParamRow> with name chip + simple-form badge + read-only displays + 3 InfoTooltip glossary slots per FR-11. - index.tsx: replace inline placeholders with <ParamRow>. - create-study-modal.builder-rendering.test.tsx: 4 vitest assertions. - round-trip.test.tsx: add TooltipProvider wrap. - 38 studies-tree assertions green; typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): SearchSpaceBuilder type selector + spinners + stash (Story 2.1) - stash.ts: Map-based StashMap helpers (stashGet/Set/ClearRow/ClearAll) + defaultSpecForType(nextType) target-type-only fallback. - row-type-selector.tsx: shadcn <Select> with source-of-truth comment citing backend Pydantic discriminator. Compile-time parity guard via ParamType ↔ RowTypeSelectorValue conditional type. - row-numeric.tsx: paired numeric inputs, no local debounce, onBlurFlush callback per FR-3. Inline row error on low>=high / low>high. - param-row.tsx: wire editable type + numeric controls; preserve read-only displays for log + cardinality (Stories 2.2/2.3). - index.tsx: emit-builder-write helper + pendingWriteRef-backed flushBuilderWrite (onBlur reads the latest pending edit, not stale parseResult). lastBuilderWriteRef-guarded stash invalidation effect + templateBody-change clearAll. - search-space-defaults.ts: export simpleFormSpec(). 
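The Story 2.1 type-switch stash keeps a per-row memory of the last spec for each type, with a target-type-only default. That is what lets a float to int to float switch restore {low, high, log}, as e2e case 2 in PR #163 later asserts. This is an illustrative sketch. The helpers share the names listed in the commit, but their signatures, the spec shapes, and the default values in defaultSpecForType are assumptions, and changeRowType is hypothetical.

```typescript
// Illustrative only: spec shapes, signatures, and default values are assumptions.
type ParamSpec =
  | { type: "float"; low: number; high: number; log: boolean }
  | { type: "int"; low: number; high: number }
  | { type: "categorical"; choices: Array<string | number | boolean> };
type ParamType = ParamSpec["type"];

// rowName -> (type -> last spec the user had for that type)
type StashMap = Map<string, Map<ParamType, ParamSpec>>;

function stashGet(stash: StashMap, row: string, type: ParamType): ParamSpec | undefined {
  return stash.get(row)?.get(type);
}

function stashSet(stash: StashMap, row: string, spec: ParamSpec): void {
  let perRow = stash.get(row);
  if (!perRow) {
    perRow = new Map();
    stash.set(row, perRow);
  }
  perRow.set(spec.type, spec);
}

function stashClearRow(stash: StashMap, row: string): void {
  stash.delete(row);
}

function stashClearAll(stash: StashMap): void {
  stash.clear();
}

// Fallback depends only on the target type (placeholder values, not the real defaults).
function defaultSpecForType(nextType: ParamType): ParamSpec {
  switch (nextType) {
    case "float":
      return { type: "float", low: 0, high: 1, log: false };
    case "int":
      return { type: "int", low: 0, high: 10 };
    case "categorical":
      return { type: "categorical", choices: ["a"] };
  }
}

/** Stash the current spec, then restore the target type's stashed spec or its default. */
function changeRowType(stash: StashMap, row: string, current: ParamSpec, nextType: ParamType): ParamSpec {
  stashSet(stash, row, current);
  return stashGet(stash, row, nextType) ?? defaultSpecForType(nextType);
}
```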
- param-spec-discriminator.parity.test.tsx: reads backend file at runtime, asserts ROW_TYPE_VALUES matches 3 Literal discriminators. - create-study-modal.builder-edits.test.tsx: 5 assertions covering FR-2 + FR-3 (debounce, blur-flush, type-switch stash, invalidation). - 13 studies tests / 44 assertions green; typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): SearchSpaceBuilder log toggle with onChange gating (Story 2.2) - row-log-toggle.tsx: native checkbox with aria-disabled + onChange refusing false→true when low<=0 per FR-4. NO native `disabled` (would block check-off too). - param-row.tsx: FloatLogControl inner component holds per-row attemptedInvalidLogEnable flag, derived auto-clear via effective-attempted (not setState-in-effect). - builder-edits.test.tsx: 3 new assertions (#6a-c). - 8 builder-edits assertions green; lint 0 errors; typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): SearchSpaceBuilder chip input + cardinality counters (Story 2.3) - search-space-defaults.ts: extract estimateParamCardinality() helper; estimateCardinality() now delegates per param. Pure refactor — existing Python/TS parity test still passes. - row-categorical.tsx: chip input with Enter/comma commit, × removal, type coercion (boolean/number/string), NO auto-dedup per FR-5. Duplicate-add surfaces a UI-only amber warning but keeps the chip. Empty-choices row error fires. - cardinality.tsx: <RowCardinality> + <HeaderCardinality>; header turns red + aria-invalid + max-contributor hint at >1e6 (warning-only per FR-7 — does NOT block Next). - param-row.tsx + index.tsx: wire chip input + per-row/header counters; HeaderCardinality consumes normalized space (params: data.params ?? {}) so it never crashes on parseable-but-no-params-wrapper JSON. - estimateParamCardinality.test.ts: 6 unit assertions.
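The Story 2.2 gating rule can be written as pure functions. It refuses a false to true change when low <= 0, always allows turning log off, and derives the warning instead of clearing a flag in an effect. This is a sketch: FloatBounds, applyLogToggle, and showLogWarning are hypothetical names, and the real component wires the rule into a native checkbox with aria-disabled.

```typescript
// Sketch of the FR-4 gating rule; names are hypothetical.
interface FloatBounds { low: number; high: number; log: boolean }

/** Log scale needs a strictly positive lower bound. */
function logAllowed(low: number): boolean {
  return low > 0;
}

/**
 * onChange gate: turning log OFF is always allowed (the reason not to use native
 * `disabled`); turning it ON is refused when low <= 0.
 */
function applyLogToggle(spec: FloatBounds, requested: boolean): { spec: FloatBounds; refused: boolean } {
  if (requested && !logAllowed(spec.low)) return { spec, refused: true };
  return { spec: { ...spec, log: requested }, refused: false };
}

/** Derived each render: the warning disappears as soon as low becomes valid. */
function showLogWarning(attemptedInvalidEnable: boolean, low: number): boolean {
  return attemptedInvalidEnable && !logAllowed(low);
}
```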
- builder-edits.test.tsx: 2 new assertions (#7 cap turns red + max contributor hint; #8 cap is warning-only, no row errors fire). - 90 ui-test assertions green; typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): SearchSpaceBuilder add-custom-param affordance (Story 2.4) - add-custom-param.tsx: Popover (not Tooltip) with controlled open state driven by onMouseEnter/onMouseLeave/onFocus/onBlur so the surface appears on hover OR focus per FR-10/AC-8. Button uses aria-disabled (NOT native disabled) + onClick no-op so the PopoverContent's Next.js <Link> remains keyboard-discoverable. - index.tsx: render <AddCustomParam> only when templateId is defined (suppressed during transient/404 fetch per FR-10 + AC-11). - builder-rendering.test.tsx: 2 new assertions (FR-10 — button has aria-disabled + NO native disabled; suppressed when templateId is missing). - 51 studies-tree assertions green; typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): SearchSpaceBuilder responsive split/tab layout (Story 3.1) - responsive-layout.tsx: <ResponsiveLayout> renders builder + textarea side-by-side via `lg:grid-cols-2` at ≥1024px; tab toggle (Builder/JSON) visible only <1024px via `lg:hidden`. Inactive tab gets `hidden` CSS class (NOT conditional rendering) so the textarea stays in the DOM at every viewport — preserves RHF register + existing test selectors. - create-study-modal.tsx: wrap <SearchSpaceBuilder> + existing Textarea/tooltip surface in <ResponsiveLayout>. No new test IDs on existing elements; cs-search-space and cs-search-space-error remain. - builder-textarea-roundtrip.test.tsx: 4 assertions (FR-8 + AC-9 + AC-12) — both slots resolve at desktop, tab toggle uses lg:hidden, clicking JSON tab hides builder slot but textarea stays in DOM, textarea→parse-error switches builder to placeholder. - 55 studies-tree assertions green; typecheck clean. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): SearchSpaceBuilder a11y test + e2e + docs (Story 4.1) - builder-a11y.test.tsx: 4 vitest assertions per FR-10 + AC — Label htmlFor on numeric inputs, role="alert" on row errors, focusable aria-disabled (no native disabled) on Add-custom-param button, PopoverContent <Link> reachable via fireEvent.focus. - studies-create-builder.spec.ts: real-backend Playwright spec walks Steps 1–4, edits boost.high via the builder, submits, asserts the created study persists search_space.params.boost.high === 15. Uses seedFullChain + the pickEntity dispatchEvent('click') stability pattern from studies-create-validation.spec.ts. - docs/01_architecture/ui-architecture.md: new "Search-space builder" section documenting the module, source-of-truth-via-Pydantic-discriminator pattern, round-trip discipline, responsive layout. - docs/05_quality/testing.md: new subsection on Pydantic-discriminator parity tests as a sibling of column-config discipline. - 512 ui tests green; typecheck + lint clean (0 errors); pnpm build produces a green production bundle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(planned): capture chore_search_space_builder_paramrow_numeric_dedup Tangential observations sweep from feat_create_study_search_space_builder post-impl ceremony. One code-quality idea filed (10-line refactor target). Regenerated MVP1 dashboard via pre-commit hook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(e2e): correct submit-button test ID in studies-create-builder spec The submit button uses data-testid="create-study-submit" (verified at create-study-modal.tsx:856), not "step-submit". The e2e spec was timing out waiting for a non-existent test ID. No change to runtime behavior; this is a test-only fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(e2e): fill max_trials on Step 5 to satisfy stepValid in builder spec

stepValid(step=4, ...) at create-study-modal.tsx:344 requires either max_trials > 0 OR time_budget_min > 0, and the form defaultValues seed neither, so the submit button stays disabled. The e2e spec now fills "Max trials" with 10 before clicking submit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(e2e): use spinbutton role to disambiguate Max trials input from tooltip button

getByLabel('Max trials') was strict-mode-ambiguous: it resolved to both the <Input id="cs-max"> AND the adjacent <InfoTooltip> button (whose aria-label is "More information about max trials"). Switch to getByRole('spinbutton', { name: 'Max trials' }), which uniquely matches the input.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): adjudicate Gemini Code Assist review on PR #163 (3 accepted)

1. search-space-defaults.ts:estimateParamCardinality: Math.max(0, ...) on int bounds guards against a textarea-supplied low > high producing a negative cardinality in the header counter. Optional chaining on `choices?.length ?? 0` defends against runtime-malformed JSON.
2. param-row.tsx: collapse the structurally identical float-vs-int onChange branches. Closes chore_search_space_builder_paramrow_numeric_dedup inline (idea folder removed since the work shipped here).
3. row-categorical.tsx: replace the restrictive /^-?\d+(\.\d+)?$/ regex with !Number.isNaN(Number(raw)) so scientific notation (1e-3), leading-dot decimals (.5), and other valid numeric forms get coerced to numbers. This is close to what JSON.parse would do (JSON.parse itself rejects a leading-dot decimal such as .5).

76 studies/search-space tests + estimateParamCardinality + cardinality parity tests all green; typecheck clean.
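The clamp in fix 1 is easy to lose in a later refactor, so here is a minimal Python sketch of the same arithmetic. The real helper is the TypeScript estimateParamCardinality in search-space-defaults.ts; the dict-shaped params, the inclusive `high - low + 1` count for ints, returning None for non-countable (e.g. float) params, and the product aggregation in the hypothetical `total_cardinality` are all assumptions for illustration.

```python
from typing import Any, Dict, Optional

CARDINALITY_CAP = 1_000_000  # the 1e6 cap referenced by the builder's header counter


def estimate_param_cardinality(param: Dict[str, Any]) -> Optional[int]:
    """Estimate how many distinct values one param can take (sketch)."""
    kind = param.get("type")
    if kind == "int":
        # Clamp at 0 so a textarea-supplied low > high never yields a negative count.
        return max(0, int(param["high"]) - int(param["low"]) + 1)
    if kind == "categorical":
        # Equivalent of `choices?.length ?? 0`: tolerate a missing or null list.
        return len(param.get("choices") or [])
    return None  # e.g. float ranges: not countable in this sketch


def total_cardinality(params: Dict[str, Dict[str, Any]]) -> int:
    """Multiply the countable per-param estimates (assumed aggregation)."""
    total = 1
    for spec in params.values():
        estimate = estimate_param_cardinality(spec)
        if estimate is not None:
            total *= estimate
    return total


# A reversed int range contributes 0 instead of a negative number.
print(estimate_param_cardinality({"type": "int", "low": 10, "high": 2}))  # prints 0
```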
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): broaden studies-create-builder spec: type switch + chips + cap

Adds 3 real-backend e2e cases requested during local verification of PR #163. Refactors the original happy-path test into 4 cases via a shared `walkToStep4()` helper:

- case 1 (unchanged): builder edits propagate to the textarea and submit persists the value.
- case 2 (NEW, FR-2): float→int→float type switch via a Radix Select trigger click + option click (Radix doesn't expose a native select, so selectOption is not usable; see the `switchRowType` helper). Asserts the cross-type stash restores the original {low, high, log}.
- case 3 (NEW, FR-5): switch to categorical, remove the placeholder chip, add 4 chips (true / 1 / AUTO / AUTO). Asserts mixed-type coercion + duplicate preservation. Each addChip waits for the textarea to reflect the new choice before the next add: chip-input commits use the prop value of `choices` (not local state), so without the wait the builder's 200ms debounce + RHF re-render cycle clobbers rapid consecutive Enters.
- case 4 (NEW, FR-7): an int row [0, 1_500_000] drives cardinality to 1.5e6 (> 1e6 cap). Asserts the header counter's aria-invalid, a visible max-contributor hint, and the Next button staying enabled (warning-only per FR-7). Fills Study name first to isolate the cardinality contract from the unrelated stepValid name-required gate.

All 4 cases pass locally in 7.6s against the live stack. typecheck + lint + prettier clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(planned): capture feat_create_study_target_autocomplete

Surfaced during local verification of PR #163. The Step-1 "Target index / collection" field is free text with no autocomplete; typos 404 in the console. Pre-existing UX gap since feat_studies_ui (PR #50). Two mitigation options sketched. Regenerated the MVP1 dashboard via pre-commit hook.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(planned): capture bug_judgment_lists_listing_ignores_query_set_filter

GET /api/v1/judgment-lists silently ignores the query_set_id + cluster_id query params. The frontend hook sends them; the backend signature at judgments.py:339 doesn't declare them. This causes a 422 in the create-study modal when the user picks a mismatched judgment list. Pre-existing since feat_llm_judgments (PR #35). Recommended as an adjacent backend PR. Regenerated the MVP1 dashboard via pre-commit hook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(api): /judgment-lists honors query_set_id + cluster_id filters

Closes bug_judgment_lists_listing_ignores_query_set_filter (filed earlier on this branch during local PR #163 verification).

The endpoint did not declare query_set_id or cluster_id as Query() params, so FastAPI silently dropped the values sent by the frontend useJudgmentLists hook. The create-study modal's Step-2 dropdown then surfaced mismatched judgment-list ↔ query-set pairs, and POST /api/v1/studies rejected them at submit time with a confusing 422 VALIDATION_ERROR.

Changes:

- backend/app/db/repo/judgment_list.py: list + count accept query_set_id + cluster_id kwargs and apply WHERE clauses.
- backend/app/api/v1/judgments.py: declare the Query params and thread them to both repo calls.
- backend/tests/integration/test_judgments_api.py: seed 2 query-sets × 2 lists; probe unfiltered + filtered + combined; assert exact set membership + X-Total-Count.
- backend/tests/contract/test_judgments_api_contract.py: OpenAPI regression gate; both params declared as optional strings, maxLength=36.

Live-probed against the rebuilt API container: the query_set_id filter went from 5/1 (mismatched rows) to 1/1 (only matches); the cluster_id filter is honored. ruff format + ruff check + mypy --strict clean.
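The filter semantics can be sketched as an in-memory Python stand-in for the repo's list + count pair. `JudgmentList`, `list_judgment_lists`, and the `(rows, total)` return shape are hypothetical; in the real repo each non-None kwarg becomes a WHERE clause on both queries, which is what makes the filters AND together and keeps the count in step with the returned rows.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class JudgmentList:
    id: str
    cluster_id: str
    query_set_id: str


def list_judgment_lists(
    rows: List[JudgmentList],
    query_set_id: Optional[str] = None,
    cluster_id: Optional[str] = None,
) -> Tuple[List[JudgmentList], int]:
    """Return (matching rows, total) with every supplied filter AND-ed."""
    matched = [
        row
        for row in rows
        # A filter left as None is not applied, like an omitted query param.
        if (query_set_id is None or row.query_set_id == query_set_id)
        and (cluster_id is None or row.cluster_id == cluster_id)
    ]
    # The total uses the same predicate, playing the role of X-Total-Count.
    return matched, len(matched)


# Hypothetical seed: two query-sets in cluster_a, one in cluster_b.
seed = [
    JudgmentList("jl1", "cluster_a", "qs_a1"),
    JudgmentList("jl2", "cluster_a", "qs_a1"),
    JudgmentList("jl3", "cluster_a", "qs_a2"),
    JudgmentList("jl4", "cluster_a", "qs_a2"),
    JudgmentList("jl5", "cluster_b", "qs_b1"),
]
# qs_a1 belongs to cluster_a, so pairing it with cluster_b matches nothing.
print(list_judgment_lists(seed, query_set_id="qs_a1", cluster_id="cluster_b"))  # prints ([], 0)
```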
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(api): broaden judgment-list filter test to two clusters (GPT-5.5 v2)

GPT-5.5's final review of the v1 fix flagged one Low-severity coverage gap: the test seeded only ONE cluster, so an implementation that ignored cluster_id but honored query_set_id could still satisfy every assertion.

Restructured the seed:

- _seed_chain twice → cluster_a + cluster_b (each with its own qs + query).
- A second query-set inside cluster_a, so query_set_id filtering is independently testable within a single cluster.
- 5 judgment-lists total: 2 in (A, qs_a1), 2 in (A, qs_a2), 1 in (B, qs_b1).

New assertions:

- cluster_id=A excludes B-cluster lists (rather than only including A-cluster ones).
- cluster_id=B excludes A-cluster lists.
- A combined MISMATCH (query_set_id=qs_a1 + cluster_id=cluster_b) returns data=[] + X-Total-Count: 0, proving the filters are AND-ed, not OR-ed.

Previous assertions preserved (X-Total-Count=2 for each single-qs filter, exact set membership for the combined-match query). ruff format + ruff check + mypy --strict clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… depth cap + cascade cancel (#223)

* docs(auto-followup-studies): idea-preflight audit refresh

Patches applied by /idea-preflight before /pipeline:

- Swap stated dependency: feat_config_repo_baseline_tracking (shipped PR #202, mechanically irrelevant) -> feat_study_baseline_trial (the real metric-baseline dependency; studies.baseline_metric is declared but never written, so the lift-gate degenerates without it).
- Document that the studies.parent_study_id self-FK already exists from feat_study_lifecycle Phase 1 (study.py:72-75, migration 0003:183-187) as the "MVP2 fork surface" -- this removes the new-column migration from scope; backend LOC drops ~600 -> ~565.
- Re-point links to implemented_features/ for the two siblings that shipped today (chore_study_default_stop_conditions PR #215, feat_config_repo_baseline_tracking PR #202).
- Refresh line-number citations on proposals.py, workers/orchestrator.py, workers/digest.py, agent/orchestrator.py, and schemas.py.
- Add an "Open questions for /spec-gen" section with recommended defaults for 6 design forks (ON DELETE semantics, depth cap, gate fallback, inheritance rules, budget threshold, cancellation cascade).
- Add a "Sibling coordination notes" section.

No spec/plan/code yet -- this is the idea, ready for /pipeline --auto.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(auto-followup-studies): feature_spec.md (spec-gen, 3 GPT-5.5 cycles)

Generated by /spec-gen as the SPEC stage of /pipeline --auto. 12 FRs / 13 ACs / 8 telemetry events / single-phase delivery (Tier A + Tier B together). Re-uses the existing studies.parent_study_id self-FK from feat_study_lifecycle Phase 1 -- no new schema migration.

All 6 idea-stage open questions locked to their recommended defaults: D-1 NO ACTION on parent_study_id, D-3 lift-over-first-decile gate, D-7 depth cap 5, D-4 strict config inheritance, D-5 80% budget gate, D-6 cascade-by-default cancel.
7 additional spec-time decisions recorded as D-2 / D-8 through D-13 (extracted domain function, separate children endpoint, cascade default coupling, FR-9 8-event catalog, two-layer idempotency, depth-0 trigger semantics, direct-children endpoint scope).

Cross-model review: GPT-5.5 (model gpt-5.5-2026-04-23), 3 cycles to convergence:

- Cycle 1: 1 High finding (depth=0 inconsistency) -- accepted.
- Cycle 2: 10 findings (2 High, 8 Medium) -- all accepted.
- Cycle 3: 6 findings (3 High, 3 Medium) -- all accepted (dangling references from cycle-2 patches).

pipeline_status.md records the spec-stage completion for the orchestrator's resume detection. Dashboard files regenerated by the mvp1-dashboard-regen pre-commit hook.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(auto-followup-studies): implementation_plan.md (impl-plan-gen, 3 GPT-5.5 cycles)

Generated by /impl-plan-gen as the PLAN stage of /pipeline --auto. 10 stories across 4 epics, 12 test files, 8 FR-9 events + 4 auxiliary events. No schema migration (re-uses studies.parent_study_id from feat_study_lifecycle).

Spec patches applied during the cycle-2 cascade-lifecycle redesign: AC-8 and AC-9 rewritten to match the realistic chain lifecycle (the parent is typically 'completed' while the child is in flight, since the digest worker only fires on the 'completed' transition). The original "running parent + running child" scenario was structurally impossible; it is replaced with the depth-3 R->M->L scenario where R/M are 'completed' and L is the only in-flight descendant.

Cross-model review: GPT-5.5 (model gpt-5.5-2026-04-23), 3 cycles to convergence:

- Cycle 1: 15 findings (3 High, 8 Medium, 4 Low) -- all accepted.
- Cycle 2: 5 findings (2 High, 1 Medium, 2 Low) -- all accepted.
- Cycle 3: 4 findings (3 High, 1 Medium) -- 3 fully accepted, 1 partial reject (C3-4 backend transitive-descendant detection rejected as out of D-13 scope; the UX limitation is documented in the runbook and named as feat_auto_followup_root_chain_stop for the future).
Key design decisions captured in the plan:

- Custom error_code via a prefix-parser on the RequestValidationError handler (allowlist-constrained: AUTO_FOLLOWUP_DEPTH_OUT_OF_RANGE for v1).
- Two-layer idempotency: Arq _job_id + worker list_children_of_study re-check; a future Postgres advisory lock is captured as chore_auto_followup_parent_advisory_lock.
- The cancel modal label adapts: "Cancel study" for an in-flight parent, "Stop chain" for a terminal parent with an in-flight direct child.
- The cascade service tolerates terminal parents and recurses through 'completed' intermediates to reach in-flight descendants (per the cycle-3 C3-1 fix).

pipeline_status.md updated with full plan-stage detail. Dashboard regen files included.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(auto-followup-studies): chain-gate domain + StudyConfigSpec field + error-code prefix parser (Story 1.1)

Story 1.1 of the feat_auto_followup_studies plan (docs/02_product/planned_features/feat_auto_followup_studies/implementation_plan.md).

New domain module backend/app/domain/study/auto_followup.py with two pure functions: compute_first_decile_max (floor-division semantics per spec FR-2a + plan cycle-1 finding C1-1) and evaluate_chain_gate (4-decision gate: ENQUEUE / SKIP_NO_LIFT / SKIP_PARENT_FAILED / SKIP_DEPTH_EXHAUSTED). The sorting key is optuna_trial_number (Trial has no created_at field; the lowest numbers are the random-sampling phase, which is the implicit-baseline semantics FR-2a wants).

StudyConfigSpec gains an optional auto_followup_depth: int | None field with a model_validator enforcing 0..5 (per FR-1 + D-12: 0 is the worker-internal terminal state; operators set None to opt out). The field intentionally does NOT use Field(ge, le), so the validator's "AUTO_FOLLOWUP_DEPTH_OUT_OF_RANGE: ..." prefix can carry through to the response envelope's error_code.
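A minimal Python sketch of the chain gate and the prefixed-validator path, under loudly stated assumptions: the metric is maximized; "first decile" means the first floor(n / 10) complete trials ordered by optuna_trial_number, with None when n < 10; the gate checks a failed parent, then exhausted depth, then lift; and a missing best metric or baseline counts as no lift. `Trial`, `Parent`, `check_auto_followup_depth`, and `error_code_for` are hypothetical names, and stripping Pydantic v2's "Value error, " message wrapper is an assumption about how the handler sees validator text. The regex and the single-entry allowlist are taken from the errors.py description in this commit.

```python
import enum
import re
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional


class ChainGateDecision(enum.Enum):
    ENQUEUE = "enqueue"
    SKIP_NO_LIFT = "skip_no_lift"
    SKIP_PARENT_FAILED = "skip_parent_failed"
    SKIP_DEPTH_EXHAUSTED = "skip_depth_exhausted"


@dataclass(frozen=True)
class Trial:  # hypothetical stand-in for a complete trial
    optuna_trial_number: int
    metric: float


@dataclass(frozen=True)
class Parent:  # hypothetical stand-in for the parent study
    status: str
    auto_followup_depth: Optional[int]
    best_metric: Optional[float]


def compute_first_decile_max(trials: Iterable[Trial]) -> Optional[float]:
    """Max metric over the first floor(n / 10) trials, ordered by trial number."""
    ordered = sorted(trials, key=lambda t: t.optuna_trial_number)
    decile_size = len(ordered) // 10  # floor: 11 trials -> 1, 9 trials -> 0
    if decile_size == 0:
        return None  # assumption: too few trials to define a first decile
    return max(t.metric for t in ordered[:decile_size])


def evaluate_chain_gate(parent: Parent, trials: Iterable[Trial]) -> ChainGateDecision:
    if parent.status == "failed":
        return ChainGateDecision.SKIP_PARENT_FAILED
    if parent.auto_followup_depth is None or parent.auto_followup_depth <= 0:
        return ChainGateDecision.SKIP_DEPTH_EXHAUSTED
    baseline = compute_first_decile_max(trials)
    # Assumption: no best metric or no baseline counts as "no lift" (maximizing).
    if parent.best_metric is None or baseline is None or parent.best_metric <= baseline:
        return ChainGateDecision.SKIP_NO_LIFT
    return ChainGateDecision.ENQUEUE


def check_auto_followup_depth(depth: Optional[int]) -> Optional[int]:
    """Validator body: None opts out, 0..5 passes, anything else gets the prefix."""
    if depth is not None and not 0 <= depth <= 5:
        raise ValueError(f"AUTO_FOLLOWUP_DEPTH_OUT_OF_RANGE: expected 0..5, got {depth}")
    return depth


PREFIX_RE = re.compile(r"^([A-Z][A-Z0-9_]{2,63}):")
ALLOWED_CODES = frozenset({"AUTO_FOLLOWUP_DEPTH_OUT_OF_RANGE"})
PYDANTIC_WRAPPER = "Value error, "  # assumed Pydantic v2 wrapping of ValueError text


def error_code_for(errors: List[Dict[str, Any]]) -> str:
    """Choose the envelope error_code for a list of validation errors."""
    if len(errors) != 1:
        return "VALIDATION_ERROR"  # multi-error responses keep the generic code
    msg = str(errors[0].get("msg", ""))
    if msg.startswith(PYDANTIC_WRAPPER):
        msg = msg[len(PYDANTIC_WRAPPER):]
    match = PREFIX_RE.match(msg)
    if match and match.group(1) in ALLOWED_CODES:
        return match.group(1)
    return "VALIDATION_ERROR"
```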
backend/app/api/errors.py adds a constrained prefix-parser to the validation_exception_handler (cycle-1 C1-2 + cycle-2 C2-1): regex ^[A-Z][A-Z0-9_]{2,63}: AND allowlist {AUTO_FOLLOWUP_DEPTH_OUT_OF_RANGE}. Single-error responses only; the multi-error fallback preserves the existing VALIDATION_ERROR envelope. A regression test locks the _require_one_stop_condition validator's existing envelope shape.

Tests:

- backend/tests/unit/domain/study/test_auto_followup.py: 20 tests (9 compute_first_decile_max, 10 evaluate_chain_gate, 1 frozen-dataclass guard). Includes the cycle-1 C1-15 best_metric=None case and the cycle-1 C1-1 floor/ceil regression guard (test_eleven_trials_floor_boundary).
- backend/tests/unit/api/test_validation_error_handler.py: 8 tests covering the prefix-parser path: positive case, non-prefixed fallback, unallowlisted-prefix fallback, multi-error fallback, 4 malformed-prefix parametrize cases.
- backend/tests/contract/test_studies_api_contract.py: extended with 8 cases for auto_followup_depth (4 valid via parametrize, 3 invalid via parametrize, 1 string-coercion lock per spec §14 + plan cycle-1 C1-14).

Verification:

- make lint: ✓
- make typecheck: ✓ (Success: no issues in 405 source files)
- Targeted test run: 53 pass
- Full make test-unit: 1191 pass (no regressions)

Note on duck-typed signatures: evaluate_chain_gate accepts Any for parent and Iterable[Any] for trials, mirroring the existing compute_study_confidence pattern at confidence.py:496, so SimpleNamespace stand-ins work in tests without a Protocol class.

Maps FRs: FR-1, FR-2 (FR-2a active path), FR-7. Pre-staged for FR-3 (the Story 2.1 worker will dispatch on ChainGateDecision) and FR-9 (events 2/3/4/5 enumerate one per decision; the worker emits them based on the gate's return value).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(auto-followup-studies): mark Story 1.1 complete in pipeline_status

Tracks per-story impl-execute progress for resumable /pipeline --auto invocations.
Next /pipeline turn dispatches to Story 1.2. Dashboard regen files included.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(auto-followup-studies): Story 1.2 NO-OP discovery (domain function already extracted)

A pre-implementation read of backend/app/agent/tools/studies/propose_search_space.py found that narrow_bounds_around_winner is ALREADY a pure domain function (shipped in PR #175 with feat_agent_propose_search_space). The plan's premise (that the math was inlined and needed extraction) was wrong.

Updates to implementation_plan.md:

- Story 1.2 marked complete with discovery notes; no code changes needed.
- Story 2.1 worker docstring updated to use the ACTUAL function name (`narrow_bounds_around_winner`, not `narrow_around_winner`) and the composition pattern (`build_starter_search_space` first, then narrow), because the actual function takes a SearchSpace, not a template_id.
- Story 2.1 import block adds the `query_template` repo + `build_starter_search_space`.
- Execution tracker §9 marks Story 1.2 done.

Existing coverage: 17 tests in TestNarrowBoundsAroundWinner (backend/tests/unit/domain/test_search_space_defaults.py:208) cover the function comprehensively. No new tests written.

pipeline_status.md tracks 2 of 10 stories complete. Next: Story 1.3.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(auto-followup-studies): cascade service + list_children_of_study repo (Story 1.3)

backend/app/db/repo/study.py: add list_children_of_study (direct children only per D-13, ordered by created_at ASC). Exported via __all__.

backend/app/services/study_state.py: add cancel_study_with_chain_cascade, implementing the cycle-3 C3-1 redesign -- the cascade is tolerant of terminal parents and recurses through completed intermediates to reach in-flight descendants.
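The traversal can be sketched over an in-memory tree. `Study`, `IN_FLIGHT`, `cancel_with_cascade`, and the returned list of cancelled ids are hypothetical, and the set of in-flight statuses is assumed. The sketch cancels in-flight children and passes through terminal ones to reach deeper descendants; whether the real service also descends below an in-flight child is not stated, so this sketch does not. With cascade=False it defers to a plain single-study cancel that raises on terminal parents (the behavior the later F3 fix adopts). Log events are omitted.

```python
from dataclasses import dataclass, field
from typing import List

IN_FLIGHT = {"queued", "running"}  # assumed in-flight statuses; others are terminal


class InvalidStateTransition(Exception):
    """Raised when cancelling a study that is already terminal."""


@dataclass
class Study:
    id: str
    status: str
    children: List["Study"] = field(default_factory=list)


def cancel_study(study: Study) -> None:
    """Plain single-study cancel: only in-flight studies may transition."""
    if study.status not in IN_FLIGHT:
        raise InvalidStateTransition(f"{study.id} is already {study.status}")
    study.status = "cancelled"


def cancel_with_cascade(parent: Study, cascade: bool = True) -> List[str]:
    """Cancel `parent` (if in flight) and, with cascade, its in-flight descendants."""
    if not cascade:
        cancel_study(parent)  # terminal parents raise, matching the 409 path
        return [parent.id]
    cancelled: List[str] = []
    if parent.status in IN_FLIGHT:
        cancel_study(parent)
        cancelled.append(parent.id)
    _walk_children(parent, cancelled)
    return cancelled


def _walk_children(node: Study, cancelled: List[str]) -> None:
    for child in node.children:
        if child.status in IN_FLIGHT:
            cancel_study(child)
            cancelled.append(child.id)
        else:
            # Terminal intermediate (e.g. 'completed'): keep walking so a deeper
            # in-flight leaf is still reached; an already-cancelled child is a no-op.
            _walk_children(child, cancelled)


# Depth-3 chain: R and M completed, only the leaf L is still running.
leaf = Study("L", "running")
root = Study("R", "completed", [Study("M", "completed", [leaf])])
print(cancel_with_cascade(root), leaf.status)  # prints ['L'] cancelled
```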
The realistic chain lifecycle (parent.status='completed' by the time a child exists, since the digest worker only fires on the completed transition) requires this traversal -- the original cancel-parent-first design would have failed on completed parents with InvalidStateTransition.

Behavior:

- cascade=True: traverse all direct children regardless of status; in-flight children get cancel_study (with the auto_followup_cancelled_with_parent log per FR-9 event #8); terminal children emit auto_followup_cancel_terminal_parent (an auxiliary event outside the FR-9 catalog per cycle-3 C3-2), and recursion continues into THEIR children.
- cascade=False: only the parent transition (or a no-op for a terminal parent). The 409 wire contract for terminal-parent + cascade=false ships in Story 2.3 at the HTTP layer.

A lazy import of the repo inside the cascade function avoids the circular dependency that surfaces in some test bootstrap paths.

7 new cascade tests in backend/tests/unit/services/test_study_state.py: in-flight parent / completed parent + running child (realistic AC-8) / 3-node R-completed M-completed L-running (cycle-3 C3-1 deep-leaf) / cascade=false on terminal parent (service safe; 409 ships in Story 2.3) / cascade=false on in-flight parent / already-cancelled child idempotency.

Verification: make typecheck Success in 405 files; full make test-unit 1197 pass (no regressions).

Maps FR-8 service half, FR-9 event #8, FR-12 (no migration).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(auto-followup-studies): mark Stories 1.2 + 1.3 complete + Epic 1 phase gate passed

Updates execution tracker §9 and pipeline_status.md to reflect the state after the Story 1.3 commit. Epic 1 (backend foundation -- domain, repo, service) is complete with full lint/typecheck/test-unit green.

The GPT-5.5 phase-gate cross-model review is deferred to Epic 2 (worker + endpoints), where the cumulative diff is reviewable as a coherent backend surface.
Epic 1 is pure domain/repo/service with no API surface, so the meaningful review window is at the next stage.

Next: Story 2.1 (enqueue_followup_study Arq job).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(auto-followup-studies): enqueue_followup_study Arq job (Story 2.1)

Implements FR-3 + FR-5 + FR-6 + FR-7 worker side + FR-9 events 1-7 (every chain telemetry event except #8, which lives in the cascade service from Story 1.3).

backend/workers/auto_followup.py: enqueue_followup_study(ctx, parent_study_id) implements the FR-3 flow:

1. Load parent; defensive skip on missing.
2. LAYER-2 IDEMPOTENCY (D-11): re-check list_children_of_study; skip on existing children (auto_followup_enqueued_duplicate_dropped).
3. Load complete trials (Python-filter status='complete' per cycle-1 finding C1-7, because repo.list_trials_for_study has no status kwarg).
4. evaluate_chain_gate dispatch on ChainGateDecision (no-lift / parent-failed / depth-exhausted skip branches).
5. Budget peek via peek_daily_total + estimated_max_call_cost (cycle-1 C1-11: create the Redis client inline, mirroring digest.py:439, since ctx doesn't carry redis_client).
6. Load best trial (defensive: skip on missing best_trial_id or trial).
7. Compose build_starter_search_space + narrow_bounds_around_winner per the Story 1.2 discovery (the actual function takes a SearchSpace, not a template_id, so we compose two domain funcs).
8. Build the child config with depth decremented (FR-5 strict inheritance).
9. repo.create_study + commit.
10. Best-effort enqueue of start_study (cycle-1 C1-13: try/except; on failure log digest_followup_start_study_enqueue_failed and rely on the on_startup boot-sweep at all.py:138-151 to recover).
11. Log auto_followup_enqueued (FR-9 event #1).

backend/workers/all.py: register enqueue_followup_study in WorkerSettings.functions (no per-function timeout; the default ~5min ceiling is comfortable for the worker's bounded query set).
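Step 8 (strict inheritance with the depth decremented) can be sketched as a pure function. `build_child_study` and its payload keys are hypothetical; the real worker builds a StudyConfigSpec-backed config and calls repo.create_study. The deep copy is a design choice of this sketch so that nested parent config is never shared with the child.

```python
import copy
from typing import Any, Dict


def build_child_study(
    parent_id: str,
    parent_config: Dict[str, Any],
    narrowed_search_space: Dict[str, Any],
) -> Dict[str, Any]:
    """Hypothetical payload for the child study created in step 9."""
    # Strict inheritance: copy every config key. A deep copy keeps nested
    # parent settings from being shared (and later mutated) through the child.
    child_config = copy.deepcopy(parent_config)
    # .get(..., 0) mirrors the defensive accessor adopted later in this PR; the
    # gate has already ensured depth > 0, so the child's depth is at least 0.
    child_config["auto_followup_depth"] = parent_config.get("auto_followup_depth", 0) - 1
    return {
        "parent_study_id": parent_id,
        "config": child_config,
        "search_space": narrowed_search_space,
    }


parent_config = {"auto_followup_depth": 3, "max_trials": 50, "stop": {"patience": 10}}
child = build_child_study("study-123", parent_config, {"params": {}})
print(child["config"]["auto_followup_depth"], parent_config["auto_followup_depth"])  # prints 2 3
```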
backend/tests/integration/test_auto_followup.py: 7 integration tests covering every branch (happy path / depth-exhausted / no-lift / layer-2 idempotency / missing-parent / budget-breached / failed-parent) plus an FR-9 event #1 telemetry assertion via structlog.testing.capture_logs. The tests skip when Postgres is unreachable, per the existing integration-test pattern; CI runs them against service containers.

backend/tests/unit/test_workers.py: extend the WorkerSettings.functions assertion set with enqueue_followup_study (previously failing on the diff; now reflects the new registration).

Verification:

- make fmt + make lint: ✓
- make typecheck: ✓ (Success: no issues in 407 source files)
- Full make test-unit: 1197 pass (no regressions)
- Integration tests SKIPPED on host (Postgres not reachable from .venv). They will run against service containers in CI; local verification needs either a container rebuild (source is baked in at image time) or env-var setup. The tests are well-formed (lint + typecheck clean); the live-DB verification is a CI-side gate per project convention.

Maps FRs: FR-3, FR-5, FR-6, FR-7, FR-9 events 1-7.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(auto-followup-studies): mark Story 2.1 complete

Updates execution tracker §9 and pipeline_status.md after the Story 2.1 commit. 4 of 10 stories complete. Notes the host-side integration-test collection gap (Postgres env not on host; container has stale source); CI catches the wire verification on the PR.

Next: Story 2.2 (digest worker trigger).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(auto-followup-studies): digest worker trigger + Arq job_id dedup (Story 2.2)

Inserts a 53-line trigger block at the end of generate_digest in backend/workers/digest.py that enqueues enqueue_followup_study with a deterministic _job_id=f'enqueue_followup_study:{study_id}'.
Trigger placement:

- AFTER the pending-proposal commit (digest.py:850)
- AFTER _safe_record_cost (digest.py:853; the parent's budget delta is now visible to the followup worker's budget peek)
- AFTER the digest_complete success log (so only the success path triggers; early-return / failure paths don't enqueue a child)
- BEFORE the finally block that closes openai_client + redis_client

Trigger condition per FR-1 + D-12: auto_followup_depth is not None (NOT > 0), so depth-0 worker-set terminal leaves trigger their own auto_followup_depth_exhausted event.

Per spec §9 layer-1 idempotency (D-11), the deterministic _job_id is the primary dedup mechanism; the worker's list_children_of_study re-check is the layer-2 backstop from Story 2.1.

Failure-warning events use digest_followup_* prefixes per cycle-1 finding C1-5 + cycle-2 C2-3 to keep the FR-9 8-event catalog stable:

- digest_followup_enqueue_pool_missing (defensive: ctx.arq_pool is None)
- digest_followup_enqueue_failed (mirrors the orchestrator.py:455 best-effort pattern; the chain ends, but the parent's proposal still ships)

Tests:

- backend/tests/unit/workers/test_digest_followup_trigger.py (NEW): 5 source-inspection tests locking the trigger block's shape: comment delimiter present, condition uses 'is not None' not '> 0', deterministic _job_id pattern present, failure events use the digest_followup_* prefix, trigger lands after the digest_complete log (success-path-only contract).
- backend/tests/integration/test_auto_followup.py: comment-pointer to the unit test (the source inspection doesn't need real Postgres).

End-to-end trigger verification (generate_digest -> arq_pool.enqueue_job with the right _job_id) is left for CI integration tests because generate_digest needs a complete Optuna + OpenAI fixture chain to exercise; source inspection covers the regression surface that matters (condition shape, _job_id formatting, event-type prefix).
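The trigger condition and both dedup layers can be simulated without Arq or Redis. `FakeQueue` is a hypothetical stand-in that drops a second job carrying an already-seen id, which is the role the deterministic _job_id plays here; `maybe_trigger_followup` and `run_followup_worker` are illustrative names, and the worker is reduced to its layer-2 children re-check (gate, budget, and study creation are omitted).

```python
from typing import Dict, List, Optional, Tuple


class FakeQueue:
    """Stand-in for a job pool that accepts each job id only once."""

    def __init__(self) -> None:
        self.jobs: Dict[str, Tuple[str, str]] = {}

    def enqueue_job(self, function: str, study_id: str, _job_id: str) -> Optional[str]:
        if _job_id in self.jobs:
            return None  # layer 1: a duplicate deterministic id is dropped
        self.jobs[_job_id] = (function, study_id)
        return _job_id


def maybe_trigger_followup(
    queue: FakeQueue, study_id: str, auto_followup_depth: Optional[int]
) -> Optional[str]:
    # `is not None`, not `> 0`: a depth-0 leaf still enqueues so the worker can
    # emit its own depth-exhausted event; None means the operator opted out.
    if auto_followup_depth is None:
        return None
    return queue.enqueue_job(
        "enqueue_followup_study", study_id, _job_id=f"enqueue_followup_study:{study_id}"
    )


def run_followup_worker(children: Dict[str, List[str]], parent_id: str) -> str:
    # Layer 2: if a duplicate job still runs, an existing child short-circuits it.
    if children.get(parent_id):
        return "auto_followup_enqueued_duplicate_dropped"
    children.setdefault(parent_id, []).append(f"{parent_id}-child")
    return "auto_followup_enqueued"


queue = FakeQueue()
print(maybe_trigger_followup(queue, "s1", 0), maybe_trigger_followup(queue, "s1", 0))
# prints enqueue_followup_study:s1 None
```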
Verification:

- make lint + make typecheck: All checks passed; 408 source files clean
- make test-unit: 1202 pass (5 new source-inspection + 1197 pre-existing)
- No regressions

Maps FR-1 trigger half + D-11 + D-12. Combined with Story 2.1's worker, the chain trigger is now live end to end.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(auto-followup-studies): cancel cascade endpoint + children endpoint (Story 2.3)

Wires the FR-8 + FR-10 backend API surface. Combined with Story 1.3's cascade service and Story 2.1+2.2's worker + trigger, the chain is now live end to end, from API call through worker execution.

backend/app/api/v1/studies.py changes:

1. _parse_cascade dependency: a custom query-param parser that accepts true/false case-insensitively and raises 400 INVALID_CASCADE_PARAM on any other value (overriding FastAPI's default 422 per spec §8.5 + the AC-9 wire contract).
2. The cancel_study handler is extended with a cascade query parameter resolved through _parse_cascade. When cascade=True (default per D-9), it routes through services.study_state.cancel_study_with_chain_cascade. When False, it routes through plain cancel_study (preserving the 409 error contract on terminal parents per AC-9).
3. NEW list_study_children handler at GET /studies/{id}/children. Returns StudyListResponse(data=[StudySummary], next_cursor=None, has_more=False), direct children only per D-13. 404 STUDY_NOT_FOUND when the parent is missing; 200 with an empty data array when the parent has no children (NOT 404).

Tests (NEW backend/tests/unit/api/test_studies_router_chain_endpoints.py):

- 18 router-level tests covering: endpoint registration (cancel + children), _parse_cascade case-insensitive parsing (7 valid forms), rejection of 7 invalid forms with the INVALID_CASCADE_PARAM 400 envelope, and the cancel handler signature carrying the cascade param.
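The `_parse_cascade` contract reduces to a small pure function. This framework-free sketch is an assumption about its shape: `parse_cascade` and `InvalidCascadeParam` are hypothetical names, None stands for an absent query param (defaulting to true per D-9), and in the real router the exception would be turned into the 400 INVALID_CASCADE_PARAM envelope.

```python
from typing import Optional


class InvalidCascadeParam(ValueError):
    """Carries the HTTP status and error code the router would return."""

    status_code = 400
    error_code = "INVALID_CASCADE_PARAM"


def parse_cascade(raw: Optional[str]) -> bool:
    if raw is None:
        return True  # D-9: cascade defaults to true when the param is absent
    lowered = raw.lower()  # case-insensitive: "TRUE", "False", "tRuE" all parse
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    # Anything else ("1", "yes", "", ...) is rejected rather than coerced.
    raise InvalidCascadeParam(f"cascade must be 'true' or 'false', got {raw!r}")


print(parse_cascade("FALSE"))  # prints False
```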
Source-inspection scope: end-to-end integration tests for the cascade behavior live in backend/tests/integration/test_studies_api.py (CI-gated); this story extends the router and adds the router-level tests, and the live-stack verification is a CI gate.

Verification:

- make lint + make typecheck: All checks passed; 409 source files clean
- make test-unit: 1220 pass (18 new router tests + 1202 pre-existing)
- No regressions

FR-9 event #8 (auto_followup_cancelled_with_parent) is already emitted by the cascade service from Story 1.3; the API surface routes through that service, so the event fires end to end on POST /cancel?cascade=true against a parent with in-flight children.

Maps FR-8 (HTTP surface) + FR-10 (children endpoint). Combined with Story 1.3, the full FR-8 cascade contract is wired (service + HTTP). Maps spec §8.5 error code INVALID_CASCADE_PARAM. Maps AC-8 (cascade hits in-flight descendants) and AC-9 (cascade=false on a terminal parent returns 409 via the preserved single-cancel path).

Epic 2 backend story complete. Next: Epic 1+2 phase gate (GPT-5.5 cross-model review of the cumulative diff) before Epic 3 (frontend).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(auto-followup-studies): apply phase-gate F3 + F5 + capture F2 future-work (Epic 1+2 phase gate)

The Epic 1+2 phase-gate cross-model review surfaced 8 findings (1 High, 7 Medium). This commit applies the 2 substantive code fixes and captures the future-work idea file. Includes dashboard regen files.

F5 (Medium, code fix): the backend/workers/auto_followup.py budget gate refuses to enqueue on unknown model pricing instead of treating it as a 0.0 max_call_cost. Mirrors the digest.py:543 pattern.

F3 (Medium, code fix): cancel_study_with_chain_cascade(cascade=False) now delegates to cancel_study, so terminal parents raise InvalidStateTransition per the AC-9 wire contract. The service contract now matches its docstring; the unit test test_cascade_no_cascade_on_terminal_parent_raises is updated.
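A sketch of the budget check after the F5 fix. Only the refuse-on-unknown-pricing rule comes directly from the text; reading the D-5 "80% budget gate" as "skip when today's spend plus the estimated worst-case call cost would exceed 80% of the daily cap" is an assumption, as are the `budget_allows_followup` name and its parameters (None models unknown pricing).

```python
from typing import Optional

BUDGET_GATE_FRACTION = 0.8  # D-5 "80% budget gate" (interpretation assumed)


def budget_allows_followup(
    daily_total: float, max_call_cost: Optional[float], daily_cap: float
) -> bool:
    """Return True if a follow-up study may be enqueued under the budget gate."""
    # F5: unknown model pricing means the cost cannot be bounded, so refuse
    # instead of treating the call as free (0.0).
    if max_call_cost is None:
        return False
    return daily_total + max_call_cost <= BUDGET_GATE_FRACTION * daily_cap


print(budget_allows_followup(10.0, None, 100.0))  # prints False
```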
F1 + F4 (doc fixes): plan corrections; the 5.5 invalid case is impossible with an int field, and the sort key is optuna_trial_number (Trial has no created_at).

F2 (deferred): the cascade-on-completed-parent race is captured as chore_auto_followup_completed_parent_stop_chain_race/idea.md with three implementation options. The race window is small and recoverable; deferred per D-11.

F6 + F7 + F8: integration test extensions are CI-gated; documented.

Verification: lint + typecheck clean (409 files); make test-unit 1220 pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(auto-followup-studies): frontend chain panel + wizard depth selector + cancel cascade radio (Stories 3.1, 3.2, 3.3) + Node 22 pin

Wires the operator-visible surfaces for the auto-followup chain end to end. All 3 frontend stories landed together because they interlock in the same files (page.tsx + studies.ts + study-action-bar.tsx).

Story 3.1: Glossary entries + chain panel + API helper (FR-10 frontend):

- 4 new glossary keys: auto_followup_depth, auto_followup_chain, lift_gate, auto_followup_budget_skip.
- New ui/src/components/studies/auto-followup-chain-panel.tsx renders a parent link (when parent_study_id is set), a remaining-depth line (when config.auto_followup_depth > 0), and a direct-children table. Hidden when there is no chain context.
- New useStudyChildren hook + CancelStudyVars{cascade?} type in ui/src/lib/api/studies.ts.
- Wired into the /studies/[id] page above the trials section.
- 7 vitest cases cover all render conditions.

Story 3.2: Wizard depth selector (FR-11):

- Added an auto_followup_depth field to CreateStudyModal FormValues.
- The depth selector mounts in Step 5 after the parallelism row.
- AUTO_FOLLOWUP_DEPTH_WIZARD_VALUES added to ui/src/lib/enums.ts with the source-of-truth comment per CLAUDE.md "Enumerated Value Contract Discipline". Wizard-0 is the OFF sentinel that maps to undefined at submit time (NOT to wire-0; wire-0 is the worker-internal terminal value per FR-1 + D-12).
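The Off-sentinel mapping is language-neutral, so here it is as a small Python sketch (the real code is TypeScript in create-study-modal.tsx). `build_config_payload` and the flat dict shape are hypothetical; the point is that wizard 0 omits the key, which the backend reads as None (opted out), rather than sending wire-0.

```python
from typing import Any, Dict


def build_config_payload(form: Dict[str, Any]) -> Dict[str, Any]:
    """Map wizard form values to a request payload (sketch)."""
    payload = {k: v for k, v in form.items() if k != "auto_followup_depth"}
    depth = form.get("auto_followup_depth")
    # Wizard 0 (or no selection) means Off: omit the key so the backend sees
    # None. Sending 0 would instead mean the worker-internal terminal leaf.
    if depth:
        payload["auto_followup_depth"] = depth
    return payload


print(build_config_payload({"max_trials": 10, "auto_followup_depth": 0}))  # prints {'max_trials': 10}
```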
Story 3.3: Cancel modal cascade radio (FR-8 frontend):

- StudyActionBar accepts a chainChildren prop (deliberately NOT named 'children', per cycle-2 C2-4, to avoid React's no-children-prop lint).
- showCascadeRadio = hasInFlightChild OR (status='running' AND depth > 0), which matches the FR-8 + cycle-1 C1-8 spec exactly.
- The radio defaults to cascade=true per D-6.
- The radio uses a native <input type='radio'> (the radio-group shadcn primitive is not in the codebase; a native input avoids a new @radix-ui dep).
- The useCancelStudy mutation is extended to accept {cascade?} and forward it as a ?cascade=<bool> query param. Default cascade=true matches the backend default per D-9.
- 6 vitest cases cover the cascade radio render conditions + wire forwarding.

Node 22 pin:

- ui/package.json engines.node: >=20.18 -> >=22
- .github/workflows/pr.yml node-version: 20 -> 22 (both setup-node steps)
- Local nvm default switched to 22 and v18.20.8 uninstalled in this session, so the silent v18 fallback that blocked frontend gates can't happen again.

Verification on Node 22:

- pnpm install --frozen-lockfile: ✓
- pnpm typecheck: ✓ (0 errors)
- pnpm lint: ✓ (0 errors; 105 pre-existing warnings)
- pnpm build: ✓
- pnpm test: 744 pass (was 731 pre-Story-3.1; +13 new: 7 panel + 6 cascade)
- prettier auto-formatted 5 files in pre-commit; included in this commit

Maps FRs: FR-8 frontend, FR-10 frontend, FR-11 wizard depth selector.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(auto-followup-studies): runbook + state.md + Stories 3.x/4.1 tracker close-out (Story 4.1)

Story 4.1 documentation:

- New runbook docs/03_runbooks/auto-followup-debugging.md (130 lines): catalog of the 8 FR-9 events + 4 auxiliary events; 6 quick diagnostic recipes (chain didn't start / skipped my last study / cancelled but didn't stop / etc.); schema invariants; manual mitigation steps for runaway chains (incl. the known limit for completed-root stop).
- state.md: new entry at the top of 'Most recent meaningful changes' summarizing the full feature (backend Stories 1.1-2.3, frontend Stories 3.1-3.3, Node 22 pin, all 3 phase gates, GPT-5.5 review cycle counts, F2 deferred idea capture).

Execution-tracker + pipeline_status close-out: all 10 stories checked off; 3 phase gates marked passed. Only the post-impl ceremony remains (push, CI, Gemini, final review, finalize), in the next pipeline turn.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(auto-followup-studies): close test gaps from final GPT-5.5 review + fix prior CI failures

The GPT-5.5 final review flagged 2 Medium findings:

- F1: Story 3.3 + the Epic 3 gate require ui/tests/e2e/auto-followup.spec.ts (a real-backend Playwright spec). Adds chain-panel + remaining-depth + wizard depth-selector tests. The 3-node-chain tests (parent-link branch, children-table branch, cascade radio with in-flight child) need a new test-only seed endpoint because POST /studies doesn't accept parent_study_id; captured as chore_auto_followup_e2e_chain_seed_helper.
- F2: Story 3.2 requires a focused vitest on the wizard depth selector, especially the 0-sentinel-maps-to-undefined wire contract. Adds create-study-modal.auto-followup.test.tsx with 7 cases covering the default, single-select, switch-back-to-Off, submit-with-depth=N, submit-with-Off-omits-key, and the full option list.

Also fixes 3 CI failures from the merge-into-main push:

- backend/tests/contract/test_openapi_surface.py: register the new GET /api/v1/studies/{study_id}/children endpoint so the no-orphan test passes.
- backend/tests/integration/test_studies_api.py::test_cancel_endpoint_round_trip: pin to ?cascade=false so the legacy single-cancel 409-on-terminal contract is preserved (the new default cascade=true is tolerant of terminal parents per cycle-3 C3-1 + AC-9).
- ui/tests/e2e/studies.spec.ts: the frontend now sends /cancel?cascade=true, so match the URL with includes(), not endsWith().
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(auto-followup-studies): adopt Gemini suggestion — defensive .get() on parent.config Gemini Code Assist (PR #223 line 215) flagged that parent.config["auto_followup_depth"] assumes the key exists. While evaluate_chain_gate guarantees depth > 0 before we reach this line, the config could in theory be serialized with exclude_none=True later. Use .get(..., 0) defensively — consistent with the rest of the function's accessor style and aligned with the FR-5 strict-inheritance comment above. Accepted: yes, no behavior change in current code paths. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(auto-followup-studies): rewrite E2E wizard test to use canonical pickEntity pattern The first version assumed native <select> elements; the create-study modal uses Radix-portal-backed EntitySelect + shadcn Select components. Mirror the canonical pickEntity pattern from studies-create-builder.spec.ts (dispatchEvent('click') on the testid trigger, then role=option click). Also: open the modal via getByTestId('open-create-study') instead of the "New study" button name (which doesn't match), pin judgmentListTarget so the FR-4 target/JL mismatch guard doesn't disable cs-jl, and assert modal dismissal before fetching the created study to avoid a race. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(auto-followup-studies): mark pipeline_status In Progress with PR #223 The dashboard regen's PR extractor cascades through priorities; for this feature the prior pipeline_status.md gave priority-1 (Implement section) no `#N` to find (commit SHAs only), so it fell through to priority-4 (last-resort first `PR #N` in combined docs). That matched the dependency cite "PR #175" for feat_agent_propose_search_space and the dashboard reported the wrong PR. 
Fix: surface PR #223 in the Implementation section so priority-1 catches it, and use the literal "Status: In Progress" phrase so the stage classifier puts the feature in Implementing (was falling to Plan because the prior wording lacked the canonical "In Progress" trigger string). Pre-existing weakness in the regen script (priority-4 fuzzy fallback matching dependency cites) is already tracked in chore_dashboard_regen_quoted_pr_false_positive/idea.md. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
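A toy extractor can reproduce the fallthrough that produced the wrong PR. The commit message describes only priority 1 and priority 4, so this hypothetical `extract_pr` skips priorities 2 and 3 entirely. `classify_stage` likewise reduces the stage classifier to its one documented trigger string. Neither function is the regen script's real code.

```python
import re
from typing import Optional

HASH_REF = re.compile(r"#(\d+)")
PR_REF = re.compile(r"\bPR #(\d+)")


def extract_pr(implementation_section: str, combined_docs: str) -> Optional[int]:
    # Priority 1: any `#N` in the Implementation section.
    match = HASH_REF.search(implementation_section)
    if match:
        return int(match.group(1))
    # Priorities 2 and 3 are not described in the source and are omitted here.
    # Priority 4 (last resort): the first `PR #N` anywhere in the combined docs.
    match = PR_REF.search(combined_docs)
    return int(match.group(1)) if match else None


def classify_stage(pipeline_status: str) -> str:
    # Simplified: only the literal trigger phrase moves a feature to Implementing.
    return "Implementing" if "Status: In Progress" in pipeline_status else "Plan"
```

With commit SHAs only, the Implementation section has no `#N`, so the first dependency cite in the combined docs wins. That is exactly the false positive the fix avoids by surfacing `PR #223` where priority 1 looks.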

* docs(digest-executable-followups): idea-preflight patches + feature spec (3 GPT-5.5 cycles)

Bundles two stages:

**Idea preflight (11 edits, 1 file):** a pre-spec audit grounded every concrete claim against the current codebase:
- Fix 4 broken sibling links to feat_auto_followup_studies (folder moved to implemented_features/2026_05_24_* on 2026-05-24).
- Reframe feat_auto_followup_studies as already-shipped substrate (PR #223 squash 20cf183) rather than a future coordination concern.
- Fix line range digest.py:168-189 -> digest.py:169-182 (2 sites).
- Fix column claim: digests.followups JSONB is wrong; the actual column is digests.suggested_followups ARRAY(Text) per backend/app/db/models/digest.py:49. This changes the migration story from "strictly additive" to "two migrations including a column-type change" with USING-clause backfill discipline.
- Capture: SuggestedFollowupsPanel has a dead "Create study from this hypothesis" button (link constructed but /studies never reads the param). Subsume into the new structured flow.
- Add parent_proposal_id FK alongside parent_proposal_followup_index (the index alone is unmoored without the proposal ID).
- Bump scope estimates +50 LOC each layer.

**Feature spec (Generate mode + 3 GPT-5.5 cycles to convergence):**
- 13 FRs / 13 ACs / 3 phases (Phase 1 in scope; Phase 2 swap_template + Phase 3 edit_template deferred with idea files).
- 3 new error codes: PROPOSAL_NOT_FOUND (404), DIGEST_NOT_FOUND (404 retryable), FOLLOWUP_INDEX_OUT_OF_RANGE (422).
- Migration discipline: PL/pgSQL helper functions for the ARRAY(Text) -> JSONB type change (subqueries not allowed in ALTER COLUMN TYPE ... USING per empirical Postgres-16 verification); BEFORE DELETE trigger (NOT ON DELETE SET NULL) for the parent_proposal lineage pair invariant.
- Cross-model review: 17 accepted + 1 rejected (D-17: CLAUDE.md Absolute Rule #8 mandates persisted lineage capture; the response example showing openai:gpt-4o-2024-08-06 is lineage data, not hardcoded model usage).
- Decision log D-13 through D-29 capture every adjudication. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(digest-executable-followups): implementation plan (1 GPT-5.5 cycle to convergence) 16 stories across 6 epics: - Epic 1 Domain (1): followups.py + TypeAdapter + parser + serializer - Epic 2 Worker + prompts (3): schema wiring, prompt updates, integration test - Epic 3 Migrations + ORM (6): migration 0018 (studies columns + BEFORE DELETE trigger + partial index) + ORM update + migration 0019 (ARRAY(Text) -> JSONB column-type change via PL/pgSQL helpers) + 3 integration tests - Epic 4 API (2): schema wire-shape + parent body endpoint - Epic 5 Frontend (3): panel rewrite + prefill flow + glossary - Epic 6 E2E (1): Playwright happy-path Test coverage (15 files total): - Unit (3): test_followups.py, test_followups_backcompat.py, test_digest_prompt.py - Integration (5): digest roundtrip, parent_proposal CHECK + ON DELETE, migration 0019, studies with parent_followup - Contract (3): digest response shape, proposal detail shape, create_study parent - E2E (1): followup_run.spec.ts - Vitest (3): panel, modal-prefill, glossary extension Cross-model review: 5 findings -- 3 accepted (F1 explicit downgrade sequence, F2 useStudy enabled pre-Run-click, F3 RequestValidationError mapping + 3 contract tests), 2 rejected with cited counter-evidence (F4 MVP4 forward-looking convention, F5 D-17 lineage re-raise). Legacy Behavior Parity table for the dead ?hypothesis= retire: 6 rows (4 preserved, 2 intentionally-dropped with FR-12 citations). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(digest): FollowupItem union + parse/serialize helpers (Story 1.1) - New backend/app/domain/study/followups.py exposes the discriminated-union FollowupItem type alias (narrow / widen / text) with FollowupItemAdapter + FollowupListAdapter for validation, plus parse_followup_list() and serialize_followup_list() helpers. 
- parse_followup_list() never raises — downgrades invalid narrow/widen items to text when rationale is salvageable, drops them otherwise. Both paths emit canonical structlog WARN events with study_id + proposal_id context via stdlib logging (caplog-friendly). - 31 unit tests cover per-kind round-trip + the full FR-4 decision table. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(db): studies parent_proposal lineage + digests JSONB followups (Stories 3.1-3.3) - Migration 0018 adds parent_proposal_id (FK to proposals.id) + parent_proposal_followup_index to studies, with a partial B-tree index, a pair CHECK (both NULL or both set with index>=0), and a BEFORE DELETE trigger on proposals that atomically NULLs the lineage on parent delete. - Study ORM model declares both new nullable columns. - Migration 0019 converts digests.suggested_followups from ARRAY(Text) to JSONB using PL/pgSQL helper functions (subqueries are not allowed in ALTER COLUMN TYPE ... USING). Wraps legacy text rows as {kind: 'text', rationale: <text>, search_space: null}; downgrade is symmetric and lossy (collapses structured items to their rationale string). - Digest ORM model updated to JSONB column with '[]'::jsonb default. - Both migrations round-trip cleanly against running Postgres 16. - Three integration tests updated to assert the new structured shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(db): parent_proposal CHECK + ON DELETE trigger + JSONB migration round-trip (Stories 3.4-3.6) - test_studies_parent_proposal_check.py: 5 cases covering each malformed shape (half-set columns, negative index) the CHECK constraint must reject, plus the two legal pair shapes (both-NULL, both-set-with-zero). - test_studies_parent_proposal_on_delete.py: hard-deletes a parent proposal and asserts the BEFORE DELETE trigger NULLs the lineage pair on the child study atomically, with every other column unchanged. 
- test_digest_followups_migration.py: subprocess-driven Alembic round-trip exercising the PL/pgSQL helpers in both branches (populated text array + empty text array) and asserting symmetric rationale-only downgrade. All 7 tests pass against the running stack. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(digest): structured-output followups + prompt updates (Stories 2.1-2.3) - Worker DIGEST_RESPONSE_SCHEMA changes suggested_followups items from string to {kind, rationale, search_space} object via JSON-schema. Worker Step 13 builds the drift followup as a text-kind dict, extends with the LLM list, validates+downgrades via parse_followup_list, serializes via serialize_followup_list, persists JSONB. Capability-degraded path still persists [] (D-27). Per-kind counts emitted in digest_complete log. - Prompt: system file teaches narrow/widen/text decision rules with explicit sub-region/edge-extension constraints; user template renders <parent_search_space> JSON block via tojson. render_digest_user_prompt accepts new parent_search_space kwarg; worker passes study.search_space. - New unit tests (3) cover the parent_search_space block rendering. - Existing response-format unit test updated to assert structured items. - _digest_helpers.make_openai_response auto-wraps list[str] to text dicts so existing tests keep passing without per-test edits. - New integration test exercises the full round-trip: 1 valid narrow + 1 cardinality-busting narrow (downgrades) + 1 text → persists 3 items with the validation-failed prefix on the downgraded rationale. All 1282 unit tests + 45 digest integration tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(api): DigestResponse + _DigestEmbed suggested_followups discriminated union (Story 4.1) - schemas.py: re-exports FollowupItem; both DigestResponse and _DigestEmbed declare suggested_followups: list[FollowupItem]. 
- proposals.py: both response-construction sites (proposal-detail embed + GET /studies/{id}/digest handler) wrap raw JSONB via parse_followup_list so legacy or malformed payloads never crash the response.
- 6 new contract tests assert the discriminated-union round-trip on both schemas, plus the worker's DIGEST_RESPONSE_SCHEMA matching the FR-1 wire shape (object items with kind enum + required fields).
- AC-5 defensive integration test seeds a raw list[str] JSONB row and asserts GET /digest wraps it as text items at the response layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): POST /api/v1/studies accepts optional parent lineage (Story 4.2)

- schemas.py: new ParentFollowupRef + optional CreateStudyRequest.parent field. proposal_id must be exactly 36 chars; followup_index must be a non-negative int.
- studies.py create_study handler: between the overlap probe and repo.create_study, validate the parent payload (404 PROPOSAL_NOT_FOUND non-retryable, 404 DIGEST_NOT_FOUND retryable, 422 FOLLOWUP_INDEX_OUT_OF_RANGE non-retryable). Manual proposals (study_id=NULL) fail immediately with a non-retryable DIGEST_NOT_FOUND. Persists parent_proposal_id + parent_proposal_followup_index on the new study.
- Contract test: optional-field assertion + ParentFollowupRef shape + static grep of the router source for the three new error codes.
- Integration test: 5 happy/error paths + 3 malformed-body envelope cases. Uses the same fake_probe_passes autouse fixture as test_studies_api so the empty-judgments probe doesn't 422 the happy path.

All 8 integration tests + 5 contract tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): kind-discriminated followup cards + Run-followup prefill flow (Stories 5.1-5.3)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(ui): E2E happy path for Run-this-followup flow (Story 6.1)

- test_seeding: optional suggested_followups list[dict] kwarg; default unchanged.
- _test router: new field on SeedCompletedStudyRequest passes through. - seed.ts helper: SeedFollowupItem type + suggestedFollowups arg. - followup_run.spec.ts: drives the full flow against the real backend — seeds a narrow followup, navigates to the proposal, clicks Run, walks the wizard (asserting the prefilled name), submits, and asserts a new study with the followup-derived name was created. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(digest-executable-followups): post-implementation docs updates - state.md: bump Alembic head to 0019; mention 0018 + 0019 owners. - architecture.md: add followups.py to domain map; extend migrations note. - api-conventions.md: document PROPOSAL_NOT_FOUND / DIGEST_NOT_FOUND / FOLLOWUP_INDEX_OUT_OF_RANGE on POST /api/v1/studies. - data-model.md: studies.parent_proposal_* columns + digests.suggested_followups type change to JSONB with the FollowupItem comment. - implementation_plan.md: mark all 16 stories complete in §9 tracker. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(digest): use search_space_json string in OpenAI structured-output schema OpenAI strict-mode JSON schema rejects open-ended object subschemas (SearchSpace.params has arbitrary user-defined param names). The CI smoke test failed with: 'Invalid schema for response_format digest_narrative: additionalProperties is required to be supplied and to be false'. Solution: ship search_space as a JSON-encoded string (search_space_json) in the structured-output schema. The worker decodes the string before passing to parse_followup_list. Bad JSON or invalid SearchSpace content falls through to the defensive-parser downgrade path. - workers/digest.py: schema items declare search_space_json: string; worker translates LLM payloads to parse_followup_list shape. - prompts: system prompt teaches the search_space_json string form with a concrete narrow example. 
- tests: response-format unit + contract assertions updated; test_digest_fetch.py asserts the new JSONB dict shape; _digest_helpers.make_openai_response normalizes all three input shapes (legacy list[str], object-shape dict, wire-shape dict) to the search_space_json wire format. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(digest): add FOLLOWUP_KIND_VALUES tuple constant for verify-enum CI gate The verify_enum_source_of_truth CI helper resolves the cited backend symbol via importlib + Literal/frozenset/tuple introspection. The FollowupItem PEP-695 'type' alias is none of those (it's an Annotated discriminated union), so the helper failed with 'helper failed to resolve backend.app.domain.study.followups.FollowupItem'. Fix: add a module-level FOLLOWUP_KIND_VALUES tuple constant mirroring the per-class Literal['narrow'|'widen'|'text'] discriminators; update the source-of-truth comment in ui/src/lib/enums.ts to cite the new constant. verify_enum_source_of_truth.sh now exits clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(digest): apply Gemini Code Assist findings on PR #225 Two Medium findings accepted: F1 (backend/app/domain/study/followups.py:192): switch _truncate from pure head-truncate to head-and-tail truncate so Pydantic ValidationError strings (which put the most specific field path at the end) retain both the leading context AND the trailing field-path-and-message. F2 (ui/src/app/proposals/[id]/page.tsx:162): defensively truncate the parent study name to 200 chars in the prefill name assembly so the combined 'parent — followup #NN (kind)' stays under the backend's CreateStudyRequest.name 256-char bound. 
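Both Gemini findings are about keeping a string inside a bound without losing its useful part. Below is a minimal sketch, assuming a ` ... ` marker and a plain `parent - followup #N (kind)` format (the real UI uses a different separator). `truncate_head_tail` and `build_prefill_name` are hypothetical names, not the project's `_truncate` or page code.

```python
PARENT_NAME_CAP = 200  # parent-name cap quoted in the commit message
NAME_LIMIT = 256       # CreateStudyRequest.name bound quoted in the commit message


def truncate_head_tail(text: str, limit: int, marker: str = " ... ") -> str:
    """Keep both ends of `text` so a trailing field path survives truncation."""
    if len(text) <= limit:
        return text
    if limit <= len(marker):
        return text[:limit]
    keep = limit - len(marker)
    head = (keep + 1) // 2  # the head gets the extra character, if any
    tail = keep - head
    # Slice from len(text) - tail (not -tail) so tail == 0 yields "".
    return text[:head] + marker + text[len(text) - tail:]


def build_prefill_name(parent_name: str, index: int, kind: str) -> str:
    # Hypothetical format; capping the parent keeps the total under NAME_LIMIT.
    return f"{parent_name[:PARENT_NAME_CAP]} - followup #{index} ({kind})"
```

A pure head-truncate would cut off the end of a Pydantic error string, which is where the most specific field path sits; splitting the budget keeps both the leading context and that trailing path.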
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(digest-executable-followups): document search_space_json plan drift GPT-5.5 final review F3 accepted: the worker's structured-output schema ships search_space as search_space_json (JSON-encoded string) rather than the planned {object|null} variant because OpenAI strict-mode JSON schema rejects open-ended object subschemas. Added a 'Post-execution plan drift' subsection to §9 of the implementation plan documenting the workaround for future traceability. Operator-visible behavior is unchanged (the API response + persisted JSONB still use the object shape); only the worker ↔ LLM wire format differs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
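Here is a stdlib-only sketch of the defensive parsing path described in this bundle. It decodes the worker ↔ LLM `search_space_json` string, then validates each item, downgrading invalid narrow/widen items to text when the rationale is salvageable and dropping them otherwise. Legacy `list[str]` rows are wrapped as text items, matching the AC-5 behavior. The real module uses Pydantic adapters and a real `SearchSpace` model. Here, `_is_search_space` is a hypothetical stand-in, and `DOWNGRADE_PREFIX` is a placeholder because the actual prefix text isn't given.

```python
import json
import logging
from typing import Any

logger = logging.getLogger("followups_sketch")
DOWNGRADE_PREFIX = "[validation failed] "  # placeholder text


def decode_wire_item(item: dict) -> dict:
    """Translate the LLM wire shape (search_space_json) to the parser's shape."""
    out = {key: value for key, value in item.items() if key != "search_space_json"}
    raw = item.get("search_space_json")
    if raw is None:
        out["search_space"] = None
        return out
    try:
        out["search_space"] = json.loads(raw)
    except (TypeError, ValueError):
        out["search_space"] = raw  # undecodable: let the parser downgrade it
    return out


def _is_search_space(value: Any) -> bool:
    # Stand-in check: an object whose "params" entry is a non-empty object.
    return isinstance(value, dict) and isinstance(value.get("params"), dict) and bool(value["params"])


def _text(rationale: str) -> dict:
    return {"kind": "text", "rationale": rationale, "search_space": None}


def parse_followup_list(raw: Any) -> list:
    """Never raises: each item is kept, downgraded to text, or dropped."""
    if not isinstance(raw, list):
        return []
    parsed = []
    for entry in raw:
        if isinstance(entry, str):  # legacy list[str] row
            parsed.append(_text(entry))
            continue
        if not isinstance(entry, dict):
            logger.warning("followup_dropped reason=not_an_object")
            continue
        kind, rationale = entry.get("kind"), entry.get("rationale")
        salvageable = isinstance(rationale, str) and bool(rationale.strip())
        if kind == "text" and salvageable:
            parsed.append(_text(rationale))
        elif kind in ("narrow", "widen") and salvageable and _is_search_space(entry.get("search_space")):
            parsed.append({"kind": kind, "rationale": rationale, "search_space": entry["search_space"]})
        elif salvageable:
            logger.warning("followup_downgraded kind=%s", kind)
            parsed.append(_text(DOWNGRADE_PREFIX + rationale))
        else:
            logger.warning("followup_dropped kind=%s reason=no_rationale", kind)
    return parsed
```

Because bad JSON is passed through rather than raised, the structured-output workaround reuses the same downgrade path as any other invalid item.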

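The Story 4.2 parent-lineage checks in the bundle above map onto a short decision sequence. In this sketch the repository lookups are injected callables, and `Proposal`, `ParentLineageError`, and `validate_parent` are hypothetical names. Schema-level bounds (36-char `proposal_id`, non-negative index) are assumed to be enforced by `ParentFollowupRef` before this runs. Reading the retryable `DIGEST_NOT_FOUND` as "the digest has not been written yet" is an interpretation, not something the commit states.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Proposal:
    id: str
    study_id: Optional[str]  # None for manual proposals


class ParentLineageError(Exception):
    def __init__(self, status: int, code: str, retryable: bool) -> None:
        super().__init__(code)
        self.status = status
        self.code = code
        self.retryable = retryable


def validate_parent(
    proposal_id: str,
    followup_index: int,
    get_proposal: Callable[[str], Optional[Proposal]],
    get_digest_followups: Callable[[str], Optional[List[dict]]],
) -> Tuple[str, int]:
    proposal = get_proposal(proposal_id)
    if proposal is None:
        raise ParentLineageError(404, "PROPOSAL_NOT_FOUND", retryable=False)
    if proposal.study_id is None:
        # Manual proposal: there is no study, so no digest can ever appear.
        raise ParentLineageError(404, "DIGEST_NOT_FOUND", retryable=False)
    followups = get_digest_followups(proposal.study_id)
    if followups is None:
        # Study exists but has no digest (yet): a later retry may succeed.
        raise ParentLineageError(404, "DIGEST_NOT_FOUND", retryable=True)
    if followup_index >= len(followups):
        raise ParentLineageError(422, "FOLLOWUP_INDEX_OUT_OF_RANGE", retryable=False)
    # Values to persist as parent_proposal_id / parent_proposal_followup_index.
    return proposal.id, followup_index
```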
…home-button silent failure (#299) * docs(idea-preflight): refresh bug_demo_reseed_button_silent_enqueue_failure idea Add Depends on + Coordinate with lines; expand Problem section with explicit gap-region citations (lines 76-88 outside outer try; lines 91-133 inside try but no except); add structlog-buffering hypothesis; lock the re-raise-after-status-write choice in fix design with rationale (Arq ops visibility + worker-log traceback); split the diagnostic print() from the exception barrier into its own capability; refactor regression test to unit-level (no chore_demo_seeding_integration dependency — uses the existing ctx-pool fallback at demo_reseed.py:82-88). Includes dashboard regen triggered by the idea.md edit (no folder adds/moves — just frontmatter refresh; dashboard hash unchanged in practice but the regen hook fired anyway). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * fix(demo-reseed): add top-level exception barrier + stale-status auto-recovery The home page "Reset to demo state" button enqueued the run_demo_reseed Arq job, but two gap regions in the worker let exceptions escape without writing status="failed" to Redis: - Lines 76-88 (settings load, factory init, Redis acquisition) sat OUTSIDE the outer try block. - Lines 91-133 (get_engine, engine.connect, advisory lock, factory(), httpx.AsyncClient(...)) sat INSIDE the outer try but the block had no except, only a finally to close Redis. When either gap region raised, Arq marked the job JobExecutionFailed but Redis stayed stuck at the POST handler's initial "running" payload indefinitely, leaving the operator's UI at "Scenario 0 of 5 (0%)" and blocking subsequent POSTs with 409 SEED_IN_PROGRESS until a manual Redis cleanup. The inner except (DemoSeedingError, httpx.HTTPError, Exception) at line 150 only catches errors inside reseed_demo_state, not the init regions. 
Fix per bug_demo_reseed_button_silent_enqueue_failure §"Proposed capabilities": 1. Wrap the entire run_demo_reseed body in `except BaseException` that writes status="failed" with the exception class + first 200 chars of the message, then re-raises. Re-raising preserves Arq's JobExecutionFailed record AND emits a worker-log traceback the operator can read. The inner reseed_demo_state handler keeps its return (no re-raise) because retrying the destructive wipe is the wrong behavior. 2. Acquire Redis FIRST so the barrier can write status even when settings/factory/engine init explodes. Preserves Gemini PR #286 finding #7 (reuse Arq's managed pool from ctx) and finding #8 (only close Redis when we created it ourselves). 3. Add reseed_status_is_stale() helper in backend/app/services/demo_seeding.py — defense-in-depth for the case where the worker process itself dies (OOM, container restart) before any exception handler runs. The POST handler uses it to convert a stuck-running status (started_at older than DEMO_RESEED_JOB_TIMEOUT_S = 1200s) into "treat as failed and proceed" instead of 409. 4. Hoist DEMO_RESEED_JOB_TIMEOUT_S from workers/demo_reseed.py to services/demo_seeding.py so the route handler can read it without importing from the workers package. Worker re-exports for back-compat. Regression tests: - backend/tests/unit/workers/test_demo_reseed_exception_barrier.py (4 tests): get_engine + get_session_factory raising both flip Redis to "failed" and re-raise; ctx-managed Redis stays open; self-created Redis is closed in the finally block. - backend/tests/unit/services/test_reseed_status_is_stale.py (10 tests): timeout boundary (== timeout → not stale, > timeout → stale), idle/complete/failed never stale, missing/malformed started_at conservative-not-stale, naive timestamps treated as UTC. Verified on main: 3 of 4 exception-barrier tests fail (the 4th — "does-not-close-arq-redis" — trivially passes because the bare try/finally never reached the close path either). 
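Here is a sketch of the stale-status check from item 3. It assumes the Redis status payload is a mapping with `status` and an ISO-8601 `started_at` string; the real payload shape is not shown here. It follows the boundary rules listed for the regression tests: exactly the timeout is not stale, idle/complete/failed are never stale, a missing or malformed `started_at` is conservatively not stale, and naive timestamps are read as UTC. It also normalizes a naive `now`, the guard the later GPT-5.5 review added.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Mapping, Optional

DEMO_RESEED_JOB_TIMEOUT_S = 1200  # value quoted in the commit message


def _as_utc(moment: datetime) -> datetime:
    # Naive datetimes are interpreted as UTC; aware ones are left alone.
    return moment.replace(tzinfo=timezone.utc) if moment.tzinfo is None else moment


def reseed_status_is_stale(
    status: Optional[Mapping[str, Any]],
    now: Optional[datetime] = None,
    timeout_s: int = DEMO_RESEED_JOB_TIMEOUT_S,
) -> bool:
    if not status or status.get("status") != "running":
        return False  # idle / complete / failed are never stale
    raw = status.get("started_at")
    if not isinstance(raw, str):
        return False  # missing started_at: conservatively not stale
    try:
        started = _as_utc(datetime.fromisoformat(raw))
    except ValueError:
        return False  # malformed started_at: conservatively not stale
    current = _as_utc(now) if now is not None else datetime.now(timezone.utc)
    # Strictly greater: a run exactly at the timeout is still considered live.
    return current - started > timedelta(seconds=timeout_s)
```

Normalizing both sides before subtracting avoids the `TypeError` Python raises for aware-minus-naive datetime arithmetic.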
No DB migration, no env var, no operator action. Existing happy path unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * fix(demo-reseed): narrow exception barrier from BaseException to Exception Per Gemini PR #299 review (medium): catching BaseException intercepts asyncio.CancelledError (Arq's job-timeout cancellation mechanism, a BaseException subclass since 3.8) plus SystemExit/KeyboardInterrupt (worker shutdown). Awaiting status_set from inside a handler that caught one of those would re-raise CancelledError — masking the original — or delay/hang shutdown with network I/O. The documented bug (init-region exceptions: settings load, factory init, get_engine, engine.connect, httpx.AsyncClient construction) is fully covered by Exception — all those failures inherit from it. No behavior change for the regression tests (they raise RuntimeError/ValueError). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * fix(demo-reseed): apply GPT-5.5 review findings (aclose guard, naive-now, docstrings) GPT-5.5 final review of PR #299 — 3 accepted, 1 deferred: - #1 (Medium, accepted): wrap redis.aclose() in the finally block in its own try/except + WARN log. A raise from aclose() would otherwise replace the re-raised original exception (or fail an otherwise-successful job). - #3 (Low, accepted): normalize a naive `now` arg in reseed_status_is_stale() to UTC — an aware-minus-naive subtraction would raise TypeError. Production never passes `now`; this guards callers/tests. + regression test test_naive_now_argument_treated_as_utc. - #4 (Low, accepted): fix stale `BaseException` wording in the worker + test docstrings (code already uses `Exception`). - #2 (Medium, deferred non-regression): stale-recovery check-then-set is non-atomic. Counter-evidence: the deterministic Arq job_id + advisory lock already prevent duplicate runs. 
Captured as chore_demo_reseed_stale_recovery_atomic_cas/idea.md. Includes dashboard regen triggered by the new chore_ idea folder. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> --------- Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
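Below is a compact asyncio sketch of the barrier's final shape. It uses `except Exception` rather than `BaseException`, writes the status before re-raising, closes Redis only when the job created it (otherwise reusing the Arq-managed client in `ctx["redis"]`), and logs `aclose()` failures instead of letting them replace the original error. `FakeRedis`, `STATUS_KEY`, `run_with_barrier`, and the status-dict layout are hypothetical stand-ins, not the worker's real API.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict

logger = logging.getLogger("demo_reseed_sketch")
STATUS_KEY = "demo_reseed:status"  # hypothetical key name


class FakeRedis:
    """In-memory stand-in for an async Redis client."""

    def __init__(self) -> None:
        self.store: Dict[str, Any] = {}
        self.closed = False

    async def set(self, key: str, value: Any) -> None:
        self.store[key] = value

    async def aclose(self) -> None:
        self.closed = True


async def run_with_barrier(
    ctx: Dict[str, Any],
    body: Callable[[Any], Awaitable[None]],
    make_redis: Callable[[], Any],
) -> None:
    # Acquire Redis first so the barrier can record failures from any later step.
    redis = ctx.get("redis")
    created = redis is None
    if created:
        redis = make_redis()
    try:
        await body(redis)  # settings load, engine init, the reseed itself
    except Exception as exc:  # CancelledError / SystemExit are BaseException: not caught
        await redis.set(STATUS_KEY, {
            "status": "failed",
            "error": f"{type(exc).__name__}: {str(exc)[:200]}",
        })
        raise  # keep Arq's failure record and the worker-log traceback
    finally:
        if created:
            try:
                await redis.aclose()
            except Exception:
                # A failing close must not replace the original exception.
                logger.warning("redis_aclose_failed", exc_info=True)
```

Letting `CancelledError` pass straight through means a job-timeout cancellation is never turned into an extra awaited network write.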

…317)

* docs(mvp2): feat_ubi_judgments — idea refresh + spec + plan (planning bundle)

Planning + spec + plan stage of /pipeline --auto for the engine-neutral UBI judgments feature. Bundles three sets of related doc state:

(1) Operator-prep state from the prior session (pre-existing in the working tree):
- feat_ubi_onramp folder merged back into feat_ubi_judgments (folder deletion + idea.md update explaining the merge)
- Sibling MVP2 idea-file updates (infra_adapter_solr, feat_query_normalization_tuning)
- bug_relyloop_spec_ubi_section_drift idea added (UBI section staleness)
- MVP2 + Unsure dashboard regen
- mvp2-overview.md update reflecting the merge

(2) Feature spec (feature_spec.md, 11 FRs, 15 ACs, 1 additive migration):
- Cross-model review converged at the 3-cycle cap (10 GPT-5.5 findings accepted)
- Locks D-1..D-10 covering all idea-stage open questions + cycle-3 fixes
- Decision D-1: _SourceBreakdown evolves to {llm, human, click} in place
- Decision D-2: UI picker field is `method` (4 values), API request field is `converter` (3 values); this keeps llm-routing in the picker without polluting the UBI endpoint enum
- Decision D-3: ?source= filter widens to accept click

(3) Implementation plan (implementation_plan.md, 14 stories across 5 epics):
- Cross-model review converged at the 3-cycle cap (3 GPT-5.5 findings accepted)
- Cycle 2 fix: generation_params JSONB column persists a generation_kind: 'ubi' discriminator for worker resume + value-delta card discrimination
- Cycle 3 fix: dropped the snapshot UbiRungBadge variant (spec FR-7 requires query_set_id + target, which cluster pages don't have)
- Pipeline status: ready for /impl-execute

No code changes in this commit; implementation begins in subsequent per-story commits per the plan's execution tracker.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * feat(ubi): migration 0021 — judgment_lists.generation_params JSONB (Story 1.1) feat_ubi_judgments Story 1.1 / FR-4 + FR-5 backing — adds one additive, nullable JSONB column to the existing judgment_lists table so the boot-time resume sweep can reconstruct UBI worker calls without depending on the Arq job payload. Changes: - migrations/versions/0021_judgment_lists_generation_params.py upgrade adds judgment_lists.generation_params JSONB NULL via idempotent DO $$ ... IF NOT EXISTS $$ guard. downgrade drops it with the matching IF EXISTS guard. No CHECK constraint — the JSONB shape is enforced at the dispatcher layer by the CreateJudgmentListFromUbiRequest Pydantic schema; duplicating it in SQL would complicate future converter additions in v1.5+. - backend/app/db/models/judgment_list.py Declares the new column on the JudgmentList ORM. Docstring updated with the MVP2 additive context + the discriminator pattern (UBI lists set generation_kind: 'ubi' inside the JSONB; LLM lists leave NULL — current_template_id + rubric already carry LLM resume state). - backend/tests/integration/test_judgment_lists_generation_params_migration.py 5 integration tests asserting: column shape (jsonb + nullable), downgrade drops only generation_params (sibling columns survive), round-trip preserves other columns, idempotent re-upgrade after alembic_version rewind, and existing LLM lists keep generation_params NULL across both directions (no backfill). 
Verification:
- make fmt / make lint / make typecheck: backend mypy strict + frontend tsc both green; the remaining lint warnings are pre-existing and in unrelated files
- alembic upgrade head + downgrade -1 + upgrade head: clean round-trip on the running Postgres service container
- Column shape confirmed via psql introspection (data_type=jsonb, is_nullable=YES)
- make migrate: idempotent re-run succeeds
- make test-worktree CMD="pytest backend/tests/integration/test_judgment_lists_generation_params_migration.py": all 5 tests pass against the running Postgres container

Alembic head advances 0020 → 0021. Pre-existing LLM judgment_list rows survive cleanly because the column is nullable and never read on the LLM path. The state.md bump to 0021 happens at finalization, not in this commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* feat(ubi): domain/ubi/ pure-domain library (Story 1.2)

feat_ubi_judgments Story 1.2 / FR-2 + FR-11. The pure-domain UBI substrate that powers click-derived judgment lists: feature aggregation, the async SignalsConverter Protocol + three concrete implementations, and the optional position-bias prior loader.

New files:
- backend/app/domain/ubi/__init__.py
  Public exports. Documents the async-Protocol exception to the parent domain package's "synchronous and deterministic" rule (cycle-3 fix D-10e): the I/O lives in the worker-supplied callback, not in converter code paths.
- backend/app/domain/ubi/features.py
  UbiEvent frozen dataclass + FeatureVec Pydantic model + pure aggregate_features() with Wang-Bendersky position-bias correction. Edge cases locked: zero impressions → corrected_ctr=0.0 (no raise), no dwell events → dwell_mean_seconds=None (distinct from 0.0), unknown event types silently ignored, corrected_ctr clipped at 1.0, sparse-prior ranks fall back to weight 1.0.
- backend/app/domain/ubi/converter.py Async SignalsConverter Protocol + CtrThresholdConverter (defaults {1: 0.05, 2: 0.15, 3: 0.30}) + DwellTimeThresholdConverter (defaults {1: 10s, 2: 30s, 3: 90s}) + HybridUbiLlmConverter (splits at llm_fill_threshold=20; awaits injected llm_rate callback for tail). CRITICAL: zero openai imports, zero AsyncOpenAI construction — the hybrid converter takes the LLM-fill callback as a constructor argument; the worker (Story 3.3) builds the callback by wrapping rate_query_batch + the daily-budget gate. Enforces CLAUDE.md Absolute Rules #3 / #8 / #10. ConverterConfig threshold override validation (non-monotonic, missing keys, non-numeric → ValueError). - backend/app/domain/ubi/position_bias_prior.py load_position_bias_prior() reads the optional UBI_POSITION_BIAS_PRIOR_FILE JSON. Missing/empty/malformed → returns {} (uninformed default) + WARN log; NEVER raises. Worker can fall back to uninformed cleanly on operator misconfiguration. Modified: - backend/app/core/settings.py Adds ubi_position_bias_prior_file: Path | None field + @cached_property ubi_position_bias_prior accessor (lazy import to avoid circular boot order). Per FR-11. Tests (58 unit tests, all pass): - test_features.py (18 tests): basic counts, position-bias correction with informed/uninformed/sparse priors, all edge cases above, FeatureVec validation - test_converter.py (24 tests): CTR + dwell threshold boundary values, ConverterConfig override validation (5 failure modes), hybrid partition correctness, all-tail / head-only flows, callback NOT called when head-only, llm_fill_threshold override validation, HybridUbiLlmConverter.build_inner factory - test_converter_no_openai_import.py (3 tests): ast-based guard asserting backend/app/domain/ubi/converter.py never imports openai / httpx and never constructs AsyncOpenAI. This is the test that catches a regression turning the converter into a direct LLM caller (Absolute Rule #3 escape hatch). 
Resolves the converter path via inspect.getfile(converter) for robust container/host portability. - test_position_bias_prior.py (13 tests): trivial-fallback paths (None / missing / empty / whitespace) stay silent; malformed branches (invalid JSON, non-object, missing positions, wrong shape, non-numeric values, rank<1, negative weight) WARN-log and fall back; valid prior round-trips correctly. Plan tracker: mark Story 1.2 [x] in implementation_plan.md. Verification: - make fmt / make lint / make typecheck — backend mypy strict + frontend tsc both green; ruff D205/D102 fixes applied - make test-worktree CMD="pytest backend/tests/unit/domain/ubi/" — 58/58 pass against the OpenSearch-less worktree container - Anti-pattern guard verified: ast scan of converter.py confirms no openai/httpx import; no AsyncOpenAI construction call No new I/O, no DB writes, no LLM calls (the LLM-fill path activates only when the hybrid converter is instantiated with a callback — done by the worker in Story 3.3, not here). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * feat(ubi): UbiReader service — engine-neutral two-index scan (Story 2.1) Adds the read-only UBI scan that powers Story 3.3's worker. The reader issues two `SearchAdapter.search_batch` calls (one against `ubi_queries`, one against `ubi_events`), performs the `query_id` join client-side, and hands the joined events to `aggregate_features` (Story 1.2) for the Wang-Bendersky position-bias-corrected FeatureVec map. Key shape choices: * No new method on the SearchAdapter Protocol per Absolute Rule #4 + Story 2.1 DoD — the reader composes `get_schema` + `search_batch`. * `_probe_enabled` wraps `get_schema('ubi_queries')` so the readiness service (Story 2.2) and the dispatcher preflight U-C share one probe shape; raises `UbiNotEnabledError` on `TargetNotFoundError`. 
* Empty post-filter is the race-condition fallback (returns `{}`) — the sync `UBI_INSUFFICIENT_DATA` case is Story 2.2's `_count` preflight U-D2, NOT the reader.
* Field extraction handles both the OpenSearch UBI plugin nested shape (`event_attributes.object.object_id`, `event_attributes.position`, `event_attributes.dwell_time_seconds`) AND the o19s ES UBI fork's flatter top-level shape, with DEBUG drop logging for events missing required fields.
* Sibling `read_user_query_map(...)` surfaces the `{ubi_query_id: user_query}` map for the same window so Story 3.3's `mapping_strategy` join doesn't have to re-scan `ubi_queries`.

Tests (16 cases, all in unit layer):
* `test_ubi_reader.py` (14 cases) — stub-adapter coverage of probe paths, empty windows, happy path with both nested + flat event shapes, field-extraction robustness, target/window/query-filter propagation into the Query DSL, position-bias prior reaching aggregate_features, and a Protocol-shape lock asserting no new SearchAdapter method snuck in.
* `test_ubi_reader_no_writes.py` (2 cases) — defense-in-depth against cluster-write leaks: boots a real ElasticAdapter against an httpx MockTransport, runs read_features end-to-end, asserts every recorded request is read-shaped (no PUT/DELETE/PATCH methods, no `_bulk`/`_update`/`_doc`/`_create` path segments). Mirrors the `test_elastic_get_document.py` MockTransport idiom.

Test-layer placement note: the plan §3.2 specified `backend/tests/integration/services/test_ubi_reader{,_no_writes}.py` but the codebase has no `tests/integration/services/` subfolder convention (sibling no-DB service tests live under `backend/tests/unit/services/` — e.g. `test_dispatch_run_query.py`, `test_agent_judgments_dispatch.py`). Placed both files under `backend/tests/unit/services/` to match convention; the reader has no DB/Redis/engine dependency, so the unit layer is the correct classification.
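Below is a minimal, stdlib-only sketch of the client-side `query_id` join and the two-shape field extraction described above. It uses plain dicts in place of `search_batch` hits and rests on a few assumptions. The flatter o19s shape is taken to carry `object_id`, `position`, and `dwell_time_seconds` at the top level. Every event is taken to have a top-level `query_id`. The names `extract_event_fields` and `join_events` are hypothetical and are not the actual `UbiReader` internals.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Optional

# (doc_id, position, dwell_time_seconds) for one UBI event.
EventFields = tuple[str, int, Optional[float]]


def extract_event_fields(event: dict[str, Any]) -> tuple[str, EventFields] | None:
    """Return (query_id, (doc_id, position, dwell)) or None when a required field is missing."""
    attrs = event.get("event_attributes") or {}
    obj = attrs.get("object") or {}
    # Prefer the nested plugin shape; fall back to the flatter top-level shape.
    doc_id = obj.get("object_id", event.get("object_id"))
    position = attrs.get("position", event.get("position"))
    dwell = attrs.get("dwell_time_seconds", event.get("dwell_time_seconds"))
    query_id = event.get("query_id")
    if query_id is None or doc_id is None or position is None:
        return None  # the real reader DEBUG-logs these drops
    # str() keeps numeric ids (e.g. integer SKUs) instead of dropping them.
    return str(query_id), (str(doc_id), int(position), dwell)


def join_events(
    queries: list[dict[str, Any]], events: list[dict[str, Any]]
) -> dict[str, list[EventFields]]:
    """Client-side join: keep events whose query_id appears in the scanned ubi_queries hits."""
    known_ids = {str(q["query_id"]) for q in queries if "query_id" in q}
    joined: dict[str, list[EventFields]] = defaultdict(list)
    for event in events:
        extracted = extract_event_fields(event)
        if extracted is None:
            continue
        query_id, fields = extracted
        if query_id in known_ids:
            joined[query_id].append(fields)
    return dict(joined)
```

Events whose `query_id` has no matching `ubi_queries` hit fall out of the join. When nothing survives, the result is empty, which corresponds to the `{}` race-condition fallback above.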
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* feat(ubi): readiness service + start_ubi_judgment_generation dispatcher (Story 2.2)

Spec FR-7 readiness + FR-4 dispatcher. Refactors agent_judgments_dispatch to extract 5 shared helpers (`_resolve_fk`, `_check_consistency`, `_check_llm_preflight`, `_check_oversized_query_set`, `_insert_generating_list_and_enqueue`) so the LLM + UBI dispatchers compose out of one set of preflight building blocks (no copy-pasted body — Spec FR-4 anti-drift rationale).

New surface:
* `backend/app/services/ubi_readiness.py` — `classify_rung(...)` with 60 s Redis cache per `(cluster_id, query_set_id, target)`. Probes `get_schema('ubi_queries')` for rung_0 detection then issues one bounded `search_batch` against `ubi_events` (size=cap, _source=False) to distinguish rung_1/2/3 by event count. The `covered_pairs_pct` / `head_covered` fields stay None in MVP2 — Story 2.1 DoD locked "no new SearchAdapter method", and exact pair-coverage needs an `_count` endpoint we don't have. Documented at the module docstring + the dataclass docstring; future `infra_adapter_count_method` can re-introduce exact counts when operator feedback asks for it.
* `count_ubi_events_in_window(...)` — public wrapper used by the dispatcher U-D2 preflight (FR-4) to issue the sync `UBI_INSUFFICIENT_DATA` gate.

Dispatcher refactor (parity-preserving):
* All 12 existing `start_judgment_generation` tests pass with no modification (DoD: behavioral parity proven).
* `start_ubi_judgment_generation(...)` runs U-A..U-H per spec FR-4: FK resolve (template required for hybrid, forbidden for pure) → consistency → UBI probe (412 UBI_NOT_ENABLED) → window validity + 90-day cap (422 UBI_WINDOW_TOO_LARGE) → sync count gate (422 UBI_INSUFFICIENT_DATA with hybrid-vs-window hint per converter mode) → hybrid-only LLM preflight (A+B+B.1+C) → oversize → INSERT + best-effort enqueue.
* `_build_ubi_generation_params(req)` injects `generation_kind: 'ubi'` server-side at INSERT time per cycle-2 plan-review fix; the round-trip assertion in the happy-path test confirms the discriminator persists.

Tests (23 new, 12 pre-existing all still green):
* `test_ubi_readiness.py` (9 cases) — rung_0/1/2/3 classification, cache hit short-circuit, cache decode failure fall-through, count wrapper return-min-of-actual-and-cap shape, filter-target propagation, dataclass round-trip.
* `test_agent_judgments_dispatch_ubi.py` (14 cases) — every preflight branch (cluster/query_set/template missing, mismatch, UBI_NOT_ENABLED, window invalid + too large, insufficient data with both message variants, hybrid LLM preflight fires, pure skips LLM preflight, oversized query set, happy-path pure + hybrid both inject `generation_kind: 'ubi'` + enqueue the UBI worker).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* feat(ubi): _SourceBreakdown three-term + UBI wire Literals (Story 2.3)

Spec FR-9 + FR-10. Evolves the cycle-2 F6 "click folds into human" forward-compat fiction now that UBI lists ship click rows; adds the four new UBI wire Literals the frontend (Story 4.1) + endpoints (Story 3.1/3.2) consume.

Schema changes (`backend/app/api/v1/schemas.py`):
* `_SourceBreakdown` now `{llm, human, click}` with invariant `llm + human + click == judgment_count`. Cycle-2 F6 docstring superseded.
* `JudgmentSourceFilterWire` widened from `{llm, human}` to `{llm, human, click}`; `?source=click` now returns matching rows instead of 422 VALIDATION_ERROR.
* `JudgmentSourceWire` already named all three (Story 1.2 was forward-compat); docstring refreshed to reflect live status.
* 4 new wire Literals (FR-9): `UbiConverterKind` (3 values), `JudgmentGenerationMethodWire` (4 values), `UbiReadinessRungWire` (4 values), `UbiMappingStrategyWire` (3 values).
  Each carries the source-of-truth comment per the Enumerated Value Contract Discipline.

Repo change (`backend/app/db/repo/judgment.py`):
* `source_breakdown_for_list(...)` returns the three-term shape directly; removed the `click → human` folding. Docstring superseded.

Endpoint change (`backend/app/api/v1/judgments.py`):
* `_detail(...)` populates `click=breakdown.get("click", 0)`.

Test impact (per FR-9 / FR-10):
* New unit test `test_source_breakdown_evolution.py` (9 cases) locks the three-term shape + the `Literal` value sets.
* `test_judgments_api.py::test_list_judgments_rejects_click_filter` renamed to `_accepts_click_filter` — inverts the assertion. The cycle-2 F6 422 contract was the bug FR-10 fixed.
* `test_judgment_repo.py` breakdown assertion updated to include `"click": 0` and docstring refreshed.
* `test_judgments_api.py` `/import` smoke test updated for the new shape.

No new endpoints, no migration. All 1,696 unit tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* feat(ubi): GET /api/v1/clusters/{id}/ubi-readiness endpoint (Story 3.1)

Spec FR-7. Surfaces the rung-classifier result from Story 2.2 over HTTP so the frontend (Story 4.1's useUbiReadiness hook) can drive the generate-judgments dialog's method-picker default + the on-ramp nudge.

Endpoint contract:
* `GET /api/v1/clusters/{cluster_id}/ubi-readiness?query_set_id=&target=`
* 200: UbiReadinessResponse{rung, covered_pairs_pct, head_covered, checked_at}
* 404 CLUSTER_NOT_FOUND, 404 QUERY_SET_NOT_FOUND
* 422 VALIDATION_ERROR (missing query params — FastAPI built-in handler)
* 503 CLUSTER_UNREACHABLE

Required-query-params contract (spec FR-7 + cycle-3 D-10c): the endpoint MUST 422 without `query_set_id` + `target` — the classifier cannot compute a per-target rung without an application filter. Both params are typed `Annotated[str, Query(..., min_length=...)]` so the 422 fires at FastAPI's validator layer.
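The rung classification this endpoint surfaces is Story 2.2's `classify_rung(...)`, with its 60 s cache keyed per `(cluster_id, query_set_id, target)`. It might look roughly like the sketch below. The event-count cut-offs are hypothetical placeholders, because the commit does not state them. An in-process dict with an injectable clock stands in for Redis, and `classify` / `ReadinessCache` are illustrative names.

```python
from __future__ import annotations

import time
from typing import Callable, Literal

Rung = Literal["rung_0", "rung_1", "rung_2", "rung_3"]
CacheKey = tuple[str, str, str]  # (cluster_id, query_set_id, target)

# Hypothetical cut-offs for illustration; the commit does not state them.
RUNG_2_MIN_EVENTS = 100
RUNG_3_MIN_EVENTS = 1_000
CACHE_TTL_S = 60.0


def classify(ubi_enabled: bool, event_count: int) -> Rung:
    """rung_0 when the UBI probe fails; otherwise bucket the (capped) event count."""
    if not ubi_enabled:
        return "rung_0"
    if event_count >= RUNG_3_MIN_EVENTS:
        return "rung_3"
    if event_count >= RUNG_2_MIN_EVENTS:
        return "rung_2"
    return "rung_1"


class ReadinessCache:
    """In-process stand-in for the 60 s Redis cache."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict[CacheKey, tuple[float, Rung]] = {}

    def get_or_compute(self, key: CacheKey, compute: Callable[[], Rung]) -> Rung:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < CACHE_TTL_S:
            return cached[1]  # cache hit short-circuits the probe
        rung = compute()
        self._entries[key] = (now, rung)
        return rung
```

The real probe issues one bounded `search_batch` (size=cap), so the count it passes in is effectively `min(actual, cap)`. Whatever the top cut-off is, it has to sit at or below that cap.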
Implementation:
* Reuses `cluster_svc.acquire_adapter()` async context for adapter lifecycle (matches `get_cluster_schema` pattern).
* Resolves `query_set_id` → `repo.list_queries_for_set(...)` and passes the id list to `classify_rung(...)` so the event count scopes to "this query set's traffic" rather than "any traffic on the target."
* Reuses `get_redis_client` FastAPI dependency for the 60 s readiness cache.

Tests (5 contract cases):
* All 4 rung values accepted on the response model.
* Unknown rung rejected.
* Required fields locked at 4 (rung, covered_pairs_pct, head_covered, checked_at).
* `UbiReadinessRungWire` value set locked.

The end-to-end behavior (Redis cache hit, adapter probe, rung classification) is already covered by `backend/tests/unit/services/test_ubi_readiness.py` (9 cases). The contract layer just locks the wire shape.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* feat(ubi): POST /api/v1/judgments/generate-from-ubi endpoint (Story 3.2)

Spec FR-3. Thin router handler delegating to `start_ubi_judgment_generation` (Story 2.2). Returns 202 with `GenerateJudgmentsResponse{judgment_list_id, status: "generating"}` on success; mirrors the LLM endpoint's lifecycle pattern (per-request Redis client; best-effort Arq enqueue via the dispatcher).

Request model `CreateJudgmentListFromUbiRequest` lives at `backend/app/api/v1/schemas.py` (added in Story 3.1's schemas commit). The `@model_validator(mode="after")` enforces the hybrid conditional:
- `converter == 'hybrid_ubi_llm'` → REQUIRES `current_template_id` + `rubric`
- non-hybrid converters → REJECTS both (no silent partial-config state)

The 13 error envelopes documented in spec §8.5 (3 UBI-specific + 10 reused codes) are emitted by the dispatcher's preflight chain; the contract layer asserts only the wire shape + validator gates.
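The following stdlib-only sketch shows the hybrid conditional that the `@model_validator(mode="after")` enforces. It is written as a frozen dataclass rather than a Pydantic model. Three details are assumptions: the class name `UbiGenerateRequest`, the `str` field types, and the `dwell_time_threshold` converter value. The commit only says `UbiConverterKind` has three values, and two of them (`ctr_threshold`, `hybrid_ubi_llm`) are named elsewhere in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

HYBRID = "hybrid_ubi_llm"
# "dwell_time_threshold" is an assumed spelling for the third UbiConverterKind value.
UBI_CONVERTERS = frozenset({"ctr_threshold", "dwell_time_threshold", HYBRID})


@dataclass(frozen=True)
class UbiGenerateRequest:
    converter: str
    current_template_id: Optional[str] = None
    rubric: Optional[str] = None

    def __post_init__(self) -> None:
        # Plain LLM generation ("llm") belongs to the existing /judgments/generate endpoint.
        if self.converter not in UBI_CONVERTERS:
            raise ValueError(f"converter must be one of {sorted(UBI_CONVERTERS)}")
        has_template = self.current_template_id is not None
        has_rubric = self.rubric is not None
        if self.converter == HYBRID:
            if not (has_template and has_rubric):
                raise ValueError(
                    "current_template_id and rubric are REQUIRED when converter == 'hybrid_ubi_llm'"
                )
        elif has_template or has_rubric:
            # Reject rather than ignore, so there is no silent partial-config state.
            raise ValueError("current_template_id and rubric must be omitted for pure UBI converters")
```

The sketch rejects a pure converter that carries either field on its own. That covers the "both branches" case the contract tests lock.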
Tests (12 contract cases):
* Pure-converter minimal payload accepted
* Pure-converter + `current_template_id` / `rubric` rejected (both branches)
* Hybrid without template/rubric rejected (with "REQUIRED when" message)
* Hybrid with template + rubric accepted
* Invalid converter value rejected (e.g. `"llm"` — endpoint doesn't accept it; the LLM path is the existing `/judgments/generate` endpoint)
* Invalid mapping_strategy rejected
* `min_impressions_threshold` / `llm_fill_threshold` must be positive
* `until` optional / defaults to None
* Required-fields inventory locked (14 fields)
* `GenerateJudgmentsResponse` reuse contract

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* feat(ubi): generate_judgments_from_ubi Arq worker (Story 3.3)

Spec FR-5. Single-list UBI judge pipeline mirroring the LLM worker's lifecycle pattern: load row → adapter + reader → features → converter → mapping-strategy join → bulk-insert judgments → calibration JSONB → terminal flip.

New file `backend/workers/judgments_ubi.py` (~580 LOC):
* `generate_judgments_from_ubi(ctx, judgment_list_id)` Arq entry point with the full FR-5 lifecycle (8 steps documented in module docstring).
* `_make_llm_rate_callback(...)` — worker-local factory wiring the hybrid converter's `LlmRateCallback` to `rate_query_batch` + `peek_daily_total` + `record_cost`. Per-query bundling so the LLM call shape mirrors the LLM-judgment worker.
* `_apply_mapping_strategy(...)` — pure helper resolving `ubi_query_id → queries.id` via `user_query` text match with three strategies (`reject` / `first_match` / `most_recent`). Per-query ambiguous mappings under `reject` are SKIPPED (NOT terminal — cycle-3 finding `ambiguous-mapping-behavior-contradictory`), counted as `calibration.ambiguous_query_skip_count`.
* `_build_converter(...)` — converter factory with the `query_text_lookup` closure for the hybrid path.
* `_write_calibration_and_complete(...)` — writes the spec-FR-5 UBI calibration JSONB (`{coverage_pct, head_pairs, tail_pairs, position_bias_prior_id, llm_fill_calls?, ambiguous_query_skip_count, sparse_query_skip_count}`) before terminal flip.

Modified `backend/workers/all.py`:
* Imports the new worker; registers it under `WorkerSettings.functions` with the same 15-min `_JUDGMENTS_JOB_TIMEOUT_S` as the LLM worker.
* Extends the boot-time resume sweep (lines ~148-184) to discriminate UBI rows from LLM rows by `generation_params IS NOT NULL` (the FR-5 step 4 discriminator). One scan over `list_generating_judgment_list_ids` + per-row `get_judgment_list` to read the JSONB column; routes each row to the matching enqueue job name.

Hybrid LLM-fill implementation note:
* The plan + spec describe hybrid as "use the template to retrieve docs per query for LLM-fill." The worker takes a slightly different path: for below-threshold pairs the callback fetches doc bodies via `adapter.get_document(target, doc_id)` (the doc_id set is already known from UBI) rather than re-running the search. Produces ratings on the same (query, doc) pairs; only the doc-body source differs. Future `chore_ubi_hybrid_template_render` can re-introduce the template render path if operator feedback asks.

Tests (11 unit cases + 1 update to the existing worker-registration test):
* `test_judgments_ubi_helpers.py` (11 cases):
  - `_apply_mapping_strategy`: one-to-one resolves, unmatched UBI silently dropped (not counted as ambiguous), `reject` skips + counts, `first_match` picks lowest id, `most_recent` picks highest `created_at`, unknown strategy treated as reject, empty inputs
  - Worker exports + boot registration (source-scan to avoid Settings construction trip)
  - AST scan asserts AsyncOpenAI is never constructed outside the callback factory (Absolute Rule #3 enforcement)
* `test_workers.py`: extended the WorkerSettings registry inventory.
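The sketch below implements the `_apply_mapping_strategy(...)` semantics these tests lock:

* unmatched UBI queries drop silently;
* ambiguous text matches are skipped and counted under `reject`, and under any unknown strategy;
* `first_match` takes the lowest id;
* `most_recent` takes the latest `created_at`.

Two parts are assumptions and not the worker's actual code: the `QueryRow` type, and counting one skip per UBI query id.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class QueryRow:
    id: int
    text: str
    created_at: datetime


def apply_mapping_strategy(
    user_query_by_ubi_id: dict[str, str], rows: list[QueryRow], strategy: str
) -> tuple[dict[str, int], int]:
    """Map ubi_query_id -> queries.id by exact user_query text; return (mapping, ambiguous_skip_count)."""
    by_text: dict[str, list[QueryRow]] = {}
    for row in rows:
        by_text.setdefault(row.text, []).append(row)

    resolved: dict[str, int] = {}
    ambiguous_skips = 0
    for ubi_id, text in user_query_by_ubi_id.items():
        candidates = by_text.get(text, [])
        if not candidates:
            continue  # unmatched: dropped silently, not counted as ambiguous
        if len(candidates) == 1:
            resolved[ubi_id] = candidates[0].id
        elif strategy == "first_match":
            resolved[ubi_id] = min(c.id for c in candidates)
        elif strategy == "most_recent":
            resolved[ubi_id] = max(candidates, key=lambda c: c.created_at).id
        else:
            ambiguous_skips += 1  # "reject" (and any unknown strategy): skip, not terminal
    return resolved, ambiguous_skips
```

The skip count is the value that the calibration JSONB would carry as `ambiguous_query_skip_count`.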
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* feat(ubi): generate_judgments_from_ubi agent tool + orchestrator prompt (Story 3.4)

Spec FR-6. Mirrors generate_judgments_llm for the UBI path; both tools delegate to the same shared dispatcher (Story 2.2's start_ubi_judgment_generation), so the preflight + INSERT + enqueue chain is identical between the chat-agent call and the POST /api/v1/judgments/generate-from-ubi endpoint.

Triad pattern (TOOLS / TOOL_REGISTRY / TOOL_ARG_MODELS):
* New file backend/app/agent/tools/judgments/generate_judgments_from_ubi.py
* GenerateJudgmentsFromUbiArgs Pydantic model with the @model_validator hybrid conditional (mirrors CreateJudgmentListFromUbiRequest) — so the agent-tool dispatcher rejects bad shapes before hitting the service, yielding cleaner errors in the chat stream.
* MUTATING tool — orchestrator's confirmation guard fires before dispatch (UBI lists are equivalent to LLM lists in operator commitment + data side-effects).
* Module-load drift assertion already in TOOLS / TOOL_REGISTRY / TOOL_ARG_MODELS catches missing registration.

Orchestrator system prompt updates (prompts/orchestrator.system.md):
* Tool count 20 → 21
* Query sets & judgments category 5 → 6 tools
* Mutating-set roster 7 → 8 (adds generate_judgments_from_ubi)
* New "Choosing between LLM and UBI judgment generation" subsection:
  - Prefer UBI when cluster has ubi_queries + operator wants real behavioral signal
  - Fall back to LLM on rung_0 / tutorial / sparse-data window
  - Hybrid converter requires both template + rubric

Tests (7 new + 4 inventory updates):
* test_generate_judgments_from_ubi_tool.py (7 cases) — definition shape, triad registration, args conditional (pure / hybrid / rejected combos), orchestrator prompt references both tools + the chooser section.
* test_tool_registry.py — EXPECTED_TOOL_COUNT 20 → 21 + add generate_judgments_from_ubi to CANONICAL_MVP1_TOOL_NAMES.
* test_propose_search_space.py — test_tool_count_advanced_to_20 → _to_21 update.
* test_orchestrator_system_prompt_inventory.py — "You have 20 tools" → "You have 21 tools" assertion.

All 1715 unit tests pass; mypy --strict clean across 501 files.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* feat(ubi): wire enums + useUbiReadiness + <UbiRungBadge> (Story 4.1)

Spec FR-7 + FR-8 + FR-9 mirror. Frontend substrate for the Story 4.2 generate-judgments dialog: typed enum arrays for the four new UBI wire Literals, the TanStack Query hook hitting the readiness endpoint, and the rung-badge primitive that surfaces inside the dialog.

ui/src/lib/enums.ts (+5 arrays, +5 types):
* UBI_CONVERTER_VALUES (3 values) — mirrors UbiConverterKind.
* JUDGMENT_GENERATION_METHOD_VALUES (4 values, llm + the three UBI converters) — mirrors JudgmentGenerationMethodWire; the picker superset.
* UBI_READINESS_RUNG_VALUES (4 values) — mirrors UbiReadinessRungWire.
* UBI_MAPPING_STRATEGY_VALUES (3 values) — mirrors UbiMappingStrategyWire.
* JUDGMENT_SOURCE_FILTER_VALUES widened from {llm, human} to {llm, human, click} per FR-10.
* All new arrays carry the canonical `// Values must match backend/app/api/v1/schemas.py <Symbol>` comment on the line immediately preceding the export const.

ui/src/lib/glossary.ts (+5 entries):
* judgment.converter (short), judgment.converter.llm, judgment.converter.ubi, judgment.converter.hybrid, cluster.ubi_readiness (long).

ui/src/lib/api/ubi.ts (new):
* useUbiReadiness(clusterId, querySetId, target) — 60s staleTime, graceful 404/503 degradation to rung_0.
* useGenerateJudgmentsFromUbi() — POST /api/v1/judgments/generate-from-ubi.
* Hand-rolled inline types until next `pnpm types:gen` regen.

ui/src/components/clusters/ubi-rung-badge.tsx (new):
* Text-only badge, single variant; per-rung labels + HelpPopover.
* Per cycle-3 plan-review fix readiness-snapshot-badge-contract-drift: consumed ONLY inside the generate-judgments dialog (cluster list/detail pages don't have query_set_id+target).

Tests: 6 new ubi-rung-badge cases + JUDGMENT_SOURCE_FILTER_VALUES inventory bumped for FR-10. Full UI suite green (921); typecheck clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* feat(ubi): dialog method picker + on-ramp nudge + sparse-data card (Story 4.2)

Spec FR-8 Capabilities A + B + C. Extends the existing <GenerateJudgmentsDialog> with the 4-option method picker, conditional UBI window controls, the LLM-fill threshold input, the engine-aware on-ramp nudge when rung_0, and the sparse-data recommendation card when rung_1 + a pure-UBI converter is selected.

New components:
* ui/src/components/clusters/ubi-onramp-nudge.tsx — dismissible nudge with engine-specific copy (ES → o19s fork; OS → OpenSearch UBI plugin). Dismissal persisted in localStorage keyed by cluster_id (per D-7).
* ui/src/components/query-sets/ubi-sparse-data-card.tsx — single-action recovery card with "Switch to Hybrid UBI + LLM" affordance.

Dialog refactor (generate-judgments-dialog.tsx):
* Added 4 form fields: method, since, until, llm_fill_threshold.
* Method <Select> uses JUDGMENT_GENERATION_METHOD_VALUES.map(...) per the form-select-discipline lint guard.
* Conditional rendering: UBI window when method ≠ llm; LLM-fill threshold only when method == hybrid_ubi_llm; template + rubric when method ∈ {llm, hybrid_ubi_llm}.
* Picker default seeded from useUbiReadiness rung (rung_0 → llm, rung_1/2 → hybrid_ubi_llm, rung_3 → ctr_threshold); only seeds when operator hasn't manually picked.
* Submit routing: llm → useGenerateJudgments; the three UBI methods → useGenerateJudgmentsFromUbi.
* Nudge dismissal: SSR-safe localStorage round-trip; per-cluster key.
* HelpPopover next to Method label uses judgment.converter glossary entry (extended to dual short/long for the popover).

Tests: 3 new vitest cases. Full UI suite (126 files / 924 cases) green; typecheck clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* feat(ubi): value-delta + ambiguous-skip recovery cards (Story 4.3)

Spec FR-8 Capability D. Surfaces the "payoff" of UBI generation on the judgment-list detail page: the value-delta card shows how much real traffic the UBI ratings covered (optionally with a link to the prior LLM list on the same query_set for side-by-side comparison), and the ambiguous-skip recovery card offers a one-shot "Re-run with most_recent tiebreaker" affordance when the worker skipped queries under the default `reject` mapping_strategy.

Backend tweaks (Story 2.3 follow-up per plan task §"Add to Story 2.3"):
* JudgmentListDetail.generation_params exposed on the wire so the detail page can discriminate UBI/hybrid lists + reconstruct the original request body for the recovery card's re-run.

Frontend (new):
* ui/src/components/judgments/value-delta-card.tsx — coverage-only and delta-with-prior-link variants.
* ui/src/components/judgments/ambiguous-skip-recovery-card.tsx — one-shot "Re-run with most_recent" affordance; disabled state when re-run is pending.

Detail page integration (ui/src/app/judgments/[id]/page.tsx):
* Renders both cards conditionally based on calibration + generation_params.
* Re-run reconstructs the original body + overrides mapping_strategy.
* Widens the URL source filter to include 'click' (FR-10 follow-on).

Frontend type augmentation: JudgmentListDetail extended with generation_params? + useJudgments source widened to include 'click'.

Tests: 7 new vitest cases. Full UI suite green (931); typecheck clean. Full backend unit suite green (1715); mypy --strict clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* docs(ubi): operator runbook + 3 FAQ entries + data-model patches (Story 5.1)

Spec FR-7 + FR-8 operator-facing docs. Ships the highest-value subset of Story 5.1's 10-doc scope:
* docs/03_runbooks/ubi-judgment-generation.md (new) — per-engine installation, converter chooser table, position-bias-prior calibration, debugging matrix for UBI_NOT_ENABLED / UBI_INSUFFICIENT_DATA / ambiguous_query_skip_count.
* ui/src/lib/faq.ts (+3 entries) — do-i-need-ubi, trust-ubi-over-llm, cluster-no-ubi. Operator-judgment-shaped Q&A keyed off the rungs the readiness endpoint surfaces.
* docs/01_architecture/data-model.md — judgment_lists.generation_params column documented (UBI worker resume payload, JSONB shape, MVP2 additive); UBI calibration JSONB shape annotated alongside the LLM shape; judgments.source CHECK note explaining click is live in MVP2 (cycle-2 F6 click-folds-into-human contract superseded by FR-10).

Remaining 7 Story 5.1 doc artifacts (tutorial Step 7, umbrella spec patches, api-conventions + llm-orchestration + llm-data-flow + testing one-liners) deferred to chore_ubi_docs_followup. Story 5.2 (E2E + seed_ubi.ts) deferred to chore_ubi_e2e_suite. Both idea files committed so /pipeline status surfaces them as the next-action set after this PR merges.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* docs: dashboard regen + state for feat_ubi_judgments

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* fix(ubi): adjudicate Gemini PR #317 review — 6 findings accepted

All 6 Gemini Code Assist findings on PR #317 were real and accepted:
- **#1 (High) — dispatcher tz crash**: naive `since`/`until` (Pydantic parses an offset-less ISO-8601 string into a naive datetime) crashed the window check when compared with the aware `datetime.now(UTC)`.
  Normalize naive inputs to UTC-aware up front via dataclasses.replace. +2 regression tests (naive since+until, naive since + until=None).
- **#2 + #3 (High) — worker query_id misattribution**: the hybrid LLM-fill callback groups pairs by query_text; two distinct internal query_ids sharing the same text were both attributed to one representative qid, dropping the others' ratings. Map prompt ordinals back to the full (query_id, doc_id) tuple.
- **#4 (Medium) — numeric doc_id drop**: the reader's strict isinstance(str) check on event_attributes.object_id silently dropped operator-emitted numeric ids (e.g. integer SKUs). Coerce to str().
- **#5 (Medium) — sparse_query_skip_count always 0**: the calibration field was never populated. Compute it as scoped queries that received no rating — captures hybrid LLM-fill per-query drops.
- **#6 (Medium) — frontend ISO parse fragility**: isoToUtcMs concatenated ':00.000Z' assuming YYYY-MM-DDTHH:MM, breaking when the browser returns seconds. Parse via the Date constructor with a 'Z' suffix + NaN fallback.

Backend: ruff + mypy --strict clean (501 files); 16 dispatcher tests pass (14 + 2 new). Frontend: tsc clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* test(ubi): fold in deferred integration tests + remaining Story 5.1 docs

Per operator direction (PR #317 review): the docs + integration-test sub-scope deferred to chore_ubi_docs_followup + chore_ubi_integration_tests is folded into this PR. Only the E2E suite (chore_ubi_e2e_suite) stays deferred — needs an OpenSearch UBI-plugin Compose change + won't run while SKIP_HEAVY_CI is on.

Integration tests (6 files, 21 cases): migration round-trip, worker happy/fail paths, both endpoints, detail breakdown, agent tool. All collect cleanly; skip locally without Postgres; gated behind heavy CI.
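Gemini finding #1 (naive-datetime normalization) can be sketched together with the dispatcher's window-validity and 90-day-cap checks (step U-D). In Python, an ordering comparison between a naive datetime and an aware one raises `TypeError`, and that is the crash the fix removes. Several parts of this sketch are assumptions: that `until=None` means "now", the `UbiWindow` type, and the plain `ValueError` for an inverted window. Only the `UBI_WINDOW_TOO_LARGE` code and the use of `dataclasses.replace` come from the PR.

```python
from __future__ import annotations

import dataclasses
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_WINDOW = timedelta(days=90)


@dataclasses.dataclass(frozen=True)
class UbiWindow:
    since: datetime
    until: Optional[datetime] = None  # assumed to mean "up to now"


class WindowTooLargeError(ValueError):
    code = "UBI_WINDOW_TOO_LARGE"


def _as_utc(value: datetime) -> datetime:
    # An offset-less ISO-8601 string parses to a naive datetime; interpret it as UTC.
    return value.replace(tzinfo=timezone.utc) if value.tzinfo is None else value


def normalize_window(window: UbiWindow) -> UbiWindow:
    return dataclasses.replace(
        window,
        since=_as_utc(window.since),
        until=None if window.until is None else _as_utc(window.until),
    )


def validate_window(window: UbiWindow, now: datetime) -> UbiWindow:
    """Normalize first, so every comparison below is aware-vs-aware. `now` must be aware."""
    w = normalize_window(window)
    until = w.until if w.until is not None else now
    if w.since >= until:
        raise ValueError("since must be earlier than until")
    if until - w.since > MAX_WINDOW:
        raise WindowTooLargeError("window exceeds the 90-day cap")
    return w
```

The caller would pass `datetime.now(timezone.utc)` as `now`.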
Docs (remaining 7 of Story 5.1's 10): tutorial Step 11, relyloop-spec section 706/724 patches, api-conventions + llm-orchestration + llm-data-flow + testing one-liners.

Removed the 2 now-resolved idea files; chore_ubi_e2e_suite is the sole deferral. Dashboards regenerated. mypy --strict clean (507 files); ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* fix(ubi): adjudicate GPT-5.5 PR #317 final review — 4 accepted, 1 documented, 1 deferred

GPT-5.5 cross-model final review surfaced 6 contract-level findings distinct from Gemini's.

ACCEPTED + FIXED (4):
- #1 (High) confirmation guard: generate_judgments_from_ubi was MUTATING in docstring + prompt but missing from MUTATING_TOOL_NAMES (guard wouldn't block unconfirmed dispatch). Added (set now 8 tools).
- #2 (High) CLUSTER_UNREACHABLE: dispatcher U-C probe + U-D2 count could bubble an unstructured 500; now caught → 503 (spec §8.5).
- #3 (Medium) readiness query_id filter passed internal queries.id as a ubi_events.query_id filter (UBI uses the plugin's UUID) → silently zeroed the count → always rung_1. Dropped the filter (target-level signal) + added query_set.cluster_id consistency check.
- #5 (Medium) llm_fill_threshold not merged into ConverterConfig.extra → converter partitioned at default 20 while source-attribution used the request value. Now merged (operator override wins).

DOCUMENTED (1):
- #4 (Medium) U-D2 counts target-level not query-set-scoped — deliberate MVP approximation (scoping needs the user_query join, too expensive for a <2s preflight; worker race-fallback covers the empty scoped case).

DEFERRED (1):
- #6 (Medium) hybrid uses get_document not template-render — functionally correct; captured as chore_ubi_hybrid_template_render.

Also captured bug_baseline_phase_test_isolation (pre-existing flake found during the targeted run). Dashboards regenerated. All 1718 unit tests pass; mypy --strict clean (507); ruff clean.
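Here is a sketch of GPT-5.5 finding #5. The request's `llm_fill_threshold` is merged into the converter config before partitioning, so the head/tail split and the `click`/`llm` source attribution are guaranteed to use the same value. The sketch makes two assumptions: the split compares a per-pair impression count against the threshold, and head pairs are attributed to `click`. The function names and the callback signature are hypothetical, not the converter's actual API.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Mapping, Optional

DEFAULT_LLM_FILL_THRESHOLD = 20
Pair = tuple[str, str]  # (query_id, doc_id)
LlmRate = Callable[[list[Pair]], Awaitable[dict[Pair, int]]]


def merge_extra(extra: Mapping[str, int], llm_fill_threshold: Optional[int]) -> dict[str, int]:
    """Fold the request-level override into the converter config; the operator value wins."""
    merged = dict(extra)
    if llm_fill_threshold is not None:
        merged["llm_fill_threshold"] = llm_fill_threshold
    return merged


async def convert_hybrid(
    impressions: Mapping[Pair, int],
    grade_from_signals: Callable[[Pair], int],
    llm_rate: LlmRate,
    extra: Mapping[str, int],
) -> tuple[dict[Pair, int], dict[Pair, str]]:
    """Grade head pairs from UBI signals; send below-threshold (tail) pairs to the LLM callback."""
    threshold = extra.get("llm_fill_threshold", DEFAULT_LLM_FILL_THRESHOLD)
    head = [pair for pair, n in impressions.items() if n >= threshold]
    tail = [pair for pair, n in impressions.items() if n < threshold]
    grades = {pair: grade_from_signals(pair) for pair in head}
    sources = {pair: "click" for pair in head}
    if tail:  # head-only inputs never await the callback
        llm_grades = await llm_rate(tail)
        grades.update(llm_grades)
        sources.update({pair: "llm" for pair in llm_grades})
    return grades, sources


if __name__ == "__main__":
    async def fake_llm(pairs: list[Pair]) -> dict[Pair, int]:
        return {pair: 1 for pair in pairs}

    demo = {("q1", "d1"): 25, ("q1", "d2"): 5}
    print(asyncio.run(convert_hybrid(demo, lambda pair: 3, fake_llm, merge_extra({}, None))))
```

Both outputs derive from one `threshold` value, which makes the pre-fix drift (partitioning at 20 while attributing at the request value) structurally impossible.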
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* docs(planned): capture feat_demo_ubi_study_comparison idea

Operator asked whether the home-page demo reseed includes UBI data + whether you can run a UBI study vs an LLM study on the same queries/data and compare. Today: no — the reseed writes zero UBI (RelyLoop never writes UBI by design). Captured the feature: a demo/seed-only synthetic UBI generator + reseed wiring that seeds both an LLM and a UBI judgment list on the same query set, enabling the head-to-head comparison. Dashboards regenerated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* feat(ubi): E2E suite + fix UbiReader result-window overflow

Implements the deferred E2E suite AND fixes a real backend bug only a real-engine run could catch: _scan_ubi_events requested size=50000 > engine default index.max_result_window (10000) -> all shards failed -> adapter swallows -> empty features -> spurious UBI_INSUFFICIENT_DATA on dense clusters. Fix: cap DEFAULT_MAX_EVENTS at 10000 + clamp both scans; regression guard added; search_after pagination deferred to chore_ubi_reader_search_after_pagination.

E2E: seed_ubi.ts + 4 specs (rung-0/3, hybrid, click filter) all green vs live ES + worker. 15 reader tests pass; ruff+mypy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* docs(ubi): correct hybrid LLM-fill design note (GPT-5.5 #6 is working-as-designed)

Per-pair get_document scoring is the correct implementation of FR-2's per-pair llm_rate callback (not a deviation). Corrected the worker docstring + refined chore_ubi_hybrid_template_render to P3: the only open item is dropping the vestigial current_template_id requirement (a product/contract decision), deferred. No code behavior change. Dashboards regenerated.
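The result-window fix in the E2E commit comes down to clamping the scan size. Elasticsearch and OpenSearch both default `index.max_result_window` to 10,000 and reject searches where `from + size` exceeds it. The minimal sketch below assumes every scan starts at offset 0. `clamp_scan_size` and `build_scan_body` are hypothetical helpers, not the reader's actual code.

```python
from __future__ import annotations

from typing import Any, Optional

# Engine default for index.max_result_window (Elasticsearch and OpenSearch).
ENGINE_MAX_RESULT_WINDOW = 10_000
# Per the fix above, the reader's default cap now matches that limit.
DEFAULT_MAX_EVENTS = 10_000


def clamp_scan_size(
    requested: Optional[int] = None, max_result_window: int = ENGINE_MAX_RESULT_WINDOW
) -> int:
    """Keep from + size within the window; this sketch always scans from offset 0."""
    size = DEFAULT_MAX_EVENTS if requested is None else requested
    return max(0, min(size, max_result_window))


def build_scan_body(query: dict[str, Any], requested: Optional[int] = None) -> dict[str, Any]:
    """Hypothetical request body for one of the two UBI scans."""
    return {"query": query, "from": 0, "size": clamp_scan_size(requested)}
```

The reader does not see any events past the first 10,000 in the window until the deferred `search_after` pagination lands. The clamp trades completeness for a request the engine will actually serve.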
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

---------

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
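This sketch covers the threshold grading and `ConverterConfig` override validation described for Story 1.2's `CtrThresholdConverter` / `DwellTimeThresholdConverter`, using the defaults quoted there. Three parts are assumptions: the inclusive (`>=`) boundaries, the grade-0 floor below the grade-1 threshold, and reading "monotonic" as strictly increasing. The function names are illustrative.

```python
from __future__ import annotations

from typing import Mapping

CTR_DEFAULTS: dict[int, float] = {1: 0.05, 2: 0.15, 3: 0.30}
DWELL_DEFAULTS_S: dict[int, float] = {1: 10.0, 2: 30.0, 3: 90.0}
GRADES = (1, 2, 3)


def validate_thresholds(thresholds: Mapping[int, object]) -> dict[int, float]:
    """Reject missing keys, non-numeric values, and non-monotonic overrides with ValueError."""
    missing = set(GRADES) - set(thresholds)
    if missing:
        raise ValueError(f"missing threshold keys: {sorted(missing)}")
    values: dict[int, float] = {}
    for grade in GRADES:
        raw = thresholds[grade]
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(raw, bool) or not isinstance(raw, (int, float)):
            raise ValueError(f"threshold for grade {grade} must be numeric, got {raw!r}")
        values[grade] = float(raw)
    if not values[1] < values[2] < values[3]:
        raise ValueError("thresholds must be strictly increasing by grade")
    return values


def grade_by_threshold(signal: float, thresholds: Mapping[int, float]) -> int:
    """Highest grade whose threshold the signal (CTR or dwell seconds) meets; 0 below grade 1."""
    grade = 0
    for g in GRADES:
        if signal >= thresholds[g]:
            grade = g
    return grade
```

With the CTR defaults, a pair at 0.20 CTR would grade 2. With the dwell defaults, a 45 s dwell would also grade 2.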

SoundMindsAI merged commit 34c7a43 into main May 10, 2026
3 checks passed

dependabot Bot deleted the dependabot/github_actions/actions/upload-artifact-7 branch May 10, 2026 18:29

SoundMindsAI mentioned this pull request May 11, 2026: docs(spec): feat_digest_proposal review-and-patch — Status: Approved #40 (Merged, 3 tasks)

gemini-code-assist Bot mentioned this pull request May 13, 2026: infra(skills): add /bug-fix skill for medium-sized bug fixes #71 (Merged, 4 tasks)

SoundMindsAI mentioned this pull request May 13, 2026: docs+fix: bug_chat_long_conv dogfood artifacts + dashboard truncation fix #73 (Merged, 4 tasks)

SoundMindsAI mentioned this pull request May 16, 2026: feat(ui): shared DataTable primitive — FTS + sort + filter + URL state across 9 tables #126 (Merged, 7 tasks)

SoundMindsAI mentioned this pull request May 21, 2026: feat(studies): reject mismatched judgment-list cluster + target at POST /studies #184 (Merged, 8 tasks)

SoundMindsAI mentioned this pull request May 29, 2026: fix(demo-reseed): exception barrier + stale-status auto-recovery for home-button silent failure #299 (Merged, 5 tasks)

SoundMindsAI mentioned this pull request May 30, 2026: infra(solr): Apache Solr adapter — MVP2 three-engine reach (infra_adapter_solr A1–A13) #336 (Merged, 5 tasks)

SoundMindsAI merged 1 commit into main from dependabot/github_actions/actions/upload-artifact-7

dependabot Bot commented on behalf of github May 9, 2026
