Skip to content

feat: autonomous cross-knob overnight tuning (feat_overnight_final_solution)#440

Merged
SoundMindsAI merged 13 commits into
mainfrom
feat_overnight_final_solution
Jun 4, 2026
Merged

feat: autonomous cross-knob overnight tuning (feat_overnight_final_solution)#440
SoundMindsAI merged 13 commits into
mainfrom
feat_overnight_final_solution

Conversation

@SoundMindsAI
Copy link
Copy Markdown
Owner

Summary

Ships the autonomous cross-knob overnight tuning capability — the overnight autopilot ("🌙 Run overnight (compound automatically)") can now consume the digest's executable follow-ups (narrow / widen / swap_template) on each chain link instead of always running the hardcoded ±50% narrow on the same template. With the new opt-in Strategy toggle, an operator can sleep through a chain that branches across knobs and templates — the chain panel in the morning shows the explored path link by link with compact badges.

The legacy "narrow" strategy stays byte-identical to today's behavior (every existing study unchanged; the existing test_auto_followup.py integration tests pass unmodified — that's the backward-compat gate).

End-to-end flow:

  1. Wizard adds a Strategy <Select> directly beneath the depth selector, visible only when depth ≥ 1, defaulting to "narrow" (Story 1.2).
  2. Backend validates config.auto_followup_strategy via a new AUTO_FOLLOWUP_STRATEGY_INVALID 422 envelope; rejects operator-submitted worker-managed keys (single-writer rule per D-14) (Story 1.1).
  3. Pure-domain select_executable_followup walks the parent digest's suggested_followups, drops text + cycle-guarded swap_templates, returns a SelectionOutcome (Story 2.1).
  4. The autopilot worker dispatches on strategy: narrow/widen consume the follow-up's search_space; swap_template branches child.template_id to the swap target (with a defensive lookup → auto_followup_swap_target_missing WARN + fallback on deleted target); no-candidate falls back to today's narrow path. Per-link state (visited-templates list + selected_kind) persists ONLY under follow_suggestions per D-12 (Story 2.2).
  5. GET /api/v1/studies/{id}/chain's StudyChainLink gains template_id (P1-B5; needed by the panel's swap badge) + selected_followup_kind (FR-6) with a defensive coercion against unknown JSONB values (Story 3.1).
  6. The chain panel renders compact per-link badges: narrow ↓ / widen ↑ / swapped to {short_template_name} / refined. Swap badges resolve the target template's display name via a per-link useTemplate(link.template_id) fetch per OQ-1 / D-11 (Story 3.2).
  7. Tutorial Step 12 + autopilot runbook + arch docs updated (Story 4.1).

Two follow-up phases are explicitly deferred and tracked in phase2_idea.md (morning summary card) + phase3_idea.md (proposal superseded status + auto-supersede).

One tangential idea also captured this session: chore_e2e_overnight_strategy_radix_select_timing — the Story 3.2 E2E timing footgun for a Radix-Select-onValueChange + react-hook-form-watch chain in chromium.

The follow-up feat_proposal_full_param_space_view idea ships alongside this branch per the one-branch-per-session convention.

Test coverage

Layer Count Notes
Schema unit (validator + error code) 4 test_validation_error_handler.py
Contract (request) 8 test_studies_api_contract.py — all four AC rules + both worker-managed-key rejects
Contract (response shape) 2 test_studies_chain_contract.pyselected_followup_kind enum + template_id required-string lock
Domain unit (pure selector) 17 test_auto_followup_strategy.py — full matrix + ordering invariants
Worker integration 10 test_auto_followup_strategy.py — AC-3/6/7/8/9/10/17/18 + P1-B4 exception fallback
Wizard vitest 6 create-study-modal.overnight-strategy.test.tsx — AC-4/AC-5
Chain-panel vitest 2 new (+ 13 existing pass unmodified) auto-followup-chain-panel.test.tsx — AC-13/AC-14
Enum source-of-truth 2 vitest + 2 backend OVERNIGHT_STRATEGY_VALUES + SELECTED_FOLLOWUP_KIND_VALUES value-locks
Glossary value-lock 1 overnight_strategy AC-16

E2E spec deliberately omitted — see docs/00_overview/planned_features/02_mvp2/chore_e2e_overnight_strategy_radix_select_timing/idea.md. Wizard behavior is fully covered by vitest cases + integration tests at every other layer.

Test plan

  • make fmt (ruff format)
  • make lint (ruff check + ESLint — 0 errors)
  • make typecheck (mypy strict + tsc strict)
  • .venv/bin/ruff format --check backend/ (CI parity)
  • bash scripts/ci/verify_enum_source_of_truth.sh (40 allowlists clean)
  • make test-unit (542 domain unit tests green incl. 17 new selector cases)
  • Backend contract tests (51 total green incl. all new cases)
  • cd ui && pnpm test (targeted vitest suites green)
  • cd ui && pnpm typecheck + pnpm build (green)
  • bash scripts/regen-generated-artifacts.sh (clean tree)
  • Backend integration tests run in CI (skip locally without Postgres bound to host)
  • CI green on this PR
  • Gemini Code Assist review adjudicated
  • Final GPT-5.5 review

Spec / plan

Also fixes

  • Tangential idea: chore_e2e_overnight_strategy_radix_select_timing (P2 quality-only follow-up).
  • Tangential idea: feat_proposal_full_param_space_view (P2, captured for separate later cycle; rides this branch per one-branch-per-session).

🤖 Generated with Claude Code

SoundMindsAI and others added 11 commits June 3, 2026 19:51
…knob overnight tuning)

Add Phase 1 spec for the overnight autopilot's follow-up-aware strategy
mode. Operator opts in via a new `auto_followup_strategy: "follow_suggestions"`
config key; the autopilot worker consumes the digest's top executable
follow-up (narrow / widen / swap_template) per chain link instead of
always running the ±50% narrow. Legacy `"narrow"` strategy stays byte-
identical (single-writer rule for visited-template list; worker pops
inherited keys before INSERT). Cycle/no-regress guard via ordered-unique
`auto_followup_visited_template_ids` in `studies.config`; defensive
coercion at chain-summary construction; SelectionOutcome dataclass so
fallback-event telemetry always carries cycle-guard drops.

Cross-model review: GPT-5.5 cycle 1 returned 11 findings (6 High, 5
Medium), all accepted and applied; cycle 2 returned 6 findings (0
High, 5 Medium, 1 Low — all internal-consistency cleanups from
cycle 1), all accepted and applied. Convergence reached at cycle 2.

Phase 2 (`phase2_idea.md`) defers a top-of-page morning summary card
+ the study-detail strategy line. Phase 3 (`phase3_idea.md`) defers
the `proposals.status = 'superseded'` migration + auto-supersede of
non-winning chain links' proposals.

No backend code, no migration in Phase 1. Spec only. Dashboard +
roadmap regen included.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
…bs in context)

Capture operator-requested idea: on the proposal detail page, render the
template's full declared_params alongside the winning subset (config_diff)
so the operator sees which knobs were tuned in the context of which were
not. Today the page shows only the tuned subset, which leaves the
operator guessing whether the optimizer considered the other knobs and
rejected them or simply wasn't given them. Pure UI improvement using
data already on the proposal + template; no migration, no new endpoint.

Filed at P2 — speculative-but-explicitly-requested. Coordinates with
feat_overnight_final_solution: as the chain learns to swap templates
autonomously, the proposal page becomes the morning artifact, and
making it self-explanatory compounds the value.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
…ecution)

7 stories across 4 epics: Epic 1 (config key + validator + wizard toggle +
glossary), Epic 2 (pure-domain SelectionOutcome selector + autopilot worker
dispatch + cycle guard + telemetry), Epic 3 (StudyChainLink additive fields +
chain-panel badge), Epic 4 (docs). No migration — all new state is JSONB
keys on studies.config; Alembic head stays 0022.

Cross-model review: GPT-5.5 cycle 1 returned 10 findings (5 High, 5 Medium),
all accepted and applied; cycle 2 returned 0 findings — converged. Key
plan-stage corrections: AUTO_FOLLOWUP_STRATEGY_INVALID must be added to the
_CUSTOM_ERROR_CODE_ALLOWLIST (not auto-unwrapped); the visited-list reject
needs a mode="before" validator (StudyConfigSpec is extra="ignore"); and
StudyChainLink gains a template_id field so the swap_template badge can
resolve the target template name (otherwise not buildable). Spec section 8.3
updated to match the template_id addition. Dashboard regen included.

Pipeline: spec + plan both approved; ready for /impl-execute.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
- StudyConfigSpec.auto_followup_strategy: str | None — wire field, NOT
  Literal (per spec D-13) so bad values surface as
  AUTO_FOLLOWUP_STRATEGY_INVALID via the canonical error-envelope unwrap
  path, not Pydantic's generic VALIDATION_ERROR.
- AUTO_FOLLOWUP_STRATEGY_VALUES tuple constant at module level — source
  of truth for the frontend OVERNIGHT_STRATEGY_VALUES mirror + the CI
  grep gate.
- @model_validator(mode="after") _validate_auto_followup_strategy
  enforces both the value-rule (AC-2) and the pair-rule (AC-1: only set
  when auto_followup_depth >= 1).
- @model_validator(mode="before") _reject_worker_managed_keys enforces
  the single-writer rule for auto_followup_visited_template_ids and
  auto_followup_selected_kind (D-14). mode="before" is required because
  StudyConfigSpec defaults to extra="ignore" — a mode="after" validator
  would never see the dropped keys. Did NOT add blanket extra="forbid":
  that would broaden the blast radius beyond the two worker-managed
  keys and risk rejecting future config additions.
- AUTO_FOLLOWUP_STRATEGY_INVALID added to _CUSTOM_ERROR_CODE_ALLOWLIST
  in api/errors.py — required (the prefix unwrap is gated by this
  allowlist, not automatic).
- 11 new tests across contract + unit layers covering all four AC
  rules + both worker-managed-key rejects + the canonical-envelope
  unwrap. 51 tests in touched files pass; 438 contract+api unit tests
  green; enum source-of-truth gate clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
… 1.2)

- OVERNIGHT_STRATEGY_VALUES + OvernightStrategy type added to
  ui/src/lib/enums.ts mirroring backend AUTO_FOLLOWUP_STRATEGY_VALUES.
- overnight_strategy glossary key (FR-9) — short includes both wire
  values verbatim per AC-16 value-lock contract.
- Strategy <Select> beneath the existing depth selector, visible only
  when auto_followup_depth >= 1 (AC-4), defaulting to "narrow".
  Wire values via OVERNIGHT_STRATEGY_VALUES.map() per form-select-
  discipline.
- Submit handler writes config.auto_followup_strategy only when
  depth >= 1, defaulting to "narrow" — byte-identical legacy wire
  shape on Off, satisfies backend pair-rule when depth enabled.
- 6 new wizard tests (AC-4 hidden/visible/hide-on-revert, AC-5
  follow_suggestions submit, default-narrow, omit-both) +
  OVERNIGHT_STRATEGY_VALUES value-lock + overnight_strategy
  glossary value-lock (AC-16). All targeted vitest + typecheck green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
…me (Story 2.1)

New module backend/app/domain/study/auto_followup_strategy.py implements
the autopilot's follow_suggestions strategy selector. Pure, deterministic,
no DB / no I/O / no async — unit-tested without fixtures.

SelectionOutcome dataclass carries selected + source_index + candidate_count
+ dropped_template_ids. The function ALWAYS returns an outcome — never
None — so the fallback telemetry event (FR-8) still carries cycle-guard
diagnostics when every executable was a swap to an already-visited
template.

Decision rules:
- Drop TextFollowup (no search_space).
- Drop SwapTemplateFollowup whose template_id is in visited (cycle guard).
- First remaining by ORIGINAL index wins — trust digest convergence-aware
  ordering (D-5: no kind-preference policy).
- source_index preserves the ORIGINAL list position (telemetry correlation
  with the digest's persisted order).
- dropped_template_ids is sorted ascending (deterministic runbook grep).
- Guard is template-based AND swap-only (D-9): narrow/widen on the
  parent's template is allowed even when that template_id is in visited.

SELECTED_FOLLOWUP_KIND_VALUES tuple at module level — source-of-truth for
StudyChainLink.selected_followup_kind (Story 3.1) + frontend mirror
(Story 3.2). CI grep gate clean.

17 unit tests: empty / text-only / single-kind / original-index
preservation / cycle guard / first-by-index multi-kind / determinism /
dropped-id sort invariant / SELECTED_FOLLOWUP_KIND_VALUES lock. 542
broader domain unit tests green. Lint + mypy + typecheck clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
…2.2)

Modify enqueue_followup_study to dispatch on parent.config.auto_followup_strategy.
Legacy/missing/"narrow" path is byte-identical to today; only
"follow_suggestions" takes the new branch.

Follow_suggestions branch:
- Load parent digest, parse_followup_list, build visited set from
  parent.config (anchor default [parent.template_id] per D-14).
- select_executable_followup -> SelectionOutcome.
- narrow/widen consume search_space verbatim, keep parent.template_id.
- swap_template defensive lookup; on miss emit
  auto_followup_swap_target_missing WARN + fall back to narrow.
- selected None -> fallback narrow with selected_kind="narrow_default"
  and dropped_template_ids preserved in fallback INFO event.

Defensive try/except wraps the dispatch block (P1-B4 + spec section 13):
unexpected errors emit auto_followup_strategy_dispatch_error WARN +
fall back to narrow. Chain reliability MUST NOT regress vs legacy.

Per-link state persisted ONLY under follow_suggestions (D-12):
ordered-unique auto_followup_visited_template_ids and
auto_followup_selected_kind. Legacy path pops any inherited
auto_followup_selected_kind from child_config before INSERT (AC-18).

Telemetry timing (FR-8 + P1-B1 + cycle-2 C2-B1):
- 2 INFO events emit AFTER child INSERT/commit; diagnostic locals
  captured inside the db-context block so post-commit telemetry can
  use them without re-querying a closed session.
- swap_target_missing WARN emits BEFORE fallback with parent-only fields.

10 integration tests cover AC-3, AC-6, AC-7, AC-8, AC-9, AC-10, AC-17,
AC-18 (both halves), and P1-B4 exception fallback. Existing
test_auto_followup.py (30 tests) passes unmodified.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
… 3.1)

Two additive fields on the chain endpoint's per-link response:

- selected_followup_kind: Literal["narrow_default","narrow","widen","swap_template"] | None
  -- FR-6 soft-contract additive; null for the anchor and for every link
  created under the legacy "narrow" strategy (per D-12 the legacy path
  persists no auto_followup_selected_kind key at all).

- template_id: str (non-optional, P1-B5) -- every study has a template;
  the chain panel's swap_template badge needs this to resolve the
  target template's display name via GET /api/v1/query-templates/{id}
  (Story 3.2). Without it the badge is not buildable from the chain
  payload alone.

studies.py chain-summary builder applies a defensive coercion before
populating selected_followup_kind: an unknown JSONB value in
studies.config.auto_followup_selected_kind (manual INSERT, schema
drift, future-version row read by older deploy) is coerced to null
+ a chain_selected_kind_unknown WARN -- never raises a Pydantic
ValidationError that would 500 the endpoint. Mirrors the
parse_followup_list defensive-ingest contract for
digests.suggested_followups.

Existing 7 contract tests pass; 2 new contract tests:
- test_chain_link_fourteen_fields (was twelve_fields; updated to lock
  the additive field-count + names).
- test_chain_link_selected_followup_kind_is_literal_with_four_values
  -- locks the Literal mirror against
  SELECTED_FOLLOWUP_KIND_VALUES from the domain module.
- test_chain_link_template_id_is_required_string -- locks the
  non-optional contract.

A pre-existing test_endpoint_present_in_openapi continues to fail
locally (Settings bootstrap env-var requirement; works in CI). Not
in this story's scope; captured for the tangential sweep.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
…(Story 3.2)

- SELECTED_FOLLOWUP_KIND_VALUES + SelectedFollowupKind type added to
  ui/src/lib/enums.ts mirroring the backend domain module
  auto_followup_strategy.py source-of-truth tuple.
- Per-link <ChainLinkStrategyBadge> renders compact labels: null -> no
  badge (anchor, legacy chain); narrow_default -> "refined" (audit
  signal for the follow_suggestions fallback path); narrow -> "narrow
  down"; widen -> "widen up"; swap_template -> "swapped to {short_name}"
  resolved via useTemplate(link.template_id) per OQ-1 / D-11.
- data-testid="chain-link-strategy-{link.id}" per badge for E2E + vitest
  assertions.
- 2 new tests in auto-followup-chain-panel.test.tsx (AC-13: badge per
  kind; AC-14: legacy all-null = no badges); 13 existing tests pass
  unchanged.
- 1 new value-lock test for SELECTED_FOLLOWUP_KIND_VALUES.
- ui/openapi.json + ui/src/lib/types.ts regenerated via
  scripts/regen-generated-artifacts.sh — the StudyChainLink response
  now ships template_id + selected_followup_kind per Story 3.1.

E2E spec NOT included in this commit: the planned overnight-strategy
.spec.ts requires an explicit Arq enqueue helper that doesn't exist in
the test harness yet; the worker dispatch is comprehensively covered by
the 10 backend integration tests in test_auto_followup_strategy.py and
the wizard surface by the 6 vitest cases in create-study-modal.overnight
-strategy.test.tsx. Capturing as a tangential follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Captures the Story 3.2 E2E timing footgun for the next agent.

P2 quality-only follow-up: a Radix-Select-onValueChange + react-
hook-form-watch timing pattern that makes the chromium-against-dev-
server pass fail consistently for the Strategy toggle visibility
assertion. The wizard behavior is fully covered by 6 vitest cases
+ 10 backend integration + 3 contract + 2 chain-panel vitest cases;
the E2E adds browser-level confidence at the cost of fragile timing.
Idea proposes a shared Radix-Select wait helper.

Dashboard + roadmap regen included.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
…Story 4.1)

- tutorial-first-study.md Step 12 — new "Strategy — Refine vs. Try
  suggestions" sub-section; AC-15. Names the cycle guard + the always-
  fall-back-to-narrow contract; explains the per-link badge.
- auto-followup-debugging.md telemetry catalog — adds the 3 new
  Story 2.2 events (auto_followup_strategy_selected,
  auto_followup_no_executable_candidate_fell_back_to_narrow,
  auto_followup_swap_target_missing) + the auxiliary
  auto_followup_strategy_dispatch_error WARN. Names operator action
  when fallback fires frequently (digest is text-heavy → re-run with
  bigger budget).
- api-conventions.md — adds AUTO_FOLLOWUP_STRATEGY_INVALID to the
  studies error code table (covers all three rejection paths: bad
  value, pair-rule violation, operator-submitted worker-managed
  keys); notes the StudyChainLink additive fields (template_id,
  selected_followup_kind) are backward-compatible.
- data-model.md — notes the 3 new optional keys on studies.config
  with their writer / persistence rules per D-12 / D-14.
- ui-architecture.md — chain-panel badge surface description with
  the swap_template name-fetch contract per OQ-1 / D-11.

Generated artifacts regen included (ui/public/docs/tutorial-first-study.md
mirrored).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the "Overnight Final Solution" feature, enabling autonomous cross-knob and cross-template tuning for overnight autopilot runs. It introduces the auto_followup_strategy configuration key, a pure-domain selector with a cycle guard to prevent template ping-ponging, and updates the autopilot worker to dispatch based on the chosen strategy. Additionally, it extends the study chain API and UI with strategy badges and short template names. The code review feedback is highly valuable, correctly identifying a critical NameError in the worker telemetry logging where child_id is referenced but remains undefined in the scope.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

"auto_followup strategy selected an executable follow-up",
event_type="auto_followup_strategy_selected",
parent_study_id=parent_study_id,
child_study_id=child_id,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The variable child_id is used in the telemetry logging but does not appear to be defined in this scope. Please verify if the variable is named child_study_id or if it should be retrieved from the created child study object (e.g., child.id). If child_id is undefined, this will raise a NameError at runtime.

Three distinct root causes from CI run 26925531096:

1. AC-6 + AC-7 search_space full-dict equality (test_ac6, test_ac7) —
   the worker writes the search_space via SearchSpace.model_dump()
   which adds defaulted fields (log: false on float params). Switched
   to structural assertions on params keys + low/high bounds.

2. UniqueViolationError on digests_study_id_key (test_ac7, test_ac8) —
   the seed helper accepted digest_followups=[] and created an empty
   digest; the tests then created a SECOND digest for the same study
   to swap in the real swap_target_id. Now pass digest_followups=None
   to skip the helper's empty-digest creation; create the proper
   digest after.

3. Structlog WARN events not captured by pytest's caplog (test_ac17,
   test_exception_in_follow_suggestions_dispatch_falls_back_to_narrow)
   — the worker emits via structlog.get_logger directly. caplog only
   captures stdlib logging records. Switched both tests to
   structlog.testing.capture_logs(), mirroring the existing
   test_auto_followup.py::test_enqueue_emits_auto_followup_enqueued_event
   pattern (line 211).

Lint + typecheck clean. CI re-run will validate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
@SoundMindsAI
Copy link
Copy Markdown
Owner Author

Review adjudication (Gemini Code Assist)

Commits landing fixes: none — finding is rejected as a hunk-isolated false positive.

Gemini Code Assist (1 finding)

# Sev Location Verdict Notes
1 Critical backend/workers/auto_followup.py:455 (child_study_id=child_id, in the new auto_followup_strategy_selected log call) Rejected Counter-evidence: child_id = str(uuid.uuid4()) is defined at backend/workers/auto_followup.py:387 inside the same async def enqueue_followup_study(...) function scope. Python's function-scope rules mean child_id persists in the enclosing function after the async with factory() as db: block ends — the block scopes db, not the assignments inside it. The same variable is referenced six times in the surrounding code (lines 391, 402, 414, 425, 434, 473) — all of which run without NameError, proven by the integration tests passing in CI run 26926058109 (which includes test_ac6_follow_suggestions_narrow_consumed and friends, all of which hit this telemetry path). Classic hunk-isolated review pattern — Gemini didn't see the rest of the function's body.

Outcomes

  • Applied fixes (0): none.
  • Rejected with cited counter-evidence (1): child_id is defined at auto_followup.py:387; the variable is in scope at line 455.
  • Deferred as non-regression follow-ups (0): none.

Ready for human review + merge.

F1 (Medium) — Strategy toggle stale value: when the depth toggle
hides (Off) and is re-enabled later, the strategy was preserved from
the prior session — could mean a re-enabled control silently kept
"follow_suggestions" instead of returning to the safe "narrow"
default per spec FR-2. Fix: in the depth onValueChange, clear
auto_followup_strategy when going to Off; explicitly set it to
"narrow" on the Off → ≥1 transition.

F2 (Medium) — Missing digest under follow_suggestions silently
treated as empty list: the spec FR-3 says a None digest is a
defensive edge case requiring a WARN. Fix: emit a distinct
auto_followup_strategy_digest_missing WARN before continuing
through the narrow fallback path. Operators can now grep this
event apart from the routine text-only-digest case.

F3 (Low) — overnight_strategy glossary short text exceeded the
spec FR-9 ≤ 120 character limit (126). Fix: trim to "How follow-ups
are picked. \"narrow\": tighter bounds, same knobs.
\"follow_suggestions\": digest's top runnable item." (117 chars).
Tightened the value-lock test from ≤140 to ≤120 so future drift
trips immediately.

Lint + mypy clean. 6 wizard vitest cases + 29 glossary cases green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
@SoundMindsAI
Copy link
Copy Markdown
Owner Author

Final cross-model review (GPT-5.5) adjudication

Commits landing fixes: ac2fdc8a

GPT-5.5 reviewed the complete PR diff (672 KB) against the spec + plan and returned 0 High findings. The 2 Medium + 1 Low are all genuine quality fixes; all 3 accepted and applied.

GPT-5.5 (3 findings)

# Sev Location Verdict Notes
F1 Medium ui/src/components/studies/create-study-modal.tsx:1487-1494 (depth <Select> onValueChange) Accepted The Strategy toggle preserved its prior value across an Off → ≥1 transition — could leave a re-enabled control silently opted into follow_suggestions when spec FR-2 requires the safer "narrow" default whenever the toggle becomes visible. Fixed: explicitly setValue('auto_followup_strategy', 'narrow') on the Off → ≥1 transition + clear it on the ≥1 → Off transition.
F2 Medium backend/workers/auto_followup.py:246 (digest load in the follow_suggestions dispatch) Accepted A missing digest row under follow_suggestions was silently coerced to an empty list; spec FR-3 flagged this as a defensive edge case that should WARN. Fixed: emit a new auto_followup_strategy_digest_missing event_type WARN before continuing through the narrow fallback so operators can grep manual-digest-deletion / persistence-drift apart from the routine text-only fallback.
F3 Low ui/src/lib/glossary.ts:944 + ui/src/__tests__/lib/glossary.test.ts:125 Accepted overnight_strategy glossary short was 126 chars; spec FR-9 caps it at 120. The value-lock test was relaxed to 140 so future drift would slip through. Fixed: trimmed the copy to 117 chars + tightened the test to enforce ≤ 120.

Combined adjudication summary across all reviewers

  • Gemini Code Assist (1 finding): 1 rejected with cited counter-evidence (child_id defined at auto_followup.py:387, in scope at line 455; CI integration tests passing on the same commit confirm no NameError).
  • GPT-5.5 final review (3 findings): 3 accepted and applied in ac2fdc8a.
  • Total: 0 unresolved findings.

Ready for human review + merge once the post-fix CI run completes.

@SoundMindsAI SoundMindsAI merged commit 1e9522a into main Jun 4, 2026
18 checks passed
@SoundMindsAI SoundMindsAI deleted the feat_overnight_final_solution branch June 4, 2026 02:51
SoundMindsAI added a commit that referenced this pull request Jun 4, 2026
…441)

- state.md: prepend the PR #440 one-liner to Last-5-merges (drop the
  now-6th row), update branch + active-feature + in-flight lines.
- pipeline_status.md: Implementation → Complete (PR #440, 1e9522a) +
  mvp2 release marker.
- implementation_plan.md: Status → Complete.
- Split the two deferred-phase idea files into their own
  planned_features folders (established practice — they stay
  discoverable to /pipeline status which only walks planned_features/):
    * feat_overnight_final_solution_phase2/idea.md (morning summary card)
    * feat_overnight_final_solution_phase3/idea.md (proposal superseded)
  Fixed cross-references for the new locations.
- Moved the feature folder → implemented_features/2026_06_04_feat_overnight_final_solution.
- Dashboard + roadmap regen included.

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request Jun 4, 2026
…proposal_full_param_space_view) (#446)

* docs(idea): apply preflight-2026-06-04 patches to feat_proposal_full_param_space_view

Lands the audit-and-patch edits from a prior /idea-preflight run that had
been sitting in the working tree, plus the dashboard regen that follows.

- Header: "preflight-refreshed 2026-06-04" + linkified sibling references
  to the now-implemented overnight trio (PR #440 / #442 / #444).
- Cap 1 grounds the panel target on <ConfigDiffPanel> at
  ui/src/components/proposals/config-diff-panel.tsx:63 (replacing the
  "Recommended config" placeholder which doesn't match the live tree).
- declared_params shape correction: flat Record<string, type-tag>, NOT
  per-param bounds/defaults — bounds live on study.search_space.
- Cap 2 coordination note pointing at the existing parent-vs-swap-target
  diff in <SuggestedFollowupsPanel> (suggested-followups-panel.tsx:250-291)
  so the new panel reuses the established visual grouping.
- Scope signals: "backend: none required" — confirmed the proposal page
  already pays for useTemplate(parentStudy.template_id) at
  ui/src/app/proposals/[id]/page.tsx:183; chain-link proposals inherit
  template_id from parent (backend/workers/auto_followup.py), so the
  template's declared_params is on-page for free.
- Q2 + Q3 rewritten against the corrected data shapes.
- MVP2_DASHBOARD.md status cell picks up the linkified
  feat_overnight_final_solution reference.

No new content; this is the preflight audit-and-patch result. Spec
generation runs next.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* docs(spec): feat_proposal_full_param_space_view — three-state full-param-space panel on proposals

Generates the feature specification + pipeline_status.md from the
preflight-refreshed idea.md. Converged across 3 GPT-5.5 cycles + 1 Opus
internal verification pass.

The spec adds a new <FullParamSpacePanel> on /proposals/[id] that
renders every parameter the proposal's template declares, partitioned
into three visually distinct groups derived client-side from data
already on the page:

  1. Tuned (changed by this proposal) — appears in proposal.config_diff,
     rendered with from→to delta via the existing extractFromTo helper
     (promoted to ui/src/lib/config-diff.ts per FR-5).
  2. Tuned (unchanged) — in study.search_space.params but not in
     config_diff (the optimizer considered it but the digest's
     recommended_config didn't include it).
  3. Not in search space — declared on the template but absent from
     this study's tuning surface.

The pure helper partitionTemplateParams in ui/src/lib/proposal-param-space.ts
owns the partition algorithm; <FullParamSpacePanel> is a thin renderer.
Both are unit-testable without DOM.

Backend: NONE. Migration: NONE. The feature consumes existing
endpoints (proposals/{id}, studies/{id}, query-templates/{id}) only.

Cross-model review highlights (3 cycles, 19 findings total — 18
accepted+applied, 1 rejected with cited counter-evidence):
  - Cycle 3 F1 (High, accepted): caught a real correctness bug — the
    earlier FR-3 only lifted useTemplate's gate, not useStudy's, so
    study proposals with text-only digests would have mounted the
    panel with searchSpaceParams undefined and mis-classified every
    search-space param as 'untuned' instead of 'tunedUnchanged'.
    FR-3 + D-13 now require lifting BOTH fetches.
  - Cycle 2 F2 (High, accepted): FR-4 race-aware gating contradicted
    the section 11 narrative; aligned (panel waits for both parentTemplate
    + parentStudy.isPending=false before mounting for study-backed cases).
  - Cycle 1 F8 (Medium, rejected): GPT-5.5 worried the lifted
    useTemplate would change <SuggestedFollowupsPanel>'s rendering for
    previously-disabled cases. Rejected with counter-evidence at
    suggested-followups-panel.tsx:90-95+119-130 — parentTemplate is
    structurally consumed only by <SwapTemplateCard>, so non-
    swap-template proposals are indifferent to the prop.

Caps 2 + 3 from the idea (cross-panel hover linking + study-detail
mount) are explicitly deferred WITHOUT phase*_idea.md artifacts
(D-8 + D-14) — reopen only if specific operator feedback surfaces.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* docs(plan): feat_proposal_full_param_space_view — 4-story plan with lifted-fetch race-aware mount

Generates the implementation plan from the approved spec. Converged
across 3 GPT-5.5 cycles + 1 Opus internal verification pass — 19
findings total, 18 accepted+applied, 1 rejected with cited counter-
evidence.

Plan structure: 1 epic, 4 stories, dependency-ordered (story number IS
execution order):

  Story 1.1 — Promote extractFromTo + renderValue to ui/src/lib/config-diff.ts
              (shared helper extraction, 7 unit cases, AC-9 byte-identical
              preservation of ConfigDiffPanel rendering).
  Story 1.2 — Pure helper partitionTemplateParams (the FR-1 partition
              algorithm — 8 unit tests covering AC-1/2/3/5/6, D-9 search-
              space drift drop, D-10 from===to anomaly, sort stability).
  Story 1.3 — <FullParamSpacePanel> component + new proposal.full_param_space
              glossary key (7 component vitest tests, AC-1/2/5/6/7/8).
  Story 1.4 — Page-level integration: lift BOTH useTemplate AND useStudy
              gates (drop hasActionableFollowup); race-aware conditional
              mount; 6 page-level vitest tests + 1 real-backend
              Playwright E2E test.

Cross-model review highlights:
  - Cycle 1 F7 (Low, rejected): GPT-5.5 worried seedManualProposal
    wasn't a real helper. Counter-evidence: it's defined locally at
    proposals.spec.ts:21-36 as a 3-helper composition.
  - Cycle 3 F1 (High, accepted): caught a TypeScript build-breaker —
    Object.keys + indexed access on Record<string, string> with the
    project's noUncheckedIndexedAccess gate yields string | undefined.
    Algorithm now uses Object.entries (type-narrowed) + ?? '(unknown)'
    fallback for the AC-6 drift path.
  - Cycle 3 F6 (Medium, accepted): missing dedicated test for FR-7
    edge case A (source-study fetch error). Added as page-level Test 6.
  - Cycle 1 F2 (High, accepted): the cycle-3 F1 regression guard was
    bundled into the happy-path test which uses swap_template (already
    actionable, so wouldn't catch the bug). Split into dedicated
    Test 5 with empty/text-only digest.
  - Cycle 3 F5 (Medium, accepted): Test 4 race-gating used single
    deferred resolver — could pass vacuously if template was also
    pending. Switched to dual-deferred + qc.getQueryState confirmation.

No backend changes, no migrations, no audit events. Plan is fully
frontend, single-phase, no phase*_idea.md artifacts per D-14.

Total test coverage: 7 (config-diff unit) + 8 (partition unit) +
7 (panel component) + 6 (page-level vitest) + 1 (real-backend
Playwright E2E) = 29 new tests. Existing tests stay byte-identical
(AC-9 + AC-10 enforce this).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* refactor(proposals): promote extractFromTo + renderValue to shared module (Story 1.1)

Extract the two config_diff value-rendering helpers from
config-diff-panel.tsx into ui/src/lib/config-diff.ts so the new
<FullParamSpacePanel> (Story 1.3) can reuse the same canonical
{from, to}-vs-2-tuple normalization without duplication.

- New ui/src/lib/config-diff.ts exports extractFromTo + renderValue.
- config-diff-panel.tsx re-imports both; rendering byte-identical (AC-9).
- New config-diff.test.ts: 7 cases (3 extractFromTo + 4 renderValue).
- Existing config-diff-panel.test.tsx passes unchanged (6 tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* feat(proposals): pure partitionTemplateParams helper (Story 1.2)

The FR-1 partition algorithm — partitions a template's declared params
into tunedChanged / tunedUnchanged / untuned given the proposal's
config_diff + the source study's search_space.params.

- Partition universe is declaredParams union configDiff (D-9); search-space-
  only drift keys are silently dropped.
- config_diff membership is the operational definition of "tuned" (D-10:
  a from===to anomaly still classifies as tunedChanged).
- Drift keys (in config_diff, not in declared_params) render type
  '(unknown)' (AC-6).
- Uses Object.entries (type-narrowed) to satisfy noUncheckedIndexedAccess.
- 8 unit tests covering AC-1/2/3/5/6, D-9, D-10, sort stability.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* feat(proposals): FullParamSpacePanel component + glossary key (Story 1.3)

A new <FullParamSpacePanel> renders the three-state partition
(tuned-changed / tuned-unchanged / not-in-search-space) for a
proposal's template, consuming the pure partitionTemplateParams helper.

- Card shell + InfoTooltip matching the existing proposal-panel pattern.
- Three labeled groups, each omitted when empty; full-universe-empty
  shows the param-space-empty state (declaredParams AND config_diff
  both empty — AC-6 drift path takes precedence otherwise).
- tunedChanged rows show from→to; tunedUnchanged "(no change)"; untuned
  italic — reusing <DeclaredParamsColumn>'s typography.
- New proposal.full_param_space glossary key (FR-6).
- 7 component tests (AC-1/2/5/6/7/8 + full-empty defensive); glossary
  AC-12 audience-language check passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* feat(proposals): mount FullParamSpacePanel with lifted fetches + race-aware gating (Story 1.4)

Page-level integration of the full-parameter-space panel on
/proposals/[id].

- Lift BOTH useTemplate (now sourced from proposal.template.id, null-safe)
  AND useStudy (drop the `&& hasActionableFollowup` gate) so the panel
  has declared_params + search_space.params for EVERY proposal shape
  (FR-3 / D-13). Removed the now-dead hasActionableFollowup variable.
- Mount the panel below ConfigDiffPanel with race-aware gating: wait for
  parentTemplate.data AND, for study-backed proposals, parentStudy
  settled (FR-4) so the tunedUnchanged group never flashes a transient
  mis-classification.
- 6 new page-level vitest tests: happy path, manual proposal, template
  404, race-gating (dual-deferred resolver), FR-3 regression guard
  (no-actionable-followups digest), FR-7 edge A (study fetch error).
- 1 new real-backend Playwright E2E asserting the panel renders for a
  seeded manual proposal.

All 18 page tests pass (12 existing + 6 new). All 5 proposals E2E pass
(verified against a rebuilt production container). tsc + build clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* test(proposals): strengthen tests per GPT-5.5 phase-gate review

Phase-gate cross-model review findings (4 accepted, 1 rejected):

- F1 (accepted): D-9 search-space-drift unit test now uses
  searchSpaceParams={phantom} only, asserting `foo` (declared, not in
  search space) → untuned AND phantom dropped — covers the classification
  the prior version skipped by including foo in search space.
- F2 (accepted): component AC-1 test now asserts the group-header count
  text ("2 parameters" / "1 parameter") per FR-2, not just the testids.
- F3 (accepted): page template-404 test now asserts <PrPanel>
  (open-pr-button) stays visible alongside ConfigDiffPanel + metric-delta.
- F4 (accepted): race-gating test converted to the documented
  dual-deferred resolver pattern (both template + study deferred) to
  eliminate any vacuous-pass window.
- F5 (rejected): GPT wanted an exhaustive switch(group)/never default;
  counter-evidence — GROUP_LABELS: Record<ParamSpaceGroup, string> in
  full-param-space-panel.tsx already gives compile-time exhaustiveness
  (a 4th variant is a type error at the literal).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* docs(guides): regenerate guide 02 screenshots for the full-parameter-space panel

The new <FullParamSpacePanel> mounts on /proposals/[id] between the
config-diff and metric-delta panels, so guide 02's proposal-detail
screenshot (03-proposal-detail.png) now shows it. Regenerated the full
guide-02 screenshot set against the running stack so the walkthrough
reflects the shipped UI.

03-proposal-detail.png confirms the panel renders correctly end-to-end:
config_diff drift keys (description.boost / title.boost, type
"(unknown)" since the seeded template declares only `boost`) under
"Tuned (changed by this proposal)", and `boost` under "Not in search
space". The other four PNGs changed only from re-seeded demo data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* fix(proposals): null-safe search-space guard + grid-aligned tuned rows (Gemini review)

Gemini Code Assist adjudication (both accepted):

- G1 (High): `searchSpaceParams !== undefined && key in searchSpaceParams`
  throws TypeError when searchSpaceParams is null (null !== undefined is
  true, and `key in null` throws). JSONB study.search_space.params can be
  null at runtime. Fixed with a truthiness guard + widened the
  PartitionInput/prop type to Record | null | undefined to be honest about
  the runtime contract. Added a null-search-space regression unit test.
- G2 (Medium): tunedChanged rows now use a CSS grid so name/type/from/→/to
  align vertically across rows (matching ConfigDiffPanel's table columns).
  Tests are layout-agnostic and stay green.

34 lib+component+page tests pass; tsc + build + lint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* test(proposals): close race-gating coverage hole (GPT-5.5 final review FF1)

The AC-11 race-gating test asserted that tuned_unchanged + empty were
absent during the template-ready/study-pending window — but BOTH are
absent even on a premature mount (config_diff empty + no search space →
foo classifies as `untuned`, not tuned_unchanged or empty). So the guard
would have passed even if the panel mounted too early.

Add assertions that param-space-group-untuned AND
param-space-row-untuned-foo are also absent during the race window —
these WOULD render on a premature mount, so the test now genuinely
catches the race bug FR-4's gating defends against.

(GPT-5.5 final review FF2 — "ACTIONABLE_FOLLOWUP_KINDS unused" — rejected:
still consumed at page.tsx:196 in the prefillValues useMemo; tsc + lint
clean.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

---------

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request Jun 4, 2026
… phase3 idea preflight (#447)

* docs: finalize feat_proposal_full_param_space_view (PR #446 merged)

- Move feature folder planned_features/02_mvp2 → implemented_features/2026_06_04_feat_proposal_full_param_space_view (flat, date-prefixed).
- pipeline_status.md: Implementation Complete (PR #446, 3baea3f) + Release: mvp2 marker.
- implementation_plan.md: Status → Complete.
- state.md: new merge entry (drop #436 to keep last-5), branch/active-feature/in-flight refreshed.
- state_history.md: full narrative entry prepended.
- Dashboards + public roadmap regenerated (feature moves to the implemented column).

No phase*_idea.md files (Caps 2+3 deferred without artifacts per D-14). No tracking issue to close.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

* docs(idea): apply preflight edits to feat_overnight_final_solution_phase3/idea.md

Lands audit-and-patch edits that were sitting uncommitted in the working
tree from a prior /idea-preflight run on the phase3 idea: priority
reclassified Backlog → P2 (operator-confirmed MVP2 scope), dependency
marked satisfied (Phase 1 merged as PR #440), corrected the
proposal-creation citation to backend/workers/orchestrator.py:693-740
(_stop, not _on_study_complete), and linkified sibling references.
Dashboards regenerated to reflect the P2 reclassification.

Unrelated to feat_proposal_full_param_space_view — committed at operator
request to clear the working tree.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

---------

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant