From 9469ebd0431cdd99e26ca424655c98255311e41d Mon Sep 17 00:00:00 2001
From: SoundMindsAI
Date: Wed, 3 Jun 2026 22:56:56 -0400
Subject: [PATCH] docs(state): finalize feat_overnight_final_solution (PR #440 merged)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- state.md: prepend the PR #440 one-liner to Last-5-merges (drop the now-6th row), update branch + active-feature + in-flight lines.
- pipeline_status.md: Implementation → Complete (PR #440, 1e9522a0) + mvp2 release marker.
- implementation_plan.md: Status → Complete.
- Split the two deferred-phase idea files into their own planned_features folders (established practice — they stay discoverable to /pipeline status which only walks planned_features/):
  * feat_overnight_final_solution_phase2/idea.md (morning summary card)
  * feat_overnight_final_solution_phase3/idea.md (proposal superseded)
  Fixed cross-references for the new locations.
- Moved the feature folder → implemented_features/2026_06_04_feat_overnight_final_solution.
- Dashboard + roadmap regen included.
Co-Authored-By: Claude Opus 4.8 (1M context)
Signed-off-by: SoundMindsAI
---
 docs/00_overview/DASHBOARD.md | 2 +-
 docs/00_overview/MVP2_DASHBOARD.md | 88 ++++++++++---------
 docs/00_overview/dashboard.html | 2 +-
 .../feature_spec.md | 16 ++--
 .../idea.md | 0
 .../implementation_plan.md | 4 +-
 .../pipeline_status.md | 34 +++++++
 docs/00_overview/mvp2_dashboard.html | 84 ++++++++++++------
 .../pipeline_status.md | 25 ------
 .../idea.md} | 6 +-
 .../idea.md} | 6 +-
 state.md | 8 +-
 website/docs/roadmap.md | 4 +-
 13 files changed, 159 insertions(+), 120 deletions(-)
 rename docs/00_overview/{planned_features/02_mvp2/feat_overnight_final_solution => implemented_features/2026_06_04_feat_overnight_final_solution}/feature_spec.md (97%)
 rename docs/00_overview/{planned_features/02_mvp2/feat_overnight_final_solution => implemented_features/2026_06_04_feat_overnight_final_solution}/idea.md (100%)
 rename docs/00_overview/{planned_features/02_mvp2/feat_overnight_final_solution => implemented_features/2026_06_04_feat_overnight_final_solution}/implementation_plan.md (99%)
 create mode 100644 docs/00_overview/implemented_features/2026_06_04_feat_overnight_final_solution/pipeline_status.md
 delete mode 100644 docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/pipeline_status.md
 rename docs/00_overview/planned_features/02_mvp2/{feat_overnight_final_solution/phase2_idea.md => feat_overnight_final_solution_phase2/idea.md} (89%)
 rename docs/00_overview/planned_features/02_mvp2/{feat_overnight_final_solution/phase3_idea.md => feat_overnight_final_solution_phase3/idea.md} (86%)

diff --git a/docs/00_overview/DASHBOARD.md b/docs/00_overview/DASHBOARD.md
index 7f7199d9..61b41633 100644
--- a/docs/00_overview/DASHBOARD.md
+++ b/docs/00_overview/DASHBOARD.md
@@ -7,7 +7,7 @@ _Top-level index across MVP1 → GA v1+ as of **2026-06-04**. Click a release na
 | Release | Theme | Progress | Status |
 |---|---|---|---|
 | [MVP1 / v0.1](MVP1_DASHBOARD.md) | The Loop | 94 / 94 scoped done | **Complete** |
-| [MVP2 / v0.2](MVP2_DASHBOARD.md) | Three-Engine + Real Signals | 14 / 25 scoped done · 26 remaining | **In progress** |
+| [MVP2 / v0.2](MVP2_DASHBOARD.md) | Three-Engine + Real Signals | 15 / 25 scoped done · 25 remaining | **In progress** |
 | MVP3 / v0.3 | Observable | — | **Not yet scoped** |
 | GA v1 / v1.0 | Production-ready | — | **Not yet scoped** |
diff --git a/docs/00_overview/MVP2_DASHBOARD.md b/docs/00_overview/MVP2_DASHBOARD.md
index 8af39f0b..21684328 100644
--- a/docs/00_overview/MVP2_DASHBOARD.md
+++ b/docs/00_overview/MVP2_DASHBOARD.md
@@ -20,27 +20,28 @@ Plan approved; run /impl-execute to ship
 | Metric | Value |
 |---|---|
-| Filed under MVP2 | **46** folders total (done + specced not-done + idea backlog + bugs) |
-| Specced features done | **14 / 25** (56%) — of features *past the idea stage* (those with a spec); the idea backlog below is NOT in this denominator, so 100% ≠ release complete |
-| Pending work | **30** items (every not-done feat/infra/chore/bug across all priorities) |
+| Filed under MVP2 | **48** folders total (done + specced not-done + idea backlog + bugs) |
+| Specced features done | **15 / 25** (60%) — of features *past the idea stage* (those with a spec); the idea backlog below is NOT in this denominator, so 100% ≠ release complete |
+| Pending work | **31** items (every not-done feat/infra/chore/bug across all priorities) |
 | → P0 — do next | **0** unblocking / paying daily cost |
-| → P1 | **1** high-value, ready when P0 clears |
-| → P2 (default) | 25 important to file, not blocking |
-| → Backlog | 4 captured for record, not planned |
+| → P1 | **0** high-value, ready when P0 clears |
+| → P2 (default) | 26 important to file, not blocking |
+| → Backlog | 5 captured for record, not planned |
 | Open bugs | 9 |
-| Legacy "Path to MVP2" | 26 items — scoped-not-done + 
bugs + chore-ideas only (excludes feat/infra ideas) | -| Backlog ideas | 4 idea-only feat/infra (not yet scoped into MVP2) | +| Legacy "Path to MVP2" | 25 items — scoped-not-done + bugs + chore-ideas only (excludes feat/infra ideas) | +| Backlog ideas | 6 idea-only feat/infra (not yet scoped into MVP2) | | In flight | 0 feature(s) actively shipping | ## Pipeline -### Done (16) +### Done (17) | Feature | Type | One-liner | Depends on | Status | |---|---|---|---|---| | [feat_contextual_help_mvp2](implemented_features/2026_05_15_feat_contextual_help_mvp2/idea.md) | Feature | Phase 1 covered the create-study modal + study-detail surface — the steepest onboarding cliff. Two clusters of surfaces remain that a relevance engineer encounters after running their first study: | — | [PR #124](https://github.com/SoundMindsAI/relyloop/pull/124) merged 2026-05-15 | | [feat_demo_ubi_study_comparison](implemented_features/2026_05_30_feat_demo_ubi_study_comparison/feature_spec.md) | Feature | After this feature, the home-button reseed (and the | — | [PR #320](https://github.com/SoundMindsAI/relyloop/pull/320) merged 2026-05-30 | | [feat_overnight_autopilot](implemented_features/2026_05_31_feat_overnight_autopilot/feature_spec.md) | Feature | an operator can (a) discover the overnight path while creating a study because the wizard control is reframed as a labeled "🌙 Run overnight (compound automatically)" toggle with explicit copy about th | — | [PR #343](https://github.com/SoundMindsAI/relyloop/pull/343) merged 2026-05-31 | +| [feat_overnight_final_solution](implemented_features/2026_06_04_feat_overnight_final_solution/feature_spec.md) | Feature | The wizard exposes a strategy choice alongside the existing depth: keep today's predictable `narrow` loop OR opt into `follow_suggestions`, which lets each chain link consume the parent digest's top * | — | [PR #440](https://github.com/SoundMindsAI/relyloop/pull/440) merged 2026-06-04 | | 
[feat_studies_convergence_visibility](implemented_features/2026_06_02_feat_studies_convergence_visibility/feature_spec.md) | Feature | The studies list shows a completed-trial count and a convergence badge (`Converged` / `Still improving` / `Too few trials`) per study, reusing the shipped classifier. | — | [PR #421](https://github.com/SoundMindsAI/relyloop/pull/421) merged 2026-06-02 | | [feat_study_convergence_indicator](implemented_features/2026_06_01_feat_study_convergence_indicator/feature_spec.md) | Feature | Every completed study carries a plain-language **convergence verdict** — `converged` / `still_improving` / `too_few_trials` — backed by a best-metric-so-far curve. | — | [PR #352](https://github.com/SoundMindsAI/relyloop/pull/352) merged 2026-06-01 | | [feat_study_sub_warmup_guard](implemented_features/2026_05_29_feat_study_sub_warmup_guard/feature_spec.md) | Feature | A non-blocking inline warning appears under the `max_trials` input whenever the derived preset is `custom` AND `max_trials < STUDIES_TPE_WARMUP_FLOOR (= 50)`, naming Focused/Standard as one-click reme | — | [PR #316](https://github.com/SoundMindsAI/relyloop/pull/316) merged 2026-05-29 | @@ -59,49 +60,50 @@ Plan approved; run /impl-execute to ship _None._ -### Plan (13) +### Plan (12) | # | Priority | Feature | Type | One-liner | Depends on | Status | |---|---|---|---|---|---|---| -| 1 | P1 | [feat_overnight_final_solution](planned_features/02_mvp2/feat_overnight_final_solution/feature_spec.md) | Feature | The wizard exposes a strategy choice alongside the existing depth: keep today's predictable `narrow` loop OR opt into `follow_suggestions`, which lets each chain link consume the parent digest's top * | — | deferred: Phase 2, Phase 3 | -| 2 | P2 | [feat_apply_path_normalizer_declaration](planned_features/02_mvp2/feat_apply_path_normalizer_declaration/feature_spec.md) | Feature | The winning normalizer ships as a **structured, language-agnostic manifest** in the config-repo PR — not just 
prose. | — | — | -| 3 | P2 | [feat_overnight_studies_summary_card](planned_features/02_mvp2/feat_overnight_studies_summary_card/feature_spec.md) | Feature | A "ran while you were away" card surfaces at the top of `/studies` when at least one overnight chain has completed since the operator's last visit. | — | [PR #343](https://github.com/SoundMindsAI/relyloop/pull/343) | -| 4 | P2 | [feat_query_normalization_tuning](planned_features/02_mvp2/feat_query_normalization_tuning/feature_spec.md) | Feature | A template that opts in by declaring `query_normalizer` as a Categorical param gets the Optuna loop deciding empirically — on the operator's judgment set — whether lowercasing, trimming, or contractio | — | — | -| 5 | P2 | [feat_query_normalizer_typed_pipeline](planned_features/02_mvp2/feat_query_normalizer_typed_pipeline/feature_spec.md) | Feature | A new typed search-space member `NormalizerPipelineParam` lets a template declare an **ordered list of normalization steps**; the Optuna loop samples over the powerset of declared steps and proposes t | — | — | -| 6 | P2 | [feat_ubi_llm_study_comparison](planned_features/02_mvp2/feat_ubi_llm_study_comparison/feature_spec.md) | Feature | A single dedicated route `/studies/compare?a={id}&b={id}` renders the two studies side-by-side with a per-panel diff column: a sentence-level digest-narrative diff, a best-trial parameter table with s | — | [PR #320](https://github.com/SoundMindsAI/relyloop/pull/320) | -| 7 | P2 | [chore_arq_pool_aclose_deprecation](planned_features/02_mvp2/chore_arq_pool_aclose_deprecation/feature_spec.md) | Chore | Both call sites use `await arq_pool.aclose()`; no `DeprecationWarning` on shutdown; a regression guard asserts the async-correct form on both paths so a future edit cannot silently reintroduce `close( | — | — | -| 8 | P2 | [chore_cluster_detail_rung_badge](planned_features/02_mvp2/chore_cluster_detail_rung_badge/feature_spec.md) | Chore | The cluster-detail page surfaces a `` for the cluster, 
scoped by a user-selected (or auto-seeded) query set + target. | — | [PR #320](https://github.com/SoundMindsAI/relyloop/pull/320) | -| 9 | P2 | [chore_demo_seeding_integration_tests_rewrite](planned_features/02_mvp2/chore_demo_seeding_integration_tests_rewrite/feature_spec.md) | Chore | The 9 skipped cases are rewritten to the async "POST + poll-until-terminal" shape, the timeout case is re-homed to the worker layer, a new `AC-Async` case asserts the `running → complete` polling tran | — | [PR #286](https://github.com/SoundMindsAI/relyloop/pull/286) | -| 10 | P2 | [chore_studies_post_arq_spy_fixture](planned_features/02_mvp2/chore_studies_post_arq_spy_fixture/feature_spec.md) | Chore | A reusable `arq_pool_spy` integration fixture that records every `enqueue_job(name, *args)` call, letting studies-POST tests positively assert `spy.calls == []` on rejection and `spy.calls == [("start | — | — | -| 11 | P2 | [chore_ubi_reader_search_after_pagination](planned_features/02_mvp2/chore_ubi_reader_search_after_pagination/feature_spec.md) | Chore | A new engine-neutral `SearchAdapter.scan_all` cursor-scan lets `UbiReader` iterate the **entire** matching event/query stream for a window (subject to a caller ceiling), folding each page into the agg | — | [PR #413](https://github.com/SoundMindsAI/relyloop/pull/413) | -| 12 | P2 | [bug_baseline_phase_test_isolation](planned_features/02_mvp2/bug_baseline_phase_test_isolation/feature_spec.md) | Bug | The three `TestComputeBaselineWaitS` cases pass standalone — `.venv/bin/python -m pytest backend/tests/unit/workers/test_orchestrator_baseline_phase.py -p no:randomly` is all-green with no reliance on | — | — | -| 13 | P2 | [bug_judgment_header_omits_click_bucket](planned_features/02_mvp2/bug_judgment_header_omits_click_bucket/feature_spec.md) | Bug | The header renders all three buckets (`llm`, `human`, `click`) so the displayed terms sum to the displayed total count, making the doc-comment claim ("the UI's source-breakdown card now 
renders all th | — | — | +| 1 | P2 | [feat_apply_path_normalizer_declaration](planned_features/02_mvp2/feat_apply_path_normalizer_declaration/feature_spec.md) | Feature | The winning normalizer ships as a **structured, language-agnostic manifest** in the config-repo PR — not just prose. | — | — | +| 2 | P2 | [feat_overnight_studies_summary_card](planned_features/02_mvp2/feat_overnight_studies_summary_card/feature_spec.md) | Feature | A "ran while you were away" card surfaces at the top of `/studies` when at least one overnight chain has completed since the operator's last visit. | — | [PR #343](https://github.com/SoundMindsAI/relyloop/pull/343) | +| 3 | P2 | [feat_query_normalization_tuning](planned_features/02_mvp2/feat_query_normalization_tuning/feature_spec.md) | Feature | A template that opts in by declaring `query_normalizer` as a Categorical param gets the Optuna loop deciding empirically — on the operator's judgment set — whether lowercasing, trimming, or contractio | — | — | +| 4 | P2 | [feat_query_normalizer_typed_pipeline](planned_features/02_mvp2/feat_query_normalizer_typed_pipeline/feature_spec.md) | Feature | A new typed search-space member `NormalizerPipelineParam` lets a template declare an **ordered list of normalization steps**; the Optuna loop samples over the powerset of declared steps and proposes t | — | — | +| 5 | P2 | [feat_ubi_llm_study_comparison](planned_features/02_mvp2/feat_ubi_llm_study_comparison/feature_spec.md) | Feature | A single dedicated route `/studies/compare?a={id}&b={id}` renders the two studies side-by-side with a per-panel diff column: a sentence-level digest-narrative diff, a best-trial parameter table with s | — | [PR #320](https://github.com/SoundMindsAI/relyloop/pull/320) | +| 6 | P2 | [chore_arq_pool_aclose_deprecation](planned_features/02_mvp2/chore_arq_pool_aclose_deprecation/feature_spec.md) | Chore | Both call sites use `await arq_pool.aclose()`; no `DeprecationWarning` on shutdown; a regression guard asserts the 
async-correct form on both paths so a future edit cannot silently reintroduce `close( | — | — | +| 7 | P2 | [chore_cluster_detail_rung_badge](planned_features/02_mvp2/chore_cluster_detail_rung_badge/feature_spec.md) | Chore | The cluster-detail page surfaces a `` for the cluster, scoped by a user-selected (or auto-seeded) query set + target. | — | [PR #320](https://github.com/SoundMindsAI/relyloop/pull/320) | +| 8 | P2 | [chore_demo_seeding_integration_tests_rewrite](planned_features/02_mvp2/chore_demo_seeding_integration_tests_rewrite/feature_spec.md) | Chore | The 9 skipped cases are rewritten to the async "POST + poll-until-terminal" shape, the timeout case is re-homed to the worker layer, a new `AC-Async` case asserts the `running → complete` polling tran | — | [PR #286](https://github.com/SoundMindsAI/relyloop/pull/286) | +| 9 | P2 | [chore_studies_post_arq_spy_fixture](planned_features/02_mvp2/chore_studies_post_arq_spy_fixture/feature_spec.md) | Chore | A reusable `arq_pool_spy` integration fixture that records every `enqueue_job(name, *args)` call, letting studies-POST tests positively assert `spy.calls == []` on rejection and `spy.calls == [("start | — | — | +| 10 | P2 | [chore_ubi_reader_search_after_pagination](planned_features/02_mvp2/chore_ubi_reader_search_after_pagination/feature_spec.md) | Chore | A new engine-neutral `SearchAdapter.scan_all` cursor-scan lets `UbiReader` iterate the **entire** matching event/query stream for a window (subject to a caller ceiling), folding each page into the agg | — | [PR #413](https://github.com/SoundMindsAI/relyloop/pull/413) | +| 11 | P2 | [bug_baseline_phase_test_isolation](planned_features/02_mvp2/bug_baseline_phase_test_isolation/feature_spec.md) | Bug | The three `TestComputeBaselineWaitS` cases pass standalone — `.venv/bin/python -m pytest backend/tests/unit/workers/test_orchestrator_baseline_phase.py -p no:randomly` is all-green with no reliance on | — | — | +| 12 | P2 | 
[bug_judgment_header_omits_click_bucket](planned_features/02_mvp2/bug_judgment_header_omits_click_bucket/feature_spec.md) | Bug | The header renders all three buckets (`llm`, `human`, `click`) so the displayed terms sum to the displayed total count, making the doc-comment claim ("the UI's source-breakdown card now renders all th | — | — | ### Spec (0) _None._ -### Idea (17) +### Idea (19) | # | Priority | Feature | Type | One-liner | Depends on | Status | |---|---|---|---|---|---|---| -| 1 | P2 | [feat_proposal_full_param_space_view](planned_features/02_mvp2/feat_proposal_full_param_space_view/idea.md) | Feature | The proposal detail page surfaces `config_diff` — the subset of parameters the study **tuned** — and the winning values for them. Today's example proposal carries `{boost: {from: 1.0, to: 2.5}}` and r | — | Idea — user request during the same session as `feat_overnight_final_solution` | -| 2 | P2 | [infra_smoke_fork_pr_secret_skip](planned_features/02_mvp2/infra_smoke_fork_pr_secret_skip/idea.md) | Infra | `.github/workflows/pr.yml` triggers on `pull_request:` ([pr.yml:43](../.github/workflows/pr.yml)) — **not** `pull_request_target`. 
GitHub deliberately withholds repository secrets from workflows trigg | — | Idea — tangential discovery while merging PR #387 (`chore_arq_pool_aclose_deprecation`) | -| 3 | P2 | [chore_demo_reseed_partial_completion_fast_test](planned_features/02_mvp2/chore_demo_reseed_partial_completion_fast_test/idea.md) | Chore | `infra_solr_ci_readiness` made the demo reseed engine-tolerant: when an engine is unreachable, its scenario is skipped, the reseed completes with `status="complete"` and a non-empty `scenarios_skipped | — | Idea — tangential discovery during `infra_solr_ci_readiness` Story 1.2 implementation | -| 4 | P2 | [chore_e2e_overnight_strategy_radix_select_timing](planned_features/02_mvp2/chore_e2e_overnight_strategy_radix_select_timing/idea.md) | Chore | The Story 3.2 E2E spec walks the create-study wizard to Step 5, clicks the depth `` becomes visible. In chromium against `pnpm dev`, t | — | Idea — tangential follow-up captured during `feat_overnight_final_solution` Story 3.2 implementation | -| 5 | P2 | [chore_pr_yml_parallelize_backend_job](planned_features/02_mvp2/chore_pr_yml_parallelize_backend_job/idea.md) | Chore | `.github/workflows/pr.yml` has a job named `backend (lint + typecheck + tests + coverage)` that runs four sequential things in one job: ruff/lint, mypy, the full pytest matrix (unit + integration + co | — | Idea — captured during PR #426 CI watch | -| 6 | P2 | [chore_solr_post_pipeline_followups](planned_features/02_mvp2/chore_solr_post_pipeline_followups/idea.md) | Chore | The 13-story `infra_adapter_solr` execution surfaced several follow-on items that fit neither the original spec nor any sister feature folder. 
None block the MVP2 Solr release — they're operator-exper | — | Idea — tangential observations from `infra_adapter_solr` end-to-end | -| 7 | P2 | [chore_ubi_hybrid_template_render](planned_features/02_mvp2/chore_ubi_hybrid_template_render/idea.md) | Chore | Idea — contract decision deferred (NOT a worker bug) | — | Idea — contract decision deferred (NOT a worker bug) | -| 8 | P2 | [bug_e2e_teardown_chain_node_delete_500](planned_features/02_mvp2/bug_e2e_teardown_chain_node_delete_500/idea.md) | Bug | The E2E global-teardown deletes seeded rows in a fixed order (per `chore_e2e_test_rows_isolation` Story 1.2 cleanup registration). For auto-followup **chains**, the seeded nodes are `queued` studies c | — | Idea — tangential discovery during `feat_overnight_autopilot` (Story 4.2 E2E, PR forthcoming) | -| 9 | P2 | [bug_relyloop_spec_ubi_section_drift](planned_features/02_mvp2/bug_relyloop_spec_ubi_section_drift/idea.md) | Bug | [`docs/00_overview/relyloop-spec.md`](relyloop-spec.md) §"Click-derived judgments — OpenSearch UBI as the engine-neutral primary path" (line ~706) carries two staleness bugs from the 2026-05-27 releas | — | Idea — captured during `feat_ubi_judgments` preflight (2026-05-29) | -| 10 | P2 | [bug_reseed_failure_blocks_retry_arq_singleton_dedup](planned_features/02_mvp2/bug_reseed_failure_blocks_retry_arq_singleton_dedup/idea.md) | Bug | `run_demo_reseed` is enqueued with a fixed Arq job id `demo_reseed:singleton` (the singleton concurrency guard). 
When a run reaches a terminal state, Arq stores its **result** under `arq:result:demo_r | — | Idea — tangential discovery while verifying `fix(demo): add Solr (8983) to the reseed engine host-URL mapping` (branch `feat_demo_reseed_solr_and_steplog`) | -| 11 | P2 | [bug_seed_meaningful_demos_silent_bulk_errors](planned_features/02_mvp2/bug_seed_meaningful_demos_silent_bulk_errors/idea.md) | Bug | [`scripts/seed_meaningful_demos.py:917-935`](../../scripts/seed_meaningful_demos.py#L917-L935) bulk-indexes 1000 Amazon ESCI products into a dedicated index per demo scenario: | — | Idea — captured during `bug_smoke_seed_es_unavailable_shards_race` Phase 2.5 tangential sweep | -| 12 | P2 | [bug_studies_detail_vitest_intermittent_timeout](planned_features/02_mvp2/bug_studies_detail_vitest_intermittent_timeout/idea.md) | Bug | Under the full `pnpm test` run (`vitest run`, default worker pool), the Study-detail-page render test sometimes blocks past the 5 s `testTimeout` default — but the test itself is data-driven from mock | — | Idea — captured during `chore_template_library_expansion` post-impl tangential sweep | -| 13 | P2 | [bug_webhook_concurrent_merge_race_timing_sensitive](planned_features/02_mvp2/bug_webhook_concurrent_merge_race_timing_sensitive/idea.md) | Bug | Idea — surfaced during `bug_demo_clusters_unreachable_in_healthz` PR #236 CI. | — | Idea — surfaced during `bug_demo_clusters_unreachable_in_healthz` PR #236 CI. | -| 14 | Backlog | [feat_fts_rank_ordering](planned_features/02_mvp2/feat_fts_rank_ordering/idea.md) | Feature | `feat_data_table_primitive` shipped filter-only FTS — `?q=foo` matches rows where `search_vector @@ plainto_tsquery('english', 'foo')` is true but orders results by `created_at DESC, id DESC` (the def | — | Idea — deferred from `feat_data_table_primitive` (MVP1) per spec §16. 
| -| 15 | Backlog | [infra_arq_subprocess_test](planned_features/02_mvp2/infra_arq_subprocess_test/idea.md) | Infra | Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review). Still applicable as of 2026-05-14: the three in-process tests cited below still cover the resume contract correctly; | — | Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review). Still applicable as of 2026-05-14: the three in-process tests cited below still cover the resume contract correctly; a subprocess test would add a narrow Arq-version-regression guard. | -| 16 | Backlog | [chore_auto_followup_parent_advisory_lock](planned_features/02_mvp2/chore_auto_followup_parent_advisory_lock/idea.md) | Chore | The shipped `feat_auto_followup_studies` worker uses a two-layer idempotency scheme: | — | Idea — captured as a standalone file to resolve broken cross-references in `feat_auto_followup_studies` D-11 + plan F2 + `bug_auto_followup_completed_parent_stop_chain_race/idea.md`. The slug was coined 2026-05-24 in D-11 but only existed as descriptive prose across other documents until now. | -| 17 | Backlog | [bug_chat_long_conversation_truncation](planned_features/02_mvp2/bug_chat_long_conversation_truncation/idea.md) | Bug | [`backend/app/services/agent_chat.send_user_message`](../../backend/app/services/agent_chat.py) defensively caps the OpenAI history at the most recent `HISTORY_MAX_MESSAGES = 100` messages… | — | Held for MVP2 (decided 2026-05-13). Folder renamed with `_mvp2` suffix to make the deferral visible at-a-glance in `ls docs/00_overview/planned_features/`. Resume work when MVP2 starts — no technical dependency on MVP2 infra (audit_log is N/A; Langfuse is convenience only); the deferral is scope discipline + zero current impact (latent bug, no operator has hit the 100-message cap). 
| +| 1 | P2 | [feat_overnight_final_solution_phase2](planned_features/02_mvp2/feat_overnight_final_solution_phase2/idea.md) | Feature | After Phase 1 ships, an operator who picks `follow_suggestions` overnight wakes up to: | — | Idea — deferred Phase 2 from `feat_overnight_final_solution` Phase 1 spec | +| 2 | P2 | [feat_proposal_full_param_space_view](planned_features/02_mvp2/feat_proposal_full_param_space_view/idea.md) | Feature | The proposal detail page surfaces `config_diff` — the subset of parameters the study **tuned** — and the winning values for them. Today's example proposal carries `{boost: {from: 1.0, to: 2.5}}` and r | — | Idea — user request during the same session as `feat_overnight_final_solution` | +| 3 | P2 | [infra_smoke_fork_pr_secret_skip](planned_features/02_mvp2/infra_smoke_fork_pr_secret_skip/idea.md) | Infra | `.github/workflows/pr.yml` triggers on `pull_request:` ([pr.yml:43](../.github/workflows/pr.yml)) — **not** `pull_request_target`. GitHub deliberately withholds repository secrets from workflows trigg | — | Idea — tangential discovery while merging PR #387 (`chore_arq_pool_aclose_deprecation`) | +| 4 | P2 | [chore_demo_reseed_partial_completion_fast_test](planned_features/02_mvp2/chore_demo_reseed_partial_completion_fast_test/idea.md) | Chore | `infra_solr_ci_readiness` made the demo reseed engine-tolerant: when an engine is unreachable, its scenario is skipped, the reseed completes with `status="complete"` and a non-empty `scenarios_skipped | — | Idea — tangential discovery during `infra_solr_ci_readiness` Story 1.2 implementation | +| 5 | P2 | [chore_e2e_overnight_strategy_radix_select_timing](planned_features/02_mvp2/chore_e2e_overnight_strategy_radix_select_timing/idea.md) | Chore | The Story 3.2 E2E spec walks the create-study wizard to Step 5, clicks the depth `` becomes visible. 
In chromium against `pnpm dev`, t | — | Idea — tangential follow-up captured during `feat_overnight_final_solution` Story 3.2 implementation | +| 6 | P2 | [chore_pr_yml_parallelize_backend_job](planned_features/02_mvp2/chore_pr_yml_parallelize_backend_job/idea.md) | Chore | `.github/workflows/pr.yml` has a job named `backend (lint + typecheck + tests + coverage)` that runs four sequential things in one job: ruff/lint, mypy, the full pytest matrix (unit + integration + co | — | Idea — captured during PR #426 CI watch | +| 7 | P2 | [chore_solr_post_pipeline_followups](planned_features/02_mvp2/chore_solr_post_pipeline_followups/idea.md) | Chore | The 13-story `infra_adapter_solr` execution surfaced several follow-on items that fit neither the original spec nor any sister feature folder. None block the MVP2 Solr release — they're operator-exper | — | Idea — tangential observations from `infra_adapter_solr` end-to-end | +| 8 | P2 | [chore_ubi_hybrid_template_render](planned_features/02_mvp2/chore_ubi_hybrid_template_render/idea.md) | Chore | Idea — contract decision deferred (NOT a worker bug) | — | Idea — contract decision deferred (NOT a worker bug) | +| 9 | P2 | [bug_e2e_teardown_chain_node_delete_500](planned_features/02_mvp2/bug_e2e_teardown_chain_node_delete_500/idea.md) | Bug | The E2E global-teardown deletes seeded rows in a fixed order (per `chore_e2e_test_rows_isolation` Story 1.2 cleanup registration). 
For auto-followup **chains**, the seeded nodes are `queued` studies c | — | Idea — tangential discovery during `feat_overnight_autopilot` (Story 4.2 E2E, PR forthcoming) | +| 10 | P2 | [bug_relyloop_spec_ubi_section_drift](planned_features/02_mvp2/bug_relyloop_spec_ubi_section_drift/idea.md) | Bug | [`docs/00_overview/relyloop-spec.md`](relyloop-spec.md) §"Click-derived judgments — OpenSearch UBI as the engine-neutral primary path" (line ~706) carries two staleness bugs from the 2026-05-27 releas | — | Idea — captured during `feat_ubi_judgments` preflight (2026-05-29) | +| 11 | P2 | [bug_reseed_failure_blocks_retry_arq_singleton_dedup](planned_features/02_mvp2/bug_reseed_failure_blocks_retry_arq_singleton_dedup/idea.md) | Bug | `run_demo_reseed` is enqueued with a fixed Arq job id `demo_reseed:singleton` (the singleton concurrency guard). When a run reaches a terminal state, Arq stores its **result** under `arq:result:demo_r | — | Idea — tangential discovery while verifying `fix(demo): add Solr (8983) to the reseed engine host-URL mapping` (branch `feat_demo_reseed_solr_and_steplog`) | +| 12 | P2 | [bug_seed_meaningful_demos_silent_bulk_errors](planned_features/02_mvp2/bug_seed_meaningful_demos_silent_bulk_errors/idea.md) | Bug | [`scripts/seed_meaningful_demos.py:917-935`](../../scripts/seed_meaningful_demos.py#L917-L935) bulk-indexes 1000 Amazon ESCI products into a dedicated index per demo scenario: | — | Idea — captured during `bug_smoke_seed_es_unavailable_shards_race` Phase 2.5 tangential sweep | +| 13 | P2 | [bug_studies_detail_vitest_intermittent_timeout](planned_features/02_mvp2/bug_studies_detail_vitest_intermittent_timeout/idea.md) | Bug | Under the full `pnpm test` run (`vitest run`, default worker pool), the Study-detail-page render test sometimes blocks past the 5 s `testTimeout` default — but the test itself is data-driven from mock | — | Idea — captured during `chore_template_library_expansion` post-impl tangential sweep | +| 14 | P2 | 
[bug_webhook_concurrent_merge_race_timing_sensitive](planned_features/02_mvp2/bug_webhook_concurrent_merge_race_timing_sensitive/idea.md) | Bug | Idea — surfaced during `bug_demo_clusters_unreachable_in_healthz` PR #236 CI. | — | Idea — surfaced during `bug_demo_clusters_unreachable_in_healthz` PR #236 CI. | +| 15 | Backlog | [feat_fts_rank_ordering](planned_features/02_mvp2/feat_fts_rank_ordering/idea.md) | Feature | `feat_data_table_primitive` shipped filter-only FTS — `?q=foo` matches rows where `search_vector @@ plainto_tsquery('english', 'foo')` is true but orders results by `created_at DESC, id DESC` (the def | — | Idea — deferred from `feat_data_table_primitive` (MVP1) per spec §16. | +| 16 | Backlog | [feat_overnight_final_solution_phase3](planned_features/02_mvp2/feat_overnight_final_solution_phase3/idea.md) | Feature | When `follow_suggestions` runs a 4-link chain, today's proposal-creation logic ([`backend/workers/orchestrator.py`](../backend/workers/orchestrator.py) `_on_study_complete`) creates **one `pending` pr | — | Idea — deferred Phase 3 from `feat_overnight_final_solution` Phase 1 spec | +| 17 | Backlog | [infra_arq_subprocess_test](planned_features/02_mvp2/infra_arq_subprocess_test/idea.md) | Infra | Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review). Still applicable as of 2026-05-14: the three in-process tests cited below still cover the resume contract correctly; | — | Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review). Still applicable as of 2026-05-14: the three in-process tests cited below still cover the resume contract correctly; a subprocess test would add a narrow Arq-version-regression guard. 
| +| 18 | Backlog | [chore_auto_followup_parent_advisory_lock](planned_features/02_mvp2/chore_auto_followup_parent_advisory_lock/idea.md) | Chore | The shipped `feat_auto_followup_studies` worker uses a two-layer idempotency scheme: | — | Idea — captured as a standalone file to resolve broken cross-references in `feat_auto_followup_studies` D-11 + plan F2 + `bug_auto_followup_completed_parent_stop_chain_race/idea.md`. The slug was coined 2026-05-24 in D-11 but only existed as descriptive prose across other documents until now. | +| 19 | Backlog | [bug_chat_long_conversation_truncation](planned_features/02_mvp2/bug_chat_long_conversation_truncation/idea.md) | Bug | [`backend/app/services/agent_chat.send_user_message`](../../backend/app/services/agent_chat.py) defensively caps the OpenAI history at the most recent `HISTORY_MAX_MESSAGES = 100` messages… | — | Held for MVP2 (decided 2026-05-13). Folder renamed with `_mvp2` suffix to make the deferral visible at-a-glance in `ls docs/00_overview/planned_features/`. Resume work when MVP2 starts — no technical dependency on MVP2 infra (audit_log is N/A; Langfuse is convenience only); the deferral is scope discipline + zero current impact (latent bug, no operator has hit the 100-message cap). 
| ## Dependency graph @@ -126,8 +128,6 @@ graph LR class chore_ubi_reader_search_after_pagination plan; feat_apply_path_normalizer_declaration["apply path normalizer declaration"] class feat_apply_path_normalizer_declaration plan; - feat_overnight_final_solution["overnight final solution"] - class feat_overnight_final_solution plan; feat_overnight_studies_summary_card["overnight studies summary card"] class feat_overnight_studies_summary_card plan; feat_query_normalization_tuning["query normalization tuning"] @@ -164,6 +164,8 @@ graph LR class infra_solr_smoke_stability done; infra_generated_artifact_freshness_gate["generated artifact freshness gate"] class infra_generated_artifact_freshness_gate done; + feat_overnight_final_solution["overnight final solution"] + class feat_overnight_final_solution done; feat_ubi_judgments --> infra_adapter_solr ``` diff --git a/docs/00_overview/dashboard.html b/docs/00_overview/dashboard.html index e4099445..983e022a 100644 --- a/docs/00_overview/dashboard.html +++ b/docs/00_overview/dashboard.html @@ -392,7 +392,7 @@

Releases

Three-Engine + Real Signals
- 14 / 25 scoped done · 26 remaining
+ 15 / 25 scoped done · 25 remaining
In progress
diff --git a/docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/feature_spec.md b/docs/00_overview/implemented_features/2026_06_04_feat_overnight_final_solution/feature_spec.md similarity index 97% rename from docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/feature_spec.md rename to docs/00_overview/implemented_features/2026_06_04_feat_overnight_final_solution/feature_spec.md index 97a27aab..42407899 100644 --- a/docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/feature_spec.md +++ b/docs/00_overview/implemented_features/2026_06_04_feat_overnight_final_solution/feature_spec.md @@ -34,7 +34,7 @@ | Digest worker | [`backend/workers/digest.py`](../../../../backend/workers/digest.py) | Line 1289 reads `auto_followup_depth = study.config.get("auto_followup_depth")` and, if not None, enqueues `enqueue_followup_study` via Arq with deterministic `_job_id=f"enqueue_followup_study:{study_id}"`. The digest also persists `suggested_followups` as JSONB on the `digests` row before this dispatch. | | Digest model | [`backend/app/db/models/digest.py`](../../../../backend/app/db/models/digest.py) | `Digest.suggested_followups: Mapped[list[dict[str, Any]]]` — NOT NULL, JSONB, server_default `'[]'::jsonb`. 1:1 with `studies` via UNIQUE FK on `study_id`. Consumers read via `parse_followup_list()` per spec D-defensive-ingest. | | Study model | [`backend/app/db/models/study.py`](../../../../backend/app/db/models/study.py) | `studies.config: JSONB` carries `auto_followup_depth`. Self-FK `parent_study_id`. `parent_proposal_id` + `parent_proposal_followup_index` ([lines 86-97](../../../../backend/app/db/models/study.py#L86-L97)) are the lineage columns the manual "Run this followup" path uses — DB CHECK `studies_parent_proposal_pair_check` requires both-set-or-both-NULL. 
| -| Proposal model | [`backend/app/db/models/proposal.py`](../../../../backend/app/db/models/proposal.py) | `Proposal.status` CHECK constraint — `status IN ('pending', 'pr_opened', 'pr_merged', 'rejected')` ([line 42](../../../../backend/app/db/models/proposal.py#L42)). **No `superseded` value today** — adding one requires a migration (deferred to Phase 3 `phase3_idea.md`). | +| Proposal model | [`backend/app/db/models/proposal.py`](../../../../backend/app/db/models/proposal.py) | `Proposal.status` CHECK constraint — `status IN ('pending', 'pr_opened', 'pr_merged', 'rejected')` ([line 42](../../../../backend/app/db/models/proposal.py#L42)). **No `superseded` value today** — adding one requires a migration (deferred to Phase 3 `feat_overnight_final_solution_phase3/idea.md`). | | Chain endpoint | [`backend/app/api/v1/studies.py:856-867`](../../../../backend/app/api/v1/studies.py#L856-L867) + Pydantic `StudyChainLink` at [`schemas.py:867-885`](../../../../backend/app/api/v1/schemas.py#L867-L885) | `GET /api/v1/studies/{id}/chain` returns `links: list[StudyChainLink]` with the rolled-up `best_link_id` + `cumulative_lift` + `stop_reason` + `proposal_id_for_best_link`. The `StudyChainLink` shape is explicitly extensible (the convergence-indicator spec FR-7 added `convergence_verdict` as a soft-contract additive field — see [`convergence.py:77-89`](../../../../backend/app/domain/study/convergence.py#L77-L89)). | | Chain panel | [`ui/src/components/studies/auto-followup-chain-panel.tsx`](../../../../../ui/src/components/studies/auto-followup-chain-panel.tsx) | Calls `useStudyChain(studyId)` and renders the ordered link list + cumulative-lift + stop-reason + best-config CTA per feat_overnight_autopilot FR-4. Reusing this panel — no replacement. 
| | Wizard depth selector | [`ui/src/components/studies/create-study-modal.tsx:1460-1468`](../../../../../ui/src/components/studies/create-study-modal.tsx#L1460-L1468) | The `🌙 Run overnight (compound automatically)` label + `cs-auto-followup` testid + `InfoTooltip glossaryKey="overnight_autopilot"` + `Select` writing `auto_followup_depth: 0..5` into `config`. This feature ADDS a strategy toggle immediately below it. | @@ -87,8 +87,8 @@ No URL changes. The chain panel and the wizard mount at their existing positions ### Out of scope - Any change to `evaluate_chain_gate`, the budget peek, the depth decrement, the cancel cascade, or the layer-1/layer-2 idempotency contract. The strategy dispatch happens AFTER all of these. -- A `superseded` value on `proposals.status` (Phase 3 → `phase3_idea.md`). MVP2 leans on the existing `/chain` endpoint's `best_link_id` + `proposal_id_for_best_link` to give the operator a single morning artifact; marking non-winning links' proposals `superseded` is a separate UX decision + migration that's not required for the core "explore + roll up" capability. -- A standalone morning summary card on the `/studies` list (Phase 2 → `phase2_idea.md`, coordinates with the existing `feat_overnight_studies_summary_card` sibling idea). +- A `superseded` value on `proposals.status` (Phase 3 → `feat_overnight_final_solution_phase3/idea.md`). MVP2 leans on the existing `/chain` endpoint's `best_link_id` + `proposal_id_for_best_link` to give the operator a single morning artifact; marking non-winning links' proposals `superseded` is a separate UX decision + migration that's not required for the core "explore + roll up" capability. +- A standalone morning summary card on the `/studies` list (Phase 2 → `feat_overnight_final_solution_phase2/idea.md`, coordinates with the existing `feat_overnight_studies_summary_card` sibling idea). - A new follow-up kind, a change to the digest LLM prompt, or a change to the digest's structured-output schema. 
- Multi-child fan-out per parent. The shipped engine's linear-chain invariant (D-7 of `feat_overnight_autopilot`) holds — strategy selection picks ONE follow-up per link. - Operator-pickable mid-chain strategy switching. Strategy is set at study create and inherited verbatim by descendants. @@ -106,8 +106,8 @@ No URL changes. The chain panel and the wizard mount at their existing positions ### Phase boundaries - **Phase 1 (this spec, MVP2):** FR-1 through FR-9 — the strategy wire contract, the wizard toggle, the worker dispatch, the cycle guard, the chain endpoint additive field, the panel badge, telemetry, tutorial, glossary key. Ships the autonomous cross-knob/cross-template exploration capability behind an opt-in toggle. -- **Phase 2 (deferred to [`phase2_idea.md`](phase2_idea.md)):** Dedicated morning summary card surfacing the rolled-up winner + the explored path + total lift, separate from the chain panel. Coordinates with [`feat_overnight_studies_summary_card`](../feat_overnight_studies_summary_card/idea.md). Rationale for deferral: the existing `/chain` endpoint already exposes the data needed; a polished morning card is a UX add-on that should follow rather than block the capability. -- **Phase 3 (deferred to [`phase3_idea.md`](phase3_idea.md)):** Proposal `superseded` status value + state-transition logic that marks non-winning chain links' proposals `superseded` so the morning artifact is unambiguously *one* answer. Rationale for deferral: requires a migration that reopens shipped schema (CHECK constraint on `proposals.status`) and a UX decision on whether superseded proposals appear in the `/proposals` index at all. Phase 1 delivers cross-knob exploration; Phase 3 polishes the rollup. Build it when an incident or design partner asks for the cleaner index. 
+- **Phase 2 (deferred to [`feat_overnight_final_solution_phase2/idea.md`](../../planned_features/02_mvp2/feat_overnight_final_solution_phase2/idea.md)):** Dedicated morning summary card surfacing the rolled-up winner + the explored path + total lift, separate from the chain panel. Coordinates with [`feat_overnight_studies_summary_card`](../feat_overnight_studies_summary_card/idea.md). Rationale for deferral: the existing `/chain` endpoint already exposes the data needed; a polished morning card is a UX add-on that should follow rather than block the capability. +- **Phase 3 (deferred to [`feat_overnight_final_solution_phase3/idea.md`](../../planned_features/02_mvp2/feat_overnight_final_solution_phase3/idea.md)):** Proposal `superseded` status value + state-transition logic that marks non-winning chain links' proposals `superseded` so the morning artifact is unambiguously *one* answer. Rationale for deferral: requires a migration that reopens shipped schema (CHECK constraint on `proposals.status`) and a UX decision on whether superseded proposals appear in the `/proposals` index at all. Phase 1 delivers cross-knob exploration; Phase 3 polishes the rollup. Build it when an incident or design partner asks for the cleaner index. --- @@ -678,7 +678,7 @@ If the chain produces an unexpected swap_template result the operator wants to a - [ ] Coverage gate ≥ 80% holds. - [ ] Rollout gates from §16 satisfied (no schema change, no migration, no flag). - [ ] `docs/01_architecture/api-conventions.md` + `data-model.md` + `ui-architecture.md` + `tutorial-first-study.md` updated. -- [ ] Phase 2 + Phase 3 deferred-work tracking files (`phase2_idea.md`, `phase3_idea.md`) exist alongside this spec. +- [ ] Phase 2 + Phase 3 deferred-work tracking files exist as their own planned_features folders (`feat_overnight_final_solution_phase2/`, `feat_overnight_final_solution_phase3/`). - [ ] No open questions remain in §19. 
## 19) Open questions and decision log @@ -686,7 +686,7 @@ If the chain produces an unexpected swap_template result the operator wants to a ### Open questions - **OQ-1 (resolved at GPT-5.5 cycle 1, finding C1-B1)** — How does the chain-panel badge resolve the "short template name" for a `swap_template` link's display? **Resolved as D-11**: per-link `GET /api/v1/query-templates/{id}` fetch from the frontend (FR-7 updated). Rationale: at most 0–5 extra small fetches per chain, already TanStack-Query-cached client-side, keeps `/chain`'s response shape stable. -- **OQ-2 (resolved at GPT-5.5 cycle 2, finding C2-B3)** — Should the strategy toggle ALSO show as a read-only line on the study detail page? **Resolved as D-15**: deferred to Phase 2 (`phase2_idea.md`). The chain-panel badges per link (FR-7) already surface the strategy a chain link followed; an extra detail-page line would be a redundant secondary surface. If operator feedback during MVP2 says the chain panel is too far down the page to spot quickly, Phase 2 picks it up as part of the morning summary card scope. +- **OQ-2 (resolved at GPT-5.5 cycle 2, finding C2-B3)** — Should the strategy toggle ALSO show as a read-only line on the study detail page? **Resolved as D-15**: deferred to Phase 2 (`feat_overnight_final_solution_phase2/idea.md`). The chain-panel badges per link (FR-7) already surface the strategy a chain link followed; an extra detail-page line would be a redundant secondary surface. If operator feedback during MVP2 says the chain panel is too far down the page to spot quickly, Phase 2 picks it up as part of the morning summary card scope. _No open questions remain — §18's "no open questions" gate is satisfied._ @@ -709,4 +709,4 @@ _No open questions remain — §18's "no open questions" gate is satisfied._ - Rationale: a single, unambiguous contract everywhere (FR-3, FR-5, FR-6, AC-3, AC-6, AC-9, AC-12, AC-18 all reconcile). 
The earlier draft had FR-3 and FR-5/AC-12 contradicting each other on the legacy-path persistence — D-12 resolves in favor of the clean-legacy contract. - **D-13 (2026-06-03, GPT-5.5 cycle 1 finding C1-A3 accept)** — `auto_followup_strategy` field type is `str | None` (NOT `Literal[...]`). The pair-and-value check happens in the `_validate_auto_followup_strategy` model_validator with the message-prefix path so the canonical `AUTO_FOLLOWUP_STRATEGY_INVALID` error code reaches the response envelope. Mirrors the existing `_validate_auto_followup_depth` pattern — a Pydantic `Literal[...]` at field-level would surface generic `VALIDATION_ERROR` for unknown values, violating §8.6's error-code contract. - **D-14 (2026-06-03, GPT-5.5 cycle 1 finding C1-A4 accept)** — The wizard does NOT write `auto_followup_visited_template_ids`. The worker is the sole writer. The anchor's missing key is treated as `[anchor.template_id]` by the worker. The create-study contract test asserts a wizard-submitted `auto_followup_visited_template_ids` is 422-rejected. Rationale: single-writer rule eliminates the "two writers must agree on the seed value" coordination surface. -- **D-15 (2026-06-03, GPT-5.5 cycle 2 finding C2-B3 accept)** — Strategy read-only line on the study detail page (OQ-2) is deferred to Phase 2 (`phase2_idea.md`). The FR-7 per-link chain-panel badges are sufficient for MVP2; an extra detail-page line is redundant and would crowd the existing detail-page layout. Phase 2 picks it up if operator feedback says the chain panel is too far down to spot quickly during morning review. +- **D-15 (2026-06-03, GPT-5.5 cycle 2 finding C2-B3 accept)** — Strategy read-only line on the study detail page (OQ-2) is deferred to Phase 2 (`feat_overnight_final_solution_phase2/idea.md`). The FR-7 per-link chain-panel badges are sufficient for MVP2; an extra detail-page line is redundant and would crowd the existing detail-page layout. 
Phase 2 picks it up if operator feedback says the chain panel is too far down to spot quickly during morning review. diff --git a/docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/idea.md b/docs/00_overview/implemented_features/2026_06_04_feat_overnight_final_solution/idea.md similarity index 100% rename from docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/idea.md rename to docs/00_overview/implemented_features/2026_06_04_feat_overnight_final_solution/idea.md diff --git a/docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/implementation_plan.md b/docs/00_overview/implemented_features/2026_06_04_feat_overnight_final_solution/implementation_plan.md similarity index 99% rename from docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/implementation_plan.md rename to docs/00_overview/implemented_features/2026_06_04_feat_overnight_final_solution/implementation_plan.md index e90c06a2..84c6a2f5 100644 --- a/docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/implementation_plan.md +++ b/docs/00_overview/implemented_features/2026_06_04_feat_overnight_final_solution/implementation_plan.md @@ -1,7 +1,7 @@ # Implementation Plan — Overnight → final solution (autonomous cross-knob tuning) **Date:** 2026-06-03 -**Status:** Ready for Execution +**Status:** Complete (PR #440, squash-merged `1e9522a0` 2026-06-04) **Primary spec:** [`feature_spec.md`](feature_spec.md) **Policy source(s):** [`CLAUDE.md`](../../../../CLAUDE.md) (Absolute Rules), [`docs/01_architecture/api-conventions.md`](../../../../01_architecture/api-conventions.md) @@ -30,7 +30,7 @@ | FR-9 (glossary key) | Epic 1 / Story 1.2 | `overnight_strategy` glossary key ships with the wizard toggle | | FR-9 (tutorial + runbook) | Epic 4 / Story 4.1 | Tutorial Step 12 sub-section + autopilot runbook event section | -All spec FRs covered. 
No deferred FRs in Phase 1 (Phase 2 + Phase 3 tracked in `phase2_idea.md` + `phase3_idea.md`). +All spec FRs covered. No deferred FRs in Phase 1 (Phase 2 + Phase 3 tracked in `feat_overnight_final_solution_phase2/idea.md` + `feat_overnight_final_solution_phase3/idea.md`). ## 2) Delivery structure diff --git a/docs/00_overview/implemented_features/2026_06_04_feat_overnight_final_solution/pipeline_status.md b/docs/00_overview/implemented_features/2026_06_04_feat_overnight_final_solution/pipeline_status.md new file mode 100644 index 00000000..a4383b42 --- /dev/null +++ b/docs/00_overview/implemented_features/2026_06_04_feat_overnight_final_solution/pipeline_status.md @@ -0,0 +1,34 @@ +# Pipeline Status — feat_overnight_final_solution + +**Release:** mvp2 + +## Idea +- Status: Complete +- File: idea.md + +## Spec +- Status: Approved +- Date: 2026-06-03 +- File: feature_spec.md +- Cross-model review: GPT-5.5 passed (2 cycles to convergence; 0 High-severity findings at cycle 2) +- Cycle 1: 11 findings (6 High, 5 Medium, 0 Low) — all 11 accepted and applied +- Cycle 2: 6 findings (0 High, 5 Medium, 1 Low) — all 6 accepted and applied (internal-consistency cleanups from cycle 1 edits) +- Phases: 3 total (Phase 1 covered by this spec; Phase 2 + Phase 3 deferred with `feat_overnight_final_solution_phase2/idea.md` + `feat_overnight_final_solution_phase3/idea.md`) + +## Plan +- Status: Approved +- Date: 2026-06-03 +- File: implementation_plan.md +- Cross-model review: GPT-5.5 passed (2 cycles; cycle 1: 10 findings (5 High, 5 Medium) all accepted+applied; cycle 2: 0 findings — converged) +- Stories: 7 across 4 epics (Epic 1 schema+wizard, Epic 2 worker dispatch, Epic 3 chain surface, Epic 4 docs) +- Phases covered: Phase 1 (Phase 2 + 3 split out to their own planned_features folders `feat_overnight_final_solution_phase2/` + `feat_overnight_final_solution_phase3/` at finalization) + +## Implementation +- Status: Complete +- Date: 2026-06-04 +- PR: #440 (squash-merged `1e9522a0`) 
+- CI: green (all 17 `pr.yml` checks) +- Stories: 7/7 complete across 4 epics +- Cross-model review: Gemini 1 finding (rejected — hunk-isolated `child_id` false positive); GPT-5.5 final review 3 findings (0 High; 2 Medium + 1 Low all accepted + applied in `ac2fdc8a`) +- Tests: 17 domain unit + 10 worker integration + 11 contract + 4 schema unit + 6 wizard vitest + 2 chain-panel vitest + 4 enum source-of-truth + 1 glossary value-lock +- Deferred: Phase 2 (`feat_overnight_final_solution_phase2/idea.md` — morning summary card) + Phase 3 (`feat_overnight_final_solution_phase3/idea.md` — proposal `superseded` status) remain tracked; tangential `chore_e2e_overnight_strategy_radix_select_timing` + adjacent `feat_proposal_full_param_space_view` ideas filed diff --git a/docs/00_overview/mvp2_dashboard.html b/docs/00_overview/mvp2_dashboard.html index cf72272c..7fc8b22a 100644 --- a/docs/00_overview/mvp2_dashboard.html +++ b/docs/00_overview/mvp2_dashboard.html @@ -397,13 +397,13 @@

MVP2 Progress

Specced features done
- 14 / 25
- 56% specced · 46 filed under MVP2
-
+ 15 / 25
+ 60% specced · 48 filed under MVP2
+
Pending work
- 30
+ 31
every not-done feat/infra/chore/bug across all priorities
@@ -420,29 +420,29 @@

MVP2 Progress

P1
- 1
+ 0
high-value, ready when P0 clears
P2 (default)
- 25
+ 26
important to file, not blocking
Backlog
- 4
+ 5
captured for record, not planned
Legacy "Path to MVP2"
- 26
+ 25
scoped not-done + bugs + chore-ideas only (excludes feat/infra ideas)
 Backlog ideas:
- 4 idea-only feat/infra folders (not yet scoped into MVP2)
+ 6 idea-only feat/infra folders (not yet scoped into MVP2)
 In flight:
@@ -463,7 +463,20 @@

Pipeline

- Idea 17
+ Idea 19

+ +
+ +
+ Feature + P2 + +
+ After Phase 1 ships, an operator who picks `follow_suggestions` overnight wakes up to:
+ + +
+
@@ -647,6 +660,19 @@

Idea 17

+
+ +
+ Feature + Backlog + +
+ When `follow_suggestions` runs a 4-link chain, today's proposal-creation logic ([`backend/workers/orchestrator.py`](../backend/workers/orchestrator.py) `_on_study_complete`) creates **one `pending` pr
+ + +
+ +
@@ -693,20 +719,7 @@

Spec 0

- Plan 13

- -
- -
- Feature - P1 - -
- The wizard exposes a strategy choice alongside the existing depth: keep today's predictable `narrow` loop OR opt into `follow_suggestions`, which lets each chain link consume the parent digest's top *
- deferred: Phase 2, Phase 3
- -
- +

Plan 12

@@ -871,7 +884,7 @@

Implementing 0

- Done 16
+ Done 17

@@ -912,6 +925,19 @@

Done 16

+
+ +
+ Feature + + PR #440 merged 2026-06-04 +
+ The wizard exposes a strategy choice alongside the existing depth: keep today's predictable `narrow` loop OR opt into `follow_suggestions`, which lets each chain link consume the parent digest's top *
+ + +
+ +
@@ -1105,8 +1131,6 @@

Dependency graph (feat_ + infra_)

 class chore_ubi_reader_search_after_pagination plan;
 feat_apply_path_normalizer_declaration["apply path normalizer declaration"]
 class feat_apply_path_normalizer_declaration plan;
- feat_overnight_final_solution["overnight final solution"]
- class feat_overnight_final_solution plan;
 feat_overnight_studies_summary_card["overnight studies summary card"]
 class feat_overnight_studies_summary_card plan;
 feat_query_normalization_tuning["query normalization tuning"]
@@ -1143,6 +1167,8 @@

Dependency graph (feat_ + infra_)

 class infra_solr_smoke_stability done;
 infra_generated_artifact_freshness_gate["generated artifact freshness gate"]
 class infra_generated_artifact_freshness_gate done;
+ feat_overnight_final_solution["overnight final solution"]
+ class feat_overnight_final_solution done;
 feat_ubi_judgments --> infra_adapter_solr
diff --git a/docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/pipeline_status.md b/docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/pipeline_status.md
deleted file mode 100644
index ee66075c..00000000
--- a/docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/pipeline_status.md
+++ /dev/null
@@ -1,25 +0,0 @@
-# Pipeline Status — feat_overnight_final_solution
-
-## Idea
-- Status: Complete
-- File: idea.md
-
-## Spec
-- Status: Approved
-- Date: 2026-06-03
-- File: feature_spec.md
-- Cross-model review: GPT-5.5 passed (2 cycles to convergence; 0 High-severity findings at cycle 2)
-- Cycle 1: 11 findings (6 High, 5 Medium, 0 Low) — all 11 accepted and applied
-- Cycle 2: 6 findings (0 High, 5 Medium, 1 Low) — all 6 accepted and applied (internal-consistency cleanups from cycle 1 edits)
-- Phases: 3 total (Phase 1 covered by this spec; Phase 2 + Phase 3 deferred with `phase2_idea.md` + `phase3_idea.md`)
-
-## Plan
-- Status: Approved
-- Date: 2026-06-03
-- File: implementation_plan.md
-- Cross-model review: GPT-5.5 passed (2 cycles; cycle 1: 10 findings (5 High, 5 Medium) all accepted+applied; cycle 2: 0 findings — converged)
-- Stories: 7 across 4 epics (Epic 1 schema+wizard, Epic 2 worker dispatch, Epic 3 chain surface, Epic 4 docs)
-- Phases covered: Phase 1 (Phase 2 + 3 deferred via phase2_idea.md + phase3_idea.md)
-
-## Implementation
-- Status: Not started
diff --git a/docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/phase2_idea.md b/docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution_phase2/idea.md
similarity index 89%
rename from docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/phase2_idea.md
rename to docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution_phase2/idea.md
index 2014d9b8..23cc5f2c 100644
--- a/docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/phase2_idea.md
+++
b/docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution_phase2/idea.md @@ -3,8 +3,8 @@ **Date:** 2026-06-03 **Status:** Idea — deferred Phase 2 from `feat_overnight_final_solution` Phase 1 spec **Priority:** P2 -**Origin:** Carried out of `feat_overnight_final_solution/feature_spec.md` §3 "Phase boundaries" + §19 D-5/D-15. Phase 1 delivered cross-knob/cross-template autonomous exploration with the rollup data already available via the existing `/chain` endpoint; Phase 2 polishes the morning-review surface. -**Depends on:** `feat_overnight_final_solution` Phase 1 (this folder's `feature_spec.md`) must be merged first. +**Origin:** Carried out of `docs/00_overview/implemented_features/2026_06_04_feat_overnight_final_solution/feature_spec.md` §3 "Phase boundaries" + §19 D-5/D-15. Phase 1 delivered cross-knob/cross-template autonomous exploration with the rollup data already available via the existing `/chain` endpoint; Phase 2 polishes the morning-review surface. +**Depends on:** `feat_overnight_final_solution` Phase 1 (now at `implemented_features/2026_06_04_feat_overnight_final_solution/`) must be merged first. > **Priority guidance:** P2 — UX polish. Not blocking the capability the Phase 1 spec delivers; lifts the morning-review experience from "open the study detail page → scroll to the chain panel" to "one card at the top says here's the answer." @@ -59,7 +59,7 @@ Phase 1's job was the **capability** — let the autopilot explore across knobs ## Relationship to other work -- **Builds on** [`feat_overnight_final_solution`](feature_spec.md) Phase 1 — depends on its `selected_followup_kind` field and the strategy persistence. +- **Builds on** [`feat_overnight_final_solution`](../../implemented_features/2026_06_04_feat_overnight_final_solution/feature_spec.md) Phase 1 — depends on its `selected_followup_kind` field and the strategy persistence. 
- **Coordinates with** [`feat_overnight_studies_summary_card`](../feat_overnight_studies_summary_card/idea.md) — index-page "ran while away" surface; resolve overlap at Phase 2 spec time. - **Composes with** [`feat_study_convergence_indicator`](../../implemented_features/2026_05_31_feat_study_convergence_indicator/feature_spec.md) — the morning card may want to surface the winning link's convergence verdict too. diff --git a/docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/phase3_idea.md b/docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution_phase3/idea.md similarity index 86% rename from docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/phase3_idea.md rename to docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution_phase3/idea.md index 1a67bdbb..8992fdcd 100644 --- a/docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/phase3_idea.md +++ b/docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution_phase3/idea.md @@ -3,7 +3,7 @@ **Date:** 2026-06-03 **Status:** Idea — deferred Phase 3 from `feat_overnight_final_solution` Phase 1 spec **Priority:** Backlog -**Origin:** Carried out of `feat_overnight_final_solution/feature_spec.md` §3 "Phase boundaries" + §19 D-8. Phase 1 ships the cross-knob exploration capability and leans on `best_link_id` + `proposal_id_for_best_link` from the existing `/chain` endpoint to surface the morning artifact as a single proposal. Phase 3 polishes the `/proposals` index by marking non-winning chain links' proposals `superseded` so the morning view is unambiguously "one answer." +**Origin:** Carried out of `docs/00_overview/implemented_features/2026_06_04_feat_overnight_final_solution/feature_spec.md` §3 "Phase boundaries" + §19 D-8. 
Phase 1 ships the cross-knob exploration capability and leans on `best_link_id` + `proposal_id_for_best_link` from the existing `/chain` endpoint to surface the morning artifact as a single proposal. Phase 3 polishes the `/proposals` index by marking non-winning chain links' proposals `superseded` so the morning view is unambiguously "one answer." **Depends on:** `feat_overnight_final_solution` Phase 1 must be merged first. Independent of Phase 2 (the morning summary card). > **Priority guidance:** Backlog — defer-until-incident. The Phase 1 capability does not require this. File once an operator (or design partner) reports `/proposals` clutter as friction during morning review. @@ -61,8 +61,8 @@ Critically: Phase 3 requires a migration that **reopens shipped schema** (the `p ## Relationship to other work -- **Depends on** [`feat_overnight_final_solution`](feature_spec.md) Phase 1 — uses its chain-termination signal. -- **Adjacent to** [`feat_overnight_final_solution`](feature_spec.md) Phase 2 — the morning card (Phase 2) may want to know which intermediate proposals are superseded for cleaner rendering. +- **Depends on** [`feat_overnight_final_solution`](../../implemented_features/2026_06_04_feat_overnight_final_solution/feature_spec.md) Phase 1 — uses its chain-termination signal. +- **Adjacent to** [`feat_overnight_final_solution`](../../implemented_features/2026_06_04_feat_overnight_final_solution/feature_spec.md) Phase 2 — the morning card (Phase 2) may want to know which intermediate proposals are superseded for cleaner rendering. - **Independent of** `feat_overnight_studies_summary_card` — different surface. 
## Open questions diff --git a/state.md b/state.md index 62e651e7..35c1949f 100644 --- a/state.md +++ b/state.md @@ -16,8 +16,8 @@ MVP1 (v0.1) **shipped** — all six differentiators live (Bayesian/TPE optimizer ## Current branch / execution context -- **Branch:** `main` (PR #438 `feat_studies_list_trial_convergence_columns` just merged `03976c5e`; PRs #436 + #433 merged earlier the same day). All `pr.yml` checks green (smoke skipped — opt-in/off). -- **Active feature:** _None in flight._ `feat_studies_list_trial_convergence_columns` shipped 2026-06-03 (PR #438). Next: pull from the MVP2 Idea/Plan backlog (run `/pipeline status`). +- **Branch:** `main` (PR #440 `feat_overnight_final_solution` just merged `1e9522a0`, 2026-06-04). All `pr.yml` checks green (smoke skipped — opt-in/off). +- **Active feature:** _None in flight._ `feat_overnight_final_solution` shipped 2026-06-04 (PR #440). Next: pull from the MVP2 Idea/Plan backlog (run `/pipeline status`). - **Alembic head:** `0022_solr_engine_auth_check` (added by `infra_adapter_solr` Story A6 — extends `clusters.engine_type` + `clusters.auth_kind` CHECK constraints for Solr). - **Python:** 3.13. **Frontend stack:** Next 16 (App Router + Turbopack), React 19, Tailwind 4 (CSS-first), Vitest 4, ESLint 9 (flat), TypeScript 6, Playwright (chromium, single worker) for E2E. - **Coverage gates:** backend 80% (`fail_under` in pyproject), UI vitest + tsc + ESLint + Next build, plus a full-stack smoke E2E job. Live pass counts: see the latest `pr.yml` run (the historical per-feature counts moved to `state_history.md`). @@ -26,16 +26,16 @@ MVP1 (v0.1) **shipped** — all six differentiators live (Bayesian/TPE optimizer Detail + reasoning for each is in [`state_history.md`](state_history.md). +- **2026-06-04** — `feat_overnight_final_solution` (PR #440, squash-merged `1e9522a0`). 
**Autonomous cross-knob overnight tuning.** Teaches the overnight autopilot to consume the parent digest's *executable* follow-ups (narrow / widen / **swap_template**) on each chain link instead of always running the hardcoded ±50% narrow on the same template — so a chain can branch knobs AND templates while you sleep. Opt-in via a new wizard **Strategy** toggle (`"narrow"` default = byte-identical legacy behavior; `"follow_suggestions"` = the new mode); the existing `test_auto_followup.py` passes unmodified (that's the backward-compat gate). 7 stories / 4 epics: (1.1) `StudyConfigSpec.auto_followup_strategy: str | None` + `_validate_auto_followup_strategy` via the `AUTO_FOLLOWUP_STRATEGY_INVALID` 422 envelope (str-not-Literal per D-13 so the canonical error-code unwrap works) + a `mode="before"` validator rejecting operator-submitted worker-managed keys (single-writer rule, D-14); (1.2) wizard Strategy `