Merged
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -421,7 +421,7 @@ If you slip and a stub leaks into a committed file, capture it as a `bug_<slug>`
|---|---|---|
| 1 | [`infra_foundation`](docs/00_overview/implemented_features/2026_05_09_infra_foundation/) | **Complete (PR #4, merged 2026-05-09)** |
| 2 | [`infra_adapter_elastic`](docs/00_overview/implemented_features/2026_05_10_infra_adapter_elastic/) | **Complete (PR #16, merged 2026-05-10)** |
| 3 | [`infra_optuna_eval`](docs/02_product/planned_features/infra_optuna_eval/) | Spec approved, plan pending |
| 3 | [`infra_optuna_eval`](docs/00_overview/implemented_features/2026_05_10_infra_optuna_eval/) | **Complete (PR #23, merged 2026-05-10)** |
| 4 | [`feat_study_lifecycle`](docs/02_product/planned_features/feat_study_lifecycle/) | **Phase 1 (Schema) Complete (PR #18, merged 2026-05-10); Phase 2 (Orchestrator + API) deferred** ([`phase2_idea.md`](docs/02_product/planned_features/feat_study_lifecycle/phase2_idea.md)) |
| 5 | [`feat_llm_judgments`](docs/02_product/planned_features/feat_llm_judgments/) | Spec approved, plan pending |
| 6 | [`feat_digest_proposal`](docs/02_product/planned_features/feat_digest_proposal/) | Spec approved, plan pending |
17 changes: 8 additions & 9 deletions docs/00_overview/MVP1_DASHBOARD.md
@@ -6,33 +6,32 @@ _Reflects feature-folder state as of **2026-05-10** (latest mtime of any planned

| Metric | Value |
|---|---|
| Features done | **2 / 12** (17%) |
| Path to MVP1 | **15** items remaining (features + bugs + chores) |
| Features done | **3 / 12** (25%) |
| Path to MVP1 | **14** items remaining (features + bugs + chores) |
| Open bugs | 2 |
| Open chores | 3 (idea-stage debt) |
| Backlog ideas | 2 idea-only feat/infra (not yet scoped into MVP1) |
| In flight | 1 feature(s) actively shipping |

## Pipeline

### Done (2)
### Done (3)

| Feature | Type | One-liner | Depends on | Status |
|---|---|---|---|---|
| [infra_adapter_elastic](implemented_features/2026_05_10_infra_adapter_elastic/feature_spec.md) | Infra | A single `ElasticAdapter` implements the `SearchAdapter` Protocol and serves both Elasticsearch (8.11+ / 9.x) and OpenSearch (2.x / 3.x), distinguished by a `engine_type` column. | — | [PR #16](https://github.com/SoundMindsAI/relyloop/pull/16) merged 2026-05-10 |
| [infra_foundation](implemented_features/2026_05_09_infra_foundation/feature_spec.md) | Infra | A relevance engineer can `git clone`, `docker compose up`, see all subsystems healthy in <60s on a 16GB laptop, and have a CI pipeline that gates every PR on lint, type-check, test, and an 80% coverag | — | [PR #4](https://github.com/SoundMindsAI/relyloop/pull/4) merged 2026-05-09 |
| [infra_optuna_eval](implemented_features/2026_05_10_infra_optuna_eval/feature_spec.md) | Infra | Optuna RDB storage co-tenants with the application Postgres; TPE sampler + median pruner are the MVP1 defaults; pytrec_eval scores trials against judgment lists for nDCG@k, MAP, P@k, recall@k, and MRR | — | [PR #23](https://github.com/SoundMindsAI/relyloop/pull/23) merged 2026-05-10 |

### Implementing (1)

| Feature | Type | One-liner | Depends on | Status |
|---|---|---|---|---|
| [feat_study_lifecycle](../02_product/planned_features/feat_study_lifecycle/feature_spec.md) | Feature | A relevance engineer creates a study via API or chat, the orchestrator enqueues N parallel `run_trial` jobs, trials accumulate in real time on the study detail page, the orchestrator detects stop-cond | — | [PR #18](https://github.com/SoundMindsAI/relyloop/pull/18) merged 2026-05-10 |

### Plan (1)
### Plan (0)

| Feature | Type | One-liner | Depends on | Status |
|---|---|---|---|---|
| [infra_optuna_eval](../02_product/planned_features/infra_optuna_eval/feature_spec.md) | Infra | Optuna RDB storage co-tenants with the application Postgres; TPE sampler + median pruner are the MVP1 defaults; pytrec_eval scores trials against judgment lists for nDCG@k, MAP, P@k, recall@k, and MRR | — | [PR #18](https://github.com/SoundMindsAI/relyloop/pull/18) merged 2026-05-10 |
_None._

### Spec (8)

@@ -88,12 +87,12 @@ graph LR
class feat_studies_ui spec;
feat_study_lifecycle["study lifecycle"]
class feat_study_lifecycle implement;
infra_optuna_eval["optuna eval"]
class infra_optuna_eval plan;
infra_foundation["foundation"]
class infra_foundation done;
infra_adapter_elastic["adapter elastic"]
class infra_adapter_elastic done;
infra_optuna_eval["optuna eval"]
class infra_optuna_eval done;
feat_study_lifecycle --> feat_digest_proposal
feat_llm_judgments --> feat_digest_proposal
infra_foundation --> feat_github_pr_worker
@@ -9,7 +9,7 @@
- [docs/01_architecture/data-model.md](../../../01_architecture/data-model.md) — `studies`, `trials` tables (consumed; created by `feat_study_lifecycle`)
- [docs/01_architecture/system-overview.md](../../../01_architecture/system-overview.md) — worker pool detail
- Depends on: [`infra_foundation/feature_spec.md`](../infra_foundation/feature_spec.md)
- Consumed by: [`feat_study_lifecycle` Phase 2 (orchestrator + API)](../feat_study_lifecycle/phase2_idea.md) — Phase 2's `start_study` Arq job dispatches this feature's `run_trial`
- Consumed by: [`feat_study_lifecycle` Phase 2 (orchestrator + API)](../../../02_product/planned_features/feat_study_lifecycle/phase2_idea.md) — Phase 2's `start_study` Arq job dispatches this feature's `run_trial`

---

@@ -28,7 +28,7 @@ All upstream dependencies have shipped — this feature is unblocked:
- **Worker process** (`infra_foundation` Story 4.3): exists as a placeholder at [`backend/workers/all.py`](../../../../backend/workers/all.py) with `functions=[]`; this feature adds the `run_trial` job to that list. The file's docstring already pre-declares the slot: "feat_study_lifecycle → run_trial".
- **Engine adapter** (`infra_adapter_elastic` / PR #16 merged 2026-05-10): provides `SearchAdapter` Protocol + `ElasticAdapter.search_batch()` — the engine call this feature's `run_trial` makes.
- **Schema** (`feat_study_lifecycle` Phase 1 / PR #18 merged 2026-05-10): `studies`, `trials`, `judgment_lists`, `query_*`, `proposals` tables exist on `0003`. This feature's `run_trial` job reads `studies` and writes `trials` — both shapes are documented in [`docs/01_architecture/data-model.md`](../../../01_architecture/data-model.md). 15 minimal repo functions also shipped at [`backend/app/db/repo/`](../../../../backend/app/db/repo/) covering the read/write set this feature needs.
- **Phase 2 of `feat_study_lifecycle`** (orchestrator + 12 endpoints + `start_study` Arq job) is **deferred** via [`phase2_idea.md`](../feat_study_lifecycle/phase2_idea.md). Phase 2 dispatches this feature's `run_trial`; this feature provides the trial runner that Phase 2 enqueues.
- **Phase 2 of `feat_study_lifecycle`** (orchestrator + 12 endpoints + `start_study` Arq job) is **deferred** via [`phase2_idea.md`](../../../02_product/planned_features/feat_study_lifecycle/phase2_idea.md). Phase 2 dispatches this feature's `run_trial`; this feature provides the trial runner that Phase 2 enqueues.
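
The worker bullet above describes adding `run_trial` to the placeholder's `functions=[]` list. A minimal sketch of that registration, assuming Arq's convention of a settings class with a `functions` attribute and job coroutines whose first parameter is `ctx`. The `run_trial` parameters and return value shown here are hypothetical and are not taken from the PR:

```python
from typing import Any


async def run_trial(ctx: dict[str, Any], study_id: str, trial_number: int) -> dict[str, Any]:
    """Hypothetical job body: the real job reads `studies` and writes `trials`."""
    # Placeholder result only; the parameter names are assumptions for illustration.
    return {"study_id": study_id, "trial_number": trial_number, "status": "stub"}


class WorkerSettings:
    # The placeholder ships with `functions=[]`; this feature fills the slot.
    functions = [run_trial]
```

Keeping the job in a single `functions` list means Phase 2's `start_study` only needs the job name to enqueue it, not an import of the trial runner itself.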

## 3) Scope

@@ -1,10 +1,10 @@
# Implementation Plan — infra_optuna_eval

**Date:** 2026-05-10
**Status:** Approved (GPT-5.5 cross-model review converged at cycle 3; 28 findings across 3 cycles, all 28 accepted and applied; zero rejected)
**Status:** Complete (PR #23, merged 2026-05-10 as squash commit `c4f1aab`). GPT-5.5 plan review converged at cycle 3 (28 findings, all accepted); final-review cycle on the merged diff produced 4 findings (3 accepted + applied in commit `3b112f9`; 1 rejected with cited counter-evidence — AC-7 covered at adapter+worker layer composition).
**Primary spec:** [`feature_spec.md`](feature_spec.md)
**Policy source(s):** [`docs/01_architecture/optimization.md`](../../../01_architecture/optimization.md), [`docs/01_architecture/adapters.md`](../../../01_architecture/adapters.md), [`docs/01_architecture/data-model.md`](../../../01_architecture/data-model.md), [`CLAUDE.md`](../../../../CLAUDE.md)
**Tangential discovery filed:** [`chore_infra_optuna_eval_spec_text_drift/idea.md`](../chore_infra_optuna_eval_spec_text_drift/idea.md) (spec §14 vs §11 wording drift around the partial-failure retry contract — controlling §11 is honored by the plan; §14 needs a one-paragraph rewrite).
**Tangential discovery filed:** [`chore_infra_optuna_eval_spec_text_drift/idea.md`](../../../02_product/planned_features/chore_infra_optuna_eval_spec_text_drift/idea.md) (spec §14 vs §11 wording drift around the partial-failure retry contract — controlling §11 is honored by the plan; §14 needs a one-paragraph rewrite).

---

@@ -302,7 +302,7 @@ def get_or_create_study(

**Outcome:** A single import point for the `run_trial` job to fetch qrels. In MVP1 the loader raises `JudgmentsTableMissing` because the `judgments` child table is owned by `feat_llm_judgments` (per [`data-model.md` §"judgment_lists and judgments"](../../../01_architecture/data-model.md)) and is not yet shipped. Integration tests in Epic 3 monkeypatch `load_qrels` to inject hand-built qrels (per spec AC-4 "hand-built judgment list"). When `feat_llm_judgments` lands, that feature replaces the stub with a real `SELECT` against `judgments`.

**Why this is safe for MVP1 production**, even though the production path raises: the only callers of `run_trial` in production are Phase 2's orchestrator (`feat_study_lifecycle` Phase 2 — also deferred per [`phase2_idea.md`](../feat_study_lifecycle/phase2_idea.md)) and `feat_llm_judgments`. **Neither has shipped.** There is no MVP1 surface that can dispatch a real trial — the API has no endpoint to start a study, the worker has no enqueuer, and `run_trial` cannot be invoked from outside the test suite. The stub-with-typed-exception pattern therefore lets us ship the runtime substrate without compromising correctness: any premature dispatch (e.g., an operator manually invoking `arq` against the queue) fails loud with a clear message, and the real loader implementation lands atomically with `feat_llm_judgments`.
**Why this is safe for MVP1 production**, even though the production path raises: the only callers of `run_trial` in production are Phase 2's orchestrator (`feat_study_lifecycle` Phase 2 — also deferred per [`phase2_idea.md`](../../../02_product/planned_features/feat_study_lifecycle/phase2_idea.md)) and `feat_llm_judgments`. **Neither has shipped.** There is no MVP1 surface that can dispatch a real trial — the API has no endpoint to start a study, the worker has no enqueuer, and `run_trial` cannot be invoked from outside the test suite. The stub-with-typed-exception pattern therefore lets us ship the runtime substrate without compromising correctness: any premature dispatch (e.g., an operator manually invoking `arq` against the queue) fails loud with a clear message, and the real loader implementation lands atomically with `feat_llm_judgments`.

**Why this design (vs. real `SELECT`):** Spec §9 explicitly forbids new tables in this feature ("This feature does NOT define new tables"). Implementing the loader as a real `SELECT` against `judgments` would either (a) fail at SQL parse time on every dispatch (since the table doesn't exist), or (b) require creating the table as part of this feature (scope violation). The stub-with-typed-error path keeps the runtime interface stable and gives `feat_llm_judgments` an unambiguous swap point: that feature's plan should include "replace `qrels_loader.load_qrels` with a real `SELECT query_id, doc_id, rating FROM judgments WHERE judgment_list_id = :id GROUP BY query_id`".
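
The stub-with-typed-exception pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the code merged in the PR: the `load_qrels` signature, the error message, and the `rows_to_qrels` helper are hypothetical, and the nested `{query_id: {doc_id: rating}}` shape is assumed because that is the qrels format `pytrec_eval` consumes.

```python
from collections import defaultdict
from typing import Iterable

# Assumed qrels shape: {query_id: {doc_id: rating}}.
Qrels = dict[str, dict[str, int]]


class JudgmentsTableMissing(RuntimeError):
    """Raised when qrels are requested before the `judgments` table exists."""


def load_qrels(judgment_list_id: str) -> Qrels:
    """MVP1 stub: fail loudly instead of querying a table that is not there."""
    raise JudgmentsTableMissing(
        f"cannot load qrels for judgment list {judgment_list_id!r}: "
        "the `judgments` table ships with feat_llm_judgments"
    )


def rows_to_qrels(rows: Iterable[tuple[str, str, int]]) -> Qrels:
    """Hypothetical helper: group (query_id, doc_id, rating) rows by query."""
    qrels: defaultdict[str, dict[str, int]] = defaultdict(dict)
    for query_id, doc_id, rating in rows:
        qrels[query_id][doc_id] = rating
    return dict(qrels)
```

In this sketch the per-query grouping happens in Python rather than in SQL, so the eventual query only has to return flat rows. A test can swap the stub for a function that returns hand-built qrels (for example with pytest's `monkeypatch.setattr`), which is how the plan describes the Epic 3 integration tests.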

@@ -20,12 +20,20 @@
- Epic 2 (Optuna runtime + run_trial): Stories 2.1, 2.2, 2.3
- Epic 3 (tests, contract, benchmark, docs): Stories 3.1, 3.2, 3.3
- Phases covered: single-phase feature per spec §3 "Phase boundaries"
- Tangential discovery filed: [`chore_infra_optuna_eval_spec_text_drift`](../chore_infra_optuna_eval_spec_text_drift/idea.md)
- Tangential discovery filed: [`chore_infra_optuna_eval_spec_text_drift`](../../../02_product/planned_features/chore_infra_optuna_eval_spec_text_drift/idea.md)

## Implementation
- Status: Not started
- Branch: TBD (will be `feature/infra-optuna-eval` per pipeline convention)
- Next action: `/impl-execute docs/02_product/planned_features/infra_optuna_eval/implementation_plan.md --all`
- Status: Complete
- Branch: `feature/infra-optuna-eval` (squash-merged, deleted)
- Date: 2026-05-10
- PR: #23
- Squash commit: `c4f1aab`
- CI: green (backend / frontend / docker buildx)
- Cross-model review: GPT-5.5 final review on merged diff — 4 findings (3 accepted + applied in `3b112f9`, 1 rejected with cited counter-evidence)
- Gemini Code Assist: not configured on this repo — N/A
- Tests: 247 unit · 8 integration · 1 contract · 1 benchmark · 4 helper modules

## Done
- Status: Not yet shipped
- Status: Merged to main (no remote staging in MVP1 — `make migrate` + worker boot on a developer machine activates the runtime)
- Date: 2026-05-10
- Release: pre-1.0 (target tag at MVP1 cutover)
43 changes: 22 additions & 21 deletions docs/00_overview/mvp1_dashboard.html
@@ -267,13 +267,13 @@ <h2>MVP1 Progress</h2>
<div class="kpi-row">
<div class="kpi ">
<div class="label">Features done</div>
<div class="value">2 / 12</div>
<div class="sub">17% of scoped MVP1 features</div>
<div class="bar"><span style="width:17%"></span></div>
<div class="value">3 / 12</div>
<div class="sub">25% of scoped MVP1 features</div>
<div class="bar"><span style="width:25%"></span></div>
</div>
<div class="kpi warn">
<div class="label">Path to MVP1</div>
<div class="value">15</div>
<div class="value">14</div>
<div class="sub">items left = features + bugs + chores</div>
</div>
<div class="kpi bug">
@@ -499,18 +499,7 @@ <h3>Spec <span class="count">8</span></h3>
</div>

<div class="col plan">
<h3>Plan <span class="count">1</span></h3>

<div class="card infra" data-prefix="infra">
<div class="name"><a href="../../docs/02_product/planned_features/infra_optuna_eval/feature_spec.md">Optuna Eval</a></div>
<div class="meta">
<span class="badge infra">Infra</span>
<a class="pr" href="https://github.com/SoundMindsAI/relyloop/pull/18">PR #18</a><span>merged 2026-05-10</span>
</div>
<div class="one-liner">Optuna RDB storage co-tenants with the application Postgres; TPE sampler + median pruner are the MVP1 defaults; pytrec_eval scores trials against judgment lists for nDCG@k, MAP, P@k, recall@k, and MRR</div>


</div>
<h3>Plan <span class="count">0</span></h3>

</div>

@@ -531,7 +520,7 @@ <h3>Implementing <span class="count">1</span></h3>
</div>

<div class="col done">
<h3>Done <span class="count">2</span></h3>
<h3>Done <span class="count">3</span></h3>

<div class="card infra" data-prefix="infra">
<div class="name"><a href="../../docs/00_overview/implemented_features/2026_05_10_infra_adapter_elastic/feature_spec.md">Adapter Elastic</a></div>
@@ -554,6 +543,18 @@ <h3>Done <span class="count">2</span></h3>
<div class="one-liner">A relevance engineer can `git clone`, `docker compose up`, see all subsystems healthy in &lt;60s on a 16GB laptop, and have a CI pipeline that gates every PR on lint, type-check, test, and an 80% coverag</div>


</div>


<div class="card infra" data-prefix="infra">
<div class="name"><a href="../../docs/00_overview/implemented_features/2026_05_10_infra_optuna_eval/feature_spec.md">Optuna Eval</a></div>
<div class="meta">
<span class="badge infra">Infra</span>
<a class="pr" href="https://github.com/SoundMindsAI/relyloop/pull/23">PR #23</a><span>merged 2026-05-10</span>
</div>
<div class="one-liner">Optuna RDB storage co-tenants with the application Postgres; TPE sampler + median pruner are the MVP1 defaults; pytrec_eval scores trials against judgment lists for nDCG@k, MAP, P@k, recall@k, and MRR</div>


</div>

</div>
@@ -587,12 +588,12 @@ <h2>Dependency graph (feat_ + infra_)</h2>
class feat_studies_ui spec;
feat_study_lifecycle[&quot;study lifecycle&quot;]
class feat_study_lifecycle implement;
infra_optuna_eval[&quot;optuna eval&quot;]
class infra_optuna_eval plan;
infra_foundation[&quot;foundation&quot;]
class infra_foundation done;
infra_adapter_elastic[&quot;adapter elastic&quot;]
class infra_adapter_elastic done;
infra_optuna_eval[&quot;optuna eval&quot;]
class infra_optuna_eval done;
feat_study_lifecycle --&gt; feat_digest_proposal
feat_llm_judgments --&gt; feat_digest_proposal
infra_foundation --&gt; feat_github_pr_worker
@@ -636,12 +637,12 @@ <h2>Dependency graph (feat_ + infra_)</h2>
class feat_studies_ui spec;
feat_study_lifecycle[&quot;study lifecycle&quot;]
class feat_study_lifecycle implement;
infra_optuna_eval[&quot;optuna eval&quot;]
class infra_optuna_eval plan;
infra_foundation[&quot;foundation&quot;]
class infra_foundation done;
infra_adapter_elastic[&quot;adapter elastic&quot;]
class infra_adapter_elastic done;
infra_optuna_eval[&quot;optuna eval&quot;]
class infra_optuna_eval done;
feat_study_lifecycle --&gt; feat_digest_proposal
feat_llm_judgments --&gt; feat_digest_proposal
infra_foundation --&gt; feat_github_pr_worker