MVP2 backlog batch: caplog-isolation fix + 6 cross-model-reviewed spec/plan pairs#364
Conversation
…igure_logging calls The backend suite was order-dependently red: unit tests asserting on structlog.testing.capture_logs() failed with empty-capture shapes (assert []) in the full suite but passed in isolation, and identical-commit re-runs produced different failing sets. Root cause: configure_logging() handed structlog a brand-new processors list on every call. structlog binds loggers against that list instance and (with cache_logger_on_first_use=True) freezes the reference; capture_logs() mutates the list in place. A second configure_logging() (e.g. an integration test's FastAPI lifespan) swapped the list, leaving module-level loggers cached against the stale instance blind to capture_logs() — order-dependent, so the failing set varied. Fix: mutate the existing processors list in place (existing[:] = new) instead of replacing it, so every bound logger always observes the current chain — the same invariant capture_logs() relies on. Caching stays on; no production behavior change (the single startup caller configures once). Verified: reset_defaults() / cache-off cannot un-blind an already-cached proxy, so a conftest reset fixture (the idea's proposed fix) is insufficient — the fix must live in configure_logging(). New regression test fails on main, passes here. Also corrects a stale pyproject dev-group comment claiming CI passes `-n auto` (it was reverted). Closes bug_backend_suite_nondeterministic_caplog_isolation (caplog cluster). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
…P2 endpoints Three hand-maintained contract-test allowlists drifted as MVP2 features shipped without updating them in the same PR, leaving the backend suite deterministically red on a clean tree: - test_resolve_engine_type_wire: add "solr" (infra_adapter_solr, PR #336). - test_health_contract subsystems: add "solr" (infra_adapter_solr Story A12). - test_openapi_has_no_orphan_endpoints: add the 5 endpoints that shipped without an EXPECTED_ENDPOINTS entry — ubi-readiness + generate-from-ubi (feat_ubi_ judgments #317), studies/{id}/chain (feat_overnight_autopilot #343), clusters/test-connection + clusters/{id}/reprobe (infra_adapter_solr #336). Test-only; status codes verified against the live OpenAPI surface. Closes bug_contract_allowlists_outdated_after_mvp2_features. Contract layer: 325 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
… ideas Advances the next 6 MVP2 idea-stage features from idea.md to feature_spec.md + implementation_plan.md, each via idea-preflight -> spec-gen -> impl-plan-gen with the mandatory GPT-5.5 cross-model review (Opus creates, GPT-5.5 reviews). - feat_apply_path_normalizer_declaration (8 FR / 7 stories) — DESIGN-AHEAD: gated on feat_query_normalization_tuning Phase 1 merge + operator-friction evidence; plan refuses /impl-execute until both clear. - feat_overnight_studies_summary_card (6 FR / 7 stories) — Phase 2 of the shipped feat_overnight_autopilot; ready. - feat_query_normalizer_typed_pipeline (9 FR / 8 stories) — DESIGN-AHEAD: Phase 2 of feat_query_normalization_tuning; Story 0 hard precondition gate. - infra_generated_artifact_freshness_gate (9 FR / 6 stories) — CI freshness gate. - chore_arq_pool_aclose_deprecation (4 FR / 2 stories) — preflight found a SECOND call site (workers/all.py:225) + corrected the arq version claim. - chore_cluster_detail_rung_badge (8 FR / 8 stories) — surface UbiRungBadge. Preflight patched each idea.md against the live tree. Dashboards regenerated. No production code; no feature implemented. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
There was a problem hiding this comment.
Code Review
This pull request resolves the non-deterministic test isolation issue (bug_backend_suite_nondeterministic_caplog_isolation) by ensuring that configure_logging mutates the existing structlog processors list in place rather than replacing it with a new instance. It also updates several contract tests to include Solr and other new endpoints, adds regression tests for the logging fix, and introduces several design-ahead specifications and plans for upcoming features and chores. A critical review comment points out that the rewritten capture_logs fixture in test_position_bias_prior.py will raise an AttributeError if run in isolation when structlog defaults to an immutable tuple of processors; a check should be added to handle this case gracefully.
| configured = structlog.get_config()["processors"] | ||
| saved = configured.copy() | ||
| configured.clear() | ||
| configured.append(cap) | ||
| structlog.configure(processors=configured) | ||
| try: | ||
| yield cap | ||
| finally: | ||
| configured.clear() | ||
| configured.extend(saved) | ||
| structlog.configure(processors=configured) |
There was a problem hiding this comment.
If test_position_bias_prior.py is run in isolation and configure_logging() has not been called yet, structlog.get_config()["processors"] will return a tuple (the default in structlog). Since tuple is immutable and does not have .copy(), .clear(), or .extend() methods, this fixture will raise an AttributeError and fail the test.
To prevent this, check if configured is an instance of list before attempting to mutate it in place. If it is a tuple (or other non-list type), fall back to configuring it directly and restoring the original value on teardown.
| configured = structlog.get_config()["processors"] | |
| saved = configured.copy() | |
| configured.clear() | |
| configured.append(cap) | |
| structlog.configure(processors=configured) | |
| try: | |
| yield cap | |
| finally: | |
| configured.clear() | |
| configured.extend(saved) | |
| structlog.configure(processors=configured) | |
| configured = structlog.get_config()["processors"] | |
| if isinstance(configured, list): | |
| saved = list(configured) | |
| configured.clear() | |
| configured.append(cap) | |
| structlog.configure(processors=configured) | |
| try: | |
| yield cap | |
| finally: | |
| configured.clear() | |
| configured.extend(saved) | |
| structlog.configure(processors=configured) | |
| else: | |
| structlog.configure(processors=[cap]) | |
| try: | |
| yield cap | |
| finally: | |
| structlog.configure(processors=configured) |
…2 drift
- test_judgment_generate::test_happy_path_ac1_ac6: source_breakdown is the
three-term {llm,human,click} shape since feat_ubi_judgments FR-10 (the old
"click folds into human" contract was retired when UBI lists ship click
rows). Update the expected dict to include click:0.
- test_migration_0021_generation_params::test_downgrade_drops_column_then_round_trip:
the test did `alembic downgrade -1` assuming 0021 was head, but head is now
0022_solr_engine_auth_check — so `-1` reversed 0022 (a CHECK relaxation) and
left 0021's generation_params column intact. Target absolute revision 0020 so
0021's downgrade actually runs, robust against future head advances. The
migration itself was correct; only the test's relative target was stale.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
…mini) Adopt Gemini Code Assist's defensive isinstance(list) guard on the in-place processors-list mutation. structlog 25.5.0 returns a mutable list here (verified — the in-place path is live and the test passes standalone), so Gemini's stated AttributeError is not currently reachable; the guard is accepted as cheap future-proofing that mirrors the same isinstance guard in backend/app/core/logging.configure_logging. Falls back to a clean configure/restore if a future structlog hands back an immutable tuple. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
…MVP2 batch Solr is not CI-ready: the pr.yml backend job has no Solr service container (test_demo_seeding_ubi_full ConnectErrors) and the smoke job's Solr container crashes (exited 1) during make up. Pre-existing since Solr + the Solr demo scenario shipped (PRs #336/#348); not introduced by this batch. Captured as a P1 infra idea with three fix options. Dashboards + roadmap regenerated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Gemini Code Assist adjudication
No other review findings. CI: 11/13 checks green, 3168 backend tests pass. The 2 red checks ( |
* docs(state): finalize 3 stale MVP2 folders + close #357 + dashboards Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * docs(dashboards): regenerate after rebase on #385 (reflect both folder-move sets) Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> --------- Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
What this PR does
Drives the next 7 highest-priority MVP2 idea-stage items forward in one batch (one branch, per the single-branch-per-session convention). Item #1 was a bug (fixed for real); items #2–#7 are taken from
idea.md→feature_spec.md+implementation_plan.mdvia idea-preflight → spec-gen → impl-plan-gen, each with the mandatory GPT-5.5 cross-model review. No feature is implemented here — the six are planning artifacts.Code changes (2 commits)
fix(logging)—e4707cd0— closesbug_backend_suite_nondeterministic_caplog_isolation(P1)The backend suite was nondeterministically red:
capture_logs()-based unit tests asserted on[]in the full suite but passed in isolation. The idea'spytest-randomlytheory was wrong (that plugin isn't installed). Real cause:configure_logging()handed structlog a brand-new processors list instance on every call; loggers cache against the instance (cache_logger_on_first_use=True), so a secondconfigure_logging()(e.g. an integration test's FastAPI lifespan) left earlier-bound loggers pointing at a stale list whilecapture_logs()mutated the new one → blind capture. Reproduced deterministically (8 failed → 75 passed). Fix: mutate the list in place. Also fixed a second polluter — a hand-rolled capture fixture intest_position_bias_prior.pydoingconfigure(processors=[cap])+reset_defaults(). Regression test added;bug_fix.mdwritten.fix(tests)—2c287a48— closesbug_contract_allowlists_outdated_after_mvp2_featuresThree hand-maintained contract allowlists never got
'solr'(+ 5 new MVP2 endpoints) after the Solr adapter shipped, leaving the suite deterministically red on a clean tree. Updatedtest_resolve_engine_type_wire,test_health_contractsubsystems, andEXPECTED_ENDPOINTS. Folded in (operator decision) so this batch's CI can go green. Contract layer: 325 passed.Planning artifacts (1 commit,
09585893)feat_query_normalization_tuningPhase 1 merge + operator-friction evidence; plan refuses/impl-executeuntil both clearfeat_overnight_autopilot; readyfeat_query_normalization_tuning; Story 0 hard precondition gateworkers/all.py:225) + corrected the arq version claimUbiRungBadgeon cluster detailEach idea.md was preflighted (stale claims corrected, dependency status recorded); each folder carries a
pipeline_status.mdthrough the Plan stage. Dashboards regenerated.Not in scope
No production feature code. The two design-ahead specs (#2, #4) are explicitly not green-lit for
/impl-executeuntil their Phase 1 dependency merges.🤖 Generated with Claude Code