spec 015: pipeline convergence protocol (closes #239) by jeremymanning · Pull Request #250 · ContextLab/llmXive

jeremymanning · 2026-05-29T18:06:15Z

Summary

Implements spec 015 — Pipeline Convergence Protocol (issue #239). Replaces the legacy accumulated-review-points model (≥10 LLMs / ≥5 humans, 0.5/1.0 points) with a convergence-based gate: each reviewable stage runs identify → revise → re-review with its LLM panel and advances only on unanimous panel acceptance within a 3-round cap, else an adaptive kickback to the prior stage with full provenance. Human/personality reviews are advisory only and route through stage-aware triage.

Key behavior (selected FRs)

Convergence engine: R1 identify → R2 revise → R3 re-review; unanimous-acceptance gate; honest converged reporting (FR-016).
FR-012 selective re-review: dissenters always re-review; R1-accepters re-review only when R2 changed a lens-relevant artifact.
FR-011 reviser self-consistency: a second code-level audit call + one corrective re-pass, exception-guarded.
FR-048 living-document batched recompile: render Discussion → sha256 material-change → FR-054 sign-off gate → version DOI; cron auto-triggers but never auto-mints.
HF Inference-API backend removed — HF models run locally via transformers; backend chain is dartmouth → local.
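
A minimal sketch of the gate described above, assuming a panel of plain callables. The names `Outcome`, `run_convergence`, and the reviewer/reviser shapes are illustrative assumptions, not the signatures in `convergence/engine.py`. For brevity it re-reviews only the R1 dissenters and omits the FR-012 rule for R1 accepters.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical shapes, for illustration only.
Reviewer = Callable[[str], bool]       # returns True to accept the artifact
Reviser = Callable[[str, list], str]   # (artifact, dissenter names) -> revised artifact


@dataclass
class Outcome:
    converged: bool
    rounds_used: int
    dissenters: list = field(default_factory=list)


def run_convergence(artifact, panel, revise):
    """One identify -> revise -> re-review pass with a unanimous-acceptance gate."""
    # R1 identify: every panel member reviews the artifact.
    dissenters = [name for name, review in panel.items() if not review(artifact)]
    if not dissenters:
        return Outcome(converged=True, rounds_used=1)
    # R2 revise: the reviser addresses the dissenters' concerns.
    revised = revise(artifact, dissenters)
    # R3 re-review: dissenters always look again (simplified FR-012).
    remaining = [name for name in dissenters if not panel[name](revised)]
    # Report honestly: converged only if nobody still dissents.
    return Outcome(converged=not remaining, rounds_used=3, dissenters=remaining)
```

A caller would treat `converged=False` as the trigger for a kickback to the prior stage, which this sketch does not model.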

Hardening in this PR

Fixed 2 latent finally: return bugs (implementer/publisher) that double-appended run-log entries on the skip path and swallowed re-raises.
Fixed a real NameError in agents/librarian.py (loop var/body mismatch on the marginal-fallback path), surfaced by the mypy pass.
Introduced LLMXIVE_REPO_ROOT repo-root override (centralized ~60 __file__ climbs) and de-rotted the Phase-3 real-call e2e so it runs hermetically against a synthetic repo (verified: real Specifier+Clarifier run, 95s).
(str, Enum) → StrEnum migration; mypy strict: 213 → 0; ruff check .: clean (repo-wide); offline suite 1232 passed.
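
For readers unfamiliar with the `finally: return` pitfall named above, here is a small self-contained illustration of how a `return` inside `finally` discards an exception that is being re-raised. The function names are hypothetical and this is not the implementer/publisher code; it shows only the swallowed re-raise, not the double-append.

```python
def record_with_bug(log):
    try:
        raise ValueError("publish failed")
    except ValueError:
        log.append("error")
        raise                      # intended to propagate...
    finally:
        log.append("finished")
        return "skipped"           # ...but this return discards the in-flight exception


def record_fixed(log):
    try:
        raise ValueError("publish failed")
    except ValueError:
        log.append("error")
        raise
    finally:
        log.append("finished")     # cleanup only; no return, so the exception propagates
```

Calling `record_with_bug([])` returns `"skipped"` with no error visible to the caller, while `record_fixed([])` raises `ValueError` as intended.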

Verification

ruff check . → All checks passed
mypy src/llmxive → 0 errors (154 files)
offline suite → 1232 passed, 1 skipped, 2 deselected
Phase-3 e2e (real-call) → passes (95s); prompts-check → OK (53 agents)

Note: part 7 of #239 (full sequential end-to-end pipeline run with per-step artifact-quality review) is in progress as follow-up work on this branch.

🤖 Generated with Claude Code

…+ review-model overhaul (#239) Comprehensive Spec Kit specification for umbrella issue #239, grounded in the 2026-05-27 design doc SSoT and a code-verified audit. Covers: the inode-table summarize/desummarize primitive (no silent loss of check-critical elements), the generic identify->revise->re-review convergence engine + adaptive kickback, removal of the point system for unanimous-panel acceptance + advisory triage, per-step ReviewSpec adapters across the whole research + paper track, reviewer calibration (9 domains, held-out generality), end-to-end traversal proof, living-document discussion board, and all 10 audit bug fixes + arXiv resilience. Three scope decisions resolved with maintainer up front (living-doc=full; point cutover=migrate-forward; overflow floor=inode-table pointers). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Five clarifications integrated into the spec (Clarifications + FRs/SCs/scenarios/assumptions):
- Publish target: real public Zenodo/GitHub/site, but a MANDATORY manual maintainer sign-off before every DOI mint for the duration of this spec (new FR-054, SC-014; FR-036/FR-048 updated).
- E2E coverage: all 9 domains traverse end-to-end to posted (FR-045, SC-007).
- Calibration: differential clean-vs-injected test + manual adjudication + adaptive sensitivity tuning (no fixed over-flag % / K) (FR-042, FR-044, SC-005).
- Kickback budget: NO global cap; monotonic-improvement-until-convergence; per-step 3-round cap retained (FR-017, edge case, assumptions).
- Cutover: no posted/done projects exist -> migration applies to in-flight only (FR-025).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

plan.md (Constitution Check: points-removal + no-global-cap tracked as authorized deviations -> constitution amendment task), research.md (10 grounded technical decisions incl. inode-table summarizer format, engine-as-callables, adaptive kickback, manual DOI sign-off, differential calibration), data-model.md (pydantic entities), quickstart.md, and 6 contracts (summarize-api, convergence-engine, reviewspec-registry, review-intake-triage, kickback-record, publisher-signoff). CLAUDE.md SPECKIT ref -> 015. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Organized by user story (US1-US8) with Setup/Foundational/Polish. TDD + real-call + manual-QC tasks included per spec. Dependency chain: summarizer first -> engine -> bug fixes -> review model -> per-step panels -> calibration (9 domains) -> e2e to posted (9 domains, manual DOI sign-off) -> living-doc -> polish. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Closed 4 coverage/underspecification findings from /speckit-analyze (0 remain):
- C1 (HIGH): FR-006 authoring-side overflow routing + paper twins -> T054-T057
- C2 (MED): FR-026 repository_hygiene line-count/gitignore -> T043
- U1 (MED): FR-053 convergence principle encoding -> T007
- U2 (LOW): FR-017 ProgressRecord emission -> T026

Constitution point-conflict (CRITICAL) resolved by explicit amendment task T007.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- T001: new package dirs (convergence/, calibration/, agents/prompts/panels/)
- T002: STATUS.md living progress doc (FR-052)
- T003: Stage.AWAITING_PUBLICATION_SIGNOFF; config CONVERGENCE_MAX_ROUNDS=3 + CONVERGENCE_PER_ROUND_BUDGET_SECONDS=600.

Imports verified.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

New SSoT primitive src/llmxive/tools/summarize.py: summarize()/desummarize() with on-disk inode-table pointer hierarchy. Deterministic no-loss guarantee (URLs/DOIs/ arXiv/citations/FR-SC-task ids/numbers preserved verbatim; full content on disk, recursively paged in). 12 tests pass (7 edge cases + core no-loss + manifest contract + no-dangling-pointer); ruff + mypy clean. Remaining for US1: T009 real-call fidelity, T017 re-point paper_reviewer (SSoT), T018 real-call verification. See STATUS.md. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

_build_corpus_with_summaries now delegates context reduction to tools/summarize.summarize() (inode-table, no silent truncation), preserving the 1-arg summarize_fn contract + _cached_summarize memoization. Supersedes the old truncate-with-notice fallback (Const. I SSoT). Updated the 2 coupled unit tests to the new behavior (full source recoverable via desummarize); _chunk_corpus + its 3 tests untouched. 24 paper_reviewer + 12 summarizer tests pass; mypy-clean for the changed function. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

tests/real_call/test_summarize_fidelity.py: real qwen3.5-122b summarize_fn over an over-budget doc; desummarize recovers EVERY critical element verbatim (no loss through a real-LLM reduction). PASSED in 334s. US1 (summarizer) fully done & verified: 12 offline + 1 real-call, ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…tion (#239) - T004/T005: convergence/types.py — Severity (ordered + legacy mapping) and the Concern/ConcernResponse/Verdict/ProgressRecord/ConvergenceResult/KickbackRecord/ TriageRecord pydantic models + Reviewer/Reviser Protocols + ReviewSpec dataclass. - T006: tests/contract/test_convergence_types.py (7 pass; ruff + mypy clean). - T007: constitution -> v1.1.0; added Principle VI (Convergent Review, NON-NEGOTIABLE), replaced the point-based Review-thresholds gate with unanimous-panel convergence + advisory triage, Sync Impact Report updated. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

convergence/engine.py run_convergence: identify->revise->re-review loop with honest converged reporting (FR-016), 3-round cap, self-review/producer exclusion + stale-never-passes (FR-018), per-round wall-clock budget (FR-013), and overflow inputs routed through tools/summarize (FR-006). convergence/kickback.py route_kickback (adaptive worst-severity->stage, full-provenance KickbackRecord) + progress_record (FR-017). 15 unit tests pass; ruff + mypy clean. US2 remaining (coupled to US4/US3): T021 real-project integration, T025 advancement.py _produced_by stub, T027 tasker Mode-A/B refactor into the engine. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Addressed the tech debt I had flagged (per "fix issues as you notice them"):
- types-PyYAML dev dep -> yaml stubs resolve under `python -m mypy` (clears yaml errors codebase-wide).
- ReviewRecord.score: invalid Literal[float] -> float + field_validator (PEP 586; identical {0.0,0.5,1.0} constraint).
- paper_reviewer: list[dict]->list[dict[str,Any]]; text coerced to str.
- removed 2 unused PaperReviewerAgent imports in test.
- FIX: T003 added Stage.AWAITING_PUBLICATION_SIGNOFF but not the project-state schema enum -> contract test failed; added it (single SSoT schema).
- FIX: T001 panels dir was under src/llmxive/agents/prompts/ but prompts live at repo-root agents/prompts/ -> relocated; corrected 7 path refs in tasks.md.

Finding (STATUS.md): project does NOT gate on ruff/mypy (no config, no CI step; gates = pytest + checks.*). ~273 legacy mypy errors are pre-existing, out of #239.

Focused regression: 92 passed (all contract + score/paper_reviewer/convergence).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…239) New agents/prompts/implementer_research.md: instructs the research speckit implementer to emit the artifacts/verdict YAML the parser expects (write real runnable code/data, no stubs/diffs, fail-loud verdicts). implement_cmd.py now renders it instead of the paper-revision LaTeX implementer.md (which stays for the separate paper-revision agent). Also fixed 2 pre-existing ruff nits in implement_cmd.py (I001 import sort, F541) since I touched the file. tests/integration/test_audit_bugfixes.py verifies the fix (2 pass). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

theoremsearch.search() now retries transient failures (429/500/502/503/504 + RequestException/timeout) with exponential backoff (MAX_TRANSIENT_RETRIES=3), then degrades via TransientBackendError (the librarian wrapper already treats that as "optional source unavailable"). Non-transient 4xx are not retried. retry_backoff_base_seconds is injectable (tests pass 0). 4 unit tests; ruff+mypy clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
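
The retry policy in this commit can be sketched as follows. This is a stdlib-only illustration under stated assumptions: `HttpStatusError`, `search_with_retries`, and the injected `fetch`/`sleep` callables are hypothetical stand-ins, and treating `MAX_TRANSIENT_RETRIES` as "retries after the first attempt" is a guess about the real semantics. Only the constant name, the status codes, `TransientBackendError`, and the injectable `retry_backoff_base_seconds` come from the commit message.

```python
import time

MAX_TRANSIENT_RETRIES = 3
TRANSIENT_STATUSES = {429, 500, 502, 503, 504}


class HttpStatusError(Exception):
    """Hypothetical error carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


class TransientBackendError(Exception):
    """Raised once transient retries are exhausted, so callers can degrade."""


def search_with_retries(fetch, retry_backoff_base_seconds=1.0, sleep=time.sleep):
    for attempt in range(MAX_TRANSIENT_RETRIES + 1):
        try:
            return fetch()
        except HttpStatusError as exc:
            if exc.status not in TRANSIENT_STATUSES:
                raise  # non-transient 4xx: surface immediately, no retry
            if attempt == MAX_TRANSIENT_RETRIES:
                raise TransientBackendError("retries exhausted") from exc
            # Exponential backoff: base, 2*base, 4*base, ...
            sleep(retry_backoff_base_seconds * (2 ** attempt))
```

Passing `retry_backoff_base_seconds=0` keeps tests fast, which mirrors the injection point the commit describes.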

#239) Full offline suite verified green: tests/contract + 599 tests/unit (7.45s) + real-call summarize_fidelity. Flagged pre-existing live-PDF test in tests/unit (not CI-gated, hangs offline) for separate gating. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…yze (#239) Discrepancy #4 fix: ANALYZE_SYSTEM_PROMPT_PATH was defined but unused (inline prompt hardcoded; paper reused research tasker.md). Now there are TWO real analyze prompts that ARE used via render_prompt: - agents/prompts/analyze.md (research): requirements_coverage / internal_consistency / testability / scope / constitution_alignment lenses (same vocabulary as the US4 Tasks panel). - agents/prompts/paper_analyze.md (paper): reader_scenario_coverage / claims_supported / required_sections_figures / scope_vs_research / internal_consistency / constitution_alignment. run_analyze() gains kind={"research","paper"} + constitution_text kwargs. paper_tasks_cmd passes kind="paper" + paper constitution; tasks_cmd passes research constitution (FR-030: constitution is a standard analyze input from `specified` onward). 6 audit-bugfix tests + 38 phase4 integration tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

clarifier.attempts_so_far was hardcoded 0 (escalation unreachable) and paper_clarifier never branched on verdict=escalate AND silently substituted a "Resolved by default" stub on missing patches, a no-silent-shortcuts violation.

Fixes:
- New shared _clarify_attempts.py: persists per-project attempt count under .specify/memory/clarifier_attempts.yaml; bump/read/reset + write_human_input_needed.
- Both clarifiers now read REAL attempts and pass them to the prompt.
- Both branch on verdict=escalate -> write human_input_needed.yaml + raise.
- Both escalate at TASKER_MAX_REVISION_ROUNDS (=5) -> write human_input_needed.yaml + raise.
- paper_clarifier no longer substitutes the silent "Resolved by default" stub (matches research clarifier's loud failure behavior).
- Also removed 2 pre-existing F841 dead locals in clarify_cmd._spec_path.

29 tests pass (audit + phase3 integration); ruff clean for touched files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…239) paper_specifier.md advertised `code_summary` / `data_summary` inputs that the code never supplied (silent drift between prompt and reality). paper_specify_cmd now injects both blocks into the user message, reusing research_reviewer's _summarize_tree() as the SSoT tree-summary helper — Const. I (share, don't fork). The advertised inputs ARE now present, grounding the paper-spec generation in the project's actual code/ and data/ trees. 11 audit-bugfix tests pass; ruff clean for touched files. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…R-054) (#239) Discrepancy #2 fix (FR-036): graph._decide_next_stage no longer shortcuts PAPER_ACCEPTED -> POSTED. It now routes paper_accepted -> AWAITING_PUBLICATION_SIGNOFF, then AWAITING_PUBLICATION_SIGNOFF -> POSTED ONLY when the maintainer sign-off record exists. The PaperPublisher itself enforces the same gate (defense-in-depth) — at PAPER_ACCEPTED or AWAITING_PUBLICATION_SIGNOFF with NO signoff record it SKIPs with a clear "awaiting manual maintainer DOI sign-off (FR-054)" reason. No Zenodo DOI is minted without recorded approval. New surface: - src/llmxive/speckit/_publication_signoff.py: read/write/has/clear_signoff persistence under <project>/.specify/memory/publication_signoff.yaml; FR-054 who/when/what record (kinds "initial" / "version"). - `llmxive project publish-approve <PROJ-ID> --who X --what Y [--kind initial|version]` CLI command writes the sign-off record. - 6 new audit-bugfix tests + 27 publisher/graph regression tests pass. Also fixed 38 pre-existing ruff issues in touched files (auto-fix). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
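
The routing described in this commit can be sketched as a small pure function. The enum string values and the `decide_next_stage` signature are assumptions for illustration; only the stage names and the "POSTED only with a sign-off record" rule come from the message above.

```python
from enum import Enum


class Stage(str, Enum):
    # String values are hypothetical; the member names follow the commit message.
    PAPER_ACCEPTED = "paper_accepted"
    AWAITING_PUBLICATION_SIGNOFF = "awaiting_publication_signoff"
    POSTED = "posted"


def decide_next_stage(current: Stage, has_signoff: bool) -> Stage:
    if current is Stage.PAPER_ACCEPTED:
        # No shortcut to POSTED: always stop at the sign-off gate first.
        return Stage.AWAITING_PUBLICATION_SIGNOFF
    if current is Stage.AWAITING_PUBLICATION_SIGNOFF and has_signoff:
        return Stage.POSTED
    # Without a recorded sign-off the project stays where it is.
    return current
```

Note that even with `has_signoff=True`, a project at `PAPER_ACCEPTED` moves only to the sign-off stage in one step, so the gate is always visited.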

Discrepancy #7 fix (FR-018): advancement._produced_by was a stub returning None. It now scans state/run-log/<YYYY-MM>/*.jsonl for the latest entry whose outputs list contains the artifact path and returns that entry's agent_name. Exact + suffix path matching tolerates relative-vs-absolute bookkeeping. A repo_root kwarg keeps the production call (no repo_root) working while making tests hermetic. Defensive: returns None on missing run-log instead of raising.

T029: the audit-bugfix test file (now 18 tests) verifies T030/T031/T032/T033/T034/T035/T025 fixes.

38 tests pass (audit + advancement regression).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
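
A rough sketch of the run-log scan this commit describes, using only the standard library. The function name `produced_by`, the helper `_same_path`, and the assumption that "latest" means the last matching line across month directories in sorted order are illustrative; the `outputs` and `agent_name` field names and the directory layout come from the message above.

```python
import json
from pathlib import Path
from typing import Optional


def _same_path(recorded: str, wanted: str) -> bool:
    # Exact match, or one path is a suffix of the other (relative vs absolute).
    return (recorded == wanted
            or recorded.endswith("/" + wanted)
            or wanted.endswith("/" + recorded))


def produced_by(artifact_path: str, repo_root: Path) -> Optional[str]:
    log_root = repo_root / "state" / "run-log"
    if not log_root.is_dir():
        return None  # defensive: a missing run-log is not an error
    producer = None
    # Month directories (YYYY-MM) sort chronologically by name.
    for log_file in sorted(log_root.glob("*/*.jsonl")):
        for line in log_file.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            entry = json.loads(line)
            if any(_same_path(out, artifact_path) for out in entry.get("outputs", [])):
                producer = entry.get("agent_name")  # keep the latest match
    return producer
```

Passing `repo_root` explicitly is what lets a test point the scan at a temporary directory instead of the real repository.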

…to US3 (#239) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

New convergence/triage.py — stage-aware triage for submitted human + simulated- personality reviews. Three filters: quality (length + evidence-indicator regex sweep — FR/SC/T ids, citations, URLs, DOIs, quoted phrases, code fences, scientific topic vocab), safety + on-topic (rule-based stop-list + stage/lens vocabulary overlap), and aspect-mapping to LLM reviewer lenses (preserved but mapped_lenses=[] when no match -> routes to the step's generic reviewer per FR-022). Injectable judge_fn for the real-LLM path (US4 wiring); rule-based default keeps unit tests offline. tests/integration/test_triage.py: 8 tests covering quality pass/fail, safety exclusion, off-topic exclusion, lens mapping, unmapped-but-preserved, record provenance, and the judge_fn injection override. All pass; ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

) Rewrote the user-facing status-model descriptions in README + web/index.html + docs/index.html (HTML mirror copy) to convergence semantics: identify -> revise -> re-review; unanimous panel acceptance within a 3-round cap; advisory triage for human + simulated-personality reviews; no accumulated points. Replaces 6 stale "points threshold" / "Human reviews count double" passages. status_reporter.py + repository_hygiene.py needed no change for the new status model — their FR-026 duties (projects.json regen, GitHub issue comment/close on POSTED, line-count delta, gitignore assertions) are not point-dependent and remain in force unchanged. The points_research_total / points_paper_total fields the web JS displays will be removed in a follow-up (part of T041 point-system removal). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


…239) Discrepancy #9 + Const. I cleanup: the accumulated review-point system is gone from the advancement decision path. Unanimous LLM-panel acceptance is now the sole gate everywhere (research + paper both).

advancement.py:
- Research-review gate no longer reads `accept_total` / `RESEARCH_ACCEPT_THRESHOLD`. It now uses `_all_specialists_accept(records, required)` with a defensive backstop (require ≥1 accept AND zero non-accept records when the registry isn't loaded), mirroring the paper-side default.
- Paper-review gate's `_award_review_points` call removed (the all-specialists-accept-most-recent check was already the real decision).
- `_award_review_points` definition DELETED (no remaining callers).
- `RESEARCH_ACCEPT_THRESHOLD` import dropped; replaced with an FR-019 comment.

config.py:
- `RESEARCH_ACCEPT_THRESHOLD` and `PAPER_ACCEPT_THRESHOLD` constants kept for back-compat with `web/about.html` mirror consumers, but VALUES set to 0.0 and no advancement code reads them.

T038 tests (`tests/integration/test_no_points.py`, 3 tests): grep guard + behavioral assertion that no-accept records cannot trip the gate.

T044: per clarify Q3 there are no posted/done projects to grandfather; the gate change applies on next tick automatically, so no data-migration logic is needed.

Broad regression: 784 passed, 1 skipped (was 781; three new T038 tests added).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
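
To make the unanimous gate and its backstop concrete, here is a sketch under assumed data shapes. The record layout (`{"reviewer": ..., "verdict": ...}`) and the "most recent verdict per reviewer wins" rule are assumptions for illustration; the real `_all_specialists_accept` may differ.

```python
def all_specialists_accept(records, required):
    """Return True only when the panel is unanimous.

    records:  list of dicts with hypothetical keys "reviewer" and "verdict",
              in chronological order.
    required: reviewer names from the registry; empty if it is not loaded.
    """
    if required:
        latest = {}
        for rec in records:  # later records override earlier ones
            latest[rec["reviewer"]] = rec["verdict"]
        # Every required specialist must have a most-recent "accept".
        return all(latest.get(name) == "accept" for name in required)
    # Backstop when the registry is unavailable:
    # at least one accept AND zero non-accept records.
    verdicts = [rec["verdict"] for rec in records]
    return "accept" in verdicts and all(v == "accept" for v in verdicts)
```

With an empty record list the backstop returns `False`, so an absent panel can never advance a project by default.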


src/llmxive/convergence/reviewspecs.py: reviewspec_for(stage) -> ReviewSpec | None. 9 stage entries (idea + 4 research + 4 paper) matching contracts/reviewspec- registry.md; EXEMPT_STAGES frozenset of 7 mechanical steps. Constitution input is True for every spec from `specified` onward (FR-030); idea-stage opts out (no constitution yet). Kickback routing per the contract's worst-severity -> prior-stage table. Stages whose panel prompts (T049-T053) or wiring (T054-T059) haven't landed yet get _TodoReviewer / _TodoReviser placeholders that conform to the Protocol but raise NotImplementedError with a clear pointer to the follow-up task -- fail-loud SSoT structure, no silent empty verdicts. 15 contract tests pass; ruff clean. Also marked T060 (constitution-as-analyze-input, done in T031) and T061 (publisher wired into graph, done in T035) as already complete. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


US4 panel-prompt authoring: 27 lens prompts + 1 SSoT shared block + a contract test that catches future registry/file-name drift. agents/prompts/_shared/panel_review_block.md - SSoT (Constitution Principle I) for the panel R1/R3 output contract. Severity vocabulary matches the spec-015 Severity enum (trivial → fatal); identify and re-review phases both defined. agents/prompts/panels/ — 27 files total T049: panel_idea_{rq_validity,novelty,feasibility,idea_quality}.md T050: panel_spec_{requirements_coverage,internal_consistency,testability,scope}.md T051: panel_plan_{methodology,spec_coverage,data_resources,consistency}.md T052: panel_tasks_{coverage,ordering,executability,constraint_preservation}.md T053: panel_paper_spec_* (4) + panel_paper_plan_* (3) + panel_paper_tasks_* (4) Each per-lens file is thin: lens + scope ("what NOT to flag") + inputs (constitution from `specified` onward per FR-030) + per-severity-class guidance + reference to the SSoT block. T054-T059 wiring will concatenate lens-prompt + SSoT-block at render time. tests/contract/test_panel_prompts.py (16 tests) - Every lens in the ReviewSpec registry resolves to a real prompt file. - Every panel file references the SSoT block (Principle I drift guard). - Every panel file has `## Lens` and `## Output format` sections. - Reuse-stages (research_review/paper_review) map to existing specialist files, with the _research/_paper suffix convention preserved. - The SSoT block enumerates every Severity enum value + defines R1 and R3. Tech debt fixed inline (surfaced by ruff+mypy installation in venv): - reviewspecs.py: _todo_reviewers now returns list[Reviewer] (list is invariant). Removed an unused `# type: ignore`. - triage.py: JudgeFn return-type narrowed to dict[str, object]; the mapped_lenses access narrowed with isinstance(list|tuple) at the callsite — honest about the contract boundary rather than ignore. 
Verification:
- ruff check src/llmxive/convergence + summarize.py: All checks passed
- mypy src/llmxive/convergence + summarize.py: 0 errors (7 source files)
- pytest tests/contract: 43 passed
- pytest 4 conv-related unit files: 27 passed
- pytest 3 spec-015 integration files: 29 passed
- llmxive.checks.prompts: OK (53 agents)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Spec convergence unit: the new SpecReviser implements the Reviser Protocol and folds BOTH `[NEEDS CLARIFICATION]` marker resolution AND every panel concern into ONE LLM round. This is the spec-015 "collapse" — the previous two-step author + refine flow becomes one R2 call that produces a fully- revised spec.md plus a per-concern change-log. src/llmxive/convergence/revisers/spec_reviser.py - `SpecReviser` class (Reviser-protocol-conformant): constructed with (backend, repo_root, project_id, model?, token_budget?, cache_dir?). - `.revise(artifacts, concerns)`: - Picks the spec.md artifact (suffix match; excludes paper-side spec). - Gathers idea text from artifacts (`idea/` keys). - Overflow routing (FR-006): when bundle approx-tokens > budget, routes idea + comments_block through `tools.summarize.summarize` with a preservation goal that pins FR/SC ids verbatim. spec.md itself is NEVER summarized — the reviser must see what it's editing. - Composes a system (clarifier.md SSoT) + user (current spec + concerns + remaining markers + comments) prompt asking for ONE JSON document with `new_spec_md` + `responses[]`. - Honest failure modes: missing `new_spec_md` raises; non-JSON raises; fewer responses than concerns → padded with `<missing>` entries (Constitution Principle II: no silent omission). - `_scan_markers` + `_strip_json_fences` helpers (testable in isolation). src/llmxive/convergence/revisers/__init__.py - Package docstring documenting the build_*_reviewspec pattern. src/llmxive/convergence/reviewspecs.py - New `build_spec_reviewspec(backend=, repo_root=, project_id=, model=?)` returns a LIVE ReviewSpec for the spec stage with the SpecReviser bound as `.reviser`. Static `reviewspec_for("clarified")` still returns the TodoReviser placeholder; the build_* path is the live wiring (T058 will add reviewer-side wiring for the panel). - Local import of SpecReviser keeps the static-registry import graph clean for callers that never touch the live path. 
tests/integration/test_spec_reviser.py (8 tests) - `_scan_markers` handles bracket + bold marker forms; returns empty on clean specs. - `_strip_json_fences` handles fenced + bare JSON. - End-to-end revise: backend called with system+user; new spec text written; markers resolved; ConcernResponse per concern. - Padded missing responses: backend omits one concern → `<missing>` marker preserved (honest no-silent-omission). - Missing `new_spec_md` → RuntimeError. - Non-JSON reply → RuntimeError. - No spec.md in artifacts → ValueError (engine misuse). Verification - ruff check src/llmxive/convergence + tests: All checks passed - mypy src/llmxive/convergence + summarize.py: 0 errors (9 source files) - pytest tests/integration/test_spec_reviser.py + tests/contract: 51 passed - pytest broader unit + integration suite: 52 passed (no regressions) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

#58) The audit found the publisher fix was fake: PaperPublisher was referenced only by its own class + tests + the EXEMPT list — NEVER invoked by the live pipeline. _decide_next_stage just flipped PAPER_ACCEPTED -> AWAITING_PUBLICATION_SIGNOFF -> POSTED with a comment *claiming* 'the publisher assembles the manuscript during this transition', but no code ran it. So no DOI, no final compile, no publication.yaml was ever produced. Wire it for real: - STAGE_TO_AGENT[AWAITING_PUBLICATION_SIGNOFF] = 'paper_publisher'; register PaperPublisher in _NON_SPECKIT_AGENTS + import it. run_one_step now dispatches the publisher at AWAITING; it self-gates on the FR-054 maintainer sign-off (no sign-off -> no-op stays awaiting; sign-off -> compile + Zenodo DOI + publication.yaml -> POSTED). - Make the publisher the SOLE driver of -> POSTED: _decide_next_stage(AWAITING) no longer auto-advances to POSTED on has_signoff. Previously, if the publisher hadn't run (it never did) but a sign-off existed, the graph would mark the project POSTED with no DOI/publication.yaml. Now only the publisher's own successful self-transition reaches POSTED. - Fix the issue-close hook: it fired only on a graph-detected transition, but the publisher self-transitions (project sees no next_stage change). Capture entry_stage and fire the hook whenever the project REACHES POSTED this step. - Update the brittle source-grep test to assert the real wiring (STAGE_TO_AGENT mapping + no graph-side auto-POSTED) instead of a since-removed import. Verified: ruff+mypy clean; offline suite 1232 passed / 1 skipped. Direct end-to-end verification (publisher mints a real DOI gated on sign-off) is part 7. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ew (FR-025) Spec 015 deleted the transient revision stages but the FR-025/T044 in-flight migration never reached 8 real projects left at removed stages: paper_minor_revision (7): PROJ-564/565/566/568/570/571/576 ready_for_implementation (1): PROJ-578 Neither value is in the Stage enum or the project-state schema, so project_store.load() (validate -> model_validate) RAISED on them — the projects were unloadable: invisible to the pipeline, web_data, and status_reporter (surfaced by tests/phase2/test_web_data_blocks). Verified these states are unreachable under the new architecture (the user's condition for a direct fix): not in the enum/schema, and no code assigns current_stage to either (the lone 'ready_for_implementation' literal is a revision-round final_outcome LABEL, not a stage). So a one-time data migration is correct — no load-time shim needed. All 8 are paper-track (have paper/ + completed 12-panel reviews, no research specs/), so per FR-025 'migrate forward + re-evaluate under unanimous convergence on next tick' they re-enter at paper_review (the paper convergence unit re-runs; the engine kicks back to paper revision if concerns remain). Only the current_stage line changed (file formatting + dead-but-present legacy point fields preserved, per research.md). All 639 project states now load. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The differential calibration driver (run_calibration.py) omitted the 'idea' stage, so '--stage idea' was rejected and '--stage all' skipped it — leaving the mandated circular-RQ negative (one of the 6 required flaws -> rq_validity) with no real-call differential wiring, even though its on-disk labeled set (calibration/idea/negative_circular_rq.md), the build_idea_reviewspec panel (rq_validity/novelty/feasibility/idea_quality), and the data-layer unit test already existed. Wires the idea entry: _STAGES['idea'] + build_idea_reviewspec import; _artifact_key_for_stage('idea') = a /idea/...md path (FleshOutReviser requires the idea md under such a key); _supporting_artifacts_for_stage('idea') supplies only __comments_block__ (idea is the earliest stage — constitution_input=False, so no constitution is injected). All 6 mandated flaws are now end-to-end calibration-runnable. ruff clean; argparse now accepts --stage idea. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ESTS Pre-existing bug noticed during the e2e investigation: tests/phase1, phase2, and e2e contained real network/LLM/browser-call tests gated ONLY on credential presence (e.g. skipif(not HAS_DM_KEY), both_keys_required) — NOT on the repo's real-call opt-in LLMXIVE_REAL_TESTS. In any env with keys but no/ slow network (the default dev sandbox), they EXECUTED and HUNG forever on a network socket (0% CPU) — e.g. test_librarian_cross_domain[biology], test_site.py (browser). This made `pytest tests/phase2` / tests/e2e un-runnable offline. Gates every such test (AND-ed with existing key gates) behind `_REAL = os.environ.get('LLMXIVE_REAL_TESTS') == '1'`, matching the repo's established convention (test_math_classifier, test_submission_*, the tests/real_call conftest). Offline tests in mixed files stay ungated. Verified: collection clean (262, no sockets); default `pytest tests/phase2` = 165 passed / 45 skipped, NO HANG; the previously-hanging cross_domain test now skips in 0.05s; phase1/e2e no hang; ruff clean. No src changes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
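
The gating rule above (opt-in flag AND-ed with the existing credential check) can be sketched without third-party dependencies. This version uses stdlib `unittest.skipUnless` rather than pytest's `skipif` so it runs anywhere; the helper names and the `DARTMOUTH_API_KEY` variable are hypothetical, while `LLMXIVE_REAL_TESTS` and the `== '1'` comparison come from the commit message.

```python
import os
import unittest


def real_tests_enabled(env=None):
    # Opt-in must be exactly "1", matching the convention in the commit message.
    env = os.environ if env is None else env
    return env.get("LLMXIVE_REAL_TESTS") == "1"


def should_run(has_key, env=None):
    # AND the opt-in with the pre-existing credential gate.
    return bool(has_key) and real_tests_enabled(env)


# Hypothetical credential variable name, for illustration only.
HAS_DM_KEY = bool(os.environ.get("DARTMOUTH_API_KEY"))


class LibrarianCrossDomain(unittest.TestCase):
    @unittest.skipUnless(should_run(HAS_DM_KEY), "needs API key and LLMXIVE_REAL_TESTS=1")
    def test_real_call(self):
        pass  # a real network call would go here
```

Because the skip condition is evaluated at import time, a sandbox with keys but no network skips the test immediately instead of blocking on a socket.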

…alization Regression from this PR's repo-root refactor (bf94af4a): ProjectInitializerAgent now resolves its repo via llmxive.config.repo_root() instead of climbing from its own module __file__. The 3 fake-repo idempotency tests isolated by monkeypatching pi_mod.__file__ — now INERT — so the agent wrote projects/<id>/.specify/memory/constitution.md into the REAL repo: this both failed test_project_initializer_writes_on_first_invocation (it asserted against the tmp fake_repo) AND polluted the real projects/ tree with PROJ-test-* dirs. Undetected because tests/phase1 is outside the default contract+integration+unit suite scope. Fix: point repo_root() at the fake repo via monkeypatch.setenv('LLMXIVE_REPO_ROOT', fake_repo) in all 3 tests (the mechanism the refactor introduced). Agent now writes to tmp; no real-repo pollution. Removed the stray PROJ-test-* dirs + two run-log entries the buggy runs had committed-adjacent. Verified: tests/phase1/test_idempotency.py -> 4 passed; no PROJ-test-* created in the real repo. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
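
The reason patching a module's `__file__` became inert can be seen from a sketch of an environment-based root resolver. The fallback to the current directory and the `constitution_path` helper are assumptions for this example; the real `llmxive.config.repo_root()` may resolve its default differently.

```python
import os
from pathlib import Path


def repo_root() -> Path:
    override = os.environ.get("LLMXIVE_REPO_ROOT")
    if override:
        return Path(override).resolve()
    # Assumed fallback for this sketch; the real function may climb from __file__.
    return Path.cwd().resolve()


def constitution_path(project_id: str) -> Path:
    # Hypothetical helper: everything derives from repo_root(), never from __file__.
    return repo_root() / "projects" / project_id / ".specify" / "memory" / "constitution.md"
```

In a pytest test the override would be applied with `monkeypatch.setenv('LLMXIVE_REPO_ROOT', str(fake_repo))`, which pytest undoes automatically after the test.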

…leanup + runlog hardening

Three audit findings + a latent bug surfaced while fixing the first:
- FIX 3 (discrepancy #7/#49): research_reviewer/paper_reviewer passed produced_by_agent=None (self-review-prevention stub). Added runlog.producer_of_artifact(project_id, artifact_path), which resolves the agent that actually recorded the artifact in its run-log outputs (newest-first, suffix path-match), and wired it into both reviewers. (personality.py left as None by design: a '(simulated)' persona never authors an artifact.)
- LATENT BUG hardened while doing FIX 3: run-log .jsonl files also hold FOREIGN records (personality-activity rows written by personality.py: action/personality_slug/... no run_id). read_entries / latest_for_project / producer_of_artifact did RunLogEntry.model_validate_json with no guard -> crash on the first such line (latest_for_project only dodged it by reverse-scan early-return luck). Added _parse_run_log_entry() that skips non-RunLogEntry lines, catching PYDANTIC's ValidationError (runlog had imported jsonschema's ValidationError, so even the new guard wouldn't have caught the pydantic one; fixed). Regression test in test_runlog_producer.py.
- FIX 4b (summarizer §3a): _render_pointer_block inlined critical elements PER CHUNK and broke out on budget overflow, so under a tight budget a reviewer's block could contain only some, or zero, critical elements (they survived on disk but not in what the panel reads). Now inlines the FULL deduped critical-element list FIRST and unconditionally (prose is what's bounded); 500-URL/budget-300 repro -> 500/500 in the block. Test updated to the correct contract (block <= budget + critical-list).
- FIX 4a (discrepancy #9): deleted the unused RESEARCH_ACCEPT_THRESHOLD / PAPER_ACCEPT_THRESHOLD constants (DEFAULTS + module vars + __all__): no consumer anywhere (the 'back-compat' comment was stale).

Verified: ruff (whole repo) clean; mypy src 0; offline suite 1235 passed / 1 skipped / 2 deselected.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
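The FIX 4b contract (critical elements first and unconditional, prose bounded by the budget) can be sketched as below. The function name, the character-based budget, and the newline-joined layout are assumptions for illustration; the real _render_pointer_block may measure the budget differently.

```python
from typing import Iterable, List


def render_pointer_block(
    critical: Iterable[str], prose_chunks: List[str], budget: int
) -> str:
    """Render a reviewer block: every critical element, then bounded prose.

    Hypothetical sketch. The budget is counted in characters and applies
    to prose only, so the critical list can never be truncated.
    """
    # Deduplicate while preserving first-seen order.
    lines = list(dict.fromkeys(critical))

    used = 0
    prose: List[str] = []
    for chunk in prose_chunks:
        if used + len(chunk) > budget:
            break  # only prose is cut off on overflow
        prose.append(chunk)
        used += len(chunk)

    return "\n".join(lines + prose)
```

Under this shape the block size is bounded by the budget plus the critical list, which matches the updated test contract described above.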

…FR-027/028 / #239 core)

Completes the central #239 deliverable left as placeholders: the per-step multi-lens convergence panels are now INVOKED in production for every reviewable doc-stage (previously the agent ran + advanced with no panel).

- New shared helper src/llmxive/speckit/_stage_panel.py::run_stage_panel: in-cmd engine path (mirrors paper_implement_cmd) via run_engine_for_project. converged -> advance; kickback -> write human_input_needed.yaml + raise StagePanelKickback (no advance); engine exception -> escalate (StagePanelEscalation). Sources each stage's required __X__ extras from the REAL project artifacts (idea md, comments, spec/plan, constitution, templates); empty string for legitimately-absent inputs (FR-049).
- Wired into write_artifacts of: clarify_cmd (spec), plan_cmd (plan), paper_clarify_cmd (paper_spec), paper_plan_cmd (paper_plan), paper_tasks_cmd (paper_tasks). Each guards backend-None (offline) -> skip gracefully (the agent already produced the artifact; never crash the stage).
- tasks: _tasker_engine_bridge no longer OVERWRITES the panel with a single analyze reviewer. It now runs the live 4-lens build_tasks_reviewspec panel ALONGSIDE the analyze-derived reviewer (keeps spec-014 honest-reporting AND the LLM lenses; placeholder reviewers filtered when backend is None).
- 11 new integration tests (fake backend): per doc-stage, panel-invoked + advance on all-accept, and kickback marker written + no-advance on a fail verdict; + a tasks-bridge test proving the live panel runs with analyze.
- Corrected 2 tests written against the bug (panel bypassed) to be panel-aware fake backends WITHOUT weakening their disk-state/convergence assertions.

Verified: ruff src clean; mypy src 0 (155 files); offline suite 1246 passed / 1 skipped / 2 deselected (1235 + 11). No real LLM calls; PROJ-552 untouched by this change. Real-call verification on PROJ-552 follows.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…y calls (F-13)

Found by the real-call e2e (PROJ-552 spec panel): the convergence panel (LLMReviewer), the revisers, and the FR-011 self-consistency audit call backend.chat DIRECTLY (not via the router), and invoke_reviser_backend + self_consistency_pass passed NO max_tokens. Dartmouth then omits the field, so the API applies its OpenAI-shaped 512-token default. qwen3.5-122b is a *reasoning* model: its hidden chain-of-thought consumed all 512 tokens before emitting any answer -> empty content + finish_reason=length -> TransientBackendError -> the stage panel escalated to human_input_needed. This broke EVERY reasoning-model panel/reviser/self-consistency call in production.

Fix: pass a reasoning-safe budget (131072, matching chat_with_fallback's default; qwen's 256K context leaves ample input room) on these direct backend.chat calls, via a shared _chat_reasoning_safe() helper that degrades gracefully for backends / test fakes whose signature omits the kwargs (TypeError -> retry without max_tokens, then bare). LLMReviewer's default max_tokens 8192 -> 131072 for consistency + safety on complex lenses.

Verified live: after the fix the spec panel RUNS the full 4-lens x 3-round loop (R1 produced 10 substantive, well-calibrated concerns) instead of failing at 512. Offline suite 1246 passed / 1 skipped / 2 deselected; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
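The graceful-degradation ladder described for _chat_reasoning_safe() might look roughly like the sketch below. The `backend.chat(messages, model=..., max_tokens=...)` call shape is an assumption about the backend interface, and the function is a hypothetical stand-in rather than the helper in the PR.

```python
from typing import Any, Optional

REASONING_SAFE_MAX_TOKENS = 131072  # budget quoted in the commit message


def chat_reasoning_safe(
    backend: Any,
    messages: list,
    *,
    model: Optional[str] = None,
    max_tokens: int = REASONING_SAFE_MAX_TOKENS,
) -> Any:
    """Call backend.chat with a large token budget, degrading on TypeError.

    Assumed signature ladder: full kwargs -> without max_tokens -> bare.
    """
    try:
        return backend.chat(messages, model=model, max_tokens=max_tokens)
    except TypeError:
        pass  # backend (or test fake) does not accept max_tokens
    try:
        return backend.chat(messages, model=model)
    except TypeError:
        # Last resort: positional messages only.
        return backend.chat(messages)
```

One trade-off of this shape is that a TypeError raised inside a backend for unrelated reasons also triggers the retries; inspecting the signature up front would be a stricter alternative.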

Real-call e2e bug: every convergence reviser instructed the LLM to embed the full revised document(s) as JSON STRING value(s) (new_*_md) and then ran a bare json.loads on the result. A ~19K-char spec/plan/paper doc full of quotes/$/backslashes made the model emit invalid JSON (one unescaped quote ends the string early -> "Expecting , delimiter ... char 19455"), crashing R2 of the convergence loop on EVERY reviewable stage.

Fix: new shared src/llmxive/convergence/revisers/_reviser_response.py:

- RESPONSE_FORMAT_BLOCK: a SMALL fenced json change-log, then each full artifact VERBATIM between ===BEGIN_ARTIFACT <repo-rel-path>=== / ===END_ARTIFACT=== markers (raw: quotes/$/backslashes need no escaping).
- parse_reviser_response(text, expected_artifacts) -> (artifacts_by_path, responses): regex-extracts delimited blocks (no unescaping, CRLF-tolerant); parses the change-log leniently (reuses clarify _escape_newlines + YAML fallback); BACKWARD-COMPAT fallback to legacy new_*_md/updated_artifacts JSON; fail-loud RuntimeError on total failure.
- build_concern_responses: shared one-per-concern padding (<missing>/<empty>).

Migrated all 9 reviser classes (single-doc spec/paper_spec/tasks/paper_tasks/flesh_out; multi-doc plan/paper_plan; code implementer/paper_implement), prompt + _parse_response, preserving each one's exact error messages, path-validation, empty-map tolerance (impl #49), and dispatch prefixes. Legacy single-doc key selection is target-suffix-aware (tasks.md->new_tasks_md), which fixed a test_tasker_production_cutover regression.

Verified: ruff src clean; mypy src 0 (156 files); offline suite 1260 passed / 1 skipped / 2 deselected (+14 new). REAL-CALL (PROJ-552 spec, qwen3.5-122b, one isolated round): qwen reliably produced the delimited format; parser extracted a complete 16,981-char revised spec.md + 2 well-formed responses.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
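To illustrate why delimiters sidestep the escaping problem, here is a minimal sketch of extracting ===BEGIN_ARTIFACT <path>=== / ===END_ARTIFACT=== blocks with a regex. It covers only the delimited-block step; the change-log parsing and legacy JSON fallback are omitted, and extract_artifacts is a hypothetical name rather than the PR's parse_reviser_response.

```python
import re
from typing import Dict, Iterable

# Body is captured verbatim: no JSON decoding, so quotes, '$' and
# backslashes pass through untouched. '\r?\n' tolerates CRLF line endings.
_ARTIFACT_RE = re.compile(
    r"===BEGIN_ARTIFACT (?P<path>[^\r\n]+?)===\r?\n"
    r"(?P<body>.*?)\r?\n===END_ARTIFACT===",
    re.DOTALL,
)


def extract_artifacts(text: str, expected_paths: Iterable[str]) -> Dict[str, str]:
    """Return {repo-relative path: raw body}; fail loudly if one is missing."""
    found = {
        m.group("path").strip(): m.group("body")
        for m in _ARTIFACT_RE.finditer(text)
    }
    missing = [p for p in expected_paths if p not in found]
    if missing:
        raise RuntimeError(f"reviser response missing artifacts: {missing}")
    return found
```

Because the body never passes through a JSON string decoder, a single unescaped quote in a 19K-char document can no longer terminate the payload early.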

…fs (F-18)

Part-7 e2e finding: PROJ-552's spec.md attached a FABRICATED citation ("Lee et al. 2024, arXiv:2402.13") to a (correct) knot count. The malformed arXiv id slipped past extraction (regex required \d{4}\.\d{4,5}), and when the convergence panel correctly flagged "verify this citation" the reviser "resolved" it by fabricating a *different* wrong number + a second fake citation. Violates Constitution Principle II (no fabricated references).

General fix: a citation-verification "strip/flag" pass that resolves every external reference in produced docs and rewrites unresolvable ones in-place as `[UNVERIFIED: <ref> — <reason>]` (explicit + greppable; never silently deleted):

- NEW src/llmxive/agents/citation_guard.py: apply_citation_verdicts (pure, idempotent rewriter), verify_and_clean (network orchestrator).
- reference_validator.extract_citations: also capture MALFORMED arXiv refs (e.g. arXiv:2402.13) so fabricated-malformed ids are flagged, not ignored.
- Hooked at BOTH production points: stage-doc write (slash_command _validate_artifact_citations writes cleaned text back) AND the shared reviser chokepoint (_self_consistency) so reviser-introduced fakes are caught too, BEFORE the panel re-reviews (prevents the fabrication cascade).

Resolution is REGISTRAR-AGNOSTIC (requirement: support Zenodo/bioRxiv/PsyArXiv/medRxiv/OSF + all URLs). New public verify.resolve_reference(kind, value) resolves DOIs via https://doi.org/<doi> redirect (works for Crossref AND DataCite) instead of the old Crossref-only path that would FALSE-FLAG every DataCite DOI (Zenodo 10.5281, PsyArXiv/OSF 10.31234). arXiv→arxiv.org/abs, URL→HEAD+GET. Paywall/403-after-redirect = PRESENT (not flagged); 404/DNS/malformed = flagged. Drops the FR-022-forbidden fetch_citation caller.

Real-call verified: real Zenodo/PsyArXiv/OSF/bioRxiv/arXiv/URL all resolve PRESENT; fabricated DOI/URL + malformed arXiv:2402.13 all flagged.
ruff clean; mypy 0 (157 files); offline gate 1267 passed (+pure-logic guard tests); real_call + FR-022 no-duplicate-caller tests pass live. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
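A minimal sketch of the two pure pieces described above: detecting a malformed arXiv id, and an idempotent in-place rewriter that wraps unresolvable references in the [UNVERIFIED: ...] marker. The function bodies, the verdict-map shape (ref -> reason), and the word-boundary heuristic are assumptions, not the code in citation_guard.py.

```python
import re
from typing import Dict

_ARXIV_WELL_FORMED = re.compile(r"\d{4}\.\d{4,5}")
_MARKER_OPEN = "[UNVERIFIED: "


def is_well_formed_arxiv(ident: str) -> bool:
    """True for ids like 1706.03762; False for truncated ones like 2402.13."""
    return _ARXIV_WELL_FORMED.fullmatch(ident) is not None


def apply_citation_verdicts(text: str, unresolved: Dict[str, str]) -> str:
    """Wrap each unresolved reference as '[UNVERIFIED: <ref> — <reason>]'.

    Idempotent: a reference already sitting directly inside a marker is
    skipped (negative lookbehind), so a second pass changes nothing.
    """
    for ref, reason in unresolved.items():
        pattern = re.compile(
            "(?<!" + re.escape(_MARKER_OPEN) + ")" + re.escape(ref) + r"(?!\w)"
        )
        marker = f"{_MARKER_OPEN}{ref} — {reason}]"
        # A callable replacement avoids backslash-escape surprises.
        text = pattern.sub(lambda _m, marker=marker: marker, text)
    return text
```

The rewriter assumes a reason string never repeats the reference itself; a reason that did would be re-wrapped on a second pass.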

…18c)

Per user decision: a document containing any citation-guard `[UNVERIFIED: <ref> — <reason>]` marker (a reference that could not be resolved to a live primary source) MUST NOT advance through the pipeline. Wired generally at the three gate sites:

- convergence engine (universal gate for the 6 doc-stages): run_convergence scans the FINAL produced-doc artifacts (skipping __sentinel__ context keys) BEFORE declaring convergence; each artifact still carrying a marker yields a synthesized Severity.SCIENCE concern naming the artifact + verbatim marker bodies, appended to open_concerns so converged->False and route_kickback carries the reason (SCIENCE routes the factual defect to an earlier content stage, not an in-loop re-edit). Clean artifacts converge exactly as before.
- advancement evaluator: research-accept and paper-accept now block when the project's governing artifacts contain markers, OR-combined with the existing _has_blocking_citations status gate.
- paper_complete gate (graph.py): blocks paper_in_progress->paper_complete when paper artifacts contain markers (cheap short-circuit before the LaTeX build).

New citation_guard helpers: UNVERIFIED_MARKER_PREFIX (single source of truth), has_unverified_markers, find_unverified_markers, project_unverified_markers, project_artifacts_have_markers. kickback.route_kickback reason surfaces markers.

Also: graph.py two telemetry print() -> logger.warning (noticed in passing).

Tests +10 (guard helpers, engine converged->kickback on marker, advancement non-advance, paper-complete block). ruff clean; mypy 0 (157 files); offline gate 1277 passed / 1 skipped / 2 deselected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
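The engine-side gate can be sketched as a scan over the final artifacts that turns every surviving marker into a blocking concern. The helper names mirror those listed in the commit, but the bodies, the prefix value, the sentinel-key test, and the dict-shaped concern are all assumptions made for illustration.

```python
import re
from typing import Dict, List

# Assumed value; the commit only says this constant is the single source of truth.
UNVERIFIED_MARKER_PREFIX = "[UNVERIFIED:"
_MARKER_RE = re.compile(re.escape(UNVERIFIED_MARKER_PREFIX) + r"\s*([^\]]*)\]")


def has_unverified_markers(text: str) -> bool:
    return UNVERIFIED_MARKER_PREFIX in text


def find_unverified_markers(text: str) -> List[str]:
    """Return the verbatim body of each marker, in document order."""
    return [m.group(1).strip() for m in _MARKER_RE.finditer(text)]


def marker_concerns(artifacts: Dict[str, str]) -> List[dict]:
    """One SCIENCE-severity concern per artifact that still carries a marker.

    Keys shaped like __name__ are treated as context sentinels and skipped
    (an assumption about how sentinel keys are spelled).
    """
    concerns = []
    for path, text in artifacts.items():
        if path.startswith("__") and path.endswith("__"):
            continue
        bodies = find_unverified_markers(text)
        if bodies:
            concerns.append(
                {"severity": "science", "artifact": path, "markers": bodies}
            )
    return concerns
```

A caller would append these to its open concerns before deciding convergence, so a non-empty result forces the kickback path while clean artifacts are unaffected.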

Brainstormed design for verifying that a cited claim is substantiated by the FULL TEXT of the source (numbers match, concept conveyed accurately), not just that the reference exists (F-18) or that the abstract overlaps (F-19 v1). Maintainer-confirmed decisions: hybrid passage-location + LLM entailment; open-access-first retrieval cascade (arXiv / Unpaywall / Semantic Scholar openAccessPdf / preprint patterns / direct URL) with abstract fallback; reviser-chokepoint each round + persistent (source,claim) cache; flag on unreadable/unresolvable/free-text; standalone llmxive.grounding service reusing pdf_sample/verify helpers (librarian untouched). UNPAYWALL_EMAIL=llmxive@gmail.com. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

9 TDD tasks: config, RetrievedDoc + PDF/HTML extractors, OA-first retrieval cascade (arXiv/Unpaywall/S2/preprint/URL), passage location + LLM entailment, persistent caches, service orchestrator + policy, wire into F-19 guard, real-call e2e + gates, tracker. Reuses pdf_sample/verify/librarian.cache. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… + reviser hook) Abstract-only/arXiv-only grounding baseline. Becomes the extraction + rewriter front-end for the full-text grounding service (F-19 v2, see docs/superpowers/plans/2026-05-29-full-text-claim-grounding.md); its arXiv-only _fetch_source_text/ground_claim internals are replaced there. Env-gated LLMXIVE_GROUNDING_GUARD (on in cli.run, off in offline gate). offline 1290. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Add offline test that runs pypdf on a hand-authored minimal PDF byte string asserting "Grounding" and "12345" round-trip through extract_pdf_text (issue 1).
- Move `import pypdf` outside the bare except so ImportError propagates instead of being swallowed silently (issue 2).
- Re-export RetrievedDoc, extract_pdf_text, html_to_text from grounding/__init__.py with __all__ (issue 3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…reprint/URL) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Two JSON caches under state/grounding-cache/{fulltext,verdict}/, keyed by SHA-256. Verdict key includes normalized claim + number so different numbers yield independent cache entries. max_age_s<0 always expires. TTL defaults: 90d full-text, 30d verdict. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
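A sketch of how such a verdict key and TTL check could be built, assuming the key hashes the source id, a whitespace- and case-normalized claim, and the number. The normalization rule, the field separator, and the function names are assumptions; only the SHA-256 keying, the TTL defaults, and the negative-max-age rule come from the commit message.

```python
import hashlib
import time
from typing import Optional

FULLTEXT_TTL_S = 90 * 24 * 3600  # 90 days
VERDICT_TTL_S = 30 * 24 * 3600   # 30 days


def verdict_key(source_id: str, claim: str, number: Optional[str]) -> str:
    """SHA-256 key over (source, normalized claim, number).

    Including the number means the same sentence with a different figure
    gets its own cache entry.
    """
    normalized = " ".join(claim.lower().split())
    payload = "\x1f".join([source_id, normalized, number or ""])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_expired(written_at: float, max_age_s: float, now: Optional[float] = None) -> bool:
    """Negative max_age_s always expires; otherwise compare entry age to the TTL."""
    if max_age_s < 0:
        return True
    current = time.time() if now is None else now
    return (current - written_at) > max_age_s
```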

…JSON on concurrent writes Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace manual os.write/os.close pair with os.fdopen context manager so the file descriptor is closed exactly once. On failure, unlink the temp file safely (ignoring OSError) so the original exception is not masked. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
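The pattern described here (temp file in the target directory, os.fdopen as a context manager, os.replace to publish, best-effort unlink on failure) can be sketched as follows. atomic_write_json is a hypothetical name; the cache's real _write may differ in details such as encoding or fsync.

```python
import json
import os
import tempfile
from typing import Any


def atomic_write_json(path: str, payload: Any) -> None:
    """Write JSON so readers never observe a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Same directory as the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # fdopen takes ownership of fd; the with-block closes it exactly once.
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        try:
            os.unlink(tmp_path)
        except OSError:
            pass  # do not mask the original exception
        raise
```

Concurrent writers each publish a complete file, and the last os.replace wins; no reader sees interleaved bytes.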

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ground_claim now keeps only the free-text short-circuit and delegates resolvable-source grounding to grounding.service.ground_cited_claim via a function-local _service_ground seam (avoids the import cycle: service + entailment import names from grounding_guard at module top). Deletes the dead abstract-only grounding body and _fetch_source_text. Threads repo_root into ground_claim from verify_grounding_and_clean. Updates the real_call test to the new signature + full-text service reason strings (verified live: 5 passed against arXiv + Dartmouth). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…-call assertion Remove the unused `timeout=30.0` parameter from `ground_claim` — the underlying service takes no timeout arg so it was never forwarded. Confirm no callers pass it. Strengthen `test_number_not_in_cited_source_is_flagged`: in addition to asserting `ok is False`, assert the reason text contains at least one of "not found", "contradict", or "unreadable" (case-insensitive), ensuring the service vocabulary is reflected in the flag reason. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

F-19 v2 Task 8. Adds tests/real_call/test_grounding_end_to_end.py: a fabricated cited number on a real arXiv paper (1706.03762) is flagged [UNVERIFIED]; a correctly cited number is not. Also fixes prompt-block resolution so the extraction and entailment prompt blocks load from the real repo root (config.repo_root()) when not found under the per-run cache repo_root (which may be a tmp dir for isolation). Without this, passing repo_root=tmp_path silently skipped extraction/entailment. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…r unreadable grounding sources

F-19 full-text claim-grounding holistic-review fixes:

Fix 1 (number gate, design §5): ground_cited_claim now overrides an LLM "grounded" verdict to a FLAG when the claim's number is absent from the retrieved source text (number_substantiated() pure helper, unit-tested offline + proven end-to-end to flip grounded->flagged).

Fix 2 (Tier-5 URL): _fetch_url_text restricts to http/https, streams the body with a 50MB cap (shared PDF_MAX_BYTES), bounded timeout+redirects; non-http schemes yield no text.

Fix 3: unreadable sources are no longer written to either cache, so a transient retrieval failure self-heals next round.

Fix 4 (doc): design §9 records the reviser-chokepoint-only and v1-only preprint limitations.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
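The number gate in Fix 1 can be sketched as a pure check that the claimed figure literally appears in the source text, followed by an override of an optimistic LLM verdict. The matching rule (exact numeric tokens, thousands separators ignored) and the verdict strings are assumptions; the real number_substantiated() may be more tolerant.

```python
import re

_NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def _normalize(token: str) -> str:
    return token.replace(",", "").strip()


def number_substantiated(number: str, source_text: str) -> bool:
    """True if the claimed number occurs as a whole numeric token in the source."""
    target = _normalize(number)
    if not target:
        return True  # nothing numeric to check
    return any(_normalize(t) == target for t in _NUMBER_RE.findall(source_text))


def apply_number_gate(llm_verdict: str, number: str, source_text: str) -> str:
    """Downgrade 'grounded' to 'flagged' when the number is absent from the source."""
    if llm_verdict == "grounded" and not number_substantiated(number, source_text):
        return "flagged"
    return llm_verdict
```

Comparing whole tokens rather than substrings keeps "42" from matching inside "142".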

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…/F-20 B) Panel non-convergence now writes a generic convergence_kickback.yaml record (to_stage/worst_severity/reason/unresolved_concerns/stage) instead of human_input_needed.yaml; the graph consumes it and auto-kicks-back to the content stage, bounded by a per-stage kickback cap (CONVERGENCE_KICKBACK_CAP=3) that escalates to human_input_needed and resets on clean advancement. human_input_needed.yaml is reserved for genuine human escalation (engine exception + cap-hit). Adds an on_round inspection trail persisted under .specify/memory/convergence_trail/<stage>-NNN.jsonl, and the missing backward kickback transitions in ALLOWED_TRANSITIONS. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
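The bounded auto-kickback can be pictured as a per-stage counter that resets on clean advancement and escalates once the cap is exhausted. This is a sketch only: the in-memory dict, the action strings, and the choice to escalate on the kickback that would exceed the cap are assumptions, since the commit message does not say whether escalation happens on the third or the fourth non-convergence.

```python
from typing import Dict

CONVERGENCE_KICKBACK_CAP = 3  # value quoted in the commit message


def on_panel_result(kickbacks: Dict[str, int], stage: str, converged: bool) -> str:
    """Return 'advance', 'kickback', or 'escalate' and update the counter.

    Hypothetical policy: up to CAP automatic kickbacks per stage; the next
    non-convergence escalates to a human. Clean advancement resets the count.
    """
    if converged:
        kickbacks.pop(stage, None)
        return "advance"
    count = kickbacks.get(stage, 0) + 1
    kickbacks[stage] = count
    if count > CONVERGENCE_KICKBACK_CAP:
        return "escalate"
    return "kickback"
```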

jeremymanning and others added 30 commits May 27, 2026 20:08

docs(015): T036 US8 roll-up — 7 of 10 discrepancies closed; 3 fold in…

e9b3b77

…to US3 (#239) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

docs(015): T043 follow-up — 2 more stragglers (#239)

8b2f066

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

docs(015): T045 US3 sweep — points gone, triage live (#239)

1ded8d4

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

fix(015): T048 ruff — replace en-dashes in docstrings (#239)

97bcfff

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

jeremymanning and others added 29 commits May 29, 2026 14:39

feat(015): grounding config — UNPAYWALL_EMAIL + grounding_cache_dir

912b834

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(015): grounding RetrievedDoc + PDF/HTML text extractors

7edeb82

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(015): OA-first full-text retrieval cascade (arXiv/Unpaywall/S2/p…

5a43d43

…reprint/URL) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(015): passage location + LLM entailment for claim grounding

abf0fed

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(cache): atomic _write via tempfile+os.replace to prevent corrupt …

fcc0444

…JSON on concurrent writes Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(015): grounding service orchestrator + policy decide()

17e327b

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chore(015): gitignore state/grounding-cache (transient grounding cache)

d9eb04a

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jeremymanning mentioned this pull request May 30, 2026

Claim-verification layer: register → resolve → cache every factual claim (trustworthy science, no hallucinated results) #256

Open

jeremymanning wants to merge 163 commits into main from 015-pipeline-convergence-protocol