163 commits
4c9e1a9
spec(015): pipeline-wide convergence protocol + recursive summarizer …
jeremymanning May 28, 2026
ec89128
spec(015): clarify — record Session 2026-05-27 decisions (#239)
jeremymanning May 28, 2026
c8f8469
plan(015): implementation plan + Phase 0/1 artifacts (#239)
jeremymanning May 28, 2026
f621a50
tasks(015): 85 dependency-ordered tasks across 11 phases (#239)
jeremymanning May 28, 2026
ec1625d
analyze(015): fix all cross-artifact findings (#239)
jeremymanning May 28, 2026
e57282a
impl(015): Setup phase T001-T003 (#239)
jeremymanning May 28, 2026
001f877
impl(015): US1 summarizer core T008/T010-T016 (#239)
jeremymanning May 28, 2026
639833d
impl(015): T017 re-point paper_reviewer to SSoT summarizer (#239)
jeremymanning May 28, 2026
6141f4c
impl(015): US1 complete — T009/T018 real-call fidelity (#239)
jeremymanning May 28, 2026
573560d
impl(015): Foundational T004-T007 — convergence data model + constitu…
jeremymanning May 28, 2026
d6e9183
impl(015): US2 convergence engine core (#239)
jeremymanning May 28, 2026
cd24230
fix(015): tech debt noticed-as-encountered + T003 schema gap (#239)
jeremymanning May 28, 2026
5814f03
impl(015): T030 research-implementer prompt (FR-035, discrepancy #1) …
jeremymanning May 28, 2026
a4bbb53
impl(015): T034 arXiv/theoremsearch transient resilience (FR-040) (#239)
jeremymanning May 28, 2026
e8ecdcc
docs(015): STATUS checkpoint — US8 T030/T034 done; offline suite gree…
jeremymanning May 28, 2026
6c0501b
impl(015): T031 + FR-030 — real analyze prompts; constitution in anal…
jeremymanning May 28, 2026
9a25957
impl(015): T032 dead escalation paths (discrepancy #5) (#239)
jeremymanning May 28, 2026
0c864a5
impl(015): T033 paper_specifier prompt input drift (discrepancy #10) …
jeremymanning May 28, 2026
9f56a1f
impl(015): T035 publisher wiring + manual DOI sign-off gate (FR-036/F…
jeremymanning May 28, 2026
63f5cdf
impl(015): T025 self-review prevention + T029 audit-bugfix tests (#239)
jeremymanning May 28, 2026
e9b3b77
docs(015): T036 US8 roll-up — 7 of 10 discrepancies closed; 3 fold in…
jeremymanning May 28, 2026
a494d8c
impl(015): T037/T039 review-intake triage (FR-021/22/23) (#239)
jeremymanning May 28, 2026
f9f3995
docs(015): T043 status-model re-expression — convergence semantics (#…
jeremymanning May 28, 2026
8b2f066
docs(015): T043 follow-up — 2 more stragglers (#239)
jeremymanning May 28, 2026
63dc508
impl(015): US3 T038/T041/T044 — point system REMOVED (FR-019/FR-020) …
jeremymanning May 28, 2026
1ded8d4
docs(015): T045 US3 sweep — points gone, triage live (#239)
jeremymanning May 28, 2026
4b10318
impl(015): T048 ReviewSpec registry (FR-027-031) (#239)
jeremymanning May 28, 2026
97bcfff
fix(015): T048 ruff — replace en-dashes in docstrings (#239)
jeremymanning May 28, 2026
b325d5c
impl(015): T049-T053 panel prompts (FR-027/FR-028/FR-030) (#239)
jeremymanning May 28, 2026
568331a
impl(015): T054 SpecReviser — collapse specifier+clarifier (#239)
jeremymanning May 28, 2026
7ef781a
impl(015): T055 PaperSpecReviser — collapse paper_specifier+paper_cla…
jeremymanning May 28, 2026
cf9202d
impl(015): T056 PlanReviser + PaperPlanReviser — plan unit collapse (…
jeremymanning May 28, 2026
c489c02
impl(015): T057 TasksReviser + PaperTasksReviser — tasks unit collaps…
jeremymanning May 28, 2026
3b0dafe
impl(015): T058 ImplementerReviser + filesystem re-verify (#49) (#239)
jeremymanning May 28, 2026
b558e05
impl(015): T059 PaperImplementReviser — paper-implement unit collapse…
jeremymanning May 28, 2026
f9080db
impl(015): T042 unify two legacy paper-revision schemes onto Kickback…
jeremymanning May 28, 2026
039082f
docs(015): mark T042 done in tasks.md (#239)
jeremymanning May 28, 2026
a147cd3
impl(015): T040 personality cron → triage wiring (#239)
jeremymanning May 28, 2026
b126b88
impl(015): T046+T047 panel integration tests — engine ↔ reviser ↔ pan…
jeremymanning May 28, 2026
b1936e0
docs(015): mark T046 done in tasks.md (#239)
jeremymanning May 28, 2026
4475a2a
impl(015): T080+T081+T082 polish — inspection hook, invariants, SSoT …
jeremymanning May 28, 2026
15f3ff8
fix(015): T084 full verification suite — fix T040 regression in pre-e…
jeremymanning May 28, 2026
578974a
docs(015): T083 docs parity — README mentions sign-off gate + publish…
jeremymanning May 28, 2026
a345d65
docs(015): T062 per-step verification table (#239)
jeremymanning May 28, 2026
bbf7cef
impl(015): T021 run_engine_for_project bridge + real-project integrat…
jeremymanning May 28, 2026
eb6a54d
impl(015): T027 tasker → convergence-engine bridge (#239)
jeremymanning May 28, 2026
2b114aa
docs(015): T085 autonomous completion summary + final verification ta…
jeremymanning May 28, 2026
ffc2490
impl(015): T063+T064 calibration flaw injectors (US5) (#239)
jeremymanning May 28, 2026
844bbbb
impl(015): T065 differential calibration harness (US5) (#239)
jeremymanning May 28, 2026
d650b11
impl(015): T066 calibration anchor papers per domain (US5) (#239)
jeremymanning May 28, 2026
8e7cb0d
impl(015): T067 per-panel labeled calibration sets (US5) (#239)
jeremymanning May 28, 2026
4f832bd
impl(015): T069 held-out calibration + T071 e2e domain traversal harn…
jeremymanning May 28, 2026
2bbf40c
impl(015): T076+T077+T078 living-document module (US7) (#239)
jeremymanning May 28, 2026
b97a875
impl(015): T072 golden + weak project fixtures (US6) (#239)
jeremymanning May 28, 2026
a7da5a9
docs(015): STATUS final — 78 of 85 autonomous done; 1020 tests pass (…
jeremymanning May 28, 2026
830fd66
impl(015): LLMReviewer unblocks T068 real-call calibration (#239)
jeremymanning May 28, 2026
cebee31
fix(015): LLMReviewer accepts YAML wrapped in code fences (#239)
jeremymanning May 28, 2026
0fe85db
impl(015): calibration runner + GitHub Actions workflow (#239)
jeremymanning May 28, 2026
1c45210
fix(015): match CrossRef en-dash in biology anchor title (#239)
jeremymanning May 28, 2026
be96060
fix(015): diversify golden projects + disguise weak control (#239)
jeremymanning May 28, 2026
1754edb
fix(015): calibration workflow rebase+retry push + upload report arti…
jeremymanning May 28, 2026
0c01e8e
calib(015): spec run (20260528T174412Z) (#239)
May 28, 2026
5309158
Revert "calib(015): spec run (20260528T174412Z) (#239)"
jeremymanning May 28, 2026
30f1f72
fix(015): runner detects Dartmouth outage + aborts on total-failure (…
jeremymanning May 28, 2026
51843b0
feat(015): centralized retry-with-backoff for all Dartmouth Chat call…
jeremymanning May 28, 2026
7ccf777
calib(015): spec run (20260528T182009Z) (#239)
May 28, 2026
f9d19b9
impl(015): FR-027 idea-stage reviser + build_idea_reviewspec (#239)
jeremymanning May 28, 2026
cc77504
impl(015): FR-047/048 production wiring — posted comments → living_do…
jeremymanning May 28, 2026
c39e49e
impl(015): T027 production cutover — convergence engine is the defaul…
jeremymanning May 28, 2026
b36803b
impl(015): FR-044 adaptive sensitivity recommender (#239)
jeremymanning May 28, 2026
23899f7
impl(015): T042 engine = sole revision driver (FR-034 SSoT) (#239)
jeremymanning May 28, 2026
1ae7628
docs(015): back-compat alias `summarize_to_budget` for issue #239's d…
jeremymanning May 28, 2026
984eccc
fix(015): calibration run 26592405739 surfaced 2 real bugs (#239)
jeremymanning May 28, 2026
fd7e15c
fix(015): build_panel resolves paper_review + research_review stages …
jeremymanning May 28, 2026
5409a56
calib(015): paper_plan run (20260528T202352Z) (#239)
May 28, 2026
e976fe0
calib(015): paper_implement run (20260528T202723Z) (#239)
May 28, 2026
d3d5909
calib(015): paper_spec run (20260528T202756Z) (#239)
May 28, 2026
b11778f
calib(015): spec run (20260528T202901Z) (#239)
May 28, 2026
3a981ea
calib(015): plan run (20260528T203340Z) (#239)
May 28, 2026
b2ae107
calib(015): paper_tasks run (20260528T203805Z) (#239)
May 28, 2026
c791639
calib(015): tasks run (20260528T203932Z) (#239)
May 28, 2026
818acaa
docs(015): T082 SSoT audit — row 4 count update + stale .pyc cleanup …
jeremymanning May 28, 2026
9f02a0a
feat(015): LLM-based off-topic triage judge for T079 robustness (#239)
jeremymanning May 28, 2026
d417a63
feat(015): bind FR-054 publish-approve to GitHub identity (#239)
jeremymanning May 28, 2026
6c5d088
calib(015): paper_implement run (20260528T210116Z) (#239)
May 28, 2026
fb5a6e8
calib(015): paper_plan run (20260528T210525Z) (#239)
May 28, 2026
2dacb36
calib(015): paper_spec run (20260528T210724Z) (#239)
May 28, 2026
3f6ec54
calib(015): spec run (20260528T211212Z) (#239)
May 28, 2026
f69e265
calib(015): tasks run (20260528T211446Z) (#239)
May 28, 2026
f416bd8
calib(015): paper_tasks run (20260528T211542Z) (#239)
May 28, 2026
8dc4038
calib(015): plan run (20260528T213528Z) (#239)
May 28, 2026
a3101be
fix(015): regen calibration labels after stage-lens-override fix (#239)
jeremymanning May 28, 2026
597b756
feat(015): runner-version tagging for FR-044 recommender (#239)
jeremymanning May 29, 2026
9da4a8f
fix(015): YAML parser stage 3 — aggressive block-scalar fallback (#239)
jeremymanning May 29, 2026
f4c3958
calib(015): paper_implement run (20260529T021125Z) (#239)
May 29, 2026
9388ea7
docs(015): document terminal-judgment stages vs engine kickback (#239)
jeremymanning May 29, 2026
4f1a288
fix(015): YAML recovery handles list-item key form (- text: value) (#…
jeremymanning May 29, 2026
74e1ebf
calib(015): paper_implement run (20260529T022113Z) (#239)
May 29, 2026
d4d7bc9
calib(015): paper_plan run (20260529T022131Z) (#239)
May 29, 2026
77d9204
calib(015): spec run (20260529T022550Z) (#239)
May 29, 2026
96e0cb1
calib(015): tasks run (20260529T023617Z) (#239)
May 29, 2026
d01751f
fix(015): LLMReviewer parser accepts action_items as concerns alias (…
jeremymanning May 29, 2026
6dfa224
calib(015): paper_plan run (20260529T032201Z) (#239)
May 29, 2026
457b3ed
calib(015): paper_spec run (20260529T032611Z) (#239)
May 29, 2026
a3eee11
calib(015): paper_tasks run (20260529T033443Z) (#239)
May 29, 2026
848334e
fix(015): per-paper-stage injectors + paper_implement no-tasks short-…
jeremymanning May 29, 2026
d31e37a
calib(015): paper_spec run (20260529T101745Z) (#239)
May 29, 2026
a39e874
calib(015): paper_plan run (20260529T101927Z) (#239)
May 29, 2026
cf550bc
calib(015): paper_implement run (20260529T102205Z) (#239)
May 29, 2026
37b703f
calib(015): paper_tasks run (20260529T102930Z) (#239)
May 29, 2026
ec644ce
style(015): post-commit ruff RUF059 unused-unpacked fixes (#239)
jeremymanning May 29, 2026
f99176b
fix(015): Concern.text + ConcernResponse fields require non-empty con…
jeremymanning May 29, 2026
8241d81
fix(015): supply upstream artifacts to engine in calibration + produc…
jeremymanning May 29, 2026
6076dc5
calib(015): paper_plan run (20260529T115316Z) (#239)
May 29, 2026
6ab91d0
calib(015): plan run (20260529T115431Z) (#239)
May 29, 2026
db69167
calib(015): spec run (20260529T115933Z) (#239)
May 29, 2026
989237c
calib(015): paper_spec run (20260529T120158Z) (#239)
May 29, 2026
9e4dbd0
calib(015): paper_tasks run (20260529T120741Z) (#239)
May 29, 2026
08b1cb3
calib(015): tasks run (20260529T120956Z) (#239)
May 29, 2026
5bc2746
perf(015): use qwen3.5-122b's real 256K context (was assumed 32K) (#239)
jeremymanning May 29, 2026
1190869
fix(015): FR-012 — R1-accepters re-review when R2 changed an artifact…
jeremymanning May 29, 2026
3044e7b
feat(015): FR-011 reviser self-consistency second pass (#239)
jeremymanning May 29, 2026
3d3c5cb
feat(015): FR-048 living-document batched-recompile orchestrator + cr…
jeremymanning May 29, 2026
bc3432d
chore(015): router.py lint + docstring drift (max_tokens default 8192…
jeremymanning May 29, 2026
5470aa1
docs(015): STATUS — record 2026-05-29 fresh-review fixes (#239)
jeremymanning May 29, 2026
ce19452
test(015): make dedicated reviser-test fakes audit-aware (follow-up t…
jeremymanning May 29, 2026
f33c37b
docs(015): STATUS final-verification count (1232 passed) (#239)
jeremymanning May 29, 2026
2ca4d4a
refactor(015): remove HF-Inference-API backend; HF models run locally…
jeremymanning May 29, 2026
3b04dd9
style(015): ruff safe auto-fixes across src/llmxive (169 findings)
jeremymanning May 29, 2026
0c84f48
fix(015): ruff per-case fixes + two latent finally-return bugs
jeremymanning May 29, 2026
77abdb6
fix(015): configurable repo root (LLMXIVE_REPO_ROOT) + de-rot Phase-3…
jeremymanning May 29, 2026
5095abc
fix(015): mypy strict-clean across src/llmxive (213 -> 0)
jeremymanning May 29, 2026
8c3a1a7
style(015): ruff-clean tests/ scripts/ agents/ specs/ prototypes (141…
jeremymanning May 29, 2026
75454bf
fix(015): wire live LLMReviewer panels into build_*_reviewspec (was N…
jeremymanning May 29, 2026
6565266
fix(015): actually wire PaperPublisher into the graph (discrepancy #2…
jeremymanning May 29, 2026
949074c
fix(015): migrate 8 projects stranded in removed stages -> paper_revi…
jeremymanning May 29, 2026
1f3f28c
fix(015): add idea stage to the real-call calibration driver (FR-046)
jeremymanning May 29, 2026
97ccf5b
test(015): gate offline-hanging real-call tests behind LLMXIVE_REAL_T…
jeremymanning May 29, 2026
c7ebad7
fix(015): repair test_idempotency isolation broken by repo-root centr…
jeremymanning May 29, 2026
61d1c05
fix(015): self-review resolver, summarizer no-loss block, threshold c…
jeremymanning May 29, 2026
cbc564b
feat(015): wire convergence panels into the 6 reviewable doc-stages (…
jeremymanning May 29, 2026
3668b72
fix(015): reasoning-safe max_tokens for panel/reviser/self-consistenc…
jeremymanning May 29, 2026
1234a01
fix(015): robust delimited reviser response contract (F-16)
jeremymanning May 29, 2026
9e0cef8
fix(015): citation strip/flag guard — flag fabricated/unresolvable re…
jeremymanning May 30, 2026
e9d13b3
fix(015): hard-block advancement on [UNVERIFIED] citation markers (F-…
jeremymanning May 30, 2026
a43bb9c
docs(015): design — full-text claim grounding (F-19 v2)
jeremymanning May 30, 2026
977d5e4
docs(015): implementation plan — full-text claim grounding (F-19 v2)
jeremymanning May 30, 2026
503a3cf
feat(015): F-19 v1 factual-grounding front-end (extraction + rewriter…
jeremymanning May 30, 2026
912b834
feat(015): grounding config — UNPAYWALL_EMAIL + grounding_cache_dir
jeremymanning May 30, 2026
7edeb82
feat(015): grounding RetrievedDoc + PDF/HTML text extractors
jeremymanning May 30, 2026
a3650c0
fix(grounding): real PDF test, loud ImportError, __init__ re-exports
jeremymanning May 30, 2026
5a43d43
feat(015): OA-first full-text retrieval cascade (arXiv/Unpaywall/S2/p…
jeremymanning May 30, 2026
abf0fed
feat(015): passage location + LLM entailment for claim grounding
jeremymanning May 30, 2026
35741ce
feat(015): persistent full-text + verdict caches for grounding
jeremymanning May 30, 2026
fcc0444
fix(cache): atomic _write via tempfile+os.replace to prevent corrupt …
jeremymanning May 30, 2026
ae543cd
fix(grounding): close temp fd exactly once in _write, clean up on error
jeremymanning May 30, 2026
17e327b
feat(015): grounding service orchestrator + policy decide()
jeremymanning May 30, 2026
018c0b6
feat(015): delegate F-19 ground_claim to the full-text grounding service
jeremymanning May 30, 2026
3b90606
fix(F-19): drop dead timeout param from ground_claim; strengthen real…
jeremymanning May 30, 2026
5dea5e9
test(015): end-to-end full-text grounding real-call proof
jeremymanning May 30, 2026
2b5252a
fix(015): deterministic number gate + hardened URL tier + no-cache fo…
jeremymanning May 30, 2026
d9eb04a
chore(015): gitignore state/grounding-cache (transient grounding cache)
jeremymanning May 30, 2026
d193294
feat(015): adaptive convergence-kickback flow + per-round trail (F-14…
jeremymanning May 30, 2026
149 changes: 149 additions & 0 deletions .github/workflows/spec015-calibration.yml
@@ -0,0 +1,149 @@
name: spec015 — differential calibration

# Runs `scripts/run_calibration.py` against the on-disk calibration set
# and commits the produced adjudication reports.
#
# Triggers:
# - `workflow_dispatch` — manual trigger from the GitHub Actions UI;
# the maintainer picks a stage + domain label.
# - `schedule` (commented out by default) — weekly automatic runs once
# the maintainer is comfortable letting it auto-run.
#
# Permissions: needs `contents: write` to commit the report back to the
# calibration reports directory.

on:
  workflow_dispatch:
    inputs:
      stage:
        description: 'Which stage to calibrate (spec/plan/tasks/paper_spec/paper_plan/paper_tasks/paper_implement/all)'
        required: true
        default: 'spec'
        type: choice
        options:
          - spec
          - plan
          - tasks
          - paper_spec
          - paper_plan
          - paper_tasks
          - paper_implement
          - all
      domain:
        description: 'Domain label written into the report header (or "(unspecified)")'
        required: false
        default: '(unspecified)'
        type: string
      max_tokens:
        description: 'Per-call max_tokens for the reasoning model (default 131072 = 128K; qwen3.5-122b has a 256K context window so this leaves ample room for input + reasoning)'
        required: false
        default: '131072'
        type: string
  # Uncomment to run weekly once the workflow is trusted:
  # schedule:
  #   - cron: '0 6 * * 0'  # 06:00 UTC Sundays

jobs:
  calibrate:
    runs-on: ubuntu-latest
    timeout-minutes: 90
    permissions:
      contents: write
    steps:
      - name: Check out repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Python 3.11
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'

      - name: Install package (with dev extras)
        run: |
          python -m pip install --upgrade pip
          pip install -e '.[dev]'

      - name: Run differential calibration
        env:
          DARTMOUTH_CHAT_API_KEY: ${{ secrets.DARTMOUTH_CHAT_API_KEY }}
          # Inputs piped via env (safer than direct ${{ ... }} expansion
          # in run: blocks — workflow_dispatch is write-access-only but
          # we follow the same hardening as untrusted-input workflows).
          STAGE: ${{ inputs.stage }}
          DOMAIN: ${{ inputs.domain }}
          MAX_TOKENS: ${{ inputs.max_tokens }}
        run: |
          mkdir -p specs/015-pipeline-convergence-protocol/calibration/reports
          python scripts/run_calibration.py \
            --stage "$STAGE" \
            --domain "$DOMAIN" \
            --max-tokens "$MAX_TOKENS" \
            2>&1 | tee calibration-run.log

      # Upload the produced report (+ run log) as an artifact BEFORE
      # attempting any git commit. Calibration runs are expensive (~25 min);
      # a race-condition push failure shouldn't lose the output.
      - name: Upload calibration outputs as artifact
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: spec015-calibration-output
          path: |
            calibration-run.log
            specs/015-pipeline-convergence-protocol/calibration/reports/

      - name: Commit + push the report
        if: always()
        env:
          STAGE: ${{ inputs.stage }}
          DOMAIN: ${{ inputs.domain }}
          MAX_TOKENS: ${{ inputs.max_tokens }}
        run: |
          git config user.name 'spec015-calibration-bot'
          git config user.email 'spec015-calibration-bot@users.noreply.github.com'
          git add specs/015-pipeline-convergence-protocol/calibration/reports/ \
            calibration-run.log || true
          if git diff --cached --quiet; then
            echo "No new report to commit."
            exit 0
          fi
          TIMESTAMP="$(date -u +%Y%m%dT%H%M%SZ)"
          git commit -m "calib(015): ${STAGE} run (${TIMESTAMP}) (#239)

          Triggered via workflow_dispatch with:
            stage=${STAGE}
            domain=${DOMAIN}
            max_tokens=${MAX_TOKENS}

          Maintainer: review the produced report under
          specs/015-pipeline-convergence-protocol/calibration/reports/
          and fill in the adjudication checklist per FR-046.

          Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>"

          # Race-condition handling: the calibration step takes ~25 min,
          # during which other commits may have landed on the branch. Pull
          # --rebase to replay our single commit on top, then push. Retry
          # up to 3 times in case multiple concurrent runs are competing.
          BRANCH="${GITHUB_REF##*/}"
          for attempt in 1 2 3; do
            echo "::group::Push attempt ${attempt}"
            git fetch origin "${BRANCH}"
            if git pull --rebase origin "${BRANCH}"; then
              if git push origin "HEAD:${BRANCH}"; then
                echo "::endgroup::"
                echo "Pushed on attempt ${attempt}."
                exit 0
              fi
            fi
            echo "::endgroup::"
            echo "Attempt ${attempt} failed; sleeping before retry."
            sleep $((attempt * 5))
          done
          echo "::error::Could not push the calibration report after 3 attempts."
          echo "The report artifact has been uploaded above; download from"
          echo "the workflow's Artifacts section."
          exit 1
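The push loop at the end of this workflow retries `git pull --rebase` followed by `git push` up to three times. After each failed attempt it sleeps `attempt * 5` seconds (5, 10, then 15 s). Below is the same control flow as a minimal Python sketch, written so it can be unit-tested with fake git operations. `push_with_retry` and its parameters are hypothetical and are not part of the repository.

```python
import time
from typing import Callable, Optional


def push_with_retry(
    pull_rebase: Callable[[], bool],
    push: Callable[[], bool],
    attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Return the 1-based attempt that pushed successfully, or None if all failed."""
    for attempt in range(1, attempts + 1):
        # Only try to push once the local commit has been replayed on top of
        # the remote branch; `and` short-circuits when the rebase fails.
        if pull_rebase() and push():
            return attempt
        # Linear back-off, matching `sleep $((attempt * 5))` in the workflow.
        sleep(attempt * 5)
    return None
```

Injecting `sleep` keeps tests instant. Like the shell loop, the sketch also sleeps after the final failure before giving up.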
6 changes: 6 additions & 0 deletions .gitignore
@@ -243,6 +243,7 @@ Temporary Items
# transient in-progress sentinels and any local runtime caches.
state/run-log/*/in-progress/
state/run-log/*/.invalid/
+state/grounding-cache/
.specify/cache/

# Multi-secret env variants used by Dartmouth + HF
@@ -296,3 +297,8 @@ state/audit/pdf/*/screenshots/
# demand keyed by sha256 of chunk bytes.
projects/*/paper/.chunk_summaries/

+
+# Local agent/runtime state (not part of the repo)
+.omc/
+.summaries/
+.claude/scheduled_tasks.lock
4 changes: 3 additions & 1 deletion .specify/feature.json
@@ -1 +1,3 @@
-{"feature_directory": "specs/014-phase4-plan-tasks-testing"}
+{
+  "feature_directory": "specs/015-pipeline-convergence-protocol"
+}
50 changes: 41 additions & 9 deletions .specify/memory/constitution.md
@@ -1,10 +1,11 @@
<!--
SYNC IMPACT REPORT
==================
-Version change: (uninitialized template) → 1.0.0
-Rationale: Initial ratification of the llmXive constitution. MAJOR bump from
-template-only state (no prior numbered version) to first formally adopted
-governance document.
+Version change: (uninitialized template) → 1.0.0 → 1.1.0
+Rationale: 1.0.0 = initial ratification (MAJOR bump from template-only state).
+1.1.0 (2026-05-27, spec 015 / issue #239) = MINOR — added Principle VI
+(Convergent Review, NON-NEGOTIABLE) and replaced the point-based "Review
+thresholds" quality gate with unanimous-panel convergence + advisory triage.

Modified principles:
- [PRINCIPLE_1_NAME] → I. Single Source of Truth (NON-NEGOTIABLE)
@@ -170,6 +171,34 @@ of compute and API spend. Up-front validation is cheap; late failure is
expensive. This principle protects both the user's time and the project's
budget.

+### VI. Convergent Review (NON-NEGOTIABLE)
+
+Every step that produces reviewable work MUST run a disciplined
+**identify → revise → re-review** convergence cycle driven by that step's
+review panel: each panelist raises structured critical concerns (R1); the
+step's reviser addresses every concern and emits a per-concern change-log
+(R2); each panelist re-judges its own concerns against the change-log (R3).
+A step's gate is **unanimous acceptance by its LLM review panel** within a
+3-round per-step cap; on non-convergence the project is **kicked back** to the
+appropriate prior stage carrying full provenance (the unresolved concerns +
+links to all artifacts/reviews + a plain-language "why it failed to converge").
+
+There is NO accumulated point system: a panel either unanimously accepts, or
+the work is revised and re-reviewed (kickbacks allowed, with no global cap —
+each cycle is expected to improve the artifact until it converges). Human and
+simulated-personality reviews are **advisory inputs** — routed through a
+stage-aware triage (quality + safety + on-topic) to the matching panelist —
+and never directly gate advancement. Mechanical scaffolding, dispatch, and
+maintenance steps are exempt. Convergence MUST be reported honestly: a step is
+never recorded as passed/converged when its panel has not unanimously accepted.
+
+**Rationale**: Most pipeline steps historically advanced with no critique at
+all, and the one self-critique loop never honestly converged (it masked
+non-convergence as "passed"). A single SSoT convergence protocol makes quality
+a function of disciplined revision rather than accumulated points, gives a
+reasonable idea a convergent path to publication, and rejects work that cannot
+converge — always with honest, inspectable provenance.
+
## Additional Constraints & Operational Standards

The following operational standards apply to all work in this repository:
@@ -220,10 +249,13 @@ released:
- **Reference validation**: Before any paper or user-facing document is
considered complete, every cited reference MUST be downloaded and reviewed
per Principle II.
-- **Review thresholds**: Status advancement (Backlog → Ready → In Progress
-→ Done) follows the point-based review system documented in the project
-README; LLM reviews count 0.5 points and human reviews count 1 point. The
-documented minimums MUST be met before transition.
+- **Convergence gate (review model)**: Status advancement (Backlog → Ready →
+In Progress → Done) is governed by the convergence protocol (Principle VI),
+NOT a point system. A reviewable step advances iff its LLM review panel
+unanimously accepts within the 3-round per-step cap; otherwise the project is
+kicked back with full provenance. Human and simulated-personality reviews are
+advisory inputs via stage-aware triage, never points. (Supersedes the prior
+0.5/1.0-point thresholds — spec 015 / #239.)

## Governance

@@ -261,4 +293,4 @@ reasons. Unjustified violations block merge.
here, contributors should consult the project `README.md` and the
repository-level `CLAUDE.md`.

-**Version**: 1.0.0 | **Ratified**: 2026-04-28 | **Last Amended**: 2026-04-28
+**Version**: 1.1.0 | **Ratified**: 2026-04-28 | **Last Amended**: 2026-05-27
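Principle VI (identify → revise → re-review) can be sketched as a single step function. This minimal Python sketch rests on three assumptions:

- The three rounds of the per-step cap are R1 (identify), R2 (revise), and R3 (re-review).
- Panelists, the reviser, and the re-judge are plain callables.
- Every name (`Concern`, `StepOutcome`, `run_convergence_step`) is hypothetical, not taken from the llmXive code base.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Concern:
    panelist: str  # which panelist raised it (and later re-judges it)
    text: str


@dataclass
class StepOutcome:
    converged: bool
    artifact: str
    unresolved: list[Concern] = field(default_factory=list)


# R1: a panelist inspects an artifact and returns its critical concerns.
Panelist = Callable[[str], list[Concern]]
# R2: the reviser returns the revised artifact plus a change-log entry per concern index.
Reviser = Callable[[str, list[Concern]], tuple[str, dict[int, str]]]
# R3: the raising panelist decides whether its concern is now resolved.
ReJudge = Callable[[Concern, str, str], bool]


def run_convergence_step(
    artifact: str, panel: list[Panelist], revise: Reviser, rejudge: ReJudge
) -> StepOutcome:
    # R1 (identify): every panelist raises structured concerns.
    concerns = [c for review in panel for c in review(artifact)]
    if not concerns:
        # No concerns at all means the panel unanimously accepts as-is.
        return StepOutcome(converged=True, artifact=artifact)

    # R2 (revise): the reviser addresses every concern and logs each change.
    revised, changelog = revise(artifact, concerns)

    # R3 (re-review): each concern is re-judged against its change-log entry.
    unresolved = [
        c for i, c in enumerate(concerns) if not rejudge(c, revised, changelog.get(i, ""))
    ]

    # Honest reporting: converged only if every concern was accepted as resolved;
    # otherwise the unresolved concerns travel with the kickback as provenance.
    return StepOutcome(converged=not unresolved, artifact=revised, unresolved=unresolved)
```

A step that returns `converged=False` would be kicked back, with `unresolved` as part of its provenance. By construction, the sketch never reports convergence unless every concern was re-judged as resolved.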
12 changes: 6 additions & 6 deletions CLAUDE.md
@@ -7,9 +7,9 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
llmXive is an automated system for scientific discovery driven by LLMs with occasional human input. It's structured as a project management platform with five main task categories, each linked to GitHub issues:

1. **Backlog**: Brainstormed ideas requiring development
-2. **Ready**: Ideas with technical design documents reviewed by ≥10 LLMs or ≥5 human scientists
+2. **Ready**: Ideas whose technical design documents pass **unanimous LLM-panel acceptance** within the 3-round convergence cap (per spec 015 — supersedes the prior "≥10 LLMs / ≥5 humans" point-based threshold)
3. **In Progress**: Ideas with vetted implementation plans ready for execution
-4. **Reviews**: Formal reviews of designs, implementations, papers, and code
+4. **Reviews**: Formal reviews of designs, implementations, papers, and code (advisory only — human/personality reviews route through stage-aware triage and never directly gate advancement)
5. **Done**: Completed projects with associated papers

## Repository Architecture
@@ -29,7 +29,7 @@ Each directory contains a README.md with specific tables tracking projects, cont

### Project Status Management
- Projects move through: Backlog → Ready → In Progress → Done
-- Each stage requires specific point thresholds (LLM reviews = 0.5 points, human reviews = 1 point)
+- Each reviewable stage runs identify→revise→re-review with its LLM panel; advancement requires **unanimous panel acceptance** within 3 rounds (else adaptive kickback to the prior stage). The legacy 0.5/1.0 review-point system has been removed (spec 015).
- Status is tracked via GitHub issue labels and project board columns

### Documentation Standards
@@ -47,7 +47,7 @@ Each directory contains a README.md with specific tables tracking projects, cont
### Review Process
- Review files named as: `author__MM-DD-YYYY__type.md` (type: A=automatic, M=manual)
- Reviews organized in subdirectories: Design/Implementation/Paper/Code
-- Minimum review thresholds must be met before status advancement
+- Advancement requires **unanimous LLM-panel acceptance** within the 3-round convergence cap (spec 015); human and simulated-personality reviews are advisory inputs only, routed through stage-aware triage before reaching a panelist

## Common Development Tasks

@@ -64,11 +64,11 @@ Since this is primarily a research documentation repository without traditional
- Always use absolute paths when referencing files across directories
- Maintain the table structures in README files when adding new entries
- Verify all external links and references before committing
-- Follow the point-based review system for project advancement
+- Follow the convergence-based review model (unanimous LLM-panel acceptance within the 3-round cap; spec 015) for project advancement; do NOT re-introduce accumulated 0.5/1.0 review points
- Use GitHub issues for all project tracking and communication

<!-- SPECKIT START -->
For additional context about technologies to be used, project structure,
shell commands, and other important information, read the current plan:
-[specs/014-phase4-plan-tasks-testing/plan.md](specs/014-phase4-plan-tasks-testing/plan.md).
+[specs/015-pipeline-convergence-protocol/plan.md](specs/015-pipeline-convergence-protocol/plan.md).
<!-- SPECKIT END -->
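The CLAUDE.md context above fixes a review-file naming convention, `author__MM-DD-YYYY__type.md`, where type is A (automatic) or M (manual). That convention can be checked mechanically. The following is a minimal Python sketch; `ReviewFile` and `parse_review_filename` are hypothetical names, not existing repository helpers.

```python
import re
from dataclasses import dataclass
from datetime import date

# author__MM-DD-YYYY__type.md, where type is A (automatic) or M (manual).
_REVIEW_NAME = re.compile(
    r"^(?P<author>.+?)__(?P<mm>\d{2})-(?P<dd>\d{2})-(?P<yyyy>\d{4})__(?P<kind>[AM])\.md$"
)


@dataclass(frozen=True)
class ReviewFile:
    author: str
    reviewed_on: date
    automatic: bool  # True for type A, False for type M


def parse_review_filename(name: str) -> ReviewFile:
    match = _REVIEW_NAME.match(name)
    if match is None:
        raise ValueError(f"not an author__MM-DD-YYYY__type.md name: {name!r}")
    return ReviewFile(
        author=match["author"],
        # date() also rejects impossible dates such as 02-30-2026.
        reviewed_on=date(int(match["yyyy"]), int(match["mm"]), int(match["dd"])),
        automatic=match["kind"] == "A",
    )
```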
52 changes: 35 additions & 17 deletions README.md
@@ -22,10 +22,15 @@ validator) → `specified` → `clarified` → `planned` → `tasked` (+ analyze
`in progress` (the implementer writes code, runs real tests, collects data; the
librarian verifies citations) → `research review`.

-Research review needs **both** a points threshold **and** an accept verdict from
-**every** specialist reviewer in the lane — seven of them: idea quality,
-creativity, implementation correctness, completeness, code quality, data
-quality, filesystem hygiene.
+Research review (spec 015 / #239) runs as an **identify → revise → re-review**
+convergence loop driven by the 8-reviewer panel (idea quality, creativity,
+implementation correctness, completeness, code quality, data quality,
+filesystem hygiene, plus the generic research reviewer). Each panelist raises
+critical concerns; the implementer addresses every concern with a per-concern
+change-log; each panelist re-judges its own concerns. The gate is **unanimous
+panel acceptance** within the 3-round cap; otherwise the project is **kicked
+back** to the appropriate prior stage (adaptive by worst unresolved severity)
+carrying full provenance. There is no accumulated point system.

### The paper pipeline

@@ -41,12 +46,19 @@ specialist** (against the live artifact hash — stale reviews are ignored).

Three terminal outcomes:

-- **All specialists accept** → `paper_accepted` → the `paper_publisher`
-agent (spec 013) pre-reserves a Zenodo DOI, recompiles the PDF with
-the final `\paperstatus{Auto-Reviewed | Auto-Revised | Published}`
-byline + DOI + volume/issue, uploads to Zenodo, appends the
-post-paper appendix (spacer + reviews + revision changelog), writes
-`paper/publication.yaml`, and transitions to `posted`.
+- **All specialists accept** → `paper_accepted` →
+`awaiting_publication_signoff`. The transition through to `posted`
+requires a maintainer to record explicit approval via
+`llmxive project publish-approve <PROJ-ID>` (spec 015 FR-054 — every
+real Zenodo DOI mint gated on a manual sign-off). Once approved, the
+`paper_publisher` agent (spec 013) pre-reserves a Zenodo DOI,
+recompiles the PDF with the final
+`\paperstatus{Auto-Reviewed | Auto-Revised | Published}` byline +
+DOI + volume/issue, uploads to Zenodo, appends the post-paper
+appendix (spacer + reviews + revision changelog), writes
+`paper/publication.yaml`, and transitions to `posted`. The graph and
+the publisher BOTH enforce the sign-off check (defense in depth — no
+DOI is ever minted without a recorded approval).
- **Any `fatal` severity** → `brainstormed` (back to the backlog), with a
rejection rationale appended to the idea record citing each fatal item.
- **Otherwise** (writing/science items, no fatal) → `paper_revision_in_progress`,
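The three terminal outcomes in this hunk, together with the publisher-side sign-off check, reduce to a small decision function and a guard. The following is a minimal Python sketch with these assumptions:

- Each review carries the artifact hash it judged and a list of severity labels.
- A `None` result means "still waiting for fresh reviews". This is an assumption, not documented behavior.
- All-accept is checked before fatal, following the README's order.
- Approvals are modeled as a plain mapping. The real gate records them via `llmxive project publish-approve`.
- `SpecialistReview`, `paper_review_outcome`, `publish`, and `SignoffMissing` are hypothetical names, not the llmXive API.

```python
from dataclasses import dataclass, field
from typing import Callable, Mapping, Optional


@dataclass
class SpecialistReview:
    specialist: str      # e.g. "writing_quality", "claim_accuracy"
    artifact_hash: str   # hash of the paper artifact this review judged
    accept: bool
    severities: list[str] = field(default_factory=list)  # e.g. "fatal", "writing"


def paper_review_outcome(
    reviews: list[SpecialistReview], specialists: list[str], live_hash: str
) -> Optional[str]:
    """Map the specialists' reviews to one of the three terminal outcomes."""
    # Stale reviews (of an older artifact hash) are ignored.
    fresh = {r.specialist: r for r in reviews if r.artifact_hash == live_hash}
    if any(s not in fresh for s in specialists):
        return None  # assumption: keep waiting for every specialist's fresh review
    current = [fresh[s] for s in specialists]
    if all(r.accept for r in current):
        return "paper_accepted"  # then parks in awaiting_publication_signoff
    if any("fatal" in r.severities for r in current):
        return "brainstormed"
    return "paper_revision_in_progress"


class SignoffMissing(RuntimeError):
    """Raised when no maintainer approval is recorded for a project."""


def publish(
    project_id: str,
    approvals: Mapping[str, str],  # project id -> approver (hypothetical store)
    mint_doi: Callable[[str], str],
) -> str:
    # Publisher-side re-check: never mint a DOI without a recorded approval,
    # even if the graph-side gate was somehow bypassed.
    if not approvals.get(project_id, "").strip():
        raise SignoffMissing(f"{project_id}: no recorded publish approval")
    return mint_doi(project_id)
```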
@@ -76,7 +88,10 @@ The twelve specialist reviewers (writing quality, logical consistency,
claim accuracy, over-reach, safety/ethics, scientific evidence,
statistical analysis, code quality, data quality, text formatting,
figure critic, jargon police) each emit action items in their lane.
-Human reviews count double; self-review is rejected by the schema.
+Human and simulated-personality reviews are **advisory inputs**, routed
+through a stage-aware triage (quality + safety + on-topic filters) to the
+matching LLM reviewer's lens — they inform a reviewer's verdict but never
+directly gate advancement. Self-review is rejected by the schema.

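The advisory triage described in the paragraph above has two parts. First, drop feedback that fails the quality, safety, or on-topic filters. Second, route what survives to one reviewer's lens for the current stage. The following is a minimal Python sketch under two stated assumptions. Keyword matching stands in for the real matching, which this diff does not show (the commit list mentions an LLM-based off-topic judge). `triage_advisory` and the lens mapping are hypothetical.

```python
from typing import Callable, Mapping, Optional

# A filter takes (feedback text, stage) and returns True to keep the item.
Filter = Callable[[str, str], bool]


def triage_advisory(
    text: str,
    stage: str,
    filters: list[Filter],
    lenses: Mapping[str, Mapping[str, list[str]]],
) -> Optional[str]:
    """Return the LLM reviewer whose lens should receive this feedback, or None.

    The result only says where the feedback is routed as advisory context;
    nothing here produces a verdict or a point, so it cannot gate advancement.
    """
    # Quality / safety / on-topic filters: any failure drops the item.
    if not all(keep(text, stage) for keep in filters):
        return None
    # Stage-aware routing: only lenses registered for this stage are candidates.
    lowered = text.lower()
    for reviewer, keywords in lenses.get(stage, {}).items():
        if any(k in lowered for k in keywords):
            return reviewer
    return None
```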
arXiv-submitted papers (third-party, source frozen) skip the writing-
revision pipeline. Instead the consolidated action items land in
@@ -142,8 +157,9 @@ run-log entry.

All inference runs on free backends: Dartmouth's
[Discovery cluster](https://rc.dartmouth.edu/ai/computing-resources/discovery-cluster/)
-(primary), [Hugging Face](https://huggingface.co/) (fallback), and local
-transformers (last resort). Long, complex tasks (planning, paper writing, deep
+(primary) and local [transformers](https://huggingface.co/docs/transformers)
+(fallback) — open-weight Hugging Face models run locally, no API token.
+Long, complex tasks (planning, paper writing, deep
review) go to **Qwen 3.5 122B**; faster classification-shaped tasks (clarifying
questions, triage, quick judgments) go to **Gemma 3 27B**. No paid services
(Constitution Principle IV — free-first).
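The backend choice and model routing in this hunk amount to two small lookups. The backend is Dartmouth Chat when `DARTMOUTH_CHAT_API_KEY` is set, otherwise local transformers. The model is Qwen 3.5 122B for long, complex work and Gemma 3 27B for classification-shaped tasks. The following is a minimal Python sketch with these assumptions:

- The task-kind labels are hypothetical.
- The returned model strings are display labels, not API model IDs.
- Only the key's presence is checked. The real client may also fall back on outages.

```python
import os
from typing import Mapping, Optional

# Hypothetical task labels; the README names the categories, not these strings.
LONG_TASKS = {"planning", "paper_writing", "deep_review"}
FAST_TASKS = {"clarifying_questions", "triage", "quick_judgment"}


def choose_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Dartmouth Chat when a key is configured, else local transformers."""
    env = os.environ if env is None else env
    return "dartmouth" if env.get("DARTMOUTH_CHAT_API_KEY") else "local-transformers"


def choose_model(task: str) -> str:
    """Route long, complex work to the large model and quick judgments to the small one."""
    if task in LONG_TASKS:
        return "Qwen 3.5 122B"
    if task in FAST_TASKS:
        return "Gemma 3 27B"
    raise ValueError(f"unknown task kind: {task!r}")
```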
@@ -176,7 +192,7 @@ never duplicates data, it derives it.
feedback; the `submission_intake` agent (hourly cron) triages it to the right
pipeline step.
- **Review existing content** — sign in with GitHub and add a verdict on a
-project's spec, plan, code, data, or paper. Human reviews count double.
+project's spec, plan, code, data, or paper. Human reviews are advisory inputs (triaged + routed to the matching LLM reviewer's lens, never a gate).
- **Explore the pipeline / agent registry** — the About page's pipeline diagram
and "Agent registry" button open in-place modals with each step's
inputs/outputs/agents/examples and each agent's prompt + tools.
@@ -213,6 +229,8 @@ python -m llmxive preflight # fail-fast environment check
python -m llmxive brainstorm -n 5 # seed 5 brainstormed ideas
python -m llmxive run --max-tasks 5 # run one scheduled pipeline pass
python -m llmxive submissions process # triage open human-submission issues
+python -m llmxive project publish-approve PROJ-001 \
+  --who 'Maintainer Name' --what 'reviewed paper meets standards' # spec 015 FR-054
python -m llmxive agents run --agent <name> --project <PROJ-ID>
```

Expand All @@ -222,8 +240,8 @@ research/paper stages, `python -m llmxive submissions process` for the website
intake, and `Deploy Pages` to publish `web/` → `docs/`.

LLM calls need a Dartmouth Chat API key (`DARTMOUTH_CHAT_API_KEY`, or
-`python -m llmxive auth set`); without it the backends fall through to Hugging
-Face (`HF_TOKEN`) then local transformers.
+`python -m llmxive auth set`); without it the backends fall through to local
+transformers (open-weight Hugging Face models run locally; no token required).

### Audit tools (spec 010)

@@ -278,7 +296,7 @@ About page):
3. **Provide feedback** — leave feedback on any artifact; it's triaged within
the hour.
4. **Review existing content** — add a human review on a project at a review
-stage. Human reviews count double.
+stage. Human reviews are advisory inputs (triaged + routed to the matching LLM reviewer's lens, never a gate).

## License
