v2.4.0
MINOR release, driven by a 10-perspective adversarial audit of a real
project run (AUDIT_ontology_mapping.md; 233 findings from 10 personas:
PI, junior researcher, senior domain reviewer, fresh-AI handoff,
Research-OS architect, code quality, organization, outputs quality,
docs, and reproducibility/citations). The synthesis found that
v2.0–v2.3 succeeded at consistent structure (every project gets the
same folder layout) but failed at consistent substance. Auto-generated
figure captions leaked into papers as placeholder rows, hallucinated
bibliographies survived to submission, and empty literature/ stubs read
as "no citations needed" when in fact the AI simply hadn't downloaded
any. v2.4.0 closes the highest-impact gaps without breaking existing
projects.
Added
- audit_pdf_grounding(entries, root) in
tools/actions/synthesis/citations.py: reports which citation entries
have a downloaded PDF on disk and which do not. Searches
inputs/literature/<key>.pdf, inputs/literature/<doi-slug>.pdf, and
workspace/*/literature/<key>.pdf. Returns
{grounded: [...], ungrounded: [{key, doi, url, title}, ...], count,
grounded_count}. Closes the audit's strongest unified finding (8/10
auditors): a project shipped 21 references in synthesis/references.bib
while find . -name '*.pdf' returned zero results.
- require_pdfs flag on write_references_bib. When true, ungrounded
entries are dropped from the bib and listed, commented out, under an
UNGROUNDED ENTRIES block at the end of the file. The default keeps every
entry but adds a header comment stating how many lack on-disk
grounding, so the gap is visible in the bib even without opting in.
- New figures: block in researcher_config.yaml. Three knobs
(svg_allowed, summary_sidecar, interactive_html_allowed) control the
per-figure sidecar regime. All three default to a lean shape (no SVG,
no auto-summary, interactive HTML allowed). Added to both
templates/researcher_config.yaml and the in-code CONFIG_TEMPLATE, which
test_config_template_matches_file keeps in sync.
- figures is registered in the docs/CONTRACT.md A.3 stable-section list.
- validation_warnings on active_plan.json. _persist_active_plan now
scans the decomposition for entries whose tool field is in
_REMOVED_TOOLS (tool_synthesize, tool_dashboard, tool_slides_create,
etc.) and writes a per-step warning. This surfaces stale router-index
entries at plan-write time, so the AI sees them before dispatching
rather than after burning a turn on the friendly redirect.
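The search order used by audit_pdf_grounding can be sketched roughly as follows. This is a minimal illustration, not the code in citations.py: the name audit_pdf_grounding_sketch, the dict-shaped entries (key, doi, url, title), and the DOI-to-slug rule in _doi_slug are all assumptions.

```python
import re
from pathlib import Path


def _doi_slug(doi: str) -> str:
    # Assumed slug rule: lowercase, then collapse non-alphanumeric runs to "-".
    return re.sub(r"[^a-z0-9]+", "-", doi.lower()).strip("-")


def audit_pdf_grounding_sketch(entries: list[dict], root: Path) -> dict:
    """Split citation entries into grounded (PDF on disk) and ungrounded."""
    literature = root / "inputs" / "literature"
    grounded: list[str] = []
    ungrounded: list[dict] = []
    for entry in entries:
        key = entry["key"]
        doi = entry.get("doi")
        candidates = [literature / f"{key}.pdf"]
        if doi:
            candidates.append(literature / f"{_doi_slug(doi)}.pdf")
        # Any step folder under workspace/ may hold the PDF.
        candidates.extend(root.glob(f"workspace/*/literature/{key}.pdf"))
        if any(path.is_file() for path in candidates):
            grounded.append(key)
        else:
            ungrounded.append({field: entry.get(field) for field in ("key", "doi", "url", "title")})
    return {
        "grounded": grounded,
        "ungrounded": ungrounded,
        "count": len(entries),
        "grounded_count": len(grounded),
    }
```

Returning both count and grounded_count makes the audited failure mode easy to spot: 21 references with zero PDFs shows up as grounded_count 0 against count 21.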
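A minimal sketch of the plan-write check behind validation_warnings. REMOVED_TOOLS holds only the three tool names listed above (the real _REMOVED_TOOLS set is longer). The decomposition shape (a list of dicts with a tool field and an optional id), the warning fields, and both function names are hypothetical.

```python
import json
from pathlib import Path

# Illustrative subset: only the names listed in the release note.
REMOVED_TOOLS = frozenset({"tool_synthesize", "tool_dashboard", "tool_slides_create"})


def collect_validation_warnings(decomposition: list[dict]) -> list[dict]:
    """Return one warning per plan step whose tool field names a removed tool."""
    warnings = []
    for index, step in enumerate(decomposition):
        tool = step.get("tool")
        if tool in REMOVED_TOOLS:
            warnings.append({
                "step": step.get("id", index),  # fall back to position if no id
                "tool": tool,
                "message": f"{tool} was removed; re-route this step before dispatching.",
            })
    return warnings


def persist_active_plan_sketch(plan: dict, path: Path) -> None:
    """Write a copy of the plan with validation_warnings attached."""
    to_write = dict(plan, validation_warnings=collect_validation_warnings(plan.get("decomposition", [])))
    path.write_text(json.dumps(to_write, indent=2), encoding="utf-8")
```

Running the scan at write time, rather than at dispatch, is what lets the warning sit in active_plan.json where the AI reads it before choosing a tool.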
Changed
- Figure audit no longer warns "PNG without SVG companion" by
default. audit_figure_quality reads researcher_config.figures.* via the
new _load_figures_config helper; the SVG warning fires only when
svg_allowed=true, and the summary-sidecar warning fires only when
summary_sidecar=true. This removes a long-running source of
false-positive noise.
- tool_path_finalize stops auto-emitting .summary.md sidecars.
Plain-English interpretation is now integrated into conclusions.md,
next to the inline embed. The auto-generated sidecars trained the AI to
leave stub captions ("Auto-drafted caption: regenerate from analysis
context") that leaked verbatim into one project's synthesis/paper.md as
92 placeholder rows, visibly telling reviewers the AI gave up. Opt back
in via figures.summary_sidecar=true.
- AGENTS.md hard rule #10 rewritten. It replaces the "every figure
carries four sidecars including an SVG companion" mandate with a lean
default (<slug>.png plus an authored <slug>.caption.md), opt-in SVG and
summary sidecars, encouragement of interactive .html companions for
visualisation types that benefit from them (networks, multi-panel
dashboards), and an explicit requirement that the AI sys_file_read
every figure before declaring a step done. That visual check catches
legend-over-plot, missing axis labels, palette regressions, and
snake_case identifiers leaking into labels: bugs no JSON audit catches.
- _seed_step_subfolder_readmes stops pre-creating stub READMEs in
each step's literature/, environment/, and context/ folders. These dirs
stay in EXPERIMENT_SUBDIRS (the paths still exist) but remain empty
until a tool writes into them. The audit found that pre-seeded stubs
trained the AI to leave dirs as boilerplate, made literature/ read as
"no citations" when the AI simply hadn't downloaded any, and cluttered
every step folder with content nobody wrote. The README that answers
"what goes here?" now lives once in RESEARCHER_GUIDE.md instead of
being duplicated 14× on disk.
- outputs/README.md template updated to reflect the new figure
contract: reports go DEEPER than conclusions.md (choices, reasoning,
comparison of options); figures are .png-only by default, with optional
interactive .html companions; the AI MUST read each figure before
finalize.
- Doc hardening: dropped hardcoded tool/protocol counts across
README.md and docs/{TOOLS,PROTOCOLS,RESEARCHER_GUIDE,START,AI_GUIDE}.md,
replacing "144 tools" / "117 protocols" with approximate or count-free
phrasing ("~150 tools", "100+ protocols", "every tool", "All core
protocols"). CLAUDE.md doctrine already forbids hand-written counts, and
the maintainer was violating it in 9+ places; counts go stale within a
release. CONTRACT.md keeps its v2.0.0-anchored snapshot table.
- Doc drift fix: README.md:117 code/ → scripts/. The README's
file-layout diagram showed 01_baseline_eda/code/, while the framework,
RESEARCHER_GUIDE, and every real project use scripts/. A junior
researcher reading the README and then opening a real project would
have hit the inconsistency immediately.
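How the figures: knobs might gate audit warnings, as a rough sketch. It assumes researcher_config.yaml has already been parsed into a dict (YAML loading is omitted to stay stdlib-only). load_figures_config and figure_warnings are hypothetical stand-ins for _load_figures_config and the relevant part of audit_figure_quality, and the warning strings are illustrative.

```python
from pathlib import Path

# Lean defaults as described in this release.
FIGURE_DEFAULTS = {"svg_allowed": False, "summary_sidecar": False, "interactive_html_allowed": True}


def load_figures_config(researcher_config: dict) -> dict:
    """Overlay the optional figures: block on the lean defaults."""
    block = researcher_config.get("figures") or {}  # absent or empty block -> defaults
    return {key: bool(block.get(key, default)) for key, default in FIGURE_DEFAULTS.items()}


def figure_warnings(png: Path, figures_cfg: dict) -> list[str]:
    """Return sidecar warnings for one PNG, each gated by its config knob."""
    warnings = []
    if figures_cfg["svg_allowed"] and not png.with_suffix(".svg").exists():
        warnings.append(f"{png.name}: PNG without SVG companion")
    summary = png.with_name(png.stem + ".summary.md")
    if figures_cfg["summary_sidecar"] and not summary.exists():
        warnings.append(f"{png.name}: missing .summary.md sidecar")
    return warnings
```

Keeping the defaults in one mapping means a project with no figures: block gets the lean behaviour without any special-casing, while the v2.3 behaviour is one config edit away.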
Migration
- Existing projects are unaffected by the figure default change:
audit_figure_quality still reads existing .summary.md and .svg files
when they are present; it simply no longer warns when they are absent.
To restore the v2.3 warning behaviour, add to
inputs/researcher_config.yaml:

      figures:
        svg_allowed: true
        summary_sidecar: true

- Existing per-step literature/README.md, environment/README.md, and
context/README.md stubs are not touched; the change affects only
newly-created steps. Delete the stubs by hand if you want empty dirs in
legacy steps.
- write_references_bib gained two optional kwargs (root,
require_pdfs). All existing positional calls keep working; opt in to
PDF filtering by passing both.
- The AGENTS.md template change does NOT propagate to existing
projects (the wizard copies it only once). Re-run research-os init in a
temp dir and diff its AGENTS.md against your project's copy to pick up
the new hard rule #10 wording. A research-os refresh CLI subcommand to
do this automatically is planned for 2.4.x.
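Why adding optional keyword arguments keeps old callers working can be seen in a sketch like the one below. Everything here is an assumption for illustration: the positional parameters (entries, out_path), the entry shape (a key plus a raw bibtex string), the single-location PDF check in _has_pdf, and raising when require_pdfs is set without root. In this sketch the header comment is written only when root is supplied. The real write_references_bib may differ on all of these points.

```python
from pathlib import Path


def _has_pdf(root: Path, key: str) -> bool:
    # Simplified grounding check: only inputs/literature/<key>.pdf is searched.
    return (root / "inputs" / "literature" / f"{key}.pdf").is_file()


def write_references_bib_sketch(entries, out_path, root=None, require_pdfs=False):
    """Write BibTeX; root and require_pdfs are optional, so old calls still work."""
    if require_pdfs and root is None:
        raise ValueError("require_pdfs=True needs root to locate PDFs")
    lines: list[str] = []
    dropped: list[dict] = []
    if root is not None:
        ungrounded = [e for e in entries if not _has_pdf(Path(root), e["key"])]
        if require_pdfs:
            # Remove ungrounded entries from the live bib; they reappear commented out below.
            dropped = ungrounded
            dropped_keys = {e["key"] for e in dropped}
            entries = [e for e in entries if e["key"] not in dropped_keys]
        else:
            # Keep everything, but make the grounding gap visible at the top.
            lines.append(f"% {len(ungrounded)} of {len(entries)} entries lack an on-disk PDF")
    lines.extend(e["bibtex"] for e in entries)
    if dropped:
        lines.append("% UNGROUNDED ENTRIES")
        for e in dropped:
            lines.extend("% " + row for row in e["bibtex"].splitlines())
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return dropped
```

Because root and require_pdfs default to None and False, a legacy write_references_bib_sketch(entries, path) call produces the same output as before the change.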
Not in this release (planned for 2.4.x / 2.5.0)
The full audit surfaced ~50 P0 framework changes; this release ships
the highest-impact subset that doesn't break existing projects. The
following remain for follow-up:
- Per-step step_summary.yaml retirement: the YAML stub anti-pattern
flagged by 9/10 audits. The derived emit in tool_path_finalize stays in
2.4.0; the editable scaffold via step_summary.yaml.template and the
update_step_summary step in analysis_plan.yaml /
literature_per_step.yaml await migration to prompt-laden README prose.
- .os_state simplification: collapse the overlap between
state_ledger.json and manifest.json, drop dead fields, and bound
checkpoint storage (a single snapshot can be 39 MB of duplicated
workspace, and there is no GC).
- research-os refresh CLI subcommand: auto-upgrade the AGENTS.md,
CLAUDE.md, and IDE-config templates in an existing project to match the
bundled current version.
- Sparse-root finalize hook: regenerate the top-level README.md at
project finalize (it is currently written once, at init).
- Per-step logs/ removal, plus a canonical home for cross-step
utilities (workspace/scratch/ IS used in practice, but the framework
doesn't document a canonical place for it).
- Hard removal of the .preregistration/ and .grounding/ hidden dirs in
workspace (their content moves into the per-step README /
methodology.md and .os_state/grounding.jsonl).
Verified
- Preflight: 29/29 passed.
- Pytest: all green.
- Ruff: clean.
Bumped
pyproject.toml, src/research_os/__init__.py, and CITATION.cff to
2.4.0.