v2.4.0

Released by github-actions on 09 Jun 02:05 · commit d00b4a9

MINOR release. Driven by a 10-perspective adversarial audit of a real
project run (AUDIT_ontology_mapping.md, 233 findings across 10
personas — PI, junior researcher, senior domain reviewer, fresh-AI
handoff, Research-OS architect, code-quality, organization, outputs
quality, docs, reproducibility/citations). The synthesis identified
v2.0–v2.3 as having succeeded at producing consistent structure
(every project gets the same folder layout) but failing at consistent
substance (auto-generated figure captions leaked into papers as
placeholder rows; hallucinated bibliographies survived to submission;
empty literature/ stubs read as "no citations needed" when really the
AI just hadn't downloaded any). v2.4.0 closes the highest-impact gaps
without breaking existing projects.

Added

  • audit_pdf_grounding(entries, root) in
    tools/actions/synthesis/citations.py — reports which citation
    entries have a downloaded PDF on disk vs which don't. Searches
    inputs/literature/<key>.pdf, inputs/literature/<doi-slug>.pdf,
    and workspace/*/literature/<key>.pdf. Returns
    {grounded: [...], ungrounded: [{key, doi, url, title}, ...], count, grounded_count}. Closes the audit's strongest unified
    finding (8/10 auditors): a project shipped 21 references in
    synthesis/references.bib while find . -name '*.pdf' returned
    zero results.
  • require_pdfs flag on write_references_bib — when true, drops
    ungrounded entries from the bib and lists them at the file tail as
    commented-out UNGROUNDED ENTRIES. Default keeps every entry but
    adds a header comment noting how many lack on-disk grounding so the
    gap is visible at the bib level even without opting in.
  • figures: block in researcher_config.yaml — three knobs
    (svg_allowed, summary_sidecar, interactive_html_allowed) that
    control the per-figure sidecar regime. All three default to a lean
    shape (no SVG, no auto-summary, interactive HTML allowed). Added to
    both templates/researcher_config.yaml and the in-code
    CONFIG_TEMPLATE, kept in sync by
    test_config_template_matches_file. figures registered in
    docs/CONTRACT.md A.3 stable-section list.
  • validation_warnings on active_plan.json: _persist_active_plan
    now scans the decomposition for entries whose tool field is in
    _REMOVED_TOOLS (tool_synthesize, tool_dashboard,
    tool_slides_create, etc.) and writes a per-step warning. Surfaces
    stale router-index entries at plan-write time so the AI sees them
    before dispatching, not after burning a turn on the friendly
    redirect.
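
To make the PDF-grounding check above concrete, here is a minimal sketch of the lookup it describes. It is not the code in tools/actions/synthesis/citations.py. The name audit_pdf_grounding_sketch, the doi_slug rule, and the choice to report grounded entries by key are all assumptions made for illustration; only the three search locations and the four keys of the returned dict come from the notes above.

```python
import re
from pathlib import Path


def doi_slug(doi):
    """Hypothetical slug rule (not defined in the release notes):
    collapse every run of non-alphanumeric characters into '_'."""
    return re.sub(r"[^A-Za-z0-9]+", "_", doi).strip("_")


def audit_pdf_grounding_sketch(entries, root):
    """Report which citation entries have a PDF on disk under root.

    Each entry is assumed to be a dict with a 'key' and optional
    'doi', 'url', and 'title' fields.
    """
    root = Path(root)
    literature = root / "inputs" / "literature"
    workspace = root / "workspace"
    grounded, ungrounded = [], []

    for entry in entries:
        key = entry["key"]
        doi = entry.get("doi")

        # Location 1: inputs/literature/<key>.pdf
        candidates = [literature / f"{key}.pdf"]
        # Location 2: inputs/literature/<doi-slug>.pdf
        if doi:
            candidates.append(literature / f"{doi_slug(doi)}.pdf")
        # Location 3: workspace/*/literature/<key>.pdf
        if workspace.is_dir():
            for step_dir in workspace.iterdir():
                candidates.append(step_dir / "literature" / f"{key}.pdf")

        if any(path.is_file() for path in candidates):
            grounded.append(key)  # assumption: grounded entries are listed by key
        else:
            ungrounded.append(
                {
                    "key": key,
                    "doi": doi,
                    "url": entry.get("url"),
                    "title": entry.get("title"),
                }
            )

    return {
        "grounded": grounded,
        "ungrounded": ungrounded,
        "count": len(entries),
        "grounded_count": len(grounded),
    }
```

A require_pdfs-style filter could then keep only the entries whose key appears in the grounded list and write the rest out as comments, which is the behaviour the second bullet describes.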

Changed

  • Figure audit no longer warns "PNG without SVG companion" by
    default. audit_figure_quality reads researcher_config.figures.*
    via the new _load_figures_config helper; the SVG warning fires
    only when svg_allowed=true; the summary-sidecar warning fires
    only when summary_sidecar=true. Drops a long-running source of
    false-positive noise.
  • tool_path_finalize stops auto-emitting .summary.md sidecars.
    Plain-English interpretation now integrates into conclusions.md
    next to the inline ![](outputs/figures/<slug>.png) embed. The
    auto-generated sidecars trained the AI to leave stub captions
    ("Auto-drafted caption: regenerate from analysis context") that
    leaked verbatim into one project's synthesis/paper.md as 92
    placeholder rows visibly telling reviewers the AI gave up. Opt back
    in via figures.summary_sidecar=true.
  • AGENTS.md hard rule #10 rewritten. Replaces the "every figure
    carries four sidecars including an SVG companion" mandate with a
    lean default (<slug>.png + an authored <slug>.caption.md), opt-in
    SVG / summary sidecars, encouragement of interactive .html
    companions for visualisation types that benefit (networks,
    multi-panel dashboards), and an explicit requirement that the AI
    sys_file_read every figure before declaring a step done — catches
    legend-over-plot, missing axis labels, palette regressions,
    snake-case-leaking-into-label bugs that no JSON audit catches.
  • _seed_step_subfolder_readmes stops pre-creating stub READMEs
    in literature/, environment/, and context/ per step. These
    dirs stay in EXPERIMENT_SUBDIRS (paths exist) but are empty until
    a tool writes into them. The audit found that pre-seeded stubs
    trained the AI to leave dirs as boilerplate, made literature/ read
    as "no citations" when the AI simply hadn't downloaded any, and
    cluttered every step folder with content nobody wrote. The README
    that answers "what goes here?" now lives once in
    RESEARCHER_GUIDE.md rather than duplicated 14× on disk.
  • outputs/README.md template updated to reflect the new figure
    contract: reports go DEEPER than conclusions.md (choices,
    reasoning, comparison of options); figures are .png-only by
    default with optional interactive .html companions; AI MUST read
    each figure before finalize.
  • Doc hardening: dropped hardcoded tool / protocol counts across
    README.md, docs/{TOOLS,PROTOCOLS,RESEARCHER_GUIDE,START,AI_GUIDE}.md.
    Replaces "144 tools" / "117 protocols" with vague phrases
    ("~150 tools", "100+ protocols", "every tool", "All core protocols").
    CLAUDE.md doctrine already forbids hand-written counts; the
    maintainer was violating it in 9+ places. Counts go stale within a
    release. CONTRACT.md keeps its v2.0.0-anchored snapshot table.
  • Doc drift fix: README.md:117 code/ → scripts/. The README
    showed 01_baseline_eda/code/ in its file-layout diagram while the
    framework, RESEARCHER_GUIDE, and every real project use scripts/.
    A junior researcher walking through README and then opening a real
    project would have hit the inconsistency immediately.
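
The opt-in gating described in the first Changed item can be sketched as below. This is an illustration, not the framework's _load_figures_config or audit_figure_quality. The function names, the warning strings, and the use of an already-parsed config dict are assumptions; the three knob names and their lean defaults are taken from the notes above.

```python
from pathlib import Path

# Lean defaults as described in these notes: no SVG, no auto-summary,
# interactive HTML allowed.
LEAN_FIGURE_DEFAULTS = {
    "svg_allowed": False,
    "summary_sidecar": False,
    "interactive_html_allowed": True,
}


def load_figures_config_sketch(researcher_config):
    """Merge the 'figures' block of an already-parsed config dict over
    the lean defaults. Missing or empty blocks fall back to defaults."""
    merged = dict(LEAN_FIGURE_DEFAULTS)
    merged.update((researcher_config or {}).get("figures") or {})
    return merged


def audit_figure_sketch(png_path, figures_cfg):
    """Return sidecar warnings for one PNG, firing only for regimes the
    researcher has opted into."""
    png = Path(png_path)
    warnings = []

    # The SVG warning fires only when svg_allowed is true.
    if figures_cfg["svg_allowed"] and not png.with_suffix(".svg").exists():
        warnings.append(f"{png.name}: PNG without SVG companion")

    # The summary-sidecar warning fires only when summary_sidecar is true.
    summary = png.with_name(png.stem + ".summary.md")
    if figures_cfg["summary_sidecar"] and not summary.exists():
        warnings.append(f"{png.name}: missing .summary.md sidecar")

    return warnings
```

With an empty config both checks are skipped, so a project that ships only PNGs gets no sidecar warnings. Setting either knob to true in the figures block re-enables the matching check.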

Migration

  • Existing projects are unaffected by the figure default change:
    audit_figure_quality still reads existing .summary.md and
    .svg files when they're present; the change is that it no longer
    warns on their absence. To restore the v2.3 warning behaviour,
    add to inputs/researcher_config.yaml:
    figures:
      svg_allowed: true
      summary_sidecar: true
  • Existing per-step literature/README.md / environment/README.md /
    context/README.md stubs are not touched; the change only affects
    newly created steps. Delete the stubs by hand if you want empty
    dirs in legacy steps.
  • write_references_bib signature gained two optional kwargs
    (root, require_pdfs). All existing positional calls keep
    working; opt in to PDF filtering by passing both.
  • AGENTS.md template change does NOT propagate to existing
    projects (the wizard only copies once). Re-run research-os init
    in a temp dir and diff the AGENTS.md against your project's copy
    to pick up the new hard rule #10 wording. A research-os refresh
    CLI subcommand to do this automatically is planned for 2.4.x.
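
Any diff tool works for the AGENTS.md comparison in the last migration note. As one option, here is a minimal sketch using Python's standard difflib module. The function name and both paths are hypothetical; it assumes you have already run research-os init in a temp dir and know where the fresh AGENTS.md landed.

```python
import difflib
from pathlib import Path


def diff_agents_md(project_copy, fresh_copy):
    """Return a unified diff from the project's AGENTS.md to a freshly
    generated one. An empty string means the files are identical."""
    old_lines = Path(project_copy).read_text(encoding="utf-8").splitlines(keepends=True)
    new_lines = Path(fresh_copy).read_text(encoding="utf-8").splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=str(project_copy),
            tofile=str(fresh_copy),
        )
    )


if __name__ == "__main__":
    # Hypothetical locations: adjust both paths to your own layout.
    print(diff_agents_md("my_project/AGENTS.md", "/tmp/fresh_init/AGENTS.md"))
```

Lines prefixed with "+" in the output are wording present only in the fresh template, which is where the rewritten hard rule #10 should show up.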

Not in this release (planned for 2.4.x / 2.5.0)

The full audit surfaced ~50 P0 framework changes; this release ships
the highest-impact subset that doesn't break existing projects. The
following remain for follow-up:

  • Per-step step_summary.yaml retirement: the YAML stub anti-pattern
    flagged by 9/10 audits. The derived emit in tool_path_finalize
    stays in 2.4.0; the editable scaffold via step_summary.yaml.template
    and the update_step_summary step in analysis_plan.yaml /
    literature_per_step.yaml await migration to prompt-laden README
    prose.
  • .os_state simplification: collapse state_ledger.json +
    manifest.json overlap, drop dead fields, bound checkpoint storage
    (single snapshot can be 39 MB of duplicate workspace; no GC).
  • research-os refresh CLI subcommand: auto-upgrade
    AGENTS.md / CLAUDE.md / IDE-config templates in an existing
    project to match the bundled current version.
  • Sparse-root finalize hook: regenerate top-level README.md at
    project finalize (currently write-once at init).
  • Per-step logs/ removal + cross-step utility canonical home
    (workspace/scratch/ IS used in practice but the framework doesn't
    document a canonical place for it).
  • Hard removal of .preregistration/ + .grounding/ hidden dirs in
    workspace (content moves into per-step README / methodology.md +
    .os_state/grounding.jsonl).

Verified

  • Preflight: 29/29 passed.
  • Pytest: all green.
  • Ruff: clean.

Bumped

  • pyproject.toml, src/research_os/__init__.py, CITATION.cff to
    2.4.0.