Releases: VibhavSetlur/Research-OS

v3.1.0

17 Jun 15:54
d6b87aa

MINOR release. Backwards-compatible: every existing tool keeps its name + schema
(new capability is added via new operations/scopes and a new alias). Driven by a
fresh 11-area discovery audit of the 3.0.0 codebase.

Added

  • Compiled routing sidecar (_route_meta.json). Routing no longer parses the
    104K _router_index.yaml at runtime. build_embeddings.py compiles a compact,
    comments-free JSON mirror (protocols/shortcut_intents/hierarchy + pre-baked
    tier + workflow_shape) that router.py and semantic.py share via a single
    load. It parses ~300× faster (~0.42 ms vs ~126 ms) and removes the per-route
    protocol-body reads. The YAML stays the authoring source; --route-meta-only
    rebuilds the sidecar without fastembed. New preflight gate validates the sidecar
    is fresh + consistent + embeddings-parity-checked.
  • tool_verify(scope='outputs') — the "did the work actually land?" gate.
    Resolves a protocol's declared expected_outputs against the filesystem
    (glob-aware) and reports each present / empty / missing with a next_action.
    The injected protocol-completion step now requires it before logging
    completed, so the system refuses to call a missing or empty file "done".
    docs/VERSIONING.md documents the in-project versioning convention.
  • sys_path(operation='rename') — give a generic analysis step a meaningful
    label. Keeps the NN_ lineage number, renames the folder, and re-points every
    downstream data/* symlink that targeted it. sys_step is now an alias for
    sys_path (the clearer name for numbered steps).
  • Routing-targets preflight gate — every next_protocol / on_failure /
    see_also must point at a real protocol (dangling links were previously silent).
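
The outputs gate described above can be sketched as follows. This is a minimal illustration, not the shipped tool_verify code: the function names, the report keys other than the present / empty / missing statuses, and the wording of next_action are assumptions made for the example.

```python
import glob
from pathlib import Path


def check_expected_outputs(root, expected_outputs):
    """Classify each declared output pattern as present / empty / missing.

    Hypothetical helper: `expected_outputs` is a list of glob patterns
    relative to `root`, standing in for a protocol's declared outputs.
    """
    report = []
    for pattern in expected_outputs:
        matches = glob.glob(str(Path(root) / pattern), recursive=True)
        files = [Path(m) for m in matches if Path(m).is_file()]
        if not files:
            status = "missing"
            next_action = f"produce a file matching {pattern!r}"
        elif any(f.stat().st_size == 0 for f in files):
            status = "empty"
            next_action = f"re-run the step that writes {pattern!r}"
        else:
            status = "present"
            next_action = None
        report.append(
            {"pattern": pattern, "status": status, "next_action": next_action}
        )
    return report


def may_log_completed(report):
    """Only allow 'completed' when every declared output is present and non-empty."""
    return all(item["status"] == "present" for item in report)
```

In this sketch a single zero-byte match is enough to mark a pattern as empty, which mirrors the stated goal of refusing to call a missing or empty file "done".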

Improved

  • Figures. tool_figure_palette('accent') now returns the exact RO_PALETTE
    colours apply_research_os_style applies (a hand-coloured figure matches an
    auto-styled one); adds diverging_emphasis. audit_figure_quality runs its
    text-overlap + default-font (DejaVu) legibility scan on a PNG's sibling SVG too,
    and a corrupt/empty image now warns instead of crashing the audit.
  • Synthesis deliverables. tool_typst_compile archives the prior render to
    synthesis/archive/<name>_<timestamp>.pdf before overwriting (no silent
    clobber), flags single-page-target overflow (poster/cover-letter rendered to
    >1 page = content overflowed, which is where overlapping text shows up), and counts
    pages without the off-by-one /Type /Pages miscount. The poster check no longer
    false-blocks scaffold-authored posters (#headline / #block-section).

  • Wizard. Ctrl+C/Ctrl+D mid-wizard exits cleanly instead of a traceback; the
    "already exists" check moved to the start (no more filling out the whole wizard
    to be rejected at the end); email + ORCID inputs are format-validated.
  • Doctrine. power_analysis replaced its data-shape→test-family menu with
    scaffold form (name the dimensions that fix the test, justify the choice).
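
The archive-before-overwrite behaviour of tool_typst_compile can be pictured with a small sketch. The helper name, the default archive location relative to the PDF, and the timestamp format are assumptions; the release notes only specify the synthesis/archive/<name>_<timestamp>.pdf shape.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_prior_render(pdf_path, archive_dir=None, now=None):
    """Copy an existing render into an archive folder before it is overwritten.

    Hypothetical helper. Returns the archived path, or None when there is
    no prior render to preserve.
    """
    pdf_path = Path(pdf_path)
    if not pdf_path.is_file():
        return None
    archive_dir = Path(archive_dir) if archive_dir else pdf_path.parent / "archive"
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Timestamp format is an assumption; any sortable format would do.
    stamp = (now or datetime.now()).strftime("%Y%m%dT%H%M%S")
    target = archive_dir / f"{pdf_path.stem}_{stamp}.pdf"
    shutil.copy2(pdf_path, target)
    return target
```

Calling this immediately before the compile step writes its output leaves the old PDF recoverable under archive/ instead of silently clobbered.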

Fixed

  • state_freshness_check read workspace/state.json — a file that never exists —
    so the staleness signal was permanently dead; now reads the real ledger.
  • get_dag_path / add_dag_node stopped persisting the constant
    execution_dag_path back into the ledger every call (write churn + schema noise).
  • Dead-end pause detection read protocol_name but the execution log writes
    protocol — the signal was silently dead.
  • Autopilot gate used str.lstrip('./') (strips characters), so .synthesis/x
    was mangled into synthesis/x and falsely gated; now strips the prefix properly.
  • Maintainer docs (CLAUDE.md) pointed at the long-gone src/research_os/server.py
    monolith — repointed to the server/ package; dropped stale protocol counts.
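
The autopilot-gate bug above is a common str.lstrip pitfall: the argument is a set of characters to strip, not a prefix. The sketch below reproduces the mangling and shows one way to strip the literal prefix; strip_dot_slash is a hypothetical name, not the function used in the codebase.

```python
def strip_dot_slash(path: str) -> str:
    """Remove one literal leading './' and leave any other leading dots alone."""
    return path[2:] if path.startswith("./") else path


# lstrip treats './' as the character set {'.', '/'}, so the leading dot of a
# hidden directory is eaten as well.
print(".synthesis/x".lstrip("./"))       # synthesis/x   (the bug)
print(strip_dot_slash(".synthesis/x"))   # .synthesis/x  (left intact)
print(strip_dot_slash("./synthesis/x"))  # synthesis/x   (prefix removed)
```

On Python 3.9 and later, path.removeprefix("./") expresses the same intent directly.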

Bumped

  • version → 3.1.0 (pyproject / __init__ / CITATION); router index counter → 27.

Deferred (tracked for a future release)

  • Tool-cluster consolidation (SLURM 4→1) — aliased, low user-visible benefit.
  • A first-class renamable BRANCH object + retro-organization of loose work
    (higher-risk state-schema change beyond the step rename shipped here).
  • Deeper audit-gate hardening (claims-gate-on-by-default, ship-gate
    rerun-resolution) — behavior-changing, staged separately.

v3.0.0

17 Jun 13:04
6179a1f

MAJOR release. Research OS now fits the shape of your work — classic
linear analysis, iterative tool/software building, lightweight exploration,
notebook-driven analysis, or a multi-study program — instead of assuming one
shape. Alongside the modes it turns several "advertised but unenforced"
rigor promises into enforced ones, overhauls routing for both beginners and
deep-critic PIs, and improves every protocol.

Added — Workspace modes

  • workspace.mode in researcher_config.yaml (+ research-os init --workspace-mode, and a wizard "What are you building?" step):
    analysis (default, unchanged) · tool_build · exploration ·
    notebook · multi_study. A SCAFFOLD_PROFILES registry scaffolds each
    shape; state, router, and audits dispatch on the active mode.
  • tool_build mode — Research OS governs an inner project from above:
    spec/ + decisions/ (ADRs) + eval/ (the harness that defines "done")
    + milestones.md + governance.md, with the tool itself in an inner dir
    that gets its OWN git init. "Done" = tests + build + eval pass.
  • build/ protocol family: spec_and_design → implement_iteration
    (loop) → test_strategy → benchmark_vs_baseline → release_and_changelog.
    Plus exploration/ (triage → loop → promote-to-step) and notebook/ +
    program/ orienting protocols.
  • tool_git (inner-repo version control; commits stamped with the RO
    step for provenance), tool_build (configured build/test/lint
    runner), and tool_audit(scope='tool', dimension=tests|git_hygiene|build).

Added — Rigor that is actually enforced

  • tool_finalize_project — a server ship-gate that HARD-BLOCKS "done"
    on unresolved audit blockers, cited-but-invalid PDFs, ungrounded numbers,
    or stub sections, unless a logged researcher override clears it.
  • PDF integrity — literature downloads are validated by the %PDF-
    magic header; a renamed 403/HTML page is deleted + recorded, never counted
    as a paper. Every PDF count uses magic validation, not glob("*.pdf").
  • Substrate-checked grounding: tool_verify now checks a claim against
    its cited file (a number is "verified" only if the source actually
    contains it; self-asserted support becomes "unverified").
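
The PDF integrity rule above comes down to checking the first bytes of a file rather than trusting its extension. A minimal sketch, with hypothetical function names; it only counts valid files and does not reproduce the delete + record step described in the notes.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"


def is_real_pdf(path):
    """True only when the file starts with the %PDF- magic header."""
    try:
        with open(path, "rb") as fh:
            return fh.read(len(PDF_MAGIC)) == PDF_MAGIC
    except OSError:
        return False


def count_valid_pdfs(directory):
    """Count files named *.pdf that actually carry the PDF header."""
    return sum(1 for p in Path(directory).glob("*.pdf") if is_real_pdf(p))
```

A 403 or HTML error page saved as paper.pdf fails the header check, so it is never counted as literature even though glob("*.pdf") would match it.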

Added — Beginner ↔ PI gradient

  • tool_explain — a layered, grounded tutor (intuition → mechanics →
    caveats → when-not-to-use → reading list) for any skill level.
  • tool_deliverable_chooser — an output_types-gated "I'm done, what
    now?" on-ramp.
  • Mode-scoped tool listing — the per-turn catalog shrinks from 151 to
    ~113–128 tools.
  • Router overhaul — beginner-vocabulary layer ("i have a csv what do i
    do", "make a chart", "is my result significant"), a confidence-margin gate
    that asks instead of confidently misrouting, capped reckless single-word
    triggers, mode-aware routing bias, and workflow_shape as a routing signal.
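
One way to picture the confidence-margin gate: route only when the best candidate clearly beats the runner-up, otherwise ask. The function name, the score dictionary, and the 0.1 margin are all assumptions for illustration; the release notes do not state how the router scores candidates or what threshold it uses.

```python
def route_or_ask(scores, margin=0.1):
    """Pick a protocol only when the top score leads by at least `margin`.

    `scores` maps protocol name -> similarity score (hypothetical input shape).
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return {"action": "ask", "candidates": []}
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= margin:
        return {"action": "route", "protocol": ranked[0][0]}
    # Too close to call: surface the top two instead of guessing.
    return {"action": "ask", "candidates": [name for name, _ in ranked[:2]]}
```

The design choice here is that a near-tie produces a question for the user rather than a confident misroute.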

Improved

  • Every protocol swept for doctrine compliance: hardcoded thresholds /
    method menus / canned step sequences replaced with "name-the-dimension +
    cite-the-source" scaffolds; scope-tag mislabels fixed; see_also
    cross-links added.
  • Typst deliverables compile across all 12 venues (uniform template/conf).
  • Audits read the real synthesis/paper.typ (dual Typst + Markdown), so the
    rigor gates no longer silently no-op.
  • New researcher docs: TOOL_BUILDER.md, a beginner on-ramp in START.md,
    workspace modes documented across the guides.
  • New scripts/lint_coherence.py preflight gate: docs/templates can no
    longer reference a removed tool or hand-write a tool/protocol count.

Fixed

  • All 7 IDE rule templates + docs purged of removed-tool references
    (tool_plan_*, tool_synthesize/dashboard/figure, tool_grounding_*).
  • synthesis_check no longer reports success on a message-less error.
  • 11 broken documentation cross-references.

Behaviour changes that may affect existing projects (why this is MAJOR)

  • A project with placeholder/HTML files named *.pdf will see them no
    longer counted as literature — the literature gate may newly fire (add a
    real PDF, or override with a rationale).
  • tool_finalize_project can refuse to finalize a project with unresolved
    blockers; previously every blocker was advisory.
  • tool_verify returns unverified for self-asserted claims that do not
    resolve to a cited source.
  • New field workspace.mode defaults to analysis; existing projects keep
    classic behaviour with no change.

Migration

  • Analysis projects upgrade with no changes (mode defaults to analysis,
    byte-identical scaffold). Re-run research-os init --refresh to pick up
    the updated AGENTS.md / IDE rules.
  • The planned tool-cluster consolidations (merging the SLURM / exec / route
    families; sys_path → sys_step) are deferred to 3.1.0 and will ship with
    aliases so no call site breaks.

v2.4.4

10 Jun 19:23
ae963c5

PATCH release. Lifts the visual quality of both the figures the AI produces
and the dashboard chrome that surrounds them to match a published
Research-OS reference deliverable (cream background, italic serif
titles, muted CVD-safe accent palette, value-labels-above-bars, clean
spines, generous whitespace). Also turns the loose "look at the
rendered figure" guidance into a mandatory render → view → v2 loop so
the AI can no longer ship a figure it never opened.

Added

  • research_os.tools.actions.viz.style — new module exporting
    the Research-OS publication style preset for matplotlib:
    • apply_research_os_style(destination=..., palette=...)
      one call sets rcParams (cream bg, serif typography, dropped
      spines, dotted horizontal grid, constrained_layout on,
      300 dpi save) and returns a context dict with the destination
      figsize + palette so the AI's first render lands close to
      publication-ready. Destinations: single_col / two_col /
      full_width / slide / slide_half / dashboard /
      dashboard_tile / poster.
    • RO_PALETTE — five muted CVD-safe accents
      (navy #1F4D7A, olive #9B7E2D, forest #3F6049, oxblood
      #9B3737, mustard #C3A14E) plus a diverging emphasis pair
      (oxblood / forest) and a neutral chrome set
      (cream / warm-dark / muted / hairline).
    • DESTINATION_FIGSIZES — pre-tuned (width, height) for every
      destination so the AI doesn't pick a 6×4 default that crops at
      print size.
    • label_bars_above(ax, bars, unit="ms") — italic value labels
      floating 2 % above each bar, matching the reference figure
      aesthetic (467 ms, 128 ms, …). Reserves headroom so the
      label doesn't crash into the next bar in a stacked chart.
    • label_diverging_bars(ax, bars, values) — signed delta labels
      coloured forest (positive) / oxblood (negative) for the
      diverging-bar comparison panel.
    • polish_axes(ax) — re-asserts top + right spine off and
      dotted horizontal grid on a specific axes after the AI built
      the chart.
    • apply_suptitle(fig, title, subtitle=...) — italic serif
      suptitle + smaller subtitle line, positioned to never overlap
      constrained_layout.
    • Graceful import: when matplotlib isn't installed, the module
      still imports and returns applied=False from
      apply_research_os_style instead of raising.
  • first_render_spacing_discipline: block in
    visualization/figure_guidelines — a 9-item upfront discipline
    (pick destination, leave y-margin for value labels, plan legend
    placement, decide tick rotation, reserve suptitle headroom) so the
    FIRST render doesn't need a v2 to fix spacing. Calls out the
    matplotlib tight_layout() vs constrained_layout conflict.
  • visually_verify_render step in
    visualization/visualization_workflow — the workflow's
    counterpart to the strengthened pre_publish_self_review step in
    figure_guidelines. Both protocols now teach the same mandatory
    render → open the PNG → check overlap / clipping / legend
    placement / palette cohesion → write v2 if anything fails → only
    ship v_final loop.
  • Research-OS accent palette in audit_color_palette — the
    five RO_PALETTE accents + the neutral chrome colours are now in
    the allowed-palette set, so dashboards built from the new
    scaffold and figures generated through apply_research_os_style
    no longer trip the out-of-palette warning.
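
The graceful-import behaviour and the returned context dict can be sketched as below. This is not the shipped research_os.tools.actions.viz.style module: apply_style_sketch and FIGSIZES are hypothetical names, the figure sizes and the cream hex value are placeholder assumptions, and only the five accent hex codes come from the notes above.

```python
# Accent colours as listed in the release notes.
RO_PALETTE = {
    "navy": "#1F4D7A",
    "olive": "#9B7E2D",
    "forest": "#3F6049",
    "oxblood": "#9B3737",
    "mustard": "#C3A14E",
}

# Hypothetical (width, height) values in inches, for illustration only.
FIGSIZES = {"single_col": (3.5, 2.6), "two_col": (7.2, 4.0)}

try:
    import matplotlib
except ImportError:  # the module must still import without matplotlib
    matplotlib = None


def apply_style_sketch(destination="single_col"):
    """Set rcParams if matplotlib is available; always return a context dict."""
    context = {
        "applied": False,
        "figsize": FIGSIZES[destination],
        "palette": dict(RO_PALETTE),
    }
    if matplotlib is None:
        return context
    cream = "#FAF6EC"  # assumed value; the notes only say "cream background"
    matplotlib.rcParams.update({
        "figure.facecolor": cream,
        "axes.facecolor": cream,
        "font.family": "serif",
        "axes.spines.top": False,
        "axes.spines.right": False,
        "axes.grid": True,
        "axes.grid.axis": "y",
        "grid.linestyle": ":",
        "figure.constrained_layout.use": True,
        "savefig.dpi": 300,
    })
    context["applied"] = True
    return context
```

A plotting script would call this once before building the chart and pass context["figsize"] to plt.subplots, checking context["applied"] if it needs to know whether styling took effect.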

Changed

  • synthesis/scaffold._DASHBOARD_HTML — full CSS rewrite to
    match the reference figure aesthetic. Cream background, two-font
    stack (EB Garamond serif for titles + figure captions, Inter sans
    for body), italic serif h1 / h2 / h3 / table headers, muted
    accent palette as CSS variables (--accent navy, --accent-gold
    olive, --accent-green forest, --accent-red oxblood,
    --accent-mustard), hairline rule colour for separators,
    near-white cards on cream, italic figcaption for figure
    interpretation, print-friendly fallback retained. Adds an
    .eyebrow line and a .lead paragraph class in the hero so the
    TL;DR has room to breathe.
  • visualization/figure_guidelines (v2.0.0 → v2.4.4) — adds the
    research_os_style_preset reference block, the
    first_render_spacing_discipline rules, the new set_up_canvas
    step (call apply_research_os_style BEFORE writing the chart
    code), and rewrites pre_publish_self_review into the mandatory
    open-the-PNG view loop with explicit sys_file_read filepath=...
    instructions and a 14-item OBSERVATION checklist that the human
    eye must verify against the rendered pixels.
  • visualization/visualization_workflow (v2.0.0 → v2.4.4) —
    inserts visually_verify_render after build_each_figure so the
    on-demand figure workflow inherits the same loop. Updates
    build_each_figure to mention apply_research_os_style + the
    spacing discipline.
  • synthesis/synthesis_dashboard (v2.4.3 → v2.4.4) — adds a
    "visual cohesion with the figures" principle pointing at
    apply_research_os_style(); bumps version.

Test gate

  • tests/unit/test_viz_style.py — new file covering the style
    preset surface (palette has 5+ entries, DESTINATION_FIGSIZES has
    the expected destinations, apply_research_os_style returns the
    context dict, helpers no-op safely without matplotlib bars).
  • tests/unit/test_v244_dashboard_style.py — new file covering
    the dashboard CSS rewrite (cream bg + accent palette present in
    scaffold, section IDs preserved, print stylesheet retained, new
    accent palette passes audit_color_palette without warnings,
    protocol YAMLs updated with the new spacing + view loop language).
  • preflight passes · pytest passes · ruff clean.

Not a behaviour change for existing projects

  • Pre-v2.4.4 dashboards on disk are untouched — the new CSS only
    applies to scaffolds created after upgrading. Re-scaffold with
    tool_synthesis_scaffold(kind='dashboard', overwrite=true) to
    adopt the new style.
  • The style preset is opt-in. Plotting scripts that don't import
    apply_research_os_style continue to render with matplotlib
    defaults. The figure_guidelines protocol recommends adopting
    the preset for visual cohesion with the dashboard, but doesn't
    reject figures that depart from it (journal templates win).

v2.4.3

09 Jun 16:12
615c9b3

PATCH release. Closes two architectural holes that the v2.4.2
ontology-mapping audit surfaced as the root cause of "AI auto-creates
deliverables the user didn't ask for":

  1. The synthesis pipeline was hardcoded to synthesis_paper.
    get_next_protocol() in tools/actions/protocol.py ran a fixed
    9-step PIPELINE ending at synthesis_paper regardless of the
    researcher's declared research_goal.output_types. A project whose
    wizard answer was output_types: [dashboard] still saw "next is
    synthesis_paper" from the loader. Fixed: pipeline is now the
    universal analysis prefix + a synthesis tail filtered by declared
    output_types. Empty output_types falls back to synthesis_paper
    (legacy behaviour preserved).
  2. next_protocol chains auto-fired in every autonomy mode. Six
    protocols silently chained: analysis_plan → reproducibility
    (every analysis step triggered an audit), audit_and_validation →
    synthesis_paper (every audit triggered a paper draft),
    reproducibility → audit_and_validation,
    cox_ph_diagnostics → audit, missing_data_strategy → audit,
    qualitative_quality_audit → audit. Fixed: each chain now carries
    an explicit AUTONOMY GATE annotation telling the AI to suggest
    (not auto-chain) in manual / supervised / coaching modes.
    Single-step requests stop at the requested step.
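
The first fix (a universal analysis prefix plus a synthesis tail filtered by declared output_types) can be sketched as below. The prefix order and the three protocol paths are taken from these notes; the SYNTHESIS_TAIL mapping shape and the build_pipeline name are assumptions, and the real SYNTHESIS_OUTPUT_TYPE_MAP covers more kinds.

```python
ANALYSIS_PREFIX = [
    "session_boot", "project_startup", "domain", "methodology",
    "literature", "analysis_plan", "reproducibility", "audit_and_validation",
]

# Illustrative subset; the shape of the real map is not shown in the notes.
SYNTHESIS_TAIL = {
    "paper": "synthesis/synthesis_paper",
    "dashboard": "synthesis/synthesis_dashboard",
    "lay_summary": "synthesis/synthesis_lay_summary",
}


def build_pipeline(output_types):
    """Analysis prefix + synthesis steps for the declared kinds, in declared order."""
    if not output_types:
        # Legacy fallback: nothing declared means the paper remains the terminal step.
        return ANALYSIS_PREFIX + [SYNTHESIS_TAIL["paper"]]
    tail = [SYNTHESIS_TAIL[k] for k in output_types if k in SYNTHESIS_TAIL]
    return ANALYSIS_PREFIX + tail
```

With output_types of ["dashboard"], the pipeline in this sketch never mentions synthesis_paper, which is the behaviour the fix is after.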

Three parallel Explore-agent audits drove the fix. Reports captured
in chat transcripts (not checked in).

Added

  • output_types_gate(root, kind, *, autonomy=None) in
    tools/actions/synthesis/check.py. Returns
    {verdict: 'proceed'|'ask'|'skip', declared_outputs, message, kind}.
    • proceed when output_types is empty (no preference declared) OR
      kind is in the declared set.
    • ask when output_types is non-empty and kind is NOT in the
      set; the returned message is a one-line prompt the AI lifts
      verbatim to the researcher.
    • skip reserved for future use (researcher explicitly opted out).
    • Normalises aliases (lay-summary → lay_summary,
      Lay Summary → lay_summary) and ignores the exploratory
      sentinel (which marks "no deliverable yet").
  • tool_synthesis_scaffold(kind, confirmed=false) — new
    confirmed kwarg. When the output_types gate returns ask and the
    caller has not passed confirmed=true (or the existing
    overwrite=true), the scaffold returns status='ask' instead of
    writing. The AI is expected to surface the message to the researcher
    and only re-call with confirmed=true after they say yes. Prevents
    the failure mode where the AI auto-creates a paper / dashboard /
    poster the user never asked for.
  • SYNTHESIS_OUTPUT_TYPE_MAP in
    tools/actions/protocol.py — single source of truth mapping each
    output_types keyword (paper, dashboard, poster, slides,
    report, lay_summary, grant, abstract, essay, handout) to
    its synthesis protocol + "done" predicate. New synthesis protocols
    must register here to participate in pipeline filtering.
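
The proceed / ask logic of output_types_gate can be sketched as below. The real function takes (root, kind, *, autonomy=None) and reads the researcher config; this simplified version takes the declared list directly, and the helper names and message wording are assumptions.

```python
def normalise_kind(name):
    """Fold aliases such as 'lay-summary' or 'Lay Summary' into 'lay_summary'."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def output_types_gate_sketch(declared, kind):
    """Return a verdict envelope for one deliverable kind.

    `declared` stands in for research_goal.output_types from the config.
    """
    declared_outputs = [
        normalise_kind(d) for d in declared
        if normalise_kind(d) != "exploratory"  # sentinel: no deliverable yet
    ]
    kind = normalise_kind(kind)
    if not declared_outputs or kind in declared_outputs:
        verdict, message = "proceed", ""
    else:
        verdict = "ask"
        message = (
            f"You declared {', '.join(declared_outputs)}; "
            f"do you also want a {kind}?"
        )
    return {
        "verdict": verdict,
        "declared_outputs": declared_outputs,
        "message": message,
        "kind": kind,
    }
```

A scaffold caller would write files only on proceed, and on ask would return the message to the researcher and wait for a confirmed=true re-call.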

Changed

  • get_next_protocol(root) consults
    inputs/researcher_config.yaml#research_goal.output_types. The
    analysis prefix (session_boot → project_startup → domain →
    methodology → literature → analysis_plan → reproducibility →
    audit_and_validation) is unchanged. The synthesis tail is now
    dynamic — for output_types: [dashboard, lay_summary], the
    pipeline terminates at synthesis/synthesis_lay_summary (in
    declared order), NOT synthesis/synthesis_paper. Empty list →
    fallback to synthesis_paper (no regression for unfilled projects).
    Response envelope gains declared_output_types field.
  • tool_synthesis_check envelope gains an intent_gate field. If
    the kind being audited isn't in declared output_types, the gate's
    one-line message also appears in warnings.
  • synthesis_paper (v2.3.0 → 2.4.3) — adds an explicit
    verify_intent first step that reads output_types and stops the
    AI if paper isn't declared. Closes the auto-create-paper failure.
  • synthesis_dashboard (2.4.2 → 2.4.3) — same verify_intent
    first step for dashboard. Reinforces the existing trigger gate.
  • synthesis_slides (2.3.0 → 2.4.3) — slides prerequisite added.
  • synthesis_lay_summary (2.4.2 → 2.4.3) — lay_summary
    prerequisite added.
  • printable (2.3.0 → 2.4.3) — poster / handout prerequisite
    added.
  • Autonomy-gate annotations added to the six high-risk
    next_protocol chains (guidance/analysis_plan →
    reproducibility/reproducibility; audit/audit_and_validation →
    synthesis/synthesis_paper; reproducibility/reproducibility →
    audit/audit_and_validation; the three methodology/* audits →
    audit/audit_and_validation;
    visualization/interactive_dashboard_design →
    synthesis/synthesis_dashboard). Each carries a comment block
    telling the AI to surface "Next: ..." as a SUGGESTION in
    manual / supervised / coaching modes; only autopilot
    auto-chains, and autopilot further gates synthesis chains on
    output_types membership.

Test cleanup

  • Renamed tests/unit/test_v242_synthesis_dashboard_lints.py →
    tests/unit/test_synthesis_check.py. Per-release test naming
    (test_v<version>_<feature>.py) is now retired as a convention;
    new tests for this surface land in the topic-named file.
  • +13 regression tests in the renamed file covering:
    output_types_gate proceed / ask / empty / alias-normalisation /
    exploratory-sentinel; synthesis_scaffold returns ask / honours
    confirmed=true / proceeds for declared kinds; pipeline tail
    respects dashboard-only / paper+lay_summary / empty-fallback;
    synthesis_check envelope includes the intent_gate field.

Test gate

  • preflight 29/29 · pytest 1643/1643 (+13 new in
    test_synthesis_check.py) · ruff clean.

Not a behaviour change for existing projects

  • Projects with output_types: [] (or no researcher_config.yaml)
    see the legacy fallback: synthesis_paper terminal, no ask
    envelopes. This is the on-disk default for every project initialised
    before v2.4.3 and for every fresh research-os init until the
    researcher fills in output_types. Recommendation: update the
    wizard answer once to make the intent explicit.
  • The AUTONOMY GATE annotations are guidance to the AI, not
    loader-enforced refusals (which would be MINOR-shaped). An AI client
    that ignores them will still see the same next_protocol values it
    did in v2.4.2; correct AI behaviour is now spelled out in the
    protocol text + the new gate helper.

v2.4.2

09 Jun 15:22
7c34564

PATCH release. Six fixes driven by an audit of the
/scratch/vsetlur/ontology-mapping v2.1 synthesis run, which surfaced
a recurring failure mode: the AI authoring a slap-together dashboard
(one section per workspace step, figure + caption underneath each),
inventing non-canonical filenames (paper-lay.md, REPRODUCIBILITY.md,
METHODS.md, CITATIONS.md) in synthesis/, and leaving behind
random .md / .mermaid / .json clutter at workspace/ root. All
fixes are protocol guidance, scaffold rewrites, lint additions, and
one tool-mode extension; no breaking changes to existing APIs.

Changed

  • synthesis/synthesis_dashboard — rewrite (v2.3.0 → v2.4.2).
    The protocol now explicitly forbids the per-step recap antipattern
    and requires a custom, story-driven structure: Hero / Headline →
    Key findings (organised by claim, not by step) → Comparison
    (adopted vs ruled out) → Methods → Limitations → References.
    Introduces an explicit choice between Plan-mode (collaborative
    outline) and Autopilot (AI picks the headline finding and structure)
    before scaffolding. Quality bar now lists forbidden_structure
    (per-step recap, directory dump, caption-only sections) and
    required_structure (hero + ≥3 claim-driven findings sections).
  • synthesis/synthesis_paper — clarification (v2.3.0 → v2.4.2).
    States explicitly that synthesis/paper.pdf is mandatory before the
    paper deliverable is "done" (a stranded paper.md with no rendered
    PDF is a blocker). Lists the four most common AI-improvised
    filenames that downstream tools do NOT recognise (paper-lay.md,
    REPRODUCIBILITY.md, METHODS.md, CITATIONS.md) and points each
    at its canonical destination.
  • synthesis/synthesis_lay_summary — clarification (v2.0.0 → v2.4.2).
    Canonical filename is synthesis/lay_summary.md (not paper-lay.md,
    lay.md, summary.md, paper_lay.md); downstream tools recognise
    only the canonical name.
  • writing/writing_conclusions — figure/table citations (v2.0.0 →
    v2.4.2).
    Per-step conclusions.md template gains a mandatory
    Figures + tables produced section that lifts directly into
    paper / dashboard / slides synthesis. Every Findings bullet must
    cite at least one figure / table / output file produced by the
    step; an unciteable finding is rejected. The Statistical summary
    table gains a Source column. Closes the gap where downstream
    synthesis stages had to guess which figures backed which findings.
  • tool_synthesis_curate_figures — multi-figure curation. New
    mode parameter: 'focal' (default, unchanged behaviour — one
    focal figure per step, named figNN_<slug>.png for paper.typ) and
    'all' (every figure in every step's outputs/figures/, named
    with the step number prefix, plus every figure's caption sidecar
    copied or seeded). The 'all' mode fixes the failure where the
    AI bypasses curation and writes step figures directly to
    synthesis/figures/, leaving them without .caption.md sidecars.
    Backwards-compatible: omitting mode keeps the v2.4.1 behaviour.

Added

  • synthesis_check — story-structure lints for dashboards.
    Three new checks on synthesis/dashboard.html:
    1. BLOCKER on ≥4 Step NN section headings (per-step recap
      antipattern). Tolerates up to 3 (a comparison block
      referencing specific steps is fine).
    2. WARN on 2-3 Step NN headings (graduated nudge to
      claim-driven headings).
    3. WARN on missing hero / TL;DR / headline-finding section in
      the first viewport (any of "Headline", "TL;DR", "Hero",
      "Key finding(s)", "Summary", "Top-line", "Bottom line",
      "At a glance" as heading text or section id satisfies it).
  • synthesis_hygiene — synthesis-directory filename lint.
    Every tool_synthesis_check call now also walks synthesis/ for
    non-canonical files and surfaces per-file rename / delete hints.
    Recognises the four most common AI-improvised names from the
    ontology-mapping audit (paper-lay.md → lay_summary.md;
    REPRODUCIBILITY.md, METHODS.md, CITATIONS.md → delete and
    fold into canonical artefacts). Unknown filenames get a softer
    "move to archive/ or fold into canonical deliverable" warning.
    Subdirectories (figures/, archive/, scripts/,
    dashboard_data/, _typst_templates/) are ignored.
  • workspace_hygiene — workspace-root clutter lint. Every
    tool_synthesis_check call now also walks workspace/ for loose
    files / subdirectories outside the canonical set (methods.md,
    analysis.md, citations.md, researcher_certifications.yaml +
    the logs/, scratch/, archive/, .preregistration/, and
    numbered NN_<slug>/ directories). Loose planning docs, hand-rolled
    audits, .mermaid diagrams, and agent briefs at workspace root get
    per-file relocate hints (move to scratch/, logs/, or
    archive/).
  • Dashboard scaffold rewrite. tool_synthesis_scaffold(kind='dashboard')
    now writes a story-arc skeleton: hero section with metric-card
    grid + interpretive caption slot, key-findings block organised by
    claim, comparison block for adopted-vs-rejected, methods block
    linking to paper.pdf, limitations + open questions, references +
    cite. CSS is inline and CVD-aware. The previous scaffold's
    per-section <!-- AI: ... --> markers explicitly warn against
    per-step recap and step-numbered headings.
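
The graduated Step NN lint (pass below 2, WARN on 2-3, BLOCKER on 4 or more) can be sketched with a heading count. The regular expression and the return shape are assumptions; the actual synthesis_check may detect headings differently.

```python
import re

# Assumed heading shape: any <h1>..<h6> whose text starts with "Step <number>".
STEP_HEADING = re.compile(r"<h[1-6][^>]*>\s*Step\s+\d+", re.IGNORECASE)


def step_recap_lint(html):
    """Grade a dashboard by how many step-numbered section headings it has."""
    count = len(STEP_HEADING.findall(html))
    if count >= 4:
        severity = "BLOCKER"  # per-step recap antipattern
    elif count >= 2:
        severity = "WARN"     # nudge towards claim-driven headings
    else:
        severity = "OK"
    return {"step_headings": count, "severity": severity}
```

A comparison block that names two or three specific steps still passes with a warning, which matches the tolerance described above.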

Test gate

  • tests/unit/test_v242_synthesis_dashboard_lints.py — 9 new
    regression tests covering: dashboard step-by-step recap BLOCKER,
    hero-section absence WARN, story-driven structure passes,
    synthesis_hygiene flags paper-lay.md / REPRODUCIBILITY.md /
    METHODS.md / CITATIONS.md with the right rename hints,
    workspace_hygiene flags v2_1_*.md / tools.md /
    workflow.mermaid / step_completeness_audit.{md,json} /
    loose subdirectories, curate_figures(mode='all') curates every
    figure with caption sidecars, curate_figures(mode='focal')
    default unchanged, unknown mode rejected.
  • 1630 tests pass (was 1621 in v2.4.1; +9 new).

Not a behaviour change

  • The synthesis_check BLOCKER list grew by one (≥4 Step NN
    headings). Projects that want a per-step structure can either
    cap to ≤3 such sections (a comparison block referencing 2-3 steps
    is fine) or set the dashboard mode to a printable / handout
    artefact (those protocols don't run the per-step lint).
  • tool_synthesis_curate_figures continues to default to 'focal'
    mode; no behaviour change for callers that don't pass mode.

v2.4.1

09 Jun 04:07
3368637

PATCH-then-some release. Lands five of the items the v2.4.0
CHANGELOG explicitly deferred (one of them, research-os refresh,
is technically a new CLI subcommand and so a borderline MINOR
addition; the rest are pure cleanups). Shipped as 2.4.1 because the
combined surface change is small and additive: no existing project
or caller breaks; readers + writers stay tolerant; old field names
migrate silently.

Added

  • research-os refresh — new CLI subcommand. Detects drift
    between a project's copies of bundled templates (AGENTS.md,
    CLAUDE.md, .claude/rules/research-os.md, IDE rule files) and
    the version shipped with the installed research-os package.
    Read-only by default; --check exits non-zero on drift (CI-friendly);
    --write [--yes] overwrites drifted project copies; --json emits
    machine-readable output; --regen-readme also rebuilds the project-
    root README.md from current state. Smoke-tested against the
    /scratch/vsetlur/ontology-mapping project that drove the v2.4.0
    audit: correctly flagged the +13-line AGENTS.md drift from the
    rule #10 rewrite and the +1-line .claude/rules/research-os.md
    drift; flagged CLAUDE.md as identical; ignored un-wired IDE rules.
    Closes "no refresh CLI" deferred item.
  • project_ops.regenerate_root_readme(root) — public helper that
    rewrites the project-root README.md with a "Project status" section
    listing actual on-disk numbered step folders (with a one-line
    summary cribbed from each step's README) plus any synthesis
    deliverables present (paper.{typ,pdf}, slides.{typ,pdf},
    poster.{typ,pdf}, dashboard.html). Idempotent. Internal
    _write_project_root_readme gained a force=False kwarg so the
    wizard's skip-if-exists default is preserved.
  • Checkpoint retention tags. create_checkpoint(description, root, *, tag=None, keep=5) now accepts an optional tag (e.g.
    "release-candidate", "before-major-refactor"). Tagged checkpoints
    survive the per-create GC pass; untagged ones beyond keep are
    pruned. .meta.json schema gains an optional tag field;
    list_checkpoints surfaces it.
  • Per-create checkpoint GC. create_checkpoint now calls
    _prune_old_checkpoints immediately after writing the snapshot,
    surfacing the {kept, removed, tagged} report under gc in the
    return envelope. Previously the pruner only ran at numbered-step
    creation, so explicit tool_checkpoint chains accumulated unboundedly
    (audit found one project at 61 MB across 2 checkpoints on a <5 MB
    source tree).
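
The retention rule (tagged checkpoints always survive, untagged ones beyond keep are pruned) can be sketched on plain dictionaries. The function name and the input shape are assumptions; the real _prune_old_checkpoints works on snapshot files and their .meta.json sidecars.

```python
def prune_checkpoints(checkpoints, keep=5):
    """Decide which checkpoints survive a GC pass.

    `checkpoints` is a list of dicts with 'name', a sortable 'created'
    value, and an optional 'tag' (hypothetical in-memory stand-in).
    """
    tagged = [c for c in checkpoints if c.get("tag")]
    untagged = sorted(
        (c for c in checkpoints if not c.get("tag")),
        key=lambda c: c["created"],
        reverse=True,  # newest first
    )
    kept, removed = untagged[:keep], untagged[keep:]
    return {
        "kept": [c["name"] for c in kept],
        "removed": [c["name"] for c in removed],
        "tagged": [c["name"] for c in tagged],
    }
```

In this sketch tagged checkpoints do not count against keep, so tagging a release candidate never pushes a recent untagged snapshot out.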

Changed

  • step_summary.yaml soft-deprecated. tool_path_finalize still
    writes the file (downstream readers — synthesis, audits, doctor —
    consume it) but the emit now carries a deprecation banner naming
    the file as DERIVED from conclusions.md, AUTO-GENERATED, "do
    NOT edit by hand", and "slated for removal once readers migrate
    to parsing conclusions.md directly". The payload gains a
    _derived_from: "conclusions.md" field so machine readers can
    detect the soft-deprecation programmatically.
    templates/step_summary.yaml.template gets a matching DEPRECATION
    NOTICE at the top telling new protocol authors NOT to scaffold the
    file and pointing them at conclusions.md prose answers instead.
    The 4 protocols that currently scaffold this file (analysis_plan,
    qualitative_research, close_reading, proof_verification_workflow)
    stay unchanged for back-compat; their migration is queued.
  • Dead state-ledger fields dropped. checkpoint_history and
    rollback_history were written every checkpoint / rollback but
    never read by any code path (the .meta.json sidecars in
    .os_state/checkpoints/ are the authoritative log). rollback()
    no longer appends; _migrate strips both from older state files
    on load. Reduces in-state JSON bloat across long sessions.
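
A rough sketch of the on-load strip described for the dead ledger fields. The function name strip_dead_fields and the sample tier key are hypothetical; only the two field names come from this release, and the real _migrate may do more.

```python
import copy

# The two fields this release stops writing and strips on load.
DEAD_FIELDS = ("checkpoint_history", "rollback_history")


def strip_dead_fields(state):
    """Return a copy of the state dict without the retired ledger fields."""
    cleaned = copy.deepcopy(state)
    for field in DEAD_FIELDS:
        cleaned.pop(field, None)  # absent on newer files, so ignore misses
    return cleaned


# "tier" is a placeholder key for illustration only.
old_state = {"tier": 2, "checkpoint_history": [{"id": "cp1"}], "rollback_history": []}
new_state = strip_dead_fields(old_state)
```

Working on a copy keeps the sketch side-effect free; an in-place strip would behave the same for callers that discard the old dict.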

Not in this release (planned for v2.5.0 / v3.0)

The v2.4.0 deferral list shrank by 5; the remaining items either
require breaking schema changes or coordinated multi-file migrations:

  • Full step_summary.yaml retirement (delete the writer + migrate
    the 4 protocols that scaffold the editable template to require
    prose in conclusions.md). Breaking for any external reader of the
    file → v3.0.
  • .preregistration/ + .grounding/ directory removal (migrate
    content into per-step preregistration.md + .os_state/grounding.jsonl).
    Touches 20+ readers; needs a back-compat-tolerant migration
    pattern → v2.5.0.
  • Auto-invoked finalize hook at end of synthesis flow (the helper
    exists now via regenerate_root_readme; wiring it to fire
    automatically requires changes to the synthesis check / compile
    tools) → v2.5.0.
  • Per-step logs/ removal + cross-step utility canonical home
    (workspace/scratch/ IS used in practice; needs a positive
    convention before removing the catch-all) → v2.5.0.

Verified

  • Preflight: 29/29 passed.
  • Pytest: all green (12 new tests across refresh CLI + checkpoint
    GC + tag retention).
  • Ruff: clean.

Bumped

  • pyproject.toml, src/research_os/__init__.py, CITATION.cff to
    2.4.1.

v2.4.0

09 Jun 02:05
d00b4a9

MINOR release. Driven by a 10-perspective adversarial audit of a real
project run (AUDIT_ontology_mapping.md, 233 findings across 10
personas — PI, junior researcher, senior domain reviewer, fresh-AI
handoff, Research-OS architect, code-quality, organization, outputs
quality, docs, reproducibility/citations). The synthesis identified
v2.0–v2.3 as having succeeded at producing consistent structure
(every project gets the same folder layout) but failing at consistent
substance (auto-generated figure captions leaked into papers as
placeholder rows; hallucinated bibliographies survived to submission;
empty literature/ stubs read as "no citations needed" when really the
AI just hadn't downloaded any). v2.4.0 closes the highest-impact gaps
without breaking existing projects.

Added

  • audit_pdf_grounding(entries, root) in
    tools/actions/synthesis/citations.py — reports which citation
    entries have a downloaded PDF on disk vs which don't. Searches
    inputs/literature/<key>.pdf, inputs/literature/<doi-slug>.pdf,
    and workspace/*/literature/<key>.pdf. Returns
    {grounded: [...], ungrounded: [{key, doi, url, title}, ...], count, grounded_count}. Closes the audit's strongest unified
    finding (8/10 auditors): a project shipped 21 references in
    synthesis/references.bib while find . -name '*.pdf' returned
    zero results.
  • require_pdfs flag on write_references_bib — when true, drops
    ungrounded entries from the bib and lists them at the file tail as
    commented-out UNGROUNDED ENTRIES. Default keeps every entry but
    adds a header comment noting how many lack on-disk grounding so the
    gap is visible at the bib level even without opting in.
  • figures: block in researcher_config.yaml — three knobs
    (svg_allowed, summary_sidecar, interactive_html_allowed) that
    control the per-figure sidecar regime. All three default to a lean
    shape (no SVG, no auto-summary, interactive HTML allowed). Added to
    both templates/researcher_config.yaml and the in-code
    CONFIG_TEMPLATE, kept in sync by
    test_config_template_matches_file. figures registered in
    docs/CONTRACT.md A.3 stable-section list.
  • validation_warnings on active_plan.json. _persist_active_plan
    now scans the decomposition for entries whose tool field is in
    _REMOVED_TOOLS (tool_synthesize, tool_dashboard,
    tool_slides_create, etc.) and writes a per-step warning. Surfaces
    stale router-index entries at plan-write time so the AI sees them
    before dispatching, not after burning a turn on the friendly
    redirect.
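
The PDF-grounding lookup can be approximated as below. This is a sketch under assumptions, not the code in citations.py: find_grounded is a hypothetical name, the <doi-slug>.pdf lookup is omitted because the slug rule is not stated here, and count is assumed to mean the total number of entries.

```python
import tempfile
from pathlib import Path


def find_grounded(entries, root):
    """Partition citation entries by whether a PDF for them exists under root."""
    root = Path(root)
    grounded, ungrounded = [], []
    for entry in entries:
        key = entry["key"]
        # Two of the three documented locations; the DOI-slug path is left out.
        candidates = [root / "inputs" / "literature" / f"{key}.pdf"]
        candidates += root.glob(f"workspace/*/literature/{key}.pdf")
        if any(p.is_file() for p in candidates):
            grounded.append(key)
        else:
            ungrounded.append({k: entry.get(k) for k in ("key", "doi", "url", "title")})
    return {
        "grounded": grounded,
        "ungrounded": ungrounded,
        "count": len(entries),  # assumption: total entries examined
        "grounded_count": len(grounded),
    }


with tempfile.TemporaryDirectory() as tmp:
    lit = Path(tmp) / "inputs" / "literature"
    lit.mkdir(parents=True)
    (lit / "smith2020.pdf").write_bytes(b"%PDF-1.4")
    result = find_grounded(
        [{"key": "smith2020", "title": "A"}, {"key": "lee2021", "doi": "10.1/x"}],
        tmp,
    )
```

Here smith2020 is grounded by the file on disk and lee2021 is reported as ungrounded with its missing fields set to None.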

Changed

  • Figure audit no longer warns "PNG without SVG companion" by
    default. audit_figure_quality reads researcher_config.figures.*
    via the new _load_figures_config helper; the SVG warning fires
    only when svg_allowed=true; the summary-sidecar warning fires
    only when summary_sidecar=true. Drops a long-running source of
    false-positive noise.
  • tool_path_finalize stops auto-emitting .summary.md sidecars.
    Plain-English interpretation now integrates into conclusions.md
    next to the inline ![](outputs/figures/<slug>.png) embed. The
    auto-generated sidecars trained the AI to leave stub captions
    ("Auto-drafted caption: regenerate from analysis context") that
    leaked verbatim into one project's synthesis/paper.md as 92
    placeholder rows visibly telling reviewers the AI gave up. Opt back
    in via figures.summary_sidecar=true.
  • AGENTS.md hard rule #10 rewritten. Replaces the "every figure
    carries four sidecars including an SVG companion" mandate with a
    lean default (<slug>.png + an authored <slug>.caption.md), opt-in
    SVG / summary sidecars, encouragement of interactive .html
    companions for visualisation types that benefit (networks,
    multi-panel dashboards), and an explicit requirement that the AI
    sys_file_read every figure before declaring a step done — catches
    legend-over-plot, missing axis labels, palette regressions,
    snake-case-leaking-into-label bugs that no JSON audit catches.
  • _seed_step_subfolder_readmes stops pre-creating stub READMEs
    in literature/, environment/, and context/ per step. These
    dirs stay in EXPERIMENT_SUBDIRS (paths exist) but are empty until
    a tool writes into them. Audit found pre-seeded stubs trained the
    AI to leave dirs as boilerplate; caused literature/ to read as
    "no citations" when really the AI just hadn't downloaded any; and
    cluttered every step folder with content nobody wrote. The README
    that answers "what goes here?" now lives once in
    RESEARCHER_GUIDE.md rather than duplicated 14× on disk.
  • outputs/README.md template updated to reflect the new figure
    contract: reports go DEEPER than conclusions.md (choices,
    reasoning, comparison of options); figures are .png-only by
    default with optional interactive .html companions; AI MUST read
    each figure before finalize.
  • Doc hardening: dropped hardcoded tool / protocol counts across
    README.md, docs/{TOOLS,PROTOCOLS,RESEARCHER_GUIDE,START,AI_GUIDE}.md.
    Replaces "144 tools" / "117 protocols" with vague phrases
    ("~150 tools", "100+ protocols", "every tool", "All core protocols").
    CLAUDE.md doctrine already forbids hand-written counts; the
    maintainer was violating it in 9+ places. Counts go stale within a
    release. CONTRACT.md keeps its v2.0.0-anchored snapshot table.
  • Doc drift fix: README.md:117 code/ → scripts/. The README
    showed 01_baseline_eda/code/ in its file-layout diagram while the
    framework, RESEARCHER_GUIDE, and every real project use scripts/.
    A junior researcher walking through README and then opening a real
    project would have hit the inconsistency immediately.
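
The config-gated sidecar warnings can be pictured roughly like this. The function sidecar_warnings and the warning strings are illustrative only; the defaults mirror the lean shape described in this release (no SVG, no auto-summary, interactive HTML allowed), and the real audit_figure_quality checks far more than file presence.

```python
import tempfile
from pathlib import Path

# Lean defaults as described for the figures: block.
DEFAULTS = {
    "svg_allowed": False,
    "summary_sidecar": False,
    "interactive_html_allowed": True,
}


def sidecar_warnings(png_path, figures_config=None):
    """Warn about missing sidecars only when the matching knob is enabled."""
    cfg = {**DEFAULTS, **(figures_config or {})}
    png = Path(png_path)
    warnings = []
    if cfg["svg_allowed"] and not png.with_name(png.stem + ".svg").exists():
        warnings.append(f"{png.name}: PNG without SVG companion")
    if cfg["summary_sidecar"] and not png.with_name(png.stem + ".summary.md").exists():
        warnings.append(f"{png.name}: missing summary sidecar")
    return warnings


with tempfile.TemporaryDirectory() as tmp:
    fig = Path(tmp) / "volcano.png"
    fig.write_bytes(b"")
    lean = sidecar_warnings(fig)  # default config: nothing to report
    strict = sidecar_warnings(fig, {"svg_allowed": True, "summary_sidecar": True})
```

Under the defaults a lone PNG passes silently; enabling both knobs brings back the two v2.3-style warnings.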

Migration

  • Existing projects are unaffected by the figure default change:
    audit_figure_quality still reads existing .summary.md and
    .svg files when they're present; the change is that it no longer
    warns on their absence. To restore the v2.3 warning behaviour,
    add to inputs/researcher_config.yaml:
    figures:
      svg_allowed: true
      summary_sidecar: true
  • Existing per-step literature/README.md / environment/README.md /
    context/README.md stubs are not touched; the change only affects
    newly-created steps. Delete the stubs by hand if you want empty
    dirs in legacy steps.
  • write_references_bib signature gained two optional kwargs
    (root, require_pdfs). All existing positional calls keep
    working; opt-in to PDF filtering by passing both.
  • AGENTS.md template change does NOT propagate to existing
    projects (the wizard only copies once). Re-run research-os init
    in a temp dir and diff the AGENTS.md against your project's copy
    to pick up the new hard rule #10 wording. A research-os refresh
    CLI subcommand to do this automatically is planned for 2.4.x.

Not in this release (planned for 2.4.x / 2.5.0)

The full audit surfaced ~50 P0 framework changes; this release ships
the highest-impact subset that doesn't break existing projects. The
following remain for follow-up:

  • Per-step step_summary.yaml retirement: the YAML stub anti-pattern
    flagged by 9/10 audits. The derived emit in tool_path_finalize
    stays in 2.4.0; the editable scaffold via step_summary.yaml.template
    + the update_step_summary step in analysis_plan.yaml /
    literature_per_step.yaml await migration to prompt-laden README
    prose.
  • .os_state simplification: collapse state_ledger.json +
    manifest.json overlap, drop dead fields, bound checkpoint storage
    (single snapshot can be 39 MB of duplicate workspace; no GC).
  • research-os refresh CLI subcommand: auto-upgrade
    AGENTS.md / CLAUDE.md / IDE-config templates in an existing
    project to match the bundled current version.
  • Sparse-root finalize hook: regenerate top-level README.md at
    project finalize (currently write-once at init).
  • Per-step logs/ removal + cross-step utility canonical home
    (workspace/scratch/ IS used in practice but the framework doesn't
    document a canonical place for it).
  • Hard removal of .preregistration/ + .grounding/ hidden dirs in
    workspace (content moves into per-step README / methodology.md +
    .os_state/grounding.jsonl).

Verified

  • Preflight: 29/29 passed.
  • Pytest: all green.
  • Ruff: clean.

Bumped

  • pyproject.toml, src/research_os/__init__.py, CITATION.cff to
    2.4.0.

v2.3.0

08 Jun 18:08
870dfa4

MINOR release. Retires the synthesis auto-generators in favour of
AI-direct authoring: the AI writes synthesis/paper.typ /
slides.typ / poster.typ / essay.typ / dashboard.html directly,
following the matching synthesis protocol. Tools validate and
compile; they no longer generate the prose / layout. The previous
auto-generators produced rigid, low-quality output — a 3MB
monolithic dashboard, a markdown-only paper intermediate, slide
decks no audience could read. Removing them moved 9700+ lines of
generator code out of the codebase and let the synthesis protocols
become true scaffolds (per docs/PROTOCOL_DOCTRINE.md).

Breaking changes

The following tools were removed. Each returns a _REMOVED_TOOLS
redirect message naming the new protocol + surviving tools:

  • tool_synthesize → follow synthesis/synthesis_paper; write
    synthesis/paper.typ directly; compile via tool_typst_compile.
  • tool_dashboard (+ 7 operations: create, story_generate,
    story_edit, story_quality_bar, reviewer_sim, test_generate,
    test_run) → follow synthesis/synthesis_dashboard; write
    synthesis/dashboard.html directly.
  • tool_slides_create → follow synthesis/synthesis_slides; write
    synthesis/slides.typ (Touying); compile via tool_typst_compile.
  • tool_poster_create → follow synthesis/synthesis_poster
    (redirect to synthesis/printable); write synthesis/poster.typ.
  • tool_humanities_essay_scaffold → use
    tool_synthesis_scaffold(kind='essay') + author content.
  • tool_paper_compile_typst → use tool_typst_compile (generic .typ
    → .pdf; the AI authors the .typ directly, no markdown
    intermediate).
  • tool_section_substantiveness → folded into
    tool_synthesis_check(mode='substantiveness') (now also handles
    Typst headings).
  • tool_figure dispatcher and operations caption_synthesise,
    interactive_autogen, paper_autoembed → the AI authors plain-
    English figure summaries, interactive companions, and Typst
    #figure(...) blocks directly when writing the plotting script or
    paper.typ. tool_figure_palette is now a top-level tool.
  • tool_reviewer operation simulate → the AI walks the paper
    through the persona YAMLs in assets/reviewer_personas/ directly
    (tool_reviewer keeps response, rebuttal, compile for real
    external reviews).

The autopilot floor gate enforcement also shifted: tool_typst_compile
replaces tool_synthesize / tool_dashboard(operation='create') as
the final-deliverable gate.
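
A minimal sketch of how a removed-tool redirect might behave at dispatch time. The table below is a hypothetical excerpt with redirect text paraphrased from the list above; the dispatch function and the status / message / result envelope keys are illustrative and not the project's actual dispatcher.

```python
# Hypothetical excerpt: redirect text paraphrased from the breaking-changes list.
REMOVED_TOOLS = {
    "tool_synthesize": (
        "follow synthesis/synthesis_paper; write synthesis/paper.typ "
        "directly; compile via tool_typst_compile"
    ),
    "tool_paper_compile_typst": "use tool_typst_compile",
    "tool_section_substantiveness": "use tool_synthesis_check(mode='substantiveness')",
}


def dispatch(tool_name, handlers, **arguments):
    """Route a call, answering removed tool names with a redirect message."""
    if tool_name in REMOVED_TOOLS:
        return {
            "status": "error",
            "message": f"{tool_name} was removed in v2.3.0: {REMOVED_TOOLS[tool_name]}",
        }
    if tool_name not in handlers:
        return {"status": "error", "message": f"unknown tool: {tool_name}"}
    return {"status": "ok", "result": handlers[tool_name](**arguments)}


# A stand-in handler so the sketch runs without the real compiler.
handlers = {"tool_typst_compile": lambda source: f"compiled {source}"}
redirect = dispatch("tool_synthesize", handlers)
compiled = dispatch("tool_typst_compile", handlers, source="synthesis/paper.typ")
```

Checking the removed-tool table before the handler lookup means a stale plan entry gets an actionable redirect rather than a generic "unknown tool" error.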

Added

  • tool_typst_compile — generic Typst compiler. Takes any
    AI-authored .typ source (paper, slides, poster, essay,
    cover_letter, response_to_reviewers) and renders the PDF.
    Resolves bundled venue templates from _typst_templates/;
    auto-generates synthesis/biblio.yml from workspace/citations.md
    when missing. Returns pdf_path, page_count, citation_count,
    typst_warnings, typst_errors.
  • tool_synthesis_check — quality audit for AI-authored
    synthesis files. Auto-detects file type from the path. Modes:
    all (default), substantiveness, structure, accessibility,
    cliches. Per-IMRAD-section content depth audits for paper /
    essay; slide-count + speaker-notes + path-leak audits for slides;
    section + headline + QR audits for poster; engineering invariants
    (offline, alt-text, semantic <section id>, no placeholders, no
    filesystem-path leaks) for dashboard.
  • tool_synthesis_scaffold — writes a <=80-line skeleton
    synthesis/<paper|slides|poster|essay>.typ or dashboard.html
    with section headers + // AI: author this section markers.
    Idempotent (refuses overwrite without overwrite=true).
  • tool_figure_palette — promoted from an operation under
    tool_figure to a top-level tool. Returns CVD-safe palettes
    (Okabe-Ito qualitative, viridis sequential, PuOr diverging,
    accent).
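
The refuse-overwrite behaviour described for tool_synthesis_scaffold could look roughly like this. write_scaffold, the section list, and the return keys are hypothetical; only the // AI: author this section marker and the overwrite flag come from this release.

```python
import tempfile
from pathlib import Path

SECTIONS = ["Introduction", "Methods", "Results", "Discussion"]  # hypothetical


def write_scaffold(path, sections=SECTIONS, overwrite=False):
    """Write a Typst skeleton, refusing to clobber an existing file."""
    path = Path(path)
    if path.exists() and not overwrite:
        return {"written": False, "reason": f"{path.name} exists; pass overwrite=True"}
    lines = []
    for title in sections:
        # "= Title" is a Typst level-one heading; "//" starts a Typst comment.
        lines += [f"= {title}", "// AI: author this section", ""]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines), encoding="utf-8")
    return {"written": True, "line_count": len(lines)}


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "synthesis" / "paper.typ"
    first = write_scaffold(target)
    second = write_scaffold(target)                 # refused: file exists
    third = write_scaffold(target, overwrite=True)  # explicit opt-in
```

The second call leaves the authored file untouched, which is the property that makes re-running the scaffold safe mid-project.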

Improved

  • Synthesis protocols rewritten as scaffolds. synthesis_paper,
    synthesis_dashboard, synthesis_slides, printable (poster +
    handout), humanities_essay_structure, synthesis_grant,
    synthesis_abstract, synthesis_report, synthesis_lay_summary,
    synthesis_progress_update, synthesis_from_inputs — each
    collapsed from 100-370 lines of prescriptive recipe to <=130 lines
    of scaffold (design principles + quality standards + workflow +
    available tools). Spec files (synthesis_spec.yaml,
    slides_spec.yaml, dashboard_spec.yaml) are no longer required.
  • Cleaner synthesis/ folder. After a full project run:
    paper.typ, paper.pdf, slides.typ, slides.pdf, poster.typ,
    poster.pdf, dashboard.html, biblio.yml, figures/. No
    intermediate .md files, no spec YAMLs, no handout duplicates.
  • researcher_config.yaml schema simplified. The synthesis:
    block is empty by default. Removed knobs:
    figures_auto_embed*, figure_xref_rewrite, slide_engine,
    slide_template, slide_theme, slide_speaker_notes_enabled,
    slide_print_handout, poster_engine, poster_template,
    poster_theme, poster_qr_url, poster_handout_pdf,
    drafter_loop_* (5 knobs).
  • _router_index.yaml v21. Synthesis decompositions point at
    the new tool_synthesize_plan → tool_synthesis_scaffold →
    tool_synthesis_check → tool_typst_compile chain.

Removed

  • 9 implementation files under src/research_os/tools/actions/synthesis/:
    dashboard.py (1604 lines), dashboard_app.py (1424), slides.py
    (946), drafter_loop.py (850), reviewer.py (partial — reviewer_simulate),
    figure_auto_embed.py (747), poster_typst.py (697),
    dashboard_humanities.py (465), dashboard_qualitative.py (455),
    humanities_essay.py (212), synthesize.py (1374),
    dashboard_story.py (300). Total: ~9700 lines.
  • src/research_os/tools/actions/viz/dashboard_tests.py (the
    Playwright scaffold for auto-generated dashboards).
  • src/research_os/assets/reveal/ (260 KB), slide_templates/
    (24 KB), poster_templates/ (20 KB) — vendored assets only
    the removed generators consumed.
  • 12 obsolete test files (test_v191_dashboard_app,
    test_v190_dashboard_content, test_dashboard_humanities,
    test_dashboard_qualitative, test_v191_story_mode,
    test_slides_engine, test_poster_typst, test_drafter_loop,
    test_figure_auto_embed, test_humanities_essay_structure,
    test_synthesize_auto_proceed,
    test_synthesize_blocks_on_unresolved_findings,
    test_synthesize_uses_pack_sections, test_paper_drafter_loop,
    test_researcher_config_synthesis,
    test_audit_audit_figure_coverage,
    test_citation_retrieval_empty_response,
    test_audit_findings_explain).

Migration

Existing project files (synthesis/paper.md, synthesis/dashboard.html
from prior versions) are preserved as-is on disk. The new tools do
not regenerate them. To produce the new artefact next to the old:
ask the AI to follow the matching synthesis protocol (e.g. "redo the
paper as Typst") — it will author synthesis/paper.typ and you can
delete the old paper.md once you're happy with the new PDF.

Tool count: 148 → 144 (8 removed + 4 added). Protocol count
unchanged at 117 core.

Bumped

  • pyproject.toml, src/research_os/__init__.py, CITATION.cff
    to 2.3.0.
  • 11 rewritten synthesis-related protocols to version: '2.3.0'.
  • _router_index.yaml to version: 21.

v2.2.0

07 Jun 04:50
eb07174

MINOR release. Shipped after a 35-agent audit (10 researcher-domain
perspectives, 5 technical, 5 UX, 5 AI-model personas, 5 online-research,
5 meta-improvement) surfaced 119+ findings across 12 themes. The
synthesis selected v2.2.0 over v2.1.2 because 6 p0 + 12 p1 work-items
genuinely add tools and knobs rather than just polish.

Added

  • sys_where — ~30-token mid-session orientation snapshot
    (project_root, tier, active_plan position, unresolved BLOCK count,
    last protocol). Use instead of sys_boot when you only need to
    remember "where am I?".
  • sys_export_ro_crate — emits ro-crate-metadata.json +
    codemeta.json at project root. Closes the FAIR-alignment claim
    that was unbacked in v2.0–v2.1. Discoverable by Zenodo, OSF,
    downstream RO-Crate consumers.
  • sys_export_share_archive now bundles ro-crate-metadata.json
    + codemeta.json + CITATION.cff at archive root automatically.
  • Autopilot floor gates (research_os.server.autopilot_gate) —
    8 floor gates enforce mandatory audits before tier advance, even
    in autopilot mode. Closes the bypass path where autopilot=true
    silently skipped block-severity findings.
  • research-os mcp / research-os api-key / research-os completion
    CLI subcommands (4 → 7). mcp adds/removes external MCP server
    configs (memory, filesystem, github). api-key securely stores
    per-provider keys (chmod 600). completion emits shell completion
    for bash / zsh / fish (uses argcomplete when installed, falls
    back to a hand-rolled script otherwise).
  • argcomplete>=3.0 as the new completion optional extra
    (pip install 'research-os[completion]') + included in dev.
  • model_profile + ai.context_class config knobs.
    researcher_config.yaml's ai section now carries
    model_profile: small|medium|large (controls protocol-detail
    level) and context_class: short|long (controls history-window
    size). sys_boot respects both.
  • docs/SECURITY.md — new page documenting path-containment,
    autopilot floor gates, override rationale enforcement, the
    .os_state/overrides.log audit trail, and the boundary between
    trusted and untrusted MCP-tool inputs.
  • research-os doctor expanded to 25+ checks (was 18+).
    New checks include: tool_short_field_present, citation_cff_valid,
    external_pack_entrypoints, embeddings_fresh, and
    docs_referenced_scripts.
  • 22 work-item implementation report ships in docs/SECURITY.md
    + this CHANGELOG entry as evidence of the multi-perspective audit
    that drove this release.

Changed

  • Envelope normalization at the dispatcher. Pack and adapter tools
    that previously returned the legacy {"status", "data"} shape are
    now upgraded to the v2.1.0 envelope by
    research_os.server.envelopes._normalize_envelope, invoked once in
    dispatch._handle_tool_call. Closes the v2.1.0 envelope gap for
    13+ pack + adapter tools in one place rather than per-tool. New
    pack code should call _success / _error directly per
    docs/PLUGIN_AUTHORING.md.
  • RoError(what, why, next_action) signature loosened from
    keyword-only to positional. Matches the contract documented in
    docs/CONTRACT.md A.6.2 verbatim.
  • did_you_mean is namespace-aware for the sys_/tool_/mem_
    prefixes. Typing sys_X now prefers other sys_* matches before
    cross-namespace.
  • Envelope adds next_recommended_call_structured — a
    {"tool": str, "arguments": dict} form derived from
    next_recommended_call when parseable. Strict tool-loop clients
    dispatch this directly without re-parsing free-form text.
  • override_rationale enforcement wired across 9 handler sites
    (synthesis_writing, synthesis_visual, audit_core, audit_gates,
    methodology, meta_workspace.sys_path_create,
    meta_workspace.sys_checkpoint_rollback, tool_step_complete,
    tool_path_finalize). Thin rationales ('TODO', 'preview',
    single-word, <20 chars) are rejected before the underlying audit
    runs. Empty-rationale paired with override flag now returns an
    explicit error instead of silently no-opping.
  • sys_file_* path containment. sys_file_read, sys_file_write,
    sys_file_list, and sys_file_delete now refuse paths that
    resolve outside the workspace root. Closes the host-FS escape
    (../../etc/passwd) that was reachable from any MCP client.
  • CLAUDE.md, FAQ.md, START.md updated to current counts (preflight
    25+, doctor 20+, subcommands 7). Future drift is policed by the
    new preflight_docs_consistency test.
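
Path containment of the kind described for sys_file_* is commonly done by resolving the requested path and checking that it still sits under the workspace root. The sketch below shows that general technique with a hypothetical resolve_inside helper; it is not the project's implementation.

```python
import tempfile
from pathlib import Path


def resolve_inside(root, user_path):
    """Resolve user_path against root and reject anything that escapes root."""
    root = Path(root).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes workspace root: {user_path}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    inside = resolve_inside(tmp, "notes/a.txt")
    try:
        resolve_inside(tmp, "../../etc/passwd")
        escaped = True
    except ValueError:
        escaped = False
```

Resolving before comparing is what defeats the ../../etc/passwd pattern; a plain string-prefix check on the unresolved path would not. An absolute user_path replaces the root in the join, so it is rejected unless it already points inside the workspace.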

Fixed

  • Test test_audit_version_coherence_rejects_unknown_step_id
    updated to pytest.raises((RoError, FileNotFoundError)):
    iteration._step_dir now raises RoError per the contract.
  • docs/CONTRACT.md A.6.1 corrected: the data alias removal is
    slated for v3.0.0 (not v2.2.0 as the row erroneously claimed).
    The alias is preserved in _success / _error through every v2.x
    release for back-compat with v2.0 callers.
  • docs/CONTRACT.md A.3 no longer lists tool_stack as a stable
    top-level researcher_config.yaml section — the key was never
    shipped in templates/researcher_config.yaml.
  • Internal work-item IDs (W##, FIX-#) stripped from tool
    descriptions (audit.py, meta.py, synthesis.py) and
    user-facing docs (SECURITY.md, FAQ.md, AI_GUIDE.md,
    AGENTS.md). Inline # W##: source comments cleaned up
    (substance kept). Future leaks are caught by
    test_tool_description_no_version_chatter.
  • docs/TOOLS.md lists sys_where + sys_export_ro_crate;
    both were callable but undocumented after Wave-D.
  • Tool count references updated 146 → 148 across
    docs/{TOOLS,AI_GUIDE,FAQ,RESEARCHER_GUIDE,CONTRACT,START}.md.
    Doctor check count 14/18+ → 20+. START.md subcommand count
    4 → 7 with the full list.

Removed

  • dashboard_v2.py / dashboard_v2_humanities.py /
    dashboard_v2_qualitative.py / humanities_essay_scaffold.py
    deprecation shims (one-minor-cycle removal promised in v2.1.1).
    Canonical paths: dashboard_app, humanities_essay.

Verified

  • Preflight: 29/29 passed.
  • Pytest: 1894 passed, 13 skipped, 0 failed.
  • Ruff: clean.
  • 5 independent validators reviewed the diff by reading + reasoning
    (not pytest): logic, consistency, contract, UX, tests. Their
    2 blockers + 14 concerns were triaged and fixed before release.

Migration

  • No required code changes. Every addition is additive; the data
    envelope alias is kept. Tool argument names unchanged.
  • If your code imported from
    research_os.tools.actions.synthesis.dashboard_v2* or
    research_os.tools.actions.synthesis.humanities_essay_scaffold,
    switch to dashboard_app / humanities_essay (the canonical
    modules). The shims were removed per the v2.1.1 deprecation
    promise.
  • If you parsed envelope["data"], that still works through every
    2.x release. Switch to envelope["payload"] before v3.0.0.
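
For callers that need to work across 2.x versions, one way to prepare for the data alias removal is to prefer payload and fall back to data. read_payload is a hypothetical helper, and the envelope fields other than payload and data are placeholders.

```python
def read_payload(envelope):
    """Prefer the payload key, falling back to the legacy data alias."""
    if "payload" in envelope:
        return envelope["payload"]
    return envelope["data"]  # alias kept through 2.x, slated for removal in v3.0.0


# "status" is a placeholder field for illustration.
current = read_payload({"status": "ok", "payload": {"n": 1}, "data": {"n": 1}})
legacy = read_payload({"status": "ok", "data": {"n": 2}})
```

Routing every read through one helper means the fallback branch can be deleted in a single place once v3.0.0 drops the alias.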

v2.1.1

06 Jun 17:54
4b1fa26

PATCH release. Pure cleanup — no behavior changes, no new tools, no
new protocols, no API or tool-signature changes.

Changed

  • Source files renamed to canonical names (no _v2, _scaffold,
    etc.): humanities_essay_scaffold.py → humanities_essay.py
    (back-compat shim kept at the old path through v2.2.0). The
    dashboard_v2*.py shims created in v2.1.0 stay in place for one
    more minor cycle per the migration table (removed v2.2.0). 11
    unit-test filenames dropped a redundant _v2 suffix
    (test_audit_audit_*_v2.py → test_audit_audit_*.py,
    test_router_output_v2.py → test_router_output.py).
  • docs/ folder reduced to one file per concept, no version
    suffixes. Version-tagged historical reports + working-session
    scratchpads removed (preserved in git history; recover via
    git show v2.1.0:docs/<file>). Final shape: 22 markdown files +
    2 mermaid diagrams (PROTOCOL_GRAPH.mermaid, workflow_dag.mermaid).
  • docs/README.md rewritten as a single audience-routing page
    (researchers / AI agents + plugin authors / maintainers +
    integrators).
  • Root README.md release badge bumped to v2.1.1; deep links to the
    deleted V2_RELEASE_NOTES + MIGRATION_v1_to_v2 docs replaced with
    pointers to CHANGELOG.md (with [2.0.0] section hint where the
    context warrants it).
  • Code + protocol comments swept for historical-version references:
    ~115 strips across 23 files (server, audit/state, synthesis/viz,
    cli + plugins, router_index protocols). 1 pure-historical block
    deleted. Git log + CHANGELOG carry version history; live doctrine
    stays focused on current behavior. Stable surfaces (e.g.
    _REMOVED_TOOLS migration data, the canonical replacement entry
    points) were KEPT — those name the version because the version is
    load-bearing user-facing data, not commentary.

Added

  • .gitignore entries blocking future creation of version-tagged
    docs + handoff scratchpads in docs/. Patterns added:
    /docs/v*_handoff/, /docs/*_handoff/, /docs/AUDIT_v*.md,
    /docs/USABILITY_v*.md, /docs/CHANGELOG_DETAILED_v*.md,
    /docs/MIGRATION_v*.md, /docs/V[0-9]*.md, /docs/V[0-9]*/,
    /docs/audit_v*/, /docs/usability_v*/, /docs/PHASE_*.md,
    /docs/archive/. Prevents the clutter from recurring; future
    sessions that try to write these paths get them silently ignored.

Verified

  • MCP wiring smoke (in /tmp/ro_v211_mcp/): research-os init
    scaffolds correctly, .claude/mcp.json writes the standard
    research-os start config, research-os doctor reports
    mcp_configs_wired: pass, research-os start boots cleanly,
    and TOOL_DEFINITIONS count (146) matches the v2.1.0 surface
    (unchanged).

Migration

  • No code changes required. Imports from old _v2 paths still
    resolve via the deprecation shim (removed v2.2.0).
  • Imports of from research_os.tools.actions.synthesis.humanities_essay_scaffold import scaffold_humanities_essay
    keep working via the new 2-line shim at the old path; update at
    your convenience to
    from research_os.tools.actions.synthesis.humanities_essay import scaffold_humanities_essay.
  • Anyone with local edits to deleted docs: recover via
    git show v2.1.0:docs/<file> (or any tag where the file lived)
    and re-save outside the repo as a personal note.