Releases · VibhavSetlur/Research-OS

17 Jun 15:54

v3.1.0

d6b87aa

v3.1.0 Latest


MINOR release. Backwards-compatible: every existing tool keeps its name + schema
(new capability is added via new operations/scopes and a new alias). Driven by a
fresh 11-area discovery audit of the 3.0.0 codebase.

Added

Compiled routing sidecar (_route_meta.json). Routing no longer parses the
104K _router_index.yaml at runtime. build_embeddings.py compiles a compact,
comments-free JSON mirror (protocols/shortcut_intents/hierarchy + pre-baked
tier + workflow_shape) that router.py and semantic.py share via a single
load. It parses ~300× faster (~0.42 ms vs ~126 ms) and removes the per-route
protocol-body reads. The YAML stays the authoring source; --route-meta-only
rebuilds the sidecar without fastembed. New preflight gate validates the sidecar
is fresh + consistent + embeddings-parity-checked.
tool_verify(scope='outputs') — the "did the work actually land?" gate.
Resolves a protocol's declared expected_outputs against the filesystem
(glob-aware) and reports each present / empty / missing with a next_action.
The injected protocol-completion step now requires it before logging
completed, so the system refuses to call a missing or empty file "done".
docs/VERSIONING.md documents the in-project versioning convention.
sys_path(operation='rename') — give a generic analysis step a meaningful
label. Keeps the NN_ lineage number, renames the folder, and re-points every
downstream data/* symlink that targeted it. sys_step is now an alias for
sys_path (the clearer name for numbered steps).
Routing-targets preflight gate — every next_protocol / on_failure /
see_also must point at a real protocol (dangling links were previously silent).
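
As an illustration of the tool_verify(scope='outputs') gate described above, here is a minimal sketch of resolving glob-aware expected outputs against the filesystem. The function names, the report keys other than next_action, and the exact rule for "empty" (any zero-byte match) are assumptions for illustration, not the shipped implementation.

```python
from pathlib import Path


def check_expected_outputs(root: Path, patterns: list) -> list:
    """Classify each expected-output pattern as present, empty, or missing.

    Hypothetical helper: the real tool's name, inputs, and envelope may differ.
    """
    report = []
    for pattern in patterns:
        # Path.glob accepts literal relative paths as well as wildcard patterns.
        matches = sorted(p for p in root.glob(pattern) if p.is_file())
        if not matches:
            status, action = "missing", f"produce a file matching {pattern!r}"
        elif any(p.stat().st_size == 0 for p in matches):
            # Assumption: a single zero-byte match is enough to flag the pattern.
            status, action = "empty", f"regenerate the zero-byte file(s) for {pattern!r}"
        else:
            status, action = "present", "none"
        report.append({"pattern": pattern, "status": status, "next_action": action})
    return report


def may_log_completed(report: list) -> bool:
    """Only allow 'completed' when every expected output is present and non-empty."""
    return all(item["status"] == "present" for item in report)
```

A completion step built this way would call may_log_completed(report) and refuse to log completed whenever it returns False.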

Improved

Figures. tool_figure_palette('accent') now returns the exact RO_PALETTE
colours apply_research_os_style applies (a hand-coloured figure matches an
auto-styled one); adds diverging_emphasis. audit_figure_quality runs its
text-overlap + default-font (DejaVu) legibility scan on a PNG's sibling SVG too,
and a corrupt/empty image now warns instead of crashing the audit.
Synthesis deliverables. tool_typst_compile archives the prior render to
synthesis/archive/<name>_<timestamp>.pdf before overwriting (no silent
clobber), flags single-page-target overflow (poster/cover-letter rendered to >1
page = content overflowed, which is where overlapping text shows up), and counts
pages without the off-by-one /Type /Pages miscount. The poster check no longer
false-blocks scaffold-authored posters (#headline / #block-section).
Wizard. Ctrl+C/Ctrl+D mid-wizard exits cleanly instead of a traceback; the
"already exists" check moved to the start (no more filling out the whole wizard
to be rejected at the end); email + ORCID inputs are format-validated.
Doctrine. power_analysis replaced its data-shape→test-family menu with
scaffold form (name the dimensions that fix the test, justify the choice).

Fixed

state_freshness_check read workspace/state.json — a file that never exists —
so the staleness signal was permanently dead; now reads the real ledger.
get_dag_path / add_dag_node stopped persisting the constant
execution_dag_path back into the ledger every call (write churn + schema noise).
Dead-end pause detection read protocol_name but the execution log writes
protocol — the signal was silently dead.
Autopilot gate used str.lstrip('./') (strips characters), so .synthesis/x
was mangled into synthesis/x and falsely gated; now strips the prefix properly.
Maintainer docs (CLAUDE.md) pointed at the long-gone src/research_os/server.py
monolith — repointed to the server/ package; dropped stale protocol counts.
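
The str.lstrip pitfall behind the autopilot-gate fix is easy to reproduce. The sketch below contrasts the buggy call with one possible prefix-safe alternative; the helper names are hypothetical, and the use of str.removeprefix (Python 3.9+) is an assumption about how a fix could look, not a statement of what the project ships.

```python
def strip_dot_slash_buggy(path: str) -> str:
    # str.lstrip treats its argument as a SET of characters, not a prefix,
    # so every leading "." or "/" is removed.
    return path.lstrip("./")


def strip_dot_slash(path: str) -> str:
    # str.removeprefix removes the literal "./" prefix only, if present.
    return path.removeprefix("./")


print(strip_dot_slash_buggy(".synthesis/x"))  # synthesis/x  (leading dot lost)
print(strip_dot_slash(".synthesis/x"))        # .synthesis/x (unchanged)
print(strip_dot_slash("./synthesis/x"))       # synthesis/x
```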

Bumped

version → 3.1.0 (pyproject / __init__ / CITATION); router index counter → 27.

Deferred (tracked for a future release)

Tool-cluster consolidation (SLURM 4→1) — aliased, low user-visible benefit.
A first-class renamable BRANCH object + retro-organization of loose work
(higher-risk state-schema change beyond the step rename shipped here).
Deeper audit-gate hardening (claims-gate-on-by-default, ship-gate
rerun-resolution) — behavior-changing, staged separately.


17 Jun 13:04

github-actions

v3.0.0

6179a1f

v3.0.0

MAJOR release. Research OS now fits the shape of your work — classic
linear analysis, iterative tool/software building, lightweight exploration,
notebook-driven analysis, or a multi-study program — instead of assuming one
shape. Alongside the modes it turns several "advertised but unenforced"
rigor promises into enforced ones, overhauls routing for both beginners and
deep-critic PIs, and improves every protocol.

Added — Workspace modes

workspace.mode in researcher_config.yaml (+ research-os init --workspace-mode, and a wizard "What are you building?" step):
analysis (default, unchanged) · tool_build · exploration ·
notebook · multi_study. A SCAFFOLD_PROFILES registry scaffolds each
shape; state, router, and audits dispatch on the active mode.
tool_build mode — Research OS governs an inner project from above:
spec/ + decisions/ (ADRs) + eval/ (the harness that defines "done") +
milestones.md + governance.md, with the tool itself in an inner dir
that gets its OWN git init. "Done" = tests + build + eval pass.
build/ protocol family — spec_and_design → implement_iteration
(loop) → test_strategy → benchmark_vs_baseline → release_and_changelog.
Plus exploration/ (triage → loop → promote-to-step) and notebook/ +
program/ orienting protocols.
tool_git (inner-repo version control; commits stamped with the RO
step for provenance), tool_build (configured build/test/lint
runner), and tool_audit(scope='tool', dimension=tests|git_hygiene|build).

Added — Rigor that is actually enforced

tool_finalize_project — a server ship-gate that HARD-BLOCKS "done"
on unresolved audit blockers, cited-but-invalid PDFs, ungrounded numbers,
or stub sections, unless a logged researcher override clears it.
PDF integrity — literature downloads are validated by the %PDF-
magic header; a renamed 403/HTML page is deleted + recorded, never counted
as a paper. Every PDF count uses magic validation, not glob("*.pdf").
Substrate-checked grounding — tool_verify now checks a claim against
its cited file (a number is "verified" only if the source actually
contains it; self-asserted support becomes "unverified").
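
The PDF-integrity rule above can be sketched in a few lines. This is a minimal illustration assuming the check reads only the first five bytes of the file; the function names are hypothetical and the real validator may do more (for example, deleting and recording invalid files).

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"


def is_real_pdf(path: Path) -> bool:
    """Return True only if the file starts with the %PDF- magic header."""
    try:
        with path.open("rb") as fh:
            return fh.read(len(PDF_MAGIC)) == PDF_MAGIC
    except OSError:
        # Unreadable files are treated as invalid rather than raising.
        return False


def count_valid_pdfs(folder: Path) -> int:
    """Count literature by magic header rather than by file extension."""
    return sum(1 for p in folder.glob("*.pdf") if is_real_pdf(p))
```

Counting this way means a saved 403/HTML page named paper.pdf contributes nothing to the literature total.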

Added — Beginner ↔ PI gradient

tool_explain — a layered, grounded tutor (intuition → mechanics →
caveats → when-not-to-use → reading list) for any skill level.
tool_deliverable_chooser — an output_types-gated "I'm done, what
now?" on-ramp.
Mode-scoped tool listing — the per-turn catalog shrinks from 151 to
~113–128 tools.
Router overhaul — beginner-vocabulary layer ("i have a csv what do i
do", "make a chart", "is my result significant"), a confidence-margin gate
that asks instead of confidently misrouting, capped reckless single-word
triggers, mode-aware routing bias, and workflow_shape as a routing signal.

Improved

Every protocol swept for doctrine compliance: hardcoded thresholds /
method menus / canned step sequences replaced with "name-the-dimension +
cite-the-source" scaffolds; scope-tag mislabels fixed; see_also
cross-links added.
Typst deliverables compile across all 12 venues (uniform template/conf).
Audits read the real synthesis/paper.typ (dual Typst + Markdown), so the
rigor gates no longer silently no-op.
New researcher docs: TOOL_BUILDER.md, a beginner on-ramp in START.md,
workspace modes documented across the guides.
New scripts/lint_coherence.py preflight gate: docs/templates can no
longer reference a removed tool or hand-write a tool/protocol count.

Fixed

All 7 IDE rule templates + docs purged of removed-tool references
(tool_plan_*, tool_synthesize/dashboard/figure, tool_grounding_*).
synthesis_check no longer reports success on a message-less error.
11 broken documentation cross-references.

Behaviour changes that may affect existing projects (why this is MAJOR)

A project with placeholder/HTML files named *.pdf will see them no
longer counted as literature — the literature gate may newly fire (add a
real PDF, or override with a rationale).
tool_finalize_project can refuse to finalize a project with unresolved
blockers; previously every blocker was advisory.
tool_verify returns unverified for self-asserted claims that do not
resolve to a cited source.
New field workspace.mode defaults to analysis; existing projects keep
classic behaviour with no change.

Migration

Analysis projects upgrade with no changes (mode defaults to analysis,
byte-identical scaffold). Re-run research-os init --refresh to pick up
the updated AGENTS.md / IDE rules.
The planned tool-cluster consolidations (merging the SLURM / exec / route
families; sys_path → sys_step) are deferred to 3.1.0 and will ship with
aliases so no call site breaks.


10 Jun 19:23

github-actions

v2.4.4

ae963c5

v2.4.4

PATCH release. Lifts the visual quality of both the figures AI produces
and the dashboard chrome that surrounds them to match a published
Research-OS reference deliverable (cream background, italic serif
titles, muted CVD-safe accent palette, value-labels-above-bars, clean
spines, generous whitespace). Also turns the loose "look at the
rendered figure" guidance into a mandatory render → view → v2 loop so
the AI can no longer ship a figure it never opened.

Added

research_os.tools.actions.viz.style — new module exporting
the Research-OS publication style preset for matplotlib:
- apply_research_os_style(destination=..., palette=...) —
  one call sets rcParams (cream bg, serif typography, dropped
  spines, dotted horizontal grid, constrained_layout on,
  300 dpi save) and returns a context dict with the destination
  figsize + palette so the AI's first render lands close to
  publication-ready. Destinations: single_col / two_col /
  full_width / slide / slide_half / dashboard /
  dashboard_tile / poster.
- RO_PALETTE — five muted CVD-safe accents
  (navy #1F4D7A, olive #9B7E2D, forest #3F6049, oxblood
  #9B3737, mustard #C3A14E) plus a diverging emphasis pair
  (oxblood / forest) and a neutral chrome set
  (cream / warm-dark / muted / hairline).
- DESTINATION_FIGSIZES — pre-tuned (width, height) for every
  destination so the AI doesn't pick a 6×4 default that crops at
  print size.
- label_bars_above(ax, bars, unit="ms") — italic value labels
  floating 2 % above each bar, matching the reference figure
  aesthetic (467 ms, 128 ms, …). Reserves headroom so the
  label doesn't crash into the next bar in a stacked chart.
- label_diverging_bars(ax, bars, values) — signed delta labels
  coloured forest (positive) / oxblood (negative) for the
  diverging-bar comparison panel.
- polish_axes(ax) — re-asserts top + right spine off and
  dotted horizontal grid on a specific axes after the AI built
  the chart.
- apply_suptitle(fig, title, subtitle=...) — italic serif
  suptitle + smaller subtitle line, positioned to never overlap
  constrained_layout.
- Graceful import: when matplotlib isn't installed, the module
  still imports and returns applied=False from
  apply_research_os_style instead of raising.
first_render_spacing_discipline: block in
visualization/figure_guidelines — a 9-item upfront discipline
(pick destination, leave y-margin for value labels, plan legend
placement, decide tick rotation, reserve suptitle headroom) so the
FIRST render doesn't need a v2 to fix spacing. Calls out the
matplotlib tight_layout() ↔ constrained_layout conflict.
visually_verify_render step in
visualization/visualization_workflow — the workflow's
counterpart to the strengthened pre_publish_self_review step in
figure_guidelines. Both protocols now teach the same mandatory
render → open the PNG → check overlap / clipping / legend
placement / palette cohesion → write v2 if anything fails → only
ship v_final loop.
Research-OS accent palette in audit_color_palette — the
five RO_PALETTE accents + the neutral chrome colours are now in
the allowed-palette set, so dashboards built from the new
scaffold and figures generated through apply_research_os_style
no longer trip the out-of-palette warning.

Changed

synthesis/scaffold._DASHBOARD_HTML — full CSS rewrite to
match the reference figure aesthetic. Cream background, two-font
stack (EB Garamond serif for titles + figure captions, Inter sans
for body), italic serif h1 / h2 / h3 / table headers, muted
accent palette as CSS variables (--accent navy, --accent-gold
olive, --accent-green forest, --accent-red oxblood,
--accent-mustard), hairline rule colour for separators,
near-white cards on cream, italic figcaption for figure
interpretation, print-friendly fallback retained. Adds an
.eyebrow line and a .lead paragraph class in the hero so the
TL;DR has room to breathe.
visualization/figure_guidelines (v2.0.0 → v2.4.4) — adds the
research_os_style_preset reference block, the
first_render_spacing_discipline rules, the new set_up_canvas
step (call apply_research_os_style BEFORE writing the chart
code), and rewrites pre_publish_self_review into the mandatory
open-the-PNG view loop with explicit sys_file_read filepath=...
instructions and a 14-item OBSERVATION checklist that the human
eye must verify against the rendered pixels.
visualization/visualization_workflow (v2.0.0 → v2.4.4) —
inserts visually_verify_render after build_each_figure so the
on-demand figure workflow inherits the same loop. Updates
build_each_figure to mention apply_research_os_style + the
spacing discipline.
synthesis/synthesis_dashboard (v2.4.3 → v2.4.4) — adds a
"visual cohesion with the figures" principle pointing at
apply_research_os_style(); bumps version.

Test gate

tests/unit/test_viz_style.py — new file covering the style
preset surface (palette has 5+ entries, DESTINATION_FIGSIZES has
the expected destinations, apply_research_os_style returns the
context dict, helpers no-op safely without matplotlib bars).
tests/unit/test_v244_dashboard_style.py — new file covering
the dashboard CSS rewrite (cream bg + accent palette present in
scaffold, section IDs preserved, print stylesheet retained, new
accent palette passes audit_color_palette without warnings,
protocol YAMLs updated with the new spacing + view loop language).
preflight passes · pytest passes · ruff clean.

Not a behaviour change for existing projects

Pre-v2.4.4 dashboards on disk are untouched — the new CSS only
applies to scaffolds created after upgrading. Re-scaffold with
tool_synthesis_scaffold(kind='dashboard', overwrite=true) to
adopt the new style.
The style preset is opt-in. Plotting scripts that don't import
apply_research_os_style continue to render with matplotlib
defaults. The figure_guidelines protocol recommends adopting
the preset for visual cohesion with the dashboard, but doesn't
reject figures that depart from it (journal templates win).


09 Jun 16:12

github-actions

v2.4.3

615c9b3

v2.4.3

PATCH release. Closes two architectural holes that the v2.4.2
ontology-mapping audit surfaced as the root cause of "AI auto-creates
deliverables the user didn't ask for":

The synthesis pipeline was hardcoded to synthesis_paper.
get_next_protocol() in tools/actions/protocol.py ran a fixed
9-step PIPELINE ending at synthesis_paper regardless of the
researcher's declared research_goal.output_types. A project whose
wizard answer was output_types: [dashboard] still saw "next is
synthesis_paper" from the loader. Fixed: pipeline is now the
universal analysis prefix + a synthesis tail filtered by declared
output_types. Empty output_types falls back to synthesis_paper
(legacy behaviour preserved).
next_protocol chains auto-fired in every autonomy mode. Six
protocols silently chained: analysis_plan → reproducibility
(every analysis step triggered an audit), audit_and_validation → synthesis_paper (every audit triggered a paper draft),
reproducibility → audit_and_validation,
cox_ph_diagnostics → audit, missing_data_strategy → audit,
qualitative_quality_audit → audit. Fixed: each chain now carries
an explicit AUTONOMY GATE annotation telling the AI to suggest
(not auto-chain) in manual / supervised / coaching modes.
Single-step requests stop at the requested step.

Three parallel Explore-agent audits drove the fix. Reports captured
in chat transcripts (not checked in).

Added

output_types_gate(root, kind, *, autonomy=None) in
tools/actions/synthesis/check.py. Returns
{verdict: 'proceed'|'ask'|'skip', declared_outputs, message, kind}.
- proceed when output_types is empty (no preference declared) OR
  kind is in the declared set.
- ask when output_types is non-empty and kind is NOT in the
  set; the returned message is a one-line prompt the AI lifts
  verbatim to the researcher.
- skip reserved for future use (researcher explicitly opted out).
- Normalises aliases (lay-summary → lay_summary,
  Lay Summary → lay_summary) and ignores the exploratory
  sentinel (which marks "no deliverable yet").
tool_synthesis_scaffold(kind, confirmed=false) — new
confirmed kwarg. When the output_types gate returns ask and the
caller has not passed confirmed=true (or the existing
overwrite=true), the scaffold returns status='ask' instead of
writing. The AI is expected to surface the message to the researcher
and only re-call with confirmed=true after they say yes. Prevents
the failure mode where the AI auto-creates a paper / dashboard /
poster the user never asked for.
SYNTHESIS_OUTPUT_TYPE_MAP in
tools/actions/protocol.py — single source of truth mapping each
output_types keyword (paper, dashboard, poster, slides,
report, lay_summary, grant, abstract, essay, handout) to
its synthesis protocol + "done" predicate. New synthesis protocols
must register here to participate in pipeline filtering.
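
The proceed / ask behaviour of output_types_gate described above can be sketched as follows. This is a simplified stand-in, not the shipped function: it takes the declared list directly instead of reading researcher_config.yaml from root, omits the autonomy argument and the reserved skip verdict, and the message wording and normalisation rule are assumptions.

```python
def normalise_output_type(name: str) -> str:
    """Map spellings such as 'lay-summary' or 'Lay Summary' to 'lay_summary'."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def output_types_gate_sketch(declared: list, kind: str) -> dict:
    """Hypothetical, simplified version of the gate's decision logic."""
    wanted = [normalise_output_type(d) for d in declared]
    # The 'exploratory' sentinel marks "no deliverable yet", so it is ignored.
    wanted = [d for d in wanted if d != "exploratory"]
    kind_n = normalise_output_type(kind)

    if not wanted or kind_n in wanted:
        verdict, message = "proceed", ""
    else:
        verdict = "ask"
        message = (
            f"Declared outputs are {', '.join(wanted)}; "
            f"do you also want a {kind_n}?"
        )
    return {
        "verdict": verdict,
        "declared_outputs": wanted,
        "message": message,
        "kind": kind_n,
    }
```

A scaffold tool built on this would return status='ask' with the message when the verdict is ask and the caller has not passed confirmed=true.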

Changed

get_next_protocol(root) consults
inputs/researcher_config.yaml#research_goal.output_types. The
analysis prefix (session_boot → project_startup → domain →
methodology → literature → analysis_plan → reproducibility →
audit_and_validation) is unchanged. The synthesis tail is now
dynamic — for output_types: [dashboard, lay_summary], the
pipeline terminates at synthesis/synthesis_lay_summary (in
declared order), NOT synthesis/synthesis_paper. Empty list →
fallback to synthesis_paper (no regression for unfilled projects).
Response envelope gains declared_output_types field.
tool_synthesis_check envelope gains an intent_gate field. If
the kind being audited isn't in declared output_types, the gate's
one-line message also appears in warnings.
synthesis_paper (v2.3.0 → 2.4.3) — adds an explicit
verify_intent first step that reads output_types and stops the
AI if paper isn't declared. Closes the auto-create-paper failure.
synthesis_dashboard (2.4.2 → 2.4.3) — same verify_intent
first step for dashboard. Reinforces the existing trigger gate.
synthesis_slides (2.3.0 → 2.4.3) — slides prerequisite added.
synthesis_lay_summary (2.4.2 → 2.4.3) — lay_summary
prerequisite added.
printable (2.3.0 → 2.4.3) — poster / handout prerequisite
added.
Autonomy-gate annotations added to the six high-risk
next_protocol chains (guidance/analysis_plan →
reproducibility/reproducibility; audit/audit_and_validation →
synthesis/synthesis_paper; reproducibility/reproducibility →
audit/audit_and_validation; the three methodology/* audits →
audit/audit_and_validation;
visualization/interactive_dashboard_design →
synthesis/synthesis_dashboard). Each carries a comment block
telling the AI to surface "Next: ..." as a SUGGESTION in
manual / supervised / coaching modes; only autopilot
auto-chains, and autopilot further gates synthesis chains on
output_types membership.
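
The dynamic synthesis tail that get_next_protocol now builds can be illustrated with a small sketch. The prefix order is taken from the notes above; the mapping shown is only a partial, illustrative subset of SYNTHESIS_OUTPUT_TYPE_MAP, and the function name and the handling of unknown keywords are assumptions.

```python
ANALYSIS_PREFIX = [
    "session_boot", "project_startup", "domain", "methodology",
    "literature", "analysis_plan", "reproducibility", "audit_and_validation",
]

# Illustrative subset only; the real map covers more output types.
SYNTHESIS_BY_OUTPUT = {
    "paper": "synthesis/synthesis_paper",
    "dashboard": "synthesis/synthesis_dashboard",
    "lay_summary": "synthesis/synthesis_lay_summary",
}


def build_pipeline(output_types: list) -> list:
    """Universal analysis prefix plus a synthesis tail in declared order."""
    if not output_types:
        # Legacy fallback: no declared preference ends at synthesis_paper.
        tail = [SYNTHESIS_BY_OUTPUT["paper"]]
    else:
        # Assumption: keywords missing from the map are skipped silently here.
        tail = [SYNTHESIS_BY_OUTPUT[o] for o in output_types if o in SYNTHESIS_BY_OUTPUT]
    return ANALYSIS_PREFIX + tail


print(build_pipeline(["dashboard", "lay_summary"])[-1])  # synthesis/synthesis_lay_summary
print(build_pipeline([])[-1])                            # synthesis/synthesis_paper
```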

Test cleanup

Renamed tests/unit/test_v242_synthesis_dashboard_lints.py
→ tests/unit/test_synthesis_check.py. Per-release test naming
(test_v<version>_<feature>.py) is now retired as a convention;
new tests for this surface land in the topic-named file.
+13 regression tests in the renamed file covering:
output_types_gate proceed / ask / empty / alias-normalisation /
exploratory-sentinel; synthesis_scaffold returns ask / honours
confirmed=true / proceeds for declared kinds; pipeline tail
respects dashboard-only / paper+lay_summary / empty-fallback;
synthesis_check envelope includes the intent_gate field.

Test gate

preflight 29/29 · pytest 1643/1643 (+13 new in
test_synthesis_check.py) · ruff clean.

Not a behaviour change for existing projects

Projects with output_types: [] (or no researcher_config.yaml)
see the legacy fallback: synthesis_paper terminal, no ask
envelopes. This is the on-disk default for every project initialised
before v2.4.3 and for every fresh research-os init until the
researcher fills in output_types. Recommendation: update the
wizard answer once to make the intent explicit.
The AUTONOMY GATE annotations are guidance to the AI, not
loader-enforced refusals (which would be MINOR-shaped). An AI client
that ignores them will still see the same next_protocol values it
did in v2.4.2; correct AI behaviour is now spelled out in the
protocol text + the new gate helper.


09 Jun 15:22

github-actions

v2.4.2

7c34564

v2.4.2

PATCH release. Six fixes driven by an audit of the
/scratch/vsetlur/ontology-mapping v2.1 synthesis run, which surfaced
a recurring failure mode: the AI authoring a slap-together dashboard
(one section per workspace step, figure + caption underneath each),
inventing non-canonical filenames (paper-lay.md, REPRODUCIBILITY.md,
METHODS.md, CITATIONS.md) in synthesis/, and leaving behind
random .md / .mermaid / .json clutter at workspace/ root. All
fixes are protocol guidance, scaffold rewrites, lint additions, and
one tool-mode extension; no breaking changes to existing APIs.

Changed

synthesis/synthesis_dashboard — rewrite (v2.3.0 → v2.4.2).
The protocol now explicitly forbids the per-step recap antipattern
and requires a custom, story-driven structure: Hero / Headline →
Key findings (organised by claim, not by step) → Comparison
(adopted vs ruled out) → Methods → Limitations → References.
Introduces an explicit choice between Plan-mode (collaborative
outline) and Autopilot (AI picks the headline finding and structure)
before scaffolding. Quality bar now lists forbidden_structure
(per-step recap, directory dump, caption-only sections) and
required_structure (hero + ≥3 claim-driven findings sections).
synthesis/synthesis_paper — clarification (v2.3.0 → v2.4.2).
States explicitly that synthesis/paper.pdf is mandatory before the
paper deliverable is "done" (a stranded paper.md with no rendered
PDF is a blocker). Lists the four most common AI-improvised
filenames that downstream tools do NOT recognise (paper-lay.md,
REPRODUCIBILITY.md, METHODS.md, CITATIONS.md) and points each
at its canonical destination.
synthesis/synthesis_lay_summary — clarification (v2.0.0 → v2.4.2).
Canonical filename is synthesis/lay_summary.md (not paper-lay.md,
lay.md, summary.md, paper_lay.md); downstream tools recognise
only the canonical name.
writing/writing_conclusions — figure/table citations (v2.0.0 →
v2.4.2). Per-step conclusions.md template gains a mandatory
Figures + tables produced section that lifts directly into
paper / dashboard / slides synthesis. Every Findings bullet must
cite at least one figure / table / output file produced by the
step; an unciteable finding is rejected. The Statistical summary
table gains a Source column. Closes the gap where downstream
synthesis stages had to guess which figures backed which findings.
tool_synthesis_curate_figures — multi-figure curation. New
mode parameter: 'focal' (default, unchanged behaviour — one
focal figure per step, named figNN_<slug>.png for paper.typ) and
'all' (every figure in every step's outputs/figures/, named
with the step number prefix, plus every figure's caption sidecar
copied or seeded). The 'all' mode fixes the failure where the
AI bypasses curation and writes step figures directly to
synthesis/figures/, leaving them without .caption.md sidecars.
Backwards-compatible: omitting mode keeps the v2.4.1 behaviour.

Added

synthesis_check — story-structure lints for dashboards.
Three new checks on synthesis/dashboard.html:
1. BLOCKER on ≥4 Step NN section headings (per-step recap
  antipattern). Tolerates up to 3 (a comparison block
  referencing specific steps is fine).
2. WARN on 2-3 Step NN headings (graduated nudge to
  claim-driven headings).
3. WARN on missing hero / TL;DR / headline-finding section in
  the first viewport (any of "Headline", "TL;DR", "Hero",
  "Key finding(s)", "Summary", "Top-line", "Bottom line",
  "At a glance" as heading text or section id satisfies it).
synthesis_hygiene — synthesis-directory filename lint.
Every tool_synthesis_check call now also walks synthesis/ for
non-canonical files and surfaces per-file rename / delete hints.
Recognises the four most common AI-improvised names from the
ontology-mapping audit (paper-lay.md → lay_summary.md;
REPRODUCIBILITY.md, METHODS.md, CITATIONS.md → delete and
fold into canonical artefacts). Unknown filenames get a softer
"move to archive/ or fold into canonical deliverable" warning.
Subdirectories (figures/, archive/, scripts/,
dashboard_data/, _typst_templates/) are ignored.
workspace_hygiene — workspace-root clutter lint. Every
tool_synthesis_check call now also walks workspace/ for loose
files / subdirectories outside the canonical set (methods.md,
analysis.md, citations.md, researcher_certifications.yaml +
the logs/, scratch/, archive/, .preregistration/, and
numbered NN_<slug>/ directories). Loose planning docs, hand-rolled
audits, .mermaid diagrams, and agent briefs at workspace root get
per-file relocate hints (move to scratch/, logs/, or
archive/).
Dashboard scaffold rewrite. tool_synthesis_scaffold(kind='dashboard')
now writes a story-arc skeleton: hero section with metric-card
grid + interpretive caption slot, key-findings block organised by
claim, comparison block for adopted-vs-rejected, methods block
linking to paper.pdf, limitations + open questions, references +
cite. CSS is inline and CVD-aware. The previous scaffold's
per-section  markers explicitly warn against
per-step recap and step-numbered headings.
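
The graduated Step NN lint on dashboard.html described above (BLOCKER at four or more headings, WARN at two or three) can be sketched like this. The heading regex and the function name are assumptions for illustration; the shipped check may match headings differently.

```python
import re
from typing import Optional

# Assumption: a "Step NN" heading is an <h1>..<h6> whose text starts with "Step <digits>".
STEP_HEADING = re.compile(r"<h[1-6][^>]*>\s*Step\s+\d+", re.IGNORECASE)


def step_recap_severity(html: str) -> Optional[str]:
    """Return 'BLOCKER', 'WARN', or None based on the count of Step NN headings."""
    count = len(STEP_HEADING.findall(html))
    if count >= 4:
        return "BLOCKER"  # per-step recap antipattern
    if count >= 2:
        return "WARN"     # graduated nudge toward claim-driven headings
    return None
```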

Test gate

tests/unit/test_v242_synthesis_dashboard_lints.py — 9 new
regression tests covering: dashboard step-by-step recap BLOCKER,
hero-section absence WARN, story-driven structure passes,
synthesis_hygiene flags paper-lay.md / REPRODUCIBILITY.md /
METHODS.md / CITATIONS.md with the right rename hints,
workspace_hygiene flags v2_1_*.md / tools.md /
workflow.mermaid / step_completeness_audit.{md,json} /
loose subdirectories, curate_figures(mode='all') curates every
figure with caption sidecars, curate_figures(mode='focal')
default unchanged, unknown mode rejected.
1630 tests pass (was 1621 in v2.4.1; +9 new).

Not a behaviour change

The synthesis_check BLOCKER list grew by one (≥4 Step NN
headings). Projects that want a per-step structure can either
cap to ≤3 such sections (a comparison block referencing 2-3 steps
is fine) or set the dashboard mode to a printable / handout
artefact (those protocols don't run the per-step lint).
tool_synthesis_curate_figures continues to default to 'focal'
mode; no behaviour change for callers that don't pass mode.


09 Jun 04:07

github-actions

v2.4.1

3368637

v2.4.1

PATCH-then-some release. Lands five of the items the v2.4.0
CHANGELOG explicitly deferred (one of them — research-os refresh —
is technically a new CLI subcommand and so a borderline MINOR
addition; the rest are pure cleanups). Shipped as 2.4.1 because the
combined surface change is small and additive: no existing project
or caller breaks; readers + writers stay tolerant; old field names
migrate silently.

Added

research-os refresh — new CLI subcommand. Detects drift
between a project's copies of bundled templates (AGENTS.md,
CLAUDE.md, .claude/rules/research-os.md, IDE rule files) and
the version shipped with the installed research-os package.
Read-only by default; --check exits non-zero on drift (CI-friendly);
--write [--yes] overwrites drifted project copies; --json emits
machine-readable output; --regen-readme also rebuilds the project-
root README.md from current state. Smoke-tested against the
/scratch/vsetlur/ontology-mapping project that drove the v2.4.0
audit: correctly flagged the +13-line AGENTS.md drift from the
rule #10 rewrite and the +1-line .claude/rules/research-os.md
drift; flagged CLAUDE.md as identical; ignored un-wired IDE rules.
Closes "no refresh CLI" deferred item.
project_ops.regenerate_root_readme(root) — public helper that
rewrites the project-root README.md with a "Project status" section
listing actual on-disk numbered step folders (with a one-line
summary cribbed from each step's README) plus any synthesis
deliverables present (paper.{typ,pdf}, slides.{typ,pdf},
poster.{typ,pdf}, dashboard.html). Idempotent. Internal
_write_project_root_readme gained a force=False kwarg so the
wizard's skip-if-exists default is preserved.
Checkpoint retention tags. create_checkpoint(description, root, *, tag=None, keep=5) now accepts an optional tag (e.g.
"release-candidate", "before-major-refactor"). Tagged checkpoints
survive the per-create GC pass; untagged ones beyond keep are
pruned. .meta.json schema gains an optional tag field;
list_checkpoints surfaces it.
Per-create checkpoint GC. create_checkpoint now calls
_prune_old_checkpoints immediately after writing the snapshot,
surfacing the {kept, removed, tagged} report under gc in the
return envelope. Previously the pruner only ran at numbered-step
creation, so explicit tool_checkpoint chains accumulated unboundedly
(audit found one project at 61 MB across 2 checkpoints on a <5 MB
source tree).
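
The retention rule for tagged and untagged checkpoints can be sketched as a pure function. This is a hypothetical illustration: the record fields (name, created, tag) and the exact contents of the kept / removed / tagged lists are assumptions, and nothing here touches the filesystem the way the real pruner does.

```python
def prune_checkpoints(checkpoints: list, keep: int = 5) -> dict:
    """Decide which checkpoints survive a GC pass.

    Each checkpoint is a dict with 'name', a sortable 'created' value,
    and an optional 'tag'. Tagged checkpoints are never pruned.
    """
    tagged = [c for c in checkpoints if c.get("tag")]
    untagged = sorted(
        (c for c in checkpoints if not c.get("tag")),
        key=lambda c: c["created"],
        reverse=True,  # newest first
    )
    kept, removed = untagged[:keep], untagged[keep:]
    return {
        "kept": [c["name"] for c in kept],
        "removed": [c["name"] for c in removed],
        "tagged": [c["name"] for c in tagged],
    }
```

Running a pass like this on every create, rather than only at numbered-step creation, is what keeps explicit checkpoint chains bounded.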

Changed

step_summary.yaml soft-deprecated. tool_path_finalize still
writes the file (downstream readers — synthesis, audits, doctor —
consume it) but the emit now carries a deprecation banner naming
the file as DERIVED from conclusions.md, AUTO-GENERATED, "do
NOT edit by hand", and "slated for removal once readers migrate
to parsing conclusions.md directly". The payload gains a
_derived_from: "conclusions.md" field so machine readers can
detect the soft-deprecation programmatically.
templates/step_summary.yaml.template gets a matching DEPRECATION
NOTICE at the top telling new protocol authors NOT to scaffold the
file and pointing them at conclusions.md prose answers instead.
The 4 protocols that currently scaffold this file (analysis_plan,
qualitative_research, close_reading, proof_verification_workflow)
stay unchanged for back-compat; their migration is queued.
Dead state-ledger fields dropped. checkpoint_history and
rollback_history were written every checkpoint / rollback but
never read by any code path (the .meta.json sidecars in
.os_state/checkpoints/ are the authoritative log). rollback()
no longer appends; _migrate strips both from older state files
on load. Reduces in-state JSON bloat across long sessions.
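
A tolerant load-time migration of the kind described for the dead ledger fields might look like the sketch below. The function name is hypothetical and the real _migrate presumably handles other schema changes as well.

```python
DEAD_FIELDS = ("checkpoint_history", "rollback_history")


def strip_dead_fields(state: dict) -> dict:
    """Return a copy of the state ledger without the never-read history fields."""
    return {key: value for key, value in state.items() if key not in DEAD_FIELDS}


old_state = {"version": "2.4.0", "checkpoint_history": [1, 2], "rollback_history": []}
print(strip_dead_fields(old_state))  # {'version': '2.4.0'}
```

Because the function copies rather than mutates, older state files that still carry the fields load without error and are simply written back without them.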

Not in this release (planned for v2.5.0 / v3.0)

The v2.4.0 deferral list shrank by 5; the remaining items either
require breaking schema changes or coordinated multi-file migrations:

Full step_summary.yaml retirement (delete the writer + migrate
the 4 protocols that scaffold the editable template to require
prose in conclusions.md). Breaking for any external reader of the
file → v3.0.
.preregistration/ + .grounding/ directory removal (migrate
content into per-step preregistration.md + .os_state/grounding.jsonl).
Touches 20+ readers; needs a back-compat-tolerant migration
pattern → v2.5.0.
Auto-invoked finalize hook at end of synthesis flow (the helper
exists now via regenerate_root_readme; wiring it to fire
automatically requires changes to the synthesis check / compile
tools) → v2.5.0.
Per-step logs/ removal + cross-step utility canonical home
(workspace/scratch/ IS used in practice; needs a positive
convention before removing the catch-all) → v2.5.0.

Verified

Preflight: 29/29 passed.
Pytest: all green (12 new tests across refresh CLI + checkpoint
GC + tag retention).
Ruff: clean.

Bumped

pyproject.toml, src/research_os/__init__.py, CITATION.cff to
2.4.1.


09 Jun 02:05

github-actions

v2.4.0

d00b4a9

v2.4.0

MINOR release. Driven by a 10-perspective adversarial audit of a real
project run (AUDIT_ontology_mapping.md, 233 findings across 10
personas — PI, junior researcher, senior domain reviewer, fresh-AI
handoff, Research-OS architect, code-quality, organization, outputs
quality, docs, reproducibility/citations). The synthesis identified
v2.0–v2.3 as having succeeded at producing consistent structure
(every project gets the same folder layout) but failing at consistent
substance (auto-generated figure captions leaked into papers as
placeholder rows; hallucinated bibliographies survived to submission;
empty literature/ stubs read as "no citations needed" when really the
AI just hadn't downloaded any). v2.4.0 closes the highest-impact gaps
without breaking existing projects.

Added

audit_pdf_grounding(entries, root) in
tools/actions/synthesis/citations.py — reports which citation
entries have a downloaded PDF on disk vs which don't. Searches
inputs/literature/<key>.pdf, inputs/literature/<doi-slug>.pdf,
and workspace/*/literature/<key>.pdf. Returns
{grounded: [...], ungrounded: [{key, doi, url, title}, ...], count, grounded_count}.
Closes the audit's strongest unified finding (8/10 auditors): a
project shipped 21 references in
synthesis/references.bib while find . -name '*.pdf' returned
zero results.
require_pdfs flag on write_references_bib — when true, drops
ungrounded entries from the bib and lists them at the file tail as
commented-out UNGROUNDED ENTRIES. Default keeps every entry but
adds a header comment noting how many lack on-disk grounding so the
gap is visible at the bib level even without opting in.
figures: block in researcher_config.yaml — three knobs
(svg_allowed, summary_sidecar, interactive_html_allowed) that
control the per-figure sidecar regime. All three default to a lean
shape (no SVG, no auto-summary, interactive HTML allowed). Added to
both templates/researcher_config.yaml and the in-code
CONFIG_TEMPLATE, kept in sync by
test_config_template_matches_file. figures registered in
docs/CONTRACT.md A.3 stable-section list.
validation_warnings on active_plan.json — _persist_active_plan
now scans the decomposition for entries whose tool field is in
_REMOVED_TOOLS (tool_synthesize, tool_dashboard,
tool_slides_create, etc.) and writes a per-step warning. Surfaces
stale router-index entries at plan-write time so the AI sees them
before dispatching, not after burning a turn on the friendly
redirect.
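
The PDF-grounding check from the first item above can be sketched roughly as below. It is a hedged approximation, not the shipped audit_pdf_grounding: the function name audit_pdf_grounding_sketch, the doi_slug rule, and the choice to list grounded entries by key are all assumptions; the three search locations and the return keys follow the notes.

```python
import re
import tempfile
from pathlib import Path


def doi_slug(doi: str) -> str:
    """Assumed slug rule: collapse every non-alphanumeric run to "_"."""
    return re.sub(r"[^A-Za-z0-9]+", "_", doi).strip("_")


def audit_pdf_grounding_sketch(entries: list, root: Path) -> dict:
    """Split citation entries into those with a PDF on disk and those without."""
    lit = root / "inputs" / "literature"
    grounded, ungrounded = [], []
    for entry in entries:
        key = entry["key"]
        candidates = [lit / f"{key}.pdf"]
        if entry.get("doi"):
            candidates.append(lit / f"{doi_slug(entry['doi'])}.pdf")
        # Per-step literature folders: workspace/*/literature/<key>.pdf
        candidates.extend(root.glob(f"workspace/*/literature/{key}.pdf"))
        if any(path.is_file() for path in candidates):
            grounded.append(key)
        else:
            ungrounded.append({f: entry.get(f) for f in ("key", "doi", "url", "title")})
    return {
        "grounded": grounded,
        "ungrounded": ungrounded,
        "count": len(entries),
        "grounded_count": len(grounded),
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "inputs" / "literature").mkdir(parents=True)
    (root / "inputs" / "literature" / "smith2020.pdf").write_bytes(b"%PDF-1.4")
    report = audit_pdf_grounding_sketch(
        [{"key": "smith2020"}, {"key": "lee2021", "doi": "10.1000/xyz"}], root
    )
    print(report["grounded_count"], "of", report["count"], "entries grounded")
```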

Changed

Figure audit no longer warns "PNG without SVG companion" by
default. audit_figure_quality reads researcher_config.figures.*
via the new _load_figures_config helper; the SVG warning fires
only when svg_allowed=true; the summary-sidecar warning fires
only when summary_sidecar=true. Drops a long-running source of
false-positive noise.
tool_path_finalize stops auto-emitting .summary.md sidecars.
Plain-English interpretation now integrates into conclusions.md
next to the inline ![](outputs/figures/<slug>.png) embed. The
auto-generated sidecars trained the AI to leave stub captions
("Auto-drafted caption: regenerate from analysis context") that
leaked verbatim into one project's synthesis/paper.md as 92
placeholder rows visibly telling reviewers the AI gave up. Opt back
in via figures.summary_sidecar=true.
AGENTS.md hard rule #10 rewritten. Replaces the "every figure
carries four sidecars including an SVG companion" mandate with a
lean default (<slug>.png + an authored <slug>.caption.md), opt-in
SVG / summary sidecars, encouragement of interactive .html
companions for visualisation types that benefit (networks,
multi-panel dashboards), and an explicit requirement that the AI
sys_file_read every figure before declaring a step done — catches
legend-over-plot, missing axis labels, palette regressions,
snake-case-leaking-into-label bugs that no JSON audit catches.
_seed_step_subfolder_readmes stops pre-creating stub READMEs
in literature/, environment/, and context/ per step. These
dirs stay in EXPERIMENT_SUBDIRS (paths exist) but are empty until
a tool writes into them. The audit found that pre-seeded stubs trained
the AI to leave dirs as boilerplate, caused literature/ to read as
"no citations" when really the AI just hadn't downloaded any, and
cluttered every step folder with content nobody wrote. The README
that answers "what goes here?" now lives once in
RESEARCHER_GUIDE.md rather than duplicated 14× on disk.
outputs/README.md template updated to reflect the new figure
contract: reports go DEEPER than conclusions.md (choices,
reasoning, comparison of options); figures are .png-only by
default with optional interactive .html companions; AI MUST read
each figure before finalize.
Doc hardening: dropped hardcoded tool / protocol counts across
README.md, docs/{TOOLS,PROTOCOLS,RESEARCHER_GUIDE,START,AI_GUIDE}.md.
Replaces "144 tools" / "117 protocols" with vague phrases
("~150 tools", "100+ protocols", "every tool", "All core protocols").
CLAUDE.md doctrine already forbids hand-written counts; the
maintainer was violating it in 9+ places. Counts go stale within a
release. CONTRACT.md keeps its v2.0.0-anchored snapshot table.
Doc drift fix: README.md:117 code/ → scripts/. The README
showed 01_baseline_eda/code/ in its file-layout diagram while the
framework, RESEARCHER_GUIDE, and every real project use scripts/.
A junior researcher walking through README and then opening a real
project would have hit the inconsistency immediately.
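
The config-gated figure warnings from the first Changed item can be sketched like this. It is an illustration under stated assumptions, not the real audit_figure_quality: the helper name sidecar_warnings, the warning strings, and the <slug>.svg / <slug>.summary.md naming are hypothetical; the three knobs and their lean defaults come from the notes.

```python
from typing import Optional

# Lean defaults per the notes: no SVG, no auto-summary, interactive HTML allowed.
DEFAULT_FIGURES_CONFIG = {
    "svg_allowed": False,
    "summary_sidecar": False,
    "interactive_html_allowed": True,
}


def sidecar_warnings(png_name: str, present_files: set,
                     figures_config: Optional[dict] = None) -> list:
    """Warn about a missing sidecar only when its knob is switched on."""
    cfg = {**DEFAULT_FIGURES_CONFIG, **(figures_config or {})}
    stem = png_name[: -len(".png")] if png_name.endswith(".png") else png_name
    warnings = []
    if cfg["svg_allowed"] and f"{stem}.svg" not in present_files:
        warnings.append(f"{png_name}: PNG without SVG companion")
    if cfg["summary_sidecar"] and f"{stem}.summary.md" not in present_files:
        warnings.append(f"{png_name}: missing summary sidecar")
    return warnings


files = {"volcano.png", "volcano.caption.md"}
print(sidecar_warnings("volcano.png", files))
print(sidecar_warnings("volcano.png", files, {"svg_allowed": True}))
```

With the defaults the first call reports nothing; turning a knob on restores the corresponding warning for that sidecar only.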

Migration

Existing projects are unaffected by the figure default change
— audit_figure_quality still reads existing .summary.md and
.svg files when they're present; the change is that it no longer
warns on their absence. To restore the v2.3 warning behaviour,
add to inputs/researcher_config.yaml:
```
figures:
  svg_allowed: true
  summary_sidecar: true
```
Existing per-step literature/README.md / environment/README.md /
context/README.md stubs are not touched — the change only
affects newly-created steps. Delete the stubs by hand if you want
empty dirs in legacy steps.
write_references_bib signature gained two optional kwargs
(root, require_pdfs). All existing positional calls keep
working; opt-in to PDF filtering by passing both.
AGENTS.md template change does NOT propagate to existing
projects (the wizard only copies once). Re-run research-os init
in a temp dir and diff the AGENTS.md against your project's copy
to pick up the new hard rule #10 wording. A research-os refresh
CLI subcommand to do this automatically is planned for 2.4.x.
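
Until that subcommand exists, the manual AGENTS.md comparison from the last item can be scripted with the standard library. A minimal sketch, assuming you have already read both files into strings (for example with Path.read_text); the helper name agents_md_diff and the sample rule text are made up for illustration.

```python
import difflib


def agents_md_diff(project_text: str, fresh_text: str) -> str:
    """Unified diff from the project's AGENTS.md to a freshly scaffolded copy."""
    lines = difflib.unified_diff(
        project_text.splitlines(keepends=True),
        fresh_text.splitlines(keepends=True),
        fromfile="project/AGENTS.md",
        tofile="fresh/AGENTS.md",
    )
    return "".join(lines)


# Placeholder contents; real input would come from the two files on disk.
project_copy = "9. Log every step.\n10. Every figure carries four sidecars.\n"
fresh_copy = "9. Log every step.\n10. Lean default: png plus authored caption.\n"
print(agents_md_diff(project_copy, fresh_copy))
```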

Not in this release (planned for 2.4.x / 2.5.0)

The full audit surfaced ~50 P0 framework changes; this release ships
the highest-impact subset that doesn't break existing projects. The
following remain for follow-up:

Per-step step_summary.yaml retirement: the YAML stub anti-pattern
flagged by 9/10 audits. The derived emit in tool_path_finalize
stays in 2.4.0; the editable scaffold via step_summary.yaml.template
+ the update_step_summary step in analysis_plan.yaml /
literature_per_step.yaml await migration to prompt-laden README
prose.
.os_state simplification: collapse state_ledger.json +
manifest.json overlap, drop dead fields, bound checkpoint storage
(single snapshot can be 39 MB of duplicate workspace; no GC).
research-os refresh CLI subcommand: auto-upgrade
AGENTS.md / CLAUDE.md / IDE-config templates in an existing
project to match the bundled current version.
Sparse-root finalize hook: regenerate top-level README.md at
project finalize (currently write-once at init).
Per-step logs/ removal + cross-step utility canonical home
(workspace/scratch/ IS used in practice but the framework doesn't
document a canonical place for it).
Hard removal of .preregistration/ + .grounding/ hidden dirs in
workspace (content moves into per-step README / methodology.md +
.os_state/grounding.jsonl).

Verified

Preflight: 29/29 passed.
Pytest: all green.
Ruff: clean.

Bumped

pyproject.toml, src/research_os/__init__.py, CITATION.cff to
2.4.0.


08 Jun 18:08

github-actions

v2.3.0

870dfa4

v2.3.0

MINOR release. Retires the synthesis auto-generators in favour of
AI-direct authoring: the AI writes synthesis/paper.typ /
slides.typ / poster.typ / essay.typ / dashboard.html directly,
following the matching synthesis protocol. Tools validate and
compile; they no longer generate the prose / layout. The previous
auto-generators produced rigid, low-quality output — a 3MB
monolithic dashboard, a markdown-only paper intermediate, slide
decks no audience could read. Removing them moved 9700+ lines of
generator code out of the codebase and let the synthesis protocols
become true scaffolds (per docs/PROTOCOL_DOCTRINE.md).

Breaking changes

The following tools were removed. Each returns a _REMOVED_TOOLS
redirect message naming the new protocol + surviving tools:

tool_synthesize → follow synthesis/synthesis_paper; write
synthesis/paper.typ directly; compile via tool_typst_compile.
tool_dashboard (+ 7 operations: create, story_generate,
story_edit, story_quality_bar, reviewer_sim, test_generate,
test_run) → follow synthesis/synthesis_dashboard; write
synthesis/dashboard.html directly.
tool_slides_create → follow synthesis/synthesis_slides; write
synthesis/slides.typ (Touying); compile via tool_typst_compile.
tool_poster_create → follow synthesis/synthesis_poster
(redirect to synthesis/printable); write synthesis/poster.typ.
tool_humanities_essay_scaffold → use
tool_synthesis_scaffold(kind='essay') + author content.
tool_paper_compile_typst → use tool_typst_compile (generic .typ
→ .pdf; the AI authors the .typ directly, no markdown
intermediate).
tool_section_substantiveness → folded into
tool_synthesis_check(mode='substantiveness') (now also handles
Typst headings).
tool_figure dispatcher and operations caption_synthesise,
interactive_autogen, paper_autoembed → the AI authors plain-
English figure summaries, interactive companions, and Typst
#figure(...) blocks directly when writing the plotting script or
paper.typ. tool_figure_palette is now a top-level tool.
tool_reviewer operation simulate → the AI walks the paper
through the persona YAMLs in assets/reviewer_personas/ directly
(tool_reviewer keeps response, rebuttal, compile for real
external reviews).

The autopilot floor gate enforcement also shifted: tool_typst_compile
replaces tool_synthesize / tool_dashboard(operation='create') as
the final-deliverable gate.
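
The redirect behaviour for removed tools can be pictured as a lookup that runs before normal dispatch. The sketch below is an assumption about the shape, not the project's code: the dict REMOVED_TOOLS, the dispatch helper, and the returned envelope keys are hypothetical; the tool names and redirect targets are copied from the list above.

```python
# Redirect text per removed tool, condensed from the list above.
REMOVED_TOOLS = {
    "tool_synthesize": "follow synthesis/synthesis_paper; write synthesis/paper.typ; "
                       "compile via tool_typst_compile",
    "tool_slides_create": "follow synthesis/synthesis_slides; write synthesis/slides.typ; "
                          "compile via tool_typst_compile",
    "tool_paper_compile_typst": "use tool_typst_compile",
}


def dispatch(tool_name: str, handlers: dict, **arguments) -> dict:
    """Answer removed tools with a redirect instead of an unknown-tool error."""
    if tool_name in REMOVED_TOOLS:
        return {"status": "removed", "tool": tool_name,
                "redirect": REMOVED_TOOLS[tool_name]}
    if tool_name not in handlers:
        return {"status": "error", "message": f"unknown tool: {tool_name}"}
    return {"status": "ok", "payload": handlers[tool_name](**arguments)}


handlers = {"tool_typst_compile": lambda source: f"compiled {source}"}
print(dispatch("tool_synthesize", handlers))
print(dispatch("tool_typst_compile", handlers, source="synthesis/paper.typ"))
```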

Added

tool_typst_compile — generic Typst compiler. Takes any
AI-authored .typ source (paper, slides, poster, essay,
cover_letter, response_to_reviewers) and renders the PDF.
Resolves bundled venue templates from _typst_templates/;
auto-generates synthesis/biblio.yml from workspace/citations.md
when missing. Returns pdf_path, page_count, citation_count,
typst_warnings, typst_errors.
tool_synthesis_check — quality audit for AI-authored
synthesis files. Auto-detects file type from the path. Modes:
all (default), substantiveness, structure, accessibility,
cliches. Per-IMRAD-section content depth audits for paper /
essay; slide-count + speaker-notes + path-leak audits for slides;
section + headline + QR audits for poster; engineering invariants
(offline, alt-text, semantic <section id>, no placeholders, no
filesystem-path leaks) for dashboard.
tool_synthesis_scaffold — writes a <=80-line skeleton
synthesis/<paper|slides|poster|essay>.typ or dashboard.html
with section headers + // AI: author this section markers.
Idempotent (refuses overwrite without overwrite=true).
tool_figure_palette — promoted from an operation under
tool_figure to a top-level tool. Returns CVD-safe palettes
(Okabe-Ito qualitative, viridis sequential, PuOr diverging,
accent).
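
The idempotent write described for tool_synthesis_scaffold can be sketched as below. This is an illustration only: the helper name write_scaffold, its boolean return value, and the section titles are assumptions; the refusal to overwrite without overwrite=true and the // AI: author this section marker come from the notes.

```python
import tempfile
from pathlib import Path

MARKER = "// AI: author this section"


def write_scaffold(path: Path, sections: list, overwrite: bool = False) -> bool:
    """Write a Typst skeleton; leave an existing file alone unless overwrite is set."""
    if path.exists() and not overwrite:
        return False  # an authored file is never clobbered by a repeat call
    lines = []
    for title in sections:
        lines += [f"= {title}", MARKER, ""]  # "= Title" is a Typst level-1 heading
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines), encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "synthesis" / "paper.typ"
    first = write_scaffold(target, ["Introduction", "Methods"])
    second = write_scaffold(target, ["Something else"])
    forced = write_scaffold(target, ["Introduction"], overwrite=True)
    final_text = target.read_text(encoding="utf-8")
    print(first, second, forced)
```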

Improved

Synthesis protocols rewritten as scaffolds. synthesis_paper,
synthesis_dashboard, synthesis_slides, printable (poster +
handout), humanities_essay_structure, synthesis_grant,
synthesis_abstract, synthesis_report, synthesis_lay_summary,
synthesis_progress_update, synthesis_from_inputs — each
collapsed from 100-370 lines of prescriptive recipe to <=130 lines
of scaffold (design principles + quality standards + workflow +
available tools). Spec files (synthesis_spec.yaml,
slides_spec.yaml, dashboard_spec.yaml) are no longer required.
Cleaner synthesis/ folder. After a full project run:
paper.typ, paper.pdf, slides.typ, slides.pdf, poster.typ,
poster.pdf, dashboard.html, biblio.yml, figures/. No
intermediate .md files, no spec YAMLs, no handout duplicates.
researcher_config.yaml schema simplified. The synthesis:
block is empty by default. Removed knobs:
figures_auto_embed*, figure_xref_rewrite, slide_engine,
slide_template, slide_theme, slide_speaker_notes_enabled,
slide_print_handout, poster_engine, poster_template,
poster_theme, poster_qr_url, poster_handout_pdf,
drafter_loop_* (5 knobs).
_router_index.yaml v21. Synthesis decompositions point at
the new tool_synthesize_plan → tool_synthesis_scaffold →
tool_synthesis_check → tool_typst_compile chain.

Removed

9 implementation files under src/research_os/tools/actions/synthesis/:
dashboard.py (1604 lines), dashboard_app.py (1424), slides.py
(946), drafter_loop.py (850), reviewer.py (partial — reviewer_simulate),
figure_auto_embed.py (747), poster_typst.py (697),
dashboard_humanities.py (465), dashboard_qualitative.py (455),
humanities_essay.py (212), synthesize.py (1374),
dashboard_story.py (300). Total: ~9700 lines.
src/research_os/tools/actions/viz/dashboard_tests.py (the
Playwright scaffold for auto-generated dashboards).
src/research_os/assets/reveal/ (260 KB), slide_templates/
(24 KB), poster_templates/ (20 KB) — vendored assets only
the removed generators consumed.
12 obsolete test files (test_v191_dashboard_app,
test_v190_dashboard_content, test_dashboard_humanities,
test_dashboard_qualitative, test_v191_story_mode,
test_slides_engine, test_poster_typst, test_drafter_loop,
test_figure_auto_embed, test_humanities_essay_structure,
test_synthesize_auto_proceed,
test_synthesize_blocks_on_unresolved_findings,
test_synthesize_uses_pack_sections, test_paper_drafter_loop,
test_researcher_config_synthesis,
test_audit_audit_figure_coverage,
test_citation_retrieval_empty_response,
test_audit_findings_explain).

Migration

Existing project files (synthesis/paper.md, synthesis/dashboard.html
from prior versions) are preserved as-is on disk. The new tools do
not regenerate them. To produce the new artefact next to the old:
ask the AI to follow the matching synthesis protocol (e.g. "redo the
paper as Typst") — it will author synthesis/paper.typ and you can
delete the old paper.md once you're happy with the new PDF.

Tool count: 148 → 144 (8 removed + 4 added). Protocol count
unchanged at 117 core.

Bumped

pyproject.toml, src/research_os/__init__.py, CITATION.cff
to 2.3.0.
11 rewritten synthesis-related protocols to version: '2.3.0'.
_router_index.yaml to version: 21.


07 Jun 04:50

github-actions

v2.2.0

eb07174

v2.2.0

MINOR release. Shipped after a 35-agent audit (10 researcher-domain
perspectives, 5 technical, 5 UX, 5 AI-model personas, 5 online-research,
5 meta-improvement) surfaced 119+ findings across 12 themes. The
synthesis selected v2.2.0 over v2.1.2 because 6 p0 + 12 p1 work-items
genuinely add tools and knobs rather than just polish.

Added

sys_where — ~30-token mid-session orientation snapshot
(project_root, tier, active_plan position, unresolved BLOCK count,
last protocol). Use instead of sys_boot when you only need to
remember "where am I?".
sys_export_ro_crate — emits ro-crate-metadata.json +
codemeta.json at project root. Closes the FAIR-alignment claim
that was unbacked in v2.0–v2.1. Discoverable by Zenodo, OSF,
downstream RO-Crate consumers.
sys_export_share_archive now bundles ro-crate-metadata.json +
codemeta.json + CITATION.cff at archive root automatically.
Autopilot floor gates (research_os.server.autopilot_gate) —
8 floor gates enforce mandatory audits before tier advance, even
in autopilot mode. Closes the bypass path where autopilot=true
silently skipped block-severity findings.
research-os mcp / research-os api-key / research-os completion
CLI subcommands (4 → 7). mcp adds/removes external MCP server
configs (memory, filesystem, github). api-key securely stores
per-provider keys (chmod 600). completion emits shell completion
for bash / zsh / fish (uses argcomplete when installed, falls
back to a hand-rolled script otherwise).
argcomplete>=3.0 as the new completion optional extra
(pip install 'research-os[completion]') + included in dev.
model_profile + ai.context_class config knobs —
researcher_config.yaml's ai section now carries
model_profile: small|medium|large (controls protocol-detail
level) and context_class: short|long (controls history-window
size). sys_boot respects both.
docs/SECURITY.md — new page documenting path-containment,
autopilot floor gates, override rationale enforcement, the
.os_state/overrides.log audit trail, and the boundary between
trusted and untrusted MCP-tool inputs.
research-os doctor expanded to 25+ checks (was 18+).
New checks include: tool_short_field_present, citation_cff_valid,
external_pack_entrypoints, embeddings_fresh, and
docs_referenced_scripts.
22 work-item implementation report ships in docs/SECURITY.md +
this CHANGELOG entry as evidence of the multi-perspective audit
that drove this release.
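
The chmod 600 storage mentioned for research-os api-key corresponds to a standard POSIX pattern, sketched here. It is not the CLI's implementation: the helper name store_api_key, the file name, and the key value are placeholders, and the permission bits only behave this way on POSIX systems.

```python
import os
import stat
import tempfile
from pathlib import Path


def store_api_key(path: Path, key: str) -> None:
    """Write a secret so that only the owning user can read or write it."""
    # Create with mode 0o600 up front so the file is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(key)
    # A pre-existing file keeps its old mode, so set the bits explicitly as well.
    os.chmod(path, 0o600)


with tempfile.TemporaryDirectory() as tmp:
    key_file = Path(tmp) / "provider.key"
    store_api_key(key_file, "example-not-a-real-key")
    mode = stat.S_IMODE(key_file.stat().st_mode)
    stored = key_file.read_text(encoding="utf-8")
    print(oct(mode))
```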

Changed

Envelope normalization at the dispatcher. Pack and adapter tools
that previously returned the legacy {"status", "data"} shape are
now upgraded to the v2.1.0 envelope by
research_os.server.envelopes._normalize_envelope, invoked once in
dispatch._handle_tool_call. Closes the v2.1.0 envelope gap for
13+ pack + adapter tools in one place rather than per-tool. New
pack code should call _success / _error directly per
docs/PLUGIN_AUTHORING.md.
RoError(what, why, next_action) signature loosened from
keyword-only to positional. Matches the contract documented in
docs/CONTRACT.md A.6.2 verbatim.
did_you_mean is namespace-aware for the sys_/tool_/mem_
prefixes. Typing sys_X now prefers other sys_* matches before
cross-namespace.
Envelope adds next_recommended_call_structured — a
{"tool": str, "arguments": dict} form derived from
next_recommended_call when parseable. Strict tool-loop clients
dispatch this directly without re-parsing free-form text.
override_rationale enforcement wired across 9 handler sites
(synthesis_writing, synthesis_visual, audit_core, audit_gates,
methodology, meta_workspace.sys_path_create,
meta_workspace.sys_checkpoint_rollback, tool_step_complete,
tool_path_finalize). Thin rationales ('TODO', 'preview',
single-word, <20 chars) are rejected before the underlying audit
runs. An empty rationale paired with an override flag now returns an
explicit error instead of silently no-opping.
sys_file_* path containment. sys_file_read, sys_file_write,
sys_file_list, and sys_file_delete now refuse paths that
resolve outside the workspace root. Closes the host-FS escape
(../../etc/passwd) that was reachable from any MCP client.
CLAUDE.md, FAQ.md, START.md updated to current counts (preflight
25+, doctor 20+, subcommands 7). Future drift is policed by the
new preflight_docs_consistency test.
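
The sys_file_* containment rule above is an instance of a common resolve-then-compare check, sketched below. The helper name resolve_inside and the choice of PermissionError are assumptions; the project may structure its check differently (the notes say its errors follow the RoError contract).

```python
import tempfile
from pathlib import Path


def resolve_inside(root: Path, user_path: str) -> Path:
    """Resolve user_path against root and refuse anything that lands outside it."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the comparison
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"path escapes workspace root: {user_path}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    inside = resolve_inside(workspace, "01_baseline_eda/scripts/run.py")
    try:
        resolve_inside(workspace, "../../etc/passwd")
        escaped = True
    except PermissionError:
        escaped = False
    expected = workspace.resolve() / "01_baseline_eda" / "scripts" / "run.py"
    print(inside == expected, escaped)
```

Comparing resolved paths rather than raw strings is what closes the ../ traversal: a prefix test on the unresolved string would accept the escape path.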

Fixed

Test test_audit_version_coherence_rejects_unknown_step_id
updated to pytest.raises((RoError, FileNotFoundError)) —
iteration._step_dir now raises RoError per the contract.
docs/CONTRACT.md A.6.1 corrected: the data alias removal is
slated for v3.0.0 (not v2.2.0 as the row erroneously claimed).
The alias is preserved in _success / _error through every v2.x
release for back-compat with v2.0 callers.
docs/CONTRACT.md A.3 no longer lists tool_stack as a stable
top-level researcher_config.yaml section — the key was never
shipped in templates/researcher_config.yaml.
Internal work-item IDs (W##, FIX-#) stripped from tool
descriptions (audit.py, meta.py, synthesis.py) and
user-facing docs (SECURITY.md, FAQ.md, AI_GUIDE.md,
AGENTS.md). Inline # W##: source comments cleaned up
(substance kept). Future leaks are caught by
test_tool_description_no_version_chatter.
docs/TOOLS.md lists sys_where + sys_export_ro_crate —
both were callable but undocumented after Wave-D.
Tool count references updated 146 → 148 across
docs/{TOOLS,AI_GUIDE,FAQ,RESEARCHER_GUIDE,CONTRACT,START}.md.
Doctor check count 14/18+ → 20+. START.md subcommand count
4 → 7 with the full list.

Removed

dashboard_v2.py / dashboard_v2_humanities.py /
dashboard_v2_qualitative.py / humanities_essay_scaffold.py
deprecation shims (one-minor-cycle removal promised in v2.1.1).
Canonical paths: dashboard_app, humanities_essay.

Verified

Preflight: 29/29 passed.
Pytest: 1894 passed, 13 skipped, 0 failed.
Ruff: clean.
5 independent validators reviewed the diff by reading + reasoning
(not pytest): logic, consistency, contract, UX, tests. Their
2 blockers + 14 concerns were triaged and fixed before release.

Migration

No required code changes. Every change is additive; the data
envelope alias is kept. Tool argument names are unchanged.
If your code imported from
research_os.tools.actions.synthesis.dashboard_v2* or
research_os.tools.actions.synthesis.humanities_essay_scaffold,
switch to dashboard_app / humanities_essay (the canonical
modules). The shims were removed per the v2.1.1 deprecation
promise.
If you parsed envelope["data"], that still works through every
2.x release. Switch to envelope["payload"] before v3.0.0.
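
Client code can make the payload switch tolerant of both keys while older envelopes are still in circulation. A small sketch, assuming the envelope is a plain dict; the helper name envelope_payload is hypothetical.

```python
def envelope_payload(envelope: dict):
    """Prefer the payload key and fall back to the legacy data alias."""
    if "payload" in envelope:
        return envelope["payload"]
    # Legacy alias, kept through every 2.x release per the notes above.
    return envelope["data"]


print(envelope_payload({"status": "ok", "payload": {"tier": 2}, "data": {"tier": 2}}))
print(envelope_payload({"status": "ok", "data": {"tier": 1}}))
```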


06 Jun 17:54

github-actions

v2.1.1

4b1fa26

v2.1.1

PATCH release. Pure cleanup — no behavior changes, no new tools, no
new protocols, no API or tool-signature changes.

Changed

Source files renamed to canonical names (no _v2, _scaffold,
etc.): humanities_essay_scaffold.py → humanities_essay.py
(back-compat shim kept at the old path through v2.2.0). The
dashboard_v2*.py shims created in v2.1.0 stay in place for one
more minor cycle per the migration table (removed v2.2.0). 11
unit-test filenames dropped a redundant _v2 suffix
(test_audit_audit_*_v2.py → test_audit_audit_*.py,
test_router_output_v2.py → test_router_output.py).
docs/ folder reduced to one file per concept, no version
suffixes. Version-tagged historical reports + working-session
scratchpads removed (preserved in git history; recover via
git show v2.1.0:docs/<file>). Final shape: 22 markdown files +
2 mermaid diagrams (PROTOCOL_GRAPH.mermaid, workflow_dag.mermaid).
docs/README.md rewritten as a single audience-routing page
(researchers / AI agents + plugin authors / maintainers +
integrators).
Root README.md release badge bumped to v2.1.1; deep links to the
deleted V2_RELEASE_NOTES + MIGRATION_v1_to_v2 docs replaced with
pointers to CHANGELOG.md (with [2.0.0] section hint where the
context warrants it).
Code + protocol comments swept for historical-version references:
~115 strips across 23 files (server, audit/state, synthesis/viz,
cli + plugins, router_index protocols). 1 pure-historical block
deleted. Git log + CHANGELOG carry version history; live doctrine
stays focused on current behavior. Stable surfaces (e.g.
_REMOVED_TOOLS migration data, the canonical replacement entry
points) were KEPT — those name the version because the version is
load-bearing user-facing data, not commentary.

Added

.gitignore entries blocking future creation of version-tagged
docs + handoff scratchpads in docs/. Patterns added:
/docs/v*_handoff/, /docs/*_handoff/, /docs/AUDIT_v*.md,
/docs/USABILITY_v*.md, /docs/CHANGELOG_DETAILED_v*.md,
/docs/MIGRATION_v*.md, /docs/V[0-9]*.md, /docs/V[0-9]*/,
/docs/audit_v*/, /docs/usability_v*/, /docs/PHASE_*.md,
/docs/archive/. Prevents the clutter from recurring; future
sessions that try to write these paths get them silently ignored.

Verified

MCP wiring smoke (in /tmp/ro_v211_mcp/): research-os init
scaffolds correctly, .claude/mcp.json writes the standard
research-os start config, research-os doctor reports
mcp_configs_wired: pass, research-os start boots cleanly,
and TOOL_DEFINITIONS count (146) matches the v2.1.0 surface
(unchanged).

Migration

No code changes required. Imports from old _v2 paths still
resolve via the deprecation shim (removed v2.2.0).
Imports of from research_os.tools.actions.synthesis.humanities_essay_scaffold import scaffold_humanities_essay
keep working via the new 2-line shim at the old path; update at
your convenience to
from research_os.tools.actions.synthesis.humanities_essay import scaffold_humanities_essay.
Anyone with local edits to deleted docs: recover via
git show v2.1.0:docs/<file> (or any tag where the file lived)
and re-save outside the repo as a personal note.
