Releases: VibhavSetlur/Research-OS
v3.1.0
MINOR release. Backwards-compatible: every existing tool keeps its name + schema
(new capability is added via new operations/scopes and a new alias). Driven by a
fresh 11-area discovery audit of the 3.0.0 codebase.
Added
- Compiled routing sidecar (_route_meta.json). Routing no longer parses the
104K _router_index.yaml at runtime. build_embeddings.py compiles a compact,
comments-free JSON mirror (protocols/shortcut_intents/hierarchy + pre-baked
tier+workflow_shape) that router.py and semantic.py share via a single
load. It parses ~300× faster (~0.42 ms vs ~126 ms) and removes the per-route
protocol-body reads. The YAML stays the authoring source; --route-meta-only
rebuilds the sidecar without fastembed. New preflight gate validates the sidecar
is fresh + consistent + embeddings-parity-checked.
- tool_verify(scope='outputs') — the "did the work actually land?" gate.
Resolves a protocol's declared expected_outputs against the filesystem
(glob-aware) and reports each present / empty / missing with a next_action.
The injected protocol-completion step now requires it before logging
completed, so the system refuses to call a missing or empty file "done".
- docs/VERSIONING.md documents the in-project versioning convention.
- sys_path(operation='rename') — give a generic analysis step a meaningful
label. Keeps the NN_ lineage number, renames the folder, and re-points every
downstream data/* symlink that targeted it.
- sys_step is now an alias for sys_path (the clearer name for numbered steps).
- Routing-targets preflight gate — every next_protocol / on_failure /
see_also must point at a real protocol (dangling links were previously silent).
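The outputs gate described above can be pictured with a minimal sketch. This is not the shipped tool_verify code: the function names, the report shape, and the rule that a pattern counts as "empty" only when every match is 0 bytes are assumptions made for illustration.

```python
from pathlib import Path


def check_expected_outputs(root, expected_outputs):
    """Classify each declared (relative) output pattern as present, empty, or missing."""
    root = Path(root)
    report = []
    for pattern in expected_outputs:
        # Path.glob resolves both literal relative paths and wildcard patterns.
        matches = sorted(p for p in root.glob(pattern) if p.is_file())
        if not matches:
            status, next_action = "missing", f"produce a file matching {pattern}"
        elif all(p.stat().st_size == 0 for p in matches):
            # Assumption: "empty" means every matching file is 0 bytes.
            status, next_action = "empty", f"regenerate {pattern}; every match is 0 bytes"
        else:
            status, next_action = "present", None
        report.append({"pattern": pattern, "status": status, "next_action": next_action})
    return report


def all_outputs_landed(report):
    """True only when every declared output is present and non-empty."""
    return all(entry["status"] == "present" for entry in report)
```

A completion step built on this idea would refuse to log "completed" whenever all_outputs_landed returns False.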
Improved
- Figures. tool_figure_palette('accent') now returns the exact RO_PALETTE
colours apply_research_os_style applies (a hand-coloured figure matches an
auto-styled one); adds diverging_emphasis. audit_figure_quality runs its
text-overlap + default-font (DejaVu) legibility scan on a PNG's sibling SVG too,
and a corrupt/empty image now warns instead of crashing the audit.
- Synthesis deliverables. tool_typst_compile archives the prior render to
synthesis/archive/<name>_<timestamp>.pdf before overwriting (no silent
clobber), flags single-page-target overflow (poster/cover-letter rendered to
>1 page = content overflowed — where overlapping text shows up), and counts
pages without the off-by-one /Type /Pages miscount. The poster check no longer
false-blocks scaffold-authored posters (#headline / #block-section).
- Wizard. Ctrl+C/Ctrl+D mid-wizard exits cleanly instead of a traceback; the
"already exists" check moved to the start (no more filling out the whole wizard
to be rejected at the end); email + ORCID inputs are format-validated.
- Doctrine. power_analysis replaced its data-shape→test-family menu with
scaffold form (name the dimensions that fix the test, justify the choice).
Fixed
- state_freshness_check read workspace/state.json — a file that never exists —
so the staleness signal was permanently dead; now reads the real ledger.
- get_dag_path / add_dag_node stopped persisting the constant
execution_dag_path back into the ledger every call (write churn + schema noise).
- Dead-end pause detection read protocol_name but the execution log writes
protocol — the signal was silently dead.
- Autopilot gate used str.lstrip('./') (strips characters), so .synthesis/x
was mangled into synthesis/x and falsely gated; now strips the prefix properly.
- Maintainer docs (CLAUDE.md) pointed at the long-gone src/research_os/server.py
monolith — repointed to the server/ package; dropped stale protocol counts.
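The autopilot-gate fix comes down to a well-known Python pitfall: str.lstrip takes a set of characters, not a prefix. The sketch below reproduces the mangling and shows one way to strip only a literal "./" prefix; strip_dot_slash is a hypothetical helper name, not the function used in the codebase.

```python
def strip_dot_slash(path: str) -> str:
    """Remove one leading './' prefix and leave every other character alone."""
    return path[2:] if path.startswith("./") else path


# lstrip('./') removes any run of leading '.' and '/' characters,
# so the dot that belongs to the directory name is eaten as well.
mangled = ".synthesis/x".lstrip("./")

# Prefix-aware stripping leaves the dot-directory untouched.
fixed = strip_dot_slash(".synthesis/x")
```

On Python 3.9+ the built-in str.removeprefix("./") gives the same prefix-only behaviour.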
Bumped
- version → 3.1.0 (pyproject / __init__ / CITATION); router index counter → 27.
Deferred (tracked for a future release)
- Tool-cluster consolidation (SLURM 4→1) — aliased, low user-visible benefit.
- A first-class renamable BRANCH object + retro-organization of loose work
(higher-risk state-schema change beyond the step rename shipped here).
- Deeper audit-gate hardening (claims-gate-on-by-default, ship-gate
rerun-resolution) — behavior-changing, staged separately.
v3.0.0
MAJOR release. Research OS now fits the shape of your work — classic
linear analysis, iterative tool/software building, lightweight exploration,
notebook-driven analysis, or a multi-study program — instead of assuming one
shape. Alongside the modes it turns several "advertised but unenforced"
rigor promises into enforced ones, overhauls routing for both beginners and
deep-critic PIs, and improves every protocol.
Added — Workspace modes
- workspace.mode in researcher_config.yaml (+ research-os init --workspace-mode, and a wizard "What are you building?" step):
analysis (default, unchanged) · tool_build · exploration ·
notebook · multi_study. A SCAFFOLD_PROFILES registry scaffolds each
shape; state, router, and audits dispatch on the active mode.
- tool_build mode — Research OS governs an inner project from above:
spec/ + decisions/ (ADRs) + eval/ (the harness that defines "done") +
milestones.md + governance.md, with the tool itself in an inner dir
that gets its OWN git init. "Done" = tests + build + eval pass.
- build/ protocol family — spec_and_design → implement_iteration
(loop) → test_strategy → benchmark_vs_baseline → release_and_changelog.
Plus exploration/ (triage → loop → promote-to-step) and notebook/ +
program/ orienting protocols.
- tool_git (inner-repo version control; commits stamped with the RO
step for provenance), tool_build (configured build/test/lint
runner), and tool_audit(scope='tool', dimension=tests|git_hygiene|build).
Added — Rigor that is actually enforced
- tool_finalize_project — a server ship-gate that HARD-BLOCKS "done"
on unresolved audit blockers, cited-but-invalid PDFs, ungrounded numbers,
or stub sections, unless a logged researcher override clears it.
- PDF integrity — literature downloads are validated by the %PDF-
magic header; a renamed 403/HTML page is deleted + recorded, never counted
as a paper. Every PDF count uses magic validation, not glob("*.pdf").
- Substrate-checked grounding — tool_verify now checks a claim against
its cited file (a number is "verified" only if the source actually
contains it; self-asserted support becomes "unverified").
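A minimal sketch of the magic-header idea behind the PDF integrity check, assuming nothing about the real implementation beyond the %PDF- prefix named above. The function names are hypothetical, and the delete-and-record step is left out.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"


def is_real_pdf(path):
    """Return True when the file starts with the PDF magic header."""
    try:
        with open(path, "rb") as handle:
            return handle.read(len(PDF_MAGIC)) == PDF_MAGIC
    except OSError:
        # Unreadable or missing files never count as papers.
        return False


def count_valid_pdfs(directory):
    """Count files named *.pdf whose content actually begins like a PDF."""
    return sum(1 for p in Path(directory).glob("*.pdf") if is_real_pdf(p))
```

The point of counting this way is that a saved 403 or HTML page renamed to paper.pdf matches the glob but fails the header check.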
Added — Beginner ↔ PI gradient
- tool_explain — a layered, grounded tutor (intuition → mechanics →
caveats → when-not-to-use → reading list) for any skill level.
- tool_deliverable_chooser — an output_types-gated "I'm done, what
now?" on-ramp.
- Mode-scoped tool listing — the per-turn catalog shrinks from 151 to
~113–128 tools.
- Router overhaul — beginner-vocabulary layer ("i have a csv what do i
do", "make a chart", "is my result significant"), a confidence-margin gate
that asks instead of confidently misrouting, capped reckless single-word
triggers, mode-aware routing bias, and workflow_shape as a routing signal.
Improved
- Every protocol swept for doctrine compliance: hardcoded thresholds /
method menus / canned step sequences replaced with "name-the-dimension +
cite-the-source" scaffolds; scope-tag mislabels fixed; see_also
cross-links added.
- Typst deliverables compile across all 12 venues (uniform
template/conf).
- Audits read the real synthesis/paper.typ (dual Typst + Markdown), so the
rigor gates no longer silently no-op.
- New researcher docs: TOOL_BUILDER.md, a beginner on-ramp in START.md,
workspace modes documented across the guides.
- New scripts/lint_coherence.py preflight gate: docs/templates can no
longer reference a removed tool or hand-write a tool/protocol count.
Fixed
- All 7 IDE rule templates + docs purged of removed-tool references
(tool_plan_*, tool_synthesize/dashboard/figure, tool_grounding_*).
- synthesis_check no longer reports success on a message-less error.
- 11 broken documentation cross-references.
Behaviour changes that may affect existing projects (why this is MAJOR)
- A project with placeholder/HTML files named *.pdf will see them no
longer counted as literature — the literature gate may newly fire (add a
real PDF, or override with a rationale).
- tool_finalize_project can refuse to finalize a project with unresolved
blockers; previously every blocker was advisory.
- tool_verify returns unverified for self-asserted claims that do not
resolve to a cited source.
- New field workspace.mode defaults to analysis; existing projects keep
classic behaviour with no change.
Migration
- Analysis projects upgrade with no changes (mode defaults to analysis,
byte-identical scaffold). Re-run research-os init --refresh to pick up
the updated AGENTS.md / IDE rules.
- The planned tool-cluster consolidations (merging the SLURM / exec / route
families; sys_path → sys_step) are deferred to 3.1.0 and will ship with
aliases so no call site breaks.
v2.4.4
PATCH release. Lifts the visual quality of both the figures AI produces
and the dashboard chrome that surrounds them to match a published
Research-OS reference deliverable (cream background, italic serif
titles, muted CVD-safe accent palette, value-labels-above-bars, clean
spines, generous whitespace). Also turns the loose "look at the
rendered figure" guidance into a mandatory render → view → v2 loop so
the AI can no longer ship a figure it never opened.
Added
- research_os.tools.actions.viz.style — new module exporting
the Research-OS publication style preset for matplotlib:
  - apply_research_os_style(destination=..., palette=...) —
    one call sets rcParams (cream bg, serif typography, dropped
    spines, dotted horizontal grid, constrained_layout on,
    300 dpi save) and returns a context dict with the destination
    figsize + palette so the AI's first render lands close to
    publication-ready. Destinations: single_col / two_col /
    full_width / slide / slide_half / dashboard /
    dashboard_tile / poster.
  - RO_PALETTE — five muted CVD-safe accents
    (navy #1F4D7A, olive #9B7E2D, forest #3F6049, oxblood
    #9B3737, mustard #C3A14E) plus a diverging emphasis pair
    (oxblood / forest) and a neutral chrome set
    (cream / warm-dark / muted / hairline).
  - DESTINATION_FIGSIZES — pre-tuned (width, height) for every
    destination so the AI doesn't pick a 6×4 default that crops at
    print size.
  - label_bars_above(ax, bars, unit="ms") — italic value labels
    floating 2 % above each bar, matching the reference figure
    aesthetic (467 ms, 128 ms, …). Reserves headroom so the
    label doesn't crash into the next bar in a stacked chart.
  - label_diverging_bars(ax, bars, values) — signed delta labels
    coloured forest (positive) / oxblood (negative) for the
    diverging-bar comparison panel.
  - polish_axes(ax) — re-asserts top + right spine off and
    dotted horizontal grid on a specific axes after the AI built
    the chart.
  - apply_suptitle(fig, title, subtitle=...) — italic serif
    suptitle + smaller subtitle line, positioned to never overlap
    constrained_layout.
  - Graceful import: when matplotlib isn't installed, the module
    still imports and returns applied=False from
    apply_research_os_style instead of raising.
- first_render_spacing_discipline: block in
visualization/figure_guidelines — a 9-item upfront discipline
(pick destination, leave y-margin for value labels, plan legend
placement, decide tick rotation, reserve suptitle headroom) so the
FIRST render doesn't need a v2 to fix spacing. Calls out the
matplotlib tight_layout() ↔ constrained_layout conflict.
- visually_verify_render step in
visualization/visualization_workflow — the workflow's
counterpart to the strengthened pre_publish_self_review step in
figure_guidelines. Both protocols now teach the same mandatory
render → open the PNG → check overlap / clipping / legend
placement / palette cohesion → write v2 if anything fails → only
ship v_final loop.
- Research-OS accent palette in audit_color_palette — the
five RO_PALETTE accents + the neutral chrome colours are now in
the allowed-palette set, so dashboards built from the new
scaffold and figures generated through apply_research_os_style
no longer trip the out-of-palette warning.
Changed
- synthesis/scaffold._DASHBOARD_HTML — full CSS rewrite to
match the reference figure aesthetic. Cream background, two-font
stack (EB Garamond serif for titles + figure captions, Inter sans
for body), italic serif h1 / h2 / h3 / table headers, muted
accent palette as CSS variables (--accent navy, --accent-gold
olive, --accent-green forest, --accent-red oxblood,
--accent-mustard), hairline rule colour for separators,
near-white cards on cream, italic figcaption for figure
interpretation, print-friendly fallback retained. Adds an
.eyebrow line and a .lead paragraph class in the hero so the
TL;DR has room to breathe.
- visualization/figure_guidelines (v2.0.0 → v2.4.4) — adds the
research_os_style_preset reference block, the
first_render_spacing_discipline rules, the new set_up_canvas
step (call apply_research_os_style BEFORE writing the chart
code), and rewrites pre_publish_self_review into the mandatory
open-the-PNG view loop with explicit sys_file_read filepath=...
instructions and a 14-item OBSERVATION checklist that the human
eye must verify against the rendered pixels.
- visualization/visualization_workflow (v2.0.0 → v2.4.4) —
inserts visually_verify_render after build_each_figure so the
on-demand figure workflow inherits the same loop. Updates
build_each_figure to mention apply_research_os_style + the
spacing discipline.
- synthesis/synthesis_dashboard (v2.4.3 → v2.4.4) — adds a
"visual cohesion with the figures" principle pointing at
apply_research_os_style(); bumps version.
Test gate
- tests/unit/test_viz_style.py — new file covering the style
preset surface (palette has 5+ entries, DESTINATION_FIGSIZES has
the expected destinations, apply_research_os_style returns the
context dict, helpers no-op safely without matplotlib bars).
- tests/unit/test_v244_dashboard_style.py — new file covering
the dashboard CSS rewrite (cream bg + accent palette present in
scaffold, section IDs preserved, print stylesheet retained, new
accent palette passes audit_color_palette without warnings,
protocol YAMLs updated with the new spacing + view loop language).
- preflight passes · pytest passes · ruff clean.
Not behaviour change for existing projects
- Pre-v2.4.4 dashboards on disk are untouched — the new CSS only
applies to scaffolds created after upgrading. Re-scaffold with
tool_synthesis_scaffold(kind='dashboard', overwrite=true) to
adopt the new style.
- The style preset is opt-in. Plotting scripts that don't import
apply_research_os_style continue to render with matplotlib
defaults. The figure_guidelines protocol recommends adopting
the preset for visual cohesion with the dashboard, but doesn't
reject figures that depart from it (journal templates win).
v2.4.3
PATCH release. Closes two architectural holes that the v2.4.2
ontology-mapping audit surfaced as the root cause of "AI auto-creates
deliverables the user didn't ask for":
- The synthesis pipeline was hardcoded to synthesis_paper.
get_next_protocol() in tools/actions/protocol.py ran a fixed
9-step PIPELINE ending at synthesis_paper regardless of the
researcher's declared research_goal.output_types. A project whose
wizard answer was output_types: [dashboard] still saw "next is
synthesis_paper" from the loader. Fixed: pipeline is now the
universal analysis prefix + a synthesis tail filtered by declared
output_types. Empty output_types falls back to synthesis_paper
(legacy behaviour preserved).
- next_protocol chains auto-fired in every autonomy mode. Six
protocols silently chained: analysis_plan → reproducibility
(every analysis step triggered an audit),
audit_and_validation → synthesis_paper (every audit triggered a paper draft),
reproducibility → audit_and_validation,
cox_ph_diagnostics → audit, missing_data_strategy → audit,
qualitative_quality_audit → audit. Fixed: each chain now carries
an explicit AUTONOMY GATE annotation telling the AI to suggest
(not auto-chain) in manual / supervised / coaching modes.
Single-step requests stop at the requested step.
Three parallel Explore-agent audits drove the fix. Reports captured
in chat transcripts (not checked in).
Added
- output_types_gate(root, kind, *, autonomy=None) in
tools/actions/synthesis/check.py. Returns
{verdict: 'proceed'|'ask'|'skip', declared_outputs, message, kind}.
  - proceed when output_types is empty (no preference declared) OR
    kind is in the declared set.
  - ask when output_types is non-empty and kind is NOT in the
    set; the returned message is a one-line prompt the AI lifts
    verbatim to the researcher.
  - skip reserved for future use (researcher explicitly opted out).
  - Normalises aliases (lay-summary → lay_summary,
    Lay Summary → lay_summary) and ignores the exploratory
    sentinel (which marks "no deliverable yet").
- tool_synthesis_scaffold(kind, confirmed=false) — new
confirmed kwarg. When the output_types gate returns ask and the
caller has not passed confirmed=true (or the existing
overwrite=true), the scaffold returns status='ask' instead of
writing. The AI is expected to surface the message to the researcher
and only re-call with confirmed=true after they say yes. Prevents
the failure mode where the AI auto-creates a paper / dashboard /
poster the user never asked for.
- SYNTHESIS_OUTPUT_TYPE_MAP in
tools/actions/protocol.py — single source of truth mapping each
output_types keyword (paper, dashboard, poster, slides,
report, lay_summary, grant, abstract, essay, handout) to
its synthesis protocol + "done" predicate. New synthesis protocols
must register here to participate in pipeline filtering.
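The gate's decision rule can be sketched in a few lines. This is an illustration only: the real output_types_gate takes (root, kind, *, autonomy=None) and reads the declared outputs from the project config, whereas this hypothetical output_types_gate_sketch takes the declared list directly, and the wording of the ask message is invented.

```python
EXPLORATORY = "exploratory"


def normalise_output_type(name):
    """Map spellings such as 'lay-summary' and 'Lay Summary' to 'lay_summary'."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def output_types_gate_sketch(declared, kind):
    """Return a verdict envelope for a requested deliverable kind."""
    outputs = [normalise_output_type(o) for o in declared]
    # The exploratory sentinel marks "no deliverable yet" and is ignored.
    outputs = [o for o in outputs if o != EXPLORATORY]
    kind = normalise_output_type(kind)
    if not outputs or kind in outputs:
        verdict, message = "proceed", ""
    else:
        verdict = "ask"
        # Hypothetical one-line prompt; the shipped wording is not shown in these notes.
        message = f"Declared outputs are {', '.join(outputs)}; create a {kind} anyway?"
    return {
        "verdict": verdict,
        "declared_outputs": outputs,
        "message": message,
        "kind": kind,
    }
```

A scaffold call would then write files only on proceed, or on ask once the caller passes confirmed=true.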
Changed
- get_next_protocol(root) consults
inputs/researcher_config.yaml#research_goal.output_types. The
analysis prefix (session_boot → project_startup → domain →
methodology → literature → analysis_plan → reproducibility →
audit_and_validation) is unchanged. The synthesis tail is now
dynamic — for output_types: [dashboard, lay_summary], the
pipeline terminates at synthesis/synthesis_lay_summary (in
declared order), NOT synthesis/synthesis_paper. Empty list →
fallback to synthesis_paper (no regression for unfilled projects).
Response envelope gains declared_output_types field.
- tool_synthesis_check envelope gains an intent_gate field. If
the kind being audited isn't in declared output_types, the gate's
one-line message also appears in warnings.
- synthesis_paper (v2.3.0 → 2.4.3) — adds an explicit
verify_intent first step that reads output_types and stops the
AI if paper isn't declared. Closes the auto-create-paper failure.
- synthesis_dashboard (2.4.2 → 2.4.3) — same verify_intent
first step for dashboard. Reinforces the existing trigger gate.
- synthesis_slides (2.3.0 → 2.4.3) — slides prerequisite added.
- synthesis_lay_summary (2.4.2 → 2.4.3) — lay_summary
prerequisite added.
- printable (2.3.0 → 2.4.3) — poster / handout prerequisite
added.
- Autonomy-gate annotations added to the six high-risk
next_protocol chains (guidance/analysis_plan →
reproducibility/reproducibility; audit/audit_and_validation →
synthesis/synthesis_paper; reproducibility/reproducibility →
audit/audit_and_validation; the three methodology/* audits →
audit/audit_and_validation;
visualization/interactive_dashboard_design →
synthesis/synthesis_dashboard). Each carries a comment block
telling the AI to surface "Next: ..." as a SUGGESTION in
manual / supervised / coaching modes; only autopilot
auto-chains, and autopilot further gates synthesis chains on
output_types membership.
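The "fixed prefix + filtered tail" behaviour of get_next_protocol can be sketched as follows. The prefix names come from the entry above; the mapping is a hypothetical four-entry subset standing in for SYNTHESIS_OUTPUT_TYPE_MAP, and build_pipeline is an illustrative name, not the real function.

```python
ANALYSIS_PREFIX = [
    "session_boot", "project_startup", "domain", "methodology",
    "literature", "analysis_plan", "reproducibility", "audit_and_validation",
]

# Hypothetical subset of the output-type mapping, for illustration only.
OUTPUT_TYPE_TO_PROTOCOL = {
    "paper": "synthesis/synthesis_paper",
    "dashboard": "synthesis/synthesis_dashboard",
    "lay_summary": "synthesis/synthesis_lay_summary",
    "slides": "synthesis/synthesis_slides",
}

FALLBACK_TAIL = ["synthesis/synthesis_paper"]


def build_pipeline(output_types):
    """Analysis prefix plus one synthesis step per declared output, in declared order."""
    tail = [
        OUTPUT_TYPE_TO_PROTOCOL[t]
        for t in output_types
        if t in OUTPUT_TYPE_TO_PROTOCOL
    ]
    # An empty declaration keeps the legacy behaviour: the paper is the terminal step.
    return ANALYSIS_PREFIX + (tail or FALLBACK_TAIL)
```

With an empty list the sketch yields the legacy nine-step pipeline; with [dashboard, lay_summary] it ends at the lay summary and never visits the paper protocol.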
Test cleanup
- Renamed tests/unit/test_v242_synthesis_dashboard_lints.py
→ tests/unit/test_synthesis_check.py. Per-release test naming
(test_v<version>_<feature>.py) is now retired as a convention;
new tests for this surface land in the topic-named file.
- +13 regression tests in the renamed file covering:
output_types_gate proceed / ask / empty / alias-normalisation /
exploratory-sentinel; synthesis_scaffold returns ask / honours
confirmed=true / proceeds for declared kinds; pipeline tail
respects dashboard-only / paper+lay_summary / empty-fallback;
synthesis_check envelope includes the intent_gate field.
Test gate
- preflight 29/29 · pytest 1643/1643 (+13 new in
test_synthesis_check.py) · ruff clean.
Not behaviour change for existing projects
- Projects with output_types: [] (or no researcher_config.yaml)
see the legacy fallback: synthesis_paper terminal, no ask
envelopes. This is the on-disk default for every project initialised
before v2.4.3 and for every fresh research-os init until the
researcher fills in output_types. Recommendation: update the
wizard answer once to make the intent explicit.
- The AUTONOMY GATE annotations are guidance to the AI, not
loader-enforced refusals (which would be MINOR-shaped). An AI client
that ignores them will still see the same next_protocol values it
did in v2.4.2; correct AI behaviour is now spelled out in the
protocol text + the new gate helper.
v2.4.2
PATCH release. Six fixes driven by an audit of the
/scratch/vsetlur/ontology-mapping v2.1 synthesis run, which surfaced
a recurring failure mode: the AI authoring a slap-together dashboard
(one section per workspace step, figure + caption underneath each),
inventing non-canonical filenames (paper-lay.md, REPRODUCIBILITY.md,
METHODS.md, CITATIONS.md) in synthesis/, and leaving behind
random .md / .mermaid / .json clutter at workspace/ root. All
fixes are protocol guidance, scaffold rewrites, lint additions, and
one tool-mode extension; no breaking changes to existing APIs.
Changed
- synthesis/synthesis_dashboard — rewrite (v2.3.0 → v2.4.2).
The protocol now explicitly forbids the per-step recap antipattern
and requires a custom, story-driven structure: Hero / Headline →
Key findings (organised by claim, not by step) → Comparison
(adopted vs ruled out) → Methods → Limitations → References.
Introduces an explicit choice between Plan-mode (collaborative
outline) and Autopilot (AI picks the headline finding and structure)
before scaffolding. Quality bar now lists forbidden_structure
(per-step recap, directory dump, caption-only sections) and
required_structure (hero + ≥3 claim-driven findings sections).
- synthesis/synthesis_paper — clarification (v2.3.0 → v2.4.2).
States explicitly that synthesis/paper.pdf is mandatory before the
paper deliverable is "done" (a stranded paper.md with no rendered
PDF is a blocker). Lists the four most common AI-improvised
filenames that downstream tools do NOT recognise (paper-lay.md,
REPRODUCIBILITY.md, METHODS.md, CITATIONS.md) and points each
at its canonical destination.
- synthesis/synthesis_lay_summary — clarification (v2.0.0 → v2.4.2).
Canonical filename is synthesis/lay_summary.md (not paper-lay.md,
lay.md, summary.md, paper_lay.md); downstream tools recognise
only the canonical name.
- writing/writing_conclusions — figure/table citations (v2.0.0 →
v2.4.2). Per-step conclusions.md template gains a mandatory
Figures + tables produced section that lifts directly into
paper / dashboard / slides synthesis. Every Findings bullet must
cite at least one figure / table / output file produced by the
step; an unciteable finding is rejected. The Statistical summary
table gains a Source column. Closes the gap where downstream
synthesis stages had to guess which figures backed which findings.
- tool_synthesis_curate_figures — multi-figure curation. New
mode parameter: 'focal' (default, unchanged behaviour — one
focal figure per step, named figNN_<slug>.png for paper.typ) and
'all' (every figure in every step's outputs/figures/, named
with the step number prefix, plus every figure's caption sidecar
copied or seeded). The 'all' mode fixes the failure where the
AI bypasses curation and writes step figures directly to
synthesis/figures/, leaving them without .caption.md sidecars.
Backwards-compatible: omitting mode keeps the v2.4.1 behaviour.
Added
- synthesis_check — story-structure lints for dashboards.
Three new checks on synthesis/dashboard.html:
  - BLOCKER on ≥4 Step NN section headings (per-step recap
    antipattern). Tolerates up to 3 (a comparison block
    referencing specific steps is fine).
  - WARN on 2-3 Step NN headings (graduated nudge to
    claim-driven headings).
  - WARN on missing hero / TL;DR / headline-finding section in
    the first viewport (any of "Headline", "TL;DR", "Hero",
    "Key finding(s)", "Summary", "Top-line", "Bottom line",
    "At a glance" as heading text or section id satisfies it).
- synthesis_hygiene — synthesis-directory filename lint.
Every tool_synthesis_check call now also walks synthesis/ for
non-canonical files and surfaces per-file rename / delete hints.
Recognises the four most common AI-improvised names from the
ontology-mapping audit (paper-lay.md → lay_summary.md;
REPRODUCIBILITY.md, METHODS.md, CITATIONS.md → delete and
fold into canonical artefacts). Unknown filenames get a softer
"move to archive/ or fold into canonical deliverable" warning.
Subdirectories (figures/, archive/, scripts/,
dashboard_data/, _typst_templates/) are ignored.
- workspace_hygiene — workspace-root clutter lint. Every
tool_synthesis_check call now also walks workspace/ for loose
files / subdirectories outside the canonical set (methods.md,
analysis.md, citations.md, researcher_certifications.yaml +
the logs/, scratch/, archive/, .preregistration/, and
numbered NN_<slug>/ directories). Loose planning docs, hand-rolled
audits, .mermaid diagrams, and agent briefs at workspace root get
per-file relocate hints (move to scratch/, logs/, or
archive/).
- Dashboard scaffold rewrite. tool_synthesis_scaffold(kind='dashboard')
now writes a story-arc skeleton: hero section with metric-card
grid + interpretive caption slot, key-findings block organised by
claim, comparison block for adopted-vs-rejected, methods block
linking to paper.pdf, limitations + open questions, references +
cite. CSS is inline and CVD-aware. The previous scaffold's
per-section <!-- AI: ... --> markers explicitly warn against
per-step recap and step-numbered headings.
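The graduated Step NN lint above (BLOCKER at four or more headings, WARN at two or three) might look roughly like the sketch below. How the shipped check recognises a per-step heading is not stated in these notes, so the regular expression is an assumption, as is the "OK" label for zero or one heading.

```python
import re

# Assumption: a per-step heading is an h1-h6 element whose text starts with "Step <number>".
STEP_HEADING = re.compile(r"<h[1-6][^>]*>\s*Step\s+\d+", re.IGNORECASE)


def lint_step_headings(html):
    """Grade a dashboard by how many step-numbered headings it contains."""
    count = len(STEP_HEADING.findall(html))
    if count >= 4:
        severity = "BLOCKER"   # per-step recap antipattern
    elif count >= 2:
        severity = "WARN"      # nudge toward claim-driven headings
    else:
        severity = "OK"
    return {"step_headings": count, "severity": severity}
```

A dashboard whose comparison block names two or three specific steps therefore draws a warning rather than a blocker.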
Test gate
- tests/unit/test_v242_synthesis_dashboard_lints.py — 9 new
regression tests covering: dashboard step-by-step recap BLOCKER,
hero-section absence WARN, story-driven structure passes,
synthesis_hygiene flags paper-lay.md / REPRODUCIBILITY.md /
METHODS.md / CITATIONS.md with the right rename hints,
workspace_hygiene flags v2_1_*.md / tools.md /
workflow.mermaid / step_completeness_audit.{md,json} /
loose subdirectories, curate_figures(mode='all') curates every
figure with caption sidecars, curate_figures(mode='focal')
default unchanged, unknown mode rejected.
- 1630 tests pass (was 1621 in v2.4.1; +9 new).
Not behaviour change
- The synthesis_check BLOCKER list grew by one (≥4 Step NN
headings). Projects that want a per-step structure can either
cap to ≤3 such sections (a comparison block referencing 2-3 steps
is fine) or set the dashboard mode to a printable / handout
artefact (those protocols don't run the per-step lint).
- tool_synthesis_curate_figures continues to default to 'focal'
mode; no behaviour change for callers that don't pass mode.
v2.4.1
PATCH-then-some release. Lands five of the items the v2.4.0
CHANGELOG explicitly deferred (one of them — research-os refresh —
is technically a new CLI subcommand and so a borderline MINOR
addition; the rest are pure cleanups). Shipped as 2.4.1 because the
combined surface change is small and additive: no existing project
or caller breaks; readers + writers stay tolerant; old field names
migrate silently.
Added
- research-os refresh — new CLI subcommand. Detects drift
between a project's copies of bundled templates (AGENTS.md,
CLAUDE.md, .claude/rules/research-os.md, IDE rule files) and
the version shipped with the installed research-os package.
Read-only by default; --check exits non-zero on drift (CI-friendly);
--write [--yes] overwrites drifted project copies; --json emits
machine-readable output; --regen-readme also rebuilds the
project-root README.md from current state. Smoke-tested against the
/scratch/vsetlur/ontology-mapping project that drove the v2.4.0
audit: correctly flagged the +13-line AGENTS.md drift from the
rule #10 rewrite and the +1-line .claude/rules/research-os.md
drift; flagged CLAUDE.md as identical; ignored un-wired IDE rules.
Closes "no refresh CLI" deferred item.
- project_ops.regenerate_root_readme(root) — public helper that
rewrites the project-root README.md with a "Project status" section
listing actual on-disk numbered step folders (with a one-line
summary cribbed from each step's README) plus any synthesis
deliverables present (paper.{typ,pdf}, slides.{typ,pdf},
poster.{typ,pdf}, dashboard.html). Idempotent. Internal
_write_project_root_readme gained a force=False kwarg so the
wizard's skip-if-exists default is preserved.
- Checkpoint retention tags.
create_checkpoint(description, root, *, tag=None, keep=5) now accepts an optional tag (e.g.
"release-candidate", "before-major-refactor"). Tagged checkpoints
survive the per-create GC pass; untagged ones beyond keep are
pruned. .meta.json schema gains an optional tag field;
list_checkpoints surfaces it.
- Per-create checkpoint GC. create_checkpoint now calls
_prune_old_checkpoints immediately after writing the snapshot,
surfacing the {kept, removed, tagged} report under gc in the
return envelope. Previously the pruner only ran at numbered-step
creation, so explicit tool_checkpoint chains accumulated unboundedly
(audit found one project at 61 MB across 2 checkpoints on a <5 MB
source tree).
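The retention rule (tagged checkpoints always survive, only the newest keep untagged ones remain) can be sketched as a pure function. The input record shape and the choice to report names rather than counts are assumptions; prune_checkpoints is a stand-in, not the real _prune_old_checkpoints.

```python
def prune_checkpoints(checkpoints, keep=5):
    """Split checkpoints into kept / removed / tagged according to the retention rule.

    Each checkpoint is a dict with 'name', a sortable 'created' value, and an
    optional 'tag' (hypothetical record shape).
    """
    tagged = [c for c in checkpoints if c.get("tag")]
    untagged = sorted(
        (c for c in checkpoints if not c.get("tag")),
        key=lambda c: c["created"],
        reverse=True,  # newest first
    )
    kept, removed = untagged[:keep], untagged[keep:]
    return {
        "kept": [c["name"] for c in kept],
        "removed": [c["name"] for c in removed],
        "tagged": [c["name"] for c in tagged],
    }
```

Running a pass like this after every snapshot, rather than only at numbered-step creation, is what bounds the growth described above.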
Changed
- step_summary.yaml soft-deprecated. tool_path_finalize still
writes the file (downstream readers — synthesis, audits, doctor —
consume it) but the emit now carries a deprecation banner naming
the file as DERIVED from conclusions.md, AUTO-GENERATED, "do
NOT edit by hand", and "slated for removal once readers migrate
to parsing conclusions.md directly". The payload gains a
_derived_from: "conclusions.md" field so machine readers can
detect the soft-deprecation programmatically.
templates/step_summary.yaml.template gets a matching DEPRECATION
NOTICE at the top telling new protocol authors NOT to scaffold the
file and pointing them at conclusions.md prose answers instead.
The 4 protocols that currently scaffold this file (analysis_plan,
qualitative_research, close_reading, proof_verification_workflow)
stay unchanged for back-compat; their migration is queued.
- Dead state-ledger fields dropped. checkpoint_history and
rollback_history were written every checkpoint / rollback but
never read by any code path (the .meta.json sidecars in
.os_state/checkpoints/ are the authoritative log). rollback()
no longer appends; _migrate strips both from older state files
on load. Reduces in-state JSON bloat across long sessions.
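Stripping the two dead fields on load is a small, tolerant migration. A sketch under the assumption that the state ledger is a flat JSON object; migrate_state is a hypothetical stand-in for the internal _migrate.

```python
import json

# The two ledger fields named above as written but never read.
DEAD_FIELDS = ("checkpoint_history", "rollback_history")


def migrate_state(state):
    """Return a copy of the state dict without the dead ledger fields."""
    return {key: value for key, value in state.items() if key not in DEAD_FIELDS}


def load_state(text):
    """Parse a state file's JSON text and apply the migration on load."""
    return migrate_state(json.loads(text))
```

Because the function only drops keys, state files that never had the fields pass through unchanged.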
Not in this release (planned for v2.5.0 / v3.0)
The v2.4.0 deferral list shrank by 5; the remaining items either
require breaking schema changes or coordinated multi-file migrations:
- Full step_summary.yaml retirement (delete the writer + migrate
the 4 protocols that scaffold the editable template to require
prose in conclusions.md). Breaking for any external reader of the
file → v3.0.
- .preregistration/ + .grounding/ directory removal (migrate
content into per-step preregistration.md + .os_state/grounding.jsonl).
Touches 20+ readers; needs a back-compat-tolerant migration
pattern → v2.5.0.
- Auto-invoked finalize hook at end of synthesis flow (the helper
exists now via regenerate_root_readme; wiring it to fire
automatically requires changes to the synthesis check / compile
tools) → v2.5.0.
- Per-step logs/ removal + cross-step utility canonical home
(workspace/scratch/ IS used in practice; needs a positive
convention before removing the catch-all) → v2.5.0.
Verified
- Preflight: 29/29 passed.
- Pytest: all green (12 new tests across refresh CLI + checkpoint
GC + tag retention).
- Ruff: clean.
Bumped
- pyproject.toml, src/research_os/__init__.py, CITATION.cff to
2.4.1.
v2.4.0
MINOR release. Driven by a 10-perspective adversarial audit of a real
project run (AUDIT_ontology_mapping.md, 233 findings across 10
personas — PI, junior researcher, senior domain reviewer, fresh-AI
handoff, Research-OS architect, code-quality, organization, outputs
quality, docs, reproducibility/citations). The synthesis identified
v2.0–v2.3 as having succeeded at producing consistent structure
(every project gets the same folder layout) but failing at consistent
substance (auto-generated figure captions leaked into papers as
placeholder rows; hallucinated bibliographies survived to submission;
empty literature/ stubs read as "no citations needed" when really the
AI just hadn't downloaded any). v2.4.0 closes the highest-impact gaps
without breaking existing projects.
Added
- audit_pdf_grounding(entries, root) in
tools/actions/synthesis/citations.py — reports which citation
entries have a downloaded PDF on disk vs which don't. Searches
inputs/literature/<key>.pdf, inputs/literature/<doi-slug>.pdf,
and workspace/*/literature/<key>.pdf. Returns
{grounded: [...], ungrounded: [{key, doi, url, title}, ...], count, grounded_count}.
Closes the audit's strongest unified
finding (8/10 auditors): a project shipped 21 references in
synthesis/references.bib while find . -name '*.pdf' returned
zero results.
- require_pdfs flag on write_references_bib — when true, drops
ungrounded entries from the bib and lists them at the file tail as
commented-out UNGROUNDED ENTRIES. Default keeps every entry but
adds a header comment noting how many lack on-disk grounding so the
gap is visible at the bib level even without opting in.
- figures: block in researcher_config.yaml — three knobs
(svg_allowed, summary_sidecar, interactive_html_allowed) that
control the per-figure sidecar regime. All three default to a lean
shape (no SVG, no auto-summary, interactive HTML allowed). Added to
both templates/researcher_config.yaml and the in-code
CONFIG_TEMPLATE, kept in sync by
test_config_template_matches_file. figures registered in
docs/CONTRACT.md A.3 stable-section list.
- validation_warnings on active_plan.json — _persist_active_plan
now scans the decomposition for entries whose tool field is in
_REMOVED_TOOLS (tool_synthesize, tool_dashboard,
tool_slides_create, etc.) and writes a per-step warning. Surfaces
stale router-index entries at plan-write time so the AI sees them
before dispatching, not after burning a turn on the friendly
redirect.
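The grounding check described in the first bullet can be sketched roughly as follows. This is an illustration only, not the project's implementation: the function name audit_pdf_grounding_sketch and the doi_slug rule are assumptions (the release notes do not say how a DOI is turned into a file name); only the three search locations and the shape of the returned dict come from the notes above.

```python
import re
from pathlib import Path


def doi_slug(doi: str) -> str:
    """Hypothetical slug rule: lowercase, runs of non-alphanumerics become '_'."""
    return re.sub(r"[^a-z0-9]+", "_", doi.lower()).strip("_")


def audit_pdf_grounding_sketch(entries: list, root: Path) -> dict:
    """Split citation entries by whether a matching PDF exists under root."""
    grounded, ungrounded = [], []
    lit = root / "inputs" / "literature"
    for entry in entries:
        key = entry["key"]
        candidates = [lit / f"{key}.pdf"]
        if entry.get("doi"):
            candidates.append(lit / f"{doi_slug(entry['doi'])}.pdf")
        # Per-step literature folders: workspace/*/literature/<key>.pdf
        candidates.extend(root.glob(f"workspace/*/literature/{key}.pdf"))
        if any(path.is_file() for path in candidates):
            grounded.append(key)
        else:
            ungrounded.append(
                {field: entry.get(field) for field in ("key", "doi", "url", "title")}
            )
    return {
        "grounded": grounded,
        "ungrounded": ungrounded,
        "count": len(entries),
        "grounded_count": len(grounded),
    }
```

A caller wanting the require_pdfs behaviour could keep only the entries whose key appears in the grounded list and emit the rest as comments.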
Changed
- Figure audit no longer warns "PNG without SVG companion" by
default. audit_figure_quality reads researcher_config.figures.*
via the new _load_figures_config helper; the SVG warning fires
only when svg_allowed=true; the summary-sidecar warning fires
only when summary_sidecar=true. Drops a long-running source of
false-positive noise.
- tool_path_finalize stops auto-emitting .summary.md sidecars.
Plain-English interpretation now integrates into conclusions.md
next to the inline embed. The
auto-generated sidecars trained the AI to leave stub captions
("Auto-drafted caption: regenerate from analysis context") that
leaked verbatim into one project's synthesis/paper.md as 92
placeholder rows visibly telling reviewers the AI gave up. Opt back
in via figures.summary_sidecar=true.
- AGENTS.md hard rule #10 rewritten. Replaces the "every figure
carries four sidecars including an SVG companion" mandate with a
lean default (<slug>.png + an authored <slug>.caption.md), opt-in
SVG / summary sidecars, encouragement of interactive .html
companions for visualisation types that benefit (networks,
multi-panel dashboards), and an explicit requirement that the AI
sys_file_read every figure before declaring a step done — catches
legend-over-plot, missing axis labels, palette regressions,
snake-case-leaking-into-label bugs that no JSON audit catches.
- _seed_step_subfolder_readmes stops pre-creating stub READMEs
in literature/, environment/, and context/ per step. These
dirs stay in EXPERIMENT_SUBDIRS (paths exist) but are empty until
a tool writes into them. Audit found pre-seeded stubs trained the
AI to leave dirs as boilerplate; caused literature/ to read as
"no citations" when really the AI just hadn't downloaded any; and
cluttered every step folder with content nobody wrote. The README
that answers "what goes here?" now lives once in
RESEARCHER_GUIDE.md rather than duplicated 14× on disk.
- outputs/README.md template updated to reflect the new figure
contract: reports go DEEPER than conclusions.md (choices,
reasoning, comparison of options); figures are .png-only by
default with optional interactive .html companions; AI MUST read
each figure before finalize.
- Doc hardening: dropped hardcoded tool / protocol counts across
README.md, docs/{TOOLS,PROTOCOLS,RESEARCHER_GUIDE,START,AI_GUIDE}.md.
Replaces "144 tools" / "117 protocols" with vague phrases
("~150 tools", "100+ protocols", "every tool", "All core protocols").
CLAUDE.md doctrine already forbids hand-written counts; the
maintainer was violating it in 9+ places. Counts go stale within a
release. CONTRACT.md keeps its v2.0.0-anchored snapshot table.
- Doc drift fix: README.md:117 code/ → scripts/. The README
showed 01_baseline_eda/code/ in its file-layout diagram while the
framework, RESEARCHER_GUIDE, and every real project use scripts/.
A junior researcher walking through README and then opening a real
project would have hit the inconsistency immediately.
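To make the new warning gating concrete, here is a minimal sketch of how a per-figure check could consult the figures knobs. The function name sidecar_warnings and the warning strings are hypothetical; only the three knob names, their lean defaults, and the rule that a warning fires solely when its knob is true are taken from the notes above.

```python
from pathlib import Path
from typing import Optional

# Lean defaults as described in the release notes.
FIGURE_DEFAULTS = {
    "svg_allowed": False,
    "summary_sidecar": False,
    "interactive_html_allowed": True,
}


def sidecar_warnings(png: Path, figures_cfg: Optional[dict] = None) -> list:
    """Return missing-sidecar warnings for one PNG, gated by the figures config."""
    cfg = {**FIGURE_DEFAULTS, **(figures_cfg or {})}
    warnings = []
    # The SVG warning only fires when the project opted in to SVG companions.
    if cfg["svg_allowed"] and not png.with_suffix(".svg").exists():
        warnings.append(f"{png.name}: PNG without SVG companion")
    # Likewise for the plain-English summary sidecar.
    summary = png.with_name(png.stem + ".summary.md")
    if cfg["summary_sidecar"] and not summary.exists():
        warnings.append(f"{png.name}: missing .summary.md sidecar")
    return warnings
```

With no config passed, a lone PNG produces no warnings, which matches the lean default; existing sidecars are simply never complained about.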
Migration
- Existing projects are unaffected by the figure default change
— audit_figure_quality still reads existing .summary.md and
.svg files when they're present; the change is that it no longer
warns on their absence. To restore the v2.3 warning behaviour,
add to inputs/researcher_config.yaml:

    figures:
      svg_allowed: true
      summary_sidecar: true

- Existing per-step literature/README.md / environment/README.md /
context/README.md stubs are not touched — the change only
affects newly-created steps. Delete the stubs by hand if you want
empty dirs in legacy steps.
- write_references_bib signature gained two optional kwargs
(root, require_pdfs). All existing positional calls keep
working; opt-in to PDF filtering by passing both.
- AGENTS.md template change does NOT propagate to existing
projects (the wizard only copies once). Re-run research-os init
in a temp dir and diff the AGENTS.md against your project's copy
to pick up the new hard rule #10 wording. A research-os refresh
CLI subcommand to do this automatically is planned for 2.4.x.
Not in this release (planned for 2.4.x / 2.5.0)
The full audit surfaced ~50 P0 framework changes; this release ships
the highest-impact subset that doesn't break existing projects. The
following remain for follow-up:
- Per-step step_summary.yaml retirement: the YAML stub anti-pattern
flagged by 9/10 audits. The derived emit in tool_path_finalize
stays in 2.4.0; the editable scaffold via step_summary.yaml.template
+ the update_step_summary step in analysis_plan.yaml /
literature_per_step.yaml await migration to prompt-laden README
prose.
- .os_state simplification: collapse state_ledger.json +
manifest.json overlap, drop dead fields, bound checkpoint storage
(single snapshot can be 39 MB of duplicate workspace; no GC).
- research-os refresh CLI subcommand: auto-upgrade
AGENTS.md / CLAUDE.md / IDE-config templates in an existing
project to match the bundled current version.
- Sparse-root finalize hook: regenerate top-level README.md at
project finalize (currently write-once at init).
- Per-step logs/ removal + cross-step utility canonical home
(workspace/scratch/ IS used in practice but the framework doesn't
document a canonical place for it).
- Hard removal of .preregistration/ + .grounding/ hidden dirs in
workspace (content moves into per-step README / methodology.md +
.os_state/grounding.jsonl).
Verified
- Preflight: 29/29 passed.
- Pytest: all green.
- Ruff: clean.
Bumped
pyproject.toml, src/research_os/__init__.py, CITATION.cff to
2.4.0.
v2.3.0
MINOR release. Retires the synthesis auto-generators in favour of
AI-direct authoring: the AI writes synthesis/paper.typ /
slides.typ / poster.typ / essay.typ / dashboard.html directly,
following the matching synthesis protocol. Tools validate and
compile; they no longer generate the prose / layout. The previous
auto-generators produced rigid, low-quality output — a 3MB
monolithic dashboard, a markdown-only paper intermediate, slide
decks no audience could read. Removing them moved 9700+ lines of
generator code out of the codebase and let the synthesis protocols
become true scaffolds (per docs/PROTOCOL_DOCTRINE.md).
Breaking changes
The following tools were removed. Each returns a _REMOVED_TOOLS
redirect message naming the new protocol + surviving tools:
- tool_synthesize → follow synthesis/synthesis_paper; write
synthesis/paper.typ directly; compile via tool_typst_compile.
- tool_dashboard (+ 7 operations: create, story_generate,
story_edit, story_quality_bar, reviewer_sim, test_generate,
test_run) → follow synthesis/synthesis_dashboard; write
synthesis/dashboard.html directly.
- tool_slides_create → follow synthesis/synthesis_slides; write
synthesis/slides.typ (Touying); compile via tool_typst_compile.
- tool_poster_create → follow synthesis/synthesis_poster
(redirect to synthesis/printable); write synthesis/poster.typ.
- tool_humanities_essay_scaffold → use
tool_synthesis_scaffold(kind='essay') + author content.
- tool_paper_compile_typst → use tool_typst_compile (generic .typ
→ .pdf; the AI authors the .typ directly, no markdown
intermediate).
- tool_section_substantiveness → folded into
tool_synthesis_check(mode='substantiveness') (now also handles
Typst headings).
- tool_figure dispatcher and operations caption_synthesise,
interactive_autogen, paper_autoembed → the AI authors
plain-English figure summaries, interactive companions, and Typst
#figure(...) blocks directly when writing the plotting script or
paper.typ. tool_figure_palette is now a top-level tool.
- tool_reviewer operation simulate → the AI walks the paper
through the persona YAMLs in assets/reviewer_personas/ directly
(tool_reviewer keeps response, rebuttal, compile for real
external reviews).
The autopilot floor gate enforcement also shifted: tool_typst_compile
replaces tool_synthesize / tool_dashboard(operation='create') as
the final-deliverable gate.
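A redirect of this kind can be pictured as a lookup that runs before normal dispatch. The sketch below is an assumption about the general shape, not the project's code: the REMOVED_TOOLS contents paraphrase three entries from the list above, and dispatch, its handlers argument, and the returned dict keys are hypothetical.

```python
# Hypothetical subset of a redirect table; wording paraphrases the list above.
REMOVED_TOOLS = {
    "tool_synthesize": (
        "follow synthesis/synthesis_paper, write synthesis/paper.typ, "
        "compile via tool_typst_compile"
    ),
    "tool_slides_create": (
        "follow synthesis/synthesis_slides, write synthesis/slides.typ, "
        "compile via tool_typst_compile"
    ),
    "tool_paper_compile_typst": "use tool_typst_compile",
}


def dispatch(tool_name: str, handlers: dict) -> dict:
    """Route a call, answering removed tools with a redirect instead of a bare error."""
    if tool_name in REMOVED_TOOLS:
        return {
            "status": "removed",
            "tool": tool_name,
            "redirect": REMOVED_TOOLS[tool_name],
        }
    if tool_name not in handlers:
        return {"status": "error", "tool": tool_name, "redirect": None}
    return {"status": "ok", "tool": tool_name, "result": handlers[tool_name]()}
```

Checking the removed-tool table first means a stale plan entry gets actionable guidance rather than an "unknown tool" failure.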
Added
- tool_typst_compile — generic Typst compiler. Takes any
AI-authored .typ source (paper, slides, poster, essay,
cover_letter, response_to_reviewers) and renders the PDF.
Resolves bundled venue templates from _typst_templates/;
auto-generates synthesis/biblio.yml from workspace/citations.md
when missing. Returns pdf_path, page_count, citation_count,
typst_warnings, typst_errors.
- tool_synthesis_check — quality audit for AI-authored
synthesis files. Auto-detects file type from the path. Modes:
all (default), substantiveness, structure, accessibility,
cliches. Per-IMRAD-section content depth audits for paper /
essay; slide-count + speaker-notes + path-leak audits for slides;
section + headline + QR audits for poster; engineering invariants
(offline, alt-text, semantic <section id>, no placeholders, no
filesystem-path leaks) for dashboard.
- tool_synthesis_scaffold — writes a <=80-line skeleton
synthesis/<paper|slides|poster|essay>.typ or dashboard.html
with section headers + // AI: author this section markers.
Idempotent (refuses overwrite without overwrite=true).
- tool_figure_palette — promoted from an operation under
tool_figure to a top-level tool. Returns CVD-safe palettes
(Okabe-Ito qualitative, viridis sequential, PuOr diverging,
accent).
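The refuse-to-overwrite behaviour of the scaffold tool is a common pattern and can be sketched in a few lines. Everything named here (scaffold, SKELETON, the returned dict) is hypothetical; the notes above only state that the skeleton carries section headers with // AI: author this section markers and that an existing file is left alone unless overwrite=true is passed.

```python
from pathlib import Path

# Illustrative Typst skeleton: headings plus author-this markers.
SKELETON = (
    "= Title\n\n"
    "== Introduction\n// AI: author this section\n\n"
    "== Methods\n// AI: author this section\n\n"
    "== Results\n// AI: author this section\n"
)


def scaffold(path: Path, overwrite: bool = False) -> dict:
    """Write the skeleton once; refuse to clobber authored content by default."""
    if path.exists() and not overwrite:
        return {
            "status": "error",
            "reason": f"{path.name} exists; pass overwrite=True to replace it",
        }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(SKELETON, encoding="utf-8")
    return {"status": "ok", "path": str(path)}
```

Calling it twice in a row is therefore safe: the second call reports the conflict and leaves whatever the author has written in place.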
Improved
- Synthesis protocols rewritten as scaffolds. synthesis_paper,
synthesis_dashboard, synthesis_slides, printable (poster +
handout), humanities_essay_structure, synthesis_grant,
synthesis_abstract, synthesis_report, synthesis_lay_summary,
synthesis_progress_update, synthesis_from_inputs — each
collapsed from 100-370 lines of prescriptive recipe to <=130 lines
of scaffold (design principles + quality standards + workflow +
available tools). Spec files (synthesis_spec.yaml,
slides_spec.yaml, dashboard_spec.yaml) are no longer required.
- Cleaner synthesis/ folder. After a full project run:
paper.typ, paper.pdf, slides.typ, slides.pdf, poster.typ,
poster.pdf, dashboard.html, biblio.yml, figures/. No
intermediate .md files, no spec YAMLs, no handout duplicates.
- researcher_config.yaml schema simplified. The synthesis:
block is empty by default. Removed knobs:
figures_auto_embed*, figure_xref_rewrite, slide_engine,
slide_template, slide_theme, slide_speaker_notes_enabled,
slide_print_handout, poster_engine, poster_template,
poster_theme, poster_qr_url, poster_handout_pdf,
drafter_loop_* (5 knobs).
- _router_index.yaml v21. Synthesis decompositions point at
the new tool_synthesize_plan → tool_synthesis_scaffold →
tool_synthesis_check → tool_typst_compile chain.
Removed
- 9 implementation files under
src/research_os/tools/actions/synthesis/:
dashboard.py (1604 lines), dashboard_app.py (1424), slides.py
(946), drafter_loop.py (850), reviewer.py (partial — reviewer_simulate),
figure_auto_embed.py (747), poster_typst.py (697),
dashboard_humanities.py (465), dashboard_qualitative.py (455),
humanities_essay.py (212), synthesize.py (1374),
dashboard_story.py (300). Total: ~9700 lines.
- src/research_os/tools/actions/viz/dashboard_tests.py (the
Playwright scaffold for auto-generated dashboards).
- src/research_os/assets/reveal/ (260 KB), slide_templates/
(24 KB), poster_templates/ (20 KB) — vendored assets only
the removed generators consumed.
- 12 obsolete test files (test_v191_dashboard_app,
test_v190_dashboard_content, test_dashboard_humanities,
test_dashboard_qualitative, test_v191_story_mode,
test_slides_engine, test_poster_typst, test_drafter_loop,
test_figure_auto_embed, test_humanities_essay_structure,
test_synthesize_auto_proceed,
test_synthesize_blocks_on_unresolved_findings,
test_synthesize_uses_pack_sections, test_paper_drafter_loop,
test_researcher_config_synthesis,
test_audit_audit_figure_coverage,
test_citation_retrieval_empty_response,
test_audit_findings_explain).
Migration
Existing project files (synthesis/paper.md, synthesis/dashboard.html
from prior versions) are preserved as-is on disk. The new tools do
not regenerate them. To produce the new artefact next to the old:
ask the AI to follow the matching synthesis protocol (e.g. "redo the
paper as Typst") — it will author synthesis/paper.typ and you can
delete the old paper.md once you're happy with the new PDF.
Tool count: 148 → 144 (8 removed + 4 added). Protocol count
unchanged at 117 core.
Bumped
- pyproject.toml, src/research_os/__init__.py, CITATION.cff
to 2.3.0.
- 11 rewritten synthesis-related protocols to
version: '2.3.0'.
- _router_index.yaml to version: 21.
v2.2.0
MINOR release. Shipped after a 35-agent audit (10 researcher-domain
perspectives, 5 technical, 5 UX, 5 AI-model personas, 5 online-research,
5 meta-improvement) surfaced 119+ findings across 12 themes. The
synthesis selected v2.2.0 over v2.1.2 because 6 p0 + 12 p1 work-items
genuinely add tools and knobs rather than just polish.
Added
- sys_where — ~30-token mid-session orientation snapshot
(project_root, tier, active_plan position, unresolved BLOCK count,
last protocol). Use instead of sys_boot when you only need to
remember "where am I?".
- sys_export_ro_crate — emits ro-crate-metadata.json +
codemeta.json at project root. Closes the FAIR-alignment claim
that was unbacked in v2.0–v2.1. Discoverable by Zenodo, OSF,
downstream RO-Crate consumers.
- sys_export_share_archive now bundles ro-crate-metadata.json +
codemeta.json + CITATION.cff at archive root automatically.
- Autopilot floor gates (research_os.server.autopilot_gate) —
8 floor gates enforce mandatory audits before tier advance, even
in autopilot mode. Closes the bypass path where autopilot=true
silently skipped block-severity findings.
- research-os mcp / research-os api-key / research-os completion
CLI subcommands (4 → 7). mcp adds/removes external MCP server
configs (memory, filesystem, github). api-key securely stores
per-provider keys (chmod 600). completion emits shell completion
for bash / zsh / fish (uses argcomplete when installed, falls
back to a hand-rolled script otherwise).
- argcomplete>=3.0 as the new completion optional extra
(pip install 'research-os[completion]') + included in dev.
- model_profile + ai.context_class config knobs —
researcher_config.yaml's ai section now carries
model_profile: small|medium|large (controls protocol-detail
level) and context_class: short|long (controls history-window
size). sys_boot respects both.
- docs/SECURITY.md — new page documenting path-containment,
autopilot floor gates, override rationale enforcement, the
.os_state/overrides.log audit trail, and the boundary between
trusted and untrusted MCP-tool inputs.
- research-os doctor expanded to 25+ checks (was 18+).
New checks include: tool_short_field_present, citation_cff_valid,
external_pack_entrypoints, embeddings_fresh, and
docs_referenced_scripts.
- 22 work-item implementation report ships in docs/SECURITY.md
+ this CHANGELOG entry as evidence of the multi-perspective audit
that drove this release.
Changed
- Envelope normalization at the dispatcher. Pack and adapter tools
that previously returned the legacy {"status", "data"} shape are
now upgraded to the v2.1.0 envelope by
research_os.server.envelopes._normalize_envelope, invoked once in
dispatch._handle_tool_call. Closes the v2.1.0 envelope gap for
13+ pack + adapter tools in one place rather than per-tool. New
pack code should call _success / _error directly per
docs/PLUGIN_AUTHORING.md.
- RoError(what, why, next_action) signature loosened from
keyword-only to positional. Matches the contract documented in
docs/CONTRACT.md A.6.2 verbatim.
- did_you_mean is namespace-aware for the sys_ / tool_ / mem_
prefixes. Typing sys_X now prefers other sys_* matches before
cross-namespace.
- Envelope adds next_recommended_call_structured — a
{"tool": str, "arguments": dict} form derived from
next_recommended_call when parseable. Strict tool-loop clients
dispatch this directly without re-parsing free-form text.
- override_rationale enforcement wired across 9 handler sites
(synthesis_writing, synthesis_visual, audit_core, audit_gates,
methodology, meta_workspace.sys_path_create,
meta_workspace.sys_checkpoint_rollback, tool_step_complete,
tool_path_finalize). Thin rationales ('TODO', 'preview',
single-word, <20 chars) are rejected before the underlying audit
runs. Empty-rationale paired with override flag now returns an
explicit error instead of silently no-opping.
- sys_file_* path containment. sys_file_read, sys_file_write,
sys_file_list, and sys_file_delete now refuse paths that
resolve outside the workspace root. Closes the host-FS escape
(../../etc/passwd) that was reachable from any MCP client.
- CLAUDE.md, FAQ.md, START.md updated to current counts (preflight
25+, doctor 20+, subcommands 7). Future drift is policed by the
new preflight_docs_consistency test.
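Two of the checks above, path containment and thin-rationale rejection, follow well-known patterns and can be sketched as below. This is a hedged illustration rather than the project's code: resolve_inside, PathEscapeError, and is_thin_rationale are hypothetical names, and the rationale rule simply encodes the criteria listed above ('TODO', 'preview', single-word, under 20 characters).

```python
from pathlib import Path


class PathEscapeError(ValueError):
    """Raised when a requested path resolves outside the workspace root."""


def resolve_inside(root: Path, requested: str) -> Path:
    """Resolve requested against root and refuse anything that escapes it."""
    root = root.resolve()
    # resolve() collapses '..' segments and follows symlinks, so the
    # comparison is made on the real target rather than the spelled path.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise PathEscapeError(f"{requested!r} resolves outside {root}")
    return candidate


def is_thin_rationale(text: str) -> bool:
    """True when an override rationale is too thin to accept."""
    stripped = text.strip()
    return (
        len(stripped) < 20
        or len(stripped.split()) < 2
        or stripped.lower() in {"todo", "preview"}
    )
```

Because an absolute path replaces the left-hand side when joined with `/`, a request such as /etc/passwd is rejected by the same comparison as ../../etc/passwd.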
Fixed
- Test test_audit_version_coherence_rejects_unknown_step_id
updated to pytest.raises((RoError, FileNotFoundError)) —
iteration._step_dir now raises RoError per the contract.
- docs/CONTRACT.md A.6.1 corrected: the data alias removal is
slated for v3.0.0 (not v2.2.0 as the row erroneously claimed).
The alias is preserved in _success / _error through every v2.x
release for back-compat with v2.0 callers.
- docs/CONTRACT.md A.3 no longer lists tool_stack as a stable
top-level researcher_config.yaml section — the key was never
shipped in templates/researcher_config.yaml.
- Internal work-item IDs (W##, FIX-#) stripped from tool
descriptions (audit.py, meta.py, synthesis.py) and
user-facing docs (SECURITY.md, FAQ.md, AI_GUIDE.md,
AGENTS.md). Inline # W##: source comments cleaned up
(substance kept). Future leaks are caught by
test_tool_description_no_version_chatter.
- docs/TOOLS.md lists sys_where + sys_export_ro_crate —
both were callable but undocumented after Wave-D.
- Tool count references updated 146 → 148 across
docs/{TOOLS,AI_GUIDE,FAQ,RESEARCHER_GUIDE,CONTRACT,START}.md.
Doctor check count 14/18+ → 20+. START.md subcommand count
4 → 7 with the full list.
Removed
- dashboard_v2.py / dashboard_v2_humanities.py /
dashboard_v2_qualitative.py / humanities_essay_scaffold.py
deprecation shims (one-minor-cycle removal promised in v2.1.1).
Canonical paths: dashboard_app, humanities_essay.
Verified
- Preflight: 29/29 passed.
- Pytest: 1894 passed, 13 skipped, 0 failed.
- Ruff: clean.
- 5 independent validators reviewed the diff by reading + reasoning
(not pytest): logic, consistency, contract, UX, tests. Their
2 blockers + 14 concerns were triaged and fixed before release.
Migration
- No required code changes. Every addition is additive; the data
envelope alias is kept. Tool argument names unchanged.
- If your code imported from
research_os.tools.actions.synthesis.dashboard_v2* or
research_os.tools.actions.synthesis.humanities_essay_scaffold,
switch to dashboard_app / humanities_essay (the canonical
modules). The shims were removed per the v2.1.1 deprecation
promise.
- If you parsed envelope["data"], that still works through every
2.x release. Switch to envelope["payload"] before v3.0.0.
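For client code that must work across the whole 2.x line, the envelope migration can be handled with a small accessor. This is only a sketch: envelope_payload is a hypothetical helper, and it assumes nothing beyond what the notes state, namely that the result lives under "payload" and that "data" remains as an alias until v3.0.0.

```python
def envelope_payload(envelope: dict):
    """Prefer the current 'payload' key, falling back to the legacy 'data' alias."""
    if "payload" in envelope:
        return envelope["payload"]
    # Legacy alias, slated for removal in v3.0.0 per the notes above.
    return envelope.get("data")
```

Routing every read through one helper like this leaves a single place to delete the fallback once the alias is gone.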
v2.1.1
PATCH release. Pure cleanup — no behavior changes, no new tools, no
new protocols, no API or tool-signature changes.
Changed
- Source files renamed to canonical names (no _v2, _scaffold,
etc.): humanities_essay_scaffold.py → humanities_essay.py
(back-compat shim kept at the old path through v2.2.0). The
dashboard_v2*.py shims created in v2.1.0 stay in place for one
more minor cycle per the migration table (removed v2.2.0). 11
unit-test filenames dropped a redundant _v2 suffix
(test_audit_audit_*_v2.py → test_audit_audit_*.py,
test_router_output_v2.py → test_router_output.py).
- docs/ folder reduced to one file per concept, no version
suffixes. Version-tagged historical reports + working-session
scratchpads removed (preserved in git history; recover via
git show v2.1.0:docs/<file>). Final shape: 22 markdown files +
2 mermaid diagrams (PROTOCOL_GRAPH.mermaid, workflow_dag.mermaid).
- docs/README.md rewritten as a single audience-routing page
(researchers / AI agents + plugin authors / maintainers +
integrators).
- Root README.md release badge bumped to v2.1.1; deep links to the
deleted V2_RELEASE_NOTES + MIGRATION_v1_to_v2 docs replaced with
pointers to CHANGELOG.md (with [2.0.0] section hint where the
context warrants it).
- Code + protocol comments swept for historical-version references:
~115 strips across 23 files (server, audit/state, synthesis/viz,
cli + plugins, router_index protocols). 1 pure-historical block
deleted. Git log + CHANGELOG carry version history; live doctrine
stays focused on current behavior. Stable surfaces (e.g.
_REMOVED_TOOLS migration data, the canonical replacement entry
points) were KEPT — those name the version because the version is
load-bearing user-facing data, not commentary.
Added
- .gitignore entries blocking future creation of version-tagged
docs + handoff scratchpads in docs/. Patterns added:
/docs/v*_handoff/, /docs/*_handoff/, /docs/AUDIT_v*.md,
/docs/USABILITY_v*.md, /docs/CHANGELOG_DETAILED_v*.md,
/docs/MIGRATION_v*.md, /docs/V[0-9]*.md, /docs/V[0-9]*/,
/docs/audit_v*/, /docs/usability_v*/, /docs/PHASE_*.md,
/docs/archive/. Prevents the clutter from recurring; future
sessions that try to write these paths get them silently ignored.
Verified
- MCP wiring smoke (in /tmp/ro_v211_mcp/): research-os init
scaffolds correctly, .claude/mcp.json writes the standard
research-os start config, research-os doctor reports
mcp_configs_wired: pass, research-os start boots cleanly,
and TOOL_DEFINITIONS count (146) matches the v2.1.0 surface
(unchanged).
Migration
- No code changes required. Imports from old _v2 paths still
resolve via the deprecation shim (removed v2.2.0).
- Imports of
from research_os.tools.actions.synthesis.humanities_essay_scaffold import scaffold_humanities_essay
keep working via the new 2-line shim at the old path; update at
your convenience to
from research_os.tools.actions.synthesis.humanities_essay import scaffold_humanities_essay.
- Anyone with local edits to deleted docs: recover via
git show v2.1.0:docs/<file> (or any tag where the file lived)
and re-save outside the repo as a personal note.