v0.41.0
v0.41.0 — Run it anywhere, and trust what it ships
A week ago, ooo auto learned to finish the job on its own. This release makes
that autonomy something you can actually rely on: it runs on one more runtime,
it refuses to start building until the goal is unambiguous, and the verdict that
decides "is this actually done?" can no longer be gamed.
The headline
Autonomy is only worth as much as the trust behind it. v0.40.0 closed the loop —
goal in, product out. v0.41.0 spends its week hardening the two ends of that
loop and widening the floor it runs on.
- Run it anywhere. Pi joins Claude, Codex, Gemini, OpenCode, Goose, and
Copilot as a first-class runtime. Ouroboros stays the workflow engine; the
runtime is a swappable kernel. Installing it got more reliable, and every
default model pin now lives in one place.
- Trust what it ships, at the input. The Socratic interview no longer thinks
alone. At every ambiguity milestone it convenes a panel (a researcher, a
contrarian, a simplifier) to surface hidden assumptions before the question
reaches you. And ooo auto will not start building until the Seed is genuinely
low-ambiguity and passes QA.
- Trust what it ships, at the output. The verifier's verdict is now typed,
audited, and routed by an explicit admission policy. A test that really ran but
reported the wrong evidence form is no longer smeared as "fabrication," and a
faked clean run still doesn't pass.
🖥️ Run it anywhere — the Agent OS gets a new kernel
Pi is now a first-class Ouroboros runtime. Ouroboros owns the workflow engine,
Seed decomposition, checkpointing, evaluation handoff, and ooo skill dispatch;
for each runtime task it shells out to pi --mode json and normalizes Pi's JSONL
events into Ouroboros AgentMessage values. As the new runtime guide puts it:
"Pi is an Ouroboros runtime" means the runtime is selectable — not that Pi is
imported into Ouroboros. That is the whole Agent OS thesis in one sentence.
- PiLLMAdapter for --llm-backend pi; pi / pi_cli registered as LLM- and
interview-driver-capable in the backend registry and provider factory (#1326)
- Pi backend-aware default-model normalization: default --llm-backend pi uses
Pi's own backend default instead of forwarding an Anthropic model name (#1326)
- Align the Pi runtime with documented JSON mode (#1321)
- Report malformed Pi runtime events as a typed ProviderError instead of
failing opaquely (#1325)
- Wire the Pi runtime setup surface: ouroboros setup --runtime pi installs the
managed Pi bridge (5c674c1)
- Opt-in native Pi CLI smoke test for end-to-end confidence (#1329)
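As a rough illustration of the normalization step described above, the sketch below turns JSONL lines into message values and raises a typed error on malformed input. It is not the shipped adapter: the event schema (a "type" key and an optional "text" key), the AgentMessage fields, and the ProviderError constructor are all assumptions made for the example.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator


class ProviderError(Exception):
    """Stand-in for the typed provider error; the line_no field is assumed."""

    def __init__(self, message: str, *, line_no: int) -> None:
        super().__init__(message)
        self.line_no = line_no


@dataclass(frozen=True)
class AgentMessage:
    """Stand-in for the normalized message type; real fields may differ."""

    kind: str
    text: str


def normalize_events(lines: Iterable[str]) -> Iterator[AgentMessage]:
    """Yield one AgentMessage per JSONL event, failing loudly on bad input."""
    for line_no, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # tolerate blank lines between events
        try:
            event = json.loads(raw)
        except json.JSONDecodeError as exc:
            raise ProviderError(
                f"malformed JSON on line {line_no}: {exc.msg}", line_no=line_no
            ) from exc
        # Hypothetical schema check: every event must be an object with "type".
        if not isinstance(event, dict) or "type" not in event:
            raise ProviderError(
                f"event on line {line_no} has no 'type' field", line_no=line_no
            )
        yield AgentMessage(kind=str(event["type"]), text=str(event.get("text", "")))
```

Raising at the first bad line, with the line number attached, is what makes the failure typed rather than opaque: the caller can catch ProviderError and report exactly where the stream went wrong.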
Installing and updating got more trustworthy. The week's two same-day releases
surfaced real install-path risk; this closes it.
- Run setup with the freshly installed ouroboros binary, not a stale one
left on PATH (#1345)
- Installer UX improvements; pipx/pip install paths now preserve existing
PATH precedence (#1343)
One source of truth for model pins. The same default model strings were
hand-copied across three layers, so Opus had silently been frozen at 4.6 since
February. They now live in a single _model_defaults.py.
- Centralize every default Claude model pin into one source of truth
(_model_defaults.py) and pin exact snapshots rather than the "default"
sentinel, so evaluation/consensus grading stays reproducible. Net move: the
Opus reasoning tier → 4.8 (interview, seed, ontology, evaluation, consensus
advocate); the Sonnet judgment tier (qa_model) stays pinned at 4.6,
retiring the dated claude-sonnet-4-20250514 (#1324, #1323)
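To make the "one source of truth" idea concrete, here is a minimal sketch of what such a module could look like. It is not the contents of the real _model_defaults.py: the role keys (other than qa_model), the helper name default_model_for, and the snapshot strings are placeholders, since the notes name the tiers but not every exact identifier.

```python
from types import MappingProxyType

# Placeholder snapshot IDs (assumptions): substitute the exact pinned strings.
OPUS_REASONING = "<opus-4.8-snapshot-id>"
SONNET_JUDGMENT = "<sonnet-4.6-snapshot-id>"

# Each role points at a tier constant, so bumping a tier is a one-line change.
# The mapping is read-only to discourage ad hoc overrides at import sites.
DEFAULT_MODELS = MappingProxyType(
    {
        "interview": OPUS_REASONING,
        "seed": OPUS_REASONING,
        "ontology": OPUS_REASONING,
        "evaluation": OPUS_REASONING,
        "consensus_advocate": OPUS_REASONING,
        "qa_model": SONNET_JUDGMENT,
    }
)


def default_model_for(role: str) -> str:
    """Return the pinned model for a role; unknown roles are an error."""
    try:
        model = DEFAULT_MODELS[role]
    except KeyError:
        raise KeyError(f"no default model pinned for role {role!r}") from None
    if model == "default":
        # Guard the reproducibility rule: exact snapshots only.
        raise ValueError(f"role {role!r} must pin an exact snapshot, not 'default'")
    return model
```

Every other layer would then import from this one module instead of carrying its own copy of the string, which is the failure mode that left a pin frozen without anyone noticing.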
Roadmap, in the open. A point-in-time AgentOS issue-sequencing graph
(Track A / B / C) is now published so you can see which merged PRs resolved which
roadmap tracks. #961 remains the canonical roadmap SSOT (#1293).
Housekeeping. Prune unused optional packages (#1301); pin typer before the
vendored click to stabilize resolution (#1300).
🧠 Trust what it ships — at the input: the interview stops thinking alone
Ouroboros has always opened with a single questioner. Now that questioner has a
panel. Milestone lateral review is promoted from a non-blocking advisory to a
required lightweight subagent pass at exactly the moments hidden assumptions
start to bite.
- When an interview crosses an ambiguity milestone (initial → progress,
progress → refined, refined → ready), the main session dispatches
ouroboros_lateral_think with researcher, contrarian, and simplifier
personas (adding architect when the answer changes system shape or ownership)
before answering or asking the returned question (9d229c4)
- This is the supported "deep research style" interview experience: multiple
perspectives visibly help, while the final prompt stays easy to answer. Results
are folded into 2–3 concrete options or one recommended draft, not dumped
as a report
- Lateral review also fires whenever the main session would otherwise compress a
user's free-text into a decision, or when the question is about tradeoffs,
priorities, non-goals, risk, success criteria, or rollout
- run_lateral_review is now a declared interview capability, with per-runtime
capability/instruction artifacts wired in (9d229c4)
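The milestone rule can be sketched as a small lookup. This covers only the milestone trigger, not the free-text or tradeoff triggers, and the function name personas_for and its changes_system_shape flag are hypothetical; only the milestone names and persona names come from the notes.

```python
from typing import Tuple

# The three milestone crossings named in the release notes.
MILESTONE_TRANSITIONS = frozenset(
    {
        ("initial", "progress"),
        ("progress", "refined"),
        ("refined", "ready"),
    }
)

BASE_PERSONAS: Tuple[str, ...] = ("researcher", "contrarian", "simplifier")


def personas_for(
    previous: str, current: str, *, changes_system_shape: bool = False
) -> Tuple[str, ...]:
    """Return the personas to dispatch, or () if no milestone was crossed."""
    if (previous, current) not in MILESTONE_TRANSITIONS:
        return ()
    if changes_system_shape:
        # The architect joins only when system shape or ownership changes.
        return BASE_PERSONAS + ("architect",)
    return BASE_PERSONAS
```

For instance, personas_for("refined", "ready", changes_system_shape=True) returns the three base personas followed by "architect", while a skipped transition such as ("initial", "ready") returns an empty tuple in this sketch.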
ooo auto won't build something underspecified. The interview no longer
closes on ledger completeness alone.
- Gate auto runs on backend-confirmed low ambiguity (≤ 0.20) plus a pre-run
Seed QA pass for both the MCP and CLI entrypoints; QA findings feed back into
bounded Seed-repair attempts before blocking, so failures are actionable and
resumable (#1302)
- Normalize natural worktree-policy names (e.g. create_isolated_worktree → always)
and fail fast when complete_product=true is paired with a too-short timeout,
instead of burning the budget in the interview and blocking late (#1305)
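The gate described above might look roughly like the following. Only the 0.20 threshold is taken from the notes; the GateResult type, the run_qa and repair callables, and the max_repairs default of 2 are assumptions chosen to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List

AMBIGUITY_THRESHOLD = 0.20  # from the release notes: admit only at or below this


@dataclass(frozen=True)
class GateResult:
    admitted: bool
    reason: str
    repair_attempts: int


def gate_auto_run(
    ambiguity: float,
    seed: str,
    run_qa: Callable[[str], List[str]],
    repair: Callable[[str, List[str]], str],
    max_repairs: int = 2,  # hypothetical bound on Seed-repair attempts
) -> GateResult:
    """Admit a run only if ambiguity is low and the Seed passes QA."""
    if ambiguity > AMBIGUITY_THRESHOLD:
        reason = f"ambiguity {ambiguity:.2f} exceeds {AMBIGUITY_THRESHOLD:.2f}"
        return GateResult(False, reason, 0)

    attempts = 0
    findings = run_qa(seed)
    while findings and attempts < max_repairs:
        seed = repair(seed, findings)  # feed QA findings back into a repair
        attempts += 1
        findings = run_qa(seed)

    if findings:
        # Block with the remaining findings so the failure is actionable.
        return GateResult(False, "seed QA failed: " + "; ".join(findings), attempts)
    return GateResult(True, "ok", attempts)
```

Bounding the repair loop is the point of the design: a Seed that cannot be fixed in a few passes blocks with its outstanding findings instead of looping or starting a build on a weak specification.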
🛡️ Trust what it ships — at the output: a verdict you can't game
The more autonomous the loop, the more its "done" has to mean done. This release
makes the verifier's decision typed, auditable, and policy-routed (RFC #814,
Verdict Envelope v1).
- Promote TraceGuard verdict admission into VerifierVerdict: H1 verifier
output now carries a typed status, evidence refs, and a retry_admission, and
ACCEPT / RETRY / REDISPATCH / ESCALATE_MODEL / ESCALATE_HUMAN / BLOCK decisions
are persisted on atomic typed-evidence events (#1330)
  - Benchmark fixtures: accepted → ACCEPT, missing evidence → EVIDENCE_MISSING / RETRY, semantic miss → SCOPE_CREEP / REDISPATCH, repeated fabrication → FABRICATION_SUSPECTED / ESCALATE_MODEL
- Prefer the verifier's retry-admission policy (H7): re-run the same leaf only
when retry_admission=RETRY; honor intentional divergence between
failure_class and retry_admission (e.g. FABRICATION_SUSPECTED +
REDISPATCH) instead of inferring policy from the failure class alone (#1331)
- Classify masked test evidence fairly (#1292): a transcript that clearly ran
the test command but masked its status behind an output filter (… | tail) is
now EVIDENCE_FORM_MISMATCH (retryable, with actionable feedback such as adding
set -o pipefail) rather than FABRICATION_SUSPECTED. The #1208 guard holds:
unprotected output-filter pipelines still don't prove a clean commands_run
claim. The verifier's evidence boundary is now codified in docs so core stays
language- and runner-agnostic
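The routing rule (explicit admission first, failure class only as a fallback) can be sketched as below. This is an illustration, not the Verdict Envelope v1 schema: the accepted flag stands in for the typed status, the VerifierVerdict fields and the route function are assumed, and sending unknown failure classes to ESCALATE_HUMAN is a choice made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Admission(Enum):
    ACCEPT = "ACCEPT"
    RETRY = "RETRY"
    REDISPATCH = "REDISPATCH"
    ESCALATE_MODEL = "ESCALATE_MODEL"
    ESCALATE_HUMAN = "ESCALATE_HUMAN"
    BLOCK = "BLOCK"


# Fallback used only when the verifier supplied no explicit admission.
# Pairs mirror the benchmark fixtures, plus the retryable form mismatch.
FALLBACK_BY_FAILURE_CLASS = {
    "EVIDENCE_MISSING": Admission.RETRY,
    "EVIDENCE_FORM_MISMATCH": Admission.RETRY,
    "SCOPE_CREEP": Admission.REDISPATCH,
    "FABRICATION_SUSPECTED": Admission.ESCALATE_MODEL,
}


@dataclass(frozen=True)
class VerifierVerdict:
    """Illustrative shape only; the real VerifierVerdict fields may differ."""

    accepted: bool
    failure_class: Optional[str] = None
    retry_admission: Optional[Admission] = None


def route(verdict: VerifierVerdict) -> Admission:
    """Pick the next action, preferring the verifier's explicit admission."""
    if verdict.accepted:
        return Admission.ACCEPT
    if verdict.retry_admission is not None:
        # Explicit policy wins, even when it diverges from the failure class.
        return verdict.retry_admission
    # Assumption: unknown or missing failure classes go to a human.
    return FALLBACK_BY_FAILURE_CLASS.get(
        verdict.failure_class or "", Admission.ESCALATE_HUMAN
    )
```

With this shape, a verdict of FABRICATION_SUSPECTED carrying retry_admission=REDISPATCH routes to REDISPATCH rather than ESCALATE_MODEL, which is the intentional divergence the H7 change is meant to honor.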
What's Changed
Runtimes & Agent OS
- feat(providers): add Pi LLM adapter (#1326)
- fix(pi): align runtime with documented JSON mode (#1321)
- fix(pi): report malformed runtime events (#1325)
- fix(setup): wire Pi runtime setup surface (5c674c1)
- test(orchestrator): add opt-in Pi CLI smoke test (#1329)
- fix(installer): prefer freshly installed ouroboros for setup (#1345)
- feat(installer): improve install script UX (#1343)
- refactor(config): centralize Claude model pins into a single source of truth (align to 4.8) (#1324)
- fix(config): replace retiring qa_model default with claude-sonnet-4-6 (#1323)
- chore(deps): prune unused optional packages (#1301)
- fix(deps): pin typer before vendored click (#1300)
- fix(opencode): cover Windows cleanup review blockers (#1320)
- fix(goose): keep LLM completion calls profile-free (#1303)
- fix(run): guard home dir in _detect_project_root_from_seed_path (#1313)
Interview (the philosophy layer)
- feat(interview): dispatch lateral review at milestones (9d229c4)
- fix(auto): gate runs on low-ambiguity seed QA (#1302)
- Harden ooo auto policy aliases and timeout preflight (#1305)
Verifier & harness integrity
- feat(harness): promote TraceGuard verdict admission (#1330, refs #814)
- fix(h7): prefer verifier retry admission policy (#1331)
- fix(orchestrator): classify masked test evidence forms (#1292, refs #1234)
Docs
- docs(providers): document Pi provider surfaces (#1327)
- docs(runtime): fix shipped backend wording (#1332)
- docs(agentos): add issue sequencing graph snapshot (#1293)
- Verdict Envelope v1 RFC, verifier-evidence-policy, runtime-capability-matrix,
Pi runtime guide, and contributing/key-patterns updates
What's Changed
- fix(orchestrator): classify masked test evidence forms by @Q00 in #1292
- docs(agentos): add issue sequencing graph snapshot by @Q00 in #1293
- fix(deps): pin typer before vendored click by @Q00 in #1300
- chore(deps): prune unused optional packages by @Q00 in #1301
- fix(goose): keep LLM completion calls profile-free by @mdc2122 in #1303
- fix(run): guard home dir in _detect_project_root_from_seed_path by @kenlin8827 in #1313
- fix(opencode): cover Windows cleanup review blockers by @shaun0927 in #1320
- fix(pi): align runtime with documented JSON mode by @shaun0927 in #1321
- fix(auto): gate runs on low-ambiguity seed QA by @Q00 in #1302
- Harden ooo auto policy aliases and timeout preflight by @shaun0927 in #1305
- fix(config): replace retiring qa_model default with claude-sonnet-4-6 by @shaun0927 in #1323
- fix(pi): report malformed runtime events by @Q00 in #1325
- refactor(config): centralize Claude model pins into a single source of truth (align to 4.8) by @shaun0927 in #1324
- feat(providers): add Pi LLM adapter by @Q00 in #1326
- feat(harness): promote TraceGuard verdict admission by @Q00 in #1330
- fix(h7): prefer verifier retry admission policy by @Q00 in #1331
- docs(providers): document Pi provider surfaces by @Q00 in #1327
- test(orchestrator): add opt-in Pi CLI smoke test by @Q00 in #1329
- docs(runtime): fix shipped backend wording by @Q00 in #1332
- fix(setup): wire Pi runtime setup surface by @Q00 in #1333
- feat(interview): dispatch lateral review at milestones by @Q00 in #1334
- feat(installer): improve install script UX by @Q00 in #1343
- fix(installer): prefer freshly installed ouroboros for setup by @Q00 in #1345
New Contributors
- @kenlin8827 made their first contribution in #1313
Full Changelog: v0.40.1...v0.41.0