Release v0.3.0 — the world-class revamp · aura-farming/pqa

The world-class revamp: corrected economics, context discipline, a deep surface, the
learning moat wired end-to-end, a falsifiable benchmark, and engine-owned worktrees.

Added

Worktree Phase 1, engine-owned (pqa/worktrees.py): one isolated git worktree
per branch on ephemeral pqa/<run>-bN branches, with a write-ahead registry in
.pqa/state.json so strays stay on record even after a mid-run SIGKILL; rollback on partial
spawn; reconcile() merges with --no-ff, aborts on conflict while preserving the survivor
branch, and always prunes. Branch.workdir + run(workdirs=...) thread isolation
through the engine; spawn_branches.sh / reconcile.sh became thin engine callers;
the orchestrator and reconciler honor branches_mode = "worktree" with a stray
sweep at init. Zero-orphan recovery is a tested invariant.
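
The write-ahead ordering described for the worktree registry can be illustrated with a minimal sketch. This is not the code in pqa/worktrees.py: the function names, the "worktrees" key, and the injected create/remove callables are hypothetical, and the real git calls are left out so the ordering itself stays visible.

```python
import json
from pathlib import Path
from typing import Callable, List


def _load(path: Path) -> dict:
    return json.loads(path.read_text()) if path.exists() else {}


def _save(path: Path, state: dict) -> None:
    # Plain write for brevity; a real registry would write atomically.
    path.write_text(json.dumps(state))


def spawn_all(path: Path, names: List[str],
              create: Callable[[str], None],
              remove: Callable[[str], None]) -> List[str]:
    """Register each worktree on disk BEFORE creating it; undo all on failure."""
    state = _load(path)
    registry = state.setdefault("worktrees", [])  # hypothetical key name
    created: List[str] = []
    try:
        for name in names:
            registry.append(name)
            _save(path, state)   # write-ahead: the intent reaches disk first
            create(name)         # a kill here still leaves a registry entry
            created.append(name)
    except Exception:
        for name in created:     # rollback on partial spawn
            remove(name)
        state["worktrees"] = [n for n in registry if n not in names]
        _save(path, state)
        raise
    return created


def sweep_strays(path: Path, remove: Callable[[str], None]) -> List[str]:
    """At init, every name still registered is a stray: remove it, clear the list."""
    state = _load(path)
    strays = list(state.get("worktrees", []))
    for name in strays:
        remove(name)             # must tolerate a worktree that was never created
    state["worktrees"] = []
    _save(path, state)
    return strays
```

In this sketch a successful run leaves its names registered; a reconcile step would be expected to deregister them after pruning, which is omitted here.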
Locked eval benchmark: 8 tasks under evals/tasks/ (each task.toml + LOCKED
verify.py + reference.py must-pass + sabotage.py must-fail) and
scripts/eval_harness.py (deterministic score/report/smoke, zero model
calls); /eval and pqa-eval-runner wired to it; nightly eval-smoke workflow
re-proves verifier integrity. The README documents the methodology; live numbers
land only from a live run — losses included.
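
The must-pass / must-fail pairing behind verifier integrity can be sketched as below. The function name and the toy doubling task are hypothetical and are not taken from evals/tasks/; the point is only that a verifier is trusted when it accepts the reference and rejects the sabotage, with no model call involved.

```python
from typing import Any, Callable


def verifier_is_sound(verify: Callable[[Any], bool],
                      reference: Any, sabotage: Any) -> bool:
    """Trust a verifier only if it accepts the reference AND rejects the sabotage."""
    return bool(verify(reference)) and not verify(sabotage)


# Toy task (illustrative only): a solution must double its input.
def verify(solution: Callable[[int], int]) -> bool:
    return all(solution(x) == 2 * x for x in range(5))


def reference(x: int) -> int:
    return 2 * x


def sabotage(x: int) -> int:
    return 0  # right for x == 0 only, so a sound verifier must reject it
```

A verifier that accepts everything passes the reference check but fails the sabotage check, which is the failure a nightly smoke run of this kind is meant to catch.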
The learning moat, wired: conviction signals get their outcomes back-filled
post-collapse; pqa/instincts.py synthesizes instincts from precipitates+failures
(overlap clustering; confidence from support and contradictions); prior-art injects
top instincts; RunReport carries instincts_injected and per-instinct agreement;
the dashboard gains calibration + instincts sections; the self-reflector reads the
engine's calibration().
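
One way to derive a confidence from support and contradictions is a smoothed agreement rate, sketched here. The formula, the Instinct class, and top_instincts are assumptions for illustration; the release notes do not state how pqa/instincts.py computes confidence or ranks instincts.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instinct:
    text: str
    support: int         # observations that agree with the instinct
    contradictions: int  # observations that disagree with it

    @property
    def confidence(self) -> float:
        # Laplace-smoothed agreement rate (an assumed formula, not pqa's own).
        return (self.support + 1) / (self.support + self.contradictions + 2)


def top_instincts(instincts: List[Instinct], k: int) -> List[Instinct]:
    """Return the k highest-confidence instincts, best first."""
    return sorted(instincts, key=lambda i: i.confidence, reverse=True)[:k]
```

The smoothing keeps an instinct with no evidence at 0.5 and stops a single supporting observation from producing a confidence of 1.0.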
Run resume: crash-resumable run journal (pqa/state.py → .pqa/state.json,
atomic tmp+rename) — /pqa --resume re-enters at the first incomplete stage.
Journal writes preserve foreign top-level keys (the file is shared with the
worktree registry).
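
The tmp+rename write that preserves foreign top-level keys might look like the following sketch. The function names and the "journal" key are hypothetical; only the pattern (read, replace one key, write to a temp file, rename over the target) comes from the notes above.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def write_journal(path: Path, journal: dict, key: str = "journal") -> None:
    """Replace only our top-level key, then swap the file in with a rename."""
    state = json.loads(path.read_text()) if path.exists() else {}
    state[key] = journal  # foreign keys (e.g. a worktree registry) are untouched
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(state, fh)
            fh.flush()
            os.fsync(fh.fileno())  # data reaches disk before the rename
        os.replace(tmp, path)      # rename is atomic on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def first_incomplete(stages: List[str], done: Dict[str, bool]) -> Optional[str]:
    """Return the first stage not marked done, or None if all are complete."""
    for stage in stages:
        if not done.get(stage, False):
            return stage
    return None
```

Writing the temp file in the same directory as the target keeps the rename on one filesystem, which is what makes the swap safe against a crash mid-write.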
Generated configuration reference: docs/configuration.md rendered from
pqa/config.py by scripts/generate_config_doc.py, drift-pinned by tests.

Changed

Economics corrected; tokens primary: cost-model defects fixed, model aliases
(fable/opus/sonnet/haiku) wired to real pricing/dispatch via
pqa.cost.resolve_model, budgets token-primary with USD secondary, and a
pre-flight would_abort gate before every dispatch (not just after the spend).
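
A pre-flight gate of this kind can be sketched as follows. The Budget class, the dispatch wrapper, and the would_abort signature shown here are assumptions; the notes name a would_abort gate but not its interface, and the USD-secondary accounting is omitted.

```python
from dataclasses import dataclass
from typing import Callable


class BudgetExceeded(RuntimeError):
    pass


@dataclass
class Budget:
    max_tokens: int
    spent_tokens: int = 0

    def would_abort(self, estimated_tokens: int) -> bool:
        """True if this dispatch is projected to push spend past the cap."""
        return self.spent_tokens + estimated_tokens > self.max_tokens

    def record(self, actual_tokens: int) -> None:
        self.spent_tokens += actual_tokens


def dispatch(budget: Budget, estimated_tokens: int,
             call: Callable[[], int]) -> int:
    """Check the budget BEFORE spending; `call` returns the tokens it used."""
    if budget.would_abort(estimated_tokens):
        raise BudgetExceeded(
            f"{budget.spent_tokens} spent + {estimated_tokens} estimated "
            f"> {budget.max_tokens}"
        )
    used = call()
    budget.record(used)
    return used
```

Checking the estimate first means a dispatch that cannot fit is never sent, rather than being discovered after its tokens are already spent.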
Model routing per role: Fable 5 where output quality is decided (generators,
unknown-scout, adversary, collapse-judge, baseline control); sonnet/haiku for
mechanical and bookkeeping tiers. "Every agent on Opus" is gone from docs and
dispatch.
Context discipline in the orchestrator: branch payloads live on disk and are
read only by the subagent that needs them; the orchestrator holds ≤200 tokens of
state per branch (digests only) and reports per-stage context telemetry.
Surface: depth over breadth — 34 agents · 59 skills · 27 commands trimmed to
14 agents · 12 deep skills · 12 commands (each skill a protocol + worked
example + anti-patterns playbook); validate_components.py gained census, depth,
and description-budget gates and drift-gates docs/catalog.json.
Memory retrieves by relevance, not recency, under a hard token budget, and
every injected memory id is reported per run.
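
Relevance-ranked retrieval under a hard token budget can be sketched as a greedy selection. The Memory fields and select_memories are hypothetical names, and the relevance scores are assumed to be computed elsewhere; how pqa scores relevance is not described in these notes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Memory:
    id: str
    tokens: int       # size of the memory when injected
    relevance: float  # higher is more relevant; scoring is out of scope here


def select_memories(memories: List[Memory], token_budget: int) -> List[Memory]:
    """Take memories in descending relevance, skipping any that would overflow."""
    chosen: List[Memory] = []
    used = 0
    for m in sorted(memories, key=lambda m: m.relevance, reverse=True):
        if used + m.tokens <= token_budget:
            chosen.append(m)
            used += m.tokens
    return chosen


def injected_ids(chosen: List[Memory]) -> List[str]:
    """The ids a run report would list for the injected memories."""
    return [m.id for m in chosen]
```

The budget is hard in the sense that an oversized memory is skipped rather than truncated, so the total injected size never exceeds token_budget.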
Docs truth pass: README (counts, hook claims, stage wording, workflow count,
status), architecture.md (rewritten to the shipped reality), CONTRIBUTING and the
plugin manifests; hooks language unified with SECURITY.md (two blocking hooks,
the rest fail open; the binding guarantee lives in CI).

Fixed

Hook hardening: per-hook kill-switches (PQA_DISABLED_HOOKS), once-per-session
research gate, fail-closed fixes on the security/secrets gates.
