v0.3.0 — the world-class revamp

aura-farming released this 10 Jun 04:52
ba23e0d

The world-class revamp: corrected economics, context discipline, a deep surface, the
learning moat wired end-to-end, a falsifiable benchmark, and engine-owned worktrees.

Added

  • Worktree Phase 1, engine-owned (pqa/worktrees.py): one isolated git worktree
    per branch on ephemeral pqa/<run>-bN branches, with a write-ahead registry in
    .pqa/state.json so stray worktrees stay recorded even after a mid-run SIGKILL; rollback on partial
    spawn; reconcile() merges --no-ff, aborts on conflict preserving the survivor
    branch, and always prunes. Branch.workdir + run(workdirs=...) thread isolation
    through the engine; spawn_branches.sh / reconcile.sh became thin engine callers;
    the orchestrator and reconciler honor branches_mode = "worktree" with a stray
    sweep at init. Zero-orphan recovery is a tested invariant.
  • Locked eval benchmark: 8 tasks under evals/tasks/ (each task.toml + LOCKED
    verify.py + reference.py must-pass + sabotage.py must-fail) and
    scripts/eval_harness.py (deterministic score/report/smoke, zero model
    calls); /eval and pqa-eval-runner wired to it; nightly eval-smoke workflow
    re-proves verifier integrity. The README documents the methodology; live numbers
    land only from a live run — losses included.
  • The learning moat, wired: conviction signals get their outcomes back-filled
    post-collapse; pqa/instincts.py synthesizes instincts from precipitates+failures
    (overlap clustering; confidence from support and contradictions); prior-art injects
    top instincts; RunReport carries instincts_injected and per-instinct agreement;
    the dashboard gains calibration + instincts sections; the self-reflector reads the
    engine's calibration().
  • Run resume: crash-resumable run journal (pqa/state.py writes .pqa/state.json,
    atomic tmp+rename); /pqa --resume re-enters at the first incomplete stage.
    Journal writes preserve foreign top-level keys (the file is shared with the
    worktree registry).
  • Generated configuration reference: docs/configuration.md rendered from
    pqa/config.py by scripts/generate_config_doc.py, drift-pinned by tests.
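
The journal behaviour described under "Run resume" (atomic tmp+rename, foreign top-level keys preserved) can be sketched roughly as below. This is a minimal illustration, not the code in pqa/state.py: the function name write_journal, the "journal" key, and the "worktrees" key in the usage note are all hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def write_journal(state_path: Path, journal: dict, key: str = "journal") -> None:
    """Replace one top-level key in a shared JSON state file, atomically.

    Hypothetical sketch: the key name and file layout are assumptions.
    """
    # Load the existing file so keys owned by other writers are carried over.
    try:
        state = json.loads(state_path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        state = {}
    state[key] = journal

    state_path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=state_path.parent, prefix=state_path.name, suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, state_path)  # readers see the old or the new file, never a partial one
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

With a state file that already holds an unrelated top-level key (say "worktrees"), calling write_journal(path, {"stage": "collapse"}) leaves that key as it was and only adds or replaces "journal". A crash before os.replace leaves the previous file intact.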

Changed

  • Economics corrected; tokens primary: cost-model defects fixed, model aliases
    (fable/opus/sonnet/haiku) wired to real pricing/dispatch via
    pqa.cost.resolve_model, budgets token-primary with USD secondary, and a
    pre-flight would_abort gate before every dispatch (not just after the spend).
  • Model routing per role: Fable 5 where output quality is decided (generators,
    unknown-scout, adversary, collapse-judge, baseline control); sonnet/haiku for
    mechanical and bookkeeping tiers. "Every agent on Opus" is gone from docs and
    dispatch.
  • Context discipline in the orchestrator: branch payloads live on disk and are
    read only by the subagent that needs them; the orchestrator holds ≤200 tokens of
    state per branch (digests only) and reports per-stage context telemetry.
  • Surface: depth over breadth — 34 agents · 59 skills · 27 commands trimmed to
    14 agents · 12 commands · 12 deep skills (each skill a protocol + worked
    example + anti-patterns playbook); validate_components.py gained census, depth,
    and description-budget gates and drift-gates docs/catalog.json.
  • Memory retrieves by relevance under a hard token budget (not recency), and
    every injected memory id is reported per run.
  • Docs truth pass: README (counts, hook claims, stage wording, workflow count,
    status), architecture.md (rewritten to the shipped reality), CONTRIBUTING and the
    plugin manifests; hooks language unified with SECURITY.md (two blocking hooks,
    the rest fail open; the binding guarantee lives in CI).
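
The pre-flight gate from "Economics corrected" checks the budget before a dispatch rather than after the spend. A minimal sketch of that idea follows, assuming a token-only budget. The class TokenBudget, the dispatch helper, and the would_abort signature shown here are hypothetical and may differ from what pqa.cost actually exposes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TokenBudget:
    """Hypothetical token-primary budget; USD tracking is left out of this sketch."""

    limit_tokens: int
    spent_tokens: int = 0

    def would_abort(self, estimated_tokens: int) -> bool:
        # Pre-flight check: would this dispatch push spend past the limit?
        return self.spent_tokens + estimated_tokens > self.limit_tokens

    def record(self, actual_tokens: int) -> None:
        # Post-dispatch bookkeeping with the real usage.
        self.spent_tokens += actual_tokens


def dispatch(
    budget: TokenBudget, estimated_tokens: int, call: Callable[[], int]
) -> Optional[int]:
    """Run `call` only if the estimate fits; `call` returns the tokens it used."""
    if budget.would_abort(estimated_tokens):
        return None  # refused before any spend happens
    used = call()
    budget.record(used)
    return used
```

For example, with TokenBudget(limit_tokens=1000), a first dispatch estimated at 600 tokens runs and is recorded, while a second one with the same estimate is refused without the call ever executing.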

Fixed

  • Hook hardening: per-hook kill-switches (PQA_DISABLED_HOOKS), once-per-session
    research gate, fail-closed fixes on the security/secrets gates.
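
A per-hook kill-switch driven by PQA_DISABLED_HOOKS might be read along these lines. The notes do not state the variable's format, so this sketch assumes a comma-separated list of hook names. The helper names and the example hook names are hypothetical.

```python
import os
from typing import Mapping, Optional


def disabled_hooks(environ: Optional[Mapping[str, str]] = None) -> frozenset:
    """Parse PQA_DISABLED_HOOKS as a comma-separated list (assumed format)."""
    env = os.environ if environ is None else environ
    raw = env.get("PQA_DISABLED_HOOKS", "")
    # Ignore surrounding whitespace and empty entries such as a trailing comma.
    return frozenset(name.strip() for name in raw.split(",") if name.strip())


def hook_enabled(name: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """A hook runs unless it is named in the kill-switch list."""
    return name not in disabled_hooks(environ)
```

Under that assumption, setting PQA_DISABLED_HOOKS="research-gate, lint" would turn off those two hooks and leave every other hook active. An unset or empty variable disables nothing.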