v3.0.0

asystemoffields tagged this 10 Jun 12:57
Nine features close the investigation loop:
- plan-evidence: per-card evidence-gap diagnosis with Student-t power
  analysis; ranks the cheapest grade-moving interventions
- Criterion dossiers: cumulative evidence per (model, criterion) across
  runs; grade transitions, same-provenance contradiction tracking
- calibrate: plants synthetic ground truth (causal features, correlated
  decoys, noise), runs the real pipeline blind, reports grade calibration
  with Wilson CIs; current machinery: precision@k=1.0, decoy resistance=1.0
- quant-diff: reports which validated features quantization broke; preset
  workflow + docs for FP16-vs-quant feature audits
- Steering artifacts: export-steering (provenance-gated) + apply-steering
- migrate-report: re-score pre-2.3 reports under current semantics
- GGUF bridge: export-gguf-records (llama.cpp final-layer embeddings) +
  convert-hidden-dump (any-runtime multi-layer dump converter)
- Hypothesis invariant suite: association-only inputs can never produce
  causal-labeled outputs, as a generative property across all surfaces
- MCP server: 19 tools (was 10) covering the whole loop; Claude Code
  skill; AGENTS.md/README/COMMANDS updated
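
The Wilson CIs mentioned under calibrate can be illustrated with a minimal
sketch. This is the textbook Wilson score interval, not the project's own
code; the function name and the z=1.96 default are assumptions made for
illustration.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (hypothetical helper)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie in [0, trials]")
    p = successes / trials
    denom = 1 + z * z / trials
    # Centre is pulled toward 0.5, which keeps the interval sensible at p=0 or p=1.
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Illustrative counts only, not results from the calibrate run.
low, high = wilson_interval(successes=10, trials=10)
print(f"{low:.4f} .. {high:.4f}")
```

A perfect observed score still has a lower bound below 1 on a finite sample,
which is the reason to report an interval next to a point estimate such as
precision@k=1.0.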

Fixed (found by the invariant suite): contradicted/contradicted_effect now
require intervention provenance; association-only opposite pairs grade
needs_causal_evidence with reason opposite_associations_lack_intervention_provenance.
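
A minimal sketch of the rule this fix describes, under stated assumptions:
the Finding type, its field names, and the provenance values "association"
and "intervention" are hypothetical. Only the grade and reason strings come
from the note above. How a mixed pair (one intervention, one association) is
graded is not stated here; the sketch assumes it counts as having
intervention provenance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    direction: int   # +1 or -1: sign of the observed effect (hypothetical field)
    provenance: str  # "association" or "intervention" (hypothetical values)


def grade_opposite_pair(a: Finding, b: Finding) -> tuple[str, Optional[str]]:
    """Grade two findings that point in opposite directions."""
    if a.direction * b.direction >= 0:
        raise ValueError("findings must have opposite directions")
    association_only = a.provenance == "association" and b.provenance == "association"
    if association_only:
        # Two opposing correlations are not evidence of a causal contradiction.
        return (
            "needs_causal_evidence",
            "opposite_associations_lack_intervention_provenance",
        )
    # Assumption: any intervention provenance in the pair permits "contradicted".
    return ("contradicted", None)
```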

Breaking: 2.3.0 legacy next-action keys removed from emitted payloads
(canonical {id, title, command?+argv?, instruction?, requires?} only).
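
Consumers that still read the legacy keys may want a quick check of incoming
payloads. The sketch below is a hypothetical validator, not part of the tool:
it assumes id and title are required and treats every key marked with ? as
optional. The legacy key in the usage line is a made-up placeholder, since
the removed key names are not listed here.

```python
REQUIRED_KEYS = {"id", "title"}
OPTIONAL_KEYS = {"command", "argv", "instruction", "requires"}


def validate_next_action(action: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape is canonical."""
    problems = []
    for key in sorted(REQUIRED_KEYS - action.keys()):
        problems.append(f"missing required key: {key}")
    for key in sorted(action.keys() - REQUIRED_KEYS - OPTIONAL_KEYS):
        problems.append(f"non-canonical key: {key}")
    return problems


# "legacy_key" is a placeholder standing in for any removed 2.3.0 key.
print(validate_next_action({"id": "a1", "title": "Run ablation", "legacy_key": 1}))
```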

Suite: 363 -> 500 tests.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>