v1.0.1 — Platform overhaul: run_eval CLI, tiered case layout, OS-aware installer
Highlights
- run_eval.py: the new primary user-facing command. Live mode only: fails fast with an actionable message when ANTHROPIC_API_KEY is unset; discovers cases dynamically from both tiers; writes out/<tier>/<case-id>/<timestamp>/{findings,report,summary}.json.
- Tiered, self-contained case studies: examples/case-studies/self-evaluation/case-01..08 and external-evaluation/case-01..03 (NIST CFReDS, Ali Hadi, Digital Corpora M57-Patents/Jo). Index-only folder names, truth.json per case, canonical bundled evidence at self-evaluation/case-01/evidence_root/. The public --variant selector is gone.
- OS-aware installer: scripts/install.sh --os auto|ubuntu|centos|macos, venv-first, clones+installs the collector adapter, optional SIFT (--install-sift, via cast) and Eric Zimmerman Tools (--install-eztools, .NET 9 builds, URLs validated before download). Plus root requirements.txt and an API-free scripts/healthcheck.py.
- Downloader hardening: browser-like headers on every request (incl. resumed range requests), pure-Python streaming split-image reassembly, --dry-run / --check-urls.
- Hardening (earlier in this line): MCP call_tool() schema validation before dispatch, Plaso outputs isolated to DART_DERIVED_ROOT, benchmark summary no longer fabricates rows, hallucination scoring requires resolvable audit IDs.
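The fail-fast check and the output layout described for run_eval.py can be sketched roughly as follows. This is an illustration only, not the shipped implementation: the names require_api_key, output_paths, and MissingApiKeyError are hypothetical, and the timestamp format is an assumption, since the notes only specify a <timestamp> path component.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


class MissingApiKeyError(RuntimeError):
    """Hypothetical error raised when live mode has no credentials."""


def require_api_key(env=None):
    # Check up front so the run stops before any case is evaluated.
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise MissingApiKeyError(
            "ANTHROPIC_API_KEY is not set. Export it before running, e.g. "
            "export ANTHROPIC_API_KEY=<your key>"
        )
    return key


def output_paths(out_root, tier, case_id, now=None):
    # Assumed timestamp format (UTC, compact ISO 8601); the real one may differ.
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    run_dir = Path(out_root) / tier / case_id / stamp
    # One JSON file per artifact named in the release notes.
    return {
        name: run_dir / f"{name}.json"
        for name in ("findings", "report", "summary")
    }
```

Passing the environment and the clock as arguments keeps both functions testable without touching the real process environment or system time.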
Measured QA at this tag
- Full pytest suite green (tests/ + dart_corr/tests/); benchmark-integrity and CI workflows green on this commit.
- scripts/measure_accuracy.py: recall 1.0, FPR 0.0, hallucinations 0, evidence integrity preserved (67 files).
- validate_ground_truth.py: FAIL 0 (6 documented external-tier warnings).
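For readers unfamiliar with the two headline metrics, here is a minimal sketch of how recall and false-positive rate are conventionally computed from sets of finding IDs. It assumes the standard definitions; how scripts/measure_accuracy.py actually defines or computes them is not stated in these notes, and the function name and empty-set conventions below are hypothetical.

```python
def recall_and_fpr(expected, reported, negatives):
    """Return (recall, false-positive rate) for one evaluation run.

    expected:  IDs that the ground truth says should be found
    reported:  IDs the tool actually reported
    negatives: IDs the ground truth marks as benign (must not be reported)
    """
    expected, reported, negatives = set(expected), set(reported), set(negatives)
    true_positives = len(expected & reported)
    false_positives = len(negatives & reported)
    # Assumed conventions: nothing to find counts as perfect recall,
    # and no negatives means no opportunity for a false positive.
    recall = true_positives / len(expected) if expected else 1.0
    fpr = false_positives / len(negatives) if negatives else 0.0
    return recall, fpr
```

Under these definitions, a run that reports every expected finding and none of the benign items scores recall 1.0 and FPR 0.0, matching the shape of the figures quoted above.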
Known limitations
- The adapter's --source image (Velociraptor dead-disk) path is covered by mocked end-to-end tests and has not been exercised against a live Velociraptor binary in CI.
- External-tier evaluations require a one-time multi-GB dataset download; no external-dataset accuracy numbers are claimed at this tag.
Full details: CHANGELOG.md