Consolidates the work accumulated since v2.14.0: the brittle-rule cleanup that makes expert ingestion trustworthy, the evidence + capacity layers, guided onboarding, and a live bug-hunt round (including a real no-surprise-bills fix).
Agentic, not brittle (the headline)
The two worst lexical "verdicts" in the experts core are gone. The free word-overlap heuristics now only route; a cheap model verdict concludes:
- Contradiction: a phrasing-level false positive is absorbed normally instead of minting a false contested belief. Authoritative for the graph too (no lexical contradiction edge is re-created behind the model).
- Dedup: two different facts that merely share words (e.g. "$10/M" vs "$30/M") are no longer silently merged into one (data loss). Cost-bounded to the uncertain band.
- The absorb result reports how many false positives the verdicts caught (
contradictions_refuted,merges_blocked). - Removed a regex "atomicity monitor" — atomicity is meaning, the model's job, not a lexical rule.
The guardrail is written down: docs/plans/AGENTIC_BALANCE.md is now the standard for every rule-vs-agentic decision, with a STOP banner at the top of the roadmap.
Evidence + capacity
- Evidence layer:
deepr eval continuity(staleness honesty / abstention / contradiction-surfacing, $0) and the calibration harness (deepr eval calibrate). The published curve honestly shows the absorb extractor is faithful enough that confidence-vs-grounding saturates — measured, not asserted. - Capacity:
deepr capacity(+--probe) shows owned/prepaid capacity before metered API; a local Ollama backend runsexpert absorb/sync --localat $0; routing quality priors so auto mode routes sensibly without paid evals.
Onboarding + portability
deepr init(detect keys, write.env, set a budget ceiling, optional synced data dir) and a severity-awaredeepr doctor.- One experts root (
DEEPR_DATA_DIR/DEEPR_EXPERTS_PATH): point it at a synced folder and your experts follow you across machines.
Fixes (live bug-hunt with real keys)
- No-surprise-bills:
budget=0("do not spend") silently became a $10 ceiling (budget or 10.0). NowNone= default,0.0= honored. - Tests no longer pollute the real
data/experts/. eval continuitygives accurate errors (and stops creating empty dirs for typo'd names);costs doctorno longer cries wolf on a derived view; reports root unified across every component.
See docs/CHANGELOG.md for the full list.