Skip to content

v2.15.0 - Agentic absorb verdicts, evidence + capacity, no-surprise-bills

Latest

Choose a tag to compare

@blisspixel blisspixel released this 15 Jun 03:43
· 3 commits to main since this release

Consolidates the work accumulated since v2.14.0: the brittle-rule cleanup that makes expert ingestion trustworthy, the evidence + capacity layers, guided onboarding, and a live bug-hunt round (including a real no-surprise-bills fix).

Agentic, not brittle (the headline)

The two worst lexical "verdicts" in the experts core are gone. The free word-overlap heuristics now only route; a cheap model verdict concludes:

  • Contradiction: a phrasing-level false positive is absorbed normally instead of minting a false contested belief. Authoritative for the graph too (no lexical contradiction edge is re-created behind the model).
  • Dedup: two different facts that merely share words (e.g. "$10/M" vs "$30/M") are no longer silently merged into one (data loss). Cost-bounded to the uncertain band.
  • The absorb result reports how many false positives the verdicts caught (contradictions_refuted, merges_blocked).
  • Removed a regex "atomicity monitor" — atomicity is meaning, the model's job, not a lexical rule.

The guardrail is written down: docs/plans/AGENTIC_BALANCE.md is now the standard for every rule-vs-agentic decision, with a STOP banner at the top of the roadmap.

Evidence + capacity

  • Evidence layer: deepr eval continuity (staleness honesty / abstention / contradiction-surfacing, $0) and the calibration harness (deepr eval calibrate). The published curve honestly shows the absorb extractor is faithful enough that confidence-vs-grounding saturates — measured, not asserted.
  • Capacity: deepr capacity (+ --probe) shows owned/prepaid capacity before metered API; a local Ollama backend runs expert absorb/sync --local at $0; routing quality priors so auto mode routes sensibly without paid evals.

Onboarding + portability

  • deepr init (detect keys, write .env, set a budget ceiling, optional synced data dir) and a severity-aware deepr doctor.
  • One experts root (DEEPR_DATA_DIR/DEEPR_EXPERTS_PATH): point it at a synced folder and your experts follow you across machines.

Fixes (live bug-hunt with real keys)

  • No-surprise-bills: budget=0 ("do not spend") silently became a $10 ceiling (budget or 10.0). Now None = default, 0.0 = honored.
  • Tests no longer pollute the real data/experts/.
  • eval continuity gives accurate errors (and stops creating empty dirs for typo'd names); costs doctor no longer cries wolf on a derived view; reports root unified across every component.

See docs/CHANGELOG.md for the full list.