v0.4.0 — Context Savings

@bnimit released this 07 Jun 16:03
8a70e32

What's New

Five architecture improvements to measure and prove real token savings:

Query Complexity Routing

Scores queries by word count, identifier density, and structure. Routes to SKIP (trivial, 0 tokens), LIGHT (simple, k=4, 500 tokens), or FULL (complex, k=8, 1500 tokens). Trivial prompts like "yes" and "ok" inject nothing; simple follow-ups get a lighter budget.
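
As a rough illustration of the routing described above, here is a minimal sketch. The three routes and their k/token budgets come from these notes; the `IDENTIFIER` regex, the scoring thresholds, and the `route_query` name are assumptions made for the example and are not taken from the memor-cli source.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    k: int             # number of memories to retrieve
    token_budget: int  # maximum tokens to inject


SKIP = Route("SKIP", 0, 0)
LIGHT = Route("LIGHT", 4, 500)
FULL = Route("FULL", 8, 1500)

# Assumed heuristic: snake_case / dotted names, camelCase, or call syntax.
IDENTIFIER = re.compile(r"[A-Za-z_]\w*(?:[._]\w+)+|[a-z]+[A-Z]\w*|\w+\(\)")


def route_query(query: str) -> Route:
    """Pick a retrieval budget from word count, identifier density, and structure."""
    words = query.split()
    # Very short prompts with no identifiers are treated as trivial.
    if len(words) <= 2 and not IDENTIFIER.search(query):
        return SKIP
    identifier_density = sum(bool(IDENTIFIER.search(w)) for w in words) / len(words)
    has_structure = "\n" in query or "```" in query or query.count("?") > 1
    # Thresholds below are illustrative assumptions, not the released values.
    if len(words) > 12 or identifier_density >= 0.15 or has_structure:
        return FULL
    return LIGHT
```

With these assumed thresholds, "yes" and "ok" route to SKIP, "try the other approach" routes to LIGHT, and a question that names functions or tables routes to FULL.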

Session Context Window

Maintains a rolling window of the last 5 queries per session. When a user types a sparse follow-up ("try the other approach"), the retriever sees it enriched with recent conversational context.
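
One way to picture the rolling window is a bounded deque per session. This is a sketch under assumptions: the `SessionWindow` class, the `enrich` method, and the choice to join earlier queries with newlines are hypothetical, and whether the window counts the current query is not stated in these notes.

```python
from collections import defaultdict, deque

WINDOW_SIZE = 5  # from the release notes: last 5 queries per session


class SessionWindow:
    """Keeps the most recent queries for each session (hypothetical helper)."""

    def __init__(self, size: int = WINDOW_SIZE):
        # deque(maxlen=...) silently drops the oldest entry once it is full.
        self._queries = defaultdict(lambda: deque(maxlen=size))

    def enrich(self, session_id: str, query: str) -> str:
        """Return the query prefixed with recent queries, then record it."""
        history = self._queries[session_id]
        enriched = "\n".join([*history, query])
        history.append(query)
        return enriched
```

A sparse follow-up such as "try the other approach" is then passed to the retriever together with up to five earlier queries from the same session, while other sessions are unaffected.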

Semantic Feedback

Adds embedding cosine similarity alongside existing n-gram matching in the feedback analyzer. Catches paraphrased reuse that verbatim n-gram matching misses.
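
The sketch below shows why the two signals complement each other. It assumes word trigrams for the n-gram check and takes embeddings as plain lists of floats supplied by the caller; the function names and both thresholds are hypothetical, and the actual embedding model is not specified here.

```python
import math


def ngrams(text: str, n: int = 3) -> set:
    """Word n-grams of a lowercased text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(memory: str, response: str, n: int = 3) -> float:
    """Fraction of the memory's n-grams that reappear verbatim in the response."""
    mem = ngrams(memory, n)
    return len(mem & ngrams(response, n)) / len(mem) if mem else 0.0


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def was_reused(memory, response, mem_vec, resp_vec,
               ngram_min=0.3, cosine_min=0.8) -> bool:
    """Count a memory as reused if either signal fires (assumed thresholds)."""
    return (ngram_overlap(memory, response) >= ngram_min
            or cosine(mem_vec, resp_vec) >= cosine_min)
```

A paraphrase such as "keep the last 5 questions in a sliding buffer" shares no trigram with "use a rolling window of five queries", so only the cosine branch can detect the reuse.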

Distillation Quality Gate + Faster Decay

  • Extractive distiller gates chunks below a 0.3 signal score — filler text never becomes a memory
  • Quality scores decay by 50% every 14 days for unrecalled memories
  • Memories whose quality score falls below the 0.03 floor are auto-deactivated
  • Compaction threshold lowered from 0.90 → 0.85
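
To make the gate and decay numbers concrete, here is a small sketch. The constants are the ones listed above; the function names are hypothetical, and the decay is modeled as a continuous 14-day half-life, which is an assumption (the release may instead apply it in discrete 14-day steps).

```python
SIGNAL_GATE = 0.3        # chunks scoring below this never become memories
HALF_LIFE_DAYS = 14      # quality halves every 14 days without a recall
DEACTIVATE_FLOOR = 0.03  # below this, the memory is deactivated


def passes_gate(signal_score: float) -> bool:
    """True if a distilled chunk is allowed to become a memory."""
    return signal_score >= SIGNAL_GATE


def decayed_quality(quality: float, days_since_recall: float) -> float:
    """Quality after exponential decay with a 14-day half-life (assumed model)."""
    return quality * 0.5 ** (days_since_recall / HALF_LIFE_DAYS)


def is_active(quality: float, days_since_recall: float) -> bool:
    """True while the decayed quality stays at or above the floor."""
    return decayed_quality(quality, days_since_recall) >= DEACTIVATE_FLOOR
```

Under this model a memory that starts at quality 1.0 and is never recalled sits at 0.03125 after 70 days (five half-lives) and drops to 0.015625, below the floor, after 84 days.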

Turn-Level Token ROI

Parses transcripts turn-by-turn, counts tool calls per turn, correlates with recall events. New turn_metrics table, /api/roi endpoint, and dashboard "Token ROI" card showing "X% fewer tool calls with recall."
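
The "X% fewer tool calls with recall" figure can be read as a comparison of mean tool calls per turn. The sketch below assumes that definition; the `TurnMetric` fields and the `tool_call_reduction` function are hypothetical and do not reflect the actual turn_metrics schema or the /api/roi response.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class TurnMetric:
    turn_index: int
    tool_calls: int   # tool calls counted in this turn
    had_recall: bool  # whether a recall event was correlated with this turn


def tool_call_reduction(turns) -> Optional[float]:
    """Percent fewer tool calls per turn when recall fired, or None if undefined."""
    with_recall = [t.tool_calls for t in turns if t.had_recall]
    without_recall = [t.tool_calls for t in turns if not t.had_recall]
    if not with_recall or not without_recall:
        return None  # nothing to compare against
    baseline = mean(without_recall)
    if baseline == 0:
        return None  # avoid dividing by zero
    return 100 * (baseline - mean(with_recall)) / baseline
```

For example, if turns without recall average 5 tool calls and turns with recall average 3, the card would show 40% fewer tool calls under this definition.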

Stats

  • 215 tests passing
  • 17 files changed, +993 lines
  • 8 new test files, 3 new modules

Install

pipx install --force memor-cli
# or
pip install memor-cli==0.4.0