v0.4.0 — Context Savings
What's New
Five architecture improvements that cut injected context and measure the resulting token savings:
Query Complexity Routing
Scores queries by word count, identifier density, and structure. Routes to SKIP (trivial, 0 tokens), LIGHT (simple, k=4, 500 tokens), or FULL (complex, k=8, 1500 tokens). Trivial prompts like "yes" and "ok" inject nothing; simple follow-ups get a lighter budget.
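As a rough illustration of the routing tiers above, here is a minimal sketch. The three routes and their budgets come from these notes; the `Route` type, the `route_query` function, the identifier regex, and every scoring threshold are hypothetical and are not taken from the memor-cli source.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    k: int             # value of k for this tier (0 means no retrieval)
    token_budget: int  # maximum tokens injected into the prompt


SKIP = Route("SKIP", 0, 0)
LIGHT = Route("LIGHT", 4, 500)
FULL = Route("FULL", 8, 1500)

# Assumed heuristic: snake_case, camelCase, dotted paths, or call syntax.
IDENTIFIER = re.compile(r"[A-Za-z]+_[A-Za-z_]+|[a-z]+[A-Z]\w*|\w+\.\w+|\w+\(\)")


def route_query(query: str) -> Route:
    words = query.split()
    # Trivial prompts ("yes", "ok", empty input) inject nothing.
    if len(words) <= 2 and not IDENTIFIER.search(query):
        return SKIP
    identifier_density = sum(bool(IDENTIFIER.search(w)) for w in words) / len(words)
    has_structure = "\n" in query or "```" in query
    # Assumed cutoffs: long, identifier-heavy, or multi-line queries get the full budget.
    if len(words) > 12 or identifier_density > 0.15 or has_structure:
        return FULL
    return LIGHT
```

Under these assumed thresholds, `route_query("ok")` returns `SKIP` and a short follow-up with no identifiers returns `LIGHT`.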
Session Context Window
Maintains a rolling window of the last 5 queries per session. When a user types a sparse follow-up ("try the other approach"), the retriever sees it enriched with recent conversational context.
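The rolling window can be sketched with a bounded deque per session. The window size of 5 comes from these notes; the `SessionContext` class, its `enrich` method, and the choice to join queries with spaces are assumptions made for illustration only.

```python
from collections import defaultdict, deque

WINDOW_SIZE = 5  # last 5 queries per session, per the release notes


class SessionContext:
    """Hypothetical helper that keeps recent queries for each session."""

    def __init__(self, size: int = WINDOW_SIZE):
        # deque(maxlen=...) drops the oldest query automatically.
        self._windows = defaultdict(lambda: deque(maxlen=size))

    def enrich(self, session_id: str, query: str) -> str:
        """Return the query prefixed with recent queries, then record it."""
        recent = list(self._windows[session_id])
        self._windows[session_id].append(query)
        return " ".join(recent + [query])
```

With this sketch, a sparse follow-up such as "try the other approach" reaches the retriever together with up to five earlier queries from the same session.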
Semantic Feedback
Adds embedding cosine similarity alongside existing n-gram matching in the feedback analyzer. Catches paraphrased reuse that verbatim n-gram matching misses.
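The sketch below shows how the two signals might be combined. It assumes the embeddings are already computed and passed in as plain lists of floats; the function names, the trigram size, and both thresholds are hypothetical rather than the analyzer's actual values.

```python
import math


def ngrams(text: str, n: int = 3) -> set:
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(memory: str, response: str, n: int = 3) -> float:
    """Fraction of the memory's n-grams that reappear verbatim in the response."""
    mem = ngrams(memory, n)
    if not mem:
        return 0.0
    return len(mem & ngrams(response, n)) / len(mem)


def cosine_similarity(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def was_reused(memory, response, memory_vec, response_vec,
               ngram_threshold=0.5, cosine_threshold=0.8) -> bool:
    # Either signal counts: verbatim reuse or a close paraphrase (assumed thresholds).
    return (ngram_overlap(memory, response) >= ngram_threshold
            or cosine_similarity(memory_vec, response_vec) >= cosine_threshold)
```

A paraphrase shares no trigrams with the memory, so only the cosine branch can flag it as reused.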
Distillation Quality Gate + Faster Decay
- The extractive distiller rejects chunks with a signal score below 0.3, so filler text never becomes a memory
- Quality scores decay by 50% every 14 days for unrecalled memories
- Memories whose score falls below the 0.03 floor are deactivated automatically
- Compaction threshold lowered from 0.90 to 0.85
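The gate and decay rules above work out as follows in a minimal sketch. The constants are taken from these notes; the function names are hypothetical, and continuous exponential decay is an assumption (the real implementation may apply the 50% cut in discrete 14-day steps).

```python
SIGNAL_GATE = 0.3          # chunks scoring below this never become memories
HALF_LIFE_DAYS = 14        # score halves every 14 days without a recall
DEACTIVATION_FLOOR = 0.03  # memories below this are deactivated


def passes_gate(signal_score: float) -> bool:
    return signal_score >= SIGNAL_GATE


def decayed_score(score: float, days_since_recall: float) -> float:
    # Assumed continuous form: score * 0.5 ** (days / half-life).
    return score * 0.5 ** (days_since_recall / HALF_LIFE_DAYS)


def is_active(score: float, days_since_recall: float) -> bool:
    return decayed_score(score, days_since_recall) >= DEACTIVATION_FLOOR
```

Under this model a memory that starts at 1.0 and is never recalled stays active through day 70 (0.5^5 = 0.03125) and has dropped below the floor by day 84 (0.5^6 = 0.015625).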
Turn-Level Token ROI
Parses transcripts turn by turn, counts tool calls per turn, and correlates them with recall events. Adds a turn_metrics table, an /api/roi endpoint, and a dashboard "Token ROI" card that shows "X% fewer tool calls with recall."
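One way the "X% fewer tool calls with recall" figure could be derived is sketched below. The `TurnMetric` fields and the `tool_call_reduction` function are hypothetical; the actual turn_metrics schema and the /api/roi response shape are not described in these notes.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class TurnMetric:
    """Assumed shape of one row: a turn, its tool-call count, and a recall flag."""
    turn_index: int
    tool_calls: int
    had_recall: bool


def tool_call_reduction(turns: list) -> Optional[float]:
    """Percent fewer tool calls per turn when a recall event occurred."""
    with_recall = [t.tool_calls for t in turns if t.had_recall]
    without_recall = [t.tool_calls for t in turns if not t.had_recall]
    if not with_recall or not without_recall:
        return None  # no comparison possible without both groups
    baseline = mean(without_recall)
    if baseline == 0:
        return None
    return round(100 * (baseline - mean(with_recall)) / baseline, 1)
```

A comparison of per-turn averages like this shows correlation only; it does not by itself establish that recall caused the reduction.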
Stats
- 215 tests passing
- 17 files changed, +993 lines
- 8 new test files, 3 new modules
Install
pipx install --force memor-cli
# or
pip install memor-cli==0.4.0