Release v0.4.0 — Context Savings · bnimit/memor-ai

What's New

Five architectural improvements to measure and prove real token savings:

Query Complexity Routing

Scores queries by word count, identifier density, and structure. Routes to SKIP (trivial, 0 tokens), LIGHT (simple, k=4, 500 tokens), or FULL (complex, k=8, 1500 tokens). Trivial prompts like "yes" and "ok" inject nothing; simple follow-ups get a lighter budget.
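
The routing described above can be sketched as follows. This is only an illustration: the tier names, k values, and token budgets come from these notes, while the identifier regex, the thresholds (2 words, 12 words, 0.15 density), the structure check, and the names `Route` and `route_query` are assumptions made for the example and are not taken from the memor-ai source.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    k: int       # number of memories to retrieve
    budget: int  # maximum number of injected tokens


# Tiers and budgets as stated in the release notes (k=0 for SKIP is assumed).
SKIP = Route("SKIP", 0, 0)
LIGHT = Route("LIGHT", 4, 500)
FULL = Route("FULL", 8, 1500)

# Hypothetical identifier pattern: snake_case or dotted names, camelCase, calls.
IDENTIFIER = re.compile(r"[A-Za-z_]\w*(?:[._]\w+)+|[a-z]+[A-Z]\w*|\w+\(\)")


def route_query(query: str) -> Route:
    """Pick a retrieval tier from word count, identifier density, and structure."""
    words = query.split()
    # Very short prompts with no identifiers ("yes", "ok") inject nothing.
    if len(words) <= 2 and not IDENTIFIER.search(query):
        return SKIP
    identifier_density = sum(bool(IDENTIFIER.search(w)) for w in words) / len(words)
    # Assumed structure signals: multi-line input, code fences, several questions.
    has_structure = "\n" in query or "```" in query or query.count("?") > 1
    if len(words) > 12 or identifier_density > 0.15 or has_structure:
        return FULL
    return LIGHT
```

With these assumed thresholds, "yes" routes to SKIP, "try the other approach" routes to LIGHT, and a question that names several functions routes to FULL.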

Session Context Window

Maintains a rolling window of the last 5 queries per session. When a user types a sparse follow-up ("try the other approach"), the retriever sees it enriched with recent conversational context.
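
A rolling per-session window like this one can be sketched with a bounded deque. The window size of 5 comes from these notes; the `SessionContext` class, the `enrich` method, and the choice to enrich by plain concatenation are assumptions for illustration, since the notes do not say how recent queries are combined with the new one.

```python
from collections import defaultdict, deque

WINDOW_SIZE = 5  # last 5 queries per session, per the release notes


class SessionContext:
    """Keeps the most recent queries for each session (hypothetical helper)."""

    def __init__(self, size: int = WINDOW_SIZE) -> None:
        # deque(maxlen=size) silently drops the oldest entry when full.
        self._windows = defaultdict(lambda: deque(maxlen=size))

    def enrich(self, session_id: str, query: str) -> str:
        """Return the query prefixed with recent queries, then record it."""
        window = self._windows[session_id]
        enriched = " ".join([*window, query])
        window.append(query)
        return enriched
```

In this sketch, a sparse follow-up such as "try the other approach" reaches the retriever together with the earlier queries from the same session, so it carries the topic words it lacks on its own.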

Semantic Feedback

Adds embedding cosine similarity alongside existing n-gram matching in the feedback analyzer. Catches paraphrased reuse that verbatim n-gram matching misses.
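
The combination of the two signals might look like the sketch below. It assumes embeddings are already available as plain lists of floats; the function names, the trigram size, and both thresholds (0.2 and 0.8) are illustrative assumptions, not values from the project.

```python
import math


def ngrams(text: str, n: int = 3) -> set:
    """Set of word n-grams in lowercased text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(memory: str, response: str, n: int = 3) -> float:
    """Fraction of the memory's n-grams that reappear verbatim in the response."""
    mem = ngrams(memory, n)
    if not mem:
        return 0.0
    return len(mem & ngrams(response, n)) / len(mem)


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def was_reused(memory, response, memory_vec, response_vec,
               ngram_threshold=0.2, cosine_threshold=0.8) -> bool:
    """Count a memory as reused if either signal clears its (assumed) threshold."""
    return (ngram_overlap(memory, response) >= ngram_threshold
            or cosine_similarity(memory_vec, response_vec) >= cosine_threshold)
```

A paraphrase shares few or no trigrams with the stored memory, so the n-gram check alone reports no reuse, while embeddings that point in nearly the same direction still clear the cosine threshold.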

Distillation Quality Gate + Faster Decay

The extractive distiller rejects chunks with a signal score below 0.3, so filler text never becomes a memory
Quality scores of unrecalled memories decay by 50% every 14 days
Memories whose score falls below the 0.03 floor are deactivated automatically
Compaction threshold lowered from 0.90 to 0.85
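
The gate and the decay rule above amount to simple arithmetic, sketched here. The constants 0.3, 14 days, and 0.03 come from these notes. The function names are hypothetical, and the sketch assumes continuous exponential decay between recalls; the actual implementation may apply the halving in discrete steps.

```python
HALF_LIFE_DAYS = 14        # score halves every 14 days without recall
DEACTIVATION_FLOOR = 0.03  # below this, a memory is deactivated
SIGNAL_GATE = 0.3          # chunks scoring below this never become memories


def passes_gate(signal_score: float) -> bool:
    """True if a distilled chunk is strong enough to be stored."""
    return signal_score >= SIGNAL_GATE


def decayed_score(score: float, days_since_recall: float) -> float:
    """Exponential decay: multiply by 0.5 for each elapsed half-life."""
    return score * 0.5 ** (days_since_recall / HALF_LIFE_DAYS)


def is_active(score: float, days_since_recall: float) -> bool:
    """True while the decayed score is still at or above the floor."""
    return decayed_score(score, days_since_recall) >= DEACTIVATION_FLOOR
```

Under this assumption, a memory that starts at 1.0 and is never recalled sits at 0.03125 after five half-lives (70 days) and drops below the floor during the sixth.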

Turn-Level Token ROI

Parses transcripts turn by turn, counts tool calls per turn, and correlates them with recall events. Adds a turn_metrics table, an /api/roi endpoint, and a dashboard "Token ROI" card that reports "X% fewer tool calls with recall."
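
One plausible way to arrive at the "X% fewer tool calls with recall" figure is to compare the mean tool calls of turns with and without a recall event, as sketched below. The `TurnMetric` fields and the `tool_call_reduction` function are hypothetical; the notes do not describe the turn_metrics schema or the exact formula behind /api/roi.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class TurnMetric:
    """One row per transcript turn (assumed shape, not the real schema)."""
    turn: int
    tool_calls: int
    had_recall: bool


def tool_call_reduction(turns: list) -> Optional[float]:
    """Percent fewer tool calls on turns with recall, or None if not computable."""
    with_recall = [t.tool_calls for t in turns if t.had_recall]
    without_recall = [t.tool_calls for t in turns if not t.had_recall]
    if not with_recall or not without_recall:
        return None  # need both groups to compare
    baseline = mean(without_recall)
    if baseline == 0:
        return None  # avoid dividing by zero when the baseline has no tool calls
    return 100.0 * (baseline - mean(with_recall)) / baseline
```

For example, if turns without recall average 5 tool calls and turns with recall average 2.5, this sketch reports 50.0. A negative result would mean turns with recall used more tool calls than the baseline.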

Stats

215 tests passing
17 files changed, +993 lines
8 new test files, 3 new modules

Install

pipx install --force memor-cli
# or
pip install memor-cli==0.4.0
