Recall 0.2.0
Added
- Effective confidence (`src/core/evidence.ts`): every cell now carries a
  living, graph-computed confidence alongside its immutable stated one:
  `clamp01(stated × actor-calibration + support − challenge)`, derived
  one-hop from incoming `supports`/`contradicts`/`concerns` relations and
  `*-supports`/`*-contradicts` hyperedges, recomputed on every read with no
  LLM involved. Search ranking consumes the effective value (challenged
  cells sink even though challenge edges raise their graph degree, the
  ranking-inversion class found in stress testing), compile packets render
  it per cell as `eff:<value>(challenged|supported|actor-discounted)`, and
  actor discounts use the overconfidence signal (contradicted rate × mean
  confidence-when-wrong) so humble-but-right writers are never penalized.
  Pinned by a new test suite and a new adversarial retrieval gate case.
- README: new "Beyond memory: Checker and Solver" section describing the
  truth and compute organs that plug into the graph (git-native attestation,
  gated solver library with optimality contracts) and the contact for
  access.
- Standing programs surfaced at compile: packets now carry a
  `standing_programs:` section listing the enabled programs (watch, drift,
  quorum, score) covering each selected cell, with program and hyperedge
  handles, so agents wire new evidence into existing gates instead of
  orphaning it.
- `drift` and `quorum` program operations: `drift` is watch with
  attribution: a tripped run names which member moved (`topMover`, ranked
  movers); untripped runs derive nothing. `quorum` is k-of-m sign-off as
  a graph object: members approve when live effective confidence clears
  `minEff`, counted across distinct actors by default, so a contradicted
  approver's approval stops counting with no policy code; quorum runs
  always derive their attestation witness.
- Graph reflexes (`watch` program operation): a hyperedge program that
  baselines against its own previous run, trips when the bundle's live
  effective confidence moves more than `delta`, and, with `--derive`,
  files a concern against a configured target cell through the admission
  gate, attributed to `program:<id>` so reflexes accumulate per-actor
  calibration like any other writer. Untripped watch runs derive nothing:
  silence means verified stability. Consequents file claims, never value
  assignments; belief revision propagates as audited evidence, one
  admission at a time.
- Tripwire bundles: scored hyperedge programs now price their members
  from live effective confidence (`averageEffectiveConfidence` in the score
  output; stated-confidence average retained for explainability). A scored
  evidence bundle (a deploy gate, a launch review) loses score on its
  next run when any member is contradicted anywhere in the graph, with no
  model involved. Pinned by an end-to-end tripwire test.
- Real HTTP embedding backends: `RECALL_EMBEDDING_URL`
  (+ `RECALL_EMBEDDING_MODEL`, `RECALL_EMBEDDING_API_KEY`) plug Ollama or any
  OpenAI-compatible embeddings endpoint into the backend-aware semantic
  index. External failures latch off per process and fall back to `hash:v1`,
  so writes never block on an embedding service. `recall semantic reindex`
  rebuilds under the active backend.
- Closed-loop calibration v1: `recall calibration` scores each writing
  actor's stated confidence against survived-contradiction outcomes (Brier
  score, contradicted rate, overconfidence signal). Only `contradicts`
  references that resolve to actual cells count toward the score.
- Agent coordination (ACP): a durable agent-to-agent request queue over the
  same store: `recall acp send / list / show / process / run` plus matching
  MCP tools.
- Operator runs (`recall operate once/list/show`), workflow allocation
  (`recall workflow allocate`), pages (`recall page`), storage stats
  (`recall storage`), trust and beliefs reports, blind locks, and compaction.
- MCP server idle self-exit: the stdio server shuts down after an idle period
  (default 30 minutes, `RECALL_MCP_IDLE_EXIT_MS`, `0` disables) so abandoned
  spawns no longer accumulate; clients respawn on demand.
- Public benchmark harness: `npm run bench` and `npm run bench:public`
  against a reproducible synthetic corpus; see `docs/19_PUBLIC_BENCHMARK.md`.
- Adversarial retrieval-quality test gate: IDF, stemming, graph prior, recency
  decay, and literal code-symbol matching are pinned by tests.
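
The effective-confidence formula and the quorum rule above can be sketched as follows. This is a minimal illustration, not the code in `src/core/evidence.ts`: the `CellEvidence` and `QuorumMember` shapes, the function names, and the pre-aggregated `support`/`challenge` weights are all assumptions, as is treating "clears `minEff`" as `>=`.

```typescript
// Hypothetical shapes for illustration only; not Recall's actual types.
interface CellEvidence {
  stated: number;           // immutable stated confidence in [0, 1]
  actorCalibration: number; // multiplier in [0, 1]; 1 means no actor discount
  support: number;          // assumed: pre-aggregated weight of incoming supports
  challenge: number;        // assumed: pre-aggregated weight of contradicts/concerns
}

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// clamp01(stated × actor-calibration + support − challenge)
function effectiveConfidence(e: CellEvidence): number {
  return clamp01(e.stated * e.actorCalibration + e.support - e.challenge);
}

interface QuorumMember {
  actor: string;
  evidence: CellEvidence;
}

// k-of-m sign-off: a member approves when its live effective confidence
// clears minEff; approvals are counted once per distinct actor.
function quorumMet(members: QuorumMember[], k: number, minEff: number): boolean {
  const approvers = new Set<string>();
  for (const m of members) {
    if (effectiveConfidence(m.evidence) >= minEff) {
      approvers.add(m.actor);
    }
  }
  return approvers.size >= k;
}
```

Because approval is recomputed from the live value, raising a member's `challenge` is enough to drop it from the count; no separate revocation step appears in the sketch.
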
Fixed
- Python toolkit, JS/TS code extractor: exported const data bindings
  (`export const PLANS = {...}`) are now extracted as `const-data` symbol
  cells; previously only functions and classes were captured, leaving the
  most change-sensitive symbols (catalogs, configs) invisible to
  `subgraph --entity` queries.
- Python toolkit, code linker: JS import specifiers (`./plans.mjs`)
  previously resolved under Python-style dot-splitting to the file extension,
  so `code-imports` hyperedges were never created for JS/TS projects; module
  file stems also kept their `.mjs`/`.ts` suffixes. Both sides now share
  language-aware stem derivation (relative paths, `node:` builtins, bare
  packages, Python dotted modules).
- Python toolkit, code linker: link discovery now targets only the newest
  cell generation per title (older generations are kept for audit by
  `--rebuild` supersedure but previously duplicated every discovered edge);
  `--include-superseded` restores the old behavior.
- Python toolkit, JS/TS extractor: `--rebuild` now warns when code cells
  reference paths missing from the scanned tree (renamed or deleted files
  leave stale active cells; detection-only, retirement semantics planned).
- New DB-free regression suite: `python/tests/toolkit_unit_tests.py`
  (19 checks pinning the fixes above).
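
The linker fix above turns on deriving a module stem per language instead of always splitting on dots. A rough sketch of that idea, written in TypeScript rather than the toolkit's Python: `moduleStem`, the `Lang` type, the extension list, and the choice of stem for bare packages are hypothetical and not taken from the toolkit.

```typescript
type Lang = "js" | "python";

// Hypothetical helper: derive the stem used to match an import specifier
// against a module file. All rules below are illustrative assumptions.
function moduleStem(specifier: string, lang: Lang): string {
  if (lang === "python") {
    // Python dotted module: "pkg.sub.plans" -> "plans"
    const parts = specifier.split(".").filter((p) => p.length > 0);
    return parts[parts.length - 1] ?? "";
  }
  // node: builtin: "node:fs" -> "fs"
  if (specifier.startsWith("node:")) {
    return specifier.slice("node:".length);
  }
  const isRelative = specifier.startsWith("./") || specifier.startsWith("../");
  if (!isRelative) {
    // Bare package: "lodash/fp" -> "lodash", "@scope/pkg/sub" -> "@scope/pkg"
    const segs = specifier.split("/");
    return specifier.startsWith("@") ? segs.slice(0, 2).join("/") : segs[0] ?? "";
  }
  // Relative path: take the basename and drop one known JS/TS extension.
  const base = specifier.split("/").pop() ?? "";
  return base.replace(/\.(mjs|cjs|js|jsx|ts|tsx)$/, "");
}
```

Running the Python branch on `./plans.mjs` returns `mjs`, which is the failure mode the entry describes; the JS branch returns `plans`.
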
Changed
- Lexical retrieval rebuilt on SQLite FTS5 + BM25 (porter stemming, IDF), with
  hybrid ranking fusing graph relation degree, calibrated confidence, and
  recency as exponential decay rather than sort key. Falls back to LIKE search
  on SQLite builds without FTS5; `compiler_state` reports the active backend.
  The FTS shadow table is trigger-synced, so writes from older binaries keep
  the index consistent; existing databases backfill on first open.
- Compile is graph-aware: each selected cell's incoming
  `contradicts`/`concerns` relations surface in the conflicts section with
  expansion handles (cap 6 per cell + overflow marker).
- Reference resolution handles the short `recall://cell/<id>` form everywhere
  (relations, compile translation, health findings, calibration); legacy
  address-form relation targets migrate to bare node ids on first open.
- Admission warns on near-duplicate creates (title Jaccard / content cosine vs
  active cells), naming the existing cell and suggesting `update`/`supersede`;
  it also warns (never rejects) when a title exceeds 20 words.
- Databases run in WAL journal mode (CLI, MCP server, daemon, and ACP workers
  share one file).
- Compile packets compress titles (20 words in `relevant_memory`, 12 in
  reference/cell-state/health lines) and cap translated references at 6 per
  cell with an overflow marker.
- Docs: `06_HYPEREDGE_PROGRAMS.md` reframed as
  `06_ADVANCED_GRAPH_OPERATIONS.md`,
  `14_ADDRESSABLE_CELLS_AND_HYPERNETWORKS.md` renamed to
  `14_ADDRESSABLE_CELLS_AND_GRAPH_VIEWS.md`, and `19_PUBLIC_BENCHMARK.md`
  added.
- Test suite grew to 99 unit/integration tests and 94 end-to-end checks.
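
The admission warnings above (title Jaccard against active cells, plus the 20-word title check) might look roughly like this. It is a sketch under stated assumptions: the tokenizer, the `0.8` threshold, the `ActiveCell` shape, and the warning strings are invented for illustration, and the content-cosine half of the check is omitted.

```typescript
// Lowercased alphanumeric word set of a title (assumed tokenization).
function titleTokens(title: string): Set<string> {
  return new Set(
    title.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0),
  );
}

// Jaccard similarity |A ∩ B| / |A ∪ B| over the two title word sets.
function titleJaccard(a: string, b: string): number {
  const A = titleTokens(a);
  const B = titleTokens(b);
  if (A.size === 0 && B.size === 0) return 0;
  const shared = Array.from(A).filter((t) => B.has(t)).length;
  return shared / (A.size + B.size - shared);
}

// Hypothetical shape for an active cell.
interface ActiveCell {
  id: string;
  title: string;
}

// Returns warnings only; nothing here rejects a create.
function admissionWarnings(
  title: string,
  active: ActiveCell[],
  threshold = 0.8, // assumed value; the changelog does not state one
): string[] {
  const warnings: string[] = [];
  for (const cell of active) {
    if (titleJaccard(title, cell.title) >= threshold) {
      warnings.push(`near-duplicate of ${cell.id}: consider update or supersede`);
    }
  }
  const words = title.trim().split(/\s+/).filter((w) => w.length > 0);
  if (words.length > 20) {
    warnings.push("title exceeds 20 words");
  }
  return warnings;
}
```
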
Full Changelog: v0.1.0...v0.2.0