Release Recall 0.2.0 · H-XX-D/recall-memory-substrate

Added

Effective confidence (src/core/evidence.ts): every cell now carries a
living, graph-computed confidence alongside its immutable stated one:
clamp01(stated × actor-calibration + support − challenge), derived
one hop from incoming supports/contradicts/concerns relations and
*-supports/*-contradicts hyperedges, and recomputed on every read with no
LLM involved. Search ranking consumes the effective value, so challenged
cells sink even though challenge edges raise their graph degree (the
ranking-inversion class found in stress testing). Compile packets render
it per cell as eff:<value>(challenged|supported|actor-discounted), and
actor discounts use the overconfidence signal (contradicted rate × mean
confidence-when-wrong), so humble-but-right writers are never penalized.
Pinned by a new test suite and a new adversarial retrieval gate case.
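The formula above can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the evidence.ts code: the `Edge` shape, per-edge weights, treating *-supports/*-contradicts hyperedges like plain edges, and the rules for attaching the challenged/supported/actor-discounted tags are all hypothetical.

```typescript
// Hypothetical edge shape: the release notes give the formula, not the data model.
type Edge = { kind: "supports" | "contradicts" | "concerns"; weight: number };

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// eff = clamp01(stated × actorCalibration + support − challenge), one hop only.
function effectiveConfidence(
  stated: number,
  actorCalibration: number, // 1 = no discount; < 1 = discounted writer (assumed scale)
  incoming: Edge[],
): { eff: number; tags: string[] } {
  let support = 0;
  let challenge = 0;
  for (const e of incoming) {
    if (e.kind === "supports") support += e.weight;
    else challenge += e.weight; // assumption: contradicts and concerns both challenge
  }
  const eff = clamp01(stated * actorCalibration + support - challenge);
  // Hypothetical tag rules for the compile-packet annotation.
  const tags: string[] = [];
  if (challenge > 0) tags.push("challenged");
  if (support > 0) tags.push("supported");
  if (actorCalibration < 1) tags.push("actor-discounted");
  return { eff, tags };
}

// Render in the eff:<value>(tag|tag) shape described above.
function renderEff(r: { eff: number; tags: string[] }): string {
  return `eff:${r.eff.toFixed(2)}` + (r.tags.length ? `(${r.tags.join("|")})` : "");
}
```

Because the value is computed from the current edges on each call, nothing is cached on the cell; a new contradicts edge changes the next read.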
README: new "Beyond memory: Checker and Solver" section describing the
truth and compute organs that plug into the graph (git-native attestation,
gated solver library with optimality contracts) and who to contact for
access.
Standing programs surfaced at compile: packets now carry a
standing_programs: section listing the enabled programs (watch, drift,
quorum, score) covering each selected cell, with program and hyperedge
handles — so agents wire new evidence into existing gates instead of
orphaning it.
drift and quorum program operations: drift is watch with
attribution — a tripped run names which member moved (topMover, ranked
movers); untripped runs derive nothing. quorum is k-of-m sign-off as
a graph object — members approve when live effective confidence clears
minEff, counted across distinct actors by default, so a contradicted
approver's approval stops counting with no policy code; quorum runs
always derive their attestation witness.
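A sketch of the two operations, assuming a hypothetical `Member` record (handle, writing actor, live effective confidence). Reading "clears minEff" as `>=`, and ranking drift movers by absolute change since the baseline run, are both assumptions.

```typescript
// Hypothetical member shape.
type Member = { id: string; actor: string; eff: number };

// drift: rank members by how far their effective confidence moved since the baseline.
function rankMovers(baseline: Map<string, number>, current: Member[]) {
  const movers = current
    .filter((m) => baseline.has(m.id))
    .map((m) => ({ id: m.id, delta: m.eff - (baseline.get(m.id) as number) }))
    .sort((a, b) => Math.abs(b.delta) - Math.abs(a.delta));
  return { topMover: movers.length > 0 ? movers[0].id : null, movers };
}

// quorum: k-of-m sign-off; an approval counts only while live eff clears minEff.
function quorumMet(
  members: Member[],
  k: number,
  minEff: number,
  distinctActors = true, // default: count distinct actors, not members
): { met: boolean; approvals: number } {
  const approving = members.filter((m) => m.eff >= minEff);
  const approvals = distinctActors
    ? new Set(approving.map((m) => m.actor)).size
    : approving.length;
  return { met: approvals >= k, approvals };
}
```

A contradicted approver's eff falls below minEff on the next read, so the approval drops out of the count without any policy code.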
Graph reflexes (watch program operation): a hyperedge program that
baselines against its own previous run, trips when the bundle's live
effective confidence moves more than delta, and — with --derive —
files a concern against a configured target cell through the admission
gate, attributed to program:<id> so reflexes accumulate per-actor
calibration like any other writer. Untripped watch runs derive nothing:
silence means verified stability. Consequents file claims, never value
assignments — belief revision propagates as audited evidence, one
admission at a time.
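A sketch of one watch run. It assumes the bundle's effective confidence is the mean of its members' live values and models the admission gate as a callback; `runWatch`, `Concern`, and the note text are hypothetical.

```typescript
// Hypothetical concern payload handed to the admission gate.
type Concern = { target: string; actor: string; note: string };

function runWatch(
  programId: string,
  memberEffs: number[],
  previous: number | null, // eff recorded by the previous run; null on the first run
  delta: number,
  derive: boolean, // mirrors --derive
  targetCell: string,
  admit: (c: Concern) => void,
): { eff: number; tripped: boolean } {
  const eff = memberEffs.reduce((s, x) => s + x, 0) / Math.max(1, memberEffs.length);
  // The first run only records a baseline; later runs trip on a move larger than delta.
  const tripped = previous !== null && Math.abs(eff - previous) > delta;
  if (tripped && derive && previous !== null) {
    // Filed as a claim attributed to the program, so it is calibrated like any writer.
    admit({
      target: targetCell,
      actor: `program:${programId}`,
      note: `bundle eff moved ${previous.toFixed(2)} -> ${eff.toFixed(2)}`,
    });
  }
  return { eff, tripped }; // untripped runs admit nothing
}
```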
Tripwire bundles: scored hyperedge programs now price their members
from live effective confidence (averageEffectiveConfidence in the score
output; stated-confidence average retained for explainability). A scored
evidence bundle — a deploy gate, a launch review — loses score on its
next run when any member is contradicted anywhere in the graph, with no
model involved. Pinned by an end-to-end tripwire test.
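A sketch of the tripwire effect, assuming for illustration that the bundle's score is just its mean member effective confidence. `scoreBundle` and the `averageStatedConfidence` field name are hypothetical; only `averageEffectiveConfidence` comes from the notes.

```typescript
// Hypothetical member shape: stated confidence plus the live effective value.
type BundleMember = { stated: number; eff: number };

function scoreBundle(members: BundleMember[]) {
  const mean = (xs: number[]) =>
    xs.length ? xs.reduce((s, x) => s + x, 0) / xs.length : 0;
  return {
    averageEffectiveConfidence: mean(members.map((m) => m.eff)), // drives the score
    averageStatedConfidence: mean(members.map((m) => m.stated)), // kept for explainability
  };
}
```

When any member is contradicted elsewhere in the graph, its eff drops, so the next run's effective average falls while the stated average does not move.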
Real HTTP embedding backends: RECALL_EMBEDDING_URL (+ RECALL_EMBEDDING_MODEL,
RECALL_EMBEDDING_API_KEY) plug Ollama or any OpenAI-compatible embeddings
endpoint into the backend-aware semantic index. External failures latch off
per process and fall back to hash:v1, so writes never block on an embedding
service. recall semantic reindex rebuilds under the active backend.
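The latch-off behavior can be sketched with the HTTP call abstracted as an injected `remote` function. The FNV-1a feature-hashing stand-in for hash:v1 is an assumption for illustration, not the real hash:v1 scheme, and `withLatchingFallback` is a hypothetical name.

```typescript
type Embedder = (text: string) => Promise<number[]>;

// Hypothetical stand-in for hash:v1: feature-hash tokens into a fixed-size vector.
function hashEmbed(text: string, dims = 64): number[] {
  const v = new Array<number>(dims).fill(0);
  for (const tok of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    let h = 2166136261; // FNV-1a offset basis
    for (let i = 0; i < tok.length; i++) {
      h ^= tok.charCodeAt(i);
      h = Math.imul(h, 16777619) >>> 0; // FNV prime, kept as unsigned 32-bit
    }
    v[h % dims] += 1;
  }
  return v;
}

// The first remote failure latches the backend off for the rest of the process.
function withLatchingFallback(remote: Embedder): Embedder & { latched: () => boolean } {
  let off = false;
  const embed = async (text: string) => {
    if (!off) {
      try {
        return await remote(text);
      } catch {
        off = true; // stop retrying so later writes never wait on a dead service
      }
    }
    return hashEmbed(text);
  };
  return Object.assign(embed, { latched: () => off });
}
```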
Closed-loop calibration v1: recall calibration scores each writing actor's
stated confidence against survived-contradiction outcomes (Brier score,
contradicted rate, overconfidence signal). Only contradicts references that
resolve to actual cells count toward the score.
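A sketch of the three per-actor metrics. It treats a cell as contradicted (outcome 0) when at least one of its contradicts references resolves to an existing cell, and as survived (outcome 1) otherwise; the `Outcome` shape and the `cellExists` lookup are hypothetical.

```typescript
// Hypothetical input: one record per cell written by an actor.
type Outcome = { actor: string; stated: number; contradictedBy: string[] };

type Acc = { n: number; brier: number; wrong: number; wrongConf: number };

function calibrate(outcomes: Outcome[], cellExists: (id: string) => boolean) {
  const byActor = new Map<string, Acc>();
  for (const o of outcomes) {
    // Only contradicts references that resolve to real cells count.
    const contradicted = o.contradictedBy.some((id) => cellExists(id));
    const outcome = contradicted ? 0 : 1; // 1 = survived
    const s = byActor.get(o.actor) ?? { n: 0, brier: 0, wrong: 0, wrongConf: 0 };
    s.n += 1;
    s.brier += (o.stated - outcome) ** 2; // squared error of the stated probability
    if (contradicted) {
      s.wrong += 1;
      s.wrongConf += o.stated;
    }
    byActor.set(o.actor, s);
  }
  const report = new Map<string, { brier: number; contradictedRate: number; overconfidence: number }>();
  byActor.forEach((s, actor) => {
    const contradictedRate = s.wrong / s.n;
    const meanConfWhenWrong = s.wrong ? s.wrongConf / s.wrong : 0;
    report.set(actor, {
      brier: s.brier / s.n,
      contradictedRate,
      overconfidence: contradictedRate * meanConfWhenWrong,
    });
  });
  return report;
}
```

An actor who is never contradicted, or who was wrong only at low stated confidence, ends up with an overconfidence signal near 0.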
Agent coordination (ACP): a durable agent-to-agent request queue over the same
store — recall acp send / list / show / process / run plus matching MCP tools.
Operator runs (recall operate once/list/show), workflow allocation
(recall workflow allocate), pages (recall page), storage stats
(recall storage), trust and beliefs reports, blind locks, and compaction.
MCP server idle self-exit: the stdio server shuts down after an idle period
(default 30 minutes, RECALL_MCP_IDLE_EXIT_MS, 0 disables) so abandoned
spawns no longer accumulate; clients respawn on demand.
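A sketch of the idle watchdog, assuming the server calls `touch()` on every incoming request. `idleExitMs`, `idleWatchdog`, and falling back to the default for unparsable values are hypothetical.

```typescript
const DEFAULT_IDLE_MS = 30 * 60 * 1000; // 30 minutes

// Parse the configured idle window; 0 disables, bad values use the default (assumption).
function idleExitMs(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_IDLE_MS;
  const n = Number(raw);
  return Number.isFinite(n) && n >= 0 ? n : DEFAULT_IDLE_MS;
}

// Re-arm a single timer on every request; with ms === 0 no timer is ever set.
function idleWatchdog(ms: number, onIdle: () => void) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const touch = () => {
    if (ms === 0) return;
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(onIdle, ms);
  };
  const stop = () => {
    if (timer !== undefined) clearTimeout(timer);
    timer = undefined;
  };
  touch(); // arm immediately at startup
  return { touch, stop };
}
```

In a stdio server the callback would end the process, for example `idleWatchdog(idleExitMs(process.env.RECALL_MCP_IDLE_EXIT_MS), () => process.exit(0))`.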
Public benchmark harness: npm run bench and npm run bench:public against a
reproducible synthetic corpus — see docs/19_PUBLIC_BENCHMARK.md.
Adversarial retrieval-quality test gate: IDF, stemming, graph prior, recency
decay, and literal code-symbol matching are pinned by tests.

Fixed

Python toolkit, JS/TS code extractor: exported const data bindings
(export const PLANS = {...}) are now extracted as const-data symbol
cells; previously only functions and classes were captured, leaving the
most change-sensitive symbols (catalogs, configs) invisible to
subgraph --entity queries.
Python toolkit, code linker: JS import specifiers such as ./plans.mjs were
split on dots Python-style and resolved to the file extension, so
code-imports hyperedges were never created for JS/TS projects; module file
stems also kept their .mjs/.ts suffixes. Both sides now share language-aware
stem derivation (relative paths, node: builtins, bare packages, Python dotted
modules).
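A sketch of language-aware stem derivation covering the four cases listed. The `Resolved` type, the extension list, and the scoped-package handling are assumptions rather than the toolkit's actual rules (the toolkit itself is Python; TypeScript is used here for consistency with the other sketches).

```typescript
type Resolved =
  | { kind: "module"; stem: string } // candidate for a code-imports edge
  | { kind: "builtin"; name: string } // node: builtins, never linked to files
  | { kind: "package"; name: string }; // bare npm packages, external

// Strip directories and a trailing source extension: "src/plans.mjs" -> "plans".
function fileStem(path: string): string {
  const base = path.split("/").pop() ?? path;
  return base.replace(/\.(mjs|cjs|js|jsx|ts|tsx|mts|cts|py)$/, "");
}

function importStem(spec: string, lang: "js" | "py"): Resolved {
  if (lang === "py") {
    // Python dotted module: "app.billing.plans" -> "plans"; leading relative dots drop out.
    const parts = spec.split(".").filter(Boolean);
    return { kind: "module", stem: parts[parts.length - 1] ?? spec };
  }
  if (spec.startsWith("node:")) return { kind: "builtin", name: spec };
  if (spec.startsWith("./") || spec.startsWith("../") || spec.startsWith("/")) {
    // Path-style specifier: never dot-split, so the extension cannot become the stem.
    return { kind: "module", stem: fileStem(spec) };
  }
  // Bare package: keep "@scope/name" for scoped packages, else the first segment.
  const segs = spec.split("/");
  const name = spec.startsWith("@") ? segs.slice(0, 2).join("/") : segs[0];
  return { kind: "package", name };
}
```

Because both the import side and the file side go through `fileStem`, ./plans.mjs and the file plans.mjs meet at the same stem.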
Python toolkit, code linker: link discovery now targets only the newest
cell generation per title (older generations are kept for audit by
--rebuild supersedure but previously duplicated every discovered edge);
--include-superseded restores the old behavior.
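A sketch of newest-generation selection, assuming each code cell exposes a numeric `generation` (a hypothetical field standing in for however --rebuild supersedure orders generations).

```typescript
type CodeCell = { id: string; title: string; generation: number };

// Keep only the newest generation per title as a link target.
function newestPerTitle(cells: CodeCell[], includeSuperseded = false): CodeCell[] {
  if (includeSuperseded) return cells; // old behavior: every generation is a target
  const newest = new Map<string, CodeCell>();
  for (const c of cells) {
    const cur = newest.get(c.title);
    if (!cur || c.generation > cur.generation) newest.set(c.title, c);
  }
  return Array.from(newest.values());
}
```

Older generations stay in the store for audit; they are simply not offered to link discovery, so each edge is found once.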
Python toolkit, JS/TS extractor: --rebuild now warns when code cells
reference paths missing from the scanned tree (renamed or deleted files
leave stale active cells; detection-only, retirement semantics planned).
New DB-free regression suite: python/tests/toolkit_unit_tests.py
(19 checks pinning the fixes above).

Changed

Lexical retrieval rebuilt on SQLite FTS5 + BM25 (porter stemming, IDF), with
hybrid ranking fusing graph relation degree, calibrated confidence, and
recency as an exponential decay rather than a sort key. Falls back to LIKE search
on SQLite builds without FTS5; compiler_state reports the active backend.
The FTS shadow table is trigger-synced, so writes from older binaries keep
the index consistent; existing databases backfill on first open.
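A sketch of the fusion, with hypothetical weights, half-life, and multiplicative combination, since the notes do not publish the formula. One grounded detail: SQLite FTS5's `bm25()` returns lower-is-better (negative) values, so the sketch negates it before fusing.

```typescript
// Hypothetical candidate row: FTS5 bm25() value plus graph and recency features.
type Candidate = { id: string; bm25: number; degree: number; confidence: number; ageDays: number };

// Hypothetical weights and half-life.
const W = { graph: 0.15, confidence: 0.25 };
const HALF_LIFE_DAYS = 30;

function hybridScore(c: Candidate): number {
  const relevance = -c.bm25; // FTS5 bm25(): more negative = better match
  const graphPrior = Math.log1p(c.degree); // diminishing returns on many relations
  const recency = Math.pow(0.5, c.ageDays / HALF_LIFE_DAYS); // exponential decay in (0, 1]
  return relevance * (1 + W.graph * graphPrior + W.confidence * c.confidence) * recency;
}

function rank(cands: Candidate[]): string[] {
  return [...cands].sort((a, b) => hybridScore(b) - hybridScore(a)).map((c) => c.id);
}
```

Treating recency as a decay factor rather than a sort key lets a strong older match still outrank a weak fresh one.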
Compile is graph-aware: each selected cell's incoming contradicts/concerns
relations surface in the conflicts section with expansion handles (cap 6 per
cell + overflow marker).
Reference resolution handles the short recall://cell/<id> form everywhere —
relations, compile translation, health findings, calibration; legacy
address-form relation targets migrate to bare node ids on first open.
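A sketch of normalizing references to bare node ids. The accepted bare-id pattern (no whitespace, slashes, or colons) is an assumption.

```typescript
// Map recall://cell/<id> or a bare id to the bare node id; anything else is unresolved.
function toNodeId(ref: string): string | null {
  const s = ref.trim();
  const m = /^recall:\/\/cell\/([^/?#\s]+)$/.exec(s);
  if (m) return m[1];
  return /^[^\s/:]+$/.test(s) ? s : null; // hypothetical bare-id rule
}
```

The same normalizer can serve relations, compile translation, health findings, and calibration, and a one-time migration can rewrite stored address-form targets through it.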
Admission warns on near-duplicate creates (title Jaccard / content cosine vs
active cells), naming the existing cell and suggesting update/supersede;
it also warns (never rejects) when a title exceeds 20 words.
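A sketch of the admission warnings. The thresholds (0.8 title Jaccard, 0.95 content cosine), the tokenizer, and the message wording are hypothetical; only the 20-word title limit comes from the notes.

```typescript
function tokens(s: string): Set<string> {
  return new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
}

function jaccard(a: Set<string>, b: Set<string>): number {
  let inter = 0;
  a.forEach((t) => {
    if (b.has(t)) inter++;
  });
  const union = a.size + b.size - inter;
  return union === 0 ? 0 : inter / union;
}

function cosine(x: number[], y: number[]): number {
  let dot = 0, nx = 0, ny = 0;
  for (let i = 0; i < Math.min(x.length, y.length); i++) {
    dot += x[i] * y[i];
    nx += x[i] * x[i];
    ny += y[i] * y[i];
  }
  return nx === 0 || ny === 0 ? 0 : dot / Math.sqrt(nx * ny);
}

type ActiveCell = { id: string; title: string; vec: number[] };

// Returns warnings only; nothing here rejects the create.
function admissionWarnings(title: string, vec: number[], active: ActiveCell[], maxWords = 20): string[] {
  const warnings: string[] = [];
  const t = tokens(title);
  for (const c of active) {
    if (jaccard(t, tokens(c.title)) >= 0.8 || cosine(vec, c.vec) >= 0.95) {
      warnings.push(`near-duplicate of ${c.id} ("${c.title}"): consider update or supersede`);
    }
  }
  const words = title.trim().split(/\s+/).filter(Boolean).length;
  if (words > maxWords) warnings.push(`title has ${words} words (over ${maxWords})`);
  return warnings;
}
```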
Databases run in WAL journal mode (CLI, MCP server, daemon, and ACP workers
share one file).
Compile packets compress titles (20 words in relevant_memory, 12 in
reference/cell-state/health lines) and cap translated references at 6 per
cell with an overflow marker.
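A sketch of the two compression rules (the same cap-plus-overflow shape applies to the per-cell conflicts list). The "..." truncation mark and the `(+N more)` overflow marker format are hypothetical.

```typescript
// Keep the first maxWords words of a title (20 or 12 depending on the packet section).
function compressTitle(title: string, maxWords: number): string {
  const words = title.trim().split(/\s+/);
  return words.length <= maxWords
    ? words.join(" ")
    : words.slice(0, maxWords).join(" ") + " ...";
}

// Show at most `cap` references per cell, then one overflow marker.
function capRefs(refs: string[], cap = 6): string[] {
  return refs.length <= cap ? refs : [...refs.slice(0, cap), `(+${refs.length - cap} more)`];
}
```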
Docs: 06_HYPEREDGE_PROGRAMS.md reframed as 06_ADVANCED_GRAPH_OPERATIONS.md,
14_ADDRESSABLE_CELLS_AND_HYPERNETWORKS.md renamed to
14_ADDRESSABLE_CELLS_AND_GRAPH_VIEWS.md, and 19_PUBLIC_BENCHMARK.md added.
Test suite grew to 99 unit/integration tests and 94 end-to-end checks.

Full Changelog: v0.1.0...v0.2.0
