
release: v0.7.0 — Grounded Retrieval & Provenance (dev → main)#104

Merged
ApiliumDevTeam merged 72 commits into main from dev
Jul 1, 2026

@ApiliumDevTeam


Brings main up to date with dev — the v0.7.0 (Grounded Retrieval & Provenance) line of work, already tagged and released.

Highlights

  • Grounded retrieval (service::ground): cited, provenance-backed context with an explicit groundedness signal (grounded / weak / ungrounded).
  • Deterministic ingestion (aingle_ingest): documents → provenanced triples + text chunks.
  • Provenance anchoring: retrieved units carry provenance_anchor (signed DAG action hash).
  • Neural embeddings for semantic chunk retrieval.
  • MCP tools: aingle_ingest, aingle_ground, aingle_sources, aingle_backlinks, aingle_note_context.
  • Typed neighborhood graph (local_graph) + semantic edges in vault_map, with performance work.
  • Version bump to 0.7.0 across the active crates; refreshed brand logo; README section documenting the grounded-retrieval pipeline.

Verified with cargo check --workspace. Release: https://github.com/ApiliumCode/aingle/releases/tag/v0.7.0

ingest_path only deleted the old source-hash registry triple when a file
changed, leaving prior structural triples and Ineru chunks in place, so
ground() could surface stale content after an edit. Add purge_source to
delete every triple with subject==rel_path and forget every chunk whose
metadata.source==rel_path before re-writing the file's fresh data.
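
The purge step described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the project's code: `Store`, `Triple`, and `Chunk` are hypothetical stand-ins for the real triple store and Ineru chunk memory, and the return value is an assumption added for testing.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a stored triple.
#[derive(Debug, Clone, PartialEq)]
pub struct Triple {
    pub subject: String,
    pub predicate: String,
    pub object: String,
}

/// Hypothetical stand-in for a text chunk; `metadata["source"]`
/// mirrors `metadata.source` from the commit message.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub text: String,
    pub metadata: HashMap<String, String>,
}

/// Hypothetical in-memory store holding both kinds of data.
#[derive(Default)]
pub struct Store {
    pub triples: Vec<Triple>,
    pub chunks: Vec<Chunk>,
}

impl Store {
    /// Drop every triple whose subject is `rel_path` and every chunk whose
    /// `metadata.source` is `rel_path`.
    /// Returns `(triples_removed, chunks_removed)`.
    pub fn purge_source(&mut self, rel_path: &str) -> (usize, usize) {
        let triples_before = self.triples.len();
        let chunks_before = self.chunks.len();
        self.triples.retain(|t| t.subject != rel_path);
        self.chunks
            .retain(|c| c.metadata.get("source").map(String::as_str) != Some(rel_path));
        (
            triples_before - self.triples.len(),
            chunks_before - self.chunks.len(),
        )
    }
}
```

Calling `purge_source` before re-writing a changed file means the fresh triples and chunks replace the old ones instead of accumulating next to them.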
The placeholder hashing embedder can score an unrelated query's top chunk
above GROUND_HIGH, yielding a false 'grounded'. Require at least two strong
(>= GROUND_HIGH) chunks before declaring 'grounded'; a lone strong chunk is
downgraded to 'weak' with an explicit gap. Structural corroboration guard,
not a threshold tweak — revisit once a real embedder replaces the placeholder.
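
A minimal sketch of the corroboration guard, assuming chunk scores arrive as a plain slice. The threshold values and the `classify` function are illustrative assumptions; only the "two strong chunks" rule and the lone-chunk downgrade come from the commit message.

```rust
/// Groundedness signal as named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Groundedness {
    Grounded,
    Weak,
    Ungrounded,
}

/// Illustrative values only; the real thresholds are not stated here.
pub const GROUND_HIGH: f32 = 0.80;
pub const GROUND_LOW: f32 = 0.50;

/// Classify a result set and optionally report a gap.
/// Rule from the commit: at least two chunks >= GROUND_HIGH are needed
/// for `Grounded`; a lone strong chunk is downgraded to `Weak` with a gap.
/// The GROUND_LOW branch is an assumption about the remaining cases.
pub fn classify(scores: &[f32]) -> (Groundedness, Option<&'static str>) {
    let strong = scores.iter().filter(|&&s| s >= GROUND_HIGH).count();
    let any_relevant = scores.iter().any(|&s| s >= GROUND_LOW);
    match (strong, any_relevant) {
        (n, _) if n >= 2 => (Groundedness::Grounded, None),
        (1, _) => (
            Groundedness::Weak,
            Some("only one strong chunk; no corroborating evidence"),
        ),
        (_, true) => (Groundedness::Weak, None),
        _ => (Groundedness::Ungrounded, Some("no relevant chunks found")),
    }
}
```

Counting strong chunks rather than raising `GROUND_HIGH` keeps the guard independent of how a particular embedder distributes its scores.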
…lf-link tests

- Replace vec-cloning cosine() with a borrow-based implementation (no heap allocation per call)
- Replace std::ptr::eq identity trick in mean_sim() with explicit index comparison
- Build BTreeSet<&str> once in derive_structural for O(log n) note membership checks
- Add comment in vault_map_cached explaining intentional mutex release before .await
- Tests: self-link skip assertion, type_counts assertion, cache invalidation test
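
For illustration, a borrow-based cosine and an index-compared mean similarity could look like the sketch below. The signatures are assumptions; in particular, the exact semantics of the project's `mean_sim()` are not shown in this PR, so the version here (mean over all ordered pairs, skipping `i == j`) is only one plausible reading.

```rust
/// Cosine similarity over borrowed slices: no cloning, no heap allocation.
/// Returns 0.0 for mismatched lengths, empty input, or zero-norm vectors.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let (mut dot, mut norm_a, mut norm_b) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    let denom = norm_a.sqrt() * norm_b.sqrt();
    if denom == 0.0 {
        0.0
    } else {
        dot / denom
    }
}

/// Mean pairwise similarity, skipping self-pairs by comparing indices
/// instead of comparing pointers with `std::ptr::eq`.
pub fn mean_sim(vectors: &[Vec<f32>]) -> f32 {
    let mut sum = 0.0f32;
    let mut pairs = 0u32;
    for i in 0..vectors.len() {
        for j in 0..vectors.len() {
            if i == j {
                continue; // explicit index comparison
            }
            sum += cosine(&vectors[i], &vectors[j]);
            pairs += 1;
        }
    }
    if pairs == 0 {
        0.0
    } else {
        sum / pairs as f32
    }
}
```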
…k 1)

Add service::context with note_context() / note_context_cached(): for an
active note, retrieve the top-N semantically related notes (by embedding
cosine similarity) even when never explicitly linked, each annotated with
the best matching passage (char-safe ≤200), a signed DAG provenance anchor
(cfg dag), and an already_linked flag. Gate on SEMANTIC_MIN_DIMS=128 so the
64-d hash embedder short-circuits gracefully. Cache keyed on
(triple_count, total_memory_bytes) — same invalidation signal as
vault_map_cached. Six unit tests: semantic gate, ranking, passage truncation,
node-object links_to, _maps/ exclusion, dag provenance path.
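
Two details above lend themselves to a short sketch: the dimension gate and the char-safe passage limit. The helper names `semantic_enabled` and `truncate_chars` and the constant `PASSAGE_MAX_CHARS` are hypothetical; only `SEMANTIC_MIN_DIMS = 128` and the 200-character bound come from the commit message.

```rust
/// Minimum embedding width for semantic neighbors (value from the commit).
pub const SEMANTIC_MIN_DIMS: usize = 128;

/// Hypothetical name for the 200-character passage bound.
pub const PASSAGE_MAX_CHARS: usize = 200;

/// Gate: a 64-d hash embedder fails this check, so the caller can return
/// an empty neighbor list instead of ranking on weak vectors.
pub fn semantic_enabled(embedding_dims: usize) -> bool {
    embedding_dims >= SEMANTIC_MIN_DIMS
}

/// Truncate to at most `max_chars` characters without slicing through a
/// multi-byte UTF-8 sequence (byte slicing like `&s[..200]` can panic).
pub fn truncate_chars(text: &str, max_chars: usize) -> &str {
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &text[..byte_idx],
        None => text, // already short enough
    }
}
```

Counting characters rather than bytes matters for the multilingual notes this feature targets, where accented letters occupy more than one byte.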
Replace the no-op let _ = ... in provenance_present_when_signed with a
real assertion. The test now generates a DagSigningKey, signs a Custom
DAG action whose subject matches the neighbor note path, puts it in the
store, then asserts n.provenance_anchor.is_some() after note_context.
Items from code review:

1. Cache tests: add note_context_cached_hit_and_invalidation (locks
   cache-hit and version-change recompute) and cache_cap_clears_when_exceeded.
   Optional nit: no_chunks_falls_back_to_basename_and_never_self_matches.

2. Provenance for survivors only: build the neighbor Vec without
   provenance_anchor, sort+truncate to `limit`, then fill provenance
   for survivors. Cuts up to ~48 DAG reads to just `limit` reads.

3. Don't clone all memory for query text: read STM+LTM separately into
   two Vecs (no combined Vec), filter to note while iterating.

4. Avoid eager clone in per-source dedupe: match on btree_map::Entry so
   text.to_string() is called only on Vacant insert or score-improvement
   replace, never on occupied+same-score iterations.

5. Bound note_context_cache growth: clear map before insert when len
   exceeds 256. Documented with inline comment.

6. Lift obj_string into shared triple_util module: was duplicated
   verbatim in backlinks, context, and vault_map. Create
   service/triple_util.rs, register in mod.rs, replace all three copies.
   provenance_anchor_for duplication left as-is (dag-gated, separate spec).

All tests pass: context 9/9, backlinks 5/5, vault_map 7/7, total 228/228.
No new clippy warnings.
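
Item 4 above (per-source dedupe via `btree_map::Entry`) can be sketched as below. The function name `best_per_source` and the `(source, text, score)` tuple shape are assumptions for illustration; the point is that `text.to_string()` runs only on a vacant slot or a strictly better score.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Keep the best-scoring passage per source.
/// `hits` holds hypothetical `(source, text, score)` tuples.
pub fn best_per_source<'a>(
    hits: &[(&'a str, &str, f32)],
) -> BTreeMap<&'a str, (String, f32)> {
    let mut best: BTreeMap<&'a str, (String, f32)> = BTreeMap::new();
    for &(source, text, score) in hits {
        match best.entry(source) {
            // First hit for this source: allocate once.
            Entry::Vacant(slot) => {
                slot.insert((text.to_string(), score));
            }
            // Later hit: allocate only when the score improves.
            Entry::Occupied(mut slot) => {
                if score > slot.get().1 {
                    *slot.get_mut() = (text.to_string(), score);
                }
            }
        }
    }
    best
}
```

A single `entry` call also avoids the double lookup of a `get` followed by an `insert`.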
…th e5 embedder

Adds neural_note_context_finds_same_topic gated on the neural-embeddings
feature: bootstraps a real multilingual-e5-small embedder, ingests two
same-topic Spanish dog-care notes and one off-topic elections note, and
asserts the sibling note is found as a semantic neighbor (with passage)
while the off-topic note stays below the relevance floor (low=0.77).
Mirrors the neural_grounding_is_topical acceptance test in ground.rs.
…s, fix cache key

Three integration fixes in the context / backlinks / cache layer:

**Fix 1 — NEIGHBOR_FLOOR (neural calibration)**
Introduce `NEIGHBOR_FLOOR = 0.88` in `service/context.rs`. multilingual-e5
assigns a cosine baseline of ~0.83 to any same-language text, making the
embedder's grounding `low` threshold (0.77) too permissive for note-to-note
neighbor selection. `NEIGHBOR_FLOOR` mirrors `vault_map::SEMANTIC_THRESHOLD`
(related notes ~0.90+, unrelated ~0.81-0.83). Removes the now-unused
`relevance_thresholds()` call. Neural test `neural_note_context_finds_same_topic`
now passes: perros2.md (0.93) is a neighbor; elecciones.md (0.83) is excluded.
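
As a rough illustration of how the floor interacts with ranking, the sketch below filters scored candidates by `NEIGHBOR_FLOOR`, sorts by descending cosine, and truncates to `limit`. The function `select_neighbors` and its `(path, score)` input are hypothetical; the constant and the example scores are the ones quoted above.

```rust
/// Floor for note-to-note neighbors (value from the commit message).
pub const NEIGHBOR_FLOOR: f32 = 0.88;

/// Keep candidates at or above the floor, best first, at most `limit`.
/// Ties are broken by path so the output is deterministic.
pub fn select_neighbors(mut scored: Vec<(String, f32)>, limit: usize) -> Vec<(String, f32)> {
    scored.retain(|(_, score)| *score >= NEIGHBOR_FLOOR);
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    scored.truncate(limit);
    scored
}
```

With the scores reported for the neural test, `perros2.md` (0.93) survives the filter and `elecciones.md` (0.83) does not, even though 0.83 is above the grounding `low` threshold of 0.77.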

**Fix 2 — path-qualified wikilink resolution (I-1)**
Add `resolve_link_target` to `service/triple_util.rs`. Resolution order mirrors
the editor's `wikilinks.ts`: (1) exact path match; (2) path-qualified target
(`[[dir/note]]`) matched by path-without-ext scan — prevents the collision where
`b/note` wrongly collapsed to the alphabetically-first `a/note.md`; (3) basename
fallback. Three unit tests lock the fix. Both `context.rs` and `backlinks.rs`
inline `resolve` closures replaced with the shared helper.
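
The three-step resolution order might be sketched as follows. This is an approximation under stated assumptions: notes are given as a sorted slice of paths, only the `.md` extension is stripped, and a path-qualified target that matches nothing still falls through to the basename step. The real `resolve_link_target` signature is not shown in this PR.

```rust
/// Strip a trailing `.md` extension if present (assumption: notes are `.md`).
fn strip_ext(path: &str) -> &str {
    path.strip_suffix(".md").unwrap_or(path)
}

/// Last path segment.
fn basename(path: &str) -> &str {
    path.rsplit('/').next().unwrap_or(path)
}

/// Resolve a wikilink target against known note paths (assumed sorted).
pub fn resolve_link_target<'a>(target: &str, notes: &'a [String]) -> Option<&'a str> {
    // (1) Exact path match.
    if let Some(n) = notes.iter().find(|n| n.as_str() == target) {
        return Some(n.as_str());
    }
    // (2) Path-qualified target such as `dir/note`: compare against each
    //     note path with its extension removed.
    if target.contains('/') {
        if let Some(n) = notes.iter().find(|n| strip_ext(n.as_str()) == target) {
            return Some(n.as_str());
        }
    }
    // (3) Basename fallback: first note (in slice order) with that basename.
    let wanted = strip_ext(basename(target));
    notes
        .iter()
        .find(|n| strip_ext(basename(n.as_str())) == wanted)
        .map(String::as_str)
}
```

Running step (2) before step (3) is what keeps `[[b/note]]` from collapsing to `a/note.md`.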

**Fix 3 — cache key includes `limit` (M-2)**
`note_context_cache` map key changed from `String` (note path) to
`(String, usize)` (note path, limit) in `state.rs` and `context::note_context_cached`.
MCP calls with different limits now cache independently; the `cache_cap_clears_when_exceeded`
test updated to use the new key type.
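
A simplified sketch of a cache keyed on `(note path, limit)` with the clear-on-overflow bound from the earlier review item. The wrapper struct, its methods, and the constant name are hypothetical; the real cache lives in `AppState` and also tracks the `(triple_count, total_memory_bytes)` version, which is omitted here.

```rust
use std::collections::HashMap;

/// Bound from the review item: clear the map before insert once it holds
/// more than this many entries. The constant name is hypothetical.
pub const NOTE_CONTEXT_CACHE_CAP: usize = 256;

/// Hypothetical wrapper around a map keyed by `(note path, limit)`.
pub struct NoteContextCache<V> {
    entries: HashMap<(String, usize), V>,
}

impl<V: Clone> NoteContextCache<V> {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Look up a result for this exact `(path, limit)` pair.
    pub fn get(&self, path: &str, limit: usize) -> Option<V> {
        self.entries.get(&(path.to_string(), limit)).cloned()
    }

    /// Insert, clearing everything first if the map has outgrown the cap.
    pub fn insert(&mut self, path: &str, limit: usize, value: V) {
        if self.entries.len() > NOTE_CONTEXT_CACHE_CAP {
            self.entries.clear();
        }
        self.entries.insert((path.to_string(), limit), value);
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Clearing the whole map is cruder than LRU eviction, but it needs no extra bookkeeping and still bounds memory.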
Adds service::local_graph with LocalGraph, GNode, TypedEdge types and
local_graph / local_graph_cached async functions. Produces per-note BFS
neighborhoods (depth ≤ 2) with three typed edge kinds: explicit wikilinks
("link"), semantic neighbors via note_context ("semantic", with signed
provenance anchors under dag feature), and shared-tag connections ("tag").
Caps at 80 nodes, dedupes symmetric edges, excludes _maps/ paths.

Also makes NEIGHBOR_FLOOR pub in service::context so local_graph can
reuse the same threshold, and adds local_graph_cache field to AppState
(mirroring note_context_cache) in all four constructors.

7 tests: link_edge_from_wikilink, semantic_edge_from_neighbor,
semantic_edge_carries_provenance (dag), tag_edge_from_shared_tag,
hash_embedder_omits_semantic, maps_excluded, caps_respected — all pass
with and without --features dag.
…ble sort

- Replace note_context with note_context_cached in the BFS semantic pass
  to avoid redundant HNSW queries for the same note across depths.
- Pre-index links into by_src/by_dst BTreeMaps for O(log n) per-node lookup;
  eliminates the O(links * frontier) linear scan per BFS level.
- Add SEM_FRONTIER_CAP (16): sort+truncate the next_frontier before
  promoting, bounding depth-2 semantic cost to at most 16 cached calls.
- Extend seen_sym key to (lo, hi, kind, tag_label) so two notes sharing
  two distinct tags produce two distinct tag edges instead of one.
- Sort final_edges by (source, target, kind) for stable cross-run output.
- Add 7 new tests (8-14): depth traversal, incoming links, link+semantic
  coexistence, symmetric dedup, cache hit/invalidation, cap eviction,
  and frontier-cap perf guard.
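
The widened dedupe key and the stable sort can be illustrated with the sketch below. The `TypedEdge` fields are assumptions (the PR names the type but not its layout), and for simplicity every edge kind is treated as undirected here, which may not match how explicit links are handled in `local_graph`.

```rust
use std::collections::BTreeSet;

/// Hypothetical field layout for a typed edge.
#[derive(Debug, Clone, PartialEq)]
pub struct TypedEdge {
    pub source: String,
    pub target: String,
    pub kind: String,
    /// Tag label for "tag" edges, `None` otherwise.
    pub tag_label: Option<String>,
}

/// Drop symmetric duplicates using the key (lo, hi, kind, tag_label),
/// then sort by (source, target, kind) for stable output across runs.
pub fn dedupe_and_sort(edges: Vec<TypedEdge>) -> Vec<TypedEdge> {
    let mut seen_sym: BTreeSet<(String, String, String, Option<String>)> = BTreeSet::new();
    let mut out = Vec::new();
    for e in edges {
        // Order endpoints so (a, b) and (b, a) share one key.
        let (lo, hi) = if e.source <= e.target {
            (e.source.clone(), e.target.clone())
        } else {
            (e.target.clone(), e.source.clone())
        };
        if seen_sym.insert((lo, hi, e.kind.clone(), e.tag_label.clone())) {
            out.push(e);
        }
    }
    out.sort_by(|a, b| {
        (&a.source, &a.target, &a.kind).cmp(&(&b.source, &b.target, &b.kind))
    });
    out
}
```

Because the tag label is part of the key, two notes sharing tags `x` and `y` keep two tag edges, while a mirrored copy of either edge is dropped.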
…from clustering

- GraphEdge gains pub kind: String; existing links_to edges get kind='link'
- cluster_semantic now returns (Vec<Topic>, Vec<(String, String, f32)>),
  capturing cosine pairs during the union-find pass at no extra O(n^2) cost
- compute_vault_map emits kind='semantic' edges for pairs that cleared
  SEMANTIC_THRESHOLD (0.88), filtered to the rendered node set (GRAPH_NODE_CAP),
  deduplicated order-insensitively, skipped when a link edge already exists,
  and capped at SEMANTIC_EDGE_CAP = 1200 (highest-cosine pairs kept)
- totals.links continues to count only explicit wikilinks (s.link_count)
- Tests: link_edges_have_link_kind, clustering_emits_semantic_edges (with
  no-dup-when-also-linked variant), totals_links_counts_only_explicit,
  and updated semantic_clusters_group_similar_notes for new return type
  (10/10 green)
… triple

Surface the note creation date on graph nodes so the Akashi UI can animate
a chronological timelapse. GraphNode (vault map) and GNode (local graph) gain
an optional `timestamp` field populated from the note's `created` triple;
falls back to `date` when `created` is absent. Nodes without either triple
carry `None`. Covered by two new TDD tests (vault_map + local_graph).
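
The fallback order can be sketched in a few lines. The triple representation as `(subject, predicate, object)` string tuples and the function name `note_timestamp` are assumptions for illustration only.

```rust
/// Return the note's `created` object, else its `date` object, else `None`.
/// `triples` is a hypothetical flat list of (subject, predicate, object).
pub fn note_timestamp(triples: &[(&str, &str, &str)], note: &str) -> Option<String> {
    let lookup = |predicate: &str| {
        triples
            .iter()
            .find(|(s, p, _)| *s == note && *p == predicate)
            .map(|(_, _, o)| o.to_string())
    };
    // `or_else` keeps the `date` lookup lazy: it runs only without `created`.
    lookup("created").or_else(|| lookup("date"))
}
```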
Replace global pair-emission with per-node top-SEMANTIC_EDGES_PER_NODE=3 selection.
Final edge set is the UNION of each node's top-3 highest-cosine partners so
strongly similar pairs survive if either endpoint nominates the other.

For a fully-similar N-note vault edges drop from C(N,2) to ~N*3,
e.g. 63 notes: ~1953 edges before, ~183 after. totals.links unchanged.
SEMANTIC_EDGE_CAP stays as a final safety cap.

Extract pure helper top_k_semantic_pairs() with two new tests:
  - top_k_semantic_pairs_selects_union: unit test, k=1 prunes weakest pair
  - per_node_top_k_reduces_hairball: 5-note all-similar graph shows
    (n3.md,n4.md) absent and total==9 < C(5,2)=10; 13 tests green.
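
One way to express the per-node top-k union is sketched below. It is a guess at the shape of `top_k_semantic_pairs()`, assuming each undirected pair appears once in the input as `(a, b, cosine)`; the project's actual signature and tie-breaking may differ.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Keep a pair if either endpoint ranks it among its own top `k` partners.
/// Input pairs are assumed unique and undirected.
pub fn top_k_semantic_pairs(
    pairs: &[(String, String, f32)],
    k: usize,
) -> Vec<(String, String, f32)> {
    // For each node, the pairs it takes part in as (pair index, score).
    let mut by_node: BTreeMap<&str, Vec<(usize, f32)>> = BTreeMap::new();
    for (i, (a, b, score)) in pairs.iter().enumerate() {
        by_node.entry(a.as_str()).or_default().push((i, *score));
        by_node.entry(b.as_str()).or_default().push((i, *score));
    }
    // Union of every node's k best pairs; the set removes double nominations.
    let mut keep: BTreeSet<usize> = BTreeSet::new();
    for partners in by_node.values_mut() {
        partners.sort_by(|x, y| y.1.total_cmp(&x.1).then(x.0.cmp(&y.0)));
        keep.extend(partners.iter().take(k).map(|(i, _)| *i));
    }
    keep.into_iter().map(|i| pairs[i].clone()).collect()
}
```

Each node nominates at most `k` pairs, so the result holds at most `N * k` edges, compared with `C(N, 2)` when every similar pair is emitted; for 63 notes that is a bound of 189 against 1953.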
…s-reference

Add `content_hash: Option<String>` to `DagActionDto` and populate it in
`action_to_dto`: extracted from the first provenanced triple in a
`TripleInsert` payload, or from the first `TripleInsert` inside a `Batch`.
All other payload variants yield `None`.

Covered by five unit tests (RED→GREEN verified):
- TripleInsert with provenance → Some("deadbeef")
- Batch containing provenanced TripleInsert → Some("cafebabe")
- TripleInsert without provenance → None
- Genesis → None
- Noop → None
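
The extraction rule can be sketched with a simplified payload type. `DagPayload`, `Triple`, and `content_hash_of` are hypothetical stand-ins (the real DAG payload has more variants and fields), and the Batch branch assumes that only the first `TripleInsert` is consulted, even if it carries no provenance.

```rust
/// Hypothetical triple carrying an optional provenance content hash.
pub struct Triple {
    pub subject: String,
    pub content_hash: Option<String>,
}

/// Simplified stand-in for the DAG action payload.
pub enum DagPayload {
    Genesis,
    Noop,
    TripleInsert { triples: Vec<Triple> },
    Batch(Vec<DagPayload>),
}

/// Hash of the first provenanced triple, if any.
fn first_hash(triples: &[Triple]) -> Option<String> {
    triples.iter().find_map(|t| t.content_hash.clone())
}

/// Mirror of the rule described above: TripleInsert yields its first
/// provenanced hash, Batch defers to its first TripleInsert, and every
/// other variant yields None.
pub fn content_hash_of(payload: &DagPayload) -> Option<String> {
    match payload {
        DagPayload::TripleInsert { triples } => first_hash(triples),
        DagPayload::Batch(ops) => ops
            .iter()
            .find_map(|op| match op {
                DagPayload::TripleInsert { triples } => Some(first_hash(triples)),
                _ => None,
            })
            .flatten(),
        _ => None,
    }
}
```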
…arse_hex32, DEFAULT_HISTORY_LIMIT

A1: add VaultMapCache/NoteContextCache/LocalGraphCache type aliases in state.rs
A2: replace vec![…] with array literals in triple_util tests
A3: extract parse_hex32() helper in rest/dag.rs, replacing two inline loops
B1: make triple_util::basename pub(crate); delete duplicates from backlinks/vault_map
B2: move provenance_anchor_for to triple_util as pub(crate) async fn; delete from backlinks/context
B3: add pub(crate) fn strip_brackets to triple_util; replace all inline bracket-strip closures
B4: add DEFAULT_HISTORY_LIMIT const in service/dag.rs; update rest/dag.rs + mcp/server.rs wrappers
D: add #[inline] to obj_string; add doc comment to action_to_dto; obj_string uses strip_brackets
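
For reference, a helper in the spirit of A3 could look like the sketch below. The signature (returning `Option<[u8; 32]>`) and the validation rules are assumptions; the real `parse_hex32()` in `rest/dag.rs` may report errors differently.

```rust
/// Parse exactly 64 hex characters into 32 bytes.
/// Returns `None` for any other length or for non-hex input.
pub fn parse_hex32(s: &str) -> Option<[u8; 32]> {
    // Checking for hex digits first keeps the byte slicing below safe and
    // rejects inputs such as "+f", which `from_str_radix` would accept.
    if s.len() != 64 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&s[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(out)
}
```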
@ApiliumDevTeam ApiliumDevTeam merged commit 80b66cb into main Jul 1, 2026
45 checks passed