release: v0.7.0 — Grounded Retrieval & Provenance (dev → main) #104
Merged
…d by create_triple
…s the signed action hash)
ingest_path only deleted the old source-hash registry triple when a file changed, leaving prior structural triples and Ineru chunks in place, so ground() could surface stale content after an edit. Add purge_source to delete every triple with subject==rel_path and forget every chunk whose metadata.source==rel_path before re-writing the file's fresh data.
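The purge step described above can be sketched against in-memory stand-ins. This is a minimal illustration only: the `Triple` and `Chunk` types, their fields, and the `Vec`-backed stores are hypothetical, and the real `purge_source` works against the project's triple store and Ineru memory, whose APIs are not shown in this message.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a stored triple.
#[derive(Debug, Clone, PartialEq)]
struct Triple {
    subject: String,
    predicate: String,
    object: String,
}

/// Hypothetical stand-in for a stored text chunk and its metadata.
#[derive(Debug, Clone)]
struct Chunk {
    text: String,
    metadata: HashMap<String, String>,
}

/// Remove everything previously derived from `rel_path`:
/// triples whose subject is the path, and chunks whose
/// `metadata["source"]` is the path.
/// Returns (triples removed, chunks removed).
fn purge_source(
    triples: &mut Vec<Triple>,
    chunks: &mut Vec<Chunk>,
    rel_path: &str,
) -> (usize, usize) {
    let triples_before = triples.len();
    triples.retain(|t| t.subject != rel_path);

    let chunks_before = chunks.len();
    chunks.retain(|c| c.metadata.get("source").map(String::as_str) != Some(rel_path));

    (triples_before - triples.len(), chunks_before - chunks.len())
}
```

Calling this before re-writing a changed file's data means retrieval cannot return chunks from the previous revision.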
The placeholder hashing embedder can score an unrelated query's top chunk above GROUND_HIGH, yielding a false 'grounded'. Require at least two strong (>= GROUND_HIGH) chunks before declaring 'grounded'; a lone strong chunk is downgraded to 'weak' with an explicit gap. Structural corroboration guard, not a threshold tweak — revisit once a real embedder replaces the placeholder.
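A rough sketch of the corroboration rule, assuming chunk scores arrive as a slice of cosine similarities. The numeric value of `GROUND_HIGH`, the `GROUND_LOW` constant, and the `classify` function name are assumptions for illustration; only the "two strong chunks" rule comes from the message above.

```rust
/// Assumed value; the actual GROUND_HIGH is not stated in this message.
const GROUND_HIGH: f32 = 0.80;
/// Hypothetical lower bound below which a chunk counts as no evidence.
const GROUND_LOW: f32 = 0.50;

#[derive(Debug, PartialEq)]
enum Groundedness {
    Grounded,
    Weak,
    Ungrounded,
}

/// Classify retrieval scores. A single strong chunk is not enough:
/// at least two chunks must reach GROUND_HIGH to count as grounded.
fn classify(scores: &[f32]) -> Groundedness {
    let strong = scores.iter().filter(|&&s| s >= GROUND_HIGH).count();
    if strong >= 2 {
        Groundedness::Grounded
    } else if strong == 1 || scores.iter().any(|&s| s >= GROUND_LOW) {
        // A lone strong chunk is downgraded rather than trusted.
        Groundedness::Weak
    } else {
        Groundedness::Ungrounded
    }
}
```

The guard is structural: it asks for independent agreement between chunks instead of raising the threshold, so it still behaves sensibly if the score distribution shifts with a different embedder.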
…lf-link tests

- Replace vec-cloning cosine() with a borrow-based implementation (no heap allocation per call)
- Replace std::ptr::eq identity trick in mean_sim() with explicit index comparison
- Build BTreeSet<&str> once in derive_structural for O(log n) note membership checks
- Add comment in vault_map_cached explaining intentional mutex release before .await
- Tests: self-link skip assertion, type_counts assertion, cache invalidation test
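For reference, a borrow-based cosine in the spirit of the first bullet might look like the following. It is a sketch, not the project's function: the exact signature and the zero-vector handling are assumptions.

```rust
/// Cosine similarity over borrowed slices; allocates nothing.
/// Returns 0.0 for mismatched lengths, empty input, or a zero vector
/// (an assumed convention, chosen so callers never see NaN).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let (mut dot, mut norm_a, mut norm_b) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a.sqrt() * norm_b.sqrt())
}
```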
…cklinks + vault_map
…k 1)

Add service::context with note_context() / note_context_cached(): for an active note, retrieve the top-N semantically related notes (by embedding cosine similarity) even when never explicitly linked, each annotated with the best matching passage (char-safe ≤200), a signed DAG provenance anchor (cfg dag), and an already_linked flag. Gate on SEMANTIC_MIN_DIMS=128 so the 64-d hash embedder short-circuits gracefully. Cache keyed on (triple_count, total_memory_bytes), the same invalidation signal as vault_map_cached.

Six unit tests: semantic gate, ranking, passage truncation, node-object links_to, _maps/ exclusion, dag provenance path.
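The "char-safe ≤200" passage limit refers to cutting on a character boundary rather than a byte offset. One common way to do that is sketched below; the helper name `truncate_chars` is hypothetical and the project's own truncation code is not shown here.

```rust
/// Return at most `max_chars` characters of `s` without ever slicing
/// through the middle of a multi-byte UTF-8 sequence.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        // `idx` is the byte offset of the first character to drop.
        Some((idx, _)) => &s[..idx],
        // Fewer than or exactly `max_chars` characters: keep everything.
        None => s,
    }
}
```

Slicing with a raw byte index such as `&s[..200]` would panic on text where byte 200 falls inside a character, which is likely with Spanish or other non-ASCII notes.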
Replace the no-op let _ = ... in provenance_present_when_signed with a real assertion. The test now generates a DagSigningKey, signs a Custom DAG action whose subject matches the neighbor note path, puts it in the store, then asserts n.provenance_anchor.is_some() after note_context.
Items from code review:

1. Cache tests: add note_context_cached_hit_and_invalidation (locks cache-hit and version-change recompute) and cache_cap_clears_when_exceeded. Optional nit: no_chunks_falls_back_to_basename_and_never_self_matches.
2. Provenance for survivors only: build the neighbor Vec without provenance_anchor, sort+truncate to `limit`, then fill provenance for survivors. Cuts up to ~48 DAG reads to just `limit` reads.
3. Don't clone all memory for query text: read STM+LTM separately into two Vecs (no combined Vec), filter to note while iterating.
4. Avoid eager clone in per-source dedupe: match on btree_map::Entry so text.to_string() is called only on Vacant insert or score-improvement replace, never on occupied+same-score iterations.
5. Bound note_context_cache growth: clear map before insert when len exceeds 256. Documented with inline comment.
6. Lift obj_string into shared triple_util module: was duplicated verbatim in backlinks, context, and vault_map. Create service/triple_util.rs, register in mod.rs, replace all three copies. provenance_anchor_for duplication left as-is (dag-gated, separate spec).

All tests pass: context 9/9, backlinks 5/5, vault_map 7/7, total 228/228. No new clippy warnings.
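Item 4 can be illustrated with a small per-source dedupe that clones the passage text only when it will actually be stored. This is a sketch under assumptions: the `keep_best` name, the tuple layout, and the map value type are hypothetical, not the code in context.rs.

```rust
use std::collections::btree_map::{BTreeMap, Entry};

/// Keep the best-scoring passage per source. `hits` yields
/// (source, text, score). The text is cloned only on first insert or
/// when a strictly better score replaces the stored one.
/// Note: the key is still allocated once per hit in this sketch.
fn keep_best<'a>(
    hits: impl IntoIterator<Item = (&'a str, &'a str, f32)>,
) -> BTreeMap<String, (f32, String)> {
    let mut best = BTreeMap::new();
    for (source, text, score) in hits {
        match best.entry(source.to_string()) {
            Entry::Vacant(slot) => {
                slot.insert((score, text.to_string()));
            }
            Entry::Occupied(mut slot) => {
                if score > slot.get().0 {
                    slot.insert((score, text.to_string()));
                }
                // Equal or lower score: nothing is cloned.
            }
        }
    }
    best
}
```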
…th e5 embedder

Adds neural_note_context_finds_same_topic gated on the neural-embeddings feature: bootstraps a real multilingual-e5-small embedder, ingests two same-topic Spanish dog-care notes and one off-topic elections note, and asserts the sibling note is found as a semantic neighbor (with passage) while the off-topic note stays below the relevance floor (low=0.77). Mirrors the neural_grounding_is_topical acceptance test in ground.rs.
…s, fix cache key

Three integration fixes in the context / backlinks / cache layer:

**Fix 1: NEIGHBOR_FLOOR (neural calibration)**
Introduce `NEIGHBOR_FLOOR = 0.88` in `service/context.rs`. multilingual-e5 assigns a cosine baseline of ~0.83 to any same-language text, making the embedder's grounding `low` threshold (0.77) too permissive for note-to-note neighbor selection. `NEIGHBOR_FLOOR` mirrors `vault_map::SEMANTIC_THRESHOLD` (related notes ~0.90+, unrelated ~0.81-0.83). Removes the now-unused `relevance_thresholds()` call. Neural test `neural_note_context_finds_same_topic` now passes: perros2.md (0.93) is a neighbor; elecciones.md (0.83) is excluded.

**Fix 2: path-qualified wikilink resolution (I-1)**
Add `resolve_link_target` to `service/triple_util.rs`. Resolution order mirrors the editor's `wikilinks.ts`: (1) exact path match; (2) path-qualified target (`[[dir/note]]`) matched by path-without-ext scan, which prevents the collision where `b/note` wrongly collapsed to the alphabetically-first `a/note.md`; (3) basename fallback. Three unit tests lock the fix. Both `context.rs` and `backlinks.rs` inline `resolve` closures replaced with the shared helper.

**Fix 3: cache key includes `limit` (M-2)**
`note_context_cache` map key changed from `String` (note path) to `(String, usize)` (note path, limit) in `state.rs` and `context::note_context_cached`. MCP calls with different limits now cache independently; the `cache_cap_clears_when_exceeded` test updated to use the new key type.
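The three-step resolution order in Fix 2 can be sketched as below. The signature, the `.md`-only extension handling, and the assumption that `paths` is already sorted alphabetically are all illustrative guesses; the real helper in `service/triple_util.rs` may differ.

```rust
/// Drop a trailing ".md" if present (assumed to be the only note extension).
fn strip_ext(path: &str) -> &str {
    path.strip_suffix(".md").unwrap_or(path)
}

/// Final path component without its extension.
fn basename(path: &str) -> &str {
    strip_ext(path.rsplit('/').next().unwrap_or(path))
}

/// Resolve a wikilink target against known note paths.
/// `paths` is assumed to be sorted, so "first match" is deterministic.
fn resolve_link_target<'a>(target: &str, paths: &'a [&'a str]) -> Option<&'a str> {
    // (1) Exact path match.
    if let Some(p) = paths.iter().copied().find(|p| *p == target) {
        return Some(p);
    }
    // (2) Path-qualified target such as "dir/note": compare against
    //     each path with its extension removed.
    if target.contains('/') {
        if let Some(p) = paths.iter().copied().find(|p| strip_ext(p) == target) {
            return Some(p);
        }
    }
    // (3) Basename fallback: first note whose file stem equals the target.
    paths.iter().copied().find(|p| basename(p) == target)
}
```

With paths `a/note.md` and `b/note.md`, the target `b/note` resolves at step 2 and never reaches the basename fallback, which is where the old collision came from.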
Adds service::local_graph with LocalGraph, GNode, TypedEdge types and
local_graph / local_graph_cached async functions. Produces per-note BFS
neighborhoods (depth ≤ 2) with three typed edge kinds: explicit wikilinks
("link"), semantic neighbors via note_context ("semantic", with signed
provenance anchors under dag feature), and shared-tag connections ("tag").
Caps at 80 nodes, dedupes symmetric edges, excludes _maps/ paths.
Also makes NEIGHBOR_FLOOR pub in service::context so local_graph can
reuse the same threshold, and adds local_graph_cache field to AppState
(mirroring note_context_cache) in all four constructors.
7 tests: link_edge_from_wikilink, semantic_edge_from_neighbor,
semantic_edge_carries_provenance (dag), tag_edge_from_shared_tag,
hash_embedder_omits_semantic, maps_excluded, caps_respected — all pass
with and without --features dag.
…ble sort

- Replace note_context with note_context_cached in the BFS semantic pass to avoid redundant HNSW queries for the same note across depths.
- Pre-index links into by_src/by_dst BTreeMaps for O(log n) per-node lookup; eliminates the O(links * frontier) linear scan per BFS level.
- Add SEM_FRONTIER_CAP (16): sort+truncate the next_frontier before promoting, bounding depth-2 semantic cost to at most 16 cached calls.
- Extend seen_sym key to (lo, hi, kind, tag_label) so two notes sharing two distinct tags produce two distinct tag edges instead of one.
- Sort final_edges by (source, target, kind) for stable cross-run output.
- Add 7 new tests (8-14): depth traversal, incoming links, link+semantic coexistence, symmetric dedup, cache hit/invalidation, cap eviction, and frontier-cap perf guard.
…from clustering

- GraphEdge gains pub kind: String; existing links_to edges get kind='link'
- cluster_semantic now returns (Vec<Topic>, Vec<(String, String, f32)>), capturing cosine pairs during the union-find pass at no extra O(n^2) cost
- compute_vault_map emits kind='semantic' edges for pairs that cleared SEMANTIC_THRESHOLD (0.88), filtered to the rendered node set (GRAPH_NODE_CAP), deduplicated order-insensitively, skipped when a link edge already exists, and capped at SEMANTIC_EDGE_CAP = 1200 (highest-cosine pairs kept)
- totals.links continues to count only explicit wikilinks (s.link_count)
- Tests: link_edges_have_link_kind, clustering_emits_semantic_edges (with no-dup-when-also-linked variant), totals_links_counts_only_explicit, and updated semantic_clusters_group_similar_notes for new return type (10/10 green)
… triple

Surface the note creation date on graph nodes so the Akashi UI can animate a chronological timelapse. GraphNode (vault map) and GNode (local graph) gain an optional `timestamp` field populated from the note's `created` triple; falls back to `date` when `created` is absent. Nodes without either triple carry `None`. Covered by two new TDD tests (vault_map + local_graph).
Replace global pair-emission with per-node top-SEMANTIC_EDGES_PER_NODE=3 selection.
Final edge set is the UNION of each node's top-3 highest-cosine partners so
strongly similar pairs survive if either endpoint nominates the other.
For a fully-similar N-note vault edges drop from C(N,2) to ~N*3,
e.g. 63 notes: ~1953 edges before, ~183 after. totals.links unchanged.
SEMANTIC_EDGE_CAP stays as a final safety cap.
Extract pure helper top_k_semantic_pairs() with two new tests:
- top_k_semantic_pairs_selects_union: unit test, k=1 prunes weakest pair
- per_node_top_k_reduces_hairball: 5-note all-similar graph shows
(n3.md,n4.md) absent and total==9 < C(5,2)=10; 13 tests green.
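The union-of-top-k selection can be sketched as follows. This is an illustration under assumptions: the signature is a guess, each unordered pair is assumed to appear once in the input, and ties are broken by input order, which may not match the actual `top_k_semantic_pairs`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Keep a pair if it is among the `k` highest-cosine pairs of either
/// endpoint. Output preserves input order.
fn top_k_semantic_pairs(
    pairs: &[(String, String, f32)],
    k: usize,
) -> Vec<(String, String, f32)> {
    // For every node, collect (score, pair index) of pairs touching it.
    let mut by_node: BTreeMap<&str, Vec<(f32, usize)>> = BTreeMap::new();
    for (i, (a, b, score)) in pairs.iter().enumerate() {
        by_node.entry(a.as_str()).or_default().push((*score, i));
        by_node.entry(b.as_str()).or_default().push((*score, i));
    }

    // Each node nominates its top-k pairs; the result is the union.
    let mut keep: BTreeSet<usize> = BTreeSet::new();
    for candidates in by_node.values_mut() {
        // Highest score first; earlier input index wins ties.
        candidates.sort_by(|x, y| y.0.total_cmp(&x.0).then(x.1.cmp(&y.1)));
        keep.extend(candidates.iter().take(k).map(|&(_, i)| i));
    }

    keep.into_iter().map(|i| pairs[i].clone()).collect()
}
```

Because a pair survives when either endpoint nominates it, a weakly connected note still keeps its strongest edges even if its partners have better options elsewhere.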
…s-reference
Add `content_hash: Option<String>` to `DagActionDto` and populate it in
`action_to_dto`: extracted from the first provenanced triple in a
`TripleInsert` payload, or from the first `TripleInsert` inside a `Batch`.
All other payload variants yield `None`.
Covered by five unit tests (RED→GREEN verified):
- TripleInsert with provenance → Some("deadbeef")
- Batch containing provenanced TripleInsert → Some("cafebabe")
- TripleInsert without provenance → None
- Genesis → None
- Noop → None
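The extraction rule and its five test cases can be mirrored with simplified stand-in types. Everything below (`Payload`, `TripleData`, `Provenance`, and the free function `content_hash`) is hypothetical and only models the behaviour described above; the real DAG payload types are richer.

```rust
/// Hypothetical provenance record attached to a triple.
#[derive(Debug, Clone)]
struct Provenance {
    content_hash: String,
}

/// Hypothetical triple carrying optional provenance.
#[derive(Debug, Clone)]
struct TripleData {
    provenance: Option<Provenance>,
}

/// Hypothetical, reduced set of DAG payload variants.
#[derive(Debug, Clone)]
enum Payload {
    Genesis,
    Noop,
    TripleInsert { triples: Vec<TripleData> },
    Batch { ops: Vec<Payload> },
}

/// Hash of the first provenanced triple in a TripleInsert, or in the
/// first TripleInsert found inside a Batch; None for anything else.
fn content_hash(payload: &Payload) -> Option<String> {
    match payload {
        Payload::TripleInsert { triples } => triples
            .iter()
            .find_map(|t| t.provenance.as_ref().map(|p| p.content_hash.clone())),
        Payload::Batch { ops } => ops
            .iter()
            .find(|op| matches!(op, Payload::TripleInsert { .. }))
            .and_then(|op| content_hash(op)),
        _ => None,
    }
}
```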
…arse_hex32, DEFAULT_HISTORY_LIMIT

- A1: add VaultMapCache/NoteContextCache/LocalGraphCache type aliases in state.rs
- A2: replace vec![…] with array literals in triple_util tests
- A3: extract parse_hex32() helper in rest/dag.rs, replacing two inline loops
- B1: make triple_util::basename pub(crate); delete duplicates from backlinks/vault_map
- B2: move provenance_anchor_for to triple_util as pub(crate) async fn; delete from backlinks/context
- B3: add pub(crate) fn strip_brackets to triple_util; replace all inline bracket-strip closures
- B4: add DEFAULT_HISTORY_LIMIT const in service/dag.rs; update rest/dag.rs + mcp/server.rs wrappers
- D: add #[inline] to obj_string; add doc comment to action_to_dto; obj_string uses strip_brackets
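As an illustration of A3, a helper that decodes a 64-character hex string into 32 bytes could look like this. The signature and the `Option` return type are assumptions; the actual `parse_hex32()` in rest/dag.rs may report errors differently.

```rust
/// Decode exactly 64 hex digits into 32 bytes.
/// Returns None on wrong length or any non-hex character.
fn parse_hex32(s: &str) -> Option<[u8; 32]> {
    // Checking every byte up front also guarantees the input is ASCII,
    // so the two-byte slices below always fall on char boundaries.
    if s.len() != 64 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&s[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(out)
}
```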
…venance README section
Brings main up to date with dev: the v0.7.0 (Grounded Retrieval & Provenance) line of work, already tagged and released.

Highlights

- … (service::ground): cited, provenance-backed context with an explicit groundedness signal (grounded/weak/ungrounded).
- … (aingle_ingest): documents → provenanced triples + text chunks.
- … provenance_anchor (signed DAG action hash).
- … aingle_ingest, aingle_ground, aingle_sources, aingle_backlinks, aingle_note_context.
- … (local_graph) + semantic edges in vault_map, with performance work.

Verified with cargo check --workspace. Release: https://github.com/ApiliumCode/aingle/releases/tag/v0.7.0