Add XOR-Merkle tree, semiring algebra, and NARS truth gates to SPO store#167

Open
AdaWorldAPI wants to merge 3 commits into main from claude/review-lance-graph-architecture-i6TKf

@AdaWorldAPI
Owner

Summary

This PR introduces three major features to the SPO graph store: (1) an XOR-Merkle tree for authenticated query results, (2) pluggable semiring algebra for flexible graph traversal, and (3) NARS truth-value gating for epistemic filtering of edges.

Key Changes

XOR-Merkle Tree (src/graph/spo/merkle.rs)

  • New module implementing an order-independent Merkle tree over the DN address space using XOR for combining children instead of concatenation + hash
  • Properties: O(1) insert/update per node via XOR flip (propagating a flip to the root costs one XOR per ancestor), O(log n) inclusion proofs, O(1) integrity verification by root comparison
  • Leaf hash: SHA256(dn ‖ fingerprint ‖ freq ‖ conf) — binds content and NARS truth values
  • Interior nodes: XOR accumulation of children's hashes (commutative, enabling incremental updates)
  • Truth trajectory tracking: Snapshots and epoch diffs for incremental NARS recomputation
  • Authenticated results: AuthenticatedResult wraps query hits with Merkle proofs for wire-level verification
  • Comprehensive tests: 30+ test cases covering insertion, removal, verification, proofs, and trajectory computation
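
The order independence and XOR-flip update described above can be illustrated with a minimal sketch. This is not the code in merkle.rs: `XorAccumulator`, `upsert`, and the 64-bit `Digest` are hypothetical names, the tree is flattened to a single accumulator, and `DefaultHasher` stands in for SHA256 only to keep the example dependency-free (it is not cryptographically secure).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in digest. The PR describes SHA256; a 64-bit DefaultHasher value
/// is used here only so the sketch needs no external crates.
type Digest = u64;

/// Hash a leaf from its DN address, fingerprint, and NARS truth values,
/// so that changing any of them changes the leaf hash.
fn leaf_hash(dn: u64, fingerprint: &[u64], freq: f32, conf: f32) -> Digest {
    let mut h = DefaultHasher::new();
    dn.hash(&mut h);
    fingerprint.hash(&mut h);
    freq.to_bits().hash(&mut h);
    conf.to_bits().hash(&mut h);
    h.finish()
}

/// Flat XOR accumulator (hypothetical): the root is the XOR of all live leaf hashes.
#[derive(Default)]
struct XorAccumulator {
    leaves: HashMap<u64, Digest>,
    root: Digest,
}

impl XorAccumulator {
    /// Insert or update: XOR out the old hash (if any), XOR in the new one.
    fn upsert(&mut self, dn: u64, hash: Digest) {
        if let Some(old) = self.leaves.insert(dn, hash) {
            self.root ^= old;
        }
        self.root ^= hash;
    }

    /// Remove: XOR is its own inverse, so XORing the hash again cancels it.
    fn remove(&mut self, dn: u64) {
        if let Some(old) = self.leaves.remove(&dn) {
            self.root ^= old;
        }
    }
}
```

Because XOR is commutative and self-inverse, inserting the same leaves in any order yields the same root, and removing a leaf restores the root that existed without it.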

Semiring Algebra (src/graph/spo/semiring.rs)

  • New trait SpoSemiring for pluggable graph traversal algebra (⊗ for edge steps, ⊕ for path accumulation)
  • Four implementations:
    • XorBundle: Default HDC path composition (XOR bind + majority-vote bundle)
    • HammingMin: Shortest semantic path (Hamming cost, min accumulation)
    • SimilarityMax: Best match scoring (similarity product, max accumulation)
    • Reachability: Boolean reachability (AND/OR logic)
  • Integration: New walk_chain_semiring() method in SpoStore for flexible traversal
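
As a rough illustration of the ⊗/⊕ split, the sketch below shows one possible shape for such a trait with two of the listed algebras. The method names (`zero`, `one`, `mul`, `add`), the associated `Value` type, and the `eval_paths` helper are assumptions for this example; the actual SpoSemiring signatures in semiring.rs may differ.

```rust
/// Hypothetical shape of a traversal semiring; not the PR's exact trait.
trait SpoSemiring {
    type Value: Copy;
    /// Identity for ⊕: the value of "no path found".
    fn zero(&self) -> Self::Value;
    /// Identity for ⊗: the value of the empty path.
    fn one(&self) -> Self::Value;
    /// ⊗: extend a path by one edge step.
    fn mul(&self, path: Self::Value, edge: Self::Value) -> Self::Value;
    /// ⊕: accumulate alternative paths.
    fn add(&self, a: Self::Value, b: Self::Value) -> Self::Value;
}

/// Shortest semantic path: Hamming cost along a path, min across paths.
struct HammingMin;
impl SpoSemiring for HammingMin {
    type Value = u32;
    fn zero(&self) -> u32 { u32::MAX }
    fn one(&self) -> u32 { 0 }
    fn mul(&self, path: u32, edge: u32) -> u32 { path.saturating_add(edge) }
    fn add(&self, a: u32, b: u32) -> u32 { a.min(b) }
}

/// Boolean reachability: AND along a path, OR across paths.
struct Reachability;
impl SpoSemiring for Reachability {
    type Value = bool;
    fn zero(&self) -> bool { false }
    fn one(&self) -> bool { true }
    fn mul(&self, path: bool, edge: bool) -> bool { path && edge }
    fn add(&self, a: bool, b: bool) -> bool { a || b }
}

/// Evaluate a set of alternative paths, each given as a list of edge values.
fn eval_paths<S: SpoSemiring>(s: &S, paths: &[Vec<S::Value>]) -> S::Value {
    paths.iter().fold(s.zero(), |acc, path| {
        let along = path.iter().fold(s.one(), |p, &e| s.mul(p, e));
        s.add(acc, along)
    })
}
```

The same `eval_paths` loop serves both algebras; only the semiring value passed in changes, which is the point of making the traversal algebra pluggable.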

NARS Truth Gates (src/graph/spo/store.rs)

  • New TruthGate struct: Epistemic filtering with min_freq and min_conf thresholds
  • Presets: OPEN (no filtering), STRONG (f≥0.7, c≥0.5), CERTAIN (f≥0.9, c≥0.8)
  • New query methods: sxp2o(), spo2x(), xpo2s() variants that apply truth gates before distance ranking
  • Returns SpoHit: Enriched query results with NARS truth values attached
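
A minimal sketch of the gate-then-rank flow, using the preset thresholds listed above. The `passes` method, the `Hit` struct, and `gated_rank` are hypothetical names for illustration; the PR's TruthGate and SpoHit definitions are not shown in this description.

```rust
/// Epistemic filter with minimum frequency and confidence thresholds.
#[derive(Clone, Copy, Debug, PartialEq)]
struct TruthGate {
    min_freq: f32,
    min_conf: f32,
}

impl TruthGate {
    /// Presets taken from the PR description.
    const OPEN: TruthGate = TruthGate { min_freq: 0.0, min_conf: 0.0 };
    const STRONG: TruthGate = TruthGate { min_freq: 0.7, min_conf: 0.5 };
    const CERTAIN: TruthGate = TruthGate { min_freq: 0.9, min_conf: 0.8 };

    /// Hypothetical predicate: an edge passes if both thresholds are met.
    fn passes(&self, freq: f32, conf: f32) -> bool {
        freq >= self.min_freq && conf >= self.min_conf
    }
}

/// Hypothetical hit type carrying a distance and NARS truth values.
struct Hit {
    distance: u32,
    freq: f32,
    conf: f32,
}

/// Apply the gate first, then rank the survivors by distance.
fn gated_rank(mut hits: Vec<Hit>, gate: TruthGate) -> Vec<Hit> {
    hits.retain(|h| gate.passes(h.freq, h.conf));
    hits.sort_by_key(|h| h.distance);
    hits
}
```

Filtering before ranking means a close but weakly supported edge never displaces a slightly farther edge that meets the thresholds.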

Belichtung Prefilter (src/graph/spo/store.rs)

  • 7-point Hamming distance sampling across the 128-bit bitmap range
  • Rejects ~90% of candidates before the full Hamming computation (about 14 cycles for an estimate within ±15%)
  • Applied on the primary axis of all three axis queries (S×P→O, P×O→S, S×O→P)

Store Integration

  • Merkle tree embedded in SpoStore: automatic updates on insert/remove
  • Root hash: Single value summarizing entire store state
  • Integrity verification: verify_integrity() checks record against Merkle tree
  • Removal support: New remove() method with Merkle cleanup

Utility Modules

  • case_insensitive.rs: Extension trait for forgiving label/verb matching (exact match fast path, case-insensitive fallback)
  • sparse.rs enhancements: Location-enriched error types via #[track_caller]
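
The exact-first, case-insensitive-fallback lookup might look like the sketch below. The method name `get_ci` and the ASCII-only comparison are assumptions for this example; only the trait name CaseInsensitiveLookup appears in the PR text.

```rust
use std::collections::HashMap;

/// Extension trait for forgiving key lookup (method name is hypothetical).
trait CaseInsensitiveLookup<V> {
    fn get_ci(&self, key: &str) -> Option<&V>;
}

impl<V> CaseInsensitiveLookup<V> for HashMap<String, V> {
    fn get_ci(&self, key: &str) -> Option<&V> {
        // Fast path: exact match through the normal hash lookup.
        if let Some(v) = self.get(key) {
            return Some(v);
        }
        // Fallback: linear scan comparing keys without regard to ASCII case.
        self.iter()
            .find(|(k, _)| k.eq_ignore_ascii_case(key))
            .map(|(_, v)| v)
    }
}
```

The fallback is O(n) in the number of keys, so this design only pays the scan cost when the exact lookup misses. If several keys differ only by case, the exact match wins; otherwise the first key found by iteration is returned, which is unordered for a HashMap.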

Notable Implementation Details

  • XOR properties: Order-independent combination enables O(1) updates without rehashing siblings; composable with HDC fingerprint binding
  • Merkle leaf binding: Content fingerprint + NARS truth values in single hash prevents tampering with either
  • Semiring composability: Trait design allows mixing different algebras for different query types

https://claude.ai/code/session_016SeGMg1pgf1MqK8YWkedvV

… trajectory

Implements the Stack Hardening Guide for graph::spo:

New modules:
- merkle.rs: XOR-Merkle tree with O(1) insert, authenticated queries,
  inclusion proofs, epoch snapshots, and truth trajectory diff (22 tests)
- semiring.rs: SpoSemiring trait with XorBundle, HammingMin, SimilarityMax,
  Reachability implementations for pluggable graph walks (6 tests)
- case_insensitive.rs: CaseInsensitiveLookup trait for HashMap/BTreeMap
  with exact-first, case-insensitive fallback (5 tests)

store.rs additions:
- Belichtung 7-point prefilter (rejects ~90% before full Hamming)
- TruthGate epistemic filtering (OPEN/STRONG/CERTAIN presets)
- S×P→O, S×O→P, P×O→S NARS-gated axis projections
- walk_chain_semiring() for pluggable traversal algebra
- Authenticated query wrappers with Merkle proofs
- Merkle tree wired into insert/remove

sparse.rs: #[track_caller] error enrichment (IntegrityViolation, Located)
cognitive_udfs.rs: hamming_bytes() delegates to rustynum SIMD

Also: wire graph::spo into build, fix builder.rs invalid hex literal

https://claude.ai/code/session_016SeGMg1pgf1MqK8YWkedvV

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 80d5f894dc

        let wb = b.get_word(idx);
        sample_diff += (wa ^ wb).count_ones();
    }
    (sample_diff * 18) > threshold

P1: Make Belichtung rejection lossless for exact radius queries

The rejection check uses sample_diff * 18 as an estimate, not a mathematical lower bound, so it can reject true matches when differences are concentrated in sampled words (e.g., full Hamming distance is within threshold but sampled words overrepresent the errors). Because the query paths immediately continue on this predicate, query_forward/query_reverse/query_relation (and their truth-gated variants) can now return incomplete results for exact-radius searches.
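
One way to make the rejection lossless is to use the sampled difference itself as the bound: the sampled words are a subset of all words, so their differing-bit count can never exceed the full Hamming distance. The sketch below contrasts the two predicates. The slice-based fingerprint layout, the evenly spaced sample positions, and all function names are assumptions for illustration; this is not the fix adopted in the PR.

```rust
/// Differing bits over up to seven evenly spaced words (hypothetical layout).
fn sample_diff(a: &[u64], b: &[u64]) -> u32 {
    let n = a.len().min(b.len());
    let k = n.min(7);
    (0..k)
        .map(|i| {
            let idx = i * n / k; // distinct indices because n >= k
            (a[idx] ^ b[idx]).count_ones()
        })
        .sum()
}

/// Exact Hamming distance over all words.
fn full_hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Scaled estimate, as in the reviewed code: fast, but it can reject true
/// matches when the differing bits cluster in the sampled words.
fn reject_estimated(a: &[u64], b: &[u64], threshold: u32, scale: u32) -> bool {
    sample_diff(a, b) * scale > threshold
}

/// Lossless rule: sample_diff <= full_hamming always holds, so rejecting
/// only when the sample alone exceeds the threshold never drops a match.
fn reject_lossless(a: &[u64], b: &[u64], threshold: u32) -> bool {
    sample_diff(a, b) > threshold
}
```

The lossless rule prunes far fewer candidates than the scaled estimate, so a store could plausibly keep the estimate for approximate queries and switch to the lower bound when exact-radius results are required.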

Comment on lines +321 to +323
fn flip_up(&mut self, dn: u64, hash: &MerkleHash) {
    let acc = self.children_xor.entry(dn).or_insert(ZERO_HASH);
    *acc = xor_hash(acc, hash);

P1: Propagate XOR flips up ancestor chain to keep root authentic

flip_up only mutates children_xor for the immediate node and never walks parent pointers, so changes under non-root parents do not update root_hash(). In a nested tree (for example, inserting a child under a non-root DN), subtree mutations are omitted from the root commitment, which breaks the integrity guarantees of root-hash comparison and authenticated proofs.
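
A sketch of what walking the ancestor chain could look like. It assumes a `parent` map from each DN to its parent, a root at DN 0, acyclic parent pointers, and that an interior node's hash is exactly its `children_xor` accumulator; none of these details are confirmed by the diff shown here.

```rust
use std::collections::HashMap;

type MerkleHash = [u8; 32];
const ZERO_HASH: MerkleHash = [0u8; 32];
/// Assumed root address for this sketch.
const ROOT_DN: u64 = 0;

fn xor_hash(a: &MerkleHash, b: &MerkleHash) -> MerkleHash {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = a[i] ^ b[i];
    }
    out
}

#[derive(Default)]
struct Tree {
    /// Hypothetical parent pointers: child DN -> parent DN (root has no entry).
    parent: HashMap<u64, u64>,
    children_xor: HashMap<u64, MerkleHash>,
}

impl Tree {
    /// XOR `hash` into the accumulator of `dn` and of every ancestor of `dn`.
    /// Cost is one XOR per level, so O(depth) rather than O(1).
    fn flip_up(&mut self, dn: u64, hash: &MerkleHash) {
        let mut cur = dn;
        loop {
            let acc = self.children_xor.entry(cur).or_insert(ZERO_HASH);
            *acc = xor_hash(acc, hash);
            match self.parent.get(&cur) {
                Some(&p) => cur = p,
                None => break, // reached a node without a parent (the root)
            }
        }
    }

    fn root_hash(&self) -> MerkleHash {
        self.children_xor.get(&ROOT_DN).copied().unwrap_or(ZERO_HASH)
    }
}
```

With pure XOR combination, flipping the same delta into every ancestor keeps each accumulator equal to the XOR of all leaf hashes beneath it, so a change under a non-root parent now reaches the root.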

            kids.remove(&dn);
        }
    }
    self.flip_up(parent_dn, &old_leaf);

P2: XOR old leaf out of previous parent during reparent update

When a DN already exists, the code removes the child from old_parent but XORs out the old hash from parent_dn (the new parent) instead of old_parent. If a leaf is reparented, this leaves stale hash contribution in the old branch and corrupts the Merkle accumulators/root for that update.
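
The corrected bookkeeping can be sketched as follows: the old leaf hash is XORed out of the accumulator it was originally XORed into (the old parent), and only the new hash goes into the new parent. The struct, field names, and 64-bit hashes are hypothetical, and ancestor propagation is left out to isolate the reparenting step.

```rust
use std::collections::HashMap;

/// Single-level accumulators with 64-bit stand-in hashes (hypothetical layout).
#[derive(Default)]
struct Accumulators {
    parent_of: HashMap<u64, u64>,
    leaf: HashMap<u64, u64>,
    children_xor: HashMap<u64, u64>,
}

impl Accumulators {
    /// XOR `hash` into the accumulator of `parent`.
    fn flip(&mut self, parent: u64, hash: u64) {
        *self.children_xor.entry(parent).or_insert(0) ^= hash;
    }

    /// Insert or update `dn` under `parent_dn`.
    fn upsert(&mut self, dn: u64, parent_dn: u64, new_leaf: u64) {
        if let Some(old_leaf) = self.leaf.insert(dn, new_leaf) {
            // Cancel the old contribution where it actually lives: the OLD parent.
            let old_parent = self.parent_of.get(&dn).copied().unwrap_or(parent_dn);
            self.flip(old_parent, old_leaf);
        }
        self.parent_of.insert(dn, parent_dn);
        self.flip(parent_dn, new_leaf);
    }
}
```

After moving a leaf from one parent to another, the old parent's accumulator returns to the value it had without that leaf, and the new parent holds only the new hash.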

Module 2 (clam_path.rs): IntegrityResult enum + verify_integrity()
  - roots_consistent(): compare MerkleRoots between parent/child
  - verify_integrity(): walk ClamPath::parent() chain checking word[0]
  - verify_batch(): fail-fast batch verification
  - 8 integrity tests

Module 4 (store.rs): ClamSpoIndex + sxp2o_clam()
  - ClamSpoIndex::build(): builds ClamTree over X-axis sparse data
  - sxp2o_clam(): triangle inequality pruned forward query
    Phase 1: rho_nn prunes clusters where δ⁻ > ρ (O(log n))
    Phase 2: hamming_dense_vs_sparse checks Y-axis on candidates only
  - hamming_dense_vs_sparse(): zero-alloc dense×sparse Hamming (Gate 3)
  - sparse_to_bytes(): avoid to_dense() which panics on 256-word Container
  - Wired graph::spo module into graph/mod.rs
  - Fixed invalid hex literal 0xCHA1_D15C in builder.rs
  - 8 CLAM index tests + 3 hamming function tests
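
The Phase 1 pruning rule follows from the triangle inequality: if a cluster has center c and radius r, every member m satisfies d(q, m) ≥ d(q, c) − r, so a cluster whose lower bound δ⁻ exceeds ρ cannot contain a hit. The sketch below applies that rule over a flat list of clusters with 64-bit fingerprints. The `Cluster` struct and function names are hypothetical, and the flat scan does not reproduce the O(log n) tree descent of ClamSpoIndex.

```rust
/// Hypothetical cluster: every member lies within `radius` of `center`.
struct Cluster {
    center: u64,
    radius: u32,
    members: Vec<u64>,
}

fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// δ⁻: lower bound on the distance from `query` to any member of `c`.
fn delta_minus(query: u64, c: &Cluster) -> u32 {
    hamming(query, c.center).saturating_sub(c.radius)
}

/// Return all members within `rho` of `query`, skipping pruned clusters.
fn range_query(query: u64, rho: u32, clusters: &[Cluster]) -> Vec<u64> {
    let mut hits = Vec::new();
    for c in clusters {
        if delta_minus(query, c) > rho {
            continue; // no member of this cluster can be within rho
        }
        hits.extend(
            c.members
                .iter()
                .copied()
                .filter(|&m| hamming(query, m) <= rho),
        );
    }
    hits
}
```

The pruning is exact only if each cluster's radius really bounds the distance from its center to every member; an underestimated radius would drop valid hits.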

https://claude.ai/code/session_016SeGMg1pgf1MqK8YWkedvV
AdaWorldAPI pushed a commit that referenced this pull request Mar 4, 2026
…egrity verification

Core fixes from PR #167/#168 review, rewritten from spec:

- sparse.rs: Widen bitmap [u64;2] → [u64;BITMAP_WORDS=4] to cover all 256 positions
- builder.rs: Fix invalid hex literal (0xCHA1_D15C → 0xC4A1_D15C), rewrite label_fp()
  to produce ~11% density sparse fingerprints via xorshift64 PRNG
- store.rs: Add TruthGate epistemic filter (OPEN/STRONG/CERTAIN), fix scent/Hamming
  scale mismatch with separate scent_radius parameter, add gated query methods
- geometry.rs: Add ContainerGeometry::Spo = 6 variant
- mod.rs: Wire pub mod spo (was missing — ~2000 lines of dead code)
- bind_space.rs: Add clam_merkle field to BindNode, stamp ClamPath + MerkleRoot
  on write_dn_node()/write_dn_path(), add verify_lineage() + Epoch snapshot

All 61 tests pass (24 SPO + 32 ClamPath + 5 bind_space integrity).

https://claude.ai/code/session_013JU8MRtxRRTtuc5217r6s5
Task 1 — Error Type Unification:
- Create src/query/error.rs with snafu-based QueryError enum
  (ParseError, PlanError, ExecutionError, TranspileError,
   UnsupportedFeature, InvalidPattern, Arrow, DataFusion, SpoError)
- Add #[track_caller] helper functions + convenience macros
- Remove old thiserror QueryError from mod.rs, wire new module
- Fix two .unwrap() calls in cypher.rs tokenizer (lines 487, 489)
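
A #[track_caller] helper of the kind described in Task 1 could look roughly like this. The commit uses a snafu-based enum; this sketch substitutes a plain struct to stay dependency-free, and the names `parse_error` and `parse_int` are hypothetical.

```rust
use std::fmt;
use std::panic::Location;

/// Simplified stand-in for the snafu-based QueryError in the commit.
#[derive(Debug)]
struct QueryError {
    message: String,
    location: &'static Location<'static>,
}

impl fmt::Display for QueryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{} (at {}:{})",
            self.message,
            self.location.file(),
            self.location.line()
        )
    }
}

/// With #[track_caller], Location::caller() reports the call site of
/// this helper rather than a line inside the helper itself.
#[track_caller]
fn parse_error(message: impl Into<String>) -> QueryError {
    QueryError {
        message: message.into(),
        location: Location::caller(),
    }
}

/// Returning a located error instead of calling .unwrap() on the parse result.
fn parse_int(token: &str) -> Result<i64, QueryError> {
    token
        .parse::<i64>()
        .map_err(|e| parse_error(format!("bad integer {:?}: {}", token, e)))
}
```

Replacing an `.unwrap()` with a helper like this turns a panic into a recoverable error that still records where in the tokenizer it was raised.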

Task 2 — Parser Extraction:
- Copy 6 lance-graph parser files into src/query/lance_parser/
  (ast, parser, semantic, config, case_insensitive, parameter_substitution)
- Rewrite imports: GraphError → QueryError, crate:: → super::
- Add nom + snafu dependencies to Cargo.toml
- All 107 tests pass (6 error + 101 lance_parser)

https://claude.ai/code/session_016SeGMg1pgf1MqK8YWkedvV