feat(memory): embedding-based entity resolution for graph memory (#1230) #1257

Merged
bug-ops merged 7 commits into main from feat-m33-embedding-entity-resolution
Mar 6, 2026


@bug-ops bug-ops commented Mar 6, 2026

Summary

Adds embedding-based entity resolution to graph memory, enabling semantic deduplication of entities beyond exact name+type matching.

  • When use_embedding_resolution = true, EntityResolver embeds entity name+summary and searches the zeph_graph_entities Qdrant collection for existing matches
  • Cosine >= 0.85 (configurable): auto-merge with existing entity
  • Cosine 0.70-0.85 (configurable): LLM disambiguation decides merge vs create
  • Cosine < 0.70: create new entity
  • All embedding/LLM failures gracefully fall back to exact match
  • Per-entity-name locking prevents concurrent duplicate creation
  • Batch resolution with buffer_unordered(4) concurrency limit
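
The threshold bands above can be sketched as a small pure function. This is only an illustration under stated assumptions: `Decision`, `cosine`, and `classify` are hypothetical names that do not come from the PR, and in the real flow the similarity score would come from the Qdrant search rather than being computed locally.

```rust
/// Hypothetical decision type for illustration; not the PR's `ResolutionOutcome`.
#[derive(Debug, PartialEq)]
pub enum Decision {
    AutoMerge,
    AskLlm,
    CreateNew,
}

/// Cosine similarity of two vectors.
/// Returns 0.0 for empty, mismatched-length, or zero-norm input.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.is_empty() || a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Maps a similarity score onto the three bands from the PR description:
/// score >= merge_threshold      -> merge automatically
/// score >= ambiguous_threshold  -> ask the LLM to disambiguate
/// otherwise                     -> create a new entity
pub fn classify(score: f32, merge_threshold: f32, ambiguous_threshold: f32) -> Decision {
    if score >= merge_threshold {
        Decision::AutoMerge
    } else if score >= ambiguous_threshold {
        Decision::AskLlm
    } else {
        Decision::CreateNew
    }
}
```

With the defaults quoted above, `classify(score, 0.85, 0.70)` treats a score of exactly 0.85 as an auto-merge and a score of exactly 0.70 as ambiguous.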

Files changed (11)

| File | Change |
| --- | --- |
| crates/zeph-memory/src/graph/resolver.rs | Core: ResolutionOutcome, embedding flow, resolve_batch(), merge, LLM disambiguation, per-name locking |
| crates/zeph-memory/src/graph/store.rs | find_entity_by_id(), set_entity_qdrant_point_id() |
| crates/zeph-memory/src/graph/mod.rs | Re-export ResolutionOutcome |
| crates/zeph-memory/src/semantic.rs | embedding_store() getter |
| crates/zeph-memory/src/embedding_store.rs | upsert_to_collection() for point-id-stable upserts |
| crates/zeph-core/src/config/types.rs | entity_ambiguous_threshold field (default 0.70) |
| Config snapshot | Updated |
| CHANGELOG.md | Added entries |
| README.md, crates/zeph-memory/README.md | Feature mention |
| docs/src/concepts/graph-memory.md | Entity resolution section |

Breaking changes

  • EntityResolver::resolve() returns Result<(i64, ResolutionOutcome)> instead of Result<i64>
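
A rough sketch of how a call site might adapt to the new return type. The variant names are taken from this PR's description; everything else is an assumption made for illustration: the variants are shown as unit variants, `resolve_stub` is a hypothetical stand-in for `EntityResolver::resolve()`, and `String` stands in for the real error type.

```rust
/// Variant names as listed in the PR; assumed here to carry no data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResolutionOutcome {
    ExactMatch,
    EmbeddingMatch,
    LlmDisambiguated,
    Created,
}

/// Hypothetical stand-in for `EntityResolver::resolve()`.
/// Pretends that "Rust" already exists with id 1 and anything else is new.
pub fn resolve_stub(name: &str) -> Result<(i64, ResolutionOutcome), String> {
    if name.is_empty() {
        return Err("empty entity name".to_string());
    }
    if name == "Rust" {
        Ok((1, ResolutionOutcome::ExactMatch))
    } else {
        Ok((2, ResolutionOutcome::Created))
    }
}

/// A caller that only needs the id destructures the tuple and ignores the outcome.
pub fn entity_id(name: &str) -> Result<i64, String> {
    let (id, _outcome) = resolve_stub(name)?;
    Ok(id)
}

/// A caller that cares how the entity was resolved can inspect the outcome.
pub fn was_created(name: &str) -> Result<bool, String> {
    let (_id, outcome) = resolve_stub(name)?;
    Ok(outcome == ResolutionOutcome::Created)
}
```

Callers that previously bound the result as a bare `i64` need the tuple destructuring shown in `entity_id`.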

Test plan

  • 4284 tests pass (--features full), +26 new tests
  • cargo +nightly fmt --check clean
  • cargo clippy --workspace --features full -- -D warnings clean
  • All resolution paths covered: exact match, embedding merge, LLM disambiguate, create new
  • Failure paths: embed failure, LLM failure, fallback counter increment
  • Concurrent locking, entity type filter, batch resolution
  • Validator findings addressed: IC-S1 (orphaned points), IC-S2/PERF-01 (concurrency limit), IC-S3/SEC-M33-01 (entity type in prompt), GAP-1/2/3 (LLM disambiguation tests)

Closes #1230
Part of #1222

Extend EntityResolver with optional embedding-based fuzzy matching via
Qdrant collection zeph_graph_entities. Resolution flow: exact name+type
match -> cosine similarity search -> LLM disambiguation -> create new.

Key changes:
- EntityResolver builder: with_embedding_store(), with_provider(),
  with_thresholds() for optional embedding resolution
- ResolutionOutcome enum (ExactMatch, EmbeddingMatch, LlmDisambiguated,
  Created) returned from resolve()
- resolve_batch() with buffer_unordered(4) concurrency limit
- Per-entity-name DashMap locking to prevent concurrent duplicates
- Entity type filter in Qdrant search prevents cross-type merges
- LLM disambiguation with <external-data> spotlight tags
- Graceful fallback to exact match on any embedding/LLM failure
- GraphStore: find_entity_by_id(), set_entity_qdrant_point_id()
- EmbeddingStore: upsert_to_collection() for point-id-stable upserts
- SemanticMemory: embedding_store() getter
- GraphConfig: entity_ambiguous_threshold (default 0.70)

Part of graph memory epic #1222.
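
The per-entity-name locking idea can be illustrated with the standard library alone. Note the substitution: the PR uses a DashMap, while this sketch swaps in a `Mutex<HashMap<..>>` so that it has no dependencies. `NameLocks` and `get_or_create` are hypothetical names, and the in-memory `store` stands in for the real graph store.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// One lock per entity name (std-only substitute for the PR's DashMap).
#[derive(Default)]
pub struct NameLocks {
    inner: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl NameLocks {
    /// Returns the lock for `name`, creating it on first use.
    pub fn lock_for(&self, name: &str) -> Arc<Mutex<()>> {
        let mut map = self.inner.lock().unwrap();
        Arc::clone(map.entry(name.to_string()).or_default())
    }
}

/// Check-then-create under the per-name lock, so two tasks resolving the
/// same name cannot both create it. Different names do not block each other
/// during the slow section.
pub fn get_or_create(
    locks: &NameLocks,
    store: &Mutex<HashMap<String, i64>>,
    name: &str,
) -> i64 {
    let lock = locks.lock_for(name);
    let _guard = lock.lock().unwrap();

    // The store lock is released at the end of this statement.
    let existing = store.lock().unwrap().get(name).copied();
    if let Some(id) = existing {
        return id;
    }

    // Slow work (embedding, disambiguation) would happen here while only
    // the per-name lock is held.

    let mut entries = store.lock().unwrap();
    let id = entries.len() as i64 + 1;
    entries.insert(name.to_string(), id);
    id
}
```

In this toy version the lock map grows without bound; a real implementation would need some eviction policy, which is not shown here.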
@github-actions github-actions bot added labels Mar 6, 2026: documentation (Improvements or additions to documentation); memory (zeph-memory crate, SQLite); rust (Rust code changes); core (zeph-core crate); enhancement (New feature or request); size/XL (Extra large PR, 500+ lines)
bug-ops added 6 commits March 6, 2026 03:03
- Remove duplicate find_entity_by_id() added by main
- Destructure (id, _outcome) tuple from resolve() in extraction pipeline
- Merge README and docs content from both branches
- Merge FTS5 entity search (main) with embedding entity resolution (ours)
- Keep both find_entity_by_id (ours) and FTS5 fuzzy tests (main)
- Combine README descriptions from both branches
- Note: bootstrap test create_skill_matcher_when_semantic_disabled fails
  due to migration VersionMismatch(23) from FTS5 PR — pre-existing on main
…embedding resolution)

Integrate entity canonicalization (migration 024, canonical_name column,
EntityAlias table) from main with embedding-based entity resolution from
this branch.

Resolution strategy:
- resolve() now performs alias-first lookup (step 3) and canonical-name
  lookup (step 4) before embedding search (step 5), then create (step 6)
- upsert_entity() updated to 4-arg API (surface_name, canonical_name, type, summary)
  in all call sites within resolver.rs and store.rs tests
- find_entity_by_id() SELECT updated to include canonical_name column
- merge_entity() accepts new_surface_name + new_canonical_name separately
- register_aliases() extracted as private helper for alias registration
- Both test sets preserved: embedding resolution tests + canonicalization tests
- Canonicalization tests adapted from i64 return to (i64, ResolutionOutcome)

Tests: 3782 → 4357 (+575)
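
The lookup order described in this commit (alias, then canonical name, then embedding search, then create) can be sketched as a simple cascade. Everything here is hypothetical: `ToyGraph` and `Hit` are illustration-only names, canonicalization is assumed to be trim plus lowercase, the embedding search is passed in as a closure, and steps 1 and 2 of the real `resolve()` are not described in this PR and are omitted.

```rust
use std::collections::HashMap;

/// Which step of the cascade produced the id (illustrative only).
#[derive(Debug, PartialEq)]
pub enum Hit {
    Alias,
    Canonical,
    Embedding,
    Created,
}

/// In-memory stand-in for the graph store.
#[derive(Default)]
pub struct ToyGraph {
    pub aliases: HashMap<String, i64>,
    pub canonical: HashMap<String, i64>,
    pub next_id: i64,
}

impl ToyGraph {
    pub fn resolve<F>(&mut self, surface: &str, embedding_search: F) -> (i64, Hit)
    where
        F: FnOnce(&str) -> Option<i64>,
    {
        // Assumed canonicalization: trim and lowercase.
        let key = surface.trim().to_lowercase();

        // Step 3: alias lookup.
        if let Some(&id) = self.aliases.get(&key) {
            return (id, Hit::Alias);
        }
        // Step 4: canonical-name lookup.
        if let Some(&id) = self.canonical.get(&key) {
            return (id, Hit::Canonical);
        }
        // Step 5: embedding search, only reached when the cheap lookups miss.
        if let Some(id) = embedding_search(surface) {
            return (id, Hit::Embedding);
        }
        // Step 6: create a new entity.
        self.next_id += 1;
        self.canonical.insert(key, self.next_id);
        (self.next_id, Hit::Created)
    }
}
```

Putting the cheap lookups first means the embedding search closure is never called when an alias or canonical name already matches.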
…tity-resolution

# Conflicts:
#	CHANGELOG.md
#	crates/zeph-memory/README.md
@bug-ops bug-ops enabled auto-merge (squash) March 6, 2026 04:20
@bug-ops bug-ops merged commit 0443815 into main Mar 6, 2026
28 checks passed
@bug-ops bug-ops deleted the feat-m33-embedding-entity-resolution branch March 6, 2026 04:32