feat(memory): embedding-based entity resolution for graph memory (#1230) #1257
Merged
Extend EntityResolver with optional embedding-based fuzzy matching via the Qdrant collection zeph_graph_entities.

Resolution flow: exact name+type match -> cosine similarity search -> LLM disambiguation -> create new.

Key changes:
- EntityResolver builder: with_embedding_store(), with_provider(), with_thresholds() for optional embedding resolution
- ResolutionOutcome enum (ExactMatch, EmbeddingMatch, LlmDisambiguated, Created) returned from resolve()
- resolve_batch() with buffer_unordered(4) concurrency limit
- Per-entity-name DashMap locking to prevent concurrent duplicates
- Entity type filter in Qdrant search prevents cross-type merges
- LLM disambiguation with <external-data> spotlight tags
- Graceful fallback to exact match on any embedding/LLM failure
- GraphStore: find_entity_by_id(), set_entity_qdrant_point_id()
- EmbeddingStore: upsert_to_collection() for point-id-stable upserts
- SemanticMemory: embedding_store() getter
- GraphConfig: entity_ambiguous_threshold (default 0.70)

Part of graph memory epic #1222.
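The resolution flow above can be read as a threshold cascade. The following is a minimal sketch of that decision logic only, not the code from this PR. The `decide` function, the `Thresholds` struct, and the 0.85 upper similarity threshold are hypothetical; only the four outcome variants and the 0.70 ambiguous default come from the commit message. It also assumes that a failed embedding search or LLM call is modelled as `None` and ends in creating a new entity once the exact match has already missed.

```rust
/// Outcome variants as named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResolutionOutcome {
    ExactMatch,
    EmbeddingMatch,
    LlmDisambiguated,
    Created,
}

/// Hypothetical threshold pair. Only `ambiguous = 0.70` is taken from the PR;
/// `similarity = 0.85` is an assumed value for illustration.
#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub similarity: f32,
    pub ambiguous: f32,
}

impl Default for Thresholds {
    fn default() -> Self {
        Self { similarity: 0.85, ambiguous: 0.70 }
    }
}

/// Pick an outcome from the results each stage would produce.
///
/// * `exact_hit`: the exact name+type lookup found an entity.
/// * `best_score`: top cosine similarity, or `None` if the search failed or was empty.
/// * `llm_says_same`: LLM verdict for the ambiguous band, or `None` if the call failed.
pub fn decide(
    exact_hit: bool,
    best_score: Option<f32>,
    llm_says_same: Option<bool>,
    t: Thresholds,
) -> ResolutionOutcome {
    if exact_hit {
        return ResolutionOutcome::ExactMatch;
    }
    match best_score {
        // Confident match: accept without consulting the LLM.
        Some(s) if s >= t.similarity => ResolutionOutcome::EmbeddingMatch,
        // Ambiguous band: merge only if the LLM confirms the match.
        Some(s) if s >= t.ambiguous => match llm_says_same {
            Some(true) => ResolutionOutcome::LlmDisambiguated,
            _ => ResolutionOutcome::Created,
        },
        // Low score, empty result, or failed search: create a new entity.
        _ => ResolutionOutcome::Created,
    }
}
```

Keeping the decision separate from the I/O makes the thresholds easy to test in isolation. In the PR itself the stages run inside resolve() and involve the graph store, Qdrant, and the LLM provider.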
- Remove duplicate find_entity_by_id() added by main
- Destructure (id, _outcome) tuple from resolve() in extraction pipeline
- Merge README and docs content from both branches
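The destructuring change follows from the new return type of resolve(). A minimal sketch of how a call site that only needs the id might adapt is shown here. The synchronous `resolve` stub, the `String` error type, the fixed id 42, and the `extract_entity_id` helper are all hypothetical stand-ins; the real method lives on EntityResolver and talks to the stores.

```rust
/// Outcome variants as named in the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResolutionOutcome {
    ExactMatch,
    EmbeddingMatch,
    LlmDisambiguated,
    Created,
}

/// Hypothetical stand-in for EntityResolver::resolve(): it returns the
/// entity id together with how the id was obtained.
pub fn resolve(name: &str) -> Result<(i64, ResolutionOutcome), String> {
    if name.is_empty() {
        return Err("empty entity name".to_string());
    }
    // A fixed id keeps the sketch deterministic.
    Ok((42, ResolutionOutcome::Created))
}

/// A caller that previously received a bare i64 now destructures the tuple
/// and ignores the outcome it does not need.
pub fn extract_entity_id(name: &str) -> Result<i64, String> {
    let (id, _outcome) = resolve(name)?;
    Ok(id)
}
```

Callers that do care about the outcome, for example to count how many entities were merged versus created, can bind the second element instead of discarding it.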
- Merge FTS5 entity search (main) with embedding entity resolution (ours)
- Keep both find_entity_by_id (ours) and FTS5 fuzzy tests (main)
- Combine README descriptions from both branches
- Note: bootstrap test create_skill_matcher_when_semantic_disabled fails due to migration VersionMismatch(23) from the FTS5 PR (pre-existing on main)
…embedding resolution)

Integrate entity canonicalization (migration 024, canonical_name column, EntityAlias table) from main with embedding-based entity resolution from this branch.

Resolution strategy:
- resolve() now performs alias-first lookup (step 3) and canonical-name lookup (step 4) before embedding search (step 5), then create (step 6)
- upsert_entity() updated to 4-arg API (surface_name, canonical_name, type, summary) in all call sites within resolver.rs and store.rs tests
- find_entity_by_id() SELECT updated to include canonical_name column
- merge_entity() accepts new_surface_name + new_canonical_name separately
- register_aliases() extracted as private helper for alias registration
- Both test sets preserved: embedding resolution tests + canonicalization tests
- Canonicalization tests adapted from i64 return to (i64, ResolutionOutcome)

Tests: 3782 → 4357 (+575)
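The lookup order in steps 3 and 4 can be illustrated with an in-memory stand-in for the alias table and the canonical_name column. This is a sketch under assumptions, not the PR's SQL-backed implementation: the `Index` type, the `Lookup` enum, and the trim-and-lowercase `normalize` function are hypothetical. A `Miss` is where the embedding search (step 5) and creation (step 6) would take over.

```rust
use std::collections::HashMap;

/// Result of the two cheap lookups that run before any embedding search.
#[derive(Debug, PartialEq, Eq)]
pub enum Lookup {
    Alias(i64),
    CanonicalName(i64),
    Miss,
}

/// Hypothetical in-memory index keyed by (normalized name, entity type).
#[derive(Default)]
pub struct Index {
    aliases: HashMap<(String, String), i64>,
    canonical: HashMap<(String, String), i64>,
}

/// Assumed normalization: trim whitespace and lowercase.
fn normalize(name: &str) -> String {
    name.trim().to_lowercase()
}

impl Index {
    /// Register an entity under its canonical name plus any aliases.
    pub fn insert(&mut self, canonical_name: &str, entity_type: &str, id: i64, aliases: &[&str]) {
        self.canonical
            .insert((normalize(canonical_name), entity_type.to_string()), id);
        for alias in aliases {
            self.aliases
                .insert((normalize(alias), entity_type.to_string()), id);
        }
    }

    /// Alias lookup first, then canonical name; both are scoped to the
    /// entity type so that names never match across types.
    pub fn lookup(&self, surface_name: &str, entity_type: &str) -> Lookup {
        let key = (normalize(surface_name), entity_type.to_string());
        if let Some(&id) = self.aliases.get(&key) {
            return Lookup::Alias(id);
        }
        if let Some(&id) = self.canonical.get(&key) {
            return Lookup::CanonicalName(id);
        }
        Lookup::Miss
    }
}
```

Running the exact lookups first means the more expensive embedding and LLM stages are only reached for names that neither table recognizes.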
…tity-resolution

# Conflicts:
#	CHANGELOG.md
#	crates/zeph-memory/README.md
Summary
Adds embedding-based entity resolution to graph memory, enabling semantic deduplication of entities beyond exact name+type matching.
- When use_embedding_resolution = true, EntityResolver embeds the entity name+summary and searches the zeph_graph_entities Qdrant collection for existing matches
- resolve_batch() with buffer_unordered(4) concurrency limit

Files changed (11)
- crates/zeph-memory/src/graph/resolver.rs
- crates/zeph-memory/src/graph/store.rs: find_entity_by_id(), set_entity_qdrant_point_id()
- crates/zeph-memory/src/graph/mod.rs: ResolutionOutcome
- crates/zeph-memory/src/semantic.rs: embedding_store() getter
- crates/zeph-memory/src/embedding_store.rs: upsert_to_collection() for point-id-stable upserts
- crates/zeph-core/src/config/types.rs: entity_ambiguous_threshold field (default 0.70)
- CHANGELOG.md
- README.md, crates/zeph-memory/README.md
- docs/src/concepts/graph-memory.md

Breaking changes
- EntityResolver::resolve() returns Result<(i64, ResolutionOutcome)> instead of Result<i64>

Test plan
- … (--features full), +26 new tests
- cargo +nightly fmt --check clean
- cargo clippy --workspace --features full -- -D warnings clean

Closes #1230
Part of #1222