fix: include global entries in dedup scan #295

Merged: BYK merged 2 commits into main from fix/global-dedup on May 13, 2026

BYK (Owner) commented on May 13, 2026

Summary

lore data dedup was only scanning project-local entries, skipping global/user-level knowledge entries entirely.

Root Cause

deduplicate() called forProject(projectPath, false), i.e. with includeCross=false, which queries WHERE project_id = ?. Global entries (project_id IS NULL) never match that predicate, so they were excluded.
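
To illustrate why the equality predicate can never select global entries, here is a minimal in-memory sketch. The KnowledgeEntry shape, the sample rows, and both selector functions are assumptions made for illustration; the real schema and query code are not shown in this PR.

```typescript
// Hypothetical row shape; the real schema is not shown in this PR.
interface KnowledgeEntry {
  id: number;
  projectId: string | null; // null marks a global/user-level entry
  title: string;
}

const entries: KnowledgeEntry[] = [
  { id: 1, projectId: "proj-a", title: "Local note" },
  { id: 2, projectId: null, title: "Global note" },
  { id: 3, projectId: null, title: "Global note (variant)" },
];

// Mirrors `WHERE project_id = ?`: in SQL, comparing NULL with `=` is never
// true, so rows with a NULL project_id are not selected by this predicate.
function selectForProject(rows: KnowledgeEntry[], projectId: string): KnowledgeEntry[] {
  return rows.filter((row) => row.projectId !== null && row.projectId === projectId);
}

// Mirrors `WHERE project_id IS NULL`: the only way to reach global rows.
function selectGlobal(rows: KnowledgeEntry[]): KnowledgeEntry[] {
  return rows.filter((row) => row.projectId === null);
}

const localIds = selectForProject(entries, "proj-a").map((row) => row.id);
const globalIds = selectGlobal(entries).map((row) => row.id);
```

No project id passed to the first selector will return entries 2 and 3, which matches the behaviour described above.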

Fix

  • Extract core dedup logic into _dedup(entries, dryRun) private helper
  • Add deduplicateGlobal() that queries WHERE project_id IS NULL
  • Wire it into the CLI: global dedup runs after all per-project passes and is displayed under a "Global" heading
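
The shape of this refactor could look roughly like the sketch below. Only the names _dedup, deduplicate, deduplicateGlobal and the dryRun flag come from this PR; the KnowledgeStore class, the entry shape, and the return type are hypothetical, and the exact-text grouping is a stand-in for the real similarity clustering, which is not shown here.

```typescript
// Hypothetical entry shape; the real schema is not shown in this PR.
interface KnowledgeEntry {
  id: number;
  projectId: string | null; // null marks a global/user-level entry
  text: string;
}

interface DedupResult {
  clusters: KnowledgeEntry[][];
  removedIds: number[];
}

class KnowledgeStore {
  private entries: KnowledgeEntry[];

  constructor(entries: KnowledgeEntry[]) {
    this.entries = entries.slice();
  }

  // Shared core. Stand-in rule (an assumption): entries with identical
  // normalized text form a cluster; the first is kept, the rest are duplicates.
  private _dedup(candidates: KnowledgeEntry[], dryRun: boolean): DedupResult {
    const groups = new Map<string, KnowledgeEntry[]>();
    for (const entry of candidates) {
      const key = entry.text.trim().toLowerCase();
      const group = groups.get(key) ?? [];
      group.push(entry);
      groups.set(key, group);
    }
    const clusters = Array.from(groups.values()).filter((g) => g.length > 1);
    const removedIds: number[] = [];
    for (const cluster of clusters) {
      for (const dup of cluster.slice(1)) removedIds.push(dup.id);
    }
    if (!dryRun) {
      // Only a real run mutates the store; a dry run just reports.
      const drop = new Set(removedIds);
      this.entries = this.entries.filter((e) => !drop.has(e.id));
    }
    return { clusters, removedIds };
  }

  // Project-local pass: mirrors WHERE project_id = ?.
  deduplicate(projectId: string, dryRun = true): DedupResult {
    return this._dedup(this.entries.filter((e) => e.projectId === projectId), dryRun);
  }

  // Global pass: mirrors WHERE project_id IS NULL.
  deduplicateGlobal(dryRun = true): DedupResult {
    return this._dedup(this.entries.filter((e) => e.projectId === null), dryRun);
  }

  size(): number {
    return this.entries.length;
  }
}

const store = new KnowledgeStore([
  { id: 1, projectId: "proj-a", text: "Use bun test" },
  { id: 2, projectId: null, text: "Prefer small PRs" },
  { id: 3, projectId: null, text: "prefer small PRs " },
]);

const localResult = store.deduplicate("proj-a");
const globalDryRun = store.deduplicateGlobal(true);
const sizeAfterDryRun = store.size();
const globalApplied = store.deduplicateGlobal(false);
const sizeAfterApply = store.size();
```

Keeping the clustering in one helper means the two public methods differ only in which entries they select, so a future scope would need only a new selector.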

Verified

Before: lore data dedup showed 0 clusters for global entries despite visible duplicates.
After: correctly finds 3 duplicates across 2 clusters in global entries (e.g. 3 variants of "Bun mock.module() pollutes...").

BYK added 2 commits May 13, 2026 16:42
0.93 still caught false positives via star clustering — a hub entry
similar to two distinct entries (at 0.9326 and 0.958) pulled all three
into one cluster. 0.935 excludes 0.9326 while keeping genuine dupes.

Empirical distribution (312 Nomic v1.5 entries):
- 0.935+: all genuine duplicates
- 0.92-0.935: false positives from same-subsystem entries
- <0.92: related-but-distinct or noise

Refs #292
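
The star-clustering effect described in this commit message can be reproduced with a small sketch. The two hub similarities (0.9326 and 0.958) are the values quoted above; the entry names, the 0.9 similarity between the two non-hub entries, and the starCluster function itself are assumptions, since the actual clustering code is not part of this PR.

```typescript
type Pair = [string, string, number];

const similarities: Pair[] = [
  ["hub", "a", 0.9326], // quoted above: the false-positive link
  ["hub", "b", 0.958],  // quoted above: assumed to be the genuine duplicate
  ["a", "b", 0.9],      // hypothetical value: "a" and "b" are distinct entries
];

// Symmetric lookup; unknown pairs count as unrelated.
function similarity(x: string, y: string): number {
  for (const [p, q, score] of similarities) {
    if ((p === x && q === y) || (p === y && q === x)) return score;
  }
  return 0;
}

// Simplified star clustering: each unassigned entry acts as a hub and pulls
// in every unassigned entry whose similarity to the hub meets the threshold.
// Members are compared to the hub only, never to each other.
function starCluster(ids: string[], threshold: number): string[][] {
  const assigned = new Set<string>();
  const clusters: string[][] = [];
  for (const hub of ids) {
    if (assigned.has(hub)) continue;
    const members = [hub];
    for (const other of ids) {
      if (other !== hub && !assigned.has(other) && similarity(hub, other) >= threshold) {
        members.push(other);
      }
    }
    if (members.length > 1) {
      for (const member of members) assigned.add(member);
      clusters.push(members);
    }
  }
  return clusters;
}

const loose = starCluster(["hub", "a", "b"], 0.93);   // [["hub", "a", "b"]]
const strict = starCluster(["hub", "a", "b"], 0.935); // [["hub", "b"]]
```

At 0.93 the hub drags "a" and "b" into one cluster even though they are only 0.9 similar to each other; at 0.935 the 0.9326 link falls away and only the hub and "b" remain clustered.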
deduplicate() only queried project-local entries (forProject with
includeCross=false), so global/user-level knowledge entries were
never checked for duplicates.

Extract core dedup logic into _dedup() helper, add deduplicateGlobal()
that queries entries with project_id IS NULL, and wire it into the
CLI's data dedup command.
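
The CLI wiring might be structured along these lines. The Deduper interface, the runDataDedup function, the report format, and the stub values are all hypothetical; only the ordering (per-project passes first, then a "Global" section) and the 3-duplicates-in-2-clusters figure come from this PR.

```typescript
interface DedupSummary {
  clusters: number;
  duplicates: number;
}

// Hypothetical interface standing in for the real module's API.
interface Deduper {
  deduplicate(projectPath: string, dryRun: boolean): DedupSummary;
  deduplicateGlobal(dryRun: boolean): DedupSummary;
}

// Runs every per-project pass first, then the global pass, and returns
// one report line per scope. The line format is illustrative only.
function runDataDedup(deduper: Deduper, projectPaths: string[], dryRun: boolean): string[] {
  const lines: string[] = [];
  for (const projectPath of projectPaths) {
    const summary = deduper.deduplicate(projectPath, dryRun);
    lines.push(`${projectPath}: ${summary.duplicates} duplicates in ${summary.clusters} clusters`);
  }
  const globalSummary = deduper.deduplicateGlobal(dryRun);
  lines.push(`Global: ${globalSummary.duplicates} duplicates in ${globalSummary.clusters} clusters`);
  return lines;
}

// Stub with placeholder project numbers and the global numbers reported
// in the Verified section of this PR.
const stub: Deduper = {
  deduplicate: () => ({ clusters: 0, duplicates: 0 }),
  deduplicateGlobal: () => ({ clusters: 2, duplicates: 3 }),
};

const report = runDataDedup(stub, ["/repo/a"], true);
```

Running the global pass last keeps the per-project output unchanged for existing users and appends the new section at the end.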
BYK merged commit 8a930a1 into main on May 13, 2026 (7 checks passed)
BYK deleted the fix/global-dedup branch on May 13, 2026 17:04
craft-deployer (bot) mentioned this pull request on May 13, 2026 (6 tasks)