Description
Tool Affected
All (cache / collect logic)
What Happened
After multiple runs of `apc collect`, the local cache contains 203 memory entries despite the source tools having only a handful of actual memory files/entries.
`apc memory list` shows the same entries repeated dozens of times:
[preference]
- Docker test pref (manual_add)
- Persist across collect (manual_add)
- Docker test pref (manual_add)
- Persist across collect (manual_add)
- Docker test pref (manual_add)
... (20+ duplicates of the same entries)
`apc status` reports `Memory: 203` while only ~10 unique entries exist across all tools.
Root Cause
The `merge_memory()` function in `src/cache.py` merges new entries into the existing cache, but the deduplication logic appears to fail for `manual_add` entries and for entries that share the same content but were re-extracted across multiple `collect` runs.
Each time `apc collect` runs, raw-file entries (e.g., `claude-code/CLAUDE.md`, `openclaw/MEMORY.md`) are re-added without checking if an identical entry (same `source_tool` + `source_file`) already exists.
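The failure mode is easy to reproduce with a toy version of the merge. This is a minimal sketch (function and field names are hypothetical, not the real `src/cache.py` schema) of the key-less append pattern, which duplicates the same entry on every run:

```python
def naive_merge(cache: list[dict], collected: list[dict]) -> list[dict]:
    # Buggy pattern: blindly append collected entries without checking
    # whether an identical (source_tool, source_file) entry already exists.
    return cache + collected

cache: list[dict] = []
collected = [{"source_tool": "claude-code", "source_file": "CLAUDE.md"}]
for _ in range(3):  # simulate three `apc collect` runs
    cache = naive_merge(cache, collected)
print(len(cache))   # grows to 3 for a single unique entry
```

With 20+ collect runs over ~10 unique entries, this pattern is enough to reach the observed 203 cached entries.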
What Was Expected
- `apc collect` should upsert memory entries: update existing ones, not append duplicates
- Entries with the same `source_tool` + `source_file` (or the same `entry_id` for manual entries) should never be duplicated
Impact
- LLM memory sync becomes extremely expensive (sending 200+ entries for a few unique facts)
- Output is unreadable
- Sync results are polluted with noise
Suggested Fix
In `merge_memory()`:
- For raw-file entries: deduplicate on `(source_tool, source_file)`
- For manual entries: deduplicate on `entry_id`
- For legacy entries: deduplicate on content hash or `entry_id`
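The rules above could be implemented by keying the cache on a per-entry dedup key and upserting into a dict. This is a sketch only, assuming entries are dicts with `source_tool`, `source_file`, `entry_id`, and `content` fields; the actual schema in `src/cache.py` may differ:

```python
import hashlib

def _dedup_key(entry: dict) -> tuple:
    """Hypothetical dedup key; field names are assumptions, not the real schema."""
    if entry.get("source_file"):        # raw-file entry
        return ("file", entry.get("source_tool"), entry["source_file"])
    if entry.get("entry_id"):           # manual entry
        return ("id", entry["entry_id"])
    # legacy entry: fall back to a hash of the content
    digest = hashlib.sha256(entry.get("content", "").encode()).hexdigest()
    return ("hash", digest)

def merge_memory(existing: list[dict], new: list[dict]) -> list[dict]:
    """Upsert: a new entry with the same key replaces the old one, never duplicates it."""
    merged = {_dedup_key(e): e for e in existing}
    for entry in new:
        merged[_dedup_key(entry)] = entry
    return list(merged.values())
```

Because the merge is idempotent, running `apc collect` any number of times with the same inputs leaves the entry count unchanged.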