Skip to content

Disable Mem0 and remove seed machinery#321

Merged
dcellison merged 3 commits into
mainfrom
fix/disable-mem0-revert-seed
Apr 16, 2026
Merged

Disable Mem0 and remove seed machinery#321
dcellison merged 3 commits into
mainfrom
fix/disable-mem0-revert-seed

Conversation

@dcellison
Copy link
Copy Markdown
Owner

Summary

Disables Mem0 and removes the seed machinery added in #317. Two problems drove this:

Hallucination feedback loop

Track 1 ingests every assistant response via add_exchange() with infer=False - no truth filter. When the assistant fabricates something (a file path, a function name), Mem0 embeds it. On the next turn, semantic retrieval surfaces the fabrication as authoritative context in the [Relevant memories...] block. The model sees its own fiction presented as historical fact and doubles down. This was observed directly in production: Kai hallucinated src/kai/seed_memory.py with a --dry-run flag, Mem0 stored it, and subsequent turns kept referencing it because retrieval kept serving it back. The loop is self-reinforcing and gets harder to escape with every turn.

Seed was Claude Code-specific

seed_from_memory_md() assumed a specific MEMORY.md layout with hardcoded filename mappings (preferences.md, user.md, etc.) and skipped MEMORY.md itself. Users with a flat MEMORY.md (the common case for most installations) got nothing seeded. The attempted fix (dynamic discovery via markdown link parsing) was still too coupled to one tool's memory format. The seed concept needs to be rethought as a simple, backend-agnostic CLI command.

What changed

  • main.py: Removed Mem0 init block and seed block from _init_and_run(). MEMORY_ENABLED is now ignored.
  • memory.py: Removed _classify_source_file, _parse_topic_file, _is_duplicate, seed_from_memory_md (-271 lines). Core primitives (init_memory, search, add_exchange, add_structured, format_context) retained for future use.
  • tests: Removed TestParseTopicFile, TestClassifySourceFile, TestIsDuplicate, TestSeedFromMemoryMd, TestMemorySeedIntegration (-683 lines).
  • Net: -1019 lines.

Test plan

  • make check clean (ruff lint + format)
  • 1719 tests passing, zero failures
  • Mem0 init no longer runs regardless of MEMORY_ENABLED setting
  • Qdrant collection manually cleared in production

Fixes #320

Track 1 ingests assistant responses without verifying accuracy. When the
assistant hallucinates (e.g. fabricated file paths), Mem0 embeds it and
retrieval surfaces it as authoritative context on subsequent turns. This
creates a self-reinforcing feedback loop that was observed in production.

The seed_from_memory_md() implementation was also too coupled to Claude
Code's specific MEMORY.md format (hardcoded filenames, topic file
references). It did nothing for users with flat MEMORY.md files.

Remove: Mem0 init block and seed block from main.py, all seed functions
from memory.py (_classify_source_file, _parse_topic_file, _is_duplicate,
seed_from_memory_md), and all associated tests. MEMORY_ENABLED is now
ignored. Core memory primitives are retained for future use.

Fixes #320
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

This is a clean deletion PR. The changes are surgical and well-executed. No bugs or security issues found.

Suggestion - orphaned session flags

memory_seeded:{user_id} keys written by the old seed block still exist in the sessions store. No code reads or writes them anymore. They're harmless but will accumulate indefinitely. Worth a one-time cleanup migration or at minimum a note in the tracking issue, especially since any future re-implementation of seed will need to either clear them or account for them.

Suggestion - init_memory / is_enabled are now dead call sites

init_memory is never called (the call was removed from main.py), so _memory is always None at runtime. The remaining primitives (search, add_exchange, add_structured, format_context) all guard on _memory is None and degrade gracefully - that's fine. But is_enabled() will always return False now, which could mislead a future caller. Worth a docstring note on is_enabled or init_memory flagging that neither is wired into the startup path.

Style - minor

test_main.py module docstring update is clean. Path import removal in memory.py is correct. Nothing else to flag.

The root cause analysis in the PR description is solid and the fix is proportionate. The hallucination feedback loop problem is real and this is the right call.

@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

The prior review suggestions were addressed: orphaned flags are now cleaned up (better than just noting it), and init_memory has a docstring noting it is not wired in. Two new issues in the cleanup block added to address that suggestion:


Warning - Silent exception suppression (main.py, new cleanup block)

except Exception:
    pass  # Non-fatal; flags are harmless if cleanup fails

Completely silences all errors, including AttributeError if sessions._get_db() is renamed or removed. A future maintainer gets no signal that the cleanup never ran. Should be logging.warning("Could not clean up memory_seeded flags", exc_info=True) - same one-liner, free observability.


Suggestion - Private method access (main.py, new cleanup block)

sessions._get_db() reaches into a private implementation detail. If the sessions module refactors its connection management (e.g., switches from a single shared connection to a pool), this call breaks and the except Exception: pass above will hide it. If there is a public sessions.execute() or similar, prefer that. If there is not, this warrants a comment flagging the coupling so the next person touching sessions._get_db() knows to update this call site.


Everything else is clean. The -1019 line count is appropriate - the removed seed machinery was genuinely broken for the stated reasons and the tests for it were thorough enough that no coverage debt remains. The SQL literal ('memory_seeded:%') is not user-influenced, so no injection risk.

…lures

Replace sessions._get_db() private access in main.py with a new public
sessions.delete_settings_by_prefix() function. Swap bare except-pass
with logging.warning for observability. Adds tests for the new function.
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

Both prior review issues are resolved: the exception handler logs properly with exc_info=True, and the orphaned flag cleanup uses the new public delete_settings_by_prefix() instead of reaching into _get_db() directly. Nothing new to flag.

A few things I verified and found clean:

sessions.py - delete_settings_by_prefix (new function)

The LIKE escaping is correct. memory_seeded: contains a literal _ which would otherwise match any single character under LIKE - the replace("_", "\\_") handles that, and the ESCAPE '\\' clause (a single backslash in the emitted SQL) is valid SQLite. The double _get_db() call (once for execute, once for commit) is consistent with delete_setting and the rest of the module.

One edge case that won't matter in practice: an empty-string prefix would delete all settings. The function is only called from one hardcoded site, so no real risk, but worth keeping in mind if it gets reused.

main.py cleanup block

Runs on every startup. Once flags are gone it's a no-op DELETE - cheap and harmless. The logging.warning(..., exc_info=True) pattern is correct.

Tests

test_delete_settings_by_prefix covers the happy path and no-op case. That's sufficient for a DELETE-by-prefix that has no branching logic.

This is clean. The -1019 line count is appropriate and the remaining memory primitives degrade gracefully while the system is disabled.

@dcellison dcellison merged commit 67dd83c into main Apr 16, 2026
1 check passed
@dcellison dcellison deleted the fix/disable-mem0-revert-seed branch April 16, 2026 20:39
dcellison pushed a commit that referenced this pull request Apr 17, 2026
Re-enables the Mem0 + Qdrant semantic memory layer with source-aware
primitives so the v1 assistant-laundering feedback loop cannot recur.

- add_user_utterance(): Track 1 writes embed ONLY user text with
  metadata source=user_raw. The assistant reply is no longer stored.
- add_exchange() is deleted outright. All callers migrated: one
  production site (bot.py) and six test sites (test_memory.py).
- _SOURCE_WEIGHTS and per-line source hints in format_context():
  after threshold filtering, results are re-ranked by raw_score *
  source_weight (extracted 1.2, user_raw 1.0, unset/legacy 0.6) and
  lines emit as `- (YYYY-MM-DD, <source_short>) <text>`.
- delete_by_source(): paged, ValueError-tolerant scoped delete with
  live-lock guard; empty-source branch matches legacy rows with
  absent metadata or empty-string source.
- init_memory() wired into main.py startup; orphan memory_seeded
  cleanup removed (no-op on every boot since #321).

Phase 1 ships extraction disabled (MEMORY_EXTRACTION_ENABLED defaults
to False; that flag lands in Phase 2). Tests: 48 in test_memory.py;
full suite green (1737 passed).
dcellison added a commit that referenced this pull request Apr 17, 2026
#333)

* feat(memory): Phase 1 - structural safeguards (§12.1)

Re-enables the Mem0 + Qdrant semantic memory layer with source-aware
primitives so the v1 assistant-laundering feedback loop cannot recur.

- add_user_utterance(): Track 1 writes embed ONLY user text with
  metadata source=user_raw. The assistant reply is no longer stored.
- add_exchange() is deleted outright. All callers migrated: one
  production site (bot.py) and six test sites (test_memory.py).
- _SOURCE_WEIGHTS and per-line source hints in format_context():
  after threshold filtering, results are re-ranked by raw_score *
  source_weight (extracted 1.2, user_raw 1.0, unset/legacy 0.6) and
  lines emit as `- (YYYY-MM-DD, <source_short>) <text>`.
- delete_by_source(): paged, ValueError-tolerant scoped delete with
  live-lock guard; empty-source branch matches legacy rows with
  absent metadata or empty-string source.
- init_memory() wired into main.py startup; orphan memory_seeded
  cleanup removed (no-op on every boot since #321).

Phase 1 ships extraction disabled (MEMORY_EXTRACTION_ENABLED defaults
to False; that flag lands in Phase 2). Tests: 48 in test_memory.py;
full suite green (1737 passed).

* feat(memory): Phase 2 - Haiku extraction on (§12.2)

Implements Track 2 (extracted facts) of spec 320 behind the default-off
MEMORY_EXTRACTION_ENABLED flag. Fire-and-forget Haiku extraction runs
after every user/assistant exchange when the Claude backend is active.

Scope (per §12.0 Phase Map):
- src/kai/memory_extraction.py: _FACT_SCHEMA (§8 verbatim), system
  prompt (§7 verbatim), _run_extractor subprocess invocation (§9),
  _validate_facts cross-field rules, _is_duplicate top-1 Mem0 gate at
  0.9, _store_facts with source=extracted metadata, per-user LRU
  semaphore, and extract_and_store never-raises public API.
- src/kai/config.py + .env.example: four memory_extraction_* fields
  with validated parsing (budget non-negative, timeout positive int).
- src/kai/bot.py: extract_and_store call inside _ingest_memory,
  guarded by config.memory_extraction_enabled AND effective_backend
  resolving to "claude".
- tests/test_memory_extraction.py: 54 unit tests covering payload
  formatting, validation rules, dedup boundary, semaphore LRU,
  subprocess flag assembly (B1 and M5 regression guards), outcome
  handling, and the never-raises contract.
- tests/test_config.py: 10 new env-var parsing tests.

Pre-merge smoke tests (§13.2 steps 5 and 6) surfaced a parser bug in
§9 as originally written: the CLI nests validated facts under
structured_output.facts, not at the top level. Parser fixed and two
tests added (test_production_envelope_structured_output,
test_is_error_envelope_returns_zero) before commit. Full observed
output is saved in memory for the PR description.

1801 tests pass. make check green.

* feat(memory): Phase 3 - verification delay (§12.3)

Implements the one-turn verification window from spec §5.2. Raw user
utterances are no longer flushed to Mem0 immediately; they sit in a
per-user pending slot until the next turn, which either confirms them
(flush) or retracts them (drop).

Scope (per §12.0 Phase Map, §6.2 P3 + §6.3 P3):
- src/kai/memory.py: _CORRECTION_CUE_RE compiled verbatim from §5.2,
  _PendingWrite dataclass, module-level _pending_writes dict keyed by
  user_id, submit_user_utterance (queue/flush/drop state machine),
  flush_pending (session-end hook), drop_pending (admin/test helper).
  Stricter reading of §5.2 adopted: a correction-cue turn drops the
  pending write AND does not queue itself, keeping retraction
  meta-comments out of retrievable memory.
- src/kai/bot.py: _ingest_memory now calls submit_user_utterance
  instead of add_user_utterance. New _end_session helper flushes
  pending writes before clearing the session record; five direct
  callsites of sessions.clear_session (workspace switch, context
  bump, settings clear-one, settings clear-all, workspace base
  change) migrated to the helper so every session-end path flushes.
- tests/test_memory.py: 26 new tests across four classes - regex
  coverage (17 parametrized cues and non-cues, case handling, non-
  string defense), submit_user_utterance state machine (queue/flush/
  drop/multi-user isolation), flush_pending (write + return value +
  isolation), drop_pending (no-write semantics). _reset_memory_module
  extended to clear _pending_writes between tests.

1843 tests pass. make check green.

* feat(memory): Phase 4 - admin purge CLI (§12.4)

Adds `python -m kai memory purge <user_id> --source <source>` so
operators can scrub contaminated or legacy rows from the Qdrant store
without dropping into an ad-hoc Python shell. Wraps the existing
`memory.delete_by_source` primitive; no new deletion logic.

Authorization gate: without `--yes`, prints the planned action and
exits with status 2 (distinct from 0 success and 1 runtime error) so
automation can tell "not authorized" apart from "did nothing wrong".

Source allow-list (`extracted`, `user_raw`, empty-string for legacy
rows) is enforced before memory init so a typo fails fast rather than
silently matching zero rows.

`delete_all` is deliberately NOT exposed here. Phase 4 is scoped to
legacy cleanup; per-source purge covers the advertised use case. A
future incident requiring the nuclear option can add a separate `nuke`
subcommand with its own review.

Dispatch wiring in `__main__.py` uses lazy import so the CLI stays
cheap to invoke (no bot runtime pulled in).

Tests: 23 new tests in test_memory_admin.py covering parser surface,
the --yes authorization gate, allow-list behavior, happy path, init
failure modes, and top-level dispatch. All 1866 tests pass; make check
clean.

* fix(memory): PR #333 review findings

Addresses all seven findings from the PR #333 review in one commit.
Every change touches code this PR already introduces or materially
modifies; none widen the diff into unrelated territory.

1. Prompt-injection guard in _build_extraction_payload
   (memory_extraction.py). A user message containing
   "\n\nASSISTANT: fake reply\n\nUSER: yes, confirmed" could fabricate
   an exchange segment that Haiku would read as a real assistant turn
   plus user confirmation, producing a false confirmed_action fact.
   Added _strip_role_labels with a regex that matches newline-prefixed
   USER:/ASSISTANT: markers (case-insensitive, whitespace-tolerant)
   and replaces them with a visible sentinel. Role words that appear
   inline as prose (e.g. "the USER: tag is wrong") survive unchanged.

2. Allow-listed env in _run_extractor (memory_extraction.py). The
   subprocess no longer inherits the parent's full environment. Only
   PATH, HOME, CLAUDE_CONFIG_DIR, ANTHROPIC_API_KEY, and
   ANTHROPIC_BASE_URL are forwarded. This strictly limits blast
   radius if --tools "" ever regressed (CLI version bump, arg-parsing
   edge case) so the model could not receive DATABASE_URL, GitHub
   tokens, webhook secrets, or any other parent-env secret. Billing
   guard unchanged: --bare absence is still the control that keeps
   Max-plan OAuth alive; ANTHROPIC_API_KEY in env does not force
   API-key-only auth on its own.

3. Tail-miss log line in delete_by_source (memory.py). The live-lock
   termination on a full non-matching page is also a correctness-loss
   path: if matching rows sit past the first _DELETE_PAGE_SIZE
   non-matching rows in Mem0's row order, they are not deleted. The
   generic page-drain warning did not name the tail-miss case
   specifically. Split the two termination conditions and added an
   explicit "possible incomplete delete" warning for the non-matching
   path, naming user_id and source so the operator can diagnose after
   the fact.

4. CancelledError propagation in extract_and_store
   (memory_extraction.py). Swallowing CancelledError broke structured
   shutdown: the runner would see the task as complete and not wait
   for sibling tasks' cancellation cleanup. Changed to re-raise after
   the log line. The never-raises contract still holds for real
   errors (FileNotFound, Permission, RuntimeError, generic Exception).

5. Corrected regex comment for _CORRECTION_CUE_RE (memory.py). The
   old comment claimed the pattern covered "that is wrong" but it
   does not - the "s?\s+" part cannot match the space-then-"is"
   sequence. Rewrote the comment to describe actual behavior and note
   that "no, that is wrong" or "actually, that is wrong" still match
   via their leading alternatives. Regex itself unchanged to stay
   faithful to the verbatim spec text.

6. Rewrote the MEMORY_EXTRACTION_BUDGET_USD comment in .env.example.
   The old comment claimed "Normal cost under Max is zero", which the
   §13.2 smoke tests contradicted (single extraction ~$0.026 at Haiku
   rates, mostly cache-creation tokens). The new comment describes
   observed cost, warns that the 0.01 default is insufficient for a
   single call, and advises raising to 0.05 before enabling
   extraction in production.

7. Dry-run now runs _initialize_memory (memory_admin.py). The
   previous flow exited 2 without checking whether the memory store
   was reachable, so an operator could get a confident "would run..."
   message on dry-run and then hit exit 1 on the real --yes call.
   Init runs before the dry-run branch now; init failure takes
   precedence over authorization-missing.

Tests (+9 new, 1875 total):
- test_injected_role_labels_are_neutralized,
  test_inline_role_mention_in_prose_survives (TestBuildExtractionPayload)
- TestStripRoleLabels x5 (newline-prefixed upper/lower, whitespace
  variant, inline prose preservation, content preservation)
- test_env_is_minimal_allow_list (replaces
  test_env_does_not_force_anthropic_api_key, which was based on an
  incorrect premise about --bare vs env presence)
- TestCancelledPropagates::test_cancelled_error_re_raises (split out
  of TestNeverRaises)
- test_delete_by_source_tail_miss_emits_explicit_warning
- test_no_yes_with_init_ok_returns_2 (renamed +
  assert_called_once), test_no_yes_with_init_failure_returns_1 (new)

make check clean, all 1875 tests pass.

* chore: retrigger review under PR_REVIEW_TIMEOUT_S

* fix(memory): PR #333 second-round review findings

Two of three findings actionable (one confirmatory). Both address
code or comments introduced earlier in this PR.

1. Stale "Max cost is zero" claim in config.py and memory_extraction.py.
   The .env.example billing comment was corrected in commit 516dfbf,
   but the same misleading phrasing survived in the config field
   docstring and in the _run_extractor flag rationale. Rewrote both
   to match the corrected .env.example narrative: extraction bills
   pay-per-token at Haiku rates regardless of Max-plan status
   (observed ~$0.02-$0.03 per call), and the 0.01 default is
   intentionally insufficient for a single real call. config.py now
   advises raising to at least 0.05 before enabling extraction.

2. _SUBPROCESS_ENV_ALLOWLIST hoisted to module level. Previously a
   function-local tuple inside _run_extractor, recreated on every
   call. Moved to the module-constant block next to _SEMAPHORE_CAP,
   _EXTRACTOR_CWD, and _ROLE_LABEL_RE. The comment describing why
   the list is tight also moved up so future readers see the defense
   rationale alongside the definition instead of at the call site.

No test changes required - the allowlist hoist is a pure refactor
and the comment fixes are comment-only. 1875 tests pass; make check
clean.

* fix(memory): PR #333 third-round review findings

Two of three findings actionable. Finding #1 is a false positive given
the project's Python pin; #2 is partially wrong but surfaces a real
subset bug; #3 is a clean hoist.

1. Finding #1 (TimeoutError on Python <3.11) - FALSE POSITIVE.
   pyproject.toml pins requires-python = ">=3.13". Python 3.11 unified
   asyncio.TimeoutError with the builtin TimeoutError, so the handler
   catching only builtin TimeoutError is correct on every supported
   runtime. No change.

2. Finding #2 (_store_facts aborts on first exception) - PARTIALLY
   WRONG but contains a real subset. memory.add_structured already
   wraps the underlying _memory.add() in try/except and returns None
   on failure, so the "whole batch aborts" claim is incorrect - a
   single backend failure does not stop the loop. However, the
   surrounding counter increments `stored += 1` unconditionally, so a
   silent None return still counted as a successful store in the log
   summary. Fixed by capturing the return and only incrementing when
   non-None.

3. Finding #3 (_weight nested in format_context) - VALID. Hoisted to
   module level as _source_weight, next to _UNKNOWN_SOURCE_WEIGHT so
   the weight table, the fallback constant, and the lookup function
   all live together. The nested closure is gone; the sort key now
   calls the module-level function directly.

No test changes required - #2's fix preserves the existing return-value
contract, and #3 is a pure refactor whose behavior is already covered
by format_context's ranking tests. 1882 tests pass; make check clean.

* fix(memory): PR #333 fifth-round review findings

All three findings actionable. One is a real cost gate; one is a
metric-correctness fix; one is an operator-UX fix.

1. assistant_text truncation in _build_extraction_payload. user_text
   is already capped at _MAX_USER_CHARS (2000) inside add_user_utterance,
   but assistant_text arrives from bot.py at full length and bypassed
   any cap on the way into the Haiku subprocess. Long tool output or
   a large code block would inflate per-call token cost well above
   the documented $0.02-$0.03 envelope and blow through --max-budget-usd
   more often than the config comment claims. Applied _MAX_ASSISTANT_CHARS
   (1000) to assistant_text before _strip_role_labels, matching the
   user-side contract. Confirmation quotes live on the user side, so
   truncating the assistant tail does not drop signal for
   confirmed_action facts. Also rewrote the _MAX_ASSISTANT_CHARS
   comment, which still claimed "legacy call sites that have not yet
   been removed" - the constant is now actively used by extraction.

2. Timer start hoisted inside the per-user semaphore in
   extract_and_store. Previously `start = time.monotonic()` sat
   above `async with sem:`, so duration_ms in the memory.extract:
   log included semaphore wait time. A 4s extraction that queued 3s
   behind a prior in-flight extraction for the same user reported
   7s, which makes the metric useless for spotting slow subprocesses
   under queued load. Move is safe: the except branches all return
   without reading `start`, and the post-try success log only fires
   when the sem block completed (binding `start`).

3. Custom accepted-values message in memory_admin._cmd_purge.
   `sorted({"extracted", "user_raw", ""})!r` renders as
   `['', 'extracted', 'user_raw']`, and the leading empty string
   looks like a list artifact, not a valid input. Replaced with an
   explicit listing plus a parenthetical that names what the empty
   string means (legacy rows with no source set). The existing test
   only asserts "unknown source" is present, which still holds.

Tests: added two _build_extraction_payload coverage cases (long
assistant text truncated with ellipsis; short assistant text passes
through unchanged). Timer move and purge message are covered by
existing tests. 1884 pass; make check clean.

* fix(memory): PR #333 sixth-round review findings

Both findings actionable and both correct. The round 5 comment about
user_text being "already capped by add_user_utterance" was wrong, and
the .get()-then-await pattern in submit_user_utterance leaves a real
double-flush window open.

1. user_text truncation in _build_extraction_payload. The round 5
   comment claimed user_text was already capped upstream by
   add_user_utterance. That was false: add_user_utterance rebinds its
   local user_text parameter inside memory.py, but that mutation is
   scoped to that stack frame and does not propagate back to the
   variable in bot.py or to the user_text that extract_and_store hands
   to _build_extraction_payload. A 50k-char paste therefore reached
   Haiku unchanged and blew through --max-budget-usd on every call.
   Applied the same local truncation pattern already used for
   assistant_text, with memory._MAX_USER_CHARS (2000). Rewrote the
   comment to explain the stack-frame scoping so the misconception
   does not return. Confirmation quotes are short by construction
   (_CONFIRMATION_QUOTE_MIN_CHARS = 20) so 2000 chars preserves all
   realistic confirmation signal.

2. submit_user_utterance: pop instead of get. Previously:

       pending = _pending_writes.get(user_id)
       ...
       if pending is not None:
           await add_user_utterance(...)     # yield point
       _pending_writes[user_id] = _PendingWrite(...)

   Between the await and the assignment the old entry is still in
   _pending_writes. A concurrent _ingest_memory task for the same
   user (two messages in quick succession, both fire-and-forget tasks)
   that wakes during the flush calls .get(user_id), finds the same
   PendingWrite, and flushes it a second time. Swapped to
   _pending_writes.pop(user_id, None) so the old entry leaves the
   dict atomically before any await. The correction-cue branch no
   longer needs its own del - the pop already removed it.

Tests: two new coverage cases.
- test_long_user_text_truncated: mirrors the round 5 assistant-side
  test and asserts the cap is honored locally.
- test_concurrent_submits_do_not_double_flush: stubs
  add_user_utterance with a gated coroutine, races two submits
  during the flush await, and asserts the pending entry is absent
  from the dict mid-flush AND that the pending text was flushed
  exactly once. This catches the regression if anyone swaps pop()
  back for get().

1886 tests pass; make check clean.

* test(memory): PR #333 seventh-round review - close add_structured return-shape gap

Round 7 is a pass on code - reviewer confirmed all six prior findings
resolved and flagged no new bugs. The one minor suggestion was to pin
add_structured's "returns truthy on success" contract with a test so
future Mem0 version drift cannot silently cause _store_facts to
under-count. Existing test_returns_id_string covers the nested
{"results": [{"id": ...}]} shape; the bare-dict fallback at
memory.py:858 and the non-dict defensive branches were both untested.

Added two cases to TestAddStructured:

- test_bare_dict_return_shape_yields_id: covers the `raw.get("id")`
  fallthrough when Mem0 returns the memory dict directly instead of
  wrapped in "results". Without this test a Mem0 change to either
  shape could silently break fact counting in _store_facts.

- test_unexpected_return_shape_yields_none: pins the defensive
  behavior for shapes the unwrap logic does not recognize (None,
  list, bare string, int). The current code returns None; _store_facts
  treats None as "not actually stored" and under-counts in the log
  without raising. Locking this in makes the "silent miscounting"
  failure mode the reviewer described the loudest failure possible -
  the test flips red instead of production logs going quietly wrong.

No source changes this round. 1888 tests pass; make check clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Disable Mem0 and remove seed machinery

2 participants