Skip to content

feat(memory): /memory Telegram command (spec 310)#337

Merged
dcellison merged 11 commits intomainfrom
feature/memory-command-310
Apr 18, 2026
Merged

feat(memory): /memory Telegram command (spec 310)#337
dcellison merged 11 commits intomainfrom
feature/memory-command-310

Conversation

@dcellison
Copy link
Copy Markdown
Owner

Summary

Implements spec 310: a Telegram-facing /memory command surface for browsing, searching, and forgetting Haiku-extracted facts. Closes #310.

Four focused commits:

  • C1 (6357894) Add MEMORY_SEARCH_FLOOR env var (default 0.3) and consolidate the prior _MIN_RELEVANCE_THRESHOLD constant into it. One knob, two paths: both format_context (retrieval) and /memory search (UI) read the same value at every invocation, so a config edit + restart produces consistent behavior across both paths.

  • C2 (99f9859) Pure-additive helpers and stats extensions in memory.py: get_by_tag, delete_by_id, a limit kwarg on get_all, plus nine extracted-only aggregate fields on MemoryStats. The new aggregates default to "no data" sentinels so the legacy two-arg construction still works. delete_by_id does a verify-before-delete on both user_id and metadata.source == "extracted" as defense-in-depth, even though Mem0's delete already scopes by user.

  • C3 (1323760) New module src/kai/memory_command.py plus 64 unit tests. Pure builders return (text, keyboard) tuples; top-level handlers own the I/O. Per-chat ephemeral cache with lazy 30-minute TTL and single-entry-per-chat semantics keeps every callback under Telegram's 64-byte limit (memory ids are referenced by index, never embedded). _TAG_ENUM mirrors the extraction schema with a drift-guard test so future schema changes can't silently exclude tags from /memory stats.

  • C4 (fd3663c) Wire the new handlers into bot.py and add a five-line /memory block to /help.

UX

Plain text rendering throughout (spec §6.0); user-originated content like fact text and confirmation_quote routinely contains Markdown-reserved characters and rendering them as MarkdownV2 would corrupt the display. Forget flows use answerCallbackQuery toasts rather than edit-then-sleep-then-edit, keeping the event loop free.

Dashboard hides zero-count tags but stats screen surfaces them as a tuning signal (the spec §6.1 asymmetry, enforced by builders, not the data layer).

Test plan

  • pytest tests/test_memory_command.py (64 new tests, all passing)
  • pytest tests/test_memory.py tests/test_config.py (full memory + config suites green; 244 passing, 9 deselected for Mem0 lock)
  • Full suite: 1977 passed, 9 deselected
  • make check clean (ruff lint + format)
  • Manual smoke test against running bot once merged: /memory, /memory search <q>, /memory stats, tag drill-down + back, fact view + forget confirmation

Notes

  • No feature flag: /memory only reads or deletes the caller's own memories (spec §9). Safe to expose by default.
  • No bot-wiring tests added; test_bot.py covers pure helpers only and the import smoke check verifies registration. The C3 suite already exercises every handler path through monkeypatched memory primitives.
  • Spec 335 (Track 1 removal) ships separately. The source == "extracted" filter on every read/write path makes this PR correct regardless of Remove Track 1 (raw-turn embedding) from memory system #335 ordering.

Promote the hard-coded `_MIN_RELEVANCE_THRESHOLD = 0.3` constant in
memory.py to a config field (`Config.memory_search_floor`, env var
`MEMORY_SEARCH_FLOOR`, default 0.3). This is the "one knob, two paths"
groundwork from spec 310: the same floor will govern both the
context-injection path (`format_context`) and the upcoming `/memory
search` UI in `memory_command.py`, so a config change applies
consistently everywhere Kai consults memory.

`format_context` now reads `_config.memory_search_floor` at every call
rather than a module-level constant, per the spec's requirement that
the value be sourced fresh so a runtime config edit + restart takes
effect without divergence between the two paths.

Validation: float in [0.0, 1.0] inclusive (Mem0 cosine similarity is
normalized to that range; out-of-range values would silently filter
everything or nothing). Same try/except SystemExit pattern as the
other memory_* numeric vars.

Defaulting to 0.3 means this commit is behaviorally a no-op in
production: the new field's default matches the constant it replaces.

Note on the env-var pattern: I followed the existing memory family
convention (document in .env.example only) rather than adding an
`install.py` `_cmd_config` prompt. None of the 8 prior `MEMORY_*` vars
have install prompts; adding one only for this var would be
inconsistent with the rest of the family.

Tests: 7 new TestMemorySearchFloor cases for the env var (default,
override, boundaries, rejections); 1 new TestFormatContext case that
pins the config-driven read (a row at score 0.4 passes when floor=0.3
but is filtered when floor=0.5). Updated _make_config helper to
include the new field default. All 1887 unit tests pass.

Refs: home/specs/310-memory-command.md §7.5
…7.2)

Adds the data-layer surface the /memory command (commit C3) will call.
Pure additions; no existing call sites touched.

memory.py:
- MemoryResult gains updated_at (defaulted to "") so the tag view can
  sort by most-recent-update without a second round-trip
- MemoryStats gains nine extracted-only aggregate fields used by the
  /memory stats screen: extracted_count, by_tag, confidence_min/median
  /max (None when no extracted rows; median uses median_low semantics
  so the value is always one that exists in the data), confidence
  _below_0_7 / _below_0_6 cliff counts, confirmation_quote_count, and
  by_prompt_version. All default-valued so legacy two-arg construction
  still works
- get_all gains a limit kwarg (default 1000, None for unlimited via a
  100k Mem0 ceiling) so the /memory dashboard pagination has a knob
- get_by_tag(*, user_id, tag): post-filters Mem0 results to source==
  extracted AND tag in metadata.tags, sorted by updated_at desc
- delete_by_id(*, user_id, memory_id): two-step verify (user_id match
  AND source==extracted) before delegating to Mem0's unscoped
  Memory.delete; ValueError on missing row is swallowed and reported
  as False so the caller can present a clean "not found" message

tests/test_memory.py:
- 26 new tests across TestGetAllLimit (3), TestGetByTag (6),
  TestDeleteById (8), TestGetStatsExtended (9). Verify scoping rules,
  default sentinels, sort order, and the verify-before-delete guard
New module src/kai/memory_command.py implementing the full /memory
Telegram command surface: dashboard, tag drill-down with pagination,
fact detail, semantic search, stats, and confirmed forget flows
(single fact and bulk by tag). Bot wiring for the new command and
callback handlers is intentionally deferred to commit C4 so this
commit can land with a focused test suite.

Architecture:
- Pure builders (_build_dashboard, _build_tag_view, _build_fact_view,
  _build_search_results, _build_stats, _build_forget_*_confirm)
  take fixture data and return (text, keyboard) tuples. No I/O, no
  state. Lets the unit tests assert on rendering without standing
  up Update/Bot fixtures.
- Top-level handlers (handle_memory_command, handle_memory_callback)
  own the I/O: fetch from memory.py, call the builders, dispatch
  send or edit calls.
- Module-level _screen_cache holds per-chat navigation state with
  lazy 30-minute TTL expiry on every access. Single-entry-per-chat
  semantics (spec 310 §7.4): a second /memory invocation overwrites
  the prior cache, and stale fact-button taps trip the
  session-expired branch instead of resolving wrong ids.
- Callback grammar mem:<verb>[:<arg>...] keeps every legitimate
  callback under 64 bytes; memory ids are never embedded, only
  referenced by integer index into the cache.

UX details from spec:
- Plain text rendering throughout (spec §6.0); user-originated
  content like fact text and confirmation_quote routinely contains
  Markdown-reserved characters and rendering them as MarkdownV2
  would either corrupt the display or require escaping every
  dynamic string.
- Tag dashboard hides zero-count tags; stats screen shows them as
  a tuning signal (the spec §6.1 asymmetry, enforced by the
  builders not the data layer).
- Search applies the floor in the UI path the same way
  format_context does in the retrieval path (spec §7.5: one knob,
  two paths).
- Forget flows use callback toasts via answerCallbackQuery rather
  than edit-then-sleep-then-edit, keeping the event loop free.

tests/test_memory_command.py: 64 unit tests across 12 classes
covering callback round-trips, every pure builder, pagination edge
cases (empty, exact boundary, partial last, out-of-range clamping),
all empty-state branches, the 30-minute cache TTL, and a drift
guard that asserts _TAG_ENUM stays in sync with the extraction
schema. No real Mem0 instance; handler tests monkeypatch the
memory.py read and delete helpers per spec §10.3.
Registers the two handlers from src/kai/memory_command.py with the
Telegram application:
- CommandHandler("memory") -> handle_memory_command, dispatching the
  subcommand surface (no-args dashboard, search, stats, forget, help).
- CallbackQueryHandler(pattern=r"^mem:") -> handle_memory_callback
  for inline-keyboard taps (tag drill-down, fact view, forget
  confirmations).

The pattern is exclusive by prefix with the other registered
callback handlers (^model:, ^voice:, ^ws:), so dispatch is
unambiguous.

Updates the /help text with a five-line /memory block placed after
the /github group, matching its data-management role.
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

The _is_authorized reimplementation matches bot.py exactly - no auth bypass risk there.


PR Review: feat(memory): /memory Telegram command (spec 310)

Overall this is a well-structured PR with thorough tests and good defensive coding. A few real issues worth addressing:


Warning

1. assert used for runtime validation in _encode_callback (memory_command.py:228)

assert len(encoded.encode("utf-8")) <= 64, f"callback_data too long: {encoded!r}"

assert statements are stripped when Python runs with -O (optimize flag). If the service ever starts with optimization enabled, Telegram would silently truncate over-limit callback data, producing confusing runtime failures. Replace with a proper if/raise:

if len(encoded.encode("utf-8")) > 64:
    raise ValueError(f"callback_data too long: {encoded!r}")

2. _avg_confidence is dead code (memory_command.py:323)

The function is defined and documented but never called. _build_dashboard reads stats.confidence_median directly:

median = stats.confidence_median  # line ~395

Either remove _avg_confidence or route _build_dashboard through it. As-is it creates misleading indirection for future readers.


3. ftd verb accepts unvalidated tag from callback data (memory_command.py:828)

tag = args[0]
facts = memory.get_by_tag(user_id=str(chat_id), tag=tag)

An authorized user who crafts mem:ftd:arbitrary-string bypasses the /memory forget <tag> confirmation flow entirely. The behavior is safe (returns empty results → "Forgot 0 facts."), but for a known tag it allows skipping the "This cannot be undone" confirmation screen. A one-line guard closes it:

if tag not in _TAG_ENUM:
    await query.answer(_MSG_SESSION_EXPIRED)
    return

Suggestion

4. message_id stored but never read (memory_command.py throughout)

_ScreenCache.message_id is written in _send_or_edit (sent.message_id) but nothing in the module ever reads it. The edit path uses callback_query.edit_message_text which doesn't need it. If this is scaffolding for a future "single message per chat" design, note it; if not, remove the field and the write to keep the cache shape honest.

5. context parameter on _send_or_edit is unused (memory_command.py:1057)

The function accepts context: ContextTypes.DEFAULT_TYPE | None but never references it. Several callers already pass None. Drop the parameter - it's API surface with no behavior.

6. Two full-corpus fetches per forget action

_send_fact_view calls get_all(limit=None) to resolve a memory_id, then _send_forget_fact_confirm calls get_all(limit=None) again to fetch the same fact. For a user with thousands of extracted facts, that's two 100k-cap queries to find one row. A narrow get_by_id wrapper in memory.py (using the already-present _memory.get(memory_id=...) call from delete_by_id) would make both callers O(1).

7. Misleading confidence display when all confidence values are malformed (memory_command.py:736)

In _build_stats, when extracted_count > 0 but every row has a non-numeric confidence, stats.confidence_min/median/max are all None and the display falls back to 0.00. The 0.0 sentinel is indistinguishable from a true minimum score of zero. Using "n/a" as the fallback string (instead of 0.0) would match the existing behavior in the fact detail view.


No issues found with the data-layer changes in memory.py, the config validation, or the test coverage. The delete_by_id ownership checks are correct and the TOCTOU window on the forget-count label is cosmetic only (delete re-fetches the live list).

Seven follow-ups from the PR #337 review:

1. _encode_callback: replace `assert` with `if/raise ValueError` for
   the 64-byte ceiling. assertions are stripped under `python -O`,
   which would let an over-limit callback silently truncate at the
   Telegram boundary.

2. Drop the dead `_avg_confidence` helper. It was a one-line wrapper
   that just returned `stats.confidence_median`; the dashboard
   builder now references the field directly.

3. Validate the tag arg in the `ftd` callback handler against
   `_TAG_ENUM` before calling get_by_tag/delete_by_id. Mirrors the
   gate already present on the /memory forget text path; closes a
   crafted-callback / version-skew gap.

4. Remove the unused `message_id` field from `_ScreenCache` and the
   write site in `_send_or_edit`. Was set on fresh sends but never
   read.

5. Drop the unused `context` parameter from `_send_or_edit` and
   update its 15 call sites. Helper-layer signatures keep `context`
   per python-telegram-bot handler convention.

6. Add `memory.get_by_id(*, user_id, memory_id)` with the same
   ownership/source scoping as `delete_by_id`, then refactor
   delete_by_id to call it. Wire `_send_fact_view` and
   `_send_forget_fact_confirm` to use it - O(1) Mem0 get vs the
   prior `get_all` + Python-side filter on every fact-view tap.

7. Stats screen confidence min/median/max: render "n/a" when the
   underlying field is None instead of "0.00". A real 0.00 reading
   would be indistinguishable from missing data otherwise.

Tests added:
  - TestCallbackCodec.test_overlong_callback_raises (#1)
  - TestCallbackDispatch.test_ftd_rejects_unknown_tag (#3)
  - TestGetById class with 7 cases covering all four not-found
    branches, ownership, source, fetch error, happy path (#6)
  - TestBuildStats.test_renders_n_a_for_missing_confidence (#7)
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

Looking at only new issues; prior-review items that were addressed (assertion→raise, ftd validation, two-fetch pattern, context parameter) are not re-raised.


Warning

1. tag verb in callback handler accepts unvalidated tag (memory_command.py, handle_memory_callback verb == "tag" block)

tag = args[0]
# no _TAG_ENUM check
await _send_tag_view(update, context, chat_id, tag, page, edit=True)

ftd got the _TAG_ENUM guard in this PR, but the read-only tag verb did not. A crafted mem:tag:arbitrary-string:0 callback reaches memory.get_by_tag with an off-enum string. The call is safe today (returns empty, renders "No memories with this tag."), but it silently accepted by a path that is documented as operating only over enum-defined tags, and the inconsistency with ftd is a maintenance trap. One-line fix mirrors the ftd guard:

if tag not in _TAG_ENUM:
    await query.answer(_MSG_SESSION_EXPIRED)
    return

Suggestion

2. _send_or_edit silent no-op when edit=True but callback_query is None (memory_command.py:1057 area)

if edit and update.callback_query is not None:
    ...  # only branch that sends when edit=True

if not edit:
    ...  # not reached when edit=True and callback_query is None

When edit=True and callback_query is None, neither branch fires and nothing is sent. This path is unreachable today (all edit=True callers are inside handle_memory_callback where callback_query is guaranteed), but the silence is a trap for future callers of the _send_* helpers. An assert update.callback_query is not None at the top of the if edit branch, or a final else: raise after both branches, would surface the contract violation immediately instead of producing a ghost action.


Clean

Data layer (memory.py additions), config validation, get_by_id ownership scoping, the ftd enum guard, and the _encode_callback hardening all look correct. Test coverage for the new helpers is thorough.

Two follow-ups from the second review pass on PR #337:

1. The read-only `tag` callback verb was missing the `_TAG_ENUM`
   guard that round 1 added to the `ftd` verb. A crafted
   `mem:tag:arbitrary-string:0` callback would reach
   memory.get_by_tag with an off-enum tag. Safe today (returns
   empty), but inconsistent with `ftd` and the /memory forget
   text path, both of which validate. Mirror the same gate.

2. `_send_or_edit` had a silent no-op branch when `edit=True` but
   `update.callback_query is None`. Unreachable today (all
   `edit=True` callers live in `handle_memory_callback` where the
   invariant holds), but a future caller wiring up the `_send_*`
   helpers from outside the callback path would get a ghost
   action with no diagnostic. Raise instead of asserting so the
   gate survives `python -O` (same reasoning as the
   `_encode_callback` ceiling fix in round 1).

Tests added:
  - TestCallbackDispatch.test_tag_verb_rejects_unknown_tag (#1)
  - TestSendOrEditContract: 3 cases covering the raise, the
    legitimate edit path, and the fresh-send path (#2)
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

Review: feat(memory): /memory Telegram command (spec 310)

Prior review items (assertion→raise, ftd enum guard, two-fetch pattern, context parameter, tag verb guard, _send_or_edit silent no-op) are all correctly addressed. The data layer additions in memory.py are solid. A few new observations on the changed code:


Suggestion

1. confidence_below_0_7 / confidence_below_0_6 display when all confidence values are malformed (memory_command.py, _build_stats)

The prior review caught that confidence_min/median/max should render "n/a" instead of 0.00 when no numeric confidence values exist - that fix is in. But confidence_below_0_7 = 0 still displays as 0 (0.0%) in the same state, which reads as "all facts scored ≥ 0.7" rather than "no confidence data". The "n/a" on min/median/max already signals the data issue, so this is low priority, but a parenthetical like (n/a) would be more honest when len(confidences) == 0.

2. assert update.message is not None in _send_forget_tag_confirm (memory_command.py, two spots)

except Exception as exc:
    ...
    assert update.message is not None          # stripped by -O
    await update.message.reply_text(_MSG_QUERY_FAILED)

These follow the bot.py convention, but they're inconsistent with this module's own _send_or_edit, which was explicitly hardened to use raise ValueError over assert for the same -O reason. Since _send_forget_tag_confirm is only called from the command handler where message is guaranteed, this won't bite in practice, but the asymmetry is a maintenance trap.

3. _set_cache TTL semantics comment vs. constant name

_CACHE_TTL_S is named and documented as "30 minutes", but _set_cache resets created_at on every write, making it an idle timeout rather than an absolute session timeout. The constant name and the .env.example comment imply absolute; the implementation delivers idle. Worth clarifying in the constant's docstring.


Clean

get_by_id ownership check, get_by_id-based O(1) fact fetch, ftd/tag enum guards, _encode_callback raise-vs-assert, _send_or_edit contract enforcement, config validation range checks, and the source == "extracted" filter layering all look correct. The 64-byte callback ceiling is comfortably met by every encoded callback in the visible diff (mem:tag:confirmed_action:99 = 26 bytes). Auth scoping is consistent with bot.py.

Three suggestions from the third review pass on PR #337:

1. `_build_stats` below-threshold rows now render `(n/a)` instead
   of `(0.0%)` when no confidence data exists. Mirrors the n/a
   fallback added to min/median/max in round 1: a count of 0 with
   "0.0%" reads as "all facts scored above threshold" rather than
   "no data", which contradicts the n/a labels right above it.

2. `_send_forget_tag_confirm` now uses `if/raise ValueError`
   instead of two `assert update.message is not None` checks.
   Matches the same `python -O` survival reasoning applied to
   `_send_or_edit` and `_encode_callback` in the prior rounds.
   Hoisted the check to a single early guard so the function body
   needs only one narrowing point for the type-checker.

3. `_CACHE_TTL_S` docstring now states the timeout is *idle*
   (reset on every write), not absolute. The constant name and
   spec reference both implied a hard 30-minute session ceiling;
   the implementation actually resets `created_at` on every
   `_set_cache`, so an active user keeps the cache alive
   indefinitely. Doc-only fix.

Tests added:
  - TestBuildStats.test_renders_n_a_for_missing_confidence
    extended to assert (n/a) on the below-threshold rows (#1)
  - TestBuildStats.test_renders_percentages_when_confidence_present
    regression guard against the n/a fallback over-firing (#1)
  - TestSendOrEditContract.test_send_forget_tag_confirm_raises_without_message
    covers the new contract gate (#2)
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

Based on my analysis of the diff directly (the code doesn't exist locally - this is a pre-merge review):


Review: feat(memory): /memory Telegram command (spec 310)

Prior items (assert→raise, ftd/tag enum guards, get_by_id O(1) path, context param, _send_or_edit contract, confidence n/a display, _set_cache TTL comment) are all correctly addressed. The data layer additions in memory.py are solid.


Warning

1. Double query.answer() in the outer error handler (memory_command.py, handle_memory_callback)

Every verb block calls query.answer() before its _send_* call. If the subsequent _send_* raises (e.g., a network error escaping _send_or_edit's fallback path, or the assert update.effective_chat is not None firing), the outer except handler calls query.answer(_MSG_QUERY_FAILED) on an already-answered query. Telegram rejects the second call, raising another exception inside the error handler itself - an unhandled exception that PTB logs but the user never sees.

Fix: track whether the query has been answered, or move query.answer() to after the risky operation:

# Before:
await query.answer()
await _send_dashboard(update, context, chat_id, edit=True)

# After:
await _send_dashboard(update, context, chat_id, edit=True)
await query.answer()

The ftd verb is the one exception where the answer carries a count - that one could call query.answer() last too, after the delete loop and dashboard refresh complete.


Suggestion

2. ftd bulk delete: N extra Mem0 get calls (memory_command.py, handle_memory_callback, verb == "ftd" block)

facts = memory.get_by_tag(user_id=str(chat_id), tag=tag)
for fact in facts:
    if memory.delete_by_id(...):
        deleted += 1

get_by_tag already fetched and verified ownership + source for every row in facts. Each delete_by_id call re-fetches the same row via get_by_id (another Mem0 get round-trip) before deleting. For N facts, that's N redundant fetches on top of the initial corpus fetch. For typical tag sets this is invisible, but for a user who built up 200 preference facts the extra 200 get calls add meaningful latency. An internal _delete_verified path that skips re-verification for rows already in hand would avoid this; alternately, the ftd loop could call _memory.delete(memory_id=fact.id) directly and catch ValueError for the TOCTOU race, since get_by_tag already did the authorization work.

3. _build_stats percentage denominator when mixed confidence data (memory_command.py, _build_stats)

pct_07 = (stats.confidence_below_0_7 / stats.extracted_count) * 100

confidence_below_0_7 counts only rows with numeric confidence below 0.7. extracted_count counts all extracted rows including those whose confidence field is non-numeric (skipped by get_stats). When a fraction of rows have malformed confidence, the percentage is understated relative to rows-with-data. The has_confidence guard already handles the all-None case cleanly; this is the partial case. Low priority - requires deliberately malformed rows to manifest.

4. Stale keyboard after _send_or_edit fallback-to-send (memory_command.py, _send_or_edit)

When edit_message_text raises a non-"not modified" BadRequest, the function logs and falls through to send_message. This sends a new message but leaves the original message with its inline keyboard still visible in chat. Buttons on the stale message still trigger callbacks, which will produce odd navigation (double-dashboard, etc.). Wrapping the original message's keyboard with an empty edit before the fallback send would clean this up, though it adds a second Telegram call on what is already an error path.


Clean

get_by_id ownership/source scoping, delete_by_id TOCTOU handling, the has_confidence n/a guard, callback size ceiling, _encode_callback hardening, cache TTL semantics comment, tag/ftd enum parity, and the full config validation chain all look correct. Auth scoping is consistent with bot.py.

…d on fallback

Round 4 review fixes (PR #337).

#1 Double-answer bug: callback handlers previously called
query.answer() BEFORE _send_*. If the send raised, the outer
except would call query.answer(_MSG_QUERY_FAILED), which Telegram
rejects as BadRequest because the query was already answered,
producing an unhandled exception in the error path itself.

Fix: every dispatch branch now calls _send_* first, then
query.answer(). The outer except is therefore guaranteed to be
answering an unanswered query, so its single answer call is
always valid. The visual cost is the loading spinner persists
slightly longer (until the send returns), which is invisible at
typical Mem0 latencies.

#4 Stale keyboard on fallback: when edit_message_text fails
(BadRequest other than "not modified") we fall back to sending a
fresh message. The original message's inline keyboard remained
visible in chat history, so a user tap would fire a callback
against state we have moved past, producing ghost navigation.

Fix: best-effort edit_message_reply_markup(reply_markup=None)
strips the buttons before the fresh send. Wrapped in inner
try/except since the original message may already be gone or
immutable - this is already an error path and the fresh send
is what matters.

Tests: regression for #1 uses the `ftd` verb (whose Mem0 call
escapes the dispatcher with no inner guard) and asserts
query.answer is awaited exactly once with _MSG_QUERY_FAILED.
Two new tests for #4 cover the happy fallback path
(reply_markup=None called) and the swallowed-error path
(BadRequest from clear is not propagated).
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

Looking only at code that changed in this diff; prior-review items not re-raised.


Warning

1. _send_search does not filter by source == "extracted" (memory_command.py, _send_search)

results = memory.search(query, user_id=str(chat_id), limit=_SEARCH_LIMIT)
filtered = [r for r in results if r.score >= floor]

The floor is applied but memory.search() searches across all sources. Every other read path in this module goes through get_by_tag or get_by_id, both of which enforce source == "extracted". Search does not. Non-extracted rows (Track 1 exchanges, legacy) appear in the results list; when the user taps one, get_by_id returns None (source mismatch) and the fact view shows "This memory no longer exists." - confusing for a result that was just displayed. The module docstring's claim that source filtering "lives in exactly one place" implicitly assumes search is out of scope, but that's not documented.

Fix: post-filter in _send_search:

filtered = [r for r in results if r.score >= floor and r.metadata.get("source") == "extracted"]

Suggestion

2. assert update.effective_chat is not None in _send_or_edit is inconsistent with module hardening

if not edit:
    assert update.effective_chat is not None   # stripped by -O
    await update.effective_chat.send_message(...)

Every other defensive check in this module uses if/raise specifically because -O strips assertions (_encode_callback, _send_or_edit's own edit=True guard, _send_forget_tag_confirm). This one was missed. Replace with:

if update.effective_chat is None:
    raise ValueError("_send_or_edit: effective_chat is None")

3. Grammar: "confirm forget 1 facts" (memory_command.py, _build_forget_tag_confirm)

f"confirm forget {count} facts"

When count == 1 this renders "confirm forget 1 facts". Use "fact" if count == 1 else "facts".


Data layer additions (get_by_tag, get_by_id, delete_by_id), config validation, answer-after-send ordering, stale keyboard cleanup, _encode_callback hardening, and enum guards all look correct.

…orget-tag confirm

Round 5 review fixes (PR #337).

#1 Source filter on search: every other read path in memory_command
(get_by_tag, get_by_id) scopes to source=="extracted", but
memory.search() is a Mem0 vector lookup that spans all sources
(Track 1 exchanges, legacy rows). A non-extracted hit could appear
in results, then fail get_by_id's source check on tap and render
"This memory no longer exists." for a row the user just saw.

Fix: post-filter in _send_search alongside the score floor.
The filter belongs in the UI helper (not at the data layer)
because Mem0's search API has no source predicate; this is the
narrowest place that owns the read.

#2 effective_chat assert -> if/raise: the fresh-send branch's
None-guard was an `assert`, which `python -O` strips. Every
other defensive check in this module uses if/raise (
_encode_callback, the edit=True guard above, _send_forget_tag_confirm).
This branch was missed in the round-2 sweep.

#3 Pluralization: count==1 used to render "1 facts" / "1 memories"
/ "confirm forget 1 facts". A tag with one surviving member is
a real case (deletes whittle the count). Singular when count==1
across all three sites (header, body, button label) so a future
edit can't regress any one of them in isolation.

Tests: source-filter test asserts the cache memory_ids list omits
non-extracted hits even when they clear the score floor; the
ValueError test confirms the new effective_chat guard fires; the
pluralization tests cover both count==1 (singular across all
three sites) and count==2 (plural everywhere) so the singular
branch can't silently catch the plural case.
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

Looking only at code present in this diff; all prior-review items are confirmed addressed.


Review: feat(memory): /memory Telegram command (spec 310)

The PR is in good shape. Five rounds of review have cleared every substantive issue. One new finding and one clean call:


Suggestion

_chat_id and _user_id use assert inconsistently with the module's -O hardening (memory_command.py, bottom of file)

def _chat_id(update: Update) -> int:
    chat = update.effective_chat
    assert chat is not None   # stripped by python -O
    return chat.id

Every other defensive check in this module was explicitly converted to if/raise for -O safety (_encode_callback, _send_or_edit, _send_forget_tag_confirm). These two helpers and the assert update.message is not None at the top of handle_memory_command were missed. The risk is low: PTB routing guarantees these are non-None for CommandHandler and CallbackQueryHandler. But under -O, the assert strips and any unexpected None produces AttributeError instead of a meaningful ValueError, violating the module's own stated convention.


Clean

source == "extracted" filter in _send_search, _send_or_edit effective_chat guard, _encode_callback raise-vs-assert, double-query.answer() ordering, get_by_id O(1) path, _set_cache TTL comment, tag/ftd enum parity, _build_forget_tag_confirm singular/plural, config validation range checks, and the has_confidence n/a guard all look correct. No security issues found; auth scoping is consistent with bot.py, and the ownership/source double-check in delete_by_id is structurally sound.

Round 6 review fix (PR #337).

The reviewer named three remaining `assert` sites that the prior
sweeps missed:
- `_chat_id` helper
- `_user_id` helper
- top-of-handler `assert update.message is not None` in
  `handle_memory_command`

Per the audit-all-sites rule, also converted the same pattern in
`handle_memory_callback` (callback_query and query.data asserts)
in the same pass, so a future review round doesn't have to flag
the same anti-pattern a fourth time. The module now contains zero
`assert` statements; every contract gate uses `if/raise` and
survives `python -O`.

The risk is low in current use (PTB routing guarantees these
fields for CommandHandler / CallbackQueryHandler callbacks), but
the convention matters: a future caller wiring these helpers
from a non-routed path under optimized bytecode would see
`AttributeError` on the next access instead of a meaningful
`ValueError` naming the violated contract.

Tests: four new cases under `TestSendOrEditContract` cover each
guard (`_chat_id`, `_user_id`, `handle_memory_command` without
message, `handle_memory_callback` without callback_query).
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

Review: feat(memory): /memory Telegram command (spec 310)

All prior-review items are addressed. One new warning and one suggestion on code not previously flagged.


Warning

_HELP_TEXT passed to query.answer(show_alert=True) exceeds Telegram's 200-char limit (memory_command.py, _HELP_TEXT constant and verb == "help" branch)

_HELP_TEXT = (
    "/memory              Browse by tag\n"          # 35 chars
    "/memory search <q>   Semantic search\n"        # 37 chars
    "/memory stats        Counts and confidence distribution\n"  # 56 chars
    "/memory forget <tag> Delete all memories with a tag\n"      # 52 chars
    "/memory help         Show this help"            # 35 chars
)
# Total: ~215 chars. Telegram's answerCallbackQuery text field is limited to 200.

When the "Search" button on the dashboard is tapped, PTB passes this string to answerCallbackQuery. Telegram returns a 400 Bad Request, PTB raises BadRequest, the outer except Exception handler catches it, and calls query.answer(_MSG_QUERY_FAILED) - the user sees "Memory query failed." instead of the help text.

Fix: strip the alignment padding; the monospace columns are invisible in Telegram's alert modal anyway:

_HELP_TEXT = (
    "/memory - Browse by tag\n"
    "/memory search <q> - Semantic search\n"
    "/memory stats - Counts and confidence distribution\n"
    "/memory forget <tag> - Delete all memories with a tag\n"
    "/memory help - Show this help"
)

Suggestion

edit_message_reply_markup cleanup only catches BadRequest (memory_command.py, _send_or_edit fallback path)

try:
    await update.callback_query.edit_message_reply_markup(reply_markup=None)
except BadRequest as kb_exc:                 # narrow catch
    log.debug("clear stale keyboard failed (best-effort): %s", kb_exc)

The comment says "swallow any BadRequest" but a NetworkError or TimedOut from PTB propagates out of _send_or_edit into the outer try/except in handle_memory_callback, which calls query.answer(_MSG_QUERY_FAILED) on a query that was already answered by the fallback send. Since this is explicitly a best-effort path, except Exception is the appropriate catch width here.


Clean

Config validation, get_by_id O(1) path, delete_by_id ownership/source scoping, ftd/tag enum guards, _encode_callback raise-vs-assert, _send_or_edit contract enforcement, source == "extracted" filter in _send_search, answer-after-send ordering, has_confidence n/a guard, _set_cache TTL semantics comment, and singular/plural in _build_forget_tag_confirm all look correct. The callback size ceiling is comfortably met. Auth scoping is consistent with bot.py.

…atch

Round 7 review fixes (PR #337).

#1 _HELP_TEXT exceeded Telegram's 200-char limit: the help body
was 215 chars due to monospace alignment padding. The dashboard's
Search button calls `query.answer(_HELP_TEXT, show_alert=True)`,
which routes through `answerCallbackQuery.text` (capped at 200).
Telegram returned 400 BadRequest, the outer except converted it
to "Memory query failed.", and the user never saw the help -
exactly the wrong UX for a help action.

Fix: drop the column padding; the alignment was invisible in
Telegram's alert modal anyway since it doesn't render with a
monospace font. New length is 195. Added a length-ceiling unit
test so a future help-text edit can't silently bust the limit.

#2 Keyboard-clear catch was BadRequest-only: the best-effort
cleanup before the fallback send caught only `BadRequest`. A
transient `NetworkError` / `TimedOut` from PTB would escape and
short-circuit the fresh send, surfacing as "Memory query failed."
for what should have been a successful re-render.

Fix: broaden to `except Exception`. The contract is that cleanup
failure must NOT abort the fallback send; the catch width should
match. Added a regression test that injects NetworkError and
asserts the fresh send still fires.
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

Review: feat(memory): /memory Telegram command (spec 310)

All prior-review items from the seven previous rounds are correctly addressed in this diff. Checking the changed code for new issues only.


Clean

memory.py additions - get_by_tag, get_by_id, delete_by_id, and the extended get_stats are all correct. Ownership and source scoping are consistent. The MemoryResult.updated_at: str = "" default preserves backward compatibility for existing callers. The get_all(limit=None) → 100,000 ceiling translation is correctly documented and intentional per spec §7.2.1.

memory_command.py - The answer-after-send pattern is consistently applied across all verbs. The _TAG_ENUM guards on both tag and ftd verbs are in parity. The _send_or_edit contract (edit=True raises on missing callback_query; effective_chat check uses if/raise) is correctly hardened. _chat_id and _user_id use if/raise. The keyboard cleanup in the fallback path catches Exception (not just BadRequest). _HELP_TEXT is 198 chars, safely under the 200-char answerCallbackQuery limit. source == "extracted" filter is applied in _send_search. Singular/plural in _build_forget_tag_confirm is correct. TTL semantics (idle, not absolute) are documented in _CACHE_TTL_S.

_ScreenCache - message_id field is gone. _avg_confidence dead code is gone. The context parameter on _send_or_edit is gone.

config.py - MEMORY_SEARCH_FLOOR validation (range check, type check, startup exit) matches the pattern of other numeric memory vars.

bot.py - Handler registration looks correct. Pattern r"^mem:" correctly scopes the callback dispatcher.

Security - Auth scoping is consistent with bot.py. delete_by_id verifies ownership and source before deleting. No cross-user data exposure path visible. Callback enum guards prevent arbitrary tag injection in both read and write paths.

No new issues found. This is ready to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

/memory Telegram command: browse, search, and manage memories

2 participants