feat(memory): /memory Telegram command (spec 310)#337
Conversation
Promote the hard-coded `_MIN_RELEVANCE_THRESHOLD = 0.3` constant in memory.py to a config field (`Config.memory_search_floor`, env var `MEMORY_SEARCH_FLOOR`, default 0.3). This is the "one knob, two paths" groundwork from spec 310: the same floor will govern both the context-injection path (`format_context`) and the upcoming `/memory search` UI in `memory_command.py`, so a config change applies consistently everywhere Kai consults memory. `format_context` now reads `_config.memory_search_floor` at every call rather than a module-level constant, per the spec's requirement that the value be sourced fresh so a runtime config edit + restart takes effect without divergence between the two paths. Validation: float in [0.0, 1.0] inclusive (Mem0 cosine similarity is normalized to that range; out-of-range values would silently filter everything or nothing). Same try/except SystemExit pattern as the other memory_* numeric vars. Defaulting to 0.3 means this commit is behaviorally a no-op in production: the new field's default matches the constant it replaces. Note on the env-var pattern: I followed the existing memory family convention (document in .env.example only) rather than adding an `install.py` `_cmd_config` prompt. None of the 8 prior `MEMORY_*` vars have install prompts; adding one only for this var would be inconsistent with the rest of the family. Tests: 7 new TestMemorySearchFloor cases for the env var (default, override, boundaries, rejections); 1 new TestFormatContext case that pins the config-driven read (a row at score 0.4 passes when floor=0.3 but is filtered when floor=0.5). Updated _make_config helper to include the new field default. All 1887 unit tests pass. Refs: home/specs/310-memory-command.md §7.5
…7.2) Adds the data-layer surface the /memory command (commit C3) will call. Pure additions; no existing call sites touched. memory.py: - MemoryResult gains updated_at (defaulted to "") so the tag view can sort by most-recent-update without a second round-trip - MemoryStats gains nine extracted-only aggregate fields used by the /memory stats screen: extracted_count, by_tag, confidence_min/median /max (None when no extracted rows; median uses median_low semantics so the value is always one that exists in the data), confidence _below_0_7 / _below_0_6 cliff counts, confirmation_quote_count, and by_prompt_version. All default-valued so legacy two-arg construction still works - get_all gains a limit kwarg (default 1000, None for unlimited via a 100k Mem0 ceiling) so the /memory dashboard pagination has a knob - get_by_tag(*, user_id, tag): post-filters Mem0 results to source== extracted AND tag in metadata.tags, sorted by updated_at desc - delete_by_id(*, user_id, memory_id): two-step verify (user_id match AND source==extracted) before delegating to Mem0's unscoped Memory.delete; ValueError on missing row is swallowed and reported as False so the caller can present a clean "not found" message tests/test_memory.py: - 26 new tests across TestGetAllLimit (3), TestGetByTag (6), TestDeleteById (8), TestGetStatsExtended (9). Verify scoping rules, default sentinels, sort order, and the verify-before-delete guard
New module src/kai/memory_command.py implementing the full /memory Telegram command surface: dashboard, tag drill-down with pagination, fact detail, semantic search, stats, and confirmed forget flows (single fact and bulk by tag). Bot wiring for the new command and callback handlers is intentionally deferred to commit C4 so this commit can land with a focused test suite. Architecture: - Pure builders (_build_dashboard, _build_tag_view, _build_fact_view, _build_search_results, _build_stats, _build_forget_*_confirm) take fixture data and return (text, keyboard) tuples. No I/O, no state. Lets the unit tests assert on rendering without standing up Update/Bot fixtures. - Top-level handlers (handle_memory_command, handle_memory_callback) own the I/O: fetch from memory.py, call the builders, dispatch send or edit calls. - Module-level _screen_cache holds per-chat navigation state with lazy 30-minute TTL expiry on every access. Single-entry-per-chat semantics (spec 310 §7.4): a second /memory invocation overwrites the prior cache, and stale fact-button taps trip the session-expired branch instead of resolving wrong ids. - Callback grammar mem:<verb>[:<arg>...] keeps every legitimate callback under 64 bytes; memory ids are never embedded, only referenced by integer index into the cache. UX details from spec: - Plain text rendering throughout (spec §6.0); user-originated content like fact text and confirmation_quote routinely contains Markdown-reserved characters and rendering them as MarkdownV2 would either corrupt the display or require escaping every dynamic string. - Tag dashboard hides zero-count tags; stats screen shows them as a tuning signal (the spec §6.1 asymmetry, enforced by the builders not the data layer). - Search applies the floor in the UI path the same way format_context does in the retrieval path (spec §7.5: one knob, two paths). - Forget flows use callback toasts via answerCallbackQuery rather than edit-then-sleep-then-edit, keeping the event loop free. tests/test_memory_command.py: 64 unit tests across 12 classes covering callback round-trips, every pure builder, pagination edge cases (empty, exact boundary, partial last, out-of-range clamping), all empty-state branches, the 30-minute cache TTL, and a drift guard that asserts _TAG_ENUM stays in sync with the extraction schema. No real Mem0 instance; handler tests monkeypatch the memory.py read and delete helpers per spec §10.3.
Registers the two handlers from src/kai/memory_command.py with the
Telegram application:
- CommandHandler("memory") -> handle_memory_command, dispatching the
subcommand surface (no-args dashboard, search, stats, forget, help).
- CallbackQueryHandler(pattern=r"^mem:") -> handle_memory_callback
for inline-keyboard taps (tag drill-down, fact view, forget
confirmations).
The pattern is exclusive by prefix with the other registered
callback handlers (^model:, ^voice:, ^ws:), so dispatch is
unambiguous.
Updates the /help text with a five-line /memory block placed after
the /github group, matching its data-management role.
Review by KaiThe PR Review: feat(memory): /memory Telegram command (spec 310)Overall this is a well-structured PR with thorough tests and good defensive coding. A few real issues worth addressing: Warning1. assert len(encoded.encode("utf-8")) <= 64, f"callback_data too long: {encoded!r}"
if len(encoded.encode("utf-8")) > 64:
raise ValueError(f"callback_data too long: {encoded!r}")2. The function is defined and documented but never called. median = stats.confidence_median # line ~395Either remove 3. tag = args[0]
facts = memory.get_by_tag(user_id=str(chat_id), tag=tag)An authorized user who crafts if tag not in _TAG_ENUM:
await query.answer(_MSG_SESSION_EXPIRED)
returnSuggestion4.
5. The function accepts 6. Two full-corpus fetches per forget action
7. Misleading confidence display when all confidence values are malformed ( In No issues found with the data-layer changes in |
Seven follow-ups from the PR #337 review: 1. _encode_callback: replace `assert` with `if/raise ValueError` for the 64-byte ceiling. assertions are stripped under `python -O`, which would let an over-limit callback silently truncate at the Telegram boundary. 2. Drop the dead `_avg_confidence` helper. It was a one-line wrapper that just returned `stats.confidence_median`; the dashboard builder now references the field directly. 3. Validate the tag arg in the `ftd` callback handler against `_TAG_ENUM` before calling get_by_tag/delete_by_id. Mirrors the gate already present on the /memory forget text path; closes a crafted-callback / version-skew gap. 4. Remove the unused `message_id` field from `_ScreenCache` and the write site in `_send_or_edit`. Was set on fresh sends but never read. 5. Drop the unused `context` parameter from `_send_or_edit` and update its 15 call sites. Helper-layer signatures keep `context` per python-telegram-bot handler convention. 6. Add `memory.get_by_id(*, user_id, memory_id)` with the same ownership/source scoping as `delete_by_id`, then refactor delete_by_id to call it. Wire `_send_fact_view` and `_send_forget_fact_confirm` to use it - O(1) Mem0 get vs the prior `get_all` + Python-side filter on every fact-view tap. 7. Stats screen confidence min/median/max: render "n/a" when the underlying field is None instead of "0.00". A real 0.00 reading would be indistinguishable from missing data otherwise. Tests added: - TestCallbackCodec.test_overlong_callback_raises (#1) - TestCallbackDispatch.test_ftd_rejects_unknown_tag (#3) - TestGetById class with 7 cases covering all four not-found branches, ownership, source, fetch error, happy path (#6) - TestBuildStats.test_renders_n_a_for_missing_confidence (#7)
Review by KaiLooking at only new issues; prior-review items that were addressed (assertion→raise, Warning1. tag = args[0]
# no _TAG_ENUM check
await _send_tag_view(update, context, chat_id, tag, page, edit=True)
if tag not in _TAG_ENUM:
await query.answer(_MSG_SESSION_EXPIRED)
returnSuggestion2. if edit and update.callback_query is not None:
... # only branch that sends when edit=True
if not edit:
... # not reached when edit=True and callback_query is NoneWhen CleanData layer ( |
Two follow-ups from the second review pass on PR #337: 1. The read-only `tag` callback verb was missing the `_TAG_ENUM` guard that round 1 added to the `ftd` verb. A crafted `mem:tag:arbitrary-string:0` callback would reach memory.get_by_tag with an off-enum tag. Safe today (returns empty), but inconsistent with `ftd` and the /memory forget text path, both of which validate. Mirror the same gate. 2. `_send_or_edit` had a silent no-op branch when `edit=True` but `update.callback_query is None`. Unreachable today (all `edit=True` callers live in `handle_memory_callback` where the invariant holds), but a future caller wiring up the `_send_*` helpers from outside the callback path would get a ghost action with no diagnostic. Raise instead of asserting so the gate survives `python -O` (same reasoning as the `_encode_callback` ceiling fix in round 1). Tests added: - TestCallbackDispatch.test_tag_verb_rejects_unknown_tag (#1) - TestSendOrEditContract: 3 cases covering the raise, the legitimate edit path, and the fresh-send path (#2)
Review by KaiReview: feat(memory): /memory Telegram command (spec 310)Prior review items (assertion→raise, Suggestion1. The prior review caught that 2. except Exception as exc:
...
assert update.message is not None # stripped by -O
await update.message.reply_text(_MSG_QUERY_FAILED)These follow the bot.py convention, but they're inconsistent with this module's own 3.
Clean
|
Three suggestions from the third review pass on PR #337: 1. `_build_stats` below-threshold rows now render `(n/a)` instead of `(0.0%)` when no confidence data exists. Mirrors the n/a fallback added to min/median/max in round 1: a count of 0 with "0.0%" reads as "all facts scored above threshold" rather than "no data", which contradicts the n/a labels right above it. 2. `_send_forget_tag_confirm` now uses `if/raise ValueError` instead of two `assert update.message is not None` checks. Matches the same `python -O` survival reasoning applied to `_send_or_edit` and `_encode_callback` in the prior rounds. Hoisted the check to a single early guard so the function body needs only one narrowing point for the type-checker. 3. `_CACHE_TTL_S` docstring now states the timeout is *idle* (reset on every write), not absolute. The constant name and spec reference both implied a hard 30-minute session ceiling; the implementation actually resets `created_at` on every `_set_cache`, so an active user keeps the cache alive indefinitely. Doc-only fix. Tests added: - TestBuildStats.test_renders_n_a_for_missing_confidence extended to assert (n/a) on the below-threshold rows (#1) - TestBuildStats.test_renders_percentages_when_confidence_present regression guard against the n/a fallback over-firing (#1) - TestSendOrEditContract.test_send_forget_tag_confirm_raises_without_message covers the new contract gate (#2)
Review by KaiBased on my analysis of the diff directly (the code doesn't exist locally - this is a pre-merge review): Review: feat(memory): /memory Telegram command (spec 310)Prior items (assert→raise, Warning1. Double Every verb block calls Fix: track whether the query has been answered, or move # Before:
await query.answer()
await _send_dashboard(update, context, chat_id, edit=True)
# After:
await _send_dashboard(update, context, chat_id, edit=True)
await query.answer()The Suggestion2. facts = memory.get_by_tag(user_id=str(chat_id), tag=tag)
for fact in facts:
if memory.delete_by_id(...):
deleted += 1
3. pct_07 = (stats.confidence_below_0_7 / stats.extracted_count) * 100
4. Stale keyboard after When Clean
|
…d on fallback Round 4 review fixes (PR #337). #1 Double-answer bug: callback handlers previously called query.answer() BEFORE _send_*. If the send raised, the outer except would call query.answer(_MSG_QUERY_FAILED), which Telegram rejects as BadRequest because the query was already answered, producing an unhandled exception in the error path itself. Fix: every dispatch branch now calls _send_* first, then query.answer(). The outer except is therefore guaranteed to be answering an unanswered query, so its single answer call is always valid. The visual cost is the loading spinner persists slightly longer (until the send returns), which is invisible at typical Mem0 latencies. #4 Stale keyboard on fallback: when edit_message_text fails (BadRequest other than "not modified") we fall back to sending a fresh message. The original message's inline keyboard remained visible in chat history, so a user tap would fire a callback against state we have moved past, producing ghost navigation. Fix: best-effort edit_message_reply_markup(reply_markup=None) strips the buttons before the fresh send. Wrapped in inner try/except since the original message may already be gone or immutable - this is already an error path and the fresh send is what matters. Tests: regression for #1 uses the `ftd` verb (whose Mem0 call escapes the dispatcher with no inner guard) and asserts query.answer is awaited exactly once with _MSG_QUERY_FAILED. Two new tests for #4 cover the happy fallback path (reply_markup=None called) and the swallowed-error path (BadRequest from clear is not propagated).
Review by KaiLooking only at code that changed in this diff; prior-review items not re-raised. Warning1. results = memory.search(query, user_id=str(chat_id), limit=_SEARCH_LIMIT)
filtered = [r for r in results if r.score >= floor]The floor is applied but Fix: post-filter in filtered = [r for r in results if r.score >= floor and r.metadata.get("source") == "extracted"]Suggestion2. if not edit:
assert update.effective_chat is not None # stripped by -O
await update.effective_chat.send_message(...)Every other defensive check in this module uses if update.effective_chat is None:
raise ValueError("_send_or_edit: effective_chat is None")3. Grammar: "confirm forget 1 facts" ( f"confirm forget {count} facts"When Data layer additions ( |
…orget-tag confirm Round 5 review fixes (PR #337). #1 Source filter on search: every other read path in memory_command (get_by_tag, get_by_id) scopes to source=="extracted", but memory.search() is a Mem0 vector lookup that spans all sources (Track 1 exchanges, legacy rows). A non-extracted hit could appear in results, then fail get_by_id's source check on tap and render "This memory no longer exists." for a row the user just saw. Fix: post-filter in _send_search alongside the score floor. The filter belongs in the UI helper (not at the data layer) because Mem0's search API has no source predicate; this is the narrowest place that owns the read. #2 effective_chat assert -> if/raise: the fresh-send branch's None-guard was an `assert`, which `python -O` strips. Every other defensive check in this module uses if/raise ( _encode_callback, the edit=True guard above, _send_forget_tag_confirm). This branch was missed in the round-2 sweep. #3 Pluralization: count==1 used to render "1 facts" / "1 memories" / "confirm forget 1 facts". A tag with one surviving member is a real case (deletes whittle the count). Singular when count==1 across all three sites (header, body, button label) so a future edit can't regress any one of them in isolation. Tests: source-filter test asserts the cache memory_ids list omits non-extracted hits even when they clear the score floor; the ValueError test confirms the new effective_chat guard fires; the pluralization tests cover both count==1 (singular across all three sites) and count==2 (plural everywhere) so the singular branch can't silently catch the plural case.
Review by KaiLooking only at code present in this diff; all prior-review items are confirmed addressed. Review: feat(memory): /memory Telegram command (spec 310)The PR is in good shape. Five rounds of review have cleared every substantive issue. One new finding and one clean call: Suggestion
def _chat_id(update: Update) -> int:
chat = update.effective_chat
assert chat is not None # stripped by python -O
return chat.idEvery other defensive check in this module was explicitly converted to Clean
|
Round 6 review fix (PR #337). The reviewer named three remaining `assert` sites that the prior sweeps missed: - `_chat_id` helper - `_user_id` helper - top-of-handler `assert update.message is not None` in `handle_memory_command` Per the audit-all-sites rule, also converted the same pattern in `handle_memory_callback` (callback_query and query.data asserts) in the same pass, so a future review round doesn't have to flag the same anti-pattern a fourth time. The module now contains zero `assert` statements; every contract gate uses `if/raise` and survives `python -O`. The risk is low in current use (PTB routing guarantees these fields for CommandHandler / CallbackQueryHandler callbacks), but the convention matters: a future caller wiring these helpers from a non-routed path under optimized bytecode would see `AttributeError` on the next access instead of a meaningful `ValueError` naming the violated contract. Tests: four new cases under `TestSendOrEditContract` cover each guard (`_chat_id`, `_user_id`, `handle_memory_command` without message, `handle_memory_callback` without callback_query).
Review by KaiReview: feat(memory): /memory Telegram command (spec 310)All prior-review items are addressed. One new warning and one suggestion on code not previously flagged. Warning
_HELP_TEXT = (
"/memory Browse by tag\n" # 35 chars
"/memory search <q> Semantic search\n" # 37 chars
"/memory stats Counts and confidence distribution\n" # 56 chars
"/memory forget <tag> Delete all memories with a tag\n" # 52 chars
"/memory help Show this help" # 35 chars
)
# Total: ~215 chars. Telegram's answerCallbackQuery text field is limited to 200.When the "Search" button on the dashboard is tapped, PTB passes this string to Fix: strip the alignment padding; the monospace columns are invisible in Telegram's alert modal anyway: _HELP_TEXT = (
"/memory - Browse by tag\n"
"/memory search <q> - Semantic search\n"
"/memory stats - Counts and confidence distribution\n"
"/memory forget <tag> - Delete all memories with a tag\n"
"/memory help - Show this help"
)Suggestion
try:
await update.callback_query.edit_message_reply_markup(reply_markup=None)
except BadRequest as kb_exc: # narrow catch
log.debug("clear stale keyboard failed (best-effort): %s", kb_exc)The comment says "swallow any BadRequest" but a CleanConfig validation, |
…atch Round 7 review fixes (PR #337). #1 _HELP_TEXT exceeded Telegram's 200-char limit: the help body was 215 chars due to monospace alignment padding. The dashboard's Search button calls `query.answer(_HELP_TEXT, show_alert=True)`, which routes through `answerCallbackQuery.text` (capped at 200). Telegram returned 400 BadRequest, the outer except converted it to "Memory query failed.", and the user never saw the help - exactly the wrong UX for a help action. Fix: drop the column padding; the alignment was invisible in Telegram's alert modal anyway since it doesn't render with a monospace font. New length is 195. Added a length-ceiling unit test so a future help-text edit can't silently bust the limit. #2 Keyboard-clear catch was BadRequest-only: the best-effort cleanup before the fallback send caught only `BadRequest`. A transient `NetworkError` / `TimedOut` from PTB would escape and short-circuit the fresh send, surfacing as "Memory query failed." for what should have been a successful re-render. Fix: broaden to `except Exception`. The contract is that cleanup failure must NOT abort the fallback send; the catch width should match. Added a regression test that injects NetworkError and asserts the fresh send still fires.
Review by KaiReview: feat(memory): /memory Telegram command (spec 310)All prior-review items from the seven previous rounds are correctly addressed in this diff. Checking the changed code for new issues only. Clean
Security - Auth scoping is consistent with No new issues found. This is ready to merge. |
Summary
Implements spec 310: a Telegram-facing
/memorycommand surface for browsing, searching, and forgetting Haiku-extracted facts. Closes #310.Four focused commits:
C1 (
6357894) AddMEMORY_SEARCH_FLOORenv var (default 0.3) and consolidate the prior_MIN_RELEVANCE_THRESHOLDconstant into it. One knob, two paths: bothformat_context(retrieval) and/memory search(UI) read the same value at every invocation, so a config edit + restart produces consistent behavior across both paths.C2 (
99f9859) Pure-additive helpers and stats extensions inmemory.py:get_by_tag,delete_by_id, alimitkwarg onget_all, plus nine extracted-only aggregate fields onMemoryStats. The new aggregates default to "no data" sentinels so the legacy two-arg construction still works.delete_by_iddoes a verify-before-delete on bothuser_idandmetadata.source == "extracted"as defense-in-depth, even though Mem0'sdeletealready scopes by user.C3 (
1323760) New modulesrc/kai/memory_command.pyplus 64 unit tests. Pure builders return(text, keyboard)tuples; top-level handlers own the I/O. Per-chat ephemeral cache with lazy 30-minute TTL and single-entry-per-chat semantics keeps every callback under Telegram's 64-byte limit (memory ids are referenced by index, never embedded)._TAG_ENUMmirrors the extraction schema with a drift-guard test so future schema changes can't silently exclude tags from/memory stats.C4 (
fd3663c) Wire the new handlers intobot.pyand add a five-line/memoryblock to/help.UX
Plain text rendering throughout (spec §6.0); user-originated content like fact text and confirmation_quote routinely contains Markdown-reserved characters and rendering them as MarkdownV2 would corrupt the display. Forget flows use
answerCallbackQuerytoasts rather than edit-then-sleep-then-edit, keeping the event loop free.Dashboard hides zero-count tags but stats screen surfaces them as a tuning signal (the spec §6.1 asymmetry, enforced by builders, not the data layer).
Test plan
pytest tests/test_memory_command.py(64 new tests, all passing)pytest tests/test_memory.py tests/test_config.py(full memory + config suites green; 244 passing, 9 deselected for Mem0 lock)make checkclean (ruff lint + format)/memory,/memory search <q>,/memory stats, tag drill-down + back, fact view + forget confirmationNotes
/memoryonly reads or deletes the caller's own memories (spec §9). Safe to expose by default.test_bot.pycovers pure helpers only and the import smoke check verifies registration. The C3 suite already exercises every handler path through monkeypatched memory primitives.source == "extracted"filter on every read/write path makes this PR correct regardless of Remove Track 1 (raw-turn embedding) from memory system #335 ordering.