Problem
The memory.recall structured log line truncates the query field to 80 characters via _RECALL_TRUNC in src/kai/memory.py. The actual semantic search receives the full query, so retrieval quality is unaffected. The truncation is purely cosmetic in logs.
However, the format_context docstring describes the recall log as "designed to be parsed by a downstream retrieval eval harness without re-running search." An eval harness reading the log can see which memories were returned for which queries, but only sees the first 80 characters of the query itself. If a retrieval fires unexpectedly or scores poorly, an investigator cannot reproduce the search input from logs alone; they must cross-reference the chat-history JSONL by timestamp to recover the full query.
This is a small observability gap that gets more visible as the episode pipeline lands and retrieval-quality analysis becomes a more frequent activity.
Proposed fix
Two reasonable options:
- Raise _RECALL_TRUNC to roughly 256 characters. Covers most user messages without filling logs with paragraph-length lines. Single-line change.
- Keep the existing truncated query field for log-readability, and additionally emit a separate query_full field for automated consumption. Lets human log-readers and parsers each pick the form they want.
Option 1 is cheaper; option 2 is more disciplined. Either resolves the eval-harness replay use case.
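A minimal sketch of option 2, for discussion only. It assumes the recall log is emitted as one JSON object per line. The helper name build_recall_record, the event/hits/id/score/text keys, and the shape of the hit dicts are hypothetical and not taken from src/kai/memory.py. Only _RECALL_TRUNC, query, query_full, and snippet come from this issue.

```python
import json

# Value quoted in this issue; the real constant lives in src/kai/memory.py.
_RECALL_TRUNC = 80


def _truncate(text: str, limit: int = _RECALL_TRUNC) -> str:
    """Cap text at `limit` characters for the human-readable log fields."""
    return text[:limit]


def build_recall_record(query: str, hits: list[dict]) -> str:
    """Hypothetical helper: serialize one memory.recall event as a JSON line.

    `query` stays truncated for people scanning logs, while `query_full`
    carries the exact search input for an eval harness.
    """
    record = {
        "event": "memory.recall",
        "query": _truncate(query),
        "query_full": query,
        "hits": [
            # Assumed hit shape: {"id": ..., "score": ..., "text": ...}
            {"id": h["id"], "score": h["score"], "snippet": _truncate(h["text"])}
            for h in hits
        ],
    }
    # json.dumps escapes embedded newlines, so the record stays on one line.
    return json.dumps(record, ensure_ascii=False)
```

Under these assumptions a parser reads query_full to replay the search, and a multi-line query still produces a single log line because newlines are escaped during serialization. If the real log format is not JSON, the same split between a display field and a full field would need its own escaping rule.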
Acceptance
- The recall log carries enough query content for an eval harness to reproduce the search input without joining against chat-history JSONL.
- Existing log readability is preserved (no multi-line query bodies in standard log views).
- Per-hit snippet truncation behavior (which uses the same _RECALL_TRUNC constant) is reviewed during the change so the two consumers do not silently diverge.