feat(memory): deterministic buffer turn-summaries (no LLM)#16
Merged
Conversation
The tier-2 buffer is injected into the system prompt every turn but each
entry was a naive truncation of raw text (message[:97]+"..." or
shorten(s,100)) — a byte slice that can split a UTF-8 rune and usually
captures only filler ("Sure, I'll help. Let me start by..."), leaving the
buffer with almost no recall signal.
Centralize summarization in MemoryManager.AppendBuffer via a new
deterministic, no-LLM, rune-safe summarizeForBuffer: strip fenced/inline
code, conservative markdown, collapse whitespace, drop a leading filler
clause (only when substance remains), and excerpt to 200 runes at a
sentence/word boundary. All 8 call sites now pass raw text.
Placement is load-bearing: summarization lives in AppendBuffer, not
Buffer.Append, so RestoreBuffer (which calls Buffer.Append directly with
already-summarized lines) never re-processes persisted summaries. Old
session buffers restore verbatim — no migration. shorten() is retained
for its non-buffer callers (session labels, tool-result display).
Tests: table-driven heuristic cases (code/markdown/filler/multibyte/
hard-cut), an AppendBuffer no-mid-word-cut regression, and a verbatim
RestoreBuffer guard for the invariant.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adversarial review (AI Verification Protocol) found two defects in
summarizeForBuffer:
- D-01 (critical): when the only sentence terminator in the window was an
early abbreviation/version/domain ("e.g.", "v1.2", "node.js"), the
excerpt collapsed to a few runes (e.g. "e.g.…") — destroying the
summary. Now a sentence cut is only preferred when it lands at least
halfway through the window; otherwise fall back to the word boundary
near the cap.
- D-02 (minor): an unclosed code fence (unmatched by the fenced-block
regex) left a stray backtick in the output. Residual backticks are now
stripped after inline-code unwrapping.
Adds regression tests for both.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Contributor
Author
🔍 AI Verification Protocol v5.2.7 — CertificateClassification: Adversarial pass (Agent D) — findings & fixesRed-teaming
A third probe (filler false-positive on Axes
Signals
Verdict: |
CI's `go test -race` flagged a write/write data race on mockLLM.lastUser in TestConsolidateOnEnd_FiresAtSessionEnd. Root cause: OnSessionEndWithProvenance legitimately calls the LLM from two goroutines at once — synchronous episode extraction plus the background consolidation goroutine — but the test mocks held unsynchronized shared state. - mockLLM: guard lastUser with a mutex; add getLastUser() accessor. - countCallsLLM: replace the non-atomic *int counter with atomic.Int64 and a calls() accessor (same latent race, same OnSessionEnd trigger). Test-only change; no production code affected. Verified with `go test ./internal/memory/... -short -race -count=30` and the full `go test ./... -short -race` suite. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The tier-2 buffer (short-term working memory) is injected into the system prompt on every turn, but each entry was a naive truncation of raw text —
message[:97]+"..."(a byte slice that can split a UTF-8 rune) orshorten(s, 100). For an assistant turn this usually captured only filler ("Sure, I'll help with that. Let me start by…"), so the buffer carried almost no recall signal — close to dead weight.Change
Centralize summarization in
MemoryManager.AppendBuffervia a new deterministic, no-LLM, rune-safesummarizeForBuffer(no per-turn LLM calls — consistent with the hot-path cleanup in #12/#13). Pipeline:[code]placeholder, never blank).!?) → word → hard-cut boundary; never splits a runeAll 8 call sites now pass raw text.
Why placement matters
Summarization lives in
AppendBuffer, notBuffer.Append— becauseRestoreBuffercallsBuffer.Appenddirectly with already-formatted, already-summarized lines. This keeps restore from re-processing (and corrupting) persisted summaries. Documented as an invariant on both functions and guarded by a test.session.Bufferlines restore verbatim; no migration.shorten()retained: still used by non-buffer callers (session labels at main.go:1792-1793, tool-result display :1840, serve.go:639).Tests
summarize_test.go: table-driven heuristic cases (empty, whitespace, passthrough, all-code, code+prose, markdown, inline-code, filler-dropped, filler-only-kept, whitespace-collapse) + sentence-boundary truncation + hard-cut single long token + multibyte boundary safety (世×N,utf8.ValidString).memory_test.go:TestAppendBufferCleansAndDoesNotMidWordCut(no\n/code-fence, valid UTF-8, ≤ cap) andTestRestoreBufferPreservesLinesVerbatim(invariant guard).Verification
🤖 Generated with Claude Code