feat(memory): deterministic buffer turn-summaries (no LLM) #16

Merged: jkyberneees merged 3 commits into main from feat/memory-buffer-summaries on Jun 6, 2026

@jkyberneees
Contributor

Problem

The tier-2 buffer (short-term working memory) is injected into the system prompt on every turn, but each entry was a naive truncation of raw text — message[:97]+"..." (a byte slice that can split a UTF-8 rune) or shorten(s, 100). For an assistant turn this usually captured only filler ("Sure, I'll help with that. Let me start by…"), so the buffer carried almost no recall signal — close to dead weight.

Change

Centralize summarization in MemoryManager.AppendBuffer via a new deterministic, no-LLM, rune-safe summarizeForBuffer (no per-turn LLM calls — consistent with the hot-path cleanup in #12/#13). Pipeline:

  1. Strip fenced code blocks (all-code → [code] placeholder, never blank)
  2. Unwrap inline code/backticks
  3. Conservative markdown strip (headings, bullets, blockquote, bold, links/images)
  4. Collapse all whitespace to single spaces
  5. Drop a leading filler clause — only when substantive text remains
  6. Excerpt to 200 runes at a sentence (.!?) → word → hard-cut boundary; never splits a rune

All 8 call sites now pass raw text.
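
The cleaning stages (steps 1 to 5) could look roughly like the sketch below. It is an illustration under assumptions, not the code merged in this PR: the name cleanForBuffer, every regex, and the filler word list are hypothetical, and step 6 (the excerpt) is left out.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// All patterns below are assumptions made for this sketch.
var (
	fence        = strings.Repeat("`", 3)
	fencedRe     = regexp.MustCompile("(?s)" + fence + ".*?" + fence)
	inlineRe     = regexp.MustCompile("`([^`\n]*)`")
	linkRe       = regexp.MustCompile(`!?\[([^\]]*)\]\([^)]*\)`)
	linePrefixRe = regexp.MustCompile(`(?m)^[ \t]*(?:#{1,6}[ \t]+|[-*+][ \t]+|>[ \t]?)`)
	boldRe       = regexp.MustCompile(`\*\*([^*]+)\*\*`)
	fillerRe     = regexp.MustCompile(`(?i)^(?:sure|okay|ok|certainly|of course)\b[^.!?]*[.!?]\s*`)
)

// cleanForBuffer applies steps 1 to 5 of the pipeline to raw turn text.
func cleanForBuffer(s string) string {
	hadFence := fencedRe.MatchString(s)
	s = fencedRe.ReplaceAllString(s, " ")    // 1. strip fenced code blocks
	s = inlineRe.ReplaceAllString(s, "$1")   // 2. unwrap inline code
	s = strings.ReplaceAll(s, "`", "")       //    drop residual backticks
	s = linkRe.ReplaceAllString(s, "$1")     // 3. links/images keep their label
	s = linePrefixRe.ReplaceAllString(s, "") //    headings, bullets, blockquotes
	s = boldRe.ReplaceAllString(s, "$1")     //    bold markers
	s = strings.Join(strings.Fields(s), " ") // 4. collapse all whitespace
	if s == "" {
		if hadFence {
			return "[code]" // all-code input never becomes blank
		}
		return ""
	}
	// 5. drop a leading filler clause only when substantive text remains
	if rest := fillerRe.ReplaceAllString(s, ""); rest != "" {
		s = rest
	}
	return s
}

func main() {
	fmt.Println(cleanForBuffer("Sure, I'll help with that. The bug is in `parse()`."))
	fmt.Println(cleanForBuffer(fence + "go\nfmt.Println(1)\n" + fence))
	fmt.Println(cleanForBuffer("Surely this works."))
}
```

Because Go's regexp package is RE2-based, each of these patterns runs in time linear in the input, which matters for text that is model- or user-sourced.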

Why placement matters

Summarization lives in AppendBuffer, not Buffer.Append — because RestoreBuffer calls Buffer.Append directly with already-formatted, already-summarized lines. This keeps restore from re-processing (and corrupting) persisted summaries. Documented as an invariant on both functions and guarded by a test.

  • Backward compatible: old session.Buffer lines restore verbatim; no migration.
  • shorten() retained: still used by non-buffer callers (session labels at main.go:1792-1793, tool-result display :1840, serve.go:639).
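
The placement invariant can be pictured with a stripped-down model. Everything in this sketch is an assumption made for illustration: the types, fields, method signatures, and the stand-in summarize function are hypothetical and only mirror the names used in this description.

```go
package main

import (
	"fmt"
	"strings"
)

// Buffer is a hypothetical stand-in for the tier-2 buffer.
type Buffer struct{ lines []string }

// Append stores the line exactly as given.
// Invariant: callers pass already-formatted, already-summarized text.
func (b *Buffer) Append(line string) { b.lines = append(b.lines, line) }

// MemoryManager is a hypothetical owner of the buffer.
type MemoryManager struct{ buf Buffer }

// summarize is a placeholder for summarizeForBuffer; it only collapses
// whitespace so that the effect is visible in the output.
func summarize(raw string) string { return strings.Join(strings.Fields(raw), " ") }

// AppendBuffer is the single write path for raw text: summarize, then store.
func (m *MemoryManager) AppendBuffer(role, raw string) {
	m.buf.Append(role + ": " + summarize(raw))
}

// RestoreBuffer replays persisted lines through Buffer.Append directly,
// so they are never summarized a second time.
func (m *MemoryManager) RestoreBuffer(lines []string) {
	for _, l := range lines {
		m.buf.Append(l)
	}
}

func main() {
	m := &MemoryManager{}
	m.RestoreBuffer([]string{"assistant:  kept   verbatim"})
	m.AppendBuffer("user", "fix   the\nparser")
	for _, l := range m.buf.lines {
		fmt.Printf("%q\n", l)
	}
}
```

If the summarizer lived in Buffer.Append instead, the restored line in this model would be rewritten on every session load, which is the corruption the invariant rules out.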

Tests

  • summarize_test.go: table-driven heuristic cases (empty, whitespace, passthrough, all-code, code+prose, markdown, inline-code, filler-dropped, filler-only-kept, whitespace-collapse) + sentence-boundary truncation + hard-cut single long token + multibyte boundary safety (×N, utf8.ValidString).
  • memory_test.go: TestAppendBufferCleansAndDoesNotMidWordCut (no \n/code-fence, valid UTF-8, ≤ cap) and TestRestoreBufferPreservesLinesVerbatim (invariant guard).

Verification

go build ./...          # ok
go vet ./...            # clean
gofmt -l ...            # empty
go test ./internal/memory/... -race   # ok
go test ./...           # ALL TESTS PASS

🤖 Generated with Claude Code

jkyberneees and others added 2 commits June 6, 2026 13:34
The tier-2 buffer is injected into the system prompt every turn but each
entry was a naive truncation of raw text (message[:97]+"..." or
shorten(s,100)) — a byte slice that can split a UTF-8 rune and usually
captures only filler ("Sure, I'll help. Let me start by..."), leaving the
buffer with almost no recall signal.

Centralize summarization in MemoryManager.AppendBuffer via a new
deterministic, no-LLM, rune-safe summarizeForBuffer: strip fenced/inline
code, conservative markdown, collapse whitespace, drop a leading filler
clause (only when substance remains), and excerpt to 200 runes at a
sentence/word boundary. All 8 call sites now pass raw text.

Placement is load-bearing: summarization lives in AppendBuffer, not
Buffer.Append, so RestoreBuffer (which calls Buffer.Append directly with
already-summarized lines) never re-processes persisted summaries. Old
session buffers restore verbatim — no migration. shorten() is retained
for its non-buffer callers (session labels, tool-result display).

Tests: table-driven heuristic cases (code/markdown/filler/multibyte/
hard-cut), an AppendBuffer no-mid-word-cut regression, and a verbatim
RestoreBuffer guard for the invariant.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Adversarial review (AI Verification Protocol) found two defects in
summarizeForBuffer:

- D-01 (critical): when the only sentence terminator in the window was an
  early abbreviation/version/domain ("e.g.", "v1.2", "node.js"), the
  excerpt collapsed to a few runes (e.g. "e.g.…") — destroying the
  summary. Now a sentence cut is only preferred when it lands at least
  halfway through the window; otherwise fall back to the word boundary
  near the cap.
- D-02 (minor): an unclosed code fence (unmatched by the fenced-block
  regex) left a stray backtick in the output. Residual backticks are now
  stripped after inline-code unwrapping.

Adds regression tests for both.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@jkyberneees
Contributor Author

🔍 AI Verification Protocol v5.2.7 — Certificate

Classification: NovelBehavior (new deterministic heuristic) · LOC_filtered: 341 (7 files) — under the 1,500 standard-pipeline ceiling.

Adversarial pass (Agent D) — findings & fixes

Red-teaming summarizeForBuffer surfaced two real defects, both now fixed (commit 860248c):

| ID | Severity | Finding | Fix |
| --- | --- | --- | --- |
| D-01 | 🔴 Critical (correctness) | A single early sentence terminator collapsed the excerpt: "e.g., we should refactor…" → "e.g.…" (5 runes), "node.js…" → "node.…", "v1.2…" → "v1.…". The code picked the last .!? in the window, which with one early terminator destroyed the summary. | Prefer a sentence cut only when it lands ≥ half the window; otherwise fall back to the word boundary near the cap. Early-abbreviation inputs now yield ~196-rune excerpts. |
| D-02 | 🟡 Minor | An unclosed code fence (unmatched by the fenced-block regex) left a stray backtick. | Strip residual backticks after inline-code unwrap. |

A third probe (filler false-positive on "Surely…") passed — the \b boundary correctly rejects it.

Axes

| Axis | Result | Evidence |
| --- | --- | --- |
| 2.1 Semantic correctness | | Pipeline does what the contract claims; D-01/D-02 fixed and regression-tested. |
| 2.3 / 2.8 Security & adversarial surface | | No new injection sink: buffer text was already model/user-sourced and injected; summarization only shortens/cleans. RE2 regexes (no ReDoS). |
| 2.4 Structural integrity | | Single centralized funnel; invariant (summarize on write, not on restore) documented on both AppendBuffer/RestoreBuffer and guarded by a verbatim-restore test. |
| 2.5 Behavioral exploration | | Adversarial probes for early-terminator, unclosed fence, filler edge cases, multibyte/hard-cut boundaries. |
| 2.9 Documentation coverage | | No exported-symbol changes (summarizeForBuffer/maxBufferSummaryRunes unexported; AppendBuffer/RestoreBuffer signatures unchanged). No public API/doc surface affected. |

Signals

go build ./... ✅ · go vet ./... ✅ · gofmt -l empty ✅ · go test ./internal/memory/... -race ✅ · go test ./... ✅ (all pass).

Verdict: AutoApprove after remediation. Both findings fixed and covered by tests before merge.

CI's `go test -race` flagged a write/write data race on mockLLM.lastUser
in TestConsolidateOnEnd_FiresAtSessionEnd. Root cause:
OnSessionEndWithProvenance legitimately calls the LLM from two goroutines
at once — synchronous episode extraction plus the background
consolidation goroutine — but the test mocks held unsynchronized shared
state.

- mockLLM: guard lastUser with a mutex; add getLastUser() accessor.
- countCallsLLM: replace the non-atomic *int counter with atomic.Int64
  and a calls() accessor (same latent race, same OnSessionEnd trigger).

Test-only change; no production code affected. Verified with
`go test ./internal/memory/... -short -race -count=30` and the full
`go test ./... -short -race` suite.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@jkyberneees jkyberneees merged commit 7662fe0 into main Jun 6, 2026
6 checks passed