feat(memory): deterministic buffer turn-summaries (no LLM) by jkyberneees · Pull Request #16 · BackendStack21/odek

jkyberneees · 2026-06-06T11:34:25Z

Problem

The tier-2 buffer (short-term working memory) is injected into the system prompt on every turn, but each entry was a naive truncation of raw text — message[:97]+"..." (a byte slice that can split a UTF-8 rune) or shorten(s, 100). For an assistant turn this usually captured only filler ("Sure, I'll help with that. Let me start by…"), so the buffer carried almost no recall signal — close to dead weight.

Change

Centralize summarization in MemoryManager.AppendBuffer via a new deterministic, no-LLM, rune-safe summarizeForBuffer (no per-turn LLM calls — consistent with the hot-path cleanup in #12/#13). Pipeline:

1. Strip fenced code blocks (all-code → [code] placeholder, never blank)
2. Unwrap inline code/backticks
3. Conservative markdown strip (headings, bullets, blockquote, bold, links/images)
4. Collapse all whitespace to single spaces
5. Drop a leading filler clause, but only when substantive text remains
6. Excerpt to 200 runes at a sentence (.!?) → word → hard-cut boundary; never splits a rune

All 8 call sites now pass raw text.

Why placement matters

Summarization lives in AppendBuffer, not Buffer.Append — because RestoreBuffer calls Buffer.Append directly with already-formatted, already-summarized lines. This keeps restore from re-processing (and corrupting) persisted summaries. Documented as an invariant on both functions and guarded by a test.

Backward compatible: old session.Buffer lines restore verbatim; no migration.
shorten() retained: still used by non-buffer callers (session labels at main.go:1792-1793, tool-result display :1840, serve.go:639).
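The placement invariant can be sketched as below. The type shapes, method signatures, and the stand-in `summarize` function are assumptions for illustration; only the names AppendBuffer, RestoreBuffer, and Buffer.Append come from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// Buffer is a hypothetical stand-in for the tier-2 buffer.
type Buffer struct{ lines []string }

// Append stores the line verbatim. Invariant: callers pass text that is
// already formatted and summarized; Append never re-processes it.
func (b *Buffer) Append(line string) { b.lines = append(b.lines, line) }

// MemoryManager is a hypothetical owner of the buffer.
type MemoryManager struct{ buf Buffer }

// summarize is a placeholder for summarizeForBuffer; here it only
// collapses whitespace.
func summarize(raw string) string { return strings.Join(strings.Fields(raw), " ") }

// AppendBuffer is the only write path that summarizes raw text.
func (m *MemoryManager) AppendBuffer(role, raw string) {
	m.buf.Append(role + ": " + summarize(raw))
}

// RestoreBuffer replays persisted lines through Buffer.Append directly,
// so already-summarized lines come back byte-for-byte.
func (m *MemoryManager) RestoreBuffer(lines []string) {
	for _, l := range lines {
		m.buf.Append(l)
	}
}

func main() {
	m := &MemoryManager{}
	m.RestoreBuffer([]string{"user: legacy  line"}) // double space survives
	m.AppendBuffer("assistant", "new\n  raw   text")
	for _, l := range m.buf.lines {
		fmt.Printf("%q\n", l)
	}
}
```

If summarization lived in `Buffer.Append` instead, the restored legacy line in this sketch would be rewritten on every session restore.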

Tests

summarize_test.go: table-driven heuristic cases (empty, whitespace, passthrough, all-code, code+prose, markdown, inline-code, filler-dropped, filler-only-kept, whitespace-collapse) + sentence-boundary truncation + hard-cut single long token + multibyte boundary safety (世×N, utf8.ValidString).
memory_test.go: TestAppendBufferCleansAndDoesNotMidWordCut (no \n/code-fence, valid UTF-8, ≤ cap) and TestRestoreBufferPreservesLinesVerbatim (invariant guard).

Verification

go build ./...          # ok
go vet ./...            # clean
gofmt -l ...            # empty
go test ./internal/memory/... -race   # ok
go test ./...           # ALL TESTS PASS

🤖 Generated with Claude Code

The tier-2 buffer is injected into the system prompt every turn, but each entry was a naive truncation of raw text (message[:97]+"..." or shorten(s,100)): a byte slice that can split a UTF-8 rune and usually captures only filler ("Sure, I'll help. Let me start by..."), leaving the buffer with almost no recall signal.

Centralize summarization in MemoryManager.AppendBuffer via a new deterministic, no-LLM, rune-safe summarizeForBuffer: strip fenced/inline code, conservative markdown, collapse whitespace, drop a leading filler clause (only when substance remains), and excerpt to 200 runes at a sentence/word boundary. All 8 call sites now pass raw text.

Placement is load-bearing: summarization lives in AppendBuffer, not Buffer.Append, so RestoreBuffer (which calls Buffer.Append directly with already-summarized lines) never re-processes persisted summaries. Old session buffers restore verbatim, with no migration. shorten() is retained for its non-buffer callers (session labels, tool-result display).

Tests: table-driven heuristic cases (code/markdown/filler/multibyte/hard-cut), an AppendBuffer no-mid-word-cut regression, and a verbatim RestoreBuffer guard for the invariant.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Adversarial review (AI Verification Protocol) found two defects in summarizeForBuffer:

- D-01 (critical): when the only sentence terminator in the window was an early abbreviation/version/domain ("e.g.", "v1.2", "node.js"), the excerpt collapsed to a few runes (e.g. "e.g.…"), destroying the summary. Now a sentence cut is only preferred when it lands at least halfway through the window; otherwise fall back to the word boundary near the cap.
- D-02 (minor): an unclosed code fence (unmatched by the fenced-block regex) left a stray backtick in the output. Residual backticks are now stripped after inline-code unwrapping.

Adds regression tests for both.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

jkyberneees · 2026-06-06T11:37:45Z

🔍 AI Verification Protocol v5.2.7 — Certificate

Classification: NovelBehavior (new deterministic heuristic) · LOC_filtered: 341 (7 files) — under the 1,500 standard-pipeline ceiling.

Adversarial pass (Agent D) — findings & fixes

Red-teaming summarizeForBuffer surfaced two real defects, both now fixed (commit 860248c):

| ID | Severity | Finding | Fix |
| --- | --- | --- | --- |
| D-01 | 🔴 Critical (correctness) | A single early sentence terminator collapsed the excerpt: `"e.g., we should refactor…"` → `"e.g.…"` (5 runes), `"node.js…"` → `"node.…"`, `"v1.2…"` → `"v1.…"`. The code picked the last `.!?` in the window, which with one early terminator destroyed the summary. | Prefer a sentence cut only when it lands ≥ half the window; otherwise fall back to the word boundary near the cap. Early-abbreviation inputs now yield ~196-rune excerpts. |
| D-02 | 🟡 Minor | An unclosed code fence (unmatched by the fenced-block regex) left a stray backtick. | Strip residual backticks after inline-code unwrap. |

A third probe (filler false-positive on "Surely…") passed — the \b boundary correctly rejects it.
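The boundary behavior in that probe can be reproduced in isolation. The pattern below is an assumed, simplified stand-in for the real filler matcher; it only shows that RE2's `\b` stops "Sure" from matching the prefix of "Surely".

```go
package main

import (
	"fmt"
	"regexp"
)

// fillerPrefix is a hypothetical, reduced filler pattern for illustration.
var fillerPrefix = regexp.MustCompile(`(?i)^sure\b`)

func main() {
	fmt.Println(fillerPrefix.MatchString("Sure, I'll help with that.")) // true
	fmt.Println(fillerPrefix.MatchString("Surely this is a bug."))      // false
}
```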

Axes

| Axis | Result | Evidence |
| --- | --- | --- |
| 2.1 Semantic correctness | ✅ | Pipeline does what the contract claims; D-01/D-02 fixed and regression-tested. |
| 2.3 / 2.8 Security & adversarial surface | ✅ | No new injection sink: buffer text was already model/user-sourced and injected; summarization only shortens/cleans. RE2 regexes (no ReDoS). |
| 2.4 Structural integrity | ✅ | Single centralized funnel; invariant (summarize on write, not on restore) documented on both `AppendBuffer`/`RestoreBuffer` and guarded by a verbatim-restore test. |
| 2.5 Behavioral exploration | ✅ | Adversarial probes for early-terminator, unclosed fence, filler edge cases, multibyte/hard-cut boundaries. |
| 2.9 Documentation coverage | ✅ | No exported-symbol changes (`summarizeForBuffer`/`maxBufferSummaryRunes` unexported; `AppendBuffer`/`RestoreBuffer` signatures unchanged). No public API/doc surface affected. |

Signals

go build ./... ✅ · go vet ./... ✅ · gofmt -l empty ✅ · go test ./internal/memory/... -race ✅ · go test ./... ✅ (all pass).

Verdict: AutoApprove after remediation. Both findings fixed and covered by tests before merge.

CI's `go test -race` flagged a write/write data race on mockLLM.lastUser in TestConsolidateOnEnd_FiresAtSessionEnd. Root cause: OnSessionEndWithProvenance legitimately calls the LLM from two goroutines at once (synchronous episode extraction plus the background consolidation goroutine), but the test mocks held unsynchronized shared state.

- mockLLM: guard lastUser with a mutex; add getLastUser() accessor.
- countCallsLLM: replace the non-atomic *int counter with atomic.Int64 and a calls() accessor (same latent race, same OnSessionEnd trigger).

Test-only change; no production code affected. Verified with `go test ./internal/memory/... -short -race -count=30` and the full `go test ./... -short -race` suite.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

jkyberneees and others added 2 commits June 6, 2026 13:34

jkyberneees merged commit 7662fe0 into main Jun 6, 2026
6 checks passed

jkyberneees merged 3 commits into main from feat/memory-buffer-summaries
