feat: SummaryMemory backend — rolling LLM-generated compression (closes #3)#7
Merged
Conversation
Rolling-summary memory with two compression modes:
- LLM mode (GROQ_API_KEY set): Groq abstractive summarisation — preserves
semantic meaning and handles fact updates in natural language
- Extractive fallback (zero cost): regex fact-pattern extraction — works
with no API key, passes all CI tests
Benchmark results (extractive, 100 turns, 8 facts):
naive 62.5% recall @ 1,189 tokens/query
rag 100.0% recall @ 58 tokens/query
cascading 75.0% recall @ 261 tokens/query
summary 100.0% recall @ 318 tokens/query ← new
SummaryMemory matches RAG recall while carrying richer narrative context
via its running summary, at 5.5x lower token cost than naive.
Changes:
- memory/summary.py: SummaryMemory class + extractive + LLM helpers
- evaluation/benchmark.py: register "summary" in _make_memory()
- tests/test_pipeline.py: 6 new tests (14 total, all passing)
- tests/test_imports.py: SummaryMemory import check
- CHANGELOG.md: [Unreleased] section
There was a problem hiding this comment.
Pull request overview
Adds a new SummaryMemory backend to MemoryLens that maintains a rolling conversation summary plus a bounded recent-message window, with optional Groq LLM summarization and a deterministic extractive fallback. This fits into the existing set of memory backends (naive / RAG / cascading) used by the benchmark runner and pipeline tests.
Changes:
- Introduces
memory/summary.pyimplementingSummaryMemorywith LLM + extractive compression modes. - Registers the
"summary"backend inevaluation/benchmark.pyand tightens unknown-backend handling. - Adds SummaryMemory coverage in
tests/test_pipeline.pyand import smoke coverage intests/test_imports.py, plus changelog entry.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
memory/summary.py |
New rolling-summary backend with LLM/extractive compression and bounded recent buffer. |
evaluation/benchmark.py |
Adds "summary" backend to _make_memory() and makes unknown backends error explicitly. |
tests/test_pipeline.py |
Adds 6 integration-style tests validating SummaryMemory behavior/metrics. |
tests/test_imports.py |
Adds import smoke-test for SummaryMemory. |
CHANGELOG.md |
Documents the new backend and benchmark results under [Unreleased]. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+130
to
+134
| def add_message(self, role: str, content: str, turn: int) -> None: | ||
| self.recent.append({"role": role, "content": content, "turn": turn}) | ||
| # Compress whenever the verbatim buffer grows past the window | ||
| if len(self.recent) > self.window_size: | ||
| self._compress() |
Comment on lines
+46
to
+49
| if name == "summary": | ||
| # use_llm=None → auto-detect from GROQ_API_KEY env var | ||
| return SummaryMemory(window_size=20, use_llm=None) | ||
| raise ValueError(f"Unknown backend: '{name}'. Choose from: naive, rag, cascading, summary") |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
Implements
SummaryMemory— a new memory backend that compresses conversation history into a rolling summary, addressing Issue #3.The backend has two compression modes so it works in every environment:
GROQ_API_KEYis setBenchmark results (extractive mode, 100 turns, 8 tracked facts)
SummaryMemory matches RAG's recall while carrying richer narrative context through its running summary — at 5.5× lower token cost than naive.
Type of change
Related issue
Closes #3
How was this tested?
6 new tests added:
test_summary_extractive_fallback_recall_early— ≥75% recall at T=15test_summary_compresses_overflow— recent buffer stays within window_sizetest_summary_context_contains_summary_and_recent— correct context structuretest_summary_reset_clears_state— reset() wipes both buffer and summarytest_summary_token_cost_bounded— tokens < 2000 at T=100test_summary_benchmark_registration—_make_memory("summary")resolves correctlyChecklist
python tests/test_pipeline.py)CHANGELOG.mdupdated under## [Unreleased]GROQ_API_KEY(LLM mode auto-detected)Files changed
memory/summary.pyevaluation/benchmark.py"summary"in_make_memory()tests/test_pipeline.pytests/test_imports.pyCHANGELOG.md