Paging, Swapping, and Cache Hierarchies for Agentic Context Management
This repository contains the full working code from the blog post: Virtual Memory for LLMs: Implementing Paging, Swapping, and Cache Hierarchies in Agentic Context Management
It treats the LLM context window as the L1 cache in a four-layer memory hierarchy — borrowing virtual memory concepts from operating systems to handle documents and corpora that are arbitrarily larger than any context window.
| Layer | Name | Technology | Access Time |
|---|---|---|---|
| L1 | Active Context | LLM Prompt Window (in-process dict) | ~1ms |
| L2 | RAM — Short-Term | Redis KV Store with TTL | ~5ms |
| L3 | Swap Space — Mid-Term | ChromaDB (vector DB, semantic retrieval) | ~50ms |
| L4 | Disk — Long-Term | Local filesystem / Object Storage | ~200ms+ |
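The read path implied by the table can be sketched as a walk down the tiers, promoting hits back into L1. This is a minimal illustration, not the repo's actual API — the class and method names are hypothetical, and plain dicts stand in for Redis, ChromaDB, and the filesystem:

```python
# Hypothetical sketch of a tiered read: check each layer in order of
# access cost and promote hits back into L1 so the next lookup is fast.
class MemoryHierarchy:
    def __init__(self):
        # Plain dicts stand in for the real L2 (Redis), L3 (ChromaDB),
        # and L4 (disk/object storage) backends.
        self.layers = {"L1": {}, "L2": {}, "L3": {}, "L4": {}}

    def read(self, page_id):
        for tier in ("L1", "L2", "L3", "L4"):
            store = self.layers[tier]
            if page_id in store:
                page = store[page_id]
                if tier != "L1":
                    # Promote on hit: future accesses pay ~1ms, not ~50ms.
                    self.layers["L1"][page_id] = page
                return page
        return None  # true miss: page is not resident in any tier

hier = MemoryHierarchy()
hier.layers["L3"]["doc-7"] = "chunk text"
print(hier.read("doc-7"))            # found in L3, promoted to L1
print("doc-7" in hier.layers["L1"])  # True
```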
Key mechanisms:
- Dynamic Context Paging — documents are chunked into 2,000-token pages; only relevant pages enter L1
- Page Fault Handling — agent signals missing info; pager loads from lower tiers automatically
- LRU Eviction — least-recently-used pages are evicted from L1 when capacity is needed
- Abstractive Compression — evicted pages are compressed ~10:1 before archival (gpt-4o-mini)
- Context Serialization — compact agent-to-agent hand-off protocol using pointers, not payloads
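The paging, page-fault, and LRU-eviction mechanisms above compose naturally into one loop. The following is a hedged sketch under assumed names (`L1Pager`, `access`, a dict standing in for the lower tiers) — the repo's real implementation lives in context_pager.py:

```python
from collections import OrderedDict

# Minimal sketch of an L1 pager: a fixed-capacity, LRU-ordered page
# store. A miss triggers a "page fault" that loads from a lower tier,
# evicting the least-recently-used page first when L1 is full.
class L1Pager:
    def __init__(self, capacity, lower_tier):
        self.capacity = capacity
        self.pages = OrderedDict()    # page_id -> page text, LRU order
        self.lower_tier = lower_tier  # stand-in for L2/L3/L4

    def access(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark most-recently-used
            return self.pages[page_id]
        return self._page_fault(page_id)

    def _page_fault(self, page_id):
        # The agent asked for a page not in L1: fetch it from below,
        # swapping out the LRU victim first if capacity is exhausted.
        page = self.lower_tier.get(page_id)
        if page is None:
            raise KeyError(f"page {page_id} not found in any tier")
        if len(self.pages) >= self.capacity:
            victim_id, victim = self.pages.popitem(last=False)  # LRU
            self.lower_tier[victim_id] = victim                 # swap out
        self.pages[page_id] = page
        return page

disk = {"p1": "intro", "p2": "methods", "p3": "results"}
pager = L1Pager(capacity=2, lower_tier=disk)
pager.access("p1"); pager.access("p2")
pager.access("p3")          # page fault: evicts p1 (LRU) to lower tier
print(list(pager.pages))    # ['p2', 'p3']
```

In the real system the swap-out step would also run the page through the abstractive compressor before archival; that step is omitted here for brevity.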
| File | Description |
|---|---|
| main.py | Entry point — ingests a large document and runs queries |
| agent.py | VirtualMemoryAgent with page fault handling and hand-off |
| context_pager.py | Core paging engine — swap-in, swap-out, LRU eviction |
| memory_layers.py | L1/L2/L3/L4 layer implementations |
| page_table.py | Central index mapping page IDs to tiers and metadata |
| compressor.py | LLM-based abstractive page compression |
| context_serializer.py | Agent-to-agent context snapshot protocol |
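The pointer-based hand-off idea behind context_serializer.py can be sketched as follows. This is an assumption-laden illustration — the function and field names are hypothetical, not the repo's actual snapshot schema:

```python
import json
import time

# Hedged sketch: a context snapshot carries page IDs and tier
# locations (pointers), never page contents, so the receiving agent
# rehydrates only the pages it actually needs. Field names here are
# illustrative, not the repo's real schema.
def serialize_context(agent_id, page_table):
    snapshot = {
        "agent_id": agent_id,
        "timestamp": time.time(),
        "pages": [
            {"page_id": pid, "tier": entry["tier"]}  # pointer, no payload
            for pid, entry in page_table.items()
        ],
    }
    return json.dumps(snapshot)

page_table = {"p1": {"tier": "L1"}, "p2": {"tier": "L3"}}
blob = serialize_context("agent-a", page_table)
restored = json.loads(blob)
print([p["page_id"] for p in restored["pages"]])  # ['p1', 'p2']
```

Because the snapshot is a few hundred bytes of pointers rather than megabytes of text, hand-off cost stays constant regardless of corpus size.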
```bash
python -m venv venv
source venv/bin/activate   # Linux/macOS
venv\Scripts\activate      # Windows

pip install -r requirements.txt

docker run -d -p 6379:6379 redis:alpine

export OPENAI_API_KEY="your-key-here"
python main.py
```

- Python 3.11+
- OpenAI API key (GPT-4o for agent, GPT-4o-mini for compression)
- Redis (optional, recommended for production)