LLM proposes. System decides. State persists.
Enterprise-grade memory and state management for any LLM — crash recovery, conflict tracking, audit trails, and deterministic lifecycle control. Single file. Zero dependencies. Zero lock-in. Outperforms multi-tool stacks while fitting inside a chat window.
Every LLM session starts from zero. Close the tab, lose the state. The industry "solutions" are duct tape: chat history dumps, vector DBs that hallucinate retrieval, framework lock-in that breaks across platforms.
RAG Runtime Kernel wraps around your project — it doesn't replace your workflow, it adds a structured memory and orchestration layer on top. One markdown file. Zero dependencies. Drop it into any LLM session and you get: deterministic state persistence, crash recovery, conflict tracking, and cross-session memory that actually works — across Claude, GPT, and any LLM.
In head-to-head benchmarks, this single-file specification matches or exceeds multi-tool stacks (Claude Code, lean-ctx, LLM Wiki) on state management, crash recovery, and cross-platform interoperability — while requiring zero installation.
Key benefits:
- Persistence — your project state survives across sessions, tabs, and platforms
- Reduced context loss — HOT/COLD memory tiers keep only what's needed in context
- Improved autonomy — the LLM self-enforces all rules without external tooling
- Audit trail — every decision, conflict, and state change is logged and traceable
Important: The Init Prompt is a full specification (~16K tokens). It goes into a project session, not the Instructions/System Prompt field (which has size limitations on most platforms).
- Create a new Project (or open an existing one)
- Start a new session within that project
- Drop INIT_UNIVERSAL_RUNTIME_KERNEL_v3.1.6.md into the session as a file
- Send your first message — the system bootstraps itself
- Follow on-screen steps: provide root paths, optional project description, optional POV config
- Copy the generated pointer block into your Project Instructions when prompted
- All subsequent sessions auto-load the RAG and enforce all rules
- Open a new conversation (or use Custom GPT if available)
- Upload or paste the contents of INIT_UNIVERSAL_RUNTIME_KERNEL_v3.1.6.md
- Send your first message — the system bootstraps in autonomous mode
- Follow on-screen steps (same as above)
- At session end, download the generated RAG files and save to your project folder
- Upload RAG files at the start of each new session to restore state
For hard runtime validation of every state transition, use the Python runtime:
# HTTP mode (for GPT Chat Custom Actions or any HTTP client)
python -m rag_kernel serve --project /path/to/your/RAG --port 7437
# MCP mode (for Claude Desktop)
python -m rag_kernel mcp --project /path/to/your/RAG
Full setup instructions for all platforms and modes: docs/LAUNCH_MANUAL.md
Cowork is Anthropic's desktop tool for non-developers to automate file and task management.
New project: Create a project folder with a RAG/ subfolder, open Cowork, start a session, and drop the Init Prompt file in. The system bootstraps, scans your project folder, and builds the RAG.
Existing project: Point the system to your existing project folder during bootstrap. The boot scan inventories all existing files, classifies them by tier, and extracts knowledge into COLD storage. Your existing work becomes queryable, trackable, and persistent.
Benefits: Cowork's file access lets the kernel read/write RAG files directly — no manual copy-paste. Task automation pairs naturally with the kernel's checkpoint and audit system.
Claude Code is Anthropic's CLI tool for agentic coding tasks.
New project: Initialize your project directory, reference the Init Prompt in a Claude Code session, and the system creates RAG files in your RAG/ directory via direct filesystem access.
Existing project: Add a RAG/ directory to your existing codebase, bootstrap the kernel — it scans your project, builds inventory, and starts tracking state.
How it enhances Claude Code: Context persistence across stateless sessions. Deterministic state machine structures long-running development. Zero-token file ops via direct filesystem access. Conflict ledger preserves both sides when code changes contradict prior decisions.
Full benchmark: docs/benchmark_comparison.md
| Capability | RAG Runtime Kernel | Claude Code | lean-ctx | LLM Wiki |
|---|---|---|---|---|
| Cross-session memory | Full: HOT/COLD + WAL + crash recovery | Partial: CLAUDE.md, no crash recovery | None | Pattern only |
| Deterministic state machine | BOOTING > READY > WORKING > CHECKPOINTING > CLOSING + RECOVERY | None | None | None |
| Token efficiency | 60-90% reduction (HOT-only boot ~4K tokens) | Unbounded growth without curation | 60-99% raw compression (best-in-class I/O) | Depends on wiki quality |
| Cross-platform | Claude + GPT + any LLM, same spec | Claude Code only | Editor-focused | Platform-agnostic pattern |
| Dependencies | Zero. Single markdown file | Node.js + CLI | Rust binary | Varies |
| Crash recovery | WAL replay + .bak rotation + RECOVERY state | File-history checkpoints | N/A | None |
| Conflict tracking | Explicit ledger — both sources preserved | None | N/A | None |
- Only system with a formal state machine governing LLM workflows — deterministic transition guards, not ad-hoc rules
- Only system that works identically across Claude and GPT — the spec is the invariant
- Only system with atomic write protocol + WAL + backup rotation — enterprise-grade persistence
- Formally verified with TLA+ — the same technique Amazon uses for AWS infrastructure (see below)
- Zero install, zero dependencies — the specification IS the product
- Conflict ledger is unique — no other system tracks disagreements between sources
A specification — a complete protocol that turns any LLM into a controlled, auditable agent with persistent project memory. 3-layer architecture:
LLM (reasoning engine)
| JSON proposals
Policy Layer (this specification)
| validated transitions
Runtime Kernel (state + persistence)
| atomic writes
Filesystem (source of truth)
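The layered flow above can be sketched in a few lines of Python. This is an illustrative model only — the class, dictionary, and method names below are assumptions for demonstration, not the actual rag_kernel API:

```python
import json

# Sketch of the three-layer flow: the LLM emits a JSON proposal,
# the policy layer validates it against the transition graph, and
# the kernel logs to the WAL before committing. All names here are
# hypothetical, not the real rag_kernel internals.

ALLOWED = {
    "BOOTING": {"READY", "RECOVERY"},
    "READY": {"WORKING", "CLOSING"},
    "WORKING": {"CHECKPOINTING", "READY"},
    "CHECKPOINTING": {"READY", "CLOSING"},
    "RECOVERY": {"READY"},
    "CLOSING": set(),  # terminal: no exits
}

class Kernel:
    def __init__(self):
        self.state = "BOOTING"
        self.wal = []  # write-ahead log, append-only

    def propose(self, raw: str) -> bool:
        """Validate an LLM-proposed transition; commit only if legal."""
        proposal = json.loads(raw)
        target = proposal["target_state"]
        if target not in ALLOWED[self.state]:
            return False  # rejected: illegal transition
        self.wal.append({"from": self.state, "to": target})  # WAL first
        self.state = target  # then commit
        return True

k = Kernel()
assert k.propose('{"target_state": "READY"}')    # legal: committed
assert not k.propose('{"target_state": "BOOTING"}')  # illegal: rejected
```

The point of the split is that the LLM never mutates state directly: it can only propose, and the deterministic layers decide.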
The state machine is verified using TLA+ and the TLC model checker — the same formal methods technique used by Amazon to verify AWS infrastructure.
TLC exhaustively explored 136,193 states (84,261 distinct) and verified all 8 safety invariants with zero violations:
| Invariant | What It Proves |
|---|---|
| TypeInvariant | All state variables hold valid types at all times |
| TransitionSafety | Every reachable state is legal per the transition graph |
| SingleWriter | At most one proposal staged at any time (no concurrent mutations) |
| WALConsistency | Write-ahead log is append-only, monotone, and never lags behind state |
| TerminalSafety | CLOSING is irreversible — no exit, no crash, no pending proposals |
| NoDeadlock | Every non-terminal state has at least one enabled action |
| CrashRecoveryConsistency | Crash flag is only true when state is RECOVERY |
| WALPrecedesStateChange | WAL entry exists before any state transition commits |
The TLA+ specification (formal/RAGKernel.tla) is a direct transcription of the Python state machine — every transition, guard, and invariant maps 1:1 to the runtime code. Full results in formal/TLC_RESULTS.md.
Unit tests prove "these 337 scenarios work." TLA+ proves "no scenario can ever violate the invariants." That is a fundamentally stronger guarantee.
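As a rough illustration of how such an invariant maps onto runtime code (the function and data shapes here are assumptions, not the actual rag_kernel internals), WALPrecedesStateChange can be expressed as a check that every committed transition already has a matching WAL entry:

```python
# Sketch: checking WALPrecedesStateChange at runtime.
# Hypothetical data shapes, not the real rag_kernel implementation.

def wal_precedes_state_change(wal: list, history: list) -> bool:
    """Every committed transition must appear in the WAL first,
    so the WAL can never lag behind the state history."""
    transitions = list(zip(history, history[1:]))
    logged = [(e["from"], e["to"]) for e in wal]
    return logged[:len(transitions)] == transitions

wal = [{"from": "BOOTING", "to": "READY"},
       {"from": "READY", "to": "WORKING"}]
assert wal_precedes_state_change(wal, ["BOOTING", "READY", "WORKING"])
# A committed transition without a WAL entry violates the invariant:
assert not wal_precedes_state_change(wal[:1], ["BOOTING", "READY", "WORKING"])
```

TLC proves this property holds over every reachable state; a runtime check like this can only confirm it for the executions you happen to observe.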
Structured Memory (HOT/COLD) — Active state stays lean (~15KB). Archival data loads on-demand with automatic partitioning.
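A minimal sketch of the HOT/COLD idea, with invented names and a toy eviction policy (the real partitioning logic lives in cold_manager.py and is more involved):

```python
# Illustrative HOT/COLD tiering: keep the active working set small,
# page older entries out to COLD, and load them back on demand.
# Names, limits, and the eviction policy are hypothetical.

HOT_LIMIT = 3  # max entries kept "in context" for this demo

class TieredMemory:
    def __init__(self):
        self.hot = {}   # lean active state
        self.cold = {}  # archival storage, loaded on demand

    def put(self, key: str, value: str) -> None:
        self.hot[key] = value
        while len(self.hot) > HOT_LIMIT:
            # Evict the oldest HOT entry to COLD storage
            oldest = next(iter(self.hot))
            self.cold[oldest] = self.hot.pop(oldest)

    def get(self, key: str) -> str:
        if key in self.hot:
            return self.hot[key]
        value = self.cold.pop(key)  # on-demand load from COLD
        self.put(key, value)        # promote back to HOT
        return value

m = TieredMemory()
for i in range(5):
    m.put(f"k{i}", f"v{i}")
assert len(m.hot) == 3 and "k0" in m.cold
assert m.get("k0") == "v0"  # transparently loaded back from COLD
```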
Deterministic State Machine — BOOTING > READY > WORKING > CHECKPOINTING > CLOSING with RECOVERY path.
Proposal > Validation > Commit — LLM proposes JSON actions. System validates against policy, then commits or rejects.
Atomic Persistence — All writes atomic and verified. WAL enables crash recovery.
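The atomic-write pattern can be sketched as follows — write to a temp file in the same directory, fsync, then rename over the target. This is a generic illustration of the technique, not a copy of persistence.py:

```python
import os
import tempfile

# Sketch of an atomic write: stage the data in a temp file in the
# same directory, flush + fsync, then rename over the target.
# os.replace is atomic on both POSIX and Windows, so readers never
# observe a torn or half-written file, even across a crash.

def atomic_write(path: str, data: str) -> None:
    d = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=d)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force bytes to disk before the swap
        os.replace(tmp, path)     # atomic rename over the target
    except BaseException:
        os.unlink(tmp)            # never leave a stale temp file behind
        raise

atomic_write("hot_state.md", "state: READY\n")
with open("hot_state.md") as f:
    assert f.read() == "state: READY\n"
```

Staging in the same directory matters: a rename is only atomic within one filesystem.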
COLD Partitioning — Auto-splits into sessions/inventory/conflicts/evidence with sub-partitioning and integrity-preserving chopping.
Tool Fallback Chain — Ordered fallback for file operations across platform tools.
Cross-Platform — Claude Projects, ChatGPT, Cowork, Claude Code, any LLM.
Multi-Account Safety — Session identity tagging, write collision detection, anti-corruption guards.
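One way to picture collision detection — hypothetical scheme, not the concurrency.py implementation: each session stamps its identity into the state it writes, and a writer halts if the stamp changed underneath it since load:

```python
# Sketch of write-collision detection via session identity tags.
# The tagging scheme and class below are illustrative assumptions.

class CollisionGuard:
    def __init__(self, session_id: str, loaded_tag: str):
        self.session_id = session_id
        self.loaded_tag = loaded_tag  # tag observed at load time

    def check(self, current_tag: str) -> None:
        if current_tag not in (self.loaded_tag, self.session_id):
            # Another session wrote in between: detect and halt,
            # never auto-merge (single-writer policy).
            raise RuntimeError("write collision detected: halting")

g = CollisionGuard("session-A", loaded_tag="session-A")
g.check("session-A")   # no interleaved writer: proceed
try:
    g.check("session-B")  # a different session wrote: halt
    collided = False
except RuntimeError:
    collided = True
assert collided
```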
Full Audit Trail — Every state transition, decision, and conflict logged.
Token Efficiency — 70-95% reduction vs. naive approaches.
| Mode | How It Works |
|---|---|
| Autonomous | LLM self-enforces all rules. No external software needed. Default mode. |
| Enforced | Python runtime (v3.2) intercepts all mutations. 8 modules, 337 tests, 5811 lines. |
Minimum: An LLM that supports file uploads or long-form input + a project folder.
Recommended: Filesystem MCP for direct file read/write.
Optional: Shell/PowerShell MCP, Python 3.10+ (ENFORCED mode), Claude Code or Cowork.
rag-runtime-kernel/
├── INIT_UNIVERSAL_RUNTIME_KERNEL_v3.1.6.md # The specification (the product)
├── CONTRIBUTING.md # How to report issues
├── CHANGELOG.md # Version history
├── docs/
│ ├── architecture.md # System architecture
│ ├── benchmark_comparison.md # Head-to-head vs alternatives
│ ├── design_principles.md # Core design philosophy
│ ├── test_analysis_gpt_web.md # GPT Web test findings
│ ├── LAUNCH_MANUAL.md # Full setup guide (all platforms + modes)
│ ├── LOCAL_TESTING_GUIDE.md # Local dev testing & GPT Custom Actions
│ ├── v3.2_ARCHITECTURE_DESIGN.md # v3.2 runtime architecture
│ └── ROADMAP.md # Development roadmap
├── rag_kernel/ # v3.2 Runtime Bridge (ENFORCED mode)
│ ├── __main__.py # CLI entry point (serve / mcp)
│ ├── api.py # HTTP API (FastAPI)
│ ├── state_machine.py # Deterministic state engine
│ ├── persistence.py # Atomic writes, WAL, hash verification
│ ├── cold_manager.py # COLD partition manager
│ ├── concurrency.py # Lock manager, write collision guard
│ ├── mcp_transport.py # MCP tool interface
│ └── schemas.py # Pydantic models for proposals/state
├── tests/ # Test suites
│ ├── test_state_machine.py # State machine unit tests
│ ├── test_persistence.py # Persistence + WAL tests
│ ├── test_cold_manager.py # COLD partition tests
│ ├── test_concurrency.py # Lock + collision tests
│ ├── test_api.py # HTTP API tests
│ ├── test_mcp_transport.py # MCP transport tests
│ ├── test_schemas.py # Schema validation tests
│ ├── test_main.py # CLI entry point tests
│ ├── UNIT_TEST_CLAUDE_DESKTOP.md # Claude Desktop spec-level tests (42)
│ └── UNIT_TEST_GPT_WEB.md # GPT Web spec-level tests (43)
├── .github/
│ ├── FUNDING.yml # GitHub Sponsors
│ └── ISSUE_TEMPLATE/ # Bug report + feature request templates
├── formal/
│ ├── RAGKernel.tla # TLA+ state machine specification
│ ├── RAGKernel.cfg # TLC model checker configuration
│ └── TLC_RESULTS.md # Verification results (136K states, 8 invariants)
├── LICENSE # AGPL-3.0
└── README.md
- BOOTING — Load HOT, verify consistency, check WAL, probe tools
- READY — Accept tasks
- WORKING / INGESTING — Execute tasks, ingest files, extract knowledge
- CHECKPOINTING — Save atomically with backup rotation
- CLOSING — Audit findings, final save
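The WAL check during BOOTING can be pictured as a replay step — an illustrative sketch with invented names, not the real recovery code:

```python
# Sketch of WAL replay at boot: if the WAL tail records transitions
# newer than the last checkpoint, re-apply them and flag RECOVERY.
# Hypothetical shapes, not the real rag_kernel implementation.

def recover(checkpoint_state: str, wal: list):
    """Replay WAL entries newer than the checkpoint; report
    whether recovery was needed."""
    state, recovered = checkpoint_state, False
    for entry in wal:
        if entry["from"] == state:
            state, recovered = entry["to"], True
    return state, recovered

# Clean shutdown: checkpoint already matches the WAL tail.
assert recover("READY", []) == ("READY", False)
# Crash after a logged-but-uncheckpointed transition: replay it.
assert recover("READY", [{"from": "READY", "to": "WORKING"}]) == ("WORKING", True)
```

Because the WAL is written before any state change commits (the WALPrecedesStateChange invariant above), the log always contains enough information to reconstruct the pre-crash state.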
- Autonomous mode is self-enforced — the LLM follows the spec by instruction, not by hard runtime constraints
- Persistence depends on platform — full atomic writes with MCP; manual file management on GPT Web
- Context window ceiling — spec consumes ~16K tokens; large projects may hit limits
- Not a database — structured file-based memory, not a production database replacement
See docs/test_analysis_gpt_web.md for detailed platform-specific findings.
- Context window bound — spec ~16K tokens; large projects may hit limits
- No cross-filesystem bridge yet — relies on platform tools; user-assisted I/O without them
- Single-writer — concurrent writes are detected and halted, not auto-merged
- GPT Web — no atomic writes, no real token counter, manual persistence
See docs/ROADMAP.md for complete roadmap.
| Release | Status | Focus |
|---|---|---|
| v3.1.6 | Released | 43-section spec: pre-flight gate enforcement, known-issues registry, tool hierarchy |
| v3.2 | Released | Runtime Bridge: 8 Python modules, 337 tests, 5811 lines. ENFORCED mode live. TLA+ formal verification: 136K states, 8 safety invariants verified. |
| v3.3 | Planned | UX: graduated POV, conflict auto-categorization, delta checkpoints |
| v4.0 | Planned | Graph Orchestrator: DAG execution, parallel tasks, dependency tracking |
Found a bug? Please open an issue using the provided templates. See CONTRIBUTING.md.
Developer: Artem Pakhol
LinkedIn: linkedin.com/in/pakhol
This project is licensed under the GNU Affero General Public License v3.0 — see LICENSE.
What this means: You may use, modify, and distribute this software, but any modified version you deploy (including as a network service) must also be released under AGPL-3.0 with attribution to the original project.
