Version: 1.3 Status: Specification & Skill Implementation Author: Dan Sasser / Gorombo Labs
CAL extends Claude Code into a multi-process reasoning system that orchestrates subprocess executors (like ollama-code, claude -p, claude --agent) while maintaining persistent context through a memory ledger.
Proven Results: 5.5× token efficiency | 92.25% context retention | 69,500 tokens saved per session
Large codebases and deep reasoning chains push Claude Code beyond its context window limits. Even with compression, you face:
| Problem | Cause | Impact |
|---|---|---|
| Context Loss | Finite context window | Re-analyzing same files |
| Token Waste | Redundant file ingestion | Inefficient resource use |
| Semantic Drift | Truncated memory | Inconsistent reasoning |
| Session Amnesia | No persistent state | Lost work between sessions |
CAL's Solution: Delegate heavy analysis to subprocess executors, persist results in a ledger, reuse context across sessions.
Developer Request
↓
Claude Code (Controller)
├─ Analyzes task
├─ Decides: Sequential? Concurrent? Staggered?
└─ Spawns ollama-code subprocess(es)
↓
Executors analyze in parallel
↓
Return structured JSON summaries
↓
Claude Code validates checksums
↓
Commits to .cal_memory.json ledger
↓
Reintegrates summaries into context
↓
Developer receives result with preserved context
Three Core Components:
- Controller (
Claude Code) - Orchestrates tasks, validates outputs, manages ledger - Executors (
ollama-code,claude -p,claude --agent,gemini-cli,forgecode) - Perform heavy analysis - Ledger (
.cal_memory.json) - Persistent context store
Spawn multiple executors in parallel for independent analyses:
# Concurrent analysis of different modules
Task 1: ollama-code analyzes src/core/
Task 2: ollama-code analyzes src/api/
Task 3: ollama-code analyzes src/tests/
# All running simultaneously, results mergedThe .cal_memory.json ledger stores:
- Task summaries (compressed analysis results)
- Token usage per task
- Dependency chains (which tasks depend on which)
- Checksums (integrity validation)
- Timestamps (audit trail)
Result: Resume work across sessions without re-analyzing.
CAL decides execution strategy automatically:
| Mode | When | Example |
|---|---|---|
| Sequential | Task depends on prior context | "Analyze deps, THEN find issues" |
| Concurrent | Independent analyses | "Check 5 modules simultaneously" |
| Staggered | High I/O load | "Process logs with 500ms delays" |
| Queued | Ledger conflict detected | "Wait for lock release, then retry" |
- Automatic retry with exponential backoff
- Checksum verification (SHA256)
- Ledger rollback on corruption
- Subprocess crash recovery
Works with any CLI executor that:
- Accepts stdin prompts
- Returns JSON output
- Operates on file system
Source: Context Efficiency Report (2025-10-25)
| Operation | Baseline Tokens | With CAL | Savings | Efficiency |
|---|---|---|---|---|
| Repository Scan | 12,000 | 1,800 | 85% | 6.7× |
| Dependency Map | 7,500 | 900 | 88% | 8.3× |
| Test Summary | 4,300 | 600 | 86% | 7.1× |
| API Validation | 5,200 | 720 | 86% | 7.2× |
| Aggregate Mean | 85,000 | 15,500 | 82% | 5.5× |
Context Utilization (% of available budget)
Without CAL: ████████████████████ (57.5%)
With CAL: ██████████████████████████████ (92.25%)
Key Insight: CAL preserves 34.75% more context per session by offloading analysis to executors and storing compressed summaries.
Without CAL (Direct Analysis):
Claude Code reads 200 files → 85,000 tokens consumed → context overflow → summarize → semantic drift
With CAL (Delegated Analysis):
Claude Code spawns ollama-code → ollama-code reads 200 files → returns 1,500 token summary → Claude Code appends to ledger → 15,500 tokens total
Task: Analyze large codebase (500+ files)
CAL Orchestration:
Step 1: Claude Code breaks task into 3 independent analyses
├─ Subprocess 1: ollama-code analyzes authentication module
├─ Subprocess 2: ollama-code analyzes database layer
└─ Subprocess 3: ollama-code analyzes API endpoints
Step 2: All 3 run concurrently (parallel spawn)
Step 3: Claude Code collects 3 JSON outputs:
{
"task_id": "cal_001",
"executor": "ollama-code",
"summary": "Auth module uses JWT, OAuth2. Found security issue in token refresh.",
"tokens_used": 1200,
"checksum": "sha256:abc123..."
}
Step 4: Validates checksums, commits to .cal_memory.json
Step 5: Sequential dependent task:
ollama-code analyzes how all 3 modules interact (uses ledger context)
Result: 500+ files analyzed with 5.5× efficiency
Session 1 (Monday):
Claude Code delegates: "Analyze project architecture"
ollama-code produces: 15-page architecture document
Ledger stores: Compressed summary (800 tokens)
Session 2 (Tuesday):
Claude Code reads ledger: Loads Monday's architecture summary
Claude Code delegates: "Find performance bottlenecks in architecture"
ollama-code uses: Previous context from ledger
Result: No re-analysis needed, work continues seamlessly
✅ Large Codebase Analysis (100+ files) ✅ Iterative Development (multi-session projects) ✅ Dependency Mapping (complex module interactions) ✅ Documentation Generation (requires broad context) ✅ Code Review (parallel file analysis) ✅ Debugging Sessions (preserve investigation context)
❌ Single file edits ❌ Simple questions with obvious answers ❌ Tasks that fit easily in context window ❌ One-off commands with no state preservation needed
Rule of Thumb: If the task requires > 20,000 tokens OR spans multiple sessions, use CAL.
| Feature | Claude Code Alone |
Claude Sub-Agents | CAL (Multi-Executor) |
|---|---|---|---|
| Delegated Execution | ✅ Internal only | ✅ claude -p, --agent, External CLI & HTTP |
|
| Token Efficiency | ✅ 35-45% savings | ||
| Persistent State | ❌ Ephemeral | ❌ Session-based | ✅ .cal_memory.json ledger |
| Cross-Model Support | ❌ | ❌ | ✅ Claude + Ollama + any CLI/MCP |
| File-System Access | ❌ Sandboxed | ✅ Full read access | |
| Scalability | ✅ Concurrent spawn pool | ||
| Fault Tolerance | ✅ Auto-retry + rollback | ||
| Governance Model | ❌ | ✅ Checksum + sandbox | |
| Empirical Validation | ✅ Verified in lab |
{
"version": "1.3",
"tasks": [
{
"task_id": "cal_2025_10_25_0001",
"executor": "ollama-code",
"summary": "Analyzed authentication module. Uses JWT with 1-hour expiry. Refresh token stored in httpOnly cookie.",
"tokens_used": 1520,
"checksum": "sha256:c94a16d1...",
"linked_tasks": [],
"timestamp": "2025-10-25T14:30:00Z",
"metadata": {
"repo": "my-project",
"branch": "main",
"mode": "concurrent"
}
}
]
}- Task completes → Summary validated → Committed to ledger
- New session starts → Claude Code reads ledger → Context restored
- New task delegates → Executor can reference previous summaries
- Token savings → Compressed summaries replace full file re-reads
- Automatic rotation at 250MB (archives to
.cal_memory_history/) - Checksum verification on every read/write
- Query by task_id, executor, or timestamp
- Dependency tracking (which tasks depend on which)
See SPECIFICATION.md for complete schema details
Default: 3-5 concurrent executor spawns Configurable: Adjust based on system resources Constraint: Prevents overwhelming Ollama/system memory
CAL automatically selects execution strategy:
- Context-Priority: Weight by dependency depth
- Round-Robin: Even distribution across executors
- Dependency-Chain: Sequential execution by task reference
Currently supported:
ollama-code(primary executor)gemini-cli(alternative)forgecode(alternative)
Extensible: Any CLI tool with JSON output
See SPECIFICATION.md for executor registration protocol
Symptom: Executor exits with code ≠ 0 Recovery: CAL automatically respawns with exponential backoff (500ms → 1500ms) Check: Executor logs for root cause
Symptom: Checksum mismatch detected Recovery: CAL rolls back to last valid snapshot Prevention: Regular ledger validation
Symptom: Executor doesn't respond within 5 seconds Recovery: CAL terminates process and retries Adjustment: Increase timeout for large analysis tasks
Symptom: Heap monitor alerts Recovery: CAL purges cache and resumes Prevention: Use ledger rotation (250MB threshold)
See SPECIFICATION.md Section 7 for complete error code table (E001-E008)
Current: Specification complete, skill in development Next: Claude Code skill implementation Future:
- Adaptive memory compaction (v1.4)
- Distributed ledger sync (v1.5)
- Executor auto-discovery (v1.6)
- Federated CAL nodes (v2.0)
See SPECIFICATION.md Section 14 for complete roadmap
- SPECIFICATION.md - Complete implementation blueprint (for developers building CAL)
- Context Efficiency Report - Empirical validation of 5.5× efficiency gains
- deepseek-consolidation-analysis.md - Analysis methodology
- GitHub Repository - [Coming soon]
Dual-Licensed:
- MIT License - For private/commercial use (modify and distribute freely)
- AGPL v3 License - For research/transparency (disclose source and derivatives)
SPDX-License-Identifier: MIT OR AGPL-3.0-only
Copyright © 2025 Sasser Development LLC / Gorombo Labs
Attribution: This tool extends the Context Arbitration Layer (CAL) architecture developed by Dan Sasser / Gorombo Labs AI Research.
Contributors: Claude Code, ollama-code, gemini-cli, forgecode
Key Metrics:
- Token efficiency: 5.5× improvement
- Context retention: 92.25% (vs 57.5% baseline)
- Token savings: 69,500 per session (200K budget)
- Concurrency: 3-5 parallel spawns (configurable)
- Ledger rotation: 250MB threshold
Key Concepts:
- Controller:
Claude Codeorchestrates everything - Executor:
ollama-code,claude -p,claude --agent(or others) perform analysis - Ledger:
.cal_memory.jsonstores persistent context - Arbitration: Automatic decision on sequential/concurrent/staggered/queued execution
Implementation: See SPECIFICATION.md for complete technical details
CAL transforms token-bounded computation into context-persistent orchestration.