claude_light.py is an interactive CLI chat tool for querying and editing a codebase using Claude. By combining Anthropic's prompt caching with a highly optimized hybrid RAG (Retrieval-Augmented Generation) pipeline, it drastically reduces API costs while maintaining full context of your project.
License: Free for personal and hobby use (PolyForm Noncommercial). Commercial use requires a license — contact Peter Isberg.
Measured against 4 popular Python open-source libraries, running the same 10 queries per repo with each tool (live API runs, 2026-03-23).
Claude Code = the Anthropic CLI (claude --print), each query in a fresh isolated session using its built-in tool-calling to fetch files on demand.
claude_light = this project — hybrid RAG + prompt caching with automatic model routing.
| Repository | Repo size | Claude Code | claude_light | Savings |
|---|---|---|---|---|
psf/requests v2.31.0 |
108K tokens | $0.91 | $0.23 | 75% |
pallets/flask 3.0.0 |
144K tokens | $1.62 | $0.17 | 89% |
encode/httpx 0.25.2 |
191K tokens | $1.75 | $0.45 | 74% |
bottlepy/bottle 0.13.2 |
92K tokens | $1.52 | $0.30 | 80% |
| Total (≈40 queries) | $5.81 | $1.16 | 80% / 5× cheaper |
Claude Code is a powerful general-purpose agentic tool and the cost difference reflects that:
- Its built-in system prompt and tool definitions consume ~35–40K tokens per session (cached, but cache-reads still cost $0.30/M).
- It always uses Sonnet, even for simple lookups where Haiku would suffice.
- Multi-turn tool calls (2–14 turns per query) accumulate cached context with each round-trip.
- It rediscovers the codebase via tools on every session — no pre-indexing.
claude_light makes different trade-offs: pre-indexes the codebase offline, routes simple queries to Haiku, and injects only ~2–3K targeted tokens per query.
How to reproduce:
python benchmark_claude_code.py— seedocs/BENCHMARKS.mdfor full methodology.
This tool is aggressively designed to prevent you from paying full price for tokens. It tackles context bloat and cache expiration through several core strategies:
- Hybrid RAG + Prompt Caching: The tool uses a two-tier caching strategy. The project's directory tree source files and all
.mdfiles act as a cached "skeleton" system prompt. - Three-Tier Prompt Caching: The tool places three
ephemeralcache breakpoints in every request: (1) after the skeleton system prompt — stable for the whole session; (2) after the conversation history — stable for all but the newest turn; (3) after the retrieved RAG chunks in the current user message — stable across consecutive questions about the same code area. If you ask follow-up questions about the same module, the chunk block hits the cache and only your new question text is billed at full price ($3.00/M); everything above it costs $0.30/M. - The "Heartbeat" Auto-Warmer: In API-key mode, a background daemon checks the session every 30 seconds and sends a tiny ping if you are idle for more than 4 minutes, keeping your ephemeral cache alive. The skeleton cache uses a 1-hour extended TTL so this rarely needs to fire. (OAuth/Pro-subscription mode skips the heartbeat: each turn already spawns a fresh
claudeCLI subprocess, so there is no persistent process to warm — the Anthropic backend's content-hashed prompt cache still hits across subprocess invocations on its own.) - Method-Level Chunking: Instead of stuffing entire 1,500-line files into the context window, a brace-depth scanner splits source files into one chunk per method or constructor. Each chunk retains its package, imports, and class header to remain self-contained. This provides a massive 5–10× reduction in retrieved tokens.
- Sliding Window History: To prevent conversation history from growing unboundedly and costing you on every turn, the tool caps memory at the last 6 turns via the
MAX_HISTORY_TURNSsetting. - Strict Scoring Threshold: Two filters drop low-quality chunks before sending them to Claude.
MIN_SCORE(0.45) is an absolute cosine-similarity floor, andRELATIVE_SCORE_FLOOR(0.60) drops any chunk whose score is below 60% of the top-ranked chunk for that query. Together they prevent a single weak query from dragging in noise alongside one strong hit. - Compressed Skeleton Tree: The directory tree sent in the cached system prompt is compacted in two ways: single-child directory chains are collapsed (
main/java/com/example/on one line), and sibling files sharing an extension are brace-grouped ({OrderService,UserService,PaymentService}.java). This saves 30–50 % of skeleton tokens on typical Java/Go/Python projects. - Retrieved-Chunk Deduplication: When multiple methods from the same file rank highly, the shared preamble (package, imports, class header) is emitted only once, with all retrieved methods listed underneath. This saves 5–20 % of retrieved-context tokens on class-heavy queries.
For a deeper dive into the implementation, see docs/architecture.md.
claude_light supports two ways to authenticate. If you are a Claude Pro subscriber, it will automatically detect your CLI session. For the best experience on Windows, we recommend a one-time setup of a long-lived Automation Token (claude setup-token) which provides a permanent, headless-friendly connection.
Note:
sentence-transformerspulls in PyTorch, which is approximately 1.5 GB on the first install.
If you have pipx or uv installed, the fastest path is to install claude_light as a globally-available CLI in its own isolated environment:
# pipx
pipx install git+https://github.com/PIsberg/claude_light.git
# uv
uv tool install git+https://github.com/PIsberg/claude_light.gitThen run from any project root:
claude-light # interactive
claude-light "what does OrderService do?" # one-shotOptional extras:
pipx install "git+https://github.com/PIsberg/claude_light.git#egg=claude-light[llmlingua]"A PyPI release (so the git+... URL becomes plain pipx install claude-light) is planned — the existing methods below remain fully supported in the meantime.
- Install Dependencies:
pip install -r requirements.txt
- Authenticate (Pick One):
- Claude Pro: Run
claude auth loginORclaude setup-token(recommended for Windows persistence). - API Key: Set
export ANTHROPIC_API_KEY=sk-ant-...in your shell.
- Claude Pro: Run
- Run:
python claude_light.py
# Clone or download the repo, then:
bash install_macos.shThe macOS installer handles the details specific to macOS:
- Detects Apple Silicon (
/opt/homebrew) vs Intel (/usr/local) Homebrew paths - Installs Python via Homebrew if no 3.9+ interpreter is found
- Creates a
.venvvirtual environment to avoid the PEP 668 "externally managed" error on macOS 13+ - Generates a
run.shwrapper so you don't need to activate the venv manually - Reminds you to add the key to
~/.zshrc(macOS default shell)
# Set API key (zsh — macOS default since Catalina)
echo 'export ANTHROPIC_API_KEY=sk-ant-your-key-here' >> ~/.zshrc && source ~/.zshrc
# Run from your project root
cd /path/to/your/project
/path/to/claude_light/run.sh# Clone or download the repo, then:
bash install.sh
export ANTHROPIC_API_KEY=sk-ant-your-key-hereThe script detects your Python 3.9+ interpreter and installs all required and optional packages via pip.
# Allow scripts for this session, then run the installer:
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass;
.\install.ps1
[System.Environment]::SetEnvironmentVariable("ANTHROPIC_API_KEY","sk-ant-your-key-here","User")If you prefer to install packages yourself, use the requirements.txt file which contains verified, secure versions:
pip install -r requirements.txtAlternatively, you can install the core packages manually:
# Required
pip install "sentence-transformers>=5.3.0" "numpy>=2.4.3" "watchdog>=6.0.0" "anthropic>=0.86.0" "prompt_toolkit>=3.0.52"
# Optional — strongly recommended
pip install "tree-sitter>=0.25.2" "rich>=14.3.3" "einops>=0.8.2"Without tree-sitter, chunking falls back to whole-file mode. Without rich, output degrades gracefully to plain text formatting.
claude_light is designed to be flexible with how you pay for tokens. It resolves your identity in this order:
- Environment Variable:
ANTHROPIC_API_KEY(Standard API Key) - Local Dotfiles:
.anthropic(home dir) or.env(project dir) - Claude CLI Config (OAuth): Automatically reads your Claude Pro session from
~/.claude/.credentials.json.
If you have a Claude Pro or Team subscription and have authorized the official Claude CLI, claude_light will automatically use your flat-rate subscription.
- Cost: Included in your $20/mo subscription.
- Setup: Just run
claude auth loginonce.
If you prefer to pay per-token or don't have a Pro subscription:
- Cost: Usage-based (Pay-as-you-go via the Anthropic Console).
- Setup: Set the environment variable:
# Linux / macOS export ANTHROPIC_API_KEY=sk-ant-your-key-here # Windows PowerShell (persistent) [System.Environment]::SetEnvironmentVariable("ANTHROPIC_API_KEY","sk-ant-your-key-here","User")
Run the script from the root of your project — it will immediately build the skeleton, chunk your files, and auto-tune the embedding model.
python3 claude_light.py
The script runs three concurrent threads to keep your workflow seamless:
- Main Thread: Handles the input loop, retrieves relevant chunks per query, and manages the conversation history.
- Watchdog: Monitors your directory for file saves. If a source file changes, it automatically re-chunks and re-embeds it; if a
.mdfile changes, it rebuilds the skeleton cache. - Heartbeat Daemon (API-key mode only): Keeps the Anthropic prompt cache warm while you step away. Disabled in OAuth/Pro-subscription mode, where each turn spawns a fresh
claudeCLI subprocess and there is no persistent process to keep warm.
The script automatically selects the most efficient embedding model based on the size of your repository:
- < 50 files:
all-MiniLM-L6-v2(22 MB) for fast startup. - 50–199 files:
all-mpnet-base-v2(420 MB) for better semantic depth. - 200+ files:
nomic-ai/nomic-embed-text-v1.5for optimal recall on large codebases.
To ensure supply chain transparency and prevent dependency confusion, claude_light uses:
- Pinned Versions: All dependencies in
requirements.txtand the installer scripts are pinned to verified, secure minimum versions. - SBOM (Software Bill of Materials): The project maintains an industry-standard SBOM in CycloneDX format.
A GitHub Action automatically generates a fresh sbom.json on every push to main and uploads it as a build artifact.
You can generate the SBOM manually using the provided scripts:
# Linux / macOS
bash scripts/generate_sbom.sh
# Windows
.\scripts\generate_sbom.ps1Run the full unit test suite:
python -m pytest -qRun a quick CLI smoke test with the synthetic mocked codebase:
python -m claude_light --test-mode small "List all public classes or modules in this codebase"Larger synthetic presets are also available for local stress testing:
python -m claude_light --test-mode medium
python -m claude_light --test-mode large
python -m claude_light --test-mode extra-largeThe GitHub Actions workflow currently runs three checks:
- Unit tests
python -m pytest -q- Analytical token-cost regression
python tests/benchmark.py --json > tests/baseline_token_stats_new.json
python tests/check_regression.py tokens tests/baseline_token_stats.json tests/baseline_token_stats_new.json- Offline retrieval regression on the committed local fixture
python tests/benchmark_retrieval.py --fixture tests/fixtures/retrieval_cases.json --output tests/baseline_retrieval_stats_new.json
python tests/check_regression.py retrieval tests/baseline_retrieval_stats.json tests/baseline_retrieval_stats_new.jsonIf retrieval behavior changes intentionally, refresh the baseline with the same fixture command so CI remains deterministic:
python tests/benchmark_retrieval.py --fixture tests/fixtures/retrieval_cases.json --output tests/baseline_retrieval_stats.json