Claude Light: Auto-Warmer & Hybrid RAG Chat CLI

claude_light.py is an interactive CLI chat tool for querying and editing a codebase using Claude. By combining Anthropic's prompt caching with a highly optimized hybrid RAG (Retrieval-Augmented Generation) pipeline, it drastically reduces API costs while maintaining full context of your project.

License: Free for personal and hobby use (PolyForm Noncommercial). Commercial use requires a license — contact Peter Isberg.

📊 Real-World Benchmark Results

Measured against 4 popular Python open-source libraries, running the same 10 queries per repo with each tool (live API runs, 2026-03-23).

Claude Code = the Anthropic CLI (claude --print), each query in a fresh isolated session using its built-in tool-calling to fetch files on demand. claude_light = this project — hybrid RAG + prompt caching with automatic model routing.

Cost per 10 queries — Claude Code vs claude_light

Repository	Repo size	Claude Code	claude_light	Savings
`psf/requests` v2.31.0	108K tokens	$0.91	$0.23	75%
`pallets/flask` 3.0.0	144K tokens	$1.62	$0.17	89%
`encode/httpx` 0.25.2	191K tokens	$1.75	$0.45	74%
`bottlepy/bottle` 0.13.2	92K tokens	$1.52	$0.30	80%
Total (≈40 queries)		$5.81	$1.16	80% / 5× cheaper

Why Claude Code costs more

Claude Code is a powerful general-purpose agentic tool and the cost difference reflects that:

Its built-in system prompt and tool definitions consume ~35–40K tokens per session (cached, but cache-reads still cost $0.30/M).
It always uses Sonnet, even for simple lookups where Haiku would suffice.
Multi-turn tool calls (2–14 turns per query) accumulate cached context with each round-trip.
It rediscovers the codebase via tools on every session — no pre-indexing.

claude_light makes different trade-offs: pre-indexes the codebase offline, routes simple queries to Haiku, and injects only ~2–3K targeted tokens per query.

How to reproduce: python benchmark_claude_code.py — see docs/BENCHMARKS.md for full methodology.

📉 How It Optimizes Token Usage

This tool is aggressively designed to prevent you from paying full price for tokens. It tackles context bloat and cache expiration through several core strategies:

Hybrid RAG + Prompt Caching: The tool uses a two-tier caching strategy. The project's directory tree source files and all .md files act as a cached "skeleton" system prompt.
Three-Tier Prompt Caching: The tool places three ephemeral cache breakpoints in every request: (1) after the skeleton system prompt — stable for the whole session; (2) after the conversation history — stable for all but the newest turn; (3) after the retrieved RAG chunks in the current user message — stable across consecutive questions about the same code area. If you ask follow-up questions about the same module, the chunk block hits the cache and only your new question text is billed at full price ($3.00/M); everything above it costs $0.30/M.
The "Heartbeat" Auto-Warmer: In API-key mode, a background daemon checks the session every 30 seconds and sends a tiny ping if you are idle for more than 4 minutes, keeping your ephemeral cache alive. The skeleton cache uses a 1-hour extended TTL so this rarely needs to fire. (OAuth/Pro-subscription mode skips the heartbeat: each turn already spawns a fresh claude CLI subprocess, so there is no persistent process to warm — the Anthropic backend's content-hashed prompt cache still hits across subprocess invocations on its own.)
Method-Level Chunking: Instead of stuffing entire 1,500-line files into the context window, a brace-depth scanner splits source files into one chunk per method or constructor. Each chunk retains its package, imports, and class header to remain self-contained. This provides a massive 5–10× reduction in retrieved tokens.
Sliding Window History: To prevent conversation history from growing unboundedly and costing you on every turn, the tool caps memory at the last 6 turns via the MAX_HISTORY_TURNS setting.
Strict Scoring Threshold: Two filters drop low-quality chunks before sending them to Claude. MIN_SCORE (0.45) is an absolute cosine-similarity floor, and RELATIVE_SCORE_FLOOR (0.60) drops any chunk whose score is below 60% of the top-ranked chunk for that query. Together they prevent a single weak query from dragging in noise alongside one strong hit.
Compressed Skeleton Tree: The directory tree sent in the cached system prompt is compacted in two ways: single-child directory chains are collapsed (main/java/com/example/ on one line), and sibling files sharing an extension are brace-grouped ({OrderService,UserService,PaymentService}.java). This saves 30–50 % of skeleton tokens on typical Java/Go/Python projects.
Retrieved-Chunk Deduplication: When multiple methods from the same file rank highly, the shared preamble (package, imports, class header) is emitted only once, with all retrieved methods listed underneath. This saves 5–20 % of retrieved-context tokens on class-heavy queries.

For a deeper dive into the implementation, see docs/architecture.md.

🚀 Getting Started

claude_light supports two ways to authenticate. If you are a Claude Pro subscriber, it will automatically detect your CLI session. For the best experience on Windows, we recommend a one-time setup of a long-lived Automation Token (claude setup-token) which provides a permanent, headless-friendly connection.

Note: sentence-transformers pulls in PyTorch, which is approximately 1.5 GB on the first install.

Install as a tool (recommended)

If you have pipx or uv installed, the fastest path is to install claude_light as a globally-available CLI in its own isolated environment:

# pipx
pipx install git+https://github.com/PIsberg/claude_light.git

# uv
uv tool install git+https://github.com/PIsberg/claude_light.git

Then run from any project root:

claude-light                 # interactive
claude-light "what does OrderService do?"   # one-shot

Optional extras:

pipx install "git+https://github.com/PIsberg/claude_light.git#egg=claude-light[llmlingua]"

A PyPI release (so the git+... URL becomes plain pipx install claude-light) is planned — the existing methods below remain fully supported in the meantime.

Quick Start

Install Dependencies:
```
pip install -r requirements.txt
```
Authenticate (Pick One):
- Claude Pro: Run claude auth login OR claude setup-token (recommended for Windows persistence).
- API Key: Set export ANTHROPIC_API_KEY=sk-ant-... in your shell.
Run:
```
python claude_light.py
```

macOS (Automated)

# Clone or download the repo, then:
bash install_macos.sh

The macOS installer handles the details specific to macOS:

Detects Apple Silicon (/opt/homebrew) vs Intel (/usr/local) Homebrew paths
Installs Python via Homebrew if no 3.9+ interpreter is found
Creates a .venv virtual environment to avoid the PEP 668 "externally managed" error on macOS 13+
Generates a run.sh wrapper so you don't need to activate the venv manually
Reminds you to add the key to ~/.zshrc (macOS default shell)

# Set API key (zsh — macOS default since Catalina)
echo 'export ANTHROPIC_API_KEY=sk-ant-your-key-here' >> ~/.zshrc && source ~/.zshrc

# Run from your project root
cd /path/to/your/project
/path/to/claude_light/run.sh

Linux

# Clone or download the repo, then:
bash install.sh
export ANTHROPIC_API_KEY=sk-ant-your-key-here

The script detects your Python 3.9+ interpreter and installs all required and optional packages via pip.

Windows (PowerShell)

# Allow scripts for this session, then run the installer:
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass;
.\install.ps1
[System.Environment]::SetEnvironmentVariable("ANTHROPIC_API_KEY","sk-ant-your-key-here","User")

Manual installation (Verified Versions)

If you prefer to install packages yourself, use the requirements.txt file which contains verified, secure versions:

pip install -r requirements.txt

Alternatively, you can install the core packages manually:

# Required
pip install "sentence-transformers>=5.3.0" "numpy>=2.4.3" "watchdog>=6.0.0" "anthropic>=0.86.0" "prompt_toolkit>=3.0.52"

# Optional — strongly recommended
pip install "tree-sitter>=0.25.2" "rich>=14.3.3" "einops>=0.8.2"

Without tree-sitter, chunking falls back to whole-file mode. Without rich, output degrades gracefully to plain text formatting.

🔑 API key

🔑 Authentication

claude_light is designed to be flexible with how you pay for tokens. It resolves your identity in this order:

Environment Variable: ANTHROPIC_API_KEY (Standard API Key)
Local Dotfiles: .anthropic (home dir) or .env (project dir)
Claude CLI Config (OAuth): Automatically reads your Claude Pro session from ~/.claude/.credentials.json.

Option A: Claude Pro Subscription (OAuth)

If you have a Claude Pro or Team subscription and have authorized the official Claude CLI, claude_light will automatically use your flat-rate subscription.

Cost: Included in your $20/mo subscription.
Setup: Just run claude auth login once.

Option B: Anthropic API Key

If you prefer to pay per-token or don't have a Pro subscription:

Cost: Usage-based (Pay-as-you-go via the Anthropic Console).

Setup: Set the environment variable:

# Linux / macOS
export ANTHROPIC_API_KEY=sk-ant-your-key-here

# Windows PowerShell (persistent)
[System.Environment]::SetEnvironmentVariable("ANTHROPIC_API_KEY","sk-ant-your-key-here","User")

Run the script from the root of your project — it will immediately build the skeleton, chunk your files, and auto-tune the embedding model.

🚀 Usage

1. Interactive Loop (Default)

python3 claude_light.py

The script runs three concurrent threads to keep your workflow seamless:

Main Thread: Handles the input loop, retrieves relevant chunks per query, and manages the conversation history.
Watchdog: Monitors your directory for file saves. If a source file changes, it automatically re-chunks and re-embeds it; if a .md file changes, it rebuilds the skeleton cache.
Heartbeat Daemon (API-key mode only): Keeps the Anthropic prompt cache warm while you step away. Disabled in OAuth/Pro-subscription mode, where each turn spawns a fresh claude CLI subprocess and there is no persistent process to keep warm.

Auto-Tuned Embedding Models

The script automatically selects the most efficient embedding model based on the size of your repository:

< 50 files: all-MiniLM-L6-v2 (22 MB) for fast startup.
50–199 files: all-mpnet-base-v2 (420 MB) for better semantic depth.
200+ files: nomic-ai/nomic-embed-text-v1.5 for optimal recall on large codebases.

🛡️ Security & SBOM

To ensure supply chain transparency and prevent dependency confusion, claude_light uses:

Pinned Versions: All dependencies in requirements.txt and the installer scripts are pinned to verified, secure minimum versions.
SBOM (Software Bill of Materials): The project maintains an industry-standard SBOM in CycloneDX format.

Continuous Inventory

A GitHub Action automatically generates a fresh sbom.json on every push to main and uploads it as a build artifact.

Manual Generation

You can generate the SBOM manually using the provided scripts:

# Linux / macOS
bash scripts/generate_sbom.sh

# Windows
.\scripts\generate_sbom.ps1

Tests

Run the full unit test suite:

python -m pytest -q

Run a quick CLI smoke test with the synthetic mocked codebase:

python -m claude_light --test-mode small "List all public classes or modules in this codebase"

Larger synthetic presets are also available for local stress testing:

python -m claude_light --test-mode medium
python -m claude_light --test-mode large
python -m claude_light --test-mode extra-large

CI Regression Checks

The GitHub Actions workflow currently runs three checks:

Unit tests

python -m pytest -q

Analytical token-cost regression

python tests/benchmark.py --json > tests/baseline_token_stats_new.json
python tests/check_regression.py tokens tests/baseline_token_stats.json tests/baseline_token_stats_new.json

Offline retrieval regression on the committed local fixture

python tests/benchmark_retrieval.py --fixture tests/fixtures/retrieval_cases.json --output tests/baseline_retrieval_stats_new.json
python tests/check_regression.py retrieval tests/baseline_retrieval_stats.json tests/baseline_retrieval_stats_new.json

If retrieval behavior changes intentionally, refresh the baseline with the same fixture command so CI remains deterministic:

python tests/benchmark_retrieval.py --fixture tests/fixtures/retrieval_cases.json --output tests/baseline_retrieval_stats.json

Name		Name	Last commit message	Last commit date
Latest commit History 194 Commits
.github		.github
claude_light		claude_light
docs		docs
installers		installers
scripts		scripts
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CII_BADGE.md		CII_BADGE.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
RELEASE_SIGNING.md		RELEASE_SIGNING.md
SECURITY.md		SECURITY.md
claude_light.py		claude_light.py
commit_1.sh		commit_1.sh
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
requirements.txt		requirements.txt
sbom.json		sbom.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Claude Light: Auto-Warmer & Hybrid RAG Chat CLI

📊 Real-World Benchmark Results

Cost per 10 queries — Claude Code vs claude_light

Why Claude Code costs more

📉 How It Optimizes Token Usage

🚀 Getting Started

Install as a tool (recommended)

Quick Start

macOS (Automated)

Linux

Windows (PowerShell)

Manual installation (Verified Versions)

🔑 API key

🔑 Authentication

Option A: Claude Pro Subscription (OAuth)

Option B: Anthropic API Key

🚀 Usage

1. Interactive Loop (Default)

Auto-Tuned Embedding Models

🛡️ Security & SBOM

Continuous Inventory

Manual Generation

Tests

CI Regression Checks

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Claude Light: Auto-Warmer & Hybrid RAG Chat CLI

📊 Real-World Benchmark Results

Cost per 10 queries — Claude Code vs claude_light

Why Claude Code costs more

📉 How It Optimizes Token Usage

🚀 Getting Started

Install as a tool (recommended)

Quick Start

macOS (Automated)

Linux

Windows (PowerShell)

Manual installation (Verified Versions)

🔑 API key

🔑 Authentication

Option A: Claude Pro Subscription (OAuth)

Option B: Anthropic API Key

🚀 Usage

1. Interactive Loop (Default)

Auto-Tuned Embedding Models

🛡️ Security & SBOM

Continuous Inventory

Manual Generation

Tests

CI Regression Checks

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages