Stop wasting tokens. Inject only what matters.
ContextCut-PRO is the commercial edition of ContextCut — a transparent semantic RAG proxy for Ollama, OpenClaw, and any OpenAI-compatible local LLM endpoint. Drop it in front of your LLM — zero application changes required.
| Feature | Free | PRO |
|---|---|---|
| Semantic RAG injection | ✅ | ✅ |
| MIN_SCORE threshold filtering | ✅ | ✅ |
| Ingest + watch mode | ✅ | ✅ |
| Basic dashboard | ✅ | ✅ |
| Split-panel live dashboard | — | ✅ |
| Integrated streaming chat | — | ✅ |
| Per-message token analytics | — | ✅ |
| Ollama model selector | — | ✅ |
| Commercial usage rights | — | ✅ |
| Priority support | — | ✅ |
| Advanced context-cutting rules | — | ✅ |
Most RAG implementations stuff your entire knowledge base into every prompt. ContextCut uses vector similarity to inject only the chunks that are actually relevant to each query — and skips injection entirely when nothing scores above your threshold.
| Query | Without ContextCut | With ContextCut |
|---|---|---|
| "What are the guardrails?" | 3,000+ tokens (all docs) | 806 tokens (1 relevant chunk) |
| "Explain quantum physics" | 3,000+ tokens (junk) | ~5 tokens (nothing relevant) |
Result: 50–90% token reduction on real workloads.
- Python 3.10+
- Voyage AI API key (free tier works)
- Qdrant running locally or on your LAN
- Ollama or any OpenAI-compatible LLM endpoint
- Valid PRO license key (delivered via email at purchase)
Your purchase confirmation email contains a personalized install link. Run the single command from that email:
curl -fsSL "https://api.contextcut-pro.com/install/CC-PRO-your-key-here" | bashThis downloads the installer with your license key pre-loaded. Follow the prompts for Voyage AI key, Ollama host, Qdrant host, and ports.
curl -fsSL https://raw.githubusercontent.com/StevoKeano/ContextCut-PRO/main/install.sh -o /tmp/cc-install.sh
chmod +x /tmp/cc-install.sh
bash /tmp/cc-install.shWhen prompted, paste your license key from the purchase email.
On macOS, services are registered as launchd agents and start automatically on login.
On Linux, a start.sh script is generated.
cd ~
bash ~/contextcut/stop.sh
rm -rf ~/contextcut
rm -f ~/.contextcut_sessions.jsonAll settings via environment variables:
| Variable | Default | Description |
|---|---|---|
VOYAGE_API_KEY |
(required) | Voyage AI API key |
CONTEXTCUT_UPSTREAM |
http://localhost:11434 |
Ollama or OpenAI-compatible endpoint |
CONTEXTCUT_QDRANT_HOST |
localhost |
Qdrant host |
CONTEXTCUT_QDRANT_PORT |
6333 |
Qdrant port |
CONTEXTCUT_COLLECTION |
contextcut |
Qdrant collection name |
CONTEXTCUT_KB_DIR |
~/contextcut/knowledge |
Knowledge base directory (ingest only) |
CONTEXTCUT_PROXY_PORT |
18788 |
Proxy listen port |
CONTEXTCUT_DASHBOARD_PORT |
18787 |
Dashboard port |
CONTEXTCUT_CTX_LIMIT |
8192 |
Model context window (for % display) |
CONTEXTCUT_TOP_K |
5 |
Max chunks to retrieve |
CONTEXTCUT_MIN_SCORE |
0.30 |
Minimum relevance threshold (0.0–1.0) |
CONTEXTCUT_MODEL |
(empty) | Default model pre-filled in dashboard |
python ingest.py # one-shot ingest all .md files
python ingest.py --watch # ingest then watch for file changes
python ingest.py --query "guardrails" # test semantic search
python ingest.py --clear # wipe collection and start freshNote: Voyage AI free tier has rate limits. ContextCut handles this automatically.
Open http://localhost:18787 to access the PRO split-panel dashboard:
- Left panel — live stats cards, context usage bar, and per-request token table with relevance scores. Updates every few seconds without page reload.
- Right panel — integrated chat with model selector. Responses stream in as they are generated.
python ingest.py --query "your typical query"- Above
0.35— highly relevant, inject 0.20–0.35— tangentially related, use with caution- Below
0.20— noise, skip
Start at 0.30 and adjust for your domain.
Pro License – $29.88 one-time per seat
ContextCut PRO is proprietary software. Purchase grants a single-seat commercial license. See LICENSE.md for full terms.
- Lifetime commercial usage rights
- Priority support
- Advanced context-cutting rules & presets
- Pro dashboard features
Note: This is the local AI context optimizer for Ollama + Qdrant. There is another unrelated public repo with a similar name — this is the PRO edition.
