Manage your cachly.dev cache instances directly from GitHub Copilot, Claude, Cursor, Windsurf and any other MCP-compatible AI assistant.
Stop your AI from re-reading your entire codebase every time. One command enables context memory and configures all your editors automatically:
CACHLY_JWT=your-jwt npx @cachly-dev/mcp-server setupThe interactive wizard will:
- Authenticate with your cachly account (or prompt for JWT)
- Let you pick which cache instance to use as your AI Brain
- Auto-detect Cursor, Windsurf, VS Code, Claude Code, and Continue.dev
- Write the correct MCP config for every detected editor
- Create/update
CLAUDE.md(idempotent β safe to re-run)
Result: 60% fewer file reads, instant context across sessions, zero re-discovery.
CACHLY_JWT=your-jwt npx @cachly-dev/mcp-server init \
--instance-id your-instance-id \
--editor vscodeOnce connected, just talk to your AI assistant:
"Create a free cachly instance called my-app-cache"
"List all my cache instances"
"Get the connection string for instance abc-123"
"Delete my test-cache instance"
| Tool | Description |
|---|---|
session_start |
Single call returning full briefing: last session, relevant lessons, open failures, brain health. Call at the start of every session. |
session_end |
Save session summary, files changed, duration. Call at the end of every session. |
learn_from_attempts |
Store structured lessons after any bug fix or deploy. Supports severity, file_paths, commands, tags. Deduplicates by topic. |
recall_best_solution |
Retrieve the best known solution for a topic (increments recall count). |
remember_context |
Cache any analysis or architecture finding for future sessions. |
recall_context |
Retrieve cached context by exact key (supports glob: "file:*"). |
smart_recall |
Semantic search across all cached context by meaning/keywords. |
list_remembered |
List all cached context entries. |
forget_context |
Delete stale context. |
| Tool | Description |
|---|---|
list_instances |
List all your cache instances |
create_instance |
Create a new instance (free or paid tier) |
get_instance |
Get details for a specific instance |
get_connection_string |
Get the redis:// connection URL |
delete_instance |
Permanently delete an instance |
| Tool | Description |
|---|---|
cache_get / cache_set / cache_delete |
Live cache operations |
cache_exists / cache_ttl / cache_keys |
Key inspection |
cache_stats |
Memory, hit rate, ops/sec |
cache_mget / cache_mset |
Bulk pipeline operations |
cache_lock_acquire / cache_lock_release |
Distributed locks (Redlock-lite) |
cache_stream_set / cache_stream_get |
LLM token streaming cache |
| Tool | Description |
|---|---|
semantic_search |
Vector similarity search (Speed/Business tier) |
detect_namespace |
Auto-classify prompt into semantic namespace |
cache_warmup |
Pre-warm semantic cache with known Q&A pairs |
index_project |
Index local source files for AI semantic search |
get_api_status |
Check API health + JWT auth info |
CACHLY_JWT=your-jwt npx @cachly-dev/mcp-server setupNo install, no build step. The wizard auto-detects your editors and writes all config files.
Get your JWT token at cachly.dev/settings β API Tokens.
{
"mcpServers": {
"cachly": {
"command": "npx",
"args": ["-y", "@cachly-dev/mcp-server"],
"env": { "CACHLY_JWT": "your-jwt-token-here" }
}
}
}- Claude Code: add to
.claude/mcp.jsonin your project - Claude Desktop (macOS):
~/Library/Application Support/Claude/claude_desktop_config.json
Add to .vscode/mcp.json:
{
"servers": {
"cachly": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@cachly-dev/mcp-server"],
"env": { "CACHLY_JWT": "your-jwt-token-here" }
}
}
}Then: Ctrl/Cmd+Shift+P β "MCP: List Servers" β start cachly.
Add to .cursor/mcp.json:
{
"mcpServers": {
"cachly": {
"command": "npx",
"args": ["-y", "@cachly-dev/mcp-server"],
"env": { "CACHLY_JWT": "your-jwt-token-here" }
}
}
}Same stdio/mcpServers format β add to their respective MCP config file.
| Variable | Required | Default | Description |
|---|---|---|---|
CACHLY_JWT |
β | β | Your Keycloak JWT from cachly.dev/settings |
CACHLY_API_URL |
β | https://api.cachly.dev |
Override for local dev |
User: Create a free cache instance for my OpenAI project
Copilot: I'll create a free cachly instance for you.
[calls create_instance(name="openai-cache", tier="free")]
β
Instance **openai-cache** (FREE) created and provisioning started!
ID: `a1b2c3d4-...`
Status: provisioning
Use `get_connection_string` to get your Redis URL in ~30 seconds.
User: Get the connection string
Copilot: [calls get_connection_string(instance_id="a1b2c3d4-...")]
Connection string for openai-cache:
redis://:password@my-node.cachly.dev:30101
Environment variable:
REDIS_URL="redis://:password@my-node.cachly.dev:30101"
# Run against local API
CACHLY_JWT=your-token CACHLY_API_URL=http://localhost:3001 npm run devThe Problem: Every time you ask Copilot about your codebase, it runs "Ich verschaffe mir einen Γberblick ΓΌber die Codebasis" and re-reads hundreds of files.
The Solution: Cache your AI's "thinking" results:
You: "Analyze the authentication architecture"
AI: [reads 47 files, takes 30 seconds]
The auth uses Keycloak with JWT tokens. The flow is:
1. User hits /sign-in β NextAuth redirect
2. Keycloak validates credentials
3. JWT returned, stored in session
[calls remember_context("auth_architecture", "The auth uses Keycloak...")]
π§ Context saved for future sessions.
--- Next day, new session ---
You: "How does auth work?"
AI: [calls smart_recall("auth")]
π§ Found cached context:
The auth uses Keycloak with JWT tokens...
(No file reading needed β instant answer!)
The setup wizard (see top of this README) writes the session instructions automatically into your CLAUDE.md / .github/copilot-instructions.md β no manual editing needed. Just run npx @cachly-dev/mcp-server setup once and your AI assistant calls session_start at the start of every session automatically.
You're building a microservice that uses multiple cachly instances. Ask your AI assistant directly:
You: "List all my cachly instances and their status"
AI: Using cachly_list_instances...
You have 3 instances:
1. prod-api (Pro, running) β redis://:***@prod.cachly.dev:30101
2. staging (Dev, running) β redis://:***@staging.cachly.dev:30102
3. ml-pipeline (Speed, running) β redis://:***@ml.cachly.dev:30103
Total MRR: β¬72/month
Your semantic cache hit rate dropped overnight. Debug it without leaving your editor:
You: "Check the analytics for my prod-api instance"
AI: Using cachly_semantic_stats...
β οΈ Hit rate dropped 23% in the last 24h (82% β 63%)
Anomaly detected: near-miss spike (+140%)
β 47 queries are hitting similarity 0.80-0.84 (just below your 0.85 threshold)
Recommendation: Lower threshold to 0.82 to capture these near-misses.
You: "Do it"
AI: Using cachly_set_threshold... β
Threshold set to 0.82 for namespace cachly:sem
After deploying a new version, warm the semantic cache with common queries:
You: "Warm up the staging cache with our top 50 support questions"
AI: Using cachly_batch_index with your FAQ embeddings...
β
Indexed 50 entries in 340ms (batch pipeline)
Namespace: cachly:sem:qa
Your cache is ready β first users will get instant responses.
Track your LLM cost savings directly in your IDE:
You: "How much has cachly saved me this month?"
AI: Using cachly_semantic_stats...
π This month's savings:
- Total cache hits: 12,847
- Estimated savings: $384.21 (vs. direct LLM calls)
- Efficiency score: 84/100 (Grade: A)
- Best namespace: cachly:sem:qa (94% hit rate)
Add cache health checks to your deployment workflow:
You: "Check if any cache anomalies would block a deploy"
AI: Using cachly_analytics_anomalies...
β
No critical anomalies detected.
1 info-level notice: stale cache in namespace "translations"
(12 near-misses/24h, 0 new entries)
Recommendation: Run warmup after deploy for translations namespace.
Deploy is safe to proceed.
MIT Β© cachly.dev