Official Python SDK for cachly.dev, a managed Valkey/Redis cache built for AI apps.
GDPR-compliant · German servers · live in 30 seconds.
```bash
pip install cachly
# or
uv add cachly
```

Requires Python 3.10+. Uses `redis-py` and `numpy` (for the semantic cache).
```python
import os

from cachly import CachlyClient

cache = CachlyClient(url=os.environ["CACHLY_URL"])

# Set / Get
cache.set("user:42", {"name": "Alice"}, ttl=300)
user = cache.get("user:42")  # returns dict or None

# Get-or-set pattern
report = cache.get_or_set("report:monthly", lambda: db.run_expensive_report(), ttl=3600)

# Atomic counter
views = cache.incr("page:views")

cache.close()
```

Create your free instance at cachly.dev (no credit card required).
```python
import asyncio
import os

from cachly.asyncio import AsyncCachlyClient

async def main():
    cache = AsyncCachlyClient(url=os.environ["CACHLY_URL"])
    await cache.set("session:abc", session_data, ttl=1800)
    data = await cache.get("session:abc")
    await cache.close()

asyncio.run(main())
```

Cache LLM responses by meaning, not exact text. The same prompt phrased differently returns the cached answer, cutting OpenAI costs by up to 60%.
```python
from cachly import SemanticOptions

result = cache.semantic.get_or_set(
    prompt=user_question,
    fn=lambda: openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_question}],
    ),
    embed_fn=lambda text: openai_client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding,
    options=SemanticOptions(similarity_threshold=0.92, ttl_seconds=3600),
)
print("hit" if result.hit else "miss", result.value)
Bundle GET/SET/DEL/EXISTS/TTL operations into one HTTP request (or Redis pipeline).

```python
import os
import time

from cachly import CachlyClient, BatchOp

cache = CachlyClient(
    url=os.environ["CACHLY_URL"],
    batch_url=os.environ.get("CACHLY_BATCH_URL"),  # optional
)

result = cache.batch([
    BatchOp("get", "user:1"),
    BatchOp("get", "config:app"),
    BatchOp("set", "visits", str(time.time()), ttl=86400),
    BatchOp("exists", "session:xyz"),
    BatchOp("ttl", "token:abc"),
])

user = result[0]     # str | None
config = result[1]   # str | None
ok = result[2]       # bool
present = result[3]  # bool
secs = result[4]     # int (-1 = no TTL, -2 = key missing)
```

Without `batch_url` the method falls back automatically to a Redis pipeline (one TCP round-trip).
```python
# FastAPI
import os

from fastapi import FastAPI

from cachly import CachlyClient

app = FastAPI()
cache = CachlyClient(url=os.environ["CACHLY_URL"])

@app.on_event("shutdown")
async def shutdown():
    cache.close()

@app.get("/data/{key}")
async def get_data(key: str):
    return cache.get_or_set(key, lambda: fetch_from_db(key), ttl=60)
```

cachly ships a 30-tool MCP server that gives Claude Code, Cursor, GitHub Copilot, and Windsurf persistent memory across sessions, so they never forget your architecture, lessons learned, or last-session context.
```bash
# One-time setup
npx @cachly-dev/init
```

Or configure manually in your editor (`~/.vscode/mcp.json` / `.cursor/mcp.json`):
```json
{
  "servers": {
    "cachly": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@cachly-dev/mcp-server"],
      "env": { "CACHLY_JWT": "your-jwt-token" }
    }
  }
}
```

Add to your AI assistant instructions (e.g. `.github/copilot-instructions.md`):
```markdown
## cachly AI Brain

At the START of every session:
session_start(instance_id = "your-instance-id", focus = "what you're working on today")

At the END of every session:
session_end(instance_id = "your-instance-id", summary = "...", files_changed = [...])

After any bug fix or deploy:
learn_from_attempts(instance_id = "your-instance-id", topic = "category:keyword",
                    outcome = "success", what_worked = "...", what_failed = "...", severity = "major")
```

`session_start` returns a full briefing in one call: last session summary, relevant lessons, open failures, brain health. 60% fewer file reads, instant context, zero re-discovery.
→ Full docs: cachly.dev/docs/ai-memory
Use cachly as a drop-in caching proxy for OpenAI or Anthropic — no SDK changes needed:
```bash
# Instead of https://api.openai.com, use your cachly proxy URL:
OPENAI_BASE_URL=https://api.cachly.dev/v1/llm-proxy/YOUR_TOKEN/openai

# Anthropic:
ANTHROPIC_BASE_URL=https://api.cachly.dev/v1/llm-proxy/YOUR_TOKEN/anthropic
```

Identical requests are served from cache with `X-Cachly-Cache: HIT`. Check savings via `GET /v1/llm-proxy/YOUR_TOKEN/stats`.
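A quick way to read those stats from Python, as a sketch: `CACHLY_PROXY_TOKEN` is a hypothetical env var holding the same token used in the URLs above, and the response shape is an assumption.

```python
import os

import httpx

token = os.environ["CACHLY_PROXY_TOKEN"]  # hypothetical env var for your proxy token

stats = httpx.get(f"https://api.cachly.dev/v1/llm-proxy/{token}/stats").json()
print(stats)  # e.g. hit/miss counts and estimated savings
```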
Checkpoint agent workflow state so agents can resume from the last completed step after a crash:
```python
import json

import httpx

base = f"https://api.cachly.dev/v1/workflow/{token}"

# Save a checkpoint after each workflow step
httpx.post(f"{base}/checkpoints", json={
    "run_id": "my-run-123",
    "step_index": 0,
    "step_name": "research",
    "agent_name": "researcher",
    "status": "completed",
    "state": json.dumps({"topic": "AI caching", "results": []}),
})

# Resume: get the latest checkpoint for a run
checkpoint = httpx.get(f"{base}/runs/my-run-123/latest").json()
# → {"step_index": 2, "step_name": "write", "state": "...", "status": "completed"}
```
Connection-pool behaviour is configured via `PoolConfig`:

```python
import os

from cachly import CachlyClient, CachlyConfig, PoolConfig

cache = CachlyClient(config=CachlyConfig(
    url=os.environ["CACHLY_URL"],
    pool=PoolConfig(
        keep_alive_s=30,          # PING every 30s (prevents firewall idle-disconnect)
        max_retries=10,           # reconnect retries with exponential backoff
        base_retry_delay_s=0.1,   # first retry delay
        max_retry_delay_s=10,     # retry delay cap
        idle_timeout_s=300,       # auto-disconnect after 5 min idle (0 = disabled)
        on_error=lambda e: print(f"cachly error: {e}"),
        on_reconnect=lambda: print("cachly reconnected"),
    ),
))
```

Every command is automatically retried on transient errors (ConnectionError, TimeoutError, BusyLoadingError, …) using AWS-style full-jitter backoff:
```python
import os

from cachly import CachlyClient, RetryConfig

cache = CachlyClient(
    url=os.environ["CACHLY_URL"],
    retry=RetryConfig(
        max_retries=3,      # retry up to 3× (default)
        base_delay_s=0.05,  # first retry after ~50ms
        max_delay_s=2.0,    # cap at 2s
    ),
)
```

Disable retries with `RetryConfig(max_retries=0)`.
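For reference, full jitter draws each delay uniformly at random between zero and the current exponential cap. A sketch of the strategy (not the SDK's internals), using the defaults above:

```python
import random

def full_jitter_delay(attempt: int, base: float = 0.05, cap: float = 2.0) -> float:
    """Delay before retry `attempt` (0-based): uniform over [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```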
```python
import os

from opentelemetry import trace

from cachly import CachlyClient

cache = CachlyClient(
    url=os.environ["CACHLY_URL"],
    otel_tracer=trace.get_tracer("my-app"),
)

# Every get/set/delete/incr produces OTEL spans:
# span: "cache.get"  attributes: { cache.key: "user:42" }
# span: "cache.set"  attributes: { cache.key: "user:42", cache.ttl: 300 }
```

| Method | Description |
|---|---|
| `CachlyClient(url, batch_url=None, pool=None)` | Create client from Redis URL |
| `get(key)` | Get value (`None` if missing); auto-deserialises JSON |
| `set(key, value, ttl=None)` | Set value, optional TTL in seconds |
| `delete(*keys)` | Delete one or more keys |
| `exists(key) → bool` | Check existence |
| `expire(key, seconds)` | Update TTL |
| `incr(key) → int` | Atomic increment |
| `get_or_set(key, fn, ttl=None)` | Get-or-set pattern |
| `batch(ops) → BatchResult` | Bulk ops in one round-trip |
| `semantic` | `SemanticCache` for AI workloads |
| `raw` | Direct `redis.Redis` access |
| `close()` | Close connection pool and stop keep-alive |
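A few of the remaining methods from the table, shown together (a sketch using the signatures listed above and the `cache` client from the quickstart):

```python
cache.set("token:abc", "opaque-value", ttl=60)
cache.expire("token:abc", 600)        # extend the TTL to 10 minutes
assert cache.exists("token:abc")
cache.delete("token:abc", "user:42")  # delete several keys at once
```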
```bash
CACHLY_URL=redis://:your-password@my-app.cachly.dev:30101
CACHLY_BATCH_URL=https://api.cachly.dev/v1/cache/YOUR_TOKEN  # optional
```

MIT © cachly.dev