Official Python SDK for cachly.dev, a managed Valkey/Redis cache built for AI apps.
GDPR-compliant · German servers · live in 30 seconds.
```bash
pip install cachly
# or
uv add cachly
```

Requires Python 3.10+. Uses `redis-py` and `numpy` (for the semantic cache).
```python
import os

from cachly import CachlyClient

cache = CachlyClient(url=os.environ["CACHLY_URL"])

# Set / Get
cache.set("user:42", {"name": "Alice"}, ttl=300)
user = cache.get("user:42")  # returns dict or None

# Get-or-set pattern
report = cache.get_or_set("report:monthly", lambda: db.run_expensive_report(), ttl=3600)

# Atomic counter
views = cache.incr("page:views")

cache.close()
```

Create your free instance at cachly.dev (no credit card required).
```python
import asyncio
import os

from cachly.asyncio import AsyncCachlyClient

async def main():
    cache = AsyncCachlyClient(url=os.environ["CACHLY_URL"])
    await cache.set("session:abc", session_data, ttl=1800)
    data = await cache.get("session:abc")
    await cache.close()

asyncio.run(main())
```

Cache LLM responses by meaning, not exact text. The same prompt phrased differently returns the cached answer, cutting OpenAI costs by up to 60%.
```python
from cachly import SemanticOptions

result = cache.semantic.get_or_set(
    prompt=user_question,
    fn=lambda: openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_question}],
    ),
    embed_fn=lambda text: openai_client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding,
    options=SemanticOptions(similarity_threshold=0.92, ttl_seconds=3600),
)
print("hit" if result.hit else "miss", result.value)
Bundle GET/SET/DEL/EXISTS/TTL operations into one HTTP request (or Redis pipeline).

```python
import os
import time

from cachly import CachlyClient, BatchOp

cache = CachlyClient(
    url=os.environ["CACHLY_URL"],
    batch_url=os.environ.get("CACHLY_BATCH_URL"),  # optional
)

result = cache.batch([
    BatchOp("get", "user:1"),
    BatchOp("get", "config:app"),
    BatchOp("set", "visits", str(time.time()), ttl=86400),
    BatchOp("exists", "session:xyz"),
    BatchOp("ttl", "token:abc"),
])

user = result[0]     # str | None
config = result[1]   # str | None
ok = result[2]       # bool
present = result[3]  # bool
secs = result[4]     # int (-1 = no TTL, -2 = key missing)
```

Without `batch_url` the method falls back automatically to a Redis pipeline (one TCP round-trip).
```python
# FastAPI
import os

from fastapi import FastAPI

from cachly import CachlyClient

app = FastAPI()
cache = CachlyClient(url=os.environ["CACHLY_URL"])

@app.on_event("shutdown")
async def shutdown():
    cache.close()

@app.get("/data/{key}")
async def get_data(key: str):
    return cache.get_or_set(key, lambda: fetch_from_db(key), ttl=60)
```

cachly ships a 30-tool MCP server that gives Claude Code, Cursor, GitHub Copilot, and Windsurf persistent memory across sessions, so they never forget your architecture, lessons learned, or last-session context.
```bash
# One-time setup
npx @cachly-dev/init
```

Or configure manually in your editor (`~/.vscode/mcp.json` / `.cursor/mcp.json`):
```json
{
  "servers": {
    "cachly": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@cachly-dev/mcp-server"],
      "env": { "CACHLY_JWT": "your-jwt-token" }
    }
  }
}
```

Add to your AI assistant instructions (e.g. `.github/copilot-instructions.md`):
```markdown
## cachly AI Brain

At the START of every session:
session_start(instance_id = "your-instance-id", focus = "what you're working on today")

At the END of every session:
session_end(instance_id = "your-instance-id", summary = "...", files_changed = [...])

After any bug fix or deploy:
learn_from_attempts(instance_id = "your-instance-id", topic = "category:keyword",
                    outcome = "success", what_worked = "...", what_failed = "...", severity = "major")
```

`session_start` returns a full briefing in one call: last session summary, relevant lessons, open failures, brain health. 60% fewer file reads, instant context, zero re-discovery.
→ Full docs: cachly.dev/docs/ai-memory
Use cachly as a drop-in caching proxy for OpenAI or Anthropic — no SDK changes needed:
```bash
# Instead of https://api.openai.com, use your cachly proxy URL:
OPENAI_BASE_URL=https://api.cachly.dev/v1/llm-proxy/YOUR_TOKEN/openai

# Anthropic:
ANTHROPIC_BASE_URL=https://api.cachly.dev/v1/llm-proxy/YOUR_TOKEN/anthropic
```

Identical requests are served from cache with `X-Cachly-Cache: HIT`. Check savings via `GET /v1/llm-proxy/YOUR_TOKEN/stats`.
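A quick way to read those stats from Python, as a sketch: `CACHLY_PROXY_TOKEN` is a hypothetical env var holding the same token used in the URLs above, and the response shape is an assumption.

```python
import os

import httpx

token = os.environ["CACHLY_PROXY_TOKEN"]  # hypothetical env var for your proxy token

stats = httpx.get(f"https://api.cachly.dev/v1/llm-proxy/{token}/stats").json()
print(stats)  # e.g. hit/miss counts and estimated savings
```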
Checkpoint agent workflow state so agents can resume from the last completed step after a crash:
```python
import json

import httpx

base = f"https://api.cachly.dev/v1/workflow/{token}"

# Save a checkpoint after each workflow step
httpx.post(f"{base}/checkpoints", json={
    "run_id": "my-run-123",
    "step_index": 0,
    "step_name": "research",
    "agent_name": "researcher",
    "status": "completed",
    "state": json.dumps({"topic": "AI caching", "results": []}),
})

# Resume: get the latest checkpoint for a run
checkpoint = httpx.get(f"{base}/runs/my-run-123/latest").json()
# → {"step_index": 2, "step_name": "write", "state": "...", "status": "completed"}
```
Connection-pool behaviour is configured via `PoolConfig`:

```python
import os

from cachly import CachlyClient, CachlyConfig, PoolConfig

cache = CachlyClient(config=CachlyConfig(
    url=os.environ["CACHLY_URL"],
    pool=PoolConfig(
        keep_alive_s=30,          # PING every 30s (prevents firewall idle-disconnect)
        max_retries=10,           # reconnect retries with exponential backoff
        base_retry_delay_s=0.1,   # first retry delay
        max_retry_delay_s=10,     # retry delay cap
        idle_timeout_s=300,       # auto-disconnect after 5 min idle (0 = disabled)
        on_error=lambda e: print(f"cachly error: {e}"),
        on_reconnect=lambda: print("cachly reconnected"),
    ),
))
```

Every command is automatically retried on transient errors (ConnectionError, TimeoutError, BusyLoadingError, …) using AWS-style full-jitter backoff:
```python
import os

from cachly import CachlyClient, RetryConfig

cache = CachlyClient(
    url=os.environ["CACHLY_URL"],
    retry=RetryConfig(
        max_retries=3,      # retry up to 3× (default)
        base_delay_s=0.05,  # first retry after ~50ms
        max_delay_s=2.0,    # cap at 2s
    ),
)
```

Disable retries with `RetryConfig(max_retries=0)`.
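For reference, full jitter draws each delay uniformly at random between zero and the current exponential cap. A sketch of the strategy (not the SDK's internals), using the defaults above:

```python
import random

def full_jitter_delay(attempt: int, base: float = 0.05, cap: float = 2.0) -> float:
    """Delay before retry `attempt` (0-based): uniform over [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```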
```python
import os

from opentelemetry import trace

from cachly import CachlyClient

cache = CachlyClient(
    url=os.environ["CACHLY_URL"],
    otel_tracer=trace.get_tracer("my-app"),
)

# Every get/set/delete/incr produces OTEL spans:
# span: "cache.get"  attributes: { cache.key: "user:42" }
# span: "cache.set"  attributes: { cache.key: "user:42", cache.ttl: 300 }
```

| Method | Description |
|---|---|
| `CachlyClient(url, batch_url=None, pool=None)` | Create client from Redis URL |
| `get(key)` | Get value (`None` if missing); auto-deserialises JSON |
| `set(key, value, ttl=None)` | Set value, optional TTL in seconds |
| `delete(*keys)` | Delete one or more keys |
| `exists(key) → bool` | Check existence |
| `expire(key, seconds)` | Update TTL |
| `incr(key) → int` | Atomic increment |
| `get_or_set(key, fn, ttl=None)` | Get-or-set pattern |
| `batch(ops) → BatchResult` | Bulk ops in one round-trip |
| `semantic` | `SemanticCache` for AI workloads |
| `raw` | Direct `redis.Redis` access |
| `close()` | Close connection pool and stop keep-alive |
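A few of the remaining methods from the table, shown together (a sketch using the signatures listed above and the `cache` client from the quickstart):

```python
cache.set("token:abc", "opaque-value", ttl=60)
cache.expire("token:abc", 600)        # extend the TTL to 10 minutes
assert cache.exists("token:abc")
cache.delete("token:abc", "user:42")  # delete several keys at once
```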
```bash
CACHLY_URL=redis://:your-password@my-app.cachly.dev:30101
CACHLY_BATCH_URL=https://api.cachly.dev/v1/cache/YOUR_TOKEN  # optional
```

MIT © cachly.dev