cachly Python SDK

Official Python SDK for cachly.dev
Managed Valkey/Redis cache built for AI apps. GDPR-compliant · German servers · Live in 30 seconds.

PyPI · Python 3.10+ · License: MIT · GDPR: EU-only


Installation

pip install cachly
# or
uv add cachly

Requires Python 3.10+. Uses redis-py and numpy (for semantic cache).


Quick Start

import os
from cachly import CachlyClient

cache = CachlyClient(url=os.environ["CACHLY_URL"])

# Set / Get
cache.set("user:42", {"name": "Alice"}, ttl=300)
user = cache.get("user:42")           # returns dict or None

# Get-or-Set pattern
report = cache.get_or_set("report:monthly", lambda: db.run_expensive_report(), ttl=3600)

# Atomic counter
views = cache.incr("page:views")

cache.close()

Create your free instance at cachly.dev — no credit card required.


Async Usage

import asyncio
import os

from cachly.asyncio import AsyncCachlyClient

async def main():
    cache = AsyncCachlyClient(url=os.environ["CACHLY_URL"])

    await cache.set("session:abc", session_data, ttl=1800)  # session_data: any JSON-serialisable value
    data = await cache.get("session:abc")

    await cache.close()

asyncio.run(main())

Semantic AI Cache

Cache LLM responses by meaning, not exact text. The same prompt phrased differently returns the cached answer, cutting OpenAI costs by up to 60%.

from cachly import SemanticOptions
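# assumes the cache client from Quick Start, plus an existing openai_client and a user_question string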

result = cache.semantic.get_or_set(
    prompt=user_question,
    fn=lambda: openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_question}]
    ),
    embed_fn=lambda text: openai_client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding,
    options=SemanticOptions(similarity_threshold=0.92, ttl_seconds=3600),
)

print("hit" if result.hit else "miss", result.value)

Batch API — Multiple Ops in One Round-Trip

Bundle GET/SET/DEL/EXISTS/TTL operations into one HTTP request (or Redis pipeline).

import os
import time

from cachly import CachlyClient, BatchOp

cache = CachlyClient(
    url=os.environ["CACHLY_URL"],
    batch_url=os.environ.get("CACHLY_BATCH_URL"),  # optional
)

result = cache.batch([
    BatchOp("get",    "user:1"),
    BatchOp("get",    "config:app"),
    BatchOp("set",    "visits", str(time.time()), ttl=86400),
    BatchOp("exists", "session:xyz"),
    BatchOp("ttl",    "token:abc"),
])
user    = result[0]   # str | None
config  = result[1]   # str | None
ok      = result[2]   # bool
present = result[3]   # bool
secs    = result[4]   # int (-1 = no TTL, -2 = key missing)

Without batch_url the method falls back automatically to a Redis pipeline (one TCP round-trip).


Django / FastAPI Integration

# FastAPI
import os

from fastapi import FastAPI
from cachly import CachlyClient

app = FastAPI()
cache = CachlyClient(url=os.environ["CACHLY_URL"])

@app.on_event("shutdown")
async def shutdown():
    cache.close()

@app.get("/data/{key}")
async def get_data(key: str):
    return cache.get_or_set(key, lambda: fetch_from_db(key), ttl=60)
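
The same pattern works in Django. Here is a minimal sketch of a views.py, assuming a module-level client and a hypothetical fetch_product_from_db helper:

# Django
import os

from django.http import JsonResponse
from cachly import CachlyClient

cache = CachlyClient(url=os.environ["CACHLY_URL"])

def product_detail(request, product_id: str):
    # get_or_set keeps cache-miss handling out of the view
    data = cache.get_or_set(
        f"product:{product_id}",
        lambda: fetch_product_from_db(product_id),  # hypothetical DB helper
        ttl=60,
    )
    return JsonResponse(data)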

AI Dev Brain — Persistent Memory for Your Coding Assistant

cachly ships a 30-tool MCP server that gives Claude Code, Cursor, GitHub Copilot, and Windsurf a persistent memory across sessions — so they never forget your architecture, lessons learned, or last session context.

# One-time setup
npx @cachly-dev/init

Or configure manually in your editor (~/.vscode/mcp.json / .cursor/mcp.json):

{
  "servers": {
    "cachly": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@cachly-dev/mcp-server"],
      "env": { "CACHLY_JWT": "your-jwt-token" }
    }
  }
}

Add to your AI assistant instructions (e.g. .github/copilot-instructions.md):

## cachly AI Brain

At the START of every session:
session_start(instance_id = "your-instance-id", focus = "what you're working on today")

At the END of every session:
session_end(instance_id = "your-instance-id", summary = "...", files_changed = [...])

After any bug fix or deploy:
learn_from_attempts(instance_id = "your-instance-id", topic = "category:keyword",
  outcome = "success", what_worked = "...", what_failed = "...", severity = "major")

session_start returns a full briefing in one call: last session summary, relevant lessons, open failures, brain health. 60% fewer file reads, instant context, zero re-discovery.

→ Full docs: cachly.dev/docs/ai-memory


LLM Response Caching Proxy

Use cachly as a drop-in caching proxy for OpenAI or Anthropic — no SDK changes needed:

# Instead of https://api.openai.com — use your cachly proxy URL:
OPENAI_BASE_URL=https://api.cachly.dev/v1/llm-proxy/YOUR_TOKEN/openai

# Anthropic:
ANTHROPIC_BASE_URL=https://api.cachly.dev/v1/llm-proxy/YOUR_TOKEN/anthropic

Identical requests are served from cache with X-Cachly-Cache: HIT. Check savings via GET /v1/llm-proxy/YOUR_TOKEN/stats.
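
For example, with the official openai Python SDK (v1+) only the base URL changes. The snippet below is a sketch, not part of the cachly SDK, and assumes your usual OPENAI_API_KEY:

import os

import httpx
from openai import OpenAI

# Point the SDK at the cachly proxy instead of api.openai.com
client = OpenAI(
    base_url="https://api.cachly.dev/v1/llm-proxy/YOUR_TOKEN/openai",
    api_key=os.environ["OPENAI_API_KEY"],
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is semantic caching?"}],
)

# Repeating the identical request is served from cache (response header X-Cachly-Cache: HIT)
stats = httpx.get("https://api.cachly.dev/v1/llm-proxy/YOUR_TOKEN/stats").json()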


Agent Workflow Persistence

Checkpoint agent workflow state so agents can resume from the last completed step after a crash:

import json

import httpx

token = "YOUR_TOKEN"  # your cachly API token, as in the proxy URLs above
base = f"https://api.cachly.dev/v1/workflow/{token}"

# Save a checkpoint after each workflow step
httpx.post(f"{base}/checkpoints", json={
    "run_id":     "my-run-123",
    "step_index": 0,
    "step_name":  "research",
    "agent_name": "researcher",
    "status":     "completed",
    "state":      json.dumps({"topic": "AI caching", "results": []}),
})

# Resume: get the latest checkpoint for a run
checkpoint = httpx.get(f"{base}/runs/my-run-123/latest").json()
# → {"step_index": 2, "step_name": "write", "state": "...", "status": "completed"}
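
A resume loop could then look like this sketch; the ordered steps list and run_step helper are hypothetical, and it assumes /latest returns 404 when no checkpoint exists yet:

steps = ["research", "outline", "write", "review"]

latest = httpx.get(f"{base}/runs/my-run-123/latest")
start = latest.json()["step_index"] + 1 if latest.status_code == 200 else 0

for i, name in enumerate(steps[start:], start=start):
    state = run_step(name)  # hypothetical: execute the agent step and return its state dict
    httpx.post(f"{base}/checkpoints", json={
        "run_id": "my-run-123", "step_index": i, "step_name": name,
        "agent_name": "writer", "status": "completed", "state": json.dumps(state),
    })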

Connection Pooling & Keep-Alive

import os

from cachly import CachlyClient, CachlyConfig, PoolConfig

cache = CachlyClient(config=CachlyConfig(
    url=os.environ["CACHLY_URL"],
    pool=PoolConfig(
        keep_alive_s=30,          # PING every 30s (prevents firewall idle-disconnect)
        max_retries=10,           # reconnect retries with exponential backoff
        base_retry_delay_s=0.1,   # first retry delay
        max_retry_delay_s=10,     # retry delay cap
        idle_timeout_s=300,       # auto-disconnect after 5 min idle (0 = disabled)
        on_error=lambda e: print(f"cachly error: {e}"),
        on_reconnect=lambda: print("cachly reconnected"),
    ),
))

Retry with Exponential Backoff

Every command is automatically retried on transient errors (ConnectionError, TimeoutError, BusyLoadingError, …) using AWS-style full-jitter backoff:

import os

from cachly import CachlyClient, RetryConfig

cache = CachlyClient(
    url=os.environ["CACHLY_URL"],
    retry=RetryConfig(
        max_retries=3,       # retry up to 3× (default)
        base_delay_s=0.05,   # first retry after ~50ms
        max_delay_s=2.0,     # cap at 2s
    ),
)

Disable retries with RetryConfig(max_retries=0).


OpenTelemetry Tracing

import os

from opentelemetry import trace

from cachly import CachlyClient

cache = CachlyClient(
    url=os.environ["CACHLY_URL"],
    otel_tracer=trace.get_tracer("my-app"),
)

# Every get/set/delete/incr produces OTEL spans:
#   span: "cache.get"  attributes: { cache.key: "user:42" }
#   span: "cache.set"  attributes: { cache.key: "user:42", cache.ttl: 300 }

API Reference

Method                                          Description
CachlyClient(url, batch_url=None, pool=None)    Create client from Redis URL
get(key)                                        Get value (None if missing); auto-deserialises JSON
set(key, value, ttl=None)                       Set value, optional TTL in seconds
delete(*keys)                                   Delete one or more keys
exists(key) → bool                              Check existence
expire(key, seconds)                            Update TTL
incr(key) → int                                 Atomic increment
get_or_set(key, fn, ttl=None)                   Get-or-set pattern
batch(ops) → BatchResult                        Bulk ops in one round-trip
semantic                                        SemanticCache for AI workloads
raw                                             Direct redis.Redis access
close()                                         Close connection pool and stop keep-alive
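
A quick tour of the methods not shown above, assuming the signatures in this table and the cache client from Quick Start:

cache.set("temp:report", {"status": "draft"})
cache.expire("temp:report", 120)             # set or refresh the TTL after the fact
if cache.exists("temp:report"):
    cache.delete("temp:report", "temp:old")  # variadic: delete several keys at once
cache.raw.ping()                             # escape hatch to the underlying redis.Redis client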

Environment Variables

CACHLY_URL=redis://:your-password@my-app.cachly.dev:30101
CACHLY_BATCH_URL=https://api.cachly.dev/v1/cache/YOUR_TOKEN   # optional

MIT © cachly.dev
