Python client for Cachecore — the LLM API caching proxy that reduces cost and latency for AI agent workloads.
Cachecore sits transparently between your application and LLM providers (OpenAI, Anthropic via OpenAI-compat, etc.) and caches responses at two levels: L1 exact-match and L2 semantic similarity. This client handles the Cachecore-specific plumbing — header injection, dependency encoding, invalidation — without replacing your LLM SDK.
```bash
pip install cachecore-python
```

```python
import cachecore  # the import name is 'cachecore'
```

Point your existing SDK at Cachecore and get L1 exact-match caching immediately. No `import cachecore` required.
```python
from openai import AsyncOpenAI

oai = AsyncOpenAI(
    api_key="ignored",                           # gateway injects its own upstream key
    base_url="https://gateway.cachecore.it/v1",  # ← only change
)

# Identical requests are now served from cache.
resp = await oai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
```
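To watch L1 at work, time two identical calls. A quick sketch continuing from the snippet above — `timed_call` is just illustration, and actual numbers depend on your deployment:

```python
import time

async def timed_call(prompt: str) -> float:
    """Return wall-clock seconds for one chat completion."""
    t0 = time.perf_counter()
    await oai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return time.perf_counter() - t0

first = await timed_call("What is 2+2?")   # cache MISS — goes upstream
second = await timed_call("What is 2+2?")  # identical request — served from L1
print(f"first={first:.2f}s  second={second:.2f}s")
```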
Add `CachecoreClient` to unlock tenant-scoped namespaces, L2 semantic caching, and per-tenant metrics. Three extra lines wired into the SDK's `http_client`:
```python
from cachecore import CachecoreClient
import httpx
from openai import AsyncOpenAI

cc = CachecoreClient(
    gateway_url="https://gateway.cachecore.it",
    tenant_jwt="ey...",  # your tenant JWT from the Cachecore dashboard
)

oai = AsyncOpenAI(
    api_key="ignored",  # gateway injects its own upstream key
    base_url="https://gateway.cachecore.it/v1",
    http_client=httpx.AsyncClient(transport=cc.transport),
)

# Requests now carry your tenant identity.
# Semantically similar prompts hit L2 cache.
resp = await oai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain photosynthesis"}],
)
```
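While wiring this up, it helps to see what each request actually did; the constructor's `debug` flag (see the reference below) logs cache status per request. A minimal sketch:

```python
cc = CachecoreClient(
    gateway_url="https://gateway.cachecore.it",
    tenant_jwt="ey...",
    debug=True,  # log cache status (hit level / miss) for every proxied request
)
```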
Declare which data a cached response depends on. When that data changes, invalidate the dep and all stale entries are evicted automatically.

```python
from cachecore import CachecoreClient, Dep
import httpx
from openai import AsyncOpenAI

cc = CachecoreClient(
    gateway_url="https://gateway.cachecore.it",
    tenant_jwt="ey...",
)

oai = AsyncOpenAI(
    api_key="ignored",
    base_url="https://gateway.cachecore.it/v1",
    http_client=httpx.AsyncClient(transport=cc.transport),
)

# Read path — declare what data this response depends on
with cc.request_context(deps=[Dep("table:products"), Dep("table:orders")]):
    resp = await oai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "List all products under $50"}],
    )

# Write path — bypass cache for the LLM call, then invalidate
with cc.request_context(bypass=True):
    resp = await oai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Confirm order created."}],
    )
await cc.invalidate("table:products")

# Invalidate multiple deps at once
await cc.invalidate_many(["table:orders", "table:products"])
```
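In application code, the write path is often wrapped in a small helper so the bypass and the invalidation can't drift apart. A sketch — `confirm_order` is our name, not part of the library:

```python
async def confirm_order(prompt: str) -> str:
    # Bypass the cache for the write-path LLM call...
    with cc.request_context(bypass=True):
        resp = await oai.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
    # ...then evict every cached entry tagged with the tables the write touched.
    await cc.invalidate_many(["table:orders", "table:products"])
    return resp.choices[0].message.content
```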
The transport works with any SDK that accepts an `httpx.AsyncClient`:

```python
from langchain_openai import ChatOpenAI
import httpx
from cachecore import CachecoreClient, Dep

cc = CachecoreClient(gateway_url="https://gateway.cachecore.it", tenant_jwt="ey...")

llm = ChatOpenAI(
    model="gpt-4o",
    api_key="ignored",
    base_url="https://gateway.cachecore.it/v1",
    http_async_client=httpx.AsyncClient(transport=cc.transport),
)

# Use request_context() around any ainvoke / astream call
with cc.request_context(deps=[Dep("doc:policy-42")]):
    result = await llm.ainvoke("Summarise the compliance policy")
```
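The same context manager wraps streaming; a sketch with `astream`, assuming the usual LangChain chunk interface:

```python
with cc.request_context(deps=[Dep("doc:policy-42")]):
    async for chunk in llm.astream("Summarise the compliance policy"):
        print(chunk.content, end="", flush=True)
```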
```python
CachecoreClient(
    gateway_url: str,        # "https://gateway.cachecore.it"
    tenant_jwt: str,         # tenant HS256/RS256 JWT
    timeout: float = 30.0,   # for invalidation calls
    debug: bool = False,     # log cache status per request
)
```

| Property / Method | Description |
|---|---|
| `.transport` | `httpx.AsyncBaseTransport` — pass to `httpx.AsyncClient(transport=...)` |
| `.request_context(deps, bypass)` | Context manager — sets per-request deps / bypass |
| `await .invalidate(dep_id)` | Evict all entries tagged with this dep |
| `await .invalidate_many(dep_ids)` | Invalidate multiple deps concurrently |
| `await .aclose()` | Close HTTP clients. Also works as `async with CachecoreClient(...):` |
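Since `aclose()` doubles as an async context manager, short-lived scripts can scope the client with `async with` and skip the explicit close:

```python
async with CachecoreClient(
    gateway_url="https://gateway.cachecore.it",
    tenant_jwt="ey...",
) as cc:
    await cc.invalidate("table:products")
# underlying HTTP clients are closed automatically on exit
```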
Dep("table:products") # simple — hash defaults to "v1"
Dep("table:products", hash="abc123") # explicit hash for versioned depsParsed from response headers after a proxied request:
Parsed from response headers after a proxied request:

```python
from cachecore import CacheStatus

status = CacheStatus.from_headers(response.headers)

# status.status       → "HIT_L1" | "HIT_L1_STALE" | "HIT_L2" | "MISS" | "BYPASS" | "UNKNOWN"
# status.similarity   → float 0.0–1.0 (non-zero on L2 hits)
# status.age_seconds  → int
```
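The openai SDK hides raw headers by default; its `with_raw_response` variant exposes them, which pairs naturally with `from_headers`. A sketch (`with_raw_response` is the openai SDK's API, not Cachecore's):

```python
from cachecore import CacheStatus

raw = await oai.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
status = CacheStatus.from_headers(raw.headers)
print(status.status, status.similarity, status.age_seconds)

resp = raw.parse()  # the usual ChatCompletion object
```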
| Exception | When |
|---|---|
| `CachecoreError` | Base class for all Cachecore errors |
| `CachecoreAuthError` | 401 / 403 from the gateway |
| `CachecoreRateLimitError` | 429 — check `.retry_after` attribute (seconds, or `None`) |
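`CachecoreRateLimitError` carries a `.retry_after` hint, which makes retries straightforward; a minimal sketch:

```python
import asyncio

from cachecore import CachecoreRateLimitError

try:
    await cc.invalidate("table:products")
except CachecoreRateLimitError as exc:
    # .retry_after is seconds, or None when the gateway didn't provide one
    await asyncio.sleep(exc.retry_after or 1.0)
    await cc.invalidate("table:products")
```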
The client injects headers at the httpx transport layer — below the LLM SDK, above the network. Your SDK continues to work exactly as before:
```
Your code → openai SDK → httpx → [CachecoreTransport] → Cachecore proxy → OpenAI API
                                          ↑
                               injects X-Cachecore-Token
                               injects X-Cachecore-Deps
```
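Because the transport is ordinary `httpx` machinery, it should also compose with `httpx` directly, with no LLM SDK in between — a sketch under that assumption:

```python
import httpx

from cachecore import CacheStatus

async with httpx.AsyncClient(
    transport=cc.transport,
    base_url="https://gateway.cachecore.it/v1",
) as http:
    r = await http.post("/chat/completions", json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "What is 2+2?"}],
    })
    print(CacheStatus.from_headers(r.headers).status)
```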
- Python 3.10+
- httpx >= 0.25.0
- Website: cachecore.it
- Source: github.com/cachecore/cachecore-python
MIT — see LICENSE