cachecore

Python client for Cachecore — the LLM API caching proxy that reduces cost and latency for AI agent workloads.

Cachecore sits transparently between your application and LLM providers (OpenAI, Anthropic via OpenAI-compat, etc.) and caches responses at two levels: L1 exact-match and L2 semantic similarity. This client handles the Cachecore-specific plumbing — header injection, dependency encoding, invalidation — without replacing your LLM SDK.

Install

pip install cachecore-python

The package is published as cachecore-python; the import name is cachecore:

import cachecore

Quick start

Rung 1 — zero code changes: swap base_url

Point your existing SDK at Cachecore and get L1 exact-match caching immediately. No import cachecore required.

from openai import AsyncOpenAI

oai = AsyncOpenAI(
    api_key="ignored",                             # gateway injects its own upstream key
    base_url="https://gateway.cachecore.it/v1",  # ← only change
)

# Identical requests are now served from cache.
resp = await oai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)

Rung 2 — tenant isolation (3 lines)

Add CachecoreClient to unlock tenant-scoped namespaces, L2 semantic caching, and per-tenant metrics. Three extra lines wired into the SDK's http_client.

from cachecore import CachecoreClient
import httpx
from openai import AsyncOpenAI

cc = CachecoreClient(
    gateway_url="https://gateway.cachecore.it",
    tenant_jwt="ey...",  # your tenant JWT from the Cachecore dashboard
)

oai = AsyncOpenAI(
    api_key="ignored",  # gateway injects its own upstream key
    base_url="https://gateway.cachecore.it/v1",
    http_client=httpx.AsyncClient(transport=cc.transport),
)

# Requests now carry your tenant identity.
# Semantically similar prompts hit L2 cache.
resp = await oai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain photosynthesis"}],
)

Rung 3 — dep invalidation

Declare which data a cached response depends on. When that data changes, invalidate the dep and all stale entries are evicted automatically.

from cachecore import CachecoreClient, Dep
import httpx
from openai import AsyncOpenAI

cc = CachecoreClient(
    gateway_url="https://gateway.cachecore.it",
    tenant_jwt="ey...",
)

oai = AsyncOpenAI(
    api_key="ignored",
    base_url="https://gateway.cachecore.it/v1",
    http_client=httpx.AsyncClient(transport=cc.transport),
)

# Read path — declare what data this response depends on
with cc.request_context(deps=[Dep("table:products"), Dep("table:orders")]):
    resp = await oai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "List all products under $50"}],
    )

# Write path — bypass cache for the LLM call, then invalidate
with cc.request_context(bypass=True):
    resp = await oai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Confirm order created."}],
    )
await cc.invalidate("table:products")

# Invalidate multiple deps at once
await cc.invalidate_many(["table:orders", "table:products"])

Works with LangChain / LangGraph

The transport works with any SDK that accepts an httpx.AsyncClient:

from langchain_openai import ChatOpenAI
import httpx
from cachecore import CachecoreClient, Dep

cc = CachecoreClient(gateway_url="https://gateway.cachecore.it", tenant_jwt="ey...")

llm = ChatOpenAI(
    model="gpt-4o",
    api_key="ignored",
    base_url="https://gateway.cachecore.it/v1",
    http_async_client=httpx.AsyncClient(transport=cc.transport),
)

# Use request_context() around any ainvoke / astream call
with cc.request_context(deps=[Dep("doc:policy-42")]):
    result = await llm.ainvoke("Summarise the compliance policy")

API reference

CachecoreClient

CachecoreClient(
    gateway_url: str,       # "https://gateway.cachecore.it"
    tenant_jwt: str,        # tenant HS256/RS256 JWT
    timeout: float = 30.0,  # for invalidation calls
    debug: bool = False,    # log cache status per request
)
Property / Method                 Description
.transport                        httpx.AsyncBaseTransport — pass to httpx.AsyncClient(transport=...)
.request_context(deps, bypass)    Context manager — sets per-request deps / bypass
await .invalidate(dep_id)         Evict all entries tagged with this dep
await .invalidate_many(dep_ids)   Invalidate multiple deps concurrently
await .aclose()                   Close HTTP clients; also works as async with CachecoreClient(...):
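The async with form noted above closes the client for you. A minimal sketch, assuming the usual asyncio entry point (the invalidate call is just a placeholder):

import asyncio
from cachecore import CachecoreClient

async def main():
    # Context-manager form: aclose() runs automatically on exit.
    # debug=True logs the cache status of each proxied request.
    async with CachecoreClient(
        gateway_url="https://gateway.cachecore.it",
        tenant_jwt="ey...",
        debug=True,
    ) as cc:
        await cc.invalidate("table:products")

asyncio.run(main())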

Dep / DepDeclaration

Dep("table:products")                  # simple — hash defaults to "v1"
Dep("table:products", hash="abc123")   # explicit hash for versioned deps

CacheStatus

Parsed from response headers after a proxied request:

from cachecore import CacheStatus

status = CacheStatus.from_headers(response.headers)
# status.status      → "HIT_L1" | "HIT_L1_STALE" | "HIT_L2" | "MISS" | "BYPASS" | "UNKNOWN"
# status.similarity  → float 0.0–1.0  (non-zero on L2 hits)
# status.age_seconds → int
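The headers come off the raw HTTP response. A sketch using the OpenAI SDK's with_raw_response accessor, assuming the Rung 2 client setup from the quick start:

from cachecore import CacheStatus

# Ask the OpenAI SDK for the raw response so the gateway's headers are visible.
raw = await oai.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)

status = CacheStatus.from_headers(raw.headers)
resp = raw.parse()  # the usual ChatCompletion object

if status.status == "HIT_L2":
    print(f"semantic hit, similarity={status.similarity:.2f}, age={status.age_seconds}s")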

Exceptions

Exception                 When
CachecoreError            Base class for all Cachecore errors
CachecoreAuthError        401 / 403 from the gateway
CachecoreRateLimitError   429 — check the .retry_after attribute (seconds, or None)
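A sketch of handling these around an invalidation call — the single retry and the 1-second fallback are illustrative choices, not library behaviour:

import asyncio
from cachecore import CachecoreAuthError, CachecoreRateLimitError

try:
    await cc.invalidate("table:products")
except CachecoreRateLimitError as exc:
    # retry_after is in seconds, or None if the gateway didn't say.
    await asyncio.sleep(exc.retry_after or 1.0)
    await cc.invalidate("table:products")
except CachecoreAuthError:
    raise RuntimeError("tenant JWT rejected — check it in the Cachecore dashboard")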

How it works

The client injects headers at the httpx transport layer — below the LLM SDK, above the network. Your SDK continues to work exactly as before:

Your code  →  openai SDK  →  httpx  →  [CachecoreTransport]  →  Cachecore proxy  →  OpenAI API
                                              ↑
                                  injects X-Cachecore-Token
                                  injects X-Cachecore-Deps
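
You don't need to write a transport yourself — cc.transport is the one to use — but to illustrate the mechanism, a header-injecting httpx transport looks roughly like this (class name and details are illustrative, not the actual implementation):

import httpx

class HeaderInjectingTransport(httpx.AsyncBaseTransport):
    """Illustrative sketch only — not the real Cachecore transport."""

    def __init__(self, token: str, inner: httpx.AsyncBaseTransport | None = None):
        self._token = token
        self._inner = inner or httpx.AsyncHTTPTransport()

    async def handle_async_request(self, request: httpx.Request) -> httpx.Response:
        # Runs below the LLM SDK: stamp the gateway header, then hand the
        # request to a normal network transport.
        request.headers["X-Cachecore-Token"] = self._token
        return await self._inner.handle_async_request(request)

# The SDK's http_client would then be httpx.AsyncClient(transport=...), exactly
# as with cc.transport in the examples above.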

Requirements

  • Python 3.10+
  • httpx >= 0.25.0

Links

License

MIT — see LICENSE
