Python client for Cachecore — the LLM API caching proxy that reduces cost and latency for AI agent workloads.
Cachecore sits transparently between your application and LLM providers (OpenAI, Anthropic via OpenAI-compat, etc.) and caches responses at two levels: L1 exact-match and L2 semantic similarity. This client handles the Cachecore-specific plumbing — header injection, dependency encoding, invalidation — without replacing your LLM SDK.
```bash
pip install cachecore-python
```

```python
import cachecore  # the import name is 'cachecore'
```

Point your existing SDK at Cachecore and get L1 exact-match caching immediately. No `import cachecore` required.
```python
from openai import AsyncOpenAI

oai = AsyncOpenAI(
    api_key="ignored",                           # gateway injects its own upstream key
    base_url="https://gateway.cachecore.it/v1",  # ← only change
)

# Identical requests are now served from cache.
resp = await oai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
```
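To watch L1 at work, time two identical calls. A quick sketch continuing from the snippet above — `timed_call` is just illustration, and actual numbers depend on your deployment:

```python
import time

async def timed_call(prompt: str) -> float:
    """Return wall-clock seconds for one chat completion."""
    t0 = time.perf_counter()
    await oai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return time.perf_counter() - t0

first = await timed_call("What is 2+2?")   # cache MISS — goes upstream
second = await timed_call("What is 2+2?")  # identical request — served from L1
print(f"first={first:.2f}s  second={second:.2f}s")
```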
Add `CachecoreClient` to unlock tenant-scoped namespaces, L2 semantic caching, and per-tenant metrics. Three extra lines wired into the SDK's `http_client`:
```python
from cachecore import CachecoreClient
import httpx
from openai import AsyncOpenAI

cc = CachecoreClient(
    gateway_url="https://gateway.cachecore.it",
    tenant_jwt="ey...",  # your tenant JWT from the Cachecore dashboard
)

oai = AsyncOpenAI(
    api_key="ignored",  # gateway injects its own upstream key
    base_url="https://gateway.cachecore.it/v1",
    http_client=httpx.AsyncClient(transport=cc.transport),
)

# Requests now carry your tenant identity.
# Semantically similar prompts hit L2 cache.
resp = await oai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain photosynthesis"}],
)
```
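While wiring this up, it helps to see what each request actually did; the constructor's `debug` flag (see the reference below) logs cache status per request. A minimal sketch:

```python
cc = CachecoreClient(
    gateway_url="https://gateway.cachecore.it",
    tenant_jwt="ey...",
    debug=True,  # log cache status (hit level / miss) for every proxied request
)
```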
Declare which data a cached response depends on. When that data changes, invalidate the dep and all stale entries are evicted automatically.

```python
from cachecore import CachecoreClient, Dep
import httpx
from openai import AsyncOpenAI

cc = CachecoreClient(
    gateway_url="https://gateway.cachecore.it",
    tenant_jwt="ey...",
)

oai = AsyncOpenAI(
    api_key="ignored",
    base_url="https://gateway.cachecore.it/v1",
    http_client=httpx.AsyncClient(transport=cc.transport),
)

# Read path — declare what data this response depends on
with cc.request_context(deps=[Dep("table:products"), Dep("table:orders")]):
    resp = await oai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "List all products under $50"}],
    )

# Write path — bypass cache for the LLM call, then invalidate
with cc.request_context(bypass=True):
    resp = await oai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Confirm order created."}],
    )
await cc.invalidate("table:products")

# Invalidate multiple deps at once
await cc.invalidate_many(["table:orders", "table:products"])
```
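In application code, the write path is often wrapped in a small helper so the bypass and the invalidation can't drift apart. A sketch — `confirm_order` is our name, not part of the library:

```python
async def confirm_order(prompt: str) -> str:
    # Bypass the cache for the write-path LLM call...
    with cc.request_context(bypass=True):
        resp = await oai.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
    # ...then evict every cached entry tagged with the tables the write touched.
    await cc.invalidate_many(["table:orders", "table:products"])
    return resp.choices[0].message.content
```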
The transport works with any SDK that accepts an `httpx.AsyncClient`:

```python
from langchain_openai import ChatOpenAI
import httpx
from cachecore import CachecoreClient, Dep

cc = CachecoreClient(gateway_url="https://gateway.cachecore.it", tenant_jwt="ey...")

llm = ChatOpenAI(
    model="gpt-4o",
    api_key="ignored",
    base_url="https://gateway.cachecore.it/v1",
    http_async_client=httpx.AsyncClient(transport=cc.transport),
)

# Use request_context() around any ainvoke / astream call
with cc.request_context(deps=[Dep("doc:policy-42")]):
    result = await llm.ainvoke("Summarise the compliance policy")
```
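The same context manager wraps streaming; a sketch with `astream`, assuming the usual LangChain chunk interface:

```python
with cc.request_context(deps=[Dep("doc:policy-42")]):
    async for chunk in llm.astream("Summarise the compliance policy"):
        print(chunk.content, end="", flush=True)
```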
```python
CachecoreClient(
    gateway_url: str,        # "https://gateway.cachecore.it"
    tenant_jwt: str,         # tenant HS256/RS256 JWT
    timeout: float = 30.0,   # for invalidation calls
    debug: bool = False,     # log cache status per request
)
```

| Property / Method | Description |
|---|---|
| `.transport` | `httpx.AsyncBaseTransport` — pass to `httpx.AsyncClient(transport=...)` |
| `.request_context(deps, bypass)` | Context manager — sets per-request deps / bypass |
| `await .invalidate(dep_id)` | Evict all entries tagged with this dep |
| `await .invalidate_many(dep_ids)` | Invalidate multiple deps concurrently |
| `await .aclose()` | Close HTTP clients. Also works as `async with CachecoreClient(...):` |
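Since `aclose()` doubles as an async context manager, short-lived scripts can scope the client with `async with` and skip the explicit close:

```python
async with CachecoreClient(
    gateway_url="https://gateway.cachecore.it",
    tenant_jwt="ey...",
) as cc:
    await cc.invalidate("table:products")
# underlying HTTP clients are closed automatically on exit
```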
Dep("table:products") # simple — hash defaults to "v1"
Dep("table:products", hash="abc123") # explicit hash for versioned depsParsed from response headers after a proxied request:
Parsed from response headers after a proxied request:

```python
from cachecore import CacheStatus

status = CacheStatus.from_headers(response.headers)

# status.status       → "HIT_L1" | "HIT_L1_STALE" | "HIT_L2" | "MISS" | "BYPASS" | "UNKNOWN"
# status.similarity   → float 0.0–1.0 (non-zero on L2 hits)
# status.age_seconds  → int
```
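The openai SDK hides raw headers by default; its `with_raw_response` variant exposes them, which pairs naturally with `from_headers`. A sketch (`with_raw_response` is the openai SDK's API, not Cachecore's):

```python
from cachecore import CacheStatus

raw = await oai.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
status = CacheStatus.from_headers(raw.headers)
print(status.status, status.similarity, status.age_seconds)

resp = raw.parse()  # the usual ChatCompletion object
```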
| Exception | When |
|---|---|
| `CachecoreError` | Base class for all Cachecore errors |
| `CachecoreAuthError` | 401 / 403 from the gateway |
| `CachecoreRateLimitError` | 429 — check `.retry_after` attribute (seconds, or `None`) |
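`CachecoreRateLimitError` carries a `.retry_after` hint, which makes retries straightforward; a minimal sketch:

```python
import asyncio

from cachecore import CachecoreRateLimitError

try:
    await cc.invalidate("table:products")
except CachecoreRateLimitError as exc:
    # .retry_after is seconds, or None when the gateway didn't provide one
    await asyncio.sleep(exc.retry_after or 1.0)
    await cc.invalidate("table:products")
```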
The client injects headers at the httpx transport layer — below the LLM SDK, above the network. Your SDK continues to work exactly as before:
```
Your code → openai SDK → httpx → [CachecoreTransport] → Cachecore proxy → OpenAI API
                                          ↑
                               injects X-Cachecore-Token
                               injects X-Cachecore-Deps
```
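Because the transport is ordinary `httpx` machinery, it should also compose with `httpx` directly, with no LLM SDK in between — a sketch under that assumption:

```python
import httpx

from cachecore import CacheStatus

async with httpx.AsyncClient(
    transport=cc.transport,
    base_url="https://gateway.cachecore.it/v1",
) as http:
    r = await http.post("/chat/completions", json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "What is 2+2?"}],
    })
    print(CacheStatus.from_headers(r.headers).status)
```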
- Python 3.10+
- httpx >= 0.25.0
- Website: cachecore.it
- Source: github.com/cachecore/cachecore-python
MIT — see LICENSE