CacheSentry v0.2.0
CacheSentry v0.2.0
This release turns CacheSentry into a reproducible prompt-cache regression guardrail for LLM engineering teams.
Highlights
- Baseline creation and regression diffing
- CI gating for cacheability regressions
- LiteLLM trace ingestion
- OpenTelemetry GenAI trace ingestion
- Provider-aware offline cacheability projections
- Reproducible cacheability regression case study
- SARIF / GitHub Code Scanning support
- Privacy-first normalization and reporting
- Release hygiene cleanup for
.pyc/__pycache__
What this release helps with
CacheSentry can compare a current prompt trace against a known-good baseline and detect when structural prompt cacheability regresses, such as when timestamps, UUIDs, request IDs, or dynamic metadata are introduced near the front of prompts.
Important caveat
CacheSentry performs offline structural analysis. Provider projections do not guarantee actual cache hits, cost savings, or TTFT reductions. Runtime behavior depends on provider/runtime policy, routing, TTL, eviction, isolation, and cache state.
Start here
See:
docs/CACHEABILITY_CASE_STUDY.md
Note
Local Docker smoke testing was skipped in this release audit because Docker Engine was unavailable in the local environment.