Skip to content

CacheSentry v0.2.0

Choose a tag to compare

@PS4Emp PS4Emp released this 28 Jun 07:22
· 8 commits to master since this release

CacheSentry v0.2.0

This release turns CacheSentry into a reproducible prompt-cache regression guardrail for LLM engineering teams.

Highlights

  • Baseline creation and regression diffing
  • CI gating for cacheability regressions
  • LiteLLM trace ingestion
  • OpenTelemetry GenAI trace ingestion
  • Provider-aware offline cacheability projections
  • Reproducible cacheability regression case study
  • SARIF / GitHub Code Scanning support
  • Privacy-first normalization and reporting
  • Release hygiene cleanup for .pyc / __pycache__

What this release helps with

CacheSentry can compare a current prompt trace against a known-good baseline and detect when structural prompt cacheability regresses, such as when timestamps, UUIDs, request IDs, or dynamic metadata are introduced near the front of prompts.

Important caveat

CacheSentry performs offline structural analysis. Provider projections do not guarantee actual cache hits, cost savings, or TTFT reductions. Runtime behavior depends on provider/runtime policy, routing, TTL, eviction, isolation, and cache state.

Start here

See:

docs/CACHEABILITY_CASE_STUDY.md

Note

Local Docker smoke testing was skipped in this release audit because Docker Engine was unavailable in the local environment.