Releases: PS4Emp/cachesentry
CacheSentry v0.3.0
CacheSentry v0.3.0
CacheSentry v0.3.0 introduces major runtime observability enhancements, validating offline CI protections against live API signals, and adds the first iteration of the CacheSentry Runtime Agent.
New Features
- Runtime Agent / LiteLLM Callback Plugin: A new LiteLLM callback that acts as the first layer of the CacheSentry runtime agent. It asynchronously records structural cacheability metrics using bounded ephemeral state and safely extracts telemetry without keeping raw messages.
- Runtime Validation Pack: Validates offline structural predictions against observed runtime provider signals. It correlates projected cache reuse (prefix diff) with actual
cached_tokensreported by APIs (LiteLLM, OpenTelemetry GenAI, OpenAI traces). - Live OpenAI Cached_Tokens Validation: Added evidence that CacheSentry's projected metrics successfully correlate with live OpenAI responses. Controlled tests show that early dynamic fields (UUIDs) drop
cached_tokensto 0, while stable prompts retain maximum cache reuse.
Security & Privacy Audit
A stringent security and privacy audit was performed on the codebase:
- CacheSentry enforces strict privacy: No raw prompts, raw responses, headers, API keys, Authorization values, or provider responses are stored.
- The
RuntimeAgentnow utilizes a recursive privacy sanitizer (cachesentry/runtime_agent/privacy.py) that strictly drops API keys, secrets, bearer tokens, and headers from metadata. - All docs and examples were scrubbed of any realistic-looking placeholder secrets.
- No API keys are required for standard CI testing, and no live APIs are called by default.
Caveats
- No Guaranteed Savings: CacheSentry detects structural cacheability regressions (stable prefixes vs. dynamic fields). It does not guarantee exact cache hits, cost savings, or latency reduction in production, as runtime caching depends on provider-specific isolation, eviction, TTL, and routing policies.
GitHub Marketplace Readiness
This release brings CacheSentry to full GitHub Marketplace readiness with a certified action.yml configuration and standard SARIF integration!
CacheSentry v0.2.0
CacheSentry v0.2.0
This release turns CacheSentry into a reproducible prompt-cache regression guardrail for LLM engineering teams.
Highlights
- Baseline creation and regression diffing
- CI gating for cacheability regressions
- LiteLLM trace ingestion
- OpenTelemetry GenAI trace ingestion
- Provider-aware offline cacheability projections
- Reproducible cacheability regression case study
- SARIF / GitHub Code Scanning support
- Privacy-first normalization and reporting
- Release hygiene cleanup for
.pyc/__pycache__
What this release helps with
CacheSentry can compare a current prompt trace against a known-good baseline and detect when structural prompt cacheability regresses, such as when timestamps, UUIDs, request IDs, or dynamic metadata are introduced near the front of prompts.
Important caveat
CacheSentry performs offline structural analysis. Provider projections do not guarantee actual cache hits, cost savings, or TTFT reductions. Runtime behavior depends on provider/runtime policy, routing, TTL, eviction, isolation, and cache state.
Start here
See:
docs/CACHEABILITY_CASE_STUDY.md
Note
Local Docker smoke testing was skipped in this release audit because Docker Engine was unavailable in the local environment.
CacheSentry v0.1.0
CacheSentry v0.1.0
CacheSentry is an offline prompt-cache hygiene guardrail for LLM applications.
It helps teams detect unstable prompt-prefix content such as UUIDs, timestamps, dynamic metadata, and template drift before those changes reduce prompt/KV-cache reuse.
Highlights
- Trace-wide prompt-prefix stability audit
- CI policy mode with exit codes 0/1/2
- GitHub annotations for cache-breaking prompt patterns
- Culprit detection for UUID/request IDs, timestamps, and dynamic metadata
- Field-level attribution such as
messages[0].content - Safe fix recommendations requiring human review
- Redacted Markdown and JSON reports
- Docker image support
- GitHub Action packaging
- Optional local benchmark mode for backend validation
- Release checklist, positioning docs, and overclaiming tests
What CacheSentry does not do
- It does not replace vLLM, SGLang, TensorRT-LLM, LMCache, LangSmith, Promptfoo, or semantic caches.
- It does not call paid APIs in CI mode.
- It does not guarantee latency improvement.
- It does not measure backend TTFT/cache metrics in CI mode.
- It does not automatically rewrite prompts.
Best first command
python -m cachesentry.cli ci examples/traces/mixed_cache_breakers.jsonl \
--model Qwen/Qwen2.5-1.5B-Instruct \
--fail-on-severity high \
--redaction-mode mask \
--github-annotationsExpected result: the mixed trace fails with cache-breaker violations for UUID/request ID and timestamp patterns.
Release status
This is a v0.1.0 first public MVP release. The CLI and reports are usable, but the project is still pre-1.0 and future versions may change APIs or schemas.