CacheSentry v0.1.0
CacheSentry v0.1.0
CacheSentry is an offline prompt-cache hygiene guardrail for LLM applications.
It helps teams detect unstable prompt-prefix content such as UUIDs, timestamps, dynamic metadata, and template drift before those changes reduce prompt/KV-cache reuse.
Highlights
- Trace-wide prompt-prefix stability audit
- CI policy mode with exit codes 0/1/2
- GitHub annotations for cache-breaking prompt patterns
- Culprit detection for UUID/request IDs, timestamps, and dynamic metadata
- Field-level attribution such as
messages[0].content - Safe fix recommendations requiring human review
- Redacted Markdown and JSON reports
- Docker image support
- GitHub Action packaging
- Optional local benchmark mode for backend validation
- Release checklist, positioning docs, and overclaiming tests
What CacheSentry does not do
- It does not replace vLLM, SGLang, TensorRT-LLM, LMCache, LangSmith, Promptfoo, or semantic caches.
- It does not call paid APIs in CI mode.
- It does not guarantee latency improvement.
- It does not measure backend TTFT/cache metrics in CI mode.
- It does not automatically rewrite prompts.
Best first command
python -m cachesentry.cli ci examples/traces/mixed_cache_breakers.jsonl \
--model Qwen/Qwen2.5-1.5B-Instruct \
--fail-on-severity high \
--redaction-mode mask \
--github-annotationsExpected result: the mixed trace fails with cache-breaker violations for UUID/request ID and timestamp patterns.
Release status
This is a v0.1.0 first public MVP release. The CLI and reports are usable, but the project is still pre-1.0 and future versions may change APIs or schemas.