Skip to content

CacheSentry v0.1.0

Choose a tag to compare

@PS4Emp PS4Emp released this 24 Jun 10:17
· 18 commits to master since this release

CacheSentry v0.1.0

CacheSentry is an offline prompt-cache hygiene guardrail for LLM applications.

It helps teams detect unstable prompt-prefix content such as UUIDs, timestamps, dynamic metadata, and template drift before those changes reduce prompt/KV-cache reuse.

Highlights

  • Trace-wide prompt-prefix stability audit
  • CI policy mode with exit codes 0/1/2
  • GitHub annotations for cache-breaking prompt patterns
  • Culprit detection for UUID/request IDs, timestamps, and dynamic metadata
  • Field-level attribution such as messages[0].content
  • Safe fix recommendations requiring human review
  • Redacted Markdown and JSON reports
  • Docker image support
  • GitHub Action packaging
  • Optional local benchmark mode for backend validation
  • Release checklist, positioning docs, and overclaiming tests

What CacheSentry does not do

  • It does not replace vLLM, SGLang, TensorRT-LLM, LMCache, LangSmith, Promptfoo, or semantic caches.
  • It does not call paid APIs in CI mode.
  • It does not guarantee latency improvement.
  • It does not measure backend TTFT/cache metrics in CI mode.
  • It does not automatically rewrite prompts.

Best first command

python -m cachesentry.cli ci examples/traces/mixed_cache_breakers.jsonl \
  --model Qwen/Qwen2.5-1.5B-Instruct \
  --fail-on-severity high \
  --redaction-mode mask \
  --github-annotations

Expected result: the mixed trace fails with cache-breaker violations for UUID/request ID and timestamp patterns.

Release status

This is a v0.1.0 first public MVP release. The CLI and reports are usable, but the project is still pre-1.0 and future versions may change APIs or schemas.