Skip to content

Releases: PS4Emp/cachesentry

CacheSentry v0.3.0

29 Jun 10:10

Choose a tag to compare

CacheSentry v0.3.0

CacheSentry v0.3.0 introduces major runtime observability enhancements, validating offline CI protections against live API signals, and adds the first iteration of the CacheSentry Runtime Agent.

New Features

  • Runtime Agent / LiteLLM Callback Plugin: A new LiteLLM callback that acts as the first layer of the CacheSentry runtime agent. It asynchronously records structural cacheability metrics using bounded ephemeral state and safely extracts telemetry without keeping raw messages.
  • Runtime Validation Pack: Validates offline structural predictions against observed runtime provider signals. It correlates projected cache reuse (prefix diff) with actual cached_tokens reported by APIs (LiteLLM, OpenTelemetry GenAI, OpenAI traces).
  • Live OpenAI Cached_Tokens Validation: Added evidence that CacheSentry's projected metrics successfully correlate with live OpenAI responses. Controlled tests show that early dynamic fields (UUIDs) drop cached_tokens to 0, while stable prompts retain maximum cache reuse.

Security & Privacy Audit

A stringent security and privacy audit was performed on the codebase:

  • CacheSentry enforces strict privacy: No raw prompts, raw responses, headers, API keys, Authorization values, or provider responses are stored.
  • The RuntimeAgent now utilizes a recursive privacy sanitizer (cachesentry/runtime_agent/privacy.py) that strictly drops API keys, secrets, bearer tokens, and headers from metadata.
  • All docs and examples were scrubbed of any realistic-looking placeholder secrets.
  • No API keys are required for standard CI testing, and no live APIs are called by default.

Caveats

  • No Guaranteed Savings: CacheSentry detects structural cacheability regressions (stable prefixes vs. dynamic fields). It does not guarantee exact cache hits, cost savings, or latency reduction in production, as runtime caching depends on provider-specific isolation, eviction, TTL, and routing policies.

GitHub Marketplace Readiness

This release brings CacheSentry to full GitHub Marketplace readiness with a certified action.yml configuration and standard SARIF integration!

CacheSentry v0.2.0

28 Jun 07:22

Choose a tag to compare

CacheSentry v0.2.0

This release turns CacheSentry into a reproducible prompt-cache regression guardrail for LLM engineering teams.

Highlights

  • Baseline creation and regression diffing
  • CI gating for cacheability regressions
  • LiteLLM trace ingestion
  • OpenTelemetry GenAI trace ingestion
  • Provider-aware offline cacheability projections
  • Reproducible cacheability regression case study
  • SARIF / GitHub Code Scanning support
  • Privacy-first normalization and reporting
  • Release hygiene cleanup for .pyc / __pycache__

What this release helps with

CacheSentry can compare a current prompt trace against a known-good baseline and detect when structural prompt cacheability regresses, such as when timestamps, UUIDs, request IDs, or dynamic metadata are introduced near the front of prompts.

Important caveat

CacheSentry performs offline structural analysis. Provider projections do not guarantee actual cache hits, cost savings, or TTFT reductions. Runtime behavior depends on provider/runtime policy, routing, TTL, eviction, isolation, and cache state.

Start here

See:

docs/CACHEABILITY_CASE_STUDY.md

Note

Local Docker smoke testing was skipped in this release audit because Docker Engine was unavailable in the local environment.

CacheSentry v0.1.0

24 Jun 10:17

Choose a tag to compare

CacheSentry v0.1.0

CacheSentry is an offline prompt-cache hygiene guardrail for LLM applications.

It helps teams detect unstable prompt-prefix content such as UUIDs, timestamps, dynamic metadata, and template drift before those changes reduce prompt/KV-cache reuse.

Highlights

  • Trace-wide prompt-prefix stability audit
  • CI policy mode with exit codes 0/1/2
  • GitHub annotations for cache-breaking prompt patterns
  • Culprit detection for UUID/request IDs, timestamps, and dynamic metadata
  • Field-level attribution such as messages[0].content
  • Safe fix recommendations requiring human review
  • Redacted Markdown and JSON reports
  • Docker image support
  • GitHub Action packaging
  • Optional local benchmark mode for backend validation
  • Release checklist, positioning docs, and overclaiming tests

What CacheSentry does not do

  • It does not replace vLLM, SGLang, TensorRT-LLM, LMCache, LangSmith, Promptfoo, or semantic caches.
  • It does not call paid APIs in CI mode.
  • It does not guarantee latency improvement.
  • It does not measure backend TTFT/cache metrics in CI mode.
  • It does not automatically rewrite prompts.

Best first command

python -m cachesentry.cli ci examples/traces/mixed_cache_breakers.jsonl \
  --model Qwen/Qwen2.5-1.5B-Instruct \
  --fail-on-severity high \
  --redaction-mode mask \
  --github-annotations

Expected result: the mixed trace fails with cache-breaker violations for UUID/request ID and timestamp patterns.

Release status

This is a v0.1.0 first public MVP release. The CLI and reports are usable, but the project is still pre-1.0 and future versions may change APIs or schemas.