29 Jun 10:10

PS4Emp

7497619

CacheSentry v0.3.0 Latest

Marketplace

Latest

Marketplace

CacheSentry v0.3.0

CacheSentry v0.3.0 introduces major runtime observability enhancements, validating offline CI protections against live API signals, and adds the first iteration of the CacheSentry Runtime Agent.

New Features

Runtime Agent / LiteLLM Callback Plugin: A new LiteLLM callback that acts as the first layer of the CacheSentry runtime agent. It asynchronously records structural cacheability metrics using bounded ephemeral state and safely extracts telemetry without keeping raw messages.
Runtime Validation Pack: Validates offline structural predictions against observed runtime provider signals. It correlates projected cache reuse (prefix diff) with actual cached_tokens reported by APIs (LiteLLM, OpenTelemetry GenAI, OpenAI traces).
Live OpenAI Cached_Tokens Validation: Added evidence that CacheSentry's projected metrics successfully correlate with live OpenAI responses. Controlled tests show that early dynamic fields (UUIDs) drop cached_tokens to 0, while stable prompts retain maximum cache reuse.

Security & Privacy Audit

A stringent security and privacy audit was performed on the codebase:

CacheSentry enforces strict privacy: No raw prompts, raw responses, headers, API keys, Authorization values, or provider responses are stored.
The RuntimeAgent now utilizes a recursive privacy sanitizer (cachesentry/runtime_agent/privacy.py) that strictly drops API keys, secrets, bearer tokens, and headers from metadata.
All docs and examples were scrubbed of any realistic-looking placeholder secrets.
No API keys are required for standard CI testing, and no live APIs are called by default.

Caveats

No Guaranteed Savings: CacheSentry detects structural cacheability regressions (stable prefixes vs. dynamic fields). It does not guarantee exact cache hits, cost savings, or latency reduction in production, as runtime caching depends on provider-specific isolation, eviction, TTL, and routing policies.

GitHub Marketplace Readiness

This release brings CacheSentry to full GitHub Marketplace readiness with a certified action.yml configuration and standard SARIF integration!

Assets 2

28 Jun 07:22

PS4Emp

v0.2.0

7ecc600

CacheSentry v0.2.0

This release turns CacheSentry into a reproducible prompt-cache regression guardrail for LLM engineering teams.

Highlights

Baseline creation and regression diffing
CI gating for cacheability regressions
LiteLLM trace ingestion
OpenTelemetry GenAI trace ingestion
Provider-aware offline cacheability projections
Reproducible cacheability regression case study
SARIF / GitHub Code Scanning support
Privacy-first normalization and reporting
Release hygiene cleanup for .pyc / __pycache__

What this release helps with

CacheSentry can compare a current prompt trace against a known-good baseline and detect when structural prompt cacheability regresses, such as when timestamps, UUIDs, request IDs, or dynamic metadata are introduced near the front of prompts.

Important caveat

CacheSentry performs offline structural analysis. Provider projections do not guarantee actual cache hits, cost savings, or TTFT reductions. Runtime behavior depends on provider/runtime policy, routing, TTL, eviction, isolation, and cache state.

Start here

See:

docs/CACHEABILITY_CASE_STUDY.md

Note

Local Docker smoke testing was skipped in this release audit because Docker Engine was unavailable in the local environment.

Assets 2

24 Jun 10:17

PS4Emp

v0.1.0

8efca09

CacheSentry v0.1.0

CacheSentry is an offline prompt-cache hygiene guardrail for LLM applications.

It helps teams detect unstable prompt-prefix content such as UUIDs, timestamps, dynamic metadata, and template drift before those changes reduce prompt/KV-cache reuse.

Highlights

Trace-wide prompt-prefix stability audit
CI policy mode with exit codes 0/1/2
GitHub annotations for cache-breaking prompt patterns
Culprit detection for UUID/request IDs, timestamps, and dynamic metadata
Field-level attribution such as messages[0].content
Safe fix recommendations requiring human review
Redacted Markdown and JSON reports
Docker image support
GitHub Action packaging
Optional local benchmark mode for backend validation
Release checklist, positioning docs, and overclaiming tests

What CacheSentry does not do

It does not replace vLLM, SGLang, TensorRT-LLM, LMCache, LangSmith, Promptfoo, or semantic caches.
It does not call paid APIs in CI mode.
It does not guarantee latency improvement.
It does not measure backend TTFT/cache metrics in CI mode.
It does not automatically rewrite prompts.

Best first command

python -m cachesentry.cli ci examples/traces/mixed_cache_breakers.jsonl \
  --model Qwen/Qwen2.5-1.5B-Instruct \
  --fail-on-severity high \
  --redaction-mode mask \
  --github-annotations

Expected result: the mixed trace fails with cache-breaker violations for UUID/request ID and timestamp patterns.

Release status

This is a v0.1.0 first public MVP release. The CLI and reports are usable, but the project is still pre-1.0 and future versions may change APIs or schemas.

Assets 2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

CacheSentry v0.3.0

New Features

Security & Privacy Audit

Caveats

GitHub Marketplace Readiness

Uh oh!

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

CacheSentry v0.2.0

Highlights

What this release helps with

Important caveat

Start here

Note

Uh oh!

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

CacheSentry v0.1.0

Highlights

What CacheSentry does not do

Best first command

Release status

Uh oh!

Releases: PS4Emp/cachesentry