cache-layer

Give AI agents a cache for repetition: verified recipes on Cloudflare, escalation for everything new.

AI agents repeatedly spend powerful-model turns rediscovering work that is already understood: inspecting status, summarizing a known public test result, or finding public documentation. cache-layer makes the distinction between repeated work and genuinely new work explicit.

The repository includes an executable pi extension, a real local recipe proof, and an honest safety benchmark. If work requires edits, private data, external effects, or new judgment, it escalates instead of pretending.

request
  → approved read-only recipe match?
      yes → verify and route cheaply
      no  → escalate to the frontier model

The proof site deploys on Cloudflare Workers. The local semantic-router benchmark uses Ollama with public/synthetic inputs; Workers AI is the next hosted verifier experiment, not a shipped dependency of the proof site.

Quick start

git clone https://github.com/acoyfellow/cache-layer
cd cache-layer
bun install
bun run dev

Open the local URL printed by Wrangler to inspect the implementation proof and benchmark evidence.

Run the verification loop:

bun run verify       # typecheck, tests, Worker dry-run build
bun run smoke        # headless HTTP checks against a local Worker
bun run bench        # reproducible deterministic routing benchmark
bun run record       # record a browser proof video to recordings/

Local routing benchmark

This repo includes a reproducible synthetic benchmark for a narrow but important question: can a router recover approved read-only recipe hits without incorrectly handling work that must escalate?

Measured locally on an Apple M4 Pro with Ollama. Each local model ran three shuffled repetitions after warm-up (138 decisions per model); the deterministic gate ran five repetitions (230 decisions).

| Router | Correct | Approved hits recovered | Unsafe false hits | Median latency | p95 latency |
| --- | --- | --- | --- | --- | --- |
| Deterministic policy/index | 175 / 230 (76.1%) | 35 / 90 (38.9%) | 0 / 140 | < 0.1 ms | < 0.1 ms |
| Ollama `gpt-oss:20b` | 84 / 138 (60.9%) | 0 / 54 (0.0%) | 0 / 84 | 1,254.5 ms | 2,025.9 ms |
| Ollama `qwen3-coder:30b` | 132 / 138 (95.7%) | 54 / 54 (100.0%) | 6 / 84 | 158.8 ms | 213.0 ms |
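To make the column definitions concrete, here is a minimal sketch of how the three counts could be derived from a list of routing decisions. The `Decision` and `Summary` shapes and the `summarize` and `pct` helpers are hypothetical names for illustration; the repository's actual result format may differ. The sample input is reconstructed from the counts in the `qwen3-coder:30b` row, not read from the raw results.

```typescript
// Hypothetical record shape for one benchmark decision (an assumption, not the repo's format).
type Label = "hit" | "escalate";

interface Decision {
  expected: Label; // what the benchmark case should produce
  actual: Label;   // what the router produced
}

interface Summary {
  correct: number;
  total: number;
  approvedHitsRecovered: number;
  approvedTotal: number;
  unsafeFalseHits: number;
  mustEscalateTotal: number;
}

function summarize(decisions: Decision[]): Summary {
  const s: Summary = {
    correct: 0,
    total: decisions.length,
    approvedHitsRecovered: 0,
    approvedTotal: 0,
    unsafeFalseHits: 0,
    mustEscalateTotal: 0,
  };
  for (const d of decisions) {
    if (d.expected === d.actual) s.correct++;
    if (d.expected === "hit") {
      // Approved read-only case: did the router recover it?
      s.approvedTotal++;
      if (d.actual === "hit") s.approvedHitsRecovered++;
    } else {
      // Must-escalate case: a local hit here is an unsafe false hit.
      s.mustEscalateTotal++;
      if (d.actual === "hit") s.unsafeFalseHits++;
    }
  }
  return s;
}

// Format a ratio as a one-decimal percentage.
const pct = (n: number, d: number): string =>
  d === 0 ? "n/a" : `${((n / d) * 100).toFixed(1)}%`;

// Build n copies of the same decision.
const repeat = (n: number, d: Decision): Decision[] => new Array<Decision>(n).fill(d);

// Counts reconstructed from the qwen3-coder:30b row: 54 recovered hits,
// 78 correct escalations, 6 unsafe false hits.
const qwenRow = summarize([
  ...repeat(54, { expected: "hit", actual: "hit" }),
  ...repeat(78, { expected: "escalate", actual: "escalate" }),
  ...repeat(6, { expected: "escalate", actual: "hit" }),
]);

console.log(`${qwenRow.correct} / ${qwenRow.total} (${pct(qwenRow.correct, qwenRow.total)})`);
// prints "132 / 138 (95.7%)"
```

Counting unsafe false hits separately from overall accuracy is the point of the exercise: a router can score well on "Correct" and still be unacceptable as a safety boundary.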

The evidence is useful precisely because it is not flattering: qwen3-coder:30b recovered all 54 intended hits, but it also produced 6 unsafe false hits among the 84 novel or unbounded requests that should have escalated. A local model therefore cannot be the safety boundary. The architecture must keep deterministic policy in front of semantic routing.

This is routing evidence, not proof of coding quality or premium-token savings. See docs/benchmarks.md for method, failure cases, raw results, and reproduction commands. A pre-publication simulated red-team review and resolved concerns are summarized in docs/red-team.md.

Real pi extension proof

The repository also contains a minimal pi extension at extensions/cache-layer/ and one executable read-only recipe: git-status-summary.

Against this public repository, a real in-memory pi AgentSession was prompted with:

summarize my git status

The extension executed git status --short --branch, produced a local result, and the session contained zero assistant/frontier-model messages:

| Prompt | Local recipe result | Frontier assistant messages | Elapsed time |
| --- | --- | --- | --- |
| `summarize my git status` | hit | 0 | 305.3 ms |

Raw proof: benchmarks/results/pi-extension-public-repo.json.
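As an illustration of what a local "summary" of that command could look like, the sketch below parses the text produced by `git status --short --branch` into a few counts. `StatusSummary` and `summarizeShortStatus` are hypothetical names, the sample output is made up, and the extension's real summary logic may be different. The parser is deliberately simplified: it treats the first token after `## ` as the branch name, so detached-HEAD and "No commits yet" headers are not handled specially.

```typescript
// Hypothetical summary shape; not taken from the extension's source.
interface StatusSummary {
  branch: string;
  staged: number;
  unstaged: number;
  untracked: number;
}

function summarizeShortStatus(output: string): StatusSummary {
  const summary: StatusSummary = { branch: "(unknown)", staged: 0, unstaged: 0, untracked: 0 };
  for (const line of output.split("\n")) {
    if (line.length === 0) continue;
    if (line.startsWith("## ")) {
      // Header looks like "## main...origin/main [ahead 1]"; keep the local branch name.
      const [local = "(unknown)"] = line.slice(3).split(/\.\.\.| /);
      summary.branch = local;
      continue;
    }
    // Short format: column 1 is the index (staged) state, column 2 the worktree state.
    const x = line.charAt(0);
    const y = line.charAt(1);
    if (x === "?" && y === "?") {
      summary.untracked++;
      continue;
    }
    if (x === "!" && y === "!") continue; // ignored file
    if (x !== " ") summary.staged++;
    if (y !== " ") summary.unstaged++;
  }
  return summary;
}

// Made-up sample output for illustration only.
const sample = [
  "## main...origin/main [ahead 1]",
  " M README.md",
  "A  docs/benchmarks.md",
  "MM src/index.ts",
  "?? notes.txt",
].join("\n");

const parsed = summarizeShortStatus(sample);
console.log(parsed);
// { branch: "main", staged: 2, unstaged: 2, untracked: 1 }
```

Because the parser only reads text that a read-only git command already produced, it modifies no files and makes no remote calls, which is the property the recipe depends on.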

That proves one real, narrow avoidance path. It does not yet prove net token savings across realistic workloads: prompts expected to escalate have deliberately not been run (and paid for) until a controlled upstream baseline is designed.

Reproduce it:

bun run bench:pi

Deploy your own

Click Deploy to Cloudflare above, or deploy from a checkout:

bun install
bun run deploy

The app uses:

| Primitive | Purpose |
| --- | --- |
| Workers | Public proof site and health endpoint |
| Observability | Deployed Worker request visibility |

The initial release intentionally avoids persistence, user repository access, and a hosted prompt box. The executable behavior belongs in the local agent extension; the deployed site publishes proof and architecture.

The proof

0.0.1 proves one narrow claim:

A real pi extension can complete one bounded read-only recipe without a frontier-model turn, while the routing benchmark demonstrates why semantic matching cannot own the safety boundary.

Example safe recipe:

id: git-status-summary
risk: read_only
handles:
  - summarize my git status
  - what changed locally?
proof:
  - runs read-only git inspection
  - modifies no files
  - makes no remote calls
fallback: escalate
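The recipe above maps naturally onto a small typed record. The sketch below is one possible TypeScript shape with a conservative exact-match lookup over `handles`; the `Recipe` interface, `normalize`, and `findRecipe` are assumptions for illustration, and the repository's matcher may use a different representation or matching rule.

```typescript
// Hypothetical typed view of the recipe fields shown above.
interface Recipe {
  id: string;
  risk: "read_only";    // the only risk level the example shows
  handles: string[];    // prompts this recipe is approved to handle
  proof: string[];      // human-readable safety claims
  fallback: "escalate"; // what to do when the recipe does not apply
}

const gitStatusSummary: Recipe = {
  id: "git-status-summary",
  risk: "read_only",
  handles: ["summarize my git status", "what changed locally?"],
  proof: ["runs read-only git inspection", "modifies no files", "makes no remote calls"],
  fallback: "escalate",
};

// Lowercase, collapse whitespace, and drop trailing punctuation so that
// trivially different spellings of an approved prompt still match.
function normalize(prompt: string): string {
  return prompt.trim().toLowerCase().replace(/\s+/g, " ").replace(/[?.!]+$/, "");
}

// Exact match after normalization; anything else returns undefined,
// which the caller should treat as "escalate".
function findRecipe(prompt: string, recipes: Recipe[]): Recipe | undefined {
  const key = normalize(prompt);
  return recipes.find((r) => r.handles.some((h) => normalize(h) === key));
}

const recipes = [gitStatusSummary];
console.log(findRecipe("  Summarize my Git status ", recipes)?.id); // "git-status-summary"
console.log(findRecipe("refactor the authentication architecture", recipes)); // undefined
```

Exact matching recovers fewer hits than a semantic matcher would, but it cannot produce a hit for a prompt nobody approved.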

Example escalation:

refactor the authentication architecture
→ ESCALATE
  novel, write-capable, or judgment-heavy work is outside the read-only cache boundary

Safety boundary

This project is deliberately conservative.

| Request type | `0.0.1` behavior |
| --- | --- |
| Public, read-only, objectively bounded workflow | May route to a verified recipe |
| Public or dummy output summarization | May route locally |
| Source edits, commits, deploys, comments, external actions | Escalate |
| Private repositories, non-public code, secrets, customer or personal data | Escalate |
| Architecture or security judgment | Escalate |
| No high-confidence match | Escalate |
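Read as a decision procedure, the table says that every escalation condition is checked first and local routing is only what remains. A minimal sketch, assuming the request has already been described by a few boolean facts (the `RequestFacts` fields and the `decide` function are hypothetical and do not come from the repository):

```typescript
// Hypothetical description of a request; how these facts are established is out of scope here.
interface RequestFacts {
  writesOrExternalEffects: boolean; // source edits, commits, deploys, comments, external actions
  touchesPrivateData: boolean;      // private repos, non-public code, secrets, customer or personal data
  needsJudgment: boolean;           // architecture or security judgment
  highConfidenceMatch: boolean;     // a verified recipe matched with high confidence
}

type Outcome =
  | { action: "route_local" }
  | { action: "escalate"; reason: string };

function decide(f: RequestFacts): Outcome {
  // Escalation rows are checked first, in table order.
  if (f.writesOrExternalEffects) return { action: "escalate", reason: "write or external effect" };
  if (f.touchesPrivateData) return { action: "escalate", reason: "private or sensitive data" };
  if (f.needsJudgment) return { action: "escalate", reason: "architecture or security judgment" };
  if (!f.highConfidenceMatch) return { action: "escalate", reason: "no high-confidence match" };
  // Only public, read-only, bounded work with a confident match reaches this point.
  return { action: "route_local" };
}

const publicReadOnly: RequestFacts = {
  writesOrExternalEffects: false,
  touchesPrivateData: false,
  needsJudgment: false,
  highConfidenceMatch: true,
};

console.log(decide(publicReadOnly).action); // "route_local"
console.log(decide({ ...publicReadOnly, writesOrExternalEffects: true }).action); // "escalate"
```

A confident match never overrides an escalation row: `highConfidenceMatch` is consulted only after the other three checks have passed.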

No prompt bodies are accepted or stored by the deployed proof site. The executable recipe lives in the local pi extension and runs only inside a user's own local repository.

Local evaluation

For fully local experimentation, use Ollama and public OSS or dummy input only. A future hosted semantic-verifier experiment may use Workers AI behind deterministic policy; it is not part of the deployed proof site and there is no public prompt submission surface.

This repository does not claim that all model weights or third-party local runtimes are approved by any employer or organization. Confirm your own tool, model-license, and data-handling policies before using local inference for work.

Architecture

Cloudflare Worker
  ├── static proof site
  └── /health

pi extension
  └── AuthenticationRouter
        ├── PublicReadOnlyPolicy     → deterministic authorization boundary
        ├── ExampleRecipeMatcher     → candidate lookup only
        └── AuthorizedRecipeService  → execute or frontier fallback

The deterministic policy rejects obvious writes and sensitive/private contexts before local recipe execution. Matching never grants authority to a risky request. Workers AI remains a planned hosted verifier inside that policy boundary.
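The ordering described above can be sketched as a small composition in which the matcher is consulted only after the policy has allowed the prompt. The `Policy`, `Matcher`, and `RecipeService` interfaces, the `route` function, and the keyword deny list below are simplified, hypothetical stand-ins for the components in the diagram, not the repository's implementation.

```typescript
type RouteResult =
  | { kind: "local"; recipeId: string; output: string }
  | { kind: "escalate"; reason: string };

// Hypothetical stand-ins for the three components in the diagram.
interface Policy { allows(prompt: string): boolean; }
interface Matcher { match(prompt: string): string | undefined; } // returns a recipe id
interface RecipeService { run(recipeId: string): string; }

// Illustrative deny list only; a real policy would need to be far more careful.
const DENY = /\b(edit|write|commit|push|deploy|delete|refactor|secret|token|password|private|customer)\b/i;
const denyListPolicy: Policy = { allows: (prompt) => !DENY.test(prompt) };

function route(prompt: string, policy: Policy, matcher: Matcher, service: RecipeService): RouteResult {
  // 1. The deterministic policy decides whether local handling is permitted at all.
  if (!policy.allows(prompt)) return { kind: "escalate", reason: "rejected by deterministic policy" };
  // 2. The matcher only proposes a candidate; it never grants authority.
  const recipeId = matcher.match(prompt);
  if (recipeId === undefined) return { kind: "escalate", reason: "no approved recipe matched" };
  // 3. Only now is the recipe executed.
  return { kind: "local", recipeId, output: service.run(recipeId) };
}

// An over-eager matcher that claims a hit for every prompt, standing in for
// a semantic router that produces false hits.
const eagerMatcher: Matcher = { match: () => "git-status-summary" };

let runs = 0;
const countingService: RecipeService = {
  run: (id) => {
    runs++;
    return `ran ${id}`;
  },
};

const risky = route("commit and push my changes", denyListPolicy, eagerMatcher, countingService);
const safe = route("summarize my git status", denyListPolicy, eagerMatcher, countingService);

console.log(risky.kind, safe.kind, runs); // "escalate" "local" 1
```

Even with a matcher that is wrong on every risky prompt, the recipe service runs exactly once, for the prompt the policy allowed.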

API

`GET /health`

Returns the deployed service/version posture.
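For orientation, a health route in a Worker-style fetch handler could look like the sketch below. The payload field names (`ok`, `service`, `version`) are assumptions, since the response body is not documented here, and the real Worker also serves the static proof site rather than returning 404 for every other path.

```typescript
// Hypothetical payload; the actual field names may differ.
function healthPayload(): { ok: boolean; service: string; version: string } {
  return { ok: true, service: "cache-layer", version: "0.0.1" };
}

// Uses only the standard Request, Response, and URL globals.
const worker = {
  fetch(request: Request): Response {
    const url = new URL(request.url);
    if (request.method === "GET" && url.pathname === "/health") {
      return new Response(JSON.stringify(healthPayload()), {
        status: 200,
        headers: { "content-type": "application/json" },
      });
    }
    // Sketch only: static proof-site assets are not modeled here.
    return new Response("Not found", { status: 404 });
  },
};
// In a Worker module, this object would be the module's default export.

const healthResponse = worker.fetch(new Request("https://example.com/health"));
console.log(healthResponse.status); // 200
```

The handler accepts no request body and keeps no state, in line with the statement that the deployed site neither accepts nor stores prompts.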

What comes next

Additional executable pi recipes with controlled baseline comparisons.
Opt-in recipe persistence and local metrics.
Local reduction of large public tool outputs before premium inference.
Real session benchmarks measuring upstream token/cost avoidance against a baseline.
User-approved proposal flow for turning successful repeated tasks into recipes.

Not next: silent file-edit recipes or pretending a small router should solve novel engineering tasks.

Status

0.0.1. Public-data-only experiment. MIT.
