bUxEE/prompt-cache-php

prompt-cache

A small PHP library for caching LLM responses so you don't keep paying to get the same answer back. It does two things: exact match caching (hash the prompt, look it up) and semantic caching (compare embeddings when the exact match misses).

$reply = PromptCache::remember($prompt, function () use ($client, $prompt) {
    return $client->chat($prompt);
});

First call runs the closure. Second call gives you the saved reply. That's it.

Why I wrote this

I had a side project that talked to OpenAI a lot, and the bill was getting silly considering most of the prompts were variations of the same handful of questions. I wanted something like Cache::remember() but for LLM stuff. I couldn't find anything that wasn't tied to a specific SDK or buried inside some big agent framework, so I made one.

Requirements

  • PHP 8.2 or newer
  • PDO with the sqlite driver (for the default storage)
  • predis/predis if you want to use Redis instead
  • cURL if you use the OpenAI or Ollama embedding providers

Install

composer require prompt-cache/prompt-cache

The first time you call it, it creates a sqlite file at storage/prompt-cache.sqlite and you're done. No config to write unless you want to.

Basic usage

use PromptCache\PromptCache;

$prompt = 'Summarise this article in three bullet points: ...';

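// $myOpenAi is whatever client object you already have; it has to be
// captured with use() or it won't exist inside the closure.
$summary = PromptCache::remember($prompt, function () use ($myOpenAi, $prompt) {
    return $myOpenAi->chat($prompt);  // or anthropic, mistral, whatever
});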

The prompt gets normalised (extra whitespace flattened, timestamps and UUIDs replaced with placeholders) before it's hashed, so two prompts that only differ by formatting still hit the same cache row.
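
To make the normalise-then-hash step concrete, here is a minimal sketch of the idea. It is not the library's actual code: the function names, the placeholder strings, the regular expressions and the choice of SHA-256 are all assumptions made for illustration.

```php
<?php

/**
 * Hypothetical normaliser: the real rules inside prompt-cache may differ.
 */
function normalise_prompt(string $prompt): string
{
    // Replace UUIDs (8-4-4-4-12 hex groups) with a fixed placeholder.
    $prompt = preg_replace(
        '/\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/i',
        '{uuid}',
        $prompt
    );

    // Replace ISO 8601-style timestamps such as 2024-05-01T12:30:00Z.
    $prompt = preg_replace(
        '/\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2})?(\.\d+)?(Z|[+-]\d{2}:?\d{2})?/',
        '{timestamp}',
        $prompt
    );

    // Flatten runs of whitespace into single spaces and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $prompt));
}

/**
 * Hypothetical cache key: a SHA-256 hash of the normalised prompt.
 */
function prompt_cache_key(string $prompt): string
{
    return hash('sha256', normalise_prompt($prompt));
}
```

With this sketch, two prompts that differ only in spacing, timestamp and UUID produce the same key, which is the behaviour described above.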

Semantic cache

When the exact hash doesn't match, semantic() looks at previously stored prompts and picks the closest one. If the similarity is above the threshold (0.92 by default), you get the old answer back.

$reply = PromptCache::semantic(
    $prompt,
    fn () => $client->chat($prompt)
);

Tighten or loosen the threshold per call if you want:

$reply = PromptCache::semantic(
    $prompt,
    fn () => $client->chat($prompt),
    0.88
);
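
If you're wondering what the threshold is compared against, the check boils down to cosine similarity between embedding vectors. The sketch below shows the general technique under stated assumptions: `cosine_similarity()` and `find_semantic_hit()` are hypothetical names, the entry layout is made up, and treating a score equal to the threshold as a hit is my choice here, not something the library promises.

```php
<?php

/**
 * Cosine similarity between two equal-length numeric vectors.
 */
function cosine_similarity(array $a, array $b): float
{
    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and the same length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }

    // A zero vector has no direction, so call it "not similar".
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Hypothetical lookup: return the reply of the closest stored prompt,
 * or null when nothing reaches the threshold.
 *
 * @param float[] $query
 * @param array<int, array{embedding: float[], reply: string}> $entries
 */
function find_semantic_hit(array $query, array $entries, float $threshold = 0.92): ?string
{
    $bestReply = null;
    $bestScore = -1.0;

    foreach ($entries as $entry) {
        $score = cosine_similarity($query, $entry['embedding']);
        if ($score > $bestScore) {
            $bestScore = $score;
            $bestReply = $entry['reply'];
        }
    }

    // Assumption: a score exactly at the threshold counts as a hit.
    return $bestScore >= $threshold ? $bestReply : null;
}
```

A linear scan like this is fine for a small cache; a real store may well do something smarter.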

I deliberately didn't depend on any LLM SDK. You hand me a closure, I either run it or I don't. Use whatever client you like.

Streaming

For streaming responses you get a Generator back. First call streams from upstream while quietly stitching the chunks together for the cache. Second call replays from the cache, same chunked interface, no upstream traffic.

foreach (PromptCache::stream($prompt, fn () => $client->stream($prompt)) as $chunk) {
    echo $chunk;
}
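
The stream-then-replay behaviour can be sketched with a plain generator. This is an illustration only: `StreamCacheSketch` is a hypothetical class with an in-memory array standing in for a real store, and it keeps the chunks as a list, which may not be how the library stores them.

```php
<?php

final class StreamCacheSketch
{
    /** @var array<string, string[]> chunks keyed by cache key */
    private array $chunksByKey = [];

    /**
     * @param callable(): iterable<string> $upstream produces the live chunks
     */
    public function stream(string $key, callable $upstream): Generator
    {
        if (isset($this->chunksByKey[$key])) {
            // Cache hit: replay the saved chunks, upstream is never called.
            yield from $this->chunksByKey[$key];
            return;
        }

        $buffer = [];
        foreach ($upstream() as $chunk) {
            $buffer[] = $chunk; // keep a copy for the cache
            yield $chunk;       // pass the chunk straight through
        }

        // Store only after the upstream stream finished without throwing.
        $this->chunksByKey[$key] = $buffer;
    }
}
```

One design point worth noticing: if the consumer stops iterating halfway, nothing is stored, so a partial reply never ends up in the cache.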

Stats

I added counters mostly because I wanted to see for myself how much I was actually saving. There's a stats() method that gives you a running total:

print_r(PromptCache::stats());
/*
Array (
    [requests]            => 1200
    [exact_hits]          => 400
    [semantic_hits]       => 300
    [misses]              => 500
    [tokens_saved]        => 2838282
    [estimated_usd_saved] => 482.22
)
*/

The token count is a back-of-the-envelope figure (4 chars per token), not a real tokeniser, so treat the dollar number as a rough sanity check, not a billing reconciliation.
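
For a feel of how rough the estimate is, here is the 4-chars-per-token heuristic as a small sketch. The function names are hypothetical, rounding up is an assumption, and the price per million tokens is something you supply yourself, since the README doesn't say which rate the library uses.

```php
<?php

/**
 * Rough token estimate: about 4 characters per token, rounded up.
 * strlen() counts bytes, so multi-byte text is overestimated a little.
 */
function estimate_tokens(string $text): int
{
    return (int) ceil(strlen($text) / 4);
}

/**
 * Rough saving in USD for a caller-supplied price per million tokens.
 */
function estimate_usd_saved(int $tokens, float $usdPerMillionTokens): float
{
    return round($tokens / 1_000_000 * $usdPerMillionTokens, 2);
}
```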

Debug

If you want to see what it's actually doing:

PromptCache::debug(true);

You'll see hits, misses, similarity scores and embedding timings written to STDERR (or error_log() outside of CLI). Or pass a callable and route the events yourself:

PromptCache::debug(function ($event, $data) {
    Log::info("prompt-cache.$event", $data);
});

Storage drivers

Three options ship with the package. Pick whichever fits.

| Driver | Class | When to use it |
| --- | --- | --- |
| SQLite | PromptCache\Stores\SqliteStore | Default. No setup. Good for most apps. |
| File | PromptCache\Stores\FileStore | One JSON file. Handy for shipping a warm cache. |
| Redis | PromptCache\Stores\RedisStore | When you have multiple workers sharing a cache. |

Want a different backend? Implement PromptCache\Contracts\Store. It has eight methods, none of them surprising.

Embedding providers

| Provider | Class | Notes |
| --- | --- | --- |
| Null | PromptCache\Embeddings\NullEmbeddingProvider | Local CRC32 thing. No keys needed. Rough quality. |
| OpenAI | PromptCache\Embeddings\OpenAIEmbeddingProvider | Uses text-embedding-3-small by default. |
| Ollama | PromptCache\Embeddings\OllamaEmbeddingProvider | Hits a local Ollama server. |

The Null provider exists so the semantic API works out of the box without anyone having to wire up an API key. It's not great. Use one of the real ones when it matters.
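
To show why a hash-based local embedding is only rough, here is one way such a thing could work: hash each word with crc32() into a fixed number of buckets and normalise the result. This is a guess at the general technique (feature hashing), not the code inside NullEmbeddingProvider; the function name and the 64 dimensions are assumptions.

```php
<?php

/**
 * Hypothetical hashed bag-of-words embedding.
 *
 * @return float[] vector with $dimensions entries, unit length unless empty
 */
function hashed_embedding(string $text, int $dimensions = 64): array
{
    $vector = array_fill(0, $dimensions, 0.0);

    // Split on anything that isn't a lowercase letter or digit.
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($words as $word) {
        // Mask to 31 bits so the bucket index is never negative.
        $bucket = (crc32($word) & 0x7fffffff) % $dimensions;
        $vector[$bucket] += 1.0;
    }

    $norm = sqrt(array_sum(array_map(fn (float $v) => $v * $v, $vector)));
    if ($norm == 0.0) {
        return $vector; // no words, leave the zero vector as-is
    }

    return array_map(fn (float $v) => $v / $norm, $vector);
}
```

A vector like this only notices shared words. "Cancel my plan" and "end my subscription" look mostly unrelated to it, which is the kind of miss a real embedding model avoids.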

Laravel

The service provider auto-discovers. If you want to tweak settings, publish the config:

php artisan vendor:publish --tag=prompt-cache-config

Then use the facade wherever:

use PromptCache;

$reply = PromptCache::semantic($prompt, fn () => $client->chat($prompt));

Or set things from .env:

PROMPT_CACHE_DRIVER=redis
PROMPT_CACHE_EMBEDDINGS=openai
OPENAI_API_KEY=sk-...
PROMPT_CACHE_THRESHOLD=0.9

Examples

There's a folder of runnable scripts under examples/:

  • 01_exact_cache.php - the most basic case
  • 02_semantic_cache.php - rephrased prompt that still hits
  • 03_streaming.php - stream, cache, replay
  • 04_stats_and_debug.php - counters and the debug logger
  • 05_openai_real.php - real OpenAI embeddings, needs OPENAI_API_KEY

Run them with php examples/01_exact_cache.php from the package root. They use a tiny autoloader so they work without composer install.

Tests

composer install
vendor/bin/pest

The suite covers the exact cache, semantic cache, cosine similarity, streaming, sqlite persistence, the file store, the normaliser and the stats counters.

What this isn't

It's a cache. There's no agent framework here, no RAG helpers, no chain-of-thought thing. Bring your own LLM library and put this in front of it.

License

MIT. See LICENSE.
