Prompt version control and A/B testing for LLM applications.
pip install pguard-llm

You tweak a prompt. It gets better. You tweak it again. Now you don't remember what changed, which version was best, or how much each version costs to run.
pguard-llm fixes this: version your prompts, run them, compare them.
from pguard import Prompt
p = Prompt("summarize")
# Save versions
p.save("v1", "Summarize this: {text}", description="Simple")
p.save("v2", "In 3 bullet points, summarize: {text}", description="Structured")
# Run against an LLM
result = p.run(
    "v1",
    provider="openai",
    model="gpt-4o",
    api_key="sk-...",
    input_vars={"text": "Your article text here..."}
)
print(result.output)
print(result.cost_usd)
print(result.latency_ms)
# Compare v1 vs v2
comparison = p.compare("v1", "v2")
print(comparison.summary())

| Provider | Install | Models |
|---|---|---|
| OpenAI | pip install openai | gpt-4o, gpt-4o-mini, ... |
| Anthropic | pip install anthropic | claude-sonnet-4, claude-haiku-4 |
| Gemini | pip install google-genai | gemini-2.5-flash, gemini-1.5-pro |
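
Each provider in the table needs its own SDK, so it can help to check which ones are present before choosing a `provider` value. The following is a minimal sketch using only the standard library. The names `PROVIDER_MODULES`, `is_importable`, and `installed_providers` are hypothetical helpers for illustration and are not part of pguard-llm.

```python
import importlib.util

# Import names for the SDKs listed in the provider table.
# The google-genai package is imported as `google.genai`.
PROVIDER_MODULES = {
    "openai": "openai",
    "anthropic": "anthropic",
    "gemini": "google.genai",
}


def is_importable(module_name: str) -> bool:
    """Return True if the module can be located in the current environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # For dotted names, find_spec imports the parent package first and
        # raises ModuleNotFoundError if that parent is missing.
        return False


def installed_providers() -> dict[str, bool]:
    """Map each provider name to whether its SDK appears to be installed."""
    return {name: is_importable(mod) for name, mod in PROVIDER_MODULES.items()}


if __name__ == "__main__":
    for name, ok in installed_providers().items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```

A provider reported as missing can be installed with the command from the Install column.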
pip install "pguard-llm[openai]"
pip install "pguard-llm[anthropic]"
pip install "pguard-llm[gemini]"
pip install "pguard-llm[all]"

# File storage (default): zero setup
p = Prompt("summarize", storage="file")
# SQLite: better for querying
p = Prompt("summarize", storage="sqlite")

comparison = p.compare("v1", "v2")
summary = comparison.summary()
# summary contains:
# - latency_ms: avg latency per version + winner
# - cost_usd: avg cost per version + winner
# - quality_score: avg quality per version + winner
# - tokens_avg: avg tokens per version

pguard list # list all prompts
pguard versions summarize # list versions
pguard show summarize v1 # show template
pguard runs summarize v1 # show run history
pguard compare summarize v1 v2 # compare versions

result.output # LLM response text
result.cost_usd # Cost in USD
result.latency_ms # Latency in milliseconds
result.tokens_in # Input tokens
result.tokens_out # Output tokens
result.quality_score # Quality score (0-1)
result.provider # Provider used
result.model # Model used

License: MIT