
Agentest

Universal testing and evaluation toolkit for AI agents.



pip install agentest

Get Started

import agentest

# Auto-record all LLM calls (works with Anthropic and OpenAI SDKs)
agentest.instrument()

# Run your agent and capture a trace
result, trace = agentest.run(my_agent, "Summarize README.md", task="Summarize")

# Evaluate it
for r in agentest.evaluate(trace):
    print(f"{r.evaluator}: {'PASS' if r.passed else 'FAIL'}")

That's it. Three calls to instrument, trace, and evaluate any agent — no matter what framework or LLM provider you use.

What You Get

  • Record & Replay — Capture real agent sessions, replay them deterministically without LLM calls (see the sketch after this list)
  • Tool Mocking — Mock any tool with a fluent API: .when(...).returns(...) (sketched below)
  • 10 Built-in Evaluators — Task completion, safety, cost, latency, tool usage, LLM judges, and more
  • Auto-Instrumentation — agentest.instrument() patches Anthropic/OpenAI clients with zero code changes
  • Framework Adapters — LangChain, CrewAI, AutoGen, LlamaIndex, Claude Agent SDK, OpenAI Agents SDK
  • MCP Server Testing — Protocol compliance, schema validation, and security testing
  • pytest Plugin — Auto-registered fixtures, markers, and CLI flags (example test below)
  • Benchmarking — Compare pass rates, cost, and latency across models
  • CLI — agentest evaluate, agentest replay, agentest summary, and more
  • Web Dashboard — Browse and explore traces in your browser
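
A record-and-replay session might look like the sketch below. agentest.run and the returned trace come from the Get Started example above; agentest.replay is an assumed Python counterpart to the agentest replay CLI command, not an API confirmed by this README.

import agentest

agentest.instrument()

# Record: run the agent live and capture a trace of every LLM call
result, trace = agentest.run(my_agent, "Summarize README.md", task="Summarize")

# Replay (hypothetical name): re-run the captured trace deterministically,
# without making live LLM calls
replayed = agentest.replay(trace)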
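
For tool mocking, only the .when(...).returns(...) chain is taken from the feature list above; mock_tool and the tools= wiring below are illustrative names, not confirmed API.

import agentest

# Hypothetical constructor; the fluent .when(...).returns(...) chain is
# the documented shape
search = agentest.mock_tool("web_search")
search.when(query="agentest").returns({"hits": ["https://pypi.org/project/agentest/"]})

# Hypothetical wiring: run the agent with the mock standing in for the real tool
result, trace = agentest.run(my_agent, "Look up agentest", tools=[search])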
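
And a pytest test built from the calls shown in Get Started. The plugin's auto-registered fixtures and markers are not named in this README, so this sketch sticks to the public functions above:

import agentest

def test_agent_passes_all_evaluators():
    agentest.instrument()
    result, trace = agentest.run(my_agent, "Summarize README.md", task="Summarize")
    assert all(r.passed for r in agentest.evaluate(trace))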


License

MIT
