Universal testing and evaluation toolkit for AI agents.
```bash
pip install agentest
```
```python
import agentest

# Auto-record all LLM calls (works with Anthropic and OpenAI SDKs)
agentest.instrument()

# Run your agent and capture a trace
result, trace = agentest.run(my_agent, "Summarize README.md", task="Summarize")

# Evaluate it
for r in agentest.evaluate(trace):
    print(f"{r.evaluator}: {'PASS' if r.passed else 'FAIL'}")
```

That's it. Three calls to instrument, trace, and evaluate any agent — no matter what framework or LLM provider you use.
- Record & Replay — Capture real agent sessions, replay them deterministically without LLM calls
- Tool Mocking — Mock any tool with a fluent `.when(...).returns(...)` API (see the sketch after this list)
- 10 Built-in Evaluators — Task completion, safety, cost, latency, tool usage, LLM judges, and more
- Auto-Instrumentation — `agentest.instrument()` patches Anthropic/OpenAI clients with zero code changes
- Framework Adapters — LangChain, CrewAI, AutoGen, LlamaIndex, Claude Agent SDK, OpenAI Agents SDK
- MCP Server Testing — Protocol compliance, schema validation, and security testing
- pytest Plugin — Auto-registered fixtures, markers, and CLI flags
- Benchmarking — Compare pass rates, cost, and latency across models
- CLI — `agentest evaluate`, `agentest replay`, `agentest summary`, and more
- Web Dashboard — Browse and explore traces in your browser
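The Tool Mocking bullet above only names the `.when(...).returns(...)` chain; the rest of this sketch is an assumption about how a mock might be built and wired in. In particular, `MockTool` and the `tools=` argument to `agentest.run()` are hypothetical names used purely for illustration:

```python
import agentest

# Hypothetical constructor; only the .when(...).returns(...) chain comes
# from the feature list above.
weather_tool = (
    agentest.MockTool("get_weather")
    .when(city="Paris")
    .returns({"temp_c": 18, "conditions": "cloudy"})
)

# Hypothetical wiring: pass the mock into the run so the agent's tool calls
# never reach a real weather service. my_agent is the callable from the
# quick start.
result, trace = agentest.run(
    my_agent,
    "What's the weather in Paris right now?",
    task="Weather lookup",
    tools=[weather_tool],
)
```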
- Quick Start Guide — install to first passing test in under a minute
- Full Documentation — guides, API reference, and best practices
- Examples — working code you can run
- Best Practices — rollout order, project structure, CI/CD setup
License: MIT