AI Agent Regression Testing Platform
AgentAudit is a lightweight testing platform for AI agents and LLM-powered applications. Define expected behaviors as test cases, then run them automatically on every code push, prompt change, or model switch.
pip install -e .agentaudit initThis creates an example test suite in tests/ and a sample prompt in prompts/.
agentaudit run --suite ./testsagentaudit historyagentaudit diff <run_id_a> <run_id_b>agentaudit report <run_id>agentaudit dashboardOpens the web dashboard at http://localhost:8080.
Test suites are defined in YAML or JSON:
suite: customer_support_agent
model: claude-sonnet-4-6
system_prompt_file: ../prompts/support_v2.txt
tests:
- id: escalation_trigger
input: "I want to speak to a human RIGHT NOW"
expect:
contains: "transfer"
not_contains: "I cannot help"
- id: off_topic_refusal
input: "What's the weather in Tokyo?"
expect:
contains_any: ["outside my scope", "I'm here to help with"]
max_tokens: 80| Method | Description | When to Use |
|---|---|---|
exact |
String equality | Structured outputs, JSON |
contains / not_contains |
Substring check | Keyword presence |
contains_any |
Any substring match | Multiple valid keywords |
regex |
Pattern match | Format validation |
max_tokens |
Response length check | Brevity enforcement |
sentiment |
Semantic similarity via embeddings | Meaning preservation |
judge |
Secondary LLM grades the response | Complex behavior, tone, safety |
agentaudit init --ci githubThis generates a .github/workflows/agentaudit.yml that runs your test suite on every push and PR.
- name: Run AgentAudit
run: agentaudit run --suite ./tests --fail-on-regressionExit code 0 = all pass, 1 = any failure.
AgentAudit uses local JSON files for data persistence (stored in ~/.agentaudit/data/). Configure the path with the AGENTAUDIT_DATA_DIR environment variable.
| Variable | Description |
|---|---|
ANTHROPIC_API_KEY |
Required for Claude-based agents + judge evaluator |
OPENAI_API_KEY |
Optional: for GPT-4o targets or embedding evaluation |
AGENTAUDIT_DATA_DIR |
Custom data directory path |
AGENTAUDIT_DASHBOARD_PORT |
Dashboard port (default: 8080) |
GITHUB_TOKEN |
Optional: for PR comment posting |
SLACK_WEBHOOK_URL |
Optional: alert notifications |
agentaudit/
├── cli/
│ ├── main.py # Click CLI: init, run, history, diff, report, dashboard
│ ├── runner.py # Loads suites, sends to LLM, collects responses
│ ├── evaluator.py # Evaluation: exact, regex, semantic, judge
│ ├── parser.py # YAML/JSON test suite parser
│ └── reporter.py # CLI output formatting, exit codes
├── api/
│ ├── main.py # FastAPI app
│ ├── routes/
│ │ ├── runs.py # GET /runs, GET /runs/{id}/diff, POST /runs/trigger
│ │ ├── suites.py # GET /suites
│ │ └── results.py # GET /results
├── database.py # JSON file-based data store
├── models.py # Pydantic response models
├── dashboard/ # React + Tailwind dashboard
└── tests/ # Example test suite
MIT