AgentAudit

AI Agent Regression Testing Platform

AgentAudit is a lightweight testing platform for AI agents and LLM-powered applications. Define expected behaviors as test cases, then run them automatically on every code push, prompt change, or model switch.

Quick Start

Install

pip install -e .

Initialize

agentaudit init

This creates an example test suite in tests/ and a sample prompt in prompts/.

Run Tests

agentaudit run --suite ./tests

View History

agentaudit history

Compare Runs

agentaudit diff <run_id_a> <run_id_b>

View Run Details

agentaudit report <run_id>

Start Dashboard

agentaudit dashboard

Opens the web dashboard at http://localhost:8080 by default. Set AGENTAUDIT_DASHBOARD_PORT to use a different port.

Test Suite Format

Test suites are defined in YAML or JSON:

suite: customer_support_agent
model: claude-sonnet-4-6
system_prompt_file: ../prompts/support_v2.txt
tests:
  - id: escalation_trigger
    input: "I want to speak to a human RIGHT NOW"
    expect:
      contains: "transfer"
      not_contains: "I cannot help"

  - id: off_topic_refusal
    input: "What's the weather in Tokyo?"
    expect:
      contains_any: ["outside my scope", "I'm here to help with"]
      max_tokens: 80
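
Since suites can also be written in JSON, the sketch below shows one way the YAML example above might look in that format and how it could be loaded with the standard library. It assumes the JSON keys mirror the YAML keys one-to-one. The `load_suite` helper and its validation rules (required keys, unique ids) are hypothetical and are not taken from AgentAudit's parser.

```python
import json

# Hypothetical JSON rendering of the YAML suite above. Key names are copied
# from the YAML example; the one-to-one mapping is an assumption.
SUITE_JSON = """
{
  "suite": "customer_support_agent",
  "model": "claude-sonnet-4-6",
  "system_prompt_file": "../prompts/support_v2.txt",
  "tests": [
    {
      "id": "escalation_trigger",
      "input": "I want to speak to a human RIGHT NOW",
      "expect": {"contains": "transfer", "not_contains": "I cannot help"}
    },
    {
      "id": "off_topic_refusal",
      "input": "What's the weather in Tokyo?",
      "expect": {
        "contains_any": ["outside my scope", "I'm here to help with"],
        "max_tokens": 80
      }
    }
  ]
}
"""


def load_suite(text: str) -> dict:
    """Parse a suite and check that the fields used in the example exist."""
    suite = json.loads(text)
    # Assumption: these three keys are mandatory; the README does not say so.
    for key in ("suite", "model", "tests"):
        if key not in suite:
            raise ValueError(f"suite is missing key: {key}")
    ids = [test["id"] for test in suite["tests"]]
    if len(ids) != len(set(ids)):
        raise ValueError("test ids must be unique")
    return suite


suite = load_suite(SUITE_JSON)
print(suite["suite"], [test["id"] for test in suite["tests"]])
```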

Evaluation Methods

Method	Description	When to Use
`exact`	String equality	Structured outputs, JSON
`contains` / `not_contains`	Substring check	Keyword presence
`contains_any`	Any substring match	Multiple valid keywords
`regex`	Pattern match	Format validation
`max_tokens`	Response length check	Brevity enforcement
`semantic`	Semantic similarity via embeddings	Meaning preservation
`judge`	Secondary LLM grades the response	Complex behavior, tone, safety
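
To make the deterministic methods in the table concrete, here is a minimal sketch of how such checks could be evaluated against a response. It is not AgentAudit's evaluator.py: the `evaluate` and `count_tokens` names are hypothetical, matching is assumed to be case-sensitive, and token counting is approximated by whitespace-separated words. The embedding-based and judge methods are left out because they need a model call.

```python
import re


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(text.split())


def evaluate(response: str, expect: dict) -> dict:
    """Return {check_name: passed} for each deterministic check in expect."""
    checks = {
        "exact": lambda v: response == v,
        "contains": lambda v: v in response,
        "not_contains": lambda v: v not in response,
        "contains_any": lambda v: any(s in response for s in v),
        "regex": lambda v: re.search(v, response) is not None,
        "max_tokens": lambda v: count_tokens(response) <= v,
    }
    results = {}
    for name, value in expect.items():
        if name not in checks:
            raise ValueError(f"unsupported check in this sketch: {name}")
        results[name] = checks[name](value)
    return results


response = "Let me transfer you to a human agent right away."
outcome = evaluate(
    response,
    {"contains": "transfer", "not_contains": "I cannot help", "max_tokens": 80},
)
print(outcome)
```

A test case would then pass only if every value in the returned dictionary is true.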

CI Integration

GitHub Actions

agentaudit init --ci github

This generates a .github/workflows/agentaudit.yml that runs your test suite on every push and PR.

Manual CI

- name: Run AgentAudit
  run: agentaudit run --suite ./tests --fail-on-regression

The command exits with code 0 when all tests pass and 1 if any test fails.
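
For CI systems driven by a script rather than a YAML step, the exit code can be checked directly. The sketch below wraps a command with `subprocess.run` and turns its return code into a pass/fail result. The `run_and_report` helper is hypothetical, and stand-in commands are used so the example runs without AgentAudit installed.

```python
import subprocess
import sys


def run_and_report(cmd: list[str]) -> bool:
    """Run a command and translate its exit code into pass/fail."""
    completed = subprocess.run(cmd)
    passed = completed.returncode == 0
    if passed:
        print("suite passed")
    else:
        print(f"suite failed (exit code {completed.returncode})")
    return passed


# In CI the command would be the one shown above:
#   ["agentaudit", "run", "--suite", "./tests", "--fail-on-regression"]
# The stand-in commands below only simulate exit codes 0 and 1.
ok = run_and_report([sys.executable, "-c", "raise SystemExit(0)"])
not_ok = run_and_report([sys.executable, "-c", "raise SystemExit(1)"])
```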

Data Storage

AgentAudit uses local JSON files for data persistence (stored in ~/.agentaudit/data/). Configure the path with the AGENTAUDIT_DATA_DIR environment variable.
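
The lookup described above can be sketched as follows. This is an illustration of the documented behavior, not the project's database.py: `resolve_data_dir` is a hypothetical helper, and treating an empty variable the same as an unset one is an assumption.

```python
import os
from pathlib import Path

DEFAULT_DATA_DIR = "~/.agentaudit/data"


def resolve_data_dir(env=None) -> Path:
    """Return the data directory, preferring AGENTAUDIT_DATA_DIR when set."""
    env = os.environ if env is None else env
    # Assumption: an empty value falls back to the default directory.
    raw = env.get("AGENTAUDIT_DATA_DIR") or DEFAULT_DATA_DIR
    return Path(raw).expanduser()


print(resolve_data_dir({}))
print(resolve_data_dir({"AGENTAUDIT_DATA_DIR": "/tmp/agentaudit-ci"}))
```

Pointing the variable at a per-job directory is one way to keep CI runs from mixing their history with local runs.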

Environment Variables

Variable	Description
`ANTHROPIC_API_KEY`	Required for Claude-based agents and the judge evaluator
`OPENAI_API_KEY`	Optional: for GPT-4o targets or embedding evaluation
`AGENTAUDIT_DATA_DIR`	Custom data directory path
`AGENTAUDIT_DASHBOARD_PORT`	Dashboard port (default: 8080)
`GITHUB_TOKEN`	Optional: for PR comment posting
`SLACK_WEBHOOK_URL`	Optional: alert notifications
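
A pre-flight check along the lines of the table above can catch a missing key before any test runs. The sketch below is hypothetical (`check_environment` is not part of AgentAudit) and simply encodes the table: one required key, a port that defaults to 8080, and three optional integrations.

```python
import os

OPTIONAL_VARS = ["OPENAI_API_KEY", "GITHUB_TOKEN", "SLACK_WEBHOOK_URL"]


def check_environment(env=None) -> dict:
    """Validate the variables from the table and summarize what is set."""
    env = os.environ if env is None else env
    if not env.get("ANTHROPIC_API_KEY"):
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    port = int(env.get("AGENTAUDIT_DASHBOARD_PORT", "8080"))
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid dashboard port: {port}")
    return {
        "dashboard_port": port,
        "optional_set": [name for name in OPTIONAL_VARS if env.get(name)],
    }


# Example with placeholder values instead of the real process environment.
summary = check_environment(
    {"ANTHROPIC_API_KEY": "placeholder", "GITHUB_TOKEN": "placeholder"}
)
print(summary)
```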

Architecture

agentaudit/
├── cli/
│   ├── main.py          # Click CLI: init, run, history, diff, report, dashboard
│   ├── runner.py        # Loads suites, sends to LLM, collects responses
│   ├── evaluator.py     # Evaluation: exact, regex, semantic, judge
│   ├── parser.py        # YAML/JSON test suite parser
│   └── reporter.py      # CLI output formatting, exit codes
├── api/
│   ├── main.py          # FastAPI app
│   └── routes/
│       ├── runs.py      # GET /runs, GET /runs/{id}/diff, POST /runs/trigger
│       ├── suites.py    # GET /suites
│       └── results.py   # GET /results
├── database.py          # JSON file-based data store
├── models.py            # Pydantic response models
├── dashboard/           # React + Tailwind dashboard
└── tests/               # Example test suite

License

MIT
