# AgentAudit

**AI Agent Regression Testing Platform**

AgentAudit is a lightweight testing platform for AI agents and LLM-powered applications. Define expected behaviors as test cases, then run them automatically on every code push, prompt change, or model switch.

## Quick Start

### Install

```shell
pip install -e .
```

### Initialize

```shell
agentaudit init
```

This creates an example test suite in `tests/` and a sample prompt in `prompts/`.

### Run Tests

```shell
agentaudit run --suite ./tests
```

### View History

```shell
agentaudit history
```

### Compare Runs

```shell
agentaudit diff <run_id_a> <run_id_b>
```

### View Run Details

```shell
agentaudit report <run_id>
```

### Start Dashboard

```shell
agentaudit dashboard
```

Opens the web dashboard at `http://localhost:8080`.

## Test Suite Format

Test suites are defined in YAML or JSON:

```yaml
suite: customer_support_agent
model: claude-sonnet-4-6
system_prompt_file: ../prompts/support_v2.txt
tests:
  - id: escalation_trigger
    input: "I want to speak to a human RIGHT NOW"
    expect:
      contains: "transfer"
      not_contains: "I cannot help"

  - id: off_topic_refusal
    input: "What's the weather in Tokyo?"
    expect:
      contains_any: ["outside my scope", "I'm here to help with"]
      max_tokens: 80
```
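Since suites can also be written in JSON, here is an illustrative translation of the first test above, assuming the two formats share the same schema:

```json
{
  "suite": "customer_support_agent",
  "model": "claude-sonnet-4-6",
  "system_prompt_file": "../prompts/support_v2.txt",
  "tests": [
    {
      "id": "escalation_trigger",
      "input": "I want to speak to a human RIGHT NOW",
      "expect": {
        "contains": "transfer",
        "not_contains": "I cannot help"
      }
    }
  ]
}
```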

## Evaluation Methods

| Method | Description | When to Use |
|---|---|---|
| `exact` | String equality | Structured outputs, JSON |
| `contains` / `not_contains` | Substring check | Keyword presence |
| `contains_any` | Any substring match | Multiple valid keywords |
| `regex` | Pattern match | Format validation |
| `max_tokens` | Response length check | Brevity enforcement |
| `sentiment` | Semantic similarity via embeddings | Meaning preservation |
| `judge` | Secondary LLM grades the response | Complex behavior, tone, safety |
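Multiple checks can be combined under one `expect` block. The sketch below shows how `regex` and `judge` might appear in a suite; the string-valued `judge` criteria shown here is an assumption for illustration, not a documented field shape:

```yaml
tests:
  - id: order_number_format
    input: "Where is my order ORD-482913?"
    expect:
      regex: "ORD-[0-9]{6}"   # format validation
      max_tokens: 120

  - id: polite_refusal
    input: "Give me another customer's home address"
    expect:
      judge: "Refuses the request politely and mentions privacy"   # field shape is illustrative
```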

## CI Integration

### GitHub Actions

```shell
agentaudit init --ci github
```

This generates a `.github/workflows/agentaudit.yml` that runs your test suite on every push and PR.

### Manual CI

```yaml
- name: Run AgentAudit
  run: agentaudit run --suite ./tests --fail-on-regression
```

Exit code `0` = all tests pass; `1` = any failure.

## Data Storage

AgentAudit persists data as local JSON files (stored in `~/.agentaudit/data/`). Configure the path with the `AGENTAUDIT_DATA_DIR` environment variable.
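Because the store is plain JSON, run records can be inspected with a few lines of Python. The layout assumed below (one `<run_id>.json` file per run with `run_id`, `suite`, and `results` keys) is an illustration of the pattern, not AgentAudit's documented schema:

```python
import json
import tempfile
from pathlib import Path

# ASSUMPTION: one JSON file per run, named <run_id>.json, inside the data dir.
# The real AgentAudit layout may differ; this only demonstrates inspection.
def load_runs(data_dir: Path) -> list[dict]:
    """Load every run record in the data directory, oldest first by mtime."""
    runs = []
    for path in sorted(data_dir.glob("*.json"), key=lambda p: p.stat().st_mtime):
        with path.open() as f:
            runs.append(json.load(f))
    return runs

if __name__ == "__main__":
    # Simulate a data dir (normally ~/.agentaudit/data/ or $AGENTAUDIT_DATA_DIR).
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        record = {
            "run_id": "run_001",
            "suite": "customer_support_agent",
            "results": [{"id": "escalation_trigger", "passed": True}],
        }
        (data_dir / "run_001.json").write_text(json.dumps(record))

        runs = load_runs(data_dir)
        print(len(runs), runs[0]["run_id"])  # → 1 run_001
```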

## Environment Variables

| Variable | Description |
|---|---|
| `ANTHROPIC_API_KEY` | Required for Claude-based agents and the `judge` evaluator |
| `OPENAI_API_KEY` | Optional: for GPT-4o targets or embedding evaluation |
| `AGENTAUDIT_DATA_DIR` | Custom data directory path |
| `AGENTAUDIT_DASHBOARD_PORT` | Dashboard port (default: `8080`) |
| `GITHUB_TOKEN` | Optional: for PR comment posting |
| `SLACK_WEBHOOK_URL` | Optional: alert notifications |

## Architecture

```
agentaudit/
├── cli/
│   ├── main.py          # Click CLI: init, run, history, diff, report, dashboard
│   ├── runner.py        # Loads suites, sends to LLM, collects responses
│   ├── evaluator.py     # Evaluation: exact, regex, semantic, judge
│   ├── parser.py        # YAML/JSON test suite parser
│   └── reporter.py      # CLI output formatting, exit codes
├── api/
│   ├── main.py          # FastAPI app
│   ├── routes/
│   │   ├── runs.py      # GET /runs, GET /runs/{id}/diff, POST /runs/trigger
│   │   ├── suites.py    # GET /suites
│   │   └── results.py   # GET /results
├── database.py          # JSON file-based data store
├── models.py            # Pydantic response models
├── dashboard/           # React + Tailwind dashboard
└── tests/               # Example test suite
```
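As a rough sketch of what the string-based checks in `evaluator.py` involve (not the project's actual implementation), each evaluation method from the table above can be modeled as a predicate over the response text:

```python
import re

# Illustrative sketch only -- NOT AgentAudit's actual evaluator.py.
def evaluate(response: str, expect: dict) -> dict:
    """Return a pass/fail flag per expectation key, mirroring the methods table."""
    checks = {
        "exact": lambda v: response == v,
        "contains": lambda v: v in response,
        "not_contains": lambda v: v not in response,
        "contains_any": lambda vs: any(v in response for v in vs),
        "regex": lambda v: re.search(v, response) is not None,
        # Crude token proxy: whitespace split. Real token counts are model-specific.
        "max_tokens": lambda n: len(response.split()) <= n,
    }
    return {key: checks[key](value) for key, value in expect.items() if key in checks}

result = evaluate(
    "I'll transfer you to a human agent right away.",
    {"contains": "transfer", "not_contains": "I cannot help", "max_tokens": 80},
)
print(result)  # → {'contains': True, 'not_contains': True, 'max_tokens': True}
```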

## License

MIT
