Automated regulatory compliance testing for AI systems. Generate framework-aligned test suites, exercise live UIs and APIs, and produce multi-expert risk assessments - from one open-source pipeline.
AIRTA is a free, production-ready AI Risk Testing Agent. Point it at your chatbot, copilot, or API; build structured compliance tests from rubrics such as the EU AI Act and OECD; run them through browser automation with saved auth and live previews; then assess every response against regulatory mandates. Export structured reports to AIRTA Systems when you are ready to operationalize results.
python start.py
# → http://localhost:8000| Capability | What you get |
|---|---|
| Rubric-grounded tests | Compliance suites generated from JSON regulatory frameworks - aligned to mandates, not ad-hoc prompts. |
| Real target interaction | Playwright drives the same UI your users see: login, cookies, blockers, multi-turn flows. |
| API or browser | One component config: transport: ui for SaaS apps, transport: api for HTTP chat endpoints. |
| 10 prompting strategies | Zero-shot through tree-of-thoughts, few-shot, self-reflection, and more - each tuned for regulatory scenarios. |
| Risk pipeline | Compliance logs → multi-expert LangGraph assessment → mandate rollups → export. |
| Operator-first UI | FastAPI web app: discover selectors, save auth, run suites, watch live browser previews, resolve login walls. |
| Runs anywhere | Single start.py bootstraps venv, Chromium, and the server. CLI subcommands for CI and scripting. |
You own the stack: local execution, your API keys, your targets, your logs under browser-bot/sites/<site>/<component>/logs/.
- Persistent auth - Session cookies, localStorage, and sessionStorage captured to
auth.jsonand replayed on every new browser context (including multi-origin dashboards). - Tiered fetchers - Pool, cluster, and human-mimic browser tiers with stealth, locale, and retry handling.
- Parallel test runs - Configurable concurrency with per-slot live screenshots during suite execution.
- Login & blocker handling - Detects login walls, rate limits, and cookie banners; guides re-auth from the UI.
- Layered config - Global defaults → site → component overrides; documented in YAML with inline comments.
- Local smoke target -
test-target/mock chat app for end-to-end validation without an external SaaS.
rubrics/ generate-tests/ browser-bot/ pipeline/
(EU AI Act, compliance JSON → Playwright runs → compliance_log.json
OECD, NIST…) + auth + logs │
▼
risk-level-agent/
pipeline_report.json
│
▼
export → AIRTA Systems
- Generate - LLM builds regulatory compliance prompts from a framework + strategy.
- Discover - Record login, input, submit, and response selectors (or wire an API).
- Run - Submit every prompt; capture responses and HTTP metadata.
- Risk-assess - Multi-expert + judge scoring per mandate.
- Export - Push structured reports to your AIRTA Systems program (optional).
Requirements: Python 3.10+, network for LLM calls (GEMINI_API_KEY in .env).
git clone <your-repo-url>
cd AIRTA
python start.pyOpen the URL printed in the terminal (default http://localhost:8000). API reference: /api/docs.
start.py creates airta-venv/, installs dependencies, runs playwright install chromium once, and starts the web UI.
python test-target/app.py # mock chat on :3000
# In the UI: site localhost:3000, component test-target - see test-target/README.mdpython -m venv airta-venv && source airta-venv/bin/activate
pip install -r requirements.txt
playwright install chromium
python web/app.pyRun from the repo root so .config and .env load correctly.
- Create site - Domain or URL (e.g.
www.example.com). - Add login - Browser opens with saved auth; sign in; Save auth.
- Connect target - Discovery (UI) or API endpoint; AIRTA writes
config.yaml. - Generate tests - Pick strategy + framework (EU AI Act, OECD, FRIA, NIST, PLD, …).
- Run tests - Live previews, compliance log, optional inline risk assessment.
- Export - Send
pipeline_report.jsonto AIRTA Systems when configured.
Terminal users: python main.py for the interactive menu, or direct subcommands below.
Strategies (under generate-tests/strategies/):
| Strategy | Pattern |
|---|---|
zero_shot |
Single compliance prompt |
multi_shot |
Multi-turn regulatory dialogue |
few_shot |
In-context examples before the test case |
iterative |
Refinement across turns |
chain_of_thought |
Reasoning-style prompts |
prompt_chaining |
Chained steps |
tree_of_thoughts |
Branching exploration |
self_consistency |
Multiple samples |
self_reflection |
Critique-and-revise |
directional_stimulus |
Stimulus-guided framing |
Frameworks (under rubrics/): eu_ai_act, oecd, nist_ai_rmf, fria_core, fria_extended, pld, and more.
python main.py generate --strategy zero_shot --framework eu_ai_act
python main.py generate --all| Command | Purpose |
|---|---|
python main.py |
Interactive site/component menu |
python main.py generate … |
Build compliance test suites from rubrics |
python main.py discover |
Browser-bot login & config menu |
python main.py run <suite.json> --site S --component C |
Execute suite against target |
python main.py run … --assess |
Run + risk assessment in one step |
python main.py risk-assess <compliance_log.json> |
Assess existing log |
python main.py export <pipeline_report.json> … |
Upload to AIRTA Systems |
Example:
python main.py run generate-tests/zero_shot/eu_ai_act.json \
--site chatgpt.com --component chat --assess| Layer | Location |
|---|---|
| Defaults | config.defaults.yaml |
| Secrets / cache | .env |
| LLM model | .config |
| Site | browser-bot/sites/<site>/config.yaml |
| Component | browser-bot/sites/<site>/<component>/config.yaml |
See config.defaults.yaml and annotated examples under browser-bot/sites/.
| Path | Role |
|---|---|
start.py |
Bootstrap venv + launch web UI |
main.py |
CLI: generate, discover, run, risk-assess, export |
web/ |
FastAPI backend + SPA |
generate-tests/ |
Rubric → compliance suite generation |
browser-bot/ |
Playwright runner, sites, auth, fetchers |
risk-level-agent/ |
Multi-expert + judge assessment |
pipeline/ |
Log conversion, export helpers |
rubrics/ |
Regulatory framework JSON |
test-target/ |
Local mock chat for smoke tests |
AIRTA is free and open source under the AIRTA License:
- Free use - Run AIRTA as-is for any purpose, including commercial production.
- No commercial modification - You may not modify the software and use or distribute those modifications commercially.
- Unmodified redistribution - Share copies freely with the license intact.
See LICENSE for full terms.
Port in use - The app auto-picks the next free port, or free 8000 with lsof -i :8000 and kill.
Login / auth - Re-run Add login if sessions expire; site folder name must match the URL host (e.g. www.example.com).
Chromium - start.py installs Playwright Chromium automatically.
AIRTA - AI Risk Testing Agent. Regulatory compliance testing, real browsers, structured risk reports. Run python start.py and start validating your AI systems.
