Argus — AI-Native Web Testing

Every bug has nowhere to hide.

Stop writing tests. Start describing them.

Argus is an open-source, AI-native test platform that lets you test web applications by simply describing what you want to check — in plain English. No Selenium. No Playwright scripts. No page objects to maintain.

argus run --goal "Submit the contact form and verify the success message" \
          --url "https://example.com/contact"

An LLM plans the browser actions, Playwright executes them, and a second LLM evaluates whether the goal was met — with screenshots, DOM snapshots, and structured reports at every step. When something fails, Argus recovers and retries instead of giving up.

Built for teams that want AI-driven test automation without the script tax.

中文文档

Overview

Argus bridges the gap between human intent and automated testing. Instead of writing brittle Selenium scripts or complex Playwright code, you express what you want to test in plain language:

argus run --goal "Test the login form — check required fields and error messages" --url "https://example.com/login"

The system handles planning, execution, failure recovery, evidence collection (screenshots, DOM snapshots), and report generation. Built for teams that want AI-driven test automation without maintaining script-heavy test suites.

When to use Argus

Scenario	Description
Exploratory testing	Quickly verify a page renders correctly, links work, forms submit
Regression smoke tests	Reuse saved auth states to check post-login pages across deployments
Form & login flow validation	Test validation rules, error states, and submission flows
Pre-release sanity checks	Automate a batch of URL checks before a release
Demo / prototype QA	Get test coverage on early-stage products where UI changes frequently

Features

Natural language test execution — Describe what to test; Argus figures out the steps.
LLM-driven Planner & Evaluator — Two specialized prompts: one plans browser actions, the other judges if the goal is met. Both support business-rule extensions per project or task.
Self-healing execution — Failed actions don't abort the task. Argus records the failure, re-observes the page, and retries with failure-aware planning (default 2 recovery attempts).
Playwright browser automation — Chromium, Firefox, WebKit. Supports goto, click, type, select, wait, screenshot, and DOM snapshots with smart selector recommendations.
Browser auth state management — Save login state (cookies, localStorage) once and reuse across tasks via argus auth save / list and --auth-state.
Structured reporting — HTML reports (human-readable with collapsible steps, screenshots, click-to-enlarge) and JSON reports (machine-readable) for every task.
Task observability — Per-task execution timeline persisted in SQLite, real-time WebSocket streaming, LLM call traces (full prompt/response/error), and ZIP debug bundles for offline analysis.
Model configuration management — Multiple LLM provider configs stored in SQLite with encrypted API keys (Fernet), assignable per task.
Prompt business extensions — Append custom rules to Planner/Evaluator prompts at the project or task level without touching built-in templates.
Sensitive data redaction — Recursively masks api_key, password, token, authorization, etc. in logs, traces, and debug bundles.
Web Console — Vue 3 + Element Plus SPA for managing projects, tasks, models, and viewing reports with execution timeline and LLM debug tabs.
REST API + WebSocket — Full RESTful API with OpenAPI docs, real-time task event streaming via WebSocket.
Docker deployment — Containerized with SSRF protection, CORS/WebSocket origin validation, rate limiting, optional API token auth, automated DB backups, and schema migrations.

Quick Start

Prerequisites

Python 3.11+
Playwright browser environment
An OpenAI Chat Completions-compatible LLM API

Install

pip install -e ".[dev]"
argus --version

Install Playwright Chromium:

playwright install chromium

Configure LLM

argus config llm

This walks you through API Key, endpoint, and model name. Configuration is saved to the database (encrypted).

Verify connectivity:

argus llm check

Run Your First Test

argus run --goal "Open the page and take a screenshot" --url "https://httpbin.org"

CLI Reference

Command	Description
`argus serve`	Start the FastAPI web server
`argus run --goal <text> --url <url>`	Execute a black-box test task
`argus run --create-only`	Create a task snapshot without execution
`argus browser check --url <url>`	Debug browser capabilities
`argus auth save --url <url>`	Save browser login state
`argus auth list`	List saved browser login states
`argus llm check`	Verify LLM API connectivity
`argus config llm`	Interactive LLM configuration
`argus config llm --advanced`	Configure advanced parameters (max tokens, temperature, retries)

`argus run` Options

Option	Description
`--goal`	Test goal in natural language
`--url`	Target URL
`--headed`	Show browser window during execution
`--auth-state <name>`	Reuse saved browser login state
`--no-screenshot`	Disable step screenshots
`--create-only`	Create task snapshot, don't execute
`--project <id>`	Associate task with a project
`--max-steps <n>`	Override max planning steps
`--timeout <s>`	Override execution timeout
`--planner-extension <file>`	Custom rules for Planner prompt
`--evaluator-extension <file>`	Custom rules for Evaluator prompt

Web Console & API

Start the web server:

argus serve
# Opens at http://localhost:8000

The Web Console (Vue 3 SPA) provides:

Dashboard — Overview of projects and tasks
Projects — CRUD, prompt extension editor with live system prompt preview
Tasks — Create, start, stop; view reports, execution timeline, and LLM debug traces
Models — Manage LLM provider configurations, test connectivity

Key API Endpoints

Method	Path	Description
GET	`/health`	Health check
GET/POST	`/argus/api/projects`	List / create projects
GET/POST	`/argus/api/tasks`	List / create tasks
POST	`/argus/api/tasks/{id}/start`	Start task execution
POST	`/argus/api/tasks/{id}/stop`	Stop running task
GET	`/argus/api/tasks/{id}/report`	Get task report (HTML or JSON)
GET	`/argus/api/tasks/{id}/events`	Get execution timeline
GET	`/argus/api/tasks/{id}/llm-traces`	Get LLM call traces
GET	`/argus/api/tasks/{id}/debug-bundle`	Download debug bundle (ZIP)
GET/POST	`/argus/api/config/models`	Manage model configurations
WS	`/argus/api/ws/tasks/{id}`	Real-time task events
—	`/docs`	OpenAPI / Swagger UI

Architecture

┌─────────────────────────────────────────────────┐
│                   CLI (argus)                    │
│  run │ serve │ browser │ auth │ llm │ config     │
└──────────┬──────────────────────────────────────┘
           │
┌──────────▼──────────────────────────────────────┐
│              FastAPI Web Server                  │
│  REST API │ WebSocket │ Vue 3 Console (SPA)      │
└──────────┬──────────────────────────────────────┘
           │
┌──────────▼──────────────────────────────────────┐
│              Black-box Agent                     │
│  ┌─────────┐  ┌──────────┐  ┌───────────┐       │
│  │ Planner │─►│ Executor │─►│ Evaluator │       │
│  │  (LLM)  │  │Playwright│  │  (LLM)    │       │
│  └─────────┘  └──────────┘  └───────────┘       │
│         │           │              │             │
│         ▼           ▼              ▼             │
│   Step Logs    Screenshots    Issue Records      │
└──────────┬──────────────────────────────────────┘
           │
┌──────────▼──────────────────────────────────────┐
│              Infrastructure                      │
│  SQLite │ File System │ Event Bus │ Task Queue   │
└─────────────────────────────────────────────────┘

Execution flow:

Planner (LLM) receives the goal + page snapshot, outputs next browser action
Executor runs the action via Playwright, captures screenshot and DOM snapshot
Evaluator (LLM) assesses whether the goal is achieved
If not satisfied, loop back to Planner with updated context
On failure, recovery logic re-observes the page and re-plans (up to 2 retries)
When done, generate HTML + JSON reports

Prompt Extension System

Argus separates built-in prompts from user extensions:

Built-in templates (argus_py/llm/prompts/) — Planner and Evaluator prompts shipped with the package, not overridable.
Business extensions — Append custom rules per project or per task via parameters.prompt_extensions.{planner,evaluator}.

Concatenation order: Built-in → Project extension → Task extension

This allows tailoring test behavior per application without forking the codebase. The Web Console provides a Markdown editor with live system-prompt preview.

Tech Stack

Component	Choice
Python	3.11+
LLM API	OpenAI Chat Completions-compatible
Browser	Playwright (Chromium)
Web framework	FastAPI + Uvicorn
Frontend	TypeScript + Vue 3 + Element Plus + Vite
Reporting	Jinja2 (HTML) + JSON
Database	SQLite (WAL mode)
Observability	SQLite events + JSONL traces + WebSocket
Deployment	Docker / Docker Compose

Project Structure

argus/
├── argus_py/
│   ├── cli/           # CLI entry points and interactive prompts
│   ├── api/           # FastAPI app, routes, schemas, middleware, static hosting
│   ├── core/          # Constants, paths, enums, exceptions, IDs
│   ├── config/        # Configuration loading, model config service, SQLite storage
│   ├── llm/           # LLM client, provider adapters, prompts, parsing, retry
│   ├── observability/ # Audit, redaction, LLM traces
│   ├── task/          # Task model, state machine, SQLite storage, timeline, lifecycle
│   ├── blackbox/      # Planner, Executor, Evaluator, recovery
│   ├── browser/       # Playwright lifecycle, actions, selectors, snapshots
│   ├── report/        # Report model, HTML/JSON export
│   ├── project/       # Project model, SQLite storage, CRUD
│   ├── infra/         # SQLite infra, migrations, task queue, event bus
│   ├── execution/     # Task runner facade
│   ├── runtime/       # DI container
│   └── whitebox/      # Java white-box analysis stub (planned)
├── frontend/          # TypeScript + Vite + Vue 3 SPA source
├── config/            # Configuration files (logging.yaml, server.yaml)
├── docs/              # Documentation
├── tests/             # Unit, contract, and integration tests
├── examples/          # Example task JSON files
├── scripts/           # Utility scripts (backup, cleanup)
├── outputs/           # Runtime artifacts (reports, screenshots, traces) — gitignored
└── java_analyzer/     # Java analyzer submodule stub (planned)

Deployment

Argus supports Docker-based deployment for private networks. See the deployment guide for:

Docker Compose setup
SSRF protection and CORS configuration
API token authentication
Automated DB backups
Schema migrations
Security hardening

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 106 Commits
.github		.github
argus_py		argus_py
config		config
docs		docs
examples		examples
frontend		frontend
java_analyzer		java_analyzer
outputs		outputs
scripts		scripts
tests		tests
.dockerignore		.dockerignore
.env.example		.env.example
.gitattributes		.gitattributes
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
README.md		README.md
README.zh.md		README.zh.md
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Argus — AI-Native Web Testing

Overview

When to use Argus

Features

Quick Start

Prerequisites

Install

Configure LLM

Run Your First Test

CLI Reference

`argus run` Options

Web Console & API

Key API Endpoints

Architecture

Prompt Extension System

Tech Stack

Project Structure

Deployment

License

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Argus — AI-Native Web Testing

Overview

When to use Argus

Features

Quick Start

Prerequisites

Install

Configure LLM

Run Your First Test

CLI Reference

argus run Options

Web Console & API

Key API Endpoints

Architecture

Prompt Extension System

Tech Stack

Project Structure

Deployment

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`argus run` Options

Packages