Argus

A multimodal code agent that works together with a web-use agent. It gives a frontier agent a reliable tool to visually verify its code changes in a live browser environment.

Multimodal — captures browser screenshots and feeds them directly into the agent's context for visual verification
Dual-agent collaboration — a Code Agent fixes issues in a sandboxed Docker environment while a Web Agent visually verifies the result in a live browser
Extensible tool system — drop in any Tool subclass; built-in tools cover shell execution, browser control, and cross-agent delegation
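
As a rough illustration of the "drop in any Tool subclass" idea, the sketch below defines a custom tool. The real base class lives in argus/tools/base.py and its interface is not shown here, so the local Tool class, its name and description attributes, and the keyword-argument form of execute() are all assumptions made for this example; only the existence of Tool.execute() comes from the Architecture notes. WordCountTool is a hypothetical tool.

```python
from abc import ABC, abstractmethod


class Tool(ABC):
    """Stand-in for argus.tools.base.Tool; the real interface may differ."""

    name: str         # assumed: identifier the LLM uses to call the tool
    description: str  # assumed: text shown to the LLM

    @abstractmethod
    def execute(self, **kwargs) -> str:
        """Run the tool and return a textual result."""


class WordCountTool(Tool):
    """Hypothetical custom tool: counts words in a string."""

    name = "word_count"
    description = "Count the whitespace-separated words in `text`."

    def execute(self, text: str = "", **kwargs) -> str:
        # Return a string so the result can be placed in a tool message.
        return str(len(text.split()))


tool = WordCountTool()
result = tool.execute(text="visual checks catch layout bugs")
print(result)
```

Check argus/tools/base.py for the actual abstract methods and for how tools are registered with an agent before adapting this.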

Setup

Prerequisites

Python 3.10+
Docker

Install

pip install -e .
pip install -e ".[dev]"      # includes pytest, pytest-mock, ruff
playwright install chromium  # required for WebAgent

Run tests

pytest                  # run all tests
pytest -v               # verbose output
pytest -k test_name     # run a single test by name

Running the General-Purpose Agent

run_agent.py runs Argus on any task you describe. The agent executes inside a Docker container with a randomly assigned (or user-specified) host port forwarded, and optionally accepts images alongside the task description.

# Basic usage
python run_agent.py --task "Fix the off-by-one error in src/parser.js"

# Use a custom Docker image and working directory
python run_agent.py --docker myproject:latest --workdir /app --task "Add dark mode support"

# Attach images (local files or URLs)
python run_agent.py --task "Reproduce the layout issue shown in the screenshot" --images bug.png

# Pin the forwarded port
python run_agent.py --task "..." --port 3000

Use configs/examples/general_purpose.yaml as a starting config:

cp configs/examples/general_purpose.yaml config.yaml  # then fill in api_key

JSON trajectory logs are saved under logs/ by default.

Running SWE-bench Multimodal

run_swebench_multimodal.py evaluates Argus on the SWE-bench Multimodal dataset.

# Run all test instances
python run_swebench_multimodal.py --config config.yaml

# Run specific instances
python run_swebench_multimodal.py --config config.yaml --instance-ids django__django-1234 flask__flask-5678

# Run on the dev split
python run_swebench_multimodal.py --config config.yaml --split dev

Use configs/examples/swebench_multimodal.yaml as a starting config:

cp configs/examples/swebench_multimodal.yaml config.yaml  # then fill in api_key

JSON trajectory logs are saved under logs/<instance_id>/ by default.

Long-term Memory

Argus optionally integrates with EverMemOS for memory across runs. When enabled, the agent retrieves relevant past interactions and injects them into the system prompt before each run, then stores the new interaction afterward.

Enable with enable_memory: true under agent: in config.yaml.
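
The retrieve, inject, and store cycle described above can be sketched as follows. This is not the EverMemOS API: InMemoryStore, its search() and add() methods, the word-overlap matching, and run_with_memory() are hypothetical stand-ins chosen to keep the example self-contained.

```python
class InMemoryStore:
    """Hypothetical stand-in for a long-term memory client (not the real API)."""

    def __init__(self):
        self._records = []

    def search(self, query, limit=3):
        # Naive relevance: keep records sharing at least one word with the query.
        words = set(query.lower().split())
        hits = [r for r in self._records if words & set(r.lower().split())]
        return hits[:limit]

    def add(self, record):
        self._records.append(record)


def run_with_memory(store, base_prompt, task, agent_fn):
    # 1. Retrieve past interactions relevant to the task.
    memories = store.search(task)
    # 2. Inject them into the system prompt before the run.
    prompt = base_prompt
    if memories:
        lines = "\n".join(f"- {m}" for m in memories)
        prompt += "\n\nRelevant past interactions:\n" + lines
    # 3. Run the agent, then store the new interaction afterward.
    result = agent_fn(prompt, task)
    store.add(f"Task: {task} | Result: {result}")
    return prompt, result


def fake_agent(prompt, task):
    # Placeholder for a real agent run.
    return "ok"


store = InMemoryStore()
first_prompt, _ = run_with_memory(store, "You are Argus.", "fix parser bug", fake_agent)
second_prompt, _ = run_with_memory(store, "You are Argus.", "fix parser test", fake_agent)
```

On the first call the store is empty, so the prompt is unchanged; on the second call the stored record from the first run is appended to the prompt.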

Architecture

argus/
├── agent.py          # Agent loop: maintains history, dispatches tool calls
├── web_agent.py      # WebAgent: browser-controlling verification agent
├── config.py         # Config dataclasses: AgentConfig, WebAgentConfig (each with a nested LLMConfig and MemoryConfig)
├── data/
│   ├── message.py    # Provider-agnostic types: SystemMessage, UserMessage, AssistantMessage, ToolMessage, ToolCall, ToolResult
│   └── content.py    # Content dataclass (text + base64 images)
├── llm/
│   ├── base.py       # LLMClient ABC with exponential-backoff retry
│   ├── openai.py     # OpenAI function-calling implementation
│   └── anthropic.py  # Anthropic tool_use implementation
├── tools/
│   ├── base.py           # Tool ABC
│   ├── shell.py          # Persistent bash session inside a Docker container
│   ├── browser.py        # BrowserTool: Playwright-based headless browser; actions: navigate, screenshot (grid overlay), click, dblclick, hover, drag_and_drop, type, press, scroll, reload, get_text, get_console_logs, get_element_bounds
│   ├── ask_web_agent.py  # AskWebAgentTool: lets the code agent delegate to WebAgent
│   └── checklist.py      # ChecklistTool: stateful in-run task planner
└── utils/
    └── evermind.py   # EverMind long-term memory client

Agent.run() calls LLMClient.chat() each step and dispatches any ToolCall objects to the matching Tool.execute(). The loop ends when the LLM replies without tool calls or max_steps is reached. ShellTool maintains a persistent bash session via pexpect — environment variables and working directory changes persist across calls. A host port is forwarded from the container and injected into the task prompt so the agent knows where to bind services.
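
A minimal sketch of the loop described above, using stand-in classes rather than the real ones in argus/. The ToolCall and Reply dataclasses, ScriptedLLM, EchoTool, the tuple-based history, and the run() signature are simplifications assumed for illustration; only the control flow (chat each step, dispatch tool calls by name, stop on a reply without tool calls or at max_steps) follows the description.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    arguments: dict


@dataclass
class Reply:
    text: str
    tool_calls: list = field(default_factory=list)


class EchoTool:
    """Hypothetical tool that returns its `text` argument."""

    name = "echo"

    def execute(self, **kwargs) -> str:
        return kwargs.get("text", "")


class ScriptedLLM:
    """Fake client: requests one tool call, then gives a final answer."""

    def __init__(self):
        self._replies = [
            Reply("", [ToolCall("echo", {"text": "hello"})]),
            Reply("done"),
        ]

    def chat(self, history):
        return self._replies.pop(0)


def run(llm, tools, task, max_steps=10):
    registry = {t.name: t for t in tools}  # look up tools by name
    history = [("user", task)]
    for _ in range(max_steps):
        reply = llm.chat(history)
        history.append(("assistant", reply.text))
        if not reply.tool_calls:
            # No tool calls: the model considers the task finished.
            return reply.text, history
        for call in reply.tool_calls:
            output = registry[call.name].execute(**call.arguments)
            history.append(("tool", output))
    # Step budget exhausted without a final answer.
    return None, history


answer, history = run(ScriptedLLM(), [EchoTool()], "say hello")
```

Here the first step dispatches the echo call and records its output in the history, and the second step ends the loop because the reply carries no tool calls.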
