Alpha release: expect rough edges. We're iterating fast and would love your feedback.
Turn any existing project into a self-improving pipeline. Draw your own harness for agentic loops.
- 🤖 Auto Mode: runs fully automatically 24/7; agents turn their own intuition into the next experiment plan
- 🧑‍🔬 Manual Mode: a human can intervene via chat and instill their intuition for the next trial
- Automated experiment tracking via Git
- Isolated containers for each agent, for proper sandboxing during evaluation
- Agentic loop customizable via `art compose /my/project`
Prerequisites: Docker, Git, Node.js ≥ 20, Claude Code CLI
```bash
# Install ART (pick one)
npm install -g @aer-org/art
curl -fsSL https://raw.githubusercontent.com/aer-org/art/main/install.sh | bash
```

For your own projects, just point ART at any directory:

```bash
art run /my/project
```

Requires Node.js ≥ 20 and Docker (or Podman).
Quick demo: autoresearch as a pipeline
ART can harness karpathy/autoresearch with clear stage separation: a build stage modifies train.py, a separate test stage runs the experiment, and a review stage decides whether to keep or revert the change, each in its own isolated container.
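In ART the review stage is an agent that reads the results and decides. As a rough deterministic stand-in for that decision, here is a minimal TypeScript sketch of a keep-or-revert rule. It assumes the test stage reports a single lower-is-better validation metric; `reviewTrial` and `TrialResult` are hypothetical names, not part of ART.

```typescript
// Hypothetical keep-or-revert rule for an autoresearch-style loop.
// Assumption: each trial reports one validation metric where lower is better.

type Verdict = "keep" | "revert";

interface TrialResult {
  metric: number; // validation metric reported by the test stage (assumed)
}

function reviewTrial(result: TrialResult, bestSoFar: number | undefined): Verdict {
  // A failed or missing measurement (NaN, Infinity) is never kept.
  if (!Number.isFinite(result.metric)) return "revert";
  // The first successful trial becomes the baseline.
  if (bestSoFar === undefined) return "keep";
  // Keep only strict improvements; ties are reverted to avoid churn.
  return result.metric < bestSoFar ? "keep" : "revert";
}

console.log(reviewTrial({ metric: 0.98 }, 1.0)); // keep
console.log(reviewTrial({ metric: 1.02 }, 1.0)); // revert
```

Reverting ties is a design choice: it keeps the git history limited to changes that measurably helped.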
```bash
git clone https://github.com/aer-org/art
cd art/examples/autoresearch
art run .   # requires NVIDIA Ampere+ GPU
```

| Without ART | With ART |
|---|---|
| One-off chat sessions, lost context | Repeatable agent workflows with run history |
| Agent writes anywhere in your repo | File-level mount permissions (rw / ro / hidden) per stage |
| No structure between steps | Stage boundaries with transitions and retry logic |
| Can't resume after failure | Checkpointed stages, resume from where you left off |
| Secrets leak into agent context | Credential proxy + .env shadowed with /dev/null |
1. Run it:

```bash
art run /my/project
```

Each stage runs a Claude agent in its own Docker container. Your project is read-only by default; specific files get write access only where needed. Everything lands in `__art__/`:
```
my-project/
├── src/, data/, ...     # Your project (read-only by default)
└── __art__/             # All ART artifacts
    ├── PIPELINE.json    # Pipeline definition
    ├── PLAN.md          # What you want built
    ├── src/             # Agent-written code
    ├── outputs/         # Run outputs
    ├── logs/            # Per-stage logs
    └── runs/            # Run history manifests
```
2. Customize your pipeline:

```bash
art compose /my/project
```

Opens a browser-based visual editor with an AI chat. Design your pipeline collaboratively; it becomes the contract that stages execute against. The default template is plan → build → test → review, but you can design any pipeline.
🤖 Auto Mode: goes full auto 24/7. The planner agent turns its own intuition into each experiment plan, runs trials, reviews results, and loops back. You wake up to a git log of everything it tried.
🧑‍🔬 Manual Mode: human in the loop. You can intervene via chat at any point and instill your own intuition for the next trial. Good for early exploration where you want to steer.
All experiment history is tracked automatically via Git β every run, every plan revision, every result.
A pipeline is a list of stages connected by transitions. Each stage runs in its own container and communicates via output markers.
Here's what the default template looks like. ART itself has no hardcoded stage knowledge: it only understands stages, transitions, mounts, and markers. Design any pipeline via `art compose`.
```
┌──────────┐
│  BUILD   │ ← reads PLAN.md, writes code to src/
└────┬─────┘
     │ [STAGE_COMPLETE]
     ▼
┌──────────┐
│   TEST   │ ← runs tests against src/
└────┬─────┘
     │ [STAGE_COMPLETE]
     ▼
┌──────────┐
│  REVIEW  │ ← examines outputs, writes REPORT.md
└────┬─────┘
     │ [STAGE_COMPLETE]
     ▼
┌──────────┐
│ HISTORY  │ ← distills insights into MEMORY.md
└──────────┘
```
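To make "a list of stages connected by transitions" concrete, here is a small TypeScript sketch that models the diagram in memory and walks the `[STAGE_COMPLETE]` path. The `Stage` and `Transition` shapes are hypothetical illustrations, not the PIPELINE.json schema (docs/PIPELINE-REFERENCE.md documents the real fields).

```typescript
// Hypothetical in-memory model of the default template drawn above.
// These shapes are illustrative only; they are NOT the PIPELINE.json schema.

interface Transition {
  marker: string; // marker that fires this transition, e.g. "STAGE_COMPLETE"
  next: string | null; // target stage name, or null to end the pipeline
}

interface Stage {
  name: string;
  prompt: string;
  transitions: Transition[];
}

const defaultTemplate: Stage[] = [
  { name: "build", prompt: "Read PLAN.md and write code to src/", transitions: [{ marker: "STAGE_COMPLETE", next: "test" }] },
  { name: "test", prompt: "Run tests against src/", transitions: [{ marker: "STAGE_COMPLETE", next: "review" }] },
  { name: "review", prompt: "Examine outputs and write REPORT.md", transitions: [{ marker: "STAGE_COMPLETE", next: "history" }] },
  { name: "history", prompt: "Distill insights into MEMORY.md", transitions: [{ marker: "STAGE_COMPLETE", next: null }] },
];

// Follow the transition for `marker` from the first stage until the pipeline ends.
function happyPath(stages: Stage[], marker = "STAGE_COMPLETE"): string[] {
  const byName = new Map(stages.map((s) => [s.name, s] as const));
  const order: string[] = [];
  let current: Stage | undefined = stages[0];
  while (current) {
    // Guard against loops; a real pipeline may loop on purpose, this walk does not.
    if (order.includes(current.name)) throw new Error(`cycle at ${current.name}`);
    order.push(current.name);
    const t = current.transitions.find((tr) => tr.marker === marker);
    current = t && t.next !== null ? byName.get(t.next) : undefined;
  }
  return order;
}

console.log(happyPath(defaultTemplate).join(" -> ")); // build -> test -> review -> history
```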
- Agent mode (default): a Claude agent receives a prompt and works autonomously
- Command mode: runs shell commands via `sh -c` and parses markers from stdout
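Command mode can be pictured with the TypeScript sketch below: it runs a command through `sh -c` using Node's `spawnSync` and scans stdout for the two marker formats named in this README. `runCommandStage`, `parseMarkers`, and the `Marker` type are hypothetical names; ART's actual runner executes inside the stage's container and may parse differently.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical command-mode sketch: run via `sh -c`, then scan stdout for markers.

type Marker = { kind: "complete" } | { kind: "error"; message: string };

// Matches [STAGE_COMPLETE] and [STAGE_ERROR: some message].
const MARKER_RE = /\[STAGE_COMPLETE\]|\[STAGE_ERROR:\s*([^\]]*)\]/g;

function parseMarkers(stdout: string): Marker[] {
  const markers: Marker[] = [];
  for (const m of stdout.matchAll(MARKER_RE)) {
    const msg = m[1]; // undefined when the STAGE_COMPLETE alternative matched
    markers.push(msg === undefined ? { kind: "complete" } : { kind: "error", message: msg.trim() });
  }
  return markers;
}

function runCommandStage(command: string): Marker[] {
  // `sh -c` lets the stage command use pipes, redirects, and && chains.
  const result = spawnSync("sh", ["-c", command], { encoding: "utf8" });
  if (result.error) throw result.error; // e.g. sh could not be spawned
  return parseMarkers(result.stdout);
}

console.log(runCommandStage('echo "running tests"; echo "[STAGE_COMPLETE]"'));
```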
Stages emit markers like [STAGE_COMPLETE] or [STAGE_ERROR: msg] to trigger transitions. Retry transitions re-send the prompt with the error description. Non-retry transitions advance to the next stage.
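The retry-versus-advance rule can be sketched as a pure function. In this TypeScript sketch, the `TransitionRule` shape, the `maxRetries` cap, and the wording appended to the prompt are all assumptions for illustration; only the behavior (retry re-sends the prompt with the error, non-retry advances) comes from the text above.

```typescript
// Hypothetical transition selection. Assumptions (not ART's API): each rule names
// the marker kind it handles and whether it retries; retries are capped.

type MarkerKind = "complete" | "error";

interface TransitionRule {
  on: MarkerKind;
  retry: boolean; // true: re-run this stage; false: advance to `next`
  next?: string; // undefined means the pipeline ends
}

type Step =
  | { action: "retry"; prompt: string }
  | { action: "advance"; next: string | undefined }
  | { action: "fail"; reason: string };

function resolveTransition(
  rules: TransitionRule[],
  marker: { kind: MarkerKind; message?: string },
  basePrompt: string,
  attempt: number, // retries already performed for this stage
  maxRetries = 3,
): Step {
  const rule = rules.find((r) => r.on === marker.kind);
  if (!rule) return { action: "fail", reason: `no transition for ${marker.kind}` };
  if (rule.retry) {
    if (attempt >= maxRetries) return { action: "fail", reason: `gave up after ${attempt} retries` };
    // Re-send the original prompt with the error description appended.
    const error = marker.message ?? "unknown error";
    return { action: "retry", prompt: `${basePrompt}\n\nPrevious attempt failed: ${error}` };
  }
  return { action: "advance", next: rule.next };
}

const rules: TransitionRule[] = [
  { on: "complete", retry: false, next: "review" },
  { on: "error", retry: true },
];
console.log(resolveTransition(rules, { kind: "error", message: "2 tests failed" }, "Run tests against src/", 0));
```

Capping retries is a deliberate choice in this sketch so a stage that keeps failing cannot loop forever.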
Completed stages are checkpointed. On restart, execution resumes from the next incomplete stage with previous context.
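A minimal TypeScript sketch of the resume rule follows: given the stage order and the checkpoints that exist, restart at the first stage without one and carry forward what earlier stages recorded. The `Checkpoint` shape and `resumePlan` are hypothetical; how ART actually stores checkpoints and context is not described here.

```typescript
// Hypothetical resume logic: find the restart point and the context to carry forward.

interface Checkpoint {
  stage: string;
  summary: string; // whatever context the completed stage left behind (assumed)
}

function resumePlan(
  order: string[],
  checkpoints: Checkpoint[],
): { next: string | undefined; context: string[] } {
  const done = new Map(checkpoints.map((c) => [c.stage, c.summary] as const));
  // The first stage without a checkpoint runs next; undefined means all stages completed.
  const next = order.find((stage) => !done.has(stage));
  // Carry forward summaries of every stage that precedes the resume point.
  const upTo = next === undefined ? order.length : order.indexOf(next);
  const context = order.slice(0, upTo).map((stage) => `${stage}: ${done.get(stage)}`);
  return { next, context };
}

const order = ["build", "test", "review", "history"];
const plan = resumePlan(order, [
  { stage: "build", summary: "build summary" },
  { stage: "test", summary: "test summary" },
]);
console.log(plan.next); // review
```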
```bash
art compose /my/project
```

Opens a ComfyUI-style browser-based visual editor (React + ReactFlow) where you can:
- Drag-and-drop stage nodes and wire them with transition edges
- Configure per-stage: prompt, mount policies (rw/ro/hidden), container image
- Browse your project's mount tree and override sub-directory permissions
- Pick from preset base images (Ubuntu, CUDA, Python, Node, ROS)
- Chat with an AI agent to collaboratively design your plan
- Review diffs with hunk-based AI edit suggestions
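Sub-directory overrides in the mount tree imply some rule for which policy applies to a given file. The TypeScript sketch below assumes "the most specific matching path wins" with read-only as the fallback; that rule, `resolvePolicy`, and the example paths are assumptions, and ART's real resolution may differ.

```typescript
import * as path from "node:path";

// Hypothetical mount-policy resolver. Assumptions: paths are relative to the
// project root, the default is read-only, and the longest matching override wins.

type Policy = "rw" | "ro" | "hidden";

function resolvePolicy(file: string, overrides: Record<string, Policy>, fallback: Policy = "ro"): Policy {
  const target = path.posix.normalize(file);
  let best: { len: number; policy: Policy } | undefined;
  for (const [prefix, policy] of Object.entries(overrides)) {
    const p = path.posix.normalize(prefix).replace(/\/+$/, "");
    // Match the directory itself or anything beneath it, never a sibling like "data/privateX".
    const matches = target === p || target.startsWith(p + "/");
    if (matches && (!best || p.length > best.len)) best = { len: p.length, policy };
  }
  return best ? best.policy : fallback;
}

const overrides: Record<string, Policy> = { "__art__": "rw", "data/private": "hidden" };
console.log(resolvePolicy("__art__/src/train.py", overrides)); // rw
console.log(resolvePolicy("data/private/labels.csv", overrides)); // hidden
console.log(resolvePolicy("src/main.ts", overrides)); // ro
```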
Agents run in containers with minimal access:
- File-level mount permissions: the project defaults to read-only; write access is granted per stage
- `.env` shadowed with `/dev/null`: secrets are never exposed inside containers
- Credential proxy: containers never see real API keys; a host-side proxy injects credentials per request
- Per-stage isolation: each stage gets its own independent mount configuration
- Mount allowlist: additional mounts are validated against an external allowlist
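As an illustration of how such policies can map onto a container runtime, here is a TypeScript sketch that turns per-stage mounts into `docker run` flags. It relies on standard Docker idioms (`-v host:container:ro` for read-only bind mounts, bind-mounting `/dev/null` over a file to mask it, `--tmpfs` to overlay an empty directory), but `MountSpec`, `dockerMountArgs`, the `/workspace` path, and the image name are hypothetical; ART's runtime may build its mounts differently.

```typescript
// Hypothetical sketch: per-stage mount policies -> `docker run` flags.
// Hidden entries are masked on top of an already-mounted parent instead of being omitted.

type MountPolicy = "rw" | "ro" | "hidden";

interface MountSpec {
  hostPath: string;
  containerPath: string;
  policy: MountPolicy;
  kind: "file" | "dir";
}

function dockerMountArgs(mounts: MountSpec[]): string[] {
  const visible: string[] = [];
  const masks: string[] = [];
  for (const m of mounts) {
    if (m.policy === "hidden") {
      // Files are masked with read-only /dev/null; directories with an empty tmpfs.
      if (m.kind === "file") masks.push("-v", `/dev/null:${m.containerPath}:ro`);
      else masks.push("--tmpfs", m.containerPath);
    } else {
      visible.push("-v", `${m.hostPath}:${m.containerPath}${m.policy === "ro" ? ":ro" : ""}`);
    }
  }
  // Keep masks after the mounts they overlay.
  return [...visible, ...masks];
}

const args = dockerMountArgs([
  { hostPath: "/home/me/my-project", containerPath: "/workspace", policy: "ro", kind: "dir" },
  { hostPath: "/home/me/my-project/__art__", containerPath: "/workspace/__art__", policy: "rw", kind: "dir" },
  { hostPath: "/home/me/my-project/.env", containerPath: "/workspace/.env", policy: "hidden", kind: "file" },
]);
console.log(["docker", "run", ...args, "some-image"].join(" "));
```

Shadowing `.env` falls out as a special case: it is just a hidden file.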
ART is designed to reduce accidental access and constrain agent execution, but it is not a formal sandbox. See docs/SECURITY.md for the full trust model and known limitations.
```bash
art compose <path>              # Open visual pipeline editor
art compose --headless <path>   # One-shot planning agent (no browser, CI-friendly)
art run <path>                  # Execute pipeline
art run --skip-preflight <path> # Skip Claude CLI/auth check (command-mode only)
art update                      # Rebuild all images in the registry
```

ART is under active development. Core pipeline execution, the visual editor, and container isolation are functional. The API surface may change between minor versions.
Supported: Linux, macOS · Not supported: Windows (use WSL)
| Document | Content |
|---|---|
| docs/PIPELINE-REFERENCE.md | PIPELINE.json field reference: stages, mounts, transitions, command mode |
| docs/ARCHITECTURE.md | System architecture: pipeline FSM, container runtime, mount isolation |
| docs/REQUIREMENTS.md | Design philosophy and decisions |
| docs/SECURITY.md | Trust model, mount isolation, credential proxy |
| docs/TESTING.md | Test files, mocking patterns, E2E tests, CI configuration |
```bash
git clone https://github.com/aer-org/art.git
cd art
npm install
npm run build          # Compile TypeScript
npm run dev            # Watch mode
./container/build.sh   # Rebuild agent container
npm test               # Unit tests
npm run test:e2e       # E2E tests (Docker required)
```

Released under Apache-2.0.
