aer-org/art

🎨 ART: Agent Runtime

Alpha release: expect rough edges. We're iterating fast and would love your feedback.

Turn any existing project into a self-improving pipeline. Draw your own harness for agentic loops.

  • 🤖 Auto Mode: fully automatic, 24/7; agents fold their own intuition into the next experiment plan
  • 🧑‍🔬 Manual Mode: a human can intervene via chat and steer the next trial with their own intuition
  • 📊 Automated experiment tracking via Git
  • 🔒 Isolated containers for each agent, for proper sandboxing during evaluation
  • 🔄 Agentic loop customizable via art compose /my/project

Install

Prerequisites: Docker, Git, Node.js ≥ 20, Claude Code CLI

# Install ART (pick one)
npm install -g @aer-org/art
curl -fsSL https://raw.githubusercontent.com/aer-org/art/main/install.sh | bash

For your own projects, just point ART at any directory:

art run /my/project

Requires Node.js ≥ 20 and Docker (or Podman).

Quick example demo: autoresearch as a pipeline

ART can harness karpathy/autoresearch with clear stage separation: a build stage modifies train.py, a separate test stage runs the experiment, and a review stage decides whether to keep or revert the change, all in isolated containers.

git clone https://github.com/aer-org/art
cd art/examples/autoresearch
art run .  # requires NVIDIA Ampere+ GPU

Why ART

| Without ART | With ART |
| --- | --- |
| One-off chat sessions, lost context | Repeatable agent workflows with run history |
| Agent writes anywhere in your repo | File-level mount permissions (rw / ro / hidden) per stage |
| No structure between steps | Stage boundaries with transitions and retry logic |
| Can't resume after failure | Checkpointed stages, resume from where you left off |
| Secrets leak into agent context | Credential proxy + .env shadowed with /dev/null |

30-Second Walkthrough

1. Run it:

art run /my/project

Each stage runs a Claude agent in its own Docker container. Your project is read-only by default; specific files get write access only where needed. Everything lands in __art__/:

my-project/
├── src/, data/, ...                # Your project (read-only by default)
└── __art__/                        # All ART artifacts
    ├── PIPELINE.json               # Pipeline definition
    ├── PLAN.md                     # What you want built
    ├── src/                        # Agent-written code
    ├── outputs/                    # Run outputs
    ├── logs/                       # Per-stage logs
    └── runs/                       # Run history manifests

2. Customize your pipeline:

art compose /my/project

Opens a browser-based visual editor with an AI chat. Design your pipeline collaboratively; it becomes the contract that stages execute against. The default template is plan → build → test → review, but you can design any pipeline.


Two Ways to Run

🤖 Auto Mode: Runs unattended, 24/7. The planner agent folds its own intuition into each experiment plan, runs trials, reviews results, and loops back. You wake up to a git log of everything it tried.

🧑‍🔬 Manual Mode: Human in the loop. You can intervene via chat at any point and inject your own intuition for the next trial. Good for early exploration where you want to steer.

All experiment history is tracked automatically via Git: every run, every plan revision, every result.


How Pipelines Work

A pipeline is a list of stages connected by transitions. Each stage runs in its own container and communicates via output markers.

Here's what the default template looks like, but ART has no hardcoded stage knowledge. It understands stages, transitions, mounts, and markers. Design any pipeline via art compose.

    ┌──────────┐
    │  BUILD   │ ← reads PLAN.md, writes code to src/
    └────┬─────┘
         │ [STAGE_COMPLETE]
         ▼
    ┌──────────┐
    │   TEST   │ ← runs tests against src/
    └────┬─────┘
         │ [STAGE_COMPLETE]
         ▼
    ┌──────────┐
    │  REVIEW  │ ← examines outputs, writes REPORT.md
    └────┬─────┘
         │ [STAGE_COMPLETE]
         ▼
    ┌──────────┐
    │ HISTORY  │ ← distills insights into MEMORY.md
    └──────────┘

Stage modes

  • Agent mode (default): Claude agent receives a prompt and works autonomously
  • Command mode: Runs shell commands via sh -c, parses markers from stdout
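
To make the marker convention concrete, here is a minimal sketch of how markers might be pulled out of a command-mode stage's captured stdout. It is not ART's actual parser: the `StageMarker` type, the `parseMarkers` function, and the exact regular expression are assumptions made for illustration, based only on the `[STAGE_COMPLETE]` and `[STAGE_ERROR: msg]` forms described in this README.

```typescript
// Hypothetical marker shape for illustration; ART's internal types may differ.
type StageMarker =
  | { kind: "complete" }
  | { kind: "error"; message: string };

// Scan captured stdout and return every marker in order of appearance.
function parseMarkers(stdout: string): StageMarker[] {
  // Matches [STAGE_COMPLETE] or [STAGE_ERROR: some message].
  // A fresh regex per call keeps lastIndex state from leaking between calls.
  const pattern = /\[STAGE_(COMPLETE|ERROR)(?::\s*([^\]]*))?\]/g;
  const markers: StageMarker[] = [];
  let match = pattern.exec(stdout);
  while (match !== null) {
    if (match[1] === "COMPLETE") {
      markers.push({ kind: "complete" });
    } else {
      // The message group is absent for a bare [STAGE_ERROR].
      markers.push({ kind: "error", message: (match[2] || "").trim() });
    }
    match = pattern.exec(stdout);
  }
  return markers;
}
```

A command-mode stage could then end its script with something like `echo "[STAGE_COMPLETE]"`, and the runner would act on the markers found in its output.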

Transitions and retries

Stages emit markers like [STAGE_COMPLETE] or [STAGE_ERROR: msg] to trigger transitions. Retry transitions re-send the prompt with the error description. Non-retry transitions advance to the next stage.
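
The retry and non-retry behavior described above can be sketched as a small decision function. This is a sketch under stated assumptions, not the real engine: the `Transition` and `Decision` shapes, the `decide` function, and the wording appended to the retry prompt are all hypothetical. The actual schema is documented in docs/PIPELINE-REFERENCE.md.

```typescript
type MarkerName = "STAGE_COMPLETE" | "STAGE_ERROR";

// Hypothetical transition shape, for illustration only.
interface Transition {
  marker: MarkerName;  // marker that triggers this transition
  retry: boolean;      // true: re-run the same stage with the error attached
  next: string | null; // stage to advance to; null ends the pipeline
}

type Decision =
  | { action: "retry"; prompt: string }
  | { action: "advance"; stage: string }
  | { action: "finish" };

// Pick the first transition matching the emitted marker and turn it into a decision.
function decide(
  transitions: Transition[],
  marker: MarkerName,
  basePrompt: string,
  errorMessage: string
): Decision {
  for (const transition of transitions) {
    if (transition.marker !== marker) continue;
    if (transition.retry) {
      // Retry: re-send the original prompt together with the error description.
      return {
        action: "retry",
        prompt: `${basePrompt}\n\nPrevious attempt failed: ${errorMessage}`,
      };
    }
    // Non-retry: advance to the next stage, or finish if there is none.
    return transition.next === null
      ? { action: "finish" }
      : { action: "advance", stage: transition.next };
  }
  throw new Error(`no transition defined for marker ${marker}`);
}
```

In this sketch a build stage with a `STAGE_COMPLETE` transition to `test` and a retrying `STAGE_ERROR` transition would either move on to `test` or receive its prompt again with the failure attached.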

Resume on interrupt

Completed stages are checkpointed. On restart, execution resumes from the next incomplete stage with previous context.
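
As a rough illustration of the resume rule, the sketch below picks the stage to restart from, given the names of the checkpointed stages. It assumes a linear stage order and a checkpoint that is just a list of completed stage names. Both are simplifications, and `firstIncompleteStage` is a hypothetical helper, not part of ART's API.

```typescript
// Return the first stage in pipeline order that has no checkpoint,
// or null when every stage has already completed.
function firstIncompleteStage(stageOrder: string[], completed: string[]): string | null {
  for (const stage of stageOrder) {
    if (completed.indexOf(stage) === -1) {
      return stage;
    }
  }
  return null;
}

// Example: build and test finished before the interrupt, so resume at review.
const resumeAt = firstIncompleteStage(
  ["build", "test", "review", "history"],
  ["build", "test"]
);
```

Restoring the "previous context" mentioned above would additionally require persisting each stage's outputs, which this sketch leaves out.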


Customizing Pipelines

art compose /my/project

Opens a ComfyUI-style browser-based visual editor (React + ReactFlow) where you can:

  • Drag-and-drop stage nodes and wire them with transition edges
  • Configure per-stage: prompt, mount policies (rw/ro/hidden), container image
  • Browse your project's mount tree and override sub-directory permissions
  • Pick from preset base images (Ubuntu, CUDA, Python, Node, ROS)
  • Chat with an AI agent to collaboratively design your plan
  • Review diffs with hunk-based AI edit suggestions

Security

Agents run in containers with minimal access:

  • File-level mount permissions: the project defaults to read-only; write access is granted per stage
  • .env shadowed with /dev/null: secrets are never exposed inside containers
  • Credential proxy: containers never see real API keys; a host-side proxy injects credentials per request
  • Per-stage isolation: each stage gets an independent mount configuration
  • Mount allowlist: additional mounts are validated against an external allowlist
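
The mount policies listed above could translate into container flags roughly as follows. This is a sketch under several assumptions, not ART's implementation: the `/workspace` mount point, the `MountRule` shape, and the use of an empty tmpfs to realize the "hidden" policy are all hypothetical. Only the `-v host:container:mode` and `--tmpfs` flags themselves are standard `docker run` options.

```typescript
type MountPolicy = "rw" | "ro" | "hidden";

// Hypothetical rule shape: a path relative to the project root ("" is the root itself).
interface MountRule {
  relativePath: string;
  policy: MountPolicy;
}

// Build the mount-related arguments for a `docker run` invocation.
function dockerMountArgs(projectDir: string, rules: MountRule[]): string[] {
  const containerRoot = "/workspace"; // assumed mount point inside the container
  const args: string[] = [];
  for (const rule of rules) {
    const suffix = rule.relativePath === "" ? "" : "/" + rule.relativePath;
    const target = containerRoot + suffix;
    if (rule.policy === "hidden") {
      // Assumption: hide a directory by overlaying an empty tmpfs on it.
      args.push("--tmpfs", target);
    } else {
      // Bind-mount the host path read-write or read-only.
      args.push("-v", `${projectDir}${suffix}:${target}:${rule.policy}`);
    }
  }
  // Shadow .env with /dev/null so the agent sees an empty file instead of secrets.
  args.push("-v", `/dev/null:${containerRoot}/.env:ro`);
  return args;
}
```

For example, a stage could mount the project root as `ro`, `__art__/src` as `rw`, and a private data directory as `hidden`, so the agent writes only where the stage allows.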

ART is designed to reduce accidental access and constrain agent execution, but it is not a formal sandbox. See docs/SECURITY.md for the full trust model and known limitations.


CLI Reference

art compose <path>              # Open visual pipeline editor
art compose --headless <path>   # One-shot planning agent (no browser, CI-friendly)
art run <path>                  # Execute pipeline
art run --skip-preflight <path> # Skip Claude CLI/auth check (command-mode only)
art update                      # Rebuild all images in the registry

Status

ART is under active development. Core pipeline execution, the visual editor, and container isolation are functional. The API surface may change between minor versions.

Supported: Linux, macOS · Not supported: Windows (use WSL)


Documentation

| Document | Content |
| --- | --- |
| docs/PIPELINE-REFERENCE.md | PIPELINE.json field reference: stages, mounts, transitions, command mode |
| docs/ARCHITECTURE.md | System architecture: pipeline FSM, container runtime, mount isolation |
| docs/REQUIREMENTS.md | Design philosophy and decisions |
| docs/SECURITY.md | Trust model, mount isolation, credential proxy |
| docs/TESTING.md | Test files, mocking patterns, E2E tests, CI configuration |

Development

git clone https://github.com/aer-org/art.git
cd art
npm install
npm run build        # Compile TypeScript
npm run dev          # Watch mode
./container/build.sh # Rebuild agent container
npm test             # Unit tests
npm run test:e2e     # E2E tests (Docker required)

License

Released under Apache-2.0.
