diff --git a/multi-agent/deploy/README.md b/multi-agent/deploy/README.md new file mode 100644 index 0000000..effef5f --- /dev/null +++ b/multi-agent/deploy/README.md @@ -0,0 +1,22 @@ +# deploy + +Production bring-up templates for the agents in `cmd/`. Unlike `examples/` +(which are end-to-end Go demos), each subdirectory here ships an installer +script and config templates you point at a real host. + +| Path | Target | +|---|---| +| [`linux/observer`](linux/observer/) | Generic `observer-server` install. SQLite-backed HTTP daemon (default `:8090`); foreground or `--systemd`. amd64 / arm64. | +| [`linux/driver`](linux/driver/) | Generic `driver-agent` install into a Claude Code project dir (no systemd; Claude Code launches the MCP server on demand). | +| [`linux/slave`](linux/slave/) | Generic `slave-agent` install on any Linux host. Foreground smoke mode or `--systemd` for a managed service. amd64 / arm64. | +| [`linux/compose-test`](linux/compose-test/) | docker-compose end-to-end test wiring all three installers together against a local observer; surfaces the device-code "join workspace" URLs each role prints on first start. | + +Pre-built binaries for each release are published at +https://github.com/agentserver/loom/releases. Each `install.sh` accepts +`--bin PATH` to point at a downloaded asset; otherwise it looks in `../bin/` +relative to itself. + +For the pre-wired prod-test bundle (`driver-prod`, `slave-jetson-prod`, +`slave-local-prod` against `agent.cs.ac.cn` / `ws-prod`), see +[`../tests/prod_test/`](../tests/prod_test/); that bundle is for the +project's own staging environment and is gitignored. diff --git a/multi-agent/deploy/linux/bin/.gitignore b/multi-agent/deploy/linux/bin/.gitignore new file mode 100644 index 0000000..58dff62 --- /dev/null +++ b/multi-agent/deploy/linux/bin/.gitignore @@ -0,0 +1,4 @@ +# Binaries land here at deploy time. Pull pre-built ones from +# https://github.com/agentserver/loom/releases or build into this dir. 
+* +!.gitignore diff --git a/multi-agent/deploy/linux/compose-test/.gitignore b/multi-agent/deploy/linux/compose-test/.gitignore new file mode 100644 index 0000000..ed27d65 --- /dev/null +++ b/multi-agent/deploy/linux/compose-test/.gitignore @@ -0,0 +1,2 @@ +# Volumes / runtime state if you ever bind-mount instead of using named volumes +state/ diff --git a/multi-agent/deploy/linux/compose-test/Dockerfile b/multi-agent/deploy/linux/compose-test/Dockerfile new file mode 100644 index 0000000..8613d19 --- /dev/null +++ b/multi-agent/deploy/linux/compose-test/Dockerfile @@ -0,0 +1,13 @@ +FROM debian:bookworm-slim + +# install.sh uses sudo (no-op as root, but the binary must exist), xxd for +# random keygen, curl for healthchecks, ca-certs for HTTPS to agent.cs.ac.cn. +RUN apt-get update && apt-get install -y --no-install-recommends \ + bash sudo ca-certificates curl python3 xxd \ + && rm -rf /var/lib/apt/lists/* + +# Compose bind-mounts deploy/linux/ at /opt/loom/deploy at runtime. +# Each container's per-instance dir lives under /var/lib/loom/. +WORKDIR /var/lib/loom + +ENV API_KEY="COMPOSE_TESTKEY_dont_use_in_prod" diff --git a/multi-agent/deploy/linux/compose-test/README.md b/multi-agent/deploy/linux/compose-test/README.md new file mode 100644 index 0000000..4241d07 --- /dev/null +++ b/multi-agent/deploy/linux/compose-test/README.md @@ -0,0 +1,114 @@ +# compose-test + +End-to-end smoke test for the three `deploy/linux/` templates +(observer / driver / slave). Spins up a local observer in one container, +then exercises the slave and driver installers against it in two more +containers, and surfaces the device-code "join workspace" URL each agent +prints on first start. + +## What it verifies + +1. `observer/install.sh` renders `observer.yaml`, stages the binary, and + the daemon opens a TCP listener on `:8090`. +2. 
`slave/install.sh` renders config (with `--observer-url` / + `--workspace`), stages the binary, and `slave-agent` boots and reaches + the device-code OAuth step. +3. `driver/install.sh` renders the project (`config.yaml` + `.mcp.json`) + and `driver-agent register` reaches the device-code step. +4. The same `api-key` flows: observer's workspace bootstrap key ↔ + slave / driver `observer.api_key` ↔ per-agent token mint. + +Steps that require human interaction (clicking the device-code URLs) are +left as the operator's job — the test surfaces the URLs prominently. + +## What it does NOT cover + +- `claude` CLI inside the slave container — `chat`-skill tasks won't run. + Add `claude` (and `ANTHROPIC_API_KEY` via `--anthropic-key`) if you want + to exercise that path. +- The driver's `serve-mcp` step. Driver stops at `register` because + `serve-mcp` is invoked by Claude Code's `.mcp.json`, not directly. +- Reachability to `agent.cs.ac.cn` — the tunnel registration is a hard + external dependency. If your sandbox blocks that host, both driver and + slave will stall at the device-code step. + +## Prereqs + +1. Docker + `docker compose` v2 (or the legacy `docker-compose` binary). +2. The three `linux-amd64` binaries dropped into `../bin/`: + ```bash + cd ../bin + for n in observer-server driver-agent slave-agent; do + curl -L -o "$n.linux-amd64" \ + "https://github.com/agentserver/loom/releases/download/v0.0.1/$n.linux-amd64" + chmod +x "$n.linux-amd64" + done + ``` + Or build them with `make` / `go build` (see the per-role install.sh + error messages for the exact `go build` commands). + +## Run + +```bash +cd deploy/linux/compose-test +docker compose up --build +``` + +Expected output (interleaved across services): + +``` +loom-observer | ==> creating /var/lib/loom/compose-observer +loom-observer | ==> done. +loom-observer | ============================================== +loom-observer | Observer is up. 
Wire other agents with: +loom-observer | observer.url: http://observer:8090 +loom-observer | observer.workspace_id: ws-test +loom-observer | observer.api_key: COMPOSE_TESTKEY_dont_use_in_prod +loom-observer | ============================================== +loom-observer | 2026/05/21 17:30:00 observer-server listening on :8090 +loom-slave | ==> creating /var/lib/loom/compose-slave +loom-slave | ============================================== +loom-slave | slave: deploy succeeded. Starting slave-agent. +loom-slave | ============================================== +loom-slave | +loom-slave | Open this URL to authenticate: +loom-slave | +loom-slave | https://agent.cs.ac.cn/oauth2/device/verify?user_code=VhspjQLp +loom-slave | +loom-driver | Open this URL to register "compose-driver": +loom-driver | https://agent.cs.ac.cn/oauth2/device/verify?user_code=7AUTGKNs +``` + +Visit each URL in a browser, approve. After approval: + +- driver's `register` will exit 0 and the `loom-driver` container exits. +- slave's `slave-agent` continues running, mints an observer token, and + publishes its capability card. You can verify with: + ```bash + curl -sS -H "Authorization: Bearer COMPOSE_TESTKEY_dont_use_in_prod" \ + http://127.0.0.1:18090/api/agents | python3 -m json.tool + ``` + +## Tear down + +```bash +docker compose down -v # -v wipes the per-instance dirs in named volumes +docker image rm loom-deploy-test:latest +``` + +## Files + +| Path | Purpose | +|---|---| +| `Dockerfile` | debian:bookworm-slim + bash/sudo/curl/ca-certs/python3/xxd | +| `docker-compose.yml` | three services bind-mounting `../` and per-role entrypoints | +| `entrypoint-observer.sh` | runs `observer/install.sh` then execs `observer-server` | +| `entrypoint-driver.sh` | runs `driver/install.sh` then execs `driver-agent register` | +| `entrypoint-slave.sh` | runs `slave/install.sh` then execs `slave-agent` | + +## Tweaking the test + +Change `API_KEY` in `Dockerfile` and rebuild. 
Add a second slave by +duplicating the `slave:` service block with a different `container_name` +and `INSTANCE` (edit `entrypoint-slave.sh` to read INSTANCE from env, or +copy it to `entrypoint-slave-2.sh`). diff --git a/multi-agent/deploy/linux/compose-test/docker-compose.yml b/multi-agent/deploy/linux/compose-test/docker-compose.yml new file mode 100644 index 0000000..41fced0 --- /dev/null +++ b/multi-agent/deploy/linux/compose-test/docker-compose.yml @@ -0,0 +1,69 @@ +name: loom-deploy-test + +# Verifies the three deploy templates (observer / driver / slave) work +# end-to-end against a local observer. Each service runs the actual +# deploy/linux//install.sh inside a debian container, then execs +# the binary in foreground. +# +# The driver and slave hit https://agent.cs.ac.cn for tunnel registration on +# first start. Each will print a "device-code" verification URL — visit it +# in a browser and approve to advance the deploy. +# +# Bind-mounts: this file lives at deploy/linux/compose-test/, so `../` is +# deploy/linux/ and `../bin` is where the binaries are expected. +# +# Usage: +# 1. Drop pre-built or downloaded binaries into deploy/linux/bin/ +# (observer-server.linux-amd64, driver-agent.linux-amd64, slave-agent.linux-amd64) +# 2. cd deploy/linux/compose-test +# 3. docker compose up --build +# 4. Watch the logs — each role prints the URL you need to visit. + +services: + observer: + build: + context: . + dockerfile: Dockerfile + # Use the host network during build so apt-get can reach deb.debian.org + # in sandboxes where the docker daemon's default build network has no DNS. 
+ network: host + image: loom-deploy-test:latest + container_name: loom-observer + ports: + - "127.0.0.1:18090:8090" + volumes: + - ../:/opt/loom/deploy:ro + command: ["bash", "/opt/loom/deploy/compose-test/entrypoint-observer.sh"] + healthcheck: + test: ["CMD-SHELL", "bash -c 'echo > /dev/tcp/127.0.0.1/8090' 2>/dev/null || exit 1"] + interval: 2s + timeout: 1s + retries: 30 + start_period: 5s + init: true + + driver: + image: loom-deploy-test:latest + container_name: loom-driver + depends_on: + observer: + condition: service_healthy + volumes: + - ../:/opt/loom/deploy:ro + command: ["bash", "/opt/loom/deploy/compose-test/entrypoint-driver.sh"] + init: true + stdin_open: true + tty: true + + slave: + image: loom-deploy-test:latest + container_name: loom-slave + depends_on: + observer: + condition: service_healthy + volumes: + - ../:/opt/loom/deploy:ro + command: ["bash", "/opt/loom/deploy/compose-test/entrypoint-slave.sh"] + init: true + stdin_open: true + tty: true diff --git a/multi-agent/deploy/linux/compose-test/entrypoint-driver.sh b/multi-agent/deploy/linux/compose-test/entrypoint-driver.sh new file mode 100755 index 0000000..b4aa083 --- /dev/null +++ b/multi-agent/deploy/linux/compose-test/entrypoint-driver.sh @@ -0,0 +1,55 @@ +#!/usr/bin/env bash +# Compose-test entrypoint for the driver. +# 1. Sanity-check the bind-mounted binary +# 2. Run deploy/linux/driver/install.sh to render project dir + config + .mcp.json +# 3. Run `driver-agent register` — blocks on a device-code URL printed on stdout +# 4. After approval, register exits 0; the container exits (the next step, +# `claude` opening the .mcp.json, is up to the operator) + +set -euo pipefail + +INSTANCE=compose-driver +PROJECT=/var/lib/loom/$INSTANCE +BIN=/opt/loom/deploy/bin/driver-agent.linux-amd64 + +if [[ ! 
-x "$BIN" ]]; then + cat <&2 +ERROR: missing $BIN + Drop the driver binary into deploy/linux/bin/ before 'docker compose up': + curl -L -o deploy/linux/bin/driver-agent.linux-amd64 \\ + https://github.com/agentserver/loom/releases/download/v0.0.1/driver-agent.linux-amd64 + chmod +x deploy/linux/bin/driver-agent.linux-amd64 +EOF + exit 1 +fi + +# install.sh's --skill-bundle default path doesn't resolve inside the +# container's mount layout; pass empty (no bundle) explicitly. +/opt/loom/deploy/driver/install.sh \ + --project "$PROJECT" \ + --name "$INSTANCE" \ + --observer-url "http://observer:8090" \ + --workspace ws-test \ + --api-key "$API_KEY" \ + --token-dir "$PROJECT" \ + --skill-bundle "" \ + --bin "$BIN" + +cat <&2 +ERROR: missing $BIN + Drop the observer binary into deploy/linux/bin/ before 'docker compose up': + curl -L -o deploy/linux/bin/observer-server.linux-amd64 \\ + https://github.com/agentserver/loom/releases/download/v0.0.1/observer-server.linux-amd64 + chmod +x deploy/linux/bin/observer-server.linux-amd64 +EOF + exit 1 +fi + +/opt/loom/deploy/observer/install.sh \ + --name "$INSTANCE" \ + --user root \ + --loom-home "$LOOM" \ + --listen ":8090" \ + --workspace ws-test \ + --workspace-name "Compose Test Workspace" \ + --api-key "$API_KEY" \ + --bin "$BIN" + +cat <&2 +ERROR: missing $BIN + Drop the slave binary into deploy/linux/bin/ before 'docker compose up': + curl -L -o deploy/linux/bin/slave-agent.linux-amd64 \\ + https://github.com/agentserver/loom/releases/download/v0.0.1/slave-agent.linux-amd64 + chmod +x deploy/linux/bin/slave-agent.linux-amd64 +EOF + exit 1 +fi + +/opt/loom/deploy/slave/install.sh \ + --name "$INSTANCE" \ + --user root \ + --loom-home "$LOOM" \ + --observer-url "http://observer:8090" \ + --workspace ws-test \ + --api-key "$API_KEY" \ + --tag compose --tag test \ + --bin "$BIN" + +cat <` (override with `--bin PATH`). 
+ ```bash + # Option A — pre-built (replace amd64 with arm64 for aarch64 hosts) + mkdir -p ../bin && curl -L -o ../bin/driver-agent.linux-amd64 \ + https://github.com/agentserver/loom/releases/download/v0.0.1/driver-agent.linux-amd64 + chmod +x ../bin/driver-agent.linux-amd64 + + # Option B — build from source (from multi-agent/ ) + CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -trimpath -ldflags='-s -w' \ + -o deploy/linux/bin/driver-agent.linux-amd64 ./cmd/driver-agent + ``` +2. **Claude Code** installed locally — `claude` on `PATH`. +3. **Shared ws-prod observer api-key** — pass via `--api-key` or hand-edit + `config.yaml` post-install. +4. A target **project directory** where you'll run `claude`. + +## Quick start + +```bash +./install.sh \ + --project ~/code/my-driver \ + --name driver-myhost \ + --observer-url http://observer.example.com:8090 \ + --workspace ws-prod \ + --api-key 'de4a8e22…' + +# one-time agentserver registration (device-code OAuth) +~/code/my-driver/driver-agent register --config ~/code/my-driver/config.yaml +# → open the printed verification URL; creds get written back into config.yaml + +cd ~/code/my-driver +claude +# In the Claude prompt: +# mcp__driver__list_agents +# expect your slaves to appear (after they've registered too) +``` + +## Flag reference + +| Flag | Default | Notes | +|---|---|---| +| `--project PATH` | (required) | Target dir; created if absent. Holds binary, config, `.mcp.json`, `.claude/`, `logs/`. | +| `--name NAME` | (required) | `discovery.display_name` and `observer.agent_id`. | +| `--observer-url URL` | (required) | `observer.url`, e.g. `http://observer.example.com:8090`. | +| `--workspace ID` | `ws-default` | `observer.workspace_id`. Must match a workspace defined on the observer. | +| `--desc TEXT` | `Linux driver-agent ()` | `discovery.description`. | +| `--api-key KEY` | (none) | Writes `observer.api_key`. Without this, edit `config.yaml` by hand. 
| +| `--skill-bundle PATH` | `../../../tests/prod_test/driver/.claude/skills/multiagent` if present | Skill dir to copy under `/.claude/skills/`. | +| `--token-dir PATH` | `~/.loom/` | Parent dir for `observer.token`. Must be absolute. | +| `--bin PATH` | `../bin/driver-agent.linux-` | Override the binary path (e.g., point at a downloaded release asset). | + +## Project layout after install + +``` +/ +├── driver-agent # binary, 0755 +├── config.yaml # 0600 — server, observer creds, driver_defaults +├── .mcp.json # Claude Code MCP server registration +├── .claude/ +│ └── skills/ +│ └── multiagent/ # only if --skill-bundle resolved +└── logs/ # audit logs (driver_defaults.audit_log_dir) + +~/.loom// +└── observer.token # 0600 — written on first start by observerclient +``` + +## Why no systemd unit? + +The driver process is owned by Claude Code via the project's `.mcp.json`. +Claude starts it on session open, talks to it over stdio, and tears it down +on exit. Running it under systemd would create a second copy that fights +for the same observer agent_id. + +If you need the driver MCP server up independent of any Claude session +(e.g., for testing), launch it manually: + +```bash +cd +./driver-agent serve-mcp --config ./config.yaml +``` + +## Reset / re-registration + +- **Rotate observer per-agent token** — `rm ~/.loom//observer.token` + and re-launch; agent re-registers and the old token is invalidated. +- **Rotate agentserver sandbox** — blank out `credentials.sandbox_id` and + `credentials.tunnel_token` in `config.yaml`, then re-run `driver-agent + register`. +- **Full cleanup** — `rm -rf ~/.loom/`. 
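The "Rotate agentserver sandbox" step above asks you to blank two fields in `config.yaml` by hand. A minimal sketch of how that edit might be scripted follows. The helper name `blank_sandbox_creds` and the sample credential values are hypothetical (they are not part of this repo), and the sketch assumes GNU `sed` and that each key sits on its own line, as it does in `config.yaml.template`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not shipped with the installers): blank the two
# agentserver credentials so a later `driver-agent register` can refill them.
# Assumes GNU sed (-i) and one `key: "value"` pair per line.
blank_sandbox_creds() {
  local config="$1"
  sed -i \
    -e 's|^\([[:space:]]*sandbox_id:\).*|\1 ""|' \
    -e 's|^\([[:space:]]*tunnel_token:\).*|\1 ""|' \
    "$config"
}

# Demo on a throwaway file with made-up values, so no real config is touched.
demo="$(mktemp)"
printf 'credentials:\n  sandbox_id: "sbx-123"\n  tunnel_token: "tok-abc"\n  proxy_token: "keep-me"\n' > "$demo"
blank_sandbox_creds "$demo"
cat "$demo"   # sandbox_id and tunnel_token are now ""; proxy_token is unchanged
rm -f "$demo"
```

After blanking the fields in the real file, re-run `driver-agent register --config` against it as described in the bullet above. Keeping the edit to those two keys leaves the observer settings and the other credential fields as they were.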
diff --git a/multi-agent/deploy/linux/driver/config.yaml.template b/multi-agent/deploy/linux/driver/config.yaml.template new file mode 100644 index 0000000..f8bdb7f --- /dev/null +++ b/multi-agent/deploy/linux/driver/config.yaml.template @@ -0,0 +1,49 @@ +server: + url: https://agent.cs.ac.cn + name: __AGENT_NAME__ + +# Auto-filled by `driver-agent register`: device-code OAuth flow against +# server.url writes the issued sandbox / tunnel / proxy creds back into +# THIS file. Run `register` once before launching the MCP server. +credentials: + sandbox_id: "" + tunnel_token: "" + proxy_token: "" + workspace_id: "" + short_id: "" + +discovery: + display_name: __AGENT_NAME__ + description: __DESCRIPTION__ + skills: [] + +listen_addr: 127.0.0.1:0 + +planner: + bin: claude + timeout_sec: 300 + extra_args: [] + +fanout: + max_concurrency: 2 + default_policy: "" + policy_by_skill: {} + subtask_defaults: + timeout_sec: 600 + max_budget_usd: 0 + +driver_defaults: + target_display_name: "" + task_timeout_sec: 600 + audit_log_dir: ./logs + disable_uid_check: true + max_dir_cache_entries: 50000 + artifact_transport: observer_lazy + +observer: + enabled: true + url: __OBSERVER_URL__ + workspace_id: __WORKSPACE_ID__ + agent_id: __AGENT_NAME__ + api_key: "" # paste the workspace bootstrap api-key here, then chmod 0600 this file + token_state_path: __LOOM_HOME__/observer.token diff --git a/multi-agent/deploy/linux/driver/install.sh b/multi-agent/deploy/linux/driver/install.sh new file mode 100755 index 0000000..8a50090 --- /dev/null +++ b/multi-agent/deploy/linux/driver/install.sh @@ -0,0 +1,157 @@ +#!/usr/bin/env bash +# Generic Linux driver install — sets up a Claude Code project that hosts +# the driver MCP server. +# +# Unlike the slave, the driver is NOT a long-running daemon: Claude Code +# launches `driver-agent serve-mcp` on demand via the project's .mcp.json, +# and shuts it down when the Claude session ends. 
So there's no systemd +# unit here — just a project directory with binary + config + .mcp.json. +# +# What it does: +# 1. Detects host arch (amd64 / arm64), picks ../bin/driver-agent.linux-. +# 2. Renders config.yaml + .mcp.json from templates into PROJECT_DIR. +# 3. Pre-creates the observer token-state parent dir. +# 4. Prints the one-time `register` command. +# +# Usage: +# ./install.sh --project ~/code/my-driver --name driver-myhost +# ./install.sh --project ~/code/my-driver --name driver-myhost --api-key 'de4a8e22…' +# +# Flags: +# --project PATH target project dir; will be created (REQUIRED) +# --name NAME agent display name (REQUIRED) +# --observer-url URL observer.url, e.g. http://observer.example.com:8090 (REQUIRED) +# --workspace ID observer.workspace_id (default: ws-default) +# --desc TEXT discovery.description +# --api-key KEY observer.api_key (skip manual edit; or hand-edit after) +# --skill-bundle PATH copy a multiagent skill bundle into .claude/skills/ +# (default: ../../../tests/prod_test/driver/.claude/skills/multiagent if present) +# --token-dir PATH observer token parent dir (default: ~/.loom/) +# --bin PATH override driver-agent binary path +# (default: ../bin/driver-agent.linux-{arch}) +# +# Binary download: +# https://github.com/agentserver/loom/releases/download/v0.0.1/driver-agent.linux-amd64 +# https://github.com/agentserver/loom/releases/download/v0.0.1/driver-agent.linux-arm64 + +set -euo pipefail + +HERE="$(cd "$(dirname "$0")" && pwd)" +BIN_DIR="$HERE/../bin" + +PROJECT="" +NAME="" +DESC="" +API_KEY="" +SKILL_BUNDLE="" +TOKEN_DIR="" +BIN_OVERRIDE="" +OBSERVER_URL="" +WORKSPACE_ID="ws-default" + +while (( $# )); do + case "$1" in + --project) PROJECT="$2"; shift 2 ;; + --name) NAME="$2"; shift 2 ;; + --desc) DESC="$2"; shift 2 ;; + --api-key) API_KEY="$2"; shift 2 ;; + --skill-bundle) SKILL_BUNDLE="$2"; shift 2 ;; + --token-dir) TOKEN_DIR="$2"; shift 2 ;; + --bin) BIN_OVERRIDE="$2"; shift 2 ;; + --observer-url) OBSERVER_URL="$2"; shift 2 
;; + --workspace) WORKSPACE_ID="$2"; shift 2 ;; + -h|--help) sed -n '2,40p' "$0"; exit 0 ;; + *) echo "unknown flag: $1" >&2; exit 2 ;; + esac +done + +[[ -n "$PROJECT" ]] || { echo "ERROR: --project is required" >&2; exit 2; } +[[ -n "$NAME" ]] || { echo "ERROR: --name is required" >&2; exit 2; } +[[ -n "$OBSERVER_URL" ]] || { echo "ERROR: --observer-url is required (e.g. http://observer.example.com:8090)" >&2; exit 2; } + +arch="$(uname -m)" +case "$arch" in + x86_64|amd64) CPU_ARCH=amd64 ;; + aarch64|arm64) CPU_ARCH=arm64 ;; + *) echo "ERROR: unsupported arch $arch" >&2; exit 2 ;; +esac +BIN_NAME="driver-agent.linux-$CPU_ARCH" +BIN="${BIN_OVERRIDE:-$BIN_DIR/$BIN_NAME}" +[[ -x "$BIN" ]] || { + echo "ERROR: missing $BIN" >&2 + echo " download: curl -L -o $BIN_DIR/$BIN_NAME \\" >&2 + echo " https://github.com/agentserver/loom/releases/download/v0.0.1/$BIN_NAME && chmod +x $BIN_DIR/$BIN_NAME" >&2 + echo " or build from multi-agent/ :" >&2 + echo " CGO_ENABLED=0 GOOS=linux GOARCH=$CPU_ARCH go build -trimpath -ldflags='-s -w' \\" >&2 + echo " -o deploy/linux/bin/$BIN_NAME ./cmd/driver-agent" >&2 + exit 2 +} + +DESC="${DESC:-Linux driver-agent ($NAME)}" +TOKEN_DIR="${TOKEN_DIR:-$HOME/.loom/$NAME}" +PROJECT_ABS="$(mkdir -p "$PROJECT" && cd "$PROJECT" && pwd)" + +# Default skill bundle = the multiagent skill shipped with the prod_test driver +if [[ -z "$SKILL_BUNDLE" && -d "$HERE/../../../tests/prod_test/driver/.claude/skills/multiagent" ]]; then + SKILL_BUNDLE="$HERE/../../../tests/prod_test/driver/.claude/skills/multiagent" +fi + +echo "==> staging into $PROJECT_ABS" +install -m 0755 "$BIN" "$PROJECT_ABS/driver-agent" + +sed \ + -e "s|__AGENT_NAME__|$NAME|g" \ + -e "s|__DESCRIPTION__|$DESC|g" \ + -e "s|__LOOM_HOME__|$TOKEN_DIR|g" \ + -e "s|__OBSERVER_URL__|$OBSERVER_URL|g" \ + -e "s|__WORKSPACE_ID__|$WORKSPACE_ID|g" \ + "$HERE/config.yaml.template" > "$PROJECT_ABS/config.yaml" +chmod 0600 "$PROJECT_ABS/config.yaml" + +sed \ + -e "s|__PROJECT_DIR__|$PROJECT_ABS|g" \ + 
"$HERE/.mcp.json.template" > "$PROJECT_ABS/.mcp.json" + +if [[ -n "$API_KEY" ]]; then + sed -i "s|api_key: \"\"|api_key: \"$API_KEY\"|" "$PROJECT_ABS/config.yaml" +fi + +if [[ -n "$SKILL_BUNDLE" && -d "$SKILL_BUNDLE" ]]; then + echo "==> copying skill bundle from $SKILL_BUNDLE" + mkdir -p "$PROJECT_ABS/.claude/skills" + cp -r "$SKILL_BUNDLE" "$PROJECT_ABS/.claude/skills/" +fi + +mkdir -p "$TOKEN_DIR" +chmod 0700 "$TOKEN_DIR" +mkdir -p "$PROJECT_ABS/logs" + +cat < project ready at $PROJECT_ABS + Files: + driver-agent # binary (amd64) + config.yaml # 0600 — paste observer.api_key if you didn't pass --api-key + .mcp.json # tells Claude Code how to launch the MCP server + .claude/skills/... # multiagent skill bundle (if found) + logs/ # audit logs land here + +==> one-time agentserver registration (device-code OAuth): + $PROJECT_ABS/driver-agent register --config $PROJECT_ABS/config.yaml + Open the printed verification URL in a browser; creds get written back into + config.yaml. + +EOF + +if [[ -z "$API_KEY" ]]; then + echo "==> WARN: observer.api_key is empty in config.yaml — fill it in before launching Claude Code." + echo +fi + +cat < launch: + cd $PROJECT_ABS + claude # Claude Code will start the driver MCP server on demand + Then in the Claude prompt: + mcp__driver__list_agents +EOF diff --git a/multi-agent/deploy/linux/observer/README.md b/multi-agent/deploy/linux/observer/README.md new file mode 100644 index 0000000..e6ad90c --- /dev/null +++ b/multi-agent/deploy/linux/observer/README.md @@ -0,0 +1,139 @@ +# linux-observer + +Generic `observer-server` bring-up for any Linux host. Observer is a single +HTTP daemon backed by SQLite — drivers and slaves POST `/api/agents/register` +with a workspace `api_key` to mint a per-agent token, then push telemetry to +`/api/events`. + +For the prod-test observer instance at `39.104.86.73`, see the operator +notes in `../../../tests/prod_test/README.md` (not version-controlled). 
+ +## What you get + +| File | Purpose | +|---|---| +| `install.sh` | Detects arch, renders templates, drops binary + config into `~/.loom//`, optional systemd unit | +| `config.yaml.template` | Observer config — listen addr, db path, one workspace + one bootstrap api-key | +| `observer-server.service.template` | Systemd unit with hardening (`NoNewPrivileges`, `ProtectSystem=full`, `ReadWritePaths=`) | + +## Prereqs + +1. **Binary** at `../bin/observer-server.linux-` (override with `--bin PATH`). + ```bash + # Option A — pre-built (replace amd64 with arm64) + mkdir -p ../bin && curl -L -o ../bin/observer-server.linux-amd64 \ + https://github.com/agentserver/loom/releases/download/v0.0.1/observer-server.linux-amd64 + chmod +x ../bin/observer-server.linux-amd64 + + # Option B — build from source (from multi-agent/ ) + CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -trimpath -ldflags='-s -w' \ + -o deploy/linux/bin/observer-server.linux-amd64 ./cmd/observer-server + ``` +2. A free TCP port (default `:8090`). Adjust with `--listen`. +3. (Optional) sudo, if you want systemd to manage the service. + +## Quick start + +```bash +# foreground smoke test (current user, no systemd) +./install.sh --name obs-dev +# → prints a generated bootstrap api-key. Save it. +# → tells you the exact foreground launch command. + +# production: systemd-managed, dedicated user, fixed workspace and api-key +sudo useradd -m -s /bin/bash loom # if it doesn't exist +./install.sh \ + --name obs-prod \ + --user loom \ + --systemd \ + --workspace ws-prod \ + --workspace-name "Production" \ + --api-key "$(head -c 32 /dev/urandom | xxd -p -c 64)" + +# verify it's up +curl -sS -o /dev/null -w 'http=%{http_code}\n' http://127.0.0.1:8090/ +sudo journalctl -u observer-server-obs-prod.service -f --since '1 min ago' +``` + +## Flag reference + +| Flag | Default | Notes | +|---|---|---| +| `--name NAME` | (required) | Instance name; becomes unit name (`observer-server-.service`) and install dir suffix. 
| +| `--user USER` | `$USER` | Service user. Must already exist; `$HOME` is read from `/etc/passwd`. | +| `--loom-home PATH` | `/.loom/` | Install dir. Holds binary, `observer.yaml`, `observer.db`, `observer.log`. | +| `--systemd` | off | Install `/etc/systemd/system/observer-server-.service` (sudo). Without this, you start the binary yourself. | +| `--listen ADDR` | `:8090` | `listen_addr` in observer.yaml. Use `:port` for all interfaces, `127.0.0.1:port` for loopback only. | +| `--workspace ID` | `ws-default` | First workspace's `id`. | +| `--workspace-name TEXT` | same as `--workspace` | Workspace `name` (display). | +| `--api-key KEY` | (random hex) | Bootstrap api-key for that workspace. If omitted, a 32-byte hex key is generated and printed once. | +| `--bin PATH` | `../bin/observer-server.linux-` | Override the binary path. | + +## Layout after install + +``` +~/.loom// +├── observer-server # binary, 0755 +├── observer.yaml # 0600 — listen addr, db path, workspaces + api_keys +├── observer.db # SQLite — created on first start +├── observer.db-wal # WAL, journal +├── observer.db-shm # shared mem +└── observer.log # service stdout+stderr (if --systemd) + +/etc/systemd/system/observer-server-.service # if --systemd, 0644 +``` + +## Wiring slaves and drivers + +Each slave / driver `config.yaml` references the observer through: + +```yaml +observer: + enabled: true + url: http://:8090 + workspace_id: + agent_id: + api_key: + token_state_path: /absolute/path/to/.token +``` + +On first start the agent POSTs `/api/agents/register` with the bootstrap +`api_key` as Bearer; the server mints a per-agent token and the slave writes +it to `token_state_path` (mode 0600). Subsequent `/api/events` calls use +that per-agent token automatically. + +## Multiple workspaces / multiple keys + +The `install.sh` only writes a single workspace × single api-key for +simplicity. 
To add more, edit `~/.loom//observer.yaml`: + +```yaml +workspaces: + - id: ws-prod + name: "Production" + api_keys: + - id: bootstrap + key: "..." + - id: ci-bootstrap # additional key for CI to register agents + key: "..." + - id: ws-dev + name: "Development" + api_keys: + - id: bootstrap + key: "..." +``` + +Then `sudo systemctl restart observer-server-.service`. Per-agent +tokens already issued under the old config remain valid — `api_keys` are +only used at registration time. + +## Reset / re-issue + +- **Rotate a bootstrap api-key** — edit `api_keys[*].key`, restart the + service. Existing per-agent tokens are unaffected. +- **Wipe an agent's per-agent token** — easier to do on the agent side + (`rm ` then restart) than via the API. +- **Full reset** — `sudo systemctl disable --now observer-server-.service` + (if used), `sudo rm /etc/systemd/system/observer-server-.service`, + `rm -rf ~/.loom//`. Note this wipes the SQLite DB → all per-agent + tokens, events, and audit history are lost. diff --git a/multi-agent/deploy/linux/observer/config.yaml.template b/multi-agent/deploy/linux/observer/config.yaml.template new file mode 100644 index 0000000..f2227b7 --- /dev/null +++ b/multi-agent/deploy/linux/observer/config.yaml.template @@ -0,0 +1,13 @@ +listen_addr: "__LISTEN_ADDR__" +db_path: __LOOM_HOME__/observer.db + +# Workspaces are the units other agents register against. Each workspace's +# api_keys[] entries are shared bootstrap keys: slaves and drivers POST one +# of them as Bearer on /api/agents/register to mint a per-agent token. +# Add more workspaces (or more api_keys per workspace) by hand after install. 
+workspaces: + - id: __WORKSPACE_ID__ + name: __WORKSPACE_NAME__ + api_keys: + - id: bootstrap + key: "__WS_APIKEY__" diff --git a/multi-agent/deploy/linux/observer/install.sh b/multi-agent/deploy/linux/observer/install.sh new file mode 100755 index 0000000..ba30d95 --- /dev/null +++ b/multi-agent/deploy/linux/observer/install.sh @@ -0,0 +1,165 @@ +#!/usr/bin/env bash +# Generic Linux observer-server installer. +# +# What it does: +# 1. Detects host arch (amd64 / arm64), picks ../bin/observer-server.linux-. +# 2. Renders observer.yaml from the template, filling in instance name, +# install dir, listen address, single workspace + bootstrap api-key. +# 3. Generates a random api-key if --api-key was not passed (printed once). +# 4. Copies the binary + config into LOOM_HOME. +# 5. With --systemd: installs the unit under /etc/systemd/system/, daemon-reloads, +# enables + starts the service. +# 6. Without --systemd: prints the foreground command so you can smoke-test. +# +# Usage: +# ./install.sh --name obs-prod # foreground-mode install +# ./install.sh --name obs-prod --systemd # also install systemd unit +# ./install.sh --name obs-prod --workspace ws-prod --api-key XX --systemd +# +# Flags: +# --name NAME instance name (REQUIRED); becomes unit name and install dir suffix +# --systemd install + enable systemd unit (needs sudo) +# --user USER service user (default: current $USER); home dir is read from /etc/passwd +# --loom-home PATH install dir (default: /.loom/) +# --listen ADDR listen_addr (default: ":8090") +# --workspace ID workspace.id (default: ws-default) +# --workspace-name TEXT workspace.name (default: same as --workspace) +# --api-key KEY bootstrap api-key for that workspace +# (default: 32-byte random hex printed to stdout) +# --bin PATH override observer-server binary path +# (default: ../bin/observer-server.linux-) +# +# Binary download: +# https://github.com/agentserver/loom/releases/download/v0.0.1/observer-server.linux-amd64 +# 
https://github.com/agentserver/loom/releases/download/v0.0.1/observer-server.linux-arm64 +# +# Prereqs: +# * Binary at ../bin/observer-server.linux- or --bin PATH +# * sudo if installing the systemd unit + +set -euo pipefail + +HERE="$(cd "$(dirname "$0")" && pwd)" +BIN_DIR="$HERE/../bin" + +NAME="" +SERVICE_USER="${USER:-$(id -un)}" +LOOM_HOME="" +USE_SYSTEMD=0 +LISTEN_ADDR=":8090" +WORKSPACE_ID="ws-default" +WORKSPACE_NAME="" +API_KEY="" +BIN_OVERRIDE="" + +while (( $# )); do + case "$1" in + --name) NAME="$2"; shift 2 ;; + --user) SERVICE_USER="$2"; shift 2 ;; + --loom-home) LOOM_HOME="$2"; shift 2 ;; + --systemd) USE_SYSTEMD=1; shift ;; + --listen) LISTEN_ADDR="$2"; shift 2 ;; + --workspace) WORKSPACE_ID="$2"; shift 2 ;; + --workspace-name) WORKSPACE_NAME="$2"; shift 2 ;; + --api-key) API_KEY="$2"; shift 2 ;; + --bin) BIN_OVERRIDE="$2"; shift 2 ;; + -h|--help) sed -n '2,40p' "$0"; exit 0 ;; + *) echo "unknown flag: $1" >&2; exit 2 ;; + esac +done + +[[ -n "$NAME" ]] || { echo "ERROR: --name is required" >&2; exit 2; } + +SERVICE_USER_HOME="$(getent passwd "$SERVICE_USER" | cut -d: -f6)" +[[ -n "$SERVICE_USER_HOME" ]] || { echo "ERROR: user $SERVICE_USER not found" >&2; exit 2; } +LOOM_HOME="${LOOM_HOME:-$SERVICE_USER_HOME/.loom/$NAME}" +WORKSPACE_NAME="${WORKSPACE_NAME:-$WORKSPACE_ID}" + +arch="$(uname -m)" +case "$arch" in + x86_64|amd64) CPU_ARCH=amd64 ;; + aarch64|arm64) CPU_ARCH=arm64 ;; + *) echo "ERROR: unsupported arch $arch" >&2; exit 2 ;; +esac +BIN_NAME="observer-server.linux-$CPU_ARCH" +BIN="${BIN_OVERRIDE:-$BIN_DIR/$BIN_NAME}" +[[ -x "$BIN" ]] || { + echo "ERROR: missing $BIN" >&2 + echo " download: curl -L -o $BIN_DIR/$BIN_NAME \\" >&2 + echo " https://github.com/agentserver/loom/releases/download/v0.0.1/$BIN_NAME && chmod +x $BIN_DIR/$BIN_NAME" >&2 + echo " or build from multi-agent/ :" >&2 + echo " CGO_ENABLED=0 GOOS=linux GOARCH=$CPU_ARCH go build -trimpath -ldflags='-s -w' \\" >&2 + echo " -o deploy/linux/bin/$BIN_NAME ./cmd/observer-server" 
>&2 + exit 2 +} + +# Generate a 32-byte random hex key if none was supplied +GENERATED_KEY=0 +if [[ -z "$API_KEY" ]]; then + API_KEY="$(head -c 32 /dev/urandom | xxd -p -c 64)" + GENERATED_KEY=1 +fi + +# Render config +CONFIG_OUT="$(mktemp)" +sed \ + -e "s|__LISTEN_ADDR__|$LISTEN_ADDR|g" \ + -e "s|__LOOM_HOME__|$LOOM_HOME|g" \ + -e "s|__WORKSPACE_ID__|$WORKSPACE_ID|g" \ + -e "s|__WORKSPACE_NAME__|$WORKSPACE_NAME|g" \ + -e "s|__WS_APIKEY__|$API_KEY|g" \ + "$HERE/config.yaml.template" > "$CONFIG_OUT" + +echo "==> creating $LOOM_HOME" +sudo -u "$SERVICE_USER" mkdir -p "$LOOM_HOME" +sudo install -o "$SERVICE_USER" -g "$SERVICE_USER" -m 0755 "$BIN" "$LOOM_HOME/observer-server" +sudo install -o "$SERVICE_USER" -g "$SERVICE_USER" -m 0600 "$CONFIG_OUT" "$LOOM_HOME/observer.yaml" +rm -f "$CONFIG_OUT" + +if (( USE_SYSTEMD )); then + UNIT_OUT="$(mktemp)" + sed \ + -e "s|__INSTANCE_NAME__|$NAME|g" \ + -e "s|__LOOM_HOME__|$LOOM_HOME|g" \ + -e "s|__SERVICE_USER__|$SERVICE_USER|g" \ + "$HERE/observer-server.service.template" > "$UNIT_OUT" + UNIT_PATH="/etc/systemd/system/observer-server-$NAME.service" + echo "==> installing $UNIT_PATH" + sudo install -o root -g root -m 0644 "$UNIT_OUT" "$UNIT_PATH" + rm -f "$UNIT_OUT" + sudo systemctl daemon-reload + sudo systemctl enable --now "observer-server-$NAME.service" + sleep 2 + sudo systemctl --no-pager status "observer-server-$NAME.service" | head -15 +fi + +cat <<EOF +==> done. + listen_addr: $LISTEN_ADDR + db_path: $LOOM_HOME/observer.db + config: $LOOM_HOME/observer.yaml (0600) +EOF + +if (( GENERATED_KEY )); then + cat <<EOF +==> Generated bootstrap api-key for workspace "$WORKSPACE_ID" — store it now, + it is also written into $LOOM_HOME/observer.yaml but won't be re-shown. + + WORKSPACE: $WORKSPACE_ID + API_KEY: $API_KEY + + Paste this api-key into each slave / driver's config.yaml as observer.api_key. +EOF +fi + +if (( ! USE_SYSTEMD )); then + cat <<EOF +==> foreground mode.
Start it manually: + sudo -u $SERVICE_USER $LOOM_HOME/observer-server --config $LOOM_HOME/observer.yaml + + To convert to a managed service later, re-run with --systemd. +EOF +fi diff --git a/multi-agent/deploy/linux/observer/observer-server.service.template b/multi-agent/deploy/linux/observer/observer-server.service.template new file mode 100644 index 0000000..1077645 --- /dev/null +++ b/multi-agent/deploy/linux/observer/observer-server.service.template @@ -0,0 +1,24 @@ +[Unit] +Description=Loom observer-server (__INSTANCE_NAME__) +After=network-online.target +Wants=network-online.target + +[Service] +Type=simple +User=__SERVICE_USER__ +WorkingDirectory=__LOOM_HOME__ +ExecStart=__LOOM_HOME__/observer-server --config __LOOM_HOME__/observer.yaml +Restart=on-failure +RestartSec=5 +StandardOutput=append:__LOOM_HOME__/observer.log +StandardError=append:__LOOM_HOME__/observer.log + +# Hardening (loosen if you need to write outside LOOM_HOME) +NoNewPrivileges=true +ProtectSystem=full +ProtectHome=read-only +ReadWritePaths=__LOOM_HOME__ +PrivateTmp=true + +[Install] +WantedBy=multi-user.target diff --git a/multi-agent/deploy/linux/slave/README.md b/multi-agent/deploy/linux/slave/README.md new file mode 100644 index 0000000..8cda0ac --- /dev/null +++ b/multi-agent/deploy/linux/slave/README.md @@ -0,0 +1,136 @@ +# linux-slave + +Generic `slave-agent` bring-up for any Linux host (host-native, no docker, no SSH). +For the pre-registered prod-test variants (Jetson host-native, local docker) see +`../../../tests/prod_test/jetson/` and `../../../tests/prod_test/slave/`. + +## What you get + +| File | Purpose | +|---|---| +| `install.sh` | Validates inputs, renders templates, installs binary + config (and optional systemd unit) into `~/.loom//` | +| `config.yaml.template` | Slave config with placeholders for name, install dir, host resources | +| `slave-agent.service.template` | Systemd unit template with placeholders for service user and install dir | + +## Prereqs + +1. 
**Binary** at `../bin/slave-agent.linux-` (override with `--bin PATH`). + ```bash + # Option A — pre-built (replace amd64 with arm64 for aarch64 hosts) + mkdir -p ../bin && curl -L -o ../bin/slave-agent.linux-amd64 \ + https://github.com/agentserver/loom/releases/download/v0.0.1/slave-agent.linux-amd64 + chmod +x ../bin/slave-agent.linux-amd64 + + # Option B — build from source (from multi-agent/ ) + CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -trimpath -ldflags='-s -w' \ + -o deploy/linux/bin/slave-agent.linux-amd64 ./cmd/slave-agent + # or GOARCH=arm64 for aarch64 hosts + ``` +2. **`claude` CLI** installed and on the service user's `PATH` (or edit + `claude.bin` in the rendered `config.yaml` to its absolute path). +3. **Shared ws-prod observer api-key** — pasted via `--api-key` or hand-edited + into `~/.loom//config.yaml` after install. +4. **`ANTHROPIC_API_KEY`** — passed via `--anthropic-key` to land in + `~/.loom//slave.env`, or set in the unit's env some other way. + +## Quick start + +```bash +# foreground smoke test (current user, no systemd, no sudo beyond install steps) +./install.sh \ + --name slave-myhost \ + --observer-url http://observer.example.com:8090 \ + --workspace ws-prod \ + --tag linux --tag prod \ + --api-key 'de4a8e22…' # the workspace bootstrap api-key + +# then run it manually to see the device-code URL: +~/.loom/slave-myhost/slave-agent ~/.loom/slave-myhost/config.yaml +``` + +```bash +# production-style install as a dedicated user with systemd +sudo useradd -m -s /bin/bash loom # only if the user doesn't exist +./install.sh \ + --name slave-myhost \ + --observer-url http://observer.example.com:8090 \ + --workspace ws-prod \ + --user loom \ + --systemd \ + --tag linux --tag prod \ + --api-key 'de4a8e22…' \ + --anthropic-key 'sk-ant-…' + +# tail logs to grab the FIRST-RUN device-code URL +sudo tail -f /home/loom/.loom/slave-myhost/slave.log +``` + +After the device-code URL is approved, the slave persists the issued +sandbox/tunnel 
credentials back into its own `config.yaml`, then registers +with observer using the `api_key`, and starts publishing its capability +card. From the driver host: + +``` +mcp__driver__list_agents +# expect "slave-myhost" to appear +``` + +## Flag reference + +| Flag | Default | Notes | +|---|---|---| +| `--name NAME` | (required) | Becomes `discovery.display_name`, `observer.agent_id`, install dir suffix, systemd unit name. | +| `--observer-url URL` | (required) | Goes into `observer.url`. Pre-flight: agent will POST `/api/agents/register` here on first start. | +| `--workspace ID` | `ws-default` | `observer.workspace_id`. Must match a workspace defined on the observer. | +| `--user USER` | `$USER` | Service user. The user must already exist; its `$HOME` is read from `/etc/passwd`. | +| `--loom-home PATH` | `/.loom/` | Install dir. Holds binary, config, log, `observer.token`, optional `slave.env`. | +| `--systemd` | off | Install `/etc/systemd/system/slave-agent-.service` (sudo). Without this, you start the binary yourself. | +| `--desc TEXT` | `Linux slave-agent ()` | `discovery.description`. | +| `--tag TAG` | `linux` | Repeatable. Becomes `resources.tags`. | +| `--api-key KEY` | (none) | Writes `observer.api_key`. Without this, edit the rendered config manually. | +| `--anthropic-key KEY` | (none) | Writes `ANTHROPIC_API_KEY=...` to `slave.env` (mode 0600). | +| `--bin PATH` | `../bin/slave-agent.linux-` | Override the binary path (e.g., point at a downloaded release asset). | + +Host CPU cores (`nproc`), arch (`uname -m`), and total memory (`/proc/meminfo`) +are auto-detected and written into the config's `resources` block. + +## Skills advertised + +The rendered `config.yaml` lists five `discovery.skills`, each backed by a +different code path in `slave-agent`. The driver routes work by skill, so +removing one disables that capability for this slave. 
+ +| Skill | What it lets the driver do | +|---|---| +| `chat` | Natural-language Claude Code task in the slave workspace. General-purpose. | +| `bash` | Run an explicit `script` (with `env`, `timeout_sec`) — native Go exec, no Claude. | +| `file` | Stateless `read` / `write` / `stat` on slave-local paths — native Go I/O. | +| `register_mcp` | Register a pre-built stdio MCP server file; tool calls then route via `skill:"mcp"`. Pair with the driver-side `scaffold-mcp-server` and `mcp-acceptance` skills — `register_mcp` only does structural validation. | +| `claude_permissions` | Read / patch this slave's Claude Code `settings.json` permissions through native code (don't ask `chat` to edit its own permissions). | + +Drop any of these from `discovery.skills` if you don't want this slave to +accept that workload. + +## Layout after install + +``` +~/.loom// +├── slave-agent # binary, 0755 +├── config.yaml # 0600 — server, observer creds, discovery, resources +├── slave.env # 0600 — optional, ANTHROPIC_API_KEY=... +├── observer.token # 0600 — written on first boot by observerclient +└── slave.log # service stdout+stderr (if --systemd) + +/etc/systemd/system/slave-agent-.service # if --systemd, 0644 +``` + +## Reset / re-registration + +- **Rotate observer per-agent token** — `rm ~/.loom//observer.token` and + restart; agent re-registers and the old token is invalidated. +- **Rotate agentserver sandbox** — blank out `credentials.sandbox_id` and + `credentials.tunnel_token` in `config.yaml`, restart; device-code flow runs + again. +- **Full cleanup** — `sudo systemctl disable --now slave-agent-.service` + (if used), `sudo rm /etc/systemd/system/slave-agent-.service`, + `rm -rf ~/.loom//`. 
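The full-cleanup steps above can be sketched as a small dry-run helper. This is a hypothetical wrapper, not something the installer ships: the function names `run` and `loom_slave_cleanup`, the `DRY_RUN` variable, and the extra `daemon-reload` step are assumptions added for illustration. The unit name and install dir follow what `install.sh` writes.

```bash
#!/usr/bin/env bash
# Hypothetical cleanup helper (not part of the installer).
set -euo pipefail

# Print the command when DRY_RUN=1 (the default here), execute it otherwise.
run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# loom_slave_cleanup NAME [LOOM_HOME]
# NAME is the value passed to install.sh --name; LOOM_HOME defaults to the
# installer's default for the current user and should be given explicitly
# when the service runs as a different user.
loom_slave_cleanup() {
  local name="$1"
  local loom_home="${2:-$HOME/.loom/$name}"
  local unit="slave-agent-$name.service"

  run sudo systemctl disable --now "$unit"
  run sudo rm -f "/etc/systemd/system/$unit"
  # Assumption: not listed in the README, added so systemd drops the removed unit.
  run sudo systemctl daemon-reload
  run rm -rf "$loom_home"
}

# Dry run: only prints the commands it would execute.
loom_slave_cleanup slave-myhost /home/loom/.loom/slave-myhost
```

Review the printed commands first, then re-run with `DRY_RUN=0` to apply them. Removing an install dir owned by another service user may additionally need `sudo`.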
diff --git a/multi-agent/deploy/linux/slave/config.yaml.template b/multi-agent/deploy/linux/slave/config.yaml.template new file mode 100644 index 0000000..071ca86 --- /dev/null +++ b/multi-agent/deploy/linux/slave/config.yaml.template @@ -0,0 +1,56 @@ +server: + url: https://agent.cs.ac.cn + name: __AGENT_NAME__ + +# Auto-filled by tunnel.EnsureRegistered on first start: if sandbox_id and +# tunnel_token are empty, the slave runs the device-code OAuth flow against +# server.url, prints the verification URL on stderr (visible in slave.log), +# then writes the issued credentials back into THIS file. +credentials: + sandbox_id: "" + tunnel_token: "" + proxy_token: "" + short_id: "" + +claude: + bin: claude # absolute path if claude is not in PATH of the service user + workdir: __LOOM_HOME__ + extra_args: [] + +mcp_servers: {} + +discovery: + display_name: __AGENT_NAME__ + description: __DESCRIPTION__ + skills: + - chat + - bash + - register_mcp + - claude_permissions + - file + +planner: + bin: claude + timeout_sec: 300 + extra_args: [] + +fanout: + max_concurrency: 1 + default_policy: best_effort + policy_by_skill: {} + +resources: + cpu: + cores: __CPU_CORES__ + arch: __CPU_ARCH__ # amd64 | aarch64 + memory_gb: __MEMORY_GB__ + tags: + - __TAG__ # add more tags as needed + +observer: + enabled: true + url: __OBSERVER_URL__ + workspace_id: __WORKSPACE_ID__ + agent_id: __AGENT_NAME__ + api_key: "" # paste the workspace bootstrap api-key here, then chmod 0600 this file + token_state_path: __LOOM_HOME__/observer.token diff --git a/multi-agent/deploy/linux/slave/install.sh b/multi-agent/deploy/linux/slave/install.sh new file mode 100755 index 0000000..2ce414b --- /dev/null +++ b/multi-agent/deploy/linux/slave/install.sh @@ -0,0 +1,185 @@ +#!/usr/bin/env bash +# Generic Linux slave-agent installer. +# +# What it does: +# 1. Detects host arch (amd64 / arm64), picks the matching binary from ../bin/. +# 2. 
Renders config.yaml + (optional) systemd unit from the templates, filling +# in agent name, install dir, service user, host resources. +# 3. Copies the binary + config (and optional slave.env) into LOOM_HOME. +# 4. With --systemd: installs the unit under /etc/systemd/system/, daemon-reloads, +# enables + starts the service. +# 5. Without --systemd: prints the foreground command so you can smoke-test. +# +# On first start the slave will: +# * Print a device-code verification URL on stderr (tail slave.log) — open +# in a browser; agentserver creds get written back into config.yaml. +# * POST /api/agents/register with observer.api_key, persist the returned +# per-agent token at observer.token_state_path. +# +# Usage: +# ./install.sh --name slave-foo # foreground-mode install +# ./install.sh --name slave-foo --systemd # also install systemd unit +# ./install.sh --name slave-foo --systemd --user alice # run as user `alice` +# +# Flags: +# --name NAME agent display name (REQUIRED) +# --observer-url URL observer.url, e.g. http://observer.example.com:8090 (REQUIRED) +# --workspace ID observer.workspace_id (default: ws-default) +# --systemd install + enable systemd unit (needs sudo) +# --user USER service user (default: current $USER); home dir is read from /etc/passwd +# --loom-home PATH install dir (default: /.loom/) +# --desc TEXT discovery description (default: "Linux slave-agent ()") +# --tag TAG extra discovery tag (repeatable) +# --api-key KEY observer.api_key (skips manual edit; otherwise you must paste it) +# --anthropic-key KEY write ANTHROPIC_API_KEY into slave.env +# +# Prereqs: +# * Binary downloaded or built into ../bin/slave-agent.linux-{amd64,arm64} +# (override with --bin PATH). 
Downloads: +# https://github.com/agentserver/loom/releases/download/v0.0.1/slave-agent.linux-amd64 +# https://github.com/agentserver/loom/releases/download/v0.0.1/slave-agent.linux-arm64 +# * `claude` CLI installed and in PATH for the service user (or set claude.bin +# in config.yaml to its absolute path post-install) + +set -euo pipefail + +HERE="$(cd "$(dirname "$0")" && pwd)" +BIN_DIR="$HERE/../bin" +BIN_OVERRIDE="" + +NAME="" +SERVICE_USER="${USER:-$(id -un)}" +LOOM_HOME="" +USE_SYSTEMD=0 +DESC="" +TAGS=() +API_KEY="" +ANTHROPIC_KEY="" +OBSERVER_URL="" +WORKSPACE_ID="ws-default" + +while (( $# )); do + case "$1" in + --name) NAME="$2"; shift 2 ;; + --user) SERVICE_USER="$2"; shift 2 ;; + --loom-home) LOOM_HOME="$2"; shift 2 ;; + --systemd) USE_SYSTEMD=1; shift ;; + --desc) DESC="$2"; shift 2 ;; + --tag) TAGS+=("$2"); shift 2 ;; + --api-key) API_KEY="$2"; shift 2 ;; + --anthropic-key) ANTHROPIC_KEY="$2"; shift 2 ;; + --bin) BIN_OVERRIDE="$2"; shift 2 ;; + --observer-url) OBSERVER_URL="$2"; shift 2 ;; + --workspace) WORKSPACE_ID="$2"; shift 2 ;; + -h|--help) sed -n '2,45p' "$0"; exit 0 ;; + *) echo "unknown flag: $1" >&2; exit 2 ;; + esac +done + +[[ -n "$NAME" ]] || { echo "ERROR: --name is required" >&2; exit 2; } +[[ -n "$OBSERVER_URL" ]] || { echo "ERROR: --observer-url is required (e.g. 
http://observer.example.com:8090)" >&2; exit 2; } + +# Resolve service user home +SERVICE_USER_HOME="$(getent passwd "$SERVICE_USER" | cut -d: -f6)" +[[ -n "$SERVICE_USER_HOME" ]] || { echo "ERROR: user $SERVICE_USER not found" >&2; exit 2; } +LOOM_HOME="${LOOM_HOME:-$SERVICE_USER_HOME/.loom/$NAME}" +DESC="${DESC:-Linux slave-agent ($NAME)}" + +# Arch → binary. CPU_ARCH goes into the config's resources block (amd64 | aarch64); +# GO_ARCH is the Go toolchain name used in the build hint below (amd64 | arm64). +arch="$(uname -m)" +case "$arch" in + x86_64|amd64) BIN_NAME="slave-agent.linux-amd64"; CPU_ARCH=amd64; GO_ARCH=amd64 ;; + aarch64|arm64) BIN_NAME="slave-agent.linux-arm64"; CPU_ARCH=aarch64; GO_ARCH=arm64 ;; + *) echo "ERROR: unsupported arch $arch" >&2; exit 2 ;; +esac +BIN="${BIN_OVERRIDE:-$BIN_DIR/$BIN_NAME}" +[[ -x "$BIN" ]] || { + echo "ERROR: missing $BIN" >&2 + echo " download: curl -L -o $BIN_DIR/$BIN_NAME \\" >&2 + echo " https://github.com/agentserver/loom/releases/download/v0.0.1/$BIN_NAME && chmod +x $BIN_DIR/$BIN_NAME" >&2 + echo " or build from multi-agent/ :" >&2 + echo " CGO_ENABLED=0 GOOS=linux GOARCH=$GO_ARCH go build -trimpath -ldflags='-s -w' \\" >&2 + echo " -o deploy/linux/bin/$BIN_NAME ./cmd/slave-agent" >&2 + exit 2 +} + +# Host resources for the discovery card +CPU_CORES="$(nproc 2>/dev/null || echo 1)" +MEMORY_GB="$(awk '/MemTotal/ {printf "%d", $2/1024/1024+0.5}' /proc/meminfo 2>/dev/null || echo 1)" +TAG_LINES="" +for t in "${TAGS[@]:-linux}"; do TAG_LINES+=" - $t"$'\n'; done +[[ -z "$TAG_LINES" ]] && TAG_LINES=" - linux"$'\n' + +# Render config +CONFIG_OUT="$(mktemp)" +sed \ + -e "s|__AGENT_NAME__|$NAME|g" \ + -e "s|__LOOM_HOME__|$LOOM_HOME|g" \ + -e "s|__DESCRIPTION__|$DESC|g" \ + -e "s|__CPU_CORES__|$CPU_CORES|g" \ + -e "s|__CPU_ARCH__|$CPU_ARCH|g" \ + -e "s|__MEMORY_GB__|$MEMORY_GB|g" \ + -e "s|__OBSERVER_URL__|$OBSERVER_URL|g" \ + -e "s|__WORKSPACE_ID__|$WORKSPACE_ID|g" \ + "$HERE/config.yaml.template" > "$CONFIG_OUT" + +# Replace the placeholder tag block with the user-supplied tags +python3 - "$CONFIG_OUT" "$TAG_LINES" <<'PY' +import sys, pathlib +p = pathlib.Path(sys.argv[1]) +text = p.read_text()
+text = text.replace(" - __TAG__ # add more tags as needed\n", sys.argv[2]) +p.write_text(text) +PY + +[[ -n "$API_KEY" ]] && sed -i "s|api_key: \"\"|api_key: \"$API_KEY\"|" "$CONFIG_OUT" + +echo "==> creating $LOOM_HOME" +sudo -u "$SERVICE_USER" mkdir -p "$LOOM_HOME" +sudo install -o "$SERVICE_USER" -g "$SERVICE_USER" -m 0755 "$BIN" "$LOOM_HOME/slave-agent" +sudo install -o "$SERVICE_USER" -g "$SERVICE_USER" -m 0600 "$CONFIG_OUT" "$LOOM_HOME/config.yaml" +rm -f "$CONFIG_OUT" + +if [[ -n "$ANTHROPIC_KEY" ]]; then + ENV_TMP="$(mktemp)" + printf 'ANTHROPIC_API_KEY=%s\n' "$ANTHROPIC_KEY" > "$ENV_TMP" + sudo install -o "$SERVICE_USER" -g "$SERVICE_USER" -m 0600 "$ENV_TMP" "$LOOM_HOME/slave.env" + rm -f "$ENV_TMP" +fi + +if [[ -z "$API_KEY" ]]; then + echo "==> WARN: observer.api_key is empty in $LOOM_HOME/config.yaml — fill it in before starting." +fi + +if (( USE_SYSTEMD )); then + UNIT_OUT="$(mktemp)" + sed \ + -e "s|__AGENT_NAME__|$NAME|g" \ + -e "s|__LOOM_HOME__|$LOOM_HOME|g" \ + -e "s|__SERVICE_USER__|$SERVICE_USER|g" \ + -e "s|__SERVICE_USER_HOME__|$SERVICE_USER_HOME|g" \ + "$HERE/slave-agent.service.template" > "$UNIT_OUT" + UNIT_PATH="/etc/systemd/system/slave-agent-$NAME.service" + echo "==> installing $UNIT_PATH" + sudo install -o root -g root -m 0644 "$UNIT_OUT" "$UNIT_PATH" + rm -f "$UNIT_OUT" + sudo systemctl daemon-reload + sudo systemctl enable --now "slave-agent-$NAME.service" + sleep 2 + sudo systemctl --no-pager status "slave-agent-$NAME.service" | head -15 + cat <<EOF +==> done. Tail the log for the FIRST-RUN device-code URL: + sudo tail -f $LOOM_HOME/slave.log + Open the printed verification URL in a browser and approve. +EOF +else + cat <<EOF +==> done (foreground mode). Start it manually: + sudo -u $SERVICE_USER $LOOM_HOME/slave-agent $LOOM_HOME/config.yaml + Watch stderr for the device-code URL on first run. + + To convert to a managed service later, re-run with --systemd.
+EOF +fi diff --git a/multi-agent/deploy/linux/slave/slave-agent.service.template b/multi-agent/deploy/linux/slave/slave-agent.service.template new file mode 100644 index 0000000..ed4bff0 --- /dev/null +++ b/multi-agent/deploy/linux/slave/slave-agent.service.template @@ -0,0 +1,22 @@ +[Unit] +Description=loom slave-agent (__AGENT_NAME__) +After=network-online.target +Wants=network-online.target + +[Service] +Type=simple +User=__SERVICE_USER__ +Group=__SERVICE_USER__ +WorkingDirectory=__LOOM_HOME__ +ExecStart=__LOOM_HOME__/slave-agent __LOOM_HOME__/config.yaml +Restart=on-failure +RestartSec=5s +StandardOutput=append:__LOOM_HOME__/slave.log +StandardError=append:__LOOM_HOME__/slave.log +Environment="HOME=__SERVICE_USER_HOME__" +# Optional override for the Anthropic API endpoint: +# Environment="ANTHROPIC_BASE_URL=https://code.ai.cs.ac.cn" +EnvironmentFile=-__LOOM_HOME__/slave.env + +[Install] +WantedBy=multi-user.target diff --git a/skills/mcp-acceptance/SKILL.md b/skills/mcp-acceptance/SKILL.md index 974bc89..4c2c013 100644 --- a/skills/mcp-acceptance/SKILL.md +++ b/skills/mcp-acceptance/SKILL.md @@ -123,6 +123,30 @@ Flags: - `--keep` — skip cleanup so you can inspect the workdir if a case failed. - `--runner PATH` — override the embedded runner (default: the canonical copy alongside `remote_run.py`). 
+### Alternative: file-tools path (no base64 embedding) + +When the slave advertises `file`, you can skip the bundled payload entirely: + +```text +write_slave_file(target=slave-a, path="/tmp/mcpa/server.py", source_path="generated_mcp/foo/v1.py") +write_slave_file(target=slave-a, path="/tmp/mcpa/cases.jsonl", source_path="generated_mcp/foo/cases.jsonl") +write_slave_file(target=slave-a, path="/tmp/mcpa/runner.py", source_path="skills/mcp-acceptance/scripts/mcp_acceptance.py") +run_slave_bash(target=slave-a, script="python3 /tmp/mcpa/runner.py --server 'python3 /tmp/mcpa/server.py' --cases /tmp/mcpa/cases.jsonl") +# exit code propagates: 0 = pass, 1 = case failed, 2 = server unreachable. +``` + +Tradeoffs vs `remote_run.py`: + +| | `remote_run.py` (Option A) | file-tools (Option B) | +|---|---|---| +| Cleanup | automatic `trap` on exit | manual; survives for `read_slave_file`-based debug | +| Payload shape | one base64 shell blob | three plain file writes + one bash call | +| Re-running with edits | rebuild & re-ship payload | re-`write_slave_file` only the changed file | +| Inspect server source after run | `--keep` then `read_slave_file` | always available, no flag | +| Shell-pipeline gating | exit code from one command | exit code from final `run_slave_bash` | + +Pick A for CI-like one-shot validation. Pick B when iterating on cases or expecting a failure you'll want to dig into. + ## Writing Good Cases | Cover | Why | diff --git a/skills/multiagent/SKILL.md b/skills/multiagent/SKILL.md index 79d4f5f..fbd9920 100644 --- a/skills/multiagent/SKILL.md +++ b/skills/multiagent/SKILL.md @@ -86,3 +86,4 @@ Do not call `register_slave_mcp` directly from a one-shot Claude generation: `re - Calling `skill:"mcp"` with natural language instead of JSON `{server, tool, args}`. - Asking slave Claude Code to edit its own permissions; permission changes go through native `skill:"claude_permissions"` for now. 
- Using `127.0.0.1` or local file paths as if they were reachable from other machines. +- Hand-rolling `cat < {blob_handle: H, ...} +write_slave_file(target=B, path=dst, source_blob=H) +``` + +The bytes round-trip through the driver's `FileRegistry`, never through chat or `run_slave_bash`. Use this instead of routing through observer artifacts when both endpoints are slaves currently advertising `file` and the payload fits comfortably in driver-local cache. diff --git a/skills/multiagent/references/orchestration-patterns.md b/skills/multiagent/references/orchestration-patterns.md index 31b2164..970fd06 100644 --- a/skills/multiagent/references/orchestration-patterns.md +++ b/skills/multiagent/references/orchestration-patterns.md @@ -35,13 +35,37 @@ Rules: ## File Transfer -Driver files are local to the driver machine. Remote agents receive artifact URLs, not local paths. +Driver files are local to the driver machine. Remote agents receive artifact URLs, not local paths. Two transports are available; pick by payload size and lifecycle. -- `submit_task.read_paths` registers driver-local files. -- `submit_task.write_paths` creates PUT targets for remote output. -- With `artifact_transport: observer_lazy`, observer stores lazy artifact/write records and syncs writes after task completion. +### Option A: PUT-manifest via observer (large / archival) + +- `submit_task.read_paths` registers driver-local files; the slave fetches via observer URL. +- `submit_task.write_paths` creates PUT targets; outputs sync back after the task completes. +- With `artifact_transport: observer_lazy`, observer stores lazy artifact/write records. +- Best for: large artifacts, anything that should survive in the observer artifact store, asynchronous output collection. - Do not use `127.0.0.1` as a cross-machine URL. 
+### Option B: In-band file tools (small / synchronous / driver-mediated) + +`read_slave_file` / `write_slave_file` / `stat_slave_file` (see `driver-tools.md`) move bytes synchronously through the driver's `FileRegistry`. Bytes never enter LLM context — `read_slave_file` returns a `blob_handle` and a driver-local `cache_path`; `write_slave_file` accepts `content`, `source_blob`, or `source_path`. + +- Best for: shipping a generated MCP server source up to a slave, pulling a small log back for review, cross-slave copy (`read` from A → `write source_blob=...` to B), `stat`-then-write probing. +- Slave-cap is 8 MiB per read call; chunk larger files via `offset`/`length` or fall back to Option A. +- Requires the target slave to advertise `file`. + +### Decision rule + +| Situation | Use | +|---|---| +| Need observer artifact lineage / PR review | PUT-manifest | +| Shipping a scaffolded MCP source to a slave | `write_slave_file` | +| Pulling back a server file or log for inspection | `read_slave_file` | +| Copying file between two slaves | `read_slave_file` → `write_slave_file source_blob=...` | +| File > a few MiB and one-shot | PUT-manifest | +| Need to check "does this already exist?" cheaply | `stat_slave_file` | + +Anti-pattern: `run_slave_bash` with a `cat < str: 1. Write `spec.json`. 2. `scaffold_mcp.py --spec spec.json --out generated_mcp//v1.py`. -3. Edit handler bodies between business markers. -4. Validate with `mcp-acceptance` (see that skill) — **do not skip; register_mcp does not check semantics**. -5. `register_slave_mcp` with the same `spec.json` and `source_path`. +3. Edit handler bodies between business markers (driver-local). +4. Ship the source to the slave so acceptance + register can see it: + - **Preferred** — `write_slave_file(target=..., path="generated_mcp//v1.py", source_path="")`. Source lives in the slave fs, easy to re-read with `read_slave_file` for debugging. 
+ - **Alternative** — bundle source + cases + runner with `remote_run.py` (see `mcp-acceptance`) when you want a one-shot self-cleaning shell payload. +5. Validate with `mcp-acceptance` (see that skill) — **do not skip; register_mcp does not check semantics**. +6. `register_slave_mcp` with the same `spec.json` and `source_path` (must match what step 4 wrote). ## Common Mistakes