A supervisor for LLM coding agent CLIs. Runs Claude, Codex, Gemini, and opencode inside Docker containers with tmux session management and declarative supervision rules.
Spiritual successor to sir-claudius — rebuilt from scratch with multi-agent support, configurable supervision via babysit.yaml, and a single static binary.
curl -fsSL https://raw.githubusercontent.com/actuallymentor/babysit/main/scripts/install.sh | bash

The installer detects your OS and architecture, downloads the correct binary to ~/.local/bin/babysit (no sudo required), and checks that docker, tmux, and git are installed. If ~/.local/bin isn't on your $PATH yet, the script tells you the line to add to your shell rc.
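If `~/.local/bin` is missing from `$PATH`, the line the installer suggests will likely resemble the sketch below. This is an assumed shape for a POSIX-style shell rc file, not the installer's exact output; prefer whatever the script prints.

```sh
# Assumed example: add ~/.local/bin to PATH only if it is not already there.
case ":$PATH:" in
  *":$HOME/.local/bin:"*) ;;                    # already on PATH, nothing to do
  *) export PATH="$HOME/.local/bin:$PATH" ;;    # prepend so babysit is found
esac
```

The `case` guard keeps the entry from being duplicated each time the rc file is sourced.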
# Start Claude in yolo mode (max autonomy, skip permissions)
babysit claude --yolo
# Codex in a sandbox (no host files mounted) with loop mode
babysit codex --sandbox --loop
# Gemini with read-only workspace
babysit gemini --mudbox
# Resume a previous session
babysit resume <session_id> --yolo
# List active sessions
babysit list
# Attach to a running session
babysit open <session_id>

- Docker container: babysit starts a container with all four agent CLIs preinstalled, your credentials passed through, and your workspace mounted at `/workspace`
- Tmux session: the container runs inside a tmux session that babysit attaches you to. Detach with Ctrl+B d to exit the CLI; the agent and supervisor keep running in the background. Re-attach with `babysit open <id>`
- Monitor daemon: a detached background process watches the tmux output and takes actions based on your `babysit.yaml` rules. It outlives your foreground CLI, so the agent stays supervised after you detach
- macOS caffeine: on macOS, the monitor runs `caffeinate` while a session is active so the system does not sleep mid-run
- Credential sync: host credentials are refreshed in the background so long-running sessions don't lose auth
- Resume state: agent-native session history is kept in persistent Docker volumes, so `babysit resume` can reopen Claude, Codex, Gemini, and OpenCode sessions after their containers exit
babysit.yaml is created automatically on first run. It defines on/do rules; the first match wins.
config:
  initial_prompt: |-
    You are running inside a Docker container — an isolated sandbox built for coding agents.
    ...
  idle_timeout_s: 300
  commands:
    notify_command: >
      curl -f -X POST -d \
        "token=$PUSHOVER_TOKEN&user=$PUSHOVER_USER&title=Babysit&message=I need your input" \
        https://api.pushover.net/1/messages.json
babysit:
  # Send IDLE.md contents when the agent goes idle
  - on: idle
    do: ./IDLE.md
    timeout: 30:00
  # Notify when the agent asks for input
  - on: choice
    do: notify_command
    timeout: 1:00:00
  # Notify on errors
  - on: /error/i
    do: notify_command
    timeout: 05:00

config.initial_prompt is typed into the agent screen once the session starts.
New babysit.yaml files include Babysit's default launch prompt here. Existing
configs that omit it use the generated default prompt. Set it to null or ""
to disable startup prompt typing.
| Trigger | Description |
|---|---|
| `idle` | No new output for `idle_timeout_s` seconds |
| `plan` | Agent is asking to accept a plan (detected per-agent) |
| `choice` | Agent is waiting for any user input |
| `"literal"` | Exact string match in last N lines of output |
| `/regex/flags` | Regex match in last N lines of output |
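The literal and regex triggers above can be approximated in plain shell, which is handy for checking a pattern against captured output before putting it in `babysit.yaml`. This is a rough sketch, not babysit's matcher: `trigger_matches` is a hypothetical helper, the line count is passed explicitly because the README does not state N, and `grep -E` syntax may differ from the regex dialect babysit uses. Only the `i` flag is handled.

```sh
# Hypothetical helper, not part of babysit.
# Usage: some_output | trigger_matches TRIGGER N
# Exit status 0 means TRIGGER matched within the last N lines of stdin.
trigger_matches() {
  tm_trigger=$1
  tm_lines=$2
  tm_recent=$(tail -n "$tm_lines")      # keep only the last N lines
  case $tm_trigger in
    /*/*)
      # /regex/flags form: flags follow the final slash
      tm_flags=${tm_trigger##*/}
      tm_pattern=${tm_trigger#/}
      tm_pattern=${tm_pattern%/*}
      case $tm_flags in
        *i*) printf '%s\n' "$tm_recent" | grep -Eiq -- "$tm_pattern" ;;
        *)   printf '%s\n' "$tm_recent" | grep -Eq -- "$tm_pattern" ;;
      esac
      ;;
    *)
      # anything else is treated as an exact (fixed-string) match
      printf '%s\n' "$tm_recent" | grep -Fq -- "$tm_trigger"
      ;;
  esac
}
```

For example, `tail -n 200 session.log | trigger_matches '/error/i' 20` succeeds only if one of the final 20 lines contains "error" in any letter case (`session.log` is a placeholder file name).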
| Action | Description |
|---|---|
| `enter` | Press Enter |
| `shift_tab` | Press Shift+Tab (plan acceptance in Claude) |
| `command_name` | Run a named command from `config.commands` |
| `"text"` | Type text and press Enter |
| `./file.md` | Send markdown file contents, splitting on `===` lines (waits for idle between segments) |
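To preview how a `./file.md` action would be divided, the `===` splitting can be sketched with awk. The helpers `segment` and `segment_count` are hypothetical, and the sketch assumes a separator is a line containing exactly `===` with no surrounding whitespace; babysit's own parser may be more lenient.

```sh
# Hypothetical helpers, not part of babysit.

# Print segment number $2 (1-based) of file $1.
segment() {
  awk -v want="$2" '
    $0 == "===" { current++; next }   # separator: move on to the next segment
    current + 1 == want { print }     # print lines of the requested segment
  ' "$1"
}

# Print how many segments file $1 contains.
segment_count() {
  awk '$0 == "===" { n++ } END { print n + 1 }' "$1"
}
```

Running `segment IDLE.md 2` prints the second block that would be sent after the agent goes idle again.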
A rule's timeout supports SS, MM:SS, or HH:MM:SS and overrides idle_timeout_s for that rule.
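As a sanity check on these formats, the conversion to seconds can be sketched as follows. `timeout_to_seconds` is a hypothetical helper for illustration and does not validate that each field is numeric.

```sh
# Hypothetical helper, not part of babysit.
# Convert SS, MM:SS, or HH:MM:SS to a number of seconds.
timeout_to_seconds() {
  printf '%s\n' "$1" | awk -F: '
    NF < 1 || NF > 3 { exit 1 }        # reject empty input or too many fields
    {
      total = 0
      for (i = 1; i <= NF; i++) total = total * 60 + $i
      print total
    }'
}
```

With the rules from the example config, `timeout_to_seconds 30:00` prints `1800`, `timeout_to_seconds 1:00:00` prints `3600`, and `timeout_to_seconds 05:00` prints `300`.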
| Flag | Workspace | Description |
|---|---|---|
| (none) | read-write mount | Default — full access |
| `--yolo` | read-write mount | Skip agent permissions, set `AGENT_AUTONOMY_MODE=yolo` |
| `--sandbox` | no mount | Ephemeral container, no host files |
| `--mudbox` | read-only mount | Agent can read but not modify files |
| `--docker` | (additive) | Mount the host Docker socket so Docker commands can run from inside the Babysit container |
| `--loop` | (additive) | Override `on: idle` with `./LOOP.md` or `~/.agents/LOOP.md` or `"Keep going"` |
Modes combine: --mudbox --yolo --loop gives a read-only workspace with max autonomy and loop. The exception is --sandbox and --mudbox together — they describe contradictory mount strategies, so babysit rejects the combination.
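The combination rule can be illustrated with a small wrapper-side check. This is only a sketch of the rule as described: `check_modes` and its error message are hypothetical, and babysit performs its own validation with its own wording.

```sh
# Hypothetical helper, not part of babysit.
# Returns non-zero when both --sandbox and --mudbox are present.
check_modes() {
  cm_sandbox=0
  cm_mudbox=0
  for cm_flag in "$@"; do
    case $cm_flag in
      --sandbox) cm_sandbox=1 ;;
      --mudbox)  cm_mudbox=1 ;;
    esac
  done
  if [ "$cm_sandbox" -eq 1 ] && [ "$cm_mudbox" -eq 1 ]; then
    echo "--sandbox and --mudbox cannot be combined" >&2
    return 1
  fi
  return 0
}
```

`check_modes --mudbox --yolo --loop` succeeds, while `check_modes --sandbox --mudbox` fails.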
--docker uses Docker-outside-of-Docker: Babysit mounts /var/run/docker.sock,
sets DOCKER_HOST, and installs the Docker CLI in the agent image. Docker
commands run inside the session create sibling containers on the host daemon.
For nested Babysit testing, BABYSIT_HOST_WORKSPACE preserves the original host
path so inner containers can bind-mount the same project correctly.
Because Docker socket access can create containers with host bind mounts,
--docker --sandbox and --docker --mudbox weaken those modes. Babysit warns
and requires an explicit Y before starting those combinations, except in YOLO
mode where confirmations are skipped.
With --loop, the idle action is overridden. Babysit looks for instructions in order:
1. `./LOOP.md` in the current directory
2. `~/.agents/LOOP.md` global fallback
3. `"Keep going"` hardcoded default
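The lookup order above can be reproduced in shell to see which instructions a `--loop` run would pick up from a given directory. `loop_instructions` is a hypothetical helper that mirrors the documented order; it is not babysit's code.

```sh
# Hypothetical helper, not part of babysit.
# Print the loop instructions using the documented fallback order.
loop_instructions() {
  if [ -f ./LOOP.md ]; then
    cat ./LOOP.md                       # 1. project-local file
  elif [ -f "$HOME/.agents/LOOP.md" ]; then
    cat "$HOME/.agents/LOOP.md"         # 2. global fallback
  else
    printf '%s\n' "Keep going"          # 3. hardcoded default
  fi
}
```

Run it from the workspace root, since the first lookup is relative to the current directory.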
Use === lines in LOOP.md to split into segments executed between idle periods:
/clear
===
Check for bugs
===
Check if the specification is fully implemented

By default, babysit mounts node_modules and .venv as named Docker volumes instead of bind-mounting the host copies. This avoids cross-platform binary mismatches (host macOS binaries vs container Linux). Disable with:

config:
  isolate_dependencies: false

babysit <agent> [flags] Start a new session
babysit <agent> resume <id> [flags] Resume a previous session
babysit list List active sessions
babysit open <session_id> Attach to an active session
babysit resume <session_id> [flags] Resume a previous session
babysit resume <session_id> accepts the id printed when a Babysit session
exits. If Babysit did not capture the agent's native session id before exit, it
resumes the latest agent session from the original workspace.
Unrecognised flags are passed through to the coding agent CLI:
babysit claude --yolo --model sonnet --effort high

Pass `--log` to append everything the tmux pane renders to a logfile. The header `Babysit session start: YYYY-MM-DD HH:MM:SS` is prepended to each session's block, so several runs can share one file.
babysit claude --log # default path: .YYYY_MM_DD_HH_MM.babysit.log in cwd
babysit claude --log=babysit.log # custom path (relative to cwd)
babysit claude --log ~/.logs/babysit.log # absolute path; ~ expanded

The log is append-only: it is never truncated, so it's safe to point multiple sessions at the same file. tmux writes raw pane bytes including ANSI color/cursor sequences; for a plain-text view pipe through `sed -E 's/\x1B\[[0-9;?]*[a-zA-Z]//g'` or open with `less -R`.
Updates are explicit. Run babysit update to refresh everything in one sweep:
- `git pull` on the babysit repo (or download the latest GitHub-release binary, for compiled installs)
- `git pull` on `~/.agents` (if it exists)
- `docker pull` the latest container image
- Upgrades each host-installed agent CLI (`claude`, `codex`, `gemini`, `opencode`) using the agent's built-in self-update if available, otherwise the matching package manager (npm or brew, auto-detected from the binary's install path). Agents not on PATH are skipped.
Requires Bun:
npm install
npm run build

Produces static binaries in dist/ for linux-x64, linux-arm64, darwin-x64, darwin-arm64.
bun test
npm run test:e2e
npm run test:all

`npm run test:e2e` builds a local Babysit image, derives a fake-agent image
from it, then starts real Docker/tmux sessions to verify startup prompts,
resume handoff, monitor actions, logging, nested Docker, mount modes,
dependency isolation, and credential sync without calling model APIs.
For faster repeat runs with an existing base image:
BABYSIT_E2E_BASE_IMAGE=actuallymentor/babysit:latest \
BABYSIT_E2E_SKIP_BASE_BUILD=1 \
npm run test:e2e

License: ISC