babysit

A supervisor for LLM coding agent CLIs. Runs Claude, Codex, Gemini, and opencode inside Docker containers with tmux session management and declarative supervision rules.

Spiritual successor to sir-claudius — rebuilt from scratch with multi-agent support, configurable supervision via babysit.yaml, and a single static binary.

Install

curl -fsSL https://raw.githubusercontent.com/actuallymentor/babysit/main/scripts/install.sh | bash

The installer detects your OS and architecture, downloads the correct binary to ~/.local/bin/babysit (no sudo required), and checks that docker, tmux, and git are installed. If ~/.local/bin isn't on your $PATH yet, the script tells you the line to add to your shell rc.
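
The OS and architecture detection can be pictured with a short Python sketch. It is illustrative only: the real installer is a shell script whose logic is not shown here, `target_for` is a hypothetical helper, and the target names are assumed to be the four listed under Building from source.

```python
import platform

# Assumed mapping from platform.machine() values to target architecture names.
ARCHES = {"x86_64": "x64", "amd64": "x64", "arm64": "arm64", "aarch64": "arm64"}
# Assumed mapping from platform.system() values to target OS names.
SYSTEMS = {"linux": "linux", "darwin": "darwin"}


def target_for(system: str, machine: str) -> str:
    """Return a target name such as 'linux-x64' (hypothetical helper)."""
    try:
        return f"{SYSTEMS[system.lower()]}-{ARCHES[machine.lower()]}"
    except KeyError:
        raise ValueError(f"unsupported platform: {system}/{machine}") from None


if __name__ == "__main__":
    # Prints the target for the machine running this script.
    print(target_for(platform.system(), platform.machine()))
```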

Quick start

# Start Claude in yolo mode (max autonomy, skip permissions)
babysit claude --yolo

# Codex in a sandbox (no host files mounted) with loop mode
babysit codex --sandbox --loop

# Gemini with read-only workspace
babysit gemini --mudbox

# Resume a previous session
babysit resume <session_id> --yolo

# List active sessions
babysit list

# Attach to a running session
babysit open <session_id>

How it works

Docker container: babysit starts a container with all four agent CLIs preinstalled, your credentials passed through, and your workspace mounted at /workspace.
Tmux session: the container runs inside a tmux session that babysit attaches you to. Detach with Ctrl+B d to exit the CLI; the agent and supervisor keep running in the background. Re-attach with babysit open <id>.
Monitor daemon: a detached background process watches the tmux output and takes actions based on your babysit.yaml rules. It outlives your foreground CLI, so the agent stays supervised after you detach.
macOS caffeine: on macOS, the monitor runs caffeinate while a session is active so the system does not sleep mid-run.
Credential sync: host credentials are refreshed in the background so long-running sessions don't lose auth.
Resume state: agent-native session history is kept in persistent Docker volumes, so babysit resume can reopen Claude, Codex, Gemini, and opencode sessions after their containers exit.

`babysit.yaml`

Created automatically on first run. Defines on/do rules — first match wins.

config:
    initial_prompt: |-
        You are running inside a Docker container — an isolated sandbox built for coding agents.
        ...
    idle_timeout_s: 300
    commands:
        notify_command: >
            curl -f -X POST -d \
                "token=$PUSHOVER_TOKEN&user=$PUSHOVER_USER&title=Babysit&message=I need your input" \
                https://api.pushover.net/1/messages.json

babysit:

    # Send IDLE.md contents when the agent goes idle
    - on: idle
      do: ./IDLE.md
      timeout: 30:00

    # Notify when the agent asks for input
    - on: choice
      do: notify_command
      timeout: 1:00:00

    # Notify on errors
    - on: /error/i
      do: notify_command
      timeout: 05:00
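
The "first match wins" rule can be sketched in a few lines of Python. This is a minimal illustration, not Babysit's actual evaluator: `first_match` and the `active` set are hypothetical, and how the monitor decides that a trigger is currently firing is left to the caller.

```python
from typing import Callable, Optional

Rule = dict  # one entry of the `babysit:` list, e.g. {"on": "idle", "do": "./IDLE.md"}


def first_match(rules: list, matches: Callable[[Rule], bool]) -> Optional[Rule]:
    """Return the first rule, in file order, whose trigger currently fires."""
    for rule in rules:
        if matches(rule):
            return rule
    return None


rules = [
    {"on": "idle", "do": "./IDLE.md"},
    {"on": "choice", "do": "notify_command"},
    {"on": "/error/i", "do": "notify_command"},
]

# Hypothetical monitor state: the agent is waiting for input and the
# word "error" is also visible on screen, so two rules could fire.
active = {"choice", "/error/i"}
chosen = first_match(rules, lambda rule: rule["on"] in active)
print(chosen["on"])  # prints "choice": it appears before /error/i in the list
```

Under this reading, rule order in babysit.yaml is significant: put the more specific rules first.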

config.initial_prompt is typed into the agent screen once the session starts. New babysit.yaml files include Babysit's default launch prompt here. Existing configs that omit it use the generated default prompt. Set it to null or "" to disable startup prompt typing.

`on:` triggers

| Trigger | Description |
| --- | --- |
| `idle` | No new output for `idle_timeout_s` seconds |
| `plan` | Agent is asking to accept a plan (detected per-agent) |
| `choice` | Agent is waiting for any user input |
| `"literal"` | Exact string match in last N lines of output |
| `/regex/flags` | Regex match in last N lines of output |
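
One way to tell the three trigger kinds apart is sketched below in Python. It is an assumption-laden illustration rather than Babysit's parser: `classify_trigger` is a hypothetical name, only the flags `i`, `m`, and `s` are assumed to be supported, and a value that does not look like `/regex/flags` is treated as a literal.

```python
import re

KEYWORDS = {"idle", "plan", "choice"}
# Assumed flag letters; the set Babysit really accepts is not documented here.
FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def classify_trigger(on: str):
    """Return ('keyword', name), ('regex', compiled_pattern) or ('literal', text)."""
    if on in KEYWORDS:
        return ("keyword", on)
    parsed = re.fullmatch(r"/(.+)/([ims]*)", on)
    if parsed:
        flags = 0
        for letter in parsed.group(2):
            flags |= FLAG_MAP[letter]
        return ("regex", re.compile(parsed.group(1), flags))
    # Anything else is matched as an exact substring of the recent output.
    return ("literal", on)


kind, pattern = classify_trigger("/error/i")
print(kind, bool(pattern.search("Fatal ERROR: build failed")))  # regex True
```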

`do:` actions

| Action | Description |
| --- | --- |
| `enter` | Press Enter |
| `shift_tab` | Press Shift+Tab (plan acceptance in Claude) |
| `command_name` | Run a named command from `config.commands` |
| `"text"` | Type text and press Enter |
| `./file.md` | Send markdown file contents, splitting on `===` lines (waits for idle between segments) |

`timeout:` format

Supports SS, MM:SS, or HH:MM:SS. Overrides idle_timeout_s per rule.
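
For reference, the three accepted shapes can be converted to seconds as in the Python sketch below. `parse_timeout` is a hypothetical helper written for illustration; it assumes every field is a non-negative integer and does not reflect how Babysit itself validates the value.

```python
def parse_timeout(value: str) -> int:
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    parts = str(value).strip().split(":")
    if not 1 <= len(parts) <= 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"unsupported timeout format: {value!r}")
    seconds = 0
    for part in parts:
        # Each additional field shifts the running total by a factor of 60.
        seconds = seconds * 60 + int(part)
    return seconds


print(parse_timeout("30:00"))    # 1800
print(parse_timeout("1:00:00"))  # 3600
print(parse_timeout("05:00"))    # 300
```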

Modes

| Flag | Workspace | Description |
| --- | --- | --- |
| (none) | read-write mount | Default: full access |
| `--yolo` | read-write mount | Skip agent permissions, set `AGENT_AUTONOMY_MODE=yolo` |
| `--sandbox` | no mount | Ephemeral container, no host files |
| `--mudbox` | read-only mount | Agent can read but not modify files |
| `--docker` | (additive) | Mount the host Docker socket so Docker commands can run from inside the Babysit container |
| `--loop` | (additive) | Override `on: idle` with `./LOOP.md` or `~/.agents/LOOP.md` or "Keep going" |

Modes combine: --mudbox --yolo --loop gives a read-only workspace with max autonomy and loop. The exception is --sandbox and --mudbox together — they describe contradictory mount strategies, so babysit rejects the combination.

--docker uses Docker-outside-of-Docker: Babysit mounts /var/run/docker.sock, sets DOCKER_HOST, and installs the Docker CLI in the agent image. Docker commands run inside the session create sibling containers on the host daemon. For nested Babysit testing, BABYSIT_HOST_WORKSPACE preserves the original host path so inner containers can bind-mount the same project correctly.

Because Docker socket access can create containers with host bind mounts, --docker --sandbox and --docker --mudbox weaken those modes. Babysit warns and requires an explicit Y before starting those combinations, except in YOLO mode where confirmations are skipped.
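
The combination rules above can be summarised in a small Python sketch. It only restates the documented behaviour; `resolve_modes` and the shape of its return value are hypothetical and are not taken from Babysit's source.

```python
def resolve_modes(flags: set) -> dict:
    """Derive the workspace mount and whether a confirmation prompt is needed."""
    if {"--sandbox", "--mudbox"} <= flags:
        # Contradictory mount strategies: the combination is rejected.
        raise ValueError("--sandbox and --mudbox cannot be combined")

    if "--sandbox" in flags:
        workspace = "none"
    elif "--mudbox" in flags:
        workspace = "read-only"
    else:
        workspace = "read-write"

    # Docker socket access weakens the sandbox and mudbox guarantees.
    weakened = "--docker" in flags and workspace != "read-write"
    return {
        "workspace": workspace,
        # Confirmations are skipped in YOLO mode.
        "needs_confirmation": weakened and "--yolo" not in flags,
    }


print(resolve_modes({"--mudbox", "--yolo", "--loop"}))
# {'workspace': 'read-only', 'needs_confirmation': False}
```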

Loop mode

With --loop, the idle action is overridden. Babysit looks for instructions in order:

1. ./LOOP.md in the current directory
2. ~/.agents/LOOP.md global fallback
3. "Keep going" hardcoded default

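The lookup order can be sketched in Python as follows. This is an illustration under stated assumptions: `resolve_loop_instructions` is a hypothetical name, and the directories are passed in explicitly so the example can run against temporary folders instead of your real home directory.

```python
import tempfile
from pathlib import Path

DEFAULT_LOOP_PROMPT = "Keep going"


def resolve_loop_instructions(cwd: Path, home: Path) -> str:
    """Return the first LOOP.md found, else the hardcoded default."""
    candidates = [cwd / "LOOP.md", home / ".agents" / "LOOP.md"]
    for candidate in candidates:
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8")
    return DEFAULT_LOOP_PROMPT


with tempfile.TemporaryDirectory() as tmp:
    project, home = Path(tmp, "project"), Path(tmp, "home")
    project.mkdir()
    (home / ".agents").mkdir(parents=True)
    print(resolve_loop_instructions(project, home))  # Keep going
    (home / ".agents" / "LOOP.md").write_text("Check for bugs", encoding="utf-8")
    print(resolve_loop_instructions(project, home))  # Check for bugs
```
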
Use === lines in LOOP.md to split into segments executed between idle periods:

/clear
===
Check for bugs
===
Check if the specification is fully implemented
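
Splitting a file into segments might look like the Python sketch below. It assumes a separator is a line containing only `===` (ignoring surrounding whitespace) and that empty segments are dropped; `split_segments` is a hypothetical helper, and Babysit's own handling of edge cases may differ.

```python
def split_segments(text: str) -> list:
    """Split markdown text into segments on lines that contain only '==='."""
    segments, current = [], []
    for line in text.splitlines():
        if line.strip() == "===":
            segments.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    segments.append("\n".join(current).strip())
    # Drop empty segments, e.g. from a leading or trailing separator.
    return [segment for segment in segments if segment]


loop_md = """/clear
===
Check for bugs
===
Check if the specification is fully implemented
"""
print(split_segments(loop_md))
# ['/clear', 'Check for bugs', 'Check if the specification is fully implemented']
```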

Dependency isolation

By default, babysit mounts node_modules and .venv as named Docker volumes instead of bind-mounting the host copies. This avoids cross-platform binary mismatches (host macOS binaries vs container Linux). Disable with:

config:
    isolate_dependencies: false

Subcommands

babysit <agent> [flags]              Start a new session
babysit <agent> resume <id> [flags]  Resume a previous session
babysit list                         List active sessions
babysit open <session_id>            Attach to an active session
babysit resume <session_id> [flags]  Resume a previous session

babysit resume <session_id> accepts the id printed when a Babysit session exits. If Babysit did not capture the agent's native session id before exit, it resumes the latest agent session from the original workspace.

Unrecognised flags are passed through to the coding agent CLI:

babysit claude --yolo --model sonnet --effort high

Logging tmux output

Pass --log to append everything the tmux pane renders to a logfile. The header Babysit session start: YYYY-MM-DD HH:MM:SS is prepended to each session's block, so several runs can share one file.

babysit claude --log                            # default path: .YYYY_MM_DD_HH_MM.babysit.log in cwd
babysit claude --log=babysit.log                # custom path (relative to cwd)
babysit claude --log ~/.logs/babysit.log        # absolute path; ~ expanded

The log is append-only — it is never truncated, so it's safe to point multiple sessions at the same file. tmux writes raw pane bytes including ANSI color/cursor sequences; for a plain-text view pipe through sed -E 's/\x1B\[[0-9;?]*[a-zA-Z]//g' or open with less -R.
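
If you prefer to post-process a log in a script, the same pattern as the sed command works in Python. A minimal sketch: `strip_ansi` is a hypothetical helper, and like the sed expression it only removes CSI sequences (colors, cursor movement), not every possible terminal escape.

```python
import re

# Same pattern as the sed expression: ESC [ parameters final-letter.
ANSI_CSI = re.compile(r"\x1B\[[0-9;?]*[a-zA-Z]")


def strip_ansi(text: str) -> str:
    """Remove ANSI CSI escape sequences from captured pane output."""
    return ANSI_CSI.sub("", text)


print(strip_ansi("\x1b[31mbuild failed\x1b[0m"))  # build failed
```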

Self-update

Updates are explicit. Run babysit update to refresh everything in one sweep:

Runs git pull on the babysit repo (or downloads the latest GitHub-release binary, for compiled installs).
Runs git pull on ~/.agents (if it exists).
Runs docker pull to fetch the latest container image.
Upgrades each host-installed agent CLI (claude, codex, gemini, opencode) using the agent's built-in self-update if available, otherwise the matching package manager (npm or brew, auto-detected from the binary's install path). Agents not on PATH are skipped.

Building from source

Requires Bun:

npm install
npm run build

Produces static binaries in dist/ for linux-x64, linux-arm64, darwin-x64, darwin-arm64.

Testing

bun test
npm run test:e2e
npm run test:all

npm run test:e2e builds a local Babysit image, derives a fake-agent image from it, then starts real Docker/tmux sessions to verify startup prompts, resume handoff, monitor actions, logging, nested Docker, mount modes, dependency isolation, and credential sync without calling model APIs.

For faster repeat runs with an existing base image:

BABYSIT_E2E_BASE_IMAGE=actuallymentor/babysit:latest \
BABYSIT_E2E_SKIP_BASE_BUILD=1 \
npm run test:e2e

License

ISC
